ML-IR Discussion: Bag of Little Bootstrap (BLB)
Recap:
- Recap
- Why bootstrap
- What is bootstrap
- Bag of Little Bootstrap (BLB)
- Guarantees
- Examples
Recap:
Population

Our Sample
Estimate the median!
Asymptotic Approach
Theory has it: a 95% Confidence Interval
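The formula behind this slide is not in the text. A standard result that fits the slide's wording is the asymptotic normality of the sample median. The symbols below are assumptions for illustration: m is the true median, \hat{m}_n the sample median of n i.i.d. draws, and f the population density (the "f" named on the next slide), assumed positive and continuous at m.

```latex
\sqrt{n}\,\bigl(\hat{m}_n - m\bigr) \;\xrightarrow{d}\; \mathcal{N}\!\left(0,\ \frac{1}{4 f(m)^2}\right)
\qquad\Longrightarrow\qquad
\hat{m}_n \;\pm\; \frac{1.96}{2 f(m)\sqrt{n}} \quad \text{(approximate 95\% interval)}
```

The interval needs f evaluated at the unknown median m.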
Problems with the Asymptotic Approach:
- Density “f” is hard to estimate
- The median needs a much larger sample size than the mean for the Central Limit Theorem to kick in
- True median is unknown
Solution:
When theory is too hard…
Let’s empirically estimate
theoretical truth!
Empirical Approach: Ideal
- Sample from the population over and over again!
- Compute a median estimate from each sample (Median Est 1, Median Est 2, ...)
- Take the range covering 95% of sample medians
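A minimal sketch of the ideal procedure, assuming we could draw fresh samples from the population at will. The Exponential(1) population, the helper names `quantile` and `ideal_median_interval`, and the sizes used are all illustrative assumptions, not taken from the slides.

```python
import math
import random
import statistics


def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a list of numbers."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def ideal_median_interval(draw_sample, n, repeats, alpha=0.05):
    """Draw `repeats` fresh samples of size n and return the range
    covering the central 1 - alpha share of their medians."""
    medians = [statistics.median(draw_sample(n)) for _ in range(repeats)]
    return quantile(medians, alpha / 2), quantile(medians, 1 - alpha / 2)


# Hypothetical population: Exponential(1), whose true median is ln 2.
rng = random.Random(0)


def draw_sample(n):
    return [rng.expovariate(1.0) for _ in range(n)]


low, high = ideal_median_interval(draw_sample, n=200, repeats=1000)
print(low, math.log(2), high)
```

In practice the population is not available for repeated sampling, which is what motivates the bootstrap.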
Similar enough?
Population vs. Our Sample
Empirical Approach: Bootstrap
Efron, Tibshirani (1993)
- From our sample, draw n samples with replacement
- Compute a median estimate from each resample (Median Est* 1, Median Est* 2, ...)
- Take the range covering 95% of sample medians
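The bootstrap steps can be sketched as a percentile interval for the median. This is an illustration under assumptions: the function names, the number of resamples `B`, and the simulated normal sample are hypothetical choices, not values from the slides.

```python
import random
import statistics


def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a list of numbers."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def bootstrap_median_ci(sample, B=1000, alpha=0.05, rng=None):
    """Percentile bootstrap interval for the median of `sample`."""
    rng = rng or random.Random()
    n = len(sample)
    # Each resample draws n values from the sample with replacement.
    medians = [statistics.median(rng.choices(sample, k=n)) for _ in range(B)]
    return quantile(medians, alpha / 2), quantile(medians, 1 - alpha / 2)


# Hypothetical data standing in for "our sample".
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
low, high = bootstrap_median_ci(sample, B=1000, rng=rng)
print(low, statistics.median(sample), high)
```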
Empirical Approach: Bootstrap
Used for:
- Bias estimation
- Variance
- Confidence intervals
Main benefits:
- Automatic
- Flexible
- Fast convergence (Hall, 1992)
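The first two uses can be sketched with the same resampling loop. This is a hedged illustration: `bootstrap_bias_and_se` is a hypothetical helper, and it uses the usual bootstrap estimates (mean of the replicates minus the original estimate for bias, standard deviation of the replicates for standard error).

```python
import random
import statistics


def bootstrap_bias_and_se(sample, statistic, B=1000, rng=None):
    """Bootstrap estimates of the bias and standard error of `statistic`."""
    rng = rng or random.Random()
    n = len(sample)
    estimate = statistic(sample)
    replicates = [statistic(rng.choices(sample, k=n)) for _ in range(B)]
    bias = statistics.mean(replicates) - estimate
    se = statistics.stdev(replicates)  # square it for a variance estimate
    return bias, se


# Hypothetical skewed data; the statistic is the median, as in the slides.
rng = random.Random(2)
sample = [rng.expovariate(1.0) for _ in range(300)]
bias, se = bootstrap_bias_and_se(sample, statistics.median, rng=rng)
print(bias, se)
```

Because `statistic` is passed in as a function, the same code works for other estimators, which is the sense in which the bootstrap is automatic and flexible.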
Key: There are 3 distributions
- Population
- Actual Sample: approximate distribution of the population
- Bootstrap Samples: approximate distribution of the actual sample
Approximate the approximation
- Is there bias?
- What’s the variance?
- etc.
No free meals:
- Bootstrapping requires re-sampling the entire sample B times
- Each resample is of size n
- Sampling m < n will violate the sample size properties
- Original sample size cannot be too small
- “Pre-asymptopia” cases
Hope
- A resample is expected to contain about .632n unique samples
- Sample less: m out of n bootstrap is possible with analytical adjustments. (Bickel 1997)
Intuition: Need fewer than all n values for each bootstrap.
Problem:
- Analytical adjustment is not as automatic as desirable
- m out of n bootstrap is sensitive to choices of m
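The .632n figure follows from a short calculation: a given point is missed by one draw with probability 1 - 1/n, so it appears in a size-n resample with probability 1 - (1 - 1/n)^n, which tends to 1 - 1/e ≈ 0.632. A small check, with hypothetical function names:

```python
import math
import random


def expected_unique_fraction(n):
    """Exact expected share of distinct points in a size-n resample."""
    return 1 - (1 - 1 / n) ** n


def simulated_unique_fraction(n, rng):
    """Share of distinct indices in one simulated resample of size n."""
    return len({rng.randrange(n) for _ in range(n)}) / n


rng = random.Random(3)
exact = expected_unique_fraction(10**6)
simulated = simulated_unique_fraction(10_000, rng)
print(exact, simulated, 1 - math.exp(-1))
```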
Bag of Little Bootstrap
- Sample without replacement from the sample s times, each subsample of size b
- Resample each subsample until the sample size is n, r times
- Compute the median for each resample (Med 1, ..., Med r)
- Compute the confidence interval for each subsample
- Take the average of the upper points and of the lower points for the confidence interval
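The steps above can be sketched as follows. This follows the slide wording literally and is not the reference implementation from Kleiner et al. The function names, the defaults, and the simulated data are assumptions for illustration. For readability it materializes each size-n resample, so it does not yet show the memory savings.

```python
import random
import statistics


def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a list of numbers."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def blb_median_ci(sample, b, s, r, alpha=0.05, rng=None):
    """BLB-style interval for the median, following the slide's steps."""
    rng = rng or random.Random()
    n = len(sample)
    lowers, uppers = [], []
    for _ in range(s):
        # Step 1: subsample of size b, drawn without replacement.
        subsample = rng.sample(sample, b)
        # Steps 2 and 3: r resamples of size n from the subsample, one median each.
        medians = [statistics.median(rng.choices(subsample, k=n)) for _ in range(r)]
        # Step 4: one confidence interval per subsample.
        lowers.append(quantile(medians, alpha / 2))
        uppers.append(quantile(medians, 1 - alpha / 2))
    # Step 5: average the lower points and the upper points.
    return statistics.mean(lowers), statistics.mean(uppers)


# Hypothetical data; b = n^0.6 mirrors the choice used later in the slides.
rng = random.Random(4)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
b = int(len(data) ** 0.6)
low, high = blb_median_ci(data, b=b, s=5, r=50, rng=rng)
print(b, low, high)
```

The s subsamples are processed independently, so the outer loop is the natural unit to run in parallel.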
Bag of Little Bootstrap
Kleiner et al. 2012
Computational Gains:
- Each sample only has b unique values!
- Can sample a b-dimensional multinomial with n trials.
- Scales in b instead of n
- Easily parallelizable
If b=n^(0.6), a dataset of size 1TB:
- Bootstrap storage demands ~ 632GB
- BLB storage demands ~ 4GB
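A sketch of the multinomial idea: a size-n resample of a subsample can be stored as b counts, and the median read off the counts directly. The names `multinomial_counts` and `weighted_median` are hypothetical. The storage arithmetic at the end assumes 10^6 data points of about 1 MB each; the slide does not state this, but it is one set of assumptions that reproduces its figures.

```python
import random
import statistics
from collections import Counter


def multinomial_counts(b, n, rng):
    """Counts for one resample: n draws spread uniformly over b slots.
    The standard library has no multinomial sampler, so this tallies n
    uniform index draws; time is O(n) but only b counts are stored."""
    counts = Counter(rng.randrange(b) for _ in range(n))
    return [counts[i] for i in range(b)]


def weighted_median(values, counts):
    """Median of the multiset in which values[i] appears counts[i] times."""
    pairs = sorted(zip(values, counts))
    n = sum(counts)

    def kth(k):
        # k-th smallest element (0-indexed) of the expanded multiset.
        cumulative = 0
        for value, count in pairs:
            cumulative += count
            if k < cumulative:
                return value

    if n % 2 == 1:
        return kth(n // 2)
    return (kth(n // 2 - 1) + kth(n // 2)) / 2


rng = random.Random(5)
subsample = [rng.gauss(0.0, 1.0) for _ in range(50)]
counts = multinomial_counts(len(subsample), 5000, rng)
print(weighted_median(subsample, counts))

# Storage arithmetic under the assumption stated above.
n_points, point_mb = 10**6, 1.0
bootstrap_gb = 0.632 * n_points * point_mb / 1000  # distinct points per resample
blb_gb = n_points ** 0.6 * point_mb / 1000         # b = n^0.6 points
print(bootstrap_gb, blb_gb)
```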
Bag of Little Bootstrap
Theoretical guarantees:
- Consistency
- Higher order correctness
- Fast convergence rate (same as bootstrap)
Performance
b = n^(gamma), 0.5 <= gamma <= 1
These choices of gamma ensure bootstrap convergence rates.
Relative error of confidence interval width of logistic regression coefficients (Kleiner et al. 2012)
Panels: Gamma residuals, t-distr residuals
Performance vs Time
Selecting Hyperparameters
• b, the number of unique samples for each little bootstrap
• s, the number of size b samples w/o replacement
• r, the number of multinomials to draw
b: the larger the better
s, r: adaptively increase these until convergence has been reached. (Median doesn’t change)
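One way to read "adaptively increase until convergence" for r is sketched below: keep adding resamples in batches and stop once the interval endpoints barely move. The stopping rule, the tolerance `tol`, the batch size `step`, and the cap `max_r` are assumptions for illustration, not the rule from the slides or the paper. The same pattern could be applied to s.

```python
import random
import statistics


def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a list of numbers."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def adaptive_r_interval(subsample, n, tol=0.05, step=20, max_r=500, rng=None):
    """Grow r in batches of `step` until the 95% interval stabilizes."""
    rng = rng or random.Random()
    medians = []
    previous = None
    while len(medians) < max_r:
        for _ in range(step):
            medians.append(statistics.median(rng.choices(subsample, k=n)))
        current = (quantile(medians, 0.025), quantile(medians, 0.975))
        if previous is not None:
            # Largest endpoint movement since the previous batch.
            change = max(abs(current[0] - previous[0]),
                         abs(current[1] - previous[1]))
            if change <= tol * (current[1] - current[0]):
                break
        previous = current
    return current, len(medians)


# Hypothetical subsample of size b = 60, resampled up to n = 1000.
rng = random.Random(6)
subsample = [rng.gauss(0.0, 1.0) for _ in range(60)]
interval, r_used = adaptive_r_interval(subsample, n=1000, rng=rng)
print(interval, r_used)
```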
Bag of Little Bootstrap
Main benefits:
- Computationally friendly
- Maintains most statistical properties of bootstrap
- Flexibility
- More robust to choice of b than older methods
Reference
• Efron, Tibshirani (1993) An Introduction to the Bootstrap
• Kleiner et al. (2012) A Scalable Bootstrap for Massive Data

Thanks!

Introduction to Bag of Little Bootstrap
