What next?
This document is for those who have met the pre-requisites for a course on Machine
Learning. You must be familiar with basic principles and ideas drawn from
Statistics, Linear Algebra, Calculus, and Probability theory. This is not a
beginner's document, even though the material presented is very basic.
Now that we have played around with individual supervised learners, we have an
obligation to pause and ponder. If everything we have learned in the last two
months is so promising, why did we never achieve 100% accuracy, even on the small
toy datasets we worked with?
What is the reason?
We may attribute it to one of several plausible underlying causes.
We can think of them as process related (the manner in which we
conducted these exercises) or as intrinsic problems unrelated to the process.
So let us begin our exploration with intrinsic factors that prevent us
from learning to perfection. They are Bias and Variance. Understanding Bias
and Variance at a deeper level is our goal for this lecture.
Bias is the stronger force of the two: it is always present and much harder to
eliminate. There are several forms of bias, and most of them are attributable not to
the process but to the data we are trying to learn from; the inductive bias is the
exception. Bias is present in the data for any number of reasons --
selection bias (the Landon vs. FDR election polling of 1936), survivorship bias
(Outliers by Gladwell, Wald's experiment with the Navy). There are many other forms
of bias; I encourage you to consult our collective conscience, the www.
In Statistics, bias is indicated when the expected value of a sample statistic
differs from the population parameter. Let us conduct a small experiment to
understand bias.
You have seen the notation (mean, variance) for the Normal Distribution. What
exactly does this notation refer to? Population or sample? My answer would be the
population, as it encompasses any and every observation that is part of the
normal distribution with that mean and variance.
You have also seen the R function rnorm(100,mean,sigma), which returns a sample of
100 observations from a normal distribution with those parameters.
Let us compute the mean of the sample observations and compare it with the
theoretical mean.
# Omitting the seed so that we all get different samples; I expect
# everyone's results to be numerically different
# but to lead to the same conclusion
s1<-rnorm(100,10,8)
paste("Sample is ",
ifelse(mean(s1)!=10,"biased.", "unbiased."),
"Sample statistic is mean.", sep="")
#Let us consider another sample statistic SQRT(variance), the standard deviation.
paste("Sample is ",
ifelse(sd(s1)!=8,"biased.", "unbiased."),
"Sample statistic is sd.", sep="")
This is a single sample; what would happen if we did this 1000 times?
We can think of such an average as the statistic's expected value.
# Let us find the mean of a sample of 1000, averaged over 1000 samples
N<-1000
L<-unlist(lapply(1:N,FUN=function(x){ s<-rnorm(1000,10,8)
list(mean=mean(s),sd=sd(s))}))
# L alternates mean,sd,mean,sd,... so it has 2*N elements;
# odd positions hold the means, even positions hold the sds
mean(L[seq(1,2*N,2)])
mean(L[seq(2,2*N,2)])
hist(L[seq(1,2*N,2)])
#10.0252 for mean and 7.997218 for sd
Every time we run this, we will get different results. That is the point. The sample
means are never exactly the theoretical mean of 10. This difference is the
Bias. Every dataset we will ever work with is a sample drawn from some unknown
distribution. Accordingly, we anticipate bias in any statistic we estimate from a
given sample. We cannot avoid that.
Bias and sample size
How does Bias vary with Sample Size?
# What can we expect if the sample size is 100, but averaged over 1000 samples
# as before?
N<-1000
sz<-100
L<-unlist(lapply(1:N,FUN=function(x,m=sz){ s<-rnorm(m,10,8)
list(mean=mean(s),sd=sd(s))}))
paste("Sample is ",ifelse(mean(L[seq(1,2*N,2)])!=10,"biased.", "unbiased."),
"Sample statistic is mean.", sep="")
paste("Sample is ",ifelse(mean(L[seq(2,2*N,2)])!=8,"biased.", "unbiased."),
"Sample statistic is sd.", sep="")
We can plot a histogram of the means
hist(L[seq(1,2*N,2)])
Also, this is an opportune time to digress a bit toward the CLT and the LLN.
CLT -- Regardless of the underlying distribution, the sampling distribution of the
sample mean will follow the Normal curve.
We can perform shapiro.test to validate the CLT.
shapiro.test(L[seq(1,2*N,2)])
Shapiro-Wilk normality test
data: L[seq(1, 2 * N, 2)]
W = 0.99663, p-value = 0.3825
The null hypothesis of Shapiro's test is that the population is normally
distributed. Since the p-value is > 0.05, we cannot reject the null. That is, this
sample of means is not significantly different from a normal distribution,
consistent with the CLT.
LLN -- as the sample size grows, the sample statistic asymptotically approaches the
population parameter.
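A quick way to watch the LLN at work is to track the running mean of a growing
sample; this is just an illustration using the same N(10, 8) distribution as above.
# LLN sketch: the running mean of N(10,8) draws settles toward 10
x<-rnorm(10000,10,8)
running.mean<-cumsum(x)/seq_along(x)
plot(running.mean,type="l",xlab="sample size",ylab="running mean")
abline(h=10,lty=2)          # the population mean
tail(running.mean,1)        # close to 10, and generally closer as n grows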
When the training error is large and accuracy is close to 50%, we conclude that the
learner is unable to learn. The hypothesis is too simple. This is referred to as
underfitting. We attribute underfitting to Bias, because the model
is unable to estimate the parameters: it is too simple to learn the
structure of the data presented to it.
https://becominghuman.ai/machine-learning-bias-vs-variance-641f924e6c57
https://www.mygreatlearning.com/blog/bias-variance-trade-off-in-machine-learning/
https://www.mygreatlearning.com/blog/overfitting-and-underfitting-in-machine-learning/
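As a concrete picture of underfitting, consider fitting a straight line to data
generated from a sine curve (an illustrative toy example, not one of our course
datasets). The hypothesis is too simple, so even the training error remains large.
# Underfitting sketch: a linear model fit to nonlinear (sine) data
set.seed(1)                        # fixed seed only so the illustration is reproducible
x<-runif(200,0,2*pi)
y<-sin(x)+rnorm(200,sd=0.2)
fit<-lm(y~x)                       # the hypothesis is a straight line
mean((y-fitted(fit))^2)            # training MSE stays far above the noise variance (0.04)
plot(x,y); abline(fit,col="red")   # the line misses the structure entirely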
Now the Variance
As shown above, every sample varies slightly; no two samples are alike.
That comes with i.i.d. sampling.
When a learner is unable to generalize -- that is, when the model cannot perform as
well on never-before-seen data as it did on the training set -- we attribute that to
variance, and it is generally referred to as over-fitting.
Another way to think of this: given different data, our classifier performs
differently. In a classification problem, the same observation is classified
differently given different training sets; the prediction fluctuates around the true
class label depending on the training set.
The model is more complex than necessary and captures the noise.
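To make this concrete, here is a small sketch in which the same test point is handed
to a 1-nearest-neighbour classifier trained on different training sets (a toy
two-class problem of my own construction); the predicted label flips back and forth,
which is exactly the variance we are talking about.
# Variance sketch: the same observation is classified differently
# depending on which training set the classifier happens to see
set.seed(2)
test.pt<-0                                     # a point near the class boundary
labels<-replicate(500,{
  x<-c(rnorm(25,mean=-1),rnorm(25,mean=1))     # a fresh training set each time
  y<-rep(c("A","B"),each=25)
  y[which.min(abs(x-test.pt))]                 # 1-nearest-neighbour prediction
})
table(labels)                                  # the predicted class keeps flipping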
Total Error = Variance + Bias^2 + Irreducible Error
Most introductory books on ML establish this decomposition algebraically from first
principles. Given this constraint, variance and bias cannot be simultaneously
reduced.
The optimal point is where the variance and bias curves cross when plotted as
functions of model complexity.
http://scott.fortmann-roe.com/docs/BiasVariance.html
https://www.analyticsvidhya.com/blog/2020/08/bias-and-variance-tradeoff-machine-learning/
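We can also see the decomposition empirically with a small simulation: estimate the
squared bias and the variance of the prediction at a fixed point for polynomial
models of increasing degree (an illustrative setup; the target function and noise
level are arbitrary choices).
# Bias^2 and variance of the prediction at x0=0.25, by polynomial degree
set.seed(3)
true.f<-function(x) sin(2*pi*x)
x0<-0.25
sim<-sapply(1:8,function(degree){
  preds<-replicate(300,{
    x<-runif(40); y<-true.f(x)+rnorm(40,sd=0.3)              # a fresh training set
    predict(lm(y~poly(x,degree)),newdata=data.frame(x=x0))
  })
  c(bias2=(mean(preds)-true.f(x0))^2,variance=var(preds))
})
round(t(sim),4)   # bias^2 shrinks with degree while variance grows;
                  # their sum (plus the noise) is smallest at an intermediate degree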
Bias/Variance Tradeoff
As mentioned before, variance and bias cannot be simultaneously reduced. If we seek
low variance, we will end up with high bias, and vice versa. That is the
Bias/Variance tradeoff. By this reasoning, a model with high bias will have low
variance, and a model with high variance will have low bias. This gives us a way to
improve classifiers: we can start with a high-variance (low-bias) model and reduce
its variance, ending up with a model that has low bias and lower variance. Or we can
start with a low-variance (high-bias) model and seek to reduce its bias, ending up
with a model that has low variance and lower bias.
What are our options?
What can we vary or tune? What are the variables? Our dataset has N observations
and p features, and we can vary them by considering different training sets. We now
know numerous strategies to classify. Therefore, we have the following variables,
and we can vary them to optimize performance:
1. vary the training set (keep N constant, but change the observations)
2. Consider different features (p) in our training set
3. Train different models
Let us consider some well known techniques.
Methods to optimize bias/variance:
1. Cross-Validation: Cross-validation in its simplest form is a one-round
validation, where we hold out a single observation for validation and use the
rest to train the model. This form of CV is called LOO-CV (leave-one-out CV).
There is another family of cross-validation where we split the data into an
equal number of folds; 10, 5, and 3 are common. In a k-fold CV, we divide the
dataset into k disjoint folds. One fold is kept as the testing set and the
remaining k-1 folds are used as the training set. This process is repeated
over all k folds and the average is taken as the CV metric. The difference
between LOO and k-fold CV is that in LOO-CV there are as many folds as there
are observations; each fold is a single observation. Every observation
participates in k-1 training sessions and once as a test observation. From a
Big Data perspective, cross-validation lends itself to parallelization and
MapReduce is an appropriate strategy. (A minimal k-fold sketch in R follows
this list.)
2. Boosting combines many "weak" individual models into an ensemble that has
lower bias than the individual models. Boosting is what we do naturally:
when something goes wrong, we try to fix the parts that are erroneous and
iteratively eliminate the errors until the solution is error free.
Boosting is therefore iterative and does not lend itself to
parallelization; it is not a good candidate for Big Data experimentation.
3. Bagging combines "weak" learners in a way that reduces their variance. In
bagging, we generate a number of bootstrap samples, train a model on each,
and then average the predictions for regression or take the majority vote
for classification. Bagging is therefore highly parallelizable and a good
candidate for Big Data experimentation.
4. In k-nearest-neighbor models, a high value of k leads to high bias and low
variance. Think of a dataset with two classes where the class proportions
are 70/30. If we set k equal to N, the size of the dataset, kNN will classify
every new observation as belonging to the majority class -- a biased
classifier. kNNs are also amenable to parallelization: given an unknown
observation, the distances to the training observations can be computed in
parallel.
5. Early Stopping : Early stopping rules provide guidance as to how many
iterations can be run before the learner begins to over-fit. This technique
is often used in neural nets and Tree algorithms. Tree algorithms are
parallelizable.
6. Pruning : Pruning is used extensively while building CART (Tree) models. In
decision trees, the depth of the tree determines the variance. Decision
trees are commonly pruned to control variance.
7. Regularization: Linear and Generalized linear models can be regularized to
decrease their variance at the cost of increasing their bias. Iterative and
therefore not parallelizable.
8. Dimensionality reduction and feature selection can decrease variance by
simplifying models and removing correlated features. Dimensionality-reduction
methods rely on linear algebra techniques involving matrix inversion and
multiplication. Such operations are parallelizable and hence a good candidate
for Big Data experimentation.
9. Adding features (predictors) tends to decrease bias, at the expense of
introducing additional variance.
10. A larger training set tends to decrease variance. Thus, a higher fold
cross validation results in lower variance.
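As promised under item 1, here is a minimal k-fold cross-validation written out by
hand in base R. The dataset (the built-in mtcars) and the model (a two-predictor
linear regression) are arbitrary illustrative choices.
# 5-fold cross-validation of mpg ~ wt + hp on mtcars, by hand
set.seed(4)
k<-5
folds<-sample(rep(1:k,length.out=nrow(mtcars)))      # assign every row to one of k folds
cv.mse<-sapply(1:k,function(i){
  train<-mtcars[folds!=i,]
  test<-mtcars[folds==i,]
  fit<-lm(mpg~wt+hp,data=train)
  mean((test$mpg-predict(fit,newdata=test))^2)       # held-out error for fold i
})
mean(cv.mse)                                         # the CV estimate: average held-out MSE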
More experimentation, varying sample size:
sd10<-unlist(lapply(1:1000,FUN=function(x)sd(rnorm(10,mean=10,sd=4))))
sd100<-unlist(lapply(1:1000,FUN=function(x)sd(rnorm(100,mean=10,sd=4))))
sd1000<-unlist(lapply(1:1000,FUN=function(x)sd(rnorm(1000,mean=10,sd=4))))
sd10000<-unlist(lapply(1:1000,FUN=function(x)sd(rnorm(10000,mean=10,sd=4))))
adf<-data.frame(ex10=sd10,ex100=sd100,ex1000=sd1000,ex10000=sd10000)
apply(adf,2,mean)
ex10 ex100 ex1000 ex10000
3.893439 3.980448 4.001862 3.997796
apply(adf,2,sd)
ex10 ex100 ex1000 ex10000
0.94593649 0.28605537 0.08785607 0.02829969
Note that the variance, as measured by the sd of the estimates across samples, is
much lower for larger samples than for smaller samples.
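As an aside, the observed spread of the sd estimates is close to the classical
large-sample approximation for normal data, SE(s) approximately sigma/sqrt(2(n-1));
a quick check:
# Approximate standard error of the sample sd for normal data
n<-c(10,100,1000,10000)
round(4/sqrt(2*(n-1)),4)
# 0.9428 0.2843 0.0895 0.0283 -- compare with the apply(adf,2,sd) output above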