Classification Machine Learning:
Learning Linear Classifiers
By Thomas Norris
The component most responsible for
learning (improving) in a model is the
Quality Metric
Likelihood Function
The quality metric improves the coefficients of a classification model by way of a likelihood
function.
A likelihood function l(w) measures the quality of fit of the coefficients w. Learning seeks the
w that maximizes l(w); because l(w) is a product of probabilities, it can be at most 1.
Maximum Likelihood Estimation
Maximum Likelihood Estimation: choosing the coefficients w that maximize l(w), the product of
P(yi|xi,w) over all N data points. The equation is written as:
$$ l(w) = \prod_{i=1}^{N} P(y_i \mid x_i, w) $$
Oftentimes, maximum likelihood estimation uses gradient ascent to achieve its goal.
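As a minimal sketch of the quantity being maximized, the snippet below computes the likelihood as a product of per-point probabilities under the logistic model used later in these slides. The function names (`sigmoid`, `likelihood`) and the toy data are assumptions for illustration, not part of the original deck.

```python
import math

def sigmoid(score):
    """Logistic link: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def likelihood(w, features, labels):
    """Product over all data points of P(y_i | x_i, w), labels in {+1, -1}."""
    total = 1.0
    for h, y in zip(features, labels):
        score = sum(wj * hj for wj, hj in zip(w, h))
        p_pos = sigmoid(score)                      # P(y = +1 | x, w)
        total *= p_pos if y == +1 else (1.0 - p_pos)
    return total

# Hypothetical toy data: one feature per data point.
features = [[2.5], [0.3], [2.8], [0.5]]
labels = [+1, -1, +1, +1]

print(likelihood([0.0], features, labels))  # every probability is 0.5
print(likelihood([1.0], features, labels))  # a better fit gives a larger value
```

Coefficients that fit the data better give a larger product, which is why the search for w is framed as a maximization.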
Gradient Ascent
The gradient is the vector of partial derivatives of l(w), one with respect to each
coefficient wj. It points in the direction in which l(w) increases fastest.
Gradient ascent is an iterative optimization algorithm that repeatedly steps in the
direction of the gradient until it reaches the highest point of l(w) (in practice, until
the gradient is close to zero).
A contour plot shows the level curves of l(w); the trajectory taken by the gradient ascent
algorithm to the maximum point can be drawn on it.
[Figures: Gradient Ascent; Contour Plot]
1[yi=+1] is the indicator function. It is a piecewise function, equal to 1 if yi equals
+1 and equal to 0 otherwise. The condition inside the brackets changes with the values
and number of possible outputs.
Example
hj(xi)   yi   P(yi=+1|xi, w) = 1/(1+e^(-w^T h(xi)))   Contribution to derivative
2.5      +1   .92414                                  2.5(1-.92414) =  0.18965
0.3      -1   .5744                                   0.3(0-.5744)  = -0.17232
2.8      +1   .9427                                   2.8(1-.9427)  =  0.16044
0.5      +1   .6225                                   0.5(1-.6225)  =  0.18875
Derivative with respect to wj (sum of the rows) = .36652
Where the indicator function 1[yi=+1] = 1 if yi = +1,
                                        0 if yi = -1
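The arithmetic in this example can be checked with a short script. It is a sketch that assumes a single coefficient w = 1, which is the value that reproduces the listed probabilities from hj(xi); the variable names are illustrative.

```python
import math

h_values = [2.5, 0.3, 2.8, 0.5]   # hj(xi) column
y_values = [+1, -1, +1, +1]       # yi column
w = 1.0                           # assumed coefficient (not stated on the slide)

contributions = []
for h_i, y_i in zip(h_values, y_values):
    p = 1.0 / (1.0 + math.exp(-w * h_i))        # P(yi = +1 | xi, w)
    indicator = 1.0 if y_i == +1 else 0.0       # 1[yi = +1]
    contributions.append(h_i * (indicator - p)) # hj(xi) * (1[yi=+1] - P)

derivative = sum(contributions)
for h_i, c in zip(h_values, contributions):
    print(f"hj(xi)={h_i}: contribution={c:.5f}")
print(f"derivative = {derivative:.5f}")
```

The script works with unrounded probabilities, so its results agree with the table only up to the table's rounding.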
Oftentimes, the log likelihood is used for calculating derivatives instead of the
likelihood.
This is because it makes the math easier: the logarithm turns the product over data
points into a sum, and it is maximized by the same coefficients.
Interpretation of Derivatives
Assuming hj(xi) = 1 for simplicity
          P(y=+1|xi, w) ≈ 1                        P(y=+1|xi, w) ≈ 0
yi = +1   Δi ≈ (1-1) ≈ 0, good coefficients        Δi ≈ (1-0) ≈ 1, coefficients too small (false negative)
yi = -1   Δi ≈ (0-1) ≈ -1, coefficients too large  Δi ≈ (0-0) ≈ 0, good coefficients
          (false positive)
Algorithm for Gradient Ascent for Logistic Regression
[Figure labels: tolerance, step size, dimensions]
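The algorithm itself did not survive extraction, so the following is only a sketch of gradient ascent for logistic regression built from the pieces named on this slide (tolerance, step size, one coefficient per dimension) and the derivative rule from the earlier example. The function names, default values, iteration cap, and toy data are all assumptions.

```python
import math

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def gradient(w, features, labels):
    """Partial derivatives of the log likelihood, one per dimension j."""
    grad = [0.0] * len(w)
    for h, y in zip(features, labels):
        p = sigmoid(sum(wj * hj for wj, hj in zip(w, h)))
        indicator = 1.0 if y == +1 else 0.0
        for j, hj in enumerate(h):
            grad[j] += hj * (indicator - p)
    return grad

def gradient_ascent(features, labels, step_size=0.1, tolerance=1e-4, max_iter=10000):
    """Step along the gradient until its norm falls below the tolerance."""
    w = [0.0] * len(features[0])
    for _ in range(max_iter):
        grad = gradient(w, features, labels)
        if math.sqrt(sum(g * g for g in grad)) < tolerance:
            break
        w = [wj + step_size * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: each row is [1 (intercept), x]. The classes overlap,
# so the maximum is attained at finite coefficients.
features = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
labels = [-1, -1, +1, -1, +1, +1]

w_hat = gradient_ascent(features, labels)
print(w_hat)
```

The iteration cap is a safety net added for this sketch; on perfectly separable data the coefficients would keep growing and the tolerance might never be met.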
Determining step size
Step sizes are determined by a process of trial and error
Step sizes are compared on a learning curve, where y axis = likelihood and x axis = # of
iterations.
Too small step size: Slowly growing curve
Too large step size: Oscillations (zig-zags) in curve
[Figure: learning curves. Green line: step size too small. Red and teal lines: step size
too large.]
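To make the trial-and-error comparison concrete, the sketch below records a learning curve (log likelihood per iteration) for three step sizes. The toy data, the specific step sizes, and the helper names are assumptions chosen for illustration.

```python
import math

# Hypothetical toy data: each row is [1 (intercept), x]; labels are +1 or -1.
FEATURES = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
LABELS = [-1, -1, +1, -1, +1, +1]

def sigmoid(score):
    """Numerically stable logistic function."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)

def log_likelihood(w):
    """Sum over data points of log P(y_i | x_i, w), computed stably."""
    total = 0.0
    for h, y in zip(FEATURES, LABELS):
        margin = y * sum(wj * hj for wj, hj in zip(w, h))
        if margin >= 0:
            total -= math.log1p(math.exp(-margin))
        else:
            total += margin - math.log1p(math.exp(margin))
    return total

def learning_curve(step_size, n_iter=50):
    """Run gradient ascent from w = 0, recording the log likelihood each iteration."""
    w = [0.0, 0.0]
    curve = [log_likelihood(w)]
    for _ in range(n_iter):
        grad = [0.0, 0.0]
        for h, y in zip(FEATURES, LABELS):
            p = sigmoid(sum(wj * hj for wj, hj in zip(w, h)))
            indicator = 1.0 if y == +1 else 0.0
            for j, hj in enumerate(h):
                grad[j] += hj * (indicator - p)
        w = [wj + step_size * gj for wj, gj in zip(w, grad)]
        curve.append(log_likelihood(w))
    return curve

for step in (0.001, 0.1, 1.0):
    curve = learning_curve(step)
    drops = sum(1 for a, b in zip(curve, curve[1:]) if b < a)
    print(f"step={step}: final log likelihood={curve[-1]:.4f}, decreases={drops}")
```

On this data the smallest step climbs steadily but slowly, while the largest step produces at least one iteration where the log likelihood goes down, which is the zig-zag pattern described above.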
Measures for Classifier Evaluation
Error = # of mistakes/# of data points
Accuracy = # of correct predictions/# of data points
Ex: In a dataset of 30 data points, 20 are predicted correctly:
Error = 10/30, Accuracy = 20/30
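The two measures can be computed directly from predictions and true labels. The sketch below rebuilds the 30-point example; the function names and the made-up label lists are assumptions.

```python
def accuracy(predictions, labels):
    """Fraction of data points predicted correctly."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

def error(predictions, labels):
    """Fraction of data points predicted incorrectly."""
    mistakes = sum(1 for p, y in zip(predictions, labels) if p != y)
    return mistakes / len(labels)

# Hypothetical data: 30 points, all truly +1, with the last 10 predicted as -1.
labels = [+1] * 30
predictions = [+1] * 20 + [-1] * 10

print(accuracy(predictions, labels))  # 20/30
print(error(predictions, labels))     # 10/30
```

Because every prediction is either correct or a mistake, the two values always sum to 1.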
Overfitting in Classification
Overfitting: Training a model too tightly to a certain set of data points, to the extent that
it cannot make accurate predictions with new data.
Overfitting in classification models leads to overconfident predictions (probabilities
extremely close to 1 or 0).
Signs of Overfitting
Strong Indicators of overfitting include:
● Model accuracy nearing or equalling 100% with training data
● Extremely large coefficients
● Overly complex decision boundaries
Overconfidence example:
Remember the formula: P(y=+1|xi, w) = 1/(1+e^(-w^T h(xi)))
When “good” = 0.5 and “awful” = -3,
“good” + “good” + “awful” = -2, P(y=+1|xi) = 1/(1+e^-(-2)) = .119
When “good” = 5 and “awful” = -30,
“good” + “good” + “awful” = -20, P(y=+1|xi) = 1/(1+e^-(-20)) = 2.061e-9 (really small number)
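The same comparison can be run as a short script. It is a sketch that takes “good” = 0.5 in the first case, the value consistent with the stated total of -2; the dictionary and function names are illustrative.

```python
import math

def probability_positive(coefficients, words):
    """P(y = +1 | x) when the score is the sum of the coefficients of the words."""
    score = sum(coefficients[word] for word in words)
    return 1.0 / (1.0 + math.exp(-score))

words = ["good", "good", "awful"]
small_coefficients = {"good": 0.5, "awful": -3.0}
large_coefficients = {"good": 5.0, "awful": -30.0}   # same ratios, ten times larger

p_small = probability_positive(small_coefficients, words)
p_large = probability_positive(large_coefficients, words)
print(f"small coefficients: {p_small:.3f}")
print(f"large coefficients: {p_large:.3e}")
```

Both models agree on the predicted class, but scaling the coefficients by ten pushes the probability from roughly 0.12 to nearly zero, which is the overconfidence the slide warns about.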
Method to Mitigate Overfitting
Overfitting can be prevented, or at a minimum reduced, by “penalizing” large coefficients.
This is done by altering the quality metric:
Without a penalty: Total Quality = Measure of Fit (AKA data likelihood)
With a penalty: Total Quality = Measure of Fit - Measure of Magnitude of Coefficients
Measures of Magnitude of Coefficients
There are two common measures of magnitude used to penalize large
coefficients:
L2 Norm (squared): Sum of coefficients’ squares
L1 Norm: Sum of coefficients’ absolute values
Both are scaled by λ, referred to as the “tuning parameter”, which is chosen using a validation set.
Greater λ -> Smaller Coefficients
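As a small illustration of the two measures and of how λ enters the quality metric, the sketch below evaluates both penalties on the coefficients from the overconfidence example (taking “good” = 0.5). The function names and the placeholder fit value are assumptions.

```python
def l2_penalty(w):
    """Sum of the squared coefficients."""
    return sum(wj * wj for wj in w)

def l1_penalty(w):
    """Sum of the absolute values of the coefficients."""
    return sum(abs(wj) for wj in w)

def total_quality(measure_of_fit, w, lam, penalty):
    """Measure of fit minus lambda times the magnitude of the coefficients."""
    return measure_of_fit - lam * penalty(w)

small_w = [0.5, 0.5, -3.0]
large_w = [5.0, 5.0, -30.0]

print(l2_penalty(small_w), l1_penalty(small_w))
print(l2_penalty(large_w), l1_penalty(large_w))

# With the same (hypothetical) fit of -1.0, the larger coefficients score far worse.
print(total_quality(-1.0, small_w, 0.1, l2_penalty))
print(total_quality(-1.0, large_w, 0.1, l2_penalty))
```

Scaling the coefficients by ten multiplies the L1 measure by ten and the L2 measure by a hundred, so the L2 penalty punishes very large coefficients more heavily.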
L2 Regularization
The L2 penalty is attained by taking the sum of the squared coefficients:
$$ \|w\|_2^2 = w_1^2 + w_2^2 + \dots + w_D^2 $$
L2 regularization is then done by taking the derivative of l(w) with respect to each wj and
subtracting the derivative of the penalty λ·wj^2, which is 2λwj.
$$ \text{Quality Metric} = l(w) - \lambda \|w\|_2^2 $$
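A sketch of the regularized update: each gradient ascent step uses the derivative of the log likelihood minus 2λwj. The toy data, step size, iteration count, and function names are assumptions, and for simplicity the intercept is penalized along with the other coefficient.

```python
import math

# Hypothetical toy data: each row is [1 (intercept), x]; labels are +1 or -1.
FEATURES = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
LABELS = [-1, -1, +1, -1, +1, +1]

def sigmoid(score):
    """Numerically stable logistic function."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)

def fit_l2(lam, step_size=0.05, n_iter=5000):
    """Gradient ascent on log likelihood minus lam * (sum of squared coefficients)."""
    w = [0.0, 0.0]
    for _ in range(n_iter):
        grad = [0.0, 0.0]
        for h, y in zip(FEATURES, LABELS):
            p = sigmoid(sum(wj * hj for wj, hj in zip(w, h)))
            indicator = 1.0 if y == +1 else 0.0
            for j, hj in enumerate(h):
                grad[j] += hj * (indicator - p)
        # The derivative of lam * w_j^2 is 2 * lam * w_j; subtract it from each component.
        w = [wj + step_size * (gj - 2.0 * lam * wj) for wj, gj in zip(w, grad)]
    return w

def norm(w):
    return math.sqrt(sum(wj * wj for wj in w))

for lam in (0.0, 0.1, 1.0, 10.0):
    w = fit_l2(lam)
    print(f"lambda={lam}: w={[round(wj, 4) for wj in w]}, norm={norm(w):.4f}")
```

As λ grows, the fitted coefficients shrink toward zero without reaching it exactly.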
L1 Regularization
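The L1 norm is the sum of the absolute values of w (thus penalizing large positive and large
negative coefficients equally):
$$ \|w\|_1 = |w_1| + |w_2| + \dots + |w_D| $$
Thus in L1 regularization:
$$ \text{Quality Metric} = l(w) - \lambda \|w\|_1 $$
This leads to sparse solutions, where many coefficients wj = 0.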
Impact of Regularization
L2: Coefficients approach (but don’t reach) zero as λ increases
L1: Coefficients approach and reach zero as λ increases
