Machine Learning                                                  Srihari
Basic Concepts in Machine Learning
Sargur N. Srihari
Introduction to ML: Topics
1. Polynomial Curve Fitting
2. Probability Theory of Multiple Variables
3. Maximum Likelihood
4. Bayesian Approach
5. Model Selection
6. Curse of Dimensionality
Polynomial Curve Fitting
Simple Regression Problem
•  Observe a real-valued input variable x
•  Use x to predict the value of a target variable t
•  Synthetic data generated from sin(2πx)
•  Random noise added to the target values
   [Figure: training points plotted as target variable t against input variable x]
Notation
•  N observations of x:
   x = (x1,..,xN)T, t = (t1,..,tN)T
•  Goal is to exploit the training set to predict the value of t from x
•  Inherently a difficult problem
•  Probability theory allows us to make a prediction

Data generation: N = 10, inputs spaced uniformly in the range [0,1],
targets generated from sin(2πx) by adding small Gaussian noise.
Such noise is typical of real data, arising from unobserved variables.
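The data-generation recipe on this slide can be sketched in Python; the noise standard deviation (0.1) and the fixed seed are assumed values, since the slide says only "small Gaussian noise":

```python
import math
import random

def make_data(n=10, noise_sd=0.1, seed=0):
    """Generate n inputs spaced uniformly in [0, 1] and targets
    sin(2*pi*x) plus small Gaussian noise (noise_sd is an assumption)."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    ts = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ts

xs, ts = make_data()
```

With the seed fixed, the same 10 noisy points are reproduced on every run, which is convenient for comparing fits of different order M.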
Polynomial Curve Fitting
•  Polynomial function:
   y(x,w) = w0 + w1 x + w2 x^2 + … + wM x^M = ∑_{j=0}^{M} wj x^j
•  where M is the order of the polynomial
•  Is a higher value of M better? We'll see shortly!
•  Coefficients w0,…,wM are denoted by the vector w
•  y is a nonlinear function of x but a linear function of the coefficients w
•  Such models are called linear models
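Evaluating the polynomial is a one-liner if the coefficient vector w is stored as a plain list; `poly` is a hypothetical helper name, not from the slides:

```python
def poly(x, w):
    """Evaluate y(x, w) = w[0] + w[1]*x + ... + w[M]*x**M,
    where the order M is len(w) - 1."""
    return sum(wj * x ** j for j, wj in enumerate(w))
```

For example, poly(2.0, [1.0, 2.0, 3.0]) computes 1 + 2·2 + 3·4 = 17.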
Error Function
•  Sum of squares of the errors between the predictions y(xn,w) for each
   data point xn and the target values tn:
   E(w) = (1/2) ∑_{n=1}^{N} {y(xn,w) − tn}^2
•  Factor ½ included for later convenience
•  Solve by choosing the value of w for which E(w) is as small as possible
   [Figure: the red line is the best polynomial fit]
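The error function is a direct transcription of the formula; `poly` and `sum_squared_error` are hypothetical helper names:

```python
def poly(x, w):
    """y(x, w) = sum_j w[j] * x**j."""
    return sum(wj * x ** j for j, wj in enumerate(w))

def sum_squared_error(w, xs, ts):
    """E(w) = (1/2) * sum_n (y(x_n, w) - t_n)**2;
    the factor 1/2 matches the slide's convention."""
    return 0.5 * sum((poly(x, w) - t) ** 2 for x, t in zip(xs, ts))
```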
Minimization of Error Function
•  Error function is quadratic in the coefficients w
•  Its derivative with respect to the coefficients is therefore linear in
   the elements of w. Since y(x,w) = ∑_{j=0}^{M} wj x^j,
   ∂E(w)/∂wi = ∑_{n=1}^{N} {y(xn,w) − tn} xn^i
             = ∑_{n=1}^{N} {∑_{j=0}^{M} wj xn^j − tn} xn^i
•  Setting this equal to zero gives
   ∑_{n=1}^{N} ∑_{j=0}^{M} wj xn^{i+j} = ∑_{n=1}^{N} tn xn^i
   a set of M+1 linear equations (i = 0,…,M) that are solved to get the
   elements of w*
•  Thus the error function has a unique minimum, denoted w*
•  The resulting polynomial is y(x,w*)
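The normal equations above can be formed and solved directly. The sketch below assumes plain Python with Gaussian elimination in place of a library solver, and does not guard against degenerate systems:

```python
def fit_poly(xs, ts, M):
    """Solve the M+1 normal equations
       sum_j w_j sum_n x_n**(i+j) = sum_n t_n * x_n**i   (i = 0,...,M)
    by Gaussian elimination with partial pivoting (minimal sketch)."""
    m = M + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(t * x ** i for x, t in zip(xs, ts)) for i in range(m)]
    for col in range(m):                        # forward elimination
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * m                               # back substitution
    for i in range(m - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, m))) / A[i][i]
    return w
```

For data generated exactly by t = 1 + 2x, the fit recovers w* ≈ (1, 2), since the minimum of the quadratic error is unique.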
Choosing the Order M
•  A problem of model comparison, or model selection
•  Red lines are the best fits with M = 0, 1, 3, 9 and N = 10
   [Figure: M = 0 and M = 1 give poor representations of sin(2πx);
    M = 3 gives the best fit to sin(2πx); M = 9 over-fits, again a poor
    representation of sin(2πx)]
Generalization Performance
•  Consider a separate test set of 100 points
•  For each value of M evaluate
   E(w*) = (1/2) ∑_{n=1}^{N} {y(xn,w*) − tn}^2,  where y(x,w*) = ∑_{j=0}^{M} wj* x^j
   for the training data and the test data
•  Use the RMS error
   E_RMS = √(2 E(w*)/N)
   –  Division by N allows different sizes of N to be compared on an
      equal footing
   –  Square root ensures E_RMS is measured in the same units as t
   [Figure: test error is poor at small M due to inflexible polynomials,
    and again at M = 9 even though training error is small; M = 9 means
    ten degrees of freedom, tuned exactly to the 10 training points,
    producing wild oscillations in the polynomial]
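The RMS error is again a direct transcription of the formula; `rms_error` is a hypothetical helper name:

```python
import math

def rms_error(w, xs, ts):
    """E_RMS = sqrt(2 * E(w) / N). Dividing by N puts data sets of
    different sizes on an equal footing; the square root puts the
    result back in the units of t."""
    e = 0.5 * sum((sum(wj * x ** j for j, wj in enumerate(w)) - t) ** 2
                  for x, t in zip(xs, ts))
    return math.sqrt(2.0 * e / len(xs))
```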
Values of Coefficients w* for Different Polynomials of Order M
•  As M increases, the magnitude of the coefficients increases
•  At M = 9 the coefficients are finely tuned to the random noise in the
   target values
Increasing Size of Data Set
•  N = 15 and N = 100
•  For a given model complexity, the over-fitting problem becomes less
   severe as the size of the data set increases
•  The larger the data set, the more complex the model we can afford to fit
•  Heuristic: the data should be no less than 5 to 10 times the number of
   adaptive parameters in the model
Least Squares is a Case of Maximum Likelihood
•  It is unsatisfying to limit the number of parameters to the size of the
   training set
•  More reasonable to choose model complexity according to problem complexity
•  The least squares approach is a specific case of maximum likelihood
   –  Over-fitting is a general property of maximum likelihood
•  A Bayesian approach avoids the over-fitting problem
   –  The number of parameters can greatly exceed the number of data points
   –  The effective number of parameters adapts automatically to the size
      of the data set
Regularization of Least Squares
•  Used with relatively complex models on data sets of limited size
•  Add a penalty term to the error function to discourage the coefficients
   from reaching large values:
   Ẽ(w) = (1/2) ∑_{n=1}^{N} {y(xn,w) − tn}^2 + (λ/2) ||w||^2
•  λ determines the relative importance of the regularization term and the
   error term
•  Can be minimized exactly in closed form
•  Known as shrinkage in statistics, and as weight decay in neural networks
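A sketch of the penalized error, assuming the quadratic penalty (λ/2)||w||^2, which is the usual form of this regularizer; the function name is hypothetical:

```python
def regularized_error(w, xs, ts, lam):
    """E~(w) = (1/2) sum_n (y(x_n, w) - t_n)**2 + (lam/2) * ||w||**2.
    lam trades the data-fit term off against the penalty term."""
    y = lambda x: sum(wj * x ** j for j, wj in enumerate(w))
    data_term = 0.5 * sum((y(x) - t) ** 2 for x, t in zip(xs, ts))
    penalty = 0.5 * lam * sum(wj * wj for wj in w)
    return data_term + penalty
```

Setting lam = 0 recovers the unregularized sum-of-squares error.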
Effect of Regularizer
M = 9 polynomials fit using the regularized error function
[Figure: with no regularizer (λ = 0) the fit over-fits; with a large
 regularizer (λ = 1) the fit is over-smoothed; an intermediate λ is optimal]
Impact of Regularization on Error
•  λ controls the complexity of the model and hence the degree of
   over-fitting
   –  Analogous to the choice of M
•  Suggested approach:
   –  Training set: determine the coefficients w, for different values
      of M or λ
   –  Validation set (holdout): optimize the model complexity (M or λ)
   [Figure: training and test RMS error for the M = 9 polynomial]
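The suggested approach can be sketched end to end: fit on the training set for each candidate value of λ, then keep the value with the lowest validation RMS error. The closed-form ridge-style fit (penalized normal equations with λ added to the diagonal), the candidate grid, and all helper names are assumptions, not from the slides:

```python
import math

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def ridge_fit(xs, ts, M, lam):
    """Minimize the regularized error: the normal equations become
    sum_j w_j sum_n x_n**(i+j) + lam*w_i = sum_n t_n * x_n**i."""
    A = [[sum(x ** (i + j) for x in xs) + (lam if i == j else 0.0)
          for j in range(M + 1)] for i in range(M + 1)]
    b = [sum(t * x ** i for x, t in zip(xs, ts)) for i in range(M + 1)]
    return solve(A, b)

def rms(w, xs, ts):
    e = 0.5 * sum((sum(wj * x ** j for j, wj in enumerate(w)) - t) ** 2
                  for x, t in zip(xs, ts))
    return math.sqrt(2.0 * e / len(xs))

def select_lambda(train, valid, M=9, lams=(0.0, 1e-6, 1e-3, 1.0)):
    """Keep the candidate lam with the lowest validation RMS
    (the candidate grid is an assumed choice)."""
    return min(lams, key=lambda lam:
               rms(ridge_fit(train[0], train[1], M, lam), valid[0], valid[1]))
```

The same loop selects M instead of λ by fixing lam and varying the order.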
Concluding Remarks on Regression
•  The approach suggests partitioning the data into a training set used to
   determine the coefficients w
•  A separate validation set (or hold-out set) is used to optimize the
   model complexity (M or λ)
•  More sophisticated approaches are not as wasteful of training data
•  A more principled approach is based on probability theory
•  Classification is a special case of regression where the target
   variable takes discrete values
