Machine Learning                                                  Srihari

Basic Concepts in Machine Learning

Sargur N. Srihari
Introduction to ML: Topics

1. Polynomial Curve Fitting
2. Probability Theory of multiple variables
3. Maximum Likelihood
4. Bayesian Approach
5. Model Selection
6. Curse of Dimensionality
Polynomial Curve Fitting

Sargur N. Srihari
Simple Regression Problem

•  Observe a real-valued input variable x
•  Use x to predict the value of a target variable t
•  Synthetic data generated from sin(2πx)
•  Random noise in the target values

[Figure: training data plotted as target variable t versus input variable x]
Notation

•  N observations of x
   x = (x1,..,xN)ᵀ
   t = (t1,..,tN)ᵀ
•  Goal is to exploit the training set to predict the value of t from x
•  Inherently a difficult problem
•  Probability theory allows us to make a prediction

Data Generation:
N = 10
Inputs spaced uniformly in range [0,1]
Targets generated from sin(2πx) by adding small Gaussian noise
Such noise is typical, due to unobserved variables
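The data-generation recipe above can be sketched in NumPy; the noise standard deviation (0.3) and the seed are assumed values for illustration, not given on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility

# N = 10 inputs spaced uniformly in [0, 1]; targets are sin(2*pi*x)
# plus small Gaussian noise (scale 0.3 is an illustrative choice).
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)
```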
Polynomial Curve Fitting

•  Polynomial function

   y(x, w) = w0 + w1 x + w2 x² + … + wM x^M = Σ_{j=0}^{M} wj x^j

•  where M is the order of the polynomial
•  Is a higher value of M better? We'll see shortly!
•  Coefficients w0,…,wM are denoted by the vector w
•  Nonlinear function of x, linear function of the coefficients w
•  Such models are called Linear Models
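A minimal sketch of this model in NumPy (the helper name `poly` is ours):

```python
import numpy as np

def poly(x, w):
    """y(x, w) = sum_{j=0}^{M} w[j] * x**j.

    Nonlinear in x, but linear in the coefficients w.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    powers = np.arange(len(w))               # j = 0, ..., M
    return (w * x[..., None] ** powers).sum(axis=-1)
```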
Error Function

•  Sum of squares of the errors between the predictions y(xn, w) for each data point xn and the target value tn:

   E(w) = (1/2) Σ_{n=1}^{N} {y(xn, w) − tn}²

•  Factor ½ is included for later convenience
•  Solve by choosing the value of w for which E(w) is as small as possible

[Figure: red line is the best polynomial fit to the data points]
Minimization of Error Function

•  Error function is quadratic in the coefficients w
•  Since y(x, w) = Σ_{j=0}^{M} wj x^j, the derivative with respect to the coefficients is linear in the elements of w:

   ∂E(w)/∂wi = Σ_{n=1}^{N} {y(xn, w) − tn} xn^i
             = Σ_{n=1}^{N} {Σ_{j=0}^{M} wj xn^j − tn} xn^i

•  Thus the error function has a unique minimum, denoted w*
•  Setting the derivative equal to zero:

   Σ_{n=1}^{N} Σ_{j=0}^{M} wj xn^(i+j) = Σ_{n=1}^{N} tn xn^i

•  This set of M+1 equations (i = 0,…,M) is solved to get the elements of w*
•  The resulting polynomial is y(x, w*)
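The M+1 normal equations above can be assembled and solved as a literal transcription of the slide (in practice the equivalent Vandermonde least-squares solve in `np.polyfit` is better conditioned):

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Solve sum_n sum_j w_j x_n^(i+j) = sum_n t_n x_n^i for i = 0..M."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    i = np.arange(M + 1)
    # A[i, j] = sum_n x_n^(i+j);  b[i] = sum_n t_n * x_n^i
    A = (x[:, None, None] ** (i[:, None] + i[None, :])).sum(axis=0)
    b = (t[:, None] * x[:, None] ** i).sum(axis=0)
    return np.linalg.solve(A, b)
```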
Choosing the Order M

•  Model Comparison or Model Selection
•  Red lines are best fits with M = 0, 1, 3, 9 and N = 10
   –  M = 0 and M = 1: poor representations of sin(2πx)
   –  M = 3: best fit to sin(2πx)
   –  M = 9: over-fit; a poor representation of sin(2πx)
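The comparison above can be reproduced with a small loop; `np.polyfit` performs the least-squares fit for each candidate order (data regenerated as before, noise scale assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)

# One least-squares fit per candidate order; np.polyfit returns
# coefficients highest power first.
fits = {M: np.polyfit(x, t, deg=M) for M in (0, 1, 3, 9)}
```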
Generalization Performance

•  Consider a separate test set of 100 points
•  For each value of M evaluate

   E(w*) = (1/2) Σ_{n=1}^{N} {y(xn, w*) − tn}²,  where y(x, w*) = Σ_{j=0}^{M} w*j x^j

   for the training data and the test data
•  Use the RMS error

   E_RMS = √(2 E(w*) / N)

   –  Division by N allows different sizes of N to be compared on equal footing
   –  Square root ensures E_RMS is measured in the same units as t

[Figure: E_RMS versus M for training and test sets. Test error is poor for small M, due to inflexible polynomials. At M = 9 the training error is small: ten degrees of freedom, tuned exactly to the 10 training points, producing wild oscillations in the polynomial.]
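The RMS error described above, as a helper that can be evaluated on either the training or the test set:

```python
import numpy as np

def rms_error(w, x, t):
    """E_RMS = sqrt(2 * E(w) / N), in the same units as t.

    w holds coefficients w0..wM lowest power first.
    """
    pred = np.polyval(np.asarray(w)[::-1], np.asarray(x))
    e = 0.5 * np.sum((pred - np.asarray(t)) ** 2)
    return np.sqrt(2.0 * e / len(x))
```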
Values of Coefficients w* for Different Polynomial Orders M

[Table of coefficient values not reproduced in this extract]

As M increases, the magnitude of the coefficients increases.
At M = 9 the coefficients are finely tuned to the random noise in the target values.
Increasing the Size of the Data Set

[Figure: M = 9 fits for N = 15 and N = 100]

•  For a given model complexity, the over-fitting problem becomes less severe as the size of the data set increases
•  The larger the data set, the more complex a model we can afford to fit to the data
•  Heuristic: the data should be no less than 5 to 10 times the number of adaptive parameters in the model
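One way to see the effect numerically (seed, noise scale, and data sizes are assumed): fit the same M = 9 model at N = 15 and N = 100 and measure, on a dense grid, how far each fit strays from the true curve sin(2πx):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(N):
    x = np.linspace(0.0, 1.0, N)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

# Distance of each M = 9 fit from the underlying function.
grid = np.linspace(0.0, 1.0, 200)
truth = np.sin(2 * np.pi * grid)
dist = {}
for N in (15, 100):
    x, t = make_data(N)
    w = np.polyfit(x, t, deg=9)
    dist[N] = np.sqrt(np.mean((np.polyval(w, grid) - truth) ** 2))
```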
Least Squares is a Case of Maximum Likelihood

•  It is unsatisfying to limit the number of parameters to the size of the training set
•  More reasonable to choose model complexity according to problem complexity
•  The least squares approach is a specific case of maximum likelihood
   –  Over-fitting is a general property of maximum likelihood
•  A Bayesian approach avoids the over-fitting problem
   –  The number of parameters can greatly exceed the number of data points
   –  The effective number of parameters adapts automatically to the size of the data set
Regularization of Least Squares

•  Useful when fitting relatively complex models to data sets of limited size
•  Add a penalty term to the error function to discourage the coefficients from reaching large values:

   Ẽ(w) = (1/2) Σ_{n=1}^{N} {y(xn, w) − tn}² + (λ/2) ||w||²,  where ||w||² = wᵀw = w0² + w1² + … + wM²

•  λ determines the relative importance of the regularization term compared to the error term
•  Can be minimized exactly in closed form
•  Known as shrinkage in statistics, and as weight decay in neural networks
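The closed-form minimizer of the regularized error is the familiar ridge-regression solution; a sketch using the polynomial design matrix Φ with Φ[n, j] = xn^j:

```python
import numpy as np

def fit_regularized(x, t, M, lam):
    """Closed-form minimizer of the regularized sum-of-squares error:
    w* = (Phi^T Phi + lam * I)^{-1} Phi^T t.
    """
    Phi = np.vander(np.asarray(x, dtype=float), M + 1, increasing=True)
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ np.asarray(t, dtype=float))
```

With lam = 0 this reduces to ordinary least squares; increasing lam shrinks the coefficient vector toward zero.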
Effect of Regularizer

M = 9 polynomials fit using the regularized error function:

[Figure: fits with a large regularizer (λ = 1), with an optimal regularizer, and with no regularizer (λ = 0)]
Impact of Regularization on Error

•  λ controls the effective complexity of the model, and hence the degree of over-fitting
   –  Analogous to the choice of M
•  Suggested approach:
   –  Training set: determine the coefficients w, for different values of M or λ
   –  Validation set (holdout): optimize the model complexity (M or λ)

[Figure: E_RMS versus regularizer strength for the M = 9 polynomial]
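The holdout procedure can be sketched as follows; data sizes, noise scale, and the candidate range of M are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(N):
    x = rng.uniform(0.0, 1.0, N)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

x_tr, t_tr = make_data(30)    # training set: determines the coefficients
x_val, t_val = make_data(30)  # validation set: selects the complexity M

def val_rms(M):
    w = np.polyfit(x_tr, t_tr, deg=M)
    r = np.polyval(w, x_val) - t_val
    return np.sqrt(np.mean(r ** 2))

# Pick the order with the lowest validation RMS error.
best_M = min(range(10), key=val_rms)
```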
Concluding Remarks on Regression

•  The approach suggests partitioning the data into a training set, used to determine the coefficients w
•  A separate validation set (or hold-out set) is used to optimize the model complexity M or λ
•  More sophisticated approaches are not as wasteful of training data
•  A more principled approach is based on probability theory
•  Classification is a special case of regression in which the target takes discrete values

Contenu connexe

Tendances

3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
3. Linear Algebra for Machine Learning: Factorization and Linear TransformationsCeni Babaoglu, PhD
 
5. Linear Algebra for Machine Learning: Singular Value Decomposition and Prin...
5. Linear Algebra for Machine Learning: Singular Value Decomposition and Prin...5. Linear Algebra for Machine Learning: Singular Value Decomposition and Prin...
5. Linear Algebra for Machine Learning: Singular Value Decomposition and Prin...Ceni Babaoglu, PhD
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learningKien Le
 
Numerical methods and its applications
Numerical methods and its applicationsNumerical methods and its applications
Numerical methods and its applicationsHaiderParekh1
 
Regula falsi method
Regula falsi methodRegula falsi method
Regula falsi methodandrushow
 
Linear regression
Linear regressionLinear regression
Linear regressionMartinHogg9
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descentSuraj Parmar
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes ClassifierYiqun Hu
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERKnoldus Inc.
 
Simulated Annealing
Simulated AnnealingSimulated Annealing
Simulated AnnealingJoy Dutta
 
Intro/Overview on Machine Learning Presentation -2
Intro/Overview on Machine Learning Presentation -2Intro/Overview on Machine Learning Presentation -2
Intro/Overview on Machine Learning Presentation -2Ankit Gupta
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.Megha Sharma
 
B-spline
B-spline B-spline
B-spline nmahi96
 
Linear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaLinear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaEdureka!
 
Statistics Using Python | Statistics Python Tutorial | Python Certification T...
Statistics Using Python | Statistics Python Tutorial | Python Certification T...Statistics Using Python | Statistics Python Tutorial | Python Certification T...
Statistics Using Python | Statistics Python Tutorial | Python Certification T...Edureka!
 

Tendances (20)

3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
 
5. Linear Algebra for Machine Learning: Singular Value Decomposition and Prin...
5. Linear Algebra for Machine Learning: Singular Value Decomposition and Prin...5. Linear Algebra for Machine Learning: Singular Value Decomposition and Prin...
5. Linear Algebra for Machine Learning: Singular Value Decomposition and Prin...
 
2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Numerical methods and its applications
Numerical methods and its applicationsNumerical methods and its applications
Numerical methods and its applications
 
Regula falsi method
Regula falsi methodRegula falsi method
Regula falsi method
 
Bayesian learning
Bayesian learningBayesian learning
Bayesian learning
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Euler and runge kutta method
Euler and runge kutta methodEuler and runge kutta method
Euler and runge kutta method
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descent
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIER
 
Lecture26
Lecture26Lecture26
Lecture26
 
Simulated Annealing
Simulated AnnealingSimulated Annealing
Simulated Annealing
 
Intro/Overview on Machine Learning Presentation -2
Intro/Overview on Machine Learning Presentation -2Intro/Overview on Machine Learning Presentation -2
Intro/Overview on Machine Learning Presentation -2
 
Hmm viterbi
Hmm viterbiHmm viterbi
Hmm viterbi
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.
 
B-spline
B-spline B-spline
B-spline
 
Linear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaLinear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | Edureka
 
Statistics Using Python | Statistics Python Tutorial | Python Certification T...
Statistics Using Python | Statistics Python Tutorial | Python Certification T...Statistics Using Python | Statistics Python Tutorial | Python Certification T...
Statistics Using Python | Statistics Python Tutorial | Python Certification T...
 

En vedette

Curve fitting - Lecture Notes
Curve fitting - Lecture NotesCurve fitting - Lecture Notes
Curve fitting - Lecture NotesDr. Nirav Vyas
 
Applied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
Applied Numerical Methods Curve Fitting: Least Squares Regression, InterpolationApplied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
Applied Numerical Methods Curve Fitting: Least Squares Regression, InterpolationBrian Erandio
 
Least square method
Least square methodLeast square method
Least square methodSomya Bagai
 
Polynomials and Curve Fitting in MATLAB
Polynomials and Curve Fitting in MATLABPolynomials and Curve Fitting in MATLAB
Polynomials and Curve Fitting in MATLABShameer Ahmed Koya
 
The least square method
The least square methodThe least square method
The least square methodkevinlefol
 
Applied numerical methods lec8
Applied numerical methods lec8Applied numerical methods lec8
Applied numerical methods lec8Yasser Ahmed
 
Matlab polynimials and curve fitting
Matlab polynimials and curve fittingMatlab polynimials and curve fitting
Matlab polynimials and curve fittingAmeen San
 
Method of least square
Method of least squareMethod of least square
Method of least squareSomya Bagai
 
Curve fitting
Curve fittingCurve fitting
Curve fittingdusan4rs
 
case study of curve fitting
case study of curve fittingcase study of curve fitting
case study of curve fittingAdarsh Patel
 
Polynomial functions
Polynomial functionsPolynomial functions
Polynomial functionsnicole379865
 
Non linear curve fitting
Non linear curve fitting Non linear curve fitting
Non linear curve fitting Anumita Mondal
 
2.4 grapgs of second degree functions
2.4 grapgs of second degree functions2.4 grapgs of second degree functions
2.4 grapgs of second degree functionsmath260
 
Automating pKa Curve Fitting Using Origin
Automating pKa Curve Fitting Using OriginAutomating pKa Curve Fitting Using Origin
Automating pKa Curve Fitting Using OriginBrian Bissett
 
Curve Fitting results 1110
Curve Fitting results 1110Curve Fitting results 1110
Curve Fitting results 1110Ray Wang
 
Applocation of Numerical Methods
Applocation of Numerical MethodsApplocation of Numerical Methods
Applocation of Numerical MethodsNahian Ahmed
 
WF ED 540, CLASS MEETING 6, Linear Regression, 2016
WF ED 540, CLASS MEETING 6, Linear Regression, 2016WF ED 540, CLASS MEETING 6, Linear Regression, 2016
WF ED 540, CLASS MEETING 6, Linear Regression, 2016Penn State University
 

En vedette (20)

Curve fitting
Curve fitting Curve fitting
Curve fitting
 
Curve fitting - Lecture Notes
Curve fitting - Lecture NotesCurve fitting - Lecture Notes
Curve fitting - Lecture Notes
 
Applied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
Applied Numerical Methods Curve Fitting: Least Squares Regression, InterpolationApplied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
Applied Numerical Methods Curve Fitting: Least Squares Regression, Interpolation
 
Least square method
Least square methodLeast square method
Least square method
 
Polynomials and Curve Fitting in MATLAB
Polynomials and Curve Fitting in MATLABPolynomials and Curve Fitting in MATLAB
Polynomials and Curve Fitting in MATLAB
 
Numerical method (curve fitting)
Numerical method (curve fitting)Numerical method (curve fitting)
Numerical method (curve fitting)
 
The least square method
The least square methodThe least square method
The least square method
 
Applied numerical methods lec8
Applied numerical methods lec8Applied numerical methods lec8
Applied numerical methods lec8
 
Matlab polynimials and curve fitting
Matlab polynimials and curve fittingMatlab polynimials and curve fitting
Matlab polynimials and curve fitting
 
Method of least square
Method of least squareMethod of least square
Method of least square
 
Curve fitting
Curve fittingCurve fitting
Curve fitting
 
case study of curve fitting
case study of curve fittingcase study of curve fitting
case study of curve fitting
 
Polynomial functions
Polynomial functionsPolynomial functions
Polynomial functions
 
Non linear curve fitting
Non linear curve fitting Non linear curve fitting
Non linear curve fitting
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 
2.4 grapgs of second degree functions
2.4 grapgs of second degree functions2.4 grapgs of second degree functions
2.4 grapgs of second degree functions
 
Automating pKa Curve Fitting Using Origin
Automating pKa Curve Fitting Using OriginAutomating pKa Curve Fitting Using Origin
Automating pKa Curve Fitting Using Origin
 
Curve Fitting results 1110
Curve Fitting results 1110Curve Fitting results 1110
Curve Fitting results 1110
 
Applocation of Numerical Methods
Applocation of Numerical MethodsApplocation of Numerical Methods
Applocation of Numerical Methods
 
WF ED 540, CLASS MEETING 6, Linear Regression, 2016
WF ED 540, CLASS MEETING 6, Linear Regression, 2016WF ED 540, CLASS MEETING 6, Linear Regression, 2016
WF ED 540, CLASS MEETING 6, Linear Regression, 2016
 

Similaire à Curve fitting

Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputszukun
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture modelsVu Pham
 
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...npinto
 
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Beniamino Murgante
 
05 history of cv a machine learning (theory) perspective on computer vision
05  history of cv a machine learning (theory) perspective on computer vision05  history of cv a machine learning (theory) perspective on computer vision
05 history of cv a machine learning (theory) perspective on computer visionzukun
 
Bouguet's MatLab Camera Calibration Toolbox
Bouguet's MatLab Camera Calibration ToolboxBouguet's MatLab Camera Calibration Toolbox
Bouguet's MatLab Camera Calibration ToolboxYuji Oyamada
 
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...MLconf
 
04 structured support vector machine
04 structured support vector machine04 structured support vector machine
04 structured support vector machinezukun
 
Mesh Processing Course : Geodesic Sampling
Mesh Processing Course : Geodesic SamplingMesh Processing Course : Geodesic Sampling
Mesh Processing Course : Geodesic SamplingGabriel Peyré
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
/.Amd mnt/lotus/host/home/jaishakthi/presentation/tencon/tencon
/.Amd mnt/lotus/host/home/jaishakthi/presentation/tencon/tencon/.Amd mnt/lotus/host/home/jaishakthi/presentation/tencon/tencon
/.Amd mnt/lotus/host/home/jaishakthi/presentation/tencon/tenconDr. Jai Sakthi
 
Multiple Kernel Learning based Approach to Representation and Feature Selecti...
Multiple Kernel Learning based Approach to Representation and Feature Selecti...Multiple Kernel Learning based Approach to Representation and Feature Selecti...
Multiple Kernel Learning based Approach to Representation and Feature Selecti...ICAC09
 
icml2004 tutorial on bayesian methods for machine learning
icml2004 tutorial on bayesian methods for machine learningicml2004 tutorial on bayesian methods for machine learning
icml2004 tutorial on bayesian methods for machine learningzukun
 
Bayesian Methods for Machine Learning
Bayesian Methods for Machine LearningBayesian Methods for Machine Learning
Bayesian Methods for Machine Learningbutest
 
cs621-lect18-feedforward-network-contd-2009-9-24.ppt
cs621-lect18-feedforward-network-contd-2009-9-24.pptcs621-lect18-feedforward-network-contd-2009-9-24.ppt
cs621-lect18-feedforward-network-contd-2009-9-24.pptGayathriRHICETCSESTA
 
cs621-lect18-feedforward-network-contd-2009-9-24.ppt
cs621-lect18-feedforward-network-contd-2009-9-24.pptcs621-lect18-feedforward-network-contd-2009-9-24.ppt
cs621-lect18-feedforward-network-contd-2009-9-24.pptGayathriRHICETCSESTA
 

Similaire à Curve fitting (20)

Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputs
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture models
 
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
 
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
 
05 history of cv a machine learning (theory) perspective on computer vision
05  history of cv a machine learning (theory) perspective on computer vision05  history of cv a machine learning (theory) perspective on computer vision
05 history of cv a machine learning (theory) perspective on computer vision
 
Dsp2
Dsp2Dsp2
Dsp2
 
Higham
HighamHigham
Higham
 
Bouguet's MatLab Camera Calibration Toolbox
Bouguet's MatLab Camera Calibration ToolboxBouguet's MatLab Camera Calibration Toolbox
Bouguet's MatLab Camera Calibration Toolbox
 
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
 
04 structured support vector machine
04 structured support vector machine04 structured support vector machine
04 structured support vector machine
 
Notes 6-2
Notes 6-2Notes 6-2
Notes 6-2
 
Mesh Processing Course : Geodesic Sampling
Mesh Processing Course : Geodesic SamplingMesh Processing Course : Geodesic Sampling
Mesh Processing Course : Geodesic Sampling
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
/.Amd mnt/lotus/host/home/jaishakthi/presentation/tencon/tencon
/.Amd mnt/lotus/host/home/jaishakthi/presentation/tencon/tencon/.Amd mnt/lotus/host/home/jaishakthi/presentation/tencon/tencon
/.Amd mnt/lotus/host/home/jaishakthi/presentation/tencon/tencon
 
Multiple Kernel Learning based Approach to Representation and Feature Selecti...
Multiple Kernel Learning based Approach to Representation and Feature Selecti...Multiple Kernel Learning based Approach to Representation and Feature Selecti...
Multiple Kernel Learning based Approach to Representation and Feature Selecti...
 
icml2004 tutorial on bayesian methods for machine learning
icml2004 tutorial on bayesian methods for machine learningicml2004 tutorial on bayesian methods for machine learning
icml2004 tutorial on bayesian methods for machine learning
 
Bayesian Methods for Machine Learning
Bayesian Methods for Machine LearningBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning
 
feedforward-network-
feedforward-network-feedforward-network-
feedforward-network-
 
cs621-lect18-feedforward-network-contd-2009-9-24.ppt
cs621-lect18-feedforward-network-contd-2009-9-24.pptcs621-lect18-feedforward-network-contd-2009-9-24.ppt
cs621-lect18-feedforward-network-contd-2009-9-24.ppt
 
cs621-lect18-feedforward-network-contd-2009-9-24.ppt
cs621-lect18-feedforward-network-contd-2009-9-24.pptcs621-lect18-feedforward-network-contd-2009-9-24.ppt
cs621-lect18-feedforward-network-contd-2009-9-24.ppt
 

Plus de JULIO GONZALEZ SANZ

Cmmi hm 2008 sepg model changes for high maturity 1v01[1]
Cmmi hm 2008 sepg model changes for high maturity  1v01[1]Cmmi hm 2008 sepg model changes for high maturity  1v01[1]
Cmmi hm 2008 sepg model changes for high maturity 1v01[1]JULIO GONZALEZ SANZ
 
Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]
Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]
Cmmi%20 model%20changes%20for%20high%20maturity%20v01[1]JULIO GONZALEZ SANZ
 
Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]JULIO GONZALEZ SANZ
 
Workshop healthy ingredients ppm[1]
Workshop healthy ingredients ppm[1]Workshop healthy ingredients ppm[1]
Workshop healthy ingredients ppm[1]JULIO GONZALEZ SANZ
 
The need for a balanced measurement system
The need for a balanced measurement systemThe need for a balanced measurement system
The need for a balanced measurement systemJULIO GONZALEZ SANZ
 
Just in-time and lean production
Just in-time and lean productionJust in-time and lean production
Just in-time and lean productionJULIO GONZALEZ SANZ
 
History of manufacturing systems and lean thinking enfr
Curve fitting

  • 1. Machine Learning (Srihari): Basic Concepts in Machine Learning, by Sargur N. Srihari
  • 2. Introduction to ML: Topics
    1. Polynomial Curve Fitting
    2. Probability Theory of multiple variables
    3. Maximum Likelihood
    4. Bayesian Approach
    5. Model Selection
    6. Curse of Dimensionality
  • 3. Polynomial Curve Fitting (section title slide), Sargur N. Srihari
  • 4. Simple Regression Problem
    – Observe a real-valued input variable x
    – Use x to predict the value of a target variable t
    – Synthetic data generated from sin(2πx)
    – Random noise added to the target values
    (Figure: target variable t plotted against input variable x.)
  • 5. Notation
    – N observations of x: x = (x1,..,xN)^T, t = (t1,..,tN)^T
    – Goal is to exploit the training set to predict the value of t for a new value of x
    – Inherently a difficult problem
    – Probability theory allows us to make a prediction
    – Data generation: N = 10 points spaced uniformly in the range [0,1], generated from sin(2πx) by adding small Gaussian noise (such noise is typical, arising from unobserved variables)
  • 6. Polynomial Curve Fitting
    – Polynomial function y(x,w) = Σ_{j=0..M} w_j x^j
    – M is the order of the polynomial; is a higher value of M better? We'll see shortly
    – The coefficients w0,…,wM are denoted by the vector w
    – y is a nonlinear function of x but a linear function of the coefficients w; such models are called linear models
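The polynomial model on this slide can be sketched in a few lines of NumPy (a minimal illustration; the coefficient values below are arbitrary, not fitted to anything):

```python
import numpy as np

def poly(x, w):
    """Evaluate y(x, w) = sum_j w_j * x**j for a coefficient vector w."""
    # np.polyval expects the highest-order coefficient first, so reverse w.
    return np.polyval(w[::-1], x)

w = np.array([0.0, 1.0, -0.5])   # arbitrary w0, w1, w2 (order M = 2)
print(poly(2.0, w))              # 0 + 1*2 - 0.5*4 = 0.0
```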
  • 7. Error Function
    – Sum of squares of the errors between the predictions y(xn,w) for each data point xn and the target values tn: E(w) = (1/2) Σ_{n=1..N} {y(xn,w) − tn}²
    – The factor 1/2 is included for later convenience
    – Solve by choosing the value of w for which E(w) is as small as possible
    (Figure: the red line is the best polynomial fit.)
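The sum-of-squares error can be written directly in code (a minimal sketch; the function and variable names are mine, not from the slides):

```python
import numpy as np

def E(w, x, t):
    """Sum-of-squares error: 0.5 * sum_n (y(x_n, w) - t_n)**2."""
    # Design matrix with columns x**0, x**1, ..., so y = A @ w.
    y = np.vander(x, len(w), increasing=True) @ w
    return 0.5 * np.sum((y - t) ** 2)

x = np.array([0.0, 0.5, 1.0])
t = np.array([0.0, 1.0, 0.0])
print(E(np.zeros(2), x, t))   # with w = 0, E = 0.5 * sum t^2 = 0.5
```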
  • 8. Minimization of Error Function
    – The error function is quadratic in the coefficients w, so its derivatives with respect to the coefficients are linear in the elements of w; the error function therefore has a unique minimum, denoted w*
    – Since y(x,w) = Σ_{j=0..M} w_j x^j:
        ∂E(w)/∂w_i = Σ_{n=1..N} {y(xn,w) − tn} xn^i = Σ_{n=1..N} {Σ_{j=0..M} w_j xn^j − tn} xn^i
    – Setting this equal to zero gives Σ_{n=1..N} Σ_{j=0..M} w_j xn^{i+j} = Σ_{n=1..N} tn xn^i
    – This set of M+1 linear equations (i = 0,…,M) is solved to get the elements of w*; the resulting polynomial is y(x,w*)
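The M+1 linear equations above can be solved directly. A minimal sketch, using synthetic data like that described on the slides (the variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 10, 3
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

# Design matrix: A[n, j] = x_n ** j, so y = A @ w.
A = np.vander(x, M + 1, increasing=True)

# Normal equations (A^T A) w* = A^T t, i.e.
# sum_n sum_j w_j x_n^{i+j} = sum_n t_n x_n^i for i = 0..M.
w_star = np.linalg.solve(A.T @ A, A.T @ t)
print(w_star)   # unique minimizer of E(w)
```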
  • 9. Choosing the Order M
    – Model comparison, or model selection
    – Red lines are the best fits with M = 0, 1, 3, 9 and N = 10
    (Figure: M = 0 and M = 1 give poor representations of sin(2πx); M = 3 gives the best fit; M = 9 over-fits, again a poor representation of sin(2πx).)
  • 10. Generalization Performance
    – Consider a separate test set of 100 points
    – For each value of M, evaluate E(w*) = (1/2) Σ_{n=1..N} {y(xn,w*) − tn}², where y(x,w*) = Σ_{j=0..M} w*_j x^j, for the training data and the test data
    – Use the RMS error E_RMS = √(2E(w*)/N): division by N allows different sizes of N to be compared on an equal footing, and the square root ensures E_RMS is measured in the same units as t
    – Small M performs poorly due to inflexible polynomials; M = 9 means ten degrees of freedom, tuned exactly to the 10 training points (wild oscillations in the polynomial)
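The training-vs-test RMS comparison can be sketched as follows (continuing the same synthetic setup; the helper names and noise level are my own choices):

```python
import numpy as np

def fit_poly(x, t, M):
    """Least-squares polynomial fit of order M via the normal equations."""
    A = np.vander(x, M + 1, increasing=True)
    return np.linalg.solve(A.T @ A, A.T @ t)

def rms_error(x, t, w):
    """E_RMS = sqrt(2 E(w) / N), i.e. the root mean squared error."""
    y = np.vander(x, len(w), increasing=True) @ w
    return np.sqrt(np.mean((y - t) ** 2))

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(10)
x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test) + 0.1 * rng.standard_normal(100)

for M in (0, 1, 3, 9):
    w = fit_poly(x_train, t_train, M)
    print(M, rms_error(x_train, t_train, w), rms_error(x_test, t_test, w))
```

Training error can only decrease as M grows, while test error eventually rises: that gap is the over-fitting the slides describe.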
  • 11. Values of Coefficients w* for Polynomials of Different Order M
    – As M increases, the magnitude of the coefficients increases
    – At M = 9 the coefficients are finely tuned to the random noise in the target values
  • 12. Increasing the Size of the Data Set
    – N = 15, 100
    – For a given model complexity, the over-fitting problem becomes less severe as the size of the data set increases
    – The larger the data set, the more complex the model we can afford to fit to the data
    – Heuristic: the data should be no less than 5 to 10 times the number of adaptive parameters in the model
  • 13. Least Squares is a Case of Maximum Likelihood
    – It is unsatisfying to limit the number of parameters to the size of the training set
    – It is more reasonable to choose model complexity according to problem complexity
    – The least squares approach is a specific case of maximum likelihood; over-fitting is a general property of maximum likelihood
    – A Bayesian approach avoids the over-fitting problem: the number of parameters can greatly exceed the number of data points, and the effective number of parameters adapts automatically to the size of the data set
  • 14. Regularization of Least Squares
    – Allows relatively complex models to be used with data sets of limited size
    – Add a penalty term to the error function to discourage the coefficients from reaching large values
    – λ determines the relative importance of the regularization term compared with the error term
    – Can be minimized exactly in closed form
    – Known as shrinkage in statistics and as weight decay in neural networks
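The penalized error function still has a closed-form minimizer; a sketch, assuming the standard quadratic penalty (λ/2)‖w‖² (this form of ridge regression is my assumption about the penalty the slide's formula uses; names are mine):

```python
import numpy as np

def fit_ridge(x, t, M, lam):
    """Closed-form minimizer of 0.5*sum (y(x_n,w)-t_n)^2 + 0.5*lam*||w||^2."""
    A = np.vander(x, M + 1, increasing=True)
    # Setting the gradient to zero gives (A^T A + lam*I) w = A^T t.
    return np.linalg.solve(A.T @ A + lam * np.eye(M + 1), A.T @ t)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

w_small = fit_ridge(x, t, 9, 1e-6)   # weaker regularizer
w_large = fit_ridge(x, t, 9, 1.0)    # stronger regularizer shrinks w
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
```

The norm of the ridge solution is non-increasing in λ, which is why the method is called shrinkage.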
  • 15. Effect of the Regularizer
    – M = 9 polynomials fit using the regularized error function
    (Figure panels compare no regularizer (λ = 0), an optimal regularizer, and a large regularizer.)
  • 16. Impact of Regularization on Error
    – λ controls the complexity of the model and hence the degree of over-fitting; analogous to the choice of M
    – Suggested approach: use a training set to determine the coefficients w for different values of M or λ, and a validation set (hold-out set) to optimize the model complexity (M or λ)
    (Figure: errors for the M = 9 polynomial.)
  • 17. Concluding Remarks on Regression
    – The approach suggests partitioning the data into a training set, to determine the coefficients w, and a separate validation set (or hold-out set), to optimize the model complexity M or λ
    – More sophisticated approaches are not as wasteful of training data
    – A more principled approach is based on probability theory
    – Classification is a special case of regression where the target variable takes discrete values
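The train/validation partitioning recommended on these slides can be sketched as follows (the split sizes, candidate λ grid, and helper names are arbitrary choices of mine, not from the slides):

```python
import numpy as np

def fit_ridge(x, t, M, lam):
    """Regularized least-squares fit: (A^T A + lam*I) w = A^T t."""
    A = np.vander(x, M + 1, increasing=True)
    return np.linalg.solve(A.T @ A + lam * np.eye(M + 1), A.T @ t)

def rms(x, t, w):
    """Root-mean-square error of the polynomial w on data (x, t)."""
    y = np.vander(x, len(w), increasing=True) @ w
    return np.sqrt(np.mean((y - t) ** 2))

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)

# Partition: the training set fits w, the hold-out validation set picks lambda.
x_tr, t_tr = x[:20], t[:20]
x_val, t_val = x[20:], t[20:]

candidates = [np.exp(k) for k in range(-20, 1, 2)]
best = min(candidates,
           key=lambda lam: rms(x_val, t_val, fit_ridge(x_tr, t_tr, 9, lam)))
print("chosen lambda:", best)
```

As the slides note, this is wasteful of training data; cross-validation reuses the data more efficiently at a higher computational cost.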