Least Squares Support Vector Machine

            Rajkumar Singh


          November 25, 2012
Table of Contents

   Support Vector Machine

   Least Squares Support Vector Machine Classifier

   Conclusion
Support Vector Machines

      The SVM is a classifier derived from statistical learning theory by
      Vapnik and Chervonenkis.
      SVMs were introduced by Boser, Guyon, and Vapnik at COLT-92.
      Initially popularized in the NIPS community, they are now an
      important and active field of machine learning research.
   What is an SVM?
      SVMs are learning systems that
          use a hypothesis space of linear functions
          in a high-dimensional feature space (kernel functions),
          are trained with a learning algorithm from optimization theory
          (Lagrangian duality),
          and implement a learning bias derived from statistical learning
          theory (generalization).
Support Vector Machines for Classification
   Given a training set of $N$ data points $\{y_k, x_k\}_{k=1}^{N}$, the support
   vector method aims at constructing a classifier of the form
   $$ y(x) = \mathrm{sign}\Big[\sum_{k=1}^{N} \alpha_k y_k \, \psi(x, x_k) + b\Big] \qquad (1) $$
   where
   $x_k \in \mathbb{R}^n$ is the $k$-th input pattern,
   $y_k \in \mathbb{R}$ is the $k$-th output,
   $\alpha_k$ are positive real constants, and $b$ is a real constant.
   $$ \psi(x, x_k) =
   \begin{cases}
   x_k^T x & \text{linear SVM} \\
   (x_k^T x + 1)^d & \text{polynomial SVM of degree } d \\
   \exp\{-\|x - x_k\|_2^2 / \sigma^2\} & \text{RBF SVM} \\
   \tanh(\kappa\, x_k^T x + \theta) & \text{two-layer neural SVM}
   \end{cases} $$
   where $\sigma$, $\theta$, $\kappa$ are constants.
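   As a concrete illustration, the four kernel choices above can be written
   directly as Python functions (a minimal NumPy sketch, not from the original
   slides; the parameter names `sigma`, `theta`, `kappa`, and `d` are chosen here
   for clarity):

   ```python
   import numpy as np

   def linear_kernel(x, xk):
       # psi(x, x_k) = x_k^T x
       return np.dot(xk, x)

   def polynomial_kernel(x, xk, d=3):
       # psi(x, x_k) = (x_k^T x + 1)^d
       return (np.dot(xk, x) + 1.0) ** d

   def rbf_kernel(x, xk, sigma=1.0):
       # psi(x, x_k) = exp(-||x - x_k||_2^2 / sigma^2)
       return np.exp(-np.sum((x - xk) ** 2) / sigma ** 2)

   def neural_kernel(x, xk, kappa=1.0, theta=0.0):
       # psi(x, x_k) = tanh(kappa * x_k^T x + theta); a valid Mercer kernel
       # only for certain (kappa, theta), as noted later in the slides
       return np.tanh(kappa * np.dot(xk, x) + theta)
   ```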
SVMs for Classification
   The classifier is constructed as follows. One assumes that
   $$ \begin{aligned}
   \omega^T \phi(x_k) + b &\ge +1, & \text{if } y_k = +1 \\
   \omega^T \phi(x_k) + b &\le -1, & \text{if } y_k = -1
   \end{aligned} \qquad (2) $$
   which is equivalent to
   $$ y_k[\omega^T \phi(x_k) + b] \ge 1, \quad k = 1, \ldots, N \qquad (3) $$
   where $\phi(\cdot)$ is a non-linear function which maps the input space into a
   higher-dimensional space.
   In order to allow violations of (3), in case a separating hyperplane in this
   high-dimensional space does not exist, slack variables $\xi_k$ are introduced
   such that
   $$ \begin{aligned}
   y_k[\omega^T \phi(x_k) + b] &\ge 1 - \xi_k, & k = 1, \ldots, N \\
   \xi_k &\ge 0, & k = 1, \ldots, N
   \end{aligned} \qquad (4) $$
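   For a fixed $(\omega, b)$, the smallest slacks satisfying (4) are
   $\xi_k = \max(0,\ 1 - y_k[\omega^T\phi(x_k) + b])$, i.e. the hinge loss. A small
   illustrative sketch, assuming an explicit finite-dimensional feature map is
   available (which is not the case for, e.g., the RBF kernel):

   ```python
   import numpy as np

   def slack_variables(w, b, phi_X, y):
       """xi_k = max(0, 1 - y_k (w^T phi(x_k) + b)) -- the smallest slacks
       satisfying (4) for a given (w, b).  phi_X is an (N, m) array of
       explicitly mapped points phi(x_k); y holds labels +1 / -1."""
       margins = np.asarray(y) * (np.asarray(phi_X) @ np.asarray(w) + b)
       return np.maximum(0.0, 1.0 - margins)
   ```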
SVMs for Classification
   According to the structural risk minimization principle, the risk bound is
   minimized by formulating the optimization problem
   $$ \min_{\omega, \xi_k} J_1(\omega, \xi_k) = \frac{1}{2}\omega^T\omega + c\sum_{k=1}^{N}\xi_k \qquad (5) $$
   subject to (4). Therefore, one constructs the Lagrangian
   $$ L_1(\omega, b, \xi_k; \alpha_k, v_k) = J_1(\omega, \xi_k)
      - \sum_{k=1}^{N}\alpha_k\{y_k[\omega^T\phi(x_k) + b] - 1 + \xi_k\}
      - \sum_{k=1}^{N} v_k \xi_k \qquad (6) $$
   by introducing Lagrange multipliers $\alpha_k \ge 0$, $v_k \ge 0$ ($k = 1, \ldots, N$).
   The solution is given by the saddle point of the Lagrangian, computed from
   $$ \max_{\alpha_k, v_k} \; \min_{\omega, b, \xi_k} \; L_1(\omega, b, \xi_k; \alpha_k, v_k). \qquad (7) $$
SVMs for Classification
   From (7) one obtains
   $$ \begin{aligned}
   \frac{\partial L_1}{\partial \omega} = 0 &\;\rightarrow\; \omega = \sum_{k=1}^{N}\alpha_k y_k \phi(x_k) \\
   \frac{\partial L_1}{\partial b} = 0 &\;\rightarrow\; \sum_{k=1}^{N}\alpha_k y_k = 0 \\
   \frac{\partial L_1}{\partial \xi_k} = 0 &\;\rightarrow\; 0 \le \alpha_k \le c, \quad k = 1, \ldots, N
   \end{aligned} \qquad (8) $$
   which leads to the solution of the following quadratic programming problem
   $$ \max_{\alpha_k} Q_1(\alpha_k; \phi(x_k)) = -\frac{1}{2}\sum_{k,l=1}^{N} y_k y_l\, \phi(x_k)^T\phi(x_l)\,\alpha_k\alpha_l + \sum_{k=1}^{N}\alpha_k \qquad (9) $$
   such that $\sum_{k=1}^{N}\alpha_k y_k = 0$, $0 \le \alpha_k \le c$, $k = 1, \ldots, N$.
   The function $\phi(x_k)$ in (9) is then related to $\psi(x, x_k)$ by imposing
   $$ \phi(x)^T\phi(x_k) = \psi(x, x_k). \qquad (10) $$
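   Relation (10) can be checked numerically for the degree-2 polynomial kernel on
   $\mathbb{R}^2$, where an explicit feature map is easy to write down (an
   illustrative sketch; the map `phi_poly2` below is one standard choice, not taken
   from the slides):

   ```python
   import numpy as np

   def phi_poly2(x):
       # Explicit feature map for psi(x, z) = (z^T x + 1)^2 on R^2
       x1, x2 = x
       return np.array([1.0,
                        np.sqrt(2) * x1, np.sqrt(2) * x2,
                        x1 ** 2, x2 ** 2,
                        np.sqrt(2) * x1 * x2])

   x = np.array([0.3, -1.2])
   z = np.array([2.0, 0.5])
   lhs = phi_poly2(x) @ phi_poly2(z)      # phi(x)^T phi(z)
   rhs = (x @ z + 1.0) ** 2               # psi(x, z)
   assert np.isclose(lhs, rhs)            # relation (10) holds
   ```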
Note that for the two-layer neural SVM, Mercer's condition only
holds for certain parameter values of $\kappa$ and $\theta$. The classifier (1) is
then designed by solving
$$ \max_{\alpha_k} Q_1(\alpha_k; \psi(x_k, x_l)) = -\frac{1}{2}\sum_{k,l=1}^{N} y_k y_l\, \psi(x_k, x_l)\,\alpha_k\alpha_l + \sum_{k=1}^{N}\alpha_k \qquad (11) $$
subject to the constraints in (9). One does not have to calculate
$\omega$ nor $\phi(x_k)$ in order to determine the decision surface. The
solution to (11) is global.
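Problem (11) is a standard convex QP, so in practice it can be handed to any
off-the-shelf QP solver. A minimal sketch using cvxopt (assumed to be installed;
any other QP solver would do, and `kernel` is one of the $\psi$ functions sketched
earlier):

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y, kernel, c):
    """Solve the dual (11) as a cvxopt QP:
    minimise 1/2 a^T (yy^T * K) a - 1^T a  s.t.  y^T a = 0,  0 <= a_k <= c."""
    N = len(y)
    y = np.asarray(y, dtype=float)
    K = np.array([[kernel(X[k], X[l]) for l in range(N)] for k in range(N)])
    P = matrix(np.outer(y, y) * K)                    # Q_kl = y_k y_l psi(x_k, x_l)
    q = matrix(-np.ones(N))
    G = matrix(np.vstack([-np.eye(N), np.eye(N)]))    # encodes 0 <= a_k <= c
    h = matrix(np.hstack([np.zeros(N), c * np.ones(N)]))
    A = matrix(y.reshape(1, N))                       # equality constraint y^T a = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol['x'])                         # the alpha_k
```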
Further, it can be shown that hyperplanes (3) satisfying the
constraint $\|\omega\|_2 \le a$ have a VC-dimension $h$ which is bounded by
$$ h \le \min([r^2 a^2], n) + 1 \qquad (12) $$
where $[\cdot]$ denotes the integer part and $r$ is the radius of the
smallest ball containing the points $\phi(x_1), \ldots, \phi(x_N)$. Such a ball is
found by defining the Lagrangian
$$ L_2(r, q, \lambda_k) = r^2 - \sum_{k=1}^{N}\lambda_k\big(r^2 - \|\phi(x_k) - q\|_2^2\big) \qquad (13) $$
SVMs for Classification


   In (13), $q$ is the center of the ball and $\lambda_k$ are positive Lagrange
   multipliers. Here $q = \sum_k \lambda_k \phi(x_k)$, and the dual follows as
   $$ \max_{\lambda_k} Q_2(\lambda_k; \phi(x_k)) = -\sum_{k,l=1}^{N}\phi(x_k)^T\phi(x_l)\,\lambda_k\lambda_l + \sum_{k=1}^{N}\lambda_k\,\phi(x_k)^T\phi(x_k) \qquad (14) $$
   such that $\sum_{k=1}^{N}\lambda_k = 1$, $\lambda_k \ge 0$, $k = 1, \ldots, N$. Based on (10), $Q_2$
   can also be expressed in terms of $\psi(x_k, x_l)$. Finally, one selects a
   support vector machine with a suitable VC-dimension by solving (11) and
   computing the bound (12) from the radius obtained in (14).
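   Problem (14) is itself a small QP over the simplex; at the optimum its
   objective value equals $r^2$, which is exactly what the bound (12) needs. A
   hedged sketch, again using cvxopt (assumed available):

   ```python
   import numpy as np
   from cvxopt import matrix, solvers

   def enclosing_ball_radius(X, kernel):
       """Solve (14) for the smallest ball containing phi(x_1), ..., phi(x_N);
       the optimal objective of (14) equals r^2, which plugs into (12)."""
       N = len(X)
       K = np.array([[kernel(X[k], X[l]) for l in range(N)] for k in range(N)])
       P = matrix(2.0 * K)                     # cvxopt minimises 1/2 l^T P l + q^T l,
       q = matrix(-np.diag(K))                 # so this encodes l^T K l - sum_k l_k K_kk
       G = matrix(-np.eye(N))                  # lambda_k >= 0
       h = matrix(np.zeros(N))
       A = matrix(np.ones((1, N)))             # sum_k lambda_k = 1
       b = matrix(1.0)
       lam = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
       r2 = lam @ np.diag(K) - lam @ K @ lam   # maximised value of Q_2 = r^2
       return np.sqrt(r2)
   ```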
Least Squares Support Vector Machines
   A least squares version of the SVM classifier is obtained by formulating the
   classification problem as
   $$ \min_{\omega, b, e} J_3(\omega, b, e) = \frac{1}{2}\omega^T\omega + \gamma\,\frac{1}{2}\sum_{k=1}^{N} e_k^2 \qquad (15) $$
   subject to the equality constraints
   $$ y_k[\omega^T\phi(x_k) + b] = 1 - e_k, \quad k = 1, \ldots, N. \qquad (16) $$
   The Lagrangian is defined as
   $$ L_3(\omega, b, e; \alpha) = J_3(\omega, b, e) - \sum_{k=1}^{N}\alpha_k\{y_k[\omega^T\phi(x_k) + b] - 1 + e_k\} \qquad (17) $$
   where $\alpha_k$ are Lagrange multipliers. The conditions for optimality are
   $$ \begin{aligned}
   \frac{\partial L_3}{\partial \omega} = 0 &\;\rightarrow\; \omega = \sum_{k=1}^{N}\alpha_k y_k \phi(x_k) \\
   \frac{\partial L_3}{\partial b} = 0 &\;\rightarrow\; \sum_{k=1}^{N}\alpha_k y_k = 0
   \end{aligned} \qquad (18) $$
Least Squares Support Vector Machines
   $$ \begin{aligned}
   \frac{\partial L_3}{\partial e_k} = 0 &\;\rightarrow\; \alpha_k = \gamma e_k, & k = 1, \ldots, N \\
   \frac{\partial L_3}{\partial \alpha_k} = 0 &\;\rightarrow\; y_k[\omega^T\phi(x_k) + b] - 1 + e_k = 0, & k = 1, \ldots, N
   \end{aligned} $$
   These conditions can be written as the solution to the following set of linear
   equations:
   $$ \begin{bmatrix} I & 0 & 0 & -Z^T \\ 0 & 0 & 0 & -Y^T \\ 0 & 0 & \gamma I & -I \\ Z & Y & I & 0 \end{bmatrix}
      \begin{bmatrix} \omega \\ b \\ e \\ \alpha \end{bmatrix}
    = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vec{1} \end{bmatrix} \qquad (19) $$
   where
   $Z = [\phi(x_1)^T y_1; \ldots; \phi(x_N)^T y_N]$,
   $Y = [y_1; \ldots; y_N]$,
   $\vec{1} = [1; \ldots; 1]$,
   $e = [e_1; \ldots; e_N]$,
   $\alpha = [\alpha_1; \ldots; \alpha_N]$.
Least Squares Support Vector Machines
   The solution is given by
   $$ \begin{bmatrix} 0 & -Y^T \\ Y & ZZ^T + \gamma^{-1} I \end{bmatrix}
      \begin{bmatrix} b \\ \alpha \end{bmatrix}
    = \begin{bmatrix} 0 \\ \vec{1} \end{bmatrix} \qquad (20) $$
   Mercer's condition can be applied again to the matrix $\Omega = ZZ^T$, where
   $$ \Omega_{kl} = y_k y_l\, \phi(x_k)^T\phi(x_l) = y_k y_l\, \psi(x_k, x_l). \qquad (21) $$
   Hence the classifier (1) is found by solving the linear system
   (20)-(21) instead of a quadratic program. The parameters of
   the kernel, such as $\sigma$ for the RBF kernel, can be optimally chosen
   according to (12). The support values $\alpha_k$ are proportional to the
   errors at the data points (the condition $\alpha_k = \gamma e_k$ in (18)), while
   for the standard SVM dual (11) most values are equal to zero. Hence one could
   rather speak of a support value spectrum in the least squares case.
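   Putting (20) and (21) together, training an LS-SVM amounts to a single linear
   solve for $(b, \alpha)$, after which the classifier (1) is evaluated with the
   same kernel. A minimal NumPy sketch under those equations (function and
   variable names are illustrative, not from the paper):

   ```python
   import numpy as np

   def lssvm_train(X, y, kernel, gamma):
       """Build and solve (20) with Omega from (21); one linear solve gives (b, alpha)."""
       y = np.asarray(y, dtype=float)
       N = len(y)
       Omega = np.array([[y[k] * y[l] * kernel(X[k], X[l])
                          for l in range(N)] for k in range(N)])      # (21)
       A = np.zeros((N + 1, N + 1))
       A[0, 1:] = -y                               # top row     [0, -Y^T]
       A[1:, 0] = y                                # left column [Y]
       A[1:, 1:] = Omega + np.eye(N) / gamma       # Omega + gamma^-1 I
       rhs = np.concatenate(([0.0], np.ones(N)))   # right-hand side [0; 1]
       sol = np.linalg.solve(A, rhs)
       return sol[0], sol[1:]                      # b, alpha

   def lssvm_predict(x, X, y, alpha, b, kernel):
       """Evaluate the classifier (1): sign(sum_k alpha_k y_k psi(x, x_k) + b)."""
       s = sum(alpha[k] * y[k] * kernel(x, X[k]) for k in range(len(alpha)))
       return np.sign(s + b)
   ```

   For example, an RBF kernel can be passed as
   `kernel = lambda u, v: np.exp(-np.sum((u - v)**2) / sigma**2)` for some chosen
   `sigma`.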
Conclusion



       Due to the equality constraints, a set of linear equations has
       to be solved instead of a quadratic programming problem.
       Mercer's condition is applied as in other SVMs.
       A least squares SVM with RBF kernel is readily found, with
       excellent generalization performance and low computational
       cost.
   References
    1. Least Squares Support Vector Machine Classifiers, J. A. K.
       Suykens and J. Vandewalle.
