2. Table of Contents
Support Vector Machines
Least Squares Support Vector Machine Classifier
Conclusion
3. Support Vector Machines
SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis.
SVMs were introduced by Boser, Guyon and Vapnik at COLT-92.
Initially popularized in the NIPS community, SVMs are now an important and active field of machine learning research.
What is an SVM?
SVMs are learning systems that
use a hypothesis space of linear functions
in a high-dimensional feature space (kernel functions),
trained with a learning algorithm from optimization theory (Lagrange multipliers),
implementing a learning bias derived from statistical learning theory (generalization).
4. Support Vector Machines for Classification
Given a training set of $N$ data points $\{y_k, x_k\}_{k=1}^{N}$, the support vector method approach aims at constructing a classifier of the form
$$y(x) = \mathrm{sign}\Big[\sum_{k=1}^{N} \alpha_k y_k \,\psi(x, x_k) + b\Big] \qquad (1)$$
where
$x_k \in \mathbb{R}^n$ is the $k$-th input pattern,
$y_k \in \mathbb{R}$ is the $k$-th output,
$\alpha_k$ are positive real constants and $b$ is a real constant.
Typical kernel choices are
$$\psi(x, x_k) = \begin{cases} x_k^T x & \text{(linear SVM)} \\ (x_k^T x + 1)^d & \text{(polynomial SVM of degree } d\text{)} \\ \exp\{-\|x - x_k\|_2^2 / \sigma^2\} & \text{(RBF SVM)} \\ \tanh(\kappa\, x_k^T x + \theta) & \text{(two-layer neural SVM)} \end{cases}$$
where $\sigma$, $\theta$, $\kappa$ are constants.
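For concreteness, here is a minimal NumPy sketch of the kernel choices above and of the decision rule (1). The data and the $\alpha_k$, $b$ values are placeholders for illustration only; they would normally come from the training procedure described on the following slides.

```python
import numpy as np

def psi(x, xk, kind="rbf", d=3, sigma=1.0, kappa=1.0, theta=0.0):
    # The kernel functions psi(x, x_k) listed above.
    if kind == "linear":
        return xk @ x
    if kind == "poly":
        return (xk @ x + 1.0) ** d
    if kind == "rbf":
        return np.exp(-np.sum((x - xk) ** 2) / sigma**2)
    if kind == "neural":
        return np.tanh(kappa * (xk @ x) + theta)
    raise ValueError(kind)

def classify(x, X_sv, y_sv, alpha, b, **kw):
    # Decision rule (1): sign of the kernel expansion plus bias.
    s = sum(a * yk * psi(x, xk, **kw) for a, yk, xk in zip(alpha, y_sv, X_sv))
    return np.sign(s + b)

# Placeholder values (alpha and b are NOT trained here, purely illustrative).
X_sv = np.array([[1.0, 2.0], [-1.0, -0.5]])
y_sv = np.array([+1.0, -1.0])
alpha = np.array([0.7, 0.7])
b = 0.0
print(classify(np.array([0.8, 1.5]), X_sv, y_sv, alpha, b, kind="rbf"))
```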
5. SVMs for Classification
The classifier is constructed as follows. One assumes that
$$\begin{cases} \omega^T \phi(x_k) + b \geq +1, & \text{if } y_k = +1 \\ \omega^T \phi(x_k) + b \leq -1, & \text{if } y_k = -1 \end{cases} \qquad (2)$$
which is equivalent to
$$y_k[\omega^T \phi(x_k) + b] \geq 1, \quad k = 1, \ldots, N \qquad (3)$$
where $\phi(\cdot)$ is a non-linear function which maps the input space into a higher dimensional space.
In order to have the possibility to violate (3), in case a separating hyperplane in this high dimensional space does not exist, slack variables $\xi_k$ are introduced such that
$$y_k[\omega^T \phi(x_k) + b] \geq 1 - \xi_k, \quad \xi_k \geq 0, \quad k = 1, \ldots, N \qquad (4)$$
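To make the role of the slack variables concrete, the small sketch below (assuming NumPy and, purely for illustration, the identity feature map $\phi(x) = x$) computes the smallest $\xi_k$ that makes constraint (4) feasible for a candidate $(\omega, b)$, i.e. $\xi_k = \max(0,\, 1 - y_k[\omega^T \phi(x_k) + b])$.

```python
import numpy as np

# Toy data; phi is taken as the identity here as a simplifying assumption
# (in general it maps into a higher-dimensional feature space).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([+1, +1, -1, -1])

# A candidate hyperplane (omega, b); not necessarily optimal.
omega = np.array([1.0, 1.0])
b = 0.0

# Margins y_k [omega^T phi(x_k) + b] appearing in constraint (3).
margins = y * (X @ omega + b)

# Smallest slacks xi_k making constraint (4) feasible:
# xi_k = max(0, 1 - margin_k); points with margin >= 1 need no slack.
xi = np.maximum(0.0, 1.0 - margins)
print("margins:", margins, "slacks:", xi)
```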
6. SVMs for Classification
According to the structural risk minimization principle, the risk bound is minimized by formulating the optimization problem
$$\min_{\omega, \xi_k} J_1(\omega, \xi_k) = \frac{1}{2}\omega^T \omega + c \sum_{k=1}^{N} \xi_k \qquad (5)$$
subject to (4). Therefore, one constructs the Lagrangian
$$L_1(\omega, b, \xi_k; \alpha_k, \nu_k) = J_1(\omega, \xi_k) - \sum_{k=1}^{N} \alpha_k \{y_k[\omega^T \phi(x_k) + b] - 1 + \xi_k\} - \sum_{k=1}^{N} \nu_k \xi_k \qquad (6)$$
by introducing Lagrange multipliers $\alpha_k \geq 0$, $\nu_k \geq 0$ $(k = 1, \ldots, N)$.
The solution is given by the saddle point of the Lagrangian by computing
$$\max_{\alpha_k, \nu_k} \min_{\omega, b, \xi_k} L_1(\omega, b, \xi_k; \alpha_k, \nu_k). \qquad (7)$$
7. SVMs for Classification
From (7) one obtains
$$\frac{\partial L_1}{\partial \omega} = 0 \;\rightarrow\; \omega = \sum_{k=1}^{N} \alpha_k y_k \phi(x_k), \qquad \frac{\partial L_1}{\partial b} = 0 \;\rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0, \qquad \frac{\partial L_1}{\partial \xi_k} = 0 \;\rightarrow\; 0 \leq \alpha_k \leq c, \quad k = 1, \ldots, N \qquad (8)$$
which leads to the solution of the following quadratic programming problem
$$\max_{\alpha_k} Q_1(\alpha_k; \phi(x_k)) = -\frac{1}{2} \sum_{k,l=1}^{N} y_k y_l \,\phi(x_k)^T \phi(x_l)\, \alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k \qquad (9)$$
such that $\sum_{k=1}^{N} \alpha_k y_k = 0$, $0 \leq \alpha_k \leq c$, $k = 1, \ldots, N$. The function $\phi(x_k)$ in (9) is related to $\psi(x, x_k)$ by imposing
$$\phi(x)^T \phi(x_k) = \psi(x, x_k). \qquad (10)$$
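As an illustrative sketch rather than the procedure prescribed here, the dual problem (9) can be handed to a general-purpose solver. The code below assumes SciPy's SLSQP solver is available, uses an RBF kernel in place of $\phi(x_k)^T\phi(x_l)$ via (10), and takes the box and equality constraints directly from (9).

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, sigma=1.0):
    # Gram matrix psi(x_k, x_l) = exp(-||x_k - x_l||^2 / sigma^2), cf. (1).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / sigma**2)

def solve_dual(X, y, c=10.0, sigma=1.0):
    N = len(y)
    K = rbf_kernel(X, sigma)
    H = (y[:, None] * y[None, :]) * K              # y_k y_l psi(x_k, x_l)

    # Minimize the negative of Q_1 from (9)/(11).
    obj = lambda a: 0.5 * a @ H @ a - a.sum()
    grad = lambda a: H @ a - np.ones(N)

    cons = [{"type": "eq", "fun": lambda a: a @ y}]   # sum_k alpha_k y_k = 0
    bounds = [(0.0, c)] * N                           # 0 <= alpha_k <= c
    res = minimize(obj, np.zeros(N), jac=grad,
                   method="SLSQP", bounds=bounds, constraints=cons)
    return res.x

# Tiny toy example.
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
print("support values:", np.round(solve_dual(X, y), 3))
```

A dedicated QP solver would normally be used instead; the point of the sketch is only to show how (9) maps onto an optimizer's objective, bounds and equality constraint.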
8. SVMs for Classification
Note that for the two-layer neural SVM, Mercer's condition only holds for certain parameter values of $\kappa$ and $\theta$. The classifier (1) is designed by solving
$$\max_{\alpha_k} Q_1(\alpha_k; \psi(x_k, x_l)) = -\frac{1}{2}\sum_{k,l=1}^{N} y_k y_l \,\psi(x_k, x_l)\,\alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k \qquad (11)$$
subject to the constraints in (9). One does not have to calculate $\omega$ nor $\phi(x_k)$ in order to determine the decision surface. The solution to (11) will be global.
Further, it can be shown that a hyperplane (3) satisfying the constraint $\|\omega\|_2 \leq a$ has a VC dimension $h$ which is bounded by
$$h \leq \min([r^2 a^2], n) + 1 \qquad (12)$$
where $[\cdot]$ denotes the integer part and $r$ is the radius of the smallest ball containing the points $\phi(x_1), \ldots, \phi(x_N)$. Such a ball is found by defining the Lagrangian
$$L_2(r, q, \lambda_k) = r^2 - \sum_{k=1}^{N} \lambda_k \big(r^2 - \|\phi(x_k) - q\|_2^2\big) \qquad (13)$$
9. SVMs for Classification
In (13), $q$ is the center of the ball and $\lambda_k$ are positive Lagrange multipliers. Here $q = \sum_k \lambda_k \phi(x_k)$, where the $\lambda_k$ follow from
$$\max_{\lambda_k} Q_2(\lambda_k; \phi(x_k)) = -\sum_{k,l=1}^{N} \phi(x_k)^T \phi(x_l)\,\lambda_k \lambda_l + \sum_{k=1}^{N} \lambda_k \,\phi(x_k)^T \phi(x_k) \qquad (14)$$
such that $\sum_{k=1}^{N} \lambda_k = 1$, $\lambda_k \geq 0$, $k = 1, \ldots, N$. Based on (10), $Q_2$ can also be expressed in terms of $\psi(x_k, x_l)$. Finally, one selects a support vector machine VC dimension by solving (11) and computing (12) from (14).
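To make this model-selection step concrete, the following sketch (same assumed NumPy/SciPy setup and kernel matrix as in the previous example) solves the ball problem (14) in kernel form and evaluates the VC bound (12) for a given value of $a$; the identification of the optimal $Q_2$ with $r^2$ relies on strong duality for the enclosing-ball problem.

```python
import numpy as np
from scipy.optimize import minimize

def ball_radius_sq(K):
    # Dual of the smallest enclosing ball, problem (14), written with the
    # kernel matrix K_kl = psi(x_k, x_l) via (10).
    N = K.shape[0]
    obj = lambda lam: lam @ K @ lam - lam @ np.diag(K)   # negative of Q_2
    cons = [{"type": "eq", "fun": lambda lam: lam.sum() - 1.0}]
    bounds = [(0.0, 1.0)] * N
    res = minimize(obj, np.full(N, 1.0 / N),
                   method="SLSQP", bounds=bounds, constraints=cons)
    lam = res.x
    # At the optimum, Q_2 equals r^2 (strong duality assumption).
    return lam @ np.diag(K) - lam @ K @ lam

def vc_bound(K, a, n):
    # Bound (12): h <= min([r^2 a^2], n) + 1, with [.] the integer part.
    r2 = ball_radius_sq(K)
    return min(int(np.floor(r2 * a**2)), n) + 1

# Example usage, reusing rbf_kernel and the toy X from the previous sketch:
#   K = rbf_kernel(X, sigma=1.0)
#   print(vc_bound(K, a=2.0, n=X.shape[1]))
```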
10. Least Squares Support Vector Machines
A least squares version of the SVM classifier is obtained by formulating the classification problem as
$$\min_{\omega, b, e} J_3(\omega, b, e) = \frac{1}{2}\omega^T \omega + \gamma\,\frac{1}{2}\sum_{k=1}^{N} e_k^2 \qquad (15)$$
subject to the equality constraints
$$y_k[\omega^T \phi(x_k) + b] = 1 - e_k, \quad k = 1, \ldots, N. \qquad (16)$$
The Lagrangian is defined as
$$L_3(\omega, b, e; \alpha) = J_3(\omega, b, e) - \sum_{k=1}^{N} \alpha_k\{y_k[\omega^T \phi(x_k) + b] - 1 + e_k\} \qquad (17)$$
where $\alpha_k$ are Lagrange multipliers. The conditions for optimality
$$\frac{\partial L_3}{\partial \omega} = 0 \;\rightarrow\; \omega = \sum_{k=1}^{N} \alpha_k y_k \phi(x_k), \qquad \frac{\partial L_3}{\partial b} = 0 \;\rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0, \qquad (18)$$
11. Least Squares Support Vector Machines
$$\frac{\partial L_3}{\partial e_k} = 0 \;\rightarrow\; \alpha_k = \gamma e_k, \quad k = 1, \ldots, N, \qquad \frac{\partial L_3}{\partial \alpha_k} = 0 \;\rightarrow\; y_k[\omega^T \phi(x_k) + b] - 1 + e_k = 0, \quad k = 1, \ldots, N$$
can be written as the solution to the following set of linear equations
$$\begin{bmatrix} I & 0 & 0 & -Z^T \\ 0 & 0 & 0 & -Y^T \\ 0 & 0 & \gamma I & -I \\ Z & Y & I & 0 \end{bmatrix}\begin{bmatrix} \omega \\ b \\ e \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vec{1} \end{bmatrix} \qquad (19)$$
where
$Z = [\phi(x_1)^T y_1; \ldots; \phi(x_N)^T y_N]$, $Y = [y_1; \ldots; y_N]$, $\vec{1} = [1; \ldots; 1]$, $e = [e_1; \ldots; e_N]$, $\alpha = [\alpha_1; \ldots; \alpha_N]$.
12. Least Squares Support Vector Machines
The solution is given by
$$\begin{bmatrix} 0 & -Y^T \\ Y & ZZ^T + \gamma^{-1} I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \vec{1} \end{bmatrix} \qquad (20)$$
Mercer's condition can be applied again to the matrix $\Omega = ZZ^T$, where
$$\Omega_{kl} = y_k y_l\,\phi(x_k)^T \phi(x_l) = y_k y_l\,\psi(x_k, x_l). \qquad (21)$$
Hence the classifier (1) is found by solving the linear equations (20)-(21) instead of quadratic programming. The parameters of the kernel, such as $\sigma$ for the RBF kernel, can be optimally chosen according to (12). The support values $\alpha_k$ are proportional to the errors at the data points by (18), while for the standard SVM problem (11) most values are equal to zero. Hence one could rather speak of a support value spectrum in the least squares case.
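As a minimal sketch, assuming NumPy and the RBF kernel choice used in the earlier examples (an illustration rather than a reference implementation), the LS-SVM classifier can be trained by assembling $\Omega$ from (21) and solving the linear system (20):

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # psi(x, x_k) = exp(-||x - x_k||^2 / sigma^2), cf. (1).
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / sigma**2)

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)   # eq. (21)
    # Assemble and solve the linear system (20).
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = -y                       # first row: -Y^T, i.e. sum_k alpha_k y_k = 0
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]              # b, alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    # Classifier (1): sign of the kernel expansion plus bias.
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Toy usage.
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
b, alpha = lssvm_train(X, y)
print(lssvm_predict(X, y, b, alpha, X))
```

Since (20) is a single linear system of size $N+1$, a standard dense solver suffices here, which is the low computational cost highlighted in the conclusion.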
13. Conclusion
Due to the equality constraints, a set of linear equations has to be solved instead of a quadratic programming problem.
Mercer's condition is applied as in other SVMs.
A least squares SVM with RBF kernel is readily found, with excellent generalization performance and low computational cost.
References
1. J.A.K. Suykens and J. Vandewalle, "Least Squares Support Vector Machine Classifiers," Neural Processing Letters, 1999.