Lecture 2: Linear SVM in the Dual
Stéphane Canu
stephane.canu@litislab.eu
Sao Paulo 2014
March 12, 2014
Road map
1 Linear SVM
Optimization in 10 slides
Equality constraints
Inequality constraints
Dual formulation of the linear SVM
Solving the dual
Figure from L. Bottou & C.J. Lin, Support vector machine solvers, in Large scale kernel machines, 2007.
Linear SVM: the problem

Linear SVMs are the solution of the following problem (called the primal).
Let {(xi, yi); i = 1 : n} be a set of labelled data with xi ∈ IR^d, yi ∈ {−1, 1}.
A support vector machine (SVM) is a linear classifier associated with the following
decision function: D(x) = sign(w^T x + b), where w ∈ IR^d and b ∈ IR are obtained as
the solution of the following problem:

    min_{w,b}  1/2 ||w||^2 = 1/2 w^T w
    with  yi (w^T xi + b) ≥ 1,   i = 1, n

This is a quadratic program (QP):

    min_z  1/2 z^T A z − d^T z
    with   B z ≤ e

where z = (w, b)^T,  d = (0, . . . , 0)^T,  A = [ I 0 ; 0 0 ],  B = −[diag(y)X, y]  and  e = −(1, . . . , 1)^T.
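As a sketch (not part of the original slides), the primal QP above can be assembled and handed to a generic QP solver; the snippet below assumes hypothetical data X (n-by-d) and labels y (n-by-1), and that MATLAB's Optimization Toolbox function quadprog is available.

% Sketch: build the primal QP matrices and solve with a generic QP solver.
% X is n-by-d, y is n-by-1 with entries +1/-1 (hypothetical data).
[n, d] = size(X);
A = blkdiag(eye(d), 0);        % quadratic term: only w is penalized, not b
f = zeros(d+1, 1);             % no linear term
B = -[diag(y) * X, y];         % inequality constraints B z <= e
e = -ones(n, 1);
z = quadprog(A, f, B, e);      % z = [w; b]
w = z(1:d);  b = z(d+1);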
Road map
1 Linear SVM
Optimization in 10 slides
Equality constraints
Inequality constraints
Dual formulation of the linear SVM
Solving the dual
A simple example (to begin with)

    min_{x1,x2}  J(x) = (x1 − a)² + (x2 − b)²

[Figure: the unconstrained case, showing a point x, the minimizer x*, the gradient ∇x J(x) and the iso-cost lines J(x) = k.]
A simple example (to begin with)

    min_{x1,x2}  J(x) = (x1 − a)² + (x2 − b)²
    with  H(x) = α(x1 − c)² + β(x2 − d)² + γ x1 x2 − 1 = 0

Ω = {x | H(x) = 0}

[Figure: the feasible set Ω, the tangent hyperplane at the solution x*, the gradients ∇x J(x) and ∇x H(x), and the iso-cost lines J(x) = k. At the solution the gradients are collinear: ∇x H(x) = λ ∇x J(x).]
The only one equality constraint case

    min_x  J(x)           J(x + εd) ≈ J(x) + ε ∇x J(x)^T d
    with H(x) = 0         H(x + εd) ≈ H(x) + ε ∇x H(x)^T d

Loss J: d is a descent direction if there exists ε0 ∈ IR such that
∀ε ∈ IR, 0 < ε ≤ ε0:   J(x + εd) < J(x)  ⇒  ∇x J(x)^T d < 0

Constraint H: d is a feasible descent direction if there exists ε0 ∈ IR such that
∀ε ∈ IR, 0 < ε ≤ ε0:   H(x + εd) = 0  ⇒  ∇x H(x)^T d = 0

If at x* the vectors ∇x J(x*) and ∇x H(x*) are collinear, there is no feasible
descent direction d. Therefore, x* is a local solution of the problem.
Lagrange multipliers

Assume J and the functions Hi are continuously differentiable (and independent).

    P =  min_{x∈IR^n}  J(x)
         with  H1(x) = 0     (λ1)
         and   H2(x) = 0     (λ2)
               . . .
               Hp(x) = 0     (λp)

Each constraint is associated with λi: the Lagrange multiplier.

Theorem (First order optimality conditions)
For x* to be a local minimum of P, it is necessary that:

    ∇x J(x*) + Σ_{i=1}^p λi ∇x Hi(x*) = 0   and   Hi(x*) = 0,  i = 1, p
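For concreteness (not on the original slides), the condition can be worked out on the earlier toy cost with an assumed, purely illustrative constraint H(x) = x1 + x2 − 1:

% a minimal worked example in LaTeX, assuming the constraint H(x) = x_1 + x_2 - 1
\min_{x}\; (x_1-a)^2 + (x_2-b)^2 \quad \text{with } x_1 + x_2 = 1
\nabla_x J(x^\star) + \lambda \nabla_x H(x^\star) = 0
\;\Rightarrow\;
\begin{cases} 2(x_1^\star - a) + \lambda = 0 \\ 2(x_2^\star - b) + \lambda = 0 \\ x_1^\star + x_2^\star = 1 \end{cases}
\;\Rightarrow\;
\lambda = a + b - 1,\quad x_1^\star = a - \tfrac{\lambda}{2},\quad x_2^\star = b - \tfrac{\lambda}{2}.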
Plan
1 Linear SVM
Optimization in 10 slides
Equality constraints
Inequality constraints
Dual formulation of the linear SVM
Solving the dual
The only one inequality constraint case

    min_x  J(x)           J(x + εd) ≈ J(x) + ε ∇x J(x)^T d
    with G(x) ≤ 0         G(x + εd) ≈ G(x) + ε ∇x G(x)^T d

Cost J: d is a descent direction if there exists ε0 ∈ IR such that
∀ε ∈ IR, 0 < ε ≤ ε0:   J(x + εd) < J(x)  ⇒  ∇x J(x)^T d < 0

Constraint G: d is a feasible descent direction if there exists ε0 ∈ IR such that
∀ε ∈ IR, 0 < ε ≤ ε0:   G(x + εd) ≤ 0  ⇒
    G(x) < 0 : no limit here on d
    G(x) = 0 : ∇x G(x)^T d ≤ 0

Two possibilities
If x* lies at the limit of the feasible domain (G(x*) = 0) and if the vectors
∇x J(x*) and ∇x G(x*) are collinear and in opposite directions, there is no
feasible descent direction d at that point. Therefore, x* is a local solution
of the problem... or if ∇x J(x*) = 0.
Two possibilities for optimality

    ∇x J(x*) = −µ ∇x G(x*)  with  µ > 0  and  G(x*) = 0
or
    ∇x J(x*) = 0  with  µ = 0  and  G(x*) < 0

This alternative is summarized in the so-called complementarity condition:

    µ G(x*) = 0   :   either  µ = 0 and G(x*) < 0,   or  G(x*) = 0 and µ > 0
First order optimality condition (1)

    problem P =  min_{x∈IR^n}  J(x)
                 with  hj(x) = 0,  j = 1, . . . , p
                 and   gi(x) ≤ 0,  i = 1, . . . , q

Definition: Karush, Kuhn and Tucker (KKT) conditions

    stationarity           ∇J(x*) + Σ_{j=1}^p λj ∇hj(x*) + Σ_{i=1}^q µi ∇gi(x*) = 0
    primal admissibility   hj(x*) = 0,  j = 1, . . . , p
                           gi(x*) ≤ 0,  i = 1, . . . , q
    dual admissibility     µi ≥ 0,  i = 1, . . . , q
    complementarity        µi gi(x*) = 0,  i = 1, . . . , q

λj and µi are called the Lagrange multipliers of problem P.
First order optimality condition (2)

Theorem (12.1 Nocedal & Wright pp 321)
If a vector x* is a stationary point of problem P, then there exist Lagrange
multipliers (a) such that x*, {λj}_{j=1:p}, {µi}_{i=1:q} fulfill the KKT conditions.

(a) under some conditions, e.g. the linear independence constraint qualification

If the problem is convex, then a stationary point is the solution of the problem.
A quadratic program (QP) is convex when. . .

    (QP)   min_z  1/2 z^T A z − d^T z
           with   B z ≤ e

. . . when matrix A is positive definite (positive semidefiniteness already gives
convexity; definiteness makes the objective strictly convex).
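A quick numerical check (a sketch, assuming a candidate matrix A has already been built) is to inspect the eigenvalues of its symmetric part:

% Sketch: check whether the QP objective 1/2 z'Az - d'z is convex.
As = (A + A') / 2;               % only the symmetric part matters
lam = eig(As);
isConvex = all(lam >= -1e-10);   % positive semidefinite (up to round-off)
isStrictlyConvex = all(lam > 1e-10);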
KKT condition - Lagrangian (3)

    problem P =  min_{x∈IR^n}  J(x)
                 with  hj(x) = 0,  j = 1, . . . , p
                 and   gi(x) ≤ 0,  i = 1, . . . , q

Definition: Lagrangian
The Lagrangian of problem P is the following function:

    L(x, λ, µ) = J(x) + Σ_{j=1}^p λj hj(x) + Σ_{i=1}^q µi gi(x)

The importance of being a Lagrangian
    the stationarity condition can be written: ∇L(x*, λ, µ) = 0
    the Lagrangian saddle point: max_{λ,µ} min_x L(x, λ, µ)

Primal variables: x; dual variables: λ, µ (the Lagrange multipliers).
Duality – definitions (1)

Primal and (Lagrange) dual problems

    P =  min_{x∈IR^n}  J(x)
         with  hj(x) = 0,  j = 1, p
         and   gi(x) ≤ 0,  i = 1, q

    D =  max_{λ∈IR^p, µ∈IR^q}  Q(λ, µ)
         with  µi ≥ 0,  i = 1, q

Dual objective function:

    Q(λ, µ) = inf_x L(x, λ, µ)
            = inf_x  J(x) + Σ_{j=1}^p λj hj(x) + Σ_{i=1}^q µi gi(x)

Wolfe dual problem

    W =  max_{x, λ∈IR^p, µ∈IR^q}  L(x, λ, µ)
         with  µi ≥ 0,  i = 1, q
         and   ∇J(x) + Σ_{j=1}^p λj ∇hj(x) + Σ_{i=1}^q µi ∇gi(x) = 0
Duality – theorems (2)

Theorem (12.12, 12.13 and 12.14 Nocedal & Wright pp 346)
If f, g and h are convex and continuously differentiable (a), then the solution of
the dual problem is the same as the solution of the primal.

(a) under some conditions, e.g. the linear independence constraint qualification

    (λ*, µ*) = solution of problem D
    x* = arg min_x L(x, λ*, µ*)
    Q(λ*, µ*) = min_x L(x, λ*, µ*) = L(x*, λ*, µ*)
              = J(x*) + λ*^T H(x*) + µ*^T G(x*) = J(x*)

and for any feasible point x:

    Q(λ, µ) ≤ J(x),   i.e.   0 ≤ J(x) − Q(λ, µ)

The duality gap is the difference between the primal and dual cost functions.
Road map
1 Linear SVM
Optimization in 10 slides
Equality constraints
Inequality constraints
Dual formulation of the linear SVM
Solving the dual
Figure from L. Bottou & C.J. Lin, Support vector machine solvers, in Large scale kernel machines, 2007.
Linear SVM dual formulation - The Lagrangian

    min_{w,b}  1/2 ||w||^2
    with  yi (w^T xi + b) ≥ 1,   i = 1, n

Looking for the Lagrangian saddle point max_α min_{w,b} L(w, b, α), with so-called
Lagrange multipliers αi ≥ 0:

    L(w, b, α) = 1/2 ||w||^2 − Σ_{i=1}^n αi ( yi (w^T xi + b) − 1 )

αi represents the influence of the constraint, and thus the influence of the
training example (xi, yi).
Stationarity conditions

    L(w, b, α) = 1/2 ||w||^2 − Σ_{i=1}^n αi ( yi (w^T xi + b) − 1 )

Computing the gradients:

    ∇w L(w, b, α) = w − Σ_{i=1}^n αi yi xi
    ∂L(w, b, α)/∂b = − Σ_{i=1}^n αi yi

we have the following optimality conditions:

    ∇w L(w, b, α) = 0   ⇒   w = Σ_{i=1}^n αi yi xi
    ∂L(w, b, α)/∂b = 0  ⇒   Σ_{i=1}^n αi yi = 0
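As a small sketch (with hypothetical X, y and a given dual solution alpha), the first stationarity condition recovers the primal weights directly from the multipliers:

% Sketch: recover w from the dual variables (X n-by-d, y and alpha n-by-1).
w = X' * (alpha .* y);                 % w = sum_i alpha_i yi xi
assert(abs(sum(alpha .* y)) < 1e-8);   % second condition: sum_i alpha_i yi = 0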
KKT conditions for SVM

    stationarity           w − Σ_{i=1}^n αi yi xi = 0   and   Σ_{i=1}^n αi yi = 0
    primal admissibility   yi (w^T xi + b) ≥ 1,   i = 1, . . . , n
    dual admissibility     αi ≥ 0,   i = 1, . . . , n
    complementarity        αi ( yi (w^T xi + b) − 1 ) = 0,   i = 1, . . . , n

The complementarity condition splits the data into two sets:
    A, the set of active constraints: the useful points
        A = {i ∈ [1, n] | yi (w*^T xi + b*) = 1}
    its complement Ā: the useless points
        if i ∉ A, then αi = 0
The KKT conditions for SVM

The same KKT, but using matrix notations and the active set A (𝟙 is the all-ones vector):

    stationarity           w − X^T Dy α = 0
                           α^T y = 0
    primal admissibility   Dy (X w + b 𝟙) ≥ 𝟙
    dual admissibility     α ≥ 0
    complementarity        Dy (XA w + b 𝟙A) = 𝟙A
                           α_Ā = 0

Knowing A, the solution verifies the following linear system:

    w − XA^T Dy αA = 0
    −Dy XA w − b yA = −eA
    −yA^T αA = 0

with Dy = diag(yA), αA = α(A), yA = y(A) and XA = X(A, :).
The KKT conditions as a linear system

    w − XA^T Dy αA = 0
    −Dy XA w − b yA = −eA
    −yA^T αA = 0

with Dy = diag(yA), αA = α(A), yA = y(A) and XA = X(A, :). In matrix form:

    [  I         −XA^T Dy    0   ] [ w  ]   [  0  ]
    [ −Dy XA      0         −yA  ] [ αA ] = [ −eA ]
    [  0         −yA^T       0   ] [ b  ]   [  0  ]

we can work on it to separate w from (αA, b)
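A direct way to exploit this (a sketch, assuming the active set A is known and stored as an index vector) is to assemble the block system above and solve it:

% Sketch: solve the KKT linear system for a known active set A (index vector).
% Assumes X n-by-d and y n-by-1 with entries +1/-1 (hypothetical data).
XA = X(A, :);  yA = y(A);  nA = numel(A);  d = size(X, 2);
Dy = diag(yA);
K = [ eye(d),       -XA' * Dy,   zeros(d, 1);
      -Dy * XA,     zeros(nA),   -yA;
      zeros(1, d),  -yA',        0 ];
rhs = [ zeros(d, 1); -ones(nA, 1); 0 ];
sol = K \ rhs;                                   % [w; alphaA; b]
w = sol(1:d);  alphaA = sol(d+1:d+nA);  b = sol(end);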
The SVM dual formulation

The SVM Wolfe dual:

    max_{w,b,α}  1/2 ||w||^2 − Σ_{i=1}^n αi ( yi (w^T xi + b) − 1 )
    with  αi ≥ 0,  i = 1, . . . , n
    and   w − Σ_{i=1}^n αi yi xi = 0   and   Σ_{i=1}^n αi yi = 0

Using the fact that w = Σ_{i=1}^n αi yi xi, the SVM Wolfe dual without w and b is:

    max_α  − 1/2 Σ_{i=1}^n Σ_{j=1}^n αj αi yi yj xj^T xi + Σ_{i=1}^n αi
    with  αi ≥ 0,  i = 1, . . . , n
    and   Σ_{i=1}^n αi yi = 0
Linear SVM dual formulation

    L(w, b, α) = 1/2 ||w||^2 − Σ_{i=1}^n αi ( yi (w^T xi + b) − 1 )

Optimality:   w = Σ_{i=1}^n αi yi xi   and   Σ_{i=1}^n αi yi = 0

Plugging these back into L:

    L(α) = 1/2 Σ_{i=1}^n Σ_{j=1}^n αj αi yi yj xj^T xi        (this double sum is w^T w)
           − Σ_{i=1}^n αi yi ( Σ_{j=1}^n αj yj xj )^T xi       (the inner sum is w)
           − b Σ_{i=1}^n αi yi                                 (this sum is 0)
           + Σ_{i=1}^n αi
         = − 1/2 Σ_{i=1}^n Σ_{j=1}^n αj αi yi yj xj^T xi + Σ_{i=1}^n αi

Dual linear SVM is also a quadratic program

    problem D:   min_{α∈IR^n}  1/2 α^T G α − e^T α
                 with  y^T α = 0
                 and   0 ≤ αi,  i = 1, n

with G a symmetric n × n matrix such that Gij = yi yj xj^T xi.
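As a sketch (again assuming hypothetical data X, y and the Optimization Toolbox function quadprog), the dual can be solved like any other QP, after which w and b are recovered from the KKT conditions:

% Sketch: solve the dual QP and recover the primal solution.
% X is n-by-d, y is n-by-1 with entries +1/-1 (hypothetical data).
n = size(X, 1);
G = (y * y') .* (X * X');                 % Gij = yi yj xi' xj
e = ones(n, 1);
alpha = quadprog(G, -e, [], [], y', 0, zeros(n, 1), []);   % min 1/2 a'Ga - e'a
w = X' * (alpha .* y);                    % stationarity
sv = find(alpha > 1e-6);                  % support vectors (active constraints)
b = mean(y(sv) - X(sv, :) * w);           % from yi(w'xi + b) = 1 on the support vectors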
SVM primal vs. dual

Primal
    min_{w∈IR^d, b∈IR}  1/2 ||w||^2
    with  yi (w^T xi + b) ≥ 1,  i = 1, n

    d + 1 unknowns
    n constraints
    classical QP
    perfect when d << n

Dual
    min_{α∈IR^n}  1/2 α^T G α − e^T α
    with  y^T α = 0
    and   0 ≤ αi,  i = 1, n

    n unknowns
    G, the Gram matrix (pairwise influence matrix)
    n box constraints
    easy to solve
    to be used when d > n

In both cases the decision function is the same:

    f(x) = Σ_{j=1}^d wj xj + b = Σ_{i=1}^n αi yi (x^T xi) + b
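A small consistency check (a sketch reusing the hypothetical w, b, alpha, X, y computed above): the primal and dual expansions of the decision function agree.

% Sketch: evaluate the decision function in its primal and dual forms.
xnew = randn(size(X, 2), 1);               % a hypothetical new point
f_primal = w' * xnew + b;
f_dual   = sum(alpha .* y .* (X * xnew)) + b;
% f_primal and f_dual should match up to numerical precision.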
The bi-dual (the dual of the dual)

    min_{α∈IR^n}  1/2 α^T G α − e^T α
    with  y^T α = 0
    and   0 ≤ αi,  i = 1, n

    L(α, λ, µ) = 1/2 α^T G α − e^T α + λ y^T α − µ^T α
    ∇α L(α, λ, µ) = G α − e + λ y − µ

The bi-dual:

    max_{α,λ,µ}  − 1/2 α^T G α
    with  G α − e + λ y − µ = 0
    and   0 ≤ µ

Since ||w||^2 = α^T G α and D X w = G α (with D = diag(y)):

    max_{w,λ}  − 1/2 ||w||^2
    with  D X w + λ y ≥ e

By identification (possibly up to a sign), b = λ is the Lagrange multiplier of the
equality constraint.
Cold case: the least square problem

Linear model:

    yi = Σ_{j=1}^d wj xij + εi,   i = 1, n

n data and d variables; d < n.

    min_w  Σ_{i=1}^n ( Σ_{j=1}^d xij wj − yi )² = ||X w − y||²

Solution:  w = (X^T X)^{-1} X^T y,  so that

    f(x) = x^T (X^T X)^{-1} X^T y = x^T w

What is the influence of each data point (each line of matrix X)?
Shawe-Taylor & Cristianini's Book, 2004
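A minimal sketch (with hypothetical data X, y) of the normal-equation solution and the resulting predictor:

% Sketch: ordinary least squares via the normal equations.
% X is n-by-d with d < n and full column rank, y is n-by-1.
w = (X' * X) \ (X' * y);       % w = (X'X)^{-1} X'y
xnew = randn(size(X, 2), 1);   % a hypothetical new point
f = xnew' * w;                 % prediction f(x) = x'w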
Data point influence (contribution)

For any new data point x:

    f(x) = x^T  (X^T X)(X^T X)^{-1}  (X^T X)^{-1} X^T y        (the last factor is w)
         = x^T X^T   X (X^T X)^{-1} (X^T X)^{-1} X^T y          (call this factor α)

[Figure: the n × d matrix X, with w indexing the d variables and α indexing the n examples; the new point x interacts with each example through the inner products x^T xi.]

    f(x) = Σ_{j=1}^d wj xj = Σ_{i=1}^n αi (x^T xi)

From variables to examples:

    α = X (X^T X)^{-1} w     (n examples)      and      w = X^T α     (d variables)

What if d ≥ n?
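A sketch (continuing with the hypothetical X, y, w, xnew above) of this switch from variables to examples:

% Sketch: express the least-squares predictor through the examples.
alpha = X * ((X' * X) \ w);              % alpha = X (X'X)^{-1} w, one coefficient per example
f_via_alpha = sum(alpha .* (X * xnew));  % f(x) = sum_i alpha_i (x' xi)
% This matches xnew' * w, and w can be recovered as w = X' * alpha.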
SVM primal vs. dual (recap)

Primal:  min_{w∈IR^d, b∈IR} 1/2 ||w||^2  with  yi (w^T xi + b) ≥ 1, i = 1, n;
d + 1 unknowns, n constraints, a classical QP, perfect when d << n.

Dual:  min_{α∈IR^n} 1/2 α^T G α − e^T α  with  y^T α = 0 and 0 ≤ αi, i = 1, n;
n unknowns, G the Gram (pairwise influence) matrix, n box constraints, easy to solve,
to be used when d > n.

In both cases:

    f(x) = Σ_{j=1}^d wj xj + b = Σ_{i=1}^n αi yi (x^T xi) + b
Road map
1 Linear SVM
Optimization in 10 slides
Equality constraints
Inequality constraints
Dual formulation of the linear SVM
Solving the dual
Figure from L. Bottou & C.J. Lin, Support vector machine solvers, in Large scale kernel machines, 2007.
Solving the dual (1)

Data point influence
    αi = 0 : this point is useless
    αi ≠ 0 : this point is said to be a support vector

    f(x) = Σ_{j=1}^d wj xj + b = Σ_{i=1}^n αi yi (x^T xi) + b

In the example of the figure, only 3 of the αi are non zero, so

    f(x) = Σ_{j=1}^d wj xj + b = Σ_{i=1}^3 αi yi (x^T xi) + b

The decision border only depends on 3 points (d + 1).
Solving the dual (2)

Assume we know these 3 data points. The dual problem

    min_{α∈IR^n}  1/2 α^T G α − e^T α
    with  y^T α = 0   and   0 ≤ αi,  i = 1, n

reduces to

    min_{α∈IR^3}  1/2 α^T G α − e^T α
    with  y^T α = 0

(the positivity constraints are inactive on the support vectors and can be dropped).
With the Lagrangian

    L(α, b) = 1/2 α^T G α − e^T α + b y^T α

it suffices to solve the following linear system:

    G α + b y = e
    y^T α = 0

U = chol(G);                     % upper triangular factor: G = U'*U
a = U \ (U' \ e);                % a = G^{-1} e
c = U \ (U' \ y);                % c = G^{-1} y
b = (y' * a) / (y' * c);         % from y'*alpha = 0
alpha = U \ (U' \ (e - b*y));    % alpha = G^{-1} (e - b*y)
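To finish (a small sketch, reusing the hypothetical alpha above and the 3 active points stored as XA, yA), the primal weights follow from the stationarity condition and give back the decision function:

% Sketch: recover the primal weights from the active points.
w = XA' * (alpha .* yA);     % w = sum_i alpha_i yi xi over the support vectors
% the decision function is then f(x) = w' * x + b, with b from the system above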
Conclusion: variables or data points?

    seeking a universal learning algorithm
        no model for IP(x, y)
    the linear case: data is separable
        the non separable case
    double objective: minimizing the error together with the regularity of the solution
        multi objective optimisation
    duality: variables – examples
        use the primal when d < n (in the linear case) or when matrix G is hard to compute
        otherwise use the dual
    universality = nonlinearity
        kernels