Lecture 8: Multi Class SVM
Stéphane Canu
stephane.canu@litislab.eu
Sao Paulo 2014
April 10, 2014
Roadmap
1 Multi Class SVM
3 different strategies for multi class SVM
Multi Class SVM by decomposition
Multi class SVM
Coupling convex hulls
3 different strategies for multi class SVM
1 Decomposition approaches
one vs all: winner takes all
one vs one: max-wins voting / pairwise coupling: use probabilities (best results)
c SVDD
2 Global approach (size c × n), formal (different variations)

min_{f∈H, b, ξ}  1/2 Σ_{ℓ=1}^c ‖f_ℓ‖²_H + C/p Σ_{i=1}^n Σ_{ℓ=1, ℓ≠y_i}^c ξ_{iℓ}^p
with f_{y_i}(x_i) + b_{y_i} ≥ f_ℓ(x_i) + b_ℓ + 2 − ξ_{iℓ}
and ξ_{iℓ} ≥ 0 for i = 1, ..., n; ℓ = 1, ..., c; ℓ ≠ y_i

a non consistent estimator, but practically useful; structured outputs
3 A coupling formulation using the convex hulls
Multiclass SVM: complexity issues
n training data (n = 60,000 for MNIST), c classes (c = 10 for MNIST)

approach       problem size   number of sub problems   discrimination   rejection
1 vs. all      n              c                        ++               -
1 vs. 1        2n/c           c(c−1)/2                 ++               -
c SVDD         n/c            c                        -                ++
all together   n × c          1                        ++               -
coupling CH    n              1                        +                +
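The counts in the table above are easy to check in a few lines; the sketch below (plain Python; the helper name decomposition_costs is ours, and balanced classes with n/c points each are assumed) reproduces the MNIST figures quoted above (n = 60,000, c = 10):

```python
def decomposition_costs(n, c):
    """Number and size of sub-problems for each scheme in the table,
    assuming balanced classes (n/c points per class)."""
    return {
        "1 vs. all":    {"subproblems": c,                "size": n},
        "1 vs. 1":      {"subproblems": c * (c - 1) // 2, "size": 2 * n // c},
        "c SVDD":       {"subproblems": c,                "size": n // c},
        "all together": {"subproblems": 1,                "size": n * c},
        "coupling CH":  {"subproblems": 1,                "size": n},
    }

costs = decomposition_costs(60_000, 10)  # the MNIST setting of the slide
```

For MNIST this gives 45 pairwise sub-problems of about 12,000 points each, against one global problem of size 600,000 for the all-together formulation.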
Roadmap
1 Multi Class SVM
3 different strategies for multi class SVM
Multi Class SVM by decomposition
Multi class SVM
Coupling convex hulls
Multi Class SVM by decomposition
One-Against-All Methods
→ winner-takes-all strategy
One-vs-One: pairwise methods
→ max-wins voting
→ directed acyclic graph (DAG)
→ error-correcting codes
→ post process probabilities
Hierarchical binary tree for
multi-class SVM http://courses.media.mit.edu/2006fall/
mas622j/Projects/aisen-project/
SVM and probabilities (Platt, 1999)
The decision function of the SVM is: sign(f(x) + b)

log [ IP(Y = 1|x) / IP(Y = −1|x) ] should have (almost) the same sign as f(x) + b

Postulating log [ IP(Y = 1|x) / IP(Y = −1|x) ] = a1(f(x) + b) + a2 gives
IP(Y = 1|x) = 1 − 1 / (1 + exp(a1(f(x) + b) + a2))

a1 and a2 are estimated using maximum likelihood on new data:
max_{a1,a2} L   with L = Π_{i=1}^n IP(Y = 1|x_i)^{y_i} (1 − IP(Y = 1|x_i))^{1−y_i}

and log L = Σ_{i=1}^n y_i log IP(Y = 1|x_i) + (1 − y_i) log(1 − IP(Y = 1|x_i))
          = Σ_{i=1}^n y_i log [ IP(Y = 1|x_i) / (1 − IP(Y = 1|x_i)) ] + log(1 − IP(Y = 1|x_i))
          = Σ_{i=1}^n y_i (a1(f(x_i) + b) + a2) − log(1 + exp(a1(f(x_i) + b) + a2))
          = Σ_{i=1}^n y_i aᵀz_i − log(1 + exp(aᵀz_i))   with z_i = (f(x_i) + b, 1)ᵀ, a = (a1, a2)ᵀ

Newton iterations: a_new ← a_old − H⁻¹ ∇log L
SVM and probabilities (Platt, 1999)

max_{a ∈ IR²} log L = Σ_{i=1}^n y_i aᵀz_i − log(1 + exp(aᵀz_i))

Newton iterations: a_new ← a_old − H⁻¹ ∇log L

∇log L = Σ_{i=1}^n ( y_i − exp(aᵀz_i) / (1 + exp(aᵀz_i)) ) z_i
       = Σ_{i=1}^n ( y_i − IP(Y = 1|x_i) ) z_i = Zᵀ(y − p)

H = − Σ_{i=1}^n z_i z_iᵀ IP(Y = 1|x_i)(1 − IP(Y = 1|x_i)) = −ZᵀWZ

Newton iterations: a_new ← a_old + (ZᵀWZ)⁻¹ Zᵀ(y − p)
SVM and probabilities: practical issues
The labels y are replaced by smoothed targets t:
t_i = 1 − ε+ = (n+ + 1) / (n+ + 2)   if y_i = 1
t_i = ε−     = 1 / (n− + 2)          if y_i = −1

1 in: X, y, f / out: p
2 t ←
3 Z ←
4 loop until convergence
  1 p ← 1 − 1 / (1 + exp(aᵀz))
  2 W ← diag(p(1 − p))
  3 a_new ← a_old + (ZᵀWZ)⁻¹ Zᵀ(t − p)
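The loop above can be sketched in plain Python. This is a minimal sketch, not Platt's reference implementation: the function name platt_fit and the fixed iteration cap are ours, labels are taken in {0, 1}, and the model is p(y = 1|f) = σ(a1 f + a2) as in the slides:

```python
import math

def platt_fit(scores, labels, n_iter=50):
    """Fit p(y=1|f) = sigmoid(a1*f + a2) by Newton iterations.
    scores: decision values f(x_i)+b; labels in {0, 1}.
    Uses Platt's smoothed targets t to avoid overfitting."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    t = [t_pos if y == 1 else t_neg for y in labels]
    a1, a2 = 0.0, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(a1 * f + a2))) for f in scores]
        # gradient Z'(t - p), with z_i = (f_i, 1)
        g1 = sum(f * (ti - pi) for f, ti, pi in zip(scores, t, p))
        g2 = sum(ti - pi for ti, pi in zip(t, p))
        # Hessian Z'WZ, W = diag(p(1-p)); a symmetric 2x2 matrix
        w = [pi * (1.0 - pi) for pi in p]
        h11 = sum(wi * f * f for wi, f in zip(w, scores))
        h12 = sum(wi * f for wi, f in zip(w, scores))
        h22 = sum(w)
        det = h11 * h22 - h12 * h12
        if abs(det) < 1e-12:
            break
        # Newton step a <- a + (Z'WZ)^{-1} Z'(t - p), 2x2 inverse in closed form
        a1 += (h22 * g1 - h12 * g2) / det
        a2 += (-h12 * g1 + h11 * g2) / det
    return a1, a2
```

On a toy set of scores the fitted a1 comes out positive, so large positive decision values map to probabilities above 1/2, as expected.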
SVM and probabilities: pairwise coupling
From pairwise probabilities IP(c_ℓ, c_j) to class probabilities p_ℓ = IP(c_ℓ|x):

min_p Σ_{ℓ=1}^c Σ_{j=1}^{ℓ−1} IP(c_ℓ, c_j)² (p_ℓ − p_j)²   ⟺   [ Q  e ; eᵀ 0 ] [ p ; µ ] = [ 0 ; 1 ]

with Q_{ℓj} = Σ_{i≠ℓ} IP(c_ℓ, c_i)²  if ℓ = j,   Q_{ℓj} = −IP(c_ℓ, c_j)²  if ℓ ≠ j

The global procedure:
1 (Xa, ya, Xt, yt) ← split(X, y)
2 (Xℓ, yℓ, Xp, yp) ← split(Xa, ya)
3 loop for all pairs (ci, cj) of classes
  1 model_{i,j} ← train_SVM(Xℓ, yℓ, (ci, cj))
  2 IP(ci, cj) ← estimate_proba(Xp, yp, model_{i,j}) % Platt estimate
4 p ← post_process(Xt, yt, IP) % pairwise coupling
Wu, Lin & Weng, 2004; Duan & Keerthi, 05
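The coupling step (post_process above) can be sketched as the solution of one small linear system, here in the second formulation of Wu, Lin & Weng (2004), which the slide cites. The helper names and the toy pairwise matrix are ours; r[i][j] stands for a Platt-type estimate of IP(y = c_i | y ∈ {c_i, c_j}):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x

def pairwise_coupling(r):
    """Class probabilities p from pairwise estimates r[i][j],
    per Wu, Lin & Weng (2004), second method: solve the KKT system
    [Q e; e' 0][p; mu] = [0; 1]."""
    c = len(r)
    Q = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(c):
            if i == j:
                Q[i][i] = sum(r[k][i] ** 2 for k in range(c) if k != i)
            else:
                Q[i][j] = -r[j][i] * r[i][j]
    A = [Q[i] + [1.0] for i in range(c)] + [[1.0] * c + [0.0]]
    return solve(A, [0.0] * c + [1.0])[:c]
```

With a pairwise matrix where class 1 dominates both of its pairwise contests, the coupled p assigns it the largest probability and sums to one.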
SVM and probabilities
Some facts
SVM is universally consistent (converges towards the Bayes risk)
SVM asymptotically implements the Bayes rule
but theoretically there is no consistency towards conditional probabilities (due
to the nature of sparsity): estimating conditional probabilities on an interval
(typically [1/2 − η, 1/2 + η]) is antagonistic to sparseness on this interval
(all data points there have to be support vectors)
Bartlett & Tewari, JMLR, 07
SVM and probabilities (2/2)
An alternative approach: bracket the conditional probability

g(x) − ε−(x) ≤ IP(Y = 1|x) ≤ g(x) + ε+(x)

with g(x) = 1 / (1 + 4^{−f(x)−α0})

The non parametric functions ε− and ε+ have to verify:
g(x) + ε+(x) = exp(−a1(1 − f(x) − α0)+ + a2)
1 − g(x) − ε−(x) = exp(−a1(1 + f(x) + α0)+ + a2)
with a1 = log 2 and a2 = 0
Grandvalet et al., 07
Roadmap
1 Multi Class SVM
3 different strategies for multi class SVM
Multi Class SVM by decomposition
Multi class SVM
Coupling convex hulls
Multi class SVM: the decision function
One hyperplane per class:
f_ℓ(x) = w_ℓᵀx + b_ℓ,   ℓ = 1, ..., c

Winner takes all decision function:
D(x) = Argmax_{ℓ=1,...,c} ( w_1ᵀx + b_1, w_2ᵀx + b_2, ..., w_ℓᵀx + b_ℓ, ..., w_cᵀx + b_c )

We can revisit the 2 classes case in this setting
c × (d + 1) unknown variables (w_ℓ, b_ℓ); ℓ = 1, ..., c
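The winner-takes-all rule above is one line of code; a minimal sketch (the helper name wta_decision is ours), with one weight vector and bias per class:

```python
def wta_decision(x, W, b):
    """Winner-takes-all: D(x) = argmax_l (w_l' x + b_l).
    W is a list of c weight vectors, b a list of c biases."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + bl
              for w, bl in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With three linear scores per point, the returned index is the class whose hyperplane scores highest, exactly the Argmax of the slide.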
Multi class SVM: the optimization problem
The margin in the multidimensional case:
m = min_{ℓ≠y_i} ( v_{y_i}ᵀx_i + a_{y_i} − v_ℓᵀx_i − a_ℓ ) = v_{y_i}ᵀx_i + a_{y_i} − max_{ℓ≠y_i} ( v_ℓᵀx_i + a_ℓ )

The maximal margin multiclass SVM:
max_{v_ℓ, a_ℓ} m
with v_{y_i}ᵀx_i + a_{y_i} − v_ℓᵀx_i − a_ℓ ≥ m   for i = 1, n; ℓ = 1, c; ℓ ≠ y_i
and 1/2 Σ_{ℓ=1}^c ‖v_ℓ‖² = 1
The multiclass SVM:
min_{w_ℓ, b_ℓ} 1/2 Σ_{ℓ=1}^c ‖w_ℓ‖²
with x_iᵀ(w_{y_i} − w_ℓ) + b_{y_i} − b_ℓ ≥ 1   for i = 1, n; ℓ = 1, c; ℓ ≠ y_i
Multi class SVM: KKT and dual form: The 3 classes case
min_{w_ℓ, b_ℓ} 1/2 Σ_{ℓ=1}^3 ‖w_ℓ‖²
with w_{y_i}ᵀx_i + b_{y_i} ≥ w_ℓᵀx_i + b_ℓ + 1   for i = 1, n; ℓ = 1, 3; ℓ ≠ y_i

that is:
min_{w_ℓ, b_ℓ} 1/2 ‖w1‖² + 1/2 ‖w2‖² + 1/2 ‖w3‖²
with w1ᵀx_i + b1 ≥ w2ᵀx_i + b2 + 1   for i such that y_i = 1
     w1ᵀx_i + b1 ≥ w3ᵀx_i + b3 + 1   for i such that y_i = 1
     w2ᵀx_i + b2 ≥ w1ᵀx_i + b1 + 1   for i such that y_i = 2
     w2ᵀx_i + b2 ≥ w3ᵀx_i + b3 + 1   for i such that y_i = 2
     w3ᵀx_i + b3 ≥ w1ᵀx_i + b1 + 1   for i such that y_i = 3
     w3ᵀx_i + b3 ≥ w2ᵀx_i + b2 + 1   for i such that y_i = 3

L = 1/2 ( ‖w1‖² + ‖w2‖² + ‖w3‖² ) − α12ᵀ( X1(w1 − w2) + b1 − b2 − 1 )
                                   − α13ᵀ( X1(w1 − w3) + b1 − b3 − 1 )
                                   − α21ᵀ( X2(w2 − w1) + b2 − b1 − 1 )
                                   − α23ᵀ( X2(w2 − w3) + b2 − b3 − 1 )
                                   − α31ᵀ( X3(w3 − w1) + b3 − b1 − 1 )
                                   − α32ᵀ( X3(w3 − w2) + b3 − b2 − 1 )
Multi class SVM: KKT and dual form: The 3 classes case
L = 1/2 ‖w‖² − αᵀ( X M w + A b − 1 )
with
w = [ w1 ; w2 ; w3 ] ∈ IR^{3d}

M = M̄ ⊗ I = [  I −I  0
                I  0 −I
               −I  I  0
                0  I −I
               −I  0  I
                0 −I  I ]   a 6d × 3d matrix, where I is the d × d identity matrix

X = blockdiag( X1, X1, X2, X2, X3, X3 )   a 2n × 6d matrix

with input data X = [ X1 ; X2 ; X3 ], an n × d matrix (X_ℓ stacks the points of class ℓ)
Multi class SVM: KKT and dual form: The 3 classes case
KKT stationarity conditions:
∇w L = w − Mᵀ Xᵀ α = 0
∇b L = −Aᵀ α = 0

The dual:
min_{α ∈ IR^{2n}} 1/2 αᵀ G α − eᵀ α
with Aᵀ α = 0
and 0 ≤ α

with
G = X M Mᵀ Xᵀ
  = X (M̄ ⊗ I)(M̄ᵀ ⊗ I) Xᵀ
  = X (M̄ M̄ᵀ ⊗ I) Xᵀ
i.e. blockwise G_{kℓ} = (M̄ M̄ᵀ)_{kℓ} X_{c(k)} X_{c(ℓ)}ᵀ: each entry of M̄ M̄ᵀ scales the corresponding block of the Gram (kernel) matrix K

and M̄ = [  1 −1  0
            1  0 −1
           −1  1  0
            0  1 −1
           −1  0  1
            0 −1  1 ]
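The structure of M = M̄ ⊗ I is easy to materialise; the sketch below (plain Python, with the Kronecker product written out by hand, and d = 2 chosen arbitrarily) builds the 6d × 3d matrix and checks that its first block row is [I, −I, 0]:

```python
def kron(A, B):
    """Kronecker product A ⊗ B for matrices given as lists of lists."""
    ra, ca, rb, cb = len(A), len(A[0]), len(B), len(B[0])
    return [[A[i // rb][j // cb] * B[i % rb][j % cb]
             for j in range(ca * cb)] for i in range(ra * rb)]

# the 6 x 3 sign matrix of the 3-class problem
Mbar = [[1, -1, 0], [1, 0, -1], [-1, 1, 0],
        [0, 1, -1], [-1, 0, 1], [0, -1, 1]]
d = 2
I = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
M = kron(Mbar, I)  # a 6d x 3d matrix whose blocks are +-I or 0
```

The same kron helper gives M̄ M̄ᵀ ⊗ I when applied to the product, which is how the dual matrix G is assembled block by block.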
Multi class SVM and slack variables (2 variants)
A slack for all (Vapnik & Blanz, Weston & Watkins 1998)

min_{w_ℓ, b_ℓ, ξ ∈ IR^{cn}} 1/2 Σ_{ℓ=1}^c ‖w_ℓ‖² + C Σ_{i=1}^n Σ_{ℓ=1, ℓ≠y_i}^c ξ_{iℓ}
with w_{y_i}ᵀx_i + b_{y_i} − w_ℓᵀx_i − b_ℓ ≥ 1 − ξ_{iℓ}
and ξ_{iℓ} ≥ 0   for i = 1, n; ℓ = 1, c; ℓ ≠ y_i

The dual:
min_{α ∈ IR^{2n}} 1/2 αᵀ G α − eᵀ α
with Aᵀ α = 0
and 0 ≤ α ≤ C

Max error, a slack per training data (Crammer and Singer, 2001)

min_{w_ℓ, ξ ∈ IR^n} 1/2 Σ_{ℓ=1}^c ‖w_ℓ‖² + C Σ_{i=1}^n ξ_i
with (w_{y_i} − w_ℓ)ᵀx_i ≥ 1 − ξ_i   for i = 1, n; ℓ = 1, c; ℓ ≠ y_i
and ξ_i ≥ 0   for i = 1, n
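The two variants differ only in how the per-pair hinge terms enter the penalty; a small sketch (the helper name slacks is ours, and biases are omitted, as in the Crammer-Singer form) computes both penalties for given weights:

```python
def slacks(W, X, y):
    """Penalty terms of both variants for fixed class weights W.
    Weston-Watkins sums one slack per (i, l) pair; Crammer-Singer
    keeps only the worst violation per training point."""
    ww_total, cs_total = 0.0, 0.0
    for x, yi in zip(X, y):
        s = [sum(wk * xk for wk, xk in zip(w, x)) for w in W]
        hinge = [max(0.0, 1.0 - (s[yi] - s[l]))
                 for l in range(len(W)) if l != yi]
        ww_total += sum(hinge)  # Weston & Watkins: sum of slacks
        cs_total += max(hinge)  # Crammer & Singer: max slack
    return ww_total, cs_total
```

For a point violating two margins by 0.6 and 0.2, the sum variant pays 0.8 while the max variant pays only 0.6, which is why the two formulations can select different solutions.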
Multi class SVM and Kernels

min_{f ∈ H, b, ξ ∈ IR^{cn}} 1/2 Σ_{ℓ=1}^c ‖f_ℓ‖²_H + C Σ_{i=1}^n Σ_{ℓ=1, ℓ≠y_i}^c ξ_{iℓ}
with f_{y_i}(x_i) + b_{y_i} − f_ℓ(x_i) − b_ℓ ≥ 1 − ξ_{iℓ}
and ξ_{iℓ} ≥ 0   for i = 1, n; ℓ = 1, c; ℓ ≠ y_i

The dual:
min_{α ∈ IR^{2n}} 1/2 αᵀ G α − eᵀ α
with Aᵀ α = 0
and 0 ≤ α ≤ C
where G is the multi class kernel matrix
Other Multi class SVM
Lee, Lin & Wahba, 2004:

min_{f ∈ H} λ/2 Σ_{ℓ=1}^c ‖f_ℓ‖²_H + 1/n Σ_{i=1}^n Σ_{ℓ=1, ℓ≠y_i}^c ( f_ℓ(x_i) + 1/(c − 1) )₊
with Σ_{ℓ=1}^c f_ℓ(x) = 0   ∀x

Structured outputs: Crammer and Singer, 2001
MSVMpack: a Multi-Class Support Vector Machine Package, Fabien Lauer & Yann Guermeur
Roadmap
1 Multi Class SVM
3 different strategies for multi class SVM
Multi Class SVM by decomposition
Multi class SVM
Coupling convex hulls
One more way to derive the SVM
Minimizing the distance between the convex hulls:

min_α ‖u − v‖²
with u(x) = Σ_{i|y_i=1} α_i (x_iᵀx),   v(x) = Σ_{i|y_i=−1} α_i (x_iᵀx)
and Σ_{i|y_i=1} α_i = 1,   Σ_{i|y_i=−1} α_i = 1,   0 ≤ α_i, i = 1, n
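The hull-distance problem above can be solved by a simple Frank-Wolfe (Gilbert-style) iteration over the two simplices; a sketch in plain Python (the function name, the iteration cap, and the stopping rule are ours):

```python
def hull_distance_sq(pos, neg, n_iter=200):
    """min ||u - v||^2 over u in conv(pos), v in conv(neg), by Frank-Wolfe
    with exact line search on the quadratic objective."""
    ap = [1.0 / len(pos)] * len(pos)   # alpha weights on the positive hull
    an = [1.0 / len(neg)] * len(neg)   # alpha weights on the negative hull
    dim = len(pos[0])
    for _ in range(n_iter):
        u = [sum(a * p[k] for a, p in zip(ap, pos)) for k in range(dim)]
        v = [sum(a * q[k] for a, q in zip(an, neg)) for k in range(dim)]
        w = [uk - vk for uk, vk in zip(u, v)]
        # linear minimization oracle: best vertex of each simplex
        i = min(range(len(pos)),
                key=lambda t: sum(wk * xk for wk, xk in zip(w, pos[t])))
        j = max(range(len(neg)),
                key=lambda t: sum(wk * xk for wk, xk in zip(w, neg[t])))
        ws = [pos[i][k] - neg[j][k] for k in range(dim)]
        d = [wsk - wk for wsk, wk in zip(ws, w)]
        dd = sum(dk * dk for dk in d)
        if dd < 1e-12:
            break  # the current point is already the chosen vertex pair
        # exact line search for ||w + gamma*d||^2, clipped to [0, 1]
        gamma = max(0.0, min(1.0, -sum(wk * dk for wk, dk in zip(w, d)) / dd))
        ap = [(1 - gamma) * a + (gamma if t == i else 0.0) for t, a in enumerate(ap)]
        an = [(1 - gamma) * a + (gamma if t == j else 0.0) for t, a in enumerate(an)]
    u = [sum(a * p[k] for a, p in zip(ap, pos)) for k in range(dim)]
    v = [sum(a * q[k] for a, q in zip(an, neg)) for k in range(dim)]
    return sum((uk - vk) ** 2 for uk, vk in zip(u, v))
```

On two separated segments in the plane the iteration recovers the closest vertex pair, and w = u − v is (up to scaling) the SVM normal vector.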
The multi class case:

min_α Σ_{ℓ=1}^c Σ_{ℓ'=1}^c ‖u_ℓ − u_{ℓ'}‖²
with u_ℓ(x) = Σ_{i|y_i=ℓ} α_{i,ℓ} (x_iᵀx),   ℓ = 1, c
and Σ_{i|y_i=ℓ} α_{i,ℓ} = 1,   0 ≤ α_{i,ℓ},   i = 1, n; ℓ = 1, c
Bibliography
Estimating probabilities
Platt, J. (2000). Probabilistic outputs for support vector machines and
comparison to regularized likelihood methods. In Advances in large
margin classifiers. MIT Press.
T. Lin, C.-J. Lin, R.C. Weng, A note on Platt’s probabilistic outputs
for support vector machines, Mach. Learn. 68 (2007) 267–276
http://www.cs.cornell.edu/courses/cs678/2007sp/platt.pdf
Multiclass SVM
K.-B. Duan & S. Keerthi (2005). "Which Is the Best Multiclass SVM
Method? An Empirical Study".
T.-F. Wu, C.-J. Lin, R.C. Weng, Probability estimates for multi-class
classification by pairwise coupling, JMLR. 5 (2004) 975–1005.
K. Crammer & Y. Singer (2001). "On the Algorithmic Implementation
of Multiclass Kernel-based Vector Machines". JMLR 2: 265–292.
Lee, Y.; Lin, Y.; and Wahba, G. (2001). "Multicategory Support
Vector Machines". Computing Science and Statistics 33.
http://www.loria.fr/~guermeur/NN2008_M_SVM_YG.pdf
http://jmlr.org/papers/volume12/lauer11a/lauer11a.pdf
Stéphane Canu (INSA Rouen - LITIS) April 10, 2014 25 / 25
