Lecture 9: Multi Kernel SVM
Stéphane Canu
stephane.canu@litislab.eu
Sao Paulo 2014
April 16, 2014
Roadmap
1. Tuning the kernel: MKL
   - The multiple kernel problem
   - Sparse kernel machines for regression: SVR
   - SimpleMKL: the multiple kernel solution
Standard Learning with Kernels
[Diagram: the user supplies a kernel k and data to the learning machine, which returns a function f.]
http://www.cs.nyu.edu/~mohri/icml2011-tutorial/tutorial-icml2011-2.pdf
Stéphane Canu (INSA Rouen - LITIS) April 16, 2014
Learning Kernel framework
[Diagram: the user supplies a family of kernels km and data to the learning machine, which returns both f and the kernel k(., .).]
from SVM
SVM: single kernel k
f(x) = \sum_{i=1}^{n} \alpha_i k(x, x_i) + b
http://www.nowozin.net/sebastian/talks/ICCV-2009-LPbeta.pdf
from SVM → to Multiple Kernel Learning (MKL)
SVM: single kernel k
MKL: a set of M kernels k_1, ..., k_m, ..., k_M
- learn the classifier and the combination weights
- can be cast as a convex optimization problem
f(x) = \sum_{i=1}^{n} \alpha_i \sum_{m=1}^{M} d_m k_m(x, x_i) + b, \qquad \sum_{m=1}^{M} d_m = 1 \text{ and } 0 \le d_m
     = \sum_{i=1}^{n} \alpha_i K(x, x_i) + b \quad \text{with} \quad K(x, x_i) = \sum_{m=1}^{M} d_m k_m(x, x_i)
Multiple Kernel
The model
f(x) = \sum_{i=1}^{n} \alpha_i \sum_{m=1}^{M} d_m k_m(x, x_i) + b, \qquad \sum_{m=1}^{M} d_m = 1 \text{ and } 0 \le d_m
Given M kernel functions k_1, ..., k_M that are potentially well suited for a given problem, find a positive linear combination of these kernels such that the resulting kernel k is "optimal":
k(x, x') = \sum_{m=1}^{M} d_m k_m(x, x'), \quad \text{with } d_m \ge 0, \ \sum_m d_m = 1
Learning together: the kernel coefficients d_m and the SVM parameters \alpha_i, b.
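As a concrete sketch of the model above (not from the slides: the RBF base kernels and the weights are illustrative), the combined Gram matrix K = \sum_m d_m K_m can be formed directly:

```python
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix of the Gaussian kernel k(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def combined_gram(X, gammas, d):
    """K = sum_m d_m K_m with d on the simplex (d_m >= 0, sum_m d_m = 1)."""
    d = np.asarray(d, dtype=float)
    assert (d >= 0).all() and np.isclose(d.sum(), 1.0)
    return sum(dm * rbf_gram(X, g) for dm, g in zip(d, gammas))

X = np.array([[0.0], [0.5], [2.0]])
K = combined_gram(X, gammas=[0.1, 1.0, 10.0], d=[0.2, 0.5, 0.3])
# Each K_m is positive semi-definite, so their convex combination K is too,
# hence K is a valid SVM kernel (here diag(K) = 1 since each k_m(x, x) = 1).
```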
Multiple Kernel: illustration
Multiple Kernel Strategies
- Wrapper method (Weston et al., 2000; Chapelle et al., 2002):
  - solve the SVM
  - gradient descent on d_m on a criterion: margin criterion or span criterion
- Kernel Learning & Feature Selection: use kernels as a dictionary
- Embedded Multi Kernel Learning (MKL)
Multiple Kernel functional Learning
The problem (for a given C):
\min_{f \in \mathcal{H}, b, \xi, d} \ \frac{1}{2}\|f\|_{\mathcal{H}}^2 + C \sum_i \xi_i
with y_i \big( f(x_i) + b \big) \ge 1 - \xi_i ; \quad \xi_i \ge 0 \ \forall i
\sum_{m=1}^{M} d_m = 1, \ d_m \ge 0 \ \forall m,
f = \sum_m f_m \quad \text{and} \quad k(x, x') = \sum_{m=1}^{M} d_m k_m(x, x'), \ \text{with } d_m \ge 0
The functional framework:
\mathcal{H} = \bigoplus_{m=1}^{M} \mathcal{H}'_m, where \mathcal{H}'_m is \mathcal{H}_m equipped with the rescaled inner product \langle f, g \rangle_{\mathcal{H}'_m} = \frac{1}{d_m} \langle f, g \rangle_{\mathcal{H}_m}
Multiple Kernel functional Learning
The problem (for a given C):
\min_{\{f_m\}, b, \xi, d} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i
with y_i \Big( \sum_m f_m(x_i) + b \Big) \ge 1 - \xi_i ; \quad \xi_i \ge 0 \ \forall i
\sum_m d_m = 1, \ d_m \ge 0 \ \forall m,
Treated as a bi-level optimization task:
\min_{d \in \mathbb{R}^M} \Big\{ \min_{\{f_m\}, b, \xi} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i \quad \text{with } y_i \big( \sum_m f_m(x_i) + b \big) \ge 1 - \xi_i, \ \xi_i \ge 0 \ \forall i \Big\}
s.t. \sum_m d_m = 1, \ d_m \ge 0 \ \forall m.
Multiple Kernel representer theorem and dual
The Lagrangian:
\mathcal{L} = \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i - \sum_i \alpha_i \Big( y_i \big( \sum_m f_m(x_i) + b \big) - 1 + \xi_i \Big) - \sum_i \beta_i \xi_i
Associated KKT stationarity conditions:
\nabla_{f_m} \mathcal{L} = 0 \ \Leftrightarrow \ \frac{1}{d_m} f_m(\bullet) = \sum_{i=1}^{n} \alpha_i y_i k_m(\bullet, x_i), \quad m = 1, \dots, M
Representer theorem:
f(\bullet) = \sum_m f_m(\bullet) = \sum_{i=1}^{n} \alpha_i y_i \underbrace{\sum_m d_m k_m(\bullet, x_i)}_{K(\bullet, x_i)}
We have a standard SVM problem with respect to the function f and kernel K.
Multiple Kernel Algorithm
Use a reduced gradient algorithm^1:
\min_{d \in \mathbb{R}^M} J(d) \quad \text{s.t.} \ \sum_m d_m = 1, \ d_m \ge 0 \ \forall m
SimpleMKL algorithm:
set d_m = 1/M for m = 1, ..., M
while stopping criterion not met do
    compute J(d) using a QP solver with K = \sum_m d_m K_m
    compute \partial J / \partial d_m, and the projected gradient as a descent direction D
    \gamma \leftarrow compute the optimal step size
    d \leftarrow d + \gamma D
end while
→ Improvement reported using the Hessian.
^1 Rakotomamonjy et al., JMLR 2008
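A minimal numpy sketch of this loop, under loud assumptions: the QP solver is replaced by a crude projected-gradient solver for a no-intercept SVM dual, the exact line search by backtracking, and the data, kernels, and step sizes are illustrative. This is not the SimpleMKL toolbox, only the shape of the algorithm:

```python
import numpy as np

def rbf(X, gamma):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def svm_dual(K, y, C, iters=2000):
    """Stand-in for the QP solver: projected gradient on the (min-form,
    no-intercept) SVM dual J(a) = 0.5 a'Qa - e'a subject to 0 <= a <= C."""
    Q = np.outer(y, y) * K
    lr = 1.0 / (np.linalg.eigvalsh(Q).max() + 1e-12)
    a = np.zeros(len(y))
    for _ in range(iters):
        a = np.clip(a - lr * (Q @ a - 1.0), 0.0, C)
    return a, 0.5 * a @ Q @ a - a.sum()

def simple_mkl(Ks, y, C=1.0, outer=20, gamma0=0.5):
    M = len(Ks)
    d = np.full(M, 1.0 / M)                      # d_m = 1/M
    a, J = svm_dual(sum(dm * Km for dm, Km in zip(d, Ks)), y, C)
    for _ in range(outer):
        beta = a * y
        # dJ/dd_m at fixed alpha: -0.5 * beta' K_m beta
        g = np.array([-0.5 * beta @ Km @ beta for Km in Ks])
        mu = int(np.argmax(d))                   # reference component
        D = g[mu] - g                            # reduced descent direction
        D[(d <= 1e-12) & (D < 0)] = 0.0          # keep zero weights feasible
        D[mu] = 0.0
        D[mu] = -D.sum()                         # so that sum_m D_m = 0
        gamma, improved = gamma0, False          # backtracking step size
        for _ in range(20):
            d_new = np.clip(d + gamma * D, 0.0, None)
            d_new /= d_new.sum()                 # stay on the simplex
            a_new, J_new = svm_dual(sum(dm * Km for dm, Km in zip(d_new, Ks)), y, C)
            if J_new <= J:
                d, a, J, improved = d_new, a_new, J_new, True
                break
            gamma *= 0.5
        if not improved:                         # no decreasing step found
            break
    return d, J

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (10, 2)), rng.normal(1, 0.3, (10, 2))])
y = np.array([-1.0] * 10 + [1.0] * 10)
Ks = [rbf(X, g) for g in (0.1, 1.0, 10.0)]
d, J = simple_mkl(Ks, y)
# d stays on the simplex: d_m >= 0 and sum_m d_m = 1
```

The backtracking acceptance test guarantees that J never increases, which is the property the slide's "optimal stepsize" line search also ensures.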
Computing the reduced gradient
At the optimum, the primal cost equals the dual cost:
\underbrace{\frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i}_{\text{primal cost}} = \underbrace{\frac{1}{2} \alpha^\top G \alpha - e^\top \alpha}_{\text{dual cost}}
with G = \sum_m d_m G_m where G_{m,ij} = k_m(x_i, x_j).
The dual cost is easier to differentiate:
\nabla_{d_m} J(d) = \frac{1}{2} \alpha^\top G_m \alpha
Reduce (or project) to preserve the constraint \sum_m d_m = 1, i.e. \sum_m D_m = 0:
D_m = \nabla_{d_m} J(d) - \nabla_{d_1} J(d) \quad \text{and} \quad D_1 = -\sum_{m=2}^{M} D_m
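This reduction can be checked numerically: with hypothetical gradient values, the direction D sums to zero, so any step d + \gamma D stays on the hyperplane \sum_m d_m = 1 (positivity still has to be handled separately):

```python
import numpy as np

g = np.array([0.7, 0.2, 0.1])   # hypothetical gradients dJ/dd_m
D = g - g[0]                    # D_m = dJ/dd_m - dJ/dd_1 for m >= 2
D[0] = -D[1:].sum()             # D_1 = -sum_{m>=2} D_m, hence sum(D) = 0

d = np.full(3, 1.0 / 3.0)
d_new = d + 0.1 * D             # the equality constraint holds for any step
```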
Complexity
For each iteration:
- SVM training: O(n n_sv + n_sv^3). Inverting K_{sv,sv} is O(n_sv^3), but it might already be available as a by-product of the SVM training.
- Computing H: O(M n_sv^2)
- Finding d: O(M^3)
The number of iterations is usually fewer than 10.
→ When M < n_sv, computing d is no more expensive than the QP.
MKL on the 101-caltech dataset
http://www.robots.ox.ac.uk/~vgg/software/MKL/
Support vector regression (SVR)
The t-insensitive loss:
\min_{f \in \mathcal{H}} \ \frac{1}{2} \|f\|_{\mathcal{H}}^2 \quad \text{with } |f(x_i) - y_i| \le t, \ i = 1, \dots, n
Support vector regression introduces slack variables (SVR):
\min_{f \in \mathcal{H}} \ \frac{1}{2} \|f\|_{\mathcal{H}}^2 + C \sum_i |\xi_i| \quad \text{with } |f(x_i) - y_i| \le t + \xi_i, \ 0 \le \xi_i, \ i = 1, \dots, n
- a typical multi-parametric quadratic program (mpQP)
- piecewise linear regularization path:
\alpha(C, t) = \alpha(C_0, t_0) + \Big( \frac{1}{C} - \frac{1}{C_0} \Big) u + \frac{1}{C_0} (t - t_0) v
- a 2d Pareto front (the tube width and the regularity)
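The slack variables above implement the t-insensitive loss max(|f(x_i) - y_i| - t, 0), zero inside the tube and linear outside; a small numpy illustration (the residual values are chosen for the example):

```python
import numpy as np

def t_insensitive(residual, t):
    """t-insensitive loss: zero inside the tube |r| <= t, linear outside."""
    return np.maximum(np.abs(residual) - t, 0.0)

r = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
loss = t_insensitive(r, t=1.0)
# → [1., 0., 0., 0., 1.]: only residuals leaving the tube are penalized
```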
Support vector regression illustration
[Two plots of "Support Vector Machine Regression" (x vs. y): left, C large; right, C small.]
There exist other formulations, such as LP SVR...
Multiple Kernel Learning for regression
The problem (for given C and t):
\min_{\{f_m\}, b, \xi, d} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i
s.t. \Big| \sum_m f_m(x_i) + b - y_i \Big| \le t + \xi_i \ \forall i, \quad \xi_i \ge 0 \ \forall i
\sum_m d_m = 1, \ d_m \ge 0 \ \forall m,
Regularization formulation:
\min_{\{f_m\}, b, d} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \max\Big( \big| \sum_m f_m(x_i) + b - y_i \big| - t, \ 0 \Big)
\sum_m d_m = 1, \ d_m \ge 0 \ \forall m,
Equivalently:
\min_{\{f_m\}, b, \xi, d} \ \sum_i \max\Big( \big| \sum_m f_m(x_i) + b - y_i \big| - t, \ 0 \Big) + \frac{1}{2C} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + \mu \sum_m |d_m|
Multiple Kernel functional Learning
The problem (for given C and t):
\min_{\{f_m\}, b, \xi, d} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i
s.t. \Big| \sum_m f_m(x_i) + b - y_i \Big| \le t + \xi_i \ \forall i, \quad \xi_i \ge 0 \ \forall i
\sum_m d_m = 1, \ d_m \ge 0 \ \forall m,
Treated as a bi-level optimization task:
\min_{d \in \mathbb{R}^M} \Big\{ \min_{\{f_m\}, b, \xi} \ \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i \quad \text{s.t. } \big| \sum_m f_m(x_i) + b - y_i \big| \le t + \xi_i, \ \xi_i \ge 0 \ \forall i \Big\}
s.t. \sum_m d_m = 1, \ d_m \ge 0 \ \forall m.
Multiple Kernel experiments
[Four test signals with their estimates, x in [0, 1]: LinChirp, Wave, Blocks, Spikes.]

            Single Kernel     Kernel Dil               Kernel Dil-Trans
Data Set    Norm. MSE (%)     #Kernel   Norm. MSE      #Kernel   Norm. MSE
LinChirp    1.46 ± 0.28       7.0       1.00 ± 0.15    21.5      0.92 ± 0.20
Wave        0.98 ± 0.06       5.5       0.73 ± 0.10    20.6      0.79 ± 0.07
Blocks      1.96 ± 0.14       6.0       2.11 ± 0.12    19.4      1.94 ± 0.13
Spike       6.85 ± 0.68       6.1       6.97 ± 0.84    12.8      5.58 ± 0.84

Table: Normalized mean square error averaged over 20 runs.
Conclusion on multiple kernel (MKL)
- MKL: kernel tuning, variable selection, ...
- extension to classification and one-class SVM
- SVM-KM: an efficient Matlab toolbox (available at MLOSS)^2
- Multiple Kernels for Image Classification: Software and Experiments on Caltech-101^3
- new trend: multi kernel, multi task, and an infinite number of kernels
^2 http://mloss.org/software/view/33/
^3 http://www.robots.ox.ac.uk/~vgg/software/MKL/
Bibliography
A. Rakotomamonjy, F. Bach, S. Canu & Y. Grandvalet. SimpleMKL. J. Mach. Learn. Res. 2008, 9:2491–2521.
M. Gönen & E. Alpaydin. Multiple kernel learning algorithms. J. Mach. Learn. Res. 2011, 12:2211–2268.
http://www.cs.nyu.edu/~mohri/icml2011-tutorial/tutorial-icml2011-2.pdf
http://www.robots.ox.ac.uk/~vgg/software/MKL/
http://www.nowozin.net/sebastian/talks/ICCV-2009-LPbeta.pdf