Support Vector Machine
  (and Statistical Learning Theory)

           Tutorial
             Jason Weston
          NEC Labs America
  4 Independence Way, Princeton, USA.
         jasonw@nec-labs.com
1 Support Vector Machines: history
 • SVMs were introduced in COLT-92 by Boser, Guyon & Vapnik, and have
   become rather popular since.

 • Theoretically well motivated algorithm: developed from Statistical
   Learning Theory (Vapnik & Chervonenkis) since the 60s.

 • Empirically good performance: successful applications in many
   fields (bioinformatics, text, image recognition, . . . )
2 Support Vector Machines: history II
 • Centralized website: www.kernel-machines.org.

 • Several textbooks, e.g. ”An Introduction to Support Vector
   Machines” by Cristianini and Shawe-Taylor.
 • A large and diverse community works on them: from machine
   learning, optimization, statistics, neural networks, functional
   analysis, etc.
3 Support Vector Machines: basics
[Boser, Guyon, Vapnik ’92],[Cortes & Vapnik ’95]


[Figure: a hyperplane separating ”+” and ”−” training points, with the margin
on each side indicated.]
Nice properties: convex, theoretically motivated, nonlinear with kernels..
4 Preliminaries:
 • Machine learning is about learning structure from data.

 • Although the class of algorithms called ”SVM”s can do more, in this
   talk we focus on pattern recognition.

 • So we want to learn the mapping: X → Y, where x ∈ X is some
   object and y ∈ Y is a class label.
 • Let’s take the simplest case: 2-class classification. So: x ∈ Rn ,
   y ∈ {±1}.
5 Example:

Suppose we have 50 photographs of elephants and 50 photos of tigers.




                                  vs.

We digitize them into 100 × 100 pixel images, so we have x ∈ Rn where
n = 10,000.
Now, given a new (different) photograph we want to answer the question:
is it an elephant or a tiger? [we assume it is one or the other.]
6 Training sets and prediction models
 • input/output sets X , Y

 • training set (x1 , y1 ), . . . , (xm , ym )
 • ”generalization”: given a previously unseen x ∈ X , find a suitable
   y ∈ Y.

 • i.e., want to learn a classifier: y = f (x, α), where α are the
   parameters of the function.
 • For example, if we are choosing our model from the set of
   hyperplanes in Rn , then we have:

                          f (x, {w, b}) = sign(w · x + b).
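
As a minimal illustration, the sketch below implements this decision function in
numpy; the particular w and b are made-up numbers, not learned parameters.

```python
import numpy as np

def predict(x, w, b):
    """f(x, {w, b}) = sign(w . x + b)"""
    return np.sign(np.dot(w, x) + b)

# Made-up parameters for illustration only (not learned from data).
w, b = np.array([1.0, -2.0]), 0.5
print(predict(np.array([3.0, 1.0]), w, b))   # 1.0  (positive side)
print(predict(np.array([0.0, 2.0]), w, b))   # -1.0 (negative side)
```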
7 Empirical Risk and the true Risk
 • We can try to learn f (x, α) by choosing a function that performs well
   on training data:
                 Remp (α) = (1/m) Σi=1..m ℓ(f (xi , α), yi ) = Training Error

    where ℓ is the zero-one loss function: ℓ(y, ŷ) = 1 if y ≠ ŷ, and 0
    otherwise. Remp is called the empirical risk.
 • By doing this we are trying to minimize the overall risk:

               R(α) = ∫ ℓ(f (x, α), y) dP (x, y) = Test Error

    where P (x, y) is the (unknown) joint distribution function of x and y.
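
A small numpy sketch of the empirical risk under the zero-one loss; the toy data
and the fixed linear classifier below are assumptions for illustration only.

```python
import numpy as np

def zero_one_loss(y, y_hat):
    # l(y, y_hat) = 1 if y != y_hat, 0 otherwise
    return (y != y_hat).astype(float)

def empirical_risk(X, y, w, b):
    # R_emp = (1/m) * sum_i l(f(x_i), y_i), here for a linear classifier f
    y_hat = np.sign(X @ w + b)
    return zero_one_loss(y, y_hat).mean()

# Hypothetical toy data: four points in R^2 with labels in {-1, +1}.
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
print(empirical_risk(X, y, w=np.array([1.0, 1.0]), b=0.0))   # 0.0 (all correct)
```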
8 Choosing the set of functions
What about f (x, α) allowing all functions from X to {±1}?
Training set (x1 , y1 ), . . . , (xm , ym ) ∈ X × {±1}
Test set x̄1 , . . . , x̄m̄ ∈ X ,
such that the two sets do not intersect.
For any f there exists f ∗ such that:
 1. f ∗ (xi ) = f (xi ) for all i
 2. f ∗ (x̄j ) ≠ f (x̄j ) for all j
Based on the training data alone, there is no means of choosing which
function is better. On the test set however they give different results. So
generalization is not guaranteed.
=⇒ a restriction must be placed on the functions that we allow.
9 Empirical Risk and the true Risk
Vapnik & Chervonenkis showed that an upper bound on the true risk can
be given by the empirical risk + an additional term:


            R(α) ≤ Remp (α) + √[ (h(log(2m/h) + 1) − log(η/4)) / m ]
where h is the VC dimension of the set of functions parameterized by α.
 • The VC dimension of a set of functions is a measure of their capacity
   or complexity.
 • If you can describe a lot of different phenomena with a set of
   functions then the value of h is large.
[VC dim = the maximum number of points that can be separated in all
possible ways by that set of functions.]
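
For intuition about how the bound behaves, here is a small sketch of its
confidence term for a few values of h; the sample size m and confidence
parameter η are arbitrary choices.

```python
import math

def vc_confidence(h, m, eta):
    """Second term of the bound: sqrt((h*(log(2m/h) + 1) - log(eta/4)) / m)."""
    return math.sqrt((h * (math.log(2.0 * m / h) + 1.0) - math.log(eta / 4.0)) / m)

# With probability 1 - eta:  R(alpha) <= R_emp(alpha) + vc_confidence(h, m, eta)
for h in (10, 100, 1000):
    print(h, round(vc_confidence(h, m=10000, eta=0.05), 3))
# The confidence term grows with capacity h for a fixed sample size m.
```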
10 VC dimension:

The VC dimension of a set of functions is the maximum number of points
that can be separated in all possible ways by that set of functions. For
hyperplanes in Rn , the VC dimension can be shown to be n + 1.
[Figure: all 2³ = 8 possible labelings of three points in the plane; each labeling
can be separated by a line, illustrating that the VC dimension of hyperplanes in
R² is 3.]
11 VC dimension and capacity of functions

Simplification of bound:
  Test Error ≤ Training Error + Complexity of set of Models

 • Actually, a lot of bounds of this form have been proved (different
   measures of capacity). The complexity function is often called a
   regularizer.

 • If you take a high capacity set of functions (explain a lot) you get low
   training error. But you might ”overfit”.

 • If you take a very simple set of models, you have low complexity, but
   won’t get low training error.
12 Capacity of a set of functions (classification)




[Images taken from a talk by B. Schoelkopf.]
13 Capacity of a set of functions (regression)
[Figure: data drawn from a true function, fit both by a sine curve (high capacity)
and by a hyperplane (low capacity); axes x and y.]
14 Controlling the risk: model complexity

[Figure: the bound on the risk as a function of VC dimension h, for a nested
sequence of model sets S1 ⊂ . . . ⊂ S∗ ⊂ . . . ⊂ Sn . The bound is the sum of the
empirical risk (training error), which decreases with h, and the confidence
interval, which increases with h; it is minimized at an intermediate capacity h∗ .]
15 Capacity of hyperplanes

Vapnik & Chervonenkis also showed the following:
Consider hyperplanes (w · x) = 0 where w is normalized w.r.t. a set of
points X ∗ such that: mini |w · xi | = 1.
The set of decision functions fw (x) = sign(w · x) defined on X ∗ such
that ||w|| ≤ A has a VC dimension satisfying

                               h ≤ R²A²,

where R is the radius of the smallest sphere around the origin containing
X ∗.
=⇒ minimizing ||w||² gives low capacity
=⇒ minimizing ||w||² is equivalent to obtaining a large margin classifier
[Figure: geometry of a separating hyperplane {x | <w, x> + b = 0}, with
<w, x> + b > 0 on one side and <w, x> + b < 0 on the other; w is the normal
vector of the hyperplane.]

[Figure: the margin. Take a point x1 with yi = +1 on the hyperplane
{x | <w, x> + b = +1} and a point x2 with yi = −1 on {x | <w, x> + b = −1}.
Then:

        <w, x1> + b = +1  and  <w, x2> + b = −1   =⇒   <w, (x1 − x2)> = 2
                                                  =⇒   <w/||w||, (x1 − x2)> = 2/||w||

so the distance between the two hyperplanes (the margin) is 2/||w||.]
16 Linear Support Vector Machines (at last!)

So, we would like to find the function which minimizes an objective like:
  Training Error + Complexity term
We write that as:

                  (1/m) Σi=1..m ℓ(f (xi , α), yi ) + Complexity term

For now we will choose the set of hyperplanes (we will extend this later),
so f (x) = (w · x) + b:

                  (1/m) Σi=1..m ℓ(w · xi + b, yi ) + ||w||²

subject to mini |w · xi | = 1.
17 Linear Support Vector Machines II
The function above is a little difficult to minimize because of the step
function in ℓ(y, ŷ) (it is either 1 or 0).
Let’s assume we can separate the data perfectly. Then we can optimize
the following:
Minimize ||w||², subject to:

                       (w · xi + b) ≥ 1,    if yi = 1
                     (w · xi + b) ≤ −1,    if yi = −1

These two constraints can be combined into:

                             yi (w · xi + b) ≥ 1

This is a quadratic program.
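
A minimal sketch of this QP, assuming the cvxopt solver is available (any
off-the-shelf QP solver would do); the separable toy data below is an
illustrative assumption.

```python
import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    """Minimize ||w||^2 subject to y_i (w . x_i + b) >= 1, as a QP over z = [w, b]."""
    m, n = X.shape
    P = np.zeros((n + 1, n + 1))
    P[:n, :n] = np.eye(n)        # quadratic term: (1/2) w.w  (b is not penalized)
    P[n, n] = 1e-8               # tiny regularization on b to keep the QP well-posed
    q = np.zeros(n + 1)
    G = -y[:, None] * np.hstack([X, np.ones((m, 1))])   # -y_i [x_i, 1] z <= -1
    h = -np.ones(m)
    solvers.options['show_progress'] = False
    z = np.array(solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))['x']).ravel()
    return z[:n], z[n]           # w, b

# Toy separable data (an assumption for illustration).
X = np.array([[2.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = hard_margin_svm(X, y)
print(w, b, np.sign(X @ w + b))  # the predictions should reproduce y
```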
18 SVMs : non-separable case
To deal with the non-separable case, one can rewrite the problem as:
Minimize:
                                ||w||² + C Σi=1..m ξi

subject to:

                      yi (w · xi + b) ≥ 1 − ξi ,         ξi ≥ 0
This is just the same as the original objective:

                  (1/m) Σi=1..m ℓ(w · xi + b, yi ) + ||w||²

except ℓ is no longer the zero-one loss, but the ”hinge loss”:
ℓ(y, ŷ) = max(0, 1 − yŷ). This is still a quadratic program!
[Figure: a soft-margin hyperplane; a point that violates the margin (or is
misclassified) incurs a slack ξi measuring the size of the violation.]
19 Support Vector Machines - Primal
 • Decision function:
                            f (x) = w · x + b

 • Primal formulation:

        min P (w, b) = (1/2)||w||² + C Σi H1 [ yi f (xi ) ]

   The first term maximizes the margin, the second minimizes the training
   error. Ideally H1 would count the number of errors; we approximate it
   with the hinge loss:
   [Figure: the hinge loss H1 (z) = max(0, 1 − z), which is zero for z ≥ 1 and
   grows linearly as z decreases below 1.]
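
As a rough illustration only (real SVM packages solve the QP instead), the
primal objective above can be minimized directly by subgradient descent on
w and b:

```python
import numpy as np

def primal_svm_subgradient(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize (1/2)||w||^2 + C * sum_i H1(y_i (w . x_i + b)) by subgradient descent."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0                      # points with nonzero hinge loss
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (an assumption for illustration).
X = np.array([[2.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = primal_svm_subgradient(X, y)
print(np.sign(X @ w + b))                           # should reproduce y on this toy set
```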
20 SVMs : non-linear case
Linear classifiers aren’t complex enough sometimes. SVM solution:
Map data into a richer feature space including nonlinear features, then
construct a hyperplane in that space so all other equations are the same!
Formally, preprocess the data with:

                               x → Φ(x)

and then learn the map from Φ(x) to y:

                          f (x) = w · Φ(x) + b.
21 SVMs : polynomial mapping

                              Φ : R² → R³
           (x1 , x2 ) → (z1 , z2 , z3 ) := (x1² , √2 x1 x2 , x2² )

[Figure: data that is not linearly separable in the input space (x1 , x2 ) becomes
linearly separable in the feature space (z1 , z2 , z3 ) after applying Φ.]
22 SVMs : non-linear case II
For example, MNIST handwriting recognition:
60,000 training examples, 10,000 test examples, 28×28 pixel images.
A linear SVM has around 8.5% test error.
A polynomial SVM has around 1% test error.

                 [Figure: a sample of MNIST handwritten digit images.]
23 SVMs : full MNIST results

                        Classifier                      Test Error
                          linear                            8.4%
                          3-nearest-neighbor                2.4%
                          RBF-SVM                           1.4%
                          Tangent distance                  1.1%
                          LeNet                             1.1%
                          Boosted LeNet                     0.7%
                          Translation invariant SVM         0.56%


Choosing a good mapping Φ(·) (encoding prior knowledge + getting right
complexity of function class) for your problem improves results.
24 SVMs : the kernel trick
Problem: the dimensionality of Φ(x) can be very large, making w hard to
represent explicitly in memory, and hard for the QP to solve.
The Representer theorem (Kimeldorf & Wahba, 1971) shows that (for
SVMs as a special case):
                           w = Σi=1..m αi Φ(xi )

for some variables α. Instead of optimizing w directly we can thus
optimize α.
The decision rule is now:

                     f (x) = Σi=1..m αi Φ(xi ) · Φ(x) + b

We call K(xi , x) = Φ(xi ) · Φ(x) the kernel function.
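
A small numpy sketch of this kernelized decision rule; the coefficients α and
offset b below are placeholders, not values obtained by training.

```python
import numpy as np

def poly_kernel(x1, x2, d=2):
    # K(x, x') = (x . x' + 1)^d
    return (np.dot(x1, x2) + 1.0) ** d

def decision(x, X_train, alpha, b, kernel=poly_kernel):
    """f(x) = sum_i alpha_i K(x_i, x) + b"""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train)) + b

# Placeholder coefficients (not trained values).
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
alpha = np.array([0.5, 0.3, -0.8])
print(np.sign(decision(np.array([0.8, 0.2]), X_train, alpha, b=0.1)))
```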
25 Support Vector Machines - kernel trick II

We can rewrite all the SVM equations we saw before, substituting
w = Σi=1..m αi Φ(xi ):

 • Decision function:

                       f (x) = Σi αi Φ(xi ) · Φ(x) + b
                             = Σi αi K(xi , x) + b

 • Dual formulation:

        min P (w, b) = (1/2) || Σi=1..m αi Φ(xi ) ||² + C Σi H1 [ yi f (xi ) ]

   where the first term maximizes the margin and the second minimizes the
   training error.
26 Support Vector Machines - Dual
But people normally write it like this:
  • Dual formulation:

       minα D(α) = (1/2) Σi,j αi αj Φ(xi ) · Φ(xj ) − Σi yi αi
       s.t.   Σi αi = 0   and   0 ≤ yi αi ≤ C

  • Dual decision function:

                             f (x) = Σi αi K(xi , x) + b

  • The kernel function K(·, ·) is used to make an (implicit) nonlinear
    feature map, e.g.
       – Polynomial kernel:   K(x, x′ ) = (x · x′ + 1)^d .
       – RBF kernel:          K(x, x′ ) = exp(−γ||x − x′ ||²).
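
A sketch of solving this dual with the cvxopt QP solver (an assumed
dependency), using the same variables and constraints as above; the RBF kernel,
the toy data and the recovery of b from the unbounded support vectors are
illustrative choices.

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y, C, kernel):
    """min_a (1/2) sum_ij a_i a_j K(x_i, x_j) - sum_i y_i a_i
       s.t. sum_i a_i = 0 and 0 <= y_i a_i <= C."""
    m = len(y)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    G = np.vstack([-np.diag(y), np.diag(y)])         # 0 <= y_i a_i and y_i a_i <= C
    h = np.hstack([np.zeros(m), C * np.ones(m)])
    solvers.options['show_progress'] = False
    sol = solvers.qp(matrix(K), matrix(-y), matrix(G), matrix(h),
                     matrix(np.ones((1, m))), matrix(0.0))
    alpha = np.array(sol['x']).ravel()
    # b from any unbounded support vector, where y_i f(x_i) = 1 (assumes one exists).
    free = [i for i in range(m) if 1e-6 < y[i] * alpha[i] < C - 1e-6]
    b = np.mean([y[i] - K[i] @ alpha for i in free])
    return alpha, b

rbf = lambda x1, x2, gamma=0.5: np.exp(-gamma * np.sum((x1 - x2) ** 2))
X = np.array([[2.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, b = svm_dual(X, y, C=1.0, kernel=rbf)
print(np.sign(np.array([[rbf(xi, x) for xi in X] for x in X]) @ alpha + b))
```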
27 Polynomial-SVMs

The kernel K(x, x′ ) = (x · x′ )^d gives the same result as the explicit
mapping + dot product that we described before:

     Φ : R² → R³       (x1 , x2 ) → (z1 , z2 , z3 ) := (x1² , √2 x1 x2 , x2² )

  Φ((x1 , x2 )) · Φ((x′1 , x′2 )) = (x1² , √2 x1 x2 , x2² ) · (x′1² , √2 x′1 x′2 , x′2² )
                                  = x1² x′1² + 2 x1 x′1 x2 x′2 + x2² x′2²

is the same as:

     K(x, x′ ) = (x · x′ )² = ((x1 , x2 ) · (x′1 , x′2 ))²
               = (x1 x′1 + x2 x′2 )² = x1² x′1² + x2² x′2² + 2 x1 x′1 x2 x′2

Interestingly, even if d is large the kernel still requires only n multiplications
to compute, whereas the explicit representation may not fit in memory!
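
A quick numerical check of this identity for d = 2 (the two points are
arbitrary):

```python
import numpy as np

def phi(x):
    # Explicit map Phi : R^2 -> R^3 from the slide.
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x, xp = np.array([0.3, -1.2]), np.array([2.0, 0.7])
print(np.dot(phi(x), phi(xp)))   # Phi(x) . Phi(x')
print(np.dot(x, xp) ** 2)        # K(x, x') = (x . x')^2  -- the same value
```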
28 RBF-SVMs
The RBF kernel K(x, x′ ) = exp(−γ||x − x′ ||²) is one of the most
popular kernel functions. It adds a ”bump” around each data point:

                   f (x) = Σi=1..m αi exp(−γ||xi − x||²) + b




[Figure: two input points x and x′ are mapped by Φ to points Φ(x) and Φ(x′ )
in feature space.]


Using this one can get state-of-the-art results.
29 SVMs : more results

There is much more in the field of SVMs/ kernel machines than we could
cover here, including:

 • Regression, clustering, semi-supervised learning and other domains.
 • Lots of other kernels, e.g. string kernels to handle text.

 • Lots of research in modifications, e.g. to improve generalization
   ability, or tailoring to a particular task.
 • Lots of research in speeding up training.

Please see text books such as the ones by Cristianini & Shawe-Taylor or
by Schoelkopf and Smola.
30 SVMs : software

Lots of SVM software:
 • LibSVM (C++)

 • SVMLight (C)
As well as complete machine learning toolboxes that include SVMs:

 • Torch (C++)
 • Spider (Matlab)

 • Weka (Java)
All available through www.kernel-machines.org.

Contenu connexe

En vedette

Self Driving Cars V11
Self Driving Cars V11Self Driving Cars V11
Self Driving Cars V11
Kevin Root
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
台灣資料科學年會
 

En vedette (9)

Machine Learning for Body Sensor Networks
Machine Learning for Body Sensor NetworksMachine Learning for Body Sensor Networks
Machine Learning for Body Sensor Networks
 
Self driving cars -
Self driving cars - Self driving cars -
Self driving cars -
 
Python and Machine Learning
Python and Machine LearningPython and Machine Learning
Python and Machine Learning
 
Self Driving Cars V11
Self Driving Cars V11Self Driving Cars V11
Self Driving Cars V11
 
Introduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learnIntroduction to Machine Learning with Python and scikit-learn
Introduction to Machine Learning with Python and scikit-learn
 
Data Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnData Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learn
 
Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved
Machine learning pt.1: Artificial Neural Networks ® All Rights ReservedMachine learning pt.1: Artificial Neural Networks ® All Rights Reserved
Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature Engineering
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
 

Similaire à Jason svm tutorial

Demonstração 1 apresentação ppt
Demonstração 1 apresentação pptDemonstração 1 apresentação ppt
Demonstração 1 apresentação ppt
Cláudia Pires
 

Similaire à Jason svm tutorial (20)

Demonstração 1 apresentação ppt
Demonstração 1 apresentação pptDemonstração 1 apresentação ppt
Demonstração 1 apresentação ppt
 
36x60 vertical templatev12
36x60 vertical templatev1236x60 vertical templatev12
36x60 vertical templatev12
 
Fungicide Resistance: Can it be Prevented?
Fungicide Resistance: Can it be Prevented?Fungicide Resistance: Can it be Prevented?
Fungicide Resistance: Can it be Prevented?
 
Cdn Tabulacion Exel
Cdn Tabulacion ExelCdn Tabulacion Exel
Cdn Tabulacion Exel
 
How SaaS founders can raise Angel funds
How SaaS founders can raise Angel fundsHow SaaS founders can raise Angel funds
How SaaS founders can raise Angel funds
 
Diagramacion
DiagramacionDiagramacion
Diagramacion
 
48x48 square templatev12
48x48 square templatev1248x48 square templatev12
48x48 square templatev12
 
Honey comb weave design
Honey comb weave designHoney comb weave design
Honey comb weave design
 
Bule E XíCara
Bule E XíCaraBule E XíCara
Bule E XíCara
 
M&A in der Medienbranche
M&A in der MedienbrancheM&A in der Medienbranche
M&A in der Medienbranche
 
Diamond Age Games Deck
Diamond Age Games DeckDiamond Age Games Deck
Diamond Age Games Deck
 
Copa do mundo 2018 continentes paises vencedores
Copa do mundo 2018 continentes paises vencedoresCopa do mundo 2018 continentes paises vencedores
Copa do mundo 2018 continentes paises vencedores
 
Copa do mundo 2018 goleadores
Copa do mundo 2018 goleadoresCopa do mundo 2018 goleadores
Copa do mundo 2018 goleadores
 
Sayasaya
SayasayaSayasaya
Sayasaya
 
36x60 horizontal templatev12
36x60 horizontal templatev1236x60 horizontal templatev12
36x60 horizontal templatev12
 
Preventing vaw in-elections
Preventing vaw in-electionsPreventing vaw in-elections
Preventing vaw in-elections
 
SignWriting in an ASCII World
SignWriting in an ASCII WorldSignWriting in an ASCII World
SignWriting in an ASCII World
 
SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...
SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...
SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...
 
Copa do mundo 2018 bolas
Copa do mundo 2018 bolasCopa do mundo 2018 bolas
Copa do mundo 2018 bolas
 
Icraf seminar(de clerck)
Icraf seminar(de clerck)Icraf seminar(de clerck)
Icraf seminar(de clerck)
 

Dernier

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
MateoGardella
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
MateoGardella
 

Dernier (20)

Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 

Jason svm tutorial

  • 1. Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA. jasonw@nec-labs.com
  • 2. 1 Support Vector Machines: history • SVMs introduced in COLT-92 by Boser, Guyon & Vapnik. Became rather popular since. • Theoretically well motivated algorithm: developed from Statistical Learning Theory (Vapnik & Chervonenkis) since the 60s. • Empirically good performance: successful applications in many fields (bioinformatics, text, image recognition, . . . )
  • 3. 2 Support Vector Machines: history II • Centralized website: www.kernel-machines.org. • Several textbooks, e.g. ”An introduction to Support Vector Machines” by Cristianini and Shawe-Taylor is one. • A large and diverse community work on them: from machine learning, optimization, statistics, neural networks, functional analysis, etc.
  • 4. 3 Support Vector Machines: basics [Boser, Guyon, Vapnik ’92],[Cortes & Vapnik ’95] - - - - margin - margin + + + + + Nice properties: convex, theoretically motivated, nonlinear with kernels..
  • 5. 4 Preliminaries: • Machine learning is about learning structure from data. • Although the class of algorithms called ”SVM”s can do more, in this talk we focus on pattern recognition. • So we want to learn the mapping: X → Y, where x ∈ X is some object and y ∈ Y is a class label. • Let’s take the simplest case: 2-class classification. So: x ∈ Rn , y ∈ {±1}.
  • 6. 5 Example: Suppose we have 50 photographs of elephants and 50 photos of tigers. vs. We digitize them into 100 x 100 pixel images, so we have x ∈ Rn where n = 10, 000. Now, given a new (different) photograph we want to answer the question: is it an elephant or a tiger? [we assume it is one or the other.]
  • 7. 6 Training sets and prediction models • input/output sets X , Y • training set (x1 , y1 ), . . . , (xm , ym ) • ”generalization”: given a previously seen x ∈ X , find a suitable y ∈ Y. • i.e., want to learn a classifier: y = f (x, α), where α are the parameters of the function. • For example, if we are choosing our model from the set of hyperplanes in Rn , then we have: f (x, {w, b}) = sign(w · x + b).
  • 8. 7 Empirical Risk and the true Risk • We can try to learn f (x, α) by choosing a function that performs well on training data: m 1 Remp (α) = (f (xi , α), yi ) = Training Error m i=1 where is the zero-one loss function, (y, y ) = 1, if y = y , and 0 ˆ ˆ otherwise. Remp is called the empirical risk. • By doing this we are trying to minimize the overall risk: R(α) = (f (x, α), y)dP (x, y) = Test Error where P(x,y) is the (unknown) joint distribution function of x and y.
  • 9. 8 Choosing the set of functions What about f (x, α) allowing all functions from X to {±1}? Training set (x1 , y1 ), . . . , (xm , ym ) ∈ X × {±1} Test set x1 , . . . , xm ∈ X , ¯ ¯¯ such that the two sets do not intersect. For any f there exists f ∗ : 1. f ∗ (xi ) = f (xi ) for all i 2. f ∗ (xj ) = f (xj ) for all j Based on the training data alone, there is no means of choosing which function is better. On the test set however they give different results. So generalization is not guaranteed. =⇒ a restriction must be placed on the functions that we allow.
  • 10. 9 Empirical Risk and the true Risk Vapnik & Chervonenkis showed that an upper bound on the true risk can be given by the empirical risk + an additional term: h(log( 2m + 1) − log( η ) 4 R(α) ≤ Remp (α) + h m where h is the VC dimension of the set of functions parameterized by α. • The VC dimension of a set of functions is a measure of their capacity or complexity. • If you can describe a lot of different phenomena with a set of functions then the value of h is large. [VC dim = the maximum number of points that can be separated in all possible ways by that set of functions.]
  • 11. 10 VC dimension: The VC dimension of a set of functions is the maximum number of points that can be separated in all possible ways by that set of functions. For hyperplanes in Rn , the VC dimension can be shown to be n + 1. xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
• 12. 11 VC dimension and capacity of functions Simplification of bound: Test Error ≤ Training Error + Complexity of set of Models • Actually, a lot of bounds of this form have been proved (with different measures of capacity). The complexity function is often called a regularizer. • If you take a high capacity set of functions (ones that can explain a lot), you can get low training error, but you might "overfit". • If you take a very simple set of models, you have low complexity, but you won't get low training error.
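For concreteness, one standard bound of this form (the classical Vapnik–Chervonenkis result, quoted here in its usual textbook form rather than taken from the slide) states that, with probability at least 1 − η over the draw of m training points, every function in a class of VC dimension h satisfies
    R(α) ≤ Remp(α) + sqrt( ( h (ln(2m/h) + 1) − ln(η/4) ) / m )
so the complexity term grows with the capacity h and shrinks like 1/sqrt(m).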
  • 13. 12 Capacity of a set of functions (classification) [Images taken from a talk by B. Schoelkopf.]
• 14. 13 Capacity of a set of functions (regression) [Figure: a regression example showing the true function, a low-capacity hyperplane fit, and a high-capacity sine-curve fit to the data.]
• 15. 14 Controlling the risk: model complexity [Figure: the bound on the risk, drawn as the sum of the empirical risk (training error) and the confidence interval, over a nested sequence of function sets S1 ⊂ ... ⊂ S* ⊂ ... ⊂ Sn with VC dimensions h1 ≤ ... ≤ h* ≤ ... ≤ hn; the bound is smallest at an intermediate S*.]
• 16. 15 Capacity of hyperplanes Vapnik & Chervonenkis also showed the following: Consider hyperplanes (w · x) = 0 where w is normalized w.r.t. a set of points X* such that min_i |w · x_i| = 1. The set of decision functions f_w(x) = sign(w · x) defined on X* such that ||w|| ≤ A has a VC dimension satisfying h ≤ R²A², where R is the radius of the smallest sphere around the origin containing X*. ⟹ minimizing ||w||² gives low capacity ⟹ minimizing ||w||² is equivalent to obtaining a large margin classifier
• 17. [Figure: a separating hyperplane {x | <w, x> + b = 0} with normal vector w; points on one side satisfy <w, x> + b > 0, points on the other side satisfy <w, x> + b < 0.]
• 18. [Figure: the margin hyperplanes {x | <w, x> + b = +1} and {x | <w, x> + b = −1} on either side of the decision boundary {x | <w, x> + b = 0}.] Note: for a point x1 with y_i = +1 lying on the positive margin, <w, x1> + b = +1, and for a point x2 with y_i = −1 lying on the negative margin, <w, x2> + b = −1. Subtracting gives <w, (x1 − x2)> = 2, hence <w/||w||, (x1 − x2)> = 2/||w||, i.e. the distance between the two margin hyperplanes is 2/||w||.
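As a quick numerical illustration of that 2/||w|| margin width (a small sketch, not from the slides; the hyperplane, points and variable names below are made up):

    import numpy as np

    # Hypothetical hyperplane w.x + b = 0 in R^2 (toy numbers).
    w = np.array([3.0, 4.0])      # ||w|| = 5
    b = -2.0

    x1 = np.array([1.0, 0.0])     # w.x1 + b = +1  (on the positive margin)
    x2 = np.array([1.0/3, 0.0])   # w.x2 + b = -1  (on the negative margin)

    assert np.isclose(w @ x1 + b, +1.0)
    assert np.isclose(w @ x2 + b, -1.0)

    # Projecting the difference onto the unit normal gives the margin width.
    margin = (w / np.linalg.norm(w)) @ (x1 - x2)
    print(margin, 2.0 / np.linalg.norm(w))   # both print 0.4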
• 19. 16 Linear Support Vector Machines (at last!) So, we would like to find the function which minimizes an objective like: Training Error + Complexity term. We write that as:
    (1/m) Σ_{i=1}^m ℓ(f(x_i, α), y_i) + Complexity term
For now we will choose the set of hyperplanes (we will extend this later), so f(x) = (w · x) + b:
    (1/m) Σ_{i=1}^m ℓ(w · x_i + b, y_i) + ||w||²
subject to min_i |w · x_i| = 1.
• 20. 17 Linear Support Vector Machines II That function before was a little difficult to minimize because of the step function in ℓ(y, ŷ) (either 1 or 0). Let's assume we can separate the data perfectly. Then we can optimize the following: Minimize ||w||², subject to:
    (w · x_i + b) ≥ +1, if y_i = +1
    (w · x_i + b) ≤ −1, if y_i = −1
The last two constraints can be compacted to:
    y_i(w · x_i + b) ≥ 1
This is a quadratic program.
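Since it is a quadratic program, any general-purpose constrained solver can handle a tiny instance. The following is a minimal sketch using SciPy's SLSQP solver on a made-up separable toy set (not how dedicated SVM packages solve it; all names and data are illustrative):

    import numpy as np
    from scipy.optimize import minimize

    # Tiny linearly separable toy set.
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    def objective(params):
        w = params[:2]
        return 0.5 * w @ w                 # minimize ||w||^2 (the 1/2 does not change the argmin)

    def constraints(params):
        w, b = params[:2], params[2]
        return y * (X @ w + b) - 1.0       # require y_i (w.x_i + b) - 1 >= 0 for every i

    res = minimize(objective, x0=np.zeros(3), method="SLSQP",
                   constraints=[{"type": "ineq", "fun": constraints}])
    w, b = res.x[:2], res.x[2]
    print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))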
• 21. 18 SVMs : non-separable case To deal with the non-separable case, one can rewrite the problem as: Minimize:
    ||w||² + C Σ_{i=1}^m ξ_i
subject to:
    y_i(w · x_i + b) ≥ 1 − ξ_i,   ξ_i ≥ 0
This is just the same as the original objective
    (1/m) Σ_{i=1}^m ℓ(w · x_i + b, y_i) + ||w||²
except ℓ is no longer the zero-one loss, but is called the "hinge loss": ℓ(y, ŷ) = max(0, 1 − yŷ). This is still a quadratic program!
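To make the hinge loss concrete, here is a small sketch (not from the slides) comparing it to the zero-one loss it replaces; note that it upper-bounds the zero-one loss and is nonzero even for correctly classified points that fall inside the margin:

    import numpy as np

    def zero_one_loss(y, f_x):
        return float(np.sign(f_x) != y)          # 1 if misclassified, 0 otherwise

    def hinge_loss(y, f_x):
        return max(0.0, 1.0 - y * f_x)

    for y, f_x in [(1, 2.0), (1, 0.5), (1, -0.3)]:
        print(y, f_x, zero_one_loss(y, f_x), hinge_loss(y, f_x))
    # (1,  2.0): zero-one 0, hinge 0.0   (correct, outside the margin)
    # (1,  0.5): zero-one 0, hinge 0.5   (correct, but inside the margin)
    # (1, -0.3): zero-one 1, hinge 1.3   (misclassified)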
• 22. [Figure: a soft-margin hyperplane; points that fall inside the margin or on the wrong side incur a slack ξi.]
• 23. 19 Support Vector Machines - Primal
• Decision function: f(x) = w · x + b
• Primal formulation:
    min P(w, b) = (1/2)||w||² + C Σ_i H1[y_i f(x_i)]
where the first term maximizes the margin and the second term minimizes the training error. Ideally H1 would count the number of errors; we approximate it with the hinge loss H1(z) = max(0, 1 − z). [Figure: plot of the hinge loss H1(z) against z.]
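One simple way to approximately minimize this primal objective is stochastic subgradient descent on the regularized hinge loss, in the spirit of Pegasos-style solvers. The sketch below is only an illustration under that assumption (toy data, arbitrary step-size schedule, bias handled naively), not the algorithm used by standard SVM packages:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 2-class data (illustrative only).
    X = np.vstack([rng.normal(+2.0, 1.0, size=(50, 2)),
                   rng.normal(-2.0, 1.0, size=(50, 2))])
    y = np.hstack([np.ones(50), -np.ones(50)])

    lam = 0.01                        # regularization strength (plays the role of 1/C)
    w, b = np.zeros(2), 0.0

    for t in range(1, 2001):
        i = rng.integers(len(y))
        eta = 1.0 / (lam * t)         # decreasing step size
        if y[i] * (X[i] @ w + b) < 1: # hinge loss active: data term enters the subgradient
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
            b = b + eta * y[i]
        else:                         # only the regularizer contributes
            w = (1 - eta * lam) * w

    print("training error:", np.mean(np.sign(X @ w + b) != y))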
• 24. 20 SVMs : non-linear case Linear classifiers are sometimes not complex enough. SVM solution: map the data into a richer feature space that includes nonlinear features, then construct a hyperplane in that space, so that all the other equations stay the same! Formally, preprocess the data with x → Φ(x) and then learn the map from Φ(x) to y: f(x) = w · Φ(x) + b.
• 25. 21 SVMs : polynomial mapping
    Φ : R² → R³,  (x1, x2) → (z1, z2, z3) := (x1², √2 x1x2, x2²)
[Figure: data that is not linearly separable in the input coordinates (x1, x2) becomes linearly separable in the feature coordinates (z1, z2, z3).]
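A small sketch of that explicit map in code (the function name phi is just illustrative):

    import numpy as np

    def phi(x):
        """Explicit degree-2 feature map R^2 -> R^3 from the slide."""
        x1, x2 = x
        return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

    print(phi(np.array([1.0, 2.0])))   # [1.0, 2.828..., 4.0]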
• 26. 22 SVMs : non-linear case II For example, MNIST hand-writing recognition. 60,000 training examples, 10,000 test examples, 28x28 pixel images. Linear SVM has around 8.5% test error. Polynomial SVM has around 1% test error. [Figure: sample MNIST digit images with their labels.]
• 27. 23 SVMs : full MNIST results

    Classifier                    Test Error
    linear                        8.4%
    3-nearest-neighbor            2.4%
    RBF-SVM                       1.4%
    Tangent distance              1.1%
    LeNet                         1.1%
    Boosted LeNet                 0.7%
    Translation invariant SVM     0.56%

Choosing a good mapping Φ(·) (encoding prior knowledge + getting the right complexity of the function class) for your problem improves results.
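The numbers in the table come from the full MNIST benchmark and carefully tuned models, so they cannot be reproduced in a few lines; but the qualitative gap between a linear and an RBF-kernel SVM can be seen even on scikit-learn's much smaller built-in digits dataset (8x8 images). A hedged sketch, with arbitrary hyperparameters:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

    for kernel in ["linear", "rbf"]:
        clf = SVC(kernel=kernel, C=10, gamma="scale")
        clf.fit(X_train, y_train)
        print(kernel, "test accuracy:", clf.score(X_test, y_test))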
• 28. 24 SVMs : the kernel trick Problem: the dimensionality of Φ(x) can be very large, making w hard to represent explicitly in memory, and hard for the QP to solve. The Representer theorem (Kimeldorf & Wahba, 1971) shows that (for SVMs as a special case):
    w = Σ_{i=1}^m α_i Φ(x_i)
for some variables α. Instead of optimizing w directly we can thus optimize α. The decision rule is now:
    f(x) = Σ_{i=1}^m α_i Φ(x_i) · Φ(x) + b
We call K(x_i, x) = Φ(x_i) · Φ(x) the kernel function.
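In code, this decision rule is just a weighted sum of kernel evaluations against the training points. A minimal sketch (hypothetical names; any kernel function K can be plugged in):

    import numpy as np

    def decision_function(x, X_train, alpha, b, K):
        """f(x) = sum_i alpha_i K(x_i, x) + b  (representer-theorem form)."""
        return sum(a_i * K(x_i, x) for a_i, x_i in zip(alpha, X_train)) + b

    # With a linear kernel this reduces to w.x + b where w = sum_i alpha_i x_i.
    linear_kernel = lambda u, v: float(np.dot(u, v))
    X_train = np.array([[1.0, 0.0], [0.0, 1.0]])
    alpha = np.array([0.5, -0.5])
    print(decision_function(np.array([2.0, 1.0]), X_train, alpha, b=0.0, K=linear_kernel))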
• 29. 25 Support Vector Machines - kernel trick II We can rewrite all the SVM equations we saw before, but with the w = Σ_{i=1}^m α_i Φ(x_i) equation:
• Decision function:
    f(x) = Σ_i α_i Φ(x_i) · Φ(x) + b = Σ_i α_i K(x_i, x) + b
• Primal formulation (now written in terms of α instead of w):
    min P(α, b) = (1/2) ||Σ_{i=1}^m α_i Φ(x_i)||² + C Σ_i H1[y_i f(x_i)]
where, as before, the first term maximizes the margin and the second term minimizes the training error.
• 30. 26 Support Vector Machines - Dual But people normally write it like this:
• Dual formulation:
    min_α D(α) = (1/2) Σ_{i,j} α_i α_j Φ(x_i) · Φ(x_j) − Σ_i y_i α_i   s.t.   Σ_i α_i = 0,   0 ≤ y_i α_i ≤ C
• Dual decision function:
    f(x) = Σ_i α_i K(x_i, x) + b
• The kernel function K(·, ·) is used to make the (implicit) nonlinear feature map, e.g.
  – Polynomial kernel: K(x, x') = (x · x' + 1)^d.
  – RBF kernel: K(x, x') = exp(−γ||x − x'||²).
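All the dual needs from the data are the pairwise kernel values K(x_i, x_j), i.e. the Gram matrix. A sketch of computing it for the two kernels above (vectorized NumPy; function names and parameter values are illustrative):

    import numpy as np

    def polynomial_gram(X, d=2, c=1.0):
        # K(x, x') = (x . x' + c)^d for all pairs at once
        return (X @ X.T + c) ** d

    def rbf_gram(X, gamma=0.5):
        # K(x, x') = exp(-gamma ||x - x'||^2), using ||x - x'||^2 = ||x||^2 + ||x'||^2 - 2 x.x'
        sq_norms = np.sum(X**2, axis=1)
        sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
        return np.exp(-gamma * sq_dists)

    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
    print(polynomial_gram(X))
    print(rbf_gram(X))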
• 31. 27 Polynomial-SVMs The kernel K(x, x') = (x · x')^d gives the same result as the explicit mapping + dot product that we described before:
    Φ : R² → R³,  (x1, x2) → (z1, z2, z3) := (x1², √2 x1x2, x2²)
    Φ((x1, x2)) · Φ((x'1, x'2)) = (x1², √2 x1x2, x2²) · (x'1², √2 x'1x'2, x'2²)
                                = x1²x'1² + 2x1x'1x2x'2 + x2²x'2²
which is the same as:
    K(x, x') = (x · x')² = ((x1, x2) · (x'1, x'2))² = (x1x'1 + x2x'2)² = x1²x'1² + x2²x'2² + 2x1x'1x2x'2
Interestingly, even if d is large, the kernel still requires only about n multiplications to compute (the dot product, then raising it to the power d), whereas the explicit representation may not fit in memory!
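That algebra is easy to verify numerically (a quick sketch, redefining the same toy map phi so it is self-contained):

    import numpy as np

    def phi(x):
        x1, x2 = x
        return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

    x  = np.array([1.0, 2.0])
    xp = np.array([3.0, -1.0])

    print(phi(x) @ phi(xp))    # explicit map, then dot product
    print((x @ xp) ** 2)       # kernel (x . x')^2 -- the same value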
• 32. 28 RBF-SVMs The RBF kernel K(x, x') = exp(−γ||x − x'||²) is one of the most popular kernel functions. It adds a "bump" around each data point:
    f(x) = Σ_{i=1}^m α_i exp(−γ||x_i − x||²) + b
[Figure: the implicit map Φ sends input points x, x' to Φ(x), Φ(x') in feature space.] Using this one can get state-of-the-art results.
  • 33. 29 SVMs : more results There is much more in the field of SVMs/ kernel machines than we could cover here, including: • Regression, clustering, semi-supervised learning and other domains. • Lots of other kernels, e.g. string kernels to handle text. • Lots of research in modifications, e.g. to improve generalization ability, or tailoring to a particular task. • Lots of research in speeding up training. Please see text books such as the ones by Cristianini & Shawe-Taylor or by Schoelkopf and Smola.
  • 34. 30 SVMs : software Lots of SVM software: • LibSVM (C++) • SVMLight (C) As well as complete machine learning toolboxes that include SVMs: • Torch (C++) • Spider (Matlab) • Weka (Java) All available through www.kernel-machines.org.
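As a usage sketch, scikit-learn's SVC class (which wraps LibSVM internally) is another common entry point; the dataset and hyperparameter values below are arbitrary:

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Toy binary classification problem (illustrative only).
    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # RBF-kernel SVM
    clf.fit(X, y)
    print("number of support vectors per class:", clf.n_support_)
    print("training accuracy:", clf.score(X, y))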