Computer vision: models, learning and inference
Chapter 7: Modeling complex densities

Please send errata to s.prince@cs.ucl.ac.uk
Structure
• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Models for machine vision
Face Detection
Type 3: Pr(x|w) - Generative

How to model Pr(x|w)?
  – Choose an appropriate form for Pr(x)
  – Make parameters a function of w
  – Function takes parameters θ that define its shape

Learning algorithm: learn parameters θ from training data x,w
Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule
Classification Model

Or writing in terms of class-conditional density functions:
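In the book's Norm_x[mean, covariance] notation:

    Pr(x|w=0) = Norm_x[μ0, Σ0]
    Pr(x|w=1) = Norm_x[μ1, Σ1]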
Parameters μ0, Σ0 are learnt just from the data where w=0
Similarly, parameters μ1, Σ1 are learnt just from the data where w=1
Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule:
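For the two-class case this is

    Pr(w=1|x) = Pr(x|w=1) Pr(w=1) / [ Pr(x|w=0) Pr(w=0) + Pr(x|w=1) Pr(w=1) ]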
Experiment
1000 non-faces
1000 faces

60×60×3 images = 10800×1 vectors

Equal priors Pr(w=1) = Pr(w=0) = 0.5

75% performance on test set. Not very good!
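A minimal NumPy sketch of this experiment (illustrative only: the data is synthetic and low-dimensional rather than the 10800-dim face vectors, and log_norm_diag / posterior_face are names invented for this sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    D = 108                                  # stand-in for the 10800-dim image vectors
    X0 = rng.normal(0.0, 1.0, (1000, D))     # "non-face" training vectors (w=0)
    X1 = rng.normal(0.5, 1.0, (1000, D))     # "face" training vectors (w=1)

    # Learning: fit a mean and diagonal covariance separately for each class
    mu0, var0 = X0.mean(0), X0.var(0) + 1e-6
    mu1, var1 = X1.mean(0), X1.var(0) + 1e-6

    def log_norm_diag(x, mu, var):
        # log of a diagonal-covariance normal density Norm_x[mu, diag(var)]
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)

    def posterior_face(x, prior1=0.5):
        # Pr(w=1|x) by Bayes' rule, computed with the log-sum-exp trick
        l0 = log_norm_diag(x, mu0, var0) + np.log(1 - prior1)
        l1 = log_norm_diag(x, mu1, var1) + np.log(prior1)
        m = np.maximum(l0, l1)
        return np.exp(l1 - m) / (np.exp(l0 - m) + np.exp(l1 - m))

    print(posterior_face(X1[0]))             # close to 1 for a "face" sample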
Results (diagonal covariance)
The plan...
Structure
• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Hidden (or latent) Variables
Key idea: represent the density Pr(x) as a marginalization of a joint density with another variable h that we do not see:
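    Pr(x) = ∫ Pr(x,h) dh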
The joint density will also depend on some parameters θ:
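    Pr(x|θ) = ∫ Pr(x,h|θ) dh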
Expectation Maximization
An algorithm specialized to fitting pdfs that are the marginalization of a joint distribution.

It defines a lower bound on the log likelihood and increases the bound iteratively.
Lower bound

The lower bound is a function of the parameters θ and a set of probability distributions qi(hi).

The Expectation Maximization (EM) algorithm alternates E-Steps and M-Steps:

E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
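In the book's notation, the bound is

    B[{qi(hi)}, θ] = Σi ∫ qi(hi) log[ Pr(xi, hi|θ) / qi(hi) ] dhi  ≤  Σi log[ Pr(xi|θ) ]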
E-Step & M-Step
E-Step – Maximize bound w.r.t. distributions qi(hi):
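i.e., set each qi(hi) to the posterior over hi:

    q̂i(hi) = Pr(hi|xi, θ[t])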
M-Step – Maximize bound w.r.t. parameters θ:
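    θ[t+1] = argmax_θ [ Σi ∫ q̂i(hi) log[ Pr(xi, hi|θ) ] dhi ]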
Structure
• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Mixture of Gaussians (MoG)
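In the book's notation, the mixture density with K components is

    Pr(x|θ) = Σ_{k=1..K} λk Norm_x[μk, Σk],    where λk ≥ 0 and Σ_{k=1..K} λk = 1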
ML Learning
Q. Can we learn this model with maximum likelihood?

A. Yes, but the brute-force approach is tricky
   • If you take the derivative and set it to zero, you can’t solve in closed form
         -- the log of the sum causes problems
   • Have to enforce constraints on the parameters
       • covariances must be positive definite
       • weights must sum to one
MoG as a marginalization
Define a discrete variable h ∈ {1,...,K} and then write:
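    Pr(h=k) = λk
    Pr(x|h=k) = Norm_x[μk, Σk]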
Then we can recover the density by marginalizing over h:
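    Pr(x) = Σ_{k=1..K} Pr(x|h=k) Pr(h=k) = Σ_{k=1..K} λk Norm_x[μk, Σk]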
MoG as a marginalization
Define a variable h ∈ {1,...,K} and write Pr(x) as the marginalization of Pr(x,h), as above.

Note:
    •    This gives us a method to generate data from MoG
         • First sample Pr(h), then sample Pr(x|h)
    •    The hidden variable h has a clear interpretation –
         it tells you which Gaussian created data point x
Expectation Maximization for MoG
GOAL: to learn parameters θ = {λ1...K, μ1...K, Σ1...K} from training data {xi}

E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
E-Step
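In the book's notation, the E-Step sets each qi(hi) to the posterior over hi:

    qi(hi=k) = λk Norm_xi[μk, Σk] / Σ_{j=1..K} λj Norm_xi[μj, Σj] = rik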
We’ll call rik the responsibility of the kth Gaussian for the ith data point.

Repeat this procedure for every datapoint!
M-Step

Take the derivative, equate to zero, and solve (using a Lagrange multiplier for the constraint on the weights λ).
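This yields the updates

    λk[t+1] = Σi rik / Σ_{j=1..K} Σi rij
    μk[t+1] = Σi rik xi / Σi rik
    Σk[t+1] = Σi rik (xi − μk[t+1])(xi − μk[t+1])ᵀ / Σi rik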
M-Step

Update the means, covariances, and weights according to the responsibilities of the datapoints.
Iterate until no further improvement
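A compact NumPy/SciPy sketch of this EM loop for a two-component MoG (a minimal illustration on synthetic 2-D data, not the book's code):

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
    I, D, K = X.shape[0], X.shape[1], 2

    lam = np.full(K, 1.0 / K)                        # weights lambda_k
    mu = X[rng.choice(I, K, replace=False)]          # means mu_k
    Sig = np.stack([np.eye(D)] * K)                  # covariances Sigma_k

    for it in range(100):
        # E-Step: responsibilities r_ik of the kth Gaussian for the ith point
        lik = np.stack([lam[k] * multivariate_normal.pdf(X, mu[k], Sig[k])
                        for k in range(K)], axis=1)
        r = lik / lik.sum(1, keepdims=True)
        # M-Step: re-estimate weights, means, covariances from responsibilities
        Nk = r.sum(0)
        lam = Nk / I
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            d = X - mu[k]
            Sig[k] = (r[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)

    print(lam, mu)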
Different flavours...

(Figure: MoG fits with full covariance, diagonal covariance, and same covariance.)
Local Minima
Start from three random positions: the three solutions reach different log likelihoods, L = 98.76, L = 96.97, and L = 94.35.
Means of face/non-face model

Classification → 84% (9% improvement!)
The plan...
Structure
• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Student t-distributions
Student t-distributions motivation
The normal distribution is not very robust – a single outlier can
completely throw it off because the tails fall off so fast...
(Figure: fits by a normal distribution, the normal distribution w/ one extra datapoint, and a t-distribution.)
Student t-distributions
Univariate Student t-distribution:
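In the book's notation,

    Pr(x) = Stud_x[μ, σ², ν] = ( Γ[(ν+1)/2] / ( √(νπσ²) Γ[ν/2] ) ) [ 1 + (x−μ)²/(νσ²) ]^(−(ν+1)/2)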
Multivariate Student t-distribution:
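    Pr(x) = Stud_x[μ, Σ, ν] = ( Γ[(ν+D)/2] / ( (νπ)^(D/2) |Σ|^(1/2) Γ[ν/2] ) ) [ 1 + (x−μ)ᵀΣ⁻¹(x−μ)/ν ]^(−(ν+D)/2)

where D is the dimensionality of x and ν is the degrees of freedom.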
t-distribution as a marginalization
Define a hidden variable h with
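    Pr(h) = Gam_h[ν/2, ν/2]
    Pr(x|h) = Norm_x[μ, Σ/h]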
The t-distribution can then be expressed as a marginalization:
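    Pr(x) = ∫ Pr(x|h) Pr(h) dh = ∫ Norm_x[μ, Σ/h] Gam_h[ν/2, ν/2] dh = Stud_x[μ, Σ, ν]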
Gamma distribution
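In the book's notation, the gamma density with shape α and rate β is

    Gam_h[α, β] = ( β^α / Γ[α] ) h^(α−1) exp[−βh],    h > 0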
t-distribution as a marginalization
Define the hidden variable h as above.
Things to note:
•    Again this provides a method to sample from the t-distribution
•    Variable h has a clear interpretation:
     • Each datum is drawn from a Gaussian with mean μ
     • The covariance depends inversely on h
•    Can think of this as an infinite mixture (the sum becomes an integral)
     of Gaussians w/ the same mean, but different covariances
t-distribution
EM for t-distributions
GOAL: to learn parameters θ = {μ, Σ, ν} from training data {xi}

E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
E-Step
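In the book's notation, the posterior over each hi is again a gamma distribution:

    qi(hi) = Pr(hi|xi, θ[t]) = Gam_hi[ (ν+D)/2, (xi−μ)ᵀΣ⁻¹(xi−μ)/2 + ν/2 ]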
Extract the expectations needed for the M-Step:
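    E[hi] = (ν + D) / ( ν + (xi−μ)ᵀΣ⁻¹(xi−μ) )
    E[log hi] = Ψ[(ν+D)/2] − log[ ( ν + (xi−μ)ᵀΣ⁻¹(xi−μ) ) / 2 ]

where Ψ[•] is the digamma function.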
M-Step
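The closed-form updates weight each datapoint by E[hi], so outliers (which get small E[hi]) are downweighted:

    μ[t+1] = Σi E[hi] xi / Σi E[hi]
    Σ[t+1] = Σi E[hi] (xi − μ[t+1])(xi − μ[t+1])ᵀ / Σi E[hi]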
Updates

There is no closed-form solution for ν. We must optimize the bound – since ν is only one number, we can optimize it by just evaluating the bound on a fine grid of values and picking the best.
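A hedged sketch of that grid search (nu_objective is a name invented here; it evaluates the ν-dependent part of the expected joint log likelihood, with the E-Step expectations held fixed):

    import numpy as np
    from scipy.special import gammaln

    def nu_objective(nu, E_h, E_logh):
        # nu-dependent part of E[ log Gam_h[nu/2, nu/2] ], summed over datapoints
        return np.sum(0.5 * nu * np.log(0.5 * nu) - gammaln(0.5 * nu)
                      + (0.5 * nu - 1.0) * E_logh - 0.5 * nu * E_h)

    # E_h, E_logh would come from the E-Step; constants keep the sketch runnable
    E_h, E_logh = np.ones(100), np.zeros(100)
    nu_grid = np.arange(0.5, 200.0, 0.5)
    nu_best = max(nu_grid, key=lambda nu: nu_objective(nu, E_h, E_logh))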
EM algorithm for t-distributions
The plan...
Structure
• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Factor Analysis
Compromise between
  –   Full covariance matrix (desirable but many parameters)
  –   Diagonal covariance matrix (no modelling of covariance)

Models full covariance in a subspace Φ
Mops everything else up with a diagonal component Σ
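In the book's notation, the resulting density is

    Pr(x) = Norm_x[μ, ΦΦᵀ + Σ]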
Data Density

Consider modelling the distribution of 100×100×3 RGB images, so Dx = 30000:

  Gaussian w/ spherical covariance:    1             covariance matrix parameters
  Gaussian w/ diagonal covariance:     Dx            covariance matrix parameters
  Full Gaussian:                       ~Dx²          covariance matrix parameters
  PPCA:                                ~Dx·Dh        covariance matrix parameters
  Factor analysis:                     ~Dx·(Dh+1)    covariance matrix parameters
                                                     (full covariance in subspace + diagonal)
Subspaces
Factor Analysis as a Marginalization
Let’s define:
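    Pr(h) = Norm_h[0, I]
    Pr(x|h) = Norm_x[μ + Φh, Σ]        (Σ diagonal)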
Then it can be shown (not obvious) that
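    Pr(x) = ∫ Pr(x|h) Pr(h) dh = Norm_x[μ, ΦΦᵀ + Σ]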
Sampling from factor analyzer
Factor analysis vs. MoG
E-Step
•   Compute posterior over hidden variables
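In the book's notation:

    Pr(hi|xi) = Norm_hi[ (ΦᵀΣ⁻¹Φ + I)⁻¹ ΦᵀΣ⁻¹ (xi − μ), (ΦᵀΣ⁻¹Φ + I)⁻¹ ]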
•   Compute expectations needed for M-Step
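    E[hi] = (ΦᵀΣ⁻¹Φ + I)⁻¹ ΦᵀΣ⁻¹ (xi − μ)
    E[hi hiᵀ] = (ΦᵀΣ⁻¹Φ + I)⁻¹ + E[hi] E[hi]ᵀ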
M-Step
•   Optimize parameters w.r.t. EM bound
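In the book's notation, with μ set to the mean of the I training vectors, the updates are

    Φ[t+1] = [ Σi (xi − μ) E[hi]ᵀ ] [ Σi E[hi hiᵀ] ]⁻¹
    Σ[t+1] = (1/I) Σi diag[ (xi − μ)(xi − μ)ᵀ − Φ[t+1] E[hi] (xi − μ)ᵀ ]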
Face model
Sampling from 10 parameter model
To generate:
   •   Choose factor loadings hi from a standard normal distribution
   •   Multiply by the factors Φ
   •   Add the mean μ
   •   (should add a random noise component εi w/ diagonal covariance Σ)
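A minimal NumPy sketch of this generation recipe (illustrative: mu, Phi, and sigma2 are random stand-ins for learned parameters, and the dimensions are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    D, Dh = 10800, 10                       # data and subspace dimensionality
    mu = rng.normal(size=D)                 # mean (stand-in for a learned mu)
    Phi = rng.normal(size=(D, Dh))          # factors (stand-in for learned Phi)
    sigma2 = np.ones(D)                     # diagonal of Sigma

    h = rng.standard_normal(Dh)             # factor loadings, h ~ Norm[0, I]
    eps = rng.normal(0.0, np.sqrt(sigma2))  # noise, eps ~ Norm[0, Sigma]
    x = mu + Phi @ h + eps                  # sample from Norm[mu, Phi Phi^T + Sigma]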
Combining Models
Can combine the three models in various combinations:

•   Mixture of t-distributions = mixture of Gaussians + t-distributions
•   Robust subspace models = t-distributions + factor analysis
•   Mixture of factor analyzers = mixture of Gaussians + factor analysis

Or combine all three to create a mixture of robust subspace models.
Structure
• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Expectation Maximization
Problem: Optimize cost functions of the form
    θ̂ = argmax_θ [ Σi log[ Σ_hi Pr(xi, hi|θ) ] ]          Discrete case

    θ̂ = argmax_θ [ Σi log[ ∫ Pr(xi, hi|θ) dhi ] ]          Continuous case
Solution: Expectation Maximization (EM) algorithm (Dempster, Laird and Rubin 1977)

Key idea: Define a lower bound on the log likelihood and increase it at each iteration.
Lower bound

The lower bound B[{qi(hi)}, θ] is a function of the parameters θ and of a set of probability distributions {qi(hi)}.
E-Step & M-Step
E-Step – Maximize bound w.r.t. distributions {qi(hi)}
M-Step – Maximize bound w.r.t. parameters θ
E-Step: Update {qi(hi)} so that the bound equals the log likelihood for the current θ
M-Step: Update θ to the maximum of the bound
Missing Parts of Argument
1. Show that this is a lower bound
2. Show that the E-Step update is correct
3. Show that the M-Step update is correct
Jensen’s Inequality

For the concave log function, the log of an expectation is greater than or equal to the expectation of the log:

    log[ E[y] ] ≥ E[ log[y] ]

or, written with integrals,

    log[ ∫ Pr(y) y dy ] ≥ ∫ Pr(y) log[y] dy
Lower Bound for EM Algorithm
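Restating the derivation in the book's notation:

    Σi log[ Pr(xi|θ) ] = Σi log[ ∫ Pr(xi, hi|θ) dhi ]
                       = Σi log[ ∫ qi(hi) ( Pr(xi, hi|θ) / qi(hi) ) dhi ]
                       ≥ Σi ∫ qi(hi) log[ Pr(xi, hi|θ) / qi(hi) ] dhi = B[{qi(hi)}, θ]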
where we have used Jensen’s inequality in the third line.
E-Step – Optimize bound w.r.t. {qi(hi)}

Using Pr(xi, hi|θ) = Pr(hi|xi, θ) Pr(xi|θ), the bound splits into two terms:

    B[{qi(hi)}, θ] = Σi ∫ qi(hi) log[ Pr(hi|xi, θ) / qi(hi) ] dhi + Σi log[ Pr(xi|θ) ]

The second term is constant w.r.t. qi(hi), so only the first term matters.
E-Step

The first term is the negative of the Kullback-Leibler divergence between qi(hi) and the posterior Pr(hi|xi, θ) – a measure of distance between probability distributions. We are maximizing the negative distance (i.e., minimizing the distance).
Use this relation:

    log[y] ≤ y − 1
Kullback-Leibler Divergence

Applying log[y] ≤ y − 1 inside the integral:

    ∫ qi(hi) log[ Pr(hi|xi, θ) / qi(hi) ] dhi ≤ ∫ qi(hi) [ Pr(hi|xi, θ)/qi(hi) − 1 ] dhi
                                              = ∫ Pr(hi|xi, θ) dhi − ∫ qi(hi) dhi = 0

so the Kullback-Leibler divergence is always non-negative.

In other words, the best we can do is choose qi(hi) so that this term is zero.
E-Step
The best we can do is choose qi(hi) so that this term is zero. How can we do this? Easy – choose the posterior:

    q̂i(hi) = Pr(hi|xi, θ)
M-Step – Optimize bound w.r.t. θ

In the M-Step we optimize the expected joint log likelihood with respect to the parameters θ (the expectation is taken w.r.t. the distribution from the E-Step):

    θ[t+1] = argmax_θ [ Σi ∫ q̂i(hi) log[ Pr(xi, hi|θ) ] dhi ]
E-Step & M-Step
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
Structure
• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Face Detection

(not a very realistic model – face detection is really done using discriminative methods)
Object recognition

Model as J independent patches (figure compares t-distribution and normal-distribution patch models).
Segmentation
Face Recognition
Pose Regression

Training data: frontal faces x; non-frontal faces w
Learning: fit a factor analysis model to the joint distribution of x,w
Inference: given a new x, infer w (conditional distribution of a normal)
Conditional Distributions

If

    Pr([x; w]) = Norm[ [μx; μw], [Σxx Σxw; Σwx Σww] ]

then

    Pr(w|x) = Norm_w[ μw + Σwx Σxx⁻¹ (x − μx), Σww − Σwx Σxx⁻¹ Σxw ]
Modeling transformations

More Related Content

What's hot

An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...NTNU
 
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural NetworksQuantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural NetworksValentin De Bortoli
 
Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr taeseon ryu
 
Dixon Deep Learning
Dixon Deep LearningDixon Deep Learning
Dixon Deep LearningSciCompIIT
 
Probability distributions for ml
Probability distributions for mlProbability distributions for ml
Probability distributions for mlSung Yub Kim
 
Fisherfaces Face Recognition Algorithm
Fisherfaces Face Recognition AlgorithmFisherfaces Face Recognition Algorithm
Fisherfaces Face Recognition AlgorithmVishnu K N
 
Cheatsheet recurrent-neural-networks
Cheatsheet recurrent-neural-networksCheatsheet recurrent-neural-networks
Cheatsheet recurrent-neural-networksSteve Nouri
 
A comparison of three learning methods to predict N20 fluxes and N leaching
A comparison of three learning methods to predict N20 fluxes and N leachingA comparison of three learning methods to predict N20 fluxes and N leaching
A comparison of three learning methods to predict N20 fluxes and N leachingtuxette
 
Joint contrastive learning with infinite possibilities
Joint contrastive learning with infinite possibilitiesJoint contrastive learning with infinite possibilities
Joint contrastive learning with infinite possibilitiestaeseon ryu
 
Machine learning for computer vision part 2
Machine learning for computer vision part 2Machine learning for computer vision part 2
Machine learning for computer vision part 2potaters
 
A note on word embedding
A note on word embeddingA note on word embedding
A note on word embeddingKhang Pham
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forestsChristian Robert
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Frank Nielsen
 
Measuring Dependency via Intrinsic Dimensionality (ICPR 2016)
Measuring Dependency via Intrinsic Dimensionality (ICPR 2016)Measuring Dependency via Intrinsic Dimensionality (ICPR 2016)
Measuring Dependency via Intrinsic Dimensionality (ICPR 2016)Simone Romano
 

What's hot (20)

An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
 
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural NetworksQuantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
 
Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr
 
Dixon Deep Learning
Dixon Deep LearningDixon Deep Learning
Dixon Deep Learning
 
Probability distributions for ml
Probability distributions for mlProbability distributions for ml
Probability distributions for ml
 
PMF BPMF and BPTF
PMF BPMF and BPTFPMF BPMF and BPTF
PMF BPMF and BPTF
 
Fisherfaces Face Recognition Algorithm
Fisherfaces Face Recognition AlgorithmFisherfaces Face Recognition Algorithm
Fisherfaces Face Recognition Algorithm
 
CC mmds talk 2106
CC mmds talk 2106CC mmds talk 2106
CC mmds talk 2106
 
Intractable likelihoods
Intractable likelihoodsIntractable likelihoods
Intractable likelihoods
 
Cheatsheet recurrent-neural-networks
Cheatsheet recurrent-neural-networksCheatsheet recurrent-neural-networks
Cheatsheet recurrent-neural-networks
 
A comparison of three learning methods to predict N20 fluxes and N leaching
A comparison of three learning methods to predict N20 fluxes and N leachingA comparison of three learning methods to predict N20 fluxes and N leaching
A comparison of three learning methods to predict N20 fluxes and N leaching
 
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
 
Joint contrastive learning with infinite possibilities
Joint contrastive learning with infinite possibilitiesJoint contrastive learning with infinite possibilities
Joint contrastive learning with infinite possibilities
 
Generalized Reinforcement Learning
Generalized Reinforcement LearningGeneralized Reinforcement Learning
Generalized Reinforcement Learning
 
Machine learning for computer vision part 2
Machine learning for computer vision part 2Machine learning for computer vision part 2
Machine learning for computer vision part 2
 
A note on word embedding
A note on word embeddingA note on word embedding
A note on word embedding
 
Deep Learning Opening Workshop - ProxSARAH Algorithms for Stochastic Composit...
Deep Learning Opening Workshop - ProxSARAH Algorithms for Stochastic Composit...Deep Learning Opening Workshop - ProxSARAH Algorithms for Stochastic Composit...
Deep Learning Opening Workshop - ProxSARAH Algorithms for Stochastic Composit...
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forests
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
 
Measuring Dependency via Intrinsic Dimensionality (ICPR 2016)
Measuring Dependency via Intrinsic Dimensionality (ICPR 2016)Measuring Dependency via Intrinsic Dimensionality (ICPR 2016)
Measuring Dependency via Intrinsic Dimensionality (ICPR 2016)
 

Viewers also liked

Snakes in Images (Active contour tutorial)
Snakes in Images (Active contour tutorial)Snakes in Images (Active contour tutorial)
Snakes in Images (Active contour tutorial)Yan Xu
 
Flexible Shape Matching
Flexible Shape MatchingFlexible Shape Matching
Flexible Shape MatchingAhmed Gad
 
Active contour segmentation
Active contour segmentationActive contour segmentation
Active contour segmentationNishant Jain
 
Image segmentation
Image segmentationImage segmentation
Image segmentationRania H
 
Image segmentation ppt
Image segmentation pptImage segmentation ppt
Image segmentation pptGichelle Amon
 

Viewers also liked (6)

Snakes in Images (Active contour tutorial)
Snakes in Images (Active contour tutorial)Snakes in Images (Active contour tutorial)
Snakes in Images (Active contour tutorial)
 
Flexible Shape Matching
Flexible Shape MatchingFlexible Shape Matching
Flexible Shape Matching
 
Active contour segmentation
Active contour segmentationActive contour segmentation
Active contour segmentation
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 
IMAGE SEGMENTATION.
IMAGE SEGMENTATION.IMAGE SEGMENTATION.
IMAGE SEGMENTATION.
 
Image segmentation ppt
Image segmentation pptImage segmentation ppt
Image segmentation ppt
 

Similar to 07 cv mil_modeling_complex_densities

09 cv mil_classification
09 cv mil_classification09 cv mil_classification
09 cv mil_classificationzukun
 
08 cv mil_regression
08 cv mil_regression08 cv mil_regression
08 cv mil_regressionzukun
 
04 cv mil_fitting_probability_models
04 cv mil_fitting_probability_models04 cv mil_fitting_probability_models
04 cv mil_fitting_probability_modelszukun
 
11 cv mil_models_for_chains_and_trees
11 cv mil_models_for_chains_and_trees11 cv mil_models_for_chains_and_trees
11 cv mil_models_for_chains_and_treeszukun
 
06 cv mil_learning_and_inference
06 cv mil_learning_and_inference06 cv mil_learning_and_inference
06 cv mil_learning_and_inferencezukun
 
17 cv mil_models_for_shape
17 cv mil_models_for_shape17 cv mil_models_for_shape
17 cv mil_models_for_shapezukun
 
15 cv mil_models_for_transformations
15 cv mil_models_for_transformations15 cv mil_models_for_transformations
15 cv mil_models_for_transformationszukun
 
13 cv mil_preprocessing
13 cv mil_preprocessing13 cv mil_preprocessing
13 cv mil_preprocessingzukun
 
16 cv mil_multiple_cameras
16 cv mil_multiple_cameras16 cv mil_multiple_cameras
16 cv mil_multiple_cameraszukun
 
19 cv mil_temporal_models
19 cv mil_temporal_models19 cv mil_temporal_models
19 cv mil_temporal_modelszukun
 
Graphical Models for chains, trees and grids
Graphical Models for chains, trees and gridsGraphical Models for chains, trees and grids
Graphical Models for chains, trees and gridspotaters
 
14 cv mil_the_pinhole_camera
14 cv mil_the_pinhole_camera14 cv mil_the_pinhole_camera
14 cv mil_the_pinhole_camerazukun
 
03 cv mil_probability_distributions
03 cv mil_probability_distributions03 cv mil_probability_distributions
03 cv mil_probability_distributionszukun
 
20 cv mil_models_for_words
20 cv mil_models_for_words20 cv mil_models_for_words
20 cv mil_models_for_wordszukun
 
02 cv mil_intro_to_probability
02 cv mil_intro_to_probability02 cv mil_intro_to_probability
02 cv mil_intro_to_probabilityzukun
 
12 cv mil_models_for_grids
12 cv mil_models_for_grids12 cv mil_models_for_grids
12 cv mil_models_for_gridszukun
 
Common Probability Distibution
Common Probability DistibutionCommon Probability Distibution
Common Probability DistibutionLukas Tencer
 
Introduction to Probability
Introduction to ProbabilityIntroduction to Probability
Introduction to ProbabilityLukas Tencer
 
18 cv mil_style_and_identity
18 cv mil_style_and_identity18 cv mil_style_and_identity
18 cv mil_style_and_identityzukun
 

Similar to 07 cv mil_modeling_complex_densities (20)

09 cv mil_classification
09 cv mil_classification09 cv mil_classification
09 cv mil_classification
 
08 cv mil_regression
08 cv mil_regression08 cv mil_regression
08 cv mil_regression
 
04 cv mil_fitting_probability_models
04 cv mil_fitting_probability_models04 cv mil_fitting_probability_models
04 cv mil_fitting_probability_models
 
11 cv mil_models_for_chains_and_trees
11 cv mil_models_for_chains_and_trees11 cv mil_models_for_chains_and_trees
11 cv mil_models_for_chains_and_trees
 
06 cv mil_learning_and_inference
06 cv mil_learning_and_inference06 cv mil_learning_and_inference
06 cv mil_learning_and_inference
 
17 cv mil_models_for_shape
17 cv mil_models_for_shape17 cv mil_models_for_shape
17 cv mil_models_for_shape
 
15 cv mil_models_for_transformations
15 cv mil_models_for_transformations15 cv mil_models_for_transformations
15 cv mil_models_for_transformations
 
13 cv mil_preprocessing
13 cv mil_preprocessing13 cv mil_preprocessing
13 cv mil_preprocessing
 
16 cv mil_multiple_cameras
16 cv mil_multiple_cameras16 cv mil_multiple_cameras
16 cv mil_multiple_cameras
 
19 cv mil_temporal_models
19 cv mil_temporal_models19 cv mil_temporal_models
19 cv mil_temporal_models
 
Graphical Models for chains, trees and grids
Graphical Models for chains, trees and gridsGraphical Models for chains, trees and grids
Graphical Models for chains, trees and grids
 
14 cv mil_the_pinhole_camera
14 cv mil_the_pinhole_camera14 cv mil_the_pinhole_camera
14 cv mil_the_pinhole_camera
 
03 cv mil_probability_distributions
03 cv mil_probability_distributions03 cv mil_probability_distributions
03 cv mil_probability_distributions
 
20 cv mil_models_for_words
20 cv mil_models_for_words20 cv mil_models_for_words
20 cv mil_models_for_words
 
02 cv mil_intro_to_probability
02 cv mil_intro_to_probability02 cv mil_intro_to_probability
02 cv mil_intro_to_probability
 
12 cv mil_models_for_grids
12 cv mil_models_for_grids12 cv mil_models_for_grids
12 cv mil_models_for_grids
 
Common Probability Distibution
Common Probability DistibutionCommon Probability Distibution
Common Probability Distibution
 
Introduction to Probability
Introduction to ProbabilityIntroduction to Probability
Introduction to Probability
 
18 cv mil_style_and_identity
18 cv mil_style_and_identity18 cv mil_style_and_identity
18 cv mil_style_and_identity
 
Lecture16 xing
Lecture16 xingLecture16 xing
Lecture16 xing
 

More from zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

More from zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Recently uploaded

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

07 cv mil_modeling_complex_densities

  • 1. Computer vision: models, learning and inference Chapter 7 Modeling complex densities Please send errata to s.prince@cs.ucl.ac.uk
  • 2. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 2
  • 3. Models for machine vision Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 3
  • 4. Face Detection Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 4
  • 5. Type 3: Pr(x|w) - Generative How to model Pr(x|w)? – Choose an appropriate form for Pr(x) – Make parameters a function of w – Function takes parameters q that define its shape Learning algorithm: learn parameters q from training data x,w Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 5
  • 6. Classification Model Or writing in terms of class conditional density functions Parameters m0, S0 learnt just from data S0 where w=0 Similarly, parameters m1, S1 learnt just from data S1 where w=1 Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 6
  • 7. Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 7
  • 8. Experiment 1000 non-faces 1000 faces 60x60x3 Images =10800 x1 vectors Equal priors Pr(y=1)=Pr(y=0) = 0.5 75% performance on test set. Not very good! Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 8
  • 9. Results (diagonal covariance) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 9
  • 10. The plan... Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 10
  • 11. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 11
  • 12. Hidden (or latent) Variables Key idea: represent density Pr(x) as marginalization of joint density with another variable h that we do not see Will also depend on some parameters: Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 12
  • 13. Hidden (or latent) Variables Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 13
  • 14. Expectation Maximization An algorithm specialized to fitting pdfs which are the marginalization of a joint distribution Defines a lower bound on log likelihood and increases bound iteratively Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 14
  • 15. Lower bound Lower bound is a function of parameters q and a set of probability distributions qi(hi) Expectation Maximization (EM) algorithm alternates E- steps and M-Steps E-Step – Maximize bound w.r.t. distributions q(hi) M-Step – Maximize bound w.r.t. parameters q Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 15
  • 16. Lower bound Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 16
  • 17. E-Step & M-Step E-Step – Maximize bound w.r.t. distributions qi(hi) M-Step – Maximize bound w.r.t. parameters q Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 17
  • 18. E-Step & M-Step Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 18
  • 19. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 19
  • 20. Mixture of Gaussians (MoG) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 20
  • 21. ML Learning Q. Can we learn this model with maximum likelihood? A. Yes, but using brute force approach is tricky • If you take derivative and set to zero, can’t solve -- the log of the sum causes problems • Have to enforce constraints on parameters • covariances must be positive definite • weights must sum to one Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 21
  • 22. MoG as a marginalization Define a variable and then write Then we can recover the density by marginalizing Pr(x,h) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 22
  • 23. MoG as a marginalization Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 23
  • 24. MoG as a marginalization Define a variable and then write Note : • This gives us a method to generate data from MoG • First sample Pr(h), then sample Pr(x|h) • The hidden variable h has a clear interpretation – it tells you which Gaussian created data point x Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 24
  • 25. Expectation Maximization for MoG GOAL: to learn parameters from training data E-Step – Maximize bound w.r.t. distributions q(hi) M-Step – Maximize bound w.r.t. parameters q Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 25
  • 26. E-Step We’ll call this the responsibility of the kth Gaussian for the ith data point Repeat this procedure for every datapoint! Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 26
  • 27. M-Step Take derivative, equate to zero and solve (Lagrange multipliers for l) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 27
  • 28. M-Step Update means , covariances and weights according to responsibilities of datapoints Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 28
  • 29. Iterate until no further improvement E-Step M-Step Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 29
  • 30. Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 30
  • 31. Different flavours... Full Diagonal Same covariance covariance covariance Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 31
  • 32. Local Minima Start from three random positions L = 98.76 L = 96.97 L = 94.35 Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 32
  • 33. Means of face/non-face model Classification  84% (9% improvement!) Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 33
  • 34. The plan... Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 34
  • 35. Structure • Densities for classification • Models with hidden variables – Mixture of Gaussians – t-distributions – Factor analysis • EM algorithm in detail • Applications Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 35
  • 36. Student t-distributions Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 36
  • 37. Student t-distributions motivation The normal distribution is not very robust – a single outlier can completely throw it off because the tails fall off so fast... Normal distribution Normal distribution t-distribution w/ one extra datapoint! Normal distribution Normal distribution t-distribution w/ one extra datapoint! Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 37
  • 38. Student t-distributions Univariate student t-distribution Multivariate student t-distribution Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 38
  • 39. t-distribution as a marginalization Define hidden variable h Can be expressed as a marginalization Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 39
  • 40. Gamma distribution Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 40
  • 41. t-distribution as a marginalization Define hidden variable h Things to note: • Again this provides a method to sample from the t-distribution • Variable h has a clear interpretation: • Each datum drawn from a Gaussian, mean m • Covariance depends inversely on h • Can think of this as an infinite mixture (sum becomes integral) of Gaussians w/ same mean, but different variances Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 41
t-distribution
EM for t-distributions

GOAL: to learn the parameters θ = {μ, Σ, ν} from training data

E-Step – Maximize bound w.r.t. distributions q_i(h_i)
M-Step – Maximize bound w.r.t. parameters θ
E-Step

Compute the posterior over each hidden variable h_i and extract the expectations needed for the M-Step.
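The extracted expectations were shown as equation images; writing δ_i = (x_i − μ)ᵀΣ⁻¹(x_i − μ), the standard results are:

```latex
\Pr(h_i\,|\,\mathbf{x}_i) = \mathrm{Gam}_{h_i}\!\left[\frac{\nu + D}{2},\; \frac{\nu + \delta_i}{2}\right]
\\[4pt]
\mathrm{E}[h_i] = \frac{\nu + D}{\nu + \delta_i}, \qquad
\mathrm{E}[\log h_i] = \psi\!\left[\frac{\nu + D}{2}\right] - \log\!\left[\frac{\nu + \delta_i}{2}\right]
```

where ψ[·] is the digamma function.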
M-Step

Maximize the bound with respect to μ, Σ, and ν, where the expectations come from the E-Step (updates reconstructed below).
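The closed-form parts of the update, in standard form:

```latex
\hat{\boldsymbol{\mu}} = \frac{\sum_{i=1}^{I} \mathrm{E}[h_i]\,\mathbf{x}_i}{\sum_{i=1}^{I} \mathrm{E}[h_i]}, \qquad
\hat{\boldsymbol{\Sigma}} = \frac{\sum_{i=1}^{I} \mathrm{E}[h_i]\,(\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^{\mathsf{T}}}{\sum_{i=1}^{I} \mathrm{E}[h_i]}
```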
Updates

There is no closed-form solution for ν; we must optimize the bound directly. Since ν is a single number, we can optimize it by evaluating the bound on a fine grid of values and picking the best.
EM algorithm for t-distributions
The plan...
Structure

• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Factor Analysis

A compromise between:
– Full covariance matrix (desirable but many parameters)
– Diagonal covariance matrix (no modelling of covariance)

Models full covariance in a subspace spanned by the factors F; mops everything else up with a diagonal component Σ.
Data Density

Consider modelling the distribution of 100×100×3 RGB images, i.e. D_x = 30,000 dimensions:

• Gaussian with spherical covariance: 1 covariance parameter
• Gaussian with diagonal covariance: D_x covariance parameters
• Full Gaussian: ~D_x² covariance parameters
• PPCA: ~D_x·D_h covariance parameters
• Factor analysis: ~D_x·(D_h + 1) covariance parameters (full covariance in subspace + diagonal)
Subspaces
Factor Analysis

A compromise between:
– Full covariance matrix (desirable but many parameters)
– Diagonal covariance matrix (no modelling of covariance)

Models full covariance in a subspace spanned by the factors F; mops everything else up with a diagonal component Σ.
Factor Analysis as a Marginalization

Let's define a normal prior over the hidden variable h and a normal conditional distribution for x given h. Then it can be shown (it is not obvious) that marginalizing over h gives the factor analysis density.
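A standard reconstruction of the definitions and the resulting marginalization:

```latex
\Pr(\mathbf{h}) = \mathrm{Norm}_{\mathbf{h}}[\mathbf{0}, \mathbf{I}], \qquad
\Pr(\mathbf{x}\,|\,\mathbf{h}) = \mathrm{Norm}_{\mathbf{x}}[\boldsymbol{\mu} + \mathbf{F}\mathbf{h}, \boldsymbol{\Sigma}]
\\[4pt]
\Pr(\mathbf{x}) = \int \Pr(\mathbf{x}\,|\,\mathbf{h})\Pr(\mathbf{h})\, d\mathbf{h}
              = \mathrm{Norm}_{\mathbf{x}}[\boldsymbol{\mu},\; \mathbf{F}\mathbf{F}^{\mathsf{T}} + \boldsymbol{\Sigma}]
```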
Sampling from factor analyzer
Factor analysis vs. MoG
E-Step

• Compute the posterior over the hidden variables
• Compute the expectations needed for the M-Step
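The posterior and expectations were equation images; the standard results are:

```latex
\Pr(\mathbf{h}_i\,|\,\mathbf{x}_i) = \mathrm{Norm}_{\mathbf{h}_i}\!\left[
(\mathbf{F}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\mathbf{F} + \mathbf{I})^{-1}\mathbf{F}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i - \boldsymbol{\mu}),\;
(\mathbf{F}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\mathbf{F} + \mathbf{I})^{-1}\right]
\\[4pt]
\mathrm{E}[\mathbf{h}_i] = (\mathbf{F}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\mathbf{F} + \mathbf{I})^{-1}\mathbf{F}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i - \boldsymbol{\mu})
\\[4pt]
\mathrm{E}[\mathbf{h}_i\mathbf{h}_i^{\mathsf{T}}] = (\mathbf{F}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\mathbf{F} + \mathbf{I})^{-1} + \mathrm{E}[\mathbf{h}_i]\,\mathrm{E}[\mathbf{h}_i]^{\mathsf{T}}
```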
M-Step

• Optimize the parameters w.r.t. the EM bound, giving closed-form expressions for μ, F, and Σ (reconstructed below).
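The standard factor analysis M-Step expressions:

```latex
\hat{\boldsymbol{\mu}} = \frac{1}{I}\sum_{i=1}^{I}\mathbf{x}_i
\\[4pt]
\hat{\mathbf{F}} = \left[\sum_{i=1}^{I}(\mathbf{x}_i - \hat{\boldsymbol{\mu}})\,\mathrm{E}[\mathbf{h}_i]^{\mathsf{T}}\right]
\left[\sum_{i=1}^{I}\mathrm{E}[\mathbf{h}_i\mathbf{h}_i^{\mathsf{T}}]\right]^{-1}
\\[4pt]
\hat{\boldsymbol{\Sigma}} = \frac{1}{I}\sum_{i=1}^{I}\mathrm{diag}\!\left[(\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^{\mathsf{T}} - \hat{\mathbf{F}}\,\mathrm{E}[\mathbf{h}_i](\mathbf{x}_i - \hat{\boldsymbol{\mu}})^{\mathsf{T}}\right]
```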
Face model
Sampling from 10-parameter model

To generate:
• Choose factor loadings h_i from a standard normal distribution
• Multiply by the factors F
• Add the mean μ
• (Should also add a random noise component ε_i with diagonal covariance Σ)

A code sketch of this procedure follows.
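A minimal NumPy sketch of the procedure above – the function name and argument layout are my own:

```python
import numpy as np

def sample_factor_analyzer(mu, F, sigma_diag, n_samples, rng=None):
    """Draw samples x = mu + F h + eps from a fitted factor analyzer.

    mu: (D,) mean; F: (D, K) factor matrix; sigma_diag: (D,) diagonal noise variances.
    """
    rng = rng or np.random.default_rng()
    D, K = F.shape
    h = rng.standard_normal((n_samples, K))                           # h ~ Norm(0, I)
    eps = rng.standard_normal((n_samples, D)) * np.sqrt(sigma_diag)   # diagonal noise
    return mu + h @ F.T + eps
```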
Combining Models

The three models can be combined in various ways:
• Mixture of t-distributions = mixture of Gaussians + t-distributions
• Robust subspace models = t-distributions + factor analysis
• Mixture of factor analyzers = mixture of Gaussians + factor analysis

Or combine all three to create a mixture of robust subspace models.
Structure

• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Expectation Maximization

Problem: optimize cost functions of the form shown below (discrete and continuous cases).

Solution: the Expectation Maximization (EM) algorithm (Dempster, Laird and Rubin 1977).

Key idea: define a lower bound on the log likelihood and increase it at each iteration.
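The cost functions were equation images; the standard forms are:

```latex
\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta}}{\operatorname{argmax}}
\sum_{i=1}^{I} \log\!\left[\sum_{h_i} \Pr(\mathbf{x}_i, h_i\,|\,\boldsymbol{\theta})\right]
\quad \text{(discrete case)}
\\[6pt]
\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta}}{\operatorname{argmax}}
\sum_{i=1}^{I} \log\!\left[\int \Pr(\mathbf{x}_i, \mathbf{h}_i\,|\,\boldsymbol{\theta})\, d\mathbf{h}_i\right]
\quad \text{(continuous case)}
```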
Expectation Maximization

EM defines a lower bound on the log likelihood and increases the bound iteratively.
Lower bound

The lower bound is a function of the parameters θ and of a set of probability distributions {q_i(h_i)}.
E-Step & M-Step

E-Step – Maximize bound w.r.t. distributions {q_i(h_i)}
M-Step – Maximize bound w.r.t. parameters θ
E-Step: update {q_i(h_i)} so that the bound equals the log likelihood for the current parameters θ.
M-Step: update θ to the maximum of the bound.
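Written out, the two updates (reconstructed in standard form) are:

```latex
\text{E-Step:}\quad \hat{q}_i(\mathbf{h}_i) = \Pr(\mathbf{h}_i\,|\,\mathbf{x}_i, \boldsymbol{\theta}^{[t]})
\\[6pt]
\text{M-Step:}\quad \boldsymbol{\theta}^{[t+1]} = \underset{\boldsymbol{\theta}}{\operatorname{argmax}}
\sum_{i=1}^{I} \int \hat{q}_i(\mathbf{h}_i)\, \log\!\left[\Pr(\mathbf{x}_i, \mathbf{h}_i\,|\,\boldsymbol{\theta})\right] d\mathbf{h}_i
```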
Missing Parts of Argument

1. Show that this is a lower bound
2. Show that the E-Step update is correct
3. Show that the M-Step update is correct
Jensen's Inequality

The two equivalent forms used here are reconstructed below.
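Since log is concave, the log of an expectation is at least the expectation of the log:

```latex
\log\!\left[\int \Pr(y)\, y\, dy\right] \;\ge\; \int \Pr(y) \log[y]\, dy
\\[6pt]
\text{or, similarly,}\quad
\log\!\left[\sum_{j} \lambda_j\, y_j\right] \;\ge\; \sum_{j} \lambda_j \log[y_j]
\qquad \lambda_j \ge 0,\; \textstyle\sum_j \lambda_j = 1
```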
Lower Bound for EM Algorithm

Applying Jensen's inequality to the log likelihood gives the bound (derivation reconstructed below).
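A standard reconstruction of the derivation:

```latex
\sum_{i=1}^{I} \log\!\left[\int \Pr(\mathbf{x}_i, \mathbf{h}_i\,|\,\boldsymbol{\theta})\, d\mathbf{h}_i\right]
= \sum_{i=1}^{I} \log\!\left[\int q_i(\mathbf{h}_i)\,
\frac{\Pr(\mathbf{x}_i, \mathbf{h}_i\,|\,\boldsymbol{\theta})}{q_i(\mathbf{h}_i)}\, d\mathbf{h}_i\right]
\;\ge\; \sum_{i=1}^{I} \int q_i(\mathbf{h}_i)
\log\!\left[\frac{\Pr(\mathbf{x}_i, \mathbf{h}_i\,|\,\boldsymbol{\theta})}{q_i(\mathbf{h}_i)}\right] d\mathbf{h}_i
= \mathcal{B}\big[\{q_i(\mathbf{h}_i)\}, \boldsymbol{\theta}\big]
```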
E-Step – Optimize bound w.r.t. {q_i(h_i)}

Splitting the bound into two terms, one is constant w.r.t. q_i(h_i); only the other term matters.
E-Step

The remaining term is a Kullback-Leibler divergence – a measure of distance between probability distributions. We are maximizing the negative distance (i.e. minimizing the distance).
Use the relation log[y] ≤ y − 1 (with equality only at y = 1).
Kullback-Leibler Divergence

It follows that the KL divergence is non-negative. In other words, the best we can do is to choose q_i(h_i) so that it is exactly zero.
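Using log[y] ≤ y − 1 with y = p/q gives the standard non-negativity argument:

```latex
D_{KL}\big[q\,\|\,p\big]
= \int q(\mathbf{h}) \log\!\left[\frac{q(\mathbf{h})}{p(\mathbf{h})}\right] d\mathbf{h}
= -\int q(\mathbf{h}) \log\!\left[\frac{p(\mathbf{h})}{q(\mathbf{h})}\right] d\mathbf{h}
\;\ge\; -\int q(\mathbf{h}) \left[\frac{p(\mathbf{h})}{q(\mathbf{h})} - 1\right] d\mathbf{h}
= -\int p(\mathbf{h})\, d\mathbf{h} + \int q(\mathbf{h})\, d\mathbf{h} = 0
```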
E-Step

Since the KL term is non-negative, the best we can do is choose q_i(h_i) so that it is zero. How can we do this? Easy – choose the posterior Pr(h_i|x_i).
M-Step – Optimize bound w.r.t. θ

In the M-Step we optimize the expected joint log likelihood with respect to the parameters θ (the expectation is taken w.r.t. the distributions from the E-Step).
E-Step & M-Step

E-Step – Maximize bound w.r.t. distributions {q_i(h_i)}
M-Step – Maximize bound w.r.t. parameters θ
Structure

• Densities for classification
• Models with hidden variables
  – Mixture of Gaussians
  – t-distributions
  – Factor analysis
• EM algorithm in detail
• Applications
Face Detection

(Not a very realistic model – face detection is really done using discriminative methods.)
Object recognition

Model each object as J independent patches, using either t-distributions or normal distributions for the patch densities.
Segmentation
Face Recognition
Pose Regression

Training data: frontal faces x; non-frontal faces w
Learning: fit a factor analysis model to the joint distribution of x, w
Inference: given a new x, infer w (the conditional distribution of a normal)
Conditional Distributions

If x and w are jointly normal, then the conditional distribution of w given x is also normal (reconstructed below).
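The standard conditional-Gaussian result that the slide's equations expressed:

```latex
\text{If}\quad
\Pr\!\left(\begin{bmatrix}\mathbf{x}\\ \mathbf{w}\end{bmatrix}\right)
= \mathrm{Norm}\!\left[
\begin{bmatrix}\boldsymbol{\mu}_x\\ \boldsymbol{\mu}_w\end{bmatrix},
\begin{bmatrix}\boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xw}\\
\boldsymbol{\Sigma}_{wx} & \boldsymbol{\Sigma}_{ww}\end{bmatrix}\right]
\\[6pt]
\text{then}\quad
\Pr(\mathbf{w}\,|\,\mathbf{x}) = \mathrm{Norm}_{\mathbf{w}}\!\left[
\boldsymbol{\mu}_w + \boldsymbol{\Sigma}_{wx}\boldsymbol{\Sigma}_{xx}^{-1}(\mathbf{x} - \boldsymbol{\mu}_x),\;
\boldsymbol{\Sigma}_{ww} - \boldsymbol{\Sigma}_{wx}\boldsymbol{\Sigma}_{xx}^{-1}\boldsymbol{\Sigma}_{xw}\right]
```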
Modeling transformations