SlideShare une entreprise Scribd logo
1  sur  37
Télécharger pour lire hors ligne
Object Detection with
     Discriminatively Trained Part
            Based Models
                Pedro F. Felzenszwalb
            Department of Computer Science
                University of Chicago


Joint with David Mcallester, Deva Ramanan, Ross Girshick
PASCAL Challenge
•   ~10,000 images, with ~25,000 target objects
    - Objects from 20 categories (person, car, bicycle, cow, table...)
    - Objects are annotated with labeled bounding boxes
Why is it hard?

•   Objects in rich categories exhibit significant variability
    - Photometric variation
    - Viewpoint variation
    - Intra-class variability
      - Cars come in a variety of shapes (sedan, minivan, etc)
      - People wear different clothes and take different poses
We need rich object models
But this leads to difficult matching and training problems
Starting point: sliding window classifiers


                                               Feature vector
                                               x = [... , ... , ... , ... ]




•   Detect objects by testing each subwindow
    - Reduces object detection to binary classification
    - Dalal & Triggs: HOG features + linear SVM classifier
    - Previous state of the art for detecting people
Histogram of Gradient (HOG) features




•   Image is partitioned into 8x8 pixel blocks

•   In each block we compute a histogram of gradient orientations
    - Invariant to changes in lighting, small deformations, etc.
•   Compute features at different resolutions (pyramid)
HOG Filters
•   Array of weights for features in subwindow of HOG pyramid

•   Score is dot product of filter and feature vector
                                    p


                                             Filter F


                                            Score of F at position p is
                                                   F ⋅ φ(p, H)

                                           φ(p, H) = concatenation of
                                              HOG features from
                         HOG pyramid H     subwindow specified by p
Dalal & Triggs: HOG + linear SVMs


                             φ(p, H)


                             φ(q, H)




                  There is much more background than objects
                  Start with random negatives and repeat:
                   1) Train a model
                   2) Harvest false positives to define “hard negatives”
Typical form of
   a model
Overview of our models




•   Mixture of deformable part models

•   Each component has global template + deformable parts

•   Fully trained from bounding boxes alone
2 component bicycle model




   root filters        part filters     deformation
coarse resolution   finer resolution     models

        Each component has a root filter F0
           and n part models (Fi, vi, di)
Object hypothesis


                                              z = (p0,..., pn)
                                              p0 : location of root
                                          p1,..., pn : location of parts




                                            Score is sum of filter
                                               scores minus
                                             deformation costs

Image pyramid       HOG feature pyramid


Multiscale model captures features at two-resolutions
Score of a hypothesis


                                 “data term”               “spatial prior”
                             n                         n
score(p0 , . . . , pn ) =         Fi · φ(H, pi ) −         di · (dx2 , dyi )
                                                                   i
                                                                         2

                            i=0                      i=1        displacements

                                 filters              deformation parameters


                                      score(z) = β · Ψ(H, z)

                            concatenation filters and       concatenation of HOG
                            deformation parameters            features and part
                                                           displacement features
Matching

•   Define an overall score for each root location

    - Based on best placement of parts
         score(p0 ) = max score(p0 , . . . , pn ).
                       p1 ,...,pn



•   High scoring root locations define detections

    - “sliding window approach”
•   Efficient computation: dynamic programming +
    generalized distance transforms (max-convolution)
input image
 head filter

Response of filter in l-th pyramid level
Rl (x, y) = F · φ(H, (x, y, l))
 cross-correlation

Transformed response
Dl (x, y) = max Rl (x + dx, y + dy) − di · (dx2 , dy 2 )
              dx,dy

  max-convolution, computed in linear time
        (spreading, local max, etc)
model
 feature map                      feature map at twice the resolution




                                         x                  ...              x
             x




                                                            ...
                                             response of part filters


response of root filter

                                                            ...
                                             transformed responses


                                         +

        color encoding of filter
           response values

                                                                          combined score of
                                                                            root locations
Matching results




(after non-maximum suppression)
 ~1 second to search all scales
Training
•   Training data consists of images with labeled bounding boxes.

•   Need to learn the model structure, filters and deformation costs.




                                    Training
Latent SVM (MI-SVM)
   Classifiers that score an example x using
      fβ (x) = max β · Φ(x, z)
                z∈Z(x)

   β are model parameters
   z are latent values

  Training data D = ( x1 , y1 , . . . , xn , yn )     yi ∈ {−1, 1}
  We would like to find β such that: yi fβ (xi ) > 0

Minimize
                           n
          1
  LD (β) = ||β||2 + C           max(0, 1 − yi fβ (xi ))
          2               i=1
Semi-convexity

• Maximum of convex functions is convex
• fβ (x) = z∈Z(x) β · Φ(x, z) is convex in β
            max

• max(0, 1 − yi fβ (xi )) is convex for negative examples

                       n
         1
 LD (β) = ||β||2 + C         max(0, 1 − yi fβ (xi ))
         2             i=1

 Convex if latent values for positive examples are fixed
Latent SVM training
                                        n
                       1
               LD (β) = ||β||2 + C           max(0, 1 − yi fβ (xi ))
                       2               i=1


•   Convex if we fix z for positive examples

•   Optimization:

    -   Initialize β and iterate:

        -   Pick best z for each positive example

        -   Optimize β via gradient descent with data-mining
Training Models
•   Reduce to Latent SVM training problem

•   Positive example specifies some z should have high score

•   Bounding box defines range of root locations
    - Parts can be anywhere
    - This defines Z(x)
Background

•   Negative example specifies no z should have high score

•   One negative example per root location in a background image
    - Huge number of negative examples
    - Consistent with requiring low false-positive rate
Training algorithm, nested iterations
Fix “best” positive latent values for positives
      Harvest high scoring (x,z) pairs from background images
      Update model using gradient descent
      Trow away (x,z) pairs with low score



•   Sequence of training rounds
    - Train root filters
    - Initialize parts from root
    - Train final model
Car model




   root filters         part filters     deformation
coarse resolution    finer resolution     models
Person model




   root filters      part filters     deformation
coarse resolution finer resolution     models
Cat model




   root filters        part filters     deformation
coarse resolution   finer resolution     models
Bottle model




   root filters      part filters     deformation
coarse resolution finer resolution     models
Car detections

high scoring true positives   high scoring false positives
Person detections

                              high scoring false positives
high scoring true positives      (not enough overlap)
Horse detections

high scoring true positives   high scoring false positives
Cat detections

                               high scoring false positives
high scoring true positives
                                  (not enough overlap)
Quantitative results

•   7 systems competed in the 2008 challenge

•   Out of 20 classes we got:
    - First place in 7 classes
    - Second place in 8 classes
•   Some statistics:
    - It takes ~2 seconds to evaluate a model in one image
    - It takes ~4 hours to train a model
    - MUCH faster than most systems.
Precision/Recall results on Bicycles 2008
Precision/Recall results on Person 2008
Precision/Recall results on Bird 2008
Comparison of Car models on 2006 data

                                  class: car, year 2006
               1

              0.9

              0.8

              0.7

              0.6
  precision




              0.5

              0.4

              0.3
                    1 Root (0.48)
              0.2
                    2 Root (0.58)
                    1 Root+Parts (0.55)
              0.1   2 Root+Parts (0.62)
                    2 Root+Parts+BB (0.64)
               0
                0   0.1     0.2       0.3            0.4   0.5   0.6   0.7
                                            recall
Summary
•   Deformable models for object detection

    - Fast matching algorithms
    - Learning from weakly-labeled data
    - Leads to state-of-the-art results in PASCAL challenge
•   Future work:

    - Hierarchical models
    - Visual grammars
    - AO* search (coarse-to-fine)

Contenu connexe

Tendances

CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slidesCVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slideszukun
 
01 graphical models
01 graphical models01 graphical models
01 graphical modelszukun
 
CS 354 Acceleration Structures
CS 354 Acceleration StructuresCS 354 Acceleration Structures
CS 354 Acceleration StructuresMark Kilgard
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future TrendCVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trendzukun
 
Lesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite IntegralsLesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite IntegralsMatthew Leingang
 
Masters Thesis Defense
Masters Thesis DefenseMasters Thesis Defense
Masters Thesis Defensessj4mathgenius
 
Time Machine session @ ICME 2012 - DTW's New Youth
Time Machine session @ ICME 2012 - DTW's New YouthTime Machine session @ ICME 2012 - DTW's New Youth
Time Machine session @ ICME 2012 - DTW's New YouthXavier Anguera
 
Identity Based Encryption
Identity Based EncryptionIdentity Based Encryption
Identity Based EncryptionPratik Poddar
 
bag-of-words models
bag-of-words models bag-of-words models
bag-of-words models Xiaotao Zou
 
CS 354 Global Illumination
CS 354 Global IlluminationCS 354 Global Illumination
CS 354 Global IlluminationMark Kilgard
 
Physics of Algorithms Talk
Physics of Algorithms TalkPhysics of Algorithms Talk
Physics of Algorithms Talkjasonj383
 
Form 5 formulae and note
Form 5 formulae and noteForm 5 formulae and note
Form 5 formulae and notesmktsj2
 
Computational tools for Bayesian model choice
Computational tools for Bayesian model choiceComputational tools for Bayesian model choice
Computational tools for Bayesian model choiceChristian Robert
 
Integrated Math 2 Section 6-2
Integrated Math 2 Section 6-2Integrated Math 2 Section 6-2
Integrated Math 2 Section 6-2Jimbo Lamb
 
Note on Coupled Line Cameras for Rectangle Reconstruction (ACDDE 2012)
Note on Coupled Line Cameras for Rectangle Reconstruction (ACDDE 2012)Note on Coupled Line Cameras for Rectangle Reconstruction (ACDDE 2012)
Note on Coupled Line Cameras for Rectangle Reconstruction (ACDDE 2012)Joo-Haeng Lee
 
Practical computation of Hecke operators
Practical computation of Hecke operatorsPractical computation of Hecke operators
Practical computation of Hecke operatorsMathieu Dutour Sikiric
 

Tendances (20)

CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slidesCVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
CVPR2010: Advanced ITinCVPR in a Nutshell: part 4: additional slides
 
01 graphical models
01 graphical models01 graphical models
01 graphical models
 
CS 354 Acceleration Structures
CS 354 Acceleration StructuresCS 354 Acceleration Structures
CS 354 Acceleration Structures
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future TrendCVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
 
Lesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite IntegralsLesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite Integrals
 
Masters Thesis Defense
Masters Thesis DefenseMasters Thesis Defense
Masters Thesis Defense
 
Curve fitting
Curve fittingCurve fitting
Curve fitting
 
16 17 bag_words
16 17 bag_words16 17 bag_words
16 17 bag_words
 
Time Machine session @ ICME 2012 - DTW's New Youth
Time Machine session @ ICME 2012 - DTW's New YouthTime Machine session @ ICME 2012 - DTW's New Youth
Time Machine session @ ICME 2012 - DTW's New Youth
 
Identity Based Encryption
Identity Based EncryptionIdentity Based Encryption
Identity Based Encryption
 
bag-of-words models
bag-of-words models bag-of-words models
bag-of-words models
 
CS 354 Global Illumination
CS 354 Global IlluminationCS 354 Global Illumination
CS 354 Global Illumination
 
Physics of Algorithms Talk
Physics of Algorithms TalkPhysics of Algorithms Talk
Physics of Algorithms Talk
 
Gottlob ICDE 2011
Gottlob ICDE 2011Gottlob ICDE 2011
Gottlob ICDE 2011
 
Lecture9
Lecture9Lecture9
Lecture9
 
Form 5 formulae and note
Form 5 formulae and noteForm 5 formulae and note
Form 5 formulae and note
 
Computational tools for Bayesian model choice
Computational tools for Bayesian model choiceComputational tools for Bayesian model choice
Computational tools for Bayesian model choice
 
Integrated Math 2 Section 6-2
Integrated Math 2 Section 6-2Integrated Math 2 Section 6-2
Integrated Math 2 Section 6-2
 
Note on Coupled Line Cameras for Rectangle Reconstruction (ACDDE 2012)
Note on Coupled Line Cameras for Rectangle Reconstruction (ACDDE 2012)Note on Coupled Line Cameras for Rectangle Reconstruction (ACDDE 2012)
Note on Coupled Line Cameras for Rectangle Reconstruction (ACDDE 2012)
 
Practical computation of Hecke operators
Practical computation of Hecke operatorsPractical computation of Hecke operators
Practical computation of Hecke operators
 

En vedette

Real time pedestrian detection with deformable part models [h. cho, p. rybski...
Real time pedestrian detection with deformable part models [h. cho, p. rybski...Real time pedestrian detection with deformable part models [h. cho, p. rybski...
Real time pedestrian detection with deformable part models [h. cho, p. rybski...tino
 
High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1NVIDIA
 
Andrey V. Savchenko - Sequential Hierarchical Image Recognition based on the ...
Andrey V. Savchenko - Sequential Hierarchical Image Recognition based on the ...Andrey V. Savchenko - Sequential Hierarchical Image Recognition based on the ...
Andrey V. Savchenko - Sequential Hierarchical Image Recognition based on the ...AIST
 
Learning Object Detectors From Weakly Supervised Image Data
Learning Object Detectors From Weakly Supervised Image DataLearning Object Detectors From Weakly Supervised Image Data
Learning Object Detectors From Weakly Supervised Image DataYandex
 
Deformable Part Models are Convolutional Neural Networks
Deformable Part Models are Convolutional Neural NetworksDeformable Part Models are Convolutional Neural Networks
Deformable Part Models are Convolutional Neural NetworksWei Yang
 
Pedestrian Detection Technology - Brochure
Pedestrian Detection Technology - BrochurePedestrian Detection Technology - Brochure
Pedestrian Detection Technology - BrochureMobileye
 

En vedette (6)

Real time pedestrian detection with deformable part models [h. cho, p. rybski...
Real time pedestrian detection with deformable part models [h. cho, p. rybski...Real time pedestrian detection with deformable part models [h. cho, p. rybski...
Real time pedestrian detection with deformable part models [h. cho, p. rybski...
 
High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1
 
Andrey V. Savchenko - Sequential Hierarchical Image Recognition based on the ...
Andrey V. Savchenko - Sequential Hierarchical Image Recognition based on the ...Andrey V. Savchenko - Sequential Hierarchical Image Recognition based on the ...
Andrey V. Savchenko - Sequential Hierarchical Image Recognition based on the ...
 
Learning Object Detectors From Weakly Supervised Image Data
Learning Object Detectors From Weakly Supervised Image DataLearning Object Detectors From Weakly Supervised Image Data
Learning Object Detectors From Weakly Supervised Image Data
 
Deformable Part Models are Convolutional Neural Networks
Deformable Part Models are Convolutional Neural NetworksDeformable Part Models are Convolutional Neural Networks
Deformable Part Models are Convolutional Neural Networks
 
Pedestrian Detection Technology - Brochure
Pedestrian Detection Technology - BrochurePedestrian Detection Technology - Brochure
Pedestrian Detection Technology - Brochure
 

Similaire à Object Detection with Discriminatively Trained Part Based Models

Team meeting 100325
Team meeting 100325Team meeting 100325
Team meeting 100325Yi-Hsin Liu
 
ICCV2009: MAP Inference in Discrete Models: Part 2
ICCV2009: MAP Inference in Discrete Models: Part 2ICCV2009: MAP Inference in Discrete Models: Part 2
ICCV2009: MAP Inference in Discrete Models: Part 2zukun
 
Discrete Models in Computer Vision
Discrete Models in Computer VisionDiscrete Models in Computer Vision
Discrete Models in Computer VisionYap Wooi Hen
 
CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2zukun
 
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdfMcSwathi
 
CVPR2010: higher order models in computer vision: Part 4
CVPR2010: higher order models in computer vision: Part 4 CVPR2010: higher order models in computer vision: Part 4
CVPR2010: higher order models in computer vision: Part 4 zukun
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative ModelsKenta Oono
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingEnthought, Inc.
 
Submodularity slides
Submodularity slidesSubmodularity slides
Submodularity slidesdragonthu
 
Regression_1.pdf
Regression_1.pdfRegression_1.pdf
Regression_1.pdfAmir Saleh
 
Lesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite IntegralsLesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite IntegralsMatthew Leingang
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance charlesmartin14
 
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow AnalysisDetecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow AnalysisSilvio Cesare
 
Nonlinear programming 2013
Nonlinear programming 2013Nonlinear programming 2013
Nonlinear programming 2013sharifz
 
Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031frdos
 
Functionsandpigeonholeprinciple
FunctionsandpigeonholeprincipleFunctionsandpigeonholeprinciple
FunctionsandpigeonholeprincipleShiwani Gupta
 

Similaire à Object Detection with Discriminatively Trained Part Based Models (20)

Team meeting 100325
Team meeting 100325Team meeting 100325
Team meeting 100325
 
ICCV2009: MAP Inference in Discrete Models: Part 2
ICCV2009: MAP Inference in Discrete Models: Part 2ICCV2009: MAP Inference in Discrete Models: Part 2
ICCV2009: MAP Inference in Discrete Models: Part 2
 
Discrete Models in Computer Vision
Discrete Models in Computer VisionDiscrete Models in Computer Vision
Discrete Models in Computer Vision
 
CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2
 
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
 
point processing
point processingpoint processing
point processing
 
CVPR2010: higher order models in computer vision: Part 4
CVPR2010: higher order models in computer vision: Part 4 CVPR2010: higher order models in computer vision: Part 4
CVPR2010: higher order models in computer vision: Part 4
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative Models
 
Fourier Transforms
Fourier TransformsFourier Transforms
Fourier Transforms
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
 
Submodularity slides
Submodularity slidesSubmodularity slides
Submodularity slides
 
Regression_1.pdf
Regression_1.pdfRegression_1.pdf
Regression_1.pdf
 
Exponents)
Exponents)Exponents)
Exponents)
 
Lesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite IntegralsLesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite Integrals
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance
 
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow AnalysisDetecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
 
Nonlinear programming 2013
Nonlinear programming 2013Nonlinear programming 2013
Nonlinear programming 2013
 
Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031
 
Functionsandpigeonholeprinciple
FunctionsandpigeonholeprincipleFunctionsandpigeonholeprinciple
Functionsandpigeonholeprinciple
 

Plus de zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featureszukun
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...zukun
 
Quoc le tera-scale deep learning
Quoc le   tera-scale deep learningQuoc le   tera-scale deep learning
Quoc le tera-scale deep learningzukun
 

Plus de zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant features
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
 
Quoc le tera-scale deep learning
Quoc le   tera-scale deep learningQuoc le   tera-scale deep learning
Quoc le tera-scale deep learning
 

Dernier

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Dernier (20)

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 

Object Detection with Discriminatively Trained Part Based Models

  • 1. Object Detection with Discriminatively Trained Part Based Models Pedro F. Felzenszwalb Department of Computer Science University of Chicago Joint with David Mcallester, Deva Ramanan, Ross Girshick
  • 2. PASCAL Challenge • ~10,000 images, with ~25,000 target objects - Objects from 20 categories (person, car, bicycle, cow, table...) - Objects are annotated with labeled bounding boxes
  • 3.
  • 4. Why is it hard? • Objects in rich categories exhibit significant variability - Photometric variation - Viewpoint variation - Intra-class variability - Cars come in a variety of shapes (sedan, minivan, etc) - People wear different clothes and take different poses We need rich object models But this leads to difficult matching and training problems
  • 5. Starting point: sliding window classifiers Feature vector x = [... , ... , ... , ... ] • Detect objects by testing each subwindow - Reduces object detection to binary classification - Dalal & Triggs: HOG features + linear SVM classifier - Previous state of the art for detecting people
  • 6. Histogram of Gradient (HOG) features • Image is partitioned into 8x8 pixel blocks • In each block we compute a histogram of gradient orientations - Invariant to changes in lighting, small deformations, etc. • Compute features at different resolutions (pyramid)
  • 7. HOG Filters • Array of weights for features in subwindow of HOG pyramid • Score is dot product of filter and feature vector p Filter F Score of F at position p is F ⋅ φ(p, H) φ(p, H) = concatenation of HOG features from HOG pyramid H subwindow specified by p
  • 8. Dalal & Triggs: HOG + linear SVMs φ(p, H) φ(q, H) There is much more background than objects Start with random negatives and repeat: 1) Train a model 2) Harvest false positives to define “hard negatives” Typical form of a model
  • 9. Overview of our models • Mixture of deformable part models • Each component has global template + deformable parts • Fully trained from bounding boxes alone
  • 10. 2 component bicycle model root filters part filters deformation coarse resolution finer resolution models Each component has a root filter F0 and n part models (Fi, vi, di)
  • 11. Object hypothesis z = (p0,..., pn) p0 : location of root p1,..., pn : location of parts Score is sum of filter scores minus deformation costs Image pyramid HOG feature pyramid Multiscale model captures features at two-resolutions
  • 12. Score of a hypothesis “data term” “spatial prior” n n score(p0 , . . . , pn ) = Fi · φ(H, pi ) − di · (dx2 , dyi ) i 2 i=0 i=1 displacements filters deformation parameters score(z) = β · Ψ(H, z) concatenation filters and concatenation of HOG deformation parameters features and part displacement features
  • 13. Matching • Define an overall score for each root location - Based on best placement of parts score(p0 ) = max score(p0 , . . . , pn ). p1 ,...,pn • High scoring root locations define detections - “sliding window approach” • Efficient computation: dynamic programming + generalized distance transforms (max-convolution)
  • 14. input image head filter Response of filter in l-th pyramid level Rl (x, y) = F · φ(H, (x, y, l)) cross-correlation Transformed response Dl (x, y) = max Rl (x + dx, y + dy) − di · (dx2 , dy 2 ) dx,dy max-convolution, computed in linear time (spreading, local max, etc)
  • 15. model feature map feature map at twice the resolution x ... x x ... response of part filters response of root filter ... transformed responses + color encoding of filter response values combined score of root locations
  • 16. Matching results (after non-maximum suppression) ~1 second to search all scales
  • 17. Training • Training data consists of images with labeled bounding boxes. • Need to learn the model structure, filters and deformation costs. Training
  • 18. Latent SVM (MI-SVM) Classifiers that score an example x using fβ (x) = max β · Φ(x, z) z∈Z(x) β are model parameters z are latent values Training data D = ( x1 , y1 , . . . , xn , yn ) yi ∈ {−1, 1} We would like to find β such that: yi fβ (xi ) > 0 Minimize n 1 LD (β) = ||β||2 + C max(0, 1 − yi fβ (xi )) 2 i=1
  • 19. Semi-convexity • Maximum of convex functions is convex • fβ (x) = z∈Z(x) β · Φ(x, z) is convex in β max • max(0, 1 − yi fβ (xi )) is convex for negative examples n 1 LD (β) = ||β||2 + C max(0, 1 − yi fβ (xi )) 2 i=1 Convex if latent values for positive examples are fixed
  • 20. Latent SVM training n 1 LD (β) = ||β||2 + C max(0, 1 − yi fβ (xi )) 2 i=1 • Convex if we fix z for positive examples • Optimization: - Initialize β and iterate: - Pick best z for each positive example - Optimize β via gradient descent with data-mining
  • 21. Training Models • Reduce to Latent SVM training problem • Positive example specifies some z should have high score • Bounding box defines range of root locations - Parts can be anywhere - This defines Z(x)
  • 22. Background • Negative example specifies no z should have high score • One negative example per root location in a background image - Huge number of negative examples - Consistent with requiring low false-positive rate
  • 23. Training algorithm, nested iterations Fix “best” positive latent values for positives Harvest high scoring (x,z) pairs from background images Update model using gradient descent Trow away (x,z) pairs with low score • Sequence of training rounds - Train root filters - Initialize parts from root - Train final model
  • 24. Car model root filters part filters deformation coarse resolution finer resolution models
  • 25. Person model root filters part filters deformation coarse resolution finer resolution models
  • 26. Cat model root filters part filters deformation coarse resolution finer resolution models
  • 27. Bottle model root filters part filters deformation coarse resolution finer resolution models
  • 28. Car detections high scoring true positives high scoring false positives
  • 29. Person detections high scoring false positives high scoring true positives (not enough overlap)
  • 30. Horse detections high scoring true positives high scoring false positives
  • 31. Cat detections high scoring false positives high scoring true positives (not enough overlap)
  • 32. Quantitative results • 7 systems competed in the 2008 challenge • Out of 20 classes we got: - First place in 7 classes - Second place in 8 classes • Some statistics: - It takes ~2 seconds to evaluate a model in one image - It takes ~4 hours to train a model - MUCH faster than most systems.
  • 36. Comparison of Car models on 2006 data class: car, year 2006 1 0.9 0.8 0.7 0.6 precision 0.5 0.4 0.3 1 Root (0.48) 0.2 2 Root (0.58) 1 Root+Parts (0.55) 0.1 2 Root+Parts (0.62) 2 Root+Parts+BB (0.64) 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 recall
  • 37. Summary • Deformable models for object detection - Fast matching algorithms - Learning from weakly-labeled data - Leads to state-of-the-art results in PASCAL challenge • Future work: - Hierarchical models - Visual grammars - AO* search (coarse-to-fine)