6.870 Grounding object recognition and scene understanding
Wednesdays 1-4pm
Room 13-1143
Instructor: Antonio Torralba
Email: torralba@csail.mit.edu

http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
Some slides are borrowed from other classes (see links on the course
web site). Let me know if I forget to give credit to the right people.
http://groups.csail.mit.edu/vision/courses/6.869/
Grading

•  Class participation: 20%

•  Paper presentations: 40%

•  Course project: 40%
Course project
•  Topics for projects: a project can derive from one
   of the papers studied or from your own
   research.

•  Work individually or in pairs.

•  Results described in a 4-page CVPR-style
   paper

•  Short presentation at the end of the
   semester
Paper presentations (40%)
Email me at the end of the class for scheduling the next week. We will
  first decide how to structure the week together.

•  Presenter:
    –  Present the key ideas, background material, and technical details.
    –  Show me the slides two days before the class.
    –  Test the basic ideas of the paper(s), using code available online or
       by writing toy code.
    –  Create toy test problems that reveal something about the algorithm.
    –  Constructive criticism.
Readings
6.870 Grounding object recognition and scene understanding

Lecture 1
Class goals and a short introduction
What is vision?
•  What does it mean, to see? “to know what is
   where by looking”.
•  How to discover from images what is present
   in the world, where things are, what actions
   are taking place.

 from Marr, 1982
The importance of images
Some images are more important than others

“Dora Maar au Chat”
Pablo Picasso, 1941

100 million $
Why is vision hard?
The structure of ambient light
The Plenoptic Function
                                Adelson & Bergen, 91




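  The intensity P can be parameterized as:

                          P(θ, φ, λ, t, X, Y, Z)

  where (θ, φ) is the viewing direction, λ the wavelength, t the time,
  and (X, Y, Z) the position of the viewpoint.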
“The complete set of all convergence points constitutes the permanent possibilities
of vision.” Gibson
Why is vision hard?
Measuring light vs. measuring scene properties




     We perceive two squares, one on top of the other.
Measuring light vs. measuring scene properties




                            by Roger Shepard (“Turning the Tables”)


        Depth processing is automatic, and we cannot shut it down…
Measuring light vs. measuring scene properties




                        (c) 2006 Walt Anthony
Assumptions can be wrong

Ames room
By Aude Oliva
Why is vision hard?
Some things have strong variations in appearance
Some things know that you have eyes

Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422
A short history of vision
The early optimism
The crisis of the 80’s
Object recognition
Is it really so hard?

Yes, object recognition is hard…
(or at least it seems so for now…)
Challenges 1: viewpoint variation




Michelangelo 1475-1564
Challenges 2: illumination




                             slide credit: S. Ullman
Challenges 3: occlusion




         Magritte, 1957
Challenges 4: scale
Challenges 5: deformation




                            Xu, Beihong 1943
Challenges 6: background clutter




      Klimt, 1913
Challenges 7: intra-class variation
Challenges




Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422
Discover the camouflaged object




Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422
Any guesses?
So, let’s make the problem simpler:
Block world




Nice framework to develop fancy math, but too far from reality…
                                           Object Recognition in the Geometric Era:
                                           a Retrospective. Joseph L. Mundy. 2006
Binford and generalized cylinders




                                 Object Recognition in the Geometric Era:
                                 a Retrospective. Joseph L. Mundy. 2006
Binford and generalized cylinders
Recognition by components



Irving Biederman
Recognition-by-Components: A Theory of Human Image Understanding.
Psychological Review, 1987.
Recognition by components
The fundamental assumption of the proposed theory,
recognition-by-components (RBC), is that a modest set of
generalized-cone components, called geons (N = 36), can be
derived from contrasts of five readily detectable properties of
edges in a two-dimensional image: curvature, collinearity,
symmetry, parallelism, and cotermination.

The “contribution lies in its proposal for a particular vocabulary
of components derived from perceptual mechanisms and its
account of how an arrangement of these components can
access a representation of an object in memory.”
A do-it-yourself example




1)  We know that this object is nothing we know
2)  We can split this object into parts that everybody will agree on
3)  We can see how it resembles something familiar: “a hot dog cart”


“The naive realism that emerges in descriptions of nonsense objects may be
   reflecting the workings of a representational system by which objects are
   identified.”
Stages of processing




“Parsing is performed, primarily at concave regions, simultaneously with a
detection of nonaccidental properties.”
Non accidental properties
Certain properties of edges in a two-dimensional image are taken by the visual
system as strong evidence that the edges in the three-dimensional world contain those
same properties.

Non accidental properties (Witkin & Tenenbaum, 1983): properties that are rarely produced by
accidental alignments of viewpoint and object features and consequently are generally
unaffected by slight variations in viewpoint.

                                         image




                                                          ?
Examples:
•  Collinearity
•  Smoothness
•  Symmetry
•  Parallelism
•  Cotermination
From generalized cylinders to GEONS




“From variation over only two or three levels in the nonaccidental relations of four
attributes of generalized cylinders, a set of 36 GEONS can be generated.”
 Geons represent a restricted form of generalized cylinders.
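The count of 36 in the quote above can be checked by enumerating attribute combinations. The sketch below is only an illustration: the four attribute names and their levels (2 x 3 x 3 x 2) are recalled from Biederman's 1987 paper and are not stated on this slide, so treat them as an assumption.

```python
from itertools import product

# Assumed attribute levels (recalled from Biederman 1987, not given on the slide).
ATTRIBUTES = {
    "edge": ["straight", "curved"],
    "symmetry": ["rotational and reflective", "reflective", "asymmetric"],
    "size": ["constant", "expanded", "expanded and contracted"],
    "axis": ["straight", "curved"],
}


def enumerate_geons(attributes=ATTRIBUTES):
    """Return one dict per combination of attribute levels."""
    names = list(attributes)
    return [dict(zip(names, levels)) for levels in product(*attributes.values())]


geons = enumerate_geons()
print(len(geons))  # 2 * 3 * 3 * 2 = 36
```

Each entry is a qualitative description rather than a metric shape, which matches the idea that geons are a restricted form of generalized cylinders.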
Objects and their geons
Scenes and geons




                      Mezzanotte & Biederman
The importance of spatial arrangement
Parts and Structure approaches
With a different perspective, these models focused more on the
   geometry than on defining the constituent elements:

•    Fischler & Elschlager 1973
•    Yuille ‘91
•    Brunelli & Poggio ‘93
•    Lades, v.d. Malsburg et al. ‘93
•    Cootes, Lanitis, Taylor et al. ‘95
•    Amit & Geman ‘95, ‘99
•    Perona et al. ‘95, ‘96, ’98, ’00, ’03, ‘04, ‘05
•    Felzenszwalb & Huttenlocher ’00, ’04
•    Crandall & Huttenlocher ’05, ’06
•    Leibe & Schiele ’03, ’04
•    Many papers since 2000

Figure from [Fischler & Elschlager 73]
But, despite promising initial results… things did not
work out so well (lack of data, processing power, lack
of reliable methods for low-level and mid-level
vision).

Instead, a different way of thinking about object
detection started making some progress: learning
based approaches and classifiers, which ignored low
and mid-level vision.

Maybe the time is here to come back to some of the
earlier models, more grounded in intuitions about
visual perception.
Renewed optimism
Neocognitron
          Fukushima (1980). Hierarchical multilayered neural network




S-cells work as feature-extracting cells. They resemble simple cells of the
primary visual cortex in their response.
C-cells, which resemble complex cells in the visual cortex, are inserted in the
network to allow for positional errors in the features of the stimulus. The input
connections of C-cells, which come from S-cells of the preceding layer, are fixed
and invariable. Each C-cell receives excitatory input connections from a group
of S-cells that extract the same feature, but from slightly different positions. The
C-cell responds if at least one of these S-cells yields an output.
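A minimal sketch of the C-cell idea described above, assuming the S-cell outputs are stored as one non-negative feature map per feature. Taking the maximum over a small neighbourhood is a simplification used here for illustration; it is not Fukushima's exact C-cell response function, and the name `c_cell_layer` and the `pool` parameter are hypothetical.

```python
import numpy as np


def c_cell_layer(s_responses, pool=2):
    """Pool S-cell outputs of the same feature over pool x pool neighbourhoods.

    s_responses: array of shape (features, height, width), non-negative.
    Returns an array of shape (features, height // pool, width // pool).
    A C-cell is active whenever at least one S-cell in its group is active.
    """
    f, h, w = s_responses.shape
    h2, w2 = h // pool, w // pool
    # Drop any rows/columns that do not fill a complete neighbourhood.
    trimmed = s_responses[:, : h2 * pool, : w2 * pool]
    blocks = trimmed.reshape(f, h2, pool, w2, pool)
    return blocks.max(axis=(2, 4))


# The same feature detected at two slightly different positions
# gives the same C-cell response.
a = np.zeros((1, 4, 4))
a[0, 0, 0] = 1.0
b = np.zeros((1, 4, 4))
b[0, 1, 1] = 1.0
print(np.array_equal(c_cell_layer(a), c_cell_layer(b)))
```

The fixed pooling has no learned weights, which mirrors the statement that the input connections of C-cells are fixed and invariable.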
Neocognitron	
  




         Learning is done greedily for each layer
Convolutional Neural Network




                                                   Le Cun et al, 98




The output neurons share all the intermediate levels
Face detection and the success of learning based approaches

•  The representation and matching of pictorial structures - Fischler, Elschlager (1973)
•  Face recognition using eigenfaces - M. Turk and A. Pentland (1991)
•  Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995)
•  Graded Learning for Object Detection - Fleuret, Geman (1999)
•  Robust Real-time Object Detection - Viola, Jones (2001)
•  Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001)
•  ….
Faces everywhere




http://www.marcofolio.net/imagedump/faces_everywhere_15_images_8_illusions.html
The face age




  Feret dataset, 1996 DARPA

Rapid Object Detection Using a Boosted Cascade of Simple Features




                             Paul Viola     Michael J. Jones
                    Mitsubishi Electric Research Laboratories (MERL)
                                      Cambridge, MA


                Most of this work was done at Compaq CRL before the authors moved to MERL

Manuscript available on web:
http://citeseer.ist.psu.edu/cache/papers/cs/23183/http:zSzzSzwww.ai.mit.eduzSzpeoplezSzviolazSzresearchzSzpublicationszSzICCV01-Viola-Jones.pdf/viola01robust.pdf
Haar-like filters and cascades
Viola and Jones, ICCV 2001




                               The average intensity in the
                               block is computed with four
                               sums independently of the
                               block size.
Also Fleuret and Geman, 2001
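The constant-time block computation mentioned above can be sketched with an integral image (summed-area table), the representation used by Viola and Jones. This is a minimal illustration and not the authors' code; the function names are hypothetical, and the two-rectangle feature is just one example of a Haar-like filter.

```python
import numpy as np


def integral_image(img):
    """Summed-area table with a zero row and column prepended.

    ii[r, c] holds the sum of img[:r, :c].
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of a block from four table lookups, whatever the block size."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


def box_mean(ii, top, left, height, width):
    """Average intensity of the block."""
    return box_sum(ii, top, left, height, width) / (height * width)


def two_rect_feature(ii, top, left, height, width):
    """Haar-like feature: left half of the block minus right half."""
    half = width // 2
    left_sum = box_sum(ii, top, left, height, half)
    right_sum = box_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


img = np.arange(20).reshape(4, 5)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 3) == img[1:3, 1:4].sum())
```

The table is built once per image; after that, every filter at every position and scale costs a handful of lookups, which is what makes evaluating many simple features in a cascade affordable.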
Families of recognition algorithms

Bag of words models
•  Csurka, Dance, Fan, Willamowski, and Bray 2004
•  Sivic, Russell, Freeman, Zisserman, ICCV 2005

Voting models
•  Viola and Jones, ICCV 2001
•  Heisele, Poggio, et al., NIPS 01
•  Schneiderman, Kanade 2004
•  Vidal-Naquet, Ullman 2003

Shape matching, deformable models
•  Berg, Berg, Malik, 2005
•  Cootes, Edwards, Taylor, 2001

Constellation models
•  Fischler and Elschlager, 1973
•  Burl, Leung, and Perona, 1995
•  Weber, Welling, and Perona, 2000
•  Fergus, Perona, & Zisserman, CVPR 2003

Rigid template models
•  Sirovich and Kirby 1987
•  Turk, Pentland, 1991
•  Dalal & Triggs, 2006
Scene understanding
•  Torralba, Sinha (2001)
•  Fink & Perona (2003)
•  Torralba, Murphy, Freeman (2004)
•  Carbonetto, de Freitas & Barnard (2004)
•  Sudderth, Torralba, Willsky, Freeman (2005)
•  Hoiem, Efros, Hebert (2005)
•  Kumar, Hebert (2005)
•  Rabinovich et al (2007)
•  Heitz and Koller (2008)
•  Desai, Ramanan, and Fowlkes (2009)
•  Choi, Lim, Torralba, Willsky (2010)
NSF Frontiers in computer vision workshop, 2011
MobilEye
Demo: Google Goggles
The labeling crisis
[Image with object labels: SKY, TREE, PERSON, BENCH, PATH, LAKE, DUCK, SIGN, GRASS]
So what does object recognition involve?




                            Slide by Fei-Fei, Fergus, Torralba
Verification: is that a lamp?




Detection: are there people?




Identification: is that Potala Palace?




Object categorization

[Image with object labels: mountain, tree, building, banner, street lamp, vendor, people]
Scene and context categorization
                        •  outdoor
                        •  city
                        •  …




Is this space large or small?
How far are the buildings in the back?




Activity




What is this person doing?
                             What are these two doing??




What are we tuned to?

The visual system is tuned to process structures
typically found in the world.
The visual system seems to be tuned to a set of images:




                                                    Demo inspired from D. Field
Remember these images
Did you see this image?
Remember these images
        Test 2
Did you see this image?
Data
Human vision
• Many input modalities
• Active
• Supervised, unsupervised, semi-supervised
learning. It can look for supervision.




Robot vision
• Many poor input modalities
• Active, but it does not go far


Internet vision
• Many input modalities
• It can reach everywhere
• Tons of data
Kinect
Active stereo with structured light



                                                     Li Zhang’s one-shot stereo

[Diagrams: a projector with camera 1 and camera 2; a projector with camera 1]

          Project “structured” light patterns onto the object
                 •  simplifies the correspondence problem
Li Zhang, Brian Curless, and Steven M. Seitz. Rapid Shape Acquisition Using Color Structured
Light and Multi-pass Dynamic Programming. In Proceedings of the 1st International
Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT), Padova, Italy,
June 19-21, 2002, pp. 24-36.	

Slide credit: Rick Szeliski, CSE 576, Spring 2008 (Stereo matching)
Willow garage




 http://www.willowgarage.com/pages/pr2/overview
Class goals

•  Vision and language

•  Vision and robotics

•  Vision and others
  The strategies our visual system uses are tuned to our visual world


          To provide the right vision tools for non-vision experts
          Thinking about the tasks to find new representations

 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 

Dernier (20)

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 

MIT6.870 Grounding Object Recognition and Scene Understanding: lecture 1

  • 1. 6.870 Grounding object recognition and scene understanding Wednesdays 1-4pm Room 13-1143 Instructor: Antonio Torralba Email: torralba@csail.mit.edu http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm Some slides are borrowed from other classes (see links on the course web site). Let me know if I forget to give credit to the right people.
  • 3. Grading •  Class participation: 20% •  Paper presentations: 40% •  Course project: 40%
  • 4. Course project •  Topics for projects: It can derive from one of the papers studied or from your own research. •  Work individually or in pairs. •  Results described as a 4 pages CVPR paper •  Short presentation at the end of the semester
  • 5. Paper presentations (40%) Email me at the end of the class for scheduling the next week. We will first decide how to structure the week together. •  Presenter: –  Present the key ideas, background material, and technical details. –  Show me the slides two days before the class. –  To test the basic ideas of the paper(s), using code available online or writing toy code. –  Create toy test problems that reveal something about the algorithm. –  Constructive criticism.
  • 7. 6.870 Grounding object recognition and scene understanding Lecture 1: Class goals and a short introduction
  • 8. What is vision? •  What does it mean, to see? “to know what is where by looking”. •  How to discover from images what is present in the world, where things are, what actions are taking place. from Marr, 1982
  • 9. The importance of images Some images are more important than others “Dora Maar au Chat” Pablo Picasso, 1941 100 million $
  • 10. Why is vision hard?
  • 11. The structure of ambient light
  • 12. The structure of ambient light
  • 13. The Plenoptic Function Adelson & Bergen, 91 The intensity P can be parameterized as: P (θ, φ, λ, t, X, Y, Z) “The complete set of all convergence points constitutes the permanent possibilities of vision.” Gibson
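One way to read the parameterization P (θ, φ, λ, t, X, Y, Z) is that an ordinary grayscale image is a two-dimensional slice of it: the viewpoint (X, Y, Z), the wavelength λ and the time t are held fixed while the viewing angles (θ, φ) vary. The sketch below illustrates that reading only. The function `toy_plenoptic` is a made-up smooth function, not a physical light model, and the names `toy_plenoptic` and `pinhole_image`, the angular range and the units are all assumptions introduced for this example.

```python
import numpy as np

def toy_plenoptic(theta, phi, lam, t, X, Y, Z):
    """Hypothetical stand-in for P(theta, phi, lam, t, X, Y, Z).

    Not a physical model: just a smooth, non-negative function of
    all seven arguments so that slices of it can be inspected.
    """
    spatial = 1.0 + np.cos(theta + 0.1 * X) * np.sin(phi + 0.1 * Y + 0.05 * Z)
    spectral = np.exp(-((lam - 550.0) / 100.0) ** 2)  # lam in nanometres (assumed)
    temporal = 1.0 + 0.5 * np.sin(t)
    return spatial * spectral * temporal

def pinhole_image(viewpoint, lam, t, n=8):
    """Sample an n x n image: fix (X, Y, Z), lam and t, vary (theta, phi)."""
    X, Y, Z = viewpoint
    theta = np.linspace(-0.5, 0.5, n)  # viewing angles in radians (assumed range)
    phi = np.linspace(-0.5, 0.5, n)
    th, ph = np.meshgrid(theta, phi, indexing="ij")
    return toy_plenoptic(th, ph, lam, t, X, Y, Z)

# Two images taken at the same instant and wavelength from different viewpoints.
img_a = pinhole_image((0.0, 0.0, 0.0), lam=550.0, t=0.0)
img_b = pinhole_image((5.0, 0.0, 0.0), lam=550.0, t=0.0)
print(img_a.shape, float(np.abs(img_a - img_b).max()))
```

Moving the viewpoint, changing λ or advancing t selects a different slice of the same seven-dimensional function, which is the sense in which a camera measures only a small part of it.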
  • 18. Why is vision hard?
  • 19. Measuring light vs. measuring scene properties We perceive two squares, one on top of the other.
  • 20. Measuring light vs. measuring scene properties by Roger Shepard (“Turning the Tables”) Depth processing is automatic, and we cannot shut it down…
  • 21. Measuring light vs. measuring scene properties
  • 22. Measuring light vs. measuring scene properties
  • 23. Measuring light vs. measuring scene properties (c) 2006 Walt Anthony
  • 24. Assumptions can be wrong Ames room
  • 26. Why is vision hard?
  • 27. Some things have strong variations in appearance
  • 28. Some things know that you have eyes Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422
  • 29. A short history of vision
  • 32. The crisis of the 80’s
  • 33. Object recognition Is it really so hard? Yes, object recognition is hard… (or at least it seems so for now…)
  • 34. Challenges 1: view point variation Michelangelo 1475-1564
  • 35. Challenges 2: illumination slide credit: S. Ullman
  • 36. Challenges 3: occlusion Magritte, 1957
  • 38. Challenges 5: deformation Xu, Beihong 1943
  • 39. Challenges 6: background clutter Klimt, 1913
  • 41. Challenges Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422
  • 42. Discover the camouflaged object Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422
  • 43. Discover the camouflaged object Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422
  • 51. So, let’s make the problem simpler: Block world Nice framework to develop fancy math, but too far from reality… Object Recognition in the Geometric Era: a Retrospective. Joseph L. Mundy. 2006
  • 52. Binford and generalized cylinders Object Recognition in the Geometric Era: a Retrospective. Joseph L. Mundy. 2006
  • 53. Binford and generalized cylinders
  • 54. Recognition by components Irving Biederman Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review, 1987.
  • 55. Recognition by components The fundamental assumption of the proposed theory, recognition-by-components (RBC), is that a modest set of generalized-cone components, called geons (N = 36), can be derived from contrasts of five readily detectable properties of edges in a two-dimensional image: curvature, collinearity, symmetry, parallelism, and cotermination. The “contribution lies in its proposal for a particular vocabulary of components derived from perceptual mechanisms and its account of how an arrangement of these components can access a representation of an object in memory.”
  • 56. A do-it-yourself example 1)  We know that this object is nothing we know 2)  We can split this object into parts that everybody will agree on 3)  We can see how it resembles something familiar: “a hot dog cart” “The naive realism that emerges in descriptions of nonsense objects may be reflecting the workings of a representational system by which objects are identified.”
  • 57. Stages of processing “Parsing is performed, primarily at concave regions, simultaneously with a detection of nonaccidental properties.”
  • 58. Non accidental properties Certain properties of edges in a two-dimensional image are taken by the visual system as strong evidence that the edges in the three-dimensional world contain those same properties. Non accidental properties (Witkin & Tenenbaum, 1983): they are rarely produced by accidental alignments of viewpoint and object features and consequently are generally unaffected by slight variations in viewpoint.
  • 59. Examples: •  Collinearity •  Smoothness •  Symmetry •  Parallelism •  Cotermination
  • 60. From generalized cylinders to GEONS “From variation over only two or three levels in the nonaccidental relations of four attributes of generalized cylinders, a set of 36 GEONS can be generated.” Geons represent a restricted form of generalized cylinders.
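The count of 36 follows from simple combinatorics: four attributes with two or three levels each. The sketch below enumerates the combinations. The attribute names and level labels are taken from memory of Biederman (1987) and do not appear on the slide, so they should be read as an assumption; the arithmetic (2 × 3 × 3 × 2 = 36) is the point.

```python
from itertools import product

# Four attributes of a generalized cylinder, each with two or three
# qualitative levels. Labels are assumed (after Biederman, 1987);
# the slide itself only gives the counts.
attributes = {
    "cross-section edge": ["straight", "curved"],
    "cross-section symmetry": ["rotational and reflective", "reflective", "asymmetric"],
    "cross-section size along the axis": ["constant", "expanded", "expanded and contracted"],
    "axis": ["straight", "curved"],
}

# Every combination of levels defines one candidate geon.
geons = [dict(zip(attributes, combo)) for combo in product(*attributes.values())]

print(len(geons))  # 2 * 3 * 3 * 2 = 36
print(geons[0])
```

Each attribute level is a qualitative, nonaccidental distinction (straight vs. curved, constant vs. expanding), which is why the vocabulary stays small.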
  • 61. Objects and their geons
  • 62. Scenes and geons Mezzanotte & Biederman
  • 63. The importance of spatial arrangement
  • 64. Parts and Structure approaches With a different perspective, these models focused more on the geometry than on defining the constituent elements: •  Fischler & Elschlager 1973 •  Yuille ’91 •  Brunelli & Poggio ’93 •  Lades, v.d. Malsburg et al. ’93 •  Cootes, Lanitis, Taylor et al. ’95 •  Amit & Geman ’95, ’99 •  Perona et al. ’95, ’96, ’98, ’00, ’03, ’04, ’05 •  Felzenszwalb & Huttenlocher ’00, ’04 •  Crandall & Huttenlocher ’05, ’06 •  Leibe & Schiele ’03, ’04 •  Many papers since 2000 Figure from [Fischler & Elschlager 73]
  • 65. But, despite promising initial results… things did not work out so well (lack of data, processing power, lack of reliable methods for low-level and mid-level vision). Instead, a different way of thinking about object detection started making some progress: learning based approaches and classifiers, which ignored low and mid-level vision. Maybe the time is here to come back to some of the earlier models, more grounded in intuitions about visual perception.
  • 67. Neocognitron Fukushima (1980). Hierarchical multilayered neural network S-cells work as feature-extracting cells. They resemble simple cells of the primary visual cortex in their response. C-cells, which resemble complex cells in the visual cortex, are inserted in the network to allow for positional errors in the features of the stimulus. The input connections of C-cells, which come from S-cells of the preceding layer, are fixed and invariable. Each C-cell receives excitatory input connections from a group of S-cells that extract the same feature, but from slightly different positions. The C-cell responds if at least one of these S-cells yields an output.
  • 68. Neocognitron Learning is done greedily for each layer
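The S-cell / C-cell description above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Fukushima's formulation: the S-layer is written as a thresholded template correlation, and the C-layer as a maximum over a small non-overlapping neighbourhood, which is a common simplification of "responds if at least one of these S-cells yields an output". The names `s_layer` and `c_layer`, the template, the threshold and the pool size are all hypothetical.

```python
import numpy as np

def s_layer(image, template, threshold):
    """S-cells: one cell per position, active where the patch matches the template."""
    h, w = template.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + h, j:j + w]
            out[i, j] = max(0.0, float(np.sum(patch * template)) - threshold)
    return out

def c_layer(s_out, pool=2):
    """C-cells: fixed connections; active if any S-cell in its pool x pool group is active."""
    out = np.zeros((s_out.shape[0] // pool, s_out.shape[1] // pool))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            group = s_out[i * pool:(i + 1) * pool, j * pool:(j + 1) * pool]
            out[i, j] = group.max()
    return out

# A vertical-bar template and two inputs whose bar differs by one pixel.
template = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=float)
bar_a = np.zeros((6, 6)); bar_a[:, 3] = 1.0
bar_b = np.zeros((6, 6)); bar_b[:, 4] = 1.0

s_a, s_b = s_layer(bar_a, template, 2.0), s_layer(bar_b, template, 2.0)
c_a, c_b = c_layer(s_a), c_layer(s_b)
print(np.array_equal(s_a, s_b), np.array_equal(c_a, c_b))
```

In this toy case the two S-layer maps differ while the C-layer maps agree, which is the tolerance to positional error the slide describes. The tolerance only holds for shifts that stay inside one pooling group; a shift that crosses a group boundary changes the C-layer output too.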
  • 69. Convolutional Neural Network Le Cun et al, 98 The output neurons share all the intermediate levels
  • 70. Face detection and the success of learning based approaches •  The representation and matching of pictorial structures Fischler, Elschlager (1973). •  Face recognition using eigenfaces M. Turk and A. Pentland (1991). •  Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995) •  Graded Learning for Object Detection - Fleuret, Geman (1999) •  Robust Real-time Object Detection - Viola, Jones (2001) •  Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001) • ….
  • 73. The face age Feret dataset, 1996 DARPA
  • 74. Rapid Object Detection Using a Boosted Cascade of Simple Features Paul Viola Michael J. Jones Mitsubishi Electric Research Laboratories (MERL) Cambridge, MA Most of this work was done at Compaq CRL before the authors moved to MERL Manuscript available on web: http://citeseer.ist.psu.edu/cache/papers/cs/23183/http:zSzzSzwww.ai.mit.eduzSzpeoplezSzviolazSzresearchzSzpublicationszSzICCV01-Viola-Jones.pdf/viola01robust.pdf
  • 75. Haar-like filters and cascades Viola and Jones, ICCV 2001 The average intensity in the block is computed with four sums independently of the block size. Also Fleuret and Geman, 2001
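The constant-time block sum behind these filters comes from the integral image (summed-area table): once it is built, the sum over any rectangle needs four table lookups, whatever the rectangle's size. The sketch below shows that idea and a two-rectangle Haar-like feature built on it. The function names and the particular feature layout (left half minus right half) are assumptions for illustration, not the detector's actual feature set.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended.

    ii[r, c] holds the sum of img[:r, :c].
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] from four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half (width must be even)."""
    half = width // 2
    return (box_sum(ii, top, left, height, half)
            - box_sum(ii, top, left + half, height, half))

img = np.arange(20, dtype=np.float64).reshape(4, 5)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 3), img[1:3, 1:4].sum())  # the two values agree

# A vertical step edge: bright left half, dark right half.
edge = np.zeros((4, 4)); edge[:, :2] = 1.0
print(haar_two_rect(integral_image(edge), 0, 0, 4, 4))
```

Dividing `box_sum` by `height * width` gives the average intensity of the block. Because the cost per feature does not depend on block size, the same features can be evaluated at many scales without rescaling the image.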
  • 78. Families of recognition algorithms •  Voting models: Viola and Jones, ICCV 2001; Heisele, Poggio, et al., NIPS 01; Schneiderman, Kanade 2004; Vidal-Naquet, Ullman 2003 •  Shape matching, deformable models: Berg, Berg, Malik, 2005; Cootes, Edwards, Taylor, 2001 •  Bag of words models: Csurka, Dance, Fan, Willamowski, and Bray 2004; Sivic, Russell, Freeman, Zisserman, ICCV 2005 •  Rigid template models: Sirovich and Kirby 1987; Turk, Pentland, 1991; Dalal & Triggs, 2006 •  Constellation models: Fischler and Elschlager, 1973; Burl, Leung, and Perona, 1995; Weber, Welling, and Perona, 2000; Fergus, Perona, & Zisserman, CVPR 2003
  • 79. Scene understanding •  Torralba, Sinha (2001) •  Torralba, Murphy, Freeman (2004) •  Carbonetto, de Freitas & Barnard (2004) •  Fink & Perona (2003) •  Rabinovich et al (2007) •  Sudderth, Torralba, Willsky, Freeman (2005) •  Hoiem, Efros, Hebert (2005) •  Kumar, Hebert (2005) •  Choi, Lim, Torralba, Willsky (2010) •  Desai, Ramanan, and Fowlkes (2009) •  Heitz and Koller (2008)
  • 80. NSF Frontiers in computer vision workshop, 2011
  • 83. The labeling crisis SKY TREE PERSON BENCH PERSON PATH LAKE PERSON DUCK PERSON DUCK SIGN DUCK GRASS
  • 84. So what does object recognition involve? Slide by Fei-Fei, Fergus, Torralba
  • 85. Verification: is that a lamp? Slide by Fei-Fei, Fergus, Torralba
  • 86. Detection: are there people? Slide by Fei-Fei, Fergus, Torralba
  • 87. Identification: is that Potala Palace? Slide by Fei-Fei, Fergus, Torralba
  • 88. Object categorization mountain tree building banner street lamp vendor people Slide by Fei-Fei, Fergus, Torralba
  • 89. Scene and context categorization •  outdoor •  city •  … Slide by Fei-Fei, Fergus, Torralba
  • 90. Is this space large or small? How far are the buildings in the back? Slide by Fei-Fei, Fergus, Torralba
  • 91. Activity What is this person doing? What are these two doing?? Slide by Fei-Fei, Fergus, Torralba
  • 92. What are we tuned to? The visual system is tuned to process structures typically found in the world.
  • 93. The visual system seems to be tuned to a set of images: Demo inspired from D. Field
  • 95. Did you see this image?
  • 97. Did you see this image?
  • 98. Data Human vision: •  Many input modalities •  Active •  Supervised, unsupervised, semi-supervised learning. It can look for supervision. Robot vision: •  Many poor input modalities •  Active, but it does not go far Internet vision: •  Many input modalities •  It can reach everywhere •  Tons of data
  • 100. Active stereo with structured light Li Zhang’s one-shot stereo (figure labels: camera 1, projector; camera 1, projector, camera 2) Project “structured” light patterns onto the object •  simplifies the correspondence problem Li Zhang, Brian Curless, and Steven M. Seitz. Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming. In Proceedings of the 1st International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT), Padova, Italy, June 19-21, 2002, pp. 24-36. Slide credit: Rick Szeliski (CSE 576, Spring 2008, Stereo matching)
  • 107. Class goals •  Vision and language •  Vision and robotics •  Vision and others The strategies our visual system uses are tuned to our visual world To provide the right vision tools for non-vision experts Thinking about the tasks to find new representations