Object recognition is the process of identifying objects in images or videos. It involves segmenting objects from scenes, labeling objects, and extracting parametric descriptions. Object recognition has many applications including locating objects to recognize other objects or guide actions. It involves binding visual features into coherent objects and binding objects into spatial arrangements. Major challenges include handling different viewpoints and solving the binding problem of associating features with objects. Common approaches to object recognition include template matching, hierarchical processing of features, and recognizing objects as combinations of geometric parts or geons.
Object recognition
1.
2. What is Object Recognition?
One of the fundamental problems of computer vision:
Sensory data: color, texture, shape, motion, size, weight, smell, touch, sound, …
Descriptions: “toy”, “stuffed Pooh”, “a frontal, close-up shot of stuffed Pooh”, “ToysRus #345812”, …
[Figure: example labeled “Apple”]
3. Applications of Object Recognition
Some images from http://www.cs.utexas.edu/~grauman/research/research.html
4. What is Object Recognition?
› Segmentation/Figure-Ground Separation: prerequisite or consequence?
› Labeling an object [The focus of most studies]
› Extracting a parametric description as well
Object Recognition versus Scene Analysis
› An object may be part of a scene, or may itself be recognized as a “scene”
What is Object Recognition for?
› As a context for recognizing something else (locating a house by the tree in the garden)
› As a target for action (climb that tree)
5. We bind features of objects into objects (feature binding)
We bind objects in space into some arrangement (space binding)
We perceive the scene.
Feature binding = what/how stream
Space binding = where stream
Double role of spatial relationships:
› To relate different portions of an object or scene as a guide to recognition
› Augmented by other “how” parameters, to guide our behavior with respect to the observed scene.
6. The binding problem: binding different features (color, orientation, etc.) to yield a unitary percept (see next slide).
Bottom-up vs. top-down processing: how much is assumed top-down vs. extracted from the image?
Perception vs. recognition vs. categorization: seeing an object vs. seeing it as something; matching views of known objects to memory vs. matching a novel object to object categories in memory.
Viewpoint invariance: a major issue is to recognize objects irrespective of the viewpoint from which we see them.
7. 1) pixel-based (light intensity)
2) primal sketch (discontinuities in intensity)
3) 2 ½ D sketch (oriented surfaces, relative depth between surfaces)
4) 3D model (shapes, spatial relationships, volumes)
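As a rough illustration of the step from level 1 to level 2, the sketch below marks intensity discontinuities in a tiny pixel grid. The function name, the neighbor-difference rule, and the threshold are assumptions chosen for illustration; a real primal sketch involves considerably more (for example, multi-scale filtering).

```python
def intensity_discontinuities(image, threshold):
    """Mark pixels where the intensity step to the right or lower neighbor
    exceeds `threshold` (a crude stand-in for a primal sketch)."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Differences to the right and lower neighbors (0 at the border).
            dx = image[r][c + 1] - image[r][c] if c + 1 < cols else 0
            dy = image[r + 1][c] - image[r][c] if r + 1 < rows else 0
            if max(abs(dx), abs(dy)) > threshold:
                edges[r][c] = 1
    return edges


# Pixel-based input: a dark region on the left, a bright region on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edges = intensity_discontinuities(image, threshold=4)
for row in edges:
    print(row)
```

Only the column where dark meets bright is marked, which is the kind of information the primal sketch level is meant to carry.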
8.
9. Viewpoint invariance: a major problem for recognition.
Biederman & Gerhardstein, 1994:
We can recognize two views of an unfamiliar object as being the same object.
Thus, viewpoint invariance cannot rely only on matching views to memory.
10. Direct Template Matching:
Processing hierarchy yields activation of view-tuned units.
A collection of view-tuned units is associated with one object.
View-tuned units are built from V4-like units, using sets of weights which differ for each object.
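A minimal sketch of the idea, under stated assumptions: each view-tuned unit is modeled here as a Gaussian tuned to a stored weight vector over hypothetical "V4-like" feature activations, and an object's response is taken as the maximum over its view-tuned units. The Gaussian tuning, the max rule, the feature vectors, and the object names are all illustrative choices, not details from the slides.

```python
import math


def view_tuned_response(features, template, sigma=1.0):
    """Gaussian tuning: the response is 1.0 when the input features equal
    the unit's stored weights and falls off with squared distance."""
    dist_sq = sum((f - t) ** 2 for f, t in zip(features, template))
    return math.exp(-dist_sq / (2 * sigma ** 2))


def recognize(features, objects, sigma=1.0):
    """Score each object by its best-matching view-tuned unit."""
    scores = {
        name: max(view_tuned_response(features, view, sigma) for view in views)
        for name, views in objects.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical stored views: each object owns a collection of weight vectors.
objects = {
    "cup": [[1.0, 0.0, 0.5], [0.8, 0.2, 0.6]],
    "chair": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.4]],
}
label, scores = recognize([0.9, 0.1, 0.5], objects)
print(label, scores)
```

Because every stored view is matched directly, recognition of a new viewpoint depends on how close it falls to some stored view.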
11.
12. Hierarchical Template Matching:
The image is passed through layers of units with progressively more complex features at progressively less specific locations.
Hierarchical in that features at one stage are built from features at earlier stages.
e.g., the Neocognitron of Fukushima & Miyake (1982):
Several processing layers, comprising simple (S) and complex (C) cells.
S-cells in one layer respond to conjunctions of C-cells in the previous layer.
C-cells in one layer are excited by small neighborhoods of S-cells.
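The S/C alternation can be sketched as follows. This is a simplified illustration, not the Neocognitron itself: the S-layer here is a thresholded weighted sum over a small patch, and the C-layer is a maximum over non-overlapping neighborhoods. The kernel, threshold, and pool size are assumed values.

```python
def s_layer(image, kernel, threshold):
    """S-cells: respond to a specific conjunction of inputs at each location
    (weighted sum over a patch, minus a threshold, rectified at zero)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            total = sum(kernel[i][j] * image[r + i][c + j]
                        for i in range(kh) for j in range(kw))
            row.append(max(0.0, total - threshold))
        out.append(row)
    return out


def c_layer(s_map, pool):
    """C-cells: excited by any active S-cell in a small neighborhood,
    which makes the response less specific to exact position."""
    out = []
    for r in range(0, len(s_map), pool):
        row = []
        for c in range(0, len(s_map[0]), pool):
            row.append(max(s_map[i][j]
                           for i in range(r, min(r + pool, len(s_map)))
                           for j in range(c, min(c + pool, len(s_map[0])))))
        out.append(row)
    return out


# A vertical bar at column 0, and the same bar shifted one pixel right.
image_a = [[1, 0, 0, 0, 0] for _ in range(4)]
image_b = [[0, 1, 0, 0, 0] for _ in range(4)]
kernel = [[1, -1], [1, -1]]  # prefers "on" at left, "off" at right

s_a, s_b = s_layer(image_a, kernel, 1), s_layer(image_b, kernel, 1)
c_a, c_b = c_layer(s_a, 2), c_layer(s_b, 2)
print(s_a != s_b, c_a == c_b)
```

The S-layer maps differ for the two inputs, while the C-layer maps agree: the feature is kept, and its exact location is partly discarded.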
13.
14. Transform & Match:
First take care of rotation, translation, scale, etc. invariances.
Then recognize based on a standardized pixel representation of objects.
e.g., Olshausen et al., 1993, dynamic routing model.
Template match: e.g., with an associative memory based on a Hopfield network.
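The template-match stage can be illustrated with a small Hopfield-style associative memory. The sketch below assumes the input has already been normalized (the transform stage is not shown), uses Hebbian outer-product weights with asynchronous updates, and stores two made-up 8-element patterns with values of +1 and -1.

```python
def train_hopfield(patterns):
    """Hebbian storage: w[i][j] accumulates p[i] * p[j] / n, no self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its input field."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if field >= 0 else -1
    return state


# Two stored "standardized" templates (illustrative, mutually orthogonal).
template_a = [1, 1, 1, 1, -1, -1, -1, -1]
template_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([template_a, template_b])

# A corrupted version of template_a with its first element flipped.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
recalled = recall(weights, noisy)
print(recalled == template_a)
```

The network settles on the stored template closest to the corrupted input, which is the sense in which an associative memory performs a template match.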
15.
16. GEONS: geometric elements of which all objects are composed (cylinders, cones, etc.). On the order of 30 different shapes.
Skips the 2 ½ D sketch: geons are directly recognized from edges, based on their nonaccidental properties (i.e., 3D features that are usually preserved by the projective imaging process).
17. Geons are sufficiently different from each other to be easily discriminated.
They are view-invariant (look identical from most viewpoints).
They are robust to noise (can be identified even with parts of the image missing).
18. Edelman, 1997
A. Structural description not enough, also need metric info
B. Difficult to extract geons from real images
C. Ambiguity in the structural description: most often we have several candidates
D. For some objects, deriving a structural representation can be difficult
19. Structural approach to object recognition:
Biederman, 1987:
Complex objects are composed of simpler pieces.
We can recognize a novel/unfamiliar object by parsing it in terms of its component pieces, then comparing the assemblage of pieces to those of known objects.
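The parse-then-compare step might be sketched as below. Everything in this example is a hypothetical illustration: objects are described as sets of (part, relation, part) triples, and descriptions are compared with a simple set-overlap (Jaccard) score, which is an assumed similarity measure rather than one taken from Biederman (1987).

```python
def structural_similarity(a, b):
    """Jaccard overlap between two sets of (part, relation, part) triples."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical structural descriptions of known objects.
known = {
    "mug": {("handle", "side-of", "cylinder")},
    "pail": {("handle", "top-of", "cylinder")},
    "lamp": {("cone", "top-of", "cylinder"), ("cylinder", "top-of", "brick")},
}

# A novel object, already parsed into its component pieces and relations.
novel = {("handle", "side-of", "cylinder"), ("cylinder", "top-of", "brick")}

scores = {name: structural_similarity(novel, desc) for name, desc in known.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Note that "mug" and "pail" share the same parts and differ only in the relation between them, so the relation has to be part of the description for the two to be told apart.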
20. Biederman et al. (1991 – )
“geons”: units of
3D geometric structure
21.
22. Object Recognition in Practice
Commercial object recognition
Currently a $4 billion/year industry for inspection and
assembly
Almost entirely based on template matching
Upcoming applications
Mobile robots, toys, user interfaces
Location recognition
Digital camera panoramas, 3D scene modeling
courtesy of David Lowe,
website and CVPR 2003 Tutorial