In this talk, I will discuss the extensions we have made to our approach to semantic image segmentation. I will show how the results of object detectors and spatial priors can be naturally integrated into our hierarchical conditional random field (HCRF) approach based on the harmony potential. The addition of these extra cues, as well as class-specific normalization of classifier outputs, significantly improves segmentation quality.
PASCAL VOC 2010: semantic object segmentation and action recognition in still images
1. Introduction
Harmony potential 2.0: fusing across scale
Action recognition
Discussion
PASCAL VOC 2010
Semantic object segmentation and action recognition in still images
Andrew D. Bagdanov
bagdanov@cvc.uab.es
Departamento de Ciencias de la Computación
Universidad Autónoma de Barcelona
Xavier Pep Nataliya Wenjuan Fahad
The CVC PASCAL VOC Team CVC PASCAL VOC 2010
Overview
On 03/05/2010 the PASCAL VOC competition was announced
and the training and validation sets published.
20 semantic categories for the competition remain the same:
aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable,
dog, horse, motorbike, person, potted plant, sheep, sofa, train, and tv/monitor.
Old competitions, new competitions
There are two (+ 1/2) main challenges in PASCAL.
Image classification is the prediction of the presence/absence of
an instance of a class in a test image.
Object detection is the prediction of the bounding box and label
of each object from the twenty target classes in a test image.
Semantic image segmentation is the assignment of one of the
twenty class labels to every pixel in a test image.
Image segmentation is becoming a mainstream competition.
Action recognition in still images was included as a new “taster
challenge” this year.
Taster competitions are used to measure interest in new problems.
Our contributions to PASCAL VOC 2010
Last year we participated in the Detection, Classification and
Segmentation challenges.
This year we decided to concentrate on Classification and
Segmentation. Our segmentation technique relies heavily on
classification.
We also fielded a team in Action Recognition this year to see
what that’s all about.
As always, success in PASCAL VOC challenges is approximately
85% engineering, 10% inspiration and 5% luck (if you’re lucky).
Outline
1 Introduction
Overview of the challenges
Our contribution and main ideas
2 The harmony potential 2.0: fusing across scale
Building on last year’s submission
Fusing across scales and learning
3 Action recognition
A torrent of features
Exploiting the size of the problem
4 Discussion
Giving semantics to pixels
Image Object Class
Semantic image segmentation is not object segmentation
Only for simple cases are they the same.
Turning a hard problem into a harder one
Image Object Class
The objective is to assign semantic labels to every pixel
Fine distinctions must be made
Make that a very hard one
Image Object Class
The objective is to assign semantic labels to every pixel
Fine distinctions must be made
Occlusions, varying viewpoint and size complicate things
Action recognition in still images
New competition this year: human action recognition in still
images.
Individual images sampled from the Flickr dataset.
The bounding box of the human in each image is provided.
Very important: we don’t have to solve the detection problem.
Action recognition is offered as a “taster challenge” in order to
gauge interest in the general problem.
It was difficult to hypothesize about what would succeed and what
would not in this challenge.
Action classes
Segmentation: the role of context
Context provides very important cues for making fine
discriminations at the (super-) pixel scale.
We can exploit three levels of scale: local, mid-level and global
[Zhu, NIPS2008].
Existing techniques apply overly-simplified models of context that
do not generalize upward from local to global scales.
Segmentation: global constraints on label
combinations
Our principal idea is to use global Classification to enhance
segmentation results.
Global image classification results tend to be less noisy than local ones.
We will use them to constrain the combinations of semantic labels
we are likely to encounter during segmentation.
We showed last year how a tractable inference technique can be
devised for this labeling problem (our PASCAL 2009 entry).
This year we also show how mid-level context can be incorporated
in the form of object detections.
We also show how position priors can be similarly incorporated
into the framework to provide class specific location information.
Finally, we devised a stochastic steepest ascent technique for
optimizing the many parameters in a class-specific way.
Action recognition: driven by data limitations
Initial experiments confirmed our intuition about the limitations of
the data.
Structural learning: sampling of pose space not dense enough.
Latent SVM: object interactions under-sampled as well.
Multiple kernel learning: converges to simple selection.
From a very early stage, we decided to treat action recognition as
an image classification problem.
We exploit the small size of the dataset by performing extensive
cross-validation.
Features are one of our strong points, and we had to get the
feature pipeline running for Classification in any case.
HCRFs for labeling problem
We represent our segmentation problem as a graph: G = (V, E)
V is used for indexing random variables, and E is the set of
undirected edges representing compatibility relationships between
random variables.
X = {Xi } denotes the set of random variables or nodes, for i ∈ V.
An energy function will be defined over graphical configurations of
random variables.
By the Hammersley-Clifford theorem, the probability of a configuration
x = {x_i} can be written as the exponential of the negative of an
energy function: $P(x) \propto \exp(-E(x))$, with
$E(x) = \sum_{c \in C} \varphi_c(x_c)$, where $\varphi_c$ is the potential
function of clique $c \in C$.
Consistency potentials for labeling problems
The energy function of G can be written as:
$$E(x) = \sum_{i \in V} \phi(x_i) + \sum_{(i,j) \in E_L} \psi_L(x_i, x_j) + \sum_{(i,g) \in E_G} \psi_G(x_i, x_g).$$
The unary term $\phi(x_i)$ depends on a single probability
$P(X_i = x_i \mid O_i)$, where $O_i$ is the observation that affects $X_i$ in the
model.
The smoothness potential $\psi_L(x_i, x_j)$ determines the pairwise
relationship between two local nodes.
The consistency potential $\psi_G(x_i, x_g)$ expresses the dependency
between local nodes and a global node.
The maximum a posteriori (MAP) estimate of the optimal
labeling is:
$$x^* = \arg\min_x E(x).$$
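As a rough illustration of the energy above, the following sketch evaluates it on a toy graph and finds the minimizer by enumeration. The toy potentials, the constant costs `smooth_cost` and `global_cost`, the single-label global node and the function names are all assumptions made for the example; the talk itself uses graph cuts rather than brute force.

```python
from itertools import product

LABELS = (0, 1)  # two hypothetical semantic labels


def energy(x, x_g, unary, local_edges, smooth_cost, global_cost):
    """Unary + pairwise smoothness + local/global consistency terms."""
    e = sum(unary[i][xi] for i, xi in enumerate(x))                  # phi(x_i)
    e += sum(smooth_cost for i, j in local_edges if x[i] != x[j])   # psi_L
    e += sum(global_cost for xi in x if xi != x_g)                  # psi_G
    return e


def map_labeling(unary, local_edges, smooth_cost, global_cost):
    """Brute-force arg min over all local labelings and global labels."""
    n = len(unary)
    candidates = ((x, g) for x in product(LABELS, repeat=n) for g in LABELS)
    return min(
        candidates,
        key=lambda c: energy(c[0], c[1], unary, local_edges,
                             smooth_cost, global_cost),
    )


# Three local nodes in a chain, toy unary costs per label.
unary = [[0.2, 1.5], [0.9, 1.0], [0.3, 1.2]]
edges = [(0, 1), (1, 2)]
x_star, g_star = map_labeling(unary, edges, smooth_cost=0.5, global_cost=0.5)
```

Enumeration is only feasible for a handful of nodes; it is used here just to make the arg min concrete.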
HCRF models of image segmentation
[Figure: three HCRF models side by side: Smoothness (Shotton et al, CVPR2008), Potts (Plath et al, ICML2009) and Robust P^N (Ladicky et al, ICCV2009). One node in the figure is labeled "Free".]
Colored nodes represent (hidden) semantic labels.
Dark nodes represent image measurements.
Red edges represent penalties imposed by potential.
Different features for discriminations
The previously mentioned approaches all try to make global
distinctions using local information.
Either by voting of local observations (Potts),
or by penalizing rampantly discordant local label assignments
(robust P^N).
None of these techniques try to exploit truly global information to
constrain local labels.
And none incorporate the notion of encoding combinations of
primitive node labels at the global level.
The harmony potential: selective subsets
Only labels that do not agree with the subset are penalized.
Can represent more diverse combinations.
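A minimal sketch of this idea, assuming the global node carries a subset of labels and each disagreeing local node pays a constant penalty. The constant `gamma` and the function names are hypothetical; the exact form of the potential is not given on this slide.

```python
def harmony_penalty(local_label, global_subset, gamma=1.0):
    """Penalize a local label only if it is absent from the global subset."""
    return 0.0 if local_label in global_subset else gamma


def total_consistency(local_labels, global_subset, gamma=1.0):
    """Sum of penalties over all local nodes for one global subset."""
    return sum(harmony_penalty(l, global_subset, gamma) for l in local_labels)


local_labels = ["cow", "cow", "person", "sheep"]

# A single-label global node (Potts-like) penalizes every other label.
single = total_consistency(local_labels, frozenset({"cow"}))
# A subset-valued global node lets several classes coexist without penalty.
pair = total_consistency(local_labels, frozenset({"cow", "person"}))
```

With the subset `{"cow", "person"}` only the `sheep` node is penalized, which is how a subset-valued global node can represent more diverse label combinations than a single global label.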
The harmony potential: overview
Ranked subsampling of P(L)
We can do this using the following posterior:
$$P(\ell \subseteq x_g^* \mid O) \propto P(\ell \subseteq x_g^*)\, P(O \mid \ell \subseteq x_g^*).$$
This allows us to effectively rank possible global node labels, and
thus to prioritize candidates in the search for the optimal label $x_g^*$.
$P(\ell \subseteq x_g^* \mid O)$ establishes an order on subsets $\ell$ of the (unknown)
optimal labeling of the global node $x_g^*$ that guides the
consideration of global labels.
We may not be able to exhaustively consider all labels in P(L), but
at least we consider the most likely candidates for $x_g^*$.
And image classification can give us an estimate of this posterior.
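One way to picture ranked subsampling is sketched below. It assumes, purely for illustration, that the posterior of a label subset factorizes into an optional prior times independent per-class classifier probabilities; the function `ranked_subsets`, its `prior` hook and the `max_size` cap are hypothetical and are not taken from the talk.

```python
from itertools import combinations
import heapq


def ranked_subsets(class_probs, n_best, max_size=3, prior=None):
    """Return the n_best (score, subset) pairs, highest score first.

    class_probs: dict mapping class name -> classifier probability.
    prior: optional callable giving a prior weight for a subset (assumed).
    """
    classes = sorted(class_probs)
    scored = []
    for size in range(1, max_size + 1):
        for subset in combinations(classes, size):
            score = prior(subset) if prior else 1.0
            for c in subset:
                score *= class_probs[c]  # independence assumption
            scored.append((score, subset))
    return heapq.nlargest(n_best, scored)


probs = {"cow": 0.9, "person": 0.6, "sofa": 0.05}
top = ranked_subsets(probs, n_best=3)
top_subsets = [subset for _, subset in top]
```

Only the retained candidates are then considered as global node labels, instead of the full power set.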
PASCAL 2010: pushing the limit
The previous slides describe our approach used for the PASCAL
2009 submission.
The discriminative model was based only on SVMs trained to
discriminate object classes from their own backgrounds.
Starting with the harmony potential approach, this year we
concentrated on adding cues derived from different levels of
mid-level context.
We found the HCRF model with harmony potential to be very
useful for performing this fusion.
Our hypothesis at the end of the 2009 competition was that
detection would be essential for pushing forward the
state-of-the-art.
PASCAL 2010: fusing across scales
1 FG/BG: 20 SVMs trained to discriminate classes from their own
background. The same discriminative model used last year,
essential for localizing object boundaries.
2 CLASS: 20 SVMs trained to discriminate each object class from
the other object classes. Essential for distinguishing objects with similar
backgrounds (e.g. cows from sheep, birds from planes).
Incorporated directly into unary potential.
3 LOC: 20 class-specific location priors. Computed from ground
truth segmentations by simple, spatial averaging. A form of
top-down mid-level context.
4 OBJ: 20 class-specific object detectors [Felzenszwalb 2010] are
converted to superpixel scores by selecting the highest scoring
detection intersecting each pixel of the superpixel. A type of
bottom-up mid-level context.
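The OBJ conversion can be sketched roughly as follows. The box format `(x0, y0, x1, y1, score)`, the `floor` score for uncovered pixels and the averaging of per-pixel maxima over the superpixel are assumptions made for this example; the slide only states that the highest scoring detection intersecting each pixel is selected.

```python
def pixel_score(px, py, detections, floor=-1.0):
    """Highest score among the boxes covering pixel (px, py)."""
    hits = [s for (x0, y0, x1, y1, s) in detections
            if x0 <= px < x1 and y0 <= py < y1]
    return max(hits, default=floor)  # floor is a hypothetical "no detection" score


def superpixel_score(pixels, detections, floor=-1.0):
    """Average the per-pixel best detection score over one superpixel."""
    scores = [pixel_score(x, y, detections, floor) for x, y in pixels]
    return sum(scores) / len(scores)


# Two overlapping detections of the same class and a four-pixel superpixel.
detections = [(0, 0, 4, 4, 0.5), (2, 2, 6, 6, 0.75)]
superpixel = [(1, 1), (3, 3), (5, 5), (7, 7)]
obj_score = superpixel_score(superpixel, detections)
```

Pixel `(3, 3)` lies inside both boxes and takes the higher score; pixel `(7, 7)` is covered by neither and falls back to `floor`.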
PASCAL 2010: learning unary potentials
We compute the unary potential by weighting the classification
scores $\{s_i(k, x_i)\}_{k \in F}$ through a sigmoid function. The unary
potential becomes:
$$\phi_L(x_i) = -\mu_L K_i \sum_{k \in F} \log \frac{1}{1 + \exp(f_i(k, x_i))}$$
$$f_i(k, x_i) = a(k, x_i)\, s_i(k, x_i) + b(k, x_i)$$
$\mu_L$ is the weighting factor of the local unary potential, and
$K_i$ normalizes over the number of pixels inside the superpixel.
We have two sigmoid parameters for each class/cue pair: $a(k, x_i)$
and $b(k, x_i)$.
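The unary potential can be sketched for a single superpixel and a single candidate label as below. The dictionary layout (one entry per cue) and the function names are assumptions; the sum over cues follows the formula on this slide, using the identity that minus the log of 1/(1 + exp(f)) equals log(1 + exp(f)).

```python
import math


def _softplus(f):
    """Numerically stable log(1 + exp(f))."""
    return max(f, 0.0) + math.log1p(math.exp(-abs(f)))


def unary_potential(scores, a, b, mu_l, k_i):
    """Unary potential of one label at one superpixel.

    scores, a, b: dicts keyed by cue name (e.g. "FG/BG", "CLASS"),
    holding s_i(k, x_i), a(k, x_i) and b(k, x_i) for this label.
    """
    total = 0.0
    for cue, s in scores.items():
        f = a[cue] * s + b[cue]
        total += _softplus(f)  # equals -log(1 / (1 + exp(f)))
    return mu_l * k_i * total


phi = unary_potential(
    scores={"FG/BG": 0.0}, a={"FG/BG": 1.0}, b={"FG/BG": 0.0},
    mu_l=1.0, k_i=2.0,
)
```

With this sign convention a more negative `f` gives a lower energy, so the learned `a` would typically be negative for cues where a high score supports the label.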
Datasets
We have evaluated the harmony potential approach on two
standard, publicly available datasets.
The Pascal VOC 2010 Segmentation Challenge dataset contains
2250 color images of 20 different semantic classes.
This set is split into 750 images for training, 750 images for
testing, and 750 for validation.
The Microsoft MSRC-21 dataset contains 591 color images of 21
object classes.
We do our own splits for cross-validation on MSRC-21.
Unsupervised segmentation
Images are first over-segmented with quick-shift to derive
super-pixels [Fulkerson, ICCV 2009].
This preserves object boundaries while simplifying the
representation.
Working at the super-pixel level reduces the number of nodes in
the CRF by 10^2 to 10^5 per image.
Local classification scores: P(Xi = xi |Oi )
We extract patches with 50% overlap on a regular grid at several
resolutions (12, 24, 36 and 48 pixels in diameter).
Patches are described with SIFT, color and, for MSRC-21, location
features.
A vocabulary is constructed using k-means to quantize to 1000
SIFT words and 400 color words.
An SVM classifier using an intersection kernel is built for each
semantic category.
A similar number of positive and negative examples are used:
a total of around 8,000 superpixel samples for MSRC-21, and
20,000 for VOC 2010, for each class.
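The dense sampling step can be illustrated with a small sketch. It assumes square patches whose side equals the stated diameter and a stride of half the patch size to obtain the 50% overlap; `patch_grid` is a hypothetical helper, not the pipeline used for the submission.

```python
def patch_grid(width, height, sizes=(12, 24, 36, 48)):
    """List (x, y, size) for square patches on a regular grid, 50% overlap."""
    patches = []
    for size in sizes:
        stride = size // 2  # neighbouring patches share half their extent
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                patches.append((x, y, size))
    return patches


patches = patch_grid(48, 48)
n_small = sum(1 for (_, _, size) in patches if size == 12)
```

Each patch would then be described (SIFT, color) and quantized against the vocabulary before pooling per superpixel.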
Global potential and general approach
For the PASCAL 2010 dataset we use our entry to the 2010 VOC
Classification Challenge:
[Khan, IJCV2010 (submitted)].
It uses a bag-of-words representation based on SIFT and color
SIFT, plus spatial pyramids and color attention
[Khan, ICCV 2009].
An SVM classifier with a χ2 kernel is trained for each semantic
category in the dataset.
The FG/BG and CLASS cues are computed by training a
discriminative model using an SVM with histogram intersection
kernel.
Except for the additional cues and optimization strategy, the
architecture is the same as in our approach described at CVPR
[Gonfaus, CVPR2010].
Learning the HCRF parameters
We found it to be essential to train the per-class sigmoid
parameters through cross validation.
Classification scores are learned independently, are unbalanced
and are effectively incomparable in many cases.
The sigmoid functions weight the importance of each cue for each
class.
In addition to these (180) sigmoid parameters, we also must learn
the weighting factors for each potential.
We use a stochastic, steepest ascent technique to optimize these
parameters on a validation set.
In each step we randomly generate new instances of parameters.
New parameter instances are generated using a Gibbs-like
sampling strategy.
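The optimization loop might look roughly like the sketch below. It is one plausible reading of "stochastic steepest ascent with Gibbs-like sampling": one parameter is perturbed at a time, several random candidates are drawn, and the best one is kept only if it improves the validation score. The Gaussian proposal, `step`, `n_candidates` and the function names are assumptions.

```python
import random


def stochastic_ascent(score_fn, params, n_iters=200, n_candidates=5,
                      step=0.5, seed=0):
    """Maximize score_fn by random coordinate-wise proposals."""
    rng = random.Random(seed)
    best = list(params)
    best_score = score_fn(best)
    for it in range(n_iters):
        i = it % len(best)  # Gibbs-like: update one coordinate at a time
        candidates = []
        for _ in range(n_candidates):
            cand = list(best)
            cand[i] += rng.gauss(0.0, step)  # assumed Gaussian proposal
            candidates.append((score_fn(cand), cand))
        top_score, top = max(candidates, key=lambda c: c[0])
        if top_score > best_score:  # steepest step among the proposals
            best, best_score = top, top_score
    return best, best_score


def toy_score(p):
    """Stand-in for the validation score; peak at (1, -2)."""
    return -((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)


start = [0.0, 0.0]
best, best_score = stochastic_ascent(toy_score, start)
```

In the real setting `score_fn` would run inference on the validation set with the proposed sigmoid parameters and potential weights, which is what makes each iteration expensive.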
Qualitative results: MSRC-21
Quantitative results: MSRC-21
MSRC-21 contains more multi-class images than PASCAL.
Our performance demonstrates the benefits of incorporating
global scale when making local decisions.
Qualitative results: PASCAL 2010
Quantitative results: PASCAL 2010
FG/BG shows the performance of our baseline (PASCAL 2009)
approach.
At the top, performance on the validation set (i.e. how well we
thought we were doing).
Image tags indicate how well the technique can perform with
perfect global information.
The cost of segmentation
The optimal MAP label configuration x∗ is inferred using
α-expansion graph cuts [Kolmogorov, PAMI2004].
The global node uses the 100 most probable label subsets
obtained from ranked subsampling.
[Figure: mAP on MSRC-21 (axis 50 to 85) and mAP on PASCAL VOC 2010 (axis 30 to 50) versus # labels selected (1 to 200).]
Qualitative results: PASCAL 2010 failures
Context is sometimes weighted too much.
When the global classifier fails, little can be done.
Every little bit helps
A photo finish
[Figure: mAP on PASCAL VOC 2010 for each cue combination]
FG-BG            33.9
CLASS            23.4
LOC              20.1
OBJ              26.2
FG-BG + CLASS    36.6
All              40.4
[Figure: mAP on PASCAL VOC 2010 (axis 30 to 42) versus #iterations (0 to 3000).]
The final results are tough to call between BONN and CVC.
In the end, fusion over many scales and per-class, per-feature
parameter optimization won.
The action recognition taster
Images collected from Flickr using action queries. A set of nine
actions was chosen in the end.
They are disjoint from the main challenge dataset.
Only a subset of people is annotated (bounding box + action).
This subset is labelled with exactly one action class.
Important point: we don’t have to solve the detection problem.
Most action classes in the challenge contain either large variation
in scale or large variations in pose (or both).
Grouplets and poselets
Two state-of-the-art techniques for action recognition in still
images. The grouplets of Fei-Fei Li [Yao et al, CVPR2010]:
And the latent poses of Greg Mori [Yang et al, CVPR2010]:
Treat it like image classification
Initial experiments confirmed our intuition about the limitations of
the data.
Structural learning: sampling of pose space not dense enough.
Latent SVM: complexity of object interactions problematic.
Multiple kernel learning: converges to simple selection.
State-of-the-art techniques rely on learning complex structural
models of pose-variations over many
From a very early stage, we decided to treat action recognition as
an image classification problem.
We exploit the small size of the dataset by performing extensive
cross-validation.
The classification pipeline
Action recognition: features
SIFT, color SIFT (normalized R/G and opponent), self-similarity,
SURF, PHOG (good for capturing pose), and color attention
(focuses on interesting color features).
Sparse and dense variations of most of these.
Plus a range of pyramid configurations (1, 2 × 2, 3 × 3, 4 × 4).
Object detectors also incorporated using a simple occurrence
histogram [Felzenszwalb 2010].
The goal was to incorporate all of this into a BoVW classifier and
push the limits of what is possible using classical BoW on actions.
Action recognition: contextual pyramids
Context was also important for most object classes.
We used a type of foreground/background pyramid decomposition
that split features into object or background.
This was done using a type of spatial soft-assign based on the
distance to the boundary of the object.
For some classes, we also assigned contextual object regions that
model the appearance of objects associated with them (the “horsy
box”).
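A spatial soft-assign of this kind could be sketched as follows. The logistic weighting of the signed distance to the bounding-box boundary and the scale `sigma` are assumptions chosen for illustration; the slide does not specify the actual weighting function.

```python
import math


def signed_distance_to_box(x, y, box):
    """Distance to the box boundary: positive inside, negative outside."""
    x0, y0, x1, y1 = box
    dx = max(x0 - x, 0.0, x - x1)
    dy = max(y0 - y, 0.0, y - y1)
    if dx > 0.0 or dy > 0.0:  # the point lies outside the box
        return -math.hypot(dx, dy)
    return min(x - x0, x1 - x, y - y0, y1 - y)


def soft_assign(x, y, box, sigma=10.0):
    """Return (foreground weight, background weight) for a feature at (x, y)."""
    d = signed_distance_to_box(x, y, box)
    w_fg = 1.0 / (1.0 + math.exp(-d / sigma))  # assumed logistic weighting
    return w_fg, 1.0 - w_fg


person_box = (0, 0, 100, 100)
w_centre = soft_assign(50, 50, person_box)
w_edge = soft_assign(0, 50, person_box)
w_outside = soft_assign(-30, 50, person_box)
```

Each local feature then contributes to the object histogram and to the background histogram in proportion to these two weights, rather than being hard-assigned to one of them.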
Action recognition: learning in the design space
In the end, after all of the combinatorics introduced by pyramids
and other variations, we had about 100 feature configurations in a
big pool.
Most attempts to automatically learn the parameters of these
features were total failures.
Except one. Initial experiments with multiple kernel learning
showed that MKL starts converging quickly towards class-specific
feature selection rather than mixing.
With such a small dataset, and a little heuristic trimming, we were
able to exhaustively explore a part of the design space.
This resulted in the best per-class feature combinations.
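The exhaustive exploration can be pictured with the sketch below, which would be run once per action class. The `cv_score` callable stands in for a cross-validated average precision, and `max_size` stands in for the heuristic trimming; both, like the toy numbers, are hypothetical.

```python
from itertools import combinations


def best_combination(features, cv_score, max_size=3):
    """Try every feature combination up to max_size and keep the best."""
    best, best_score = None, float("-inf")
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            s = cv_score(combo)
            if s > best_score:
                best, best_score = combo, s
    return best, best_score


# Toy stand-in for cross-validated AP: per-feature gains minus a
# penalty for larger combinations.
toy_gain = {"sift": 0.5, "phog": 0.25, "surf": 0.125}


def toy_cv_score(combo):
    return sum(toy_gain[f] for f in combo) - 0.2 * (len(combo) - 1)


best, best_score = best_combination(["sift", "phog", "surf"], toy_cv_score)
```

The number of combinations grows quickly with the pool size, which is why this is only practical with a small dataset and a trimmed pool.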
Action recognition: classification
We experimented with a number of kernels (histogram
intersection, χ2 , bin-ratio distance).
There wasn’t a huge difference among these kernels.
In the end, we chose histogram intersection for our submission as
it appeared to generalize better.
In addition to over-fitting less, there are no parameters to tune and
it is very fast.
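For reference, two of the kernels mentioned can be written in a few lines. The exponential form of the χ2 kernel and its `gamma` parameter are one common variant and are an assumption here; the point of the comparison is that histogram intersection has no such parameter.

```python
import math


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; parameter-free."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def chi2_kernel(h1, h2, gamma=1.0):
    """exp(-gamma * chi2 distance); gamma must be tuned."""
    d = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * d)


h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
k_int = histogram_intersection(h1, h2)
k_chi2 = chi2_kernel(h1, h2)
```

For L1-normalized histograms the intersection of a histogram with itself is 1, so both kernels share the same scale for identical inputs.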
Overall results: average precision
Per-class AP
Per technique median average precision
Qualitative results
When the horsey box and detectors fail, context dominates.
Classifier still surprisingly robust.
Qualitative results
Some fine discriminations very difficult to make.
Probably difficult even for humans.
Qualitative results
People taking photos should be banned.
Classes with large pose variations were the most difficult.
Discussion: semantic image segmentation
The harmony potential works well for fusing global information into
local segmentations.
This year we also showed that the harmony potential framework is
appropriate for incorporating different types of mid-level cues.
Ranked sub-sampling, driven by the same posterior as used to
define the global potential function, renders the optimization
problem tractable.
Most useful when multiple semantic classes co-occur frequently.
Per-class learning of parameters essential (about +5% in final
results).
Discussion: action recognition
This year’s taster challenge on action recognition was little more
than a toy.
However, we have demonstrated what is possible using proven
techniques from image classification.
We feel that object context, in particular object interaction context,
is the way forward.
The PASCAL data set is the right direction to go (more general),
but we need more samples.
The future: segmentation
Semantic image segmentation has come a long way, but still has a
long way to go.
It is becoming a mainstream event in PASCAL.
This year we arrived at a sort of three-way détente between the
CVC (winner 2010), BONN (winner 2009) and OXFORD (best
paper award ECCV 2010) in segmentation.
Each has its own approach, with its own advantages and
disadvantages.
Engineering can probably maximize results.
It is becoming mature, and we can begin thinking about what new
applications are enabled by such technologies.
The future: action recognition
It seems that action recognition in still images is a popular
challenge.
The PASCAL organizers are keen to promote it for the future.
The focus will remain on still images, perhaps with more
emphasis on incorporating user interaction as well.
It seems that the community is becoming more interested in the
“alternative” PASCAL challenges.
The multimedia community probably has an important role to play
here.