From Unsupervised to Semi-Supervised Event Detection

From Unsupervised to
Semi-Supervised Event Detection
Wen-Sheng Chu
Robotics Institute, Carnegie Mellon University
July 9, 2013
1
Jeffery CohnFernando De la Torre

Outline
1. Unsupervised Temporal Commonality
Discovery
(Chu et al, ECCV’12)
2. Personalized Facial Action Unit Detection
(Chu et al, CVPR’13)
2

Unsupervised Commonality Discovery
in Images
Where are the repeated patterns?
3
(Chu’10, Mukherjee’11, Collins’12)

Unsupervised Commonality Discovery
in Videos?
• We name it Temporal Commonality Discovery (TCD).
• Goal: Given two videos, discover common events in
an unsupervised fashion. 4

TCD is hard!
1) No prior knowledge on commonalities
– We do not know what, where and how many
commonalities exist in the video
2) Exhaustive search are computationally prohibitive
– E.g., two videos with 300 frames have >8,000,000,000
possible matches.
possible locations possible lengths
possibilities/sequence
Another possibilities/sequence
5

Formulation
6
Integer programming!

Optimization: Interpretation
7

Optimization: Native Search
Complexity 8

Optimization: Branch-and-Bound
• Similar to the idea of ESS (Lampert’08), we search the
space by splitting intervals.
9

• Bounding histogram bins
10

1. Bounding L1 distance:
2. Intersection similarity:
3. X2 distance:
11

Unlikely
search
regions
(B1,E1,B2,E2; -10)
Searching Structure
(B1,E1,B2,E2; 32)
Priority queue
(sorted by bound scores)
…
(B1,E1,B2,E2; -50)
(B1,E1,B2,E2; -105)
State S = (Rectangle set; score)
12

(B1,E1,B2,E2; -105)
Algorithm
(B1,E1,B2,E2; 32)
Priority queue
…
(B1,E1,B2,E2; -50)
(B1,E1,B2,E2; -105)
Top state
1. Pop out
the top state
2. Split
13

(B1,E1,B2,E2; -105)
Algorithm
(B1,E1,B2,E2; 32)
Priority queue
…
(B1,E1,B2,E2; -50)
Top state
(B1,E’1,B2,E2; -76)
(B1,E’’1,B2,E2; -61)
3. Compute
bounding scores
4. Push back the
split states
14

Algorithm
(B1,E1,B2,E2; 32)
Priority queue
…
(B1,E1,B2,E2; -50)
Top state
(B1,E’1,B2,E2; -76)
(B1,E’’1,B2,E2; -61)
• The algorithm stop when
the top state contains an
unique rectangle.
Omit most of the
search space with
large distances
15

Compare with Relevant Work
1. Difference between TCD and ESS
[1]/STBB[2]
– Different learning framework:
• Unsupervised v.s. Supervised
– New bounding functions for TCD
2. Difference between TCD and [3]
– Different objective:
• Commonality Discovery v.s. Temporal Clustering
[1] “Efficient subwindow search: A branch and bound framework for object
localization”, PAMI 2009.
[2] “Discriminative video pattern search for efficient action detection”, PAMI 2011. 16

Experiment (1): Synthesized Sequence
Histograms of the discovered
pair of subsequences
17

Experiment (2):
Discover Common Facial Actions
• RU-FACS dataset*
– Interview videos with 29 subjects
– 5000~8000 frames/video
– Collect 100 segments that containing smiley mouths (AU-
12)
– Evaluate in terms of averaged precision
18
* “Automatic recognition of facial actions in spontaneous expressions”, Journal of
Multimedia 2006.

Experiment (2):
19

• Parametric settings for Sliding Windows (SW)
• Log of #evaluations:
• Quality of discovered patterns:
• a
Experiment (2): Speed Evaluation
Speed #evaluation of the distance function´
log
nT C D
nSW i
d(r SW i
) ¡ d(r T C D
)
20

Experiment (2):
• Compare with LCCS* on -distance
21
* “Frame-level temporal calibration of unsynchronized cameras by using Longest
Consecutive Common Subsequence”, ICASSP 2009.

Experiment (3): Discover
Multiple Common Human Motions
• CMU-Mocap dataset:
– http://mocap.cs.cmu.edu/
• 15 sequences from Subject 86
• 1200~2600 frames and up to 10 actions/seq
• Exclude the comparison with SW because it
needs >1012 evaluations
22

23

• Compare with LCCS* on -distance
24

Extension: Video Indexing
• Goal: Given a query , find the best common
subsequence in the target video
• A straightforward extension:
Temporal
Search
Space
25

A Prototype for Video Indexing
26

Questions?
[1+ “Common Visual Pattern Discovery via Spatially Coherent
Correspondences,” In CVPR 2010.
[2+ “MOMI-cosegmentation: simultaneous segmentation of multiple objects
among multiple images,” In ACCV 2010.
[3+ “Scale invariant cosegmentation for image groups,” In CVPR 2011.
[4+ “Random walks based multi-image segmentation: Quasiconvexity results
and GPU-based solutions,” In CVPR 2012.
[5+ “Frame-level temporal calibration of unsynchronized cameras by using
Longest Consecutive Common Subsequence,” In ICASSP 2009.
[6+ “Efficient ESS with submodular score functions,” In CVPR 2011.
28
http://humansensing.cs.cmu.edu/wschu/

Outline
1. Unsupervised Temporal Commonality
Discovery
(Chu et al, ECCV’12)
2. Selective Transfer Machine for Personalized
Facial Action Unit Detection
(Chu et al, CVPR’13)
29

AU 6+12
Facial Action Units (AU)
30

Feature Bias
Person specific!
34

Selective Transfer Machine (STM)
Formulation
Maximizes margin of penalized SVM
Minimize distribution mismatch
36

Goal (1): Maximize penalized SVM margin
margin
penalized loss
37

Goal (2): Minimize Distribution Mismatch
• Kernel Mean Matching (KMM)*
38
* “Covariate shift by kernel mean matching”, Dataset shift in machine learning, 2009.

Groundtruth
Bad estimator
for testing data!
39

Better fitting!
Groundtruth
Selection by reweighting
training data
40

42
Optimization: Alternate Convex Search

43
Optimization: Alternative Convex Search

Compare with Relevant Work
44
[1] "Covariate shift by kernel mean matching," Dataset shift in machine
learning, 2009.
[2] "Transductive inference for text classification using support vector
machines," In ICML 1999.
[3] "Domain adaptation problems: A DASVM classification technique and a
circular validation strategy," PAMI 2010.

Experiments
• Features
– SIFT descriptors on 49 facial landmarks
– Preserve 98% energy using PCA
45
Datasets #Subjects #Videos #Frm/vid Content
CK+ 123 593 ~20 NeutralPeak
GEMEP-FERA 7 87 20~60 Acting
RU-FACS 29 29 5000~7500 Interview

Experiment (1): Synthetic Data
46

• Two protocols
– PS1: train/test are separate data of the same subject
– PS2: training subjects include test subject (same protocol in [2])
• GEMEP-FERA
Experiment (2): Comparison with Person-
specific (PS) Classifiers
47

Experiment (2): Selection Ability of STM
48

• 123 subjects, 597 videos, ~20 frames/video
Experiment (3): CK+
49

Experiment (4): GEMEP-FERA
50
• 7 subjects, 87 videos, 20~60 frames/video

• 29 subjects, 29 videos, 5000~7000 frames/vid
Experiment (5): RU-FACS
51

Summary
• Person-specific biases exist among face-
related problems, esp. facial expression
• We propose to alleviate the biases by
personalizing classifiers using STM
• Next
– Joint optimization in terms of
– Reduce the memory cost using SMO
– Explore more potential biases in face problems,
e.g., occurrence bias
52

Questions?
[1] "Covariate shift by kernel mean matching," Dataset shift in machine
learning, 2009.
[2] "Transductive inference for text classification using support vector
machines," In ICML 1999.
[3] "Domain adaptation problems: A DASVM classification technique and a
circular validation strategy," PAMI 2010.
*4+ “Integrating structured biological data by kernel maximum mean
discrepancy”, Bioinformatics 2006.
*5+ “Meta-analysis of the first facial expression recognition challenge,” IEEE
Trans. on Systems, Man, and Cybernetics, Part B, 2012.
53
http://humansensing.cs.cmu.edu/wschu/

From Unsupervised to Semi-Supervised Event Detection

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (18)

Similaire à From Unsupervised to Semi-Supervised Event Detection

Similaire à From Unsupervised to Semi-Supervised Event Detection (20)

Dernier

Dernier (20)

From Unsupervised to Semi-Supervised Event Detection