Video concept detection by learning from web images
1. Video Concept Detection by Learning from Web Images
Shiai Zhu, Ting Yao, Chong-Wah Ngo
City University of Hong Kong
2. Why concepts for media remixing?
[Figure: an example of event recounting on a timeline from 00:00 to 00:20 (seconds): "Cake is in preparation", "Someone is talking", "Background music is playing", …]
4. Why is concept learning challenging?
Requires thousands of concepts for practical applications
Collecting training examples is always expensive
5. Economical solution: Get it free from the Internet
Thousands of new uploads per minute
Example tags on a Web image: Zoom, Wow, Usa, Texas, Critter, Creature, Olympus, Closeup, Cat, June, 2010, Animal, pet
11. Painful Experience on TRECVID
[Figure: results on TRECVID. Legend: 64 runs of other SIN systems; TRECVID training data alone; cross-domain learning; Web images alone. Runs shown: Baseline detector, ASVM-SF, TrAdaBoost-SF, SP, SF]
Negative transfer!
Negative transfer happens when knowledge transfer has a negative impact on the target domain.
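The definition above can be made concrete with a small check. This is only an illustrative sketch: the function name `transfer_effect`, the tolerance parameter, and the scores in the usage lines are hypothetical and are not taken from the slides. It simply compares a detector trained with transferred knowledge against a target-only baseline on the same metric.

```python
def transfer_effect(baseline_score, transferred_score, tol=0.0):
    """Label the effect of knowledge transfer on the target domain.

    baseline_score:    metric (e.g. average precision) of the target-only detector
    transferred_score: same metric for the detector that also uses source data
    tol:               hypothetical tolerance below which a change is ignored
    """
    diff = transferred_score - baseline_score
    if diff > tol:
        return "positive"
    if diff < -tol:
        return "negative"
    return "neutral"


# Hypothetical scores, for illustration only.
print(transfer_effect(0.20, 0.15))  # transfer hurt the target-domain detector
print(transfer_effect(0.20, 0.25))  # transfer helped
```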
Dataset        Training set   Testing set   # evaluated concepts   # positive instances
TRECVID 2011   266,474        137,327       50/346                 1800
12. Positive or negative transfer?
Number of training examples?
Type of a concept?
People, object, scene, event
Change of data distribution?
Percentage of improved concepts versus number of positive training examples:
< 500 positive examples: 2/4
500~1000: 1/13
1001~2000: 1/14
> 2000: 2/19
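The fractions above can be turned into percentages with a few lines of arithmetic. The sketch below assumes that the four fractions pair with the four bins in the order in which they are listed; that pairing is an assumption read off the slide layout, although the denominators do sum to the 50 evaluated concepts.

```python
# (bin label, improved concepts, concepts in bin); the pairing of fractions
# with bins is assumed from the order in which they appear on the slide.
bins = [
    ("< 500", 2, 4),
    ("500~1000", 1, 13),
    ("1001~2000", 1, 14),
    ("> 2000", 2, 19),
]


def improved_percentages(rows):
    """Return {bin label: percentage of concepts that improved}."""
    return {label: 100.0 * improved / total for label, improved, total in rows}


pct = improved_percentages(bins)
total_improved = sum(improved for _, improved, _ in bins)
total_concepts = sum(total for _, _, total in bins)

for label, value in pct.items():
    print(f"{label}: {value:.1f}%")
print(f"overall: {total_improved}/{total_concepts}")
```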
13. A case study on cross-domain learning
Target domain (Web videos)
– TRECVID 2012 dataset
Source domain (Web images)
– Semantic Field
1000 positive examples per concept
– Semantic Pooling
SF + additional 1000 examples per concept
16,367 concepts + 0.7 million images for pooling
Dataset        Training set   Testing set   # evaluated concepts   # positive instances
TRECVID 2012   400,289        145,634       46/346                 1200
15. Number of positive examples?
A-SVM-SF: Semantic Field + A-SVM
A-SVM-SP: Semantic Pooling + A-SVM
Baseline: learnt using TRECVID training examples
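The slides do not describe how A-SVM combines the two domains. As a rough sketch, assuming A-SVM refers to the adaptive SVM idea of keeping a source classifier fixed and learning a correction term on the target examples, the adapted decision function is the source score plus a learned perturbation. Everything below (linear model, no bias term, subgradient descent, function names, toy data) is an assumption made for illustration and is not the authors' implementation.

```python
import numpy as np


def fit_linear_hinge(X, y, offset=None, C=1.0, lr=0.01, epochs=500):
    """Minimise 0.5*||w||^2 + C*sum(max(0, 1 - y*(X@w + offset)))
    by plain subgradient descent. `offset` holds fixed per-example scores."""
    n, d = X.shape
    offset = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w + offset)
        violated = margins < 1.0
        # Subgradient: regulariser term minus hinge term of violated examples.
        grad = w - C * (y[violated][:, None] * X[violated]).sum(axis=0)
        w = w - lr * grad
    return w


def adapt(w_source, X_target, y_target, **kwargs):
    """Learn a perturbation on target data, with source scores as fixed offsets."""
    return fit_linear_hinge(X_target, y_target, offset=X_target @ w_source, **kwargs)


def decision(w_source, delta, X):
    """Adapted score = source score + learned correction."""
    return X @ w_source + X @ delta


# Toy target data (hypothetical): two points, labels +1 / -1.
X_t = np.array([[2.0, 0.0], [-2.0, 0.0]])
y_t = np.array([1.0, -1.0])

good_source = np.array([1.0, 0.0])   # already separates the target data
bad_source = np.array([-1.0, 0.0])   # predicts the opposite labels

delta_good = adapt(good_source, X_t, y_t)
delta_bad = adapt(bad_source, X_t, y_t)
print(delta_good, delta_bad)
```

In this sketch the regulariser keeps the correction small, so a source classifier that already fits the target examples is left unchanged, while a mismatched one is corrected only as far as the target margins require.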
[Figure: MinfAP versus number of positive instances for Baseline, A-SVM-SF, and A-SVM-SP, with regions marked -transfer and +transfer]
Pooling is a practical strategy to diversify the coverage of training examples
22/46 concepts improve if each concept only has 100 positive examples
16. Type of Concept?
[Figure: four panels, Scene (15), Object (10), People (12), and Event (8), each plotting MinfAP against number of positive instances]
Probably not a good idea to use images for learning events
17. Change in Data Distribution?
Maximum Mean Discrepancy (MMD)
[Figure: two panels, 23 concepts with lower mismatch and 23 concepts with higher mismatch]
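For readers unfamiliar with MMD, the following is a minimal sketch of the standard biased empirical estimate of squared MMD between two sample sets. The RBF kernel, the `gamma` value, and the random feature vectors are assumptions for illustration; the slides do not state which kernel or features were used.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel values exp(-gamma * ||a - b||^2)."""
    sq_dists = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd2(X, Y, gamma=0.5):
    """Biased estimate of squared MMD between samples X (source) and Y (target)."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())


# Hypothetical features standing in for Web-image and video-keyframe descriptors.
rng = np.random.default_rng(0)
source = rng.normal(size=(50, 2))
near_target = source + 0.1   # small distribution shift
far_target = source + 3.0    # large distribution shift

print(mmd2(source, near_target), mmd2(source, far_target))
```

A larger value indicates a larger mismatch between the two domains, which is the quantity that could be used to split concepts into lower-mismatch and higher-mismatch groups.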
20. Question?
Key ideas and messages:
• Using Web images to learn concept classifiers for the video (TRECVID) domain
• When positive examples in target domain < 100
• Events might be difficult to transfer
• Data distribution can be a cue to predict the difficulty
• The pooling strategy has a better chance to achieve positive transfer
• Feasibility of transfer learning?