Video concept detection by learning from web images

Shiai Zhu, Ting Yao, Chong-Wah Ngo
City University of Hong Kong

Why concepts for media remixing?
An example of event recounting
[Figure: timeline from 00:00 to 00:20 (time in seconds) annotated with "Cake is in preparation", "Someone is talking", "Background music is playing", …]

Recounting using 21 concept classifiers
kitchen, outdoor/indoor, baseball field, crowd, cake, walking, running,
squatting, standing, hand, batting, speech, music, clipping, cheering

Why is concept learning challenging?
Requires thousands of concepts for practical applications
Collecting training examples is always expensive

Economical solution: get it free from the Internet
Thousands of new uploads per minute
Example image tags: Zoom, Wow, Usa, Texas, Critter, Creature, Olympus, Closeup, Cat, June, 2010, Animal, pet

Approach I – Semantic Field (SF)
[Figure: concept graph with nodes Building, Residence, Place of Worship, Country House, House, Temple, Church]

Approach II – Semantic Pooling (SP)
[Figure: concept graph with nodes Building, Residence, Place of Worship, Country House, House, Temple, Church]

Does it work practically?
[Figure: example images for the concepts Dancing, Boy, and Ocean, comparing TRECVID videos with Flickr images]

Transfer Learning
What is transferred, with a representative work for each:
• Knowledge of instances: Wenyuan Dai, ICML 2007 (TrAdaBoost)
• Feature representation: Kate Saenko, ECCV 2010 (shared representation)
• Parameter (model): Jun Yang, ACM MM 2007 (Adaptive-SVM)
• Relational knowledge: Yu-Gang Jiang, ACM MM 2009 (Semantic context transfer)

Transfer Learning
Adaptive SVM
• Model-level learning
TrAdaBoost
• Instance-level learning
[Figure: target domain (video) data and source domain (image) data with per-instance weights; values shown: 0.5, 0.8, 0.3]
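The instance-level idea can be sketched as a single reweighting round. This is a minimal sketch following the published TrAdaBoost update rule, not the authors' implementation; the function name `tradaboost_reweight` and its arguments are hypothetical, and the weak learner that produces the 0/1 error flags is assumed to exist elsewhere.

```python
import numpy as np

def tradaboost_reweight(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One TrAdaBoost-style weight update (illustrative sketch).

    w_src, w_tgt : current weights of source (image) and target (video) instances
    miss_src, miss_tgt : 1 where the current weak learner is wrong, else 0
    n_rounds : total number of boosting rounds planned
    """
    w_src = np.asarray(w_src, dtype=float)
    w_tgt = np.asarray(w_tgt, dtype=float)
    miss_src = np.asarray(miss_src, dtype=float)
    miss_tgt = np.asarray(miss_tgt, dtype=float)

    # Weighted error is measured on the target domain only.
    eps = np.sum(w_tgt * miss_tgt) / np.sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)  # the update assumes eps < 0.5

    beta_tgt = eps / (1.0 - eps)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))

    # Misclassified source instances lose weight (they look unlike the target).
    new_src = w_src * beta_src ** miss_src
    # Misclassified target instances gain weight (as in AdaBoost).
    new_tgt = w_tgt * beta_tgt ** (-miss_tgt)
    return new_src, new_tgt
```

Correctly classified instances keep their weight in this round; only the misclassified ones move, down for source data and up for target data.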

Painful Experience on TRECVID
[Figure: results compared with 64 runs of other SIN systems. Legend: TRECVID training data alone, cross-domain learning, Web images alone. Runs shown: Baseline detector, ASVM-SF, TrAdaBoost-SF, SP, SF]
Negative transfer!
Negative transfer happens when knowledge transfer has a negative impact on the target domain.

Dataset        Training set   Testing set   # evaluated concepts   # positive instances
TRECVID 2011   266,474        137,327       50/346                 1800

Positive or negative transfer?
• Number of training examples?
• Type of concept?
  – People, object, scene, event
• Change of data distribution?
[Figure: percentage of improved concepts versus number of positive training examples]

Positive examples   Improved concepts
< 500               2/4
500~1000            1/13
1001~2000           1/14
> 2000              2/19

A case study on cross-domain learning
Target domain (Web videos)
– TRECVID 2012 dataset
Source domain (Web images)
– Semantic Field
  • 1000 positive examples per concept
– Semantic Pooling
  • SF + additional 1000 examples per concept
  • 16,367 concepts + 0.7 million images for pooling

Dataset        Training set   Testing set   # evaluated concepts   # positive instances
TRECVID 2012   400,289        145,634       46/346                 1200

Basic Framework
[Figure: framework diagram; label shown: SIFT feature space]

Number of positive examples?
A-SVM-SF: Semantic Field + A-SVM
A-SVM-SP: Semantic Pooling + A-SVM
Baseline: learnt using TRECVID training examples
[Figure: MinfAP versus number of positive instances for Baseline, A-SVM-SF, and A-SVM-SP, with regions marked -transfer and +transfer]
• Pooling is a practical strategy to diversify the coverage of training examples
• 22/46 concepts improve if each concept has only 100 positive examples
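The A-SVM step used in both runs can be illustrated with a linear toy version: keep the source (Web image) classifier fixed and learn only a small correction from target (video) examples. This is a hedged sketch, assuming a linear kernel and plain subgradient descent in place of a QP solver; `adapt_linear_svm` and its parameters are hypothetical names, not the authors' code.

```python
import numpy as np

def adapt_linear_svm(w_src, X, y, C=1.0, lr=0.01, epochs=200):
    """Learn a perturbation dw so that f(x) = (w_src + dw) . x fits target data.

    Assumed objective (linear A-SVM form):
        0.5 * ||dw||^2 + C * sum_i max(0, 1 - y_i * (w_src + dw) . x_i)
    The regulariser keeps the adapted model close to the source model.
    """
    w_src = np.asarray(w_src, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    dw = np.zeros_like(w_src)
    for _ in range(epochs):
        margins = y * (X @ (w_src + dw))
        active = margins < 1  # target examples that violate the margin
        grad = dw - C * (y[active, None] * X[active]).sum(axis=0)
        dw -= lr * grad
    return dw

# Toy usage: the source model only looks at feature 0,
# but the target labels also depend on feature 1.
w_source = np.array([1.0, 0.0])
X_target = np.array([[0.0, 1.0], [0.0, -1.0], [1.0, 1.0], [-1.0, -1.0]])
y_target = np.array([1.0, -1.0, 1.0, -1.0])
delta = adapt_linear_svm(w_source, X_target, y_target)
adapted_scores = X_target @ (w_source + delta)
```

With few target positives the correction stays small and the source model dominates, which is one way to read the dependence on the number of positive examples.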

Type of Concept?
[Figure: four MinfAP plots, one per concept type: Scene (15), Object (10), People (12), Event (8)]
• Probably not a good idea to use images for learning events

Change in Data Distribution?
Maximum Mean Discrepancy (MMD)
[Figure: two panels, 23 concepts with lower mismatch and 23 concepts with higher mismatch]
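As a rough illustration of how the mismatch between image and video features could be measured, the sketch below computes a biased estimate of squared MMD with an RBF kernel. The kernel choice, the `gamma` value, and the function names are assumptions for illustration; the slides do not state which kernel or estimator was used.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(X_src, X_tgt, gamma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between two samples."""
    k_ss = rbf_kernel(X_src, X_src, gamma).mean()
    k_tt = rbf_kernel(X_tgt, X_tgt, gamma).mean()
    k_st = rbf_kernel(X_src, X_tgt, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

# Toy usage with synthetic features standing in for image and video descriptors.
rng = np.random.default_rng(0)
images = rng.normal(size=(100, 2))
videos_close = images.copy()             # same distribution: mismatch near zero
videos_far = rng.normal(size=(100, 2)) + 3.0  # shifted distribution: larger mismatch
low_mismatch = mmd2(images, videos_close)
high_mismatch = mmd2(images, videos_far)
```

A larger value indicates a bigger gap between the source and target feature distributions for a concept.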

[Figure: example images from TRECVID and Flickr for Forest, Computer, Singing, Meeting, Kitchen, Motorcycle, Throwing, Stadium, and Chair, arranged by break-even point: < 10, = 50, > 100]

Average is average
[Figure: scatter plot of the 46 concepts. Axes: MMD and break-even point (ticks at 50 and 100). Annotations: difficulty, lower mismatch, higher mismatch]
Concepts plotted: Boat_Ship, Glasses, Singing, Airplane, Baby, Male_Person, Airplane_Flying, Instrumental_Musician, Oceans, Forest, Man_Wearing_A_Suit, Bridges, Military_Airplane, Fields, Stadium, Landscape, Skier, Press_Conference, Nighttime, Teenagers, Highway, Walking_Running, Lakes, Bicycling, Computers, Roadway_Junction, Apartments, Clearing, Girl, Civilian_Person, Kitchen, Motorcycle, Meeting, Female_Person, Government-Leader, Sitting_Down, Hill, Boy, Soldiers, Chair, Basketball, Throwing, Office, George_Bush, Scene_Text, Greeting

Questions?
Key ideas and messages
• Using Web images to learn concept classifiers for the video (TRECVID) domain
• When positive examples in the target domain < 100
• Events might be difficult to transfer
• Data distribution can be a cue to predict the difficulty
• Pooling strategy has a better chance of achieving positive transfer
• Feasibility of transfer learning?
