Traditional visual classification systems are trained based on manually labeled examples. For each concept a large number of images are annotated, on which a classifier is trained. But what if your desired class or label is not in the vocabulary? In this presentation I discuss techniques used to classify without any examples. My results and methods are based on computer vision applications, but the underlying ideas can be used in other domains as well.
10. Supervised Classification
• Obtain annotated examples
• Find a representation
• Train a generic classifier
VOGIN-IP 2015
Remarks:
- New class: retrain on new examples
- How to obtain training examples?
- How to represent images?
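The supervised pipeline above (labeled examples → representation → generic classifier) can be sketched in a few lines. This is an illustrative toy, not the speaker's actual system: the "features" are synthetic 2-D points, the class names are invented, and a nearest-centroid rule stands in for the real classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Obtain annotated examples: two classes with synthetic 2-D "features".
X_cat = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
X_dog = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
X = np.vstack([X_cat, X_dog])
y = np.array([0] * 50 + [1] * 50)  # 0 = cat, 1 = dog

# 2./3. "Train" a generic classifier: one centroid per class is the model.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    """Assign the class whose centroid is closest to x."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(predict([0.2, -0.1]))  # near the "cat" cluster -> 0
print(predict([2.8, 3.1]))   # near the "dog" cluster -> 1
```

Note how the first remark on the slide shows up here: adding a new class means collecting new labeled points and recomputing the model.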
13. Estimating human performance
Andrej Karpathy: “I realized that I needed to go through the painfully long training process myself”
Test set: 1,500 images
GoogleNet: 6.8% error
Karpathy: 5.1% error
16. What are good attributes?
• Good attributes
– are task and category dependent;
– are class discriminative, but not class specific;
– are interpretable by humans; and
– are detectable by computers
17. Quiz: What are good attributes?
• is grey?
• is made of atoms?
• lives in Amsterdam?
• is sunny?
• eats fish?
• has a SIFT descriptor with empty bin 3?
• has 4 wheels?
18. How many attributes?
• In theory k binary attributes can represent
– 2^k classes
• In practice for c classes we need
– Many attributes
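The 2^k count can be checked directly: k binary attributes yield 2^k distinct attribute signatures, so at most 2^k classes can be told apart. The three attribute names below are taken from the quiz slide purely for illustration.

```python
from itertools import product

# k binary attributes give 2**k distinct on/off signatures.
attributes = ["is grey", "eats fish", "has 4 wheels"]
signatures = list(product([0, 1], repeat=len(attributes)))
print(len(signatures))  # 2**3 = 8 distinguishable classes
```

In practice far more attributes than this theoretical minimum are needed, because real attribute detectors are noisy and real classes do not have cleanly separable signatures.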
22. Attribute Based Prediction
1. Learn attribute classifiers from related classes
2. Train and test sets are disjoint
3. Infer attributes from new test image
4. Use attribute-to-class mapping to predict class
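The four steps above can be sketched as a minimal direct-attribute-prediction toy. Everything here is invented for illustration: the attribute "detectors" are simple thresholds standing in for classifiers learned on related, seen classes (steps 1–2), and the attribute-to-class table and class names are made up.

```python
import numpy as np

# Attribute-to-class mapping for UNSEEN classes.
# Columns: [is grey, has stripes, lives in water].
class_names = ["zebra", "dolphin"]
class_attributes = np.array([
    [0, 1, 0],  # zebra:   not grey, striped, not aquatic
    [1, 0, 1],  # dolphin: grey, no stripes, aquatic
])

def detect_attributes(image_features):
    """Step 3: infer attributes from a new test image. Each 'detector'
    is just a threshold on one feature dimension (a stand-in for the
    attribute classifiers learned on related classes)."""
    return (np.asarray(image_features) > 0.5).astype(int)

def predict_class(image_features):
    """Step 4: pick the unseen class whose attribute signature best
    matches the detected attributes (fewest disagreements)."""
    detected = detect_attributes(image_features)
    disagreement = np.abs(class_attributes - detected).sum(axis=1)
    return class_names[int(np.argmin(disagreement))]

print(predict_class([0.1, 0.9, 0.2]))  # striped, not aquatic -> "zebra"
print(predict_class([0.8, 0.2, 0.7]))  # grey and aquatic    -> "dolphin"
```

No zebra or dolphin image was ever used for training: the class is reached purely through the attribute layer, which is exactly the point of the disjoint train/test classes in step 2.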
24. Disadvantages
• Unnatural distinction between
– Attributes to be detected
– Classes of interest
• Inherently multi-class zero-shot prediction
25. Classification based on co-occurrences
I’m looking for a label which I have not seen before. However, this picture also contains:
1. Indoor
2. Living room
3. Table
4. Chair
28. COSTA: Design
• Many visual concepts can be described as an open
set of concept-to-concept relations
• Describe image semantics with co-occurrences
• Exploit natural bias in natural images
29. Exploit natural bias in natural images
Co-occurrence statistics can be estimated from the ground-truth labelling of the images, from external corpora, e.g., WordNet, or from web search hit counts, e.g., Yahoo or Google. Besides the positive co-occurrence (both labels present), the other presence/absence combinations are also used: the presence of label i with the absence of label k, the absence of label i with the presence of label k, and the absence of both labels; each of these definitions yields a similarity measure between two labels.

Using these similarities, the weight vector w of an unknown label z is a weighted combination of the classifiers of the known labels:

w_z = Σ_k s_zk w_k (1)

The combination can also be learned with a regularized least-squares loss:

L_reg = Σ_i Σ_d (w_id − aᵀ v_id)² (2)

where the indices i and k both run over the known labels, s_ii = 0, and the vector v_id contains the similarity-weighted weight vectors, v_idk = s_ik w_kd. Note that the loss is formulated over classifier weights rather than over train images, and Eq. (2) can be minimized in closed form using ridge regression; regularization is important for performance.
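The core of COSTA is one line of linear algebra: the zero-shot classifier is a co-occurrence-weighted sum of known-label classifiers, w_z = Σ_k s_zk w_k. A minimal sketch, with invented label names, weight vectors, and similarity values (in the paper these come from trained classifiers and estimated co-occurrence statistics):

```python
import numpy as np

# Linear classifiers (weight vectors) for four KNOWN labels.
known_labels = ["indoor", "living room", "table", "chair"]
W = np.array([
    [1.0, 0.0, 0.2],   # indoor
    [0.8, 0.1, 0.4],   # living room
    [0.2, 1.0, 0.0],   # table
    [0.1, 0.9, 0.3],   # chair
])

# Co-occurrence-derived similarities s_zk between the unseen label
# "sofa" and each known label (e.g., estimated from web hit counts).
s_sofa = np.array([0.3, 0.5, 0.1, 0.1])

# Zero-shot classifier for "sofa": weighted combination of rows of W.
w_sofa = s_sofa @ W

def score(image_features):
    """Dot-product score of the zero-shot 'sofa' classifier."""
    return float(w_sofa @ np.asarray(image_features))

print(np.round(w_sofa, 2))  # combined weight vector
```

No sofa images are needed: only classifiers for the co-occurring labels and an estimate of how strongly "sofa" co-occurs with each of them.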
36. Conclusions
• Supervised visual classification performs well
when ample training data is available
• Classification without examples:
– Define some set of base classifiers
– Transfer new class to space of these classifiers
– Two examples: attributes and co-occurrences
37. Thanks to:
• The organizers
• Christoph Lampert for slides and inspiration
• Authors of the cited papers
• Colleagues and supervisors (UvA: Amir, Cees, Jan, Spencer &
Stratis, PhD: Cordelia, Florent, Gabriela, Jakob)
38. Literature
• Frome, Corrado, Shlens, Bengio, Dean, Ranzato, and Mikolov,
“DeViSE: A Deep Visual-Semantic Embedding Model”, NIPS 2013
• Habibian, Mensink, and Snoek, “VideoStory: A New Multimedia
Embedding for Few-Example Recognition and Translation of Events”,
ACM MM 2014
• Lampert, Nickisch, and Harmeling, “Attribute-Based Classification for
Zero-Shot Learning of Object Categories”, TPAMI 2013
• Li, Gavves, Mensink, and Snoek, “Attributes Make Sense on
Segmented Objects”, ECCV 2014
• Mensink, Gavves, and Snoek, “COSTA: Co-Occurrence Statistics for
Zero-Shot Classification”, CVPR 2014
• Norouzi, Mikolov, Bengio, Singer, Shlens, Frome, Corrado, and Dean,
“Zero-Shot Learning by Convex Combination of Semantic
Embeddings”, ICLR 2014