Traditional visual classification systems are trained based on manually labeled examples. For each concept a large number of images are annotated, on which a classifier is trained. But what if your desired class or label is not in the vocabulary? In this presentation I discuss techniques used to classify without any examples. My results and methods are based on computer vision applications, but the underlying ideas can be used in other domains as well.
10. Supervised Classification
• Obtain annotated examples
• Find a representation
• Train a generic classifier
VOGIN-IP 2015
Remarks:
- New class: retrain on new examples
- How to obtain training examples?
- How to represent images?
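The supervised pipeline above (labeled examples → representation → generic classifier) can be sketched in a few lines. This is an illustrative toy, not the speaker's actual system: the "features" are synthetic 2-D points, the class names are invented, and a nearest-centroid rule stands in for the real classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Obtain annotated examples: two classes with synthetic 2-D "features".
X_cat = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
X_dog = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
X = np.vstack([X_cat, X_dog])
y = np.array([0] * 50 + [1] * 50)  # 0 = cat, 1 = dog

# 2./3. "Train" a generic classifier: one centroid per class is the model.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    """Assign the class whose centroid is closest to x."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(predict([0.2, -0.1]))  # near the "cat" cluster -> 0
print(predict([2.8, 3.1]))   # near the "dog" cluster -> 1
```

Note how the first remark on the slide shows up here: adding a new class means collecting new labeled points and recomputing the model.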
13. Estimating human performance
Andrej Karpathy: “I realized that I needed to go through the painfully long training process myself”
Test set: 1,500 images
GoogleNet: 6.8% error
Karpathy: 5.1% error
16. What are good attributes?
• Good attributes
– are task and category dependent;
– are class discriminative, but not class specific;
– are interpretable by humans; and
– are detectable by computers
17. Quiz: What are good attributes?
• is grey?
• is made of atoms?
• lives in Amsterdam?
• is sunny?
• eats fish?
• has a SIFT descriptor with empty bin 3?
• has 4 wheels?
18. How many attributes?
• In theory k binary attributes can represent
– 2^k classes
• In practice for c classes we need
– Many attributes
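The 2^k count can be checked directly: k binary attributes yield 2^k distinct attribute signatures, so at most 2^k classes can be told apart. The three attribute names below are taken from the quiz slide purely for illustration.

```python
from itertools import product

# k binary attributes give 2**k distinct on/off signatures.
attributes = ["is grey", "eats fish", "has 4 wheels"]
signatures = list(product([0, 1], repeat=len(attributes)))
print(len(signatures))  # 2**3 = 8 distinguishable classes
```

In practice far more attributes than this theoretical minimum are needed, because real attribute detectors are noisy and real classes do not have cleanly separable signatures.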
22. Attribute Based Prediction
1. Learn attribute classifiers from related classes
2. Train and test sets are disjoint
3. Infer attributes from new test image
4. Use attribute-to-class mapping to predict class
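The four steps above can be sketched as a minimal direct-attribute-prediction toy. Everything here is invented for illustration: the attribute "detectors" are simple thresholds standing in for classifiers learned on related, seen classes (steps 1–2), and the attribute-to-class table and class names are made up.

```python
import numpy as np

# Attribute-to-class mapping for UNSEEN classes.
# Columns: [is grey, has stripes, lives in water].
class_names = ["zebra", "dolphin"]
class_attributes = np.array([
    [0, 1, 0],  # zebra:   not grey, striped, not aquatic
    [1, 0, 1],  # dolphin: grey, no stripes, aquatic
])

def detect_attributes(image_features):
    """Step 3: infer attributes from a new test image. Each 'detector'
    is just a threshold on one feature dimension (a stand-in for the
    attribute classifiers learned on related classes)."""
    return (np.asarray(image_features) > 0.5).astype(int)

def predict_class(image_features):
    """Step 4: pick the unseen class whose attribute signature best
    matches the detected attributes (fewest disagreements)."""
    detected = detect_attributes(image_features)
    disagreement = np.abs(class_attributes - detected).sum(axis=1)
    return class_names[int(np.argmin(disagreement))]

print(predict_class([0.1, 0.9, 0.2]))  # striped, not aquatic -> "zebra"
print(predict_class([0.8, 0.2, 0.7]))  # grey and aquatic    -> "dolphin"
```

No zebra or dolphin image was ever used for training: the class is reached purely through the attribute layer, which is exactly the point of the disjoint train/test classes in step 2.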
24. Disadvantages
• Unnatural distinction between
– Attributes to be detected
– Classes of interest
• Inherently multi-class zero-shot prediction
25. Classification based on co-occurrences
I’m looking for a label which I have not seen before. However, this picture also contains:
1. Indoor
2. Living room
3. Table
4. Chair
28. COSTA: Design
• Many visual concepts can be described as an open
set of concept-to-concept relations
• Describe image semantics with co-occurrences
• Exploit natural bias in natural images
29. Exploit natural bias in natural images
Co-occurrence statistics can be estimated from the ground-truth labelling of the images, from external corpora, e.g., WordNet, or from web search hit counts, e.g., Yahoo or Google. Besides the positive co-occurrence (both labels present), the other presence/absence combinations are also used: the presence of label i with the absence of label k, the absence of label i with the presence of label k, and the absence of both labels; each of these definitions yields a similarity measure between two labels.

Using these similarities, the weight vector w of an unknown label z is a weighted combination of the classifiers of the known labels:

w_z = Σ_k s_zk w_k (1)

The combination can also be learned with a regularized least-squares loss:

L_reg = Σ_i Σ_d (w_id − aᵀ v_id)² (2)

where the indices i and k both run over the known labels, s_ii = 0, and the vector v_id contains the similarity-weighted weight vectors, v_idk = s_ik w_kd. Note that the loss is formulated over classifier weights rather than over train images, and Eq. (2) can be minimized in closed form using ridge regression; regularization is important for performance.
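The core of COSTA is one line of linear algebra: the zero-shot classifier is a co-occurrence-weighted sum of known-label classifiers, w_z = Σ_k s_zk w_k. A minimal sketch, with invented label names, weight vectors, and similarity values (in the paper these come from trained classifiers and estimated co-occurrence statistics):

```python
import numpy as np

# Linear classifiers (weight vectors) for four KNOWN labels.
known_labels = ["indoor", "living room", "table", "chair"]
W = np.array([
    [1.0, 0.0, 0.2],   # indoor
    [0.8, 0.1, 0.4],   # living room
    [0.2, 1.0, 0.0],   # table
    [0.1, 0.9, 0.3],   # chair
])

# Co-occurrence-derived similarities s_zk between the unseen label
# "sofa" and each known label (e.g., estimated from web hit counts).
s_sofa = np.array([0.3, 0.5, 0.1, 0.1])

# Zero-shot classifier for "sofa": weighted combination of rows of W.
w_sofa = s_sofa @ W

def score(image_features):
    """Dot-product score of the zero-shot 'sofa' classifier."""
    return float(w_sofa @ np.asarray(image_features))

print(np.round(w_sofa, 2))  # combined weight vector
```

No sofa images are needed: only classifiers for the co-occurring labels and an estimate of how strongly "sofa" co-occurs with each of them.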
36. Conclusions
• Supervised visual classification performs well
when ample training data is available
• Classification without examples:
– Define some set of base classifiers
– Transfer new class to space of these classifiers
– Two examples: attributes and co-occurrences
37. Thanks to:
• The organizers
• Christoph Lampert for slides and inspiration
• Authors of the cited papers
• Colleagues and supervisors (UvA: Amir, Cees, Jan, Spencer &
Stratis, PhD: Cordelia, Florent, Gabriela, Jakob)
38. Literature
• Frome, Corrado, Shlens, Bengio, Dean, Ranzato, and Mikolov,
“DeViSE: A Deep Visual-Semantic Embedding Model”, NIPS 2013
• Habibian, Mensink, and Snoek, “VideoStory: A New Multimedia
Embedding for Few-Example Recognition and Translation of Events”,
ACM MM 2014
• Lampert, Nickisch, and Harmeling, “Attribute-Based Classification for
Zero-Shot Learning of Object Categories”, TPAMI 2013
• Li, Gavves, Mensink, and Snoek, “Attributes Make Sense on
Segmented Objects”, ECCV 2014
• Mensink, Gavves, and Snoek, “COSTA: Co-Occurrence Statistics for
Zero-Shot Classification”, CVPR 2014
• Norouzi, Mikolov, Bengio, Singer, Shlens, Frome, Corrado, and Dean,
“Zero-Shot Learning by Convex Combination of Semantic
Embeddings”, ICLR 2014