1. The document discusses an approach for the MediaEval 2012 Search and Hyperlinking task that creates an enriched representation of videos and queries, applies multiple similarity metrics, and merges results through late fusion.
2. Three similarity metrics are used (bag-of-words, named entity-based, and tag-based), each with its own advantages and disadvantages.
3. Evaluation results showed that the combination of bag-of-words and named entity-based similarity performed best for search, while linking still needs improvement, including parameter optimization.
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities
1. ELIS – Multimedia Lab
MediaEval: Search and Hyperlinking
4-5 October, Pisa, Italy
Tom De Nies
Pedro Debevere, Davy Van Deursen, Wesley De Neve, Erik Mannens and Rik Van de Walle
Ghent University – IBBT – Multimedia Lab
Our approach in a nutshell
1. Create an enriched representation of videos and queries
2. Apply multiple similarity metrics
3. Merge results by late fusion
MediaEval 2012: Brave New Task: Search and Hyperlinking
Tom De Nies (IBBT-MMLab) 2
05/10/2012
Enriched Data Representation
Advantages
Comparable queries and videos
Extra metadata containing disambiguated concepts
Easy conversion from video to query object
→ possible to use same approach for Search and Linking!
Disadvantages
o Enrichment step when ingesting data can take a while
o Only English NER tools → automatic translation step for other languages
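The key point of the enriched representation is that videos and queries share one structure, so a video can be turned into a query for Linking. A minimal sketch of such a structure in Python, assuming a single hypothetical `EnrichedObject` type with text, disambiguated entities and tags (field and function names are illustrative, not taken from the original system):

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedObject:
    """Hypothetical shared representation for both videos and queries."""
    text: str                                        # query text or (translated) transcript
    entities: set[str] = field(default_factory=set)  # disambiguated concepts, e.g. URIs
    tags: set[str] = field(default_factory=set)      # user-generated or extracted tags
    source_id: str = ""                              # video ID when built from a video


def video_to_query(video: EnrichedObject) -> EnrichedObject:
    """Reuse an enriched video (or anchor segment) as a query for Linking.

    Because both sides share one type, the Search code path applies unchanged.
    """
    return EnrichedObject(
        text=video.text,
        entities=set(video.entities),  # copy so the query can be modified safely
        tags=set(video.tags),
        source_id=video.source_id,
    )
```

A Search query would be built the same way, with `EnrichedObject(text=..., entities=...)` filled in after running NER on the query text.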
Similarity metrics
1. “Bag of words” similarity
2. Named Entity-based similarity
3. Tag-based similarity
Bag of Words similarity
Text → stop word removal → text without stop words
For each word:
• Calculate the term frequency (TF): TF(t, D) = number of occurrences of term t in document D
• Calculate the inverse document frequency (IDF)
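The pipeline above can be sketched in Python as follows. The slides do not say how the TF and IDF weights are combined into a single score; this sketch assumes the common choice of cosine similarity between TF-IDF vectors and the usual IDF(t) = log(N / df(t)). The stop word list and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real system would use a full one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "is", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split into words and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def term_frequencies(tokens: list[str]) -> Counter:
    """TF(t, D) = number of occurrences of t in D."""
    return Counter(tokens)


def inverse_document_frequencies(corpus: list[list[str]]) -> dict[str, float]:
    """IDF(t) = log(N / df(t)); this is the expensive corpus-indexing step."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Terms unseen while building the IDF table get weight 0 (an assumption).
    return {t: f * idf.get(t, 0.0) for t, f in term_frequencies(tokens).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bow_similarity(query_text: str, doc_text: str, idf: dict[str, float]) -> float:
    return cosine(tfidf_vector(tokenize(query_text), idf),
                  tfidf_vector(tokenize(doc_text), idf))
```

Note that a term occurring in every document gets IDF log(1) = 0, which is how common words end up with a lower weight.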
Bag of Words similarity
Advantages
Both corpus & documents taken into account
Common words get lower weight to exploit unique features
Disadvantages
o Expensive training step (IDF initialization)
o No semantics → ambiguity
Named Entity-based Similarity
Named Entities are extracted from content
Similar content will have similar entities!
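The exact named-entity similarity formula is not recoverable from the slides. As a stand-in, the sketch below scores two items by the Jaccard overlap of their sets of disambiguated entity identifiers; the DBpedia-style identifiers and the function name are illustrative assumptions.

```python
def entity_similarity(query_entities: set[str], video_entities: set[str]) -> float:
    """Jaccard overlap of disambiguated entity sets (a stand-in measure)."""
    if not query_entities or not video_entities:
        return 0.0
    shared = query_entities & video_entities
    return len(shared) / len(query_entities | video_entities)
```

Because entities are disambiguated, `entity_similarity({"dbpedia:Paris"}, {"dbpedia:Paris_Hilton"})` returns 0.0 even though both texts may contain the word "Paris".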
Named Entity-based Similarity
Advantages
Fewer entities than terms → fewer calculations than BoW
IDF → IS: no indexing of the corpus required
Named entities are unambiguous
Disadvantages
o Lower precision / coarser granularity than BoW
Tag-based similarity
Advantages
Uses user-generated metadata
Synonyms for higher recall
Disadvantages
o Very coarse granularity / low precision
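A minimal sketch of tag matching with synonym expansion, assuming a small hand-written synonym table (hypothetical; a thesaurus could play this role) and scoring by the fraction of expanded query tags found among the expanded video tags. The expansion is what raises recall, and it is also why precision drops.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS: dict[str, set[str]] = {
    "football": {"soccer"},
    "soccer": {"football"},
    "film": {"movie"},
    "movie": {"film"},
}


def expand(tags: set[str]) -> set[str]:
    """Lowercase each tag and add its known synonyms."""
    expanded: set[str] = set()
    for tag in tags:
        tag = tag.lower()
        expanded.add(tag)
        expanded |= SYNONYMS.get(tag, set())
    return expanded


def tag_similarity(query_tags: set[str], video_tags: set[str]) -> float:
    """Fraction of expanded query tags matched by the expanded video tags."""
    q, v = expand(query_tags), expand(video_tags)
    return len(q & v) / len(q) if q else 0.0
```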
Late Fusion
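The late fusion slide survives only as a title. As a hedged illustration of the general technique, and not necessarily the fusion rule used in these runs, the sketch below min-max normalises each metric's scores, combines them with a weighted sum and keeps results above a threshold. The weights, the threshold and all names are assumptions.

```python
def min_max_normalise(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one metric's scores to [0, 1] so metrics become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def late_fusion(runs: dict[str, dict[str, float]],
                weights: dict[str, float],
                threshold: float = 0.0) -> list[tuple[str, float]]:
    """Weighted sum of normalised per-metric scores, ranked high to low."""
    fused: dict[str, float] = {}
    for metric, scores in runs.items():
        w = weights.get(metric, 1.0)  # unweighted metrics default to 1.0
        for doc, s in min_max_normalise(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return [(doc, s) for doc, s in ranked if s >= threshold]
```

The threshold is the kind of parameter that, once tuned for Search, may not transfer to Linking.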
Evaluation: Search
Unexpected:
• LIUM > LIMSI, even though LIMSI had better language detection
→ due to automatic translation?
• NE + BoW > NE + BoW + Tags
→ Tags find more results, but also give false positives a higher rank, so MRR decreases
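Reciprocal rank explains why extra results can hurt: it only looks at the first relevant item, so a single false positive ranked above it halves the score. A minimal sketch of (mean) reciprocal rank, with hypothetical function names:

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked list, relevant set) pairs."""
    if not results:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in results) / len(results)
```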
Evaluation: Search
| Run | Precision@60 | Recall@60 |
|---|---|---|
| 1 (LIMSI: BoW+NE) | 0.056 | 0.40 |
| 2 (LIUM: BoW+NE) | 0.061 | 0.467 |
| 3 (LIMSI: BoW+NE+Tags) | 0.054 | 0.433 |
| 4 (LIUM: BoW+NE+Tags) | 0.059 | 0.50 |
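The slide does not define what "@60" refers to; the sketch below simply treats it as a generic cutoff k over a ranked result list, which may differ from the official task definition. Function names are hypothetical.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)
```

In this form, adding tag-based matches can raise recall (more relevant items retrieved) while lowering precision, which matches the trend between the BoW+NE and BoW+NE+Tags runs.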
Evaluation: Linking
| Run | MAP (Ground Truth) | MAP (Search results) |
|---|---|---|
| LIMSI (BoW + NE) | 0.157 | 0.014 |
| LIUM (BoW + NE) | 0.171 | 0.040 |
| LIMSI (BoW + NE + Tags) | 0.157 | 0.003 |
| LIUM (BoW + NE + Tags) | 0.171 | 0.037 |
Possible explanations:
• Thresholds optimized for Search task, not for Linking
• User-generated tags vs. extracted tags
… to be investigated!
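For reference, mean average precision (MAP), the linking measure above, averages the precision at each rank where a relevant item appears, then averages that value over all queries (here, anchors). A textbook sketch with hypothetical function names:

```python
def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision@rank over the ranks of relevant items."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)  # relevant items never retrieved count as 0


def mean_average_precision(results: list[tuple[list[str], set[str]]]) -> float:
    if not results:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in results) / len(results)
```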
Improvements / Future Work
• Better ranking criteria / late fusion
• Improve tag-based similarity
• Optimize parameters for linking
Discussion
These research activities were funded by Ghent University, IBBT, the IWT
Flanders, the FWO-Flanders, and the European Union, in the context of the
IBBT project Smarter Media in Flanders (SMIF).