Exploiting User Comments for Audio-visual
Content Indexing and Retrieval

Carsten Eickhoff, Wen Li and Arjen P. de Vries
March 25, 2013




          Delft University of Technology
Overview


• Introduction and statistics

• Harnessing user comments for content indexing

• Dealing with noise

• Retrieval experiments




                          User Comments for Content Indexing and Retrieval   2
Example




Content Annotation



• Audio-visual content retrieval relies on textual metadata

• Author-provided titles and descriptions are often not enough

• Collaborative tagging can provide more information




Available Annotation Sources

• Tagging content is a tedious task

• To make it more engaging, tagging is sometimes integrated into
  games and reputation schemes

• Still, 58% of a 10,000-video sample from YouTube are annotated
  with fewer than 140 characters of text each

• At the same time, comment threads are massive…




Automatic term extraction
   “You will get kissed on the nearest possible Friday by the love of your life.Tomorrow will be the best day of your life.However,if you don't post this comment to at least 3 videos,you will die within 2 days.Now uv started reading dis dunt stop…”

   “omg i luv that stuff”

   “lol luv it luv snoopy”

   “Cute”

Types of Noise

1. Uninformative comments
      “omg i luv that stuff”

2. Unrelated comments (incl. spam)
      “You will get kissed on the nearest possible Friday by the love of your life.Tomorrow will be the best day of your life.However,if you don't post this comment to at least 3 videos,you will die within 2 days.Now uv started reading dis dunt stop…”

3. Misspellings and chat speak
      “OMG YEAH LOL1!1!!! i luv that part u like robot chicken?”

4. Foreign language utterances
      “Snoopy est si mignon!!” (French for “Snoopy is so cute!!”)

LM-based Keyword extraction

• Find the terms that are more likely to occur locally (in one video's
  comments) than globally in the collection




• Similar in spirit to tf-idf, but within the language modeling (LM) framework
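The slide does not spell out the scoring function. As a minimal sketch, assuming Dirichlet smoothing and a pointwise KL-divergence score (both our choices, not necessarily the authors'), each term in a video's comments can be scored by how far its smoothed local probability exceeds its collection probability. The function name `keyword_scores` and the prior `mu` are illustrative.

```python
import math
from collections import Counter


def keyword_scores(local_tokens, collection_counts, mu=2000.0):
    """Hypothetical helper: rank terms by p_loc(t) * log(p_loc(t) / p_coll(t)).

    local_tokens: tokens from one video's comments.
    collection_counts: Counter of term frequencies over the whole collection;
    assumed to include every local token, so p_coll is never zero.
    mu: Dirichlet prior (an assumption, not a value from the slides).
    """
    local = Counter(local_tokens)
    local_len = sum(local.values())
    coll_len = sum(collection_counts.values())
    scores = {}
    for term, tf in local.items():
        p_coll = collection_counts[term] / coll_len
        # Dirichlet-smoothed local estimate pulls rare local evidence toward p_coll.
        p_loc = (tf + mu * p_coll) / (local_len + mu)
        # Positive when the term is locally over-represented, negative otherwise.
        scores[term] = p_loc * math.log(p_loc / p_coll)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A term such as “snoopy” that is frequent in one thread but rare in the collection rises to the top, while function words like “the” receive negative scores.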




Bursts

• Peaks in commenting activity may contain interesting information

  [External]: Actor wins an award
  [Internal]: Controversial comment

Generalized Burst Detection

• Kleinberg [1] models bursts for individual terms

• We need a more general representation of activity peaks




[1] Jon Kleinberg. Bursty and Hierarchical Structure in Streams. 2003.
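As an illustration of a term-independent burst signal, the sketch below bins all of a video's comment timestamps and flags bins whose comment count exceeds the mean by `k` standard deviations, merging adjacent flagged bins. This thresholding rule is a simple stand-in chosen for clarity; it is neither Kleinberg's automaton model [1] nor necessarily the authors' method, and `detect_bursts`, `bin_size`, and `k` are hypothetical names.

```python
from statistics import mean, pstdev


def detect_bursts(timestamps, bin_size, k=2.0):
    """Hypothetical helper: return (start, end) ranges of bursty time bins.

    timestamps: comment times (e.g. seconds since upload), any order.
    bin_size: bin width, in the same unit as the timestamps.
    A bin is bursty if its count exceeds mean + k * population std of all bins.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    n_bins = int((max(timestamps) - start) // bin_size) + 1
    counts = [0] * n_bins
    for t in timestamps:
        counts[int((t - start) // bin_size)] += 1
    threshold = mean(counts) + k * pstdev(counts)
    bursts = []
    for i, c in enumerate(counts):
        if c > threshold:
            lo, hi = start + i * bin_size, start + (i + 1) * bin_size
            if bursts and bursts[-1][1] == lo:
                bursts[-1] = (bursts[-1][0], hi)  # extend the previous burst
            else:
                bursts.append((lo, hi))
    return bursts
```

A fixed threshold is easy to reason about but sensitive to the bin width; a state-based model such as Kleinberg's [1] instead charges a cost for switching between burst levels.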

Burst and Cause

• Capturing bursts seems to help

• But we also need to know their cause

• A mixture of language models
  accounts for burst and pre-burst
  term likelihoods
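One plausible reading of this slide, sketched under our own assumptions: estimate a language model for the burst window and one for the pre-burst window, smooth each by mixing it with a background model (Jelinek-Mercer interpolation with weight `lam`), and rank terms by how much their likelihood rises during the burst. The names `mixture_lm` and `burst_cause_terms`, the value of `lam`, and the log-ratio ranking are illustrative choices, not the authors' published model.

```python
import math
from collections import Counter


def mixture_lm(tokens, background, lam, floor=1e-9):
    """Hypothetical helper: lam * P_ML(t | tokens) + (1 - lam) * P(t | background)."""
    counts = Counter(tokens)
    total = sum(counts.values()) or 1

    def prob(term):
        p_bg = max(background.get(term, 0.0), floor)  # floor avoids log(0)
        return lam * counts[term] / total + (1 - lam) * p_bg

    return prob


def burst_cause_terms(pre_burst, burst, background, lam=0.8, top_n=5):
    """Hypothetical helper: rank burst terms by log P_burst(t) - log P_pre(t)."""
    p_burst = mixture_lm(burst, background, lam)
    p_pre = mixture_lm(pre_burst, background, lam)
    scores = {t: math.log(p_burst(t) / p_pre(t)) for t in set(burst)}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Terms already common before the burst (e.g. “cute”, “lol”) get negative scores, while terms that appear only once the burst starts (e.g. “award”) rank first as candidate causes.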




Vocabulary Regularization

• Currently: the more discriminative a term, the higher it is ranked

• As a result: misspellings and non-English terms are recommended

• Wikipedia can help identify such cases:

     Snoopy
     Yeah!!1%   (Wait, that’s not a word…)

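A minimal sketch of the regularization step, assuming a precomputed table of document frequencies over a Wikipedia dump (the hypothetical dict `wiki_df`): candidate terms that appear in too few Wikipedia articles are dropped before ranking. The threshold `min_df`, the lowercasing, and the data structure are assumptions; the slides do not say how Wikipedia is queried.

```python
def regularize(ranked_terms, wiki_df, min_df=5):
    """Hypothetical helper: keep candidates found in at least min_df Wikipedia articles.

    ranked_terms: candidate keywords, best first (order is preserved).
    wiki_df: assumed mapping from lowercased term to the number of
    Wikipedia articles containing it, built offline from a dump.
    """
    return [t for t in ranked_terms if wiki_df.get(t.lower(), 0) >= min_df]
```

Filtering after ranking keeps the discriminative ordering intact and only removes strings, such as “Yeah!!1%”, that Wikipedia does not attest.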
Data Set


• 10,000 YouTube videos crawled in 2009/10

• 20 seed queries, following “related videos” links

• 4.7 M user comments

• On average 360 comments per video (σ = 984)




Retrieval experiments

• TREC-style retrieval experiment

• 40 manually constructed topics

• Top 10 results pooled and judged for relevance via crowdsourcing

• BM25F models with fields per source (title, description, etc.)
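To make the field-based setup concrete, here is a compact BM25F scorer in one common formulation: per-field term frequencies are length-normalized and weighted, summed into a single pseudo-frequency, then saturated once with `k1` and multiplied by IDF. The field names, `weights`, `b` values, and `k1` are placeholders; the slides do not report the parameters used.

```python
import math


def bm25f_score(query_terms, doc, avg_len, weights, b, df, n_docs, k1=1.2):
    """Hypothetical helper: score one document with BM25F.

    doc: dict field -> list of tokens (e.g. {"title": [...], "comments": [...]}).
    avg_len: dict field -> average field length (assumed > 0).
    weights, b: dicts field -> field boost and length-normalization strength.
    df: dict term -> number of documents containing the term; n_docs: collection size.
    """
    score = 0.0
    for term in set(query_terms):
        # Combine per-field, length-normalized term frequencies into one pseudo-tf.
        tf = 0.0
        for field, tokens in doc.items():
            norm = 1 - b[field] + b[field] * len(tokens) / avg_len[field]
            tf += weights[field] * tokens.count(term) / norm
        if tf == 0 or term not in df:
            continue
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
        score += idf * tf / (k1 + tf)  # saturate once, after combining fields
    return score
```

Saturating after the fields are combined, rather than per field, keeps a term that appears in both title and comments from being counted as two independent matches.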




Retrieval performance




• 40% gain in MAP
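For reference, average precision (AP) averages the precision at every rank where a relevant video appears, MAP averages AP over topics, and a relative gain compares two MAP values. A minimal sketch follows; the function names and the dictionary formats for runs and judgments are assumptions.

```python
def average_precision(ranking, relevant):
    """Hypothetical helper: AP for one topic.

    Divides by the number of relevant items, so relevant items that were
    never retrieved contribute zero precision.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """runs: topic -> ranked doc ids; qrels: topic -> set of relevant doc ids."""
    return sum(average_precision(runs.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)


def relative_gain(baseline, system):
    """Relative improvement; 0.40 corresponds to a 40% gain over the baseline."""
    return (system - baseline) / baseline
```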


Experiments under Sparsity

• 58% of all video descriptions are shorter than 140 characters

• 50% of all titles are shorter than 35 characters

• We restrict our corpus to videos with short titles and/or descriptions

• This affects 77% of all videos in our sample…
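The subset selection can be expressed as a simple filter. Here we assume, from the figures above, that “short” means a title under 35 characters or a description under 140 characters; the `Video` record type and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Video:
    """Hypothetical record holding a video's author-provided metadata."""
    video_id: str
    title: str
    description: str


def is_sparse(video, max_title=35, max_description=140):
    """True if the title or the description is shorter than its (assumed) threshold."""
    return len(video.title) < max_title or len(video.description) < max_description


def sparse_subset(videos):
    """Keep only videos whose author-provided metadata is sparse."""
    return [v for v in videos if is_sparse(v)]
```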




Retrieval performance (sparse)




• 54% gain in MAP



Closing the Circle




Conclusion

• User comments can enhance content annotation if we deal with
  the domain-inherent noise appropriately

• By modeling bursts in commenting activity, we can find informative,
  on-topic comments

• Through the use of Wikipedia, misspellings and foreign language
  utterances can be reliably identified




Future Directions

• Additional regularization resources (e.g., Delicious, WordNet)

• New domains (e.g., social media streams linked to TV)

• Content-aware term extraction

• Cold start problem

• Cross-language ability




Thank You!




Exploiting User Comments for Audio-visual Content Indexing and Retrieval (ECIR'13)
