Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Singapore Jonathan Elsas, Jaime Arguello, Jamie Callan & Jaime Carbonell LTI/SCS/CMU
Outline
Background
What is a Blog?
What is a Feed?
<xml>
  <feed>
    <entry>
      <author>Peter …</>
      <title>Good, Evil…</>
      <content>I’ve said…</>
    </entry>
    <entry>
      <author>Peter …</>
      <title>Agreeing…</>
      <content>Some peo…</>
    </entry>
    …
Blog-Feed Correspondence: Blog ↔ Feed, Post ↔ Entry, HTML ↔ XML
Why are Blogs important? [http://www.technorati.com/about/]
The Task
Feed Search at TREC (a.k.a. Blog Distillation)
Feed Search at TREC: queries represent ongoing information needs and are frequently very general
Challenges in Feed Search
Challenges in Feed Search [Figure: a feed shown as a series of entries along a time axis]
Challenges in Feed Search [Figure: entries plotted by time and topic, labeled "Space Exploration"; entry labels: NASA, China’s plans for the moon, shuttle launch, My dog, Mars rover, Boeing]
Challenges in Feed Search [Figure: "Space Exploration" plotted on time and topic axes]
Challenges in Feed Search [Figure: time axis]
Challenges in Feed Search
[Mac] … post regularly about new products, features, or application software of Apple Mac computers.
[Music] … describing songs, biographies of musicians, musical styles and their influences of music on people are discussed.
[Food] … describing experiences eating cuisines, culinary delights, recipes, nutrition plans.
[Wine] … such as tastings, reviews, food matching or pairing, and oenophile news and events.
Our Approach
Challenges → Our Approach. Feeds → Retrieval Models. Information Needs (General & Ongoing) → Feedback Models.
Retrieval Models
Large Document (Feed) Model [Figure: query Q is run against a Feed Document Collection, in which each document is a whole feed containing its entries; the output is Ranked Feeds] Rank by Indri’s standard retrieval model [Metzler and Croft, 2004; 2005]
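The large document idea can be sketched in a few lines. This is only an illustration, not the Indri implementation: it assumes a plain Dirichlet-smoothed query-likelihood score as a stand-in for Indri's retrieval model, and the function names, the `feeds` data layout, and the `mu` default are all hypothetical.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query_terms, doc_tokens, coll_counts, coll_len, mu=2500.0):
    """Return log P(Q|D) under a Dirichlet-smoothed unigram model (assumed scorer)."""
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query_terms:
        p_coll = coll_counts.get(term, 0) / coll_len if coll_len else 0.0
        p = (tf[term] + mu * p_coll) / (doc_len + mu)
        if p == 0.0:
            return float("-inf")  # term never occurs in the collection
        score += math.log(p)
    return score


def rank_feeds_large_document(query_terms, feeds, mu=2500.0):
    """feeds maps feed_id -> list of entries, each entry a list of tokens."""
    # Large document model: concatenate all entries of a feed into one document.
    feed_docs = {fid: [tok for entry in entries for tok in entry]
                 for fid, entries in feeds.items()}
    coll_counts = Counter(tok for doc in feed_docs.values() for tok in doc)
    coll_len = sum(coll_counts.values())
    scored = [(fid, dirichlet_log_likelihood(query_terms, doc, coll_counts, coll_len, mu))
              for fid, doc in feed_docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because every entry is folded into one document, a feed with many on-topic entries accumulates term counts, but a long off-topic feed also dilutes them through the length normalization.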
Large Document (Feed) Model [Figure: a feed drawn as a single document composed of its entries]
Small Document (Entry) Model [Figure: query Q is run against an Entry Document Collection (document = entry) to produce Ranked Entries; a rank aggregation function is then applied to produce Ranked Feeds]
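A minimal sketch of the second step, turning ranked entries into ranked feeds. The slide only says "some rank aggregation function", so the choice here (summing the scores of a feed's entries among the top `top_n` results) is an assumption, as are the function name and the tuple layout.

```python
from collections import defaultdict


def rank_feeds_small_document(ranked_entries, top_n=1000):
    """ranked_entries: list of (entry_id, feed_id, score), best first.

    Scores are assumed non-negative (for example P(Q|E)), so that
    adding more matching entries can only raise a feed's score.
    """
    feed_scores = defaultdict(float)
    # Only the top_n retrieved entries contribute to their feeds.
    for _entry_id, feed_id, score in ranked_entries[:top_n]:
        feed_scores[feed_id] += score
    return sorted(feed_scores.items(), key=lambda pair: pair[1], reverse=True)
```

With summation, a feed with several moderately relevant entries can outrank a feed with one highly relevant entry, which fits the goal of finding feeds with a recurring interest in the topic.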
Small Document (Entry) Model: ReDDE Federated Search Algorithm [Si & Callan, 2003]
Entry Centrality [Figure: entries plotted on time and topic axes]
Small Document (Entry) Model: not only improves speed, but also performance
Retrieval Model Results
Retrieval Model Results [Chart: Mean Average Precision, Large Document (Feed) Model vs. Small Document (Entry) Models]
Retrieval Model Results [Chart: Mean Average Precision by feed prior, Uniform vs. Log(Feed Length); labeled value: MAP 0.188]
Retrieval Model Results [Chart: Mean Average Precision by feed prior, Uniform vs. Log(Feed Length); one Uniform cell is n/a]
Feedback Models
Query Expansion (PRF): [Q] → BLOG06 Collection → Related Terms from top K documents → [Q + Terms] [Lavrenko & Croft, 2001]
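The pseudo-relevance feedback loop can be sketched as follows. This is a simplified, relevance-model-style weighting written for illustration, not the exact estimator of Lavrenko & Croft: each term is weighted by its relative frequency in a top-K document, scaled by that document's normalized retrieval score. The function names and the `top_docs` layout are hypothetical.

```python
from collections import Counter, defaultdict


def expansion_terms(top_docs, num_terms=10):
    """top_docs: list of (tokens, retrieval_score) for the top K documents."""
    total = sum(score for _, score in top_docs)
    weights = defaultdict(float)
    for tokens, score in top_docs:
        if not tokens or total == 0:
            continue
        doc_weight = score / total  # how much this document is trusted
        for term, count in Counter(tokens).items():
            weights[term] += doc_weight * count / len(tokens)
    ranked = sorted(weights.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:num_terms]


def expand_query(query_terms, top_docs, num_terms=10):
    """Return [Q + Terms]: the original query followed by new expansion terms."""
    extra = [t for t, _ in expansion_terms(top_docs, num_terms) if t not in query_terms]
    return list(query_terms) + extra
```

Since the terms come from whatever the first-pass retrieval returns, off-topic but frequent vocabulary in the collection flows straight into the expanded query.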
Query Expansion Example: [Photography] PRF: photography, nude, erotic, art, girl, free, teen, fashion, women
Feedback Model Results [Chart: Mean Average Precision for None vs. PRF]
Query Expansion (Wikipedia PRF): [Q] → Wikipedia → Related Terms from top K documents → [Q + Terms] → BLOG06 Collection [Lavrenko & Croft, 2001; Diaz & Metzler, 2006]
Query Expansion Example: [Photography] PRF: photography, nude, erotic, art, girl, free, teen, fashion, women. Wikipedia PRF: photography, director, special, film, art, camera, music, cinematographer, photographic
Feedback Model Results [Chart: Mean Average Precision for None, PRF, and Wiki. PRF]
Query Expansion (Wikipedia Link): [Q] → Wikipedia → Related Terms from link structure → [Q + Terms] → BLOG06 Collection
Wikipedia Link-Based Query Expansion
Wikipedia Link-Based Expansion [Figure: query Q is run against Wikipedia]
Wikipedia Link-Based Expansion [Figure: from the Wikipedia ranking for Q, Relevance Set = top R = 100 and Working Set = top W = 1000]
Wikipedia Link-Based Expansion (Relevance Set: top R = 100; Working Set: top W = 1000): extract anchor text from the Working Set that links to the Relevance Set.
Wikipedia Link-Based Expansion (Relevance Set: top R = 500; Working Set: top W = 1000): extract anchor text from the Working Set that links to the Relevance Set. This combines relevance and popularity. Relevance: an anchor phrase that links to a high-ranked article gets a high score. Popularity: an anchor phrase that links many times to mid-ranked articles also gets a high score.
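One way to realize the relevance-plus-popularity behavior described above is sketched below. The scoring rule is an assumption, since the slide gives no formula: each link from a Working Set article to a Relevance Set article adds `R - rank(target) + 1` to its anchor phrase, so a single link to a top-ranked article and many links to mid-ranked articles can both produce a high score. The function name and the `links` triple layout are hypothetical.

```python
from collections import defaultdict


def score_anchor_phrases(ranked_articles, links, R=100, W=1000):
    """ranked_articles: article ids for query Q, best first.
    links: iterable of (source_article, anchor_text, target_article).
    """
    rank = {article: i + 1 for i, article in enumerate(ranked_articles)}
    relevance_set = set(ranked_articles[:R])
    working_set = set(ranked_articles[:W])
    scores = defaultdict(float)
    for source, anchor, target in links:
        # Keep only links that start in the Working Set and land in the Relevance Set.
        if source in working_set and target in relevance_set:
            scores[anchor.lower()] += R - rank[target] + 1  # assumed rank-based weight
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
```

The top-scoring anchor phrases would then be appended to the query, in the same way as the PRF terms.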
Query Expansion Example: [Photography] PRF: photography, nude, erotic, art, girl, free, teen, fashion, women. Ideal: digital photography, depth of field, photographic film, photojournalism, cinematography
Feedback Model Results [Chart: Mean Average Precision for None, PRF, Wiki. PRF, and Wiki. Link]
Conclusion
Thank You! Student Travel Grant funding from:    ACM SIGIR,    Amit Singhal,    Microsoft Research
Entry Centrality GM Derivation [equations not recovered; the slide defines an Entry Generation Likelihood and involves |E|]
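The slide's equations did not survive extraction. The following is a sketch of how a geometric-mean (GM) derivation of this kind typically goes, under assumptions that are not taken from the slide: the entry generation likelihood is a unigram product over the terms of entry E given feed F, tf(t, E) is the count of term t in E, and |E| is the entry length in tokens.

```latex
\begin{align*}
P(E \mid F) &= \prod_{t \in E} P(t \mid F)^{\mathrm{tf}(t, E)}
  && \text{(entry generation likelihood, assumed form)} \\
\phi_{\mathrm{GM}}(E, F) &= P(E \mid F)^{1/|E|}
  = \prod_{t \in E} P(t \mid F)^{\mathrm{tf}(t, E)/|E|}
  = \prod_{t \in E} P(t \mid F)^{P_{\mathrm{MLE}}(t \mid E)}
\end{align*}
```

Here the products run over the distinct terms of E and |E| = \sum_t tf(t, E). Taking the |E|-th root removes the bias of the raw likelihood against long entries, since every additional token otherwise multiplies in another probability below one.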
Query Expansion Examples: [Music] PRF: Music, Country, Download, Free, MP3, Mp3andmore, Lyric, Listen, Song
Query Expansion Examples: [Scottish Independence] PRF: scotland, independence, party, convention, politics, snp, national, people, scot
Query Expansion Examples: [Machine Learning] PRF: learn, machine, credit, card, karaoke, journal, sex, model, sew
Query Generality Characteristics
Relevance Set Cohesiveness (Relevance Set: top R = 100 Wikipedia articles): Cohesiveness = |L_in| / |L_in ∪ L_out|
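A small sketch of the cohesiveness ratio. The slide does not define its symbols, so this assumes L_in is the set of links from Relevance Set articles to other Relevance Set articles and L_out is the set of links from Relevance Set articles to articles outside it. The function name and the `(source, target)` pair layout are hypothetical.

```python
def relevance_set_cohesiveness(relevance_set, links):
    """links: iterable of (source_article, target_article) pairs."""
    rel = set(relevance_set)
    pairs = list(links)
    # Assumed definitions: both sets contain links that start inside the Relevance Set.
    l_in = {(s, t) for s, t in pairs if s in rel and t in rel}
    l_out = {(s, t) for s, t in pairs if s in rel and t not in rel}
    total = len(l_in | l_out)
    return len(l_in) / total if total else 0.0
```

A value near 1 means the top-ranked articles mostly link to each other; a value near 0 means their links mostly point elsewhere.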
Relevance Set Cohesiveness
Is it the Queries? But none of these measures predicts whether Wikipedia expansion helps…
