Summary of Papers of
SIGIR 2011 Workshop on Query
Representation and Understanding
Chetana Gavankar
Ricardo Campos, Alipio Jorge, Gael Dias:
"Using Web Snippets and Query-logs to
Measure Implicit Temporal Intents in
Queries"
Types of temporal queries
1. Atemporal: queries that are not sensitive to time, e.g. "plan my trip".
2. Temporally unambiguous: queries that refer to one concrete time period, e.g. "Haiti earthquake" (2010).
3. Temporally ambiguous: queries with multiple instances over time, e.g. "Cricket World Cup", which occurs every four years.
Web snippets and query logs
● Content-related resources: a web-content approach that simply requires the set of web search results.
● Query-log resources: based on similar year-qualified queries, which implies that some versions of the query have already been issued.
1. Web snippets (temporal evidence within web pages):
$TA(q) = \sum_{f \in I} w_f \, f(q)$, with $I = \{TSnippets(.), TTitles(.), TUrl(.)\}$
Each feature is valued differently through its weight $w_f$: 18.14 for TTitles(.), 50.91 for TSnippets(.) and 30.95 for TUrl(.).
$TSnippets(q) = \frac{\#\,\text{snippets retrieved with dates}}{\#\,\text{snippets retrieved}}$
If TA(q) < 10%, the query is atemporal.
Dates appearing in the query and in the documents may not match.
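The weighted combination above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three weights are read as percentages (they sum to 100) and are rescaled to sum to 1, each feature is taken to be the fraction of retrieved items that contain a date, and the feature keys and example counts are hypothetical.

```python
# Weights quoted on the slide; assumed here to be percentages since they sum to 100.
WEIGHTS = {"title": 18.14, "snippet": 50.91, "url": 30.95}
ATEMPORAL_THRESHOLD = 0.10  # TA(q) below 10% means the query is treated as atemporal


def fraction_with_dates(n_with_dates, n_retrieved):
    """Share of retrieved items (titles, snippets or URLs) that contain a date."""
    return n_with_dates / n_retrieved if n_retrieved else 0.0


def temporal_ambiguity(features, weights=WEIGHTS):
    """TA(q) = sum over f in I of w_f * f(q), with the weights rescaled to sum to 1."""
    total = sum(weights.values())
    return sum(weights[name] / total * value for name, value in features.items())


def is_atemporal(features):
    """Apply the 10% threshold to TA(q)."""
    return temporal_ambiguity(features) < ATEMPORAL_THRESHOLD


# Hypothetical counts for one query: 100 results, dates in 5 titles, 20 snippets, 2 URLs.
features = {
    "title": fraction_with_dates(5, 100),
    "snippet": fraction_with_dates(20, 100),
    "url": fraction_with_dates(2, 100),
}
ta = temporal_ambiguity(features)
print(f"TA(q) = {ta:.4f}, atemporal: {is_atemporal(features)}")
```

With these made-up counts the snippet feature dominates the score, which reflects the large weight given to TSnippets(.).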
Identifying implicit temporal queries
2. Web query logs: temporal activity can be recorded from the date and time of the request and from user activity.
The number of times query q is pre- or post-qualified by year y is
$WA(q, y) = \#(y, q) + \#(q, y)$
$\alpha(q) = \frac{\sum_y WA(q, y)}{\sum_x \#(x, q) + \sum_x \#(q, x)}$
If the query is qualified with a single year, then $\alpha(q) = 1$.
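A rough sketch of how the two quantities could be counted from a list of logged query strings follows. It assumes that a qualifier x is a single token placed directly before or after q, and that a "year" is any four-digit token between 1900 and 2099; neither assumption comes from the slides, and the example log is invented.

```python
import re
from collections import Counter

# Assumption: a "year" is a four-digit token in the range 1900-2099.
YEAR = re.compile(r"^(19|20)\d{2}$")


def alpha(query, log_queries):
    """Return (alpha(q), WA) where WA[y] = #(y, q) + #(q, y).

    Only single-token qualifiers directly before or after q are counted.
    """
    q = query.split()
    year_counts = Counter()   # WA(q, y) for each year y
    qualified = 0             # sum_x #(x, q) + sum_x #(q, x)
    for entry in log_queries:
        tokens = entry.split()
        if len(tokens) != len(q) + 1:
            continue
        if tokens[1:] == q:        # x precedes q
            x = tokens[0]
        elif tokens[:-1] == q:     # x follows q
            x = tokens[-1]
        else:
            continue
        qualified += 1
        if YEAR.match(x):
            year_counts[x] += 1
    if qualified == 0:
        return 0.0, year_counts
    return sum(year_counts.values()) / qualified, year_counts


# Invented log entries for illustration only.
log = [
    "2010 world cup",
    "world cup 2014",
    "world cup tickets",
    "world cup 2010",
    "fifa world cup",
]
a, wa = alpha("world cup", log)
print(a, dict(wa))
```

Here three of the five qualified versions of the query carry a year, so the ratio is 3/5.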
Results
Temporal information is more frequent in web snippets than in either of the query logs (Google and Yahoo!).
Most queries have a TSnippets(.) value of around 20%, whereas TLogYahoo(.) and TLogGoogle(.) are mostly near 0%.
Conclusion
➔ Future dates are more common in snippets than in query logs.
➔ A query that contains dates does not necessarily have temporal intent (observed in the Google and Yahoo! web query logs), e.g. the movie "October Sky".
➔ Web snippets are statistically more relevant than query logs in terms of temporal intent.
Rishiraj Saha Roy, Niloy Ganguly, Monojit
Choudhury, Naveen Singh:
"Complex Network Analysis Reveals
Kernel-Periphery Structure in Web
Search Queries"
Search Queries
Search query language: a bag of segments
Word co-occurrence network: an edge exists between units i and j if $P_{ij} > P_i P_j$
Eight complex network models for query logs:
● Query-Unrestricted WordNet (local) and (global)
● Query-Restricted WordNet (local) and (global)
● Query-Unrestricted SegmentNet (local) and (global)
● Query-Restricted SegmentNet (local) and (global)
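The edge criterion can be illustrated with a small Python sketch. It assumes that P_i is estimated as the fraction of queries containing unit i and P_ij as the fraction containing both units; the slides do not say how the probabilities are estimated or how the local, global, restricted and unrestricted variants differ, so this shows the bare criterion only, on made-up queries.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(queries):
    """Return the set of unit pairs (i, j) with P_ij > P_i * P_j.

    `queries` is a list of queries, each given as a list of units
    (words or segments). Probabilities are relative frequencies over queries.
    """
    n = len(queries)
    unit_count = Counter()
    pair_count = Counter()
    for units in queries:
        distinct = sorted(set(units))          # count each unit once per query
        unit_count.update(distinct)
        pair_count.update(combinations(distinct, 2))
    edges = set()
    for (a, b), c in pair_count.items():
        # c/n > (count_a/n) * (count_b/n), cross-multiplied to stay in integers
        if c * n > unit_count[a] * unit_count[b]:
            edges.add((a, b))
    return edges


# Made-up segmented queries for illustration.
queries = [
    ["how to", "bread"],
    ["how to", "bread"],
    ["how to", "wiki"],
    ["wiki", "python"],
]
edges = cooccurrence_edges(queries)
print(sorted(edges))
```

In this toy log "how to" and "wiki" co-occur once, but less often than their individual frequencies would predict, so no edge is drawn between them.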
Kernel and Peripheral lexicons
Two regimes in the degree distribution (DD) of the word co-occurrence network:
1. Kernel lexicon (K-Lex, or modifiers):
• Units that are popular in queries (high degree)
• Generic and domain-independent
2. Peripheral lexicon (P-Lex, or heads): rare units with degree much lower than those in the kernel
K-Lex (popular segments) P-Lex (rarer segments)
how to matthew brodrick
wiki accessories
free police officer
and who is
in australia epson tx800
videos star trek next gen
Degree Distribution
|N| = number of nodes, |E| = number of edges
C = average clustering coefficient
d = mean shortest path length between nodes
C_rand and d_rand are the corresponding values in a random graph:
$C_{rand} \sim k' / |N|$, $d_{rand} \sim \ln(|N|) / \ln(k')$
k' = average degree of the graph
Degree distribution: $p(k)$ = (number of nodes with degree k) / (total number of nodes)
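As a minimal sketch, the quantities on this slide can be computed from an edge list as follows. It assumes an undirected graph with unique edges and no self-loops, counts only nodes that have at least one edge, and uses an invented five-node graph; the function name and return layout are illustrative.

```python
import math
from collections import Counter


def degree_stats(edges):
    """Degree distribution p(k) plus the random-graph estimates C_rand and d_rand."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)                      # |N|, nodes with at least one edge
    k_avg = 2 * len(edges) / n           # k' = average degree
    hist = Counter(degree.values())
    p_k = {k: count / n for k, count in sorted(hist.items())}
    c_rand = k_avg / n                   # C_rand ~ k' / |N|
    # d_rand ~ ln|N| / ln k' is only meaningful when k' > 1
    d_rand = math.log(n) / math.log(k_avg) if k_avg > 1 else float("inf")
    return {"n": n, "k_avg": k_avg, "p_k": p_k, "c_rand": c_rand, "d_rand": d_rand}


# Invented toy graph: node 1 is a small hub.
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5)]
stats = degree_stats(toy_edges)
print(stats)
```

Comparing a measured C and d against `c_rand` and `d_rand` is the usual way to check for small-world behaviour: a small-world network has C much larger than C_rand while d stays close to d_rand.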
Two regime power law
Conclusion
● Like natural language (NL), queries reflect a kernel-periphery distinction.
● Unlike NL, query networks lack the small-world property that allows words to be retrieved quickly from the mind.
● It is more difficult to understand the context of a segment in a query.
● The peripheral network consists of a large number of small disconnected components.
● The capability of peripheral units to exist by themselves makes part-of-speech (POS) identification hard in queries.
● Socio-cultural factors govern the kernel-periphery distinction in queries.
Lidong Bing, Wai Lam:
"Investigation of Web Query Refinement
via Topic Analysis and Learning with
Personalization"
Web Query Refinement
● Query refinement operations:
  ● Substitution
  ● Expansion
  ● Deletion
  ● Stemming
  ● Spelling correction
  ● Abbreviation expansion
  ......................
● Generate some candidate queries first, and then score the quality of these candidates.
Latent Topic Analysis in Query Log
Query log record: (user_id, query, clicked_url, time)
Pseudo-document generation: queries related to the same host are aggregated. General sites such as "en.wikipedia.org" are not suitable for latent topic analysis and are eliminated.
The Latent Dirichlet Allocation (LDA) algorithm is used to conduct latent semantic topic analysis on the collection of host-based pseudo-documents.
Z = set of latent topics $z_i$
Each $z_i$ is associated with a multinomial distribution over terms.
$P(t_k | z_i)$ = probability of term $t_k$ given topic $z_i$
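The pseudo-document generation step might look roughly like the sketch below. It assumes that "related to the same host" means the host of the clicked URL, that queries are split on whitespace, and that the exclusion list contains only the one example host named on the slide; the log records and the travel host are invented.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Only the slide's example is listed; the full set of excluded general sites is not given.
GENERAL_HOSTS = {"en.wikipedia.org"}


def host_pseudo_documents(log_records, excluded_hosts=GENERAL_HOSTS):
    """Aggregate query terms by the host of the clicked URL.

    Each record is (user_id, query, clicked_url, time). URLs without a
    scheme yield an empty host and are skipped.
    """
    docs = defaultdict(list)
    for user_id, query, clicked_url, time in log_records:
        host = urlparse(clicked_url).netloc.lower()
        if not host or host in excluded_hosts:
            continue
        docs[host].extend(query.split())
    return dict(docs)


# Invented log records for illustration.
records = [
    (1, "cheap flights", "http://www.example-travel.com/deals", "2011-07-28 10:00"),
    (2, "flights paris", "http://www.example-travel.com/paris", "2011-07-28 10:05"),
    (1, "paris history", "http://en.wikipedia.org/wiki/Paris", "2011-07-28 10:07"),
]
pseudo_docs = host_pseudo_documents(records)
print(pseudo_docs)
```

Each resulting term list can then be passed as one document to whichever LDA implementation is used for the topic analysis.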
Personalization
$\pi_u = \{\pi_u^1, \pi_u^2, \ldots, \pi_u^{|Z|}\}$ = profile of user u
$\pi_u^i = P(z_i | u)$ = probability that user u prefers topic $z_i$
Generate a user-based pseudo-document U for user u.
$\{P(z_1 | U), P(z_2 | U), \ldots, P(z_{|Z|} | U)\}$ = profile of u
Candidate query q: $t_1, \ldots, t_n$
Topic of term $t_r$ = $z_r$
Topic-based scoring with personalization
Candidate query score:
The model parameter $P(z_j | z_i)$ captures the relationship between two topics.
With a personal profile: $P(z_1 | u)$ = probability that user u prefers topic $z_1$
Conclusion
● The framework that considers personalization achieves the best performance.
● With user profiles, the topic-based scoring part is more reliable.
