1. Summary of Papers of the SIGIR 2011 Workshop on Query Representation and Understanding
   Chetana Gavankar
2. Ricardo Campos, Alipio Jorge, Gael Dias: "Using Web Snippets and Query-logs to Measure Implicit Temporal Intents in Queries"
3. Temporal queries
   1. Atemporal: queries not sensitive to time, e.g. plan my trip.
   2. Temporally unambiguous: queries tied to a concrete time period, e.g. Haiti earthquake in 2010.
   3. Temporally ambiguous: queries with multiple instances over time, e.g. Cricket World Cup, which occurs every four years.
4. Web snippets and query logs
   Content-related resources: based on a web content approach; this simply requires the set of web search results.
   Query-log resources: based on similar year-qualified queries; this implies that some versions of the query have already been issued.
5. Identifying implicit temporal queries
   1. Web snippets (temporal evidence within web pages):
      $TA(q) = \sum_{f \in I} w_f \, f(q)$, where $I = \{TSnippet(\cdot), TTitle(\cdot), TUrl(\cdot)\}$
      Each feature is valued differently through $w_f$: 18.14 for TTitle(.), 50.91 for TSnippet(.) and 30.95 for TUrl(.).
      $TSnippet(q) = \dfrac{\#\ \text{snippets retrieved with dates}}{\#\ \text{snippets retrieved}}$
      If TA(q) < 10%, the query is considered atemporal.
      Dates appearing in the query and in the documents may not match.
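A minimal sketch of how the weighted score on this slide could be computed. It assumes each feature f(q) is a fraction in [0, 1] and that the three weights are percentages (they sum to 100), so TA(q) comes out in percent. The function names and the sample counts are hypothetical and do not come from the paper.

```python
# Weights as listed on the slide, read here as percentages (they sum to 100).
WEIGHTS = {"TTitle": 18.14, "TSnippet": 50.91, "TUrl": 30.95}

def date_fraction(with_dates, retrieved):
    """Share of retrieved items (snippets, titles or URLs) that contain a date."""
    return with_dates / retrieved if retrieved else 0.0

def temporal_score(features, weights=WEIGHTS):
    """TA(q) = sum over f in I of w_f * f(q); in percent when each f(q) is in [0, 1]."""
    return sum(weights[name] * features[name] for name in weights)

def is_atemporal(ta_percent, threshold=10.0):
    """Apply the slide's rule: a TA(q) value below 10% means atemporal."""
    return ta_percent < threshold

# Hypothetical counts for one query:
# 20 of 100 snippets, 5 of 100 titles and 0 of 100 URLs carry a date.
features = {
    "TSnippet": date_fraction(20, 100),
    "TTitle": date_fraction(5, 100),
    "TUrl": date_fraction(0, 100),
}
ta = temporal_score(features)
print(round(ta, 3), is_atemporal(ta))
```

With these made-up counts the snippet feature dominates the score, which matches the relative size of its weight.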
6. Identifying implicit temporal queries
   2. Web query logs: temporal activity can be recorded from the date and time of the request and from user activity.
      Number of times query q is pre- or post-qualified by year y: $WA(q, y) = \#(y, q) + \#(q, y)$
      $\alpha(q) = \dfrac{\sum_y WA(q, y)}{\sum_x \#(x, q) + \sum_x \#(q, x)}$
      If the query is qualified with a single year, then $\alpha(q) = 1$.
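The counts behind WA(q, y) and α(q) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the log is taken to be a plain list of query strings, a qualifier x is a single token placed directly before or after q, and a year is any four-digit token starting with 19 or 20. The function name and sample log are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "year" is a four-digit token such as 1998 or 2010.
YEAR = re.compile(r"^(19|20)\d{2}$")

def year_qualification(log_queries, q):
    """Return (WA, alpha) for query q.

    WA maps each year y to #(y, q) + #(q, y); alpha is the share of all
    single-token qualifications of q that are year qualifications.
    """
    q_tokens = q.lower().split()
    counts = Counter()
    for logged in log_queries:
        tokens = logged.lower().split()
        if len(tokens) != len(q_tokens) + 1:
            continue  # not q plus exactly one qualifier token
        if tokens[1:] == q_tokens:        # x precedes q: #(x, q)
            counts[tokens[0]] += 1
        elif tokens[:-1] == q_tokens:     # x follows q: #(q, x)
            counts[tokens[-1]] += 1
    wa = {x: c for x, c in counts.items() if YEAR.match(x)}
    total = sum(counts.values())
    alpha = sum(wa.values()) / total if total else 0.0
    return wa, alpha

# Hypothetical log entries.
log = [
    "haiti earthquake 2010",
    "2010 haiti earthquake",
    "haiti earthquake relief",
    "haiti earthquake 2010",
    "haiti earthquake",
]
wa, alpha = year_qualification(log, "haiti earthquake")
print(wa, alpha)
```

In this toy log three of the four qualified versions of the query carry a year, so α(q) is 0.75; the unqualified entry contributes to neither sum.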
7. Results
   Temporal information is more frequent in web snippets than in either of the query logs (Google and Yahoo!).
   Most queries have a TSnippet(.) value around 20%, while TLogYahoo(.) and TLogGoogle(.) are mostly near 0%.
9. A query that contains a date does not necessarily have temporal intent (observed in the web query logs of Google and Yahoo!). Ex: October Sky movie
18. Ex: how to, wikipedia
   2. Peripheral lexicon (P-Lex or HEADs): rare ones, with degree much lower than those in the kernel. Ex: Decision Tree algorithm
19. Degree distribution
   |N| = number of nodes, |E| = number of edges
   C = average clustering coefficient, d = mean shortest path length between nodes
   $C_{rand}$ and $d_{rand}$ are the corresponding values in a random graph: $C_{rand} \sim k'/|N|$, $d_{rand} \sim \ln|N| / \ln k'$
   k' = average degree of the graph
   Degree distribution: $p(k) = \dfrac{\text{number of nodes with degree } k}{\text{total number of nodes}}$
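The quantities on this slide can be computed with a few lines of standard-library Python. The sketch below is an illustration only: the toy edge list is hypothetical, the graph is treated as undirected and connected, and the random-graph baselines use the approximations given on the slide (which require an average degree greater than 1).

```python
import math
from collections import Counter, deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

def degree_distribution(adj):
    """p(k) = number of nodes with degree k / total number of nodes."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)

def mean_shortest_path(adj):
    """Average BFS distance over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def random_baselines(adj):
    """C_rand ~ k'/|N| and d_rand ~ ln|N| / ln k' (needs k' > 1)."""
    n = len(adj)
    k_avg = sum(len(nbrs) for nbrs in adj.values()) / n
    return k_avg / n, math.log(n) / math.log(k_avg)

# Hypothetical toy graph with four nodes.
edges = [("how", "to"), ("to", "cook"), ("how", "cook"), ("cook", "rice")]
graph = build_graph(edges)
print(degree_distribution(graph))
print(average_clustering(graph), mean_shortest_path(graph))
print(random_baselines(graph))
```

Comparing C with C_rand and d with d_rand, as the slide does, shows how far the observed graph departs from a random graph with the same number of nodes and average degree.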
35. Latent topic analysis in query log
   Query log record: (user_id, query, clicked_url, time)
   Pseudo-document generation: queries related to the same host are aggregated. General sites like “en.wikipedia.org” are not suitable for latent topic analysis and are eliminated.
   Latent Dirichlet Allocation (LDA) is used to conduct the latent semantic topic analysis on the collection of host-based pseudo-documents.
   Z = set of latent topics $z_i$. Each $z_i$ is associated with a multinomial distribution over terms; $P(t_k \mid z_i)$ = probability of term $t_k$ given topic $z_i$.
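The pseudo-document generation step can be sketched as below. This is a hedged illustration: the record layout follows the slide, but the stop-list of general hosts, the whitespace tokenization, the function name and the sample records are all assumptions. The LDA step itself is not shown.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical stop-list of general sites to exclude.
GENERAL_HOSTS = {"en.wikipedia.org"}

def host_pseudo_documents(records, general_hosts=GENERAL_HOSTS):
    """Aggregate the query terms of all records that clicked on the same host.

    Each record is (user_id, query, clicked_url, time). URLs must include a
    scheme (e.g. http://), otherwise urlparse returns an empty host and the
    record is skipped.
    """
    docs = defaultdict(list)
    for _user_id, query, clicked_url, _time in records:
        host = urlparse(clicked_url).netloc.lower()
        if not host or host in general_hosts:
            continue
        docs[host].extend(query.lower().split())
    return dict(docs)

# Hypothetical log records.
records = [
    ("u1", "october sky movie", "http://movies.example.com/october-sky", "2011-07-01"),
    ("u2", "movie reviews", "http://movies.example.com/reviews", "2011-07-02"),
    ("u1", "decision tree", "http://en.wikipedia.org/wiki/Decision_tree", "2011-07-02"),
]
pseudo_docs = host_pseudo_documents(records)
print(pseudo_docs)
```

Each resulting term list would then serve as one document in the collection passed to the topic model.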
36. Personalization
   $\pi_u = \{\pi_u^1, \pi_u^2, \ldots, \pi_u^{|Z|}\}$ = profile of the user u, where $\pi_u^i = P(z_i \mid u)$ = probability that the user u prefers the topic $z_i$.
   Generate a user-based pseudo-document U for user u; $\{P(z_1 \mid U), P(z_2 \mid U), \ldots, P(z_{|Z|} \mid U)\}$ = profile of u.
   Candidate query q: $t_1, \ldots, t_n$. Topic of term $t_r$ = $z_r$.
37. Topic-based scoring with personalization
   Candidate query score: the model parameter $P(z_j \mid z_i)$ captures the relationship between two topics.
   With a personal profile: $P(z_1 \mid u)$ = probability that user u prefers the topic $z_1$.
38. Conclusion
   The framework that considers personalization achieves the best performance.
   With user profiles, the topic-based scoring part is more reliable.