4. Mining user behaviour/preferences
Predict document relevance
Re-rank the search results
Compare different ranking functions (train/test)
Optimize ad performance
Query suggestions
How big are these logs?
◦ 10+ terabytes of entries each day
◦ Composed of billions of distinct (query, URL) pairs
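As a rough sketch of what one such log entry might contain (the field names here are hypothetical, not any engine's actual schema):

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of a single click-log record; real engines log
# richer (and differently named) fields.
@dataclass
class ClickLogEntry:
    session_id: str     # groups entries from one search session
    query: str          # the query string the user issued
    urls: List[str]     # ranked result URLs shown on the page
    clicked: List[int]  # result positions (0-based) that were clicked
    timestamp: float    # Unix time of the impression

entry = ClickLogEntry(
    session_id="s42",
    query="click models",
    urls=["a.example", "b.example", "c.example"],
    clicked=[0, 2],
    timestamp=1320969600.0,
)
print(entry.query, entry.clicked)
```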
5. Documents/results ranking factors
Many ranking factors are considered when ranking these results
Results are presented in order of their relevance to the query
Ranking factors depend on the query, the document, and the query-document pair
Improving ranking based on user preferences (likes/dislikes)
Personalized search + social search
Recency (temporal) ranking
7. # of clicks received
[CIKM'09 Tutorial]
8. Trust factor: preference for certain URLs over others,
e.g., wikipedia.com, stackoverflow.com, Yahoo Answers, about.com
What is missing (in previous models)?
Modelling the trust factor
Clicks on sponsored results
Related queries/searches (sidebars)
Realistic and flexible assumptions about user behaviour
11. [Flowchart of the per-result browsing decisions: "Snippet examine?" → "Snippet attractive?" (a click follows if yes) → "Enough utility?"; a Yes on utility ends the session, any No sends the user on to the next result or out of the session]
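A minimal simulation of this examine/attract/satisfy cascade, in the spirit of the flow above; the probability parameters and function names are illustrative assumptions, not values from the slides:

```python
import random

def simulate_session(p_examine, p_attract, p_satisfied, n_results=10, seed=None):
    """Simulate one user session over a ranked result list.

    p_examine:   probability of examining the next snippet
    p_attract:   per-position probability that a snippet attracts a click
    p_satisfied: per-position probability a clicked result gives enough utility
    Returns the list of clicked positions (0-based).
    """
    rng = random.Random(seed)
    clicks = []
    for pos in range(n_results):
        if rng.random() > p_examine:             # Examine snippet? No -> stop
            break
        if rng.random() < p_attract[pos]:        # Snippet attractive? Yes -> click
            clicks.append(pos)
            if rng.random() < p_satisfied[pos]:  # Enough utility? Yes -> end
                break
    return clicks

# Illustrative parameters: attractiveness decays with rank position.
attract = [0.6 / (i + 1) for i in range(10)]
satisfied = [0.5] * 10
print(simulate_session(0.9, attract, satisfied, seed=1))
```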
12. Realistic and flexible assumptions on user behaviour (session modelling)
Consider trust bias (trust factor)
Order the results for a particular query by the relevance scores predicted by the model
Compare this order to the editorial ranking
Is it a good model? Yes, if the two orderings agree to a considerable extent (one way to check this is sketched below)
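One plausible way to quantify this agreement is a rank correlation such as Kendall's tau between the model's ordering and the editorial ordering; the scores below are invented for illustration:

```python
from scipy.stats import kendalltau

# Hypothetical relevance scores predicted by a click model for 5 results,
# and the editorial (human-judged) grades for the same 5 results.
model_scores = [0.91, 0.40, 0.75, 0.20, 0.55]
editorial_grades = [3, 1, 2, 0, 2]

# Kendall's tau is 1.0 for identical orderings, -1.0 for reversed ones.
tau, p_value = kendalltau(model_scores, editorial_grades)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```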
13. Deploy this model as a feature/factor for predicting relevance in a learning-to-rank algorithm
Derive a retrieval/ranking function
Does it give metric gains over the baseline ranking function? If so, the model's insights can be used as a feature in the ranking function
Test the ranking function on different classes of queries for metric gains (a sketch follows below)
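A hedged sketch of this deployment step, appending the click-model score as one extra feature column before training a pointwise learning-to-rank model; the feature layout and the choice of scikit-learn's GradientBoostingRegressor are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data: 6 query-document pairs with 3 baseline
# features each, plus editorial relevance labels (0-3).
X_base = np.random.rand(6, 3)
labels = np.array([3, 1, 2, 0, 2, 1])

# Relevance scores predicted by the click model for the same pairs
# (made-up numbers), appended as an extra feature column.
click_model_score = np.array([0.9, 0.3, 0.7, 0.1, 0.6, 0.4])
X = np.hstack([X_base, click_model_score.reshape(-1, 1)])

# Pointwise learning to rank: regress the editorial label; documents
# are then ranked by the predicted score.
ranker = GradientBoostingRegressor().fit(X, labels)
print(ranker.predict(X))
```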
14. Metrics
• Discounted Cumulative Gain (DCG)
• Normalized DCG (NDCG) (see the sketch after this list)
• Precision
• Recall
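As a sketch, DCG@k is commonly computed as the sum of (2^rel_i - 1) / log2(i + 1) over the top k results, and NDCG divides by the DCG of the ideal ordering; the grades below are invented:

```python
import math

def dcg(relevances, k):
    """DCG@k with the (2^rel - 1) / log2(i + 1) gain/discount form (i is 1-based)."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg(relevances, k):
    """NDCG@k: DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Invented editorial grades (0-3) for a ranked list of 5 results.
print(ndcg([3, 2, 3, 0, 1], k=5))
```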
Two types of data
1. Search click logs (from real or meta search engines)
2. The LETOR (LEarning TO Rank) benchmark dataset for information retrieval
15. [Figures: evaluation results from Guo et al., 2009 and Chapelle and Zhang, 2009]
David Green Blog. http://davidgreen.com/comparative-value-of-google-search-rankings (accessed 20 April 2011)
Fan Guo and Chao Liu. Statistical Models for Web Search Click Log Analysis. Tutorial at CIKM'09, 2009
Fan Guo, Chao Liu, and Yi-Min Wang. Efficient multiple-click models in web search. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM), Barcelona, Spain, pages 124-131. ACM, 9-11 February 2009
Olivier Chapelle and Ya Zhang. A dynamic Bayesian network click model for web search ranking. In Proceedings of the 18th International Conference on World Wide Web (WWW), Madrid, Spain, pages 1-10. ACM, 20-24 April 2009
Georges Dupret and Benjamin Piwowarski. A user browsing model to predict search engine click data from past observations. In Proceedings of the 31st International ACM SIGIR Conference (SIGIR), Singapore. ACM, 2008
Georges Dupret and Ciya Liao. A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM), New York, NY, USA. ACM, 2010
Click models discussed:
User Browsing Model (UBM) [Dupret and Piwowarski, 2008]
Dynamic Bayesian Network model (DBN) [Chapelle and Zhang, 2009]
Session Utility Model (SUM) [Dupret and Liao, 2010]
Independent Click Model (ICM) [Guo et al., 2009]
Dependent Click Model (DCM) [Guo et al., 2009]