In Dynamic Information Retrieval Modeling we model dynamic systems that change or adapt over time or over a sequence of events, using a range of techniques from artificial intelligence and reinforcement learning. Many of the open problems in current IR research can be described as dynamic systems, for instance session search or computational advertising. State-of-the-art research provides solutions to these problems that respond to a changing environment, learn from past interactions and predict future utility. Advances in IR interfaces, personalization and ad display demand models that can react to users in real time and in an intelligent, contextual way.
The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling. We motivate a conceptual model linking static, interactive and dynamic retrieval, and use it to define dynamics within the context of IR. We then cover a number of algorithms and techniques from the artificial intelligence (AI) and online learning literature, such as Markov Decision Processes (MDPs), their partially observable variant (POMDPs) and multi-armed bandits. Following this, we describe how to identify dynamics in an IR problem and demonstrate how to model them using the described techniques. The remainder of the tutorial covers an array of state-of-the-art research on dynamic systems in IR and how they can be modeled using dynamic IR, using research on session search, multi-page search and online advertising as in-depth examples.
An older version of this tutorial, presented at SIGIR 2014, can be found at http://www.slideshare.net/marcCsloan/dynamic-information-retrieval-tutorial
The SIGIR 2014 version had a greater emphasis on the underlying theory and included a guest lecture on evaluation by Dr Emine Yilmaz. This newer version presents a wider range of applications of DIR in state-of-the-art research and includes a guest lecture on evaluation by Prof Charles Clarke.
http://www.dynamic-ir-modeling.org/
@inproceedings{Yang:2015:DIR:2684822.2697038,
author = {Yang, Hui and Sloan, Marc and Wang, Jun},
title = {Dynamic Information Retrieval Modeling},
booktitle = {Proceedings of the Eighth ACM International Conference on Web Search and Data Mining},
series = {WSDM '15},
year = {2015},
isbn = {978-1-4503-3317-7},
location = {Shanghai, China},
pages = {409--410},
numpages = {2},
url = {http://doi.acm.org/10.1145/2684822.2697038},
doi = {10.1145/2684822.2697038},
acmid = {2697038},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {dynamic information retrieval modeling, probabilistic relevance model, reinforcement learning},
}
Dynamic Information Retrieval Tutorial - WSDM 2015
1. WSDM Tutorial, February 2nd 2015
Dynamic Information Retrieval Modeling
Grace Hui Yang, Marc Sloan, Jun Wang
Guest Speaker: Charlie Clarke
2. Dynamic Information Retrieval
[Diagram: a user with an information need, the documents observed so far, and the documents left to explore]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.
3. Evolving IR
Paradigm shifts in IR as new models emerge, e.g. VSM → BM25 → Language Model: different ways of defining the relationship between query and document.
Static → Interactive → Dynamic: an evolution in modeling user interaction with the search engine.
4. Outline
Introduction & Theory: Static IR, Interactive IR, Dynamic IR
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
5. Conceptual Model – Static IR
[Diagram: Static IR → Interactive IR → Dynamic IR; Static IR uses no feedback]
6. Characteristics of Static IR
Does not learn directly from the user; parameters are updated periodically.
7. Commonly Used Static IR Models
BM25, PageRank, Language Model, Learning to Rank
9. Outline
Introduction & Theory: Static IR, Interactive IR, Dynamic IR
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
10. Conceptual Model – Interactive IR
[Diagram: Static IR → Interactive IR → Dynamic IR; Interactive IR exploits feedback]
11. Interactive Recommender Systems
Learn the user's taste interactively! At the same time, provide good recommendations!
12. Toy Example
Multi-page search scenario: a user image-searches for "jaguar". Rank two of the four results on each of two pages.
[Four images with relevance estimates r = 0.9, r = 0.51, r = 0.5, r = 0.49]
13. Toy Example – Static Ranking
Ranked according to the PRP:
Page 1: 1. r = 0.9, 2. r = 0.51; Page 2: 1. r = 0.5, 2. r = 0.49
14. Toy Example – Relevance Feedback
Interactive search: improve the 2nd page based on feedback from the 1st page, using clicks as relevance feedback and the Rocchio¹ algorithm on terms in each image's webpage:

$w_q' = \alpha w_q + \frac{\beta}{|D_r|}\sum_{d \in D_r} w_d - \frac{\gamma}{|D_n|}\sum_{d \in D_n} w_d$

The new query is closer to relevant documents and different from non-relevant documents.
¹Rocchio, J. J. '71; Baeza-Yates & Ribeiro-Neto '99
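To make the update concrete, here is a minimal sketch of the Rocchio update in Python over bag-of-words term-weight dictionaries; the α, β, γ values and the toy vectors are illustrative assumptions, not values from the tutorial.

```python
from collections import defaultdict

def rocchio(query_vec, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio update: move the query vector toward the centroid of
    clicked (relevant) documents and away from the centroid of unclicked ones."""
    new_q = defaultdict(float)
    for term, w in query_vec.items():
        new_q[term] += alpha * w
    for doc in relevant_docs:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant_docs)
    for doc in nonrelevant_docs:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant_docs)
    # Negative weights are usually clipped to zero.
    return {t: w for t, w in new_q.items() if w > 0}

# Toy usage: a click on a "jaguar car" page pulls the query toward car terms.
q = {"jaguar": 1.0}
clicked = [{"jaguar": 0.8, "car": 0.6}]
skipped = [{"jaguar": 0.7, "animal": 0.9}]
print(rocchio(q, clicked, skipped))
```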
15. Toy Example – Relevance Feedback
Ranked according to the PRP and Rocchio:
Page 1: 1. r = 0.9, 2. r = 0.51 (one result clicked); Page 2: 1. r = 0.5, 2. r = 0.49, updated using the click.
16. Toy Example – Relevance Feedback
No click when searching for animals: Page 1: 1. r = 0.9, 2. r = 0.51; the page 2 ranking is then uncertain.
17. Toy Example – Value Function
Optimize both pages using dynamic IR: the Bellman equation for the value function. Simplified example:

$V_t(\theta_t, \Sigma_t) = \max_{s_t}\Big[\theta_t\, s_t + E\big(V_{t+1}(\theta_{t+1}, \Sigma_{t+1}) \mid C_t\big)\Big]$

where $\theta_t, \Sigma_t$ are the relevance and covariance of the documents for page $t$, $C_t$ are the clicks on page $t$, and $V_t$ is the 'value' of the ranking on page $t$. Maximize value over all pages based on estimated feedback.
X. Jin, M. Sloan and J. Wang '13
18. Toy Example – Covariance
The covariance matrix represents the similarity between the images:

$\Sigma = \begin{pmatrix} 1 & 0.8 & 0.1 & 0 \\ 0.8 & 1 & 0.1 & 0 \\ 0.1 & 0.1 & 1 & 0.95 \\ 0 & 0 & 0.95 & 1 \end{pmatrix}$

X. Jin, M. Sloan and J. Wang '13
19. Toy Example – Myopic Value
For the myopic ranking, $V_2 = 16.380$.
X. Jin, M. Sloan and J. Wang '13
20. Toy Example – Myopic Ranking
The page 2 ranking stays the same regardless of clicks on page 1.
X. Jin, M. Sloan and J. Wang '13
21. Toy Example – Optimal Value
For the optimal ranking, $V_2 = 16.528$.
X. Jin, M. Sloan and J. Wang '13
22. Toy Example – Optimal Ranking
If the car is clicked, the Jaguar logo is more relevant on the next page.
X. Jin, M. Sloan and J. Wang '13
23. Toy Example – Optimal Ranking
In all other scenarios, rank the animal first on the next page.
X. Jin, M. Sloan and J. Wang '13
24. Static IR Visualization
[Scatter plot: documents in a vector space; x = docs about apple (fruit), o = docs about apple iPhone, plus a doc about the Apple CEO]
Documents exist in vector space.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
25. Static IR Visualization
[Same scatter plot, with the query Q placed in the space]
t = 1: Static IR considers relevancy.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
27. Interactive IR Update
[Scatter plot: feedback on results (+1 on a clicked doc, −1 on skipped docs) moves the query from Q to Q′]
t = 1: Static IR considers relevancy. t = 2: Interactive IR considers local gain.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
29. Dynamic Ranking Principle
[Scatter plot with the query Q]
t = 1: Relevancy + Variance.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
30. Dynamic Ranking Principle
[Scatter plot with the query Q and feedback labels +1, −1, −1]
t = 1: Relevancy + Variance + |Correlations|.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
31. Dynamic Ranking Principle
t = 1: Relevancy + Variance + |Correlations| gives a diversified, exploratory relevance ranking.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
32. Dynamic Ranking Principle
[Scatter plot: after feedback, the query moves from Q to Q′]
t = 1: Relevancy + Variance + |Correlations| gives a diversified, exploratory relevance ranking. t = 2: Personalized re-ranking.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
33. Interactive vs Dynamic IR
Interactive: treats interactions independently; responds to immediate feedback; static IR is used before feedback is received.
Dynamic: optimizes over the whole interaction; long-term gains; models future user feedback; also used at the beginning of the interaction.
34. Interactive & Dynamic Techniques
Interactive: the Rocchio equation in relevance feedback; collaborative filtering in recommender systems; active learning in interactive retrieval.
Dynamic: POMDPs in multi-page search and ad recommendation; multi-armed bandits in online evaluation; MDPs in session search.
35. Outline
Introduction & Theory: Static IR, Interactive IR, Dynamic IR
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
36. Conceptual Model – Dynamic IR
[Diagram: Static IR → Interactive IR → Dynamic IR; Dynamic IR explores and exploits feedback]
37. Characteristics of Dynamic IR
Rich interactions: query formulation, document clicks, document examination, eye movement, mouse movements, etc.
[Luo et al., IRJ under revision 2014]
38. Characteristics of Dynamic IR
Temporal dependency:
[Diagram: an information need I drives n search iterations; at iteration i the user issues query q_i, the engine returns ranked documents D_i, and the user clicks documents C_i]
[Luo et al., IRJ under revision 2014]
39. Characteristics of Dynamic IR
Overall goal: optimize over all iterations for a goal (an IR metric or user satisfaction) via an optimal policy.
[Luo et al., IRJ under revision 2014]
40. Dynamic Information Retrieval
The next generation search engine must handle:
Dynamic Users: users change behavior over time; user history.
Dynamic Documents: topic trends, filtering, document content change.
Dynamic Relevance: user-perceived relevance changes.
Dynamic Queries: changing query definitions, e.g. 'Twitter'.
Dynamic Information Needs: information needs evolve over time.
41. Why Not Existing Supervised Learning for Dynamic IR Modeling?
Lack of enough training data: dynamic IR problems contain a sequence of dynamic interactions, e.g. a series of queries in a session. Repeated sequences are rare (close to zero), even in large query logs (WSCD 2013 & 2014, query logs from Yandex). The chance of finding repeated adjacent query pairs is also low:

Dataset | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390 | 17,784,583 | 2.68%
WSCD 2014 | 1,959,440 | 35,376,008 | 5.54%
42. Our Solution
Find an optimal solution through a sequence of dynamic interactions. Trial and error: learn from repeated, varied attempts that continue until success. No (or less) supervised learning.
44. What is a Desirable Model for Dynamic IR?
Model interactions, which means it needs placeholders for actions;
Model the information need hidden behind user queries and other interactions;
Set up a reward mechanism to guide the search algorithm in adjusting its retrieval strategies;
Represent Markov properties to handle the temporal dependency.
A model in a trial-and-error setting will do. A Markov model will do!
45. Markov Decision Process
An MDP extends a Markov chain with actions and rewards¹: a tuple (S, M, A, R, γ).
[Diagram: states s₀, s₁, s₂, s₃, …; at each state sᵢ the agent takes action aᵢ, receives reward rᵢ, and transitions with probability pᵢ]
¹R. Bellman '57
46. Definition of MDP
A tuple (S, M, A, R, γ):
S: state space
M: transition matrix, Mₐ(s, s′) = P(s′ | s, a)
A: action space
R: reward function, R(s, a) = immediate reward for taking action a at state s
γ: discount factor, 0 < γ ≤ 1
A policy π maps states to actions: π(s) is the action taken at state s. The goal is to find an optimal policy π* maximizing the expected total reward.
47. Optimality — Bellman Equation
The Bellman equation¹ for an MDP is a recursive definition of the optimal state-value function V*(·):

$V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} M_a(s, s')\, V^*(s') \Big]$

Optimal policy:

$\pi^*(s) = \arg\max_a \Big[ R(s,a) + \gamma \sum_{s'} M_a(s, s')\, V^*(s') \Big]$

¹R. Bellman '57
48. MDP Algorithms
Model-based approaches, which solve the Bellman equation for the optimal value V*(s) and optimal policy π*(s): Value Iteration, Policy Iteration, Modified Policy Iteration, Prioritized Sweeping.
Model-free approaches: Temporal Difference (TD) Learning, Q-Learning.
[Bellman '57; Howard '60; Puterman and Shin '78; Singh & Sutton '96; Sutton & Barto '98; Sutton '88; Watkins '92]
[Slide adapted from Carlos Guestrin's ML lecture]
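As a concrete illustration of the model-based approach, here is a minimal value iteration sketch in Python for the (S, M, A, R, γ) tuple defined above; the tiny two-state MDP is an invented example for demonstration.

```python
def value_iteration(S, A, M, R, gamma=0.9, tol=1e-6):
    """Solve the Bellman equation by repeated sweeps:
    V(s) <- max_a [ R(s,a) + gamma * sum_s' M[a][s][s'] * V(s') ]."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            v = max(R[(s, a)] + gamma * sum(M[a][s][s2] * V[s2] for s2 in S)
                    for a in A)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            break
    # Greedy policy extraction from the converged values.
    pi = {s: max(A, key=lambda a: R[(s, a)] +
                 gamma * sum(M[a][s][s2] * V[s2] for s2 in S)) for s in S}
    return V, pi

# Invented two-state example: 'stay' is safe, 'jump' is risky but can reach
# the rewarding state s1.
S, A = ["s0", "s1"], ["stay", "jump"]
M = {"stay": {"s0": {"s0": 1.0, "s1": 0.0}, "s1": {"s0": 0.0, "s1": 1.0}},
     "jump": {"s0": {"s0": 0.2, "s1": 0.8}, "s1": {"s0": 0.8, "s1": 0.2}}}
R = {("s0", "stay"): 0.0, ("s0", "jump"): -0.1,
     ("s1", "stay"): 1.0, ("s1", "jump"): -0.1}
print(value_iteration(S, A, M, R))
```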
49. Apply an MDP to an IR Problem
We can model IR systems using a Markov Decision Process. Ask:
Is there a temporal component?
States: what changes with each time step?
Actions: how does your system change the state?
Rewards: how do you measure feedback or effectiveness in your problem at each time step?
Transition probability: can you determine this?
50. Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
51. TREC Session Tracks (2010–now)
Given a series of queries {q₁, q₂, …, qₙ}, the top-10 retrieval results {D₁, …, Dₙ₋₁} for q₁ to qₙ₋₁, and click information, the task is to retrieve a list of documents for the current/last query qₙ. Relevance judgments are based on how relevant the documents are for qₙ and for the information need of the entire session (given in the topic description). There is no need to segment the sessions.
52. TREC 2012 Session 6
In a session, queries change constantly:
1. pocono mountains pennsylvania
2. pocono mountains pennsylvania hotels
3. pocono mountains pennsylvania things to do
4. pocono mountains pennsylvania hotels
5. pocono mountains camelbeach
6. pocono mountains camelbeach hotel
7. pocono mountains chateau resort
8. pocono mountains chateau resort attractions
9. pocono mountains chateau resort getting to
10. chateau resort getting to
11. pocono mountains chateau resort directions
Information need: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there?
53. Markov Decision Process
We propose to model session search as a Markov decision process (MDP) with two agents: the user and the search engine.
[Guan, Zhang and Yang SIGIR 2013]
54. Settings of the Session MDP
States: queries. Environment: search results.
User actions: add/remove/keep query terms; these correspond nicely to our definition of query change.
Search engine actions: increase/decrease/keep term weights.
[Guan, Zhang and Yang SIGIR 2013]
55. Search Engine Agent’s
Actions
∈ Di−1 action Example
qtheme
Y increase “pocono mountain” in s6
N increase
“france world cup 98 reaction” in s28,
france world cup 98 reaction stock
market→ france world cup 98 reaction
+∆q
Y decrease
‘policy’ in s37, Merck lobbyists → Merck
lobbyists US policy
N increase
‘US’ in s37, Merck lobbyists → Merck
lobbyists US policy
−∆q
Y decrease
‘reaction’ in s28, france world cup 98
reaction
→ france world cup 98
N
No
change
‘legislation’ in s32, bollywood legislation
→bollywood law
55 [Guan, Zhang and Yang SIGIR 2013]
56. Bellman Equation
In an MDP, a future reward is not worth quite as much as a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards. The Bellman equation gives the optimal value (the expected long-term reward starting from state s and continuing with policy π from then on) for an MDP:

$V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]$
57. Our Tweak
In an MDP, a future reward is worth less than a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards. In session search, a past reward is worth less than a current reward, so a discount factor γ should be applied to past rewards: we model the MDP for session search in reverse order.
58. Query Change retrieval Model (QCM)
The Bellman equation gives the optimal value for an MDP. The reward function is used as the document relevance score function and is derived backwards from the Bellman equation:

$\mathrm{Score}(q_i, d) = P(q_i \mid d) + \gamma\, P(q_i \mid q_{i-1}, D_{i-1}, a)\, \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$

where $P(q_i \mid d)$ is the current reward (relevance score), $P(q_i \mid q_{i-1}, D_{i-1}, a)$ is the query transition model, and $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$ is the maximum past relevance.
[Guan, Zhang and Yang SIGIR 2013]
59. Calculating the Transition Model
The score expands term by term according to the query change and the search engine actions:

$\mathrm{Score}(q_i, d) = \log P(q_i \mid d) + \alpha \sum_{t \in q_{theme}} \big[1 - P(t \mid d^*_{i-1})\big] \log P(t \mid d) + \epsilon \sum_{t \in +\Delta q,\ t \notin d^*_{i-1}} \mathrm{idf}(t) \log P(t \mid d) - \beta \sum_{t \in +\Delta q,\ t \in d^*_{i-1}} P(t \mid d^*_{i-1}) \log P(t \mid d) - \delta \sum_{t \in -\Delta q} P(t \mid d^*_{i-1}) \log P(t \mid d)$

Beyond the current reward (relevance score), this increases weights for theme terms, increases weights for novel added terms, decreases weights for previously seen added terms, and decreases weights for removed terms.
[Guan, Zhang and Yang SIGIR 2013]
60. Maximizing the Reward Function
Generate the maximum-rewarded document, denoted d*ᵢ₋₁, from Dᵢ₋₁; that is, the document most relevant to qᵢ₋₁. The relevance score can be calculated as:

$P(q_{i-1} \mid d_{i-1}) = 1 - \prod_{t \in q_{i-1}} \big\{1 - P(t \mid d_{i-1})\big\}, \qquad P(t \mid d_{i-1}) = \frac{\#(t, d_{i-1})}{|d_{i-1}|}$

From several options, we choose to use only the document with top relevance: $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$.
[Guan, Zhang and Yang SIGIR 2013]
61. Scoring the Entire Session
The overall relevance score for a session of queries is aggregated recursively:

$\mathrm{Score}_{session}(q_n, d) = \mathrm{Score}(q_n, d) + \gamma\, \mathrm{Score}_{session}(q_{n-1}, d) = \sum_{i=1}^{n} \gamma^{n-i}\, \mathrm{Score}(q_i, d)$

[Guan, Zhang and Yang SIGIR 2013]
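A minimal sketch of the recursive session scoring in Python, assuming per-query scores have already been computed; the numbers are placeholders standing in for the QCM Score(q_i, d) above, not values from the paper.

```python
def session_score(per_query_scores, gamma=0.92):
    """Aggregate Score(q_i, d) over a session:
    sum_i gamma^(n-i) * Score(q_i, d), so recent queries count more."""
    n = len(per_query_scores)
    return sum(gamma ** (n - 1 - i) * s for i, s in enumerate(per_query_scores))

# Placeholder scores for one document across a 3-query session.
print(session_score([0.4, 0.7, 1.2]))  # 0.4*gamma^2 + 0.7*gamma + 1.2
```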
62. Experiments
TREC 2011–2012 query sets; ClueWeb09 Category B dataset.
65. Search Accuracy for Different Session Types
TREC 2012 sessions are classified by product (Factual / Intellectual) and goal quality (Specific / Amorphous):

 | Intellectual | %chg | Amorphous | %chg | Specific | %chg | Factual | %chg
TREC best | 0.3369 | 0.00% | 0.3495 | 0.00% | 0.3007 | 0.00% | 0.3138 | 0.00%
Nugget | 0.3305 | −1.90% | 0.3397 | −2.80% | 0.2736 | −9.01% | 0.2871 | −8.51%
QCM | 0.3870 | 14.87% | 0.3689 | 5.55% | 0.3091 | 2.79% | 0.3066 | −2.29%
QCM+DUP | 0.3900 | 15.76% | 0.3692 | 5.64% | 0.3114 | 3.56% | 0.3072 | −2.10%

QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying changes across query transitions and modeling the dynamics.
66. POMDP Model
[Diagram: hidden states s₀, s₁, s₂, s₃, … with actions aᵢ and rewards rᵢ; the agent receives observations o₁, o₂, o₃ and maintains a belief over the hidden states]
¹R. D. Smallwood et al. '73
67. POMDP Definition
A tuple (S, M, A, R, γ, O, Θ, B):
S: state space; M: transition matrix; A: action space; R: reward function; γ: discount factor, 0 < γ ≤ 1.
O: observation set; an observation is a symbol emitted according to a hidden state.
Θ: observation function; Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o | s, a).
B: belief space; a belief is a probability distribution over hidden states.
68. A Markov Chain of Decision Making
[Diagram: hidden decision-making states S₁, S₂, …, Sₙ connected by user actions A₁, A₂, A₃, A₄]
q₁ = "old US coins" → D₁: "D1 is relevant and I stay to find out more about collecting…"
q₂ = "collecting old US coins" → D₂: "D2 is relevant and I now move to the next topic…"
q₃ = "selling old US coins" → D₃: "D3 is irrelevant; I slightly edit the query and stay here a little longer…"
[Luo, Zhang and Yang SIGIR 2014]
69. Hidden Decision Making States
S_RT (Relevant & Exploitation): scooter price → scooter stores
S_RR (Relevant & Exploration): collecting old US coins → selling old US coins
S_NRT (Non-Relevant & Exploitation): Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): Boston tourism → NYC tourism
[Luo, Zhang and Yang SIGIR 2014]
70. Dual Agent Stochastic Game
[Diagram: hidden states, actions and rewards with the Markov property, as in the POMDP model]
A dual-agent, cooperative game between the user agent and the search engine agent, with joint optimization.
[Luo, Zhang and Yang SIGIR 2014]
71. Actions
User actions (A_u): add query terms (+Δq); remove query terms (−Δq); keep query terms (q_theme).
Search engine actions (A_se): increase/decrease/keep term weights; switch a search technique on or off (e.g. to use or not to use query expansion); adjust parameters in search techniques (e.g. select the best k for the top-k docs used in PRF).
Messages from the user (Σ_u): clicked documents; SAT-clicked documents.
Messages from the search engine (Σ_se): top-k returned documents.
Messages are essentially documents that an agent thinks are relevant.
[Luo, Zhang and Yang SIGIR 2014]
75. Observation Function (O)
O(s_{t+1}, a_t, ω_t) = P(ω_t | s_{t+1}, a_t): the probability of making observation ω_t after taking action a_t and landing in state s_{t+1}. There are two types of observations: relevance-related and exploration/exploitation-related.
[Luo, Zhang and Yang SIGIR 2014]
76. Relevance-related Observation
Intuition: s_t is likely to be Relevant if ∃d ∈ D_{t−1} that is SAT-clicked, and Non-Relevant otherwise. This happens after the user sends out the message Σ_u^t (clicks).

$O(s_t = Rel, \Sigma_u, \omega_t = Rel) \triangleq P(\omega_t = Rel \mid s_t = Rel, \Sigma_u) \propto P(s_t = Rel \mid \omega_t = Rel)\, P(\omega_t = Rel \mid \Sigma_u)$

Similarly, $O(s_t = NonRel, \Sigma_u, \omega_t = NonRel) \propto P(s_t = NonRel \mid \omega_t = NonRel)\, P(\omega_t = NonRel \mid \Sigma_u)$, as well as the cross terms $O(s_t = NonRel, \Sigma_u, \omega_t = Rel)$ and $O(s_t = Rel, \Sigma_u, \omega_t = NonRel)$.
[Luo, Zhang and Yang SIGIR 2014]
77. Exploration-related Observation
This is a combined observation: it happens when updating the before-message belief state for a user action a_u (query change) and a search engine message Σ_se = D_{t−1}.
Intuition: s_t is likely to be Exploration if (+Δq_t ≠ ∅ and +Δq_t ∉ D_{t−1}) or (+Δq_t = ∅ and −Δq_t ≠ ∅); it is likely to be Exploitation if (+Δq_t ≠ ∅ and +Δq_t ∈ D_{t−1}) or (+Δq_t = ∅ and −Δq_t = ∅).

$O(s_t = Exploitation, a_u = \Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = Exploitation) \propto P(s_t = Exploitation \mid \omega_t = Exploitation) \times P(\omega_t = Exploitation \mid \Delta q_t, D_{t-1})$

$O(s_t = Exploration, a_u = \Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = Exploration) \propto P(s_t = Exploration \mid \omega_t = Exploration) \times P(\omega_t = Exploration \mid \Delta q_t, D_{t-1})$

[Luo, Zhang and Yang SIGIR 2014]
78. Belief Updates (B)
The belief state b is updated when a new observation is obtained:

$b_{t+1}(s_j) = P(s_j \mid \omega_t, a_t, b_t) = \frac{P(\omega_t \mid s_j, a_t, b_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)} = \frac{O(s_j, a_t, \omega_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)}$
79. Joint Optimization — Win-Win
The long-term reward for the search engine agent:

$Q_{se}(b, a) = \sum_{s \in S} b(s)\, R(s, a) + \gamma \sum_{\omega \in \Omega} P(\omega \mid b, a_u, \Sigma_{se})\, P(\omega \mid b, \Sigma_u)\, \max_a Q_{se}(b', a)$

The long-term reward for the user agent:

$Q_u(b, a_u) = R(s, a_u) + \gamma \sum_{a_u} T(s_t \mid s_{t-1}, D_{t-1}) \max_{s_{t-1}} Q_u(s_{t-1}, a_u) = P(q_t \mid d) + \gamma \sum_{a_u} P(q_t \mid q_{t-1}, D_{t-1}, a) \max_{D_{t-1}} P(q_{t-1} \mid D_{t-1})$

Joint optimization: $a_{se} = \arg\max_a \big( Q_{se}(b, a) + Q_u(b, a_u) \big)$
[Luo, Zhang and Yang SIGIR 2014]
81. Experiments
Evaluated on the TREC 2012 and 2013 Session Tracks. The session logs contain the session topic, user queries, previously retrieved URLs and snippets, user clicks, dwell time, etc. Task: retrieve 2,000 documents for the last query in each session. The evaluation is based on the whole session: a document related to any query in the session is a good document.
Datasets: ClueWeb09 and ClueWeb12, with spam and duplicates removed.
82. Actions
increasing weights of the added terms by a factor of x = {1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75 or 2};
decreasing weights of the added terms by a factor of y = {0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9 or 0.95};
the Query Change Model (QCM) proposed in Guan et al. SIGIR '13:
$\mathrm{Score}(q_i, d) = P(q_i \mid d) + \gamma\, P(q_i \mid q_{i-1}, D_{i-1}, a)\, \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$
Pseudo Relevance Feedback, which assumes the top 20 retrieved documents are relevant;
directly using the query in the current iteration to perform retrieval;
combining all queries in a session and weighting them equally.
83. Search Accuracy
Search accuracy on the TREC 2012 Session Track: win-win outperforms most retrieval algorithms on TREC 2012.
84. Search Accuracy
Search accuracy on the TREC 2013 Session Track: win-win outperforms all retrieval algorithms on TREC 2013. It is highly effective in session search.
85. Immediate Search Accuracy
Original run: the top returned documents provided by the TREC log data. Win-win's immediate search accuracy is better than the Original at every iteration, and it increases as the number of search iterations increases (TREC 2012 and 2013 Session Tracks).
86. Belief Updates (B)
TREC '13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
q₁ = "best US destinations", observation = NRR. Beliefs: S_RT 0.1784, S_RR 0.1135, S_NRT 0.2838, S_NRR 0.4243.
87. Belief Updates (B)
q₂ = "distance New York Boston", observation = RT. Beliefs: S_RT 0.0005, S_RR 0.0068, S_NRT 0.0715, S_NRR 0.9212.
89. Belief Updates (B)
q₃ = "maps.bing.com", observation = NRT. Beliefs: S_RT 0.0151, S_RR 0.4347, S_NRT 0.0276, S_NRR 0.5226.
91. Belief Updates (B)
… q₂₀ = "Philadelphia NYC train", observation = NRT. Beliefs: S_RT 0.0291, S_RR 0.7837, S_NRT 0.0081, S_NRR 0.1790.
93. Belief Updates (B)
q₂₁ = "Philadelphia NYC bus", observation = NRT. Beliefs: S_RT 0.0304, S_RR 0.8126, S_NRT 0.0066, S_NRR 0.1505.
96. Apply an MDP to an IR Problem — Example
User agent in session search: States — the user's relevance judgment; Action — a new query; Reward — information gained.
[Luo, Zhang, Yang SIGIR '14]
97. POMDP → Belief Update
The agent uses a state estimator to update its belief about the hidden states: b′ = SE(b, a, o′), where

$b'(s') = P(s' \mid o', a, b) = \frac{P(s', o' \mid a, b)}{P(o' \mid a, b)} = \frac{\Theta(s', a, o') \sum_s M(s, a, s')\, b(s)}{P(o' \mid a, b)}$
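A minimal sketch of this state estimator in Python, assuming tabular M and Θ as defined in the POMDP tuple above; the two-state numbers, state names and action/observation labels are invented for illustration.

```python
def belief_update(b, a, o, M, Theta, states):
    """b'(s') is proportional to Theta(s', a, o) * sum_s M(s, a, s') * b(s);
    the normalizer is P(o | a, b)."""
    unnorm = {s2: Theta[(s2, a, o)] * sum(M[(s, a, s2)] * b[s] for s in states)
              for s2 in states}
    z = sum(unnorm.values())  # P(o | a, b)
    return {s2: v / z for s2, v in unnorm.items()}

# Invented two-state example: hidden states Rel / NonRel, one action 'rank',
# observation 'click'.
states = ["Rel", "NonRel"]
M = {("Rel", "rank", "Rel"): 0.7, ("Rel", "rank", "NonRel"): 0.3,
     ("NonRel", "rank", "Rel"): 0.4, ("NonRel", "rank", "NonRel"): 0.6}
Theta = {("Rel", "rank", "click"): 0.8, ("NonRel", "rank", "click"): 0.2}
b = {"Rel": 0.5, "NonRel": 0.5}
print(belief_update(b, "rank", "click", M, Theta, states))
```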
98. POMDP → Bellman Equation
The Bellman equation for a POMDP:

$V(b) = \max_a \Big[ r(b, a) + \gamma \sum_{o'} P(o' \mid a, b)\, V(b') \Big]$

A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ):
B: the continuous belief space
M′: transition function $M'_a(b, b') = \sum_{o' \in O} \mathbb{1}_{a,o'}(b', b) \Pr(o' \mid a, b)$, where $\mathbb{1}_{a,o'}(b', b) = 1$ if $SE(b, a, o') = b'$ and 0 otherwise
A: action space
r: reward function $r(b, a) = \sum_{s \in S} b(s)\, R(s, a)$
99. Applying POMDP to Dynamic IR

POMDP | Dynamic IR
Environment | Documents
Agents | User, search engine
States | Queries, user's decision-making status, relevance of documents, etc.
Actions | Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters for a search technology
Observations | Queries, clicks, document lists, snippets, terms, etc.
Rewards | Evaluation measures (such as DCG, nDCG or MAP); clicking information
Transition matrix | Given in advance or estimated from training data
Observation function | Problem dependent; estimated from sample datasets
100. Session Search Example - States
S_RT (Relevant & Exploitation): scooter price → scooter stores
S_RR (Relevant & Exploration): Hartford visitors → Hartford Connecticut tourism
S_NRT (Non-Relevant & Exploitation): Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): distance New York Boston → maps.bing.com
[J. Luo et al. '14]
101. Session Search Example - Actions (A_u, A_se)
User actions (A_u): add query terms (+Δq); remove query terms (−Δq); keep query terms (q_theme); clicked documents; SAT-clicked documents.
Search engine actions (A_se): increase/decrease/keep term weights; switch query expansion on or off; adjust the number of top documents used in PRF; etc.
[J. Luo et al. '14]
102. TREC Session Tracks (2010–2012)
Given a series of queries {q₁, q₂, …, qₙ}, the top-10 retrieval results {D₁, …, Dₙ₋₁} for q₁ to qₙ₋₁, and click information, the task is to retrieve a list of documents for the current/last query qₙ. Relevance judgments are based on how relevant the documents are for qₙ and for the information need of the entire session (given in the topic description). There is no need to segment the sessions.
103. Query change is an important form of feedback
We define query change as the syntactic editing change between two adjacent queries:

$\Delta q_i = q_i - q_{i-1}$

It includes $+\Delta q_i$, the added terms, and $-\Delta q_i$, the removed terms. The unchanged/shared terms are called $q_{theme}$, the theme terms.
Example: q₁ = "bollywood legislation", q₂ = "bollywood law"; theme term = "bollywood".
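A minimal sketch of extracting query change in Python, treating queries as term sets; real systems would also handle term order and repeated terms, which this illustration ignores.

```python
def query_change(prev_query, query):
    """Split two adjacent queries into added, removed and theme terms."""
    prev_terms, terms = set(prev_query.split()), set(query.split())
    return {"+dq": terms - prev_terms,      # added terms
            "-dq": prev_terms - terms,      # removed terms
            "theme": prev_terms & terms}    # unchanged/shared terms

print(query_change("bollywood legislation", "bollywood law"))
# {'+dq': {'law'}, '-dq': {'legislation'}, 'theme': {'bollywood'}}
```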
104. Where do these query changes come from?
Given the TREC Session settings, we consider two sources of query change: the previous search results that a user viewed/read/examined, and the information need.
Example: Kurosawa → Kurosawa wife. 'wife' is not in any previous results, but it is in the topic description. However, knowing the information need before the search is difficult to achieve.
105. Previous search results can influence query change in quite complex ways
Merck lobbyists → Merck lobbying US policy. D₁ contains several mentions of 'policy', such as "A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …". These mentions are about Canadian policies, while the user adds 'US policy' in q₂. Our guess is that the user might have been inspired by 'policy' but prefers a different sub-concept than 'Canadian policy'. Therefore, among the added terms 'US policy', 'US' is the novel term and 'policy' is not, since it appeared in D₁. The two terms should be treated differently.
106. POMDP
Session search has rich interactions (actions), hidden, evolving information needs (hidden states), a long-term goal (rewards) and temporal dependency (the Markov property): a POMDP (Partially Observable Markov Decision Process). With multi-agent collaboration this becomes an SG (Stochastic Game).
107. Recap – Characteristics of Dynamic IR
Rich interactions: query formulation, document clicks, document examination, eye movement, mouse movements, etc.
Temporal dependency.
Overall goal.
108. Modeling Query Change
A framework inspired by reinforcement learning. Reinforcement learning for a Markov Decision Process models a state space S and an action space A according to a transition model T = P(sᵢ₊₁ | sᵢ, aᵢ). A policy π(s) = a indicates which action a the agent takes at state s. Each state is associated with a reward function R that indicates the possible positive reward or negative loss that a state and an action may result in. Reinforcement learning offers general solutions to MDPs and seeks the best policy for an agent.
109. Outline
Introduction & Theory
Session Search
Dynamic Ranking: Multi-Armed Bandits, Portfolio Ranking, Multi-Page Search
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
110. Family of Markov Models
Markov Process, Hidden Markov Model, Markov Decision Process, Partially Observable Markov Decision Process, Multi-Armed Bandit.
111. Multi-Armed Bandits (MAB)
[Illustration: a row of slot machines] "Which slot machine should I select in this round?" Each play yields a reward.
112. Multi-Armed Bandits (MAB)
[Illustration] "I won! Is this the best slot machine?"
113. MAB Definition
A tuple (S, A, R, B):
S: hidden reward distribution of each bandit
A: choose which bandit to play
R: reward for playing a bandit
B: belief space, our estimate of each bandit's distribution
114. Comparison with Markov Models
A single-state Markov Decision Process with no transition probability. Similar to a POMDP in that we maintain a belief state. An action chooses a bandit and does not affect the state. It does not 'plan ahead' but intelligently adapts: somewhere between interactive and dynamic IR.
115. MAB Policy Reward
A MAB algorithm describes a policy π for choosing bandits that maximizes the rewards from the chosen bandits over all time steps, i.e. minimizes the regret

$\sum_{t=1}^{T} \big[ Reward(a^*) - Reward(a_{\pi(t)}) \big]$

the cumulative difference between the optimal reward and the actual reward.
116. Exploration vs Exploitation
Exploration: try out bandits to find which has the highest average reward; too much exploration leads to poor performance.
Exploitation: play bandits that are known to pay out higher rewards on average.
MAB algorithms balance exploration and exploitation: start by exploring more to find the best bandits, and exploit more as the best bandits become known.
117. MAB – Index Algorithms
Gittins index¹: play the bandit with the highest 'Dynamic Allocation Index'; modelled using an MDP but suffers the 'curse of dimensionality'.
ε-greedy²: play the highest-reward bandit with probability 1 − ε; play a random bandit with probability ε.
UCB (Upper Confidence Bound)³.
¹J. C. Gittins '89; ²Nicolò Cesa-Bianchi et al. '98
118. Comparison of Markov Models
Markov Process: a fully observable stochastic process. Hidden Markov Model: a partially observable stochastic process. MDP: a fully observable decision process. POMDP: a partially observable decision process. MAB: a decision process, either fully or partially observable.

Model | Actions | Rewards | States
Markov Process | No | No | Observable
Hidden Markov Model | No | No | Unobservable
MDP | Yes | Yes | Observable
POMDP | Yes | Yes | Unobservable
MAB | Yes | Yes | Fixed
119. Outline
Introduction & Theory
Session Search
Dynamic Ranking: Multi-Armed Bandits, Portfolio Ranking, Multi-Page Search
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
122. UCB Algorithm
Calculate for all bandits $i$ and select the highest:

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

where $\bar{x}_i$ is the average reward of bandit $i$, $t$ is the time step, and $T_i$ is the number of times bandit $i$ has been played. The chance of playing infrequently played bandits increases over time.
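A minimal UCB1 sketch in Python over Bernoulli bandits; the reward probabilities and horizon are invented for illustration.

```python
import math
import random

def ucb1(reward_probs, horizon=10000):
    """Play each bandit once, then always pick the argmax of
    mean reward + sqrt(2 ln t / T_i)."""
    n = len(reward_probs)
    counts = [0] * n   # T_i: plays per bandit
    sums = [0.0] * n   # cumulative reward per bandit
    for t in range(1, horizon + 1):
        if t <= n:
            i = t - 1  # initialization: play each bandit once
        else:
            i = max(range(n), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        reward = 1.0 if random.random() < reward_probs[i] else 0.0
        counts[i] += 1
        sums[i] += reward
    return counts

random.seed(0)
# The best bandit (p=0.6) ends up played most often.
print(ucb1([0.3, 0.45, 0.6]))
```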
128. Iterative Expectation
For documents $i$, rank by $\bar{r}_i + \sqrt{\frac{2 \ln t}{T_i}}$, where $\bar{r}_i$ is the average probability of relevance.
M. Sloan and J. Wang '1
129. Iterative Expectation
Replace $T_i$ with the 'effective' number of impressions:

$\bar{r}_i + \sqrt{\frac{2 \ln t}{\gamma_i(t)}}, \qquad \gamma_i(t) = \sum_{k=1}^{t} \alpha^{C_k} \beta^{1-C_k}$

where α and β reward clicks and non-clicks depending on rank.
M. Sloan and J. Wang '1
130. Iterative Expectation
Add an exploration parameter λ:

$\bar{r}_i + \lambda \sqrt{\frac{2 \ln t}{\gamma_i(t)}}$

M. Sloan and J. Wang '1
131. Portfolio Theory of IR
Portfolio theory maximises the expected return for a given amount of risk¹; diversity of a portfolio increases the likely return. We can consider documents as 'shares': documents are dependent on one another, unlike under the PRP. The portfolio theory of IR² allows us to introduce diversity.
¹H. Markowitz '52; ²J. Wang et al. '09
132. Portfolio Ranking
Documents are dependent on each other: build a co-click matrix from users and logs¹. Portfolio armed bandit ranking²: rank exploratively using iterative expectation, diversify using portfolio optimisation over the co-click matrix, and update relevance and dependence with each click. Both explorative and diverse.
¹W. Wu et al. '11; ²M. Sloan and Jun Wang '1
133. Outline
Introduction & Theory
Session Search
Dynamic Ranking: Multi-Armed Bandits, Portfolio Ranking, Multi-Page Search
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
134. Multi-Page Search
[Diagram: two result pages, each ranking two documents; page 2 is re-ranked after page-1 feedback]
X. Jin, M. Sloan and J. Wang '13
135. Multi-Page Search Example - States & Actions
State: relevance of documents. Action: ranking of documents. Observation: clicks. Belief: multivariate Gaussian. Reward: DCG over 2 pages.
X. Jin, M. Sloan and J. Wang '13
161. Proposed EE Algorithms
Thompson Sampling, Linear-UCB, General Linear-UCB.
Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM 2013.
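A minimal Thompson sampling sketch in Python for Bernoulli rewards with Beta posteriors; the reward probabilities are invented, and the linear variants on the slide would replace the Beta posterior with a Bayesian linear model.

```python
import random

def thompson(reward_probs, horizon=10000):
    """Beta-Bernoulli Thompson sampling: sample a relevance estimate from each
    arm's posterior and play the argmax."""
    n = len(reward_probs)
    alpha, beta = [1] * n, [1] * n  # Beta(1,1) uniform priors
    for _ in range(horizon):
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(n)]
        i = samples.index(max(samples))
        if random.random() < reward_probs[i]:
            alpha[i] += 1  # success: posterior shifts up
        else:
            beta[i] += 1   # failure: posterior shifts down
    return [a + b - 2 for a, b in zip(alpha, beta)]  # plays per arm

random.seed(0)
print(thompson([0.3, 0.45, 0.6]))
```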
163. Ad Selection Problem
How can online publishers optimally select ads to maximize their ad income over time? Selling in multiple channels with non-fixed prices.
Sequential Selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.
164.–167. Problem Formulation, Objective Function, Belief Update
[Slide content consists of figures and equations from: Sequential Selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012]
169. Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
170. Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling.
Charlie Clarke (with much input from Mark Smucker), University of Waterloo, Canada
171. Moving from static ranking to dynamic domains
How do we extend IR evaluation methodologies to dynamic domains? Three key ideas:
1. Realistic models of searcher interactions.
2. Measure costs to the searcher in meaningful units (e.g., time, money, …).
3. Measure benefits to the searcher in meaningful units (e.g., time, nuggets, …).
This talk strongly reflects my opinions (not trying to be neutral). But I am the guest speaker.
172. Evaluating Information Access Systems
Searching, browsing, summarization, visualization, desktop, mobile, web, books, images, questions, etc., and combinations of these.
Does the system work for its users? Will this change make the system better or worse? How do we quantify performance?
173. Performance 101: Is this a good search result?
174. How to evaluate? Study users.
Users in the wild: A/B testing, result interleaving, clicks and dwell time, mouse movements, other implicit feedback, …
Users in the lab: time to task completion, think-aloud protocols, questionnaires, eye tracking, …
175. Unfortunately, user studies are slow and expensive, and conditions can never be exactly duplicated (e.g., learning to rank).
176. Alternative: User performance prediction
Can we predict the impact of a proposed change to an information access system (while respecting and reflecting differences between users)? Can we quantify performance improvements in meaningful units so that effect sizes can be considered in statistical testing? Are improvements practically significant, as well as statistically significant?
The BIG goal: predict the impact of a proposed change automatically, based on existing user performance data, rather than gathering new performance data.
177. Traditional Evaluation of Rankers
Test collection: documents, queries, relevance judgments. Each ranker generates a ranked list of documents for each query. Score the ranked lists using the relevance judgments and standard metrics (recall, mean average precision, nDCG, ERR, RBP, …).
178. Example of a good-old-fashioned IR metric
Ranked list of documents with precision at rank N, the fraction of documents that are relevant in the first N documents:
1. Non-relevant (0.00), 2. Relevant (0.50), 3. Non-relevant (0.33), 4. Non-relevant (0.25), 5. Relevant (0.40), 6. Non-relevant (0.33), 7. Non-relevant (0.29), …
Average precision is the average of the precision at N for each relevant document:

$AP = \frac{1}{R} \sum_{R_i} \mathrm{Prec}(R_i)$

Mean average precision (MAP) is AP averaged over the set of queries.
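A minimal sketch computing average precision in Python for the ranked list above; R is taken here as the number of relevant documents retrieved, a simplifying assumption when the full recall base is unknown.

```python
def average_precision(rels):
    """rels: list of 0/1 relevance labels in rank order.
    AP = mean over relevant documents of the precision at their ranks."""
    hits, precisions = 0, []
    for n, rel in enumerate(rels, start=1):
        hits += rel
        if rel:
            precisions.append(hits / n)  # precision at rank n
    return sum(precisions) / len(precisions) if precisions else 0.0

# The example list: relevant at ranks 2 and 5.
print(average_precision([0, 1, 0, 0, 1, 0, 0]))  # (0.5 + 0.4) / 2 = 0.45
```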
179. General form of effectiveness measures
Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision, …): a normalization, applied to a sum over ranks k of the gain at rank k weighted by a discount factor.
180. Implicit user model…
The user works down the ranked list spending equal time on each document; captions, navigation, etc. have no impact. If they make it to rank i, they receive some benefit (i.e., gain). Eventually they stop, which is reflected in the discount (i.e., they are less likely to reach lower ranks). Normalization typically maps the score into the range [0:1]; the units may not be meaningful.
181. Traditional Evaluation of Rankers
Many effectiveness measures (precision, recall, average precision, rank-biased precision, discounted cumulative gain, etc.) are widely used and accepted as standard practice. But… What does an improvement in average precision from 0.28 to 0.31 mean to users? Does an increase in the measure really translate to an improved user experience? How will an improvement in the performance of a single component impact overall system performance?
182. How to better reflect user variation and system performance?
Example: what's the simplest possible user interface for search? 1) The user issues a query. 2) The system returns material to read, in order (not a list of documents; more like a newspaper article). A correspondingly simple user model has two parameters: 1) reading speed, and 2) time spent reading.
183. Reading speed distribution (from users in the lab)
[Figure: empirical distribution of reading speed during an information access task, and its fit to a log-normal distribution]
184. Stopping time distribution (from users in the wild)
[Figure: empirical distribution of time spent searching during an information access task, and its fit to a log-normal distribution]
185. Evaluating a search result
1) Generate a reading speed from the distribution. 2) Generate a stopping time from the distribution. 3) How much useful material did the user read? 4) Repeat for many (simulated) users.
As an example, we use passage retrieval runs from the TREC 2006 HARD Track, which essentially assume our simple user interface. We measure costs to the searcher in terms of time spent searching, and benefits to the searcher in terms of "time well spent".
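A minimal sketch of this simulation in Python, assuming log-normal reading-speed and stopping-time distributions and a toy relevance layout; the parameter values and passage lengths are invented, not the calibrated values from the lab/wild data.

```python
import random

def simulate_time_well_spent(doc_lengths, doc_relevant, n_users=10000):
    """For each simulated user: draw a reading speed and a stopping time from
    log-normal distributions, then count reading time spent on relevant text."""
    total = 0.0
    for _ in range(n_users):
        speed = random.lognormvariate(5.0, 0.5)   # chars/minute (invented params)
        stop = random.lognormvariate(1.5, 0.8)    # minutes available (invented)
        time_left, well_spent = stop, 0.0
        for length, rel in zip(doc_lengths, doc_relevant):
            cost = length / speed                 # minutes to read this passage
            spent = min(cost, time_left)
            if rel:
                well_spent += spent               # "time well spent"
            time_left -= spent
            if time_left <= 0:
                break
        total += well_spent
    return total / n_users

random.seed(0)
# Toy ranked passages: (length in characters, relevant?).
lengths = [800, 1200, 600, 1500]
relevant = [True, False, True, False]
print(simulate_time_well_spent(lengths, relevant))
```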
186.–191. [Figures: useful characters read vs. characters read; useful characters read vs. time spent reading; time well spent vs. time spent reading; distribution of time well spent; temporal precision vs. time spent reading; distribution of temporal precision. Each shows the performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.]
192. General Framework (Part I): Cumulative Gain
Consider the performance of a system in terms of a cost-benefit (cumulative gain) curve G(t): measure costs (e.g., in terms of time spent) and benefits (e.g., in terms of time well spent). A particular instance of G(t) represents a single user (described by a set of parameters) interacting with a system, not just a list! G(t) captures factors intrinsic to the system. We don't know how much time the user has to invest, but for different levels of investment, G(t) indicates the benefit.
193. General Framework (Part II): Decay
Consider the user's willingness to invest time in terms of a decay curve D(t), which provides a survival probability. We assume that G(t) and D(t) are independent (system-dependent stopping probabilities are accommodated in G(t); details on request). D(t) captures factors extrinsic to the system: the user only has so much time they can invest. They cannot invest more, even if they would receive substantial additional benefit from further interaction.
194. General form of effectiveness measures (reminder)
Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision, …): a normalization, applied to a sum over ranks of the gain at rank k weighted by a discount factor.
195. General Framework (Part III): Time-biased gain
Overall system performance may be expressed as expected cumulative gain (which also incorporates standard effectiveness measures): a normalization (== 1?), applied to an accumulation over time of the gain at time t weighted by a decay factor.
196. General Framework (Part IV): Multiple users
Cumulative gain may be computed by simulation (drawing a set of parameters from a population of users), by measuring actual interaction on live systems, or by combinations of measurement and simulation. Simulating and/or measuring multiple users allows us to consider performance differences across the population of users. Simulation provides matched pairs (the same user on both systems), increasing our ability to detect differences.
197. General Framework
Most of the evaluation proposals in the references can be reformulated in terms of this general framework, including those that address novelty and diversity; filtering, summarization and question answering; session search; etc. One more example from our current research…
198. Session search example
Two (or more) result lists, e.g., from query reformulation, query suggestion, or switching search engines. Modeling searcher interaction requires a switch from one result list to another. The optimal time to switch depends on the total time available to search. For example (with many details omitted…):
199. Simulation of searchers switching between lists: A vs. B
The user starts on list A. If the user has less than five minutes to search, they should stay on list A. If the user has more than five minutes to search, they should leave list A after 90 seconds. But can we assume optimal behavior when modeling users?
200. Simulation of searchers switching between lists: A vs. B
[Figure: average gain (relevant documents) vs. switch time (minutes), for session durations of 2, 4, 6, 8 and 10 minutes; topic = 389, list A = sab05ror1, list B = uic0501]
A different view of the same simulation, with thousands of simulated users. Here, benefits are measured by the number of relevant documents seen. The optimal switching time depends on the session duration.
201. Summary
The primary goal of IR evaluation is to predict how changes to an IR system will impact the user experience. Evaluation in dynamic domains requires us to explicitly model the system interface and the user's search behavior; costs and benefits must be measured in meaningful units (e.g., time). Successful IR evaluation requires measurement of users, both "in the wild" and in the lab. These measurements calibrate models, which make predictions, which improve systems.
202. A few key papers
• Leif Azzopardi. 2009. Usage based effectiveness measures: monitoring application performance in information retrieval. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09).
• Leif Azzopardi, Diane Kelly, and Kathy Brennan. 2013. How query cost affects search behavior. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '13).
• Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time drives interaction: simulating sessions in diverse searching environments. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '12).
• Ben Carterette. 2011. System effectiveness, user models, and user utility: a conceptual framework for investigation. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '11).
• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2011. Simulating simple user behavior for system effectiveness evaluation. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM '11).
• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2012. Incorporating variability in user behavior into systems based evaluation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12).
203. A few more key papers
• Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09).
• Charles L. A. Clarke, Nick Craswell, Ian Soboroff, and Azin Ashkan. 2011. A comparative analysis of cascade measures for novelty and diversity. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM '11).
• Charles L. A. Clarke and Mark D. Smucker. 2014. Time well spent. In Proceedings of the 5th Information Interaction in Context Symposium (IIiX '14).
• Georges Dupret and Mounia Lalmas. 2013. Absence time and user engagement: evaluating ranking functions. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining (WSDM '13).
• Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008. Discounted cumulated gain based evaluation of multiple-query IR sessions. In Proceedings of the 30th European Conference on Advances in Information Retrieval (ECIR '08).
• Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and the cube test: multi-dimensional evaluation for professional search. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM '13).
204. And yet more key papers
• Tetsuya Sakai and Zhicheng Dou. 2013. Summaries, ranked retrieval and sessions: a unified framework for information access evaluation. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '13).
• Mark D. Smucker and Charles L. A. Clarke. 2012. Time-based calibration of effectiveness measures. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '12).
• Mark D. Smucker and Charles L. A. Clarke. 2012. Modeling user variance in time-biased gain. In Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval (HCIR '12).
• Emine Yilmaz, Milad Shokouhi, Nick Craswell, and Stephen Robertson. 2010. Expected browsing utility for web search evaluation. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM '10).
• Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session information distillation. In Proceedings of the 2nd International Conference on the Theory of Information Retrieval (ICTIR '09).
• Plus many others (ask me).
205. Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on
Dynamic Information Retrieval Modeling
Charlie Clarke
University of Waterloo, Canada
Thank you!
206. Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
207. Apply an MDP to an IR Problem
We can model IR systems using a Markov Decision Process:
Is there a temporal component?
States – What changes with each time step?
Actions – How does your system change the state?
Rewards – How do you measure feedback or effectiveness in your problem at each time step?
Transition Probability – Can you determine this?
208. Apply an MDP to an IR Problem - Example
User agent in session search:
States – user's relevance judgements
Action – new query
Reward – information gained
[Luo, Zhang, Yang SIGIR'14]
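To make the checklist on the previous slide concrete, here is a toy coding of this example (my own sketch: the coarse state space, transition probabilities and reward values are invented, and this is not the model actually used in [Luo, Zhang, Yang SIGIR'14]):

```python
# Toy session-search MDP in the spirit of slide 208: states are coarse user
# relevance judgements, actions are query changes, reward approximates
# "information gained". All numbers are invented for illustration.

STATES = ["irrelevant", "partial", "relevant"]
ACTIONS = ["add_terms", "remove_terms", "new_query"]
GAMMA = 0.9

# P[(s, a)] = list of (next_state, probability)
P = {
    ("irrelevant", "add_terms"):    [("irrelevant", .5), ("partial", .4), ("relevant", .1)],
    ("irrelevant", "remove_terms"): [("irrelevant", .7), ("partial", .2), ("relevant", .1)],
    ("irrelevant", "new_query"):    [("irrelevant", .3), ("partial", .4), ("relevant", .3)],
    ("partial", "add_terms"):       [("irrelevant", .1), ("partial", .4), ("relevant", .5)],
    ("partial", "remove_terms"):    [("irrelevant", .3), ("partial", .5), ("relevant", .2)],
    ("partial", "new_query"):       [("irrelevant", .3), ("partial", .3), ("relevant", .4)],
    ("relevant", "add_terms"):      [("irrelevant", .1), ("partial", .2), ("relevant", .7)],
    ("relevant", "remove_terms"):   [("irrelevant", .2), ("partial", .3), ("relevant", .5)],
    ("relevant", "new_query"):      [("irrelevant", .3), ("partial", .3), ("relevant", .4)],
}
R = {"irrelevant": 0.0, "partial": 0.5, "relevant": 1.0}  # information gained

def value_iteration(n=200):
    """Standard value iteration: V(s) = max_a sum_s' P(s'|s,a)[R(s') + gamma V(s')]."""
    V = {s: 0.0 for s in STATES}
    for _ in range(n):
        V = {s: max(sum(p * (R[s2] + GAMMA * V[s2]) for s2, p in P[(s, a)])
                    for a in ACTIONS)
             for s in STATES}
    policy = {s: max(ACTIONS, key=lambda a: sum(p * (R[s2] + GAMMA * V[s2])
                                                for s2, p in P[(s, a)]))
              for s in STATES}
    return V, policy

V, policy = value_iteration()
print(policy)  # which query change the agent prefers in each judged state
```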
209. Apply an MDP to an IR Problem - Example
Search engine's perspective:
What if we can't directly observe the user's relevance judgement?
Click ≠ relevance
210. Applying POMDP to Dynamic IR
POMDP component → Dynamic IR counterpart
Environment → Documents
Agents → User, search engine
States → Queries, user's decision-making status, relevance of documents, etc.
Actions → Provide a ranking of documents; weigh terms in the query; add, remove or keep query terms; switch a search technology on or off; adjust parameters for a search technology
Observations → Queries, clicks, document lists, snippets, terms, etc.
Rewards → Evaluation measures (such as DCG, NDCG or MAP); clicking information
Transition matrix → Given in advance or estimated from training data
Observation function → Problem dependent; estimated from sample datasets
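Because the state (the user's judgement) is hidden, a POMDP agent maintains a belief, i.e. a distribution over states, and updates it after each observation. A minimal belief-update sketch with invented transition and observation models (note that the click probabilities encode click ≠ relevance):

```python
# Minimal POMDP belief update over a hidden "did the user judge this relevant?"
# state. The transition and observation models are invented for illustration;
# a single action "show_ranking" is assumed for brevity.

STATES = ["not_relevant", "relevant"]

# T[s][s2] = P(next_state = s2 | state = s)
T = {"not_relevant": {"not_relevant": 0.8, "relevant": 0.2},
     "relevant":     {"not_relevant": 0.1, "relevant": 0.9}}

# O[s][o] = P(observation = o | state = s); clicks are only noisy evidence.
O = {"not_relevant": {"click": 0.3, "no_click": 0.7},
     "relevant":     {"click": 0.8, "no_click": 0.2}}

def update_belief(belief, observation):
    """b'(s') proportional to O(o|s') * sum_s T(s'|s) b(s)."""
    new = {}
    for s2 in STATES:
        predicted = sum(T[s][s2] * belief[s] for s in STATES)
        new[s2] = O[s2][observation] * predicted
    z = sum(new.values())  # normalize to a probability distribution
    return {s: v / z for s, v in new.items()}

belief = {"not_relevant": 0.5, "relevant": 0.5}
for obs in ["click", "click", "no_click"]:
    belief = update_belief(belief, obs)
    print(obs, belief)
```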
211. WSDM Tutorial February 2nd 2015
Grace Hui Yang
Marc Sloan
Jun Wang
Guest Speaker: Charlie Clarke
Dynamic Information Retrieval Modeling
Panel Discussion
212. Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Conclusion
213. Conclusions
Dynamic IR describes a new class of interactive model:
It incorporates rich feedback and temporal dependency, and is goal oriented.
The family of Markov models and multi-armed bandit theory are useful in building DIR models.
DIR is applicable to a range of IR problems.
It is useful in applications such as session search and evaluation.
214. Dynamic IR Book
Published by Morgan & Claypool in the series 'Synthesis Lectures on Information Concepts, Retrieval, and Services'.
Due April/May 2015 (in time for SIGIR 2015).
215. TREC 2015 Dynamic Domain Track
Co-organized by Grace Hui Yang, John Frank and Ian Soboroff.
Underexplored subsets of Web content:
  Limited scope and richness of indexed content, which may not include relevant components of the deep web (temporary pages, pages behind forms, etc.)
  Basic search interfaces, where there is little collaboration or history beyond independent keyword search
Complex, task-based, dynamic search:
  Temporal dependency
  Rich interactions
  Complex, evolving information needs
  Professional users
  A wide range of search strategies
216. Task
An interactive search task with multiple runs per topic.
Starting point: the system is given a search query.
Iterate:
  the system returns a ranked list of 5 documents;
  the API returns relevance judgments;
  go to the next iteration of retrieval;
until done (the system decides when to stop).
The goal of the system is to find relevant information for each topic as soon as possible.
One-shot ad-hoc search is included: the system may simply decide to stop after iteration one.
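A sketch of what a participant's loop might look like; the retrieval function, judgment API, reformulation and stopping rule below are stub placeholders, not the track's real interfaces:

```python
# Sketch of a TREC Dynamic Domain-style participant loop. search(),
# judgment_api(), reformulate() and should_stop() are hypothetical stubs.

def search(query, exclude, k=5):
    """Stub retrieval: return up to k hypothetical document ids."""
    return [f"{query}-doc{i}" for i in range(k) if f"{query}-doc{i}" not in exclude]

def judgment_api(topic, docs):
    """Stub for the track's judgment API (pretend doc0 is relevant)."""
    return {d: int(d.endswith("doc0")) for d in docs}

def should_stop(judgments, iteration):
    return sum(judgments.values()) == 0 or iteration >= 3

def reformulate(query, docs, judgments):
    return query + "+fb"  # fold feedback into the next query (toy)

def run_topic(topic_query):
    query, seen, iteration = topic_query, set(), 0
    while True:
        ranked = search(query, exclude=seen, k=5)      # return 5 documents
        judgments = judgment_api(topic_query, ranked)  # feedback each iteration
        seen.update(ranked)
        if should_stop(judgments, iteration):  # system decides when to stop;
            break                              # stopping after the first
                                               # iteration == one-shot search
        query = reformulate(query, ranked, judgments)
        iteration += 1

run_topic("ebola")
```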
217. Domains
Domain → Corpus
Illicit goods → 30k forum posts from 5-10 forums (total ~300k posts). Which users are working together to sell illicit goods?
Ebola → One million tweets; 300k docs from in-country web sites (mostly official sites). Who is doing what, and where?
Local politics → 300k docs from local political groups in the Pacific Northwest and British Columbia. Who is campaigning for what, and why?
218. Timeline
TREC call for participation: January 2015
Data available: March
Detailed guidelines: April/May
Topics and tasks available: June
Systems do their thing: June-July
Evaluation: August
Results to participants: September
Conference: November 2015
219. TREC 2015 Total Recall Track
Co-organized by Gord Cormack, Maura Grossman, Adam Roegiest and Charlie Clarke.
Explores high-recall tasks through an active learning process modeled on legal search tasks (eDiscovery, patent search).
A participating system starts with a topic and proposes a relevant document.
The system gets immediate feedback on relevance.
It continues to propose additional documents and receive feedback until a stopping condition is reached.
Shared online infrastructure and collections with Dynamic Domain; it is easy to participate in both if you participate in one.
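The protocol resembles continuous active learning: propose a document, receive a judgment, fold it back into the model, and repeat until a stopping condition. A toy sketch with an invented corpus, scorer and stopping rule (not the track's infrastructure):

```python
# Toy continuous-active-learning loop in the Total Recall style. The corpus,
# relevance scorer, feedback oracle and stopping rule are all invented.

CORPUS = {f"doc{i}": f"text about {'patents' if i % 3 == 0 else 'other'}"
          for i in range(30)}

def score(doc_text, positive_texts):
    """Stub scorer: count words shared with documents already judged relevant."""
    words = set(doc_text.split())
    return sum(len(words & set(t.split())) for t in positive_texts)

def total_recall(topic_seed, patience=5):
    judged, positives, misses = {}, [topic_seed], 0
    while misses < patience:                      # stopping condition (toy)
        unjudged = [d for d in CORPUS if d not in judged]
        if not unjudged:
            break
        best = max(unjudged, key=lambda d: score(CORPUS[d], positives))
        relevant = "patents" in CORPUS[best]      # stand-in for the feedback API
        judged[best] = relevant                   # immediate feedback on proposal
        if relevant:
            positives.append(CORPUS[best])        # retrain on the new positive
            misses = 0
        else:
            misses += 1
    return [d for d, r in judged.items() if r]

print(total_recall("text about patents"))
```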
220. Acknowledgment
We thank Prof. Charlie Clarke for his guest lecture.
We sincerely thank Dr. Xuchu Dong for his help in the preparation of the tutorial.
We also thank the following colleagues for their comments and suggestions:
Dr Filip Radlinski
Prof. Maarten de Rijke
221. References
Static IR
Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.
The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.
Implicit User Modeling for Personalized Search. Xuehua Shen et al. CIKM, 2005.
A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.
Portfolio Theory of Information Retrieval. J. Wang and J. Zhu. In SIGIR 2009.
222. References
Interactive IR
Relevance Feedback in Information Retrieval. J. J. Rocchio. The SMART Retrieval System (pp. 313-323), 1971.
A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.
Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.
Dynamic Ranked Retrieval. Christina Brandt et al. WSDM, 2011.
Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
223. References
Dynamic IR
A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR '99, pages 214-221.
Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002.
A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW, 2003.
Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML, 2009.
Meme-tracking and the dynamics of the news cycle. Jure Leskovec, Lars Backstrom, Jon Kleinberg. KDD, 2009.
224. References
Dynamic IR
Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS, 2009.
A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al. WSDM, 2010.
A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.
Inferring search behaviors using partially observable Markov model with duration (POMD). Yin He et al. WSDM, 2011.
No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI, 2011.
Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.
Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS, 2012.
225. References
Dynamic IR
Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In WWW '12, pages 11-20.
Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM, 2012.
Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR '13, pages 453-462.
Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.
Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW '13.
Interactive Collaborative Filtering. X. Zhao, W. Zhang, and J. Wang. In CIKM 2013.
226. References
Dynamic IR
Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR '14.
Iterative Expectation for Multi-Period Information Retrieval. M. Sloan and J. Wang. In WSCD 2013.
Dynamical Information Retrieval Modelling: A Portfolio-Armed Bandit Machine Approach. M. Sloan and J. Wang. In WWW 2012.
Designing States, Actions, and Rewards for Using POMDP in Session Search. Jiyun Luo, Sicong Zhang, Xuchu Dong and Hui Yang. In ECIR 2015.
A POMDP Model for Content-Free Document Re-ranking. Sicong Zhang, Jiyun Luo, Hui Yang. In SIGIR 2014.
227. References
Markov Processes
A Markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679-684, 1957.
Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.
Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960.
Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.
Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Leonard E. Baum and Ted Petrie. The Annals of Mathematical Statistics 37, 1966.
228. References
Markov Processes
Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3, 1988.
Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39:162-175, 1991.
Q-Learning. Christopher J. C. H. Watkins and Peter Dayan. Machine Learning, 1992.
Reinforcement learning with replacing eligibility traces. S. P. Singh and R. S. Sutton. Machine Learning 22, pages 123-158, 1996.
Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.
Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99-134, 1998.
229. References
Markov Processes
Finding approximate POMDP solutions through belief compression. N. Roy. PhD thesis, Carnegie Mellon University, 2003.
VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. In NIPS 2004, pages 1081-1088.
Finding Approximate POMDP Solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005.
Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. MIT Press, 2005.
Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 27:335-380, 2006.
230. References
Markov Processes
The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood and E. J. Sondik. Operations Research, 1973.
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.
An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 2006.
Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011.
Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi and Paul Fischer. ICML, pages 100-108, 1998.
Multi-armed Bandit Allocation Indices. J. C. Gittins. Wiley, 1989.
Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47(2-3), 2002.