WSDM Tutorial February 2nd 2015
Grace Hui Yang
Marc Sloan
Jun Wang
Guest Speaker: Charlie Clarke
Dynamic Information Retrieval
Modeling
Dynamic Information Retrieval
[Figure: a user with an information need, the documents observed so far, and the documents still to explore]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren’t, and satisfy their information need.
Evolving IR
Dynamic Information Retrieval Modeling Tutorial
20153
 Paradigm shifts in IR as new models
emerge
 e.g. VSM → BM25 → Language Model
 Different ways of defining relationship
between query and document
 Static → Interactive → Dynamic
 Evolution in modeling user interaction with
search engine
Outline
Dynamic Information Retrieval Modeling Tutorial
20154
 Introduction & Theory
 Static IR
 Interactive IR
 Dynamic IR
 Session Search
 Dynamic Ranking
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
Conceptual Model – Static IR
Dynamic Information Retrieval Modeling Tutorial
20155
Static IR
Interactive
IR
Dynamic
IR
 No feedback
Characteristics of Static IR
Dynamic Information Retrieval Modeling Tutorial
20156
 Does not learn directly from
user
 Parameters updated
periodically
Dynamic Information Retrieval Modeling Tutorial
20157
Commonly Used Static IR
Models
BM25
PageRank
Language
Model
Learning to
Rank
Feedback in IR
Dynamic Information Retrieval Modeling Tutorial
20158
Outline
Dynamic Information Retrieval Modeling Tutorial
20159
 Introduction & Theory
 Static IR
 Interactive IR
 Dynamic IR
 Session Search
 Dynamic Ranking
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
Conceptual Model – Interactive
IR
Dynamic Information Retrieval Modeling Tutorial
201510
Static IR
Interactive
IR
Dynamic
IR
 Exploit Feedback
Learn the user’s taste
interactively!
At the same time, provide good
recommendations!
Dynamic Information Retrieval Modeling Tutorial
201511
Interactive Recommender
Systems
Toy Example
Dynamic Information Retrieval Modeling Tutorial
201512
 Multi-Page search scenario
 User image searches for “jaguar”
 Rank two of the four results over two
pages:
r = 0.5   r = 0.51   r = 0.9   r = 0.49
Toy Example – Static
Ranking
Dynamic Information Retrieval Modeling Tutorial
201513
 Ranked according to PRP
Page 1 Page 2
1.
2.
𝑟 = 0.9
𝑟 = 0.51
1.
2.
𝑟 = 0.5
𝑟 = 0.49
Toy Example – Relevance
Feedback
Dynamic Information Retrieval Modeling Tutorial
201514
 Interactive Search
 Improve 2nd page based on feedback
from 1st page
 Use clicks as relevance feedback
 Rocchio¹ algorithm on terms in image webpage (a code sketch follows below)
 w_q′ = α·w_q + (β/|D_r|)·Σ_{d∈D_r} w_d − (γ/|D_n|)·Σ_{d∈D_n} w_d
 New query closer to relevant documents and different to non-relevant documents
¹Rocchio, J. J., ’71; Baeza-Yates & Ribeiro-Neto ‘99
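A minimal sketch of the Rocchio update over bag-of-words term vectors; the weights α = 1.0, β = 0.75, γ = 0.15 and the toy vectors are illustrative defaults, not values from the tutorial:

```python
from collections import defaultdict

def rocchio_update(w_q, relevant_docs, nonrelevant_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward clicked (relevant) documents and away from
    unclicked (non-relevant) ones: w_q' = a*w_q + (b/|Dr|)*sum - (g/|Dn|)*sum."""
    w_new = defaultdict(float)
    for term, weight in w_q.items():
        w_new[term] += alpha * weight
    for doc in relevant_docs:
        for term, weight in doc.items():
            w_new[term] += beta * weight / len(relevant_docs)
    for doc in nonrelevant_docs:
        for term, weight in doc.items():
            w_new[term] -= gamma * weight / len(nonrelevant_docs)
    return {t: w for t, w in w_new.items() if w > 0}  # keep positive weights only

# Example: a click on the car image pulls the "jaguar" query toward car terms
q = {"jaguar": 1.0}
clicked = [{"jaguar": 0.6, "car": 0.8, "xf": 0.3}]
skipped = [{"jaguar": 0.5, "animal": 0.9, "cat": 0.4}]
print(rocchio_update(q, clicked, skipped))
```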
Toy Example – Relevance
Feedback
Dynamic Information Retrieval Modeling Tutorial
201515
 Ranked according to PRP and Rocchio
Page 1 Page 2
2.
𝑟 = 0.9
𝑟 = 0.51
1.
2.
𝑟 = 0.5
𝑟 = 0.49
*
1.
* Click
Toy Example – Relevance
Feedback
Dynamic Information Retrieval Modeling Tutorial
201516
 No click when searching for animals
Page 1 Page 2
2.
𝑟 = 0.9
𝑟 = 0.51
1.
2.
1.
?
?
Toy Example – Value
Function
Dynamic Information Retrieval Modeling Tutorial
201517
 Optimize both pages using dynamic IR
 Bellman equation for value function
 Simplified example:
 V_t(θ_t, Σ_t) = max_{s_t} [ θ_{s_t} + E( V_{t+1}(θ_{t+1}, Σ_{t+1}) | C_t ) ]
 θ_t, Σ_t = relevance and covariance of documents for page t
 C_t = clicks on page t
 V_t = ‘value’ of ranking on page t
 Maximize value over all pages based on estimating feedback
X Jin, M. Sloan and J. Wang
’13
Toy Example - Covariance
Dynamic Information Retrieval Modeling Tutorial 2015
 Covariance matrix represents similarity between images:
    1     0.8   0.1   0
    0.8   1     0.1   0
    0.1   0.1   1     0.95
    0     0     0.95  1
X Jin, M. Sloan and J. Wang ’13
Toy Example – Myopic Value
Dynamic Information Retrieval Modeling Tutorial
201519
 For myopic ranking, 𝑉2
= 16.380
Page 1
2.
1.
X Jin, M. Sloan and J. Wang
’13
Toy Example – Myopic
Ranking
Dynamic Information Retrieval Modeling Tutorial
201520
 Page 2 ranking stays the same regardless of clicks
Page 1 Page 2
2.
1.
2.
1.
X Jin, M. Sloan and J. Wang
’13
Toy Example – Optimal Value
Dynamic Information Retrieval Modeling Tutorial
201521
 For optimal ranking, 𝑉2
= 16.528
Page 1
2.
1.
X Jin, M. Sloan and J. Wang
’13
Toy Example – Optimal Ranking
Dynamic Information Retrieval Modeling Tutorial
201522
 If car clicked, Jaguar logo is more relevant on next page
Page 1 Page 2
2.
1.
2.
1.
X Jin, M. Sloan and J. Wang
’13
Toy Example – Optimal Ranking
Dynamic Information Retrieval Modeling Tutorial
201523
 In all other scenarios, rank animal first on next page
Page 1 Page 2
2.
1.
2.
1.
X Jin, M. Sloan and J. Wang
’13
Static IR Visualization
[Scatter plot: documents in a vector space — X: doc about apple fruit, O: doc about apple iphone, : doc about apple ceo]
Documents exist in vector space
24 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015

Static IR Visualization
[Same scatter plot, now with the query point Q]
t = 1: Static IR considers Relevancy
25–26 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015

Interactive IR Update
[Scatter plot with query Q, relevance feedback labels −1/−1/+1 on retrieved documents, and the updated query Q’]
t = 1: Static IR considers Relevancy
t = 2: Interactive considers local gain
27–28 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015

Dynamic Ranking Principle
[Scatter plot with query Q]
t = 1: Relevancy + Variance
29 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015

Dynamic Ranking Principle
[Scatter plot with query Q and feedback labels −1/−1/+1]
t = 1: Relevancy + Variance + |Correlations|
Diversified, exploratory relevance ranking
30–31 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015

Dynamic Ranking Principle
[Scatter plot with query Q, feedback labels −1/−1/+1, and the updated query Q’]
t = 1: Relevancy + Variance + |Correlations|
Diversified, exploratory relevance ranking
t = 2: Personalized Re-ranking
32 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015
Interactive vs Dynamic IR
Dynamic Information Retrieval Modeling Tutorial
201533
Interactive:
• Treats interactions independently
• Responds to immediate feedback
• Static IR used before feedback received
Dynamic:
• Optimizes over all interactions
• Long term gains
• Models future user feedback
• Also used at beginning of interaction
Interactive & Dynamic
Techniques
Dynamic Information Retrieval Modeling Tutorial
201534
Interactive:
• Rocchio equation in Relevance Feedback
• Collaborative filtering in recommender systems
• Active learning in interactive retrieval
Dynamic:
• POMDP in multi-page search and ad recommendation
• Multi-Armed Bandits in Online Evaluation
• MDP in session search
Outline
Dynamic Information Retrieval Modeling Tutorial
201535
 Introduction & Theory
 Static IR
 Interactive IR
 Dynamic IR
 Session Search
 Dynamic Ranking
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
Conceptual Model – Dynamic IR
Dynamic Information Retrieval Modeling Tutorial
201536
Static IR
Interactive
IR
Dynamic
IR
 Explore and exploit Feedback
Characteristics of Dynamic
IR
Dynamic Information Retrieval Modeling Tutorial
201537
Rich interactions
Query formulation
Document clicks
Document examination
Eye movement
Mouse movements
etc.
[Luo et al., IRJ under revision 2014]
Characteristics of Dynamic
IR
Dynamic Information Retrieval Modeling Tutorial
201538
Temporal dependency
[Diagram: an information need I drives a sequence of search iterations; at iteration i, query q_i produces ranked documents D_i and clicked documents C_i, for i = 1 … n]
[Luo et al., IRJ under revision 2014]
Characteristics of Dynamic
IR
Dynamic Information Retrieval Modeling Tutorial
201539
Overall goal
Optimize over all iterations for goal
IR metric or user satisfaction
Optimal policy
[Luo et al., IRJ under revision 2014]
Dynamic Information Retrieval — the next generation search engine
• Dynamic Relevance: user-perceived relevance changes
• Dynamic Users: users change behavior over time, user history
• Dynamic Queries: changing query definition, i.e. ‘Twitter’
• Dynamic Documents: topic trends, filtering, document content change
• Dynamic Information Needs: information needs evolve over time
Why Not Existing Supervised
Learning for Dynamic IR Modeling?
Dynamic Information Retrieval Modeling Tutorial
201541
 Lack of enough training data
 Dynamic IR problems contain a sequence of dynamic
interactions
 E.g. a series of queries in session
 Rare to find repeated sequences (close to zero)
 Even in large query logs (WSCD 2013 & 2014, query logs
from Yandex)
 Chance of finding repeated adjacent query pairs is also low
| Dataset   | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage |
| WSCD 2013 | 476,390                       | 17,784,583                 | 2.68%               |
| WSCD 2014 | 1,959,440                     | 35,376,008                 | 5.54%               |
Our Solution
Dynamic Information Retrieval Modeling Tutorial
201542
Try to find an optimal solution
through a sequence of dynamic
interactions
Trial and Error:
learn from repeated, varied attempts
which are continued until success
No (or less) Supervised Learning
Trial and Error
Dynamic Information Retrieval Modeling Tutorial
201543
 q1 – "dulles hotels"
 q2 – "dulles airport"
 q3 – "dulles airport
location"
 q4 – "dulles metrostop"
What is a Desirable Model for
Dynamic IR
Dynamic Information Retrieval Modeling Tutorial
201544
 Model interactions, which means it needs to have place
holders for actions;
 Model information need hidden behind user queries and
other interactions;
 Set up a reward mechanism to guide the entire search
algorithm to adjust its retrieval strategies;
 Represent Markov properties to handle the temporal
dependency.
A model in Trial and Error setting will do!
A Markov Model will do!
Markov Decision Process
Dynamic Information Retrieval Modeling Tutorial
201545
 MDP extends MC with actions and rewards¹
[Diagram: s0 →(a0, r0, p0)→ s1 →(a1, r1, p1)→ s2 →(a2, r2, p2)→ s3 → ……]
s_i – state,  a_i – action,  r_i – reward,  p_i – transition probability
(S, M, A, R, γ)
¹R. Bellman, ‘57
Definition of MDP
Dynamic Information Retrieval Modeling Tutorial
201546
 A tuple (S, M, A, R, γ)
 S : state space
 M: transition matrix
Ma(s, s') = P(s'|s, a)
 A: action space
 R: reward function
R(s,a) = immediate reward taking action a at state s
 γ: discount factor, 0< γ ≤1
 policy π
π(s) = the action taken at state s
 Goal is to find an optimal policy π* maximizing the expected
total rewards.
Optimality — Bellman
Equation
Dynamic Information Retrieval Modeling Tutorial
201547
 The Bellman equation¹ for an MDP is a recursive definition of the optimal state-value function V*(·):
V*(s) = max_a [ R(s, a) + γ Σ_{s′} M_a(s, s′) V*(s′) ]
 Optimal Policy:
π*(s) = argmax_a [ R(s, a) + γ Σ_{s′} M_a(s, s′) V*(s′) ]
¹R. Bellman, ‘57
MDP algorithms
Dynamic Information Retrieval Modeling Tutorial
201548
 Value Iteration
 Policy Iteration
 Modified Policy Iteration
 Prioritized Sweeping
 Temporal Difference (TD) Learning
 Q-Learning
Model free
approaches
Model-based
approaches
[Bellman, ’57, Howard, ‘60, Puterman and Shin, ‘78, Singh & Sutton, ‘96, Sutton &
Barto, ‘98, Richard Sutton, ‘88, Watkins, ‘92]
Solve Bellman equation → Optimal value V*(s) → Optimal policy π*(s)
(A value iteration sketch follows below.)
[Slide altered from Carlos Guestrin’s ML lecture]
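To make the "solve the Bellman equation" step concrete, here is a minimal value iteration sketch for a tiny MDP; the transition and reward tables are invented for illustration only:

```python
import numpy as np

def value_iteration(M, R, gamma=0.9, eps=1e-6):
    """M[a][s, s'] = P(s'|s, a); R[s, a] = immediate reward.
    Returns V*(s) and the greedy policy pi*(s) from the Bellman equation."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' M_a(s, s') V(s')
        Q = R + gamma * np.stack([M[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Toy 2-state, 2-action MDP (numbers are illustrative only)
M = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]   # action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]
V_star, pi_star = value_iteration(M, R)
print(V_star, pi_star)
```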
Apply an MDP to an IR
Problem
Dynamic Information Retrieval Modeling Tutorial
201549
 We can model IR systems using a Markov
Decision Process
 Is there a temporal component?
 States – What changes with each time step?
 Actions – How does your system change the
state?
 Rewards – How do you measure feedback or
effectiveness in your problem at each time
step?
 Transition Probability – Can you determine
this?
Outline
Dynamic Information Retrieval Modeling Tutorial
201550
 Introduction & Theory
 Session Search
 Dynamic Ranking
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
TREC Session Tracks (2010-
now)
 Given a series of queries {q1,q2,…,qn}, top 10
retrieval results {D1, … Di-1 } for q1 to qi-1, and
click information
 The task is to retrieve a list of documents for the
current/last query, qn
 Relevance judgment is made based on how
relevant the documents are for qn, and how relevant
they are for information needs for the entire session
(in topic description)
 no need to segment the sessions
51
Dynamic Information Retrieval Modeling Tutorial
2015
1.pocono mountains pennsylvania
2.pocono mountains pennsylvania hotels
3.pocono mountains pennsylvania things to do
4.pocono mountains pennsylvania hotels
5.pocono mountains camelbeach
6.pocono mountains camelbeach hotel
7.pocono mountains chateau resort
8.pocono mountains chateau resort attractions
9.pocono mountains chateau resort getting to
10.chateau resort getting to
11.pocono mountains chateau resort directions
TREC 2012 Session 6
52
Information needs:
You are planning a winter vacation
to the Pocono Mountains region in
Pennsylvania in the US. Where will
you stay? What will you do while
there? How will you get there?
In a session, queries change
constantly
Dynamic Information Retrieval Modeling Tutorial
2015
Markov Decision Process
 We propose to model session search as a
Markov decision process (MDP)
 Two agents: the User and the Search Engine
53
[Guan, Zhang and Yang SIGIR 2013]
Settings of the Session MDP
 States: Queries
 Environments: Search results
 Actions:
 User actions:
 Add/remove/ unchange the query terms
 Nicely correspond to our definition of query change
 Search Engine actions:
 Increase/ decrease /remain term weights
54
[Guan, Zhang and Yang SIGIR 2013]
Search Engine Agent’s
Actions
| Term    | ∈ D_{i−1} | Action    | Example                                                                                                   |
| q_theme | Y         | increase  | “pocono mountain” in s6                                                                                   |
| q_theme | N         | increase  | “france world cup 98 reaction” in s28: france world cup 98 reaction stock market → france world cup 98 reaction |
| +Δq     | Y         | decrease  | ‘policy’ in s37: Merck lobbyists → Merck lobbyists US policy                                              |
| +Δq     | N         | increase  | ‘US’ in s37: Merck lobbyists → Merck lobbyists US policy                                                  |
| −Δq     | Y         | decrease  | ‘reaction’ in s28: france world cup 98 reaction → france world cup 98                                     |
| −Δq     | N         | no change | ‘legislation’ in s32: bollywood legislation → bollywood law                                               |
55 [Guan, Zhang and Yang SIGIR 2013]
Bellman Equation
 In a MDP, it is believed that a future reward is
not worth quite as much as a current reward
and thus a discount factor γ ϵ (0,1) is applied
to future rewards.
 Bellman Equation gives the optimal value
(expected long term reward starting from state
s and continuing with policy π from then on)
for an MDP:
56
V*(s) = max_a [ R(s, a) + γ Σ_{s′} P(s′ | s, a) V*(s′) ]
Our Tweak
 In a MDP, it is believed that a future reward is
not worth quite as much as a current reward
and thus a discount factor γ ϵ (0,1) is applied
to future rewards.
 In session search, a past reward is not worth
quite as much as a current reward and thus a
discount factor γ should be applied to past
rewards
 We model the MDP for session search in a reverse
order
57
Query Change retrieval Model
(QCM)
 Bellman Equation gives the optimal value for an MDP:
V*(s) = max_a [ R(s, a) + γ Σ_{s′} P(s′ | s, a) V*(s′) ]
 The reward function is used as the document relevance score function and is tweaked backwards from the Bellman equation:
Score(q_i, d) = P(q_i | d) + γ · P(q_i | q_{i−1}, D_{i−1}, a) · max_{D_{i−1}} P(q_{i−1} | D_{i−1})
where P(q_i | d) is the current reward/relevance score, P(q_i | q_{i−1}, D_{i−1}, a) is the query transition model, and max_{D_{i−1}} P(q_{i−1} | D_{i−1}) is the maximum past relevance.
[Guan, Zhang and Yang SIGIR 2013]
Calculating the Transition Model
• According to query change and search engine actions (tuning weights on each component are omitted here):
Score(q_i, d) = log P(q_i | d)
    + Σ_{t ∈ q_theme} [1 − P(t | d*_{i−1})] · log P(t | d)         (increase weights for theme terms)
    + Σ_{t ∈ +Δq, t ∉ d*_{i−1}} idf(t) · log P(t | d)              (increase weights for novel added terms)
    − Σ_{t ∈ +Δq, t ∈ d*_{i−1}} P(t | d*_{i−1}) · log P(t | d)     (decrease weights for old added terms)
    − Σ_{t ∈ −Δq} P(t | d*_{i−1}) · log P(t | d)                   (decrease weights for removed terms)
where log P(q_i | d) is the current reward/relevance score and d*_{i−1} is the most relevant document in D_{i−1}.
[Guan, Zhang and Yang SIGIR 2013]
Maximizing the Reward Function
 Generate a maximum rewarded document
denoted as d*
i-1, from Di-1
 That is the document(s) most relevant to qi-1
 The relevance score can be calculated as
P(q_{i−1} | d_{i−1}) = 1 − Π_{t ∈ q_{i−1}} [1 − P(t | d_{i−1})],   where P(t | d_{i−1}) = #(t, d_{i−1}) / |d_{i−1}|
 From several options, we choose to only use the document with top relevance: max_{D_{i−1}} P(q_{i−1} | D_{i−1})
60
Dynamic Information Retrieval Modeling Tutorial
2015 [Guan, Zhang and Yang SIGIR 2013]
Scoring the Entire Session
 The overall relevance score for a session of
queries is aggregated recursively :
Score_session(q_n, d) = Score(q_n, d) + γ · Score_session(q_{n−1}, d)
                      = Score(q_n, d) + γ [ Score(q_{n−1}, d) + γ · Score_session(q_{n−2}, d) ]
                      = Σ_{i=1}^{n} γ^{n−i} Score(q_i, d)
61
Dynamic Information Retrieval Modeling Tutorial
2015 [Guan, Zhang and Yang SIGIR 2013]
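A hedged sketch of the recursive session-level aggregation above; the per-query `score(q, d)` used in the demo is a stand-in (simple term overlap), not the actual QCM scoring function, and γ = 0.92 is just an example setting:

```python
def session_score(queries, doc, score, gamma=0.92):
    """Score_session(q_n, d) = sum_i gamma^(n-i) * Score(q_i, d):
    later queries in the session count more than earlier ones."""
    n = len(queries)
    return sum(gamma ** (n - i) * score(q, doc) for i, q in enumerate(queries, start=1))

# Placeholder per-query score based on term overlap (illustrative only)
def score(query, doc):
    q_terms, d_terms = set(query.split()), set(doc.split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

session = ["pocono mountains pennsylvania", "pocono mountains pennsylvania hotels"]
print(session_score(session, "cheap hotels in the pocono mountains pennsylvania", score))
```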
Experiments
 TREC 2011-2012 query sets, datasets
 ClueWeb09 Category B
62
Dynamic Information Retrieval Modeling Tutorial
2015
Search Accuracy (TREC
2012)
 nDCG@10 (official metric used in TREC)
| Approach               | nDCG@10 | %chg    | MAP    | %chg    |
| Lemur                  | 0.2474  | −21.54% | 0.1274 | −18.28% |
| TREC’12 median         | 0.2608  | −17.29% | 0.1440 | −7.63%  |
| Our TREC’12 submission | 0.3021  | −4.19%  | 0.1490 | −4.43%  |
| TREC’12 best           | 0.3221  | 0.00%   | 0.1559 | 0.00%   |
| QCM                    | 0.3353  | 4.10%†  | 0.1529 | −1.92%  |
| QCM+Dup                | 0.3368  | 4.56%†  | 0.1537 | −1.41%  |
63
Dynamic Information Retrieval Modeling Tutorial
2015
Search Accuracy (TREC
2011)
 nDCG@10 (official metric used in TREC)
| Approach               | nDCG@10 | %chg    | MAP    | %chg    |
| Lemur                  | 0.3378  | −23.38% | 0.1118 | −25.86% |
| TREC’11 median         | 0.3544  | −19.62% | 0.1143 | −24.20% |
| TREC’11 best           | 0.4409  | 0.00%   | 0.1508 | 0.00%   |
| QCM                    | 0.4728  | 7.24%†  | 0.1713 | 13.59%† |
| QCM+Dup                | 0.4821  | 9.34%†  | 0.1714 | 13.66%† |
| Our TREC’12 submission | 0.4836  | 9.68%†  | 0.1724 | 14.32%† |
64
Dynamic Information Retrieval Modeling Tutorial
2015
Search Accuracy for Different
Session Types
 TREC 2012 Sessions are classified into:
 Product: Factual / Intellectual
 Goal quality: Specific / Amorphous
| Approach  | Intellectual | %chg   | Amorphous | %chg   | Specific | %chg   | Factual | %chg   |
| TREC best | 0.3369       | 0.00%  | 0.3495    | 0.00%  | 0.3007   | 0.00%  | 0.3138  | 0.00%  |
| Nugget    | 0.3305       | −1.90% | 0.3397    | −2.80% | 0.2736   | −9.01% | 0.2871  | −8.51% |
| QCM       | 0.3870       | 14.87% | 0.3689    | 5.55%  | 0.3091   | 2.79%  | 0.3066  | −2.29% |
| QCM+DUP   | 0.3900       | 15.76% | 0.3692    | 5.64%  | 0.3114   | 3.56%  | 0.3072  | −2.10% |
QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process by studying changes among query transitions and modeling the dynamics.
Dynamic Information Retrieval Modeling Tutorial 2015
POMDP Model
Dynamic Information Retrieval Modeling Tutorial
201566
[Diagram: s0 →(a0, r0)→ s1 →(a1, r1)→ s2 →(a2, r2)→ s3 → ……, with observations o1, o2, o3 emitted along the way]
 Hidden states
 Observations
 Belief
¹R. D. Smallwood et al., ‘73
POMDP Definition
Dynamic Information Retrieval Modeling Tutorial
201567
 A tuple (S, M, A, R, γ, O, Θ, B)
 S : state space
 M: transition matrix
 A: action space
 R: reward function
 γ: discount factor, 0< γ ≤1
 O: observation set
an observation is a symbol emitted according to a hidden
state.
 Θ: observation function
Θ(s,a,o) is the probability that o is observed when the
system transitions into state s after taking action a, i.e.
P(o|s,a).
 B: belief space
Belief is a probability distribution over hidden states.
A Markov Chain of Decision Making
[Diagram: hidden decision-making states S1, S2, S3, …, Sn linked by user actions A1, A2, A3, A4]
q1 = “old US coins” → q2 = “collecting old US coins” → q3 = “selling old US coins”
D1: “D1 is relevant and I stay to find out more about collecting…”
D2: “D2 is relevant and I now move to the next topic…”
D3: “D3 is irrelevant; I slightly edit the query and stay here a little longer…”
[Luo, Zhang and Yang SIGIR 2014]
Hidden Decision Making States
SRT: Relevant & Exploitation      SRR: Relevant & Exploration
SNRT: Non-Relevant & Exploitation SNRR: Non-Relevant & Exploration
Example query transitions:
 scooter price ⟶ scooter stores
 collecting old US coins ⟶ selling old US coins
 Philadelphia NYC travel ⟶ Philadelphia NYC train
 Boston tourism ⟶ NYC tourism
q0
[Luo, Zhang and Yang SIGIR 2014]
Dual Agent Stochastic Game
 Hidden states, actions, rewards, Markov property
[Diagram: s0 →(a0, r0)→ s1 →(a1, r1)→ s2 →(a2, r2)→ s3 → ……]
 Dual-agent game: a User Agent and a Search Engine Agent
 Cooperative game
 Joint optimization
[Luo, Zhang and Yang SIGIR 2014]
71/33
Actions
 User Action (Au)
 add query terms (+Δq)
 remove query terms (-Δq)
 keep query terms (qtheme)
 Search Engine Action(Ase)
 Increase/ decrease/ keep term weights
 Switch on or off a search technique,
 e.g. to use or not to use query expansion
 adjust parameters in search techniques
 e.g., select the best k for the top k docs used in PRF
 Message from the user(Σu)
 clicked documents
 SAT clicked documents
 Message from search engine(Σse)
 top k returned documents
Messages are essentially
documents that an agent
thinks are relevant.
[Luo, Zhang and Yang SIGIR 2014]
Dual-agent Stochastic Game
[Diagram, built up over three slides: the user agent and the search engine agent both interact with the documents (the world); a belief updater sits between them, and the search engine’s message is Σ_se = D_top_returned]
[Luo, Zhang and Yang SIGIR 2014]
75/33
Observation function (O)
O(st+1, at, ωt) = P(ωt|st+1, at)
 Two types of observations
 Relevance related
 Exploration-exploitation related
Probability of making observation ωt after taking action
at and landing in state st+1
[Luo, Zhang and Yang SIGIR 2014]
Relevance-related Observation
 Intuition: s_t is likely to be Relevant if ∃ d ∈ D_{t−1} that is SAT clicked, and Non-Relevant otherwise
 It happens after the user sends out the message Σ_u^t (clicks)
O(⟨s_t = Rel, Σ_u⟩, ω_t = Rel) ≝ P(ω_t = Rel | s_t = Rel, Σ_u)
O(⟨s_t = Rel, Σ_u⟩, ω_t = Rel) ∝ P(s_t = Rel | ω_t = Rel) · P(ω_t = Rel | Σ_u)
 Similarly, we have O(⟨s_t = NonRel, Σ_u⟩, ω_t = NonRel) ∝ P(s_t = NonRel | ω_t = NonRel) · P(ω_t = NonRel | Σ_u)
 As well as O(⟨s_t = NonRel, Σ_u⟩, ω_t = Rel) and O(⟨s_t = Rel, Σ_u⟩, ω_t = NonRel)
[Luo, Zhang and Yang SIGIR 2014]
Exploration-related Observation
 It is a combined observation
 It happens when updating the before-message belief state for a user action a_u (query change) and a search engine message Σ_se = D_{t−1}
 Intuition: s_t is likely to be Exploration if (+Δq_t ≠ ∅ and +Δq_t ∉ D_{t−1}) or (+Δq_t = ∅ and −Δq_t ≠ ∅); Exploitation if (+Δq_t ≠ ∅ and +Δq_t ∈ D_{t−1}) or (+Δq_t = ∅ and −Δq_t = ∅)
O(⟨s_t = Exploitation, a_u = Δq_t, Σ_se = D_{t−1}⟩, ω_t = Exploitation) ∝ P(s_t = Exploitation | ω_t = Exploitation) · P(ω_t = Exploitation | Δq_t, D_{t−1})
O(⟨s_t = Exploration, a_u = Δq_t, Σ_se = D_{t−1}⟩, ω_t = Exploration) ∝ P(s_t = Exploration | ω_t = Exploration) · P(ω_t = Exploration | Δq_t, D_{t−1})
[Luo, Zhang and Yang SIGIR 2014]
Belief Updates (B)
 The belief state b is updated when a new observation is obtained (a code sketch follows below):
b_{t+1}(s_j) = P(s_j | ω_t, a_t, b_t)
             = P(ω_t | s_j, a_t, b_t) · Σ_{s_i ∈ S} P(s_j | s_i, a_t, b_t) b_t(s_i) / P(ω_t | a_t, b_t)
             = O(s_j, a_t, ω_t) · Σ_{s_i ∈ S} P(s_j | s_i, a_t, b_t) b_t(s_i) / P(ω_t | a_t, b_t)
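A minimal sketch of this belief update over the four decision-making states; the transition and observation values are placeholders, not the estimates used in the paper:

```python
import numpy as np

STATES = ["SRT", "SRR", "SNRT", "SNRR"]

def update_belief(b, T, O_col):
    """b_{t+1}(s_j) is proportional to O(s_j, a_t, w_t) * sum_i P(s_j | s_i, a_t) * b_t(s_i).
    T[i, j] = P(s_j | s_i, a_t); O_col[j] = O(s_j, a_t, w_t) for the observed w_t."""
    unnormalized = O_col * (b @ T)
    return unnormalized / unnormalized.sum()   # divide by P(w_t | a_t, b_t)

b = np.full(4, 0.25)                           # uniform prior over the hidden states
T = np.full((4, 4), 0.25)                      # placeholder transition model
O_col = np.array([0.1, 0.1, 0.3, 0.5])         # placeholder: observation favours non-relevant states
print(dict(zip(STATES, update_belief(b, T, O_col))))
```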
Joint Optimization — Win-Win
 The long term reward for the search engine agent:
Q_se(b, a) = Σ_{s∈S} b(s) R(s, a) + γ Σ_{ω∈Ω} P(ω | b, a_u, Σ_se) P(ω | b, Σ_u) max_a Q_se(b′, a)
 The long term reward for the user agent:
Q_u(b, a_u) = R(s, a_u) + γ max_{a_u} T(s_t | s_{t−1}, D_{t−1}) max_{s_{t−1}} Q_u(s_{t−1}, a_u)
            = P(q_t | d) + γ max_{a_u} P(q_t | q_{t−1}, D_{t−1}, a) max_{D_{t−1}} P(q_{t−1} | D_{t−1})
 Joint optimization:
a_se = argmax_a ( Q_se(b, a) + Q_u(b, a_u) )
[Luo, Zhang and Yang SIGIR 2014]
Dynamic Search Engine Demo
http://dumplingproject.org
Dynamic Information Retrieval Modeling Tutorial
201580
81/33
EXPERIMENTS
 Evaluate on TREC 2012 and 2013 Session Tracks
 The session logs contain
 session topic
 user queries
 previously retrieved URLs, snippets
 user clicks, and dwell time etc.
 Task: retrieve 2,000 documents for the last query in each session
 The evaluation is based on the whole session.
 A document related to any query in the session is a good document
81
 Datasets
 ClueWeb09
 ClueWeb12
 Spams, dups are
removed
82/33
ACTIONS
 increasing weights of the added terms by a factor of x = {1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75 or 2};
 decreasing weights of the added terms by a factor of y = {0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9 or 0.95};
 the Query Change Model (QCM) proposed in Guan et al. SIGIR’13:
Score(q_i, d) = P(q_i | d) + γ · P(q_i | q_{i−1}, D_{i−1}, a) · max_{D_{i−1}} P(q_{i−1} | D_{i−1})
 Pseudo Relevance Feedback, which assumes the top 20 retrieved documents are relevant;
 directly using the query in the current iteration to perform retrieval;
 combining all queries in a session, weighting them equally.
83/33
SEARCH ACCURACY
 Search accuracy on TREC 2012 Session Track
83
 Win-win outperforms most retrieval algorithms on
TREC 2012.
84/33
84
Win-win outperforms all retrieval algorithms
on TREC 2013.
 It is highly effective in Session Search.
 Search accuracy on TREC 2013 Session Track
SEARCH ACCURACY
85/33
IMMEDIATE SEARCH ACCURACY
85
 Original run: top returned documents provided by TREC log data
 Win-win’s immediate search accuracy is better than the Original at
every iteration
 Win-win's immediate search accuracy increases while the number
of search iterations increases
TREC 2012 Session Track TREC 2013 Session Track
Belief Updates (B) — TREC’13 session #87
Topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
The belief over the four hidden states (SRT: Relevant & Exploitation, SRR: Relevant & Exploration, SNRT: Non-Relevant & Exploitation, SNRR: Non-Relevant & Exploration) evolves from q0 as the session progresses:
 q1 = “best US destinations”, observation = NRR
   belief: SRT 0.1784, SRR 0.1135, SNRT 0.2838, SNRR 0.4243
 q2 = “distance New York Boston”, observation = RT
   belief: SRT 0.0005, SRR 0.0068, SNRT 0.0715, SNRR 0.9212
 q3 = “maps.bing.com”, observation = NRT
   belief: SRT 0.0151, SRR 0.4347, SNRT 0.0276, SNRR 0.5226
 ……
 q20 = “Philadelphia NYC train”, observation = NRT
   belief: SRT 0.0291, SRR 0.7837, SNRT 0.0081, SNRR 0.1790
 q21 = “Philadelphia NYC bus”, observation = NRT
   belief: SRT 0.0304, SRR 0.8126, SNRT 0.0066, SNRR 0.1505
Coffee Break
Dynamic Information Retrieval Modeling Tutorial
201595
Apply an MDP to an IR Problem
- Example
Dynamic Information Retrieval Modeling Tutorial
201596
 User agent in session search
 States – user’s relevance judgement
 Action – new query
 Reward – information gained
[Luo, Zhang, Yang SIGIR’14]
 The agent uses a state estimator to update its belief about the hidden states: b′ = SE(b, a, o′)
b′(s′) = P(s′ | o′, a, b) = P(s′, o′ | a, b) / P(o′ | a, b) = Θ(s′, a, o′) · Σ_s M(s, a, s′) b(s) / P(o′ | a, b)
POMDP → Belief Update
Dynamic Information Retrieval Modeling Tutorial
201597
POMDP → Bellman Equation
Dynamic Information Retrieval Modeling Tutorial
201598
 The Bellman equation for a POMDP:
V(b) = max_a [ r(b, a) + γ Σ_{o′} P(o′ | a, b) V(b′) ]
 A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ)
 B: the continuous belief space
 M′: transition function M′_a(b, b′) = Σ_{o′∈O} 1_{a,o′}(b′, b) Pr(o′ | a, b), where 1_{a,o′}(b′, b) = 1 if SE(b, a, o′) = b′ and 0 otherwise
 A: action space
 r: reward function r(b, a) = Σ_{s∈S} b(s) R(s, a)
Applying POMDP to Dynamic
IR
Dynamic Information Retrieval Modeling Tutorial
201599
| POMDP element        | Dynamic IR                                                                                               |
| Environment          | Documents                                                                                                |
| Agents               | User, Search engine                                                                                      |
| States               | Queries, user’s decision making status, relevance of documents, etc.                                     |
| Actions              | Provide a ranking of documents; weigh terms in the query; add/remove/unchange the query terms; switch on or switch off a search technology; adjust parameters for a search technology |
| Observations         | Queries, clicks, document lists, snippets, terms, etc.                                                   |
| Rewards              | Evaluation measures (such as DCG, NDCG or MAP); clicking information                                     |
| Transition matrix    | Given in advance or estimated from training data                                                         |
| Observation function | Problem dependent; estimated based on sample datasets                                                    |
Session Search Example - States
100
SRT: Relevant & Exploitation      SRR: Relevant & Exploration
SNRT: Non-Relevant & Exploitation SNRR: Non-Relevant & Exploration
Example query transitions:
 scooter price ⟶ scooter stores
 Hartford visitors ⟶ Hartford Connecticut tourism
 Philadelphia NYC travel ⟶ Philadelphia NYC train
 distance New York Boston ⟶ maps.bing.com
q0
[ J. Luo ,et al., ’14]
Dynamic Information Retrieval Modeling Tutorial
2015
Session Search Example - Actions
(Au, Ase)
101
 User Action(Au)
 Add query terms (+Δq)
 Remove query terms (-Δq)
 keep query terms (qtheme)
 clicked documents
 SAT clicked documents
 Search Engine Action(Ase)
 increase/decrease/keep term weights,
 Switch on or switch off query expansion
 Adjust the number of top documents used in PRF
 etc.
[ J. Luo et al., ’14]
Dynamic Information Retrieval Modeling Tutorial
2015
TREC Session Tracks (2010-
2012)
 Given a series of queries {q1,q2,…,qn}, top 10
retrieval results {D1, … Di-1 } for q1 to qi-1, and
click information
 The task is to retrieve a list of documents for the
current/last query, qn
 Relevance judgment is made based on how
relevant the documents are for qn, and how relevant
they are for information needs for the entire session
(in topic description)
 no need to segment the sessions
102
Dynamic Information Retrieval Modeling Tutorial
2015
Query change is an important
form of feedback
 We define query change as the syntactic editing changes between two adjacent queries:
Δq_i = q_i − q_{i−1}
 Δq_i includes
 +Δq_i, the added terms
 −Δq_i, the removed terms
 The unchanged/shared terms are called q_theme, the theme terms
Example: q1 = “bollywood legislation”, q2 = “bollywood law” → theme term = “bollywood”
Dynamic Information Retrieval Modeling Tutorial
2015
Where do these query changes come
from?
 Given TREC Session settings, we consider two
sources of query change:
 the previous search results that a user
viewed/read/examined
 the information need
 Example:
 Kurosawa → Kurosawa wife
 `wife’ is not in any previous results, but in the topic
description
 However, knowing information needs before
search is difficult to achieve
104
Dynamic Information Retrieval Modeling Tutorial
2015
Previous search results could
influence query change in quite
complex ways
 Merck lobbyists → Merck lobbying US policy
 D1 contains several mentions of ‘policy’, such as
 “A lobbyist who until 2004 worked as senior policy
advisor to Canadian Prime Minister Stephen Harper was
hired last month by Merck …”
 These mentions are about Canadian policies; while
the user adds US policy in q2
 Our guess is that the user might be inspired by
‘policy’, but he/she prefers a different sub-concept
other than `Canadian policy’
 Therefore, for the added terms `US policy’, ‘US’ is the
novel term here, and ‘policy’ is not since it appeared
in D1.
 The two terms should be treated differently
105
Dynamic Information Retrieval Modeling Tutorial
2015
POMDP (Partially Observable Markov Decision Process)
• Rich Interactions → actions
• Hidden, Evolving Information Needs → hidden states
• A Long Term Goal → rewards
• Temporal Dependency → Markov property
• Multi-agent Collaboration → SG (Stochastic Games)
Recap – Characteristics of
Dynamic IR
Dynamic Information Retrieval Modeling Tutorial
2015107
 Rich interactions
Query formulation, Document clicks, Document
examination, eye movement, mouse movements, etc.
 Temporal dependency
 Overall goal
Modeling Query Change
 A framework that is inspired by Reinforcement
Learning
 Reinforcement Learning for Markov Decision
Process
 models a state space S and an action space A
according to a transition model T = P(si+1|si ,ai)
 a policy π(s) = a indicates that at a state s, what are
the actions a can be taken by the agent
 each state is associated with a reward function R that indicates the possible positive reward or negative loss that a state and an action may result in.
 Reinforcement learning offers general solutions to MDPs and seeks the best policy for an agent.
Outline
Dynamic Information Retrieval Modeling Tutorial
2015109
 Introduction & Theory
 Session Search
 Dynamic Ranking
 Multi Armed Bandits
 Portfolio Ranking
 Multi-Page Search
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
Dynamic Information Retrieval Modeling Tutorial
2015110
 Markov Process
 Hidden Markov Model
 Markov Decision Process
 Partially Observable Markov Decision Process
 Multi-Armed Bandit
Family of Markov Models
Multi Armed Bandits (MAB)
Dynamic Information Retrieval Modeling Tutorial
2015111
……
……
Which slot
machine
should I select
in this round?
Reward
Multi Armed Bandits (MAB)
Dynamic Information Retrieval Modeling Tutorial
2015112
I won! Is this
the best slot
machine?
Reward
MAB Definition
Dynamic Information Retrieval Modeling Tutorial
2015113
 A tuple (S, A, R, B)
S : hidden reward distribution of each
bandit
A: choose which bandit to play
R: reward for playing bandit
B: belief space, our estimate of each
bandit’s distribution
Comparison with Markov Models
Dynamic Information Retrieval Modeling Tutorial
2015114
 Single state Markov Decision Process
No transition probability
 Similar to POMDP in that we maintain a
belief state
 Action = choose a bandit, does not
affect state
 Does not ‘plan ahead’ but intelligently
adapts
 Somewhere between interactive and
dynamic IR
MAB Policy Reward
Dynamic Information Retrieval Modeling Tutorial
2015115
 MAB algorithm describes a policy 𝜋 for
choosing bandits
 Maximise rewards from chosen bandits
over all time steps
 Minimize regret
Regret = Σ_{t=1}^{T} [ Reward(a*) − Reward(a_{π(t)}) ]
 Cumulative difference between optimal
reward and actual reward
Exploration vs Exploitation
Dynamic Information Retrieval Modeling Tutorial
2015116
 Exploration
 Try out bandits to find which has highest average
reward
 Exploitation
 Too much exploration leads to poor performance
 Play bandits that are known to pay out higher
reward on average
 MAB algorithms balance exploration and
exploitation
 Start by exploring more to find best bandits
 Exploit more as best bandits become known
MAB – Index Algorithms
Dynamic Information Retrieval Modeling Tutorial
2015117
 Gittins index1
 Play bandit with highest ‘Dynamic Allocation Index’
 Modelled using MDP but suffers ‘curse of
dimensionality’
 𝜖-greedy2
 Play highest reward bandit with probability 1 − ϵ
 Play random bandit with probability 𝜖
 UCB (Upper Confidence Bound)3
1J. C. Gittins. ‘89
2Nicolò Cesa-Bianchi et. al.,
‘98
Comparison of Markov
Models
Dynamic Information Retrieval Modeling Tutorial
2015118
 Markov Process – a fully observable stochastic
process
 Hidden Markov Model – a partially observable
stochastic process
 MDP – a fully observable decision process
 MAB – a decision process, either fully or partially
observable
 POMDP – a partially observable decision process
| Model               | Actions | Rewards | States       |
| Markov Process      | No      | No      | Observable   |
| Hidden Markov Model | No      | No      | Unobservable |
| MDP                 | Yes     | Yes     | Observable   |
| POMDP               | Yes     | Yes     | Unobservable |
| MAB                 | Yes     | Yes     | Fixed        |
Outline
Dynamic Information Retrieval Modeling Tutorial
2015119
 Introduction & Theory
 Session Search
 Dynamic Ranking
 Multi Armed Bandits
 Portfolio Ranking
 Multi-Page Search
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
UCB Algorithm
Dynamic Information Retrieval Modeling Tutorial 2015
 For every bandit i, calculate  x̄_i + sqrt(2 ln t / T_i)  and select the highest (built up over slides 120–125)
 Average reward x̄_i
 Time step t
 Number of times bandit i has been played T_i
 Chances of playing infrequently played bandits increase over time
(A code sketch follows below.)
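A compact UCB1 sketch following the index formula above; the Bernoulli payout probabilities of the demo bandits are made up:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Play each arm once, then always pick argmax_i  mean_i + sqrt(2 ln t / T_i)."""
    counts = [0] * n_arms          # T_i
    means = [0.0] * n_arms         # running average reward x_i
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # initialisation: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return means, counts

# Demo with three Bernoulli slot machines (payout probabilities are illustrative)
probs = [0.2, 0.5, 0.7]
random.seed(0)
means, counts = ucb1(lambda i: 1.0 if random.random() < probs[i] else 0.0, 3, 2000)
print(means, counts)   # the best arm should accumulate most of the plays
```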
Iterative Expectation
Dynamic Information Retrieval Modeling Tutorial 2015
 Apply the UCB idea to ranking: for documents i, calculate  r̄_i + λ · sqrt(2 ln t / γ_i(t))  (built up over slides 126–130; a ranking sketch follows below)
 Average probability of relevance r̄_i
 ‘Effective’ number of impressions  γ_i(t) = Σ_{k=1}^{t} α^{C_k} β^{1−C_k}
 α and β reward clicks and non-clicks depending on rank
 Exploration parameter λ
M. Sloan and J. Wang ‘1
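A rough sketch of ranking documents by the modified index above; the values of α, β and λ, and the way the running relevance estimate is updated, are placeholders rather than the scheme from the cited paper:

```python
import math

def iterative_expectation_rank(docs, t, lam=0.5):
    """Rank by  r_i + lam * sqrt(2 ln t / gamma_i(t)), where gamma_i is an
    'effective' impression count accumulated from clicks and non-clicks."""
    def index(d):
        return d["mean_rel"] + lam * math.sqrt(2 * math.log(t) / max(d["gamma"], 1e-6))
    return sorted(docs, key=index, reverse=True)

def record_feedback(doc, clicked, alpha=1.0, beta=0.2):
    """Placeholder update: grow gamma_i by alpha per click and beta per non-click,
    and re-estimate relevance as clicks per effective impression."""
    doc["gamma"] += alpha if clicked else beta
    doc["clicks"] += 1 if clicked else 0
    doc["mean_rel"] = doc["clicks"] / doc["gamma"]

docs = [{"id": i, "mean_rel": 0.0, "gamma": 0.0, "clicks": 0} for i in range(4)]
record_feedback(docs[2], clicked=True)
# Unexamined documents keep a large exploration bonus and float to the top
print([d["id"] for d in iterative_expectation_rank(docs, t=10)])
```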
Portfolio Theory of IR
Dynamic Information Retrieval Modeling Tutorial
2015131
 Portfolio Theory maximises expected return for a
given amount of risk1
 Diversity of portfolio increases likely return
 We can consider documents as ‘shares’
 Documents are dependent on one another, unlike
PRP
 Portfolio Theory of IR2 allows us to introduce diversity
1H. Markowitz. ‘52
2J. Wang et. al. ‘09
Portfolio Ranking
Dynamic Information Retrieval Modeling Tutorial
2015132
 Documents are dependent on each other
 Co-click Matrix from users and logs1
 Portfolio Armed Bandit Ranking2:
 Exploratively rank using Iterative Expectation
 Diversify using portfolio optimisation over co-click matrix
 Update relevance and dependence with each click
 Both explorative and diverse
1W. Wu et al. ‘11
2M. Sloan and Jun Wang‘1
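Following the Portfolio Armed Bandit Ranking idea above, here is a rough greedy mean–variance selection sketch over a co-click/similarity matrix; the objective form, the risk-aversion weight and the toy numbers (reused from the jaguar example) are illustrative assumptions, not the exact formulation in the cited papers:

```python
import numpy as np

def portfolio_rank(theta, cov, k, risk_aversion=0.5):
    """Greedily build a ranking that trades expected relevance off against the
    variance/correlation of the documents already selected (diversification)."""
    selected, candidates = [], list(range(len(theta)))
    while candidates and len(selected) < k:
        def objective(i):
            chosen = selected + [i]
            sub = cov[np.ix_(chosen, chosen)]
            return theta[chosen].sum() - risk_aversion * sub.sum()   # E[return] - b * risk
        best = max(candidates, key=objective)
        selected.append(best)
        candidates.remove(best)
    return selected

theta = np.array([0.9, 0.51, 0.5, 0.49])                    # relevance estimates
cov = np.array([[1, .8, .1, 0], [.8, 1, .1, 0],
                [.1, .1, 1, .95], [0, 0, .95, 1]], float)   # co-click / similarity matrix
print(portfolio_rank(theta, cov, k=2))                      # prefers relevant but dissimilar docs
```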
Outline
Dynamic Information Retrieval Modeling Tutorial
2015133
 Introduction & Theory
 Session Search
 Dynamic Ranking
 Multi Armed Bandits
 Portfolio Ranking
 Multi-Page Search
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
Multi Page Search
Dynamic Information Retrieval Modeling Tutorial
2015134
Page 1 Page 2
2.
1.
2.
1.
X Jin, M. Sloan and J. Wang
’13
Multi Page Search Example -
States & Actions
Dynamic Information Retrieval Modeling Tutorial
2015135
State: Relevance of documents
Action: Ranking of documents
Observation: Clicks
Belief: Multivariate Gaussian
Reward: DCG over 2 pages
X Jin, M. Sloan and J. Wang
’13
Model
Dynamic Information Retrieval Modeling Tutorial
2015136
Model
Dynamic Information Retrieval Modeling Tutorial
2015137
 Prior belief: N(θ¹, Σ¹)
 θ¹ – prior estimate of relevance
 Σ¹ – prior estimate of covariance
 Document similarity
 Topic Clustering
Model
Dynamic Information Retrieval Modeling Tutorial
2015138
 Rank action for page 1
Model
Dynamic Information Retrieval Modeling Tutorial
2015139
Model
Dynamic Information Retrieval Modeling Tutorial
2015140
 Feedback from page 1
 r ~ N(θ_s¹, Σ_s¹)
Model
Dynamic Information Retrieval Modeling Tutorial
2015141
 Update estimates using r¹, the feedback on the page-1 documents s, by partitioning the prior over shown documents s and remaining documents s′:
θ¹ = [θ_s ; θ_s′],    Σ¹ = [[Σ_s, Σ_{s s′}], [Σ_{s′ s}, Σ_s′]]
 θ² = θ_s′ + Σ_{s′ s} Σ_s⁻¹ (r¹ − θ_s)
 Σ² = Σ_s′ − Σ_{s′ s} Σ_s⁻¹ Σ_{s s′}
(A numpy sketch follows below.)
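A small numpy sketch of the conditional-Gaussian update above, reusing the toy covariance matrix from the jaguar example; the split into shown/unshown documents and the feedback values are assumed for illustration:

```python
import numpy as np

def posterior_update(theta, cov, shown, feedback):
    """Condition the joint Gaussian N(theta, cov) on observed relevance feedback
    for the 'shown' documents; returns the posterior over the remaining ones."""
    rest = [i for i in range(len(theta)) if i not in shown]
    S_ss = cov[np.ix_(shown, shown)]          # Sigma_s
    S_rs = cov[np.ix_(rest, shown)]           # Sigma_{s' s}
    S_rr = cov[np.ix_(rest, rest)]            # Sigma_s'
    gain = S_rs @ np.linalg.inv(S_ss)
    theta_post = theta[rest] + gain @ (feedback - theta[shown])
    cov_post = S_rr - gain @ S_rs.T
    return rest, theta_post, cov_post

theta = np.array([0.9, 0.51, 0.5, 0.49])
cov = np.array([[1, .8, .1, 0], [.8, 1, .1, 0],
                [.1, .1, 1, .95], [0, 0, .95, 1]], float)
rest, theta2, cov2 = posterior_update(theta, cov, shown=[0, 2], feedback=np.array([1.0, 0.0]))
print(rest, theta2)   # e.g. a click on one car image raises the other car document's estimate
```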
Model
Dynamic Information Retrieval Modeling Tutorial
2015142
 Rank using PRP
Model
Dynamic Information Retrieval Modeling Tutorial
2015143
 Utility of a ranking (DCG over the two pages):
U = λ Σ_{j=1}^{M} θ_{s_j}¹ / log₂(j+1)  +  (1 − λ) Σ_{j=M+1}^{2M} θ_{s_j}² / log₂(j+1)
Model – Bellman Equation
Dynamic Information Retrieval Modeling Tutorial
2015144
 Optimize s¹ to improve U_s²
 V(θ¹, Σ¹, 1) = max_{s¹} [ λ θ_s¹ · W¹ + E( max_{s²} (1 − λ) θ_s² · W² ) ]
(the expectation is over the page-1 feedback; see the Monte Carlo approximation below)
Dynamic Information Retrieval Modeling Tutorial
2015145
 Balances exploration and exploitation in page 1
 Tuned for different queries
 Navigational
 Informational
 𝜆 = 1 for non-ambiguous search
Approximation
Dynamic Information Retrieval Modeling Tutorial
2015146
 Monte Carlo Sampling:
V ≈ max_{s¹} [ λ θ_s¹ · W¹ + max_{s²} (1 − λ) (1/S) Σ_{r∈O} θ_s² · W² P(r) ]
 Sequential Ranking Decision
Experiment Data
Dynamic Information Retrieval Modeling Tutorial
2015147
 Difficult to evaluate without access to live users
 Simulated using 3 TREC collections and
relevance judgements
 WT10G – Explicit Ratings
 TREC8 – Clickthroughs
 Robust – Difficult (ambiguous) search
User Simulation
Dynamic Information Retrieval Modeling Tutorial
2015148
 Rank M documents
 Simulated user clicks according to relevance
judgements
 Update page 2 ranking
 Measure at page 1 and 2
 Recall
 Precision
 nDCG
 MRR
 BM25 – prior ranking model
Investigating λ
Dynamic Information Retrieval Modeling Tutorial
2015149
Baselines
Dynamic Information Retrieval Modeling Tutorial
2015150
 𝜆 determined experimentally
 BM25
 BM25 with conditional update (𝜆 = 1)
 Maximum Marginal Relevance (MMR)
 Diversification
 MMR with conditional update
 Rocchio
 Relevance Feedback
Results
Dynamic Information Retrieval Modeling Tutorial 2015
[Result plots for the multi-page search experiments, slides 151–155]
Outline
Dynamic Information Retrieval Modeling Tutorial
2015156
 Introduction & Theory
 Session Search
 Dynamic Ranking
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
Cold-start problem in recommender systems
Interactive Recommender Systems
Possible Solutions
Zhao, Xiaoxue, Weinan Zhang, and Jun
Wang. "Interactive collaborative filtering."
CIKM, 2013.
Objective
Cold-start problem
Interactive mechanism for CF
Zhao, Xiaoxue, Weinan Zhang, and Jun
Wang. "Interactive collaborative filtering."
CIKM, 2013.
Proposed EE algorithms
Thompson Sampling
Linear-UCB
General Linear-UCB
Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CI
2013.
Cold-start users
Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM
2013.
Ad selection problem
Dynamic Information Retrieval Modeling Tutorial
2015163
 How can online publishers optimally select ads to maximize their ad income over time?
Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM
2012
Selling in multiple channels with non-fixed prices
Dynamic Information Retrieval Modeling Tutorial
2015164
Problem formulation
Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM
2012
Problem formulation
Dynamic Information Retrieval Modeling Tutorial
2015165
Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM
2012
Objective function
Dynamic Information Retrieval Modeling Tutorial
2015166
Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM
2012
Belief update
Dynamic Information Retrieval Modeling Tutorial
2015167
Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM
2012
Results
Dynamic Information Retrieval Modeling Tutorial
2015168
Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM
2012
Outline
Dynamic Information Retrieval Modeling Tutorial
2015169
 Introduction & Theory
 Session Search
 Dynamic Ranking
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on
Dynamic Information Retrieval Modeling
Charlie Clarke
(with much much input from Mark Smucker)
University of Waterloo, Canada
Moving from static ranking to dynamic domains
• How to extend IR evaluation methodologies to
dynamic domains?
• Three key ideas:
1. Realistic models of searcher interactions
2. Measure costs to searcher in meaningful units
(e.g., time, money, …)
3. Measure benefits to searcher in meaningful units
(e.g., time, nuggets, …)
Charles Clarke, University of Waterloo 171
This talk strongly reflects my opinions (not trying to be neutral).
But I am the guest speaker 
Evaluating Information Access Systems
Charles Clarke, University of Waterloo 172
searching, browsing, summarization,
visualization, desktop, mobile, web,
books, images, questions, etc., and
combinations of these
Does the system work for its users?
Will this change make the system better or worse?
How do we quantify performance?
Performance 101: Is this a good search result?
Charles Clarke, University of Waterloo 173
How to evaluate?
Study users
Charles Clarke, University of Waterloo 174
Users in the wild:
• A/B Testing
• Result interleaving
• Clicks and dwell time
• Mouse movements
• Other implicit feedback
• …
Users in the lab:
• Time to task completion
• Think aloud protocols
• Questionnaires
• Eye tracking
• …
Unfortunately user studies are
• Slow
• Expensive
• Conditions can never be exactly duplicated
(e.g., learning to rank)
Charles Clarke, University of Waterloo 175
Alternative: User performance prediction
Can we predict the impact of a proposed change to an
information access system (while respecting and reflecting
differences between users)?
Can we quantify performance improvements in meaningful
units so that effect sizes can be considered in statistical
testing? Are improvements practically significant, as well as
statistically significant?
Want to predict the impact of a proposed change
automatically, based on existing user performance data,
rather than gathering new performance data.
Charles Clarke, University of Waterloo 176
The BIG goal
↵
Traditional Evaluation of Rankers
• Test collection:
– Documents
– Queries
– Relevance judgments
• Each ranker generates a ranked list of
documents for each query
• Score ranked lists using relevance judgments
and standard metrics (recall, mean average
precision, nDCG, ERR, RBP, ….).
Charles Clarke, University of Waterloo 177
Charles Clarke, University of Waterloo 178
Example of a good-old-fashioned IR Metric
Ranked list of documents with precision at rank N, the fraction of documents that are relevant in the first N documents:
| Rank | Document     | Precision at rank N |
| 1    | Non-relevant | 0.00                |
| 2    | Relevant     | 0.50                |
| 3    | Non-relevant | 0.33                |
| 4    | Non-relevant | 0.25                |
| 5    | Relevant     | 0.40                |
| 6    | Non-relevant | 0.33                |
| 7    | Non-relevant | 0.29                |
| 8    | …            | …                   |
Average Precision is the average of the precision at N for each relevant document:
AP = (1/R) Σ_{R_i} Prec(R_i)
Mean average precision (MAP) is AP averaged over the set of queries.
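A tiny function reproducing the worked example above (it assumes R, the total number of relevant documents for the query, is known):

```python
def average_precision(relevance, num_relevant):
    """relevance: list of 0/1 judgments in rank order.
    AP = (1/R) * sum of precision@N at each relevant rank N."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank          # precision at this relevant rank
    return total / num_relevant if num_relevant else 0.0

# The ranked list from the slide: N, R, N, N, R, N, N (precision 0.50 and 0.40 at the hits)
print(average_precision([0, 1, 0, 0, 1, 0, 0], num_relevant=2))   # 0.45 if only two relevant docs exist
```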
General form of effectiveness measures
Nearly all standard effectiveness measures
have the same basic form (including nDCG,
RBP, ERR, average precision,…):
Charles Clarke, University of Waterloo 179
[Equation figure: (normalization) × Σ over rank k of (gain at rank k) × (discount factor)]
Implicit user model…
• User works down the ranked list spending
equal time on each document. Captions,
navigation, etc., have no impact.
• If they make it to rank i, they receive some
benefit (i.e., gain).
• Eventually they stop, which is reflected in the
discount (i.e., they are less likely to reach
lower ranks).
• Normalization typically maps the score into
the range [0:1]. Units may not be meaningful.
Charles Clarke, University of Waterloo 180
Traditional Evaluation of Rankers
• Many effectiveness measures: precision,
recall, average precision, rank-biased
precision, discounted cumulative gain, etc.
• Widely used and accepted as standard
practice.
• But…
• What does an improvement in average precision from
0.28 to 0.31 mean to users?
• Does an increase in the measure really translate to an
improved user experience?
• How will an improvement in the performance of a single
component impact overall system performance?
Charles Clarke, University of Waterloo 181
How to better reflect user variation and system performance?
Charles Clarke, University of Waterloo 182
Example: What’s the simplest possible user interface for search?
1) User issues a query
2) System returns material to read
i.e., system returns stuff to read, in order
(not a list of documents; more like a newspaper article)
A correspondingly simple user model, has two parameters:
1) Reading speed
2) Time spent reading
Reading speed distribution (from users in the lab)
Charles Clarke, University of Waterloo 183
Empirical distribution of reading speed during an information access task,
and its fit to a log-normal distribution.
Stopping time distribution (from users in the wild)
Charles Clarke, University of Waterloo 184
Empirical distribution of time spent searching during an information access
task, and its fit to a log-normal distribution.
Evaluating a search result
Charles Clarke, University of Waterloo 185
1) Generate a reading speed from the distribution
2) Generate a stopping time from the distribution
3) How much useful material did the user read?
4) Repeat for many (simulated) users
As an example, we use passage retrieval runs from TREC 2006
Hard Track, which essentially assume our simple user interface.
We measure costs to searcher in terms of time spent searching.
We measure benefits to searcher in terms of “time well spent”.
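A rough simulation sketch of steps 1–4 above, drawing a reading speed and a stopping time from log-normal distributions; all distribution parameters and the passage data are invented placeholders, not the calibrated values from the user studies:

```python
import random

def simulate_time_well_spent(ranked_passages, n_users=10000, seed=1):
    """ranked_passages: list of (length_in_chars, is_relevant) in reading order.
    Returns the mean 'time well spent' (minutes spent reading relevant text)."""
    rng = random.Random(seed)
    total_useful = 0.0
    for _ in range(n_users):
        speed = rng.lognormvariate(6.5, 0.4)     # chars per minute (placeholder parameters)
        stop = rng.lognormvariate(1.5, 0.8)      # minutes available (placeholder parameters)
        time_left, useful = stop, 0.0
        for length, relevant in ranked_passages:
            cost = length / speed                # minutes needed to read this passage
            spent = min(cost, time_left)
            if relevant:
                useful += spent                  # "time well spent"
            time_left -= spent
            if time_left <= 0:
                break
        total_useful += useful
    return total_useful / n_users

passages = [(800, False), (1200, True), (600, False), (1500, True)]
print(simulate_time_well_spent(passages))
```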
Useful characters read vs. Characters read
Charles Clarke, University of Waterloo 186
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Useful characters read vs. Time spent reading
Charles Clarke, University of Waterloo 187
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Time well spent vs. Time spent reading
Charles Clarke, University of Waterloo 188
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Distribution of time well spent
Charles Clarke, University of Waterloo 189
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Temporal precision vs. Time spent Reading
Charles Clarke, University of Waterloo 190
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Distribution of temporal precision
Charles Clarke, University of Waterloo 191
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
General Framework (Part I): Cumulative Gain
• Consider the performance of a system in terms
of a cost-benefit (cumulative gain) curve G(t).
– Measure costs (e.g., in terms of time spent).
– Measure benefits (e.g., in terms of time well
spent).
• A particular instance of G(t) represents a
single user (described by a set of parameters)
interacting with a system. not just a list!!!
• G(t) captures factors intrinsic to the system.
We don’t know how much time the user has to
invest, but for different levels of investment,
G(t) indicates the benefit.Charles Clarke, University of Waterloo 192
General Framework (Part II): Decay
• Consider the user’s willingness to invest time in
terms of a decay curve D(t), which provides a
survival probability.
• We assume that G(t) and D(t) are independent.
(System dependent stopping probabilities are
accommodated in G(t). Details on request.)
• D(t) captures factors extrinsic to the system.
The user only has so much time they could
invest. They cannot invest more, even if they
would receive substantial additional benefit
from further interaction.
Charles Clarke, University of Waterloo 193
General form of effectiveness measures (REMINDER)
Nearly all standard effectiveness measures
have the same basic form (including nDCG,
RBP, ERR, average precision,…):
Charles Clarke, University of Waterloo 194
[Equation figure, as before: (normalization) × Σ over rank k of (gain at rank k) × (discount factor)]
General Framework (Part III): Time-biased gain
Overall system performance may be expressed
as expected cumulative gain (which also
incorporates standard effectiveness measures):
Charles Clarke, University of Waterloo 195
[Equation figure: (normalization, == 1?) × ∫ over time t of (gain at time t) × (decay factor)]
General Framework (Part IV): Multiple users
• Cumulative gain may be computed by
– Simulation (drawing a set of parameters from a
population of users).
– Measuring actual interaction on live systems.
– Combinations of measurement and simulation.
• Simulating and/or measuring multiple users
allows us to consider performance difference
across the population of users.
• Simulation provides matched pairs (the same
user on both systems), increasing our ability to
detect differences; a minimal sketch follows below.
Charles Clarke, University of Waterloo 196
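A compact sketch of the matched-pairs idea: the same sampled user parameters are applied to both systems, so paired differences reflect the systems rather than user variation (all parameter values, rankings, and the gain function are illustrative):

import random

def sample_user():
    # Draw one user's parameters once; reuse them on both systems.
    return {"speed": random.lognormvariate(3.2, 0.3),      # chars per second
            "stop_time": random.lognormvariate(4.8, 1.0)}  # seconds available

def gain(system_ranking, user):
    # Placeholder for the cumulative-gain computation G(t) of the previous slides.
    budget = user["speed"] * user["stop_time"]
    total, read = 0.0, 0.0
    for length, relevant in system_ranking:
        if read + length > budget:
            break
        read += length
        total += relevant
    return total

ranking_a = [(1200, 1), (800, 0), (1500, 1)]
ranking_b = [(600, 1), (900, 1), (2000, 0)]
paired_diffs = []
for _ in range(10000):
    user = sample_user()                                   # same user on both systems
    paired_diffs.append(gain(ranking_a, user) - gain(ranking_b, user))
print(sum(paired_diffs) / len(paired_diffs))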
General Framework
Most of the evaluation proposals in the
references can be reformulated in terms of this
general framework, including those that
address issues of:
– Novelty and diversity
– Filtering, summarization, question answering
– Session search, etc.
Charles Clarke, University of Waterloo 197
One more example from our current research…
Session search example
• Two (or more) result lists, e.g., from query
reformulation, query suggestion, or switching
search engines.
• Modeling searcher interaction requires a
switch from one result to another.
• The optimal time to switch depends on the
total time available to search.
For example (with many details omitted…):
Charles Clarke, University of Waterloo 198
Simulation of searchers switching between lists: A vs. B
Charles Clarke, University of Waterloo 199
User starts on list A.
If the user has less
than five minutes to
search, they should
stay on list A.
If the user has more
than five minutes to
search, they should
leave list A after 90
seconds.
But can we assume
optimal behavior when
modeling users?
Simulation of searchers switching between lists: A vs. B
Charles Clarke, University of Waterloo 200
[Plot: average gain (relevant documents seen) vs. switch time in minutes (0-10), with one curve per session duration: 2, 4, 6, 8, and 10 minutes.]
Topic = 389, List A = sab05ror1, List B = uic0501
Different view of the
same simulation, with
thousands of simulated
users.
Here, benefits are
measured by number of
relevant documents
seen.
Optimal switching time
depends on session
duration.
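A rough sketch of this kind of switching simulation under simplifying assumptions (two ranked lists with per-document gains and reading times, a single switch point, and a fixed session duration); all numbers are illustrative and the real simulation involves many details omitted here:

def session_gain(list_a, list_b, switch_time, session_duration):
    """User reads list A from time 0 until switch_time, then reads list B
    from the top until session_duration. Each entry is (gain, seconds_to_read)."""
    def read(lst, start, end):
        t, g = start, 0.0
        for gain, seconds in lst:
            if t + seconds > end:
                break
            t += seconds
            g += gain
        return g
    return read(list_a, 0.0, switch_time) + read(list_b, switch_time, session_duration)

list_a = [(1, 40), (0, 30), (1, 50), (0, 60), (1, 45)]
list_b = [(0, 35), (1, 25), (1, 40), (1, 55), (0, 30)]
for duration in (120, 240, 360, 480, 600):    # 2-10 minute sessions
    best = max(range(0, duration + 1, 30),
               key=lambda s: session_gain(list_a, list_b, s, duration))
    print(duration, best, session_gain(list_a, list_b, best, duration))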
Summary
• Primary goal of IR evaluation: Predict how changes
to an IR system will impact the user experience.
• Evaluation in dynamic domains requires us to
explicitly model the system interface and the user’s
search behavior. Costs and benefits must be
measured in meaningful units (e.g., time).
• Successful IR evaluation requires measurement of
users, both “in the wild” and in the lab. These
measurements calibrate models, which make
predictions, which improve systems.
Charles Clarke, University of Waterloo 201
A few key papers
• Leif Azzopardi. 2009. Usage based effectiveness measures: monitoring application
performance in information retrieval. In Proceedings of the 18th ACM conference on
Information and knowledge management (CIKM '09).
• Leif Azzopardi, Diane Kelly, and Kathy Brennan. 2013. How query cost affects search
behavior. In Proceedings of the 36th international ACM SIGIR conference on Research and
development in information retrieval (SIGIR '13).
• Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time drives interaction:
simulating sessions in diverse searching environments. In Proceedings of the 35th
international ACM SIGIR conference on research and development in information retrieval
(SIGIR '12).
• Ben Carterette. 2011. System effectiveness, user models, and user utility: a conceptual
framework for investigation. In Proceedings of the 34th international ACM SIGIR
conference on research and development in Information Retrieval (SIGIR '11).
• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2011. Simulating simple user
behavior for system effectiveness evaluation. In Proceedings of the 20th ACM international
conference on information and knowledge management (CIKM '11).
• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2012. Incorporating variability in
user behavior into systems based evaluation. In Proceedings of the 21st ACM international
conference on information and knowledge management (CIKM '12).
Charles Clarke, University of Waterloo 202
A few more key papers
• Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected
reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on
information and knowledge management (CIKM '09).
• Charles L.A. Clarke, Nick Craswell, Ian Soboroff, and Azin Ashkan. 2011. A comparative
analysis of cascade measures for novelty and diversity. In Proceedings of the fourth ACM
international conference on web search and data mining (WSDM '11).
• Charles L. A. Clarke and Mark D. Smucker. 2014. Time well spent. In Proceedings of the
5th information interaction in context symposium (IIiX '14).
• Georges Dupret and Mounia Lalmas. 2013. Absence time and user engagement:
evaluating ranking functions. In Proceedings of the sixth ACM international conference on
web search and data mining (WSDM '13).
• Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008.
Discounted cumulated gain based evaluation of multiple-query IR sessions. In Proceedings
of the IR research, 30th European conference on Advances in information retrieval
(ECIR'08).
• Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and
the cube test: multi-dimensional evaluation for professional search. In Proceedings of the
22nd ACM international conference on information & knowledge management (CIKM '13).
Charles Clarke, University of Waterloo 203
And yet more key papers
• Tetsuya Sakai and Zhicheng Dou. 2013. Summaries, ranked retrieval and sessions: a
unified framework for information access evaluation. In Proceedings of the 36th
international ACM SIGIR conference on Research and development in information retrieval
(SIGIR '13).
• Mark D. Smucker and Charles L.A. Clarke. 2012. Time-based calibration of effectiveness
measures. In Proceedings of the 35th international ACM SIGIR conference on Research
and development in information retrieval (SIGIR '12).
• Mark D. Smucker and Charles L. A. Clarke. 2012. Modeling user variance in time-biased
gain. In Proceedings of the Symposium on Human-Computer Interaction and Information
Retrieval (HCIR '12).
• Emine Yilmaz, Milad Shokouhi, Nick Craswell, and Stephen Robertson. 2010. Expected
browsing utility for web search evaluation. In Proceedings of the 19th ACM international
conference on Information and knowledge management (CIKM '10).
• Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session
information distillation. In Proceedings of the 2nd international conference on the theory of
information retrieval (ICTIR ’09).
• Plus many others (ask me).
Charles Clarke, University of Waterloo 204
Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on
Dynamic Information Retrieval Modeling
Charlie Clarke
University of Waterloo, Canada
Thank you!
Outline
Dynamic Information Retrieval Modeling Tutorial
2015206
 Introduction & Theory
 Session Search
 Dynamic Ranking
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
Apply an MDP to an IR
Problem
Dynamic Information Retrieval Modeling Tutorial
2015207
 We can model IR systems using a Markov
Decision Process
 Is there a temporal component?
 States – What changes with each time step?
 Actions – How does your system change the
state?
 Rewards – How do you measure feedback or
effectiveness in your problem at each time
step?
 Transition Probability – Can you determine
this?
Apply an MDP to an IR Problem
- Example
Dynamic Information Retrieval Modeling Tutorial
2015208
 User agent in session search
 States – user’s relevance judgement
 Action – new query
 Reward – information gained
[Luo, Zhang, Yang SIGIR’14]
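To make the checklist and example above concrete, here is a minimal tabular MDP solved by value iteration over the Bellman recursion; the states, actions, rewards, and transition probabilities are invented for illustration and are not the model of [Luo, Zhang, Yang SIGIR'14]:

# Tiny illustrative MDP: states are coarse session states, actions are query moves.
states = ["exploring", "on_topic", "satisfied"]
actions = ["add_terms", "remove_terms", "keep_query"]

# transition[s][a] = list of (next_state, probability); reward[s][a] = immediate reward.
transition = {
    "exploring": {"add_terms": [("on_topic", 0.6), ("exploring", 0.4)],
                  "remove_terms": [("exploring", 1.0)],
                  "keep_query": [("exploring", 1.0)]},
    "on_topic":  {"add_terms": [("satisfied", 0.5), ("on_topic", 0.5)],
                  "remove_terms": [("exploring", 0.7), ("on_topic", 0.3)],
                  "keep_query": [("on_topic", 1.0)]},
    "satisfied": {a: [("satisfied", 1.0)] for a in actions},
}
reward = {"exploring": {"add_terms": 0.2, "remove_terms": 0.0, "keep_query": 0.0},
          "on_topic":  {"add_terms": 1.0, "remove_terms": 0.1, "keep_query": 0.5},
          "satisfied": {a: 0.0 for a in actions}}

gamma = 0.9
V = {s: 0.0 for s in states}
# Value iteration: V*(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V*(s') ]
for _ in range(100):
    V = {s: max(reward[s][a] + gamma * sum(p * V[s2] for s2, p in transition[s][a])
                for a in actions)
         for s in states}
policy = {s: max(actions,
                 key=lambda a: reward[s][a] + gamma * sum(p * V[s2] for s2, p in transition[s][a]))
          for s in states}
print(V, policy)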
Apply an MDP to an IR Problem
- Example
Dynamic Information Retrieval Modeling Tutorial
2015209
 Search engine’s perspective
 What if we can’t directly observe user’s
relevance judgement?
 Click ≠ relevance
? ? ? ?
Applying POMDP to Dynamic
IR
Dynamic Information Retrieval Modeling Tutorial
2015210
POMDP element → Dynamic IR instantiation
Environment: Documents
Agents: User, search engine
States: Queries, user's decision-making status, relevance of documents, etc.
Actions: Provide a ranking of documents; weight terms in the query; add, remove, or keep query terms; switch a search technology on or off; adjust parameters of a search technology
Observations: Queries, clicks, document lists, snippets, terms, etc.
Rewards: Evaluation measures (such as DCG, NDCG or MAP); clicking information
Transition matrix: Given in advance or estimated from training data
Observation function: Problem dependent; estimated from sample datasets
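A minimal sketch of how a belief over the hidden states in this mapping could be maintained: the belief is predicted forward with the transition model and corrected with the observation model after each user signal (all probabilities are invented for illustration):

# Hidden states: was the previous result list relevant or not?
states = ["relevant", "non_relevant"]

# Observation model P(observation | state): clicks are noisy evidence of relevance.
observation_prob = {
    "relevant":     {"sat_click": 0.7, "click": 0.2, "no_click": 0.1},
    "non_relevant": {"sat_click": 0.1, "click": 0.3, "no_click": 0.6},
}

# Transition model P(s' | s) after the search engine shows a new ranking (illustrative).
transition_prob = {
    "relevant":     {"relevant": 0.8, "non_relevant": 0.2},
    "non_relevant": {"relevant": 0.4, "non_relevant": 0.6},
}

def update_belief(belief, observation):
    """One POMDP belief update: predict with the transition model,
    then correct with the observation model and renormalize."""
    predicted = {s2: sum(belief[s] * transition_prob[s][s2] for s in states) for s2 in states}
    unnormalized = {s: observation_prob[s][observation] * predicted[s] for s in states}
    z = sum(unnormalized.values())
    return {s: v / z for s, v in unnormalized.items()}

belief = {"relevant": 0.5, "non_relevant": 0.5}
for obs in ["no_click", "click", "sat_click"]:   # a toy sequence of user feedback
    belief = update_belief(belief, obs)
    print(obs, belief)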
 SIGIR Tutorial July 7th 2014
Grace Hui Yang
Marc Sloan
Jun Wang
 Guest Speaker: Emine Yilmaz
Dynamic Information Retrieval
Modeling
Panel
Discussion
Outline
Dynamic Information Retrieval Modeling Tutorial
2015212
 Introduction & Theory
 Session Search
 Dynamic Ranking
 Recommendation and Advertising
 Guest Talk: Charlie Clarke
 Discussion Panel
 Conclusion
Conclusions
Dynamic Information Retrieval Modeling Tutorial
2015213
 Dynamic IR describes a new class of interactive
models
 They incorporate rich feedback and temporal
dependency, and are goal oriented.
 The family of Markov models and multi-armed
bandit theory are useful in building DIR models
 Applicable to a range of IR problems
 Useful in applications such as session search and
evaluation
Dynamic IR Book
Dynamic Information Retrieval Modeling Tutorial
2015214
 Published by Morgan & Claypool
 ‘Synthesis Lectures on Information Concepts,
Retrieval, and Services’
 Due April / May 2015 (in time for SIGIR 2015)
TREC 2015
Dynamic Domain Track
 Co-organized by Grace Hui Yang, John Frank, Ian Soboroff
 Underexplored subsets of Web content
 Limited scope and richness of indexed content, which may not
include relevant components of the deep web
 temporary pages,
 pages behind forms, etc.
 Basic search interfaces, where there is little collaboration or
history beyond independent keyword search
 Complex, task-based, dynamic search
 Temporal dependency
 Rich interactions
 Complex, evolving information needs
 Professional users
 A wide range of search strategies
215
Task
 An interactive search consisting of multiple retrieval runs
 Starting point: System is given a search query
 Iterate
 System returns a ranked list of 5 documents
 API returns relevance judgments
 go to next iteration of retrieval
 until done (system decides when to stop)
 The goal of the system is to find relevant information for
each topic as soon as possible
 One-shot ad-hoc search is included
 (if the system decides to stop after iteration one)
216
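A schematic of this retrieval loop, with placeholder functions standing in for the track's judgment API and for the participant's retrieval system (the function names, signatures, and stopping rule here are hypothetical, not the official track interface):

def get_feedback(doc_ids):
    """Placeholder for the track's relevance-judgment API (hypothetical signature)."""
    return {doc_id: 0.0 for doc_id in doc_ids}   # e.g. graded relevance per document

def search(topic, feedback_so_far):
    """Placeholder retrieval: return the next 5 document ids for the topic."""
    return [f"doc_{len(feedback_so_far) + i}" for i in range(5)]

def run_topic(topic, max_iterations=10):
    feedback = {}
    for _ in range(max_iterations):
        batch = search(topic, feedback)          # system returns a ranked list of 5 documents
        judgments = get_feedback(batch)          # API returns relevance judgments
        feedback.update(judgments)
        if sum(judgments.values()) == 0:         # hypothetical stopping rule:
            break                                # stop when a batch finds nothing relevant
    return feedback

run_topic("illicit goods example topic")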
Domains
Domain / Corpus
 Illicit goods: 30k forum posts from 5-10 forums (total ~300k posts).
Which users are working together to sell illicit goods?
 Ebola: One million tweets, plus 300k docs from in-country web sites (mostly official sites).
Who is doing what and where?
 Local Politics: 300k docs from local political groups in the Pacific Northwest
and British Columbia. Who is campaigning for what and why?
217
Timeline
 TREC Call for Participation: January 2015
 Data Available: March
 Detailed Guidelines: April/May
 Topics, Tasks available: June
 Systems do their thing: June-July
 Evaluation: August
 Results to participants: September
 Conference: November 2015
218
TREC 2015
Total Recall Track
 Co-organized by Gord Cormack, Maura
Grossman, Adam Roegiest, Charlie Clarke
 Explores high recall tasks through an active
learning process modeled on legal search tasks
(eDiscovery, patent search).
 Participating systems start with a topic and propose
a relevant document.
 Systems get immediate feedback on relevance.
 They continue to propose additional documents and
receive feedback until a stopping condition is
reached.
 Shared online infrastructure and collections with
Dynamic Domain. Easy to participate in both, if
you participate in one.
219
Acknowledgment
Dynamic Information Retrieval Modeling Tutorial
2015220
 We thank Prof. Charlie Clarke for his guest
lecture
 We sincerely thank Dr. Xuchu Dong for his help in
preparation of the tutorial
 We also thank the following colleagues for their
comments and suggestions:
 Dr Filip Radlinski
 Prof. Maarten de Rijke
References
Dynamic Information Retrieval Modeling Tutorial
2015221
Static IR
 Modern Information Retrieval. R. Baeza-Yates and B.
Ribeiro-Neto. Addison-Wesley, 1999.
 The PageRank Citation Ranking: Bringing Order to
the Web. Lawrence Page , Sergey Brin , Rajeev
Motwani , Terry Winograd. 1999
 Implicit User Modeling for Personalized Search,
Xuehua Shen et. al, CIKM, 2005
 A Short Introduction to Learning to Rank. Hang Li,
IEICE Transactions 94-D(10): 1854-1862, 2011.
 Portfolio Theory of Information Retrieval. J. Wang and
J. Zhu. In SIGIR 2009
References
Dynamic Information Retrieval Modeling Tutorial
2015222
Interactive IR
 Relevance Feedback in Information Retrieval,
Rocchio, J. J., The SMART Retrieval System (pp.
313-23), 1971
 A study in interface support mechanisms for
interactive information retrieval, Ryen W. White et. al,
JASIST, 2006
 Visualizing stages during an exploratory search
session, Bill Kules et. al, HCIR, 2011
 Dynamic Ranked Retrieval, Cristina Brandt et. al,
WSDM, 2011
 Structured Learning of Two-level Dynamic Rankings,
Karthik Raman et. al, CIKM, 2011
References
Dynamic Information Retrieval Modeling Tutorial
2015223
Dynamic IR
 A hidden Markov model information retrieval system.
D. R. H. Miller, T. Leek, and R. M. Schwartz. In
SIGIR’99, pages 214-221.
 Threshold setting and performance optimization in
adaptive filtering, Stephen Robertson, JIR 2002
 A large-scale study of the evolution of web pages,
Dennis Fetterly et. al., WWW 2003
 Learning diverse rankings with multi-armed bandits.
Filip Radlinski, Robert Kleinberg, Thorsten Joachims.
ICML, 2008.
 Interactively Optimizing Information Retrieval Systems
as a Dueling Bandits Problem, Yisong Yue et. al.,
ICML 2009
 Meme-tracking and the dynamics of the news cycle.
Jure Leskovec, Lars Backstrom, Jon Kleinberg. KDD 2009
References
Dynamic Information Retrieval Modeling Tutorial
2015224
Dynamic IR
 Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi
Kumar, Filip Radlinski, Eli Upfal. NIPS 2009
 A Novel Click Model and Its Applications to Online
Advertising , Zeyuan Allen Zhu et. al., WSDM 2010
 A contextual-bandit approach to personalized news article
recommendation. Lihong Li, Wei Chu, John Langford,
Robert E. Schapire. WWW, 2010
 Inferring search behaviors using partially observable
markov model with duration (POMD), Yin he et. al.,
WSDM, 2011
 No Clicks, No Problem: Using Cursor Movements to
Understand and Improve Search, Jeff Huang et. al., CHI
2011
 Balancing Exploration and Exploitation in Learning to Rank
Online, Katja Hofmann et. al., ECIR, 2011
 Large-Scale Validation and Analysis of Interleaved Search
Evaluation, Olivier Chapelle et. al., TOIS 2012
References
Dynamic Information Retrieval Modeling Tutorial
2015225
Dynamic IR
 Using Control Theory for Stable and Efficient
Recommender Systems. T. Jambor, J. Wang, N.
Lathia. In: WWW '12, pages 11-20.
 Sequential selection of correlated ads by POMDPs,
Shuai Yuan et. al., CIKM 2012
 Utilizing query change for session search. D. Guan,
S. Zhang, and H. Yang. In SIGIR ’13, pages 453–
462.
 Query Change as Relevance Feedback in Session
Search (short paper). S. Zhang, D. Guan, and H.
Yang. In SIGIR 2013.
 Interactive exploratory search for multi page search
results. X. Jin, M. Sloan, and J. Wang. In WWW ’13.
 Interactive Collaborative Filtering. X. Zhao, W.
Zhang, and J. Wang. In CIKM 2013.
References
Dynamic Information Retrieval Modeling Tutorial
2015226
Dynamic IR
 Win-win search: Dual-agent stochastic game in
session search. J. Luo, S. Zhang, and H. Yang. In
SIGIR ’14.
 Iterative Expectation for Multi-Period Information
Retrieval. M. Sloan and J. Wang. In WSCD 2013.
 Dynamical Information Retrieval Modelling: A
Portfolio-Armed Bandit Machine Approach. M.
Sloan and J. Wang. In WWW 2012.
 Jiyun Luo, Sicong Zhang, Xuchu Dong and Hui
Yang. Designing States, Actions, and Rewards for
Using POMDP in Session Search. In ECIR 2015.
 Sicong Zhang, Jiyun Luo, Hui Yang. A POMDP
Model for Content-Free Document Re-ranking. In
SIGIR 2014.
References
Dynamic Information Retrieval Modeling Tutorial
2015227
Markov Processes
 A markovian decision process. R. Bellman. Indiana
University Mathematics Journal, 6:679–684, 1957.
 Dynamic Programming. R. Bellman. Princeton University
Press, Princeton, NJ, USA, first edition, 1957.
 Dynamic Programming and Markov Processes. R.A.
Howard. MIT Press. 1960
 Linear Programming and Sequential Decisions. Alan S.
Manne. Management Science, 1960
 Statistical Inference for Probabilistic Functions of Finite
State Markov Chains. Baum, Leonard E.; Petrie, Ted. The
Annals of Mathematical Statistics 37, 1966
References
Dynamic Information Retrieval Modeling Tutorial
2015228
Markov Processes
 Learning to predict by the methods of temporal differences.
Richard Sutton. Machine Learning 3. 1988
 Computationally feasible bounds for partially observed
Markov decision processes. W. Lovejoy. Operations
Research 39: 162–175, 1991.
 Q-Learning. Christopher J.C.H. Watkins, Peter Dayan.
Machine Learning. 1992
 Reinforcement learning with replacing eligibility traces.
Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages
123-158, 1996.
 Reinforcement Learning: An Introduction. Richard S.
Sutton and Andrew G. Barto. MIT Press, 1998.
 Planning and acting in partially observable stochastic
domains. L. Kaelbling, M. Littman, and A. Cassandra.
Artificial Intelligence, 101(1-2):99–134, 1998.
References
Dynamic Information Retrieval Modeling Tutorial
2015229
Markov Processes
 Finding approximate POMDP solutions through belief
compression. N. Roy. PhD Thesis Carnegie Mellon. 2003
 VDCBPI: an approximate scalable algorithm for large scale
POMDPs, P. Poupart and C. Boutilier. In NIPS-2004,
pages 1081–1088.
 Finding Approximate POMDP solutions Through Belief
Compression. N. Roy, G. Gordon and S. Thrun. Journal of
Artificial Intelligence Research, 23:1-40,2005.
 Probabilistic robotics. S. Thrun, W. Burgard, D. Fox.
Cambridge. MIT Press. 2005
 Anytime Point-Based Approximations for Large POMDPs.
J. Pineau, G. Gordon and S. Thrun. Journal of Artificial
Intelligence Research, Volume 27, pages 335-380, 2006
References
Dynamic Information Retrieval Modeling Tutorial
2015230
Markov Processes
 The optimal control of partially observable Markov decision
processes over a finite horizon. R. D. Smallwood, E.J. Sondik.
Operations Research. 1973
 Modified Policy Iteration Algorithms for Discounted Markov
Decision Problems. M. L. Puterman and Shin M. C. Management
Science 24, 1978.
 An example of statistical investigation of the text Eugene Onegin
concerning the connection of samples in chains. A. A. Markov. Science in
Context, 19:591–600, 2006.
 Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer
Science & Business Media. 2011
 Finite-Time Regret Bounds for the Multiarmed Bandit Problem,
Nicolò Cesa-Bianchi, Paul Fischer. ICML 100-108, 1998
 Multi-armed bandit allocation indices, Wiley, J. C. Gittins. 1989
 Finite-time Analysis of the Multiarmed Bandit Problem, Peter
Auer et. al., Machine Learning 47, Issue 2-3. 2002.

Contenu connexe

Dernier

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 

Dernier (20)

Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 

En vedette

Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

En vedette (20)

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

Dynamic Information Retrieval Tutorial - WSDM 2015

  • 1. WSDM Tutorial February 2nd 2015 Grace Hui Yang Marc Sloan Jun Wang Guest Speaker: Charlie Clarke Dynamic Information Retrieval Modeling
  • 2. Dynamic Information Retrieval Dynamic Information Retrieval Modeling Tutorial 20152 Document s to explore Informatio n need Observed document s User Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren’t, and satisfy their information need.
  • 3. Evolving IR Dynamic Information Retrieval Modeling Tutorial 20153  Paradigm shifts in IR as new models emerge  e.g. VSM → BM25 → Language Model  Different ways of defining relationship between query and document  Static → Interactive → Dynamic  Evolution in modeling user interaction with search engine
  • 4. Outline Dynamic Information Retrieval Modeling Tutorial 20154  Introduction & Theory  Static IR  Interactive IR  Dynamic IR  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 5. Conceptual Model – Static IR Dynamic Information Retrieval Modeling Tutorial 20155 Static IR Interactive IR Dynamic IR  No feedback
  • 6. Characteristics of Static IR Dynamic Information Retrieval Modeling Tutorial 20156  Does not learn directly from user  Parameters updated periodically
  • 7. Dynamic Information Retrieval Modeling Tutorial 20157 Commonly Used Static IR Models BM25 PageRank Language Model Learning to Rank
  • 8. Feedback in IR Dynamic Information Retrieval Modeling Tutorial 20158
  • 9. Outline Dynamic Information Retrieval Modeling Tutorial 20159  Introduction & Theory  Static IR  Interactive IR  Dynamic IR  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 10. Conceptual Model – Interactive IR Dynamic Information Retrieval Modeling Tutorial 201510 Static IR Interactive IR Dynamic IR  Exploit Feedback
  • 11. Learn the user’s taste interactively! At the same time, provide good recommendations! Dynamic Information Retrieval Modeling Tutorial 201511 Interactive Recommender Systems
  • 12. Toy Example Dynamic Information Retrieval Modeling Tutorial 201512  Multi-Page search scenario  User image searches for “jaguar”  Rank two of the four results over two pages: 𝑟 = 0.5 𝑟 = 0.51 𝑟 = 0.9𝑟 = 0.49
  • 13. Toy Example – Static Ranking Dynamic Information Retrieval Modeling Tutorial 201513  Ranked according to PRP Page 1 Page 2 1. 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 𝑟 = 0.5 𝑟 = 0.49
  • 14. Toy Example – Relevance Feedback Dynamic Information Retrieval Modeling Tutorial 201514  Interactive Search  Improve 2nd page based on feedback from 1st page  Use clicks as relevance feedback  Rocchio1 algorithm on terms in image webpage  𝑤 𝑞 ′ = 𝛼𝑤 𝑞 + 𝛽 |𝐷 𝑟| 𝑑∈𝐷 𝑟 𝑤 𝑑 − 𝛾 𝐷 𝑛 𝑑∈𝐷 𝑛 𝑤 𝑑  New query closer to relevant documents and different to non-relevant documents1Rocchio, J. J., ’71, Baeza- Yates & Ribeiro-Neto ‘99
  • 15. Toy Example – Relevance Feedback Dynamic Information Retrieval Modeling Tutorial 201515  Ranked according to PRP and Rocchio Page 1 Page 2 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 𝑟 = 0.5 𝑟 = 0.49 * 1. * Click
  • 16. Toy Example – Relevance Feedback Dynamic Information Retrieval Modeling Tutorial 201516  No click when searching for animals Page 1 Page 2 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 1. ? ?
  • 17. Toy Example – Value Function Dynamic Information Retrieval Modeling Tutorial 201517  Optimize both pages using dynamic IR  Bellman equation for value function  Simplified example:  𝑉 𝑡 𝜃 𝑡 , Σ 𝑡 = max 𝑠 𝑡 𝜃𝑠 𝑡 + 𝐸(𝑉 𝑡+1 𝜃 𝑡+1 , Σ 𝑡+1 𝐶 𝑡 )  𝜃 𝑡, Σ 𝑡 = relevance and covariance of documents for page 𝑡  𝐶 𝑡 = clicks on page 𝑡  𝑉 𝑡 = ‘value’ of ranking on page 𝑡  Maximize value over all pages based on estimating feedback X Jin, M. Sloan and J. Wang ’13
  • 18. 1 0.8 0.1 0 0.8 1 0.1 0 0.1 0.1 1 0.95 0 0 0.95 1 Toy Example - Covariance Dynamic Information Retrieval Modeling Tutorial 201518  Covariance matrix represents similarity between images X Jin, M. Sloan and J. Wang ’13
  • 19. Toy Example – Myopic Value Dynamic Information Retrieval Modeling Tutorial 201519  For myopic ranking, 𝑉2 = 16.380 Page 1 2. 1. X Jin, M. Sloan and J. Wang ’13
  • 20. Toy Example – Myopic Ranking Dynamic Information Retrieval Modeling Tutorial 201520  Page 2 ranking stays the same regardless of clicksPage 1 Page 2 2. 1. 2. 1. X Jin, M. Sloan and J. Wang ’13
  • 21. Toy Example – Optimal Value Dynamic Information Retrieval Modeling Tutorial 201521  For optimal ranking, 𝑉2 = 16.528 Page 1 2. 1. X Jin, M. Sloan and J. Wang ’13
  • 22. Toy Example – Optimal Ranking Dynamic Information Retrieval Modeling Tutorial 201522  If car clicked, Jaguar logo is more relevant on next pagePage 1 Page 2 2. 1. 2. 1. X Jin, M. Sloan and J. Wang ’13
  • 23. Toy Example – Optimal Ranking Dynamic Information Retrieval Modeling Tutorial 201523  In all other scenarios, rank animal first on next pagePage 1 Page 2 2. 1. 2. 1. X Jin, M. Sloan and J. Wang ’13
  • 24. x x x x x x x x xx x o o o o o o o         x x doc about apple ceo X: doc about apple fruit   O: doc about apple iphone Documents exist in vector space 24 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015 Static IR Visualization
  • 25. Static IR Visualization x x x x x x x x xx x o o o o o o o         x x doc about apple ceo X: doc about apple fruit   O: doc about apple iphone Q 25 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015 t = 1: Static IR considers Relevancy
  • 26. Static IR Visualization x x x x x x x x xx x o o o o o o o         x x doc about apple ceo X: doc about apple fruit   O: doc about apple iphone Q 26 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015 t = 1: Static IR considers Relevancy
  • 27. Interactive IR Update x x x x x x x x xx x o o o o o o o         x x doc about apple ceo X: doc about apple fruit   O: doc about apple iphone Q -1 -1 +1 Q’ 27 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015 t = 1: Static IR considers Relevancy t = 2: Interactive considers local gain
  • 28. Interactive IR Update x x x x x x x x xx x o o o o o o o         x x doc about apple ceo X: doc about apple fruit   O: doc about apple iphone Q -1 -1 +1 Q’ 28 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015 t = 1: Static IR considers Relevancy t = 2: Interactive considers local gain
  • 29. Dynamic Ranking Principle x x x x x x x x xx x o o o o o o o         x x doc about apple ceo X: doc about apple fruit   O: doc about apple iphone t = 1: Relevancy + Variance Q 29 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015
  • 30. Dynamic Ranking Principle x x x x x x x x xx x o o o o o o o         x x doc about apple ceo X: doc about apple fruit   O: doc about apple iphone t = 1: Relevancy + Variance + |Correlations| Q -1 -1 +1 30 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015
  • 31. Dynamic Ranking Principle x x x x x x x x xx x o o o o o o o         x x doc about apple ceo X: doc about apple fruit   O: doc about apple iphone t = 1: Relevancy + Variance + |Correlations| Diversified, exploratory relevance ranking Q 31 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015
  • 32. Dynamic Ranking Principle x x x x x x x x xx x o o o o o o o         x x doc about apple ceo X: doc about apple fruit   O: doc about apple iphone Q -1 -1 +1 Q’ 32 Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015 t = 1: Relevancy + Variance + |Correlations| Diversified, exploratory relevance ranking t = 2: Personalized Re-ranking
  • 33. Interactive vs Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201533 • Treats interactions independently • Responds to immediate feedback • Static IR used before feedback received • Optimizes over all interaction • Long term gains • Models future user feedback • Also used at beginning of interaction Interactive Dynamic
  • 34. Interactive & Dynamic Techniques Dynamic Information Retrieval Modeling Tutorial 201534 • Rocchio equation in Relevance Feedback • Collaborative filtering in recommender systems • Active learning in interactive retrieval • POMDP in multi page search and ad recommendati on • Multi Armed Bandits in Online Evaluation • MDP in session search Interactive Dynamic
  • 35. Outline Dynamic Information Retrieval Modeling Tutorial 201535  Introduction & Theory  Static IR  Interactive IR  Dynamic IR  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 36. Conceptual Model – Interactive IR Dynamic Information Retrieval Modeling Tutorial 201536 Static IR Interactive IR Dynamic IR  Explore and exploit Feedback
  • 37. Characteristics of Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201537 Rich interactions Query formulation Document clicks Document examination Eye movement Mouse movements etc. [Luo et al., IRJ under revision 2014]
  • 38. Characteristics of Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201538 Temporal dependency clicked documentsquery D1 ranked documents q1 C1 D2 q2 C2 …… …… Dn qn Cn I information need iteration 1 iteration 2 iteration n [Luo et al., IRJ under revision 2014]
  • 39. Characteristics of Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201539 Overall goal Optimize over all iterations for goal IR metric or user satisfaction Optimal policy [Luo et al., IRJ under revision 2014]
  • 40. 40/33 Dynamic Information Retrieval Dynamic Relevance Dynamic Users Dynamic Queries Dynamic Documents Dynamic Information Needs Users change behavior over time, user history Topic Trends, Filtering, document content change User perceived relevance changes Changing query definition i.e. ‘Twitter’ Information needs evolve over time Next generation Search Engine
  • 41. Why Not Existing Supervised Learning for Dynamic IR Modeling? Dynamic Information Retrieval Modeling Tutorial 201541  Lack of enough training data  Dynamic IR problems contain a sequence of dynamic interactions  E.g. a series of queries in session  Rare to find repeated sequences (close to zero)  Even in large query logs (WSCD 2013 & 2014, query logs from Yandex)  Chance of finding repeated adjacent query pairs is also lowDataset Repeated Adjacent Query Pairs Total Adjacent Query Pairs Repeated Percentage WSCD 2013 476,390 17,784,583 2.68% WSCD 2014 1,959,440 35,376,008 5.54%
  • 42. Our Solution Dynamic Information Retrieval Modeling Tutorial 201542 Try to find an optimal solution through a sequence of dynamic interactions Trial and Error: learn from repeated, varied attempts which are continued until success No (or less) Supervised Learning
  • 43. Trial and Error Dynamic Information Retrieval Modeling Tutorial 201543  q1 – "dulles hotels"  q2 – "dulles airport"  q3 – "dulles airport location"  q4 – "dulles metrostop"
  • 44. What is a Desirable Model for Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201544  Model interactions, which means it needs to have place holders for actions;  Model information need hidden behind user queries and other interactions;  Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;  Represent Markov properties to handle the temporal dependency. A model in Trial and Error setting will do! A Markov Model will do!
  • 45. Markov Decision Process Dynamic Information Retrieval Modeling Tutorial 201545  MDP extends MC with actions and rewards1 si– state ai – action ri – reward pi – transition probability p0 p1 p2 ……s0 s1 r0 a0 s2 r1 a1 s3 r2 a2 1R. Bellman, ‘57 (S, M, A, R, γ)
  • 46. Definition of MDP Dynamic Information Retrieval Modeling Tutorial 201546  A tuple (S, M, A, R, γ)  S : state space  M: transition matrix Ma(s, s') = P(s'|s, a)  A: action space  R: reward function R(s,a) = immediate reward taking action a at state s  γ: discount factor, 0< γ ≤1  policy π π(s) = the action taken at state s  Goal is to find an optimal policy π* maximizing the expected total rewards.
  • 47. Optimality — Bellman Equation Dynamic Information Retrieval Modeling Tutorial 201547  The Bellman equation1 to MDP is a recursive definition of the optimal value function V*(.) 𝑉∗ s = max 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑠′ 𝑀 𝑎(𝑠, 𝑠′)𝑉∗(𝑠′)  Optimal Policy π∗ s = arg 𝑚𝑎𝑥 𝑎 𝑅 𝑠, 𝑎 + 𝛾 𝑠′ 𝑀 𝑎 𝑠, 𝑠′ 𝑉∗(𝑠′) 1R. Bellman, ‘57 state-value function
  • 48. MDP algorithms Dynamic Information Retrieval Modeling Tutorial 201548  Value Iteration  Policy Iteration  Modified Policy Iteration  Prioritized Sweeping  Temporal Difference (TD) Learning  Q-Learning Model free approaches Model-based approaches [Bellman, ’57, Howard, ‘60, Puterman and Shin, ‘78, Singh & Sutton, ‘96, Sutton & Barto, ‘98, Richard Sutton, ‘88, Watkins, ‘92] Solve Bellman equation Optimal value V*(s) Optimal policy *(s) [Slide altered from Carlos Guestrin’s ML lecture]
  • 49. Apply an MDP to an IR Problem Dynamic Information Retrieval Modeling Tutorial 201549  We can model IR systems using a Markov Decision Process  Is there a temporal component?  States – What changes with each time step?  Actions – How does your system change the state?  Rewards – How do you measure feedback or effectiveness in your problem at each time step?  Transition Probability – Can you determine this?
  • 50. Outline Dynamic Information Retrieval Modeling Tutorial 201550  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 51. TREC Session Tracks (2010- now)  Given a series of queries {q1,q2,…,qn}, top 10 retrieval results {D1, … Di-1 } for q1 to qi-1, and click information  The task is to retrieve a list of documents for the current/last query, qn  Relevance judgment is made based on how relevant the documents are for qn, and how relevant they are for information needs for the entire session (in topic description)  no need to segment the sessions 51 Dynamic Information Retrieval Modeling Tutorial 2015
  • 52. 1.pocono mountains pennsylvania 2.pocono mountains pennsylvania hotels 3.pocono mountains pennsylvania things to do 4.pocono mountains pennsylvania hotels 5.pocono mountains camelbeach 6.pocono mountains camelbeach hotel 7.pocono mountains chateau resort 8.pocono mountains chateau resort attractions 9.pocono mountains chateau resort getting to 10.chateau resort getting to 11.pocono mountains chateau resort directions TREC 2012 Session 6 52 Information needs: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there? In a session, queries change constantly Dynamic Information Retrieval Modeling Tutorial 2015
  • 53. Markov Decision Process  We propose to model session search as a Markov decision process (MDP)  Two agents: the User and the Search Engine 53 [Guan, Zhang and Yang SIGIR 2013]
  • 54. Settings of the Session MDP  States: Queries  Environments: Search results  Actions:  User actions:  Add/remove/ unchange the query terms  Nicely correspond to our definition of query change  Search Engine actions:  Increase/ decrease /remain term weights 54 [Guan, Zhang and Yang SIGIR 2013]
  • 55. Search Engine Agent’s Actions ∈ Di−1 action Example qtheme Y increase “pocono mountain” in s6 N increase “france world cup 98 reaction” in s28, france world cup 98 reaction stock market→ france world cup 98 reaction +∆q Y decrease ‘policy’ in s37, Merck lobbyists → Merck lobbyists US policy N increase ‘US’ in s37, Merck lobbyists → Merck lobbyists US policy −∆q Y decrease ‘reaction’ in s28, france world cup 98 reaction → france world cup 98 N No change ‘legislation’ in s32, bollywood legislation →bollywood law 55 [Guan, Zhang and Yang SIGIR 2013]
  • 56. Bellman Equation  In a MDP, it is believed that a future reward is not worth quite as much as a current reward and thus a discount factor γ ϵ (0,1) is applied to future rewards.  Bellman Equation gives the optimal value (expected long term reward starting from state s and continuing with policy π from then on) for an MDP: 56 V* (s) = max a R(s,a) + g P(s' | s,a) s' å V* (s')
  • 57. Our Tweak  In a MDP, it is believed that a future reward is not worth quite as much as a current reward and thus a discount factor γ ϵ (0,1) is applied to future rewards.  In session search, a past reward is not worth quite as much as a current reward and thus a discount factor γ should be applied to past rewards  We model the MDP for session search in a reverse order 57
  • 58. Query Change retrieval Model (QCM)  Bellman Equation gives the optimal value for an MDP:  The reward function is used as the document relevance score function and is tweaked backwards from Bellman equation: 58 V* (s) = max a R(s,a) + g P(s' | s,a) s' å V* (s')   a Di )D|(qPmaxa),D,q|(qP+d)|(qP=d),Score(q 1-i1-i1-i1-iiii 1  Document relevant score Query Transition model Maximum past relevanceCurrent reward/relevan ce score [Guan, Zhang and Yang SIGIR 2013]
  • 59. Calculating the Transition Model )|(log)|( )|(log)()|(log)|( )|(log)]|(1[+d)|P(qlog=d),Score(q * 1 * 1 * 1ii * 1 * 1 dtPdtP dtPtidfdtPdtP dtPdtP qt i dt qt dt qt i qthemet i ii                    59 • According to Query Change and Search Engine ActionsCurrent reward/ relevance score Increase weights for theme terms Decrease weights for removed terms Increase weights for novel added termsDecrease weights for old added terms [Guan, Zhang and Yang SIGIR 2013]
  • 60. Maximizing the Reward Function  Generate a maximum rewarded document denoted as d* i-1, from Di-1  That is the document(s) most relevant to qi-1  The relevance score can be calculated as 𝑃 𝑞𝑖−1 𝑑𝑖−1 = 1 − 𝑡∈𝑞 𝑖−1 {1 − 𝑃(𝑡|𝑑𝑖−1)} 𝑃 𝑡 𝑑𝑖−1 = #(𝑡,𝑑 𝑖−1) |𝑑 𝑖−1|  From several options, we choose to only use the document with top relevance max Di-1 P(qi-1 | Di-1) 60 Dynamic Information Retrieval Modeling Tutorial 2015 [Guan, Zhang and Yang SIGIR 2013]
  • 61. Scoring the Entire Session  The overall relevance score for a session of queries is aggregated recursively : Scoresession (qn, d) = Score(qn, d) + gScoresession (qn-1, d) = Score(qn, d) + g[Score(qn-1, d) + gScoresession (qn-2, d)] = gn-i i=1 n å Score(qi, d) 61 Dynamic Information Retrieval Modeling Tutorial 2015 [Guan, Zhang and Yang SIGIR 2013]
  • 62. Experiments  TREC 2011-2012 query sets, datasets  ClubWeb09 Category B 62 Dynamic Information Retrieval Modeling Tutorial 2015
  • 63. Search Accuracy (TREC 2012)  nDCG@10 (official metric used in TREC) Approach nDCG@10 %chg MAP %chg Lemur 0.2474 -21.54% 0.1274 -18.28% TREC’12 median 0.2608 -17.29% 0.1440 -7.63% Our TREC’12 submission 0.3021 −4.19% 0.1490 -4.43% TREC’12 best 0.3221 0.00% 0.1559 0.00% QCM 0.3353 4.10%† 0.1529 -1.92% QCM+Dup 0.3368 4.56%† 0.1537 -1.41% 63 Dynamic Information Retrieval Modeling Tutorial 2015
  • 64. Search Accuracy (TREC 2011)  nDCG@10 (official metric used in TREC) Approach nDCG@10 %chg MAP %chg Lemur 0.3378 -23.38% 0.1118 -25.86% TREC’11 median 0.3544 -19.62% 0.1143 -24.20% TREC’11 best 0.4409 0.00% 0.1508 0.00% QCM 0.4728 7.24%† 0.1713 13.59%† QCM+Dup 0.4821 9.34%† 0.1714 13.66%† Our TREC’12 submission 0.4836 9.68%† 0.1724 14.32%† 64 Dynamic Information Retrieval Modeling Tutorial 2015
  • 65. Search Accuracy for Different Session Types  TREC 2012 Sessions are classified into:  Product: Factual / Intellectual  Goal quality: Specific / Amorphous Intellec tual %chg Amorphous %chg Specific %chg Factual %chg TREC best 0.3369 0.00% 0.3495 0.00% 0.3007 0.00% 0.3138 0.00% Nugget 0.3305 -1.90% 0.3397 -2.80% 0.2736 -9.01% 0.2871 -8.51% QCM 0.3870 14.87% 0.3689 5.55% 0.3091 2.79% 0.3066 -2.29% QCM+DUP 0.3900 15.76% 0.3692 5.64% 0.3114 3.56% 0.3072 -2.10% 65 - Better handle sessions that demonstrate evolution and exploration Because QCM treats a session as a continuous process by studying changes among query transitions and modeling the dynamicsDynamic Information Retrieval Modeling Tutorial 2015
  • 66. POMDP Model Dynamic Information Retrieval Modeling Tutorial 201566 ……s0 s1 r0 a0 s2 r1 a1 s3 r2 a2  Hidden states  Observations  Belief 1R. D. Smallwood et. al., ‘73 o1 o2 o3
  • 67. POMDP Definition Dynamic Information Retrieval Modeling Tutorial 201567  A tuple (S, M, A, R, γ, O, Θ, B)  S : state space  M: transition matrix  A: action space  R: reward function  γ: discount factor, 0< γ ≤1  O: observation set an observation is a symbol emitted according to a hidden state.  Θ: observation function Θ(s,a,o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s,a).  B: belief space Belief is a probability distribution over hidden states.
  • 68. 68/33 A Markov Chain of Decision Making … A1 A2 A3 A4 S1 S2 S3 Sn “old US coins” “collecting old US coins” “selling old US coins” q1 q2 q3 “D1 is relevant and I stay to find out more about collecting…” D1 D2 D3 “D2 is relevant and I now move to the next topic…” “D3 is irrelevant; I slightly edit the query and stay here a little longer…” [Luo, Zhang and Yang SIGIR 2014]
  • 69. 69/33 Hidden Decision Making States SRT Relevant & Exploitation SRR Relevant & Exploration SNRT Non-Relevant & Exploitation SNRR Non-Relevant & Exploration  scooter price ⟶ scooter stores  collecting old US coins⟶ selling old US coins  Philadelphia NYC travel ⟶ Philadelphia NYC train  Boston tourism ⟶ NYC tourism q0 [Luo, Zhang and Yang SIGIR 2014]
  • 70. 70/33 Dual Agent Stochastic Game Hidden states Actions Rewards Markov ……s0 r0 a0 r1 a1 r2 a2 s1 s2 s3 Dual-agent game Cooperative game Joint optimization D2 User Agent Search Engine Agent [Luo, Zhang and Yang SIGIR 2014]
  • 71. 71/33 Actions  User Action (Au)  add query terms (+Δq)  remove query terms (-Δq)  keep query terms (qtheme)  Search Engine Action(Ase)  Increase/ decrease/ keep term weights  Switch on or off a search technique,  e.g. to use or not to use query expansion  adjust parameters in search techniques  e.g., select the best k for the top k docs used in PRF  Message from the user(Σu)  clicked documents  SAT clicked documents  Message from search engine(Σse)  top k returned documents Messages are essentially documents that an agent thinks are relevant. [Luo, Zhang and Yang SIGIR 2014]
  • 72. 72/33 Dual-agent Stochastic Game Documents (world) User agent Search engine agent Belief Updater [Luo, Zhang and Yang SIGIR 2014] Σse= 𝐷𝑡𝑜 𝑝_ 𝑟𝑒𝑡𝑢𝑟𝑛𝑒𝑑
  • 73. 73/33 Dual-agent Stochastic Game Documents (world) User agent 4 3 Search engine agent Belief Updater [Luo, Zhang and Yang SIGIR 2014] Σse= 𝐷𝑡𝑜 𝑝_ 𝑟𝑒𝑡𝑢𝑟𝑛𝑒𝑑
  • 74. 74/33 Dual-agent Stochastic Game Documents (world) User agent 4 3  [Luo, Zhang and Yang SIGIR 2014] Belief Updater Search engine agent Σse= 𝐷𝑡𝑜 𝑝_ 𝑟𝑒𝑡𝑢𝑟𝑛𝑒𝑑
  • 75. 75/33 Observation function (O) O(st+1, at, ωt) = P(ωt|st+1, at)  Two types of observations  Relevance related  Exploration-exploitation related Probability of making observation ωt after taking action at and landing in state st+1 [Luo, Zhang and Yang SIGIR 2014]
  • 76. 76/33 Relevance-related Observation  Intuition  Similarly, we have  As well as 76 st is likely to be Relevant Non-Relevant If ∃d ∈ Dt-1 and d is SAT Clicked otherwise It happens after the user sends out the message 𝛴 𝑢 𝑡 (clicks) 𝑂( 𝑠𝑡 = 𝑅𝑒𝑙, 𝑢 , ωt=Rel)≝ 𝑃(ωt = 𝑅𝑒𝑙| 𝑠𝑡 = 𝑅𝑒𝑙, 𝑢) 𝑂(𝑠𝑡 = 𝑅𝑒𝑙, 𝑢 , ωt = 𝑅𝑒𝑙) ∝ 𝑃 𝑠𝑡 = 𝑅𝑒𝑙 ω𝑡 = 𝑅𝑒𝑙 𝑃(ωt = 𝑅𝑒𝑙, 𝑢) ∝ 𝑃 𝑠𝑡 = 𝑅𝑒𝑙 ω𝑡 = 𝑅𝑒𝑙 𝑃(ωt = 𝑅𝑒𝑙| 𝑢) 𝑂 𝑠𝑡 = 𝑁𝑜𝑛𝑅𝑒𝑙, 𝑢 , ωt = 𝑁𝑜𝑛𝑅𝑒𝑙 ∝ 𝑃 𝑠𝑡 = 𝑁𝑜𝑛𝑅𝑒𝑙 ω𝑡 = 𝑁𝑜𝑛𝑅𝑒𝑙 𝑃(ωt = 𝑁𝑜𝑛𝑅𝑒𝑙| 𝑢) 𝑂 𝑠𝑡 = 𝑁𝑜𝑛𝑅𝑒𝑙, 𝑢 , ωt = 𝑅𝑒𝑙 𝑂 𝑠𝑡 = 𝑅𝑒𝑙, 𝑢 , ωt = 𝑁𝑜𝑛𝑅𝑒𝑙 [Luo, Zhang and Yang SIGIR 2014]
  • 77. 77/33  It is a combined observation  It happens when updating the before-message-belief-state for a user action au(query change) and a search engine message Ʃse =Dt-1  Intuition st is likely to be Exploration Exploitation if (+Δqt≠∅ and +Δqt∉Dt-1) or (+Δqt=∅ and -Δqt≠∅ ) if (+Δqt≠∅ and +Δqt∈Dt-1) or (+Δqt=∅ and –Δqt=∅ ) EXPLORATION-RELATED OBSERVATION 𝑂 𝑠𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑖𝑡𝑎𝑡𝑖𝑜𝑛, 𝑎𝑢 = ∆𝑞𝑡, 𝑠𝑒 = 𝐷𝑡 − 1, ω𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑖𝑡𝑎𝑡𝑖𝑜𝑛 ∝ 𝑃 𝑠𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑖𝑡𝑎𝑡𝑖𝑜𝑛 ω𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑖𝑡𝑎𝑡𝑖𝑜𝑛 × 𝑃 ω𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑖𝑡𝑎𝑡𝑖𝑜𝑛 ∆𝑞𝑡, 𝐷𝑡 − 1 𝑂 𝑠𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑟𝑎𝑡𝑖𝑜𝑛, 𝑎𝑢 = ∆𝑞𝑡, 𝑠𝑒 = 𝐷𝑡 − 1, ω𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑟𝑎𝑡𝑖𝑜𝑛 ∝ 𝑃 𝑠𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑟𝑎𝑡𝑖𝑜𝑛 𝑤𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑟𝑎𝑡𝑖𝑜𝑛 × 𝑃(𝑤𝑡 = 𝐸𝑥𝑝𝑙𝑜𝑟𝑎𝑡𝑖𝑜𝑛|∆𝑞𝑡, 𝐷𝑡 − 1) [Luo, Zhang and Yang SIGIR 2014]
  • 78. 78/33  The belief state b is updated when a new observation is obtained. 𝒃 𝒕+𝟏(𝒔𝒋) = 𝑷(𝒔𝒋|𝝎𝒕, 𝒂 𝒕, 𝒃 𝒕 = 𝑷(𝝎 𝒕|𝒔𝒋, 𝒂 𝒕, 𝒃 𝒕) 𝒔 𝒊∈𝑺 𝑷(𝒔𝒋|𝒔𝒊, 𝒂 𝒕, 𝒃 𝒕)𝒃 𝒕(𝒔𝒊 )𝑷(𝝎 𝒕|𝒂𝒕, 𝒃 𝒕 = 𝑶(𝒔𝒋, 𝒂 𝒕, 𝝎 𝒕) 𝒔 𝒊∈𝑺 𝑷(𝒔𝒋|𝒔𝒊, 𝒂 𝒕, 𝒃 𝒕)𝒃 𝒕(𝒔𝒊 )𝑷(𝝎 𝒕|𝒂𝒕, 𝒃 𝒕 BELIEF UPDATES (B)
  • 79. 79/33  The long term reward for the search engine agent  The long term reward for the user agent  Joint optimization 𝑸 𝒔𝒆(𝒃, 𝒂) = 𝒔∈𝑺 )𝒃(𝒔)𝑹(𝒔, 𝒂 + 𝜸 𝝎∈𝜴 𝑷(𝝎|𝒃, 𝒂 𝒖, 𝜮 𝒔𝒆)𝑷(𝝎|𝒃, 𝜮 𝒖)𝒎𝒂𝒙 𝒂 𝑸 𝒔𝒆(𝒃′, 𝒂 𝑸 𝒖(𝒃, 𝒂 𝒖) = 𝑹(𝒔, 𝒂 𝒖) + 𝜸 𝒂 𝒖 )𝑻(𝒔 𝒕|𝒔𝒕−𝟏, 𝑫 𝒕−𝟏 𝒎𝒂𝒙 𝒔 𝒕−𝟏 𝑸 𝒖(𝒔𝒕−𝟏, 𝒂 𝒖) = P(qt|d) +𝜸 𝒂 𝒖 )𝐏(𝒒 𝒕|𝒒 𝒕−𝟏, 𝑫 𝒕−𝟏, 𝒂 𝒎𝒂𝒙 𝑫 𝒕−𝟏 𝑷 (𝒒 𝒕−𝟏|𝑫 𝒕−𝟏) 𝒂 𝒔𝒆 = 𝒂𝒓𝒈𝒎𝒂𝒙 𝒂 (𝑸 𝒔𝒆(𝒃, 𝒂) + 𝑸 𝒖(𝒃, 𝒂 𝒖)) JOINT OPTIMIZATION — WIN-WIN [Luo, Zhang and Yang SIGIR 2014]
  • 80. Dynamic Search Engine Demo http://dumplingproject.org Dynamic Information Retrieval Modeling Tutorial 201580
  • 81. 81/33 EXPERIMENTS  Evaluate on TREC 2012 and 2013 Session Tracks  The session logs contain  session topic  user queries  previously retrieved URLs, snippets  user clicks, and dwell time etc.  Task: retrieve 2,000 documents for the last query in each session  The evaluation is based on the whole session.  A document related to any query in the session is a good document 81  Datasets  ClueWeb09  ClueWeb12  Spams, dups are removed
  • 82. 82/33 ACTIONS  increasing weights of the added terms by a factor of x={1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75 or 2};  decreasing weights of the added terms by a factor of y={0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9 or 0.95};  Query Change Model (QCM) proposed in Guan et. al SIGIR’13;  Pseudo Relevance Feedback which assumes the top 20 retrieved documents are relevant;  directly uses the query in current iteration to perform retrieval;  combines all queries in a session weights them equally. 82   a Di )D|(qPmaxa),D,q|(qP+d)|(qP=d),Score(q 1-i1-i1-i1-iiii 1 
  • 83. 83/33 SEARCH ACCURACY  Search accuracy on TREC 2012 Session Track 83  Win-win outperforms most retrieval algorithms on TREC 2012.
  • 84. 84/33 84 Win-win outperforms all retrieval algorithms on TREC 2013.  It is highly effective in Session Search.  Search accuracy on TREC 2013 Session Track SEARCH ACCURACY
  • 85. 85/33 IMMEDIATE SEARCH ACCURACY 85  Original run: top returned documents provided by TREC log data  Win-win’s immediate search accuracy is better than the Original at every iteration  Win-win's immediate search accuracy increases while the number of search iterations increases TREC 2012 Session Track TREC 2013 Session Track
  • 86. 86/33 86  q1=“best US destinations” observation= NRR SRT Relevant & Exploitation 0.1784 SRR Relevant & Exploration 0.1135 SNRT Non-Relevant & Exploitation 0.2838 SNRR Non-Relevant & Exploration 0.4243 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit? BELIEF UPDATES (B) q0
  • 87. 87/33 87  q1=“best US destinations” observation= NRR  q2=“distance New York Boston” observation = RT SRT Relevant & Exploitation 0.0005 SRR Relevant & Exploration 0.0068 SNRT Non-Relevant & Exploitation 0.0715 SNRR Non-Relevant & Exploration 0.9212 BELIEF UPDATES (B) q0 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
  • 89. 89/33 89  q1=“best US destinations” observation= NRR  q2=“distance New York Boston” observation = RT  q3=“maps.bing.com” observation = NRT SRT Relevant & Exploitation 0.0151 SRR Relevant & Exploration 0.4347 SNRT Non-Relevant & Exploitation 0.0276 SNRR Non-Relevant & Exploration 0.5226 BELIEF UPDATES (B) q0 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
  • 91. 91/33 91  q1=“best US destinations” observation= NRR  q2=“distance New York Boston” observation = RT  q3=“maps.bing.com” observation = NRT SRT Relevant & Exploitation 0.0291 SRR Relevant & Exploration 0.7837 SNRT Non-Relevant & Exploitation 0.0081 SNRR Non-Relevant & Exploration 0.1790  q20=“Philadelphia NYC train” observation = NRT …… BELIEF UPDATES (B) q0 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
  • 93. 93/33 93  q1=“best US destinations” observation= NRR  q2=“distance New York Boston” observation = RT  q3=“maps.bing.com” observation = NRT SRT Relevant & Exploitation 0.0304 SRR Relevant & Exploration 0.8126 SNRT Non-Relevant & Exploitation 0.0066 SNRR Non-Relevant & Exploration 0.1505  q20=“Philadelphia NYC train” observation = NRT  q21=“Philadelphia NYC bus” observation = NRT BELIEF UPDATES (B) q0 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit? ……
  • 95. Coffee Break Dynamic Information Retrieval Modeling Tutorial 201595
  • 96. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval Modeling Tutorial 201596  User agent in session search  States – user’s relevance judgement  Action – new query  Reward – information gained [Luo, Zhang, Yang SIGIR’14]
  • 97.  The agent uses a state estimator to update its belief about the hidden states: b′ = SE(b, a, o′)  b′(s′) = P(s′ | o′, a, b) = P(s′, o′ | a, b) / P(o′ | a, b) = Θ(s′, a, o′) Σs M(s, a, s′) b(s) / P(o′ | a, b) POMDP → Belief Update Dynamic Information Retrieval Modeling Tutorial 201597
  • 98. POMDP → Bellman Equation Dynamic Information Retrieval Modeling Tutorial 201598  The Bellman equation for a POMDP: V(b) = maxa [ r(b, a) + γ Σo′ P(o′ | a, b) V(b′) ]  A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ)  B : the continuous belief space  M′: transition function M′a(b, b′) = Σo′∈O 1a,o′(b′, b) P(o′ | a, b), where 1a,o′(b′, b) = 1 if SE(b, a, o′) = b′ and 0 otherwise  A: action space  r: reward function r(b, a) = Σs∈S b(s) R(s, a)
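Below is a hedged sketch of brute-force finite-horizon value computation over beliefs following this Bellman equation. All structures (R, T, Z and the toy numbers) are hypothetical; real POMDP solvers (point-based methods, belief compression) are far more efficient.

```python
# Brute-force finite-horizon value computation over beliefs (illustration only).
# Hypothetical toy structures: states/actions/observations are indices,
# R[s][a] = reward, T[a][s][s2] = P(s2 | s, a), Z[a][s2][o] = P(o | s2, a).

def reward(belief, a, R):
    return sum(b * R[s][a] for s, b in enumerate(belief))

def obs_prob(belief, a, o, T, Z):
    # P(o' | a, b) = sum_{s, s'} b(s) T(s, a, s') Z(s', a, o')
    n = len(belief)
    return sum(belief[s] * T[a][s][s2] * Z[a][s2][o] for s in range(n) for s2 in range(n))

def state_estimator(belief, a, o, T, Z):
    # SE(b, a, o'): the belief update from the previous slide
    n, p_o = len(belief), obs_prob(belief, a, o, T, Z)
    if p_o == 0:
        return belief
    return [Z[a][s2][o] * sum(belief[s] * T[a][s][s2] for s in range(n)) / p_o
            for s2 in range(n)]

def value(belief, horizon, actions, observations, R, T, Z, gamma=0.9):
    # V(b) = max_a [ r(b, a) + gamma * sum_o' P(o'|a,b) * V(SE(b, a, o')) ]
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        v = reward(belief, a, R)
        for o in observations:
            p_o = obs_prob(belief, a, o, T, Z)
            if p_o > 0:
                b_next = state_estimator(belief, a, o, T, Z)
                v += gamma * p_o * value(b_next, horizon - 1, actions, observations, R, T, Z, gamma)
        best = max(best, v)
    return best

# Toy usage: 2 states, 2 actions, 2 observations (numbers are illustrative).
R = [[1.0, 0.0], [0.0, 1.0]]
T = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]
Z = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]
print(value([0.5, 0.5], horizon=2, actions=[0, 1], observations=[0, 1], R=R, T=T, Z=Z))
```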
  • 99. Applying POMDP to Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201599
 POMDP → Dynamic IR
 Environment → Documents
 Agents → User, search engine
 States → Queries, user’s decision making status, relevance of documents, etc.
 Actions → Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters for a search technology
 Observations → Queries, clicks, document lists, snippets, terms, etc.
 Rewards → Evaluation measures (such as DCG, NDCG or MAP); clicking information
 Transition matrix → Given in advance or estimated from training data
 Observation function → Problem dependent; estimated based on sample datasets
  • 100. Session Search Example - States 100 SRT Relevant & Exploitation SRR Relevant & Exploration SNRT Non-Relevant & Exploitation SNRR Non-Relevant & Exploration  scooter price ⟶ scooter stores  Hartford visitors ⟶ Hartford Connecticut tourism  Philadelphia NYC travel ⟶ Philadelphia NYC train  distance New York Boston ⟶ maps.bing.com q0 [ J. Luo ,et al., ’14] Dynamic Information Retrieval Modeling Tutorial 2015
  • 101. Session Search Example - Actions (Au, Ase) 101  User Action(Au)  Add query terms (+Δq)  Remove query terms (-Δq)  keep query terms (qtheme)  clicked documents  SAT clicked documents  Search Engine Action(Ase)  increase/decrease/keep term weights,  Switch on or switch off query expansion  Adjust the number of top documents used in PRF  etc. [ J. Luo et al., ’14] Dynamic Information Retrieval Modeling Tutorial 2015
  • 102. TREC Session Tracks (2010- 2012)  Given a series of queries {q1,q2,…,qn}, top 10 retrieval results {D1, … Di-1 } for q1 to qi-1, and click information  The task is to retrieve a list of documents for the current/last query, qn  Relevance judgment is made based on how relevant the documents are for qn, and how relevant they are for information needs for the entire session (in topic description)  no need to segment the sessions 102 Dynamic Information Retrieval Modeling Tutorial 2015
  • 103. Query change is an important form of feedback  We define query change as the syntactic editing change between two adjacent queries: Δqi = qi − qi-1  It includes +Δqi, the added terms, and −Δqi, the removed terms  The unchanged/shared terms are called qtheme, the theme terms 103 q1 = “bollywood legislation” q2 = “bollywood law” -------------------------------------- Theme Term = “bollywood” Dynamic Information Retrieval Modeling Tutorial 2015
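A small sketch of this decomposition, assuming a simple whitespace, bag-of-words treatment of queries (the actual tokenization used in the paper may differ):

```python
# Query-change decomposition: +Δq (added), -Δq (removed), q_theme (shared).

def query_change(prev_query, curr_query):
    prev_terms, curr_terms = set(prev_query.split()), set(curr_query.split())
    added = curr_terms - prev_terms        # +Δq_i
    removed = prev_terms - curr_terms      # -Δq_i
    theme = prev_terms & curr_terms        # q_theme
    return added, removed, theme

print(query_change("bollywood legislation", "bollywood law"))
# ({'law'}, {'legislation'}, {'bollywood'})
```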
  • 104. Where do these query changes come from?  Given TREC Session settings, we consider two sources of query change:  the previous search results that a user viewed/read/examined  the information need  Example:  Kurosawa  Kurosawa wife  `wife’ is not in any previous results, but in the topic description  However, knowing information needs before search is difficult to achieve 104 Dynamic Information Retrieval Modeling Tutorial 2015
  • 105. Previous search results could influence query change in quite complex ways  Merck lobbyists  Merck lobbying US policy  D1 contains several mentions of ‘policy’, such as  “A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …”  These mentions are about Canadian policies, while the user adds US policy in q2  Our guess is that the user might be inspired by ‘policy’, but prefers a sub-concept other than ‘Canadian policy’  Therefore, among the added terms ‘US policy’, ‘US’ is the novel term, and ‘policy’ is not, since it appeared in D1  The two terms should be treated differently 105 Dynamic Information Retrieval Modeling Tutorial 2015
  • 106. 106/33 POMDP Rich Interactions Hidden, Evolving Information Needs A Long Term Goal Temporal Dependency actions hidden states rewards Markov property POMDP (Partially Observable Markov Decision Process) SG (Stochastic Games) Multi-agent Collaboration
  • 107. Recap – Characteristics of Dynamic IR Dynamic Information Retrieval Modeling Tutorial 2015107  Rich interactions Query formulation, Document clicks, Document examination, eye movement, mouse movements, etc.  Temporal dependency  Overall goal
  • 108. Modeling Query Change  A framework that is inspired by Reinforcement Learning  Reinforcement Learning for Markov Decision Processes  models a state space S and an action space A according to a transition model T = P(si+1 | si, ai)  a policy π(s) = a indicates which action a the agent takes at state s  each state and action is associated with a reward function R that indicates the possible positive reward or negative loss they may result in  Reinforcement learning offers general solutions to MDPs and seeks the best policy for an agent (see the sketch below).108
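For readers less familiar with MDP solutions, here is a textbook value-iteration sketch in Python; the toy inputs T and R are assumptions, and this is not the specific session-search model described on the slide.

```python
# Standard value iteration for a small, fully observable MDP.
# Assumed inputs: T[s][a][s2] = P(s2 | s, a), R[s][a] = immediate reward.

def value_iteration(T, R, gamma=0.9, iters=100):
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
                 for a in range(n_actions))
             for s in range(n_states)]
    # Greedy policy with respect to the converged values
    policy = [max(range(n_actions),
                  key=lambda a: R[s][a] + gamma * sum(T[s][a][s2] * V[s2]
                                                      for s2 in range(n_states)))
              for s in range(n_states)]
    return V, policy

# Toy usage: 2 states, 2 actions (numbers are illustrative).
T = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.9, 0.1]]]
R = [[1.0, 0.0], [0.0, 2.0]]
print(value_iteration(T, R))
```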
  • 109. Outline Dynamic Information Retrieval Modeling Tutorial 2015109  Introduction & Theory  Session Search  Dynamic Ranking  Multi Armed Bandits  Portfolio Ranking  Multi-Page Search  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 110. Dynamic Information Retrieval Modeling Tutorial 2015110  Markov Process  Hidden Markov Model  Markov Decision Process  Partially Observable Markov Decision Process  Multi-Armed Bandit Family of Markov Models
  • 111. Multi Armed Bandits (MAB) Dynamic Information Retrieval Modeling Tutorial 2015111 …… …… Which slot machine should I select in this round? Reward
  • 112. Multi Armed Bandits (MAB) Dynamic Information Retrieval Modeling Tutorial 2015112 I won! Is this the best slot machine? Reward
  • 113. MAB Definition Dynamic Information Retrieval Modeling Tutorial 2015113  A tuple (S, A, R, B) S : hidden reward distribution of each bandit A: choose which bandit to play R: reward for playing bandit B: belief space, our estimate of each bandit’s distribution
  • 114. Comparison with Markov Models Dynamic Information Retrieval Modeling Tutorial 2015114  A single-state Markov Decision Process, with no transition probability  Similar to POMDP in that we maintain a belief state  Action = choose a bandit; the action does not affect the state  Does not ‘plan ahead’ but intelligently adapts  Somewhere between interactive and dynamic IR
  • 115. MAB Policy Reward Dynamic Information Retrieval Modeling Tutorial 2015115  MAB algorithm describes a policy 𝜋 for choosing bandits  Maximise rewards from chosen bandits over all time steps  Minimize regret: Σt=1..T [ Reward(a*) − Reward(aπ(t)) ]  Cumulative difference between optimal reward and actual reward
  • 116. Exploration vs Exploitation Dynamic Information Retrieval Modeling Tutorial 2015116  Exploration  Try out bandits to find which has the highest average reward  Too much exploration leads to poor performance  Exploitation  Play bandits that are known to pay out higher reward on average  MAB algorithms balance exploration and exploitation  Start by exploring more to find the best bandits  Exploit more as the best bandits become known
  • 117. MAB – Index Algorithms Dynamic Information Retrieval Modeling Tutorial 2015117  Gittins index1  Play bandit with highest ‘Dynamic Allocation Index’  Modelled using MDP but suffers ‘curse of dimensionality’  𝜖-greedy2  Play highest reward bandit with probability 1 − ϵ  Play random bandit with probability 𝜖  UCB (Upper Confidence Bound)3 1J. C. Gittins ‘89 2Nicolò Cesa-Bianchi et al. ‘98 3Peter Auer et al. ‘02
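As an illustration of the index-style policies above, a minimal ϵ-greedy sketch on hypothetical Bernoulli arms (a UCB sketch follows the UCB slides below):

```python
# Epsilon-greedy bandit policy on made-up Bernoulli arms (illustration only).
import random

def epsilon_greedy(true_probs, epsilon=0.1, rounds=1000):
    n = len(true_probs)
    counts, sums = [0] * n, [0.0] * n
    total_reward = 0.0
    for _ in range(rounds):
        if random.random() < epsilon or 0 in counts:
            arm = random.randrange(n)                               # explore
        else:
            arm = max(range(n), key=lambda i: sums[i] / counts[i])  # exploit
        reward = 1.0 if random.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward, counts

print(epsilon_greedy([0.2, 0.5, 0.7]))  # the third arm should attract most plays
```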
  • 118. Comparison of Markov Models Dynamic Information Retrieval Modeling Tutorial 2015118  Markov Process – a fully observable stochastic process  Hidden Markov Model – a partially observable stochastic process  MDP – a fully observable decision process  MAB – a decision process, either fully or partially observable  POMDP – a partially observable decision process
 (actions / rewards / states)
 Markov Process: No / No / Observable
 Hidden Markov Model: No / No / Unobservable
 MDP: Yes / Yes / Observable
 POMDP: Yes / Yes / Unobservable
 MAB: Yes / Yes / Fixed
  • 119. Outline Dynamic Information Retrieval Modeling Tutorial 2015119  Introduction & Theory  Session Search  Dynamic Ranking  Multi Armed Bandits  Portfolio Ranking  Multi-Page Search  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 120. UCB Algorithm Dynamic Information Retrieval Modeling Tutorial 2015120  x̄i + √(2 ln t / Ti)
  • 121. UCB Algorithm Dynamic Information Retrieval Modeling Tutorial 2015121  x̄i + √(2 ln t / Ti)  Calculate for all 𝑖 and select highest
  • 122. UCB Algorithm Dynamic Information Retrieval Modeling Tutorial 2015122  x̄i + √(2 ln t / Ti)  Calculate for all 𝑖 and select highest  Average reward x̄i
  • 123. UCB Algorithm Dynamic Information Retrieval Modeling Tutorial 2015123  x̄i + √(2 ln t / Ti)  Calculate for all 𝑖 and select highest  Average reward x̄i  Time step 𝑡
  • 124. UCB Algorithm Dynamic Information Retrieval Modeling Tutorial 2015124  x̄i + √(2 ln t / Ti)  Calculate for all 𝑖 and select highest  Average reward x̄i  Time step 𝑡  Number of times bandit 𝑖 has been played Ti
  • 125. UCB Algorithm Dynamic Information Retrieval Modeling Tutorial 2015125  x̄i + √(2 ln t / Ti)  Calculate for all 𝑖 and select highest  Average reward x̄i  Time step 𝑡  Number of times bandit 𝑖 has been played Ti  The chance of playing infrequently played bandits increases over time
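A minimal UCB1 sketch implementing the index above on hypothetical Bernoulli arms; the initialisation and tie-breaking details vary across formulations.

```python
# UCB1: play each arm once, then always play argmax_i  x̄_i + sqrt(2 ln t / T_i).
import math, random

def ucb1(true_probs, rounds=1000):
    n = len(true_probs)
    counts, means = [0] * n, [0.0] * n
    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1                                    # initialise: play each arm once
        else:
            arm = max(range(n),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental average
    return counts

print(ucb1([0.2, 0.5, 0.7]))  # the last arm should collect most of the plays
```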
  • 126. Iterative Expectation Dynamic Information Retrieval Modeling Tutorial 2015126  x̄i + √(2 ln t / Ti) [M. Sloan and J. Wang ‘13]
  • 127. UCB Algorithm Dynamic Information Retrieval Modeling Tutorial 2015127  x̄i + √(2 ln t / Ti)  Documents 𝑖 [M. Sloan and J. Wang ‘13]
  • 128. Iterative Expectation Dynamic Information Retrieval Modeling Tutorial 2015128  r̄i + √(2 ln t / Ti)  Documents 𝑖  Average probability of relevance r̄i [M. Sloan and J. Wang ‘13]
  • 129. Iterative Expectation Dynamic Information Retrieval Modeling Tutorial 2015129  r̄i + √(2 ln t / γi(t))  Documents 𝑖  Average probability of relevance r̄i  ‘Effective’ number of impressions γi(t) = Σk=1..t α^Ck β^(1−Ck)  𝛼 and 𝛽 reward clicks and non-clicks depending on rank [M. Sloan and J. Wang ‘13]
  • 130. Iterative Expectation Dynamic Information Retrieval Modeling Tutorial 2015130  r̄i + λ√(2 ln t / γi(t))  Documents 𝑖  Average probability of relevance r̄i  ‘Effective’ number of impressions γi(t) = Σk=1..t α^Ck β^(1−Ck)  𝛼 and 𝛽 reward clicks and non-clicks depending on rank  Exploration parameter 𝜆 [M. Sloan and J. Wang ‘13]
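A rough sketch of how the exploratory document score above could be maintained; the running update of r̄_i and the per-rank choice of α and β are assumptions made for illustration, not the exact scheme of the cited paper.

```python
# Exploratory document score: r̄_i + λ * sqrt(2 ln t / γ_i(t)), where γ_i(t) grows
# by α for a clicked impression and β for an unclicked one (fixed here for simplicity).
import math

class ExploratoryDoc:
    def __init__(self, prior_relevance=0.5, alpha=1.0, beta=0.5):
        self.r = prior_relevance   # r̄_i, average probability of relevance
        self.gamma = 1e-6          # γ_i(t), effective impressions (avoids divide-by-zero)
        self.alpha, self.beta = alpha, beta

    def update(self, clicked):
        weight = self.alpha if clicked else self.beta
        observed = 1.0 if clicked else 0.0
        # Weighted running average of observed relevance (assumed update rule)
        self.r = (self.r * self.gamma + observed * weight) / (self.gamma + weight)
        self.gamma += weight

    def score(self, t, lam=1.0):
        return self.r + lam * math.sqrt(2 * math.log(t) / self.gamma)

doc = ExploratoryDoc()
doc.update(clicked=True)
doc.update(clicked=False)
print(doc.score(t=10))
```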
  • 131. Portfolio Theory of IR Dynamic Information Retrieval Modeling Tutorial 2015131  Portfolio Theory maximises expected return for a given amount of risk1  Diversity of portfolio increases likely return  We can consider documents as ‘shares’  Documents are dependent on one another, unlike PRP  Portfolio Theory of IR2 allows us to introduce diversity 1H. Markowitz. ‘52 2J. Wang et. al. ‘09
  • 132. Portfolio Ranking Dynamic Information Retrieval Modeling Tutorial 2015132  Documents are dependent on each other  Co-click Matrix from users and logs1  Portfolio Armed Bandit Ranking2:  Exploratively rank using Iterative Expectation  Diversify using portfolio optimisation over the co-click matrix  Update relevance and dependence with each click  Both explorative and diverse 1W. Wu et al. ‘11 2M. Sloan and J. Wang ‘12
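To illustrate the flavour of portfolio-style diversification, here is a hedged greedy sketch that trades relevance against co-click covariance; the cited papers derive the exact objective, rank weights and updates differently.

```python
# Greedy mean-variance style re-ranking: pick the next document maximising
# relevance - b * marginal risk, where risk comes from a covariance/co-click matrix.
import numpy as np

def portfolio_rank(relevance, covariance, k, b=0.5):
    relevance, covariance = np.asarray(relevance), np.asarray(covariance)
    selected, candidates = [], set(range(len(relevance)))
    while candidates and len(selected) < k:
        def objective(i):
            risk = covariance[i, i] + 2 * sum(covariance[i, j] for j in selected)
            return relevance[i] - b * risk
        best = max(candidates, key=objective)
        selected.append(best)
        candidates.remove(best)
    return selected

rel = [0.9, 0.85, 0.4, 0.3]
cov = np.array([[1.0, 0.9, 0.1, 0.0],
                [0.9, 1.0, 0.1, 0.0],
                [0.1, 0.1, 1.0, 0.2],
                [0.0, 0.0, 0.2, 1.0]])
print(portfolio_rank(rel, cov, k=3))  # docs 0 and 1 are near-duplicates, so one is demoted
```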
  • 133. Outline Dynamic Information Retrieval Modeling Tutorial 2015133  Introduction & Theory  Session Search  Dynamic Ranking  Multi Armed Bandits  Portfolio Ranking  Multi-Page Search  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 134. Multi Page Search Dynamic Information Retrieval Modeling Tutorial 2015134 Page 1 Page 2 2. 1. 2. 1. X Jin, M. Sloan and J. Wang ’13
  • 135. Multi Page Search Example - States & Actions Dynamic Information Retrieval Modeling Tutorial 2015135 State: Relevance of documents Action: Ranking of documents Observation: Clicks Belief: Multivariate Gaussian Reward: DCG over 2 pages X Jin, M. Sloan and J. Wang ’13
  • 136. Model Dynamic Information Retrieval Modeling Tutorial 2015136
  • 137. Model Dynamic Information Retrieval Modeling Tutorial 2015137  N(θ1, Σ1)  θ1 – prior estimate of relevance  Σ1 – prior estimate of covariance  Document similarity  Topic Clustering
  • 138. Model Dynamic Information Retrieval Modeling Tutorial 2015138  Rank action for page 1
  • 139. Model Dynamic Information Retrieval Modeling Tutorial 2015139
  • 140. Model Dynamic Information Retrieval Modeling Tutorial 2015140  Feedback from page 1  r1 ~ N(θs1, Σs1)
  • 141. Model Dynamic Information Retrieval Modeling Tutorial 2015141  Update estimates using r1 (standard Gaussian conditioning, with s the page-1 documents and s′ the remaining candidates)  θ1 = [θs ; θs′], Σ1 = [[Σs, Σss′], [Σs′s, Σs′]]  θ2 = θs′ + Σs′s Σs−1 (r1 − θs)  Σ2 = Σs′ − Σs′s Σs−1 Σss′
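The update is standard multivariate Gaussian conditioning, sketched below with NumPy; the document indices, prior values and covariances are illustrative only.

```python
# Condition the joint Gaussian prior on observed page-1 feedback r^1.
import numpy as np

def condition_on_page1(theta, Sigma, shown, r1):
    """theta, Sigma: prior mean/covariance over all documents.
    shown: indices of page-1 documents; r1: observed relevance feedback for them."""
    rest = [i for i in range(len(theta)) if i not in shown]
    t_s, t_r = theta[shown], theta[rest]
    S_ss = Sigma[np.ix_(shown, shown)]
    S_rs = Sigma[np.ix_(rest, shown)]
    S_rr = Sigma[np.ix_(rest, rest)]
    K = S_rs @ np.linalg.inv(S_ss)
    theta2 = t_r + K @ (r1 - t_s)   # posterior mean for the unshown documents
    Sigma2 = S_rr - K @ S_rs.T      # posterior covariance
    return rest, theta2, Sigma2

theta = np.array([0.9, 0.51, 0.5, 0.49])          # illustrative prior relevances
Sigma = 0.05 * np.eye(4)
Sigma[2, 3] = Sigma[3, 2] = 0.04                  # two candidate docs are correlated
print(condition_on_page1(theta, Sigma, shown=[0, 1], r1=np.array([1.0, 0.0])))
```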
  • 142. Model Dynamic Information Retrieval Modeling Tutorial 2015142  Rank using PRP
  • 143. Model Dynamic Information Retrieval Modeling Tutorial 2015143  Utility of Ranking  Us = λ Σj=1..M θsj1 / log2(j+1) + (1 − λ) Σj=M+1..2M θsj2 / log2(j+1)  DCG over both pages
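A small Python sketch of this two-page DCG-style utility, assuming page-1 documents are scored with the prior relevance estimates and page-2 documents with the updated ones:

```python
# Two-page DCG-style utility with an exploration/exploitation trade-off lambda.
import math

def two_page_utility(theta1_page1, theta2_page2, lam=0.5):
    """theta1_page1: prior relevance of the M docs ranked on page 1 (in rank order).
    theta2_page2: updated relevance of the M docs ranked on page 2 (in rank order)."""
    M = len(theta1_page1)
    page1 = sum(t / math.log2(j + 1) for j, t in enumerate(theta1_page1, start=1))
    page2 = sum(t / math.log2(j + 1) for j, t in enumerate(theta2_page2, start=M + 1))
    return lam * page1 + (1 - lam) * page2

print(two_page_utility([0.9, 0.51], [0.5, 0.49], lam=0.5))
```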
  • 144. Model – Bellman Equation Dynamic Information Retrieval Modeling Tutorial 2015144  Optimize s1 to improve Us2  V(θ1, Σ1, 1) = max s1 [ λ θs1 · W1 + E( V(θ2, Σ2, 2) | r1 ) ]
  • 145. 𝜆 Dynamic Information Retrieval Modeling Tutorial 2015145  Balances exploration and exploitation in page 1  Tuned for different queries  Navigational  Informational  𝜆 = 1 for non-ambiguous search
  • 146. Approximation Dynamic Information Retrieval Modeling Tutorial 2015146  Monte Carlo Sampling  ≈ max s1 [ λ θs1 · W1 + max s2 (1 − λ) (1/S) Σr∈O θs2 · W2 P(r) ]  Sequential Ranking Decision
  • 147. Experiment Data Dynamic Information Retrieval Modeling Tutorial 2015147  Difficult to evaluate without access to live users  Simulated using 3 TREC collections and relevance judgements  WT10G – Explicit Ratings  TREC8 – Clickthroughs  Robust – Difficult (ambiguous) search
  • 148. User Simulation Dynamic Information Retrieval Modeling Tutorial 2015148  Rank M documents  Simulated user clicks according to relevance judgements  Update page 2 ranking  Measure at page 1 and 2  Recall  Precision  nDCG  MRR  BM25 – prior ranking model
  • 149. Investigating λ Dynamic Information Retrieval Modeling Tutorial 2015149
  • 150. Baselines Dynamic Information Retrieval Modeling Tutorial 2015150  𝜆 determined experimentally  BM25  BM25 with conditional update (𝜆 = 1)  Maximum Marginal Relevance (MMR)  Diversification  MMR with conditional update  Rocchio  Relevance Feedback
  • 151. Results Dynamic Information Retrieval Modeling Tutorial 2015151
  • 156. Outline Dynamic Information Retrieval Modeling Tutorial 2015156  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 157. Cold-start problem in recommender systems
  • 159. Possible Solutions Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.
  • 160. Objective  Cold-start problem  Interactive mechanism for CF Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.
  • 161. Proposed EE algorithms  Thompson Sampling  Linear-UCB  General Linear-UCB Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.
  • 162. Cold-start users Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM 2013.
  • 163. Ad selection problem Dynamic Information Retrieval Modeling Tutorial 2015163  How can online publishers optimally select ads to maximize their ad income over time?  Selling in multiple channels with non-fixed prices Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  • 164. Dynamic Information Retrieval Modeling Tutorial 2015164 Problem formulation Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  • 165. Problem formulation Dynamic Information Retrieval Modeling Tutorial 2015165 Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  • 166. Objective function Dynamic Information Retrieval Modeling Tutorial 2015166 Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  • 167. Belief update Dynamic Information Retrieval Modeling Tutorial 2015167 Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  • 168. Results Dynamic Information Retrieval Modeling Tutorial 2015168 Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  • 169. Outline Dynamic Information Retrieval Modeling Tutorial 2015169  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 170. Dynamic Information Retrieval Evaluation Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling Charlie Clarke (with much much input from Mark Smucker) University of Waterloo, Canada
  • 171. Moving from static ranking to dynamic domains • How to extend IR evaluation methodologies to dynamic domains? • Three key ideas: 1. Realistic models of searcher interactions 2. Measure costs to the searcher in meaningful units (e.g., time, money, …) 3. Measure benefits to the searcher in meaningful units (e.g., time, nuggets, …) Charles Clarke, University of Waterloo 171 This talk strongly reflects my opinions (not trying to be neutral). But I am the guest speaker :)
  • 172. Evaluating Information Access Systems Charles Clarke, University of Waterloo 172 searching, browsing, summarization, visualization, desktop, mobile, web, books, images, questions, etc., and combinations of these Does the system work for its users? Will this change make the system better or worse? How do we quantify performance?
  • 173. Performance 101: Is this a good search result? Charles Clarke, University of Waterloo 173
  • 174. How to evaluate? Study users Charles Clarke, University of Waterloo 174 Users in the wild: • A/B Testing • Result interleaving • Clicks and dwell time • Mouse movements • Other implicit feedback • … Users in the lab: • Time to task completion • Think aloud protocols • Questionnaires • Eye tracking • …
  • 175. Unfortunately user studies are • Slow • Expensive • Conditions can never be exactly duplicated (e.g., learning to rank) Charles Clarke, University of Waterloo 175
  • 176. Alternative: User performance prediction Can we predict the impact of a proposed change to an information access system (while respecting and reflecting differences between users)? Can we quantify performance improvements in meaningful units so that effect sizes can be considered in statistical testing? Are improvements practically significant, as well as statistically significant? Want to predict the impact of a proposed change automatically, based on existing user performance data, rather than gathering new performance data. Charles Clarke, University of Waterloo 176 The BIG goal ↵
  • 177. Traditional Evaluation of Rankers • Test collection: – Documents – Queries – Relevance judgments • Each ranker generates a ranked list of documents for each query • Score ranked lists using relevance judgments and standard metrics (recall, mean average precision, nDCG, ERR, RBP, ….). Charles Clarke, University of Waterloo 177
  • 178. Charles Clarke, University of Waterloo 178 Example of a good-old-fashioned IR Metric  Ranked List of Documents: 1. Non-relevant 2. Relevant 3. Non-relevant 4. Non-relevant 5. Relevant 6. Non-relevant 7. Non-relevant 8. …  Precision at Rank N: 0.00 0.50 0.33 0.25 0.40 0.33 0.29 …  Precision at rank N is the fraction of documents that are relevant in the first N documents.  Average Precision is the average of the precision at N for each relevant document: AP = (1/R) ΣRi Prec(Ri)  Mean average precision (MAP) is AP averaged over the set of queries.
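A quick Python check of the precision-at-N and AP numbers in the example above (here R is taken as the number of relevant documents in the shown list):

```python
# Precision@N and Average Precision for the example ranking on the slide above.

def precision_at_n(relevances, n):
    return sum(relevances[:n]) / n

def average_precision(relevances, total_relevant=None):
    R = total_relevant or sum(relevances)
    return sum(precision_at_n(relevances, i + 1)
               for i, rel in enumerate(relevances) if rel) / R

ranking = [0, 1, 0, 0, 1, 0, 0]   # 1 = relevant, 0 = non-relevant
print([round(precision_at_n(ranking, n), 2) for n in range(1, 8)])
# [0.0, 0.5, 0.33, 0.25, 0.4, 0.33, 0.29]
print(round(average_precision(ranking), 2))   # (0.5 + 0.4) / 2 = 0.45
```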
  • 179. General form of effectiveness measures Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision,…): Charles Clarke, University of Waterloo 179 Measure = Normalization × Σ over ranks k of (Gain at rank k) × (Discount factor at rank k)
  • 180. Implicit user model… • User works down the ranked list spending equal time on each document. Captions, navigation, etc., have no impact. • If they make it to rank i, they receive some benefit (i.e., gain). • Eventually they stop, which is reflected in the discount (i.e., they are less likely to reach lower ranks). • Normalization typically maps the score into the range [0:1]. Units may not be meaningful. Charles Clarke, University of Waterloo 180
  • 181. Traditional Evaluation of Rankers • Many effectiveness measures: precision, recall, average precision, rank-biased precision, discounted cumulative gain, etc. • Widely used and accepted as standard practice. • But… • What does an improvement in average precision from 0.28 to 0.31 mean to users? • Does an increase in the measure really translate to an improved user experience? • How will an improvement in the performance of a single component impact overall system performance? Charles Clarke, University of Waterloo 181
  • 182. How to better reflect user variation and system performance? Charles Clarke, University of Waterloo 182 Example: What’s the simplest possible user interface for search? 1) User issues a query 2) System returns material to read i.e., system returns stuff to read, in order (not a list of documents; more like a newspaper article) A correspondingly simple user model, has two parameters: 1) Reading speed 2) Time spent reading
  • 183. Reading speed distribution (from users in the lab) Charles Clarke, University of Waterloo 183 Empirical distribution of reading speed during an information access task, and its fit to a log-normal distribution.
  • 184. Stopping time distribution (from users in the wild) Charles Clarke, University of Waterloo 184 Empirical distribution of time spent searching during an information access task, and its fit to a log-normal distribution.
  • 185. Evaluating a search result Charles Clarke, University of Waterloo 185 1) Generate a reading speed from the distribution 2) Generate a stopping time from the distribution 3) How much useful material did the user read? 4) Repeat for many (simulated) users As an example, we use passage retrieval runs from the TREC 2004 HARD Track, which essentially assume our simple user interface. We measure costs to the searcher in terms of time spent searching. We measure benefits to the searcher in terms of “time well spent”.
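A hedged sketch of this simulation loop; the log-normal parameters and the passage data below are made up for illustration, not the calibrated distributions from the lab and log studies.

```python
# Simulate users: draw reading speed and stopping time from log-normal distributions,
# then measure "time well spent" (time spent reading relevant material).
import math, random

def simulate_user(result_passages, mu_speed=math.log(30), sigma_speed=0.5,
                  mu_stop=math.log(300), sigma_stop=1.0):
    """result_passages: list of (num_characters, is_relevant) in presentation order."""
    chars_per_sec = random.lognormvariate(mu_speed, sigma_speed)
    stop_after = random.lognormvariate(mu_stop, sigma_stop)       # seconds
    time_spent = useful_time = 0.0
    for n_chars, relevant in result_passages:
        read_time = min(n_chars / chars_per_sec, max(0.0, stop_after - time_spent))
        if read_time <= 0:
            break
        time_spent += read_time
        if relevant:
            useful_time += read_time      # "time well spent"
    return time_spent, useful_time

results = [(800, False), (1200, True), (600, False), (900, True)]  # toy run
samples = [simulate_user(results) for _ in range(10000)]
print(sum(u for _, u in samples) / len(samples))  # mean time well spent (seconds)
```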
  • 186. Useful characters read vs. Characters read Charles Clarke, University of Waterloo 186 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  • 187. Useful characters read vs. Time spent reading Charles Clarke, University of Waterloo 187 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  • 188. Time well spent vs. Time spent reading Charles Clarke, University of Waterloo 188 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  • 189. Distribution of time well spent Charles Clarke, University of Waterloo 189 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  • 190. Temporal precision vs. Time spent Reading Charles Clarke, University of Waterloo 190 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  • 191. Distribution of temporal precision Charles Clarke, University of Waterloo 191 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  • 192. General Framework (Part I): Cumulative Gain • Consider the performance of a system in terms of a cost-benefit (cumulative gain) curve G(t). – Measure costs (e.g., in terms of time spent). – Measure benefits (e.g., in terms of time well spent). • A particular instance of G(t) represents a single user (described by a set of parameters) interacting with a system. not just a list!!! • G(t) captures factors intrinsic to the system. We don’t know how much time the user has to invest, but for different levels of investment, G(t) indicates the benefit.Charles Clarke, University of Waterloo 192
  • 193. General Framework (Part II): Decay • Consider the user’s willingness to invest time in terms of a decay curve D(t), which provides a survival probability. • We assume that G(t) and D(t) are independent. (System dependent stopping probabilities are accommodated in G(t). Details on request.) • D(t) captures factors extrinsic to the system. The user only has so much time they could invest. They cannot invest more, even if they would receive substantial additional benefit from further interaction. Charles Clarke, University of Waterloo 193
  • 194. General form of effectiveness measures (REMINDER) Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision,…): Charles Clarke, University of Waterloo 194 Measure = Normalization × Σ over ranks k of (Gain at rank k) × (Discount factor at rank k)
  • 195. General Framework (Part III): Time-biased gain Overall system performance may be expressed as expected cumulative gain (which also incorporates standard effectiveness measures): Charles Clarke, University of Waterloo 195 Expected gain = Normalization (== 1?) × ∫ (Gain at time t) × (Decay factor at time t) dt
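A hedged LaTeX rendering of the expression implied by the labels above, in the spirit of time-biased gain; the exact notation on the original slide is assumed:

```latex
% Expected cumulative gain over a population of stopping times:
% g(t) is the gain accrued at time t, D(t) the probability the user is still searching at t.
\mathbb{E}[G] \;=\; \frac{1}{\mathcal{N}} \int_{0}^{\infty} g(t)\, D(t)\, dt ,
\qquad \text{with } \mathcal{N} = 1 \text{ in the unnormalized case.}
```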
  • 196. General Framework (Part IV): Multiple users • Cumulative gain may be computed by – Simulation (drawing a set of parameters from a population of users). – Measuring actual interaction on live systems. – Combinations of measurement and simulation. • Simulating and/or measuring multiple users allows us to consider performance difference across the population of users. • Simulation provides matching pairs (the same user on both systems) increasing our ability to detect differences. Charles Clarke, University of Waterloo 196
  • 197. General Framework Most of the evaluation proposals in the references can be reformulated in terms of this general framework, including those that address issues of: – Novelty and diversity – Filtering, summarization, question answering – Session search, etc. Charles Clarke, University of Waterloo 197 One more example from our current research…
  • 198. Session search example • Two (or more) result lists, e.g., from query reformulation, query suggestion, or switching search engines. • Modeling searcher interaction requires a switch from one result to another. • The optimal time to switch depends on the total time available to search. For example (with many details omitted…): Charles Clarke, University of Waterloo 198
  • 199. Simulation of searchers switching between lists: A vs. B Charles Clarke, University of Waterloo 199 User starts on list A. If the user has less than five minutes to search, they should stay on list A. If the user has more than five minutes to search, they should leave list A after 90 seconds. But can we assume optimal behavior when modeling users?
  • 200. Simulation of searchers switching between lists: A vs. B Charles Clarke, University of Waterloo 200 [Figure: Average Gain (relevant documents) vs. Switch Time (minutes), one curve per Session Duration of 2, 4, 6, 8 and 10 minutes; Topic = 389, List A = sab05ror1, List B = uic0501] Different view of the same simulation, with thousands of simulated users. Here, benefits are measured by number of relevant documents seen. Optimal switching time depends on session duration.
  • 201. Summary • Primary goal of IR evaluation: Predict how changes to an IR system will impact the user experience. • Evaluation in dynamic domains requires us to explicitly model the system interface and the user’s search behavior. Costs and benefits must be measured in meaningful units (e.g., time). • Successful IR evaluation requires measurement of users, both “in the wild” and in the lab. These measurements calibrate models, which make predictions, which improve systems. Charles Clarke, University of Waterloo 201
  • 202. A few key papers • Leif Azzopardi. 2009. Usage based effectiveness measures: monitoring application performance in information retrieval. In Proceedings of the 18th ACM conference on Information and knowledge management (CIKM '09). • Leif Azzopardi, Diane Kelly, and Kathy Brennan. 2013. How query cost affects search behavior. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '13). • Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time drives interaction: simulating sessions in diverse searching environments. In Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval (SIGIR '12). • Ben Carterette. 2011. System effectiveness, user models, and user utility: a conceptual framework for investigation. In Proceedings of the 34th international ACM SIGIR conference on research and development in Information Retrieval (SIGIR '11). • Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2011. Simulating simple user behavior for system effectiveness evaluation. In Proceedings of the 20th ACM international conference on information and knowledge management (CIKM '11). • Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2012. Incorporating variability in user behavior into systems based evaluation. In Proceedings of the 21st ACM international conference on information and knowledge management (CIKM '12). Charles Clarke, University of Waterloo 202
  • 203. A few more key papers • Olivier Chapelle, Donald Metlzer, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on information and knowledge management (CIKM '09). • Charles L.A. Clarke, Nick Craswell, Ian Soboroff, and Azin Ashkan. 2011. A comparative analysis of cascade measures for novelty and diversity. In Proceedings of the fourth ACM international conference on web search and data mining (WSDM '11). • Charles L. A. Clarke and Mark D. Smucker. 2014. Time well spent. In Proceedings of the 5th information interaction in context symposium (IIiX '14). • Georges Dupret and Mounia Lalmas. 2013. Absence time and user engagement: evaluating ranking functions. In Proceedings of the sixth ACM international conference on web search and data mining (WSDM '13). • Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008. Discounted cumulated gain based evaluation of multiple-query IR sessions. In Proceedings of the IR research, 30th European conference on Advances in information retrieval (ECIR'08). • Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and the cube test: multi-dimensional evaluation for professional search. In Proceedings of the 22nd ACM international conference on information & knowledge management (CIKM '13). Charles Clarke, University of Waterloo 203
  • 204. And yet more key papers • Tetsuya Sakai and Zhicheng Dou. 2013. Summaries, ranked retrieval and sessions: a unified framework for information access evaluation. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '13). • Mark D. Smucker and Charles L.A. Clarke. 2012. Time-based calibration of effectiveness measures. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '12). • Mark D. Smucker and Charles L. A. Clarke. 2012. Modeling user variance in time-biased gain. In Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval (HCIR '12). • Emine Yilmaz, Milad Shokouhi, Nick Craswell, and Stephen Robertson. 2010. Expected browsing utility for web search evaluation. In Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10). • Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session information distillation. In Proceedings of the 2nd international conference on the theory of information retrieval (ICTIR ’09). • Plus many other (ask me). Charles Clarke, University of Waterloo 204
  • 205. Dynamic Information Retrieval Evaluation Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling Charlie Clarke University of Waterloo, Canada Thank you!
  • 206. Outline Dynamic Information Retrieval Modeling Tutorial 2015206  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  • 207. Apply an MDP to an IR Problem Dynamic Information Retrieval Modeling Tutorial 2015207  We can model IR systems using a Markov Decision Process  Is there a temporal component?  States – What changes with each time step?  Actions – How does your system change the state?  Rewards – How do you measure feedback or effectiveness in your problem at each time step?  Transition Probability – Can you determine this?
  • 208. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval Modeling Tutorial 2015208  User agent in session search  States – user’s relevance judgement  Action – new query  Reward – information gained [Luo, Zhang, Yang SIGIR’14]
  • 209. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval Modeling Tutorial 2015209  Search engine’s perspective  What if we can’t directly observe user’s relevance judgement?  Click ≠ relevance ? ? ? ?
  • 210. Applying POMDP to Dynamic IR Dynamic Information Retrieval Modeling Tutorial 2015210
 POMDP → Dynamic IR
 Environment → Documents
 Agents → User, search engine
 States → Queries, user’s decision making status, relevance of documents, etc.
 Actions → Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters for a search technology
 Observations → Queries, clicks, document lists, snippets, terms, etc.
 Rewards → Evaluation measures (such as DCG, NDCG or MAP); clicking information
 Transition matrix → Given in advance or estimated from training data
 Observation function → Problem dependent; estimated based on sample datasets
  • 211.  WSDM Tutorial February 2nd 2015 Grace Hui Yang Marc Sloan Jun Wang  Guest Speaker: Charlie Clarke Dynamic Information Retrieval Modeling Panel Discussion
  • 212. Outline Dynamic Information Retrieval Modeling Tutorial 2015212  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel  Conclusion
  • 213. Conclusions Dynamic Information Retrieval Modeling Tutorial 2015213  Dynamic IR describes a new class of interactive model  Incorporates rich feedback, temporal dependency and is goal oriented.  Family of Markov models and Multi Armed Bandit theory useful in building DIR models  Applicable to a range of IR problems  Useful in applications such as session search and evaluation
  • 214. Dynamic IR Book Dynamic Information Retrieval Modeling Tutorial 2015214  Published by Morgan & Claypool  ‘Synthesis Lectures on Information Concepts, Retrieval, and Services’  Due April / May 2015 (in time for SIGIR 2015)
  • 215. TREC 2015 Dynamic Domain Track  Co-organized by Grace Hui Yang, John Frank, Ian Soboroff  Underexplored subsets of Web content  Limited scope and richness of indexed content, which may not include relevant components of the deep web  temporary pages,  pages behind forms, etc.  Basic search interfaces, where there is little collaboration or history beyond independent keyword search  Complex, task-based, dynamic search  Temporal dependency  Rich interactions  Complex, evolving information needs  Professional users  A wide range of search strategies 215
  • 216. Task  An interactive search task with multiple runs of retrieval  Starting point: the system is given a search query  Iterate  System returns a ranked list of 5 documents  API returns relevance judgments  go to the next iteration of retrieval  until done (the system decides when to stop)  The goal of the system is to find relevant information for each topic as soon as possible  One-shot ad-hoc search is included  If the system decides to stop after iteration one 216
  • 217. Domains
 Illicit goods – Corpus: 30k forum posts from 5-10 forums (total ~300k posts). Which users are working together to sell illicit goods?
 Ebola – Corpus: one million tweets; 300k docs from in-country web sites (mostly official sites). Who is doing what and where?
 Local Politics – Corpus: 300k docs from local political groups in the Pacific Northwest and British Columbia. Who is campaigning for what and why?
 217
  • 218. Timeline  TREC Call for Participation: January 2015  Data Available: March  Detailed Guidelines: April/May  Topics, Tasks available: June  Systems do their thing: June-July  Evaluation: August  Results to participants: September  Conference: November 2015 218
  • 219. TREC 2015 Total Recall Track  Co-organized by Gord Cormack, Maura Grossman, Adam Roegiest, Charlie Clarke  Explores high recall tasks through an active learning process modeled on legal search tasks (eDiscovery, patent search).  Participating systems start with a topic and propose a relevant document.  Systems get immediate feedback on relevance.  They continue to propose additional documents and receive feedback until a stopping condition is reached.  Shared online infrastructure and collections with Dynamic Domain. Easy to participate in both, if you participate in one. 219
  • 220. Acknowledgment Dynamic Information Retrieval Modeling Tutorial 2015220  We thank Prof. Charlie Clarke for his guest lecture  We sincerely thank Dr. Xuchu Dong for his help in the preparation of the tutorial  We also thank the following colleagues for their comments and suggestions:  Dr Filip Radlinski  Prof. Maarten de Rijke
  • 221. References Dynamic Information Retrieval Modeling Tutorial 2015221 Static IR  Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.  The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.  Implicit User Modeling for Personalized Search. Xuehua Shen et al. CIKM, 2005.  A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.  Portfolio Theory of Information Retrieval. J. Wang and J. Zhu. In SIGIR 2009.
  • 222. References Dynamic Information Retrieval Modeling Tutorial 2015222 Interactive IR  Relevance Feedback in Information Retrieval. Rocchio, J. J. The SMART Retrieval System (pp. 313-23), 1971.  A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.  Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.  Dynamic Ranked Retrieval. Cristina Brandt et al. WSDM, 2011.  Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
  • 223. References Dynamic Information Retrieval Modeling Tutorial 2015223 Dynamic IR  A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR ’99, pages 214-221.  Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR 2002.  A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW 2003.  Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.  Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML 2009.  Meme-tracking and the dynamics of the news cycle. J. Leskovec, L. Backstrom, and J. Kleinberg. KDD 2009.
  • 224. References Dynamic Information Retrieval Modeling Tutorial 2015224 Dynamic IR  Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS 2009.  A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al. WSDM 2010.  A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.  Inferring search behaviors using partially observable markov model with duration (POMD). Yin He et al. WSDM, 2011.  No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI 2011.  Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.  Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS 2012.
  • 225. References Dynamic Information Retrieval Modeling Tutorial 2015225 Dynamic IR  Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In: WWW '12, pages 11-20.  Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM 2012.  Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR ’13, pages 453–462.  Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.  Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW ’13.  Interactive Collaborative Filtering. X. Zhao, W. Zhang, and J. Wang. In CIKM 2013.
  • 226. References Dynamic Information Retrieval Modeling Tutorial 2015226 Dynamic IR  Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR ’14.  Iterative Expectation for Multi-Period Information Retrieval. M. Sloan and J. Wang. In WSCD 2013.  Dynamical Information Retrieval Modelling: A Portfolio-Armed Bandit Machine Approach. M. Sloan and J. Wang. In WWW 2012.  Jiyun Luo, Sicong Zhang, Xuchu Dong and Hui Yang. Designing States, Actions, and Rewards for Using POMDP in Session Search. In ECIR 2015.  Sicong Zhang, Jiyun Luo, Hui Yang. A POMDP Model for Content-Free Document Re-ranking. In SIGIR 2014.
  • 227. References Dynamic Information Retrieval Modeling Tutorial 2015227 Markov Processes  A markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679–684, 1957.  Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.  Dynamic Programming and Markov Processes. R.A. Howard. MIT Press. 1960  Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960  Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie, Ted. The Annals of Mathematical Statistics 37, 1966
  • 228. References Dynamic Information Retrieval Modeling Tutorial 2015228 Markov Processes  Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3. 1988  Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162–175, 1991.  Q-Learning. Christopher J.C.H. Watkins, Peter Dayan. Machine Learning. 1992  Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.  Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.  Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99–134, 1998.
  • 229. References Dynamic Information Retrieval Modeling Tutorial 2015229 Markov Processes  Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis, Carnegie Mellon. 2003.  VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. In NIPS 2004, pages 1081–1088.  Finding Approximate POMDP solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005.  Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, Volume 27, pages 335-380, 2006.  Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. MIT Press, 2005.
  • 230. References Dynamic Information Retrieval Modeling Tutorial 2015230 Markov Processes  The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E. J. Sondik. Operations Research. 1973.  Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.  An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591–600, 2006.  Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media. 2011.  Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi, Paul Fischer. ICML, pages 100-108, 1998.  Multi-armed bandit allocation indices. J. C. Gittins. Wiley, 1989.  Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47, Issue 2-3, 2002.