In Dynamic Information Retrieval Modeling we model dynamic systems that change or adapt over time or over a sequence of events, using a range of techniques from artificial intelligence and reinforcement learning. Many of the open problems in current IR research can be described as dynamic systems, for instance session search or computational advertising. State-of-the-art research provides solutions to these problems that respond to a changing environment, learn from past interactions and predict future utility. Advances in IR interfaces, personalization and ad display demand models that can react to users in real time and in an intelligent, contextual way.
The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling. We motivate a conceptual model linking static, interactive and dynamic retrieval, and use it to define dynamics within the context of IR. We then cover a number of algorithms and techniques from the artificial intelligence (AI) and online learning literature, such as Markov Decision Processes (MDPs), their partially observable variant (POMDPs) and multi-armed bandits. Following this, we describe how to identify dynamics in an IR problem and demonstrate how to model them using the described techniques. The remainder of the tutorial covers an array of state-of-the-art research on dynamic systems in IR and how they can be modeled using dynamic IR, using research on session search, multi-page search and online advertising as in-depth examples.
An older version of this tutorial, presented at SIGIR 2014, can be found at http://www.slideshare.net/marcCsloan/dynamic-information-retrieval-tutorial
The SIGIR 2014 version had a greater emphasis on the underlying theory and included a guest lecture on evaluation by Dr Emine Yilmaz. This newer version presents a wider range of applications of DIR in state-of-the-art research and includes a guest lecture on evaluation by Prof Charles Clarke.
http://www.dynamic-ir-modeling.org/
@inproceedings{Yang:2015:DIR:2684822.2697038,
author = {Yang, Hui and Sloan, Marc and Wang, Jun},
title = {Dynamic Information Retrieval Modeling},
booktitle = {Proceedings of the Eighth ACM International Conference on Web Search and Data Mining},
series = {WSDM '15},
year = {2015},
isbn = {978-1-4503-3317-7},
location = {Shanghai, China},
pages = {409--410},
numpages = {2},
url = {http://doi.acm.org/10.1145/2684822.2697038},
doi = {10.1145/2684822.2697038},
acmid = {2697038},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {dynamic information retrieval modeling, probabilistic relevance model, reinforcement learning},
}
Dynamic Information Retrieval Tutorial - WSDM 2015
1. WSDM Tutorial, February 2nd 2015
Dynamic Information Retrieval Modeling
Grace Hui Yang, Marc Sloan, Jun Wang
Guest Speaker: Charlie Clarke
2. Dynamic Information Retrieval
[Diagram: a user with an information need, the documents observed so far, and the documents left to explore]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.
3. Evolving IR
Paradigm shifts in IR as new models emerge, e.g. VSM → BM25 → Language Model: different ways of defining the relationship between query and document.
Static → Interactive → Dynamic: an evolution in modeling user interaction with the search engine.
4. Outline
Introduction & Theory: Static IR, Interactive IR, Dynamic IR
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
5. Conceptual Model – Static IR
[Diagram: Static IR → Interactive IR → Dynamic IR; Static IR uses no feedback]
6. Characteristics of Static IR
Does not learn directly from the user; parameters are updated periodically.
7. Commonly Used Static IR Models
BM25, PageRank, Language Model, Learning to Rank
9. Outline
Introduction & Theory: Static IR, Interactive IR, Dynamic IR
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
10. Conceptual Model – Interactive IR
[Diagram: Static IR → Interactive IR → Dynamic IR; Interactive IR exploits feedback]
11. Interactive Recommender Systems
Learn the user's taste interactively! At the same time, provide good recommendations!
12. Toy Example
Multi-page search scenario: a user image-searches for "jaguar". Rank two of the four results on each of two pages.
[Four images with relevance estimates r = 0.9, r = 0.51, r = 0.5, r = 0.49]
13. Toy Example – Static Ranking
Ranked according to the PRP:
Page 1: 1. r = 0.9, 2. r = 0.51; Page 2: 1. r = 0.5, 2. r = 0.49
14. Toy Example – Relevance Feedback
Interactive search: improve the 2nd page based on feedback from the 1st page, using clicks as relevance feedback and the Rocchio¹ algorithm on terms in each image's webpage:

$w_q' = \alpha w_q + \frac{\beta}{|D_r|}\sum_{d \in D_r} w_d - \frac{\gamma}{|D_n|}\sum_{d \in D_n} w_d$

The new query is closer to relevant documents and different from non-relevant documents.
¹Rocchio, J. J. '71; Baeza-Yates & Ribeiro-Neto '99
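To make the update concrete, here is a minimal sketch of the Rocchio update in Python over bag-of-words term-weight dictionaries; the α, β, γ values and the toy vectors are illustrative assumptions, not values from the tutorial.

```python
from collections import defaultdict

def rocchio(query_vec, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio update: move the query vector toward the centroid of
    clicked (relevant) documents and away from the centroid of unclicked ones."""
    new_q = defaultdict(float)
    for term, w in query_vec.items():
        new_q[term] += alpha * w
    for doc in relevant_docs:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant_docs)
    for doc in nonrelevant_docs:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant_docs)
    # Negative weights are usually clipped to zero.
    return {t: w for t, w in new_q.items() if w > 0}

# Toy usage: a click on a "jaguar car" page pulls the query toward car terms.
q = {"jaguar": 1.0}
clicked = [{"jaguar": 0.8, "car": 0.6}]
skipped = [{"jaguar": 0.7, "animal": 0.9}]
print(rocchio(q, clicked, skipped))
```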
15. Toy Example – Relevance Feedback
Ranked according to the PRP and Rocchio:
Page 1: 1. r = 0.9, 2. r = 0.51 (one result clicked); Page 2: 1. r = 0.5, 2. r = 0.49, updated using the click.
16. Toy Example – Relevance Feedback
No click when searching for animals: Page 1: 1. r = 0.9, 2. r = 0.51; the page 2 ranking is then uncertain.
17. Toy Example – Value Function
Optimize both pages using dynamic IR: the Bellman equation for the value function. Simplified example:

$V_t(\theta_t, \Sigma_t) = \max_{s_t}\Big[\theta_t\, s_t + E\big(V_{t+1}(\theta_{t+1}, \Sigma_{t+1}) \mid C_t\big)\Big]$

where $\theta_t, \Sigma_t$ are the relevance and covariance of the documents for page $t$, $C_t$ are the clicks on page $t$, and $V_t$ is the 'value' of the ranking on page $t$. Maximize value over all pages based on estimated feedback.
X. Jin, M. Sloan and J. Wang '13
18. Toy Example – Covariance
The covariance matrix represents the similarity between the images:

$\Sigma = \begin{pmatrix} 1 & 0.8 & 0.1 & 0 \\ 0.8 & 1 & 0.1 & 0 \\ 0.1 & 0.1 & 1 & 0.95 \\ 0 & 0 & 0.95 & 1 \end{pmatrix}$

X. Jin, M. Sloan and J. Wang '13
19. Toy Example – Myopic Value
For the myopic ranking, $V_2 = 16.380$.
X. Jin, M. Sloan and J. Wang '13
20. Toy Example – Myopic Ranking
The page 2 ranking stays the same regardless of clicks on page 1.
X. Jin, M. Sloan and J. Wang '13
21. Toy Example – Optimal Value
For the optimal ranking, $V_2 = 16.528$.
X. Jin, M. Sloan and J. Wang '13
22. Toy Example – Optimal Ranking
If the car is clicked, the Jaguar logo is more relevant on the next page.
X. Jin, M. Sloan and J. Wang '13
23. Toy Example – Optimal Ranking
In all other scenarios, rank the animal first on the next page.
X. Jin, M. Sloan and J. Wang '13
24. Static IR Visualization
[Scatter plot: documents in a vector space; x = docs about apple (fruit), o = docs about apple iPhone, plus a doc about the Apple CEO]
Documents exist in vector space.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
25. Static IR Visualization
[Same scatter plot, with the query Q placed in the space]
t = 1: Static IR considers relevancy.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
27. Interactive IR Update
[Scatter plot: feedback on results (+1 on a clicked doc, −1 on skipped docs) moves the query from Q to Q′]
t = 1: Static IR considers relevancy. t = 2: Interactive IR considers local gain.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
29. Dynamic Ranking Principle
[Scatter plot with the query Q]
t = 1: Relevancy + Variance.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
30. Dynamic Ranking Principle
[Scatter plot with the query Q and feedback labels +1, −1, −1]
t = 1: Relevancy + Variance + |Correlations|.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
31. Dynamic Ranking Principle
t = 1: Relevancy + Variance + |Correlations| gives a diversified, exploratory relevance ranking.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
32. Dynamic Ranking Principle
[Scatter plot: after feedback, the query moves from Q to Q′]
t = 1: Relevancy + Variance + |Correlations| gives a diversified, exploratory relevance ranking. t = 2: Personalized re-ranking.
Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015
33. Interactive vs Dynamic IR
Interactive: treats interactions independently; responds to immediate feedback; static IR is used before feedback is received.
Dynamic: optimizes over the whole interaction; long-term gains; models future user feedback; also used at the beginning of the interaction.
34. Interactive & Dynamic Techniques
Interactive: the Rocchio equation in relevance feedback; collaborative filtering in recommender systems; active learning in interactive retrieval.
Dynamic: POMDPs in multi-page search and ad recommendation; multi-armed bandits in online evaluation; MDPs in session search.
35. Outline
Introduction & Theory: Static IR, Interactive IR, Dynamic IR
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
36. Conceptual Model – Dynamic IR
[Diagram: Static IR → Interactive IR → Dynamic IR; Dynamic IR explores and exploits feedback]
37. Characteristics of Dynamic IR
Rich interactions: query formulation, document clicks, document examination, eye movement, mouse movements, etc.
[Luo et al., IRJ under revision 2014]
38. Characteristics of Dynamic IR
Temporal dependency:
[Diagram: an information need I drives n search iterations; at iteration i the user issues query q_i, the engine returns ranked documents D_i, and the user clicks documents C_i]
[Luo et al., IRJ under revision 2014]
39. Characteristics of Dynamic IR
Overall goal: optimize over all iterations for a goal (an IR metric or user satisfaction) via an optimal policy.
[Luo et al., IRJ under revision 2014]
40. Dynamic Information Retrieval
The next generation search engine must handle:
Dynamic Users: users change behavior over time; user history.
Dynamic Documents: topic trends, filtering, document content change.
Dynamic Relevance: user-perceived relevance changes.
Dynamic Queries: changing query definitions, e.g. 'Twitter'.
Dynamic Information Needs: information needs evolve over time.
41. Why Not Existing Supervised Learning for Dynamic IR Modeling?
Lack of enough training data: dynamic IR problems contain a sequence of dynamic interactions, e.g. a series of queries in a session. Repeated sequences are rare (close to zero), even in large query logs (WSCD 2013 & 2014, query logs from Yandex). The chance of finding repeated adjacent query pairs is also low:

Dataset | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390 | 17,784,583 | 2.68%
WSCD 2014 | 1,959,440 | 35,376,008 | 5.54%
42. Our Solution
Find an optimal solution through a sequence of dynamic interactions. Trial and error: learn from repeated, varied attempts that continue until success. No (or less) supervised learning.
44. What is a Desirable Model for Dynamic IR?
Model interactions, which means it needs placeholders for actions;
Model the information need hidden behind user queries and other interactions;
Set up a reward mechanism to guide the search algorithm in adjusting its retrieval strategies;
Represent Markov properties to handle the temporal dependency.
A model in a trial-and-error setting will do. A Markov model will do!
45. Markov Decision Process
An MDP extends a Markov chain with actions and rewards¹: a tuple (S, M, A, R, γ).
[Diagram: states s₀, s₁, s₂, s₃, …; at each state sᵢ the agent takes action aᵢ, receives reward rᵢ, and transitions with probability pᵢ]
¹R. Bellman '57
46. Definition of MDP
A tuple (S, M, A, R, γ):
S: state space
M: transition matrix, Mₐ(s, s′) = P(s′ | s, a)
A: action space
R: reward function, R(s, a) = immediate reward for taking action a at state s
γ: discount factor, 0 < γ ≤ 1
A policy π maps states to actions: π(s) is the action taken at state s. The goal is to find an optimal policy π* maximizing the expected total reward.
47. Optimality — Bellman Equation
The Bellman equation¹ for an MDP is a recursive definition of the optimal state-value function V*(·):

$V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} M_a(s, s')\, V^*(s') \Big]$

Optimal policy:

$\pi^*(s) = \arg\max_a \Big[ R(s,a) + \gamma \sum_{s'} M_a(s, s')\, V^*(s') \Big]$

¹R. Bellman '57
48. MDP Algorithms
Model-based approaches, which solve the Bellman equation for the optimal value V*(s) and optimal policy π*(s): Value Iteration, Policy Iteration, Modified Policy Iteration, Prioritized Sweeping.
Model-free approaches: Temporal Difference (TD) Learning, Q-Learning.
[Bellman '57; Howard '60; Puterman and Shin '78; Singh & Sutton '96; Sutton & Barto '98; Sutton '88; Watkins '92]
[Slide adapted from Carlos Guestrin's ML lecture]
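As a concrete illustration of the model-based approach, here is a minimal value iteration sketch in Python for the (S, M, A, R, γ) tuple defined above; the tiny two-state MDP is an invented example for demonstration.

```python
def value_iteration(S, A, M, R, gamma=0.9, tol=1e-6):
    """Solve the Bellman equation by repeated sweeps:
    V(s) <- max_a [ R(s,a) + gamma * sum_s' M[a][s][s'] * V(s') ]."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            v = max(R[(s, a)] + gamma * sum(M[a][s][s2] * V[s2] for s2 in S)
                    for a in A)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            break
    # Greedy policy extraction from the converged values.
    pi = {s: max(A, key=lambda a: R[(s, a)] +
                 gamma * sum(M[a][s][s2] * V[s2] for s2 in S)) for s in S}
    return V, pi

# Invented two-state example: 'stay' is safe, 'jump' is risky but can reach
# the rewarding state s1.
S, A = ["s0", "s1"], ["stay", "jump"]
M = {"stay": {"s0": {"s0": 1.0, "s1": 0.0}, "s1": {"s0": 0.0, "s1": 1.0}},
     "jump": {"s0": {"s0": 0.2, "s1": 0.8}, "s1": {"s0": 0.8, "s1": 0.2}}}
R = {("s0", "stay"): 0.0, ("s0", "jump"): -0.1,
     ("s1", "stay"): 1.0, ("s1", "jump"): -0.1}
print(value_iteration(S, A, M, R))
```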
49. Apply an MDP to an IR Problem
We can model IR systems using a Markov Decision Process. Ask:
Is there a temporal component?
States: what changes with each time step?
Actions: how does your system change the state?
Rewards: how do you measure feedback or effectiveness in your problem at each time step?
Transition probability: can you determine this?
50. Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
51. TREC Session Tracks (2010–now)
Given a series of queries {q₁, q₂, …, qₙ}, the top-10 retrieval results {D₁, …, Dₙ₋₁} for q₁ to qₙ₋₁, and click information, the task is to retrieve a list of documents for the current/last query qₙ. Relevance judgments are based on how relevant the documents are for qₙ and for the information need of the entire session (given in the topic description). There is no need to segment the sessions.
52. TREC 2012 Session 6
In a session, queries change constantly:
1. pocono mountains pennsylvania
2. pocono mountains pennsylvania hotels
3. pocono mountains pennsylvania things to do
4. pocono mountains pennsylvania hotels
5. pocono mountains camelbeach
6. pocono mountains camelbeach hotel
7. pocono mountains chateau resort
8. pocono mountains chateau resort attractions
9. pocono mountains chateau resort getting to
10. chateau resort getting to
11. pocono mountains chateau resort directions
Information need: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there?
53. Markov Decision Process
We propose to model session search as a Markov decision process (MDP) with two agents: the user and the search engine.
[Guan, Zhang and Yang SIGIR 2013]
54. Settings of the Session MDP
States: queries. Environment: search results.
User actions: add/remove/keep query terms; these correspond nicely to our definition of query change.
Search engine actions: increase/decrease/keep term weights.
[Guan, Zhang and Yang SIGIR 2013]
55. Search Engine Agent’s
Actions
∈ Di−1 action Example
qtheme
Y increase “pocono mountain” in s6
N increase
“france world cup 98 reaction” in s28,
france world cup 98 reaction stock
market→ france world cup 98 reaction
+∆q
Y decrease
‘policy’ in s37, Merck lobbyists → Merck
lobbyists US policy
N increase
‘US’ in s37, Merck lobbyists → Merck
lobbyists US policy
−∆q
Y decrease
‘reaction’ in s28, france world cup 98
reaction
→ france world cup 98
N
No
change
‘legislation’ in s32, bollywood legislation
→bollywood law
55 [Guan, Zhang and Yang SIGIR 2013]
56. Bellman Equation
In an MDP, a future reward is not worth quite as much as a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards. The Bellman equation gives the optimal value (the expected long-term reward starting from state s and continuing with policy π from then on) for an MDP:

$V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]$
57. Our Tweak
In an MDP, a future reward is worth less than a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards. In session search, a past reward is worth less than a current reward, so a discount factor γ should be applied to past rewards: we model the MDP for session search in reverse order.
58. Query Change retrieval Model (QCM)
The Bellman equation gives the optimal value for an MDP. The reward function is used as the document relevance score function and is derived backwards from the Bellman equation:

$\mathrm{Score}(q_i, d) = P(q_i \mid d) + \gamma\, P(q_i \mid q_{i-1}, D_{i-1}, a)\, \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$

where $P(q_i \mid d)$ is the current reward (relevance score), $P(q_i \mid q_{i-1}, D_{i-1}, a)$ is the query transition model, and $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$ is the maximum past relevance.
[Guan, Zhang and Yang SIGIR 2013]
59. Calculating the Transition Model
The score expands term by term according to the query change and the search engine actions:

$\mathrm{Score}(q_i, d) = \log P(q_i \mid d) + \alpha \sum_{t \in q_{theme}} \big[1 - P(t \mid d^*_{i-1})\big] \log P(t \mid d) + \epsilon \sum_{t \in +\Delta q,\ t \notin d^*_{i-1}} \mathrm{idf}(t) \log P(t \mid d) - \beta \sum_{t \in +\Delta q,\ t \in d^*_{i-1}} P(t \mid d^*_{i-1}) \log P(t \mid d) - \delta \sum_{t \in -\Delta q} P(t \mid d^*_{i-1}) \log P(t \mid d)$

Beyond the current reward (relevance score), this increases weights for theme terms, increases weights for novel added terms, decreases weights for previously seen added terms, and decreases weights for removed terms.
[Guan, Zhang and Yang SIGIR 2013]
60. Maximizing the Reward Function
Generate the maximum-rewarded document, denoted d*ᵢ₋₁, from Dᵢ₋₁; that is, the document most relevant to qᵢ₋₁. The relevance score can be calculated as:

$P(q_{i-1} \mid d_{i-1}) = 1 - \prod_{t \in q_{i-1}} \big\{1 - P(t \mid d_{i-1})\big\}, \qquad P(t \mid d_{i-1}) = \frac{\#(t, d_{i-1})}{|d_{i-1}|}$

From several options, we choose to use only the document with top relevance: $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$.
[Guan, Zhang and Yang SIGIR 2013]
61. Scoring the Entire Session
The overall relevance score for a session of queries is aggregated recursively:

$\mathrm{Score}_{session}(q_n, d) = \mathrm{Score}(q_n, d) + \gamma\, \mathrm{Score}_{session}(q_{n-1}, d) = \sum_{i=1}^{n} \gamma^{n-i}\, \mathrm{Score}(q_i, d)$

[Guan, Zhang and Yang SIGIR 2013]
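A minimal sketch of the recursive session scoring in Python, assuming per-query scores have already been computed; the numbers are placeholders standing in for the QCM Score(q_i, d) above, not values from the paper.

```python
def session_score(per_query_scores, gamma=0.92):
    """Aggregate Score(q_i, d) over a session:
    sum_i gamma^(n-i) * Score(q_i, d), so recent queries count more."""
    n = len(per_query_scores)
    return sum(gamma ** (n - 1 - i) * s for i, s in enumerate(per_query_scores))

# Placeholder scores for one document across a 3-query session.
print(session_score([0.4, 0.7, 1.2]))  # 0.4*gamma^2 + 0.7*gamma + 1.2
```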
62. Experiments
TREC 2011–2012 query sets; ClueWeb09 Category B dataset.
65. Search Accuracy for Different Session Types
TREC 2012 sessions are classified by product (Factual / Intellectual) and goal quality (Specific / Amorphous):

 | Intellectual | %chg | Amorphous | %chg | Specific | %chg | Factual | %chg
TREC best | 0.3369 | 0.00% | 0.3495 | 0.00% | 0.3007 | 0.00% | 0.3138 | 0.00%
Nugget | 0.3305 | −1.90% | 0.3397 | −2.80% | 0.2736 | −9.01% | 0.2871 | −8.51%
QCM | 0.3870 | 14.87% | 0.3689 | 5.55% | 0.3091 | 2.79% | 0.3066 | −2.29%
QCM+DUP | 0.3900 | 15.76% | 0.3692 | 5.64% | 0.3114 | 3.56% | 0.3072 | −2.10%

QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying changes across query transitions and modeling the dynamics.
66. POMDP Model
[Diagram: hidden states s₀, s₁, s₂, s₃, … with actions aᵢ and rewards rᵢ; the agent receives observations o₁, o₂, o₃ and maintains a belief over the hidden states]
¹R. D. Smallwood et al. '73
67. POMDP Definition
A tuple (S, M, A, R, γ, O, Θ, B):
S: state space; M: transition matrix; A: action space; R: reward function; γ: discount factor, 0 < γ ≤ 1.
O: observation set; an observation is a symbol emitted according to a hidden state.
Θ: observation function; Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o | s, a).
B: belief space; a belief is a probability distribution over hidden states.
68. A Markov Chain of Decision Making
[Diagram: hidden decision-making states S₁, S₂, …, Sₙ connected by user actions A₁, A₂, A₃, A₄]
q₁ = "old US coins" → D₁: "D1 is relevant and I stay to find out more about collecting…"
q₂ = "collecting old US coins" → D₂: "D2 is relevant and I now move to the next topic…"
q₃ = "selling old US coins" → D₃: "D3 is irrelevant; I slightly edit the query and stay here a little longer…"
[Luo, Zhang and Yang SIGIR 2014]
69. Hidden Decision Making States
S_RT (Relevant & Exploitation): scooter price → scooter stores
S_RR (Relevant & Exploration): collecting old US coins → selling old US coins
S_NRT (Non-Relevant & Exploitation): Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): Boston tourism → NYC tourism
[Luo, Zhang and Yang SIGIR 2014]
70. Dual Agent Stochastic Game
[Diagram: hidden states, actions and rewards with the Markov property, as in the POMDP model]
A dual-agent, cooperative game between the user agent and the search engine agent, with joint optimization.
[Luo, Zhang and Yang SIGIR 2014]
71. Actions
User actions (A_u): add query terms (+Δq); remove query terms (−Δq); keep query terms (q_theme).
Search engine actions (A_se): increase/decrease/keep term weights; switch a search technique on or off (e.g. to use or not to use query expansion); adjust parameters in search techniques (e.g. select the best k for the top-k docs used in PRF).
Messages from the user (Σ_u): clicked documents; SAT-clicked documents.
Messages from the search engine (Σ_se): top-k returned documents.
Messages are essentially documents that an agent thinks are relevant.
[Luo, Zhang and Yang SIGIR 2014]
75. Observation Function (O)
O(s_{t+1}, a_t, ω_t) = P(ω_t | s_{t+1}, a_t): the probability of making observation ω_t after taking action a_t and landing in state s_{t+1}. There are two types of observations: relevance-related and exploration/exploitation-related.
[Luo, Zhang and Yang SIGIR 2014]
76. Relevance-related Observation
Intuition: s_t is likely to be Relevant if ∃d ∈ D_{t−1} that is SAT-clicked, and Non-Relevant otherwise. This happens after the user sends out the message Σ_u^t (clicks).

$O(s_t = Rel, \Sigma_u, \omega_t = Rel) \triangleq P(\omega_t = Rel \mid s_t = Rel, \Sigma_u) \propto P(s_t = Rel \mid \omega_t = Rel)\, P(\omega_t = Rel \mid \Sigma_u)$

Similarly, $O(s_t = NonRel, \Sigma_u, \omega_t = NonRel) \propto P(s_t = NonRel \mid \omega_t = NonRel)\, P(\omega_t = NonRel \mid \Sigma_u)$, as well as the cross terms $O(s_t = NonRel, \Sigma_u, \omega_t = Rel)$ and $O(s_t = Rel, \Sigma_u, \omega_t = NonRel)$.
[Luo, Zhang and Yang SIGIR 2014]
77. Exploration-related Observation
This is a combined observation: it happens when updating the before-message belief state for a user action a_u (query change) and a search engine message Σ_se = D_{t−1}.
Intuition: s_t is likely to be Exploration if (+Δq_t ≠ ∅ and +Δq_t ∉ D_{t−1}) or (+Δq_t = ∅ and −Δq_t ≠ ∅); it is likely to be Exploitation if (+Δq_t ≠ ∅ and +Δq_t ∈ D_{t−1}) or (+Δq_t = ∅ and −Δq_t = ∅).

$O(s_t = Exploitation, a_u = \Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = Exploitation) \propto P(s_t = Exploitation \mid \omega_t = Exploitation) \times P(\omega_t = Exploitation \mid \Delta q_t, D_{t-1})$

$O(s_t = Exploration, a_u = \Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = Exploration) \propto P(s_t = Exploration \mid \omega_t = Exploration) \times P(\omega_t = Exploration \mid \Delta q_t, D_{t-1})$

[Luo, Zhang and Yang SIGIR 2014]
78. Belief Updates (B)
The belief state b is updated when a new observation is obtained:

$b_{t+1}(s_j) = P(s_j \mid \omega_t, a_t, b_t) = \frac{P(\omega_t \mid s_j, a_t, b_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)} = \frac{O(s_j, a_t, \omega_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)}$
79. Joint Optimization — Win-Win
The long-term reward for the search engine agent:

$Q_{se}(b, a) = \sum_{s \in S} b(s)\, R(s, a) + \gamma \sum_{\omega \in \Omega} P(\omega \mid b, a_u, \Sigma_{se})\, P(\omega \mid b, \Sigma_u)\, \max_a Q_{se}(b', a)$

The long-term reward for the user agent:

$Q_u(b, a_u) = R(s, a_u) + \gamma \sum_{a_u} T(s_t \mid s_{t-1}, D_{t-1}) \max_{s_{t-1}} Q_u(s_{t-1}, a_u) = P(q_t \mid d) + \gamma \sum_{a_u} P(q_t \mid q_{t-1}, D_{t-1}, a) \max_{D_{t-1}} P(q_{t-1} \mid D_{t-1})$

Joint optimization: $a_{se} = \arg\max_a \big( Q_{se}(b, a) + Q_u(b, a_u) \big)$
[Luo, Zhang and Yang SIGIR 2014]
81. Experiments
Evaluated on the TREC 2012 and 2013 Session Tracks. The session logs contain the session topic, user queries, previously retrieved URLs and snippets, user clicks, dwell time, etc. Task: retrieve 2,000 documents for the last query in each session. The evaluation is based on the whole session: a document related to any query in the session is a good document.
Datasets: ClueWeb09 and ClueWeb12, with spam and duplicates removed.
82. Actions
increasing weights of the added terms by a factor of x = {1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75 or 2};
decreasing weights of the added terms by a factor of y = {0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9 or 0.95};
the Query Change Model (QCM) proposed in Guan et al. SIGIR '13:
$\mathrm{Score}(q_i, d) = P(q_i \mid d) + \gamma\, P(q_i \mid q_{i-1}, D_{i-1}, a)\, \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$
Pseudo Relevance Feedback, which assumes the top 20 retrieved documents are relevant;
directly using the query in the current iteration to perform retrieval;
combining all queries in a session and weighting them equally.
83. Search Accuracy
Search accuracy on the TREC 2012 Session Track: win-win outperforms most retrieval algorithms on TREC 2012.
84. Search Accuracy
Search accuracy on the TREC 2013 Session Track: win-win outperforms all retrieval algorithms on TREC 2013. It is highly effective in session search.
85. Immediate Search Accuracy
Original run: the top returned documents provided by the TREC log data. Win-win's immediate search accuracy is better than the Original at every iteration, and it increases as the number of search iterations increases (TREC 2012 and 2013 Session Tracks).
86. Belief Updates (B)
TREC '13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
q₁ = "best US destinations", observation = NRR. Beliefs: S_RT 0.1784, S_RR 0.1135, S_NRT 0.2838, S_NRR 0.4243.
87. Belief Updates (B)
q₂ = "distance New York Boston", observation = RT. Beliefs: S_RT 0.0005, S_RR 0.0068, S_NRT 0.0715, S_NRR 0.9212.
89. Belief Updates (B)
q₃ = "maps.bing.com", observation = NRT. Beliefs: S_RT 0.0151, S_RR 0.4347, S_NRT 0.0276, S_NRR 0.5226.
91. Belief Updates (B)
… q₂₀ = "Philadelphia NYC train", observation = NRT. Beliefs: S_RT 0.0291, S_RR 0.7837, S_NRT 0.0081, S_NRR 0.1790.
93. Belief Updates (B)
q₂₁ = "Philadelphia NYC bus", observation = NRT. Beliefs: S_RT 0.0304, S_RR 0.8126, S_NRT 0.0066, S_NRR 0.1505.
96. Apply an MDP to an IR Problem — Example
User agent in session search: States — the user's relevance judgment; Action — a new query; Reward — information gained.
[Luo, Zhang, Yang SIGIR '14]
97. POMDP → Belief Update
The agent uses a state estimator to update its belief about the hidden states: b′ = SE(b, a, o′), where

$b'(s') = P(s' \mid o', a, b) = \frac{P(s', o' \mid a, b)}{P(o' \mid a, b)} = \frac{\Theta(s', a, o') \sum_s M(s, a, s')\, b(s)}{P(o' \mid a, b)}$
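A minimal sketch of this state estimator in Python, assuming tabular M and Θ as defined in the POMDP tuple above; the two-state numbers, state names and action/observation labels are invented for illustration.

```python
def belief_update(b, a, o, M, Theta, states):
    """b'(s') is proportional to Theta(s', a, o) * sum_s M(s, a, s') * b(s);
    the normalizer is P(o | a, b)."""
    unnorm = {s2: Theta[(s2, a, o)] * sum(M[(s, a, s2)] * b[s] for s in states)
              for s2 in states}
    z = sum(unnorm.values())  # P(o | a, b)
    return {s2: v / z for s2, v in unnorm.items()}

# Invented two-state example: hidden states Rel / NonRel, one action 'rank',
# observation 'click'.
states = ["Rel", "NonRel"]
M = {("Rel", "rank", "Rel"): 0.7, ("Rel", "rank", "NonRel"): 0.3,
     ("NonRel", "rank", "Rel"): 0.4, ("NonRel", "rank", "NonRel"): 0.6}
Theta = {("Rel", "rank", "click"): 0.8, ("NonRel", "rank", "click"): 0.2}
b = {"Rel": 0.5, "NonRel": 0.5}
print(belief_update(b, "rank", "click", M, Theta, states))
```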
98. POMDP → Bellman Equation
The Bellman equation for a POMDP:

$V(b) = \max_a \Big[ r(b, a) + \gamma \sum_{o'} P(o' \mid a, b)\, V(b') \Big]$

A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ):
B: the continuous belief space
M′: transition function $M'_a(b, b') = \sum_{o' \in O} \mathbb{1}_{a,o'}(b', b) \Pr(o' \mid a, b)$, where $\mathbb{1}_{a,o'}(b', b) = 1$ if $SE(b, a, o') = b'$ and 0 otherwise
A: action space
r: reward function $r(b, a) = \sum_{s \in S} b(s)\, R(s, a)$
99. Applying POMDP to Dynamic IR

POMDP | Dynamic IR
Environment | Documents
Agents | User, search engine
States | Queries, user's decision-making status, relevance of documents, etc.
Actions | Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters for a search technology
Observations | Queries, clicks, document lists, snippets, terms, etc.
Rewards | Evaluation measures (such as DCG, nDCG or MAP); clicking information
Transition matrix | Given in advance or estimated from training data
Observation function | Problem dependent; estimated from sample datasets
100. Session Search Example - States
S_RT (Relevant & Exploitation): scooter price → scooter stores
S_RR (Relevant & Exploration): Hartford visitors → Hartford Connecticut tourism
S_NRT (Non-Relevant & Exploitation): Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): distance New York Boston → maps.bing.com
[J. Luo et al. '14]
101. Session Search Example - Actions (A_u, A_se)
User actions (A_u): add query terms (+Δq); remove query terms (−Δq); keep query terms (q_theme); clicked documents; SAT-clicked documents.
Search engine actions (A_se): increase/decrease/keep term weights; switch query expansion on or off; adjust the number of top documents used in PRF; etc.
[J. Luo et al. '14]
102. TREC Session Tracks (2010–2012)
Given a series of queries {q₁, q₂, …, qₙ}, the top-10 retrieval results {D₁, …, Dₙ₋₁} for q₁ to qₙ₋₁, and click information, the task is to retrieve a list of documents for the current/last query qₙ. Relevance judgments are based on how relevant the documents are for qₙ and for the information need of the entire session (given in the topic description). There is no need to segment the sessions.
103. Query change is an important form of feedback
We define query change as the syntactic editing change between two adjacent queries:

$\Delta q_i = q_i - q_{i-1}$

It includes $+\Delta q_i$, the added terms, and $-\Delta q_i$, the removed terms. The unchanged/shared terms are called $q_{theme}$, the theme terms.
Example: q₁ = "bollywood legislation", q₂ = "bollywood law"; theme term = "bollywood".
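A minimal sketch of extracting query change in Python, treating queries as term sets; real systems would also handle term order and repeated terms, which this illustration ignores.

```python
def query_change(prev_query, query):
    """Split two adjacent queries into added, removed and theme terms."""
    prev_terms, terms = set(prev_query.split()), set(query.split())
    return {"+dq": terms - prev_terms,      # added terms
            "-dq": prev_terms - terms,      # removed terms
            "theme": prev_terms & terms}    # unchanged/shared terms

print(query_change("bollywood legislation", "bollywood law"))
# {'+dq': {'law'}, '-dq': {'legislation'}, 'theme': {'bollywood'}}
```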
104. Where do these query changes come from?
Given the TREC Session settings, we consider two sources of query change: the previous search results that a user viewed/read/examined, and the information need.
Example: Kurosawa → Kurosawa wife. 'wife' is not in any previous results, but it is in the topic description. However, knowing the information need before the search is difficult to achieve.
105. Previous search results can influence query change in quite complex ways
Merck lobbyists → Merck lobbying US policy. D₁ contains several mentions of 'policy', such as "A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …". These mentions are about Canadian policies, while the user adds 'US policy' in q₂. Our guess is that the user might have been inspired by 'policy' but prefers a different sub-concept than 'Canadian policy'. Therefore, among the added terms 'US policy', 'US' is the novel term and 'policy' is not, since it appeared in D₁. The two terms should be treated differently.
106. POMDP
Session search has rich interactions (actions), hidden, evolving information needs (hidden states), a long-term goal (rewards) and temporal dependency (the Markov property): a POMDP (Partially Observable Markov Decision Process). With multi-agent collaboration this becomes an SG (Stochastic Game).
107. Recap – Characteristics of Dynamic IR
Rich interactions: query formulation, document clicks, document examination, eye movement, mouse movements, etc.
Temporal dependency.
Overall goal.
108. Modeling Query Change
A framework inspired by reinforcement learning. Reinforcement learning for a Markov Decision Process models a state space S and an action space A according to a transition model T = P(sᵢ₊₁ | sᵢ, aᵢ). A policy π(s) = a indicates which action a the agent takes at state s. Each state is associated with a reward function R that indicates the possible positive reward or negative loss that a state and an action may result in. Reinforcement learning offers general solutions to MDPs and seeks the best policy for an agent.
109. Outline
Introduction & Theory
Session Search
Dynamic Ranking: Multi-Armed Bandits, Portfolio Ranking, Multi-Page Search
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
110. Family of Markov Models
Markov Process, Hidden Markov Model, Markov Decision Process, Partially Observable Markov Decision Process, Multi-Armed Bandit.
111. Multi-Armed Bandits (MAB)
[Illustration: a row of slot machines] "Which slot machine should I select in this round?" Each play yields a reward.
112. Multi-Armed Bandits (MAB)
[Illustration] "I won! Is this the best slot machine?"
113. MAB Definition
A tuple (S, A, R, B):
S: hidden reward distribution of each bandit
A: choose which bandit to play
R: reward for playing a bandit
B: belief space, our estimate of each bandit's distribution
114. Comparison with Markov Models
A single-state Markov Decision Process with no transition probability. Similar to a POMDP in that we maintain a belief state. An action chooses a bandit and does not affect the state. It does not 'plan ahead' but intelligently adapts: somewhere between interactive and dynamic IR.
115. MAB Policy Reward
A MAB algorithm describes a policy π for choosing bandits that maximizes the rewards from the chosen bandits over all time steps, i.e. minimizes the regret

$\sum_{t=1}^{T} \big[ Reward(a^*) - Reward(a_{\pi(t)}) \big]$

the cumulative difference between the optimal reward and the actual reward.
116. Exploration vs Exploitation
Exploration: try out bandits to find which has the highest average reward; too much exploration leads to poor performance.
Exploitation: play bandits that are known to pay out higher rewards on average.
MAB algorithms balance exploration and exploitation: start by exploring more to find the best bandits, and exploit more as the best bandits become known.
117. MAB – Index Algorithms
Gittins index¹: play the bandit with the highest 'Dynamic Allocation Index'; modelled using an MDP but suffers the 'curse of dimensionality'.
ε-greedy²: play the highest-reward bandit with probability 1 − ε; play a random bandit with probability ε.
UCB (Upper Confidence Bound)³.
¹J. C. Gittins '89; ²Nicolò Cesa-Bianchi et al. '98
118. Comparison of Markov Models
Markov Process: a fully observable stochastic process. Hidden Markov Model: a partially observable stochastic process. MDP: a fully observable decision process. POMDP: a partially observable decision process. MAB: a decision process, either fully or partially observable.

Model | Actions | Rewards | States
Markov Process | No | No | Observable
Hidden Markov Model | No | No | Unobservable
MDP | Yes | Yes | Observable
POMDP | Yes | Yes | Unobservable
MAB | Yes | Yes | Fixed
119. Outline
Introduction & Theory
Session Search
Dynamic Ranking: Multi-Armed Bandits, Portfolio Ranking, Multi-Page Search
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
122. UCB Algorithm
Calculate for all bandits $i$ and select the highest:

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

where $\bar{x}_i$ is the average reward of bandit $i$, $t$ is the time step, and $T_i$ is the number of times bandit $i$ has been played. The chance of playing infrequently played bandits increases over time.
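A minimal UCB1 sketch in Python over Bernoulli bandits; the reward probabilities and horizon are invented for illustration.

```python
import math
import random

def ucb1(reward_probs, horizon=10000):
    """Play each bandit once, then always pick the argmax of
    mean reward + sqrt(2 ln t / T_i)."""
    n = len(reward_probs)
    counts = [0] * n   # T_i: plays per bandit
    sums = [0.0] * n   # cumulative reward per bandit
    for t in range(1, horizon + 1):
        if t <= n:
            i = t - 1  # initialization: play each bandit once
        else:
            i = max(range(n), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        reward = 1.0 if random.random() < reward_probs[i] else 0.0
        counts[i] += 1
        sums[i] += reward
    return counts

random.seed(0)
# The best bandit (p=0.6) ends up played most often.
print(ucb1([0.3, 0.45, 0.6]))
```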
128. Iterative Expectation
For documents $i$, rank by $\bar{r}_i + \sqrt{\frac{2 \ln t}{T_i}}$, where $\bar{r}_i$ is the average probability of relevance.
M. Sloan and J. Wang '1
129. Iterative Expectation
Replace $T_i$ with the 'effective' number of impressions:

$\bar{r}_i + \sqrt{\frac{2 \ln t}{\gamma_i(t)}}, \qquad \gamma_i(t) = \sum_{k=1}^{t} \alpha^{C_k} \beta^{1-C_k}$

where α and β reward clicks and non-clicks depending on rank.
M. Sloan and J. Wang '1
130. Iterative Expectation
Add an exploration parameter λ:

$\bar{r}_i + \lambda \sqrt{\frac{2 \ln t}{\gamma_i(t)}}$

M. Sloan and J. Wang '1
131. Portfolio Theory of IR
Portfolio theory maximises the expected return for a given amount of risk¹; diversity of a portfolio increases the likely return. We can consider documents as 'shares': documents are dependent on one another, unlike under the PRP. The portfolio theory of IR² allows us to introduce diversity.
¹H. Markowitz '52; ²J. Wang et al. '09
132. Portfolio Ranking
Documents are dependent on each other: build a co-click matrix from users and logs¹. Portfolio armed bandit ranking²: rank exploratively using iterative expectation, diversify using portfolio optimisation over the co-click matrix, and update relevance and dependence with each click. Both explorative and diverse.
¹W. Wu et al. '11; ²M. Sloan and Jun Wang '1
133. Outline
Introduction & Theory
Session Search
Dynamic Ranking: Multi-Armed Bandits, Portfolio Ranking, Multi-Page Search
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
134. Multi-Page Search
[Diagram: two result pages, each ranking two documents; page 2 is re-ranked after page-1 feedback]
X. Jin, M. Sloan and J. Wang '13
135. Multi-Page Search Example - States & Actions
State: relevance of documents. Action: ranking of documents. Observation: clicks. Belief: multivariate Gaussian. Reward: DCG over 2 pages.
X. Jin, M. Sloan and J. Wang '13
161. Proposed EE Algorithms
Thompson Sampling, Linear-UCB, General Linear-UCB.
Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM 2013.
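A minimal Thompson sampling sketch in Python for Bernoulli rewards with Beta posteriors; the reward probabilities are invented, and the linear variants on the slide would replace the Beta posterior with a Bayesian linear model.

```python
import random

def thompson(reward_probs, horizon=10000):
    """Beta-Bernoulli Thompson sampling: sample a relevance estimate from each
    arm's posterior and play the argmax."""
    n = len(reward_probs)
    alpha, beta = [1] * n, [1] * n  # Beta(1,1) uniform priors
    for _ in range(horizon):
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(n)]
        i = samples.index(max(samples))
        if random.random() < reward_probs[i]:
            alpha[i] += 1  # success: posterior shifts up
        else:
            beta[i] += 1   # failure: posterior shifts down
    return [a + b - 2 for a, b in zip(alpha, beta)]  # plays per arm

random.seed(0)
print(thompson([0.3, 0.45, 0.6]))
```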
163. Ad Selection Problem
How can online publishers optimally select ads to maximize their ad income over time? Selling in multiple channels with non-fixed prices.
Sequential Selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.
164.–167. Problem Formulation, Objective Function, Belief Update
[Slide content consists of figures and equations from: Sequential Selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012]
169. Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
170. Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling.
Charlie Clarke (with much input from Mark Smucker), University of Waterloo, Canada
171. Moving from static ranking to dynamic domains
How do we extend IR evaluation methodologies to dynamic domains? Three key ideas:
1. Realistic models of searcher interactions.
2. Measure costs to the searcher in meaningful units (e.g., time, money, …).
3. Measure benefits to the searcher in meaningful units (e.g., time, nuggets, …).
This talk strongly reflects my opinions (not trying to be neutral). But I am the guest speaker.
172. Evaluating Information Access Systems
Searching, browsing, summarization, visualization, desktop, mobile, web, books, images, questions, etc., and combinations of these.
Does the system work for its users? Will this change make the system better or worse? How do we quantify performance?
173. Performance 101: Is this a good search result?
174. How to evaluate? Study users.
Users in the wild: A/B testing, result interleaving, clicks and dwell time, mouse movements, other implicit feedback, …
Users in the lab: time to task completion, think-aloud protocols, questionnaires, eye tracking, …
175. Unfortunately, user studies are slow and expensive, and conditions can never be exactly duplicated (e.g., learning to rank).
176. Alternative: User performance prediction
Can we predict the impact of a proposed change to an information access system (while respecting and reflecting differences between users)? Can we quantify performance improvements in meaningful units so that effect sizes can be considered in statistical testing? Are improvements practically significant, as well as statistically significant?
The BIG goal: predict the impact of a proposed change automatically, based on existing user performance data, rather than gathering new performance data.
177. Traditional Evaluation of Rankers
Test collection: documents, queries, relevance judgments. Each ranker generates a ranked list of documents for each query. Score the ranked lists using the relevance judgments and standard metrics (recall, mean average precision, nDCG, ERR, RBP, …).
178. Example of a good-old-fashioned IR metric
Ranked list of documents with precision at rank N, the fraction of documents that are relevant in the first N documents:
1. Non-relevant (0.00), 2. Relevant (0.50), 3. Non-relevant (0.33), 4. Non-relevant (0.25), 5. Relevant (0.40), 6. Non-relevant (0.33), 7. Non-relevant (0.29), …
Average precision is the average of the precision at N for each relevant document:

$AP = \frac{1}{R} \sum_{R_i} \mathrm{Prec}(R_i)$

Mean average precision (MAP) is AP averaged over the set of queries.
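A minimal sketch computing average precision in Python for the ranked list above; R is taken here as the number of relevant documents retrieved, a simplifying assumption when the full recall base is unknown.

```python
def average_precision(rels):
    """rels: list of 0/1 relevance labels in rank order.
    AP = mean over relevant documents of the precision at their ranks."""
    hits, precisions = 0, []
    for n, rel in enumerate(rels, start=1):
        hits += rel
        if rel:
            precisions.append(hits / n)  # precision at rank n
    return sum(precisions) / len(precisions) if precisions else 0.0

# The example list: relevant at ranks 2 and 5.
print(average_precision([0, 1, 0, 0, 1, 0, 0]))  # (0.5 + 0.4) / 2 = 0.45
```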
179. General form of effectiveness measures
Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision, …): a normalization, applied to a sum over ranks k of the gain at rank k weighted by a discount factor.
180. Implicit user model…
The user works down the ranked list spending equal time on each document; captions, navigation, etc. have no impact. If they make it to rank i, they receive some benefit (i.e., gain). Eventually they stop, which is reflected in the discount (i.e., they are less likely to reach lower ranks). Normalization typically maps the score into the range [0:1]; the units may not be meaningful.
181. Traditional Evaluation of Rankers
Many effectiveness measures (precision, recall, average precision, rank-biased precision, discounted cumulative gain, etc.) are widely used and accepted as standard practice. But… What does an improvement in average precision from 0.28 to 0.31 mean to users? Does an increase in the measure really translate to an improved user experience? How will an improvement in the performance of a single component impact overall system performance?
182. How to better reflect user variation and system performance?
Example: what's the simplest possible user interface for search? 1) The user issues a query. 2) The system returns material to read, in order (not a list of documents; more like a newspaper article). A correspondingly simple user model has two parameters: 1) reading speed, and 2) time spent reading.
183. Reading speed distribution (from users in the lab)
[Figure: empirical distribution of reading speed during an information access task, and its fit to a log-normal distribution]
184. Stopping time distribution (from users in the wild)
[Figure: empirical distribution of time spent searching during an information access task, and its fit to a log-normal distribution]
185. Evaluating a search result
1) Generate a reading speed from the distribution. 2) Generate a stopping time from the distribution. 3) How much useful material did the user read? 4) Repeat for many (simulated) users.
As an example, we use passage retrieval runs from the TREC 2006 HARD Track, which essentially assume our simple user interface. We measure costs to the searcher in terms of time spent searching, and benefits to the searcher in terms of "time well spent".
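A minimal sketch of this simulation in Python, assuming log-normal reading-speed and stopping-time distributions and a toy relevance layout; the parameter values and passage lengths are invented, not the calibrated values from the lab/wild data.

```python
import random

def simulate_time_well_spent(doc_lengths, doc_relevant, n_users=10000):
    """For each simulated user: draw a reading speed and a stopping time from
    log-normal distributions, then count reading time spent on relevant text."""
    total = 0.0
    for _ in range(n_users):
        speed = random.lognormvariate(5.0, 0.5)   # chars/minute (invented params)
        stop = random.lognormvariate(1.5, 0.8)    # minutes available (invented)
        time_left, well_spent = stop, 0.0
        for length, rel in zip(doc_lengths, doc_relevant):
            cost = length / speed                 # minutes to read this passage
            spent = min(cost, time_left)
            if rel:
                well_spent += spent               # "time well spent"
            time_left -= spent
            if time_left <= 0:
                break
        total += well_spent
    return total / n_users

random.seed(0)
# Toy ranked passages: (length in characters, relevant?).
lengths = [800, 1200, 600, 1500]
relevant = [True, False, True, False]
print(simulate_time_well_spent(lengths, relevant))
```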
186.–191. [Figures: useful characters read vs. characters read; useful characters read vs. time spent reading; time well spent vs. time spent reading; distribution of time well spent; temporal precision vs. time spent reading; distribution of temporal precision. Each shows the performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.]
192. General Framework (Part I): Cumulative Gain
Consider the performance of a system in terms of a cost-benefit (cumulative gain) curve G(t): measure costs (e.g., in terms of time spent) and benefits (e.g., in terms of time well spent). A particular instance of G(t) represents a single user (described by a set of parameters) interacting with a system, not just a list! G(t) captures factors intrinsic to the system. We don't know how much time the user has to invest, but for different levels of investment, G(t) indicates the benefit.
193. General Framework (Part II): Decay
Consider the user's willingness to invest time in terms of a decay curve D(t), which provides a survival probability. We assume that G(t) and D(t) are independent (system-dependent stopping probabilities are accommodated in G(t); details on request). D(t) captures factors extrinsic to the system: the user only has so much time they can invest. They cannot invest more, even if they would receive substantial additional benefit from further interaction.
194. General form of effectiveness measures (reminder)
Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision, …): a normalization, applied to a sum over ranks of the gain at rank k weighted by a discount factor.
195. General Framework (Part III): Time-biased gain
Overall system performance may be expressed as expected cumulative gain (which also incorporates standard effectiveness measures): a normalization (== 1?), applied to an accumulation over time of the gain at time t weighted by a decay factor.
196. General Framework (Part IV): Multiple users
Cumulative gain may be computed by simulation (drawing a set of parameters from a population of users), by measuring actual interaction on live systems, or by combinations of measurement and simulation. Simulating and/or measuring multiple users allows us to consider performance differences across the population of users. Simulation provides matched pairs (the same user on both systems), increasing our ability to detect differences.
197. General Framework
Most of the evaluation proposals in the references can be reformulated in terms of this general framework, including those that address novelty and diversity; filtering, summarization and question answering; session search; etc. One more example from our current research…
198. Session search example
Two (or more) result lists, e.g., from query reformulation, query suggestion, or switching search engines. Modeling searcher interaction requires a switch from one result list to another. The optimal time to switch depends on the total time available to search. For example (with many details omitted…):
199. Simulation of searchers switching between lists: A vs. B
The user starts on list A. If the user has less than five minutes to search, they should stay on list A. If the user has more than five minutes to search, they should leave list A after 90 seconds. But can we assume optimal behavior when modeling users?
200. Simulation of searchers switching between lists: A vs. B
[Figure: average gain (relevant documents) vs. switch time (minutes), for session durations of 2, 4, 6, 8 and 10 minutes; topic = 389, list A = sab05ror1, list B = uic0501]
A different view of the same simulation, with thousands of simulated users. Here, benefits are measured by the number of relevant documents seen. The optimal switching time depends on the session duration.
201. Summary
The primary goal of IR evaluation is to predict how changes to an IR system will impact the user experience. Evaluation in dynamic domains requires us to explicitly model the system interface and the user's search behavior; costs and benefits must be measured in meaningful units (e.g., time). Successful IR evaluation requires measurement of users, both "in the wild" and in the lab. These measurements calibrate models, which make predictions, which improve systems.
202. A few key papers
• Leif Azzopardi. 2009. Usage based effectiveness measures: monitoring application performance in information retrieval. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09).
• Leif Azzopardi, Diane Kelly, and Kathy Brennan. 2013. How query cost affects search behavior. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '13).
• Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time drives interaction: simulating sessions in diverse searching environments. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '12).
• Ben Carterette. 2011. System effectiveness, user models, and user utility: a conceptual framework for investigation. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '11).
• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2011. Simulating simple user behavior for system effectiveness evaluation. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM '11).
• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2012. Incorporating variability in user behavior into systems based evaluation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12).
203. A few more key papers
• Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09).
• Charles L. A. Clarke, Nick Craswell, Ian Soboroff, and Azin Ashkan. 2011. A comparative analysis of cascade measures for novelty and diversity. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM '11).
• Charles L. A. Clarke and Mark D. Smucker. 2014. Time well spent. In Proceedings of the 5th Information Interaction in Context Symposium (IIiX '14).
• Georges Dupret and Mounia Lalmas. 2013. Absence time and user engagement: evaluating ranking functions. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining (WSDM '13).
• Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008. Discounted cumulated gain based evaluation of multiple-query IR sessions. In Proceedings of the 30th European Conference on Advances in Information Retrieval (ECIR '08).
• Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and the cube test: multi-dimensional evaluation for professional search. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM '13).
204. And yet more key papers
• Tetsuya Sakai and Zhicheng Dou. 2013. Summaries, ranked retrieval and sessions: a unified framework for information access evaluation. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '13).
• Mark D. Smucker and Charles L. A. Clarke. 2012. Time-based calibration of effectiveness measures. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '12).
• Mark D. Smucker and Charles L. A. Clarke. 2012. Modeling user variance in time-biased gain. In Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval (HCIR '12).
• Emine Yilmaz, Milad Shokouhi, Nick Craswell, and Stephen Robertson. 2010. Expected browsing utility for web search evaluation. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM '10).
• Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session information distillation. In Proceedings of the 2nd International Conference on the Theory of Information Retrieval (ICTIR '09).
• Plus many others (ask me).
205. Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on
Dynamic Information Retrieval Modeling
Charlie Clarke
University of Waterloo, Canada
Thank you!
206. Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
207. Apply an MDP to an IR Problem
We can model IR systems using a Markov Decision Process:
Is there a temporal component?
States – What changes with each time step?
Actions – How does your system change the state?
Rewards – How do you measure feedback or effectiveness in your problem at each time step?
Transition Probability – Can you determine this?
208. Apply an MDP to an IR Problem - Example
User agent in session search:
States – user's relevance judgements
Action – new query
Reward – information gained
[Luo, Zhang, Yang SIGIR'14]
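To make the checklist on the previous slide concrete, here is a toy coding of this example (my own sketch: the coarse state space, transition probabilities and reward values are invented, and this is not the model actually used in [Luo, Zhang, Yang SIGIR'14]):

```python
# Toy session-search MDP in the spirit of slide 208: states are coarse user
# relevance judgements, actions are query changes, reward approximates
# "information gained". All numbers are invented for illustration.

STATES = ["irrelevant", "partial", "relevant"]
ACTIONS = ["add_terms", "remove_terms", "new_query"]
GAMMA = 0.9

# P[(s, a)] = list of (next_state, probability)
P = {
    ("irrelevant", "add_terms"):    [("irrelevant", .5), ("partial", .4), ("relevant", .1)],
    ("irrelevant", "remove_terms"): [("irrelevant", .7), ("partial", .2), ("relevant", .1)],
    ("irrelevant", "new_query"):    [("irrelevant", .3), ("partial", .4), ("relevant", .3)],
    ("partial", "add_terms"):       [("irrelevant", .1), ("partial", .4), ("relevant", .5)],
    ("partial", "remove_terms"):    [("irrelevant", .3), ("partial", .5), ("relevant", .2)],
    ("partial", "new_query"):       [("irrelevant", .3), ("partial", .3), ("relevant", .4)],
    ("relevant", "add_terms"):      [("irrelevant", .1), ("partial", .2), ("relevant", .7)],
    ("relevant", "remove_terms"):   [("irrelevant", .2), ("partial", .3), ("relevant", .5)],
    ("relevant", "new_query"):      [("irrelevant", .3), ("partial", .3), ("relevant", .4)],
}
R = {"irrelevant": 0.0, "partial": 0.5, "relevant": 1.0}  # information gained

def value_iteration(n=200):
    """Standard value iteration: V(s) = max_a sum_s' P(s'|s,a)[R(s') + gamma V(s')]."""
    V = {s: 0.0 for s in STATES}
    for _ in range(n):
        V = {s: max(sum(p * (R[s2] + GAMMA * V[s2]) for s2, p in P[(s, a)])
                    for a in ACTIONS)
             for s in STATES}
    policy = {s: max(ACTIONS, key=lambda a: sum(p * (R[s2] + GAMMA * V[s2])
                                                for s2, p in P[(s, a)]))
              for s in STATES}
    return V, policy

V, policy = value_iteration()
print(policy)  # which query change the agent prefers in each judged state
```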
209. Apply an MDP to an IR Problem - Example
Search engine's perspective:
What if we can't directly observe the user's relevance judgement?
Click ≠ relevance
210. Applying POMDP to Dynamic IR
POMDP component → Dynamic IR counterpart
Environment → Documents
Agents → User, search engine
States → Queries, user's decision-making status, relevance of documents, etc.
Actions → Provide a ranking of documents; weigh terms in the query; add, remove or keep query terms; switch a search technology on or off; adjust parameters for a search technology
Observations → Queries, clicks, document lists, snippets, terms, etc.
Rewards → Evaluation measures (such as DCG, NDCG or MAP); clicking information
Transition matrix → Given in advance or estimated from training data
Observation function → Problem dependent; estimated from sample datasets
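Because the state (the user's judgement) is hidden, a POMDP agent maintains a belief, i.e. a distribution over states, and updates it after each observation. A minimal belief-update sketch with invented transition and observation models (note that the click probabilities encode click ≠ relevance):

```python
# Minimal POMDP belief update over a hidden "did the user judge this relevant?"
# state. The transition and observation models are invented for illustration;
# a single action "show_ranking" is assumed for brevity.

STATES = ["not_relevant", "relevant"]

# T[s][s2] = P(next_state = s2 | state = s)
T = {"not_relevant": {"not_relevant": 0.8, "relevant": 0.2},
     "relevant":     {"not_relevant": 0.1, "relevant": 0.9}}

# O[s][o] = P(observation = o | state = s); clicks are only noisy evidence.
O = {"not_relevant": {"click": 0.3, "no_click": 0.7},
     "relevant":     {"click": 0.8, "no_click": 0.2}}

def update_belief(belief, observation):
    """b'(s') proportional to O(o|s') * sum_s T(s'|s) b(s)."""
    new = {}
    for s2 in STATES:
        predicted = sum(T[s][s2] * belief[s] for s in STATES)
        new[s2] = O[s2][observation] * predicted
    z = sum(new.values())  # normalize to a probability distribution
    return {s: v / z for s, v in new.items()}

belief = {"not_relevant": 0.5, "relevant": 0.5}
for obs in ["click", "click", "no_click"]:
    belief = update_belief(belief, obs)
    print(obs, belief)
```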
211. WSDM Tutorial February 2nd 2015
Grace Hui Yang
Marc Sloan
Jun Wang
Guest Speaker: Charlie Clarke
Dynamic Information Retrieval Modeling
Panel Discussion
212. Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Conclusion
213. Conclusions
Dynamic IR describes a new class of interactive model:
It incorporates rich feedback and temporal dependency, and is goal oriented.
The family of Markov models and multi-armed bandit theory are useful in building DIR models.
DIR is applicable to a range of IR problems.
It is useful in applications such as session search and evaluation.
214. Dynamic IR Book
Published by Morgan & Claypool in the series 'Synthesis Lectures on Information Concepts, Retrieval, and Services'.
Due April/May 2015 (in time for SIGIR 2015).
215. TREC 2015 Dynamic Domain Track
Co-organized by Grace Hui Yang, John Frank and Ian Soboroff.
Underexplored subsets of Web content:
  Limited scope and richness of indexed content, which may not include relevant components of the deep web (temporary pages, pages behind forms, etc.)
  Basic search interfaces, where there is little collaboration or history beyond independent keyword search
Complex, task-based, dynamic search:
  Temporal dependency
  Rich interactions
  Complex, evolving information needs
  Professional users
  A wide range of search strategies
216. Task
An interactive search task with multiple runs per topic.
Starting point: the system is given a search query.
Iterate:
  the system returns a ranked list of 5 documents;
  the API returns relevance judgments;
  go to the next iteration of retrieval;
until done (the system decides when to stop).
The goal of the system is to find relevant information for each topic as soon as possible.
One-shot ad-hoc search is included: the system may simply decide to stop after iteration one.
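A sketch of what a participant's loop might look like; the retrieval function, judgment API, reformulation and stopping rule below are stub placeholders, not the track's real interfaces:

```python
# Sketch of a TREC Dynamic Domain-style participant loop. search(),
# judgment_api(), reformulate() and should_stop() are hypothetical stubs.

def search(query, exclude, k=5):
    """Stub retrieval: return up to k hypothetical document ids."""
    return [f"{query}-doc{i}" for i in range(k) if f"{query}-doc{i}" not in exclude]

def judgment_api(topic, docs):
    """Stub for the track's judgment API (pretend doc0 is relevant)."""
    return {d: int(d.endswith("doc0")) for d in docs}

def should_stop(judgments, iteration):
    return sum(judgments.values()) == 0 or iteration >= 3

def reformulate(query, docs, judgments):
    return query + "+fb"  # fold feedback into the next query (toy)

def run_topic(topic_query):
    query, seen, iteration = topic_query, set(), 0
    while True:
        ranked = search(query, exclude=seen, k=5)      # return 5 documents
        judgments = judgment_api(topic_query, ranked)  # feedback each iteration
        seen.update(ranked)
        if should_stop(judgments, iteration):  # system decides when to stop;
            break                              # stopping after the first
                                               # iteration == one-shot search
        query = reformulate(query, ranked, judgments)
        iteration += 1

run_topic("ebola")
```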
217. Domains
Domain → Corpus
Illicit goods → 30k forum posts from 5-10 forums (total ~300k posts). Which users are working together to sell illicit goods?
Ebola → One million tweets; 300k docs from in-country web sites (mostly official sites). Who is doing what, and where?
Local politics → 300k docs from local political groups in the Pacific Northwest and British Columbia. Who is campaigning for what, and why?
218. Timeline
TREC call for participation: January 2015
Data available: March
Detailed guidelines: April/May
Topics and tasks available: June
Systems do their thing: June-July
Evaluation: August
Results to participants: September
Conference: November 2015
219. TREC 2015 Total Recall Track
Co-organized by Gord Cormack, Maura Grossman, Adam Roegiest and Charlie Clarke.
Explores high-recall tasks through an active learning process modeled on legal search tasks (eDiscovery, patent search).
A participating system starts with a topic and proposes a relevant document.
The system gets immediate feedback on relevance.
It continues to propose additional documents and receive feedback until a stopping condition is reached.
Shared online infrastructure and collections with Dynamic Domain; it is easy to participate in both if you participate in one.
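The protocol resembles continuous active learning: propose a document, receive a judgment, fold it back into the model, and repeat until a stopping condition. A toy sketch with an invented corpus, scorer and stopping rule (not the track's infrastructure):

```python
# Toy continuous-active-learning loop in the Total Recall style. The corpus,
# relevance scorer, feedback oracle and stopping rule are all invented.

CORPUS = {f"doc{i}": f"text about {'patents' if i % 3 == 0 else 'other'}"
          for i in range(30)}

def score(doc_text, positive_texts):
    """Stub scorer: count words shared with documents already judged relevant."""
    words = set(doc_text.split())
    return sum(len(words & set(t.split())) for t in positive_texts)

def total_recall(topic_seed, patience=5):
    judged, positives, misses = {}, [topic_seed], 0
    while misses < patience:                      # stopping condition (toy)
        unjudged = [d for d in CORPUS if d not in judged]
        if not unjudged:
            break
        best = max(unjudged, key=lambda d: score(CORPUS[d], positives))
        relevant = "patents" in CORPUS[best]      # stand-in for the feedback API
        judged[best] = relevant                   # immediate feedback on proposal
        if relevant:
            positives.append(CORPUS[best])        # retrain on the new positive
            misses = 0
        else:
            misses += 1
    return [d for d, r in judged.items() if r]

print(total_recall("text about patents"))
```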
220. Acknowledgment
We thank Prof. Charlie Clarke for his guest lecture.
We sincerely thank Dr. Xuchu Dong for his help in the preparation of the tutorial.
We also thank the following colleagues for their comments and suggestions:
Dr Filip Radlinski
Prof. Maarten de Rijke
221. References
Static IR
Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.
The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.
Implicit User Modeling for Personalized Search. Xuehua Shen et al. CIKM, 2005.
A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.
Portfolio Theory of Information Retrieval. J. Wang and J. Zhu. In SIGIR 2009.
222. References
Interactive IR
Relevance Feedback in Information Retrieval. J. J. Rocchio. The SMART Retrieval System (pp. 313-323), 1971.
A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.
Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.
Dynamic Ranked Retrieval. Christina Brandt et al. WSDM, 2011.
Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
223. References
Dynamic IR
A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR '99, pages 214-221.
Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002.
A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW, 2003.
Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML, 2009.
Meme-tracking and the dynamics of the news cycle. Jure Leskovec, Lars Backstrom, Jon Kleinberg. KDD, 2009.
224. References
Dynamic IR
Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS, 2009.
A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al. WSDM, 2010.
A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.
Inferring search behaviors using partially observable Markov model with duration (POMD). Yin He et al. WSDM, 2011.
No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI, 2011.
Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.
Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS, 2012.
225. References
Dynamic IR
Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In WWW '12, pages 11-20.
Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM, 2012.
Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR '13, pages 453-462.
Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.
Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW '13.
Interactive Collaborative Filtering. X. Zhao, W. Zhang, and J. Wang. In CIKM 2013.
226. References
Dynamic IR
Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR '14.
Iterative Expectation for Multi-Period Information Retrieval. M. Sloan and J. Wang. In WSCD 2013.
Dynamical Information Retrieval Modelling: A Portfolio-Armed Bandit Machine Approach. M. Sloan and J. Wang. In WWW 2012.
Designing States, Actions, and Rewards for Using POMDP in Session Search. Jiyun Luo, Sicong Zhang, Xuchu Dong and Hui Yang. In ECIR 2015.
A POMDP Model for Content-Free Document Re-ranking. Sicong Zhang, Jiyun Luo, Hui Yang. In SIGIR 2014.
227. References
Markov Processes
A Markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679-684, 1957.
Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.
Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960.
Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.
Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Leonard E. Baum and Ted Petrie. The Annals of Mathematical Statistics 37, 1966.
228. References
Markov Processes
Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3, 1988.
Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39:162-175, 1991.
Q-Learning. Christopher J. C. H. Watkins and Peter Dayan. Machine Learning, 1992.
Reinforcement learning with replacing eligibility traces. S. P. Singh and R. S. Sutton. Machine Learning 22, pages 123-158, 1996.
Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.
Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99-134, 1998.
229. References
Markov Processes
Finding approximate POMDP solutions through belief compression. N. Roy. PhD thesis, Carnegie Mellon University, 2003.
VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. In NIPS 2004, pages 1081-1088.
Finding Approximate POMDP Solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005.
Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. MIT Press, 2005.
Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 27:335-380, 2006.
230. References
Markov Processes
The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood and E. J. Sondik. Operations Research, 1973.
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.
An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 2006.
Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011.
Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi and Paul Fischer. ICML, pages 100-108, 1998.
Multi-armed Bandit Allocation Indices. J. C. Gittins. Wiley, 1989.
Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47(2-3), 2002.