The Effects of Time on Query Flow Graph-based Models for Query Suggestion

Carlos Castillo, Debora Donato, Ranieri Baraglia, Franco Maria Nardini,
Raffaele Perego, Fabrizio Silvestri
Yahoo! Research Barcelona
HPC Lab, ISTI-CNR, Pisa

Tuesday, 4 May 2010

Outline


• Introduction
• Aims of this Work
• The Query-Flow Graph
• Evaluating the Aging Effect
• Combating the Aging Effect
• Distributed QFG Building
• Conclusions & Future Work


Introduction


• Web search engines use query recommender
systems to improve users’ search experience;
• Query recommender systems give hints to users on
possible “interesting queries”:
• related to their information needs;
• Query recommender systems exploit the
knowledge of past Web search engine users:
• recorded in query logs.

Aims of this Work


• to show that time has negative effects on a query
recommender model:
• the model becomes unable to generate good suggestions
as time passes;
• bursty queries;
• to extend a state-of-the-art recommender system by providing
a methodology for dealing efficiently with evolving data:
• to define a “good” strategy to update the model;
• to define a distributed/parallel algorithm to update the
model;

The Query-Flow Graph


• QFG [Boldi et al., CIKM’08] is a compact and powerful
representation of Web search engine users’ behavior;
• QFG is a graph composed of:
1. a set of nodes, V = Q ∪ {s, t}, where Q is the set of distinct
queries and s, t are special start and terminal nodes;
2. a set of directed edges, E ⊆ V × V:
• (q, q’) are connected if they are consecutive at least once
in at least one session;
3. a weighting function w: E → (0, 1]:
• assigning a weight w(q, q’) to each edge;
[Figure: example query-flow graph around the query “barcelona”, with nodes such as “barcelona fc”, “barcelona fc website”, “barcelona fc fixtures”, “real madrid”, “barcelona hotels”, “cheap barcelona hotels”, “barcelona weather”, “barcelona weather online” and the terminal node <T>, labeled with edge weights.]



• two weighting schemes:
• relative frequencies: counting query occurrences;
• chaining probabilities: the probability that (q, q’) belong to the same chain,
• estimated by a classifier on a set of features (textual, n-gram,
session) computed over all sessions where (q, q’) are
consecutive;
• noisy edges: edges with low probability are removed;
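The first weighting scheme and the removal of noisy edges can be sketched as follows, assuming edge counts are kept in a `Counter` keyed by (q, q’) pairs. Relative frequencies are computed here as the count of (q, q’) divided by the total count of edges leaving q, which keeps every weight in (0, 1]. Chaining probabilities need the trained classifier and are not reproduced; the threshold filter is written generically so it applies to either kind of weight. Function names and the “below the threshold is removed” semantics are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

Edge = Tuple[str, str]


def relative_frequency_weights(edge_counts: Counter) -> Dict[Edge, float]:
    """w(q, q') = count(q, q') / (total count of edges leaving q)."""
    out_totals: Dict[str, int] = defaultdict(int)
    for (q, _), count in edge_counts.items():
        out_totals[q] += count
    # Edges with a zero count are skipped, so the division is always safe.
    return {
        (q, q_next): count / out_totals[q]
        for (q, q_next), count in edge_counts.items()
        if count > 0
    }


def drop_noisy_edges(weights: Dict[Edge, float], threshold: float) -> Dict[Edge, float]:
    """Remove edges whose weight is below `threshold` (assumed semantics)."""
    return {edge: w for edge, w in weights.items() if w >= threshold}
```

Raising the threshold trades coverage for precision: fewer edges survive, so fewer candidate suggestions remain for each query.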


• Query recommendation:
• random walk with restart on the graph;
• takes the user’s history into account (through the
preference vector);
• a score is associated with each suggestion;
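Recommendation by random walk with restart can be sketched with plain power iteration. This sketch assumes a weight dictionary keyed by (q, q’) edges, a restart probability of 0.15 and 50 iterations (arbitrary choices, not values from the slides), and the hypothetical names `random_walk_with_restart` and `suggest`. The preference vector is where the user’s history enters: spreading restart mass over earlier queries of the session, not only the current one, biases the walk toward them. The stationary probability of each node is used directly as its suggestion score.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def random_walk_with_restart(weights: Dict[Edge, float],
                             preference: Dict[str, float],
                             restart: float = 0.15,
                             iterations: int = 50) -> Dict[str, float]:
    """Power iteration for r = restart * v + (1 - restart) * P^T r.

    v is the normalized preference vector and P the row-normalized weight
    matrix. Mass reaching a node with no out-edges goes back to v.
    """
    total = sum(preference.values())
    if total <= 0:
        raise ValueError("preference vector must have positive mass")
    out: Dict[str, List[Tuple[str, float]]] = {}
    for (q, q_next), w in weights.items():
        out.setdefault(q, []).append((q_next, w))
    nodes = set(preference) | {n for edge in weights for n in edge}
    v = {n: preference.get(n, 0.0) / total for n in nodes}
    r = dict(v)
    for _ in range(iterations):
        nxt = {n: restart * v[n] for n in nodes}
        dangling = 0.0
        for q, mass in r.items():
            successors = out.get(q)
            if not successors:
                dangling += mass
                continue
            row_sum = sum(w for _, w in successors)
            for q_next, w in successors:
                nxt[q_next] += (1 - restart) * mass * w / row_sum
        for n in nodes:
            nxt[n] += (1 - restart) * dangling * v[n]
        r = nxt
    return r


def suggest(weights: Dict[Edge, float], history: List[str], k: int = 3,
            special: Tuple[str, ...] = ("<S>", "<T>")) -> List[Tuple[str, float]]:
    """Top-k (query, score) pairs, restarting on the queries in `history`."""
    scores = random_walk_with_restart(weights, {q: 1.0 for q in history})
    candidates = [(q, s) for q, s in scores.items()
                  if q not in history and q not in special]
    return sorted(candidates, key=lambda qs: qs[1], reverse=True)[:k]
```

Because total probability mass is conserved at every iteration, the scores of all nodes always sum to 1.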

Experimental Framework
• Experiments on the AOL query log:
• 20 million queries;
• 650,000 different users;
• 3 months (03/01/2006 to 05/31/2006);
• Three segments of the query log: M1 (March 2006), M2 (April 2006)
and the test month (May 2006).
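Splitting the log into the three monthly segments is simple once each record carries a timestamp. This sketch assumes records are (user id, query, timestamp) tuples with timestamps formatted as `YYYY-MM-DD HH:MM:SS`; the record layout, the `SEGMENTS` mapping and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (user id, query, timestamp): assumed layout

# Calendar month -> segment label used in the experiments.
SEGMENTS = {(2006, 3): "M1", (2006, 4): "M2", (2006, 5): "test"}


def split_by_month(records: Iterable[Record]) -> Dict[str, List[Tuple[str, str, datetime]]]:
    """Group records into the M1, M2 and test segments by calendar month."""
    segments: Dict[str, List[Tuple[str, str, datetime]]] = defaultdict(list)
    for user_id, query, timestamp in records:
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        label = SEGMENTS.get((when.year, when.month))
        if label is not None:  # records outside March to May 2006 are ignored
            segments[label].append((user_id, query, when))
    return dict(segments)
```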

Experimental Assumptions
• M1, M2 are used for training:
• two different QFGs;
• Queries in the third month are used for testing;
• We evaluate the aging effect by measuring the quality
of suggestions produced by the models built on M1 and M2;
• If the model ages, M2 outperforms M1 in terms of
quality of suggestions;

Table 1: Number of nodes and edges for the graphs corresponding to the two different training segments.
time window | id | nodes | edges
March 06 | M1 | 3,814,748 | 6,129,629
April 06 | M2 | 3,832,973 | 6,266,648

From the paper: […] Boldi et al. in [4]. This method uses chaining probabilities measured by means of a machine learning method. The initial step was thus to extract those features from each training log, and to store them into a compressed graph representation. In particular, we extracted 25 different features (time-related, session and textual features) for each pair of queries (q, q’) that are consecutive in at least one session of the query log. Table 1 shows the number of nodes and edges of the different graphs corresponding to each query log segment used for training.

It is important to remark that we have not re-trained the classification model for the assignment of weights associated with QFG edges. We reuse the one that has been used in […] for segmenting users’ sessions into query chains. This is another point in favor of QFG-based models: once you train the classifier to assign weights to QFG edges, you can reuse it on different data-sets without losing effectiveness.

Evaluating the Aging Effect


[Figure: “Top 1000 queries in month 1 on month 1” (log-log plot).]
• Two classes of test queries:
• F1: 30 queries highly frequent in M1 having a large drop in the
test month (e.g., shakira);
• F3: 30 queries highly frequent in the test month having a large
drop in M1 (e.g., da vinci code, mothers day gift);
• F1, F3 contain very diverse queries;
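The slides do not state the exact selection rule for F1 and F3. One plausible reading is sketched below, purely as an assumption: walk the queries of one period from most to least frequent and keep those whose frequency in the other period falls below a fraction `drop` of their original frequency. With frequency tables for M1 and the test month, F1 would correspond to `bursty_queries(m1, test)` and F3 to `bursty_queries(test, m1)`; the function name and the default `drop=0.1` are hypothetical.

```python
from collections import Counter
from typing import List


def bursty_queries(freq_from: Counter, freq_to: Counter,
                   n: int = 30, drop: float = 0.1) -> List[str]:
    """Up to n queries, most frequent first in `freq_from`, whose frequency in
    `freq_to` is below `drop` times their frequency in `freq_from`."""
    picked: List[str] = []
    for query, count in freq_from.most_common():
        if freq_to.get(query, 0) < drop * count:
            picked.append(query)
            if len(picked) == n:
                break
    return picked
```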

Evaluating the Aging Effect (II)
[Figure: two series of suggestion scores, 3742, 2162, 2001, 1913, 1913 and 2652, 2615, 2341, 2341, 2341 (chart labels not recoverable).]
• When k suggestions share the same score, those suggestions
are useless;
• Same suggestion score:
• same probability on the graph;
• the model is not able to give a priority to the
recommendations;
• Confirmed by a user study on F1 and F3;

Evaluating the Aging Effect (III)
• Working hypothesis:
• useful recommendations do not share the same
recommendation score;
• Automatic evaluation:
• 400 highly frequent queries in the test month;
• evaluating the number of useful recommendations;
• k = 3;
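Under the working hypothesis, counting useful suggestions only needs the list of scores returned for a query. The sketch below reads the rule as: a suggestion is useless when at least k suggestions (itself included) share its score, with k = 3 as on the slide. The function names are hypothetical, and scores are compared for exact equality, so real-valued scores may need rounding first.

```python
from collections import Counter
from typing import Sequence


def count_useful(scores: Sequence[float], k: int = 3) -> int:
    """Suggestions whose score is shared by fewer than k suggestions (itself included)."""
    multiplicity = Counter(scores)
    return sum(1 for s in scores if multiplicity[s] < k)


def average_useful(score_lists: Sequence[Sequence[float]], k: int = 3) -> float:
    """Average number of useful suggestions over a set of test queries."""
    if not score_lists:
        return 0.0
    return sum(count_useful(scores, k) for scores in score_lists) / len(score_lists)
```

For instance, the scores 3742, 2162, 2001, 1913, 1913 give 5 useful suggestions (1913 appears only twice), while 2652, 2615, 2341, 2341, 2341 give 2.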

Evaluating the Aging Effect (IV)
• Results:
Table 4: Recommendation statistics obtained by using the automatic evaluation method on a set of 400 queries drawn from the most frequent in the third month.
filtering threshold | average number of useful suggestions on M1 | average number of useful suggestions on M2
0 | 2.84 | 2.91
0.5 | 5.85 | 6.23
0.65 | 5.85 | 6.23
0.75 | 5.85 | 6.18
• The average number of useful suggestions is greater in M2 than in M1;
• The filtering process helps a lot;

From the paper: […] reduces the “noise” on the data and generates more precise knowledge on which recommendations are computed. Furthermore, the increase is quite independent from the threshold level, i.e., by increasing the threshold from 0.5 to 0.75 the overall quality is, roughly, constant. We further break down the overall results shown in Table 4 to show the number of queries on which the QFG-based […]

Evaluating the Aging Effect (V)
• On a histogram (cumulative distribution):
[Figure: cumulative histogram for M1 and M2; x axis: number of useful suggestions (0 to 18); y axis: number of queries having at least that many useful suggestions (0 to 400).]
• Results on M2 are always better than those on M1:
• fewer queries without suggestions;
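The cumulative histogram can be derived from the per-query counts of useful suggestions: bucket x holds the number of queries with at least x useful suggestions, so bucket 0 always equals the number of test queries and the gap between buckets 0 and 1 is the number of queries without useful suggestions. A minimal sketch, with a hypothetical function name and the 0 to 18 range of the plot as the default:

```python
from typing import List, Sequence


def at_least_histogram(useful_counts: Sequence[int], max_x: int = 18) -> List[int]:
    """Bucket x = number of queries with at least x useful suggestions, x = 0..max_x."""
    return [sum(1 for c in useful_counts if c >= x) for x in range(max_x + 1)]
```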

Combating the Aging Effect

• QFG recommender models age:
• average recommendation quality degrades;
• recommendations should not be influenced by time;
• Update of the model vs. rebuilding it “from scratch”;

Combating the Aging Effect (II)
• Solution: incremental update of M1 by means of “fresh data” in M2;
• Graph algebra [Bordino et al., 2008];
• Some measures on the two different approaches:
Table 5: Time needed to build a Query Flow Graph from scratch and using our “incremental” approach (from merging two QFGs, each representing half of the data).
Dataset | “From scratch” strategy [min.] | “Incremental” strategy [min.]
M1 (March 2006) | 21 | 14
M2 (April 2006) | 22 | 15
M12 (March and April) | 44 | 32
• Incremental updates take about 2/3 of the build time of the “from scratch”
strategy (between 67% and 73% in Table 5);
• Evaluation on the same set of 400 queries;

From the paper: […] Suppose the model used to generate recommendations consists of a portion of data representing one month (for M1 and M2) or two months (for M12) of the query log. The model is updated every 15 days (for M1 and M2) or every 30 days (for M12). Using the first approach, we pay 22 (44) minutes every 15 (30) days to rebuild the new model from scratch on a new set of data obtained from the last two months of the query log. Using the second approach instead, we need to pay only 15 (32) minutes to update the one-month (two-month) QFG.
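The incremental strategy can be sketched as keeping one partial graph per 15-day slice and rebuilding the model window by merging the most recent slices. This is a simplified stand-in for the graph algebra of [Bordino et al., 2008], not a reproduction of it: it only adds raw edge counts (a union in which counts of shared edges are summed), and weighting and normalization must be recomputed on the merged counts afterwards. The slice granularity, the names and the `window` parameter are assumptions.

```python
from collections import Counter
from typing import List


def merge_counts(*graphs: Counter) -> Counter:
    """Union of partial graphs: counts of edges present in several graphs are summed."""
    merged: Counter = Counter()
    for graph in graphs:
        merged.update(graph)
    return merged


def sliding_window_model(slices: List[Counter], window: int = 2) -> Counter:
    """Edge counts for the model covering the `window` most recent 15-day slices.

    window=2 covers about one month (as for M1, M2), window=4 about two (M12).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return merge_counts(*slices[-window:])
```

Each update then merges already-built partial graphs instead of re-reading the raw log for the whole window.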

Combating the Aging Effect (III)
Table 7: Some examples of recommendations generated on different QFG models. Queries used to generate recommendations are taken from different query sets. (Surviving excerpt for the query “shakira”, as score and suggestion.)
3698 shakira video
3135 shakira nude
3099 shakira wallpaper
3020 shakira biography
3018 shakira aol music
2015 free video downloads

Table 8: Recommendation statistics obtained by using the automatic evaluation method on a relatively large set of 400 queries drawn from the most frequent in the third month.
filtering threshold | average number of useful suggestions on M2 | average number of useful suggestions on M12
0 | 2.91 | 3.64
0.5 | 6.23 | 7.95
0.65 | 6.23 | 7.94
0.75 | 6.18 | 7.9
• The average number of useful suggestions is greater in M12 than in M2 or M1;

Combating the Aging Effect (IV)
Figure 4: Histogram showing the number of queries (on the y axis) having a certain number of useful recommendations (on the x axis), for M1, M2 and M12. Results are evaluated automatically.
• On a histogram (cumulative distribution):
Figure 5: Histogram showing the total number of queries (on the y axis) having at least a certain number of useful recommendations (on the x axis), for M1, M2 and M12. For instance, the third bucket shows how many queries […]
• Results on M12 are always better than on M1 and M2;
• Large improvement in the number of queries with at least four good suggestions;

Distributed QFG Building
• a parallel way to update QFGs:
• Divide-and-Conquer approach;
• the query log is split into m parts;
• parallel extraction of the features;
• compressing step;
• merging graphs;
• final operations (normalization, pagerank, etc.);
Figure 6: Example of the building of a two mo[…] (diagram labels not recoverable).

From the paper:
4. using the graph algebra described in [8], each partial graph is iteratively merged. Each iteration is done in parallel on the different available nodes of the clo[…].
5. the final resulting data-graph is then processed by the other steps [4] (normalization, chain extraction, random walk) to obtain the complete and usable QFG.
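The divide-and-conquer scheme can be sketched with Python’s `concurrent.futures`: split the sessions into m parts, count consecutive pairs in each part in parallel, then merge the partial graphs pairwise until one remains, so each merge round runs in parallel as in step 4. This is an illustrative sketch under assumptions: splitting by whole sessions (so no session is cut between parts), counting pairs instead of extracting the 25 features, and the function names are all hypothetical. A thread pool only keeps the example self-contained; CPU-bound work would normally go to processes or cluster nodes. The final operations of step 5 are not shown.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_pairs(sessions: List[List[str]]) -> Counter:
    """Partial graph for one part of the log: counts of consecutive query pairs."""
    counts: Counter = Counter()
    for session in sessions:
        for q, q_next in zip(session, session[1:]):
            counts[(q, q_next)] += 1
    return counts


def merge_two(a: Counter, b: Counter) -> Counter:
    merged = Counter(a)
    merged.update(b)  # counts of edges present in both graphs are summed
    return merged


def parallel_build(sessions: List[List[str]], m: int = 4, workers: int = 4) -> Counter:
    if m < 1:
        raise ValueError("m must be at least 1")
    parts = [sessions[i::m] for i in range(m)]  # m parts, whole sessions only
    with ThreadPoolExecutor(max_workers=workers) as pool:
        graphs = list(pool.map(count_pairs, parts))
        # Merge rounds: each round roughly halves the number of partial graphs.
        while len(graphs) > 1:
            merged = list(pool.map(merge_two, graphs[0::2], graphs[1::2]))
            if len(graphs) % 2 == 1:
                merged.append(graphs[-1])  # the unpaired graph waits for the next round
            graphs = merged
    return graphs[0]
```

The result is independent of m, which makes the parallel build easy to check against a sequential `count_pairs` over the whole log.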

Conclusions
• We study the effects of time on QFG-based query
recommender systems;
• We built different QFGs from the AOL query log;
• we analyze the quality of recommendations;
• we show that recommendation models age;
• we introduce an “incremental” algorithm for updating
the model;
• we propose a parallel/distributed way of building
QFGs;

Future Work
• to define a strategy for merging graphs that assigns
different weights to each subgraph:
• more importance to “fresh” data;
• to compare the robustness of QFG recommender
systems with other query recommenders with
respect to aging;
• to design a MapReduce algorithm to build and update
QFG recommender systems efficiently;

Questions?

Thank you for your attention!


References

• [Boldi et al., CIKM’08]: The Query-Flow Graph: Model
and Applications. Boldi, Bonchi, Castillo, Donato,
Gionis, Vigna. CIKM’08.
• [Boldi et al., WSCD’09]: Query Suggestions Using
Query-Flow Graphs. Boldi, Bonchi, Castillo, Donato,
Vigna. WSCD’09.
• [Bordino et al., 2008]: Algebra for the Joint Mining of
Query Log Graphs. 2008.

