Intent-Aware Temporal Query Modeling for Keyword Suggestion
Fredrik Johansson, Tobias Färdig
Chalmers University, Gothenburg, Sweden
{frejohk,tobiasf}@student.chalmers.se

Vinay Jethava
Chalmers University, Gothenburg, Sweden
jethava@chalmers.se

Svetoslav Marinov
Findwise AB, Gothenburg, Sweden
svetoslav.marinov@findwise.com
ABSTRACT
This paper presents a data-driven approach for capturing the temporal variations in user search behaviour by modeling the dynamic query relationships using query-log data. The dependence between different queries (in terms of the query words and latent user intent) is represented using hypergraphs, which allow us to explore more complex relationships than graph-based approaches. This time-varying dependence is modeled using the framework of probabilistic graphical models. The inferred interactions are used for query keyword suggestion, a key task in web information retrieval. Preliminary experiments using query logs collected from the internal search engine of a large health care organization yield promising results. In particular, our model is able to capture temporal variations in query relationships that reflect known trends in disease occurrence. Further, hypergraph-based modeling captures relationships significantly better than graph-based approaches.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: [Information Search and Retrieval]

General Terms
Algorithms, Experimentation, Theory

Keywords
user intent, keyword suggestion, graphical model, dynamic query interaction, query hypergraph

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PIKM'12, November 2, 2012, Maui, Hawaii, USA.
Copyright 2012 ACM 978-1-4503-1719-1/12/11 ...$15.00.

1. INTRODUCTION
Understanding user intent is of great importance in many applications within the realm of information retrieval (IR), and more specifically search [37, 36]. A good understanding of intent can improve the user experience in several applications, such as result ranking and query suggestion [13, 32, 27].

Modeling intent presents two major challenges. First, while research into intent has been conducted for over a decade, no consensus regarding a definition of intent has been established. In order to systematically reason about intent, a mathematical definition is needed. Second, it is widely recognized that intent changes with time [37, 27]. However, because of the multiple definitions of intent, it is not clear how to model temporal dynamics.

Traditionally, intent is modeled as a goal belonging to a small set of categories such as navigational, transactional or informational [11]. This view has been extended to multi-faceted descriptions [33, 4, 21]. However, scalability is a big issue with such classifications, which require either costly manual labeling or machine learning approaches with large amounts of training data. Recent work related to probabilistic topic models [9] uses an implicit, cluster-based representation of intent [14].

It is by now well known that user intent changes with time: the same search query may express different goals at different times [37, 27]. This poses a second challenge in intent modeling, requiring temporal dynamics to be taken into account. While several authors have started applying this knowledge [37, 32], the research into modeling intent dynamics is still nascent.

This paper investigates the problem of systematic modeling of dynamic trends within the area of search query intent. We use an implicit representation of query intent as a distribution over a set of unknown topics. We present a hypergraph-based approach for understanding the time-varying interactions between search queries. A query interaction, represented by a hyperedge consisting of the search queries, expresses commonality in query intent. The varying strength of interaction, represented by a dynamic weight of the hyperedge, reflects the changing user needs over some time period. This dependence is modeled using the framework of graphical models [26, 38]. Specifically, a generative model is presented wherein the time-varying interaction strength between two queries is modulated by the (latent) intents of the search queries. The model allows tractable inference of query interactions from query logs using a modified Expectation-Maximization algorithm [16, 30].

This paper applies the modeling of query interactions to query keyword suggestion, a key task in web information retrieval. More specifically, alternate keywords are suggested for new documents based on queries which strongly interact with the test query at the current time. Preliminary experiments on synthetic data as well as query logs extracted by the consult-
ing company, Findwise AB (http://www.findwise.com), show that a) our hypergraph-based approach does significantly better than graph-based approaches for query keyword suggestion, and b) temporal query modeling can improve keyword suggestion.

[Figure 1 (image not recoverable). Node labels: job bank, job openings, employment, vacation days, parking; example counts x1 = 2, x2 = 5, x3 = 4, x4 = 0, x5 = 1; hyperedges e1 and e2 with weights w1 = 1 and w2 = 0.]
Figure 1: Hypergraph representation of query interaction. Nodes represent unique queries and hyperedges (ellipses) represent interactions of the queries inside.

2. RELATED WORK
Various methods for modeling query relationships have been explored, for example using measures of query similarity and clickthrough URLs [5, 6] and topic models [20].

Modeling user intent has been studied in the context of query suggestion [37, 13] and related problems including query recommendation [32, 20] and query expansion [34]. This problem is related to the problem of keyword extraction [24], wherein new documents are assigned a set of keywords when they are entered into a search engine. Baeza-Yates and Tiberi [6] analyze a bipartite query-document graph to infer semantic relationships between queries.

Recent work has explored query dynamics towards improving the search experience in applications such as query suggestions [3] and result ranking [32]. Kulkarni et al. [27] study the temporal dynamics in queries, their underlying intent and the associated documents. However, none of these approaches consider how queries relate to each other or how such relationships evolve.

Probabilistic graphical models [26, 38] have been widely used in a number of fields, including the study of social networks [2], biological networks [22, 35], and text streams [1].

A related model for inferring pair-wise interactions in a graph using node observations is Mixed-Membership Stochastic Blockmodels (MMSB) [2], which has been extended to track temporal dynamics of such interactions [18]. However, extension to multi-way interactions is highly non-trivial and computationally infeasible since it involves inference of tensors rather than matrices.

Bendersky and Croft [7] model query interactions as a hypergraph to capture multi-way interactions. The work indicates that a hypergraph representation of query graphs can be highly beneficial compared to a pairwise approach, but does not consider temporal dynamics.

3. METHODOLOGY
This paper investigates the increasingly recognized view [25, 19, 7] that in order to successfully capture multi-way interactions, one must treat them accordingly. In order to capture this effectively, we use hypergraphs [8], a generalization of graphs which allows us to express interactions between several queries.

We denote simple (pairwise) graphs $G = (V, E^P)$ and hypergraphs $H = (V, E)$. In general, we consider edge to mean hyperedge, of cardinality $k_e \ge 2$ unless otherwise specified. We model the query interaction network as a hypergraph, where each node represents one unique query, and each hyperedge represents a subset of queries expressing similar user intent, see Figure 1. Every edge $e$ is associated with a weight $w_e^t$, representing the strength of the interaction, which depends on the time point $t$. Weights take on values in a small set $W$. If $W = \{0, 1\}$ we interpret the case of $w_e^t = 0$ as the queries making up edge $e$ having no interaction at $t$.

The input to our model consists of two components, query usage data $x$ and a base hypergraph $H$. Query usage data are time sequences $x = ((x_j^t)_{t=1}^{T})_{j=1}^{N}$ with $j$ the index of the query and $t$ the time. Query usage $x_j^t$ is taken to be the number of times a query $j$ has been made at time $t$. The hypergraph $H = (V, E)$ is a specification of which edges to infer interactions for. Each node represents a unique query and the edges can be thought of as a representation of which interactions are expected to occur at all.

Intent-based query interaction hypergraph.
We construct the base hypergraph $H$ by identifying subsets of queries which express the same underlying intent. We construct an implicit representation of intent in terms of the click-through documents using LDA [10]. More precisely, we assume that there exists a set of topics $\mathcal{H} = \{h_1, \dots, h_{N_H}\}$ that documents and queries can belong to. We associate with each click-through document $d$ a topic distribution $A_d = [A_{d,1}, \dots, A_{d,H}]$ such that $\sum_{h=1}^{H} A_{d,h} = 1$ (here $H = N_H$ is the number of topics), extracted using LDA [10] with appropriate priors. From the query logs, each query $q_i$ is associated with a set of documents, $D_i = \{d_{i,1}, \dots, d_{i,m}\}$, one document clicked on at each instance of the query. We compute query topic distributions $\kappa_i = [\kappa_{i,1}, \dots, \kappa_{i,H}]$ as the average of document topic distributions weighted by the number of times $c_{i,d}$ a document has been clicked on after using said query. We build the base graph by identifying queries of similar topic distributions $\kappa$. We add a node $v_i$ to $V$ for each unique query $q_i$. We add an edge $e^P = (v_i, v_j)$ to $E^P$ if $D_{KL}(\kappa_i \,\|\, \kappa_j) \le \varepsilon_{KL}$, where $D_{KL}(P \,\|\, Q) = \sum_i P(i) \ln(P(i)/Q(i))$ [28], and $\varepsilon_{KL}$ is a threshold parameter of the tool. The resulting (pairwise) graph is then converted to a hypergraph $H = (V, E)$ where we take the hyperedges, $E$, to be the maximal cliques of the pairwise graph. Maximal cliques are computed using the Bron-Kerbosch algorithm [12], see extended version [23].

Dependence of query statistics on interactions.
The dependence of observed query statistics $x^t = \{x_v^t\}_{v \in V}$ on query interactions $w^t = \{w_e^t\}_{e \in E}$ present at that time is modeled using the Boltzmann distribution,

$$P(X^t = x^t \mid W^t = w^t) = \frac{1}{Z(w^t)} \exp\Big( \sum_{e \in E} w_e^t \, \phi(e, t) \Big) \tag{1}$$

with $Z(w^t)$ the normalization factor. Here, $\phi(e, t)$ is a potential function. In this model we use $\phi(e, t) = \prod_{v \in e} x_{i(v)}^t$ with $i(v)$ the index of node $v$.

Modelling query interaction dynamics.
The base hypergraph $H$ specifies sets of queries (as hyperedges) which have similar user intent, expressed in terms of the distribution $\kappa_i$ over topics $\mathcal{H}$ for each query $q_i$. However,
user needs vary over time. For example, during periods of economic downturn, more users might be looking for a job, and this in turn will be reflected in search queries expressing this need. Our model aims to capture such changes as changes in the hyperedge weights $w_e^t$.

In terms of the topic distribution $\kappa_i$ obtained by LDA, some of the topics will be more active at each time. These active topics govern which query interactions are active at that time. We note that the active topics at each time are unknown; however, their influence is observed in terms of the (latent) active interactions and consequently in the observed query statistics at that time.

We make the assumption that edge dynamics are conditionally independent given the active topics at that time, i.e. the interactions in one group of queries are independent of the interactions in other groups if we know which topic is influencing each set of queries. We assume that the set of active topics changes smoothly over time, motivated by real-world phenomena such as seasonal changes. Under this assumption, we model interactions as a Markov model. The transition probability of the interaction strengths $w_e^t$ of an edge $e$ can be written as

$$P(W_e^t = w_l \mid W^1, \dots, W^{t-1}) = P(W_e^t = w_l \mid W_e^{t-1} = w_k) = Q_e(k, l),$$

with the constraint $\sum_{l=1}^{K} Q_e(k, l) = 1$ for all $k = 1, \dots, K$.

We assume that for each $h \in \mathcal{H}$ there exists an unknown transition probability matrix $Q_h$, which we call the topic transition probability. We model the dependence of edge transitions on user intent as follows: if $h \in \mathcal{H}$ is the active topic for edge $e$ at time $t$, then the transition probability for edge $e$ at time $t$ is given by $Q_e^t = Q_h$. Mathematically, this is equivalent to considering a mixture model over the set of topic transition probabilities for each edge $e$. In the mixture model, edge transition probabilities $Q_e$ depend on topic transition probabilities $Q_h$ and mixture proportions $\alpha_{e,h}$. The parameter $\alpha_{e,h}$ governs the influence of topic $h$ in the evolution of edge $e$, and the relationship between $Q_e$ and $Q_h$ can be written as

$$Q_e(k, l) = \sum_{h \in \mathcal{H}} \alpha_{e,h} \, Q_h(k, l), \qquad \sum_{h \in \mathcal{H}} \alpha_{e,h} = 1 \text{ for all } e \in E.$$

The (unknown) parameter $\alpha$ can be thought of as a matrix with element $\alpha_{e,h}$ at index $(e, h)$. We impose appropriate priors for $\alpha$ and $Q_h$, see extended version [23] for details. The generative model incorporating the concepts described in this section is shown in Figure 2.

[Figure 2 (image not recoverable). Variables shown: θ_h, Q_h, Q_e, λ_e, α_e, w_e^t, x_i^t; plate labels: H, E, N, T.]
Figure 2: Plate notation representation of the generative model. Letters in bottom right corners of boxes indicate repetitions of the variables inside.

The modified expectation-maximization (EM) procedure [16, 31] used for parameter estimation can, because of edge conditional independence, be parallelized using frameworks like MapReduce [15] or GraphLab [29].

4. EXPERIMENTS

Evaluation on synthetic data.
We generate random hypergraphs $H = (V, E)$ of two types, using the procedure described by Ghoshal et al. [19]. The first type has uniform edge cardinality $k = 3$ and the second has random cardinality $k \le 4$ following the distribution $P(k = 2) = 0.70$, $P(k = 3) = 0.25$, $P(k = 4) = 0.05$. A copy of the hypergraph is then converted to a pairwise (simple) graph $G = (V, E^P)$ by connecting all pairs of nodes $(n_j, n_k)$ of every hyperedge $e$ with an edge $e^P = (j, k)$ in order to allow comparison with graph-based approaches [22].

Weight sequences $W_e^t$ and observations $x_i^t$ are generated for the hypergraph using known (randomly generated) parameters $Q_h$ and $\alpha_{e,h}$. We construct one instance of our model using the hypergraph $H$ and one using the pairwise graph $G$. We perform inference on both of the models using the observations $x$ and the two graphs. This results in two weight sequences, $\tilde{W}_e^t$ and $\tilde{W}_e^{P,t}$, one inferred from each model, to compare to the original weight sequence, $W^t$. Since the pairwise graph has more and different edges than the hypergraph used as the ground truth, we need to infer a weight sequence for the corresponding hypergraph using the pairwise weight sequence. We do this using one of the simplest decision rules, majority vote. Furthermore, we compute Fleiss' kappa [17], $\kappa_F$, as a measure of the agreement between edges in the vote.

We take the number of elements in which the inferred sequence $\tilde{W}_e$ differs from the true sequence $W_e$ as the error measure. For the experiment we use weights $W = \{-1, 0, 1\}$, $N = 20$ nodes, $T = 100$ time points, $H = 20$ topics, and $E = 50$ hyperedges. In Table 1, we present the results of comparing performance for pair-wise graphs and hypergraphs on synthetically generated data. The average Fleiss' kappa was $\kappa_F = -0.01$, which indicates poor agreement in the majority vote [17].

Model                       Error, k = 3   Error, k ≤ 4
Graph-based approach [22]   66%            37%
Our model                   26%            30%

Table 1: Error in inferred weights on synthetic data. k is the cardinality of hyperedges.

Evaluation on keyword suggestion.
In this section we evaluate the performance of our model in the task of suggesting keywords for real-world data. We use query logs and click-through documents from the internal search engine of a Swedish county council. The dataset consists of 350 documents and 1 000 000 query instances of 20 000 unique queries. We construct the training and testing sets by partitioning the data chronologically into a training set (90%) and a test set (10%).

We suggest as keywords, for document $d$ at time $t$, the set of queries $K_d^t$ making up the interacting edge with topic distribution closest to that of the document in question. Denote by $Q_d^t = \{q_{d,1}, \dots, q_{d,n}\}$, for each document $d$, the set of queries used to access $d$ in the time interval $t$. We consider $K_d^t$ a successful suggestion of keywords for $d$ at time $t$ if $Q_d^t \subseteq K_d^t$, i.e. if we at least suggest as keywords all queries that are used to reach the document.

We evaluate suggestions in terms of recall $r$ defined in the usual way, as well as the percentage of documents that
are given successful suggestions. This is done for every document in the test set. Average recall and percentage of successful suggestions are presented in Table 2.

Model       Avg. successful suggestions   Avg. recall
Our model   66%                           0.62

Table 2: Results for keyword suggestion on query dataset.

An example of a successful suggestion is the set of keywords [pension, expense, travel expenses, foundation] for a document reached by queries [expense, travel expenses].

5. CONCLUSIONS
In this paper we have constructed an implicit hypergraph-based model of dynamic search intent using the framework of probabilistic graphical models. We have found through tests on synthetic as well as real-world data that our model is capable of recovering changing intent expressed as hidden interaction dynamics in search query networks.

We have also found that our model can be used in an application to document keyword suggestions, and that the suggested keywords are relevant in that they are actually used as search terms.

This paper is a small piece in long-term research into large-scale inference of temporal relationships in multivariate dependence models. The implicit representation of intent as a topic distribution used in the paper allows for parallelizable inference. This fact makes the model suitable for large-scale applications using existing frameworks such as MapReduce [15] or GraphLab [29]. In future work, the impact of the temporal modeling on real-world applications needs to be demonstrated fully.

6. ACKNOWLEDGMENTS
The authors would like to thank Devdatt Dubhashi and Chiranjib Bhattacharyya for helpful suggestions during the course of this project. We would also like to thank Findwise AB, a consulting company in search solutions, and their employees for their insight and help and for providing data.

7. REFERENCES
[1] A. Ahmed and E. Xing. Timeline: A dynamic hierarchical dirichlet process model for recovering birth/death and evolution of topics in text stream. In Proc. of UAI, 2010.
[2] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. JMLR, 9:1981–2014, 2008.
[3] E. Alfonseca, M. Ciaramita, and K. Hall. Gazpacho and summer rash: lexical relationships from temporal patterns of web search queries. In Proc. of EMNLP 2009, pages 1046–1055, 2009.
[4] R. Baeza-Yates, L. Calderón-Benavides, and C. González-Caro. The intention behind web queries. In Proc. of SPIRE, volume 409 of LNCS, pages 98–109, 2006.
[5] R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query recommendation using query logs in search engines. In Proc. of EDBT, pages 588–596, 2004.
[6] R. Baeza-Yates and A. Tiberi. Extracting semantic relations from query logs. In Proc. of KDD, pages 76–85, 2007.
[7] M. Bendersky and W. B. Croft. Modeling higher-order term dependencies in information retrieval using query hypergraphs. In SIGIR, pages 941–950, 2012.
[8] C. Berge. Graphs and hypergraphs. Elsevier, 1976.
[9] D. M. Blei. Introduction to probabilistic topic models. Commun. ACM, 2011.
[10] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3:993–1022, 2003.
[11] A. Z. Broder. A taxonomy of web search. SIGIR Forum, 36(2):3–10, 2002.
[12] C. Bron and J. Kerbosch. Algorithm 457: finding all cliques of an undirected graph. Commun. ACM, 16(9):575–577, 1973.
[13] H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li. Context-aware query suggestion by mining click-through and session data. In Proc. of KDD, pages 875–883, 2008.
[14] A. Celikyilmaz, D. Hakkani-Tür, G. Tür, A. Fidler, and D. Hillard. Exploiting distance based similarity in topic models for user intent detection. In D. Nahamoo and M. Picheny, editors, ASRU, pages 425–430, 2011.
[15] J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[16] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the em algorithm. J. Royal Statistical Society, Series B, 39(1):1–38, 1977.
[17] J. Fleiss. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382, 1971.
[18] W. Fu, L. Song, and E. P. Xing. Dynamic mixed membership blockmodel for evolving networks. In Proc. of ICML 2009, pages 329–336, 2009.
[19] G. Ghoshal, V. Zlatic, G. Caldarelli, and M. E. J. Newman. Random hypergraphs and their applications. CoRR, abs/0903.0419, 2009.
[20] J. Guo, X. Cheng, G. Xu, and X. Zhu. Intent-aware query similarity. In Proc. of CIKM, pages 259–268, 2011.
[21] J. Hu, G. Wang, F. Lochovsky, J. T. Sun, and Z. Chen. Understanding user's query intent with wikipedia. In Proc. of the WWW '09, pages 471–480, New York, USA, 2009. ACM.
[22] V. Jethava, C. Bhattacharyya, D. Dubhashi, and G. N. Vemuri. Netgem: Network embedded temporal generative model for gene expression data. BMC Bioinformatics, 12(327), 2011.
[23] F. Johansson and T. Färdig. Query concept interaction over time. Master's thesis, Chalmers University of Technology, 2012.
[24] J. Kaur and V. Gupta. Effective approaches for extraction of keywords. Journal of Computer Science, 7(6):144–148, 2010.
[25] S. Klamt, U. Haus, and F. Theis. Hypergraphs and cellular networks. PLoS computational biology, 5(5):e1000385, 2009.
[26] D. Koller and N. Friedman. Probabilistic Graphical Models - Principles and Techniques. MIT Press, 2009.
[27] A. Kulkarni, J. Teevan, K. M. Svore, and S. T. Dumais. Understanding temporal query dynamics. In Proc. of WSDM 2011, pages 167–176, 2011.
[28] S. Kullback and R. A. Leibler. On information and sufficiency. Ann. Math. Statist., 22(1):79–86, 1951.
[29] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. Graphlab: A new parallel framework for machine learning. In Conf. on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010.
[30] R. Neal and G. E. Hinton. A view of the em algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, pages 355–368. Kluwer Academic Publishers, 1998.
[31] R. M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto, 1993.
[32] K. Radinsky, K. Svore, S. Dumais, J. Teevan, A. Bocharov, and E. Horvitz. Modeling and predicting behavioral dynamics on the web. In Proc. of WWW 2012, pages 599–608, 2012.
[33] D. E. Rose and D. Levinson. Understanding user goals in web search. In Proc. of WWW 2004, pages 13–19, 2004.
[34] E. Sadikov, J. Madhavan, L. Wang, and A. Halevy. Clustering query refinements by user intent. In Proc. of WWW 2010, pages 841–850, 2010.
[35] L. Song, M. Kolar, and E. P. Xing. KELLER: estimating time-varying interactions between genes. Bioinformatics, 25:i128–i136, 2009.
[36] J. Teevan, S. Dumais, and E. Horvitz. Potential for personalization. ACM Trans. on Computer, Jan 2010.
[37] J. Teevan, S. Dumais, and D. Liebling. To personalize or not to personalize: modeling queries with variation in user intent. Proc. of SIGIR 2008, pages 163–170, 2008.
[38] M. Wainwright and M. Jordan. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in ML, 1(1-2):1–305, 2008.
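
As an illustration of the base-hypergraph construction described in Section 3 (a KL-divergence threshold on query topic distributions, followed by maximal cliques as hyperedges), the following Python sketch may help. It is not the implementation used in the paper. The function names, the toy topic distributions and the threshold value are hypothetical, and the requirement that the divergence stay within the threshold in both directions is an assumption made here to obtain an undirected graph; the paper does not state how the asymmetry of the KL divergence is handled.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * ln(p_i / q_i), with 0 * ln(0 / q) taken as 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration (no pivoting) over an adjacency dict."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in sorted(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(frozenset(), frozenset(adj), frozenset())


def base_hypergraph(kappa, eps_kl):
    """Hyperedges = maximal cliques (size >= 2) of the KL-thresholded graph.

    Assumption: an edge needs D_KL <= eps_kl in BOTH directions.
    """
    adj = {i: set() for i in range(len(kappa))}
    for i, j in combinations(range(len(kappa)), 2):
        if (kl_divergence(kappa[i], kappa[j]) <= eps_kl
                and kl_divergence(kappa[j], kappa[i]) <= eps_kl):
            adj[i].add(j)
            adj[j].add(i)
    return [c for c in maximal_cliques(adj) if len(c) >= 2]


# Hypothetical topic distributions for four queries over two topics.
kappa = [[0.9, 0.1], [0.85, 0.15], [0.8, 0.2], [0.1, 0.9]]
hyperedges = base_hypergraph(kappa, eps_kl=0.05)
print(hyperedges)
```

In this toy example the first three queries have similar topic distributions and end up in a single hyperedge, while the fourth query stays isolated and therefore contributes no hyperedge.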
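
The mixture relation from Section 3, in which the transition matrix of an edge is a weighted sum of topic transition matrices, Q_e(k, l) = sum over h of alpha_{e,h} Q_h(k, l), can be made concrete with a small numeric sketch. The two topic matrices, the mixture proportions and the function name below are hypothetical values chosen for illustration; they are not parameters from the paper.

```python
def edge_transition_matrix(alpha_e, topic_matrices):
    """Return Q_e with Q_e[k][l] = sum_h alpha_e[h] * Q_h[k][l] for one edge e."""
    if abs(sum(alpha_e) - 1.0) > 1e-9:
        raise ValueError("mixture proportions alpha_e must sum to 1")
    size = len(topic_matrices[0])
    return [
        [sum(a * q[k][l] for a, q in zip(alpha_e, topic_matrices))
         for l in range(size)]
        for k in range(size)
    ]


# Two hypothetical topics over the weight set W = {0, 1}.
Q_persistent = [[0.9, 0.1], [0.1, 0.9]]  # interaction tends to keep its state
Q_uniform = [[0.5, 0.5], [0.5, 0.5]]     # interaction state is re-drawn each step
alpha_e = [0.75, 0.25]                   # influence of each topic on edge e

Q_e = edge_transition_matrix(alpha_e, [Q_persistent, Q_uniform])
for row in Q_e:
    print(row)
```

Because each Q_h has rows summing to one and the proportions alpha_{e,h} sum to one, every row of the resulting Q_e also sums to one, so Q_e is itself a valid transition matrix.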