Efficient Query Suggestions in the Long Tail
                Joint work:
                R. Perego, F. Silvestri, H. Vahabi, R. Venturini, HPC Lab, Italy
                F. Bonchi, Yahoo! Research, Spain




Wednesday, August 24, 2011
Query suggestion practices

                   • Use of the Wisdom of the Crowd mined from Query Logs to
                     recommend related queries that are likely to better
                     specify the information need of the user
                     • shorten length of user sessions
                     • enhance perceived quality of experience (QoE)

Queries in the Head




Queries in the Long Tail

                   Rare and never-seen queries account for
                   more than 50% of the traffic!

Open issues
                   • Sparsity of models:
                     • query assistance services perform poorly or are
                       not even triggered on long-tail queries
                   • Performance:
                     • on-line process going in parallel with query answering

                   [Figure: popularity (y-axis) of queries ordered by popularity (x-axis)]

State of the art (SoA): Query Flow Graph

                   • Query-centric approach
                   • Suggest queries by computing Random Walks with Restarts
                     (RWRs) on the query-flow graph (QFG) by starting from
                     the current user query

P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S. Vigna: The query-flow graph: model and applications. CIKM 2008: 609-618
P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S. Vigna: Query suggestions using query-flow graphs. WSCD 2009

Query-centric suggestions
                   Computing RWRs on a huge graph, e.g., built from a
                   query log (QL) recording 580,797,850 queries (from Y! us):
                     • |V|             28,763,637
                     • |E|             56,250,874
                     • |{q: f(q)=1}|  162,221,967 (28%)

Term-centric opportunities
                   But, in the same Y! QL:
                     • queries              580,797,850
                     • Term occurrences   1,343,988,549
                     • |{t: f(t)=1}|          5,099,145 (0.04%)

The TQGraph

                   [Figure: example TQGraph with term nodes linked to query nodes]

                   Term nodes are added to the QFG. They have only outgoing
                   links, pointing at the query nodes corresponding to the
                   queries in which the terms occur.

TQGraph model
                   Suggestions for an incoming query q composed of
                   terms {t1,...,tm} ⊆ T are generated from the TQGraph G by
                   extracting the center-piece subgraph starting from
                   the nodes corresponding to terms t1,...,tm
                   •         perform m Random Walks with Restart from
                             each one of the m term nodes corresponding to
                             terms in q
                   •         multiply component-wise the resulting m
                             stationary distributions
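
The two steps above can be sketched in Python on a toy graph. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`build_tqgraph`, `rwr`, `suggest`), the uniform weights on term-to-query links, the treatment of dangling nodes, and the convention that α is the probability of following a link (so the walk restarts with probability 1 − α) are all choices made for the example.

```python
from collections import defaultdict


def build_tqgraph(qfg_edges, queries):
    """Return {node: {successor: weight}}.

    Query nodes are ("q", text), term nodes are ("t", term). Term nodes only
    have outgoing links to the queries that contain the term.
    """
    graph = defaultdict(dict)
    for src, dst, weight in qfg_edges:
        graph[("q", src)][("q", dst)] = weight
        graph[("q", dst)]  # touching the key registers the target as a node
    for query in queries:
        graph[("q", query)]
        for term in set(query.split()):
            graph[("t", term)][("q", query)] = 1.0  # assumed uniform weight
    return dict(graph)


def rwr(graph, start, alpha=0.9, iters=100):
    """Random walk with restart to `start`, computed by power iteration."""
    dist = {node: 0.0 for node in graph}
    dist[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        nxt[start] += 1.0 - alpha  # restart mass
        for node, mass in dist.items():
            out = graph[node]
            total = sum(out.values())
            if total == 0:
                nxt[start] += alpha * mass  # assumption: dangling nodes restart
                continue
            for succ, weight in out.items():
                nxt[succ] += alpha * mass * weight / total
        dist = nxt
    return dist


def suggest(graph, query, k=5, alpha=0.9):
    """Run one RWR per known query term and multiply the results component-wise."""
    terms = [("t", t) for t in dict.fromkeys(query.split()) if ("t", t) in graph]
    if not terms:
        return []
    score = {node: 1.0 for node in graph if node[0] == "q"}
    for term in terms:
        dist = rwr(graph, term, alpha)
        for node in score:
            score[node] *= dist[node]
    ranked = sorted(score.items(), key=lambda item: -item[1])
    return [(node[1], s) for node, s in ranked[:k] if s > 0]


if __name__ == "__main__":
    edges = [("dog heat cycle", "dog in heat symptoms", 1.0)]
    log = ["dog heat cycle", "dog in heat symptoms", "heart rate"]
    print(suggest(build_tqgraph(edges, log), "dog heat"))
```

Because the scores are a product, a query node gets a non-zero score only if it is reachable from every term of the incoming query, which is why a never-seen query such as "dog heat" can still receive suggestions as long as its terms are known.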


TQG effectiveness
                   • User study results comparing TQG and QFG effectiveness
                     for two different testbeds (Y! US and MSN QLs).

                   100 queries on Yahoo!   useful   somewhat   not useful
                   α = 0.9                  48%       11%        41%
                   α = 0.5                  41%       20%        39%
                   α = 0.1                  37%       20%        43%

                   Table 2: Effectiveness of TQGraph-based recommendations
                   on the two different sets of queries and query logs, by
                   varying the restart parameter α.

                   TREC on MSN             useful   somewhat   not useful
                   TQGraph α = 0.9          57%       16%        27%
                   QFG                      50%        9%        42%

                   100 queries on Yahoo!   useful   somewhat   not useful
                   TQGraph α = 0.9          48%       11%        41%
                   QFG                      23%       10%        67%

                   Table 3: User study results comparing effectiveness of our
                   method with the baseline for the two different testbeds.

Effectiveness on rare queries
                   • Anecdotal evidence

[...] we pair TREC queries up with the model built on the MSN query log. In fact, the period from which TREC queries come is close to the period in which MSN queries were submitted. We generated the top-5 recommendations for each query using both the QFG and the TQGraph with different parameter settings. Using a web interface each assessor was presented a random query followed by the list of all the different recommendations produced. Recommendations were presented shuffled, in order for the assessor to not be able to distinguish which system produced them. We give assessors the possibility to observe the search engine results for the original query and the recommended query that was being evaluated. The assessor was asked to rate a recommendation with one of the following scores: useful, somewhat useful, [...]

[...] to outperform the effectiveness of QFG. The fact that we, instead, achieve such a remarkable result is, actually, an additional benefit of TQGraph-based methods.

Anecdotal evidence. We next show a few examples of query recommendations. We start from queries that have never been observed in the query log, i.e., the most difficult cases. The first query that we show is one among the eight from the TREC testbed that do not appear at all in the MSN query log. The query is “lower heart rate”. Below we report the top 5 recommendations.

                   Query not occurring in the training log

                   Query: lower heart rate
                   Suggested Query                          Score
                   things to lower heart rate               2.9e−14
                   lower heart rate through exercise        2.6e−14
                   accelerated heart rate and pregnant      2.9e−15
                   web md                                   2.0e−16
                   heart problems                           8.0e−17

We can observe that all the top-5 suggestions can be considered pertinent to the initial topic. Moreover, even if this is not an objective in this paper, they present some diversity: the first two are how-to queries, while the last three are queries related to finding information w.r.t. possible problems (with one very specific for pregnant women). The most interesting recommendation is probably “web md”, which makes perfect sense, and has a large edit distance from the original query.

The next query we present is a rare (i.e., rarely appearing) query: “dog heat”, which appears only twice in the MSN query log.

                   Query occurring twice in the training log

                   Query: dog heat
                   Suggested Query                                        Score
                   heat cycle dog pads                                    4.3e−10
                   what happens when female dog is in heat
                     & a male dog is around                               4.0e−10
                   boxer dog in heat                                      3.99e−10
                   dog in heat symptoms                                   3.98e−10
                   behavior of a male dog around a female dog in heat     3.95e−10

As in the previous example, the top-5 suggestions are qualitatively good and present some diversity. Also, the TQGraph-based method returns long queries, thus likely to [...]

TQG pros

                   • provide query suggestions of quality comparable to or
                     better than QFG, even for rare and unique queries
                   • several possible optimizations for achieving an
                     efficient on-line query recommendation service

Indexing precomputed suggestions

                   [Figure: the TQGraph and the inverted index built from it]

                   Inverted index representation of the RWRs computed on the
                   TQGraph. The lexicon is made up of term nodes, postings are
                   the stationary distribution values.

                   Stationary distribution of query nodes in the TQGraph as
                   obtained by a RWR from a given term.

                   • recommendations for an incoming query are computed by
                     processing the posting lists associated with the terms
                     in the query
                       :)  O(|T|) posting lists
                       :(  O(|Q|) length of each posting list

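
A possible in-memory shape for such an index is sketched below in Python. It is an assumption-laden illustration rather than the structure used in the talk: posting lists are plain lists of `(query_id, probability)` pairs, the names `build_index` and `recommend` are hypothetical, and a query id missing from any list is treated as having probability zero, so only ids present in all lists survive the product.

```python
import heapq


def build_index(term_distributions):
    """Map each term to a posting list of (query_id, probability), sorted by id.

    `term_distributions` is assumed to hold, per term, the stationary
    distribution over query ids precomputed by an RWR from that term.
    """
    return {
        term: sorted((qid, p) for qid, p in dist.items() if p > 0)
        for term, dist in term_distributions.items()
    }


def recommend(index, terms, k=5):
    """Multiply probabilities of query ids that occur in every term's list."""
    lists = [index[t] for t in terms if t in index]
    if not lists:
        return []
    scores = dict(lists[0])
    for posting_list in lists[1:]:
        other = dict(posting_list)
        # Ids absent from `other` would get a zero product, so drop them.
        scores = {qid: s * other[qid] for qid, s in scores.items() if qid in other}
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    index = build_index({
        "dog": {1: 0.5, 2: 0.2, 3: 0.1},
        "heat": {2: 0.4, 3: 0.5},
    })
    print(recommend(index, ["dog", "heat"]))
```

With this layout the on-line work is limited to reading one list per query term, which is where the O(|T|) lists and O(|Q|) list length on the slide come from.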
Pruning posting lists
                   • sort postings by probability and prune them
                     at a reasonable threshold p, e.g. 20,000
                   • O(|T|) lists, each of size O(p) and no loss in quality!

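
The pruning step can be sketched in a few lines of Python. The function names and the `(query_id, probability)` posting layout are assumptions for the example; the slide only states that postings are sorted by probability and cut at a threshold p.

```python
import heapq


def prune(posting_list, p):
    """Keep the p postings with the highest probability, best first."""
    return heapq.nlargest(p, posting_list, key=lambda entry: entry[1])


def prune_index(index, p=20_000):
    """Apply the same cut-off to every term's posting list."""
    return {term: prune(postings, p) for term, postings in index.items()}


if __name__ == "__main__":
    postings = [(1, 0.1), (2, 0.5), (3, 0.3)]
    print(prune(postings, 2))
```

`heapq.nlargest` avoids fully sorting a long list when p is much smaller than its length.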
Bucketing probabilities
                   • Most space used for storing probabilities
                   • Given ε < 1, we can arrange postings in buckets
                     implicitly coding the approximate probabilities

                   [Figure: the posting list of a term split into the buckets
                    [1, ε)   [ε, ε²)   [ε², ε³)   [ε^i, ε^(i+1))]

                   Within buckets, queries are sorted by their IDs. Scores are
                   approximated by the greatest bound, i.e., ε^i for all i ≥ 0.

                   • Each entry coded with a few bits, e.g., 11-19 bits
                   • ~5x reduction!
                   • no loss in quality!

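
One way to read the bucketing idea is shown below as a Python sketch. The helper names are hypothetical and the bucket index is derived here with a logarithm, which is an assumption about how the assignment could be done: a probability p falls into bucket i when ε^(i+1) < p ≤ ε^i, and is later approximated by the bucket's greatest bound ε^i, so no probability has to be stored explicitly.

```python
import math
from collections import defaultdict


def bucketize(posting_list, eps):
    """Group query ids by bucket index; ids inside a bucket are sorted."""
    if not 0 < eps < 1:
        raise ValueError("eps must be in (0, 1)")
    buckets = defaultdict(list)
    for qid, p in posting_list:
        # Largest i with eps**i >= p, for probabilities in (0, 1].
        i = math.floor(math.log(p) / math.log(eps))
        buckets[i].append(qid)
    return {i: sorted(ids) for i, ids in sorted(buckets.items())}


def approx_probability(bucket_index, eps):
    """Greatest bound of the bucket, used in place of the exact probability."""
    return eps ** bucket_index


if __name__ == "__main__":
    postings = [(7, 0.9), (3, 0.6), (5, 0.3), (1, 0.26)]
    print(bucketize(postings, eps=0.5))
```

Under this scheme the approximation overestimates a probability by a factor of at most 1/ε, so values of ε closer to 1 trade more buckets for a tighter approximation.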
Caching posting lists
                   • achieving in-memory query suggestion

                   [Figure 4: Miss ratio of our cache as a function of ...]

Conclusions
                   • TQG model to overcome limitations of current query
                     recommenders
                     • based on a principled, term-centric approach supporting
                       rare and never-seen queries
                   • deployment with an efficient inverted index, resulting in
                     effectiveness comparable to or better than SoA approaches
                   • the pruning, bucketing and caching techniques proposed
                     constitute an independent contribution in the area of
                     efficiency in large-scale RWR computations
                     • reduction of about 80% in space occupancy w.r.t.
                       uncompressed data structures
                     • in-memory RWRs on huge graphs with a cache hit ratio
                       above 90%

Questions?

Contenu connexe

Similaire à Raffaele Perego "Efficient Query Suggestions in the Long Tail"

Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Julius Hietala
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingLionel Briand
 
Linked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms DiscoveryLinked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms DiscoveryIoannis Stavrakantonakis
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...Jeff Z. Pan
 
Visually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and PatternsVisually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and PatternsXiaoyu Wang
 
An Introduction to Quantum Programming Languages
An Introduction to Quantum Programming LanguagesAn Introduction to Quantum Programming Languages
An Introduction to Quantum Programming LanguagesDavid Yonge-Mallo
 
Ph d sem_1@iitm
Ph d sem_1@iitmPh d sem_1@iitm
Ph d sem_1@iitmVinu Ev
 
sa-mincut-aditya.ppt
sa-mincut-aditya.pptsa-mincut-aditya.ppt
sa-mincut-aditya.pptaashnareddy1
 
A Study on Glyph-based Visualisation with Dense Visual Context
A Study on Glyph-based Visualisation with Dense Visual ContextA Study on Glyph-based Visualisation with Dense Visual Context
A Study on Glyph-based Visualisation with Dense Visual ContextSaiful Khan
 

Similaire à Raffaele Perego "Efficient Query Suggestions in the Long Tail" (11)

Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"
 
Discussants
DiscussantsDiscussants
Discussants
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software Testing
 
Linked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms DiscoveryLinked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms Discovery
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
 
Visually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and PatternsVisually Exploring Patent Collections for Events and Patterns
Visually Exploring Patent Collections for Events and Patterns
 
An Introduction to Quantum Programming Languages
An Introduction to Quantum Programming LanguagesAn Introduction to Quantum Programming Languages
An Introduction to Quantum Programming Languages
 
Ph d sem_1@iitm
Ph d sem_1@iitmPh d sem_1@iitm
Ph d sem_1@iitm
 
sa-mincut-aditya.ppt
sa-mincut-aditya.pptsa-mincut-aditya.ppt
sa-mincut-aditya.ppt
 
sa.ppt
sa.pptsa.ppt
sa.ppt
 
A Study on Glyph-based Visualisation with Dense Visual Context
A Study on Glyph-based Visualisation with Dense Visual ContextA Study on Glyph-based Visualisation with Dense Visual Context
A Study on Glyph-based Visualisation with Dense Visual Context
 

Plus de yaevents

Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...yaevents
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...yaevents
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндексyaevents
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндексyaevents
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmannyaevents
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...yaevents
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...yaevents
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндексyaevents
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebookyaevents
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Googleyaevents
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
Raffaele Perego "Efficient Query Suggestions in the Long Tail"

  • 1. Efficient Query Suggestions in the Long Tail • Joint work: R. Perego, F. Silvestri, H. Vahabi, R. Venturini (HPC Lab, Italy); F. Bonchi (Yahoo! Research, Spain) • Wednesday, August 24, 2011
  • 2. Query suggestion practices • Use of the Wisdom of the Crowd mined from Query Logs to recommend related queries that are likely to better specify the information need of the user • shorten length of user sessions • enhance perceived QoE
  • 3. Queries in the Head
  • 4. Queries in the Head
  • 5. Queries in the Head
  • 6. Queries in the Long Tail
  • 7. Queries in the Long Tail
  • 8. Queries in the Long Tail
  • 9. Queries in the Long Tail • Rare and never-seen queries account for more than 50% of the traffic!
  • 10. Open issues • Sparsity of models: query assistance services perform poorly or are not even triggered on long-tail queries • Performance: on-line process going in parallel with query answering [Figure: popularity of queries, with queries ordered by popularity]
  • 11. SoA: Query Flow Graph • Query-centric approach • Suggest queries by computing Random Walks with Restart (RWRs) on the query-flow graph (QFG), starting from the current user query • P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S. Vigna: The query-flow graph: model and applications. CIKM 2008: 609-618 • P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S. Vigna: Query suggestions using query-flow graphs. WSCD 2009
  • 12. Query-centric suggestions • Computing RWRs on a huge graph, e.g., one built from a query log (QL) recording 580,797,850 queries (from Y! US): • |V| = 28,763,637 • |E| = 56,250,874
  • 13. Query-centric suggestions • Computing RWRs on a huge graph, e.g., one built from a query log (QL) recording 580,797,850 queries (from Y! US): • |V| = 28,763,637 • |E| = 56,250,874 • |{q: f(q)=1}| = 162,221,967 (28%)
  • 14. Term-centric opportunities • But, in the same Y! QL: • queries: 580,797,850 • term occurrences: 1,343,988,549
  • 15. Term-centric opportunities • But, in the same Y! QL: • queries: 580,797,850 • term occurrences: 1,343,988,549 • |{t: f(t)=1}| = 5,099,145 (0.04%)
  • 16. The TQGraph • Term nodes are added to the QFG; they have only outgoing links, pointing at the query nodes corresponding to the queries in which the terms occur. [Figure: example TQGraph with term nodes linked to query nodes]
  • 17. TQGraph model • Suggestions for an incoming query q composed of terms {t1,...,tm} ⊆ T are generated from G by extracting the center-piece subgraph starting from the nodes corresponding to terms t1,...,tm • perform m Random Walks with Restart, one from each of the m term nodes corresponding to the terms in q • multiply component-wise the resulting m stationary distributions
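The two steps on the slide above can be sketched on a toy graph. This is a minimal illustration, not the authors' implementation: the graph encoding (nodes tagged `"t"` for terms and `"q"` for queries), the function names `rwr` and `suggest`, the power-iteration solver, the handling of dangling nodes, and the `restart` value are all assumptions made for the example. In particular, `restart` here is simply the probability of jumping back to the start node and is not claimed to correspond to the α reported in the slides.

```python
from math import prod


def rwr(graph, start, restart=0.15, iters=100):
    """Random walk with restart via power iteration; returns node -> probability.

    `graph` maps a node to a dict {successor: edge weight} (hypothetical encoding).
    """
    nodes = {start} | set(graph) | {v for outs in graph.values() for v in outs}
    dist = {n: 0.0 for n in nodes}
    dist[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u, mass in dist.items():
            outs = graph.get(u, {})
            total = sum(outs.values())
            if total == 0:
                nxt[start] += mass  # dangling node: all mass returns to the start
                continue
            nxt[start] += restart * mass
            for v, w in outs.items():
                nxt[v] += (1 - restart) * mass * w / total
        dist = nxt
    return dist


def suggest(graph, terms, k=5, restart=0.15):
    """One RWR per term node, then a component-wise product over query nodes."""
    dists = [rwr(graph, ("t", t), restart) for t in terms]
    query_nodes = {n for n in graph if n[0] == "q"}
    scores = {n[1]: prod(d.get(n, 0.0) for d in dists) for n in query_nodes}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(q, s) for q, s in ranked[:k] if s > 0]


# Toy TQGraph: term nodes have only outgoing links to the queries containing them.
toy = {
    ("t", "dog"): {("q", "dog heat"): 1, ("q", "dog food"): 1},
    ("t", "heat"): {("q", "dog heat"): 1, ("q", "heat pump"): 1},
    ("q", "dog heat"): {("q", "dog in heat symptoms"): 1},
    ("q", "dog food"): {},
    ("q", "heat pump"): {},
    ("q", "dog in heat symptoms"): {},
}

suggestions = suggest(toy, ["dog", "heat"])
```

Because the distributions are multiplied, a query node survives only if it is reachable from every term of the incoming query; in the toy graph "dog food" and "heat pump" each get probability zero from one of the two walks and drop out.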
  • 18. TQG effectiveness • User study results comparing TQG and QFG effectiveness for two different testbeds (Y! US and MSN QLs).
      Table 2: Effectiveness of TQGraph-based recommendations on the two different sets of queries and query logs, by varying the restart parameter α.
        100 queries on Yahoo!    useful    somewhat    not useful
        α = 0.9                  48%       11%         41%
        α = 0.5                  41%       20%         39%
        α = 0.1                  37%       20%         43%
      Table 3: User study results comparing effectiveness of our method with the baseline for the two different testbeds.
        TREC on MSN              useful    somewhat    not useful
        TQGraph α = 0.9          57%       16%         27%
        QFG                      50%       9%          42%
        100 queries on Yahoo!    useful    somewhat    not useful
        TQGraph α = 0.9          48%       11%         41%
        QFG                      23%       10%         67%
  • 19. Effectiveness on rare queries • Anecdotal evidence
      Query not occurring in the training log: "lower heart rate", from the TREC testbed, which does not appear at all in the MSN query log. Top 5 recommendations:
        Suggested query                                        Score
        things to lower heart rate                             2.9e−14
        lower heart rate through exercise                      2.6e−14
        accelerated heart rate and pregnant                    2.9e−15
        web md                                                 2.0e−16
        heart problems                                         8.0e−17
      All the top-5 suggestions can be considered pertinent to the initial topic. Moreover, even if this is not an objective in this paper, they present some diversity: the first two are how-to queries, while the last three are queries related to finding information w.r.t. possible problems (with one very specific for pregnant women). The most interesting recommendation is probably "web md", which makes perfect sense and has a large edit distance from the original query.
      Query occurring twice in the training log: "dog heat", a rare query that appears only twice in the MSN query log. Top 5 recommendations:
        Suggested query                                                        Score
        heat cycle dog pads                                                    4.3e−10
        what happens when female dog is in heat & a male dog is around         4.0e−10
        boxer dog in heat                                                      3.99e−10
        dog in heat symptoms                                                   3.98e−10
        behavior of a male dog around a female dog in heat                     3.95e−10
      As in the previous example, the top-5 suggestions are qualitatively good and present some diversity.
  • 20. TQG pros • provide query suggestions of quality comparable to or better than QFG, even for rare and unique queries • several possible optimizations for achieving
  • 21. TQG pros • provide query suggestions of quality comparable to or better than QFG, even for rare and unique queries • several possible optimizations for achieving an efficient on-line query recommendation service
  • 22. Indexing precomputed suggestions • Inverted index representation of the RWRs computed on the TQGraph. The lexicon is made up of term nodes; postings are the stationary distribution values. • recommendations for an incoming query are computed by processing the posting lists associated with the terms in the query [Figure: posting list of Term 1, holding the stationary distribution of query nodes in the TQGraph as obtained by a RWR from Term 1]
  • 23. Indexing precomputed suggestions • Inverted index representation of the RWRs computed on the TQGraph. The lexicon is made up of term nodes; postings are the stationary distribution values. • recommendations for an incoming query are computed by processing the posting lists associated with the terms in the query • Pro: O(|T|) posting lists • Con: O(|Q|) length of each posting list [Figure: posting list of Term 1, holding the stationary distribution of query nodes in the TQGraph as obtained by a RWR from Term 1]
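A sketch of how posting lists could be processed at query time, assuming each precomputed stationary distribution is stored as a list of `(query_id, probability)` pairs sorted by query ID. The index layout, the name `score_query`, and the use of a plain list intersection are assumptions for illustration; the slides do not show the actual data structure.

```python
import heapq


def score_query(index, terms, k=5):
    """Intersect the terms' posting lists and multiply the stored probabilities.

    `index` maps a term to a list of (query_id, probability), sorted by query_id.
    Returns the k query IDs with the highest product of probabilities.
    """
    lists = [index.get(t, []) for t in terms]
    if not lists or any(not pl for pl in lists):
        return []
    pos = [0] * len(lists)  # one cursor per posting list
    results = []
    while all(p < len(pl) for p, pl in zip(pos, lists)):
        ids = [pl[p][0] for p, pl in zip(pos, lists)]
        target = max(ids)
        if all(i == target for i in ids):
            score = 1.0
            for p, pl in zip(pos, lists):
                score *= pl[p][1]
            results.append((target, score))
            pos = [p + 1 for p in pos]
        else:
            # advance only the cursors that are behind the largest current ID
            pos = [p + 1 if i < target else p for p, i in zip(pos, ids)]
    return heapq.nlargest(k, results, key=lambda r: r[1])


index = {
    "dog": [(1, 0.5), (2, 0.3), (4, 0.2)],
    "heat": [(2, 0.4), (3, 0.4), (4, 0.1)],
}
top = score_query(index, ["dog", "heat"])
```

Here only query IDs 2 and 4 appear in both lists, so they are the only candidates; the component-wise product ranks 2 ahead of 4.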
  • 24. Pruning posting lists • sort postings by probability and prune them at a reasonable threshold p, e.g. 20,000
  • 25. Pruning posting lists • sort postings by probability and prune them at a reasonable threshold p, e.g. 20,000 • O(|T|) lists, each of size O(p), and no loss in quality!
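Pruning as described on the slide amounts to keeping the p most probable postings of each list. A minimal sketch, assuming the same `(query_id, probability)` posting layout as a hypothetical index and restoring query-ID order after pruning so the lists can still be intersected; the function name `prune` is illustrative.

```python
import heapq


def prune(postings, p):
    """Keep the p postings with the highest probability, returned sorted by query ID."""
    top = heapq.nlargest(p, postings, key=lambda entry: entry[1])
    return sorted(top)


postings = [(1, 0.10), (2, 0.50), (3, 0.05), (4, 0.30)]
pruned = prune(postings, 2)
```

With p fixed, every list has at most p entries regardless of how many query nodes the graph contains, which is where the O(p) bound on the slide comes from.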
  • 26. Bucketing probabilities • Most space used for storing probabilities • Given ε < 1, we can arrange postings in buckets implicitly coding the approximate probabilities [Figure: posting list of Term 1, holding the stationary distribution of query nodes in the TQGraph as obtained by a RWR from Term 1, split into buckets (1, ε), (ε, ε²), (ε², ε³), ..., (ε^i, ε^(i+1)). Within buckets, queries are sorted by their IDs. Scores are approximated by the greatest bound, i.e., ε^i for all i ≥ 0.]
  • 27. Bucketing probabilities • Most space used for storing probabilities • Given ε < 1, we can arrange postings in buckets implicitly coding the approximate probabilities • Each entry coded with a few bits, e.g., 11-19 bits • ~5x reduction! • no loss in quality! [Figure: posting list of Term 1, holding the stationary distribution of query nodes in the TQGraph as obtained by a RWR from Term 1, split into buckets (1, ε), (ε, ε²), (ε², ε³), ..., (ε^i, ε^(i+1)). Within buckets, queries are sorted by their IDs. Scores are approximated by the greatest bound, i.e., ε^i for all i ≥ 0.]
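The bucketing idea can be sketched as follows: a probability lying between ε^(i+1) and ε^i goes into bucket i, only the query ID is stored, and the probability is later approximated by the bucket's upper bound ε^i. The function names, the dictionary layout, and the choice of which interval end is closed are assumptions for this example; the bit-level encoding mentioned on the slide is not modeled.

```python
import math


def bucket_of(prob, eps):
    """Index i of the bucket whose range, from eps**(i+1) up to eps**i, contains prob."""
    if not (0.0 < prob <= 1.0 and 0.0 < eps < 1.0):
        raise ValueError("need 0 < prob <= 1 and 0 < eps < 1")
    return math.floor(math.log(prob) / math.log(eps))


def bucketize(postings, eps):
    """Group query IDs by bucket; probabilities are no longer stored explicitly."""
    buckets = {}
    for qid, prob in postings:
        buckets.setdefault(bucket_of(prob, eps), []).append(qid)
    return {i: sorted(ids) for i, ids in sorted(buckets.items())}


def approx_prob(bucket, eps):
    """Approximate probability of any posting in the bucket: its upper bound."""
    return eps ** bucket


postings = [(7, 0.3), (2, 0.9), (5, 0.4), (9, 0.1)]
bucketed = bucketize(postings, eps=0.5)
```

A smaller gap between consecutive bounds (ε closer to 1) gives a finer approximation at the cost of more buckets, so ε trades space for precision.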
  • 28. Caching posting lists • achieving in-memory query suggestion [Figure 4: Miss ratio of our cache as a function of t...]
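A posting-list cache with miss-ratio accounting might look like the sketch below. The slides do not say which replacement policy is used, so least-recently-used (LRU) eviction is an assumption here, as are the class name `PostingListCache` and the `load` callback standing in for a read from slower storage.

```python
from collections import OrderedDict


class PostingListCache:
    """LRU cache of posting lists, keyed by term (policy chosen for illustration)."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load  # hypothetical callback: term -> posting list
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, term):
        if term in self.entries:
            self.entries.move_to_end(term)  # mark as most recently used
            self.hits += 1
            return self.entries[term]
        self.misses += 1
        plist = self.load(term)
        self.entries[term] = plist
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used term
        return plist

    def miss_ratio(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


cache = PostingListCache(capacity=2, load=lambda term: [(hash(term) % 100, 1.0)])
for term in ["a", "b", "a", "c", "b"]:
    cache.get(term)
```

Replaying a stream of query terms through such a cache and reading `miss_ratio()` for different capacities is one way to produce a miss-ratio curve of the kind the figure caption refers to.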
  • 29. Conclusions • TQG model to overcome limitations of current query recommenders • based on a principled, term-centric approach supporting rare and never-seen queries • deployment with an efficient inverted index, resulting in effectiveness comparable to or better than state-of-the-art (SoA) approaches • the pruning, bucketing, and caching techniques proposed constitute an independent contribution in the area of efficiency in large-scale RWR computations • reduction of about 80% in space occupancy w.r.t. uncompressed data structures • in-memory RWRs on huge graphs with a cache hit ratio above 90%
  • 30. Questions?