Efficient Diversification of Web
        Search Results
    G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri
                    ISTI - CNR, Pisa, Italy
Introduction: SE Results
             Diversification

• Query: “Vinci”, what’s the user’s intent?
   • Information on Leonardo da Vinci?
   • Information on Vinci the small village in Tuscany?
   • Information on Vinci the company?
   • Others?

           F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   2
Query Diversification as a
            Coverage Problem
• Hypotheses:
 • For each user query, we know the set of all possible intents
 • For each document in the collection, we know all the user intents it
    represents
    • each intent of a document may be weighted by how strongly the document
      represents it (e.g., 1/2 of document D is related to the intent
      “digital photography techniques”)
• Goal:
 • Select the set of k documents in the collection that covers the maximum amount of
    intent weight, i.e., maximizes the number of satisfied users.


State-of-the-Art Methods


•   IASelect:
 •   Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In
     Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09), Ricardo
     Baeza-Yates, Paolo Boldi, Berthier Ribeiro-Neto, and B. Barla Cambazoglu (Eds.). ACM, New York, NY, USA, 5-14.


•   xQuAD:
 •   Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010. Exploiting query reformulations for Web search
     result diversification. In Proceedings of the 19th International Conference on World Wide Web, Raleigh,
     NC, USA. ACM, 881-890.




Diversify (k)

[Formula image not preserved; its callouts read:]
• intents
• the weight
• the weight is the probability of being relevant to intent c
• d is not pertinent to c
• no doc is pertinent to c
• at least one doc is pertinent to c
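The callouts on this slide match the objective of the IASelect paper cited earlier. A reconstruction under that assumption follows; the symbols P(c|q) (weight of intent c) and V(d|q,c) (probability that d is relevant to c) follow the worked example later in the talk, and R_q (the candidate results for q) is assumed notation.

```latex
\[
\text{Diversify}(k):\quad
\max_{S \subseteq R_q,\ |S| = k}\; P(S \mid q)
= \sum_{c} P(c \mid q)\,
  \Bigl(1 - \prod_{d \in S} \bigl(1 - V(d \mid q, c)\bigr)\Bigr)
\]
```

Read inside out: 1 - V(d|q,c) is "d is not pertinent to c", the product is "no doc is pertinent to c", and one minus the product is "at least one doc is pertinent to c".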
Known Results
• Diversify(k) is NP-hard:
 • By reduction from max-weight coverage
• Diversify(k)’s objective function is submodular:
 • It admits a greedy (1-1/e)-approximation algorithm.
 • The algorithm inserts one result at a time, each time choosing the
   result with the maximum marginal utility.
 • Quadratic complexity in the number of results to consider:
  • each iteration scans the complete list of not-yet-inserted results.
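
The greedy procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the coverage-style objective of the IASelect paper, and the function names and toy data are hypothetical.

```python
from math import prod


def objective(selected, intent_prob, utility):
    """P(S|q) = sum_c P(c|q) * (1 - prod_{d in S} (1 - V(d|q,c))) (assumed form)."""
    return sum(
        p_c * (1.0 - prod(1.0 - utility[d].get(c, 0.0) for d in selected))
        for c, p_c in intent_prob.items()
    )


def greedy_diversify(candidates, intent_prob, utility, k):
    """Insert one result at a time, always the one with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        base = objective(selected, intent_prob, utility)
        # Full scan of the not-yet-inserted results at every iteration;
        # repeating this scan for each inserted result gives the quadratic cost.
        best = max(
            remaining,
            key=lambda d: objective(selected + [d], intent_prob, utility) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: two intents, three documents.
intent_prob = {"c1": 0.7, "c2": 0.3}
utility = {
    "d1": {"c1": 0.9},
    "d2": {"c1": 0.8},
    "d3": {"c2": 0.6},
}
picked = greedy_diversify(["d1", "d2", "d3"], intent_prob, utility, k=2)
print(picked)
```

On this toy input the second pick is d3 rather than d2: once d1 covers most of intent c1, a second c1 document adds little marginal utility. The sketch recomputes the objective from scratch for clarity; a practical version would update the per-intent products incrementally.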
It looks reasonable, but...
•   ... we might not diversify, at all!
•   Consider a query q returning a set Rd={a,b,c} of documents and two possible categories g,h.
•   The query pertains to each category with the same probability, i.e., P(g|q) = P(h|q) = 1/2.

                                      x                       V(x|q,g)                     V(x|q,h)
                                      a                           1                            0
                                      b                           1                            0
                                      c                          1/2                          1/2


•   The optimal selection is S={a,b}; replacing either a or b with c decreases the value of the
    objective function.


xQuAD_Diversify(k)

[Formula image not preserved.]

Same problem as before...
It may not diversify, at all.
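
For orientation, the greedy score used in the xQuAD paper cited earlier has roughly the following shape. It is written from that paper, not recovered from this slide, so the notation is an assumption and may differ from the talk's: λ trades relevance against diversity, Q is the set of query reformulations, and S is the set of results already selected.

```latex
\[
(1-\lambda)\,P(d \mid q)
\;+\;
\lambda \sum_{q' \in Q} P(q' \mid q)\, P(d \mid q')
\prod_{d_j \in S} \bigl(1 - P(d_j \mid q')\bigr)
\]
```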
Our Proposal:
                   MaxUtility

[Diagram: the query "Vinci" branches into the specializations "Leonardo da Vinci", "Vinci Town" and
"Vinci Group"; the branches are labeled 1/3, 5/12 and 1/4; the two result sets are labeled Rq and S.]
MaxUtility_Diversify(k)

[Formula image not preserved; its callouts read:]
• Probability of query q’ being a specialization for query q
• Set of possible query specializations
Why Is It Efficient?

• By using a simple arithmetic argument we can show that:


• Therefore we can find the optimal set S of diversified
 documents by using a sort-based approach.
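
A minimal sketch of what a sort-based selection looks like, assuming (as the sort-based claim suggests) that the objective decomposes into a sum of independent per-document scores. The scores below are hypothetical stand-ins, and any per-specialization coverage constraints OptSelect may enforce are not modeled here.

```python
import heapq


def top_k_by_score(scores, k):
    """Return the k document ids with the highest scores, best first."""
    # heapq.nlargest avoids a full sort of all n candidates when k is small.
    ranked = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [doc for doc, _ in ranked]


# Hypothetical per-document scores, each standing in for that document's
# contribution to an additive objective.
scores = {"d1": 0.42, "d2": 0.17, "d3": 0.58, "d4": 0.33}
best = top_k_by_score(scores, k=2)
print(best)
```

When the objective is additive, the k highest-scoring documents are optimal by construction, so no marginal-utility rescans are needed; that is the contrast with the quadratic greedy scan.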


OptSelect




The Specialization Set Sq
• It is crucial for OptSelect to
  have the set of specializations
  available for each query.
• Our method is, thus, query-log
  based.
 • we use a query recommender system
   to obtain a set of candidate queries; Sq
   is built from the most popular
   recommendations, i.e., those whose
   frequency in the query log exceeds f(q) / s.
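
The popularity filter can be sketched as below. The function names and the query-log counts are hypothetical, and the second function is an assumption rather than something stated on the slide: it estimates the probability of each specialization as its share of the total frequency within Sq.

```python
from fractions import Fraction


def build_specializations(query, recommendations, freq, s):
    """Keep the recommendations whose query-log frequency exceeds f(q) / s."""
    threshold = freq.get(query, 0) / s
    return [r for r in recommendations if freq.get(r, 0) > threshold]


def specialization_probs(specializations, freq):
    """ASSUMPTION: P(q'|q) is the frequency share of q' among the kept specializations."""
    total = sum(freq[r] for r in specializations)
    if total == 0:
        return {}
    return {r: Fraction(freq[r], total) for r in specializations}


# Hypothetical counts, chosen so the shares come out as 1/3, 5/12 and 1/4;
# which share belongs to which specialization is arbitrary here.
freq = {
    "vinci": 1000,
    "leonardo da vinci": 400,
    "vinci town": 500,
    "vinci group": 300,
    "vinci airport": 20,
}
recs = ["leonardo da vinci", "vinci town", "vinci group", "vinci airport"]
sq = build_specializations("vinci", recs, freq, s=10)
probs = specialization_probs(sq, freq)
print(sq, probs)
```

With s=10 the threshold is 100, so the rare recommendation is dropped and the three remaining specializations share the probability mass.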


Probability Estimation




Usefulness of a Result




Experiments: Settings

• TREC 2009 Web track's Diversity Task framework:
 • ClueWeb-B, the subset of the TREC ClueWeb09 dataset
 • The 50 topics (i.e., queries) provided by TREC
 • We evaluate α-NDCG and IA-P
• All the tests were conducted on an Intel Core 2 Quad PC with
 8 GB of RAM and Ubuntu Linux 9.10 (kernel 2.6.31-22).


Experiments: Quality




Experiments: Efficiency




Conclusions and Future Work
• We studied the problem of search results diversification from an efficiency point of
  view
• We derived a diversification method (OptSelect):
  •   quality equal to (or better than) the state of the art

  •   up to 100 times faster

• Future work:
  •   exploiting users' search history to personalize result diversification

  •   using click-through data to improve our effectiveness results, and

  •   studying a search architecture that performs the diversification task in parallel with the
      document scoring phase (Done! See DDR2011 paper)


Question Time




                                     Fabrizio Silvestri
                                   ISTI-CNR, Pisa Italy
                          http://hpc.isti.cnr.it/~fabriziosilvestri
                                   f.silvestri@isti.cnr.it
F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow   20

Contenu connexe

Plus de yaevents

Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...yaevents
 
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...yaevents
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...yaevents
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндексyaevents
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндексyaevents
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmannyaevents
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...yaevents
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...yaevents
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндексyaevents
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebookyaevents
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Googleyaevents
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...yaevents
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...yaevents
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигмаyaevents
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...yaevents
 
Поисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, ЯндексПоисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, Яндексyaevents
 
Julia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareJulia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareyaevents
 
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...yaevents
 
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval EvaluationEvangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval Evaluationyaevents
 

Plus de yaevents (20)

Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
 
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндекс
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндекс
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmann
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Google
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигма
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...
 
Поисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, ЯндексПоисковая технология "Спектр". Андрей Плахов, Яндекс
Поисковая технология "Спектр". Андрей Плахов, Яндекс
 
Julia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-awareJulia Stoyanovich - Making interval-based clustering rank-aware
Julia Stoyanovich - Making interval-based clustering rank-aware
 
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
Mike Thelwall - Sentiment strength detection for the social web: From YouTube...
 
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval EvaluationEvangelos Kanoulas — Advances in Information Retrieval Evaluation
Evangelos Kanoulas — Advances in Information Retrieval Evaluation
 

Dernier

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365

"Efficient Diversification of Web Search Results"

  • 1. Efficient Diversification of Web Search Results G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri ISTI - CNR, Pisa, Italy
  • 2. Introduction: SE Results Diversification • Query: “Vinci”, what’s the user’s intent? • Information on Leonardo da Vinci? • Information on Vinci the small village in Tuscany? • Information on Vinci the company? • Others? F. Silvestri - Efficient Diversification of Web Search Results - Yandex Tech Talk 22 August 2011, Moscow 2
  • 5. Query Diversification as a Coverage Problem • Hypothesis: • For each user query, we can tell the set of all possible intents • For each document in the collection, we can tell all the user intents it represents • each intent of a document may be weighted by a value representing how much of the document addresses that intent (e.g., 1/2 of document D is related to the intent “digital photography techniques”) • Goal: • Select the set of k documents in the collection covering the maximum amount of intent weight, i.e., maximize the number of satisfied users.
  • 6. State-of-the-Art Methods • IASelect: • Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09), Ricardo Baeza-Yates, Paolo Boldi, Berthier Ribeiro-Neto, and B. Barla Cambazoglu (Eds.). ACM, New York, NY, USA, 5-14. • xQuAD: • Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010. Exploiting query reformulations for Web search result diversification. In Proceedings of the 19th International Conference on World Wide Web (WWW '10). ACM, Raleigh, NC, USA, 881-890.
  • 13. Diversify (k) • Formula annotations: intents • the weight • the weight is the probability of being relevant to intent c • d is not pertinent to c • no doc is pertinent to c • at least one doc is pertinent to c
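
The annotations on the Diversify(k) slide are consistent with the intent-aware objective of Agrawal et al. (2009), cited on the State-of-the-Art slide. As a hedged reconstruction under that assumption, with P(c|q) the weight of intent c for query q, V(d|q,c) the probability that document d is relevant to intent c, and R_q the set of candidate results, the problem would read:

```latex
\[
  \max_{S \subseteq R_q,\ |S| = k} \; P(S \mid q)
  \;=\; \sum_{c} P(c \mid q)\,
  \Bigl( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \Bigr)
\]
```

Read from the inside out: 1 - V(d|q,c) is the probability that d is not pertinent to c, the product is the probability that no document in S is pertinent to c, and one minus the product is the probability that at least one document in S is pertinent to c.
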
  • 14. Known Results • Diversify(k) is NP-hard: • Reduction from max-weight coverage • Diversify(k)’s objective function is sub-modular: • Admits a (1-1/e)-approx. algorithm. • The algorithm inserts one result at a time, each time choosing the result with the maximum marginal utility. • Quadratic complexity in the number of results to consider: • at each iteration, scan the complete list of not-yet-inserted results.
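
A minimal Python sketch of the one-result-at-a-time procedure described on the Known Results slide. It assumes the objective P(S|q) = sum over intents c of P(c|q) * (1 - product over d in S of (1 - V(d|q,c))) from Agrawal et al. (2009); the function names, the dictionary layout, and the sample values in the usage note are illustrative and do not come from the talk.

```python
from math import prod


def objective(selected, intent_prob, relevance):
    """Assumed objective: sum_c P(c|q) * (1 - prod_{d in S} (1 - V(d|q,c)))."""
    return sum(
        p_c * (1.0 - prod(1.0 - relevance[d].get(c, 0.0) for d in selected))
        for c, p_c in intent_prob.items()
    )


def greedy_diversify(candidates, intent_prob, relevance, k):
    """Pick up to k documents, each time taking the max marginal utility."""
    selected = []
    remaining = list(candidates)
    # uncovered[c] = P(c|q) * prod_{d in selected} (1 - V(d|q,c))
    uncovered = dict(intent_prob)
    while remaining and len(selected) < k:
        # Full scan of the not-yet-inserted results: O(n) per iteration,
        # hence O(k * n) overall (quadratic when k grows with n).
        best = max(
            remaining,
            key=lambda d: sum(
                uncovered[c] * relevance[d].get(c, 0.0) for c in uncovered
            ),
        )
        selected.append(best)
        remaining.remove(best)
        # Discount every intent by how well the chosen document covers it.
        for c in uncovered:
            uncovered[c] *= 1.0 - relevance[best].get(c, 0.0)
    return selected
```

For example, with hypothetical intents `{"x": 0.6, "y": 0.4}` and relevance values `{"d1": {"x": 0.9}, "d2": {"x": 0.8}, "d3": {"y": 0.5}}`, calling `greedy_diversify(["d1", "d2", "d3"], ..., k=2)` first takes `d1`, then prefers `d3` over `d2` because intent `x` is already mostly covered.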
  • 16. It looks reasonable, but... • ... we might not diversify at all! • Consider a query returning a set Rd={a,b,c} of documents and two possible categories g,h. • The query pertains to each category with the same probability, i.e., P(g|q) = P(h|q) = 1/2.
        doc   V(x|q,g)   V(x|q,h)
        a     1          0
        b     1          0
        c     1/2        1/2
    • The optimal selection is S={a,b}; replacing either a or b with c makes the objective function decrease.
  • 20. xQuAD_Diversify(k) • Same problem as before: it may not diversify at all.
  • 26. Our Proposal: MaxUtility • Diagram labels: Vinci • Leonardo da Vinci • Vinci Town • Vinci Group • 1/3 • 5/12 • 1/4 • Rq • S
  • 29. MaxUtility_Diversify(k) • Formula annotations: probability of query q’ being a specialization of query q • set of possible query specializations
  • 30. Why Is It Efficient? • By using a simple arithmetic argument we can show that: • Therefore we can find the optimal set S of diversified documents by using a sort-based approach.
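
The formula behind the "sort-based approach" is not preserved in this transcript. The sketch below therefore rests on a stated assumption: that the utility of a set S is the sum of independent per-document utilities. Under that assumption the best set of size k is simply the k highest-scoring documents, so no repeated scan of the candidate list is needed. The name `top_k_by_utility` and the `utilities` mapping are hypothetical, and any per-specialization constraints that OptSelect may enforce are not modelled here.

```python
import heapq


def top_k_by_utility(utilities, k):
    """Return the k documents with the highest utility, best first.

    `utilities` maps a document id to its (assumed additive) utility.
    heapq.nlargest keeps a heap of size k, so the cost is O(n log k)
    instead of the O(k * n) of a greedy rescan.
    """
    return heapq.nlargest(k, utilities, key=utilities.get)
```

With an additive objective, swapping any selected document for an unselected one can only lower (or keep) the total, which is why selecting by sorted score is optimal in this simplified setting.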
  • 31. OptSelect
  • 33. The Specialization Set Sq • It is crucial for OptSelect to have the set of specializations available for each query. • Our method is, thus, query-log-based. • We use a query recommender system to obtain a set of queries, from which Sq is built by including the most popular recommendations (i.e., those whose frequency in the query log is > f(q) / s).
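
The popularity filter on the Specialization Set slide can be sketched as follows. This is only an illustration of the stated threshold (keep a recommendation when its query-log frequency exceeds f(q) / s); the recommender itself is out of scope, and the function name, the `freq` dictionary, and the counts in the usage note are hypothetical.

```python
def build_specialization_set(query, recommendations, freq, s):
    """Keep the recommendations whose log frequency exceeds f(query) / s.

    `freq` maps a query string to its frequency in the query log;
    queries missing from the log count as frequency 0.
    """
    if s <= 0:
        raise ValueError("s must be positive")
    threshold = freq.get(query, 0) / s
    return [r for r in recommendations if freq.get(r, 0) > threshold]
```

For instance, with made-up counts `{"vinci": 1000, "leonardo da vinci": 400, "vinci town": 300, "vinci group": 50}` and `s = 10`, the threshold is 100, so only "leonardo da vinci" and "vinci town" would enter Sq.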
  • 34. Probability Estimation
  • 35. Usefulness of a Result
  • 37. Experiments: Settings • TREC 2009 Web track's Diversity Task framework: • ClueWeb-B, a subset of the TREC ClueWeb09 dataset • The 50 topics (i.e., queries) provided by TREC • We evaluate α-NDCG and IA-P • All the tests were conducted on an Intel Core 2 Quad PC with 8 GB of RAM and Ubuntu Linux 9.10 (kernel 2.6.31-22).
  • 38. Experiments: Quality
  • 39. Experiments: Efficiency
  • 40. Conclusions and Future Work • We studied the problem of search results diversification from an efficiency point of view • We derived a diversification method (OptSelect): • same (or better) quality as the state of the art • up to 100 times faster • Future work: • the exploitation of users' search history for personalizing result diversification • the use of click-through data to improve our effectiveness results, and • the study of a search architecture performing the diversification task in parallel with the document scoring phase (Done! See DDR2011 paper)
  • 41. Question Time • Fabrizio Silvestri, ISTI-CNR, Pisa, Italy • http://hpc.isti.cnr.it/~fabriziosilvestri • f.silvestri@isti.cnr.it