Current Approaches in
Search Result Diversification
         Mario Sangiorgio
Presentation outline
Problem definition

What is diversity?

The relevance/diversity trade-off

Performance evaluation

Open issues and conclusions
Why is result diversification needed?

A couple of real-life examples
Ambiguous query: Flash
Unambiguous query: Nuclear power plant
Problem definition


Search result diversification is an optimization problem: from the set of all
relevant results, select a subset of k items that are both highly relevant and
as diverse as possible.
What is needed



Relevance measure       Diversity measure




           Diversification objective
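
One way to read the three ingredients together is as a set-selection problem. The symbols below are notation assumed for this sketch and do not appear on the slides: $U$ is the set of relevant results, $w$ the relevance measure, $d$ the diversity (distance) measure, $f$ the diversification objective and $S^{*}$ the chosen result set of size $k$.

```latex
S^{*} = \operatorname*{arg\,max}_{S \subseteq U,\ |S| = k} f(S, w, d)
```

Different choices of $f$ give the different diversification objectives discussed later in the deck.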
The result diversification process



Items are ranked by relevance
Diversity is measured
The two measures are used to get the final ranking
What is diversity?
How can items be diverse?

Word sense diversity, from ambiguous queries
Information source diversity, from unambiguous queries
Measures of diversity

Diversity is tightly coupled with the concept of similarity

To address the different aspects of the problem several measures emerged:
Semantic distance
Categorical distance
Novel information
Semantic distance
Diversifies on content dissimilarity

Uses the min-hashing scheme to get the sketch of a document:
    $S(d) = \{MH_{h_1}(d), \ldots, MH_{h_n}(d)\}$

Distance is computed from Jaccard similarity:
    $sim(u, v) = \frac{|S(u) \cap S(v)|}{|S(u) \cup S(v)|}$
    $d(u, v) = 1 - sim(u, v)$

Does not work well when document lengths differ greatly or when the sketch size is small
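
The min-hash distance on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited work: documents are split into word 3-shingles, the n hash functions are simulated by salting SHA-1 with a seed, and the names `shingles`, `sketch` and `minhash_distance` are hypothetical.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of word k-shingles of a text (assumed tokenization)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    """Simulate the seed-th hash function by salting SHA-1 (an assumption)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def sketch(text, n=64, k=3):
    """S(d): for each of n hash functions, keep the minimum hash over the shingles."""
    items = shingles(text, k)
    if not items:
        return set()
    # Pair each minimum with its seed so values of different hash functions never mix.
    return {(seed, min(_hash(seed, s) for s in items)) for seed in range(n)}


def minhash_distance(u, v, n=64, k=3):
    """d(u, v) = 1 - |S(u) & S(v)| / |S(u) | S(v)|, computed on the sketches."""
    su, sv = sketch(u, n, k), sketch(v, n, k)
    union = su | sv
    if not union:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(su & sv) / len(union)
```

A small `n` makes the estimate noisy, which is one reading of the slide's remark about small sketch sizes.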
Categorical distance
Emphasizes word sense diversification

It is based on metadata (taxonomy)

The measure is a weighted tree distance:
    $d(u, v) = \sum_{i=lca(u,v)}^{l(u)} \frac{1}{2^{e(i-1)}} + \sum_{i=lca(u,v)}^{l(v)} \frac{1}{2^{e(i-1)}}$

Examples of taxonomies:
/Top/Health vs /Top/Finance
/Top/Sport/Racing vs /Top/Sport/Football
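
The two taxonomy examples can be compared with a small Python sketch of the weighted tree distance. Several points are assumptions made for illustration: the denominator is read as 2 raised to e(i-1), the root category /Top has depth 1, lca(u, v) is the depth of the lowest common ancestor, e defaults to 1, and the function names are hypothetical.

```python
def category_path(category):
    """Split a path such as '/Top/Sport/Racing' into its levels."""
    return [part for part in category.split("/") if part]


def tree_distance(u, v, e=1.0):
    """Weighted tree distance between two taxonomy paths.

    Literal reading of the slide's formula: each branch sums 1 / 2**(e * (i - 1))
    for i from the depth of the lowest common ancestor to the depth of the node.
    Because the sum includes the ancestor level itself, identical categories
    do not get distance zero under this reading.
    """
    pu, pv = category_path(u), category_path(v)
    lca = 0  # depth of the lowest common ancestor (root has depth 1)
    for a, b in zip(pu, pv):
        if a != b:
            break
        lca += 1
    if lca == 0:
        raise ValueError("categories do not share a root")

    def branch(depth):
        return sum(1.0 / 2 ** (e * (i - 1)) for i in range(lca, depth + 1))

    return branch(len(pu)) + branch(len(pv))
```

Under these assumptions, categories that split near the root (/Top/Health vs /Top/Finance) end up farther apart than categories that split deeper in the tree (/Top/Sport/Racing vs /Top/Sport/Football).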
Novel information
Diversifies on content dissimilarity in a general sense. Good for subtopics
Results are represented with unigram language models (used in natural language processing)
For each document, the Kullback-Leibler divergence measures how much novel
information it brings into the set
That is, how many extra bits are needed to describe the new document
using only the documents already selected in the set
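
A minimal Python sketch of this novelty score follows. It assumes whitespace tokenization and add-one smoothing over the shared vocabulary so that every probability is non-zero; both are simplifications chosen for illustration, and the cited work may use a different smoothing scheme. The names `unigram_model` and `novelty_bits` are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, mu=1.0):
    """Smoothed unigram language model over a fixed vocabulary (add-mu smoothing)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def novelty_bits(new_doc, selected_docs):
    """KL divergence (in bits) of the new document's model from the selected set's model."""
    new_tokens = new_doc.lower().split()
    old_tokens = [t for doc in selected_docs for t in doc.lower().split()]
    vocab = set(new_tokens) | set(old_tokens)
    p = unigram_model(new_tokens, vocab)   # model of the candidate document
    q = unigram_model(old_tokens, vocab)   # model of the already selected documents
    return sum(p[w] * math.log2(p[w] / q[w]) for w in vocab)
```

A candidate that uses the same words in the same proportions as the selected set scores zero, while a candidate about a different sense of the query scores higher.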
Diversity measures: open issues

  Some aspects not taken into account:

    intrinsic properties of the document

          genre of the document

       sentiment regarding the topic
The relevance/diversity optimization problem
Diversification objectives
It has been proved impossible to find a function
       that has all the required properties:

               scale invariance
                 consistency
                   richness
                    stability
     independence of irrelevant attributes
                 monotonicity
            strength of relevance
            strength of similarity
Diversification objectives
Several functions have been proposed:
Max sum (no stability)
Max min (no consistency nor stability)
Max sum of max score (maximizes relevancy and then diversity)
Mono objective (no consistency)
Max product (it is based on the already chosen results)
Categorical (results have to cover a set of categories)
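
To make two of these objectives concrete, here is a hedged Python sketch of max sum and max min as they are commonly written following Gollapudi and Sharma. The exact weighting, the trade-off parameter `lam`, and the names `max_sum` and `max_min` are assumptions for this illustration; `w` maps items to relevance and `d` is any distance function.

```python
from itertools import combinations


def max_sum(S, w, d, lam=1.0):
    """Max-sum objective: (k - 1) * total relevance + 2 * lam * total pairwise distance."""
    k = len(S)
    relevance = sum(w[u] for u in S)
    diversity = sum(d(u, v) for u, v in combinations(S, 2))
    return (k - 1) * relevance + 2 * lam * diversity


def max_min(S, w, d, lam=1.0):
    """Max-min objective: worst relevance + lam * smallest pairwise distance."""
    relevance = min((w[u] for u in S), default=0.0)
    diversity = min((d(u, v) for u, v in combinations(S, 2)), default=0.0)
    return relevance + lam * diversity
```

Max sum rewards sets that are good on average, whereas max min is driven by the weakest item and the closest pair.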
Diversification algorithms
Finding the best solution is an NP-hard problem

The algorithm depends on the objective function:
Approximation
Greedy

Open issues:
Is off-line pre-computation applicable?
Are there efficient data structures?
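
The slides do not spell out a particular greedy rule, so the following Python sketch uses a selection step in the style of maximal marginal relevance as a stand-in. The scoring rule, the parameter `lam` and the name `greedy_diversify` are assumptions for illustration only.

```python
def greedy_diversify(candidates, w, d, k, lam=0.5):
    """Pick k items one at a time, trading relevance against distance to the chosen set.

    w maps each item to its relevance; d(u, v) is a distance between two items.
    With lam = 0 the result is the plain relevance ranking.
    """
    remaining = sorted(candidates, key=lambda u: w[u], reverse=True)
    selected = []
    while remaining and len(selected) < k:
        def gain(u):
            if not selected:
                return w[u]  # first pick: most relevant item
            closest = min(d(u, v) for v in selected)
            return (1 - lam) * w[u] + lam * closest
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each step costs one pass over the remaining candidates, which is why greedy rules are attractive when the exact optimum is out of reach.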
How to evaluate diversity in search
Data set for the evaluation
Full text:
TREC Interactive
Top results from a commercial search engine

Structured data:
Taxonomies (Open Directory Project)

Ground truth:
Wikipedia disambiguation pages
Judgements from Amazon Mechanical Turk

Task-specific standard datasets are needed
Benchmarks
Adaptation from existing metrics:
Alpha-nDCG (normalized discounted cumulative gain)
Subtopic recall and precision (number of subtopics covered)
User intent (results distribution should reflect what the user is asking for)
Comparison against the optimum
Alpha-nDCG

Based on information nuggets (a nugget is an answer to a question)

A document is relevant when it contains a nugget needed by the user

Quality of results is graded by human assessors

The more nuggets the result set contains, the better
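
A short Python sketch of alpha-nDCG, following the usual description of the metric: a nugget's gain is discounted by a factor (1 - alpha) each time it has already appeared higher in the ranking, and the result is normalized by an ideal ordering. Since the exact ideal ordering is expensive to compute, the sketch approximates it greedily. The function names and the `nuggets` mapping (document to set of nuggets) are assumptions for illustration.

```python
import math


def alpha_dcg(ranking, nuggets, alpha=0.5):
    """Discounted cumulative gain where repeated nuggets are worth less each time."""
    seen = {}   # nugget -> number of higher-ranked documents containing it
    score = 0.0
    for rank, doc in enumerate(ranking, start=1):
        gain = 0.0
        for n in nuggets.get(doc, set()):
            gain += (1 - alpha) ** seen.get(n, 0)
            seen[n] = seen.get(n, 0) + 1
        score += gain / math.log2(1 + rank)
    return score


def greedy_ideal(docs, nuggets, alpha, k):
    """Approximate the ideal ordering by always taking the highest-gain document next."""
    remaining, seen, ideal = list(docs), {}, []
    while remaining and len(ideal) < k:
        def gain(doc):
            return sum((1 - alpha) ** seen.get(n, 0) for n in nuggets.get(doc, set()))
        best = max(remaining, key=gain)
        ideal.append(best)
        remaining.remove(best)
        for n in nuggets.get(best, set()):
            seen[n] = seen.get(n, 0) + 1
    return ideal


def alpha_ndcg(ranking, nuggets, alpha=0.5):
    """alpha-DCG of the ranking divided by alpha-DCG of the (greedy) ideal ordering."""
    ideal = greedy_ideal(nuggets.keys(), nuggets, alpha, len(ranking))
    best = alpha_dcg(ideal, nuggets, alpha)
    return alpha_dcg(ranking, nuggets, alpha) / best if best > 0 else 0.0
```

With alpha = 0 the metric ignores redundancy and behaves like a count-based nDCG; larger values of alpha penalize rankings that repeat the same nugget.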
Subtopic recall and precision

Is the result set exhaustive?
    $\text{s-recall at } k = \frac{\text{number of subtopics covered by the first } k \text{ documents}}{\text{total number of subtopics}}$

Is the result set efficient?
    $\text{s-precision at } r = \frac{minRank(S_{opt}, r)}{minRank(S, r)}$
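
Both measures can be computed directly on a toy example. In this Python sketch, minRank(S, r) is taken to be the smallest cutoff at which ranking S reaches s-recall r, and the optimal system is found by brute force over document subsets, which is only feasible for very small collections. The function names and the `subtopics` mapping (document to set of subtopics) are assumptions for illustration.

```python
from itertools import combinations


def s_recall(ranking, subtopics, total, k):
    """Fraction of all subtopics covered by the first k documents."""
    covered = set()
    for doc in ranking[:k]:
        covered |= subtopics.get(doc, set())
    return len(covered) / total


def min_rank(ranking, subtopics, total, r):
    """Smallest cutoff k at which the ranking reaches s-recall r, or None."""
    for k in range(1, len(ranking) + 1):
        if s_recall(ranking, subtopics, total, k) >= r:
            return k
    return None


def optimal_min_rank(docs, subtopics, total, r):
    """Brute-force minRank of an optimal system: smallest subset reaching s-recall r."""
    docs = list(docs)
    for size in range(1, len(docs) + 1):
        for combo in combinations(docs, size):
            if s_recall(list(combo), subtopics, total, size) >= r:
                return size
    return None


def s_precision(ranking, subtopics, total, r):
    """minRank of the optimal system divided by minRank of the given ranking."""
    actual = min_rank(ranking, subtopics, total, r)
    best = optimal_min_rank(ranking, subtopics, total, r)
    if actual is None or best is None:
        return 0.0
    return best / actual
```

A ranking that places redundant documents first needs a deeper cutoff to reach the same recall, which lowers its s-precision.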
Conclusions


Diversification can really improve the quality of search results

There is still work to do to achieve good results in all possible scenarios
Open issues

There is room for improvement in defining new diversity types and metrics

Ranking functions should take diversity into account from the beginning,
in an integrated process

Datasets to evaluate each notion of diversity should be built
References
Minack, E., Demartini, G., Nejdl, W.: Current Approaches to Search Result Diversification. In: Proceedings of ISWC '09

Gollapudi, S., Sharma, A.: An Axiomatic Approach for Result Diversification. In: Proceedings of WWW '09

Zhai, C.X., Cohen, W.W., Lafferty, J.: Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval. In: Proceedings of SIGIR '03

Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying Search Results. In: Proceedings of WSDM '09

Clough, P., Sanderson, M., Abouammoh, M., Navarro, S., Paramita, M.: Multiple Approaches to Analysing Query Diversity. In: Proceedings of SIGIR '09

Clarke, C.L., Kolla, M., Cormack, G.V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I.: Novelty and Diversity in Information Retrieval Evaluation. In: Proceedings of SIGIR '08
Current Approaches in Search Result Diversification

  • 1. Current Approaches in Search Result Diversification Mario Sangiorgio
  • 2. Presentation outline Problem definition What is diversity? The relevance/diversity trade-off Performance evaluation Open issues and conclusions
  • 3. Why is result diversification needed? A couple of real life examples
  • 6. Problem definition. Search result diversification is an optimization problem: from the set of all relevant results, select a subset of k items that is both as relevant and as diverse as possible.
  • 7. What is needed: a relevance measure, a diversity measure, and a diversification objective.
  • 8. The result diversification process. Items are ranked by relevance. Diversity is measured. The two measures are used to get the final ranking.
  • 10. How can items be diverse? Word sense diversity, from ambiguous queries. Information source diversity, from unambiguous queries.
  • 11. Measures of diversity. Diversity is tightly coupled with the concept of similarity. To address the different aspects of the problem several measures emerged: semantic distance, categorical distance, novel information.
  • 12. Semantic distance. Diversifies on content dissimilarity. Uses the min-hashing scheme to get the sketch of a document: $S(d) = \{MH_{h_1}(d), \ldots, MH_{h_n}(d)\}$. Distance is computed from Jaccard similarity: $sim(u, v) = \frac{|S(u) \cap S(v)|}{|S(u) \cup S(v)|}$, $d(u, v) = 1 - sim(u, v)$. Does not work well when the documents have very different lengths or when the sketch size is small.
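
A minimal sketch of the semantic distance on this slide, under stated assumptions: documents are split into word 3-gram shingles, and a seeded BLAKE2b hash stands in for each hash function $h_i$. The shingle size, the number of hash functions `n`, and all function names are illustrative choices, not taken from the slides.

```python
import hashlib


def shingles(text, size=3):
    """Return the set of word n-grams (shingles) of a document."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def seeded_hash(seed, item):
    """Seeded 64-bit hash; stands in for the i-th hash function h_i (assumption)."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def sketch(text, n=64):
    """S(d): the set of min-hash values of d under h_1 .. h_n."""
    items = shingles(text)
    if not items:
        return set()
    return {min(seeded_hash(i, s) for s in items) for i in range(n)}


def distance(su, sv):
    """d(u, v) = 1 - |S(u) & S(v)| / |S(u) | S(v)|, computed on sketches."""
    union = su | sv
    if not union:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(su & sv) / len(union)


a = sketch("the quick brown fox jumps over the lazy dog")
b = sketch("the quick brown fox jumps over the lazy cat")
c = sketch("nuclear power plants generate electricity from fission")
print(distance(a, b), distance(a, c))
```

With a small `n` the estimate becomes noisy, which is one way to see the sketch-size caveat mentioned on the slide.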
  • 13. Categorical distance. Emphasizes word sense diversification. It is based on metadata (taxonomy). The measure is a weighted tree distance: $d(u, v) = \sum_{i=lca(u,v)}^{l(u)} \frac{1}{2^{e(i-1)}} + \sum_{i=lca(u,v)}^{l(v)} \frac{1}{2^{e(i-1)}}$. Examples of taxonomies: /Top/Health vs /Top/Finance; /Top/Sport/Racing vs /Top/Sport/Football.
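
The weighted tree distance can be sketched on the two taxonomy examples from the slide. The assumptions here are that $l(\cdot)$ is the depth of a category (the root `Top` being level 1), that $lca(u, v)$ is the depth of the lowest common ancestor, and that each level $i$ contributes $1 / 2^{e(i-1)}$ with a non-negative parameter `e`. The function names and the default `e=1.0` are hypothetical. Taken literally, the formula gives a small positive value for two identical categories.

```python
def split_path(path):
    """Split a taxonomy path such as '/Top/Sport/Racing' into its levels."""
    return [part for part in path.split("/") if part]


def categorical_distance(path_u, path_v, e=1.0):
    """Weighted tree distance between two taxonomy paths sharing a root."""
    u, v = split_path(path_u), split_path(path_v)
    lca = 0  # depth of the lowest common ancestor
    for a, b in zip(u, v):
        if a != b:
            break
        lca += 1
    if lca == 0:
        raise ValueError("paths do not share a root category")

    def branch(depth):
        # Sum of level weights from the common ancestor down to the node.
        return sum(1.0 / 2 ** (e * (i - 1)) for i in range(lca, depth + 1))

    return branch(len(u)) + branch(len(v))


print(categorical_distance("/Top/Health", "/Top/Finance"))               # 3.0
print(categorical_distance("/Top/Sport/Racing", "/Top/Sport/Football"))  # 1.5
```

Categories that diverge deeper in the tree end up closer together, which is the behaviour the two examples illustrate.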
  • 14. Novel information. Diversifies in a general sense, based on content dissimilarity. Good for subtopics. Results are represented with unigram language models (as used in natural language processing). For each document, the Kullback-Leibler divergence evaluates how much novel information it brings into the set: how many extra bits are needed to describe the new document using only the documents already selected.
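
A rough sketch of the novelty measure: the candidate and the already selected documents are each turned into a unigram language model, and the Kullback-Leibler divergence between the two is reported in bits. Additive smoothing with a hypothetical parameter `mu` is an assumption added here so that no word has zero probability. The slides do not say which smoothing is used.

```python
import math
from collections import Counter


def unigram_model(tokens, vocabulary, mu=0.1):
    """Additively smoothed unigram language model over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocabulary)
    return {word: (counts[word] + mu) / total for word in vocabulary}


def novelty(candidate, selected, mu=0.1):
    """KL(candidate || selected set) in bits: extra bits needed to describe
    the candidate using only the model of the already selected documents."""
    cand_tokens = candidate.lower().split()
    seen_tokens = [t for doc in selected for t in doc.lower().split()]
    vocabulary = set(cand_tokens) | set(seen_tokens)
    p = unigram_model(cand_tokens, vocabulary, mu)
    q = unigram_model(seen_tokens, vocabulary, mu)
    return sum(p[w] * math.log2(p[w] / q[w]) for w in vocabulary)


selected = ["flash player plugin"]
print(novelty("flash player download", selected))   # mostly known words
print(novelty("flash superhero comics", selected))  # mostly new words
```

A candidate that repeats the vocabulary of the selected set scores close to zero, while a candidate about a different subtopic scores higher.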
  • 15. Diversity measures: open issues. Some aspects are not taken into account: intrinsic properties of the document, genre of the document, sentiment regarding the topic.
  • 17. Diversification objectives. It has been proved impossible to find a function that has all the required properties: scale invariance, consistency, richness, stability, independence of irrelevant attributes, monotonicity, strength of relevance, strength of similarity.
  • 18. Diversification objectives. Several functions proposed: Max sum (no stability); Max min (neither consistency nor stability); Max sum of max score (maximizes relevancy and then diversity); Mono objective (no consistency); Categorical (results have to cover a set of categories); Max product (it is based on the already chosen results).
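
To make two of these objectives concrete, here is a small sketch of max sum and max min evaluated by exhaustive search on a toy example. The exact scaling, with a relevance term weighted by `k - 1` and a trade-off parameter `lam`, is one common formulation and is an assumption here, since the slide gives only the names. The toy relevance scores and distances are invented for illustration.

```python
from itertools import combinations


def max_sum(subset, w, d, lam=1.0):
    """Sum of relevance plus sum of pairwise distances."""
    k = len(subset)
    relevance = (k - 1) * sum(w[u] for u in subset)
    diversity = 2 * lam * sum(d(u, v) for u, v in combinations(subset, 2))
    return relevance + diversity


def max_min(subset, w, d, lam=1.0):
    """Worst relevance plus worst pairwise distance (needs at least 2 items)."""
    worst_relevance = min(w[u] for u in subset)
    worst_distance = min(d(u, v) for u, v in combinations(subset, 2))
    return worst_relevance + lam * worst_distance


def best_subset(items, k, objective, w, d, lam=1.0):
    """Exhaustive search over all k-subsets; only feasible for tiny inputs."""
    return max(combinations(items, k), key=lambda s: objective(s, w, d, lam))


# Invented toy data for the ambiguous query "flash".
w = {"flash_player": 0.9, "flash_plugin": 0.8, "flash_comics": 0.5}
dist = {
    frozenset(("flash_player", "flash_plugin")): 0.1,
    frozenset(("flash_player", "flash_comics")): 0.9,
    frozenset(("flash_plugin", "flash_comics")): 0.8,
}


def d(u, v):
    return dist[frozenset((u, v))]


print(best_subset(list(w), 2, max_sum, w, d))
print(best_subset(list(w), 2, max_min, w, d))
```

On this toy input both objectives prefer pairing the software result with the comics result over the two near-duplicate software results.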
  • 19. Diversification algorithms. Finding the best solution is an NP-hard problem. The algorithm depends on the objective function: approximation or greedy. Open issues: Is off-line pre-computation applicable? Are there efficient data structures?
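
As an illustration of the greedy family, the sketch below uses a maximal-marginal-relevance style rule: at each step it picks the item with the best mix of relevance and distance to the items already chosen. This is a well-known heuristic chosen for illustration, not necessarily the algorithm the slides refer to. The parameter `lam`, the assumption that distances lie in [0, 1], and the toy data are all hypothetical.

```python
def greedy_diversify(items, k, w, d, lam=0.5):
    """Greedy selection: trade off relevance w[u] against the distance to the
    closest already selected item. Assumes distances are in [0, 1]."""
    remaining = list(items)
    selected = []
    while remaining and len(selected) < k:

        def gain(u):
            # With nothing selected yet, every item is maximally novel.
            closest = min((d(u, v) for v in selected), default=1.0)
            return lam * w[u] + (1 - lam) * closest

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Invented toy data for the ambiguous query "flash".
relevance = {"flash_player": 0.9, "flash_plugin": 0.8, "flash_comics": 0.5}
pair_distance = {
    frozenset(("flash_player", "flash_plugin")): 0.1,
    frozenset(("flash_player", "flash_comics")): 0.9,
    frozenset(("flash_plugin", "flash_comics")): 0.8,
}


def toy_distance(u, v):
    return pair_distance[frozenset((u, v))]


print(greedy_diversify(relevance, 2, relevance, toy_distance, lam=0.5))
print(greedy_diversify(relevance, 2, relevance, toy_distance, lam=1.0))
```

With `lam=1.0` the procedure degenerates to plain relevance ranking, and lowering `lam` pushes near-duplicates down the list.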
  • 20. How to evaluate diversity in search
  • 21. Data set for the evaluation. Full text: TREC Interactive; top results from commercial search engine. Structured data: taxonomies (Open Directory Project). Ground truth: Wikipedia disambiguation pages; judgements from Amazon Mechanical Turk. There is a need for task-specific standard datasets.
  • 22. Benchmarks. Adaptation from existing metrics: Alpha-nDCG (normalized discounted cumulative gain); Subtopic recall and precision (number of subtopics covered); User intent (results distribution should reflect what the user is asking for); Comparison against the optimum.
  • 23. Alpha-nDCG. Based on information nuggets (a nugget is an answer to a question). A document is relevant when it contains a nugget needed by the user. Quality of results is graded by human assessors. The more nuggets the result set contains, the better.
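
A sketch of how a nugget-based score of this kind can be computed, assuming the usual formulation in which a nugget that has already appeared `c` times contributes `(1 - alpha) ** c` to the gain, gains are discounted by `log2(1 + rank)`, and the ideal ordering is approximated greedily. The document identifiers, nugget labels, and `alpha=0.5` are invented for illustration.

```python
import math


def gain(doc, nuggets, seen, alpha):
    """Gain of a document: repeated nuggets are damped by (1 - alpha)."""
    return sum((1 - alpha) ** seen.get(n, 0) for n in nuggets[doc])


def alpha_dcg(ranking, nuggets, alpha=0.5, k=None):
    """Discounted cumulative gain with a redundancy penalty."""
    seen, score = {}, 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        score += gain(doc, nuggets, seen, alpha) / math.log2(1 + rank)
        for n in nuggets[doc]:
            seen[n] = seen.get(n, 0) + 1
    return score


def greedy_ideal(nuggets, alpha=0.5):
    """Greedy approximation of the ideal ordering over all judged documents."""
    seen, order, pool = {}, [], list(nuggets)
    while pool:
        best = max(pool, key=lambda doc: gain(doc, nuggets, seen, alpha))
        order.append(best)
        pool.remove(best)
        for n in nuggets[best]:
            seen[n] = seen.get(n, 0) + 1
    return order


def alpha_ndcg(ranking, nuggets, alpha=0.5, k=None):
    """alpha-DCG of the ranking divided by that of the (greedy) ideal."""
    ideal = alpha_dcg(greedy_ideal(nuggets, alpha), nuggets, alpha, k)
    return alpha_dcg(ranking, nuggets, alpha, k) / ideal if ideal > 0 else 0.0


# Invented judgements: which nuggets each document contains.
judged = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}, "d4": set()}
print(alpha_ndcg(["d1", "d3", "d2", "d4"], judged))
print(alpha_ndcg(["d1", "d2", "d3", "d4"], judged))
```

The second ranking scores lower because its second document repeats a nugget the user has already seen.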
  • 24. Subtopic recall and precision. Is the result set exhaustive? $\text{s-recall at } k = \frac{\text{number of subtopics covered by the first } k \text{ documents}}{\text{total number of subtopics}}$. Is the result set efficient? $\text{s-precision at } r = \frac{minRank(S_{opt}, r)}{minRank(S, r)}$.
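
Both ratios can be sketched directly from their definitions, reading $minRank(S, r)$ as the smallest rank at which system $S$ reaches s-recall $r$. The optimal system $S_{opt}$ is found here by brute force over document subsets, which is only practical for tiny examples. The subtopic judgements below are invented.

```python
from itertools import combinations


def s_recall(ranking, subtopics, total, k):
    """Fraction of all subtopics covered by the first k documents."""
    covered = set().union(*(subtopics[doc] for doc in ranking[:k]))
    return len(covered) / total


def min_rank(ranking, subtopics, total, r):
    """Smallest rank at which the ranking reaches s-recall r, or None."""
    for k in range(1, len(ranking) + 1):
        if s_recall(ranking, subtopics, total, k) >= r:
            return k
    return None


def optimal_min_rank(subtopics, total, r):
    """minRank of an optimal system, by brute force over document subsets."""
    docs = list(subtopics)
    for k in range(1, len(docs) + 1):
        for combo in combinations(docs, k):
            if s_recall(combo, subtopics, total, k) >= r:
                return k
    return None


def s_precision(ranking, subtopics, total, r):
    """minRank(S_opt, r) / minRank(S, r); 0.0 if recall r is never reached."""
    achieved = min_rank(ranking, subtopics, total, r)
    best = optimal_min_rank(subtopics, total, r)
    if achieved is None or best is None:
        return 0.0
    return best / achieved


# Invented subtopic judgements for the query "flash".
topics = {
    "d1": {"software"},
    "d2": {"software"},
    "d3": {"comics", "photography"},
    "d4": {"comics"},
}
run = ["d1", "d2", "d3", "d4"]
print(s_recall(run, topics, total=3, k=2))
print(s_precision(run, topics, total=3, r=1.0))
```

Here the run needs three documents to cover all subtopics while two would suffice, so its s-precision at full recall is two thirds.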
  • 25. Conclusions. Diversification can really improve the quality of search results. There is still some work to do in order to achieve good results in all the possible scenarios.
  • 26. Open issues. There is room for improvement in defining new diversity types and metrics. Ranking functions should take diversity into account from the beginning, in an integrated process. Datasets to evaluate each notion of diversity should be built.
  • 27. References. Minack, E., Demartini, G., Nejdl, W.: Current Approaches to Search Result Diversification. In: Proceedings of ISWC '09. Gollapudi, S., Sharma, A.: An Axiomatic Approach for Result Diversification. In: Proceedings of WWW '09. Zhai, C.X., Cohen, W.W., Lafferty, J.: Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval. In: Proceedings of SIGIR '03. Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying Search Results. In: Proceedings of WSDM '09. Clough, P., Sanderson, M., Abouammoh, M., Navarro, S., Paramita, M.: Multiple Approaches to Analysing Query Diversity. In: Proceedings of SIGIR '09. Clarke, C.L., Kolla, M., Cormack, G.V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I.: Novelty and Diversity in Information Retrieval Evaluation. In: Proceedings of SIGIR '08.