On Statistical Analysis and
Optimization of Information Retrieval
Effectiveness Metrics
Jun Wang
Joint work with Jianhan Zhu
Department of Computer Science
University College London
J.Wang@cs.ucl.ac.uk
Motivation
IR Models
Calculate (relevance)
scores for individual documents
Probability Indexing
BM25
Language Models
The Binary Independent Rel. Model
Motivation
A general definition of an IR metric:
m(a rank order | “true” relevance of documents)
[Figure: a ranked list whose documents are marked relevant (✔) or non-relevant (✖)]
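As a concrete reading of this definition, each metric can be written as a function of a rank order and a relevance vector. The following is a minimal sketch, assuming binary relevance labels and a rank order that covers every document; the function names and the DCG variant (gain equal to the label, log2 discount) are illustrative choices, not taken from the talk.

```python
import math

# Each metric has the shape m(order | rel):
#   order: document ids, best rank first
#   rel:   rel[d] is 1 if document d is relevant, else 0

def precision_at_n(order, rel, n):
    """Fraction of relevant documents among the top n ranks."""
    return sum(rel[d] for d in order[:n]) / n

def reciprocal_rank(order, rel):
    """1 / rank of the first relevant document, or 0 if there is none."""
    for rank, d in enumerate(order, start=1):
        if rel[d]:
            return 1.0 / rank
    return 0.0

def average_precision(order, rel):
    """Mean of precision values at the ranks of relevant documents.

    Assumes `order` ranks every document, so the number of hits equals
    the total number of relevant documents.
    """
    hits, total = 0, 0.0
    for rank, d in enumerate(order, start=1):
        if rel[d]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def dcg(order, rel):
    """Discounted cumulative gain with gain = label and log2 discount."""
    return sum(rel[d] / math.log2(rank + 1)
               for rank, d in enumerate(order, start=1))
```

For example, with `order = [0, 1, 2, 3]` and `rel = [1, 0, 1, 0]`, the same ranking receives a different score under each metric, which is the sense in which a metric encodes a rank preference.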
Motivation
We have different rank preferences and thus IR
metrics
NDCG
IR Models
MRR
MAP
?
…
Something missing in
between
Motivation
The fundamental question
What is the underlying generative retrieval process?
Outline
• What is happening right now
• The statistical retrieval process
• Text retrieval experiments
What is happening right now (1)?
• Still focusing on the (relevance) score, but acknowledging the final rank context
– The “less is more” model [Chen&Karger 2006] extended the relevance model
– It assumes the previously retrieved documents are non-relevant when calculating the relevance of documents for the current rank position,
– which is equivalent to maximizing the Reciprocal Rank measure
What is happening right now (2)?
• Still focusing on the (relevance) score, but acknowledging the final rank context
– In the Language Model framework, various loss
functions were defined to incorporate various ranking
strategies [Zhai&Lafferty 2006]
What is happening right now (3)?
• Focusing on IR metrics and Ranking
– bypass the step of estimating the relevance states of
individual documents
– construct a document ranking model from training data
by directly optimizing an IR metric [Volkovs&Zemel
2009]
• However, not all IR metrics necessarily
summarize the (training) data well; thus, the training
data may not be fully exploited
A “balanced” view of the retrieval process
– First, let us understand (infer) the relevance of documents as accurately as possible,
– and summarize it by the joint probability of the documents’ relevance
– Dependency between documents is considered
– Second, the rank preference is specified by an IR metric
– Rank decision making is stochastic because of the uncertainty about relevance
– As a result, given an IR metric, the optimal ranking action is the one that maximizes the expected value of that metric
The statistical document ranking process
$$\hat{a} = \arg\max_{a} E(m \mid q) = \arg\max_{a_1,\dots,a_N} \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
– p(r1,...,rN | q): the joint probability of relevance given a query
– m: the IR metric, with input:
1. A rank order a1,...,aN
2. Relevance of docs. r1,...,rN
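The arg max over rank orders can be illustrated by exhaustive search on a toy problem. This is a minimal sketch, assuming binary relevance, three documents, and a made-up joint distribution; the function names and the numbers are hypothetical, and enumerating all permutations is only feasible for very small N.

```python
from itertools import permutations

def reciprocal_rank(order, rel):
    """1 / rank of the first relevant document, or 0 if there is none."""
    for rank, d in enumerate(order, start=1):
        if rel[d]:
            return 1.0 / rank
    return 0.0

def expected_metric(order, joint, metric):
    """E(m | q): sum over relevance vectors r of m(order | r) * p(r | q)."""
    return sum(p * metric(order, r) for r, p in joint.items())

def optimal_ranking(n_docs, joint, metric):
    """Brute-force arg max of the expected metric over all rank orders."""
    return max(permutations(range(n_docs)),
               key=lambda a: expected_metric(a, joint, metric))

# Hypothetical joint distribution p(r0, r1, r2 | q).
# Documents 0 and 1 tend to be relevant together; document 2 is
# relevant only when they are not.
joint = {
    (1, 1, 0): 0.40,
    (1, 0, 0): 0.05,
    (0, 0, 1): 0.35,
    (0, 0, 0): 0.20,
}

best = optimal_ranking(3, joint, reciprocal_rank)
by_marginals = (0, 1, 2)  # marginal means are 0.45, 0.40, 0.35
```

In this toy case the order that maximizes expected Reciprocal Rank places document 2 second, ahead of document 1, even though document 1 has the higher marginal probability of relevance. The dependency between documents 0 and 1 is what makes sorting by marginals suboptimal here.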
The Optimal Ranker
$$E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
[Diagram labels: uncertainty p(r1,...,rN | q); a fixed IR metric m; rank order a1,...,aN; output: the estimated performance score E(m | q)]
Now the question is how to calculate the
expected IR metric under the joint probability
of relevance
if we predefine the IR metric m(a1,...,aN | r1,...,rN)
$$E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
We worked it out for the major IR metrics
(Average Precision, DCG, Precision at N,
Reciprocal Rank)
• Certain assumptions are needed
• The joint distribution of relevance p(r1,...,rN | q)
is summarized by the marginal
means E(r1 | q),...,E(rN | q)
and covariances cov(ri, rj | q)
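To see why means and covariances carry much of the needed information, consider metrics that are linear in the relevance variables. The identities below are a sketch under stated assumptions (binary relevance, a_i read as the document placed at rank i, and a DCG variant with gain equal to the label and a log2 discount); they follow from linearity of expectation and the definition of covariance, and are not copied from the slides.

```latex
% Assumptions: r_i in {0,1}; a_i is the document at rank i;
% DCG uses gain r and discount 1/log2(i+1).
\begin{align*}
E(\mathrm{Precision@}N \mid q) &= \frac{1}{N} \sum_{i=1}^{N} E(r_{a_i} \mid q), \\
E(\mathrm{DCG} \mid q) &= \sum_{i=1}^{N} \frac{E(r_{a_i} \mid q)}{\log_2(i+1)}, \\
E(r_i r_j \mid q) &= \mathrm{cov}(r_i, r_j \mid q) + E(r_i \mid q)\, E(r_j \mid q).
\end{align*}
```

The first two depend only on the marginal means. Metrics that contain products of relevance variables, such as Average Precision and Reciprocal Rank, bring in the pairwise term on the last line, which is where the covariances enter.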
Some of the results
• Expected Average Precision:
• Expected Reciprocal Rank (two documents):
E[ m ]
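For the two-document case, an expected Reciprocal Rank can be derived directly. With binary relevance and document 1 ranked above document 2, RR = r1 + (1 - r1) r2 / 2, so E[RR] = E(r1) + (E(r2) - E(r1 r2)) / 2. The sketch below is a derivation made here under those assumptions, not a transcription of the slide's formula; it checks the closed form against direct enumeration of a hypothetical joint distribution.

```python
def expected_rr_two_docs(mu1, mu2, cov12):
    """Closed form for E[RR] with doc 1 at rank 1 and doc 2 at rank 2.

    Uses E[r1 * r2] = cov12 + mu1 * mu2 (binary relevance assumed).
    """
    return mu1 + 0.5 * (mu2 - (cov12 + mu1 * mu2))

def expected_rr_by_enumeration(joint):
    """Direct sum of RR(r1, r2) * p(r1, r2) over all relevance states."""
    total = 0.0
    for (r1, r2), p in joint.items():
        rr = 1.0 if r1 else (0.5 if r2 else 0.0)
        total += p * rr
    return total

# Hypothetical joint distribution p(r1, r2 | q).
joint = {(1, 1): 0.3, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.4}

mu1 = sum(p for (r1, _), p in joint.items() if r1)
mu2 = sum(p for (_, r2), p in joint.items() if r2)
e12 = joint[(1, 1)]
cov12 = e12 - mu1 * mu2

closed_form = expected_rr_two_docs(mu1, mu2, cov12)
enumerated = expected_rr_by_enumeration(joint)
```

Holding the marginal means fixed, a larger positive covariance lowers the expected Reciprocal Rank of this pair, because the second document is then less likely to be relevant exactly when the first one fails.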
Properties of IR metrics under the uncertainty
But can this analysis be used in practice?
• The key question is how to obtain the joint
probability of relevance
– Click-through data
– Marginal means E(r1 | q),...,E(rN | q)
• Current IR models: relevance models, language models
– Covariance of relevance cov(ri, rj | q)
• Use the documents’ score correlation to estimate the relevance
correlation
• It is query-independent: we approximate it by sampling queries
and calculating the correlation between documents’ ranking
scores
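The score-correlation approximation can be sketched as follows. This is only an illustration under assumptions not stated in the slides: the score matrix is hypothetical, Pearson correlation is used as the correlation measure, and the conversion from correlation to covariance assumes binary relevance with variance mu(1 - mu).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant score carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def score_correlation_matrix(scores):
    """scores[k][i] is the ranking score of document i for sampled query k."""
    n_docs = len(scores[0])
    cols = [[row[i] for row in scores] for i in range(n_docs)]
    return [[pearson(cols[i], cols[j]) for j in range(n_docs)]
            for i in range(n_docs)]

def covariance_from_correlation(corr, mu):
    """Assumed conversion: cov_ij = corr_ij * sd_i * sd_j,
    with sd_i = sqrt(mu_i * (1 - mu_i)) for binary relevance."""
    sd = [math.sqrt(m * (1.0 - m)) for m in mu]
    n = len(mu)
    return [[corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]

# Hypothetical scores: 4 sampled queries x 3 documents.
scores = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 0.5],
]
mu = [0.5, 0.4, 0.3]  # hypothetical marginal means E(r_i | q)

corr = score_correlation_matrix(scores)
cov = covariance_from_correlation(corr, mu)
```

Because the correlation matrix is computed once over sampled queries, it can be reused across queries, while the marginal means are supplied per query by the underlying IR model.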
TREC evaluation
No free lunch
The idea can be applied to evaluation too.
[Diagram labels: input: an IR model and relevance judgments; uncertainty p(r1,...,rN | q); a fixed IR metric m; rank order a1,...,aN; output: the estimated performance score E(m | q)]

On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

  • 1. On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics Jun Wang Joint work with Jianhan Zhu Department of Computer Science University College London J.Wang@cs.ucl.ac.uk
  • 2. Motivation IR Models Calculate (relevance) scores for individual documents Probability Indexing BM25 Language Models The Binary Independent Rel. Model
  • 3. Motivation ✔ ✖ ✔ ✖ m (a rank order | “true” relevance of documents)) A general definition:
  • 4. Motivation We have different rank preferences and thus IR metrics NDCG IR Models MRR MAP ? … Something missing in between
  • 5. Motivation The fundamental question What is the underlying generative retrieval process?
  • 6. Outline • What is happening right now • The statistical retrieval process • Text retrieval experiments
  • 7. What is happening right now (1)? • Still focusing on (relevance) score, but with the acknowledgement the final rank context – The “less is more” model [Chen&Karger 2006] extended the relevance model – assumed the previously retrieved documents non- relevant when calculating the rel. of documents for the current rank position, – equivalent to maximizing the Reciprocal Rank measure
  • 8. What is happening right now (2)? • Still focusing on (relevance) score, but with the acknowledgement the final rank context – In the Language Model framework, various loss functions were defined to incorporate various ranking strategies [Zhai&Lafferty 2006]
  • 9. What is happening right now (3)? • Focusing on IR metrics and Ranking – bypass the step of estimating the relevance states of individual documents – construct a document ranking model from training data by directly optimizing an IR metric [Volkovs&Zemel 2009] • However, not all IR metrics necessarily summarize the (training) data well; thus, training data may not be fully explored
  • 10. A “balanced” view of the retrieval process – First, let us understand (infer) the relevance of documents as accurately as possible, – and summarize it by the joint probability of the documents’ relevance – dependency between documents is considered – Secondly, the rank preference is specified by an IR metric. – The rank decision making is a stochastic one due to the uncertainty about the relevance – As a result, given an IR metric, the optimal ranking action is the one that maximizes the expected value of that metric
  • 11. The statistical document ranking process $\hat{a} = \arg\max_{a} E(m \mid q) = \arg\max_{a_1,\ldots,a_N} \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$ Here $p(r_1,\ldots,r_N \mid q)$ is the joint probability of relevance given a query, and $m$ is the IR metric. Input of the metric: 1. A rank order $a_1,\ldots,a_N$ 2. Relevance of docs. $r_1,\ldots,r_N$
  • 12. The Optimal Ranker For a fixed IR metric $m$, the inputs are a rank order $a_1,\ldots,a_N$ and the uncertainty about relevance $p(r_1,\ldots,r_N \mid q)$. Output: the estimated performance score $E(m \mid q) = \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$
  • 13. Now the question is how to calculate the expected IR metric $E(m \mid q) = \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$ under the joint probability of relevance if we predefine the IR metric $m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)$
  • 14. We worked it out for the major IR metrics (Average Precision, DCG, Precision at N, Reciprocal Rank) • Certain assumptions are needed • The joint distribution of relevance $p(r_1,\ldots,r_N \mid q)$ is summarized by the marginal means $E(r_1 \mid q),\ldots,E(r_N \mid q)$ and covariances $\mathrm{cov}(r_i, r_j \mid q)$
  • 15. Some of the results • Expected Average Precision: • Expected Reciprocal Rank (two documents): E[ m ]
  • 16. Properties of IR metrics under the uncertainty
  • 17. But can this analysis be used in practice? • The key question is how to obtain the joint probability of relevance – Click-through data – Marginal means $E(r_1 \mid q),\ldots,E(r_N \mid q)$ • Current IR models: relevance models, language models – Covariance of relevance $\mathrm{cov}(r_i, r_j \mid q)$ • Use the documents’ score correlation to estimate the relevance correlation • It is query-independent: we approximate it by sampling queries and calculating the correlation between the documents’ ranking scores
  • 20. The idea can be applied to evaluation too. For a fixed IR metric $m$, the input is a rank order $a_1,\ldots,a_N$ (from an IR model) and the uncertainty about relevance $p(r_1,\ldots,r_N \mid q)$ (from relevance judgments). Output: the estimated performance score $E(m \mid q)$
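
As a numerical check of the two-document Reciprocal Rank case mentioned on slide 15, the following Python sketch computes E(m | q) for a fixed rank order by enumerating the relevance states, and compares it with a closed form in the marginal means and the covariance. The joint distribution values are invented for illustration, binary relevance is an assumption, and the closed form is derived here from the definition of Reciprocal Rank, so it may not match the notation used on the original slide.

```python
def reciprocal_rank(ranking, relevance):
    """m(a | r): 1 / rank of the first relevant document, 0 if none is relevant."""
    for position, doc in enumerate(ranking, start=1):
        if relevance[doc] == 1:
            return 1.0 / position
    return 0.0


def expected_metric(metric, ranking, joint):
    """E(m | q) = sum over relevance states r of m(a | r) * p(r | q)."""
    return sum(metric(ranking, r) * p for r, p in joint.items())


def expected_rr_two_docs(mean_first, mean_second, cov):
    """Closed form for two binary documents ranked first and second.

    RR is 1 when the first document is relevant and 1/2 when only the
    second one is, so E[RR] = E(r1) + 0.5 * (E(r2) - E(r1 * r2)),
    where E(r1 * r2) = E(r1) * E(r2) + cov(r1, r2).
    """
    return mean_first + 0.5 * (mean_second - mean_first * mean_second - cov)


# Hypothetical joint distribution p(r1, r2 | q); keys are (r1, r2).
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

# Summarize the joint distribution by its marginal means and covariance.
mean1 = sum(p * r[0] for r, p in joint.items())
mean2 = sum(p * r[1] for r, p in joint.items())
cov12 = sum(p * r[0] * r[1] for r, p in joint.items()) - mean1 * mean2

by_enumeration = expected_metric(reciprocal_rank, (0, 1), joint)
by_closed_form = expected_rr_two_docs(mean1, mean2, cov12)
print(by_enumeration, by_closed_form)
```

Both values come out at 0.65 up to floating-point rounding. A positive covariance lowers the expected Reciprocal Rank in this form, because the second document then adds less when the first one fails.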

Editor's notes

  1. The focus is still on designing a scoring function for a document, but with acknowledgement of the various retrieval goals and the final rank context.
  2. The focus is still on designing a scoring function for a document, but with acknowledgement of the various retrieval goals and the final rank context.
  3. Informativeness argument: some evaluation metrics are less informative than others [4]. Some IR metrics thus do not necessarily summarize the (training) data well; if we begin optimizing IR metrics right from the data, the statistics of the data may not be fully explored and utilized. The approach is also not really adaptive, as the whole training has to be redone if we want to optimize another metric.
  4. In the first stage, the aim is to estimate the relevance of documents as accurately as possible, and to summarize it by the joint probability of the documents’ relevance. Only in the second stage is the rank preference specified, possibly by an IR metric. The rank decision making is a stochastic one due to the uncertainty about the relevance. As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric.
  5. In the first stage, the aim is to estimate the relevance of documents as accurately as possible, and to summarize it by the joint probability of the documents’ relevance. Only in the second stage is the rank preference specified, possibly by an IR metric. The rank decision making is a stochastic one due to the uncertainty about the relevance. As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric.
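
The two-stage view in these notes can be sketched in Python: the joint probability of relevance is fixed first, and the IR metric is only plugged in when the rank decision is made. The joint distribution below is invented for illustration, relevance is assumed binary, and the exhaustive search over permutations is a stand-in chosen for clarity rather than the optimization used by the authors; it is only feasible for a handful of documents.

```python
from itertools import permutations


def reciprocal_rank(ranking, relevance):
    """1 / rank of the first relevant document, 0 if none is relevant."""
    for position, doc in enumerate(ranking, start=1):
        if relevance[doc] == 1:
            return 1.0 / position
    return 0.0


def precision_at_2(ranking, relevance):
    """Fraction of relevant documents among the top two positions."""
    return sum(relevance[doc] for doc in ranking[:2]) / 2.0


def expected_metric(metric, ranking, joint):
    """E(m | q): the metric averaged over the joint probability of relevance."""
    return sum(metric(ranking, r) * p for r, p in joint.items())


def optimal_ranking(metric, joint):
    """Stage 2: pick the rank order that maximizes the expected metric."""
    n_docs = len(next(iter(joint)))
    return max(
        permutations(range(n_docs)),
        key=lambda a: expected_metric(metric, a, joint),
    )


# Stage 1 (assumed already done): a hypothetical joint distribution
# p(r1, r2, r3 | q). Documents 0 and 1 tend to be relevant together,
# while document 2 is relevant when they are not.
joint = {(1, 1, 0): 0.5, (0, 0, 1): 0.4, (1, 0, 0): 0.05, (0, 0, 0): 0.05}

best_for_rr = optimal_ranking(reciprocal_rank, joint)
best_for_p2 = optimal_ranking(precision_at_2, joint)
print(best_for_rr, best_for_p2)
```

With these numbers, Reciprocal Rank prefers the order (0, 2, 1), since placing document 2 second covers the case where documents 0 and 1 are both non-relevant, while Precision at 2 keeps documents 0 and 1 in the top two positions. The same joint distribution serves both metrics, so switching the metric does not require re-estimating relevance.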