SlideShare a Scribd company logo
1 of 13
Comparison-based evaluation
Jose Antonio Martinez Torres
Problem
Law of Large number
1 spin
-$500
1000 spin
$100,000
2
5
4
4
3
2
Experiments
• Movies Dataset
– 100,000 ratings
– 1000 users
– 1700 movies
• http://www.grouplens.org/node/73
True rating
• Based on the law of large numbers, average
rating using this movies dataset can work very
well.
• However, by using a small sample, average
rating would significantly differ from the true
rating.
= { 2, 4, 5,5,4,4,5,3,3,4,5,4,
5,4,5,4,5,4,3,2,3,3,4,5,4,4,5,5,3,4}
True rating = 4
= { 2, 4 }
Rating = 3
Kendall tau
• Statistic used to measure the degree of
similarity between two rankings.
• Practical use:
– Compare how close the top-10 results produced
by Google and Bing are.
A
B
C
D
E
C
D
A
B
E
Rank 1 Rank 2
T = 6 - 4 / ½ (5) (5-1) = 0.2
Dissimilarity goes from 1 to -1 where 1 means the two rankings are
the same and -1 means one ranking is the reverse of the other
Movie1 = {3, 4, 5,4,5,3,2,4,4,5,4,3,4,5,4,3,2,3,4,4,4,5,4,3,3,2}
Movie2 = {4, 3, 2,3,2,3,4,3,2,1,2,3,4,3,4,3,2,3,2,3,4,3,4,3,4,3}
Movie3 = {3, 4, 2,4,5,3,2,4,,4,5,4,3,4,5,4,3,2,3,4,4,4,5,4,3,3,2}
Movie4 = {2, 3, 4,5,3,4,4,4,5,4,3,3,2}
Movie5 = {3, 4, 5,2,4,5,4,3,3,2}
Movie6 = {5, 3, 4,3,3,5,4,3,3,2}
Movie7 = {1, 3, 4,3,2,3,43,2,3,4,2,3,4,2,3,4,3,3,2}
.
.
Movie1700= {3, 4, 5,4,2,3,3,2,3,2,1,1,2,3}
Ranking 1 = 3 , 4 , 3, 2, 3, 5, 1
Ranking 2 = 3.5 ,3.5 , 3.5, 2.5, 3.5, 4, 2
.
.
.
Calculate 1 to n ranking n and true rating Kendall tau correlation
0.4
0.5
0.6
0.7
0.8
0.9
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Kendalltvalue
Error rate k-distance
Serie…

More Related Content

Similar to Comparison based evaluation

Predictive Analytics with UX Research Data: Yes We Can!
Predictive Analytics with UX Research Data: Yes We Can!Predictive Analytics with UX Research Data: Yes We Can!
Predictive Analytics with UX Research Data: Yes We Can!UXPA Boston
 
Optimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEOptimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEYelp Engineering
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityJulián Urbano
 
03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdfSugumarSarDurai
 
I like it... I like it Not
I like it... I like it NotI like it... I like it Not
I like it... I like it NotXavier Amatriain
 
Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...Institute of Contemporary Sciences
 
Machine Learning, Data Mining, Genetic Algorithms, Neural ...
Machine Learning, Data Mining, Genetic Algorithms, Neural ...Machine Learning, Data Mining, Genetic Algorithms, Neural ...
Machine Learning, Data Mining, Genetic Algorithms, Neural ...butest
 
Causal reasoning and Learning Systems
Causal reasoning and Learning SystemsCausal reasoning and Learning Systems
Causal reasoning and Learning SystemsTrieu Nguyen
 
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...Allen Day, PhD
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision treeyazad dumasia
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesDaniel Tunkelang
 
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)Amazon Web Services
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalJulián Urbano
 
Chapter 5 Query Evaluation.pdf
Chapter 5 Query Evaluation.pdfChapter 5 Query Evaluation.pdf
Chapter 5 Query Evaluation.pdfHabtamu100
 

Similar to Comparison based evaluation (20)

Predictive Analytics with UX Research Data: Yes We Can!
Predictive Analytics with UX Research Data: Yes We Can!Predictive Analytics with UX Research Data: Yes We Can!
Predictive Analytics with UX Research Data: Yes We Can!
 
Competition16
Competition16Competition16
Competition16
 
Optimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEOptimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOE
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection Reliability
 
03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf
 
I like it... I like it Not
I like it... I like it NotI like it... I like it Not
I like it... I like it Not
 
Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...
 
Machine Learning, Data Mining, Genetic Algorithms, Neural ...
Machine Learning, Data Mining, Genetic Algorithms, Neural ...Machine Learning, Data Mining, Genetic Algorithms, Neural ...
Machine Learning, Data Mining, Genetic Algorithms, Neural ...
 
Causal reasoning and Learning Systems
Causal reasoning and Learning SystemsCausal reasoning and Learning Systems
Causal reasoning and Learning Systems
 
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...
 
PCA.pptx
PCA.pptxPCA.pptx
PCA.pptx
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision tree
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce Queries
 
Machine learning
Machine learningMachine learning
Machine learning
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
205_April_22.pptx
205_April_22.pptx205_April_22.pptx
205_April_22.pptx
 
Big Data Challenges and Solutions
Big Data Challenges and SolutionsBig Data Challenges and Solutions
Big Data Challenges and Solutions
 
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
 
Chapter 5 Query Evaluation.pdf
Chapter 5 Query Evaluation.pdfChapter 5 Query Evaluation.pdf
Chapter 5 Query Evaluation.pdf
 

Recently uploaded

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Comparison based evaluation

  • 3. Law of Large number 1 spin -$500 1000 spin $100,000
  • 4.
  • 7. Experiments • Movies Dataset – 100,000 ratings – 1000 users – 1700 movies • http://www.grouplens.org/node/73
  • 8. True rating • Based on the law of large numbers, average rating using this movies dataset can work very well. • However, by using a small sample, average rating would significantly differ from the true rating.
  • 9. = { 2, 4, 5,5,4,4,5,3,3,4,5,4, 5,4,5,4,5,4,3,2,3,3,4,5,4,4,5,5,3,4} True rating = 4 = { 2, 4 } Rating = 3
  • 10. Kendall tau • Statistic used to measure the degree of similarity between two rankings. • Practical use: – Compare how close the top-10 results produced by Google and Bing are.
  • 11. A B C D E C D A B E Rank 1 Rank 2 T = 6 - 4 / ½ (5) (5-1) = 0.2 Dissimilarity goes from 1 to -1 where 1 means the two rankings are the same and -1 means one ranking is the reverse of the other
  • 12. Movie1 = {3, 4, 5,4,5,3,2,4,4,5,4,3,4,5,4,3,2,3,4,4,4,5,4,3,3,2} Movie2 = {4, 3, 2,3,2,3,4,3,2,1,2,3,4,3,4,3,2,3,2,3,4,3,4,3,4,3} Movie3 = {3, 4, 2,4,5,3,2,4,,4,5,4,3,4,5,4,3,2,3,4,4,4,5,4,3,3,2} Movie4 = {2, 3, 4,5,3,4,4,4,5,4,3,3,2} Movie5 = {3, 4, 5,2,4,5,4,3,3,2} Movie6 = {5, 3, 4,3,3,5,4,3,3,2} Movie7 = {1, 3, 4,3,2,3,43,2,3,4,2,3,4,2,3,4,3,3,2} . . Movie1700= {3, 4, 5,4,2,3,3,2,3,2,1,1,2,3} Ranking 1 = 3 , 4 , 3, 2, 3, 5, 1 Ranking 2 = 3.5 ,3.5 , 3.5, 2.5, 3.5, 4, 2 . . . Calculate 1 to n ranking n and true rating Kendall tau correlation
  • 13. 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 Kendalltvalue Error rate k-distance Serie…