Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance
A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah
INF 384H, September 26th, 2011
Diane Kelly
Associate Professor, School of Library and Information Science, UNC Chapel Hill
Ph.D., Rutgers University (Information Science)
MLS, Rutgers University (Information Retrieval)
BA, University of Alabama (Psychology and English)
Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
Previous Experimental Protocols
Traditional lab-based: TREC Interactive Track; study entire search episodes
Naturalistic: Thomas and Hawking (2006); trade control for “ecological validity”
Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
Literature Review
Main criticisms of previous studies:
Evaluation measures were calculated from TREC assessors’ relevance judgments, not user judgments
Users were not provided with explicit instructions
Users may have been fatigued
Sample sizes were low
Methods
Studies 1 and 2: effect of the position of relevant documents on users’ evaluations of system performance
Study 3: effect of the number of relevant documents
Participants were asked to help researchers evaluate four search engines.
For each search engine, participants read a topic and posed one query.
After issuing the query, all participants were redirected to the same results page with 10 standardized results.
Participants were asked to evaluate the full text of each search result in the order presented and to judge its relevance.
After evaluating all the documents on the results page, participants were asked to evaluate the search engine.
Study 1: operationalized average precision at n. Subjects were required to evaluate all 10 documents.
Study 2: also operationalized average precision at n. Subjects were instructed to find five relevant documents.
Study 3: operationalized precision at n.
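The two measures named in the study descriptions can be illustrated with a short sketch. This is not the authors' code. The function names and the example rankings are hypothetical, and average precision is normalized here by the number of relevant documents found in the top n, which is an assumption (other definitions divide by the total number of relevant documents in the collection).

```python
from typing import Sequence


def precision_at_n(relevance: Sequence[int], n: int) -> float:
    """Fraction of the top-n results that are relevant (1 = relevant, 0 = not)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevance[:n]) / n


def average_precision_at_n(relevance: Sequence[int], n: int) -> float:
    """Mean of precision@k over the ranks k <= n that hold a relevant result.

    Assumption: normalized by the number of relevant results in the top n.
    Returns 0.0 when the top n holds no relevant result.
    """
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance[:n], start=1):
        if rel:
            hits += 1
            total += hits / k  # precision at rank k
    return total / hits if hits else 0.0


# Hypothetical result pages: five relevant documents out of ten in both,
# placed at the top of one list and at the bottom of the other.
top_loaded = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
bottom_loaded = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for name, page in (("top-loaded", top_loaded), ("bottom-loaded", bottom_loaded)):
    print(name, precision_at_n(page, 10), round(average_precision_at_n(page, 10), 3))
```

Both hypothetical pages have the same precision at 10, but average precision is lower when the relevant documents sit at the bottom of the list. That is why a position manipulation can be expressed through average precision and a number manipulation through precision.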
Topics and Documents
Selected topics associated with newspaper articles about current events
Selected documents with a “high probability of being judged relevant or not relevant” (pg 9:12)
Study Participants
“Convenient sample” (pg 9:27) of undergraduates from UNC
27 participants for each study (1-3)
Demographic information collected: sex, age, major, search experience, search frequency
Results: Relevance Assessments
Did users’ relevance judgments agree with baseline assessments?
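One simple way to quantify the question of agreement is the share of documents on which a participant's judgment matches the baseline. The sketch below is only an illustration: the slides do not say which agreement statistic the paper used, and the function name and the judgment lists are hypothetical.

```python
from typing import Sequence


def percent_agreement(user_judgments: Sequence[int], baseline: Sequence[int]) -> float:
    """Share of documents where the user's relevance judgment equals the baseline."""
    if len(user_judgments) != len(baseline):
        raise ValueError("judgment lists must have the same length")
    if not baseline:
        raise ValueError("judgment lists must not be empty")
    matches = sum(u == b for u, b in zip(user_judgments, baseline))
    return matches / len(baseline)


# Hypothetical data for one participant and one results page of 10 documents
# (1 = relevant, 0 = not relevant).
baseline = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
participant = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

print(percent_agreement(participant, baseline))
```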
Did the topic affect differences in relevance assessments?
How much did relevance assessments vary between documents?
Results: Evaluations of System Performance
Did participants modify evaluation ratings?
Participant ratings compared between performance levels and studies
Study 1 showed no significant differences in ratings according to performance level.
Studies 2 and 3 did show significant differences in ratings according to performance level.
What are the differences between study 1 and study 2?
Intended difference: completion time?
Unintended differences:
Instructions for study 2 provided a clearer performance objective
Subjects felt more successful in study 2?
User-Experienced Precision
“experimental manipulations [of precision] were only 90% effective” (pg 9:24)
Are user-experienced precision values correlated with user ratings of system performance?
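A correlation question of this kind could be checked along the following lines. This is a minimal sketch that assumes Pearson's r. The slides do not state which correlation statistic the paper reports, and the precision values and ratings below are invented for illustration only.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: precision computed from each participant's own relevance
# judgments, paired with that participant's rating of the search engine.
user_precision = [0.2, 0.4, 0.5, 0.6, 0.8]
system_rating = [2, 3, 3, 4, 5]

print(round(pearson_r(user_precision, system_rating), 3))
```

For ordinal rating scales, a rank-based statistic such as Spearman's rho may be a better fit. The same function can be applied to ranks instead of raw values.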

Eastwood presentation on_kellyetal2010

  • 1. Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance. A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah. INF 384H, September 26th, 2011
  • 2. Diane Kelly, Associate Professor, School of Library and Information Science, UNC Chapel Hill
  • 3. Ph.D., Rutgers University (Information Science)
  • 4. MLS, Rutgers University (Information Retrieval)
  • 5. BA, University of Alabama (Psychology and English)
  • 6. Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
  • 7. Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
  • 8. Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
  • 9. Previous Experimental Protocols. Traditional lab-based (TREC Interactive Track): study entire search episodes. Naturalistic (Thomas and Hawking (2006)): trade control for “ecological validity”. Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
  • 10. Literature Review. Main criticisms of previous studies: evaluation measures were calculated based on TREC assessors’ relevance judgments, not user judgments; users were not provided with explicit instructions; users may have been fatigued; low sample sizes
  • 12. Studies 1 and 2: effect of position of relevant documents on users’ evaluations of system performance. Study 3: effect of number of relevant documents
  • 13. Participants were asked to help researchers evaluate four search engines. For each search engine, they read a topic and posed one query
  • 14. After issuing the query, all participants were redirected to the same results page with 10 standardized results
  • 15. Participants were asked to evaluate the full text of each search result in the order presented and judge its relevance
  • 16. After evaluating all the documents on the results page, participants were asked to evaluate the search engine
  • 17. Study 1: Operationalized average precision at n. Subjects were required to evaluate all 10 documents
  • 18. Study 2: Also operationalized average precision at n. Subjects were instructed to find five relevant documents
  • 19. Study 3: Operationalized precision at n
  • 20. Topics and Documents: Selected topics associated with newspaper articles about current events. Selected documents with “high probability of being judged relevant or not relevant” (pg 9:12)
  • 21. Study Participants: “Convenient sample” (pg 9:27) of undergraduates from UNC. 27 participants for each study (1-3). Demographic information collected: sex, age, major, search experience, search frequency
  • 23. Did users’ relevance judgments agree with baseline assessments?
  • 24. Did users’ relevance judgments agree with baseline assessments?
  • 25. Did the topic affect differences in relevance assessments?
  • 26. How much did relevance assessments vary between documents?
  • 27. Results: Evaluations of System Performance
  • 28. Did participants modify evaluation ratings?
  • 29. Participant ratings compared between performance levels and studies
  • 30. Participant ratings compared between performance levels and studies: Study 1 showed no significant differences in ratings according to performance level
  • 31. Participant ratings compared between performance levels and studies: Studies 2 and 3 did show significant differences in ratings according to performance level
  • 32. What are the differences between study 1 and study 2? Intended difference: completion time?
  • 33. What are the differences between study 1 and study 2? Unintended differences: instructions for study 2 provided a clearer performance objective; subjects felt more successful in study 2?
  • 34. User-Experienced Precision: “experimental manipulations [of precision] were only 90% effective” (pg 9:24)
  • 35. Are user-experienced precision values correlated with user ratings of system performance?
  • 36. Are user-experienced precision values correlated with user ratings of system performance?
  • 37. Regression analysis: can you use experienced precision to predict user evaluation?
  • 38. Authors’ Discussion and Conclusions: “…variations in precision at 10 scores have the greatest impact on subjects’ evaluation ratings.” (pg 9:26) Thoughtful analysis of experimental caveats and generalizability of results: convenient sample of students; only one genre of documents represented; are these results specific to informational/exploratory tasks?
  • 39. Suggested Class Discussion Topics. Areas where the experiment may have been too tightly controlled/artificial: controlling the order in which users could rate documents? Areas where the experiment may not have been as controlled as the authors intended: allowing subjects to formulate their own queries; Study 2 allowed participants to feel “successful”? Ten-point evaluation scale versus five-point evaluation scale?
  • 40. References: Kelly, D., Fu, X., and Shah, C. 2010. Effects of position and number of relevant documents retrieved on users’ evaluations of system performance. ACM Trans. Inf. Syst. 28, 2, Article 9 (May 2010), 29 pages. DOI 10.1145/1740592.1740597. http://doi.acm.org/10.1145/1740592.1740597
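
The slides name two measures, precision at n (Study 3) and average precision at n (Studies 1 and 2), without defining them. The sketch below uses one common textbook definition of each; the exact formulas in the paper may differ, so treat this as illustrative only. The function names and the two example rankings are hypothetical and are not data from the study. "User-experienced precision" is assumed here to mean the same precision computation applied to a participant's own relevance judgments instead of the baseline assessments.

```python
from typing import Sequence


def precision_at_n(relevance: Sequence[bool], n: int) -> float:
    """Fraction of the top-n results that are marked relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevance[:n]) / n


def average_precision_at_n(relevance: Sequence[bool], n: int) -> float:
    """Mean of precision@k over the ranks k <= n that hold a relevant result.

    Assumption: normalized by the number of relevant results found in the
    top n (one common variant); returns 0.0 if none are relevant.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance[:n], start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


# Hypothetical result lists: five relevant documents out of ten,
# placed either at the top or at the bottom of the ranking.
front_loaded = [True] * 5 + [False] * 5
back_loaded = [False] * 5 + [True] * 5

# Same number of relevant documents, so precision at 10 is identical ...
p_front = precision_at_n(front_loaded, 10)
p_back = precision_at_n(back_loaded, 10)

# ... but average precision rewards relevant documents that appear early.
ap_front = average_precision_at_n(front_loaded, 10)
ap_back = average_precision_at_n(back_loaded, 10)

# Hypothetical participant judgments that disagree with the baseline on
# one document; experienced precision is computed from these instead.
user_judgments = [True, True, True, True, False] + [False] * 5
experienced_p = precision_at_n(user_judgments, 10)

print(p_front, p_back, ap_front, round(ap_back, 3), experienced_p)
```

This illustrates why the two study designs can be separated: moving the same relevant documents up or down the list changes average precision while leaving precision at n fixed, whereas changing how many relevant documents appear changes precision at n directly.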

Editor's Notes

  1. “My research is focused on information search behavior and the design and evaluation of systems that support interactive information retrieval.” UNC Chapel Hill: according to US News and World Report, it has the #2 library science graduate school in the nation, a very strong program. Xun Fu and Chirag Shah were Ph.D. students in the program at the time this article was written.