Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance
A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah
INF 384H, September 26th, 2011
Diane Kelly
Associate Professor, School of Library and Information Science, UNC Chapel Hill
  • Ph.D., Rutgers University (Information Science)
  • MLS, Rutgers University (Information Retrieval)
  • BA, University of Alabama (Psychology and English)
  • Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
Previous Experimental Protocols
  • Traditional lab-based: TREC Interactive Track; study entire search episodes
  • Naturalistic: Thomas and Hawking (2006); trade control for “ecological validity”
  • Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
Literature Review
Main criticisms of previous studies:
  • Evaluation measures were calculated based on TREC assessors’ relevance judgments, not user judgments
  • Users were not provided with explicit instructions
  • Users may have been fatigued
  • Low sample sizes
Methods
  • Studies 1 and 2: effect of position of relevant documents on users’ evaluations of system performance
  • Study 3: effect of number of relevant documents
Participants were asked to help researchers evaluate four search engines. For each search engine, they read a topic and posed one query.
After issuing the query, all participants were redirected to the same results page with 10 standardized results.
Participants were asked to evaluate the full text of each search result in the order presented and to judge its relevance.
After evaluating all the documents on the results page, participants were asked to evaluate the search engine.
Study 1
  • Operationalized average precision at n
  • Subjects required to evaluate all 10 documents
Study 2
  • Also operationalized average precision at n
  • Subjects instructed to find five relevant documents
Study 3
  • Operationalized precision at n
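To make the two measures concrete, here is a minimal Python sketch. It assumes binary relevance labels (1 = relevant, 0 = not relevant) and one common definition of average precision at n, normalized by the number of relevant documents found in the top n; the paper's exact formula may differ. The function names and the two example rankings are hypothetical illustrations, not material from the paper.

```python
from typing import Sequence


def precision_at_n(relevance: Sequence[int], n: int) -> float:
    """Fraction of the top-n results that are relevant (labels are 0 or 1)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevance[:n]) / n


def average_precision_at_n(relevance: Sequence[int], n: int) -> float:
    """Mean of precision@k over the ranks k <= n that hold a relevant result.

    Assumption: normalizes by the number of relevant results in the top n.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance[:n], start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


# Hypothetical result pages: same number of relevant documents, different positions.
front_loaded = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
back_loaded = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for name, ranking in (("front", front_loaded), ("back", back_loaded)):
    print(name, precision_at_n(ranking, 10), round(average_precision_at_n(ranking, 10), 3))
```

Under these assumptions, both rankings have the same precision at 10, while average precision at 10 is higher when the relevant documents appear first. That is why position (Studies 1 and 2) and number (Study 3) call for different measures.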
Topics and Documents
  • Selected topics associated with newspaper articles about current events
  • Selected documents with “high probability of being judged relevant or not relevant” (pg 9:12)
Study Participants
  • “Convenient sample” (pg 9:27) of undergraduates from UNC
  • 27 participants for each study (1-3)
  • Demographic information collected: sex, age, major, search experience, search frequency
Results: Relevance Assessments
  • Did users’ relevance judgments agree with baseline assessments?
  • Did the topic affect differences in relevance assessments?
  • How much did relevance assessments vary between documents?
Results: Evaluations of System Performance
Did participants modify evaluation ratings?
Participant ratings compared between performance levels and studies
  • Study 1 showed no significant differences in ratings according to performance level
  • Studies 2 and 3 did show significant differences in ratings according to performance level
What are the differences between study 1 and study 2?
  • Intended difference: completion time?
  • Unintended differences: instructions for study 2 provided a clearer performance objective; subjects felt more successful in study 2?
User-Experienced Precision
“experimental manipulations [of precision] were only 90% effective” (pg 9:24)
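The gap between intended and user-experienced precision can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it assumes user-experienced precision is simply precision recomputed from a participant's own relevance judgments, and the baseline and participant labels below are hypothetical.

```python
from typing import Sequence


def precision(judgments: Sequence[int]) -> float:
    """Share of judged documents marked relevant (1 = relevant, 0 = not relevant)."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


# Hypothetical labels for one 10-result page.
baseline_labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]     # what the researchers intended
participant_labels = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]  # what one participant judged

intended = precision(baseline_labels)
experienced = precision(participant_labels)
print(f"intended={intended:.1f} experienced={experienced:.1f}")
```

When a participant disagrees with the baseline on even one document, the precision that participant experiences no longer matches the level the experiment was designed to deliver.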
Are user-experienced precision values correlated with user ratings of system performance?
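One way to examine such a question is a rank correlation between each participant's experienced precision and their rating. The slides do not state which coefficient the paper used; the sketch below assumes Spearman's rank correlation (Pearson's correlation computed on average ranks), a common choice for ordinal ratings. All data values and function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-participant values.
experienced_precision = [0.3, 0.5, 0.5, 0.7, 0.9]
system_ratings = [2, 3, 4, 4, 5]
print(round(spearman(experienced_precision, system_ratings), 3))
```

A coefficient near +1 would mean that participants who experienced higher precision also tended to rate the system higher; it says nothing by itself about statistical significance.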


Eastwood presentation on_kellyetal2010

  • 1. Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance. A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah. INF 384H, September 26th, 2011
  • 2. Diane Kelly, Associate Professor, School of Library and Information Science, UNC Chapel Hill
  • 3. Ph.D., Rutgers University (Information Science)
  • 4. MLS, Rutgers University (Information Retrieval)
  • 5. BA, University of Alabama (Psychology and English)
  • 6. Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
  • 7. Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
  • 8. Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
  • 9. Previous Experimental Protocols. Traditional lab-based: TREC Interactive Track; study entire search episodes. Naturalistic: Thomas and Hawking (2006); trade control for “ecological validity”. Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
  • 10. Literature Review. Main criticisms of previous studies: evaluation measures were calculated based on TREC assessors’ relevance judgments, not user judgments; users were not provided with explicit instructions; users may have been fatigued; low sample sizes
  • 12. Studies 1 and 2: effect of position of relevant documents on users’ evaluations of system performance. Study 3: effect of number of relevant documents
  • 13. Participants were asked to help researchers evaluate four search engines. For each search engine, they read a topic and posed one query
  • 14. After issuing the query, all participants were redirected to the same results page with 10 standardized results
  • 15. Participants were asked to evaluate the full text of each search result in the order presented and judge its relevance
  • 16. After evaluating all the documents on the results page, participants were asked to evaluate the search engine
  • 17. Study 1: operationalized average precision at n. Subjects required to evaluate all 10 documents
  • 18. Study 2: also operationalized average precision at n. Subjects instructed to find five relevant documents
  • 19. Study 3: operationalized precision at n
  • 20. Topics and Documents. Selected topics associated with newspaper articles about current events. Selected documents with “high probability of being judged relevant or not relevant” (pg 9:12)
  • 21. Study Participants. “Convenient sample” (pg 9:27) of undergraduates from UNC. 27 participants for each study (1-3). Demographic information collected: sex, age, major, search experience, search frequency
  • 23. Did users’ relevance judgments agree with baseline assessments?
  • 24. Did users’ relevance judgments agree with baseline assessments?
  • 25. Did the topic affect differences in relevance assessments?
  • 26. How much did relevance assessments vary between documents?
  • 27. Results: Evaluations of System Performance
  • 28. Did participants modify evaluation ratings?
  • 29. Participant ratings compared between performance levels and studies
  • 30. Participant ratings compared between performance levels and studies. Study 1 showed no significant differences in ratings according to performance level
  • 31. Participant ratings compared between performance levels and studies. Studies 2 and 3 did show significant differences in ratings according to performance level
  • 32. What are the differences between study 1 and study 2? Intended difference: completion time?
  • 33. What are the differences between study 1 and study 2? Unintended differences: instructions for study 2 provided a clearer performance objective; subjects felt more successful in study 2?
  • 34. User Experienced Precision: “experimental manipulations [of precision] were only 90% effective” (pg 9:24)
  • 35. Are user-experienced precision values correlated with user ratings of system performance?
  • 36. Are user-experienced precision values correlated with user ratings of system performance?
  • 37. Regression analysis: can you use experienced precision to predict user evaluation?
  • 38. Authors’ Discussion and Conclusions: “…variations in precision at 10 scores have the greatest impact on subjects’ evaluation ratings.” (pg 9:26) Thoughtful analysis of experimental caveats and generalizability of results: convenient sample of students; only one genre of documents represented; are these results specific to informational/exploratory tasks?
  • 39. Suggested Class Discussion Topics. Areas where the experiment may have been too tightly controlled/artificial: controlling the order in which users could rate documents? Areas where the experiment may not have been as controlled as the authors intended: allowing subjects to formulate their own queries; Study 2 allowed participants to feel “successful”? Ten-point evaluation scale versus five-point evaluation scale?
  • 40. References. Kelly, D., Fu, X., and Shah, C. 2010. Effects of position and number of relevant documents retrieved on users’ evaluations of system performance. ACM Trans. Inf. Syst. 28, 2, Article 9 (May 2010), 29 pages. DOI 10.1145/1740592.1740597. http://doi.acm.org/10.1145/1740592.1740597
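
Slides 17 to 19 name two measures, average precision at n (Studies 1 and 2) and precision at n (Study 3), without defining them. The sketch below is a minimal illustration of one common textbook definition of each; the slides do not give the paper's exact formulas, so the normalization used for average precision (dividing by the number of relevant documents found in the top n) is an assumption. The function names and the two example rankings are hypothetical and are not taken from the study materials.

```python
from typing import Sequence


def precision_at_n(rels: Sequence[int], n: int) -> float:
    """Fraction of the top-n results that are relevant (1 = relevant, 0 = not)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(rels[:n]) / n


def average_precision_at_n(rels: Sequence[int], n: int) -> float:
    """Mean of precision@k over each rank k <= n that holds a relevant result.

    Assumption: normalized by the number of relevant results in the top n.
    Other variants divide by the total number of relevant documents instead.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels[:n], start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


# Hypothetical result lists: five relevant documents each, at different positions.
front_loaded = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
back_loaded = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

p_front = precision_at_n(front_loaded, 10)
p_back = precision_at_n(back_loaded, 10)
ap_front = average_precision_at_n(front_loaded, 10)
ap_back = average_precision_at_n(back_loaded, 10)

print(p_front, p_back)    # same number of relevant documents
print(ap_front, ap_back)  # different positions of relevant documents
```

Under these definitions the two lists share the same precision at 10 but differ in average precision, which is the distinction between manipulating the number of relevant documents and manipulating their position.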

Editor's notes

  1. “My research is focused on information search behavior and the design and evaluation of systems that support interactive information retrieval.” UNC Chapel Hill: according to US News and World Report, it has the #2 library science graduate school in the nation, a very strong program. Xin Fu and Chirag Shah were Ph.D. students in the program at the time this article was written.