Matt Lease
• School of Information @mattlease
University of Texas at Austin ml@utexas.edu
Joint work with
Yinglong Zhang, Jin Zhang, Jacek Gwizdka
Multidimensional Relevance Modeling
via Psychometrics & Crowdsourcing
slides: www.slideshare.net/mattlease
Saracevic’s ‘97 Salton Award address
“…the human-centered side was often highly critical
of the systems side for ignoring users... [when]
results have implications for systems design &
practice. Unfortunately… beyond suggestions,
concrete design solutions were not delivered.
“…the systems side by and large ignores the user
side and user studies… the stance is ‘tell us what
to do and we will.’ But nobody is telling...
“Thus, there are not many interactions…”
Matt Lease <ml@utexas.edu> 2/20
Primary Research Question
• What is relevance?
– What factors constitute it? Can we quantify their
relative importance? How do they interact?
• Old IR question, many studies, little agreement
• Potential impacts?
– Further understanding of cognitive relevance
– Guide IR engineering toward inferring key factors
– Foster multi-dimensional evaluation of IR systems
Secondary Research Question
• How can we measure/ensure quality of
subjective relevance judgments?
– How can we distinguish valid subjectivity vs. human
error in judging disagreements (traditional or online)?
• Potential impacts
– Help explain/reduce judging disagreements
– Enable evaluation wrt. distribution of opinions
– Encourage other subjective data collection in HCOMP
Psychology to the Rescue!
• A Guide to Behavioral Experiments
on Mechanical Turk
– W. Mason and S. Suri (2010). SSRN online.
• Crowdsourcing for Human Subjects Research
– L. Schmidt (CrowdConf 2010)
• Crowdsourcing Content Analysis for Behavioral Research:
Insights from Mechanical Turk
– Conley & Tosti-Kharas (2010). Academy of Management
• Amazon's Mechanical Turk : A New Source of
Inexpensive, Yet High-Quality, Data?
– M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
– see also: Amazon Mechanical Turk Guide for Social Scientists
August 12, 2012
Contributions
• Describe a simple, reliable, scalable method for
collecting diverse (subjective), multi-dimensional
relevance judgments from online participants
– Online survey techniques from psychometrics
– Data available online
• Describe a rigorous, positivist, data-driven framework
for inferring & modeling multi-dimensional relevance
– Structural equation modeling (SEM) from psychometrics
– Run the experiment & let the data speak for itself!
– Implemented in standard R libraries available online
An example model of multi-dimensional relevance
Experimental Design
• Define some search tasks
• Pick some documents to be judged
• Hypothesize some relevance dimensions
• Ask participants to answer some questions
• Analyze data via Structural Equation Modeling (SEM)
– Use Exploratory Factor Analysis (EFA) to assess question-
factor relationships, then prune “bad” questions
– Use Confirmatory Factor Analysis (CFA) to assess
correlations, test significance, & compare models
– Cousin to graphical models in statistics/AI
Collecting multi-dimensional relevance
judgments
• Participant picks one of several pre-defined topics
– You want to plan a one week vacation in China
• Participant assigned a Web page to judge
– We wrote a query for each topic, submitted to a popular
search engine, and did stratified sampling of results
• Participant answers a set of Likert-scale questions
– I think the information in this page is incorrect
– It’s difficult to understand the information in this page
– …
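The slides do not say how the strata were defined. As a rough sketch only, the Python below assumes strata are bands of search-result rank (for example top 10, ranks 11-50, ranks 51-100) and draws the same number of pages from each band. The function name, the band boundaries, and the page identifiers are all hypothetical.

```python
import random

def stratified_sample(ranked_urls, strata_bounds, per_stratum, seed=0):
    """Sample pages from consecutive rank bands of a ranked result list.

    strata_bounds holds the exclusive end index of each band (an assumed
    way to define strata; the slides do not specify one).
    """
    rng = random.Random(seed)
    sample = []
    start = 0
    for end in strata_bounds:
        stratum = ranked_urls[start:end]
        k = min(per_stratum, len(stratum))
        sample.extend(rng.sample(stratum, k))  # sample without replacement
        start = end
    return sample

# Hypothetical ranked results for one topic query.
urls = ["page%d" % i for i in range(1, 101)]
picks = stratified_sample(urls, strata_bounds=[10, 50, 100], per_stratum=2)
```

Sampling across rank bands, rather than only from the top results, is one way to expose participants to pages of varying quality.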
What Questions might we ask?
• What factors do you think impact relevance…
• We hypothesize same 5 factors as Xu & Chen ’06
– Topicality, reliability, novelty, understandability, & scope
– Choose same to make revised mechanics & any
difference in findings maximally clear
• Assume factors are incomplete & imperfect
– Positivist approach: do these factors explain
observed data better than other alternatives:
uni-dimensional relevance or another set of factors?
How do we ask the questions?
• Ask 3+ questions per hypothesized dimension
– Ask repeated, similar questions, & change polarity
– Randomize question order (don’t group questions)
– Over-generate questions to allow for later pruning
– Exclude participants failing self-consistency checks
• Usual stuff
– Use clear, familiar, non-leading wording
– Balance Likert response scale
– Pre-test survey in-house, then pilot study online
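The question-design bullets above (changed polarity, randomized order, self-consistency checks) can be sketched in Python. This is an illustration under stated assumptions, not the survey code used in the study: the 5-point scale, the item identifiers, and the 2-point consistency threshold are all hypothetical.

```python
import random

SCALE_MAX = 5  # assumed 5-point Likert scale (1 = strongly disagree)

# Hypothetical items: (question id, hypothesized dimension, reverse-worded?)
ITEMS = [
    ("rel1", "reliability", False),
    ("rel2", "reliability", True),   # e.g. "...the information is incorrect"
    ("und1", "understandability", False),
    ("und2", "understandability", True),
]

def shuffled_order(items, seed):
    """Per-participant random question order, so items are not grouped."""
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)
    return order

def reverse_code(answers, items=ITEMS, scale_max=SCALE_MAX):
    """Flip reverse-worded items so a higher score always means more of the dimension."""
    coded = {}
    for qid, _dim, reverse in items:
        score = answers[qid]
        coded[qid] = scale_max + 1 - score if reverse else score
    return coded

def is_self_consistent(answers, items=ITEMS, max_gap=2):
    """After reverse-coding, answers within one dimension should roughly agree.

    max_gap is an assumed threshold, not a value from the slides.
    """
    coded = reverse_code(answers, items)
    by_dim = {}
    for qid, dim, _reverse in items:
        by_dim.setdefault(dim, []).append(coded[qid])
    return all(max(v) - min(v) <= max_gap for v in by_dim.values())

careful = {"rel1": 5, "rel2": 1, "und1": 4, "und2": 2}
straight_liner = {"rel1": 5, "rel2": 5, "und1": 5, "und2": 5}
```

A participant who gives the same answer to every question, regardless of polarity, contradicts themselves once reverse-worded items are flipped, which is what the check detects.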
Structural Equation Modeling (SEM)
• Based on Sewell Wright’s path analysis (1921)
– A factor model is parameterized by factor loadings,
covariances, & residual error terms
• Graphical representation: path diagram
– Observed variables in boxes
– Latent variables in ovals
– Directed edges denote
causal relationships
– Residual error terms
implicitly assumed
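In conventional factor-analysis notation (the symbols are the usual textbook ones, not taken from the slides), the loadings, covariances, and residual terms named above fit together roughly as follows, assuming factors and residuals are uncorrelated with each other:

```latex
% x: observed answers to the p questions; \xi: the m latent factors
\begin{align}
  x &= \Lambda \xi + \varepsilon,
  & \operatorname{Cov}(\xi) &= \Phi,
  & \operatorname{Cov}(\varepsilon) &= \Theta \\
  % model-implied covariance of the observed answers
  \Sigma(\Lambda, \Phi, \Theta) &= \Lambda \Phi \Lambda^{\top} + \Theta
\end{align}
```

Here $\Lambda$ holds the factor loadings, $\Phi$ the factor covariances, and $\Theta$ the residual error variances. Estimation chooses these parameters so that $\Sigma$ is as close as possible to the sample covariance of the answers.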
Exploratory Factor Analysis (EFA) – 1 of 2
• Is the sample large enough for EFA?
– Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy
– Bartlett’s Test of Sphericity
• Principal Axis Factoring (PAF) to find eigenvalues
– Assume some large, constant # of latent factors
– Assume each factor has a connecting edge to each question
– Estimate factor model parameters by least-squares (ML)
• Promax (oblique) rotation to maximize correlations
• Prune factors via Parallel Analysis
– Create random data with same # factors & questions
– Create correlation matrix and find eigenvalues
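Parallel analysis can be sketched in dependency-free Python. This is an illustration, not the authors' R implementation. It assumes the random data match the real data in number of participants and questions, uses 20 simulations with a 95th-percentile threshold, and finds eigenvalues with a small Jacobi routine; all function names and defaults here are hypothetical.

```python
import math
import random

def correlation_matrix(rows):
    """Pearson correlation matrix of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(p)] for i in range(p)]
    sd = [math.sqrt(cov[j][j]) for j in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

def eigenvalues_symmetric(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                diff = a[i][i] - a[j][j]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[i][j] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)

def parallel_analysis(data, n_sims=20, percentile=0.95, seed=0):
    """Count leading factors whose eigenvalue beats random data of the same shape."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = eigenvalues_symmetric(correlation_matrix(data))
    sims = []
    for _ in range(n_sims):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        sims.append(eigenvalues_symmetric(correlation_matrix(noise)))
    keep = 0
    for k in range(p):
        kth = sorted(sim[k] for sim in sims)
        threshold = kth[min(n_sims - 1, int(percentile * n_sims))]
        if observed[k] <= threshold:
            break
        keep += 1
    return keep, observed
```

Factors are retained only while the observed eigenvalue exceeds what pure noise of the same size produces, which is the comparison a scree plot shows visually.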
Exploratory Factor Analysis (EFA) – 2 of 2
• Perform Parallel Analysis
– Create random data w/ same # of factors & questions
– Create correlation matrix and find eigenvalues
• Create Scree Plot of Eigenvalues
• Re-run EFA for reduced factors
• Compute Pearson correlations
• Discard questions with:
– Weak factor loading
– Strong cross-factor loading
– Lack of logical interpretation
• Kenny’s Rule: need >= 2 questions per factor for EFA
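The two mechanical pruning criteria (weak loading, strong cross-loading) and the two-questions-per-factor rule can be sketched in Python. The cut-offs of 0.5 and 0.3 are assumed for illustration and are not taken from the slides; the loading values are made up. The third criterion, lack of logical interpretation, needs human judgment and is not modeled.

```python
def prune_questions(loadings, min_primary=0.5, max_cross=0.3):
    """Split questions into kept (mapped to their factor index) and dropped (with reason).

    loadings maps a question id to its loadings on each factor.
    Thresholds are assumed values for illustration.
    """
    kept, dropped = {}, {}
    for question, row in loadings.items():
        magnitudes = sorted((abs(x) for x in row), reverse=True)
        primary = magnitudes[0]
        cross = magnitudes[1] if len(magnitudes) > 1 else 0.0
        if primary < min_primary:
            dropped[question] = "weak loading"
        elif cross > max_cross:
            dropped[question] = "cross-loading"
        else:
            kept[question] = max(range(len(row)), key=lambda k: abs(row[k]))
    return kept, dropped

def under_identified_factors(kept, n_factors, min_items=2):
    """Factors left with fewer than min_items questions after pruning."""
    counts = [0] * n_factors
    for factor in kept.values():
        counts[factor] += 1
    return [f for f, c in enumerate(counts) if c < min_items]

# Made-up loadings on two factors, for illustration only.
example = {
    "q1": [0.82, 0.05],
    "q2": [0.31, 0.12],   # weak on every factor
    "q3": [0.55, 0.48],   # loads on both factors
    "q4": [0.02, 0.77],
    "q5": [0.10, -0.69],
}
kept, dropped = prune_questions(example)
short = under_identified_factors(kept, n_factors=2)
```

In this toy case pruning leaves the first factor with a single question, so it would fall short of the two-question minimum and need more questions or a different factor structure.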
Question-Factor Loadings (Weights)
CFA: Assess and Compare Models
• First-order baseline model uses a single
latent factor to explain observed data
• Posited hierarchical factor model
uses 5 relevance dimensions
Confirmatory Factor Analysis (CFA)
• Null model assumes observations are independent
– Covariances between questions fixed at 0; all means and
variances left free
• Comparison stats
– Non-Normed Fit Index (NNFI)
– Comparative Fit Index (CFI)
– Root Mean Square Error of Approximation (RMSEA)
– Standardized Root Mean Square Residual (SRMR)
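Three of these comparison statistics can be computed from the chi-square values of the fitted model and the null model. The Python sketch below uses the common textbook definitions; SEM packages differ in small details (for example N versus N - 1 in RMSEA), and SRMR is omitted because it needs the full residual matrix. The function name and the example numbers are illustrative, not results from the study.

```python
import math

def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """CFI, NNFI (also called TLI), and RMSEA from model and null-model chi-square.

    n is the number of participants. Uses N - 1 in RMSEA, one common convention.
    """
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    cfi = 1.0 - d_model / d_null if d_null > 0 else 1.0
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    nnfi = (ratio_null - ratio_model) / (ratio_null - 1.0)
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return {"CFI": cfi, "NNFI": nnfi, "RMSEA": rmsea}

# Made-up chi-square values, for illustration only.
stats = fit_indices(chi2_model=120.0, df_model=80, chi2_null=2080.0, df_null=105, n=501)
```

Computing the same indices for the single-factor baseline and for the five-dimension model gives a like-for-like comparison: higher CFI and NNFI and lower RMSEA indicate the better-fitting model.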
Our model of multi-dimensional relevance
Future Directions
• More data-driven positivist research into factors
– Different user groups, search scenarios, devices, etc.
– Need more data to support normative claims
• Train/test operational systems for varying factors
– Identify/extend detected features for each dimension
– Personalize search results for individual preferences
• Improve judging agreement by making task more
natural and/or assessing impact of latent factors?
• Intra-subject vs. inter-subject aggregation?
– Other methods for ensuring subjective data quality?
Thank You!
ir.ischool.utexas.edu
Slides: www.slideshare.net/mattlease

Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SIGIR 2014 Presentation

Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 

Dernier (20)

Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 

Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SIGIR 2014 Presentation

  • 1. Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing
      Matt Lease, School of Information, University of Texas at Austin (@mattlease, ml@utexas.edu)
      Joint work with Yinglong Zhang, Jin Zhang, and Jacek Gwizdka
      Slides: www.slideshare.net/mattlease
  • 2. Saracevic’s ‘97 Salton Award address
      “…the human-centered side was often highly critical of the systems side for ignoring users... [when] results have implications for systems design & practice. Unfortunately… beyond suggestions, concrete design solutions were not delivered.
      “…the systems side by and large ignores the user side and user studies… the stance is ‘tell us what to do and we will.’ But nobody is telling...
      “Thus, there are not many interactions…”
  • 3. Primary Research Question
      • What is relevance?
        – What factors constitute it? Can we quantify their relative importance? How do they interact?
      • Old IR question, many studies, little agreement
      • Potential impacts?
        – Further understanding of cognitive relevance
        – Guide IR engineering toward inferring key factors
        – Foster multi-dimensional evaluation of IR systems
  • 4. Secondary Research Question
      • How can we measure/ensure the quality of subjective relevance judgments?
        – How can we distinguish valid subjectivity from human error in judging disagreements (traditional or online)?
      • Potential impacts
        – Help explain/reduce judging disagreements
        – Enable evaluation with respect to a distribution of opinions
        – Encourage other subjective data collection in HCOMP
  • 5. Psychology to the Rescue!
      • A Guide to Behavioral Experiments on Mechanical Turk
        – W. Mason and S. Suri (2010). SSRN online.
      • Crowdsourcing for Human Subjects Research
        – L. Schmidt (CrowdConf 2010)
      • Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk
        – Conley & Tosti-Kharas (2010). Academy of Management
      • Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?
        – M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
        – See also: Amazon Mechanical Turk Guide for Social Scientists
  • 7. Contributions
      • Describe a simple, reliable, scalable method for collecting diverse (subjective), multi-dimensional relevance judgments from online participants
        – Online survey techniques from psychometrics
        – Data available online
      • Describe a rigorous, positivist, data-driven framework for inferring & modeling multi-dimensional relevance
        – Structural equation modeling (SEM) from psychometrics
        – Run the experiment & let the data speak for itself!
        – Implemented in standard R libraries available online
  • 8. An example model of multi-dimensional relevance
  • 9. Experimental Design
      • Define some search tasks
      • Pick some documents to be judged
      • Hypothesize some relevance dimensions
      • Ask participants to answer some questions
      • Analyze data via Structural Equation Modeling (SEM)
        – Use Exploratory Factor Analysis (EFA) to assess question-factor relationships, then prune “bad” questions
        – Use Confirmatory Factor Analysis (CFA) to assess correlations, test significance, & compare models
        – Cousin to graphical models in statistics/AI
  • 10. Collecting multi-dimensional relevance judgments
      • Participant picks one of several pre-defined topics
        – You want to plan a one week vacation in China
      • Participant is assigned a Web page to judge
        – We wrote a query for each topic, submitted it to a popular search engine, and did stratified sampling of the results
      • Participant answers a set of Likert-scale questions
        – I think the information in this page is incorrect
        – It’s difficult to understand the information in this page
        – …
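The stratified sampling of search results mentioned on this slide could look roughly like the sketch below. The slides do not say how the strata were defined, so the rank bands, the per-band sample size, and the function name `stratified_sample` are all assumptions made for illustration.

```python
import random

def stratified_sample(ranked_urls, strata, per_stratum, seed=0):
    """Draw pages from several rank bands of one ranked result list.

    ranked_urls: results in rank order (best first).
    strata: list of (start, end) index pairs, end exclusive (assumed bands).
    per_stratum: how many pages to draw from each band (assumed value).
    """
    rng = random.Random(seed)
    sample = []
    for start, end in strata:
        band = ranked_urls[start:end]
        k = min(per_stratum, len(band))  # never ask for more than the band holds
        sample.extend(rng.sample(band, k))
    return sample

# Hypothetical result list and rank bands, for illustration only.
results = [f"doc{i}" for i in range(100)]
picked = stratified_sample(results, [(0, 10), (10, 50), (50, 100)], per_stratum=3)
```

Sampling across rank bands, rather than taking only the top results, is one way to make sure judged pages span a range of likely relevance.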
  • 11. What questions might we ask?
      • What factors do you think impact relevance…
      • We hypothesize the same 5 factors as Xu & Chen ’06
        – Topicality, reliability, novelty, understandability, & scope
        – Choose the same factors to make the revised mechanics & any difference in findings maximally clear
      • Assume factors are incomplete & imperfect
        – Positivist approach: do these factors explain observed data better than other alternatives, such as uni-dimensional relevance or another set of factors?
  • 12. How do we ask the questions?
      • Ask 3+ questions per hypothesized dimension
        – Ask repeated, similar questions, & change polarity
        – Randomize question order (don’t group questions)
        – Over-generate questions to allow for later pruning
        – Exclude participants failing self-consistency checks
      • Usual stuff
        – Use clear, familiar, non-leading wording
        – Balance the Likert response scale
        – Pre-test the survey in-house, then run a pilot study online
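The question-design steps above (polarity changes, randomized order, self-consistency checks) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the 7-point scale, the `max_gap` tolerance, and all function and question names are assumptions.

```python
import random

SCALE_MIN, SCALE_MAX = 1, 7  # assumed 7-point Likert scale

def reverse_code(score, lo=SCALE_MIN, hi=SCALE_MAX):
    """Map a negatively worded item onto the positive direction."""
    return lo + hi - score

def shuffled_order(question_ids, seed):
    """Return a per-participant random question order (input is not mutated)."""
    rng = random.Random(seed)
    order = list(question_ids)
    rng.shuffle(order)
    return order

def is_self_consistent(responses, paired_items, max_gap=2):
    """Check opposite-polarity question pairs for agreement.

    responses: dict mapping question id -> raw score.
    paired_items: (positive_id, negative_id) pairs asking the same thing
    with opposite polarity. A participant passes if, after reverse-coding
    the negative item, every pair differs by at most max_gap points.
    """
    for pos_id, neg_id in paired_items:
        gap = abs(responses[pos_id] - reverse_code(responses[neg_id]))
        if gap > max_gap:
            return False
    return True

# Hypothetical question ids: "rel_pos" = "the information is correct",
# "rel_neg" = "the information is incorrect".
pairs = [("rel_pos", "rel_neg")]
careful = {"rel_pos": 6, "rel_neg": 2}
careless = {"rel_pos": 7, "rel_neg": 7}
```

Here `is_self_consistent(careful, pairs)` passes while `is_self_consistent(careless, pairs)` fails, because agreeing strongly with both a statement and its negation is contradictory.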
  • 13. Structural Equation Modeling (SEM)
      • Based on Sewall Wright’s path analysis (1921)
        – A factor model is parameterized by factor loadings, covariances, & residual error terms
      • Graphical representation: path diagram
        – Observed variables in boxes
        – Latent variables in ovals
        – Directed edges denote causal relationships
        – Residual error terms implicitly assumed
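As a hedged illustration of the parameterization named above, a standard common factor model can be written as below. The notation is conventional SEM notation assumed here rather than taken from the slides: x is the vector of observed question responses, ξ the latent factors, Λ the factor loadings, Φ the factor covariance matrix, δ the residual errors with covariance Θ, and Σ the model-implied covariance of the observed variables.

```latex
\begin{aligned}
x &= \Lambda \xi + \delta, \\
\operatorname{Cov}(\xi) &= \Phi, \qquad
\operatorname{Cov}(\delta) = \Theta, \qquad
\operatorname{Cov}(\xi, \delta) = 0, \\
\Sigma &= \Lambda \Phi \Lambda^{\top} + \Theta .
\end{aligned}
```

Fitting the model amounts to choosing Λ, Φ, and Θ so that Σ is as close as possible to the sample covariance matrix of the responses.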
  • 14. Exploratory Factor Analysis (EFA) – 1 of 2
      • Is the sample large enough for EFA?
        – Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy
        – Bartlett’s Test of Sphericity
      • Principal Axis Factoring (PAF) to find eigenvalues
        – Assume some large, constant # of latent factors
        – Assume each factor has a connecting edge to each question
        – Estimate factor model parameters by least-squares (ML)
      • Promax (oblique) rotation to maximize correlations
      • Prune factors via Parallel Analysis
        – Create random data with same # factors & questions
        – Create correlation matrix and find eigenvalues
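The two adequacy checks named on this slide can be sketched with NumPy alone. This is a minimal illustration using the textbook formulas, not the R implementation the slides refer to; the function names and the synthetic data are assumptions. Bartlett's statistic is returned with its degrees of freedom and would then be compared against a chi-square distribution.

```python
import numpy as np

def bartlett_sphericity(data):
    """Bartlett's test statistic and degrees of freedom.

    data: (n_participants, n_questions) array of responses.
    A large statistic relative to a chi-square with `dof` degrees of
    freedom suggests the questions are correlated enough to factor.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log-determinant, numerically stable
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo_overall(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (0 to 1)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)

# Synthetic responses: six questions driven by one shared factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
factor_data = factor + 0.5 * rng.normal(size=(500, 6))
independent_data = rng.normal(size=(500, 6))
```

On data like `factor_data` the KMO value is close to 1 and Bartlett's statistic is very large, while on `independent_data` the statistic stays near its degrees of freedom.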
  • 15. Exploratory Factor Analysis (EFA) – 2 of 2
      • Perform Parallel Analysis
        – Create random data w/ same # of factors & questions
        – Create correlation matrix and find eigenvalues
      • Create Scree Plot of Eigenvalues
      • Re-run EFA for reduced factors
      • Compute Pearson correlations
      • Discard questions with:
        – Weak factor loading
        – Strong cross-factor loading
        – Lack of logical interpretation
      • Kenny’s Rule: need >= 2 questions per factor for EFA
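The parallel analysis and question-pruning steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses eigenvalues of the full correlation matrix rather than a principal-axis solution, a 95th-percentile threshold, and loading cut-offs of 0.4 (primary) and 0.3 (cross-factor), all of which are hypothetical choices.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Count leading factors whose eigenvalues beat those of random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        # Random data with the same number of participants and questions.
        random_eigs[i] = sorted_eigenvalues(rng.normal(size=(n, p)))
    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Keep leading factors up to the first one that fails the comparison.
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, threshold

def prune_questions(loadings, min_primary=0.4, max_cross=0.3):
    """Return row indices of questions to keep, given a loading matrix."""
    kept = []
    for i, row in enumerate(np.abs(np.asarray(loadings, dtype=float))):
        ranked = np.sort(row)[::-1]
        primary = ranked[0]
        cross = ranked[1] if ranked.size > 1 else 0.0
        if primary >= min_primary and cross <= max_cross:
            kept.append(i)
    return kept

# Synthetic responses: questions 0-2 load on one factor, 3-5 on another.
rng = np.random.default_rng(1)
latent = rng.normal(size=(400, 2))
pattern = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
responses = latent @ pattern.T + 0.5 * rng.normal(size=(400, 6))
n_factors, observed, threshold = parallel_analysis(responses, n_sims=50)
```

Plotting `observed` against `threshold` gives the scree plot mentioned above; the "lack of logical interpretation" criterion still needs human judgment and is not captured by `prune_questions`.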
  • 16. Question-Factor Loadings (Weights)
  • 17. CFA: Assess and Compare Models
      • First-order baseline model uses a single latent factor to explain observed data
      • Posited hierarchical factor model uses 5 relevance dimensions
  • 18. Confirmatory Factor Analysis (CFA)
      • Null model assumes observations are independent
        – Covariance between questions fixed at 0, with all means and variances left free
      • Comparison stats
        – Non-Normed Fit Index (NNFI)
        – Comparative Fit Index (CFI)
        – Root Mean Square Error of Approximation (RMSEA)
        – Standardized Root Mean Square Residual (SRMR)
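Three of the comparison statistics above can be computed directly from the chi-square values of the fitted model and the null model. The sketch below uses the commonly cited formulas (RMSEA with N - 1 in the denominator; some software uses N instead) and assumes positive degrees of freedom. The chi-square values in the example are made up for illustration and are not results from this study.

```python
import math

def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index: 1 minus the ratio of model to null misfit."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null

def nnfi(chi2_model, df_model, chi2_null, df_null):
    """Non-Normed Fit Index (also known as the Tucker-Lewis Index)."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)

def rmsea(chi2_model, df_model, n_participants):
    """Root Mean Square Error of Approximation."""
    excess = max(chi2_model - df_model, 0.0)
    return math.sqrt(excess / (df_model * (n_participants - 1)))

# Hypothetical chi-square values, for illustration only.
example = {
    "CFI": cfi(150.0, 100, 2100.0, 120),
    "NNFI": nnfi(150.0, 100, 2100.0, 120),
    "RMSEA": rmsea(150.0, 100, 501),
}
```

Higher CFI and NNFI and lower RMSEA indicate better fit, which is how the single-factor baseline and the 5-dimension model can be compared on the same data. SRMR is omitted here because it needs the observed and model-implied correlation matrices rather than chi-square values.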
  • 19. Our model of multi-dimensional relevance
  • 20. Future Directions
      • More data-driven positivist research into factors
        – Different user groups, search scenarios, devices, etc.
        – Need more data to support normative claims
      • Train/test operational systems for varying factors
        – Identify/extend detected features for each dimension
        – Personalize search results for individual preferences
      • Improve judging agreement by making task more natural and/or assessing impact of latent factors?
      • Intra-subject vs. inter-subject aggregation?
        – Other methods for ensuring subjective data quality?