The Search for Truth in
Objective & Subjective Crowdsourcing
Matt Lease
School of Information
University of Texas at Austin
ir.ischool.utexas.edu
@mattlease
ml@utexas.edu
Roadmap
• Two quick items
– What’s an iSchool & why pursue graduate study there?
– MTurk: anonymity & human subjects research
• Finding Consensus for Objective Tasks
• Subjective Relevance & Psychometrics
Matt Lease <ml@utexas.edu>
“The place where people & technology meet”
~ Wobbrock et al., 2009
www.ischools.org
FYI: MTurk & Human Subjects Research
• “What are the characteristics of MTurk workers?... the MTurk
system is set up to strictly protect workers’ anonymity….”
An MTurk worker’s ID is also their customer ID on Amazon.
Public profile pages can link worker ID to name.
Lease et al., SSRN’13
Finding Consensus in Human Computation
• For an objective labeling task, how do we resolve
disagreement between respondents?
– e.g., majority voting, weighted voting
– Contrast cases: subjective, polling, & ideation
• Research pre-dates crowdsourcing (e.g. experts)
– Dawid and Skene’79, Smyth et al., ’95
• One of the most studied problems in HCOMP
– Quality control of crowd labeling via plurality
– Methods in many areas: ML, Vision, NLP, IR, DB, …
– With all the time & $$$ invested, what have we learned?
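As a minimal sketch of the two baselines named above (majority voting and weighted voting), assuming each judgment arrives as a (worker, label) pair. The function names, worker IDs, and weights are hypothetical illustrations, not taken from the talk:

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(votes, weights, default_weight=1.0):
    """votes: list of (worker_id, label); weights: dict worker_id -> weight."""
    totals = defaultdict(float)
    for worker, label in votes:
        totals[label] += weights.get(worker, default_weight)
    return max(totals, key=totals.get)


# Hypothetical judgments for one item, with made-up worker accuracy estimates.
votes = [("w1", "relevant"), ("w2", "not_relevant"), ("w3", "not_relevant")]
weights = {"w1": 0.95, "w2": 0.40, "w3": 0.45}

majority = majority_vote([label for _, label in votes])
weighted = weighted_vote(votes, weights)
```

With these made-up weights the two rules disagree: the unweighted majority follows the two low-weight workers, while the weighted vote follows the single high-weight worker.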
Value of Benchmarking
• “If you cannot measure it, you cannot improve it.”
• Drive field innovation by clear challenge tasks
– e.g., David Tse’s FIST 2012 Keynote (Comp. Biology)
• Tackling important questions
– What is the current state-of-the-art?
– How do current methods compare?
– What works, what doesn’t, and why?
– How has field progressed over time?
SQUARE: A Benchmark for Research on Computing Crowd Consensus
@HCOMP’13
ir.ischool.utexas.edu/square (open source)
“Real” Crowdsourcing Datasets
How does the crowd behave?
Methods
Includes popular and/or open-source methods
• Task / Model / Supervision / Estimation & sparsity
• Task-independent
– Majority Voting
– ZenCrowd (Demartini et al., 2012), EM-based
– GLAD (Whitehill et al., 2009)
• Classification-specific (confusion matrices)
– Snow et al., 2008, Naïve Bayes
– Dawid & Skene (1979), EM-based
– Raykar et al. (2012)
– CUBAM (Welinder et al., 2010)
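To make the "confusion matrices, EM-based" family above concrete, here is a rough sketch in the spirit of Dawid & Skene (1979). It is not the SQUARE implementation; the function name, the smoothing constant, the fixed iteration count, and the toy data are all assumptions chosen for illustration:

```python
from collections import defaultdict


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """labels: dict item -> list of (worker, observed_class).
    Returns dict item -> list of posterior probabilities over true classes."""
    # Initialise posteriors with each item's vote proportions (soft majority vote).
    post = {}
    for item, votes in labels.items():
        counts = [0.0] * n_classes
        for _, obs in votes:
            counts[obs] += 1.0
        total = sum(counts)
        post[item] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per worker,
        # where conf[worker][true_class][observed_class] is a probability.
        prior = [sum(p[k] for p in post.values()) / len(post)
                 for k in range(n_classes)]
        conf = defaultdict(
            lambda: [[smoothing] * n_classes for _ in range(n_classes)])
        for item, votes in labels.items():
            for worker, obs in votes:
                for k in range(n_classes):
                    conf[worker][k][obs] += post[item][k]
        for matrix in conf.values():
            for row in matrix:
                row_sum = sum(row)
                for obs in range(n_classes):
                    row[obs] /= row_sum

        # E-step: posterior over each item's true class given current parameters.
        for item, votes in labels.items():
            scores = list(prior)
            for worker, obs in votes:
                for k in range(n_classes):
                    scores[k] *= conf[worker][k][obs]
            norm = sum(scores)
            post[item] = [s / norm for s in scores]
    return post


# Toy data: w1 and w2 always agree; w3 agrees with them only half the time.
labels = {
    "a": [("w1", 1), ("w2", 1), ("w3", 0)],
    "b": [("w1", 0), ("w2", 0), ("w3", 1)],
    "c": [("w1", 1), ("w2", 1), ("w3", 1)],
    "d": [("w1", 0), ("w2", 0), ("w3", 0)],
}
posteriors = dawid_skene(labels, n_classes=2)
consensus = {item: max(range(2), key=p.__getitem__)
             for item, p in posteriors.items()}
```

Multiplying raw probabilities can underflow when an item has many judgments; a fuller version would work in log space and stop on convergence rather than after a fixed number of iterations.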
Results: Unsupervised Accuracy
Relative effectiveness vs. majority voting
[Figure: accuracy relative to majority voting (y-axis: -15% to +15%) on each dataset (BM, HCB, SpamCF, WVSCM, WB, RTE, TEMP, WSD, AC2, HC, ALL) for methods DS, ZC, RY, GLAD, and CUBAM]
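A small sketch of how a "relative to majority voting" score could be computed, assuming it means the relative change in accuracy against the majority-voting baseline (the slide does not spell out the exact formula). The labels below are made up for illustration:

```python
def accuracy(predicted, gold):
    """Fraction of gold-labeled items whose predicted label matches gold."""
    shared = [item for item in gold if item in predicted]
    return sum(predicted[item] == gold[item] for item in shared) / len(shared)


def relative_change(method_acc, baseline_acc):
    """Relative improvement (positive) or loss (negative) vs. the baseline."""
    return (method_acc - baseline_acc) / baseline_acc


# Hypothetical gold labels and consensus outputs for five items.
gold = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 1}
mv_output = {"a": 1, "b": 0, "c": 0, "d": 0, "e": 0}      # 3 of 5 correct
method_output = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 0}  # 4 of 5 correct

rel = relative_change(accuracy(method_output, gold), accuracy(mv_output, gold))
```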
Results: Varying Supervision
Findings
• Majority voting never best, but rarely much worse
• No method performs far better than others
• Each method often best for some condition
– e.g., original dataset method was designed for
• DS & RY tend to perform best (RY adds priors)
– ZC (also EM-based) does well with injected noise
Provocative: So Where’s the Progress?
• Sure, progress is not only empirical, but…
• Maybe gold is too noisy to detect improvement?
– Cormack & Kolcz’09, Klebanov & Beigman’10
• Might we see bigger differences from
– Different tasks/scenarios? Larger data scales?
– Better methods or tuning? Better benchmark tests?
– Spammer detection and filtering?
• We invite community contributions!
Multidimensional Relevance Modeling
via Psychometrics and Crowdsourcing
Joint work with
Yinglong Zhang, Jin Zhang, Jacek Gwizdka
Paper @ SIGIR 2014
How to Evaluate a Search Engine?
• 3 complementary approaches (with tradeoffs)
– Log analysis (“big data”): e.g., infer relevance from clicks
– User study: users perform controlled search task(s)
– Annotate: 1) create a set of queries, 2) label document
relevance to each, & 3) measure algorithmic effectiveness
• Cranfield (Cleverdon et al., 1966), simplified topical relevance
• Examples from Google
– Video: How Google makes improvements to its search
– Video: How does Google use human raters in web search?
– Search Quality Rating Guidelines (November 2, 2012)
Saracevic’s 1997 Salton Award address
“…the human-centered side was often highly critical
of the systems side for ignoring users... [when]
results have implications for systems design &
practice. Unfortunately… beyond suggestions,
concrete design solutions were not delivered.
“…the systems side by and large ignores the user
side and user studies… the stance is ‘tell us what
to do and we will.’ But nobody is telling...
“Thus, there are not many interactions…”
RQs: Information Retrieval
• What is relevance?
– What factors constitute it? Can we quantify their
relative importance? How do they interact?
• Old question, many studies, little agreement
• Significance
– Increase fundamental understanding of relevance
– Foster multi-dimensional evaluation of IR systems
– Bridge human & system-centered relevance modeling
• Create multi-dimensional judgment data for training & eval
• Motivate research to automatically infer underlying factors
RQs: Crowdsourcing Subjective Tasks
• How can we measure/ensure the quality of
subjective judgments (especially online)?
– Traditional, trusted personnel often disagree in
judging even simplified topical relevance
– How to distinguish valid subjectivity vs. human error?
• Significance
– Promote systematic study of quality assurance for
subjective tasks in HCOMP community
– Help explain/reduce observed labeling disagreements
Why Eytan Adar hates MTurk Research
(CHI 2011 CHC Workshop)
• Missing/ignoring prior work in other disciplines
– It turns out other fields have thought (a lot) about
a number of problems that show up in HCOMP!
• And other stuff (fun read…)
Social Sciences have been…
• …collecting reliable, subjective data from online
participants before “crowdsourcing” was coined
• …inferring latent factors and relationships from
noisy, observed data using powerful modeling
techniques that are positivist and data-driven
• …using MTurk to reproduce many traditional
behavioral studies with university students
Maybe we can learn something from them?
Psychology to the Rescue!
• A Guide to Behavioral Experiments
on Mechanical Turk
– W. Mason and S. Suri (2010). SSRN online.
• Crowdsourcing for Human Subjects Research
– L. Schmidt (CrowdConf 2010)
• Crowdsourcing Content Analysis for Behavioral Research:
Insights from Mechanical Turk
– Conley & Tosti-Kharas (2010). Academy of Management
• Amazon's Mechanical Turk: A New Source of
Inexpensive, Yet High-Quality, Data?
– M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
– see also: Amazon Mechanical Turk Guide for Social Scientists
Key Ideas from Psychometrics
• Use established survey techniques to collect
subjective relevance judgments
– Ask repeated, similar questions, & change polarity
• Analyze via Structural Equation Modeling (SEM)
– Cousin to graphical models in statistics/AI
– Posit questions associated with latent factors
– Use Exploratory Factor Analysis (EFA) to assess
question-factor relationships & prune “bad” questions
– Use Confirmatory Factor Analysis (CFA) to assess
correlations, test significance, & compare models
Collecting multi-dimensional relevance judgments
• Participant picks one of several pre-defined topics
– You want to plan a one week vacation in China
• Participant assigned a Web page to judge
– We wrote a query for each topic, submitted to a popular
search engine, and did stratified sampling of results
• Participant answers a set of Likert-scale questions
– I think the information in this page is incorrect
– It’s difficult to understand the information in this page
How do we ask the questions?
• Ask 3+ questions per hypothesized dimension
– Ask repeated, similar questions, & change polarity
– Randomize question order (don’t group questions)
– Over-generate questions to allow for later pruning
– Exclude participants failing self-consistency checks
• Survey design principles: tailor, engage, QA
– Use clear, familiar, non-leading wording
– Balance response scale and question polarity
– Pre-test survey in-house, then pilot study online
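The polarity and self-consistency ideas above can be sketched as follows. This assumes a 5-point Likert scale and a simple "maximum allowed gap" rule for paired questions; the scale size, threshold, and question IDs are hypothetical and are not the study's actual settings:

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point Likert scale


def reverse_code(score):
    """Map a negatively worded answer onto the positive direction (1<->5, 2<->4)."""
    return SCALE_MAX + SCALE_MIN - score


def is_self_consistent(answers, paired_questions, max_gap=2):
    """answers: dict question_id -> raw score.
    paired_questions: list of (positive_id, negative_id) pairs that ask the
    same thing with opposite polarity. max_gap is an assumed tolerance."""
    for pos_id, neg_id in paired_questions:
        if abs(answers[pos_id] - reverse_code(answers[neg_id])) > max_gap:
            return False
    return True


# Hypothetical pair: "the information is correct" vs. "the information is incorrect".
pairs = [("reliable_pos", "reliable_neg")]
careful = {"reliable_pos": 5, "reliable_neg": 1}   # agrees, then disagrees
careless = {"reliable_pos": 5, "reliable_neg": 5}  # agrees with both statements

keep_careful = is_self_consistent(careful, pairs)
keep_careless = is_self_consistent(careless, pairs)
```

A participant who strongly agrees with both a statement and its negation is flagged, without needing gold labels or a second judge.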
What Questions might we ask?
• What factors might determine relevance?
• We adopt same 5 factors from (Xu & Chen, 2006)
– Topicality, reliability, novelty, understandability, & scope
– Use the same factors so that the revised mechanics & any
differences in findings are maximally clear
• Assume factors are incomplete & imperfect
– Positivist approach: do these factors explain
observed data better than other alternatives:
uni-dimensional relevance or another set of factors?
Structural Equation Modeling (SEM)
• Based on Sewell Wright’s path analysis (1921)
– A factor model is parameterized by factor loadings,
covariances, & residual error terms
• Graphical representation: path diagram
– Observed variables in boxes
– Latent variables in ovals
– Directed edges denote
causal relationships
– Residual error terms
implicitly assumed
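For readers who prefer equations to path diagrams, the factor model parameterization described above is commonly written as below. The notation is the standard textbook form and is an assumption here, since the slides do not give symbols; it also assumes residual errors are uncorrelated with the latent factors:

```latex
% Assumed notation (not from the slides):
%   x              observed question responses
%   \xi            latent factors
%   \Lambda        factor loadings
%   \Phi           covariances among latent factors
%   \Theta_\delta  covariances of the residual error terms \delta
\begin{align}
  x &= \Lambda \xi + \delta, \\
  \Sigma = \operatorname{Cov}(x) &= \Lambda \Phi \Lambda^{\top} + \Theta_{\delta}.
\end{align}
```

Fitting the model then amounts to choosing loadings, factor covariances, and residual terms so that the implied covariance matrix is close to the one observed in the survey responses.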
Exploratory Factor Analysis (EFA) – 1 of 2
• Is the sample large enough for EFA?
– Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy
– Bartlett’s Test of Sphericity
• Principal Axis Factoring (PAF) to find eigenvalues
– Assume some large, constant # of latent factors
– Assume each factor has connecting edge to each question
– Estimate factor model parameters by least-squares fit
• Prune factors via Parallel Analysis
– Create random data with same # factors & questions
– Create correlation matrix and find eigenvalues
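As one illustration of the adequacy checks above, here is a minimal pure-Python sketch of the test statistic for Bartlett's Test of Sphericity, using the textbook formula chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix of p questions over n participants. The helper names and toy responses are hypothetical, and the p-value lookup is left out:

```python
import math


def correlation_matrix(rows):
    """rows: one list of numeric answers per participant (one entry per question)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) for b in range(p)]
           for a in range(p)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(p)]
            for a in range(p)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size, det = len(m), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def bartlett_sphericity(rows):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test."""
    n, p = len(rows), len(rows[0])
    det = determinant(correlation_matrix(rows))
    chi_square = -(n - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi_square, p * (p - 1) // 2


# Toy responses: 5 participants x 2 questions (real surveys have many more of both).
responses = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
chi_square, dof = bartlett_sphericity(responses)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom suggests the questions are correlated enough for factor analysis. The sketch does not guard against constant columns or a singular correlation matrix.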
Exploratory Factor Analysis (EFA) – 2 of 2
• Create Scree Plot of Eigenvalues
• Re-run EFA for reduced factors
• Compute Pearson correlations
• Discard questions with:
– Weak factor loading
– Strong cross-factor loading
– Lack of logical interpretation
• Kenny’s Rule: need >= 2 questions per factor for EFA
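The first two pruning criteria above (weak loading, strong cross-loading) lend themselves to a simple filter. The sketch below assumes the loadings are already estimated; the cutoffs of 0.5 and 0.3 and the example loadings are hypothetical, not values reported in the talk. The third criterion, logical interpretability, remains a human judgment:

```python
def prune_questions(loadings, min_primary=0.5, max_cross=0.3):
    """loadings: dict question -> list of loadings, one per factor.
    Returns (kept, dropped): kept maps question -> index of its factor,
    dropped maps question -> reason. Thresholds are assumed cutoffs."""
    kept, dropped = {}, {}
    for question, row in loadings.items():
        magnitudes = sorted((abs(v) for v in row), reverse=True)
        primary = magnitudes[0]
        cross = magnitudes[1] if len(magnitudes) > 1 else 0.0
        if primary < min_primary:
            dropped[question] = "weak factor loading"
        elif cross > max_cross:
            dropped[question] = "strong cross-factor loading"
        else:
            kept[question] = max(range(len(row)), key=lambda f: abs(row[f]))
    return kept, dropped


# Made-up loadings of four questions on two factors.
example_loadings = {
    "q1": [0.82, 0.05],
    "q2": [0.31, 0.22],
    "q3": [0.61, 0.55],
    "q4": [0.10, 0.74],
}
kept, dropped = prune_questions(example_loadings)
```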
Question-Factor Loadings (Weights)
CFA: Assess and Compare Models
• First-order baseline model uses a single
latent factor to explain observed data
• Posited hierarchical factor model
uses 5 relevance dimensions
Confirmatory Factor Analysis (CFA)
• Null model assumes observations are independent
– Covariances between questions fixed at 0; means &
variances left free
• Comparison stats
– Non-Normed Fit Index (NNFI)
– Comparative Fit Index (CFI)
– Root Mean Square Error of Approximation (RMSEA)
– Standardized Root Mean Square Residual (SRMR)
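Three of the comparison statistics above can be computed directly from chi-square values of the fitted model and the null model. The sketch below uses the commonly cited textbook formulas; the function name and the input numbers are made up for illustration and are not results from the study. SRMR is omitted because it needs the observed and model-implied correlation matrices:

```python
import math


def fit_indices(chi_model, df_model, chi_null, df_null, n_obs):
    """Return CFI, NNFI, and RMSEA from model and null-model chi-square values."""
    excess_model = max(chi_model - df_model, 0.0)
    excess_null = max(chi_null - df_null, 0.0)
    denom = max(excess_model, excess_null)
    cfi = 1.0 - excess_model / denom if denom > 0 else 1.0
    ratio_null = chi_null / df_null
    nnfi = (ratio_null - chi_model / df_model) / (ratio_null - 1.0)
    rmsea = math.sqrt(excess_model / (df_model * (n_obs - 1)))
    return {"CFI": cfi, "NNFI": nnfi, "RMSEA": rmsea}


# Hypothetical values: model chi-square 100 on 50 df, null chi-square 1000 on 66 df,
# 201 participants.
indices = fit_indices(chi_model=100.0, df_model=50, chi_null=1000.0, df_null=66,
                      n_obs=201)
```

Computing the same indices for the single-factor baseline and for the five-dimension model gives a like-for-like way to compare them: higher CFI and NNFI and lower RMSEA indicate better fit.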
Contributions
• Simple, reliable, scalable way to collect diverse (subjective),
multi-dimensional judgments from online participants
– Online survey techniques from psychometrics
– Doesn’t require objective task, gold labels, or N+ judges
– Help distinguish subjectivity vs. error
• Describe a rigorous, positivist, data-driven framework for
inferring & modeling multi-dimensional relevance
– Structural equation modeling (SEM) from psychometrics
– Run the experiment & let the data speak for itself
• Implemented in standard R libraries, data available online
Future Directions
• More data-driven positivist research into factors
– Different user groups, search scenarios, devices, etc.
– Need more data to support normative claims
• Train/test operational systems for varying factors
– Identify/extend detected features for each dimension
– Personalize search results for individual preferences
• Improve agreement by making task more natural
and/or analyzing latent factors when judges disagree
• Intra-subject vs. inter-subject aggregation?
– Other methods for ensuring subjective data quality?
• SEM vs. graphical models?
Thank You!
ir.ischool.utexas.edu
Slides: www.slideshare.net/mattlease

Crowdsourcing & Human Computation Labeling Data & Building Hybrid SystemsMatthew Lease
 
Mechanical Turk is Not Anonymous
Mechanical Turk is Not AnonymousMechanical Turk is Not Anonymous
Mechanical Turk is Not AnonymousMatthew Lease
 
Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Matthew Lease
 
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...Matthew Lease
 

The Search for Truth in Objective & Subjective Crowdsourcing

  • 1. The Search for Truth in Objective & Subjective Crowdsourcing Matt Lease School of Information University of Texas at Austin ir.ischool.utexas.edu @mattlease ml@utexas.edu
  • 2. Roadmap • Two quick items – What’s an iSchool & why pursue graduate study there? – MTurk: anonymity & human subjects research • Finding Consensus for Objective Tasks • Subjective Relevance & Psychometrics
  • 3. “The place where people & technology meet” ~ Wobbrock et al., 2009 www.ischools.org
  • 4. 4
  • 5. FYI: MTurk & Human Subjects Research • “What are the characteristics of MTurk workers?... the MTurk system is set up to strictly protect workers’ anonymity….”
  • 6. An MTurk worker’s ID is also their customer ID on Amazon. Public profile pages can link worker ID to name. Lease et al., SSRN’13
  • 7. Roadmap • Two quick items – What’s an iSchool & why pursue graduate study there? – MTurk: anonymity & human subjects research • Finding Consensus for Objective Tasks • Subjective Relevance & Psychometrics
  • 8. Finding Consensus in Human Computation • For an objective labeling task, how do we resolve disagreement between respondents? – e.g., majority voting, weighted voting – Contrast cases: subjective, polling, & ideation • Research pre-dates crowdsourcing (e.g. experts) – Dawid and Skene’79, Smyth et al., ’95 • One of the most studied problems in HCOMP – Quality control of crowd labeling via plurality – Methods in many areas: ML, Vision, NLP, IR, DB, … – With all the time & $$$ invested, what have we learned?
  • 9. Value of Benchmarking • “If you cannot measure it, you cannot improve it.” • Drive field innovation by clear challenge tasks – e.g., David Tse’s FIST 2012 Keynote (Comp. Biology) • Tackling important questions – What is the current state-of-the-art? – How do current methods compare? – What works, what doesn’t, and why? – How has the field progressed over time?
  • 10. SQUARE: A Benchmark for Research on Computing Crowd Consensus @HCOMP’13 ir.ischool.utexas.edu/square (open source)
  • 13. Methods Includes popular and/or open-source methods • Task / Model / Supervision / Estimation & sparsity • Task-independent – Majority Voting – ZenCrowd (Demartini et al., 2012), EM-based – GLAD (Whitehill et al., 2009) • Classification-specific (confusion matrices) – Snow et al., 2008, Naïve Bayes – Dawid & Skene (1979), EM-based – Raykar et al. (2012) – CUBAM (Welinder et al., 2010)
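To make the contrast between the task-independent baseline and the confusion-matrix methods on this slide concrete, here is a minimal Python sketch of majority voting next to a simplified EM procedure in the spirit of Dawid & Skene (1979). It is an illustration only, not the SQUARE implementation: the function names, the `-1` missing-label convention, the smoothing constant, and the toy data are all assumptions made for this example.

```python
import numpy as np

def majority_vote(labels, n_classes):
    """labels: (items, workers) int array; -1 marks a missing label.
    Ties go to the lowest class index. Assumes every item has >= 1 label."""
    counts = np.stack([(labels == k).sum(axis=1) for k in range(n_classes)], axis=1)
    return counts.argmax(axis=1), counts

def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Simplified EM: estimates one confusion matrix per worker.
    Returns (predicted classes, conf) with conf[w, true, observed]."""
    n_items, n_workers = labels.shape
    _, counts = majority_vote(labels, n_classes)
    # Initialise the posterior over true classes from vote proportions.
    post = counts / counts.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class prior and per-worker confusion matrices.
        prior = post.mean(axis=0)
        conf = np.full((n_workers, n_classes, n_classes), smoothing)
        for w in range(n_workers):
            for k in range(n_classes):
                conf[w, :, k] += post[labels[:, w] == k].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior over each item's true class, in log space.
        log_post = np.tile(np.log(prior + 1e-12), (n_items, 1))
        for w in range(n_workers):
            seen = labels[:, w] >= 0
            log_post[seen] += np.log(conf[w][:, labels[seen, w]]).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return post.argmax(axis=1), conf

# Toy data (hypothetical): three accurate workers and one who always answers 0.
truth = np.array([0, 1, 0, 1, 1, 0, 0, 1])
labels = np.column_stack([truth, truth, truth, np.zeros_like(truth)])
mv_pred, _ = majority_vote(labels, n_classes=2)
ds_pred, conf = dawid_skene(labels, n_classes=2)
```

In this toy case both methods recover `truth`; the difference is that the EM sketch also returns `conf`, which exposes the fourth worker's habit of answering 0 regardless of the true class.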
  • 14. Results: Unsupervised Accuracy • Relative effectiveness vs. majority voting [Bar chart; value axis from -15% to +15%; dataset labels: BM, HCB, SpamCF, WVSCM, WB, RTE, TEMP, WSD, AC2, HC, ALL; method labels: DS, ZC, RY, GLAD, CUBAM]
  • 15. Results: Varying Supervision
  • 16. Findings • Majority voting never best, but rarely much worse • No method performs far better than others • Each method often best for some condition – e.g., original dataset method was designed for • DS & RY tend to perform best (RY adds priors) – ZC (also EM-based) does well with injected noise
  • 17. Provocative: So Where’s the Progress? • Sure, progress is not only empirical, but… • Maybe gold is too noisy to detect improvement? – Cormack & Kolcz’09, Klebanov & Beigman’10 • Might we see bigger differences from – Different tasks/scenarios? Larger data scales? – Better methods or tuning? Better benchmark tests? – Spammer detection and filtering? • We invite community contributions!
  • 18. Roadmap • Two quick items – What’s an iSchool & why pursue graduate study there? – MTurk: anonymity & human subjects research • Finding Consensus for Objective Tasks • Subjective Relevance & Psychometrics
  • 19. Multidimensional Relevance Modeling via Psychometrics and Crowdsourcing • Joint work with Yinglong Zhang, Jin Zhang, & Jacek Gwizdka • Paper @ SIGIR 2014
  • 20. How to Evaluate a Search Engine? • 3 complementary approaches (with tradeoffs) – Log analysis (“big data”): e.g., infer relevance from clicks – User study: users perform controlled search task(s) – Annotate: 1) create a set of queries, 2) label document relevance to each, & 3) measure algorithmic effectiveness • Cranfield (Cleverdon et al., 1966), simplified topical relevance • Examples from Google – Video: How Google makes improvements to its search – Video: How does Google use human raters in web search? – Search Quality Rating Guidelines (November 2, 2012)
  • 21. Saracevic’s 1997 Salton Award address “…the human-centered side was often highly critical of the systems side for ignoring users... [when] results have implications for systems design & practice. Unfortunately… beyond suggestions, concrete design solutions were not delivered. “…the systems side by and large ignores the user side and user studies… the stance is ‘tell us what to do and we will.’ But nobody is telling... “Thus, there are not many interactions…”
  • 22. RQs: Information Retrieval • What is relevance? – What factors constitute it? Can we quantify their relative importance? How do they interact? • Old question, many studies, little agreement • Significance – Increase fundamental understanding of relevance – Foster multi-dimensional evaluation of IR systems – Bridge human & system-centered relevance modeling • Create multi-dimensional judgment data for training & eval • Motivate research to automatically infer underlying factors
  • 23. RQs: Crowdsourcing Subjective Tasks • How can we measure/ensure the quality of subjective judgments (especially online)? – Traditional, trusted personnel often disagree in judging even simplified topical relevance – How to distinguish valid subjectivity vs. human error? • Significance – Promote systematic study of quality assurance for subjective tasks in HCOMP community – Help explain/reduce observed labeling disagreements
  • 24. Why Eytan Adar hates MTurk Research (CHI 2011 CHC Workshop) • Missing/ignoring prior work in other disciplines – It turns out other fields have thought (a lot) about a number of problems that show up in HCOMP! • And other stuff (fun read…)
  • 25. Social Sciences have been… • …collecting reliable, subjective data from online participants before “crowdsourcing” was coined • …inferring latent factors and relationships from noisy, observed data using powerful modeling techniques that are positivist and data-driven • …using MTurk to reproduce many traditional behavioral studies with university students • Maybe we can learn something from them?
  • 26. Psychology to the Rescue! • A Guide to Behavioral Experiments on Mechanical Turk – W. Mason and S. Suri (2010). SSRN online. • Crowdsourcing for Human Subjects Research – L. Schmidt (CrowdConf 2010) • Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk – Conley & Tosti-Kharas (2010). Academy of Management • Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? – M. Buhrmester et al. (2011). Perspectives… 6(1):3-5. – see also: Amazon Mechanical Turk Guide for Social Scientists
  • 27. Key Ideas from Psychometrics • Use established survey techniques to collect subjective relevance judgments – Ask repeated, similar questions, & change polarity • Analyze via Structural Equation Modeling (SEM) – Cousin to graphical models in statistics/AI – Posit questions associated with latent factors – Use Exploratory Factor Analysis (EFA) to assess question-factor relationships & prune “bad” questions – Use Confirmatory Factor Analysis (CFA) to assess correlations, test significance, & compare models
  • 28. Collecting multi-dimensional relevance judgments • Participant picks one of several pre-defined topics – You want to plan a one week vacation in China • Participant assigned a Web page to judge – We wrote a query for each topic, submitted to a popular search engine, and did stratified sampling of results • Participant answers a set of Likert-scale questions – I think the information in this page is incorrect – It’s difficult to understand the information in this page
  • 29. How do we ask the questions? • Ask 3+ questions per hypothesized dimension – Ask repeated, similar questions, & change polarity – Randomize question order (don’t group questions) – Over-generate questions to allow for later pruning – Exclude participants failing self-consistency checks • Survey design principles: tailor, engage, QA – Use clear, familiar, non-leading wording – Balance response scale and question polarity – Pre-test survey in-house, then pilot study online
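One way the "change polarity" and "self-consistency checks" ideas on this slide could be operationalized is sketched below in Python. The slide does not specify the scale or the exclusion rule, so the 5-point scale, the question IDs, and the `max_gap` threshold are all hypothetical choices for illustration.

```python
def reverse_score(answer, scale_min=1, scale_max=5):
    """Map a negatively worded item onto the positive direction (1 <-> 5, 2 <-> 4)."""
    return scale_max + scale_min - answer

def is_self_consistent(responses, polarity_pairs, max_gap=2):
    """responses: dict of question id -> Likert answer.
    polarity_pairs: (positive_id, negative_id) pairs asking the same thing
    with opposite wording. A participant fails if, after reverse-scoring the
    negative item, any pair differs by more than max_gap scale points."""
    for positive_id, negative_id in polarity_pairs:
        gap = abs(responses[positive_id] - reverse_score(responses[negative_id]))
        if gap > max_gap:
            return False
    return True

# Hypothetical pair: "this page is accurate" vs. "this page is incorrect".
pairs = [("accurate", "incorrect")]
careful = {"accurate": 5, "incorrect": 1}      # agrees, then disagrees with the reversed item
inattentive = {"accurate": 5, "incorrect": 5}  # strongly agrees with both wordings
keep_careful = is_self_consistent(careful, pairs)
keep_inattentive = is_self_consistent(inattentive, pairs)
```

A tolerance such as `max_gap` leaves room for honest nuance between similar questions while still catching participants who answer every item the same way.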
  • 30. What Questions might we ask? • What factors might determine relevance? • We adopt the same 5 factors from (Xu & Chen, 2006) – Topicality, reliability, novelty, understandability, & scope – Choose same to make revised mechanics & any difference in findings maximally clear • Assume factors are incomplete & imperfect – Positivist approach: do these factors explain observed data better than other alternatives: uni-dimensional relevance or another set of factors?
  • 31. Structural Equation Modeling (SEM) • Based on Sewall Wright’s path analysis (1921) – A factor model is parameterized by factor loadings, covariances, & residual error terms • Graphical representation: path diagram – Observed variables in boxes – Latent variables in ovals – Directed edges denote causal relationships – Residual error terms implicitly assumed
  • 32. Exploratory Factor Analysis (EFA) – 1 of 2 • Is the sample large enough for EFA? – Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy – Bartlett’s Test of Sphericity • Principal Axis Factoring (PAF) to find eigenvalues – Assume some large, constant # of latent factors – Assume each factor has connecting edge to each question – Estimate factor model parameters by least-squares fit • Prune factors via Parallel Analysis – Create random data with same # factors & questions – Create correlation matrix and find eigenvalues
  • 33. Exploratory Factor Analysis (EFA) – 2 of 2 • Perform Parallel Analysis – Create random data w/ same # of factors & questions – Create correlation matrix and find eigenvalues • Create Scree Plot of Eigenvalues • Re-run EFA for reduced factors • Compute Pearson correlations • Discard questions with: – Weak factor loading – Strong cross-factor loading – Lack of logical interpretation • Kenny’s Rule: need >= 2 questions per factor for EFA
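The parallel-analysis step on these two EFA slides can be sketched in Python as follows. This is a simplified illustration rather than the procedure used in the paper: it compares eigenvalues of the plain correlation matrix (not a Principal Axis Factoring solution) against random normal data with the same number of respondents and questions, and the simulation count, percentile cutoff, and synthetic survey data are assumptions.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """data: (respondents, questions) array of survey answers.
    Keeps the leading factors whose observed eigenvalue exceeds the chosen
    percentile of eigenvalues obtained from random data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))
        simulated[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Count leading eigenvalues above threshold; stop at the first one below it.
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, threshold

# Synthetic example (hypothetical): 6 questions driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.standard_normal((500, 2))
loadings = np.zeros((6, 2))
loadings[:3, 0] = 0.9   # questions 1-3 load on factor 1
loadings[3:, 1] = 0.9   # questions 4-6 load on factor 2
answers = latent @ loadings.T + 0.45 * rng.standard_normal((500, 6))
n_factors, observed, threshold = parallel_analysis(answers, n_sims=100)
```

Plotting `observed` against `threshold` gives the scree plot the slide refers to: factors are retained up to the point where the observed curve drops below the random-data curve.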
  • 34. Question-Factor Loadings (Weights)
  • 35. CFA: Assess and Compare Models • First-order baseline model uses a single latent factor to explain observed data • Posited hierarchical factor model uses 5 relevance dimensions
  • 36. Confirmatory Factor Analysis (CFA) • Null model assumes observations independent – Covariance between questions fixed at 0, means & variances left free • Comparison stats – Non-Normed Fit Index (NNFI) – Comparative Fit Index (CFI) – Root Mean Square Error of Approximation (RMSEA) – Standardized Root Mean Square Residual (SRMR)
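Three of the comparison statistics on this slide can be computed directly from the chi-square statistics of the fitted model and the null model. The Python sketch below uses the common textbook definitions; note that some software divides by N rather than N - 1 in RMSEA, and SRMR is omitted because it needs the observed and model-implied covariance matrices. The input numbers are made up for illustration and are not results from the paper.

```python
import math

def fit_indices(chi2_model, df_model, chi2_null, df_null, n_obs):
    """Common definitions of RMSEA, CFI, and NNFI (also known as TLI).
    chi2_null / df_null come from the independence (null) model."""
    excess_model = max(chi2_model - df_model, 0.0)
    excess_null = max(chi2_null - df_null, 0.0)
    # RMSEA: misfit per degree of freedom, scaled by sample size.
    rmsea = math.sqrt(excess_model / (df_model * (n_obs - 1)))
    # CFI: improvement of the model over the null model, bounded to [0, 1].
    denom = max(excess_model, excess_null)
    cfi = 1.0 if denom == 0 else 1.0 - excess_model / denom
    # NNFI: compares chi-square/df ratios; can fall outside [0, 1].
    null_ratio = chi2_null / df_null
    nnfi = (null_ratio - chi2_model / df_model) / (null_ratio - 1.0)
    return {"RMSEA": rmsea, "CFI": cfi, "NNFI": nnfi}

# Hypothetical values for a well-fitting model.
example = fit_indices(chi2_model=85.0, df_model=80, chi2_null=1200.0, df_null=105, n_obs=300)
```

Lower RMSEA and higher CFI/NNFI indicate better fit, which is how a single-factor baseline and a five-dimension model can be compared on the same data.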
  • 37. Contributions • Simple, reliable, scalable way to collect diverse (subjective), multi-dimensional judgments from online participants – Online survey techniques from psychometrics – Doesn’t require objective task, gold labels, or N+ judges – Help distinguish subjectivity vs. error • Describe a rigorous, positivist, data-driven framework for inferring & modeling multi-dimensional relevance – Structural equation modeling (SEM) from psychometrics – Run the experiment & let the data speak for itself • Implemented in standard R libraries, data available online
  • 38. Future Directions • More data-driven positivist research into factors – Different user groups, search scenarios, devices, etc. – Need more data to support normative claims • Train/test operational systems for varying factors – Identify/extend detected features for each dimension – Personalize search results for individual preferences • Improve agreement by making task more natural and/or analyzing latent factors if disagreement • Intra-subject vs. inter-subject aggregation? – Other methods for ensuring subjective data quality? • SEM vs. graphical models?