SlideShare a Scribd company logo
1 of 57
Conducting Behavioral Research on Amazon’s Mechanical Turk Winter Mason Siddharth Suri Yahoo! Research, New York
Outline Why Mechanical Turk? Mechanical Turk Basics Internal HITs Preference Elicitation Surveys External HITs Random Assignment Synchronous Experiments Conclusion
What is Mechanical Turk? Crowdsourcing Jobs outsourced to a group through an open call (Howe ‘06) Online Labor Market Place for requesters to post jobs and workers to do them for pay This Talk How can we use MTurk for behavioral research? What kinds of behavioral research can we use MTurk for?
Why Mechanical Turk? Subject pool size Central place for > 100,000 workers (Pontin ‘07) Always-available subject pool Subject pool diversity Open to anyone globally with a computer, internet connection Low cost Reservation Wage:  $1.38/hour (Chilton et al ‘10) Effective Wage:       $4.80/hour (Ipeirotis, ’10) Faster theory/experiment cycle Hypothesis formulation Testing & evaluation of hypothesis New hypothesis tests
Validity of Worker Behavior (Quality-controlled) worker output can be as good as experts, sometimes better Labeling text with emotion (Snow, et al, 2008)  Audio transcriptions (Marge, et al, 2010) Similarity judgments for music (Urbano, et al, 2010) Search relevance judgments (Alonso & Mizzaro, 2009) Experiments with workers replicate studies conducted in laboratory or other online settings Standard psychometric tests (Buhrmester, et al, in press) Response in judgment and decision-making tests (Paolacci, et al, 2010) Responses in public good games (Suri & Watts, 2011)
Worker Demographics Self reported demographic information from 2,896 workers over 3 years (MW ‘09, MW ‘11, SW ’10) 55% Female, 45% Male Similar to other internet panels (e.g. Goldstein) Age:  Mean: 30 yrs,  Median: 32 yrs Mean Income: $30,000 / yr Similar to Ipeirotis ‘10, Ross et al ’10
Internal Consistency of Demographics 207 out of 2,896 workers did 2 of our studies Only 1 inconsistency on gender, age, income (0.4%) 31 workers did ≥ 3 of our studies 3 changed gender 1 changed age (by 6 years) 7 changed income bracket Strong internal consistency
Why Do Work on Mechanical Turk? “Mturk money is always necessary to make ends meet.” 5% U.S.    13% India “Mturk money is irrelevant.” 12% U.S.  10% India “Mturk is a fruitful way to spend free time and get some cash.” 69% U.S.  59% India (Ross et al ’10, Ipeirotis ’10)
Requesters Companies crowdsourcing part of their business Search companies:  relevance Online stores:  similar products from different stores (identifying competition) Online directories:  accuracy, freshness of listings Researchers Intermediaries CrowdFlower (formerly Delores Labs) Smartsheet.com
Turker Community Asymmetry in reputation mechanism Reputation of Workers is given by approval rating Requesters can reject work Requesters can refuse workers with low approval rates Reputation of Requesters is not built in to Mturk Turkopticon:  Workers rate requesters on communicativity, generosity, fairness and promptness (turkopticon.differenceengines.com) Turker Nation: Online forum for workers (turkers.proboards.com) Requesters should introduce themselves here Reputation matters, so abusive studies will fail quickly
Anatomy of a HIT HITs with the same title, description, pay rate, etc. are the same HIT type HITs are broken up into Assignments A worker cannot do more than 1 assignment of a HIT
Anatomy of a HIT HITs with the same title, description, pay rate, etc. are the same HIT type HITs are broken up into Assignments A worker cannot do more than 1 assignment of a HIT Requesters can set qualifications that determine who can work on the HIT e.g., Only US workers, workers with approval rating > 90%
Anatomy of a HIT HITs with the same title, description, pay rate, etc. are the same HIT type HITs are broken up into Assignments A worker cannot do more than 1 assignment of a HIT
HIT GROUP Assignment 1      “Black” Alice Assignment 2      “Night” Which is the better translation for Táy? ,[object Object]
NightHIT 1 Bob Which is the better translation for Nedj? ,[object Object]
WhiteHIT 2 Assignment 3      “Black” Charlie • • •
HIT GROUP Assignment 1      “White” Which is the better translation for Táy? ,[object Object]
NightAlice HIT 1 Assignment 2      “White” Which is the better translation for Nedj? ,[object Object]
WhiteHIT 2 Bob • • Assignment 3      “White” • David
Requester Worker Build HIT Test HIT Search for HITs Post HIT Accept HIT Do work Reject or Approve HIT Submit HIT
Lifecycle of a HIT Requester builds a HIT Internal HITs are hosted by Amazon External HITs are hosted by the requester HITs can be tested on {requester, worker}sandbox.mturk.com Requester posts HIT on mturk.com Can post as many HITs as account can cover Workers do HIT and submit work Requester approves/rejects work Payment is rendered Amazon charges requesters 10% HIT completes when it expires or all assignments are completed
How Much to Pay? Pay rate can affect quantity of work Pay rate does not have a big impact on quality (MW ’09) Accuracy Number of Tasks Completed Pay per Task Pay per Task
Completion Time 3, 6-question multiple choice surveys Launched same time of day, day of week $0.01, $0.03, $0.05 Past a threshold, pay rate does not increase speed Start with low pay rate work up
Internal HITs
Internal HITs on AMT	 Template tool Variables Preference Elicitation Honesty study
AMT Templates ,[object Object]
  Set parameters for HIT
  Title
  Description
  Keywords
  Reward
  Assignments per HIT
  Qualifications
  Time per assignment
  HIT expiration
  Auto-approve time
  Design an HTML form,[object Object]
Variables in Templates Example: Preference Elicitation HIT 1 Which would you prefer to watch? HIT 6 Which would you prefer to watch?
How to build an Internal HIT
Cross Cultural Studies: 2 Methods Self-reported:  Ask workers demographic questions, do experiment Qualifications:   Restrict HITs to worker’s country of origin using MTurk qualifications Honesty experiment:   Ask workers to roll a die (or go to a website that simulates one), pay $0.25 times the self-reported roll.
Cross Cultural Studies: 2 Methods Distribution of reported rolls indicates dishonesty No significant demographic differences in distributions
External HITs
External HITs on AMT	 Flexible survey Random Assignment Synchronous Experiments Security
Privacy survey External HIT Random order of answers Random order of questions Pop-out questions based on answers Changed wording on question from Annenberg study: Do you want the websites you visit to show you ads that are {tailored, relevant} to your interests?
Results Replicated original study Found effect of differences in wording MTurk Annenberg “Relevant” Yes No Maybe
Results BUT ,[object Object]
Results not replicated in subsequent phone surveyReplicated original study Found effect of differences in wording MTurk Annenberg “Relevant” Yes No Maybe
Random Assignment One HIT, multiple Assignments Only post once, or delete repeat submissions Preview page neutral for all conditions Once HIT accepted: If new, record WorkerID, Assignment ID assign to condition If old, get condition, “push” worker to last seen state of study Wage conditions = pay through bonus Intent to treat: Keep track of attrition by condition Example: Noisy sites decrease reading comprehension BUT find no difference between conditions Why?  Most people in noisy condition dropped out, only people left were deaf!
Financial Incentives & the performance of crowds Manipulated Task Value Amount earned per image set $0.01, $0.05, $0.10 No additional pay for image sets Difficulty Number of images per set 2, 3, 4 Measured Quantity Number of image sets submitted Quality Proportion of image sets correctly sorted Rank correlation of image sets with correct order
Results Pay rate can affect quantity of work Pay rate does not have a big impact on quality (MW ’09) Accuracy Number of Tasks Completed Pay per Task Pay per Task
Main API Functions CreateHIT(Requirements, Pay rate, Description)– returns HIT Id and HIT Type Id SubmitAssignment(AssignmentId) – notifies Amazon that this assignment has been completed ApproveAssignment(AssignmentID) – Requester accepts assignment, money is transferred, also RejectAssignment GrantBonus(WorkerID, Amount, Message) – Give the worker the specified bonus and sends message, should have a failsafe NotifyWorkers(list of WorkerIds, Message) – e-mails message to the workers.
Command-line Tools Configuration files mturk.properties – for interacting with MTurk API [task name].input– variable name & values by row [task name].properties– HIT parameters [task name].question– XML file Shell scripts run.sh – post HIT to Mechanical Turk (creates .success file) getResults.sh – download results (using .success file) reviewResults.sh – approve or reject assignments approveAndDeleteResults.sh – approve & delete all unreviewed HITs Output files [task name].success – created HIT ID & Assignment IDs [task name].results– tab-delimited output from workers (Also see http://smallsocialsystems.com/blog/?p=95)
mturk.properties access_key=ABCDEF0123455676789 secret_key=Fa234asOIU/as92345kasSDfq3rDSF #service_url=http://mechanicalturk.sandbox.amazonaws.com/?Service=AWSMechanicalTurkRequester service_url=http://mechanicalturk.amazonaws.com/?Service=AWSMechanicalTurkRequester # You should not need to adjust these values. retriable_errors=Server.ServiceUnavailable,503 retry_attempts=6 retry_delay_millis=500
[task name].properties title: Categorize Web Sites  description: Look at URLs, rate, and classify them. These websites have not been screened for adult content!  keywords: URL, categorize, web sites reward: 0.01 assignments: 10 annotation: # this Assignment Duration value is 30 * 60 = 0.5 hours assignmentduration:1800 # this HIT Lifetime value is 60*60*24*3 = 3 days hitlifetime:259200 # this Auto Approval period is 60*60*24*15 = 15 days autoapprovaldelay:1296000
[task name].question <?xml version="1.0"?> <QuestionFormxmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">     <Overview>         <Title>Categorize an Image</Title> 	  <Text>Product:</Text>     </Overview>     <Question>     <QuestionIdentifier>category</QuestionIdentifier>     <QuestionContent><Text>Category:</Text></QuestionContent>         <AnswerSpecification>         <SelectionAnswer>           <MinSelectionCount>1</MinSelectionCount>           <StyleSuggestion>radiobutton</StyleSuggestion>           <Selections>             #set ( $categories = [ "Airplane","Anchor","Beaver","Binocular”] )             #foreach ( $category in $categories )                 <Selection>                   <SelectionIdentifier>$category</SelectionIdentifier>                   <Text>$category</Text>                 </Selection>             #end           </Selections>         </SelectionAnswer>         </AnswerSpecification>     </Question> </QuestionForm>
[task name].question <?xml version="1.0"?> <ExternalQuestionxmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">     <ExternalURL>http://mywebsite.com/experiment/index.htm</ExternalURL>     <FrameHeight>600</FrameHeight> </ExternalQuestion>

More Related Content

What's hot

Building the business case for AWS
Building the business case for AWSBuilding the business case for AWS
Building the business case for AWSAmazon Web Services
 
Aws cloud practitioner training - Dot Net Tricks
Aws cloud practitioner training - Dot Net TricksAws cloud practitioner training - Dot Net Tricks
Aws cloud practitioner training - Dot Net TricksGaurav Singh
 
A Roadmap to Cloud Center of Excellence Adoption
A Roadmap to Cloud Center of Excellence AdoptionA Roadmap to Cloud Center of Excellence Adoption
A Roadmap to Cloud Center of Excellence AdoptionAmazon Web Services
 
AWS Cloud Center Excellence Quick Start Prescriptive Guidance
AWS Cloud Center Excellence Quick Start Prescriptive GuidanceAWS Cloud Center Excellence Quick Start Prescriptive Guidance
AWS Cloud Center Excellence Quick Start Prescriptive GuidanceTom Laszewski
 
Enterprise Adoption – Patterns for Success with AWS - Business
Enterprise Adoption – Patterns for Success with AWS - BusinessEnterprise Adoption – Patterns for Success with AWS - Business
Enterprise Adoption – Patterns for Success with AWS - BusinessAmazon Web Services
 
Perform a Cloud Readiness Assessment for Your Own Company
Perform a Cloud Readiness Assessment for Your Own CompanyPerform a Cloud Readiness Assessment for Your Own Company
Perform a Cloud Readiness Assessment for Your Own CompanyAmazon Web Services
 
(ISM305) Framework: Create Cloud Strategy & Accelerate Results
(ISM305) Framework: Create Cloud Strategy & Accelerate Results(ISM305) Framework: Create Cloud Strategy & Accelerate Results
(ISM305) Framework: Create Cloud Strategy & Accelerate ResultsAmazon Web Services
 
For Partners: Build Your Business on AWS
For Partners:Build Your Business on AWSFor Partners:Build Your Business on AWS
For Partners: Build Your Business on AWSAmazon Web Services
 
Binbird Enterprise Architecture
Binbird Enterprise ArchitectureBinbird Enterprise Architecture
Binbird Enterprise ArchitecturePadam Sahi
 
The Cloud Enabled IT Operating Model - Business
The Cloud Enabled IT Operating Model - BusinessThe Cloud Enabled IT Operating Model - Business
The Cloud Enabled IT Operating Model - BusinessAmazon Web Services
 
The Value of Next Generation Managed Services
The Value of Next Generation Managed ServicesThe Value of Next Generation Managed Services
The Value of Next Generation Managed ServicesCloudHesive
 
Accenture 2014 AWS re:Invent Enterprise Migration Breakout Session
Accenture 2014 AWS re:Invent Enterprise Migration Breakout SessionAccenture 2014 AWS re:Invent Enterprise Migration Breakout Session
Accenture 2014 AWS re:Invent Enterprise Migration Breakout SessionTom Laszewski
 
Best Practices for Partnering with AWS
Best Practices for Partnering with AWSBest Practices for Partnering with AWS
Best Practices for Partnering with AWSAmazon Web Services
 
How a Business-First, Agile Cloud Migration Factory Approach Powers Digital S...
How a Business-First, Agile Cloud Migration Factory Approach Powers Digital S...How a Business-First, Agile Cloud Migration Factory Approach Powers Digital S...
How a Business-First, Agile Cloud Migration Factory Approach Powers Digital S...Cognizant
 
AWSome Day Manila - Opening Keynote, Feb 25 2014
AWSome Day Manila - Opening Keynote, Feb 25 2014AWSome Day Manila - Opening Keynote, Feb 25 2014
AWSome Day Manila - Opening Keynote, Feb 25 2014Amazon Web Services
 
Private Equity Value Creation Carve Outs, Divestitures and mergers
Private Equity Value Creation Carve Outs, Divestitures and mergersPrivate Equity Value Creation Carve Outs, Divestitures and mergers
Private Equity Value Creation Carve Outs, Divestitures and mergersTom Laszewski
 
EDB Executive Presentation 101515
EDB Executive Presentation 101515EDB Executive Presentation 101515
EDB Executive Presentation 101515Pierre Fricke
 

What's hot (20)

Building the business case for AWS
Building the business case for AWSBuilding the business case for AWS
Building the business case for AWS
 
Aws cloud practitioner training - Dot Net Tricks
Aws cloud practitioner training - Dot Net TricksAws cloud practitioner training - Dot Net Tricks
Aws cloud practitioner training - Dot Net Tricks
 
A Roadmap to Cloud Center of Excellence Adoption
A Roadmap to Cloud Center of Excellence AdoptionA Roadmap to Cloud Center of Excellence Adoption
A Roadmap to Cloud Center of Excellence Adoption
 
Transforming Your IT with AWS
Transforming Your IT with AWSTransforming Your IT with AWS
Transforming Your IT with AWS
 
AWS Cloud Center Excellence Quick Start Prescriptive Guidance
AWS Cloud Center Excellence Quick Start Prescriptive GuidanceAWS Cloud Center Excellence Quick Start Prescriptive Guidance
AWS Cloud Center Excellence Quick Start Prescriptive Guidance
 
Enterprise Adoption – Patterns for Success with AWS - Business
Enterprise Adoption – Patterns for Success with AWS - BusinessEnterprise Adoption – Patterns for Success with AWS - Business
Enterprise Adoption – Patterns for Success with AWS - Business
 
Perform a Cloud Readiness Assessment for Your Own Company
Perform a Cloud Readiness Assessment for Your Own CompanyPerform a Cloud Readiness Assessment for Your Own Company
Perform a Cloud Readiness Assessment for Your Own Company
 
(ISM305) Framework: Create Cloud Strategy & Accelerate Results
(ISM305) Framework: Create Cloud Strategy & Accelerate Results(ISM305) Framework: Create Cloud Strategy & Accelerate Results
(ISM305) Framework: Create Cloud Strategy & Accelerate Results
 
For Partners: Build Your Business on AWS
For Partners:Build Your Business on AWSFor Partners:Build Your Business on AWS
For Partners: Build Your Business on AWS
 
Binbird Enterprise Architecture
Binbird Enterprise ArchitectureBinbird Enterprise Architecture
Binbird Enterprise Architecture
 
The Cloud Enabled IT Operating Model - Business
The Cloud Enabled IT Operating Model - BusinessThe Cloud Enabled IT Operating Model - Business
The Cloud Enabled IT Operating Model - Business
 
The Value of Next Generation Managed Services
The Value of Next Generation Managed ServicesThe Value of Next Generation Managed Services
The Value of Next Generation Managed Services
 
Accenture 2014 AWS re:Invent Enterprise Migration Breakout Session
Accenture 2014 AWS re:Invent Enterprise Migration Breakout SessionAccenture 2014 AWS re:Invent Enterprise Migration Breakout Session
Accenture 2014 AWS re:Invent Enterprise Migration Breakout Session
 
Best Practices for Partnering with AWS
Best Practices for Partnering with AWSBest Practices for Partnering with AWS
Best Practices for Partnering with AWS
 
How a Business-First, Agile Cloud Migration Factory Approach Powers Digital S...
How a Business-First, Agile Cloud Migration Factory Approach Powers Digital S...How a Business-First, Agile Cloud Migration Factory Approach Powers Digital S...
How a Business-First, Agile Cloud Migration Factory Approach Powers Digital S...
 
AWSome Day Manila - Opening Keynote, Feb 25 2014
AWSome Day Manila - Opening Keynote, Feb 25 2014AWSome Day Manila - Opening Keynote, Feb 25 2014
AWSome Day Manila - Opening Keynote, Feb 25 2014
 
Digital Transformation
Digital TransformationDigital Transformation
Digital Transformation
 
Private Equity Value Creation Carve Outs, Divestitures and mergers
Private Equity Value Creation Carve Outs, Divestitures and mergersPrivate Equity Value Creation Carve Outs, Divestitures and mergers
Private Equity Value Creation Carve Outs, Divestitures and mergers
 
EDB Executive Presentation 101515
EDB Executive Presentation 101515EDB Executive Presentation 101515
EDB Executive Presentation 101515
 
Creating a Cloud First Standard
Creating a Cloud First StandardCreating a Cloud First Standard
Creating a Cloud First Standard
 

Similar to SA1: How to use Mechanical Turk for Behavioral Research

Introduction to Amazon Mechanical Turk
Introduction to Amazon Mechanical TurkIntroduction to Amazon Mechanical Turk
Introduction to Amazon Mechanical Turkamjludden
 
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)Amazon Web Services
 
Consumer Analytics in Real Time: InfoScout and Mechanical Turk (BDT206) | AWS...
Consumer Analytics in Real Time: InfoScout and Mechanical Turk (BDT206) | AWS...Consumer Analytics in Real Time: InfoScout and Mechanical Turk (BDT206) | AWS...
Consumer Analytics in Real Time: InfoScout and Mechanical Turk (BDT206) | AWS...Amazon Web Services
 
Intro to machine learning for web folks @ BlendWebMix
Intro to machine learning for web folks @ BlendWebMixIntro to machine learning for web folks @ BlendWebMix
Intro to machine learning for web folks @ BlendWebMixLouis Dorard
 
Crowdsourced Data Processing: Industry and Academic Perspectives
Crowdsourced Data Processing: Industry and Academic PerspectivesCrowdsourced Data Processing: Industry and Academic Perspectives
Crowdsourced Data Processing: Industry and Academic PerspectivesAditya Parameswaran
 
Best Practices for Sentiment Analysis Webinar
Best Practices for Sentiment Analysis Webinar Best Practices for Sentiment Analysis Webinar
Best Practices for Sentiment Analysis Webinar Mechanical Turk
 
Engaging with Users on Public Social Media
Engaging with Users on Public Social MediaEngaging with Users on Public Social Media
Engaging with Users on Public Social MediaJeffrey Nichols
 
An Analysis Of Keyword Preferences Amongst Recruiters And Candidate Resumes
An Analysis Of Keyword Preferences Amongst Recruiters And Candidate ResumesAn Analysis Of Keyword Preferences Amongst Recruiters And Candidate Resumes
An Analysis Of Keyword Preferences Amongst Recruiters And Candidate ResumesValerie Felton
 
AWS Mechanical Turk Office Hours - Jan 2011
AWS Mechanical Turk Office Hours - Jan 2011AWS Mechanical Turk Office Hours - Jan 2011
AWS Mechanical Turk Office Hours - Jan 2011Mechanical Turk
 
Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceTrey Grainger
 
Powerboost2010 crispincandexp
Powerboost2010 crispincandexpPowerboost2010 crispincandexp
Powerboost2010 crispincandexpCareerXroads
 
Powerboost2010-Crispin-Candidate Experience
Powerboost2010-Crispin-Candidate ExperiencePowerboost2010-Crispin-Candidate Experience
Powerboost2010-Crispin-Candidate ExperienceCareerXroads
 
Hr analytics – demystified!
Hr analytics – demystified!Hr analytics – demystified!
Hr analytics – demystified!Arun Krishnan
 
Mechanical Turk Demystified: Best practices for sourcing and scaling quality ...
Mechanical Turk Demystified: Best practices for sourcing and scaling quality ...Mechanical Turk Demystified: Best practices for sourcing and scaling quality ...
Mechanical Turk Demystified: Best practices for sourcing and scaling quality ...UXPA International
 
DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...Hakka Labs
 
Design and implementation_of_relevance_assessm
Design and implementation_of_relevance_assessmDesign and implementation_of_relevance_assessm
Design and implementation_of_relevance_assessmshilpashukla01
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsKrishnaram Kenthapadi
 

Similar to SA1: How to use Mechanical Turk for Behavioral Research (20)

Introduction to Amazon Mechanical Turk
Introduction to Amazon Mechanical TurkIntroduction to Amazon Mechanical Turk
Introduction to Amazon Mechanical Turk
 
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
 
Consumer Analytics in Real Time: InfoScout and Mechanical Turk (BDT206) | AWS...
Consumer Analytics in Real Time: InfoScout and Mechanical Turk (BDT206) | AWS...Consumer Analytics in Real Time: InfoScout and Mechanical Turk (BDT206) | AWS...
Consumer Analytics in Real Time: InfoScout and Mechanical Turk (BDT206) | AWS...
 
Intro to machine learning for web folks @ BlendWebMix
Intro to machine learning for web folks @ BlendWebMixIntro to machine learning for web folks @ BlendWebMix
Intro to machine learning for web folks @ BlendWebMix
 
Crowdsourced Data Processing: Industry and Academic Perspectives
Crowdsourced Data Processing: Industry and Academic PerspectivesCrowdsourced Data Processing: Industry and Academic Perspectives
Crowdsourced Data Processing: Industry and Academic Perspectives
 
AI for HRM
AI for HRMAI for HRM
AI for HRM
 
Best Practices for Sentiment Analysis Webinar
Best Practices for Sentiment Analysis Webinar Best Practices for Sentiment Analysis Webinar
Best Practices for Sentiment Analysis Webinar
 
Engaging with Users on Public Social Media
Engaging with Users on Public Social MediaEngaging with Users on Public Social Media
Engaging with Users on Public Social Media
 
An Analysis Of Keyword Preferences Amongst Recruiters And Candidate Resumes
An Analysis Of Keyword Preferences Amongst Recruiters And Candidate ResumesAn Analysis Of Keyword Preferences Amongst Recruiters And Candidate Resumes
An Analysis Of Keyword Preferences Amongst Recruiters And Candidate Resumes
 
AWS Mechanical Turk Office Hours - Jan 2011
AWS Mechanical Turk Office Hours - Jan 2011AWS Mechanical Turk Office Hours - Jan 2011
AWS Mechanical Turk Office Hours - Jan 2011
 
Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative Space
 
Powerboost2010 crispincandexp
Powerboost2010 crispincandexpPowerboost2010 crispincandexp
Powerboost2010 crispincandexp
 
Powerboost2010-Crispin-Candidate Experience
Powerboost2010-Crispin-Candidate ExperiencePowerboost2010-Crispin-Candidate Experience
Powerboost2010-Crispin-Candidate Experience
 
Project 500
Project 500Project 500
Project 500
 
Hr analytics – demystified!
Hr analytics – demystified!Hr analytics – demystified!
Hr analytics – demystified!
 
Mechanical Turk Demystified: Best practices for sourcing and scaling quality ...
Mechanical Turk Demystified: Best practices for sourcing and scaling quality ...Mechanical Turk Demystified: Best practices for sourcing and scaling quality ...
Mechanical Turk Demystified: Best practices for sourcing and scaling quality ...
 
DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...
 
Design and implementation_of_relevance_assessm
Design and implementation_of_relevance_assessmDesign and implementation_of_relevance_assessm
Design and implementation_of_relevance_assessm
 
AI in Talent Acquisition
AI in Talent AcquisitionAI in Talent Acquisition
AI in Talent Acquisition
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML Systems
 

More from John Breslin

Ireland: Island of Innovation and Entrepreneurship
Ireland: Island of Innovation and EntrepreneurshipIreland: Island of Innovation and Entrepreneurship
Ireland: Island of Innovation and EntrepreneurshipJohn Breslin
 
Old Ireland in Colour
Old Ireland in ColourOld Ireland in Colour
Old Ireland in ColourJohn Breslin
 
A Balanced Routing Algorithm for Blockchain Offline Channels using Flocking
A Balanced Routing Algorithm for Blockchain Offline Channels using FlockingA Balanced Routing Algorithm for Blockchain Offline Channels using Flocking
A Balanced Routing Algorithm for Blockchain Offline Channels using FlockingJohn Breslin
 
Collusion Attack from Hubs in the Blockchain Offline Channel Network
Collusion Attack from Hubs in the Blockchain Offline Channel NetworkCollusion Attack from Hubs in the Blockchain Offline Channel Network
Collusion Attack from Hubs in the Blockchain Offline Channel NetworkJohn Breslin
 
Collaborative Leadership to Increase the Northern & Western Region’s Innovati...
Collaborative Leadership to Increase the Northern & Western Region’s Innovati...Collaborative Leadership to Increase the Northern & Western Region’s Innovati...
Collaborative Leadership to Increase the Northern & Western Region’s Innovati...John Breslin
 
TRICS: Teaching Researchers and Innovators how to Create Startups
TRICS: Teaching Researchers and Innovators how to Create StartupsTRICS: Teaching Researchers and Innovators how to Create Startups
TRICS: Teaching Researchers and Innovators how to Create StartupsJohn Breslin
 
Entrepreneurship is in Our DNA
Entrepreneurship is in Our DNAEntrepreneurship is in Our DNA
Entrepreneurship is in Our DNAJohn Breslin
 
Galway City Innovation District
Galway City Innovation DistrictGalway City Innovation District
Galway City Innovation DistrictJohn Breslin
 
Innovation Districts and Innovation Hubs
Innovation Districts and Innovation HubsInnovation Districts and Innovation Hubs
Innovation Districts and Innovation HubsJohn Breslin
 
Disciplined mHealth Entrepreneurship
Disciplined mHealth EntrepreneurshipDisciplined mHealth Entrepreneurship
Disciplined mHealth EntrepreneurshipJohn Breslin
 
Searching for Startups
Searching for StartupsSearching for Startups
Searching for StartupsJohn Breslin
 
Intellectual Property: Protecting Ideas, Designs and Brands in the Real World...
Intellectual Property: Protecting Ideas, Designs and Brands in the Real World...Intellectual Property: Protecting Ideas, Designs and Brands in the Real World...
Intellectual Property: Protecting Ideas, Designs and Brands in the Real World...John Breslin
 
Innovation and Entrepreneurship: Tips, Tools and Tricks
Innovation and Entrepreneurship: Tips, Tools and TricksInnovation and Entrepreneurship: Tips, Tools and Tricks
Innovation and Entrepreneurship: Tips, Tools and TricksJohn Breslin
 
Growing Galway's Startup Community
Growing Galway's Startup CommunityGrowing Galway's Startup Community
Growing Galway's Startup CommunityJohn Breslin
 
Startup Community: What Galway Can Do Next
Startup Community: What Galway Can Do NextStartup Community: What Galway Can Do Next
Startup Community: What Galway Can Do NextJohn Breslin
 
Adding More Semantics to the Social Web
Adding More Semantics to the Social WebAdding More Semantics to the Social Web
Adding More Semantics to the Social WebJohn Breslin
 
Communities and Tech: Build Which and What Will Come?
Communities and Tech: Build Which and What Will Come?Communities and Tech: Build Which and What Will Come?
Communities and Tech: Build Which and What Will Come?John Breslin
 
Data Analytics and Industry-Academic Partnerships: An Irish Perspective
Data Analytics and Industry-Academic Partnerships: An Irish PerspectiveData Analytics and Industry-Academic Partnerships: An Irish Perspective
Data Analytics and Industry-Academic Partnerships: An Irish PerspectiveJohn Breslin
 
“I Like” - Analysing Interactions within Social Networks to Assert the Trustw...
“I Like” - Analysing Interactions within Social Networks to Assert the Trustw...“I Like” - Analysing Interactions within Social Networks to Assert the Trustw...
“I Like” - Analysing Interactions within Social Networks to Assert the Trustw...John Breslin
 
John Breslin at the Innovation Academy
John Breslin at the Innovation AcademyJohn Breslin at the Innovation Academy
John Breslin at the Innovation AcademyJohn Breslin
 

More from John Breslin (20)

Ireland: Island of Innovation and Entrepreneurship
Ireland: Island of Innovation and EntrepreneurshipIreland: Island of Innovation and Entrepreneurship
Ireland: Island of Innovation and Entrepreneurship
 
Old Ireland in Colour
Old Ireland in ColourOld Ireland in Colour
Old Ireland in Colour
 
A Balanced Routing Algorithm for Blockchain Offline Channels using Flocking
A Balanced Routing Algorithm for Blockchain Offline Channels using FlockingA Balanced Routing Algorithm for Blockchain Offline Channels using Flocking
A Balanced Routing Algorithm for Blockchain Offline Channels using Flocking
 
Collusion Attack from Hubs in the Blockchain Offline Channel Network
Collusion Attack from Hubs in the Blockchain Offline Channel NetworkCollusion Attack from Hubs in the Blockchain Offline Channel Network
Collusion Attack from Hubs in the Blockchain Offline Channel Network
 
Collaborative Leadership to Increase the Northern & Western Region’s Innovati...
Collaborative Leadership to Increase the Northern & Western Region’s Innovati...Collaborative Leadership to Increase the Northern & Western Region’s Innovati...
Collaborative Leadership to Increase the Northern & Western Region’s Innovati...
 
TRICS: Teaching Researchers and Innovators how to Create Startups
TRICS: Teaching Researchers and Innovators how to Create StartupsTRICS: Teaching Researchers and Innovators how to Create Startups
TRICS: Teaching Researchers and Innovators how to Create Startups
 
Entrepreneurship is in Our DNA
Entrepreneurship is in Our DNAEntrepreneurship is in Our DNA
Entrepreneurship is in Our DNA
 
Galway City Innovation District
Galway City Innovation DistrictGalway City Innovation District
Galway City Innovation District
 
Innovation Districts and Innovation Hubs
Innovation Districts and Innovation HubsInnovation Districts and Innovation Hubs
Innovation Districts and Innovation Hubs
 
Disciplined mHealth Entrepreneurship
Disciplined mHealth EntrepreneurshipDisciplined mHealth Entrepreneurship
Disciplined mHealth Entrepreneurship
 
Searching for Startups
Searching for StartupsSearching for Startups
Searching for Startups
 
Intellectual Property: Protecting Ideas, Designs and Brands in the Real World...
Intellectual Property: Protecting Ideas, Designs and Brands in the Real World...Intellectual Property: Protecting Ideas, Designs and Brands in the Real World...
Intellectual Property: Protecting Ideas, Designs and Brands in the Real World...
 
Innovation and Entrepreneurship: Tips, Tools and Tricks
Innovation and Entrepreneurship: Tips, Tools and TricksInnovation and Entrepreneurship: Tips, Tools and Tricks
Innovation and Entrepreneurship: Tips, Tools and Tricks
 
Growing Galway's Startup Community
Growing Galway's Startup CommunityGrowing Galway's Startup Community
Growing Galway's Startup Community
 
Startup Community: What Galway Can Do Next
Startup Community: What Galway Can Do NextStartup Community: What Galway Can Do Next
Startup Community: What Galway Can Do Next
 
Adding More Semantics to the Social Web
Adding More Semantics to the Social WebAdding More Semantics to the Social Web
Adding More Semantics to the Social Web
 
Communities and Tech: Build Which and What Will Come?
Communities and Tech: Build Which and What Will Come?Communities and Tech: Build Which and What Will Come?
Communities and Tech: Build Which and What Will Come?
 
Data Analytics and Industry-Academic Partnerships: An Irish Perspective
Data Analytics and Industry-Academic Partnerships: An Irish PerspectiveData Analytics and Industry-Academic Partnerships: An Irish Perspective
Data Analytics and Industry-Academic Partnerships: An Irish Perspective
 
“I Like” - Analysing Interactions within Social Networks to Assert the Trustw...
“I Like” - Analysing Interactions within Social Networks to Assert the Trustw...“I Like” - Analysing Interactions within Social Networks to Assert the Trustw...
“I Like” - Analysing Interactions within Social Networks to Assert the Trustw...
 
John Breslin at the Innovation Academy
John Breslin at the Innovation AcademyJohn Breslin at the Innovation Academy
John Breslin at the Innovation Academy
 

Recently uploaded

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

SA1: How to use Mechanical Turk for Behavioral Research

  • 1. Conducting Behavioral Research on Amazon’s Mechanical Turk Winter Mason Siddharth Suri Yahoo! Research, New York
  • 2. Outline Why Mechanical Turk? Mechanical Turk Basics Internal HITs Preference Elicitation Surveys External HITs Random Assignment Synchronous Experiments Conclusion
  • 3. What is Mechanical Turk? Crowdsourcing Jobs outsourced to a group through an open call (Howe ‘06) Online Labor Market Place for requesters to post jobs and workers to do them for pay This Talk How can we use MTurk for behavioral research? What kinds of behavioral research can we use MTurk for?
  • 4. Why Mechanical Turk? Subject pool size Central place for > 100,000 workers (Pontin ‘07) Always-available subject pool Subject pool diversity Open to anyone globally with a computer, internet connection Low cost Reservation Wage: $1.38/hour (Chilton et al ‘10) Effective Wage: $4.80/hour (Ipeirotis, ’10) Faster theory/experiment cycle Hypothesis formulation Testing & evaluation of hypothesis New hypothesis tests
  • 5. Validity of Worker Behavior (Quality-controlled) worker output can be as good as experts, sometimes better Labeling text with emotion (Snow, et al, 2008) Audio transcriptions (Marge, et al, 2010) Similarity judgments for music (Urbano, et al, 2010) Search relevance judgments (Alonso & Mizzaro, 2009) Experiments with workers replicate studies conducted in laboratory or other online settings Standard psychometric tests (Buhrmester, et al, in press) Response in judgment and decision-making tests (Paolacci, et al, 2010) Responses in public good games (Suri & Watts, 2011)
  • 6. Worker Demographics Self reported demographic information from 2,896 workers over 3 years (MW ‘09, MW ‘11, SW ’10) 55% Female, 45% Male Similar to other internet panels (e.g. Goldstein) Age: Mean: 30 yrs, Median: 32 yrs Mean Income: $30,000 / yr Similar to Ipeirotis ‘10, Ross et al ’10
  • 7. Internal Consistency of Demographics 207 out of 2,896 workers did 2 of our studies Only 1 inconsistency on gender, age, income (0.4%) 31 workers did ≥ 3 of our studies 3 changed gender 1 changed age (by 6 years) 7 changed income bracket Strong internal consistency
  • 8. Why Do Work on Mechanical Turk? “Mturk money is always necessary to make ends meet.” 5% U.S. 13% India “Mturk money is irrelevant.” 12% U.S. 10% India “Mturk is a fruitful way to spend free time and get some cash.” 69% U.S. 59% India (Ross et al ’10, Ipeirotis ’10)
  • 9. Requesters Companies crowdsourcing part of their business Search companies: relevance Online stores: similar products from different stores (identifying competition) Online directories: accuracy, freshness of listings Researchers Intermediaries CrowdFlower (formerly Delores Labs) Smartsheet.com
  • 10. Turker Community Asymmetry in reputation mechanism Reputation of Workers is given by approval rating Requesters can reject work Requesters can refuse workers with low approval rates Reputation of Requesters is not built in to Mturk Turkopticon: Workers rate requesters on communicativity, generosity, fairness and promptness (turkopticon.differenceengines.com) Turker Nation: Online forum for workers (turkers.proboards.com) Requesters should introduce themselves here Reputation matters, so abusive studies will fail quickly
  • 11. Anatomy of a HIT HITs with the same title, description, pay rate, etc. are the same HIT type HITs are broken up into Assignments A worker cannot do more than 1 assignment of a HIT
  • 12. Anatomy of a HIT HITs with the same title, description, pay rate, etc. are the same HIT type HITs are broken up into Assignments A worker cannot do more than 1 assignment of a HIT Requesters can set qualifications that determine who can work on the HIT e.g., Only US workers, workers with approval rating > 90%
  • 13. Anatomy of a HIT HITs with the same title, description, pay rate, etc. are the same HIT type HITs are broken up into Assignments A worker cannot do more than 1 assignment of a HIT
  • 14.
  • 15.
  • 16. WhiteHIT 2 Assignment 3 “Black” Charlie • • •
  • 17.
  • 18.
  • 19. WhiteHIT 2 Bob • • Assignment 3 “White” • David
  • 20. Requester Worker Build HIT Test HIT Search for HITs Post HIT Accept HIT Do work Reject or Approve HIT Submit HIT
  • 21. Lifecycle of a HIT Requester builds a HIT Internal HITs are hosted by Amazon External HITs are hosted by the requester HITs can be tested on {requester, worker}sandbox.mturk.com Requester posts HIT on mturk.com Can post as many HITs as account can cover Workers do HIT and submit work Requester approves/rejects work Payment is rendered Amazon charges requesters 10% HIT completes when it expires or all assignments are completed
  • 22. How Much to Pay? Pay rate can affect quantity of work Pay rate does not have a big impact on quality (MW ’09) Accuracy Number of Tasks Completed Pay per Task Pay per Task
  • 23. Completion Time 3, 6-question multiple choice surveys Launched same time of day, day of week $0.01, $0.03, $0.05 Past a threshold, pay rate does not increase speed Start with low pay rate work up
  • 25. Internal HITs on AMT Template tool Variables Preference Elicitation Honesty study
  • 26.
  • 27. Set parameters for HIT
  • 32. Assignments per HIT
  • 34. Time per assignment
  • 35. HIT expiration
  • 37.
  • 38. Variables in Templates Example: Preference Elicitation HIT 1 Which would you prefer to watch? HIT 6 Which would you prefer to watch?
  • 39. How to build an Internal HIT
  • 40. Cross Cultural Studies: 2 Methods Self-reported: Ask workers demographic questions, do experiment Qualifications: Restrict HITs to worker’s country of origin using MTurk qualifications Honesty experiment: Ask workers to roll a die (or go to a website that simulates one), pay $0.25 times the self-reported roll.
  • 41. Cross Cultural Studies: 2 Methods Distribution of reported rolls indicates dishonesty No significant demographic differences in distributions
  • 43. External HITs on AMT Flexible survey Random Assignment Synchronous Experiments Security
  • 44. Privacy survey External HIT Random order of answers Random order of questions Pop-out questions based on answers Changed wording on question from Annenberg study: Do you want the websites you visit to show you ads that are {tailored, relevant} to your interests?
  • 45. Results Replicated original study Found effect of differences in wording MTurk Annenberg “Relevant” Yes No Maybe
  • 46.
  • 47. Results not replicated in subsequent phone surveyReplicated original study Found effect of differences in wording MTurk Annenberg “Relevant” Yes No Maybe
  • 48. Random Assignment One HIT, multiple Assignments Only post once, or delete repeat submissions Preview page neutral for all conditions Once HIT accepted: If new, record WorkerID, Assignment ID assign to condition If old, get condition, “push” worker to last seen state of study Wage conditions = pay through bonus Intent to treat: Keep track of attrition by condition Example: Noisy sites decrease reading comprehension BUT find no difference between conditions Why? Most people in noisy condition dropped out, only people left were deaf!
  • 49. Financial Incentives & the performance of crowds Manipulated Task Value Amount earned per image set $0.01, $0.05, $0.10 No additional pay for image sets Difficulty Number of images per set 2, 3, 4 Measured Quantity Number of image sets submitted Quality Proportion of image sets correctly sorted Rank correlation of image sets with correct order
  • 50.
  • 51. Results Pay rate can affect quantity of work Pay rate does not have a big impact on quality (MW ’09) Accuracy Number of Tasks Completed Pay per Task Pay per Task
  • 52. Main API Functions CreateHIT(Requirements, Pay rate, Description)– returns HIT Id and HIT Type Id SubmitAssignment(AssignmentId) – notifies Amazon that this assignment has been completed ApproveAssignment(AssignmentID) – Requester accepts assignment, money is transferred, also RejectAssignment GrantBonus(WorkerID, Amount, Message) – Give the worker the specified bonus and sends message, should have a failsafe NotifyWorkers(list of WorkerIds, Message) – e-mails message to the workers.
  • 53. Command-line Tools Configuration files mturk.properties – for interacting with MTurk API [task name].input– variable name & values by row [task name].properties– HIT parameters [task name].question– XML file Shell scripts run.sh – post HIT to Mechanical Turk (creates .success file) getResults.sh – download results (using .success file) reviewResults.sh – approve or reject assignments approveAndDeleteResults.sh – approve & delete all unreviewed HITs Output files [task name].success – created HIT ID & Assignment IDs [task name].results– tab-delimited output from workers (Also see http://smallsocialsystems.com/blog/?p=95)
  • 54. mturk.properties access_key=ABCDEF0123455676789 secret_key=Fa234asOIU/as92345kasSDfq3rDSF #service_url=http://mechanicalturk.sandbox.amazonaws.com/?Service=AWSMechanicalTurkRequester service_url=http://mechanicalturk.amazonaws.com/?Service=AWSMechanicalTurkRequester # You should not need to adjust these values. retriable_errors=Server.ServiceUnavailable,503 retry_attempts=6 retry_delay_millis=500
  • 55. [task name].properties title: Categorize Web Sites description: Look at URLs, rate, and classify them. These websites have not been screened for adult content! keywords: URL, categorize, web sites reward: 0.01 assignments: 10 annotation: # this Assignment Duration value is 30 * 60 = 0.5 hours assignmentduration:1800 # this HIT Lifetime value is 60*60*24*3 = 3 days hitlifetime:259200 # this Auto Approval period is 60*60*24*15 = 15 days autoapprovaldelay:1296000
  • 56. [task name].question <?xml version="1.0"?> <QuestionFormxmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd"> <Overview> <Title>Categorize an Image</Title> <Text>Product:</Text> </Overview> <Question> <QuestionIdentifier>category</QuestionIdentifier> <QuestionContent><Text>Category:</Text></QuestionContent> <AnswerSpecification> <SelectionAnswer> <MinSelectionCount>1</MinSelectionCount> <StyleSuggestion>radiobutton</StyleSuggestion> <Selections> #set ( $categories = [ "Airplane","Anchor","Beaver","Binocular”] ) #foreach ( $category in $categories ) <Selection> <SelectionIdentifier>$category</SelectionIdentifier> <Text>$category</Text> </Selection> #end </Selections> </SelectionAnswer> </AnswerSpecification> </Question> </QuestionForm>
  • 57. [task name].question <?xml version="1.0"?> <ExternalQuestionxmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"> <ExternalURL>http://mywebsite.com/experiment/index.htm</ExternalURL> <FrameHeight>600</FrameHeight> </ExternalQuestion>
  • 59. How to build an External HIT
  • 60. Synchronous Experiments Example research questions Market behavior under new mechanism Network dynamics (e.g., contagion) Multi-player games Typical tasks on MTurk don’t depend on each other can be split up, done in parallel How does one get many workers to do an experiment at the same time? Panel Waiting Room
  • 61. Social Dilemmas in Networks A social dilemma occurs when the interest of the individual is at odds with the interest of the collective. In social networking sites one’s contributions are only seen by friends. E.g. photos in Flickr, status updates in Facebook More contributions, more engaged group, better for everyone Why contribute when one can free ride?
  • 62. 47 Cycle Cliques Paired Cliques Random Regular Small World
  • 63.
  • 64. Only human contributions are included in averages
  • 67. Building the Panel Do experiments requiring 4-8 fresh players Waiting time is not too high Less consequences if there is a bug Ask if they would like to be notified of future studies 85% opt in rate for SW ‘10 78% opt in rate for MW ‘11
  • 68. NotifyWorkers MTurk API call that sends an e-mail to workers Notify them a day early Experiments work well 11am-5pm EST If n subjects are needed, notify 3n Done experiments with 45 players simultaneously
  • 69. Waiting Room … Workers need to start a synchronous experiment at the same time Workers show up at slightly different times Have workers wait at a page until enough arrive Show how many they are waiting for After enough arrive tell the rest experiment is full Funnel extra players into another instance of the experiment False True
  • 70. Attrition In lab experiments subjects rarely walk out On the web: Browsers/computers crash Internet connections go down Bosses walk in Need a timeout and a default action Discard experiments with < 90% human actions SW ‘10 discarded 21 of 94 experiments with 20-24 people Discard experiment where one player acted < 50% of the time MW ‘11 discarded 43 of 232 experiments with 16 people
  • 71. Quality Assurance Majority vote – Snow, O’Connor, Jurafsky, & Ng (2008) Machine learning with responses – Sheng, Provost, & Ipeirotis (2008) Iterative vs. Parallel tasks – Little, Chilton, Goldman, & Miller (2010) Mutual Information – Ipeirotis, Provost, & Wang (2010) Verifiable answers – Kittur, Chi, Suh (2008) Time to completion Monitor discussion on forums. MW ’11: Players followed guidelines about what not to talk about.
  • 72. Security of External HITs Code security Code is exposed to entire internet, susceptible to attacks SQL injection attacks: malicious user inputs database code to damage or get access to database Scrub input for dB commands Cross-site scripting attacks (XSS): malicious user injects code into HTTP request or HTML form Scrub input and _GET and _POST variables
  • 74. Security of External HITs Code security Code is exposed to entire internet, susceptible to attacks SQL injection attacks: malicious user inputs database code to damage or get access to database Scrub input for dB commands Cross-site scripting attacks (XSS): malicious user injects code into HTTP request or HTML form Scrub input and _GET and _POST variables Protocol Security HITsvs Assignments If you want fresh players in different runs (HITs) of a synchronous experiment, need to check workerIds Made a synchronous experiment with many HITs, one assignment each One worker accepted most of the HITs, did the quiz, got paid
  • 75. Use Cases Internal HITs External HITs Pilot survey Preference elicitation Training data for machine learning algorithms “Polling” for wisdom of crowds / general knowledge Testing market mechanisms Behavioral game theory experiments User-generated content Effects of incentives ANY online study can be done on Turk Can be used as recruitment tool
  • 76. Some open questions What are the conditions in which workers perform differently than the laboratory setting? How often does one person violate Amazon’s terms of service by controlling more than one worker account? Although the demographics of the workers on Mechanical Turk is clearly not a sample of either the U.S. or world populations, could one devise a statistical weighting scheme to achieve this? Since workers are tied to a unique identifier (their Worker ID) one could conduct long term, longitudinal studies about how their behavior changes over time.
  • 77. Thank you! Conducting Behavioral Research on Amazon's Mechanical Turk (in press) http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1691163
  • 79.
  • 80. HITs with the same title, description, pay rate, etc. are the same HIT type
  • 81. HITs are broken up into Assignments
  • 82.
  • 84. each player has endowment e
  • 85. player p contributes simultaneously with other players
  • 86.
  • 87.
  • 88.
  • 89. Pay $0.50 for passing the quiz plus 1/2 cent per point
  • 90. 10 rounds, 45 sec for first two, 30 sec for the rest
  • 92. Endowment for each round is 10 points
  • 93. Marginal rate of return is 0.4 (Fehr & Gachter ’00), regular graphs
  • 95. n = 4, 6, 8, 24
  • 96. Some experiments all fresh players, others repeat players64
  • 97. 65
  • 98.
  • 101. MTurk experiments replicate tightly controlled lab experiments very well
  • 102. Fehr & Gachter, American Economic Review, 2000
  • 103.
  • 104. Statistical Properties of Networks 68 All Networks 24 nodes 60 edges regular degree 5
  • 105.
  • 106. especially if you unify starting points
  • 108.
  • 109. Punishment: pay to deduct point from others
  • 110. Ostrom et al ‘92, Fehr & Gachter ’00 ’02,...
  • 111. Results in a loss of efficiency but enhances cooperation
  • 113. with reciprocity enhances cooperation: Rand et al ’09
  • 114. without reciprocity doesn’t enhance cooperation: Sefton et al ’07,...
  • 115. Can enhance cooperation if you change the game
  • 116. artificial seed players that always contribute 10 or 070
  • 117.
  • 118. WildCat Wells Same amount of information, so network is unknown to players Known maximum 15 rounds to explore No information when rounds skipped
  • 119. WildCat Wells Waiting room: shows how many players waiting, allows players to come and go, starts game when 16 players have joined Panel: Notify players by email & Turker Nation ahead of time.
  • 120. Significant differences in how quickly players converged No differences in how quickly the maximum was found (avg. = 5.43 rounds)

Editor's Notes

  1. Start: 5:45 end: 6:08
  2. Small screen shot?
  3. Open to anyone globally? Paid in dollars or rupees?Picture of cycle?
  4. Main point of this slide is: who are your competitors
  5. Picture here
  6. Picture here
  7. Picture here
  8. Last bullet point probably belongs elsewhere
  9. Picture here