Conducting Behavioral Research
                 with Crowdsourcing
(especially Amazon’s Mechanical Turk)


        Winter Mason       Siddharth Suri
   Stevens Institute of   Yahoo! Research
           Technology
Outline

   Peer Production vs. Human Computation vs.
    Crowdsourcing
   Peer Production & Citizen Science
   Crowdsourcing
   Mechanical Turk Basics
   Internal HITs
       Preference Elicitation
       Surveys
   External HITs
       Random Assignment
       Synchronous Experiments
   Conclusion
Definitions
   Peer Production
       Creation through distributed contributions
   Human Computation
       Computation with “humans in the loop” (Law & von Ahn, ‘11)
   Crowdsourcing
       Jobs outsourced to a group through an open call (Howe ‘06)
Examples of Modern Peer Production
   Peer production
       Open source software: Linux, Apache, Firefox; mash-ups
       Prediction markets: Iowa Electronic Markets, Hollywood Stock Exchange
       Collaborative knowledge: Wikipedia, Intellipedia, Yahoo! Answers, Amazon, Yelp, Epinions
       Social tagging communities: Flickr, del.icio.us
   Crowdsourcing
       ESP Game, Foldit, Galaxy Zoo, Threadless, Tagasauris, Innocentive, TopCoder, oDesk, Mechanical Turk
ESP Game
   Two player online game
   Players do not know who they are playing with
   Players cannot communicate
   Object of the game:
       Type the same word given an image
Games With a Purpose
   The outcome of the ESP game is labeled images.
   Google Images bought the ESP game, and has used
    it to improve image search.
   The contributions of the crowd are completely free
    for Google.
Foldit
   Foldit is an online game in
    which players fold proteins
    into different configurations
   Certain configurations earn
    more points than others
   The configurations
    correspond to physical
    structures:
       some amino acids must be
        near the center, and others
        outside
       some pairs of amino acids
        must be close together and
        others far apart
   Players of the game recently
    solved the structure of an
    AIDS-related enzyme that
    the scientific community had
    been unable to determine for a
    decade
Galaxy Zoo
   “Citizen Science”
   The number of images of galaxies
    taken by Hubble is immense.
   Computers can identify whether
    something is a galaxy, but cannot
    reliably tell what type of galaxy it is.
   By employing the crowd, Galaxy Zoo
    has classified over 50M galaxies.
   Astronomers used to assume that if
    a galaxy appears red in color, it is
    also probably an elliptical galaxy.
    Galaxy Zoo has shown that up to a
    third of red galaxies are actually
    spirals.
Tagasauris
                Magnum Photos has a very large
                 collection of mis- or unclassified
                 photos

                To get a handle on it, they asked
                 crowd-workers to tag their photos

                Through this process, in
                 combination with a knowledge
                 base, they discovered lost photos
                 from the movie, “American Graffiti”

                The actors were tagged individually
                 in the photos (like the one on the
                 right), and the system linked them
                 together and discovered they were
                 all related to the film.
Innocentive
                 A “Seeker” creates a
                  “challenge”, typically requiring
                  serious skill and technical ability

                 Multiple “Solvers” submit
                  detailed solutions to the
                  challenge. If the solution is
                  selected, they win the (typically
                  sizable) reward.

                 For instance, by creating a
                  durable & inexpensive solar
                  flashlight that could double as a
                  lamp, a retired engineer won
                  $20,000 and brought lighting to
                  many rural Africans.
TopCoder

              Programming jobs are
               offered as contests

              Coders submit their work,
               and the winner earns the
               reward

              Aside from the direct
               payoff, there are anecdotal
               reports of people being
               hired for permanent
               positions as a result of
               their contributions on
               TopCoder
oDesk
           Skilled crowdsourcing:
               for any job that requires some
                skills, but can be done entirely
                on a computer.

           Jobs are paid either as a flat,
            one-time reward, or on an
            hourly basis for longer
            contracts.

           Workers have extensive profiles
            & reputations, and wages are
            negotiated between Employer
            and Worker.

           Jobs cover a very large
            spectrum, and pay varies with
            skill
Amazon’s Mechanical Turk

                     The original
                      crowdsourcing platform

                     “The human inside the
                      machine”; built to
                      programmatically
                      incorporate human input

                     Jobs are meant to be
                      doable by any human, and
                      every worker is meant to
                      be completely
                      interchangeable.
Generally-Shared Features of Existing
Systems
   Contributions highly modular
     Minimal contribution is small
           Single edit, single line of code, single tag
       Low interdependence between separate contributions
           Same document or function

   Distribution of contributions highly skewed
     Small number of heavy contributors
           Wikipedia, AMT, Digg
       Large number of “free riders”
           Very common feature of public goods
What is Mechanical Turk?

   Crowdsourcing
     Jobs outsourced to a group through an open
      call (Howe ‘06)
   Online Labor Market
     Place for requesters to post jobs and workers
      to do them for pay
   Participant recruitment and reimbursement
     How can we use MTurk for behavioral
      research?
     What kinds of behavioral research can we use
      MTurk for?
Why Mechanical Turk?

   Subject pool size
        Central place for > 100,000 workers (Pontin ’07)
       Always-available subject pool
   Subject pool diversity
       Open to anyone globally with a computer, internet
        connection
   Low cost
        Reservation Wage: $1.38/hour (Chilton et al. ’10)
        Effective Wage:   $4.80/hour (Ipeirotis ’10)
   Faster theory/experiment cycle
       Hypothesis formulation
       Testing & evaluation of hypothesis
       New hypothesis tests
Validity of Worker Behavior
   (Quality-controlled) worker output can be as good as
    experts, sometimes better
       Labeling text with emotion (Snow, et al, 2008)
       Audio transcriptions (Marge, et al, 2010)
       Similarity judgments for music (Urbano, et al, 2010)
       Search relevance judgments (Alonso & Mizzaro, 2009)

   Experiments with workers replicate studies conducted in
    laboratory or other online settings
       Standard psychometric tests (Buhrmester, et al, 2011)
       Response in judgment and decision-making tests (Paolacci, et
        al, 2010)
       Responses in public good games (Suri & Watts, 2011)
Worker Demographics

   Self reported demographic
    information from 2,896 workers
     over 3 years (MW ’09, MW ’11, SW ’10)
   55% Female, 45% Male
       Similar to other internet panels (e.g.
        Goldstein)
   Age:
       Mean: 30 yrs,
       Median: 32 yrs
   Mean Income: $30,000 / yr
    Similar to Ipeirotis ’10, Ross et al.
     ’10
Internal Consistency of Demographics

   207 out of 2,896 workers did 2 of our studies
     Only 1 inconsistency on gender, age, income
      (0.4%)
   31 workers did ≥ 3 of our studies
     3 changed gender
     1 changed age (by 6 years)
     7 changed income bracket
   Strong internal consistency
Why Do Work on Mechanical Turk?

   “Mturk money is always necessary to make ends meet.”
     5% U.S., 13% India
   “Mturk money is irrelevant.”
     12% U.S., 10% India
   “Mturk is a fruitful way to spend free time and get some
    cash.”
     69% U.S., 59% India


                                 (Ross et al. ’10, Ipeirotis ’10)
Requesters

   Companies crowdsourcing part of their business
     Search companies: relevance
     Online stores: similar products from different
      stores (identifying competition)
     Online directories: accuracy, freshness of
      listings
     Researchers
   Intermediaries
      CrowdFlower (formerly Dolores Labs)
     Smartsheet.com
Common Tasks
   Image labeling
   Audio transcription
   Object / Website / Image classification
   Product evaluation
Uncommon tasks
   Workflow optimization
   Copy editing
   Product description
   Technical writing
Soylent
   Word processing with an embedded crowd (Bernstein
    et al, UIST 2010)
   Crowd proofreads each paragraph
   “Find-Fix-Verify” prevents “lazy worker” from ruining
    output
Find–Fix–Verify
   Find
       Identify one area that can be shortened without changing
        the meaning of the paragraph
   Fix
       Edit the highlighted section to shorten its length without
        changing the meaning of the paragraph
   Verify
       Choose one rewrite that fixes style errors and one that
        changes the meaning
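The three stages can be sketched as plain functions over worker responses; this is only an illustrative sketch (in a real deployment each stage is posted as its own set of HITs, and all the names below are hypothetical):

```javascript
// Find-Fix-Verify, sketched as pure functions over worker answers.

// Find: keep only regions flagged by at least `quorum` independent workers.
function findStage(flaggedRegions, quorum) {
  const counts = {};
  for (const region of flaggedRegions) {
    counts[region] = (counts[region] || 0) + 1;
  }
  return Object.keys(counts).filter((r) => counts[r] >= quorum);
}

// Fix: collect the non-empty candidate rewrites workers submitted
// for one flagged region.
function fixStage(rewrites) {
  return rewrites.filter((r) => r && r.trim().length > 0);
}

// Verify: drop rewrites that `quorum` or more voters flagged as
// changing the meaning; this is what blocks the "lazy worker".
function verifyStage(candidates, meaningChangeVotes, quorum) {
  return candidates.filter((c) => (meaningChangeVotes[c] || 0) < quorum);
}
```

Splitting the work this way keeps any single worker from both finding and "fixing" a passage unchecked, which is the point of the pattern.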
Iterative processes
   By building on each
    other’s work, the crowd
    can achieve remarkable
    outcomes

   Some tasks benefit from
    iterative processes,
    others from parallel
              (Little, et al, 2010)
TurkoMatic
 Crowd creates workflows
1. Ask workers to decompose task into steps
2. Ask if a step can be completed in 10 minutes
        If so, solve it
        If not, decompose the sub-task
3.   Combine outputs of sub-tasks into final output
                                     (Kulkarni et al., CHI 2011)
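The recursive decompose-and-solve control flow above can be sketched offline; the HIT-posting steps are passed in as functions so the recursion itself can be tested, and every name here is illustrative rather than Turkomatic's actual API:

```javascript
// Recursive task decomposition in the spirit of the workflow above.
// askSolve / askDecompose / combine stand in for posting HITs.
function solveTask(task, canSolveIn10Min, askSolve, askDecompose, combine) {
  if (canSolveIn10Min(task)) {
    return askSolve(task);             // small enough: one worker solves it
  }
  const subtasks = askDecompose(task); // otherwise workers split it up
  const parts = subtasks.map((t) =>
    solveTask(t, canSolveIn10Min, askSolve, askDecompose, combine));
  return combine(parts);               // merge sub-results into final output
}
```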
Turker Community
   Asymmetry in reputation mechanism

   Reputation of Workers is given by approval rating
     Requesters can reject work
     Requesters can refuse workers with low approval rates


   Reputation of Requesters is not built into MTurk
     Turkopticon: Workers rate requesters on
      communicativity, generosity, fairness and promptness
     Turker Nation: Online forum for workers
         Requesters should introduce themselves here
   Reputation matters, so abusive studies will fail quickly
Anatomy of a HIT

   HITs with the
    same title,
    description,
    pay rate, etc.
    are the same
    HIT type

   HITs are
    broken up into
    Assignments

   A worker
    cannot do
    more than 1
    assignment of
    a HIT
Anatomy of a HIT

   Requesters can set qualifications that determine who
    can work on the HIT
       e.g., only US workers, or workers with approval rating >
        90%
HIT GROUP

HIT 1: Which is the better translation for Táy?
           o Black
           o Night
       Assignment 1: “Black”  (Alice)
       Assignment 2: “Night”  (Bob)
       Assignment 3: “Black”  (Charlie)

HIT 2: Which is the better translation for Nedj?
           o Clean
           o White
HIT GROUP

HIT 1: Which is the better translation for Táy?
           o Black
           o Night

HIT 2: Which is the better translation for Nedj?
           o Clean
           o White
       Assignment 1: “White”  (Alice)
       Assignment 2: “White”  (Bob)
       Assignment 3: “White”  (David)
Requester                   Worker

 Build HIT
 Test HIT                   Search for HITs
 Post HIT                   Accept HIT
                            Do work
 Reject or Approve HIT      Submit HIT
Lifecycle of a HIT
   Requester builds a HIT
       Internal HITs are hosted by Amazon
       External HITs are hosted by the requester
       HITs can be tested on {requester,
        worker}sandbox.mturk.com
   Requester posts HIT on mturk.com
       Can post as many HITs as account can cover
   Workers do HIT and submit work
   Requester approves/rejects work
       Payment is rendered
       Amazon charges requesters 10%
   HIT completes when it expires or all assignments are
    completed
How Much to Pay?
   Pay rate can affect quantity of work
   Pay rate does not have a big impact on quality
    (MW ’09)

   [Figures: Number of Tasks Completed vs. Pay per Task; Accuracy vs. Pay per Task]
Completion Time

   Three 6-question multiple-
    choice surveys
   Launched at the same time of
    day and day of week
   $0.01, $0.03, $0.05
   Past a threshold, pay
    rate does not increase
    speed
   Start with a low pay rate
    and work up
Internal HITs
Internal HITs on AMT

   Template tool
   Variables
   Preference Elicitation
   Honesty study
AMT Templates

•   Hosted by Amazon

•   Set parameters for HIT
    •   Title
    •   Description
    •   Keywords
    •   Reward
    •   Assignments per HIT
    •   Qualifications
    •   Time per assignment
    •   HIT expiration
    •   Auto-approve time



•   Design an HTML form
Variables in Templates

        Example: Preference Elicitation

                  ${movie1}   ${movie2}
HIT 1             img1.jpg    img2.jpg
HIT 2             img1.jpg    img3.jpg
HIT 3             img1.jpg    img4.jpg
HIT 4             img2.jpg    img3.jpg
HIT 5             img2.jpg    img4.jpg
HIT 6             img3.jpg    img4.jpg

Which would you prefer to watch?
<img src="http://www.sid.com/${movie1}">
<img src="http://www.sid.com/${movie2}">
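The template variables above are filled from a tab-delimited input file, one HIT per row, with a header row naming the variables (this matches the `[task name].input` format used by the command-line tools later in the deck; the file name and values here are illustrative):

```
movie1	movie2
img1.jpg	img2.jpg
img1.jpg	img3.jpg
img1.jpg	img4.jpg
img2.jpg	img3.jpg
img2.jpg	img4.jpg
img3.jpg	img4.jpg
```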
Variables in Templates
       Example: Preference Elicitation
              HIT 1
Which would you prefer to watch?



                                                 HIT 6
                                   Which would you prefer to watch?
How to build an Internal HIT
Cross Cultural Studies: 2 Methods

   Self-reported:
       Ask workers demographic questions, do experiment
   Qualifications:
       Restrict HITs to a worker’s country of origin using MTurk
        qualifications


   Honesty experiment:
       Ask workers to roll a die (or go to a website that
        simulates one), pay $0.25 times the self-reported roll.
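As a sanity check on the payment scheme, the expected payout for an honest worker is easy to compute, since a fair die averages 3.5 pips:

```javascript
// Expected payment for truthfully reporting one fair die roll,
// at a given rate per reported pip.
function expectedHonestPayout(perPip) {
  let mean = 0;
  for (let pips = 1; pips <= 6; pips++) {
    mean += pips / 6;        // each face is equally likely
  }
  return perPip * mean;      // 0.25 * 3.5 = 0.875
}
```

An honest population should average $0.875 at $0.25/pip; a reported mean of 3.91 (about $0.98) makes over-reporting detectable in aggregate even though no individual report can be called a lie.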
One die, $0.25 + $0.25 / pip
   Average reported roll
    significantly higher than
    expected
       M = 3.91, p < 0.0005
   Players under-reported
    ones and twos and
    over-reported fives
   Replicates F & H
Dishonesty by Gender
   Men are more likely to
    over-report sixes

   Women are more likely
    to over-report fives
Dishonesty by Country
   Indians are more likely
    to over-report sixes

   Americans are more
    likely to over-report
    fives

   Might be conflated with
    gender
Dishonesty by Gender & Country
External HITs
External HITs on AMT

   Flexible survey
   Random Assignment
   Synchronous Experiments
   Security
Random Assignment
   One HIT, multiple Assignments
       Only post once, or delete repeat submissions
   Preview page neutral for all conditions

   Once HIT accepted:
       If new, record WorkerID and AssignmentID, assign to condition
       If old, get condition, “push” worker to last seen state of study
   For wage conditions, pay through bonuses

   Intent to treat:
       Keep track of attrition by condition
       Example: Noisy sites decrease reading comprehension
       BUT find no difference between conditions
       Why? Most people in noisy condition dropped out, only people left
        were deaf!
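The new-vs.-returning logic above can be sketched server-side: the first visit draws a random condition and records it, and every later visit gets the stored condition back, so reloading the page cannot re-randomize a worker. The in-memory Map stands in for a database table keyed by WorkerID; all names are illustrative:

```javascript
// Sticky random assignment keyed by WorkerID.
const assignments = new Map();

function getCondition(workerId, numConditions, rng = Math.random) {
  if (!assignments.has(workerId)) {
    // New worker: draw a condition uniformly at random and record it.
    assignments.set(workerId, Math.floor(rng() * numConditions));
  }
  // Returning worker: same condition as before.
  return assignments.get(workerId);
}
```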
Javascript on Internal HIT
<html><div id="page"></div></html>

<script type="text/javascript">
// Must run after the div exists, or getElementById returns null
var condition = Math.floor(Math.random() * 2);
var pagetext;
switch (condition)
{
  case 0:
    pagetext = "Condition 1";
    break;
  case 1:
    pagetext = "Condition 2";
    break;
}
// Plain DOM elements use innerHTML; .html() is jQuery, not DOM
document.getElementById("page").innerHTML = pagetext;
</script>
Privacy survey
   External HIT
       Random order of
        answers
       Random order of
        questions
       Pop-out questions based
        on answers

   Changed wording on
    question from
    Annenberg study:
Do you want the websites you
 visit to show you ads that are
 {tailored, relevant} to your
 interests?
Results

   Replicated original study
   Found effect of differences in wording

   [Figure: response proportions (Yes / No / Maybe) for Annenberg, MTurk, and “Relevant” wording]
Results

   Replicated original study
   Found effect of differences in wording
                           BUT
   Not representative sample
   Results not replicated in subsequent phone survey

   [Figure: response proportions (Yes / No / Maybe) for Annenberg, MTurk, and “Relevant” wording]
Financial Incentives
& the performance of crowds

Manipulated
   Task Value
       Amount earned per image set
           $0.01, $0.05, $0.10
           No additional pay for image sets
   Difficulty
       Number of images per set
           2, 3, 4

Measured
   Quantity
       Number of image sets submitted
   Quality
       Proportion of image sets correctly sorted
       Rank correlation of image sets with correct order
Results
   Pay rate can affect quantity of work
   Pay rate does not have a big impact on quality
    (MW ’09)

   [Figures: Number of Tasks Completed vs. Pay per Task; Accuracy vs. Pay per Task]
Quality Assurance
   Majority vote – Snow, O’Connor, Jurafsky, & Ng (2008)
   Machine learning with responses – Sheng, Provost, & Ipeirotis
    (2008)
   Iterative vs. Parallel tasks – Little, Chilton, Goldman, & Miller (2010)
   Mutual Information – Ipeirotis, Provost, & Wang (2010)

   Verifiable answers – Kittur, Chi, Suh (2008)
   Time to completion
   Honeypot tasks

   Monitor discussion on forums. MW ’11: Players followed guidelines
    about what not to talk about.
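The simplest of these schemes, majority vote over redundant labels, can be sketched in a few lines (an illustrative baseline, not the authors' code):

```javascript
// Majority vote over redundant worker labels for one item:
// one label per assignment in, the most frequent label out.
function majorityVote(labels) {
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  let best = null, bestCount = -1;
  for (const [label, count] of counts) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}
```

With an odd number of assignments per HIT there is always a strict winner for binary labels, which is one reason assignment counts like 3 or 5 are common.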
How to build an External HIT
Synchronous Experiments
   Example research questions
       Market behavior under new mechanism
       Network dynamics (e.g., contagion)
       Multi-player games

   Typical tasks on MTurk don’t depend on each other
     can be split up, done in parallel


   How does one get many workers to do an
    experiment at the same time?
     Panel
     Waiting Room
Social Dilemmas in Networks
   A social dilemma occurs
    when the interest of the
    individual is at odds with the
    interest of the collective.
   In social networking sites
    one’s contributions are only
    seen by friends.
       E.g. photos in Flickr, status
        updates in Facebook
       More contributions, more
        engaged group, better for
        everyone
       Why contribute when one can
        free ride?
   [Figure: network topologies used in the experiments: Cliques, Cycle, Paired Cliques, Small World, Random Regular]
Effect of Seed Nodes

•   10-seeds: 13 trials
•   0-seeds: 17 trials
•   Only human
    contributions are
    included in averages
•   People are
    conditional
    cooperators
    • Fischbacher et al.
      ’01
Building the Panel
   Do experiments requiring 4-8
    fresh players
     Waiting time is not too high
     Fewer consequences if there
       is a bug

   Ask if they would like to be
    notified of future studies
     85% opt-in rate for SW ’10
     78% opt-in rate for MW ’11
NotifyWorkers

   MTurk API call that sends an e-mail to workers

   Notify them a day early

   Experiments work well 11am-5pm EST

   If n subjects are needed, notify 3n
     We have run experiments with 45 players
       simultaneously
Waiting Room
   Workers need to start a synchronous
    experiment at the same time
   Workers show up at slightly different times
   Have workers wait at a page until enough
    arrive
     Show how many they are waiting for
     After enough arrive, tell the rest the
      experiment is full
     Funnel extra players into another
      instance of the experiment
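The waiting-room bookkeeping can be sketched as a small counter on the server: each arrival either sees how many players are still needed or triggers the start, and anyone who arrives after a room fills is funneled into the next instance. The in-memory state stands in for the server's, and all names are illustrative:

```javascript
// Waiting-room sketch: `needed` players per experiment instance.
function makeWaitingRoom(needed) {
  let instance = 0;   // which experiment instance is filling
  let waiting = 0;    // players currently in the room
  return {
    arrive() {
      waiting++;
      const status = {
        instance,
        stillNeeded: Math.max(0, needed - waiting), // shown to waiters
        starting: waiting === needed,
      };
      if (waiting === needed) {
        // Room full: start this instance, open a fresh one for
        // later arrivals instead of turning them all away.
        instance++;
        waiting = 0;
      }
      return status;
    },
  };
}
```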
Attrition
   In lab experiments subjects rarely walk out
   On the web:
     Browsers/computers crash
     Internet connections go down
     Bosses walk in
   Need a timeout and a default action
     Discard experiments with < 90% human actions
           SW ’10 discarded 21 of 94 experiments with 20-24
            people
       Discard experiments where one player acted <
        50% of the time
           MW ’11 discarded 43 of 232 experiments with 16
            people
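The timeout-plus-default rule can be sketched as follows: a move that misses the deadline is replaced by a default action and flagged as non-human, so runs that fall below the human-action threshold can be discarded afterwards (illustrative names, in-memory bookkeeping only):

```javascript
// Substitute a default action when a player's move times out,
// and flag the move so attrition can be measured per experiment.
function resolveMove(submittedMove, elapsedMs, timeoutMs, defaultMove) {
  if (submittedMove !== null && elapsedMs <= timeoutMs) {
    return { move: submittedMove, human: true };
  }
  return { move: defaultMove, human: false };  // timed out: default fills in
}

// Fraction of human moves, used to decide whether to discard the run.
function humanFraction(moves) {
  return moves.filter((m) => m.human).length / moves.length;
}
```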
Security of External HITs
   Code security
       Code is exposed to entire internet, susceptible to
        attacks
           SQL injection attacks: malicious user inputs database code
            to damage or get access to database
                Scrub input for DB commands
           Cross-site scripting attacks (XSS): malicious user injects
            code into HTTP request or HTML form
               Scrub input and _GET and _POST variables
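Two small illustrations of the scrubbing idea follow; note that whitelisting known-good formats beats blacklisting attack strings, and that parameterized SQL queries and proper output escaping remain the real defense (both function names are hypothetical):

```javascript
// A MTurk workerId is alphanumeric, so rejecting anything else
// blocks SQL metacharacters and script tags in that field outright.
function isValidWorkerId(input) {
  return /^[A-Z0-9]+$/i.test(input);
}

// For free-text answers, escape HTML metacharacters before echoing
// the value back into a page, which defuses injected <script> tags.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```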
Checking results
Security of External HITs
   Protocol Security
       HITs vs Assignments
           If you want fresh players in different runs (HITs) of a
            synchronous experiment, you need to check workerIds
           We once made a synchronous experiment with many HITs, one
            assignment each
           One worker accepted most of the HITs, did the quiz, and got
            paid
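The workerId check above amounts to keeping a set of everyone who has already played and turning repeats away at the door; a minimal sketch, with an in-memory Set standing in for a database:

```javascript
// Fresh-player check across runs (HITs) of a synchronous experiment.
const seenWorkers = new Set();

function admitIfFresh(workerId) {
  if (seenWorkers.has(workerId)) {
    return false;             // repeat player across HITs: turn away
  }
  seenWorkers.add(workerId);  // first time: record and admit
  return true;
}
```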
Use Cases

Internal HITs                 External HITs

   Pilot survey                 Testing market
   Preference elicitation        mechanisms
   Training data for            Behavioral game theory
    machine learning              experiments
    algorithms                   User-generated content
   “Polling” for wisdom of      Effects of incentives
    crowds / general
    knowledge
            ANY online study can be done on Turk
              Can be used as recruitment tool
Thank you!

Conducting Behavioral Research on Amazon's Mechanical Turk
(2011) Behavior Research Methods
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1691163
Main API Functions
   CreateHIT (Requirements, Pay rate, Description) – returns HIT Id and
    HIT Type Id

   SubmitAssignment (AssignmentId) – notifies Amazon that this
    assignment has been completed

   ApproveAssignment (AssignmentID) – Requester accepts assignment,
    money is transferred, also RejectAssignment

   GrantBonus (WorkerID, Amount, Message) – Gives the worker the
    specified bonus and sends a message; should have a failsafe

   NotifyWorkers (list of WorkerIds, Message) – e-mails message to the
    workers.
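One shape the GrantBonus failsafe can take: record which (worker, reason) pairs have already been paid, so that a payment script that crashes and is re-run cannot bonus the same worker twice. The `api` object is an injected stand-in for the real MTurk client, and all names are illustrative:

```javascript
// Idempotent bonus payment: each (workerId, reason) pair pays at most once.
function makeBonusPayer(api) {
  const paid = new Set();
  return function grantBonusOnce(workerId, amount, reason) {
    const key = workerId + "|" + reason;
    if (paid.has(key)) {
      return false;           // failsafe: already paid for this reason
    }
    api.grantBonus(workerId, amount, reason);
    paid.add(key);            // record only after the call succeeds
    return true;
  };
}
```

In production the `paid` set would live in a database so it survives restarts; that persistence is exactly what makes the failsafe work.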
Command-line Tools
   Configuration files
       mturk.properties – for interacting with MTurk API
       [task name].input – variable name & values by row
       [task name].properties – HIT parameters
       [task name].question – XML file
   Shell scripts
       run.sh – post HIT to Mechanical Turk (creates .success file)
       getResults.sh – download results (using .success file)
       reviewResults.sh – approve or reject assignments
       approveAndDeleteResults.sh – approve & delete all
        unreviewed HITs
   Output files
       [task name].success – created HIT ID & Assignment IDs
       [task name].results – tab-delimited output from workers
mturk.properties
access_key=ABCDEF0123455676789
secret_key=Fa234asOIU/as92345kasSDfq3rDSF


#service_url=http://mechanicalturk.sandbox.amazonaws.com/?Service=AWSMechanicalTurkRequester
service_url=http://mechanicalturk.amazonaws.com/?Service=AWSMechanicalTurkRequester


# You should not need to adjust these values.
retriable_errors=Server.ServiceUnavailable,503
retry_attempts=6
retry_delay_millis=500
[task name].properties
title: Categorize Web Sites

description: Look at URLs, rate, and classify them. These websites have not
been screened for adult content!

keywords: URL, categorize, web sites
reward: 0.01
assignments: 10
annotation:

# this Assignment Duration value is 30 * 60 = 1800 seconds = 0.5 hours
assignmentduration:1800

# this HIT Lifetime value is 60*60*24*3 = 3 days
hitlifetime:259200

# this Auto Approval period is 60*60*24*15 = 15 days
autoapprovaldelay:1296000
[task name].question
<?xml version="1.0"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-
   14/ExternalQuestion.xsd">
  <ExternalURL>http://mywebsite.com/experiment/index.htm</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
[task name].results

hitid                           assignmentid                    workerid        accepted                      submitted                     feedback  reject  Answer.bonus
14SBGDGM5ZHZFE3OU26DJESC20DXKY  1BPE1URVWQKM6DSG40MWDVKIAJ93B4  A2IB92P5729K3Q  Sat Oct 02 16:03:49 EDT 2010  Sat Oct 02 16:43:55 EDT 2010                    1.39
14SBGDGM5ZHZFE3OU26DJESC20DXKY  1GMFLPGSL0NMWZJSTFXNJ1FS74J6KW  A2LKKOAIMEF1PT  Sat Oct 02 16:10:23 EDT 2010  Sat Oct 02 16:44:33 EDT 2010                    1.54
14SBGDGM5ZHZFE3OU26DJESC20DXKY  1VQ5ID82X6TJXBU4EKXYISVF8C4BWJ  A15T1WFW5B2OPR  Sat Oct 02 16:13:22 EDT 2010  Sat Oct 02 16:44:56 EDT 2010                    1.49
14SBGDGM5ZHZFE3OU26DJESC20DXKY  16XXR2KPFCB31UOCMBG78KLMAD4HND  A16ME0W2U4THE0  Sat Oct 02 16:00:21 EDT 2010  Sat Oct 02 16:45:08 EDT 2010                    1.67

 
Ntegra 20231003 v3.pptx
Ntegra 20231003 v3.pptxNtegra 20231003 v3.pptx
Ntegra 20231003 v3.pptx
 
SBQS 2013 Keynote: Cooperative Testing and Analysis
SBQS 2013 Keynote: Cooperative Testing and AnalysisSBQS 2013 Keynote: Cooperative Testing and Analysis
SBQS 2013 Keynote: Cooperative Testing and Analysis
 
許永真/Crowd Computing for Big and Deep AI
許永真/Crowd Computing for Big and Deep AI許永真/Crowd Computing for Big and Deep AI
許永真/Crowd Computing for Big and Deep AI
 
Big, Open, Data and Semantics for Real-World Application Near You
Big, Open, Data and Semantics for Real-World Application Near YouBig, Open, Data and Semantics for Real-World Application Near You
Big, Open, Data and Semantics for Real-World Application Near You
 
artificial intelligence
artificial intelligenceartificial intelligence
artificial intelligence
 
Rise of AI through DL
Rise of AI through DLRise of AI through DL
Rise of AI through DL
 
Intro to artificial intelligence
Intro to artificial intelligence Intro to artificial intelligence
Intro to artificial intelligence
 
20211103 jim spohrer oecd ai_science_productivity_panel v5
20211103 jim spohrer oecd ai_science_productivity_panel v520211103 jim spohrer oecd ai_science_productivity_panel v5
20211103 jim spohrer oecd ai_science_productivity_panel v5
 
#1NWebinar: Digital on the Runway
#1NWebinar: Digital on the Runway#1NWebinar: Digital on the Runway
#1NWebinar: Digital on the Runway
 
SCONUL Summer Conference 2018 - Nicole coleman
SCONUL Summer Conference 2018 - Nicole colemanSCONUL Summer Conference 2018 - Nicole coleman
SCONUL Summer Conference 2018 - Nicole coleman
 

Dernier

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Dernier (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Conducting Behavioral Research with Crowdsourcing Platforms Like Mechanical Turk

  • 1. Conducting Behavioral Research with Crowdsourcing (especially Amazon’s Mechanical Turk). Winter Mason, Stevens Institute of Technology; Siddharth Suri, Yahoo! Research
  • 2. Outline  Peer Production vs. Human Computation vs. Crowdsourcing  Peer Production & Citizen Science  Crowdsourcing  Mechanical Turk Basics  Internal HITs  Preference Elicitation  Surveys  External HITs  Random Assignment  Synchronous Experiments  Conclusion
  • 3. Definitions  Peer Production  Creation through distributed contributions  Human Computation  Computation with “humans in the loop” (Law & von Ahn, ‘11)  Crowdsourcing  Jobs outsourced to a group through an open call (Howe ‘06)
  • 4. Examples of Modern Peer Production  Open source software: Linux, Apache, Firefox; mash-ups  Prediction Markets: Iowa Electronic Markets, Hollywood Stock Exchange  Collaborative Knowledge: Wikipedia, Intellipedia, Yahoo! Answers, Amazon, Yelp, Epinions  Social Tagging Communities: Flickr, Del.icio.us  Crowdsourcing: ESP Game, Fold-it!, galaxyZoo, threadless, Tagasauris, Innocentive, TopCoder, oDesk, Mechanical Turk
  • 5. ESP Game  Two player online game  Players do not know who they are playing with  Players cannot communicate  Object of the game:  Type the same word given an image
  • 6.
  • 7. Games With a Purpose  The outcome of the ESP game is labeled images.  Google Images bought the ESP game, and has used it to improve image search.  The contributions of the crowd are completely free for Google.
  • 8. Fold.It!  Fold.it is an online game in which players fold proteins into different configurations  Certain configurations earn more points than others  The configurations correspond to physical structures:  some amino acids must be near the center, and others outside  some pairs of amino acids must be close together and others far apart  Players of the game recently unlocked the structure of an AIDS-related enzyme that the scientific community had been unable to unlock for a decade
  • 9. galaxyZoo  “Citizen Science”  The number of images of galaxies taken by Hubble is immense.  Computers can identify whether something is a galaxy, but not what type of galaxy it is (reliably).  By employing the crowd, galaxyZoo has classified over 50M galaxies.  Astronomers used to assume that if a galaxy appears red in color, it is also probably an elliptical galaxy. Galaxy Zoo has shown that up to a third of red galaxies are actually spirals.
  • 10. Tagasauris  Magnum Photos has a very large collection of mis- or unclassified photos  To get a handle on it, they asked crowd-workers to tag their photos  Through this process, in combination with a knowledge base, they discovered lost photos from the movie, “American Graffiti”  The actors were tagged individually in the photos (like the one on the right), and the system linked them together and discovered they were all related to the film.
  • 11. Innocentive  A “Seeker” creates a “challenge”, typically requiring serious skill and technical ability  Multiple “Solvers” submit detailed solutions to the challenge. If the solution is selected, they win the (typically sizable) reward.  For instance, by creating a durable & inexpensive solar flashlight that could double as a lamp, a retired engineer won $20,000 and brought lighting to many rural Africans.
  • 12. topCoder  Programming jobs are offered as contests  Coders submit their work, and the winner earns the reward  Aside from the direct payoff, there are anecdotal reports of people being hired for permanent positions as a result of their contributions on TopCoder
  • 13. oDesk  Skilled crowdsourcing: for any job that requires some skills, but can be done entirely on a computer.  Jobs are paid either as a flat, one-time reward, or on an hourly basis for longer contracts.  Workers have extensive profiles & reputations, and wages are negotiated between Employer and Worker.  Jobs cover a very large spectrum, and pay varies with skill
  • 14. Amazon’s Mechanical Turk  The original crowdsourcing platform  “The human inside the machine”; built to programmatically incorporate human input  Jobs are meant to be doable by any human, and every worker is meant to be completely interchangeable.
  • 15. Generally-Shared Features of Existing Systems  Contributions highly modular  Minimal contribution is small  Single edit, single line of code, single tag  Low interdependence between separate contributions  Same document or function  Distribution of contributions highly skewed  Small number of heavy contributors  Wikipedia, AMT, Digg  Large number of “free riders”  Very common feature of public goods
  • 16. What is Mechanical Turk?  Crowdsourcing  Jobs outsourced to a group through an open call (Howe ‘06)  Online Labor Market  Place for requesters to post jobs and workers to do them for pay  Participant recruitment and reimbursement  How can we use MTurk for behavioral research?  What kinds of behavioral research can we use MTurk for?
  • 17. Why Mechanical Turk?  Subject pool size  Central place for > 100,000 workers (Pontin '07)  Always-available subject pool  Subject pool diversity  Open to anyone globally with a computer, internet connection  Low cost  Reservation Wage: $1.38/hour (Chilton et al '10)  Effective Wage: $4.80/hour (Ipeirotis '10)  Faster theory/experiment cycle  Hypothesis formulation  Testing & evaluation of hypothesis  New hypothesis tests
  • 18. Validity of Worker Behavior  (Quality-controlled) worker output can be as good as experts, sometimes better  Labeling text with emotion (Snow, et al, 2008)  Audio transcriptions (Marge, et al, 2010)  Similarity judgments for music (Urbano, et al, 2010)  Search relevance judgments (Alonso & Mizzaro, 2009)  Experiments with workers replicate studies conducted in laboratory or other online settings  Standard psychometric tests (Buhrmester, et al, 2011)  Response in judgment and decision-making tests (Paolacci, et al, 2010)  Responses in public good games (Suri & Watts, 2011)
  • 19. Worker Demographics  Self reported demographic information from 2,896 workers over 3 years (MW '09, MW '11, SW '10)  55% Female, 45% Male  Similar to other internet panels (e.g. Goldstein)  Age:  Mean: 30 yrs,  Median: 32 yrs  Mean Income: $30,000 / yr  Similar to Ipeirotis '10, Ross et al '10
  • 20. Internal Consistency of Demographics  207 out of 2,896 workers did 2 of our studies  Only 1 inconsistency on gender, age, income (0.4%)  31 workers did ≥ 3 of our studies  3 changed gender  1 changed age (by 6 years)  7 changed income bracket  Strong internal consistency
  • 21. Why Do Work on Mechanical Turk?  “Mturk money is always necessary to make ends meet.”  5% U.S. 13% India  “Mturk money is irrelevant.”  12% U.S. 10% India  “Mturk is a fruitful way to spend free time and get some cash.”  69% U.S. 59% India (Ross et al '10, Ipeirotis '10)
  • 22. Requesters  Companies crowdsourcing part of their business  Search companies: relevance  Online stores: similar products from different stores (identifying competition)  Online directories: accuracy, freshness of listings  Researchers  Intermediaries  CrowdFlower (formerly Delores Labs)  Smartsheet.com
  • 23. Common Tasks  Image labeling  Audio transcription  Object / Website / Image classification  Product evaluation
  • 24. Uncommon tasks  Workflow optimization  Copy editing  Product description  Technical writing
  • 25. Soylent  Word processing with an embedded crowd (Bernstein et al, UIST 2010)  Crowd proofreads each paragraph  “Find-Fix-Verify” prevents “lazy worker” from ruining output
  • 26. Find–Fix–Verify  Find  Identify one area that can be shortened without changing the meaning of the paragraph  Fix  Edit the highlighted section to shorten its length without changing the meaning of the paragraph  Verify  Choose one rewrite that fixes style errors and one that changes the meaning
  • 27. Iterative processes  By building on each other's work, the crowd can achieve remarkable outcomes  Some tasks benefit from iterative processes, others from parallel (Little, et al, 2010)
  • 28. TurkoMatic  Crowd creates workflows 1. Ask workers to decompose task into steps 2. Ask if a step can be completed in 10 minutes  If so, solve it  If not, decompose the sub-task 3. Combine outputs of sub-tasks into final output (Kulkarni et al, CHI 2011)
  • 29. Turker Community  Asymmetry in reputation mechanism  Reputation of Workers is given by approval rating  Requesters can reject work  Requesters can refuse workers with low approval rates  Reputation of Requesters is not built in to Mturk  Turkopticon: Workers rate requesters on communicativity, generosity, fairness and promptness  Turker Nation: Online forum for workers  Requesters should introduce themselves here  Reputation matters, so abusive studies will fail quickly
  • 30. Anatomy of a HIT  HITs with the same title, description, pay rate, etc. are the same HIT type  HITs are broken up into Assignments  A worker cannot do more than 1 assignment of a HIT
  • 31. Anatomy of a HIT  HITs with the same title, description, pay rate, etc. are the same HIT type  HITs are broken up into Assignments  A worker cannot do more than 1 assignment of a HIT  Requesters can set qualifications that determine who can work on the HIT, e.g., only US workers, workers with approval rating > 90%
  • 32. Anatomy of a HIT  HITs with the same title, description, pay rate, etc. are the same HIT type  HITs are broken up into Assignments  A worker cannot do more than 1 assignment of a HIT
  • 33. [Diagram: a HIT group with two HITs. HIT 1: “Which is the better translation for Táy? (Black / Night)”; HIT 2: “Which is the better translation for Nedj? (Clean / White)”. HIT 1’s three assignments are completed by Alice (“Black”), Bob (“Night”), and Charlie (“Black”).]
  • 34. [Diagram, continued: HIT 2’s three assignments are completed by Alice (“White”), Bob (“White”), and David (“White”). A worker may complete assignments of different HITs in the group, but only one assignment per HIT.]
  • 35. [Diagram: requester/worker workflow. Requester: Build HIT → Test HIT → Post HIT → Reject or Approve HIT. Worker: Search for HITs → Accept HIT → Do work → Submit HIT.]
  • 36. Lifecycle of a HIT  Requester builds a HIT  Internal HITs are hosted by Amazon  External HITs are hosted by the requester  HITs can be tested on {requester, worker}sandbox.mturk.com  Requester posts HIT on mturk.com  Can post as many HITs as account can cover  Workers do HIT and submit work  Requester approves/rejects work  Payment is rendered  Amazon charges requesters 10%  HIT completes when it expires or all assignments are completed
  • 37. How Much to Pay?  Pay rate can affect quantity of work  Pay rate does not have a big impact on quality (MW '09)  [Charts: Number of Tasks Completed vs. Pay per Task; Accuracy vs. Pay per Task]
  • 38. Completion Time  Three 6-question multiple-choice surveys  Launched same time of day, day of week  $0.01, $0.03, $0.05  Past a threshold, pay rate does not increase speed  Start with a low pay rate and work up
  • 40. Internal HITs on AMT  Template tool  Variables  Preference Elicitation  Honesty study
  • 41. AMT Templates • Hosted by Amazon • Set parameters for HIT • Title • Description • Keywords • Reward • Assignments per HIT • Qualifications • Time per assignment • HIT expiration • Auto-approve time • Design an HTML form
  • 42. Variables in Templates  Example: Preference Elicitation  Template asks “Which would you prefer to watch?” and shows <img src=www.sid.com/${movie1}> and <img src=www.sid.com/${movie2}>  The input file pairs the images, one HIT per row: HIT 1: img1.jpg, img2.jpg; HIT 2: img1.jpg, img3.jpg; HIT 3: img1.jpg, img4.jpg; HIT 4: img2.jpg, img3.jpg; HIT 5: img2.jpg, img4.jpg; HIT 6: img3.jpg, img4.jpg
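The six HITs in this example are simply the pairwise combinations of four images, so the template input file can be generated rather than typed by hand. A minimal sketch (the filenames and the `movie1`/`movie2` column names mirror the slide's example; the tab-delimited, header-first layout is the format the MTurk template tool expects):

```python
from itertools import combinations

def make_template_input(images):
    """Build a tab-delimited MTurk template input file:
    one row per unordered pair of images."""
    rows = ["movie1\tmovie2"]  # header names must match the ${movie1}/${movie2} variables
    for a, b in combinations(images, 2):
        rows.append(f"{a}\t{b}")
    return "\n".join(rows)

print(make_template_input(["img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]))
```

Four images yield the six rows shown on the slide; n images yield n(n-1)/2 HITs, which is worth keeping in mind when budgeting a preference-elicitation study.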
  • 43. Variables in Templates  Example: Preference Elicitation  [Screenshots: the rendered template for HIT 1 and HIT 6, each asking “Which would you prefer to watch?” with the corresponding pair of images.]
  • 44. How to build an Internal HIT
  • 45. Cross Cultural Studies: 2 Methods  Self-reported:  Ask workers demographic questions, do experiment  Qualifications:  Restrict HITs to worker's country of origin using MTurk qualifications  Honesty experiment:  Ask workers to roll a die (or go to a website that simulates one), pay $0.25 times the self-reported roll.
  • 46. One die, $0.25 + $0.25 / pip  Average reported roll significantly higher than expected  M = 3.91, p < 0.0005  Players under-reported ones and twos and over-reported fives  Replicates F & H
  • 47. Dishonesty by Gender  Men are more likely to over-report sixes  Women are more likely to over-report fives
  • 48. Dishonesty by Country  Indians are more likely to over-report sixes  Americans are more likely to over-report fives  Might be conflated with gender
  • 49. Dishonesty by Gender & Country
  • 51. External HITs on AMT  Flexible survey  Random Assignment  Synchronous Experiments  Security
  • 52. Random Assignment  One HIT, multiple Assignments  Only post once, or delete repeat submissions  Preview page neutral for all conditions  Once HIT accepted:  If new, record WorkerID, Assignment ID assign to condition  If old, get condition, “push” worker to last seen state of study  Wage conditions = pay through bonus  Intent to treat:  Keep track of attrition by condition  Example: Noisy sites decrease reading comprehension  BUT find no difference between conditions  Why? Most people in noisy condition dropped out, only people left were deaf!
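The "if new, assign to condition; if old, push back to the last seen state" logic amounts to a sticky assignment table keyed by WorkerID. A simplified illustration (in a real External HIT the table would live in a server-side database, not a Python dict; the function name is mine):

```python
import random

def assign_condition(worker_id, assignments, n_conditions=2, rng=random):
    """Sticky random assignment: a new worker gets a uniformly random
    condition; a returning worker gets the condition they already had."""
    if worker_id not in assignments:
        assignments[worker_id] = rng.randrange(n_conditions)
    return assignments[worker_id]

seen = {}
first = assign_condition("W1", seen)
# The same worker always lands in the same condition on return visits,
# so refreshing the page cannot re-roll the assignment.
assert assign_condition("W1", seen) == first
```

Keeping the table keyed by WorkerID (not AssignmentID) is what prevents a worker from sampling several conditions, and it also gives you the per-condition attrition counts needed for the intent-to-treat check described above.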
  • 53. Javascript on Internal HIT
<html><div id="page"></div></html>
<script type="text/javascript">
  var condition = Math.floor(Math.random() * 2);
  var pagetext;
  switch (condition) {
    case 0: pagetext = "Condition 1"; break;
    case 1: pagetext = "Condition 2"; break;
  }
  // Assign via .innerHTML (not .html(), which is jQuery syntax); the div
  // must already exist when the script runs, so it comes first.
  document.getElementById("page").innerHTML = pagetext;
</script>
  • 54. Privacy survey  External HIT  Random order of answers  Random order of questions  Pop-out questions based on answers  Changed wording on question from Annenberg study: Do you want the websites you visit to show you ads that are {tailored, relevant} to your interests?
  • 55. Results  Replicated original study  Found effect of differences in wording  [Chart: Yes / No / Maybe response shares for Annenberg, MTurk, and MTurk “Relevant” wording]
  • 56. Results, BUT  Replicated original study and found effect of differences in wording  Not a representative sample  Results not replicated in subsequent phone survey
  • 57. Financial Incentives & the Performance of Crowds  Manipulated:  Task Value: amount earned per image set ($0.01, $0.05, $0.10); no additional pay for image sets  Difficulty: number of images per set (2, 3, 4)  Measured:  Quantity: number of image sets submitted  Quality: proportion of image sets correctly sorted; rank correlation of image sets with correct order
  • 58.
  • 59. Results  Pay rate can affect quantity of work  Pay rate does not have a big impact on quality (MW '09)  [Charts: Number of Tasks Completed vs. Pay per Task; Accuracy vs. Pay per Task]
  • 60. Quality Assurance  Majority vote – Snow, O'Connor, Jurafsky, & Ng (2008)  Machine learning with responses – Sheng, Provost, & Ipeirotis (2008)  Iterative vs. Parallel tasks – Little, Chilton, Goldman, & Miller (2010)  Mutual Information – Ipeirotis, Provost, & Wang (2010)  Verifiable answers – Kittur, Chi, Suh (2008)  Time to completion  Honeypot tasks  Monitor discussion on forums. MW '11: Players followed guidelines about what not to talk about.
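The first scheme in this list, majority vote over redundant assignments, is also the simplest to implement. A minimal sketch (function name is mine; one call per item, fed the labels from that item's assignments):

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate redundant worker labels for one item by majority vote.
    Ties go to whichever label appeared first in the list."""
    return Counter(labels).most_common(1)[0][0]

# Three workers labeled the same image; two said "cat", one said "dog".
assert majority_vote(["cat", "dog", "cat"]) == "cat"
```

With three assignments per HIT this already filters most single-worker noise; the more elaborate schemes cited above (worker-weighted voting, mutual information) matter when workers differ widely in reliability.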
  • 61. How to build an External HIT
  • 62. Synchronous Experiments  Example research questions  Market behavior under new mechanism  Network dynamics (e.g., contagion)  Multi-player games  Typical tasks on MTurk don't depend on each other  can be split up, done in parallel  How does one get many workers to do an experiment at the same time?  Panel  Waiting Room
  • 63. Social Dilemmas in Networks  A social dilemma occurs when the interest of the individual is at odds with the interest of the collective.  In social networking sites one's contributions are only seen by friends.  E.g. photos in Flickr, status updates in Facebook  More contributions, more engaged group, better for everyone  Why contribute when one can free ride?
  • 64. [Diagram: network topologies used in the experiments: Cycle, Cliques, Paired Cliques, Small World, Random Regular]
  • 65. Effect of Seed Nodes  10-seeds: 13 trials; 0-seeds: 17 trials  Only human contributions are included in averages  People are conditional cooperators (Fischbacher et al. '01)
  • 66. Building the Panel  Do experiments requiring 4-8 fresh players  Waiting time is not too high  Less consequences if there is a bug  Ask if they would like to be notified of future studies  85% opt in rate for SW '10  78% opt in rate for MW '11
  • 67. NotifyWorkers  MTurk API call that sends an e-mail to workers  Notify them a day early  Experiments work well 11am-5pm EST  If n subjects are needed, notify 3n  Done experiments with 45 players simultaneously
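In today's boto3 MTurk client this API call is `notify_workers`, which accepts a limited number of WorkerIds per request (100 in the current API, an assumption worth checking against the AWS documentation), so the "notify 3n workers" step usually needs batching. The helper below is testable locally; the actual call is sketched in comments:

```python
def batch_worker_ids(worker_ids, batch_size=100):
    """Split a recipient list into batches small enough for one
    NotifyWorkers request each."""
    return [worker_ids[i:i + batch_size]
            for i in range(0, len(worker_ids), batch_size)]

# If 45 subjects are needed, notify roughly 3x that many panel members.
panel = [f"W{i}" for i in range(135)]   # hypothetical worker IDs
batches = batch_worker_ids(panel)

# Hedged sketch of the real call (not executed here):
# import boto3
# mturk = boto3.client("mturk")
# for b in batches:
#     mturk.notify_workers(Subject="Experiment tomorrow, 11am-5pm EST",
#                          MessageText="...", WorkerIds=b)
```

Sending the notification a day early, as the slide suggests, gives the panel time to plan around the synchronous start time.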
  • 68. Waiting Room  Workers need to start a synchronous experiment at the same time, but they show up at slightly different times  Have workers wait at a page until enough arrive  Show how many they are waiting for  After enough arrive, tell the rest the experiment is full  Funnel extra players into another instance of the experiment
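The funneling described above amounts to a capacity counter per experiment instance: arrivals fill the current run, and overflow starts a new one. A minimal server-side sketch (class and method names are illustrative, not from any MTurk library):

```python
class WaitingRoom:
    """Hold arrivals until the current experiment instance is full,
    then funnel overflow into a fresh instance."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.instances = [[]]            # each inner list is one experiment run

    def arrive(self, worker_id):
        current = self.instances[-1]
        if len(current) == self.capacity:    # current run is full
            self.instances.append([])        # start another instance
            current = self.instances[-1]
        current.append(worker_id)
        return len(self.instances) - 1       # index of the run this worker joined

room = WaitingRoom(capacity=4)
joined = [room.arrive(f"W{i}") for i in range(6)]
# First four workers fill instance 0; the two extras start instance 1.
```

The waiting page would poll this state to show "waiting for k more players" and redirect each worker into their assigned instance once it fills.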
  • 69. Attrition  In lab experiments subjects rarely walk out  On the web:  Browsers/computers crash  Internet connections go down  Bosses walk in  Need a timeout and a default action  Discard experiments with < 90% human actions  SW '10 discarded 21 of 94 experiments with 20-24 people  Discard experiments where one player acted < 50% of the time  MW '11 discarded 43 of 232 experiments with 16 people
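The 90%-human-actions discard rule can be applied mechanically once each recorded action is flagged as human or timeout-default. A small helper (names are illustrative):

```python
def keep_experiment(human_flags, min_human_frac=0.9):
    """Keep an experiment only if at least min_human_frac of its actions
    were made by humans rather than filled in by the timeout default."""
    return sum(human_flags) / len(human_flags) >= min_human_frac

# 9 of 10 actions human -> keep; 8 of 10 -> discard.
assert keep_experiment([True] * 9 + [False])
assert not keep_experiment([True] * 8 + [False] * 2)
```

The per-player variant from the slide (discard if any single player acted less than 50% of the time) is the same check run on each player's own action list with a 0.5 threshold.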
  • 70. Security of External HITs  Code security  Code is exposed to entire internet, susceptible to attacks  SQL injection attacks: malicious user inputs database code to damage or get access to database  Scrub input for dB commands  Cross-site scripting attacks (XSS): malicious user injects code into HTTP request or HTML form  Scrub input and _GET and _POST variables
  • 72. Security of External HITs  Protocol security  HITs vs. Assignments  If you want fresh players in different runs (HITs) of a synchronous experiment, you need to check workerIds  We once made a synchronous experiment with many HITs, one assignment each: one worker accepted most of the HITs, did the quiz, and got paid
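  The workerId check motivated by that story can be as simple as this (a sketch; the class name is mine): record every workerId on arrival and turn away repeats, so one worker cannot occupy multiple HITs of the same experiment.

```python
class FreshPlayerGate:
    """Admit each workerId at most once across all HITs of one experiment."""

    def __init__(self):
        self.seen = set()

    def admit(self, worker_id):
        if worker_id in self.seen:
            return False  # already played in another HIT of this experiment
        self.seen.add(worker_id)
        return True
```

  In a real deployment the `seen` set lives in the experiment's database rather than in memory, so the check survives server restarts and works across web server processes.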
  • 73. Use Cases  Internal HITs: pilot surveys; preference elicitation; training data for machine learning algorithms; "polling" for wisdom of crowds / general knowledge  External HITs: testing market mechanisms; behavioral game theory experiments; user-generated content; effects of incentives  ANY online study can be done on Turk  Can be used as a recruitment tool
  • 74. Thank you! Conducting Behavioral Research on Amazon's Mechanical Turk (2011) Behavior Research Methods http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1691163
  • 75. Main API Functions  CreateHIT (Requirements, Pay rate, Description) – returns HIT Id and HIT Type Id  SubmitAssignment (AssignmentId) – notifies Amazon that this assignment has been completed  ApproveAssignment (AssignmentId) – Requester accepts the assignment and money is transferred; also RejectAssignment  GrantBonus (WorkerId, Amount, Message) – gives the worker the specified bonus and sends the message; should have a failsafe  NotifyWorkers (list of WorkerIds, Message) – e-mails the message to the workers
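  A sketch of the GrantBonus failsafe the slide recommends, in boto3 terms (`send_bonus` is the modern equivalent of GrantBonus; the `paid` ledger and function name are mine): record which (worker, assignment) pairs have already been paid, so re-running a payment script after a crash cannot double-pay anyone.

```python
def grant_bonus_once(client, paid, worker_id, assignment_id, amount, reason):
    """Grant a bonus only if this (worker, assignment) pair is unpaid.
    `paid` is a persistent set of already-bonused pairs."""
    key = (worker_id, assignment_id)
    if key in paid:
        return False  # failsafe: already bonused, skip the API call
    client.send_bonus(WorkerId=worker_id,
                      AssignmentId=assignment_id,
                      BonusAmount=str(amount),
                      Reason=reason)
    paid.add(key)
    return True
```

  In practice `paid` should be backed by a database, not an in-memory set; boto3's `send_bonus` also accepts a `UniqueRequestToken` parameter that provides a similar server-side guard against duplicate payments.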
  • 76. Command-line Tools  Configuration files  mturk.properties – for interacting with MTurk API  [task name].input – variable name & values by row  [task name].properties – HIT parameters  [task name].question – XML file  Shell scripts  run.sh – post HIT to Mechanical Turk (creates .success file)  getResults.sh – download results (using .success file)  reviewResults.sh – approve or reject assignments  approveAndDeleteResults.sh – approve & delete all unreviewed HITs  Output files  [task name].success – created HIT ID & Assignment IDs  [task name].results – tab-delimited output from workers
  • 77. mturk.properties
access_key=ABCDEF0123455676789
secret_key=Fa234asOIU/as92345kasSDfq3rDSF
#service_url=http://mechanicalturk.sandbox.amazonaws.com/?Service=AWSMechanicalTurkRequester
service_url=http://mechanicalturk.amazonaws.com/?Service=AWSMechanicalTurkRequester
# You should not need to adjust these values.
retriable_errors=Server.ServiceUnavailable,503
retry_attempts=6
retry_delay_millis=500
  • 78. [task name].properties
title: Categorize Web Sites
description: Look at URLs, rate, and classify them. These websites have not been screened for adult content!
keywords: URL, categorize, web sites
reward: 0.01
assignments: 10
annotation:
# this Assignment Duration value is 30 * 60 = 1800 seconds (0.5 hours)
assignmentduration:1800
# this HIT Lifetime value is 60*60*24*3 = 3 days
hitlifetime:259200
# this Auto Approval period is 60*60*24*15 = 15 days
autoapprovaldelay:1296000
  • 79. [task name].question
<?xml version="1.0"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>http://mywebsite.com/experiment/index.htm</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
  • 80. [task name].results (tab-delimited; columns: hitid, assignmentid, workerid, accepted, submitted, Answer.feedback, reject, bonus)
hitid (same for all rows): 14SBGDFE3OU26DJESC2GM5ZHZ0DXKY
assignmentid / workerid / accepted / submitted / bonus:
1BPE1URVWQKM6DSG40MWDVKIAJ93B4  A2IB92P5729K3Q  Sat Oct 02 16:03:49 EDT 2010  Sat Oct 02 16:43:55 EDT 2010  1.39
1GMFLPGSL0NMWZJSTFXNJ1FS74J6KW  A2LKKOAIMEF1PT  Sat Oct 02 16:10:23 EDT 2010  Sat Oct 02 16:44:33 EDT 2010  1.54
1VQ5ID82X6TJXBU4EKXYISVF8C4BWJ  A15T1WFW5B2OPR  Sat Oct 02 16:13:22 EDT 2010  Sat Oct 02 16:44:56 EDT 2010  1.49
16XXR2KPFCB31UOCMBG78KLMAD4HND  A16ME0W2U4THE0  Sat Oct 02 16:00:21 EDT 2010  Sat Oct 02 16:45:08 EDT 2010  1.67

Editor's notes

  1. Start: 5:45 end: 6:08
  2. Small screen shot?
  3. Small screen shot?
  4. Open to anyone globally? Paid in dollars or rupees? Picture of cycle?
  5. Main point of this slide is: who are your competitors
  6. Picture here
  7. Picture here
  8. Picture here
  9. Why not over-report sixes?
  10. Last bullet point probably belongs elsewhere
  11. Picture here