Comparing human computation services
  Elena Simperl (University of Southampton)
Human computation
• Outsourcing tasks that machines find difficult
  to solve to humans (accuracy, efficiency,
  costs)
Dimensions of human computation
                                               See also [Quinn & Bederson, 2012]

• What is outsourced
  – Tasks that require human skills that cannot be easily replicated by
    machines (visual recognition, language understanding, knowledge
    acquisition, basic human communication etc)
  – Sometimes only certain steps of a task are outsourced to humans, the
    rest is executed automatically
• How is the task being outsourced
  – Tasks broken down into smaller units undertaken in parallel by
    different people
  – Coordination required to handle cases with more complex workflows
  – Partial or independent answers consolidated and aggregated into
    complete solution
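
The decomposition and consolidation steps above can be illustrated with a minimal sketch. It assumes a simple majority vote as the aggregation rule; the function names and the tie-handling policy are illustrative choices, not something the slides specify.

```python
from collections import Counter


def split_into_units(items, unit_size):
    """Break a list of work items into smaller units that can be handed out in parallel."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def aggregate(answers_per_item):
    """Consolidate independent answers per item by simple majority.

    Items with no answers or with a tie for first place map to None
    (assumption: such items would need further answers or manual review).
    """
    result = {}
    for item, answers in answers_per_item.items():
        counts = Counter(answers).most_common()
        if not counts:
            result[item] = None
        elif len(counts) > 1 and counts[0][1] == counts[1][1]:
            result[item] = None
        else:
            result[item] = counts[0][0]
    return result


units = split_into_units(["a", "b", "c", "d", "e"], 2)
solution = aggregate({"x": ["cat", "cat", "dog"], "y": ["1", "2"]})
```

Here `units` holds three units of at most two items each, and `solution` keeps an answer only for the item where one answer clearly dominates.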
Dimensions of human computation (2)
                                               See also [Quinn & Bederson, 2012]

• How are the results validated
   – Solutions space closed (choice of correct answer) vs open
     (collection of potential solutions)
   – Performance objectively measured or through ratings/votes
   – Statistical techniques employed to predict accurate solutions
       • May take into account confidence values of algorithmically generated solutions

• How can the overall process be optimized
   – Incentives and motivators (altruism, entertainment, intellectual challenge,
     social status, competition, financial compensation)
   – Assigning tasks to people based on their skills and performance (as
     opposed to random assignments)
   – Symbiotic combinations of human- and machine-driven computation,
     including combinations of different forms of crowdsourcing
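
As a rough illustration of skill-based versus random task assignment, the sketch below picks the worker with the best past accuracy for a task type. The `accuracy` table, the worker names and the default score of 0.0 for unseen task types are all assumptions made for the example.

```python
import random


def assign_by_skill(task_type, accuracy):
    """Return the worker with the highest recorded accuracy for task_type.

    accuracy maps worker -> {task_type: past accuracy in [0, 1]}.
    Workers without a record for the task type count as 0.0 (assumption).
    Workers are visited in sorted order so ties resolve deterministically.
    """
    return max(sorted(accuracy), key=lambda w: accuracy[w].get(task_type, 0.0))


def assign_randomly(accuracy, rng):
    """Baseline: pick any known worker uniformly at random."""
    return rng.choice(sorted(accuracy))


accuracy = {
    "ann": {"tagging": 0.9, "transcription": 0.6},
    "bob": {"tagging": 0.7, "transcription": 0.95},
}
best_tagger = assign_by_skill("tagging", accuracy)
anyone = assign_randomly(accuracy, random.Random(0))
```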
Games with a purpose (GWAP)
                                       See also [von Ahn & Dabbish, 2008]

• Human computation disguised as casual games
• Tasks are divided into parallelizable atomic units
  (challenges) solved (consensually) by players
• Game models
   – Single vs multi-player
   – Selection agreement vs input agreement vs inversion-problem games
Dimensions of GWAP design
• What tasks are amenable to 'GWAP-ification'
   –   Work is decomposable into simpler (nested) tasks
   –   Performance is measurable according to an obvious rewarding scheme
   –   Skills can be arranged in a smooth learning curve
   –   Player’s retention vs repetitive tasks
• Note: Not all domains are equally appealing
   – Application domain needs to attract a large user base
   – Knowledge corpus has to be large enough to avoid repetitions
   – Quality of automatically computed input may hamper game
     experience
• Attracting and retaining players
   – You need a critical mass of players to validate the results
   – Advertisement, building upon an existing user base
   – Continuous development
Microtask crowdsourcing
• Similar types of tasks, but a different incentive
  model (monetary reward)
• Successfully applied to transcription,
  classification, content generation, data
  collection, image tagging, website feedback,
  usability tests…
Our experiment
• Goals
  – Compare the two approaches for a given task
    (ontology engineering)
  – More general: description framework to compare
    different human computation models and use
    them in combination
• Set-up
  – Re-build OntoPronto within Amazon’s Mechanical
    Turk, based on existing OntoPronto data
OntoPronto
• Goal: extend the Proton
  upper-level ontology
• Multi-player (single player
  using pre-recorded rounds)
   – Step 1: topic of Wikipedia
     article classified as class or
     instance
   – Step 2: browsing the Proton
     hierarchy from the root to
     identify most specific class
     which matches the topic of
     the article
• Consensual answers,
  additional points for more
  specific classes
Validation of players' inputs
• A topic is played at least six times
• Number of consensual answers to each
  question at least four
• The number of consensual answers, modulo
  reliability, is more than half of the number of
  total answers received
  – Reliability measures the relation between consensual
    and correct answers given by a player
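
The three validation rules above can be sketched as a single check. This is one possible reading, not the authors' implementation: it assumes that "modulo reliability" means each consensual answer is weighted by the reliability (a value in [0, 1]) of the player who gave it, and that the play count is tracked per question. The constant and function names are hypothetical.

```python
from collections import Counter

MIN_PLAYS = 6        # a topic is played at least six times
MIN_CONSENSUAL = 4   # at least four consensual answers


def validate(answers):
    """Return the accepted answer for one question, or None.

    answers is a list of (answer, player_reliability) pairs.
    Assumption: the reliability-weighted count of consensual answers
    must exceed half of the total number of answers received.
    """
    if len(answers) < MIN_PLAYS:
        return None
    counts = Counter(answer for answer, _ in answers)
    top, top_count = counts.most_common(1)[0]
    if top_count < MIN_CONSENSUAL:
        return None
    weighted = sum(rel for answer, rel in answers if answer == top)
    if weighted <= len(answers) / 2:
        return None
    return top


reliable = [("class", 0.9)] * 5 + [("instance", 0.8)]
unreliable = [("class", 0.5)] * 4 + [("instance", 0.9)] * 2
```

With `reliable`, five consensual answers weighted at 0.9 clear the threshold of three, so the answer is accepted. With `unreliable`, four consensual answers weighted at 0.5 sum to 2.0 and the answer is rejected even though the raw count reaches four.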
Evaluation and collected data
• 270 distinct players, 365 Wikipedia articles,
  2905 game rounds

• Approach is effective
  – 77% of challenges solved consensually
  – If agreement, most answers correct (97%)
• …and efficient
  – 122 classes and entities extending Proton (after
    validation)
Implementation through MTurk
• Server-side component
   – Generates new HITs
   – Evaluates assignments of
     existing HITs
• Two types of HITs
   – Class or instance (1 cent)
   – Proton class (5 cents)
• HITs generated using title,
  first paragraph and first
  image (if available)
• Qualification test with
  five questions, turkers
  with at least 90%
  accepted tasks
Implementation through MTurk (2)
• Multiple assignments per HIT, four consensual
  answers needed
  – (number of answers needed for consensus - 1) x
    (number of available answer options) + 1
• HITs with (four) consensual answers are
  considered completed
• Assignments matching consensus accepted
• HIT costs maximally (number of answers needed
  for consensus) x (reward per correct assignment)
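
The two bounds above follow from a pigeonhole argument: with (needed - 1) x (options) answers, every option could still be one short of consensus, so one more answer forces some option to reach it. A small sketch of the arithmetic, using the slide's figures of four consensual answers and rewards of 1 and 5 cents; the function names are illustrative, and the two-option count for the class-or-instance HIT is inferred from its name.

```python
def max_assignments(needed_for_consensus, answer_options):
    """Upper bound on assignments before some option must reach consensus."""
    return (needed_for_consensus - 1) * answer_options + 1


def max_hit_cost_cents(needed_for_consensus, reward_cents):
    """Upper bound on the cost of one HIT: only assignments that match
    the consensus are accepted and paid."""
    return needed_for_consensus * reward_cents


# Class-or-instance HIT: two answer options, four answers needed, 1 cent each.
class_or_instance_assignments = max_assignments(4, 2)
class_or_instance_cost = max_hit_cost_cents(4, 1)

# Proton class HIT: 5 cents per accepted assignment. The number of answer
# options depends on the hierarchy level and is not given in the slides.
proton_class_cost = max_hit_cost_cents(4, 5)
```

Under these assumptions a class-or-instance HIT needs at most 7 assignments and costs at most 4 cents, and a Proton class HIT costs at most 20 cents.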
Evaluation and collected data
Development time and costs per contribution
• OntoPronto: five development months
• MTurk: one month
  – Additional effort required because of the setting
    of the experiment
  – Less effort as HIT design and validation
    mechanisms adopted from OntoPronto
• Average cost for a correct answer on MTurk:
  $0.74
Quality of contributions
• Both approaches resulted in high-quality data
• Diversity and biases (270 players vs 16 turkers)
  – Additional functionality of MTurk
• Game-based approach economic in the long
  run if player retention strategy available
• Microtask-based approach uses a 'predictable'
  motivation framework
• MTurk less diverse (270 players vs 16 turkers)
Challenges and open questions
• Synchronous vs asynchronous modes of
  interaction
  – Consensual answers, ratings by other turkers?
• Executing inter-dependent tasks in MTurk
  – Mapping game steps into HITs
  – Grouping HITs
• Using game-like interfaces within microtask
  crowdsourcing platforms
  – Impact on incentives and turkers' behavior?
• Using MTurk to test GWAP design decisions
Challenges and open questions (2)
• Descriptive framework for classification of human
  computation systems
   –   Types of tasks and their mode of execution
   –   Participants and their roles
   –   Interaction with system and among participants
   –   Validation of results
   –   Consolidation and aggregation of inputs into complete
       solution
• Reusable collection of algorithms for quality assurance,
  task assignment, workflow management, results
  consolidation etc
• Schemas recording provenance of crowdsourced data
S. Thaler, E. Simperl, S. Wölger. An experiment in
comparing human computation techniques. IEEE
     Internet Computing, 16(5): 52-58, 2012

             For more information
          email: e.simperl@soton.ac.uk
               twitter: @esimperl
Theory and practice of social machines

          http://sociam.org/www2013/

             Deadline: 25.02.2013
