Crowdsourcing Research Opportunities:
Lessons from Natural Language Processing
  Marta Sabou, Kalina Bontcheva, Arno Scharl
Crowdsourcing
Undefined and generally large group
Crowdsourcing in Science
Crowdsourcing for NLP
Challenges
Crowdsourcing in science is not new




Sir Francis Galton, “VOX POPULI”



Citizen science, from the early 20th century: 60,000 – 80,000 volunteers yearly
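
Galton's "Vox Populi" experiment amounts to aggregating many independent numeric guesses into one estimate. A minimal Python sketch of that idea follows. The guesses and the true weight are made-up illustrative values, not Galton's data, and `aggregate_guesses` is a hypothetical helper name.

```python
from statistics import mean, median


def aggregate_guesses(guesses):
    """Return the mean and median of a list of numeric guesses."""
    if not guesses:
        raise ValueError("need at least one guess")
    return mean(guesses), median(guesses)


# Hypothetical guesses (in pounds); NOT Galton's original data.
guesses = [1100, 1250, 1180, 1300, 1150, 1220]
true_weight = 1200  # assumed value, for illustration only

crowd_mean, crowd_median = aggregate_guesses(guesses)
mean_error = abs(crowd_mean - true_weight)
best_individual_error = min(abs(g - true_weight) for g in guesses)
```

In this toy data set every individual guess is off by at least 20 pounds, while the aggregate lands on the assumed true value. Real crowds will not be this tidy.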
Genre 1: Mechanised Labour
• Participants (workers) are paid a small amount of money to complete easy tasks (HIT = Human Intelligence Task)
Genre 2: Games with a purpose
From 2008
240k players
Crowdsourcing via Facebook
Genre 3: Altruistic Crowdsourcing

>250K players
>670K players
Crowdsourcing in Science - Typical Use
[Figure: diagram with the stages Input, Process/Algorithm, Output and Evaluation]
• Harness human intuition to prune solution space
• Form-based data collection
• Labeling, Classification
• Surveys
Crowdsourcing in Science
Crowdsourcing for NLP
Challenges
Crowdsourcing in NLP
Papers relying on crowdsourcing in major NLP venues
Crowdsourcing Genres in NLP
Benefit 1: Affordable, Large-Scale Resources
• A variety of small to medium-sized resources can be obtained for as little as $100 using Amazon Mechanical Turk (AMT)
• Crowdsourcing is also cost-effective for large resources (Poesio, 2012)

Method                       Cost per label ($)   Cost of 1M labels ($)
Traditional, high quality    1.00                 1,000,000
Mechanical Turk              0.38                 380,000 (<40%)
Game                         0.19                 217,000 (20%)
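
The comparison in the figures above is simple arithmetic: a per-label price multiplied by the number of labels, expressed as a share of the traditional baseline. The sketch below reproduces that calculation in Python from the per-label prices on the slide. The function names are illustrative, and a flat per-label price with no fixed costs is an assumption. For the game row the plain product (0.19 × 1M = 190,000) is lower than the 217,000 listed on the slide. The slide does not say what accounts for the difference, so the code computes only the per-label product.

```python
from decimal import Decimal

# Per-label prices in US dollars, as listed on the slide.
PRICE_PER_LABEL = {
    "Traditional, high quality": Decimal("1.00"),
    "Mechanical Turk": Decimal("0.38"),
    "Game": Decimal("0.19"),
}


def total_cost(price_per_label, n_labels):
    """Cost of n_labels at a flat per-label price (no fixed costs assumed)."""
    return price_per_label * n_labels


def share_of_baseline(cost, baseline_cost):
    """Cost as a percentage of the baseline cost."""
    return cost / baseline_cost * 100


n_labels = 1_000_000
baseline = total_cost(PRICE_PER_LABEL["Traditional, high quality"], n_labels)

for method, price in PRICE_PER_LABEL.items():
    cost = total_cost(price, n_labels)
    print(f"{method}: ${cost:,.0f} ({share_of_baseline(cost, baseline):.0f}% of baseline)")
```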
Benefit 2: Diversification of research
Challenge 1: Contributor Selection and Training
• From: prior to resource creation
• To: during the resource creation
Challenge 2: Aggregation and Quality Control

• From: a few experts' annotations
• To: multiple, noisy annotations from non-experts
• Approach 1: Statistical techniques
  • Simplest (and most popular): majority voting
  • More complex: machine learning model trained on various features
• Approach 2: Crowdsourcing the QC process itself
  • HIT1 (Create): "Translate the following sentence:"
  • HIT2 (Verify): "Which of these 5 sentences is the best translation?"
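
Majority voting, the simplest aggregation technique named above, can be sketched in a few lines of Python. This is an illustrative sketch only: the item ids and labels are invented, `majority_vote` is a hypothetical helper, and strictly speaking it implements plurality voting (the most frequent label wins), returning `None` when the top labels are tied.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label, or None on a tie for first place."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no winning label; handle separately
    return counts[0][0]


# Hypothetical annotations: item id -> labels from different workers.
annotations = {
    "sent-1": ["positive", "positive", "negative"],
    "sent-2": ["neutral", "negative", "negative", "negative"],
    "sent-3": ["positive", "negative"],
}

aggregated = {item: majority_vote(labels) for item, labels in annotations.items()}
```

Collecting an odd number of judgements per item reduces ties for binary labels. Tied items could also be sent out for further annotation.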
Conclusions (What have we learned from NLP?)

• Crowdsourcing is revolutionising NLP research
  • Cheaper resource acquisition
  • Diversification of research agenda
• But requires more complex methodologies
  • For contributor management
  • For quality control and data aggregation
• Other findings: most popular
  • Genre: mechanised labour
  • Task: acquiring input data
  • Problem: solving subjective tasks
Crowdsourcing in Science
Crowdsourcing for NLP
Challenges
User Motivation

• Motivating users
  • Motivations for scientific projects might differ
  • Task granularity might impact motivation
• Promoting learning and science
  • Advertise STEM research to young people
  • Support learning and self-improvement through participation in crowdsourcing
Legal and Ethical Issues
• Acknowledging the crowd's contribution
  • S. Cooper, [other authors], and Foldit players: Predicting protein structures with a multiplayer online game. Nature, 466(7307):756-760, 2010.
• Ensuring privacy and wellbeing
  • Mechanised labour criticised for low wages (<$2/hour) and lack of worker rights
  • Prevent addiction, prolonged use and user exploitation
• Licensing and consent
  • Some projects clearly state the use of Creative Commons licenses
  • General failure to provide informed consent information
Technical Issues
• Scaling up to large resources
• Preventing bias
• Increasing repeatability
  • Through reuse of crowdsourcing elements (e.g., HIT templates)
• uComp - Embedded Human Computation for Knowledge Extraction and Evaluation
  • 3-year project, starting November 2012
  • Develops a scalable and generic HC (human computation) framework for knowledge creation
  • Provides reusable HC elements
Thank you!

Editor's notes

  1. How does crowdsourcing relate to Research 2.0? My talk will illustrate how certain web technologies can reduce the gap between scientists on the one hand and ordinary citizens on the other, thus enabling a certain form of Research 2.0. If Web 2.0 is often associated with "user-generated content", Research 2.0, at least the kind enabled by crowdsourcing, is "user-generated/supported science". Taking the field of NLP as an example, I will discuss how crowdsourcing is changing research practices and its effect on this scientific discipline. Research 2.0 deals with the involvement of the web in science. It spans from the use of Web 2.0 tools and technologies in research to a more open and sharing approach to science. Some definitions of Research 2.0 even include notions of a methodological change due to the abundance of data and the nature of the socio-technical systems on the web. Of interest here is the change in scientific practices due to the involvement of Research 2.0 tools and technologies in the research process, and the effects this has on science itself.
  2. But not projects that:
     • Do not have the creation of scientific data as their main goal (e.g., Wikipedia)
     • Use crowds to support auxiliary scientific processes (e.g., Mendeley)
     • Recruit online but experiment in the lab
     • Recruit processing power and NOT human effort (SETI@home)
     • Have scientific staff alone as contributors (e.g., collaboratories)
  4. In fact, as early as 1907, Sir Francis Galton (Darwin's cousin and a brilliant Victorian scientist) published a Nature article entitled "Vox Populi" (the voice of the people, or the voice of the crowd), in which he describes his experiment at a livestock fair: 787 people were asked to estimate the weight of an ox and, while none came close to the real value, the mean of the guesses was almost spot-on. Meanwhile, other societies were using the crowd differently, namely to help them gather scientific data. From the early 20th century, the Audubon Society has relied on volunteers to count local bird species. Their campaigns continue to this day: in 2012, volunteers submitted over 100,000 checklists, leading to observations of 623 species and over 17 million individual birds. These activities are often termed citizen science. This is not a novel phenomenon: citizen science projects have been around since the beginning of the last century (at least). There is a vast landscape and variety of citizen science projects where scientists call on the public for help - some examples, including from Lora's paper (her talk might have some mentions as well). IT enables virtual citizen science projects, and this upsurge is a direct consequence of new and improved ways to involve the public in scientific processes.
  5. Participants contribute while having fun. From a Nature blog post (13 Apr 2012, 16:35 EDT, posted by Rebecca Hersher): "Two years ago, FoldIt made headlines, lots of them, when players of the online protein-folding video game took three weeks to solve the three dimensional structure of a simian retroviral protein that is used in animal models of HIV, but whose structure had eluded biochemists for more than a decade." http://blogs.nature.com/spoonful/2012/04/foldit-games-next-play-crowdsourcing-better-drug-design.html Phylo is an experimental video game about multiple sequence alignment optimisation: "Since the launch in November 2010, we received more than 350,000 solutions submitted from more than 12,000 registered users. Our results show that solutions submitted contributed to improving the accuracy of up to 70% of the alignment blocks considered." It is about showing that humans can aid algorithms rather than comparing human and machine performance.
  6. In 2008, the group built a Facebook game that required players to rate the sentiment associated with a sentence on a 5-value scale, then used the ratings as a training corpus for the sentiment detection module. Over 800 players played the game. In 2009 the game was released in a slightly different form, with the aim of gathering sentiment lexicons, i.e., associations between words and their sentiment polarity (ratings from as many as 12 players were averaged to get the final value). The game ran in 7 different languages and attracted over 4000 players. Let this be an introductory example of a crowdsourcing project; however, crowdsourcing is not a new phenomenon.
  7. Volunteers contribute because they are interested in a domain or support a cause.
  8. More languages, e.g., Urdu, Arabic, Haitian Creole; Irvine and Klementiev create lexicons between English and 37 low-resource languages. Diverse types of text (besides newswire): emails, Twitter feeds, augmentative and alternative communication texts. Speech: transcription, accent rating, assessment of dialog systems. Subjective tasks: sentiment detection, translation, word sense disambiguation, anaphora resolution, question answering, textual entailment, text summarization, etc. Niche language phenomena. Lab experiments reproduced at a fraction of their cost, e.g., contextual predictivity (Cloze task), corpus trends.
  9. Completely new with respect to traditional approaches. Uses "create-verify" workflows. A widespread technique for translation tasks, less so for labeling.
  10. STEM (Science, Technology, Engineering, Mathematics): harness increased visibility and ease of engagement in social networks to make STEM research more attractive and understandable, so that more young people study STEM.
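
The "create-verify" workflow mentioned in the notes (a first HIT collects candidate outputs, a second HIT asks other workers to pick the best one) can be sketched as below. This is a minimal, hypothetical simulation: the candidate translations and votes are invented, `select_best_candidate` is an illustrative name, and no real crowdsourcing platform API is called.

```python
from collections import Counter


def select_best_candidate(candidates, votes):
    """Pick the candidate chosen most often in the verify step.

    candidates: list of outputs collected in the create step (HIT1).
    votes: list of candidate indices chosen by verifiers (HIT2).
    Ties are broken in favour of the lowest candidate index.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    valid = [v for v in votes if 0 <= v < len(candidates)]
    if not valid:
        raise ValueError("need at least one valid vote")
    counts = Counter(valid)
    best_index = min(counts, key=lambda i: (-counts[i], i))
    return candidates[best_index], counts[best_index]


# Hypothetical create step: translations of one sentence by three workers.
candidates = [
    "The cat sits on the mat.",
    "Cat is sitting at mat.",
    "The cat is on the mat.",
]
# Hypothetical verify step: each verifier votes for one candidate index.
votes = [0, 2, 0, 0, 1]

best, n_votes = select_best_candidate(candidates, votes)
```

The tie-breaking rule here is arbitrary. A real workflow might instead collect more verification votes for tied candidates.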