SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
Using crowdsourcing for Semantic
     Web applications and tools
           Elena Simperl, Karlsruhe Institute of Technology, Germany
                      Talk at SWAT4LS Summer School, Aveiro, Portugal
                                        May 2012




6/7/2012                               www.insemtives.eu                1
Semantic technologies are mainly
    about automation…
• …but many tasks in semantic content
  authoring fundamentally rely on human input
  – Modeling a domain
  – Understanding text and media content (in all their
    forms and languages)
  – Integrating data sources originating from different
    contexts
Incentives and motivators
• What motivates people to engage with an application?
• Which rewards are effective and when?

• Motivation is the driving force that makes humans
  achieve their goals
• Incentives are ‘rewards’ assigned by an external ‘judge’
  to a performer for undertaking a specific task
   – Common belief (among economists): incentives can be
     translated into a sum of money for all practical purposes
• Incentives can be related to extrinsic and intrinsic
  motivations
Examples of applications




            www.insemtives.eu   4
Incentives and motivators (2)
• Successful volunteer crowdsourcing is difficult to
  predict or replicate
   – Highly context-specific
   – Not applicable to arbitrary tasks


• Reward models often easier to study and control
  (if performance can be reliably measured)
   – Different models: pay-per-time, pay-per-unit, winner-takes-it-
     all…
   – Not always easy to abstract from social aspects (free-riding,
     social pressure…)
   – May undermine intrinsic motivation
Examples (2)




Mason & Watts: Financial incentives and the performance of the crowds, HCOMP 2009.
Amazon‘s Mechanical Turk
  • Successfully applied to transcription, classification, and
    content generation, data collection, image tagging, website
    feedback, usability tests…*
  • Increasingly used by academia for evaluation purposes
  • Extensions for quality assurance, complex workflows,
    resource management, vertical domains…




* http://behind-the-enemy-lines.blogspot.com/2010/10/what-tasks-are-posted-on-mechanical.html
What tasks can be (microtask-)
    crowdsourced?
• Best case
  – Routine work requiring common knowledge,
    decomposable into simpler, independent sub- tasks,
    performance easily measurable, no spam
• Ongoing research in task design, quality
  assurance, estimated time of completion…

• Example: open-scale tasks in MTurk
  – Generate, then vote.
  – Introduce random noise to identify potential issues in
    the second step                            Label   Correct




                                                                         Vote answers
                                               Generate answer
                                                                 image                  or not?
Examples (2)




           www.insemtives.eu   9
GWAPs and gamification
 • GWAPs: human computation disguised as casual games
 • Gamification/game mechanics: integrate game elements
   to applications*
     – Accelerated feedback cycles
          • Annual performance appraisals vs immediate feedback to maintain
            engagement
     – Clear goals and rules of play
          • Players feel empowered to achieve goals vs fuzzy, complex system of
            rules in real-world
     – Compelling narrative
          • Gamification builds a narrative that engages players to participate and
            achieve the goals of the activity
     – But in the end it’s about what tasks users
     want to get better at

*http://www.gartner.com/it/page.jsp?id=1629214
What tasks can be gamified?*
•    Work is decomposable into simpler tasks
•    Tasks are nested
•    Performance is measurable
•    One can define an obvious rewarding scheme
•    Skills can be arranged in a smooth learning
     curve


    *http://www.lostgarden.com/2008/06/what-actitivies-that-can-be-turned-into.html
What is different about semantic
      systems?
• It‘s still about the context
  of the actual application

• User engagement with
  semantic tasks to
   – Ensure knowledge is
     relevant and up-to-date
   – People accept the new
     solution and understand its
     benefits
   – Avoid cold-start problems
   – Optimize maintenance
     costs
What do you want your
    users to do?
• Semantic applications
  – Context of the actual application
  – Need to involve users in knowledge engineering tasks?
     • Incentives are related to organizational and social factors
     • Seamless integration of new features
• Semantic tools
  – Game mechanics
  – Paid crowdsourcing (possibly integrated with the tool)
• Using results of casual games

        http://gapingvoid.com/2011/06/07/pixie-dust-the-mountain-of-mediocrity/
Crowdsourcing knowledge
       engineering
• Granularity of activities is
  typically too high
• Further splitting is needed
• Crowdsource very specific tasks
  that are (highly) divisible
   –Labeling (in different languages)
   –Finding relationships
   –Populating the ontology
   –Aligning and interlinking
   –Ontology-based annotation
   –Validating the results of automatic
    methods
   –…
                          www.insemtives.eu   14
Example: ontology building




6/7/2012              www.insemtives.eu   15
Example: relationship finding
Example: video annotation




          www.insemtives.eu   17
Example: ontology alignment




          www.insemtives.eu   18
Example: ontology evaluation




           www.insemtives.eu   19
OntoGame API
• API that provides several methods that are shared by the
  OntoGame games, such as
      –    Different agreement types (e.g. selection agreement)
      –    Input matching (e.g. , majority)
      –    Game modes (multi-player, single player)
      –    Player reliability evaluation
      –    Player matching (e.g., finding the optimal partner to play)
      –    Resource (i.e., data needed for games) management
      –    Creating semantic content
• http://insemtives.svn.sourceforge.net/viewvc/insemtives/gen
  eric-gaming-toolkit


6/7/2012                             www.insemtives.eu                   20
Lessons learned
• Approach is feasible for mainstream domains, where a
  knowledge corpus is available
• Approach is per design less applicable to Semantic Web-tasks
   – Knowledge-intensive tasks are not easily nestable
   – Repetitive tasks  players‘ retention?
• Knowledge corpus has to be large-enough to allow for a rich
  game experience
   – But you need a critical mass of players to validate the results
• Advertisement is essential
• Game design vs useful content
   – Reusing well-known game paradigms
   – Reusing game outcomes and integration in existing workflows and
     tools
• Cost-benefit analysis
General guidelines
• Focus on the actual goal and incentivize related
  actions
  – Write posts, create graphics, annotate pictures, reply
    to customers in a given time…
• Build a community around the intended actions
  – Reward helping each other in performing the task and
    interaction
  – Reward recruiting new contributors
• Reward repeated actions
  – Actions become part of the daily routine
Games vs Mechanical Turk
Combining human and computational
       intelligence
           Give me the German names of all commercial airports in Baden-
           Württemberg, ordered by their most informative description.


„Retrieve the labels in German of commercial airports located
in Baden-Württemberg, ordered by the better human-readable
description of the airport given in the comment“.


• This query cannot be optimally answered automatically
   – Incorrect/missing classification of entities (e.g. classification as airports
     instead of commercial airports)
   – Missing information in data sets (e.g. German labels)
   – It is not possible to optimally perform subjective operations (e.g. comparisons
     of pictures or NL comments)
What tasks should be
      crowdsourced?
„Retrieve the labels in German of commercial airports
located in Baden-Württemberg, ordered by the better
human-readable description of the airport given in the
comment“.
                                      Classification
SPARQL Query:                     1
SELECT ?label WHERE {
  ?x a metar:CommercialHubAirport;
    rdfs:label ?label;
    rdfs:comment ?comment .                        Identity resolution
  ?x geonames:parentFeature ?z .                                       2
  ?z owl:sameAs <http://dbpedia.org/resource/Baden-Wuerttemberg>
.                            3 Missing Information
  FILTER (LANG(?label) = "de")
} ORDER BY CROWD(?comment, "Better description of %x")  4 Ordering
Crowdsourced query processing
• Extensions to VoID and
  SPARQL
• Formal, declarative
  description of data and
  tasks using SPARQL
  patterns as a basis for the
  automatic design of HITs.
• Hybrid query processing
  (adaptive techniques,
  caching, semantically
  driven task design)
HITs design: Classification
  • It is not always possible to automatically infer classification
    from the properties.
  • Example: Retrieve the names (labels) of METAR stations that correspond to
     commercial airports.
SELECT ?label WHERE {
  ?station a metar:CommercialHubAirport;
    rdfs:label ?label .}
 Input:   {?station a metar:Station;
             rdfs:label ?label;
             wgs84:lat ?lat;
             wgs84:long ?long}

 Output: {?station a ?type.
          ?type rdfs:subClassOf metar:Station}
HITs design: Ordering
  • Orderings defined via less straightforward built-ins; for instance,
    the ordering of pictorial representations of entities.
  • SPARQL extension: ORDER BY CROWD
  • Example: Retrieves all airports and their pictures, and the pictures should be
         ordered according to the more representative image of the given airport.

SELECT ?airport ?picture WHERE {
  ?airport a metar:Airport;
    foaf:depiction ?picture .
} ORDER BY CROWD(?picture,
"Most representative image for %airport")

Input:     {?airport foaf:depiction ?x, ?y}

Output: {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
Challenges
• Appropriate level of granularity for HITs design for specific
  SPARQL constructs
• Caching
   – Naively we can materialise HIT results into
     datasets
   – How to deal with partial coverage and dynamic
     datasets
• Optimal user interfaces of graph-like content

• Pricing and workers’ assignment
Thank you
  e: elena.simperl@kit.edu, t: @esimperl

Publications available at www.insemtives.org

   Team: Maribel Acosta, Barry Norton, Katharina Siorpaes,
       Stefan Thaler, Stephan Wölger and many others

Contenu connexe

Similaire à Insemtives swat4ls 2012

INSEMTIVES Tutorial ISWC2011 - Session1
INSEMTIVES Tutorial ISWC2011 - Session1INSEMTIVES Tutorial ISWC2011 - Session1
INSEMTIVES Tutorial ISWC2011 - Session1
INSEMTIVES project
 
Insemtives cluj meetup
Insemtives cluj meetupInsemtives cluj meetup
Insemtives cluj meetup
Elena Simperl
 
SemTech2011 - Employee-of-the-Month' Badge Unlocked
SemTech2011 - Employee-of-the-Month' Badge UnlockedSemTech2011 - Employee-of-the-Month' Badge Unlocked
SemTech2011 - Employee-of-the-Month' Badge Unlocked
INSEMTIVES project
 
Insemtives cluj iccp
Insemtives cluj iccpInsemtives cluj iccp
Insemtives cluj iccp
Elena Simperl
 

Similaire à Insemtives swat4ls 2012 (20)

INSEMTIVES Tutorial ISWC2011 - Session1
INSEMTIVES Tutorial ISWC2011 - Session1INSEMTIVES Tutorial ISWC2011 - Session1
INSEMTIVES Tutorial ISWC2011 - Session1
 
Insemtives iswc2011 session1
Insemtives iswc2011 session1Insemtives iswc2011 session1
Insemtives iswc2011 session1
 
Insemtives cluj meetup
Insemtives cluj meetupInsemtives cluj meetup
Insemtives cluj meetup
 
SemTech2011 - Employee-of-the-Month' Badge Unlocked
SemTech2011 - Employee-of-the-Month' Badge UnlockedSemTech2011 - Employee-of-the-Month' Badge Unlocked
SemTech2011 - Employee-of-the-Month' Badge Unlocked
 
MODEL-DRIVEN ENGINEERING (MDE) in Practice
MODEL-DRIVEN ENGINEERING (MDE) in PracticeMODEL-DRIVEN ENGINEERING (MDE) in Practice
MODEL-DRIVEN ENGINEERING (MDE) in Practice
 
Introduction (1/6)
Introduction (1/6)Introduction (1/6)
Introduction (1/6)
 
Keynote at-icpc-2020
Keynote at-icpc-2020Keynote at-icpc-2020
Keynote at-icpc-2020
 
Insemtives cluj iccp
Insemtives cluj iccpInsemtives cluj iccp
Insemtives cluj iccp
 
[2015/2016] Software systems engineering PRINCIPLES
[2015/2016] Software systems engineering PRINCIPLES[2015/2016] Software systems engineering PRINCIPLES
[2015/2016] Software systems engineering PRINCIPLES
 
Design Systems Operations
Design Systems OperationsDesign Systems Operations
Design Systems Operations
 
Insemtives stanford
Insemtives stanfordInsemtives stanford
Insemtives stanford
 
Lecture 3 GORE.pptx
Lecture 3 GORE.pptxLecture 3 GORE.pptx
Lecture 3 GORE.pptx
 
are algorithms really a black box
are algorithms really a black boxare algorithms really a black box
are algorithms really a black box
 
Technologies for startup
Technologies for startupTechnologies for startup
Technologies for startup
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
 
Object Oriented System Design
Object Oriented System DesignObject Oriented System Design
Object Oriented System Design
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Specialized estimation tech
Specialized estimation techSpecialized estimation tech
Specialized estimation tech
 
Maruti gollapudi cv
Maruti gollapudi cvMaruti gollapudi cv
Maruti gollapudi cv
 
Softwareproject planning
Softwareproject planningSoftwareproject planning
Softwareproject planning
 

Plus de Elena Simperl

One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
Elena Simperl
 

Plus de Elena Simperl (20)

This talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing scienceThis talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing science
 
Knowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generationKnowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generation
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
 
Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impact
 
Ten myths about knowledge graphs.pdf
Ten myths about knowledge graphs.pdfTen myths about knowledge graphs.pdf
Ten myths about knowledge graphs.pdf
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
 
Data commons and their role in fighting misinformation.pdf
Data commons and their role in fighting misinformation.pdfData commons and their role in fighting misinformation.pdf
Data commons and their role in fighting misinformation.pdf
 
Are our knowledge graphs trustworthy?
Are our knowledge graphs trustworthy?Are our knowledge graphs trustworthy?
Are our knowledge graphs trustworthy?
 
The web of data: how are we doing so far?
The web of data: how are we doing so far?The web of data: how are we doing so far?
The web of data: how are we doing so far?
 
Crowdsourcing and citizen engagement for people-centric smart cities
Crowdsourcing and citizen engagement for people-centric smart citiesCrowdsourcing and citizen engagement for people-centric smart cities
Crowdsourcing and citizen engagement for people-centric smart cities
 
Pie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on TwitterPie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on Twitter
 
High-value datasets: from publication to impact
High-value datasets: from publication to impactHigh-value datasets: from publication to impact
High-value datasets: from publication to impact
 
The story of Data Stories
The story of Data StoriesThe story of Data Stories
The story of Data Stories
 
The human face of AI: how collective and augmented intelligence can help sol...
The human face of AI:  how collective and augmented intelligence can help sol...The human face of AI:  how collective and augmented intelligence can help sol...
The human face of AI: how collective and augmented intelligence can help sol...
 
Qrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart citiesQrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart cities
 
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
 
Qrowd and the city
Qrowd and the cityQrowd and the city
Qrowd and the city
 
Inclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approachInclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approach
 

Dernier

An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
SanaAli374401
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
MateoGardella
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 

Dernier (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 

Insemtives swat4ls 2012

  • 1. Using crowdsourcing for Semantic Web applications and tools Elena Simperl, Karlsruhe Institute of Technology, Germany Talk at SWAT4LS Summer School, Aveiro, Portugal May 2012 6/7/2012 www.insemtives.eu 1
  • 2. Semantic technologies are mainly about automation… • …but many tasks in semantic content authoring fundamentally rely on human input – Modeling a domain – Understanding text and media content (in all their forms and languages) – Integrating data sources originating from different contexts
  • 3. Incentives and motivators • What motivates people to engage with an application? • Which rewards are effective and when? • Motivation is the driving force that makes humans achieve their goals • Incentives are ‘rewards’ assigned by an external ‘judge’ to a performer for undertaking a specific task – Common belief (among economists): incentives can be translated into a sum of money for all practical purposes • Incentives can be related to extrinsic and intrinsic motivations
  • 4. Examples of applications www.insemtives.eu 4
  • 5. Incentives and motivators (2) • Successful volunteer crowdsourcing is difficult to predict or replicate – Highly context-specific – Not applicable to arbitrary tasks • Reward models often easier to study and control (if performance can be reliably measured) – Different models: pay-per-time, pay-per-unit, winner-takes-it- all… – Not always easy to abstract from social aspects (free-riding, social pressure…) – May undermine intrinsic motivation
  • 6. Examples (2) Mason & Watts: Financial incentives and the performance of the crowds, HCOMP 2009.
  • 7. Amazon‘s Mechanical Turk • Successfully applied to transcription, classification, and content generation, data collection, image tagging, website feedback, usability tests…* • Increasingly used by academia for evaluation purposes • Extensions for quality assurance, complex workflows, resource management, vertical domains… * http://behind-the-enemy-lines.blogspot.com/2010/10/what-tasks-are-posted-on-mechanical.html
  • 8. What tasks can be (microtask-) crowdsourced? • Best case – Routine work requiring common knowledge, decomposable into simpler, independent sub- tasks, performance easily measurable, no spam • Ongoing research in task design, quality assurance, estimated time of completion… • Example: open-scale tasks in MTurk – Generate, then vote. – Introduce random noise to identify potential issues in the second step Label Correct Vote answers Generate answer image or not?
  • 9. Examples (2) www.insemtives.eu 9
  • 10. GWAPs and gamification • GWAPs: human computation disguised as casual games • Gamification/game mechanics: integrate game elements to applications* – Accelerated feedback cycles • Annual performance appraisals vs immediate feedback to maintain engagement – Clear goals and rules of play • Players feel empowered to achieve goals vs fuzzy, complex system of rules in real-world – Compelling narrative • Gamification builds a narrative that engages players to participate and achieve the goals of the activity – But in the end it’s about what tasks users want to get better at *http://www.gartner.com/it/page.jsp?id=1629214
  • 11. What tasks can be gamified?* • Work is decomposable into simpler tasks • Tasks are nested • Performance is measurable • One can define an obvious rewarding scheme • Skills can be arranged in a smooth learning curve *http://www.lostgarden.com/2008/06/what-actitivies-that-can-be-turned-into.html
  • 12. What is different about semantic systems? • It‘s still about the context of the actual application • User engagement with semantic tasks to – Ensure knowledge is relevant and up-to-date – People accept the new solution and understand its benefits – Avoid cold-start problems – Optimize maintenance costs
  • 13. What do you want your users to do? • Semantic applications – Context of the actual application – Need to involve users in knowledge engineering tasks? • Incentives are related to organizational and social factors • Seamless integration of new features • Semantic tools – Game mechanics – Paid crowdsourcing (possibly integrated with the tool) • Using results of casual games http://gapingvoid.com/2011/06/07/pixie-dust-the-mountain-of-mediocrity/
  • 14. Crowdsourcing knowledge engineering • Granularity of activities is typically too high • Further splitting is needed • Crowdsource very specific tasks that are (highly) divisible –Labeling (in different languages) –Finding relationships –Populating the ontology –Aligning and interlinking –Ontology-based annotation –Validating the results of automatic methods –… www.insemtives.eu 14
  • 15. Example: ontology building 6/7/2012 www.insemtives.eu 15
  • 17. Example: video annotation www.insemtives.eu 17
  • 18. Example: ontology alignment www.insemtives.eu 18
  • 19. Example: ontology evaluation www.insemtives.eu 19
  • 20. OntoGame API • API that provides several methods that are shared by the OntoGame games, such as – Different agreement types (e.g. selection agreement) – Input matching (e.g. , majority) – Game modes (multi-player, single player) – Player reliability evaluation – Player matching (e.g., finding the optimal partner to play) – Resource (i.e., data needed for games) management – Creating semantic content • http://insemtives.svn.sourceforge.net/viewvc/insemtives/gen eric-gaming-toolkit 6/7/2012 www.insemtives.eu 20
  • 21. Lessons learned • Approach is feasible for mainstream domains, where a knowledge corpus is available • Approach is per design less applicable to Semantic Web-tasks – Knowledge-intensive tasks are not easily nestable – Repetitive tasks  players‘ retention? • Knowledge corpus has to be large-enough to allow for a rich game experience – But you need a critical mass of players to validate the results • Advertisement is essential • Game design vs useful content – Reusing well-known game paradigms – Reusing game outcomes and integration in existing workflows and tools • Cost-benefit analysis
  • 22. General guidelines • Focus on the actual goal and incentivize related actions – Write posts, create graphics, annotate pictures, reply to customers in a given time… • Build a community around the intended actions – Reward helping each other in performing the task and interaction – Reward recruiting new contributors • Reward repeated actions – Actions become part of the daily routine
  • 24. Combining human and computational intelligence Give me the German names of all commercial airports in Baden- Württemberg, ordered by their most informative description. „Retrieve the labels in German of commercial airports located in Baden-Württemberg, ordered by the better human-readable description of the airport given in the comment“. • This query cannot be optimally answered automatically – Incorrect/missing classification of entities (e.g. classification as airports instead of commercial airports) – Missing information in data sets (e.g. German labels) – It is not possible to optimally perform subjective operations (e.g. comparisons of pictures or NL comments)
  • 25. What tasks should be crowdsourced? „Retrieve the labels in German of commercial airports located in Baden-Württemberg, ordered by the better human-readable description of the airport given in the comment“. Classification SPARQL Query: 1 SELECT ?label WHERE { ?x a metar:CommercialHubAirport; rdfs:label ?label; rdfs:comment ?comment . Identity resolution ?x geonames:parentFeature ?z . 2 ?z owl:sameAs <http://dbpedia.org/resource/Baden-Wuerttemberg> . 3 Missing Information FILTER (LANG(?label) = "de") } ORDER BY CROWD(?comment, "Better description of %x") 4 Ordering
  • 26. Crowdsourced query processing • Extensions to VoID and SPARQL • Formal, declarative description of data and tasks using SPARQL patterns as a basis for the automatic design of HITs. • Hybrid query processing (adaptive techniques, caching, semantically driven task design)
  • 27. HITs design: Classification • It is not always possible to automatically infer classification from the properties. • Example: Retrieve the names (labels) of METAR stations that correspond to commercial airports. SELECT ?label WHERE { ?station a metar:CommercialHubAirport; rdfs:label ?label .} Input: {?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long} Output: {?station a ?type. ?type rdfs:subClassOf metar:Station}
  • 28. HITs design: Ordering • Orderings defined via less straightforward built-ins; for instance, the ordering of pictorial representations of entities. • SPARQL extension: ORDER BY CROWD • Example: Retrieves all airports and their pictures, and the pictures should be ordered according to the more representative image of the given airport. SELECT ?airport ?picture WHERE { ?airport a metar:Airport; foaf:depiction ?picture . } ORDER BY CROWD(?picture, "Most representative image for %airport") Input: {?airport foaf:depiction ?x, ?y} Output: {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
  • 29. Challenges • Appropriate level of granularity for HITs design for specific SPARQL constructs • Caching – Naively we can materialise HIT results into datasets – How to deal with partial coverage and dynamic datasets • Optimal user interfaces of graph-like content • Pricing and workers’ assignment
  • 30. Thank you e: elena.simperl@kit.edu, t: @esimperl Publications available at www.insemtives.org Team: Maribel Acosta, Barry Norton, Katharina Siorpaes, Stefan Thaler, Stephan Wölger and many others