Measures of Dispersion and Variability: Range, QD, AD and SD
Insemtives swat4ls 2012
1. Using crowdsourcing for Semantic
Web applications and tools
Elena Simperl, Karlsruhe Institute of Technology, Germany
Talk at SWAT4LS Summer School, Aveiro, Portugal
May 2012
6/7/2012 www.insemtives.eu 1
2. Semantic technologies are mainly
about automation…
• …but many tasks in semantic content
authoring fundamentally rely on human input
– Modeling a domain
– Understanding text and media content (in all their
forms and languages)
– Integrating data sources originating from different
contexts
3. Incentives and motivators
• What motivates people to engage with an application?
• Which rewards are effective and when?
• Motivation is the driving force that makes humans
achieve their goals
• Incentives are ‘rewards’ assigned by an external ‘judge’
to a performer for undertaking a specific task
– Common belief (among economists): incentives can be
translated into a sum of money for all practical purposes
• Incentives can be related to extrinsic and intrinsic
motivations
5. Incentives and motivators (2)
• Successful volunteer crowdsourcing is difficult to
predict or replicate
– Highly context-specific
– Not applicable to arbitrary tasks
• Reward models often easier to study and control
(if performance can be reliably measured)
– Different models: pay-per-time, pay-per-unit, winner-takes-it-
all…
– Not always easy to abstract from social aspects (free-riding,
social pressure…)
– May undermine intrinsic motivation
6. Examples (2)
Mason & Watts: Financial incentives and the performance of the crowds, HCOMP 2009.
7. Amazon‘s Mechanical Turk
• Successfully applied to transcription, classification, and
content generation, data collection, image tagging, website
feedback, usability tests…*
• Increasingly used by academia for evaluation purposes
• Extensions for quality assurance, complex workflows,
resource management, vertical domains…
* http://behind-the-enemy-lines.blogspot.com/2010/10/what-tasks-are-posted-on-mechanical.html
8. What tasks can be (microtask-)
crowdsourced?
• Best case
– Routine work requiring common knowledge,
decomposable into simpler, independent sub- tasks,
performance easily measurable, no spam
• Ongoing research in task design, quality
assurance, estimated time of completion…
• Example: open-scale tasks in MTurk
– Generate, then vote.
– Introduce random noise to identify potential issues in
the second step Label Correct
Vote answers
Generate answer
image or not?
10. GWAPs and gamification
• GWAPs: human computation disguised as casual games
• Gamification/game mechanics: integrate game elements
to applications*
– Accelerated feedback cycles
• Annual performance appraisals vs immediate feedback to maintain
engagement
– Clear goals and rules of play
• Players feel empowered to achieve goals vs fuzzy, complex system of
rules in real-world
– Compelling narrative
• Gamification builds a narrative that engages players to participate and
achieve the goals of the activity
– But in the end it’s about what tasks users
want to get better at
*http://www.gartner.com/it/page.jsp?id=1629214
11. What tasks can be gamified?*
• Work is decomposable into simpler tasks
• Tasks are nested
• Performance is measurable
• One can define an obvious rewarding scheme
• Skills can be arranged in a smooth learning
curve
*http://www.lostgarden.com/2008/06/what-actitivies-that-can-be-turned-into.html
12. What is different about semantic
systems?
• It‘s still about the context
of the actual application
• User engagement with
semantic tasks to
– Ensure knowledge is
relevant and up-to-date
– People accept the new
solution and understand its
benefits
– Avoid cold-start problems
– Optimize maintenance
costs
13. What do you want your
users to do?
• Semantic applications
– Context of the actual application
– Need to involve users in knowledge engineering tasks?
• Incentives are related to organizational and social factors
• Seamless integration of new features
• Semantic tools
– Game mechanics
– Paid crowdsourcing (possibly integrated with the tool)
• Using results of casual games
http://gapingvoid.com/2011/06/07/pixie-dust-the-mountain-of-mediocrity/
14. Crowdsourcing knowledge
engineering
• Granularity of activities is
typically too high
• Further splitting is needed
• Crowdsource very specific tasks
that are (highly) divisible
–Labeling (in different languages)
–Finding relationships
–Populating the ontology
–Aligning and interlinking
–Ontology-based annotation
–Validating the results of automatic
methods
–…
www.insemtives.eu 14
20. OntoGame API
• API that provides several methods that are shared by the
OntoGame games, such as
– Different agreement types (e.g. selection agreement)
– Input matching (e.g. , majority)
– Game modes (multi-player, single player)
– Player reliability evaluation
– Player matching (e.g., finding the optimal partner to play)
– Resource (i.e., data needed for games) management
– Creating semantic content
• http://insemtives.svn.sourceforge.net/viewvc/insemtives/gen
eric-gaming-toolkit
6/7/2012 www.insemtives.eu 20
21. Lessons learned
• Approach is feasible for mainstream domains, where a
knowledge corpus is available
• Approach is per design less applicable to Semantic Web-tasks
– Knowledge-intensive tasks are not easily nestable
– Repetitive tasks players‘ retention?
• Knowledge corpus has to be large-enough to allow for a rich
game experience
– But you need a critical mass of players to validate the results
• Advertisement is essential
• Game design vs useful content
– Reusing well-known game paradigms
– Reusing game outcomes and integration in existing workflows and
tools
• Cost-benefit analysis
22. General guidelines
• Focus on the actual goal and incentivize related
actions
– Write posts, create graphics, annotate pictures, reply
to customers in a given time…
• Build a community around the intended actions
– Reward helping each other in performing the task and
interaction
– Reward recruiting new contributors
• Reward repeated actions
– Actions become part of the daily routine
24. Combining human and computational
intelligence
Give me the German names of all commercial airports in Baden-
Württemberg, ordered by their most informative description.
„Retrieve the labels in German of commercial airports located
in Baden-Württemberg, ordered by the better human-readable
description of the airport given in the comment“.
• This query cannot be optimally answered automatically
– Incorrect/missing classification of entities (e.g. classification as airports
instead of commercial airports)
– Missing information in data sets (e.g. German labels)
– It is not possible to optimally perform subjective operations (e.g. comparisons
of pictures or NL comments)
25. What tasks should be
crowdsourced?
„Retrieve the labels in German of commercial airports
located in Baden-Württemberg, ordered by the better
human-readable description of the airport given in the
comment“.
Classification
SPARQL Query: 1
SELECT ?label WHERE {
?x a metar:CommercialHubAirport;
rdfs:label ?label;
rdfs:comment ?comment . Identity resolution
?x geonames:parentFeature ?z . 2
?z owl:sameAs <http://dbpedia.org/resource/Baden-Wuerttemberg>
. 3 Missing Information
FILTER (LANG(?label) = "de")
} ORDER BY CROWD(?comment, "Better description of %x") 4 Ordering
26. Crowdsourced query processing
• Extensions to VoID and
SPARQL
• Formal, declarative
description of data and
tasks using SPARQL
patterns as a basis for the
automatic design of HITs.
• Hybrid query processing
(adaptive techniques,
caching, semantically
driven task design)
27. HITs design: Classification
• It is not always possible to automatically infer classification
from the properties.
• Example: Retrieve the names (labels) of METAR stations that correspond to
commercial airports.
SELECT ?label WHERE {
?station a metar:CommercialHubAirport;
rdfs:label ?label .}
Input: {?station a metar:Station;
rdfs:label ?label;
wgs84:lat ?lat;
wgs84:long ?long}
Output: {?station a ?type.
?type rdfs:subClassOf metar:Station}
28. HITs design: Ordering
• Orderings defined via less straightforward built-ins; for instance,
the ordering of pictorial representations of entities.
• SPARQL extension: ORDER BY CROWD
• Example: Retrieves all airports and their pictures, and the pictures should be
ordered according to the more representative image of the given airport.
SELECT ?airport ?picture WHERE {
?airport a metar:Airport;
foaf:depiction ?picture .
} ORDER BY CROWD(?picture,
"Most representative image for %airport")
Input: {?airport foaf:depiction ?x, ?y}
Output: {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
29. Challenges
• Appropriate level of granularity for HITs design for specific
SPARQL constructs
• Caching
– Naively we can materialise HIT results into
datasets
– How to deal with partial coverage and dynamic
datasets
• Optimal user interfaces of graph-like content
• Pricing and workers’ assignment
30. Thank you
e: elena.simperl@kit.edu, t: @esimperl
Publications available at www.insemtives.org
Team: Maribel Acosta, Barry Norton, Katharina Siorpaes,
Stefan Thaler, Stephan Wölger and many others