1. Crowdsourcing tasks in Linked Data management
Elena Simperl¹, Barry Norton², Denny Vrandecic¹
¹Institute AIFB, Karlsruhe Institute of Technology, Germany
²Ontotext AD, Bulgaria
Institute of Applied Informatics and Formal Description Methods (AIFB)
KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association www.kit.edu
2. Motivation
Various aspects of Linked Data management naturally rely on human intelligence to yield optimal results
But reaching a critical mass of useful contributions from all relevant stakeholders is still more of an art than an engineering exercise
2 23.10.2011 Seminar - Die Rolle von Ontologien in Linked Data – Kickoff Institut für Angewandte Informatik und Formale
Beschreibungsverfahren (AIFB)
3. Microtask platforms
Define task → Break task into smaller units → Evaluate the results
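The three-step workflow can be sketched in a few lines. This is a minimal illustration, not the behaviour of any particular platform: it assumes one HIT per item and evaluates results by simple majority vote (a common aggregation choice, not one the slide prescribes). `Hit`, `break_task`, `evaluate` and `min_votes` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One microtask: a question shown to workers plus the allowed answers (hypothetical)."""
    hit_id: int
    question: str
    options: tuple


def break_task(items, question_template, options):
    """Break a large task into one small HIT per item."""
    return [
        Hit(i, question_template.format(item=item), tuple(options))
        for i, item in enumerate(items)
    ]


def evaluate(answers, min_votes=3):
    """Majority vote per HIT; None when there are too few votes or no clear winner."""
    results = {}
    for hit_id, votes in answers.items():
        if len(votes) < min_votes:
            results[hit_id] = None
            continue
        top = Counter(votes).most_common(2)
        tie = len(top) > 1 and top[0][1] == top[1][1]
        results[hit_id] = None if tie else top[0][0]
    return results


# Define the task, break it into units, then evaluate (simulated) worker answers.
hits = break_task(["Heathrow", "Gatwick"], "Does the label '{item}' refer to an airport?", ["yes", "no"])
results = evaluate({0: ["yes", "yes", "no"], 1: ["no", "no"]})
```

Requiring a minimum number of votes trades cost (more assignments per HIT) for robustness against individual careless answers.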
4. Approach
Formal, declarative description of the data and tasks, using SPARQL patterns as the basis for the automatic design of HITs (Human Intelligence Tasks)
Integral part of Linked Data tools and applications
At design time, the application developer specifies which portions of the data workers can process, and through which types of HITs
At run time:
The system materializes the data
Workers process it
The data and the application are updated to reflect the crowdsourcing results
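The design-time/run-time split can be illustrated with a toy pipeline. This is a sketch under stated assumptions, not the system described here: a task spec (a hypothetical dict format) pairs an input pattern with an output pattern, a naive matcher stands in for a SPARQL engine, and `ask_worker` is a hypothetical callback simulating a worker.

```python
def match(pattern, data, binding=None):
    """Naive basic-graph-pattern matching; terms starting with '?' are variables."""
    binding = {} if binding is None else binding
    if not pattern:
        return [binding]
    first, rest = pattern[0], pattern[1:]
    results = []
    for triple in sorted(data):
        candidate = dict(binding)
        # A variable binds (or must agree with an earlier binding); a constant must be equal.
        if all(
            candidate.setdefault(p, d) == d if p.startswith("?") else p == d
            for p, d in zip(first, triple)
        ):
            results.extend(match(rest, data, candidate))
    return results


def instantiate(pattern, binding):
    """Substitute bound variables into a set of triple patterns."""
    return {tuple(binding.get(t, t) for t in tp) for tp in pattern}


def run_task(spec, data, ask_worker):
    """Run time: materialize input bindings, ask a worker, write the answers back."""
    for binding in match(spec["input"], data):
        question = spec["question"].format(**{k[1:]: v for k, v in binding.items()})
        answer = ask_worker(question)  # dict of output variables, or None
        if answer:
            data |= instantiate(spec["output"], {**binding, **answer})
    return data


# Design time: a hypothetical spec for a metadata-completion HIT.
spec = {
    "input": [("?station", "rdf:type", "metar:Station"),
              ("?station", "rdfs:label", "?label")],
    "output": [("?station", "dbp:icao", "?icao")],
    "question": "What is the ICAO code of the station labelled {label}?",
}
data = {("ex:s1", "rdf:type", "metar:Station"), ("ex:s1", "rdfs:label", "Heathrow")}
run_task(spec, data, ask_worker=lambda q: {"?icao": "EGLL"})  # simulated worker
```

The `ex:` prefix and the spec layout are illustrative only; a real system would evaluate the input pattern with a SPARQL engine and post the questions to a microtask platform.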
5. Examples of Linked Data tasks amenable to crowdsourcing
Identity resolution
Metadata completion and checking/correction
Classification
Ordering: quantitative and qualitative
Translation
6. Running Example
7. Identity resolution
Identity Resolution “involves the creation of sameAs
links, either by comparison of metadata or by
investigation of links on the human Web.”
Input: {?station a metar:Station;
rdfs:label ?slabel;
wgs84:lat ?slat;
wgs84:long ?slong .
?airport a dbp-owl:Airport;
rdfs:label ?alabel;
wgs84:lat ?alat;
wgs84:long ?along}
Output: {OPTIONAL
{?airport owl:sameAs ?station}}
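One way to keep identity-resolution HITs affordable is to pre-filter candidate pairs automatically and only ask workers about plausible ones. The sketch below assumes a 10 km distance threshold on the WGS84 coordinates from the input pattern and a strict-majority rule for emitting `owl:sameAs`; both choices, the `ex:` station URI and the approximate coordinates are illustrative assumptions, not part of the original approach.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def candidate_pairs(stations, airports, max_km=10.0):
    """Pre-filter (station, airport) pairs by distance so only plausible matches become HITs."""
    return [
        (s, a)
        for s in stations
        for a in airports
        if haversine_km(s["lat"], s["long"], a["lat"], a["long"]) <= max_km
    ]


def same_as_links(pairs, votes):
    """Emit ?airport owl:sameAs ?station only where a strict majority answered 'yes'.
    Pairs without such a majority get no link, mirroring the OPTIONAL output."""
    return [
        (a["uri"], "owl:sameAs", s["uri"])
        for (s, a), v in zip(pairs, votes)
        if 2 * v.count("yes") > len(v)
    ]


# Approximate coordinates, for illustration only.
stations = [{"uri": "ex:station/EGLL", "lat": 51.48, "long": -0.45}]
airports = [
    {"uri": "dbr:Heathrow_Airport", "lat": 51.4700, "long": -0.4543},
    {"uri": "dbr:Gatwick_Airport", "lat": 51.1537, "long": -0.1821},
]
pairs = candidate_pairs(stations, airports)  # only the Heathrow pair survives
links = same_as_links(pairs, [["yes", "yes", "no"]])
```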
8. Metadata completion & correction
“Certain properties, necessary for a given query,
may not be uniformly populated. Manually conducted
research might be necessary to transfer this
information from the human-readable Web”
Input: {?station a metar:Station;
rdfs:label ?label;
wgs84:lat ?lat;
wgs84:long ?long;
dbp:icao ?badicao}
Output: {?station dbp:icao ?goodicao}
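Checking can often be automated even when fixing cannot. As a sketch, assuming that a `dbp:icao` value is suspect whenever it is missing or not four uppercase letters (the format of ICAO location indicators), only those stations are turned into correction HITs, and worker answers are re-checked before producing the output triple. The dict layout and `ex:` URIs are hypothetical.

```python
import re

# ICAO location indicators are four letters; this check catches malformed values
# but cannot tell whether a well-formed code belongs to the right station.
ICAO_PATTERN = re.compile(r"[A-Z]{4}")


def is_well_formed_icao(code):
    return code is not None and ICAO_PATTERN.fullmatch(code) is not None


def correction_hits(stations):
    """One HIT per station whose dbp:icao value is missing or malformed."""
    return [
        {
            "station": s["uri"],
            "question": f"What is the ICAO code of {s['label']} ({s['lat']}, {s['long']})?",
            "current": s.get("icao"),
        }
        for s in stations
        if not is_well_formed_icao(s.get("icao"))
    ]


def apply_answer(hit, answer):
    """Turn a worker answer into ?station dbp:icao ?goodicao, re-checking its format."""
    code = answer.strip().upper()
    return (hit["station"], "dbp:icao", code) if is_well_formed_icao(code) else None


stations = [
    {"uri": "ex:station/1", "label": "Heathrow", "lat": 51.48, "long": -0.45, "icao": "EGLL"},
    {"uri": "ex:station/2", "label": "Sofia", "lat": 42.69, "long": 23.41, "icao": "lbs0"},
]
hits = correction_hits(stations)  # only the malformed Sofia code needs a human
```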
9. Classification
“Linked Data emphasis[es…] relationships between
resources [over classification]. [D]ue to the promoted
use of generic vocabularies, it is not always possible
to infer classification from […] properties”
Input: {?station a metar:Station;
rdfs:label ?label;
wgs84:lat ?lat;
wgs84:long ?long}
Output: {?station a ?type.
?type rdfs:subClassOf
metar:Station}
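A classification HIT can restrict answers to known subclasses, which guarantees the second triple of the output pattern. The sketch assumes a hypothetical set of `metar:Station` subclasses and a 60% agreement threshold; invalid answers count against agreement. Neither the class names nor the threshold come from the original.

```python
from collections import Counter

# Hypothetical subclasses of metar:Station, offered to workers as the only answer options.
STATION_TYPES = {"ex:AirportStation", "ex:MilitaryStation", "ex:OffshoreStation"}


def classify(station, votes, min_share=0.6):
    """Return the output triples for ?station a ?type, or [] if workers do not agree enough."""
    valid = [v for v in votes if v in STATION_TYPES]
    if not valid:
        return []
    best, count = Counter(valid).most_common(1)[0]
    if count / len(votes) < min_share:  # invalid answers count against agreement
        return []
    return [(station, "rdf:type", best), (best, "rdfs:subClassOf", "metar:Station")]


triples = classify("ex:s1", ["ex:AirportStation"] * 3 + ["ex:OffshoreStation"])
```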
10. Ordering
“Having means to rank Linked Data content along
specific dimensions is typically deemed useful for
querying and browsing […both quantitative,] “specific”
ordering [(e.g. timestamps) … and qualitative] orderings
[…] via “less straightforward” built-ins [(e.g. pref/alt labels)]”
Input: {?station foaf:depiction ?x, ?y . FILTER (?x != ?y)}
Output: {{(?x ?y) a rdf:List}
UNION {(?y ?x) a rdf:List}}
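Each HIT of this kind returns one pairwise judgement; a full ordering still has to be assembled from them. As a simple stand-in (a Copeland-style win count, not a method the slides specify), the sketch below generates one HIT per unordered pair of distinct depictions and ranks items by the number of comparisons they won. Cyclic judgements are not resolved, and the file names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def comparison_hits(depictions):
    """One HIT per unordered pair of distinct depictions (each x != y pair asked once)."""
    return list(combinations(sorted(set(depictions)), 2))


def rank(items, judgements):
    """Order items by the number of pairwise comparisons they won; ties broken by name."""
    wins = Counter({item: 0 for item in items})
    for winner, _loser in judgements:
        wins[winner] += 1
    return sorted(items, key=lambda item: (-wins[item], item))


pics = ["a.jpg", "b.jpg", "c.jpg"]
hits = comparison_hits(pics)
# Simulated worker judgements as (preferred, other) pairs.
order = rank(pics, [("b.jpg", "a.jpg"), ("b.jpg", "c.jpg"), ("a.jpg", "c.jpg")])
```

The resulting Python list corresponds to the `rdf:List` in the output pattern; n depictions need n(n-1)/2 HITs, which is one reason granularity and cost matter.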
11. Translation
“[An important] aspect of the labeling of resources for
humans is multi-linguality […] actual provision of labels
in non-English languages is currently rather low”
Input: {?station rdfs:label ?enlabel.
FILTER (LANGMATCHES(LANG(?enlabel), "en"))}
Output: {?station rdfs:label ?bglabel.
FILTER (LANGMATCHES(LANG(?bglabel), "bg"))}
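Missing translations can be detected automatically before any HIT is issued. This sketch assumes labels are held as `(text, language tag)` pairs per resource and compares primary subtags case-insensitively, roughly like the language filter above; the serializer applies only minimal N-Triples escaping (backslash and double quote). All names are hypothetical.

```python
def primary_lang(tag):
    """'en-GB' -> 'en'; language tags are case-insensitive."""
    return tag.split("-")[0].lower()


def missing_translations(labels, source="en", target="bg"):
    """labels: {resource: [(text, lang_tag), ...]}.
    Return (resource, source_text) for resources with a source label but no target label."""
    todo = []
    for resource, literals in sorted(labels.items()):
        langs = {primary_lang(tag) for _, tag in literals}
        if target in langs:
            continue
        for text, tag in literals:
            if primary_lang(tag) == source:
                todo.append((resource, text))
                break
    return todo


def as_literal(text, lang):
    """Serialize a worker's translation as a language-tagged literal (minimal escaping)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


labels = {
    "ex:s1": [("Sofia Airport", "en")],
    "ex:s2": [("Varna Airport", "en-GB"), ("Летище Варна", "bg")],
}
todo = missing_translations(labels)  # only ex:s1 needs a Bulgarian label
```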
12. Open query answering
Query a FOAF file using the vCard vocabulary
hp:Harry foaf:mbox <mailto:scarface@hogwarts.ac.uk> ;
foaf:nick "Harry" ; foaf:familyName "Potter" .
SELECT ?name ?email WHERE
{ ?p vcard:email ?email ; vcard:fn ?name }
To answer the query as intended, the following are needed:
Vocabulary mapping and entity resolution (foaf to vcard)
Metadata completion (full name is Harry Potter)
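The two steps can be sketched over the example data. The FOAF-to-vCard mapping below is a hypothetical, hand-written alignment; after applying it, any property the query still needs is turned into a completion HIT carrying the known facts as context. A human is needed for `vcard:fn` because a nickname is not necessarily the given name.

```python
# Hypothetical, hand-written alignment; in practice this mapping could itself be crowdsourced.
FOAF_TO_VCARD = {"foaf:mbox": "vcard:email", "foaf:name": "vcard:fn"}


def map_vocabulary(triples):
    """Add vCard triples for every FOAF property with a known equivalent."""
    mapped = {(s, FOAF_TO_VCARD[p], o) for s, p, o in triples if p in FOAF_TO_VCARD}
    return set(triples) | mapped


def completion_hits(triples, required=("vcard:email", "vcard:fn")):
    """One HIT per (subject, required property) that is still missing, with context."""
    hits = []
    for subject in sorted({s for s, _, _ in triples}):
        facts = sorted((p, o) for s, p, o in triples if s == subject)
        present = {p for p, _ in facts}
        hits += [(subject, prop, facts) for prop in required if prop not in present]
    return hits


harry = {
    ("hp:Harry", "foaf:mbox", "<mailto:scarface@hogwarts.ac.uk>"),
    ("hp:Harry", "foaf:nick", '"Harry"'),
    ("hp:Harry", "foaf:familyName", '"Potter"'),
}
graph = map_vocabulary(harry)
hits = completion_hits(graph)  # vcard:fn is still missing
graph.add(("hp:Harry", "vcard:fn", '"Harry Potter"'))  # simulated worker answer
```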
13. Limitations of microtask crowdsourcing
Decomposability
Verifiability
Expertise
Compositions to deal with tasks that have an underspecified workflow and/or multiple correct answers
14. Challenges
Decomposition of user-visible queries:
SPARQL
Easy: Low-quality (meta)data can be subject to automated checking (even if not fixing)
Medium: Missing data (and translations) can be identified automatically (but it is not necessarily clear to which dataset it should belong)
Difficult:
Interlinking (at least sameAs) is somewhat implicit (using entailment) and knowing where user expects
Query optimisation obfuscates what is used, and should involve the costs of human tasks
Pig might be somewhat easier in the latter regard
Caching
Naively, we can materialise HIT results into datasets
How to deal with partial coverage and dynamic datasets?
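Naive materialisation can at least detect stale results. As a sketch, assuming each cached HIT result is keyed by task and resource and stored with a fingerprint of the input values it was computed from, a lookup misses both when a resource was never crowdsourced (partial coverage) and when its input data has since changed (dynamic datasets). `HitCache` and its methods are hypothetical.

```python
import hashlib
import json


class HitCache:
    """Materialized HIT results, keyed by task and resource, tied to a fingerprint of the input."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _fingerprint(inputs):
        blob = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def get(self, task, resource, inputs):
        """Cached result, or None if never crowdsourced or the input has changed."""
        entry = self._store.get((task, resource))
        if entry is None or entry[0] != self._fingerprint(inputs):
            return None
        return entry[1]

    def put(self, task, resource, inputs, result):
        self._store[(task, resource)] = (self._fingerprint(inputs), result)


cache = HitCache()
cache.put("icao", "ex:station/1", {"label": "Heathrow"}, "EGLL")
```

A miss only says that a HIT would be needed; whether to pay for it immediately, lazily at query time, or not at all is exactly the open question above.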
15. Further Challenges
Appropriate level of granularity for HIT design for specific SPARQL constructs and typical functionality of Linked Data management components
Optimal user interfaces for graph-like content
(Contextual) Rendering of LOD entities and tasks
Pricing and worker assignment
Can we connect the end users of an application, and their wish for specific data to be consumed, with the payment of workers and the prioritization of HITs?
Dealing with spam / gaming
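A widely used defence against spam is to mix "gold" questions with known answers into the HITs and only trust workers who answer enough of them correctly. This technique is not taken from the slides; the sketch assumes hypothetical thresholds of 80% accuracy over at least two gold questions.

```python
def trusted_workers(gold, responses, min_accuracy=0.8, min_gold=2):
    """gold: {hit_id: known answer}; responses: {worker: {hit_id: answer}}.
    Keep workers who answered enough gold HITs and got most of them right."""
    trusted = set()
    for worker, answers in responses.items():
        checked = [h for h in answers if h in gold]
        if len(checked) < min_gold:
            continue
        accuracy = sum(answers[h] == gold[h] for h in checked) / len(checked)
        if accuracy >= min_accuracy:
            trusted.add(worker)
    return trusted


gold = {"g1": "yes", "g2": "no"}
responses = {
    "w1": {"g1": "yes", "g2": "no", "h1": "yes"},
    "w2": {"g1": "yes", "g2": "yes", "h1": "no"},  # fails a gold question
    "w3": {"g1": "yes", "h1": "yes"},              # too few gold answers to judge
}
trusted = trusted_workers(gold, responses)
```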
16. QUESTIONS