This document describes a methodology for crowdsourcing the assessment of Linked Data quality using both Linked Data experts and Amazon Mechanical Turk workers. It presents research on detecting three types of quality issues in DBpedia data via crowdsourcing: incorrect/incomplete object values, incorrect data types, and incorrect outlinks. The methodology uses a two-phase approach, with experts identifying issues in the "Find" phase and workers verifying issues in the "Verify" phase via microtasks. The results indicate that crowdsourcing is effective for detecting quality issues and that experts and workers are suited for different tasks based on required skills.
2. Linked Data: over a billion facts
What about the quality?
3. Motivation - Linked Data Quality
Varying quality of Linked Data sources (source, extraction, integration, etc.)
Some quality issues require interpretation that can easily be performed by humans:
Incompleteness
Incorrectness
Semantic Accuracy
4. Motivation - Linked Data Quality
Solution: include human verification in the process of LD quality assessment via crowdsourcing
Human Intelligence Tasks (HITs)
Labor market
Monetary reward/incentive
Time- and cost-effective
Crowdsourcing is a large-scale problem-solving approach in which a problem is divided into smaller tasks that are solved independently by a large group of people.
5. Research questions
RQ1: Is it possible to detect quality issues in LD data sets
via crowdsourcing mechanisms?
RQ2: What type of crowd is most suitable for each type of
quality issue?
RQ3: Which types of errors are made by lay users and
experts when assessing RDF triples?
6. Related work
Crowdsourcing & Linked Data: ZenCrowd (entity resolution), CrowdMAP (ontology alignment), GWAPs for LD, assessing LD mappings
Web of Data quality assessment: (automatic) quality characteristics of LD data sources, (semi-automatic) DBpedia, (manual) WIQA, Sieve
Our work sits at the intersection of these two areas.
8. Find-Verify Phases of Crowdsourcing (1)
Find phase: a contest among LD experts; a difficult task rewarded with a final prize; tool: TripleCheckMate [Kontokostas2013]
Verify phase: microtasks for workers; an easy task rewarded with micropayments; platform: MTurk (http://mturk.com)
(1) Adapted from [Bernstein2010]
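As a rough illustration only, the two phases can be pictured as a small pipeline. The sketch below is a minimal Python mock-up, not the actual TripleCheckMate or MTurk integration; run_contest and submit_microtask are hypothetical stand-ins with toy return values.

```python
# Hypothetical stand-ins for the two crowds; the real Find phase runs in
# TripleCheckMate and the real Verify phase runs as MTurk microtasks.
def run_contest(dataset):
    # Find phase: LD experts flag problematic triples and classify the issue.
    return [(("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", "3"),
             "incorrect/incomplete object value")]          # toy output

def submit_microtask(triple, issue_type):
    # Verify phase: a single paid worker judges one flagged triple.
    return "incorrect"                                       # toy answer

def find_verify(dataset, n_workers=5):
    """Experts find candidate issues; each candidate is verified by n workers."""
    verified = []
    for triple, issue_type in run_contest(dataset):
        answers = [submit_microtask(triple, issue_type) for _ in range(n_workers)]
        verified.append((triple, issue_type, answers))
    return verified

print(find_verify(dataset=None))
```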
9. Difference between LD experts & AMT workers
Type: contest-based (experts) vs. Human Intelligence Tasks (HITs) (workers)
Participants: Linked Data (LD) experts vs. the labor market
Task: detect and classify quality issues in resources vs. detect quality issues in triples
Reward: most resources evaluated vs. payment per task/triple
Tool: TripleCheckMate vs. Amazon Mechanical Turk, CrowdFlower, etc.
11. Crowdsourcing using Linked Data Experts — Methodology
Phase I: Creation of quality problem taxonomy
Phase II: Launching a contest
12-13. Crowdsourcing using Linked Data Experts — Quality Problem Taxonomy
[Taxonomy table from Zaveri et al., Quality assessment methodologies for Linked Open Data, Semantic Web Journal, 2015]
D = Detectable, F = Fixable, E = Extraction Framework, M = Mappings Wiki
17. Crowdsourcing using AMT Workers
Steps:
1. Selecting LD quality issues to crowdsource
2. Designing and generating the microtasks that present the data to the crowd
[Workflow: triples {s p o .} from the dataset are labelled as correct, or as incorrect together with the quality issue]
18. Selecting LD quality issues to crowdsource (step 1)
Three categories of quality problems occur pervasively in DBpedia and can be crowdsourced:
Incorrect/incomplete object value
▪ Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3" .
Incorrect data type or language tag
▪ Example: dbpedia:Torishima_Izu_Islands foaf:name "…"@en . (a non-English name tagged as English)
Incorrect link to external Web pages
▪ Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/> .
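For the first category, a simple automated pre-check can already catch values like the "3" above before (or alongside) crowdsourcing. A minimal sketch, assuming triples are available as plain Python tuples and using an illustrative, hypothetical list of date predicates:

```python
from datetime import date

# Illustrative subset of date-valued predicates; not an exhaustive list.
DATE_PREDICATES = {"dbprop:dateOfBirth", "dbprop:date"}

def is_complete_iso_date(value: str) -> bool:
    """True if the lexical form is a full ISO 8601 date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

def flag_incomplete_dates(triples):
    """Yield triples whose date-valued object is incorrect or incomplete."""
    for s, p, o in triples:
        if p in DATE_PREDICATES and not is_complete_iso_date(o):
            yield s, p, o

triples = [("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", "3")]
print(list(flag_incomplete_dates(triples)))   # flags the incomplete value "3"
```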
19. Designing the microtasks: presenting the data to the crowd (step 2)
• Selection of foaf:name or rdfs:label to extract human-readable descriptions
• Values extracted automatically from Wikipedia infoboxes
• Link to the Wikipedia article via foaf:isPrimaryTopicOf
• Preview of external pages via an embedded HTML iframe
Microtask interfaces: MTurk tasks for the incorrect object, incorrect data type or language tag, and incorrect outlink issues
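A minimal sketch of how such a presentation could be assembled, assuming the public DBpedia SPARQL endpoint and the SPARQLWrapper Python library (illustrative only, not the interface-generation code used in the study):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def describe_resource(resource_uri, endpoint="http://dbpedia.org/sparql"):
    """Fetch an English rdfs:label and the Wikipedia article
    (foaf:isPrimaryTopicOf) so a triple can be shown to workers in a
    human-readable way with a link back to the source page."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?label ?article WHERE {{
          OPTIONAL {{ <{resource_uri}> rdfs:label ?label .
                      FILTER (lang(?label) = "en") }}
          OPTIONAL {{ <{resource_uri}> foaf:isPrimaryTopicOf ?article . }}
        }} LIMIT 1
    """)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    row = bindings[0] if bindings else {}
    return (row.get("label", {}).get("value"),
            row.get("article", {}).get("value"))

print(describe_resource("http://dbpedia.org/resource/Dave_Dobbyn"))
```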
21. Experimental design
• Crowdsourcing approaches:
• Find stage: Contest with LD experts
• Verify stage: Microtasks
• Creation of a gold standard:
• Two of the authors of this paper (MA, AZ) generated the gold
standard for all the triples obtained from the contest
• Each author independently evaluated the triples
• Conflicts were resolved via mutual agreement
• Metric: precision
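A small sketch of how such a precision figure can be computed, assuming each evaluated triple carries a crowd verdict and a gold-standard verdict ("incorrect" meaning a quality issue is present); this is the generic TP/(TP+FP) definition, not code from the study:

```python
def precision(verdicts):
    """verdicts: iterable of (crowd_label, gold_label) pairs.
    A true positive is a triple the crowd flagged as having a quality issue
    that the gold standard confirms; a false positive is flagged but clean."""
    tp = sum(1 for crowd, gold in verdicts
             if crowd == "incorrect" and gold == "incorrect")
    fp = sum(1 for crowd, gold in verdicts
             if crowd == "incorrect" and gold == "correct")
    return tp / (tp + fp) if (tp + fp) else 0.0

print(precision([("incorrect", "incorrect"), ("incorrect", "correct"),
                 ("incorrect", "incorrect")]))   # -> 0.666...
```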
22. Overall results
Number of distinct participants: 50 LD experts, 80 microtask workers
Total time: 3 weeks (predefined) for the experts, 4 days for the workers
Total triples evaluated: 1,512 by the experts, 1,073 by the workers
Total cost: ~US$ 400 (predefined) for the experts, ~US$ 43 for the workers
23. Precision results: Incorrect object task
• MTurk workers can be used to reduce the error rates of LD experts for the Find stage
• 117 DBpedia triples had date-related predicates with incorrect/incomplete values, e.g. "2005 Six Nations Championship" Date 12 .
• 52 DBpedia triples had erroneous values inherited from the source, e.g. "English (programming language)" Influenced by ? .
• Experts classified all of these triples as incorrect
• Workers compared the values against Wikipedia and successfully classified these triples as "correct"
Precision over 509 compared triples: LD experts 0.7151, MTurk workers (majority voting, n=5) 0.8977
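The MTurk figures in these tables are aggregated per triple with majority voting over n=5 workers; a minimal sketch of that aggregation step:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer among the workers for one triple."""
    return Counter(answers).most_common(1)[0][0]

print(majority_vote(["correct", "correct", "incorrect", "correct", "incorrect"]))
# -> "correct"
```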
24. Precision results: Incorrect data type task
[Bar chart: number of triples (0-150) per data type (Date, Millimetre, Number, Second, Year), broken down into experts' true/false positives and crowd's true/false positives]
Precision over 341 compared triples: LD experts 0.8270, MTurk workers (majority voting, n=5) 0.4752
25. Precision results: Incorrect link task
• We analyzed the 189 misclassifications made by the experts
[Pie chart: the expert misclassifications split across Freebase links, Wikipedia images, and external links (11%, 39%, 50%)]
• The misclassifications made by the workers correspond to pages in a language other than English
Precision over 223 compared triples: baseline 0.2598, LD experts 0.1525, MTurk workers (majority voting, n=5) 0.9412
26. Final discussion
RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?
Yes: LD experts performed best on incorrect data types, AMT workers on incorrect/incomplete object values and incorrect interlinks
RQ2: What type of crowd is most suitable for each type of quality issue?
The effort of LD experts should be applied to tasks demanding domain-specific skills; AMT workers were exceptionally good at performing data comparisons
RQ3: Which types of errors are made by lay users and experts?
Lay users lack the skills to solve domain-specific tasks, while experts' performance is very low on tasks that demand extra effort (e.g., checking an external page)
28. Conclusions
A crowdsourcing methodology for LD quality assessment:
Find stage: LD experts
Verify stage: AMT workers
The methodology and tool are generic enough to be applied to other scenarios
Crowdsourcing approaches are feasible for detecting the studied quality issues
30. Future Work
Combining semi-automated and crowdsourcing methods
Predicted vs. crowdsourced metadata
Conducting new experiments (other domains)
Entity, dataset, experimental metadata
Fix/Improve Quality using Crowdsourcing
Find-Fix-Verify Phases
31. References
TripleCheckMate: A tool for crowdsourcing the quality assessment of linked data. D Kontokostas, A Zaveri, S Auer, J Lehmann. ISWC 2013.
Crowdsourcing linked data quality assessment. M Acosta, A Zaveri, E
Simperl, D Kontokostas, S Auer, J Lehmann, ISWC 2013.
User-driven quality evaluation of DBpedia. A Zaveri, D Kontokostas, MA Sherif, L Bühmann, M Morsey, S Auer, J Lehmann. I-SEMANTICS 2013.
Quality assessment for linked data: A survey. A Zaveri, A Rula, A
Maurino, R Pietrobon, J Lehmann, S Auer, Semantic Web Journal 2015.
Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia
Study. M Acosta, A Zaveri, E Simperl, D Kontokostas, F Flöck, J
Lehmann, Semantic Web Journal 2016.
ACRyLIQ: Leveraging DBpedia for Adaptive Crowdsourcing in Linked
Data Quality Assessment. U Hassan, A Zaveri, E Marx, E Curry, J
Lehmann. EKAW 2016.