Summary of crowdsourcing studies to assess the quality of knowledge graphs and to complete missing values. Results focus on findings over the DBpedia knowledge graph (https://wiki.dbpedia.org/).
Related publications:
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Auer, S., & Lehmann, J. Crowdsourcing Linked Data Quality Assessment. In International Semantic Web Conference (pp. 260-276), 2013.
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Flöck, F., & Lehmann, J. Detecting Linked Data Quality issues via Crowdsourcing: A DBpedia Study. Semantic Web Journal, 9(3), 303-335, 2018.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: A hybrid SPARQL engine to enhance query answers via crowdsourcing. In Proceedings of the 8th International Conference on Knowledge Capture (p. 11). 2015. Best Student Paper Award.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. Enhancing answer completeness of SPARQL queries via crowdsourcing. Journal of Web Semantics, 45, 41-62, 2017.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: An engine for enhancing answer completeness of SPARQL queries via crowdsourcing. Companion Volume of the Web Conference (pp. 501-505). 2018.
2. Correctness and Completeness Challenges
Correctness challenges:
- Different types of incorrectness
- Semi-structured data model
Completeness challenges:
- Skewed data distributions
- Semi-structured data model
- Open World Assumption
[Figure: example DBpedia subgraphs with rdf:type edges (e.g., Drug, Oral) and route-of-administration values (e.g., Cheek), illustrating incorrect triples and missing values.]
Data source: DBpedia endpoint (December 2018).
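To make the Open World Assumption concrete, here is a minimal sketch (assuming the public DBpedia SPARQL endpoint and the Python SPARQLWrapper library; the drug and predicate follow the Flecainide example used later in these slides) that retrieves the routes of administration recorded for a drug. An empty or short result only means the KG lacks the values, not that no further routes exist.

```python
# Minimal sketch: querying DBpedia for a drug's routes of administration.
# Assumes the public DBpedia endpoint and the SPARQLWrapper library;
# the subject/predicate mirror the Flecainide example from the slides.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbp: <http://dbpedia.org/property/>
    SELECT ?route WHERE { dbr:Flecainide dbp:routesOfAdministration ?route . }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

routes = [b["route"]["value"] for b in results["results"]["bindings"]]
# Under the Open World Assumption, a short list only means the values
# are not (yet) in the KG, not that no further routes exist.
print(routes)
```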
3. Crowdsourcing KG Correctness
Find-Verify Approach
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Auer, S., & Lehmann, J. Crowdsourcing Linked Data Quality Assessment. In International Semantic Web Conference (pp. 260-276), 2013.
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Flöck, F., & Lehmann, J. Detecting Linked Data Quality issues via Crowdsourcing: A DBpedia Study. Semantic Web Journal, 9(3), 303-335, 2018.
4. Find-Verify Approach
• Find stage: subject-centric
• In each task, the crowd assesses the triples of a subject in the KG
• Incorrect triples are annotated with the corresponding quality issue
• Verify stage: issue-centric
• The crowd assesses the triples annotated as incorrect in the previous stage
• In each task, the crowd assesses triples annotated with the same quality issue
[Figure: Find-Verify workflow. RDF triples are the input to the Find stage, where the crowd annotates incorrect triples with quality issues via a crowdsourcing interface; the annotated triples become the tasks of the Verify stage, whose output is the set of verified incorrect RDF triples.]
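The following is a minimal sketch of the Find-Verify task construction in Python. The subject-centric and issue-centric grouping follows the slides, but all type and function names are hypothetical, not an API from the papers.

```python
# Hypothetical sketch of Find-Verify task construction; names are
# illustrative, not an API from the cited papers.
from collections import defaultdict
from typing import NamedTuple

QUALITY_ISSUES = ("object_value", "datatype_language", "external_link")

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

def build_find_tasks(triples):
    """Find stage is subject-centric: one task bundles all triples of a subject."""
    by_subject = defaultdict(list)
    for t in triples:
        by_subject[t.subject].append(t)
    return list(by_subject.values())

def build_verify_tasks(annotated, per_task=5):
    """Verify stage is issue-centric: each task batches triples flagged
    with the same quality issue."""
    by_issue = defaultdict(list)
    for triple, issue in annotated:
        assert issue in QUALITY_ISSUES
        by_issue[issue].append(triple)
    return [
        (issue, flagged[i:i + per_task])
        for issue, flagged in by_issue.items()
        for i in range(0, len(flagged), per_task)
    ]
```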
5. Studied DBpedia Quality Issues
Three categories of quality issues that occur in DBpedia [Zaveri2013]:
• Incorrect object value
dbr:Dave_Dobbyn dbp:dateOfBirth “3” .
• Incorrect data type or language tags
dbr:Torishima_Izu_Islands foaf:name “鳥島”@en .
• Incorrect link to external sources
dbr:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/> .
Examples from DBpedia 2014.
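As an illustration of how the datatype/language-tag issue could be pre-filtered automatically (an assumption for illustration; the study relies on the crowd, not on this heuristic), the following flags literals tagged @en whose letters are predominantly non-Latin, which would catch the “鳥島”@en example above.

```python
# Illustrative heuristic (not the study's method): flag literals whose
# language tag is @en but whose letters are mostly non-Latin,
# e.g. foaf:name "鳥島"@en from the slide.
import unicodedata

def suspicious_english_tag(literal: str, lang: str) -> bool:
    if lang != "en" or not literal:
        return False
    letters = [ch for ch in literal if ch.isalpha()]
    if not letters:
        return False
    non_latin = sum(
        1 for ch in letters
        if not unicodedata.name(ch, "").startswith("LATIN")
    )
    return non_latin / len(letters) > 0.5

print(suspicious_english_tag("鳥島", "en"))        # True: likely wrong tag
print(suspicious_english_tag("Tori-shima", "en"))  # False
```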
6. Overview of the Results on DBpedia 3.9
Two workflows: Expert-Worker (EW), Worker-Worker (WW)
Triples crowdsourced in Find Stage: >30,000
Triples crowdsourced in Verify Stage: 1,073
Distribution of DBpedia quality issues (according to the experts):
509 triples with incorrect object value
341 triples with incorrect datatype/language
223 triples with incorrect link
7. Results: Expert-Worker Workflow
• Crowd workers cannot detect datatype issues correctly
• Experts do not check the outlinks properly
[Figure: Precision of the EW workflow in the Find and Verify stages, plus Verify-stage sensitivity and specificity, per quality issue (Datatype/Language, Link, Object Value); metric values range from 0.00 to 1.00.]
8. Results: Worker-Worker Workflow
• Precision of crowd workers in the find stage is very low
• Sensitivity values indicate that crowd workers reliably confirm incorrect triples
[Figure: Precision of the WW workflow in the Find and Verify stages, plus Verify-stage sensitivity and specificity, per quality issue (Datatype/Language, Link, Object Value); metric values range from 0.00 to 1.00.]
9. Crowdsourcing KG Completeness
HARE (Hybrid SPARQL Query Engine)
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: A hybrid SPARQL engine to enhance query answers via crowdsourcing. In Proceedings of the 8th International Conference on Knowledge Capture (p. 11). 2015. Best Student Paper Award.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. Enhancing answer completeness of SPARQL queries via crowdsourcing. Journal of Web Semantics, 45, 41-62, 2017.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: An engine for enhancing answer completeness of SPARQL queries via crowdsourcing. Companion Volume of the Web Conference (pp. 501-505). 2018.
11. HARE
• A hybrid machine/human SPARQL query engine that enhances the completeness of query answers.
• Based on a novel RDF completeness model, HARE implements query optimization and execution techniques for identifying the portions of queries that yield missing values.
• HARE resorts to microtask crowdsourcing to resolve the missing values.
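The sketch below only captures the intuition behind such a completeness model: a subject's number of values for a predicate is compared against the typical count among peer subjects, and triple patterns scoring below a threshold are routed to the crowd. The scoring function and threshold are illustrative assumptions, not the exact model from the HARE papers.

```python
# Hedged sketch of an RDF completeness estimate: compare how many
# p-values a subject has against subjects of the same class.
# Illustration only; not the exact model from the HARE papers.
from statistics import median

def completeness_score(n_values_subject: int, peer_value_counts: list) -> float:
    """Ratio of the subject's value count for predicate p to the median
    count among peer subjects, capped at 1.0."""
    if not peer_value_counts:
        return 1.0  # no evidence of incompleteness
    typical = median(peer_value_counts)
    if typical == 0:
        return 1.0
    return min(1.0, n_values_subject / typical)

def should_crowdsource(score: float, threshold: float = 0.5) -> bool:
    """Triple patterns estimated as incomplete are routed to the crowd;
    the threshold value is an illustrative assumption."""
    return score < threshold

# E.g., a drug with 1 recorded route while peer drugs typically have 3.
score = completeness_score(1, [3, 2, 4, 3])
print(score, should_crowdsource(score))  # 0.333..., True
```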
12. Microtask Manager
• Receives triple patterns to crowdsource, e.g.:
(dbr:Flecainide, dbp:routesOfAdministration, ?route)
• Creates human tasks using data from the KG.
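A minimal sketch of what creating a human task from a triple pattern could look like: replacing URIs with human-readable labels mirrors the HARE BL configuration mentioned in the experimental settings, but the label map and question template here are assumptions; in practice, labels would come from rdfs:label lookups against the KG.

```python
# Illustrative sketch of turning a triple pattern into a microtask question.
# The label map and question template are assumptions; real labels would
# be fetched from the KG via rdfs:label.
LABELS = {
    "dbr:Flecainide": "Flecainide",
    "dbp:routesOfAdministration": "route of administration",
}

def make_microtask(subject: str, predicate: str) -> dict:
    s = LABELS.get(subject, subject)
    p = LABELS.get(predicate, predicate)
    return {
        "question": f"What is a {p} of {s}?",
        "triple_pattern": (subject, predicate, "?value"),
    }

task = make_microtask("dbr:Flecainide", "dbp:routesOfAdministration")
print(task["question"])  # What is a route of administration of Flecainide?
```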
13. Experimental Settings
• Benchmark: 50 queries against DBpedia (English version, 2014).
• Ten queries in each of five knowledge domains:
History, Life Sciences, Movies, Music, and Sports.
• Implementation details:
• Dataset (queries executed directly against the dataset).
• HARE (our proposed approach).
• HARE BL (generates microtask interfaces replacing URIs by labels).
• Crowdsourcing configuration:
• The crowd is reached via CrowdFlower.
• Four different triple patterns per task, at US$ 0.07 per task.
• At least three answers were collected per task.
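For concreteness, the configuration above could be captured along these lines (a sketch; the field names are ours, not CrowdFlower's job API):

```python
# Illustrative encoding of the crowdsourcing configuration above;
# field names are ours, not CrowdFlower's job API.
CROWD_CONFIG = {
    "platform": "CrowdFlower",
    "triple_patterns_per_task": 4,
    "payment_usd_per_task": 0.07,
    "min_answers_per_task": 3,
}
```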
14. Overview of the Results
Total triple patterns crowdsourced: 1,004
Total answers collected from the crowd: 3,163
75%-98% of the crowd answers were produced within 12 minutes
15. Completeness of Query Answers
[Figure: Recall of the tested approaches w.r.t. D* per SPARQL query.]
Recall varies across queries and knowledge domains in DBpedia.
16. Completeness of Query Answers
[Figure: Recall of the tested approaches w.r.t. D* per SPARQL query.]
HARE outperforms the other approaches across all knowledge domains.
Our RDF completeness model captures the skewed distributions of values.
17. Quality of Crowd Answers: Precision
The crowd exhibits heterogeneous performance within DBpedia domains.
19. Conclusions & Outlook
• Crowdsourcing is a feasible solution for KG quality processing.
• KG correctness: The crowd performs best in verification tasks
(confirming incorrect facts).
• KG completeness: The precision of crowd answers varies within knowledge domains in DBpedia.
• Outlook (Scalability): Further integration of the crowd answers
with automatic methods to scale up to large datasets.