Amrapali Zaveri Defense
17th April, 2015, Leipzig, Germany
Linked Data Quality Assessment and its Application to Societal Progress Measurement
Amrapali Zaveri
Faculty of Mathematics and Computer Science
Supervisors:
Prof. Dr.-Ing. habil. Klaus-Peter Fähnrich, University of Leipzig
Dr. Jens Lehmann, University of Leipzig
Prof. Dr. Sören Auer, University of Bonn
Outline
Motivation: Linked Data Quality
Linked Data Quality Assessment Methodologies
Use Case Leveraging Data Quality
Contributions
Future Work
Motivation: Linked Data Quality
Data on the Web
Linked Data Principles
Use URIs as names for things.
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (RDF, RDFS, OWL, SPARQL).
Include links to other URIs so that they can discover more things.
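Some of these principles can be checked mechanically. The following is a minimal sketch under our own assumptions, not a tool from this work: it flags subjects that are not HTTP URIs (principle 2) and counts object URIs on other hosts as outgoing links (principle 4). The function names, the `example.org` host and the sample triples are hypothetical.

```python
from urllib.parse import urlparse

Triple = tuple[str, str, str]


def is_http_uri(term: str) -> bool:
    """Principle 2: a name should be an HTTP(S) URI that people can look up."""
    parsed = urlparse(term)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_principles(triples: list[Triple], local_host: str) -> dict:
    """Flag subjects that are not HTTP URIs and count links to other hosts.

    Simplification (assumption): any object that is an HTTP URI on a host
    other than local_host counts as a link to another dataset (principle 4).
    """
    non_http_subjects = sorted({s for s, _, _ in triples if not is_http_uri(s)})
    external_links = [
        (s, p, o) for s, p, o in triples
        if is_http_uri(o) and urlparse(o).netloc != local_host
    ]
    return {"non_http_subjects": non_http_subjects,
            "external_links": len(external_links)}


# Hypothetical data; example.org stands in for the publisher's own host.
triples = [
    ("http://example.org/resource/Leipzig",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Leipzig"),
    ("http://example.org/resource/Leipzig",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Leipzig"),
    ("urn:example:handbook",
     "http://example.org/ontology/title",
     "Example title"),
]
report = check_principles(triples, local_host="example.org")
print(report)
```

Literals such as `"Leipzig"` have no URI scheme and are therefore neither flagged nor counted as links.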
Data Quality
Data quality is defined as "fitness for use"*.
* Juran, J. (1974). The Quality Control Handbook. McGraw-Hill, New York.
Consequences of Poor Quality
Propagation of errors in integrated datasets
Major hindrance to obtaining reliable results
Loss of important information
Loss of productivity and additional costs*#
* http://www.gartner.com/newsroom/id/501733
# http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information
Data Quality Assessment
How can one assess the quality of data and make this information explicit?
Which criteria should be assessed?
Which measures should be used?
Which methodologies/tools can be utilized?
Main Research Question
How can we exploit Linked Data for a particular use case and ensure good data quality?
Overview
Systematic literature review
Linked Data Quality Assessment Methodologies Evaluation: User-driven, Crowdsourcing, Semi-automated
Use case leveraging quality
Systematic Literature Review
Current State
Lack of unified descriptions of data quality dimensions and metrics for Linked Data
Lack of use-case-driven data quality assessment methodologies for Linked Data
Lack of quality assessment of datasets before they are used in particular use cases
Research Questions
RQ1 What are the existing approaches to assess the quality of Linked Data, employing a conceptual framework integrating prior approaches?
RQ1.1 What are the data quality problems that each approach assesses?
RQ1.2 Which data quality dimensions and metrics are supported by the proposed approaches?
RQ1.3 Which tools already exist to assess the quality of Linked Data?
Qualitative Analysis
30 core articles
18 dimensions with definitions
69 metrics
12 tools compared using 8 attributes
Quality assessment methodologies for Linked Data: A Survey. Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer. Semantic Web Journal 2015.
Metrics
Linked Data Quality Metrics
Dimension | Metric | Description | QN/QL*
Completeness | Schema completeness | No. of classes and properties represented / total no. of classes and properties | QN
Interlinking | Detection of good quality interlinks | (i) Detection of (a) interlinking degree, (b) clustering coefficient, (c) centrality, (d) open sameAs chains and (e) description richness through sameAs, using network measures; (ii) via crowdsourcing | QN
Timeliness | Freshness of datasets | max{0, 1 − currency/volatility} | QN
Trustworthiness | Trustworthiness of information provider | Level of trust for the publisher, on a scale of 1 to 9 | QL
*QN: quantitative metric; QL: qualitative metric
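Two of the quantitative metrics in this table reduce to simple ratios. The following is a minimal sketch, not the survey's reference implementation. It assumes that `currency` is the age of the data (for example, days since the last update) and that `volatility` is the period during which the data stays valid, in the same unit. The function names and the vocabulary sets are hypothetical.

```python
def schema_completeness(represented: set[str], reference: set[str]) -> float:
    """No. of reference classes/properties represented / total no. of reference ones."""
    if not reference:
        raise ValueError("the reference schema must not be empty")
    return len(represented & reference) / len(reference)


def freshness(currency: float, volatility: float) -> float:
    """Timeliness metric from the table: max{0, 1 - currency / volatility}.

    Assumption: currency is the age of the data and volatility is how long
    the data remains valid, both in the same unit (e.g. days).
    """
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    return max(0.0, 1.0 - currency / volatility)


# Hypothetical reference schema and dataset vocabulary.
reference = {"dbo:Person", "dbo:birthDate", "dbo:deathDate", "foaf:name"}
represented = {"dbo:Person", "dbo:birthDate", "foaf:name", "ex:nickname"}

print(schema_completeness(represented, reference))  # 0.75
print(freshness(currency=30, volatility=120))       # 0.75
print(freshness(currency=200, volatility=120))      # 0.0 (data older than its validity period)
```

Terms outside the reference schema, such as `ex:nickname`, do not raise the score. Only coverage of the reference vocabulary counts.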
Tools
Attribute | Trellis | TrustBOT | tSPARQL | WIQA | ProLOD | Flemming
Availability | - | - | ✔ | - | - | ✔
Licensing | Open-source | - | GPL v3 | Apache v2 | - | -
Automation | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated
Collaboration | Yes | No | No | No | No | No
Customizability | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Scalability | - | No | Yes | - | - | No
Usability | 2 | 4 | 4 | 2 | 2 | 3
Maintenance | 2005 | 2003 | 2012 | 2006 | 2010 | 2010
Tools (continued)
Attribute | LinkQA | Sieve | RDFUnit | DaCura | TripleCheckMate | LiQuate
Availability | ✔ | ✔ | ✔ | - | ✔ | ✔
Licensing | Open-source | Apache | Apache | - | Apache | -
Automation | Automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated
Collaboration | No | No | No | Yes | Yes | No
Customizability | No | ✔ | ✔ | ✔ | ✔ | No
Scalability | Yes | Yes | Yes | No | Yes | No
Usability | 2 | 4 | 3 | 1 | 5 | 1
Maintenance | 2011 | 2012 | 2014 | 2013 | 2013 | 2013
Problems in Current Approaches
Not tailored to the use case
Results difficult to interpret
Do not report the root cause of the quality issues
Require a considerable amount of configuration
Do not allow the user to choose the input dataset
User-Driven Quality Assessment
Research Questions
RQ2 How can we assess the quality of Linked Data using a user-driven methodology?
RQ2.1 How feasible is it to employ Linked Data experts to assess the quality issues of LD?
RQ2.2 How feasible is it to use a combination of user-driven and semi-automated methodologies to assess the quality of LD?
Methodology
Resource selection: per class, manual or random
Evaluation mode selection: manual, semi-automatic or automatic
Resource evaluation: assess the resource's triples (in the semi-automatic mode, after a pre-selection of triples)
Result: a list of invalid facts
Data quality improvement, e.g. via a Patch Ontology
Two evaluation modes are followed: manual and semi-automated.
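The per-class and random resource selection modes can be sketched as plain sampling. This is only an illustration under our own simplifying assumption that each resource has exactly one class. The manual mode (a user picks the resource) needs no code. The function names and the resource and class names are hypothetical.

```python
import random
from collections import defaultdict


def select_random(resources: list[str], n: int, seed: int = 0) -> list[str]:
    """[Random] mode: draw up to n distinct resources uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(resources, min(n, len(resources)))


def select_per_class(resource_class: dict[str, str], per_class: int,
                     seed: int = 0) -> dict[str, list[str]]:
    """[Per Class] mode: draw up to per_class resources from every class.

    Simplification (assumption): each resource has exactly one class.
    """
    by_class: dict[str, list[str]] = defaultdict(list)
    for resource, cls in resource_class.items():
        by_class[cls].append(resource)
    rng = random.Random(seed)
    # Sorting makes the draw reproducible for a given seed.
    return {
        cls: rng.sample(sorted(members), min(per_class, len(members)))
        for cls, members in sorted(by_class.items())
    }


# Hypothetical resources and their classes.
resource_class = {
    "dbr:Leipzig": "dbo:City",
    "dbr:Bonn": "dbo:City",
    "dbr:Berlin": "dbo:City",
    "dbr:Racer_A": "dbo:FormulaOneRacer",
}
print(select_per_class(resource_class, per_class=2))
print(select_random(list(resource_class), n=3))
```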
Manual: Phase I
Linked Data Quality Problem Taxonomy
Dimension | Category
Accuracy | Triple incorrectly extracted; Datatype problems; Implicit relationships between attributes
Relevancy | Irrelevant information extracted
Representational consistency | Representation of number values
Interlinking | External links; Interlinks with other datasets
Manual: Phase II
Invited Linked Data experts
Triple-based evaluation
Contest-based, over 3 weeks
Phase II: TripleCheckMate
https://github.com/AKSW/TripleCheckMate
Choose a resource
Identify erroneous triples
Manual: Results
Total no. of users | 58
Total no. of distinct resources evaluated | 521
Total no. of distinct incorrect triples | 2928
% of triples affected | 11.93%
Resource-based inter-rater agreement (Cohen's kappa) | 0.34
Total no. of triples evaluated for correctness | 700
% of triples evaluated incorrectly | 19%
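Cohen's kappa corrects the raw agreement between two raters for the agreement expected by chance. The following sketch implements the standard formula (p_o - p_e) / (1 - p_e). It does not reproduce how the resource-based value of 0.34 was computed in this study. The ten judgements are made-up illustration data, not results from the contest.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # p_o: observed share of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used the same single label everywhere
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgements of two experts on the same ten resources.
expert_1 = ["error", "error", "ok", "ok", "error", "ok", "ok", "ok", "error", "ok"]
expert_2 = ["error", "ok", "ok", "ok", "error", "ok", "error", "ok", "ok", "ok"]
print(round(cohens_kappa(expert_1, expert_2), 2))  # 0.35
```

In this example the raters agree on 7 of 10 items (p_o = 0.7), but chance alone would give p_e = 0.54. The kappa value is therefore far lower than the raw agreement.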
Semi-automated: Step 1
Generate schema axioms for properties via DL-Learner*:
Functionality
Inverse functionality
Asymmetry
Irreflexivity
Example (functionality): domain Formula One Racer, range Grand Prix; each Formula One racer has only one first win.
* Lehmann, J. (2009). DL-Learner: learning concepts in description logics. Journal of Machine Learning Research, 10:2639–2642.
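Once an axiom type has been proposed for a property, the triples that violate it can be found with a few set operations. The following sketch assumes the axioms are already given. It does not show how DL-Learner generates them. All function names and the `ex:` properties and resources are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def functionality_violations(triples: list[Triple], prop: str) -> dict[str, set[str]]:
    """Subjects with more than one value for a property assumed functional."""
    values: dict[str, set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: objs for s, objs in values.items() if len(objs) > 1}


def inverse_functionality_violations(triples: list[Triple], prop: str) -> dict[str, set[str]]:
    """Objects reached from more than one subject via an inverse-functional property."""
    subjects: dict[str, set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects[o].add(s)
    return {o: subs for o, subs in subjects.items() if len(subs) > 1}


def asymmetry_violations(triples: list[Triple], prop: str) -> set[tuple[str, str]]:
    """Pairs where both s prop o and o prop s hold (each pair reported once)."""
    edges = {(s, o) for s, p, o in triples if p == prop}
    # s <= o keeps one direction per pair; a self-loop also violates asymmetry.
    return {(s, o) for s, o in edges if (o, s) in edges and s <= o}


def irreflexivity_violations(triples: list[Triple], prop: str) -> set[str]:
    """Resources related to themselves via a property assumed irreflexive."""
    return {s for s, p, o in triples if p == prop and s == o}


# Hypothetical triples; ex:firstWin plays the role of the functional property.
data = [
    ("ex:Racer_A", "ex:firstWin", "ex:GrandPrix_1"),
    ("ex:Racer_A", "ex:firstWin", "ex:GrandPrix_2"),
    ("ex:Racer_B", "ex:firstWin", "ex:GrandPrix_3"),
    ("ex:Racer_A", "ex:licenceId", "ex:L1"),
    ("ex:Racer_B", "ex:licenceId", "ex:L1"),
    ("ex:Racer_A", "ex:predecessor", "ex:Racer_B"),
    ("ex:Racer_B", "ex:predecessor", "ex:Racer_A"),
    ("ex:Racer_C", "ex:rivalOf", "ex:Racer_C"),
]
print(functionality_violations(data, "ex:firstWin"))
```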
Semi-automated: Step 2
Manual evaluation of the generated axioms:
100 random axioms per type
Only axioms for which at least one violation can be found
Also taking the target context into account
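The selection of axioms for manual review can be sketched as "filter, then sample" for each axiom type. This is a hypothetical helper, not the tooling used in this work. In practice, `violation_count` might count violating triples, for example with a SPARQL query. Here it is a stand-in, and the axiom identifiers are made up. Checking the target context remains a manual judgement and is not encoded.

```python
import random
from typing import Callable


def sample_axioms_for_review(
    axioms_by_type: dict[str, list[str]],
    violation_count: Callable[[str], int],
    per_type: int = 100,
    seed: int = 0,
) -> dict[str, list[str]]:
    """Per axiom type, keep axioms with at least one violation, then sample up to per_type."""
    rng = random.Random(seed)
    sample: dict[str, list[str]] = {}
    for axiom_type, axioms in sorted(axioms_by_type.items()):
        candidates = [a for a in axioms if violation_count(a) > 0]
        sample[axiom_type] = rng.sample(candidates, min(per_type, len(candidates)))
    return sample


# Hypothetical axiom identifiers; odd-numbered ones pretend to have violations.
axioms_by_type = {
    "functionality": [f"func_{i}" for i in range(250)],
    "irreflexivity": [f"irr_{i}" for i in range(40)],
}


def violation_count(axiom: str) -> int:
    return int(axiom.split("_")[1]) % 2


review = sample_axioms_for_review(axioms_by_type, violation_count)
print({t: len(a) for t, a in review.items()})  # {'functionality': 100, 'irreflexivity': 20}
```

If fewer than 100 axioms of a type have violations, all of them are reviewed.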
Semi-automated: Results
[Chart: results per axiom type (inverse functionality, functionality, asymmetry, irreflexivity)]
Summary
Quality analysis of over 500 resources
About 12% of triples found to be erroneous
Linked Data experts performed the quality analysis, but also evaluated some correct triples as errors
75% functionality violations of property characteristics detected, but these required manual verification
Crowdsourcing Linked Data Quality Assessment
Research Questions
RQ2.3 Is it possible to detect quality issues in LD datasets via crowdsourcing mechanisms?
RQ2.4 What type of crowd is most suitable for each type of quality issue?
RQ2.5 Which types of assessment errors are made by lay users and experts?
Concepts
AMT: Amazon Mechanical Turk
HITs: Human Intelligence Tasks (microtasks)
MTurk workers: monetary reward for each HIT
Find-Fix-Verify phases
- Crowdsourcing Linked Data quality assessment. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann. ISWC 2013.
- Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia Study. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Fabian Flöck and Jens Lehmann. SWJ (Submitted) 2015.
Methodology
Find stage (LD experts in a contest):
Resource selection: per class, manual or any
Evaluation of the resource's triples
Selection of quality issues for incorrect triples
Output: list of incorrect triples classified by quality issue
Verify stage (workers in paid microtasks):
HIT generation from the list of incorrect triples
Accept HIT
Assess each triple according to the given quality issue: Correct, Incorrect, Data doesn't make sense, or I don't know
Repeat while there are more triples to assess, then submit HIT
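Each triple in the Verify stage receives answers from several workers, and these answers must be combined into one verdict. The slides do not state the aggregation rule. The following sketch assumes simple majority voting: "I don't know" answers are ignored, and ties count as undecided. The `"Undecided"` label and the function name are hypothetical.

```python
from collections import Counter

VALID_ANSWERS = {"Correct", "Incorrect", "Data doesn't make sense", "I don't know"}


def aggregate_votes(answers: list[str]) -> str:
    """Majority vote over the workers' answers for one triple.

    'I don't know' answers are ignored; a tie or no usable answer gives 'Undecided'.
    """
    invalid = [a for a in answers if a not in VALID_ANSWERS]
    if invalid:
        raise ValueError(f"unexpected answers: {invalid}")
    votes = Counter(a for a in answers if a != "I don't know")
    if not votes:
        return "Undecided"
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "Undecided"
    return ranked[0][0]


print(aggregate_votes(["Incorrect", "Incorrect", "Correct", "I don't know", "Incorrect"]))
print(aggregate_votes(["Correct", "Incorrect"]))
```

The first call returns "Incorrect", which confirms the expert's flag. The second call is a tie and returns "Undecided".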
Results: Experts vs. Crowd
LD Expert | MTurk Worker
58 | 80
3 weeks | 4 days
1512 | 1073
0.38 | 0.73
Summary: Experts vs. Crowd
Quality issue | LD experts | MTurk workers
Object values | Fair: required validation | Fair: simple comparisons
Datatypes & literals | Fair: required validation | Poor: inexperienced with RDF
Interlinks | Poor: high effort required | Good: high inter-rater agreement
Use Case Leveraging Data Quality
Research Questions
RQ2.6 How can we semi-automatically assess the quality of datasets and provide meaningful results to the user?
RQ3 How can we exploit Linked Data for building a use case and ensure good data quality?
Motivation: User Scenario
A healthcare policy maker is interested in:
Which diseases?
Deaths per disease?
Where to allocate funds?
The policy maker looks at databases, e.g. WHO, ClinicalTrials.gov
The analysis faces: data in disparate datasets, in different formats; data quality problems; subsets of data; error-prone analysis, etc.
This translates into inadequate allocation of funds
Use Case: Societal Progress Indicators
Evaluate the impact of Research & Development (R&D), i.e. educational performance, on a country's performance in:
Economy
Healthcare
Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon. ISWC 2013.
Datasets & Variables
4 datasets:
World Bank
LinkedCT
Scimago
USPTO
17 variables, for example:
GDP (economic)
Birth rate, death rate (healthcare)
h-index (educational)
125. Methodology
Extract the World Bank, Scimago, LinkedCT and USPTO datasets using RSPARQL*
Perform quality assessment on the extracted data
*van Hage, W. R., Kauppinen, T., Graeler, B., Davis, C., Hoeksema, J., Ruttenberg, A., and Bahls, D. (2014). SPARQL Package, v1.6. R Foundation for Statistical Computing.
*https://github.com/amrapalijz/R-LOD-SEM/blob/master/RSPARQL
127. Semi-automated Quality Assessment
R2RLint tool*
7 dimensions
13 quality metrics
Use-case specific
Dimensions: Availability, Completeness, Interlinking, Syntactic validity, Consistency, Interpretability, Representational conciseness
*https://github.com/AKSW/R2RLint
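Syntactic validity is one of the seven dimensions listed above. As a rough illustration only (this is not R2RLint's implementation, whose rules the slide does not show), a check that flags typed literals whose lexical form does not fit their XSD datatype might look like the Python sketch below. The supported datatypes, the simplified N-Triples literal pattern and the function name invalid_literals are all assumptions.

```python
import re
from datetime import date

# Simplified syntactic-validity check (assumption: NOT R2RLint's rule set).
XSD = "http://www.w3.org/2001/XMLSchema#"

def _is_xsd_date(s):
    # Simplified xsd:date: YYYY-MM-DD only, no timezone, must be a real date.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is None:
        return False
    try:
        date.fromisoformat(s)
    except ValueError:
        return False
    return True

# Lexical-form checks for a few XSD datatypes (hypothetical subset).
CHECKS = {
    XSD + "integer": lambda s: re.fullmatch(r"[+-]?\d+", s) is not None,
    XSD + "decimal": lambda s: re.fullmatch(r"[+-]?(\d+(\.\d*)?|\.\d+)", s) is not None,
    XSD + "date": _is_xsd_date,
}

# Typed literal in an N-Triples line: "lexical form"^^<datatype IRI>
LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"\^\^<([^>]+)>')

def invalid_literals(ntriples_lines):
    """Return (line_no, lexical_form, datatype) for every typed literal
    whose lexical form does not match its declared datatype."""
    problems = []
    for n, line in enumerate(ntriples_lines, start=1):
        for lexical, dtype in LITERAL.findall(line):
            check = CHECKS.get(dtype)
            if check is not None and not check(lexical):
                problems.append((n, lexical, dtype))
    return problems
```

With hypothetical triples (the ex.org IRIs are placeholders), a decimal literal "n/a" or a date "2011-13-01" would be reported, while "41000.5"^^xsd:decimal passes. Datatypes without a registered check are skipped rather than flagged.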
128. Quality Assessment Results
Figure: total number of issues detected for:
Interlinking completeness
Population incompleteness
Inconsistency
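Interlinking completeness and population completeness are both ratios. Reading the metric definitions roughly as "interlinked instances divided by all instances" and "real-world objects of a type that are represented, divided by all such objects", a minimal Python sketch could be as follows. It assumes triples are plain (subject, predicate, object) string tuples, that only owl:sameAs counts as an interlink, and that a gold-standard reference set (for example, all countries) is available; the function names are hypothetical.

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

def interlinking_completeness(triples, instances):
    """Share of `instances` that are the subject of at least one
    owl:sameAs link (assumption: only owl:sameAs counts as an interlink)."""
    linked = {s for s, p, _ in triples if p == OWL_SAMEAS and s in instances}
    return len(linked) / len(instances) if instances else 0.0

def population_completeness(present, reference):
    """Share of real-world objects in the gold-standard `reference` set
    that are represented in the dataset (`present`)."""
    return len(present & reference) / len(reference) if reference else 0.0
```

For example, if two of four country instances carry owl:sameAs links, interlinking completeness is 0.5; if three of five reference countries appear in the dataset, population completeness is 0.6.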
129. 11/17 Variables
Latent variables and their observed variables:
Educational performance
  h-index: the largest number h such that h articles have at least h citations each
  Total no. of documents published per country per year
  High-technology export (HTE)
Healthcare performance
  Adolescent fertility rate (AFR)
  Birth rate (BR)
  Death rate (DR)
  Health expenditure, public (HEP)
  Immunization, DPT (IDPT)
  Immunization, measles (IM)
  Mortality rate, infant (MR)
Economic performance
  GDP per capita (current US$)
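The h-index definition above can be made concrete with a short Python sketch; the function name h_index is illustrative, and the same logic applies whether the citation counts belong to a country's articles or to an author's.

```python
def h_index(citations):
    """Largest h such that at least h items have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank          # the top `rank` items all have >= rank citations
        else:
            break             # counts only decrease from here on
    return h
```

For instance, h_index([10, 8, 5, 4, 3]) returns 4: four articles have at least 4 citations, but there are not five with at least 5.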
131. Methodology
Structural Equation Modeling on the World Bank and Scimago data:
Step I: apply EFA*-CFA*-EFA-CFA to the variables
Step II: apply SEM to the hypothesis
*EFA - Exploratory Factor Analysis
*CFA - Confirmatory Factor Analysis
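Step I starts with exploratory factor analysis. One widely used heuristic for deciding how many factors to retain in an EFA is the Kaiser criterion: keep factors whose correlation-matrix eigenvalues exceed 1. The Python sketch below illustrates that heuristic only; whether the study used it is an assumption, the function name is hypothetical, and the code depends on NumPy.

```python
import numpy as np

def kaiser_factor_count(corr):
    """Kaiser criterion: number of eigenvalues > 1 of a correlation matrix.
    Returns (factor_count, eigenvalues sorted in descending order)."""
    corr = np.asarray(corr, dtype=float)
    if corr.ndim != 2 or corr.shape[0] != corr.shape[1]:
        raise ValueError("expected a square correlation matrix")
    # eigvalsh: eigenvalues of a symmetric matrix, returned in ascending order
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return int(np.sum(eigenvalues > 1.0)), eigenvalues
```

Given raw observations (rows = country-years, columns = observed variables), the correlation matrix can be obtained with np.corrcoef(observations, rowvar=False). For two blocks of two variables each correlated at 0.8, the eigenvalues are 1.8, 1.8, 0.2 and 0.2, so two factors are retained.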
132. Theoretical Framework
Hypothesized model: three latent constructs, pairwise correlated:
Educational performance
Healthcare performance
Economic performance
133. Structural Equation Modeling
Script: https://github.com/amrapalijz/R-LOD-SEM/blob/master/sem_script.R
library(sem)     # specifyModel(), sem(), modIndices()
library(qgraph)  # path diagram of the fitted model
# Acquire data: the 11 observed variables
datanew <- with(data, data.frame(hindex, noOfDocs, IDPT, IM, MR,
                                 AFR, BR, DR, GDP, HEP, HET))
# Variance, covariance and correlation matrices
var <- var(datanew)
cov <- cov(datanew)
cor <- cor(datanew)
# Specify the model; specifyModel() reads one path per line
semmodel <- specifyModel()
# Latent variables
HealthCare -> IDPT, efa14, NA
HealthCare -> IM, efa11, NA
HealthCare -> MR, efa12, NA
HealthCare -> AFR, efa13, NA
….
# Running SEM model on the correlation matrix (N = 781 observations)
sem <- sem::sem(semmodel, cor, N = 781)
summary(sem, fit.indices = c("GFI", "AGFI", "RMSEA", "NFI", "NNFI",
                             "CFI", "RNI", "IFI", "SRMR", "AIC", "AICc"))
modIndices(sem)
qgraph(sem, cut = 0.8, gray = TRUE)
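The summary() call above requests fit indices such as RMSEA and CFI. For reference, a Python sketch of their common textbook formulations follows; that the sem package computes them exactly this way is an assumption, and the chi-square values in the usage note are hypothetical, not results of the study (only N = 781 comes from the script).

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (common formulation):
    sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)  # d_model >= 0, so d_base >= 0
    return 1.0 - d_model / d_base if d_base > 0 else 1.0
```

For example, a hypothetical model with chi-square 100 on 50 degrees of freedom and N = 781 gives rmsea(100, 50, 781) = sqrt(1/780), about 0.036. A model whose chi-square does not exceed its degrees of freedom gets RMSEA 0.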
134. Structural Equation Modeling
137. Conclusions
Robust statistical analysis of Linked Data can yield important and meaningful insights from publicly available data for measuring societal progress.
Use-case-driven data quality assessment of datasets is important before the datasets are used.
139. Contributions
Comprehensive survey
  18 data quality dimensions with definitions; 69 metrics
  12 tools compared according to 8 attributes
Development and evaluation of data quality assessment methodologies
  User-driven: manual and semi-automated
  Crowdsourcing: experts vs. workers
  Semi-automated: application to a use case
Consumption of Linked Data leveraging data quality
143. Future Work
Standardized quality assessment methodology for Linked Data
Quality assessment tools for Linked Data
Detection as well as improvement of quality issues before utilization in Linked Data use cases
144. Conference Publications
Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon. ISWC 2013.
Crowdsourcing Linked Data quality assessment. Maribel Acosta and Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann. ISWC 2013.
User-driven Quality Evaluation of DBpedia. Amrapali Zaveri, Dimitris Kontokostas, Mohamed A. Sherif, Lorenz Bühmann, Mohamed Morsey, Sören Auer and Jens Lehmann. I-SEMANTICS 2013.
145. Journal Publications
Quality assessment methodologies for Linked Data: A Survey. Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer. Semantic Web Journal 2015.
Using Linked Data to build an Observatory of Societal Progress Indicators. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Patrick Westphal, Jose Roberto Nascimento Junior, Luciano de Andrade, Cinzia Daraio, Jens Lehmann. Journal of Web Semantics 2014 (under review).
Publishing and Interlinking the USPTO Patent Data. Amrapali Zaveri, Mofeed M. Hassan, Tariq Yousef, Sören Auer, Jens Lehmann. Semantic Web Journal 2014 (under review).
146. Publications
No. of publications: 34 (Google Scholar), 16 (DBLP)
Citations: 251
h-index: 9; i10-index: 8 (Google Scholar)
147. Thank you for your attention!
Questions?
zaveri@informatik.uni-leipzig.de