17th April, 2015, Leipzig, Germany
Linked Data Quality Assessment and its Application to Societal Progress Measurement

Amrapali Zaveri
Faculty of Mathematics and Computer Science
Supervisors:
Prof. Dr. Ing. habil. Klaus-Peter Fähnrich, University of Leipzig
Dr. Jens Lehmann, University of Leipzig
Prof. Dr. Sören Auer, University of Bonn
Outline
Motivation: Linked Data Quality
Linked Data Quality Assessment Methodologies
Use Case Leveraging Data Quality
Contributions
Future Work
Motivation: Linked Data Quality
Data on the Web
Accessible
Re-usable
Understandable
Discoverable
Linked Data Principles
Use URIs as names for things.
Use HTTP URIs, so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (RDF, RDFS, OWL, SPARQL).
Include links to other URIs, so that they can discover more things.
Linked Data
What about the quality?
Data Quality
Data quality is defined as "fitness for use"*.
* Juran, J. (1974). The Quality Control Handbook. McGraw-Hill, New York.
Consequences of Poor Quality
Propagation of errors in integrated datasets
Major hindrance in acquiring reliable results
Loss of important information
Loss in productivity, additional costs*#
* http://www.gartner.com/newsroom/id/501733
# http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information
Data Quality Assessment
How can one assess the quality of data and make this information explicit?
Which criteria should be assessed?
Which measures should be used?
Which methodologies/tools can be utilized?
Main Research Question
How can we exploit Linked Data for a particular use case and ensure good data quality?
Overview
Systematic literature review
Linked Data Quality Assessment Methodologies Evaluation: User-driven, Crowdsourcing, Semi-automated
Use case leveraging quality
Systematic Literature Review
Current State
Lack of unified descriptions for data quality dimensions and metrics for Linked Data
Lack of use-case-driven data quality assessment methodologies for Linked Data
Lack of quality assessment of datasets before utilisation in particular use cases
Research Questions
RQ1 What are the existing approaches to assess the quality of Linked Data employing a conceptual framework integrating prior approaches?
RQ1.1 What are the data quality problems that each approach assesses?
RQ1.2 Which are the data quality dimensions and metrics supported by the proposed approaches?
RQ1.3 Which tools already exist to assess the quality of Linked Data?
Qualitative Analysis
30 core articles
18 dimensions with definitions
69 metrics
12 tools compared using 8 attributes
Quality assessment methodologies for Linked Data: A Survey. Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer. Semantic Web Journal 2015.
Dimensions
Figure: 18 data quality dimensions arranged in four groups (Accessibility, Intrinsic, Contextual, Representation); a connecting line between two dimensions means that the two are related.
Dimensions shown: Availability, Licensing*, Interlinking*, Security*, Performance*, Syntactic Validity, Semantic Accuracy, Consistency, Conciseness, Completeness, Relevancy, Trustworthiness, Understandability, Timeliness, Representational Conciseness, Interoperability, Interpretability, Versatility*
*specific for Linked Data
Metrics
Linked Data Quality Metrics
Dimension | Metric | Description | QN/QL*
Completeness | Schema completeness | No. of classes and properties / total no. of classes and properties | QN
Interlinking | Detection of good quality interlinks | (i) detection of (a) interlinking degree, (b) clustering coefficient, (c) centrality, (d) open sameAs chains and (e) description richness through sameAs by using network measures, (ii) via crowdsourcing | QN
Timeliness | Freshness of datasets | max{0, 1 - currency / volatility} | QN
Trustworthiness | Trustworthiness of information provider | indicating the level of trust for the publisher on a scale of 1-9 | QL
*QN - Quantitative Metric; QL - Qualitative Metric
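The two formula-based metrics in the table can be computed directly. The following is a minimal sketch, not taken from any of the surveyed tools: the function names are hypothetical, the numerator of schema completeness is assumed to count the classes and properties actually represented in the dataset, and currency and volatility are assumed to be expressed in the same time unit.

```python
def schema_completeness(represented: int, total: int) -> float:
    """Share of schema classes and properties that are represented in the data.

    Assumption: `represented` counts classes and properties that occur in the
    dataset, `total` counts all classes and properties defined in the schema.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return represented / total


def freshness(currency: float, volatility: float) -> float:
    """Freshness as max{0, 1 - currency / volatility}.

    Assumption: both arguments use the same time unit (for example days).
    """
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    return max(0.0, 1.0 - currency / volatility)
```

With these definitions, a dataset whose currency exceeds its volatility gets a freshness of 0, and a dataset that represents 30 of 40 schema elements gets a schema completeness of 0.75.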
Tools
Attribute | Trellis | TrustBOT | tSPARQL | WIQA | ProLOD | Flemming
Availability | - | - | ✔ | - | - | ✔
Licensing | Open-source | - | GPL v3 | Apache v2 | - | -
Automation | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated
Collaboration | Yes | No | No | No | No | No
Customizability | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Scalability | - | No | Yes | - | - | No
Usability | 2 | 4 | 4 | 2 | 2 | 3
Maintenance | 2005 | 2003 | 2012 | 2006 | 2010 | 2010
Tools (continued)
Attribute | LinkQA | Sieve | RDFUnit | DaCura | TripleCheckMate | LiQuate
Availability | ✔ | ✔ | ✔ | - | ✔ | ✔
Licensing | Open-source | Apache | Apache | - | Apache | -
Automation | Automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated
Collaboration | No | No | No | Yes | Yes | No
Customizability | No | ✔ | ✔ | ✔ | ✔ | No
Scalability | Yes | Yes | Yes | No | Yes | No
Usability | 2 | 4 | 3 | 1 | 5 | 1
Maintenance | 2011 | 2012 | 2014 | 2013 | 2013 | 2013
Problems in Current Approaches
Not catered to the use case
Results difficult to interpret
Do not report the root cause of the quality issues
Require a considerable amount of configuration
Do not allow the user to choose the input dataset
User-Driven Quality Assessment
Research Questions
RQ2 How can we assess the quality of Linked Data using a user-driven methodology?
RQ2.1 How feasible is it to employ Linked Data experts to assess the quality issues of LD?
RQ2.2 How feasible is it to use a combination of user-driven and semi-automated methodology to assess the quality of LD?
Methodology
Workflow (figure):
Resource selection: [Per Class], [Manual] or [Random]
Evaluation mode selection: [Manual], [Semi-automatic] or [Automatic]
Pre-selection of triples
Resource evaluation (triples)
List of invalid facts
Data quality improvement: patch ontology
Highlighted modes: Manual, Semi-automated
Manual: Phase I
Linked Data Quality Problem Taxonomy
Dimension | Category
Accuracy | Triple incorrectly extracted; Datatype problems; Implicit relationships between attributes
Relevancy | Irrelevant information extracted
Representational consistency | Representation of number values
Interlinking | External links; Interlinks with other datasets
Manual: Phase II
Invited Linked Data experts
Triple-based evaluation
Contest-based, 3 weeks
Phase II: TripleCheckMate
https://github.com/AKSW/TripleCheckMate
Choose a resource
Identify erroneous triples
Map to the quality problem taxonomy
Manual: Results
Total no. of users: 58
Total no. of distinct resources evaluated: 521
Total no. of distinct incorrect triples: 2928
% of triples affected: 11.93%
Resource-based inter-rater agreement (Cohen's kappa): 0.34
Total no. of triples evaluated for correctness: 700
% of triples evaluated incorrectly: 19%
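The results report inter-rater agreement as Cohen's kappa but do not show how it was computed. As a rough illustration, the textbook two-rater formula, kappa = (p_o - p_e) / (1 - p_e), can be sketched as follows; the function name and the label values are hypothetical, and how agreement was aggregated over resources in the study is not specified here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    p_o is the observed agreement, p_e the agreement expected by chance
    from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout: perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only, not data from the study.
kappa = cohens_kappa(["ok", "ok", "bad", "bad"], ["ok", "bad", "bad", "bad"])
```

A value near 0 means agreement is no better than chance, which helps put a value such as 0.34 into perspective.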
Semi-automated: Step 1
Generate schema axioms for properties via DL-Learner*
Functionality
Inverse functionality
Asymmetry
Irreflexivity
Example:
Domain: Formula One Racer
Range: Grand Prix
Only 1 first win of each Formula One Racer (Functional)
*Lehmann, J. (2009). DL-Learner: learning concepts in description logics. Journal of Machine Learning Research, 10:2639–2642.
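Once a property has been declared functional, violations can be found by looking for subjects with more than one distinct value. The sketch below shows only this check on plain (subject, predicate, object) tuples; it does not use DL-Learner, and the identifiers `ex:firstWin`, `ex:RacerA` and so on are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def functional_violations(triples, prop):
    """Return subjects that have more than one distinct object for `prop`."""
    objects = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == prop:
            objects[subject].add(obj)
    return {s: sorted(objs) for s, objs in objects.items() if len(objs) > 1}


# Hypothetical data: a racer can have only one first win.
triples = [
    ("ex:RacerA", "ex:firstWin", "ex:GrandPrix1"),
    ("ex:RacerA", "ex:firstWin", "ex:GrandPrix2"),
    ("ex:RacerB", "ex:firstWin", "ex:GrandPrix3"),
]
violations = functional_violations(triples, "ex:firstWin")
```

Here only `ex:RacerA` is reported, because it carries two different first wins. A reported violation still needs manual verification, since either triple, or the axiom itself, may be wrong.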
Semi-automated: Step 2
Manual evaluation of generated axioms
100 random axioms per type
Only those axioms where at least one violation can be found
Also taking target context into account
Semi-automated: Results
Chart: results per axiom type (inverse functionality, functionality, asymmetry, irreflexivity)
Summary
Quality analysis of over 500 resources
12% errors detected
Linked Data experts performed the quality analysis but evaluated some correct triples as errors
75% functionality violations of property characteristics detected, but manual verification was required
Crowdsourcing Linked Data Quality Assessment
Research Questions
RQ2.3 Is it possible to detect quality issues in LD datasets via crowdsourcing mechanisms?
RQ2.4 What type of crowd is most suitable for each type of quality issue?
RQ2.5 Which types of assessment errors are made by lay users and experts?
Concepts
AMT - Amazon Mechanical Turk
HITs - Human Intelligence Tasks (microtasks)
MTurk Workers - monetary reward for each HIT
Find-Fix-Verify phases
- Crowdsourcing Linked Data quality assessment. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann. ISWC 2013.
- Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia Study. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Fabian Flöck and Jens Lehmann. SWJ (Submitted) 2015.
Methodology
Workflow (figure), in two stages:
Find stage (LD experts in contest): resource selection ([Manual], [Any] or [Per Class]), evaluation of the resource's triples, selection of quality issues for [Incorrect triples], resulting in a list of incorrect triples classified by quality issue
Verify stage (workers in paid microtasks): HIT generation, accept HIT, assess triple according to the given quality issue ([Correct], [Incorrect], [Data doesn't make sense] or [I don't know]), submit HIT; repeated while there are [More triples to assess]
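In the Verify stage several workers may assess the same triple, so their answers have to be combined into one verdict. The slides do not state which aggregation rule was used; majority voting is one common choice and is assumed in the sketch below. The function name is hypothetical, and the answer labels are the four options shown in the workflow.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most frequent answer, or None for a tie or empty input.

    Assumption: simple majority voting over the workers' answers for one triple.
    """
    top_two = Counter(answers).most_common(2)
    if not top_two:
        return None
    if len(top_two) == 2 and top_two[0][1] == top_two[1][1]:
        # No unique winner; leave the triple undecided.
        return None
    return top_two[0][0]


# Illustrative answers for a single triple.
verdict = majority_answer(["Incorrect", "Incorrect", "Correct"])
```

Returning None for ties keeps undecided triples visible, so they can be sent out for further assessment instead of being resolved arbitrarily.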
Quality Issue Types
Incorrect/incomplete object value
Incorrect datatypes/literals, e.g. dbpedia:Oreye dbpedia-owl:postalCode "4360"@en
Incorrect interlink
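The postal code example carries a language tag even though its value is purely numeric. One plausible way to pre-select such candidates automatically is a simple heuristic over the literal's serialised form. The sketch below is an assumption for illustration, not the method used in the study: the function name and the rule (flag language-tagged literals that contain only digits) are hypothetical.

```python
import re

# Matches a language-tagged literal such as "4360"@en or "Oreye"@en-GB.
LANG_LITERAL = re.compile(
    r'^"(?P<value>.*)"@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)$'
)


def is_suspicious_literal(literal: str) -> bool:
    """Flag language-tagged literals whose value consists only of digits."""
    match = LANG_LITERAL.match(literal)
    if match is None:
        # Not a language-tagged literal (e.g. typed or plain); not flagged.
        return False
    return match.group("value").strip().isdigit()
```

For example, `is_suspicious_literal('"4360"@en')` flags the literal, while `'"Oreye"@en'` and a typed literal such as `'"4360"^^xsd:integer'` pass. Flagged literals are only candidates and would still need human verification.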
Results: Experts vs. Crowd
 | LD Experts | MTurk Workers
Participants | 58 | 80
Duration | 3 weeks | 4 days
(row label not recoverable) | 1512 | 1073
(row label not recoverable) | 0.38 | 0.73
Summary: Experts vs. Crowd
Issue type | LD experts | MTurk Workers
Object values | Fair: required validation | Fair: simple comparisons
Datatypes & literals | Fair: required validation | Poor: inexperienced with RDF
Interlinks | Poor: high effort required | Good: high inter-rater agreement
Use Case Leveraging Data Quality
Research Questions
RQ2.6 How can we semi-automatically assess the quality of datasets and provide meaningful results to the user?
RQ3 How can we exploit Linked Data for building a use case and ensure good data quality?
Motivation: User Scenario
Healthcare policy maker
interested in: Which diseases? Deaths per disease? Where to allocate funds?
looks at: Databases, e.g. WHO, ClinicalTrials.gov
analysis: Data in disparate datasets and in different formats; data quality problems; subset of data; error-prone analysis, etc.
translates to: Inadequate allocation of funds
Use Case - Societal Progress Indicators
Evaluate the impact of Research & Development (R&D), that is, educational performance, on a country's performance in:
Economy
Healthcare
Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon. ISWC 2013.
Datasets & Variables
4 datasets: World Bank, LinkedCT, Scimago, USPTO
17 variables, for example:
GDP (economic)
Birth rate, death rate (healthcare)
h-index (educational)
Methodology
Datasets: World Bank, Scimago, LinkedCT, USPTO
extract: RSPARQL*
perform: Quality Assessment
*van Hage, W. R., Kauppinen, T., Graeler, B., Davis, C., Hoeksema, J., Ruttenberg, A., and Bahls, D. (2014). SPARQL Package, v1.6. R Foundation for Statistical Computing.
* https://github.com/amrapalijz/R-LOD-SEM/blob/master/RSPARQL
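The extraction step pulls indicator values out of the datasets through SPARQL. The deck does this with the R SPARQL package; the Python sketch below only illustrates the general shape of the step: build a SELECT query, then flatten a response in the standard SPARQL 1.1 JSON results layout into rows. The ex: vocabulary, the URIs and the sample response are all made up for illustration, and no endpoint is contacted.

```python
import json

def build_indicator_query(indicator_uri):
    """Return a SELECT query for (country, year, value) rows of one
    indicator. The ex: vocabulary is hypothetical."""
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"
        "SELECT ?country ?year ?value WHERE {\n"
        f"  ?obs ex:indicator <{indicator_uri}> ;\n"
        "       ex:country ?country ;\n"
        "       ex:year ?year ;\n"
        "       ex:value ?value .\n"
        "}"
    )

def rows_from_sparql_json(payload):
    """Flatten a SPARQL 1.1 JSON results document into a list of dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # An unbound variable is simply absent from the binding.
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows

# Hand-written response in the SPARQL JSON results layout (not real data).
sample = json.dumps({
    "head": {"vars": ["country", "year", "value"]},
    "results": {"bindings": [
        {"country": {"type": "uri", "value": "http://example.org/country/DE"},
         "year": {"type": "literal", "value": "2010"},
         "value": {"type": "literal", "value": "1.5"}},
        {"country": {"type": "uri", "value": "http://example.org/country/FR"},
         "year": {"type": "literal", "value": "2010"}},
    ]},
})

rows = rows_from_sparql_json(sample)
print(rows)
```

Keeping unbound variables as None, rather than dropping the row, makes missing values visible to a later quality assessment step.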
Semi-automated Quality Assessment
R2RLint tool*
7 dimensions: Availability, Completeness, Interlinking, Syntactic validity, Consistency, Interpretability, Representational conciseness
13 quality metrics
Use case specific
*https://github.com/AKSW/R2RLint
Quality Assessment Results
[Chart: total no. detected, by issue type: Interlinking completeness, Population incompleteness, Inconsistency]
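To make one of these issue types concrete, here is a minimal Python sketch of a population completeness check. It assumes that completeness means the fraction of expected (country, year) pairs for which a variable has a value. That definition, the function name and the toy observations are assumptions for illustration, not the R2RLint metric itself.

```python
def population_completeness(observed, countries, years):
    """Fraction of expected (country, year) pairs that have a value.

    observed: iterable of (country, year) pairs found in the dataset.
    """
    expected = {(c, y) for c in countries for y in years}
    if not expected:
        return 1.0
    # Ignore pairs that fall outside the expected population.
    present = {pair for pair in observed if pair in expected}
    return len(present) / len(expected)

# Toy observations (invented): one of four expected pairs is missing.
toy_observed = [("DE", 2010), ("DE", 2011), ("FR", 2010)]
print(population_completeness(toy_observed, ["DE", "FR"], [2010, 2011]))
```

A variable with a low score under a check of this kind would be a candidate for exclusion before statistical modelling.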
11/17 Variables
Latent variable: observed variables
Educational performance:
  Number of articles (h) that have at least h citations (h-index)
  Total no. of documents published per country per year
  High-technology export (HTE)
Healthcare performance:
  Adolescent fertility rate (AFR)
  Birth rate (BR)
  Death rate (DR)
  Health expenditure public (HEP)
  Immunization DPT (IDPT)
  Immunization measles (IM)
  Mortality rate, infant (MR)
Economic performance:
  GDP per capita (current US$)
Methodology
Datasets: World Bank, Scimago
apply: Structural Equation Modeling
Step I: EFA*-CFA*-EFA-CFA
Step II: Apply SEM to hypothesis variables
*EFA - Exploratory Factor Analysis
*CFA - Confirmatory Factor Analysis
Theoretical Framework
[Diagram: Educational performance, Healthcare performance and Economic performance, linked pairwise by correlation]
Structural Equation Modeling
https://github.com/amrapalijz/R-LOD-SEM/blob/master/sem_script.R
# Acquire data
data <- with(data, data.frame(hindex, noOfDocs, IDPT, IM, MR, AFR, BR, DR, GDP, HEP, HET))
# Insert covariance matrix
var <- var(semdata)
cov <- cov(datanew)
cor <- cor(datanew)
# Specify the model
semmodel <- specifyModel()
# Latent variables
HealthCare -> IDPT, efa14, NA
HealthCare -> IM, efa11, NA
HealthCare -> MR, efa12, NA
HealthCare -> AFR, efa13, NA
...
# Running SEM model
sem <- sem::sem(semmodel, cor, N = 781)
summary(sem, fit.indices = c("GFI", "AGFI", "RMSEA", "NFI", "NNFI", "CFI", "RNI", "IFI", "SRMR", "AIC", "AICc"))
modIndices(sem)
qgraph(sem, cut = 0.8, gray = TRUE)
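The model is fitted to a correlation matrix of the observed variables (the `cor` object passed to `sem::sem`). As a rough illustration of what that matrix contains, the sketch below computes pairwise Pearson correlations in plain Python instead of R. The column names reuse abbreviations from the deck, but the values are invented toy numbers, not data from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Nested dict m[a][b] holding the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Toy columns (invented values, one entry per country-year observation).
toy = {
    "BR": [1.0, 2.0, 3.0, 4.0],
    "DR": [2.0, 4.0, 6.0, 8.0],
    "GDP": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(toy)
print(matrix["BR"]["DR"], matrix["BR"]["GDP"])
```

Because correlations are scale-free, fitting on this matrix rather than on raw covariances keeps indicators with very different units, such as GDP and immunization rates, comparable.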
Conclusions
Performing robust statistical analysis on Linked Data can lead to important and meaningful insights on publicly available data for societal progress measurement.
Importance of performing use-case driven data quality assessment of datasets before their utilisation.
Contributions
Comprehensive survey
  18 data quality dimensions with definitions; 69 metrics
  12 tools compared according to 8 attributes
Development and evaluation of data quality assessment methodologies
  User-driven - manual and semi-automated
  Crowdsourcing - experts vs. workers
  Semi-automated - application to a use case
Consumption of Linked Data leveraging data quality
Future Work
Standardized quality assessment methodology for Linked Data
Quality assessment tools for Linked Data
Detection as well as improvement of quality issues before utilization in Linked Data use cases
Conference Publications
Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon. ISWC 2013.
Crowdsourcing Linked Data quality assessment. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann. ISWC 2013.
User-driven Quality Evaluation of DBpedia. Amrapali Zaveri, Dimitris Kontokostas, Mohamed A. Sherif, Lorenz Bühmann, Mohamed Morsey, Sören Auer and Jens Lehmann. I-SEMANTICS 2013.
Journal Publications
Quality assessment methodologies for Linked Data: A Survey. Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer. Semantic Web Journal 2015.
Using Linked Data to build an Observatory of Societal Progress Indicators. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Patrick Westphal, Jose Roberto Nascimento Junior, Luciano de Andrade, Cinzia Daraio, Jens Lehmann. Journal of Web Semantics 2014 (under review).
Publishing and Interlinking the USPTO Patent Data. Amrapali Zaveri, Mofeed M. Hassan, Tariq Yousef, Sören Auer, Jens Lehmann. Semantic Web Journal 2014 (under review).
Publications
No. of publications: 34 (Google Scholar), 16 (DBLP)
Citations: 251
h-index: 9; i10-index: 8 (Google Scholar)
Thank you for your attention!
Questions?
zaveri@informatik.uni-leipzig.de

Contenu connexe

Tendances

ODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodKarry Lu
 
Lec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustLec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustMenchita Falcutila Dumlao
 
BIOLINK 2008: Linking database submissions to primary citations with PubMe...
BIOLINK 2008:    Linking database submissions to primary citations with PubMe...BIOLINK 2008:    Linking database submissions to primary citations with PubMe...
BIOLINK 2008: Linking database submissions to primary citations with PubMe...Heather Piwowar
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerOpenSource Connections
 
Paradigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the tableParadigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the tableParadigm4
 
Recommender System in light of Big Data
Recommender System in light of Big DataRecommender System in light of Big Data
Recommender System in light of Big DataKhadija Atiya
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflowsSSSW
 
A BRIEF SURVEY OF QUESTION ANSWERING SYSTEMS
A BRIEF SURVEY OF QUESTION ANSWERING SYSTEMSA BRIEF SURVEY OF QUESTION ANSWERING SYSTEMS
A BRIEF SURVEY OF QUESTION ANSWERING SYSTEMSijaia
 
BioIT 2017 - Ontoforce and Amgen Gene Knowledge Discovery
BioIT 2017 - Ontoforce and Amgen Gene Knowledge DiscoveryBioIT 2017 - Ontoforce and Amgen Gene Knowledge Discovery
BioIT 2017 - Ontoforce and Amgen Gene Knowledge DiscoveryWolfgang G. Hoeck
 
Semantics in the Enterprise: Roles & Capabilities
Semantics in the Enterprise: Roles & CapabilitiesSemantics in the Enterprise: Roles & Capabilities
Semantics in the Enterprise: Roles & CapabilitiesChristine Connors
 
Invited Lecture on Interactive Information Retrieval
Invited Lecture on Interactive Information RetrievalInvited Lecture on Interactive Information Retrieval
Invited Lecture on Interactive Information RetrievalDavidMaxwell77
 
Semantic Search at Yahoo
Semantic Search at YahooSemantic Search at Yahoo
Semantic Search at YahooPeter Mika
 
Capturing the Ineffable: Collecting, Analysing, and Automating Web Document ...
Capturing the Ineffable: Collecting, Analysing, and Automating Web Document ...Capturing the Ineffable: Collecting, Analysing, and Automating Web Document ...
Capturing the Ineffable: Collecting, Analysing, and Automating Web Document ...Davide Ceolin
 
Practical Approaches to Sharing Information
Practical Approaches to Sharing InformationPractical Approaches to Sharing Information
Practical Approaches to Sharing InformationChristine Connors
 
140127 Performance Metrics WG
140127 Performance Metrics WG140127 Performance Metrics WG
140127 Performance Metrics WGGenomeInABottle
 
XAPI and Machine Learning for Patient / Learner
XAPI and Machine Learning for Patient / LearnerXAPI and Machine Learning for Patient / Learner
XAPI and Machine Learning for Patient / LearnerJessie Chuang
 
an empirical performance evaluation of relational keyword search techniques
an empirical performance evaluation of relational keyword search techniquesan empirical performance evaluation of relational keyword search techniques
an empirical performance evaluation of relational keyword search techniquesswathi78
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Rich Heimann
 

Tendances (20)

ODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For Good
 
Lec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustLec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrust
 
BIOLINK 2008: Linking database submissions to primary citations with PubMe...
BIOLINK 2008:    Linking database submissions to primary citations with PubMe...BIOLINK 2008:    Linking database submissions to primary citations with PubMe...
BIOLINK 2008: Linking database submissions to primary citations with PubMe...
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
Paradigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the tableParadigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the table
 
Trust Management: A Tutorial
Trust Management: A TutorialTrust Management: A Tutorial
Trust Management: A Tutorial
 
Recommender System in light of Big Data
Recommender System in light of Big DataRecommender System in light of Big Data
Recommender System in light of Big Data
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflows
 
A BRIEF SURVEY OF QUESTION ANSWERING SYSTEMS
A BRIEF SURVEY OF QUESTION ANSWERING SYSTEMSA BRIEF SURVEY OF QUESTION ANSWERING SYSTEMS
A BRIEF SURVEY OF QUESTION ANSWERING SYSTEMS
 
BioIT 2017 - Ontoforce and Amgen Gene Knowledge Discovery
BioIT 2017 - Ontoforce and Amgen Gene Knowledge DiscoveryBioIT 2017 - Ontoforce and Amgen Gene Knowledge Discovery
BioIT 2017 - Ontoforce and Amgen Gene Knowledge Discovery
 
Semantics in the Enterprise: Roles & Capabilities
Semantics in the Enterprise: Roles & CapabilitiesSemantics in the Enterprise: Roles & Capabilities
Semantics in the Enterprise: Roles & Capabilities
 
Invited Lecture on Interactive Information Retrieval
Invited Lecture on Interactive Information RetrievalInvited Lecture on Interactive Information Retrieval
Invited Lecture on Interactive Information Retrieval
 
Semantic Search at Yahoo
Semantic Search at YahooSemantic Search at Yahoo
Semantic Search at Yahoo
 
Big Data for Library Services (2017)
Big Data for Library Services (2017)Big Data for Library Services (2017)
Big Data for Library Services (2017)
 
Capturing the Ineffable: Collecting, Analysing, and Automating Web Document ...
Capturing the Ineffable: Collecting, Analysing, and Automating Web Document ...Capturing the Ineffable: Collecting, Analysing, and Automating Web Document ...
Capturing the Ineffable: Collecting, Analysing, and Automating Web Document ...
 
Practical Approaches to Sharing Information
Practical Approaches to Sharing InformationPractical Approaches to Sharing Information
Practical Approaches to Sharing Information
 
140127 Performance Metrics WG
140127 Performance Metrics WG140127 Performance Metrics WG
140127 Performance Metrics WG
 
XAPI and Machine Learning for Patient / Learner
XAPI and Machine Learning for Patient / LearnerXAPI and Machine Learning for Patient / Learner
XAPI and Machine Learning for Patient / Learner
 
an empirical performance evaluation of relational keyword search techniques
an empirical performance evaluation of relational keyword search techniquesan empirical performance evaluation of relational keyword search techniques
an empirical performance evaluation of relational keyword search techniques
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?
 

Similaire à Amrapali Zaveri Defense

FAME.Q – A Formal approach to Master Quality in Enterprise Linked Data
FAME.Q – A Formal approach to Master Quality in Enterprise Linked DataFAME.Q – A Formal approach to Master Quality in Enterprise Linked Data
FAME.Q – A Formal approach to Master Quality in Enterprise Linked DataLinked Enterprise Date Services
 
LFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
LFS302_Real-World Evidence Platform to Enable Therapeutic InnovationLFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
LFS302_Real-World Evidence Platform to Enable Therapeutic InnovationAmazon Web Services
 
Data Collection Tool Used For Information About Individuals
Data Collection Tool Used For Information About IndividualsData Collection Tool Used For Information About Individuals
Data Collection Tool Used For Information About IndividualsChristy Hunt
 
planning tools and techniqes by book of stephen robonsPresentation of mana...
planning tools and techniqes    by book of stephen robonsPresentation of mana...planning tools and techniqes    by book of stephen robonsPresentation of mana...
planning tools and techniqes by book of stephen robonsPresentation of mana...Neha Raja
 
17568 hbr sas report_webview
17568 hbr sas report_webview17568 hbr sas report_webview
17568 hbr sas report_webviewR Sekar Ramajeyam
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingDenodo
 
Assessment of Constraints to Data Use
Assessment of Constraints to Data UseAssessment of Constraints to Data Use
Assessment of Constraints to Data UseMEASURE Evaluation
 
TUW-ASE-SUmmer 2014: Evaluating and Utilizing Data Concerns for DaaS
TUW-ASE-SUmmer 2014: Evaluating and Utilizing Data Concerns for DaaSTUW-ASE-SUmmer 2014: Evaluating and Utilizing Data Concerns for DaaS
TUW-ASE-SUmmer 2014: Evaluating and Utilizing Data Concerns for DaaSHong-Linh Truong
 
Use of monitoring data for evidence-based decision making: A factor analysis
Use of monitoring data for evidence-based decision making: A factor analysisUse of monitoring data for evidence-based decision making: A factor analysis
Use of monitoring data for evidence-based decision making: A factor analysisIRC
 
RWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseRWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseDatabricks
 
Make Your Reports Over the Counter
Make Your Reports Over the CounterMake Your Reports Over the Counter
Make Your Reports Over the CounterTIBCO Jaspersoft
 
Operational Risk Management Data Validation Architecture
Operational Risk Management Data Validation ArchitectureOperational Risk Management Data Validation Architecture
Operational Risk Management Data Validation ArchitectureAlan McSweeney
 
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R gerogepatton
 
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND RANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND Rijaia
 
How to unlock new data-driven potential for your organization
How to unlock new data-driven potential for your organizationHow to unlock new data-driven potential for your organization
How to unlock new data-driven potential for your organizationMichal Hodinka
 

Similaire à Amrapali Zaveri Defense (20)

Data Driven Philantropy
Data Driven PhilantropyData Driven Philantropy
Data Driven Philantropy
 
FAME.Q – A Formal approach to Master Quality in Enterprise Linked Data
FAME.Q – A Formal approach to Master Quality in Enterprise Linked DataFAME.Q – A Formal approach to Master Quality in Enterprise Linked Data
FAME.Q – A Formal approach to Master Quality in Enterprise Linked Data
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
LFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
LFS302_Real-World Evidence Platform to Enable Therapeutic InnovationLFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
LFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
 
Data Collection Tool Used For Information About Individuals
Data Collection Tool Used For Information About IndividualsData Collection Tool Used For Information About Individuals
Data Collection Tool Used For Information About Individuals
 
The evolution of decision making
The evolution of decision makingThe evolution of decision making
The evolution of decision making
 
planning tools and techniqes by book of stephen robonsPresentation of mana...
planning tools and techniqes    by book of stephen robonsPresentation of mana...planning tools and techniqes    by book of stephen robonsPresentation of mana...
planning tools and techniqes by book of stephen robonsPresentation of mana...
 
17568 hbr sas report_webview
17568 hbr sas report_webview17568 hbr sas report_webview
17568 hbr sas report_webview
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-Making
 
Assessment of Constraints to Data Use
Assessment of Constraints to Data UseAssessment of Constraints to Data Use
Assessment of Constraints to Data Use
 
TUW-ASE-SUmmer 2014: Evaluating and Utilizing Data Concerns for DaaS
TUW-ASE-SUmmer 2014: Evaluating and Utilizing Data Concerns for DaaSTUW-ASE-SUmmer 2014: Evaluating and Utilizing Data Concerns for DaaS
TUW-ASE-SUmmer 2014: Evaluating and Utilizing Data Concerns for DaaS
 
Use of monitoring data for evidence-based decision making: A factor analysis
Use of monitoring data for evidence-based decision making: A factor analysisUse of monitoring data for evidence-based decision making: A factor analysis
Use of monitoring data for evidence-based decision making: A factor analysis
 
RWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseRWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use Case
 
Make Your Reports Over the Counter
Make Your Reports Over the CounterMake Your Reports Over the Counter
Make Your Reports Over the Counter
 
Operational Risk Management Data Validation Architecture
Operational Risk Management Data Validation ArchitectureOperational Risk Management Data Validation Architecture
Operational Risk Management Data Validation Architecture
 
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
 
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND RANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
 
How to unlock new data-driven potential for your organization
How to unlock new data-driven potential for your organizationHow to unlock new data-driven potential for your organization
How to unlock new data-driven potential for your organization
 
do_dq.pdf
do_dq.pdfdo_dq.pdf
do_dq.pdf
 
Arjun Thiagarajan_06_01
Arjun Thiagarajan_06_01Arjun Thiagarajan_06_01
Arjun Thiagarajan_06_01
 

Plus de Amrapali Zaveri, PhD

CrowdED: Guideline for optimal Crowdsourcing Experimental Design
CrowdED: Guideline for optimal Crowdsourcing Experimental DesignCrowdED: Guideline for optimal Crowdsourcing Experimental Design
CrowdED: Guideline for optimal Crowdsourcing Experimental DesignAmrapali Zaveri, PhD
 
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality AssessmentMetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality AssessmentAmrapali Zaveri, PhD
 
smartAPI: Towards a more intelligent network of Web APIs
smartAPI: Towards a more intelligent network of Web APIssmartAPI: Towards a more intelligent network of Web APIs
smartAPI: Towards a more intelligent network of Web APIsAmrapali Zaveri, PhD
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyAmrapali Zaveri, PhD
 
Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of CognitionTowards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of CognitionAmrapali Zaveri, PhD
 
User-driven Quality Evaluation of DBpedia
User-driven Quality Evaluation of DBpediaUser-driven Quality Evaluation of DBpedia
User-driven Quality Evaluation of DBpediaAmrapali Zaveri, PhD
 

Plus de Amrapali Zaveri, PhD (13)

ESOF Panel 2018
ESOF Panel 2018ESOF Panel 2018
ESOF Panel 2018
 
CrowdED: Guideline for optimal Crowdsourcing Experimental Design
CrowdED: Guideline for optimal Crowdsourcing Experimental DesignCrowdED: Guideline for optimal Crowdsourcing Experimental Design
CrowdED: Guideline for optimal Crowdsourcing Experimental Design
 
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality AssessmentMetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
 
smartAPI: Towards a more intelligent network of Web APIs
smartAPI: Towards a more intelligent network of Web APIssmartAPI: Towards a more intelligent network of Web APIs
smartAPI: Towards a more intelligent network of Web APIs
 
Introduction to Bio SPARQL
Introduction to Bio SPARQL Introduction to Bio SPARQL
Introduction to Bio SPARQL
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A Survey
 
LDQ 2014 DQ Methodology
LDQ 2014 DQ MethodologyLDQ 2014 DQ Methodology
LDQ 2014 DQ Methodology
 
LOD-SEM
LOD-SEMLOD-SEM
LOD-SEM
 
TripleCheckMate
TripleCheckMateTripleCheckMate
TripleCheckMate
 
Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of CognitionTowards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
 
User-driven Quality Evaluation of DBpedia
User-driven Quality Evaluation of DBpediaUser-driven Quality Evaluation of DBpedia
User-driven Quality Evaluation of DBpedia
 
Converting GHO to RDF
Converting GHO to RDFConverting GHO to RDF
Converting GHO to RDF
 
ReDD-Observatory
ReDD-ObservatoryReDD-Observatory
ReDD-Observatory
 

Amrapali Zaveri Defense

  • 1. 17th April, 2015, Leipzig, Germany. Linked Data Quality Assessment and its Application to Societal Progress Measurement. Amrapali Zaveri, Faculty of Mathematics and Computer Science. Supervisors: Prof. Dr. Ing. habil. Klaus-Peter Fähnrich, University of Leipzig; Dr. Jens Lehmann, University of Leipzig; Prof. Dr. Sören Auer, University of Bonn
  • 2-7. Outline: Motivation: Linked Data Quality; Linked Data Quality Assessment Methodologies; Use Case Leveraging Data Quality; Contributions; Future Work
  • 8. Motivation: Linked Data Quality
  • 9-14. Data on the Web: Accessible; Re-usable; Understandable; Discoverable
  • 15-19. Linked Data Principles: (1) Use URIs as names for things. (2) Use HTTP URIs, so that people can look up those names. (3) When someone looks up a URI, provide useful information, using the standards (RDF, RDFS, OWL, SPARQL). (4) Include links to other URIs, so that they can discover more things.
  • 20-24. Linked Data (figures): What about the quality?
  • 25. Data Quality: Data quality is defined as “fitness for use”*. * Juran, J. (1974). The Quality Control Handbook. McGraw-Hill, New York.
  • 26-30. Consequences of Poor Quality: Propagation of errors in integrated datasets; Major hindrance in acquiring reliable results; Loss of important information; Loss in productivity and additional costs*#. * http://www.gartner.com/newsroom/id/501733 # http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information
  • 31-32. Data Quality Assessment: How can one assess the quality of data and make this information explicit? Which criteria should be assessed? Which measures should be used? Which methodologies/tools can be utilized?
  • 33-34. Main Research Question: How can we exploit Linked Data for a particular use case and ensure good data quality?
  • 35-36. Overview (diagram): Systematic literature review; Linked Data Quality Assessment Methodologies; Evaluation; User-driven; Crowdsourcing; Semi-automated; Use case leveraging quality
  • 37-40. Systematic Literature Review: Current State: Lack of unified descriptions for data quality dimensions and metrics for Linked Data; Lack of use-case-driven data quality assessment methodologies for Linked Data; Lack of quality assessment of datasets before utilisation in particular use cases
  • 41-42. Research Questions: RQ1 What are the existing approaches to assess the quality of Linked Data employing a conceptual framework integrating prior approaches? RQ1.1 What are the data quality problems that each approach assesses? RQ1.2 Which are the data quality dimensions and metrics supported by the proposed approaches? RQ1.3 Which tools already exist to assess the quality of Linked Data?
  • 43-47. Qualitative Analysis: 30 core articles; 18 dimensions (definitions); 69 metrics; 12 tools compared using 8 attributes. Source: Quality assessment methodologies for Linked Data: A Survey. Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer. Semantic Web Journal 2015.
  • 48-49. Dimensions (figure): 18 dimensions arranged in four groups (Accessibility, Intrinsic, Contextual, Representation): Availability, Licensing*, Interlinking*, Security*, Performance*, Syntactic Validity, Semantic Accuracy, Consistency, Conciseness, Completeness, Relevancy, Trustworthiness, Understandability, Timeliness, Rep.-Conciseness, Interoperability, Interpretability, Versatility*. Legend: a connection between Dim1 and Dim2 means the two dimensions are related; * = specific for Linked Data
  • 50. Metrics: Linked Data Quality Metrics (QN = quantitative metric; QL = qualitative metric)
      Dimension | Metric | Description | QN/QL
      Completeness | Schema completeness | No. of classes and properties / total no. of classes and properties | QN
      Interlinking | Detection of good quality interlinks | (i) detection of (a) interlinking degree, (b) clustering coefficient, (c) centrality, (d) open sameAs chains and (e) description richness through sameAs by using network measures, (ii) via crowdsourcing | QN
      Timeliness | Freshness of datasets | max{0, 1 − currency / volatility} | QN
      Trustworthiness | Trustworthiness of information provider | indicating the level of trust for the publisher on a scale of 1−9 | QL
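The two quantitative formulas in the metrics table can be sketched in a few lines of Python. This is a minimal illustration, not the survey's implementation: the function names are hypothetical, and it assumes that currency (age of the data) and volatility (how long the data stays valid) are expressed in the same time unit.

```python
def schema_completeness(represented: int, total: int) -> float:
    """Ratio of classes and properties present to the total expected (table row 1)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return represented / total


def freshness(currency: float, volatility: float) -> float:
    """Freshness of a dataset: max{0, 1 - currency / volatility} (table row 3).

    Assumption: both arguments use the same time unit, e.g. days.
    """
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    return max(0.0, 1.0 - currency / volatility)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(schema_completeness(45, 60))
    print(freshness(currency=30, volatility=120))
    print(freshness(currency=200, volatility=120))
```

The clamp at zero means a dataset older than its validity period scores 0 rather than a negative value.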
  • 51. Tools:
      Attribute | Trellis | TrustBOT | tSPARQL | WIQA | ProLOD | Flemming
      Availability | - | - | ✔ | - | - | ✔
      Licensing | Open-source | - | GPL v3 | Apache v2 | - | -
      Automation | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated
      Collaboration | Yes | No | No | No | No | No
      Customizability | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
      Scalability | - | No | Yes | - | - | No
      Usability | 2 | 4 | 4 | 2 | 2 | 3
      Maintenance | 2005 | 2003 | 2012 | 2006 | 2010 | 2010
  • 52. Tools:
      Attribute | LinkQA | Sieve | RDFUnit | DaCura | TripleCheckMate | LiQuate
      Availability | ✔ | ✔ | ✔ | - | ✔ | ✔
      Licensing | Open-source | Apache | Apache | - | Apache | -
      Automation | Automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated
      Collaboration | No | No | No | Yes | Yes | No
      Customizability | No | ✔ | ✔ | ✔ | ✔ | No
      Scalability | Yes | Yes | Yes | No | Yes | No
      Usability | 2 | 4 | 3 | 1 | 5 | 1
      Maintenance | 2011 | 2012 | 2014 | 2013 | 2013 | 2013
  • 53-58. Problems in Current Approaches: Not catered to the use case; Results difficult to interpret; Do not report the root cause of the quality issues; Require considerable amount of configuration; Do not allow user to choose input dataset
  • 59. Overview (diagram): Systematic literature review; Linked Data Quality Assessment Methodologies; Evaluation; User-driven; Crowdsourcing; Semi-automated; Use case leveraging quality
  • 60-61. User-Driven Quality Assessment: Research Questions: RQ2 How can we assess the quality of Linked Data using a user-driven methodology? RQ2.1 How feasible is it to employ Linked Data experts to assess the quality issues of LD? RQ2.2 How feasible is it to use a combination of user-driven and semi-automated methodology to assess the quality of LD?
  • 62-63. Methodology (flowchart): Resource Selection [Per Class] [Manual] [Random]; Evaluation mode selection [Manual] [Semi-automatic] [Automatic]; Resource Evaluation (Triples; Pre-selection of triples); List of invalid facts; Data Quality Improvement (Patch Ontology). Modes highlighted: Manual, Semi-automated
  • 64. Manual: Phase I: Linked Data Quality Problem Taxonomy
      Dimension | Category
      Accuracy | Triple incorrectly extracted; Datatype problems; Implicit relationships between attributes
      Relevancy | Irrelevant information extracted
      Representational consistency | Representation of number values
      Interlinking | External links; Interlinks with other datasets
  • 65-66. Manual: Phase II: Invited Linked Data experts; Triple-based evaluation; Contest-based, 3 weeks
  • 67-71. Phase II: TripleCheckMate (https://github.com/AKSW/TripleCheckMate), screenshots: (1) Choose a resource; (2) Identify erroneous triples; (3) Map to the quality problem taxonomy
  • 72. Manual: Results
      Total no. of users | 58
      Total no. of distinct resources evaluated | 521
      Total no. of distinct incorrect triples | 2928
      % of triples affected | 11.93%
      Resource-based inter-rater agreement (Cohen’s kappa) | 0.34
      Total no. of triples evaluated for correctness | 700
      % of triples evaluated incorrectly | 19%
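The inter-rater agreement reported above is Cohen's kappa. A minimal sketch of the standard two-rater formula follows; the slides do not say how the resource-based value of 0.34 was aggregated across the 58 users, so the function name, labels and sample ratings are purely illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical judgements on ten resources ("ok" = no problem found).
    a = ["ok"] * 6 + ["err"] * 4
    b = ["ok"] * 5 + ["err", "ok"] + ["err"] * 3
    print(round(cohens_kappa(a, b), 3))
```

Because kappa discounts agreement expected by chance, it is lower than the raw share of items on which the two raters agree.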
  • 73-82. Semi-automated: Step 1: Generate schema axioms for properties via DL-Learner*: Functionality; Inverse functionality; Asymmetry; Irreflexivity. Example: Domain: Formula One Racer; Range: Grand Prix; only 1 first win of each Formula One Racer (Functional). * Lehmann, J. (2009). DL-Learner: learning concepts in description logics. Journal of Machine Learning Research, 10:2639–2642.
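Once such axioms exist, checking them against the data is mechanical. The sketch below is not DL-Learner and not the thesis code; it only illustrates what a violation of each of the four property characteristics looks like on plain (subject, property, object) tuples. The `ex:` identifiers and function names are hypothetical.

```python
from collections import defaultdict


def functionality_violations(triples, prop):
    """Subjects with more than one distinct object for a functional property."""
    objects = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            objects[s].add(o)
    return {s: objs for s, objs in objects.items() if len(objs) > 1}


def inverse_functionality_violations(triples, prop):
    """Objects reached from more than one distinct subject."""
    subjects = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects[o].add(s)
    return {o: subs for o, subs in subjects.items() if len(subs) > 1}


def asymmetry_violations(triples, prop):
    """Pairs (s, o) for which the reverse statement (o, s) is also asserted."""
    pairs = {(s, o) for s, p, o in triples if p == prop}
    return {(s, o) for (s, o) in pairs if (o, s) in pairs}


def irreflexivity_violations(triples, prop):
    """Resources related to themselves."""
    return {s for s, p, o in triples if p == prop and s == o}


if __name__ == "__main__":
    # Hypothetical data: a racer can have only one first win (functional).
    data = [
        ("ex:RacerA", "ex:firstWin", "ex:GrandPrix1"),
        ("ex:RacerA", "ex:firstWin", "ex:GrandPrix2"),
        ("ex:RacerB", "ex:firstWin", "ex:GrandPrix3"),
    ]
    print(functionality_violations(data, "ex:firstWin"))
```

In this toy dataset only `ex:RacerA` is flagged, since it has two different first wins; a flagged resource is a candidate error that still needs human confirmation.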
  • 87. Semi-automated, Step 2 (User-Driven Quality Assessment): Manual evaluation of generated axioms: 100 random axioms per type; only those axioms where at least one violation can be found; the target context is also taken into account.
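The selection rule for the manual evaluation (up to 100 random axioms per type, restricted to axioms with at least one violation) could be sketched as follows. The record layout with `type` and `violations` keys, the function name, and the fixed seed are assumptions made for this illustration.

```python
import random


def sample_axioms_for_review(axioms, per_type=100, seed=0):
    """Pick up to `per_type` random axioms of each type for manual review.

    Only axioms with at least one recorded violation are eligible.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    candidates = {}
    for axiom in axioms:
        if axiom["violations"] >= 1:
            candidates.setdefault(axiom["type"], []).append(axiom)
    return {
        axiom_type: rng.sample(items, min(per_type, len(items)))
        for axiom_type, items in candidates.items()
    }


# Hypothetical axioms: 300 functional ones (a third without violations)
# and 40 irreflexive ones that all have one violation.
axioms = [
    {"type": "functional", "property": f"ex:p{i}", "violations": i % 3}
    for i in range(300)
] + [
    {"type": "irreflexive", "property": f"ex:q{i}", "violations": 1}
    for i in range(40)
]

review_sample = sample_axioms_for_review(axioms)
print({axiom_type: len(items) for axiom_type, items in review_sample.items()})
```

Types with fewer than 100 eligible axioms are returned in full, which is one possible reading of the rule on the slide.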
  • 88. Semi-automated, Results (User-Driven Quality Assessment): chart by axiom type: inverse functionality, functionality, asymmetry, irreflexivity.
  • 93. Summary (User-Driven Quality Assessment): Quality analysis of over 500 resources; 12% errors detected; Linked Data experts performed the quality analysis but evaluated some correct triples as errors; 75% functionality violations of property characteristics detected, but these required manual verification.
  • 94. Overview: Systematic literature review; Linked Data quality assessment methodologies and their evaluation (user-driven, crowdsourcing, semi-automated); use case leveraging quality.
  • 96. Research Questions (Crowdsourcing Linked Data Quality Assessment): RQ2.3 Is it possible to detect quality issues in LD datasets via crowdsourcing mechanisms? RQ2.4 What type of crowd is most suitable for each type of quality issue? RQ2.5 Which types of assessment errors are made by lay users and experts?
  • 101. Concepts (Crowdsourcing Linked Data Quality Assessment): AMT: Amazon Mechanical Turk; HITs: Human Intelligence Tasks (microtasks); MTurk workers: monetary reward for each HIT; Find-Fix-Verify phases. References: Crowdsourcing Linked Data quality assessment. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann. ISWC 2013. Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia Study. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Fabian Flöck and Jens Lehmann. Semantic Web Journal, 2015 (submitted).
  • 102. Methodology (Crowdsourcing Linked Data Quality Assessment), flowchart. Find stage, LD experts in a contest: resource selection (manual, any, or per class); evaluation of the resource's triples; if there are incorrect triples, selection of quality issues; result: a list of incorrect triples classified by quality issue. Verify stage, workers in paid microtasks: HIT generation; accept HIT; assess the triple according to the given quality issue (Correct, Incorrect, Data doesn't make sense, I don't know); submit HIT; repeat while there are more triples to assess.
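When several workers assess the same triple in the Verify stage, their answers have to be combined into one verdict. The slide does not say how; the sketch below assumes simple majority voting over the four answer options, with ties left unresolved. The function name and the sample answers are hypothetical.

```python
from collections import Counter

# Answer options offered to workers in the Verify stage (from the slide).
ANSWERS = ("Correct", "Incorrect", "Data doesn't make sense", "I don't know")


def majority_vote(answers):
    """Return the most frequent answer, or None for an empty list or a tie."""
    if not answers:
        return None
    ranked = Counter(answers).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent answers
    return ranked[0][0]


# Hypothetical answers from five workers for one triple.
verdict = majority_vote(["Incorrect", "Incorrect", "Correct", "I don't know", "Incorrect"])
print(verdict)
```

Returning `None` on a tie keeps undecided triples visible so that they can be sent to further workers or back to experts.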
  • 106. Quality Issue Types (Crowdsourcing Linked Data Quality Assessment): incorrect/incomplete object value; incorrect datatypes/literals; incorrect interlink. Example triple: dbpedia:Oreye dbpedia-owl:postalCode "4360"@en
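The example triple carries an English language tag on a purely numeric postal code. Assuming that this tag is the issue the example is meant to show, a rough heuristic for flagging such literals could look like the sketch below. The regular expression handles only simple one-line literals without escaped quotes, and the function name is hypothetical.

```python
import re

# Matches "lexical form" optionally followed by @lang or ^^datatype.
LITERAL = re.compile(
    r'^"(?P<lex>.*)"(?:@(?P<lang>[A-Za-z-]+)|\^\^(?P<dtype>\S+))?$'
)


def has_suspicious_language_tag(literal):
    """Flag literals whose lexical form is all digits but which carry a language tag."""
    match = LITERAL.match(literal)
    if match is None:
        return False
    is_numeric = re.fullmatch(r"[0-9]+", match.group("lex")) is not None
    return match.group("lang") is not None and is_numeric


for literal in ('"4360"@en', '"Oreye"@en', '"4360"^^xsd:integer'):
    print(literal, has_suspicious_language_tag(literal))
```

A heuristic like this only produces candidates; whether a flagged literal is really wrong still depends on the property and its intended range.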
  • 107. Results: Experts vs. Crowd (Crowdsourcing Linked Data Quality Assessment). LD experts vs. MTurk workers: 58 vs. 80; 3 weeks vs. 4 days; 1512 vs. 1073; 0.38 vs. 0.73.
  • 108. Summary: Experts vs. Crowd (Crowdsourcing Linked Data Quality Assessment). Object values: LD experts fair (required validation); MTurk workers fair (simple comparisons). Datatypes and literals: LD experts fair (required validation); MTurk workers poor (inexperienced with RDF). Interlinks: LD experts poor (high effort required); MTurk workers good (high inter-rater agreement).
  • 109. Overview: Systematic literature review; Linked Data quality assessment methodologies and their evaluation (user-driven, crowdsourcing, semi-automated); use case leveraging quality.
  • 111. Research Questions (Use Case Leveraging Data Quality): RQ2.6 How can we semi-automatically assess the quality of datasets and provide meaningful results to the user? RQ3 How can we exploit Linked Data for building a use case and ensure good data quality?
  • 117. Motivation: User Scenario (Use Case Leveraging Data Quality). A healthcare policy maker is interested in: which diseases? deaths per disease? where to allocate funds? The policy maker looks at databases, e.g. WHO, ClinicalTrials.gov. The analysis runs into: data in disparate datasets and in different formats; data quality problems; only a subset of the data; error-prone analysis, etc. This translates to inadequate allocation of funds.
  • 118. Use Case: Societal Progress Indicators (Use Case Leveraging Data Quality). Evaluate the impact of Research & Development (R&D), i.e. educational performance, on a country's economic and healthcare performance. Reference: Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon. ISWC 2013.
  • 121. Datasets & Variables (Use Case Leveraging Data Quality): 4 datasets: World Bank, LinkedCT, Scimago, USPTO. 17 variables, for example: GDP (economic); birth rate, death rate (healthcare); h-index (educational).
  • 125. Methodology (Use Case Leveraging Data Quality): extract data from World Bank, Scimago, LinkedCT and USPTO with RSPARQL*, then perform quality assessment. *van Hage, W. R., Kauppinen, T., Graeler, B., Davis, C., Hoeksema, J., Ruttenberg, A., and Bahls, D. (2014). SPARQL Package, v1.6. R Foundation for Statistical Computing. *https://github.com/amrapalijz/R-LOD-SEM/blob/master/RSPARQL
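The extraction step in the talk uses the R SPARQL package. As a language-neutral illustration of what such a call does under the SPARQL protocol, the sketch below builds (but does not send) an HTTP GET request for a query. The endpoint URL, the `ex:` vocabulary, and the query itself are placeholders invented for this example, not the ones used in the thesis.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and vocabulary, for illustration only.
ENDPOINT = "https://example.org/sparql"
QUERY = """
PREFIX ex: <https://example.org/vocab/>
SELECT ?country ?year ?value
WHERE {
  ?obs ex:indicator ex:BirthRate ;
       ex:country ?country ;
       ex:year ?year ;
       ex:value ?value .
}
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL protocol GET request that asks for CSV results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "text/csv"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url[:60])
```

Sending the request with `urllib.request.urlopen(request)` against a real endpoint would return the result rows, which could then be loaded into a data frame for the statistical analysis.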
  • 127. Semi-automated Quality Assessment: R2RLint tool*; 7 dimensions; 13 quality metrics; use case specific. Dimensions: availability, completeness, interlinking, syntactic validity, consistency, interpretability, representational conciseness. *https://github.com/AKSW/R2RLint
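To make one of these dimensions concrete, completeness can be expressed as the share of expected observations that are actually present. The sketch below is a hypothetical metric written for illustration; it is not one of the R2RLint metrics, and the country codes and years are made up.

```python
from itertools import product


def population_completeness(observed, expected):
    """Share of expected items present in the data (1.0 means fully complete)."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(observed)) / len(expected)


# Hypothetical expectation: one observation per (country, year) pair.
countries = ["DE", "FR", "IT"]
years = [2010, 2011]
expected_pairs = list(product(countries, years))

# Two pairs are missing; the extra pair ("ES", 2010) is outside the expectation.
observed_pairs = [("DE", 2010), ("DE", 2011), ("FR", 2010), ("IT", 2011), ("ES", 2010)]

score = population_completeness(observed_pairs, expected_pairs)
print(round(score, 3))
```

Defining the expected set is the use-case-specific part: it depends on which countries, years, and variables the analysis actually needs.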
  • 128. Quality Assessment Results (Use Case Leveraging Data Quality): chart of the total no. detected for interlinking completeness, population incompleteness, and inconsistency.
  • 129. 11/17 Variables (Use Case Leveraging Data Quality), latent variables and their observed variables. Educational performance: number of articles (h) that have at least h citations (h-index); total no. of documents published per country per year; high-technology export (HTE). Healthcare performance: adolescent fertility rate (AFR); birth rate (BR); death rate (DR); health expenditure, public (HEP); immunization, DPT (IDPT); immunization, measles (IM); mortality rate, infant (MR). Economic performance: GDP per capita (current US$).
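The h-index definition given above (the largest h such that h articles have at least h citations each) translates directly into a few lines of code. This is a minimal sketch with made-up citation counts; the function name is an assumption.

```python
def h_index(citations):
    """Largest h such that at least h articles have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for five articles.
example_h = h_index([10, 8, 5, 4, 3])
print(example_h)
```

In the example, four articles have at least four citations each, but there are not five articles with at least five, so the index is 4.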
  • 131. Methodology (Use Case Leveraging Data Quality): apply Structural Equation Modeling to the World Bank and Scimago data. Step I: EFA*-CFA*-EFA-CFA. Step II: apply SEM to hypothesis variables. *EFA: Exploratory Factor Analysis. *CFA: Confirmatory Factor Analysis.
  • 132. Theoretical Framework (Use Case Leveraging Data Quality): educational performance, healthcare performance and economic performance, with a correlation between each pair.
  • 133. Structural Equation Modeling (Use Case Leveraging Data Quality). R script: https://github.com/amrapalijz/R-LOD-SEM/blob/master/sem_script.R
      # Insert covariance matrix
      var <- var(semdata)
      cov <- cov(datanew)
      cor <- cor(datanew)
      # Acquire data
      data <- with(data, data.frame(hindex, noOfDocs, IDPT, IM, MR, AFR, BR, DR, GDP, HEP, HET))
      semmodel <- specifyModel()
      # Latent variables
      HealthCare -> IDPT, efa14, NA;
      HealthCare -> IM, efa11, NA;
      HealthCare -> MR, efa12, NA;
      HealthCare -> AFR, efa13, NA;
      ….
      # Running SEM model
      sem <- sem::sem(semmodel, cor, N = 781)
      summary(sem, fit.indices = c("GFI", "AGFI", "RMSEA", "NFI", "NNFI", "CFI", "RNI", "IFI", "SRMR", "AIC", "AICc"))
      modIndices(sem)
      qgraph(sem, cut = 0.8, gray = TRUE)
  • 135. Theoretical Framework (Use Case Leveraging Data Quality): educational performance, healthcare performance and economic performance, with a correlation between each pair.
  • 137. Conclusions (Use Case Leveraging Data Quality): Performing robust statistical analysis on Linked Data can lead to important and meaningful insights from publicly available data for societal progress measurement. It is important to perform a use-case-driven data quality assessment of datasets before they are used.
  • 139. Contributions: (1) Comprehensive survey: 18 data quality dimensions with definitions and 69 metrics; 12 tools compared according to 8 attributes. (2) Development and evaluation of data quality assessment methodologies: user-driven (manual and semi-automated); crowdsourcing (experts vs. workers); semi-automated (application to a use case). (3) Consumption of Linked Data leveraging data quality.
  • 143. Future Work: Standardized quality assessment methodology for Linked Data; quality assessment tools for Linked Data; detection as well as improvement of quality issues before utilization in Linked Data use cases.
  • 144. Conference Publications: Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon. ISWC 2013. Crowdsourcing Linked Data quality assessment. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann. ISWC 2013. User-driven Quality Evaluation of DBpedia. Amrapali Zaveri, Dimitris Kontokostas, Mohamed A. Sherif, Lorenz Bühmann, Mohamed Morsey, Sören Auer and Jens Lehmann. I-SEMANTICS 2013.
  • 145. Journal Publications: Quality assessment methodologies for Linked Data: A Survey. Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer. Semantic Web Journal 2015. Using Linked Data to build an Observatory of Societal Progress Indicators. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Patrick Westphal, Jose Roberto Nascimento Junior, Luciano de Andrade, Cinzia Daraio and Jens Lehmann. Journal of Web Semantics 2014 (under review). Publishing and Interlinking the USPTO Patent Data. Amrapali Zaveri, Mofeed M. Hassan, Tariq Yousef, Sören Auer and Jens Lehmann. Semantic Web Journal 2014 (under review).
  • 146. Publications: No. of publications: 34 (Google Scholar), 16 (DBLP). Citations: 251. h-index: 9; i10-index: 8 (Google Scholar).
  • 147. Thank you for your attention! Questions? zaveri@informatik.uni-leipzig.de