76
sub-class relationships
OWL hasValue restrictions on data properties
OWL universal restrictions on object properties
se...
77
transformations based on formal logics
OWL axioms extracted out of XML Schemas
explicitly
implicitly
formally underpin ...
78
ISWC 2012
ICITST 2011
OCAS (ISWC 2011)
RQ2
79
first step of the approach
executed generic test cases created out of the XML Schema meta-model
transformed XML Schemas of 6 X...
80
research question 5
81
What is the role reasoning plays in practical data validation and
for which constraint types reasoning may be performed...
82
What is the role reasoning plays in practical data validation?
research question 5-1
RQ5
83
reasoning may resolve violations
Book ⊑ ∀ author.Person
Book(Huckleberry-Finn)
author(Huckleberry-Finn, Mark-Twain)
→ P...
84
reasoning may cause violations
Publication ⊑ ∃ publisher.Publisher
Book(Huckleberry-Finn)
Book ⊑ Publication
RQ5
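The example on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: facts are plain tuples, the existential constraint is read under the closed-world assumption, and the helper names (infer_types, existential_violations) are hypothetical, not part of the thesis's SPIN-based implementation. Without reasoning, Huckleberry-Finn is only known to be a Book, so the constraint on Publication is never checked. Once Book ⊑ Publication is applied, the missing publisher surfaces as a violation.

```python
# Facts from the slide: (individual, class) pairs and (subject, property, object) triples.
types = {("Huckleberry-Finn", "Book")}
triples = set()                         # the slide states no publisher for the book
subclass_of = {"Book": "Publication"}   # Book ⊑ Publication


def infer_types(types, subclass_of):
    """Materialize class membership along subclass axioms until nothing new is added."""
    inferred = set(types)
    changed = True
    while changed:
        changed = False
        for individual, cls in list(inferred):
            parent = subclass_of.get(cls)
            if parent and (individual, parent) not in inferred:
                inferred.add((individual, parent))
                changed = True
    return inferred


def existential_violations(types, triples, cls, prop, value_cls):
    """Members of `cls` that have no `prop` value typed as `value_cls` (closed world)."""
    violations = set()
    for individual, c in types:
        if c != cls:
            continue
        satisfied = any(
            s == individual and p == prop and (o, value_cls) in types
            for s, p, o in triples
        )
        if not satisfied:
            violations.add(individual)
    return violations


# Publication ⊑ ∃ publisher.Publisher, checked before and after reasoning.
without_reasoning = existential_violations(
    types, triples, "Publication", "publisher", "Publisher")
with_reasoning = existential_violations(
    infer_types(types, subclass_of), triples, "Publication", "publisher", "Publisher")

print(without_reasoning)  # no Publication is known, so nothing is checked
print(with_reasoning)     # the inferred Publication lacks a publisher
```

The design choice here is to materialize inferred types before validation, which mirrors the idea of performing reasoning prior to validation; a real setup would use an RDFS/OWL reasoner and SPARQL instead of tuples.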
85
reasoning solves redundancy
Publication ⊑ ∃ publicationDate . xsd:date
Book ⊑ Publication
Conference-Proceeding ⊑ Publi...
86
For which constraint types reasoning may be performed
prior to validation to enhance data quality?
research question 5-...
87
> 2/5 of constraint types
property domains (R-25):
constraint types with reasoning
∃ author.⊤ ⊑ Publication
author(Alic...
88
< 3/5 of constraint types
literal pattern matching (R-44):
constraint types without reasoning
RQ5
ISBN a rdfs:Datatype ...
89
For which constraint types validation results differ
(1) if the CWA or the OWA and
(2) if the UNA or the nUNA is assume...
90
56.8% of constraint types
minimum qualified cardinality restrictions (R-75):
CWA dependent constraint types
RQ5
Book ⊑ ...
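A minimal sketch of the minimum qualified cardinality restriction (R-75) used as the running example in the deck: every :Book needs at least one :author that is a :Person. The tuple-based data model, the function names, and the individual "Untitled-Book" are assumptions made for illustration only. The check is closed-world: an author that is simply not stated counts as missing, which is why this constraint type depends on the CWA. Under the open-world assumption the same data would not be treated as violating.

```python
def qualified_cardinality(triples, types, subject, prop, value_cls):
    """Count the `prop` values of `subject` that are typed as `value_cls`."""
    return sum(
        1
        for s, p, o in triples
        if s == subject and p == prop and (o, value_cls) in types
    )


def min_qualified_cardinality_violations(triples, types, cls, prop, value_cls, minimum):
    """Members of `cls` with fewer than `minimum` qualified `prop` values (closed world)."""
    return {
        individual
        for individual, c in types
        if c == cls
        and qualified_cardinality(triples, types, individual, prop, value_cls) < minimum
    }


# "Untitled-Book" is a hypothetical individual added to show a violation.
types = {
    ("Huckleberry-Finn", "Book"),
    ("Mark-Twain", "Person"),
    ("Untitled-Book", "Book"),
}
triples = {("Huckleberry-Finn", "author", "Mark-Twain")}

violations = min_qualified_cardinality_violations(
    triples, types, "Book", "author", "Person", minimum=1)
print(violations)
```

The counting helper plays the same role as the qualified cardinality function in the deck's SPARQL/SPIN formulation, with a Python filter standing in for the query.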
91
disjoint classes (R-7):
CWA independent constraint types
RQ5
Book ⊓ JournalArticle ⊑ ⊥
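The disjointness axiom above can be checked with a simple set intersection, sketched here under the assumption that class membership is given as (individual, class) pairs. The function name and the individuals "Some-Paper" and "Odd-Item" are hypothetical. A violation only arises from facts that are actually stated: absent information never triggers it, which is one way to see why this constraint type does not depend on the CWA.

```python
def disjointness_violations(types, class_a, class_b):
    """Individuals asserted to belong to both of two disjoint classes."""
    members_a = {individual for individual, c in types if c == class_a}
    members_b = {individual for individual, c in types if c == class_b}
    return members_a & members_b


# Hypothetical data: "Odd-Item" is typed with both disjoint classes.
types = {
    ("Huckleberry-Finn", "Book"),
    ("Some-Paper", "JournalArticle"),
    ("Odd-Item", "Book"),
    ("Odd-Item", "JournalArticle"),
}

# Book ⊓ JournalArticle ⊑ ⊥
violations = disjointness_violations(types, "Book", "JournalArticle")
print(violations)
```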
92
66.6% of constraint types
functional properties (R-57/65):
UNA dependent constraint types
RQ5
funct(title)
title(The-Ad...
93
literal value comparison (R-43):
UNA independent constraint types
RQ5
birthDate(Albert-Einstein, "1955-04-18")
deathDat...
94
evaluation
95
collected, classified, and implemented 115 constraints
from vocabularies or domain experts
on 3 common vocabularies
wel...
96
classification of constraint types
RDFS/OWL based
constraint language based
SPARQL based
classification of constraints
...
97
RDFS/OWL based
evaluation
classification of constraint types
:Publication rdfs:subClassOf
[ a owl:Restriction ;
owl:onP...
98
constraint language based
evaluation
classification of constraint types
:Publication {
( :isbn xsd:string, :title xsd:s...
99
SPARQL based
evaluation
classification of constraint types
SELECT ?concept
WHERE {
?concept a [ rdfs:subClassOf* skos:C...
100
C (constraints), CV (constraint violations)
values in %
evaluation
finding 1
C CV
SPARQL 63.2 78.2
CL 34.7 21.8
RDFS/O...
101
C (constraints), CV (constraint violations)
values in %
evaluation
finding 2
C CV
SPARQL 63.2 78.2
CL 34.7 21.8
RDFS/O...
102
C (constraints), CV (constraint violations)
values in %
evaluation
finding 3
C CV
Info 42.3 31.3
Warning 18.7 62.7
Err...
103
future work
104
future work: RQ1
publication of RDF vocabularies
DDI Alliance specifications
W3C recommendation for DDI-RDF
DDI-Lifecy...
105
aligning PHDD and CSV on the Web
overlap in the description of tabular data in CSV format
broader scope of PHDD
descri...
106
future work: RQ2
bidirectional transformations from models of any meta-model to OWL
generalize from XSD meta-model bas...
107
future work: validation database and framework
maintain and extend RDF validation database
collect case studies and us...
108
future work: combine framework with SHACL
derive SHACL extensions
define mappings from SHACL to the abstraction layer ...
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)


This thesis introduces a validation framework that makes it possible to consistently execute RDF-based constraint languages on RDF data and to formulate constraints of any type. The framework reduces the representation of constraints to the absolute minimum, is based on formal logics, and consists of a small, lightweight vocabulary. It ensures consistent validation results and enables constraint transformations for each constraint type across RDF-based constraint languages.


  1. 1. KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft www.kit.edu Validation Framework for RDF-based Constraint Languages M.Sc. (TUM) Thomas Hartmann Professor Dr. York Sure-Vetter Professor Dr. Kai Eckert (Stuttgart Media University) Professor Dr. Rudi Studer Professor Dr. Andreas Geyer-Schulz Disputation, 08.07.2016
  2. 2. 2 enthusiasm for SW technologies problem statement
  3. 3. 3 common need for RDF Validation problem statement
  4. 4. 4 common needs of data practitioners 2013: W3C RDF Validation Workshop 2014: 2 international working groups on RDF validation constraint languages SPARQL Query Language for RDF SPARQL Inferencing Notation (SPIN) Web Ontology Language (OWL) Shape Expressions (ShEx) Resource Shapes (ReSh) Description Set Profiles (DSP) Shapes Constraint Language (SHACL) none of these languages meets all requirements RDF validation as research field problem statement W3C RDF Data Shapes Working Group DCMI RDF Application Profiles Task Group
  5. 5. 5 Resource Description Framework (RDF) 5problem statement
  6. 6. 6 constraints of running example 6problem statement
  7. 7. 7 constraints of running example 7problem statement
  8. 8. 8 constraints of running example 8problem statement
  9. 9. 9 constraints of running example 9problem statement
  10. 10. 10 constraints of running example 10problem statement
  11. 11. 11 provide a basis for continued research RDF validation development of constraint languages further development of constraint languages based on commonly approved requirements incorporate the findings into the working groups thesis objectives thesis objectives
  12. 12. www.kit.edu 12 5 research questions
  13. 13. 13 Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies? research question 1 RQ1 IASSIST Quarterly, 38(4) & 39(1), 7-16 IASSIST Quarterly, 38(4) & 39(1), 17-24 IASSIST Quarterly, 38(4) & 39(1), 25-37 IASSIST Quarterly, 38(4) & 39(1), 38-46 LDOW (WWW 2013) SemStats (ISWC 2013) DC 2012 ESWC 2011 (Poster) DDI Moving Forward Project RDF Vocabularies Working Group
  14. 14. 14 How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed? research question 2 RQ2 IJMSO, 8(3) ISWC 2012 ICITST 2011 OCAS (ISWC 2011)
  15. 15. www.kit.edu 15 research question 3
  16. 16. 16 http://purl.org/net/rdf-validation DC 2014RQ3
  17. 17. 17RQ3
  18. 18. 18RQ3
  19. 19. 19RQ3
  20. 20. 20RQ3
  21. 21. 21 Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data? research question 3 RQ3
  22. 22. 22 a constraint is instantiated from a constraint type each constraint type corresponds to a requirement 81 constraint types types of constraints on RDF data RQ3
  23. 23. www.kit.edu 23 research question 4
  24. 24. 24 minimum qualified cardinality restrictions (R-75) RQ4
      ShEx:
        :Book { :author @:Person{1, } }
      ReSh:
        :Book a rs:ResourceShape ;
            rs:property [
                rs:propertyDefinition :author ;
                rs:valueShape :Person ;
                rs:occurs rs:One-or-many ;
            ] .
      SHACL:
        :BookShape a sh:Shape ;
            sh:scopeClass :Book ;
            sh:property [
                sh:predicate :author ;
                sh:valueShape :PersonShape ;
                sh:minCount 1 ;
            ] .
        :PersonShape a sh:Shape ;
            sh:scopeClass :Person .
  25. 25. 25 minimum qualified cardinality restrictions (R-75) RQ4
      SPARQL and SPIN:
        CONSTRUCT { [ a spin:ConstraintViolation ... . ] }
        WHERE {
            ?subject
                a ?C1 ;
                ?predicate ?object .
            BIND ( qualifiedCardinality( ?subject, ?predicate, ?C2 ) AS ?c ) .
            BIND ( STRDT ( STR ( ?c ), xsd:nonNegativeInteger ) AS ?cardinality ) .
            FILTER ( ?cardinality < ?minimumCardinality ) .
            FILTER ( ?minimumCardinality = 1 ) .
            FILTER ( ?C1 = :Book ) .
            FILTER ( ?C2 = :Person ) .
            FILTER ( ?predicate = :author ) .
        }

        SELECT ( COUNT ( ?arg1 ) AS ?c )
        WHERE {
            ?arg1 ?arg2 ?object .
            ?object a ?arg3 .
        }
  26. 26. 26 minimum qualified cardinality restrictions (R-75) RQ4
      OWL:
        :Book rdfs:subClassOf
            [ a owl:Restriction ;
              owl:minQualifiedCardinality 1 ;
              owl:onProperty :author ;
              owl:onClass :Person ] .
      DSP:
        [ dsp:resourceClass :Book ;
          dsp:statementTemplate [
              dsp:minOccur 1 ;
              dsp:property :author ;
              dsp:nonLiteralConstraint [ dsp:valueClass :Person ] ] ] .
  27. 27. 27 high-level constraint languages either lack an implementation or are based on different implementations How to consistently validate RDF data against constraints of any constraint type expressed in any RDF-based constraint language? research question 4-1 RQ4
  28. 28. 28 validation environment RQ4
      constraint language implementation (SPIN mapping):
        :MinimumQualifiedCardinalityRestrictions
            a spin:ConstructTemplate ;
            spin:body [ ... CONSTRUCT { ... } WHERE { ... } ... ] .
  29. 29. 29 validation process RQ4
  30. 30. 30RQ4 validation results 30
  31. 31. 31 validation results RQ4 31
  32. 32. 32 validation results RQ4 32
  33. 33. 33 validation results RQ4 33
  34. 34. 34 validation results RQ4 34
  35. 35. 35 validation results RQ4 35
  36. 36. 36 validation results RQ4 36
  37. 37. 37 full implementations for all OWL 2 and DSP language constructs all constraint types expressible in OWL 2 and DSP major constraint types representable by ShEx and ReSh RDF serialization for DSP validation environment http://purl.org/net/rdfval-demo RQ4
  38. 38. 38 http://purl.org/net/rdfval-demo RQ4
  39. 39. 39 constraints and constraint language constructs must be representable in RDF constraint languages and supported constraint types must be expressible in SPARQL limitations RQ4
  40. 40. 40 How to represent constraints of any constraint type and how to reduce the representation of constraints of any constraint type to the absolute minimum? research question 4-2 RQ4
      constraint language    % (count of the 81 constraint types)
      DSP                    17.3 (14)
      ReSh                   25.9 (21)
      ShEx                   29.6 (24)
      SHACL                  51.9 (42)
      OWL 2                  67.9 (55)
      SPARQL                 100.0 (81)
  41. 41. 41 intermediate abstraction layer based on formal logics makes it possible to express any constraint type enables straightforward mappings from high-level constraint languages reduces the representation of constraints to the absolute minimum validation framework for RDF-based constraint languages RQ4
  42. 42. 42 conceptual model DC 2015 RQ4 74% 26%
  43. 43. 43RQ4 43 simple constraints
  44. 44. 44 different validation results RQ4
  45. 45. 45 different validation results RQ4 45
  46. 46. 46 different validation results RQ4 46
  47. 47. 47 different validation results RQ4 47
  48. 48. 48 different validation results RQ4 48
  49. 49. 49 different validation results RQ4 49
  50. 50. 50 How to ensure for any constraint type that RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages? framework is solely based on the abstract definitions of constraint types just 1 SPIN mapping for each constraint type research question 4-3 RQ4
  51. 51. 51RQ4 semantically equivalent constraints 51
  52. 52. 52 How to ensure for any constraint type that semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another? gc = mα(cα); cβ = m'β(gc) RQ4 research question 4-4
  53. 53. 53 What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality? research question 5 RQ5 SEMANTiCS 2015
  54. 54. 54 collected, classified, and implemented 115 constraints from vocabularies or domain experts on 3 common vocabularies well-established (QB, SKOS) under development (DDI-RDF) evaluation evaluation IJSC, 10(2) ICSC 2016 33 SPARQL endpoints
  55. 55. 55 future work: validation database and framework maintain and extend RDF validation database collect case studies and use cases extract requirements publish constraint types keep framework in sync evaluate solutions future work http://purl.org/net/rdf-validation
  56. 56. 56 future work: combine framework with SHACL derive SHACL extensions define mappings from SHACL to the abstraction layer and back maintain consistency of implementations of constraint types future work W3C RDF Data Shapes Working Group DCMI RDF Application Profiles Task Group
  57. 57. 57 summary of main contributions development of 3 RDF vocabularies direct validation of XML using common RDF validation tools publication of 81 constraint types validation framework for RDF-based constraint languages role of reasoning for RDF validation THANK YOU!
  58. 58. 58 acknowledgements, publications, research data 30 publications 6 journal articles, 9 conference articles, 3 workshop articles, 2 specifications, 10 technical reports first author of all (except 1) journal articles, conference articles, workshop articles research data and results KIT research data repository: http://dx.doi.org/10.5445/BWDD/11 GitHub repository: https://github.com/github-thomas-hartmann/phd-thesis 4 international working groups DCMI RDF Application Profiles Task Group part of the editorial board RDF Vocabularies Working Group editor for DDI-RDF and PHDD W3C RDF Data Shapes Working Group DDI Moving Forward Project THANK YOU!
  59. 59. www.kit.edu 59 appendix
  60. 60. 60 publications: journal articles 1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Directing the Development of Constraint Languages by Checking Constraints on RDF Data. International Journal of Semantic Computing, 10(02), 1–25. http://www.worldscientific.com/worldscinet/ijsc 2. Bosch, Thomas & Mathiak, B. (2015). Use Cases Related to an Ontology of the Data Documentation Initiative. IASSIST Quarterly, 38(4) & 39(1), 25–37. http://iassistdata.org/iq/issue/38/4 3. Bosch, Thomas, Olsson, O., Gregory, A., & Wackerow, J. (2015). DDI-RDF Discovery - A Discovery Model for Microdata. IASSIST Quarterly, 38(4) & 39(1), 17–24. http://iassistdata.org/iq/issue/38/4 4. Bosch, Thomas & Zapilko, B. (2015). Semantic Web Applications for the Social Sciences. IASSIST Quarterly, 38(4) & 39(1), 7–16. http://iassistdata.org/iq/issue/38/4 5. Schaible, J., Zapilko, B., Bosch, Thomas, & Zenk-Möltgen, W. (2015). Linking Study Descriptions to the Linked Open Data Cloud. IASSIST Quarterly, 38(4) & 39(1), 38–46. http://iassistdata.org/iq/issue/38/4 6. Bosch, Thomas & Mathiak, B. (2013). How to Accelerate the Process of Designing Domain Ontologies based on XML Schemas. International Journal of Metadata, Semantics and Ontologies - Special Issue on Metadata, Semantics and Ontologies for Web Intelligence, 8(3), 254 – 266. http://www.inderscience.com/info/inarticle.php?artid=57760 Please note that in 2015, my last name changed from Bosch to Hartmann.
  61. 61. 61 publications: articles in conference proceedings 1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. In Proceedings of the 10th International Conference on Semantic Computing (ICSC 2016) Laguna Hills, California, USA: IEEE. http://www.ieee-icsc.com/ 2. Bosch, Thomas & Eckert, K. (2015). Guidance, Please! Towards a Framework for RDF-based Constraint Languages. In Proceedings of the 15th DCMI International Conference on Dublin Core and Metadata Applications (DC 2015) São Paulo, Brazil. http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386/368 3. Bosch, Thomas, Acar, E., Nolle, A., & Eckert, K. (2015). The Role of Reasoning for RDF Validation. In Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015) (pp. 33–40). Vienna, Austria: ACM. http://doi.acm.org/10.1145/2814864.2814867 4. Bosch, Thomas & Eckert, K. (2014). Requirements on RDF Constraint Formulation and Validation. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257 5. Bosch, Thomas & Eckert, K. (2014). Towards Description Set Profiles for RDF using SPARQL as Intermediate Language. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/270 Please note that in 2015, my last name changed from Bosch to Hartmann.
  62. 62. 62 publications: articles in conference proceedings 6. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2012). Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences. In Proceedings of the 12th DCMI International Conference on Dublin Core and Metadata Applications (DC 2012) Kuching, Sarawak, Malaysia. http://dcpapers.dublincore.org/pubs/article/view/3654 7. Bosch, Thomas (2012). Reusing XML Schemas’ Information as a Foundation for Designing Domain Ontologies. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. Parreira, J. Hendler, G. Schreiber, A. Bernstein, & E. Blomqvist (Eds.), The Semantic Web - ISWC 2012, volume 7650 of Lecture Notes in Computer Science (pp. 437–440). Springer Berlin Heidelberg. http://dx.doi.org/10.1007/978-3-642-35173-0_34 8. Bosch, Thomas & Mathiak, B. (2012). XSLT Transformation Generating OWL Ontologies Automatically Based on XML Schemas. In Proceedings of the 6th International Conference for Internet Technology and Secured Transactions (ICITST 2011), IEEE Xplore Digital Library (pp. 660–667). Abu Dhabi, United Arab Emirates. http://edas.info/web/icitst2011/program.html 9. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2011). Designing an Ontology for the Data Documentation Initiative. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster-Session Heraklion, Greece. http://www.eswc2011.org/content/accepted-posters.html Please note that in 2015, my last name changed from Bosch to Hartmann.
  63. 63. 63 publications: articles in workshop proceedings Please note that in 2015, my last name changed from Bosch to Hartmann. 1. Bosch, Thomas, Cyganiak, R., Gregory, A., & Wackerow, J. (2013). DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In Proceedings of the 6th Workshop on Linked Data on the Web (LDOW 2013), 22nd International World Wide Web Conference (WWW 2013), volume 996 Rio de Janeiro, Brazil. http://ceur-ws.org/Vol-996/ 2. Bosch, Thomas, Zapilko, B., Wackerow, J., & Gregory, A. (2013). Towards the Discovery of Person-Level Data - Reuse of Vocabularies and Related Use Cases. In Proceedings of the 1st International Workshop on Semantic Statistics (SemStats 2013), 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia. http://semstats.github.io/2013/proceedings 3. Bosch, Thomas & Mathiak, B. (2011). Generic Multilevel Approach Designing Domain Ontologies Based on XML Schemas. In Proceedings of the 1st Workshop Ontologies Come of Age in the Semantic Web (OCAS 2011), 10th International Semantic Web Conference (ISWC 2011) (pp. 1–12). Bonn, Germany. http://ceur-ws.org/Vol-809/
  64. 64. 64 publications: specifications Please note that in 2015, my last name changed from Bosch to Hartmann. 1. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2016). DDI-RDF Discovery Vocabulary: A Vocabulary for Publishing Metadata about Data Sets (Research and Survey Data) into the Web of Linked Data. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/discovery 2. Wackerow, J., Hoyle, L., & Bosch, Thomas (2016). Physical Data Description. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/phdd.html
  65. 65. 65 publications: technical reports Please note that in 2015, my last name changed from Bosch to Hartmann. 1. Hartmann, Thomas (2016). Validation Framework for RDF-based Constraint Languages - PhD Thesis Appendix. Karlsruhe Institute of Technology (KIT), Karlsruhe. http://dx.doi.org/10.5445/IR/1000054062 2. Vompras, J., Gregory, A., Bosch, Thomas, & Wackerow, J. (2015). Scenarios for the DDI-RDF Discovery Vocabulary. DDI Working Paper Series. http://dx.doi.org/10.3886/DDISemanticWeb02 3. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/Requirements 4. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on the Current State: Use Cases and Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable 5. Bosch, Thomas, Nolle, A., Acar, E., & Eckert, K. (2015). RDF Validation Requirements - Evaluation and Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933. http://arxiv.org/abs/1501.03933
  66. 66. 66 publications: technical reports Please note that in 2015, my last name changed from Bosch to Hartmann. 6. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Constraints to Validate RDF Data Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04479. http://arxiv.org/abs/1504.04479 7. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Evaluating the Quality of RDF Data Sets on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04478. http://arxiv.org/abs/1504.04478 8. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2014). Designing an Ontology for the Data Documentation Initiative. Computing Research Repository (CoRR), abs/1402.3470. http://arxiv.org/abs/1402.3470 9. Bosch, Thomas & Mathiak, B. (2013). Evaluation of a Generic Approach for Designing Domain Ontologies Based on XML Schemas. Gesis Technical Report 08, Gesis - Leibniz Institute for the Social Sciences, Mannheim, Germany. http://www.gesis.org/publikationen/archiv/gesis-technical-reports/ 10. Block, W., Bosch, Thomas, Fitzpatrick, B., Gillman, D., Greenfield, J., Gregory, A., Hebing, M., Hoyle, L., Humphrey, C., Johnson, J., Linnerud, J., Mathiak, B., McEachern, S., Radler, B., Risnes, Ø., Smith, D., Thomas, W., Wackerow, J., Wegener, D., & Zenk-Möltgen, W. (2012). Developing a Model-Driven DDI Specification. DDI Working Paper Series
  67. 67. 67 research questions 1. Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies? 2. How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed? 3. Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data? 4. How to ensure for any constraint type that (1) RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages and (2) semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another? 5. What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality? appendix
  68. 68. 68 summary of contributions 1. Development of three RDF vocabularies (1) to represent all types of research data and related metadata in RDF and (2) to validate RDF data against constraints extractable from these vocabularies 2. Direct validation of XML data using common RDF validation tools against semantically rich OWL axioms extracted from XML Schemas properly describing certain domains 3. Publication of 81 types of constraints that must be expressible by constraint languages to meet all jointly and extensively identified requirements to formulate constraints and validate RDF data against constraints 4.1 Consistent validation across RDF-based constraint languages 4.2 Minimal representation of constraints of any type 4.3 For any constraint type, RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages 4.4 For any constraint type, semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another 5. Delineation of the role reasoning plays in practical data validation and investigation, for each constraint type, of (1) whether reasoning may be performed prior to validation to enhance data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on different underlying semantics 6. Evaluation of the usability of constraint types for assessing RDF data quality appendix
  69. 69. 69 summary of limitations 1. XML Schemas must adequately represent particular domains in a syntactically and semantically correct way 2. Constraints of supported constraint types and constraint language constructs must be representable in RDF 3. Constraint languages and supported constraint types must be expressible in SPARQL 4. The generality of the findings of the large-scale evaluation has to be proved for all vocabularies appendix
  70. 70. www.kit.edu 70 research question 1
  71. 71. 71 Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies? research question 1 RQ1 IASSIST Quarterly, 38(4) & 39(1), 7-16 IASSIST Quarterly, 38(4) & 39(1), 17-24 IASSIST Quarterly, 38(4) & 39(1), 25-37 IASSIST Quarterly, 38(4) & 39(1), 38-46 LDOW (WWW 2013) SemStats (ISWC 2013) DC 2012 ESWC 2011 (Poster) DDI Moving Forward Project RDF Vocabularies Working Group
  72. 72. 72 development of 3 RDF vocabularies: 1. DDI-RDF Discovery Vocabulary (DDI-RDF) to describe unit-record data 2. Physical Data Description (PHDD) to describe data in tabular format and its physical properties 3. The SKOS Extension for Statistics (XKOS) to describe the structure and textual properties of formal statistical classifications and to describe relations between classifications and concepts and among concepts contribution RQ1
  73. 73. www.kit.edu 73 research question 2
  74. 74. 74 XML, XML Schema (XSD) RDF, Web Ontology Language (OWL) XML Schemas > OWL ontologies time-consuming work designing domain ontologies from scratch by hand reuse information contained in XML Schemas designing OWL domain ontologies RQ2
  75. 75. 75 How to directly validate XML data against semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed? research question 2 RQ2 IJMSO, 8(3) ISWC 2012 ICITST 2011 OCAS (ISWC 2011)
  76. 76. 76 sub-class relationships OWL hasValue restrictions on data properties OWL universal restrictions on object properties semantically rich OWL axioms <library> <book year="February 1890"> <author> <name>Arthur Conan Doyle</name> </author> <title>The Sign of the Four</title> </book> </library> Title ⊑ ∀ value.string Year ⊑ ∀ value.integer RQ2
  77. 77. 77 transformations based on formal logics; OWL axioms extracted out of XML Schemas (explicitly, implicitly); formally underpin transformations to formally define and model semantics in a semantically correct way; complete extraction of XML Schemas' structural information; XML can directly be validated against semantically rich OWL axioms; any XML Schema is convertible to OWL; minimized effort designing OWL domain ontologies contributions IJMSO, 8(3) RQ2
  78. 78. 78 ISWC 2012 ICITST 2011 OCAS (ISWC 2011) RQ2
  79. 79. 79 1. step of approach executed generic test cases created out of the XML Schema meta-model transformed XML Schemas of 6 XML standards 2. step of approach specified SWRL rules for 3 OWL domain ontologies verified hypothesis determined effort for traditional manual approach estimated effort for semi-automatic approach DDI-RDF serves as OWL domain ontology The effort and the time needed to deliver high quality domain ontologies from scratch by reusing information of already existing XML Schemas is much less than creating domain ontologies completely manually and from the ground up. evaluation IJMSO, 8(3) RQ2
  80. 80. www.kit.edu 80 research question 5
  81. 81. 81 What is the role reasoning plays in practical data validation and for which constraint types may reasoning be performed prior to validation to enhance data quality? research question 5 RQ5 SEMANTiCS 2015
  82. 82. 82 What is the role reasoning plays in practical data validation? research question 5-1 RQ5
  83. 83. 83 reasoning may resolve violations Book ⊑ ∀ author.Person Book(Huckleberry-Finn) author(Huckleberry-Finn, Mark-Twain) → Person(Mark-Twain) RQ5
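The effect described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples are held in a plain set, the helper names `infer_range_types` and `untyped_values` are hypothetical, and the axiom is read as a universal restriction (Book ⊑ ∀ author.Person). It is not the validation framework presented in the talk.

```python
# Minimal sketch: a universal restriction applied as an inference rule
# before a closed-world validation check. All names are hypothetical.
RDF_TYPE = "rdf:type"

def infer_range_types(triples, cls, prop, range_cls):
    """Apply cls ⊑ ∀ prop.range_cls: type every prop-value of a cls instance as range_cls."""
    inferred = set(triples)
    instances = {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}
    for (s, p, o) in triples:
        if p == prop and s in instances:
            inferred.add((o, RDF_TYPE, range_cls))
    return inferred

def untyped_values(triples, cls, prop, range_cls):
    """Closed-world check: prop-values of cls instances lacking an explicit range_cls type."""
    instances = {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}
    return sorted(
        o for (s, p, o) in triples
        if p == prop and s in instances and (o, RDF_TYPE, range_cls) not in triples
    )

data = {
    ("Huckleberry-Finn", RDF_TYPE, "Book"),
    ("Huckleberry-Finn", "author", "Mark-Twain"),
}

# Without reasoning, Mark-Twain is reported because it is not typed as Person.
violations_before = untyped_values(data, "Book", "author", "Person")

# With reasoning first, Person(Mark-Twain) is inferred and the report is empty.
reasoned = infer_range_types(data, "Book", "author", "Person")
violations_after = untyped_values(reasoned, "Book", "author", "Person")
```

Running the check before and after the inference step shows the violation disappearing once the type is materialized.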
  84. 84. 84 reasoning may cause violations Publication ⊑ ∃ publisher.Publisher Book(Huckleberry-Finn) Book ⊑ Publication RQ5
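The opposite effect can be sketched the same way. The helpers `apply_subclass` and `missing_property` are hypothetical, and the existential restriction is simplified to "has at least one publisher value" (the sketch does not check that the value is typed as Publisher).

```python
# Minimal sketch: subclass reasoning exposes a violation that a purely
# syntactic check would miss. All names are hypothetical.
RDF_TYPE = "rdf:type"

def apply_subclass(triples, sub, sup):
    """Apply sub ⊑ sup: every instance of sub is also typed as sup."""
    inferred = set(triples)
    for (s, p, o) in triples:
        if p == RDF_TYPE and o == sub:
            inferred.add((s, RDF_TYPE, sup))
    return inferred

def missing_property(triples, cls, prop):
    """Closed-world check: instances of cls without any value for prop."""
    instances = {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}
    with_value = {s for (s, p, o) in triples if p == prop}
    return sorted(instances - with_value)

book_data = {("Huckleberry-Finn", RDF_TYPE, "Book")}

# Without reasoning, Huckleberry-Finn is not known to be a Publication,
# so the publisher constraint is never checked against it.
missing_before = missing_property(book_data, "Publication", "publisher")

# After applying Book ⊑ Publication, the missing publisher is reported.
missing_after = missing_property(
    apply_subclass(book_data, "Book", "Publication"), "Publication", "publisher"
)
```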
  85. 85. 85 reasoning solves redundancy Publication ⊑ ∃ publicationDate . xsd:date Book ⊑ Publication Conference-Proceeding ⊑ Publication Journal-Article ⊑ Publication RQ5
  86. 86. 86 For which constraint types may reasoning be performed prior to validation to enhance data quality? research question 5-2 RQ5
  87. 87. 87 > 2/5 of constraint types property domains (R-25): constraint types with reasoning ∃ author.⊤ ⊑ Publication author(Alices-Adventures-In-Wonderland, Lewis-Carroll) → rdf:type(Alices-Adventures-In-Wonderland, Publication) RQ5
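A property-domain axiom such as ∃ author.⊤ ⊑ Publication can be read as a simple inference rule. The sketch below assumes triples in a plain Python set and a hypothetical helper `apply_domain`; it only illustrates the entailment shown on the slide.

```python
# Minimal sketch of property-domain reasoning (R-25). Names are hypothetical.
def apply_domain(triples, prop, domain_cls):
    """Every subject that has a value for prop is typed as domain_cls."""
    inferred = set(triples)
    for (s, p, o) in triples:
        if p == prop:
            inferred.add((s, "rdf:type", domain_cls))
    return inferred

domain_data = {("Alices-Adventures-In-Wonderland", "author", "Lewis-Carroll")}
closure = apply_domain(domain_data, "author", "Publication")
# closure now also contains the inferred rdf:type statement for the book.
```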
  88. 88. 88 < 3/5 of constraint types literal pattern matching (R-44): constraint types without reasoning RQ5 ISBN a rdfs:Datatype ; owl:equivalentClass [ a rdfs:Datatype ; owl:onDatatype xsd:string ; owl:withRestrictions ([ xsd:pattern "^\d{9}[\d|X]$" ])] . Book ⊑ ∀ identifier.ISBN
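Literal pattern matching needs no reasoning: each literal is tested against the pattern on its own. The sketch below assumes the intended pattern is `^\d{9}[\d|X]$` (nine digits followed by a digit or X) and uses Python's `re` module rather than an XSD regex engine; `violates_isbn_pattern` is a hypothetical helper and the sample literals are made up for illustration.

```python
import re

# Assumed reading of the slide's pattern, with the backslashes restored.
ISBN_PATTERN = re.compile(r"^\d{9}[\d|X]$")

def violates_isbn_pattern(literal):
    """Return True if the literal does not match the ISBN pattern."""
    return ISBN_PATTERN.fullmatch(literal) is None

# Illustrative literals (not taken from the slides).
ok_with_x = violates_isbn_pattern("080442957X")
too_short = violates_isbn_pattern("12345")
# Inside a character class "|" is a literal character, so this also matches.
pipe_accepted = violates_isbn_pattern("080442957|")
```

Note that `[\d|X]` accepts a literal `|` as the last character; `[\dX]` would be the stricter form if that is not intended.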
  89. 89. 89 For which constraint types do validation results differ depending on (1) whether the CWA or the OWA and (2) whether the UNA or the nUNA is assumed? CWA dependent: 56.8% UNA dependent: 66.6% research question 5-3 RQ5
  90. 90. 90 56.8% of constraint types minimum qualified cardinality restrictions (R-75): CWA dependent constraint types RQ5 Book ⊑ ∃ title.⊤
  91. 91. 91 disjoint classes (R-7): CWA independent constraint types RQ5 Book ⊓ JournalArticle ⊑ ⊥
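A disjointness constraint such as Book ⊓ JournalArticle ⊑ ⊥ can be checked by looking for resources that are explicitly typed with both classes. The sketch below uses a hypothetical helper `disjointness_violations` and made-up resource names; it only illustrates the check itself.

```python
# Minimal sketch of a disjoint-classes check (R-7). Names and data are hypothetical.
def disjointness_violations(triples, cls_a, cls_b):
    """Return resources explicitly typed with both cls_a and cls_b."""
    a = {s for (s, p, o) in triples if p == "rdf:type" and o == cls_a}
    b = {s for (s, p, o) in triples if p == "rdf:type" and o == cls_b}
    return sorted(a & b)

typed_data = {
    ("Example-Item-1", "rdf:type", "Book"),
    ("Example-Item-1", "rdf:type", "JournalArticle"),
    ("Example-Item-2", "rdf:type", "Book"),
}
disjoint_hits = disjointness_violations(typed_data, "Book", "JournalArticle")
```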
  92. 92. 92 66.6% of constraint types functional properties (R-57/65): UNA dependent constraint types RQ5 funct(title) title(The-Adventures-of-Huckleberry-Finn, "The Adventures of Huckleberry Finn") title(The-Adventures-of-Huckleberry-Finn, "Die Abenteuer des Huckleberry Finn")
  93. 93. 93 literal value comparison (R-43): UNA independent constraint types RQ5 birthDate(Albert-Einstein, "1955-04-18") deathDate(Albert-Einstein, "1879-03-14") birthDate(Albert_Einstein, "1879-03-14") deathDate(Albert_Einstein, "1955-04-18") owl:sameAs(Albert-Einstein, Albert_Einstein)
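The comparison on this slide can be sketched as a per-resource check that a birth date precedes a death date. The helper `birth_not_before_death` is hypothetical, the dates are parsed with the standard library, and the sketch deliberately does not model `owl:sameAs`; it only shows the literal comparison for each identifier.

```python
from datetime import date

# Minimal sketch of a literal value comparison (R-43). Helper name is hypothetical.
def birth_not_before_death(facts):
    """Return subjects whose birthDate is not strictly before their deathDate."""
    return sorted(
        subject
        for subject, (birth, death) in facts.items()
        if date.fromisoformat(birth) >= date.fromisoformat(death)
    )

# (birthDate, deathDate) pairs as they appear on the slide.
einstein_facts = {
    "Albert-Einstein": ("1955-04-18", "1879-03-14"),
    "Albert_Einstein": ("1879-03-14", "1955-04-18"),
}
date_violations = birth_not_before_death(einstein_facts)
```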
  94. 94. www.kit.edu 94 evaluation
  95. 95. 95 collected, classified, and implemented 115 constraints from vocabularies or domain experts on 3 common vocabularies well-established (QB, SKOS) under development (DDI-RDF) evaluation evaluation IJSC, 10(2) ICSC 2016 33 SPARQL endpoints
  96. 96. 96 classification of constraint types RDFS/OWL based constraint language based SPARQL based classification of constraints informational warning error evaluation classification
  97. 97. 97 RDFS/OWL based evaluation classification of constraint types :Publication rdfs:subClassOf [ a owl:Restriction ; owl:onProperty :author ; owl:allValuesFrom :Person ] .
  98. 98. 98 constraint language based evaluation classification of constraint types :Publication { ( :isbn xsd:string, :title xsd:string ) | ( :issn xsd:string, :title xsd:string )}
  99. 99. 99 SPARQL based evaluation classification of constraint types SELECT ?concept WHERE { ?concept a [ rdfs:subClassOf* skos:Concept ] . FILTER NOT EXISTS { ?concept ?p ?o . FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) . } }
  100. 100. 100 C (constraints), CV (constraint violations), values in % evaluation finding 1 SPARQL: C 63.2, CV 78.2; CL: C 34.7, CV 21.8; RDFS/OWL: C 35.6, CV 21.8
  101. 101. 101 C (constraints), CV (constraint violations), values in % evaluation finding 2 SPARQL: C 63.2, CV 78.2; CL: C 34.7, CV 21.8; RDFS/OWL: C 35.6, CV 21.8
  102. 102. 102 C (constraints), CV (constraint violations), values in % evaluation finding 3 Info: C 42.3, CV 31.3; Warning: C 18.7, CV 62.7; Error: C 39.0, CV 6.1
  103. 103. www.kit.edu 103 future work
  104. 104. 104 future work: RQ1 publication of RDF vocabularies DDI Alliance specifications W3C recommendation for DDI-RDF DDI-Lifecycle MD (Model-Driven) new requirements based on experiences with DDI-RDF international working group: DDI Moving Forward Project individual contributions formalize conceptual model (using UML 2) conceptualize and implement diverse model serializations (e.g., RDFS/OWL) future work
  105. 105. 105 aligning PHDD and CSV on the Web overlap in the description of tabular data in CSV format broader scope of PHDD description of tabular data with fixed record length description of tabular data with multiple records per case evaluation for use in DDI-Lifecycle MD future work: RQ1 future work
  106. 106. 106 future work: RQ2 bidirectional transformations from models of any meta-model to OWL generalize from XSD meta-model based unidirectional transformations from XSD models into OWL models enable to validate any data against constraints extractable from models of any meta-model using common RDF validation tools future work
  107. 107. 107 future work: validation database and framework maintain and extend RDF validation database collect case studies and use cases extract requirements publish constraint types keep framework in sync evaluate solutions future work http://purl.org/net/rdf-validation
  108. 108. 108 future work: combine framework with SHACL derive SHACL extensions define mappings from SHACL to the abstraction layer and back maintain consistency of implementations of constraint types future work W3C RDF Data Shapes Working Group DCMI RDF Application Profiles Task Group
