A large part of Linked Data generation entails processing the raw data. However, this processing is documented only in human-readable form or as a software repository, which inhibits reproducibility and comparability: current documentation solutions do not provide detailed metadata and rely on the availability of specific software environments.
We propose an automatic capturing mechanism for interchangeable, implementation-independent metadata and provenance that covers data processing. Describing the computational experiment in declarative mapping documents allows automatically capturing term-level provenance for both schema and data transformations, covering both the software tools used and the input-output pairs of the data processing executions. This approach is applied to mapping documents described using RML and FnO, and implemented in the RMLMapper. The captured metadata makes it easier to share, reproduce, and compare the dataset generation process across software environments.
SemSci2017 - Detailed Provenance Capture of Data Processing
1. Detailed
Provenance Capture
of Data Processing
Ben De Meester, Anastasia Dimou,
Ruben Verborgh, and Erik Mannens
Ghent University – imec – IDLab, Belgium
22–24. What do we want?
Term-level,
implementation-independent provenance
for schema transformations
for data transformations
Generated automatically
Declarative generation process
25. Steps
Align schema and data transformations
in a declarative document
Generate provenance based on
declarative schema transformations
Generate provenance based on
declarative data transformations
28. Declarative generation process? Solved!
Align schema and data transformations in a declarative
document
RML + FnO for DBpedia EF
Declarative data transformations for Linked Data generation: the case of DBpedia
De Meester, B., Maroy, W., Dimou, A., Verborgh, R., and Mannens, E.
Sustainable Linked Data Generation: The Case of DBpedia
Maroy, W., Dimou, A., Kontokostas, D., De Meester, B., Verborgh, R., Lehmann, J., Mannens, E. and Hellmann, S.
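As a sketch, aligning schema and data transformations in one declarative document amounts to an RML mapping that embeds an FnO function call via `fnml:functionValue`; the source file, field names, and function IRI below are illustrative assumptions, not taken from the papers above.

```turtle
# Illustrative RML mapping with an embedded FnO data transformation.
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno:  <https://w3id.org/function/ontology#> .
@prefix ex:   <http://example.com/ns#> .

<#PersonMapping>
  rml:logicalSource [ rml:source "people.csv" ] ;            # assumed source
  rr:subjectMap [ rr:template "http://example.com/person/{id}" ] ;
  rr:predicateObjectMap [
    rr:predicate ex:name ;
    rr:objectMap [
      fnml:functionValue [                # data transformation, not a plain reference
        rr:predicateObjectMap [
          rr:predicate fno:executes ;
          rr:objectMap [ rr:constant ex:toUpperCase ]        # assumed function IRI
        ] ;
        rr:predicateObjectMap [
          rr:predicate ex:inputString ;
          rr:objectMap [ rml:reference "name" ]              # assumed field
        ]
      ]
    ]
  ] .
```

The schema transformation (the triples map) and the data transformation (the function call) live in the same declarative document, which is what makes automatic provenance capture possible.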
30. Schema transformations provenance?
Solved!
Generate provenance based on declarative mapping
document
RML + PROV
Automated metadata generation for Linked Data generation and publishing workflows
Dimou, A., De Nies, T., Verborgh, R., Mannens, E., and Van de Walle, R.
37. Aligning FnO and PROV
[Diagram: the Data Transformation (prov:Activity) used the Function (prov:Entity) and the Input (prov:Entity); the Output (prov:Entity) wasGeneratedBy the Data Transformation and wasAttributedTo the Tool (prov:Agent); the Data Transformation wasAssociatedWith the Tool.]
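For a single function execution, this alignment could yield PROV triples along the following lines; all IRIs are illustrative assumptions.

```turtle
# Illustrative term-level provenance for one data transformation execution.
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.com/prov/> .

ex:execution_42 a prov:Activity ;          # the data transformation
    prov:used ex:function_toUpperCase ,    # the FnO function description
              ex:input_value_42 ;          # the raw input term
    prov:wasAssociatedWith ex:rmlmapper .  # the executing tool

ex:output_term_42 a prov:Entity ;
    prov:wasGeneratedBy ex:execution_42 ;
    prov:wasAttributedTo ex:rmlmapper .

ex:rmlmapper a prov:Agent, prov:SoftwareAgent .
```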
40. Cool thing #1:
System details complementary
[Diagram: the same FnO–PROV alignment as slide 37, with system details complementing the Tool (prov:Agent).]
41. Cool thing #2:
Aligning with RML complementary
[Diagram: the FnO–PROV alignment of slide 37 extended with a Schema Transformation (prov:Activity), connected to the Data Transformation via wasInformedBy.]
42. Cool thing #3:
It actually works
RMLMapper
https://github.com/RMLio/RML-Mapper
FunctionProcessor
https://github.com/FnOio/function-processor-java
DBpedia Extraction Sample
https://fno.io/prov/dbpedia/
43. How can we find a drunk Barney?
Query for long-lasting processes
Query all outputs of a certain function/tool
Query all input-output pairs
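As a sketch, the last of these could look like the query below, run over the captured provenance; the function IRI is an illustrative assumption.

```sparql
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX ex:   <http://example.com/prov/>

# All input–output pairs of executions of a given function
SELECT ?input ?output WHERE {
  ?activity prov:used ex:toUpperCase , ?input .
  ?output   prov:wasGeneratedBy ?activity .
  FILTER(?input != ex:toUpperCase)
}
```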
44. What to do with a drunk Barney?
Performance evaluation
Qualitative comparison
Iterative improvement
(only changing what is needed!)
46. Detailed
Provenance Capture
of Data Processing
Ben De Meester, Anastasia Dimou,
Ruben Verborgh, and Erik Mannens
Ghent University – imec – IDLab, Belgium
Editor's notes
Not just a person, could be buggy software as well
Explain term-level (example)
Because it’s declarative, it _can_ be generated automatically
A declarative document describes the complete generation workflow without tying it to an implementation
Ideal because declarative and in RDF
In summary, we propose a fully declarative generation process
and applied this by aligning FnO to PROV.
There’s a lot of cool things about this, but there’s one uncool thing…
Any schema transformation
Provenance that provides more insight into the generation of a dataset, and thus helps to evaluate, compare, and improve dataset generation