1. www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Modeling Data Life Cycles with PROV
Yann Le Franc, PhD
e-Science Data Factory, France
EUDAT Conference
Semantic Services in EOSC
Porto, January 22-25 2018
3. What is a Data Life Cycle?
CREATING DATA
PROCESSING DATA
ANALYSING DATA
PRESERVING DATA
GIVING ACCESS TO DATA
RE-USING DATA
Ref: UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle
4. About Data Life Cycles
A life cycle approach makes it possible to identify and plan the necessary data management stages (Higgins, 2008)
It provides a structure for considering the many operations that will need to be performed on a data record throughout its life (Ball, 2012)
A large diversity of DLCs
Review by the Committee on Earth Observation Satellites (2012): 51 different DLCs
Review by Ball (2012): 7 DLCs
Pennock M. 2007. Digital curation: a life cycle approach to managing and preserving usable digital information. Library and Archives Journal, Issue 1
Higgins S. 2008. The DCC Curation Lifecycle Model. International Journal of Digital Curation, Issue 1, Volume 3
Ball A. 2012. Review of Data Management Lifecycle Models. University of Bath (unpublished)
5. A proposed definition
DataONE definition
“The data life cycle provides a high level overview of the
stages involved in successful management and
preservation of data for use and reuse. Multiple
versions of a data life cycle exist with differences
attributable to variation in practices across domains or
communities.”
7. UK Data Archive DLC
Ref: UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle
CREATING DATA: designing research, DMPs, planning consent, locating existing data, data collection and management, capturing and creating metadata
PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, managing and storing data
ANALYSING DATA: interpreting and deriving data, producing outputs, authoring publications, preparing for sharing
PRESERVING DATA: data storage, back-up and archiving, migrating to best format and medium, creating metadata and documentation
GIVING ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data
RE-USING DATA: follow-up research, new research, undertaking research reviews, scrutinising findings, teaching and learning
8. The DataONE DLC
https://www.dataone.org/data-life-cycle
9. Digital Curation Centre DLC
http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf
14. Going beyond the classical DLC view
Aim: Modeling DLCs and relations with EUDAT services
Rethinking the definition of a DLC: a more operational definition
“A Data Life Cycle can be considered as the ensemble of all activities, actions, and steps that describe the stages through which data passes, from the time it is created until its obsolescence.”
DLCs can be considered as Data Management Workflows
15. How to describe workflows?
Declarative languages (before execution)
Workflow Description Language (WDL)
SCUFL2 (Apache Taverna)
Wf4Ever models
Workflow-engine-specific languages
Provenance trail (after execution)
16. W3C PROV: tracking the past
From L. Moreau and P. Groth, Provenance, vol. 3, no. 4. Morgan & Claypool Publishers, 2013, pp. 129–129.
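To make the idea of a provenance trail concrete, here is a minimal sketch of how the three core PROV concepts (entities, activities, agents) and a few of their relations could be recorded after execution. The `ProvTrail` class and all `ex:` identifiers are hypothetical, the printed statements only imitate the PROV-N notation, and the sketch is not a substitute for a real PROV library.

```python
from dataclasses import dataclass, field


@dataclass
class ProvTrail:
    """Tiny in-memory provenance trail (hypothetical helper, not a PROV library)."""

    records: list = field(default_factory=list)

    def add(self, relation, *args):
        # Each record is (relation name, tuple of identifiers).
        self.records.append((relation, args))

    def to_text(self):
        # Render the records in a PROV-N-like textual form.
        return "\n".join(f"{rel}({', '.join(args)})" for rel, args in self.records)

    def generators_of(self, entity):
        # Activities recorded as having generated the given entity.
        return [args[1] for rel, args in self.records
                if rel == "wasGeneratedBy" and args[0] == entity]


trail = ProvTrail()
trail.add("entity", "ex:raw_survey")
trail.add("entity", "ex:clean_survey")
trail.add("activity", "ex:cleaning")
trail.add("agent", "ex:data_manager")
trail.add("used", "ex:cleaning", "ex:raw_survey")
trail.add("wasGeneratedBy", "ex:clean_survey", "ex:cleaning")
trail.add("wasAssociatedWith", "ex:cleaning", "ex:data_manager")
trail.add("wasDerivedFrom", "ex:clean_survey", "ex:raw_survey")

print(trail.to_text())
print(trail.generators_of("ex:clean_survey"))
```

The trail only describes what has already happened, which is why it sits on the "after execution" side of the workflow description options.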
19. Modeling activities and agents: the DataONE use case
Data life cycles are constrained by service implementation
Activities can be recurrent through the DLC
High-level activities (data publication, data sharing, …) vs. low-level activities (data curation, data documentation, …)
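As an illustration of recurrent activities, the sketch below lists a few activities per stage and reports those that occur in more than one stage. The stage contents are abridged and relabelled from the UK Data Archive DLC slide purely for the example, and the function name `recurrent_activities` is hypothetical.

```python
from collections import defaultdict

# Activities per stage, abridged from the UK Data Archive DLC (labels simplified).
STAGES = {
    "creating": ["designing research", "collecting data", "creating metadata"],
    "processing": ["validating data", "cleaning data", "describing data"],
    "preserving": ["storing data", "migrating format", "creating metadata"],
}


def recurrent_activities(stages):
    """Return activities that appear in more than one stage, with those stages."""
    seen = defaultdict(list)
    for stage, activities in stages.items():
        for activity in activities:
            seen[activity].append(stage)
    return {activity: where for activity, where in seen.items() if len(where) > 1}


print(recurrent_activities(STAGES))
```

In a PROV model, each occurrence would be a distinct activity instance, so a recurrent activity shows up as several activities of the same type rather than as one shared node.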
20. Integrating data entities: the EPOS use-case
How should we deal with entities, given that they can be created, transformed or made obsolete?
Should we consider that a DLC is associated with each data entity?
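One possible way to approach these questions is to rely on the PROV relations that already cover creation (wasGeneratedBy), transformation (wasDerivedFrom) and obsolescence (wasInvalidatedBy). The sketch below assumes a simple list of (relation, subject, object) records, with hypothetical `ex:` identifiers and an acyclic derivation chain. It is an illustration, not the model used in the EPOS use case.

```python
# Each record is (relation, subject, object), using PROV relation names.
RECORDS = [
    ("wasGeneratedBy", "ex:waveform_v1", "ex:acquisition"),
    ("wasGeneratedBy", "ex:waveform_v2", "ex:recalibration"),
    ("wasDerivedFrom", "ex:waveform_v2", "ex:waveform_v1"),
    ("wasInvalidatedBy", "ex:waveform_v1", "ex:recalibration"),
]


def entity_status(records):
    """Map each entity to 'valid' or 'invalidated' according to the trail."""
    status = {}
    for relation, subject, _ in records:
        if relation == "wasGeneratedBy":
            # Do not overwrite an invalidation that was recorded earlier.
            status.setdefault(subject, "valid")
        elif relation == "wasInvalidatedBy":
            status[subject] = "invalidated"
    return status


def lineage(records, entity):
    """Follow wasDerivedFrom links back to the original entity (assumes no cycles)."""
    parents = {subj: obj for rel, subj, obj in records if rel == "wasDerivedFrom"}
    chain = [entity]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


print(entity_status(RECORDS))
print(lineage(RECORDS, "ex:waveform_v2"))
```

Under this reading, each entity carries its own short life cycle (generated, possibly derived from another entity, eventually invalidated), which is one argument for associating a DLC with each data entity.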
21. Building a proof-of-concept service
User interface to create graphical representations of the DLCs
Extended library to create DLC plans and provenance templates
Storage for plans and templates
API to access plans and templates
API to fill in provenance templates during execution
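Filling in a provenance template during execution can be sketched as a substitution of placeholders by concrete identifiers. The example below borrows the `var:` prefix convention for template variables; the template content, the `fill_template` function and all `ex:` identifiers are hypothetical and do not describe the actual API of the proof-of-concept service.

```python
# Template statements: (relation, arguments); "var:" marks a placeholder.
TEMPLATE = [
    ("activity", ("var:step",)),
    ("entity", ("var:input",)),
    ("entity", ("var:output",)),
    ("agent", ("var:service",)),
    ("used", ("var:step", "var:input")),
    ("wasGeneratedBy", ("var:output", "var:step")),
    ("wasAssociatedWith", ("var:step", "var:service")),
]


def fill_template(template, bindings):
    """Replace every var: placeholder with its bound value; fail on a missing binding."""
    filled = []
    for relation, args in template:
        resolved = []
        for arg in args:
            if arg.startswith("var:"):
                if arg not in bindings:
                    raise KeyError(f"no binding for {arg}")
                resolved.append(bindings[arg])
            else:
                resolved.append(arg)
        filled.append((relation, tuple(resolved)))
    return filled


# Bindings collected while a (hypothetical) replication step runs.
BINDINGS = {
    "var:step": "ex:replication_run_1",
    "var:input": "ex:dataset_42",
    "var:output": "ex:dataset_42_replica",
    "var:service": "ex:replication_service",
}

for relation, args in fill_template(TEMPLATE, BINDINGS):
    print(f"{relation}({', '.join(args)})")
```

Keeping the template separate from the bindings means the same stored template can be reused for every execution of the step, with only the bindings changing.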
22. Conclusion
We can create a declarative description of DLCs using PROV
This description does not directly support logical transitions between the DLC steps
Logic can be added to the PROV graph using a graph-based rule language such as SWRL (Semantic Web Rule Language). This approach is currently being tested
Descriptions could be used to orchestrate the various EUDAT services into a user-defined workflow
We can derive provenance templates directly from the declarative description
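The kind of transition logic that a rule layer would add can be illustrated with a small check: a step becomes available only once all the steps it depends on are completed. The sketch below is plain Python standing in for such a rule, not SWRL, and the plan, including the choice to let both analysing and preserving follow processing, is a hypothetical example.

```python
# Hypothetical DLC plan: each step lists the steps that must be completed first.
PLAN = {
    "creating": [],
    "processing": ["creating"],
    "analysing": ["processing"],
    "preserving": ["processing"],
    "giving_access": ["preserving"],
    "re_using": ["giving_access"],
}


def enabled_steps(plan, completed):
    """Return the steps not yet done whose prerequisites are all completed."""
    done = set(completed)
    return sorted(step for step, needs in plan.items()
                  if step not in done and done.issuperset(needs))


print(enabled_steps(PLAN, []))
print(enabled_steps(PLAN, ["creating", "processing"]))
```

An orchestrator could call such a check after each completed step to decide which service to trigger next in the user-defined workflow.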