This document provides an overview of the PROV provenance model and some of its extensions. It discusses the motivation for provenance, the history and development of the PROV model, and its key concepts: entities, activities, and agents. It also describes extensions such as ProvONE and PAV that build upon PROV to model workflow and scientific provenance.
1. A Sightseeing Tour of PROV and Some of its Extensions
Khalid Belhajjame
LAMSADE, Université Paris-Dauphine
16/03/16 MADICS: ReProVirtuFlow 1
2. Why do we care about provenance…
Help explain results and outliers
Assess trust and quality
Promote system transparency: users are able to determine whether a particular use of information is appropriate under a set of rules.
Assist in debugging
Promote reuse and reproducibility
3. A bit of History
Provenance is not a new topic. There has been a lot of provenance work in:
Databases, Workflows, Information retrieval, …
By 2009, a number of models/vocabularies had been proposed for expressing provenance information:
Open Provenance Model (OPM),
Proof Markup Language (PML),
Provenance Vocabulary,
PREservation Metadata: Implementation Strategies (PREMIS),
Semantic Web Applications in Neuromedicine (SWAN) Ontology,
Dublin Core, …
4. A bit of History
2009-2010: W3C Provenance Incubator Group
Objective: to provide a state-of-the-art overview and possible recommendations for standardization efforts
2011: W3C Provenance Working Group
Objective: to define a standard vocabulary, primarily for the Semantic Web
2013: The W3C Provenance Working Group published a number of PROV recommendations and notes:
PROV-DM, PROV-O, …
Since then, a number of models and vocabularies have extended PROV and/or defined mapping rules to it.
7. Provenance
The W3C Provenance Working Group defined provenance as follows:
"Provenance is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing."
8. PROV…
is not a recommendation for representing and collecting provenance information that should be adopted internally by all systems.
That is not realistic, and won't happen any time soon.
Instead, the aim is to facilitate and promote interoperability between domains and applications that adopt their own specific representations of provenance.
This is more pragmatic, and thus more likely to happen.
11. Entity
An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.
Example: An entity may be the document at IRI http://www.bbc.co.uk/news/science-environment-17526723, a file in a file system, a car, or an idea.
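As a minimal sketch (assuming Python 3.7+ and plain-text PROV-N output; the `Entity` class, the `ex:` prefix, the `ex:url` attribute and all identifiers are hypothetical, not part of any PROV library), an entity can be modeled as an immutable record whose fixed aspects are set once at creation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """An entity: a thing with some fixed aspects.

    frozen=True makes the recorded attributes immutable once the entity exists.
    """
    id: str                  # qualified name, e.g. "ex:article" (hypothetical prefix)
    attributes: tuple = ()   # fixed aspects as (name, value) pairs

    def to_provn(self) -> str:
        """Render the entity as a PROV-N entity(...) statement."""
        if not self.attributes:
            return f"entity({self.id})"
        attrs = ", ".join(f'{name}="{value}"' for name, value in self.attributes)
        return f"entity({self.id}, [{attrs}])"


# The web document and the file from the slide (identifiers are hypothetical).
article = Entity("ex:article",
                 (("ex:url", "http://www.bbc.co.uk/news/science-environment-17526723"),))
input_file = Entity("ex:inputFile")

print(article.to_provn())
print(input_file.to_provn())
```

Freezing the dataclass is just one way to mirror the "fixed aspects" of the definition; a production system would more likely rely on an existing PROV toolkit or on PROV-O in RDF.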
12. Activity
An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.
Example: An activity may be the publishing of a document on the Web, sending a Twitter message, extracting metadata embedded in a file, driving a car from Paris to Lyon, etc.
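A minimal sketch, assuming plain-text PROV-N rendering and hypothetical `ex:` identifiers and timestamps, of an activity as a named time interval; the constructor rejects an interval that ends before it starts, and unknown times are written as `-`:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def _provn_time(t: Optional[datetime]) -> str:
    # PROV-N writes an unknown time as "-".
    return t.isoformat() if t is not None else "-"


@dataclass
class Activity:
    """An activity: something that occurs over a period of time."""
    id: str
    start: Optional[datetime] = None
    end: Optional[datetime] = None

    def __post_init__(self) -> None:
        # A period of time cannot end before it starts.
        if self.start is not None and self.end is not None and self.end < self.start:
            raise ValueError(f"{self.id}: end time precedes start time")

    def to_provn(self) -> str:
        return f"activity({self.id}, {_provn_time(self.start)}, {_provn_time(self.end)})"


# Hypothetical times for the Paris-to-Lyon example; the publishing times are unknown.
drive = Activity("ex:driveParisToLyon",
                 datetime(2016, 3, 16, 8, 0), datetime(2016, 3, 16, 12, 30))
publish = Activity("ex:publishDocument")

print(drive.to_provn())
print(publish.to_provn())
```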
13. Agent
An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.
Example: A site selling books on the Web and the companies hosting such sites can be seen as agents.
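PROV defines three agent subtypes: prov:Person, prov:Organization and prov:SoftwareAgent. The sketch below declares the two agents from the example as PROV-N statements; the `agent` helper and the `ex:` identifiers are hypothetical, and typing the book-selling site as a software agent is a modeling assumption:

```python
# The three agent subtypes defined by PROV.
AGENT_TYPES = {"prov:Person", "prov:Organization", "prov:SoftwareAgent"}


def agent(agent_id: str, agent_type: str) -> str:
    """Render a PROV-N agent(...) statement with a prov:type attribute."""
    if agent_type not in AGENT_TYPES:
        raise ValueError(f"unknown agent type: {agent_type}")
    # In PROV-N, a qualified-name value is written between single quotes.
    return f"agent({agent_id}, [prov:type='{agent_type}'])"


print(agent("ex:bookSite", "prov:SoftwareAgent"))
print(agent("ex:hostingCompany", "prov:Organization"))
```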
14. Usage and Generation
Usage is the beginning of utilizing an entity by an activity. Before usage, the activity had not begun to utilize this entity and could not have been affected by the entity.
Example: A program beginning to read an input file.
Generation is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation.
Example: The completed creation of a file by a program.
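Because an entity cannot be used before it has been generated, timestamped usage and generation events can be cross-checked. The following minimal sketch, with hypothetical helper names, identifiers and times, emits PROV-N `used` and `wasGeneratedBy` statements for the program example and flags any usage that precedes the generation of the entity it uses:

```python
from datetime import datetime


def used(activity: str, entity: str, time: datetime) -> str:
    return f"used({activity}, {entity}, {time.isoformat()})"


def was_generated_by(entity: str, activity: str, time: datetime) -> str:
    return f"wasGeneratedBy({entity}, {activity}, {time.isoformat()})"


def usages_before_generation(generations, usages):
    """Return usages of an entity that occur before that entity was generated.

    generations: dict mapping entity -> generation time
    usages: list of (activity, entity, time) tuples
    """
    return [(a, e, t) for (a, e, t) in usages
            if e in generations and t < generations[e]]


generations = {"ex:outputFile": datetime(2016, 3, 16, 10, 5)}
usages = [
    ("ex:program", "ex:inputFile", datetime(2016, 3, 16, 10, 0)),     # program starts reading its input
    ("ex:viewer", "ex:outputFile", datetime(2016, 3, 16, 10, 10)),    # consistent: after generation
    ("ex:badViewer", "ex:outputFile", datetime(2016, 3, 16, 10, 2)),  # inconsistent: before generation
]

print(used("ex:program", "ex:inputFile", datetime(2016, 3, 16, 10, 0)))
print(was_generated_by("ex:outputFile", "ex:program", generations["ex:outputFile"]))
print(usages_before_generation(generations, usages))
```

Entities with no recorded generation (here `ex:inputFile`) are skipped rather than reported, since the check has nothing to compare against.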
15. Derivation
Derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.
Example: The transformation of a relational table into a linked data set.
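Derivations link entities into chains. The sketch below (hypothetical helper names and `ex:` identifiers; the second derivation step is invented for illustration) records the table-to-linked-data example as `wasDerivedFrom` statements and walks the chain backwards with a breadth-first search to list every upstream entity:

```python
from collections import deque


def was_derived_from(generated: str, used: str) -> str:
    return f"wasDerivedFrom({generated}, {used})"


def upstream_entities(derivations, entity):
    """Breadth-first walk over (generated, used) pairs, returning every entity
    that `entity` was derived from, directly or through intermediate steps."""
    sources = {}
    for generated, used in derivations:
        sources.setdefault(generated, []).append(used)
    seen, order, queue = set(), [], deque([entity])
    while queue:
        current = queue.popleft()
        for src in sources.get(current, []):
            if src not in seen:  # guards against cycles and shared ancestors
                seen.add(src)
                order.append(src)
                queue.append(src)
    return order


derivations = [
    ("ex:customersLinkedData", "ex:customersTable"),   # relational table -> linked data set
    ("ex:customersSubset", "ex:customersLinkedData"),  # hypothetical further step
]
for generated, used in derivations:
    print(was_derived_from(generated, used))
print(upstream_entities(derivations, "ex:customersSubset"))
```

The traversal only follows recorded derivations; it does not add new `wasDerivedFrom` statements between indirectly related entities.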
16. Association and Attribution
An activity association is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity.
Example: A workflow system is responsible for enacting a workflow execution.
Attribution is the ascribing of an entity to an agent.
Example: A blog post can be attributed to its author, a mobile phone to its manufacturer.
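A minimal sketch, with hypothetical helper names and `ex:` identifiers, that renders association and attribution statements and then collects the agents linked to an entity, either directly through attribution or through the activity that generated it. The optional plan argument (here a hypothetical workflow description) is written as `-` in PROV-N when absent:

```python
from typing import Optional


def was_associated_with(activity: str, agent: str, plan: Optional[str] = None) -> str:
    # The plan is optional; PROV-N writes "-" when it is absent.
    return f"wasAssociatedWith({activity}, {agent}, {plan or '-'})"


def was_attributed_to(entity: str, agent: str) -> str:
    return f"wasAttributedTo({entity}, {agent})"


def responsible_agents(entity, attributions, generations, associations):
    """Hypothetical helper: agents linked to `entity` directly (attribution)
    or through an activity that generated it (association)."""
    agents = {ag for (e, ag) in attributions if e == entity}
    activities = {a for (e, a) in generations if e == entity}
    agents |= {ag for (a, ag) in associations if a in activities}
    return sorted(agents)


attributions = [("ex:blogPost", "ex:author"), ("ex:phone", "ex:manufacturer")]
generations = [("ex:results", "ex:workflowRun")]          # (entity, activity)
associations = [("ex:workflowRun", "ex:workflowSystem")]  # (activity, agent)

print(was_associated_with("ex:workflowRun", "ex:workflowSystem", "ex:workflowSpec"))
print(was_attributed_to("ex:blogPost", "ex:author"))
print(responsible_agents("ex:results", attributions, generations, associations))
print(responsible_agents("ex:blogPost", attributions, generations, associations))
```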
19. PROV-Compliant Vocabularies
This is by no means complete…
[Figure: PROV and PROV-compliant vocabularies (ProvONE, wfprov, wfdesc, DC, PAV), connected by "extends" and "mapsTo" relationships]
23. Acknowledgements
W3C Provenance Working Group
DataONE Workflow and Provenance Interest Group
PAV’s friends: Paolo Ciccarese, Stian Soiland-Reyes, Alasdair JG Gray, Carole Goble and Tim Clark
Speaker notes
W3C Incubator Activity with a charter to provide a state-of-the-art understanding and develop a roadmap in the area of provenance and possible recommendations for standardization efforts.
IRI: Internationalized Resource Identifier
The core concepts Entity, Activity, Agent, Usage, and Generation are supported by almost all implementations.
On the other hand, we observe that the core concepts of Attribution, Communication, and Delegation are supported by fewer than half of the implementations.