A use case designed in the context of the DataONE provenance working group, illustrating how the provenance traces generated by different workflow engines can be queried via the D-PROV model.
D-PROV use case
1. Use Case for D-PROV:
Querying Provenance Traces Produced by
Workflows Enacted by Different Systems
Khalid Belhajjame,
Fernando Seabra Chirigati,
Victor Cuevas
2. Context and Objective
• D-PROV is a model that captures workflow definitions, their provenance, and the
provenance of the results obtained by their execution. It is expressive enough to capture the
definitions of workflows and provenance traces that are specified in multiple workflow
systems, in particular Kepler, Taverna and VisTrails
• D-PROV provides users with integrated access to workflow definitions and associated
provenance traces
• It uses (extends) the W3C PROV model to capture the provenance traces produced by the
execution of such workflows
• The objective of this use case is to show that D-PROV users are able to query (and
combine) provenance traces that are produced by (equivalent) workflows that are specified
and enacted using different systems, namely Taverna and VisTrails
• Note that while in the use case we focus on two equivalent workflows, generally
speaking, D-PROV is expected to allow users to query and combine provenance traces of
workflows that are not necessarily equivalent.
3. Approach
• The approach adopted in the use case is a four-step process
that is illustrated in the figure below
Step 1 (done): Enact the workflows within their native system
Step 2 (done): Export the provenance traces in the native format of the workflow systems
Step 3 (ongoing): Map the workflows and associated provenance traces to D-PROV
Step 4: Query the provenance traces produced by the workflow systems using D-PROV
4. Workflows
We used two (equivalent) workflows specified within Taverna and
VisTrails. Both workflows implement a simple in-silico experiment for
pathway analysis. Given gene IDs, the workflows fetch the
corresponding pathways. To do so, they make use of two KEGG web
services
[Figures: Taverna workflow; VisTrails workflow]
5. Provenance Traces
• The two workflows were enacted within their respective systems
using different (yet overlapping) sets of gene IDs as inputs
• The provenance traces were then captured and exported in different
formats
• For the Taverna workflow, we used the PROV-O and Janus formats
• For the VisTrails workflow, we used its own provenance format
(based on XML) and OPM
• The workflows and their provenance are accessible through
myExperiment [1]
• Workflows and their provenance traces are now being mapped to
D-PROV
[1] http://www.myexperiment.org/packs/317.html
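The mapping step can be illustrated with a minimal Python sketch. It assumes toy record shapes: the dictionary fields (`run`, `inputs`, `outputs`, `exec_id`, `in`, `out`), the identifiers, and both function names are hypothetical and do not reflect the actual Taverna or VisTrails export formats, nor the D-PROV vocabulary. Only the two relation names, `prov:used` and `prov:wasGeneratedBy`, come from W3C PROV.

```python
def taverna_to_prov(record):
    """Map a toy Taverna-style record to PROV-style triples (hypothetical fields)."""
    activity = "taverna:" + record["run"]
    # Each input value was used by the workflow run.
    triples = [(activity, "prov:used", value) for value in record["inputs"]]
    # Each output value was generated by the workflow run.
    triples += [(value, "prov:wasGeneratedBy", activity) for value in record["outputs"]]
    return triples


def vistrails_to_prov(record):
    """Map a toy VisTrails-style record to PROV-style triples (hypothetical fields)."""
    activity = "vistrails:" + record["exec_id"]
    return [
        (activity, "prov:used", record["in"]),
        (record["out"], "prov:wasGeneratedBy", activity),
    ]


# Placeholder records; the gene and pathway identifiers are made up.
taverna_record = {"run": "run1", "inputs": ["gene:A"], "outputs": ["path:1"]}
vistrails_record = {"exec_id": "exec7", "in": "gene:C", "out": "path:3"}

# After mapping, traces from both systems share one representation.
common = taverna_to_prov(taverna_record) + vistrails_to_prov(vistrails_record)
```

Once both traces are expressed with the same relations, a single query can range over them regardless of which system produced them.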
6. Queries
Once the mapping is done, we would like to issue some queries, such as the
ones specified below, against D-PROV:
• Q1: Give the pathways that were produced by the pathway analysis
workflow (as defined within D-PROV), specifying the gene IDs that
were used as inputs to that workflow
• The result of this query should be the union of the pathways returned by
the Taverna and VisTrails workflows, together with the gene IDs used as
input to both workflows.
• Q2: Give the pathways that were produced by the Taverna
workflow, and that are associated with gene IDs that were not used
as input to the VisTrails workflow
• This is a diff query
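The intended semantics of Q1 and Q2 can be sketched in Python as a set union and a set difference. This is only an illustration under stated assumptions: the `Derivation` record, the `TRACES` data, and the functions `q1` and `q2` are hypothetical, the gene and pathway identifiers are made up, and real queries would be issued against the D-PROV representation rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One 'pathway obtained from gene ID' fact taken from a trace (hypothetical shape)."""
    system: str   # workflow system that produced the trace
    gene_id: str  # input to the workflow
    pathway: str  # output of the workflow


# Made-up traces with overlapping inputs, mirroring the setup of the use case.
TRACES = [
    Derivation("Taverna", "gene:A", "path:1"),
    Derivation("Taverna", "gene:B", "path:2"),
    Derivation("VisTrails", "gene:B", "path:2"),
    Derivation("VisTrails", "gene:C", "path:3"),
]


def q1(traces):
    """Q1: union of (gene ID, pathway) pairs over both systems."""
    return {(d.gene_id, d.pathway) for d in traces}


def q2(traces):
    """Q2: Taverna pathways whose gene IDs were never a VisTrails input."""
    vistrails_genes = {d.gene_id for d in traces if d.system == "VisTrails"}
    return {
        d.pathway
        for d in traces
        if d.system == "Taverna" and d.gene_id not in vistrails_genes
    }
```

With the placeholder data above, `q1(TRACES)` contains one pair per distinct gene ID and pathway combination, and `q2(TRACES)` keeps only the pathway obtained from the gene ID that the VisTrails run did not use.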