- 1. Provenance with a Purpose
Khalid Belhajjame
PSL, Université Paris-Dauphine, LAMSADE
kbelhajj@gmail.com
© K. Belhajjame 1
December 9th, 2022
- 2. We start with a short tale ... about provenance
Characters:
• Alice, a scientist who uses workflows for her computational experiments and
analyses
• Bob, a believer in the greatness of provenance, who wants to spread the word
- 7. Sounds like I have found my happiness, I will
definitely try it
- 12. Needless to say that I have even more trouble
making sense of the provenance of the executions
of the workflows of my colleagues
- 13. Oh, and the execution of my workflows is getting
slower, and I cannot afford to store all collected
provenance… I just remove it after a few workflow
executions
- 14. Moral of the story …
• By and large, provenance in current systems is collected without really considering the
requirements of the applications that will be using it
• As a result, we end up collecting all sorts of things just to find later that:
• Interpretability. Collected provenance is difficult to understand
• Relevance. Most of the collected provenance is not relevant to the task at hand.
• Completeness. It does not contain all the information needed for the task at hand.
• These conclusions are not limited to workflows
[Figure: a Workflow System captures provenance into a Provenance Log; the user asks, “What can I do with collected provenance?”]
- 15. Here, I am arguing for (and, by the way, coining a new
term for) “Provenance with a purpose”
- 16. Debugging Workflows
• Scenario
• The workflow developer defines breakpoints. A breakpoint is associated with a step (an activity or
subworkflow) in the workflow.
• During the execution of the workflow, execution is paused before and after the
activities associated with breakpoints
• Requirements provenance-wise
• Recording and displaying to the workflow developer the data bound to the input and output of
the steps associated with the breakpoint.
• May involve recording the state of objects that are outside the scope of the inputs and outputs of
the activity that is subject to breakpointing, e.g., a file or a database that is updated by the
activity in question
• One can imagine a situation where the developer alters the input data of a given step that is
associated with a breakpoint
• Input provided by the workflow developer
• Breakpoints
• Optionally, values to use for the inputs of a given activity
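The debugging scenario above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the workflow is a linear chain of steps, nothing is actually paused, and the names `Step`, `BreakpointRecord` and `run_with_breakpoints` are hypothetical rather than part of any existing workflow system. The point is that provenance (input and output bindings) is recorded only for the steps that carry a breakpoint, and that the developer may substitute the input of such a step.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Step:
    """A workflow step (hypothetical model): a name and a one-argument function."""
    name: str
    func: Callable[[Any], Any]


@dataclass
class BreakpointRecord:
    """Provenance kept for a breakpointed step: the data bound to its input and output."""
    step: str
    input_value: Any
    output_value: Any


def run_with_breakpoints(
    steps: List[Step],
    data: Any,
    breakpoints: Set[str],
    overrides: Optional[Dict[str, Any]] = None,
) -> Tuple[Any, List[BreakpointRecord]]:
    """Run a linear workflow, recording inputs/outputs only at breakpointed steps."""
    overrides = overrides or {}
    records: List[BreakpointRecord] = []
    for step in steps:
        at_breakpoint = step.name in breakpoints
        # The developer may replace the input of a breakpointed step.
        if at_breakpoint and step.name in overrides:
            data = overrides[step.name]
        result = step.func(data)
        if at_breakpoint:
            records.append(BreakpointRecord(step.name, data, result))
        data = result
    return data, records


steps = [
    Step("clean", lambda xs: [x for x in xs if x is not None]),
    Step("square", lambda xs: [x * x for x in xs]),
    Step("total", sum),
]

# Only the "square" step is observed; nothing is recorded for the other steps.
result, records = run_with_breakpoints(steps, [1, None, 2, 3], {"square"})

# Same run, but the developer supplies the input of the breakpointed step.
patched, _ = run_with_breakpoints(
    steps, [1, None, 2, 3], {"square"}, overrides={"square": [10]}
)
```

A real debugger would also have to capture side effects (files, databases) that fall outside the step's declared inputs and outputs; the sketch leaves that out.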
Relevance
Completeness
- 17. Experiment Reporting
• Scenario
• Summarization:
• Identify the subset of the workflow (the activities that are of interest)
• Retain only the information relating to a subset of the inputs of the workflow and/or its
outputs
• Abstraction: specify domain annotations to use
• Inputs provided by the user
• Template for reporting.
• For example, sections that need to be filled, and the corresponding steps (or
subworkflows) in the overall workflow
• Source of annotations: it can be an external resource, e.g., Bio.Tools, but annotations can also be extracted in
certain cases from the data values themselves
• Requirements provenance-wise
• Recording only the execution information that is necessary to feed the report
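As a rough illustration of template-driven capture, the sketch below assumes a report template that maps each section to the workflow step feeding it, and records the output of those steps only. The template format, the step names and the function `run_and_report` are assumptions made for this example, not an existing reporting API.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical report template: section title -> name of the step that feeds it.
TEMPLATE: Dict[str, str] = {
    "Retained values": "filter",
    "Summary": "count",
}

# Hypothetical linear workflow: (step name, function) pairs.
WORKFLOW: List[Tuple[str, Callable[[Any], Any]]] = [
    ("parse", lambda text: [int(tok) for tok in text.split(",")]),
    ("filter", lambda xs: [x for x in xs if x > 0]),
    ("count", len),
]


def run_and_report(
    workflow: List[Tuple[str, Callable[[Any], Any]]],
    data: Any,
    template: Dict[str, str],
) -> Tuple[Dict[str, Any], str]:
    """Run the workflow, keeping only the outputs that the template refers to."""
    needed = set(template.values())
    captured: Dict[str, Any] = {}
    for name, func in workflow:
        data = func(data)
        if name in needed:  # steps outside the template leave no trace
            captured[name] = data
    lines = [f"{section}: {captured[step]}" for section, step in template.items()]
    return captured, "\n".join(lines)


captured, report = run_and_report(WORKFLOW, "3,-1,4,-5,9", TEMPLATE)
print(report)
```

Here the output of the `parse` step is never stored, because no section of the template refers to it.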
Relevance
Completeness
Interpretability
- 18. Policy Verification
• Scenario
• A number of policies on the data
• For example, before feeding sensitive data values to a remote analysis, they should be
anonymized or stripped of identifiers
• The way the data is used needs to comply with the rights of the owners or the policies
defined on the data
• Provenance-wise
• Some policies can be verified by directly analyzing the prospective provenance (workflow
specifications)
• Others can only be checked during the execution of the workflow through analysis of the
retrospective provenance of the workflow
• Note that in this case, the execution of a workflow can be halted if it is found to breach a policy
• Input provided by the user of the workflow
• Policies associated with the datasets that are fed to the workflow, as well as those associated
with the datasets underlying the execution of the activities of the workflow
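The two kinds of check can be sketched as follows, for the example policy that sensitive data must be stripped of identifiers before reaching a remote analysis. Everything here is an assumption made for illustration: the `Step` flags `remote` and `anonymizes`, the `IDENTIFIERS` set, and the function names. The static check inspects the workflow specification only (prospective provenance); the runtime check inspects the actual data values during execution and halts the workflow on a breach.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical set of fields considered identifying.
IDENTIFIERS = {"name", "email"}


@dataclass
class Step:
    """A workflow step with the declared properties the policy needs (assumed)."""
    name: str
    func: Callable[[List[Record]], List[Record]]
    remote: bool = False
    anonymizes: bool = False


class PolicyViolation(Exception):
    """Raised to halt a workflow execution that would breach a policy."""


def check_specification(steps: List[Step]) -> bool:
    """Static check: every remote step is preceded by an anonymizing step."""
    anonymized = False
    for step in steps:
        if step.anonymizes:
            anonymized = True
        if step.remote and not anonymized:
            return False
    return True


def run_with_policy(steps: List[Step], records: List[Record]) -> List[Record]:
    """Runtime check: halt before identifiers are fed to a remote step."""
    for step in steps:
        if step.remote and any(IDENTIFIERS.intersection(rec) for rec in records):
            raise PolicyViolation(f"identifiers would reach remote step {step.name!r}")
        records = step.func(records)
    return records


strip_ids = Step(
    "strip_ids",
    lambda rs: [{k: v for k, v in r.items() if k not in IDENTIFIERS} for r in rs],
    anonymizes=True,
)
remote_stats = Step(
    "remote_stats",
    lambda rs: [{"mean_age": sum(r["age"] for r in rs) / len(rs)}],
    remote=True,
)
patients = [{"name": "A", "age": 30}, {"name": "B", "age": 50}]

compliant = run_with_policy([strip_ids, remote_stats], patients)
```

Calling `run_with_policy([remote_stats], patients)` raises `PolicyViolation`, and `check_specification([remote_stats])` already flags that workflow before it runs.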
Relevance
Completeness
Interpretability
- 19. Architecture
• Users: Wf Designer, Wf user
• Applications Layer: Wf Debugger, Exp Reporting, Policy Checker/Enforcer, Reproducibility checker
• Provenance Layer: Provenance Augmentation, Abstraction/Annotation
• Information sources: Workflow Engine, Workflow Exec Traces, Operating System, Data management system, The Web
- 20. How does it work?
• User: choose your task
• User: provide necessary inputs, if any
• System: capture (only the) necessary provenance
• System: assist the user in the task at hand
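One way to picture the "capture only the necessary provenance" step is a lookup from the chosen task to the kinds of events worth recording. The task names, the event kinds and the `capture` function below are all hypothetical and serve only to make the flow concrete.

```python
from typing import Any, Dict, FrozenSet, List

Event = Dict[str, Any]

# Hypothetical mapping from a task to the kinds of provenance events it needs.
TASK_REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "debugging": frozenset({"step_input", "step_output"}),
    "reporting": frozenset({"step_output", "annotation"}),
    "policy_checking": frozenset({"data_transfer"}),
}


def capture(events: List[Event], task: str) -> List[Event]:
    """Keep only the events that the chosen task requires."""
    needed = TASK_REQUIREMENTS[task]
    return [event for event in events if event["kind"] in needed]


# Events that a workflow system could emit during one execution (made up).
events: List[Event] = [
    {"kind": "step_input", "step": "align", "value": "reads.fq"},
    {"kind": "step_output", "step": "align", "value": "aligned.bam"},
    {"kind": "os_call", "step": "align", "value": "open()"},
    {"kind": "data_transfer", "step": "align", "value": "remote-cluster"},
]

debug_log = capture(events, "debugging")
policy_log = capture(events, "policy_checking")
```

The same execution thus yields a different, smaller provenance log depending on the task the user chose.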
- 21. Of course this is far from being perfect…
- 22. This is not entirely new
• Alban Gaignard, Hala Skaf-Molli, Khalid Belhajjame: Findable and reusable workflow data products: A genomic
workflow case study. Semantic Web 11(5): 751-763 (2020)
• Renan Souza, Marta Mattoso: Provenance of Dynamic Adaptations in User-Steered Dataflows. IPAW 2018: 16-29
• Timothy M. McPhillips et al. YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering
Workflow Information from Scripts. CoRR abs/1502.02403 (2015)
• Pinar Alper, Khalid Belhajjame, Carole A. Goble: Static analysis of Taverna workflows to predict provenance
patterns. Future Gener. Comput. Syst. 75: 310-329 (2017)
• Daniel Deutch, Amir Gilad, Yuval Moskovitch: Efficient provenance tracking for datalog using top-k queries.
VLDB J. 27(2): 245-269 (2018)
What is new then?
A single framework that caters for, and can be adapted to, different
provenance usage scenarios