SlideShare une entreprise Scribd logo
1  sur  18
Télécharger pour lire hors ligne
D-PROV: Extending the PROV Provenance Model
                  to express Workflow Structure



Paolo Missier(1), Saumen Dey(2), Khalid Belhajjame(3),
 Vıctor Cuevas-Vicentt́ın(2), Bertram Ludaescher(2)
                (1) Newcastle University, UK
                   (2) UC Davis, CA, USA
               (3) University of Manchester, UK


                       TAPP’13
                     Lombard, IL.
                      April, 2013
Queries that require provenance
    Q1: “track the lineage of the final outputs of the workflow”
    Q2: “list the parameter values that were used for a specific task t in the workflow”
    Q3: “check that the provenance traces conform to the structure of the workflow”


    Prospective Provenance (p-prov):
    - representation of the workflow itself;

    Retrospective Provenance (r-prov):
    - provenance of the data produced by one workflow run

    Provenance of the Process:
    - account of the evolution of the workflow across versions




    FREIRE, J., KOOP, D., SANTOS, E., AND SILVA, C. T. Provenance for Computational Tasks: A Survey. Computing
2   in Science and Engineering 10, 3 (2008), 11–21.
PROV @W3C: scope and structure
    Recommendation
    track




3             source: http://www.w3.org/TR/prov-overview/
r-prov and p-prov in plain PROV

              T1            prov:type= "prov:plan"          T2           p-prov


        wasAssociatedWith                            wasAssociatedWith

                     wasGeneratedBy              used
             T1Inv                         d               T2Inv         r-prov




    // p-prov: tasks, but no data or activities
    entity(T1, [ prov:type = 'prov:plan'])
    entity(T2, [ prov:type = 'prov:plan'])
    // r-prov - task invocation and data
    activity(T1inv)
    activity(T2inv)
    entity(d) // data flowing between two task instances
    wasGeneratedBy(d, T1inv)
    used(T2inv, d)
    // connecting r-prov and p-prov
    wasAssociatedWith(T1inv, _, T1) // T1 is the plan for T1inv
    wasAssociatedWith(T2inv, _, T2) // T1 is the plan for T2inv

4
Reference dataflow models

    wf
                                                                     Processors, ports, data links
              op1                ip1                                 Kepler
         T1                            T2
              op2                ip2
                                                                     Taverna
                                                                     VisTrails
                                                                     e-Science Central
                                                                     ...




    wf                                                               Processors, channels
              T1                ch1                T2



                               ch2                                   Dataflow process networks (*)
                                                                     (e.g. a specific Kepler semantics)




5   (*) LEE, E. A., AND PARKS, T. M. Dataflow process networks. Proceedings of the IEEE 83, 5 (1995), 773–801.
Extensions I / p-prov / ports model
                               wf

                                           op1                ip1
                                      T1                            T2
                                           op2                ip2




          prov:type= "prov:plan"
           prov:type= "D1:task"


                              hasOutputPort
                      T1                         op1                            hasOutPort(T1, op1)
                                                                                hasInPort(T2, ip1)
                   isTaskOf                                                     dataLink(op1, ip1)
                                                                                isTaskOf(wf, T1)
     prov:type=                               dataLink   prov:type= "D1:port"
                       wf                                                       isTaskOf(wf, T2)
    "D1:workflow"


                   isTaskOf

                               hasInputPort
                      T2                         ip1



             prov:type= "prov:plan"
              prov:type= "D1:task"
6
Extensions II / p-prov / channel model
                                        wf
                                                  T1                  ch1             T2



                                                                      ch2




                      prov:type= "prov:plan"
                       prov:type= "D1:task"

                                               isSourceOf
                                   T1                           ch2


                                isTaskOf

    prov:type= "D1:workflow",                                prov:type= "D1:channel"
      prov:type= "prov:plan"       wf


                               isTaskOf           isSourceOf
                                                                            isSourceOf(T1,ch2)
                                                isSinkOf                    isSourceOf(T1,ch1)
                                  T2                           ch1
                                                                            isSinkOf(T2,ch1)
                    prov:type= "prov:plan"
                                                                            isTaskOf(T1, wf)
                     prov:type= "D1:task"                                   isTaskOf(T2, wf)

7
p-prov/r-prov pattern for port-oriented workflows
                                        hasOutputPort
                                 T1                     op1

                                                                                hasOutPort(t1, op1)
                       wasAssociatedWith
                                                                                hasInPort(t2, ip1)
                                T1Inv                                           dataLink(op1, ip1)
         isTaskOf                                       onOutPort
                                                                                isTaskOf(wf, t1)
                             wasStartedBy
                                                                                isTaskOf(wf, t2)
         wasAssociatedWith                                     d     dataLink
    wf                          wfInv
                                                                                activity (wfRun)
                             wasStartedBy                                       activity ( t1inv )
         isTaskOf                                         onInPort
                                                                                activity ( t2inv )
                                T2Inv
                                                                                entity (d)
                        wasAssociatedWith                                       onOutPort(d, op1, t1Inv)
                                         hasInputPort                           onInPort(d, ip1, t2Inv)
                                 T2                      ip1




8
p-prov/r-prov pattern for port-oriented workflows
                                         hasOutputPort
                                  T1                     op1

                                                                                 hasOutPort(t1, op1)
                        wasAssociatedWith
                                                                                 hasInPort(t2, ip1)
                                 T1Inv                                           dataLink(op1, ip1)
          isTaskOf                                       onOutPort
                                                                                 isTaskOf(wf, t1)
                              wasStartedBy
                                                                                 isTaskOf(wf, t2)
          wasAssociatedWith                                     d     dataLink
     wf                          wfInv
                                                                                 activity (wfRun)
                              wasStartedBy                                       activity ( t1inv )
          isTaskOf                                         onInPort
                                                                                 activity ( t2inv )
                                 T2Inv
                                                                                 entity (d)
                         wasAssociatedWith                                       onOutPort(d, op1, t1Inv)
                                          hasInputPort                           onInPort(d, ip1, t2Inv)
                                  T2                      ip1




    Lossy mapping to plain PROV: Port removal

              wasGeneratedBy(D, tInv) :− onOutPort(D, _, tInv ).
              used(tInv, D)           :− onInPort(D, _, tInv)

8
p-prov/r-prov pattern for channel-oriented workflows
                                T1

                                              isSourceOf
                          wasAssociatedWith                           sourceOf(t1,ch)
                               T1Inv                                  sinkOf(t2,ch)
             isTaskOf
                                                                      isTaskOf(t1, wf)
                                                       wasReadFrom
                                                                      isTaskOf(t2, wf)
                           wasStartedBy
                                                                      activity (wfRun)
         wasAssociatedWith                                       d
    wf                         wfInv             ch1                  activity ( t1inv )
                                                                      activity ( t2inv )
                           wasStartedBy                wasWrittenTo   entity (d)
              isTaskOf         T2Inv                                  wasWrittenTo(d,ch, t1Inv)
                                                                      wasReadFrom(d,ch, t2Inv)
                        wasAssociatedWith
                                               isSinkOf
                                T2




9
p-prov/r-prov pattern for channel-oriented workflows
                                T1

                                              isSourceOf
                          wasAssociatedWith                           sourceOf(t1,ch)
                               T1Inv                                  sinkOf(t2,ch)
             isTaskOf
                                                                      isTaskOf(t1, wf)
                                                       wasReadFrom
                                                                      isTaskOf(t2, wf)
                           wasStartedBy
                                                                      activity (wfRun)
         wasAssociatedWith                                       d
    wf                         wfInv             ch1                  activity ( t1inv )
                                                                      activity ( t2inv )
                           wasStartedBy                wasWrittenTo   entity (d)
              isTaskOf         T2Inv                                  wasWrittenTo(d,ch, t1Inv)
                                                                      wasReadFrom(d,ch, t2Inv)
                        wasAssociatedWith
                                               isSinkOf
                                T2




    Lossy mapping to plain PROV: Channel removal:

            wasGeneratedBy(d, tInv) :− wasWrittenTo(d,ch, t1Inv )
            used(tInv, D):−wasReadFrom(d,ch,t2Inv)

9
Bundles, provenance of provenance
     A bundle is a named set of provenance descriptions, and is itself an entity,
     so allowing provenance of provenance to be expressed.
               pm:bundle1

                    draft   used                wasGeneratedBy     draft
                                   commenting
                     v1                                          comments




        bundle pm:bundle1

        entity(ex:draftComments)
        entity(ex:draftV1)

        activity(ex:commenting)
        wasGeneratedBy(ex:draftComments, ex:commenting,-)
        used(ex:commenting, ex:draftV1, -)
        endBundle
        ...
        entity(pm:bundle1, [ prov:type='prov:Bundle' ])
        wasGeneratedBy(pm:bundle1, -, 2013-03-20T10:30:00)
        wasAttributedTo(pm:bundle1, ex:Bob)
10
Structural workflow nesting using bundles
     Repurposing: use bundles to associate a workflow execution with the
     provenance it generates

          entity (wfRunTrace, [ prov:type=’prov:Bundle’ ])
          wasGeneratedBy(wfRunTrace,wfInv,-)

     This makes it possible to write hierarchical provenance of nested workflows,
     recursively:

      entity (T, [prov:type="D1:task", prov:type="D1:workflow"])


            bundle wfRunTrace
              activity(wfRun)      // run of top level wf
              activity(Tinv)       // run of T, a sub-workflow

              wasAssociatedWith(Tinv, _, T)
              entity(TinvTrace, [ prov:type='prov:Bundle' ])
              wasGeneratedBy(TinvTrace, Tinv, _)
              ...
            endbundle
11
Answering the sample queries
     Q3: “match the provenance trace to the workflow structure”

     Two steps:
     - define rules that entail p-prov relations from r-prov relations, and
     - check that those new p-prov relations are consistent with any constraints
       defined on the workflow structure / infer new p-prov statements

                   dataLink (OP, IP) :− onOutPort(D,OP,_),
                                        onInPort(D,IP,_).
                      OP



                                                 wf

                                                           op1           ip1
                       onOutPort
                                                      T1                       T2
                                                           op2           ip2
                            D      dataLink



                        onInPort

                                                 constraint violation?


                       IP
12
Structure inferences



            T1           OP


     wasAssociatedWith
                                     hasOutPort(T, OP) :−
            I1                              onOutPort(D,OP,I1),
                         onOutPort          wasAssociatedWith(I1,_,T1).

                               D



                          onInPort   hasInPort(T, IP) :−
            I2                              onInPort(D,IP,I1),
     wasAssociatedWith                      wasAssociatedWith(I1,_,T1).

            T2            IP




13
Structure inferences


                     isTaskOf(T2, Wf) :−             onOutPort(D,OP,I1),
                                                     hasOutPort(T1, OP),
                                                     onInPort(D,IP,I2),
                                                     hasInPort(T2, IP),
                            T1           OP          wasAssociatedWith(I1,_,T1),
                     wasAssociatedWith
                                                     wasAssociatedWith(I2,_,T2),
                                                     isTaskOf(T1, Wf).
                            I1

          isTaskOf                       onOutPort


     wf                                        D



          isTaskOf                        onInPort

                            I2

                     wasAssociatedWith


                            T2            IP




14
composed by three main parts: the structure of the experiment        represents the execution of some activity instance (activity


                                          Other PROV extensions into “workflow-land”
                                                                                          (white classes in the UML class diagram), execution of the           Execute activity), while the second defines the execution
                                                                                          experiment (dark gray classes) and environment configuration         properties of one SWfMS (activity Execute workflow). Also, the
                                                                                          (light gray classes). Furthermore, the stereotypes in UML class      execution of an activity instance determines which parameters are
                                                                                          diagram are used to represent PROV components.                       consumed by each instance of the activity.
                                                                                          The agent Scientist represents a person to use computational         By using PROV-Wf we are able to represent provenance data to
                                                                                          resources to execute the experiment (composed as a workflow).        be distributed and queried at runtime. In the next sub-section we
          Prov-Wf:                                                                        Also, the agent Scientist is associated to a machine (i.e. agent
                                                                                          Machine). The agent Machine establishes an association with a
                                                                                                                                                               present how this provenance model is coupled to a set of
                                                                                                                                                               components developed for capturing and querying runtime
                                                                                          workflow (plan Workflow), which is composed by a set of              provenance data.
         Flavio Costa, Vítor Silva, Daniel de Oliveira, Kary Ocaña, Eduardo Ogasawara, Jonas Dias, and Capturing Provenance
                                                                                        3.2 Components for Marta Mattoso,
                                                                                          activities (i.e. plan WActivity). Each activity is responsible for
                                                                                          executing a program in a machine with a specific configuration.
         Capturing and Querying Workflow Runtime Provenance with PROV: a Practical Approach, Procs.components to import components
                                                                                        We have developed a series of
                                                                                                                      BigProv’13,provenance
                                                                                          The invocation of a program within a workflow (i.e. execution
                                                                                        data from different SWfMS to PROV-Wf model. Our
                                                                                          instance) uses a set of parameters that can be seen as a set of
         Genova, Italy, March 2013




                                                          Plan



                  DataLink              hasDataLink     Workflow
                                                                      hasSubProcess
                                 hasSink
     hasSource                  Input        hasInput
                                                        Process
            Output                         hasOutput

                                                                                                                                             Figure 1 PROV-Wf data model
                                               used
                   Parameter                                      describedByProcess

                                            usedInput
     describedByParameter

                     Artifact                               ProcessRun           wasEnactedBy                   WfProv from the Wf4Ever project
                                                                                   WorkflowEngine
                                        wasOutputFrom
                      Entity                                     Activity
                                                                                                                  www.wf4ever-project.org
                                   wasGeneratedBy                                      SoftwareAgent




15
Summary and extensions




     • Simple extensions to PROV
       – designed to model p-prov
       – complementary to r-prov


     • They enable queries that cut across r-prov and p-prov

     • Bundle mechanism used for provenance of nested workflow components

     • Next step: harmonize similar extensions proposed by other groups
16     – overall goal is to achieve interoperability

Contenu connexe

Tendances

Pysmbc Python C Modules are Easy
Pysmbc Python C Modules are EasyPysmbc Python C Modules are Easy
Pysmbc Python C Modules are EasyRoberto Polli
 
LeFlowを調べてみました
LeFlowを調べてみましたLeFlowを調べてみました
LeFlowを調べてみましたMr. Vengineer
 
Phyton Learning extracts
Phyton Learning extracts Phyton Learning extracts
Phyton Learning extracts Pavan Babu .G
 
Ekon bestof rtl_delphi
Ekon bestof rtl_delphiEkon bestof rtl_delphi
Ekon bestof rtl_delphiMax Kleiner
 
Use of Lua in Lab Devices
Use of Lua in Lab DevicesUse of Lua in Lab Devices
Use of Lua in Lab DevicesClaus Kühnel
 
Tutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow AbstractorTutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow AbstractorYuuki Takano
 
Tensor comprehensions
Tensor comprehensionsTensor comprehensions
Tensor comprehensionsMr. Vengineer
 
Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...Samuel Lampa
 
C programming session 11
C programming session 11C programming session 11
C programming session 11AjayBahoriya
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)Yuuki Takano
 
Разработка кросс-платформенного кода между iPhone < -> Windows с помощью o...
Разработка кросс-платформенного кода между iPhone < -> Windows с помощью o...Разработка кросс-платформенного кода между iPhone < -> Windows с помощью o...
Разработка кросс-платформенного кода между iPhone < -> Windows с помощью o...Yandex
 
TDD in C - Recently Used List Kata
TDD in C - Recently Used List KataTDD in C - Recently Used List Kata
TDD in C - Recently Used List KataOlve Maudal
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT CompilerNetronome
 
Macroprocessor
MacroprocessorMacroprocessor
Macroprocessorksanthosh
 
The bytecode hocus pocus - JavaOne 2016
The bytecode hocus pocus - JavaOne 2016The bytecode hocus pocus - JavaOne 2016
The bytecode hocus pocus - JavaOne 2016Raimon Ràfols
 

Tendances (20)

Pysmbc Python C Modules are Easy
Pysmbc Python C Modules are EasyPysmbc Python C Modules are Easy
Pysmbc Python C Modules are Easy
 
LeFlowを調べてみました
LeFlowを調べてみましたLeFlowを調べてみました
LeFlowを調べてみました
 
Tiramisu概要
Tiramisu概要Tiramisu概要
Tiramisu概要
 
Phyton Learning extracts
Phyton Learning extracts Phyton Learning extracts
Phyton Learning extracts
 
Ekon bestof rtl_delphi
Ekon bestof rtl_delphiEkon bestof rtl_delphi
Ekon bestof rtl_delphi
 
Basics of building a blackfin application
Basics of building a blackfin applicationBasics of building a blackfin application
Basics of building a blackfin application
 
Use of Lua in Lab Devices
Use of Lua in Lab DevicesUse of Lua in Lab Devices
Use of Lua in Lab Devices
 
Tutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow AbstractorTutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow Abstractor
 
Tensor comprehensions
Tensor comprehensionsTensor comprehensions
Tensor comprehensions
 
Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...
 
C programming session 11
C programming session 11C programming session 11
C programming session 11
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
 
Coroutine
CoroutineCoroutine
Coroutine
 
Разработка кросс-платформенного кода между iPhone < -> Windows с помощью o...
Разработка кросс-платформенного кода между iPhone < -> Windows с помощью o...Разработка кросс-платформенного кода между iPhone < -> Windows с помощью o...
Разработка кросс-платформенного кода между iPhone < -> Windows с помощью o...
 
TDD in C - Recently Used List Kata
TDD in C - Recently Used List KataTDD in C - Recently Used List Kata
TDD in C - Recently Used List Kata
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT Compiler
 
Macroprocessor
MacroprocessorMacroprocessor
Macroprocessor
 
Ocl 09
Ocl 09Ocl 09
Ocl 09
 
The bytecode hocus pocus - JavaOne 2016
The bytecode hocus pocus - JavaOne 2016The bytecode hocus pocus - JavaOne 2016
The bytecode hocus pocus - JavaOne 2016
 
F#3.0
F#3.0 F#3.0
F#3.0
 

Similaire à Tapp 13-talk

Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Paolo Missier
 
Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Ed Dodds
 
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsValidation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsBertram Ludäscher
 
The theory of concurrent programming for a seasoned programmer
The theory of concurrent programming for a seasoned programmerThe theory of concurrent programming for a seasoned programmer
The theory of concurrent programming for a seasoned programmerRoman Elizarov
 
Data assimilation with OpenDA
Data assimilation with OpenDAData assimilation with OpenDA
Data assimilation with OpenDAnilsvanvelzen
 
Apache PIG - User Defined Functions
Apache PIG - User Defined FunctionsApache PIG - User Defined Functions
Apache PIG - User Defined FunctionsChristoph Bauer
 

Similaire à Tapp 13-talk (8)

Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
 
Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011
 
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsValidation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
 
The theory of concurrent programming for a seasoned programmer
The theory of concurrent programming for a seasoned programmerThe theory of concurrent programming for a seasoned programmer
The theory of concurrent programming for a seasoned programmer
 
Data assimilation with OpenDA
Data assimilation with OpenDAData assimilation with OpenDA
Data assimilation with OpenDA
 
GoでKVSを書けるのか
GoでKVSを書けるのかGoでKVSを書けるのか
GoでKVSを書けるのか
 
Apache PIG - User Defined Functions
Apache PIG - User Defined FunctionsApache PIG - User Defined Functions
Apache PIG - User Defined Functions
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 

Plus de Paolo Missier

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsPaolo Missier
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Paolo Missier
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...Paolo Missier
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...Paolo Missier
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewPaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...Paolo Missier
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Paolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data SciencePaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...Paolo Missier
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...Paolo Missier
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...Paolo Missier
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff UniversityPaolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...Paolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 

Plus de Paolo Missier (20)

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overview
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 

Dernier

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Dernier (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Tapp 13-talk

  • 1. D-PROV: Extending the PROV Provenance Model to express Workflow Structure Paolo Missier(1), Saumen Dey(2), Khalid Belhajjame(3), Vıctor Cuevas-Vicentt́ın(2), Bertram Ludaescher(2) (1) Newcastle University, UK (2) UC Davis, CA, USA (3) University of Manchester, UK TAPP’13 Lombard, IL. April, 2013
  • 2. Queries that require provenance Q1: “track the lineage of the final outputs of the workflow” Q2: “list the parameter values that were used for a specific task t in the workflow” Q3: “check that the provenance traces conform to the structure of the workflow” Prospective Provenance (p-prov): - representation of the workflow itself; Retrospective Provenance (r-prov): - provenance of the data produced by one workflow run Provenance of the Process: - account of the evolution of the workflow across versions FREIRE, J., KOOP, D., SANTOS, E., AND SILVA, C. T. Provenance for Computational Tasks: A Survey. Computing 2 in Science and Engineering 10, 3 (2008), 11–21.
  • 3. PROV @W3C: scope and structure Recommendation track 3 source: http://www.w3.org/TR/prov-overview/
  • 4. r-prov and p-prov in plain PROV T1 prov:type= "prov:plan" T2 p-prov wasAssociatedWith wasAssociatedWith wasGeneratedBy used T1Inv d T2Inv r-prov // p-prov: tasks, but no data or activities entity(T1, [ prov:type = 'prov:plan']) entity(T2, [ prov:type = 'prov:plan']) // r-prov - task invocation and data activity(T1inv) activity(T2inv) entity(d) // data flowing between two task instances wasGeneratedBy(d, T1inv) used(T2inv, d) // connecting r-prov and p-prov wasAssociatedWith(T1inv, _, T1) // T1 is the plan for T1inv wasAssociatedWith(T2inv, _, T2) // T1 is the plan for T2inv 4
  • 5. Reference dataflow models wf Processors, ports, data links op1 ip1 Kepler T1 T2 op2 ip2 Taverna VisTrails e-Science Central ... wf Processors, channels T1 ch1 T2 ch2 Dataflow process networks (*) (e.g. a specific Kepler semantics) 5 (*) LEE, E. A., AND PARKS, T. M. Dataflow process networks. Proceedings of the IEEE 83, 5 (1995), 773–801.
  • 6. Extensions I / p-prov / ports model wf op1 ip1 T1 T2 op2 ip2 prov:type= "prov:plan" prov:type= "D1:task" hasOutputPort T1 op1 hasOutPort(T1, op1) hasInPort(T2, ip1) isTaskOf dataLink(op1, ip1) isTaskOf(wf, T1) prov:type= dataLink prov:type= "D1:port" wf isTaskOf(wf, T2) "D1:workflow" isTaskOf hasInputPort T2 ip1 prov:type= "prov:plan" prov:type= "D1:task" 6
  • 7. Extensions II / p-prov / channel model wf T1 ch1 T2 ch2 prov:type= "prov:plan" prov:type= "D1:task" isSourceOf T1 ch2 isTaskOf prov:type= "D1:workflow", prov:type= "D1:channel" prov:type= "prov:plan" wf isTaskOf isSourceOf isSourceOf(T1,ch2) isSinkOf isSourceOf(T1,ch1) T2 ch1 isSinkOf(T2,ch1) prov:type= "prov:plan" isTaskOf(T1, wf) prov:type= "D1:task" isTaskOf(T2, wf) 7
  • 8. p-prov/r-prov pattern for port-oriented workflows hasOutputPort T1 op1 hasOutPort(t1, op1) wasAssociatedWith hasInPort(t2, ip1) T1Inv dataLink(op1, ip1) isTaskOf onOutPort isTaskOf(wf, t1) wasStartedBy isTaskOf(wf, t2) wasAssociatedWith d dataLink wf wfInv activity (wfRun) wasStartedBy activity ( t1inv ) isTaskOf onInPort activity ( t2inv ) T2Inv entity (d) wasAssociatedWith onOutPort(d, op1, t1Inv) hasInputPort onInPort(d, ip1, t2Inv) T2 ip1 8
  • 9. p-prov/r-prov pattern for port-oriented workflows hasOutputPort T1 op1 hasOutPort(t1, op1) wasAssociatedWith hasInPort(t2, ip1) T1Inv dataLink(op1, ip1) isTaskOf onOutPort isTaskOf(wf, t1) wasStartedBy isTaskOf(wf, t2) wasAssociatedWith d dataLink wf wfInv activity (wfRun) wasStartedBy activity ( t1inv ) isTaskOf onInPort activity ( t2inv ) T2Inv entity (d) wasAssociatedWith onOutPort(d, op1, t1Inv) hasInputPort onInPort(d, ip1, t2Inv) T2 ip1 Lossy mapping to plain PROV: Port removal wasGeneratedBy(D, tInv) :− onOutPort(D, _, tInv ). used(tInv, D) :− onInPort(D, _, tInv) 8
  • 10. p-prov/r-prov pattern for channel-oriented workflows T1 isSourceOf wasAssociatedWith sourceOf(t1,ch) T1Inv sinkOf(t2,ch) isTaskOf isTaskOf(t1, wf) wasReadFrom isTaskOf(t2, wf) wasStartedBy activity (wfRun) wasAssociatedWith d wf wfInv ch1 activity ( t1inv ) activity ( t2inv ) wasStartedBy wasWrittenTo entity (d) isTaskOf T2Inv wasWrittenTo(d,ch, t1Inv) wasReadFrom(d,ch, t2Inv) wasAssociatedWith isSinkOf T2 9
  • 11. p-prov/r-prov pattern for channel-oriented workflows T1 isSourceOf wasAssociatedWith sourceOf(t1,ch) T1Inv sinkOf(t2,ch) isTaskOf isTaskOf(t1, wf) wasReadFrom isTaskOf(t2, wf) wasStartedBy activity (wfRun) wasAssociatedWith d wf wfInv ch1 activity ( t1inv ) activity ( t2inv ) wasStartedBy wasWrittenTo entity (d) isTaskOf T2Inv wasWrittenTo(d,ch, t1Inv) wasReadFrom(d,ch, t2Inv) wasAssociatedWith isSinkOf T2 Lossy mapping to plain PROV: Channel removal: wasGeneratedBy(d, tInv) :− wasWrittenTo(d,ch, t1Inv ) used(tInv, D):−wasReadFrom(d,ch,t2Inv) 9
  • 12. Bundles, provenance of provenance A bundle is a named set of provenance descriptions, and is itself an entity, so allowing provenance of provenance to be expressed. pm:bundle1 draft used wasGeneratedBy draft commenting v1 comments bundle pm:bundle1 entity(ex:draftComments) entity(ex:draftV1) activity(ex:commenting) wasGeneratedBy(ex:draftComments, ex:commenting,-) used(ex:commenting, ex:draftV1, -) endBundle ... entity(pm:bundle1, [ prov:type='prov:Bundle' ]) wasGeneratedBy(pm:bundle1, -, 2013-03-20T10:30:00) wasAttributedTo(pm:bundle1, ex:Bob) 10
  • 13. Structural workflow nesting using bundles Repurposing: use bundles to associate a workflow execution with the provenance it generates entity (wfRunTrace, [ prov:type=’prov:Bundle’ ]) wasGeneratedBy(wfRunTrace,wfInv,-) This makes it possible to write hierarchical provenance of nested workflows, recursively: entity (T, [prov:type="D1:task", prov:type="D1:workflow"]) bundle wfRunTrace activity(wfRun) // run of top level wf activity(Tinv) // run of T, a sub-workflow wasAssociatedWith(Tinv, _, T) entity(TinvTrace, [ prov:type='prov:Bundle' ]) wasGeneratedBy(TinvTrace, Tinv, _) ... endbundle 11
  • 14. Answering the sample queries Q3: “match the provenance trace to the workflow structure” Two steps: - define rules that entail p-prov relations from r-prov relations, and - check that those new p-prov relations are consistent with any constraints defined on the workflow structure / infer new p-prov statements dataLink (OP, IP) :− onOutPort(D,OP,_), onInPort(D,IP,_). OP wf op1 ip1 onOutPort T1 T2 op2 ip2 D dataLink onInPort constraint violation? IP 12
  • 15. Structure inferences T1 OP wasAssociatedWith hasOutPort(T, OP) :− I1 onOutPort(D,OP,I1), onOutPort wasAssociatedWith(I1,_,T1). D onInPort hasInPort(T, IP) :− I2 onInPort(D,IP,I1), wasAssociatedWith wasAssociatedWith(I1,_,T1). T2 IP 13
  • 16. Structure inferences isTaskOf(T2, Wf) :− onOutPort(D,OP,I1), hasOutPort(T1, OP), onInPort(D,IP,I2), hasInPort(T2, IP), T1 OP wasAssociatedWith(I1,_,T1), wasAssociatedWith wasAssociatedWith(I2,_,T2), isTaskOf(T1, Wf). I1 isTaskOf onOutPort wf D isTaskOf onInPort I2 wasAssociatedWith T2 IP 14
  • 17. composed by three main parts: the structure of the experiment represents the execution of some activity instance (activity Other PROV extensions into “workflow-land” (white classes in the UML class diagram), execution of the Execute activity), while the second defines the execution experiment (dark gray classes) and environment configuration properties of one SWfMS (activity Execute workflow). Also, the (light gray classes). Furthermore, the stereotypes in UML class execution of an activity instance determines which parameters are diagram are used to represent PROV components. consumed by each instance of the activity. The agent Scientist represents a person to use computational By using PROV-Wf we are able to represent provenance data to resources to execute the experiment (composed as a workflow). be distributed and queried at runtime. In the next sub-section we Prov-Wf: Also, the agent Scientist is associated to a machine (i.e. agent Machine). The agent Machine establishes an association with a present how this provenance model is coupled to a set of components developed for capturing and querying runtime workflow (plan Workflow), which is composed by a set of provenance data. Flavio Costa, Vítor Silva, Daniel de Oliveira, Kary Ocaña, Eduardo Ogasawara, Jonas Dias, and Capturing Provenance 3.2 Components for Marta Mattoso, activities (i.e. plan WActivity). Each activity is responsible for executing a program in a machine with a specific configuration. Capturing and Querying Workflow Runtime Provenance with PROV: a Practical Approach, Procs.components to import components We have developed a series of BigProv’13,provenance The invocation of a program within a workflow (i.e. execution data from different SWfMS to PROV-Wf model. Our instance) uses a set of parameters that can be seen as a set of Genova, Italy, March 2013 Plan DataLink hasDataLink Workflow hasSubProcess hasSink hasSource Input hasInput Process Output hasOutput Figure 1 PROV-Wf data model used Parameter describedByProcess usedInput describedByParameter Artifact ProcessRun wasEnactedBy WfProv from the Wf4Ever project WorkflowEngine wasOutputFrom Entity Activity www.wf4ever-project.org wasGeneratedBy SoftwareAgent 15
  • 18. Summary and extensions • Simple extensions to PROV – designed to model p-prov – complementary to r-prov • They enable queries that cut across r-prov and p-prov • Bundle mechanism used for provenance of nested workflow components • Next step: harmonize similar extensions proposed by other groups 16 – overall goal is to achieve interoperability