provenance of lists - TAPP'11 Mini-tutorial
1. Provenance of workflow data products
Paolo Missier
School of Computing,
Newcastle University, UK
TAPP’11 workshop
Heraklion, Crete
June 20-21, 2011
2. Workflow provenance
Taverna type system:
- strings + nested lists
- “cat”, [“cat”, “dog”], [ [“cat”, “dog”], [“large”, “small”] ]
Dataflow model:
- data-driven execution
- services activate when input is ready
Raw provenance:
A detailed trace of workflow execution
- tasks performed, data transformations
- inputs used, outputs produced
Linköping, Sweden -- January 2010
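The type system above (strings plus nested lists) can be illustrated with a small depth function. This is a minimal sketch, not Taverna code: the function name `depth`, the treatment of an empty list as depth 1, and the check that all elements of a list share the same depth are assumptions made for the illustration.

```python
def depth(value):
    """Nesting depth of a value: 0 for a string, 1 + element depth for a list."""
    if isinstance(value, str):
        return 0
    if not value:
        return 1  # assumption: an empty list counts as depth 1
    depths = {depth(v) for v in value}
    if len(depths) != 1:
        # assumption: lists are homogeneous in depth
        raise ValueError("all elements of a list must have the same depth")
    return 1 + depths.pop()


examples = ["cat", ["cat", "dog"], [["cat", "dog"], ["large", "small"]]]
depths = [depth(v) for v in examples]
```

The three example values are the ones shown on the slide, with depths 0, 1 and 2.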
4. Workflow provenance
[Figure: example Taverna workflow. Labels visible in the diagram: lister, get pathways by genes1, merge pathways, gene_id, concat gene pathway ids, output, pathway_genes]
9. Implicit iteration in Taverna
Processor P has input ports X1, X2, X3 and output port Y.
Inputs: a = [a1 ... an] on X1, c = [c1 ... ck] on X2, b = [b1 ... bm] on X3
Port annotations (X1, X2, X3): (0,1) (1,1) (0,1)
Output on Y:
y = [ [y11 ... y1m],
...
[yn1 ... ynm] ]
Bottom line: yij depends only on values ai, c, bj
How y is computed at P:
let I = a ⊗ b = [ [ <ai, bj> | bj ∈ b ] | ai ∈ a ] // cross product
I’ = [ [ <ai, c, bj> | bj ∈ b ] | ai ∈ a ] // same product but with c interleaved
y = (map (map P) I’) = [ (map P [ <a1, c, b1> ... <a1, c, bm> ]), ...,
(map P [ <an, c, b1> ... <an, c, bm> ]) ] =
[ [y11 ... y1m], ..., [yn1 ... ynm] ]
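The computation at P can be sketched in a few lines. This is an illustration under stated assumptions, not Taverna's implementation: the function name `implicit_iteration` and the toy processor `P` are hypothetical, and tuples stand in for the <ai, c, bj> triples.

```python
def implicit_iteration(P, a, c, b):
    """Apply P to every <ai, c, bj>: a and b are iterated over, c is passed whole."""
    # I' = [ [ <ai, c, bj> | bj in b ] | ai in a ]
    i_prime = [[(ai, c, bj) for bj in b] for ai in a]
    # y = map (map P) I'
    return [[P(*triple) for triple in row] for row in i_prime]


def P(x1, x2, x3):
    # Hypothetical processor: combines one element of a, the whole list c,
    # and one element of b into a string.
    return f"{x1}-{len(x2)}-{x3}"


a = ["a1", "a2"]
c = ["c1", "c2", "c3"]
b = ["b1", "b2", "b3"]
y = implicit_iteration(P, a, c, b)
```

With n = 2 and m = 3 the result has two rows of three values, and each y[i][j] is computed from a[i], c and b[j] only.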
11. Which provenance model/language?
• Let’s look at the Open Provenance Model as a starting point
Core OPM
• agnostic wrt Artifact, Processor types
• roles: annotations on binary relations
• extensions by subclassing
• node types,
• relation types
Formal (temporal) semantics hopefully available soon
Excerpt from the Open Provenance Model specification shown on the slide:
The Open Provenance Model aims to capture the causal dependencies between the artifacts, processes, and agents. Therefore, a provenance graph is defined as a directed graph, whose nodes are artifacts, processes and agents, and whose edges belong to one of the following categories depicted in Figure 1. An edge represents a causal dependency, between its source, denoting the effect, and its destination, denoting the cause.
Figure 1: Edges in the Open Provenance Model: sources are effects, destinations causes
The first two edges express that a process used an artifact and that an artifact was generated by a process. Since a process may have used several artifacts, it is important to identify the roles under which these artifacts were used. (Roles are denoted by letter ‘R’ in Figure 1.) Likewise, ...
... make this dependency explicit, it is required to assert that artifact A2 was derived from another artifact A1. This edge gives us a dataflow oriented view of provenance.
It is also recognized that we may not be aware of the exact artifact that process P2 used, but that there was some artifact generated by another process P1. Process P2 is then said to have been triggered by P1. In contrast to edge was derived from, a was triggered by edge allows for a process oriented view of past executions to be adopted. (Since these edges summarize some activities for which all details are not being exposed, it was felt that it was not necessary to associate a role with them.)
As far as conventions are concerned, we note that causality edges use past tense to indicate that they refer to past execution. Causal relationships are defined as follows.
Definition 4 (Causal Relationship). A causal relationship is represented by an arc and denotes the presence of a causal dependency between the source of the arc (the effect) and the destination of the arc (the cause).
Five causal relationships are recognized: a process used an artifact, an artifact was generated by a process, a process was triggered by a process, an artifact was derived from an artifact, and a process was controlled by an agent. By means of annotations (see Section 8), we allow edges to be further subtyped from these five categories.
Multiple notions of causal dependencies were considered for OPM. A very strong notion of causal dependency would express that a set of entities was necessary and sufficient to explain the existence of another entity. It was felt that such a notion was not practical, since, with an open world assumption, one could always argue that additional factors may have influenced an outcome (e.g. electricity was used, temperature range allowed computer to work, ...
12. OPM relations in the workflow context
Single User’s View (Alice)
[Figure: three layers linked by two steps]
- Process space: Workflow Specification (WA): X -in-> A -out-> Y -in-> B -out-> Z
- Data Binding & Enactment
- Data space: Data Storage and Binding (SA): X / [x1, x2], Z / [z1, z2]
- Execution & Trace Capture
- Provenance space: Provenance Trace (TA) with nodes x1, a1, y1, b1, z1 and x2, a2, y2, b2, z2, connected by read and write edges, plus idep and ddep edges
- Provenance Queries: ans(X) :- ddep*(X, z1).
Missier, P., Ludascher, B., Bowers, S., Anand, M. K., Altintas, I., Dey, S., et al. (2010). Linking Multiple Workflow Provenance Traces for Interoperable Collaborative Science. Proc. 5th Workshop on Workflows in Support of Large-Scale Science (WORKS).
13. Data and invocation dependencies
- read, write are natural observables for a workflow run
- possible additional relations (recorded or inferred):
• invocation dependencies:
Explicit or via:
a2 depends on a1 because a1 has written data d and a2 has read d
• data dependencies:
Explicit or via:
d2 depends on d1 because some actor invocation a read d1 prior to writing d2
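The two inferred relations can be derived from the read and write observables as follows. This is a minimal sketch: the sample trace, the function names and the tuple layout are hypothetical, the argument order (source first, dependent second) is chosen to match the query ans(X) :- ddep*(X, z1), and the "prior to writing" condition is not checked because each invocation is assumed to read all its inputs before it writes.

```python
# Hypothetical observables for two runs of a two-step workflow.
reads = {("a1", "x1"), ("b1", "y1"), ("a2", "x2"), ("b2", "y2")}   # (invocation, data read)
writes = {("a1", "y1"), ("b1", "z1"), ("a2", "y2"), ("b2", "z2")}  # (invocation, data written)


def invocation_deps(reads, writes):
    """idep(a1, a2): a2 depends on a1 because a1 wrote some d that a2 read."""
    return {(a1, a2) for (a1, d) in writes for (a2, d2) in reads if d == d2}


def data_deps(reads, writes):
    """ddep(d1, d2): d2 depends on d1 because one invocation read d1 and wrote d2."""
    return {(d1, d2) for (a, d1) in reads for (a2, d2) in writes if a == a2}


idep = invocation_deps(reads, writes)
ddep = data_deps(reads, writes)
```

Both relations are plain joins over the observables on the shared data item or the shared invocation.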
14. Provenance queries
• Closure queries:
• operate on the transitive closure ddep* over ddep
But also:
- queries on the workflow structure
- queries on the data structures (e.g. collections)
and importantly:
use workflow graphs to justify/explain the provenance graph for one workflow run:
TA is a trace instance of WA:
h: TA ➔ WA homomorphism
h(x1 ➔ a1) = h(x2 ➔ a2) = X➔A,
h(a1 ➔ y1) = h(a2 ➔ y2) = A➔Y
...
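A closure query and the homomorphism condition can both be checked with a few lines of code. This is a sketch under assumptions: the sample ddep edges and the mapping h are hypothetical, ddep(d1, d2) is read as "d2 depends on d1", and the closure computed here is transitive but not reflexive.

```python
def transitive_closure(edges):
    """Smallest transitive relation containing `edges` (no reflexive pairs added)."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def is_homomorphism(h, trace_edges, workflow_edges):
    """True if h maps every trace edge onto an edge of the workflow graph."""
    return all((h[u], h[v]) in workflow_edges for (u, v) in trace_edges)


# ans(X) :- ddep*(X, z1).
ddep = {("x1", "y1"), ("y1", "z1"), ("x2", "y2"), ("y2", "z2")}
ans = sorted(x for (x, z) in transitive_closure(ddep) if z == "z1")

# h: TA -> WA for the edges listed on the slide.
trace_edges = {("x1", "a1"), ("x2", "a2"), ("a1", "y1"), ("a2", "y2")}
workflow_edges = {("X", "A"), ("A", "Y")}
h = {"x1": "X", "x2": "X", "a1": "A", "a2": "A", "y1": "Y", "y2": "Y"}
valid = is_homomorphism(h, trace_edges, workflow_edges)
```

A trace edge with no counterpart in the workflow graph, such as y1 ➔ a1, makes the check fail, which is how the workflow graph justifies or rejects a provenance graph.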
18. OPM extensions, principled
core OPM / PIL(*)
- used
- wasGeneratedBy
- wasDerivedFrom
Extension dimensions:
- Data types: Nested Ordered Lists; Relations? (sets of tuples); Graph models?
- Processor types: Operators on lists (create, insert, delete, select..., map, fold); Relational queries?; Graph matching queries?
- (Additional context): who, when, where, ...
(*) PIL = Provenance Interchange Language, W3C Provenance Working Group
19. Proposed extensions
Table 2: New roles for Used and Contained relations
Role | Used in relation | Context of use
element | Contained | L Contained(element) x
list | Used | P Used(list) L
position | Used | P Used(position) p
term, generator, filter | Used | List comprehensions, see Sec. 4.2
function, operand | Used | map, see Sec. 4.3
Table 3: Specialised OPM dependency relations
Causal relation | Example
Contained(R) ⊆ [τ] × A × [Int] | L Contained([i1 ... in]) x: x was inserted into L at position [i1 ... in]
wasSelectedFrom(R) ⊆ [τ] × [τ] × [Int] | L' wasSelectedFrom([i1 ... in]) L: L' at position [i1 ... in] was selected from L
wasRemovedFrom(R) ⊆ [τ] × [τ] × [Int] | L' wasRemovedFrom([i1 ... in]) L: L' at position [i1 ... in] was deleted from L
wasSameAs ⊆ A × A | L wasSameAs x: inferred (various contexts)
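The position argument [i1 ... in] in these relations can be read as an index path into a nested list. The sketch below illustrates that reading only; the function names are hypothetical, indices are 0-based here, and listing Contained facts for string leaves only is a simplifying assumption.

```python
def select_at(lst, position):
    """Follow the index path `position` into the nested list `lst`."""
    value = lst
    for i in position:
        value = value[i]
    return value


def contained_facts(lst, prefix=()):
    """List (position, element) pairs, one for each string leaf of `lst`."""
    facts = []
    for i, item in enumerate(lst):
        path = prefix + (i,)
        if isinstance(item, list):
            facts.extend(contained_facts(item, path))
        else:
            facts.append((path, item))
    return facts


L = [["cat", "dog"], ["large", "small"]]
selected = select_at(L, [1, 0])
facts = contained_facts(L)
```

Each pair in `facts` corresponds to one statement of the form L Contained([i1 ... in]) x.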
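20. OPM fragments for elementary operations
[Figure: OPM graph templates for the elementary list operations: empty list, unit, insertion, selection, deletion. Recoverable edges:]
- unit: L wasGeneratedBy P:unit; P:unit used(element) x
- insertion: L' wasGeneratedBy P:ins; P:ins used(list) L, used(element) x, used(position) p; L' wasDerivedFrom L
- selection: L' wasGeneratedBy P:sel; P:sel used(list) L, used(position) p; L' wasSelectedFrom L
- deletion: L' wasGeneratedBy P:del; P:del used(list) L, used(position) p; L' wasRemovedFrom L
- contained edges link the resulting list to the element x in the unit and insertion templates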
22. Operator composition → graph composition
• Equalities on operators may translate into inferences on the graphs
Excerpt shown on the slide:
... templates for the two operations. Additionally, however, the following equality holds whenever insertion is followed by selection, if no other intervening operation changes the state of the list:
sel (ins x L p) p = x
This translates into the following OPM inference rule:
sameAs(L1,X) :-
pType(P1, ins), used(P1, X, element), used(P1, Pos, position), wgby(L,P1),
pType(P2, sel), used(P2, L, list), used(P2, Pos, position), wgby(L1,P2).
[Figure: the insertion and selection templates composed. P:ins used(list) L, used(element) x and used(position) p, and generated L', which wasDerivedFrom L and contained x. P:sel used(list) L' and used(position) p; its output wasSelectedFrom L' and is inferred to be wasSameAs x.]
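The equality and the inference rule can be tried out on a handful of facts. This is a sketch under assumptions: `ins`, `sel` and `infer_same_as` are hypothetical Python stand-ins, positions are 0-based, the facts are encoded as sets of tuples, and the head of the rule is taken to relate the selected value L1 to the inserted element X, as the equality sel (ins x L p) p = x requires. The side condition that no other operation intervenes is not checked.

```python
def ins(x, lst, p):
    """Return a new list with x inserted at position p."""
    return lst[:p] + [x] + lst[p:]


def sel(lst, p):
    """Return the element of lst at position p."""
    return lst[p]


def infer_same_as(ptype, used, wgby):
    """ptype: {process: type}; used: {(process, artifact, role)}; wgby: {(artifact, process)}."""
    def args(proc, role):
        return {a for (p, a, r) in used if p == proc and r == role}

    def outputs(proc):
        return {a for (a, p) in wgby if p == proc}

    inferred = set()
    for p1 in (p for p, t in ptype.items() if t == "ins"):
        for p2 in (p for p, t in ptype.items() if t == "sel"):
            same_list = outputs(p1) & args(p2, "list")                # sel reads the list ins wrote
            same_pos = args(p1, "position") & args(p2, "position")   # same position on both sides
            if same_list and same_pos:
                inferred |= {(l1, x) for l1 in outputs(p2) for x in args(p1, "element")}
    return inferred


ptype = {"P1": "ins", "P2": "sel"}
used = {("P1", "L0", "list"), ("P1", "x", "element"), ("P1", "p", "position"),
        ("P2", "L", "list"), ("P2", "p", "position")}
wgby = {("L", "P1"), ("L1", "P2")}
same_as = infer_same_as(ptype, used, wgby)
```

If the two processes use different positions, the rule does not fire and no sameAs fact is produced.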
23. Approach applies to map, fold (reduce)...
Excerpt shown on the slide:
... and p a position in the list L' = map f L. The following equality follows from the definition of map:
sel (map f x) p = f (sel x p) (10)
This equality is captured by the inferred dependency (y wasSameAs y') in the enhanced OPM template of Fig. 7 (inference rule omitted for simplicity).
[Figure: OPM template for map followed by selection. L' wasGeneratedBy P:map, which used(list) L and used(function) f; y wasGeneratedBy Q2:sel, which used(list) L' and used(position) p; y wasSelectedFrom L'.]
[Figure: enhanced template. In addition, x wasGeneratedBy Q1:sel, which used(list) L and used(position) p; y' wasGeneratedBy R:apply, which used(operand) x and used(function) f; inferred: y wasSameAs y'.]
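Equality (10) is easy to check on concrete values. The sketch below is an illustration only: `sel`, `list_map` and `check_map_sel` are hypothetical names, positions are 0-based, and f is assumed to be a pure function.

```python
def sel(lst, p):
    """Return the element of lst at position p."""
    return lst[p]


def list_map(f, lst):
    """Apply f to every element of lst."""
    return [f(v) for v in lst]


def check_map_sel(f, lst):
    """Check sel (map f lst) p == f (sel lst p) for every position p of lst."""
    return all(sel(list_map(f, lst), p) == f(sel(lst, p)) for p in range(len(lst)))


holds = check_map_sel(str.upper, ["cat", "dog", "bird"])
```

The two sides correspond to the two paths in the enhanced template: map then select (y), and select then apply (y').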
24. Take-home message
• OPM (PIL) is a candidate starting point for workflow-based provenance
• extension mechanisms are provided, but they must be used sensibly
– data types
– processor types
• Provenance of nested ordered lists used as a prototypical example
– semantics of provenance graphs and graph composition grounded in the semantics of lists
• Can this approach be useful for other interesting data types?
– sets of tuples / relational algebra