ProvAbs: model, policy, and tooling for abstracting PROV graphs
1. IPAW2014–P.Missier
Paolo Missier, Jeremy Bryans, Carl Gamble
School of Computing Science, Newcastle University
Vasa Curcin, Roxana Danger
Imperial College, London
IPAW’14
Köln, June 10th, 2014
2. IPAW2014–P.Missier
Motivation: partial disclosure of provenance
Consumer:
• Motivated to acquire and act upon analysis
But: expects supporting evidence, to mitigate the risk of acting upon inaccurate information
Provider:
• Motivated to provide accurate analysis to Public Agencies
• Enhance communication using provenance metadata for evidence
But: cannot fully disclose sources, analysis methods, etc.
5. IPAW2014–P.Missier
Provenance abstraction
What:
• Abstraction model for PROV
• Policy model and language to drive the abstraction
• Implementation: the ProvAbs tool
Why:
• To enable data exchanges with partial disclosure of the data provenance
• To simplify understanding of provenance traces by humans
How:
• Graph rewriting, from valid PROV to valid PROV
• A node grouping operator
6. IPAW2014–P.Missier
Provenance views
Motivation similar to the UserViews model (*):
• Heavily focused on workflows and their provenance
• Scenario: one (or more) workflows, multiple users/viewers
• Relies on “composite modules” (sub-workflow structuring):
• Real workflow → induced workflow
Goals:
1. Construct relevant user views
2. The answer to a provenance query depends on the workflow view
In contrast, in our work:
No assumption on any process specification (formal or not) driving the views on provenance
(*) Biton, O., S. Cohen Boulakia, S. B. Davidson, and C. S. Hara. “Querying and Managing Provenance through User Views in Scientific Workflows.” In ICDE, 1072–1081, 2008. doi:10.1109/ICDE.2008.4497516.
8. IPAW2014–P.Missier
1 – Define policy to assign sensitivity to graph nodes
[Figure: example provenance graph. Node labels: consolidate-X, consolidate-Y, report-editing, report-1, report-2, report-3, analytics1, analytics2, analytics3, X-summary, Y-summary. Edge labels: use, gen. Status annotations: Secret (two nodes), confidential (three nodes).]
list classifications
[protect, restricted, confidential, secret, topSecret];
for all (activity used data)
where (data.Status > confidential in classifications)
setSensitivity(activity, 7);
for all (activity used data)
where (data.Status <= confidential in classifications)
setSensitivity(activity, 5);
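The two policy rules above can be mimicked in a few lines of Python. This is only a sketch, not the ProvAbs policy engine: the function names, the `act-*` and `doc-*` identifiers, and the case-insensitive comparison of Status values are assumptions made for illustration, as is the choice to keep the higher value when both rules match the same activity.

```python
# Ordered classification list from the policy, lowest to highest.
CLASSIFICATIONS = ["protect", "restricted", "confidential", "secret", "topSecret"]

def rank(status):
    """Position of a status in the classification list (case-insensitive)."""
    lowered = [c.lower() for c in CLASSIFICATIONS]
    return lowered.index(status.lower())

def assign_sensitivity(used_edges, status):
    """Apply the two rules: 7 if an activity used data above 'confidential', else 5."""
    threshold = rank("confidential")
    sensitivity = {}
    for activity, data in used_edges:
        level = 7 if rank(status[data]) > threshold else 5
        # Assumption: when both rules match one activity, keep the higher value.
        sensitivity[activity] = max(sensitivity.get(activity, 0), level)
    return sensitivity

# Hypothetical graph fragment; names and edges are illustrative, not read off the figure.
used = [("act-A", "doc-1"), ("act-A", "doc-2"), ("act-B", "doc-2")]
status = {"doc-1": "Secret", "doc-2": "confidential"}
sens = assign_sensitivity(used, status)
```

Here `act-A` used a Secret item and so receives sensitivity 7, while `act-B` only used confidential data and receives 5.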
9. IPAW2014–P.Missier
2- Node selection
Select nodes for abstraction based on the receiver’s clearance level
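A minimal sketch of the selection step, assuming that a node is selected for abstraction when its sensitivity is strictly greater than the receiver's clearance level. The comparison rule, the function name, and the node names `n1` to `n4` are assumptions; the values 7, 7, 7, 5 and the clearance level 6 are the ones annotated on this slide's figure.

```python
def select_for_abstraction(sensitivity, clearance):
    """Return the nodes whose sensitivity exceeds the receiver's clearance."""
    return {node for node, level in sensitivity.items() if level > clearance}

# Sensitivities and clearance as annotated on the slide; node names are placeholders.
sensitivity = {"n1": 7, "n2": 7, "n3": 7, "n4": 5}
selected = select_for_abstraction(sensitivity, clearance=6)
```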
[Figure: the provenance graph from step 1, annotated with node sensitivities 7, 7, 7, and 5 against a receiver's clearance level of 6; nodes are marked ✔ or ✗ accordingly.]
10. IPAW2014–P.Missier
3- Abstraction
Apply abstraction operator
[Figure: two panels. First: the graph from step 2 with its sensitivities and ✔/✗ marks. Second: the graph after abstraction, in which analytics1, analytics2, and analytics3 no longer appear and a new node abs is present; consolidate-X, consolidate-Y, report-editing, report-1, report-2, report-3, X-summary, and Y-summary remain.]
12. IPAW2014–P.Missier
Naïve node group replacement: introducing cycles
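[Figure: two panels. First: a graph with entities e1 to e6 and activities a1 to a4, linked by used and wgBy edges. Second: the same graph after a group of nodes is replaced by a single new entity e'; the rewired used and wgBy edges form a cycle.]
Generation-usage cycles are legal in PROV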
Note: initial focus on vanilla PROV: usage-generation/entity-activity
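To see how naive replacement can create a cycle, consider the small sketch below. It is not the ProvAbs implementation: the four-edge graph is a hypothetical chain (not the graph in the figure), edges are stored as (effect, cause) pairs, and `replace_group` and `has_cycle` are illustrative helper names.

```python
def replace_group(edges, group, new_node):
    """Naively rewire: every edge endpoint in `group` becomes `new_node`."""
    rewired = set()
    for src, dst in edges:
        s = new_node if src in group else src
        d = new_node if dst in group else dst
        if s != d:  # drop edges that end up internal to the new node
            rewired.add((s, d))
    return rewired

def has_cycle(edges):
    """Depth-first search for a directed cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)

# Edges point from effect to cause: (e, a) is "e wgBy a", (a, e) is "a used e".
edges = {("e3", "a2"), ("a2", "e2"), ("e2", "a1"), ("a1", "e1")}
naive = replace_group(edges, {"e1", "e3"}, "e'")
```

The original chain is acyclic, but grouping its two end points e1 and e3 yields e' wgBy a2, a2 used e2, e2 wgBy a1, a1 used e', which closes a loop through e'.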
13. IPAW2014–P.Missier
What’s wrong with cycles?
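New cycles introduce new constraints on the temporal ordering of events
[Figure: activity a uses e1 (usage event u1) and generates e2 (generation event g2), between its start s and end e. After grouping e1 and e2 into e', a both uses e' (event u') and generates e' (event g'), between start and end.]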
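Event mapping:
u' ← u1
g' ← g2
e' ← {e1, e2}
Resulting constraint: s(a) ≤ u' ≤ g' ≤ e(a)
PROV also requires the generation of e' to precede its usage (g' ≤ u'), so u' and g' must be simultaneous.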
14. IPAW2014–P.Missier
More generally: mapping concrete to abstract events
Abstract graph nodes should be characterised by abstract events
• Generation is the completion of production of a new entity (PROV-DM Sec. 5.1.3)
• Usage is the beginning of utilizing an entity (PROV-DM Sec. 5.1.4)
g' = max { g1, g2 }
u' = min { u3, u4 }
[Figure: activity a generates e1 and e2 (events g1, g2) and uses e3 and e4 (events u3, u4), between its start s and end e. After grouping, a generates and uses e' (events g', u').]
15. IPAW2014–P.Missier
Usage-follows-generation
Abstract graphs with abstract usage-generation events correspond to a
specific class of base graphs with pattern:
<all generations> -- <all usages>
[Figure: activity a generates e1 and e2 (events g1, g2) and uses e3 and e4 (events u3, u4); on the timeline from s to e, a generation phase is followed by a usage phase.]
Given a grouping set of entities {e1…en} such that, for each ei, ei wasGeneratedBy a or a used ei:
all generation events for all ei must precede all usage events for all ei.
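The abstract events g' = max { g1, g2 } and u' = min { u3, u4 } and the usage-follows-generation condition can be checked with a short sketch. The timestamps below are made up for illustration, and the function names are hypothetical; the sketch treats "precede" as less-than-or-equal.

```python
def abstract_events(generation_times, usage_times):
    """g' is the latest generation event; u' is the earliest usage event."""
    return max(generation_times.values()), min(usage_times.values())

def usage_follows_generation(generation_times, usage_times):
    """True when every generation event precedes (or coincides with) every usage event."""
    g_abs, u_abs = abstract_events(generation_times, usage_times)
    return g_abs <= u_abs

# Hypothetical timestamps for g1, g2 (generation of e1, e2) and u3, u4 (usage of e3, e4).
generations = {"g1": 2, "g2": 4}
usages = {"u3": 5, "u4": 7}
g_abs, u_abs = abstract_events(generations, usages)
```

Comparing the latest generation with the earliest usage is enough: if max of the generations is no later than min of the usages, then every generation precedes every usage.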
16. IPAW2014–P.Missier
Naïve node group replacement -2: Type violations
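[Figure: two panels. First: a graph with entities e1, e2, e4, e5 and activities a1 to a4, linked by used and wgBy edges. Second: a group of nodes replaced by a single node e'; one of the rewired edges has no valid relation type (marked "??").]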
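A type violation of this kind can be detected mechanically. The sketch below assumes only the two relations in scope here, with used linking an activity to an entity and wgBy linking an entity to an activity, as in PROV. The graph fragment and the choice of grouping a1 and e4 into an entity e' are hypothetical and are not read off the figure.

```python
# Required (source type, target type) for each relation, as in PROV.
RELATION_TYPES = {"used": ("activity", "entity"), "wgBy": ("entity", "activity")}

def type_violations(edges, node_type):
    """Return the edges whose endpoint types do not match their relation."""
    return [
        (src, rel, dst)
        for src, rel, dst in edges
        if (node_type[src], node_type[dst]) != RELATION_TYPES[rel]
    ]

# Hypothetical fragment: a1 and e4 have been grouped into a new entity e'.
node_type = {"e1": "entity", "e'": "entity", "a3": "activity"}
edges = [
    ("a3", "used", "e'"),  # was "a3 used e4": still well typed
    ("e'", "used", "e1"),  # was "a1 used e1": now entity-to-entity
]
bad = type_violations(edges, node_type)
```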
17. IPAW2014–P.Missier
Criteria for abstraction
1. No new generation-usage cycles
2. No new dependencies (but: OK to remove some dependencies)
3. Satisfy type constraints on relationships
Convexity by closure
Extension
Replacement, rewiring
18. IPAW2014–P.Missier
Convexity by path closure
[Figure: panels (a) and (b) show a graph with entities e1 to e6 and activities a1 to a5, linked by used and wgBy edges, with a close step relating the two panels.]
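One way to sketch path closure, assuming that a grouping set is convex when every directed path between two of its members stays inside the set: add each node that is both reachable from a member and able to reach a member. The function names and the example chain are illustrative assumptions, not the tool's code or the graph in the figure.

```python
def reachable(graph, sources):
    """All nodes reachable from `sources` by following directed edges."""
    seen, stack = set(), list(sources)
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def path_closure(edges, group):
    """Add every node lying on a directed path between two members of `group`."""
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)
    descendants = reachable(forward, group)
    ancestors = reachable(backward, group)
    return set(group) | (descendants & ancestors)

# Edges point from effect to cause: (e, a) is "e wgBy a", (a, e) is "a used e".
edges = {("e3", "a2"), ("a2", "e2"), ("e2", "a1"), ("a1", "e1")}
closed = path_closure(edges, {"e1", "e3"})
```

For the chain above, closing {e1, e3} pulls in a2, e2, and a1, so no path between group members passes through a node outside the group and replacing the closed set cannot create a new cycle through those nodes.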
21. IPAW2014–P.Missier
t-grouping
Nodes in the grouping set can be a mix of Entities and Activities
• When all boundary nodes are of the same type:
grouping creates a node of that type
• e-grouping: new Entity node
• a-grouping: new Activity node
• Boundary nodes of mixed types:
grouping can introduce a node of either type
t-grouping: creates new node of type t ∈ { En, Act }
Note:
Grouping is commutative and closed with respect to composition
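The choice of type t can be sketched as follows, assuming that a boundary node is a member of the grouping set with at least one edge to or from a node outside the set. That definition, the function names, and the example graph are assumptions for illustration.

```python
def boundary_types(edges, group, node_type):
    """Types of the group members that have an edge to or from a node outside the group."""
    types = set()
    for src, dst in edges:
        if (src in group) != (dst in group):  # edge crosses the group boundary
            inside = src if src in group else dst
            types.add(node_type[inside])
    return types

def allowed_group_types(edges, group, node_type):
    """One type when the boundary is homogeneous; either type otherwise."""
    types = boundary_types(edges, group, node_type)
    return types if len(types) == 1 else {"entity", "activity"}

node_type = {"e1": "entity", "e2": "entity", "e3": "entity",
             "a1": "activity", "a2": "activity"}
edges = {("e3", "a2"), ("a2", "e2"), ("e2", "a1"), ("a1", "e1")}
homogeneous = allowed_group_types(edges, {"a2", "e2", "a1"}, node_type)
mixed = allowed_group_types(edges, {"e2", "a1"}, node_type)
```

In the first call both boundary nodes (a2 and a1) are activities, so only an a-grouping applies; in the second the boundary mixes an entity and an activity, so either type may be chosen.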
23. IPAW2014–P.Missier
The ProvAbs tool
• A tool to let a policy designer explore partial disclosure options by experimenting with policy settings and clearance thresholds
• Accepts graphs in PROV-N format
• Policy specified interactively, or loaded from file
Demo available!
24. IPAW2014–P.Missier
Summary
A model for abstracting PROV graphs by (recursively) replacing sets of nodes with new nodes
• Map valid PROV to valid PROV – ref.: PROV-CONSTRAINTS
• No false dependencies introduced
Abstract nodes are characterised by abstract events
Extended to Agents (see TechReport)
Need to extend to more PROV relationship types
See also:
Missier, P., Gamble, C., Bryans, J.: Provenance graph abstraction by node grouping.
Technical report, Newcastle University (2013)
http://www.ncl.ac.uk/computing/research/publication/194432
Editor's notes
Reference scenario: intelligence analysis, two parties
Incident Room specialists issue analyses to law enforcement agencies - Agencies may act upon such analyses
[describe incentives/hurdles for each actor here]
Ref. to Zoom
Complete running example trace
Note that one cannot simply replace association with attribution, i.e., replace relation waw(a, ag) with wat(eN, ag), because there is no guarantee that any of the entities represented by the new eN had been attributed to ag in the original graph.