IPAW2014–P.Missier
ProvAbs: model, policy, and tooling for
abstracting PROV graphs
Paolo Missier, Jeremy Bryans, Carl Gamble
School of Computing Science, Newcastle University
Vasa Curcin, Roxana Danger
Imperial College, London
IPAW’14
Köln, June 10th, 2014
Motivation: partial disclosure of provenance
Consumer:
• Motivated to acquire and act upon analysis
But: expects supporting evidence, to mitigate the risk of acting upon inaccurate information
Provider:
• Motivated to provide accurate analysis to Public Agencies
• Enhance communication using provenance metadata for evidence
But: cannot fully disclose sources, analysis methods, etc.
Provenance-enabled data exchanges
Provenance exchange as part of data exchange
Provenance abstraction
What:
• Abstraction model for PROV
• Policy model and language to drive the abstraction
• Implementation: the ProvAbs tool
Why:
• To enable data exchanges with partial disclosure of the data
provenance
• To simplify understanding of provenance traces by humans
How:
• Graph rewriting, from valid PROV to valid PROV
• A node grouping operator
Provenance views
Motivation similar to the UserViews model (*)
• Heavily focused on workflows and their provenance
• Scenario: one (or more) workflows, multiple users/viewers
• Relies on “composite modules” (sub-workflow structuring):
• Real workflow → induced workflow
Goals:
1. construct relevant user views
2. answer to a provenance query depends on the workflow view
In contrast, in our work:
No assumption on any process specification (formal or not) driving the views on provenance
(*) Biton, O, S Cohen Boulakia, S B Davidson, and C S Hara. “Querying and Managing
Provenance through User Views in Scientific Workflows.” In ICDE, 1072–1081, 2008.
doi:http://dx.doi.org/10.1109/ICDE.2008.4497516.
History of an analyst’s report
Document produced by the
“incident room analysts”
1 – Define policy to assign sensitivity to graph nodes
[Figure: provenance graph of the analyst's report. Nodes: consolidate-X, consolidate-Y, report-editing, report-1, report-2, report-3, analytics1, analytics2, analytics3, X-summary, Y-summary. Edges are labelled use and gen; two nodes are annotated Status: Secret and three Status: confidential.]
list classifications [protect, restricted, confidential, secret, topSecret];
for all (activity used data)
    where (data.Status > confidential in classifications)
    setSensitivity(activity, 7);
for all (activity used data)
    where (data.Status <= confidential in classifications)
    setSensitivity(activity, 5);
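The policy above can be read as a rule over "activity used data" edges. The following is a minimal Python sketch of that reading, not the ProvAbs policy engine: the function names, the tie-breaking rule (keep the highest value when several rules match) and the sample edges are all assumptions made for illustration.

```python
# Ordered classification levels, lowest to highest, as listed in the policy.
CLASSIFICATIONS = ["protect", "restricted", "confidential", "secret", "topSecret"]


def rank(status):
    """Position of a status label in the ordered list (case-insensitive)."""
    return [c.lower() for c in CLASSIFICATIONS].index(status.lower())


def assign_sensitivity(used, status):
    """Return {activity: sensitivity} for every activity that used some data.

    used   -- iterable of (activity, entity) pairs, read "activity used entity"
    status -- {entity: classification label}
    """
    threshold = rank("confidential")
    sensitivity = {}
    for activity, entity in used:
        level = 7 if rank(status[entity]) > threshold else 5
        # Assumption: when several rules match an activity, keep the highest value.
        sensitivity[activity] = max(sensitivity.get(activity, 0), level)
    return sensitivity


# Hypothetical fragment; these edges are illustrative, not read off the slide.
used = [
    ("analytics1", "report-1"),
    ("analytics2", "report-2"),
    ("report-editing", "X-summary"),
]
status = {"report-1": "Secret", "report-2": "Secret", "X-summary": "confidential"}
sensitivity = assign_sensitivity(used, status)
```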
2- Node selection
Select nodes for abstraction based on the receiver’s clearance level
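As a rough sketch of the selection step, assuming a node is selected when its sensitivity is strictly greater than the receiver's clearance level (the strict comparison and the node carrying value 5 are assumptions, not stated on the slide):

```python
def select_for_abstraction(sensitivity, clearance):
    """Nodes whose sensitivity exceeds the receiver's clearance level."""
    return {node for node, level in sensitivity.items() if level > clearance}


# Values 7, 7, 7 and 5 appear on the slide; attaching 5 to report-editing
# is a hypothetical choice made only for this example.
sensitivity = {
    "analytics1": 7,
    "analytics2": 7,
    "analytics3": 7,
    "report-editing": 5,
}
to_abstract = select_for_abstraction(sensitivity, clearance=6)
```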
[Figure: the report graph annotated with sensitivity values (7, 7, 7 and 5) and the receiver's clearance level: 6. One node is marked ✔ and four are marked ✗.]
3- Abstraction
Apply abstraction operator
[Figure: the report graph before and after applying the abstraction operator. In the abstracted graph, analytics1, analytics2 and analytics3 are replaced by a single node abs; consolidate-X, consolidate-Y, report-editing, report-1, report-2, report-3, X-summary and Y-summary are retained.]
Abstracting over sets of nodes
General abstraction idea: replace a group of (possibly non-contiguous) nodes with a new node
Naïve node group replacement: introducing cycles
[Figure: a graph over nodes e1–e6 and a1–a4 with used and wgBy (wasGeneratedBy) edges, before and after naively replacing a group of entities with a new node e'. The replacement introduces a cycle.]
Generation-usage cycles
are legal in PROV
Note: initial focus on vanilla PROV: usage-generation/entity-activity
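To see how a naive replacement can create a cycle, here is a small Python sketch on a hypothetical four-edge chain (not the graph from the slide). Edges point from effect to cause, so `("a1", "e1")` reads "a1 used e1" and `("e2", "a1")` reads "e2 wasGeneratedBy a1"; the function names are illustrative.

```python
def replace_group(edges, group, new_node):
    """Naively collapse `group` into `new_node`, dropping edges internal to the group."""
    rewired = set()
    for src, dst in edges:
        s = new_node if src in group else src
        d = new_node if dst in group else dst
        if s != d:
            rewired.add((s, d))
    return rewired


def has_cycle(edges):
    """Depth-first search for a directed cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in graph}

    def visit(n):
        colour[n] = GREY
        for m in graph[n]:
            # A GREY neighbour is on the current DFS path: back edge, hence a cycle.
            if colour[m] == GREY or (colour[m] == WHITE and visit(m)):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


# Hypothetical chain: a1 used e1, e2 wgBy a1, a2 used e2, e3 wgBy a2.
edges = {("a1", "e1"), ("e2", "a1"), ("a2", "e2"), ("e3", "a2")}
# Grouping the two non-contiguous entities e1 and e3 closes the chain into a loop.
naive = replace_group(edges, {"e1", "e3"}, "e_new")
```

The original chain is acyclic, while the rewritten graph contains the loop e_new → a2 → e2 → a1 → e_new.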
What’s wrong with cycles?
New cycles introduce new constraints
on the temporal ordering of events
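[Figure: activity a uses e1 (usage event u1) and generates e2 (generation event g2); after grouping e1 and e2 into e', a both uses e' (event u') and generates e' (event g'). Timelines show the events between the start s and the end e of a.]
s(a) ≤ u' ≤ g' ≤ e(a)
u' ← u1
g' ← g2
e' ← {e1, e2}
Since the generation of e' must also precede its usage (g' ≤ u'), u' and g' must be simultaneous.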
More generally: mapping concrete to abstract events
Abstract graph nodes should be characterised by abstract events
• Generation is the completion of production of a new entity (PROV-DM Sec. 5.1.3)
• Usage is the beginning of utilizing an entity (PROV-DM Sec. 5.1.4).
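g' = max { g1, g2 }
u' = min { u3, u4 }
[Figure: activity a with entities e1, e2 (generation events g1, g2) and e3, e4 (usage events u3, u4); after grouping, a is related to the single entity e' through the abstract events g' and u'. Timelines show the concrete and abstract events between the start s and the end e of a.]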
Usage-follows-generation
Abstract graphs with abstract usage-generation events correspond to a
specific class of base graphs with pattern:
<all generations> -- <all usages>
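[Figure: activity a with entities e1, e2 (generation events g1, g2) and e3, e4 (usage events u3, u4); on the timeline of a, between start s and end e, a generation phase precedes a usage phase.]
Given a grouping set of entities {e1…en} such that, for each ei, ei wasGeneratedBy a or a used ei:
all generation events for all ei must precede all usage events for all ei.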
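A minimal Python sketch of this condition, combined with the abstract events g' = max of the generation events and u' = min of the usage events. The timestamps are hypothetical, and reading "precede" as a non-strict ordering (≤) is an assumption.

```python
def abstract_events(generation_times, usage_times):
    """Abstract events: g' is the latest generation, u' is the earliest usage."""
    return max(generation_times), min(usage_times)


def usage_follows_generation(generation_times, usage_times):
    """True when every generation event precedes every usage event."""
    g_abs, u_abs = abstract_events(generation_times, usage_times)
    return g_abs <= u_abs  # assumption: "precedes" is read as non-strict


# Hypothetical timestamps for (g1, g2) and (u3, u4).
ok_generations, ok_usages = [3, 5], [8, 6]
bad_generations, bad_usages = [3, 7], [8, 6]

g_abs, u_abs = abstract_events(ok_generations, ok_usages)
```

In the second case one entity is generated (time 7) after another has already been used (time 6), so the base graph does not fit the "all generations, then all usages" pattern.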
Naïve node group replacement -2: Type violations
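[Figure: a graph over entities e1, e2, e4, e5 and activities a1–a4 with used and wgBy edges, before and after naively replacing a group of nodes with a new node e'. In the result, one edge (marked ??) has no valid relationship type.]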
Criteria for abstraction
1. No new generation-usage cycles: convexity by closure
2. No new dependencies (but: ok to remove some dependencies): replacement, rewiring
3. Satisfy type constraints on relationships: extension
Convexity by path closure
[Figure: (a) the graph over e1–e6 and a1–a5 with used and wgBy edges and a group of nodes selected; (b) the same graph after the close step, with the group made convex by path closure.]
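One way to sketch path closure in Python, assuming a set of nodes is convex when every directed path between two of its members stays inside the set. Under that assumption the closure adds every node that is both reachable from the group and able to reach the group. The example graph and function names are hypothetical.

```python
def reachable(start, adjacency):
    """All nodes reachable from the nodes in `start` (inclusive)."""
    seen, stack = set(start), list(start)
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def path_closure(edges, group):
    """Smallest superset of `group` containing every node on a path between members."""
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)
    # A node lies on a path between two group members exactly when it is
    # reachable from the group and the group is reachable from it.
    return reachable(group, forward) & reachable(group, backward)


# Hypothetical graph: a1 used e1, e2 wgBy a1, a2 used e2, e3 wgBy a2, a3 used e3.
edges = {("a1", "e1"), ("e2", "a1"), ("a2", "e2"), ("e3", "a2"), ("a3", "e3")}
closed = path_closure(edges, {"e1", "e3"})
```

Here a1, e2 and a2 lie on the path from e3 to e1 and are pulled into the group, while a3 only points into the group and stays outside.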
Replacement, rewiring
[Figure: (c) the replace step. The closed group is replaced by a new node e' and the remaining edges are rewired; nodes e2, e6, a2, a4 and a5 remain.]
Extension – restore type correctness
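[Figure: the graph before and after the extend step. Before: nodes e2, e6, a2, a4, a5 and e'. After: nodes a2, a4, a5 and a new node e'', with used edges.]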
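The extension step is only shown pictorially here. The sketch below is one plausible reading, stated as an assumption rather than the authors' definition: grow the group until every boundary node (a member with a neighbour outside the group) has the type of the node that will replace it. The function name, type labels and example graph are hypothetical.

```python
def extend(edges, group, node_type, target_type):
    """Grow `group` until every boundary node has `target_type`.

    Assumed rule: if a member of the wrong type has a neighbour outside
    the group, that neighbour is absorbed into the group.
    """
    group = set(group)
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            for inside, outside in ((src, dst), (dst, src)):
                if (
                    inside in group
                    and outside not in group
                    and node_type[inside] != target_type
                ):
                    group.add(outside)
                    changed = True
    return group


# Hypothetical graph: a1 used e1, e2 wgBy a1, a2 used e2.
edges = {("a1", "e1"), ("e2", "a1"), ("a2", "e2")}
node_type = {"e1": "entity", "e2": "entity", "a1": "activity", "a2": "activity"}

# Replacing {e1, a1} by an entity would leave an entity-to-entity edge to e2,
# so e2 is absorbed; the remaining boundary (e2 to a2) is entity-to-activity.
extended = extend(edges, {"e1", "a1"}, node_type, target_type="entity")
```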
t-grouping
Nodes in the grouping set can be a mix of Entities or Activities
• When all boundary nodes are of the same type: grouping creates a node of that type
  • e-grouping: new Entity node
  • a-grouping: new Activity node
• Boundary nodes of mixed types: grouping can introduce a node of either type
t-grouping: creates new node of type t ∈ { En, Act }
Note:
Grouping is commutative and closed wrt composition
t-grouping
[Figure: a-grouping and e-grouping examples on a graph with entities e4, e5 and activities a1–a4 (events u42, u52, u54, g41, g53). Panels (a-1) and (a-2) show a-grouping by replace, yielding a new activity aN; panels (e-1) to (e-3) show e-grouping by replace, and by extend and replace, yielding a new entity eN.]
The ProvAbs tool
• A tool to let a policy designer explore partial disclosure options by experimenting with policy settings and clearance thresholds
• Accepts graphs in PROV-N format
• Policy specified interactively, or loaded from file
Demo available!
Summary
• A model for abstracting PROV graphs by (recursively) replacing sets of nodes with new nodes
  • Map valid PROV to valid PROV (ref.: PROV-CONSTRAINTS)
  • No false dependencies introduced
• Abstract nodes → abstract events
• Extended to Agents (see TechReport)
• Need to extend to more PROV relationship types
See also:
Missier, P., Gamble, C., Bryans, J.: Provenance graph abstraction by node grouping.
Technical report, Newcastle University (2013)
http://www.ncl.ac.uk/computing/research/publication/194432

 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
 

Dernier

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Dernier (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

ProvAbs: model, policy, and tooling for abstracting PROV graphs

  • 1. ProvAbs: model, policy, and tooling for abstracting PROV graphs. Paolo Missier, Jeremy Bryans, Carl Gamble (School of Computing Science, Newcastle University); Vasa Curcin, Roxana Danger (Imperial College, London). IPAW’14, Köln, June 10th, 2014.
  • 2. Motivation: partial disclosure of provenance. Consumer: motivated to acquire and act upon analysis, but expects supporting evidence to mitigate the risk of acting upon inaccurate information. Provider: motivated to provide accurate analysis to Public Agencies and to enhance communication using provenance metadata as evidence, but cannot fully disclose sources, analysis methods, etc.
  • 5. Provenance abstraction. What: an abstraction model for PROV; a policy model and language to drive the abstraction; an implementation, the ProvAbs tool. Why: to enable data exchanges with partial disclosure of the data provenance; to simplify understanding of provenance traces by humans. How: graph rewriting, from valid PROV to valid PROV; a node grouping operator.
  • 6. Provenance views. Motivation similar to the UserViews model (*), which is heavily focused on workflows and their provenance. Scenario: one (or more) workflows, multiple users/viewers. UserViews relies on “composite modules” (sub-workflow structuring): real workflow → induced workflow. Goals: 1. construct relevant user views; 2. the answer to a provenance query depends on the workflow view. In contrast, in our work: no assumption on any process specification (formal or not) driving the views on provenance. (*) Biton, O, S Cohen Boulakia, S B Davidson, and C S Hara. “Querying and Managing Provenance through User Views in Scientific Workflows.” In ICDE, 1072–1081, 2008. doi:http://dx.doi.org/10.1109/ICDE.2008.4497516.
  • 7. History of an analyst’s report. Document produced by the “incident room analysts”.
  • 8. 1 – Define policy to assign sensitivity to graph nodes.
    [Figure: provenance graph of the analyst’s report. Node labels: consolidate-X, consolidate-Y, report-editing, report-1, report-2, report-3, analytics1, analytics2, analytics3, X-summary, Y-summary. Edges are labelled use or gen; two generated nodes carry Status: Secret and three carry Status: confidential.]
    Policy:
    list classifications [protect, restricted, confidential, secret, topSecret];
    for all (activity used data) where (data.Status > confidential in classifications) setSensitivity(activity, 7);
    for all (activity used data) where (data.Status <= confidential in classifications) setSensitivity(activity, 5);
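The policy above can be read as a rule over `used` edges. The following is a minimal Python sketch of that reading, not the ProvAbs policy language itself. The function name `assign_sensitivity`, the edge and status data, and the tie-breaking rule (keep the highest value when several rules fire for one activity) are assumptions made for illustration.

```python
# Ordered classification levels, lowest to highest, as listed in the policy.
CLASSIFICATIONS = ["protect", "restricted", "confidential", "secret", "topSecret"]
RANK = {name.lower(): i for i, name in enumerate(CLASSIFICATIONS)}


def assign_sensitivity(used, status, threshold="confidential", high=7, low=5):
    """Return {activity: sensitivity} computed from (activity, entity) usage edges.

    used:   iterable of (activity, entity) pairs, one per `used` edge
    status: {entity: classification name}, compared case-insensitively
    """
    sensitivity = {}
    for activity, entity in used:
        if entity not in status:
            continue  # the policy only constrains data that carries a Status
        above = RANK[status[entity].lower()] > RANK[threshold.lower()]
        level = high if above else low
        # Assumption: when several rules fire for one activity, keep the highest.
        sensitivity[activity] = max(sensitivity.get(activity, 0), level)
    return sensitivity


# Hypothetical edges, loosely modelled on the running example.
used = [("analytics1", "report-1"), ("analytics3", "report-3")]
status = {"report-1": "Secret", "report-3": "confidential"}
result = assign_sensitivity(used, status)
```

Keeping the classification order in a single list mirrors the `list classifications [...]` declaration, so the comparison `data.Status > confidential` becomes a comparison of list positions.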
  • 9. 2 – Node selection. Select nodes for abstraction based on the receiver’s clearance level. [Figure: the same graph, with sensitivity values 7, 7, 7 and 5 annotated on nodes; receiver’s clearance level: 6; annotated nodes are marked ✔ or ✗ accordingly.]
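The selection step can be sketched as a simple threshold filter. This is an illustration only: the strict comparison (`>` rather than `>=`) and the assignment of the values 7, 7, 7 and 5 to particular node names are assumptions, since the transcript does not preserve which node carries which value.

```python
def select_for_abstraction(sensitivity, clearance):
    """Return the nodes whose sensitivity exceeds the receiver's clearance."""
    return {node for node, level in sensitivity.items() if level > clearance}


# Hypothetical node-to-sensitivity assignment using the values from the slide.
sensitivity = {"analytics1": 7, "analytics2": 7, "analytics3": 7, "report-editing": 5}
selected = select_for_abstraction(sensitivity, clearance=6)
```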
  • 10. 3 – Abstraction. Apply the abstraction operator. [Figure: the graph before and after abstraction. In the abstracted graph the nodes analytics1, analytics2 and analytics3 are replaced by a single node abs; the other nodes (consolidate-X, consolidate-Y, report-editing, report-1, report-2, report-3, X-summary, Y-summary) and the Status annotations remain.]
  • 11. Abstracting over sets of nodes. General abstraction idea: replace a group of (possibly non-contiguous) nodes with a new node.
  • 12. Naïve node group replacement: introducing cycles. [Figure: a graph over entities e1–e6 and activities a1–a4 with used and wgBy (wasGeneratedBy) edges; replacing a group of its nodes with a new node e' creates a cycle.] Generation-usage cycles are legal in PROV. Note: initial focus on vanilla PROV: usage-generation / entity-activity.
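A small Python sketch shows how naïve replacement can introduce a cycle. The graph encoding (a set of directed `(source, target)` edges pointing from effect to cause) and the helper names `collapse` and `has_cycle` are assumptions for illustration; the example graph is hypothetical and much smaller than the one on the slide.

```python
def collapse(edges, group, new_node):
    """Naively replace every node in `group` with `new_node`, dropping internal edges."""
    def rename(node):
        return new_node if node in group else node

    collapsed = {(rename(src), rename(dst)) for src, dst in edges}
    collapsed.discard((new_node, new_node))  # edges that were internal to the group
    return collapsed


def has_cycle(edges):
    """Detect a directed cycle with a depth-first search."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in adjacency.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY or (state == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(adjacency))


# Hypothetical chain: e2 wasGeneratedBy a, a used e1.
edges = {("e2", "a"), ("a", "e1")}
before = has_cycle(edges)
after = has_cycle(collapse(edges, {"e1", "e2"}, "e'"))
```

Grouping the two entities at either end of the chain turns `e2 → a → e1` into `e' → a → e'`, a generation-usage cycle that was not present in the original graph.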
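  • 13. What’s wrong with cycles? New cycles introduce new constraints on the temporal ordering of events. [Figure: activity a, with start s and end e, is connected to e1 by usage event u1 and to e2 by generation event g2; after grouping, a is connected to e' by both a usage event u' and a generation event g'.] Mapping: u' ← u1, g' ← g2, e' ← {e1, e2}. Resulting constraint: s(a) ≤ u' ≤ g' ≤ e(a); u', g' simultaneous.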
  • 14. More generally: mapping concrete to abstract events. Abstract graph nodes should be characterised by abstract events. Generation is the completion of production of a new entity (PROV-DM Sec. 5.1.3). Usage is the beginning of utilizing an entity (PROV-DM Sec. 5.1.4). Abstract events: g' = max { g1, g2 }, u' = min { u3, u4 }. [Figure: activity a with generation events g1, g2 and usage events u3, u4 over entities e1–e4, mapped to a single abstract entity e' with events g' and u'.]
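The two formulas above translate directly into code. A minimal sketch, assuming event times are comparable numbers; the function name and the sample timestamps are hypothetical.

```python
def abstract_events(generation_times, usage_times):
    """Map concrete event times onto one abstract generation and one abstract usage."""
    g_abs = max(generation_times)  # g' = max{g1, g2}: production completes at the last generation
    u_abs = min(usage_times)       # u' = min{u3, u4}: utilisation begins at the first usage
    return g_abs, u_abs


# Hypothetical timestamps for g1, g2 and u3, u4.
g_abs, u_abs = abstract_events([3, 5], [8, 6])
```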
  • 15. Usage-follows-generation. Abstract graphs with abstract usage-generation events correspond to a specific class of base graphs with pattern: <all generations> -- <all usages>. Given a grouping set of entities {e1…en} such that ei wasGeneratedBy a or a used ei: all generation events for all ei must precede all usage events for all ei. [Figure: timeline of activity a between start s and end e, with a generation phase followed by a usage phase for e1–e4.]
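The usage-follows-generation pattern can be checked on a base graph once event times are known. A sketch under stated assumptions: event times are given per entity as numbers, "precede" is read as non-strict (`<=`), and the function name is hypothetical.

```python
def usage_follows_generation(generations, usages):
    """True when every generation event precedes (<=) every usage event.

    generations, usages: {entity: event time}; an entity may appear in one
    dict, the other, or both.
    """
    if not generations or not usages:
        return True  # nothing to compare
    return max(generations.values()) <= min(usages.values())


# Hypothetical event times for a grouping set {e1, e2, e3, e4}.
ok = usage_follows_generation({"e1": 1, "e2": 2}, {"e3": 4, "e4": 3})
bad = usage_follows_generation({"e1": 1, "e2": 5}, {"e3": 4})
```

The check reduces to comparing the latest generation with the earliest usage, which are the abstract events g' and u' of the grouped node.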
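  • 16. Naïve node group replacement (2): type violations. [Figure: a graph over entities e1, e2, e4, e5 and activities a1–a4 with used and wgBy edges; after replacing a group of nodes with a new node e', a wgBy edge is left with a type-incorrect endpoint, marked ??.]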
  • 17. Criteria for abstraction: 1. No new generation-usage cycles. 2. No new dependencies (but: ok to remove some dependencies). 3. Satisfy type constraints on relationships. Steps: convexity by closure; extension; replacement, rewiring.
  • 18. Convexity by path closure. [Figure: (a) a graph over entities e1–e6 and activities a1–a5 with used and wgBy edges; (b) the same graph after the close step.]
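One way to read "convexity by path closure" is: extend the grouping set with every node that lies on a directed path between two of its members, so that collapsing the set cannot create a new cycle. The sketch below implements that reading; it is an assumption about the close step, not the ProvAbs implementation, and the function names and example graph are hypothetical.

```python
from collections import deque


def reachable(start_nodes, adjacency):
    """Breadth-first search: all nodes reachable from start_nodes (inclusive)."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def path_closure(edges, group):
    """Add to `group` every node on a directed path between two group members."""
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)
    downstream = reachable(group, forward)   # nodes some group member can reach
    upstream = reachable(group, backward)    # nodes that can reach some group member
    return set(group) | (downstream & upstream)


# Hypothetical graph: e4 wgBy a3, a3 used e2, e2 wgBy a1, a3 used e6.
edges = [("e4", "a3"), ("a3", "e2"), ("e2", "a1"), ("a3", "e6")]
closed = path_closure(edges, {"e4", "e2"})
```

A node belongs to the closure exactly when it is both downstream of one group member and upstream of another, so a single pass of two searches is enough.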
  • 19. Replacement, rewiring. [Figure: (c) the replace step substitutes the closed group with a new node e' and rewires the remaining used edges; nodes e2, e6, a2, a4 and a5 remain.]
  • 20. Extension: restore type correctness. [Figure: the graph before and after the extension step, in which node e' becomes e''; activities a2, a4 and a5 and their used edges remain.]
  • 21. t-grouping. Nodes in the grouping set can be a mix of Entities and Activities. When all boundary nodes are of the same type, grouping creates a node of that type (e-grouping: new Entity node; a-grouping: new Activity node). When boundary nodes are of mixed types, grouping can introduce a node of either type. t-grouping: creates a new node of type t ∈ { En, Act }. Note: grouping is commutative and closed wrt composition.
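The type rule for the new node can be written down in a few lines. This sketch takes the types of the boundary nodes as given, since the transcript does not define how the boundary is computed; the function name and the `preferred` parameter are hypothetical.

```python
ENTITY, ACTIVITY = "En", "Act"


def grouping_type(boundary_types, preferred=None):
    """Return the type t of the node created by a t-grouping.

    boundary_types: types (ENTITY or ACTIVITY) of the group's boundary nodes.
    preferred:      caller's choice of t, needed only for a mixed boundary.
    """
    kinds = set(boundary_types)
    if not kinds <= {ENTITY, ACTIVITY}:
        raise ValueError(f"unknown node types: {kinds - {ENTITY, ACTIVITY}}")
    if len(kinds) == 1:
        return next(iter(kinds))  # homogeneous boundary fixes the type
    if preferred not in (ENTITY, ACTIVITY):
        raise ValueError("mixed boundary: choose t explicitly")
    return preferred              # mixed boundary: either type is allowed


e_group = grouping_type([ENTITY, ENTITY])
mixed = grouping_type([ENTITY, ACTIVITY], preferred=ACTIVITY)
```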
  • 22. t-grouping (examples). [Figure: a graph over entities e4, e5 and activities a1–a4 with usage events u42, u52, u54 and generation events g41, g53. Panels (a-1) and (a-2) show a-grouping, which introduces a new activity aN; panels (e-1), (e-2) and (e-3) show e-grouping, which introduces a new entity eN. Steps are labelled replace and extend and replace.]
  • 23. The ProvAbs tool. A tool to let a policy designer explore partial disclosure options by experimenting with policy settings and clearance thresholds. Accepts graphs in PROV-N format. Policy specified interactively, or loaded from file. Demo available!
  • 24. Summary. A model for abstracting PROV graphs by (recursively) replacing sets of nodes with new nodes: maps valid PROV to valid PROV (ref.: PROV-CONSTRAINTS); no false dependencies introduced. Abstract nodes → abstract events. Extended to Agents (see TechReport). Need to extend to more PROV relationship types. See also: Missier, P., Gamble, C., Bryans, J.: Provenance graph abstraction by node grouping. Technical report, Newcastle University (2013) http://www.ncl.ac.uk/computing/research/publication/194432

Editor's notes

  1. Reference scenario: intelligence analysis, two parties. Incident Room specialists issue analyses to law enforcement agencies; agencies may act upon such analyses. [describe incentives/hurdles for each actor here]
  2. Ref. to Zoom
  3. Complete running example trace
  4. Note that one cannot simply replace association with attribution, i.e., replace relation waw(a, ag) with wat(eN, ag), because there is no guarantee that any of the entities represented by the new eN had been attributed to ag in the original graph.