SlideShare une entreprise Scribd logo
1  sur  45
A Generic Provenance Middleware
for Database Queries, Updates, and
Transactions
Bahareh Sadat Arab1, Dieter Gawlick2, Venkatesh
Radhakrishnan2, Hao Guo1, Boris Glavic1
IIT DBGroup1
Oracle2
Outline
❶ Motivation and Overview
❷ GProM Vision
❸ Provenance for Transactions
2 GProM - Provenance for Queries, Updates, and Transactions
Introduction
• Data Provenance
– Information about the origin and creation process data
• Provenance tracking for database operations
– Considerable interest from database community in last decade
• The de-facto standard for database provenance [1,2,3,4,5]
– model provenance as annotations on data (e.g., tuples)
– compute the provenance by propagating annotations (query rewrite)
SELECT
DISTINCT Owner
FROM CannAcc;
[1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In Search of
Elegance in the Theory and Practice of Computation, Springer, 2013.
[2] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS,
2013.
[3] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB
Journal, 14(4):373–396, 2005.
[4] P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. U. Nabar, T. Sugihara, and J. Widom. Trio: A System for Data, Uncertainty,
and Lineage. In VLDB, pages 1151–1154, 2006.
[5] G. Karvounarakis and T. Green. Semiring-annotated data: Queries and provenance. SIGMOD Record, 41(3):5–14, 2012.
3 GProM - Provenance for Queries, Updates, and Transactions
Use Cases
• Debugging data and transformations (queries)[1]
• Probabilistic databases (queries)[5]
• Auditing and compliance (transactions and update
statements)[6]
• Understanding data integration transformations (queries
and transactions)
• Assessing data quality and trust (queries and
transactions)[7]
 Computing provenance for updates and transactions is
essential for many use cases.
[1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance
Information. In Search of Elegance in the Theory and Practice of Computation, pringer, 2013.
[5] P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. U. Nabar, T. Sugihara, and J. Widom. Trio: A System
for Data, Uncertainty, and Lineage. In VLDB, 2006.
[6] D. Gawlick and V. Radhakrishnan. Fine grain provenance using temporal databases. In TaPP, 2011.
[7] G. Karvounarakis and T. Green. Semiring-annotated data: Queries and provenance. SIGMOD Record, 2012.
4 GProM - Provenance for Queries, Updates, and Transactions
Shortcomings of State-of-the-Art
• No practical implementation for updates
• No system or model supports transactions
• Inflexible provenance storage
– Always on [2,3]
– On-demand only [1]
• Query rewrite use atypical access patterns and
operator sequences
– -> leads to poor execution plans
• Most systems: only one type of provenance
[1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In Search of
Elegance in the Theory and Practice of Computation, pringer, 2013.
[2] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB
Journal, 2005.
[3] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS,
2013.
5 GProM - Provenance for Queries, Updates, and Transactions
Objectives
1. Vision: Generic Provenance Database
Middleware (GProM).
– Provenance for
• Queries, updates, and transactions
– User decides when to compute and store
provenance
– Supports multiple provenance models
– Database-independent
2. Tracking provenance of concurrent
transactions
– Reenactment Queries
6 GProM - Provenance for Queries, Updates, and Transactions
Contributions
1. First solution for provenance of transactions
2. Retroactive on-demand provenance
computation
– Using read-only reenactment
3. Only requires audit log + time travel
– Supported by most DBMS
– No additional storage and runtime overhead
4. Non-invasive provenance computation
– query rewrite + annotation propagation
7 GProM - Provenance for Queries, Updates, and Transactions
Outline
❶ Motivation and Overview
❷ GProM Vision
❸ Provenance for Transactions
8 GProM - Provenance for Queries, Updates, and Transactions
System Architecture
• Database independent middleware
– Plug-able parser and SQL code generator
• Internal query representation
– Relational Algebra Graph Model (AGM)
• Core driver: Query rewrites
– Provenance Computation
– Flexible storage policies for provenance
– Provenance import/export
– AGM Optimizer (rewritten queries)
– Extensibility: Rewrite Specification Language (RSL)
• Initial prototype build on-top of Oracle
9 GProM - Provenance for Queries, Updates, and Transactions
GProM Overview
10 GProM - Provenance for Queries, Updates, and Transactions
Provenance Computation
• Query rewrite
– Take original query q and rewrite into q+
Computes original results + provenance
– Propagate provenance through operations
11 GProM - Provenance for Queries, Updates, and Transactions
Q
Result
DB
Result +
Provenance
Q+
Example Rewrite
• Input:
SELECT DISTINCT u.Owner FROM Usacc u, CanAcc c WHERE u.ID = c.ID;
• Rewrite Parts:
USacc SELECT ID, Owner, Balance, Type,
ID AS P1, Owner AS P2, Balance AS P3, Type AS P4
FROM USacc
CanAcc SELECT ID, Owner, Balance, Type,
ID AS P5, Owner AS P6, Balance AS P7, Type AS P8
FROM CanAcc
WHERE u.ID = c.ID WHERE u.ID = c.ID
SELECT DISTINCT Owner SELECT Owner, P1, P2, P3, P4, P5, P6, P7, P8
• Output:
SELECT u.Owner, P1, P2, P3, P4, P5, P6, P7, P8
FROM
(SELECT ID, Owner, Balance, Type,
ID AS P1, Owner AS P2, Balance AS P3, Type AS P4
FROM USacc) u
(SELECT ID, Owner, Balance, Type,
ID AS P5, Owner AS P6, Balance AS P7, Type AS P8
FROM CanAcc) c
WHERE u.ID = c.ID;
12 GProM - Provenance for Queries, Updates, and Transactions
Provenance Computation
• Operates on relational algebra representation of queries
– Fixed set of rewrite rules per provenance type:
• One per type of algebra operator
• Recursive top-down rewrite
– For each relation access: duplicate attributes as provenance
– For each operator: replace with algebra graph that propagates
provenance annotations
• Composable
13 GProM - Provenance for Queries, Updates, and Transactions
UsAcc CanAcc UsAcc CanAcc
Supporting Past Queries, Updates,
and Transactions
• Only needs audit log and time travel
–supported by most DBMS
• Sufficient for provenance of past queries [4]
• Our contribution
–Sufficient for updates and transactions
[4] J. Zhang and H. Jagadish. Lost source provenance. In EDBT, 2010.
14 GProM - Provenance for Queries, Updates, and Transactions
Provenance Generation and
Storage Policies
• GProM default
– Only compute provenance if explicitly requested
• User can register storage policies
– When to store which type of provenance
POLICY storeOnR {
FIRE ON Query, Insert q
WHEN Root(q) +=> Table(R)
COMPUTE PI-CS
STORE AS NEW TABLE
NAMING SCHEME Hash
}
15 GProM - Provenance for Queries, Updates, and Transactions
Optimizing Rewritten Queries
• Query rewrite use atypical access patterns and
operator sequences
leads to poor execution plans
• Optimization for rewritten queries
– Heuristic
– Cost-based
SELECT ID, Owner, Balance,
CASE
WHEN Balance > 1000000
THEN 'Premium '
ELSE Type
END AS Type,
prov_CanAcc_ID,
prov_CanAcc_Owner,
prov_CanAcc_Balance,
prov_CanAcc_Type,
prov_USacc_ID,
prov_USacc_Owner,
prov_USacc_Balance,
prov_USacc_Type
FROM u1
...
SELECT ID, Owner, Balance, 'Premium ' AS Type,
prov_CanAcc_ID,
prov_CanAcc_Owner,
prov_CanAcc_Balance,
prov_CanAcc_Type,
prov_USacc_ID,
prov_USacc_Owner,
prov_USacc_Balance,
prov_USacc_Type
FROM u1
WHERE Balance > 1000000
UNION ALL
SELECT * FROM u1
WHERE (Balance > 1000000) IS NOT TRUE
16 GProM - Provenance for Queries, Updates, and Transactions
Rewrite Extensibility
• Extensible using Rewrite Specification Language (RSL)
– Concise specification of rewrite rules
RULE mergeSelections {
FOR q => c => g
WHERE q->type = selection AND c->type = selection
REWRITE INTO
selection [pred = q->pred AND c->pred] => g
}
17
User
RSL
Manager
1
RSL
Provenance
Rewriter
PolicyPolicyRSL
2
RSL
Interpreter
3
1
2 3
4
GProM - Provenance for Queries, Updates, and Transactions
Outline
❶ Motivation and Overview
❷ GProM Vision
❸ Provenance for Transactions
18 GProM - Provenance for Queries, Updates, and Transactions
Provenance of Transactions
19 GProM - Provenance for Queries, Updates, and Transactions
Provenance of Transactions
INSERT INTO USacc
(SELECT ID,
Owner,
Balance,
‘Standard’ AS Type
FROM CanAcc
WHERE Type = ‘US_dollar’);
UPDATE USacc
SET Type = ’Premium’
WHERE Balance > 1000000;
COMMIT;
20 GProM - Provenance for Queries, Updates, and Transactions
Provenance of Transactions
INSERT INTO Usacc
(SELECT ID,
Owner,
Balance,
‘Standard’ AS Type
FROM CanAcc
WHERE Type = ‘US_dollar’);
UPDATE Usacc
SET Type = ’Premium’
WHERE Balance > 1000000;
21 GProM - Provenance for Queries, Updates, and Transactions
u1
u2
Provenance of Transactions
• Our Approach:
Reenactment + Provenance Propagation
• Currently supports
– Snapshot Isolation
– Statement-level Snapshot Isolation
22 GProM - Provenance for Queries, Updates, and Transactions
Gather
Transaction
Information
Construct
Update
Reenactment
Query
Rewrite For
Provenance
Computation
Execute
Query
1
Construct
Transaction
Reenactment
Query
2 3 4 5
1.Gather Transaction Information
• Retrieve SQL statements of transaction from audit log
• Update u1:
INSERT INTO USacc
(SELECT ID,
Owner,
Balance,
‘Standard’ AS Type
FROM CanAcc
WHERE Type = ‘US_dollar’);
• Update u2:
UPDATE Usacc
SET Type = ’Premium’
WHERE Balance > 1000000;
23 GProM - Provenance for Queries, Updates, and Transactions
2. Translate Updates: Reenactment
• Update reads table version and outputs updated table version
• Multiple versions of the database
– Each modification of a tuple t causes a new version to be created
– Old tuple versions are kept (SI)
– Add version annotation τ to provenance of each updated row
• Use semi-ring model
24 GProM - Provenance for Queries, Updates, and Transactions
UPDATE Usacc
SET Type=’Premium’
WHERE Balance>1000000;
2.Translate Updates
• Construct update reenactment query
– Simulates effect of update
– Read DB version seen by update using time travel
– Query result = updated table (Annotation-Equivalent)
SELECT ID, Owner, Balance, ’Standard’ AS Type
FROM CanAcc AS OF SCN 3652
WHERE Type=‘US_dollar’
UNION ALL
SELECT * FROM Usacc AS OF SCN 3652;
25 GProM - Provenance for Queries, Updates, and Transactions
UPDATE Usacc
SET Type = ’Premium’
WHERE Balance > 1000000;
SELECT ID, Owner, Balance, ’Premium’ AS Type
FROM Usacc AS OF SCN 3652
WHERE Balance>1000000
UNION ALL
SELECT *
FROM Usacc AS OF SCN 3652
WHERE (Balance>1000000) IS NOT TRUE;
INSERT INTO Usacc
(SELECT ID,
Owner,
Balance,
‘Standard’ AS Type
FROM CanAcc
WHERE Type = ‘US_dollar’);
3. Construct Reenactment Query
• Simulates the whole transaction
– Annotation-Equivalent to original transaction
• Merge reenactment queries based on concurrency control protocol
– Each concurrency control requires a different merge process
– SERIALIZABLE (Snapshot isolation) -> modifications before the
transaction started + previous updates of the transaction
– READ COMMITTED (Snapshot isolation) -> sees committed changes
by concurrent transaction
WHIT U1 AS
(SELECT ID, Owner, Balance, ’Standard’ AS Type
FROM CanAcc AS OF SCN 3652
WHERE Type=‘US_dollar’
UNION ALL
SELECT * FROM Usacc AS OF SCN 3652);
SELECT ID, Owner, Balance, ’Premium’ AS Type
FROM U1
WHERE Balance>1000000
UNION ALL
SELECT * FROM U1
WHERE (Balance>1000000) IS NOT TRUE;
26 GProM - Provenance for Queries, Updates, and Transactions
4. Rewrite For Provenance
Computation
• Rewrite reenactment query to compute
provenance using annotation propagation
WITH
u1 AS
(SELECT ID, Owner, Balance, ’Standard ’ AS Type,
ID AS prov_CanAcc_ID,
. . .
NULL AS prov_USacc_ID,
. . .
1 AS updated,
FROM CanAcc AS OF SCN 3652
WHERE Type = ’US dollar ’
UNION ALL
SELECT ID , Owner , Balance , Type ,
NULL AS prov_CanAcc_ID,
. . .
ID AS prov_USacc_ID,
. . .
0 AS updated
FROM USacc AS OF SCN 3652),
. . .
u1 AS
(SELECT . . .
27 GProM - Provenance for Queries, Updates, and Transactions
4. Execute Query
• Execute query to retrieve provenance
Updated USacc Tuples Provenance from CanAcc Provenance from USacc
ID Owner Balance Type P1 P2 P3 P4 P5 P6
3 Alice Bright 1,500,000 Premium 3 Alice Bright 1,500,000 NULL NULL NULL
5 Mark Smith 50 Standard 5 Mark Smith 50 NULL NULL NULL
28 GProM - Provenance for Queries, Updates, and Transactions
Conclusions
• We present our vision for GProM
– Database-independent middleware for computing
provenance of queries, updates, and transactions.
• First solution for provenance of transactions
• Query rewrite techniques on steroids:
– Provenance computation
– Transaction reenactment
– Provenance translation
– Provenance storage
– Optimization
• Extensible through RSL language
29 GProM - Provenance for Queries, Updates, and Transactions
Future Works
• Implementing additional provenance types
• Comprehensive study of heuristic and cost-based
optimizations
• Design and implementation of RSL
• Implementing additional provenance formats
• Study reenactment for other concurrency control
mechanisms
– Locking protocols (2PL)
• Investigate additional Use-cases for Reenactment
– Transaction backout
– Retroactive What-if analysis
30 GProM - Provenance for Queries, Updates, and Transactions
Questions?
• Homepage:
Bahareh: http://www.cs.iit.edu/~dbgroup/people/barab.php
Boris: http://www.cs.iit.edu/~glavic/
• DBGroup:
http://www.cs.iit.edu/~dbgroup/
• GProM Project (partially funded by Oracle)
http://www.cs.iit.edu/~dbgroup/research/oracletprov.php
• Perm
http://www.cs.iit.edu/~dbgroup/research/perm.php
31 GProM - Provenance for Queries, Updates, and Transactions
References
[1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient
Generation and Querying of Provenance Information. In Search of
Elegance in the Theory and Practice of Computation, pages 291–320. Springer,
2013.
[2] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An
Annotation Management System for Relational Databases. VLDB Journal,
14(4):373–396, 2005.
[3] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative
data sharing via update exchange and provenance. TODS, 38(3): 19, 2013.
[4] J. Zhang and H. Jagadish. Lost source provenance. In EDBT, pages 311–
322, 2010.
[5] P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. U. Nabar, T.
Sugihara, and J. Widom. Trio: A System for Data, Uncertainty, and
Lineage. In VLDB, pages 1151–1154, 2006.
[6] D. Gawlick and V. Radhakrishnan. Fine grain provenance using
temporal databases. In TaPP, 2011.
[7] G. Karvounarakis and T. Green. Semiring-annotated data: Queries and
provenance. SIGMOD Record, 41(3):5–14, 2012.
32
Q-Bomb
• One pattern that arises from reenactment are long chains of
SELECT clauses using CASE
– Each level references attributes from next level multiple times
– Subquery pull-up creates expressions of size exponential in the number
of SELECT clauses
– In praxis: optimization never finishes
• Minimal example using one row table
SELECT CASE WHEN b < 100 THEN a ELSE a + 2 END AS a, b
FROM SELECT CASE WHEN b < 100 THEN a ELSE a + 2 END AS a, b
…
FROM SELECT CASE WHEN b < 100 THEN a ELSE a + 2 END AS a, b
FROM R
33
Example Provenance Computation
34
Example – Update Reenactment
35
Example – Trans. Reenactment
36
Rewrite Reenactment Query
37
Execute Rewritten Query
38
Types of Update Operations - Insert
• Insert executed at time t
• Updated version of R contains
1. All tuples from previous version
2. All newly inserted tuples
• Fixed tuple defined in VALUES clause
• Results of query over database version at t
Union these two sets
INSERT INTO R VALUES (v1, ... ,vn);
INSERT INTO R (q);
39
(SELECT * FROM R AS OF t)
UNION ALL
(SELECT v1 AS a1, ... , vn AS an);
(SELECT * FROM R AS OF t)
UNION ALL
(q(t));
Types of Update Operations - Delete
• Delete executed at time t
• Tuples in updated version of R:
– All tuples from for which Condition is not
fulfilled
DELETE FROM R WHERE C ; SELECT * FROM R AS OF t
WHERE (C) IS NOT TRUE;
40
Types of Update Operations - Update
• Update executed at time t
• Find tuples where Condition holds and update
the attribute values
• Find tuples where NOT Condition holds
Union these two sets
UPDATE R SET A WHERE C ;
(SELECT A’ FROM R AS OF t WHERE C)
UNION ALL
(SELECT * FROM R AS OF t WHERE (C) IS NOT TRUE)
41
READ COMMITTED
• Statement of a transaction T sees committed changes by concurrent
transaction
• For a given update we need to combine
– tuples produced by previous statements of same transaction
– tuples produced by transactions that committed before update
• Observations
– Once a transaction T modifies a tuple t, no other transaction can access t until T
commits
– Let ui be the update executed at time x of T that first modifies t
– ui will read the latest version committed x
– If we know ui then updates of T before x do not have to look at t
• Consider the database version 1 time unit (C-1) before commit of T
– This contains all the tuple versions seen by the first update of T updating each
individual tuple
– Let t be a tuple version in this version and it’s start time is y
– We know that updates from T which executed before y cannot have updated t
– We can use version C-1 as input for reenactment as long as we hide tuple
version t at y from an reenactment of an updated executed at x with x < y
42
READ COMMITTED
u1 AS
(SELECT
CASE WHEN Balance <=1000000 AND version <= 0 THEN 'Standard ' ELSE Type END AS Type ,
ID , Owner , Balance ,
CASE WHEN Balance <=1000000 AND version <= 0 THEN −1 ELSE version END AS version
FROM USacc AS OF SCN 3652)
,
u2 AS
(SELECT
CASE WHEN Balance > 1000000 AND version <= 1 THEN 'Premium' ELSE Type END AS Type ,
ID , Owner , Balance ,
CASE WHEN Balance > 1000000 AND version <= 1 THEN −1 ELSE version END AS version
FROM u1 )
SELECT ID , Owner , Balance , Type FROM u2 WHERE version = −1;43
Database Independence
• Encapsulate database-specific functionality in
pluggable modules.
• What needs to be adapted are :
1) Parser
2) SQL code generator
3) Metadata access
4) Audit log access
5) Time travel activation.
44
Accessing Several Tables
• Transactions Accessing Several Tables
– We require user to specify which table she is
interested in
– Replace access to table with query for last update
that modified the table
45
U1R1
R2
R3 U2
R1
U3
R3
U4 R1
R3

Contenu connexe

Similaire à TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, Updates, and Transactions

Presto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - LyftPresto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - Lyftkbajda
 
On the need for applications aware adaptive middleware in real-time RDF data ...
On the need for applications aware adaptive middleware in real-time RDF data ...On the need for applications aware adaptive middleware in real-time RDF data ...
On the need for applications aware adaptive middleware in real-time RDF data ...Zia Ush Shamszaman
 
Benchmarking Apache Druid
Benchmarking Apache DruidBenchmarking Apache Druid
Benchmarking Apache DruidImply
 
Benchmarking Apache Druid
Benchmarking Apache Druid Benchmarking Apache Druid
Benchmarking Apache Druid Matt Sarrel
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerSambit Banerjee
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilDatabricks
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningDatabricks
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaScyllaDB
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with RRevolution Analytics
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisAlex Henthorn-Iwane
 
20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns
20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns
20131111 - Santa Monica - BigDataCamp - Big Data Design PatternsAllen Day, PhD
 
An empirical evaluation of cost-based federated SPARQL query Processing Engines
An empirical evaluation of cost-based federated SPARQL query Processing EnginesAn empirical evaluation of cost-based federated SPARQL query Processing Engines
An empirical evaluation of cost-based federated SPARQL query Processing EnginesUmair Qudus
 
EnterpriseDB's Best Practices for Postgres DBAs
EnterpriseDB's Best Practices for Postgres DBAsEnterpriseDB's Best Practices for Postgres DBAs
EnterpriseDB's Best Practices for Postgres DBAsEDB
 
Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...VMware Tanzu
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleScyllaDB
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleSaurabh Verma
 
Wipro Customer Presentation
Wipro Customer PresentationWipro Customer Presentation
Wipro Customer PresentationSplunk
 
Geo2tag performance evaluation, Zaslavsky, Krinkin
Geo2tag performance evaluation, Zaslavsky, Krinkin Geo2tag performance evaluation, Zaslavsky, Krinkin
Geo2tag performance evaluation, Zaslavsky, Krinkin OSLL
 
Using Perforce Data in Development at Tableau
Using Perforce Data in Development at TableauUsing Perforce Data in Development at Tableau
Using Perforce Data in Development at TableauPerforce
 

Similaire à TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, Updates, and Transactions (20)

Presto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - LyftPresto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - Lyft
 
On the need for applications aware adaptive middleware in real-time RDF data ...
On the need for applications aware adaptive middleware in real-time RDF data ...On the need for applications aware adaptive middleware in real-time RDF data ...
On the need for applications aware adaptive middleware in real-time RDF data ...
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 
Benchmarking Apache Druid
Benchmarking Apache DruidBenchmarking Apache Druid
Benchmarking Apache Druid
 
Benchmarking Apache Druid
Benchmarking Apache Druid Benchmarking Apache Druid
Benchmarking Apache Druid
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access Layer
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow Analysis
 
20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns
20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns
20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns
 
An empirical evaluation of cost-based federated SPARQL query Processing Engines
An empirical evaluation of cost-based federated SPARQL query Processing EnginesAn empirical evaluation of cost-based federated SPARQL query Processing Engines
An empirical evaluation of cost-based federated SPARQL query Processing Engines
 
EnterpriseDB's Best Practices for Postgres DBAs
EnterpriseDB's Best Practices for Postgres DBAsEnterpriseDB's Best Practices for Postgres DBAs
EnterpriseDB's Best Practices for Postgres DBAs
 
Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
 
Wipro Customer Presentation
Wipro Customer PresentationWipro Customer Presentation
Wipro Customer Presentation
 
Geo2tag performance evaluation, Zaslavsky, Krinkin
Geo2tag performance evaluation, Zaslavsky, Krinkin Geo2tag performance evaluation, Zaslavsky, Krinkin
Geo2tag performance evaluation, Zaslavsky, Krinkin
 
Using Perforce Data in Development at Tableau
Using Perforce Data in Development at TableauUsing Perforce Data in Development at Tableau
Using Perforce Data in Development at Tableau
 

Plus de Boris Glavic

2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...Boris Glavic
 
2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Patter...
2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Patter...2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Patter...
2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Patter...Boris Glavic
 
2016 VLDB - The iBench Integration Metadata Generator
2016 VLDB - The iBench Integration Metadata Generator2016 VLDB - The iBench Integration Metadata Generator
2016 VLDB - The iBench Integration Metadata GeneratorBoris Glavic
 
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...Boris Glavic
 
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...Boris Glavic
 
2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers
2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers
2015 TaPP - Towards Constraint-based Explanations for Answers and Non-AnswersBoris Glavic
 
2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON
2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON
2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSONBoris Glavic
 
TaPP 2015 - Towards Constraint-based Explanations for Answers and Non-Answers
TaPP 2015 - Towards Constraint-based Explanations for Answers and Non-AnswersTaPP 2015 - Towards Constraint-based Explanations for Answers and Non-Answers
TaPP 2015 - Towards Constraint-based Explanations for Answers and Non-AnswersBoris Glavic
 
ICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database VirtualizationICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database VirtualizationBoris Glavic
 
TaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance
TaPP 2011 Talk Boris - Reexamining some Holy Grails of ProvenanceTaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance
TaPP 2011 Talk Boris - Reexamining some Holy Grails of ProvenanceBoris Glavic
 
EDBT 2009 - Provenance for Nested Subqueries
EDBT 2009 - Provenance for Nested SubqueriesEDBT 2009 - Provenance for Nested Subqueries
EDBT 2009 - Provenance for Nested SubqueriesBoris Glavic
 
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...Boris Glavic
 
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...Boris Glavic
 
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"Boris Glavic
 
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"Boris Glavic
 
TaPP 2013 - Provenance for Data Mining
TaPP 2013 - Provenance for Data MiningTaPP 2013 - Provenance for Data Mining
TaPP 2013 - Provenance for Data MiningBoris Glavic
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanBoris Glavic
 

Plus de Boris Glavic (17)

2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
2019 - SIGMOD - Uncertainty Annotated Databases - A Lightweight Approach for ...
 
2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Patter...
2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Patter...2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Patter...
2019 - SIGMOD - Going Beyond Provenance: Explaining Query Answers with Patter...
 
2016 VLDB - The iBench Integration Metadata Generator
2016 VLDB - The iBench Integration Metadata Generator2016 VLDB - The iBench Integration Metadata Generator
2016 VLDB - The iBench Integration Metadata Generator
 
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
2016 VLDB - Messing Up with Bart: Error Generation for Evaluating Data-Cleani...
 
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...
2016 QDB VLDB Workshop - Towards Rigorous Evaluation of Data Integration Syst...
 
2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers
2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers
2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers
 
2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON
2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON
2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON
 
TaPP 2015 - Towards Constraint-based Explanations for Answers and Non-Answers
TaPP 2015 - Towards Constraint-based Explanations for Answers and Non-AnswersTaPP 2015 - Towards Constraint-based Explanations for Answers and Non-Answers
TaPP 2015 - Towards Constraint-based Explanations for Answers and Non-Answers
 
ICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database VirtualizationICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database Virtualization
 
TaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance
TaPP 2011 Talk Boris - Reexamining some Holy Grails of ProvenanceTaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance
TaPP 2011 Talk Boris - Reexamining some Holy Grails of Provenance
 
EDBT 2009 - Provenance for Nested Subqueries
EDBT 2009 - Provenance for Nested SubqueriesEDBT 2009 - Provenance for Nested Subqueries
EDBT 2009 - Provenance for Nested Subqueries
 
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model throu...
 
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...
2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Prov...
 
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
 
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
 
TaPP 2013 - Provenance for Data Mining
TaPP 2013 - Provenance for Data MiningTaPP 2013 - Provenance for Data Mining
TaPP 2013 - Provenance for Data Mining
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, Ian
 

Dernier

High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑Damini Dixit
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 

Dernier (20)

High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 

TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, Updates, and Transactions

  • 1. A Generic Provenance Middleware for Database Queries, Updates, and Transactions Bahareh Sadat Arab1, Dieter Gawlick2, Venkatesh Radhakrishnan2, Hao Guo1, Boris Glavic1 IIT DBGroup1 Oracle2
  • 2. Outline ❶ Motivation and Overview ❷ GProM Vision ❸ Provenance for Transactions 2 GProM - Provenance for Queries, Updates, and Transactions
  • 3. Introduction • Data Provenance – Information about the origin and creation process data • Provenance tracking for database operations – Considerable interest from database community in last decade • The de-facto standard for database provenance [1,2,3,4,5] – model provenance as annotations on data (e.g., tuples) – compute the provenance by propagating annotations (query rewrite) SELECT DISTINCT Owner FROM CannAcc; [1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In Search of Elegance in the Theory and Practice of Computation, Springer, 2013. [2] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS, 2013. [3] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB Journal, 14(4):373–396, 2005. [4] P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. U. Nabar, T. Sugihara, and J. Widom. Trio: A System for Data, Uncertainty, and Lineage. In VLDB, pages 1151–1154, 2006. [5] G. Karvounarakis and T. Green. Semiring-annotated data: Queries and provenance. SIGMOD Record, 41(3):5–14, 2012. 3 GProM - Provenance for Queries, Updates, and Transactions
  • 4. Use Cases • Debugging data and transformations (queries)[1] • Probabilistic databases (queries)[5] • Auditing and compliance (transactions and update statements)[6] • Understanding data integration transformations (queries and transactions) • Assessing data quality and trust (queries and transactions)[7]  Computing provenance for updates and transactions is essential for many use cases. [1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In Search of Elegance in the Theory and Practice of Computation, pringer, 2013. [5] P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. U. Nabar, T. Sugihara, and J. Widom. Trio: A System for Data, Uncertainty, and Lineage. In VLDB, 2006. [6] D. Gawlick and V. Radhakrishnan. Fine grain provenance using temporal databases. In TaPP, 2011. [7] G. Karvounarakis and T. Green. Semiring-annotated data: Queries and provenance. SIGMOD Record, 2012. 4 GProM - Provenance for Queries, Updates, and Transactions
  • 5. Shortcomings of State-of-the-Art • No practical implementation for updates • No system or model supports transactions • Inflexible provenance storage – Always on [2,3] – On-demand only [1] • Query rewrite use atypical access patterns and operator sequences – -> leads to poor execution plans • Most systems: only one type of provenance [1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In Search of Elegance in the Theory and Practice of Computation, pringer, 2013. [2] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB Journal, 2005. [3] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS, 2013. 5 GProM - Provenance for Queries, Updates, and Transactions
  • 6. Objectives 1. Vision: Generic Provenance Database Middleware (GProM). – Provenance for • Queries, updates, and transactions – User decides when to compute and store provenance – Supports multiple provenance models – Database-independent 2. Tracking provenance of concurrent transactions – Reenactment Queries 6 GProM - Provenance for Queries, Updates, and Transactions
  • 7. Contributions 1. First solution for provenance of transactions 2. Retroactive on-demand provenance computation – Using read-only reenactment 3. Only requires audit log + time travel – Supported by most DBMS – No additional storage and runtime overhead 4. Non-invasive provenance computation – query rewrite + annotation propagation 7 GProM - Provenance for Queries, Updates, and Transactions
  • 8. Outline ❶ Motivation and Overview ❷ GProM Vision ❸ Provenance for Transactions 8 GProM - Provenance for Queries, Updates, and Transactions
  • 9. System Architecture • Database independent middleware – Plug-able parser and SQL code generator • Internal query representation – Relational Algebra Graph Model (AGM) • Core driver: Query rewrites – Provenance Computation – Flexible storage policies for provenance – Provenance import/export – AGM Optimizer (rewritten queries) – Extensibility: Rewrite Specification Language (RSL) • Initial prototype build on-top of Oracle 9 GProM - Provenance for Queries, Updates, and Transactions
  • 10. GProM Overview 10 GProM - Provenance for Queries, Updates, and Transactions
  • 11. Provenance Computation • Query rewrite – Take original query q and rewrite into q+ Computes original results + provenance – Propagate provenance through operations 11 GProM - Provenance for Queries, Updates, and Transactions Q Result DB Result + Provenance Q+
  • 12. Example Rewrite • Input: SELECT DISTINCT u.Owner FROM Usacc u, CanAcc c WHERE u.ID = c.ID; • Rewrite Parts: USacc SELECT ID, Owner, Balance, Type, ID AS P1, Owner AS P2, Balance AS P3, Type AS P4 FROM USacc CanAcc SELECT ID, Owner, Balance, Type, ID AS P5, Owner AS P6, Balance AS P7, Type AS P8 FROM CanAcc WHERE u.ID = c.ID WHERE u.ID = c.ID SELECT DISTINCT Owner SELECT Owner, P1, P2, P3, P4, P5, P6, P7, P8 • Output: SELECT u.Owner, P1, P2, P3, P4, P5, P6, P7, P8 FROM (SELECT ID, Owner, Balance, Type, ID AS P1, Owner AS P2, Balance AS P3, Type AS P4 FROM USacc) u (SELECT ID, Owner, Balance, Type, ID AS P5, Owner AS P6, Balance AS P7, Type AS P8 FROM CanAcc) c WHERE u.ID = c.ID; 12 GProM - Provenance for Queries, Updates, and Transactions
  • 13. Provenance Computation • Operates on relational algebra representation of queries – Fixed set of rewrite rules per provenance type: • One per type of algebra operator • Recursive top-down rewrite – For each relation access: duplicate attributes as provenance – For each operator: replace with algebra graph that propagates provenance annotations • Composable 13 GProM - Provenance for Queries, Updates, and Transactions UsAcc CanAcc UsAcc CanAcc
  • 14. Supporting Past Queries, Updates, and Transactions • Only needs audit log and time travel –supported by most DBMS • Sufficient for provenance of past queries [4] • Our contribution –Sufficient for updates and transactions [4] J. Zhang and H. Jagadish. Lost source provenance. In EDBT, 2010. 14 GProM - Provenance for Queries, Updates, and Transactions
  • 15. Provenance Generation and Storage Policies • GProM default – Only compute provenance if explicitly requested • User can register storage policies – When to store which type of provenance POLICY storeOnR { FIRE ON Query, Insert q WHEN Root(q) +=> Table(R) COMPUTE PI-CS STORE AS NEW TABLE NAMING SCHEME Hash } 15 GProM - Provenance for Queries, Updates, and Transactions
  • 16. Optimizing Rewritten Queries • Query rewrite use atypical access patterns and operator sequences leads to poor execution plans • Optimization for rewritten queries – Heuristic – Cost-based SELECT ID, Owner, Balance, CASE WHEN Balance > 1000000 THEN 'Premium ' ELSE Type END AS Type, prov_CanAcc_ID, prov_CanAcc_Owner, prov_CanAcc_Balance, prov_CanAcc_Type, prov_USacc_ID, prov_USacc_Owner, prov_USacc_Balance, prov_USacc_Type FROM u1 ... SELECT ID, Owner, Balance, 'Premium ' AS Type, prov_CanAcc_ID, prov_CanAcc_Owner, prov_CanAcc_Balance, prov_CanAcc_Type, prov_USacc_ID, prov_USacc_Owner, prov_USacc_Balance, prov_USacc_Type FROM u1 WHERE Balance > 1000000 UNION ALL SELECT * FROM u1 WHERE (Balance > 1000000) IS NOT TRUE 16 GProM - Provenance for Queries, Updates, and Transactions
  • 17. Rewrite Extensibility • Extensible using Rewrite Specification Language (RSL) – Concise specification of rewrite rules RULE mergeSelections { FOR q => c => g WHERE q->type = selection AND c->type = selection REWRITE INTO selection [pred = q->pred AND c->pred] => g } 17 User RSL Manager 1 RSL Provenance Rewriter PolicyPolicyRSL 2 RSL Interpreter 3 1 2 3 4 GProM - Provenance for Queries, Updates, and Transactions
  • 18. Outline ❶ Motivation and Overview ❷ GProM Vision ❸ Provenance for Transactions 18 GProM - Provenance for Queries, Updates, and Transactions
  • 19. Provenance of Transactions 19 GProM - Provenance for Queries, Updates, and Transactions
  • 20. Provenance of Transactions INSERT INTO USacc (SELECT ID, Owner, Balance, ‘Standard’ AS Type FROM CanAcc WHERE Type = ‘US_dollar’); UPDATE USacc SET Type = ’Premium’ WHERE Balance > 1000000; COMMIT; 20 GProM - Provenance for Queries, Updates, and Transactions
  • 21. Provenance of Transactions INSERT INTO Usacc (SELECT ID, Owner, Balance, ‘Standard’ AS Type FROM CanAcc WHERE Type = ‘US_dollar’); UPDATE Usacc SET Type = ’Premium’ WHERE Balance > 1000000; 21 GProM - Provenance for Queries, Updates, and Transactions u1 u2
  • 22. Provenance of Transactions • Our Approach: Reenactment + Provenance Propagation • Currently supports – Snapshot Isolation – Statement-level Snapshot Isolation 22 GProM - Provenance for Queries, Updates, and Transactions Gather Transaction Information Construct Update Reenactment Query Rewrite For Provenance Computation Execute Query 1 Construct Transaction Reenactment Query 2 3 4 5
  • 23. 1.Gather Transaction Information • Retrieve SQL statements of transaction from audit log • Update u1: INSERT INTO USacc (SELECT ID, Owner, Balance, ‘Standard’ AS Type FROM CanAcc WHERE Type = ‘US_dollar’); • Update u2: UPDATE Usacc SET Type = ’Premium’ WHERE Balance > 1000000; 23 GProM - Provenance for Queries, Updates, and Transactions
  • 24. 2. Translate Updates: Reenactment • Update reads table version and outputs updated table version • Multiple versions of the database – Each modification of a tuple t causes a new version to be created – Old tuple versions are kept (SI) – Add version annotation τ to provenance of each updated row • Use semi-ring model 24 GProM - Provenance for Queries, Updates, and Transactions UPDATE Usacc SET Type=’Premium’ WHERE Balance>1000000;
  • 25. 2.Translate Updates • Construct update reenactment query – Simulates effect of update – Read DB version seen by update using time travel – Query result = updated table (Annotation-Equivalent) SELECT ID, Owner, Balance, ’Standard’ AS Type FROM CanAcc AS OF SCN 3652 WHERE Type=‘US_dollar’ UNION ALL SELECT * FROM Usacc AS OF SCN 3652; 25 GProM - Provenance for Queries, Updates, and Transactions UPDATE Usacc SET Type = ’Premium’ WHERE Balance > 1000000; SELECT ID, Owner, Balance, ’Premium’ AS Type FROM Usacc AS OF SCN 3652 WHERE Balance>1000000 UNION ALL SELECT * FROM Usacc AS OF SCN 3652 WHERE (Balance>1000000) IS NOT TRUE; INSERT INTO Usacc (SELECT ID, Owner, Balance, ‘Standard’ AS Type FROM CanAcc WHERE Type = ‘US_dollar’);
  • 26. 3. Construct Reenactment Query • Simulates the whole transaction – Annotation-Equivalent to original transaction • Merge reenactment queries based on concurrency control protocol – Each concurrency control requires a different merge process – SERIALIZABLE (Snapshot isolation) -> modifications before the transaction started + previous updates of the transaction – READ COMMITTED (Snapshot isolation) -> sees committed changes by concurrent transaction WHIT U1 AS (SELECT ID, Owner, Balance, ’Standard’ AS Type FROM CanAcc AS OF SCN 3652 WHERE Type=‘US_dollar’ UNION ALL SELECT * FROM Usacc AS OF SCN 3652); SELECT ID, Owner, Balance, ’Premium’ AS Type FROM U1 WHERE Balance>1000000 UNION ALL SELECT * FROM U1 WHERE (Balance>1000000) IS NOT TRUE; 26 GProM - Provenance for Queries, Updates, and Transactions
  • 27. 4. Rewrite For Provenance Computation • Rewrite reenactment query to compute provenance using annotation propagation WITH u1 AS (SELECT ID, Owner, Balance, ’Standard ’ AS Type, ID AS prov_CanAcc_ID, . . . NULL AS prov_USacc_ID, . . . 1 AS updated, FROM CanAcc AS OF SCN 3652 WHERE Type = ’US dollar ’ UNION ALL SELECT ID , Owner , Balance , Type , NULL AS prov_CanAcc_ID, . . . ID AS prov_USacc_ID, . . . 0 AS updated FROM USacc AS OF SCN 3652), . . . u1 AS (SELECT . . . 27 GProM - Provenance for Queries, Updates, and Transactions
  • 28. 4. Execute Query • Execute query to retrieve provenance Updated USacc Tuples Provenance from CanAcc Provenance from USacc ID Owner Balance Type P1 P2 P3 P4 P5 P6 3 Alice Bright 1,500,000 Premium 3 Alice Bright 1,500,000 NULL NULL NULL 5 Mark Smith 50 Standard 5 Mark Smith 50 NULL NULL NULL 28 GProM - Provenance for Queries, Updates, and Transactions
  • 29. Conclusions • We present our vision for GProM – Database-independent middleware for computing provenance of queries, updates, and transactions. • First solution for provenance of transactions • Query rewrite techniques on steroids: – Provenance computation – Transaction reenactment – Provenance translation – Provenance storage – Optimization • Extensible through RSL language 29 GProM - Provenance for Queries, Updates, and Transactions
  • 30. Future Works • Implementing additional provenance types • Comprehensive study of heuristic and cost-based optimizations • Design and implementation of RSL • Implementing additional provenance formats • Study reenactment for other concurrency control mechanisms – Locking protocols (2PL) • Investigate additional Use-cases for Reenactment – Transaction backout – Retroactive What-if analysis 30 GProM - Provenance for Queries, Updates, and Transactions
  • 31. Questions? • Homepage: Bahareh: http://www.cs.iit.edu/~dbgroup/people/barab.php Boris: http://www.cs.iit.edu/~glavic/ • DBGroup: http://www.cs.iit.edu/~dbgroup/ • GProM Project (partially funded by Oracle) http://www.cs.iit.edu/~dbgroup/research/oracletprov.php • Perm http://www.cs.iit.edu/~dbgroup/research/perm.php 31 GProM - Provenance for Queries, Updates, and Transactions
  • 32. References [1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In Search of Elegance in the Theory and Practice of Computation, pages 291–320. Springer, 2013. [2] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB Journal, 14(4):373–396, 2005. [3] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS, 38(3): 19, 2013. [4] J. Zhang and H. Jagadish. Lost source provenance. In EDBT, pages 311– 322, 2010. [5] P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. U. Nabar, T. Sugihara, and J. Widom. Trio: A System for Data, Uncertainty, and Lineage. In VLDB, pages 1151–1154, 2006. [6] D. Gawlick and V. Radhakrishnan. Fine grain provenance using temporal databases. In TaPP, 2011. [7] G. Karvounarakis and T. Green. Semiring-annotated data: Queries and provenance. SIGMOD Record, 41(3):5–14, 2012. 32
  • 33. Q-Bomb • One pattern that arises from reenactment are long chains of SELECT clauses using CASE – Each level references attributes from next level multiple times – Subquery pull-up creates expressions of size exponential in the number of SELECT clauses – In praxis: optimization never finishes • Minimal example using one row table SELECT CASE WHEN b < 100 THEN a ELSE a + 2 END AS a, b FROM SELECT CASE WHEN b < 100 THEN a ELSE a + 2 END AS a, b … FROM SELECT CASE WHEN b < 100 THEN a ELSE a + 2 END AS a, b FROM R 33
  • 35. Example – Update Reenactment 35
  • 36. Example – Trans. Reenactment 36
  • 39. Types of Update Operations - Insert • Insert executed at time t • Updated version of R contains 1. All tuples from previous version 2. All newly inserted tuples • Fixed tuple defined in VALUES clause • Results of query over database version at t Union these two sets INSERT INTO R VALUES (v1, ... ,vn); INSERT INTO R (q); 39 (SELECT * FROM R AS OF t) UNION ALL (SELECT v1 AS a1, ... , vn AS an); (SELECT * FROM R AS OF t) UNION ALL (q(t));
  • 40. Types of Update Operations - Delete • Delete executed at time t • Tuples in updated version of R: – All tuples from for which Condition is not fulfilled DELETE FROM R WHERE C ; SELECT * FROM R AS OF t WHERE (C) IS NOT TRUE; 40
  • 41. Types of Update Operations - Update • Update executed at time t • Find tuples where Condition holds and update the attribute values • Find tuples where NOT Condition holds Union these two sets UPDATE R SET A WHERE C ; (SELECT A’ FROM R AS OF t WHERE C) UNION ALL (SELECT * FROM R AS OF t WHERE (C) IS NOT TRUE) 41
  • 42. READ COMMITTED • Statement of a transaction T sees committed changes by concurrent transaction • For a given update we need to combine – tuples produced by previous statements of same transaction – tuples produced by transactions that committed before update • Observations – Once a transaction T modifies a tuple t, no other transaction can access t until T commits – Let ui be the update executed at time x of T that first modifies t – ui will read the latest version committed x – If we know ui then updates of T before x do not have to look at t • Consider the database version 1 time unit (C-1) before commit of T – This contains all the tuple versions seen by the first update of T updating each individual tuple – Let t be a tuple version in this version and it’s start time is y – We know that updates from T which executed before y cannot have updated t – We can use version C-1 as input for reenactment as long as we hide tuple version t at y from an reenactment of an updated executed at x with x < y 42
  • 43. READ COMMITTED u1 AS (SELECT CASE WHEN Balance <=1000000 AND version <= 0 THEN 'Standard ' ELSE Type END AS Type , ID , Owner , Balance , CASE WHEN Balance <=1000000 AND version <= 0 THEN −1 ELSE version END AS version FROM USacc AS OF SCN 3652) , u2 AS (SELECT CASE WHEN Balance > 1000000 AND version <= 1 THEN 'Premium' ELSE Type END AS Type , ID , Owner , Balance , CASE WHEN Balance > 1000000 AND version <= 1 THEN −1 ELSE version END AS version FROM u1 ) SELECT ID , Owner , Balance , Type FROM u2 WHERE version = −1;43
  • 44. Database Independence • Encapsulate database-specific functionality in pluggable modules. • What needs to be adapted are : 1) Parser 2) SQL code generator 3) Metadata access 4) Audit log access 5) Time travel activation. 44
  • 45. Accessing Several Tables • Transactions Accessing Several Tables – We require user to specify which table she is interested in – Replace access to table with query for last update that modified the table 45 U1R1 R2 R3 U2 R1 U3 R3 U4 R1 R3