1. Digital Enterprise Research Institute www.deri.ie
Capturing interactive data transformation
operations using provenance workflows
Tope Omitola, Andre Freitas, Edward Curry, Sean O'Riain, Nicholas Gibbins and Nigel Shadbolt
SWPM Workshop 28.05.2012, Herakleion, Crete
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
2. Outline
Motivation
Interactive data transformations (IDTs)
IDT & Provenance
Modelling IDTs
Provenance Representation
Provenance Capture
Case Study
Conclusion
3. Motivation
Dataspaces:
High number of heterogeneous data sources
Complex data transformation environment
Need for both repeatable data transformations and once-off transformations
Traditional ETL approaches for data transformation/integration:
Based on scripting/programming
Focus on repeatable data transformation processes
4. Interactive Data Transformations (IDTs)
Based on user-interaction paradigms through which users create data transformations
Maps GUI elements to data transformation operations
Instant feedback on each iteration
Complementary to existing ETL tools
Lowers the barrier for non-programmers to perform data transformations (reduces programming effort)
Example platforms: Google Refine, Potter's Wheel, Wrangler
6. Challenges
How to model IDTs?
Facilitating the reuse of previous IDTs
Representing IDTs
Provenance
Making IDT platforms provenance-aware
Enabling transportability across IDT and ETL platforms
7. IDT & Provenance
Provenance supports the representation of interactive data transformations
Output: a provenance descriptor that shows the relationships between the inputs, the outputs, and the applied transformation operations
Both retrospective and prospective provenance
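A minimal Python sketch of such a descriptor, assuming a simple in-memory representation: the prospective part is the data-independent recipe of operations and parameters, and the retrospective part logs the concrete input and output of each executed step. The class and field names (`OperationStep`, `ProvenanceDescriptor`) and the example operations are hypothetical, not the authors' schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class OperationStep:
    operation: str              # name of the applied transformation operation
    parameters: dict[str, Any]  # parameter values selected by the user


@dataclass
class ProvenanceDescriptor:
    # Prospective provenance: the ordered recipe of steps, independent of data.
    steps: list[OperationStep] = field(default_factory=list)
    # Retrospective provenance: (input, output, step index) per executed step.
    executions: list[tuple[Any, Any, int]] = field(default_factory=list)

    def record(self, step: OperationStep, inp: Any, out: Any) -> None:
        """Append a step to the recipe and log its concrete input and output."""
        self.steps.append(step)
        self.executions.append((inp, out, len(self.steps) - 1))

    def prospective(self) -> list[tuple[str, dict[str, Any]]]:
        """Return the reusable recipe, stripped of any instance data."""
        return [(s.operation, s.parameters) for s in self.steps]


desc = ProvenanceDescriptor()
desc.record(OperationStep("trim", {"column": "name"}), ["  Ann "], ["Ann"])
desc.record(OperationStep("upper", {"column": "name"}), ["Ann"], ["ANN"])
print(desc.prospective())   # [('trim', {'column': 'name'}), ('upper', {'column': 'name'})]
print(desc.executions[-1])  # (['Ann'], ['ANN'], 1)
```

Keeping the recipe separate from the execution log is what lets the prospective part be reapplied to a different dataset, while the retrospective part documents what actually happened.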
8. IDT
IDT model
Formal model (Algebra for IDT)
Provenance representation
Provenance capture of IDTs
9. IDT Model: Core Elements
Schema and instance data
Set of predefined operations
GUI elements mapping to predefined operations
User actions
Operation selection
Parameter selection
Operation composition (workflow)
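The core elements can be illustrated with a small Python sketch, assuming a table is a list of rows keyed by column name (the column names play the role of the schema, the rows the instance data). The operations, menu labels and helpers (`to_upper`, `fill_empty`, `GUI_MAP`, `user_action`, `run_workflow`) are hypothetical examples, not the API of Google Refine or any other IDT platform.

```python
from typing import Callable

Row = dict[str, str]
Table = list[Row]


# Predefined operations: each takes a table plus user-selected parameters
# and returns a new table (the input is never mutated).
def to_upper(table: Table, column: str) -> Table:
    return [{**row, column: row[column].upper()} for row in table]


def fill_empty(table: Table, column: str, value: str) -> Table:
    return [{**row, column: row[column] or value} for row in table]


OPERATIONS: dict[str, Callable[..., Table]] = {
    "to_upper": to_upper,
    "fill_empty": fill_empty,
}

# GUI elements (hypothetical menu labels) mapped to predefined operations.
GUI_MAP = {
    "Edit cells > To uppercase": "to_upper",
    "Edit cells > Fill empty": "fill_empty",
}


def user_action(menu_item: str, **params: str) -> tuple[str, dict[str, str]]:
    """Operation selection (via a GUI element) plus parameter selection."""
    return GUI_MAP[menu_item], params


def run_workflow(table: Table, workflow: list[tuple[str, dict[str, str]]]) -> Table:
    """Operation composition: apply the selected operations in order."""
    for name, params in workflow:
        table = OPERATIONS[name](table, **params)
    return table


workflow = [
    user_action("Edit cells > Fill empty", column="city", value="unknown"),
    user_action("Edit cells > To uppercase", column="city"),
]
data = [{"city": "galway"}, {"city": ""}]
print(run_workflow(data, workflow))  # [{'city': 'GALWAY'}, {'city': 'UNKNOWN'}]
```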
11. Formalizing the mapping from IDT to Provenance
Definition 1: A provenance-based interactive data transformation engine consists of a set of transformations (or activities) on a set of datasets, generating outputs in the form of other datasets or events that may trigger further transformations
Definition 2: An interactive data transformation event consists of the input dataset, the output dataset(s), the applied transformation function, and the time the transformation took place
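Definitions 1 and 2 can be read as an engine that logs one event per applied transformation. The Python sketch below assumes a dataset is a tuple of cell values and interprets "events which may trigger further transformations" as a hypothetical `triggers` table of follow-up operations; all names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

Dataset = tuple[str, ...]  # assumption: a dataset is a tuple of cell values
Transform = Callable[[Dataset], Dataset]


@dataclass(frozen=True)
class TransformationEvent:
    """Definition 2: input dataset, output dataset(s), function, and time."""
    input_data: Dataset
    outputs: tuple[Dataset, ...]
    function: str
    time: datetime


class ProvenanceEngine:
    """Definition 1 (sketch): applies transformations and logs one event each."""

    def __init__(self) -> None:
        self.events: list[TransformationEvent] = []
        # Hypothetical: follow-up transformations triggered by an event,
        # keyed by the name of the transformation that produced it.
        self.triggers: dict[str, list[Transform]] = {}

    def apply(self, fn: Transform, data: Dataset) -> Dataset:
        out = fn(data)
        self.events.append(TransformationEvent(
            data, (out,), fn.__name__, datetime.now(timezone.utc)))
        for follow_up in self.triggers.get(fn.__name__, []):
            out = self.apply(follow_up, out)
        return out


def strip_cells(d: Dataset) -> Dataset:
    return tuple(c.strip() for c in d)


def drop_empty(d: Dataset) -> Dataset:
    return tuple(c for c in d if c)


engine = ProvenanceEngine()
engine.triggers["strip_cells"] = [drop_empty]
result = engine.apply(strip_cells, (" a ", "  ", "b"))
print(result)                               # ('a', 'b')
print([e.function for e in engine.events])  # ['strip_cells', 'drop_empty']
```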
12. Formalizing the mapping from IDT to Provenance
Definition 3: A run is a function from time to dataset(s) and the transformation applied to those dataset(s)
Definition 4: A trace is a sequence of pairs, each consisting of a run and the time the run was made
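One possible encoding of Definitions 3 and 4, sketched in Python: a run is stored as a finite function (a dict) from a timestamp to the datasets and the transformation name, and a trace is kept as a time-ordered list of (run, time) pairs. Integer timestamps and the helper names (`make_run`, `add_to_trace`, `transformation_at`) are assumptions made for the example.

```python
from bisect import bisect_right
from typing import Optional

Dataset = tuple[str, ...]
# Definition 3 (sketch): a run as a finite function, i.e. a dict from a time
# to the dataset(s) and the name of the transformation applied to them.
Run = dict[int, tuple[tuple[Dataset, ...], str]]
# Definition 4 (sketch): a trace as a time-ordered list of (run, time) pairs.
Trace = list[tuple[Run, int]]


def make_run(t: int, datasets: tuple[Dataset, ...], transformation: str) -> Run:
    return {t: (datasets, transformation)}


def add_to_trace(trace: Trace, run: Run, t: int) -> None:
    """Insert a (run, time) pair while keeping the trace sorted by time."""
    times = [time for _, time in trace]
    trace.insert(bisect_right(times, t), (run, t))


def transformation_at(trace: Trace, t: int) -> Optional[str]:
    """Name of the transformation of the latest run made at or before time t."""
    times = [time for _, time in trace]
    i = bisect_right(times, t)
    if i == 0:
        return None  # no run had been made yet
    run, time = trace[i - 1]
    return run[time][1]


trace: Trace = []
add_to_trace(trace, make_run(20, (("ANN",),), "upper"), 20)
add_to_trace(trace, make_run(10, ((" Ann ",),), "trim"), 10)
print([time for _, time in trace])   # [10, 20]
print(transformation_at(trace, 15))  # trim
print(transformation_at(trace, 5))   # None
```

Ordering the trace by time makes "what had been applied by time t" a binary search rather than a scan.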
13. Provenance Representation
Proposed in "Representing Interoperable Provenance Descriptions for ETL Workflows"
Three-layered provenance model:
Open Provenance Model Vocabulary Layer
Cogs ETL Provenance Vocabulary
Domain-Specific Model Layer
Linked Data standards
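To make the layering concrete, the Python sketch below tags each generated statement with the layer its vocabulary belongs to. The OPMV and Cogs terms are those used in the case-study descriptor; the `ex:` prefix and the `ex:appliedToColumn` property are hypothetical stand-ins for a domain-specific vocabulary, and statements are emitted as plain Turtle strings without prefix declarations.

```python
# Layer names follow the three-layered model; "ex:" is a hypothetical domain vocabulary.
LAYERS = {
    "opmv": "Open Provenance Model Vocabulary layer",
    "cogs": "Cogs ETL provenance vocabulary",
    "ex": "Domain-specific layer (hypothetical ex: terms)",
}


def describe(process: str, inp: str, out: str,
             op_name: str, column: str) -> list[tuple[str, str]]:
    """Return (layer, Turtle statement) pairs describing one transformation.

    Assumes op_name and column contain no double quotes (no escaping is done).
    """
    return [
        # Layer 1: generic provenance (what was used and generated, by what)
        ("opmv", f"{process} rdf:type opmv:Process ."),
        ("opmv", f"{inp} rdf:type opmv:Artifact ."),
        ("opmv", f"{out} rdf:type opmv:Artifact ."),
        ("opmv", f"{process} opmv:used {inp} ."),
        ("opmv", f"{out} opmv:wasGeneratedBy {process} ."),
        ("opmv", f"{out} opmv:wasDerivedFrom {inp} ."),
        # Layer 2: ETL-specific refinement of the process
        ("cogs", f"{process} rdf:type cogs:Transformation ."),
        ("cogs", f'{process} cogs:operationName "{op_name}"^^xsd:string .'),
        # Layer 3: domain-specific description (hypothetical property)
        ("ex", f'{process} ex:appliedToColumn "{column}" .'),
    ]


statements = describe("grf:op1", "grf:op1_in", "grf:op1_out",
                      "MassCellChange", "List of winners")
for layer, triple in statements:
    print(f"[{LAYERS[layer]}] {triple}")
```

Because the generic OPMV statements never depend on the upper layers, a consumer that only understands OPMV can still follow the derivation chain.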
16. Case study
Implementation over the Google Refine (GR) platform
Example descriptor
# Namespace declarations
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix opmv: <http://purl.org/net/opmv/ns#> .
@prefix cogs: <http://vocab.deri.ie/cogs#> .
@prefix grf:  <http://127.0.0.1:3333/project/1402144365904/> .

# Process; cogs:programUsed maps it to the actual program
grf:MassCellChange-1092380975 rdf:type opmv:Process, cogs:ColumnOperation, cogs:Transformation ;
    cogs:operationName "MassCellChange"^^xsd:string ;
    cogs:programUsed "com.google.refine.operations.cell.MassEditOperation"^^xsd:string ;
    rdfs:label "Mass edit 1 cells in column ==List of winners=="^^xsd:string .

# Input artifact ("/" in a prefixed local name must be escaped as "\/")
grf:MassCellChange-1092380975\/1_0 rdf:type opmv:Artifact ;
    rdfs:label "* '''1955 [[Meena Kumari]]'[[Parineeta (1953 film)|Parineeta]]''''' as '''Lolita'''"^^xsd:string .

# Output artifact
grf:MassCellChange-1092380975\/1_1 rdf:type opmv:Artifact ;
    rdfs:label "* '''John Wayne'''"^^xsd:string .

# Workflow structure
grf:MassCellChange-1092380975\/1_1 opmv:wasDerivedFrom grf:MassCellChange-1092380975\/1_0 .
grf:MassCellChange-1092380975 opmv:used grf:MassCellChange-1092380975\/1_0 .
grf:MassCellChange-1092380975\/1_1 opmv:wasGeneratedBy grf:MassCellChange-1092380975 .
grf:MassCellChange-1092380975\/1_1 opmv:wasGeneratedAt "2011-11-16T11:02:14"^^xsd:dateTime .
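Capture with low impact on the existing process can be approximated by wrapping operations rather than modifying them. The Python sketch below is a hypothetical decorator, not the authors' Google Refine extension: it calls the wrapped cell operation unchanged and appends a Turtle description modelled on the example descriptor above. It writes full IRIs in angle brackets (so the `/` in artifact names needs no escaping), assumes the `opmv:`, `cogs:`, `rdfs:` and `xsd:` prefixes are declared elsewhere, and uses a per-call counter and `/in`, `/out` suffixes as made-up naming conventions.

```python
import functools
from datetime import datetime, timezone
from typing import Callable

# Project IRI taken from the example descriptor; everything else is hypothetical.
BASE = "http://127.0.0.1:3333/project/1402144365904/"
captured: list[str] = []  # one Turtle snippet per captured call


def turtle_literal(value: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def capture(operation_name: str) -> Callable:
    """Hypothetical capture hook: records OPMV/Cogs provenance for a cell-level
    operation while leaving the operation's own behaviour unchanged."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(fn)
        def wrapper(cell: str) -> str:
            result = fn(cell)  # run the original operation as-is
            n = len(captured)  # per-call counter used to mint IRIs
            proc = f"<{BASE}{operation_name}-{n}>"
            inp = f"<{BASE}{operation_name}-{n}/in>"
            out = f"<{BASE}{operation_name}-{n}/out>"
            stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            # "a" is Turtle shorthand for rdf:type
            captured.append(
                f"{proc} a opmv:Process, cogs:Transformation ;\n"
                f"    cogs:operationName {turtle_literal(operation_name)} ;\n"
                f"    opmv:used {inp} .\n"
                f"{inp} a opmv:Artifact ; rdfs:label {turtle_literal(cell)} .\n"
                f"{out} a opmv:Artifact ; rdfs:label {turtle_literal(result)} ;\n"
                f"    opmv:wasDerivedFrom {inp} ;\n"
                f"    opmv:wasGeneratedBy {proc} ;\n"
                f'    opmv:wasGeneratedAt "{stamp}"^^xsd:dateTime .'
            )
            return result
        return wrapper
    return decorator


@capture("MassCellChange")
def mass_edit(cell: str) -> str:
    # Stand-in for a real mass-edit operation: replaces the cell value.
    return "* '''John Wayne'''"


print(mass_edit("* '''1955 [[Meena Kumari]]'''"))
print(captured[0])
```

The operation itself is untouched; only the decorator line is added, which is one way to keep provenance capture out of the user's interactive workflow.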
17. Conclusion
The proposed approach has a low impact on the existing IDT process
The provenance representation supports different data models
Preliminary implementation of a Google Refine provenance extension