1. UnifiedViews: Towards ETL Tool
for Simple yet Powerful RDF Data Management
T. Knap, P. Škoda, J. Klímek, M. Nečaský
http://xrg.cz | knap@ksi.mff.cuni.cz
XML and Web Engineering Research Group
Faculty of Mathematics and Physics
Charles University in Prague, Czech Republic
Dateso 2015
4. UnifiedViews
an Extract-Transform-Load (ETL)
framework with a UI that allows users to
define, execute, monitor, debug, schedule,
and share RDF data processing tasks
UnifiedViews differs from other ETL
frameworks by natively supporting processing
of RDF data.
5. A Pipeline
Every data processing task is modelled as a pipeline in
UnifiedViews
Every pipeline consists of one or more DPUs (data
processing units) and arrows depicting data flow
6. A Data Processing Unit (DPU)
A plugin that encapsulates certain functionality, typically on top of
RDF data
Users may prepare custom plugins
Every DPU has its inputs, outputs, business logic, and configuration
E.g., a DPU may apply a SPARQL Update query to the input RDF data and
produce output RDF data
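The DPU concept can be sketched as a configured unit of logic from input to output. This is an illustrative Python sketch only; the actual UnifiedViews API is Java (DPUs are OSGi bundles), and all names here are hypothetical:

```python
# Illustrative sketch of the DPU concept: a unit with inputs, outputs,
# configuration, and business logic. Not the real UnifiedViews API.

class Dpu:
    """A data processing unit: configured logic from input to output."""

    def __init__(self, name, config=None):
        self.name = name
        self.config = config or {}

    def execute(self, input_data):
        raise NotImplementedError


class FilterByPredicate(Dpu):
    """Toy analogue of a SPARQL-transform DPU: keep only triples whose
    predicate matches the configured value."""

    def execute(self, triples):
        wanted = self.config["predicate"]
        return [t for t in triples if t[1] == wanted]


triples = [
    ("ex:Prague", "rdf:type", "ex:City"),
    ("ex:Prague", "ex:population", "1300000"),
]
dpu = FilterByPredicate("filter", {"predicate": "rdf:type"})
print(dpu.execute(triples))  # only the rdf:type triple remains
```

A real DPU would operate on RDF graphs and be wired to other DPUs by the pipeline's data-flow edges; the point here is only the shape: configuration plus an execute step.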
7. Key Features
Web administration interface:
Define and manage pipelines
Validate, execute, monitor and debug pipelines
Possibility to schedule tasks, set up notifications about the pipeline executions
Define and manage DPUs
Possibility to debug inputs to/outputs from DPUs
Possibility to share pipelines and DPUs
Possibility to get notifications about the result of the pipeline execution
Multi-user environment
Engine running the tasks
Ensures that DPUs on the pipeline are executed in the proper order
It may send notifications about the result of the pipeline execution
Core DPUs to work with RDF data
Easy way to extend UnifiedViews with your own DPUs
Every DPU is an OSGi bundle; as a result, two DPUs using two different
versions of the same library may coexist in the framework
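The engine's ordering guarantee amounts to a topological sort of the pipeline's DPU graph: every producer runs before its consumers. A minimal sketch of that idea in Python (illustrative only; the actual engine is Java-based and all names are hypothetical):

```python
from collections import deque

def execution_order(dpus, edges):
    """Kahn's algorithm: order DPUs so every producer runs before its
    consumers. `edges` are (producer, consumer) pairs of the pipeline."""
    indeg = {d: 0 for d in dpus}
    succ = {d: [] for d in dpus}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    ready = deque(d for d in dpus if indeg[d] == 0)
    order = []
    while ready:
        d = ready.popleft()
        order.append(d)
        for n in succ[d]:
            indeg[n] -= 1
            if indeg[n] == 0:
                ready.append(n)
    if len(order) != len(dpus):
        raise ValueError("pipeline contains a cycle")
    return order

print(execution_order(["extract", "transform", "load"],
                      [("extract", "transform"), ("transform", "load")]))
# -> ['extract', 'transform', 'load']
```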
8. Impact of UnifiedViews
Projects
• OpenData.cz initiative
• INTLIB (2012-2014) – TaCR project
• LOD2 (2011-2014) – EU FP7 project
• UnifiedViews integrated into the LOD2 stack
• COMSODE (2013-2015) – EU FP7 project
• Open Data Node contains UnifiedViews
• YourDataStories (2015+), H2020
• TenForce (Belgium)
Also commercial projects:
• Semantic Web Company (Austria)
• EEA s.r.o. (SK)
10. Automatic Schema Alignment and
Object Linkage
Object Linkage:
Motivation: If various datasets use the same identifiers for the same
real-world objects (cities, countries), the level of data integration is
increased and the cost of ad-hoc application integration is reduced
Goal: To automatically discover that certain columns in the processed
tabular data represent certain types of data (e.g., cities, countries) and
to automatically map values in these columns to Linked Data URIs taken
from the preferred dataset for the given type of data
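The identification step can be sketched as matching a column's sample values against known codelists and keeping the best-scoring type. This is a toy Python illustration of the idea; the threshold, names, and data are assumptions, not the project's actual algorithm:

```python
def identify_column_type(samples, codelists, threshold=0.5):
    """Guess a column's type probabilistically: return (type_uri, score)
    for the codelist covering the largest fraction of sample values,
    or None if no codelist reaches `threshold`."""
    best = None
    for type_uri, values in codelists.items():
        hits = sum(1 for s in samples if s in values)
        score = hits / len(samples)
        if score >= threshold and (best is None or score > best[1]):
            best = (type_uri, score)
    return best

# Hypothetical codelists, e.g. a list of Czech cities.
codelists = {
    "ex:City": {"Prague", "Brno", "Ostrava"},
    "ex:Country": {"Czech Republic", "Austria", "Slovakia"},
}
print(identify_column_type(["Prague", "Brno", "Paris"], codelists))
# best match is ex:City (2 of 3 samples found in its codelist)
```

Once a column's type is identified, each value can be replaced by the Linked Data URI it matched in the preferred dataset for that type.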
Schema Alignment:
Motivation: Increase understandability of the data and simplify its reuse
by various applications by using common vocabularies
Goal: To automatically suggest mappings of the RDF vocabulary
terms used (e.g., predicates) to well-known RDF terms
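One simple family of schema-matching techniques compares term names by string similarity. A toy Python sketch of that idea, using the standard library's `difflib` (the candidate vocabulary and cutoff are assumptions; real schema matching combines many more signals):

```python
import difflib

# Hypothetical shortlist of well-known vocabulary terms.
WELL_KNOWN = {"foaf:name", "foaf:mbox", "dcterms:title", "dcterms:created"}

def suggest_mapping(term, candidates=WELL_KNOWN, cutoff=0.6):
    """Suggest the closest well-known term by local-name similarity,
    or None if nothing is similar enough."""
    local = term.split(":")[-1].lower()
    names = {c.split(":")[-1].lower(): c for c in candidates}
    match = difflib.get_close_matches(local, names, n=1, cutoff=cutoff)
    return names[match[0]] if match else None

print(suggest_mapping("myvocab:fullName"))  # -> foaf:name
```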
11. Simplicity of Use
Hiding SPARQL Queries
Goal: To provide a set of DPUs for executing typical SPARQL query
operations on top of RDF data
Autocompleting Terms from Well-known Vocabularies
Goal: To suggest and autocomplete vocabulary terms from well-known
Linked Data vocabularies
• Vocabulary autocomplete-aware controls (text boxes)
• Description of the term, formal def., recommended usage
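The control behind such a text box reduces to prefix matching over a term dictionary that carries each term's description. A minimal Python sketch (the vocabulary entries are invented examples):

```python
# Hypothetical vocabulary: term -> short description shown to the user.
VOCAB = {
    "foaf:name": "A name for some thing.",
    "foaf:knows": "A person known by this person.",
    "dcterms:title": "A name given to the resource.",
}

def autocomplete(prefix):
    """Return (term, description) pairs whose term starts with the typed
    prefix -- the idea behind an autocomplete-aware text box."""
    return sorted((t, d) for t, d in VOCAB.items() if t.startswith(prefix))

print(autocomplete("foaf:"))
```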
Wizards for Simple Definition of Data Processing Tasks
Motivation: Defining data processing tasks typically requires
detailed knowledge of the DPUs that are available in the
deployed UnifiedViews instance
Goal: Step-by-step guides for defining typical types of data
processing tasks, e.g., extracting and publishing tabular data
12. Sustainability and Quality
Sustainable RDF Data Processing
Goal: To allow the task designer to define, for each DPU, a set of
SPARQL queries which test that the output data
produced by the given DPU satisfies certain conditions. If
possible, automate the creation of such queries.
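The mechanism is a set of named checks run over each DPU's output, reporting which failed. A toy Python sketch with plain predicates standing in for the per-DPU SPARQL ASK queries (all names and data are illustrative):

```python
def validate_output(triples, checks):
    """Run each named check over a DPU's output; return the names of the
    failed checks (a toy stand-in for per-DPU SPARQL ASK queries)."""
    return [name for name, check in checks.items() if not check(triples)]

output = [
    ("ex:Prague", "rdf:type", "ex:City"),
    ("ex:Prague", "rdfs:label", "Prague"),
]
checks = {
    "has at least one triple": lambda ts: len(ts) > 0,
    "every subject is typed": lambda ts: all(
        any(p == "rdf:type" for s2, p, _ in ts if s2 == s)
        for s, _, _ in ts),
}
print(validate_output(output, checks))  # -> []  (all checks pass)
```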
Assessing Quality of Produced Data, Recommendation
of Cleansing DPUs
Motivation: The task designer should be informed about any
problems in the data, e.g., w.r.t. the syntactic/semantic
accuracy of the produced Linked Data or the completeness of
the published datasets
Goal: A set of DPUs assessing the quality of the data and
cleansing it
14. Summary
UnifiedViews – ETL tool for RDF data
processing
Basic concepts, Impact
Areas of ongoing and future work
15. Would you like to try UnifiedViews?
UnifiedViews is available under open source licenses
(GPLv3 + LGPLv3)
Hosted on GitHub
Repository: https://github.com/UnifiedView
Current latest version: UnifiedViews 2.0.1
More info:
unifiedviews.eu
17. How to contribute?
Guideline for contributors:
https://grips.semantic-web.at/display/UDDOC/Guidelines+for+Contributors
Join the UnifiedViews team
Editor's notes
Example task
It may employ custom plugins (data processing units, DPUs) created by users.
General Problem with RDF data processing:
Consumers have to write most of the logic to define, execute, monitor, schedule, and share RDF data processing tasks
An online platform for data exploitation focused on the financial flows that are critical for transparency, collaboration, and participation
To realise 1), first, it is necessary to identify that certain columns contain certain types of values; such identification is always probabilistic and is typically based on comparing the name of the column with a list of names of RDF classes and/or on matching sample data from the considered column against known codelists, such as the list of Czech cities; experiments are needed to decide the particular algorithm for identifying types in the input data. The second step to realise 1) is to apply predefined Silk~\cite{DBLP:conf/www/VolzBGK09} rules for the given identified type of data within the column of the input tabular data. To realise 2), various schema matching techniques have to be experimented with~\cite{Rahm:2001:SAA:767149.767154}.
Evolution of DPUs (Done)
Proper handling of version migrations