Enabling combined Software and Data engineering at Web-scale
1. http://aligned-project.eu ISWC, 21st October 2016, Kobe
The ALIGNED suite of Ontologies
Monika Solanki
https://w3id.org/people/msolanki
@nimonika
University of Oxford
Joint work with
Bojan Božić, Markus Freudenberg, Dimitris Kontokostas,
Christian Dirschl, Rob Brennan &
The ALIGNED consortium
Motivation
Recent years have seen a significant increase in the
demand for data-intensive applications.
Crucial technical and economic challenge ⇒ Effective,
collaborative integration of software and big data
engineering for Web-scale systems.
Current engineering techniques for building these systems
are both immature and often partitioned into software
engineering and data engineering processes, tasks or
teams.
There is a need for integrated engineering approaches
along with an underlying curatorial process to improve and
manage data over time.
monika.solanki@cs.ox.ac.uk, @nimonika The ALIGNED suite of Ontologies
Semantic models and Linked data
The expressivity of semantic models makes them useful for
both addressing data quality and applying model-driven
approaches to software engineering.
Semantic models can enable tools to easily publish
relevant meta-data about engineering processes.
Linked-data-based RESTful APIs can enable tool integration
and process or lifecycle synchronisation/communication.
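As a minimal sketch of how a tool might publish meta-data about an engineering activity as linked data, the snippet below builds a small JSON-LD document. It uses W3C PROV-O terms (`prov:Activity`, `prov:wasAssociatedWith`, `prov:used`) as a stand-in; it does not use the ALIGNED vocabularies themselves. The `http://example.org/build/` namespace, the function name and all identifiers are hypothetical.

```python
import json

# Real namespace: W3C PROV-O. Everything under EX is hypothetical.
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/build/"


def describe_activity(activity_id, agent_id, used_ids):
    """Return a JSON-LD document describing one engineering activity."""
    return {
        "@context": {"prov": PROV},
        "@id": EX + activity_id,
        "@type": "prov:Activity",
        # The tool or person responsible for the activity.
        "prov:wasAssociatedWith": {"@id": EX + agent_id},
        # The artefacts (datasets, schemas, code) the activity consumed.
        "prov:used": [{"@id": EX + used_id} for used_id in used_ids],
    }


doc = describe_activity("test-run-42", "ci-server", ["dataset-v3", "schema-v1"])
payload = json.dumps(doc, indent=2, sort_keys=True)
print(payload)
```

A second tool could fetch such a document over HTTP and react to it, which is one way the synchronisation mentioned above might be realised.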
Goal
Develop a common suite of lightweight meta-models or
vocabularies to describe both software and data engineering
system specifications and lifecycles, thereby creating a
common technical space for tools to easily publish relevant
meta-data about systems engineering.
Contributions
The ALIGNED* suite of ontologies
Specifically designed to model the information exchange
needs of combined software and data engineering.
Aims to align the divergent processes encapsulating data
and software engineering.
Deployed for validation and incremental improvement in
the ALIGNED project on four large-scale, data-intensive
systems engineering use cases.
Improves productivity, agility and quality.
*http://aligned-project.eu
The ALIGNED suite of ontologies
Provides support for
Semantics-based model-driven software engineering
Data quality engineering techniques
Development of tools for unified views of software and data
engineering processes
Software/data test case interlinking
ALIGNED Suite: Overview
ALIGNED Suite: Design Intents
Design Intent Ontology (DIO) documents the design decisions
about data intensive system artefacts such as requirements,
designs or datasets.
Available at: https://w3id.org/dio
ALIGNED Suite: Software Engineering
Defines the major agents, activities and entities involved in a
software engineering project and their relations with a special
focus on capturing the engineering lifecycle.
Available at: https://w3id.org/slo
https://w3id.org/sip
ALIGNED Suite: Data Engineering (1)
DLO is the basis for deriving specific domain ontologies which
represent lifecycles of concrete data engineering projects:
DBpedia and Seshat.
Available at: https://w3id.org/dlo
ALIGNED Suite: Data Engineering (2)
DataID is a multi-layered meta-data system which, at its core,
describes datasets and their different manifestations.
Available at: http://dataid.dbpedia.org/ns/core
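To illustrate what "a dataset and its different manifestations" can look like as RDF, here is a small sketch that emits N-Triples for one dataset with two distributions. It uses plain W3C DCAT and Dublin Core terms (`dcat:Dataset`, `dcat:distribution`, `dcat:Distribution`, `dcat:downloadURL`, `dcterms:title`) rather than DataID's own terms, and the `http://example.org/` IRIs, helper names and sample values are hypothetical.

```python
# Real namespaces: W3C DCAT, Dublin Core terms, RDF. EX is hypothetical.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/data/"


def iri(value):
    """Wrap an IRI in N-Triples angle brackets."""
    return "<" + value + ">"


def literal(value):
    """Serialise a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def dataset_triples(name, title, distributions):
    """distributions: list of (suffix, download_url, label) tuples."""
    ds = iri(EX + name)
    triples = [
        (ds, iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (ds, iri(DCT + "title"), literal(title)),
    ]
    for suffix, url, label in distributions:
        dist = iri(EX + name + "/" + suffix)
        triples += [
            (ds, iri(DCAT + "distribution"), dist),
            (dist, iri(RDF_TYPE), iri(DCAT + "Distribution")),
            (dist, iri(DCAT + "downloadURL"), iri(url)),
            (dist, iri(DCT + "title"), literal(label)),
        ]
    return triples


def to_ntriples(triples):
    """Join (subject, predicate, object) tuples into N-Triples lines."""
    return "\n".join(" ".join(t) + " ." for t in triples)


triples = dataset_triples(
    "demo",
    "Demo dataset",
    [
        ("ttl", "http://example.org/files/demo.ttl", "Turtle dump"),
        ("nt", "http://example.org/files/demo.nt", "N-Triples dump"),
    ],
)
output = to_ntriples(triples)
print(output)
```

Each distribution is one manifestation of the same dataset, so adding a further serialisation only adds four triples and leaves the dataset description untouched.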
ALIGNED Suite: Unified quality reports (1)
Defines a unified reporting representation for data quality
metrics, ontology reasoning errors, test cases, and test case
results based on the W3C SHACL reporting vocabulary.
RUT is designed to capture the lifecycle of RDF validation with
the test-driven validation methodology.
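The idea of a unified report built on the SHACL reporting vocabulary can be sketched as follows. The snippet collects findings from any checker into one JSON-LD-style report using standard SHACL terms (`sh:ValidationReport`, `sh:conforms`, `sh:result`, `sh:ValidationResult`, `sh:focusNode`, `sh:resultSeverity`, `sh:resultMessage`). It does not use RUT's own terms; the `Finding` class, the `build_report` function and the sample IRIs are hypothetical.

```python
from dataclasses import dataclass

# Real namespace: W3C SHACL. Class and function names below are hypothetical.
SH = "http://www.w3.org/ns/shacl#"


@dataclass
class Finding:
    """One problem reported by a test case, metric or reasoner."""
    focus_node: str          # IRI of the resource the finding is about
    message: str             # human-readable explanation
    severity: str = "Violation"  # "Info", "Warning" or "Violation"


def build_report(findings):
    """Wrap findings from any tool in a single SHACL-style report."""
    return {
        "@context": {"sh": SH},
        "@type": "sh:ValidationReport",
        # In SHACL, a report conforms only if it has no results at all.
        "sh:conforms": not findings,
        "sh:result": [
            {
                "@type": "sh:ValidationResult",
                "sh:focusNode": {"@id": f.focus_node},
                "sh:resultSeverity": {"@id": "sh:" + f.severity},
                "sh:resultMessage": f.message,
            }
            for f in findings
        ],
    }


report = build_report([
    Finding("http://example.org/doc/1", "document has no title"),
    Finding("http://example.org/doc/2", "label is very long", "Warning"),
])
empty_report = build_report([])
print(report["sh:conforms"], len(report["sh:result"]))
```

Because every tool emits the same report shape, downstream dashboards or repair steps only need to understand one format.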
ALIGNED Suite: Unified quality reports (2)
RVO describes both ABox and TBox reasoning errors for the
integration of reasoners into data lifecycle tool-chains.
Available at: https://w3id.org/rvo
ALIGNED Suite: Domain Models
Enterprise information processing: extensions and models for the
JURION use case.
E-research in the Social Sciences and Humanities: extensions and
models for the Seshat use case.
Crowd-sourced public datasets: extensions and models for the
DBpedia use case.
Enterprise software development: extensions and models for the
PoolParty use case.
Use case: Wolters Kluwer’s JURION
JURION: an innovative legal information platform
developed by Wolters Kluwer Germany
Merges and interlinks over 1 million documents of content
and data from diverse sources.
Data is presented to users, e.g. law offices.
Data lifecycle stages: extraction, storage, authoring,
interlinking, enrichment, quality analysis, repair and
publication.
Information processing pipeline ⇒ highly customised
applications for legal information retrieval, alerts, analysis
and semantic search.
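The data lifecycle stages listed above can be captured in code so that tools agree on stage names and ordering. The sketch below is illustrative only: it assumes the stages run strictly in the order given on the slide, which the source does not state, and the `Stage` and `next_stage` names are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Data lifecycle stages, numbered in the order listed on the slide."""
    EXTRACTION = 1
    STORAGE = 2
    AUTHORING = 3
    INTERLINKING = 4
    ENRICHMENT = 5
    QUALITY_ANALYSIS = 6
    REPAIR = 7
    PUBLICATION = 8


def next_stage(stage):
    """Return the following stage, or None once the data is published.

    Assumption: the lifecycle is strictly linear.
    """
    if stage is Stage.PUBLICATION:
        return None
    return Stage(stage + 1)


# Walk the whole pipeline from the first stage to the last.
path = []
current = Stage.EXTRACTION
while current is not None:
    path.append(current.name)
    current = next_stage(current)
print(" -> ".join(path))
```

A real pipeline would likely allow loops, for example from quality analysis back to repair; a linear order keeps the sketch short.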
Use case: Wolters Kluwer’s JURION
Challenge
Currently, the software development process and data lifecycle
are highly independent of each other and require extensive
manual management to coordinate their parallel development,
leading to higher costs, quality issues and a slower
time-to-market.
Wolters Kluwer’s JURION
Evaluation (1)
Generic criteria and their evaluation:
Value addition
- enrich information about process-specific procedures for a tool by adding data and software engineering specific metadata
- add context-dependent information for enabling automation in tools
Potential users
- community of content producers, owners of large amounts of data, data managers, ontology engineers
- software development model designers and developers of human societies datasets
Evaluation (2)
Generic criteria and their evaluation:
Availability
- https://w3id.org/*
- http://aligned-project.eu
- https://github.com/aligned-h2020/ALIGNED_Ontologies
Sustainability
- Long-term sustainability has been assured by TCD and the ontology engineers involved in the design
Evaluation (3)
Generic criteria and their evaluation:
Design and technical quality
- Designed in accordance with ontology engineering principles
- Axiomatisations based on the competency questions identified during requirements scoping from potential exploiting applications
Documentation
- The ALIGNED public deliverables and publications
- Self-documentation
- HTML documentation via the LODE service
- Graphical illustrations
Conclusions
Combining data and software engineering processes to
increase productivity and agility is a challenge.
The proposed ALIGNED suite of ontologies provides
semantic models of design intents, domain specific
datasets, software engineering processes, quality
heuristics and error handling mechanisms.
The ALIGNED suite contributes towards
enabling interoperability and alleviating some of the
complexities involved.
We have exemplified the usage of the suite on a real-world
use case from the legal domain and evaluated it against
the desired criteria.