This document discusses the generation of linked data platforms (LDPs) in highly decentralized information ecosystems. It presents a model for automating the generation of LDPs that considers data heterogeneity, hosting constraints, and reusability of LDP designs. The model includes an LDP generation workflow, a design language called LDP-DL to describe LDP designs, and an LDP generation toolkit to implement the workflow. The goal is to facilitate data exploitation for consumers in decentralized environments.
PhD Thesis Defense
Institut Mines-Télécom
Generation of Linked Data Platforms in Highly Decentralized Information Ecosystem
Mohammad Noorani BAKERALLY
Institut Henri Fayol, EMSE,
Connected intelligence, Laboratoire Hubert Curien, UMR CNRS 5516
December 20, 2018
Highly Decentralized Information Ecosystem
[Figure: actors of the ecosystem: data publishers (with <<owns>> relations), data providers, data portals, data sources, web services, developers and data consumers]
A highly decentralized information ecosystem is an information ecosystem consisting of information systems managed by actors that are self-governed, with little to no coordination between them, e.g. the open data context, the Web, or an organizational information ecosystem.
Problems
■ Data heterogeneity levels:
• Syntax
• Semantics
• Access
■ Hosting constraints, which prevent data from being hosted in third-party software environments.
• Examples:
─ Data sources bound by license restrictions
─ Real-time data sources
Aim
■ Facilitate data exploitation for data consumers in highly decentralized information ecosystems
■ Through the publication of interoperable data and semantics by data publishers
Requirements for Data Interoperability
■ Syntax
• A uniform identification mechanism to refer to resources
• Flexibility w.r.t. the description of resources having varying structures
■ Semantics
• Ontology languages to make semantics explicit
• Semantics in syntax to make data self-described and portable
■ Access
• High-level protocols to hide the heterogeneity of platforms
• Uniform data access to facilitate data exploitation
[Figure: open standards in a highly decentralized information ecosystem]
Outline
■ Semantic Web
■ Linked Data Platform Generation Model
■ Linked Data Platform Generation Toolkit
■ Evaluation
■ Conclusion & Perspectives
Semantic Web w.r.t. Data Syntax & Semantics
■ Data Syntax: RDF [CWL14]
• 😃 Uniform identification mechanism
─ Uniform Resource Identifier (URI)
• 😃 Flexibility
─ Schema-less
■ Data Semantics: RDFS [BG14] and OWL [W3C12]
• 😃 Ontology languages
─ RDFS and OWL are ontology languages
• 😃 Semantics in syntax
─ RDFS and OWL can be serialized in RDF
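The two RDF properties above can be illustrated with a minimal sketch. It models an RDF graph as a plain Python set of (subject, predicate, object) triples. Several parts are assumptions for illustration only: the http://example.org/ IRIs and the Parking vocabulary are hypothetical, and literals are represented as plain strings for brevity. A real application would use an RDF library and a serialization such as Turtle.

```python
# Minimal sketch: an RDF graph as a set of (subject, predicate, object) triples.
# Every resource and property is identified by an IRI (uniform identification).
# The http://example.org/ IRIs are hypothetical; literals are plain strings here.
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"

graph: Set[Triple] = {
    # Two resources of the same (hypothetical) class...
    (EX + "parking1", RDF_TYPE, EX + "Parking"),
    (EX + "parking2", RDF_TYPE, EX + "Parking"),
    # ...described with different properties: no schema forces a fixed structure.
    (EX + "parking1", EX + "capacity", "120"),
    (EX + "parking2", EX + "openingHours", "24/7"),
    (EX + "parking2", EX + "operator", EX + "cityOfParis"),
}


def properties_of(g: Set[Triple], subject: str) -> Set[str]:
    """Return the predicates used to describe `subject` in graph `g`."""
    return {p for (s, p, _o) in g if s == subject}


for resource in ("parking1", "parking2"):
    print(resource, len(properties_of(graph, EX + resource)))  # parking1 2, then parking2 3
```

Both parkings live in the same graph even though they share only the rdf:type property.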
Semantic Web for Data Access
■ SPARQL [Gro13]: standard query language for RDF
• 😃 High-level protocol
─ SPARQL 1.1 Protocol
• 😃 Uniform data access
─ Formal syntax and semantics
■ SPARQL is only for querying (data consumers) rather than for publishing data (data publishers)
[Figure: Model / View / Controller, with the query languages XQuery, SQL and SPARQL]
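The SPARQL 1.1 Protocol mentioned above sends a query over HTTP. One of its options is an HTTP GET request that carries the URL-encoded query in a `query` parameter. The sketch below only builds such a request with Python's standard library and does not send it. The endpoint URL is a hypothetical placeholder, and the query is an illustrative example.

```python
# Sketch: build a SPARQL 1.1 Protocol "query via GET" request (nothing is sent).
# The endpoint URL is a hypothetical placeholder.
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?catalog WHERE { ?catalog a dcat:Catalog }
"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Put the URL-encoded query in the `query` parameter of a GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL 1.1 Query Results JSON format.
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


req = sparql_get_request(ENDPOINT, QUERY)
print(req.get_method())  # GET
# The query survives the round trip through URL encoding:
print(parse_qs(urlparse(req.full_url).query)["query"][0] == QUERY)  # True
# urllib.request.urlopen(req) would send it to a real endpoint.
```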
Semantic Web for Data Access
■ Linked Data principles [BL06]: provide RESTful access to data in RDF
• High-level protocol
─ Operates on HTTP
• Uniform data access
─ Describes data using a set of standards (RDF, Turtle, etc.)
─ Leaves some choices open (e.g. the default RDF serialization)
■ Linked Data Platform 1.0 [SAM15c]: standardizes RESTful access to data in RDF
• 😃 High-level protocol
─ Standardizes interactions on top of HTTP
• 😃 Uniform data access
─ Provides a domain model and an interaction model
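Part of what LDP standardizes on top of HTTP is how a server advertises a resource's type. Responses carry `Link` headers with `rel="type"` that point to LDP vocabulary terms such as `ldp:BasicContainer` or `ldp:Resource`. The client-side sketch below reads such a header and reports the most specific LDP type it finds.

It is a simplification with these limits:
- The regular expression assumes `rel` is the first parameter of each link.
- It does not implement the full `Link` header grammar.
- The function names are hypothetical.

```python
# Sketch: infer the most specific LDP type from a response's Link header.
# Simplified parsing (assumes rel is the first link parameter); names are hypothetical.
import re
from typing import List, Optional

LDP = "http://www.w3.org/ns/ldp#"

# Matches  <target>; rel="value"  or  <target>; rel=value
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]*)"?')

# Ordered from most specific to least specific, following the LDP type hierarchy.
_LDP_TYPES = [
    "BasicContainer", "DirectContainer", "IndirectContainer",
    "Container", "RDFSource", "NonRDFSource", "Resource",
]


def link_types(link_header: str) -> List[str]:
    """Return the targets of all links whose relation types include 'type'."""
    return [target for target, rels in _LINK_RE.findall(link_header)
            if "type" in rels.split()]


def interaction_model(link_header: str) -> Optional[str]:
    """Return the most specific LDP type advertised, or None if there is none."""
    advertised = set(link_types(link_header))
    for name in _LDP_TYPES:
        if LDP + name in advertised:
            return name
    return None


header = ('<http://www.w3.org/ns/ldp#BasicContainer>; rel="type", '
          '<http://www.w3.org/ns/ldp#Resource>; rel="type"')
print(interaction_model(header))  # BasicContainer
```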
Linked Data Platform 1.0
■ Domain Model
• Defines different types of LDP resources
• Used to describe resources on LDPs
■ Interaction Model
• Well-defined HTTP methods for CRUD operations on LDP resources
[Figure: hierarchy of LDP resource types]
LDP Resource
├─ LDP RDF Source
│   └─ LDP Container
│       ├─ LDP Basic Container
│       ├─ LDP Direct Container
│       └─ LDP Indirect Container
└─ LDP Non-RDF Source
■ LDP Standard: Linked Data Platform 1.0
■ LDPs: data platforms implementing the LDP Standard
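The sketch below illustrates the interaction model on the simplest container type. It simulates in memory how HTTP methods act on an LDP Basic Container:
- POST creates a new member, and the server records an `ldp:contains` triple.
- GET reads a resource's graph.
- PUT replaces that graph.
- DELETE removes the resource together with its containment triple.

It is a hypothetical toy, not an implementation of the LDP 1.0 specification:
- Class and method names are invented.
- There is no HTTP layer.
- Only plausible status codes are mimicked.
- An empty string stands for the "this resource" relative IRI in request bodies.

```python
# Toy, in-memory simulation of CRUD on an LDP Basic Container (no HTTP layer).
# Class/method names are hypothetical; triples are (subject, predicate, object) strings.
from typing import Dict, Optional, Set, Tuple

LDP = "http://www.w3.org/ns/ldp#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]


def _resolve(body: Set[Triple], iri: str) -> Set[Triple]:
    """Replace the empty string ('this resource', a relative IRI) by `iri`."""
    return {(s or iri, p, o or iri) for (s, p, o) in body}


class ToyBasicContainerServer:
    def __init__(self, root: str) -> None:
        self.root = root
        self.graphs: Dict[str, Set[Triple]] = {
            root: {(root, RDF_TYPE, LDP + "BasicContainer")}
        }

    def post(self, container: str, slug: str, body: Set[Triple]) -> Tuple[int, Optional[str]]:
        """Create a member of `container`; return (status, IRI of the new resource)."""
        if container not in self.graphs:
            return 404, None
        if (container, RDF_TYPE, LDP + "BasicContainer") not in self.graphs[container]:
            return 405, None  # only containers accept POST in this toy
        base = container.rstrip("/") + "/" + slug
        new_iri, n = base, 1
        while new_iri in self.graphs:  # slug already taken: the server picks another IRI
            n += 1
            new_iri = f"{base}-{n}"
        self.graphs[new_iri] = _resolve(body, new_iri)
        self.graphs[container].add((container, LDP + "contains", new_iri))
        return 201, new_iri  # a real server would also send a Location header

    def get(self, iri: str) -> Tuple[int, Set[Triple]]:
        if iri not in self.graphs:
            return 404, set()
        return 200, set(self.graphs[iri])

    def put(self, iri: str, body: Set[Triple]) -> int:
        """Replace the graph of an existing member (creation via PUT is not modelled)."""
        if iri not in self.graphs:
            return 404
        if iri == self.root:
            return 409  # the container's graph is server-managed in this toy
        self.graphs[iri] = _resolve(body, iri)
        return 204

    def delete(self, iri: str) -> int:
        if iri not in self.graphs:
            return 404
        if iri == self.root:
            return 409
        del self.graphs[iri]
        for triples in self.graphs.values():  # drop containment triples pointing to it
            triples.difference_update({t for t in triples
                                       if t[1] == LDP + "contains" and t[2] == iri})
        return 204


root = "http://example.org/catalogs/"
server = ToyBasicContainerServer(root)
print(server.post(root, "parking", {("", "http://purl.org/dc/terms/title", "Parking")}))
# (201, 'http://example.org/catalogs/parking')
```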
Satisfaction of Requirements for Data Interoperability
■ RDF for Data Syntax
• Uniform identification mechanism
• Flexibility
■ RDFS/OWL for Data Semantics
• Ontology languages
• Semantics in syntax
■ LDP Standard for Data Access
• High-level protocols
• Uniform data access
[Figure: Semantic Web standards as the open standards of the ecosystem]
LDP Related Work
■ Usage of LDP
• Linked Data Platform as a novel approach for Enterprise Application Integration [MGG13]
• Music SOFA: An architecture for semantically informed recomposition of Digital Music Objects [DDR18]
• ECA2LD: Generating Linked Data from Entity-Component-Attribute runtimes [TRM18]
• Linking the Web of Things: LDP-CoAP Mapping [LIG+16]
■ Custom Generation of LDPs
• Morph-LDP: An R2RML-based Linked Data Platform implementation [MPC+14]
• A Linked Data Platform adapter for the Bugzilla issue tracker [MGG14]
■ LDP Implementations:
• LDP Resource Management Systems: generic LDP servers
• LDP Frameworks: tools for developing LDP servers
LDP Implementations
■ LDP Resource Management Systems:
• Generic LDP servers for storing, retrieving and manipulating LDP resources through HTTP methods
• e.g. OpenLink Virtuoso Server, Apache Marmotta, Fedora Commons
■ LDP Frameworks:
• APIs facilitating the manual development of LDPs
• e.g. LDP4j [EGMGC14], Eclipse Lyo
[Figure: RDF data sources → LDP resource generator → LDP resources]
Generation of LDPs
■ Design: define the data design, i.e. how data is organized according to the domain model
• Problems:
─ The definition is manual
■ Implementation: encode the data design in an LDP resource generator
• Problems:
─ Tight coupling between design and implementation, hindering the maintainability and the reusability of the design
■ Deployment: deploy the LDP server and the data
• Problems:
─ Heterogeneity: no support for non-RDF data sources
─ Hosting constraints
State of the Art: Synthesis
■ The problems w.r.t. data exploitation in highly decentralized information ecosystems are data heterogeneity and hosting constraints
■ Semantic Web standards (RDF, RDFS/OWL, LDP) satisfy the requirements for data interoperability
■ But generating LDPs from existing RDF data sources is a complex task:
• No support for non-RDF data sources
• No support for hosting constraints
• Manual development produces a tight coupling between data design and implementation
─ Reusability and maintainability of LDP designs are strongly limited
Objective
■ Automate the generation of LDPs in highly decentralized information ecosystems using Semantic Web technologies, considering the following constraints:
• Data heterogeneity
• Hosting constraints
• LDP design reusability
LDP Dataset
■ An LDP dataset consists of:
• A set of container structures (n, g, M), where:
─ n is the IRI of the container
─ g is its RDF graph
─ M is a set of IRIs representing the members of container n
• A set of named graphs (n, g), where:
─ n is the IRI of a non-container
─ g is its RDF graph
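The definition above maps directly onto two record types. The sketch below is one possible encoding. It assumes IRIs and RDF terms are plain strings and graphs are sets of triples. The class and method names are hypothetical, not those of the LDP-DL toolkit. Prefixed names such as dex:paris-catalog are used as plain strings for readability.

```python
# One possible encoding of an LDP dataset: container structures (n, g, M) and
# named graphs (n, g). Class/method names are hypothetical; IRIs are plain strings.
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class ContainerStructure:
    n: str                 # IRI of the container
    g: FrozenSet[Triple]   # its RDF graph
    M: FrozenSet[str]      # IRIs of the members of container n


@dataclass(frozen=True)
class NamedGraph:
    n: str                 # IRI of the non-container
    g: FrozenSet[Triple]   # its RDF graph


@dataclass
class LDPDataset:
    containers: List[ContainerStructure] = field(default_factory=list)
    non_containers: List[NamedGraph] = field(default_factory=list)

    def iris(self) -> Set[str]:
        """IRIs of all LDP resources (containers and non-containers)."""
        return {c.n for c in self.containers} | {nc.n for nc in self.non_containers}

    def graph_of(self, iri: str) -> Optional[FrozenSet[Triple]]:
        """RDF graph of the resource `iri`, or None if it is not in the dataset."""
        for resource in [*self.containers, *self.non_containers]:
            if resource.n == iri:
                return resource.g
        return None

    def members_of(self, iri: str) -> FrozenSet[str]:
        """Members M of container `iri` (empty if `iri` is not a container)."""
        for c in self.containers:
            if c.n == iri:
                return c.M
        return frozenset()


dataset = LDPDataset(
    containers=[ContainerStructure(
        n="dex:paris-catalog",
        g=frozenset({("dex:paris-catalog", "foaf:primaryTopic", "ex:paris-catalog")}),
        M=frozenset({"dex:parking", "dex:busStation"}),
    )],
    non_containers=[
        NamedGraph("dex:parking", frozenset({("dex:parking", "foaf:primaryTopic", "ex:parking")})),
        NamedGraph("dex:busStation", frozenset()),
    ],
)
print(sorted(dataset.members_of("dex:paris-catalog")))  # ['dex:busStation', 'dex:parking']
```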
LDP-DL: Overview
[Figure: data source → LDP generation workflow → LDP dataset]
Data design questions:
■ What are the LDP resources w.r.t. the resources of the data source?
■ What is the structure of containers/non-containers?
■ What is the content of containers/non-containers?
Example: RDF graph of the LDP resource dex:paris-catalog, whose related resource is ex:paris-catalog:
dex:paris-catalog a ldp:BasicContainer ;
    foaf:primaryTopic ex:paris-catalog ;
    ldp:contains dex:parking, dex:busStation .
ex:paris-catalog a dcat:Catalog ;
    dcat:keyword "paris", "dataset" ;
    …….
An LDP design language describes LDP resources:
■ IRIs
■ Organization in containers
■ Content (graph)
■ Members of containers
LDP-DL: Syntax
■ ResourceMap:
• Related resources are identified by a query pattern
• The RDF graph of an LDP resource is described by a CONSTRUCT query
■ NonContainerMap: describes non-containers
■ ContainerMap: describes containers and their members (containers or non-containers)
■ DataSource: describes
• RDF sources, using their IRIs
• Non-RDF sources, using:
─ the IRIs of the data sources
─ the IRIs of the lifting rules
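The sketch below shows how the four constructs fit together in a hypothetical in-memory model of a design document:
- A ContainerMap nests the maps describing its members.
- Every map carries a query pattern and a CONSTRUCT query.
- A DataSource is either an RDF source (an IRI) or a non-RDF source (an IRI plus the IRI of a lifting rule).

This is not the concrete LDP-DL syntax used by the toolkit. The Python class and field names, the variable names (?resource, ?parent) and the example IRIs are all assumptions made for illustration.

```python
# Hypothetical model of the main LDP-DL constructs (not the concrete LDP-DL syntax).
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class DataSource:
    iri: str                            # IRI of the data source
    lifting_rule: Optional[str] = None  # IRI of a lifting rule, for non-RDF sources

    @property
    def is_rdf(self) -> bool:
        return self.lifting_rule is None


@dataclass
class ResourceMap:
    query_pattern: str    # identifies the related resources
    construct_query: str  # describes the RDF graph of each generated LDP resource


@dataclass
class NonContainerMap(ResourceMap):
    pass


@dataclass
class ContainerMap(ResourceMap):
    # Maps describing the members (containers or non-containers).
    members: List[Union["ContainerMap", NonContainerMap]] = field(default_factory=list)


@dataclass
class DesignDocument:
    container_maps: List[ContainerMap]
    data_sources: List[DataSource]


def count_maps(m: ResourceMap) -> int:
    """Count a map and, recursively, the maps describing its members."""
    if isinstance(m, ContainerMap):
        return 1 + sum(count_maps(child) for child in m.members)
    return 1


datasets = NonContainerMap(
    query_pattern="?parent dcat:dataset ?resource .",  # hypothetical variable names
    construct_query="CONSTRUCT { ?resource ?p ?o } WHERE { ?resource ?p ?o }",
)
catalogs = ContainerMap(
    query_pattern="?resource a dcat:Catalog .",
    construct_query="CONSTRUCT { ?resource ?p ?o } WHERE { ?resource ?p ?o }",
    members=[datasets],
)
design = DesignDocument(
    container_maps=[catalogs],
    data_sources=[
        DataSource("http://example.org/catalog.ttl"),  # RDF source
        DataSource("http://example.org/parkings.csv",  # non-RDF source...
                   "http://example.org/lifting-rule"),  # ...with its lifting rule
    ],
)
print(sum(count_maps(m) for m in design.container_maps))  # 2
print([s.is_rdf for s in design.data_sources])  # [True, False]
```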
LDP-DL Formal Semantics
■ Given an interpretation I and a design document d, we define an LDP dataset that we call the evaluation of d w.r.t. I
■ An LDP dataset D is valid w.r.t. d iff there exists an interpretation I such that I ⊧ d and D is the evaluation of d w.r.t. I
■ We provide an algorithm that generates LDP datasets that are provably valid w.r.t. the input design documents
Handling Hosting Constraints
■ A dynamic LDP dataset stores instructions to generate the graphs of LDP resources
■ Using a dynamic LDP dataset:
• The LDP dataset is generated at deployment time
• The graphs of LDP resources are generated at query time
■ This deals with the dynamicity of data sources and with hosting constraints
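The idea behind a dynamic LDP dataset can be sketched as follows. For each LDP resource IRI, the dataset stores a function that computes the resource's graph, and it calls that function only when the resource is requested. In the sketch, `fetch_occupancy` is a hypothetical stand-in for a real-time data source. It is implemented as a counter so that successive requests visibly differ. The dataset never copies or stores the source's data; it only keeps the instructions.

```python
# Sketch of a dynamic LDP dataset: it stores *instructions* (here, Python functions)
# to build each resource's graph and runs them at query time. Names are hypothetical.
import itertools
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
GraphInstruction = Callable[[], Set[Triple]]


class DynamicLDPDataset:
    def __init__(self) -> None:
        self._instructions: Dict[str, GraphInstruction] = {}

    def register(self, iri: str, instruction: GraphInstruction) -> None:
        """Deployment time: record how to build the graph of `iri`."""
        self._instructions[iri] = instruction

    def graph(self, iri: str) -> Set[Triple]:
        """Query time: run the stored instruction to obtain a fresh graph."""
        return self._instructions[iri]()


# Hypothetical real-time source: each call returns a new occupancy reading.
_readings = itertools.count(start=40)


def fetch_occupancy() -> int:
    return next(_readings)


dataset = DynamicLDPDataset()
dataset.register(
    "dex:parking",
    lambda: {("dex:parking", "ex:occupancy", str(fetch_occupancy()))},
)
print(dataset.graph("dex:parking"))  # {('dex:parking', 'ex:occupancy', '40')}
print(dataset.graph("dex:parking"))  # {('dex:parking', 'ex:occupancy', '41')}
```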
LDP Generation Toolkit
*Lefrançois, Maxime, Antoine Zimmermann, and Noorani Bakerally. "A SPARQL extension for generating RDF from heterogeneous formats." European Semantic Web Conference. Springer, Cham, 2017.
Evaluation
■ Objective: automate the generation of LDPs in highly decentralized information ecosystems using Semantic Web technologies, considering the following constraints:
• Data heterogeneity
• Hosting constraints
• LDP design reusability
■ The evaluation criteria are derived from this objective
Evaluation: Experiment Settings
■ 8 design documents
■ 28 data sources
• RDF data sources:
─ Open data catalogs from 21 data portals
─ BBC wildlife dataset
─ LodPaddle
• Heterogeneous data sources (JSON, CSV)
• Real-time data sources (JSON, CSV)
■ GitHub: https://github.com/noorbakerally/LDPDatasetExamples
■ Performance was tested using a simple design document and different data sources of up to 1 million triples
• Performance is approximately linear
Evaluation: LDP Design Reusability
■ Domain design reusability experiment: the same design document with varying data sources structured with the same ontology
■ Generic design reusability experiment: the same design document with varying data sources structured with different ontologies
Outline
■ Semantic Web
■ LDP Generation Model
• LDP Generation Workflow
• LDP Design Language
■ LDP Generation Toolkit
■ Evaluation
■ Conclusion & Perspectives
Conclusion: Context
■ Definition of highly decentralized information ecosystems
• Identification of problems w.r.t. data exploitation
• Identification of requirements for data interoperability
■ Semantic Web standards as foundations to facilitate data publication
■ Data exploitation may be facilitated by providing tools to data publishers rather than only to data consumers
Conclusion: Summary of Contributions
■ LDP Generation Workflow
• LDP Design Language with:
─ A formal syntax to write LDP design documents
─ A formal semantics to properly interpret LDP design documents
• LDP Dataset
■ LDP Generation Toolkit: implementation of the LDP Generation Workflow
■ Evaluation of the LDP Generation Toolkit w.r.t. data heterogeneity, hosting constraints and LDP design reusability
Conclusion: Limitations
■ Partial coverage of the LDP standard (e.g. Direct and Indirect Containers are not considered)
■ Limited handling of hosting constraints
■ Manual generation of LDP design documents
■ Manual generation of lifting rules
Perspectives
■ Enrich design aspects in the LDP-DL model
• Consider Direct & Indirect Containers
• Provide deployment constructs to describe aspects such as:
─ Access rights
─ Paging
■ Generate Linked Data based on the Data on the Web Best Practices [LBC17]
■ Provide an LDP generation methodology
■ Evaluate with real users of LDPs
References
[BG14] Dan Brickley and Ramanathan V. Guha. RDF Schema 1.1. W3C Recommendation, World Wide Web Consortium (W3C), February 25 2014.
[BL06] Tim Berners-Lee. Linked Data - Design Issues, 2006.
[CWL14] R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation, World Wide Web Consortium (W3C), February 25 2014.
[DDR18] De Roure, David, et al. "Music SOFA: An architecture for semantically informed recomposition of digital music objects." Proceedings of the 1st International Workshop on Semantic Applications for Audio and Music. ACM, 2018.
[FR07] R. B. France and B. Rumpe. Model-driven development of complex software: A research roadmap. In FOSE, 2007.
[Gro13] W3C SPARQL Working Group. SPARQL 1.1 Overview. W3C Recommendation, World Wide Web Consortium (W3C), March 21 2013.
[LIG+16] Loseto, Giuseppe, et al. "Linking the web of things: LDP-CoAP mapping." Procedia Computer Science 83 (2016): 1182-1187.
[MGG13] Mihindukulasooriya, Nandana, Raúl García-Castro, and Miguel Esteban Gutiérrez. "Linked Data Platform as a novel approach for Enterprise Application Integration." COLD, 2013.
[MGG14] Mihindukulasooriya, Nandana Sampath, Miguel Esteban Gutiérrez, and Raúl García Castro. "A Linked Data Platform adapter for the Bugzilla issue tracker." (2014): 89-92.
[MPC+14] Mihindukulasooriya, Nandana, et al. "morph-LDP: an R2RML-based linked data platform implementation." European Semantic Web Conference. Springer, Cham, 2014.
[SAM15c] Steve Speicher, John Arwe, and Ashok Malhotra. Linked Data Platform 1.0. W3C Recommendation, World Wide Web Consortium (W3C), February 26 2015.
[SVB+06] T. Stahl, M. Völter, J. Bettin, A. Haase, and S. Helsen. Model-driven software development: technology, engineering, management. Pitman, 2006.
[TRM18] Spieldenner, T., Schubotz, R., and Guldner, M. "ECA2LD: Generating Linked Data from Entity-Component-Attribute runtimes." 2018 Global Internet of Things Summit (GIoTS), pp. 1-4. IEEE, June 2018.
[W3C12] W3C OWL Working Group. OWL 2 Web Ontology Language Document Overview (Second Edition). W3C Recommendation, World Wide Web Consortium (W3C), December 11 2012.
LDP-DL Semantics
ρ: related resource, ν: new LDP resource
1. The evaluation of qp returns the solutions {ρ ← ex:paris-catalog} and {ρ ← ex:toulouse-catalog}
2. For each of them, a new LDP resource is created
3. Consider {ρ ← ex:paris-catalog}:
4. the new resource (ν) is dex:paris-catalog
5. to generate the graph of dex:paris-catalog, cq is evaluated on the data source with the bindings {ρ ← ex:paris-catalog} and {ν ← dex:paris-catalog}
LDP-DL Semantics
- Consider the evaluation of :dataset to generate the members of dex:paris-catalog
- The members of dex:paris-catalog describe the datasets (dcat:dataset) of ex:paris-catalog, the related resource
- The evaluation of qp is done with the binding {π1 ← ex:paris-catalog}
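The two walkthroughs above can be reproduced with a small, self-contained sketch. It has three ingredients:
- a naive matcher for basic graph patterns;
- a rule that mints the new IRI ν from the related resource ρ;
- a CONSTRUCT-style instantiation evaluated with the bindings for ρ and ν.

Members are then obtained by evaluating the member map's query pattern with the parent binding π1. This is a simplification under stated assumptions:
- The toy data, the namespace-swapping minting rule (ex: → dex:), the variable names ?rho, ?nu and ?pi1, and the single-pattern queries are all hypothetical.
- It illustrates the semantics, not the toolkit's algorithm.

```python
# Toy evaluation of an LDP-DL-like resource map and member map over a small RDF graph.
# Prefixed names are plain strings; data, minting rule and variable names are hypothetical.
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

source: Set[Triple] = {
    ("ex:paris-catalog", "rdf:type", "dcat:Catalog"),
    ("ex:paris-catalog", "dcat:keyword", "paris"),
    ("ex:paris-catalog", "dcat:dataset", "ex:parking"),
    ("ex:paris-catalog", "dcat:dataset", "ex:busStation"),
    ("ex:toulouse-catalog", "rdf:type", "dcat:Catalog"),
}


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.get(term, value) != value:
                return None  # variable already bound to another value
            result[term] = value
        elif term != value:
            return None
    return result


def match(patterns: List[Triple], graph: Set[Triple], binding: Binding) -> Iterator[Binding]:
    """Naive basic-graph-pattern matching: yield every solution extending `binding`."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(patterns[1:], graph, extended)


def construct(template: List[Triple], where: List[Triple], graph: Set[Triple],
              binding: Binding) -> Set[Triple]:
    """CONSTRUCT-like: instantiate `template` for each solution of `where`."""
    out: Set[Triple] = set()
    for sol in match(where, graph, binding):
        out |= {tuple(sol.get(t, t) for t in tp) for tp in template}
    return out


def mint(related: str) -> str:
    """Hypothetical minting rule for ν: ex:foo -> dex:foo."""
    return "d" + related


qp = [("?rho", "rdf:type", "dcat:Catalog")]
cq_template = [("?nu", "rdf:type", "ldp:BasicContainer"),
               ("?nu", "foaf:primaryTopic", "?rho"),
               ("?rho", "?p", "?o")]
cq_where = [("?rho", "?p", "?o")]

# Steps 1-5: one new LDP resource per solution of qp, its graph built by cq.
resources: Dict[str, Set[Triple]] = {}
for sol in match(qp, source, {}):
    rho = sol["?rho"]
    nu = mint(rho)
    resources[nu] = construct(cq_template, cq_where, source, {"?rho": rho, "?nu": nu})

# Members: evaluate the member map's query pattern with the parent binding {?pi1 <- rho}.
member_qp = [("?pi1", "dcat:dataset", "?rho")]
members = {mint(s["?rho"]) for s in match(member_qp, source, {"?pi1": "ex:paris-catalog"})}

print(sorted(resources))  # ['dex:paris-catalog', 'dex:toulouse-catalog']
print(sorted(members))    # ['dex:busStation', 'dex:parking']
```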