These are the slides of a 40-minute presentation I gave at the CNRS Software Development Days (JDEV 2017) in Marseille, France, on July 5th, 2017.
Here is the Webcast, in French: https://webcast.in2p3.fr/videos-integrer_des_sources_de_donnees_heterogenes_dans_le_web_de_donnees
3. 3Franck MICHEL
Example: study history of zoological knowledge
Archaeological excavation
Conservation biology*
*http://www.lynxeds.com/hmw/plate/family-delphinidae-ocean-dolphins
First Natural History Encyclopedia, 1485.
4. 4Franck MICHEL
Example: study history of zoological knowledge
Archaeological excavation
Conservation biology*
*http://www.lynxeds.com/hmw/plate/family-delphinidae-ocean-dolphins
Knowledge formalizations
Controlled vocabularies,
taxonomies
domain ontologies, …
5. 5Franck MICHEL
LOD Cloud: 10K datasets, 150B Statements
Linking Open Data cloud diagram 2017. A. Abele, J.P. McCrae, P. Buitelaar, A. Jentzsch and R. Cyganiak. http://lod-cloud.net/
On the Web
In RDF
Under open
licences
Interlinked
6. 6Franck MICHEL
Publishing legacy data in RDF raises tricky questions
Metadata
Data
Vocabularies?
Create links?
Raw data?
Translate
into RDF?
7. 7Franck MICHEL
Describe the translation of heterogeneous data into RDF
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
Agenda
9. 9Franck MICHEL
Data Sources Have Heterogeneous Data Models
Relational DB
ID NAME
Graph
Object-Oriented
Native XML DBs
Documents
10. 10Franck MICHEL
Describe the translation of heterogeneous data into RDF
HTML data with RDFa
CSV data
Relational data
NoSQL data
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
Agenda
12. 12Franck MICHEL
<body vocab="http://schema.org/">
<div resource="/jdev2017" typeof="Event">
<h2 property="title">JDEV 2017</h2>
<p>Date: <span property="startDate">2017-07-04</span></p>
...
<p>T2 - Ingénierie et web des données.
<a property="url" href="http://devlog.cnrs.fr/jdev2017/t2">More…</a>
</p>
</div>
</body>
prefix sch: <http://schema.org/>
<http://devlog.cnrs.fr/jdev2017>
rdf:type sch:Event ;
sch:title "JDEV 2017";
sch:startDate "2017-07-04" ;
sch:url <http://devlog.cnrs.fr/jdev2017/t2> .
RDFa: RDF in HTML attributes
http://devlog.cnrs.fr/
https://www.w3.org/TR/rdfa-core/
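The way RDFa attributes turn into triples can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the class name RDFaLiteSketch is hypothetical, it handles a single vocab, a single resource/typeof subject and flat property attributes, and it is in no way a conforming RDFa processor.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RDFaLiteSketch(HTMLParser):
    """Collects triples from a tiny subset of RDFa (vocab, resource, typeof, property)."""

    RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page, used to resolve relative IRIs
        self.vocab = ""
        self.subject = None
        self.pending = None       # predicate waiting for the element's text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "vocab" in a:
            self.vocab = a["vocab"]
        if "resource" in a:
            self.subject = urljoin(self.base_url, a["resource"])
        if "typeof" in a:
            self.triples.append((self.subject, self.RDF_TYPE, self.vocab + a["typeof"]))
        if "property" in a:
            predicate = self.vocab + a["property"]
            if "href" in a:
                # A link: the object is the target IRI, not the link text.
                target = urljoin(self.base_url, a["href"])
                self.triples.append((self.subject, predicate, target))
            else:
                self.pending = predicate

    def handle_data(self, data):
        # The first non-blank text after a property attribute becomes a literal.
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


PAGE = """
<body vocab="http://schema.org/">
  <div resource="/jdev2017" typeof="Event">
    <h2 property="title">JDEV 2017</h2>
    <p>Date: <span property="startDate">2017-07-04</span></p>
    <p><a property="url" href="http://devlog.cnrs.fr/jdev2017/t2">More</a></p>
  </div>
</body>
"""

parser = RDFaLiteSketch("http://devlog.cnrs.fr/")
parser.feed(PAGE)
parser.close()
triples = parser.triples
for triple in triples:
    print(triple)
```

The point of the sketch is that the page author only adds attributes to existing markup; the subject, predicates and objects are all derived from them.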
13. 13Franck MICHEL
Describe the translation of heterogeneous data into RDF
HTML data with RDFa
CSV data
Relational data
NoSQL data
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
Agenda
16. 16Franck MICHEL
Direct Mapping of a RDB to RDF
<PEOPLE/ID=7> rdf:type <PEOPLE> .
<PEOPLE/ID=7> <PEOPLE#FNAME> "Catherine" .
<PEOPLE/ID=7> <PEOPLE#WROTE> <BOOK/ID=22> .
<PEOPLE/ID=8> rdf:type <PEOPLE> .
<PEOPLE/ID=8> <PEOPLE#FNAME> "Olivier" .
<PEOPLE/ID=8> <PEOPLE#WROTE> <BOOK/ID=22> .
Table: PEOPLE
ID FNAME WROTE (FK BOOK/ID)
7 Catherine 22
8 Olivier 22
… … …
https://www.w3.org/TR/2012/REC-rdb-direct-mapping-20120927/
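The pattern shown on this slide can be sketched as a small Python function. It is a simplified illustration that reproduces the slide's output only; the function name direct_mapping and its parameters are assumptions, and the W3C Direct Mapping recommendation specifies further details (for instance how key columns and foreign-key predicates are named) that are left out here.

```python
def direct_mapping(table, primary_key, rows, foreign_keys=None):
    """Translate rows (dicts) into triples following the slide's pattern.

    foreign_keys maps a column name to (target_table, target_key).
    """
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        # Subject is "Table/key=value"; the class is the table itself.
        subject = f"<{table}/{primary_key}={row[primary_key]}>"
        triples.append((subject, "rdf:type", f"<{table}>"))
        for column, value in row.items():
            if column == primary_key or value is None:
                continue  # the slide does not emit the key column; NULLs yield nothing
            predicate = f"<{table}#{column}>"
            if column in foreign_keys:
                # A foreign key points to the subject of the referenced row.
                target_table, target_key = foreign_keys[column]
                obj = f"<{target_table}/{target_key}={value}>"
            else:
                obj = f'"{value}"'
            triples.append((subject, predicate, obj))
    return triples


people = [
    {"ID": 7, "FNAME": "Catherine", "WROTE": 22},
    {"ID": 8, "FNAME": "Olivier", "WROTE": 22},
]
result = direct_mapping("PEOPLE", "ID", people, {"WROTE": ("BOOK", "ID")})
for s, p, o in result:
    print(s, p, o, ".")
```

No mapping has to be written by hand: table and column names are the only input, which is also why the resulting vocabulary carries no semantics beyond the database schema.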
17. 17Franck MICHEL
Custom Mapping of a RDB to RDF
<http://unice.fr/staff/7> rdf:type ex:Teacher.
<http://unice.fr/staff/7> foaf:name "Catherine".
<http://unice.fr/staff/7> dc:contributor <http://unice.fr/book/22>.
<http://unice.fr/staff/8> rdf:type ex:Teacher.
<http://unice.fr/staff/8> foaf:name "Olivier".
<http://unice.fr/staff/8> dc:contributor <http://unice.fr/book/22>.
Existing vocabularies
Table: PEOPLE
ID FNAME WROTE (FK BOOK/ID)
7 Catherine 22
8 Olivier 22
… … …
18. 18Franck MICHEL
Custom Mapping of a RDB to RDF with R2RML
<#MapPeople>
rr:logicalTable [ rr:tableName "PEOPLE" ];
rr:subjectMap [
rr:template "http://unice.fr/staff/{ID}";
rr:class ex:Teacher;
];
rr:predicateObjectMap [
rr:predicate foaf:name;
rr:objectMap [ rr:column "FNAME" ];
].
<http://unice.fr/staff/7> rdf:type ex:Teacher.
<http://unice.fr/staff/7> foaf:name "Catherine".
<http://unice.fr/staff/8> rdf:type ex:Teacher.
<http://unice.fr/staff/8> foaf:name "Olivier".
http://www.w3.org/TR/r2rml/
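What an R2RML processor does with the mapping above can be approximated in Python. This is a sketch under assumptions, not an R2RML implementation: the MAPPING dictionary is a hypothetical in-memory stand-in for the Turtle mapping, and only a subject template, one class and column-valued object maps are supported.

```python
import re

# Hypothetical Python rendering of the <#MapPeople> triples map.
MAPPING = {
    "template": "http://unice.fr/staff/{ID}",          # rr:template
    "class": "ex:Teacher",                             # rr:class
    "predicate_object_maps": [("foaf:name", "FNAME")], # rr:predicate, rr:column
}


def apply_mapping(mapping, rows):
    """Generate triples for each row according to the mapping."""
    triples = []
    for row in rows:
        # Replace each {COLUMN} placeholder with the row's value for that column.
        iri = re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), mapping["template"])
        subject = f"<{iri}>"
        triples.append((subject, "rdf:type", mapping["class"]))
        for predicate, column in mapping["predicate_object_maps"]:
            if row.get(column) is not None:  # NULL values generate no triple
                triples.append((subject, predicate, f'"{row[column]}"'))
    return triples


rows = [{"ID": 7, "FNAME": "Catherine"}, {"ID": 8, "FNAME": "Olivier"}]
mapped = apply_mapping(MAPPING, rows)
for s, p, o in mapped:
    print(s, p, o, ".")
```

Compared with direct mapping, the IRIs and the vocabulary (ex:Teacher, foaf:name) are chosen by the mapping author rather than derived from table and column names.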
19. 19Franck MICHEL
Describe the translation of heterogeneous data into RDF
HTML data with RDFa
CSV data
Relational data
NoSQL data
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
Agenda
21. 21Franck MICHEL
Many methods for many types of data sources
AstroGrid-D, SPARQL2XQuery, XSPARQL
XML
XLWrap, Linked CSV, CSVW, RML
CSV/TSV/Spreadsheets
D2RQ, R2O, Ultrawrap, Triplify, SM
R2RML: Morph-RDB, ontop, Virtuoso
Relational Databases
RML, TARQL, Apache Any23, DataLift,
SPARQL-Generate
Multiple formats
RDFa, Microformats
HTML
TARQL, JSON-LD, RML
JSON
xR2RML (MongoDB), ontop (MongoDB),
[Mugnier et al, 2016]
NoSQL
M.L. Mugnier, M.C. Rousset, and F. Ulliana. “Ontology-Mediated Queries for NOSQL Databases.” In Proc. AAAI. 2016.
22. 22Franck MICHEL
Describe the translation of heterogeneous data into RDF
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
Agenda
23. 23Franck MICHEL
Direct mapping: create my own vocabulary
Can be derived from an existing schema
May seem easier: “I do whatever I want”
But no added semantics: my vocabulary still needs to be linked with other ones
24. 24Franck MICHEL
Custom Mapping: reuse existing vocabularies
Large variety of existing vocabularies
But may be difficult to find the appropriate one
Partial coverage of the domain
Granularity: too high (cumbersome), too low (useless)
Different points of view
Frequently, a mixed approach is used
25. 25Franck MICHEL
Describe the translation of heterogeneous data into RDF
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
Agenda
26. 26Franck MICHEL
Two approaches to translate existing data sources into RDF
Graph
Materialization
(ETL like)
Virtual Graph
Query
rewriting
SPARQL
SPARQL
ID NAME
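The difference between the two approaches can be sketched with an in-memory SQLite table. Everything here is an assumption made for illustration: the PEOPLE table reuses the earlier example, the "Alice" row is hypothetical, and the rewriting handles one fixed triple pattern (?s foaf:name "name") rather than real SPARQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PEOPLE (ID INTEGER PRIMARY KEY, FNAME TEXT)")
conn.executemany("INSERT INTO PEOPLE VALUES (?, ?)", [(7, "Catherine"), (8, "Olivier")])

SUBJECT_PREFIX = "http://unice.fr/staff/"


def materialize(conn):
    """Graph materialization (ETL-like): translate all rows once, keep the triples."""
    rows = conn.execute("SELECT ID, FNAME FROM PEOPLE ORDER BY ID")
    return [(f"{SUBJECT_PREFIX}{id_}", "foaf:name", name) for id_, name in rows]


def query_materialized(graph, name):
    """Answer the pattern against the stored copy of the graph."""
    return [s for s, p, o in graph if p == "foaf:name" and o == name]


def query_virtual(conn, name):
    """Query rewriting: the same pattern is turned into SQL at query time."""
    rows = conn.execute("SELECT ID FROM PEOPLE WHERE FNAME = ?", (name,))
    return [f"{SUBJECT_PREFIX}{id_}" for (id_,) in rows]


graph = materialize(conn)

# The source changes after materialization (hypothetical new row).
conn.execute("INSERT INTO PEOPLE VALUES (9, 'Alice')")
stale = query_materialized(graph, "Alice")  # the copy does not know the new row
fresh = query_virtual(conn, "Alice")        # the virtual graph sees it immediately
print(stale, fresh)
```

The materialized graph can be loaded into any triple store and queried with full SPARQL, but it must be regenerated when the source changes; the virtual graph is always up to date, at the cost of rewriting every query.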
27. 27Franck MICHEL
A large variety of approaches
[Figure: tools positioned along two axes, graph materialization vs. query rewriting and direct mapping vs. custom mapping. Tools shown: RDFa, XSPARQL, SPARQL2XQuery, AstroGrid-D, XLWrap, Linked CSV, CSVW, TARQL, DataLift, SPARQL-Generate, RML, JSON-LD, Virtuoso, R2RML, D2RQ, R2O, Ultrawrap, ontop, xR2RML, Any23, Triplify]
28. 28Franck MICHEL
Describe the translation of heterogeneous data into RDF
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
Agenda
31. 31Franck MICHEL
Metadata are the key to enable dataset reuse
Context: identification, authors, dates, license, version, reference articles
Access: format, structure, location (download), query method
Meaning: what do the data represent? What concepts, entities, semantics?
Interpretation: units (cm or inches, left/right)…
Provenance: acquired with which equipment, parameters, protocols? Derived from which dataset? With which processing? Dataset-level or entity-level provenance
Statistics: number of triples per property or class, links to other datasets…
…
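Statistics of the kind listed above are straightforward to compute once the data are in RDF. The following Python sketch counts triples per property and instances per class; the function name dataset_statistics and the shape of its result are assumptions, and the sample triples are those of the earlier R2RML example.

```python
from collections import Counter


def dataset_statistics(triples):
    """Count triples overall, per property, and instances per class."""
    per_property = Counter(p for _, p, _ in triples)
    per_class = Counter(o for _, p, o in triples if p == "rdf:type")
    return {
        "triples": len(triples),
        "per_property": dict(per_property),
        "per_class": dict(per_class),
    }


sample = [
    ("<http://unice.fr/staff/7>", "rdf:type", "ex:Teacher"),
    ("<http://unice.fr/staff/7>", "foaf:name", '"Catherine"'),
    ("<http://unice.fr/staff/8>", "rdf:type", "ex:Teacher"),
    ("<http://unice.fr/staff/8>", "foaf:name", '"Olivier"'),
]
stats = dataset_statistics(sample)
print(stats)
```

Publishing such figures alongside a dataset lets a potential user judge its size and coverage before downloading or querying it.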
32. 32Franck MICHEL
Several efforts standardize metadata descriptions
and how to use metadata
CSVW: CSV on the Web
DCAT: Data Catalog Vocabulary
DCAT extensions, application profiles
W3C Dataset Exchange Working Group
VoID: Vocabulary of Interlinked Datasets
HCLS: Health Care & Life Sciences Dataset Profile
…
35. 35Franck MICHEL
HCLS: Health Care & Life Sciences Dataset Profile
Consensus among stakeholders on
the description of datasets using RDF
*http://www.w3.org/TR/hcls-dataset/
RDF, RDFS, XSD
Citation Typing Ontology
Data Catalog (DCAT)
Dublin Core Metadata Types, Dublin Core Metadata Terms
Friend-of-a-Friend (FOAF)
Collection Description Frequency Vocabulary
Identifiers.org vocabulary
Lexvo.org - Lexical Vocabulary
Provenance Authoring and Versioning ontology (PAV)
PROV Ontology
Semantic science Integrated Ontology (SIO)
Vocabulary of Interlinked Datasets (VoID)
Used
vocabularies
36. 36Franck MICHEL
Using a JSON-LD profile to translate JSON into RDF
<http://example.org/member/106> foaf:mbox "john@foo.com".
<http://example.org/member/106> foaf:mbox "john@example.org".
{ "id": 106,
"firstname": "John",
"emails": [ "john@foo.com",
"john@example.org" ]
}
{ "@context": {
"id": "@id",
"@base": "http://example.org/member/",
"emails": "http://xmlns.com/foaf/0.1/mbox"
}
}
https://www.w3.org/TR/json-ld/
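The effect of the context on the JSON document can be sketched in Python with the standard library only. This is a deliberately naive illustration, not a JSON-LD processor: the function apply_context is hypothetical, it only understands "@base", one key aliased to "@id" and plain term-to-IRI entries, and it appends the identifier to the base instead of performing full IRI resolution.

```python
import json

DOCUMENT = json.loads("""
{ "id": 106,
  "firstname": "John",
  "emails": [ "john@foo.com", "john@example.org" ] }
""")

CONTEXT = {
    "id": "@id",
    "@base": "http://example.org/member/",
    "emails": "http://xmlns.com/foaf/0.1/mbox",
}


def apply_context(document, context):
    """Produce triples for the keys that the context maps to an IRI."""
    base = context.get("@base", "")
    # Find which JSON key is declared as the node identifier.
    id_key = next(k for k, v in context.items() if v == "@id")
    subject = f"{base}{document[id_key]}"
    triples = []
    for key, value in document.items():
        predicate = context.get(key)
        if key == id_key or predicate is None:
            continue  # keys without a mapping (here "firstname") yield no triple
        values = value if isinstance(value, list) else [value]
        triples.extend((subject, predicate, v) for v in values)
    return triples


json_triples = apply_context(DOCUMENT, CONTEXT)
for s, p, o in json_triples:
    print(f'<{s}> <{p}> "{o}" .')
```

The JSON document itself is untouched; only the small context has to be supplied, which makes this approach attractive for existing JSON APIs.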
37. 37Franck MICHEL
Various initial motivations
• Web of Data, Linked Data,
• OBDA,
• Ontology learning,
• Schema mapping…
Historical products: D2RQ, Virtuoso…
R2RML mapping language
• 2012 W3C recommendation
• Several implementations
Several methods: direct mapping vs. domain-specific
Translation of RDBs to RDF
38. 38Franck MICHEL
All You Need is LOV
Linked Open Vocabularies
522 curated vocabularies
Quality requirements
• URI stability and availability,
• Quality metadata and
documentation,
• Identifiable and trustable
publication body,
• Proper versioning policy,
• …
“Vocabularies provide the semantic glue
enabling Data to become meaningful Data.”
http://lov.okfn.org/dataset/lov/
39. 39Franck MICHEL
Linked Data rules
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
This morning Manuel introduced Linked Data (LD), the Web of Data (WoD), URIs, etc., and Olivier described RDF and SPARQL in more detail.
Now: how to feed this WoD with data that are not in RDF to begin with.
We are all witnessing the multiplication of data sources (DS) available on the Web. Witnesses and actors.
Social networks, collaborative wikis, scientific databases, crowd-sourced information.
The availability of these DS spurs new ideas and opportunities for DI,
but also comes with new challenges: need to capture and share the semantics of data sources.
Transition: So to make sense of them, we need…
SW technologies bring answers to these challenges
RDF increasingly used as the pivot format for integrating heterogeneous DS.
Transition: this is precisely the purpose of the LOD…
Result of an RDF-based DI process.
Starts with translating existing DS into RDF.
INTERACT: Who has already had to convert from one format to another for DI?
Who has already had to do this kind of transformation?
Which tools?
RDFa defines new HTML attributes (typeof, resource, property) to add metadata from which RDF can be generated.
Problem: invasive approach
Non-invasive, unlike RDFa
Create ad-hoc ontology:
Class = « Table »
Subject is « Table/ID »
Property is « Table#column »
Object is value of column
Point-of-view issue:
- biologist vs. taxonomist: does “species” really mean the same thing to both?
- surgeon vs. anatomist: a surgeon will unhesitatingly delineate a region of the brain, whereas an anatomist will cautiously point to the center of the region but not its boundaries…
(…)
Translate each individual DS into RDF using appropriate vocab = mediation
In practice, this mediation follows two common approaches…
Enable the repeatability of experiments.
INTERACT: can you give me a few examples of metadata?
VoID and DCAT to be described
(…) library metadata formats provided by the Library of Congress, BnF, DNB, etc..
(…) published and used by (…) large media corporations (BBC), national administrations (INSEE), EC, universities and research projects,
(…) published by individuals and put on the community table, in the tradition and spirit of the open, collaborative Web.”
Back to the fundamentals of LD, the tablets of the law!
We have seen:
- how to describe our data with metadata,
- the Semantic Web technologies that can help us publish data and metadata,
- how to create/reuse the vocabularies that form the semantic reference of my data,
- finally, how to convert existing data into RDF.
Now, what are the best practices for turning them into open, linked data…