Datalift: A Catalyser for the Web of Data - Francois Scharffe
1. Datalift: A Catalyser for the Web of Data
François Scharffe
University of Montpellier,
LIRMM, INRIA
francois.scharffe@lirmm.fr
@lechatpito
With the help of the Datalift team
And the support of the French National Research Agency
Webscience meetup 5/02/2011
3. Datalift
A large-scale Web data publication experiment.
Objectives:
- Publish reference datasets
- Automate the data publication process
- Demonstrate the value of publishing linked data
4. Datalift
Motivation:
- Two phenomena:
- Society – Open Data
- Technology – Semantic Web
A data revolution is under way: the Web of data is exploding, just as the Web of documents exploded in the 1990s.
5. Datalift
Publication of datasets
R&D to automate the publication process
A modular architecture to assist data publication
Training, tutorials, data publication camps
6. Welcome aboard the data lift
Published and interlinked data on the Web
Applications
Interconnection
Publication infrastructure
Data conversion
Vocabulary selection
Raw data
8. Vocabulary selection
Vocabularies for linked data
● Are meant to describe resources in RDF
● Are based on the standard W3C languages RDFS and OWL
Ø What makes a good vocabulary?
● A good vocabulary is a used vocabulary
● Other usability criteria: simplicity, visibility, documentation, flexibility, semantic integration, social integration
Ø Types of vocabularies
● Metadata, reference, domain, general
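The criterion "a good vocabulary is a used vocabulary" can be made concrete by counting how many datasets use each vocabulary. The sketch below assumes we already know, for each dataset, the set of vocabulary prefixes it uses; the dataset names and usage sets are hypothetical, and usage count is only one of the criteria listed above (simplicity, documentation and the others are not captured).

```python
from collections import Counter


def rank_vocabularies(usage: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank vocabularies by the number of datasets that use them.

    `usage` maps a (hypothetical) dataset name to the vocabulary prefixes it uses.
    """
    counts = Counter(vocab for vocabs in usage.values() for vocab in vocabs)
    # Most-used first; ties broken alphabetically so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    usage = {
        "ds1": {"foaf", "dcterms"},
        "ds2": {"foaf", "geo"},
        "ds3": {"foaf", "dcterms", "skos"},
    }
    for vocab, count in rank_vocabularies(usage):
        print(vocab, count)
```

With the sample input, `foaf` ranks first (used by all three datasets), followed by `dcterms`.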
9. Vocabulary of a Friend
Ø http://www.mondeca.com/foaf/voaf
Ø A simple vocabulary...
Ø ...to represent interconnections between vocabularies
Ø A single entry point to the vocabularies and datasets of the Linked Data Cloud
Ø Ongoing work in Datalift
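Once interconnections between vocabularies are recorded, a typical question is "which vocabularies does this one ultimately depend on?". The sketch below assumes a simple "relies on" relation, modelled loosely on VOAF's `reliesOn` property (check the exact name against the VOAF specification), stored as a plain dictionary; the vocabulary names `vocA` to `vocD` are hypothetical. It computes the transitive closure and tolerates cycles.

```python
def relies_on_closure(graph: dict[str, set[str]], vocab: str) -> set[str]:
    """Return every vocabulary reachable from `vocab` through "relies on" links.

    `graph` maps a vocabulary to the vocabularies it directly relies on
    (a hypothetical stand-in for VOAF reliesOn statements).
    """
    seen: set[str] = set()
    stack = list(graph.get(vocab, ()))
    while stack:
        current = stack.pop()
        if current not in seen:  # the `seen` check stops cycles from looping forever
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


if __name__ == "__main__":
    graph = {"vocA": {"vocB", "vocC"}, "vocB": {"vocD"}, "vocD": {"vocB"}}
    print(sorted(relies_on_closure(graph, "vocA")))
```

A depth-first traversal with a visited set is used rather than recursion, so long dependency chains do not hit Python's recursion limit.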
11. Reference datasets, URI design
● Providing reference datasets for the French ecosystem: geographical, topological, statistical, political (e.g. http://parisemantique.fr)
● Providing URI design guidelines
● Opaque or transparent URIs?
● Use of accented characters in URIs
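The two URI design questions above can be illustrated side by side. In this sketch an opaque URI is minted from a stable identifier, while a transparent URI is minted from the resource's label, either keeping accents (percent-encoded as UTF-8 bytes) or stripping them. The base URI `http://data.example.org/commune/` and the sample identifier are hypothetical; this is an illustration of the trade-off, not Datalift's guideline.

```python
import unicodedata
from urllib.parse import quote

BASE = "http://data.example.org/commune/"  # hypothetical base URI


def opaque_uri(identifier: str) -> str:
    """Opaque URI: minted from a stable code that says nothing about the resource."""
    return BASE + quote(identifier, safe="")


def transparent_uri(label: str, keep_accents: bool = True) -> str:
    """Transparent URI: minted from the resource's human-readable label."""
    slug = unicodedata.normalize("NFC", label.strip().replace(" ", "_"))
    if not keep_accents:
        # NFD splits "é" into "e" plus a combining accent; dropping the
        # combining marks keeps only the base letters.
        slug = "".join(c for c in unicodedata.normalize("NFD", slug)
                       if not unicodedata.combining(c))
    # quote() turns any remaining non-ASCII character into %XX escapes of its UTF-8 bytes.
    return BASE + quote(slug, safe="")


if __name__ == "__main__":
    print(opaque_uri("34172"))
    print(transparent_uri("Béziers"))
    print(transparent_uri("Saint-Étienne", keep_accents=False))
```

Keeping accents preserves the exact name but yields escaped URIs such as `B%C3%A9ziers`; stripping them gives readable ASCII URIs at the risk of collapsing names that differ only by an accent.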
12. Conversion tools to RDF
Ø How should the raw data be converted?
§ Relational database?
§ (Semi-)structured formats?
§ Programmatic access (API)?
Ø Conversion solutions exist for each of these cases
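For the tabular case (a relational table or a CSV export), a naive conversion turns each row into a resource and each cell into a triple. The sketch below emits N-Triples using only the standard library; the base URI, table name and column names are hypothetical. It loosely resembles the idea behind the W3C Direct Mapping of relational data to RDF but does not conform to it (no datatypes, no foreign keys).

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text: str, base: str, table: str, key: str) -> list[str]:
    """Turn each CSV row into a resource and each non-empty cell into a triple.

    `key` names the column whose value identifies the row (its primary key).
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{base}{table}/{quote(row[key], safe='')}>"
        triples.append(f"{subject} {RDF_TYPE} <{base}{table}> .")
        for column, value in row.items():
            if value:  # empty cells produce no triple
                predicate = f"<{base}{table}#{quote(column, safe='')}>"
                triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    data = "id,name,department\n1,Montpellier,Hérault\n2,Béziers,Hérault\n"
    for line in csv_to_ntriples(data, "http://data.example.org/", "commune", "id"):
        print(line)
```

In practice a dedicated tool would also map columns to terms from an existing vocabulary rather than minting one predicate per column, which is where vocabulary selection comes in.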
16. Towards automated interconnection services
Ø Record linkage, entity reconciliation, instance matching, ontology matching, schema matching
§ Using alignments between vocabularies
§ Detecting discriminating properties
§ Specifying comparison methods by attaching metadata to ontologies
Ø Work in progress in Datalift
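The three ingredients above can be combined in a toy linker: an alignment says which target property corresponds to a source property, a discriminability score picks the source property whose values best distinguish records, and a comparison method scores candidate pairs. This is a minimal sketch under assumed choices (records as dictionaries with a hypothetical `uri` key, distinct-value ratio as the discriminability measure, `difflib.SequenceMatcher` as the comparison method), not the approach developed in Datalift.

```python
from difflib import SequenceMatcher


def discriminability(records: list[dict], prop: str) -> float:
    """Fraction of distinct values among the records carrying `prop` (1.0 = all unique)."""
    values = [r[prop] for r in records if prop in r]
    return len(set(values)) / len(values) if values else 0.0


def most_discriminating(records: list[dict], props: list[str]) -> str:
    """Pick the property whose values best tell records apart."""
    return max(props, key=lambda p: discriminability(records, p))


def link(source: list[dict], target: list[dict], props: list[str],
         alignment: dict[str, str], threshold: float = 0.9) -> list[tuple]:
    """Propose (source URI, target URI, score) candidate links.

    `alignment` maps a source property to the equivalent target property,
    standing in for an alignment between the two vocabularies.
    """
    src_prop = most_discriminating(source, props)
    tgt_prop = alignment.get(src_prop, src_prop)
    links = []
    for s in source:
        for t in target:
            if src_prop in s and tgt_prop in t:
                # Comparison method: case-insensitive string similarity in [0, 1].
                score = SequenceMatcher(None, s[src_prop].lower(), t[tgt_prop].lower()).ratio()
                if score >= threshold:
                    links.append((s["uri"], t["uri"], round(score, 2)))
    return links


if __name__ == "__main__":
    source = [{"uri": "ex:a1", "name": "Montpellier", "department": "Hérault"},
              {"uri": "ex:a2", "name": "Béziers", "department": "Hérault"}]
    target = [{"uri": "ex:b1", "label": "montpellier"},
              {"uri": "ex:b2", "label": "Beziers"}]
    print(link(source, target, ["name", "department"], {"name": "label"}))
```

Here `name` is chosen over `department` (every name is unique, while both records share a department). With the default threshold of 0.9, the accented and unaccented spellings of Béziers (similarity about 0.86) are not linked, which shows how much the comparison method and its threshold matter.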
18. “It is a time when, even if nets were to guide all
consciousness that had been converted to photons
and electrons toward coalescing, standalone
individuals have not yet been converted into data to
the extent that they can form unique components of
a larger complex”
Mamoru Oshii, Ghost in the Shell