Linked Data Applications: There is no One-Size-Fits-All Formula (Short presentation)
1. Linked Data Applications:
There is no One-Size-Fits-All
Formula
Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
asun@fi.upm.es
Acknowledgements:
O.Corcho, D. Garijo, D. Vila, L.Vilches, B. Villazón
Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
2. Table of contents
1. Introduction and Motivation
2. The process
3. Examples
• Libraries: http://datos.bne.es
• Geo: http://geo.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/
A. Gómez-Pérez. Linked Data Applications: There is no One-Size-Fits-All Formula. DERI, Galway - August 3rd, 2012
3. Ontology Engineering Group
• Director: A. Gómez-Pérez
• Research Group (33 people)
A. Gomez-Perez O. Corcho G. Aguado B. Villazon
• Participation in more than 15 EU projects (3 as coordinator)
• Collaboration with many companies
4. Ontology Engineering Group Research Areas
• Ontological Engineering (1995)
• Natural Language Processing and Multilingualism (1997)
• (Social) Semantic Web (2000)
• Semantic e-Science (Data Integration, Semantic Grid) (2004)
• Linked Data (2009)
5. Center for Open Middleware
• Technology center funded by the Santander Group
• Bank
• Associated Software companies
• 1M€/year during the next five years
• Mission:
• Open innovation ecosystem based on open software component developments
• Managing open source software and products with LD
6. Linked data project at the Ontology Engineering Group
• RDF Generation and Linking: Geometry2RDF, shp2RDF, Morph, NOR2O, Marimba, SPARQL-Stream, Sem4Tags, annotation, geo REST service
• Visualization: Map4RDF, Linked Library Data Visualisation, Sensor Data Visualisation
7. Linked data: applications
• Geo: http://geo.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/
• Libraries: http://datos.bne.es, http://bne.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
8. Table of contents
1. The concept
2. The process: Specification, Modelling, RDF Generation, Links Generation, Publication, Exploitation
3. Examples
9. Table of contents
1. The concept
2. The process
3. Examples
• Libraries: http://datos.bne.es
http://bne.linkeddata.es/
• Geo: http://geo.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/
10. MARC21
• Different communication formats:
• MARC 21 format for Bibliographic Data
• MARC 21 format for Authority Data
• Others: Holdings, Classification, etc.
• Three main elements:
• Record structure: ISO 2709. Fields, indicators, subfields…
• Content designation: "Meaning" of codes and conventions
• Content: Defined outside the MARC standard (ISBD, AACR..)
So, RDB-to-RDF technologies were not appropriate for this task.
11. Specification @ BNE
• Records in the MARC 21 format
• 3.9 million bibliographic records
• 4.2 million authority records
• Version: November, 2011
AUTHORITY record types: Persons, Corporate bodies, Conferences, Titles, Subjects
BIBLIOGRAPHIC records by type:
Maps                              76576
Sound recordings                 320727
Engravings, drawings, pictures   166017
Manuscripts                       35770
Ancient books                    143959
Modern books                    2696560
Scores                           178473
Electronic resources               3021
Serials                          156634
Videos                            96672
12. MARC21 record structure
• Authority record: Camus, Albert*
001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert $d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
• Callouts: 001 is a control field; 100 is the heading field (1XX); within a field, each subfield code ($a, $d) is followed by its content
* http://datos.bne.es/resource/XX1721208
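The record structure above (field tag, indicators, subfield codes and content) can be illustrated with a minimal Python sketch. It parses the textual display used on the slides, not the binary ISO 2709 exchange format, and the names `MarcField` and `parse_field` are hypothetical helpers introduced only for this illustration. It also assumes that subfield content never contains a literal `$`.

```python
from dataclasses import dataclass, field


@dataclass
class MarcField:
    tag: str                                        # three-digit field tag, e.g. "100"
    indicators: str = ""                            # indicator characters, e.g. "10"
    subfields: list = field(default_factory=list)   # (code, content) pairs
    value: str = ""                                 # raw content of control fields (00X)


def parse_field(line: str) -> MarcField:
    """Parse one line of the textual MARC display shown on the slides."""
    tag, _, rest = line.strip().partition(" ")
    if tag.startswith("00"):
        # Control fields carry no indicators and no subfields.
        return MarcField(tag=tag, value=rest.strip())
    # Everything before the first "$" is the indicator block (may be empty).
    head, *chunks = rest.split("$")
    subfields = [(c[0], c[1:].strip()) for c in chunks if c]
    return MarcField(tag=tag, indicators=head.strip(), subfields=subfields)


record = [parse_field(line) for line in (
    "001 XX1721208",
    "100 10 $a Camus, Albert $d 1913-1960",
    "670 $a El mite de Sísif, 1987 $b port. (Albert Camus)",
)]
```

Here `record[1]` holds the heading field 100 with indicators "10" and the subfields $a and $d; a production converter would more likely read ISO 2709 files with a dedicated MARC library.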
13. MARC21 record content designation
• Authority record: Camus, Albert*
001 XX1721208 (Control Number)
100 10 (HEADING: Personal Name)
$a Camus, Albert (Personal name)
$d 1913-1960 (Dates associated with name)
670 $a El mite de Sísif, 1987 $b port. (Albert Camus) (Source consulted; $a: Citation)
• Human reading: an authority record that describes a Person, named Camus, Albert, with associated dates 1913-1960
* http://datos.bne.es/resource/XX1721208
14. Frequency of codes in records
15. Specification
• Source data: MARC 21 records, not RDB. Very flat structure, difficult to map to richer models.
• Domain experts (catalogers) need to be part of the mapping process.
• Highly specialized library models: FRBR, ISBD.
• Data quality good but still many errors: data curation during the LD generation process.
• Iterative and incremental transformation process: measure coverage and progress.
• Multilinguality, collaboration with IFLA
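The slides ask for coverage and progress to be measured but do not say how. One plausible reading, sketched below in Python, is to count how often each field and subfield combination occurs and then report the share of occurrences that already have a mapping. The function names and the four sample records are invented for illustration; they are not taken from Marimba or from BNE data.

```python
from collections import Counter


def heading_pattern(tag, subfield_codes):
    """Build a pattern such as '100 $a $d' from a tag and its subfield codes."""
    return tag + " " + " ".join("$" + code for code in subfield_codes)


def pattern_frequencies(records):
    """Count occurrences of each pattern; records are (tag, codes) pairs."""
    return Counter(heading_pattern(tag, codes) for tag, codes in records)


def coverage(frequencies, mapped_patterns):
    """Fraction of all occurrences whose pattern already has a mapping."""
    total = sum(frequencies.values())
    covered = sum(n for p, n in frequencies.items() if p in mapped_patterns)
    return covered / total if total else 0.0


# Invented sample: four headings, three of which match a mapped pattern.
sample = [("100", "ad"), ("100", "ad"), ("100", "a"), ("100", "adt")]
freq = pattern_frequencies(sample)
ratio = coverage(freq, {"100 $a $d", "100 $a"})
print(ratio)  # 0.75
```

Recomputing this ratio after each mapping iteration gives a simple progress indicator for the incremental process described above.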
16. Modelling: Ontologies and Terminology
• Shared understanding
17. Model: FRBR at a glance
• Works: Work 1, Work 2, Work 3
• Expressions: Expression 1, Expression 2
• Manifestations: Manifestation 1, Manifestation 2
18. The Ontology: based on IFLA vocabularies
19. Who will be the mapping generator?
001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert $d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
20. Similar to mapping ontologies
• Subfield 100a maps to the class Person
• Subfield combination 100at maps to the class Work
• Subfield 100t maps to the property "title of work"
• Content(100a) contained in Content(100at) maps to the relationship "is creator of"
21. Marimba software
• Marimba allows librarians to create mappings between MARC21 records and IFLA vocabularies using spreadsheets
• Mapping types: Classification mapping, Annotation mapping, Relationships mapping
• Basic structure:
MARC21      Records count   Content sample            Mapping info
100 $a $d   888.880         Camus, Albert 1913-1960   foaf:Person
100 $a      999.999         Cervantes, Miguel de      foaf:name
100 $a $m   10.000          Cervantes, iguel          ERROR
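To make the spreadsheet idea concrete, the following Python sketch loads a mapping sheet exported as semicolon-separated text. The column names come from the slide; the export format, the `load_mappings` function and the treatment of "ERROR" rows as rejected patterns are assumptions made for this example, since the slides do not describe how Marimba actually reads the sheets.

```python
import csv
import io

# Assumed export of the mapping sheet shown on the slide (semicolon-separated).
SHEET = """MARC21;Records count;Content sample;Mapping info
100 $a $d;888.880;Camus, Albert 1913-1960;foaf:Person
100 $a;999.999;Cervantes, Miguel de;foaf:name
100 $a $m;10.000;Cervantes, iguel;ERROR
"""


def load_mappings(text):
    """Return ({pattern: (target, record_count)}, [patterns flagged as ERROR])."""
    mappings, rejected = {}, []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        pattern = row["MARC21"].strip()
        target = row["Mapping info"].strip()
        # Counts use "." as thousands separator, as on the slide.
        count = int(row["Records count"].replace(".", ""))
        if target == "ERROR":
            rejected.append(pattern)      # flagged by the librarian for curation
        else:
            mappings[pattern] = (target, count)
    return mappings, rejected


mappings, rejected = load_mappings(SHEET)
```

Keeping the record count next to each mapping lets the same sheet drive both the transformation and the prioritisation of frequent patterns.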
22. Librarians create mappings using Excel
• Classification mapping, basic structure:
MARC21      Records count   Content sample            Mapping info
100 $a $d   888.880         Camus, Albert 1913-1960   foaf:Person
100 $a      999.999         Cervantes, Miguel de      foaf:name
100 $a $m   10.000          Cervantes, iguel          ERROR
• Other sheets: Annotation mapping, Relationships mapping
23. Librarians create mappings using Excel
• Annotation mapping: place of publication, has dimensions
• Relationships mapping: is part of work
24. Marimba interprets the mappings and generates the RDF
001 XX1721208
……
100 10 $a Camus, Albert $d 1913-1960
……
• Classify: Exploiting the heading field and subfield codes.
100 $a $d → Person (it has a personal name)
100 $a $d $t → Work (it has a title)
• Annotate: Using subfield codes and the content.
100 $a "Camus, Albert" → frbr:3001 "Camus, Albert"
100 $t "La Peste" → frbr:P3039 "La Peste"
MARC 21 record (Input)   Action     RDF (Output)
100 $a $d                Classify   rdf:type frbr:C1005
100 $a Camus, Albert     Annotate   frbr:P3039 "Camus, Albert"
100 $d 1913-1960         Annotate   frbr:P3040 "1913-1960"
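The classify and annotate steps can be sketched in Python as two table lookups. The class and property identifiers are copied from the input/output table on the slide; the dictionaries, the `to_triples` function and the tuple representation of triples are hypothetical and only illustrate the idea, not Marimba's implementation.

```python
# Classification rule: the set of heading subfield codes decides the class.
CLASSIFICATION = {
    ("a", "d"): "frbr:C1005",     # identifier taken from the slide table
}

# Annotation rules: (field tag, subfield code) decides the property.
ANNOTATION = {
    ("100", "a"): "frbr:P3039",   # identifiers taken from the slide table
    ("100", "d"): "frbr:P3040",
}


def to_triples(record_id, tag, subfields):
    """Turn one heading field into (subject, predicate, object) tuples."""
    subject = "bne:" + record_id
    codes = tuple(code for code, _ in subfields)
    triples = []
    rdf_class = CLASSIFICATION.get(codes)
    if rdf_class is not None:
        triples.append((subject, "rdf:type", rdf_class))          # classify
    for code, content in subfields:
        prop = ANNOTATION.get((tag, code))
        if prop is not None:
            triples.append((subject, prop, '"%s"' % content))     # annotate
    return triples


triples = to_triples(
    "XX1721208", "100", [("a", "Camus, Albert"), ("d", "1913-1960")]
)
for triple in triples:
    print(" ".join(triple), ".")
```

Subfield combinations without a classification rule simply yield no `rdf:type` triple, which is one way such records could be surfaced for review by catalogers.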
25. Mapping process more in detail
• But what about the relationships between the entities?
• Relationships between records are not explicit in MARC.
Goal: The work "La Peste" was created by Albert Camus
R1 (Person): 001 XX1721208
100 10 $a Camus, Albert $d 1913-1960
R2 (Work): 001 XX1910518*
100 10 $a Camus, Albert $d 1913-1960 $t La peste
Common: $a, $d. Diff: $t
We know the type of R1 and R2, and we look at the heading diff
bne:XX1721208 frbr:2010 (isCreatorOf) bne:XX1910518
* http://datos.bne.es/resource/XX1910518
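The heading comparison can be sketched as follows in Python. The rule table, the record dictionaries and the function names are assumptions for illustration; only the identifiers `frbr:2010`, `XX1721208` and `XX1910518` come from the slide. The sketch also assumes headings have already been parsed and normalised into (code, content) pairs.

```python
# Relationship rule: (type of R1, type of R2, extra subfields in R2) -> property.
RELATION_RULES = {
    ("Person", "Work", ("t",)): "frbr:2010",   # isCreatorOf, as on the slide
}


def heading_diff(h1, h2):
    """Subfield codes present in heading h2 but not in heading h1."""
    return tuple(code for code, content in h2 if (code, content) not in h1)


def relate(r1, r2):
    """Return a triple linking r1 to r2, or None if no rule applies."""
    h1, h2 = r1["heading"], r2["heading"]
    # The whole heading of R1 must reappear in R2 (the "common" part).
    if not all(subfield in h2 for subfield in h1):
        return None
    prop = RELATION_RULES.get((r1["type"], r2["type"], heading_diff(h1, h2)))
    if prop is None:
        return None
    return ("bne:" + r1["id"], prop, "bne:" + r2["id"])


person = {"id": "XX1721208", "type": "Person",
          "heading": [("a", "Camus, Albert"), ("d", "1913-1960")]}
work = {"id": "XX1910518", "type": "Work",
        "heading": [("a", "Camus, Albert"), ("d", "1913-1960"),
                    ("t", "La peste")]}

link = relate(person, work)
print(link)  # ('bne:XX1721208', 'frbr:2010', 'bne:XX1910518')
```

Comparing every pair of records this way would be quadratic; grouping records by their common heading first is one possible way to keep it tractable on millions of records.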
26. Marimba: Mapping process summary
• MARC records:
001 XX1721208
100 10 $a Camus, Albert $d 1913-1960
001 XX1910518
100 10 $a Camus, Albert $d 1913-1960 $t La peste
• Classify:
bne:XX1721208 a frbr:Person .
bne:XX1910518 a frbr:Work .
• Annotate:
bne:XX1721208 a frbr:Person ;
frbr:name "Camus, Albert" ;
frbr:hasDates "1913-1960" .
bne:XX1910518 a frbr:Work ;
frbr:title "La Peste" .
• Relate:
bne:XX1721208 a frbr:Person ;
frbr:name "Camus, Albert" ;
frbr:hasDates "1913-1960" ;
frbr:isCreatorOf bne:XX1910518 .
bne:XX1910518 a frbr:Work ;
frbr:title "La Peste" ;
frbr:isCreatedBy bne:XX1721208 .
27. Marimba uses the ontology to generate RDF
28. Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia
BNE resource http://datos.bne.es/resource/XX1718747 is linked ("Same As") to:
• DNB: http://d-nb.info/gnd/11851993X
• VIAF: http://viaf.org/viaf/17220427
• DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
• SUDOC: http://www.idref.fr/026774771/id
• LIBRIS: http://libris.kb.se/resource/auth/45369
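As an illustration, the links above could be serialised as N-Triples with a few lines of Python. The slide only says "Same As"; using the `owl:sameAs` property is an assumption of this sketch, and `same_as_ntriples` is a hypothetical helper.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed link property


def same_as_ntriples(subject, targets):
    """Emit one N-Triples line per equivalent resource."""
    return ["<%s> <%s> <%s> ." % (subject, OWL_SAME_AS, t) for t in targets]


lines = same_as_ntriples(
    "http://datos.bne.es/resource/XX1718747",
    [
        "http://d-nb.info/gnd/11851993X",
        "http://viaf.org/viaf/17220427",
        "http://dbpedia.org/resource/Miguel_de_Cervantes",
        "http://www.idref.fr/026774771/id",
        "http://libris.kb.se/resource/auth/45369",
    ],
)
print("\n".join(lines))
```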
29. Publication
• Data publication
• Metadata publication using VoID
To facilitate discovery:
• Register your dataset in CKAN
• Use sitemap4rdf to generate the site map
• Upload the site map to Google and Sindice
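A minimal sketch of what VoID metadata for such a dataset might look like, generated here with plain Python string formatting. The properties `void:sparqlEndpoint` and `void:triples` belong to the VoID vocabulary; the dataset URI, the title and the endpoint URL are placeholders invented for the example, and the triple count is the figure reported in the Results slide.

```python
def void_description(dataset_uri, title, sparql_endpoint, triples):
    """Build a small VoID dataset description in Turtle syntax."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        "<%s> a void:Dataset ;" % dataset_uri,
        '    dcterms:title "%s" ;' % title,
        "    void:sparqlEndpoint <%s> ;" % sparql_endpoint,
        "    void:triples %d ." % triples,
    ])


turtle = void_description(
    dataset_uri="http://example.org/void#bne",     # placeholder URI
    title="BNE linked data",                       # placeholder title
    sparql_endpoint="http://example.org/sparql",   # placeholder endpoint
    triples=58053215,                              # from the Results slide
)
print(turtle)
```

The resulting file can be published next to the data so that crawlers and registries can discover the endpoint and the size of the dataset.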
30. Specification
http://bne.linkeddata.es/
31. Technological Support
• Modelling: based on IFLA Vocabularies
• Open Metadata Registry
• Neon Toolkit
• Mapping and generation
• MARiMbA: Library-oriented, supports and facilitates the entire process of transformation from MARC21 to RDF
• Publication:
• Virtuoso Universal Server
• Pubby
• CKAN registry
• Sitemap4rdf
• Exploitation:
• Web Applications that visualize data using SPARQL
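For the exploitation step, a web application typically sends a query to the SPARQL endpoint over HTTP. The Python sketch below only builds the GET request URL defined by the SPARQL protocol (a `query` parameter) and does not contact any server. The endpoint address is a placeholder, and the query, which looks up resources declared equivalent to a DBpedia entry, assumes the links are stored as `owl:sameAs`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"   # placeholder, not the real endpoint

QUERY = """\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?resource WHERE {
  ?resource owl:sameAs <http://dbpedia.org/resource/Miguel_de_Cervantes> .
} LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Build the GET URL for a SPARQL protocol query request."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
# Round-trip check: the encoded parameter decodes back to the original query.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

An application would then fetch this URL with an `Accept: application/sparql-results+json` header and render the bindings, for example on a map or in a catalogue view.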
32. Table of contents
1. The concept
2. Foundations
3. The process
4. Examples
• Libraries: http://datos.bne.es
• http://linkeddata3.dia.fi.upm.es/bne-demo
• Geo: http://geo.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/
33. http://geo.linkeddata.es/
• Uniform access to the Spanish Geographical Institute databases
• Specification: 7 geographical DBs (granularity, scale, multilingual)
• RDF generation from DB: NOR2O; geometry column: Geometry2RDF, shp2RDF
• Model: hydrOntology, related to other models through hasStatisticalData (Statistics, SCOVO), hasLat/Long (W3C WGS84 vocabulary), hasLocation/isLocated (FAO Geopolitical ontology), hasGeometry (GML) and W3C Time
• Other resources in the diagram: UNESCO, EGM / ERM, GeoNames, …
• Legend: Ontology, Specification, Thesaurus
36. Phase/Domain
• Domains: Library (BNE), Geographic (IGN, Otalex), Meteorology (AEMET), Travelling (PRISA), Statistics (INE)
• Modelling: hydrOntology, WGS84, SSN ontology, SIOC, SCOVO, Data Cube, DC, time, PROV
• RDF generation: MARiMbA, geometry2rdf, NOR2O, CSV parser
• Links generation: Silk; linked datasets: DNB, VIAF, LIBRIS, DBpedia, GeoNames, Geolinkeddata.es
• Publication: Pubby, sitemap4rdf
• Exploitation: SPARQL, map4rdf
37. Results
http://datos.bne.es
• Total number of authority records: 4.100.000
• Total number of bibliographic records: 2.390.140
• Total number of RDF triples: 58.053.215
• Links (15% authority): 587.520
• Linked sources:
• VIAF
• SUDOC (university documentation system), France
• GND (authority file of the German National Library), Germany
• LIBRIS, Sweden
• DBpedia
http://webenemasuno.linkeddata.es/
• Total number of guides: 27.876
• Total number of posts: 32.502
• Total number of locations: 6.838
• Total number of RDF triples: 9.462.339
• Linked sources (12.750 links): DBpedia (6024 links), GeoLinkedData (6726 links)
http://geo.linkeddata.es/
• Number of types of geographical phenomena: 95 (rivers, mountains, etc.)
• Number of geo entities: 155.000
• Total number of RDF triples: 21.564.199
• Links: 1002 (outgoing) and 6782 (incoming)
• Linked sources: DBpedia and GeoNames (outgoing); AEMET and El Viajero (incoming)