2. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
About us
Directors: A. Gómez-Pérez, O. Corcho
Position: 8th in the UPM ranking (200 groups)
Research Group (30 people)
- 2 Full Professors
- 5 Associate Professors
- 3 Assistant Professors
- 7 Senior Postdocs
- 12 PhD Students
- 5 MSc and BSc Students
- 3 software engineers
- 1 system administrator
- 2 project managers
170+ Past Collaborators
50+ Past Visitors
http://www.oeg-upm.net/
https://github.com/oeg-upm
@oeg-upm
3. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Ontology Engineering Group at a glance
Created in 1995
Known worldwide in the research areas of
Ontologies
Semantic Web and Linked Data
Multilingual Linked Data
Open Data
eScience
Projects (> 12M€)
27 EU projects (7 as coordinator)
54 National Projects
27 contracts with companies
Publications
106 journals
362 International conferences and book chapters
7 Books
Impact of publications (h-index)
Asunción Gómez-Pérez (h:47, citations 13583)
Oscar Corcho García (h: 33, citations 7230)
Services to the Spanish community
Host esDBpedia
Host linkeddata.es
Supervision of students
23 PhD theses (9 awarded a best thesis prize)
>150 MSc and BSc theses
Events organization
11 editions of the International Summer School
on Ontological Engineering and the Semantic
Web
> 50 WS and tutorials
Standardization activities
>25 @ W3C, ISO, OASIS, etc.
Mobility
PhD students: 3-6 months abroad
Postdocs: 1 month every 2 years
Visibility
Program chairs of ESWC, ISWC, KCAP,
EKAW, TKE, TIA
Editorial board of Journals
Invited talks at conferences and events
Programme Committee presence
Collaboration with COM (Center Open
Middleware)
4. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
License
• This work is licensed under the Creative Commons
Attribution – Non Commercial – Share Alike License
• You are free:
- to Share — to copy, distribute and transmit the work
- to Remix — to adapt the work
• Under the following conditions
- Attribution — You must attribute the work by inserting
• “[source http://www.oeg-upm.net/]” at the footer of each
reused slide
• a credits slide stating: “Maximising (Re)Usability of
Resources using Linked Data” by A. Gómez-Pérez
- Non-commercial
- Share-Alike
5. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
1. Motivation
2. Linked Data Foundations
3. Linked Data Process
Examples from: http://datos.bne.es
4. Linguistic Linked Data
5. Multilingual Linked Data
6. Uses of Linked Data
6. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
A world of digital data
Heterogeneous
Formats
Providers Domains Languages
8. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Complementary,
Different
languages,
but not connected
Lack of interoperability
9. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Multilingual Data Integration
Fotografía
El Quijote
Image
http://www.mancia.org/foro/articulos/107712-don-quijote-medicina.html
URL
El Quijote
Photo
M. Cervantes
El Quijote
Author of
BNE
Located
El Quijote
Vídeo
El Quijote
Español
Video
Film
Language
http://www.rtve.es/alacarta/videos/el-quijote/
URL
Movie
M. Cervantes
Don Quixote
Polish
Written by
Translated in
1960
Year of
publication
VIAF
located
11. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Turbine metadata
Forecast
information
Wind Turbine
Energy output by
month
Limitations when exploiting different and disconnected data sources
Wind Speed per
day and city
Wind farm topology
Company Private data
Real time wind speed
Legend: M = Metadata, D = Data
Complementary
but
not connected
12. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Lack of interoperability:
Language, Syntax, Semantic and Technical
• Ecosystem of
- Open Resources in silos
- Complementary domains
- Heterogeneous formats
- Different languages
- Repositories with different
metadata
- Many APIs and services
for querying
13. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
The user problems…
Discovery and Use of Information in
third party applications is hard,
manual and time consuming
Metadata Metadata
Combination of Private and
Public Sector data in third
party applications requires
solutions to the license issues
Data
Data
15. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linked Data: why is it important?
• Facilitate data integration
- From heterogeneous sources
- In different formats
- Different granularity
- In different languages
- From different countries
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
16. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
LD domains in August 2014
Media
Geographic
Life Sciences
Publications
Government
Social
Networking
Cross-domains
User Generated
Content Linguistics
17. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Foundations
Unique identifiers: URI
identify or name a resource
RDF(S) models
Model: Person -- is creator of --> Work
Person: http://iflastandards.info/ns/fr/frbr/frbrer/C1005
Work: http://iflastandards.info/ns/fr/frbr/frbrer/C1001
Data: Cervantes -- is creator of --> El Quijote
Cervantes: http://datos.bne.es/resource/XX1718747 (is a Person)
El Quijote: http://datos.bne.es/resource/XX3383563 (is a Work)
Equivalence links to other datasets
Cervantes (http://datos.bne.es/resource/XX1718747)
Same As http://viaf.org/viaf/17220427
Same As http://dbpedia.org/resource/Miguel_de_Cervantes
Data navigation
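The foundations on this slide (URIs as identifiers, RDF statements, equivalence links) can be sketched with plain Python tuples. This is only an illustration, not how datos.bne.es is implemented. The URIs come from the slides; pairing XX3383563 with "El Quijote" and using the P2010 property (labelled "Is author" in the Exploitation slide) for "is creator of" are assumptions.

```python
# Each RDF statement is a (subject, predicate, object) tuple of URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Assumption: P2010 is the "is author / is creator of" property used in the deck.
IS_CREATOR_OF = "http://iflastandards.info/ns/fr/frbr/frbrer/P2010"

CERVANTES = "http://datos.bne.es/resource/XX1718747"
QUIJOTE = "http://datos.bne.es/resource/XX3383563"  # assumed to be the work
PERSON = "http://iflastandards.info/ns/fr/frbr/frbrer/C1005"
WORK = "http://iflastandards.info/ns/fr/frbr/frbrer/C1001"

TRIPLES = [
    (CERVANTES, RDF_TYPE, PERSON),
    (QUIJOTE, RDF_TYPE, WORK),
    (CERVANTES, IS_CREATOR_OF, QUIJOTE),
    (CERVANTES, SAME_AS, "http://viaf.org/viaf/17220427"),
    (CERVANTES, SAME_AS, "http://dbpedia.org/resource/Miguel_de_Cervantes"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# "Data navigation": start at the BNE URI and follow the equivalence links.
for uri in objects(CERVANTES, SAME_AS):
    print("same as:", uri)
```

Following `SAME_AS` from one dataset to the next is what lets a client collect statements about the same entity from VIAF or DBpedia without any dataset-specific API.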
18. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Foundations: Linking
Model alignment using OWL equivalentClass
EquivalentClass
Same As
http://xmlns.com/foaf/0.1/Person
Person
http://schema.org/Person
Person
EquivalentClass
Municipality
Person
Place of birth
http://iflastandards.info/ns/fr/frbr/frbrer/C1005
http://dbpedia.org/resource/Municipalities_of_Spain
http://dbpedia.org/page/Alcal%C3%A1_de_Henares
Alcalá de Henares
Is a
http://geo.linkeddata.es/ontology/Municipio
Municipio
http://geo.linkeddata.es/resource/Alcalá de Henares
Alcalá de Henares
IS A
19. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
The model (Ontology) and the data
Work
Idiom
translation
Year
Publication date
Library
Located at
Person
Is creator of
Has subject
El Quijote Cervantes
Is creator of
Catalan
translation
1960
Publication date
BNE
Located in
Has subject
Vida de Cervantes
birthPlace
Place
birthPlace
Alcalá de Henares
Ontology
Data
20. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
http://iflastandards.info/ns/fr/frbr/frbrer/C1001
http://iflastandards.info/ns/fr/frbr/frbrer/C1002
translation
Year
Publication date
http://xmlns.com/foaf/0.1/Organization
Located in
http://iflastandards.info/ns/fr/frbr/frbrer/C1005
Is creator of
Has subject
http://datos.bne.es/resource/XX3383563 http://datos.bne.es/resource/XX1718747
Is author of
http://datos.bne.es/resource/XX1924295
translation
1960
Publication date
BNE
Located in
Has subject
http://datos.bne.es/resource/bimo0002045496
Vida de Miguel de Cervantes Saavedra
Don Quijote de la Mancha
Cervantes Saavedra, Miguel de
Catalan
Ontology
Data
http://datos.bne.es/#
Language
work
Library
Person
http://geo.linkeddata.es/ontology/Municipio
birthPlace
http://geo.linkeddata.es/resource/Alcalá de Henares
birthPlace
Linked data is full of URIs
21. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linked Data without ontologies
http://www.server1.org/resource/Cervantes
http://www.server2.es/resource/Cervantes
http://datos.bne.es/resource/XX1718747
http://d-nb.info/gnd/11851993X
http://geo.linkeddata.es/page/resource/Municipio/Cervantes
Same as
Same as
Same as
Same as
URI
URI
URI
URI
URI
914 296 093
276,4 km²
Phone
Size
1547
#People
1547
Date of Birth
Author
D. Quijote
Cervantes
22. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linked Data and ontologies
http://www.server1.org/resource/Cervantes
http://www.server2.es/resource/Cervantes
http://datos.bne.es/resource/XX1718747
http://d-nb.info/gnd/11851993X
http://geo.linkeddata.es/page/resource/Municipio/Cervantes
Same as
Person
rdf:type
rdf:type
Restaurant
rdf:type
Street
rdf:type
Municipality
rdf:type
URI
URI
URI
URI
URI
1547
Date of Birth
Author
D. Quijote
Cervantes
(Person)
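The contrast between the last two slides can be sketched as follows: with only a shared label, every "Cervantes" resource looks linkable, whereas an rdf:type from an ontology narrows the "same as" candidates. The URIs are those on the slide; which type belongs to the two server1/server2 URIs is not recoverable from the slide, so that assignment is an assumption made for illustration.

```python
from itertools import combinations

# All five resources carry the label "Cervantes".
# Assumption: the Restaurant/Street assignment is illustrative only.
TYPES = {
    "http://www.server1.org/resource/Cervantes": "Restaurant",
    "http://www.server2.es/resource/Cervantes": "Street",
    "http://datos.bne.es/resource/XX1718747": "Person",
    "http://d-nb.info/gnd/11851993X": "Person",
    "http://geo.linkeddata.es/page/resource/Municipio/Cervantes": "Municipality",
}


def same_as_candidates(types):
    """Pair up only those resources that share the same rdf:type."""
    return [
        (a, b)
        for a, b in combinations(sorted(types), 2)
        if types[a] == types[b]
    ]


# Without types there are 10 possible pairs; with types only one survives.
print(same_as_candidates(TYPES))
```

A real linking tool would compare further properties (dates, works, names) before asserting equivalence; the type check is only the first filter.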
23. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
The problem and challenges
Need to access heterogeneous relational
data sources (Geography, Energy, Medicine,
Environment)
Need to submit SPARQL queries into
distributed SPARQL endpoints
• Some of the databases are available
in different DBMSs
• Some of the data sources are
available as spreadsheets, Word documents, PDFs, etc.
• Furthermore, many of these datasets
are already published as Linked Data
or in SPARQL endpoints
• Data may be available from data
streams (e.g., sensors)
We can use ontologies as
global schemas for our data sources
Oscar Corcho and the OEG-UPM Data Integration
24. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linked Data allows uniform access
1. Agree on vocabularies for
describing metadata and domain
data
2. Unified and standardized language
for describing resources ( RDF(S))
3. Unified and standardized query
language (SPARQL)
4. Standardized non-proprietary APIs
5. Links to other resources
25. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linked data Technologies @ OEG
Geometry2RDF
shp2RDF
geo REST service
annotation
Sem4Tags
Marimba
NOR2O
Morph
SPARQL-Stream
Linked Library Data
Visualisation
Map4RDF Sensor Data
Visualisation
Visualization
RDF Generation and Linking
LDP4j
26. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Metadata and data Integration
Data Generation
Metadata Generation
Public Resources
Producers
Private Resources
Geographical
Information REST service
annotation
Web 2.0
Library and
Cultural
Heritage
Diverse Information Sensor
Networks data
Data Integration
Users
Metadata Integration
27. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
3. Linked Data Process
Examples from datos.bne.es
28. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linked Data life cycle
1. Clear methodologies,
methods and tools for
monolingual LD
generation and
publication
Villazón-Terrazas, B.; Vilches. L.; Corcho, O.; Gómez-Pérez, A.
Methodological Guidelines for Publishing Government Linked
Data. In D. Wood, ed. Linking Government Data. Springer. (pp.
27-49). 2011
Specification
Modelling
Generation
Publication
Exploitation
Linking
29. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Specification
Specification
Modelling
RDF Generation
Publication
Links Generation
Exploitation
Goal
Linked Data generation of the Spanish National
Library Metadata
• Source data: MARC 21 records, not an RDB. Very flat
structure, difficult to map to richer models
• Domain experts (catalogers) need to be part of the
mapping process.
- Highly specialized library models: FRBR, ISBD.
• Data quality good but still many errors: data curation
during the LD generation process
- Iterative and incremental transformation process: measure
coverage and progress.
• Multilinguality, collaboration with IFLA
30. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
• Identify and analyse the data sources
• Design the URIs
• License and Provenance definition
Specification
Modelling
RDF Generation
Publication
Links Generation
Exploitation
31. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
MARC21
• Different communication formats:
- MARC 21 format for Bibliographic Data
- MARC 21 format for Authority Data
- Others: Holdings, Classification, etc.
• 3.9 million bibliographical records
• 4.2 million authority records
Specification
Modelling
RDF Generation
Publication
Links Generation
Exploitation
32. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
MARC21 record structure
001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f
embne
100 10 $a Camus, Albert
$d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert
Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora,
1980$b(Camus., Albert (1913-1960); n.
Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
Subfield
Field
Control Field
Content
Subfield Content
• Authority record: Camus, Albert*
HEADING
1XX
* http://datos.bne.es/resource/XX1721208
Specification
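The record structure above (control fields, fields, indicators, subfields) can be made concrete with a small parser for the textual form shown on the slide. This is a sketch for that display format only, not a parser for binary MARC 21 (ISO 2709); the function name `parse_field` is hypothetical.

```python
def parse_field(line):
    """Split one displayed MARC 21 line into tag, indicators and subfields."""
    tag, _, rest = line.strip().partition(" ")
    if tag < "010":
        # Control fields (001-009) hold raw content, with no subfields.
        return {"tag": tag, "indicators": "", "subfields": [],
                "content": rest.strip()}
    # Everything before the first "$" is the indicator pair (may be empty).
    head, *parts = rest.split("$")
    # Each part starts with a one-character subfield code.
    subfields = [(p[0], p[1:].strip()) for p in parts if p]
    return {"tag": tag, "indicators": head.strip(),
            "subfields": subfields, "content": ""}


heading = parse_field("100 10 $a Camus, Albert $d 1913-1960")
print(heading["tag"], heading["subfields"])
```

The sketch assumes that "$" never occurs inside subfield content, which holds for the sample record but not necessarily for a full catalogue.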
33. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Frequency of codes in records
Specification
Modelling
RDF Generation
Publication
Links Generation
Exploitation
34. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Modelling: Ontologies and Terminology
Specification
Modelling
RDF Generation
Publication
Links Generation
Exploitation
Shared
Understanding
35. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Ontology
• Lightweight ontologies:
o Concepts
o Organized in taxonomies
o Properties between concepts
o Properties for describing concepts
• Shared understanding of a domain of interest
• Ontologies expressed in OWL or RDF(S)
• The NeOn methodology helps to build ontologies
Modelling
Specification
Modelling
RDF Generation
Publication
Links Generation
Exploitation
36. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Model: FRBR at a glance
Works
Expressions
Manifestations
Work 1
Work 2
Work 3
Expression1
Expression 2
Manifestation1 Manifestation2
Specification
Modelling
RDF Generation
Publication
Links Generation
Exploitation
37. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
The Ontology: based on IFLA vocabularies
Specification
Modelling
RDF Generation
Publication
Links Generation
Exploitation
38. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Who will be the mapping generator?
001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert
$d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora,
1980$b(Camus., Albert (1913-1960); n. Mondovi,
Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
Specification
Modelling
RDF Generation
Publication
Links Generation
Exploitation
MARC 21 records
IFLA-based Ontologies
39. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Similar to mapping ontologies
100at Work
property
subfield
maps
100t title of work
maps
is creator of
Person
100a maps
Content
(100a)
Content
(100at) contained in
maps
Modelling
40. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Librarians create mappings using Excel
Classification
mapping
Annotation
mapping
Relationships
mapping
MARC21 info | Records count | Content sample | Mapping
100 $a $d | 888.880 | Camus, Albert 1913-1960 | foaf:Person
100 $a | 999.999 | Cervantes, Miguel de | foaf:name
100 $a $m | 10.000 | Cervantes, iguel | ERROR
Basic structure
Classification
mapping
41. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Annotation
mapping
Relationships
mapping
Librarians create mappings using Excel
place of publication
has dimensions
Is part of work
42. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Marimba: Mapping process summary
Input
001 XX1721208
100 10 $a Camus, Albert $d 1913-1960
001 XX1910518
100 10 $a Camus, Albert $d 1913-1960 $t La peste
Classify
bne:XX1721208 a frbr:Person .
bne:XX1910518 a frbr:Work .
Annotate
bne:XX1721208 a frbr:Person ;
frbr:name "Camus, Albert" ;
frbr:hasDates "1913-1960" .
bne:XX1910518 a frbr:Work ;
frbr:title "La Peste" .
Relate
bne:XX1721208 a frbr:Person ;
frbr:name "Camus, Albert" ;
frbr:hasDates "1913-1960" ;
frbr:isCreatorOf bne:XX1910518 .
bne:XX1910518 a frbr:Work ;
frbr:title "La Peste" ;
frbr:isCreatedBy bne:XX1721208 .
(MARC records)
BNE
Specification
Modelling
RDF
Generation
Publication
Exploitation
Links
Generation
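The three mapping steps summarised on this slide (classify, annotate, relate) can be sketched in a few lines of Python. This is not Marimba's implementation. The two mapping tables stand in for the librarians' spreadsheets and are hypothetical, as is the rule that a work is related to the person whose 100 $a and $d values it repeats; the property names are the illustrative ones used on the slide.

```python
RECORDS = [
    {"id": "XX1721208", "100": {"a": "Camus, Albert", "d": "1913-1960"}},
    {"id": "XX1910518",
     "100": {"a": "Camus, Albert", "d": "1913-1960", "t": "La peste"}},
]

# Hypothetical mapping tables: subfield pattern -> class, subfield -> property.
CLASSIFICATION = {"ad": "frbr:Person", "adt": "frbr:Work"}
ANNOTATION = {
    "frbr:Person": {"a": "frbr:name", "d": "frbr:hasDates"},
    "frbr:Work": {"t": "frbr:title"},
}


def to_triples(records):
    triples, classes = [], {}
    # 1. Classify: the combination of subfield codes decides the class.
    for r in records:
        classes[r["id"]] = CLASSIFICATION["".join(sorted(r["100"]))]
        triples.append((f"bne:{r['id']}", "a", classes[r["id"]]))
    # 2. Annotate: copy subfield content into datatype properties.
    for r in records:
        for code, prop in ANNOTATION[classes[r["id"]]].items():
            triples.append((f"bne:{r['id']}", prop, r["100"][code]))
    # 3. Relate: link a work to the person with the same heading values.
    persons = {(r["100"]["a"], r["100"]["d"]): r["id"]
               for r in records if classes[r["id"]] == "frbr:Person"}
    for r in records:
        if classes[r["id"]] == "frbr:Work":
            creator = persons.get((r["100"]["a"], r["100"]["d"]))
            if creator:
                triples.append(
                    (f"bne:{creator}", "frbr:isCreatorOf", f"bne:{r['id']}"))
    return triples


for triple in to_triples(RECORDS):
    print(triple)
```

Keeping the mappings in data rather than in code mirrors the point of the preceding slides: catalogers can edit the tables without touching the generator.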
43. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Marimba uses the ontology to generate RDF
BNE
Specification
Modelling
RDF
Generation
Publication
Exploitation
Links
Generation
• http://marimba4lib.com
44. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Marimba links with other resources:
VIAF, DNB, SUDOC, LIBRIS, DBpedia
BNE
http://datos.bne.es/resource/XX1718747
Same As
Same As
Same As
Same As
Same As
LIBRIS
http://libris.kb.se/resource/auth/45369
SUDOC
http://www.idref.fr/026774771/id
DNB
http://d-nb.info/gnd/11851993X
DBpedia
http://dbpedia.org/resource/Miguel_de_Cervantes
VIAF
http://viaf.org/viaf/17220427
Specification
Modelling
RDF
Generation
Publication
Exploitation
Links
Generation
Several IRI/URIs exist for Miguel de Cervantes
45. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Marimba links with other resources:
VIAF, DNB, SUDOC, LIBRIS, DBpedia
Specification
Modelling
RDF
Generation
Publication
Exploitation
Links
Generation
46. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Publication
Data publication
Metadata publication using VoID
To facilitate the discovery
• Register in CKAN your dataset
• Use sitemap4rdf to generate the site map
• Upload the site map to Google and Sindice
Specification
Modelling
RDF
Generation
Publication
Exploitation
Links
Generation
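Publishing dataset metadata with VoID, as recommended on this slide, amounts to a handful of RDF statements about the dataset itself. The sketch below builds such a description as a Turtle string. The helper name and all example values (dataset URI, endpoint, licence) are hypothetical placeholders, not the actual datos.bne.es metadata; the `void:` and `dcterms:` terms are from the published vocabularies.

```python
def void_description(dataset_uri, title, sparql_endpoint, license_uri):
    """Return a minimal VoID dataset description in Turtle."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> .",
    ])


# Placeholder values, for illustration only.
print(void_description(
    "http://example.org/dataset",
    "Example library dataset",
    "http://example.org/sparql",
    "http://example.org/license",
))
```

The same description can then be referenced when registering the dataset in a catalogue such as CKAN, so that crawlers find the endpoint and licence without human help.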
47. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Exploitation: datos.bne.es
SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE {
<http://datos.bne.es/resource/XX1718747>
<http://iflastandards.info/ns/fr/frbr/frbrer/P2010>
?Obras
}
URI Cervantes
Is author
SPARQL queries
Web Interface
Specification
Modelling
RDF
Generation
Publication
Exploitation
Links
Generation
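A programmer would typically send the count query on this slide over HTTP using the SPARQL protocol, which passes the query text in a `query` parameter. The sketch below only builds the request URL and makes no network call. The endpoint address is an assumption, and the query is written in standard SPARQL 1.1 form (URIs in angle brackets, aggregate bound with AS).

```python
from urllib.parse import urlencode

# Assumption: the endpoint path is illustrative; check the site's documentation.
ENDPOINT = "http://datos.bne.es/sparql"

# Count the works linked to the Cervantes URI through the "is author" property.
QUERY = """SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE {
  <http://datos.bne.es/resource/XX1718747>
  <http://iflastandards.info/ns/fr/frbr/frbrer/P2010>
  ?Obras
}"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL-protocol GET URL with the query URL-encoded."""
    return endpoint + "?" + urlencode({"query": query})


print(sparql_get_url(ENDPOINT, QUERY))
```

The resulting URL can be fetched with any HTTP client, with an `Accept` header requesting a result format such as `application/sparql-results+json`.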
48. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Aggregation of geographical information with library metadata
http://datos.bne.es/autor/XX869875
49. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Locations related to “El Quijote”
Itinerary followed in the
trip
Locations
Route
50. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
4. Linguistic Linked
Licensed Data
Linked data
Linguistic
Linked Data
51. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Use cases for LR Discovery
• Language metadata content
- Give me bilingual dictionaries in Spanish and Polish that account for grammatical number and gender and have Creative Commons licenses
• Language Resources content
- Give me all occurrences in corpora of the token “bank” disambiguated as the WordNet synset http://wordnet-rdf.princeton.edu/wn31/108437235-n
• Language Services
- Give me all RESTful services that can extract terms from text in Spanish.
52. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Lack of interoperability of Language resources
• Ecosystem of
- Open and Closed resources
- Different Languages
- Silos of LRs
- Complementary resources
• Lexicon, Corpora,
Dictionaries, Grammars, ….
- Heterogeneous formats
• E.g, for Lexicons: Lexinfo,
LMF, LIR, Lemon, …
- Several repositories with
different metadata and
schemas
- Many APIs and services for
querying
53. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
http://es.wiktionary.org
http://rae.es
http://www.wikilengua.org/index.php/Terminesp:red
http://es.wikipedia.org
http://www.wordreference.com/sinonimos/
An example
“Red”
(computer
network)
54. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
“Red”
Etymology: from Latin “rete”
Gender: “f”
Definition.: “Conjunto de
ordenadores o de equipos
informáticos conectados entre
sí….”
“Red”
Synonyms: “sistema”, “malla”, “distribución”
“Red”
Norm: UNE 21302-131
English: network
German: Netzwerk
“Red”
Pronunciation: [red]
Grammar category: sustantivo femenino
Singular: “red”
Plural: “redes”
“Red_de_computadores”
Category: redes informáticas
Image
“Red”
(computer
network)
Complementary but
not connected
55. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
LD allows linguistic data integration
Red
Phonetic form
Form
number
singular
[RED]
Form
plural
[REDES]
Phonetic form
number
Red
Sense
written form
“red”
Sense
written form
“malla”
equivalent
Red
image
Red
Sense Sense
translation
es - en
written form
“red” “network”
written form
Red
written form
Form
gender
feminine
“red”
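The integration shown in this diagram can be sketched as merging partial lexical entries for the same lemma that come from different sources. The data structure below is a simplification invented for illustration; it is not the lemon model itself, and the two source entries are only a paraphrase of the facts on the previous slide.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    lemma: str
    language: str
    gender: str = ""
    forms: dict = field(default_factory=dict)         # number -> written form
    synonyms: set = field(default_factory=set)
    translations: dict = field(default_factory=dict)  # language -> written form


def merge(a, b):
    """Combine two descriptions of the same entry from different sources."""
    if (a.lemma, a.language) != (b.lemma, b.language):
        raise ValueError("entries describe different lemmas")
    return LexicalEntry(
        a.lemma, a.language, a.gender or b.gender,
        {**a.forms, **b.forms},
        a.synonyms | b.synonyms,
        {**a.translations, **b.translations},
    )


# One source knows the grammar, another the synonyms and translations.
grammar = LexicalEntry("red", "es", gender="f",
                       forms={"singular": "red", "plural": "redes"})
terms = LexicalEntry("red", "es", synonyms={"sistema", "malla"},
                     translations={"en": "network", "de": "Netzwerk"})
print(merge(grammar, terms))
```

In Linked Data the merge key would be a shared or linked URI rather than the lemma and language pair used here.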
56. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linguistic Linked Licensed Data
3LD
Linguistic Linked Licensed Data
Language resources
such as:
- Lexica
- Corpora
- Dictionaries ..
NIF
NLP Interchange Format
Using RDF and
standard data
models
(vocabularies):
- Lexica
- Corpora
ODRL
Open Digital Rights Language
Published along with
a machine-readable
license.
www.lider-project.eu
57. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linguistic Linked Data Evolution
Jan. 2013
Sept. 2013
2014
Sept. 2014
April 2015
58. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linguistic Linked Licensed Data @ Nov 2014
LLOD Cloud in November 2014
• 103 Resources (+58%)
• 165 Links (+101% increase)
• More balanced (14 Corpora,
+367%)
• Less Centralized: Babelnet, LexVo
and LexInfo new hubs
Criteria for inclusion:
• Resolvable: URLs that resolve
• RDF: resolve to RDF
• 1000 Triples: self-explaining
• Links: to one resource from the
cloud or other 50 links
• Crawlable: get the whole
resource by crawling
• Linguistic: the data must be a
language resource
• Registered: at CKAN
www.lider-project.eu
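The inclusion criteria listed above read naturally as a checklist, sketched below. The field names are hypothetical, and two readings are assumptions: "1000 Triples" is taken as at least 1000 triples, and "Links" as at least one link to a resource in the cloud or at least 50 links elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    resolvable: bool          # URLs resolve
    serves_rdf: bool          # URLs resolve to RDF
    triples: int
    links_to_cloud: int       # links to resources already in the cloud
    other_links: int
    crawlable: bool
    linguistic: bool
    registered_at_ckan: bool


def failed_criteria(r):
    """Return the names of the inclusion criteria the resource does not meet."""
    checks = {
        "Resolvable": r.resolvable,
        "RDF": r.serves_rdf,
        "1000 Triples": r.triples >= 1000,
        "Links": r.links_to_cloud >= 1 or r.other_links >= 50,
        "Crawlable": r.crawlable,
        "Linguistic": r.linguistic,
        "Registered": r.registered_at_ckan,
    }
    return [name for name, ok in checks.items() if not ok]


candidate = Resource(True, True, 1500, 0, 60, True, True, False)
print(failed_criteria(candidate))
```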
59. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Best practices and guidelines (BPMLOD @ W3C)
1. Best practices for Multilingual Linked
Data Publication (BPMLOD @ W3C)
- Practices for Naming (URIs)
- Practices for Dereferencing
- Practices for Textual Information
- Practices for Linking
- Practices for Language Identification
2. Guidelines for Linguistic Linked
Licensed Data
- Wordnets,
- Multilingual Lexicographic resources
- Bilingual Dictionaries
- Terminologies in TBX
- NIF-based NLP Web services
How many Linguistic
Resources are exposed in
RDF?
www.lider-project.eu
60. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
How data and Linguistic LD are related
How many Linguistic Resources are exposed in
RDF?
LOD
Is Linguistic LD just
another type of
dataset to be
exposed in RDF?
Is the role of Linguistic
LD to extend any
dataset with lexical
entries? LLD
61. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linguistic Linked Licensed Data
62. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linguistic Linked Licensed Data
How do we represent license information?
63. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linked Data and Linguistic Linked Data
1. Agree on vocabularies for
describing
• Domain vocabularies
• LR metadata and content (Lemon-
Ontolex, NIF, …)
2. Unified and standardized language
for describing resources ( RDF(S))
3. Unified and standardized query
language (SPARQL)
4. Standardized non-proprietary APIs
5. Links to other resources
Linguistic LD
www.lider-project.eu
64. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
5. Linked Data is
multilingual
Linked data
Linguistic
Linked Data
Multilingual
Linked Data
65. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Rationale: LOD is dominated by the English
Language
Some questions:
1. Distribution of natural languages across RDF
datasets?
2. Usage of language tags to indicate the natural
language of RDF literals?
1. Distribution of usage of language tags
2. Distribution of literals tagged as English vs other languages
3. Distribution of literals tagged in languages other than
English
2007 2009 2014
66. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
The multilingual LOD: Current state*
Pie charts for January 2014 and January 2015 (figure not reproduced):
• RDF literals with lang tag vs. without lang tag: 9% vs. 91%, and 7% vs. 93%
• RDF literals in English vs. other than English: 67% vs. 33%, and 71% vs. 29%
^* Used corpus: swse.deri.org/dyldo/
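Statistics of this kind can be approximated by scanning an N-Triples dump and counting literal objects with and without a language tag. The sketch below is a simplified illustration, not the method used for the figures on this slide; the sample triples use made-up example.org URIs.

```python
import re
from collections import Counter

# Matches a literal object at the end of an N-Triples line, with an optional
# language tag (captured) or an optional datatype.
LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*))?(?:\^\^<[^>]*>)?\s*\.\s*$'
)


def language_tag_stats(ntriples_lines):
    """Count literals per language tag, plus literals with no tag at all."""
    tags, untagged = Counter(), 0
    for line in ntriples_lines:
        match = LITERAL.search(line)
        if match is None:
            continue  # the object is a URI or a blank node
        if match.group(1):
            tags[match.group(1).lower()] += 1
        else:
            untagged += 1
    return tags, untagged


SAMPLE = [
    '<http://example.org/w1> <http://example.org/title> "Don Quijote de la Mancha"@es .',
    '<http://example.org/w1> <http://example.org/title> "Don Quixote"@en .',
    '<http://example.org/w1> <http://example.org/year> "1960" .',
    '<http://example.org/w1> <http://example.org/creator> <http://example.org/p1> .',
]
tags, untagged = language_tag_stats(SAMPLE)
print(tags, untagged)
```

On a real crawl, the ratio between tagged and untagged literals and the per-tag counts correspond to the two kinds of chart shown in this part of the deck.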
67. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
The multilingual LOD: Current state*
Bar chart (figure not reproduced): evolution of the top 10 most used language tags in languages other than English (es, de, zh, fr, it, ru, pl, nl, pt, sv), Jan 2014 vs. Jan 2015; vertical axis from 0 to 70.000
^* see statistics for 2012 in the paper “Guidelines for Multilingual Linked Data” Gómez-Pérez 2013
68. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Messages to take home
1. Data providers should include language metadata in their datasets
• in the original data sources (e.g., MARC21 records)
• tags in RDF (e.g., @es, @pl at least)
• language URIs in the VOID or DCAT descriptions
2. Guidelines and best practices needed to help language metadata generation,
linking and consumption
3. Benefits of adding language information to LD datasets
• Reduce the time and cost of identifying language in resources and
terminology
• Foster the aggregation and enrichment of data across complementary
resources
• Enhance data curation
• Improve precision and recall in information retrieval and search
Publishing Linked Data on the Web: The Multilingual Dimension
Daniel Vila-Suero, Asunción Gómez-Pérez, Elena Montiel-Ponsoda, Jorge Gracia, Guadalupe Aguado-de-Cea
http://link.springer.com/chapter/10.1007/978-3-662-43585-4_7
70. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Linked Data Applications
Ontology Engineering Group
Culture (@BNE) Geographical (@IGN) Meteorological (@AEMET)
News and Media (@ Prisa, RTVE) Internet of Things ( @ CRTM, Bike sharing system)
Smart Cities and Open Data (@ Zaragoza, Gob Aragón, Jacathon, Catalogues)
Host of esDBpedia
71. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Uses of Linked Data
1. Programmers build
applications that
make queries in
SPARQL and get RDF
Culture
(@BNE)
Geographical
(@IGN)
Meteorological
(@AEMET)
Smart Cities
2. Citizens/Users access
LD through a user
interface (they do not
see RDF)
3. Machine – Machine
data exchange and
semantic
interoperability in RDF
72. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
The new Linked Data Ecosystem
Culture
(@BNE)
Geographical
(@IGN)
Meteorological
(@AEMET)
Smart Cities
73. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
Thanks for your attention!