Maximising (Re)Usability of Resources using Linked Data
A. Gómez-Pérez
Universidad Politécnica de Madrid
asun@fi.upm.es
Acknowledgements: Daniel Vila, Jorge Gracia, Victor Rodríguez Doncel,
Ontology Engineering Group and LIDER Consortium members
Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez
About us
Directors: A. Gómez-Pérez, O. Corcho
Position: 8th in the UPM ranking (200 groups)
Research Group (30 people)
- 2 Full Professors
- 5 Associate Professors
- 3 Assistant Professors
- 7 Senior Postdocs
- 12 PhD Students
- 5 MSc and BSc Students
- 3 software engineers
- 1 system administrator
- 2 project managers
170+ Past Collaborators
50+ Past Visitors
http://www.oeg-upm.net/
https://github.com/oeg-upm
@oeg-upm
Ontology Engineering Group at a glance
• Created in 1995
• Known worldwide in the research areas of
- Ontologies
- Semantic Web and Linked Data
- Multilingual Linked Data
- Open Data
- eScience
• Projects (> 12M€)
- 27 EU projects (7 as coordinator)
- 54 national projects
- 27 contracts with companies
• Publications
- 106 journal articles
- 362 international conference papers and book chapters
- 7 books
• Impact of publications (h-index)
- Asunción Gómez-Pérez (h: 47, citations: 13583)
- Oscar Corcho García (h: 33, citations: 7230)
• Services to the Spanish community
- Host of esDBpedia
- Host of linkeddata.es
• Supervision of students
- 23 Ph.D. theses (9 awarded best thesis prize)
- > 150 M.Sc. and B.Sc. theses
• Events organization
- 11 editions of the International Summer School on Ontological Engineering and the Semantic Web
- > 50 workshops and tutorials
• Standardization activities
- > 25 @ W3C, ISO, OASIS, etc.
• Mobility
- PhD students: 3-6 months abroad
- Postdocs: 1 month every 2 years
• Visibility
- Program chairs of ESWC, ISWC, KCAP, EKAW, TKE, TIA
- Editorial boards of journals
- Invited talks at conferences and events
- Programme Committee presence
• Collaboration with COM (Center Open Middleware)
License
• This work is licensed under the Creative Commons
Attribution – Non Commercial – Share Alike License
• You are free:
- to Share — to copy, distribute and transmit the work
- to Remix — to adapt the work
• Under the following conditions
- Attribution — You must attribute the work by inserting
• “[source http://www.oeg-upm.net/]” at the footer of each
reused slide
• a credits slide stating: “Maximising (Re)Usability of Resources using Linked Data” by A. Gómez-Pérez
- Non-commercial
- Share-Alike
1. Motivation
2. Linked Data Foundations
3. Linked Data Process
Examples from: http://datos.bne.es
4. Linguistic Linked Data
5. Multilingual Linked Data
6. Uses of Linked Data
A world of digital data
Heterogeneous formats, providers, domains and languages
The students case
“Cervantes"
Complementary, in different languages, but not connected
Lack of interoperability
Multilingual Data Integration
[Figure: descriptions of the same work held in separate sources]
- Photo (Fotografía, Image): El Quijote; URL http://www.mancia.org/foro/articulos/107712-don-quijote-medicina.html
- BNE: M. Cervantes, author of El Quijote; located at BNE
- Video (Vídeo, Film, Movie): El Quijote; language Español; URL http://www.rtve.es/alacarta/videos/el-quijote/
- VIAF: Don Quixote, written by M. Cervantes; translated in Polish; year of publication 1960; located at VIAF
Energy Efficiency scenario
Limitations when exploiting different and disconnected data sources
[Figure: data sources around a wind turbine, each with its own metadata (M) and data (D): turbine metadata, forecast information, energy output by month, wind speed per day and city, wind farm topology, company private data, real-time wind speed]
Complementary but not connected
Lack of interoperability: language, syntax, semantic and technical
• Ecosystem of
- Open resources in silos
- Complementary domains
- Heterogeneous formats
- Different languages
- Repositories with different metadata
- Many APIs and services for querying
The user problems…
• Discovery and use of information in third-party applications is hard, manual and time-consuming
• Combining private and public sector data in third-party applications requires solutions to the licensing issues
2. Linked Data Foundations
Linked Data: why is it important?
• Facilitate data integration
- From heterogeneous sources
- In different formats
- Different granularity
- In different languages
- From different countries
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
LD domains in August 2014
[Figure: LOD cloud coloured by domain: Media, Geographic, Life Sciences, Publications, Government, Social Networking, Cross-domain, User-Generated Content, Linguistics]
Foundations
Unique identifiers: URIs identify or name a resource
RDF(S) models
- Ontology: Person (http://iflastandards.info/ns/fr/frbr/frbrer/C1005) is creator of Work (http://iflastandards.info/ns/fr/frbr/frbrer/C1001)
- Data: Cervantes (http://datos.bne.es/resource/XX1718747, is a Person) is creator of El Quijote (http://datos.bne.es/resource/XX3383563, is a Work)
Equivalence links to other datasets (Same As) allow data navigation
- Cervantes (http://datos.bne.es/resource/XX1718747) Same As http://viaf.org/viaf/17220427
- Cervantes (http://datos.bne.es/resource/XX1718747) Same As http://dbpedia.org/resource/Miguel_de_Cervantes
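As a minimal sketch of these foundations, the statements on this slide can be held as plain (subject, predicate, object) tuples and the Same As links followed to collect every identifier of a resource. The resource and class URIs come from the slide; the property URI for "is creator of" and the helper name `same_as_closure` are hypothetical, and a real application would more likely use an RDF library.

```python
from collections import deque

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"
BNE = "http://datos.bne.es/resource/"

CERVANTES = BNE + "XX1718747"
QUIJOTE = BNE + "XX3383563"
VIAF_CERVANTES = "http://viaf.org/viaf/17220427"
DBPEDIA_CERVANTES = "http://dbpedia.org/resource/Miguel_de_Cervantes"

# Hypothetical property URI standing in for "is creator of"
IS_CREATOR_OF = "http://example.org/isCreatorOf"

TRIPLES = {
    (CERVANTES, RDF_TYPE, FRBR + "C1005"),   # Cervantes is a Person
    (QUIJOTE, RDF_TYPE, FRBR + "C1001"),     # El Quijote is a Work
    (CERVANTES, IS_CREATOR_OF, QUIJOTE),
    (CERVANTES, OWL_SAME_AS, VIAF_CERVANTES),
    (CERVANTES, OWL_SAME_AS, DBPEDIA_CERVANTES),
}


def same_as_closure(start, triples):
    """Return every URI reachable from start via Same As links, in either direction."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            if s == node:
                neighbour = o
            elif o == node:
                neighbour = s
            else:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```

Starting from the VIAF identifier, `same_as_closure(VIAF_CERVANTES, TRIPLES)` reaches the BNE and DBpedia identifiers as well, which is the "data navigation" the slide refers to.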
Foundations: Linking
Model alignment using owl:equivalentClass; instance linking using Same As
[Figure: alignment between models and data]
- Person classes (EquivalentClass): http://iflastandards.info/ns/fr/frbr/frbrer/C1005, http://xmlns.com/foaf/0.1/Person, http://schema.org/Person
- Municipality classes (EquivalentClass): Municipality (http://dbpedia.org/resource/Municipalities_of_Spain), Municipio (http://geo.linkeddata.es/ontology/Municipio)
- Instances (Same As): Alcalá de Henares (http://dbpedia.org/page/Alcal%C3%A1_de_Henares, is a Municipality), Alcalá de Henares (http://geo.linkeddata.es/resource/Alcalá de Henares, is a Municipio)
- Person has place of birth Municipality
The model (Ontology) and the data
[Figure: the ontology and the data that instantiates it]
Ontology
- Person is creator of Work
- Work has subject Person
- Work: translation (Idiom), publication date (Year), located at (Library)
- Person: birthPlace (Place)
Data
- Cervantes is creator of El Quijote
- Vida de Cervantes has subject Cervantes
- El Quijote: translation Catalán, publication date 1960, located in BNE
- Cervantes: birthPlace Alcalá de Henares
[Figure: the same ontology and data, with labels replaced by URIs]
Ontology
- Work: http://iflastandards.info/ns/fr/frbr/frbrer/C1001
- http://iflastandards.info/ns/fr/frbr/frbrer/C1002
- Person: http://iflastandards.info/ns/fr/frbr/frbrer/C1005
- Library (Biblioteca): http://xmlns.com/foaf/0.1/Organization
- http://geo.linkeddata.es/ontology/Municipio
- Properties: is creator of, has subject, translation, publication date, located in, birthPlace
Data (http://datos.bne.es/#)
- http://datos.bne.es/resource/XX1718747 (Cervantes Saavedra, Miguel de) is author of http://datos.bne.es/resource/XX3383563 (Don Quijote de la Mancha)
- http://datos.bne.es/resource/XX1924295
- http://datos.bne.es/resource/bimo0002045496
- http://geo.linkeddata.es/resource/Alcalá de Henares
- Labels and literals: Vida de Miguel de Cervantes Saavedra, Catalán, 1960, BNE
Linked data is full of URIs
Linked Data without ontologies
[Figure: five URIs labelled “Cervantes”, all connected with Same as links although they carry unrelated properties]
- http://www.server1.org/resource/Cervantes
- http://www.server2.es/resource/Cervantes
- http://datos.bne.es/resource/XX1718747
- http://d-nb.info/gnd/11851993X
- http://geo.linkeddata.es/page/resource/Municipio/Cervantes
Properties attached to the URIs: Phone 914 296 093; Size 276,4 km²; #People 1547; Date of Birth 1547; Author of D. Quijote
Linked Data and ontologies
[Figure: the same five URIs, each now typed with rdf:type (Person, Restaurant, Street, Municipality); only the URIs typed as Person are linked with Same as]
- http://www.server1.org/resource/Cervantes
- http://www.server2.es/resource/Cervantes
- http://datos.bne.es/resource/XX1718747
- http://d-nb.info/gnd/11851993X
- http://geo.linkeddata.es/page/resource/Municipio/Cervantes
Cervantes (Person): Date of Birth 1547; Author of D. Quijote
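The contrast between the two slides can be sketched as follows: once every URI has an rdf:type, only URIs of the same class remain candidates for a Same as link. The URIs are those on the slide; which class the server1 and server2 URIs belong to is an assumption made for illustration, and `same_as_candidates` is a hypothetical helper.

```python
# Class of each "Cervantes" URI. The Person and Municipality assignments follow
# the slides; the Restaurant and Street assignments are assumptions.
TYPES = {
    "http://www.server1.org/resource/Cervantes": "Restaurant",
    "http://www.server2.es/resource/Cervantes": "Street",
    "http://datos.bne.es/resource/XX1718747": "Person",
    "http://d-nb.info/gnd/11851993X": "Person",
    "http://geo.linkeddata.es/page/resource/Municipio/Cervantes": "Municipality",
}


def same_as_candidates(types):
    """Group URIs by class; only classes with two or more URIs yield link candidates."""
    groups = {}
    for uri, cls in types.items():
        groups.setdefault(cls, []).append(uri)
    return {cls: sorted(uris) for cls, uris in groups.items() if len(uris) > 1}
```

With the types above, the only candidate group is the Person class, so the restaurant, street and municipality named "Cervantes" are no longer confused with the author.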
The problem and challenges
- Need to access heterogeneous relational data sources (Geography, Energy, Medicine, Environment)
- Need to submit SPARQL queries to distributed SPARQL endpoints
• Some of the databases are available in different DBMSs
• Some of the data sources are available as spreadsheets, Word documents, PDFs, etc.
• Furthermore, many of these datasets are already published as Linked Data or in SPARQL endpoints
• Data may be available from data streams (e.g., sensors)
We can use ontologies as global schemas for our data sources
Oscar Corcho and the OEG-UPM Data Integration
Linked Data allows uniform access
1. Agree on vocabularies for describing metadata and domain data
2. Unified and standardized language for describing resources (RDF(S))
3. Unified and standardized query language (SPARQL)
4. Standardized non-proprietary APIs
5. Links to other resources
Linked data Technologies @ OEG
[Figure: OEG tools for RDF generation and linking and for visualisation: Geometry2RDF, shp2RDF, geo REST service annotation, Sem4Tags, Marimba, NOR2O, Morph, SPARQL-Stream, LDP4j, Map4RDF, Linked Library Data visualisation, Sensor Data visualisation]
Metadata and data Integration
[Figure: producers of public and private resources (geographical information, REST service annotation, Web 2.0, library and cultural heritage, sensor networks data, diverse information) feed data generation and metadata generation; data integration and metadata integration serve the users]
3. Linked Data Process
Examples from datos.bne.es
Linked Data life cycle
1. Clear methodologies, methods and tools for monolingual LD generation and publication
Villazón-Terrazas, B.; Vilches, L.; Corcho, O.; Gómez-Pérez, A. Methodological Guidelines for Publishing Government Linked Data. In D. Wood, ed. Linking Government Data. Springer (pp. 27-49). 2011
[Figure: life cycle with the activities Specification, Modelling, Generation, Publication, Exploitation and Linking]
Specification
Goal: Linked Data generation from the Spanish National Library metadata
• Source data: MARC 21 records, not an RDB. Very flat structure, difficult to map to richer models
• Domain experts (catalogers) need to be part of the mapping process.
- Highly specialized library models: FRBR, ISBD.
• Data quality is good but there are still many errors: data curation is needed during the LD generation process
- Iterative and incremental transformation process: measure coverage and progress.
• Multilinguality, collaboration with IFLA
• Identify and analyse the data sources
• Design the URIs
• License and provenance definition
MARC21
• Different communication formats:
- MARC 21 format for Bibliographic Data
- MARC 21 format for Authority Data
- Others: Holdings, Classification, etc.
• 3.9 million bibliographical records
• 4.2 million authority records
MARC21 record structure
• Authority record: Camus, Albert*
001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert $d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
Callouts on the record: Control Field, Field, Subfield, Content, Subfield Content, HEADING (1XX)
* http://datos.bne.es/resource/XX1721208
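The record structure above (control fields with raw content, other fields with indicators and `$`-prefixed subfields) can be read with a small parser. This is a sketch for the textual display used on the slide only, not for binary MARC 21; the function name `parse_marc_line` and the returned dictionary layout are hypothetical, and the parser assumes that `$` never occurs inside subfield content.

```python
def parse_marc_line(line):
    """Parse one line of the textual MARC 21 display into tag, indicators and subfields."""
    tag, _, rest = line.strip().partition(" ")
    if tag < "010":
        # Control fields (001-009) carry raw content, no indicators or subfields
        return {"tag": tag, "indicators": "", "subfields": [], "content": rest.strip()}
    # Everything before the first "$" is the indicator pair (possibly empty)
    head, *parts = rest.split("$")
    # Each remaining part starts with the one-character subfield code
    subfields = [(part[0], part[1:].strip()) for part in parts if part]
    return {"tag": tag, "indicators": head.strip(), "subfields": subfields, "content": ""}
```

For example, `parse_marc_line("100 10 $a Camus, Albert $d 1913-1960")` yields tag `100`, indicators `10` and the subfields `a` and `d`, which are the units the later mapping slides refer to as 100a and 100d.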
Frequency of codes in records
Modelling: Ontologies and Terminology
Shared understanding
Ontology
• Lightweight ontologies:
o Concepts
o Organized in taxonomies
o Properties between concepts
o Properties for describing concepts
• Shared understanding of a domain of interest
• Ontologies expressed in OWL or RDF(S)
• The NeOn methodology helps to build ontologies
Model: FRBR at a glance
[Figure: FRBR levels: Works (Work 1, Work 2, Work 3), Expressions (Expression 1, Expression 2), Manifestations (Manifestation 1, Manifestation 2)]
The Ontology: based on IFLA vocabularies
Who will be the mapping generator?
MARC 21 records:
001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert $d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
IFLA-based Ontologies
Similar to mapping ontologies
[Figure: MARC 21 subfield combinations mapped to ontology terms]
- Subfield 100a maps to Person
- Subfields 100at map to Work
- Subfield 100t maps to the property title of work
- Content (100a) is contained in Content (100at); this maps to the property is creator of
Librarians create mappings using Excel
Three kinds of mapping: classification, annotation and relationships
Basic structure (classification mapping):
MARC21 info | Records count | Content sample | Mapping
100 $a $d | 888.880 | Camus, Albert 1913-1960 | foaf:Person
100 $a | 999.999 | Cervantes, Miguel de | foaf:name
100 $a $m | 10.000 | Cervantes, iguel | ERROR
Librarians create mappings using Excel
Annotation and relationships mappings, e.g. place of publication, has dimensions, is part of work
Marimba: Mapping process summary
Input (MARC records from BNE):
001 XX1721208
100 10 $a Camus, Albert $d 1913-1960
001 XX1910518
100 10 $a Camus, Albert $d 1913-1960 $t La peste
1. Classify
bne:XX1721208 a frbr:Person .
bne:XX1910518 a frbr:Work .
2. Annotate
bne:XX1721208 a frbr:Person ;
    frbr:name "Camus, Albert" ;
    frbr:hasDates "1913-1960" .
bne:XX1910518 a frbr:Work ;
    frbr:title "La Peste" .
3. Relate
bne:XX1721208 a frbr:Person ;
    frbr:name "Camus, Albert" ;
    frbr:hasDates "1913-1960" ;
    frbr:isCreatorOf bne:XX1910518 .
bne:XX1910518 a frbr:Work ;
    frbr:title "La Peste" ;
    frbr:isCreatedBy bne:XX1721208 .
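The classify, annotate and relate steps can be sketched in Python as below. This is an illustration of the idea under simplifying assumptions, not Marimba's implementation: records are given as dictionaries with a hypothetical layout, a 100 field with a `$t` subfield is taken to describe a work, and a work is related to the person whose heading (`$a` and `$d`) it repeats.

```python
RECORDS = [
    {"id": "XX1721208", "100": {"a": "Camus, Albert", "d": "1913-1960"}},
    {"id": "XX1910518", "100": {"a": "Camus, Albert", "d": "1913-1960", "t": "La peste"}},
]


def classify(record):
    # Assumed rule: a 100 field with a title subfield ($t) is a work, otherwise a person
    return "frbr:Work" if "t" in record["100"] else "frbr:Person"


def annotate(record):
    """Turn subfield content into literal-valued triples."""
    uri, field = "bne:" + record["id"], record["100"]
    if classify(record) == "frbr:Work":
        return [(uri, "frbr:title", field["t"])]
    return [(uri, "frbr:name", field["a"]), (uri, "frbr:hasDates", field["d"])]


def relate(records):
    """Link each work to the person whose heading ($a, $d) it repeats."""
    people = {
        (r["100"]["a"], r["100"].get("d")): "bne:" + r["id"]
        for r in records
        if classify(r) == "frbr:Person"
    }
    triples = []
    for r in records:
        if classify(r) != "frbr:Work":
            continue
        creator = people.get((r["100"]["a"], r["100"].get("d")))
        if creator:
            work = "bne:" + r["id"]
            triples.append((creator, "frbr:isCreatorOf", work))
            triples.append((work, "frbr:isCreatedBy", creator))
    return triples


def to_triples(records):
    """Run the three steps in order and return all triples."""
    triples = [("bne:" + r["id"], "a", classify(r)) for r in records]
    for r in records:
        triples.extend(annotate(r))
    return triples + relate(records)
```

Calling `to_triples(RECORDS)` produces the type, literal and relationship statements for the two Camus records. In the process described on the slides, the rules come from the mapping spreadsheets written by librarians rather than from code.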
Marimba uses the ontology to generate RDF
• http://marimba4lib.com
Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia
Several IRIs/URIs exist for Miguel de Cervantes. The BNE resource http://datos.bne.es/resource/XX1718747 is linked with Same As to:
- LIBRIS: http://libris.kb.se/resource/auth/45369
- SUDOC: http://www.idref.fr/026774771/id
- DNB: http://d-nb.info/gnd/11851993X
- DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
- VIAF: http://viaf.org/viaf/17220427
Publication
• Data publication
• Metadata publication using VoID
To facilitate discovery:
• Register your dataset in CKAN
• Use sitemap4rdf to generate the site map
• Upload the site map to Google and Sindice
Exploitation: datos.bne.es
SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE {
  <http://datos.bne.es/resource/XX1718747>              # URI of Cervantes
  <http://iflastandards.info/ns/fr/frbr/frbrer/P2010>   # is author
  ?Obras
}
Access through SPARQL queries and a Web interface
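A program would typically send such a query over HTTP. The sketch below builds the count query and the GET request URL defined by the SPARQL 1.1 Protocol, without contacting any server. The endpoint address is an assumption (the slide does not give it), and `count_query` and `request_url` are hypothetical helper names.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint location; the slide does not state it
ENDPOINT = "http://datos.bne.es/sparql"

CERVANTES = "http://datos.bne.es/resource/XX1718747"
IS_AUTHOR = "http://iflastandards.info/ns/fr/frbr/frbrer/P2010"


def count_query(subject, predicate):
    """Build a SPARQL 1.1 query counting the distinct objects of subject/predicate."""
    return (
        "SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE { "
        f"<{subject}> <{predicate}> ?Obras }}"
    )


def request_url(endpoint, query):
    # SPARQL 1.1 Protocol: a query may be sent via GET in the "query" parameter
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(request_url(ENDPOINT, count_query(CERVANTES, IS_AUTHOR)))
```

The resulting URL can be fetched with any HTTP client, asking for a result format such as `application/sparql-results+json` in the `Accept` header.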
Aggregation of geographical information with library metadata
http://datos.bne.es/autor/XX869875
Locations related with “El Quijote”
[Figure: map showing the locations and the route (itinerary followed in the trip)]
4. Linguistic Linked Licensed Data
[Figure: Linked Data and Linguistic Linked Data]
Use cases for LR Discovery
• Language metadata content
- Give me bilingual dictionaries in Spanish and Polish that account for grammatical number and gender, with Creative Commons licenses
• Language resources content
- Give me all occurrences in corpora of the token “bank” disambiguated as the WordNet synset http://wordnet-rdf.princeton.edu/wn31/108437235-n
• Language services
- Give me all RESTful services that can extract terms from text in Spanish.
Lack of interoperability of Language resources
• Ecosystem of
- Open and closed resources
- Different languages
- Silos of LRs
- Complementary resources
• Lexicons, corpora, dictionaries, grammars, …
- Heterogeneous formats
• E.g., for lexicons: LexInfo, LMF, LIR, Lemon, …
- Several repositories with different metadata and schemas
- Many APIs and services for querying
An example: “Red” (computer network)
- http://es.wiktionary.org
- http://rae.es
- http://www.wikilengua.org/index.php/Terminesp:red
- http://es.wikipedia.org
- http://www.wordreference.com/sinonimos/
*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
“Red” (computer network): what each source provides
• Etymology: from Latin “rete”; Gender: “f”; Definition: “Conjunto de ordenadores o de equipos informáticos conectados entre sí….”
• Synonyms: “sistema”, “malla”, “distribución”
• Norm: UNE 21302-131; English: network; German: Netzwerk
• Pronunciation: [red]; Grammatical category: feminine noun; Singular: “red”; Plural: “redes”
• “Red_de_computadores”: Category: redes informáticas; Image
Complementary but not connected
LD allows linguistic data integration
[Figure: the descriptions of the entry “Red” integrated as one linked data graph]
- Forms: singular, written form “red”, phonetic form [RED], gender feminine; plural, phonetic form [REDES]
- Senses: written form “red” is equivalent to written form “malla”
- Translation (es - en): written form “red” to written form “network”
- Image linked to the entry
Linguistic Linked Licensed Data (3LD)
- Linguistic: language resources such as lexica, corpora, dictionaries, ...
- Linked: using RDF and standard data models (vocabularies) for lexica and corpora, e.g. NIF (NLP Interchange Format)
- Licensed: published along with a machine-readable license, e.g. ODRL (Open Digital Rights Language)
www.lider-project.eu
Linguistic Linked Data Evolution
[Figure: snapshots of the linguistic linked data cloud: Jan. 2013, Sept. 2013, 2014, Sept. 2014, April 2015]
Linguistic Linked Licensed Data @ Nov 2014
LLOD Cloud in November 2014
• 103 resources (+58%)
• 165 links (+101%)
• More balanced (14 corpora, +367%)
• Less centralized: BabelNet, LexVo and LexInfo are new hubs
Criteria for inclusion:
• Resolvable: URLs that resolve
• RDF: resolve to RDF
• 1000 triples: self-explaining
• Links: to one resource from the cloud or 50 other links
• Crawlable: get the whole resource by crawling
• Linguistic: data must be a language resource
• Registered: at CKAN
www.lider-project.eu
Best practices and guidelines (BPMLOD @ W3C)
1. Best practices for Multilingual Linked Data Publication (BPMLOD @ W3C)
- Practices for Naming (URIs)
- Practices for Dereferencing
- Practices for Textual Information
- Practices for Linking
- Practices for Language Identification
2. Guidelines for Linguistic Linked Licensed Data
- Wordnets
- Multilingual lexicographic resources
- Bilingual dictionaries
- Terminologies in TBX
- NIF-based NLP Web services
How many Linguistic Resources are exposed in RDF?
www.lider-project.eu
How data and Linguistic LD are related
How many Linguistic Resources are exposed in RDF?
• LOD: Is Linguistic LD just another type of dataset to be exposed in RDF?
• LLD: Is the role of Linguistic LD to extend any dataset with lexical entries?
Linguistic Linked Licensed Data
How do we represent license information?
Linked Data and Linguistic Linked Data
1. Agree on vocabularies for describing
• Domain vocabularies
• LR metadata and content (Lemon-Ontolex, NIF, …), specific to Linguistic LD
2. Unified and standardized language for describing resources (RDF(S))
3. Unified and standardized query language (SPARQL)
4. Standardized non-proprietary APIs
5. Links to other resources
www.lider-project.eu
5. Linked Data is multilingual
[Figure: Linked Data, Linguistic Linked Data and Multilingual Linked Data]
Rationale: LOD is dominated by the English language
Some questions:
1. Distribution of natural languages across RDF datasets?
2. Usage of language tags to indicate the natural language of RDF literals?
- Distribution of usage of language tags
- Distribution of literals tagged as English vs other languages
- Distribution of literals tagged in languages other than English
[Figure: snapshots for 2007, 2009 and 2014]
The multilingual LOD: Current state*
[Figure: pie charts for JAN 2015 and JAN 2014]
- Legend: RDF literals with lang tag, RDF literals without lang tag. Values: 9%, 91% and 7%, 93%
- Legend: RDF literals English, RDF literals other than English. Values: 67%, 33% and 71%, 29%
* Used corpus: swse.deri.org/dyldo/
The multilingual LOD: Current state*
[Figure: bar chart, evolution of the top 10 most used language tags in languages other than English (es, de, zh, fr, it, ru, pl, nl, pt, sv), jan2014 vs jan2015; vertical axis from 0 to 70.000]
* See statistics for 2012 in the paper “Guidelines for Multilingual Linked Data”, Gómez-Pérez 2013
Messages to take home
1. Data providers should include language metadata in their datasets
• in the original data sources (e.g., MARC21 records)
• as language tags in RDF (e.g., at least @es, @pl)
• as language URIs in the VoID or DCAT descriptions
2. Guidelines and best practices are needed to help language metadata generation, linking and consumption
3. Benefits of adding language information to LD datasets
• Reduces the time and cost of identifying language in resources and terminology
• Fosters the aggregation and enrichment of data across complementary resources
• Enhances data curation
• Improves precision and recall in information retrieval and search
Publishing Linked Data on the Web: The Multilingual Dimension
Daniel Vila-Suero, Asunción Gómez-Pérez, Elena Montiel-Ponsoda, Jorge Gracia, Guadalupe Aguado-de-Cea
http://link.springer.com/chapter/10.1007/978-3-662-43585-4_7
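The kind of statistic shown in this section (literals with and without a language tag, and the distribution of tags) can be sketched for data serialized as N-Triples. This is a minimal, regex-based illustration rather than the procedure used for the reported figures; the sample data and the `example.org` URIs are made up, and typed literals are counted as untagged.

```python
import re
from collections import Counter

# A quoted literal, allowing escaped characters inside the quotes
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# A quoted literal followed by a language tag, e.g. "red"@es
LANG_TAG = re.compile(r'"(?:[^"\\]|\\.)*"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)')


def language_stats(ntriples):
    """Return (Counter of language tags, number of literals without a tag)."""
    tagged, untagged = Counter(), 0
    for line in ntriples.splitlines():
        line = line.strip()
        # Skip blanks, comments and triples whose object is not a literal
        if not line or line.startswith("#") or not LITERAL.search(line):
            continue
        match = LANG_TAG.search(line)
        if match:
            tagged[match.group(1).lower()] += 1
        else:
            untagged += 1
    return tagged, untagged


# Made-up sample data for illustration
SAMPLE = '''
<http://example.org/red> <http://www.w3.org/2000/01/rdf-schema#label> "red"@es .
<http://example.org/red> <http://www.w3.org/2000/01/rdf-schema#label> "network"@en .
<http://example.org/red> <http://www.w3.org/2000/01/rdf-schema#label> "sieć"@pl .
<http://example.org/red> <http://www.w3.org/2000/01/rdf-schema#comment> "no tag" .
<http://example.org/red> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/net> .
'''
```

On the sample, three literals carry a tag (es, en, pl) and one does not; the last triple has a URI as object and is ignored.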
6. Uses
Linked Data Applications
Ontology Engineering Group
• Culture (@BNE), Geographical (@IGN), Meteorological (@AEMET)
• News and Media (@Prisa, RTVE), Internet of Things (@CRTM, bike sharing system)
• Smart Cities and Open Data (@Zaragoza, Gob Aragón, Jacathon, Catalogues)
• Host of esDBpedia
Uses of Linked Data
1. Programmers build applications that make queries in SPARQL and get RDF
2. Citizens/users access LD through a user interface (they do not see RDF)
3. Machine-to-machine data exchange and semantic interoperability in RDF
Domains: Culture (@BNE), Geographical (@IGN), Meteorological (@AEMET), Smart Cities
The new Linked Data Ecosystem
[Figure: ecosystem of Culture (@BNE), Geographical (@IGN), Meteorological (@AEMET) and Smart Cities data]
Thanks for your attention !
A Decentralized Approach to Dissemination, Retrieval, and Archiving of DataA Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
 
Dm2 e okfn-infoday_scholarly_activities_18_nov
Dm2 e okfn-infoday_scholarly_activities_18_novDm2 e okfn-infoday_scholarly_activities_18_nov
Dm2 e okfn-infoday_scholarly_activities_18_nov
 

Maximising (Re)Usability of Resources using Linked Data

  • 1. Maximising (Re)Usability of Resources using Linked Data A. Gómez-Pérez Universidad Politécnica de Madrid asun@fi.upm.es Acknowledgements: Daniel Vila, Jorge Gracia, Victor Rodríguez Doncel, Ontology Engineering Group and LIDER Consortium members
  • 2. Maximising (Re)Usability of Resources using Linked Data. Poznan 12th May 2015 © Asunción Gómez-Pérez About us Directors: A. Gómez-Pérez, O. Corcho Position: 8th in the UPM ranking (200 groups) Research Group (30 people) - 2 Full Professors - 5 Associate Professors - 3 Assistant Professors - 7 Senior Postdocs - 12 PhD Students - 5 MSc and BSc Students - 3 software engineers - 1 system administrator - 2 project managers 170+ Past Collaborators 50+ Past Visitors http://www.oeg-upm.net/ https://github.com/oeg-upm @oeg-upm
  • 3. Ontology Engineering Group at a glance. Created in 1995. Known worldwide in the research areas of Ontologies, Semantic Web and Linked Data, Multilingual Linked Data, Open Data, and eScience. Projects (> 12M€): 27 EU projects (7 as coordinator), 54 national projects, 27 contracts with companies. Publications: 106 in journals, 362 in international conferences and book chapters, 7 books. Impact of publications (h-index): Asunción Gómez-Pérez (h: 47, 13583 citations), Oscar Corcho García (h: 33, 7230 citations). Services to the Spanish community: hosts esDbpedia and linkeddata.es. Supervision of students: 23 PhD theses (9 awarded best thesis prize), >150 MSc and BSc theses. Events organization: 11 editions of the International Summer School on Ontological Engineering and the Semantic Web, >50 workshops and tutorials. Standardization activities: >25 at W3C, ISO, OASIS, etc. Mobility: PhD students spend 3-6 months abroad; postdocs, 1 month every 2 years. Visibility: program chairs of ESWC, ISWC, KCAP, EKAW, TKE, TIA; editorial boards of journals; invited talks at conferences and events; programme committee presence. Collaboration with COM (Center Open Middleware).
  • 4. License • This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License • You are free: - to Share: to copy, distribute and transmit the work - to Remix: to adapt the work • Under the following conditions: - Attribution: you must attribute the work by inserting “[source http://www.oeg-upm.net/]” at the footer of each reused slide, and a credits slide stating “Maximising (Re)Usability of Resources using Linked Data” by A. Gómez-Pérez - Non-commercial - Share-Alike
  • 5. Outline: 1. Motivation 2. Linked Data Foundations 3. Linked Data Process (examples from http://datos.bne.es) 4. Linguistic Linked Data 5. Multilingual Linked Data 6. Uses of Linked Data
  • 6. A world of digital data: heterogeneous formats, providers, domains and languages
  • 7. The students' case: “Cervantes”
  • 8. Complementary, in different languages, but not connected. Lack of interoperability.
  • 9. Multilingual Data Integration. [Diagram: resources about El Quijote described in several languages and linked together: a photograph (Fotografía / Image / Photo) with URL http://www.mancia.org/foro/articulos/107712-don-quijote-medicina.html; a video (Vídeo / Film / Movie, language: Español) with URL http://www.rtve.es/alacarta/videos/el-quijote/; the work El Quijote, of which M. Cervantes is the author, located at BNE; and Don Quixote, written by M. Cervantes, translated into Polish, year of publication 1960, located in VIAF.]
  • 10. Energy Efficiency scenario
  • 11. Limitations when exploiting different and disconnected data sources. [Diagram: data sources, each with its own metadata (M) and data (D): turbine metadata, forecast information, wind turbine energy output by month, wind speed per day and city, wind farm topology, company private data, real-time wind speed.] Complementary but not connected.
  • 12. Lack of interoperability: language, syntactic, semantic and technical • Ecosystem of - Open resources in silos - Complementary domains - Heterogeneous formats - Different languages - Repositories with different metadata - Many APIs and services for querying
  • 13. The user problems: • Metadata: discovery and use of information in third-party applications is hard, manual and time-consuming • Data: combining private and public sector data in third-party applications requires solutions to the license issues
  • 14. 2. Linked Data Foundations
  • 15. Linked Data: why is it important? • It facilitates data integration - From heterogeneous sources - In different formats - With different granularity - In different languages - From different countries (© Slide adapted from “5min Introduction to Linked Data”, Olaf Hartig)
  • 16. LD domains in August 2014: Media, Geographic, Life Sciences, Publications, Government, Social Networking, Cross-domain, User Generated Content, Linguistics
  • 17. Foundations. Unique identifiers: URIs identify or name a resource. RDF(S) models. [Diagram: at the data level Cervantes “is creator of” El Quijote; at the model level Person “is creator of” Work; each instance is linked to its class by “is a”. URIs shown: http://datos.bne.es/resource/XX1718747, http://datos.bne.es/resource/XX3383563, http://iflastandards.info/ns/fr/frbr/frbrer/C1005, http://iflastandards.info/ns/fr/frbr/frbrer/C1001.] Equivalence links to other datasets allow data navigation: Cervantes is “same as” http://viaf.org/viaf/17220427 and “same as” http://dbpedia.org/resource/Miguel_de_Cervantes.
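The foundations on this slide (URIs as names, RDF statements, and “same as” links for navigation) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `IS_CREATOR_OF` property URI is hypothetical, and pairing `XX3383563` with El Quijote and `C1001` with Work is inferred from the order of the URIs on the slide rather than stated by it. The `rdf:type` and `owl:sameAs` URIs are the standard W3C ones.

```python
# Triples are (subject, predicate, object) tuples of URI strings.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
IS_CREATOR_OF = "http://example.org/vocab/isCreatorOf"  # hypothetical property URI

CERVANTES = "http://datos.bne.es/resource/XX1718747"
QUIJOTE = "http://datos.bne.es/resource/XX3383563"  # assumed to be El Quijote
PERSON = "http://iflastandards.info/ns/fr/frbr/frbrer/C1005"
WORK = "http://iflastandards.info/ns/fr/frbr/frbrer/C1001"  # assumed to be Work
VIAF = "http://viaf.org/viaf/17220427"
DBPEDIA = "http://dbpedia.org/resource/Miguel_de_Cervantes"

TRIPLES = {
    (CERVANTES, RDF_TYPE, PERSON),
    (QUIJOTE, RDF_TYPE, WORK),
    (CERVANTES, IS_CREATOR_OF, QUIJOTE),
    (CERVANTES, OWL_SAME_AS, VIAF),
    (CERVANTES, OWL_SAME_AS, DBPEDIA),
}


def objects(subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def equivalents(uri):
    """Collect every URI reachable through sameAs links, in either direction."""
    seen = {uri}
    stack = [uri]
    while stack:
        current = stack.pop()
        for s, p, o in TRIPLES:
            if p != OWL_SAME_AS:
                continue
            # sameAs is symmetric, so follow the link both ways.
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen
```

Starting from the VIAF identifier, `equivalents` reaches the BNE and DBpedia identifiers, which is the “data navigation” the slide refers to.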
  • 18. Foundations: linking. Model alignment using OWL. [Diagram: Person classes linked by EquivalentClass: http://iflastandards.info/ns/fr/frbr/frbrer/C1005, http://xmlns.com/foaf/0.1/Person and http://schema.org/Person. Municipality classes linked by EquivalentClass: http://dbpedia.org/resource/Municipalities_of_Spain and http://geo.linkeddata.es/ontology/Municipio. A Person has a “place of birth” that is a Municipality. Instances of Alcalá de Henares, http://dbpedia.org/page/Alcal%C3%A1_de_Henares and http://geo.linkeddata.es/resource/Alcalá de Henares, are each linked to their class by “is a” and to each other by “same as”.]
  • 19. The model (ontology) and the data. [Diagram. Ontology level: classes Work, Person, Idiom, Year, Library and Place; relations “is creator of”, “has subject”, “translation”, “publication date”, “located at” and “birthPlace”. Data level: Cervantes “is creator of” El Quijote; translation: Catalán; publication date: 1960; located in: BNE; has subject: Vida de Cervantes; birthPlace: Alcalá de Henares.]
  • 20. Linked data is full of URIs. [Diagram: the ontology and data of the previous slide with every node identified by a URI (http://datos.bne.es/#). Ontology level (labels: Language, Work, Library, Person): http://iflastandards.info/ns/fr/frbr/frbrer/C1001, http://iflastandards.info/ns/fr/frbr/frbrer/C1002, http://iflastandards.info/ns/fr/frbr/frbrer/C1005, http://xmlns.com/foaf/0.1/Organization, http://geo.linkeddata.es/ontology/Municipio; relations: is creator of, has subject, translation, publication date (Year), located in, birthPlace. Data level: http://datos.bne.es/resource/XX3383563, http://datos.bne.es/resource/XX1718747, http://datos.bne.es/resource/XX1924295, http://datos.bne.es/resource/bimo0002045496, http://geo.linkeddata.es/resource/Alcalá de Henares; labels: Don Quijote de la Mancha; Cervantes Saavedra, Miguel de; Vida de Miguel de Cervantes Saavedra; Catalán; BNE; 1960; relations: is author, has subject, translation, publication date, located in, birthPlace.]
  • 21. Linked Data without ontologies. [Diagram: five URIs named “Cervantes” (http://www.server1.org/resource/Cervantes, http://www.server2.es/resource/Cervantes, http://datos.bne.es/resource/XX1718747, http://d-nb.info/gnd/11851993X, http://geo.linkeddata.es/page/resource/Municipio/Cervantes) all connected by “same as” links. Without types, unrelated values end up on one resource: phone 914 296 093, size 276,4 km², #people 1547, date of birth 1547, author of D. Quijote.]
  • 22. Linked Data and ontologies. [Diagram: the same five URIs, each now typed with rdf:type as Person, Restaurant, Street or Municipality. Two URIs are typed Person and a single “same as” link remains, so Cervantes (Person) keeps only date of birth 1547 and author of D. Quijote.]
  • 23. The problem and challenges: data integration (Oscar Corcho and the OEG-UPM). Need to access heterogeneous relational data sources (Geography, Energy, Medicine, Environment). Need to submit SPARQL queries to distributed SPARQL endpoints. • Some of the databases are available in different DBMSs • Some of the data sources are available as spreadsheets, Word documents, PDFs, etc. • Furthermore, many of these datasets are already published as Linked Data or behind SPARQL endpoints • Data may be available from data streams (e.g., sensors). We can use ontologies as global schemas for our data sources.
  • 24. Linked Data allows uniform access: 1. Agree on vocabularies for describing metadata and domain data 2. Unified and standardized language for describing resources (RDF(S)) 3. Unified and standardized query language (SPARQL) 4. Standardized non-proprietary APIs 5. Links to other resources
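Points 3 and 4 of this slide (one query language, one non-proprietary API) can be illustrated with a minimal sketch of a SPARQL request sent over HTTP GET, where the SPARQL Protocol passes the query text in a `query` parameter. The endpoint URL below is a hypothetical placeholder, not one given in the slides, and the sketch only builds the request URL; it does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
ENDPOINT = "http://example.org/sparql"

# Ask for every resource declared equivalent to the BNE identifier of Cervantes.
QUERY = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?same WHERE {
  <http://datos.bne.es/resource/XX1718747> owl:sameAs ?same .
}"""


def build_request_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


REQUEST_URL = build_request_url(ENDPOINT, QUERY)
```

The same function works unchanged against any endpoint that follows the protocol, which is the sense in which access is uniform.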
  • 25. Linked Data technologies @ OEG. [Diagram of tools, grouped under “RDF Generation and Linking” and “Visualization”: Geometry2RDF, shp2RDF, geo REST service annotation, Sem4Tags, Marimba, NOR2O, Morph, SPARQL-Stream, LDP4j, Linked Library Data Visualisation, Map4RDF, Sensor Data Visualisation.]
  • 26. Metadata and data integration. [Diagram: producers of public and private resources (geographical information, REST service annotation, Web 2.0, library and cultural heritage, diverse information, sensor network data) feed data generation and metadata generation, followed by data integration and metadata integration for users.]
  • 27. 3. Linked Data Process. Examples from datos.bne.es
  • 28. Linked Data life cycle. 1. Clear methodologies, methods and tools for monolingual LD generation and publication. Activities in the cycle: Specification, Modelling, Generation, Publication, Exploitation, Linking. Reference: Villazón-Terrazas, B.; Vilches, L.; Corcho, O.; Gómez-Pérez, A. Methodological Guidelines for Publishing Government Linked Data. In D. Wood, ed., Linking Government Data. Springer, pp. 27-49, 2011.
  • 29. Specification. Goal: Linked Data generation from the metadata of the Spanish National Library. • Source data: MARC 21 records, not an RDB; their very flat structure is difficult to map to richer models • Domain experts (catalogers) need to be part of the mapping process - Highly specialized library models: FRBR, ISBD • Data quality is good but there are still many errors: data curation happens during the LD generation process - Iterative and incremental transformation process: measure coverage and progress • Multilinguality, collaboration with IFLA
  • 30. • Identify and analyse the data sources • Design the URIs • Define license and provenance
  • 31. MARC21 • Different communication formats: - MARC 21 format for Bibliographic Data - MARC 21 format for Authority Data - Others: Holdings, Classification, etc. • 3.9 million bibliographic records • 4.2 million authority records
  • 32. MARC21 record structure. Authority record for Camus, Albert (http://datos.bne.es/resource/XX1721208); the heading is in field 1XX. Callouts on the slide: control field, field, subfield, content, subfield content.
      001 XX1721208
      005 200012181124
      008 901120nn aijnnaabn n aaa
      016 $a BNE19900178994
      040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
      100 10 $a Camus, Albert $d 1913-1960
      670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
      670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
      670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
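To make the field / subfield structure of the record above concrete, here is a minimal Python sketch that splits one line of this textual display into tag, indicators and subfields. It handles only the layout shown on the slide (real MARC 21 exchange files use a binary format that this does not read), and the function names are illustrative. The URI pattern in `resource_uri` follows the example on the slide, where record `XX1721208` is published at http://datos.bne.es/resource/XX1721208.

```python
def parse_field(line):
    """Split a displayed MARC line such as '100 10 $a Camus, Albert $d 1913-1960'."""
    tag, _, rest = line.strip().partition(" ")
    if "$" not in rest:
        # Control fields (001, 005, 008 ...) carry a single value and no subfields.
        return {"tag": tag, "indicators": "", "value": rest.strip(), "subfields": []}
    head, _, tail = rest.partition("$")
    subfields = []
    for chunk in tail.split("$"):
        if chunk:
            # The first character is the subfield code, the rest is its content.
            subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": head.strip(), "value": None, "subfields": subfields}


def resource_uri(control_number):
    """Mint a resource URI from the 001 control number, following the slide's example."""
    return "http://datos.bne.es/resource/" + control_number


HEADING = parse_field("100 10 $a Camus, Albert $d 1913-1960")
CONTROL = parse_field("001 XX1721208")
```

A subfield whose content itself contains a literal `$` would be split incorrectly by this sketch, so it should not be read as a general MARC parser.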
  • 33. Frequency of codes in records
  • 34. Modelling: ontologies and terminology. Shared understanding.
  • 35. Modelling: ontology • Lightweight ontologies: o Concepts o Organized in taxonomies o Properties between concepts o Properties for describing concepts • Shared understanding of a domain of interest • Ontologies expressed in OWL or RDF(S) • The NeOn methodology helps to build ontologies
  • 36. Model: FRBR at a glance. [Diagram with three levels: Works (Work 1, Work 2, Work 3), Expressions (Expression 1, Expression 2) and Manifestations (Manifestation 1, Manifestation 2).]
  • 37. The ontology: based on IFLA vocabularies
  • 38. Who will be the mapping generator? [Diagram: MARC 21 records, illustrated by the authority record for Camus, Albert (001 XX1721208) shown on slide 32, on one side and the IFLA-based ontologies on the other.]
  • 39. Modelling: similar to mapping ontologies. [Diagram: MARC subfield combinations map to ontology elements: 100a maps to Person; 100at maps to Work; subfield 100t maps to the property “title of work”; and the relation “content (100a) contained in content (100at)” maps to “is creator of”.]
  • 40. Librarians create mappings using Excel. Three kinds of mapping: classification mapping, annotation mapping and relationships mapping. Basic structure of the classification mapping sheet:
      MARC21 info | Records count | Content sample | Mapping
      100 $a $d | 888.880 | Camus, Albert 1913-1960 | foaf:Person
      100 $a | 999.999 | Cervantes, Miguel de | foaf:name
      100 $a $m | 10.000 | Cervantes, iguel | ERROR
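The mapping sheet above also supports the “measure coverage and progress” step mentioned on the Specification slide. The sketch below is one hedged reading of it: it assumes the dots in the record counts are thousands separators and that the three rows count disjoint sets of records, neither of which the slide states. The row keys and function names are illustrative.

```python
# Rows of the classification mapping sheet, as shown on the slide.
ROWS = [
    {"pattern": "100 $a $d", "records": "888.880", "mapping": "foaf:Person"},
    {"pattern": "100 $a", "records": "999.999", "mapping": "foaf:name"},
    {"pattern": "100 $a $m", "records": "10.000", "mapping": "ERROR"},
]


def record_count(text):
    """Read a count written with '.' as thousands separator (an assumption)."""
    return int(text.replace(".", ""))


def coverage(rows):
    """Return (records with a usable mapping, total records) for the sheet."""
    total = sum(record_count(r["records"]) for r in rows)
    mapped = sum(record_count(r["records"]) for r in rows if r["mapping"] != "ERROR")
    return mapped, total


def flagged_patterns(rows):
    """List the subfield patterns that a librarian marked as cataloguing errors."""
    return [r["pattern"] for r in rows if r["mapping"] == "ERROR"]


MAPPED, TOTAL = coverage(ROWS)
```

Re-running such a count after each round of mapping and curation gives a simple progress figure for the iterative process.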
  • 41. Librarians create mappings using Excel: annotation mapping and relationships mapping. Example properties: place of publication, has dimensions, is part of work.
  • 42. Marimba: mapping process summary (Classify, Annotate, Relate), applied to BNE MARC records.
      MARC records:
        001 XX1721208
        100 10 $a Camus, Albert $d 1913-1960
        001 XX1910518
        100 10 $a Camus, Albert $d 1913-1960 $t La peste
      1. Classify:
        bne:XX1721208 a frbr:Person .
        bne:XX1910518 a frbr:Work .
      2. Annotate:
        bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" .
        bne:XX1910518 a frbr:Work ; frbr:title "La Peste" .
      3. Relate:
        bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" ; frbr:isCreatorOf bne:XX1910518 .
        bne:XX1910518 a frbr:Work ; frbr:title "La Peste" ; frbr:isCreatedBy bne:XX1721208 .
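The classify, annotate and relate steps summarised on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not Marimba's implementation: the dictionary layout of the records, the function names, and the rule "a 100 field with a $t subfield is a Work" are assumptions made for the example, while the identifiers and `frbr:` property names are the ones shown on the slide.

```python
# Hypothetical, simplified view of two MARC 21 authority records
# (the real Marimba input format is not shown in the slides).
RECORDS = [
    {"001": "XX1721208", "100": {"a": "Camus, Albert", "d": "1913-1960"}},
    {"001": "XX1910518",
     "100": {"a": "Camus, Albert", "d": "1913-1960", "t": "La peste"}},
]


def classify(record):
    """Assumed rule: a 100 field carrying a $t (title) describes a Work."""
    return "frbr:Work" if "t" in record["100"] else "frbr:Person"


def annotate(record):
    """Turn the subfields of one record into (subject, predicate, object) triples."""
    subject = "bne:" + record["001"]
    field = record["100"]
    kind = classify(record)
    triples = [(subject, "a", kind)]
    if kind == "frbr:Person":
        triples.append((subject, "frbr:name", '"%s"' % field["a"]))
        if "d" in field:
            triples.append((subject, "frbr:hasDates", '"%s"' % field["d"]))
    else:
        triples.append((subject, "frbr:title", '"%s"' % field["t"]))
    return triples


def relate(records):
    """Link each Work to the Person whose $a and $d match the Work's heading."""
    persons = {}
    for rec in records:
        if classify(rec) == "frbr:Person":
            key = (rec["100"]["a"], rec["100"].get("d"))
            persons[key] = "bne:" + rec["001"]
    triples = []
    for rec in records:
        if classify(rec) == "frbr:Work":
            creator = persons.get((rec["100"]["a"], rec["100"].get("d")))
            if creator is not None:
                work = "bne:" + rec["001"]
                triples.append((creator, "frbr:isCreatorOf", work))
                triples.append((work, "frbr:isCreatedBy", creator))
    return triples


def to_triples(records):
    """Run annotate on every record, then add the relationship triples."""
    triples = []
    for rec in records:
        triples.extend(annotate(rec))
    triples.extend(relate(records))
    return triples


if __name__ == "__main__":
    for s, p, o in to_triples(RECORDS):
        print(s, p, o, ".")
```

Matching a Work to its creator by comparing name and dates is the simplest possible strategy; a production mapping would presumably rely on the librarians' mapping sheets rather than on a hard-coded rule.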
  • 43. Marimba uses the ontology to generate RDF (BNE). http://marimba4lib.com
  • 44. Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia. Several IRIs/URIs exist for Miguel de Cervantes; the BNE resource http://datos.bne.es/resource/XX1718747 is linked with “same as” to each of them:
      LIBRIS: http://libris.kb.se/resource/auth/45369
      SUDOC: http://www.idref.fr/026774771/id
      DNB: http://d-nb.info/gnd/11851993X
      DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
      VIAF: http://viaf.org/viaf/17220427
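As a small illustration of the links listed on this slide, the sketch below serialises them as N-Triples. It assumes that the “same as” arrows correspond to `owl:sameAs`, which the slide does not state explicitly; the function name is hypothetical.

```python
# Assumption: the "same as" links on the slide are owl:sameAs statements.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

BNE_CERVANTES = "http://datos.bne.es/resource/XX1718747"

# External URIs for Miguel de Cervantes, copied from the slide.
EXTERNAL_URIS = [
    "http://libris.kb.se/resource/auth/45369",
    "http://www.idref.fr/026774771/id",
    "http://d-nb.info/gnd/11851993X",
    "http://dbpedia.org/resource/Miguel_de_Cervantes",
    "http://viaf.org/viaf/17220427",
]


def same_as_ntriples(subject, targets):
    """Return one N-Triples line per link from subject to each target URI."""
    return ["<%s> <%s> <%s> ." % (subject, OWL_SAME_AS, t) for t in targets]


if __name__ == "__main__":
    print("\n".join(same_as_ntriples(BNE_CERVANTES, EXTERNAL_URIS)))
```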
  • 45. Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia
  • 46. Publication: data publication, and metadata publication using VoID. To facilitate discovery:
      • Register your dataset in CKAN
      • Use sitemap4rdf to generate the site map
      • Upload the site map to Google and Sindice
  • 47. Exploitation: datos.bne.es, through SPARQL queries and a web interface. Example query counting the works of Cervantes (subject: the URI of Cervantes; predicate P2010: is author):
      SELECT (COUNT(DISTINCT ?Obras) AS ?total)
      WHERE { <http://datos.bne.es/resource/XX1718747> <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras }
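A programmer would typically send a query like the one on this slide over HTTP. The sketch below only builds the query text and the request URL, without performing any network call. The endpoint address `http://datos.bne.es/sparql` is an assumption (the slide does not give it), and the function names and the `?total` variable are hypothetical.

```python
from urllib.parse import urlencode

CERVANTES = "http://datos.bne.es/resource/XX1718747"
IS_AUTHOR = "http://iflastandards.info/ns/fr/frbr/frbrer/P2010"

# Assumed endpoint location; check the datos.bne.es documentation before use.
ENDPOINT = "http://datos.bne.es/sparql"


def build_count_query(subject, predicate):
    """Build a SPARQL 1.1 query counting the distinct objects of (subject, predicate)."""
    return (
        "SELECT (COUNT(DISTINCT ?Obras) AS ?total) "
        "WHERE { <%s> <%s> ?Obras }" % (subject, predicate)
    )


def build_request_url(endpoint, query):
    """Encode the query as the standard 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = build_count_query(CERVANTES, IS_AUTHOR)
    print(query)
    print(build_request_url(ENDPOINT, query))
```

Note that IRIs must be wrapped in angle brackets inside the query; a bare `http://...` subject is a syntax error in SPARQL.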
  • 48. Aggregation of geographical information with library metadata. http://datos.bne.es/autor/XX869875
  • 49. Locations related to “El Quijote”: itinerary followed in the trip (locations and route).
  • 50. 4. Linguistic Linked Licensed Data. [Diagram labels: Linked Data, Linguistic Linked Data.]
  • 51. Use cases for LR discovery
      • Language metadata content: give me bilingual dictionaries in Spanish and Polish that account for grammatical number and gender, with Creative Commons licenses.
      • Language resources content: give me all occurrences in corpora of the token “bank” disambiguated as the WordNet synset http://wordnet-rdf.princeton.edu/wn31/108437235-n
      • Language services: give me all RESTful services that can extract terms from text in Spanish.
  • 52. Lack of interoperability of language resources. An ecosystem of:
      - Open and closed resources
      - Different languages
      - Silos of LRs
      - Complementary resources: lexicons, corpora, dictionaries, grammars, ...
      - Heterogeneous formats, e.g., for lexicons: Lexinfo, LMF, LIR, Lemon, ...
      - Several repositories with different metadata and schemas
      - Many APIs and services for querying
  • 53. An example: “Red” (computer network). Sources:
      - http://es.wiktionary.org
      - http://rae.es
      - http://www.wikilengua.org/index.php/Terminesp:red
      - http://es.wikipedia.org
      - http://www.wordreference.com/sinonimos/
  • 54. “Red” (computer network): complementary but not connected. What each source provides:
      - “Red”: Etymology: del latín “rete”. Gender: “f”. Definition: “Conjunto de ordenadores o de equipos informáticos conectados entre sí….”
      - “Red”: Synonyms: “sistema”, “malla”, “distribución”
      - “Red”: Norm: UNE 21302-131. English: network. German: Netzwerk
      - “Red”: Pronunciation: [red]. Grammar category: sustantivo femenino. Singular: “red”. Plural: “redes”
      - “Red_de_computadores”: Category: redes informáticas. Image
      Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
  • 55. LD allows linguistic data integration. [Diagram: the lexical entry “Red” linked to its forms (singular form with phonetic form [RED], plural form with phonetic form [REDES], written form “red”, gender feminine), to its senses (written form “red”, equivalent to written form “malla”; an image of a network) and to an es-en sense translation (written form “red”, written form “network”).]
  • 56. Linguistic Linked Licensed Data (3LD)
      - Linguistic: language resources such as lexica, corpora, dictionaries, ...
      - Linked: using RDF and standard data models (vocabularies) for lexica and corpora; NIF (NLP Interchange Format)
      - Licensed: published along with a machine-readable license; ODRL (Open Digital Rights Language)
      www.lider-project.eu
  • 57. Linguistic Linked Data evolution. [Figure: snapshots labelled Jan. 2013, Sept. 2013, 2014, Sept. 2014 and April 2015.]
  • 58. Linguistic Linked Licensed Data: the LLOD cloud in November 2014
      • 103 resources (+58%)
      • 165 links (+101%)
      • More balanced (14 corpora, +367%)
      • Less centralized: Babelnet, LexVo and LexInfo are new hubs
      Criteria for inclusion:
      • Resolvable: URLs that resolve
      • RDF: resolve to RDF
      • 1000 triples: self-explaining
      • Links: to one resource from the cloud, or 50 other links
      • Crawlable: the whole resource can be obtained by crawling
      • Linguistic: data must be a language resource
      • Registered: at CKAN
      www.lider-project.eu
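The inclusion criteria listed on this slide read naturally as a checklist, which the following sketch encodes. Everything about the data structure is hypothetical (the `Dataset` class and its field names are invented for the example), and two thresholds are interpretations rather than facts from the slide: “1000 triples” is read as “at least 1000”, and “Links” as “at least one link into the cloud, or at least 50 links elsewhere”.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical summary of a candidate resource for the LLOD cloud."""
    name: str
    resolvable: bool          # its URLs resolve
    serves_rdf: bool          # they resolve to RDF
    triples: int              # size of the dataset
    links_to_cloud: int       # links to resources already in the cloud
    other_links: int          # links to any other resources
    crawlable: bool           # the whole resource can be crawled
    linguistic: bool          # it is a language resource
    registered_in_ckan: bool  # it has a CKAN entry


def unmet_criteria(ds):
    """Return the names of the criteria that the dataset fails (empty if none)."""
    checks = [
        ("Resolvable", ds.resolvable),
        ("RDF", ds.serves_rdf),
        ("1000 Triples", ds.triples >= 1000),           # assumed: at least 1000
        ("Links", ds.links_to_cloud >= 1 or ds.other_links >= 50),
        ("Crawlable", ds.crawlable),
        ("Linguistic", ds.linguistic),
        ("Registered", ds.registered_in_ckan),
    ]
    return [name for name, ok in checks if not ok]


if __name__ == "__main__":
    candidate = Dataset("example-lexicon", True, True, 800, 0, 12, True, True, True)
    print(unmet_criteria(candidate))
```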
  • 59. Best practices and guidelines (BPMLOD @ W3C)
      1. Best practices for Multilingual Linked Data Publication (BPMLOD @ W3C)
         - Practices for naming (URIs)
         - Practices for dereferencing
         - Practices for textual information
         - Practices for linking
         - Practices for language identification
      2. Guidelines for Linguistic Linked Licensed Data
         - Wordnets
         - Multilingual lexicographic resources
         - Bilingual dictionaries
         - Terminologies in TBX
         - NIF-based NLP Web services
      How many linguistic resources are exposed in RDF?
      www.lider-project.eu
  • 60. How data (LOD) and Linguistic LD (LLD) are related
      - How many linguistic resources are exposed in RDF?
      - Is Linguistic LD just another type of dataset to be exposed in RDF?
      - Is the role of Linguistic LD to extend any dataset with lexical entries?
  • 61. Linguistic Linked Licensed Data
  • 62. Linguistic Linked Licensed Data: how do we represent license information?
  • 63. Linked Data and Linguistic Linked Data
      1. Agree on vocabularies for describing:
         • Domain vocabularies
         • LR metadata and content (Lemon-Ontolex, NIF, ...)
      2. Unified and standardized language for describing resources (RDF(S))
      3. Unified and standardized query language (SPARQL)
      4. Standardized non-proprietary APIs
      5. Links to other resources
      www.lider-project.eu
  • 64. 5. Linked Data is multilingual. [Diagram labels: Linked Data, Linguistic Linked Data, Multilingual Linked Data.]
  • 65. Rationale: LOD is dominated by the English language. Some questions:
      1. What is the distribution of natural languages across RDF datasets?
      2. How are language tags used to indicate the natural language of RDF literals?
         a. Distribution of usage of language tags
         b. Distribution of literals tagged as English vs other languages
         c. Distribution of literals tagged in languages other than English
      [Figure: snapshots labelled 2007, 2009 and 2014.]
  • 66. The multilingual LOD: current state*. [Pie charts for two snapshots, labelled JAN 2015 and JAN 2014; values in the order extracted.]
      - RDF literals with lang tag vs without lang tag: 9% vs 91%; 7% vs 93%
      - RDF literals in English vs other than English: 67% vs 33%; 71% vs 29%
      * Used corpus: swse.deri.org/dyldo/
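Figures of this kind can be obtained by counting language-tagged literals in an RDF dump. The sketch below does this for N-Triples text with a regular expression; it is not the method used for the statistics on the slide, which is not described. The function name and the sample data are hypothetical, and a literal with a datatype but no language tag is counted as “without lang tag”.

```python
import re
from collections import Counter

# Matches a literal in object position at the end of an N-Triples line,
# optionally followed by a language tag (@es) or a datatype (^^<...>).
LITERAL_AT_END = re.compile(
    r'"(?:[^"\\]|\\.)*"'                       # the quoted lexical form
    r'(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*)'       # group 1: language tag, if any
    r'|\^\^<[^>]*>)?'                          # or a datatype IRI
    r'\s*\.\s*$'
)


def lang_tag_stats(lines):
    """Return (tagged, untagged, Counter of tags) over the literal objects."""
    tagged = 0
    untagged = 0
    tags = Counter()
    for line in lines:
        match = LITERAL_AT_END.search(line)
        if match is None:
            continue  # object is an IRI or blank node, or the line is not a triple
        if match.group(1):
            tagged += 1
            tags[match.group(1).lower()] += 1
        else:
            untagged += 1
    return tagged, untagged, tags


# Hypothetical sample data for illustration only.
SAMPLE = [
    '<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "red"@es .',
    '<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "network"@en .',
    '<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "Netzwerk" .',
    '<http://example.org/a> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/b> .',
    '<http://example.org/a> <http://example.org/n> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .',
]

if __name__ == "__main__":
    tagged, untagged, tags = lang_tag_stats(SAMPLE)
    total = tagged + untagged
    print("with lang tag: %.0f%%" % (100.0 * tagged / total))
    print("top tags:", tags.most_common(10))
```

For real dumps a proper RDF parser is preferable to a regular expression; the regex keeps the sketch dependency-free.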
  • 67. The multilingual LOD: current state*. Evolution of the top 10 most used language tags in languages other than English. [Bar chart: vertical axis from 0 to 70.000; language tags es, de, zh, fr, it, ru, pl, nl, pt, sv; series jan2014 and jan2015.]
      * See statistics for 2012 in the paper “Guidelines for Multilingual Linked Data”, Gómez-Pérez 2013
  • 68. Messages to take home
      1. Data providers should include language metadata in their datasets:
         • in the original data sources (e.g., MARC21 records)
         • as language tags in RDF (e.g., @es, @pl at least)
         • as language URIs in the VoID or DCAT descriptions
      2. Guidelines and best practices are needed to help language metadata generation, linking and consumption
      3. Benefits of adding language information to LD datasets:
         • Reduces the time and cost of identifying language in resources and terminology
         • Fosters the aggregation and enrichment of data across complementary resources
         • Enhances data curation
         • Improves precision and recall in information retrieval and search
      Reference: Publishing Linked Data on the Web: The Multilingual Dimension. Daniel Vila-Suero, Asunción Gómez-Pérez, Elena Montiel-Ponsoda, Jorge Gracia, Guadalupe Aguado-de-Cea. http://link.springer.com/chapter/10.1007/978-3-662-43585-4_7
  • 69. 6. Uses
  • 70. Linked Data applications (Ontology Engineering Group)
      - Culture (@BNE)
      - Geographical (@IGN)
      - Meteorological (@AEMET)
      - News and Media (@Prisa, RTVE)
      - Internet of Things (@CRTM, bike sharing system)
      - Smart Cities and Open Data (@Zaragoza, Gob Aragón, Jacathon, catalogues)
      - Host of esDBpedia
  • 71. Uses of Linked Data (Culture @BNE, Geographical @IGN, Meteorological @AEMET, Smart Cities)
      1. Programmers build applications that make queries in SPARQL and get RDF
      2. Citizens/users access LD through a user interface (they do not see RDF)
      3. Machine-to-machine data exchange and semantic interoperability in RDF
  • 72. The new Linked Data ecosystem: Culture (@BNE), Geographical (@IGN), Meteorological (@AEMET), Smart Cities
  • 73. Thanks for your attention!