The document provides guidelines for publishing data as Linked Data. It discusses identifying appropriate data sources, reusing existing vocabularies and non-ontological resources, generating RDF data from relational databases or geometrical data using tools like R2O, ODEMapster and geometry2rdf, and publishing the data on the web by resolving URIs. The Ontology Engineering Group at Universidad Politécnica de Madrid has published Spanish geospatial and statistical data as part of projects like GeoLinkedData following these guidelines.
Methodological Guidelines for Publishing Linked Data
1. Methodological Guidelines for
Publishing Linked Data
Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
{bvillazon,asun,ocorcho}@fi.upm.es
Phone: 34.91.3366605, Fax: 34.91.3524819
Cochabamba, Bolivia
May, 2011
2. ToC
• Ontology Engineering Group
• Introduction to Linked Data
• Guidelines for Publishing Linked Data
• Demo
4. People
• Director: A. Gómez-Pérez
• Research Group (38 people)
• 2 Full Professors
• 6 Associate Professors
• 1 Assistant Professor
• 6 Postdocs
• 14 PhD Students
• 6 MSc Students
• 4 Software Engineers
• Management (5)
• 3 Project Managers
• 1 System Administrator
• 1 Secretary
• 80+ Past Collaborators
• 15+ visitors
http://www.oeg-upm.net
8. Collaboration with other research groups
Univ. of Amsterdam
DFKI
Univ. of Augsburg
Univ. of Karlsruhe
KSL. Stanford Univ.
Univ. of Wien
Univ. of NR & ALS
Univ. of Innsbruck
Free Univ. of Amsterdam
Univ. of Koblenz
Univ. of Hannover
Univ. of Mannheim
Univ. of Bielefeld
Univ. of Brasilia
Forschungszentrum Informatik
Univ. of Galway (DERI)
Free Univ. of Brussels
Univ. of Zurich
Open University
Ústav Informatiky
Oxford University
Univ. of Manchester
Univ. of Liverpool
Univ. of Sheffield
Academy of Sciences
Univ. of Aberdeen
Univ. of Edinburgh
Univ. of Southampton
CNR
Univ. of Trento
Univ. of Tel Aviv
Univ. of Hull
Univ. of Bolzano
INRIA
Univ. of Athens
TUC
9. Research Areas
[Figure: timeline of research areas; years shown: 1995, 1997, 2000, 2004, 2008]
• Ontological Engineering
• Natural Language Processing
• (Social) Semantic Web
• Semantic e-Science (Data Integration, Semantic Grid)
• Internet of Things
10. Linked Data in OEG
• GeoLinkedData is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data.
http://geo.linkeddata.es
• El Viajero Linked Data is a project that focuses on the integration of the contents produced by newspapers and digital platforms belonging to Prisa Group.
http://webenemasuno.linkeddata.es/
• A project with the Biblioteca Nacional to publish the library information as Linked Data.
http://cultura.linkeddata.es/visualizer/
11. Linked Data in OEG
• Tools for generating and consuming Linked Data, e.g.,
• geometry2rdf http://www.oeg-upm.net/index.php/downloads/151-geometry2rdf
• map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/
• Spanish Thematic Network of Linked Data
http://red.linkeddata.es
» Group leader: Ontology Engineering Group
» 19 Research Groups
» 4 companies
17. In a nutshell
• An extension of the current Web…
• … where information, services and data are given well-defined and explicitly represented meaning, …
• … so that it can be shared and used by humans and machines, ...
• ... better enabling them to work in cooperation
• How?
• Promoting information exchange by tagging web content with machine-processable descriptions of its meaning.
• And technologies and infrastructure to do this
• And clear principles on how to publish data
18. The four principles (Tim Berners-Lee, 2006)
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
• http://www.w3.org/DesignIssues/LinkedData.html
• http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
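The four principles can be illustrated with a minimal Python sketch. It assumes a tiny in-memory dataset whose triples are made-up sample data; the function names `describe` and `outgoing_links` are hypothetical and only stand in for what a Linked Data server would return when a URI is looked up.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Illustrative sample data: subject URI -> list of (predicate URI, object term).
# Principles 1 and 2: the thing is named with an HTTP URI.
DATA = {
    "http://geo.linkeddata.es/resource/Provincia/Madrid": [
        (RDFS_LABEL, '"Madrid"'),
        # Principle 4: a link to a URI in another dataset.
        (OWL_SAME_AS, "<http://dbpedia.org/resource/Madrid>"),
    ],
}


def describe(uri):
    """Principle 3: return an RDF (N-Triples) description for a looked-up URI."""
    return [f"<{uri}> <{p}> {o} ." for p, o in DATA.get(uri, [])]


def outgoing_links(uri):
    """Return the URIs that a client could follow to discover more things."""
    return [o.strip("<>") for _, o in DATA.get(uri, []) if o.startswith("<")]


if __name__ == "__main__":
    madrid = "http://geo.linkeddata.es/resource/Provincia/Madrid"
    print("\n".join(describe(madrid)))
    print(outgoing_links(madrid))
```

In a real deployment the description would be served over HTTP by a Linked Data front end rather than read from a dictionary.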
23. And guess who is starting to publish Linked Data now?
• UK Government
• US Government
• BBC
• Open Calais
• Freebase
• NY Times
• CNET
• DBpedia
• ….
29. Identification of the data sources
• Guidelines based on the Open Data Manual 1
• Two possibilities
• To find the data sources already available in a public data
catalog, e.g., Aporta project 2
• To get an agreement with a particular government body to
publish its data sources, e.g., GeoLinkedData - IGN
1 http://opendatamanual.org/
2 http://aporta.es
30. GeoLinkedData
Identification of the data sources
• IGN (National Geographic Institute of Spain): agreement with the IGN; Oracle & MySQL databases
• INE (National Statistics Institute of Spain): data sources available in a public data catalog
33. Ontology
Vocabulary Modelling
• An ontology is an engineering artifact, which provides:
• A set of terms
• A set of explicit assumptions regarding the intended meaning of the terms.
• Almost always including concepts and their classification
• Almost always including properties between concepts
• Shared understanding of a domain of interest
• Ontologies expressed in OWL or RDF(S), both based on RDF
34. Reuse available vocabularies
Vocabulary Modelling
• Search for suitable vocabularies (Linked Open Vocabularies)
• Are there suitable vocabularies?
• Yes: build the vocabulary by reusing available vocabularies
• No: …
35. Reuse available non-ontological resources
Vocabulary Modelling
• Search for suitable non-ontological resources (highly reliable Web sites, domain-related sites, government catalogs)
• Are there suitable resources?
• Yes: build the vocabulary by transforming available resources
• No: build the vocabulary from scratch
36. GeoLinkedData
Vocabulary Modelling
[Figure: vocabularies reused in the GeoLinkedData ontology network]
• scv:Dimension, scv:Item, scv:Dataset
• WGS84 Geo Positioning: an RDF vocabulary
• Hydrographical phenomena (rivers, lakes, etc.)
• Vocabulary for instants, intervals, durations, etc.
• Ontology for OGC Geography Markup Language
• Names and international code systems for territories and groups
• Classes: 33
• Object Properties: 44
• Data Properties: 318
http://neon-toolkit.org/
40. R2O & ODEMapster
Generation of the RDF Data
• R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.
• The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document.
www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
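The idea of generating RDF instances from relational rows through a declarative mapping can be sketched in Python. This is not R2O syntax and not how ODEMapster is implemented: the `MAPPING` dictionary, the `lago` table, the sample row and the resource URI template are all assumptions made for illustration, and an in-memory SQLite database stands in for the real relational source.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical mapping description: one table mapped to one ontology class.
MAPPING = {
    "table": "lago",
    "class": "http://geo.linkeddata.es/ontology/Lago",
    "uri_template": "http://geo.linkeddata.es/resource/Lago/{}",  # assumed pattern
    "key_column": "nombre",
    "columns": {"nombre": RDFS_LABEL},  # column name -> property URI
}


def rows_to_ntriples(conn, mapping):
    """Generate N-Triples lines for every row of the mapped table."""
    cols = list(mapping["columns"])
    # Table and column names come from the trusted mapping, not from user input.
    query = f'SELECT {", ".join(cols)} FROM {mapping["table"]}'
    triples = []
    for row in conn.execute(query):
        record = dict(zip(cols, row))
        key = quote(str(record[mapping["key_column"]]))
        subject = mapping["uri_template"].format(key)
        triples.append(f'<{subject}> <{RDF_TYPE}> <{mapping["class"]}> .')
        for col, prop in mapping["columns"].items():
            # Escape backslashes and quotes so the literal stays valid N-Triples.
            value = str(record[col]).replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'<{subject}> <{prop}> "{value}" .')
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lago (nombre TEXT)")
    conn.execute("INSERT INTO lago VALUES ('Lago de Sanabria')")  # sample row
    for line in rows_to_ntriples(conn, MAPPING):
        print(line)
```

Keeping the mapping as data rather than code mirrors the declarative approach: changing the schema-to-ontology correspondence only requires editing the mapping description.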
43. geometry2rdf
Generation of the RDF Data
• Tool for generating RDF from geometrical information
• The geometry could be available in GML or WKT
• The RDF generated follows our Geometry Model
http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf
44. geometry2rdf
Generation of the RDF Data
Oracle SDO_UTIL package
SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry))
AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta='Arroyo'
46. Geometry Model
Generation of the RDF Data
geoes: http://geo.linkeddata.es/
geo: http://www.w3.org/2003/01/geo/wgs84_pos#
[Figure: class diagram of the Geometry Model]
• geoes:ontology/Polígono, geoes:ontology/Curva and geo:Point are related to geoes:ontology/Geometría through rdfs:subClassOf
• geo:Point: geo:lat, geo:long
• geoes:ontology/Curva: formadoPor a collection of 2 or more geo:Points
• geoes:ontology/Polígono: formadoPor a collection of 3 or more geo:Points
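A minimal Python sketch of how a geometry could be expressed with this model. It is not the geometry2rdf implementation: it only handles a 2D WKT LINESTRING, it assumes the coordinates are already WGS84 longitude/latitude pairs, and the full URI chosen for the formadoPor property is an assumption, as are the function name and the blank-node labels.

```python
import re

GEOES = "http://geo.linkeddata.es/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
FORMADO_POR = GEOES + "ontology/formadoPor"  # assumed property URI


def linestring_to_ntriples(curve_uri, wkt):
    """Convert a 2D WKT LINESTRING into N-Triples for a curve and its points."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("only LINESTRING geometries are handled in this sketch")
    points = [
        tuple(float(v) for v in pair.split())
        for pair in match.group(1).split(",")
    ]
    if len(points) < 2 or any(len(p) != 2 for p in points):
        raise ValueError("a curve needs two or more (x, y) points")
    triples = [f"<{curve_uri}> <{RDF_TYPE}> <{GEOES}ontology/Curva> ."]
    # WKT lists x before y, read here as longitude before latitude.
    for i, (lon, lat) in enumerate(points, start=1):
        # Blank-node labels are only unique within one call of this function.
        node = f"_:p{i}"
        triples += [
            f"<{curve_uri}> <{FORMADO_POR}> {node} .",
            f"{node} <{RDF_TYPE}> <{GEO}Point> .",
            f'{node} <{GEO}lat> "{lat}"^^<{XSD_DOUBLE}> .',
            f'{node} <{GEO}long> "{lon}"^^<{XSD_DOUBLE}> .',
        ]
    return triples


if __name__ == "__main__":
    # Hypothetical resource URI and made-up coordinates.
    uri = "http://geo.linkeddata.es/resource/Curva/example"
    for line in linestring_to_ntriples(uri, "LINESTRING (-3.7 40.4, -3.6 40.5)"):
        print(line)
```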
48. URI Generation
Generation of the RDF Data
• URIs are extremely relevant in this process, since they are the key for the alignment of heterogeneous resources that come from different data sources.
• Cool URIs 1
• UK Cabinet Office 2
• Examples:
http://geo.linkeddata.es/ontology/{class/property}
http://geo.linkeddata.es/ontology/Lago
http://geo.linkeddata.es/resource/dataset/type/{resourcename}
http://geo.linkeddata.es/resource/Provincia/Madrid
1 http://www.w3.org/TR/cooluris/
2 http://www.cabinetoffice.gov.uk/media/301253/puiblic sector uri.pdf
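The URI patterns above can be sketched as two small Python helpers. The function names are hypothetical, and percent-encoding every path segment with `urllib.parse.quote` is an assumption about how special characters would be handled, not a documented rule of the project.

```python
from urllib.parse import quote

BASE = "http://geo.linkeddata.es/"


def ontology_uri(term):
    """Build a URI for a class or property of the ontology."""
    return BASE + "ontology/" + quote(term, safe="")


def resource_uri(resource_type, name):
    """Build a URI for a resource of a given type."""
    return (
        BASE + "resource/"
        + quote(resource_type, safe="") + "/"
        + quote(name, safe="")
    )


if __name__ == "__main__":
    print(ontology_uri("Lago"))
    print(resource_uri("Provincia", "Madrid"))
    print(resource_uri("Provincia", "Málaga"))
```

Generating every URI through the same helpers keeps the naming scheme consistent across data sources, which is what makes later alignment possible.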
49. Provenance Information
Generation of the RDF Data
• It is relevant
• to manage the provenance information of the resources
• to establish the license of the information
• Example
Pubby: http://www4.wiwiss.fu-berlin.de/pubby/
51. Publication of the RDF data
[Figure: publication architecture exposing SPARQL, Linked Data and HTML interfaces]
• map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/
• Pubby 0.3, including provenance support http://www4.wiwiss.fu-berlin.de/pubby/
• Virtuoso 6.1.0
53. Data Cleansing
• To find possible errors, identified by Hogan et al.
• HTTP-level issues, such as accessibility and dereferenceability, e.g., HTTP URIs return 40x/50x errors
• reasoning issues, such as namespace without vocabulary, e.g., rss:item term invented
• malformed/incompatible datatypes, e.g., “true” as xsd:int
• To fix the identified errors
• Example, encoding URIs
• Special characters á, é, ñ
• http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga
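One of the checks listed above, malformed datatypes, can be sketched in Python. This is a simplified illustration rather than the procedure used by Hogan et al.: it covers only xsd:int and xsd:boolean, ignores XSD whitespace normalisation, and the function names are hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def is_valid_literal(lexical, datatype):
    """Check whether a lexical form is valid for a supported XSD datatype."""
    if datatype == XSD + "int":
        if re.fullmatch(r"[+-]?[0-9]+", lexical) is None:
            return False
        # xsd:int is a 32-bit signed integer.
        return -2**31 <= int(lexical) <= 2**31 - 1
    if datatype == XSD + "boolean":
        return lexical in {"true", "false", "1", "0"}
    raise ValueError(f"datatype not covered by this sketch: {datatype}")


def find_malformed(literals):
    """Return the (lexical, datatype) pairs whose value does not fit the type."""
    return [(lex, dt) for lex, dt in literals if not is_valid_literal(lex, dt)]


if __name__ == "__main__":
    sample = [
        ("true", XSD + "int"),      # the error mentioned on the slide
        ("42", XSD + "int"),
        ("true", XSD + "boolean"),
    ]
    print(find_malformed(sample))
```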
55. Linking the RDF Data
• Identify suitable data sets as linking targets
http://ckan.net
• Discover relationships between data items
• Silk Framework http://www4.wiwiss.fu-berlin.de/bizer/silk/
• LIMES http://aksw.org/Projects/limes
• Validate the relationships discovered
• sameAs Validator http://oegdev.dia.fi.upm.es:8080/sameAs/
56. GeoLinkedData
Linking the RDF Data
[Figure: the Madrid resource linked across three data sets]
• GeoLinkedData: http://geo.linkeddata.es/.../Madrid
• GeoNames: http://sws.geonames.org/6355233/
• DBpedia: http://dbpedia.org/resource/Madrid
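The discovery step can be illustrated with a deliberately naive Python sketch that proposes owl:sameAs candidates when two resources have the same normalised label. It does not reproduce Silk or LIMES, which support much richer comparison rules; the function names and the sample labels are assumptions, and every candidate would still need validation.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise(label):
    """Lower-case a label and collapse whitespace before comparing."""
    return " ".join(label.casefold().split())


def candidate_links(source, target):
    """Propose sameAs triples between resources with equal normalised labels.

    `source` and `target` map resource URIs to their labels.
    """
    index = {}
    for uri, label in target.items():
        index.setdefault(normalise(label), []).append(uri)
    links = []
    for uri, label in source.items():
        for match in index.get(normalise(label), []):
            links.append(f"<{uri}> <{OWL_SAME_AS}> <{match}> .")
    return links


if __name__ == "__main__":
    # Made-up labels for illustration only.
    source = {"http://geo.linkeddata.es/resource/Provincia/Madrid": "Madrid"}
    target = {
        "http://dbpedia.org/resource/Madrid": " madrid",
        "http://dbpedia.org/resource/Barcelona": "Barcelona",
    }
    print("\n".join(candidate_links(source, target)))
```

Label equality alone produces false positives (for example, a city and a province sharing a name), which is why the validation step remains necessary.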
59. Register the dataset into CKAN Registry
Enable Effective Discovery
• Add the dataset to CKAN, the open registry of data and content packages
• Minimum information
• Name, unique ID for your data set on CKAN
• Title, full name of your data set
• URL, link to the data set home page
http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
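The minimum information can be checked before registration with a few lines of Python. This sketch does not call any CKAN API: the dictionary keys follow the three fields listed above, while the sample values and the function name are assumptions.

```python
REQUIRED_FIELDS = ("name", "title", "url")


def missing_fields(metadata):
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not str(metadata.get(f, "")).strip()]


if __name__ == "__main__":
    # Hypothetical values for illustration.
    dataset = {
        "name": "geolinkeddata",
        "title": "GeoLinkedData",
        "url": "http://geo.linkeddata.es",
    }
    print(missing_fields(dataset))
    print(missing_fields({"name": "geolinkeddata"}))
```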
60. Sitemap protocol
Enable Effective Discovery
• Used by web crawlers
• Efficiently find all your content & discover what has been updated
http://sitemaps.org/
A sitemap file contains information regarding one or more URLs on your Web site. The information that is stored there helps search engines better spider your website.
62. sitemap4rdf
Enable Effective Discovery
• Simple command line tool
• Sends a SPARQL query to list all URIs
• Generates sitemap
• Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the Sitemap
sitemap4rdf http://yoursite/sparql http://yoursite/resource/
Example:
sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/
http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
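What such a tool produces can be sketched in Python with the standard library. This is not the sitemap4rdf implementation: the URIs are passed in as a plain list instead of being fetched from a SPARQL endpoint, and the function name is hypothetical. Only the sitemap namespace and the urlset/url/loc element names come from the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(uris, prefix):
    """Build a sitemap XML string for the URIs that start with `prefix`."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Deduplicate and sort so that the output is stable between runs.
    for uri in sorted(set(uris)):
        if uri.startswith(prefix):
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = uri
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    uris = [
        "http://geo.linkeddata.es/resource/Provincia/Madrid",
        "http://geo.linkeddata.es/ontology/Lago",
        "http://dbpedia.org/resource/Madrid",  # filtered out by the prefix
    ]
    print(build_sitemap(uris, "http://geo.linkeddata.es/"))
```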