2. Reading & Reflections
Bizer et al., Linked Data – The Story So Far
• What is Linked Data?
– Is it the same as the Web of Data?
• What excited you most about Linked Data
while reading this article, or what did you
find most interesting?
• Is Linked Data happening in real life? Have
you seen this anywhere?
3. Outline
• What is Linked Data?
• Why Linked Data?
• How to publish as part of Linked Data
– Linked Data Principles
– Finding existing sources
– Possible software architectures
– Query Language: SPARQL
7. The web of documents
• Analogy
– Global file system
• Designed for
– Human consumption
• Primary objects
– documents
• Links between
– documents (or sub-parts of)
• Semantics
– implicit
8. The web of documents: Issues
• A Web of documents, but primarily about data
– But the connection between documents and data is implicit
• Integration & Querying
– Show me all the news stories by US Presidents
coming from Chicago?
9. Semantic Web
• We need to help machines understand the Web, so that
machines can help us understand things.
• If machines have access to data about things (i.e.
knowledge), they can do a better job of processing
documents.
10. Linked Data
Linking Things
[Figure: Things connected to other Things by relationship links]
Source: An Introduction to Linked Data, Tom Heath, Talis
11. Linked Data…
• … is about creating a global database of linked
things
• … refers to a set of best practices for publishing
and interlinking data on the Web
• … is a method of publishing data [on the Web] so
that it can be interlinked and become more useful
12. The Web of Linked Data
• Analogy
– a global database
• Designed for
– machines first, humans later
• Primary objects
– things (or descriptions of things)
• Links between
– things
• Semantics
– explicit
14. Linked Data Technologies : URIs
• Like URLs, but not just for Web pages
– For things (cars, people, places, organisations, coursework, etc.)
• “A Uniform Resource Identifier (URI) provides a
simple and extensible means for identifying a
resource.” -- RFC 3986
• Many different schemes – http://, ftp://, mailto:
• Examples:
http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf
http://dbpedia.org/resource/University_of_Leeds
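The point that URIs cover many schemes and name things as well as pages can be illustrated with a small Python sketch. It only splits identifiers into their parts using the standard library; the `mailto:` address is a hypothetical placeholder and is not taken from the slides.

```python
from urllib.parse import urlsplit

uris = [
    "http://dbpedia.org/resource/University_of_Leeds",
    "http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf",
    "mailto:someone@example.org",  # hypothetical address, for illustration only
]

for uri in uris:
    parts = urlsplit(uri)
    # The scheme says how the identifier is interpreted;
    # host and path name the resource within that scheme.
    print(parts.scheme, parts.netloc or "-", parts.path)

schemes = [urlsplit(u).scheme for u in uris]
```

Only the scheme differs in kind here: the two `http` identifiers can also be looked up on the Web, which is what the Linked Data principles later rely on.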
15. HTTP
• Data access mechanism between clients (e.g. web
browsers) and servers
• HTTP messages consist of requests from
clients to servers and responses from servers
to clients
• HTTP request methods:
GET, POST, etc.
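As a rough illustration of the request side, the sketch below builds (but does not send) an HTTP GET request in Python. The `Accept` value `text/turtle` is an assumption about which representation a server might offer; it is not stated on the slide.

```python
from urllib.request import Request

# Build a GET request for a DBpedia URI without sending it.
# The Accept header asks for an RDF representation (assumed media type).
request = Request(
    "http://dbpedia.org/resource/University_of_Leeds",
    headers={"Accept": "text/turtle"},
    method="GET",
)

print(request.get_method(), request.full_url)
print(request.header_items())
```

Passing the object to `urllib.request.urlopen` would perform the actual exchange and return the server's response.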
16. RDF
• Data model for describing things and their
interrelations
• Based on triples
– Subject, predicate, object
– <The sky> <has the colour> <blue>
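A triple can be pictured as a three-field record. The Python sketch below is only an analogy for the data model, not an RDF library; the second triple is made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


sky = Triple("The sky", "has the colour", "blue")
# A graph is simply a set of triples; the second one is invented here.
graph = {sky, Triple("The sky", "is above", "the ground")}

colours = [t.obj for t in graph
           if t.subject == "The sky" and t.predicate == "has the colour"]
print(colours)
```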
17. RDF
From my profile in RDF:
dt:dhaval rdf:type foaf:Person .
dt:dhaval foaf:name "Dhaval Thakker" .
dt:dhaval foaf:based_near dbpedia:Leeds .
Prefixes
dt: <http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
foaf: <http://xmlns.com/foaf/0.1/>
dbpedia: <http://dbpedia.org/resource/>
18. Data Merging with RDF
From my profile in RDF:
dt:dhaval rdf:type foaf:Person .
dt:dhaval foaf:name "Dhaval Thakker" .
dt:dhaval foaf:based_near dbpedia:Leeds .
From DBpedia:
dbpedia:Leeds dbp-prop:population "751,500" .
dbpedia:Leeds dbp-prop:isPartOf dbpedia:West_Yorkshire .
Prefixes
dt: <http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
foaf: <http://xmlns.com/foaf/0.1/>
dbpedia: <http://dbpedia.org/resource/>
dbp-prop: <http://dbpedia.org/ontology/>
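Merging works because both sources use the same URI for Leeds. A minimal Python sketch, treating each graph as a set of triples written with the slide's prefixes (the property spelling `dbp-prop:isPartOf` is assumed from the diagram label "is part of"):

```python
profile = {
    ("dt:dhaval", "rdf:type", "foaf:Person"),
    ("dt:dhaval", "foaf:name", "Dhaval Thakker"),
    ("dt:dhaval", "foaf:based_near", "dbpedia:Leeds"),
}
dbpedia = {
    ("dbpedia:Leeds", "dbp-prop:population", "751,500"),
    ("dbpedia:Leeds", "dbp-prop:isPartOf", "dbpedia:West_Yorkshire"),
}

# Merging two RDF graphs is a set union of their triples.
merged = profile | dbpedia


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# Follow the link from the person to the place, then read the population.
place = objects(merged, "dt:dhaval", "foaf:based_near")[0]
population = objects(merged, place, "dbp-prop:population")[0]
print(place, population)
```

No schema alignment step is needed in this sketch: the shared identifier `dbpedia:Leeds` is the join point.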
20. Linked Data Principles
• Use URIs as names for things
– anything, not just documents
• Use HTTP URIs
– globally unique names, distributed ownership
– allows people to look up those names
• Provide useful information in RDF
– when someone looks up a URI
• Include RDF links to other URIs
– to enable discovery of related information
Tim Berners-Lee 2007
http://www.w3.org/DesignIssues/LinkedData.html
23. Provide useful information in
RDF
From my profile in RDF (dt:me expands to
http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me):
dt:me rdf:type foaf:Person .
dt:me foaf:name "Dhaval Thakker" .
dt:me foaf:based_near dbpedia:Leeds .
Prefixes
dt: <http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
foaf: <http://xmlns.com/foaf/0.1/>
dbpedia: <http://dbpedia.org/resource/>
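With a URI of this shape, the part after `#` names the thing and the part before it names the document that describes it. The Python sketch below shows that split; it performs no network access.

```python
from urllib.parse import urldefrag

thing_uri = "http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me"

# An HTTP client drops the fragment before sending the request,
# so looking up the thing retrieves the RDF document about it.
document_url, fragment = urldefrag(thing_uri)
print(document_url)
print(fragment)
```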
24. RDF is a Data Model, Not a
Serialisation Format
• RDF Serialisation Formats: RDF/XML, Turtle, N-Triples
– RDF/XML
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Dhavalkumar Thakker</foaf:name>
    <foaf:title>Dr</foaf:title>
    <foaf:based_near rdf:resource="http://dbpedia.org/resource/Leeds"/>
  </foaf:Person>
</rdf:RDF>
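To show that the XML is only one way of writing down the same triples, here is a small Python sketch that reads a profile of this shape back into (subject, predicate, object) tuples. It is not a general RDF/XML parser: it handles only typed nodes with `rdf:ID` and simple properties, and the base document URL is an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Assumed URL of the document; rdf:ID values are resolved against it.
BASE = "http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf"

rdf_xml = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <foaf:Person rdf:ID="me">
    <foaf:name>Dhavalkumar Thakker</foaf:name>
    <foaf:based_near rdf:resource="http://dbpedia.org/resource/Leeds"/>
  </foaf:Person>
</rdf:RDF>"""


def full_name(tag):
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    return tag.replace("{", "").replace("}", "")


triples = []
for node in ET.fromstring(rdf_xml):
    subject = BASE + "#" + node.attrib[f"{{{RDF}}}ID"]
    # The element name of a typed node gives its rdf:type.
    triples.append((subject, RDF + "type", full_name(node.tag)))
    for prop in node:
        value = prop.attrib.get(f"{{{RDF}}}resource", prop.text)
        triples.append((subject, full_name(prop.tag), value))

for triple in triples:
    print(triple)
```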
26. RDF is a Data Model, Not a
Serialisation Format
• RDF Serialisation Formats: RDF/XML, Turtle, N-Triples
– N-Triples
<http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me> <http://xmlns.com/foaf/0.1/name> "Dhavalkumar Thakker" .
<http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
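N-Triples is simple enough to generate by hand: one triple per line, full URIs in angle brackets, literals in double quotes, and a closing full stop. A minimal Python sketch follows; deciding "URI or literal" by looking at the `http://` prefix is a simplifying assumption of this example, not part of the format.

```python
def term(value):
    """Render a URI as <...> and anything else as a plain string literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


me = "http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me"
lines = to_ntriples([
    (me, "http://xmlns.com/foaf/0.1/name", "Dhavalkumar Thakker"),
    (me, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://xmlns.com/foaf/0.1/Person"),
])
print(lines)
```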
28. Including Links to other Things:
Relationship Links
• Relationship Links point at related things
in other data sources, for instance, other
people, places or genes.
• For example, relationship links enable
people to point to background information
about the place they live, or to bibliographic
data about the publications they have
written.
29. Including Links to other Things:
Relationship Links
From my profile in RDF:
dt:dhaval rdf:type foaf:Person .
dt:dhaval foaf:name "Dhaval Thakker" .
dt:dhaval foaf:based_near dbpedia:Leeds .
From DBpedia:
dbpedia:Leeds dbp-prop:population "751,500" .
dbpedia:Leeds dbp-prop:isPartOf dbpedia:West_Yorkshire .
Prefixes
dt: <http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
foaf: <http://xmlns.com/foaf/0.1/>
dbpedia: <http://dbpedia.org/resource/>
dbp-prop: <http://dbpedia.org/ontology/>
30. Including Links to other Things:
Identity Links
• Different URIs may refer to the same object
– <URI1> in one dataset is the same as <URI2> defined somewhere else
<http://dbpedia.org/resource/Kirkgate_Markets> owl:sameAs
<http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000c5f680> .
• Such a need exists due to:
– Different opinions.
– Traceability.
– No central points of failure.
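A consumer that wants to treat such aliases as one object can group URIs connected by owl:sameAs. The Python sketch below uses a small union-find structure for this; choosing the alphabetically smallest URI as the representative is an arbitrary choice made for the example.

```python
def canonical_ids(same_as_pairs):
    """Map every URI to one representative of its owl:sameAs group."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the alphabetically smaller URI as the group's root.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {uri: find(uri) for uri in parent}


pairs = [
    ("http://dbpedia.org/resource/Kirkgate_Markets",
     "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000c5f680"),
]
ids = canonical_ids(pairs)
print(ids)
```

Because owl:sameAs is symmetric and transitive, chains such as A = B and B = C end up in the same group without extra work.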
31. Including Links to other Things:
Vocabulary Links
• Reusing existing vocabularies to further specify yours
<http://mydomain.co.uk/myvocab/enterprise#SmallMediumEnterprise>
rdfs:subClassOf
<http://dbpedia.org/ontology/Company> ;
rdfs:subClassOf
<http://umbel.org/umbel/sc/Business> ;
rdfs:subClassOf
<http://rdf.freebase.com/ns/m/0qb7t> .
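One practical payoff of vocabulary links is that subclass statements chain together. The Python sketch below walks rdfs:subClassOf links transitively; the short prefixes (`my:`, `dbo:`, `umbel:`) and the statement that `dbo:Company` is a subclass of `dbo:Organisation` are assumptions added for illustration.

```python
SUBCLASS_OF = {
    "my:SmallMediumEnterprise": ["dbo:Company", "umbel:Business"],
    "dbo:Company": ["dbo:Organisation"],  # assumed here for illustration
}


def superclasses(cls, hierarchy):
    """Return all direct and inherited superclasses of cls."""
    found, stack = set(), [cls]
    while stack:
        for parent in hierarchy.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


supers = superclasses("my:SmallMediumEnterprise", SUBCLASS_OF)
print(sorted(supers))
```

A query for all organisations could then also return small and medium enterprises, even though the local vocabulary never mentions organisations directly.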
32. Linked Data Principles:
Summary
• Use URIs as names for things
– anything, not just documents
• Use HTTP URIs
– globally unique names, distributed ownership
– allows people to look up those names
• Provide useful information in RDF
– when someone looks up a URI
– RDF serialisation formats: RDF/XML, N-Triples & Turtle
• Include RDF links to other URIs
– to enable discovery of related information
– Include links: Relationship, Vocabulary & Identity Links
33. Finding Existing Datasets or
Vocabularies
• All of the scenarios for including links to
other things assume some knowledge
of existing vocabularies/datasets
• Where to find such datasets?
• How to find such datasets?
– Two steps:
• Find datasets/vocabularies that contain certain
Things or Concepts
• Once found, inspect their coverage and
suitability
34. Where to Find: Web of Data
• A significant number of individuals and
organisations have adopted Linked Data as
a way to publish their data
• The result is a global data space we call
the Web of Data
• The Web of Data forms a giant global
graph consisting of billions of RDF triples
from numerous sources covering all sorts of
topics
41. Step 1: Finding existing datasets and
vocabularies: search engines -> Falcons
Available from: http://ws.nju.edu.cn/falcons/conceptsearch/index.jsp
42. Finding existing datasets and
vocabularies: search engines -> Watson
Available from: http://kmi-web05.open.ac.uk/WatsonWUI/
43. Finding existing datasets and
vocabularies: search engines -> Swoogle
Available from: http://swoogle.umbc.edu/
44. Step 1: Finding existing datasets and
vocabularies: search engines -> SWSE
Available from: http://swse.deri.org/
45. Step 2: Once found, how to inspect
further for coverage, suitability
• Linked Data sources usually provide a
SPARQL endpoint for their dataset(s)
• A SPARQL endpoint is a service in front of the
dataset(s) that receives queries and returns
results
• If you have used MySQL, you might be
familiar with phpMyAdmin
– SPARQL endpoints are similar in nature and
functionality
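Under the SPARQL protocol, a query can be sent to an endpoint as the `query` parameter of an HTTP GET request. The Python sketch below only builds such a URL and sends nothing; the endpoint address `http://dbpedia.org/sparql` is an assumption, since the slides do not give one.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint address

query = "SELECT * WHERE { <http://dbpedia.org/ontology/Person> ?p ?o } LIMIT 5"

# URL-encode the query text and attach it as the "query" parameter.
request_url = ENDPOINT + "?" + urlencode({"query": query})
print(request_url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(request_url).query)["query"][0]
```

Fetching `request_url` with an HTTP client would return the result set in whatever format the endpoint offers.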
49. SPARQL
• Query language for RDF
– Based on the RDF data model
• Possible to write complex joins across disparate
datasets
• Implemented by all major RDF databases
See more: http://www.w3.org/TR/rdf-sparql-query/
51. SELECT query: Find everything about the
concept of "Person" in DBpedia
# prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
# result clause
SELECT *
# dataset definition
FROM <http://dbpedia.org>
# query pattern
WHERE {
  dbp-ont:Person ?p ?o .
}
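What the query pattern does can be mimicked in a few lines of Python: terms starting with `?` are variables that bind to whatever appears in that position, and all other terms must match exactly. This is a toy matcher over made-up triples, not a SPARQL engine and not real DBpedia content.

```python
def match(pattern, triple):
    """Return variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            # A variable binds on first use and must agree afterwards.
            if bindings.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return bindings


graph = [  # toy data, not actual DBpedia content
    ("dbp-ont:Person", "rdf:type", "owl:Class"),
    ("dbp-ont:Person", "rdfs:label", "person"),
    ("dbp-ont:Astronaut", "rdfs:subClassOf", "dbp-ont:Person"),
]

pattern = ("dbp-ont:Person", "?p", "?o")
rows = [b for t in graph if (b := match(pattern, t)) is not None]
for row in rows:
    print(row)
```

Each dictionary in `rows` corresponds to one row of a `SELECT *` result table.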
53. SELECT query: Find superclasses of the
concept of "Person" in DBpedia
# prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# result clause
SELECT ?o
# dataset definition
FROM <http://dbpedia.org>
# query pattern
WHERE {
  dbp-ont:Person rdfs:subClassOf ?o .
}
54. SELECT query: Find all persons in
DBpedia
# prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# result clause
SELECT ?s
# dataset definition
FROM <http://dbpedia.org>
# query pattern
WHERE {
  ?s rdf:type dbp-ont:Person .
}
55. SELECT query: Find specific types of
persons in DBpedia
# prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# result clause
SELECT ?s
# dataset definition
FROM <http://dbpedia.org>
# query pattern: someone who is a Person & an Astronaut
WHERE {
  ?s rdf:type dbp-ont:Person .
  ?s rdf:type dbp-ont:Astronaut .
}
56. SELECT query: Find specific types of
persons in DBpedia
# prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# result clause
SELECT ?s
# dataset definition
FROM <http://dbpedia.org>
# query pattern: someone who is a Person & an Astronaut & Retired
WHERE {
  ?s rdf:type dbp-ont:Person .
  ?s rdf:type dbp-ont:Astronaut .
  ?s dbp-ont:status "Retired"@en .
}
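Several triple patterns that share a variable behave like a join: each extra pattern narrows the set of solutions. The Python sketch below shows the idea on hypothetical resources `ex:a`, `ex:b` and `ex:c`; it ignores language tags and is not how a real SPARQL engine is implemented.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that the pattern equals the triple, or return None."""
    result = dict(bindings)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            if result.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return result


def solve(patterns, graph):
    """Join all patterns: each one narrows the solutions found so far."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


graph = [  # hypothetical resources, for illustration only
    ("ex:a", "rdf:type", "dbp-ont:Person"),
    ("ex:a", "rdf:type", "dbp-ont:Astronaut"),
    ("ex:a", "dbp-ont:status", "Retired"),
    ("ex:b", "rdf:type", "dbp-ont:Person"),
    ("ex:b", "rdf:type", "dbp-ont:Astronaut"),
    ("ex:c", "rdf:type", "dbp-ont:Person"),
]
patterns = [
    ("?s", "rdf:type", "dbp-ont:Person"),
    ("?s", "rdf:type", "dbp-ont:Astronaut"),
    ("?s", "dbp-ont:status", "Retired"),
]
retired = [solution["?s"] for solution in solve(patterns, graph)]
print(retired)
```

Dropping the last pattern would also return `ex:b`, mirroring the difference between the two-pattern and three-pattern queries.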
57. SELECT query: Find 10 of this, LIMIT
# prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# result clause
SELECT ?s
# dataset definition
FROM <http://dbpedia.org>
# query pattern: someone who is a Person & an Astronaut & Retired
WHERE {
  ?s rdf:type dbp-ont:Person .
  ?s rdf:type dbp-ont:Astronaut .
  ?s dbp-ont:status "Retired"@en .
}
LIMIT 10
58. SELECT query: Find 10 of this and order
it by date: ORDER BY
# prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# result clause
SELECT *
# dataset definition
FROM <http://dbpedia.org>
# query pattern: someone who is a Person & an Astronaut & Retired,
# youngest first
WHERE {
  ?s rdf:type dbp-ont:Person .
  ?s rdf:type dbp-ont:Astronaut .
  ?s dbp-ont:status "Retired"@en .
  ?s dbp-ont:birthDate ?date .
}
# DESC puts the latest birth date (the youngest person) first
ORDER BY DESC(?date)
LIMIT 10
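The effect of ORDER BY and LIMIT on a result table can be checked with ordinary sorting and slicing. In the Python sketch below the rows are hypothetical; note that ascending order on a birth date lists the oldest person first, so "youngest first" needs descending order.

```python
from datetime import date

rows = [  # hypothetical results: (resource, birth date)
    ("ex:a", date(1950, 5, 1)),
    ("ex:b", date(1962, 3, 9)),
    ("ex:c", date(1941, 11, 30)),
]

ascending = sorted(rows, key=lambda r: r[1])                 # ORDER BY ?date
descending = sorted(rows, key=lambda r: r[1], reverse=True)  # ORDER BY DESC(?date)
youngest_two = descending[:2]                                # LIMIT 2

print([r[0] for r in ascending])
print([r[0] for r in youngest_two])
```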
59. Mathematical operations &
Filtering results
• Find me all landlocked countries with a population greater
than 15 million, with the highest-population country first
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country_name ?population
WHERE {
  ?country a type:LandlockedCountries .
  ?country rdfs:label ?country_name .
  ?country prop:populationEstimate ?population .
  FILTER (?population > 15000000 &&
          langMatches(lang(?country_name), "EN"))
}
ORDER BY DESC(?population)
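FILTER keeps only the rows for which its condition holds. The Python sketch below applies the same two conditions (population above 15 million, English label) and the descending sort to placeholder rows; the names and numbers are invented and are not real statistics.

```python
rows = [  # placeholder names and numbers, not real statistics
    {"name": ("Country A", "en"), "population": 30_000_000},
    {"name": ("Pays A", "fr"), "population": 30_000_000},
    {"name": ("Country B", "en"), "population": 9_000_000},
    {"name": ("Country C", "en-GB"), "population": 16_000_000},
]


def lang_matches(tag, wanted):
    """Case-insensitive match of a language tag against a basic range."""
    tag, wanted = tag.lower(), wanted.lower()
    return tag == wanted or tag.startswith(wanted + "-")


kept = [
    r for r in rows
    if r["population"] > 15_000_000 and lang_matches(r["name"][1], "EN")
]
kept.sort(key=lambda r: r["population"], reverse=True)  # ORDER BY DESC(?population)
names = [r["name"][0] for r in kept]
print(names)
```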
60. ASK query: Is India a Landlocked country?
• Is India a landlocked country?
• ASK query (no need to specify "WHERE"):
PREFIX yago: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK
{ <http://dbpedia.org/resource/India> rdf:type
yago:LandlockedCountries . }
• Then replace India with Afghanistan and compare the answers
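An ASK query returns a single true/false answer instead of a table. For a fully specified triple this amounts to a membership test, as in the Python sketch below; the toy graph holds just one fact and stands in for the real dataset.

```python
graph = {  # toy graph holding a single fact
    ("dbpedia:Afghanistan", "rdf:type", "yago:LandlockedCountries"),
}


def ask(graph, triple):
    """ASK with a fully specified triple: True if it is in the graph."""
    return triple in graph


india = ask(graph, ("dbpedia:India", "rdf:type", "yago:LandlockedCountries"))
afghanistan = ask(graph, ("dbpedia:Afghanistan", "rdf:type", "yago:LandlockedCountries"))
print(india, afghanistan)
```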
61. Exercise: Write a SPARQL query
• Write a SPARQL query to retrieve all the
rock bands from the
Republic of Ireland.
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp-onto: <http://dbpedia.org/ontology/>
Use the following classes and properties:
dbp-onto:Band, dbp-onto:genre, dbpedia:Rock_music,
dbpedia:Republic_of_Ireland, dbp-onto:hometown
62. Exercise: Write a SPARQL query
• Write a SPARQL query to retrieve all the
rock bands from the
Republic of Ireland.
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp-onto: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?s rdf:type dbp-onto:Band .
  ?s dbp-onto:genre dbpedia:Rock_music .
  ?s dbp-onto:hometown dbpedia:Republic_of_Ireland .
}
63. Summary: Finding existing
datasets/vocabularies
• Use of search engines to find a dataset
• Use of SPARQL endpoints to inspect the
dataset further
• SPARQL queries
– SELECT query for selecting a set of results to
display
– ASK query to ask a specific question about
something
– Variations in terms of LIMIT, ORDER BY
64. Publishing Linked Data:
Software Architecture Patterns
• Follow the Linked Data principles
– They are good-practice principles, NOT norms
or rules
• The software architecture needs to support
this way of publishing
– Existing architectures may hold structured or
unstructured data
– Publishing Linked Data from scratch is
different from working with existing
applications and infrastructure already in place
67. Type of data
• Structured data
– Database tables
– XML documents
– Example table:
Name | Address | Post code | Author of
A | ---- | ------- | Book B
• Unstructured data
– Textual documents
• News stories, reports, textual descriptions – as
textual files
69. Query-able Structured Data to
Linked Data
• Example: a movie business that keeps its movie
data in a relational database
• Can be published relatively easily as Linked Data
through relational-database-to-RDF
wrappers
• A wrapper maps database schemas to RDF schemas
• Wrappers
– Virtuoso RDF Views
– Triplify
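The core idea behind such wrappers can be sketched in Python with an in-memory SQLite table: the primary key becomes the resource URI, each column becomes a property, and each cell becomes an object value. This is not how Virtuoso RDF Views or Triplify are configured; the table, the rows, and both URI prefixes are hypothetical.

```python
import sqlite3

BASE = "http://example.org/movie/"     # hypothetical URI prefix for rows
VOCAB = "http://example.org/schema/"   # hypothetical vocabulary for columns

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movie VALUES (?, ?, ?)",
    [(1, "Example Film", 2001), (2, "Another Film", 2005)],  # made-up rows
)

cursor = conn.execute("SELECT id, title, year FROM movie ORDER BY id")
columns = [c[0] for c in cursor.description]

triples = []
for row in cursor:
    subject = f"{BASE}{row[0]}"                 # primary key -> resource URI
    for column, value in zip(columns[1:], row[1:]):
        triples.append((subject, VOCAB + column, value))  # column -> property
conn.close()

for triple in triples:
    print(triple)
```

A real wrapper typically performs this mapping on demand, so the relational database stays the single source of the data.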
71. Static Structured Data to Linked
Data
• Example: a UK government department that keeps
performance data for each department in
Excel sheets
• The data must undergo a conversion process that
outputs static RDF files or loads the converted
data directly into an RDF store
• RDFizing tools
– http://www.w3.org/wiki/ConverterToRdf
– Tools to convert data from various formats to
RDF
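A one-off conversion of tabular data can be sketched in Python as reading rows and writing N-Triples lines. The example assumes the spreadsheet has been exported as CSV; the column names, the rows, and both URI prefixes are made up for illustration.

```python
import csv
import io

BASE = "http://example.org/department/"    # hypothetical URI prefix
VOCAB = "http://example.org/performance#"  # hypothetical vocabulary

# Stand-in for a CSV export of the spreadsheet (made-up rows).
sheet = io.StringIO("department,score\nfinance,82\ntransport,75\n")

lines = []
for row in csv.DictReader(sheet):
    subject = f"<{BASE}{row['department']}>"
    lines.append(f'{subject} <{VOCAB}score> "{row["score"]}" .')

ntriples = "\n".join(lines)
print(ntriples)
```

The resulting text can be saved as a static file or loaded into an RDF store.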
72. RDF store
• Also called "triple store" or "semantic repository"
• Engines similar to a DBMS: they allow
for storage, querying, and management of
structured data. Major differences:
– They use ontologies as semantic schemata. This allows
them to reason about the data automatically.
– They work with flexible and generic physical data
models (e.g. graphs). This allows them to easily
interpret and adopt new ontologies or
metadata schemata "on the fly".
• Available RDF stores: OWLIM, AllegroGraph,
Virtuoso, Sesame, Jena TDB
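The "generic data model" point can be made concrete with a toy in-memory store in Python: there is no table definition, so triples using new properties can be added at any time. This sketch has only two indexes and no reasoning, and it does not reflect how any of the listed products work internally.

```python
from collections import defaultdict


class TinyTripleStore:
    """In-memory store indexed by subject and by predicate."""

    def __init__(self):
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)

    def add(self, s, p, o):
        self.by_subject[s].add((s, p, o))
        self.by_predicate[p].add((s, p, o))

    def find(self, s=None, p=None, o=None):
        """Return triples matching the given terms; None is a wildcard."""
        if s is not None:
            candidates = self.by_subject.get(s, set())
        elif p is not None:
            candidates = self.by_predicate.get(p, set())
        else:
            candidates = set().union(*self.by_subject.values())
        return {t for t in candidates
                if (p is None or t[1] == p) and (o is None or t[2] == o)}


store = TinyTripleStore()
store.add("dbpedia:Leeds", "dbp-prop:isPartOf", "dbpedia:West_Yorkshire")
store.add("dt:dhaval", "foaf:based_near", "dbpedia:Leeds")
store.add("dt:dhaval", "foaf:name", "Dhaval Thakker")

near_leeds = store.find(p="foaf:based_near", o="dbpedia:Leeds")
print(near_leeds)
```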
74. From Text Documents to Linked
Data
• Example: a news publisher with a corpus of news
stories produced in the last month
• These documents can be passed through a
Linked Data entity extractor such as OpenCalais
(http://www.opencalais.com/) or DBpedia Spotlight
(http://dbpedia-spotlight.github.com/demo/index.html), which
annotates documents with the Linked Data URIs of
entities referenced in the documents.
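What "annotating a document with Linked Data URIs" produces can be shown with a deliberately naive Python sketch: a hand-written lookup table maps names to URIs, and every occurrence in the text is reported. Real extractors such as the ones named above do far more (disambiguation, ranking); this table, the function, and the sample sentence are assumptions for illustration only.

```python
import re

GAZETTEER = {  # hand-written lookup table standing in for a real extractor
    "Leeds": "http://dbpedia.org/resource/Leeds",
    "University of Leeds": "http://dbpedia.org/resource/University_of_Leeds",
}


def annotate(text):
    """Return (surface form, URI) pairs, preferring longer names."""
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return [(m.group(0), GAZETTEER[m.group(0)]) for m in pattern.finditer(text)]


story = "The University of Leeds is based in Leeds."  # made-up sentence
annotations = annotate(story)
for surface, uri in annotations:
    print(surface, "->", uri)
```

Trying longer names first keeps "University of Leeds" from being annotated as the city.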
75. From Text Documents to Linked
Data
• Publishing these annotations together with the
documents
– increases the discoverability of the documents
– enables applications to use the referenced Linked Data
sources as background knowledge to display
complementary information on web pages
– or to enhance information retrieval tasks, for instance,
offer faceted browsing instead of simple full-text
search.
• Applications like this will be presented in
the next lecture(s)
76. Summary
• Linked Data is a way of publishing and
interlinking structured data on the web
• Linked Data principles to follow to create
such data
• How to find existing datasets: Web of Data
• How to query existing datasets: SPARQL
• Possible software architecture patterns
77. Next Lecture
• Consuming Linked Data
– Linked Data Applications
• What datasets they use from Web of Data
• What software architecture they follow
– Benefits
• Integration – for organisations
• Browsing and interaction – for users
78. References
• Tom Heath, An Introduction to Linked
Data, Linked Data Tutorial, Austin, Texas, 2009.
• Raimond et al., A skim-read introduction to linked
data.
• Tom Heath, Christian Bizer, Linked Data:
Evolving the Web into a Global Data
Space, Synthesis Lectures on the Semantic
Web, Morgan & Claypool Publishers, 2011.
• Cambridge Semantics, SPARQL by Example.
79. TED talk from Tim Berners-Lee
on Linked Data
• http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html