17. The oldest data model
is a simple table.
(figure: a simple table, with header, row, and column labelled)
van Hooland, S. and Verborgh, R.
“Linked Data for Libraries, Archives and Museums” (Facet, 2014)
18. Tables do not cope well
with changes in data or schema.
Title                | Artist     | Born | Died
The Thrill is Gone   | B. B. King | 1925 | 2015
Riding with the King | John Hiatt | 1952 |
Riding with the King | B. B. King | 1925 |
…                    | …          | …    | …
19. Relational databases provide
a multi-dimensional table model.
(figure: the relational model, with header, row, relation, key column, attributes, and table/entity labelled)
20. Databases cope with data changes
but schema changes are harder.
Title                | Artist
The Thrill is Gone   | 1
Riding with the King | 2
Riding with the King | 1
…                    | …

ID | Name       | Born | Died
1  | B. B. King | 1925 | 2015
2  | John Hiatt | 1952 |
…  | …          | …    | …
21. There is no interoperability
with other databases.
Title                | Artist
The Thrill is Gone   | 1
Riding with the King | 2
Riding with the King | 1
…                    | …

(figure: how would this table link to Wikipedia?)
22. XML allows reuse of schemas
and identifiers.
(figure: an XML tree, with root, parent, child, and siblings labelled)
23. XML schema evolution
remains a tough nut to crack.
(figure: the four data models compared.
Tabular data: each data item is structured as a line of field values; fields are the same for all items; a header line can indicate their name.
Relational model: data are structured as tables, each of which has its own set of attributes; records in one table can relate to others by referencing their key column.
Meta-markup languages: XML documents have a hierarchical structure, which gives them a tree-like appearance.
RDF: each fact about a data item is expressed as a triple, which connects a subject to an object through a precise relationship.)
24. The RDF data model is flexible
for changes in data and schema.
(figure: the RDF model, with subject, property, and object labelled)
25. RDF involves a trade-off
between flexibility and reuse.
(figure: a spectrum from a custom ontology, giving a perfect match, to reused ontologies, giving perfect interoperability)
26. So far for change within models…
what about change between them?
(figure: the four data models, tabular data, relational model, meta-markup languages, and RDF, shown in succession)
27. There’s no ultimate model.
They co-exist. Change is inherent.
(figure: the four data models shown side by side; none replaces the others)
29. Even if your data doesn’t change,
technology does.
What happens to your data?
new software versions
new software manufacturers
30. Is your software
holding your data hostage?
Is your software the owner of your data, or are you?
Intentional or unintentional vendor lock-in?
Can you get your data out at any moment you want?
31. The Cooper-Hewitt Design Museum
had trouble getting their own data.
Data in The Museum System:
flexible, but complex relational design,
and no export button.
The website had more flexible demands:
complex manual queries to liberate data,
and a parallel CMS to drive the website.
33. The Web has been designed
with change in mind.
Individual links are allowed to break
so the entire Web does not.
—Tim Berners-Lee
34. The Web is in rapid evolution
but keeps on working.
What year is it? Then your users need…
1995 – HTML 2.0
2000 – XML
2008 – JSON
2012 – HTML 5
2015 – RDF ?
2017 – … ?
35. At least HTML seems constant,
so the human Web is safe.
http://bib.org/books/978-1-85604-964-1/
around 2005: made in HTML 4
around 2015: made in HTML 5
Markup changes, the identifier does not.
Tim Berners-Lee called these “Cool URIs”.
36. Web APIs for machines suffer
from changes on many levels.
http://api.bib.org/v2/viewBookDetails.php?
id=978-1-85604-964-1&format=json
&apikey=WSDGU56VP
How does this identifier cope with change?
How long does this identifier work unchanged?
38. Plenty of excuses exist
to change machine interfaces.
But our new server does it faster!
But our new API has different features!
But XML is obsolete now so we need JSON!
39. Even funnier are the excuses
for requiring API keys.
But we need to rate limit!
But we need to track automated access!
But we need to protect our data!
40. Once and for all:
API keys do not help with these.
Your HTML interface is still open!
JSON is a convenience, not a necessity.
Anybody can still do whatever they want
by scraping HTML pages with the same data.
Protect your data, not just one interface.
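A few lines of standard-library Python make the point concrete: any data the open HTML interface shows can be extracted without a key. The page markup and class name below are hypothetical, a minimal sketch rather than a real bib.org page:

```python
from html.parser import HTMLParser

# Hypothetical HTML page exposing the same data as the keyed API.
HTML_PAGE = """
<ul>
  <li class="book">The Thrill is Gone</li>
  <li class="book">Riding with the King</li>
</ul>
"""

class BookScraper(HTMLParser):
    """Collects the text of every <li class="book"> element."""
    def __init__(self):
        super().__init__()
        self.in_book = False
        self.books = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "book") in attrs:
            self.in_book = True

    def handle_data(self, data):
        if self.in_book and data.strip():
            self.books.append(data.strip())
            self.in_book = False

scraper = BookScraper()
scraper.feed(HTML_PAGE)
print(scraper.books)  # → ['The Thrill is Gone', 'Riding with the King']
```

No API key, no rate limit: whatever protection the keyed interface claims to add, the open HTML interface gives away.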
42. Yet other possible changes
still appear to be a concern.
Do identifiers remain constant if your server changes?
Do they remain constant if your API changes?
Do they remain constant if data models change?
47. Constants allow clients
to establish a shared meaning.
subject: http://bib.org/books/978-1-85604-964-1/
object: http://bib.org/authors/7356/
property: http://purl.org/dc/terms/creator
48. Human semantics are in concepts
and their meaning to the world.
subject: a book
object: a person
property: written by
49. Machine semantics are in symbols
and their structural interrelations.
subject: http://digybe.wpq/dgjyj-dgu7945
object: http://aole.wqq/mobd1.tihz
property: http://yudgy.jdu/DHH8DHBtkixhj
50. We need to be very careful
about our choice of symbols.
subject: http://bib.org/books/978-1-85604-964-1/
object: http://bib.org/authors/7356/
property: http://purl.org/dc/terms/creator
51. We need to be very careful
about our choice of symbols.
http://bib.org/books/978-1-85604-964-1/
Is this a book or a description of a book?
  :printDate "2014-06-11"
  :lastModified "2015-11-25"

http://bib.org/authors/7356/
Is this a person or a document?
  :birthDate "1987-02-28"
  :size "17kB"
52. Although designed for machines,
the example only works for humans.
subject: http://bib.org/books/978-1-85604-964-1/
object: http://bib.org/authors/7356/
property: http://purl.org/dc/terms/creator
53. Because, somehow, Web APIs
make machine access different.
subject: http://api.bib.org/v2/viewBookDetails.php?id=978-1-85604-964-1&format=json&apikey=WSDGU56VP
object: http://api.bib.org/v2/viewAuthorProfile.php?id=7356&format=json&apikey=WSDGU56VP
property: http://purl.org/dc/terms/creator
54. That’s why it’s a problem if
machines need different identifiers.
subject: http://api.bib.org/v2/viewBookDetails.php?id=978-1-85604-964-1&format=json&apikey=WSDGU56VP
object: http://api.bib.org/v2/viewAuthorProfile.php?id=7356&format=json&apikey=WSDGU56VP
property: http://purl.org/dc/terms/creator
55. Only this triple is a global constant.
The other is volatile and local.
subject: http://bib.org/books/978-1-85604-964-1/
object: http://bib.org/authors/7356/
property: http://purl.org/dc/terms/creator
57. Fortunately, we don’t have to
pick all the constants ourselves.
Ontologies provide identifiers of concepts
that are designed to be reused.
They are necessary to make RDF work.
They are necessary to create queries,
especially over multiple data sources.
58. Of course, we get the benefits
only if we actually reuse.
Why have our own my:writtenBy property
when dc:creator already exists?
Maybe we have a more specific meaning?
We can still relate both properties with RDF.
But if we all use derivatives of the constants,
what is the value of these constants?
59. Authors are not always in control:
external semantic drift happens.
foaf:knows was bidirectional…
spec: “some level of reciprocity”
An foaf:knows Pete ⇔ Pete foaf:knows An
…until somebody modeled Twitter followers:
Pete follows Angela Merkel ⇒ Pete foaf:knows Angela
Yet Angela doesn’t know Pete…
60. Getting close to Derrida…
but we’re not philosophers.
There are only two hard things
in Computer Science:
cache invalidation and naming things.
—Phil Karlton
62. The constants you can touch
are the constants you can trust.
No matter how much technology changes,
the books we describe remain the same.
Any mechanism of identification
should be based on domain resources,
not on inevitably changing technology.
63. The “success” story
of the Web API community.
(figure: a chart of the number of indexed Web APIs in ProgrammableWeb, growing from 186 in 2005 to 12,559 in 2015, alongside a paper excerpt noting that more than 12,000 different micro-protocols exist between clients and servers over HTTP, and that each different API requires a different client)
64. Just imagine we had
15,000 different data models.
(figure: the same ProgrammableWeb chart of indexed Web APIs, 186 in 2005 to 12,559 in 2015)
65. Find resources in your domain
and assign them an identifier.
http://bib.org/books/978-1-85604-964-1/
http://bib.org/authors/7356/
66. It’s just like building a web site.
When a user comes, serve HTML.
http://bib.org/books/978-1-85604-964-1/
(figure: a user issues GET on the URI and receives HTML)
67. It’s just like building a web site.
When a client comes, serve JSON.
http://bib.org/books/978-1-85604-964-1/
(figure: a client issues GET on the URI and receives JSON)
68. It’s just like building a web site.
When a client comes, serve RDF.
http://bib.org/books/978-1-85604-964-1/
(figure: a client issues GET on the URI and receives RDF)
69. Content negotiation has existed
in HTTP for a long time.
http://bib.org/books/978-1-85604-964-1/
(figure: a client issues GET on the resource and receives an RDF representation; the resource is distinct from its representations)
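The mechanism can be sketched in a few lines; the media types and bodies here are illustrative placeholders, and real negotiation also handles quality values and wildcards:

```python
# One resource, several representations, selected by the Accept header.
REPRESENTATIONS = {
    "text/html": "<h1>Book 978-1-85604-964-1</h1>",
    "application/json": '{"isbn": "978-1-85604-964-1"}',
    "text/turtle": '<http://bib.org/books/978-1-85604-964-1/> '
                   '<http://purl.org/dc/terms/title> "A book" .',
}

def negotiate(accept_header, default="text/html"):
    """Return (media type, body) for the first supported type listed."""
    for candidate in accept_header.split(","):
        media_type = candidate.split(";")[0].strip()  # drop q-values
        if media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
    return default, REPRESENTATIONS[default]

print(negotiate("text/turtle,application/json;q=0.9")[0])  # → text/turtle
```

The URI stays constant; only the representation served behind it varies.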
70. This allows constant URIs
even with future changes.
http://bib.org/books/978-1-85604-964-1/
(figure: a client issues GET on the same URI and receives a future format, “RDF 2.0”)
71. It enables different users and
machines to talk about things.
http://bib.org/books/978-1-85604-964-1/
(figure: users and clients refer to the same URI)
72. The best API is no API.
Your website is already an API.
Developers like to build complicated APIs.
API keys are especially cool to build.
Every feature and change comes with a high cost.
If you ask for an API, you’ll get one.
Ask for new representations
of your resources instead.
76. The Semantic Web promised
data on the Web.
85,567,007,302 triples from 3,426 datasets (LODStats)
38,606,408,765 triples from 657,896 entries (LOD Laundromat)
77. How much of this data
can we readily access?
data dumps
Linked Data documents
SPARQL endpoints
78. A data dump means downloading
everything and querying locally.
When was the last time
you downloaded the full Wikipedia
just because you had one question?
80. Dumps are not Web querying.
It’s kind of like giving up.
Semantic Web or Semantic Basement?
What advantage do we have
compared to Big Data?
Still the RDF data model…
but the major difference is the Web.
82. Linked Data documents
allow you to traverse a dataset.
That’s similar to what we also do:
consume information on Wikipedia
by following links.
83. Much Linked Data is available
using the well-known principles.
Servers publish a light-weight interface.
Clients follow their nose
to retrieve information.
84. Linked Data documents allow
query evaluation on the Web.
# Other books by the same author
SELECT DISTINCT ?book WHERE {
books:85604 dc:creator ?author.
?book dc:creator ?author.
}
85. Some queries are hard
or impossible to evaluate.
# Books about Hamburg
SELECT DISTINCT ?book ?author WHERE {
?book dc:subject dbpedia:Hamburg.
?book dc:creator ?author.
}
87. SPARQL endpoints allow you
to ask any question you want.
When was the last time
you expected Wikipedia to answer
specific questions automatically for you?
88. A public SPARQL endpoint
happily answers this query.
# Other books by the same author
SELECT DISTINCT ?book WHERE {
books:85604 dc:creator ?author.
?book dc:creator ?author.
}
89. A public SPARQL endpoint also
happily answers this query.
# Books about Hamburg
SELECT DISTINCT ?book ?author WHERE {
?book dc:subject dbpedia:Hamburg.
?book dc:creator ?author.
}
91. There’s a price to pay for being
the most expressive HTTP interface.
The majority of public SPARQL endpoints
have less than 95% uptime.
That means they are unreachable
for more than 1.5 days each month.
This means we cannot rely on them
to build Linked Data applications.
Buil-Aranda – Hogan – Umbrich – Vandenbussche
SPARQL Web-Querying Infrastructure: Ready for Action?
93. The main promise of Linked Data
is integration, preserving semantics.
(figure: the RDF model, with subject, property, and object labelled)
94. Integration is the promise.
But does it work on the Web?
data dumps
Linked Data documents
SPARQL endpoints
95. With data dumps, we just
build a bigger basement.
How far do we go?
How do we keep data up to date?
96. With Linked Data documents,
we keep on following our nose.
There are no dataset boundaries.
Some queries will remain hard.
97. With public SPARQL endpoints,
problems become worse.
1 endpoint has 95% availability.
1.5 days down each month
2 endpoints have 90% availability.
3 days down each month
3 endpoints have 85% availability.
4.5 days down each month
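These figures follow from multiplying the per-endpoint availabilities: assuming endpoints fail independently, which is a simplification, n endpoints are jointly available 0.95^n of the time (the slide rounds the results):

```python
# Joint availability of n endpoints, each up 95% of the time,
# assuming independent failures (a simplifying assumption).
def combined_availability(per_endpoint: float, n: int) -> float:
    return per_endpoint ** n

def downtime_days(availability: float, days_per_month: int = 30) -> float:
    """Expected days per 30-day month during which a query fails."""
    return days_per_month * (1 - availability)

for n in (1, 2, 3):
    a = combined_availability(0.95, n)
    print(f"{n} endpoint(s): {a:.1%} available, "
          f"{downtime_days(a):.1f} days down per month")
```

A federated query over three such endpoints already fails more than four days a month, before any individual query even times out.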
99. Can we think differently
about Linked Data on the Web?
(figure: data dump, Linked Data documents, and SPARQL endpoint on a spectrum: from low server cost, high availability, high bandwidth, out-of-date data, and high client cost at the dump end, to high server cost, low availability, low bandwidth, live data, and low client cost at the endpoint end)
100. Can we think differently
about Linked Data on the Web?
(figure: the same spectrum, asking which interfaces could fill the gaps between data dump, Linked Data documents, and SPARQL endpoint)
101. Let us combine the lessons on
changes, constants, and promises.
An interface that withstands change:
simple enough so it doesn’t break,
complex enough to query.
102. Let us combine the lessons on
changes, constants, and promises.
Data dumps contain too much.
SPARQL endpoint results are too specific.
Linked Data documents are unidirectional.
103. Each interface divides a dataset
into Linked Data Fragments.
Data dumps: 1 huge fragment
SPARQL endpoints: ∞ specific fragments
Linked Data: 1 fragment per subject
104. Can we find a new interface
with a sustainable balance?
Triple Pattern Fragments:
1 fragment per subject / predicate / object
107. Triple Pattern Fragments extend
Linked Data documents with forms.
That’s even more similar to what we do:
consume information on Wikipedia
by following links and using forms.
108. Machines solve complex queries
by breaking them down.
# Other books by the same author
SELECT DISTINCT ?book WHERE {
books:85604 dc:creator ?author.
?book dc:creator ?author.
}
109. Machines solve complex queries
by breaking them down.
# Books about Hamburg
SELECT DISTINCT ?book ?author WHERE {
?book dc:subject dbpedia:Hamburg.
?book dc:creator ?author.
}
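A rough sketch of that decomposition, over an in-memory list of triples with made-up identifiers; a real client would instead fetch the matches for each triple pattern as a fragment over HTTP:

```python
# Toy dataset of (subject, predicate, object) triples.
TRIPLES = [
    ("books:85604", "dc:creator", "authors:7356"),
    ("books:90210", "dc:creator", "authors:7356"),
    ("books:11111", "dc:creator", "authors:9999"),
]

def match(s=None, p=None, o=None):
    """All triples matching one pattern; None acts as a variable."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Other books by the same author":
# 1. bind ?author from the most selective pattern,
# 2. substitute each binding into the second pattern.
books = {t[0]
         for (_, _, author) in match(s="books:85604", p="dc:creator")
         for t in match(p="dc:creator", o=author)}
books.discard("books:85604")
print(sorted(books))  # → ['books:90210']
```

The server only ever answers single triple patterns; the join logic runs entirely on the client.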
110. Promises can be kept, because
the interface is intelligently light.
Publishing Linked Data
that can be queried on the Web
is realistic because the workload is divided.
The server doesn’t even need a triplestore.
Since the client is in charge,
querying multiple sources is easy.
111. Promises are negotiated contracts
so they always involve trade-offs.
Querying will be slower.
clients send many requests to answer a query
Query times are more consistent.
0.3 secs with a SPARQL endpoint… 95% of time
3 secs with Triple Pattern Fragments… 99.9% of time
Experiment with more complex interfaces.
112. Make your Linked Data
queryable on the Web.
Several open-source implementations:
linkeddatafragments.org/software/
Query one or multiple sources online:
client.linkeddatafragments.org
Example: bit.ly/harvard-hamburg