The document discusses Linked Data and SPARQL concepts including linking heterogeneous data sources without forcing a single data model. It describes using HTTP URIs and RDF to identify and describe resources on the web according to the four rules of Linked Data. The document provides an example of linking Dutch ship and sailor data from different sources and querying it using SPARQL. It emphasizes that Linked Data allows for flexible integration and reuse of existing data sources.
2. The ‘problem’
(Maritime-historical) archival data is not integrated
Data is "lost" or published without reusability
In different physical locations
In different file formats
In different semantic structures
We do not want to force one monolithic data model
Flexible integration
Re-use existing data sources
3. Linked Data
Machine readable format
Standardized
Flexibility to connect heterogeneous data
Link what can be linked
Re-use and re-usability
(Figure: OBJECT, EVENT, PLACE, TIME, PERSON, CONCEPT, PROVENANCE)
4. Open Data is about licenses that allow reuse
Linked Data is about technology for interoperability
What is Linked Open Data?
7. How does all this work?
Data, not documents
Structured data
Graph (networked) data!
W3C Web standards stack
URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL, etc.
8. Four rules of Linked Data
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF).
4. Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
9. Use HTTP URIs for Things
A Uniform Resource Identifier (URI) is a string of characters used to identify a resource
http://rijksmuseum.nl/data/schilderij1
I can go there (dereference it) and get information about it:
HTML page for humans
RDF data for machines
10. Semantic Web standard for writing down data and information
(Subject, Relation, Object)
<Painting001, has_location, Amsterdam>
Resource Description Framework (RDF)
(Figure: Painting001 → has_location → Amsterdam)
11. Resource Description Framework (RDF)
Triples form graphs
(Figure: graph with the nodes rijks:Painting001, rijks:Painting002, rijks:Frans_Hals, geo:Haarlem, geo:Noord-Holland and geo:Netherlands, and the literals 147590 and "52.38084, 4.63683")
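The figure's edges did not survive extraction, so the Python sketch below rebuilds a plausible version of the graph to show how triples that share a node form a graph you can traverse. Only the node names come from the slide; the predicate names (has_location, creator, part_of, lat_long) and which node links to which are hypothetical assumptions.

```python
from collections import defaultdict

# HYPOTHETICAL reconstruction of the figure: node names come from the slide,
# but every predicate and every edge below is an assumption.
TRIPLES = [
    ("rijks:Painting001", "has_location", "geo:Haarlem"),
    ("rijks:Painting001", "creator", "rijks:Frans_Hals"),
    ("rijks:Painting002", "creator", "rijks:Frans_Hals"),
    ("geo:Haarlem", "lat_long", "52.38084, 4.63683"),
    ("geo:Haarlem", "part_of", "geo:Noord-Holland"),
    ("geo:Noord-Holland", "part_of", "geo:Netherlands"),
]


def index_by_subject(triples):
    """Outgoing edges per node: subject -> [(predicate, object), ...]."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def objects(index, subject, predicate):
    """Objects reached from `subject` over one edge labelled `predicate`."""
    return [o for p, o in index.get(subject, []) if p == predicate]


def subjects(triples, predicate, obj):
    """Reverse lookup: subjects that point at `obj` via `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def located_in(index, thing):
    """Follow has_location once, then part_of edges transitively."""
    places, frontier = [], objects(index, thing, "has_location")
    while frontier:
        place = frontier.pop(0)
        if place not in places:
            places.append(place)
            frontier.extend(objects(index, place, "part_of"))
    return places


index = index_by_subject(TRIPLES)
print(located_in(index, "rijks:Painting001"))
# ['geo:Haarlem', 'geo:Noord-Holland', 'geo:Netherlands']
print(subjects(TRIPLES, "creator", "rijks:Frans_Hals"))
# ['rijks:Painting001', 'rijks:Painting002']
```

The two paintings end up connected through the shared rijks:Frans_Hals node even though no single triple links them directly. That is the sense in which triples form a graph.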
14. Dutch Ships and Sailors
KB newspapers
Dutch-Asiatic Shipping
"VOC Opvarenden"
Jur Leinenga
Matthias van Rossum
Elbing voyages
Archangel voyages
15. HETEROGENEOUS but LINKED DATA MODELS
(Figure: the VOC crew count gzmvoc:telling-1046-De_Berkel and the ship gzmvoc:schip-1046-De_Berkel ("De Berkel") keep their original gzmvoc: classes and properties (gzmvoc:Telling, gzmvoc:Schip, gzmvoc:Scheepstype, gzmvoc:scheepsnaam, gzmvoc:azAantalMatrozen), which are linked to shared dss: terms (dss:Record, dss:Ship, dss:ShipType, dss:has_ship, dss:scheepsnaam); the count is also linked to the Dutch-Asiatic Shipping voyage das:voyage-1918_61)
Integrate datasets
No monolithic data model needed
No normalisation or "dumbing down" of data needed
Retain the original model and intent
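One way to read "no monolithic data model needed" is sketched below in Python. The VOC data keeps its own gzmvoc: terms, and a few mapping triples link those terms to the shared dss: terms. The sketch assumes those links are rdfs:subClassOf and rdfs:subPropertyOf (the figure does not say which properties are used) and applies two RDFS entailment rules (rdfs7 and rdfs9) by hand, so that a query written in dss: terms also finds the gzmvoc: data.

```python
# ASSUMPTION: gzmvoc: terms are linked to dss: terms via rdfs:subClassOf and
# rdfs:subPropertyOf. The VOC triples themselves keep their original vocabulary.
VOC_DATA = [
    ("gzmvoc:telling-1046-De_Berkel", "rdf:type", "gzmvoc:Telling"),
    ("gzmvoc:telling-1046-De_Berkel", "gzmvoc:schip", "gzmvoc:schip-1046-De_Berkel"),
    ("gzmvoc:schip-1046-De_Berkel", "rdf:type", "gzmvoc:Schip"),
    ("gzmvoc:schip-1046-De_Berkel", "gzmvoc:scheepsnaam", "De Berkel"),
]

MAPPINGS = [
    ("gzmvoc:Telling", "rdfs:subClassOf", "dss:Record"),
    ("gzmvoc:Schip", "rdfs:subClassOf", "dss:Ship"),
    ("gzmvoc:schip", "rdfs:subPropertyOf", "dss:has_ship"),
    ("gzmvoc:scheepsnaam", "rdfs:subPropertyOf", "dss:scheepsnaam"),
]


def infer(data, mappings):
    """One pass of two RDFS rules; returns the original triples plus derived ones.

    rdfs7: (x p y), (p subPropertyOf q)  =>  (x q y)
    rdfs9: (x type C), (C subClassOf D)  =>  (x type D)
    One pass suffices here because the mappings are only one level deep.
    """
    graph = set(data)
    for s, p, o in data:
        for sub, relation, sup in mappings:
            if relation == "rdfs:subPropertyOf" and p == sub:
                graph.add((s, sup, o))
            elif relation == "rdfs:subClassOf" and p == "rdf:type" and o == sub:
                graph.add((s, "rdf:type", sup))
    return graph


graph = infer(VOC_DATA, MAPPINGS)
# A query written against the shared dss: model now also finds the VOC data.
ships = sorted(s for s, p, o in graph if p == "rdf:type" and o == "dss:Ship")
names = sorted(o for s, p, o in graph if p == "dss:scheepsnaam")
print(ships)  # ['gzmvoc:schip-1046-De_Berkel']
print(names)  # ['De Berkel']
```

The original gzmvoc: triples stay untouched in the result. Integration happens by adding links between vocabularies, not by converting one dataset into the other's model.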
16. Reuse: Links to other web resources
Historical newspapers (http://delpher.nl), linked via isReferencedBy:
[HARLINGEN, 24 October.]
"…stranded. News has also been received that the schooner Transit of this port, captain Schaap, has sunk in the North Sea after her stern was swept away; an ordinary seaman lost his life. Three foreign ships have also come in here with more or less heavy damage."
23. Three main ways of accessing remote Linked Data
1. Through an HTTP request on the resource URI
2. Through SPARQL queries
3. Get a copy of a dataset
24. 1. Through an HTTP request on the resource URI
HTTP GET on the resource, parse the RDF, follow links
Simple HTTP requests and RDF parsing
Requires dereferenceable URIs
One request per resource: may require many requests
Local caching can be done
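The GET, parse and follow-links loop with local caching can be sketched in Python as below. To keep the sketch self-contained, the HTTP GET plus RDF parsing is replaced by a hypothetical in-memory FAKE_WEB dictionary; the example.org URIs and their triples are made up. The cache shows how "one request per resource" can avoid repeating requests.

```python
from collections import deque

# HYPOTHETICAL stand-in for the Web: dereferencing a URI "returns" these triples.
# A real crawler would send an HTTP GET and parse the RDF in the response.
FAKE_WEB = {
    "http://example.org/ship/1": [
        ("http://example.org/ship/1", "name", "De Berkel"),
        ("http://example.org/ship/1", "voyage", "http://example.org/voyage/7"),
    ],
    "http://example.org/voyage/7": [
        ("http://example.org/voyage/7", "arrival", "http://example.org/place/3"),
    ],
    "http://example.org/place/3": [],
}


class Crawler:
    def __init__(self, fetch):
        self.fetch = fetch    # callable: URI -> list of triples (one "request")
        self.cache = {}       # local cache: each URI is requested at most once
        self.requests = 0

    def dereference(self, uri):
        if uri not in self.cache:
            self.requests += 1
            self.cache[uri] = self.fetch(uri)
        return self.cache[uri]

    def crawl(self, start, max_depth=2):
        """Breadth-first: GET a resource, keep its triples, follow URI-valued objects."""
        seen, triples = {start}, []
        queue = deque([(start, 0)])
        while queue:
            uri, depth = queue.popleft()
            for s, p, o in self.dereference(uri):
                triples.append((s, p, o))
                # Naive link detection: any object starting with "http" is followed.
                if o.startswith("http") and o not in seen and depth < max_depth:
                    seen.add(o)
                    queue.append((o, depth + 1))
        return triples


crawler = Crawler(lambda uri: FAKE_WEB.get(uri, []))
found = crawler.crawl("http://example.org/ship/1")
print(len(found), crawler.requests)  # 3 3
crawler.crawl("http://example.org/ship/1")
print(crawler.requests)  # still 3: the second crawl is served from the cache
```

The max_depth limit matters on the real Web, where following every link would never terminate.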
Crawling
GET /resource/Amsterdam HTTP/1.1
Host: dbpedia.org
Accept: text/html;q=0.5, application/rdf+xml
"I'm OK with HTML… …but I really prefer RDF": text/html is given q=0.5, while application/rdf+xml gets the default q=1.
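On the server side, an Accept header like the one above drives content negotiation. Below is a minimal Python sketch of how a server might pick a representation. The function names and the list of available formats are assumptions, and the matching is simplified: RFC 9110 lets the most specific matching media range decide the q value, whereas this sketch just takes the highest matching q.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed q: treat this range as unacceptable
        ranges.append((fields[0].lower(), q))
    return ranges


def matches(media_range, media_type):
    """True if a range such as */*, text/* or text/turtle covers media_type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    t_type, _, t_sub = media_type.partition("/")
    return r_type == t_type and r_sub in ("*", t_sub)


def negotiate(header, available):
    """Return the available type with the highest q (server order breaks ties), or None."""
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in available:
        weight = max((w for rng, w in ranges if matches(rng, media_type)), default=0.0)
        if weight > best_q:
            best, best_q = media_type, weight
    return best


# HYPOTHETICAL server offering three representations of the same resource.
AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]
print(negotiate("text/html;q=0.5, application/rdf+xml", AVAILABLE))
# application/rdf+xml
print(negotiate("application/json", AVAILABLE))
# None (a server could answer 406 Not Acceptable)
```

Browsers that send text/html get the HTML page for humans, while RDF clients get machine-readable data from the same URI.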
25. With curl
curl -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Madrid
curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Madrid
curl -L -H "Accept: text/turtle" http://purl.org/collections/nl/dss/das/voyage-5580_1
With the Sindice inspector (or another tool)
http://inspector.sindice.com/inspect?url= followed by the resource URI, e.g.
http://inspector.sindice.com/inspect?url=http://dbpedia.org/resource/Madrid
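For scripts, the same requests can be made with Python's standard library. This is a sketch, not a drop-in replacement for curl: rdf_request and fetch are hypothetical helper names, and the sketch relies on urllib's default opener following HTTP redirects, which plays the role of curl's -L flag.

```python
import urllib.request


def rdf_request(uri, accept="text/turtle"):
    """Build the equivalent of: curl -H "Accept: <accept>" <uri>."""
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


def fetch(uri, accept="text/turtle", timeout=30):
    """Send the request and return (final URL, Content-Type, body text).

    urllib's default opener follows HTTP redirects (including 303 See Other),
    which plays the role of curl's -L flag here.
    """
    with urllib.request.urlopen(rdf_request(uri, accept), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.geturl(), response.headers.get("Content-Type"), body
```

For example, fetch("http://dbpedia.org/resource/Madrid", "application/rdf+xml") should return the URL of the document the server redirected to, the Content-Type it chose and the RDF text. This needs network access and depends on the server still answering.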
26. 3. Get a local copy of a dataset
through SPARQL CONSTRUCT, crawling or direct file download
Save it in a triple store or convert it to something else
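Once a local copy exists, saving or converting it is mostly a serialization problem. The Python sketch below writes triples as N-Triples (one `<subject> <predicate> <object> .` line per triple, a line-based RDF format that triple stores can load) and, as one example of "something else", as CSV. The Literal wrapper, the helper names and the example.org data are assumptions; blank nodes, datatypes and language tags are left out.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks a plain string literal; any other term is treated as an IRI."""
    value: str


def nt_term(term):
    """Serialize one term in N-Triples syntax (no blank nodes, datatypes or language tags)."""
    if isinstance(term, Literal):
        escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples):
    """One '<s> <p> <o> .' line per triple, ready to load into a triple store."""
    return "".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples)


def to_csv(triples):
    """Flatten the same triples into a subject,predicate,object table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])
    for s, p, o in triples:
        writer.writerow([s, p, o.value if isinstance(o, Literal) else o])
    return buffer.getvalue()


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# HYPOTHETICAL local copy: example.org URIs stand in for a real dataset.
COPY = [("http://example.org/ship/1", RDFS_LABEL, Literal("De Berkel"))]
print(to_ntriples(COPY), end="")
# <http://example.org/ship/1> <http://www.w3.org/2000/01/rdf-schema#label> "De Berkel" .
print(to_csv(COPY), end="")
```

The CSV export throws away the distinction between IRIs and literals, which is the usual price of converting graph data into a flat table.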
31. SPARQL: Querying the Web of Data
Query language for RDF graphs (i.e., Linked Data)
Extract specific information from a dataset (or several datasets)
"The SQL for the Web of Data"
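At its core, a SPARQL SELECT query matches a basic graph pattern (triple patterns containing ?variables) against the data and returns the variable bindings that fit. The Python sketch below implements only that core, with naive nested-loop joins over an in-memory list. It is not a SPARQL engine (no parser, FILTER or OPTIONAL), and the ex: data is hypothetical, loosely based on the ships mentioned earlier.

```python
def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None if impossible."""
    extended = dict(binding)
    for pattern_term, data_term in zip(pattern, triple):
        if is_var(pattern_term):
            if extended.get(pattern_term, data_term) != data_term:
                return None  # variable already bound to something else
            extended[pattern_term] = data_term
        elif pattern_term != data_term:
            return None
    return extended


def select(graph, patterns):
    """Evaluate a basic graph pattern by joining the triple patterns left to right."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# HYPOTHETICAL data with made-up ex: terms.
GRAPH = [
    ("ex:voyage1", "ex:ship", "ex:DeBerkel"),
    ("ex:DeBerkel", "ex:name", "De Berkel"),
    ("ex:voyage2", "ex:ship", "ex:Transit"),
    ("ex:Transit", "ex:name", "Transit"),
    ("ex:Transit", "ex:captain", "Schaap"),
]

# SELECT ?voyage ?name WHERE { ?voyage ex:ship ?ship . ?ship ex:name ?name . }
rows = select(GRAPH, [("?voyage", "ex:ship", "?ship"), ("?ship", "ex:name", "?name")])
print([(row["?voyage"], row["?name"]) for row in rows])
# [('ex:voyage1', 'De Berkel'), ('ex:voyage2', 'Transit')]
```

Against a real endpoint, the query text is sent over HTTP using the SPARQL 1.1 Protocol (for example in a `query` parameter), and the endpoint's triple store does the matching.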