2. About me
Victor de Boer
Assistant professor
Web & Media Group, Network Institute
VU University Amsterdam
Semantic Technologies, Linked Data
Cultural Heritage
Digital History
Linked Data for Development
7. More and more structured data available online
Governments
Social web data
Medical data
Museums
Research data
?
Moverum.com
8. Web of Documents vs Web of Data
People are often not interested in documents,
they are interested in things (information)
Humans are very good at reading (web) documents
and distilling information
Computers are good at calculating, combining
and filtering information.
But they are very bad at reading documents
We need to help machines understand web data
Write it down in a way that they can understand
LINKED DATA!!
20. How does all this work?
Data, not documents
Structured data
Graph (networked) data!
W3C Web standards stack
URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL,
etc.
21. Four rules of Linked Data
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up
those names.
3. When someone looks up a URI, provide useful
information, using the standards (RDF)
4. Include links to other URIs, so that they can
discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
22. Semantic Web standard for writing down data, information
(Subject, Relation, Object)
<Painting001, has_location, Amsterdam>
Resource Description Framework
(RDF)
Painting001 --has_location--> Amsterdam
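A minimal sketch of the (Subject, Relation, Object) idea in plain Python, not a real RDF library. Only the Painting001 triple comes from the slide; the second triple and the helper names `objects_of` and `where_is` are assumptions added for illustration.

```python
# Each statement is one (subject, relation, object) triple.
triples = {
    ("Painting001", "has_location", "Amsterdam"),  # from the slide
    ("Amsterdam", "located_in", "Netherlands"),    # illustrative extra triple
}


def objects_of(subject, relation, graph):
    """Return every object linked to `subject` by `relation`."""
    return {o for (s, r, o) in graph if s == subject and r == relation}


def where_is(thing, graph):
    """Follow has_location, then located_in: triples chain into a graph."""
    places = objects_of(thing, "has_location", graph)
    wider = set()
    for place in places:
        wider |= objects_of(place, "located_in", graph)
    return places | wider


print(where_is("Painting001", triples))
```

Because every statement has the same three-part shape, adding a new kind of fact needs no schema change, only another triple.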
24. Use HTTP URIs for Things
A Uniform Resource Identifier (URI) is
a string of characters used to identify
a resource
http://rijksmuseum.nl/data/schilderij1
I can go there (dereference) and then I get
information about it
HTML page for humans
RDF data for machines
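The "HTML for humans, RDF for machines" split is commonly done with HTTP content negotiation: the same URI is requested with a different Accept header. The sketch below only builds the two requests and does not send them. Whether the example URI from the slide actually answers this way is an assumption, and `build_lookup` is a hypothetical helper name.

```python
from urllib.request import Request

THING_URI = "http://rijksmuseum.nl/data/schilderij1"  # example URI from the slide


def build_lookup(uri, for_machine):
    """Build (but do not send) an HTTP GET request for a thing's URI."""
    # text/turtle is one RDF serialization; text/html is the human-readable page.
    accept = "text/turtle" if for_machine else "text/html"
    return Request(uri, headers={"Accept": accept})


human_request = build_lookup(THING_URI, for_machine=False)
machine_request = build_lookup(THING_URI, for_machine=True)

print(human_request.get_header("Accept"))
print(machine_request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual lookup (dereferencing).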
25. Links
Link your data to other data
By establishing RDF triples that point to other
people’s data
By reusing other people’s URIs
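A small sketch of why reusing other people's URIs matters: when two datasets use the same URI for the same thing, merging them is a plain set union and the data connects by itself. The place URI, the `label` relation and the `describe` helper are hypothetical; only the museum URI appears in the slides.

```python
PAINTING = "http://rijksmuseum.nl/data/schilderij1"  # example URI from the slides
CITY = "http://example.org/places/Amsterdam"         # placeholder for someone else's URI

# Our data points at a URI that another party maintains.
museum_data = {(PAINTING, "has_location", CITY)}
# The other party's data describes that same URI.
other_data = {(CITY, "label", "Amsterdam")}

merged = museum_data | other_data


def describe(uri, graph, depth=2):
    """Collect triples reachable from `uri` by following object links."""
    found, frontier = set(), {uri}
    for _hop in range(depth):
        step = {t for t in graph if t[0] in frontier}
        found |= step
        frontier = {o for (s, r, o) in step}
    return found


print(len(describe(PAINTING, museum_data)))  # museum data alone
print(len(describe(PAINTING, merged)))       # after merging with the other dataset
```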
27. Linked Data is "a term used to describe a
recommended best practice for exposing,
sharing, and connecting pieces
of data, information, and knowledge on the
Semantic Web using URIs and RDF." --Wikipedia
28. Why Linked Data for E-science
Large amounts of data
Efficient analysis, data mining
Sharing data, information and knowledge
between scientists
Across continents
Across disciplines
31. MultimediaN E-Culture project (2006)
Museums have increasingly nice websites
But: most of them are driven by stand-alone collection
databases
Data is isolated, both syntactically and semantically
If users can do cross-collection search, the individual
collections become more valuable!
Semantic Search
36. Linked Data for BiographyNet
Source text (translated from Dutch): "Johan Rudolph Thorbecke was born on
14 January 1798 in Zwolle and comes from a half-Ger..."
[Diagram: the person Thorbecke linked to two Biographical Descriptions]
Biographical Description
Provenance Meta Data: NNBW
Person Meta Data: "Thorbecke"
Biography Parts: the source text
Event: Birth, 1798
Biographical Description
Enrichment NLP Tool
Person Meta Data
Event: Birth, Zwolle, 1798-01-14
39. Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog
History of German-occupied Dutch society (1940-1945)
Published 1969-1991 in 14 volumes, 30 parts, 18,000 pages
1. Digitization
2. Open Data
3. Enriched access with Linked Data
E-History: Verrijkt Koninkrijk
40. Step 1: Lou de Jong’s “Het Koninkrijk” was digitized and made
available in a reusable format
Step 2: Named Entity Recognition and consolidation of the back-of-the-book
index provide structured vocabularies with links into the text
country, collection, doc-type, volume, chapter, section, sub-section, paragraph
Back-of-the-Book index Named Entities
41. Verrijkt Koninkrijk
Step 3: Enrichment with Linked Data makes new ways of
interaction and analysis possible
Back-of-the-Book index Named Entities
47. The Problem:
(Maritime) historical data is not integrated
• Researchers’ data is “lost”
– In different physical locations
– In different file formats
– In different semantic structures
• In a workshop, we identified 25+
maritime historical datasets.
– http://dutchshipsandsailors.nl
• We do not want to force one
monolithic data model for
integration
48. The solution: Linked Open Data
• Represent heterogeneous datasets
with their own data models
– In one data format (RDF)
– Link what can be linked to integrate at
project level (and beyond)
– Keep specificity of original data
• Links to other sources: re-use
knowledge
• Allow multiple levels of semantic
enrichment/normalization
– through Named Graphs
– Provenance
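One way to picture Named Graphs is as a fourth element on every triple, naming the graph the statement belongs to. The original data and each enrichment layer then sit side by side, and provenance is recorded per graph. The sketch below uses plain Python tuples; all `ex:` identifiers, the provenance notes and the `view` helper are hypothetical.

```python
# Each statement is a quad: (subject, relation, object, graph name).
quads = [
    ("ex:ship1", "ex:name", "Transit", "ex:graph/original"),
    ("ex:ship1", "ex:shipType", "schoonerschip", "ex:graph/original"),
    # Enrichment layer: the literal ship type normalized to a shared concept.
    ("ex:ship1", "ex:shipType", "ex:type/Schooner", "ex:graph/normalized"),
]

# Provenance is attached to the graph name, not to individual triples.
provenance = {
    "ex:graph/original": "converted from the source database",
    "ex:graph/normalized": "added by a normalization step",
}


def view(all_quads, graphs):
    """Return the triples visible when only `graphs` are switched on."""
    return {(s, r, o) for (s, r, o, g) in all_quads if g in graphs}


original_only = view(quads, {"ex:graph/original"})
enriched = view(quads, set(provenance))
print(len(original_only), len(enriched))
```

A researcher who distrusts the normalization can query `original_only`, so the specificity of the original data is kept.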
49. What we did
1. Model four maritime historical datasets as
RDF
– Noordelijke Monsterrollen Database [J. Leinenga]
– Generale Zeemonsterrollen [M. van Rossum]
– Dutch Asiatic Shipping
– VOC Opvarenden
2. Link to each other (based on ships, ship types,
ranks, geography,…)
– Models and links evaluated by domain experts
3. Publish as Linked Open Data
4. Show how this data cloud can lead to new
types of integrated research questions
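The linking step can be sketched as generating candidate links between records of two datasets that mention the same ship. This is an illustration under stated assumptions, not the project's actual procedure: the records, field names, identifiers and the `same_ship_as` relation are all invented, and matching on a normalized name is the simplest possible strategy.

```python
# Hypothetical rows standing in for two separately modelled datasets.
muster_rolls = [
    {"id": "mr:1", "ship": "Transit"},
    {"id": "mr:2", "ship": "De Hoop"},
]
voyages = [
    {"id": "voy:7", "ship": " transit"},
    {"id": "voy:8", "ship": "Fortuin"},
]


def normalize(name):
    """Lower-case and collapse whitespace so spelling variants compare equal."""
    return " ".join(name.lower().split())


def candidate_links(left, right):
    """Return (left id, 'same_ship_as', right id) for matching ship names."""
    index = {}
    for row in right:
        index.setdefault(normalize(row["ship"]), []).append(row["id"])
    return [
        (row["id"], "same_ship_as", other)
        for row in left
        for other in index.get(normalize(row["ship"]), [])
    ]


links = candidate_links(muster_rolls, voyages)
print(links)
```

Links produced this way are only candidates; as the slide notes, models and links were evaluated by domain experts.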
50. Links to Historical Newspapers
[HARLINGEN, 24 October.] …stranded.
Word has also been received that the
schooner Transit of this port,
Captain Schaap, has sunk in the North Sea
after its stern was knocked away;
an ordinary seaman lost his life.
In addition, three foreign ships have
put in here with greater or lesser
heavy damage. (translated from Dutch)
- Andrea Bravo Balado
63. Linked Data allows for new types
of Humanities research
• Integrate datasets
• Without the need to force everything into one data model
• Retain original model and intent, reuse another day
• New research questions (but at which level?)
• Re-use background knowledge
• Common-sense or very specific
• Digital hermeneutics
• Provenance fits very well
• Linked Data is the (technically) best way to publish and share
your research data