Linked Data: principles and examples
1. Linked Data
Principles and Examples
Victor de Boer
25-11-2014
With slides from Knud Hinnerk Moller, Kasper Brandt, Christophe Gueret
2. Victor de Boer
Researcher at Netherlands Institute for Sound and Vision
Assistant professor at Web and Media Group VU
Semantic Web, Linked Data
Cultural Heritage
Digital History
Linked Data for Development
7. More and more structured data available online
• Governments
• Social web data
• Medical data
• Museums
• Research data
Moverum.com
8. Web of Documents vs Web of Data
• People are often not interested in documents,
they are interested in things (information)
– Humans are very good at reading (web)
documents and distilling information
• Computers are very good at calculating,
combining and filtering information. But they
are very bad at reading documents
– We need to help machines understand web data
– Write it down in a way that they can understand
LINKED DATA!!
16. Intermezzo
Linked Data is about technology for interoperability
Open Data is about licenses to allow reuse
17. Intermezzo
Linked Data five-star system (Tim Berners-Lee)
★ Available on the web (whatever format), but with an open license
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
www.w3.org/designissues/linkeddata.html
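The five levels are cumulative: each star requires all the ones before it. A minimal Python sketch of that rule, assuming a dataset can be described by five yes/no flags (the `Dataset` class and its field names are hypothetical, not part of the scheme itself):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical yes/no flags, one per star level in the list above.
    on_web_with_open_license: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_w3c_standards: bool  # RDF/SPARQL, URIs to identify things
    linked_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Return 0-5 stars; a level only counts if all lower levels are met."""
    levels = [
        ds.on_web_with_open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in levels:
        if not met:
            break  # "all the above" is required, so stop at the first gap
        stars += 1
    return stars


# An openly licensed CSV file that is not yet RDF.
csv_dump = Dataset(True, True, True, False, False)
print(star_rating(csv_dump), "stars")  # 3 stars
```

A dataset in RDF but without an open license still gets zero stars under this reading, because the first level is missing.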
19. Examples of Linked Data
• Academia, Research
• Community
• Libraries, Museums, Cultural Heritage
• Government and public institutions
(Open Data)
• Media
• Business
24. How does all this work?
• Data, not documents
• Structured data
• Graph (networked) data!
• W3C Web standards stack
– URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL, etc.
25. Four rules of Linked Data
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those
names.
3. When someone looks up a URI, provide useful
information, using the standards (RDF)
4. Include links to other URIs, so that they can
discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
26. Resource Description Framework (RDF)
Semantic Web standard for writing down data, information
(Subject, Relation, Object)
<Painting001, has_location, Amsterdam>
Painting001 → has_location → Amsterdam
27. [Figure: the triples below drawn as a graph, with SJTU, Shanghai (上海), People’s Republic of China and Beijing as nodes, connected by name, located in, population and capital]
SJTU name "Shanghai Jiao Tong University"
SJTU located in Shanghai
Shanghai name "上海"
Shanghai population "23,019,148"
Shanghai located in People’s Republic of China
People’s Republic of China capital Beijing
Beijing located in People’s Republic of China
Beijing population "20,693,000"
• Graph
• Triple
Graph Thinking
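Once data is a set of triples, questions become pattern matches and path traversals over the graph. A minimal in-memory sketch in Python using the slide's eight statements (the `match` and `located_chain` helpers are hypothetical illustrations, not an RDF library API; a real store would answer these with SPARQL):

```python
# The slide's eight statements as (subject, predicate, object) triples.
triples = {
    ("SJTU", "name", "Shanghai Jiao Tong University"),
    ("SJTU", "located in", "Shanghai"),
    ("Shanghai", "name", "上海"),
    ("Shanghai", "population", "23,019,148"),
    ("Shanghai", "located in", "People's Republic of China"),
    ("People's Republic of China", "capital", "Beijing"),
    ("Beijing", "located in", "People's Republic of China"),
    ("Beijing", "population", "20,693,000"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def located_chain(start):
    """Follow 'located in' edges from a node (assumes the chain has no cycles)."""
    chain = [start]
    while True:
        hits = match(s=chain[-1], p="located in")
        if not hits:
            return chain
        chain.append(hits[0][2])


print(len(triples))  # 8
print([s for s, _, _ in match(p="located in", o="People's Republic of China")])
# ['Beijing', 'Shanghai']
print(located_chain("SJTU"))
# ['SJTU', 'Shanghai', "People's Republic of China"]
```

The second query shows the benefit of graph thinking: SJTU is never stated to be in China, yet following two "located in" edges gets there.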
28. Use HTTP URIs for Things
• A Uniform Resource Identifier (URI) is a string of characters used to identify a resource
• http://rijksmuseum.nl/data/schilderij1
• I can go there (dereference) and then I get
information about it
– HTML page for humans
– RDF data for machines
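Serving HTML to people and RDF to machines from the same URI is commonly done with HTTP content negotiation: the client states which formats it wants in the `Accept` header. A minimal sketch with Python's standard `urllib.request`; the Rijksmuseum URI is the slide's illustrative example and may not resolve, and the chosen media types are assumptions:

```python
import urllib.request


def build_lookup(uri: str, for_machine: bool) -> urllib.request.Request:
    """Prepare an HTTP GET for a URI, asking for RDF or HTML via the Accept header."""
    if for_machine:
        # Prefer Turtle, accept RDF/XML as a fallback.
        accept = "text/turtle, application/rdf+xml;q=0.9"
    else:
        accept = "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})


req = build_lookup("http://rijksmuseum.nl/data/schilderij1", for_machine=True)
print(req.full_url, req.get_header("Accept"))
# Actually sending it needs network access, and whether the server honours
# the Accept header is up to the publisher:
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("Content-Type"))
```

The request is only built here, not sent, so the sketch runs offline.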
29. Links
• Link your data to other data
– By establishing RDF triples that point to other
people’s data
– By reusing other people’s URIs
30. Example: Link to GeoNames
• IDS: document 0002, Country: ”Gambia”
• GeoNames: Gambia (Region: Africa; population: 1593256; N 13° 30' 0'', W 15° 30' 0'')
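In practice, the local record only stores the string "Gambia", while an added link lets an application pick up facts it never stored itself. A sketch with triples as Python tuples; the prefixed names `ids:countryResource`, `geo:region` and `geo:population` are hypothetical placeholders, not the actual IDS or GeoNames vocabularies:

```python
# Our data: an IDS document that only knows the country as a string.
ids_triples = [
    ("ids:document0002", "ids:country", "Gambia"),
]
# Someone else's data: GeoNames' description of the same country.
geonames_triples = [
    ("geonames:Gambia", "geo:region", "Africa"),
    ("geonames:Gambia", "geo:population", 1593256),
]
# The link: one extra triple that points from our data into theirs.
link = ("ids:document0002", "ids:countryResource", "geonames:Gambia")

merged = ids_triples + geonames_triples + [link]


def objects(subject, predicate, graph):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def country_population(doc, graph):
    """Follow the link, then read the population from the linked resource."""
    for country in objects(doc, "ids:countryResource", graph):
        for pop in objects(country, "geo:population", graph):
            return pop
    return None


print(country_population("ids:document0002", merged))  # 1593256
```

Without the `link` triple the same function returns `None`: the two datasets sit side by side but say nothing about each other.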
31. Reuse things: Vocabularies
• FOAF (Friend of a Friend): People, Organisations,
Social Networks
• Dublin Core (Bibliographic): publications, authors,
media, etc.
• schema.org (Google, Yahoo!, Bing, Yandex): cross-domain,
what search engines are interested in
(people, events, products, locations)
• Good Relations: business, products, etc.
Example: rijks:Painting001 → http://purl.org/dc/terms/spatial → Amsterdam
32. Reuse things: Datasets
• GeoNames: Geographical data
• DBpedia: RDF version of Wikipedia (also in Dutch)
• GTAA: (Gemeenschappelijke Thesaurus
Audiovisuele Archieven): Persons, topics, AV-terms
• VIAF: Persons
Example: rijks:Painting001 → http://purl.org/dc/terms/spatial → http://sws.geonames.org/2759794/
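Written out in full, the two vocabulary/dataset examples differ only in their object: a string versus a URI that can itself be looked up. A sketch that prints both in N-Triples syntax; the namespace used for `rijks:` is a hypothetical placeholder (the slides do not expand it), and the literal escaping only handles backslashes and double quotes:

```python
RIJKS = "http://example.org/rijks/"  # hypothetical: the slide's rijks: prefix is not expanded
DCTERMS = "http://purl.org/dc/terms/"


def nt_uri_triple(s: str, p: str, o: str) -> str:
    """One N-Triples line whose object is another resource (a URI)."""
    return f"<{s}> <{p}> <{o}> ."


def nt_literal_triple(s: str, p: str, text: str) -> str:
    """One N-Triples line whose object is a plain string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{s}> <{p}> "{escaped}" .'


painting = RIJKS + "Painting001"
# Before: the place is just a string that only humans can interpret.
print(nt_literal_triple(painting, DCTERMS + "spatial", "Amsterdam"))
# After: the place is a GeoNames resource that can be looked up and linked further.
print(nt_uri_triple(painting, DCTERMS + "spatial", "http://sws.geonames.org/2759794/"))
```

Reusing `dcterms:spatial` means any consumer that already understands Dublin Core knows what the relation means; reusing the GeoNames URI means it also knows which Amsterdam is meant.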
35. Dutch Ships and Sailors
Linked Data Cloud
Victor de Boer, Matthias van Rossum, Jur Leinenga, Rik Hoekstra
With input from Andrea Bravo Balado and Robin Ponstein
Netherlands Institute for Sound and Vision / VU University Amsterdam
v.de.boer@vu.nl
ISWC2014
36. The Problem:
((Maritime) historical) data is not integrated
25+ Maritime datasets; Heterogeneous
38. But why Linked Data?
• Heterogeneous models, one data format
– Link what can be linked
– Keep specificity of original data
– Allow integration at project level (and beyond)
• Links to other sources: re-use knowledge
• Extensible
• Allow multiple levels of semantic enrichment/
normalization
– Provenance
39. Dutch Ships and Sailors
KB Delpher
Dutch-Asiatic Shipping (DAS) –
Voyages (Huygens ING)
“VOC Opvarenden”
Mustering and payroll information (DANS Easy)
41. Modeling in collaboration with historians (2)
[Figure: RDF model of a GZMVOC crew count (gzmvoc:Telling, mapped to dss:Record) for ship 1046 “De Berkel” (gzmvoc:Schip, mapped to dss:Ship), with its ship type “Schip”, the number of Asian sailors (“Moorse mattroosen”), and a link to the corresponding DAS outward voyage (das:Voyage)]
Matthias van Rossum (VU-hist)
Payroll information for European
vs Asiatic Sailors (17th / 18th C)
42. Modelling principles
• Model each dataset as directly as possible
– Only “syntactical” transformation to RDF
– No normalization
• Reusability
• Transparency, trust
• Normalize and link in second stage
– store in separate RDF Named Graphs
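A minimal sketch of this two-stage approach, assuming a simple in-memory quad store (graph names such as `graph:mdb-raw` and the stage-2 triple are hypothetical; a real deployment would use an RDF store with named-graph support and SPARQL `GRAPH` queries):

```python
from collections import defaultdict

# A tiny quad store: named graph -> set of (subject, predicate, object).
store = defaultdict(set)


def add(graph, s, p, o):
    store[graph].add((s, p, o))


def triples(graphs=None):
    """Union of the given named graphs (all graphs if None)."""
    names = list(store.keys()) if graphs is None else graphs
    return {t for g in names for t in store.get(g, set())}


# Stage 1: syntactic conversion of the source, no normalization.
add("graph:mdb-raw", "mdb:Schip1", "mdb:scheepsType", "mdb:Kof")
# Stage 2: normalization and links, kept in their own graph.
add("graph:mdb-links", "mdb:Schip1", "dss:has_shipType", "mdb:Kof")

print(len(triples()))                   # 2: raw + enrichment together
print(len(triples(["graph:mdb-raw"])))  # 1: the untouched source view
# Redoing the enrichment means dropping one graph; the raw data stays as it was.
del store["graph:mdb-links"]
print(len(triples()))                   # 1
```

Keeping stages in separate graphs is what makes the transparency goal workable: a historian can always query the source view alone.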
43. Link properties and classes to an interoperability layer
[Figure: mdb:scheepsType (e.g. mdb:Schip1 mdb:scheepsType mdb:Kof) and das:typeOfShip (e.g. das:ShipX das:typeOfShip das:Kofship) are both rdfs:subPropertyOf dss:has_shipType]
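The subproperty links let one query on the interoperability layer retrieve data from both sources without changing either. A sketch of the corresponding RDFS entailment (if p is a subproperty of q, every `s p o` also implies `s q o`), computed on the fly over the slide's example triples; the helper functions are illustrative, not a reasoner API:

```python
data = {
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof"),
    ("das:ShipX", "das:typeOfShip", "das:Kofship"),
}
schema = {
    ("mdb:scheepsType", "rdfs:subPropertyOf", "dss:has_shipType"),
    ("das:typeOfShip", "rdfs:subPropertyOf", "dss:has_shipType"),
}


def super_properties(prop):
    """prop plus everything reachable via rdfs:subPropertyOf (transitive)."""
    result, frontier = {prop}, [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if s == current and p == "rdfs:subPropertyOf" and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def query(prop):
    """(subject, object) pairs for prop, including data stated with sub-properties."""
    return sorted((s, o) for s, p, o in data if prop in super_properties(p))


print(query("dss:has_shipType"))
# [('das:ShipX', 'das:Kofship'), ('mdb:Schip1', 'mdb:Kof')]
print(query("mdb:scheepsType"))
# [('mdb:Schip1', 'mdb:Kof')]
```

The source-specific query still returns only its own data, so the original dataset view is preserved.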
45. Linking to Historical newspapers
• Automatically detect links
between ships and historical
newspaper articles (delpher.nl)
– Based on ship name, time
intervals, captain’s names, ship
type, named entities, keywords,
background knowledge
• 179,120 links
- Andrea Bravo Balado
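The slide lists the cues used; the actual linking method is not described here. As a rough illustration only, a toy scorer that proposes a link when ship name, captain's name and date all agree (the data classes, the equal weighting, the threshold and the dates are assumptions; the real approach also uses ship type, named entities, keywords and background knowledge):

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ShipRecord:
    name: str
    captain: str
    active_from: date  # hypothetical: start of the period the ship is known to sail
    active_to: date


@dataclass
class Article:
    text: str
    published: date


def score(ship: ShipRecord, article: Article) -> int:
    """Count how many simple cues agree (toy version, no entity recognition)."""
    text = article.text.lower()
    points = 0
    if ship.name.lower() in text:
        points += 1
    if ship.captain.lower() in text:
        points += 1
    if ship.active_from <= article.published <= ship.active_to:
        points += 1
    return points


def candidate_links(ships, articles, threshold=3):
    """(ship name, article index) pairs whose score reaches the threshold."""
    return [
        (ship.name, i)
        for ship in ships
        for i, article in enumerate(articles)
        if score(ship, article) >= threshold
    ]


# Illustrative values: the dates are made up; the text is a snippet of the
# original Dutch newspaper article from slide 46.
transit = ShipRecord("Transit", "Schaap", date(1859, 1, 1), date(1860, 12, 31))
article = Article("het hier behoorende schoonerschip Transit, kapitein Schaap, "
                  "in de Noordzee is gezonken", date(1859, 10, 24))
print(candidate_links([transit], [article]))  # [('Transit', 0)]
```

Plain substring matching is fragile (OCR errors, common words as ship names), which is why richer cues such as named entities and background knowledge matter.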
46. Example
[HARLINGEN, 24 October.] (English translation of the OCR'ed Dutch article) The stranded Swedish ship, which we reported in our previous issue, has been refloated by the tugboat from here and brought into port. On that occasion we were told that four more vessels had stranded on Terschelling. News has also been received that the schooner Transit, belonging to this port, captain Schaap, has sunk in the North Sea after its stern was carried away; an ordinary seaman lost his life. In addition, three foreign ships have come in here with more or less heavy damage.
Spoiler alert! It sank in the North Sea.
mdb:Transit
mdb:Aanmonstering_1859-55
47. Provenance
• Sets of triples have provenance information
– Who made it (people/software?)
– Based on what source
– Content confidence
• Matches historical
science requirements
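Because provenance is attached to sets of triples, one named graph per set is a natural fit. A minimal sketch, assuming each graph carries a creator, a source and a confidence score (the field names, graph names, `ex:newspaper-article` and the scores are hypothetical), so a historian can choose how much automatically generated content to trust:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    creator: str       # person or software that produced the triples
    source: str        # what they were derived from
    confidence: float  # content confidence between 0 and 1


# Named graph -> (triples, provenance of the whole set). All values are illustrative.
graphs = {
    "graph:mdb-raw": (
        {("mdb:Transit", "rdfs:label", "Transit")},
        Provenance("conversion script", "MDB muster rolls", 1.0),
    ),
    "graph:kb-links": (
        {("mdb:Transit", "dss:hasKBLink", "ex:newspaper-article")},
        Provenance("automatic link detection", "Delpher newspaper OCR", 0.7),
    ),
}


def trusted_triples(min_confidence):
    """Triples from graphs whose provenance meets a confidence threshold."""
    return {
        t
        for triples, prov in graphs.values()
        if prov.confidence >= min_confidence
        for t in triples
    }


print(len(trusted_triples(0.5)))  # 2
print(trusted_triples(0.9))       # {('mdb:Transit', 'rdfs:label', 'Transit')}
```

A standard vocabulary such as W3C PROV-O would normally replace the ad hoc `Provenance` class.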
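48. [Figure: the DSS Linked Data cloud with the datasets DAS, GZMVOC, MDB, VOCOPV Begunstigden, VOCOPV Soldijboeken and VOCOPV Opvarenden, the vocabularies PROV, AAT and FOAF, and the link types dss:hasKBLink, owl:sameAs, rdfs:subClassOf, rdfs:subPropertyOf, dss:DAS link and skos:exactMatch]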
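An owl:sameAs link states that two identifiers denote the same thing; since the relation is symmetric and transitive, chains of such links collapse into groups of co-referring identifiers. A sketch using union-find over hypothetical identifiers (not actual DSS URIs):

```python
def same_as_clusters(pairs):
    """Group identifiers connected by owl:sameAs (symmetric and transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


same_as = [
    ("das:ship-A", "gzmvoc:schip-B"),
    ("gzmvoc:schip-B", "mdb:Schip-C"),
    ("vocopv:person-X", "vocopv:person-Y"),
]
print(same_as_clusters(same_as))
# [['das:ship-A', 'gzmvoc:schip-B', 'mdb:Schip-C'], ['vocopv:person-X', 'vocopv:person-Y']]
```

Weaker links such as skos:exactMatch are often kept out of this kind of merging, since they do not claim full identity.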
50. Current work: linking original scans
54. LinkedTV: Example of contextualization
Concept: Jan Sluijters (painter)
DBpedia
Related items
Links
• Styles (Expressionism,
Cubism, Fauvism)
• Period (contemporaries)
55. LinkedTV – SmartTV
Cultural heritage scenario, Tussen Kunst & Kitsch
With thanks to the agreement with AVRO!
12 February 2013
56. DIVE INTO THE EVENT-BASED
BROWSING OF LINKED HISTORICAL
MEDIA
VICTOR DE BOER, JOHAN OOMEN, OANA INEL, LORA AROYO, ELCO VAN STAVEREN, WERNER HELMICH AND DENNIS DE BEURS
57. Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)
DIGITAL HUMANITIES RESEARCHERS
58. EXPLORATIVE SEARCH
Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van de; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G. Automatic Heritage Metadata Enrichment with Historic Events. Museums and the Web 2011.
http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadata_enrichment_with_hi
https://www.flickr.com/photos/drainrat/14779928998/
60. DATA: OPENIMAGES.EU
Open videos from the Netherlands Institute for Sound and Vision
3,000, mostly news broadcasts
61. DATA: DELPHER.NL
Scans of radio bulletins (hand-annotated)
• 1937 – 1984
• 1.5 million, OCR'ed and processed with named-entity recognition (NER)
62. ENTITY EXTRACTION
• EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG
• SEGMENTATION & KEYFRAMES
• LINKING EVENTS AND CONCEPTS TO KEYFRAMES (CROWDTRUTH.ORG)
63. SIMPLE EVENT MODEL (SEM), OPENANNOTATION (OA) AND SKOS
[Figure: model connecting DIVE:MEDIA OBJECT, SEM:EVENT, SEM:PLACE, SEM:TIME, SEM:ACTOR, SKOS:CONCEPT and OA:ANNOTATION]
• LINKS TO EUROPEANA (MULTILINGUAL)
• LINKS TO DBPEDIA
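In this model, an oa:Annotation ties a media object (or keyframe) to a sem:Event, which in turn has actors, places and times; shared actors or places are what let users hop from one item to another. A sketch of that browsing step with plain Python classes standing in for the RDF classes (all `ex:` identifiers are hypothetical, and the matching rule is an assumption, not DIVE's actual logic):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:  # plays the role of sem:Event
    uri: str
    actors: List[str] = field(default_factory=list)  # sem:Actor
    places: List[str] = field(default_factory=list)  # sem:Place
    times: List[str] = field(default_factory=list)   # sem:Time


@dataclass
class Annotation:  # plays the role of oa:Annotation
    target: str  # a media object or keyframe
    body: str    # the event it depicts


def events_of(media, annos, events):
    return [events[a.body] for a in annos if a.target == media and a.body in events]


def related_media(media, annos, events):
    """Other media objects annotated with events that share an actor or place."""
    keys = {k for e in events_of(media, annos, events) for k in e.actors + e.places}
    related = set()
    for a in annos:
        if a.target != media and a.body in events:
            e = events[a.body]
            if keys & set(e.actors + e.places):
                related.add(a.target)
    return sorted(related)


events = {
    "ex:event1": Event("ex:event1", actors=["ex:PersonA"], places=["ex:PlaceB"]),
    "ex:event2": Event("ex:event2", actors=["ex:PersonA"], times=["ex:Time1"]),
    "ex:event3": Event("ex:event3", places=["ex:PlaceC"]),
}
annos = [
    Annotation("ex:video1", "ex:event1"),
    Annotation("ex:bulletin7", "ex:event2"),
    Annotation("ex:bulletin9", "ex:event3"),
]
print(related_media("ex:video1", annos, events))  # ['ex:bulletin7']
```

Linking actors and places to DBpedia or Europeana URIs, as listed above, is what would make such matches work across collections.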
69. Linked Data for International
Aid Transparency Initiative
“IATI is a voluntary, multi-stakeholder
initiative that seeks to improve the
transparency of aid in order to
increase its effectiveness in tackling
poverty.”
MSc thesis by Kasper Brandt
Victor de Boer
70. Linking datasets and Applications
User questions
1. In total, how much does a given country receive in
aid?
2. A comparative index of aid versus the Human
Development Index.
3. What is the geographic location of a project? How
much aid went to a given province, constituency or
village?
o Is the aid spent in places where the need is
highest? Is it well distributed across the
country?
o Can we attribute sub-national breakdowns for
aid so we can see how much goes to different
parts of recipient countries?
4. How does violent conflict in recipient countries
affect aid activities?
5. How does aid spending as registered in the IATI
standard compare to World Bank indicators?
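Once activities and country-level indicators share identifiers, the first two questions reduce to an aggregation and a join. A sketch over made-up records (the numbers, country codes and field layout are placeholders, not IATI data); against the Linked Data version one would typically express this as a SPARQL query with `GROUP BY` and `SUM`:

```python
from collections import defaultdict

# Made-up activity records: (activity id, recipient country code, amount in USD).
activities = [
    ("act-1", "ML", 1_000_000),
    ("act-2", "ML", 250_000),
    ("act-3", "BF", 400_000),
]
# Made-up indicator values for some countries, e.g. from a linked dataset.
indicator = {"ML": 0.5, "BF": 0.6, "NE": 0.4}


def total_aid_by_country(records):
    """Question 1: sum the aid per recipient country (a GROUP BY / SUM)."""
    totals = defaultdict(int)
    for _, country, amount in records:
        totals[country] += amount
    return dict(totals)


def join_on_country(totals, other):
    """Question 2 style: pair aid totals with another indicator via the shared key."""
    return {c: (totals[c], other[c]) for c in sorted(totals) if c in other}


totals = total_aid_by_country(activities)
print(totals)                              # {'ML': 1250000, 'BF': 400000}
print(join_on_country(totals, indicator))  # {'BF': (400000, 0.6), 'ML': (1250000, 0.5)}
```

The join only works because both sides use the same country identifiers, which is exactly what linking the datasets provides.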
73. Need for information sharing in rural
developing areas
• Agricultural, Health,
Education, Market prices…
Sharing (heterogeneous)
knowledge is essential
• LD is well-suited because of:
– Language-agnostic
– Interface-agnostic
– De-centralised authoring
• Slicing
– Re-usability
• Local
• Global
Based on Sbc4d.com
74. Local market data
Communiqué
Web Interface Text-To-Speech
GSM/Voice interface
Community radio
RadioMarché
Sahel Eco operative
Buyers
76. Linked Data for Development (LD4D)
Web applications
VoiceXML to SPARQL
Voice browser
Tel: +31208080855
RadioMarché
Linked market data
Skype: +990009369996162208
‘Allo, Linked Data?
DBpedia
GeoNames
Agrovoc
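The voice path (VoiceXML to SPARQL) can be pictured as: the caller picks an option on the phone keypad, the voice application maps it to a query over the linked market data, and the answer is read out with text-to-speech. A sketch of only the mapping step, assuming a hypothetical `ex:` market vocabulary and menu (the actual RadioMarché data model is not shown on the slide):

```python
# Hypothetical keypad menu of a voice service: key -> product local name.
MENU = {"1": "Millet", "2": "Honey", "3": "Shea_butter"}


def query_for_keypress(key: str) -> str:
    """Turn a caller's menu choice into a SPARQL query for current offers."""
    if key not in MENU:
        raise ValueError(f"no menu entry for key {key!r}")
    product = MENU[key]  # only fixed menu values reach the query text
    return (
        "PREFIX ex: <http://example.org/market#>\n"
        "SELECT ?price ?unit WHERE {\n"
        f"  ?offer ex:product ex:{product} ;\n"
        "         ex:price ?price ;\n"
        "         ex:unit ?unit .\n"
        "}"
    )


print(query_for_keypress("1"))
```

Restricting input to a fixed menu keeps caller input from being spliced into the query as free text.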
77. Low-powered hardware and Mesh
networking
ENTITY REGISTRY SYSTEM (ERS)
• Fully decentralised Linked Data publication platform
• Works under any kind of connectivity context
• Traces individual edits back to their authors
• Simple and versatile
• Open Source https://github.com/ers-devs
• Low resource demands
... and open for contributions so don't
hesitate to fork it!
79. With the mainstream
Developing countries can leapfrog directly into the
information age,
skipping many phases of immature technologies
Linked Data is mainstream computer science research.
Test hypotheses in domains/environments
Img: flickr/n3v3rv0id
80. Take Home
• Linked Data is a set of technologies and principles for
formalizing data and information to make it usable for
computers
– Based on triples and URIs
– Data takes the form of graphs
– We can link data from heterogeneous sources
– Reuse
• It mirrors the Web of Documents, Social Web
– But behind the scenes
• Networks are very powerful and flexible for
representing and sharing information
Laura does:
- Sem tech / search
- Patterns
Cases: modelling and publishing Linked Data
Modelling events
- Polimedia
( - E-culture)
Victor does:
- Am.museum
- Tools – Carmen?
- Historical use cases
- VK, Bioned, DSS
-interestingly, when you look at Tim Berners-Lee's original proposal for the WWW from 1989, you can see that he already had some sort of Semantic Web in mind
-sure, there are documents, but there are also concepts like “Computer Conferencing”, there are organisations, there are people
-also, the links between all those nodes in the graph were meant to be much more expressive than the simple, untyped hyperlinks we have on the WWW today
- Before we go into the details of specific technologies, we’re going to give you some examples of where this type of thinking has been applied
Web semantics and Linked Data started out in academia, in computer science and AI research, so that is where most applications could originally be found
However, this quickly moved from there to other research domains
Then into community efforts, public institutions, cultural institutions, as well as governments and administration: areas where the goal was usually not to make a profit
Media, with their vast amounts of content, have increasingly made use of Linked Data
Businesses have recently been seeing more and more benefits from using structured, linked data, for different reasons (internal efficiency, graph data as an improvement to the product (search engines), graph data as a valuable resource)
BBC Wildlife Finder
Also aggregating data from Wikipedia, etc.
Things = “resources”
-We have seen this graph before
-in a nutshell, this is RDF
-RDF is a graph-based data model, as opposed to a relational data model, which would be based on tables
-in RDF, the atomic unit of information is the triple, a structure that consists of three parts
-let’s see how many triples there are in this graph
Monsterrollen-database (muster rolls) 1803-1937:
Muster rolls are crew lists giving the name, rank, wages, place of residence and age of every sailor on board, as well as the name, type and size of the ship.
[…] for Groningen and Friesland they only begin in the nineteenth century. They give us a glimpse into the working lives of sailors in the nineteenth and early twentieth centuries.
Matthias van Rossum's research showed that relations between European and Asian sailors under the Dutch East India Company (VOC, 1602-1795) were very equal. This is in sharp contrast with the later 19th-century situation, when Asian sailors worked in an unequal and sometimes less free position, with worse treatment and pay. Work under the VOC was, moreover, characterised by a pragmatic multiculturalism.
video2video, text2video, video2text, image2video, etc
DIFFERENCES:
LTV focuses on finding external sources (vid2text). The goal in LTV is context (relevant information); the goal in AXES is search/browsing (closer to Topic Detection and Tracking).
In AXES, search is the starting point for linking.
In LTV, the starting point is the video itself.
SIMILARITIES: