Slides used for a guest lecture about Linked Data for the course "Knowledge and Media" at the VU Amsterdam (Nov 2011).
The talk takes the practical example of converting Amsterdam Museum data to Five-star Linked Open Data.
7. Four rules of Linked Data
1. Use URIs as names for things (Resources)
2. Use HTTP URIs so that people can look up those names (Dereferencing)
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/LinkedData.html
8. Linked Open Data five star system
★ Available on the web (whatever format), but with an open license
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
www.w3.org/designissues/linkeddata.html
17. Use case on how to transform “raw” XML data into 5-star Linked Open Data
18. Europeana
• “Europeana enables people to explore the digital
resources of Europe's museums, libraries, archives and
audio-visual collections.”
www.europeana.eu
From portal… …to data aggregator.
19. Amsterdam Museum
• Formerly Amsterdam Historic Museum
– “The rich collection of works of art, objects
and archaeological finds brings to life the
fortunes of Amsterdammers of days gone
by and today.”
• In March 2010 published their whole
collection online
– 70,000 objects
– CC license
• We converted their data to RDF
20. AM metadata
• Adlib database XML API
• Object metadata
– 73,000 objects, 256MB
– Nested XML
<record priref="10541">
<acquisition.date>1997</acquisition.date>
<dimension>
<dimension.type>hoogte</dimension.type>
<dimension.unit>cm</dimension.unit>
<dimension.value>6</dimension.value>
…
</dimension>
</record>
• Concept Thesaurus
– 27,000 concepts, 9MB
– Different types (geo, motif, event)
<record priref="28024">
<term>Kalverstraat 124</term>
<broader_term>Kalverstraat</broader_term>
<term.type>GEOKEYW</term.type>
</record>
• Person ‘Thesaurus’
– 67,000 persons, 10MB
– Consolidated from object metadata fields
– Creators, annotators, reproduction creators, institutions
<record priref="6">
<biography>boekverkoper en uitgever van cartografie</biography>
<birth.date.start>1659</birth.date.start>
<death.date.start>1733</death.date.start>
<name>Aa, Pieter van der</name>
<nationality>Nederlands</nationality>
<use>Aa, Pieter van der (I)</use>
</record>
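As a rough illustration of what "nested XML" means for the conversion, the sketch below flattens an object record into path/value pairs using Python's standard library. The sample is copied from the record on the slide with the elided fields left out; the function name `flatten_record` and the slash-separated path notation are assumptions made for this example, not part of the Adlib API or of the actual conversion.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the slide's object record; elided fields are omitted.
SAMPLE = """<record priref="10541">
  <acquisition.date>1997</acquisition.date>
  <dimension>
    <dimension.type>hoogte</dimension.type>
    <dimension.unit>cm</dimension.unit>
    <dimension.value>6</dimension.value>
  </dimension>
</record>"""


def flatten_record(xml_text):
    """Return (priref, [(path, text), ...]) with one pair per leaf element."""
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(element, prefix):
        for child in element:
            path = prefix + [child.tag]
            if len(child):  # element has children of its own: recurse
                walk(child, path)
            else:  # leaf element: record its path and text
                pairs.append(("/".join(path), (child.text or "").strip()))

    walk(root, [])
    return root.get("priref"), pairs


priref, fields = flatten_record(SAMPLE)
for path, value in fields:
    print(priref, path, value)
```

The nested `dimension` block is the interesting case: its three leaves belong together, so a real conversion would probably keep them grouped (for instance as one intermediate node) rather than as three unrelated values.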
22. Back to the four rules of Linked Data
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/LinkedData.html
23. How to make cool URIs
Use HTTP://
Use a namespace you control
Unique, stable and persistent
• Don’t use:
– Author name, subject, status, access, file name
extension, software mechanism
C://MyDisk/awesome/VdeBoer/latest/cgi_bin/rembrandt.html
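The "don't use" list can be turned into a rough checker. The sketch below is only a heuristic: the hint lists (`FILE_EXTENSIONS`, `MECHANISM_HINTS`, `STATUS_HINTS`) and the function name `uri_warnings` are made up for this example, it accepts `https` as well as `http` (an assumption), and it cannot detect author names or subjects in a path.

```python
from urllib.parse import urlparse

# Illustrative hint lists; all three are assumptions, not an official rule set.
FILE_EXTENSIONS = (".html", ".htm", ".php", ".asp", ".jsp", ".cgi")
MECHANISM_HINTS = ("cgi_bin", "cgi-bin", "servlet")
STATUS_HINTS = ("latest", "draft", "old", "new")


def uri_warnings(uri):
    """Return a list of reasons why a URI is probably not a 'cool' URI."""
    warnings = []
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        warnings.append("not an HTTP URI")
    path = parsed.path.lower()
    segments = [segment for segment in path.split("/") if segment]
    if path.endswith(FILE_EXTENSIONS):
        warnings.append("file name extension")
    if any(segment in MECHANISM_HINTS for segment in segments):
        warnings.append("software mechanism")
    if any(segment in STATUS_HINTS for segment in segments):
        warnings.append("status")
    return warnings


print(uri_warnings("C://MyDisk/awesome/VdeBoer/latest/cgi_bin/rembrandt.html"))
print(uri_warnings("http://purl.org/collections/nl/am/proxy-63432"))
```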
24. Amsterdam Museum URIs
• PURL basename: http://purl.org/collections/nl/am/
• Objects: Use “prirefs”, prefixed by “proxy-”
– http://purl.org/collections/nl/am/proxy-63432
• Concepts & Persons: Use “prirefs”, prefixed by “p-”, or “t-”
– http://purl.org/collections/nl/am/p-201
• Properties (schema): Use XML element name
– http://purl.org/collections/nl/am/acquisition.date
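The URI scheme above is simple enough to express as a few helper functions. This is a minimal sketch: the function names are invented here, and it assumes that "p-" is used for persons and "t-" for thesaurus concepts, which the slide suggests but does not state explicitly.

```python
BASE = "http://purl.org/collections/nl/am/"


def object_uri(priref):
    """URI for a collection object: priref prefixed by 'proxy-'."""
    return f"{BASE}proxy-{priref}"


def person_uri(priref):
    """URI for a person (assumed to take the 'p-' prefix)."""
    return f"{BASE}p-{priref}"


def concept_uri(priref):
    """URI for a thesaurus concept (assumed to take the 't-' prefix)."""
    return f"{BASE}t-{priref}"


def property_uri(element_name):
    """URI for a schema property: the XML element name appended to the base."""
    return BASE + element_name


print(object_uri("63432"))
print(person_uri("201"))
print(property_uri("acquisition.date"))
```

Because every URI is derived from the stable `priref` key or from an element name, re-running the conversion yields the same identifiers.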
25. Again, the rules of Linked Data
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/LinkedData.html
28. Architecture
[Diagram: clients (SPARQL app, browser) → Purl.org redirect → HTTP server (SPARQL, web interface) → logic and RDF(S) storage in Prolog]
http://semanticweb.cs.vu.nl/
29. How to access the data
• PURL 303 redirect to VU semantic layer
http://purl.org/collections/nl/am/proxy-63432
http://semanticweb.cs.vu.nl/europeana/browse/list_resource?r=http://purl.org/collections/nl/am/proxy-63432
• At our server: content negotiation
– HTTP request text/html:
• Local condensed view
• Local full view
– HTTP request application/rdf+xml
• rdf/xml “describe”
• SPARQL endpoint
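Content negotiation means the server picks a representation based on the request's Accept header. The sketch below shows the idea for the two media types on the slide. It is a simplified illustration, not the code running on the VU server: the names `parse_accept` and `choose_representation` are invented, and wildcard handling is reduced to `*/*`.

```python
SUPPORTED = ["text/html", "application/rdf+xml"]


def parse_accept(header):
    """Return [(media_type, q), ...] sorted by descending quality value."""
    items = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        items.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(items, key=lambda item: -item[1])


def choose_representation(accept_header, default="text/html"):
    """Pick the media type to serve for a given Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    return default


print(choose_representation("application/rdf+xml"))
print(choose_representation("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

A browser ends up with the HTML view, while an RDF-aware client asking for `application/rdf+xml` gets the RDF/XML description of the same resource.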
34. Again, the rules of Linked Data
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/LinkedData.html
35. Link to other sources
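[Graph diagram, reconstructed as statements]
am:proxy-19319 a am:Record
am:proxy-19319 am:priref “19319”
am:proxy-19319 am:date “1651”
am:proxy-19319 am:maker am:p-1234
am:p-1234 a am:Person
am:p-1234 am:priref “1234”
am:p-1234 am:birthdate “1606”
am:p-1234 rda:name “Rembrandt”
am:p-1234 owl:sameAs (?) Viaf:RembrandtvanRijn
Viaf:RembrandtvanRijn a Viaf:Person
Viaf:RembrandtvanRijn Viaf:nationality “Dutch”
Viaf:RembrandtvanRijn rdfs:label “Rembrandt Harmensz. Van Rijn”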
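To make the linking step concrete, the sketch below writes a few of the statements from this slide as N-Triples, including the `owl:sameAs` link to an external resource. The external URI is a placeholder under `example.org`, not a real VIAF identifier, and the helper names `iri`, `literal` and `ntriple` are invented for this example.

```python
AM = "http://purl.org/collections/nl/am/"
OWL = "http://www.w3.org/2002/07/owl#"
# Placeholder for the external resource; NOT a real VIAF identifier.
EXTERNAL_REMBRANDT = "http://example.org/viaf/RembrandtvanRijn"


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ntriple(subject, predicate, obj):
    """Format one statement as an N-Triples line."""
    return f"{subject} {predicate} {obj} ."


record = iri(AM + "proxy-19319")
person = iri(AM + "p-1234")

lines = [
    ntriple(record, iri(AM + "priref"), literal("19319")),
    ntriple(record, iri(AM + "maker"), person),
    ntriple(person, iri(AM + "birthdate"), literal("1606")),
    ntriple(person, iri(OWL + "sameAs"), iri(EXTERNAL_REMBRANDT)),
]
print("\n".join(lines))
```

The last line is the fifth-star step: a single triple that connects the local person to someone else's description of the same person. The "(?)" on the slide is a reminder that `owl:sameAs` is a strong claim and may not always be the right property.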
36. Amalgame alignment platform
• Semi-automatic matching
– Simple automatic techniques,
– chained together by hand
• 3500+ links put in RDF
– 143 places linked to GeoNames
– 1076 persons linked to ULAN (VIAF)
– 34 persons linked to DBpedia
– 2498 concepts linked to AATNed
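The idea of "simple techniques, chained together" can be sketched as follows: each matcher only sees the items that the previous matchers left unmatched. This is not Amalgame's actual implementation. The two matchers, the `chain` function, the `ex:` target identifiers, the "Unknown maker" entry, and the `am:p-6` and `am:p-9` identifiers are all hypothetical and chosen for illustration.

```python
def normalize(label):
    """Lowercase, drop commas and collapse whitespace."""
    return " ".join(label.lower().replace(",", " ").split())


def exact_match(sources, targets):
    """Link items whose normalized labels are identical."""
    index = {normalize(label): uri for uri, label in targets.items()}
    return {uri: index[normalize(label)]
            for uri, label in sources.items() if normalize(label) in index}


def inverted_name_match(sources, targets):
    """Link 'Surname, First' labels to 'First Surname' labels."""
    index = {normalize(label): uri for uri, label in targets.items()}
    links = {}
    for uri, label in sources.items():
        if "," in label:
            surname, _, rest = label.partition(",")
            candidate = normalize(f"{rest} {surname}")
            if candidate in index:
                links[uri] = index[candidate]
    return links


def chain(sources, targets, matchers):
    """Run matchers in order; each one only sees what is still unmatched."""
    links, remaining = {}, dict(sources)
    for matcher in matchers:
        found = matcher(remaining, targets)
        links.update(found)
        remaining = {u: l for u, l in remaining.items() if u not in found}
    return links, remaining


# Hypothetical input: identifiers and target labels are made up.
sources = {"am:p-6": "Aa, Pieter van der", "am:p-1234": "Rembrandt",
           "am:p-9": "Unknown maker"}
targets = {"ex:rembrandt": "Rembrandt", "ex:aa": "Pieter van der Aa"}

links, remaining = chain(sources, targets, [exact_match, inverted_name_match])
print(links)
print(remaining)
```

Whatever remains unmatched after the chain is where the manual part of "semi-automatic" comes in: a person inspects the leftovers and decides which technique to try next.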
38. Four rules and Five stars
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
43. Some issues with L(O)D
• Extra burden on the data provider
• Nerd-only (aka “SPARQL is hard”)
• How do we build user-friendly systems?
– Ranking, user-friendly information presentation
• Scalability (how do you query a huge graph?)
• Licenses
• Is Open always a good idea?
– Context?
46. What kind of RDF?
• Europeana Data Model (EDM)
– Keep original metadata intact
– Use Semantic Web (LD) principles: RDF
• Re-use of standard models
– Dublin Core for metadata representation
• creator, date, title etc.
– SKOS for vocabularies
• prefLabel, broader, etc.
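As a small illustration of re-using SKOS, the sketch below maps thesaurus records to `skos:prefLabel` and `skos:broader` statements. The first record comes from the concept thesaurus example earlier in the talk. The second record's priref ("99999") is hypothetical, as are the function name `to_skos` and the assumption that `broader_term` can be resolved by looking up the term label.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
AM = "http://purl.org/collections/nl/am/"

records = [
    {"priref": "28024", "term": "Kalverstraat 124",
     "broader_term": "Kalverstraat"},
    {"priref": "99999", "term": "Kalverstraat"},  # hypothetical priref
]


def to_skos(records):
    """Return (subject, predicate, object) tuples for a list of records."""
    # The XML refers to broader terms by label, so index URIs by label first.
    by_term = {r["term"]: AM + "t-" + r["priref"] for r in records}
    triples = []
    for r in records:
        subject = by_term[r["term"]]
        triples.append((subject, SKOS + "prefLabel", r["term"]))
        broader = r.get("broader_term")
        if broader in by_term:  # skip broader terms we cannot resolve
            triples.append((subject, SKOS + "broader", by_term[broader]))
    return triples


triples = to_skos(records)
for triple in triples:
    print(triple)
```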