"Adventures in Linked Data Land: bringing RDF to the Wordsworth Trust" is a paper given by Richard Light (http://uk.linkedin.com/pub/richard-light/a/221/ba5) at a Linked Data meeting run by the Collections Trust in February 2010. He runs through the basics of Linked Data (LD), how it relates to cultural heritage, and some of his experiments with it, specifically using the data of the Wordsworth Trust. He closes by listing a series of challenges that museums face in trying to get on board the Linked Data bus.
Adventures in Linked Data Land (presentation by Richard Light)
1. Adventures in Linked Data Land: bringing RDF to the Wordsworth Trust
Richard Light
CT Linked Data Meeting, 22 February 2010
2. Discovering Linked Data
Four principles of Linked Data (Tim B-L):
● Use URIs to identify resources
● Use HTTP URIs so that people can look them up
● Provide useful information about the resource
● Include links to other URIs in your data
3. Discovering dbPedia
● Extraction of Linked Data from Wikipedia
● Statements in info boxes (mainly) become RDF
triples:
<rdf:Description rdf:about="http://dbpedia.org/resource/Berlin_Marathon">
  <dbpprop:location rdf:resource="http://dbpedia.org/resource/Berlin"/>
</rdf:Description>
Note the URLs
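To make the "subject, property, object" shape of this fragment concrete, here is a minimal Python sketch that reads it with the standard library and prints the triple. The enclosing rdf:RDF element and the dbpprop namespace URI (taken to be http://dbpedia.org/property/) are assumptions added so that the fragment parses; the slide does not show them.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's fragment, wrapped in an rdf:RDF element with namespace
# declarations. The wrapper and the dbpprop namespace are assumptions.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dbpprop="http://dbpedia.org/property/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Berlin_Marathon">
    <dbpprop:location rdf:resource="http://dbpedia.org/resource/Berlin"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Yield (subject, predicate, object) for each simple property found."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; joining the two
            # parts gives the full property URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            # The object is either another resource (a URI) or literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            yield (subject, predicate, obj)


for triple in triples(DOC):
    print(triple)
```

All three parts of the triple come out as HTTP URIs, which is the point of "Note the URLs".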
4. Browsing Linked Data
● View RDF as a web page:
http://dbpedia.org/page/Berlin
● Navigate from one data source to another
● Specialist Linked Data browsers/plugins:
– DISCO
– Marbles
– Openlink Data Explorer
– Tabulator
11. So what do we have here?
● An initiative to generate lots of Linked Data
● A Linked Data Cloud, containing a growing
number of RDF datasets
● A hard-to-use query language capable of very
precise and powerful querying
Where do museums come into this picture?
12. The Wordsworth Trust
● Typical museum collection: about 60,000 objects
● Major collection of manuscripts (notebooks,
letters, etc.)
● Objects published to the Web from a ModesXML
database
● Unwise enough to allow me Remote Desktop
access ...
16. One identifier; three “views”
● This object has a single persistent identifier:
http://collections.wordsworth.org.uk/object/GRMDC.C104.2
● This maps to different views depending on the
“Accept” header in the HTTP request:
– application/rdf+xml >> RDF
– application/xtm+xml >> XTM Topic Map
– Otherwise >> HTML (human-readable)
● Achieved through a custom 404 “page not found”
handler
17. “Page not found” handler (1)
● All URLs are fictitious, so they generate a 404
● Modified a generic smart 404 handler from:
http://evolvedcode.net/content/code_smart404/
● Added support for “303 See other” redirects
● Added wildcard matching to re-format URLs
18. “Page not found” handler (2)
● Generic URL, plus requested Accept format,
determine initial “303 See other” mapping, e.g.:
http://collections.wordsworth.org.uk/object/GRMDC.C104.2
+
Accept: application/rdf+xml
=
http://collections.wordsworth.org.uk/object/rdf/GRMDC.C104.2
● When this is passed back in, the 404 handler has to
generate the required RDF directly
● Can't just keep redirecting requests!
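The mapping on this slide can be sketched in Python as follows. This is an illustration only, not the handler's actual code: the "rdf" URL segment comes from the example above, while the "xtm" and "html" segments and the function name see_other are assumptions. The sketch also ignores q-values in the Accept header.

```python
BASE = "http://collections.wordsworth.org.uk/object/"

# Media type -> URL segment. "rdf" is taken from the slide's example;
# "xtm" (and "html" below) are assumed by analogy.
SEGMENTS = {
    "application/rdf+xml": "rdf",
    "application/xtm+xml": "xtm",
}


def see_other(url, accept):
    """Return (status, location) for a first-pass request on a generic URL."""
    if not url.startswith(BASE):
        raise ValueError("not a generic object URL: " + url)
    object_id = url[len(BASE):]
    # Split "type/subtype;q=0.9, ..." into bare media types (q-values ignored).
    accepted = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    for media_type in accepted:
        if media_type in SEGMENTS:
            return 303, f"{BASE}{SEGMENTS[media_type]}/{object_id}"
    # Anything else falls through to the human-readable view.
    return 303, f"{BASE}html/{object_id}"


print(see_other(BASE + "GRMDC.C104.2", "application/rdf+xml"))
```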
20. “Page not found” handler (4)
● Generic URL plus a supported Accept type
generates a “303 See other” redirect
● If it comes back as a page request, it is further
redirected with a “301 Moved permanently” to the
object's web page
● If it comes back as an RDF or XTM request, the
record is fetched as XML and subjected to an
XSLT transform by the handler
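The second pass described here might be sketched as below. The path layout /object/<view>/<id>, the "html" and "xtm" view names, and the page URL template are all hypothetical; the slides show only the /object/rdf/ form. The XSLT step is represented by a description string rather than performed, since the transform itself is not shown in the slides.

```python
# Hypothetical location of the human-readable page; the slides do not give it.
PAGE_URL = "http://example.org/wordsworth/page/{object_id}"


def second_pass(path):
    """Decide what to do with a URL that has already been through the 303 step."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "object":
        return (404, None)
    _, view, object_id = parts
    if view == "html":
        # Page requests are sent on to the object's ordinary web page.
        return (301, PAGE_URL.format(object_id=object_id))
    if view in ("rdf", "xtm"):
        # Here the real handler fetches the record as XML and applies an XSLT
        # transform; this sketch only records which transform is wanted.
        return (200, f"transform record {object_id} to {view}")
    return (404, None)


print(second_pass("/object/rdf/GRMDC.C104.2"))
print(second_pass("/object/html/GRMDC.C104.2"))
```

Returning 200 with generated content on the second pass, rather than another redirect, is what stops the redirect chain from looping.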
23. Implementation details
● HTML needed a “back link” to RDF to keep
OpenLink Explorer happy:
<link rel="alternate" type="application/rdf+xml"
href="http://collections.wordsworth.org.uk/object/data/GRMDC.C104.2" title="RDF" />
● Result is totally unfindable: need a search or
harvesting mechanism:
– OAI support (possible)
– SPARQL end-point (harder)
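As a rough illustration of how a client might use the "back link" on this slide, the Python sketch below scans an HTML page for link elements that advertise an RDF alternate. The surrounding page markup and the class name AlternateFinder are invented for the example; only the link element itself comes from the slide.

```python
from html.parser import HTMLParser


class AlternateFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "link"
                and attributes.get("rel") == "alternate"
                and attributes.get("type") == "application/rdf+xml"):
            self.rdf_links.append(attributes.get("href"))


# Hypothetical page; only the <link> element is taken from the slide.
PAGE = """<html><head><title>Object record</title>
<link rel="alternate" type="application/rdf+xml"
href="http://collections.wordsworth.org.uk/object/data/GRMDC.C104.2" title="RDF" />
</head><body></body></html>"""

finder = AlternateFinder()
finder.feed(PAGE)
print(finder.rdf_links)
```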
24. What has been learnt? (1)
● The Linked Data paradigm encourages simple
RDF triples: no “blank nodes”
● For an object, this becomes a simple metadata set,
very analogous to the PNDS DCAP format
● The properties involved need to encapsulate the
whole relation between object and data, e.g.
<p:title>Ulswater from Pooley Bridge</p:title>
<p:technique>drawn</p:technique>
<p:maker>Farington, Joseph (1747-1821)</p:maker>
<p:technique>engraved</p:technique>
<p:maker>Middiman, Samuel (1750-1831)</p:maker>
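Written out as flat triples, the example above looks something like the following Python sketch. The subject URI is hypothetical, and the p: prefix is assumed to stand for http://dbpedia.org/property/; the "technique" property URI in particular is an assumption formed by analogy with title and maker.

```python
from collections import defaultdict

P = "http://dbpedia.org/property/"      # assumed expansion of the p: prefix
SUBJECT = "http://example.org/object/1"  # hypothetical object URI

# Every statement hangs directly off the object; there are no blank nodes.
TRIPLES = [
    (SUBJECT, P + "title", "Ulswater from Pooley Bridge"),
    (SUBJECT, P + "technique", "drawn"),
    (SUBJECT, P + "maker", "Farington, Joseph (1747-1821)"),
    (SUBJECT, P + "technique", "engraved"),
    (SUBJECT, P + "maker", "Middiman, Samuel (1750-1831)"),
]


def by_predicate(triples):
    """Group object values under each property URI."""
    grouped = defaultdict(list)
    for _subject, predicate, obj in triples:
        grouped[predicate].append(obj)
    return dict(grouped)


for predicate, values in by_predicate(TRIPLES).items():
    print(predicate, values)
```

Grouped this way, the object has two makers and two techniques, and nothing in the triples themselves records which maker used which technique. That is one reading of why each property has to carry the whole relation between object and data.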
25. What has been learnt? (2)
● Data in linked resources can “add value” to your
own recording efforts (e.g. place data)
26. Properties: which framework?
● I have used dbPedia properties (for compatibility
with other Linked Data resources … ?):
http://dbpedia.org/property/title
http://dbpedia.org/property/maker
● A viable alternative would be PNDS DCAP:
http://purl.org/dc/elements/1.1/title
http://purl.org/dc/elements/1.1/creator
● One framework which doesn't fit is the CIDOC
CRM:
E18 Physical Thing – E12 Production – E39 Actor = “creator”
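Because the first two frameworks on this slide are both flat property sets, moving between them is a matter of swapping property URIs. The Python sketch below shows that under the assumption that title and maker correspond to the Dublin Core title and creator elements listed above; the subject URI and the function name to_dc are hypothetical.

```python
DBP = "http://dbpedia.org/property/"
DC = "http://purl.org/dc/elements/1.1/"

# Correspondence assumed from the property pairs listed on the slide.
DBPEDIA_TO_DC = {
    DBP + "title": DC + "title",
    DBP + "maker": DC + "creator",
}


def to_dc(triples):
    """Re-express triples with Dublin Core properties where a mapping exists."""
    return [(s, DBPEDIA_TO_DC.get(p, p), o) for s, p, o in triples]


SUBJECT = "http://example.org/object/1"  # hypothetical object URI
original = [
    (SUBJECT, DBP + "title", "Ulswater from Pooley Bridge"),
    (SUBJECT, DBP + "maker", "Farington, Joseph (1747-1821)"),
]
for triple in to_dc(original):
    print(triple)
```

No such one-to-one swap works for the CIDOC CRM, where "creator" is a path through a Production event rather than a single property.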
27. Do we need “museum” properties?
● dbPedia properties are not coherent
● Need something richer than simple metadata
● Could use CIDOC CRM as basis
● Existing interchange formats such as LIDO could
be re-expressed in RDF
● Could broaden scope: “history” property set?
28. The problem of URIs
● Good Linked Data requires URIs everywhere
● Most of my museum RDF resolves to strings
● One exception is Geonames lookup:
Ullswater
becomes
http://www.geonames.org/2635191/
● In the absence of a central “people” registry,
I should be minting URIs myself for people, etc.
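The Geonames lookup can be pictured as a table from place-name strings to URIs, with a fallback to the plain string when no match is known. The Python sketch below uses a local dictionary holding only the slide's one example; it does not call the Geonames service, and the function name resolve_place is an assumption.

```python
# Local lookup table; the single entry is the example from the slide.
GEONAMES = {
    "Ullswater": "http://www.geonames.org/2635191/",
}


def resolve_place(name):
    """Return ("uri", URI) when the place is known, else ("literal", name)."""
    uri = GEONAMES.get(name.strip())
    if uri:
        return ("uri", uri)
    # No URI available: the value stays a string, as most of the data does.
    return ("literal", name)


print(resolve_place("Ullswater"))
print(resolve_place("Grasmere"))
```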
29. Conclusions
● Implementing an RDF Linked Data front-end to a
museum database is feasible if:
– You can generate multiple outputs from your database
(XML is sufficient)
– You can implement a suitable URL rewriter or 404
handler
● It's easy (and a good idea) to mint and publish
URIs for your collection objects
● It's less clear where all the other URIs we'll need
will come from
30. Challenges for museum linked data
● Agreeing an ontology to enable cross-collection
[SPARQL] queries
● Shared URLs for in-common concepts: people,
places, events
● Mechanisms for getting URLs into museum data
● Getting existing authorities, e.g. AAT, to be
available as Linked Data