A presentation by Muriel Mewissen, Project Manager of the Shakespeare Registry Project.
Delivered at the Cataloguing and Indexing Group Scotland (CIGS) Linked Open Data (LOD) Conference, which took place on Friday 21 September 2012 at the Edinburgh Centre for Carbon Innovation.
1. Will’s World:
Walking Through Shakespeare
The use of Linked Data in the
Shakespeare Registry Project
Muriel Mewissen – Project Manager
21 September 2012 http://willsworld.blogs.edina.ac.uk 1
2. Outline
• Shakespeare Registry background
• British Museum SPARQL endpoint
• Conclusion
3. Background
• JISC Discovery Programme: 10-month projects
– Dec 2011 to Sep 2012
• Aim: to improve discoverability and usability of
online data through better access to better metadata
• Demonstrate the benefits and principles of
assembling metadata: ‘aggregation as a tactic’
• Focus on Shakespeare
– Lots of data
– Cultural Olympiad & Anniversary 23rd April
• Glasgow culture hack event
5. Linked Data Fit
Questions
• Users/Developers: Who? What?
• Registry (self-sustainable, transferable): How? Attractive? Format? Schema? License?
• Content (rich, complex): How much? Sharing? Easy access? Format?
Wikipedia on Linked Data:
“linked data describes a method of publishing structured data so that it can be
interlinked and become more useful. It builds upon standard Web technologies
such as HTTP and URIs, but rather than using them to serve web pages for human
readers, it extends them to share information in a way that can be read
automatically by computers. This enables data from different sources to be
connected and queried… using standard formats such as RDF/XML…”
Answer?
6. Linked Data Provision
• Over 40 sources of online resources
– Royal Shakespeare Company, Shakespeare Birthplace Trust, Shakespeare’s
Globe, Shakespeare Institute, Folger Shakespeare Library, Open
Shakespeare, World Shakespeare Festival, Open Source Shakespeare, …
– British Museum, British Library, Bodleian Library, British Universities Film &
Video Council, National Library of Scotland, Wellcome Images, British Library
of Sounds, JISC MediaHUB, BBC, …
– National Theatre Poster, Bosak’s Play of Shakespeare in XML, The work of the
Bard, Internet Shakespeare Editions, PlayShakespeare.com, Seanco Technology
Shakespeare Quote Generator, ...
• Many images, some XML, one SPARQL endpoint!
British Museum: http://collection.britishmuseum.org/Sparql
7. SPARQL Endpoint
• Service endpoint
• Web interface
• Run SPARQL queries
• Linked Data
• Structured RDF stores
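As a rough illustration of what a service endpoint offers a developer, the sketch below shows how a client might send a query over HTTP using the standard SPARQL Protocol (a `query` parameter on a GET request) and flatten the standard SPARQL JSON results format into plain rows. The endpoint URL is the one quoted in this presentation; the function names are hypothetical, and it is an assumption that the beta service returns JSON when asked for it.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as quoted on the "Linked Data Provision" slide.
ENDPOINT = "http://collection.britishmuseum.org/Sparql"


def build_request(query, endpoint=ENDPOINT):
    """Return a urllib Request that sends a SPARQL query via HTTP GET."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Assumption: the service honours this standard media type for SELECT results.
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def bindings_to_rows(results):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the rows (requires network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bindings_to_rows(json.load(response))
```

Keeping `build_request` and `bindings_to_rows` separate from `run_query` means the request and the parsing can be checked without touching the live service.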
8. Using the British Museum SPARQL
Easy to start:
• Sample query: document ontologies
• Help: data structure, access & URIs
• Documentation: Controlled terms, object names thesaurus
Search for “Shakespeare”, “William Shakespeare”
• Difficult to do keyword search
• Difficult to do multi-stage search
– Find the unique ID for an entity
– Retrieve information related to the entity
• Limited or no results
• Overload the service
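To make the keyword-search difficulty concrete, here is a minimal sketch of the kind of query a newcomer might try first: a case-insensitive `regex` filter over labels. It assumes, without confirmation from the British Museum documentation, that names are exposed through `rdfs:label`; the function names are hypothetical. A filter like this usually forces the store to test every label, which is one plausible reason such searches return little or overload the service.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def keyword_query(keyword, limit=20):
    """Build a label search query for the given keyword.

    Assumption: entities carry an rdfs:label. The keyword is used as a
    regular expression, so regex metacharacters in it are not neutralised.
    """
    pattern = escape_literal(keyword)
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        "SELECT DISTINCT ?entity ?label WHERE {\n"
        "  ?entity rdfs:label ?label .\n"
        f'  FILTER regex(str(?label), "{pattern}", "i")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(keyword_query("William Shakespeare"))
```

The explicit `LIMIT` is there to keep the result set small while exploring; it does not make the filter itself any cheaper.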
9. SPARQL Common Issues
Common issues:
• Lack of documentation (ontologies, identifiers)
• Lack of example queries
• Lack of identifiers
• Slow, timeouts & result size limit
• Inefficient queries (text & keyword search)
10. SPARQL endpoint
SPARQL is not
• Relational DB (search on given value for field)
– A query that is simple in SQL can be complex in SPARQL
• Text DB like Solr (flexible text search)
– Not suited for discovery
SPARQL provides links & context
Think about Linked Data in the right way
11. Asking the Right Questions
• Structured data needs structured queries
• To build meaningful queries, we need to know:
– Data, structure, schema, identifiers
• Internally specified
How do we identify “William Shakespeare” and
related objects before we can retrieve the
relevant Linked Data?
• Need identifier for “William Shakespeare”
• URI or ID in the British Museum schema
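Once an identifier is known, the structured query becomes straightforward. The sketch below builds a query that retrieves every statement in which a given URI appears, either as subject or as object, which is one simple way to reach "related objects" without knowing the schema in advance. The function name is hypothetical, and the URI passed in must come from the data provider; none is assumed here.

```python
# Characters that may not appear inside <...> in a SPARQL IRI reference.
_FORBIDDEN = set('<>"{}|^`\\ ')


def related_query(entity_uri, limit=200):
    """Build a query for statements that mention entity_uri in either direction."""
    if _FORBIDDEN & set(entity_uri):
        raise ValueError(f"not usable as an IRI: {entity_uri!r}")
    return (
        "SELECT ?subject ?property ?object WHERE {\n"
        # Outgoing links: what the entity says about itself.
        f"  {{ <{entity_uri}> ?property ?object . }}\n"
        "  UNION\n"
        # Incoming links: other resources that point at the entity.
        f"  {{ ?subject ?property <{entity_uri}> . }}\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )
```

Because both triple patterns are anchored on a concrete URI, a query of this shape is typically far cheaper for a triple store than a text filter over all labels.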
12. Workflow for extracting metadata
1. Collection Database Search GUI
17. Sustainable Workflow?
Workaround
• Multiple GUI searches on
Shakespeare, William
Shakespeare, Macbeth, Hamlet,….
• Manual steps
• Many small queries, few large queries
Feedback on blog post
Person ID for “Shakespeare, William”
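The "many small queries" part of the workaround can be sketched as paging with `LIMIT` and `OFFSET`, so that each request stays under the service's timeout and result size limit. This is an illustrative sketch, not the project's actual code: `run` stands for any hypothetical callable that sends a query and returns a list of rows, and the base query is assumed to contain an `ORDER BY` so that pages do not overlap.

```python
def paged_queries(base_query, page_size=100, max_pages=10):
    """Yield copies of base_query with LIMIT/OFFSET appended, one per page."""
    for page in range(max_pages):
        yield f"{base_query}\nLIMIT {page_size} OFFSET {page * page_size}"


def fetch_all(base_query, run, page_size=100, max_pages=10):
    """Collect rows page by page, stopping at the first short page."""
    rows = []
    for query in paged_queries(base_query, page_size, max_pages):
        page = run(query)
        rows.extend(page)
        if len(page) < page_size:
            # A short page means the result set is exhausted.
            break
    return rows
```

The `max_pages` cap is a deliberate safety limit so that a mistake in the query cannot generate an unbounded number of requests against a shared service.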
20. Conclusions
• SPARQL best suited to link data from different
informational silos, not suited to text search and
discovery
• Common identifiers are essential (e.g. ISSN)
– Use of standards (ISNI), common language &
ontologies
• Documentation & example queries
• Be prepared
– To use additional data sources to identify URIs
– To run many queries
21. Thanks
• British Museum
– SPARQL endpoint is a beta version, released to generate feedback
– New version available within a few months
• Owen Stephens
• EDINA
Peter Burnhill, Jackie Clark, Catherine
Fleming, Andrew Dorward, Neil Mayo, Nicola
Osborne, Christine Rees, Tim Stickland