Open (linked) bibliographic data – Edmund Chamberlain (University of Cambridge)
1. Open (linked) Bibliographic Data – a perspective from Cambridge University Library. Ed Chamberlain, Systems Development Librarian
10. RDF vocab and mappings – no standard
dc:title – 245 $abh
dc:alternative – 130; 210; 240; 246 $abh; 247; 730; 740
dc:contributor – 100; 110; 700; 710; 730
bibo:lccn – 010 $a
dc:coverage – 650; 651; 662
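The slide's MARC-to-RDF crosswalk can be sketched as a simple lookup. This is an illustrative reduction, not the project's actual mapping code: the `MARC_TO_DC` table is distilled from the slide, and `marc_to_triples` and its record format are assumptions.

```python
# Hypothetical mapping table distilled from the slide above.
# Real mappings would also handle subfield codes (e.g. $abh).
MARC_TO_DC = {
    "245": "dc:title",
    "130": "dc:alternative", "210": "dc:alternative",
    "240": "dc:alternative", "246": "dc:alternative",
    "100": "dc:contributor", "110": "dc:contributor",
    "700": "dc:contributor", "710": "dc:contributor",
    "010": "bibo:lccn",
    "650": "dc:coverage", "651": "dc:coverage",
}

def marc_to_triples(record_uri, fields):
    """Turn (MARC tag, value) pairs into simple
    subject-predicate-object triples, skipping any tag
    the mapping does not cover."""
    triples = []
    for tag, value in fields:
        predicate = MARC_TO_DC.get(tag)
        if predicate:
            triples.append((record_uri, predicate, value))
    return triples
```

With no agreed standard, the choice of target vocabulary (Dublin Core terms plus bibo here) is exactly the kind of decision the slide flags as unsettled.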
16. Beyond bibliography … bibliographic data connects to: holdings; FAST subject headings; libraries; VIAF name authority; transactions; special collections; archives; creator / entity; place of publication; LCSH subject headings; course lists; language; librarians
Speaker notes
Intro – me: Systems Librarian, focus on the front end. When it comes to resource discovery, for several years we’ve followed an unofficial policy of trying to meet the reader in their domain, rather than forcing them to come to us. We started off with browser toolbars, plugins for our catalogue and ejournal links in Google Scholar, developed in-house widgets for Facebook, the VLE and iGoogle, culminating in public access to some APIs for our library systems through the JISC widgets projects. Open access to full datasets is arguably the next step. We are already exposing data to OCLC, COPAC, SUNCAT etc. That is quite complex, and usually starts with lots of legal negotiation around usage – but why keep it so restrictive, limited to a club? There is no harm in looking for a better way. We know that open data is good value for the taxpayer or research funder. Academia has already paid for its creation, after all, so why limit its re-use? And we are being asked for it! Before COMET, we gave a sample dataset to the JISC Open Bibliography project.
Before that … Rufus Pollock – Cambridge researcher / OKFN. Now, Rufus is not a typical academic at all; he knows a lot about data and wants to open as much of it up as possible. He was persistent in asking, and initially got the data under very limited criteria. But a few months later there were direct benefits for us in his work: analysis of the size of the public domain using our bibliographic data, and copyright calculation services based on bib data. This data and output is really useful in digitisation planning. He spotted things we had not thought of. Why should the next Rufus even have to ask? Let’s remove that barrier to innovation.
Following on from Open Bibliography, we are releasing a large amount of our bibliographic records into the public domain. Forgive me if this sounds fuzzy right now; it’s because we are still pinning down the specifics of what we can put out. Formats include MARC21 and linked RDF in a triplestore – we are looking at getting substantially more out as RDF than we can as MARC21. We will road-test linking to OCLC resources (FAST / VIAF) and provide supporting documentation on the whole process.
We’ve moved beyond card indexes and their online clones, the OPACs, to resource discovery platforms and web-scale indexes. Good stuff. But let’s look beyond libraries and cultural heritage into retail, where the model of greater exposure outside the vendor website has massively contributed to success. Amazon, eBay, iTunes and other ‘long tail’ retailers rely on ‘out-of-domain exposure’ to increase web traffic. By that I mean that stepping outside their websites’ silos – via search engine adverts, affiliate marketing schemes and retail aggregation sites – has helped in their success. Lorcan Dempsey has argued that however long your tail may be, unless it is effectively exposed, it won’t necessarily be found by its ‘niche’ audience. We can’t expect users to know to come to us. Why should we not be doing the same? Discovery outside of the library domain should be an ambition – open data makes this feasible.
We still need to develop our interfaces, but we can no longer tell people to search one place with absolute authority. Cambridge offers several catalogue and database interfaces, including at least one one-stop shop. I think now there is no such thing, although there may be several first-stop shops for different audiences. To put this into perspective, a national research library catalogue of everything is great, but a national catalogue of literature about fieldmice is also great for one audience. And we all know our local catalogues require better integration with other in-house or consortial systems. So perhaps one ambition should be: multiple points of discovery at multiple levels for multiple audiences, built on a shared community platform of open data.
A final ambition. We cannot meet and cater for different needs alone. Providing open data allows others to potentially meet the use cases we cannot (and have not even thought of) for resource discovery and more. Opening data allows others to innovate on our behalf. An ambition should be to provide services for developers as well as our regular users. It does not matter if that is a 1st-year undergrad, a software supplier, MIMAS or another university. We still have control within our systems and catalogues, but beyond that, our data gets a new life of its own. With open data usable in formats familiar to developers, services could be prototyped and developed rapidly and cheaply – qualities not always associated with library IT.
It all sounds promising, but what’s it really like …
The preferred ideal … full unrestricted access, a Creative Commons Zero / Public Domain data license, and a complete dump of the data. There are other types of open: we are examining all license options and working with OCLC. But any published data is better than none at all – there is no good reason not to publish. It will need some sort of license; the more open the better.
RDF vocab and mappings – there are no standards for bib data set in stone. We are coming up with our own mappings, which falls outside the comfort zone of standards-loving librarians. We are speaking to others – looking at BL, OU and OKFN output – and coming up with a URI structure, following RDTF and open government guidelines.
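The open-government guidelines mentioned above favour stable, predictable identifier URIs of the form `http://{domain}/id/{concept}/{identifier}`. A minimal sketch of minting such URIs, where the domain and path segments are placeholders rather than the project's actual scheme:

```python
# Placeholder base domain - not the real Cambridge namespace.
BASE = "http://data.lib.example.ac.uk"

def mint_uri(concept, identifier):
    """Mint a stable identifier URI following the
    /id/{concept}/{identifier} pattern recommended by
    open-government URI guidance."""
    return f"{BASE}/id/{concept}/{identifier}"
```

For example, `mint_uri("record", "b1234")` yields a predictable URI that can be cited, linked to and dereferenced long after the system behind it changes.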
RDF prefers its own type of database – a triplestore – which is still a maturing field. It needs some form of web application on top to resolve URIs in linked data. We examined a few options and are keeping it sweet and simple.
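The "web application on top" boils down to dereferencing: when a client requests an identifier URI, redirect it to a document in the format it asked for (the familiar 303 pattern from the W3C's "Cool URIs" guidance). A minimal sketch – the `/id/` and `/doc/` paths and file extensions are assumptions, not the real application:

```python
def choose_representation(accept_header):
    """Pick a document suffix from the HTTP Accept header."""
    if "application/rdf+xml" in accept_header:
        return ".rdf"
    if "text/turtle" in accept_header:
        return ".ttl"
    return ".html"  # sensible default for browsers

def resolve(id_path, accept_header):
    """Compute the 303-redirect target for an identifier URI,
    e.g. /id/record/1 -> /doc/record/1.ttl for a Turtle client."""
    doc_path = id_path.replace("/id/", "/doc/", 1)
    return doc_path + choose_representation(accept_header)
```

Keeping the resolver this thin is one way of "keeping it sweet and simple": the triplestore holds the data, and the web layer only negotiates formats and redirects.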
Understanding the capabilities of linked data; the limitations of MARC-encoded data; use of FAST and VIAF (next-gen authority control!). More down the line … the benefits may not be immediately apparent.
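Linking to VIAF in practice can be as small as asserting that a local name heading and a VIAF identity are the same thing. A sketch in N-Triples syntax, where the local URI and VIAF number are made-up examples:

```python
def same_as(local_uri, viaf_id):
    """Emit an owl:sameAs triple (N-Triples syntax) linking a
    local authority URI to the corresponding VIAF identity."""
    viaf_uri = f"http://viaf.org/viaf/{viaf_id}"
    return (f"<{local_uri}> "
            f"<http://www.w3.org/2002/07/owl#sameAs> "
            f"<{viaf_uri}> .")
```

One such triple per matched heading is enough for any consumer of the data to pull in everything VIAF knows about that person – which is where the "not immediately apparent" benefits start to accrue.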
A strong platform for future development, though it may take several ‘cycles’. Linked data is a growth area in government and HE, so it makes sense to be in the same sphere. There is huge scope for back-office benefits – it could totally reshape the catalogue workflow, and sharing in consortia becomes a doddle.
Encourage others – provide useful documentation and code. Advise on licensing – to be useful, data needs a license. Expose more data, from different sources. And do something cool with it!