1. The document discusses the evolving nature of library catalogues as data is increasingly shared and consumed outside of traditional library systems. It explores how catalog data is being transformed, merged with other data sources, and used in new ways.
2. Key points addressed include the release of library catalogue data using open standards like RDF and Linked Data, as well as initiatives to make metadata more accessible to developers and the public. Challenges around aging data formats and the need for more community involvement in metadata standards are also covered.
3. The future may include greater programmatic access to catalog data through APIs, as well as new lightweight metadata schemas that better support open data practices and the needs of non-library users.
Developments in catalogues and data sharing
21. Our local catalogues; National / international aggregations; Joe Public; Teenage software developer / hacker; Booksellers; Web start-ups; Search engines; Wikipedia; Other libraries; Research group website
29. <h1 itemprop="name">The Cambridge companion to Spenser edited by Andrew Hadfield. [electronic resource] /</h1> <span style="display: none;" itemprop="publisher">Cambridge University Press,</span> <span style="display: none;" itemprop="datePublished">2001.</span>
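The markup on this slide exposes catalogue fields to crawlers through `itemprop` attributes. As a rough illustration of what a consumer on the other side might do, here is a minimal sketch using only Python's standard library. The class name `ItempropCollector`, the helper `extract_itemprops` and the shortened `SNIPPET` are hypothetical names for this example, and it only handles flat, non-nested elements; it is not how any particular search engine works.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect the text of every element that carries an itemprop attribute.

    Hypothetical helper for illustration; nested itemprop elements are not handled.
    """

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop name of the element we are inside, if any

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name:
            self._current = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._current:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        # Flat markup only: any closing tag ends the current property.
        self._current = None


# Shortened version of the slide's markup (title text abbreviated for the example).
SNIPPET = (
    '<h1 itemprop="name">The Cambridge companion to Spenser</h1>'
    '<span style="display: none;" itemprop="publisher">Cambridge University Press,</span>'
    '<span style="display: none;" itemprop="datePublished">2001.</span>'
)


def extract_itemprops(html):
    """Return a dict mapping itemprop name to stripped text content."""
    parser = ItempropCollector()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.props.items()}


if __name__ == "__main__":
    for name, text in extract_itemprops(SNIPPET).items():
        print(name, "=", text)
```

Note that the extracted values still carry the cataloguing punctuation (the trailing comma and full stop), which the consumer has to clean up.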
34. Marc21 … 001 1000346 245$aEarly medieval history of Kashmir : $b[with special reference to the Loharas] A.D. 1003-1171 /
DC XML … <dc:identifier>1000346</dc:identifier> <dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title>
RDF triples … <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171"
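The three forms on this slide are the same record at different stages. A minimal sketch of the last step, going from Marc-like fields to a single N-Triples line, might look like the following. It assumes the record has already been decoded into a plain dictionary; the names `record`, `title_from_245`, `escape_literal` and `title_triple` are hypothetical, and the URI pattern is simply copied from the slide rather than taken from any documented scheme.

```python
# URI pattern and predicate copied from the slide; everything else is illustrative.
BASE = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_"
DC_TITLE = "http://purl.org/dc/terms/title"

# Hypothetical in-memory form of the Marc21 record shown on the slide.
record = {
    "001": "1000346",
    "245": {
        "a": "Early medieval history of Kashmir :",
        "b": "[with special reference to the Loharas] A.D. 1003-1171 /",
    },
}


def title_from_245(field):
    """Join 245 $a and $b and drop the trailing ' /' left over from card layout."""
    parts = [field.get(code, "").strip() for code in ("a", "b")]
    title = " ".join(part for part in parts if part)
    return title.rstrip(" /")


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def title_triple(rec):
    """Build one dcterms:title triple for the record."""
    subject = BASE + rec["001"]
    literal = escape_literal(title_from_245(rec["245"]))
    return '<%s> <%s> "%s" .' % (subject, DC_TITLE, literal)


if __name__ == "__main__":
    print(title_triple(record))
```

Even in this tiny case the code has to know that 245 $a and $b together make the title and that the trailing slash is presentation, not data.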
35.
1. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
2. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> .
3. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> .
4. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" .
5. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> "1981" .
6. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> .
7. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> .
8. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://RDVocab.info/ElementsplaceOfPublication> <http://id.loc.gov/vocabulary/countries/ii> .
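Triples like these are plain text, so a developer with no library background can read them with a few lines of code. The sketch below is a deliberately simplified reader, assuming only URI subjects and predicates and either URI or plain string objects (no language tags, datatypes or blank nodes). The names `TRIPLE`, `parse_ntriples`, `group_by_predicate` and `SAMPLE` are hypothetical, and real projects would normally use an RDF library instead.

```python
import re
from collections import defaultdict

# Simplified pattern: <subject> <predicate> (<uri> | "plain literal") .
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+(<[^>]+>|"(?:[^"\\]|\\.)*")\s*\.\s*$'
)

# Three of the triples from the slide.
SAMPLE = """\
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> "1981" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> .
"""


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) tuples.

    Objects keep their <...> or "..." wrapper so URIs and literals stay distinguishable.
    """
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError("not a simple N-Triples line: %r" % line)
        triples.append(match.groups())
    return triples


def group_by_predicate(triples):
    """Map each predicate URI to the list of objects asserted with it."""
    grouped = defaultdict(list)
    for _subject, predicate, obj in triples:
        grouped[predicate].append(obj)
    return dict(grouped)


if __name__ == "__main__":
    for predicate, objects in group_by_predicate(parse_ntriples(SAMPLE)).items():
        print(predicate, objects)
```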
37. The Linking Open Data cloud diagram - http://richard.cyganiak.de/2007/10/lod
Editor's Notes
I’ll look at how our catalogues, and thus the data within them, have changed to meet the changing expectations of the user: cataloguing to provide data that better serves the needs of machines (and the people who program them). Also something of a reflection on the changes that have taken place in the last 10 years …
I’m trying to frame the next 40 minutes or so as a narrative
When attempting to guess where we are going, it helps if we take a step back. 1) To simplify things (a little): librarians and cataloguers used to have full control of their data and the way it was used (consumed). - We created it (or paid others to do so for us) - Our readers consumed it, in our libraries, served via ledgers, card indexes and OPACs - We had / have policies and standards (AACR2, Marc21), procedures (LOC authority control), organisations (RLUK, OCLC) and technology (Z39.50, OPACs)
- We created and largely owned a closed ecosystem for our readers, our data and ourselves, and it worked. Through this control of production, control of consumption and control of material (total ownership), we were successful. Closed ecosystems can be successful today, just look at Apple. It is not a dead concept in itself, but I believe that it could not last for libraries …
And along with estate agents, travel agents, government, landlords and bookshops, we needed to think again … We already had our own networks, but now there was a global one with a rapid pace of change. Whilst Apple could grow a new ecosystem, ours was under threat.
2) - We slowly lost our place as the single prime authority for data - Commercial, social and academic discovery mechanisms gave our users other sources of information to turn to, and eventually other sources of content. We also had to cope with a growth in digital content - the publishing shift to digital (it took a while, as journals came first and they were only a small part of our business; analytical cataloguing was not standard practice). This is resulting in massive changes in metadata and discovery usage …
Library still in its bubble. Alternative discovery mechanisms and academic data and content sources suddenly existed alongside our sealed environment, all very heavily branded, very slick, constantly evolving. Some we pay for, some we contribute to, some we view as inferior competition, but they exist: all legitimate means to discover bibliographic material of interest to the researcher or the scholar, and they act as a direct alternative to our traditional model. All with their own data environments, standards, procedures and protocols, not necessarily ours. In light of this I argue that we could no longer maintain the closed ecosystem. To argue otherwise has become a fallacy, even in the mighty libraries of Oxford and Cambridge with world-class special collections.
In the new environment come new users, termed Generation Y. Generation Y, it is argued, have grown up and worked outside of our bubble all along, and are used to a very different mode of consumption for data and resources. They were born between 1984 and 1990, but I would argue the concept can be stretched further, way back, probably to anyone who has studied science since the mid to late 1990s … Cambridge Arcadia report 2009: preference for search engine over catalogue; online over in-building; trust peers over librarians; still respect the library ‘brand’. All of this has led to a direct and open questioning of the purpose of the academic library, never mind the public one.
I’ll now very quickly go over how our services and interfaces have responded to this need for change, with points one and two, and what else is to come in points three and four …
Lightweight, simple things. Started small … Libraries gateway. Search boxes in the website, make it a focus; catalogue pages used to be a single link in a wall of text. New approach to online services: don’t hide things, don’t post rules. ‘15 tips from those in the know’ - it used to be that we guarded our knowledge. When your library looks like a prison, this is pretty vital.
Keyword based discovery services. Rich faceting. Greater linking. New ways to … OPAC is dead? It is in your case, and I’m quite jealous … All possible due to richness of data: our authority controlled catalogue records generally work quite well in faceted environments, and we gain a competitive edge over folk whose data is not in such good shape. Catalogues are easier to pick up, easier to teach and provide a more cohesive experience, even if they don’t always work in the way we as librarians would like. Our data is still in use; it is valuable and relevant, partly as a result of these changes in interface. And I know this because, when you launched Solo a couple of years ago, some of your undergrads became our postgrads and told us what they thought of our interfaces.
But our data has evolved along with the catalogue. Enrichment is made possible by the use of identifiers, something we do very well. External data can be indexed alongside yours, which redefines what a record means and breaks down the concept of it as a composite entity. Catalogue records co-exist alongside data from other sources, as part of a framework of data.
And it’s not just supplier data; nowadays we let readers into our catalogues: tags, public lists, reader reviews. Dramatic growth in access points. Input from true subject specialists (i.e. at least those who have read the book). Lack of structure (well, our structure is still there). No quality control. Compromise of sanctity? I would argue that in the academic sphere this is fairly self-weeding; few people are likely to ‘deface’ a record on a specific niche subject area. They are also popular: it’s expected by users, it’s useful for them, and we are doing it.
Web scale -> the resource discovery concept taken further, to large centralised indexes. In products such as Summon, library catalogue records are taken, merged, modified and stored as part of a central index of over 800 million items. WorldCat Local works on these lines, as does EBSCO Discovery Service. You use Primo Central; it’s doing the same thing but not holding onto local content … yet. These are the catalogue interfaces students will be using to search your records. A recent development here lies in full text enrichment from HathiTrust: records for print material are being boosted with full text where possible. And I’m sure that, now they have the technology to do this, Serials Solutions are looking elsewhere for other sources of full text. Early days, but a real paradigm shifter. Full text is not necessarily a substitute for metadata, but if handled correctly in the keyword based world, it could blow things wide open … Or it could fail dreadfully, but this development is being explored by major vendors. True evolution of the catalogue to the ‘net’.
Catalogue data now goes through several processes. The record you create is not always the record readers will see. The way it is searched and accessed has changed. Yet we still build it with the same rules and container formats as we did 20 years ago.
This may not always be to our tastes and practices as librarians, but in terms of reader experience this has provided a dramatic improvement in service quality, as readers have a fighting chance of understanding our interfaces. If all this works, we are perhaps in better shape, perhaps on the same page as some of the competition. At least for Generation Y, and for a fair few others, I believe we are now doing a better job …
What comes next is rather unknown … with no map to guide us. In management speak, we may be able to meet Generation Y’s use cases, but we also need to be ready for the use cases we’ve not yet thought of …
(Talk over black screen …) We have to stop and think about what has changed. We’ve lost the bubble around the library. Our old, successful closed data ecosystem has been rendered obsolete, or at least severely degraded, by competition on the web. As data and service providers, we’ve had to change and innovate to get back to somewhere on the same page. Much of this change has come from our technology partners, our suppliers and our vendors, as well as a growing community of open source software developers in and around libraries. It’s been a pretty painful few years and we’ve all had to play a lot of catch-up. To a certain extent, we are still on the back foot. To cope better in the future, we need to get better at handling change; we need to be faster and more ready to evolve. Stepping totally away from the closed model, we cannot exist in isolation. As well as vendors and open source providers, we need help from outside the library community to prepare and innovate for the future. One way to encourage this help is by finding ways to make our catalogue data easier to share and for others to reuse. Even in its new form as a discovery service, the library catalogue is still a silo and still exists as something of a barrier to sharing. And despite the changes in interfaces we’ve gone through, the way we create, own and share data has until now largely carried on as normal … The practices here are as much of a barrier to sharing as the technology around the data. Let’s look at how most research libraries currently share data …
One way to prepare is to open up. We need to share and open up our raw data and make it easier for others to re-use. I would argue each of these groups has as much right to our raw data as we do, and each would have different use cases for it. By and large, in the field of online services, I’m talking about software developers, but also others in many areas. Allow others to innovate on our data on our behalf, think of those use cases and explore them.
And there is demand. This slide is based on the ideas of a certain Cambridge academic. Bibliographic data is linked to many aspects of teaching and research: citation lists (measure output); shared bibliography (core of research group work); reading lists (backbone of undergraduate teaching). Quality of data: in terms of consistency, accuracy and form, we are much easier to handle than museums and archives. All of this exists already, but not in an open, linked capacity that can be tied quickly and easily into other institutional and external services.
This is my colleague Katie’s write-up of a talk led by Owen Stephens. It really sums it all up …
Success of distributed access outside of cultural heritage: Amazon can put a lot of their success down to distributed marketing. When you discover an Amazon product, it’s not necessarily on the Amazon site, but they’ve shared their data in such a way that you can get to their catalogue. And the single means / point of discovery is a myth: our one stop shops are actually our first stop shops, or second stop shops. It is unrealistic to expect everyone who may want to access your collections to come to you, to your interfaces and domain. And in case we are worried about selling out our ‘IP’: most of this data was funded by the taxpayer, and that includes business and web start-ups.
We are not alone in thinking like this. There is a national and international trend for the public release of data. We need to get ourselves into this domain.
This is recognised nationally by the JISC, who earlier this year launched the Discovery initiative. The Oxford Text Archive contributed a project, we did one with catalogue data, and they are funding some very exciting work …
Search engines: we’ve opened our Aquabrowser catalogue up to web crawlers. The argument in the past has always been ‘why should we bother?’; I would argue, why should we withhold? We've actually been here for a while (WorldCat) - no-one is using it (or are they?). We used semantic tags in HTML to indicate some structure: author, title, format, availability. Still a commercial application designed for advertising - not a great fit. Schema data - getting better. What is the use case? Google is designed to sell … For what it’s worth, Google have only taken 10%. So perhaps the primary reason for doing this was to shut people up! I think there are better ways to share data than simply letting the spiders in …
All semantic web talks include this. It grows every year; it will be interesting to see if growth is sustained.
More access points to records. Better mechanisms for record enrichment. Revised cataloguing workflows: imagine LOC subject and name authority entries that simply update themselves. Access to developers.
Hard to understand and decode. Supporting ‘stack’ not up to scratch. No seriously compelling use case (yet). Other ways to provide linked data.
Whatever the format, there are common challenges in getting our data to them … Now some of the challenges involved in getting there
For this to work, we need to lift the legal barriers to sharing data: make it public, make it open, and open as defined by the wider Internet community. We have tended to have a lot of restrictions on record re-sharing in the past, but there has been a lot of movement in this area in the past two years. Most Cambridge data could be released under a permissive license. Europeana Digital Library approve CC0 licensing of data. OCLC looking at attribution-only licensing. British Library BNB: Creative Commons ‘Zero’. Move away from ‘non-commercial’ wording.
No one involved in the library open data movement wants OCLC to go under. Record vendors are valued partners when you have truckloads of legal deposit turning up every week. Focus on sharing ‘non-Marc21’ formats, which are of greater use to the non-librarian. I think we are seeing a shift in the way we operate.
Based on a 40 year old format. Based on a need to print a human readable card. Syntax, vocabulary, fields and content all intertwined. According to OCLC Research: only 10% of all Marc tags in WorldCat appear in 100% of all WorldCat records; 65% of tags appear in less than 1% of records.
Marc: data rich, structurally poor. If you as cataloguers agonise about where to put punctuation, we developers and hackers agonise about taking it out. It’s waste. We are not printing out card indexes anymore, so why use formats and standards designed for that purpose? Provide true granular data and let the interface render it depending on the rules … Very difficult to map to RDF and emerging standards, that $d in particular, especially when you use bs, ds and cs … Mixed fields (text and numbers) (020$a). Duplication: author name in 100 and 245$c? format 100 record fields?
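To make the point about punctuation concrete, this is roughly the kind of clean-up code developers end up writing before subfield values can be used as data. It is a minimal sketch, not a complete ISBD rule set: the function name `strip_isbd` and the choice of punctuation characters are assumptions for illustration, and a blunt rule like this will also eat full stops that belong to the data (an abbreviation at the end of a value, for example).

```python
import re

# Trailing separators that were only ever there to lay out a printed card.
# The character set is an assumption for this sketch, not an official list.
_TRAILING_PUNCTUATION = re.compile(r"(?:\s*[/:;,.=])+\s*$")


def strip_isbd(value):
    """Remove trailing presentation punctuation from one subfield value."""
    return _TRAILING_PUNCTUATION.sub("", value).strip()


if __name__ == "__main__":
    # Values as they appear in the examples earlier in the slides.
    for raw in (
        "Early medieval history of Kashmir :",
        "[with special reference to the Loharas] A.D. 1003-1171 /",
        "Cambridge University Press,",
        "2001.",
    ):
        print(repr(raw), "->", repr(strip_isbd(raw)))
```

If the record stored granular values and left punctuation to the display layer, none of this guesswork would be needed.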
Marc21 is binary encoded: hard to crack, and it needs specialised code libraries that few software developers are willing to learn or support. Most developers know how to deal with XML / JSON and are happy with it. Also, all those bs, ds and cs? It’s really hard to understand and remember. It’s a dark art. To be understood easily, data needs to be human readable, not numbers for field names with a whole website dedicated to explaining them. To learn to code with library data, you technically need to learn to catalogue: an artificial barrier. Bad encoding allowed by Marc can crash whole systems when imported; XML would stop this.
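For anyone who has not looked inside a Marc21 transmission record, the sketch below shows why it needs dedicated code. The record is a 24-byte leader, then a directory of fixed-width 12-byte entries (tag, field length, start offset), then the field data, held together by control characters. Everything named here (`build_record`, `parse_record`, the sample indicators `10`, the leader filler values) is a hypothetical toy for illustration, with no error handling, and is not a replacement for an established Marc library.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Toy encoder: turn [(tag, payload_bytes), ...] into one transmission record."""
    directory, data = b"", b""
    for tag, payload in fields:
        body = payload + FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += b"%b%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    base = 24 + len(directory) + 1      # leader + directory + its terminator
    total = base + len(data) + 1        # plus the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)  # illustrative leader values
    return leader + directory + FT + data + RT


def parse_record(raw):
    """Toy decoder: walk the directory and slice each field out by byte offset."""
    base = int(raw[12:17])              # leader positions 12-16: base address of data
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop field terminator
        if tag < "010":                 # control fields have no indicators or subfields
            fields.append((tag, body.decode("utf-8")))
        else:
            indicators = body[:2].decode("ascii")
            subfields = [(chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                         for chunk in body[2:].split(SF) if chunk]
            fields.append((tag, indicators, subfields))
    return fields


SAMPLE = build_record([
    ("001", b"1000346"),
    ("245", b"10" + SF + b"aEarly medieval history of Kashmir :"
            + SF + b"b[with special reference to the Loharas] A.D. 1003-1171 /"),
])

if __name__ == "__main__":
    print(SAMPLE)
    for field in parse_record(SAMPLE):
        print(field)
```

Printing `SAMPLE` shows the problem directly: offsets and control characters that mean nothing until you know the layout, compared with an XML or JSON record that can be read by eye.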
LOC Bibliographic Framework Transition declares a shift away from Marc. Delay in the introduction of RDA until we get a ‘better container’. No system vendor is going forward with Marc21 as the internal storage mechanism in their next generation of systems, though they may allow you to write records in it. Will take 10+ years. What is to come next?
So, despite the change, it’s my worry that those in charge of Marc21 and RDA developments are not thinking widely enough about the new open ecosystem which our data must inhabit.
If we don’t try and shift … it becomes easier to go to Amazon, who have awesome APIs, or even Google Books (theirs are rubbish). Our status as authoritative data providers will be further eroded. No-one will want to play with us if we do not share.