Bastille, pop band or historical icon? How linked open data helps teachers, students and researchers find the right one. Richard Leeming, BBC/RES, UK
LoCloud Conference
Sharing local cultural heritage online with LoCloud services
Amersfoort, Netherlands
5 February 2016
The Research and Education Space (RES) is a partnership project between Jisc, the British Universities Film & Video Council (BUFVC) and the BBC that aims to make it easier for teachers, students and academics to discover, access and use material held in the public collections of broadcasters, museums, libraries, galleries and publishers.
The RES initiative is:
A platform, built by the BBC, which aggregates the catalogues of publicly-held archives and makes them accessible to the UK’s educational establishments.
A collaborative project to work with collection holders (public sector organisations, archives and libraries) to release their catalogues in the form of linked open data, to assist in the discovery of these assets.
An ambition to stimulate public and private companies to build teaching products, underpinned by the platform, for the UK’s education sector.
Here’s a short film
Indexing data from
BBC, Wellcome Trust, British Museum, British Library, National Library of Wales, Wordsworth Trust, Your Art, Natural History Museum, People's Collection of Wales, The National Archives, Nature, the Library of Congress, Europeana
Pending:
Kew Gardens, The Horniman, The National Gallery
But to start, let’s go back 2500 years
Then the citizens of Athens had greater access to archives than we do today
They could go into the ‘metroon’, which held all of the political, administrative and cultural documents of the state, and read and take away copies of anything they found.
But, times have changed.
There are now too many Metroons
Only a select few are allowed to enter and they may be able to look at what they find, but probably not copy it or borrow it…
There’s obviously a lot more data as well
In a story on the BBC news website, IBM estimated that 2.5 exabytes (that's 2.5 billion gigabytes) of data was generated every day in 2012.
“About 75% of data is unstructured, coming from sources such as text, voice and video.”
So we need to start making some kind of sense of all that …
And this data from the National Archives neatly illustrates the issue that RES is trying to solve.
In the first data set we have Alfred Frederick Minall; in the second, AF Minall. Are they the same person?
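To make the matching problem concrete, here is a minimal sketch in Python. It is an illustration only, not how RES resolves identities: the function names and the matching rule (same surname, and forenames that agree either in full or by initial) are assumptions made for this example. A positive result means the two names are compatible, not that they are proven to be the same person.

```python
def name_tokens(name: str) -> list[str]:
    """Split a personal name into lowercase tokens, expanding run-together initials."""
    parts = name.replace(".", " ").split()
    tokens = []
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        # Assumption: a short all-caps part before the surname, such as "AF",
        # is a run of initials and is split into "a", "f".
        if not is_last and part.isupper() and len(part) <= 3:
            tokens.extend(ch.lower() for ch in part)
        else:
            tokens.append(part.lower())
    return tokens


def could_be_same_person(a: str, b: str) -> bool:
    """Return True when the surnames match and every forename agrees
    either in full or by initial. This flags candidates; it proves nothing."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb or len(ta) != len(tb) or ta[-1] != tb[-1]:
        return False
    for x, y in zip(ta[:-1], tb[:-1]):
        if len(x) == 1 or len(y) == 1:
            # At least one side is an initial: compare first letters only.
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


print(could_be_same_person("Alfred Frederick Minall", "AF Minall"))
print(could_be_same_person("Arthur Frederick Minall", "Alfred Frederick Minall"))
```

A rule like this can only suggest that two records might describe one person. Confirming it still needs other evidence, such as dates or places, which is exactly why explicit links between data sets matter.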
We’ve set ourselves a big task … but someone once asked: how do you eat an elephant? And the answer is
One mouthful at a time …
So let’s start small … with a school student looking for information about Bastille for their homework.
Do they want Bastille Day?
Or the historical event, the storming of the Bastille?
Or the indie band?
And this is what I get if I put Bastille into Google. Now, I know that this is a deliberately incomplete example, but it does illustrate how Google, while it does its job brilliantly, is perhaps not the answer to the problems posed by aggregating cultural data
Google does not particularly care about provenance – we do
Google does not particularly care about authenticity – we do
Google does not particularly care about licensing – we do
Google does not particularly care about permanence – we do
Google cares about what’s contemporary – we don’t
Google cares about the number of links to an asset – we don’t
Wouldn’t it be better if, when you’d found a reliable article, clicking on a person’s name or an event in the document delivered a comprehensive list of everything about them held in any institution around the world?
The internet has the capability and the technology to do this, but we need to work together to deliver this change
We need to use open standards to make it work
That’s why we’ve backed Linked Open Data … a mechanism for publishing structured data on the Web about virtually anything, in a form which can be consistently retrieved and processed by software.
The result will be added to the world wide web of data which works in parallel to the web of documents our browsers usually access, transparently using the same protocols and infrastructure.
Turning legacy datasets into linked open semantic data is not technically hugely difficult, but it can be time-consuming and requires some specialist expertise.
Where the ordinary web of documents is a means of publishing a page about something intended for a human being to understand, this web of data is a means of publishing data about those things.
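As a rough sketch of what "data about those things" looks like, the Python below models a handful of statements as (subject, predicate, object) triples. Everything under example.org is invented for illustration, and the helper functions are hypothetical; only the two W3C predicate URIs are real. The point is that both Bastilles share a label but have different identifiers, so software can tell them apart and gather everything linked to the right one.

```python
# Real, well-known predicate URIs from the W3C RDF and RDFS vocabularies.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers, invented for this example.
FORTRESS = "http://example.org/id/bastille-fortress"
BAND = "http://example.org/id/bastille-band"
ABOUT = "http://example.org/prop/about"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (FORTRESS, RDFS_LABEL, "Bastille"),
    (FORTRESS, RDF_TYPE, "http://example.org/type/Place"),
    (BAND, RDFS_LABEL, "Bastille"),
    (BAND, RDF_TYPE, "http://example.org/type/MusicGroup"),
    ("http://example.org/doc/storming-1789", ABOUT, FORTRESS),
]


def subjects_with_label(label: str) -> set[str]:
    """Return every identifier that carries the given human-readable label."""
    return {s for s, p, o in triples if p == RDFS_LABEL and o == label}


def statements_about(uri: str) -> list[tuple[str, str, str]]:
    """Return every triple that mentions the identifier as subject or object."""
    return [t for t in triples if t[0] == uri or t[2] == uri]


# One label, two distinct things.
print(sorted(subjects_with_label("Bastille")))
# Everything known about the fortress, including the document that links to it.
for statement in statements_about(FORTRESS):
    print(statement)
```

A search by label alone returns both the fortress and the band. Following the identifier instead returns only the statements about the one the student actually wants.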
It’s a different model to Europeana as well …
With RES each institution is responsible for publishing its own data
It’s a publish once and use everywhere model
The costs are distributed
And the question is …
Does Europeana scale?
There are many advantages to publishing Linked Open Data
1. Increasing relevance: open metadata can be used in places where online users congregate (including social networks), helping providers to maintain their relevance in today’s digital society.
2. Increasing channels to end users: providers releasing data as open metadata increase the opportunities that users have to see their data and their content.
3. Specific funding opportunities: releasing metadata openly will potentially grant providers access to national and/or European funding (European and most national governments are actively promoting open metadata).
4. Brand value (prestige, authenticity, innovation): releasing data openly demonstrates that the provider is working in the innovation vanguard and is actively stimulating digital research.
5. Data enrichment: open metadata can be enriched and can then be returned to the data provider. Opening the metadata will increase the possibility of linking that data and the heritage content it represents with other related sources/collections.
6. Discoverability: increased use and visibility of data drives traffic to the provider’s website.
7. New customers: releasing data openly offers new ways to interact with and relate to customers.
8. Public mission: releasing metadata openly aligns the provider with the strategic public mission of allowing the widest possible access to cultural heritage.
9. Building expertise: releasing metadata openly will strengthen the institution’s expertise in this area, which will become a marketable commodity such as consulting services.
10. Desired spill-over effects: institutions and creative industries will be able to create new businesses, which in turn will strengthen the knowledge economy.
The potential risks of open metadata
1. Loss of quality: the high-quality metadata provided could be divorced from the original trusted source and corrupted by third parties.
2. Loss of control: institutions will no longer be able to control the metadata if anyone can re-use or distribute it.
3. Loss of unity: metadata will get scattered across the digital universe while it might be better (contextually) kept together.
4. Loss of brand value: by releasing data openly the institution risks being associated with re-users that they do not want to be associated with.
5. Loss of attribution: by releasing data under an open licence institutions will not be credited as the source/owner of the metadata.
6. Loss of income: institutions are afraid that they cannot replace current revenues from metadata with other sources of income.
7. Loss of potential income: in the future institutions may think of a way to make money from metadata, but if they release it openly now someone else may do this.
8. Unwanted spill-over effects: institutions find it unfair that others make money with the metadata that they provide.
9. Privacy: there are privacy restrictions on the use of certain data.
Why open up your data and assets
Here’s one example. I’ve taken this from a Europeana White Paper, The Problem of the Yellow Milkmaid
I’m sure you’ll all recognise ‘The Milkmaid’, one of Johannes Vermeer's most famous pieces
During a survey the Rijksmuseum discovered that there were over 10,000 copies of the image on the internet—mostly poor, yellowish reproductions
As a result of all of these low-quality copies on the web, according to the Rijksmuseum, “people simply didn’t believe the postcards in our museum shop were showing the original painting.” So they put high-resolution images of the original work with open metadata on the web. As they said, “opening up our data is the best defence against the ‘yellow Milkmaid’.”
2012: big jump in revenue - €181,000 – good?
But only 0.2% of the museum’s total revenue; staff costs €100,000
At the start of 2015, York Museums Trust released almost 60,000 images online, either under a CC BY-SA 4.0 licence or marked with a Public Domain mark.
A Wikipedia editor with a specialism in art history created (and is continuing to create) huge amounts of researched content on Wikipedia about William Etty, a cornerstone of York Museums Trust’s fine art collection.
They worked with our resident to get one of the Etty paintings ‘Preparing for a Fancy Dress Ball’ featured on the front page of Wikipedia on the day they reopened their refurbished Art Gallery. That article received 13,253 views on August 1st.
This was a completely new article. No information about this painting was available online (other than a tiny 50-word online collections entry) prior to this.
The Wikipedia article about William Etty went from 2,000 words to more than 20,000 after the editor had finished working on it. All the research was carried out outside the museum, with no work needed from our curators.
Daily Mail articles
So, how are we doing this?
A critical mass of linked open semantic data is necessary before the RES platform can really demonstrate its true power
We are working with archive collections across the UK to help them publish Linked Open Data describing their collections (including digital assets, where they exist). Although many collections are already publishing LOD or plan to, the RES project partners will be providing tools and advice to collection-holders in order to assist them throughout the lifetime of the project.
In order for data to be RES compliant there needs to be a digitised asset.
But we don’t care about the format the asset has been digitised in, as it will always be served from the collection holder’s servers
So we don’t care where it’s stored
Neither do we care how it’s licensed – free, subscription or pay per view – it’s not our business
The RES platform will not directly consume or publish digital media (audio, video, images, documents) itself. It will only index data about digital media which has been published in a form which can be used consistently by RES applications.
Each collection holder must take responsibility for writing and maintaining good quality data about their assets.
But they need to do that anyway? Right?
They also need to assign usage rights in machine-readable terms
But they need to do that anyway? Right?
Then they need to publish it as Linked Open Data on a publicly accessible server
explicitly and machine-readably online, using Linked Open Data principles
The data about the representation must include a rights information triple referring to the well-known URI of a supported license.
The data describing digital assets must be made available under the terms of a supported license and include explicit licensing data in order for it to be indexed by the Research & Education Space and be useable by applications. Our approach is aligned with the Open Data Institute’s guide to publishing machine-readable rights data, and with the work of the Copyright Hub.
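The rights requirement can be sketched as a simple check in Python. This is a hedged illustration, not the RES indexing code: it assumes the rights statement uses the Dublin Core `license` predicate, and the set of supported licences shown here is a placeholder, since the actual list RES accepts is not given in this talk. The example.org identifiers and the function name are invented.

```python
# Dublin Core Terms predicates (real vocabulary URIs). Their use here is an
# assumption; the predicate RES expects is not stated in this talk.
DCT_LICENSE = "http://purl.org/dc/terms/license"
DCT_TITLE = "http://purl.org/dc/terms/title"

# Placeholder set of "supported" licences, chosen only for illustration.
SUPPORTED_LICENSES = {
    "https://creativecommons.org/licenses/by/4.0/",
    "https://creativecommons.org/publicdomain/zero/1.0/",
}

# Hypothetical records: (subject, predicate, object) triples.
records = [
    ("http://example.org/asset/a", DCT_LICENSE,
     "https://creativecommons.org/licenses/by/4.0/"),
    ("http://example.org/asset/b", DCT_LICENSE,
     "http://example.org/licences/in-house"),
    ("http://example.org/asset/c", DCT_TITLE, "Untitled"),
]


def is_indexable(subject: str, triples: list[tuple[str, str, str]]) -> bool:
    """True when the subject has a rights triple pointing at a supported licence URI."""
    return any(
        s == subject and p == DCT_LICENSE and o in SUPPORTED_LICENSES
        for s, p, o in triples
    )


for asset in ("a", "b", "c"):
    uri = f"http://example.org/asset/{asset}"
    print(uri, is_indexable(uri, records))
```

In this sketch, asset a passes, asset b fails because its licence URI is not in the supported set, and asset c fails because it carries no rights triple at all.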
We only keep a thin layer of assertions and links. That’s all
Where?
Well, because it’s Linked Open Data published under a permissive licence, with the content licensed explicitly, you can use it pretty much anywhere you like, how you like.
We’ll be using it in RES to transform access to content, data, information for children in schools across the UK …
But as RES will enable frictionless sharing … there is no reason why the use of our technology should be confined to education projects
So we’re opening up the opportunities for incredible collaborations between cultural organisations …
And of course the internet knows no boundaries
When RES is up and running … if you’re the curator of an exhibition at a cultural institute in the UK, you may worry about borrowing physical objects from other institutions, but we’re providing the technology to make culture jams, object mashups and seamless sharing child’s play
There’s one thing I haven’t talked about and that’s the user interface.
That’s because there isn’t one.
We’re not building one.
But as I just said, because RES is an open platform, using open data linking to explicitly licensed assets, you can use it how you like …
As for RES, there are dozens of companies making Virtual Learning Environments or Learning Management Systems out there and we’re talking to them all.
We were at the BETT trade show two weeks ago and we were blown away by the enthusiasm of ed tech companies … we’re talking to all the biggest ones right now about integrating RES into their learning products