Metadata related to the digitised objects produced by the cultural institutions should be widely and freely available for re-use. (Key recommendations, p. 5)
When public data (which has already been created at public expense) is made openly available for re-use, everybody benefits: citizens get better information, companies can develop new business opportunities, and public administrations will (or at any rate should) be grateful to others for working on the data and adding value for everybody. This is a win-win.
Cultural heritage organisations are explicitly excluded from the PSI (Public Sector Information) Directive.
You’ve seen the Europeana portal, but you probably haven’t seen its new look yet: it was launched last week.
The value of Europeana is created mainly by its network of content providers and aggregators who deliver data to Europeana, mostly museums, libraries, archives and audiovisual collections from all over Europe. There are currently 140 direct data providers and aggregators, but together they represent a few hundred memory institutions across Europe. Over the past three years they have delivered more than 20m items to Europeana.
Here is a results page for a search for “Chopin” in Europeana. I click on the thumbnail and
I am directed to the detailed view. In Europeana we distinguish between the digital object itself, which always resides on the content provider’s side, and what we call the metadata. The metadata is information about the object, more or less what libraries call a bibliographic record. So Europeana harvests the metadata and a link to the digital object, from which it caches a thumbnail. The user is always directed to the provider’s site to view the object itself in its original “context”; Europeana is therefore a gateway to the providers’ sites. Needless to say, these 20m records are diverse, often rich and mostly of high quality, and because they have been harmonised and enriched by Europeana and intermediaries, they constitute a very valuable pool of information…
about the artistic, scientific, social and political history of Europe and beyond, and about everyday culture.
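The split between metadata and object described above can be sketched as a small data model. This is a minimal illustration under stated assumptions: the field names, the `harvest` and `open_object` functions and the thumbnail path scheme are all hypothetical, not Europeana’s actual schema. What it shows is that only descriptive fields and a link are copied, while anything else the provider holds, including the object itself, stays on the provider’s side.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Record:
    """What the aggregator keeps: metadata plus pointers (hypothetical field names)."""
    title: str
    creator: str
    provider: str
    object_link: str  # the digital object itself stays on the provider's site
    thumbnail: str    # only a cached thumbnail is stored by the aggregator


def harvest(source: dict) -> Record:
    # Copy only the descriptive metadata and the link; any other fields
    # (e.g. the object's bytes) are deliberately ignored.
    link = source["object_link"]
    # Hypothetical cache location for the thumbnail, derived from the object link.
    thumb = "thumbnails/" + urlparse(link).path.strip("/").replace("/", "_") + ".jpg"
    return Record(source["title"], source["creator"], source["provider"], link, thumb)


def open_object(record: Record) -> str:
    # Users are always sent to the provider's site to view the object in context.
    return record.object_link


source = {
    "title": "Nocturne",
    "creator": "Chopin, Fryderyk",
    "provider": "Example Library",
    "object_link": "https://provider.example.org/items/123",
    "image_bytes": b"...",  # held by the provider, never harvested
}
record = harvest(source)
print(record.thumbnail, open_object(record))
```

The frozen dataclass makes the harvested record read-only, which fits the idea that the aggregator describes objects rather than owning them.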
Over the past three years we had an agreement with providers based on the principles of the Creative Commons Attribution-NonCommercial license, although it was not quite a CC BY-NC license. Metadata could be re-used outside the portal only for non-commercial purposes, and only if the long chain of providers and aggregators was attributed.
Last year we set out on a mission to change the agreement and put these 20m normalised and enriched metadata records into the public domain for everyone to re-use without restrictions. Naturally, this was not easily accepted by the cultural heritage community.
Why?
First of all, because this is what our stakeholders want. Let me explain. Last year, as part of defining our strategy for the coming years, we asked our stakeholders what value Europeana should deliver to them. Our users said they wanted a trusted source that would be easy to use and re-use in their school and leisure projects, and that they could reach from their regular workflows and customary interfaces; that is, they didn’t want to have to go to europeana.eu to access cultural data.
The cultural institutions said they wanted more visibility with end-users and politicians, and the development of new services around their data that could potentially bring them more revenue.
European politicians said they wanted Europeana to contribute to social and economic inclusion by making culture more accessible to citizens, primarily by embedding cultural data more widely in the European education sector. They also expected Europeana to take the lead in innovating the cultural heritage sector and, through that innovation, to contribute to economic growth in Europe.
We also asked commercial players such as telecoms, technology companies, search engines and interactive whiteboard developers. They said they wanted a one-stop shop for accessing data and providers in Europe, that they would be willing to pay for premium services, and that they valued the brand association with the cultural sector that Europeana offered.
So it was clear that Europeana could no longer be a destination portal and that we needed to bring the data to where the users are. We needed to share it on our providers’ sites, even the ones that show some commercial activity; to share it with Wikipedia, which requires all information available on its site to be free for re-use; to publish it on academics’ blogs, even ones that display Google ads; to publish it on commercial sites that can help our providers generate income; to share it in apps developed for education and tourism, for which we need public-private partnerships, or commercial companies with better know-how, to develop and fund such apps; and to publish it as linked open data, to make full use of the potential of the semantic web to improve the richness of, and the functionality around, the data. Ultimately, we want to do all this to stimulate users’ engagement with culture, in order to create more culture, more knowledge and, hopefully, more creative projects and money for everyone. To do all of this at the same time, we need a very open license that allows unobstructed re-use of the data. We could do almost none of it under our previous agreement with our providers, which imposed a non-commercial clause on re-use of the data.
These goals are reflected in Europeana’s strategic plan for 2011-2015, in particular in the Distribute and Engage paths.
There are parallel developments we took into account when we started to advocate for open cultural data. The Comité des Sages report was published early this year and recommended that
Metadata related to the digitised objects produced by the cultural institutions should be widely and freely available for re-use.
There is also a strong commitment from the Commission to support the opening up of data that has been created with taxpayers’ money. This is Neelie Kroes, Commissioner for the Digital Agenda, who says that … For the Commission, open data would be the fuel to drive economic and social growth.
What are the main characteristics of the new agreement? It drops all restrictions on metadata re-use. It uses an internationally recognised, standardised license for releasing the data. It is a simpler agreement, worded in better legal terms. It is the result of a long process of open consultation with the network: it takes in input from data providers’ workshops and the consultation process, and many articles have been improved. It combines the two previously separate agreements, the data provider agreement and the data aggregator agreement, into one: the Data Exchange Agreement. We also suggest that providers give Europeana only what they feel comfortable sharing; there is no need to provide metadata for complete collections or for all collections.
To recap some of the arguments I presented earlier, we had to drop the NC and BY clauses. Why drop NC? Mainly because most of the metadata is factual information, and there should be no copyright on Mozart’s date of birth. We ask institutions to withhold information if they think it is so rich that it carries copyright, and to give us only what they feel comfortable sharing. Most of it has been created with taxpayers’ money, and everyone should have the right to use it for all sorts of purposes. The German National Library, for example, makes 2m euro per year by selling metadata to other public libraries: that is public money paid twice for the same product. It is also very difficult to define the boundaries of NC. Is all commercial activity unwanted? Why should we prevent the national audiovisual archive of France from re-using Europeana data just because it sells content via its website? Is Europeana the appropriate body to police what re-use is made of the data? And why drop attribution? This is a very sensitive issue for the memory institutions. We believe that attribution is very hard to enforce on the web, especially when a long chain of intermediaries is involved and they all want to be credited, which has been called vanity publishing. Experience has nonetheless shown that attribution helps raise the value of information, and it is common practice in many communities such as Wikipedia. So we encourage attribution as best practice through our metadata usage guidelines. We believe there is more to gain overall by giving up some metadata, and I will come back to this later. So, overall, we believe in a standardised license that keeps the threshold for re-use to a minimum.
It is not easy to convince the whole cultural heritage sector in Europe of the necessity and the value of giving up their metadata for free. So over the past months we did a lot of talking, talking and talking, and listening, at dedicated workshops with museums, libraries and archives, to assess with them the risks and rewards of opening up access to metadata. We held two rounds of consultation with our network on our new license. We had to raise awareness of existing initiatives, like the British Library publishing its data under CC0, and of concepts new to the cultural sector, such as linked open data and business models for open data. We had to create evidence of good re-use of the data, so we ran four hackathons, including for commercial applications, and an LOD pilot with some pioneer partners. We had to do some research: we commissioned a paper on the compliance of CC0 with German law, because Germany has one of the strictest copyright laws. And last but not least, we created a website to explain the reasoning behind our open data activities and our new agreements, and to keep everyone informed about these developments.
The institutions identified several risks at the workshops we held. By releasing their metadata for free, memory institutions are afraid that they will lose control over their data. They were also afraid of potentially losing income, although very few libraries actually made money from selling their metadata: the BL, the BnF and the German National Library, which are also publishing it openly. The British Library, in its press release about releasing its bibliographic data under a CC0 license, said that this didn’t harm its business model, because it would continue selling its MARC21 records while releasing its records for free in Dublin Core format. The German National Library uses a time embargo: it sells its data first and releases it as open data after a few months have passed. Institutions were also afraid of potential damage to their reputation if the data was placed in the wrong context, for example combined with Nazi or pornographic content. There was, though, very little evidence of any such misuse.
The institutions also acknowledged that many of these fears were hypothetical and could be handled with some risk management. I will not repeat the rewards here because I’ve mentioned them earlier, but on the slide you can see them summarised as the memory institutions listed them during these workshops.
So these are the reasons for the Europeana Foundation to adopt the Data Exchange Agreement.
Now, we tried to be as transparent and inclusive as possible in the process that led to the new agreement. We created this dedicated page, which includes all the relevant information.
On it we have published the metadata principles Europeana has adopted, where we state, for example, that Europeana itself doesn’t plan to monetise the use of the metadata.
We’ve also published our guidelines for the use of the metadata, where we encourage users to give credit where credit is due. These guidelines will be embedded, for example, in our linked open data.
Where are we now? The Conference of European National Librarians has endorsed the new agreement, as has EUscreen, the aggregator for TV heritage in Europe.
We will try to get everyone to sign by the end of December, and we will then leave a grace period of six months for any reharvesting that may be needed. CC0 will be applied to all Europeana data as of 1 July 2012. All data on Europeana for which no Data Exchange Agreement has been signed with the provider will be removed from the site.
We created a Linked Open Data pilot. Practically by word of mouth, we invited some of our partners to let us publish their data as LOD. These 3m records are now online; if you want to know more, check data.europeana.eu.
And it’s true: there is no evidence yet of re-use of our LOD, but we are proud that Europeana is now positioned somewhere on the extreme right of the Linked Data cloud. And this is strangely beautiful.
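Publishing a record as linked open data means expressing it as RDF statements (subject, predicate, object) that anyone can fetch and link to. The sketch below serialises a flat metadata record as N-Triples using Dublin Core element URIs. It is an illustration only: the subject URI is a hypothetical example.org address and the `record_to_ntriples` helper is invented here; it does not describe how the data.europeana.eu pilot was actually built.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def ntriples_literal(value: str) -> str:
    # Escape the characters N-Triples requires inside a quoted literal.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_ntriples(subject_uri: str, fields: dict) -> list:
    # One triple per Dublin Core field: <subject> <dc:field> "value" .
    return [
        f"<{subject_uri}> <{DC}{name}> {ntriples_literal(value)} ."
        for name, value in fields.items()
    ]


# Hypothetical subject URI, for illustration only.
lines = record_to_ntriples(
    "http://data.example.org/item/123",
    {"title": "Nocturne", "creator": "Chopin, Fryderyk"},
)
print("\n".join(lines))
```

Because every statement carries a full URI, other datasets can point at the same subject, which is what places a dataset in the Linked Data cloud.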
There are, though, already real examples of re-use of our data through our search API. Unfortunately, because of the restrictions in our previous data agreement, we have limited use of the API to our partner network. Through the hackathons we wanted to showcase the kind of innovative applications that can be developed with Europeana data when developers get creative. 48 prototypes were developed in 4 categories, including a category for commercial potential. The prizes were awarded by Mrs Neelie Kroes at the Digital Agenda Day, which shows the Commission’s interest in supporting open data as part of its Digital Agenda strategy.
The first prize for commercial potential went to this application, which I hope to be able to show you. It’s an Android app developed by a Polish team. [try to show the video] You take a picture of an artwork; the app uses image recognition to fetch the Europeana record and, if you want, reads it out loud to you.
The innovation award went to this app, called TimeMash, which is based on geolocation. A user can search on his phone for items in Europeana located around him. When he comes across a building of which Europeana holds an old photo, the app helps him take a picture of it today from the same angle. It then creates these then-and-now comparisons, and it geotags the new picture so that other users can easily locate the item and have a try as well.
The Audience Award went to Timebook, a Facebook-like application for historical people whose works are in Europeana. It mashes up content with DBpedia and shows a portrait of the artist, links to his friends (namely other contemporary artists) and his works. All of these show very well the potential of Europeana data, and I encourage you to see more of the apps on our api.europeana.eu web page.
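“Items located around him” is essentially a radius search over geotagged records. The sketch below uses the haversine great-circle distance to return items within a given radius, nearest first. It is a minimal sketch assuming each item carries latitude and longitude in decimal degrees; the `Item` type and `items_near` function are hypothetical, and nothing here describes how TimeMash itself was implemented.

```python
import math
from typing import List, NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


class Item(NamedTuple):
    title: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two points on a sphere.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def items_near(items: List[Item], lat: float, lon: float, radius_km: float) -> List[Item]:
    # Compute each item's distance from the user, keep those inside the radius,
    # and return them sorted nearest first.
    with_dist = [(haversine_km(lat, lon, it.lat, it.lon), it) for it in items]
    return [it for d, it in sorted(with_dist, key=lambda pair: pair[0]) if d <= radius_km]


items = [
    Item("A", 52.0, 5.0),   # at the user's position
    Item("B", 53.0, 5.0),   # about 111 km north
    Item("C", 52.01, 5.0),  # about 1.1 km north
]
print([it.title for it in items_near(items, 52.0, 5.0, radius_km=5)])
```

A linear scan is fine for a sketch; a service over millions of records would normally use a spatial index instead.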
Now, as has been said before, Europeana doesn’t harvest digital objects. So far we have been following what we call the clean hands model: providers were responsible for clearing any rights to the content and for supplying the correct information about it. However, we are a search engine that points to content, so we want to make it easy for users to know what they can and cannot do with the content they find. Until now our efforts have focused on getting the metadata agreement passed, and we haven’t yet put much effort into what I will show you. What I will talk about is an effort towards standardisation on the providers’ side, so we have a lot of advocacy and awareness-raising work to do here. Unlike our metadata, there is no true standardisation here yet.
So, for a few months now, the rights field has been mandatory information that providers need to supply. A user can search by re-use possibilities, such as “all is possible”, “derivatives allowed”, “attribution only”, etc.
These are the guidelines they need to follow.
These are the options for the rights statements: the six CC licenses and the Public Domain Mark. And, rather unfortunately, Europeana also has its own home-grown rights statements: rights reserved, with different access options. We will have to go back and revisit the usefulness of these statements and the alternatives to them.
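Because the rights field holds a standard statement, a search by re-use possibilities can be derived mechanically from it. The sketch below is a minimal illustration assuming each record’s rights field is a URL: Creative Commons license URLs (including ported versions such as `.../by-sa/3.0/fr/`) are decoded into what they permit, the Public Domain Mark allows everything, and any other statement, such as a home-grown “rights reserved”, is treated as permitting no open re-use. The function names and the example.org rights URL are hypothetical; this is not Europeana’s actual implementation.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class ReusePermissions:
    commercial: bool
    derivatives: bool
    attribution_required: bool


def permissions(rights_url: str) -> Optional[ReusePermissions]:
    parsed = urlparse(rights_url)
    if parsed.netloc.lower() not in ("creativecommons.org", "www.creativecommons.org"):
        return None  # non-CC statement (e.g. rights reserved): assume no open re-use
    parts = parsed.path.strip("/").split("/")
    if parts[:2] == ["publicdomain", "mark"]:
        return ReusePermissions(commercial=True, derivatives=True, attribution_required=False)
    if len(parts) > 1 and parts[0] == "licenses":
        elements = parts[1].split("-")  # e.g. "by-nc-sa" -> ["by", "nc", "sa"]
        return ReusePermissions(
            commercial="nc" not in elements,
            derivatives="nd" not in elements,
            attribution_required="by" in elements,
        )
    return None


def allows(rights_url: str, *, commercial: bool = False, derivatives: bool = False) -> bool:
    # True if the statement permits every use the searcher asked for.
    p = permissions(rights_url)
    if p is None:
        return False
    return (p.commercial or not commercial) and (p.derivatives or not derivatives)


print(allows("http://creativecommons.org/licenses/by-nc/3.0/", commercial=True))
print(allows("http://creativecommons.org/licenses/by-sa/3.0/fr/", derivatives=True))
```

A search for “derivatives allowed” then becomes a filter such as `[r for r in records if allows(r.rights, derivatives=True)]`, assuming records expose a `rights` attribute.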
[give complete stats] Out of the 20m objects: 465,114 are in the public domain …
Here we see an object marked as public domain. You click and it takes you to the CC Public Domain Mark page, which is great because the rights statement is linked to the object, and it’s both machine- and human-readable. The same happens with all the CC licenses we’ve seen: if the provider has indicated that an object is under CC BY-SA France, the user will be directed to that page. The development of the PD Mark in particular is the result of a joint effort with Creative Commons and of our ongoing commitment to protecting the public domain, as manifested in the Public Domain Charter. Now, we are trying to do the right thing, but we are messing things up at the same time… there is an additional field for rights