Europeana and open data
1. Europeana and Open Data
Robina Clayphan
Interoperability Manager, Europeana
LDBC TUC meeting, 19 November, 2013
2. What is Europeana?
• Europeana is a service that brings together digital
content from across the cultural heritage domain in
Europe
• It makes the metadata freely available
• It is a catalyst for change in the world of cultural
heritage.
• Our vision: We believe in making cultural heritage openly
accessible in a digital way, to promote the exchange of ideas
and information.
3. Europeana.eu, Europe’s cultural heritage portal
Museums
National Aggregators
Regional Aggregators
Archives
Thematic collections
Libraries
- A network of participants in development and innovation
- Nearly 30 million objects from 2,400 European galleries, museums, archives
and libraries
4. What types of objects does Europeana give access to?
Text Image Video Sound 3D
9. EDM requirements & principles
1. Distinction between “provided objects” (painting, book, movie,
etc.) and their digital representations
2. Distinction between objects and metadata records describing
an object
3. Allow for multiple records for a same object, containing
potentially contradictory statements about it
4. Support for objects that are composed of other objects
5. Support for contextual resources, including concepts from
controlled vocabularies
Richer metadata with finer granularity
10. Provide more semantics to the data
Build a semantic layer on top of Cultural Heritage objects
14. Properties for the Aggregation
Mandatory:
edm:aggregatedCHO
edm:dataProvider
edm:isShownBy or
edm:isShownAt
edm:provider
edm:rights
Optional:
edm:hasView
edm:object
dc:rights
edm:ugc
The aggregation represents the set of related resources about one real object
contributed by one provider. It carries the metadata that is about the whole set
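The mandatory list above can be read as a simple presence check. The following is a minimal sketch, assuming an aggregation is held as a plain Python dict keyed by property name; the function name and the dict layout are illustrative assumptions, not part of EDM or any Europeana tool.

```python
# Mandatory aggregation properties as listed on the slide.
REQUIRED = ("edm:aggregatedCHO", "edm:dataProvider", "edm:provider", "edm:rights")
# The slide asks for one or the other of these two.
ONE_OF = ("edm:isShownBy", "edm:isShownAt")


def missing_aggregation_properties(aggregation):
    """Return the mandatory properties absent from a dict-based aggregation.

    The dict representation is an assumption made for this sketch.
    """
    missing = [name for name in REQUIRED if not aggregation.get(name)]
    if not any(aggregation.get(name) for name in ONE_OF):
        missing.append(" or ".join(ONE_OF))
    return missing
```

An empty list means every mandatory property is present; optional properties such as edm:hasView are deliberately ignored.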
15. Properties for the ProvidedCHO
The ProvidedCHO is the cultural heritage object which is the
subject of the package of data that has been submitted to
Europeana.
Properties: dc:contributor, dc:coverage, dc:creator, dc:date, dc:description,
dc:format, dc:identifier, dc:language, dc:publisher, dc:relation, dc:rights,
dc:source, dc:subject, dc:title, dc:type, dcterms:alternative, dcterms:extent,
dcterms:temporal, dcterms:medium, dcterms:created, dcterms:provenance,
dcterms:issued, dcterms:conformsTo, dcterms:hasFormat,
dcterms:isFormatOf, dcterms:hasVersion, dcterms:isVersionOf,
dcterms:hasPart, dcterms:isPartOf, dcterms:isReferencedBy,
dcterms:references, dcterms:isReplacedBy, dcterms:replaces,
dcterms:isRequiredBy, dcterms:requires, dcterms:tableOfContents,
edm:isNextInSequence,
edm:isDerivativeOf,
edm:currentLocation…
16. Properties for the web resource
One or more digital representations of the provided cultural heritage
object.
dc:description
dc:format
dc:rights
dc:source
dcterms:conformsTo
dcterms:created
dcterms:extent
dcterms:hasPart
dcterms:isFormatOf
dcterms:isPartOf
dcterms:issued
edm:isNextInSequence
edm:rights
18. Contextual classes
Representing (real-world) entities related to a provided object
as fully fledged resources, not just strings
edm:Agent
foaf:name
skos:altLabel
rdaGr2:biographicalInformation
rdaGr2:dateOfBirth….
skos:Concept
skos:prefLabel
skos:altLabel
skos:broader
skos:definition….
edm:TimeSpan
skos:prefLabel
dcterms:isPartOf
edm:begin
edm:end….
edm:Place
wgs84_pos:lat
wgs84_pos:long
skos:prefLabel
dcterms:isPartOf….
19. Example of a CHO with two contextual classes
edm:ProvidedCHO
[identifier for "real" object]
dc:creator
edm:Agent
[identifier for person resource]
"Darwin, Charles"
"12-02-1809"
"12-04-1882"
dc:subject
skos:Concept
[identifier for subject resource]
"Evolution"@en
"Évolution"@fr
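The diagram on this slide can be approximated as a handful of subject, predicate, object statements. This is a rough sketch only: the `ex:` identifiers are hypothetical placeholders for the bracketed identifiers on the slide, and the choice of foaf:name, rdaGr2:dateOfBirth, rdaGr2:dateOfDeath and skos:prefLabel to carry the literal values is an assumption, since the slide does not label those arcs.

```python
# Hypothetical placeholder identifiers; the slide only shows "[identifier for ...]".
CHO = "ex:cho/1"
PERSON = "ex:agent/1"
TOPIC = "ex:concept/1"

# Which property carries each literal is an assumption made for this sketch.
triples = [
    (CHO, "rdf:type", "edm:ProvidedCHO"),
    (CHO, "dc:creator", PERSON),
    (CHO, "dc:subject", TOPIC),
    (PERSON, "rdf:type", "edm:Agent"),
    (PERSON, "foaf:name", "Darwin, Charles"),
    (PERSON, "rdaGr2:dateOfBirth", "12-02-1809"),
    (PERSON, "rdaGr2:dateOfDeath", "12-04-1882"),
    (TOPIC, "rdf:type", "skos:Concept"),
    (TOPIC, "skos:prefLabel", ("Evolution", "en")),
    (TOPIC, "skos:prefLabel", ("Évolution", "fr")),
]


def objects(subject, predicate):
    """All objects of statements matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subject_labels(cho, lang):
    """Follow dc:subject to each concept and pick its labels in one language."""
    labels = []
    for concept in objects(cho, "dc:subject"):
        for text, tag in objects(concept, "skos:prefLabel"):
            if tag == lang:
                labels.append(text)
    return labels
```

Because the creator and subject are resources rather than strings, a lookup such as `subject_labels(CHO, "fr")` can follow the link and choose a language, which a flat string value would not allow.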
21. How do users access Europeana
content?
Europeana aims to provide content in the users’ workflow –
where they want it, when they want it.
User focused channels: Europeana.eu portal, social media
exports
For programmers: API, search widget, semantic mark up, LOD
pilot
22. Europeana’s infrastructure is open for re-use
Europeana data available via
API
Search widgets
Semantic mark-up (schema.org) on portal
Linked Open Data pilot
http://pro.europeana.eu/api
http://data.europeana.eu
23. Some (approximate) numbers
Europeana database – 30 Million objects
LOD pilot – a subset of 20 Million objects
• contained nearly 1 billion explicit RDF statements
• 4 billion once you do all the RDF reasoning (sub-properties,
sub-classes, etc.) in OWLIM
• Ontotext has already loaded a chunk of data and is working on
the update of it, in Europeana Creative.
24. Possible benchmarking queries?
Queries for exploring the dataset
• e.g. to generate the complete ordered list of Europeana aggregators and
the data providers they gather
Queries for exploring the objects
• e.g. a list of works with a matching location/creator/title
• Simple graph traversal
Expressing EDM constraints (that cannot be done in OWL)
• Can RDF validation help, e.g. where at least one of two properties must be
present (title or description)?
Queries to assist in data quality improvement
• Broken links, duplicates (or near duplicates), missing mandatory
properties, missing thumbnails, etc.
For Information: We are starting a data quality task force if you are interested!
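As an illustration of the first exploration query, the ordered list of aggregators and the data providers they gather could be computed over in-memory records as follows. This is a sketch under assumptions: each aggregation is a plain dict, edm:provider is read as the aggregator and edm:dataProvider as the original institution, and the function name is hypothetical. A real benchmark would express this as a query against the triple store instead.

```python
from collections import defaultdict


def providers_by_aggregator(aggregations):
    """Return [(aggregator, [data providers...])], both levels sorted by name.

    Assumes each aggregation dict has "edm:provider" and "edm:dataProvider".
    """
    grouped = defaultdict(set)
    for aggregation in aggregations:
        grouped[aggregation["edm:provider"]].add(aggregation["edm:dataProvider"])
    return [(name, sorted(grouped[name])) for name in sorted(grouped)]


# Invented sample records, for illustration only.
sample = [
    {"edm:provider": "Aggregator B", "edm:dataProvider": "Museum Z"},
    {"edm:provider": "Aggregator A", "edm:dataProvider": "Library Y"},
    {"edm:provider": "Aggregator A", "edm:dataProvider": "Archive X"},
    {"edm:provider": "Aggregator A", "edm:dataProvider": "Library Y"},
]
listing = providers_by_aggregator(sample)
```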
25. Useful links
Europeana portal europeana.eu
Europeana Professional pro.europeana.eu
• EDM documentation http://pro.europeana.eu/edm-documentation
• Europeana API http://www.europeana.eu/portal/api-introduction.html
• LOD pilot http://data.europeana.eu
Data Quality task force – dimitra.astidis@kb.nl
Europeana Professional blog pro.europeana.eu/blog
Facebook facebook.com/Europeana
Twitter twitter.com/EuropeanaEU
Europeana Thought Lab pro.europeana.eu/thoughtlab/
Europeana end-user blog blog.europeana.eu/
28. EDM design requirements
Compatibility with different levels of description
• Allow different levels of granularity
• A book, a page, a detail of an image
Standard metadata format that can be specialized
• Allow the specification of domain specific application
profiles
• Enable the re-use of existing standards
• Allow the extension of the initial model
29. EDM basis
OAI ORE (Open Archives Initiative Object Reuse & Exchange) for
organizing an object’s metadata and digital representation(s)
Dublin Core for descriptive metadata
SKOS (Simple Knowledge Organization System) for conceptual
vocabulary representation
CIDOC-CRM for the modeling of event and relationships between
objects
Use the Semantic Web representation principles
• RDF
• Re-use and mix different vocabularies together
• Preserve original data and still allow for interoperability
31. Two providers and two aggregations (the same object)
aggregation of DMF
aggregation of Louvre
provenance
metadata
provenance
metadata
Cultural heritage object
Structured as a network of participants in development and innovation work.
At a working level, we operate in a network of aggregators. Aggregators are important because they share a background with the organisations whose content they bring together, so there is close understanding. The aggregation model enables Europeana to collect huge quantities of data from thousands of providers through only a handful of channels.
Les Miserables: Victor Hugo’s handwritten manuscripts: http://www.europeana.eu/portal/record/9200103/5372912AF66AB529E188218BC1F747E75EB1A18F.html
BnF, public domain
Matisse ‘53 in the form of a double helix’ http://www.europeana.eu/portal/record/9200104/F8D60AB9136C8A59B59DF1CFEC278A6CABA8B0C6.html
The Wellcome Library (CC-BY-NC-ND)
‘söprűtánc’ – Hungarian traditional dance http://www.europeana.eu/portal/record/08901/E1A7B01BE4AED87FD239672F4F3941F52262D6B2.html
Hungarian Academy of Sciences Institute for Musicology, public domain
‘Neurologico reggae’ Music album http://www.europeana.eu/portal/record/08901/ADC241BCBF8470988DBA6EEAFCF13F14D88E5534.html
DISMARC – EuropeanaConnect Paid Access
‘Castle of Kavala’ 3D exploration of a Greek castle http://www.europeana.eu/portal/record/2020703/05607B24D15BD516EE2B765F74CDA39C7427F7FB.html
Cultural and Educational Technology Institute - Research Centre Athena, CARARE, CC-BY-NC-ND
Example used is:
http://preview.europeana.eu/portal/record/90402/174D436CF5C61F8AA999090C98DA48B9C7024087.html
Een vrouw met een kind in een kelderkamer by Pieter de Hooch, Rijksmuseum, public domain
We had seven requirements (these are five). We had started with a flat, DC-type metadata standard, which was a common denominator for all the different practices and standards of the providers. Now that we were moving on, we wanted a more sophisticated model that allowed us to:
Accommodate different data models with different structures
Accommodate domain specific requirements
Keep the semantics of the original data
Semantic layer provides more context to the object
Links to related entities (people, places etc)
Allow the representation of other specific relationships
“Aboutness” of the object
Similarities
more general links such as general part-whole relation, citation, direct versioning links
Red -> for providers and Europeana
Green -> for Europeana – to allow for duplication and enrichment
This diagram shows the three core classes and the relationship between them.
The ProvidedCHO is the “real thing” as it exists in the real world, the Mona Lisa for example.
The Web Resource is the digital representation of the ProvidedCHO and is the resource that is accessible from Europeana.
The aggregation is the construct that links these objects to make a logical whole.
Each of these has a defined set of properties that can be attached to it.
Properties that relate to the aggregation, notably the data about where the data comes from and the identifiers of the real thing and its digital representations.
Has the most descriptive properties (is backward compatible with ESE): many dc properties and more EDM ones for describing the object and its relationships to other entities.
Some mandatory elements – DQ task force focussing on this.
dc:title or dc:description
dc:language for text objects
dc:subject or dc:type or dc:coverage or dcterms:spatial
edm:type
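The mandatory elements listed above mix plain requirements, "one of" groups and a conditional rule, which is the kind of constraint that is awkward to state in OWL. A minimal sketch of the same rules as a procedural check follows. It assumes the description of a ProvidedCHO is a plain dict and that text objects are marked with edm:type "TEXT"; the function names and message strings are hypothetical.

```python
def _has_any(record, names):
    """True if at least one of the named properties has a non-empty value."""
    return any(record.get(name) for name in names)


def provided_cho_problems(record):
    """List the mandatory-element rules that a dict-based record breaks."""
    problems = []
    if not _has_any(record, ("dc:title", "dc:description")):
        problems.append("needs dc:title or dc:description")
    if not _has_any(record, ("dc:subject", "dc:type", "dc:coverage", "dcterms:spatial")):
        problems.append("needs dc:subject, dc:type, dc:coverage or dcterms:spatial")
    if not record.get("edm:type"):
        problems.append("needs edm:type")
    elif record["edm:type"] == "TEXT" and not record.get("dc:language"):
        # Assumption: text objects are identified by edm:type == "TEXT".
        problems.append("text object needs dc:language")
    return problems
```

Returning a list of messages rather than a single boolean keeps every broken rule visible, which suits data quality reporting better than a pass or fail flag.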
Red -> for providers and Europeana
Green -> for Europeana – to allow for duplication and enrichment
Poisonous nature exhibition includes content from Europeana, http://poisonousnature.biodiversityexhibition.com/
Europeana Fashion portal will go live in May 2013
Data is available as ‘data dumps’ for Linked Open Data initiatives from data.europeana.eu.
Europeana's move to CC0 is a step change in open data access. Releasing data from across the memory organisations of every EU country sets an important new international precedent, a decisive move away from the world of closed and controlled data.
Note that previews can only be used in accordance with the rights information displayed next to them.
HISPANA and Partage Plus both use the Europeana API to include Europeana search results on their own websites
These are the final two requirements.
And here we have both examples: two providers of the same object but with different metadata. So there are two aggregations about the same object, and the concept of a proxy is introduced in order to keep the different sets of metadata apart.
Proxies are not something that providers necessarily need to worry about, but they are something we in Europeana need in order to fulfil the requirement that metadata from different providers can co-exist for the same CHO. The proxy is also an entity that will allow Europeana to enrich data by creating our own metadata based on provider metadata. This is quite important for libraries, as there are sets of data out there that can be used, for example VIAF for name authorities. By creating our own proxy, with our own metadata, we can add these links to the provided metadata.
Nightmare slide: the Europeana aggregation aggregates both providers' aggregations. We also have our own proxy with our enhanced metadata added, here shown using VIAF as a unique identifier and SKOS to give two language versions of the creator name.
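The proxy idea can be sketched as one small record per (object, aggregation) pair, so that each party's statements about the same CHO stay separate. This is only an illustration: the field names mirror the OAI-ORE terms ore:proxyFor and ore:proxyIn, while the identifiers, the titles and the VIAF placeholder are invented for the example and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    proxy_for: str  # the cultural heritage object being described (ore:proxyFor)
    proxy_in: str   # the aggregation this description belongs to (ore:proxyIn)
    metadata: dict = field(default_factory=dict)


def statements_about(cho, proxies):
    """Group the metadata about one object by the aggregation that supplied it."""
    return {p.proxy_in: p.metadata for p in proxies if p.proxy_for == cho}


# Hypothetical identifiers and values, for illustration only.
CHO = "ex:cho/1"
proxies = [
    Proxy(CHO, "ex:aggregation/dmf", {"dc:title": "Mona Lisa"}),
    Proxy(CHO, "ex:aggregation/louvre", {"dc:title": "La Joconde"}),
    # Europeana's own proxy carries the enrichment, such as a link to VIAF.
    Proxy(CHO, "ex:aggregation/europeana", {"dc:creator": "viaf:PLACEHOLDER"}),
]
views = statements_about(CHO, proxies)
```

Neither provider's title overwrites the other, and the enrichment lives in its own proxy, so the provided metadata is preserved exactly as delivered.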