Challenges in the
Search of European
Cultural Heritage
Mónica Marrero
Search Engines Amsterdam, 22 February 2019
What is Europeana?
● Europeana is the European Commission's digital platform for cultural heritage.
● Europeana aggregates digital collections from libraries, museums and archives
around Europe, offering that information through its digital platform
● Through Europeana, citizens and the Cultural and Creative Industries can access
European culture for the widest possible variety of purposes.
2 / 28
Europeana Timeline
● 2005 idea: “Virtual European library, to make Europe's cultural heritage
accessible for all”, Jacques Chirac
● 2008 prototype: European Digital Library Network (EDLnet)
○ 4.5M objects
● 2009 Europeana v1.0
● 2019 Europeana
○ ~ 58M objects
○ ~ 3700 institutions
○ ~ 40 languages (although the EU has only 24 official languages)
3 / 28
Europeana Contents
● Collections: books, newspapers, journals, letters, diaries, archival papers,
paintings, maps, drawings, photographs, music, spoken word, radio broadcasts,
films, newsreels, television, fashion, sculpture, 3D objects, etc.
● Aggregation:
○ As a rule, content is served by the institutions themselves
○ Digital objects are not stored: only metadata and thumbnails (exceptions:
user-generated content in the World War I collection and the digitised
newspaper collection)
○ Digital objects are accessed directly on the data provider’s platform
4 / 28
Europeana Contents II
5 / 28
Why is this useful for
cultural institutions?
● Bring synergies
○ Metadata collected by
institutions called
aggregators
○ Search platform
○ Re-use options
○ Clear Copyright
● Wider audience
6 / 28
United Kingdom
Wellcome Collection
Snell, A practical guide
to the examination
Technical Context
Diversity
● Different types of objects: image, video, audio, text
● Different topics: fashion, art, maps, manuscripts, etc.
● Different ways in which institutions describe those objects
How to make these data work together?
The Europeana Data Model (EDM)
8 / 28
Europeana Data Model
● Follows Linked Open Data principles:
○ Open RDF-based model:
https://pro.europeana.eu/page/edm-documentation
○ Reuse of existing vocabularies
○ Linked to external data
● Supports the representation of metadata about:
○ The object
○ Its digital representations
○ Its provider
9 / 28
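As a rough illustration of the three kinds of metadata listed above (the object, its digital representations, its provider), the sketch below models one record as plain Python objects and emits subject-predicate-object triples. The class and property names (edm:ProvidedCHO, edm:WebResource, ore:Aggregation, edm:aggregatedCHO, edm:hasView, edm:dataProvider) come from the public EDM documentation rather than from these slides, and every example.org URI is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class ProvidedCHO:
    """The cultural heritage object itself."""
    uri: str
    title: str
    creator: str


@dataclass
class WebResource:
    """One digital representation of the object."""
    uri: str


@dataclass
class Aggregation:
    """Links the object to its digital representations and its provider."""
    uri: str
    cho: ProvidedCHO
    data_provider: str
    views: List[WebResource] = field(default_factory=list)

    def triples(self) -> List[Triple]:
        out = [
            (self.cho.uri, "rdf:type", "edm:ProvidedCHO"),
            (self.cho.uri, "dc:title", self.cho.title),
            (self.cho.uri, "dc:creator", self.cho.creator),
            (self.uri, "rdf:type", "ore:Aggregation"),
            (self.uri, "edm:aggregatedCHO", self.cho.uri),
            (self.uri, "edm:dataProvider", self.data_provider),
        ]
        for view in self.views:
            out.append((self.uri, "edm:hasView", view.uri))
            out.append((view.uri, "rdf:type", "edm:WebResource"))
        return out


# Hypothetical record; the URIs are placeholders, not real Europeana identifiers.
cho = ProvidedCHO("http://example.org/item/1", "Nachtwacht", "Rembrandt van Rijn")
agg = Aggregation(
    "http://example.org/aggregation/1",
    cho,
    "Rijksmuseum",
    [WebResource("http://example.org/image/1.jpg")],
)
for triple in agg.triples():
    print(triple)
```

Keeping the provider and the digital representations on the aggregation, not on the object, is what lets several institutions describe the same object without overwriting each other.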
Process
[Diagram] Data Providers → Aggregators → Europeana (metadata)
Europeana: VALIDATION → NORMALIZATION → ENRICHMENT
EDM metadata → Database → Record API
Flattened EDM metadata → Search Engine (Solr 6) → Search API
Portal
10 / 28
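The "flattened EDM metadata" fed to the search engine can be pictured with a small sketch. The slides do not say how the flattening is done, so the nested record layout and the dotted field names below are assumptions chosen only to show the idea: a nested, language-keyed record becomes a single level of multi-valued fields that a search index can store.

```python
def flatten_record(record):
    """Flatten a nested record into {dotted.path: [values]}."""
    flat = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for item in node:
                walk(item, path)
        else:
            # Leaf value: collect it under its full path.
            flat.setdefault(path, []).append(node)

    walk(record, "")
    return flat


# Hypothetical nested record; the structure is illustrative, not Europeana's schema.
record = {
    "proxy": {
        "dc:creator": {"def": ["Rembrandt van Rijn"]},
        "dc:date": {"def": ["1642"]},
        "dc:type": {"nl": ["Schilderij"], "en": ["painting"]},
    },
    "aggregation": {"edm:dataProvider": ["Rijksmuseum"]},
}

flat = flatten_record(record)
for name, values in sorted(flat.items()):
    print(name, values)
```

Keeping the language key in the field name is one way to allow language-specific analysis later on.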
Agence de presse Mondial Photo-Presse.
France, Public Domain
1932, National Library of France
Tournoi royal de motos à Londres :
changement d'une roue de side-car en marche
Challenges I:
Enrichment
From the metadata provided...
Provided Object
● provider: Rijksmuseum
● title: Schutters van wijk II onder leiding van kapitein Frans Banninck Cocq, bekend als de ‘Nachtwacht’
● author: Rembrandt van Rijn
● date: 1642
● type: Schilderij
13 / 28
...to new metadata
Provided Object
● provider: Rijksmuseum
● title: Schutters van wijk II onder leiding van kapitein Frans Banninck Cocq, bekend als de ‘Nachtwacht’
● author: Rembrandt
● date: 1642
● type: Schilderij
New metadata
● Amsterdam [coord]
● second quarter 17th century
● Rembrandt van Rijn [date birth]
● painting
● Schutters van wijk II led by Captain Frans Banninck Cocq, known as the 'Night Watch'
14 / 28
Target Resources to Enrich from
● edm:Agent: foaf:name, skos:altLabel, rdaGr2:biographicalInformation, rdaGr2:dateOfBirth
● skos:Concept: skos:prefLabel, skos:altLabel, skos:broader, skos:related, skos:definition, …
● edm:TimeSpan: skos:prefLabel, dcterms:isPartOf, edm:begin, edm:end, …
● edm:Place: wgs84_pos:lat, wgs84_pos:long, skos:prefLabel, skos:note, dcterms:isPartOf, …
Sources shown: Photo Consortium, Semium Time
15 / 28
Source Fields used
● edm:Agent
○ From: dc:creator, dc:contributor
● skos:Concept
○ From: dc:subject, dc:type
● edm:TimeSpan
○ From: dc:date, dcterms:temporal, dcterms:created, edm:year
● edm:Place
○ From: dc:coverage, dcterms:spatial
16 / 28
Process
● Target resources (Photo Consortium, Semium Time): manual selection of subsets and transformation to the Europeana Data Model (RDF) → Europeana Entity Collection
● Source fields: normalization rules (e.g. remove role from dc:creator) → normalized content
● Normalized content is matched against the Europeana Entity Collection by exact string matching with a label in the same language (if defined)
17 / 28
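The two automatic steps above (a normalization rule, then exact string matching against labels in the same language) can be sketched as follows. This is a minimal sketch, not Europeana's implementation: the entity identifiers, the label table and the list of role words are hypothetical, and falling back to labels in any language when no language tag is given is an assumption about what "if defined" means.

```python
import re

# Hypothetical entity collection: (language, lowercased label) -> entity id.
ENTITY_LABELS = {
    ("nl", "rembrandt van rijn"): "agent/rembrandt",
    ("en", "rembrandt"): "agent/rembrandt",
    ("nl", "schilderij"): "concept/painting",
    ("en", "painting"): "concept/painting",
}

# Normalization rule: drop a trailing role (and anything after it) from a creator value.
# The role words listed here are illustrative only.
ROLE_PATTERN = re.compile(
    r"\s*[,(]\s*(painter|schilder|author|photographer)\b.*$", re.IGNORECASE
)


def normalize(value):
    """Remove a trailing role such as ', painter' and collapse whitespace."""
    value = ROLE_PATTERN.sub("", value)
    return " ".join(value.split())


def enrich(value, lang=None):
    """Return the matching entity id, or None if there is no unambiguous exact match."""
    label = normalize(value).lower()
    if lang is not None:
        # Language tag defined: only labels in that language may match.
        return ENTITY_LABELS.get((lang, label))
    # No language tag (assumed fallback): accept the match only if a single entity has it.
    matches = {eid for (_, lab), eid in ENTITY_LABELS.items() if lab == label}
    return matches.pop() if len(matches) == 1 else None


print(normalize("Rembrandt, painter, born in July 15, 1606"))
print(enrich("Rembrandt van Rijn, painter", lang="nl"))
print(enrich("Schilderij", lang="en"))
```

Exact matching keeps precision high at the cost of recall: any spelling variant missing from the label table simply produces no enrichment.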
Benefits
● Enhance the retrieval experience of
the user
○ More data to retrieve from
○ Multilinguality
○ Less ambiguity: help user to contextualize objects
○ Entity Pages increase browsing options: users can jump from one object to
others sharing common entities
18 / 28
Issues: Source and Rules
● Metadata that is missing, wrong, lacking a clear format, or includes properties
that mislead automatic processing
○ E.g. dc:creator: Rembrandt, painter, born in July 15, 1606
○ E.g. Non-standardized date formats
○ E.g. dc:coverage: Wien, 20th century (could be provided as more precise
values with dcterms:spatial and dcterms:temporal)
● Ambiguity: two entities with same mention
○ E.g. Córdoba: city in Spain or Argentina?, Madrid: city or province?
● Cross-lingual ambiguity: wrong enrichments if no language tag
○ E.g. Inde (French) → India; Inde (Latvian) → poison
19 / 28
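Both kinds of ambiguity can be reproduced with a toy label table. The sketch below is illustrative only: the entity identifiers are invented, and the table contains just the examples named on this slide. It shows that a language tag resolves the cross-lingual case (Inde) but not the same-language case (Córdoba).

```python
# Hypothetical label table: (language, label, entity id).
LABELS = [
    ("fr", "Inde", "place/india"),
    ("lv", "inde", "concept/poison"),
    ("es", "Córdoba", "place/cordoba-spain"),
    ("es", "Córdoba", "place/cordoba-argentina"),
]


def candidates(mention, lang=None):
    """Return all entity ids whose label equals the mention, optionally in one language."""
    key = mention.casefold()
    return sorted(
        entity
        for label_lang, label, entity in LABELS
        if label.casefold() == key and (lang is None or label_lang == lang)
    )


print(candidates("Inde"))             # no language tag: two unrelated entities
print(candidates("Inde", lang="fr"))  # language tag: one entity
print(candidates("Córdoba", lang="es"))  # still ambiguous within one language
```

Resolving the remaining case needs extra context (for example the country of the data provider), which exact string matching alone does not use.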
Issues: Target Resources
● Coverage and quality of target resources
○ E.g. many more resources in English than in Albanian...
○ E.g. Germania [18th century] is not in Geonames
● Domain and granularity selection
○ E.g. paper in cultural heritage is not the same as in environmental science
○ E.g. enriching with the concept culture may not help...
● Keeping in sync with target resources
20 / 28
Challenges II:
Multilinguality
14/6/1914, concours de cycles nautiques
https://www.europeana.eu/portal/record/9200518/ark__12148_btv1b53115115j.html
Bibliothèque nationale de France
Current approach
● keywords → search → results:
○ Doc ranked 1: French
○ Doc ranked 2: Spanish
○ Doc ranked 3: Polish
○ Doc ranked 4: Polish
○ Doc ranked 5: Dutch
○ Doc ranked 6: English
● Search THE SAME KEYWORDS in all languages
22 / 28
Issues
● User: “I search ‘India’ in French, why do I find documents in Latvian?”
● keywords → search → results:
○ Doc ranked 1: French
○ Doc ranked 2: Spanish
○ Doc ranked 3: Polish
○ Doc ranked 4: Polish
○ Doc ranked 5: Dutch
○ Doc ranked 6: English
23 / 28
Issues II
● User: “What keyword should I enter to find paintings? I don’t know how […] and I don’t care!”
● keywords → search → results:
○ Doc ranked 1: French
○ Doc ranked 2: Spanish
○ Doc ranked 3: Polish
○ Doc ranked 4: Polish
○ Doc ranked 5: Dutch
○ Doc ranked 6: English
24 / 28
Towards Cross-Lingual IR
● Input: query translation
● Metadata search: enough multilingual data, language tags, analysis by language
● Output: translation of results
● keywords → search → results:
○ Doc ranked 1: French
○ Doc ranked 2: Spanish
25 / 28
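The difference between the current approach (the same keywords searched in every language) and a cross-lingual one (query translation plus language tags) can be sketched with a toy index. Everything here is hypothetical: the documents, the translation table standing in for a real translation service, and the choice to skip languages the query cannot be translated into.

```python
import re

# Toy index: (doc id, language tag, text).
DOCS = [
    ("d1", "fr", "carte de l'inde"),
    ("d2", "lv", "inde"),
    ("d3", "en", "map of india"),
    ("d4", "es", "mapa de la india"),
]

# Hypothetical translation table: (source language, keyword) -> {target language: keyword}.
TRANSLATIONS = {
    ("fr", "inde"): {"fr": "inde", "en": "india", "es": "india"},
}


def tokens(text):
    return re.findall(r"\w+", text.lower())


def search_same_keywords(query):
    """Current approach: match the literal keyword in documents of any language."""
    return [doc_id for doc_id, _, text in DOCS if query.lower() in tokens(text)]


def search_cross_lingual(query, query_lang):
    """Translate the query per document language and match only within that language."""
    translated = TRANSLATIONS.get((query_lang, query.lower()), {})
    hits = []
    for doc_id, doc_lang, text in DOCS:
        keyword = translated.get(doc_lang)
        if keyword is not None and keyword in tokens(text):
            hits.append(doc_id)
    return hits


print(search_same_keywords("Inde"))        # French and Latvian documents
print(search_cross_lingual("Inde", "fr"))  # French, English and Spanish documents
```

The cross-lingual variant only works if documents carry reliable language tags, which is why metadata quality comes first among the open problems.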
Wrapping Up
Funambulista
https://www.europeana.eu/portal/record/2022717/bnesearch_detalle_bdh0000020380.html
Anonymous, 19th century. National Library of Spain
Main Battles
● Quality of (meta)data
● Quality of enrichment
● Cross-lingual approach
● Content retrieval
○ Challenge from a search perspective
○ Challenge from a Human Interaction perspective
● Evaluation!
27 / 28
Thanks!
