Presentation from WLIC 2013. Reports on a survey, conducted by the Europeana Newspapers project, of digitised newspaper collections in LIBER (European research) libraries.
The challenges of making Europe's newspapers available online
1. The challenge of making
digitised European newspaper
content available online
Susan Reilly, LIBER
Twitter: @skreilly
IFLA Newspapers, Singapore, Aug 2013
2. This project is partially funded under the ICT Policy Support Programme (ICT PSP)
as part of the Competitiveness and Innovation Framework Programme by the
European Community http://ec.europa.eu/ict_psp
Overview
Europeana Newspapers: making European
newspapers available online
Assessing the state of our digitised newspaper
collections
Where do we go from here
3. Europeana Newspapers: making European newspapers
available online
•Content from 20 countries! (13+7 new countries)
•Aggregation of more than 18 million newspaper pages into
Europeana
•Make newspapers more accessible by applying refinement
methods for OCR, OLR (article segmentation), named
entity recognition (NER), and class recognition
•Increase visibility via dedicated content browser
•Ensure sustainability by spreading best practice
4. Assessing the state of Europe's digitised newspaper
collections
•Who’s digitising newspapers?
•What percentage of newspaper
collections are digitised?
•How many pages?
•Quality of digitisation?
•How are images made available?
5. Findings: % of newspaper collections digitised
•Survey of LIBER members (400 European research
libraries)
•47 responses
• Does this indicate the number of institutions digitising
newspapers?
•Less than 10% of respondents’ collections digitised
• Compared to an average of 20% of total collections digitised
(Enumerate)
•130 million pages and 24,000 titles
• Not all libraries could provide exact figures because of the
cursory nature of their catalogues
6. Findings: 20th century content an issue
•Conservative approach to copyright
terms
•½ of respondents reported a cut-off
date beyond which they do not make
content available
• Earliest: 1863
• Latest: last 70 years
•Special arrangements with
publishers (23%)
•Collective rights agreements too
complex
7. Findings: How accessible are the collections?
•85% provide free access
• Sometimes only at national level
•Some charge subscription fees or provide access under licence
8. Findings: How rich is the content?
•36% employ no OCR
•50% of those who did were not confident
enough in the results to expose OCR’d
text via a search interface
•36% apply zoning and segmentation
•Only 6% apply named entity recognition
•Huge variance in metadata
• Dublin Core only
• Own standards
9. Challenges
•Newspaper digitisation lags behind digitisation of other collections
•Copyright issues more complex
•Lack of quality evaluation
technologies for OCR
•Lack of standardised metadata
suited to newspapers
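One common way to put a number on OCR quality is the character error rate (CER): the edit distance between the OCR output and a manually corrected ground-truth transcription, divided by the length of the ground truth. The sketch below is only an illustration of that general measure, not the evaluation method used by the project; the function names and the sample strings are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_text: str) -> float:
    """CER = edit distance / number of characters in the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)


# Hypothetical sample: one misread character in a nine-character word.
cer = character_error_rate("newspaper", "newspapcr")
print(f"CER: {cer:.3f}")
```

A library could compute this on a small hand-corrected sample of pages to decide whether its OCR text is good enough to expose through a search interface.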
10. Solutions
•Standardised metadata mapped to EDM
•Quality evaluation technologies for OCR
•Clarity over rights issues
•Dialogue with publishers
•More funding for digitisation
• Increase visibility
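To illustrate what "metadata mapped to EDM" could look like in practice, the sketch below maps a library's own field names onto a handful of EDM and Dublin Core properties and reports any fields left unmapped. It is a minimal sketch and not the project's actual metadata profile: the local field names, the sample record and the `map_record` helper are all hypothetical, and only a few EDM properties are shown.

```python
from typing import Dict, List, Tuple

# Hypothetical local field names (left) mapped to EDM / Dublin Core properties (right).
LOCAL_TO_EDM: Dict[str, str] = {
    "title": "dc:title",
    "issue_date": "dcterms:issued",
    "language": "dc:language",
    "publisher": "dc:publisher",
    "landing_page": "edm:isShownAt",
    "rights_url": "edm:rights",
}


def map_record(local_record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return the mapped record plus the local fields that have no mapping yet."""
    mapped: Dict[str, str] = {"edm:type": "TEXT"}  # assumption: newspapers treated as text
    unmapped: List[str] = []
    for field, value in local_record.items():
        target = LOCAL_TO_EDM.get(field)
        if target is None:
            unmapped.append(field)
        else:
            mapped[target] = value
    return mapped, unmapped


# Hypothetical record using a library's own field names.
record = {
    "title": "Example Gazette",
    "issue_date": "1863-05-01",
    "language": "de",
    "shelfmark": "ZZ 123",
}
mapped, unmapped = map_record(record)
print(mapped)
print("Unmapped fields:", unmapped)
```

Listing the unmapped fields makes visible where an institution's own standard carries information that the shared model does not yet cover.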
11. Thank you for your attention!
http://www.libereurope.eu
http://www.europeana-newspapers.eu/