A presentation by Luke McKernan, Lead Curator News & Moving Image at the British Library, for the workshop 'Working with News Data across Different Media', 7 September 2015
News data at the British Library
1. News data at the British Library
Luke McKernan
Lead Curator, News and Moving Image
Working with news data across different media
7 September 2015
2. www.bl.uk 2
Map of news stories in the UK as read via Twitter (created using bit.ly links), Guardian Datablog, 16 May 2012
Changing news
3. www.bl.uk 3
Moving from a world-class newspaper service to a
world-class news service
Newspapers, television, radio and Web news
This reflects the significant changes in news production
and consumption taking place today, but it also reflects
how news has always been consumed
News does not exist in any one form. It is sought out and
selected by its users, from the multiple forms of
information on offer
A change in how we manage news data is an essential
part of how to deliver such change
“News is information of current interest for a specific
audience”
News content strategy
The Newcastle Courant, The Huffington
Post, Today, Al Jazeera English
4. www.bl.uk 4
Newspapers
The UK national collection
34,000 newspaper titles: approximately 60M issues or
450M individual pages, from 17thC to present day
Current acquisition: 1,500 daily or weekly titles
Print copies are acquired under legal deposit, but
acquisition will move increasingly towards digital
Physical access at Newsroom and Boston Spa
Online access to 11M pages via British Newspaper
Archive (http://www.britishnewspaperarchive.com)
Approximately a third of the collection has microfilm
access copies; around 2.5% has been digitised so far
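As a rough cross-check of the figures above, the short sketch below works the percentage through. It assumes the 2.5% figure is measured against the 450M page estimate, which the slide does not state explicitly; the variable names are illustrative only.

```python
# Figures quoted on the slide (both approximate)
TOTAL_PAGES = 450_000_000   # estimated individual pages in the collection
DIGITISED_PER_MILLE = 25    # 2.5% expressed per thousand, to stay in integers

# Assumption: the 2.5% digitised figure is counted in pages, not titles or issues
digitised_pages = TOTAL_PAGES * DIGITISED_PER_MILLE // 1000

print(f"Digitised so far: about {digitised_pages / 1_000_000:.2f}M pages")
```

Under that assumption the result is of the same order as the 11M pages quoted for the British Newspaper Archive.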
British Newspaper Archive
5. www.bl.uk 5
Television and radio news
Began recording television and radio news
programmes receivable in the UK in May 2010
Collection of over 60,000 programmes, recorded off-air
from 20 channels inc. BBC, Al Jazeera, Russia Today,
CNN, CCTV (China), NHK, Bloomberg, France 24,
World Service, LBC
30 hours of TV and 22 hours of radio captured per day
Born digital archive, including Electronic Programme
Guide data and subtitles where available
Access onsite only, owing to copyright restrictions, via
Broadcast News service
Broadcast News
6. www.bl.uk 6
Web news
Non-print legal deposit legislation introduced in April
2013 means British Library can start harvesting UK
websites
First annual crawl collected 4.5M .uk websites and web
pages – collection now amounts to around 3Bn digital
assets
Harvesting c.1000 UK news websites (newspapers and
web-only sites e.g. hyperlocals) on daily/weekly basis,
from end of 2013, with another 500 to be added soon
Access onsite only at British Library and other Legal
Deposit libraries
Also Open UK Web Archive, smaller collection of
selected websites, openly available at
http://www.webarchive.org.uk
UK Web Archive
7. www.bl.uk 7
Our news research services
Explore.bl.uk, The Newsroom, Boston Spa reading room,
British Newspaper Archive, UK Web Archive, Broadcast News
8. www.bl.uk 8
News data
2M 19thC British newspaper pages – XML, images
UK television news data 2010 onwards – EPG data for
45,000 programmes, subtitles (XML) for c.25,000
programmes, some speech-to-text files for 2011
broadcasts (XML)
UK radio news data 2010 onwards – EPG data for
15,000 programmes, some speech-to-text files for
2011 broadcasts (XML)
Financial Times – four years of content (1888, 1939,
1966, 1991) – XML, images
Web news selection – possibly
Financial Times, 1893 and 2008
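To give a sense of how subtitle data held as XML might be worked with, here is a minimal sketch using Python's standard library. The element and attribute names (`programme`, `subtitle`, `begin`) and the sample content are invented for illustration; they are not the British Library's actual schema, which the slides do not describe.

```python
import xml.etree.ElementTree as ET

# Hypothetical subtitle file: the element and attribute names below are
# invented for illustration and are NOT the British Library's actual schema.
SAMPLE_XML = """\
<programme channel="Example News" start="2011-05-01T18:00:00">
  <subtitle begin="00:00:05">Good evening.</subtitle>
  <subtitle begin="00:00:08">Here are tonight's headlines.</subtitle>
</programme>
"""

def subtitle_lines(xml_text):
    """Return (timestamp, text) pairs for each subtitle element."""
    root = ET.fromstring(xml_text)
    return [(s.get("begin"), (s.text or "").strip()) for s in root.iter("subtitle")]

lines = subtitle_lines(SAMPLE_XML)

# Join the subtitle text into a single searchable transcript string
transcript = " ".join(text for _, text in lines)
print(transcript)
```

Keeping the timestamps alongside the text would allow a search hit to be mapped back to a point in the programme, assuming the real files carry comparable timing information.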
9. www.bl.uk 9
Plans
All out-of-copyright UK newspapers on British
Newspaper Archive, issue-level data for research re-use,
covered by a single agreement, available through an
API. Possibly…
Title-level data for all newspapers we hold (34,000
titles) released as open data
More partner initiatives
Hackathon on 16 November 2015, to be followed by
other news data events in 2016
User-led development
BBC radio news script, 14/7/1969
10. www.bl.uk 10
Dreams
An open news dataset
An archive news data model
All British Library news records available at
issue level
Hyperlocal news sites: On the Wight,
The City Talking, A Little Bit of Stone
11. www.bl.uk 11
Questions
Copyright constraints limit use of much material to BL
premises – how can tools such as named entity
extraction work as a means to get round this?
How can print, web, television, radio news, and other
news media, be linked up together, and to other
resources, and how would this benefit research?
What research questions will we be able to support
through a greater focus on news data?
Is news data only for the specialist, or can more general
user-friendly applications be produced?
What can news archives learn from the management
tools for current news?
How can we help each other?
TV news idents
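One possible reading of the first question above is that derived data, such as names and how often they occur, could be extracted from restricted text and shared without the text itself. The sketch below illustrates that idea only. It uses a crude regular expression (runs of two or more capitalised words) as a stand-in for a real named entity extraction tool, the sample text is made up, and whether such output would satisfy copyright constraints is a legal question this sketch does not answer.

```python
import re
from collections import Counter

# Crude stand-in for a real named entity recogniser: runs of two or more
# capitalised words. This is an assumption made for illustration only.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")

def entity_counts(text):
    """Count candidate entity strings; the source text itself is not returned."""
    return Counter(CANDIDATE.findall(text))

# Made-up sample text standing in for a restricted news item
sample = (
    "Today the British Library holds news from many sources. "
    "Staff at the British Library record television news, "
    "and Boston Spa stores the print collection."
)

counts = entity_counts(sample)
print(counts.most_common())
```

The same counts could in principle be produced for print, web, television and radio items alike, which is one way the different media might be linked through shared entities.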