The document summarizes the British Library Labs project, which supports digital scholarship. It discusses how the Library works with digital scholars and researchers, providing digitized collections and expertise. Examples include text analysis tools developed using newspaper archives, creative competitions, and crowdsourcing projects tagging images and georeferencing maps. The Labs project aims to open up more collections, support new research methods, and engage researchers in experimenting with digital collections.
2. Supporting the Digital Scholar:
Experiences from the British Library Labs
Mahendra Mahey
Manager of British Library Labs
2014 NFAIS Annual Conference
Sunday 23rd February, 2014, 1600 – 1645 (EST)
3. Overview
• Structure of talk
• The British Library and a typical scholar
• The Nature of Digital and the Digital Scholar
• The British Library supporting Digital Scholarship
• Experiences of the Digital Research Team and British
Library Labs project in supporting digital scholarship
• Conclusions and questions
http://labs.bl.uk
#bl_labs
labs@bl.uk
4. The British Library
St Pancras, London, UK
Many books are stored 5 storeys below the building
Storage at Boston Spa
Uses low oxygen and robots
Inside the British Library
Space for 1200 readers, around 400,000 visitors per year
5. British Library Collections
> 150 million items
> 14 m books
> 60 m patents
> 8 m stamps
> 4 m maps
> 3 m sound recordings
> 1.6 m musical scores
> 0.3 m manuscripts
> 0.8 m serial titles
King’s Library
6. Our Scholar in Humanities…
• Travel routes in the 19th Century
Pieter Francois
Post doctoral researcher at University of Oxford
7. The Nature of Digital
Data broken down
recombined and
duplicated
Image: Tower of Babble, Book Sculpture by Brian Dettmer
8. The Digital Scholar
Open
Networked
Digital
From The Digital Scholar: How technology is transforming scholarly practice, Martin Weller, Bloomsbury Academic, 2011, page 4
A digital scholar is someone who employs digital, networked and open approaches to demonstrate their specialism.
Not necessarily a recognised academic or someone who posts online, just a specialist
9. “Reading individual
works is as irrelevant as
describing the
architecture of a building
from a single brick, or
the layout of a city from
a single church.”
- Franco Moretti
10. Example Digital research methods
Corpus analysis tools / Text Mining
Crowdsourcing / Human Computation
Location based searching
Using Application Programming Interfaces for datasets, e.g. Metadata, Images
Geotagging
Visualisations
Annotation
Transcribing
Natural Language Processing
http://labs.bl.uk/Launch+Event (presentations from researchers using digital research methods)
11. Digitisation at the British Library
12. Digitised Books
68,000 volumes digitised with Microsoft
17th, 18th and 19th Century
Temple at Nauvoo?
Natural History: The Hippopotamus
250,000 books digitised with Google
Image taken from page 144 of 'Philadelphia: the story of an American city ... Issued by the City of Philadelphia under the auspices of the Joint Special Committee of Councils on World's Columbian Exposition.'
16. Digitisation - Transforming access
Spreading the value of collections, content and expertise
Connecting as much as collecting, e.g. social media
Encouraging others to integrate our materials into their
services – and vice versa
18. Digital Scholarship Department
…become a leading centre of digital scholarship
… internationally recognised for innovation and
collaboration in support of research and
learning…
• The Digital Research Team
– Digital Curators
• Labs
19. What is a Digital Curator?
• Explore how digital technologies are re/shaping research and how this informs how the library does its business.
• Support staff across the library to identify the opportunities that digital tools and collections afford in modern scholarship and to gain the skills to engage confidently in this area.
• Partner with libraries and institutions to enable innovation in digital scholarship.
• No specific collection but rather expertise in digital scholarship, broadly defined.
Digital Curators: Stella Wisdom, Aquiles Alencar-Brayner, James Baker, Nora McGregor
20. Training Library Staff
Digital Scholarship Training Programme
• Behind the Screen: Basics of the Web
• What is Digital Scholarship?
• Digital Collections at British Library
• Digitisation at British Library
• Text Encoding Initiative & Annotation
• Geo-referencing and Digital Mapping
• Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions
• Foundations in working with Digital Objects: From Images to A/V
• Data Visualisation for Analysis in Scholarly Research
• Information Integration: Mash-ups, APIs and The Semantic Web
21. Opening up Digital content
• Picturing Canada: Mapping a Collection:
http://bit.ly/13GhLIe
http://commons.wikimedia.org/wiki/Commons:British_Library/Picturing_Canada
23. Creative with Wildlife Sounds
'Dave's Wild Life' by
Samuel de Ceccatty, won first prize!
http://vimeo.com/60401313
Sound Edit Wildlife Films
Competition 2013
http://goo.gl/s7siv0
http://sounds.bl.uk/Environment
24. Computer Games
Off the Map Competition 2013
http://youtu.be/SPY-hr-8-M0
Pudding Lane Productions, 6 second-year students,
De Montfort University, Leicester, won first prize.
Off the Map Gothic 2014 launches soon!
27. What is Labs…
[Diagram of how BL Labs engages its audience. Labels: Researchers; Developers; Research question / idea; Data; BL Digital Collection / Data; Other Digital Collection / Data; BL Labs; Competition; Events; Meetings and visits; Contact; Audience; Experimenting with our digital collections; Outputs from engagement; Case Studies; Publications; Data Driven Open Software; Tools & services to support Digital Scholarship]
28. Sample Labs Digital Collections
http://labs.bl.uk/Digital+Collections
• Copyright cleared for research use
• Curated (Is there someone who knows the ‘story’ about the collection?)
• Collection / Item Level Metadata available? (What state is it in and does it need cleaning?)
• Where is it?
Sample collections:
• Text-mining of electronic journals
• British National Bibliography
• Book ordering and anonymised reader data
• UK Web Archive Data
29. Engaging with Labs
Hack and Data days
1 Brainstorm ideas & group
2 Reflect, consider, and choose
3 Work late and show what has been done
Ideas Labs
Projects
Labs Data Cards
30. The winners of the Labs 2013 competition
Two entries chosen in June 2013
Pieter Francois (left) and Dan Norton (right)
They both worked in residence with Labs from July to October 2013 to complete their projects
and each received a cheque for £2000 in November 2013
as winners of the first British Library Labs Competition 2013
31. Sample Generator: representative samples
• Pieter Francois
• Focus on European travel in the
19th Century
• Uses statistical methods to
support text analysis
• Tool produces representative
samples of texts based on
search criteria
http://goo.gl/YFnZmu
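The slide does not describe how the Sample Generator draws its samples. As a rough illustration only, one common way to get a representative sample from a set of search hits is proportional stratified sampling. The sketch below assumes hypothetical catalogue records of the form (title, decade); the function name `representative_sample` and the data are invented for this example and are not taken from the tool itself.

```python
import random
from collections import defaultdict


def representative_sample(records, key, size, seed=0):
    """Proportional stratified sample (illustrative, not the Labs tool).

    Records are grouped by key(record); each group contributes a share of
    the sample proportional to its share of all records. Because quotas
    are rounded, the result can differ slightly from `size` in general.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    total = len(records)
    sample = []
    for name in sorted(strata):
        group = strata[name]
        quota = round(size * len(group) / total)
        sample.extend(rng.sample(group, min(quota, len(group))))
    return sample


# Hypothetical search hits: 60 titles from the 1820s, 40 from the 1850s.
hits = [(f"Travel book {i}", 1820) for i in range(60)]
hits += [(f"Guide book {i}", 1850) for i in range(40)]

sample = representative_sample(hits, key=lambda rec: rec[1], size=10)
```

With these invented hits, the 1820s titles make up 60% of the results, so they fill 6 of the 10 places in the sample and the 1850s titles fill the other 4.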
32. Mixing the Library:
The Disc Jockey & the Digital Collection
Annotation
http://www.tompro.co.uk
Preview ‘item’
‘Play back’ of ‘items’ (Blue)
and annotations (Yellow)
http://www.ablab.org/pd/di/
Selected ‘left’
channel ‘item’
Collection ‘stalks’ made of ‘items’. Each ‘item’ is a URL.
The order of the ‘items’ can be ‘shuffled’ and sent to the ‘left’ or ‘right’ channels
Selected ‘right’
channel ‘item’
Prototype design
Basic functioning prototype: http://212.71.253.54:8000/a
http://www.ablab.org/shetland
Living Lab: Library of the Future, see: http://alturl.com/284zw
33. Curatorial for Library metadata
India Office Select materials
Geo location
http://datatales.artefacto.org.uk/
Timeline
Slide show
34. Story of one digital collection
What can 68,000
books tell us?
Image: Artwork by Alicia Martin
35. Extracting Images from OCR
Digitisation
Optical Character Recognition
XML
Image snipped out algorithmically from XML
<?xml version="1.0" encoding="UTF-8" ?>
<mets:mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mets="http://www.loc.gov/METS/" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd info:lc/xmlns/premis-v2
Image taken from page 207 of 'London and its Environs. A
picturesque survey of the metropolis and the suburbs ...
Translated by Henry Frith. With ... illustrations'
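The slide only shows a METS header, so the exact schema that carries the image coordinates is not given here. As a minimal sketch of the idea of snipping images out using the OCR XML, the example below assumes an ALTO-like layout in which illustration regions are `Illustration` elements with `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes. The sample XML and the function name `illustration_boxes` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented, namespace-free sample in the style of ALTO OCR output.
# Real files normally declare an XML namespace, which would have to be
# included in the tag name passed to iter().
SAMPLE_OCR = """<alto>
  <Layout>
    <Page ID="P207" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="100" WIDTH="1800" HEIGHT="80"/>
        <Illustration ID="IL1" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="400"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def illustration_boxes(xml_text):
    """Return (left, top, right, bottom) boxes for each illustration region."""
    root = ET.fromstring(xml_text)
    boxes = []
    for element in root.iter("Illustration"):
        left = int(float(element.get("HPOS")))
        top = int(float(element.get("VPOS")))
        width = int(float(element.get("WIDTH")))
        height = int(float(element.get("HEIGHT")))
        boxes.append((left, top, left + width, top + height))
    return boxes


boxes = illustration_boxes(SAMPLE_OCR)
```

Each box is in the (left, top, right, bottom) order that image libraries such as Pillow expect when cropping, so the page scan could then be cropped to each box to produce the extracted images.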
36. Face Recognition of 19th Century Faces
The face-recognition algorithm worked
better for female faces than men’s
37. The Mechanical Curator
http://mechanicalcurator.tumblr.com
Image from ‘A Lost Estate’ by Mary E. Mann, Volume: 02, Page: 91, 1889, London, Bentley & Son
• #similar_to_77576796197_published_date
• #similar_to_77576796197_slantyness
• #similar_to_77576796197_bubblyness_x
• #similar_to_77576796197_bubblyness_y
• #new_train_of_thought
38. Flickr Commons – 1,020,418 images!
http://www.flickr.com/photos/britishlibrary/
Each image has a URL
Some metadata, but you can add tags!
Flickr has an API so researchers and developers can build apps and query the data
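As a small sketch of what querying the data could look like, the example below builds a request URL for Flickr's REST method `flickr.photos.search`, filtered by account and tags. The API key and account identifier are placeholders that you would need to supply yourself; the helper name `photo_search_url` is invented for this example, and no request is actually sent.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def photo_search_url(api_key, user_id, tags, per_page=50):
    """Build a flickr.photos.search URL for one account's tagged images."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,        # placeholder: your own Flickr API key
        "user_id": user_id,        # placeholder: the account's Flickr ID
        "tags": ",".join(tags),    # comma-separated list of tags
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,       # plain JSON rather than JSONP
    }
    return FLICKR_REST + "?" + urlencode(params)


url = photo_search_url("YOUR_API_KEY", "ACCOUNT_ID", ["map", "london"])
```

Fetching that URL with any HTTP client would return a page of matching photo records, which is the kind of building block an app or a research script could start from.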
39. Flickr in numbers
119,000,000 !!!
image views since launch December 13th, 2013
47,714 tags added
18,567 images favourited
Labs involved with 2 potential research projects & 4 grassroots crowdsourcing efforts.
40. Risks of releasing the images
Funny Books for Boys and Girls. Struwelpeter. Good-for-nothing Boys
and Girls. Troublesome Children. King Nutcracker and Poor Reinhold.
41. Opportunities
– increasing traffic to Library services
Grouping for image
Download .pdf
View the item in the Library Catalogue
All illustrations in book
You can purchase a ‘High Res’ Copy
Other illustrations in books published in same year
User generated tag
Tags auto generated
View in the Library Item Viewer
44. Other Labs stories….
• Augmenting news metadata
• Digital Music Lab, analysing music performances
• Opening up over 100,000 Playbills
• 3D printed objects representing statistical data with possibly
embedded USBs and RFID chips
• data.bl.uk, place for all our open data and digital collections
• Content next to parallel compute power, analysis at scale
• Seeking future funding!!
45. Competition 2014
• Open!!
• Deadline - 22 April 2014 – tell your friends!
• Residency between late May and end of October 2014
46. Conclusions
• Huge appetite for openly available digital content
• There needs to be a continuous dynamic interaction with
data and the researchers to formulate and reformulate
research questions
• Working with Digital Scholars creates new opportunities
• Content and service providers, researchers and technical
people need to talk to each other to create the new tools,
services and data needed to facilitate new discoveries
47. What excites you most about digital
scholarship?
1 Opening up digital content
2 New research methodologies opening up
new discoveries
3 New commercial opportunities
4 How technology is enabling new research
5 Thinking of new ideas to support digital
scholarship
48. Acknowledgements
Digital Curator Team
Stella Wisdom - Digital Curator
Nora McGregor - Digital Curator
James Baker - Digital Curator
Digital Scholarship Heads
Aly Conteh - Head of Digital Research and Curator Team
Adam Farquhar - Head of Digital Scholarship (Wrote Labs proposal)
Ben O’Steen - Labs Technical Lead
49. Email Labs
• Let us know your ideas for engaging with Labs!
• Questions? Speak to me at the Welcome Reception.
labs@bl.uk
The presentation is available to download from the above URL (please note that this is case sensitive). If you would like to tweet about it, please use the #bl_labs hash tag as well as #nfais14.
80 seconds
Hello everyone,
<click1>
my name’s Mahendra Mahey. I hope you have been enjoying the presentations from our esteemed speakers today. I realise that I am the last speaker of the day, standing between the Assembly Meeting and the Welcome Reception. However, I will do my best to re-energise you and keep you interested and engaged!
I have been working at the British Library for nearly a year and I <click2>manage a digital Lab, kindly funded by the Andrew W. Mellon Foundation. Its role is to ‘open up’ the digital content and data the British Library holds and to encourage researchers, particularly in the Arts and Humanities, to use it for their scholarly work and ideas. We are doing this by creating an innovative space in which to explore project ideas, to get researchers excited about what they can do with our data and, most importantly, to carry out research using our content and services.
150 seconds
<click1>
Now on to the structure of my talk.
<click2>
I will first give a very brief overview of the Library and then tell you a number of ‘stories’, mostly from a Humanities perspective, about how researchers did things in the past
<click3>
and how that is changing because of rapid developments in digital technology. With more and more digital content, data, tools and services being made available, researchers are able to ask questions they had never dreamed of before, share their findings openly and collaborate; some of them are becoming ‘digital’ scholars.
<click4>
I will then bring the story back to the British Library and how the digital scholar is changing the way we do things. I will move on to digitisation efforts across the British Library, give a whistle-stop tour of some of the incredible digital collections we now have and highlight some of the challenges we face given our historical origins and licensing and technical restrictions. Importantly, I will also try to address how we are tackling some of these challenges.
<click5>
I will outline the work of the Digital Scholarship department, created to support the changing research landscape, focusing particularly on the work of the Digital Research Team and that of British Library Labs, both of which sit in the same department. I will point out some of the surprising findings we have made, some of the lessons we have learned so far and what we are planning for the future.
<click6>
Finally, I will finish with some important ‘take away’ messages and ask you what excites you most about digital scholarship. Hopefully, if there is time, I will take a few questions too.
140 seconds
The British Library is the national library of the UK and one of the largest research libraries in the world.
<click1>
The Library moved to a new purpose-built building in 1997, the largest of its kind built in the UK in the 20th century. Many frequently used items are stored 5 storeys below the main building at St Pancras in London, and many might not know that part of the building is meant to look like a ship on a journey to discovery!
<click2>(yellow line appears)<pause 5 seconds> <click3>(yellow line disappears)<click4>
The building can seat 1,200 researchers at any one time across 5 reading rooms.
<click5>
Items requested on a medium or long term basis are held at Boston Spa in Yorkshire in a low oxygen warehouse, where robots retrieve them. In total, the library has 625 km of shelving, growing by 12 km every year. Some staff in IT infrastructure, cataloguing and document supply work in Yorkshire too.
Whilst we acquire items through purchase or gifts, much of the collection has been built up through legal deposit. That is, by law, a copy of every UK and Ireland print publication must be given to the British Library by its publishers. Around 3 million items are added per year. In 2013, legal deposit was extended to cover non-print material, so by law we now take in digitally published items as well. That means regular mass crawls of the entire UK web domain as well as ebooks, ejournals etc.
85 seconds
The picture you can see is inside the main building in London: it’s the King’s Library – King George the Third’s personal library, sometimes known as the ‘stack’! I walk past it every day, and I sometimes look at it in awe and am reminded that the collections the British Library holds are truly staggering! We currently estimate them to exceed
<click1>
150 million items, representing every age of written civilisation and every known language. Our archives now contain the earliest surviving printed book in the world, the Diamond Sutra, written in Chinese and dating from 868 AD….
So some big numbers…Over …
<click2>
14 million books
<click3>
60 million patents
<click4>
8 million stamps
<click5>
4 million maps
<click6>
3 million sound recordings
<click7>
1.6 million music scores
<click8>
0.3 million manuscripts
<click9>
0.8 million serial titles (which are of course made up of many volumes/editions).
Just in case you’re wondering why the numbers don’t add up to 150 million, this is where a lot of our content is.
80 seconds
I want to give you an example of a typical scholar who has recently done work at the Library in the Humanities domain.
<click1>
Pieter Francois is a Post Doctoral researcher at the University of Oxford. When Pieter was doing his PhD he would visit the Library often, look through the library catalogue, find and request items he was interested in and then study them in a reading room, disappearing for many years! Pieter Francois was interested
<click2>
in books that were about travel routes in Europe in the 19th Century. Imagine if a sample of these items had been available digitally: imagine the time that would have saved him! Imagine how that would transform the kinds of questions he might ask using the power of computation, across hundreds of items. We will come back to Pieter later in my talk and track his story of becoming a digital scholar.
50 seconds
<click1>So, the very nature of digital allows us to
<click2>
break what were previously bound items down into fundamental bits of information and data. These bits of data can be recombined, duplicated and linked to in infinite ways. This is fundamentally changing our view of research.
It’s a bit like the
<click3>
‘Tower of Babble’ sculpture to the right by Brian Dettmer, created by recombining bits from books, with words and sentences cut out and put back together in different ways to create something new, surprising and beautiful. This is what scholars are doing with digital content. Let us now move on to what is understood by the term ‘digital scholar’.
50 seconds
In his book, The Digital Scholar: How technology is transforming scholarly practice, Martin Weller suggests using a shorthand to loosely define a Digital Scholar.
First of all,
<click1>
the person does not necessarily need to be a recognised academic or someone who posts online. It is someone who employs
<click2>
digital,
<click3>
networked
<click4>
and open approaches to demonstrate their specialism.
Let us now look at the area of Humanities, where our scholar Pieter Francois does his work, to investigate the idea of a Digital Scholar a little further.
40 seconds
Franco Moretti, a Humanities scholar from Stanford University, said,
<click1>
“Reading individual works is as irrelevant as describing the architecture of a building from a single brick, or the layout of a city from a single church.”
Imagine if scholars could view digital archives as an infinite pool of multiple layers of loosely held data from which new research questions could be answered, moving beyond the bounds of single items, to enable research at scale.
180 seconds
So what kinds of digital research methods are these digital scholars using, especially in the area of Digital Humanities?
<click1>
For example, searching for items based on time and location can reveal very interesting patterns, e.g. when and where works were published. One researcher is looking at evidence of copy and paste in 19th Century newspapers, which was a common practice back then. Knowing where and when items include text from a source can reveal how the text travelled over time.
<click2>
Geotagging objects, putting them in space, can add new dimensions to the kinds of research questions we might want to ask.
<click3>
Corpus analysis is the analysis of language in text, and text mining is about finding patterns in text through computational analysis, for example number crunching (a lot of it based on counting words).
<click4>
Tasks that computers would find hard to do, and that humans complete with the help of technology, fall under the area of Crowdsourcing and Human Computation. For example, reCAPTCHA gets users to help produce better text from scanned books by typing in the words they see; these are words that computers couldn’t recognise through Optical Character Recognition. reCAPTCHA gets humans to do this in microtasks when they log in to websites that require additional authentication. Amazon’s Mechanical Turk is another form of human computation, where tasks that computers would find very hard to do are outsourced to humans.
<click5>
Annotation involves augmenting an item with additional information, usually text, but not necessarily, e.g. highlighting an area, a drawing etc.
<click6>
Natural Language Processing is used in the analysis of speech, for example.
<click7>
Similarly, transcribing can be the conversion of speech into text through human or computing power, to then be used for further analysis.
<click8>
Providing Application Programming Interfaces, or APIs, to data can be a very powerful way to give access to datasets, and software developers can even build software applications on top of them.
<click9>
Many researchers want to see the patterns emerging in large amounts of data and are now using a number of very powerful visualisation tools to do so.
<click10>
This website from our launch event has 6-minute videos of presentations from researchers using digital research methods.
What is clear is that digital research methods are much more than searching for an individual item in a catalogue, and libraries, publishers, service and content providers have to change to support that.
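To make the word-counting flavour of text mining mentioned above a little more concrete, here is a minimal sketch that counts word frequencies across a few pages of text. The sample sentences are invented for illustration, and real text-mining work would add steps such as removing common words and correcting OCR errors.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased words in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Invented sample 'pages' standing in for OCR text from digitised items.
pages = [
    "The fire began in Pudding Lane.",
    "The lane was narrow; the fire spread.",
]

totals = Counter()
for page in pages:
    totals.update(word_counts(page))

# The three most frequent words across all pages.
top_words = totals.most_common(3)
```

Counting across thousands of items rather than two sentences is what lets patterns emerge that no single reading would show.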
20 seconds
Many of our collections have been or are being digitised, which is having a big impact on the Digital Scholar. Let me take you on a little tour of the British Library’s digitised collections.
75 seconds
<click1>
68,000 volumes (around 22 million pages) published between 1789 and 1914, covering a wide range of subject areas including philosophy, history, poetry and literature, were digitised with Microsoft. The scans are in the public domain now.
<click2>
Image taken from
<click3>
page 144 of 'Philadelphia: the story of an American city ... Issued by the City of Philadelphia under the auspices of the Joint Special Committee of Councils on World's Columbian Exposition.' Does anyone know if the Temple at Nauvoo still exists?
<click4>
A major partnership project with Google is underway to digitise 250,000 out-of-copyright books from the Library’s collections. Books, pamphlets and periodicals dated 1700 to 1870 are being digitised, including material in a number of major European languages<click>; over 50,000 books have already been digitised.
<click5>
The book is about Hippopotami and I think it is in Dutch.
40 seconds
We have very large collections of Digitised Newspapers, digitised from paper copies<click>. Typically the digitisation has been done with commercial publishing companies and access tends to be on site or through subscription. DC Thomson Family History are working with the Library on the British Newspaper Archive <click>, to digitise up to 40 million pages from the Library’s national newspaper collection, covering local, regional and national press across three and a half centuries.
35 seconds
The illuminated manuscripts are among the most dazzling and intriguing objects ever created, and the British Library’s collection is one of the finest in the world; they are extremely popular. There are more than 3,000 digitised images reflecting almost 1,000 years of history, from around 600 to 1600.
<click1>
High resolution versions are available on the British Library’s Flickr site.
70 seconds
It’s not just text and images we have. We have hundreds of thousands of digitised titles in our moving image collections.
<click>The BBC Pilot Service brings together the BBC's programme catalogue, Radio Times data and BBC television and radio programmes recorded off-air from mid-2007 to the end of 2011, with 2.2 million catalogue records and 190,000 playable programmes. We expect this to become a full service from April onwards, under the name the BBC catalogue service.
<click>Broadcast News provides access to daily television and radio news programmes from seventeen channels (15 TV, 2 radio) broadcast in the UK since May 2010, recorded off-air by the British Library. It currently records 46 hours per day, including BBC, Sky News, Al-Jazeera English, CNN, France 24, Bloomberg, Russia Today and China's CCTV News.
25 seconds
Digitisation is transforming access for researchers.<click> It is spreading the value of collections, content and expertise. <click>It is about connecting, collaborating and sharing as much as it is about collecting, e.g. through social media and <click>encouraging others to integrate our materials into their services – and vice versa.
85 seconds
<click>The British Library faces many challenges of access to our Digital collections!<click> Sometimes digital content is only available onsite due to licence restrictions, <click>or even only on a specific computer in a reading room! Technically there are very few reasons why digital content can’t be online, <click> though it might be too big or might not have been transferred from other digital storage media. <click>Sometimes access is through a paywall. Finally, <click>some content is in the happy sunny place: online, open and freely available.
The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers.
The Labs project has tried to address this problem by creating a ‘residency model’ for researchers to work intensively with a digital collection on-site, so as not to infringe access conditions. I will say more about this later.
45 seconds
Formed in 2010 and led by Dr Adam Farquhar,
<click1>
the Digital Scholarship department’s mission is to …become a leading centre of digital scholarship … internationally recognised for innovation and collaboration in support of research and learning.
Both sit in the Digital Scholarship department:
<click2>
the Digital Research Team with its Digital Curators works very closely with
<click3> Labs. I will now talk about some of the activities of both of these teams to give you an idea of the work that we do.
55 seconds
Our Digital Curators are
<click1>
Stella Wisdom, Aquiles Alencar-Brayner, James Baker and Nora McGregor.
So what exactly is a Digital Curator?
<click2>
They explore how digital technologies are re/shaping research and how this informs how the library does its business.
<click3>
They support staff across the library to identify the opportunities that digital tools and collections afford in modern scholarship and to gain the skills to engage confidently in this area.
<click4>
They partner with libraries and institutions to enable innovation in digital scholarship.
<click5>
They don’t curate a specific collection but rather have expertise in digital scholarship, broadly defined.
45 seconds
The Digital Curators support the development of staff skills in the Library through a bespoke
<click1>
Digital Scholarship Training Programme. A quote from one of the attendees states: “It is about helping librarians and curators at the British Library acclimatise to the idea that the Library is becoming a place full of data as much as it is a place full of physical stuff, and that there is a growing community of users who see it that way”.
<click2>
They offer 15 courses several times a year (animate slide).
45 seconds
Part of the role of the Library is to open up as much digital content as it can.
<click1>
The "Picturing Canada" collection is a series of photographs from the Canadian Copyright Collection held at the British Library. They were digitised with Wikimedia UK and the Eccles Centre for American Studies, and 5,374 images have been uploaded to Wikimedia Commons in high resolution. This demonstrates that the Library is using open models of releasing the digital content it curates.
<click2>
There are currently 3 collections of copyright-free images on Wikimedia Commons.
22 seconds
The Digital Maps curator has leveraged crowdsourcing to help geo-reference
<click1>
725 digitised maps of the UK using an accessible and convenient tool called Geoparser.
<click2>
The maps were assigned spatial metadata in only 5 days
<click3>
with only a small proportion of errors.
105 seconds
Cheryl Tipp, Curator of Environment and Nature Sounds
<click1>
in Digital Scholarship, worked with the creative industries department at the British Library and a company called Ideas Tap to launch the <click>‘Sound Edit Wildlife Films Competition’, which challenged animators, filmmakers and photographers to create a short film inspired by the Library's collection of 10 wildlife sound recordings.
<click2>
The winning entry was 'Dave's Wild Life' from Samuel de Ceccatty, a fantastic short which follows Dave, an amateur naturalist whose sole aim is to have his own TV show. The clip I will show uses the ‘Haddock drumming calls’ to give a voice to the cranes or, as Dave likes to call them, the Diplodocus longus cranum.
Cue up video and play from 2 minutes 41 to 3 minutes 15
150 seconds
My colleague Stella Wisdom, Digital Curator, was one of the organisers of Off the Map competition for 2013 where videogame design students had to turn historic maps and engravings from the British Library’s collections into a 3D environment using Crytek&apos;s CRYENGINE software. The winners were Pudding Lane Productions, 6 second-year students, De Montfort University, Leicester, won first prize.
&lt;click1&gt;
and this is a screen grab of their winning entry.
&lt;click2&gt;
Their entry used maps of London and recreated a world that was destroyed by the Great Fire of London in the 17th Century, which started in Pudding Lane. Let’s take a brief look at their winning entry.
Cue up from 13 seconds to 133 seconds.
Back to slide…
&lt;click3&gt;
A new competition is launching soon, Off the Map Gothic 2014, which will use digitised Gothic items from the Library to inspire Gothic-themed 3D environments. The results will be showcased at our Gothic exhibition, Terror & Wonder: The Gothic Imagination, at the end of this year, in November 2014.
40 seconds
Now on to British Library Labs. The aim of Labs is to encourage scholars to experiment at scale with our digital collections and data. The team holds competitions and events, and creates the space in which to engage with scholars. Through Labs we’re learning how to better support scholars and build new services. Our website is available at labs.bl.uk.
&lt;click1&gt;
The project is kindly funded by the Andrew W. Mellon Foundation.
62 seconds
&lt;click1&gt;
The primary purpose of Labs is to open up as much &lt;click&gt;digital content as possible for
&lt;click2&gt;
researchers and software developers (sometimes they are the same person) and encourage them to use the Library’s content in their research,
&lt;click3&gt;
primarily in UK academia but where appropriate anywhere else in the &lt;click&gt;world.
Labs sits within the Digital Scholarship Department at the British Library
&lt;click4&gt;
and works almost on a daily basis with the Digital Research Team&lt;click&gt;. It also works with the &lt;click&gt;Access and Reuse Group, a cross departmental group that meets once every six weeks to deal with requests to openly license digital content. Labs co-operates internally with
&lt;click5&gt;
curators, researchers and technical staff in order to understand the ‘story’ behind a collection and the technical issues involved in providing access to the digital content.
65 seconds
This is how Labs works.
&lt;click1&gt;
We adopt a data-driven approach to encourage scholars to do research and development with and across British Library digital collections and data.
&lt;click2&gt;A researcher / developer (again sometimes the same person and sometimes not) comes up with an idea and engages with Labs through various mechanisms
&lt;click3&gt;
such as competitions, events and projects. Through this process the Library learns how better to support digital scholars and to build on existing processes or create new ones, as well as make
&lt;click4&gt;
tools (e.g. APIs) and services. The
&lt;click5&gt;
case studies are some of the outputs we hope to create that will help other research libraries around the world wanting to build Labs for their digital content,
&lt;click6&gt;
others include open software and publications.
115 seconds
Finding openly licensed collections is sometimes like detective work and, from lessons learned, Labs uses the following 4 questions to filter digital content:
&lt;click1&gt;
Is the Copyright cleared for research and non-commercial use?
&lt;click2&gt;
Is it Curated? (Is there someone who knows the ‘story’ behind the collection?)
&lt;click3&gt;
Is there Collection / Item Level Metadata available? (And importantly, what state is it in? Does it need cleansing?)
&lt;click4&gt;
Finally, where is it?
&lt;click5&gt;
These have been effective filters in doing the work of Labs in an agile way.
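The four questions above can be read as a simple checklist applied to each candidate collection. Here is a minimal sketch of that idea in Python. The `Collection` fields, the `passes_filters` helper and the example entries are all hypothetical illustrations, not a description of any actual Labs system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Collection:
    """Hypothetical record describing one candidate digital collection."""
    name: str
    copyright_cleared: bool      # cleared for research and non-commercial use?
    curator: Optional[str]       # someone who knows the collection's 'story'
    has_metadata: bool           # collection- or item-level metadata available?
    location: Optional[str]      # where the digital content actually lives


def passes_filters(c: Collection) -> bool:
    """Return True only if all four questions can be answered positively."""
    return (
        c.copyright_cleared
        and bool(c.curator)
        and c.has_metadata
        and bool(c.location)
    )


# Invented example entries, purely for illustration.
candidates = [
    Collection("Example collection A", True, "A. Curator", True, "internal file store"),
    Collection("Example collection B", True, None, True, "internal file store"),
    Collection("Example collection C", False, "B. Curator", False, None),
]

usable = [c.name for c in candidates if passes_filters(c)]
print(usable)
```

In practice the metadata question is rarely a plain yes or no (it may need cleansing first), so a real checklist would probably record notes rather than booleans.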
&lt;click6&gt;
Labs has therefore identified several collections, listed at the website above; some are shown on the slide:
&lt;click7&gt;
Due to our licensing conditions, we are in the process of text mining the abstracts for a large number of journal titles in electronic form. The visualisation indicates the subject spread of our collections.
&lt;click8&gt;
We have been harvesting the UK Web since 1993 and this is available as a resource under specific conditions for research.
&lt;click9&gt;
We are also investigating the use of our item request data (around 17 million records) and anonymised reader data, data protection allowing.
&lt;click10&gt;
The British National Bibliography has over 3 million catalogue records, licensed under CC0, from the British and Irish National Library catalogues.
More information is available on the Labs website.
70 seconds
We engage researchers through various activities, including informal events such as:
&lt;click1&gt;
Hack and Data days, where researchers, developers, curators and anyone interested in digital collections work together.
&lt;click2&gt;
First brainstorm ideas and try to group them,
&lt;click3&gt;then reflect, consider and choose them, focusing on being realistic about what can be achieved in the time available
&lt;click4&gt;
and develop prototypes &lt;click&gt;where the atmosphere is relaxed and non-judgemental; it’s OK to try things and make mistakes.
&lt;click5&gt;
We also run Ideas Labs, where we get researchers together over lunch, engaging with the Library’s digital collections through&lt;click&gt; playing cards, like Top Trumps; the boys amongst you will know what I mean. We encourage them through activities to come up with ideas and research questions, focusing on what outputs might be generated, and to continue to work with us.
&lt;click6&gt;
We also get involved in projects from within the Library and collaborate with external institutions.
80 seconds
A major part of Labs activity is to run an annual competition. As mentioned, we adopt a ‘data-driven’ approach, encouraging researchers to look at our data, talk to us and, more importantly, talk to each other, and then submit ideas and project plans for what they could do in a 4-6 month residency at the British Library. This ‘residency model’ enables researchers to get access to pretty much all the digital content they require without any licence restrictions, and we get to engage with them deeply to learn about what they want to do and, importantly, what we need to learn as a library to support digital scholars better. We worked in an agile way with
&lt;click1&gt;two researchers,
&lt;click2&gt;Pieter Francois (remember him from earlier?) and Dan Norton over a
&lt;click3&gt;
4 month period to work on their research questions and ideas. Let’s now look briefly at their ideas and what was achieved.
70 seconds
&lt;click1&gt;Pieter’s project was “The Sample Generator”, a tool that helps researchers by providing representative samples of the digitised (as well as physical) materials they are interested in researching. This spares them the daunting task of sifting through thousands of records to find a representative sample to start working on. Pieter’s area of interest was
&lt;click2&gt; European travel but the idea of the sample generator could work for any subject. We gained a deeper understanding of the distribution of digitised material to date.
Pieter’s analysis showed that digitised material, while extensive, is not representative of published output. As a consequence, researchers must take additional care when trying to sample representative content using
&lt;click3&gt;
statistical methods,
&lt;click4&gt;
a problem which The Sample Generator starts to address.
&lt;click5&gt;From this screenshot you can see the distribution of all the books from the 19th Century. The blue represents the physical collection. The red line is the digital collection (around 2.7%).
&lt;click6&gt;This screenshot shows the distribution of books about travel routes. The blue indicates all the physical items, the red line the digital and the orange line the sample. What’s key is that the orange line mimics the frequency of items in the total collection.
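The idea of a sample that "mimics the frequency of items in the total collection" can be illustrated with proportional stratified sampling. The sketch below is not Pieter's actual implementation. It assumes each record is a hypothetical `(item_id, decade, is_digitised)` tuple, and it draws digitised items per decade in proportion to that decade's share of the whole (physical plus digital) collection.

```python
import random
from collections import Counter, defaultdict


def representative_sample(records, sample_size, seed=0):
    """Draw digitised items so that each decade's share of the sample
    matches its share of the total collection.

    records: list of (item_id, decade, is_digitised) tuples (assumed layout).
    """
    rng = random.Random(seed)

    # How many items (physical and digital) fall in each decade.
    totals = Counter(decade for _, decade, _ in records)

    # The digitised items available to sample from, grouped by decade.
    digitised = defaultdict(list)
    for item_id, decade, is_digitised in records:
        if is_digitised:
            digitised[decade].append(item_id)

    sample = []
    for decade in sorted(totals):
        # Quota proportional to the decade's share of the whole collection.
        quota = round(sample_size * totals[decade] / len(records))
        pool = digitised.get(decade, [])
        # Never ask for more items than have been digitised for that decade.
        sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample


# Invented data: 60 items from the 1800s and 40 from the 1810s,
# with 30 digitised items in each decade.
records = [(f"a{i}", 1800, i < 30) for i in range(60)]
records += [(f"b{i}", 1810, i < 30) for i in range(40)]

sample = representative_sample(records, sample_size=10)
print(Counter(item_id[0] for item_id in sample))
```

Although the digitised pool here is split evenly between the two decades, the sample follows the 60/40 split of the total collection, which is the property the orange line on the slide illustrates. Where a decade has too few digitised items to fill its quota, the sample simply falls short for that decade; a fuller tool would have to report that gap.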
65 seconds
Dr Dan Norton was a researcher at the University of Dundee and artist in residence at Hangar, Centre for Art and Research, Barcelona. His idea was “Mixing the Library: The Disc Jockey and the Digital Collection”, which brought a DJ’s approach to interacting with multi-format digital collections
&lt;click1&gt;.
Dan’s interactive approach helps build aesthetic, experimental, or logical links between resources. This ambitious project focused on ideas around creating a prototype
&lt;click2&gt;
and what would be the basic building blocks needed to create a simple demonstrator
&lt;click3&gt;.
Dan is now building on the work he did at Labs and is the resident researcher and artist at the
&lt;click4&gt;
Living Labs: Library of the Future in Barcelona, where he will be working with software developers to produce a fully functioning mixing tool.
50 seconds
The curatorial platform was created to re-use British Library metadata, using the Drupal content management system. It was created by Sara Wingate-Gray and Kate Lomax, whose Labs 2013 entry was specially commended. Even though they didn’t win, the judges loved their idea, and subsequently, with the help of Labs, it attracted funding from the Arts and Humanities Research Council in the UK. The project was completed in January 2014 and showcases the digital narratives created by art students using British Library oil paintings from colonial India.
Here is what a basic metadata record looks like on the British Library site
&lt;click1&gt;
Here is the curatorial interface
&lt;click2&gt;
which has simply ingested the metadata from a comma-separated values (CSV) file.
As we can see it has created a very engaging set of user interfaces
&lt;click3&gt;, using
&lt;click4&gt;
Geolocation
&lt;click5&gt;
Slideshows
&lt;click6&gt;
and &lt;click7&gt;
timelines
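To give a feel for how little is needed to go from a CSV file of metadata to the views listed above, here is a small Python sketch. It is not the Drupal platform's code. The column names (`title`, `year`, `place`) and the three rows are made up for illustration, and real catalogue exports will look different.

```python
import csv
import io

# Invented sample data standing in for an exported metadata file.
SAMPLE_CSV = """title,year,place
Painting A,1790,Calcutta
Painting B,1815,Madras
Painting C,1802,Calcutta
"""


def load_records(csv_text):
    """Parse CSV text into a list of dicts, one per metadata record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def timeline(records):
    """Order records by year, as a timeline view would."""
    return sorted(records, key=lambda r: int(r["year"]))


def by_place(records):
    """Group titles by place name, as a map view would."""
    groups = {}
    for r in records:
        groups.setdefault(r["place"], []).append(r["title"])
    return groups


records = load_records(SAMPLE_CSV)
print([r["title"] for r in timeline(records)])
print(by_place(records))
```

A real geolocation view would also need coordinates for each place name, which this sketch does not attempt to look up.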
75 seconds
The work of Labs is really about a number of stories, stories about digital collections and about researchers wanting to ask fascinating research questions about them. Let me now tell you a story about one collection and the intended and unintended consequences of working with it.
The Library digitised 68,000 17th to 19th century books from our collections a few years ago (around 2.7% of the physical total in that period). You can view them from our catalogue or read them on your &lt;click&gt;iPad via the Historical Books app developed by BiblioLabs. We also captured 22 million individual page images, along with full text scans of these images, all of which contain an untold quantity of useful data such as names of people, places, historical events and dates.
So the question became then, what next? What can 68,000 books tell us?
60 seconds
&lt;click 1&gt;
As the books were
&lt;click2&gt;
scanned for text
&lt;click3&gt;,
this had a fortunate ‘side effect’: the software not only tries to detect the text
&lt;click4&gt;
on the page
&lt;click5&gt;
but also where the images might be. There had already been some interest as part of the competition: Matt Prior attended one of our hack events and, when examining our book data, was very interested in the images from the books.
50 seconds
Ben O’Steen, Labs Technical lead, wrote an algorithm to begin to extract all the images that were picked up in the scanning process from the 68,000 books. He did this initially because he had an interest in face recognition algorithms, and wanted to know if algorithms that had been trained on passport photos could be used to detect faces in the images in this collection, many of which were illustrations
&lt;click1&gt;.
He found that it was good at detecting
&lt;click2&gt;female faces but not male ones, as men tended to have beards and glasses!
130 seconds
Ben then decided that, as he had started to extract all images from scanned pages, he would start to post an image every hour on a Tumblr blog
&lt;click1&gt;
This was the first image that was published. In discussions with the Digital Research team, the digital curators and me, the service was christened with the somewhat controversial name &lt;click&gt;Mechanical Curator (we like to be a little controversial) and we decided that it was a ‘she’. Our newest staff member churned away day and night posting an image every hour. It posted previously unseen illustrations taken almost at random. The Mechanical Curator chooses other similar images based on a number of measures it has at hand, for example
&lt;click2&gt;published date,
&lt;click3&gt;
slantiness, &lt;click&gt;bubbliness on the x or &lt;click&gt;y axis, or a&lt;click&gt; ‘new train of thought’ if it gets bored. However, little was known about the actual image, apart from the analytic work of the Mechanical Curator.
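The selection behaviour described above can be sketched as "pick the nearest image on one feature, or occasionally jump somewhere new". The Python below is only an illustration of that idea, not Ben's algorithm. The feature names, the numeric values and the `boredom` probability are all assumptions made for the example.

```python
import random

# Hypothetical per-image features; the real measures are not documented here.
FEATURES = ["year", "slant", "bubbliness"]


def next_image(current, images, rng, boredom=0.1):
    """Choose the image to post after `current`.

    With probability `boredom`, start a 'new train of thought' by jumping to
    a random image. Otherwise pick one feature at random and return the image
    closest to the current one on that feature.
    """
    others = [img for img in images if img["id"] != current["id"]]
    if rng.random() < boredom:
        return rng.choice(others)
    feature = rng.choice(FEATURES)
    return min(others, key=lambda img: abs(img[feature] - current[feature]))


# Invented example images.
images = [
    {"id": 1, "year": 1850, "slant": 0.10, "bubbliness": 0.20},
    {"id": 2, "year": 1851, "slant": 0.12, "bubbliness": 0.22},
    {"id": 3, "year": 1700, "slant": 0.90, "bubbliness": 0.90},
]

rng = random.Random(1)
chosen = next_image(images[0], images, rng, boredom=0.0)
print(chosen["id"])
```

With `boredom=0.0` the random jump never happens, so in this toy data image 2 is always chosen after image 1, because it is the closest on every feature.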
Meanwhile the algorithm that Ben had written to snip the images from the OCR scans was still churning away. How many were there going to be? The Mechanical Curator could publish one every hour, but was there somewhere we could put them all for people to browse when they wanted? Importantly, if we did put them somewhere, could we get people to help us add descriptions to the individual images, making them infinitely more discoverable?
65 seconds
How many images do you think Mechanical Curator found?
&lt;click1&gt;
Over 1 million images were then put onto
&lt;click2&gt;
Flickr commons.
Why? Because each image would have a
&lt;click3&gt;
URL. Each image had some
&lt;click4&gt;
metadata, i.e. the book and page number it came from.
However, the image itself didn’t have any descriptive metadata, e.g. was it a picture of a dog? By releasing the images onto Flickr, we could begin to see if people might start adding tags to the images.
&lt;click5&gt;
Flickr has a well-known and widely used API, which developers and researchers could start using to build applications on top of the images or to examine large numbers of them at the same time.
People have already started putting images into sets as you can see from the picture, portraits and ships are very popular.
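As a rough illustration of what using the Flickr API mentioned above might look like, the Python sketch below builds a request URL for Flickr's `flickr.photos.search` REST method, filtered by account and tag. It only constructs the URL and makes no network call. The `api_key` and `user_id` values are placeholders you would need to supply yourself, and the `search_url` helper is a made-up name for this example.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint.
FLICKR_REST = "https://api.flickr.com/services/rest/"


def search_url(api_key, user_id, tag, page=1, per_page=100):
    """Build a flickr.photos.search URL for one account's photos with a tag."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,       # placeholder: your own Flickr API key
        "user_id": user_id,       # placeholder: the account to search
        "tags": tag,
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,      # ask for plain JSON rather than JSONP
    }
    return FLICKR_REST + "?" + urlencode(params)


url = search_url("YOUR_API_KEY", "ACCOUNT_ID", "ships")
print(url)
```

Fetching that URL with any HTTP client and paging through the results would be one way to examine, say, every image that users have tagged as a ship.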
35 seconds
The site was launched on Friday, December 13, 2013. How many views do you think there have been?
&lt;click1&gt;
There have been a staggering 119,000,000 views of the images since December 13th, 2013 (figures correct as of last week). So Friday the 13th wasn’t so unlucky for us after all!
&lt;click2&gt;
47,714 tags have been added to the images
18,567 images have been favourited.
&lt;click3&gt;
Labs is involved with 2 potential research projects & 4 grassroots crowdsourcing efforts.
35 seconds
There are risks in this, of course. Surely lurking in the 1 million are images that are sordid or of an offensive nature, especially given some of the views that were around at that time?
&lt;click1&gt;
In the end we decided to not interfere, and take any issues as they may arise on a case-by-case basis. To date we’ve had very few.
50 seconds
Here is the anatomy of a Flickr record, importantly we have created links to many of the Library’s services
&lt;click1&gt;
some of this lovely traffic is going back to the Library and hopefully generating more interest in our services, from downloading a pdf of the book to purchasing a high res scan of the image.
&lt;click2&gt;
Tags are added from the original book record, including the approximate page number the image came from
&lt;click3&gt;
users of Flickr can add their own tags and, as I have mentioned, they have already started doing so.
18 seconds
There has been considerable news coverage about the million images released on Flickr commons.
&lt;click1&gt;
The Independent,
&lt;click2&gt;Wired magazine,
&lt;click3&gt;The Guardian,
&lt;click4&gt;Popular Science and the
&lt;click5&gt;
Mail online to name a very few.
15 seconds
There have been several creative uses of the images, which can be found at the website above,
&lt;click1&gt;
even the creation of a skateboard which you can buy for $64.
130 seconds
That’s just one story; there are so many more stories to tell. Here is just a sample of some of the other stories emanating from our Digital Lab at the British Library, stories we are only too happy to tell other organisations and conferences. Invitations and all-expenses-paid trips to speak in Hawaii are most welcome!
&lt;click1&gt;
There is the story of how we are using subtitle files to create summaries of news programmes, to enhance the poor metadata that currently exists for them.
&lt;click2&gt;
The story of how we are working on analysing music performances with computer clusters and how the resulting data will be made available for researchers.
&lt;click3&gt;
The story of opening up over 100,000 Playbills (posters about plays) from the 17th Century onwards.
&lt;click4&gt;
The story of how we might be printing 3D objects to represent Digital Humanities data, and how people might be able to interact with these objects using their mobile phones or plug in and extract data from embedded USB memory devices.
&lt;click5&gt;
The story of data.bl.uk, which will be a place we are going to create for all the Library’s open data and freely licensed digital collections.
&lt;click6&gt;
The story of how we are setting up cloud infrastructure, where digital content lives right next door to enormous computing power, so that researchers can begin to interrogate our data at a massive scale and make incredible new discoveries, very similar to the Internet Archive’s virtual reading room.
&lt;click7&gt;
And the story of how we are approaching the Andrew W. Mellon Foundation to keep us funded; we are also happy to work with anyone else who would like to talk to us about sponsorship or digital content.
25 seconds
A quick reminder again for all of you, our current competition is open, please tell everyone you can about it.
The deadline is 22 April 2014 and the residency for two chosen ideas runs from late May to the end of October 2014, more details are available on our website.
50 seconds
Finally these are my take away messages:
&lt;click1&gt;
There is a huge appetite for openly available content as we have shown with the Flickr Commons images.
&lt;click2&gt;
There needs to be a dynamic, continuous interaction between data and researcher to formulate and reformulate research questions.
&lt;click3&gt;
Working with Digital Scholars creates new opportunities, not just new research questions.
&lt;click4&gt;
Content and service providers, researchers and technical people need to engage with each other to create the new tools, services and content that are needed to facilitate new discoveries.
90 seconds
To finish off, I would like to ask you what excites you most about digital scholarship. You will have 10 seconds to choose; the last number you press in that time will be taken as your choice. Press now please.
OK, as we can see we have a majority of X and Y, and not many Z. I hope all of you will be able to get something from my presentation and continue any ideas, thoughts and discussions at the Welcome reception.
35 seconds
I would like to acknowledge the following colleagues in Digital Scholarship and the Digital Research Team and especially my amazing and brilliant colleague, Ben O’Steen! And I would like to thank Sam Tillet and Elizabeth Newbould from the British Library for inviting me here and Bonnie and Jill for getting me here. I hope I will be able to come back one day to report on our progress.
20 seconds
Please let us know about any ideas you might have for engaging with Labs.
If you have any questions, please come up to me at the Welcome reception.
Thank you and have a lovely stay here in Philly.
The presentation is available to download from the above URL (please note that this is case sensitive). If you would like to tweet about it, please use the #bl_labs hashtag.