1
mahendra.mahey@bl.uk & labs@bl.uk
http://www.bl.uk/projects/british-library-labs
Funded by the Andrew W. Mellon Foundation
Experiment with our
Digital Collections
Mahendra Mahey
Manager of BL Labs
Project Management and DH projects
1500 - 1630, Monday 18 December, 2017
CHASE AHDA Winter School 2018
The Open University in London/Futurelearn,
1-11 Hawley Crescent, Camden Town,
London, NW1 8NP
2
Breakdown of session
• Introductions
• Background to BL Labs and DH Projects
• Developing project ideas as proposals, tips and challenges
• Feedback and suggestions
3
What is Project Management?
1) Can range from informal and small scale, focusing largely
on common-sense, flexible approaches to management…
2) …to large-scale, formal approaches using methodologies
such as PRINCE2 (PRojects IN Controlled Environments),
Agile and Scrum, and tools such as MS Project
• BL Labs uses both
• Focus on the first one to help develop your own ideas
• Not a session on Project Management!
4
Introductions
• Name
• Affiliation
• Project idea (one sentence)
5
The British Library
Inside the British Library
UK Legal Deposit Library – Reference only
Founded in 1973, though its origins go back to the British Museum Library (1759)
St Pancras, London, UK
• Space for 1200 readers, around 500,000 visitors per year
• Many books are stored 4 storeys below the building
Boston Spa
• Building 37 uses low oxygen and robots
• Also has a reading room and provides delivery of items to London
• Many items stored at the Document Supply and Storage centre, 48 hours away
Stockton-on-Tees
• Authors' right to payment each time their books are borrowed from public libraries
6
BL Labs supports…
Researchers
https://goo.gl/WutNyi
Artists
http://goo.gl/nNKhQ2
Librarians
Curators
https://goo.gl/9NWZUW
Software Developers
https://goo.gl/7QQ5Tf
Archivists
https://goo.gl/x7b4tg
Educators
https://goo.gl/qh01Mi
Anyone
interested in our
digital collections
and data
7
Physical Collections – not just books!
> 180*million items
> 0.8* m serial titles
> 8* m stamps
> 14* m books
> 6* m sound recordings
> 4* m maps
> 1.6* m musical scores
> 0.3* m manuscripts
> 60* m patents
King George IV bequeathed Library *Estimates
8
Born Digital Digitised
Let’s talk Digital…
9
Knowledge Quarter London
80 knowledge organisations (as of 07/12/17) within 1 mile radius of
Kings Cross, http://www.knowledgequarter.london
http://www.turing.ac.uk (Headquartered at the British Library)
UK Web Archive and e-legal deposit (2013)
http://www.webarchive.org.uk/ukwa/
Born digital
Data all around us at
Kings Cross!
10
All our physical
items are digitised
right?
11
#bldigital
1-2 %* digitised
* estimate
Digitisation
Partnerships
Commercial & Other Organisations
Amount
increasing rapidly
Bias in digitisation
So learn the story behind
the digital collection
http://goo.gl/bR9UJL
Sample Generator
12
Playbills, Books, Newspapers
(includes Optical Character Recognition (OCR))
Digital collections and Datasets
British National
Bibliography
http://bnb.data.bl.uk
http://sounds.bl.uk
http://dml.city.ac.uk/
Music (Recordings & Sheet) & Sounds
http://goo.gl/frSMJt
Broadcast News (TV and Radio)
http://goo.gl/cwThHw
http://goo.gl/pBkisZ
http://goo.gl/E8aRyQ
Usage data
EThOS
Web Archive
Images, Manuscripts & Maps
http://www.qdl.qa/
Qatar Digital Library
http://idp.bl.uk/
International
Dunhuang
Project
Maps
http://www.bl.uk/maps/
Hebrew Manuscripts
http://goo.gl/4sbCp9
Flickr &
Wikimedia Commons
https://goo.gl/LZRmaZ
13
Finding Open Cultural Heritage Datasets
Collection Guides (183 as of 05/12/17)
https://www.bl.uk/collection-guides/
Datasets about our collections
Bibliographic datasets relating to our published and
archival holdings
Datasets for content mining
Content suitable for use in text and data mining
research
Datasets for image analysis
Image collections suitable for large-scale image-
analysis-based research
Datasets from UK Web Archive
Data and API services available for accessing UK Web
Archive
Digital mapping
Geospatial data, cartographic applications, digital aerial
photography and scanned historic map materials
https://data.bl.uk
Download collections as zip files (no API)
Each dataset has a Digital Object Identifier (DOI),
so it can be cited in research
Not all discoverable via
search engines!
14
Explore or Imagine Our Data!
• CSV of Metadata
https://data.bl.uk/digbks/dig19cbooks-mdata-csv.csv
• 19th Century Books - Book Metadata - 01/09/2013.
https://data.bl.uk/digbks/db21.html
• Digitised Books - Flickr Tag History - Dec 2013 to March 2016.
TSV
https://data.bl.uk/digbks/db15.html
• Digitised Hebrew Manuscripts - Metadata
https://data.bl.uk/hebrewmanuscripts/heb1.html
• Digitised Hebrew Manuscripts: Or 2210 - Or 2364
https://data.bl.uk/hebrewmanuscripts/heb8.html
• Theatrical playbills from Britain and Ireland (OCR text only)
https://data.bl.uk/playbills/pb2.html
• Portraits of actors, views of theatres and playbills (covering
1750 - 1821 in a single volume)
https://data.bl.uk/singlesheet/por1.html
• Volumes of Lysons Collectanea (Amusements), comprising
broadsides, cuttings and advertisements on amusements, 1660-1840.
https://data.bl.uk/singlesheet/ad1.html
https://data.bl.uk
• Have a look at the data.
• Data Quality
• Issues
Or an idea you have thought of
what to do with the data!
http://labs.bl.uk/Ideas+for+Labs
Smaller datasets
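One way to "have a look at the data" and judge its quality is to count how often each metadata column is empty. The sketch below is a minimal illustration, assuming a CSV file with a header row. The column names and sample rows are invented for the example and are not the actual layout of the data.bl.uk files.

```python
import csv
import io


def missing_value_report(csv_text):
    """Return {column name: fraction of rows where that column is empty}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []  # None when the input has no header
    counts = {name: 0 for name in fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            value = row.get(name)
            if value is None or not value.strip():
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: counts[name] / total for name in counts}


# Hypothetical sample; real column names and values will differ.
SAMPLE = """title,author,year
A Tour in Wales,,1810
Poems,J. Smith,
Travels,A. Jones,1850
Sermons,,1799
"""

if __name__ == "__main__":
    for column, fraction in missing_value_report(SAMPLE).items():
        print(f"{column}: {fraction:.0%} empty")
```

A column that is mostly empty, or filled inconsistently, is a prompt to ask about the story of the collection before building a project on it.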
15
Openly Licensed Digital Content?
15% Openly
Licensed
Around 80%*
available online
Working through to make more open…
Though some collections will always only be available onsite due to
various reasons including legal, ethical etc
Breakdown by collection*
Manuscripts 59%
Books 9%
Maps and Views 7%
Newspapers 3%
Archives and Records 3%
Paintings, Prints and Drawings 2%
*Based on number of digitisation projects (693 as of 08/12/17)
Largest proportion of funding
Public / Private Partnership
15 %* Openly Licensed – most online
85 %* Available onsite only at the moment
*Estimates
16
The Story of the Digital Collection…
Digital
Collection
Curator
Who paid for the digitisation?
Who did the digitisation?
Technology used
Born digital?
Published
Unpublished
Where is it?
Can it still be accessed?
Generates income
Reputational risk in using?
Legalities
Politics when digitised
Personalities involved
Surprises (e.g. gaps)
Descriptive information
Old format not supported
What media was the
digitisation done from?
Is there any background documentation?
No Descriptive information
Inconsistent descriptive information
Still there?
Good to know the background ‘story’ of a digital collection
if you want to use it for research and draw conclusions…
17
Competition
Awards
Projects
Tell us your ideas of what to do with our digital content
Show us what you have already done with our
digital content in research, artistic, commercial
and learning and teaching categories
Talk to us about working on collaborative projects
18
Example Pattern of Research
1, 2, 3
1. Find / identify new things in messy stuff
2. Unlock hidden history / data
3. Celebrate by telling new stories!
19
Finding / identifying invisible / well hidden
things in ‘messy’ historical data
https://goo.gl/mcpa8B
Not the British Library!
Example Pattern of Research 1
Some of the challenges we face at the Library
20
Unearthing / unlocking
hidden histories & data
to stimulate new research
https://goo.gl/vJ291F
It’s an
18th Century Poem!
Example Pattern of Research 2
21
Celebrating hidden histories / data
creatively through events, art, performance
and story telling
https://goo.gl/Ql0Bwz
Re-enacting, re-discovering history
Example Pattern of Research 3
22
Experiments with Text
23
https://goo.gl/oUNj5N
https://goo.gl/ImAUv4
Finding things in ‘messy’
Optical Character Recognised (OCR) text
Mrs Folly
• Clean up some manually
• Get human ‘ground truth’
• Write computer code (sometimes
it’s machine learning) to find
things reliably in it ‘automatically’
• Try code on messy content
• Tweak if necessary
• Digital ‘lasso’ around content
• Human sift through
Mrs Folly
An example pattern of research
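The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration only: it uses a hand-written regular expression rather than machine learning, and the patterns, sample lines and ground-truth labels are all invented for the example. It shows a first attempt, a comparison against human ground truth, and a tweak.

```python
import re

# Hypothetical patterns for the name "Mrs Folly", tolerating a few OCR
# confusions ("Mis", "Mre", "Fo1ly"). Invented for illustration.
FIRST_TRY = re.compile(r"\bM[ri][se]\.?\s+Fo[l1I]{2}y\b")
# The tweak: also accept names printed in capitals.
TWEAKED = re.compile(r"\bM[ri][se]\.?\s+Fo[l1I]{2}y\b", re.IGNORECASE)


def lasso(lines, pattern):
    """Return the indexes of lines the pattern matches (the digital 'lasso')."""
    return {i for i, line in enumerate(lines) if pattern.search(line)}


def precision_recall(found, truth):
    """Compare machine results with the human 'ground truth' set of indexes."""
    true_positives = len(found & truth)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Invented OCR lines and hand-made labels (lines that mention Mrs Folly).
OCR_LINES = [
    "Mrs Folly will appear this evening",
    "Mis Fo1ly in the character of Juliet",
    "Doors open at six o'clock",
    "Mr Kemble as Hamlet",
    "MRS FOLLY for one night only",
]
GROUND_TRUTH = {0, 1, 4}

if __name__ == "__main__":
    for name, pattern in (("first try", FIRST_TRY), ("tweaked", TWEAKED)):
        precision, recall = precision_recall(lasso(OCR_LINES, pattern), GROUND_TRUTH)
        print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Whatever the lasso returns still needs a human to sift through it, as the last bullet says.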
24
Looking through a rubbish bin?
https://goo.gl/UeEvqs
Good stuff! Some Rubbish
25
Machine Learning / Reading
Analogies to how humans read / learn
Machines acquire ‘knowledge’ / data, use that
knowledge / data to make sense / identify patterns
https://goo.gl/k68fTf
https://goo.gl/gXmVQL Can you see the bird?
26
Need to stress still requires computational
& human effort…
https://goo.gl/gDQEAz
Labs doing this on a case by case basis
so methods can vary
Machine Learning / Reading still
requires ‘Human Effort’!
27
Legalities of Machine Learning /
Text and Data mining
https://goo.gl/toq4Bo
Legalities of Machine Learning / Text and Data
mining still up for discussion…Often misunderstood
Is it the same as humans reading and looking for
patterns…just a bit quicker?
28
Victorian Meme Machine (2014)
https://goo.gl/HMqDt3
Bob Nicholson
http://victorianhumour.tumblr.com/
Bob Nicholson interviewed on
BBC Radio 4 Making History Programme:
http://goo.gl/fmV9ep
And telling jokes to the public:
http://goo.gl/xIDRhz
Bob obtained further funding from his university
Looking for more collaborations
https://www.youtube.com/watch?v=-GRgj7Q5OM0
Rob Walker, Victorian Mother-in-law Jokes
Victorian Comedy Night, 7 Nov 2016
Learnt about access paths
to digital collections
29
Katrina Navickas (2015)
Political Meetings Mapper
http://politicalmeetingsmapper.co.uk
https://goo.gl/Qq78Oa
Labs Symposium 2015
https://goo.gl/BSA3be
Interview 2015
The Chartist Newspaper
http://goo.gl/vOLSnH
Chartist Monster Meeting
Chartists Walking Tour and
Re-enactment London
Learnt that domain knowledge
reduces noise
30
Data-mining verse in 18th Century newspapers
BL Labs Project 16-17, Jennifer Batt
https://goo.gl/5Akthd
Slides courtesy Jennifer Batt
31
What thoj' among ourrelves, with too much Heat, or t
W: fweutimes.wongle, wvhen we Ihould debate, W –
(A confequential Ill which Freedom drawvs, fl t
A bad Efficf, but from a noble Caufe) t
We can with univeifal Zcal advance, to
To cutb the faithlefs Arrogancccof V rance. hi
Dublin Journal, 10-14 September, 1745
Slides courtesy Jennifer Batt
32
Verse: 81% lines begin with
initial capital
Prose: 52% lines begin with
initial capital
Westminster Journal 3 March 1745
Slides courtesy Jennifer Batt
Started to refine
Machine Learning Techniques
Jennifer Batt @ the BL on World Poetry Day
‘40,000’ things found…
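The figures on this slide suggest that the share of lines beginning with a capital letter could help separate verse from prose. The sketch below is one simple way to compute that share. It is not the method used in the project: the 0.7 threshold is an assumed value chosen between the two percentages, and the prose sample is invented. The verse sample is the OCR text from the Dublin Journal slide.

```python
def initial_capital_ratio(lines):
    """Fraction of non-blank lines whose first character is an uppercase letter."""
    stripped = [line.strip() for line in lines if line.strip()]
    if not stripped:
        return 0.0
    capitalised = sum(1 for line in stripped if line[0].isupper())
    return capitalised / len(stripped)


def looks_like_verse(lines, threshold=0.7):
    """Crude guess: a block is verse if enough lines start with a capital.

    The threshold is a hypothetical value, not one taken from the project.
    """
    return initial_capital_ratio(lines) >= threshold


# OCR text as shown on the Dublin Journal slide, errors included.
VERSE = [
    "What thoj' among ourrelves, with too much Heat, or t",
    "W: fweutimes.wongle, wvhen we Ihould debate, W –",
    "(A confequential Ill which Freedom drawvs, fl t",
    "A bad Efficf, but from a noble Caufe) t",
    "We can with univeifal Zcal advance, to",
    "To cutb the faithlefs Arrogancccof V rance. hi",
]

# Invented prose lines for contrast.
PROSE = [
    "the ministry have this week received",
    "advices from Flanders, which confirm",
    "that the Army is in good health. The",
    "Duke is expected to return shortly",
]

if __name__ == "__main__":
    print("verse sample:", looks_like_verse(VERSE))
    print("prose sample:", looks_like_verse(PROSE))
```

Note that the verse line beginning with a bracket does not count as capitalised, which is the kind of detail that ground-truth checking tends to expose.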
33
Use of Overproof
OCR Correction?
Re-OCR with
ABBYY FineReader?
https://www.abbyy.com/en-gb/
http://overproof.projectcomputing.com/
RE-OCR
Cleaning up OCR text – significant improvement
(depending on original image quality)
34
Virtual Infrastructure for OCR text
OCR text ‘scraped’ from
digitised newspapers
and put in internal cloud
Jupyter notebook
Write Python code and see the results
in the web browser
http://jupyter.org
Access available for researchers ‘in residence’
https://www.docker.com/
http://dhbox.org/
35
BL Labs Competition Entry Process
• Think of a project which uses the British Library’s Digital
Collections or Data
• Examine our data and discuss idea
• Propose mini project
• Proposals assessed and successful ones worked on
• 3 examples from 2014, 2015, 2016 given
36
Elements of Proposal
(https://goo.gl/K85hTQ)
• Title and Summary
• Research Question(s)
• How it showcases digital collections / data
• Methods (text mining, visualisations, statistical analysis)
• Evidence of how you have or will develop the skills, knowledge and
expertise to successfully carry out the project
• Evidence that the idea is achievable on a technical, curatorial and legal basis
• Plan
• Risk assessment* (new suggestion)
37
Title and Summary
• Try to summarise the project in one sentence
• Abstract / Summary around 400 words
38
Research Question(s)
• This/these should be very clear
39
Showcasing digital collections / data
• How does your idea showcase digital collections / data
• Have you seen the digital collections and data?
• Do you know the ‘story’ of the collection?
• What state is it in?
• Have you done some initial experiments?
• Will it require cleaning, e.g. using tools like OpenRefine?
• Reality check in terms of what you can actually achieve with the data
will determine idea and scope
40
Methods (text mining, visualisations,
statistical analysis)
• Think of what is going to be required to implement methods, e.g. skills,
time and other resources
• Plan accordingly
• Tools required / software / hardware
41
Evidence of how you have or will develop the
skills, knowledge and expertise to successfully
carry out the project
• List skills, presentations, publications etc.
• Are there gaps?
• How are they going to be filled?
42
Evidence that the idea is achievable on a
technical, curatorial and legal basis
• Technical factors
• Is the project technically feasible?
• Whether the technical skills required to complete the project and who
will be required to implement them have been clearly identified.
• Legal factors
• Whether the legal terms of use for the digital collections identified have
been checked and compliance demonstrated in the proposal.
• Whether the idea contains information that ensures the project does not
in any way infringe intellectual property rights or any other rights of any
third party.
43
Evidence that the idea is achievable on a
technical, curatorial and legal basis
• Curatorial factors
• Can it be demonstrated that the digital content is available, accessible
and can be realistically used for the project?
• Background research for people connected to the collection / the story
of the collection
• Has any extra work required to make the digital content usable for the
project been clearly identified (where appropriate)?
44
Plan
• Define the period of time (from X to Y)
• Describe each activity (e.g. what, when and by whom)
• Break down into manageable chunks / units
• Activities can run in parallel
• Build in reasonable review points and lag.
• How will it be monitored?
• It’s a plan, it can change!
45
Risk
• Have a way of assessing risks
• Risk / Mitigation / Likelihood / Impact
• Use Low, Medium and High for Likelihood and Impact
Example:
Risk: Insufficient support from UK research councils.
Mitigation: Build compelling case. Carry out research to gauge demand and commitment to resourcing. Adapt model according to our findings.
Likelihood (after mitigation): M
Impact: H
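A risk register of this shape can be kept as a small data structure and sorted so the most serious risks come first. The sketch below is one possible approach, not a prescribed one: multiplying likelihood by impact to get a score is an assumption made for the example, and the second risk in the register is hypothetical. The first is the example from the slide.

```python
from dataclasses import dataclass

# Numeric weights for Low / Medium / High (an assumed scale).
LEVELS = {"L": 1, "M": 2, "H": 3}


@dataclass
class Risk:
    description: str
    mitigation: str
    likelihood: str  # "L", "M" or "H", assessed after mitigation
    impact: str      # "L", "M" or "H"

    def score(self):
        """Likelihood times impact; a common convention, assumed here."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def prioritise(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score(), reverse=True)


REGISTER = [
    # Hypothetical entry, added for illustration only.
    Risk(
        description="OCR quality too poor for text mining",
        mitigation="Run a small pilot on a sample first",
        likelihood="L",
        impact="M",
    ),
    # The example from the slide.
    Risk(
        description="Insufficient support from UK research councils",
        mitigation="Build compelling case; gauge demand; adapt model",
        likelihood="M",
        impact="H",
    ),
]

if __name__ == "__main__":
    for risk in prioritise(REGISTER):
        print(f"{risk.score()}  {risk.description}")
```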
46
Labs mindset…
1. Start a conversation, generate positive energy
and try to support ideas
2. Start with small experiments, but think big.
3. Fail faster (don’t be afraid) and persevere.
4. Reject perfectionism! Good enough is
sometimes…good enough!
5. Celebrate the uses of digital collections
https://goo.gl/noASfl

DH Project Management

  • 1. 1 mahendra.mahey@bl.uk & labs@bl.uk http://www.bl.uk/projects/british-library-labs Funded by the Andrew W. Mellon Foundation Mahendra Mahey Experiment with our Digital Collections Mahendra Mahey Manager of BL Labs Project Management and DH projects 1500 - 1630, Monday 18 December, 2017 CHASE AHDA Winter School 2018 The Open University in London/Futurelearn, 1-11 Hawley Crescent, Camden Town, London, NW1 8NP
  • 2. Breakdown of session • Introductions • Background to BL Labs and DH Projects • Developing project ideas as proposals, tips and challenges • Feedback and suggestions
  • 3. What is Project Management? 1) Can range from informal and small scale, focusing largely on common sense, flexible approaches to management… 2) To large scale formal approaches using methodologies such as PRINCE2 (PRojects IN Controlled Environments), Agile and Scrum, and tools such as MS Project • BL Labs uses both • Focus on the first one to help develop your own ideas • Not a session on Project Management!
  • 4. Introductions • Name • Affiliation • Project idea (one sentence)
  • 5. The British Library • Inside the British Library • St Pancras, London, UK: space for 1,200 readers, around 500,000 visitors per year; many books are stored 4 storeys below the building • Boston Spa: Building 37 uses low oxygen and robots; also has a Reading Room and provides delivery of items to London; many items are stored at the Document Supply and Storage centre, 48 hours away • Stockton-on-Tees: authors' right to payment each time their books are borrowed from public libraries • UK Legal Deposit Library – reference only • Founded in 1973, though origins stem back to the British Museum Library, 1759
  • 6. BL Labs supports… Researchers https://goo.gl/WutNyi Artists http://goo.gl/nNKhQ2 Librarians Curators https://goo.gl/9NWZUW Software Developers https://goo.gl/7QQ5Tf Archivists https://goo.gl/x7b4tg Educators https://goo.gl/qh01Mi Anyone interested in our digital collections and data
  • 7. Physical Collections – not just books! • > 180 million* items • > 0.8 million* serial titles • > 8 million* stamps • > 14 million* books • > 6 million* sound recordings • > 4 million* maps • > 1.6 million* musical scores • > 0.3 million* manuscripts • > 60 million* patents • King George IV bequeathed Library • *Estimates
  • 8. Let’s talk Digital… • Born Digital • Digitised
  • 9. Born digital: data all around us at King's Cross! • Knowledge Quarter London: 80 knowledge organisations (as of 07/12/17) within a 1 mile radius of King's Cross, http://www.knowledgequarter.london • http://www.turing.ac.uk (headquartered at the British Library) • UK Web Archive and e-legal deposit (2013) http://www.webarchive.org.uk/ukwa/
  • 10. All our physical items are digitised, right?
  • 11. #bldigital • 1-2%* digitised (*estimate) • Digitisation Partnerships: Commercial & Other Organisations • Amount increasing rapidly • Bias in digitisation • So learn the story behind the digital collection • http://goo.gl/bR9UJL Sample Generator
  • 12. Playbills, Books, Newspapers (includes Optical Character Recognition (OCR)) Digital collections and Datasets British National Bibliography http://bnb.data.bl.uk http://sounds.bl.uk http://dml.city.ac.uk/ Music (Recordings & Sheet) & Sounds http://goo.gl/frSMJt Broadcast News (TV and Radio) http://goo.gl/cwThHw http://goo.gl/pBkisZ http://goo.gl/E8aRyQ • Usage data • EThOS • Web Archive • Images, Manuscripts & Maps • http://www.qdl.qa/ Qatar Digital Library • http://idp.bl.uk/ International Dunhuang Project • Maps http://www.bl.uk/maps/ • Hebrew Manuscripts http://goo.gl/4sbCp9 • Flickr & Wikimedia Commons https://goo.gl/LZRmaZ
  • 13. Finding Open Cultural Heritage Datasets • Collection Guides (183 as of 05/12/17) https://www.bl.uk/collection-guides/ • Datasets about our collections: bibliographic datasets relating to our published and archival holdings • Datasets for content mining: content suitable for use in text and data mining research • Datasets for image analysis: image collections suitable for large-scale image-analysis-based research • Datasets from UK Web Archive: data and API services available for accessing UK Web Archive • Digital mapping: geospatial data, cartographic applications, digital aerial photography and scanned historic map materials • https://data.bl.uk: download collections as zips, no API • Each dataset has a Digital Object Identifier (DOI) and can be referenced for research • Not all discoverable via search engines!
  • 14. Explore or Imagine Our Data! Smaller datasets at https://data.bl.uk: • CSV of Metadata https://data.bl.uk/digbks/dig19cbooks-mdata-csv.csv • 19th Century Books - Book Metadata - 01/09/2013. https://data.bl.uk/digbks/db21.html • Digitised Books - Flickr Tag History - Dec 2013 to March 2016. TSV https://data.bl.uk/digbks/db15.html • Digitised Hebrew Manuscripts - Metadata https://data.bl.uk/hebrewmanuscripts/heb1.html • Digitised Hebrew Manuscripts: Or 2210 - Or 2364 https://data.bl.uk/hebrewmanuscripts/heb8.html • Theatrical playbills from Britain and Ireland (OCR text only) https://data.bl.uk/playbills/pb2.html • Portraits of actors, views of theatres and playbills (covering 1750 - 1821 in a single volume) https://data.bl.uk/singlesheet/por1.html • Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings and advertisements on amusements, 1660-1840. https://data.bl.uk/singlesheet/ad1.html • Have a look at the data • Data quality • Issues • Or an idea you have thought of for what to do with the data! http://labs.bl.uk/Ideas+for+Labs
  • 15. Openly Licensed Digital Content? • 15% Openly Licensed • Around 80%* available online • Working through to make more open… though some collections will always only be available onsite for various reasons, including legal and ethical ones • Breakdown by collection*: Manuscripts 59%, Books 9%, Maps and Views 7%, Newspapers 3%, Archives and Records 3%, Paintings, Prints and Drawings 2% (*based on number of digitisation projects, 693 as of 08/12/17) • Largest proportion of funding: Public / Private Partnership • 15%* Openly Licensed – most online • 85%* Available onsite only at the moment • *Estimates
  • 16. The Story of the Digital Collection… • Digital Collection • Curator • Who paid for the digitisation? • Who did the digitisation? • Technology used • Born digital? • Published • Unpublished • Where is it? • Can it still be accessed? • Generates income • Reputational risk in using? • Legalities • Politics when digitised • Personalities involved • Surprises (e.g. gaps) • Descriptive information • Old format not supported • What media was the digitisation done from? • Is there any background documentation? • No descriptive information • Inconsistent descriptive information • Still there? • Good to know the background ‘story’ of a digital collection if you want to use it for research and draw conclusions…
  • 17. Competition: Tell us your ideas of what to do with our digital content • Awards: Show us what you have already done with our digital content in research, artistic, commercial and learning and teaching categories • Projects: Talk to us about working on collaborative projects
  • 18. Example Pattern of Research 1, 2, 3 1. Find / identify new things in messy stuff 2. Unlock hidden history / data 3. Celebrate by telling new stories!
  • 19. Finding / identifying invisible / well hidden things in ‘messy’ historical data https://goo.gl/mcpa8B Not the British Library! Example Pattern of Research 1 Some of the challenges we face at the Library
  • 20. Unearthing / unlocking hidden histories & data to stimulate new research https://goo.gl/vJ291F It’s an 18th Century Poem! Example Pattern of Research 2
  • 21. Celebrating hidden histories / data creatively through events, art, performance and story telling https://goo.gl/Ql0Bwz Re-enacting, re-discovering history Example Pattern of Research 3
  • 23. An example pattern of research: finding things in ‘messy’ Optical Character Recognised (OCR) text https://goo.gl/oUNj5N https://goo.gl/ImAUv4 • Clean up some manually • Get human ‘ground truth’ • Write computer code (sometimes it’s machine learning) to find things reliably in it ‘automatically’ • Try code on messy content • Tweak if necessary • Digital ‘lasso’ around content • Human sift through • Mrs Folly
  • 24. Looking through a rubbish bin? https://goo.gl/UeEvqs Good stuff! Some Rubbish
  • 25. Machine Learning / Reading Analogies to how humans read / learn Machines acquire ‘knowledge’ / data, use that knowledge / data to make sense / identify patterns https://goo.gl/k68fTf https://goo.gl/gXmVQL Can you see the bird?
  • 26. Need to stress still requires computational & human effort… https://goo.gl/gDQEAz Labs doing this on a case by case basis so methods can vary Machine Learning / Reading still requires ‘Human Effort’!
  • 27. Legalities of Machine Learning / Text and Data mining https://goo.gl/toq4Bo • Still up for discussion… • Often misunderstood • Is it the same as humans reading and looking for patterns… just a bit quicker?
  • 28. Victorian Meme Machine (2014) https://goo.gl/HMqDt3 Bob Nicholson http://victorianhumour.tumblr.com/ • Bob Nicholson interviewed on BBC Radio 4 Making History Programme: http://goo.gl/fmV9ep • And telling jokes to the public: http://goo.gl/xIDRhz • Bob obtained further funding from his university • Looking for more collaborations • https://www.youtube.com/watch?v=-GRgj7Q5OM0 Rob Walker, Victorian Mother-in-law Jokes, Victorian Comedy Night, 7 Nov 2016 • Learnt about access paths to digital collections
  • 29. Katrina Navickas (2015) Political Meetings Mapper http://politicalmeetingsmapper.co.uk https://goo.gl/Qq78Oa Labs Symposium 2015 https://goo.gl/BSA3be Interview 2015 The Chartist Newspaper http://goo.gl/vOLSnH Chartist Monster Meeting Chartists Walking Tour and Re-enactment London Learnt that domain knowledge reduces noise
  • 30. Data-mining verse in 18th Century newspapers BL Labs Project 16-17, Jennifer Batt https://goo.gl/5Akthd Slides courtesy Jennifer Batt
  • 31. What thoj' among ourrelves, with too much Heat, or t W: fweutimes.wongle, wvhen we Ihould debate, W – (A confequential Ill which Freedom drawvs, fl t A bad Efficf, but from a noble Caufe) t We can with univeifal Zcal advance, to To cutb the faithlefs Arrogancccof V rance. hi Dublin Journal, 10-14 September, 1745 Slides courtesy Jennifer Batt
  • 32. Verse: 81% of lines begin with an initial capital • Prose: 52% of lines begin with an initial capital • Westminster Journal, 3 March 1745 • Started to refine Machine Learning Techniques • Jennifer Batt @ the BL on World Poetry Day • ‘40,000’ things found… • Slides courtesy Jennifer Batt
  • 33. Cleaning up OCR Text – significant improvement (depending on original image quality) • Use of Overproof OCR correction? http://overproof.projectcomputing.com/ • Re-OCR with ABBYY FineReader? https://www.abbyy.com/en-gb/
  • 34. Virtual Infrastructure for OCR text • OCR text ‘scraped’ from digitised newspapers and put in internal cloud • Jupyter notebook: write Python code and see results in a web browser http://jupyter.org • Access available for researchers ‘in residence’ • https://www.docker.com/ http://dhbox.org/
  • 35. BL Labs Competition Entry Process • Think of a project which uses the British Library’s Digital Collections or Data • Examine our data and discuss idea • Propose mini project • Proposals assessed and successful ones worked on • 3 examples from 2014, 2015, 2016 given
  • 36. Elements of Proposal (https://goo.gl/K85hTQ) • Title and Summary • Research Question(s) • How it showcases digital collections / data • Methods (text mining, visualisations, statistical analysis) • Evidence of how you have developed or will develop the skills, knowledge and expertise to successfully carry out the project • Evidence that the idea is achievable on a technical, curatorial and legal basis • Plan • Risk assessment* (new suggestion)
  • 37. Title and Summary • Try to summarise the project in one sentence • Abstract / Summary around 400 words
  • 38. Research Question(s) • This/these should be very clear
  • 39. Showcasing digital collections / data • How does your idea showcase digital collections / data? • Have you seen the digital collections and data? • Do you know the ‘story’ of the collection? • What state is it in? • Have you done some initial experiments? • Will it require cleaning, e.g. using tools like OpenRefine? • A reality check on what you can actually achieve with the data will determine the idea and scope
  • 40. Methods (text mining, visualisations, statistical analysis) • Think of what is going to be required to implement methods, e.g. skills, time and other resources • Plan accordingly • Tools required / software / hardware
  • 41. Evidence of how you have developed or will develop the skills, knowledge and expertise to successfully carry out the project • List skills, presentations, publications etc. • Are there gaps? • How are they going to be filled?
  • 42. Evidence that the idea is achievable on a technical, curatorial and legal basis • Technical factors • Is the project technically feasible? • Whether the technical skills required to complete the project and who will be required to implement them have been clearly identified. • Legal factors • Whether the legal terms of use for the digital collections identified have been checked and compliance demonstrated in the proposal. • Whether the idea contains information that ensures the project does not in any way infringe intellectual property rights or any other rights of any third party.
  • 43. Evidence that the idea is achievable on a technical, curatorial and legal basis • Curatorial factors • Can it be demonstrated that the digital content is available, accessible and can be realistically used for the project? • Background research for people connected to the collection / the story of the collection • Whether any extra work required to make the digital content usable for the project has been clearly identified (where appropriate).
  • 44. Plan • Define period of time X and Y • Activity described here (e.g. what, when and by whom) • Break down into manageable chunks / units • Can run in parallel • Build in reasonable review points and lag. • How will it be monitored? • It’s a plan, it can change!
  • 45. Risk • Have a way of assessing risks • Risk / Mitigation / Likelihood / Impact • Use Low, Medium and High for Likelihood and Impact • Example: Risk: Insufficient support from UK research councils. Mitigation: Build compelling case. Carry out research to gauge demand and commitment to resourcing. Adapt model according to our findings. Likelihood (after mitigation): M. Impact: H.
  • 46. Labs mindset… 1. Start a conversation, generate positive energy and try to support ideas 2. Start with small experiments, but think big. 3. Fail faster (don’t be afraid) and persevere. 4. Reject perfectionism! Good enough is sometimes…good enough! 5. Celebrate the uses of digital collections https://goo.gl/noASfl
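
The verse and prose figures on slide 32 (81% against 52% of lines beginning with an initial capital) suggest a simple first-pass filter for finding candidate verse in OCR text. The Python below is a minimal sketch of that idea, not the method used in Jennifer Batt's project: the function names, the 0.7 threshold and the sample lines are all invented for illustration, and any real run would need tuning against human 'ground truth'.

```python
def initial_capital_ratio(lines):
    """Return the share of non-blank lines whose first character is an uppercase letter."""
    non_blank = [line.strip() for line in lines if line.strip()]
    if not non_blank:
        return 0.0
    capitalised = sum(1 for line in non_blank if line[0].isupper())
    return capitalised / len(non_blank)


def looks_like_verse(lines, threshold=0.7):
    """Flag a block as candidate verse when most of its lines start with a capital.

    The 0.7 default is an assumption chosen to sit between the 52% (prose)
    and 81% (verse) figures on the slide; it is not taken from the project.
    """
    return initial_capital_ratio(lines) >= threshold


# Invented sample blocks, for illustration only.
verse_block = [
    "The morning breaks upon the hill,",
    "And all the sleeping vale is still;",
    "No sound is heard along the shore,",
    "Where waves shall murmur evermore.",
]
prose_block = [
    "Yesterday the ship arrived in port, and the",
    "captain reported that the cargo was safe and",
    "that the crew were in good health after a long",
    "Voyage from Lisbon.",
]

print(looks_like_verse(verse_block))  # every line starts with a capital
print(looks_like_verse(prose_block))  # only half the lines start with a capital
```

Anything such a filter flags would still go to a human to sift through, in line with the 'digital lasso' pattern on slide 23.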

Editor's notes

  1. 140 seconds The British Library is the national library of the UK and one of the largest research libraries in the world. The Library moved to a new purpose-built building in 1997, <click> the largest of its kind built in the UK in the 20th century. Many frequently used items are stored 5 storeys below the main building at St Pancras in London, and many might not know that part of the building is meant to look like a ship on a journey to discovery! <click> <click to switch off> The building can seat 1,200 researchers at any one time across 5 reading rooms. <click> Medium and long term requested items are held at Boston Spa in Yorkshire in a low oxygen warehouse, using robots to retrieve items. In total, the library has 625 km of shelving, growing by 12 km every year. Whilst we acquire items through purchase or gifts, much of the collection has been built up through legal deposit. That is, by law, a copy of every UK and Ireland print publication must be given to the British Library by its publishers. Around 3 million items are added per year. In 2013, legal deposit was extended to cover non-print material, which means that by law we take in digitally published items as well: regular mass crawls of the entire UK web domain as well as ebooks, ejournals etc.
  2. https://goo.gl/WutNyi http://goo.gl/nNKhQ2 https://goo.gl/9NWZUW https://goo.gl/7QQ5Tf https://goo.gl/x7b4tg https://upload.wikimedia.org/wikipedia/commons/a/a2/Interactive_whiteboard_at_CeBIT_2007.jpg
  3. 85 seconds The picture you can see is inside the main building in London. It’s the King’s Library, King George the Third’s personal library, sometimes known as the ‘stack’. I walk past this every day and I sometimes forget that the collections the British Library has are truly staggering! We currently estimate them to exceed <click>150 million items, representing every age of written civilisation and every known language. Our archives now contain the earliest surviving printed book in the world, the Diamond Sutra, written in Chinese and dating from 868 AD…. So some big numbers… Over …<click>14 million books <click>60 million patents <click>8 million stamps <click>4 million maps <click>3 million sound recordings <click>1.6 million music scores <click>over 0.3 million manuscripts <click>0.8 million serials titles (which are of course made up of many, many volumes/editions). This is where a lot of our content is, just in case you thought the numbers didn’t add up!
  4. 6 Seconds (20 Words) So <Click> ‘how’ do we try and engage those who might be interested in the BL’s digital collections and data? <Click>
  5. 17 Seconds (53 Words) <Click>The British Library is one of the largest libraries in the world <Click>with an estimated 180 million physical items, only a small proportion of which have been digitised. <Click>We estimate this is around 1-2%, but no one really knows exactly how much. However, more and more items are being stored as ‘born’ digital, such as the UK Web Archive<Click>
  6. Have a balance of multimedia: broadcast news and radio; sounds (Save our Sounds); books and newspapers; images; BNB; Qatar Digital Library; Hebrew manuscripts
  7. 21 Seconds (65 Words) Katrina Navickas was particularly interested in the <Click>Chartist Movement, a group campaigning for the vote for working people. <Click>They were the biggest popular movement for democracy in 19th century British history, as this early picture of a huge monster meeting at Kennington Common shows. <Click>She wanted to use a combination of manual and computational methods to explore our Digitised Newspapers to find out when and where they met and plot the meetings on a map, <Click>hopefully unearthing new history.
  8. 970 files from a selection of 19th century newspaper titles from the BL corpus, for us to correct using the overProof post-OCR correction software. The best way to measure the improvement made by the correction process is to compare the OCR'ed text and the automatically corrected text with a perfect correction made by a human (known as the "ground truth"). Hannah-Rose's 5 small human-corrected samples are shown as green dots. These are not only smaller than the other files, but their raw error rate is much lower at 13.3%. OverProof was measured as reducing this to 5.4%, a removal of almost 60% of errors. The red dotted line indicates the correction "break-even" point: the further under the line, the better the quality of the document after correction. In the graph below, the grey line shows the distribution of files across error rates before correction and the green line after correction.
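
The figures in note 8 can be checked with a little arithmetic: going from a 13.3% error rate to 5.4% removes (13.3 - 5.4) / 13.3 of the errors, which is about 59%. The Python sketch below shows that calculation, plus one common way to score OCR text against a ground truth using character-level edit distance. The note does not say how the error rates were measured, so the character-level measure and all function names here are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def error_rate(ocr_text, ground_truth):
    """Character error rate of ocr_text relative to a human-corrected ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ocr_text, ground_truth) / len(ground_truth)


def relative_error_reduction(rate_before, rate_after):
    """Share of errors removed by correction, e.g. 0.5 means half were removed."""
    return (rate_before - rate_after) / rate_before


# Figures quoted in note 8: 13.3% raw error rate, 5.4% after correction.
print(round(relative_error_reduction(0.133, 0.054), 3))  # 0.594, i.e. almost 60%

# A long-s misread as "f", as in the OCR sample on slide 31.
print(error_rate("Caufe", "Cause"))  # 1 wrong character out of 5
```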