AHRC CDP Digital Humanities 101

DH101: Exploring the Computational Turn in the Humanities
Nora McGregor
Curator, Digital Research
@ndalyrose

www.bl.uk 2
Over the next hour we will…
• Try to define digital humanities
• Understand some of the buzzwords in and around DH
– Text/Data Mining & Machine Learning
– Data & Data Visualisation
– Georeferencing
– and a little Computer Vision & 3D modelling for good measure
…through lots of examples!
• Get tips for finding further info & support

But first, who am I?!
Founded in 2010, the Digital
Scholarship Department at the British
Library supports researchers and
staff to make innovative use of our
digital collections and data.
We are a group of cross-disciplinary
experts in the areas of digitisation,
librarianship, digital history &
humanities, computer and data
science, looking at how technology is
transforming research, and in turn,
our services.
@BL_DigiSchol

Getting (& staying) in the game
The Digital Scholarship Training Programme is
an internal staff training initiative by the Digital
Curator team that launched in November 2012.
Informed by the Digital Humanities, we look at
what researchers in the field are
learning and doing.

What does
“digital humanities”
mean to you?
(https://whatisdigitalhumanities.com/)

The origin story
“Unlike many other interdisciplinary
experiments, humanities computing
has a very well-known beginning. In
1949, an Italian Jesuit priest, Father
Roberto Busa, began what even to
this day is a monumental task: to
make an index verborum of all the
words in the works of St Thomas
Aquinas and related authors, totaling
some 11 million words of medieval
Latin.”
http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405103213/9781405103213.xml&chunk.id=ss1-2-1

The origin story, part II
“The real origin of that term [digital humanities] was in conversation
with Andrew McNeillie, the original acquiring editor for the Blackwell
Companion to Digital Humanities. We started talking with him about that
book project in 2001, in April, and by the end of November we’d lined up
contributors and were discussing the title, for the contract. Ray
[Siemens] wanted “A Companion to Humanities Computing” as that was
the term commonly used at that point; the editorial and marketing folks
at Blackwell wanted “Companion to Digitized Humanities.” I suggested
“Companion to Digital Humanities” to shift the emphasis away from
simple digitization.”
- John Unsworth, founding director of the
Institute for Advanced Technology in the Humanities
at the University of Virginia and co-editor of the
Blackwell Companion to Digital Humanities

Defining digital humanities (DH)
• An area of scholarly activity, born from humanities computing, at the
intersection of computing/digital technologies and the
humanities.
• The field both employs technology in the pursuit of humanities
research, and subjects technology to humanistic questioning and
interrogation.
• DH is collaborative, cross-disciplinary, and computationally
engaged research, teaching, and publishing.
https://en.wikipedia.org/wiki/Digital_humanities

“The emergence of the new digital humanities isn’t an isolated academic
phenomenon. The institutional and disciplinary changes are part of a
larger cultural shift, inside and outside the academy, a rapid cycle of
emergence and convergence in technology and culture.”
- Steven E. Jones, Emergence of the Digital Humanities (2014)
http://lisacharlotterost.github.io/2015/06/20/Searching-through-the-years/

Defining digital humanities (DH)
Is it a discipline? Or a set of methods that can be
used across disciplines (like textual criticism)?
Lots of debate, but for today we can safely
agree…
DH combines the methodologies from
traditional humanities & social science
disciplines…
…with computational tools provided by
computing disciplines.
• Machine learning
• Data mining
• Text mining
• Georeferencing
• Data visualisation
• Crowdsourcing

How might digital humanities
techniques benefit your research?
• Explore a bigger body of material computationally than you could by
reading each text individually
• Sometimes see trends, patterns and relationships that are not apparent
from close reading
• Gain a broad overview of a topic
• Test an idea or hypothesis on a large dataset
• Provide skills and tools for keeping your research data clean
• New sources of funding, collaborations, connections
• …..and more!

Text & Data Mining
Using a variety of computational techniques to derive information
from and find patterns in texts and large datasets. Two common text
mining (TM) tasks:
• Named-entity recognition: find and classify words in texts that might
refer to names of things, such as a person or company
• Topic modelling: a method for finding groups of words (i.e. topics) in
a collection of documents that best represent the information in the
collection.
Machine Learning
• Constructing algorithms that can learn from and make predictions on
data. It is employed in a range of computing tasks relevant to humanities
scholarship, such as text mining & automatic Handwritten Text Recognition (HTR)
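
To make the named-entity recognition task above more concrete, here is a minimal Python sketch. It is not a trained recogniser of the kind real tools use; it only assumes that a run of two or more capitalised words is a candidate name. The function name `candidate_entities` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

# Runs of two or more capitalised words, e.g. "Thomas Aquinas".
# A crude heuristic for illustration, not a trained NER model.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def candidate_entities(text: str) -> Counter:
    """Count capitalised multi-word phrases as possible entity names."""
    return Counter(CANDIDATE.findall(text))


sample = (
    "Father Roberto Busa began an index of the works of Thomas Aquinas. "
    "Roberto Busa started in 1949."
)
counts = candidate_entities(sample)
for phrase, n in counts.most_common():
    print(n, phrase)
```

Note that the heuristic treats "Father Roberto Busa" and "Roberto Busa" as two different candidates, and it cannot say whether a phrase is a person, a place or a company. Classifying and reconciling names like these is the part that dedicated NER tools are built for.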

Stanford Named Entity Tagger
http://nlp.stanford.edu:8080/ner/

Transkribus
Transkribus is open-source software
for the automated recognition,
transcription, indexing and enrichment of
handwritten archival documents. It relies
on crowdsourcing and machine learning.
Each contribution
helps train the model
for automatic
recognition.

Political Meetings Mapper
Dr. Katrina Navickas, a self-professed
luddite, wanted to know how many, and
where, Chartist movement meetings
took place in the 19th century and if
there was a more efficient way to
extract this information
programmatically from our digitised
newspapers, rather than by hand.
5,519 meetings held from 1838 to 1850
discovered in 462 towns and villages
across the UK!
Will be added to her existing findings:
http://protesthistory.org.uk/the-story-1789-1848/database-of-meetings
“I was able to do in minutes with a python code what
I’d spent the last ten years trying to do by hand!”
-Dr. Katrina Navickas, BL Labs Winner 2015

Data Visualisation
• The graphical display of quantitative or qualitative information to
create insights by highlighting patterns, trends, variations and
anomalies.
• For 'sense-making (also called data analysis) and communication'
(Stephen Few)
• '…interactive, visual representations of abstract data to amplify
cognition' (Card et al)
• Visual perception is faster; interactive visualisations let you move
between the shape and the detail of a collection

http://datavizproject.com/

Big Data History of Music
How can vast amounts of bibliographic data held by research libraries be
unlocked for music researchers to analyse?
Can this data be interrogated in ways that challenge the traditional narratives
of music history?
Analyses and visualisations
exposed previously
uncharted patterns in the
history of music, for instance
the rise and fall of music
printing in 16th- and 17th-
century Europe (huge dips in
output in Venice were down
to plague and war).
https://www.royalholloway.ac.uk/music/research/abigdatahistoryofmusic/home.aspx

Example: Mapping the Republic of Letters & Video
https://youtu.be/nw0oS-AOIPE?t=8s

Georeferencing
• Linking data with a physical location. It relates information (documents,
texts, maps, images) to geographic locations through place names and
place codes or geospatial referencing (longitude and latitude coordinates).
• Some representative modes of enquiry enabled by georeferencing…
• Correspondence, Networks & Relationships (Republic of Letters)
• Mapping Literature (Willa Cather)
• Historical Social Movements (Political Meetings Mapper)
• Historical reconstructions (Orbis)
• Cities & Memory (Bomb Sight)
• Spread of Technology & Ideas (Atlas of Early Printing)
• Human-Environment Interaction (London Sound Survey)
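
The core step of georeferencing, linking a place name in a record to coordinates, can be sketched in a few lines of Python. This is only an illustration: the gazetteer below is a hypothetical three-entry lookup table with approximate city-centre coordinates, and the records and the `georeference` function are invented for the example. A real project would use a full gazetteer or geocoding service.

```python
from typing import Optional, Tuple

# Hypothetical mini-gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate city centres, for illustration only.
GAZETTEER = {
    "london": (51.507, -0.128),
    "manchester": (53.481, -2.243),
    "leeds": (53.801, -1.549),
}


def georeference(place: str) -> Optional[Tuple[float, float]]:
    """Look up a place name, ignoring case and surrounding spaces."""
    return GAZETTEER.get(place.strip().lower())


# Invented records standing in for catalogue or newspaper entries.
records = [
    {"title": "Meeting notice", "place": "Manchester"},
    {"title": "Letter", "place": " Leeds "},
    {"title": "Pamphlet", "place": "Atlantis"},
]

# Attach coordinates to each record; unknown places get None.
located = [{**r, "coords": georeference(r["place"])} for r in records]
for r in located:
    print(r["title"], r["coords"])
```

Places that are not in the gazetteer come back as `None` rather than a guess, so they can be checked by hand.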

Orbis: "Google Maps for Ancient Rome"
Video: https://www.youtube.com/watch?v=eWz7vXzmreg
View Interactive Map: http://atlas.lib.uiowa.edu/
Project Site: http://atlas.lib.uiowa.edu/about.php
The Stanford Geospatial
Network Model of the Roman
World reconstructs the time cost
and financial expense
associated with a wide range of
different types of travel in
antiquity.
ORBIS was created using data
from both primary sources and
computational geography
simulations about travel, wind
and sea patterns, seasonal
access, costs and other
considerations to plot realistic
transport networks.

Canada Through the Lens:
mapping a collection
Phil Hatfield, Curator, created an
interactive map enabling access to
the Canadian copyright collection
by location, providing users with
metadata and, where possible,
access to the rights-cleared (public
domain) images held on the
Library's Wikimedia site.
He used openly available tools
(Google Fusion Tables) which
automatically georeferenced the
data for him.
He discovered that much of the
collection closely followed railway
lines.

Computer Vision
• Closely related to Machine Learning, it’s concerned with the automatic
extraction, analysis and understanding of useful information from a
single image or a sequence of images.
It’s not ALL text based!

3D modelling
• Creating a three-dimensional
computer model which
represents a three-dimensional
object. 3D models are made
from points or vertices in 3D
space connected by geometric
data, such as lines and curves.
This forms a wireframe
representation which can be
displayed with a solid surface
through a process called
‘rendering’. Textures and
images can then be mapped to
the surfaces of the 3D model to
create ‘visualisations’.
It’s not ALL text based!
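
As a small illustration of the "vertices connected by lines" idea, the Python sketch below builds the wireframe of a unit cube. The cube is just a convenient example shape chosen here; real 3D models have far more vertices and usually store surfaces as well as edges.

```python
from itertools import product

# Eight corners of a unit cube: every combination of 0/1 on x, y, z.
vertices = list(product((0, 1), repeat=3))

# An edge joins two corners that differ in exactly one coordinate.
# Each edge is stored as a pair of indices into the vertex list.
edges = [
    (i, j)
    for i in range(len(vertices))
    for j in range(i + 1, len(vertices))
    if sum(a != b for a, b in zip(vertices[i], vertices[j])) == 1
]

print(len(vertices), "vertices,", len(edges), "edges")
for i, j in edges:
    print(vertices[i], "->", vertices[j])
```

Storing edges as index pairs rather than repeating the coordinates is a common design choice: moving one vertex then updates every edge that uses it.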

Humanities Data
• Facts and statistics collected together for reference or analysis
• Humanities data might be sets of bibliographic information, images,
image processing details, texts, texts with mark-up and annotations,
historical tabular data, archived webpages…you name it!
• A data set represents a distinct collection of data ideally packaged,
preserved and made accessible for enquiry.
• Humanities data can be “big”, “small”, “smart”…..but mostly
complex!

Messiness in historical data
• 'Begun in Kiryu, Japan, finished in France'
• 'Bali? Java? Mexico?'
• Variations on USA:
– U.S.
– U.S.A
– U.S.A.
– USA
– United States of America
– USA ?
– United States (case)
• Inconsistency in uncertainty
– U.S.A. or England
– U.S.A./England ?
– England & U.S.A.

Ships' Log Books & Modern
Forecast Models
The East India Company archives
include 900 ships' log-books containing
daily instrumental measurements of
temperature and pressure, and
subjective estimates of wind speed and
direction, from voyages across the
Atlantic and Indian Oceans between
1789 and 1834.
The Met Office digitised and transcribed
these books, providing 273,000 new
weather records offering an
unprecedentedly detailed view of the
weather and climate of the late
eighteenth and early nineteenth centuries
in certain locations, which can be used to
test the accuracy of their forecasting
models.

Things to consider: Data + Tools
• Cultural heritage records contain uncertainty and fuzziness (e.g. date ranges, multiple
values, uncertain or unavailable information). Curators and staff at institutions often
have unique expertise in deciphering these anomalies. Ask them! ([1960] vs. 1960 can
have a big impact depending on what you’re doing.)
• Optical Character Recognition in particular is an imperfect art. You need to consider how
bad it is, how this might affect your findings, and what needs doing to mitigate it.
• Keeping data clean, organised, open and well described will not only make your life
easier, but enable its widespread re-use beyond the life of your PhD and increase
future impact. (Datasets you’ve created in the course of your research projects could
even be used to enhance national collections!)
• Decisions always need to be made while normalising information for visualisation.
Documenting them is important for your research but also for future re-use!
• Is your aim enquiry or presentation? All of this will have an impact on the tools and
data cleaning choices you make.
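
The "[1960] vs. 1960" point can be sketched in Python as follows. The sketch assumes that square brackets mark a date supplied or inferred by the cataloguer, which is a common cataloguing convention but should be confirmed with the holding institution. The names `CatalogueDate` and `parse_year` are invented for the example.

```python
import re
from typing import NamedTuple, Optional


class CatalogueDate(NamedTuple):
    year: Optional[int]  # None when the value is not a single year
    inferred: bool       # True when the year was given in [brackets]


def parse_year(raw: str) -> CatalogueDate:
    """Parse '1960' or '[1960]'; anything else is left unresolved."""
    text = raw.strip()
    # Assumption: brackets mean the date was supplied by the cataloguer.
    inferred = text.startswith("[") and text.endswith("]")
    match = re.fullmatch(r"\[?(\d{4})\]?", text)
    year = int(match.group(1)) if match else None
    return CatalogueDate(year, inferred)


for value in ["1960", "[1960]", "1960-65"]:
    print(repr(value), "->", parse_year(value))
```

Recording the `inferred` flag is itself a normalisation decision, so it is worth documenting alongside the dataset.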

#digitalhumanities
dancohen/lists/digitalhumanities
@ProfHacker
@Dhnow
@BL_DigiSchol
And more links to resources here: http://scottbot.net/teaching-yourself-to-code-in-dh/

Contacts
Email: digitalresearch@bl.uk
Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/
Web: https://www.bl.uk/subjects/digital-scholarship
Thank you!
