Sally Rumsey, Janet McKnight, James A.J. Wilson - Research data management for the humanities: a non-Procrustean infrastructure
1. Research data management for the humanities: a non-Procrustean infrastructure
James A. J. Wilson
Sally Rumsey
Janet McKnight
University of Oxford
http://en.wikipedia.org/wiki/File:Theseus_Prokroustes_Staatliche_Antikensammlungen_2325.jpg
http://en.wikipedia.org/wiki/Public_domain
2. Procrustes
“a brigand who lived between Eleusis and Athens. Having overcome his victims he would force them to lie down on a bed, or on one of two beds; if they were too short, he would hammer them out or rack them with weights to fit the longer bed, if too tall he would cut them to fit the shorter. Theseus disposed of him in like manner.”
Oxford Classical Dictionary
4. Oxford RDM Principles
• Modular
– Different business models for different components
– May be extended (or reduced)
• Researcher-focused
– Caters for different disciplines and working practices
• Intra-institutional
– Requires input from multiple support departments and Academic Divisions
5. Humanities research data
• Difficult to define what constitutes ‘data’
• Extremely diverse
• Value tends not to depreciate over time
• Tends to be compiled from existing sources, not created from scratch
– Frequently incomplete or inconsistent due to inconsistent sources
– Frequently partial or specific according to research focus
– Frequently involves interpretation and assessment
• Sometimes not in optimal format for analysis
• Life’s work – projects frequently build on earlier projects
• Hard to generalize!
• Many issues not restricted to the humanities
6. Humanities data formats
Based on 2012 survey responses from researchers working with data:
• 95% work with textual data
• 45% with images
• 48% use tables or spreadsheets
• 23% use relational databases
• 6% use XML text mark-up
[Charts: “What kinds of data do you work with?” and “How are your data stored or structured?”]
7. Humanities Data Research Practices
• Least likely to conduct research as part of a team
– idiosyncratic practices
– limited sharing of best practice
• Least likely to be externally funded
• Least likely out of all academic divisions to describe RDM as ‘essential’ to their research (49%)
• Least likely to have deposited data in a data repository
• Lowest awareness of Oxford’s RDM Policy
• 73% happy (at least in theory) to freely share at least some of their research data (2nd most open after MPLS)
[Chart: “Do you conduct your research as part of a team or as an individual?” Responses by division (Humanities; Mathematical, Physical and Life Sciences; Medical Sciences; Social Sciences). Response options: as part of a team, with our research data managed by the team; as part of a team, but each member of the team looks after their own data; as an individual; some of my research is undertaken as part of a team, but I also conduct some research independently.]
Based on 2012 survey responses from researchers working with data.
8. Conclusions for the Institution
• Humanities researchers amongst hardest to reach
• Need to offer long-term curation
• Need to encourage cultural change through training and support
– Particularly improving documentation and spreading good practice
• Few requirements unique to Humanities, however
• Need to offer flexible RDM solutions
– whilst also focusing first on most widely shared problems across disciplines
12. Bodleian: discovery and finding aids
Steve Hankins hankinsphoto.com http://www.flickr.com/photos/7961775@N03/7484532450/ *
* http://creativecommons.org/licenses/by/2.0/deed.en_GB
http://www.fihrist.org.uk/
Towards a Union Catalogue of Correspondence: Early Modern Letters Online http://emlo.bodleian.ox.ac.uk/
16. Consultancy services: Bodleian text technologies
TCP partners have used this corpus to:
[list items illegible in source]
The Text Creation Partnership
is a significant data set for innovative digital humanities research
“With titles on subjects ranging from literature to geography, diplomacy to slavery,
poetry to science, it will be, without question, the most important digital resource
ever created for the study of the early modern period.”
Stephen Ramsay, Associate Professor of English, University of Nebraska-Lincoln
21. • Research data archive, discovery [& access]
• Building a flexible solution
• DOIs – citation
• Preservation for long term access
• Located with Bodleian digital collections
22. Oxford DataBank environment
Metadata
Data input
Manual
Mediated
Harvested
External store
Application A
Application B
Application C
Application D
What questions do I want to answer using my data?
Digitised and born-digital
Experimental and non-experimental
23. Item types
Oxford Examination Schools http://hdl.handle.net/1813.001/5sx8
No known copyright restrictions Cornell University Library
Images – still & moving
https://databank.ora.ox.ac.uk/general/datasets/OSCCIVideos
24. Reproduced with kind permission of the Boethius Commentary Project, Funded by The
Leverhulme Trust, 2007-12, and based at the Faculty of English, University of Oxford
cernebat : et P3; .i. inquirebat A B1 C4 Ge O P P9;
mentis intuitu pulori(!) V4; suo acumine F2 Ma M1 P7 P9 T V5
rosei lumina solis : astronomicam rationem A C4 Ge O P P9;
uel splendorem iustitiae K1 P7 T;
uel] om. K1 T.
solis uel lunae defectus. et hoc peryfrasin dictum M2;
astronomicam rationem nam et sol unus est ex vii. planetis B1
rosei: rubicundi B P3; crocei V5; pulchri F2 K1 P9 T; epitethon(!) Es;
solis pulchri Ma;
rosei rubicundi siue pulchri quia roseum ponitur saepe
pro pulchre A C4 Ge O;
rosei] rosei .i. O; rosei et croceum/ A.saepe] om. O. pulchre] pulchro Ge O.
roseum et croceum pro pulchro accipiuntur F2 Ma M1 P7 P9 T V4;
accipiuntur] gloss written over by late hand F2.
pulcri quia rosei coloris est in suo ortu P7
25. Fritzi Scheff (1879-1954), Vienna-born American vocalist http://www.powerhousemuseum.com/collection/database/?irn=322920 No known copyright restrictions
Audio
https://databank.ora.ox.ac.uk/general/datasets/Tick1AudioCorpus
26.
27.
28.
29. Packages
• Makes DataBank flexible
• Ideal for data
• Bundle different files together
– Metadata
– Licence
– Read me
– Software
• Unpack zip and other types of compound files
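The packaging idea above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not DataBank's own interface: the function names `build_package` and `unpack_package` and every file name in `example_files` are hypothetical.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def build_package(package_path: Path, files: dict[str, bytes]) -> None:
    """Write every (name, content) pair into a single zip archive."""
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)


def unpack_package(package_path: Path, target_dir: Path) -> list[str]:
    """Extract the archive into target_dir and return the sorted member names."""
    with zipfile.ZipFile(package_path) as zf:
        zf.extractall(target_dir)
        return sorted(zf.namelist())


# Hypothetical package contents; all file names are illustrative only.
example_files = {
    "data/letters.csv": b"id,sender,recipient\n1,A,B\n",
    "metadata.json": json.dumps({"title": "Example dataset"}).encode("utf-8"),
    "LICENCE.txt": b"CC BY 4.0\n",
    "README.txt": b"Describes how the files fit together.\n",
}

with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "package.zip"
    build_package(package, example_files)
    members = unpack_package(package, Path(tmp) / "unpacked")
    readme_text = (Path(tmp) / "unpacked" / "README.txt").read_text()
```

Keeping the metadata, licence and read-me inside the same archive as the data means the package stays self-describing wherever it is copied.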
30.
31.
32.
33. Metadata describing data for Oxford data services
• Sources: manually entered and harvested
• Data citation
• Person [unique ID]
• Geo-coordinates
• Any metadata schema can be uploaded
• Subject headings (FAST) & keywords
• Link publications and data
• Other related works
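As a rough illustration of how the elements listed above might sit together in one record, here is a minimal Python sketch. The class `DatasetRecord`, its field names, the citation pattern, the default publisher string and the example DOI are all assumptions made for illustration; they are not the schema or citation style used by the Oxford services.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetRecord:
    """Illustrative record; field names are assumptions, not DataBank's schema."""
    title: str
    creator: str
    creator_id: str                                     # unique person identifier
    year: int
    doi: str
    source: str = "manual"                              # "manual" or "harvested"
    coordinates: Optional[tuple[float, float]] = None   # (latitude, longitude)
    subjects: list[str] = field(default_factory=list)   # e.g. FAST headings
    keywords: list[str] = field(default_factory=list)
    related_publications: list[str] = field(default_factory=list)

    def citation(self, publisher: str = "University of Oxford") -> str:
        """Build a simple data citation string from the record."""
        return (
            f"{self.creator} ({self.year}). {self.title}. "
            f"{publisher}. https://doi.org/{self.doi}"
        )


# Hypothetical values throughout.
record = DatasetRecord(
    title="Example correspondence dataset",
    creator="Doe, J.",
    creator_id="person-0001",
    year=2013,
    doi="10.0000/example",
    coordinates=(51.754, -1.254),
    subjects=["Letters"],
    related_publications=["https://example.org/article-1"],
)
print(record.citation())
```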
34. DataReporter
• Will generate standard reports
– Institutional and departmental reports
– Click-throughs & downloads
– Personal data publication reports
– Records lacking key metadata
– Statistics for REF
• Admin-only in first instance
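Two of the reports listed above, records lacking key metadata and per-department counts, could be sketched as follows. This is only an illustration under stated assumptions: the slides do not say which fields count as "key", so `KEY_FIELDS` is a guess, and the record layout and function names are hypothetical rather than DataReporter's design.

```python
from collections import Counter

# Assumed set of "key" fields; the slides do not say which fields count as key.
KEY_FIELDS = ("title", "creator", "doi", "licence")


def missing_fields(record: dict) -> list[str]:
    """Return the key fields that are absent or empty in one record."""
    return [name for name in KEY_FIELDS if not record.get(name)]


def report(records: list[dict]) -> dict:
    """Summarise incomplete records and per-department record counts."""
    incomplete = {
        r["id"]: missing_fields(r) for r in records if missing_fields(r)
    }
    per_department = Counter(r.get("department", "unknown") for r in records)
    return {"incomplete": incomplete, "per_department": dict(per_department)}


# Hypothetical records for illustration only.
records = [
    {"id": "ds1", "title": "Glosses", "creator": "Doe, J.", "doi": "10.0000/a",
     "licence": "CC BY", "department": "English"},
    {"id": "ds2", "title": "Audio corpus", "creator": "", "doi": "10.0000/b",
     "department": "English"},
    {"id": "ds3", "title": "Letters", "creator": "Roe, R.", "doi": "",
     "licence": "CC0", "department": "History"},
]
summary = report(records)
```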
35. Conclusion
We believe Oxford humanities will be well served by the Oxford model
The Bodleian Libraries.
MS. Arch. Selden B. 26
36. Janet Fell
• Also one of you
• Main objectives:
• Test and refine guidelines & procedures
• Sort out data
• Ingest data
• Examine processes
• Have humanities data in Bod archival data store
• Data management – planning ahead
• The humanities projects included in the work – variety; working across disciplines
• “assess the strengths and weaknesses of the current arrangements for Arts and Humanities data curation and sharing” from RDMF website
37. DHARMa: Digital Humanities Archives for Research Materials
Enabling Digital Humanities research through effective data preservation
38. Direction of travel
Surveying the landscape
Humanities research practices, funding requirements, etc.
Building the infrastructure
IT systems, planned processes and workflows
Roads and roadmaps
Real processes, instructions, guidelines, and human guides through the maze
39. Where we’re going
• Outline the workflow
• Dive into the data
• Ingest into DataBank
• Use what we’ve learned
• Plan for the future
Photo: Astolath
http://www.flickr.com/photos/astolath/492299057/
40. Finding our way
Before you impose a workflow on someone,
you should walk a mile in their shoes…
Photo: juggzy_malone
http://www.flickr.com/photos/11507123@N00/466342171/
41. Variety is the spice of life
Variety of projects:
• Different stages
• Different sizes and scales
• Different materials
• Different subject areas
Across departments:
• Bodleian Libraries
• IT Services
• Humanities Division / TORCH
• Research Services
• Faculties and departments
Photo: Joanna Bourne
http://www.flickr.com/photos/66992990@N00/4818938497/
42. Potential problems
• Motivation
• Ownership
• Confusion!
• Grey areas
• Sustainability
Photo: Manic Street Preacher
http://www.flickr.com/photos/manicstreetpreacher/4470163049/
43. How we’ll know we’ve got there
Principal outputs will be:
• Comprehensive guidelines and procedures
• A strengthened set of DH projects
• An exemplary archive of data in DataBank
Photo: jayneandd
http://www.flickr.com/photos/jayneandd/4450623309/
44. But also…
• Better communication
• Better networks
• Better sharing of knowledge