Jisc, the Wellcome Library, and non UK universities and professional societies, have been working on a three-year large-scale digitisation project of more than 15 million pages of 19th century published works, resulting in the UK Medical Heritage Library, a valuable resource for the exploration of medical humanities.
I hosted a live lab day on the 26th October, with researchers and developers, at the Wellcome Library, to look at how this resource can be developed. These are the results of the discussion.
1. Report on The #UKMHLlivelab
26th Oct 2016, Wellcome Library, London
Hosted by: Professor Melissa Terras
Professor of Digital Humanities, UCL Dept of Information Studies
Director, UCL Centre for Digital Humanities
m.terras@ucl.ac.uk, @melissaterras
With Owen Stephens (@ostephens) and Peter Findlay (@PFindlay_500)
A report after the event, hurriedly written up early on 27th October by M Terras!
2. Aims
• Jisc, the Wellcome Library, and non UK universities
and professional societies, have been working on a
three-year large-scale digitisation project of more than
15 million pages of 19th century published works,
resulting in the UK Medical Heritage Library, a
valuable resource for the exploration of medical
humanities.
• How can we best serve the research community with
this material?
• Bring together researchers with developers to explore
the resource, which launched officially on 27th October 2016
• Understand user needs, and improve functionality
• Help user community
• https://ukmhl.historicaltexts.jisc.ac.uk/
3. Attendees
• Hosted by Melissa Terras, Peter Findlay (Jisc)
and Owen Stephens
• 6 Jisc Historical Text programmers and service
managers
• 20 interested researchers
– From MA level students
– To Professors
– And Librarians
7. Hard work in the basement of the Wellcome Library, with discussion in between
8. Hard thinking with the developers on the interface. Plus sugar for sustenance.
9. A Dictionary of Psychological Medicine, Tuke, 1892
Some looked at individual items. This image is from a dictionary entry about reflexes,
but the text on the page is for the next entry – regicide – showing how hard it is to do
Machine Learning on images from the text that surrounds them.
10. The texts are not just about medicine. Lots for the food historian in there too! + others
11. Research Topics
• Smells, fumes, air, ventilation
• Food History
• Semantic Text Analysis
• Identifying and using Tables of Data in digitised content
– Particularly related to the census
• Diseases
• Alcohol
• Identification of different genres of text
– Public facing versus medical, grey literature
– Messages in research vs promotional vs lay texts
• Tracing different Editions of text over time
– Identifying reuse of illustrations in different texts
– Using metadata to trace different editions
– Using image matching/processing to identify image reuse
12. What People Want
• A subset of data
– Improved filters, including upload of terms/thesaurus to
generate smaller subset
• A way to search the collection and generate a
ringfenced sub-selection to do further analysis on
– Locking parameters to only search within subset
– For example, topics in certain Boroughs of London
– Perform term analysis within subset
• To take away and do deep/close reading/research on
• Download their subsets to use with other tools.
– CSV (not necessarily through the API).
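A minimal sketch of the "upload terms, get a ringfenced subset, download as CSV" request above. Everything here is an assumption for illustration: the records are made up, the field names (id, title, text) are hypothetical, and the matching is a plain substring test rather than anything the UKMHL service actually does.

```python
import csv
import io

# Hypothetical records; the real service's fields and content will differ.
RECORDS = [
    {"id": "a1", "title": "On Ventilation", "text": "Foul air and fumes in crowded wards."},
    {"id": "a2", "title": "Cookery for Invalids", "text": "A nourishing soup may replace beer."},
    {"id": "a3", "title": "Census Tables", "text": "Population of the boroughs of London."},
]


def load_terms(lines):
    """Normalise an uploaded term list: drop blank lines, lowercase the rest."""
    return {line.strip().lower() for line in lines if line.strip()}


def select_subset(records, terms):
    """Keep records whose text contains at least one term (substring match)."""
    return [r for r in records if any(t in r["text"].lower() for t in terms)]


def to_csv(records):
    """Serialise the subset as CSV text that other tools can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "title", "text"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


terms = load_terms(["soup", "Beer", "", "fumes"])
subset = select_subset(RECORDS, terms)
print(to_csv(subset))
```

Once the subset is fixed, any further search or term analysis can be run over `subset` alone, which is the "locking parameters" idea in the list above.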
13. Suggestions
• MA students felt a little overwhelmed with content
– Where to start?
– Need guidance, approach, start to think about particular topics and
what they could use it for.
– Saw possibilities and opportunities
• If used for school groups, intros to selections
– As in “Teaching History with 100 Objects” approach
– Pointers to interesting topics and collections needed
• Tabular data
– Investigate how to improve quality of OCR data
– Reusable data
– Balance between identifying which tables are useful and how
difficult it is to identify text within tables
14. Suggestions (2)
• Crowdsourcing?
– Where would it be appropriate?
– Tagging? Annotation? OCR Correction? Where to
employ it?
– Individual user data used in machine learning to
generate finding aids for all?
• Image Processing
– Matching of images
– Image Wall is a good tool, can it be expanded to
provide useful image analysis?
15. Suggestions (3)
• If UKMHL is open, then BL Book Data should also be
open?
– Discussion about licensing, financial model
• Request for items to be traceable to a shelf mark, and
availability to search for that.
– Would allow variants of a book to be identified
– Build links between them
• Improvement of KWIC results to facilitate “traditional”
research methods
– Concise list view would speed up the process of reading content
hugely
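To illustrate what a concise KWIC (Key Word in Context) list view could look like, here is a small sketch. The sample text is invented, and the function name kwic and its width parameter are hypothetical; this is not how the Historical Texts interface produces its results.

```python
import re


def kwic(text, term, width=30):
    """Return one line per hit: left context, [keyword], right context."""
    lines = []
    pattern = r"\b" + re.escape(term) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


sample = ("Soup may be offered in place of wine. "
          "A glass of wine was prescribed daily. Wine jelly is also given.")
for line in kwic(sample, "wine", width=20):
    print(line)
```

Aligning the keyword in a single column is what lets a reader scan many hits quickly, which is the speed-up the researchers asked for.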
16. Suggestions (4)
• “Premium Version Without the Ads”
Discussion about what the different grey buttons were and why they take up screen real estate
17. Suggestions (5)
• Filter by language of text
– Return only French, only German, only English, etc.
• Access to ALTO XML, to reuse/analyse data
themselves
– People have to already know about the process to even
know ALTO exists
• Access to raw image data
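For those who have not met ALTO before: it is an XML format that stores OCR output word by word, with each word held in a String element's CONTENT attribute. The sketch below reads a tiny hand-written fragment in that style. The sample, the 0.7 threshold, and the helper names are assumptions for illustration; real UKMHL files will be larger and may use a different ALTO version.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of ALTO; not taken from the collection.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Soup" WC="0.98"/><SP/><String CONTENT="for" WC="0.95"/>
    </TextLine>
    <TextLine>
      <String CONTENT="invalids" WC="0.61"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local(tag):
    """Strip the XML namespace so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Rebuild plain text, one string per TextLine."""
    root = ET.fromstring(xml_text)
    lines = []
    for el in root.iter():
        if local(el.tag) == "TextLine":
            words = [s.get("CONTENT", "") for s in el if local(s.tag) == "String"]
            lines.append(" ".join(words))
    return lines


def low_confidence(xml_text, threshold=0.7):
    """List words whose OCR word confidence (WC) falls below the threshold."""
    root = ET.fromstring(xml_text)
    return [el.get("CONTENT") for el in root.iter()
            if local(el.tag) == "String" and float(el.get("WC", "1")) < threshold]


print(alto_lines(ALTO_SAMPLE))
print(low_confidence(ALTO_SAMPLE))
```

Listing low-confidence words is one possible starting point for the OCR correction and crowdsourcing questions raised earlier.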
18. Surprises on the Day
• People’s questions changed
– Alcohol in particular type of official literature
– Revealed Alcohol in lots of recipes and cookbooks
• Soup (in texts) offered as a replacement for wine/beer as a
drink
• Changes scope of how we think about Victorian use of alcohol
• Research questions change by using these tools
and resources
– Scope
– Focus
– Serendipity
20. Could potentially offer the way to upload groups of terms to do search,
and to run term analysis (this isn’t live, it’s a mockup, of a possible service)
21. Easier way to go through Key Word in Context results quicker (mockup)
22. Overall Impressions/Comments
• Useful
• Enjoyment
• “Open”
• “Old Text, New Knowledge”
• Identification of gaps in design, and how to scale
up searches, methods, and questions
• API well documented
• From Developers
– Useful to understand research questions
• Rich conversations, interaction, possibilities
24. With thanks to
• Owen Stephens (@ostephens)
– http://www.ostephens.com/
• Peter Findlay (@PFindlay_500)
– Jisc digital portfolio manager
https://www.jisc.ac.uk/staff/peter-findlay
• Jisc organising folks! And the Wellcome Library
for hosting. There were really good biscuits.
25. A researcher commented that this level of discussion with developers was “Paradise”.
First mention of paradise in https://data.ukmhl.historicaltexts.jisc.ac.uk/view?pubId=ukmhl-b28074117&pageId=ukmhl-b28074117-104&terms=paradise
Apposite bridge building!