Archival field diaries are a rich source of scientific and historic data. They can provide invaluable insights into species’ past abundance and distribution, references to significant people and events, and personal descriptions of historic expeditions. Despite the wealth of information they contain, they are a hugely underutilised resource because they are inaccessible in their original state. As hand-written documents they are hard to read, and they are often uncatalogued. This means that neither their contents nor their very existence is searchable. In this paper, we will explore the evolving field of online transcription, with a particular emphasis on archival field diaries. Using Museum Victoria’s recent transcription projects as key case studies, we will discuss the transcription platforms available, the standards required for success, and, most importantly, what we are doing to capture all the data.
Transcribing between the lines: crowd-sourcing historic data collection
1. Transcribing between the lines:
crowd-sourcing historic data collection
Nicole Kearney
Museum Victoria
@nicolekearney
Dr Elycia Wallis
Museum Victoria
@elyw
2. Biodiversity Heritage Library (BHL)
The world’s
largest online
repository for
biodiversity
heritage and
archival materials.
http://www.biodiversitylibrary.org
4. BHL-Australia
The Naturalist's Miscellany, or Coloured figures
Of natural objects, Vol. 10, George Shaw, 1799.
The first published illustration
of the Duck-billed Platypus
“Of all the Mammalia yet
known it seems the most
extraordinary…
…at first view, it naturally
excites the idea of some
deceptive preparation by
artificial means.”
5. A synopsis of the Birds of Australia and the adjacent islands, John Gould, 1837.
10. What’s in the box?
Ornithology
Department
Archives
“Estate of
Graham Brown
– note books”
Catalogued in our Records & Archives database (TRIM)
15. DATE: 26 September 1948
OBSERVATIONS
TIME: 8am
LOCATION:
Lake Corangamite
16. DATE: 26 September 1948
OBSERVATIONS
TIME: 8am
LOCATION:
Lake Corangamite
BEHAVIOUR: nesting
17. SILVER GULLS (26.9.48)
300 nests on 1 island
15 islands of similar size
Estimates 4500 nests
Nesting success
~ 1.5 eggs/nest
=7000 new gulls from this
year from this locality
18. Underutilised resource
Inaccessible in their current state
• single hard copy
• single location
• hand-written (in the field)
• historic scripts
• unsearchable
• uncatalogued
19. Our scientists need this data!
1931
2012 2014
Grampians
National
Park
Images: Heath Warwick & Nicole Kearney / Museum Victoria
57. There’s a lot of data!
5 Graham Brown field diaries:
Date        Species                Location
09/09/1947  Red Wattle bird        Colac, near lake, in flowering gums
13/09/1947  Crested Grebes         Colac East, end of Church St, mouth of the creek
13/09/1947  Little Pied Cormorant  Colac, perched on the wreck
13/09/1947  Mountain Duck          Colac East, end of Church St, mouth of the creek
13/09/1947  Musk Duck              Colac, on the lake
13/09/1947  Silver Gull            Colac, over the lake, opposite Queen's Avenue
5611 animal sightings
547 mentions of people & organisations
I’m here today to talk to you about digitisation, transcription, and historic data collection. I am the Coordinator of the Australian component of the Biodiversity Heritage Library, a project led by Dr Ely Wallis.
The Biodiversity Heritage Library is the largest online repository for biodiversity heritage and archival materials. Globally, the project is based at the Smithsonian in Washington DC, but there are nodes all over the world, and the Australian node is led by Museum Victoria.
Australia has been contributing to the Biodiversity Heritage Library since 2010 and we have now digitised over 500 rare books and historic journals. This represents over 140,000 pages that used to be locked up in our library archives.
These include “The Naturalist’s Miscellany” from 1799, which features the first published illustration of the Duck-billed Platypus.
Our recent contributions include treasures such as this stunning “A Synopsis of the Birds of Australia and the adjacent islands” by John Gould.
These beautiful historic publications are now available online.
Anyone in the world can now turn these beautiful pages.
The items we’ve contributed to the Biodiversity Heritage Library are now infinitely more accessible than they’ve ever been, but they’ve never truly been inaccessible.
As published items, rare books and historic journals have records in publicly accessible library catalogues. If you had done a search for John Gould a year ago (before we put this digital copy online), you would have discovered the existence of this book in Museum Victoria’s collection. If you wanted to access it (and turn those beautiful pages), you would have had to come to Melbourne and make an appointment with our library staff, but the book was still findable and (to a certain extent) accessible.
But I’m here today to talk about items in our collection that were neither discoverable nor accessible. This box was discovered by one of our history curators in our bird mount store. It was labelled “Estate of Graham Brown – note books”. Inside this box were 5 historic field diaries and 6 folders of sightings records – the meticulous observations of an eminent Victorian ornithologist. The box did have a digital record, but this record contained no more information than what’s written on this box. And, as part of our archives, the record was in our internal records and archives database. That is not the database used to manage the items in our State Heritage Collection, which is the one used by our scientists and historians, the people who might be interested in the contents of this box.
But why should we care about someone’s old diaries? Historic field diaries chronicle the scientific expeditions undertaken by our early naturalists to explore, research and discover the natural history of our world.
They are filled with descriptions of new discoveries and frontiers.
They mention people, places and events.
But most importantly they are full of data – historic observations of animals, plants, fossils, weather conditions, habitat descriptions and past field collecting techniques.
In diaries, historic observations are linked to the two key pieces of information that make an observation useful to science – the DATE the observation was made and the LOCATION of that observation. These are the observations made by Graham Brown on 26 September 1948 at Lake Corangamite. And in this entry Brown has even noted the time of his observations.
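Dates in the diaries are not always written in a machine-friendly way: the Silver Gull entry, for example, is headed "(26.9.48)". The sketch below shows one way such a heading could be turned into a full date. The function name and the default century are assumptions made for illustration, not part of the project's workflow. The century is added by hand because Python's two-digit `%y` format would read "48" as 2048.

```python
from datetime import date


def parse_diary_date(text: str, century: int = 1900) -> date:
    """Parse a day.month.year heading such as "(26.9.48)".

    Hypothetical helper: assumes day-first order and, for two-digit
    years, that the diary was written in the given century.
    """
    day, month, year = (int(part) for part in text.strip("() ").split("."))
    if year < 100:
        year += century
    return date(year, month, day)


print(parse_diary_date("(26.9.48)").isoformat())  # 1948-09-26
```

A diary spanning two centuries would need the century supplied per volume rather than as a default.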
Historic observations can provide invaluable insights into past species’ abundance and distribution. They can be used to plan future biological surveys and they can inform threatened species management.
Field diaries are also filled with contextual information about biology and behaviour.
To give you an idea about how rich this contextual information can be, I’m going to take you through this one page.
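The Silver Gull entry on that page (26.9.48) scales a count from one island up to the whole colony. As a small worked check of the figures written on the page, the arithmetic can be laid out as follows; the variable names are ours, the numbers are Brown's.

```python
# Figures as written on the diary page (Silver Gulls, 26.9.48).
nests_on_one_island = 300
islands_of_similar_size = 15
eggs_per_nest = 1.5  # Brown's approximate figure for nesting success

estimated_nests = nests_on_one_island * islands_of_similar_size
estimated_new_gulls = estimated_nests * eggs_per_nest

print(estimated_nests)      # 4500
print(estimated_new_gulls)  # 6750.0, which the diary rounds to about 7000
```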
But, despite the wealth of information they contain, field diaries are a hugely underutilised resource.
Field diaries usually only exist as a single hard copy, stored in a single location. And handwritten in the field, in historic scripts, they can be very hard to read. And as hand-written documents, they’re unsearchable.
And most field diaries in museum collections are uncatalogued, or if they are catalogued their electronic records contain insufficient information for researchers to be able to find them.
Graham Brown made frequent expeditions to locations of great conservation interest to our museum, including the Grampians National Park. These are bird observations made by Graham Brown in 1931. These are Museum Victoria scientists surveying the same area in 2012 and again in 2014. But if our scientists are to get access to this historic data, we need to make the diaries more accessible.
Historic occurrence records are now more important than ever, as they can provide a critical baseline for climate change studies. And many scientists are desperate for this data. The authors of this paper include Drs Kevin and Karen Rowe, scientists in Museum Victoria’s ornithology department. This is a study they conducted using data from historic field diaries. And here are two more. These scientists read the original handwritten diaries and manually extracted the data. They have told me what an arduous task this was and how much easier their work – this critical research – would be if this data was more accessible. And Karen Rowe hopes to undertake a similar study in our Grampians National Park, using Graham Brown’s field diaries.
We need to know what’s in this box!
The first step was to create a record for each item in the box in EMu, our collection management database, the database used by the historians and scientists in our museum. The diary records could now be linked to items, photographs and specimens in our collection, as well as to biography records for the author and other people mentioned in the diaries.
The second step was to digitise the diaries.
We created high quality scans of each page, work that was done by our in-house Biodiversity Heritage Library volunteers.
We then uploaded these images to the collection database, creating a digitised version of each diary.
But while our curators would now be able to find the diaries, their contents were still inaccessible. This is the OCR output for this handwritten page. In order to unlock their contents and extract the historic data, we needed to transcribe them.
Step 3: transcription
Many organisations have developed tools to transcribe handwritten material and this work is being done online by crowd-sourced volunteers. Many, like the Smithsonian Transcription Centre, consist of an image of the original and a free-text box for the volunteers to transcribe into.
Some, like Transcribe Bentham, are a little more sophisticated and provide formatting tools for volunteers to mark up their transcribed text.
And some transcription projects are not interested in a verbatim transcription. Rather, they focus on capturing specific data. In this war diary, volunteers flag the position of key information within the text and transcribe only that text – people’s names, dates and locations.
Or in the case of a specimen label – species names, dates and locations. However, as our field diaries were of interest to both historians and scientists, we needed both a verbatim transcript and the data.
Some of you may be familiar with DigiVol, the online transcription tool developed by the Atlas of Living Australia in collaboration with the Australian Museum. Originally called the Biodiversity Volunteer Portal, it was designed for the transcription of handwritten specimen labels, but is now also used to transcribe survey sheets and diaries.
Currently it’s also being used to identify animal selfies, animals caught on motion-sensitive cameras in NSW National Parks.
It was DigiVol’s flexibility that made it attractive to our project. We were able to create a custom template with a verbatim text field and a table for capturing our historic observation data – date, location, scientific name and common name – as well as a field for recording mentions of people and organisations.
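To make the shape of that template concrete, here is a minimal sketch of the data it captures for one page: the verbatim text, a table of observations, and the mentions of people and organisations. The class and field names are hypothetical and are not DigiVol's own; the example row is taken from the Graham Brown sightings shown on the slide.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Observation:
    """One row of the observation table (field names are illustrative)."""
    observed_on: date
    location: str
    common_name: str
    scientific_name: Optional[str] = None  # left empty if the diary gives none


@dataclass
class TranscribedPage:
    """One diary page: verbatim text plus the data drawn from it."""
    verbatim_text: str
    observations: List[Observation] = field(default_factory=list)
    people_and_organisations: List[str] = field(default_factory=list)


page = TranscribedPage(verbatim_text="(verbatim transcription goes here)")
page.observations.append(
    Observation(
        observed_on=date(1947, 9, 13),
        location="Colac, on the lake",
        common_name="Musk Duck",
    )
)
```

Keeping the verbatim text and the structured rows side by side is what lets the same transcription serve both historians and scientists.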
This was the first field diary we uploaded. In order to attract volunteers to our project, we wrote an introduction highlighting the significance of the author and his work. We included a quote from the diary, links to his biography and obituary, and a link to the tutorial we had written detailing exactly how the diary should be transcribed. This was the first time we’d ever done this. We had no idea what to expect. I uploaded this diary on a Friday afternoon. I had arranged to meet with our in-house volunteers the following week and, once they’d had a go and we’d ironed out any issues, I planned to promote the project in the hope of recruiting some online volunteers. However, by Monday morning the diary was already 30% transcribed. Existing volunteer transcribers already registered with DigiVol were racing through it.
As I looked through their work, I noticed that the volunteers had not just been transcribing, they had been communicating. They had asked questions, provided answers and shared ideas, and some had suggested improvements to my tutorial. An online community had already formed around our little project. And it was via this online community that I was able to find a volunteer willing and qualified to review the completed transcriptions, to ensure that all the data had been collected and that we had a consistent and accurate transcription.
Once the volunteers had finished transcribing the diary, we downloaded the transcribed text. This is what we wanted it to look like.
But a DigiVol export looks like this. The transcription is in the “occurrence remarks” column.
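As a rough sketch of how that column could be pulled out of an export programmatically, the snippet below reads a CSV and collects one column. The sample data is a placeholder, and the column heading `occurrenceRemarks` and the `taskId` column are assumptions; the actual headings should be checked in a real export file.

```python
import csv
import io

# Stand-in for an export file; real column names and layout may differ.
SAMPLE_EXPORT = (
    "taskId,occurrenceRemarks\n"
    '1,"placeholder text for page 1"\n'
    '2,"placeholder text for page 2"\n'
)


def read_transcriptions(csv_file, column="occurrenceRemarks"):
    """Return the transcription text from each row, in file order."""
    reader = csv.DictReader(csv_file)
    return [row[column] for row in reader]


pages = read_transcriptions(io.StringIO(SAMPLE_EXPORT))
print(len(pages))  # 2
```

With a real file, `open(path, newline="", encoding="utf-8")` would take the place of the `io.StringIO` stand-in.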
And when the exported transcription is copied into Word, it looks like this.
I produced a complex series of Macros that removed all the formatting tags, replaced them with real formatting and got rid of all the weird character strings.
…turning each page into a perfect match of the original. But this is by far the most time-consuming part of the project. I was quite proud of this until someone told me that Macros are very 1980s. They also said that it would take a 26-year-old programmer half an hour to fix the export so that it would spit this out every time. So, while we’re extraordinarily grateful to the Atlas of Living Australia and the Australian Museum for having such a wonderful tool available and for all the work they have done building such an amazing volunteer workforce, it is tricky to extract the work and turn it into a format suitable for display. And we hope to work with them to improve this part of the workflow.
Once we had a complete, readable transcript, we uploaded it into our collection database and linked it to the object records for the diaries. The diaries were now discoverable AND searchable within our internal systems, accessible by any staff member.
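For anyone wanting to script this clean-up rather than use Word macros, a sketch along the following lines might be a starting point. It is not the macro series described above. The bracketed tag syntax is hypothetical, since the export's real markup is not reproduced here, and this version only strips tags and decodes HTML entities rather than restoring formatting.

```python
import html
import re

# Hypothetical tag syntax: the export's real markup may look different.
TAG_PATTERN = re.compile(r"\[/?[a-zA-Z]+\]")


def clean_transcription(raw: str) -> str:
    """Strip bracketed formatting tags and decode HTML entities."""
    text = TAG_PATTERN.sub("", raw)
    text = html.unescape(text)           # "&amp;" becomes "&"
    text = re.sub(r"[ \t]+", " ", text)  # collapse runs of spaces and tabs
    return text.strip()


print(clean_transcription("[underline]SILVER GULLS[/underline]  (26.9.48)"))
# SILVER GULLS (26.9.48)
```

Restoring real formatting would mean mapping each tag to a style in the target format instead of deleting it.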
But in order for the diaries to be truly accessible, they needed to be online.
Items going into the Biodiversity Heritage Library must be passed through Macaw, the metadata collection tool produced by the Smithsonian.
Macaw allows users to input page-level metadata and create a complete digital version of the item that can be exported to other systems.
And we did the same with the transcriptions.
The completed items, along with their attached metadata, were then uploaded into the Internet Archive.
And harvested across into the Biodiversity Heritage Library.
Along with their transcriptions.
And, of course, our final step was to tell everyone.
You can read our historic field diaries online!
And if you can get other organisations to repost your blogs, all the better!
We uploaded our first field diary onto DigiVol’s transcription portal in November last year. We have now digitised 36 diaries and 18 of these have been completely transcribed. As news of our project spreads around our museum (and beyond), historic field diaries are coming out of the woodwork. These include uncatalogued diaries from our own museum as well as surprises from outside: after we started transcribing the diaries of Allan McEvey, one of his retired assistants made a donation containing photographs from the expeditions we had just transcribed as well as one of her own field diaries.
And the diaries? They have a new home. They are now part of our Scientific Art and Observation Collection. And they’re managed by a very happy curator. This is Rebecca Carland, the curator who originally found the box in the bird mount store and has been the driving force behind this project.
But what about the data?
Graham Brown’s five field diaries yielded 5611 animal sightings, complete with date and location data. These occurrence records have been delivered to our curators. Our next tasks are the taxonomic referencing of this data (matching the species names used in the diaries with the currently accepted scientific names) and the georeferencing of the locations.
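The name-matching part of that task can be pictured as a lookup from the name written in the diary to a current common and scientific name. The sketch below is illustrative only: the function names are ours, the three entries are examples, and any real mapping would need to be checked by a curator against an authoritative checklist.

```python
# Illustrative lookup only: verify entries against an authoritative
# checklist before using them for real records.
CURRENT_NAMES = {
    "mountain duck": ("Australian Shelduck", "Tadorna tadornoides"),
    "musk duck": ("Musk Duck", "Biziura lobata"),
    "silver gull": ("Silver Gull", "Chroicocephalus novaehollandiae"),
}


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so spacing variants match."""
    return " ".join(name.lower().split())


def match_name(diary_name: str):
    """Return (common name, scientific name), or None if unmatched."""
    return CURRENT_NAMES.get(normalise(diary_name))


print(match_name("Mountain Duck"))
# ('Australian Shelduck', 'Tadorna tadornoides')
print(match_name("Crested Grebes"))
# None: not in this small table, so it would be flagged for manual review
```

Returning `None` for unmatched names, rather than guessing, keeps ambiguous historic names in front of a person who can resolve them.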
Before I finish, I would like to say a final word about working with online volunteers.
The DigiVol administration, based at the Australian Museum, seeks to recognise and reward the hard work of its volunteers in a number of ways. These include appreciation awards, public honour boards, personal achievement pages, virtual reward badges, and “how you’re making a difference” reports.
However, in a recent DigiVol survey most volunteers stated that the honour board was not important to them. Rather, the strongest motivational drivers given for their volunteering on DigiVol were an interest in “natural history and cultural museum collections”, “doing something worthwhile” and “making a contribution in the field of biodiversity”. These drivers are certainly strong. DigiVol has just attracted its 1000th volunteer transcriber and their current recruitment rate is about fifty new volunteers per month. These volunteers are highly skilled, with a great deal of professional and life experience. Fifty-three percent are over fifty and forty-one percent have a postgraduate degree. In order to appeal to this growing volunteer workforce, the value of the work and the impact it will have on our current understanding of biodiversity cannot be overstated.
This project would not have been possible without The Biodiversity Heritage Library, Museum Victoria (Rebecca Carland, Hayley Webster, Karen Rowe, Cerise Howard & Jim Healey), the Atlas of Living Australia, The Australian Museum (Paul Flemons & Rhiannon Stephens), our BHL Volunteers (Bob Griffith, Heidi Griffith, Susan Halliwell, Jade Koekoe, Alan Nankervis & Tiziana Tizian), our volunteer validator (Erin Headon) and 70 DigiVol volunteers. Thank you!