Transcribing between the lines: crowd-sourcing historic data collection
Nicole Kearney
Museum Victoria
@nicolekearney
Dr Elycia Wallis
Museum Victoria
@elyw
Biodiversity Heritage Library (BHL)
The world’s largest online repository for biodiversity heritage and archival materials.
http://www.biodiversitylibrary.org
BHL-Australia
Total BHL-Au uploads
• 546 volumes
• 119 titles
• 140,252 pages
An average of 2,000 pages/month
BHL-Australia
The Naturalist's Miscellany, or Coloured figures of natural objects, Vol. 10, George Shaw, 1799.
The first published illustration of the Duck-billed Platypus
“Of all the Mammalia yet known it seems the most extraordinary…
…at first view, it naturally excites the idea of some deceptive preparation by artificial means.”
A synopsis of the Birds of Australia and the adjacent islands, John Gould, 1837.
What’s in the box?
Ornithology Department Archives
“Estate of Graham Brown – note books”
Catalogued in our Records & Archives database (TRIM)
Why are field diaries so important?
Field diaries are full of DATA
DATE: 26 September 1948
OBSERVATIONS
TIME: 8am
LOCATION:
Lake Corangamite
BEHAVIOUR: nesting
SILVER GULLS (26.9.48)
300 nests on 1 island
15 islands of similar size
Estimates 4500 nests
Nesting success
~ 1.5 eggs/nest
≈ 7000 new gulls this year from this locality
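The estimate on this diary page can be retraced as simple arithmetic. The sketch below uses only the figures shown on the slide; the variable names are illustrative and not part of the original diary.

```python
# Figures as shown on the slide for Silver Gulls, 26.9.48.
# Variable names are illustrative assumptions.
nests_per_island = 300   # nests counted on one island
islands = 15             # islands of similar size
eggs_per_nest = 1.5      # approximate nesting success

# Scale the single-island count up to all islands.
estimated_nests = nests_per_island * islands

# Multiply by eggs per nest to estimate new gulls for the year.
estimated_new_gulls = estimated_nests * eggs_per_nest

print(estimated_nests)      # 4500
print(estimated_new_gulls)  # 6750.0, which the slide rounds to about 7000
```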
Underutilised resource
Inaccessible in their current state
• single hard copy
• single location
• hand-written (in the field)
• historic scripts
• unsearchable
• uncatalogued
Our scientists need this data!
1931
2012 2014
Grampians National Park
Images: Heath Warwick & Nicole Kearney / Museum Victoria
A historic baseline for climate change research
?
Step 1:
create individual records
A record for every item
Step 2:
create digital versions
Digitisation & post processing
A digital version in our database
OCR from a page of Graham Brown’s diary
l>^v-^wAl^ livU*^/) Curiae
'^tila'* -u^vttcvi Lsefei cit^:<
Lv. 1^ Ol^Vm?iJcw , L>w i^-
^Otv^ dS^^iL* ll^^Uk^
M/tTM^li?'^
tvc4fi>r '^^-^ G^WtY^^
uve^v. llCCUvlr]^vvl^
'^L^>u^ l^t^
You can’t search handwriting
Step 3:
transcription
Step 3: select a transcription tool
How to attract online volunteers?
http://volunteer.ala.org.au/
Forums build an online community
http://volunteer.ala.org.au/
Ready for display?
17.
http://volunteer.ala.org.au/
DigiVol export
Extracted transcript in Word
http://volunteer.ala.org.au/
Converted & reformatted
http://volunteer.ala.org.au/
Ready for display!
Transcript in our database
Step 4:
make them accessible
Add the metadata
http://volunteer.ala.org.au/
Add the metadata
http://volunteer.ala.org.au/
Add the metadata
Upload into Internet Archive
https://archive.org/
Final destination: BHL
http://www.biodiversitylibrary.org
Along with the transcriptions!
http://www.biodiversitylibrary.org
Final step?
Tell everyone!
http://museumvictoria.com.au/about/mv-blog
http://blog.biodiversitylibrary.org/
Progress thus far…
http://volunteer.ala.org.au/
• 36 field diaries digitised
• 4 authors
• 18 diaries transcribed (2 per month)
• 4 diaries in BHL
• 70 crowd-sourced volunteers
New homes for our field diaries
… in our Scientific Art & Observation Collection.
But what about the data?
There’s a lot of data!
5 Graham Brown field diaries:
Date        Species                Location
09/09/1947  Red Wattle bird        Colac, near lake, in flowering gums
13/09/1947  Crested Grebes         Colac East, end of Church St, mouth of the creek
13/09/1947  Little Pied Cormorant  Colac, perched on the wreck
13/09/1947  Mountain Duck          Colac East, end of Church St, mouth of the creek
13/09/1947  Musk Duck              Colac, on the lake
13/09/1947  Silver Gull            Colac, over the lake, opposite Queen's Avenue
5611 animal sightings
547 mentions of people & organisations
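Once sightings are captured as structured rows, they can be tallied directly. The following is a minimal sketch, not the project's actual workflow: it takes the six sample rows shown on the slide and counts sightings per day. The tuple layout and variable names are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# The six sample rows from the slide, as (date, species, location).
rows = [
    ("09/09/1947", "Red Wattle bird", "Colac, near lake, in flowering gums"),
    ("13/09/1947", "Crested Grebes", "Colac East, end of Church St, mouth of the creek"),
    ("13/09/1947", "Little Pied Cormorant", "Colac, perched on the wreck"),
    ("13/09/1947", "Mountain Duck", "Colac East, end of Church St, mouth of the creek"),
    ("13/09/1947", "Musk Duck", "Colac, on the lake"),
    ("13/09/1947", "Silver Gull", "Colac, over the lake, opposite Queen's Avenue"),
]

# Dates are day/month/year; convert to ISO format so they sort correctly.
sightings_per_day = Counter(
    datetime.strptime(date, "%d/%m/%Y").date().isoformat()
    for date, _species, _location in rows
)

for day, count in sorted(sightings_per_day.items()):
    print(day, count)
```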
A final word about
online volunteers
Rewarding online volunteers
http://volunteer.ala.org.au/
Slide credit: Paul Flemons, DigiVol Volunteer Survey, April 2015
Thank you
Nicole Kearney
nkearney@museum.vic.gov.au
@nicolekearney
Dr Elycia Wallis
ewallis@museum.vic.gov.au
@elyw

Editor's notes

  1. I’m here today to talk to you about digitisation, transcription, and historic data collection. I am the Coordinator of the Australian component of the Biodiversity Heritage Library, a project led by Dr Ely Wallis.
  2. The Biodiversity Heritage Library is the largest online repository for biodiversity heritage and archival materials. Globally, the project is based at the Smithsonian in Washington DC, but there are nodes all over the world, and the Australian node is led by Museum Victoria.
  3. Australia has been contributing to the Biodiversity Heritage Library since 2010 and we have now digitised over 500 rare books and historic journals. This represents over 140,000 pages that used to be locked up in our library archives.
  4. These include “The Naturalist’s Miscellany” from 1799, which features the first published illustration of the Duck-billed Platypus.
  5. Our recent contributions include treasures such as this stunning “A Synopsis of the Birds of Australia and the adjacent islands” by John Gould.
  6. These beautiful historic publications are now available online.
  7. Anyone in the world can now turn these beautiful pages.
  8. The items we’ve contributed to the Biodiversity Heritage Library are now infinitely more accessible than they’ve ever been, but they’ve never truly been inaccessible.
  9. As published items, rare books and historic journals have records that are in publicly accessible library catalogues. If you had done a search for John Gould a year ago (before we put this digital copy online), you would have discovered the existence of this book in Museum Victoria’s collection. If you wanted to access it (and turn those beautiful pages), you would have had to come to Melbourne and make an appointment with our library staff, but the book was still findable and (to a certain extent) accessible.
  10. But I’m here today to talk about items in our collection that were neither discoverable nor accessible. This box was discovered by one of our history curators in our bird mount store. It was labelled “Estate of Graham Brown – note books”. Inside this box were 5 historic field diaries and 6 folders of sightings records – the meticulous observations of an eminent Victorian ornithologist. The box did have a digital record, but this record contained no more information than what’s written on this box. And, as part of archives, the record was in our internal records and archives database, which is not the database used to manage the items in our State Heritage Collection, the database used by our scientists and historians, the people who might be interested in the contents of this box.
  11. But why should we care about someone’s old diaries? Historic field diaries chronicle the scientific expeditions undertaken by our early naturalists to explore, research and discover the natural history of our world.
  12. They are filled with descriptions of new discoveries and frontiers.
  13. They mention people, places and events.
  14. But most importantly they are full of data – historic observations of animals, plants, fossils, weather conditions, habitat descriptions and past field collecting techniques.
  15. In diaries, historic observations are linked to the two key pieces of information that make an observation useful to science – the DATE the observation was made and the LOCATION of that observation. These are the observations made by Graham Brown on 26 September 1948 at Lake Corangamite. And in this entry Brown has even noted the time of his observations. Historic observations can provide invaluable insights into past species’ abundance and distribution. They can be used to plan future biological surveys and they can inform threatened species management.
  16. Field diaries are also filled with contextual information about biology and behaviour.
  17. To give you an idea about how rich this contextual information can be, I’m going to take you through this one page.
  18. But, despite the wealth of information they contain, field diaries are a hugely underutilised resource. Field diaries usually only exist as a single hard copy, stored in a single location. And handwritten in the field, in historic scripts, they can be very hard to read. And as hand-written documents, they’re unsearchable. And most field diaries in museum collections are uncatalogued, or if they are catalogued their electronic records contain insufficient information for researchers to be able to find them.
  19. Graham Brown made frequent expeditions to locations of great conservation interest to our museum, including the Grampians National Park. These are bird observations made by Graham Brown in 1931. These are Museum Victoria scientists surveying the same area in 2012 and again in 2014. But if our scientists were to get access to this historic data, we needed to make the diaries more accessible.
  20. Historic occurrence records are now more important than ever, as they can provide a critical baseline for climate change studies. And many scientists are desperate for this data. The authors of this paper include Drs Kevin and Karen Rowe, scientists in Museum Victoria’s ornithology department. This is a study they conducted using data from historic field diaries. And here are two more. These scientists read the original handwritten diaries and manually extracted the data. They have told me what an arduous task this was and how much easier their work – this critical research – would be if this data was more accessible. And Karen Rowe hopes to undertake a similar study in our Grampians National Park, using Graham Brown’s field diaries.
  21. We need to know what’s in this box!
  22. The first step was to create a record for each item in the box in EMu, our collection management database, the database used by the historians and scientists in our museum. The diary records could now be linked to items, photographs and specimens in our collection, as well as to biography records for the author and other people mentioned in the diaries.
  23. The second step was to digitise the diaries.
  24. We created high quality scans of each page, work that was done by our in-house Biodiversity Heritage Library volunteers.
  25. We then uploaded these images to the collection database, creating a digitised version of each diary.
  26. But while our curators would now be able to find the diaries, their contents were still inaccessible. This is the OCR output for this handwritten page. In order to unlock their contents and extract the historic data, we needed to transcribe them.
  27. Step 3: transcription
  28. Many organisations have developed tools to transcribe handwritten material and this work is being done online by crowd-sourced volunteers. Many, like the Smithsonian Transcription Center, consist of an image of the original and a free-text box for the volunteers to transcribe into.
  29. Some, like Transcribe Bentham, are a little more sophisticated and provide formatting tools for volunteers to mark up their transcribed text.
  30. And some transcription projects are not interested in a verbatim transcription. Rather, they focus on capturing specific data. In this war diary, volunteers flag the position of key information within the text and transcribe only that text – people’s names, dates and locations.
  31. Or in the case of a specimen label – species names, dates and locations. However, as our field diaries were of interest to both historians and scientists, we needed both a verbatim transcript and the data.
  32. Some of you may be familiar with DigiVol, the online transcription tool developed by the Atlas of Living Australia in collaboration with the Australian Museum. Originally called the Biodiversity Volunteer Portal, it was designed for the transcription of handwritten specimen labels, but is now also used to transcribe survey sheets and diaries.
  33. Currently it’s also being used to identify animal selfies, animals caught on motion-sensitive cameras in NSW National Parks.
  34. It was DigiVol’s flexibility that made it attractive to our project. We were able to create a custom template with a verbatim text field and a table for capturing our historic observation data – date, location, scientific name and common name – as well as a field for recording mentions of people and organisations.
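As a rough illustration of the template described above, one transcribed page could be modelled as a verbatim text field plus a table of observations and a list of mentions. This is only a sketch: the class and field names are assumptions made for this example, not DigiVol's actual schema, and the sample values come from the diary entry shown earlier in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """One row of the observation table (field names are assumptions)."""
    date: str
    location: str
    scientific_name: str
    common_name: str


@dataclass
class PageTranscription:
    """One transcribed diary page: verbatim text plus the extracted data."""
    verbatim_text: str
    observations: List[Observation] = field(default_factory=list)
    mentions: List[str] = field(default_factory=list)  # people and organisations


# Sample values taken from the 26 September 1948 entry shown earlier;
# the scientific name is left blank because the entry does not give one.
page = PageTranscription(
    verbatim_text="(verbatim transcript of the page goes here)",
    observations=[
        Observation(
            date="1948-09-26",
            location="Lake Corangamite",
            scientific_name="",
            common_name="Silver Gull",
        )
    ],
)
```

Keeping the verbatim text and the structured rows side by side reflects the point that historians want the transcript while scientists want the data.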
  35. This was the first field diary we uploaded. In order to attract volunteers to our project, we wrote an introduction highlighting the significance of the author and his work. We included a quote from the diary, links to his biography and obituary, and a link to the tutorial we had written detailing exactly how the diary should be transcribed. This was the first time we’d ever done this. We had no idea what to expect. I uploaded this diary on a Friday afternoon. I had arranged to meet with our in-house volunteers the following week and, once they’d had a go and we’d ironed out any issues, I planned to promote the project in the hope of recruiting some online volunteers. However, by Monday morning the diary was already 30% transcribed. Existing volunteer transcribers already registered with DigiVol were racing through it.
  36. As I looked through their work, I noticed that the volunteers had not just been transcribing, they had been communicating. They had asked questions, provided answers and shared ideas, and some had suggested improvements to my tutorial. An online community had already formed around our little project. And it was via this online community that I was able to find a volunteer willing and qualified to review the completed transcriptions, to ensure that all the data had been collected and that we had a consistent and accurate transcription.
  37. Once the volunteers had finished transcribing the diary, we downloaded the transcribed text. This is what we wanted it to look like.
  38. But a DigiVol export looks like this. The transcription is in the “occurrence remarks” column.
  39. And when the exported transcription is copied into Word, it looks like this.
  40. I produced a complex series of Macros that removed all the formatting tags, replaced them with real formatting and got rid of all the weird character strings.
  41. …turning each page into a perfect match of the original. But this is by far the most time-consuming part of the project. I was quite proud of this until someone told me that Macros are very 1980s. They also said that it would take a 26-year-old programmer half an hour to fix the export so that it would spit this out every time. So, while we’re extraordinarily grateful to the Atlas of Living Australia and the Australian Museum for having such a wonderful tool available and for all the work they have done building such an amazing volunteer workforce, it is tricky to extract the work and turn it into a format suitable for display. And we hope to work with them to improve this part of the workflow.
  42. Once we had a complete, readable transcript, we uploaded it into our collection database and linked it to the object records for the diaries. The diaries were now discoverable AND searchable within our internal systems, accessible by any staff member.
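The kind of clean-up described above could also be scripted rather than done with Macros. The sketch below is a minimal example under stated assumptions: it assumes the export is a CSV file, that the transcription column is headed `occurrenceRemarks`, and that the unwanted markup looks like simple HTML-style tags and entities. The talk does not specify the actual tags or character strings, so `TAG_REPLACEMENTS` is a placeholder, and this version strips formatting tags rather than converting them to real formatting.

```python
import csv
import io
import re

# Hypothetical markup: the actual tags and stray character strings in the
# export are not specified in the talk, so these entries are placeholders.
TAG_REPLACEMENTS = {
    "<br>": "\n",
    "&amp;": "&",
}
LEFTOVER_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def clean_remarks(text: str) -> str:
    """Replace known tags, drop any remaining tags, and tidy whitespace."""
    for tag, replacement in TAG_REPLACEMENTS.items():
        text = text.replace(tag, replacement)
    text = LEFTOVER_TAG.sub("", text)
    lines = [line.strip() for line in text.split("\n")]
    return "\n".join(lines).strip()


def transcripts_from_export(csv_text: str, column: str = "occurrenceRemarks") -> list:
    """Return the cleaned transcription for each row of the export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [clean_remarks(row[column]) for row in reader]


# A made-up two-line export, using text from the diary page shown earlier.
sample = 'page,occurrenceRemarks\n1,"<b>SILVER GULLS</b><br>300 nests on 1 island"\n'
pages = transcripts_from_export(sample)
```

A script like this can be rerun on every export, which is the repeatability the Macro approach lacked.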
  43. But in order for the diaries to be truly accessible, they needed to be online.
  44. Items going into the Biodiversity Heritage Library must be passed through Macaw, the metadata collection tool produced by the Smithsonian.
  45. Macaw allows users to input page-level metadata and create a complete digital version of the item that can be exported to other systems.
  46. And we did the same with the transcriptions.
  47. The completed items, along with their attached metadata, were then uploaded into the Internet Archive.
  48. And harvested across into the Biodiversity Heritage Library.
  49. Along with their transcriptions.
  50. And, of course, our final step was to tell everyone.
  51. You can read our historic field diaries online!
  52. And if you can get other organisations to repost your blogs, all the better!
  53. We uploaded our first field diary onto DigiVol’s transcription portal in November last year. We have now digitised 36 diaries and 18 of these have been completely transcribed. As news of our project spreads around our museum (and beyond), historic field diaries are coming out of the woodwork. These include uncatalogued diaries from our own museum as well as surprises from outside: after we started transcribing the diaries of Allan McEvey, one of his retired assistants made a donation containing photographs from the expeditions we had just transcribed as well as one of her own field diaries.
  54. And the diaries? They have a new home. They are now part of our Scientific Art and Observation Collection. And they’re managed by a very happy curator. This is Rebecca Carland, the curator who originally found the box in the bird mount store and has been the driving force behind this project.
  55. But what about the data?
  56. Graham Brown’s five field diaries yielded 5611 animal sightings, complete with date and location data. These occurrence records have been delivered to our curators. Our next tasks are the taxonomic referencing of this data (matching the species names used in the diaries with the currently accepted scientific names) and the georeferencing of the locations.
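The name-matching part of that task could be sketched as a simple lookup from the names written in the diaries to currently accepted names. This is a minimal sketch, not the museum's method: the function names are invented for this example, and the single lookup entry is illustrative only. A real project would draw on an authoritative checklist and have a specialist review every unmatched name.

```python
from typing import Dict, List, Tuple

# Illustrative entry only; accepted names should be checked against an
# authoritative checklist before use.
ACCEPTED_NAMES: Dict[str, str] = {
    "larus novaehollandiae": "Chroicocephalus novaehollandiae",
}


def normalise(name: str) -> str:
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def match_names(
    diary_names: List[str], lookup: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Split diary names into those with an accepted name and those without."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for name in diary_names:
        accepted = lookup.get(normalise(name))
        if accepted is None:
            unmatched.append(name)  # left for manual review
        else:
            matched[name] = accepted
    return matched, unmatched


matched, unmatched = match_names(
    ["Larus  novaehollandiae", "Unknown name"], ACCEPTED_NAMES
)
```

Returning the unmatched names separately keeps the automatic step honest: anything the lookup cannot resolve goes to a person rather than being guessed.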
  57. Before I finish, I would like to say a final word about working with online volunteers.
  58. The DigiVol administration, based at the Australian Museum, seeks to recognise and reward the hard work of its volunteers in a number of ways. These include appreciation awards, public honour boards, personal achievement pages, virtual reward badges, and “how you’re making a difference” reports.
  59. However, in a recent DigiVol survey most volunteers stated that the honour board was not important to them. Rather, the strongest motivational drivers given for their volunteering on DigiVol were an interest in “natural history and cultural museum collections”, “doing something worthwhile” and “making a contribution in the field of biodiversity”. These drivers are certainly strong. DigiVol has just attracted its 1000th volunteer transcriber and their current recruitment rate is about fifty new volunteers per month. These volunteers are highly skilled, with a great deal of professional and life experience. Fifty-three percent are over fifty and forty-one percent have a postgraduate degree. In order to appeal to this growing volunteer workforce, the value of the work and the impact it will have on our current understanding of biodiversity must not be understated.
  60. This project would not have been possible without The Biodiversity Heritage Library, Museum Victoria (Rebecca Carland, Hayley Webster, Karen Rowe, Cerise Howard & Jim Healey), the Atlas of Living Australia, The Australian Museum (Paul Flemons & Rhiannon Stephens), our BHL Volunteers (Bob Griffith, Heidi Griffith, Susan Halliwell, Jade Koekoe, Alan Nankervis & Tiziana Tizian), our volunteer validator (Erin Headon) and 70 DigiVol volunteers. Thank you!