Ben Brumfield's presentation on crowdsourcing transcription tools at the Midwest Archives Conference Fall Symposium 2013. It discusses factors for choosing a crowdsourcing tool, with screenshots and analysis of Scripto, the Bentham Transcription Desk, the NARA Transcribr Drupal module, and Zooniverse's Scribe, followed by live demos of the hosted tools Virtual Transcription Laboratory, Wikisource.org, and FromThePage.com.
2. Why Transcribe?
Crowdsourcing can be
− Tagging
− Georectification
− Identification
But if you've got scanned documents, you've got
a problem
5. Serendipity: One Volunteer's Story
Nat Wooding
– Semi-retired data analyst
– 200 pages of Julia Brumfield's 1923 diary in nine
months
– No relation to diarist
– Great-uncle was the diarist's letter carrier, also
named Nat Wooding
10. Free as in puppy!
http://www.flickr.com/photos/magnusbrath/7614518858/
11. Why Crowdsource?
“At its best, crowdsourcing is not about
getting someone to do work for you, it is
about offering your users the
opportunity to participate in public
memory.”
– Trevor Owens, “Crowdsourcing Cultural Heritage:
The Objectives are Upside-down”
14. Why Crowdsource?
“By engaging the public in digitising our
collections, we are
− Increasing the scientific literacy of the public
− Providing increased access to our collections
− Building an advocacy network for our collections
and our institutions.”
– Paul Flemons, Australian Museum
18. Choosing a Transcription Platform
The good news:
– More than 30 tools to choose from!
The bad news:
– More than 30 tools to choose from!
19. Selection Factors
● Source Material
● Transcript Purpose
● Organizational/Project Management Fit
● Financial and Technical Resources
20. Source Material
● Is it of interest to anyone else?
● Is it under copyright?
● Does it need restricted access?
● Is it composed of “text” or “records”?
● How complex is the layout? How
important is that layout?
21. Purpose
• How will you be using the transcribed data?
– Traditional print editions
– Searchable online editions
• Do you want to use the system to analyze
the text?
• Do you need to import the transcripts into
other systems?
• Is public engagement the only goal?
22. Organizational Fit
• How important is traditional editorial
workflow?
• Will you rely on volunteers? How will you
find and motivate them?
• What is the duration of the project?
• Is there a "final version"?
• Is TEI a mandate?
23. Financial and Technical Resources
• System administrators to install non-hosted
software?
• Money to pay hosting costs?
• Programming skills to customize a tool?
• Money to pay programmers for
customization?
• Support for on-going costs to keep the site
running, however small?
24. The Tools
● Recent (oldest started in 2005)
● Influenced by origin
● Still pretty raw
● Most require tech expertise for set-up and
customization
● All require making trade-offs
http://tinyurl.com/TranscriptionToolGDoc
26. Quick Definitions
MediaWiki: Popular software framework for
running wiki projects
Wikipedia, Wikisource, Wiktionary, Wikitravel:
Projects running on MediaWiki
Wikimedia: Organization running many, but not
all, MediaWiki-based wiki projects.
37. Wikisource
Live demo of State Library of Queensland on
Wikisource showing project page, edit screen,
and editorial workflow.
Recommendation of Lori and the GLAMWiki
group to help organizations navigate the
community.
38. FromThePage
Live demo of FromThePage showing edit
screen, wiki-linking a single term, read pages
for a subject, full-text search on name variants,
and auto-link.
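The wiki-linking shown in the demo can be illustrated with a minimal sketch. It assumes MediaWiki-style double-bracket links of the form [[Canonical Subject|text as written]], and shows how such links could support both a "pages for a subject" view and a list of name variants. The function name, page identifiers, and sample text are hypothetical and invented for illustration; this is not FromThePage's actual implementation.

```python
import re
from collections import defaultdict

# Assumed link syntax: [[Canonical Subject|text as written]] or [[Subject]].
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def index_subjects(pages):
    """Map each canonical subject to the pages and spellings that mention it.

    `pages` is a dict of page identifier -> transcript text (hypothetical shape).
    """
    index = defaultdict(lambda: {"pages": [], "variants": set()})
    for page_id, text in pages.items():
        for match in WIKI_LINK.finditer(text):
            subject = match.group(1).strip()
            # With no "|text" part, the verbatim text is the subject itself.
            variant = (match.group(2) or subject).strip()
            entry = index[subject]
            if page_id not in entry["pages"]:
                entry["pages"].append(page_id)
            entry["variants"].add(variant)
    return dict(index)


# Invented sample transcripts, not taken from any real diary.
sample_pages = {
    "page-1": "Went to see [[Jane Doe|Jane]] about the mail.",
    "page-2": "[[Jane Doe|Mrs. Doe]] brought a letter from [[Exampleville]].",
}
subject_index = index_subjects(sample_pages)
```

Looking up subject_index["Jane Doe"] then gives every page that mentions her along with each spelling a transcriber linked, which is the kind of data a name-variant search or an auto-link feature could draw on.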