SlideShare une entreprise Scribd logo
1  sur  97
Télécharger pour lire hors ligne
What motivates library
crowdsourcing volunteers?
Experiences at the California Digital Newspaper
Collection and the
Cambridge Public Library Newspapers Collection
Brian Geiger
Director, Center for Bibliographical Studies and Research
California Digital Newspaper Collection
Frederick Zarndt
Chair, IFLA Newspapers Section
Photo held by John Oxley Library, State Library of Queensland. Original from
Courier-mail, Brisbane, Queensland, Australia.
1. crowds, properties of
2.crowdsourcing, short history of
3.crowdsourcing and libraries, especially
digital newspapers collections
4.crowdsourcing, motivations for
5.crowdsourcing, interesting fact
6.crowdsourcing, website traffic
7. crowdsourcing, accuracy of
8.crowdsourcing, economic benefits of
9.crowdsourcing, other benefits of
Crowd
Properties
In it he says ...
In 2004 James Surowiecki published ...
The Wisdom of Crowds: Why
the Many Are Smarter Than
the Few and How Collective
Wisdom Shapes Business,
Economies, Societies and
Nations
... a crowd of persons that are
diverse ...
... independent ...
... and
decentralized ...
usually make
better
judgements or
decisions than
single persons
“Country Fair” by Grandma Moses. Original painting 1950.
wisdom of crowds
in summary
James Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective
Wisdom Shapes Business, Economies, Societies and Nations, Anchor Books, New York, 2005.
Diversity
Each person should have private information
even if it's just an eccentric interpretation of the
known facts.
Independence
People's opinions aren't determined by the
opinions of those around them.
Decentralization
People are able to specialize and draw on local
knowledge.
Aggregation
Some mechanism exists for turning private
judgments into a collective decision.
Crowdsourcing
Short history
“crowdsourcing”
was coined by Jeff Howe in “The rise of
crowdsourcing” published in Wired
magazine June 2006.
web trends for
“crowdsourcing”
Jan-2006 to Jan-2013
• On the date of publication of Jeff Howe’s Wired
magazine article, 1-Jun-2007, Wikipedia did not have
an entry (list) of crowdsourcing projects*.
• On 25-Jan-2010 Wikipedia’s list of crowdsourcing
projects had 35 entries*.
• On 17-Mar -2013 Wikipedia’s list of crowdsourcing
projects had 158 entries+.
* From Internet Archives’ Wayback Machine.
+ Wikipedia contributors, "List of crowdsourcing projects," Wikipedia, The Free Encyclopedia, https://
en.wikipedia.org/wiki/List_of_crowdsourcing_projects (accessed March 17, 2013).
Crowdsourcing is the practice of obtaining
needed services, ideas, or content by
soliciting contributions from a large group of
people, and especially from an online
community, rather than from traditional
employees or suppliers. ... [It] is different
from ordinary outsourcing since it is a task or
problem that is outsourced to an undefined
public rather than a specific, named group.
Wikipedia contributors, "Crowdsourcing," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/
Crowdsourcing (accessed March 17, 2013)
Amazon Mechanical Turk was launched Nov 2005
Alexa global / USA rank of Amazon Mechanical Turk (Jun 2013): 8,242 / 3,060
Alexa reputation (Jun 2013): 1.344
crowdsourcing
CAPTCHA is a Completely Automated Public Turing test to tell Computers and Human
Apart
Started as a computer science project at Carnegie Mellon
Each day 200,000,000 recaptcha’s are solved by humans around the world
Open Street Map was 1st launched in 2007
Alexa global / Belarus / USA traffic rank of Open Street Map (Jun 2013): 8,853 / 1,373 / 20,114
Alexa reputation (Jun 2013): 28,368
Zooniverse was 1st launched July 2007
Alexa global / USA traffic rank of Zooniverse (Jun 2013): 275,960 / 153,853
Alexa reputation (Jun 2013): 502
Registered users (Jun 2013): 848,140
citizen science
Kickstarter was launched in 2009
Alexa global / USA traffic rank of Kickstarter (Jun 2013): 798 / 361
Alexa reputation (Jun 2013): 85,591
43,000+ projects successfully funded with more than USD $669,000,000
crowdfunding
crowdcollaboration
• Began 2001
• Now in 285 languages
• 40 wikipedia
languages with more
than 100,000 articles
• 112 wikipedia
languages with more
than 10,000 articles
• 400,000,000 unique
visitors per month
• 3,900,000+ articles in English, 1,400,000+ in German, 1,250,000+ in French, 1,050,000 in
Dutch
• 85,000 active contributors / more than 300,000 have edited Wikipedia more than 10 times
• Alexa global / USA traffic rank (Jun 2013): 7 / 8
• Alexa reputation (Jun 2013): 2,102,569
Wikipedia contributors, "List of crowdsourcing projects," Wikipedia, The Free Encyclopedia, http://
en.wikipedia.org/wiki/List_of_crowdsourcing_projectss (accessed Jun 2013).
Crowdsourcing
and Libraries
20130630 What motivates library crowdsourcing volunteers? [ALA LITA]
20130630 What motivates library crowdsourcing volunteers? [ALA LITA]
Family Search Indexing was 1st launched (beta) 2004
Alexa global / USA traffic rank of FamilySearch (Jun 2013): 4,664 / 1,373
Alexa reputation (Jun 2013): 8,140
• Started (beta) 2004
• More than 780,000 worldwide registered volunteers
from ~25 countries index records relevant to family
history
• Approximately 100,000 active volunteers each month
• UI in Chinese, English, German, French, Italian,
Japanese, Korean, Portuguese, and Russian
• Blind double-key entry with arbitration / reconciliation
• More than 1,500,088,741 records indexed (July 2012)
• Accuracy typically > 99.95%
Project Gutenberg was 1st launched Dec 1971
Alexa global traffic rank of Project Gutenberg (Jun 2013): 6,709
Alexa reputation (Jun 2013): 31,190
• Started Dec 1971
• Worldwide volunteers transcribe or proofread OCR’d
public domain books through Distributed Proofreaders
• 40,000 books completed (July 2012)
• Partner / affiliated projects for Australia, Canada,
Europe, Germany, Luxembourg, Philippines, Runeberg
(Nordic literature), Russia, Taiwan
Alexa global / Australia traffic rank of National Library of Australia (June 2013): 15,435 / 411
Trove gets ~77% of all National Library web traffic.
Alexa reputation (Jun 2013): 8,788
National Library of
Australia
• Online since 2008
• More than 9,830,000 / 96,000,000 newspaper
pages / articles (May 2013)
• Top text corrector 1,911,000+ lines (May 2013)
• 2,661,140 lines corrected each month (average for
1st 5 months 2013)
• 96,304,337 lines corrected as of May 2013, up from
66,527,535 lines corrected May 2012
• 95,299 / 8,519 registered / active users (May 2013)
Alexa global / country traffic rank of National Library of Finland
2,535,854 (31-Oct-2012) / 199 (2-Apr-2012)
National Library of
Finland
• Digitalkoot is a project to improve OCR text in
digitized newspapers -- by playing games!
• Digitalkoot is a collaboration between the National
Library and Microtask
• Players correct OCR text by playing Myyräsillassa
(Mole Bridge) or Myyräjahdissa (Mole Hunt)
• National Library has 4,000,000+ digitized pages
• 109,321 registered players (October 2012)
• Since February 2011 8,024,530 micro-tasks have
been completed
Alexa global / USA traffic rank of UC Riverside (June 2013): 12,584 / 4,382
CDNC gets ~1.6% of all UC Riverside web traffic.
California Digital
Newspaper Collection
• CDNC began digitizing newspapers in 2005 as
part of NDNP
• Newspapers digitized to article-level as well as
to page-level as required by NDNP
• Hosted on Veridian beginning 2009
• Collection size 61,351 issues, 544,474 pages,
6,327,491 articles (May 2013)
OCR text correction
• OCR text correction added Aug 2011
• Corrections are done line by line
• 1340 registered / 599 active users (Jun 2013)
• ~1,160,465+ lines of text corrected (Jun 2013)
• ~1.1% of the collection corrected, 98.9% to go!
• Top corrector 379,573 lines > 2x 2nd corrector
20130630 What motivates library crowdsourcing volunteers? [ALA LITA]
Cambridge Public Library
Historic Newspaper Collection
• Cambridge Historic Newspapers online since Jan 2012.
• Cambridge Massachusetts Public Library digitized local
newspapers (http://cambridge.dlconsulting.com/)
• Newspapers digitized to article-level
• Collection size 6,346 issues, 59,070 pages, 669,406
articles (May 2013)
• Collection includes 13,099 obituary cards
Crowdsourcing
Motivation
Cognitive surplus
... people are learning to use their free time for creative
activities rather than consumptive ones [such as watching
TV] ...
... the total human cognitive effort in creating all of
Wikipedia in every language is about one hundred million
hours ...
... Americans alone watch two hundred billion hours of TV
every year, or enough time, if it would be devoted to projects
similar to Wikipedia, to create about 2000 of them ...
Clay Shirky. Cognitive surplus: Creativity and generosity in a connected age. Penguin Press. New York. 2010.
User Lines corrected
1 379,578
2 136,637
3 56,996
4 52,093
5 46,102
6 39,617
7 35,878
8 33,204
9 31,319
10 30,408
Lines corrected User
1,456,906 1
1,385,369 2
1,010,360 3
960,230 4
847,340 5
786,147 6
657,187 7
600,513 8
582,276 9
565,384 10
20130630 What motivates library crowdsourcing volunteers? [ALA LITA]
Motivation
genealogists and family
historians
• National Library of Australia’s 2012 Trove
status report showed that ~50% of Trove users
are family historians
• National Library of New Zealand survey found
that ~50% of PapersPast users are genealogists
• A 2012 Utah Digital Newspapers survey
showed that 72% of its users are genealogists*
PAPERSPAST
*John Herbert and Randy Olsen. “Small town papers: Still delivering the news”. Paper given
at 2012 World Library and Information Congress. Helsinki. August 2012.
User survey
• CDNC and Cambridge Public Library published a user
survey in Mar 2013 (it’s still online)
• 604 / 32 responses
• surveys are (mostly) identical except for organization
name
Survey is on home pages at http://cdnc.ucr.edu and http://cambridge.dlconsulting.com/
User demographic
genealogists and family historians
User demographic
no spring chickensX
User demographic
reasons for use
User demographic
types of information
“I enjoy the correction - it’s a great way to learn more about
past history and things of interest whilst doing a ‘service to
the community’ by correcting text for the benefit of others.”
“I have recently retired from IT and thought that I could be
of some assistance to the project. It benefits me and other
people. It helps with family research.”
From Rose Holley in “Many Hands Make Light Work.” National Library of Australia March 2009.
Motivation
Trove users’ report
“I am interested in all kinds of history. I have pursued genealogy
as a hobby for many years. I correct text at CDNC because I see
it as a constructive way to contribute to a worthwhile project.
Because I am interested in history, I enjoy it.”
Wesley, California
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
“I only correct the text on articles of local interest - nothing at
state, national or international level, no advertisements, etc. 
The objective is to be able to help researchers to locate local
people, places, organizations and events using the on-line
search at CDNC.  I correct local news & gossip, personal items,
real estate transactions, superior court proceedings, county and
local board of supervisors meetings, obituaries, birth notices,
marriages, yachting news, etc.”
Ann, California
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
“I am correcting text for the Coronado Tent City Program for
1903.  It is important to correct any problems with personal
names and other information so that researchers will be able
to search by keyword and be assured of retrieving desired
results. ... type fonts cause a great deal of difficulty in
digitizing the text and can cause problems for searchers.  Also,
many of the guests' names at Tent City and Hotel Del
Coronado were taken from the registration books and reported
in the Program.  This led to many problems in spelling of last
names and the editors were not careful to be consistent in the
spellings.  This Program is an important resource since it
provides an excellent picture of daily life in Tent City and
captures much of the history of Coronado itself.”
Gene, California
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
“I have always been interested in history, especially the
development of the American West, and nothing brings it alive
better than newspapers of the time. I believe them to be an
invaluable source of knowledge for us and future generations.”
David, United Kingdom
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
CDNC is an excellent source of information matching my
personal interest in such topics as sea history, development
of shipbuilding, clippers and other ships etc. ...
Unfortunately, the quality of text ... is rather poor I’m
afraid. This is why I started to do all corrections necessary
for myself ... and to leave the corrected text for use of
others. .... I am not doing this very regularly as this is just
my hobby and pleasure.
Jerzey, Poland
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
As an amateur historical researcher my time for research is very
limited.  Making time to travel to archives, libraries, and historical
societies does not happen as often as I would like.  The Cambridge
Public Library’s online newspaper collection has been an invaluable
resource and it is fun.  I am very grateful for all the help I have received
over the years from so many research organizations. Correcting text
has several benefits.  It makes it much more likely that I will find a
story if I decide to search for it in the future.  It is a way of saying
‘thank you’ to the Cambridge Library for having such a great resource
available and maybe I can make the next person’s research a little
easier. It is my own little historical preservation project.
Daniel, Massachusetts
Personal communications with Cambridge text correctors.
Motivation
Cambridge users’ report
Graphic from Kaufmann et al. “More than fun and money. Worker Motivation
in Crowdsourcing – A Study on Mechanical Turk.”
Motivation
Crowdsourcing
Interesting fact
the long tail* of crowdsourced
OCR text correction
a probability distribution has a long tail if a larger share
of the population rests within its tail than would be the
case if the population were normally distributed.
the most productive users represent a small fraction of
the total user population and complete ~50% of total
production.
said a different way, less productive users represent the
largest fraction of the user population and also complete
~50% of the total production.
The phrase “long tail” was popularized by Chris Anderson in the October 2004 Wired magazine article The Long Tail
and by Clay Shirky’s February 2003 essay “Power laws, web logs, and inequality”.
OCR text correction long tails
0
75000
150000
225000
300000
CDNC lines corrected by text corrector
0
750,000
1,500,000
2,250,000
3,000,000
NLA lines corrected by text corrector
top corrector 242,965 top corrector 1,456,906
50%
50%
50%
50%
Statistics from Oct 2012
Website traffic
Website traffic
After a crowdsourcing transcription project of diaries from the
American War Between the States, Nicole Saylor, Head of Digital
Library Services at the University of Iowa Libraries, reported
“On June 9, 2011, we went from about 1000
daily hits to our digital library on a really good
day to more than 70,000.”
Nicole Saylor interviewed by Trevor Owens. “Crowdsourcing the Civil War: Insights Interview with Nicole Saylor” blog post
at http://blogs.loc.gov/digitalpreservation/2011/12/crowdsourcing-the-civil-war-insights-interview-with-nicole-saylor/.
Dec 6, 2011.
Website traffic
Website traffic at CDNC before / after implementing
crowdsourcing
before crowdsourcing
11-Jun-2011 / 12-Jul-2011
after crowdsourcing
11-Jun-2012 / 12-Jul-2012
change
visits 17,485 21,488 +22.9%
unique visitors 11,381 13,376 +17.5%
visit duration 9m 24s 11m 7s +18.3%
bounce rate 51.3% 44.5% -6.8%
pages per visit 14.9 11.7 -21.5%
Website traffic
Accuracy
“His Accuracy Depends on Ours!"
Office for Emergency Management. Office of War
Information. Domestic Operations Branch. Bureau of
Special Services. [Photo held at US National Archives and
Records Administration]
Why correct OCR text?
Here’s why ...
Deaths. lln»rieff, Esq. of <c .. Qn.
Sunday, the till. greatly Drandrellt, of
Orms4irJi.- ~ ; ;✓ ' • * On ijfr r inn
l j j j i l F i i j ' 1 1 f H a v o d i v y d ,
Carnarvonshire, S ; **" *- ' « ' March
Oxford, F. Tfovmeud, Uerald. » • V .
•On Tncsdav last, Mr. Charles.
IWilinson, this 8 ; had vf thesis#,, a
week ago, which tcrminate<i'iu his
death. . / ' ■ O'i Sunday, dJst nit. at.
AsbtCnvHall, mar Lancaster,
Mr.,Geo. Worn ick, many years
house'steward hit late Once The
Hamilton and Brandon. He locked
himself h»oWn'r«wte<: soon. twelve
o'clock" that dny, and fii»-d a loaded
pistol "through Ins bead, 1 which
instantaneously killed him. Coronet's
Verdict, shot himself in a temporary fit of
Friday week,
raw OCR text
Excerpt from The British Newspaper Archive, Chester Courant, Tuesday 6-Apr-1819, page 3.
newspaper image
Accuracy
• Edwin Kiljin (Koninklijke Bibliotheek the Netherlands)
reports raw OCR character accuracies of 68% for early 20th
century newspapers
• Rose Holley (National Library of Australia) reports raw
OCR character accuracy varied from 71% to 98% on a
sample Trove digitized newspapers
Rose Holley. “How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation
programs. D-Lib Magazine. March/April 2009.
Edwin Kiljin. “The current state-of-art in newspaper digitization.” D-Lib Magazine. January/February 2008.
Public domain graphic courtesy of Wikimedia Commons.
Accuracy
MAPPING TEXTS* assesses digitization quality of
digital newspapers by comparing the number of words
recognized to the total number of words scanned
* Mapping texts is a collaboration between the University of North Texas and Stanford University aimed at experimenting
with new methods for finding and analyzing meaningful patterns embedded in massive collections of digital newspapers.
uncorrected OCR accuracy by
newspaper title
Title
OCR character
accuracy
~OCR word
accuracy*
PRP Pacific Rural Press 1871 - 1922 92.6% 68.1%
SFC San Francisco Call 1890 - 1913 92.6% 68.1%
LAH Los Angeles Herald 1873 - 1910 88.7% 54.9%
LH Livermore Herald 1877 - 1899 88.6% 54.6%
DAC Daily Alta California 1841 - 1891 88.2% 53.4%
CFJ California Farmer and Journal
of Useful Sciences 1855 - 1880
86.5% 48.4%
SN Sausalito News 1885 - 1922 70.4% 17.3%
*Word accuracy assumes average word length is 5 characters
OCR accuracy by newspaper title
Title
OCR character
accuracy
Corrected
accuracy
PRP Pacific Rural Press 1871 - 1922 92.6% 99.3%
SFC San Francisco Call 1890 - 1913 92.6% 99.6%
LAH Los Angeles Herald 1873 - 1910 88.7% 99.1%
LH Livermore Herald 1877 - 1899 88.6% 99.9%
DAC Daily Alta California 1841 - 1891 88.2% 99.9%
CFJ California Farmer and Journal
of Useful Sciences 1855 - 1880
86.5% 99.8%
SN Sausalito News 1885 - 1922 70.4% 100.0%
corrected accuracy
by newspaper title
Title
OCR character
accuracy
~OCR word
accuracy*
Corrected
accuracy
~Corrected
word accuracy*
PRP 1871 - 1922 92.6% 68.1% 99.3% 96.5%
SFC 1890 - 1913 92.6% 68.1% 99.6% 98.0%
LAH 1873 - 1910 88.7% 54.9% 99.1% 95.6%
LH 1877 - 1899 88.6% 54.6% 99.9% 99.5%
DAC 1841 - 1891 88.2% 53.4% 99.9% 99.5%
CF 1855 - 1880 86.5% 48.4% 98.3% 91.8%
SN 1885 - 1922 70.4% 17.3% 100.0% 100.0%
*Word accuracy assumes average word length is 5 characters
correction accuracy by user
User
Average OCR
accuracy
Correction
accuracy
A 70.4% 100.0%
B 87.1% 99.5%
C 95.4% 99.5%
D 86.5% 98.3%
E 95.3% 100.0%
F 91.0% 100.0%
G 91.0% 99.8%
H 90.5% 99.0%
I 96.6% 99.8%
J 94.8% 100.0%
K 86.8% 99.3%
How does low text accuracy affect search recall?
The Facts
• Average uncorrected OCR character accuracy of the
CDNC sample data is ~89%
• Average length of an English word is 5 characters
• Average word accuracy is 89% x 89% x 89% x 89% x 89%
= 55.8% - round up to 60% or 6 out of 10 words correct
Accuracy
ARNDT
ARNDT
ARNDT
ARNDT ARNDT
ARNDT
ARNDT
ARNDT
ARNDT
ARNDT
Search recall no text correction
instances of “ARNDT” found instances of “ARNDT” not found
Accuracy
The Facts
• Average corrected character accuracy of the CDNC
sample data is ~99.4%
• Average word accuracy of CDNC corrected text is 99.4%
x 99.4% x 99.4% x 99.4% x 99.4% = 97.0%
ARNDT
ARNDT
ARNDT
ARNDT ARNDT
ARNDT
ARNDT
ARNDT
ARNDT
ARNDT
instances of “ARNDT” found instances of “ARNDT” not found
Search recall with text correction
A search for “Arndt” at Chronicling America gives
10,267 results*
• If Chronicling America text accuracy is 55.8% (same as
uncorrected CDNC sample), then 8,133 instances of
“Arndt” were not found
• If text accuracy is 97.0%, then 317 instances of “Arndt”
were not found
Accuracy
* Search performed 31 Oct 2012
Accuracy
Suppose the word/name is longer than 5
characters?
The Facts
• Assume that average uncorrected / corrected OCR
character accuracy is ~89% / ~99% same as CDNC.
Name Name length Raw text accuracy Corrected text accuracy
Eklund 6 49.7% 94.2%
Kennedy 7 44.2% 93.25
Espinosa 8 39.4% 92.3%
Bonaparte 9 35.0% 91.4%
Chatterjee 10 31.2% 90.4%
Accuracy
Name
Number of
search results
Missing results with
raw text accuracy
Missing results with
corrected text accuracy
Eklund 2,951 2,987 182
Kennedy 360,723 455,392 26,111
Espinosa 1,918 2,950 160
Bonaparte 44,664 82,947 4,203
Chatterjee 19 42 2
Chronicling America searches done 19-Mar-2013
(6,025,474 pages from 1836 to 1922).
Crowdsourcing
economic benefits
Public domain photo courtesy of US Navy
$
Economic benefits
Financial value of outsourced OCR text correction
for newspapers?
The Assumptions
• 25 to 50 characters per line in a newspaper column:
Assume 40 characters per line (CDNC sample average)
• Outsourced text transcription or correction costs USD
$0.35 to $1.20 per 1000 characters: Assume $0.50
per 1000 characters
$
Economics
$ 578,000 lines x 40 characters per line x
1/1000 x $0.50 = $11,560
$ 68,908,757 lines x 40 characters per line x
1/1000 x $0.50 = $1,378,175
$
Economics
Financial value of in-house OCR text
correction?
The Assumptions
• Correction takes 15 seconds per line
• Cost is hourly wage plus benefits of lowest level
employee, $10 for CDNC, $41.88* for Australia
AUD $40.38 = USD $41.88 is the actual labor value assumed by the National Library of Australia to calculate avoided costs
due to crowdsourced OCR text correction in its 2012 Trove Status Report.
$
Economics
$ 578,000 lines x 15 seconds per line x 1/3600 hrs
per second x $10.00 per hr = $24,083
$ 68,908,757 lines x 15 seconds per line x 1/3600
hrs per second x $41.88 per hr = $12,024,578
Crowdsourcing
other benefits
Public domain photo “A useful instruction for young sailors from the Royal Hospital
School, Greenwich” from the National Maritime Museum.
“when someone transcribes a document, they are
actually better fulfilling the mission of a cultural
heritage organization than someone who simply stops
by to flip through the pages”
Other benefit
Paraphrased from Trevor Owen’s blog http://www.trevorowens.org/2012/03/
crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013).
“in addition to increasing search accuracy or lowering
the costs of document transcription, crowdsourcing is
the single greatest advancement in getting people using
and interacting with library collections”
Other benefit
Paraphrased from Trevor Owen’s blog http://www.trevorowens.org/2012/03/
crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013).
Crowdsourcing considerations
• How to market / advertise
crowdsourcing?
• How to motivate
crowdsourcers?
• Is authentication / identity of
crowdsourcers an issue?
• How to administer
crowdsourced data?
Photo of Aleister Crowley [Public domain] from Wikimedia
Commons
Conclusions
Conclusion of the Sonata for piano #32, opus 111 by
Ludwig van Beethoven
• Lots of crowdsourcing in cultural heritage
organizations and elsewhere
• Benefits are multi-faceted: Economic, data
accuracy, patron engagement, increased web
traffic
Resources
Public domain photo “A useful instruction for young sailors from the Royal
Hospital School, Greenwich” from the National Maritime Museum.
Comprehensive worldwide list of online
newspaper archives
Wikipedia contributors, "List of online newspaper archives," Wikipedia, The Free Encyclopedia, https://
en.wikipedia.org/wiki/Wikipedia:List_of_online_newspaper_archives (accessed March 17, 2013).
Search many digital newspaper
collections at once!
As of June 2013 elephind (http://www.elephind.com) has indexed 1,032 newspapers from 13
historical digital collections comprising 1,097,928 issues and 45,243,443 pages/articles.
Correct California newspapers at http://cdnc.ucr.edu
Correct Cambridge MA newspapers http://bit.ly/cambridgepublic
Correct Australian newspapers http://trove.nla.gov.au
Correct Tennessee newspapers http://tndp.lib.utk.edu
Correct Virginia newspapers http://virginiachronicle.com
Try crowdsourcing!
Hãy thử crowdsourcing!
Or try Russian language periodicals http://bit.ly/russianperiodicals
Correct Vietnamese newspapers http://bit.ly/nationallibraryofvietnam
Попробуйте краудсорсинга!
Other resources
Mapping Texts at http://mappingtexts.stanford.edu/
Wragge Labs at http://wraggelabs.com/
Wikipedia list of crowdsourcing projects
https://en.wikipedia.org/wiki/
List_of_crowdsourcing_projects
?Brian Geiger
bgeiger@ucr.edu
Frederick Zarndt
frederick@frederickzarndt.com
Photo held by John Oxley Library, State Library of Queensland. Original from
Courier-mail, Brisbane, Queensland, Australia.

Contenu connexe

Tendances

Knowledge Organization | LIS653 | Fall 2017
Knowledge Organization | LIS653 | Fall 2017Knowledge Organization | LIS653 | Fall 2017
Knowledge Organization | LIS653 | Fall 2017PrattSILS
 
Digital Visitors and Residents: Project Feedback
Digital Visitors and Residents: Project FeedbackDigital Visitors and Residents: Project Feedback
Digital Visitors and Residents: Project Feedbackjisc-elearning
 
Beyond the Silos of the LAMs - Library, Archive, Museum Collaboration
Beyond the Silos of the LAMs - Library, Archive, Museum CollaborationBeyond the Silos of the LAMs - Library, Archive, Museum Collaboration
Beyond the Silos of the LAMs - Library, Archive, Museum CollaborationOCLC Research
 
Presentation by Ingrid Parent: Digital Academic Content and the Future of Lib...
Presentation by Ingrid Parent: Digital Academic Content and the Future of Lib...Presentation by Ingrid Parent: Digital Academic Content and the Future of Lib...
Presentation by Ingrid Parent: Digital Academic Content and the Future of Lib...Ingrid Parent
 
Gone today, here tomorrow: the future of government information and the digit...
Gone today, here tomorrow: the future of government information and the digit...Gone today, here tomorrow: the future of government information and the digit...
Gone today, here tomorrow: the future of government information and the digit...James Jacobs
 
Digital FDLP Louisiana GODORT 2012 slides+notes
Digital FDLP Louisiana GODORT 2012 slides+notesDigital FDLP Louisiana GODORT 2012 slides+notes
Digital FDLP Louisiana GODORT 2012 slides+notesJames Jacobs
 
Makerspaces: A New Wave of Library Service: The Westport (CT) Public Library
Makerspaces: A New Wave of Library Service: The Westport (CT) Public LibraryMakerspaces: A New Wave of Library Service: The Westport (CT) Public Library
Makerspaces: A New Wave of Library Service: The Westport (CT) Public LibraryALATechSource
 
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...PrattSILS
 
Blind Spots and Broken Links: Access to Government Information
Blind Spots and Broken Links: Access to Government InformationBlind Spots and Broken Links: Access to Government Information
Blind Spots and Broken Links: Access to Government InformationJames Jacobs
 
Tonta World Is Flat Yet Not Open Oslo Workshop 10 May 2006 Final Revised
Tonta World Is Flat Yet Not Open Oslo Workshop 10 May 2006 Final RevisedTonta World Is Flat Yet Not Open Oslo Workshop 10 May 2006 Final Revised
Tonta World Is Flat Yet Not Open Oslo Workshop 10 May 2006 Final RevisedYasar Tonta
 
Nahl LIS Educators VWs 3-5-10 ALAFIN
Nahl LIS Educators VWs 3-5-10 ALAFINNahl LIS Educators VWs 3-5-10 ALAFIN
Nahl LIS Educators VWs 3-5-10 ALAFINDiane Nahl
 
Who, What, Where,Why and How
Who, What, Where,Why and HowWho, What, Where,Why and How
Who, What, Where,Why and HowRachel Frick
 
Digital Infrastructures that Embody Library Principles: The IMLS national dig...
Digital Infrastructures that Embody Library Principles: The IMLS national dig...Digital Infrastructures that Embody Library Principles: The IMLS national dig...
Digital Infrastructures that Embody Library Principles: The IMLS national dig...Trevor Owens
 
The DPLA and NY Heritage for Tech Camp 2014
The DPLA and NY Heritage for Tech Camp 2014The DPLA and NY Heritage for Tech Camp 2014
The DPLA and NY Heritage for Tech Camp 2014Larry Naukam
 
Research into social media practices and social media practices for research
Research into social media practices and social media practices for researchResearch into social media practices and social media practices for research
Research into social media practices and social media practices for researchHazel Hall
 
Creative Connections Presentation Notes
Creative Connections Presentation NotesCreative Connections Presentation Notes
Creative Connections Presentation NotesTor Loney
 
ARLIS 2010 RLG Partnership Round Table
ARLIS 2010 RLG Partnership Round TableARLIS 2010 RLG Partnership Round Table
ARLIS 2010 RLG Partnership Round TableOCLC Research
 
Creating tangible connections using the intangible library
Creating tangible connections using the intangible libraryCreating tangible connections using the intangible library
Creating tangible connections using the intangible libraryEastern Michigan University
 

Tendances (20)

Knowledge Organization | LIS653 | Fall 2017
Knowledge Organization | LIS653 | Fall 2017Knowledge Organization | LIS653 | Fall 2017
Knowledge Organization | LIS653 | Fall 2017
 
Digital Visitors and Residents: Project Feedback
Digital Visitors and Residents: Project FeedbackDigital Visitors and Residents: Project Feedback
Digital Visitors and Residents: Project Feedback
 
Beyond the Silos of the LAMs - Library, Archive, Museum Collaboration
Beyond the Silos of the LAMs - Library, Archive, Museum CollaborationBeyond the Silos of the LAMs - Library, Archive, Museum Collaboration
Beyond the Silos of the LAMs - Library, Archive, Museum Collaboration
 
Presentation by Ingrid Parent: Digital Academic Content and the Future of Lib...
Presentation by Ingrid Parent: Digital Academic Content and the Future of Lib...Presentation by Ingrid Parent: Digital Academic Content and the Future of Lib...
Presentation by Ingrid Parent: Digital Academic Content and the Future of Lib...
 
Gone today, here tomorrow: the future of government information and the digit...
Gone today, here tomorrow: the future of government information and the digit...Gone today, here tomorrow: the future of government information and the digit...
Gone today, here tomorrow: the future of government information and the digit...
 
Digital FDLP Louisiana GODORT 2012 slides+notes
Digital FDLP Louisiana GODORT 2012 slides+notesDigital FDLP Louisiana GODORT 2012 slides+notes
Digital FDLP Louisiana GODORT 2012 slides+notes
 
Makerspaces: A New Wave of Library Service: The Westport (CT) Public Library
Makerspaces: A New Wave of Library Service: The Westport (CT) Public LibraryMakerspaces: A New Wave of Library Service: The Westport (CT) Public Library
Makerspaces: A New Wave of Library Service: The Westport (CT) Public Library
 
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
 
Blind Spots and Broken Links: Access to Government Information
Blind Spots and Broken Links: Access to Government InformationBlind Spots and Broken Links: Access to Government Information
Blind Spots and Broken Links: Access to Government Information
 
Tonta World Is Flat Yet Not Open Oslo Workshop 10 May 2006 Final Revised
Tonta World Is Flat Yet Not Open Oslo Workshop 10 May 2006 Final RevisedTonta World Is Flat Yet Not Open Oslo Workshop 10 May 2006 Final Revised
Tonta World Is Flat Yet Not Open Oslo Workshop 10 May 2006 Final Revised
 
Nahl LIS Educators VWs 3-5-10 ALAFIN
Nahl LIS Educators VWs 3-5-10 ALAFINNahl LIS Educators VWs 3-5-10 ALAFIN
Nahl LIS Educators VWs 3-5-10 ALAFIN
 
Who, What, Where,Why and How
Who, What, Where,Why and HowWho, What, Where,Why and How
Who, What, Where,Why and How
 
Digital Infrastructures that Embody Library Principles: The IMLS national dig...
Digital Infrastructures that Embody Library Principles: The IMLS national dig...Digital Infrastructures that Embody Library Principles: The IMLS national dig...
Digital Infrastructures that Embody Library Principles: The IMLS national dig...
 
The DPLA and NY Heritage for Tech Camp 2014
The DPLA and NY Heritage for Tech Camp 2014The DPLA and NY Heritage for Tech Camp 2014
The DPLA and NY Heritage for Tech Camp 2014
 
Research into social media practices and social media practices for research
Research into social media practices and social media practices for researchResearch into social media practices and social media practices for research
Research into social media practices and social media practices for research
 
Creative Connections Presentation Notes
Creative Connections Presentation NotesCreative Connections Presentation Notes
Creative Connections Presentation Notes
 
Reawakening the people's university
Reawakening the people's universityReawakening the people's university
Reawakening the people's university
 
ARLIS 2010 RLG Partnership Round Table
ARLIS 2010 RLG Partnership Round TableARLIS 2010 RLG Partnership Round Table
ARLIS 2010 RLG Partnership Round Table
 
Creating tangible connections using the intangible library
Creating tangible connections using the intangible libraryCreating tangible connections using the intangible library
Creating tangible connections using the intangible library
 
Digital Public Library of America
Digital Public Library of AmericaDigital Public Library of America
Digital Public Library of America
 

En vedette

Many hands make light work, the american version [charleston library conferen...
Many hands make light work, the american version [charleston library conferen...Many hands make light work, the american version [charleston library conferen...
Many hands make light work, the american version [charleston library conferen...Frederick Zarndt
 
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...Frederick Zarndt
 
How the british library and the singapore natinoal library support social cha...
How the british library and the singapore natinoal library support social cha...How the british library and the singapore natinoal library support social cha...
How the british library and the singapore natinoal library support social cha...Frederick Zarndt
 
An international survey of born digital legal deposit policies and practices ...
An international survey of born digital legal deposit policies and practices ...An international survey of born digital legal deposit policies and practices ...
An international survey of born digital legal deposit policies and practices ...Frederick Zarndt
 
Coronado public library digital newspapers workshop local partnerships [oct 2...
Coronado public library digital newspapers workshop local partnerships [oct 2...Coronado public library digital newspapers workshop local partnerships [oct 2...
Coronado public library digital newspapers workshop local partnerships [oct 2...Frederick Zarndt
 
An international survey of born digital legal deposit policies and practices ...
An international survey of born digital legal deposit policies and practices ...An international survey of born digital legal deposit policies and practices ...
An international survey of born digital legal deposit policies and practices ...Frederick Zarndt
 

En vedette (6)

Many hands make light work, the american version [charleston library conferen...
Many hands make light work, the american version [charleston library conferen...Many hands make light work, the american version [charleston library conferen...
Many hands make light work, the american version [charleston library conferen...
 
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
 
How the british library and the singapore natinoal library support social cha...
How the british library and the singapore natinoal library support social cha...How the british library and the singapore natinoal library support social cha...
How the british library and the singapore natinoal library support social cha...
 
An international survey of born digital legal deposit policies and practices ...
An international survey of born digital legal deposit policies and practices ...An international survey of born digital legal deposit policies and practices ...
An international survey of born digital legal deposit policies and practices ...
 
Coronado public library digital newspapers workshop local partnerships [oct 2...
Coronado public library digital newspapers workshop local partnerships [oct 2...Coronado public library digital newspapers workshop local partnerships [oct 2...
Coronado public library digital newspapers workshop local partnerships [oct 2...
 
An international survey of born digital legal deposit policies and practices ...
An international survey of born digital legal deposit policies and practices ...An international survey of born digital legal deposit policies and practices ...
An international survey of born digital legal deposit policies and practices ...
 

Similaire à 20130630 What motivates library crowdsourcing volunteers? [ALA LITA]

20140408 digital newspapers collections [idlc kuala lumpur]
20140408 digital newspapers collections [idlc kuala lumpur]20140408 digital newspapers collections [idlc kuala lumpur]
20140408 digital newspapers collections [idlc kuala lumpur]Frederick Zarndt
 
Stories to tell: The making of our digital nation. April 2010
Stories to tell: The making of our digital nation. April 2010 Stories to tell: The making of our digital nation. April 2010
Stories to tell: The making of our digital nation. April 2010 Rose Holley
 
Digital Transformation and Data - the Wikimedia Residency at the University o...
Digital Transformation and Data - the Wikimedia Residency at the University o...Digital Transformation and Data - the Wikimedia Residency at the University o...
Digital Transformation and Data - the Wikimedia Residency at the University o...Ewan McAndrew
 
Crowdsourcing and social engagement: potential, power and freedom for librari...
Crowdsourcing and social engagement: potential, power and freedom for librari...Crowdsourcing and social engagement: potential, power and freedom for librari...
Crowdsourcing and social engagement: potential, power and freedom for librari...Rose Holley
 
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...lljohnston
 
Crowdsourcing as Public Engagement
Crowdsourcing as Public  EngagementCrowdsourcing as Public  Engagement
Crowdsourcing as Public EngagementAlastair Dunning
 
Boris Zetterlund - Global Perspectives on Public Libraries
Boris Zetterlund - Global Perspectives on Public LibrariesBoris Zetterlund - Global Perspectives on Public Libraries
Boris Zetterlund - Global Perspectives on Public LibrariesAxiell Group
 
DPLA - an introduction for historians
DPLA  - an introduction for historiansDPLA  - an introduction for historians
DPLA - an introduction for historiansLarry Naukam
 
Doctoral open day_digital_research_session_Social_Sciences_BL
Doctoral open day_digital_research_session_Social_Sciences_BLDoctoral open day_digital_research_session_Social_Sciences_BL
Doctoral open day_digital_research_session_Social_Sciences_BLAquiles Alencar Brayner
 
Hello islandora building a digital repository nov 30, 2016 v6
Hello islandora  building a digital repository nov 30, 2016 v6Hello islandora  building a digital repository nov 30, 2016 v6
Hello islandora building a digital repository nov 30, 2016 v6eohallor
 
The Relevance of Geospatial Data as a Prerequisite in Obtaining Knowledge and...
The Relevance of Geospatial Data as a Prerequisite in Obtaining Knowledge and...The Relevance of Geospatial Data as a Prerequisite in Obtaining Knowledge and...
The Relevance of Geospatial Data as a Prerequisite in Obtaining Knowledge and...iosrjce
 
LIS 653 Posters Fall 2014
LIS 653 Posters Fall 2014 LIS 653 Posters Fall 2014
LIS 653 Posters Fall 2014 PrattSILS
 
Rethink research, illuminate history with the British Library
Rethink research, illuminate history with the British LibraryRethink research, illuminate history with the British Library
Rethink research, illuminate history with the British LibraryMia
 
We are not in a post facts world
We are not in a post facts worldWe are not in a post facts world
We are not in a post facts worldLorna Campbell
 
Digital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryDigital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryNora McGregor
 
Is It Too Late to Ensure Continuity of Access to the Scholarly Record?
Is It Too Late to Ensure Continuity of Access to the Scholarly Record?Is It Too Late to Ensure Continuity of Access to the Scholarly Record?
Is It Too Late to Ensure Continuity of Access to the Scholarly Record?EDINA, University of Edinburgh
 
Community Generated Databases for NY State History Conference 2013
Community Generated Databases for NY State History Conference 2013Community Generated Databases for NY State History Conference 2013
Community Generated Databases for NY State History Conference 2013Larry Naukam
 
Connected learning week three
Connected learning week threeConnected learning week three
Connected learning week threePaul Richardson
 

Similaire à 20130630 What motivates library crowdsourcing volunteers? [ALA LITA] (20)

20140408 digital newspapers collections [idlc kuala lumpur]
20140408 digital newspapers collections [idlc kuala lumpur]20140408 digital newspapers collections [idlc kuala lumpur]
20140408 digital newspapers collections [idlc kuala lumpur]
 
Aquiles imlr seminar
Aquiles imlr seminarAquiles imlr seminar
Aquiles imlr seminar
 
Stories to tell: The making of our digital nation. April 2010
Stories to tell: The making of our digital nation. April 2010 Stories to tell: The making of our digital nation. April 2010
Stories to tell: The making of our digital nation. April 2010
 
Digital Transformation and Data - the Wikimedia Residency at the University o...
Digital Transformation and Data - the Wikimedia Residency at the University o...Digital Transformation and Data - the Wikimedia Residency at the University o...
Digital Transformation and Data - the Wikimedia Residency at the University o...
 
Crowdsourcing and social engagement: potential, power and freedom for librari...
Crowdsourcing and social engagement: potential, power and freedom for librari...Crowdsourcing and social engagement: potential, power and freedom for librari...
Crowdsourcing and social engagement: potential, power and freedom for librari...
 
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
 
Crowdsourcing as Public Engagement
Crowdsourcing as Public  EngagementCrowdsourcing as Public  Engagement
Crowdsourcing as Public Engagement
 
Boris Zetterlund - Global Perspectives on Public Libraries
Boris Zetterlund - Global Perspectives on Public LibrariesBoris Zetterlund - Global Perspectives on Public Libraries
Boris Zetterlund - Global Perspectives on Public Libraries
 
DPLA - an introduction for historians
DPLA  - an introduction for historiansDPLA  - an introduction for historians
DPLA - an introduction for historians
 
Doctoral open day_digital_research_session_Social_Sciences_BL
Doctoral open day_digital_research_session_Social_Sciences_BLDoctoral open day_digital_research_session_Social_Sciences_BL
Doctoral open day_digital_research_session_Social_Sciences_BL
 
Hello islandora building a digital repository nov 30, 2016 v6
Hello islandora  building a digital repository nov 30, 2016 v6Hello islandora  building a digital repository nov 30, 2016 v6
Hello islandora building a digital repository nov 30, 2016 v6
 
The Relevance of Geospatial Data as a Prerequisite in Obtaining Knowledge and...
The Relevance of Geospatial Data as a Prerequisite in Obtaining Knowledge and...The Relevance of Geospatial Data as a Prerequisite in Obtaining Knowledge and...
The Relevance of Geospatial Data as a Prerequisite in Obtaining Knowledge and...
 
LIS 653 Posters Fall 2014
LIS 653 Posters Fall 2014 LIS 653 Posters Fall 2014
LIS 653 Posters Fall 2014
 
Rethink research, illuminate history with the British Library
Rethink research, illuminate history with the British LibraryRethink research, illuminate history with the British Library
Rethink research, illuminate history with the British Library
 
We are not in a post facts world
We are not in a post facts worldWe are not in a post facts world
We are not in a post facts world
 
Digital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryDigital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British Library
 
Digital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryDigital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British Library
 
Is It Too Late to Ensure Continuity of Access to the Scholarly Record?
Is It Too Late to Ensure Continuity of Access to the Scholarly Record?Is It Too Late to Ensure Continuity of Access to the Scholarly Record?
Is It Too Late to Ensure Continuity of Access to the Scholarly Record?
 
Community Generated Databases for NY State History Conference 2013
Community Generated Databases for NY State History Conference 2013Community Generated Databases for NY State History Conference 2013
Community Generated Databases for NY State History Conference 2013
 
Connected learning week three
Connected learning week threeConnected learning week three
Connected learning week three
 

Plus de Frederick Zarndt

Digitization of the Tuol Sleng Genocide Museum Archives
Digitization of the Tuol Sleng Genocide Museum ArchivesDigitization of the Tuol Sleng Genocide Museum Archives
Digitization of the Tuol Sleng Genocide Museum ArchivesFrederick Zarndt
 
2017 Born Digital Legal Deposit Policies and Practices
2017 Born Digital Legal Deposit Policies and Practices2017 Born Digital Legal Deposit Policies and Practices
2017 Born Digital Legal Deposit Policies and PracticesFrederick Zarndt
 
e-Legal Deposit Survey 2017
e-Legal Deposit Survey 2017e-Legal Deposit Survey 2017
e-Legal Deposit Survey 2017Frederick Zarndt
 
Project Management according to Great Pumpkin Principles
Project Management according to Great Pumpkin PrinciplesProject Management according to Great Pumpkin Principles
Project Management according to Great Pumpkin PrinciplesFrederick Zarndt
 
What did you say? interculture communication [20160308 phnom penh]
What did you say? interculture communication [20160308 phnom penh]What did you say? interculture communication [20160308 phnom penh]
What did you say? interculture communication [20160308 phnom penh]Frederick Zarndt
 
Coronado public library digital newspapers workshop [Oct 2016]
Coronado public library digital newspapers workshop [Oct 2016]Coronado public library digital newspapers workshop [Oct 2016]
Coronado public library digital newspapers workshop [Oct 2016]Frederick Zarndt
 
What did you say? mindful interculture communication [201608 icgse]
What did you say? mindful interculture communication [201608 icgse]What did you say? mindful interculture communication [201608 icgse]
What did you say? mindful interculture communication [201608 icgse]Frederick Zarndt
 
Here Today, Gone within a Month: The Fleeting Life of Digital News
Here Today, Gone within a Month: The Fleeting Life of Digital NewsHere Today, Gone within a Month: The Fleeting Life of Digital News
Here Today, Gone within a Month: The Fleeting Life of Digital NewsFrederick Zarndt
 
Here Today, Gone within a Month: The Fleeting Life of Digital News
Here Today, Gone within a Month: The Fleeting Life of Digital NewsHere Today, Gone within a Month: The Fleeting Life of Digital News
Here Today, Gone within a Month: The Fleeting Life of Digital NewsFrederick Zarndt
 
Rootstech 2015 finding and using digitized historical newspapers workshop [20...
Rootstech 2015 finding and using digitized historical newspapers workshop [20...Rootstech 2015 finding and using digitized historical newspapers workshop [20...
Rootstech 2015 finding and using digitized historical newspapers workshop [20...Frederick Zarndt
 
20140410 ifla digitization workshop [idlc kuala lumpur]
20140410 ifla digitization workshop [idlc kuala lumpur]20140410 ifla digitization workshop [idlc kuala lumpur]
20140410 ifla digitization workshop [idlc kuala lumpur]Frederick Zarndt
 
What did you say? Intercultural expectations, misunderstandings, and communic...
What did you say? Intercultural expectations, misunderstandings, and communic...What did you say? Intercultural expectations, misunderstandings, and communic...
What did you say? Intercultural expectations, misunderstandings, and communic...Frederick Zarndt
 
20140628 crowdsourcing, family history, and long tails for libraries [ala ann...
20140628 crowdsourcing, family history, and long tails for libraries [ala ann...20140628 crowdsourcing, family history, and long tails for libraries [ala ann...
20140628 crowdsourcing, family history, and long tails for libraries [ala ann...Frederick Zarndt
 
20131019 digital collections - if you build them will anyone visit [library 2...
20131019 digital collections - if you build them will anyone visit [library 2...20131019 digital collections - if you build them will anyone visit [library 2...
20131019 digital collections - if you build them will anyone visit [library 2...Frederick Zarndt
 
20130903 what did you say? interculture communication [hamburg]
20130903 what did you say? interculture communication [hamburg]20130903 what did you say? interculture communication [hamburg]
20130903 what did you say? interculture communication [hamburg]Frederick Zarndt
 
201308 wlic standards committee zarndt et al the alto editorial board collabo...
201308 wlic standards committee zarndt et al the alto editorial board collabo...201308 wlic standards committee zarndt et al the alto editorial board collabo...
201308 wlic standards committee zarndt et al the alto editorial board collabo...Frederick Zarndt
 
201308 wlic standards committee zarndt et al the alto editorial board collabo...
201308 wlic standards committee zarndt et al the alto editorial board collabo...201308 wlic standards committee zarndt et al the alto editorial board collabo...
201308 wlic standards committee zarndt et al the alto editorial board collabo...Frederick Zarndt
 
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...Frederick Zarndt
 
20130629 If you build it, will they visit [ala lita lightning talk]
20130629 If you build it, will they visit [ala lita lightning talk]20130629 If you build it, will they visit [ala lita lightning talk]
20130629 If you build it, will they visit [ala lita lightning talk]Frederick Zarndt
 
20130412 Productivity of the crowd [acrl indianapolis]
20130412 Productivity of the crowd [acrl indianapolis]20130412 Productivity of the crowd [acrl indianapolis]
20130412 Productivity of the crowd [acrl indianapolis]Frederick Zarndt
 

Plus de Frederick Zarndt (20)

Digitization of the Tuol Sleng Genocide Museum Archives
Digitization of the Tuol Sleng Genocide Museum ArchivesDigitization of the Tuol Sleng Genocide Museum Archives
Digitization of the Tuol Sleng Genocide Museum Archives
 
2017 Born Digital Legal Deposit Policies and Practices
2017 Born Digital Legal Deposit Policies and Practices2017 Born Digital Legal Deposit Policies and Practices
2017 Born Digital Legal Deposit Policies and Practices
 
e-Legal Deposit Survey 2017
e-Legal Deposit Survey 2017e-Legal Deposit Survey 2017
e-Legal Deposit Survey 2017
 
Project Management according to Great Pumpkin Principles
Project Management according to Great Pumpkin PrinciplesProject Management according to Great Pumpkin Principles
Project Management according to Great Pumpkin Principles
 
What did you say? interculture communication [20160308 phnom penh]
What did you say? interculture communication [20160308 phnom penh]What did you say? interculture communication [20160308 phnom penh]
What did you say? interculture communication [20160308 phnom penh]
 
Coronado public library digital newspapers workshop [Oct 2016]
Coronado public library digital newspapers workshop [Oct 2016]Coronado public library digital newspapers workshop [Oct 2016]
Coronado public library digital newspapers workshop [Oct 2016]
 
What did you say? mindful interculture communication [201608 icgse]
What did you say? mindful interculture communication [201608 icgse]What did you say? mindful interculture communication [201608 icgse]
What did you say? mindful interculture communication [201608 icgse]
 
Here Today, Gone within a Month: The Fleeting Life of Digital News
Here Today, Gone within a Month: The Fleeting Life of Digital NewsHere Today, Gone within a Month: The Fleeting Life of Digital News
Here Today, Gone within a Month: The Fleeting Life of Digital News
 
Here Today, Gone within a Month: The Fleeting Life of Digital News
Here Today, Gone within a Month: The Fleeting Life of Digital NewsHere Today, Gone within a Month: The Fleeting Life of Digital News
Here Today, Gone within a Month: The Fleeting Life of Digital News
 
Rootstech 2015 finding and using digitized historical newspapers workshop [20...
Rootstech 2015 finding and using digitized historical newspapers workshop [20...Rootstech 2015 finding and using digitized historical newspapers workshop [20...
Rootstech 2015 finding and using digitized historical newspapers workshop [20...
 
20140410 ifla digitization workshop [idlc kuala lumpur]
20140410 ifla digitization workshop [idlc kuala lumpur]20140410 ifla digitization workshop [idlc kuala lumpur]
20140410 ifla digitization workshop [idlc kuala lumpur]
 
What did you say? Intercultural expectations, misunderstandings, and communic...
What did you say? Intercultural expectations, misunderstandings, and communic...What did you say? Intercultural expectations, misunderstandings, and communic...
What did you say? Intercultural expectations, misunderstandings, and communic...
 
20140628 crowdsourcing, family history, and long tails for libraries [ala ann...
20140628 crowdsourcing, family history, and long tails for libraries [ala ann...20140628 crowdsourcing, family history, and long tails for libraries [ala ann...
20140628 crowdsourcing, family history, and long tails for libraries [ala ann...
 
20131019 digital collections - if you build them will anyone visit [library 2...
20131019 digital collections - if you build them will anyone visit [library 2...20131019 digital collections - if you build them will anyone visit [library 2...
20131019 digital collections - if you build them will anyone visit [library 2...
 
20130903 what did you say? interculture communication [hamburg]
20130903 what did you say? interculture communication [hamburg]20130903 what did you say? interculture communication [hamburg]
20130903 what did you say? interculture communication [hamburg]
 
201308 wlic standards committee zarndt et al the alto editorial board collabo...
201308 wlic standards committee zarndt et al the alto editorial board collabo...201308 wlic standards committee zarndt et al the alto editorial board collabo...
201308 wlic standards committee zarndt et al the alto editorial board collabo...
 
201308 wlic standards committee zarndt et al the alto editorial board collabo...
201308 wlic standards committee zarndt et al the alto editorial board collabo...201308 wlic standards committee zarndt et al the alto editorial board collabo...
201308 wlic standards committee zarndt et al the alto editorial board collabo...
 
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...
 
20130629 If you build it, will they visit [ala lita lightning talk]
20130629 If you build it, will they visit [ala lita lightning talk]20130629 If you build it, will they visit [ala lita lightning talk]
20130629 If you build it, will they visit [ala lita lightning talk]
 
20130412 Productivity of the crowd [acrl indianapolis]
20130412 Productivity of the crowd [acrl indianapolis]20130412 Productivity of the crowd [acrl indianapolis]
20130412 Productivity of the crowd [acrl indianapolis]
 

Dernier

How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17Celine George
 
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustVani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustSavipriya Raghavendra
 
Over the counter (OTC)- Sale, rational use.pptx
Over the counter (OTC)- Sale, rational use.pptxOver the counter (OTC)- Sale, rational use.pptx
Over the counter (OTC)- Sale, rational use.pptxraviapr7
 
10 Topics For MBA Project Report [HR].pdf
10 Topics For MBA Project Report [HR].pdf10 Topics For MBA Project Report [HR].pdf
10 Topics For MBA Project Report [HR].pdfJayanti Pande
 
EBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlEBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlDr. Bruce A. Johnson
 
How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17Celine George
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17Celine George
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxAditiChauhan701637
 
What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?TechSoup
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesCeline George
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxheathfieldcps1
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfMohonDas
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptxSandy Millin
 
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...Dr. Asif Anas
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17Celine George
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.EnglishCEIPdeSigeiro
 
Department of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfDepartment of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfMohonDas
 

Dernier (20)

How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17
 
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustVani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
 
Over the counter (OTC)- Sale, rational use.pptx
Over the counter (OTC)- Sale, rational use.pptxOver the counter (OTC)- Sale, rational use.pptx
Over the counter (OTC)- Sale, rational use.pptx
 
10 Topics For MBA Project Report [HR].pdf
10 Topics For MBA Project Report [HR].pdf10 Topics For MBA Project Report [HR].pdf
10 Topics For MBA Project Report [HR].pdf
 
EBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlEBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting Bl
 
How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptx
 
What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdf
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
 
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdfPersonal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.
 
Department of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfDepartment of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdf
 
March 2024 Directors Meeting, Division of Student Affairs and Academic Support
March 2024 Directors Meeting, Division of Student Affairs and Academic SupportMarch 2024 Directors Meeting, Division of Student Affairs and Academic Support
March 2024 Directors Meeting, Division of Student Affairs and Academic Support
 

20130630 What motivates library crowdsourcing volunteers? [ALA LITA]

  • 1. What motivates library crowdsourcing volunteers? Experiences at the California Digital Newspaper Collection and the Cambridge Public Library Newspapers Collection Brian Geiger Director, Center for Bibliographical Studies and Research California Digital Newspaper Collection Frederick Zarndt Chair, IFLA Newspapers Section
  • 2. Photo held by John Oxley Library, State Library of Queensland. Original from Courier-mail, Brisbane, Queensland, Australia. 1. crowds, properties of 2.crowdsourcing, short history of 3.crowdsourcing and libraries, especially digital newspapers collections 4.crowdsourcing, motivations for 5.crowdsourcing, interesting fact 6.crowdsourcing, website traffic 7. crowdsourcing, accuracy of 8.crowdsourcing, economic benefits of 9.crowdsourcing, other benefits of
  • 4. In it he says ... In 2004 James Surowiecki published ... The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations
  • 5. ... a crowd of persons that are diverse ...
  • 9. “Country Fair” by Grandma Moses. Original painting 1950.
  • 10. wisdom of crowds in summary James Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations, Anchor Books, New York, 2005. Diversity Each person should have private information even if it's just an eccentric interpretation of the known facts. Independence People's opinions aren't determined by the opinions of those around them. Decentralization People are able to specialize and draw on local knowledge. Aggregation Some mechanism exists for turning private judgments into a collective decision.
  • 12. “crowdsourcing” was coined by Jeff Howe in “The rise of crowdsourcing” published in Wired magazine June 2006.
  • 14. • On the date of publication of Jeff Howe’s Wired magazine article, 1-Jun-2007, Wikipedia did not have an entry (list) of crowdsourcing projects*. • On 25-Jan-2010 Wikipedia’s list of crowdsourcing projects had 35 entries*. • On 17-Mar -2013 Wikipedia’s list of crowdsourcing projects had 158 entries+. * From Internet Archives’ Wayback Machine. + Wikipedia contributors, "List of crowdsourcing projects," Wikipedia, The Free Encyclopedia, https:// en.wikipedia.org/wiki/List_of_crowdsourcing_projects (accessed March 17, 2013).
  • 15. Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers. ... [It] is different from ordinary outsourcing since it is a task or problem that is outsourced to an undefined public rather than a specific, named group. Wikipedia contributors, "Crowdsourcing," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/ Crowdsourcing (accessed March 17, 2013)
  • 16. Amazon Mechanical Turk was launched Nov 2005 Alexa global / USA rank of Amazon Mechanical Turk (Jun 2013): 8,242 / 3,060 Alexa reputation (Jun 2013): 1.344 crowdsourcing
  • 17. CAPTCHA is a Completely Automated Public Turing test to tell Computers and Human Apart Started as a computer science project at Carnegie Mellon Each day 200,000,000 recaptcha’s are solved by humans around the world
  • 18. Open Street Map was 1st launched in 2007 Alexa global / Belarus / USA traffic rank of Open Street Map (Jun 2013): 8,853 / 1,373 / 20,114 Alexa reputation (Jun 2013): 28,368
  • 19. Zooniverse was 1st launched July 2007 Alexa global / USA traffic rank of Zooniverse (Jun 2013): 275,960 / 153,853 Alexa reputation (Jun 2013): 502 Registered users (Jun 2013): 848,140 citizen science
  • 20. Kickstarter was launched in 2009 Alexa global / USA traffic rank of Kickstarter (Jun 2013): 798 / 361 Alexa reputation (Jun 2013): 85,591 43,000+ projects successfully funded with more than USD $669,000,000 crowdfunding
  • 22. • Began 2001 • Now in 285 languages • 40 wikipedia languages with more than 100,000 articles • 112 wikipedia languages with more than 10,000 articles • 400,000,000 unique visitors per month • 3,900,000+ articles in English, 1,400,000+ in German, 1,250,000+ in French, 1,050,000 in Dutch • 85,000 active contributors / more than 300,000 have edited Wikipedia more than 10 times • Alexa global / USA traffic rank (Jun 2013): 7 / 8 • Alexa reputation (Jun 2013): 2,102,569
  • 23. Wikipedia contributors, "List of crowdsourcing projects," Wikipedia, The Free Encyclopedia, http:// en.wikipedia.org/wiki/List_of_crowdsourcing_projectss (accessed Jun 2013).
  • 27. Family Search Indexing was 1st launched (beta) 2004 Alexa global / USA traffic rank of FamilySearch (Jun 2013): 4,664 / 1,373 Alexa reputation (Jun 2013): 8,140
  • 28. • Started (beta) 2004 • More than 780,000 worldwide registered volunteers from ~25 countries index records relevant to family history • Approximately 100,000 active volunteers each month • UI in Chinese, English, German, French, Italian, Japanese, Korean, Portuguese, and Russian • Blind double-key entry with arbitration / reconciliation • More than 1,500,088,741 records indexed (July 2012) • Accuracy typically > 99.95%
  • 29. Project Gutenberg was 1st launched Dec 1971 Alexa global traffic rank of Project Gutenberg (Jun 2013): 6,709 Alexa reputation (Jun 2013): 31,190
  • 30. • Started Dec 1971 • Worldwide volunteers transcribe or proofread OCR’d public domain books through Distributed Proofreaders • 40,000 books completed (July 2012) • Partner / affiliated projects for Australia, Canada, Europe, Germany, Luxembourg, Philippines, Runeberg (Nordic literature), Russia, Taiwan
  • 31. Alexa global / Australia traffic rank of National Library of Australia (June 2013): 15,435 / 411 Trove gets ~77% of all National Library web traffic. Alexa reputation (Jun 2013): 8,788
  • 32. National Library of Australia • Online since 2008 • More than 9,830,000 / 96,000,000 newspaper pages / articles (May 2013) • Top text corrector 1,911,000+ lines (May 2013) • 2,661,140 lines corrected each month (average for 1st 5 months 2013) • 96,304,337 lines corrected as of May 2013, up from 66,527,535 lines corrected May 2012 • 95,299 / 8,519 registered / active users (May 2013)
  • 33. Alexa global / country traffic rank of National Library of Finland 2,535,854 (31-Oct-2012) / 199 (2-Apr-2012)
  • 34. National Library of Finland • Digitalkoot is a project to improve OCR text in digitized newspapers -- by playing games! • Digitalkoot is a collaboration between the National Library and Microtask • Players correct OCR text by playing Myyräsillassa (Mole Bridge) or Myyräjahdissa (Mole Hunt) • National Library has 4,000,000+ digitized pages • 109,321 registered players (October 2012) • Since February 2011 8,024,530 micro-tasks have been completed
  • 35. Alexa global / USA traffic rank of UC Riverside (June 2013): 12,584 / 4,382 CDNC gets ~1.6% of all UC Riverside web traffic.
  • 36. California Digital Newspaper Collection • CDNC began digitizing newspapers in 2005 as part of NDNP • Newspapers digitized to article-level as well as to page-level as required by NDNP • Hosted on Veridian beginning 2009 • Collection size 61,351 issues, 544,474 pages, 6,327,491 articles (May 2013)
  • 37. OCR text correction • OCR text correction added Aug 2011 • Corrections are done line by line • 1340 registered / 599 active users (Jun 2013) • ~1,160,465+ lines of text corrected (Jun 2013) • ~1.1% of the collection corrected, 98.9% to go! • Top corrector 379,573 lines > 2x 2nd corrector
  • 39. Cambridge Public Library Historic Newspaper Collection • Cambridge Historic Newspapers online since Jan 2012. • Cambridge Massachusetts Public Library digitized local newspapers (http://cambridge.dlconsulting.com/) • Newspapers digitized to article-level • Collection size 6,346 issues, 59,070 pages, 669,406 articles (May 2013) • Collection includes 13,099 obituary cards
  • 41. Cognitive surplus ... people are learning to use their free time for creative activities rather than consumptive ones [such as watching TV] ... ... the total human cognitive effort in creating all of Wikipedia in every language is about one hundred million hours ... ... Americans alone watch two hundred billion hours of TV every year, or enough time, if it would be devoted to projects similar to Wikipedia, to create about 2000 of them ... Clay Shirky. Cognitive surplus: Creativity and generosity in a connected age. Penguin Press. New York. 2010.
  • 42. User Lines corrected 1 379,578 2 136,637 3 56,996 4 52,093 5 46,102 6 39,617 7 35,878 8 33,204 9 31,319 10 30,408 Lines corrected User 1,456,906 1 1,385,369 2 1,010,360 3 960,230 4 847,340 5 786,147 6 657,187 7 600,513 8 582,276 9 565,384 10
  • 44. Motivation genealogists and family historians • National Library of Australia’s 2012 Trove status report showed that ~50% of Trove users are family historians • National Library of New Zealand survey found that ~50% of PapersPast users are genealogists • A 2012 Utah Digital Newspapers survey showed that 72% of its users are genealogists* PAPERSPAST *John Herbert and Randy Olsen. “Small town papers: Still delivering the news”. Paper given at 2012 World Library and Information Congress. Helsinki. August 2012.
  • 45. User survey • CDNC and Cambridge Public Library published a user survey in Mar 2013 (it’s still online) • 604 / 32 responses • surveys are (mostly) identical except for organization name Survey is on home pages at http://cdnc.ucr.edu and http://cambridge.dlconsulting.com/
  • 50. “I enjoy the correction - it’s a great way to learn more about past history and things of interest whilst doing a ‘service to the community’ by correcting text for the benefit of others.” “I have recently retired from IT and thought that I could be of some assistance to the project. It benefits me and other people. It helps with family research.” From Rose Holley in “Many Hands Make Light Work.” National Library of Australia March 2009. Motivation Trove users’ report
  • 51. “I am interested in all kinds of history. I have pursued genealogy as a hobby for many years. I correct text at CDNC because I see it as a constructive way to contribute to a worthwhile project. Because I am interested in history, I enjoy it.” Wesley, California Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 52. “I only correct the text on articles of local interest - nothing at state, national or international level, no advertisements, etc.  The objective is to be able to help researchers to locate local people, places, organizations and events using the on-line search at CDNC.  I correct local news & gossip, personal items, real estate transactions, superior court proceedings, county and local board of supervisors meetings, obituaries, birth notices, marriages, yachting news, etc.” Ann, California Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 53. “I am correcting text for the Coronado Tent City Program for 1903.  It is important to correct any problems with personal names and other information so that researchers will be able to search by keyword and be assured of retrieving desired results. ... type fonts cause a great deal of difficulty in digitizing the text and can cause problems for searchers.  Also, many of the guests' names at Tent City and Hotel Del Coronado were taken from the registration books and reported in the Program.  This led to many problems in spelling of last names and the editors were not careful to be consistent in the spellings.  This Program is an important resource since it provides an excellent picture of daily life in Tent City and captures much of the history of Coronado itself.” Gene, California Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 54. “I have always been interested in history, especially the development of the American West, and nothing brings it alive better than newspapers of the time. I believe them to be an invaluable source of knowledge for us and future generations.” David, United Kingdom Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 55. CDNC is an excellent source of information matching my personal interest in such topics as sea history, development of shipbuilding, clippers and other ships etc. ... Unfortunately, the quality of text ... is rather poor I’m afraid. This is why I started to do all corrections necessary for myself ... and to leave the corrected text for use of others. .... I am not doing this very regularly as this is just my hobby and pleasure. Jerzey, Poland Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 56. As an amateur historical researcher my time for research is very limited.  Making time to travel to archives, libraries, and historical societies does not happen as often as I would like.  The Cambridge Public Library’s online newspaper collection has been an invaluable resource and it is fun.  I am very grateful for all the help I have received over the years from so many research organizations. Correcting text has several benefits.  It makes it much more likely that I will find a story if I decide to search for it in the future.  It is a way of saying ‘thank you’ to the Cambridge Library for having such a great resource available and maybe I can make the next person’s research a little easier. It is my own little historical preservation project. Daniel, Massachusetts Personal communications with Cambridge text correctors. Motivation Cambridge users’ report
  • 57. Graphic from Kaufmann et al. “More than fun and money. Worker Motivation in Crowdsourcing – A Study on Mechanical Turk.” Motivation
  • 59. the long tail* of crowdsourced OCR text correction a probability distribution has a long tail if a larger share of the population rests within its tail than would be the case if the population were normally distributed. the most productive users represent a small fraction of the total user population and complete ~50% of total production. said a different way, less productive users represent the largest fraction of the user population and also complete ~50% of the total production. The phrase “long tail” was popularized by Chris Anderson in the October 2004 Wired magazine article The Long Tail and by Clay Shirky’s February 2003 essay “Power laws, web logs, and inequality”.
  • 60. OCR text correction long tails 0 75000 150000 225000 300000 CDNC lines corrected by text corrector 0 750,000 1,500,000 2,250,000 3,000,000 NLA lines corrected by text corrector top corrector 242,965 top corrector 1,456,906 50% 50% 50% 50% Statistics from Oct 2012
  • 62. Website traffic After a crowdsourcing transcription project of diaries from the American War Between the States, Nicole Saylor, Head of Digital Library Services at the University of Iowa Libraries, reported “On June 9, 2011, we went from about 1000 daily hits to our digital library on a really good day to more than 70,000.” Nicole Saylor interviewed by Trevor Owens. “Crowdsourcing the Civil War: Insights Interview with Nicole Saylor” blog post at http://blogs.loc.gov/digitalpreservation/2011/12/crowdsourcing-the-civil-war-insights-interview-with-nicole-saylor/. Dec 6, 2011.
  • 63. Website traffic Website traffic at CDNC before / after implementing crowdsourcing before crowdsourcing 11-Jun-2011 / 12-Jul-2011 after crowdsourcing 11-Jun-2012 / 12-Jul-2012 change visits 17,485 21,488 +22.9% unique visitors 11,381 13,376 +17.5% visit duration 9m 24s 11m 7s +18.3% bounce rate 51.3% 44.5% -6.8% pages per visit 14.9 11.7 -21.5%
  • 65. Accuracy “His Accuracy Depends on Ours!" Office for Emergency Management. Office of War Information. Domestic Operations Branch. Bureau of Special Services. [Photo held at US National Archives and Records Administration]
  • 66. Why correct OCR text? Here’s why ...
  • 67. Deaths. lln»rieff, Esq. of <c .. Qn. Sunday, the till. greatly Drandrellt, of Orms4irJi.- ~ ; ;✓ ' • * On ijfr r inn l j j j i l F i i j ' 1 1 f H a v o d i v y d , Carnarvonshire, S ; **" *- ' « ' March Oxford, F. Tfovmeud, Uerald. » • V . •On Tncsdav last, Mr. Charles. IWilinson, this 8 ; had vf thesis#,, a week ago, which tcrminate<i'iu his death. . / ' ■ O'i Sunday, dJst nit. at. AsbtCnvHall, mar Lancaster, Mr.,Geo. Worn ick, many years house'steward hit late Once The Hamilton and Brandon. He locked himself h»oWn'r«wte<: soon. twelve o'clock" that dny, and fii»-d a loaded pistol "through Ins bead, 1 which instantaneously killed him. Coronet's Verdict, shot himself in a temporary fit of Friday week, raw OCR text Excerpt from The British Newspaper Archive, Chester Courant, Tuesday 6-Apr-1819, page 3. newspaper image
  • 68. Accuracy • Edwin Kiljin (Koninklijke Bibliotheek the Netherlands) reports raw OCR character accuracies of 68% for early 20th century newspapers • Rose Holley (National Library of Australia) reports raw OCR character accuracy varied from 71% to 98% on a sample Trove digitized newspapers Rose Holley. “How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation programs. D-Lib Magazine. March/April 2009. Edwin Kiljin. “The current state-of-art in newspaper digitization.” D-Lib Magazine. January/February 2008. Public domain graphic courtesy of Wikimedia Commons.
  • 69. Accuracy MAPPING TEXTS* assesses digitization quality of digital newspapers by comparing the number of words recognized to the total number of words scanned * Mapping texts is a collaboration between the University of North Texas and Stanford University aimed at experimenting with new methods for finding and analyzing meaningful patterns embedded in massive collections of digital newspapers.
  • 70. uncorrected OCR accuracy by newspaper title Title OCR character accuracy ~OCR word accuracy* PRP Pacific Rural Press 1871 - 1922 92.6% 68.1% SFC San Francisco Call 1890 - 1913 92.6% 68.1% LAH Los Angeles Herald 1873 - 1910 88.7% 54.9% LH Livermore Herald 1877 - 1899 88.6% 54.6% DAC Daily Alta California 1841 - 1891 88.2% 53.4% CFJ California Farmer and Journal of Useful Sciences 1855 - 1880 86.5% 48.4% SN Sausalito News 1885 - 1922 70.4% 17.3% *Word accuracy assumes average word length is 5 characters
  • 71. OCR accuracy by newspaper title Title OCR character accuracy Corrected accuracy PRP Pacific Rural Press 1871 - 1922 92.6% 99.3% SFC San Francisco Call 1890 - 1913 92.6% 99.6% LAH Los Angeles Herald 1873 - 1910 88.7% 99.1% LH Livermore Herald 1877 - 1899 88.6% 99.9% DAC Daily Alta California 1841 - 1891 88.2% 99.9% CFJ California Farmer and Journal of Useful Sciences 1855 - 1880 86.5% 99.8% SN Sausalito News 1885 - 1922 70.4% 100.0%
  • 72. corrected accuracy by newspaper title Title OCR character accuracy ~OCR word accuracy* Corrected accuracy ~Corrected word accuracy* PRP 1871 - 1922 92.6% 68.1% 99.3% 96.5% SFC 1890 - 1913 92.6% 68.1% 99.6% 98.0% LAH 1873 - 1910 88.7% 54.9% 99.1% 95.6% LH 1877 - 1899 88.6% 54.6% 99.9% 99.5% DAC 1841 - 1891 88.2% 53.4% 99.9% 99.5% CF 1855 - 1880 86.5% 48.4% 98.3% 91.8% SN 1885 - 1922 70.4% 17.3% 100.0% 100.0% *Word accuracy assumes average word length is 5 characters
  • 73. correction accuracy by user User Average OCR accuracy Correction accuracy A 70.4% 100.0% B 87.1% 99.5% C 95.4% 99.5% D 86.5% 98.3% E 95.3% 100.0% F 91.0% 100.0% G 91.0% 99.8% H 90.5% 99.0% I 96.6% 99.8% J 94.8% 100.0% K 86.8% 99.3%
  • 74. How does low text accuracy affect search recall? The Facts • Average uncorrected OCR character accuracy of the CDNC sample data is ~89% • Average length of an English word is 5 characters • Average word accuracy is 89% x 89% x 89% x 89% x 89% = 55.8% - round up to 60% or 6 out of 10 words correct Accuracy
  • 75. ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT Search recall no text correction instances of “ARNDT” found instances of “ARNDT” not found
  • 76. Accuracy The Facts • Average corrected character accuracy of the CDNC sample data is ~99.4% • Average word accuracy of CDNC corrected text is 99.4% x 99.4% x 99.4% x 99.4% x 99.4% = 97.0%
  • 77. ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT ARNDT instances of “ARNDT” found instances of “ARNDT” not found Search recall with text correction
  • 78. A search for “Arndt” at Chronicling America gives 10,267 results* • If Chronicling America text accuracy is 55.8% (same as uncorrected CDNC sample), then 8,133 instances of “Arndt” were not found • If text accuracy is 97.0%, then 317 instances of “Arndt” were not found Accuracy * Search performed 31 Oct 2012
  • 79. Accuracy Suppose the word/name is longer than 5 characters? The Facts • Assume that average uncorrected / corrected OCR character accuracy is ~89% / ~99% same as CDNC. Name Name length Raw text accuracy Corrected text accuracy Eklund 6 49.7% 94.2% Kennedy 7 44.2% 93.25 Espinosa 8 39.4% 92.3% Bonaparte 9 35.0% 91.4% Chatterjee 10 31.2% 90.4%
  • 80. Accuracy Name Number of search results Missing results with raw text accuracy Missing results with corrected text accuracy Eklund 2,951 2,987 182 Kennedy 360,723 455,392 26,111 Espinosa 1,918 2,950 160 Bonaparte 44,664 82,947 4,203 Chatterjee 19 42 2 Chronicling America searches done 19-Mar-2013 (6,025,474 pages from 1836 to 1922).
  • 82. $ Economic benefits Financial value of outsourced OCR text correction for newspapers? The Assumptions • 25 to 50 characters per line in a newspaper column: Assume 40 characters per line (CDNC sample average) • Outsourced text transcription or correction costs USD $0.35 to $1.20 per 1000 characters: Assume $0.50 per 1000 characters
  • 83. $ Economics $ 578,000 lines x 40 characters per line x 1/1000 x $0.50 = $11,560 $ 68,908,757 lines x 40 characters per line x 1/1000 x $0.50 = $1,378,175
  • 84. $ Economics Financial value of in-house OCR text correction? The Assumptions • Correction takes 15 seconds per line • Cost is hourly wage plus benefits of lowest level employee, $10 for CDNC, $41.88* for Australia AUD $40.38 = USD $41.88 is the actual labor value assumed by the National Library of Australia to calculate avoided costs due to crowdsourced OCR text correction in its 2012 Trove Status Report.
  • 85. $ Economics $ 578,000 lines x 15 seconds per line x 1/3600 hrs per second x $10.00 per hr = $24,083 $ 68,908,757 lines x 15 seconds per line x 1/3600 hrs per second x $41.88 per hr = $12,024,578
  • 86. Crowdsourcing other benefits Public domain photo “A useful instruction for young sailors from the Royal Hospital School, Greenwich” from the National Maritime Museum.
  • 87. “when someone transcribes a document, they are actually better fulfilling the mission of a cultural heritage organization than someone who simply stops by to flip through the pages” Other benefit Paraphrased from Trevor Owen’s blog http://www.trevorowens.org/2012/03/ crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013).
  • 88. “in addition to increasing search accuracy or lowering the costs of document transcription, crowdsourcing is the single greatest advancement in getting people using and interacting with library collections” Other benefit Paraphrased from Trevor Owen’s blog http://www.trevorowens.org/2012/03/ crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013).
  • 89. Crowdsourcing considerations • How to market / advertise crowdsourcing? • How to motivate crowdsourcers? • Is authentication / identity of crowdsourcers an issue? • How to administer crowdsourced data? Photo of Aleister Crowley [Public domain] from Wikimedia Commons
  • 90. Conclusions Conclusion of the Sonata for piano #32, opus 111 by Ludwig van Beethoven • Lots of crowdsourcing in cultural heritage organizations and elsewhere • Benefits are multi-faceted: Economic, data accuracy, patron engagement, increased web traffic
  • 91. Resources Public domain photo “A useful instruction for young sailors from the Royal Hospital School, Greenwich” from the National Maritime Museum.
  • 92. Comprehensive worldwide list of online newspaper archives Wikipedia contributors, "List of online newspaper archives," Wikipedia, The Free Encyclopedia, https:// en.wikipedia.org/wiki/Wikipedia:List_of_online_newspaper_archives (accessed March 17, 2013).
  • 93. Search many digital newspaper collections at once! As of June 2013 elephind (http://www.elephind.com) has indexed 1,032 newspapers from 13 historical digital collections comprising 1,097,928 issues and 45,243,443 pages/articles.
  • 94. Correct California newspapers at http://cdnc.ucr.edu Correct Cambridge MA newspapers http://bit.ly/cambridgepublic Correct Australian newspapers http://trove.nla.gov.au Correct Tennessee newspapers http://tndp.lib.utk.edu Correct Virginia newspapers http://virginiachronicle.com Try crowdsourcing!
  • 95. Hãy thử crowdsourcing! Or try Russian language periodicals http://bit.ly/russianperiodicals Correct Vietnamese newspapers http://bit.ly/nationallibraryofvietnam Попробуйте краудсорсинга!
  • 96. Other resources Mapping Texts at http://mappingtexts.stanford.edu/ Wragge Labs at http://wraggelabs.com/ Wikipedia list of crowdsourcing projects https://en.wikipedia.org/wiki/ List_of_crowdsourcing_projects
  • 97. ?Brian Geiger bgeiger@ucr.edu Frederick Zarndt frederick@frederickzarndt.com Photo held by John Oxley Library, State Library of Queensland. Original from Courier-mail, Brisbane, Queensland, Australia.