Many Hands Make Light Work, the American Version

Experiences with User-Text-Correction at California Digital Newspaper Collection (CDNC):
How crowd-sourcing OCR text correction impacts a historic newspaper collection

About the Collection

The California Digital Newspaper Collection contains over 490,000 pages of significant California newspapers published from 1846 to 1922.

The newspapers were digitized to both page- and article-level METS/ALTO data as part of the National Digital Newspaper Program.

The collection is displayed using Veridian digital library software.

[Figure: site statistics between Nov. 2010 and Aug. 2011 (visits per month, minutes per visit, pages per visit)]

Poor OCR reduces search recall to low levels
OCR quality ranges between 50% and 90% word-level accuracy
[Figure: Daily Alta California, 2 January 1850]

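Word-level accuracy of the kind quoted above can be estimated by aligning OCR output against a hand-corrected transcription and counting the reference words that survive intact. The following is a minimal sketch using Python's standard `difflib`; the function name `word_accuracy` and the sample strings are illustrative assumptions, not part of the CDNC or Veridian software.

```python
from difflib import SequenceMatcher


def word_accuracy(ocr_text, reference_text):
    """Fraction of reference words that also appear, in order, in the OCR text."""
    ocr_words = ocr_text.split()
    ref_words = reference_text.split()
    if not ref_words:
        # Nothing to get wrong: treat an empty reference as fully accurate.
        return 1.0
    # Align the two word sequences; autojunk is disabled so that frequent
    # words such as "the" are not ignored on long pages.
    matcher = SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Hypothetical example: one of four words is misread by the OCR engine.
print(word_accuracy("tbe quick brown fox", "the quick brown fox"))  # 0.75
```

At 50% word accuracy, roughly every second search term typed by a user has no matching word in the index, which is why recall suffers.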
Post-OCR text correction is expensive
≈ $0.50 per 1,000 characters, or $5.00 to $10.00 per newspaper page
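To put those rates in perspective, the arithmetic can be worked through for the whole collection. This sketch uses only figures quoted in the slides (the per-character rate, the per-page range, and the 490,000-page count); the function names are hypothetical, and because the collection holds "over" 490,000 pages the totals are a lower bound.

```python
# Figures quoted in the slides.
COST_PER_1000_CHARS = 0.50      # dollars per 1,000 characters
COST_PER_PAGE = (5.00, 10.00)   # dollars per page, low and high
COLLECTION_PAGES = 490_000      # "over 490,000 pages"


def implied_chars_per_page(cost_per_page, cost_per_1000_chars=COST_PER_1000_CHARS):
    """Characters per page implied by a per-page and a per-character rate."""
    return cost_per_page / cost_per_1000_chars * 1000


def estimate_correction_cost(pages, cost_per_page=COST_PER_PAGE):
    """Return (low, high) dollar estimates for correcting `pages` pages."""
    low, high = cost_per_page
    return pages * low, pages * high


low, high = estimate_correction_cost(COLLECTION_PAGES)
print(f"${low:,.0f} to ${high:,.0f}")  # $2,450,000 to $4,900,000
print(implied_chars_per_page(5.00), implied_chars_per_page(10.00))  # 10000.0 20000.0
```

In other words, the quoted rates imply about 10,000 to 20,000 characters per newspaper page, and paid correction of the full collection would cost on the order of $2.5 to $4.9 million.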
The Average CDNC User

Like the users of many digital newspaper collections, patrons of the CDNC visit the site for personal reasons, consider themselves genealogists or family historians, and return to the site frequently.

[Figure: users above 40 years old, users who consider themselves genealogists, users who visit the site at least weekly]

Wikipedia on Crowdsourcing:

“distributed problem-solving and production model”

“sourcing tasks traditionally performed by specific individuals to an undefined large group of people or community (crowd) through an open call”

Crowd-Sourcing Projects

• Project Gutenberg
• Family Search
• The National Library of Australia
• The National Library of Finland
• FreeBMD.org

Site Statistics Since User Text Correction

[Figure: visits per month, minutes per visit, pages per visit]

Lines per month corrected by the top corrector: 30,000
Total lines corrected since 2008: 49 million
Total number of text correctors: 30,000
Lines corrected per month in 2011: 2,000,000+

‘Engaging with users and building virtual communities is just as important to the users as providing the data itself. They want to be part of a community.’
Rose Holley, The National Library of Australia

User Text Correction added to CDNC

Results
August 22 - October 22

[Figure: users who have corrected text, lines corrected by top corrector, total number of lines corrected, and a chart of lines corrected per month]

Goals

• Improve OCR text at low cost
• Improve search precision / recall
• Build user community

Risks?

• User text correction of newspapers is (relatively) new
• Users won’t know what to do, interface is confusing
• Users don’t understand errors in OCR text
• Vandalism of text

Benefits

• Text quality improved
• Cost effective
• Community involvement
• Users empowered

User Reaction

“Great feature (I tested it during the beta) for a great site, which I have used extensively. I plan to use the edit feature when I get back to research in the Los Angeles Herald and the Daily Alta California.”
~Lawrence B.

“I have used the new system and like it. The user correction is great idea.”
~Pat

“Exactly what the system needed!!! Pulled up a couple articles in the beta system and made some text corrections. Went back and tried the old system using the words I corrected and it worked!! Outstanding enhancement!”
~Mary B.

“STUNNINGLY FANTASTIC!!!! is what I think!”
~A fifth generation Californian of multiple Forty-niner families

“The addition of user text correction (UTC) to the California Digital Newspaper Collection has dramatically improved the quality of the computer-generated text and enlivened our relationship with our users. Within a couple of weeks of implementing UTC, and with little publicity, a handful of users had already corrected thousands of lines of text. Many of those users emailed us directly with questions about or praise for the UTC, building direct, personal connections between our staff and users that hadn’t existed before.”
~Brian Geiger, Center for Bibliographic Studies and Research, UC Riverside

?

Brian Geiger, Director, Center for Bibliographic Studies and Research
University of California Riverside
bgeiger@ucr.edu

Frederick Zarndt, Chair, IFLA Newspapers Section
frederick@frederickzarndt.com

Slide deck: Many hands make light work, the american version [charleston library conference 201111]

Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Many Hands Make Light Work, the American Version [Charleston Library Conference 201111]

  • 1. Many Hands Make Light Work, the American Version Experiences with User-Text-Correction at California Digital Newspaper Collection (CDNC): How crowd-sourcing OCR text correction impacts a historic newspaper collection
  • 2. About the Collection: The California Digital Newspaper Collection contains over 490,000 pages of significant California newspapers published from 1846 to 1922. The newspapers were digitized to both page and article level METS/ALTO data as part of the National Digital Newspaper Program. The collection is displayed using Veridian digital library software. (Chart: site statistics between Nov. 2010 and Aug. 2011: visits per month, minutes per visit, pages per visit.)
  • 3. Poor OCR reduces search recall to low levels. OCR quality ranges from 50% to 90% word-level accuracy.
  • 4. Post-OCR text correction is expensive: ≈ $0.50 per 1,000 characters, or $5.00 to $10.00 per newspaper page. (Image: Daily Alta California, 2 January 1850.)
  • 5. The Average CDNC User: Like the users of many digital newspaper collections, patrons of the CDNC visit the site for personal reasons, consider themselves genealogists or family historians, and return to the site frequently. (Chart: users above 40 years old; users who consider themselves genealogists; users who visit the site at least weekly.)
  • 6. Wikipedia on Crowdsourcing: “distributed problem-solving and production model” “sourcing tasks traditionally performed by specific individuals to an undefined large group of people or community (crowd) through an open call”
  • 7. Crowd-Sourcing Projects: Project Gutenberg; Family Search; The National Library of Australia; The National Library of Finland; FreeBMD.org
  • 8. Site Statistics Since User Text Correction: visits per month; minutes per visit; pages per visit
  • 9. ‘Engaging with users and building virtual communities is just as important to the users as providing the data itself. They want to be part of a community.’ Rose Holley, The National Library of Australia. Lines per month corrected by the top corrector: 30,000. Total lines corrected since 2008: 49 million. Total number of text correctors: 30,000. Lines corrected per month in 2011: 2,000,000+.
  • 10. User Text Correction added to CDNC
  • 11. Results, August 22 - October 22: users who have corrected text; lines corrected per month; lines corrected by top corrector; total number of lines corrected
  • 12. Goals: • Improve OCR text at low cost • Improve search precision / recall • Build user community
  • 13. Risks? • User text correction of newspapers is (relatively) new • Users won’t know what to do, interface is confusing • Users don’t understand errors in OCR text • Vandalism of text
  • 14. Benefits: • Text quality improved • Cost effective $ • Community involvement • Users empowered
  • 15. User Reaction: “Great feature (I tested it during the beta) for a great site, which I have used extensively. I plan to use the edit feature when I get back to research in the Los Angeles Herald and the Daily Alta California.” ~Lawrence B. “I have used the new system and like it. The user correction is great idea.” ~Pat “Exactly what the system needed!!! Pulled up a couple articles in the beta system and made some text corrections. Went back and tried the old system using the words I corrected and it worked!! Outstanding enhancement!” ~Mary B. “STUNNINGLY FANTASTIC!!!! is what I think!” ~A fifth generation Californian of multiple Forty-niner families
  • 16. “The addition of user text correction (UTC) to the California Digital Newspaper Collection has dramatically improved the quality of the computer-generated text and enlivened our relationship with our users.  Within a couple of weeks of implementing UTC, and with little publicity, a handful of users had already corrected thousands of lines of text.  Many of those users emailed us directly with questions about or praise for the UTC, building direct, personal connections between our staff and users that hadn’t existed before.” ~Brian Geiger, Center for Bibliographic Research, UC Riverside
  • 17. ? Brian Geiger, Director, Center for Bibliographic Studies and Research, University of California Riverside, bgeiger@ucr.edu. Frederick Zarndt, Chair, IFLA Newspapers Section, frederick@frederickzarndt.com
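
As a rough illustration of the cost figures quoted on slides 2 and 4, the sketch below multiplies the per-page price range by the collection size and backs out the page length those prices imply. It assumes the $5.00 to $10.00 per-page range applies uniformly to all 490,000 pages, which the slides do not state. The constant and function names are invented for this example.

```python
# Figures quoted in the slides. Everything else here is an illustrative assumption.
PAGES_IN_COLLECTION = 490_000         # slide 2 says "over 490,000 pages", so a lower bound
COST_PER_1000_CHARS = 0.50            # USD, slide 4
COST_PER_PAGE_RANGE = (5.00, 10.00)   # USD per newspaper page, slide 4


def implied_chars_per_page(cost_per_page, cost_per_1000_chars=COST_PER_1000_CHARS):
    """Characters per page implied by a per-page price at the per-1000-character rate."""
    return cost_per_page / cost_per_1000_chars * 1000


def collection_cost(cost_per_page, pages=PAGES_IN_COLLECTION):
    """Cost of paid correction for the whole collection at a flat per-page price (assumed)."""
    return cost_per_page * pages


low_cost, high_cost = (collection_cost(c) for c in COST_PER_PAGE_RANGE)
low_chars, high_chars = (implied_chars_per_page(c) for c in COST_PER_PAGE_RANGE)

print(f"Implied page length: {low_chars:,.0f} to {high_chars:,.0f} characters")
print(f"Paid correction of the whole collection: ${low_cost:,.0f} to ${high_cost:,.0f}")
```

Under these assumptions the collection would cost roughly $2.45 million to $4.9 million to correct commercially, which is the scale of expense that user text correction is meant to avoid.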