Nevada Digital Newspaper Project
Dana Bullinger (Project Coordinator) and Melissa Stoner (Project Technician)
PHASE ONE
Title Selection
● Advisory Board selects qualified titles
○ Research Value
○ Geographic Representation
○ Temporal Coverage
○ Diversity
NDNP Title Guidelines
●Complete (or majority of) title run should be available
on microfilm without restrictions
●Technical factors to consider:
○ Quality of original text and microfilm capture
○ Reduction ratio (the lower the better; below 20x)
○ Duplicated camera master negative microfilm should have resolution
test patterns readable at 5.0 or higher
○ Density variations of no more than 0.2 within images and between exposures
○ Confidence level through OCR testing of sample page images
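The numeric factors above can be read as simple pass/fail thresholds. A minimal sketch, assuming a hypothetical `ReelReadings` record whose field names are illustrative and do not come from the NDNP guidelines:

```python
from dataclasses import dataclass


@dataclass
class ReelReadings:
    """Hypothetical container for measurements taken from one microfilm reel."""
    reduction_ratio: float    # e.g. 18.0 means 18x
    resolution: float         # resolution test pattern reading
    density_variation: float  # largest variation within images / between exposures


def guideline_problems(reel: ReelReadings) -> list[str]:
    """Return the slide's numeric thresholds that this reel misses."""
    problems = []
    if reel.reduction_ratio >= 20:
        problems.append("reduction ratio is not below 20x")
    if reel.resolution < 5.0:
        problems.append("resolution test pattern reads below 5.0")
    if reel.density_variation > 0.2:
        problems.append("variation exceeds 0.2")
    return problems
```

An empty list means the reel clears all three thresholds. Text quality and OCR confidence still need a human look.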
Deliverables
For Each Title
•Up-to-date MARC record from the
CONSER OCLC database
•Additional title-level metadata (Reel-Level
Metadata spreadsheet example)
•Newspaper History Essay - 500 words per
title
For each issue
•Structural metadata for issues digitized and
organized by date (Page-Level Metadata
spreadsheet example)
Deliverables
For each newspaper page
- Page image in two formats
- Grayscale, scanned between 300-
400 dpi, uncompressed TIFF 6.0
image file
- Same image, compressed as
JPEG2000 (.JP2)
- OCR text using the ALTO schema
(1 file per page)
- PDF image with Hidden Text
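The four per-page deliverables lend themselves to a quick completeness check. A rough sketch, assuming each page's files share one stem (for example `0001`) and use the extensions `.tif`, `.jp2`, `.xml`, and `.pdf`; the naming scheme is an assumption, not something stated on the slide:

```python
from pathlib import Path

# Assumed extensions for the four per-page deliverables named on the slide:
# master TIFF, JPEG2000 derivative, ALTO OCR file, PDF with hidden text.
PAGE_EXTENSIONS = (".tif", ".jp2", ".xml", ".pdf")


def missing_deliverables(issue_dir: Path, page_stem: str) -> list[str]:
    """List the expected per-page files that are absent from issue_dir."""
    return [
        page_stem + ext
        for ext in PAGE_EXTENSIONS
        if not (issue_dir / (page_stem + ext)).is_file()
    ]
```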
PHASE TWO
Selected Titles
● Research Library of
Congress Control Numbers
(LCCNs) and OCLC numbers
for all titles
● Accurate LCCNs critical for
data management
● Fill in spreadsheet
● Send to LC for approval
Before Duplication Begins...
●Set up purchase order with selected
digitization vendor (iArchives)
●Research and order microfilm reader
●Send work plan to NEH
●Order 10 1-TB Hard Drives for our
deliverables
Microfilm Reader and Software
•14MP Image Sensor
•Light Source
•File Output
•Lens with 7x to 105x
magnification
Sample Batch
● Sample batch allows Library of Congress to
identify any potential problems and ensures
technical specifications are being implemented
● Tonopah Daily Bonanza (1901-1903)
● Negative and Positive Reels duplicated by
NSLA and sent to UNLV
● Apply LC-provided barcodes on Negative Reel
boxes
○ Barcode connects digital content to physical
reel deposited at LC
MasterFile
●Document everything in the MasterFile and Reel-Level
Spreadsheet
○ Title, Year, LCCN, Barcode/Reel Number, Unique name for iArchives,
metadata received from NSLA
Collation: Reel-Level
UNLV                               NSLA
Unique Name                        Title
LCCN                               Source Repository
Reel-Number                        Density Readings
Location of Publication            Reduction Ratio
Start/End date                     Average Density
Digital Responsible Institution
Collation: Page-Level
● Use template
● One page-level spreadsheet = one reel
● Page count
● Anomalies
- Missing issues or pages
- Duplicate issues or pages
- Mutilated pages
- Other abnormalities (e.g. pages out of
order, incorrect dates)
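Some of these anomalies can be flagged from the dates already typed into the page-level spreadsheet. A minimal sketch, assuming the issue dates are listed in the order they appear on the reel; detecting missing issues would also need the title's publication schedule, which is not covered here:

```python
from collections import Counter
from datetime import date


def date_anomalies(issue_dates: list[date]) -> dict[str, list[date]]:
    """Flag duplicate and out-of-order issue dates, given in reel order."""
    counts = Counter(issue_dates)
    duplicates = sorted(d for d, n in counts.items() if n > 1)
    # A date is out of order when it is earlier than the one filmed just before it.
    out_of_order = [
        current
        for previous, current in zip(issue_dates, issue_dates[1:])
        if current < previous
    ]
    return {"duplicates": duplicates, "out_of_order": out_of_order}
```

The flagged dates are only candidates: each one still has to be checked against the film before it is noted on the collation sheet.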
Quality Review: Before Delivery to Vendor
● Re-visit collation sheet and reel
metadata line-by-line
● Confirm for accuracy
● Check delivered page count against
● Check all notation for standardization
and clarity
● Metadata properly formatted
iArchives
● iArchives Portal
○ Upload Reel-Level and Page-Level metadata as a
.CSV file
● Ship Negative reels and blank hard
drive to be digitized
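Exporting the reel-level spreadsheet to CSV can be scripted. A minimal sketch using Python's standard `csv` module; the column names are loosely based on the MasterFile fields and are assumptions, since the headers the vendor portal expects are not given here:

```python
import csv
import io

# Assumed column names; the portal's required headers are not in the source.
REEL_COLUMNS = ["unique_name", "title", "year", "lccn", "reel_number"]


def reel_rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize reel-level rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REEL_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

`DictWriter` raises `ValueError` if a row carries a key outside `REEL_COLUMNS`, which catches stray spreadsheet columns before upload.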
Scanning Specifications
● Scan from clean second-
generation duplicate silver
negative microfilm (to be
deposited at the Library of
Congress at the end of the award
period)
● Capture specifications are 8-bit
grayscale, between 300 and 400
dpi
● Target film strip should be
scanned at the start of each
session
● Provide the master page images,
delivered to LC, as uncompressed
images in TIFF 6.0 format
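The capture specification reduces to a few properties that can be checked per image. A small sketch, assuming the values have already been read from the TIFF header by some other tool and that the 300 to 400 dpi range is inclusive; `ScanProperties` is a hypothetical record, not part of any NDNP tooling:

```python
from dataclasses import dataclass


@dataclass
class ScanProperties:
    """Hypothetical summary of values read from one TIFF header."""
    bits_per_sample: int
    samples_per_pixel: int  # 1 for grayscale
    dpi: int
    compressed: bool


def meets_capture_spec(scan: ScanProperties) -> bool:
    """True when the scan is 8-bit grayscale, 300-400 dpi, and uncompressed."""
    return (
        scan.bits_per_sample == 8
        and scan.samples_per_pixel == 1
        and 300 <= scan.dpi <= 400
        and not scan.compressed
    )
```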
PHASE THREE
Back to UNLV
●Receive hard
drive
●Batch Structure
Quality Review
- Quality Review process ensures that NDNP Specifications are met
by checking for image quality, irregularities, and correct
bibliographic software
- Digital Viewer and Validator
(DVV)
- Allows awardees and
vendors to view data and
validate technical aspects of
files
- Verification checks digital
signatures of all files in a batch
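The slides do not describe how the DVV performs this check. As a generic illustration of the idea only (not the DVV's implementation), each file's SHA-256 digest can be compared against a previously recorded value; the `manifest` mapping of relative path to hex digest is a hypothetical format:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large TIFFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(batch_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest differs."""
    failures = []
    for relative_path, expected in manifest.items():
        target = batch_dir / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```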
Quality Review
● Verify Batch
● Double check dates using Calendar View
in DVV, cross reference with Reel-Level
and Page-Level data
● View thumbnails
● Check OCR (10% of pages)
● Verify Batch with DVV for a second time
● Email Tonijala Penn (LC Liaison) and Deb
Thomas (Project Coordinator for NDNP)
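Choosing which pages make up the 10% OCR check can be left to a random draw so the review is not biased toward the first issues on a reel. A minimal sketch; the page identifiers and the round-up rule are assumptions:

```python
import random
from typing import Optional


def ocr_sample(page_ids: list[str], percent: int = 10,
               seed: Optional[int] = None) -> list[str]:
    """Pick a random subset of pages (rounded up) for manual OCR review."""
    if not page_ids:
        return []
    # Integer ceiling of len * percent / 100, with at least one page.
    sample_size = max(1, (len(page_ids) * percent + 99) // 100)
    return sorted(random.Random(seed).sample(page_ids, sample_size))
```

Passing a fixed `seed` makes the draw repeatable, which helps if the same pages need to be rechecked after a batch is redelivered.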
Library of Congress
● Ship to LC
○ Hard Drive
○ Shipping Manifest
○ Use fluorescent stickers!
● Receives and processes batch
● 6-8 weeks turnaround time
● If accepted, batch is ingested
into Chronicling America
Totals to date

Digitizing Nevada Newspapers: Workflow


Editor's Notes

  1. M
  2. M
  3. M
  4. M In addition to the master TIFF image file and OCR text using the ALTO schema, the awardee institution will provide a searchable PDF (Portable Document Format) Image with Hidden Text for each page image and a JPEG2000 compressed image file (.JP2). PDFs will provide an image of the original page that can be conveniently printed and downloaded, supporting within-page searching for words, external to the NDNP search system. LC will use the separate OCR output file as the basis for search in its access interface. The PDF Image with Hidden Text can be created at the time of processing by the OCR application.
  5. M
  6. D
  7. D
  8. D
  9. D
  10. D
  11. M
  12. M
  13. M
  14. M
  15. M Newspapers microfilmed two sheets per frame should be split into two separate image files (and assigned appropriate metadata). To improve appearance and OCR accuracy, images that contain text blocks exhibiting more than 3 degrees of skew should be deskewed. Page image files should be cropped to the page edge (not to the text block boundaries), retaining the actual edge and up to ¼ inch beyond. In general, the goal of the NDNP cropping specification is to produce as complete a page image as possible in order to best enable long-term management and access needs into the future.
  16. D
  17. D Verify twice: once when the batch is received and again before it is shipped to LC.
  18. D
  19. D
  20. D