This presentation outlines the Nevada Digital Newspaper Project workflow during the first year of the project. It highlights the stages in the process and focuses on the main activities needed to complete each stage.
3. Title Selection
● Advisory Board selects qualified titles
○ Research Value
○ Geographic Representation
○ Temporal Coverage
○ Diversity
4. NDNP Title Guidelines
● Complete title run (or the majority of it) should be available
on microfilm without restrictions
● Technical factors to consider:
○ Quality of original text and microfilm capture
○ Reduction ratio (the lower the better, below 20x)
○ Camera master negative microfilm to be duplicated should have
resolution test patterns readable at 5.0 or higher
○ Density variations of no more than 0.2 within images and between exposures
○ Confidence level through OCR testing of sample page images
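The technical factors above can be checked against reel readings before a title is proposed. The following is a minimal sketch, not part of the NDNP tooling: the `ReelReadings` class and `film_issues` function are hypothetical names, and it assumes the 0.2 limit refers to the spread of density readings on a reel.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReelReadings:
    """Hypothetical container for readings taken from one reel."""
    reduction_ratio: float      # e.g. 18 means 18x
    resolution_pattern: float   # highest readable test pattern
    densities: List[float]      # density readings across the reel


def film_issues(reel: ReelReadings) -> List[str]:
    """Return the slide-4 criteria that this reel does not meet."""
    issues = []
    if reel.reduction_ratio >= 20:
        issues.append("reduction ratio is not below 20x")
    if reel.resolution_pattern < 5.0:
        issues.append("resolution test pattern below 5.0")
    # Assumption: "variation" is the spread between the highest and lowest reading.
    if reel.densities and round(max(reel.densities) - min(reel.densities), 2) > 0.2:
        issues.append("density variation exceeds 0.2")
    return issues
```

An empty list means the reel passes these three checks; quality of the original text and OCR confidence still need a human review.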
5. Deliverables
For Each Title
•Up-to-date MARC record from the
CONSER OCLC database
•Additional title-level metadata (Reel-Level
Metadata spreadsheet example)
•Newspaper History Essay - 500 words per
title
For each issue
•Structural metadata for issues digitized and
organized by date (Page-Level Metadata
spreadsheet example)
6. Deliverables
For each newspaper page
- Page image in two formats
- Grayscale, scanned at 300-400 dpi,
uncompressed TIFF 6.0
image file
- Same image, compressed as
JPEG2000 (.JP2)
- OCR text using the ALTO schema
(1 file per page)
- PDF image with Hidden Text
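Since every page needs four files, a quick completeness check can catch gaps before validation. This is a sketch under stated assumptions: the file extensions (`.tif`, `.jp2`, `.xml` for ALTO, `.pdf`) and the idea that the four files of a page share a base name are assumptions for illustration, and `missing_deliverables` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed extensions for the four per-page deliverables.
REQUIRED = {".tif", ".jp2", ".xml", ".pdf"}


def missing_deliverables(filenames):
    """Map each page base name to the required extensions it lacks."""
    found = defaultdict(set)
    for name in filenames:
        path = PurePosixPath(name)
        found[path.stem].add(path.suffix.lower())
    return {
        stem: sorted(REQUIRED - extensions)
        for stem, extensions in found.items()
        if REQUIRED - extensions
    }
```

For example, a page that has only a TIFF and a JP2 would be reported as missing `.pdf` and `.xml`.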
8. Selected Titles
● Research Library of
Congress Control Numbers
(LCCNs) and OCLC numbers
for all titles
● Accurate LCCNs critical for
data management
● Fill in spreadsheet
● Send to LC for approval
9. Before Duplication Begins...
● Set up purchase order with selected
digitization vendor (iArchives)
● Research and order microfilm reader
● Send work plan to NEH
● Order ten 1 TB hard drives for our
deliverables
10. Microfilm Reader and Software
•14MP Image Sensor
•Light Source
•File Output
•Lens with 7x to 105x
magnification
11. Sample Batch
● Sample batch allows Library of Congress to
identify any potential problems and ensures
technical specifications are being implemented
● Tonopah Daily Bonanza (1901-1903)
● Negative and Positive Reels duplicated by
NSLA and sent to UNLV
● Apply LC-provided barcodes on Negative Reel
boxes
○ Barcode connects digital content to physical
reel deposited at LC
12. MasterFile
●Document everything in the MasterFile and Reel-Level
Spreadsheet
○ Title, Year, LCCN, Barcode/Reel Number, Unique name for iArchives,
metadata received from NSLA
13. Collation: Reel-Level
● UNLV
○ Unique Name
○ LCCN
○ Reel Number
○ Location of Publication
○ Start/End Date
○ Digital Responsible Institution
● NSLA
○ Title
○ Source Repository
○ Density Readings
○ Reduction Ratio
○ Average Density
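Reel-level metadata ends up in a spreadsheet that is later uploaded as a .CSV file, so it can help to generate that file from structured rows and reject incomplete ones. The sketch below is illustrative only: the column names are adapted from the reel-level fields on this slide, and the exact headers expected by the vendor template are an assumption.

```python
import csv
import io

# Column names adapted from the reel-level slide; the real template may differ.
REEL_FIELDS = [
    "Unique Name", "Title", "LCCN", "Source Repository", "Reel Number",
    "Location of Publication", "Start Date", "End Date",
    "Reduction Ratio", "Average Density", "Digital Responsible Institution",
]


def reel_rows_to_csv(rows):
    """Serialize reel-level rows to CSV text, refusing rows with blank fields."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=REEL_FIELDS)
    writer.writeheader()
    for row in rows:
        blank = [f for f in REEL_FIELDS if not str(row.get(f, "")).strip()]
        if blank:
            raise ValueError(f"missing reel metadata: {blank}")
        writer.writerow(row)
    return buffer.getvalue()
```

Failing early on a blank LCCN or reel number is a deliberate choice here, since accurate LCCNs are critical for data management.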
14. Collation: Page-Level
● Use template
● One page-level spreadsheet = one reel
● Page count
● Anomalies
- Missing issues or pages
- Duplicate issues or pages
- Mutilated pages
- Other abnormalities (e.g. pages out of
order, incorrect dates)
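Some of the anomalies listed above can be flagged automatically from the issue dates recorded during collation. This is a rough sketch, assuming a title that published every day; for titles that skipped Sundays or holidays, the "missing" dates are only candidates to check against the film. The function name `collation_anomalies` is hypothetical.

```python
from collections import Counter
from datetime import timedelta


def collation_anomalies(issue_dates):
    """Flag duplicate, missing, and out-of-order dates in film order."""
    if not issue_dates:
        return {"duplicates": [], "missing": [], "out_of_order": []}
    counts = Counter(issue_dates)
    duplicates = sorted(d for d, n in counts.items() if n > 1)
    # Walk every calendar day between the first and last issue on the reel.
    missing = []
    day, last = min(counts), max(counts)
    while day <= last:
        if day not in counts:
            missing.append(day)
        day += timedelta(days=1)
    # An issue is out of order when it is dated earlier than the one before it.
    out_of_order = [b for a, b in zip(issue_dates, issue_dates[1:]) if b < a]
    return {"duplicates": duplicates, "missing": missing, "out_of_order": out_of_order}
```

Mutilated pages and incorrect printed dates still have to be noted by eye while viewing the reel.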
15. Quality Review: Before Delivery to Vendor
● Re-visit collation sheet and reel
metadata line-by-line
● Confirm for accuracy
● Check delivered page count against
● Check all notation for standardization
and clarity
● Metadata properly formatted
16. iArchives
● iArchives Portal
○ Upload Reel-Level and Page-Level spreadsheets as
.CSV files
● Ship negative reels for digitization, along with
a blank hard drive
17. Scanning Specifications
● Scan from clean second-
generation duplicate silver
negative microfilm (to be
deposited at the Library of
Congress at the end of the award
period)
● Capture specifications are 8-bit
grayscale, between 300 and 400
dpi
● Target film strip should be
scanned at the start of each
session
● Provide the master page images,
delivered to LC, as uncompressed
images in TIFF 6.0 format
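The capture specifications above translate into a few simple checks on each master image. The sketch below assumes the relevant values have already been read from the TIFF header by some other tool; `ScanInfo` and `scan_spec_problems` are hypothetical names, not part of any NDNP software.

```python
from dataclasses import dataclass


@dataclass
class ScanInfo:
    """Hypothetical summary of one master image's header values."""
    bits_per_sample: int
    samples_per_pixel: int  # 1 means a single grayscale channel
    dpi: int
    compressed: bool


def scan_spec_problems(info: ScanInfo):
    """List the ways a master image departs from the scanning specifications."""
    problems = []
    if info.bits_per_sample != 8 or info.samples_per_pixel != 1:
        problems.append("not 8-bit grayscale")
    if not 300 <= info.dpi <= 400:
        problems.append("resolution outside 300-400 dpi")
    if info.compressed:
        problems.append("master TIFF is compressed")
    return problems
```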
20. Quality Review
- Quality Review process ensures that NDNP Specifications are met
by checking for image quality, irregularities, and correct
bibliographic metadata
- Digital Viewer and Validator
(DVV)
- Allows awardees and
vendors to view data and
validate technical aspects of
files
- Verification checks digital
signatures of all files in a batch
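The idea behind checking file signatures can be illustrated with ordinary checksums. This is only a sketch of the general technique, not a description of how the DVV works internally: the SHA-256 algorithm, the `manifest` dictionary of expected digests, and the function names are all assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256"):
    """Hash a file in chunks so large TIFFs do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_files(root, manifest):
    """Return relative paths that are absent or whose digest differs."""
    failures = []
    for relative_path, expected in manifest.items():
        path = Path(root) / relative_path
        if not path.is_file() or file_digest(path) != expected:
            failures.append(relative_path)
    return sorted(failures)
```

A file that changed in transit, or never arrived on the hard drive, shows up in the returned list.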
21. Quality Review
● Verify Batch
● Double check dates using Calendar View
in DVV, cross reference with Reel-Level
and Page-Level data
● View thumbnails
● Check OCR (10% of pages)
● Verify Batch with DVV for a second time
● Email Tonijala Penn (LC Liaison) and Deb
Thomas (Project Coordinator for NDNP)
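Picking the 10% of pages for the OCR check can be done with a random sample so the review is not biased toward the start of a reel. A minimal sketch, assuming pages are identified by any list of IDs; `sample_pages` is a hypothetical helper, and the optional seed only makes the selection repeatable.

```python
import random


def sample_pages(page_ids, percent=10, seed=None):
    """Pick roughly `percent` of the pages at random, at least one page."""
    if not page_ids:
        return []
    # Integer ceiling of len * percent / 100, avoiding float rounding.
    size = max(1, (len(page_ids) * percent + 99) // 100)
    rng = random.Random(seed)
    return sorted(rng.sample(list(page_ids), size))
```

For a reel of 250 pages this selects 25 pages to review.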
22. Library of Congress
● Ship to LC
○ Hard Drive
○ Shipping Manifest
○ Use fluorescent stickers!
● LC receives and processes batch
● 6-8 week turnaround time
● If accepted, batch is ingested
into Chronicling America
In addition to the master TIFF image file and OCR text using the ALTO schema, the awardee institution will provide a searchable PDF (Portable Document Format) Image with Hidden Text for each page image and a JPEG2000 compressed image file (.JP2).
PDFs will provide an image of the original page that can be conveniently printed and downloaded, supporting within-page searching for words, external to the NDNP search system. LC will use the separate OCR output file as the basis for search in its access interface. The PDF Image with Hidden Text can be created at the time of processing by the OCR application.
Newspapers microfilmed two sheets per frame should be split into two separate image files (and assigned appropriate metadata). To improve appearance and OCR accuracy, images that contain text blocks exhibiting more than 3 degrees of skew should be deskewed. Page image files should be cropped to the page edge (not to the text block boundaries), retaining the actual edge and up to ¼ inch beyond.
In general, the goal of the NDNP cropping specification is to produce as complete a page image as possible in order to best enable long-term management and access needs into the future.
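The deskew threshold and the ¼ inch crop margin reduce to simple arithmetic once the scan resolution is known. The sketch below assumes the page edge has already been located as a pixel box `(left, top, right, bottom)`; the function names and the box convention are assumptions for illustration.

```python
def needs_deskew(skew_degrees, threshold=3.0):
    """Deskew only when the skew is more than 3 degrees in either direction."""
    return abs(skew_degrees) > threshold


def crop_box(page_box, dpi, image_size, margin_inches=0.25):
    """Expand the page-edge box by up to 1/4 inch, clamped to the image bounds."""
    left, top, right, bottom = page_box
    width, height = image_size
    margin = round(margin_inches * dpi)  # 0.25 in at 400 dpi is 100 pixels
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )
```

At 400 dpi, a page edge detected at `(500, 400, 5500, 7400)` in a 6000 x 8000 image would be cropped to `(400, 300, 5600, 7500)`.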
Verify the batch twice: once when it is received and again before it is shipped to LC.