Some findings from the Preserving Digital Public Television Project as it enters its final month. The project was a collaboration between WNET, WGBH, PBS, and NYU as part of the Library of Congress's National Digital Information Infrastructure and Preservation Program (NDIIPP). This talk was prepared for the Society of Motion Picture and Television Engineers NY Section Meeting, February 24, 2010.
Preserving Digital Public Television: A Status Report
1. PRESERVING
DIGITAL PUBLIC
TELEVISION
Part of the NDIIPP Program of the Library of Congress
A STATUS REPORT
Kara Van Malssen
Senior Research Scholar & Metadata Specialist
New York University
February 24, 2010
SMPTE-NY Section Meeting
2. NDIIPP =
National Digital Information
Infrastructure and Preservation
Program of the Library of Congress
www.digitalpreservation.gov
Image by Smiley Man with a Hat via Flickr http://www.flickr.com/photos/smileymanwithahat/2477365291/
6. by massdistraction via Flickr http://www.flickr.com/photos/sharynmorrow/3718174646/in/set-72157621271414097/
DIGITAL ARCHAEOLOGY?
7. PDPTV GOALS: Design and build an OAIS-compliant preservation repository for born digital public television
8. PDPTV GOALS: Implement and recommend standards for metadata, wrapper and encoding formats, production workflow practices
9. PDPTV GOALS: Recommend selection criteria for long-term retention
10. PDPTV GOALS: Examine and recommend strategies for long term sustainability
11. “An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a designated community.... this distinguishes it from other uses of the term ‘archive.’”
- Reference Model for an Open Archival Information System, ISO 14721:2003
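12. [Diagram: OAIS Functional Model. Entities: PRODUCER, CONSUMER, MANAGEMENT. Functions: PRESERVATION PLANNING, INGEST, DATA MANAGEMENT, ARCHIVAL STORAGE, ACCESS, ADMINISTRATION. Flows: SIP into INGEST, AIP into and out of ARCHIVAL STORAGE, DESCRIPTIVE INFO into and out of DATA MANAGEMENT, queries, result sets, and orders between CONSUMER and ACCESS, DIP to CONSUMER.]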
13. some of the REPOSITORY TECHNOLOGIES
PROJECT SPECIFIC CODE
46. ONE COPY IS NO COPY
by NightRPStar via Flickr http://www.flickr.com/photos/ninjanoodles/153893226/
47. “(rules define how many copies to make, and which locations to put these in, with a typical strategy being 3 copies in 3 geographically separate locations)”
- M. Addis, et al “Sustainable Archiving and Storage Management of Audiovisual Digital Assets” SMPTE Motion Imaging Journal, Nov/Dec 2009
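A minimal sketch of how the "3 copies in 3 geographically separate locations" rule could be checked in code. The `Copy` class, the `meets_rule` function, and the site labels are hypothetical names made up for illustration; they do not come from the project's repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of an asset (hypothetical structure)."""
    asset_id: str
    location: str  # hypothetical site label, e.g. "site-a"


def meets_rule(copies, min_copies=3, min_locations=3):
    """Return True if there are enough copies in enough distinct locations."""
    locations = {c.location for c in copies}
    return len(copies) >= min_copies and len(locations) >= min_locations


# Three copies, but two of them share a site, so the rule is not met.
copies = [
    Copy("ep-001", "site-a"),
    Copy("ep-001", "site-b"),
    Copy("ep-001", "site-b"),
]
print(meets_rule(copies))  # False
```

Counting distinct locations separately from the number of copies is the point of the sketch: three copies on one site still fail the rule.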
48. Photo by quapan via Flickr http://www.flickr.com/photos/hinkelstone/2435823037/
Consider federated storage models
for cost and sustainability reasons
54. by DG Jones via Flickr http://www.flickr.com/photos/dgjones/1225183400/
a few words about
file formats...
55. “Businesses may use different encoding formats for different business processes, but should strive to avoid transcoding wherever possible, because it introduces a generation and thus reduces quality.”
- Peter Thomas “File Formats in Television Archiving and Content Exchange” SMPTE Motion Imaging Journal, Nov/Dec 2009
57. “Benign neglect is the default stewardship, collection policy. Physical world, even more so in the digital world.”
- Cathy Marshall, Senior Researcher, Microsoft Research, Keynote at Code4Lib, February 23, 2010. via Twitter @jschneider
60. THANK
YOU
www.thirteen.org/ptvdigitalarchive
kvm211@nyu.edu
http://www.slideshare.net/kvanmalssen
Editor's notes
Early context for the project was that public television was supposed to be depositing one copy of all programs at LC. That hasn’t been happening.
WNET and WGBH combined produce about 60% of nationally broadcast programming in the US. But they also produce and distribute local programming. PBS is only a distributor, not a content creator. NYU had expertise in building digital libraries (which public broadcasting did not) and an existing preservation repository (PR). LC is the funder, but also a repository.
Why are we doing this?
- Jacques Cousteau story.
We’re not dealing with digitizing things.
An archive in this sense is not a server. It is processes, procedures, people with a mission of preservation, for which it is responsible into the indefinite future.
“The PR is designed as a set of loosely-coupled components communicating over stable interfaces.”
Storage Resource Broker = supports shared collections that can be distributed across multiple organizations and heterogeneous storage systems. Can be used as a Data Grid Management System.
DSpace = open source software, used for the Archival Storage, Data Management, and Dissemination functions of OAIS
These steps are needed just to process any one of those SIPs. These were the basic steps, but they were slightly different for each SIP class because of different metadata, different file formats, different PREMIS.
Processing is the same for both SIP classes (production masters and source files).
The “p” word was never used. It actually made sense to make changes during transition, and no practices were too entrenched yet, but they didn’t want to say it was for archival reasons, just that this was the way it was going to be done.
Settings: Frame Size (1080i), Aspect Ratio (16:9), Frame Rate (29.97), Data Rate (117 Mbps)
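To give a sense of scale, here is a rough storage estimate for that data rate. It is a sketch under stated assumptions: 117 is read as megabits per second covering the whole file, units are decimal (1 GB = 10^9 bytes), and the function name is made up for illustration.

```python
def gigabytes_per_hour(megabits_per_second: float) -> float:
    """Convert a data rate in Mbps to storage in decimal GB per hour."""
    bits_per_hour = megabits_per_second * 1_000_000 * 3600
    bytes_per_hour = bits_per_hour / 8
    return bytes_per_hour / 1_000_000_000


per_hour = gigabytes_per_hour(117)
print(round(per_hour, 2))  # 52.65

# Assumption: three full copies are kept, as in the typical
# "3 copies in 3 locations" strategy.
print(round(per_hour * 3, 2))
```

So one hour of program material at this setting is a little over 50 GB per copy, before any metadata or source files are counted.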
Including technical & descriptive metadata
Consider a grid or distributed but federated system
Combine local storage for the most frequently accessed materials with Grid solutions. Make sure your approach is managed. Take a look at the AVATARm project in the UK for more info.
One word: metadata. You are going to need a lot of it. Video files are not self-describing. Filenames are not good for search and retrieval, and file-level metadata is not searchable.
If there are no cataloging rules for descriptive metadata, everyone will input differently. Combine cataloging rules with controlled vocabularies.
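A minimal sketch of what combining a cataloging rule with a controlled vocabulary can look like at data entry. The field names, the `GENRE_VOCAB` terms, and the `validate_record` function are hypothetical examples, not the project's actual schema or vocabulary.

```python
# Hypothetical controlled vocabulary for a "genre" field.
GENRE_VOCAB = {"documentary", "news", "children", "performance"}


def validate_record(record, vocab=GENRE_VOCAB):
    """Return a list of problems found in one descriptive metadata record."""
    problems = []

    # Cataloging rule (assumed for illustration): every record needs a title.
    title = record.get("title", "").strip()
    if not title:
        problems.append("title is required")

    # Controlled vocabulary: genre must be one of the agreed terms.
    genre = record.get("genre", "").strip().lower()
    if genre not in vocab:
        problems.append(
            f"genre {record.get('genre')!r} is not in the controlled vocabulary"
        )
    return problems


print(validate_record({"title": "Nature", "genre": "Documentary"}))  # []
print(validate_record({"title": "", "genre": "docu"}))
```

Rejecting free-text values like "docu" at input time is what keeps different catalogers from describing the same thing in different ways.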
Because: you won’t have to reinvent the wheel (elements, definitions, vocab), it facilitates exchange, and there is a support community.
A few things about file formats: There is still no standard format for video preservation, especially for born digital, because it is born compressed. The most important thing to do is choose an open, widely supported encoding format that can be used in all systems in your core business processes without transcoding. MXF or QT (FCP) containers.
Preservation does not happen in a vacuum. There must be ongoing commitment, funds, staffing, reviewed and updated policies and procedures, etc.