Invited Demo: Prometheus: Managing the Ingest of Media Carriers

Nicholas del Pozo (ndelpozo@nla.gov.au)
Douglas Elford (delford@nla.gov.au)
David Pearson (dapearso@nla.gov.au)
Digital Preservation
National Library of Australia
Parkes Place, ACT 2600 Australia
ABSTRACT
The National Library of Australia has a relatively small but important collection of digital material stored on common carriers such as floppy disks, CDs and DVDs. This includes both published material and unpublished manuscripts in digital form. In the past, preservation of the Library’s physical format digital collection has been taken care of manually, on a case-by-case basis, but this approach is insufficient to deal effectively with the increasing volume of material requiring preservation.

The Library has produced an application called Prometheus, which provides a semi-automated, scalable process for transferring data from carriers to preservation-managed digital storage. This is helping the Library to mitigate the major risks associated with storing the content on physical carriers: deterioration of the media and obsolescence of the hardware required to access them. Prometheus makes it easier to process the majority of carriers commonly encountered in the Library and to collect and manage metadata about their content. Although not perfect, Prometheus is helping the Library to save digital content before it is too late.

Keywords
Digital preservation, media carriers, National Library of Australia, obsolescence, open source software, Prometheus.

This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported license. You are free to share this work (copy, distribute and transmit) under the following conditions: attribution, non-commercial, and no derivative works. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/.
DigCCurr2009, April 1-3, 2009, Chapel Hill, NC, USA

1. INTRODUCTION
The National Library of Australia has a relatively small but important collection of digital material stored on common carriers such as floppy disks, CDs and DVDs. This includes both published material and unpublished manuscripts in digital form. In the past, preservation of the Library’s physical format digital collection has been taken care of manually, on a case-by-case basis, but this approach is insufficient to deal effectively with the increasing volume of material requiring preservation.

The Library collects digital material through multiple acquisition streams and generally has little control over the physical format in which the material arrives. So, while most items fall into a small number of widely used carrier types, any long-term solution has to make provision for almost any kind of carrier, including carrier types which may not have been encountered yet. Moreover, this is a constantly growing problem: if we don’t deal with the digital materials that we have already collected, and ideally process new digital materials as a part of the acquisition process, accessing these carriers will soon become unmanageable, and eventually impossible.

Factors such as obsolescence and carrier degradation already make it difficult for digital preservation solutions to preserve access to digital content. Additionally, due to the potential volume and diversity of carriers and file formats, unless solutions are robust and semi-automated, digital data that can currently be preserved may not be. To avoid exacerbating the problem, it is essential that solutions deal with current common carrier types as efficiently as possible, while providing access to, or a mechanism for preserving, as many older carriers as is practical.

2. PROMETHEUS
To ensure access to digital content on the most common carriers within the Library, the Digital Preservation Workflow Project produced an application called Prometheus. This application provides a semi-automated, scalable process for transferring data from carriers to preservation-managed digital storage. This is helping the Library to mitigate the major risks associated with storing the content on physical carriers: deterioration of the media and obsolescence of the technology required to access them. Prometheus makes it easier to process the majority of carriers commonly encountered in the Library and to collect and manage metadata about their content. It also provides mechanisms to accommodate special cases, such as less common media types. Additionally, the original physical arrangement of a group of media can be recorded, even in those cases where a piece of physical media cannot be processed.

Prometheus allows Library staff to link to catalogue records, create a byte-level image of the digital content, and transfer it to preservation-managed digital storage. Once the content is copied from the carrier, the integrity of the image is verified, and as much metadata as possible is harvested. Attaching a customisable ‘mini-jukebox’ (Figure 1) to a staff member’s workstation allows the accurate duplication of the content from a wider range of carrier types, such as USB thumb drives, memory cards or 3½ inch floppy disks. It also provides more reliable hardware for imaging CDs and DVDs. The digital preservation section can use Prometheus to deal with carrier types that fall outside this range, such as 5¼ inch floppy disks, SyQuest disks or hard drives.
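The copy-then-verify step described above can be illustrated with a minimal sketch. It is not Prometheus code: the function names are hypothetical, SHA-256 is an assumed checksum algorithm (the paper does not say which one is used), and in-memory streams stand in for a real carrier device and image file.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # bytes per read; an arbitrary choice for this sketch


def copy_with_digest(source: BinaryIO, target: BinaryIO) -> str:
    """Copy every byte from source to target; return the SHA-256 of the bytes read."""
    digest = hashlib.sha256()  # assumption: the paper does not name an algorithm
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            break
        target.write(chunk)
        digest.update(chunk)
    return digest.hexdigest()


def image_is_intact(expected_digest: str, image: BinaryIO) -> bool:
    """Re-read the stored image and compare its digest with the one taken at copy time."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: image.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest() == expected_digest


if __name__ == "__main__":
    carrier = io.BytesIO(b"example carrier content" * 1000)  # stand-in for a device
    image = io.BytesIO()                                      # stand-in for the image file
    recorded = copy_with_digest(carrier, image)
    image.seek(0)
    print("image intact:", image_is_intact(recorded, image))
```

Computing the digest while copying, and again when re-reading the stored image, means a mismatch points to a problem in storage or transfer rather than in the original read.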
Figure 1. Library developer Snezana Mihajlovic uses a
customised ‘mini-jukebox’ attached to a standard Library
workstation (Photo: Douglas Elford, National Library).
The system incorporates a range of open source tools to undertake
processing, including carrier imaging (dd [1], cdrdao [2]);
integrity calculation and checking (Jacksum [3]); file identification
(DROID [4]); and metadata extraction (JHOVE [5], NLNZ
Metadata Extraction Tool [6]). These tools are deployed using
Java-based web services. Moreover, Prometheus has been
designed in a modular way, so that tools and services can be
easily upgraded or replaced as new versions are released or better
software becomes available (Figure 2).
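The modular idea can be sketched as follows. This is only an illustration of swapping one tool for another without touching the rest of the workflow. Prometheus itself deploys its tools as Java-based web services, and the class, role names and stub functions below are hypothetical, not part of the project.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A step takes the path of a carrier image and returns a dictionary of results.
Step = Callable[[str], Dict[str, str]]


@dataclass
class Pipeline:
    """Holds one step per role; the role names are illustrative only."""
    steps: Dict[str, Step] = field(default_factory=dict)

    def register(self, role: str, step: Step) -> None:
        # Re-registering an existing role replaces the tool but keeps its
        # position, because dicts preserve the original insertion order.
        self.steps[role] = step

    def run(self, image_path: str) -> Dict[str, Dict[str, str]]:
        # Run every registered step, in registration order, on the same image.
        return {role: step(image_path) for role, step in self.steps.items()}


def identify_stub_v1(image_path: str) -> Dict[str, str]:
    # Hypothetical stand-in for a file identification tool.
    return {"tool": "identifier-v1", "image": image_path}


def identify_stub_v2(image_path: str) -> Dict[str, str]:
    # Hypothetical stand-in for a newer release of the same tool.
    return {"tool": "identifier-v2", "image": image_path}


if __name__ == "__main__":
    pipeline = Pipeline()
    pipeline.register("identification", identify_stub_v1)
    pipeline.register("identification", identify_stub_v2)  # upgrade in place
    print(pipeline.run("carrier.img"))
```

Because each role is addressed by name, upgrading a tool is a single call to `register`, and the steps around it are unaffected.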
Figure 2. General Process View.

3. THE SOFTWARE RELEASED
Prometheus was designed for the Library’s specific environment, and therefore is not an ‘out of the box’ solution. However, it may be possible for other parties to use all or some of the requirements, other documentation or components. As such, the software has been released under the GNU General Public License V3.0. The latest version of Prometheus and its documentation are available from the project website [7]. A paper was presented on this project at the IFLA World Library and Information Congress in Quebec City, Canada, in August 2008 [8].

If we wait for the perfect system to be built, for the content on many carriers it will already be too late. Experience to date suggests that even though we all share the same fundamental problem, the sheer volume and diversity of carriers, as well as varying individual collecting and business environments, make it unlikely that there will ever be a single software solution that can be used by everyone. At least for the Library, Prometheus provides a starting point to manage the ingest of, and preserve content from, problematic and sometimes idiosyncratic carriers for long-term preservation, hopefully in a way that can benefit others.

This paper is based on an earlier paper that appeared in Gateways, Dec 2008 [9].

4. ACKNOWLEDGMENTS
Our thanks to Gerard Clifton, Snezana Mihajlovic and Joseph Mok, who worked with us on version 1.0 of Prometheus, and who continue with the development work for version 1.4.

5. REFERENCES
[1] dd for Windows, at http://www.chrysocome.net/dd
[2] cdrdao, at http://cdrdao.sourceforge.net/
[3] Jacksum Java checksum utility, at http://sourceforge.net/projects/jacksum/
[4] DROID automatic file format identification tool, at http://droid.sourceforge.net/wiki/index.php/Introduction
[5] JHOVE object validation environment, at http://hul.harvard.edu/jhove/
[6] National Library of New Zealand Metadata Extraction Tool, at http://meta-extractor.sourceforge.net/
[7] Prometheus Sourceforge Website, at http://prometheus-digi.sourceforge.net/
[8] Elford, D., del Pozo, N., Mihajlovic, S., Pearson, D., Clifton, G. and Webb, C. 2008. Media Matters: developing processes for preserving digital objects on physical carriers at the National Library of Australia. In World Library and Information Congress: 74th IFLA General Conference and Council, 10-14 August 2008, Québec, Canada. www.ifla.org/IV/ifla74/papers/084-Webb-en.pdf.
[9] Pearson, D. 2008. Titans in the Library: Prometheus Unbinds At-risk Data. In Gateways Dec 2008. http://www.nla.gov.au/pub/gateways/issues/96/story02.htm.