This presentation discusses the development of Archival Information Packages (AIPs) for NASA HDF-EOS data. It outlines the components of an AIP according to the OAIS reference model. It then describes efforts to implement AIPs using standards like METS, PREMIS, ISO 19115 and HDF-5 to package HDF data files with associated metadata. The goals were to prototype AIPs at the data set and granule level and test the usability of digital library standards for geospatial data. Next steps involve further work applying these standards to create preservation-ready packages of NASA's HDF data holdings.
2. Outline
• What is an Archival Information Package?
HDF-AIP
• Standards? What Standards?
METS
DIF/FGDC/ISO 19115-2
PREMIS
• Results
• Next Steps
Archival Information Packages for NASA HDF-EOS Data, presented 11/4/09 by R. Duerr HDF
and HDF-EOS Workshop XIII
3. OAIS Reference Model¹
Archive Information Package
¹ Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Blue Book, January 2002.
4. Archival Information Package Contents
• Content Information
The data object to be preserved
Information that describes the data object
o Typically interpreted as the syntax and semantics of the file
structure
• Preservation Description Information
Provenance –
Origin or source of the data, any changes that have taken place since,
and who has had custody of it
Fixity – the authentication mechanisms (with keys) needed to ensure that the data
object has not been altered in an undocumented manner
Reference – identification mechanisms and values
Context – relation of the object to its environment
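As a rough illustration of the Fixity component, the sketch below records a checksum for a data file and later re-checks it. The function names and the choice of SHA-256 are assumptions made for this example; the slides do not say which fixity mechanism the project used.

```python
import hashlib


def compute_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    Chunked reads keep memory use flat even for very large data files.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_digest, algorithm="sha256"):
    """Return True if the file still matches the digest stored in the AIP."""
    return compute_digest(path, algorithm) == recorded_digest.lower()
```

In an AIP, the recorded digest and the name of the algorithm would be stored alongside the data object so that a later audit can repeat the check.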
5. HDF-Archive Information Packages
• The HDF Group was funded to investigate and propose a design for a complete archival information package for HDF data files
• The result was a METS metadata file to accompany the HDF data file
http://www.hdfgroup.org/projects/hdf5_aip/hdf5_aip_wp.html
6. Metadata Standards - METS
• Metadata Encoding and Transmission Standard
• An initiative of the Digital Library Federation
• Provides the means to convey the metadata
necessary for
management of digital objects within a repository
exchange of objects between repositories (or between
repositories and their users)
• Designed to facilitate
shared development of information management
tools/services
interoperable exchange of digital materials
7. METS - A very brief overview
• Describes the METS document itself, e.g., creator or editor
• Describes the object using some external standard, e.g., MARC, FGDC, Dublin Core
• Describes object creation, storage, intellectual property rights, source info, provenance, etc., e.g., PREMIS
• Provides an inventory of all of the files that are part of the object described
• A physical or logical map of the organization of the materials described
• Allows specification of hyperlinks between parts of the map (mostly useful when preserving websites)
• Used to associate executable code with parts of the content
8. Metadata Standards - Descriptive Metadata
• Discovery, Assessment and Access Metadata
GCMD DIF
FGDC CSDGM (derived from DIF)
ISO 19115
9. Metadata Standards - ISO 19115:2003
• The international equivalent of the FGDC standard
• Most fields can be mapped or generated from
FGDC metadata
• The exception is the Dataset Topic Keywords
• Allows for national profiles
10. Metadata Standards - ISO 19115:2003
11. Is there a metadata standard for AIP information?
Archive Information Package
12. Preservation Metadata Implementation Strategies (PREMIS)
• Provide a core preservation metadata set with broad
applicability across the digital preservation
community
• Developed by an OCLC- and RLG-sponsored international working group
Representatives from libraries, museums, archives,
government, and the private sector.
• Maintained by the Library of Congress
• Based on the OAIS reference model
13. PREMIS - Entity-Relationship Diagram
Intellectual Entities
“a coherent set of content that is reasonably described as a unit”
For example, a web site, data set or collection of data sets
Objects
“a discrete unit of information in digital form”
For example, a data file
Events
“an action that involves at least one object or agent known to the preservation repository”
e.g., created, archived, migrated, donated
Agents
“a person, organization, or software program associated with preservation events in the life of an object”
e.g., Dr. Spock
Rights
“assertions of one or more rights or permissions pertaining to an object or an agent”
e.g., copyright notice, legal statute, deposit agreement
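The five PREMIS entities and the links between them can be sketched as plain records. This is only an illustrative model: the class and field names below are hypothetical and are not the element names of the PREMIS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    identifier: str
    name: str
    agent_type: str  # "person", "organization", or "software"


@dataclass
class Event:
    identifier: str
    event_type: str  # e.g., "creation", "migration"
    date_time: str   # assumed to be a uniform UTC ISO 8601 string
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class RightsStatement:
    identifier: str
    basis: str  # e.g., "copyright notice", "deposit agreement"


@dataclass
class DigitalObject:
    identifier: str
    file_name: str
    events: List[Event] = field(default_factory=list)
    rights: List[RightsStatement] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    identifier: str
    title: str
    objects: List[DigitalObject] = field(default_factory=list)


def provenance(obj: DigitalObject) -> List[Event]:
    """Return an object's events oldest first.

    Sorting the strings is only correct because every timestamp is
    assumed to use the same UTC ISO 8601 layout.
    """
    return sorted(obj.events, key=lambda event: event.date_time)
```

Here a data set would be an `IntellectualEntity`, each data file a `DigitalObject`, and the ordered event list is one simple way to answer the provenance question of what has happened to a file since its creation.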
14. Is there a metadata standard for AIP information?
PREMIS
ISO 19115
15. NOAA Data Stewardship Prototype
• NSIDC and THG demonstrated the feasibility of
migrating NASA data to a standard HDF-AIP
format
• Motivation:
Technologies change regularly, organizations come and go, but data must survive
But preserving data takes more than just preserving the bits; all the components of an AIP are critical
16. Project Goals
• Prototype development of Archive Information
Packages for HDF data:
For entire data sets
For individual “granules”
• Test usability of digital library standards with
geospatial data
17. Program Plan (Modified)
[Figure: program plan flow diagram. Labels: NSIDC/ECS HDF4 data; H4toH5; NetCDF4/HDF5 data; CDM/NetCDF4; NSIDC/ECS metadata; ECS to METS (Data Set); ECS to METS (Granule); ISO 19115; METS; HDF5-AIP]
18. HDF5 Granule Level Archive Information Packages
Data file: HDF5
Metadata file: METS
Primary Schema                 Extension Schema
|<mets>
|---<dmdSec>----------------<ISO 19115>
|---<amdSec>--------------|--<techMD>
|                         |--<rightsMD>
|                         |--<sourceMD>
|----<fileGrp>
|----<structMap>
PREMIS
HDF5 AIP Components
http://www.hdfgroup.uiuc.edu/papers/papers/AIP/HDF5_AIP_White_Paper.pdf
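The layout above can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal skeleton under stated assumptions, not the project's prototype software: the IDs are hypothetical, the `mdWrap` elements are left empty (so the output is not schema-valid METS), and marking the administrative sections as PREMIS and the descriptive section as `OTHER`/`ISO19115` is an illustrative choice.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(data_file_name):
    """Build an empty METS wrapper that points at one HDF5 data file."""
    root = ET.Element(q("mets"))

    # Descriptive metadata: ISO 19115 content would be wrapped here.
    dmd = ET.SubElement(root, q("dmdSec"), ID="DMD1")
    ET.SubElement(dmd, q("mdWrap"), MDTYPE="OTHER", OTHERMDTYPE="ISO19115")

    # Administrative metadata: assumed here to carry PREMIS content.
    amd = ET.SubElement(root, q("amdSec"), ID="AMD1")
    for section in ("techMD", "rightsMD", "sourceMD"):
        sec = ET.SubElement(amd, q(section), ID=f"{section}1")
        ET.SubElement(sec, q("mdWrap"), MDTYPE="PREMIS")

    # File inventory: a single file entry locating the data file.
    file_sec = ET.SubElement(root, q("fileSec"))
    group = ET.SubElement(file_sec, q("fileGrp"))
    entry = ET.SubElement(group, q("file"), ID="FILE1")
    ET.SubElement(
        entry,
        q("FLocat"),
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": data_file_name},
    )

    # Structural map: one division pointing back at the file entry.
    struct_map = ET.SubElement(root, q("structMap"))
    div = ET.SubElement(struct_map, q("div"))
    ET.SubElement(div, q("fptr"), FILEID="FILE1")
    return root
```

Calling `ET.tostring(build_mets_skeleton("granule.h5"), encoding="unicode")` serializes the skeleton; a real package would fill each `mdWrap` with the corresponding ISO 19115 or PREMIS XML.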
19. File Level AIP Activity Status
• Developed a map from NSIDC/ECS metadata to
METS/PREMIS/ISO 19115 components
• Prototype software completed
• Issues
What goes in PREMIS vs ISO 19115?
Auxiliary file handling - own AIP or not?
o E.g., browse files, processing history, PGEs
Granules vs files
20. Issues and Questions
• Inconsistent use of terminology between standards
– for example, what is a data set?
• Many of the standards care about distribution
formats
Are these even relevant concepts any more?
Do you really want to have to update the metadata record
just because a new distribution format was added?
What about new access services?
21. Next Steps
• NSIDC is updating the metadata handling of its non-ECS data systems, including support for PREMIS and related metadata on all holdings
• Work underway to upgrade granule level metadata
for NSIDC flagship sea ice products
(PREMIS/METS/ISO AIP packages)
• Work to improve the archivability of data stored in HDF formats is ongoing; NASA is implementing a standard XML description of contents across its archives
22. Acknowledgement
This work was supported under NOAA Scientific
Stewardship Program grant number
NA07OAR4310286. Any opinions, findings,
and conclusions or recommendations
expressed in this material are those of the
author(s) and do not necessarily reflect the
views of NOAA.
Editor's Notes
Lots of background material that I won’t really discuss – indicated
Syntax
- XFDU
- DFDL
- ESML
Semantics?
A couple of interesting and useful things about METS:
It is deliberately designed to handle objects at a wide variety of scales (single files, complex web sites)
Rather than attempting to define descriptive and administrative metadata needs for all kinds of objects, they designed the standard to incorporate a variety of other standards (e.g., FGDC for geospatial metadata)
When you talk to a geoscientist or data scientist who deals with geospatial data – these are the standards they know and care about
GCMD – because it is the oldest, is internationally accepted; NASA/NOAA/NSF require it for data set descriptions; because the Global Change Master Directory is the data equivalent of WorldCat
FGDC – Content Standard for Digital Geospatial Metadata; derived from DIF; mandated for all federally funded data by Executive Order
ISO 19115 – Most recent standard – replacing FGDC – adopted by NOAA and likely NASA
But more than just descriptive metadata is needed
It is equally important to know what has happened to the data since its creation, to know its provenance
The PREMIS entity<->relationship diagram
Representation - “the set of files needed for a complete and reasonable rendition of an Intellectual Entity”
File
Bitstream - “contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes”
So how does this apply to science data?
Keeping track of events in the digital library world for a few years
Noticed that they’ve come up with standards to deal with a wide variety of information types
NOAA and USGS were to be the ultimate home of much of NASA’s EOS data
THG with funding ultimately from National Archives and Records Administration had written a white paper defining an HDF-AIP using a digital library standard
A standard called METS
Primary Schema Extension Schema
National Digital Geospatial Archive - LOC NDIIPP (National Digital Information Infrastructure and Preservation Program)
Recommendation by Nancy Hoebelheinrich of Stanford
Different data sets are different - some data sets have 1 file per granule; others have many; some data sets have a browse for each granule; in others the mapping is 1 to many; many to 1, or many to many
In ISO 19115 parlance, a dataset is an “identifiable collection of data,” where a dataset may reside in a larger dataset, can be as small as a single feature, and could even be a single map or chart (see ISO 19115:2003(E) page 3). This is in contrast to a data series which is a “collection of datasets sharing the same product specification” where the phrase “product specification” is totally undefined.
In NASA, NOAA, and NSF parlance a data set is the collection of all of the files for a particular project, from a particular instrument, etc. preferentially that are all of the same type.
A data set is composed of data files or data granules.
In HDF parlance, a Science Data Set is the unit within a file that contains a particular data array.