A preponderance of data from NASA's Earth Observing System (EOS) is archived in the HDF Version 4 (HDF4) format. The long-term preservation of these data is critical for climate and other scientific studies going many decades into the future. HDF4 is very effective for working with the large and complex collection of EOS data products. Unfortunately, because of the complex internal byte layout of HDF4 files, future readability of HDF4 data depends on preserving a complex software library that can interpret that layout. Having a way to access HDF4 data independent of a library could improve its viability as an archive format, and consequently give confidence that HDF4 data will be readily accessible forever, even if the HDF4 library is gone.
To simplify long-term access to EOS data stored in HDF4, a collaborative project between The HDF Group and NASA Earth Science Data Centers is implementing an approach based on independent maps that describe the data in HDF4 files, together with tools that use these maps to recover data from those files. With this approach, relatively simple programs will be able to extract the data from an HDF4 file, bypassing the need for the HDF4 library.
A demonstration project has shown that this approach is feasible. This involved an assessment of NASA's HDF4 data holdings, and development of a prototype XML-based layout mapping language and tools to read layout maps and to read HDF4 files using those maps. Future plans call for a second phase of the project, in which the mapping tools and XML schema are made production quality, the mapping schema is integrated with existing XML metadata files in several data centers, and outreach activities are carried out to encourage and facilitate acceptance of the technology.
Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps
1. The HDF Group
Ensuring Long Term Access to
Remotely Sensed HDF4 Data
with Layout Maps
Ruth Duerr, NSIDC
Christopher Lynnes, GES DISC
Mike Folk, Kent Yang, Peter Cao, The HDF Group
November 3-5,
2009
HDF/HDF-EOS Workshop XIII
1 www.hdfgroup.org
2. HDF4 files are complex
3. How do we save HDF users
from having to deal with all of
the complexity under the
hood?
4. Through the HDF software
libraries, either by using the
HDF APIs directly or by using
HDF tools that depend on the
HDF libraries.
But what about the future…
5. There is a risk in depending solely
on HDF libraries to access HDF-formatted data over the long term.
It is possible, especially in the
distant future, that the libraries may
not be available.
6. “If only we could read HDF data with an
independent program that does not rely on
the HDF API…
A possible approach [would be to create] a
map of a data file, [and] utilities to find,
assemble and write out SDSes and vdatas.”
“Leveraging HDF Utilities”
Christopher Lynnes
HDF Workshop X.
7. User’s view of the HDF4 SD model
8. Mapping SDS to file offset/length
HDF4 file
layout
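A minimal sketch of what this slide implies for a reader: if a map records where an SDS's values start in the file and how many bytes they occupy, plain file I/O is enough to recover them. The function names, the 16-bit big-endian type, and the stand-in file below are assumptions for illustration only; a real reader would take the number type and byte order from the map.

```python
import os
import struct
import tempfile


def read_extent(path, offset, nbytes):
    """Return the nbytes raw bytes that start at byte `offset` in the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(nbytes)
    if len(raw) != nbytes:
        raise ValueError("file ends before the mapped extent does")
    return raw


def decode_int16_be(raw):
    """Decode raw bytes as big-endian signed 16-bit integers (assumed type)."""
    return list(struct.unpack(">%dh" % (len(raw) // 2), raw))


def demo():
    # Stand-in for an HDF4 file: 10 bytes of unrelated content, then the array.
    payload = struct.pack(">4h", 1, -2, 3, 4)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\x00" * 10 + payload)
        path = f.name
    try:
        # The (offset, length) pair is what a map entry would supply.
        return decode_int16_be(read_extent(path, offset=10, nbytes=8))
    finally:
        os.remove(path)
```

Calling `demo()` reads the four values back using only `seek` and `read`, with no format library involved.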
9. Mapping with chunks
HDF4 file
layout
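With chunked storage, the values of one array are spread over several extents, so a map would need to give each chunk's file location and its position in the array. The sketch below assumes a hypothetical map entry of the form (offset, length, chunk index), uncompressed chunks, row-major order within each chunk, array dimensions that are exact multiples of the chunk dimensions, and 16-bit big-endian values. None of these details come from the project's schema.

```python
import os
import struct
import tempfile


def assemble_chunks(path, chunks, shape, chunk_shape):
    """Rebuild a 2-D array from chunk extents.

    chunks: list of (offset, nbytes, (chunk_row, chunk_col)) tuples.
    """
    rows, cols = shape
    crows, ccols = chunk_shape
    out = [[0] * cols for _ in range(rows)]
    with open(path, "rb") as f:
        for offset, nbytes, (ci, cj) in chunks:
            f.seek(offset)
            raw = f.read(nbytes)
            values = struct.unpack(">%dh" % (crows * ccols), raw)
            # Copy the chunk into its place in the full array.
            for r in range(crows):
                for c in range(ccols):
                    out[ci * crows + r][cj * ccols + c] = values[r * ccols + c]
    return out


def demo():
    full = [[r * 4 + c for c in range(4)] for r in range(4)]
    order = [(1, 1), (0, 0), (1, 0), (0, 1)]  # deliberately not array order
    chunks, blob = [], b""
    for ci, cj in order:
        blob += b"\xee" * 3  # unrelated bytes between chunks
        vals = [full[ci * 2 + r][cj * 2 + c] for r in range(2) for c in range(2)]
        packed = struct.pack(">4h", *vals)
        chunks.append((len(blob), len(packed), (ci, cj)))
        blob += packed
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(blob)
        path = f.name
    try:
        return assemble_chunks(path, chunks, (4, 4), (2, 2))
    finally:
        os.remove(path)
```

The demo stores the chunks out of order with filler bytes between them, which is the situation a chunk map has to describe. The reader still recovers the original 4 x 4 array.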
10. Recap
• Problem
• The complex byte layout of HDF files makes
long-term readability of HDF data dependent
on long-term availability of HDF software.
• Solution
• Create a map of the layout of data objects in
an HDF file, allowing a simple reader to be
written to access the data.
11. The HDF Group
The project – phase 1
2007-2008
12. HDF4 mapping project activities
1. Assess and categorize HDF4 data held by NASA
• Determine what types of objects to map.
• Get an idea of the magnitude of the project.
2. Develop prototype for proof of concept
• Develop markup-language based layout
specification.
• Develop tool to produce layout for an HDF4 file.
• Develop and test two independent tools to read
HDF4 data based solely on the map files
13. How many HDF4 products?
Data Center    HDF4 Products
ASF                  0
GES-DISC           236
GHRC                54
ASDC                63
LP-DAAC             67
NSIDC               47
ORNL-DAAC            2
PO.DAAC             22
SDAC                 0
MrDC                95
Total              586
14. Data characteristics
Product Characteristics Examined
• Product Identification
  • Product Name
  • Data Level
  • Archive Location
  • Product Version
  • Whether the product was multi-file
  • HDF Version
• For HDF-EOS products
  • HDF-EOS version
  • For point data
    • Number of point data sets
    • Maximum number of levels
  • For swath data
    • Number of swaths
    • Maximum number of dimensions
    • Organized by time, space, both, or other
    • Whether dimension maps were used
  • For gridded data
    • Number of grids
    • Max number of dimensions in a grid
    • Number of projections used
    • Whether any grids were indexed
• For SDS data
  • Number of SDSs
  • Maximum number of dimensions
  • Did any SDS have attributes
  • Was any SDS annotated
  • Were dimension scales used
  • Was compression used and if so what kind
  • Was chunking used
• For Vdata
  • Number of Vdata structures
  • Did any Vdata have attributes
  • Did any Vdata fields have attributes
  • Was compression used and if so what kind
  • Was chunking used
• For raster data
  • Number of 8-bit rasters
  • Number of 24-bit rasters
  • Number of general rasters
  • Whether any rasters had attributes
  • Whether any rasters were compressed
  • Whether any rasters were chunked
  • Whether there were any palettes
15. HDF4 mapping prototype workflow
[Workflow diagram]
HDF4 File “H4.hdf”
hmap (linked with HDF4 library)
HDF4 Mapping File (XML document) “H4.hdf.map.xml”: Groups, Data Objects, Structural and Application Metadata; Locations of Object Data
Reader 1 (C program), Reader 2 (Perl Script)
Object Data
16. The HDF Group
Phase 2: 2009-2011
Productizing HDF4
Mapping schema and
tools for deployment
17. Phase 2 tasks
• Revise schema
  • Investigate integration of mapping schema with existing standards
  • Analyze what’s needed to include HDF-EOS 2
  • Revise the XML schema
• Implement production quality HDF4 map writer
• Develop demo HDF4 map reader
• Deploy
• Optional tasks
  • Implement general purpose reader
  • Develop validation utilities
18. How you can help
• Project page at The HDF Group website:
• http://www.hdfgroup.org/projects/hdf4mapping/
• Consider what it might take to implement this
for your archive - contact us if you’d like
support
• Let us know if you are interested in
participating in any capacity.
19. The HDF Group
Thank You!
20. Acknowledgements
This work was supported by cooperative agreement
number NNX08AO77A from the National
Aeronautics and Space Administration (NASA).
Any opinions, findings, conclusions, or
recommendations expressed in this material are
those of the author[s] and do not necessarily reflect
the views of the National Aeronautics and Space
Administration.
Full quote, from proposal:
Through the HDF software libraries, either by using the HDF APIs directly or by using HDF tools that depend on the HDF libraries.
However, there is a risk in depending solely on the HDF libraries to access HDF-formatted data over the long term.
It is possible, especially in the distant future, that the libraries may not be as readily available as they are today. To address this risk, it is desirable to have a way to retrieve the data independently.
At the 10th HDF workshop, Christopher Lynnes of the Goddard Earth Sciences Data and Information Services Center (GES DISC) addressed this need: “If only we could read HDF data with an independent program that does not rely on the HDF API… A possible approach [would be to] extend hdfls to print a hierarchical map of a data file, [and] write ncdump/hdp-like utilities to find, assemble and write out SDSes and vdatas.”
“Leveraging HDF Utilities,” Christopher Lynnes, 10th HDF Workshop. http://www.hdfeos.org/workshops/ws10/presentations/day3/Leveraging_HDF_Utilities.ppt.
An XML-based prototype schema for HDF4 mapping files (XML documents) was created. For a given binary HDF4 file, an associated mapping file contains structural and application metadata for the HDF4 file, as well as the locations of the object data (array element values) in the HDF4 file.
A tool was written to generate mapping files.
Other tools were developed that use the mapping files to read HDF4 files without calling the HDF4 library, confirming the approach is viable.
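To make the idea concrete, here is a small sketch of a reader that takes an XML map plus the binary file and returns array values without any HDF4 library. The element and attribute names (`map`, `dataset`, `block`, `offset`, `nbytes`, `byteOrder`) are invented for this example and are not the project's prototype schema. The sketch also assumes uncompressed 16-bit integer data.

```python
import os
import struct
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical map: one dataset whose values sit in two separate extents.
EXAMPLE_MAP = """<map file="H4.hdf">
  <dataset name="temperature" type="int16" byteOrder="bigEndian">
    <block offset="16" nbytes="4"/>
    <block offset="32" nbytes="4"/>
  </dataset>
</map>"""


def read_dataset(map_xml, data_path, name):
    """Read the named dataset's values using only the map and file I/O."""
    root = ET.fromstring(map_xml)
    for ds in root.iter("dataset"):
        if ds.get("name") == name:
            break
    else:
        raise KeyError(name)
    raw = b""
    with open(data_path, "rb") as f:
        # Concatenate the extents in the order the map lists them.
        for block in ds.iter("block"):
            f.seek(int(block.get("offset")))
            raw += f.read(int(block.get("nbytes")))
    prefix = ">" if ds.get("byteOrder") == "bigEndian" else "<"
    return list(struct.unpack("%s%dh" % (prefix, len(raw) // 2), raw))


def demo():
    # Stand-in binary file laid out to match EXAMPLE_MAP.
    blob = (b"\x00" * 16 + struct.pack(">2h", 10, 20)
            + b"\x00" * 12 + struct.pack(">2h", 30, 40))
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(blob)
        path = f.name
    try:
        return read_dataset(EXAMPLE_MAP, path, "temperature")
    finally:
        os.remove(path)
```

The reader depends only on an XML parser and byte-level file access, both of which are likely to remain available in most programming environments. That is the property the mapping approach relies on.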
While the focus of this effort was NASA EOSDIS data stored in HDF4 files, the general methodology is also relevant to other cases where the long-term accessibility of data stored in binary files is of concern.
In addition, this work demonstrates how binary HDF files can be used to efficiently store large volumes of scientific data that is referenced by text-based XML documents (the mapping files).