This document summarizes HDF software activities in 2002, including support and funding sources for HDF, recent and upcoming releases of HDF4 and HDF5 libraries and tools, and other HDF-related projects. The HDF5 library saw improvements to performance, compilers supported, and tools like HDFView and converters. The next major HDF5 release in 2003 will focus on new features, performance enhancements, and special platform support. High level APIs and the parallel HDF5 programming model were also under development.
HDF Update Highlights Developments and Activities
1. HDF Update
Mike Folk
National Center for Supercomputing Applications
HDF and HDF-EOS Workshop VI
December 4-5, 2002
-1-
HDF
2. Topics
• Who is supporting HDF
• HDF software in 2002
• Other activities of interest
3. Who is supporting HDF?
• NASA/ESDIS
– Earth science applications, instrument data
• DOE/ASCI (Accelerated Strategic Computing Initiative)
– Simulations on massively parallel machines
• NCSA/NSF/State of Illinois
– HPC and Grid data intensive apps, Visualization, user support
– Atmospheric and ocean modeling environments
• DOE Scientific Data Analysis & Computation Program
– High performance I/O R & D
• National Archives and Records Administration
– Small grant to consider HDF5 as an archive format
4. HDF software in 2002
• Library releases
• Java Products
• Tools
• Compression
• Investigations of Web technologies
5. HDF4 library
• No releases in 2002.
• Release 1.6 planned for May, 2003
– Bug fixes
– New compilers
• Intel
• Portland Group
– New OS
• Mac OS X
• AIX 5.1 64-bit
6. HDF5 software milestones in 2002
[Timeline chart: 2002 milestones by quarter (Q1 2002 through Q4 2002) in four rows: base library, high level library, Java products, other tools. Labels recoverable from the chart: HDF5 1.4.3, HDF5 1.4.4, HDF5 1.4.5, High level APIs, Java products 1.0, 1.1 and 1.2, H4-H5 conversion library, H5import.]
7. HDF5 library in 2002
• Compilers, configuration, etc.
– “h5cc” script to simplify compilation of HDF5 programs
– F90 shared library and C++ supported on Windows
– Intel C, F90 and C++ on Linux, IA32/64 and Windows
– Support for zlib 1.1.4
• Performance
– Added library performance tests
– Performance improvements
• hyperslabs, data conversions, chunking
– Fewer and larger I/O requests when accessing a file
– Parallel I/O performance improvements
8. Parallel HDF5
• Parallel I/O performance benchmark suite
– Compares raw I/O, MPI-I/O, and HDF5 I/O
– Distributed with HDF5
– http://hdf/RFC/PIO_Perf/PHDF5_performance.html
• Parallel HDF5 tutorial
– http://hdf.ncsa.uiuc.edu/HDF5/doc/Tutor/
• “Flexible parallel HDF5” programming model
– More flexible model for parallel HDF5
• Performance studies and tuning activities
9. Next major release -- HDF5 1.6
• Release date: Spring 2003
• New format and library features include
– Compression enhancements, including szip
– Generic Properties
– Checksum
– Dimension scale support (tentative)
• Performance improvements include
– Chunking & compression
– Parallel I/O performance benchmark suite
10. Next major release -- HDF5 1.6
• Flexible parallel HDF5
• Special platforms
– Large Compaq cluster (Pittsburgh SC)
– Crays
– Windows XP
– Mac
– Several new compilers (e.g. Intel, Portland Group)
• Documentation
– New User’s Guide (good draft, first version)
11. High level APIs
• Make HDF5 easier to use
– More operations per call than the normal HDF5 API
• Encourage standard ways to store objects
– Enforce standard representation of objects in HDF5
12. High level APIs
• Lite – done
– Same as HDF5, but simpler
• Image – done
– Interprets dataset as image/palette
– 2-D raster data like HDF4 raster images
• Table – partly done
– Interprets dataset as “tables” – collections of records
– Insert, delete records or fields
– Future: sort and search
• Dimension scale – in the works
• Unstructured grids – in the works
• http://hdf.ncsa.uiuc.edu/HDF5/hdf5_hl/doc/
14. HDF Java Products – 2002
• Goal: replace older tools with single viewer/editor
• HDF Java Products
– Java HDF Interface (JHI) – to access the HDF4 library.
– Java HDF5 Interface (JHI5) – to access the HDF5 library.
– New hdf-object package – understands HDF4 and HDF5.
– HDFView – tool for browsing/editing HDF4 and HDF5
• See demo, brochure, CD, web page
– http://hdf.ncsa.uiuc.edu/hdf-java-html/
15. HDFView releases in 2002
• Q2: Version 1.0
– Browser for both HDF4 and HDF5
• Q3: Version 1.1
– Editor for both HDF4 and HDF5
• Q4: Version 1.2
– All features of old Java tools. Some new features.
HDFView can do as much as JHV and H5View and also includes many new editing features
http://hdf.ncsa.uiuc.edu/hdf-java-html/hdfview/
16. H4toH5 Conversion Toolkit
• Goal: support transition from HDF4 to HDF5
• Version 1.0 released in July 2002
• Includes
– h4toh5 converter
– h5toh4 converter
– library of functions for converting HDF4 objects into HDF5 objects
• Download from:
– http://hdf.ncsa.uiuc.edu/h4toh5/libh4toh5.html
• Mapping specification and FAQ
– http://hdf.ncsa.uiuc.edu/HDF5/doc/ADGuide/H4toH5Mapping.pdf
17. Other tools work
• H5import - convert flat files to HDF5 datasets
– ASCII text file with numeric data (float or integer)
– Binary file with native floating point data
– Binary file with native integer data
• hdf4import – souped up version of the old fptohdf
– Available in hdf4r1.6
• HDF5-to-GIF and GIF-to-HDF5 converters
• H5dump improvements
– Subsetting
– Support variable length datatypes including strings
18. Other tools work
• H5diff
– Compare the structure and contents of two HDF5 files, and report differences
– Command line utility like Unix ‘diff’ and older ‘hdiff’
– Report missing objects, inconsistent size, datatype, etc.
– Compare values of numeric datasets
– First beta available January 2003
– RFC: http://hdf.ncsa.uiuc.edu/RFC/H5diff/h5diff.html
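The kind of report described on this slide (missing objects, inconsistent size or datatype, differing values) can be sketched on in-memory stand-ins. This is not the h5diff implementation or its output format. The dictionary layout, the function name diff_files, and the sample objects are all assumptions made for illustration, and no HDF5 file is read.

```python
def diff_files(a, b):
    """Compare two hypothetical {path: {"dtype", "shape", "data"}} mappings.

    Returns a list of human-readable difference lines, one per object.
    """
    report = []
    for path in sorted(set(a) | set(b)):
        if path not in a:
            report.append(f"{path}: missing in first file")
            continue
        if path not in b:
            report.append(f"{path}: missing in second file")
            continue
        obj_a, obj_b = a[path], b[path]
        if obj_a["dtype"] != obj_b["dtype"]:
            report.append(f"{path}: datatype differs ({obj_a['dtype']} vs {obj_b['dtype']})")
            continue
        if obj_a["shape"] != obj_b["shape"]:
            report.append(f"{path}: size differs ({obj_a['shape']} vs {obj_b['shape']})")
            continue
        # Only compare values once structure and datatype agree.
        changed = sum(1 for x, y in zip(obj_a["data"], obj_b["data"]) if x != y)
        if changed:
            report.append(f"{path}: {changed} value(s) differ")
    return report


# Made-up stand-ins for the contents of two files.
first = {
    "/temp": {"dtype": "float32", "shape": (3,), "data": [1.0, 2.0, 3.0]},
    "/flags": {"dtype": "int8", "shape": (2,), "data": [0, 1]},
}
second = {
    "/temp": {"dtype": "float32", "shape": (3,), "data": [1.0, 2.5, 3.0]},
    "/extra": {"dtype": "int8", "shape": (1,), "data": [7]},
}

report = diff_files(first, second)
for line in report:
    print(line)
```

Checking structure before values keeps the report short: an object with a different datatype or size is reported once rather than element by element.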
19. Compression
• Szip - fast compression method for EOS data
– Expect to include in next releases of HDF4 and HDF5
• Shuffling – reorder bytes before compressing
– Can improve compression ratio
• Performance study – BZIP2 vs gzip compression
– Study: whether or not to support bzip2 compression
– Result: BZIP2 not significantly better than gzip
– So not currently supported in the release
– But BZIP2 can be used with HDF5
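The shuffling idea on this slide can be illustrated with a small standalone sketch. It is not the HDF5 shuffle filter itself: it only regroups the bytes of fixed-size elements (all first bytes, then all second bytes, and so on) and compares zlib output sizes with and without that step. The function names and the sample data are assumptions for illustration.

```python
import struct
import zlib


def shuffle(data: bytes, elem_size: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    if len(data) % elem_size != 0:
        raise ValueError("data length must be a multiple of elem_size")
    return b"".join(data[i::elem_size] for i in range(elem_size))


def unshuffle(data: bytes, elem_size: int) -> bytes:
    """Invert shuffle(): interleave the byte planes back into elements."""
    if len(data) % elem_size != 0:
        raise ValueError("data length must be a multiple of elem_size")
    count = len(data) // elem_size
    out = bytearray(len(data))
    for i in range(elem_size):
        out[i::elem_size] = data[i * count:(i + 1) * count]
    return bytes(out)


# Made-up sample: slowly increasing 32-bit little-endian integers.
values = list(range(100000, 110000))
raw = struct.pack("<%di" % len(values), *values)
shuffled = shuffle(raw, 4)

plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffled))
print("deflate only:      ", plain_size, "bytes")
print("shuffle + deflate: ", shuffled_size, "bytes")
```

For data like this, the high-order bytes of neighboring values are nearly constant, so grouping them produces long runs that the compressor handles well. How much shuffling helps depends on the data.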
21. HDF5 XML
• Great interest in XML, interoperation of XML and binary formats
• Results
– HDF5 DTD
– h5dump –XML
– H5View reads XML and writes HDF5
• Studies, design notes, other info
– http://hdf.ncsa.uiuc.edu/HDF5/XML/
• Possible future activity:
– XML schema
– Update tools
– HDF4 schema, tools
– Format translation via XSLT
22. XML, Java Server Pages, etc.
• How to use HDF5 data in Web environment
• Experiments with XML, Java Server Pages (JSP), etc.
– JSP server
• Access HDF5 files on Web server using Web browser, or Java applet, or Java application
– Several variations demonstrated
– Is not a product!
• http://hdf.ncsa.uiuc.edu/HDF5/XML/
23. CORBA Experiments
• HDF5 with CORBA on distributed systems
– Prototype CORBA server to wrap HDF5 library and datasets (C++)
– Remote access via C++, Java, Web
– Might be valuable as replacement for Java Native Interface
– Successful demonstration, but many open issues
– Is not a product!
• http://hdf.ncsa.uiuc.edu/HDF5/XML/JSPExperiments/index.html
25. NPOESS
• National Polar-orbiting Operational Environmental Satellite System
– Combine satellite systems of civil and defense programs
• HDF5 to be used to distribute data to users
• First implementation in 2006
– Support the NPOESS Preparatory Program
• Later full implementation by 2013
– Converged system provides global coverage
• http://www.ipo.noaa.gov
26. Neutron Research Community
• Worldwide research community
– England, France, Germany, Japan, Italy, Switzerland, Russia
– US centers at Argonne, NIST, Los Alamos
• Neutron and X-ray scattering experiments and simulations
– Common software and formats to gather, share, archive, postprocess data
• NeXus data format
– Enforces standardization of metadata and data structures
– Based on HDF4 for many years
– Now switching to HDF5
– http://www.neutron.anl.gov/nexus/
27. National Archives and Records Administration
• Pilot project for HDF5
• Explore scientific data format requirements for long term archiving of electronic records
• Identify record types for which HDF5 is suited
28. Atmospheric and Ocean Models
• Modeling Environment for Atmospheric Discovery (MEAD)
• HDF5 for high performance I/O for atmospheric and ocean modeling
– Weather Research and Forecasting (WRF) model
– Regional Ocean Modeling System (ROMS)
– Coupling of WRF and ROMS
• UAH ESML & data mining also involved
29. HDF5 Mesh API prototype
• Support for structured and unstructured “mesh” data
• For applications such as computational fluid dynamics, finite element analysis, and visualization
• A higher-level API
• Format
– HDF5 groups and datasets to organize the data
• Collaboration involving NCSA, CEI and others
• Documentation still pretty sketchy, but see
– ftp://ftp.ensight.com/pub/HDF_RW/hdf_rw.tgz
• Discussion list in the works
30. HDF5 Wins 2002 R&D Magazine Award
“The 100 products and processes that are the most ‘technologically significant’ and can change people's lives for the better”
http://www.ncsa.uiuc.edu/News/Access/Releases/020722.HDF5.html
31. Thank you!
Information Sources
• HDF website
– http://hdf.ncsa.uiuc.edu/
• HDF5 Information Center
– http://hdf.ncsa.uiuc.edu/HDF5/
• HDF Helpdesk
– hdfhelp@ncsa.uiuc.edu
• HDF users mailing list
– hdfnews@ncsa.uiuc.edu
33. HDF5 funding sources
• NASA: $588,000 (37%)
• ASCI: $495,000 (31%)
• NSF: $225,553 (14%)
• State of IL: $162,750 (10%)
• DOE SciDAC: $70,000 (4%)
• Other: $60,000 (4%)
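The dollar amounts and the rounded percentages on this slide can be cross-checked with a few lines of arithmetic. The pairing of each amount with its source follows the order of the chart legend, which is an assumption about the original layout.

```python
# Funding amounts in US dollars, as listed on the slide.
funding = {
    "NASA": 588000,
    "ASCI": 495000,
    "NSF": 225553,
    "State of IL": 162750,
    "DOE SciDAC": 70000,
    "Other": 60000,
}

total = sum(funding.values())

# Share of the total for each source, rounded to a whole percent.
shares = {name: round(100 * amount / total) for name, amount in funding.items()}

print("Total:", total)
for name, percent in shares.items():
    print(f"{name}: {percent}%")
```

The rounded shares reproduce the percentages shown on the chart.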
34. HDF5 User Community
• Worldwide use in government, academia, industry
• How many users?
– 450 organizations or individuals have filled in “user” form in the past year
– There are many times this many anonymous users
– And some organizations have thousands of users (e.g. the Earth Observing System)
• Public applications
– More than 25 publicly available applications
– Four vendors so far
• LabVIEW
• IDL
• EarthScan Network
• HDF Explorer
• Others in the works (e.g. Matlab)
35. Technical fields that use HDF5
• Aerospace
• Agricultural research
• Air traffic control
• Aircraft emissions database
• Applied mathematics
• Astrophysics
• Astrophysics / supernovae
• Atmospheric chemistry
• Atmospheric physics
• Bioengineering
• CEM Simulation
• Climatology / hydrology
• Computational fluid dynamics
• Computational physics
• Computational physics / education
• Computational physics and computational astrophysics
• Computer modeling
• Computer science
• Data processing
• Earth observation / atmospheric science
• Earth science
• Environmental science
• Fast searching, sorting and retrieval
• Film making special effects
• Fluid mechanics
• GIS
• Geodetic Science
• Geology
• Gravitational physics
• Hydrology
• Information technology
• Magnetic mass spectrometer development
• Marine biology / ecology
• Materials science
• Meteorological data products
• Meteorology
• Microscopy
• Molecular biology
• Nano device simulation
• Neutron scattering
• Ocean color
• Ocean remote sensing
• Optics / optoelectronics
• Petroleum engineering
• Photonic band gap studies
• Photonic crystals
• Photonics
• Post-fire erosion analysis
• Protein crystallography, molecular modeling
• Protostellar accretion discs
• Remote sensing
• SAR processing
• Satellite / weather radar remote sensing
• Satellite oceanography
• Semiconductor process simulation
• Software engineering, distributed systems
• Space geodesy
• Space physics
• Surface water flow and sediment transport
• Theoretical chemistry
• Visualization
• Volcanology
• Water resources management
• X-ray physics
38. Next major release -- HDF5 1.6
• Performance improvements
– Chunking
– Compression (several)
– Parallel I/O
– Metadata I/O
– Compact dataset storage
• Other parallel
– Parallel I/O performance benchmark suite
– Flexible parallel HDF5
– Portland Group C, Fortran 90 and C++ compilers
– Quite a bit of Fortran work
39. Next major release -- HDF5 1.6
• Testing (several)
• Special platforms
– PSC cluster
– Cray
– Windows XP
– Mac
– Several new compilers (e.g. Intel, Portland Group)
• Documentation
– New User’s Guide (good draft, first version)
40. HDF5 High Level APIs – HDF5 Image
• For datasets to be interpreted as images/palettes
– 2-D raster data like HDF4 raster images
• Image operations
– Create, write, read, query
• Based on “HDF5 Image & Palette Specification”
41. HDF5 High Level APIs – HDF5 Table
• For datasets to be interpreted as “tables”
– A collection of records
– All records have the same structure
– Like Vdatas in HDF4, but more operations
• Table operations
– Create, write, read, query
– Insert, delete records or fields
– Future: sort and search
– Includes the following new Table functions:
43. HDF5 High Level API – Future
• Dimension scales
– Similar to HDF4
– In progress
• More table operations
– sort and search
• Unstructured grids
– E.g. triangle mesh
44. Szip Compression Software
• Implements CCSDS lossless compression algorithm
• Fast compression method for EOS data
• Expect to include in next releases of HDF4 and HDF5
– HDF4: compress SDS and image
– HDF5: compress datasets
• Intellectual property issues
– Owned by U of Idaho (formerly U of New Mexico)
– Open source
– No commercial use of encoder without license
– Decoder free for everyone
45. Performance study – BZIP2 compression
• Goal: decide whether or not to support bzip2 compression
• Compared bzip2 and gzip
• Observations
– Bzip2 always better than gzip in compression ratio
– But the difference was just a few percentage points
– And bzip2 always takes more processing time, especially for decoding
• Result
– Not currently supported in the release
– But BZIP2 can be used with HDF5 (checked with HDF5-1.4.4)
• http://hdf.ncsa.uiuc.edu/HDF5/papers/bzip2/
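A comparison of this kind can be sketched with the Python standard library, which ships both bzip2 and zlib (the deflate method behind gzip). This is not the study's benchmark: the payload is made up, the helper name measure is hypothetical, and the numbers it prints depend on the data and machine, so they say nothing about the study's own results.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, payload):
    """Return the compression ratio and encode/decode wall-clock times."""
    start = time.perf_counter()
    packed = compress(payload)
    encode_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decode_s = time.perf_counter() - start

    if restored != payload:
        raise RuntimeError(name + " did not round-trip")
    return {
        "name": name,
        "ratio": len(payload) / len(packed),
        "encode_s": encode_s,
        "decode_s": decode_s,
    }


# Made-up payload: text lines of numeric values, standing in for real data.
payload = b"".join(b"%08.3f\n" % ((i * 37 % 1000) / 7.0) for i in range(20000))

results = [
    measure("gzip (zlib)", zlib.compress, zlib.decompress, payload),
    measure("bzip2", bz2.compress, bz2.decompress, payload),
]
for r in results:
    print("%-12s ratio %.2f  encode %.4fs  decode %.4fs"
          % (r["name"], r["ratio"], r["encode_s"], r["decode_s"]))
```

Reporting ratio and both timings side by side mirrors the trade-off listed in the observations above: a small gain in ratio has to be weighed against extra processing time.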
46. New HDFView features
• Display palette in graph as separate RGB lines.
• Open file as read-only option
• Create new array from old array
• Import data from text file
• Save to HDF4, HDF5 or binary
• Create new image from subset of existing image
• Modify string-type dataset content
• Convert jpeg to HDF image
• Convert HDF to jpeg image
• More user options and well organized GUI
• Select vdata or compound datatype by field
• Select subset from preview image and using mouse
• Support unlimited dimension when creating new HDF4 dataset.
• Enable application of simple math calculations to data
• Support multiple palettes/image
• Create new image with default attributes
• Modify image palette or select predefined palette
47. CORBA, XML etc. permutations
[Diagram: client/remote and server/local permutations for reaching the HDF library and file. Labels recoverable from the diagram: Web browser, HTML, XML, Applet, Java Server, CORBA Server, C++, Java Native Interface, C, Java, Other App. (any language), H5view etc., HDF Library and File. Legend: Distributed Product; Demonstrated in Research; Should work, but not demonstrated.]
48. National Polar-orbiting Operational Environmental Satellite System (NPOESS)
U.S. civil and defense programs to combine weather data collection, expanding to global coverage and long-term continuity of observations at less cost!
• Today: 4-Orbit System
– 2 US Military
– 2 US Civilian
• Tomorrow (2005)
– 2 US Military
– 1 US Civilian
– 1 EUMETSAT/METOP
• Future (2013)
– 2 US Converged
– 1 US “Lite”
– 1 EUMETSAT/METOP
– Specialized Satellites
• Distribute in HDF5
[Diagram: orbits of the DMSP, POES, METOP, NPOESS and NPOESS Lite satellites, labeled by local equatorial crossing time (0530, 0730, 0830, 0930, 1330).]
Editor's notes
NPOESS is evolving the United States’ 4 spacecraft polar-orbiting satellite system into a two satellite system based on U.S. civil and national security requirements. Consistent with the PDD, the NPOESS program is implementing the converged system in a manner that encourages cooperation with foreign governments and international organizations, specifically leveraging European developed payloads and relying on EUMETSAT to provide the satellite for the third plane of the 3-satellite Joint Polar System constellation that will ensure global coverage for key environmental data.