HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop XI
November 7, 2007

02/18/14

The HDF Group

1
Outline
• What is The HDF Group?
• HDF Software Update
• Other Activities of Interest

What is The HDF Group (THG)?

THG, the Company
• Spun-off from University of Illinois July 2006
• Non-profit
• 20+ scientific, technology, professional staff
• Intellectual property:
− THG owns HDF4 and HDF5
− HDF formats and libraries to remain open
− Libraries have BSD-type license
• Continue ties to U of I and NCSA

The mission of The HDF Group
is to ensure long-term
accessibility of HDF data through
sustainable development and
support of HDF technologies.

Goals
• Maintain, evolve HDF for sponsors and
communities that depend on it
• Do consulting, training, tuning, development,
research
• Sustain The HDF Group for long term to assure
data access over time

THG Services
• Helpdesk and Mailing Lists
− Available to all users as a first level of support
• Standard Support
− Rapid issue resolution support
• Consulting
− Needs assessment, troubleshooting, design reviews, etc.
• Enterprise Support
− Coordinating HDF activities across divisions
• Special Projects
− Adapting customer applications to HDF
− New features and tools, with changes normally incorporated into open source product
− Research and Development
• Training
− Tutorials and hands-on practical experience

HDF Software Update

HDF4 update

HDF 4.2r2
Released in October

New features and changes
• New APIs added to the SD and GR interfaces:
− SDreset_maxopenfiles, SDget_maxopenfiles: Modify and report the
maximum allowable number of open files
− SDget_numopenfiles: Gets number of open files
− SDgetcompinfo, GRgetcompinfo: Get compression info
− SDgetfilename: Retrieves name of file, given its ID
− SDgetnamelen: Retrieves length of object name, given its ID

• SZIP compression
− Now can be invoked by Fortran API
− Now available for raster images via GR interface

• SDS, Vgroup names no longer limited to 64 characters

New features and changes
• HDF configuration changes
− --enable-netcdf flag introduced
− Autotools versions updated

• Many bug fixes made to hrepack and hdiff
• See RELEASE.txt for a full list of changes

Platforms to drop/add next release
• Drop
− Windows XP with MSVC++ 6.0
− Linux 2.4
− IRIX64 6.5
− SunOS 5.8, 5.9
• Add
− Windows 64-bit (32 and 64-bit binaries)

Platforms tested
• Systems
− AIX 5.3 (32-bit, 64-bit)
− FreeBSD 6.2 (32-bit, 64-bit)*
− HP-UX B.11.23 (32-bit, 64-bit)*
− IRIX 64 v6.5 (32-bit, 64-bit)
− Linux 2.4, 2.6*
− Linux ia64
− Linux x86_64
− SunOS 5.8, 5.10* (32-bit, 64-bit)
− SunOS 5.10 on Intel
− Windows XP, Vista
− Mac OS X Intel*
• Compilers
− IBM C and Fortran compilers
− GNU gcc 3.4* and GNU Fortran
− HP-UX C and Fortran compilers
− GNU gcc 3.4 and 4.*
− Intel C and Fortran versions 9.1 and 10.00
− SUN WorkShop C and Fortran
− Visual Studio .NET and 2005 and Intel Fortran
− Visual Studio 2005 (no Fortran)
− GNU gcc 4.0.1 with gfortran and g95

* New platforms
For detailed info, see RELEASE.txt

HDF5 Update

HDF5 1.6.6

HDF5 1.6.6 release
• Primarily a bug-fix release
• Some tool changes (see later slide)
• http://hdfgroup.org/HDF5/release/obtain5.html

Platforms dropped
• Operating systems
− AIX 5.3
− Solaris 2.8 and 2.9
− OSF1
− Windows XP with MSVC++ 6.0
• Compilers
− PGI 6.5-*

http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html

Platforms added
• Systems
− Alpha Open VMS
− Mac OS X 10.4 (Intel)
− Solaris 2.* on Intel
− Cray XT3
− Windows 64-bit (32 and 64-bit)
− BG/L
• Compilers
− PGI V. 7.*
− Intel 10.*
− MPICH 1.2.7
− MPICH2

HDF5 1.8

HDF5 1.8 new library features
• Datatype and dataspace features
− Create datatype from text description
− Integer to float conversions during I/O
− Compact storage for N-bit datatypes
− Offset+size storage filter, saving space
− “Null” dataspace – datasets with no elements
− Data transformation filter

HDF5 1.8 – new library features
• Group improvements
− Creation order access
− Compact groups – small groups take less space
− Large group storage improvements
− Intermediate group creation

• Link improvements
− Unicode names allowed
− External links – to objects in another file
− User defined links – create own kinds of links

HDF5 1.8 – new library features
• Attribute improvements
− Improved storage for large number of attributes
− Iterate or look up by creation order
− Unicode names allowed

• Support for Unicode UTF-8 character set
• Shared header information, possibly saving space
• Metadata cache improvements – faster I/O on
files with many objects
• Better UNIX/Linux portability

HDF5 1.8 – new APIs
• New extendible error-handling API
• New APIs to copy objects between files quickly
• Dimension scale model and API
• “HDFpacket” API, to read/write packets efficiently

HDF5 1.8 – Backward and
Forward Compatibility

HDF5 1.8 and 1.6
• Differences between 1.8 and 1.6.x
− Some file format changes
− Several new routines added
− Old APIs deprecated – may be removed in later
release

• Consequences
− Applications requiring 1.8 format changes will
generate objects that cannot be read by 1.6 library
− To exploit 1.8 changes, applications need to be
rewritten

“The art of progress is to
preserve order amid change, and
to preserve change amid order.”
Alfred North Whitehead

Principle of
Maximum File Format Compatibility
Unless instructed otherwise, the HDF5 library will write objects
using the earliest version of the format possible for describing
the information.
Assures older library versions are forward compatible whenever
possible:
− Objects in new files can be read with old versions of the library,
if the objects are “known” to the old libraries.
− New versions of the library can always read objects in files
written with older versions.
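The principle can be sketched in C as follows. This is only an illustration: the feature names, the version numbers attached to them, and the function earliest_format_version are hypothetical and are not part of the HDF5 API. The idea is that an object is written with the lowest format version able to describe every feature it uses, so objects that use no new features stay readable by older libraries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature flags; these are not HDF5 identifiers. */
enum feature {
    FEAT_BASIC_DATASET  = 1 << 0,
    FEAT_CREATION_ORDER = 1 << 1,
    FEAT_EXTERNAL_LINK  = 1 << 2
};

/* Hypothetical table: first format version able to describe each feature. */
static const struct {
    unsigned feature;
    int min_version;
} feature_table[] = {
    { FEAT_BASIC_DATASET,  1 },
    { FEAT_CREATION_ORDER, 2 },
    { FEAT_EXTERNAL_LINK,  2 }
};

/* Return the earliest format version that can describe all features in use.
 * A nonzero force_latest models the "unless instructed otherwise" case. */
int earliest_format_version(unsigned features_used, int force_latest,
                            int latest_version)
{
    int version = 1;
    size_t i;

    assert(latest_version >= 1);
    if (force_latest)
        return latest_version;

    for (i = 0; i < sizeof feature_table / sizeof feature_table[0]; i++) {
        if ((features_used & feature_table[i].feature) &&
            feature_table[i].min_version > version)
            version = feature_table[i].min_version;
    }
    return version;
}
```

In this toy model, an object that uses only FEAT_BASIC_DATASET is written as version 1 even by a newer library, which is what keeps it readable by older ones.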

Command Line Tools

New features for existing tools
• -V option for all tools
− Prints HDF5 library version number used by tool

• h5repack: -L option
− Use latest version of file format to create objects

• h5dump: dumps groups/attributes in creation or
name order
− -q Q, --sort_by=Q Sort groups and attributes by index Q
− -z Z, --sort_order=Z Sort groups and attributes by order Z

New command line tools
• h5mkgrp
− Creates new groups and group hierarchies in an HDF5 file

• h5stat
− Provides statistics regarding the file, such as number of
objects per group, sizes of datasets, amount of free space in
file

• h5copy
− Copies objects within a file or across files

• h5check
− Verifies an HDF5 file against the defined HDF5 File Format
Specification
− Completed for 1.6.
− In progress for 1.8

Tool work in the pipeline
• Export numeric data formatted in several different
ways (such as MS Excel, XML, etc.)
• Import ASCII data that conforms to a certain format
• Use a common text format for h5import and
h5dump
• Support NaN in tools such as h5diff.
Challenges:
− NaN is platform specific
− NaN can have different values for the same
machine
− Checking NaN can be a performance hit
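Two of these challenges can be seen in plain C. The sketch below is not h5diff code: same_value, same_bits, and from_bits are hypothetical helpers, and the code assumes 64-bit IEEE 754 doubles. It shows that a NaN never compares equal to anything, including itself, and that two NaNs can carry different bit patterns, so a comparison tool needs an explicit isnan test on every element.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Build a double from raw bits (assumes 64-bit IEEE 754 doubles). */
double from_bits(uint64_t bits)
{
    double d;

    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Return nonzero if the two doubles have identical bit patterns. */
int same_bits(double a, double b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Hypothetical comparison: two values are "the same" if they compare
 * equal, or if both are NaN regardless of their bit patterns. */
int same_value(double a, double b)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b;
}
```

For example, from_bits(0x7ff8000000000000) and from_bits(0xfff8000000000001) are both NaN but differ bit for bit; same_value treats them as equal while == and same_bits do not.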

HDF Java Products

HDF5 Java is Growing Up

HDFView changes
• HDFView 2.4 released
• Many new features, such as
− Support for compound datatypes of 2D+ arrays
− Support for "filtering fill value" in Image Viewer
− Effective handling of large 3D images
− Support large fonts in GUI components
− New autogain algorithm for image Brightness/Contrast
• New platforms
− Mac Intel
− Linux 64-bit AMD
− Solaris 64-bit

Other Java products
• 36 new enhancements and 44 bugs fixed
• Test suite (using the JUnit testing framework)
− Tests all public methods in the object package
− Added “make check” to run the test suite

• Enhanced documentation
− All public methods in the object package are fully
documented

Future work for Java
• Update HDF5 JNI APIs for HDF5 1.8 release
• Release HDFView with bug fixes/new features
with HDF5 1.8 release
• Port HDF5-SRB model to HDF5-iRODS model
• Writing capability for HDF5-iRODS model

Other Activities of Interest

New THG Website

HDF Performance
Framework

Goals
• A framework for performance regression testing
• A tool for
− Testing on multiple platforms
− Testing different versions
− Long term regression testing
− Assistance in debugging

Solution

[Diagram. Labels: HDF5 1.6, HDF5 1.8, cron, a user's benchmark, performance library, database, PHP web server (www), graph/text output]

Sample Usage
H5Perf_startTimer(&time);
for (i = 0; i < 1000; i++) {
    /* Add groups */
    H5Gcreate(fileid, group_name, (size_t)0);
}
H5Perf_endTimer(&time);
H5Perf_addInstance(db_host, date, time);
Cron entry:
00 21 * * * /home/local/hyoklee/src/chicago/test-perf-hdfdap-3.sh

Sample database record:
| 178820 | 2007-08-17 21:51:14 | 10000 groups | creating 10000 empty groups | 1.8.0 | hdfdap | 0.670198 | 4384 |
Fields labeled on the slide: Timestamp, Instance Name, Version, Platform, Time

Improved Crash Survivability in the HDF5 Library

Crash Survivability in HDF5
• Problem:
− Data in HDF5 files susceptible to corruption in the
event of an application or system crash.
− Corruption possible if structural metadata is being
written when the crash occurs.

• Initial Objective:
− Guarantee an HDF5 file with consistent metadata
can be reconstructed in the event of a crash.
− No guarantee on state of raw data – contains
whatever made it to disk prior to crash.

Crash Survivability in HDF5
• Approach: Metadata Journaling
− When a piece of metadata is modified and in a
consistent state, make a journal note.
− If the application crashes, a recovery program can
replay the journal by applying in order all metadata
writes until the end of the last completed
transaction written to the journal file.
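The replay step can be sketched as below. This is a toy model, not the HDF5 journal format: the record layout, the REC_WRITE and REC_COMMIT tags, and replay_journal are all hypothetical. Metadata writes are applied in order, but only up to the last transaction whose commit record reached the journal; writes from an unfinished transaction are ignored.

```c
#include <assert.h>
#include <stddef.h>

enum rec_type { REC_WRITE, REC_COMMIT };

/* Hypothetical journal record: either a metadata write (slot := value)
 * or a marker that ends a transaction. */
struct journal_rec {
    enum rec_type type;
    size_t slot;   /* which piece of metadata is written (REC_WRITE only) */
    int value;     /* new contents (REC_WRITE only) */
};

/* Apply all writes up to the end of the last completed transaction.
 * Returns the number of journal records that were replayed. */
size_t replay_journal(const struct journal_rec *journal, size_t n_recs,
                      int *metadata, size_t n_slots)
{
    size_t last_commit = 0;  /* one past the last REC_COMMIT seen */
    size_t i;

    /* Pass 1: find the end of the last completed transaction. */
    for (i = 0; i < n_recs; i++) {
        if (journal[i].type == REC_COMMIT)
            last_commit = i + 1;
    }

    /* Pass 2: apply, in order, every write before that point. */
    for (i = 0; i < last_commit; i++) {
        if (journal[i].type == REC_WRITE) {
            assert(journal[i].slot < n_slots);
            metadata[journal[i].slot] = journal[i].value;
        }
    }
    return last_commit;
}
```

A journal holding two writes, a commit, and then one more write with no commit would replay only the first three records, leaving the metadata in the state of the last completed transaction.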

Faster HDF5 Data Appends

Fast Data Appends
• Problem: Metadata operations limit the rate at
which HDF5 can append data to datasets.
• Solution: new data structure for indexing chunks:
− Allows constant time extend, shrink and lookup of
chunks in datasets with single unlimited dimension
− # of metadata I/O operations to append to dataset
is independent of # of chunks
− Allows single-writer/multiple-reader access

• Details at:
http://www.hdfgroup.uiuc.edu/RFC/HDF5/SkipListChunkIndex/SkipListChunkIndex.html
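To see why a single unlimited dimension makes constant-time operations possible, consider the sketch below. It is not the data structure described in the RFC; it is a plain growable array with hypothetical names (chunk_index, ci_append, ci_lookup), used only to illustrate the idea. When a dataset grows along one dimension, the chunk holding an element is the element offset divided by the chunk length, so its address can be found by direct indexing, and appending a chunk touches only the end of the index.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct chunk_index {
    uint64_t *addr;     /* file address of each chunk, in append order */
    size_t n_chunks;
    size_t capacity;
    size_t chunk_len;   /* elements per chunk along the unlimited dimension */
};

/* Record the address of a newly appended chunk (amortized constant time).
 * Returns 0 on success, -1 if memory could not be allocated. */
int ci_append(struct chunk_index *ci, uint64_t chunk_addr)
{
    if (ci->n_chunks == ci->capacity) {
        size_t new_cap = ci->capacity ? ci->capacity * 2 : 8;
        uint64_t *p = realloc(ci->addr, new_cap * sizeof *p);

        if (p == NULL)
            return -1;
        ci->addr = p;
        ci->capacity = new_cap;
    }
    ci->addr[ci->n_chunks++] = chunk_addr;
    return 0;
}

/* Find the chunk holding a given element (constant time).
 * Returns 0 and stores the address, or -1 if the element is past the end. */
int ci_lookup(const struct chunk_index *ci, uint64_t element,
              uint64_t *chunk_addr)
{
    uint64_t k;

    assert(ci->chunk_len > 0);
    k = element / ci->chunk_len;
    if (k >= ci->n_chunks)
        return -1;
    *chunk_addr = ci->addr[k];
    return 0;
}
```

The cost of ci_lookup does not depend on how many chunks exist, which is the property the slide asks of the new index; a production index would also have to live on disk and support concurrent readers, which this sketch ignores.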

netCDF-4

netCDF-4 Project
• Enhanced NetCDF-4 Interface to HDF5
− Combine features of netCDF and HDF5
− Take advantage of their separate strengths

• Collaboration between NCSA, THG, Unidata
• Currently in beta release
• Will be released after HDF5 1.8

NetCDF-4 Architecture
[Architecture diagram. Labels: netCDF-3 applications, netCDF-4 applications, HDF5 applications; netCDF-3 interface, netCDF-4 library, HDF5 library; netCDF files, netCDF-4 HDF5 files, HDF5 files]

• Supports access to netCDF files and HDF5
files created through netCDF-4 interface

HDF5 OPeNDAP
Project

Project description
• Investigate integrated DAP-aware HDF5 library
that can provide seamless access to both
local and remote data
• A NASA ROSES NRA project
• See Kent Yang’s talk and poster

NOAA – Science Data
Stewardship

NOAA – Science Data Stewardship
• Use HDF5 Archival Information Package (AIP) to
archive HDF EOS2 data
• A collaboration between NSIDC and THG
• See Ruth Duerr and Kent Yang’s poster

HDF5 and .NET
Framework

Why .NET?
• The Microsoft .NET framework is used by most
new applications created for Windows.
− Makes it easier to develop applications
− Reduces application vulnerability to security threats

• Supports development in multiple programming
languages, in particular C#.
• Increased level of interest in .NET from users of
HDF5.

HDF and .NET Status
• Received funding to implement prototype .NET
wrapper API for Windows XP
− Based on HDF5 C API
− Focus on C# binding
− Functionality limited to subset of API routines

• If funded, we would like to move beyond the
prototype to
− Create .NET wrappers for all HDF C functions
− Offer full support for .NET wrappers with HDF5 1.8

Bioinformatics
caacaagccaaaactcgtacaa
Cgagatatctcttggaaaaact
gctcacaatattgacgtacaag
gttgttcatgaaactttcggta
Acaatcgttgacattgcgacct
aatacagcccagcaagcagaat

Managing genomic data

Electron tomography

25-80Å resolution
4k x 4k x 500 images now
8k x 8k x 1k images soon (256 GB)

Sequencing

• Next Gen Sequencing platforms produce ~1500 X more data than
CE (Sanger)
• A single Next Gen instrument can produce 20 times more data in a
single run than a day’s operation of a genome center with 100 CE
instruments

An email on Sept 21…

“… A little background, we're doing genetic
association studies, these result in large 2-d matrices
(40K x 1M before applying threshholds). Each of
the cells in this matrix has ~10 numerical
statistics (e.g. some sort of pvalue)… ”
40K x 1M x 10 x 4 = 1,600,000,000,000 (1.6 TB)
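The arithmetic can be checked with a few lines of C. The sketch assumes that the final factor of 4 is the size in bytes of one stored value and that K and M are decimal (10^3 and 10^6); the email states neither, and matrix_bytes is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a rows x cols matrix whose cells each hold
 * stats_per_cell values of bytes_per_stat bytes. */
uint64_t matrix_bytes(uint64_t rows, uint64_t cols,
                      uint64_t stats_per_cell, uint64_t bytes_per_stat)
{
    assert(bytes_per_stat > 0);
    return rows * cols * stats_per_cell * bytes_per_stat;
}
```

Calling matrix_bytes(40000, 1000000, 10, 4) reproduces the 1,600,000,000,000-byte figure; 64-bit arithmetic is needed because the product does not fit in 32 bits.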

Product Data
STEP

Product data
• HDF5 proposed to ISO as binary representation
for product data representation and exchange
• Would be a binary option to the STEP format
• ISO/NWI-CD 10303-026, STEP Part 26

SQL Server and HDF5

SQL Server and HDF5
• THG discussing possible project with Microsoft
• Microsoft envisions a dream environment for
scientists that would encompass both computing
and data management
• Possible SQL Server solution
− Combine RDBMS and scientific analysis tools in a
single integrated system
− Use HDF5 to manage scientific objects not handled
well by traditional database

HDF5 in SQL server
[Architecture diagram. Labels: Visualization; Libraries (MATLAB,…); Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; .NET Languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; SQL Server; HDF5 TVFs; Index; HDF5 type; HDF5 FS blob; HDF5 files]

Thank You All
and
Thank You NASA!

Acknowledgement
This report is based upon work supported in part by a
Cooperative Agreement with NASA under NASA
NNG05GC60A. Any opinions, findings, and conclusions
or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the
views of the National Aeronautics and Space
Administration.

Questions/comments?

Information Sources
• HDF website
http://hdfgroup.org/

• HDF5 Information Center
http://hdfgroup.org/HDF5/

• HDF Helpdesk
hdfhelp@hdfgroup.org

• HDF users mailing list
hdfnews@ncsa.uiuc.edu
coming soon: news@hdfgroup.org

02/18/14

The HDF Group

75

Contenu connexe

Tendances

Tendances (20)

HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
HDF5 Tools Updates
HDF5 Tools UpdatesHDF5 Tools Updates
HDF5 Tools Updates
 
HDF Tools Tutorial
HDF Tools TutorialHDF Tools Tutorial
HDF Tools Tutorial
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
334096317.pptx
334096317.pptx334096317.pptx
334096317.pptx
 
HDF5 OPeNDAP project update and demo
HDF5 OPeNDAP project update and demoHDF5 OPeNDAP project update and demo
HDF5 OPeNDAP project update and demo
 
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFViewHDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
 
Introduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIsIntroduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIs
 
Transition from HDF4 to HDF5
Transition from HDF4 to HDF5 Transition from HDF4 to HDF5
Transition from HDF4 to HDF5
 
Easy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAPEasy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAP
 
Performance Tuning in HDF5
Performance Tuning in HDF5 Performance Tuning in HDF5
Performance Tuning in HDF5
 
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, DatatypesHDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
 
Status of HDF-EOS, Related Software and Tools
 Status of HDF-EOS, Related Software and Tools Status of HDF-EOS, Related Software and Tools
Status of HDF-EOS, Related Software and Tools
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Easy Remote Access Via OPeNDAP
Easy Remote Access Via OPeNDAPEasy Remote Access Via OPeNDAP
Easy Remote Access Via OPeNDAP
 
Access HDF5 Datasets via OPeNDAP's Data Access Protocol (DAP)
Access HDF5 Datasets via OPeNDAP's Data Access Protocol (DAP)Access HDF5 Datasets via OPeNDAP's Data Access Protocol (DAP)
Access HDF5 Datasets via OPeNDAP's Data Access Protocol (DAP)
 
HDF5 I/O Performance
HDF5 I/O PerformanceHDF5 I/O Performance
HDF5 I/O Performance
 
Introduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIsIntroduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIs
 
Bridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data ProductsBridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data Products
 

Similaire à HDF Update

Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps
Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout MapsEnsuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps
Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps
The HDF-EOS Tools and Information Center
 

Similaire à HDF Update (20)

HDF Updae
HDF UpdaeHDF Updae
HDF Updae
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF Status and Development
HDF Status and DevelopmentHDF Status and Development
HDF Status and Development
 
Hierarchical Data Formats (HDF) Update
Hierarchical Data Formats (HDF) UpdateHierarchical Data Formats (HDF) Update
Hierarchical Data Formats (HDF) Update
 
HDF Project Status and Plans
HDF Project Status and PlansHDF Project Status and Plans
HDF Project Status and Plans
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps
Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout MapsEnsuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps
Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Introduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming ModelsIntroduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming Models
 
HDF Project Update
HDF Project UpdateHDF Project Update
HDF Project Update
 
Hdf5 parallel
Hdf5 parallelHdf5 parallel
Hdf5 parallel
 
Data Interoperability
Data InteroperabilityData Interoperability
Data Interoperability
 
Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
HDF5 Backward and Forward Compatibility Issues
HDF5 Backward and Forward Compatibility IssuesHDF5 Backward and Forward Compatibility Issues
HDF5 Backward and Forward Compatibility Issues
 
HDF Project Update
HDF Project UpdateHDF Project Update
HDF Project Update
 

Plus de The HDF-EOS Tools and Information Center

Plus de The HDF-EOS Tools and Information Center (20)

Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 
Leveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software TestingLeveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software Testing
 
Google Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOSGoogle Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOS
 
Parallel Computing with HDF Server
Parallel Computing with HDF ServerParallel Computing with HDF Server
Parallel Computing with HDF Server
 
HDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's GuideHDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's Guide
 
HDF Status Update
HDF Status UpdateHDF Status Update
HDF Status Update
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 

HDF Update

  • 1. HDF Update Mike Folk The HDF Group HDF and HDF-EOS Workshop XI November 7, 2007 02/18/14 The HDF Group 1
  • 2. Outline • What is The HDF Group? • HDF Software Update • Other Activities of Interest
  • 3. What is The HDF Group (THG)?
  • 4. THG, the Company • Spun off from the University of Illinois in July 2006 • Non-profit • 20+ scientific, technology, professional staff • Intellectual property: − THG owns HDF4 and HDF5 − HDF formats and libraries to remain open − Libraries have BSD-type license • Continue ties to U of I and NCSA
  • 5. The mission of The HDF Group is to ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.
  • 6. Goals • Maintain, evolve HDF for sponsors and communities that depend on it • Do consulting, training, tuning, development, research • Sustain The HDF Group for long term to assure data access over time
  • 7. THG Services • Helpdesk and Mailing Lists − Available to all users as a first level of support • Standard Support − Rapid issue resolution support • Consulting − Needs assessment, troubleshooting, design reviews, etc. • Enterprise Support − Coordinating HDF activities across divisions • Special Projects − Adapting customer applications to HDF − New features and tools, with changes normally incorporated into open source product − Research and Development • Training − Tutorials and hands-on practical experience
  • 10. HDF 4.2r2 • Released in October
  • 11. New features and changes • New APIs added to the SD and GR interfaces: − SDreset_maxopenfiles, SDget_maxopenfiles: Modifies, reports maximum allowable number of open files − SDget_numopenfiles: Gets number of open files − SDgetcompinfo, GRgetcompinfo: Gets compression info − SDgetfilename: Retrieves name of file, given its ID − SDgetnamelen: Retrieves length of object name, given its ID • SZIP compression − Now can be invoked by Fortran API − Now available for raster images via GR interface • SDS, Vgroup names no longer limited to 64 characters
  • 12. New features and changes • HDF configuration changes − --enable-netcdf flag introduced − Autotools versions updated • Many bug fixes made to hrepack and hdiff • See RELEASE.txt for a full list of changes
  • 13. Platforms to drop/add next release • Drop − Windows XP with MSVC++ 6.0 − Linux 2.4 − IRIX64 6.5 − SunOS 5.8, 5.9 • Add − Windows 64-bit (32 and 64-bit binaries)
  • 14. Platforms tested • Systems − AIX 5.3 (32-bit, 64-bit) − FreeBSD 6.2 (32-bit, 64-bit)* − HP-UX B.11.23 (32-bit, 64-bit)* − IRIX64 v6.5 (32-bit, 64-bit) − Linux 2.4, 2.6* − Linux ia64 − Linux x86_64 − SunOS 5.8, 5.10* (32-bit, 64-bit) − SunOS 5.10 on Intel − Windows XP, Vista − Mac OS X Intel* • Compilers − IBM C and Fortran compilers − GNU gcc 3.4* and GNU Fortran − HPUX C and Fortran compilers − GNU gcc 3.4 and 4.* − Intel C and Fortran versions 9.1 and 10.00 − SUN WorkShop C and Fortran − Visual Studio .NET and 2005 and Intel Fortran − Visual Studio 2005 (no Fortran) − GNU gcc 4.0.1 with gfortran and g95 • * New platforms • For detailed info, see RELEASE.txt
  • 17. HDF5 1.6.6 release • Primarily a bug-fix release • Some tool changes (see later slide) • http://hdfgroup.org/HDF5/release/obtain5.html
  • 18. Platforms dropped • Operating systems − AIX 5.3 − Solaris 2.8 and 2.9 − OSF1 − Windows XP with MSVC++ 6.0 • Compilers − PGI 6.5-* • http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
  • 19. Platforms added • Systems − Alpha Open VMS − Mac OS X 10.4 (Intel) − Solaris 2.* on Intel − Cray XT3 − Windows 64-bit (32 and 64-bit) − BG/L • Compilers − PGI V. 7.* − Intel 10.* − MPICH 1.2.7 − MPICH2
  • 21. HDF5 1.8 new library features • Datatype and dataspace features − Create datatype from text description − Integer to float conversions during I/O − Compact storage for N-bit datatypes − Offset+size storage filter, saving space − “Null” dataspace – datasets with no elements − Data transformation filter
  • 22. HDF5 1.8 – new library features • Group improvements − Creation order access − Compact groups – small groups take less space − Large group storage improvements − Intermediate group creation • Link improvements − Unicode names allowed − External links – to objects in another file − User defined links – create own kinds of links
  • 23. HDF5 1.8 – new library features • Attribute improvements − Improved storage for large number of attributes − Iterate or look up by creation order − Unicode names allowed • Support for Unicode UTF-8 character set • Shared header information, possibly saving space • Metadata cache improvements – faster I/O on files with many objects • Better UNIX/Linux portability
  • 24. HDF5 1.8 – new APIs • New extendible error-handling API • New APIs to copy objects between files quickly • Dimension scale model and API • “HDFpacket” API, to read/write packets efficiently
  • 25. HDF5 1.8 – Backward and Forward Compatibility
  • 26. HDF5 1.8 and 1.6 • Differences between 1.8 and 1.6.x − Some file format changes − Several new routines added − Old APIs deprecated – may be removed in later release • Consequences − Applications requiring 1.8 format changes will generate objects that cannot be read by 1.6 library − To exploit 1.8 changes, applications need to be rewritten
  • 27. “The art of progress is to preserve order amid change, and to preserve change amid order.” Alfred North Whitehead
  • 28. Principle of Maximum File Format Compatibility • Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information. • Assures older library versions are forward compatible whenever possible: − Objects in new files can be read with old versions of the library, if the objects are “known” to the old libraries. − New versions of the library can always read objects in files written with older versions.
  • 29. Command Line Tools
  • 30. New features for existing tools • -V option for all tools − Prints HDF5 library version number used by tool • h5repack: -L option − Use latest version of file format to create objects • h5dump: dumps groups/attributes in creation or name order − -q Q, --sort_by=Q: Sort groups and attributes by index Q − -z Z, --sort_order=Z: Sort groups and attributes by order Z
  • 31. New command line tools • h5mkgrp − Creates new groups and group hierarchies in an HDF5 file • h5stat − Provides statistics regarding the file, such as number of objects per group, sizes of datasets, amount of free space in file • h5copy − Copies objects within a file or across files • h5check − Verifies an HDF5 file against the defined HDF5 File Format Specification − Completed for 1.6 − In progress for 1.8
  • 32. Tool work in the pipeline • Export numeric data formatted in several different ways (such as MS Excel, XML, etc.) • Import ASCII data that conforms to a certain format • Use a common text format for h5import and h5dump • Support NaN in tools such as h5diff. Challenges: − NaN is platform specific − NaN can have different values on the same machine − Checking for NaN can be a performance hit
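
The NaN challenges listed on slide 32 can be illustrated with a minimal C sketch. It assumes 64-bit IEEE 754 doubles, and `double_from_bits`, `double_to_bits` and `values_match` are hypothetical helpers written for this illustration; they are not part of h5diff or the HDF5 library.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as a double.
   Assumes double is a 64-bit IEEE 754 type. */
double double_from_bits(uint64_t bits)
{
    double x;
    assert(sizeof x == sizeof bits);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical helper: return the raw 64-bit pattern of a double. */
uint64_t double_to_bits(double x)
{
    uint64_t bits;
    assert(sizeof x == sizeof bits);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical comparison rule: two values match if both are NaN
   (whatever their bit patterns) or if they compare equal. */
bool values_match(double a, double b)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b;
}
```

Under these assumptions, the patterns 0x7ff8000000000000 and 0x7ff8000000000001 are both NaN yet differ bit for bit, so a bytewise comparison would report a difference, while an ordinary `==` reports a difference even between identical NaNs. The extra `isnan` test on every element is one plausible source of the performance cost the slide mentions.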
  • 33. HDF Java Products
  • 34. HDF5 Java is Growing Up
  • 35. HDFView changes • HDFView 2.4 released • Many new features, such as − Support for compound datatypes of 2D+ arrays − Support for "filtering fill value" in Image Viewer − Effective handling of large 3D images − Support large fonts in GUI components − New autogain algorithm for image Brightness/Contrast • New platforms − Mac Intel − Linux 64-bit AMD − Solaris 64-bit
  • 36. Other Java products • 36 new enhancements and 44 bugs fixed • Test suite (using junit testing framework) − Tests all public methods in the object package − Added “make check” to run the test suite • Enhanced documentation − All public methods in the object package are fully documented
  • 37. Future work for Java • Update HDF5 JNI APIs for HDF5 1.8 release • Release HDFView with bug fixes/new features with HDF5 1.8 release • Port HDF5-SRB model to HDF5-iRODS model • Writing capability for HDF5-iRODS model
  • 38. Other Activities of Interest
  • 40. New THG Website
  • 42. Goals • A framework for performance regression testing • A tool for − Testing on multiple platforms − Testing different versions − Long term regression testing − Assistance in debugging
  • 43. Solution [architecture diagram; labels: HDF5 1.6, HDF5 1.8, cron, A User’s Benchmark, Performance Library, Database, PHP Web Server (www), Graph/Text]
  • 44. Sample Usage • Code: H5Perf_startTimer(&time); for (i = 0; i < 1000; i++) { H5Gcreate(fileid, group_name, (size_t)0); /* add groups */ } H5Perf_endTimer(&time); H5Perf_addInstance(db_host, date, time); • cron entry: 00 21 * * * /home/local/hyoklee/src/chicago/test-perf-hdfdap-3.sh • Database record: | 178820 | 2007-08-17 21:51:14 | 10000 groups | creating 10000 empty groups | 1.8.0 | hdfdap | 0.670198 | 4384 | (column labels shown on the slide: Timestamp, Instance, Name, Version, Platform, Time)
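
The timing pattern in the sample usage can be approximated without the performance library. The sketch below is only a stand-in under stated assumptions: `perf_timer`, `perf_start`, `perf_end`, `sample_op` and `time_operation` are hypothetical names, the real H5Perf_* signatures are not given on the slide, and the standard `clock()` call measures processor time rather than wall-clock time.

```c
#include <time.h>

/* Hypothetical timer record; not the H5Perf data structure. */
struct perf_timer {
    clock_t start;    /* processor time when the timer was started */
    double  seconds;  /* elapsed processor time, filled in by perf_end */
};

/* Start the timer. */
void perf_start(struct perf_timer *t)
{
    t->seconds = 0.0;
    t->start = clock();
}

/* Stop the timer and store the elapsed processor time in seconds. */
void perf_end(struct perf_timer *t)
{
    t->seconds = (double)(clock() - t->start) / CLOCKS_PER_SEC;
}

/* Placeholder workload standing in for the H5Gcreate call on the slide. */
static volatile long work_done = 0;

void sample_op(int i)
{
    work_done += i;
}

/* Run op(0) .. op(iterations - 1) and return the elapsed time in seconds. */
double time_operation(void (*op)(int), int iterations)
{
    struct perf_timer t;

    perf_start(&t);
    for (int i = 0; i < iterations; i++)
        op(i);
    perf_end(&t);
    return t.seconds;
}
```

A call such as `time_operation(sample_op, 1000)` returns the measurement that a benchmark would then pass on to whatever stores results, which on the slide is the `H5Perf_addInstance` call.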
  • 45. Improved Crash Survivability in the HDF5 Library
  • 46. Crash Survivability in HDF5 • Problem: − Data in HDF5 files susceptible to corruption in the event of an application or system crash. − Corruption possible if structural metadata is being written when the crash occurs. • Initial Objective: − Guarantee an HDF5 file with consistent metadata can be reconstructed in the event of a crash. − No guarantee on state of raw data – contains whatever made it to disk prior to crash.
  • 47. Crash Survivability in HDF5 • Approach: Metadata Journaling − When a piece of metadata is modified and in a consistent state, make a journal note. − If the application crashes, a recovery program can replay the journal by applying in order all metadata writes until the end of the last completed transaction written to the journal file.
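
The replay step described on slide 47 can be sketched in a few lines of C. This is an illustration under assumptions, not the HDF5 journal format: the record layout, the names `journal_rec` and `journal_replay`, and the use of an `int` array as the "metadata" are all hypothetical, and transactions are assumed to be written one after another rather than interleaved.

```c
#include <stddef.h>

/* Hypothetical journal record kinds. */
enum rec_kind { REC_BEGIN, REC_WRITE, REC_END };

/* Hypothetical journal record. */
struct journal_rec {
    enum rec_kind kind;
    unsigned long trans;   /* transaction number */
    size_t        slot;    /* metadata slot to overwrite (REC_WRITE only) */
    int           value;   /* new slot contents (REC_WRITE only) */
};

/* Replay the journal onto `metadata`: apply, in order, every write up to
   the end of the last completed transaction and ignore anything after it.
   Returns the number of writes applied. */
size_t journal_replay(const struct journal_rec *journal, size_t nrecs,
                      int *metadata, size_t nslots)
{
    size_t end = 0;      /* one past the last REC_END record */
    size_t applied = 0;

    /* Pass 1: locate the end of the last completed transaction. */
    for (size_t i = 0; i < nrecs; i++)
        if (journal[i].kind == REC_END)
            end = i + 1;

    /* Pass 2: apply the writes that precede that point, in journal order. */
    for (size_t i = 0; i < end; i++) {
        if (journal[i].kind == REC_WRITE && journal[i].slot < nslots) {
            metadata[journal[i].slot] = journal[i].value;
            applied++;
        }
    }
    return applied;
}
```

In this sketch a transaction that has a begin record and some writes but no end record contributes nothing to the recovered metadata, which matches the stated objective of reconstructing a consistent state rather than the most recent one.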
  • 48. Faster HDF5 Data Appends
  • 49. Fast Data Appends • Problem: Metadata operations limit the rate at which HDF5 can append data to datasets. • Solution: new data structure for indexing chunks: − Allows constant time extend, shrink and lookup of chunks in datasets with single unlimited dimension − # of metadata I/O operations to append to dataset is independent of # of chunks − Allows single-writer/multiple-reader access • Details at: http://www.hdfgroup.uiuc.edu/RFC/HDF5/SkipListChunkIndex/SkipListChunkIndex.html
  • 51. netCDF-4 Project • Enhanced NetCDF-4 Interface to HDF5 − Combine features of netCDF and HDF5 − Take advantage of their separate strengths • Collaboration between NCSA, THG, Unidata • Currently in beta release • Will be released after HDF5 1.8
  • 54. Project description • Investigate integrated DAP-aware HDF5 library that can provide seamless access to both local and remote data • A NASA ROSES NRA project • See Kent Yang’s talk and poster
  • 55. NOAA – Science Data Stewardship
  • 56. NOAA – Science Data Stewardship • Use HDF5 Archival Information Package (AIP) to archive HDF EOS2 data • A collaboration between NSIDC and THG • See Ruth Duerr and Kent Yang’s poster
  • 57. HDF5 and .NET Framework
  • 58. Why .NET? • The Microsoft .NET framework is used by most new applications created for Windows. − Makes it easier to develop applications − Reduces application vulnerability to security threats • Supports development in multiple programming languages, in particular C#. • Increased level of interest in .NET from users of HDF5.
  • 59. HDF and .NET Status • Received funding to implement prototype .NET wrapper API for Windows XP − Based on HDF5 C API − Focus on C# binding − Functionality limited to subset of API routines • If funded, we would like to move beyond the prototype to − Create .NET wrappers for all HDF C functions − Offer full support for .NET wrappers with HDF5 1.8
  • 61. Electron tomography • 25-80Å resolution • 4k x 4k x 500 images now • 8k x 8k x 1k images soon (256 GB)
  • 62. Sequencing • Next Gen Sequencing platforms produce ~1500 X more data than CE (Sanger) • A single Next Gen instrument can produce 20 times more data in a single run than a day’s operation of a genome center with 100 CE instruments
  • 63. An email on Sept 21… “… A little background, we're doing genetic association studies, these result in large 2-d matrices (40K x 1M before applying threshholds). Each of the cells in this matrix has ~10 numerical statistics (e.g. some sort of pvalue)… ” 40K x 1M x 10 x 4 = 1,600,000,000,000 bytes (1.6 TB)
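
The size estimate on slide 63 can be checked with a short C sketch. The function name `matrix_bytes` is hypothetical, and reading the final factor of 4 as four bytes per stored value (for example a 32-bit float) is an assumption; the slide gives only the number.

```c
#include <stdint.h>

/* Total storage in bytes for a rows x cols matrix in which every cell
   holds `values_per_cell` numbers of `bytes_per_value` bytes each.
   64-bit arithmetic is used because the product exceeds 32 bits. */
uint64_t matrix_bytes(uint64_t rows, uint64_t cols,
                      uint64_t values_per_cell, uint64_t bytes_per_value)
{
    return rows * cols * values_per_cell * bytes_per_value;
}
```

Calling `matrix_bytes(40000, 1000000, 10, 4)` reproduces the figure on the slide, 1,600,000,000,000 bytes, which is 1.6 TB when a terabyte is taken as 10^12 bytes.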
  • 65. Product data • HDF5 proposed to ISO as binary representation for product data representation and exchange • Would be a binary option to the STEP format • ISO/NWI-CD 10303-026, STEP Part 26
  • 66. SQL Server and HDF5
  • 67. SQL Server and HDF5 • THG discussing possible project with Microsoft • Microsoft envisions a dream environment for scientists that would encompass both computing and data management • Possible SQL Server solution − Combine RDBMS and scientific analysis tools in a single integrated system − Use HDF5 to manage scientific objects not handled well by traditional database
  • 68. HDF5 in SQL server [architecture diagram; labels: Visualization Libraries (MATLAB, …); Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; .NET Languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; SQL Server; HDF5 TVFs; HDF5 Index; HDF5 type; HDF5 FS blob; HDF5 files]
  • 69. Thank You All and Thank You NASA!
  • 70. Acknowledgement This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
  • 72. Information Sources • HDF website: http://hdfgroup.org/ • HDF5 Information Center: http://hdfgroup.org/HDF5/ • HDF Helpdesk: hdfhelp@hdfgroup.org • HDF users mailing list: hdfnews@ncsa.uiuc.edu (coming soon: news@hdfgroup.org)

Editor's notes

  1. Why: Increasing need for support, services, quick response. Not a good model for a University R&D project. Who: 11 software engineers and several students (develop, maintain HDF software, work on special projects, manage projects); 3 tech support staff (helpdesk, doc, sysadmin). Management team: President; Director of Technical Services and Operations; Director of Software Development; Director of Business Operations; Managers responsible for tools, applications. Other THG staff include seven full-time software engineers who develop and maintain the HDF software, as well as working on special projects, and three technical support staff who provide helpdesk support, documentation, and system administration. The HDF Group also generally employs students from the University Computer Science and Engineering departments.
  2. The R&D mission: Maintain and evolve HDF for high end science apps. Maintain HDF4 and HDF5 and tools at supercomputing centers, TeraGrid. Support academic science. Cutting edge data management research. Adapt to leading edge, experimental architectures. Integrate with new middleware technologies, parallel file systems. The “Support and Sustain” mission: Maintain, evolve for communities, sponsors. Provide proprietary consulting, tuning, development. Sustain for long term, maintain data access over time.
  3. I get all mixed up with the terms backward & forward compatibility. I did a little investigation on the definitions and use in talking with Frank about his compatibility matrix a while back and still don’t have a good grasp of what is meant… my conclusion was there is no consistent use. It seems most, like MathWorks, use “compatibility” without the forward/backward words. I made a change here… is this what you meant in the original? And, I don’t know if it’s worth saying but – new versions can always read objects in files written with older versions (unless there’s a bug in the writer!). Then we’ll offer the best solution we can.
  4. Maybe Objective bullets do belong on later slide… not sure.
  5. Is it only limited for unlimited / chunked datasets? Or is it that way for all but we’re just fixing it for limited / unchunked cases? Contrasts with B-tree index: - B-tree has O(log n) extend, shrink and lookup of chunks - B-tree has ~logarithmic # of metadata I/O operations as chunks appended. Will be optimizing chunked dataset indexing for datasets with no unlimited dimensions (with array index) and multiple unlimited dimensions (with v2 B-tree) as part of project in the next year also.
  6. I’ve changed this considerably. I don’t think it’s necessary to say who has funded work to date, exactly what that entails, or that the prototype is available. The important message (to me) is we have experience & interest in this area. And, willing to do more if it’s funded. If not, then that’s the end of the story.
  7. First bullet – let them know it may or may not happen… not a done deal. Not sure I got the “translation” from first version of text to this one right… Dropped “& other formats” (let them give those presentations).