Fast partial access to objects in very large files stored in the SDSC Storage Resource Broker (SRB) can be extremely challenging, even when those objects are small. The HDF-SRB project integrates the SRB with the NCSA Hierarchical Data Format (HDF5) to create an access mechanism within the SRB that can be orders of magnitude more efficient than current methods for accessing object-based file formats. The project provides interactive and efficient access to datasets or subsets of datasets in large files without bringing entire files onto local machines. A new set of data structures and APIs has been implemented in the SRB to support such object-level data access. A working prototype of the HDF5-SRB data system has been developed and tested. SRB support is implemented in HDFView as a client application.
1. Integrating HDF5 with SRB
Object-level Access to Remote Files
Peter Cao, NCSA
December 1, 2005
Sponsored by NLADR, NSF PACI Project
in Support of NCSA-SDSC Collaboration
HDF & HDF-EOS Workshop IX
2. Outline
Introduction to the HDF-SRB project
The HDF-SRB model
SRB Support in HDFView
3. Overview of SRB
What is SRB?
The SDSC Storage Resource Broker (SRB)
is client-server middleware that provides a
uniform interface for connecting to
distributed data storage in multiple types
of storage resources.
4. Overview of SRB
Architecture
SRB Client
MCAT
SRB Server
DB2
ObjStore
HDF5
HPSS
Unitree
FTP
Distributed Storage Resources: database system, archival storage system, file system, ftp
5. Project Description
Motivation
SRB
Indexing and searching
Distributed data system
Access control
HDF5
Large and diverse data
High performance access
Interactive and subsetting
High performance distributed data system
6. Remote Data Access on SRB
Methods
Normal ways to access SRB:
Get the whole file: large files (100TB SCEC)
Use POSIX low level calls: low performance
New way:
Implement proxy operations to access objects
or parts of objects in one request
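The trade-off between these three methods can be sketched with a toy cost model. This is only an illustration, not a measurement: every size and count below (file size, object size, chunk size, number of metadata reads) is a hypothetical value chosen to make the arithmetic visible, and the function names are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    requests: int      # client/server round trips
    bytes_moved: int   # payload bytes sent to the client


# All values below are hypothetical, chosen only for illustration.
FILE_SIZE = 1_000_000    # size of the whole file in bytes
OBJECT_SIZE = 10_000     # size of the one dataset the user wants
POSIX_CHUNK = 1_000      # bytes returned by each low-level read
METADATA_READS = 5       # small reads needed to locate the object


def whole_file() -> Transfer:
    # One request, but the entire file crosses the network.
    return Transfer(requests=1, bytes_moved=FILE_SIZE)


def posix_calls() -> Transfer:
    # One open, several small metadata reads, then chunked data reads.
    data_reads = -(-OBJECT_SIZE // POSIX_CHUNK)  # ceiling division
    requests = 1 + METADATA_READS + data_reads
    moved = METADATA_READS * POSIX_CHUNK + OBJECT_SIZE
    return Transfer(requests=requests, bytes_moved=moved)


def object_level() -> Transfer:
    # One proxy request; only the requested object is returned.
    return Transfer(requests=1, bytes_moved=OBJECT_SIZE)
```

Under these assumed numbers, the whole-file method moves the most bytes, the POSIX method needs the most round trips, and the object-level method is minimal on both counts.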
7. Project Description
Goals
Working prototype of client/server system for
object-level access to HDF5 stored in the SRB
Use SRB as middleware to transfer data between
the server and client
Use Object-level access for interactive and
efficient access to part of the file
8. Normal SRB File Access
Architecture
client
HDF5
HDF5 File
(whole file or a
sequence of bytes)
SRB Server
MCAT
10. Examples of File Access
I need to see the eye of
Hurricane Bob!
HDF5
11. Examples of File Access
Whole file transfer
Get the file
HDF5
12. Examples of File Access
Whole file transfer
HDF5
Transfer large image – slow!
13. Examples of File Access
SRB POSIX API
I need to see the eye of
Hurricane Bob!
HDF5
14. Examples of File Access
SRB POSIX API
Request: open file. Reply: file's open
Request: find image. Reply: image found
Request: open image. Reply: image open
HDF5
Many small messages – slow and complex!
15. Examples of File Access
Object level
Get me the eye of Hurricane Bob
HDF5
16. Examples of File Access
Object level
Get me the eye of Hurricane Bob
HDF5
One request & small image – fast & simple!
17. HDF5-SRB Model
New objects/APIs
A new set of data objects
H5File, H5Group, H5Dataset, H5Datatype, etc.
Encapsulate client requests and server results
Enhanced SRB APIs
Pack/unpack routines (exchange data between a byte stream and a structure) that handle complicated structs: strings, pointers, pointers to arrays, arrays of pointers, etc.
New srbGenProxyFunct (general proxy function) that can also handle request types other than HDF5
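The idea behind the pack/unpack routines, turning a structure with variable-length members into a flat byte stream and back, can be sketched as follows. This is not the SRB implementation or its wire format: the `DatasetRequest` type, its fields, and the length-prefixed layout are all assumptions made for illustration, using Python's standard `struct` module.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DatasetRequest:
    """Hypothetical stand-in for a request structure such as H5Dataset."""
    obj_id: int
    path: str          # variable-length string (a char* in C)
    start: List[int]   # variable-length array (a pointer to an array in C)
    count: List[int]


def _pack_str(text: str) -> bytes:
    raw = text.encode("utf-8")
    return struct.pack("!I", len(raw)) + raw        # length prefix, then bytes


def _pack_ints(values: List[int]) -> bytes:
    return struct.pack(f"!I{len(values)}q", len(values), *values)


def _unpack_ints(buf: bytes, offset: int) -> Tuple[List[int], int]:
    (n,) = struct.unpack_from("!I", buf, offset)
    values = list(struct.unpack_from(f"!{n}q", buf, offset + 4))
    return values, offset + 4 + 8 * n


def pack_msg(req: DatasetRequest) -> bytes:
    # Fixed-size field first, then each variable-length member with its length.
    return (struct.pack("!i", req.obj_id) + _pack_str(req.path)
            + _pack_ints(req.start) + _pack_ints(req.count))


def unpack_msg(buf: bytes) -> DatasetRequest:
    (obj_id,) = struct.unpack_from("!i", buf, 0)
    (n,) = struct.unpack_from("!I", buf, 4)
    path = buf[8:8 + n].decode("utf-8")
    start, offset = _unpack_ints(buf, 8 + n)
    count, _ = _unpack_ints(buf, offset)
    return DatasetRequest(obj_id, path, start, count)
```

Prefixing every variable-length member with its length is one common way to flatten pointer-based data; the receiver never needs the sender's memory addresses, only the byte stream.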
18. HDF5-SRB Model
Data Flow
Client API: srbObjRequest(void *obj, int objID)
Server API: srbObjProcess(void *obj, int objID)
[Figure: seven numbered steps between the client and the SRB server. Recoverable labels: packMsg(), unpackMsg(), H5Obj::op, H5Object(), "4. Access file", srbGenProxyFunct, HDF5 Library, HDF5 file, SRB Server.]
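The request/process round trip between the client API and the server API can be sketched in a few lines. This is a simplified model, not the SRB code: `client_request` and `server_process` are hypothetical stand-ins for srbObjRequest and srbObjProcess, JSON replaces the real pack/unpack routines, an in-memory dictionary replaces the HDF5 file, and the object ID value is invented.

```python
import json

H5OBJID_DATASET = 1  # hypothetical object ID, not taken from the SRB headers

# Stand-in for an HDF5 file on the server: one dataset of 100 values.
FAKE_FILE = {"/storm/bob": list(range(100))}


def pack_msg(obj: dict) -> bytes:
    # JSON is used here only as a simple stand-in for the pack routine.
    return json.dumps(obj).encode("utf-8")


def unpack_msg(buf: bytes) -> dict:
    return json.loads(buf.decode("utf-8"))


def read_dataset(obj: dict) -> dict:
    # "Access file": read only the requested slice and attach it to the object.
    data = FAKE_FILE[obj["path"]]
    start, count = obj["start"], obj["count"]
    obj["value"] = data[start:start + count]
    return obj


# Dispatch table: the object ID selects which operation the server runs.
HANDLERS = {H5OBJID_DATASET: read_dataset}


def server_process(request: bytes) -> bytes:
    msg = unpack_msg(request)                      # server unpacks the request
    result = HANDLERS[msg["obj_id"]](msg["obj"])   # operate on the object
    return pack_msg(result)                        # server packs the result


def client_request(obj: dict, obj_id: int, send=server_process) -> dict:
    request = pack_msg({"obj_id": obj_id, "obj": obj})  # client packs
    reply = send(request)                               # single round trip
    return unpack_msg(reply)                            # client unpacks
```

A call such as `client_request({"path": "/storm/bob", "start": 10, "count": 5}, H5OBJID_DATASET)` returns the same object with a `value` field filled in by the server, so the client needs no HDF5 library of its own.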
19. Running Server/Client
An SRB server that supports HDF5
HDF5 library and other external libraries (SZIP, ZLIB)
SRB version 3.4 or later from http://www.sdsc.edu/srb/
Follow the instructions for running the SRB server in the User's Guide (UG) packaged with the SRB source release, or online at
http://hdf.ncsa.uiuc.edu/hdf-srb-html/HDF-SRB-UG.html
Any client application that implements HDF5-SRB objects
No HDF5 library is required on the client
Example client application: HDFView 2.3 or later
20. Short Demo
HDFView
Supports Windows and Linux
May support SGI and AIX if new funding becomes available
21. Future Work?
Writing capabilities: the current HDF-SRB implementation only
supports read-only operations. We propose to add write
functionality so that users will be able to create new files and data
objects, and to modify data content and attributes.
Better support for complex datatypes such as compound datatypes
and variable-length datatypes
Support for HDF5 indexing and ingesting HDF5 metadata, which
will enable users to access HDF5 objects directly through MCAT
Support for files across different servers (or SRB federation)
More features for HDFView: compared to its local file-handling
capabilities, the current version of HDFView has very limited
features for remote file access on an SRB server.