IBM Systems and Technology Education
Case Study

Swiss National Supercomputing Center
Gains low-latency, high-bandwidth storage with IBM General Parallel File System
Overview

The need
With data volumes doubling each year, researchers at the Swiss National Supercomputing Center (CSCS) needed a centralized storage solution offering low latency, high bandwidth and extreme scalability.

The solution
CSCS engaged IBM to build a centralized storage solution based on IBM® General Parallel File System (GPFS), IBM System x® and IBM System Storage® hardware, and InfiniBand networking technology.

The benefit
The solution supports massively parallel read/write operations and provides a single namespace for all systems. It offers extremely high availability and nondisruptive online scaling of the file system.

Founded in 1991, CSCS, the Swiss National Supercomputing Center, develops and promotes technical and scientific services for the Swiss research community in the field of high-performance computing (HPC). CSCS enables world-class scientific research by pioneering, operating and supporting leading-edge supercomputing technologies. Located near Lugano, in the south of Switzerland, CSCS is an autonomous unit of the Swiss Federal Institute of Technology in Zurich (ETH Zurich).

CSCS serves dozens of different research institutions, supporting a broad range of computational projects across theoretical chemistry, material sciences, biological sciences and climate science. Simulations and other computational projects running on the organization’s compute clusters process many terabytes of data and generate large sets of intermediate results ready for further computation. During the actual simulation run time, all of this data resides on the ‘scratch’ storage systems that are directly attached to each cluster.

In the past, research teams would store intermediate simulation results on tape, but the limited bandwidth of the tape library made this impractical as data volumes grew rapidly. To analyze their results fully, users would have needed to transfer the data back to their own institutions, which was not feasible because of the low transfer speeds.
“Even over high-speed leased lines, copying data back to a university
network could take weeks, so we wanted to give our users the possibil-
ity of storing their data locally at CSCS for the duration of their proj-
ects,” comments Dominik Ulmer, CSCS general manager. “Equally,
the typical HPC workflow has become more sophisticated: Instead of
simply running a simulation on an input data set, we now often run
multiple simulations in series, using the output data from one as the input data for the next. This tendency to reuse data was another reason for creating a permanent, centralized data storage solution at CSCS.”

“We selected IBM GPFS as it offered the best combination of high scalability, compatibility with our distributed operating systems, parallelism of access, and failover between nodes.”
—Hussein Harake, HPC systems engineer, CSCS

Choosing the best solution
The amount of data handled in the HPC environment at CSCS roughly doubles each year, making it imperative to select a highly scalable architecture for the proposed centralized storage solution. It was also critical to choose a file system that could be mounted on multiple different HPC systems simultaneously, and that would offer both performance and reliability.

“We tested a number of file systems and narrowed our choice down to Oracle Lustre and IBM General Parallel File System [GPFS],” says Hussein Harake, CSCS HPC systems engineer. “We selected IBM GPFS, as it offered the best combination of high scalability, compatibility with our distributed operating systems, parallelism of access and failover between nodes. Our data can be very long-lived, so it was also important to choose a solution that would offer longevity—both in terms of the reliability of long-term data storage and in terms of the vendor support and roadmap. Selecting GPFS from IBM enabled us to meet these requirements.”
Managing rapid growth
GPFS supports single cluster file systems of multiple petabytes and
runs at I/O rates of more than 100 gigabytes per second. Individual
clusters may be cross-connected to provide parallel access to data even
across large geographic distances. At CSCS, GPFS offers both low
latency (needed for high-speed access to small files) and high band-
width (vital for delivering very large files to compute clusters).
“Our GPFS-based central file store is becoming a really important
resource for us,” says Harake. “Users really appreciate the option to
store their data locally rather than needing to copy it back to their own
institution. They are requesting more capacity than we originally antic-
ipated, so the environment is growing faster than expected.”
Ulmer adds, “The rapid rate of growth in data volumes is partly a con-
sequence of researchers being able to run more complex simulations on
the newer HPC clusters. So, to an extent, they are catching up on
projects that couldn’t be done before. GPFS gave us an infrastructure that would grow with user demand but in a way that was predictable in budgetary terms.”

Solution components:

Hardware
● IBM® System x® 3650 M2
● IBM System Storage® DS5100
● IBM System Storage DS5300
● IBM System Storage EXP5000
● IBM System Storage EXP5060

Software
● IBM General Parallel File System

Services
● IBM Global Technology Services

A key decision factor for GPFS was its support for nondisruptive migrations and upgrades. Since first implementing the IBM file system, CSCS has upgraded through three phases of different storage arrays and network switches, all without loss of data or service interruption. Today, the centralized storage solution is based around three IBM System Storage DS5100 controllers with eight IBM System Storage EXP5060 Storage Expansion Enclosures (containing high-capacity SATA disks) and four IBM System Storage EXP5000 Storage Expansion Enclosures (containing high-performance Fibre Channel disks). IBM System x 3650 M2 servers running GPFS act as the file servers; Mellanox gateways and switches provide high-speed InfiniBand networking.
Says Harake, “We have upgraded the file system several times, changed
the disk controllers and even changed the disks themselves, all without
taking the solution down. We will soon upgrade the controllers from
DS5100 to DS5300 and add four more expansion enclosures, which
will expand our total capacity to 2 PB without any interruption to
service.”
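The kind of online expansion Harake describes is, in generic GPFS administration terms, a matter of adding new Network Shared Disks (NSDs) to a mounted file system and rebalancing data in place. The sketch below uses standard GPFS commands; the file system name “scratchfs” and the stanza file are hypothetical illustrations, not details from the CSCS deployment.

```shell
# Sketch only: generic GPFS online capacity expansion.
# "scratchfs" and "newdisks.stanza" are hypothetical examples.

# Define the new LUNs as NSDs from a stanza file, then add them
# to the file system while it remains mounted and in service.
mmcrnsd -F newdisks.stanza
mmadddisk scratchfs -F newdisks.stanza

# Optionally rebalance existing data across old and new disks online.
mmrestripefs scratchfs -b

# Verify the added capacity.
mmdf scratchfs
```

Because the file system stays mounted throughout, running jobs keep reading and writing while capacity grows, which is the behavior the case study highlights.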
Holistic approach
IBM is responsible for supplying and supporting every element in the
centralized storage environment, from the disks up to the network
infrastructure. “We wanted IBM to take ownership of the core net-
work, so that we have a single point of support for the whole environ-
ment,” says Ulmer. “This holistic approach helps us minimize risk and
delays in support.”
He adds, “We consider HPC technology know-how to be our core
competence, and we want to find external partners that are willing to
tackle the really cutting-edge stuff and learn alongside us. Our rela-
tionship with IBM is very good, and we see a lot of value in our shared
workshops. With the GPFS-based centralized storage solution, we
feel that we have the ideal building-block for the coming years. The
IBM solution will enable us to expand our capacity enormously with-
out disruption and without loss of performance.”