HPC_June2011
1. National Institute for R&D of Isotopic and Molecular Technologies
65-103 Donath Str., P.O.Box 700 RO-400293 Cluj-Napoca 5, ROMANIA
High Performance Computing - Physico-chemical
applications to molecular and biomolecular systems
Calin Gabriel Floare
Max von Laue (1879-1960), Paul Langevin (1872-1946), Joseph Fourier (1768-1830)
2. Outline
• What is parallel and high performance computing ?
• Why Use Parallel computing ?
• IBM BG/P system @ UVT
• GPU & FPGA High Performance Heterogeneous Computing
• INCDTIM Data Center containing a Grid site & a cluster
• The story of a serendipitous discovery
• Molecular Dynamics simulations on a very big system
• HPC-Europa 2 Program
INCDTIM Seminar, June 16, 2011, Cluj-Napoca, Romania 1/30
3. What is parallel computing ?
• Traditionally, software is written for serial computation:
To be run on a single computer having a single CPU
A problem is broken into a discrete series of instructions
Instructions are executed one after the other
Only one instruction may execute at any moment in time
• Parallel computing is the simultaneous use of multiple compute
resources to solve a computational problem:
To be run using multiple CPUs
A problem is broken into discrete parts that can be solved concurrently
Each part is further broken down to a series of instructions
Instructions from each part execute simultaneously on different CPUs
• The compute resources can include:
A single computer with multiple processors
An arbitrary number of computers connected by a network
A combination of both
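The decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sum-of-squares workload and the names `split`, `partial_sum` and `parallel_sum_of_squares` are hypothetical, and a thread pool is used to keep the sketch portable. In CPython a real speed-up for CPU-bound work would need processes or MPI ranks instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Serial work on one part: sum of squares over [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split(n, parts):
    """Break range(n) into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        stop = start + step + (1 if p < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def serial_sum_of_squares(n):
    """Serial version: one stream of instructions over the whole problem."""
    return partial_sum((0, n))


def parallel_sum_of_squares(n, workers=4):
    """Parallel version: each chunk is handed to a separate worker,
    and the partial results are combined at the end."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, split(n, workers)))
```

Both versions return the same value; only the way the work is divided differs.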
4. The Universe is parallel
• Parallel computing is an evolution of serial computing that attempts to emulate what has always been the state
of affairs in the natural world: many complex, interrelated events happening at the same time, yet within a
sequence.
The Real World is massively parallel
5. Why Use parallel computing ?
• Historically, parallel computing has been considered to be the "high end of computing", and has been used to model
difficult scientific and engineering problems found in the real world.
• Today, commercial applications provide an equal or greater driving force in the development of faster computers.
These applications require the processing of large amounts of data in sophisticated ways.
• Why use it ?
Save time and/or money
Solve larger problems
Provide concurrency
Use of non-local resources (SETI@home, Folding@home)
Limits of serial computing (transmission speeds, limits to miniaturization, economic limitations)
https://computing.llnl.gov/tutorials/parallel_comp/
6. Blue Brain and Human Brain Project
http://bluebrain.epfl.ch
The Blue Brain Project is an attempt to create a synthetic brain by reverse-engineering the mammalian and human brain
down to the molecular level.
Founded in May 2005 by the Brain and Mind Institute of the École Polytechnique in Lausanne, Switzerland, the project
aims to study the brain's architectural and functional principles. It is headed by the Institute's director, Henry
Markram.
NEURON is a simulation environment for modeling individual neurons and networks of neurons.
Using a Blue Gene supercomputer running Michael Hines's NEURON software, the simulation does not consist simply of
an artificial neural network, but involves a biologically realistic model of neurons. It is hoped that it will
eventually shed light on the nature of consciousness.
• IBM Blue Gene/P Massively Parallel Computer
• 4 racks, one row, wired as a 16x16x16 3D torus
• 4096 quad-core nodes, PowerPC 450, 850 MHz
• Energy efficient, water cooled
• 56 Tflops peak, 46 Tflops LINPACK
• 16 TB of memory (4 GB per compute node)
• 1 PB of disk space, GPFS parallel file system
• OS Linux SuSE SLES 10
If selected from amongst six other candidates by the Future and Emerging Technologies
(FET) Flagship Program launched by the European Commission, the Blue Brain Project will
upgrade to become the Human Brain Project and will receive funding up to 100 million
euros a year for 10 years.
http://www.neuron.yale.edu/neuron/
7. IBM BG/P system @ UVT
• IBM Blue Gene/P Massively Parallel Computer
• 1x rack, 1024 compute cards (32 compute cards per node card)
• 1x quad-core PowerPC 450 @ 850 MHz per compute card, double FPU
• 4 TB of memory (4 GB RAM per compute card)
• 4x power servers p520
• 2x DS3524 and EXP3000, 2×48 SAS HDDs in total
• GPFS parallel file system
• One Cisco Nexus 7010 Switch with 64x10GbE and 98x1GbE
• 1x Torus Network, 1x Collective network, 1x10GbE network (for I/O)
• OS Linux SuSE SLES 10
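The quoted peak figures can be cross-checked with a short calculation. This is a rough sketch: the value of 4 floating-point operations per cycle per core (a double FPU issuing fused multiply-adds) is an assumption not stated on the slides, and the function names are hypothetical.

```python
def peak_tflops(nodes, cores_per_node=4, clock_hz=850e6, flops_per_cycle=4):
    """Theoretical peak in Tflops.

    flops_per_cycle=4 is an assumption: double FPU, each pipe doing a
    fused multiply-add (2 flops) per cycle.
    """
    return nodes * cores_per_node * clock_hz * flops_per_cycle / 1e12


def memory_tb(nodes, gb_per_node=4):
    """Aggregate memory in TB, with 1 TB taken as 1024 GB."""
    return nodes * gb_per_node / 1024


# Blue Brain system: 4 racks wired as a 16x16x16 torus = 4096 nodes
blue_brain_nodes = 16 ** 3
print(peak_tflops(blue_brain_nodes))  # 55.7056, quoted as 56 Tflops peak
print(memory_tb(blue_brain_nodes))    # 16.0 TB

# UVT system: 1 rack = 1024 compute cards
print(peak_tflops(1024))              # 13.9264
print(memory_tb(1024))                # 4.0 TB
```

Under these assumptions a single rack such as the UVT one peaks at roughly 14 Tflops, one quarter of the four-rack Blue Brain machine.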
IBM BG/P Compute Card
• System-on-a-Chip (SoC)
• PowerPC 450 CPU
850 MHz Frequency
Quad Core
• 4 GB RAM
• Network Connections
Blue Gene/P system overview
8. GPUs (Graphics Processing Units)
In the future, 2010 may be known as the year of the GPU. The Tesla C2050 / Tesla C2070 delivers up to 515
GFLOPS of peak double-precision performance.
The Tesla C2050 comes standard with 3 GB of GDDR5 memory at 144 GB/s bandwidth. The Tesla C2070 comes
standard with 6 GB of GDDR5 memory.
Tesla C2050/C2070
Fermi Architecture
The soul of a supercomputer in the body of a GPU
Octoputer Microway - 8 Tesla cards
NVIDIA Fermi GF100 Block Diagram
CUDA (Compute Unified Device Architecture) is the
computing engine in NVIDIA GPUs
http://www.nvidia.com/cuda
9. FPGAs (Field Programmable Gate Arrays)
A Field Programmable Gate Array (FPGA) is an integrated circuit designed to be configured by the customer
or designer after manufacturing, hence "field-programmable".
Reconfigurable computing uses FPGAs as attached processing elements in a computing system, in order to
dramatically increase the processing speed.
Annapolis Micro Systems, Inc. (Annapolis, Maryland), the leader in Commercial
Off the Shelf (COTS) Field Programmable Gate Array (FPGA) Based High
Performance Computing, announces the availability of its new WILDSTAR 6
PCIe Card, with up to three Xilinx Virtex 6 FPGAs.
Dini Group DNV6F6PCIe
Xilinx Virtex LX550T
Hightech Global Xilinx Virtex 6 PCIe Development Board
Annapolis's Wildstar 6 PCIe
Dr. Wim Vanderbauwhede from Glasgow University
created a 1000-core processor using FPGAs
The Gannet platform aims to make it easier to
design complex reconfigurable Systems-on-Chip.
http://www.dcs.gla.ac.uk/~wim/
http://www.gannetcode.org/
10. INCDTIM Data Center
• Hewlett Packard Blade C7000 with 16 ProLiant BL280c G6 (2 Intel quad-core Xeon X5570 @
2.93 GHz, 16 GB RAM, 500 GB HDD), configured from scratch with Scientific Linux 5.3 (Boron)
and running TORQUE, MAUI, GANGLIA (http://hpc.itim-cj.ro) and NAGIOS
• We installed different Intel compilers, mathematical and MPI libraries
• We are using various molecular dynamics and quantum chemistry codes, such as: AMBER, GROMACS, NAMD,
LAMMPS, CPMD, CP2K, Gaussian, NWCHEM, GAMESS, ORCA, MOLPRO, DFTB+,
Siesta, VASP, Accelrys Materials Studio
• We also host the RO-14-ITIM Grid site (http://grid.itim-cj.ro)
12. The story of a serendipitous discovery [1]
α-cyclodextrin, αCD: the association of 6 glucose units: (C6O5H10)6
4-methylpyridine, 4MP: C6NH7
... and a bit of water
[1] M. Plazanet, C. Floare, M. R. Johnson, R. Schweins, H. P. Trommsdorff, Freezing on heating of liquid solutions,
J. Chem. Phys. 121(11), 5031 (2004); ILL Annual Report 2004, 54-55; and the papers which followed.
13. [Figure: phase diagram of the αCD/4MP solution. Temperature (°C, 40 to 80) versus concentration, αCD [g] / 4MP [l] (100 to 300); solid phase at higher temperature, liquid phase at lower temperature.]
200 g/l ≈ 1 αCD per 50 4MP
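The 1:50 molar ratio can be recovered from the formulas given earlier, (C6O5H10)6 for αCD and C6NH7 for 4MP. The sketch below is only an estimate: the density of liquid 4MP (about 0.957 g/ml near room temperature) and the standard atomic masses are assumptions not given on the slides, and the function names are hypothetical.

```python
# Standard atomic masses in g/mol (assumed values, rounded)
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

ALPHA_CD = {"C": 36, "H": 60, "O": 30}   # (C6O5H10)6
FOUR_MP = {"C": 6, "H": 7, "N": 1}       # C6NH7

# Assumption: approximate density of liquid 4-methylpyridine, g per litre
DENSITY_4MP_G_PER_L = 957.0


def molar_mass(formula):
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def solvent_per_solute(conc_g_per_l):
    """Number of 4MP molecules per αCD molecule at a given concentration
    (grams of αCD per litre of 4MP)."""
    mol_cd = conc_g_per_l / molar_mass(ALPHA_CD)
    mol_4mp = DENSITY_4MP_G_PER_L / molar_mass(FOUR_MP)
    return mol_4mp / mol_cd


print(round(molar_mass(ALPHA_CD), 1))   # 972.8 g/mol
print(round(solvent_per_solute(200.0)))  # 50
```

At 200 g/l this gives about fifty 4MP molecules per αCD, in line with the ratio quoted on the slide.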
14. A movie by A. Filhol, Laue-Langevin Institute
Azobenzene: melts at 66 °C
CD-4MP: freezes at 66 °C
http://www.ill.eu/about/movies/experiments/in16-a-liquid-paradox/
15. [Figure: solubility of αCD in 4MP. Concentration (mg/ml, 0 to 300) versus temperature (°C, 40 to 100).]
16. How can we rationalize these surprising observations?
As temperature increases, entropy must increase. How
is this compatible with the observation that crystalline
order is established and that molecular motions are
slowed down?
Characterize the changes of the structure and of the
molecular dynamics by:
• elastic and inelastic neutron scattering
• neutron and X-ray diffraction
• low-field NMR
• molecular dynamics simulations
19. a) Hysteresis-like fixed window (elastic) scan, IN10, ILL; b) Quasi-elastic neutron spectra, IN5, ILL
20. F. Ding and N. Dokholyan, Trends in Biotechnology 23(9) 450 (2005)
21. Model studied system:
2004 - NPT molecular dynamics simulations using Accelrys CERIUS2 v4.6 with the
COMPASS force field, running on different SGI workstations
A periodic box with the dimensions 24 Å × 24 Å × 24 Å, containing:
one α-CD molecule
50 molecules of 4MP
826 atoms
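The atom count follows directly from the molecular formulas given earlier: (C6O5H10)6 has 126 atoms and C6NH7 has 14. A minimal check, with hypothetical names, which also covers the larger AMBER9 system of the following slide (20 α-CD, 1120 4MP, 240 water molecules):

```python
# Atoms per molecule, counted from the formulas on the slides
ATOMS_PER_MOLECULE = {
    "aCD": 6 * (6 + 5 + 10),  # (C6O5H10)6 -> 126
    "4MP": 6 + 1 + 7,         # C6NH7      -> 14
    "H2O": 3,
}


def total_atoms(composition):
    """Total atom count for a mapping of molecule name -> number of molecules."""
    return sum(ATOMS_PER_MOLECULE[name] * count for name, count in composition.items())


small_box = {"aCD": 1, "4MP": 50}
large_box = {"aCD": 20, "4MP": 1120, "H2O": 240}

print(total_atoms(small_box))  # 826
print(total_atoms(large_box))  # 18920
```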
22. 20 α-CD molecules
1120 molecules of 4MP
240 water molecules
NPT ensemble MD using AMBER9
60 Å³ box
18920 atoms
Speed of 0.22 ns/day (1 core), 0.39 ns/day (2 cores) and 0.69 ns/day (4 cores)
InfiniBand is needed for a further scale-up
An AMBER benchmark on an IBM SP5 cluster (IBM p575 Power 5, bassi.nersc.gov, 118 8-CPU nodes, 1.9 GHz
Power 5+ CPU, 2 MB L2 cache, 36 MB L3 cache, 32 GB memory per node) produced 22 ns/day when using 256
cores, on a system containing around 23500 atoms.
• Initially we have to optimize the force
fields using the force-matching method
• 100 ns long trajectories at different
temperatures must be calculated for good
statistics
• Hydrogen-bond dynamics and cluster
formation analysis
• Correlation coefficients
This system will be studied at
CINECA, Italy, in a project funded by the
HPC-Europa2 program, on 256 CPUs
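The timings quoted on this slide (0.22, 0.39 and 0.69 ns/day on 1, 2 and 4 cores) translate into speed-up and parallel efficiency as sketched below, together with the wall-clock time one 100 ns trajectory would need. The function names are hypothetical, and the 22 ns/day figure comes from a different benchmark system, so the last estimate is only indicative.

```python
# Measured throughput from the slide: cores -> ns of simulation per day
NS_PER_DAY = {1: 0.22, 2: 0.39, 4: 0.69}


def speedup(cores):
    """Speed-up relative to the single-core run."""
    return NS_PER_DAY[cores] / NS_PER_DAY[1]


def efficiency(cores):
    """Parallel efficiency: speed-up divided by the number of cores."""
    return speedup(cores) / cores


def days_needed(trajectory_ns, ns_per_day):
    """Wall-clock days to produce a trajectory of the given length."""
    return trajectory_ns / ns_per_day


for cores in (2, 4):
    print(cores, round(speedup(cores), 2), round(efficiency(cores), 2))
# 2 1.77 0.89
# 4 3.14 0.78

print(round(days_needed(100, 0.69)))     # 145 days on 4 cores
print(round(days_needed(100, 22.0), 1))  # 4.5 days at the benchmark rate
```

At 4 cores a single 100 ns trajectory would take about five months, which is why a larger machine with a fast interconnect is needed.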
23. GPU Codes
24. Simulations of 1-million-atom systems are now
possible on a desktop workstation
Amber 11 GPU performance compared with that
on Kraken@ORNL: dihydrofolate reductase
(DHFR) solvated in water, 23558 atoms.
25. • Milu (Miramare Interoperable Lite User Interface), a tool to easily set up a
UI on (almost) any machine
(https://eforge.escience-lab.org/gf/project/milu/)
• BEMuSE: Bias-Exchange Metadynamics Submission Environment
(https://euindia.ictp.it/bemuse/)
• EPICO – eLab Procedure for Installation and Configuration
(http://epico.escience-lab.org/)
• Training Tools: GRID Seed (http://gridseed.escience-lab.org)
Moodle Platform (http://www.moodle.org)
• Amazon Elastic Compute Cloud (EC2) - from $0.02 per hour
http://aws.amazon.com/ec2/pricing/
http://aws.amazon.com/ec2/instance-types/
26. To learn more:
• Freezing on heating of liquid solutions, M. Plazanet, C. Floare,
M.R. Johnson, R. Schweins, H.P. Trommsdorff, J. Chem. Phys.
121 (2004) 5031
• J. Chem. Phys. 125 (2005) 154504
• Chem. Phys. 317 (2006) 153
• Chem. Phys. 331 (2006) 35
• J. Phys. Cond. Mat. 19 (2007) 205108
• Phys. Chem. Chem. Phys. 12 (2010) 7026
33. Molecular Dynamics Method
"Molecular dynamics (MD) provides the methodology for detailed microscopic modeling on
the molecular scale. The theoretical underpinnings amount to little more than Newton's laws
of motion. After all, the nature of matter is to be found in the structure and motion of its
constituent building blocks, and the dynamics is contained in the solution of the N-body
problem"*
The classical N-body problem lacks a general analytical solution, so the only path open is the numerical
one
Deterministic – provides us with a trajectory of the system
• From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.
• Integrating infinitesimal steps yields the trajectory of the system for any desired time range.
• There are efficient methods for integrating these elementary steps with Verlet and leapfrog algorithms being
the most commonly used.
Use physics to find the potential energy between all pairs of atoms
Move atoms to the next state
Repeat
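The force/move/repeat loop above can be sketched with the velocity form of the Verlet algorithm. This is a minimal illustration, not the integrator of any particular MD package: a single particle in a one-dimensional harmonic well stands in for the atoms, and the names `velocity_verlet`, `harmonic_force` and `total_energy` are hypothetical.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate Newton's equations with the velocity Verlet scheme.

    Returns the trajectory as a list of (position, velocity) pairs,
    including the initial state.
    """
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt  # move to the next position
        a_new = force(x) / mass             # forces at the new position
        v = v + 0.5 * (a + a_new) * dt      # update velocity with averaged acceleration
        a = a_new
        trajectory.append((x, v))           # repeat
    return trajectory


def harmonic_force(x, k=1.0):
    """Force from the potential U(x) = 0.5 * k * x**2."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Usage: start at x = 1 with zero velocity and integrate up to t = 10
trajectory = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, steps=1000)
energies = [total_energy(x, v) for x, v in trajectory]
print(max(energies) - min(energies))  # stays small: the scheme conserves energy well
```

In a real simulation `force` would evaluate the gradient of the force field over all atoms, which is the expensive part of each step.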
* D. C. Rapaport, The Art of Molecular Dynamics Simulation, Cambridge University Press (2004)
34. Energy function
• Target function that MD tries to optimize
• Describes the interaction energies of all atoms and molecules in the system
• Always an approximation: it gets closer to real physics (accuracy increases) with more computation time,
smaller time steps and more interactions
AMBER Force Field
Covalent terms: bond, angle, dihedral
Non-covalent terms: van der Waals, electrostatic, polarization, implicit solvation
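The individual terms can be written down as small functions. The sketch below uses the functional forms commonly associated with AMBER-style force fields (harmonic bonds and angles, cosine dihedrals, 12-6 van der Waals, Coulomb electrostatics). The parameter names and the rounded Coulomb constant are assumptions for illustration, and polarization and implicit solvation are left out.

```python
import math

# Assumption: approximate Coulomb constant in kcal*Å/(mol*e^2)
COULOMB_CONSTANT = 332.06


def bond_energy(r, k, r0):
    """Harmonic bond stretch: k * (r - r0)^2."""
    return k * (r - r0) ** 2


def angle_energy(theta, k, theta0):
    """Harmonic angle bend: k * (theta - theta0)^2, angles in radians."""
    return k * (theta - theta0) ** 2


def dihedral_energy(phi, v_n, n, gamma):
    """Torsion term: (V_n / 2) * (1 + cos(n * phi - gamma))."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def van_der_waals_energy(r, epsilon, r_min):
    """12-6 Lennard-Jones term with well depth epsilon at separation r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)


def electrostatic_energy(r, q_i, q_j, dielectric=1.0):
    """Coulomb interaction between point charges q_i and q_j (in units of e)."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)
```

The total energy is the sum of these terms over all bonds, angles, dihedrals and non-bonded atom pairs; the pair sums dominate the cost, which is what parallel and GPU codes target.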