SlideShare a Scribd company logo
1 of 52
Download to read offline
Parallelization of
molecular dynamics
Jaewoon Jung
(RIKEN Advanced Institute for Computational Science)
Overview of MD
Molecular Dynamics (MD)
1. Energy/forces are described by classical molecular mechanics force field.
2. Update state according to equations of motion
Long time MD trajectories are important to obtain     
thermodynamic quantities of target systems.
i i
i
i
d
dt m
d
dt


r p
p
F
Equation of motion
Long time MD trajectory
=> Ensemble generation
( ) ( )
( ) ( )
i
i i
i i i
t t t t
m
t t t t
    
    
p
r r
p p F
Integration
Potential energy in MD
4
2
total 0
bonds
2
0
angles
dihedrals
12 61
0 0
1 1
( )
( )
[1 cos( )]
2
b
a
n
N N
ij ij i j
ij
ij ij ijj i j
E k b b
k
V n
r r q q
r r r
 
 


  
 
 
  
                 
       



 
O(N)
O(N)
O(N)
O(N2)
Main bottleneck in MD
12 6 2 2
0 0
2
0
erfc( ) exp( / 4 )
2 FFT( ( ))
ij ij i j ij
ij
ij ij iji j R
r r q q r
Q
r r r
 

  
                  
       
 
k
k
k
k
Real space, O(CN) Reciprocal space, O(NlogN)
Total number of particles
Non-bonded interaction
1. Non-bond energy calculation is reduced by introducing cutoff
2
4
2 U
2. The electrostatic energy calculation beyond cutoff will be done in the reciprocal
space with FFT
3. Further, it could be reduced by properly distributing over parallel processors, in
particular good domain decomposition scheme. 5
U
4
erfc	
| |
2 exp	 G 4⁄
G
G
4
cos	 G ∙
4
Real part Reciprocal part Self energy
O(N2)
O(N1)
Current MD simulations of biological systems
Protein Folding 
by Anton
D. E. Shaw (2011)
(WW domain, 
50 AAs)
50 nm
10 nm
5 nm
100 nm
6
10ns 100ns μs ms sec
Virus system
Schulten (2013)
(HIV virus)
Large biomolecule
Sanbonmatsu (2013) 
(Ribosome)
The First MD
Karplus (1977) 
(BPTI, 3ps)
Crowding system performed by GENESIS :
Comparable to the largest system
performed until now
Time
Size
Difficulty to perform long time MD simulation
1. One time step length (Δt) is limited to 1-2 fs due to vibrations.
2. On the other hand, biologically meaningful events occur on the time scale of
milliseconds or longer.
fs ps ns μs ms sec
vibrations
Sidechain motions
Mainchain motions
Folding
Protein global motions
How to accelerate MD simulations?
=> Parallelization
Serial Parallel
16 cpus
X16?
1cpu
1cpu
Good Parallelization ; 
1) Small amount of
computation in one CPU
2) Small amount of
communication time
C
CPU
Core
MPI 
Comm
Overview of MPI and OpenMP
parallelization
Shared memory parallelization (OpenMP)
M1 M2 M3 MP‐1 MP
P1 P2 P3 PP‐1 PP
…
Distributed memory parallelization (MPI)
M1
P1
M2
P2
M3
P3
MP‐1
PP‐1
MP
PP
…
Hybrid parallelization (MPI+OpenMP)
M1
P1
M2
P2
M3
P3
MP‐1
PP‐1
MP
PP
…
Parallelization of MD (real space)
Parallelization scheme 1 :
Replicated data approach
1. Each processor has a copy of all particle data.
2. Each processor works only part of the whole works by proper assign in do
loops.
do i = 1, N
do j = i+1, N
energy(i,j)
force(i,j)
end do
end do
my_rank = MPI_Rank
proc = total MPI
do i = my_rank+1, N, proc
do j = i+1, N
energy(i,j)
force(i,j)
end do
end do
MPI reduction (energy,force)
Hybrid (MPI+OpenMP) parallelization of the
Replicated data approach
1. Works are distributed over MPI and OpenMP threads.
2. Parallelization is increased by reducing the number of MPIs involved in
communications.
my_rank = MPI_Rank
proc = total MPI
do i = my_rank+1, N, proc
do j = i+1, N
energy(i,j)
force(i,j)
end do
end do
MPI reduction
(energy,force)
my_rank = MPI_Rank
proc = total MPI
nthread = total OMP thread
!$omp parallel
id = omp thread id
my_id = my_rank*nthread + id
do i = my_id+1,N,proc*nthread
do j = i+1, N
energy(i,j)
force(i,j)
end do
end do
Openmp reduciton
!$omp end parallel
MPI reduction (energy,force)
Advantage/Disadvantage of
the Replicated data approach
1. Advantage : easy to implement
2. Disadvantage
1) Parallel efficiency is not good
a) Load imbalance
b) Communication is not reduced by increasing the number
of processors
2) Needs a lot of memory
b) Memory usage is independent of the number of processors
Parallelization scheme 2 :
Domain decomposition
1. The simulation space is
divided into subdomains
according to MPI
2. Domain size is usually
equal to or greater than
the cutoff value.
Advantage/Disadvantage of
the domain decomposition approach
1. Advantage
1) Parallel efficiency is very good compared to the replicated
data method due to small communicational cost
2) The amount of memory is reduced by increasing the number
of processors
2. Disadvantage
1) Difficult to implement
Comparison of two parallelization scheme
Computation Communication Memory
Replicated data O(N/P) O(N) O(N)
Domain
decomposition
O(N/P) O((N/P)2/3) O(N/P)
Domain decomposition of existing MD :
(1) Gromacs
Coordinates in zones 1 to 7 are
communicated to the corner cell 0
1. Gromacs makes use of the 8-th
shell scheme as the domain
decomposition scheme
Rc
8th shell scheme
Ref : B. Hess et al. J. Chem. Theor. Comput. 4, 435 (2008)
Domain decomposition of existing MD :
(1) Gromacs
2. Multiple-Program, Multiple-Data
PME parallelization
1) A subset of processors are
assigned for reciprocal space PME
2) Other processors are assigned for
real space and integration
Ref : S. Pronk et al. Bioinformatics, btt055 (2013)
Domain decomposition of existing MD :
(2) NAMD
1. NAMD is based on the Charmm++ parallel program system and runtime
library.
2. Subdomains named patch are decided according to MPI
3. Forces are calculated by independent compute objects
Ref : L. Kale et al. J. Comput.
Chem. 151, 283 ((1999)
Domain decomposition of existing MD :
(3) Desmond – midpoint method
1. Two particles interact on a particular box if and only if the midpoint of the segment
connecting them falls within the region of space associated with that box
2. This scheme applies not only for non-bonded but also bonded interactions.
Each pair of particles separated by a distance less than R (cutoff distance) is connected by a dashed line segment,
with “x” at its center lying in the box which will compute the interaction of that pair
Ref : KJ. Bowers et al, J. Chem. Phys. 124, 184109 (2006)
Domain decomposition of existing MD :
(4) GENESIS – midpoint cell method
Midpoint method : interaction between
two particles are decided from the
midpoint position of them.
Midpoint cell method : interaction
between two particles are decided from
the midpoint cells where each particle
resides.
Small communication, efficient energy/force 
evaluations
Ref : J. Jung et al, J. Comput. Chem. 35, 1064
(2014)
Communication space
Communicate only with neighboring cells to minimize communication
Eighth shell v.s. Midpoint (cell)
Rc/2
1. Midpoint (cell) and eighth shell method have the same amount of communication
(import volumes are identical to each other)
2. Midpoint (cell) method could be move better than the eighth shell for certain
communication network topologies like the toroidal mesh.
3. Midpoint (cell) could be more advantageous considering cubic decomposition
parallelization for FFT (will be explained after)
Eighth shell Midpoint
Rc
Hybrid parallelization in GENESIS
1. The basic is the Domain
decomposition with the midpoint
scheme
2. Each node consists of at least two
cells in each direction.
3. Within nodes, thread calculation
(OPEN MP) is used for
parallelization and communication
is not necessary
4. Only for different nodes, we allow
point to point communication
(MPI) for neighboring domains.
Efficient shared memory calculation
=>Cell-wise particle data
1. Each cell contains an array with the data of the particles that reside within it
2. This improves the locality of the particle data for operations on individual cells or
pairs of cells.
particle data in traditional cell lists
Ref : P. Gonnet, JCC 33, 76-81 (2012)
cell-wise arrays of particle data
How to parallelize by OPENMP?
1. In the case of integrator, every cell indices are divided according to the thread id.
2. As for the non-bond interaction, cell-pair lists are first identified and cell-pair lists are
distributed to each thread
1
5
2
13
9 10
6
3
14
4
7
11
8
12
1615
(1,2)
(1,5)
(1,6)
(2,3)
(2,5)
(2,6)
(2,7)
(3,4)
(3,6)
(3,7)
(3,8)
(4,7)
(4,8)
(5,6)
(5,9)
(5,10)
(6,7)
(6,10)
…
thread1
thread2
thread3
thread4
thread1
thread2
thread3
thread4
…
Recent trend of MD : CPU => GPU
Image is from informaticamente.over‐blog.it
1. A CPU core can execute 4 or 8 32-bit instructions per clock, whereas a GPU can
execute much more (>3200) 32-bit instructions per clock.
2. Unlike CPUs, GPUs have a parallel throughput architecture that emphasizes executing
many concurrent threads slowly, rather than executing a single thread very quickly.
=> Parallelization on GPU is also important.
Parallelization of MD
(reciprocal space)
Smooth particle mesh Ewald method
Real part Reciprocal part Self energy
The structure factor in the reciprocal part is approximated as
Using Cardinal B-splines of order n Fourier Transform of
charge
  2 2 2
2 2
2
' , 1 1
'
erfc1 1 exp( / )
( ) ( )
2 2
i j c
N N
i j i j
i
i j ii j
r
q q
E S q
V
   
   
  
  
  
 
   n k 0
r r n
r r n k
r k
kr r n
1 2 3 1 1 2 2 3 3 1 2 3( , , ) ( ) ( ) ( ) ( )( , , )S k k k b k b k b k F Q k k k
It is important to parallelize the Fast Fourier transform efficiently in PME!!
Ref : U. Essmann et al, J. Chem. Phys. 103, 8577 (1995)
Simple note of MPI_alltoall and
MPI_allgather communications
MPI_alltoall
MPI_allgather
Parallel 3D FFT – slab(1D) decomposition
1. Each processor is assigned a slob of size
N × N × N/P for computing an N × N
× N FFT on P processors.
2. The parallel scheme of FFTW
3. Even though it is easy to implement, the
scalability is limited by N, the extent of
the data along a single axis
4. In the case of FFTW, N should be
divisible by P
1D decomposition of 3D FFT
Reference: H. Jagode. Master’s thesis, The University of Edinburgh, 2005
1D decomposition of 3D FFT (continued)
1. Slab decomposition of 3D FFT has three steps
 2D FFT (or two 1D FFT) along the two local dimension
 Global transpose
 1D FFT along third dimension
2. Advantage : The fastest on limited number of processors because it only needs
one global transpose
3. Disadvantage : Maximum parallelization is limited to the length of the largest
axis of the 3D data ( The maximum parallelization can be increased by using a
hybrid method combining 1D decomposition with a thread based
parallelization)
Parallel 3D FFT –2D decomposition
1. Each processor is assigned a slob of size N × N/P × N/Q for computing an N
× N × N FFT on P ×Q processors.
2. Current GENESIS adopt this scheme with 1D FFTW
2D decomposition of 3D FFT
Reference: H. Jagode. Master’s thesis, The University of Edinburgh, 2005
2D decomposition of 3D FFT (continued)
1. 2D decomposition of 3D FFT has five steps
 1D FFT along the local dimension
 Global transpose
 1D FFT along the second dimension
 Global transpose
 1D FFT along the third dimension
 Global transpose
2. The global transpose requires communication only between subgroups of all nodes
3. Disadvantage : Slower than 1D decomposition for a number of processors possible with
1D decomposition
4. Advantage : Maximum parallelization is increased
5. Program with this scheme
 Parallel FFT package by Steve Plimpoton (Using MPI_Send and MPI_Irecv)[1]
 FFTE by Daisuke Takahashi (Using MPI_AlltoAll)[2]
 P3DFFT by Dmitry Pekurovsky (Using MPI_Alltoallv)[3]
[1] http://www.sandia.gov/~sjplimp/docs/fft/README.html
[2] http://www.ffte.jp
[3] http://www.sdsc.edu/us/resources/p3dfft.php
2D decomposition of 3D FFT (pseudo-code)
! compute Q factor
do i = 1, natom/P
compute Q_orig
end do
call mpi_alltoall(Q_orig, Q_new, …)
accumulate Q from Q_new
!FFT : F(Q)
do iz = 1, zgrid(local)
do iy = 1, ygrid(local)
work_local = Q(my_rank)
call fftw(work_local)
Q(my_rank) = work_local
end do
end do
call mpi_alltoall(Q, Q_new,…)
do iz = 1, zgrid(local)
do ix = 1, xgrid(local)
work_local = Q_new(my_rank)
call fftw(work_local)
Q(my_rank) = work_local
end do
end do
call mpi_alltoall(Q,Q_new,..)
do iy = 1, ygrid(local)
do ix = 1, xgrid(local)
work_local = Q(my_rank)
call fftw(work_local)
Q(my_rank) = work_local
end do
end do
! compute energy and virial
do iz = 1, zgrid
do iy = 1, ygrid(local)
do ix = 1, xgrid(local)
energy = energy +  sum(Th*Q)
virial = viral + ..
end do
end do
end do
! X=F_1(Th)*F_1(Q)
! FFT (F(X))
do iy = 1, ygrid(local)
do ix = 1, xgrid(local)
work_local = Q(my_rank)
call fftw(work_local)
Q(my_rank) = work_local
end do
end do
call mpi_alltoall(Q,Q_new,..)
do iz = 1, zgrid(local)
do ix = 1, xgrid(local)
work_local = Q(my_rank)
call fftw(work_local)
Q(my_rank) = work_local
end do
end do
call mpi_alltoall(Q,Q_new)
do iz = 1, zgrid(local)
do iy = 1, ygrid(local)
work_local = Q(my_rank)
call fftw(work_local)
Q(my_rank) = work_local
end do
end do
compute force
2 dimensional view of conventional method
Benchmark result of existing MD
programs
Gromacs
Ion channel /w virtual sites : 129,692
Ion channel /w.o virtual site : 141,677
Virus capsid : 1,091,164
Vesicle fusion : 2,511,403
Methanol : 7,680,000
Ref : S. Pronk et al. Bioinformatics, btt055 (2013)
NAMD
Titan (Ref : Y. Sun et al. SC12)
Nodes Cores Timestep(ms)
2048 32768 98.8
4096 65536 55.4
8192 131072 30.3
16384 262144 17.9
20M and 100M system on Blue Gene/Q
(Ref : K. Sundhakar et al. IPDPS, 2013 IEEE)
GENESIS on K (1M atom system)
Ref) J. Jung et al. WIREs CMS, 5, 310-323 (2015)
■ 12M System (Provided by Dr. Isseki Yu)
□ atom size:  11737298
□ macro molecule size:  216 (43 type)
□ metabolites size: 4212 (76 type)
□ ion size:  23049 
(Na+, Cl‐, Mg2+, K+)
□ water size: 2944143
□ PBC size: 480 x 480 x 480(Å3)
□ Volume fraction of water 75%
■ Simulation 
□ MD program:  GENESIS
□ Type of Model: All atom 
□ Computer:  K (8192 (16x32x16) nodes)
□ Parameter: CHARMM
□ Performance 12 ns/day,  (without waiting time)
GENESIS on K (12 M system)
2048 4096 8192 16384 32768 65536 131072
8
10
20
40
60
80
100
200
Simulationtime(ns/day)
Simulationtime(msec/step)
Number of cores
0.5
1
2
4
8
16
256 512 1024 2048 4096 8192 16384 32768
0.125
0.25
0.5
1
2
4
8
Number of cores
Time(msec/step)
FFT Forward
FFT Forward communication
FFT Backward
FFT Backward communication
Send/Receive before force
Send/Receive after force
Simulation of 12 M atoms system
(provided by Dr. Isseki Yu)
49
confidential data
animal cell
mycoplasma
Cytoplasm
10μm
=1/100 mm
300nm
=3/10000mm
100nm = 1/10000mm Collaboration with M.Feig (MSU)
■ 100 M System (provided by Dr. Isseki Yu) 
GENESIS on K (100 M atoms)
16384 32768 65536 131072 262144
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
0.2
0.3
Number of cores
Timeperstep(sec)
16384 32768 65536 131072 262144
0.5
0.6
0.7
0.8
0.9
1
2
3
4
5
6
Simulationtime(ns/day) Number of cores
GENESIS on K (100 M atoms, RESPA)
16384 32768 65536 131072 262144
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
0.2
0.3
Number of cores
Timeperstep(sec)
16384 32768 65536 131072 262144
0.5
0.6
0.7
0.8
0.9
1
2
3
4
5
6
7
8
9
10
Simulationtime(ns/day) Number of cores
Summary
1. Parallelization of MD is important to obtain the long time MD trajectory.
2. Domain decomposition scheme is used for minimal communicational and
computational cost. Energy/forces are described by classical molecular
mechanics force field.
3. Hybrid parallelization (MPI+OpenMP) is very helpful to increase the parallel
efficiency.
4. Parallelization of FFT is very important using PME method in MD.
5. There are several decomposition schemes of parallel FFT, and we suggest
that the cubic decomposition scheme with the midpoint cell is a good solution
for efficient parallelization

More Related Content

What's hot

magnetic materials (160760109010)
  magnetic materials (160760109010)  magnetic materials (160760109010)
magnetic materials (160760109010)Ronak Dhola
 
Band structure
Band structureBand structure
Band structurenirupam12
 
L13 magnetic induction
L13   magnetic inductionL13   magnetic induction
L13 magnetic inductionSatyakam
 
06 laser-basics
06 laser-basics06 laser-basics
06 laser-basicsshubham211
 
PART VII.2 - Quantum Electrodynamics
PART VII.2 - Quantum ElectrodynamicsPART VII.2 - Quantum Electrodynamics
PART VII.2 - Quantum ElectrodynamicsMaurice R. TREMBLAY
 
Ese570 inv stat12
Ese570 inv stat12Ese570 inv stat12
Ese570 inv stat12bheemsain
 
Lecture 4 4521 semiconductor device physics - metal-semiconductor system
Lecture 4   4521 semiconductor device physics - metal-semiconductor systemLecture 4   4521 semiconductor device physics - metal-semiconductor system
Lecture 4 4521 semiconductor device physics - metal-semiconductor systemNedal Al Taradeh
 
Two port networks
Two port networksTwo port networks
Two port networksJaved khan
 
Energy band diagram of semiconductor
Energy band diagram of semiconductorEnergy band diagram of semiconductor
Energy band diagram of semiconductorkavinyazhini
 
Physics barriers and tunneling
Physics barriers and tunnelingPhysics barriers and tunneling
Physics barriers and tunnelingMohamed Anwar
 
SWITCH GEAR & PROTECTIVE DEVICE (EEN-437)
SWITCH GEAR & PROTECTIVE DEVICE (EEN-437)SWITCH GEAR & PROTECTIVE DEVICE (EEN-437)
SWITCH GEAR & PROTECTIVE DEVICE (EEN-437)foyez ahammad
 
Maxwell equation
Maxwell equationMaxwell equation
Maxwell equationKumar
 
Introduction to Electron Correlation
Introduction to Electron CorrelationIntroduction to Electron Correlation
Introduction to Electron CorrelationAlbert DeFusco
 
G Type Antiferromagnetic by Singhan Ganguly
G Type Antiferromagnetic by Singhan GangulyG Type Antiferromagnetic by Singhan Ganguly
G Type Antiferromagnetic by Singhan GangulySINGHAN GANGULY
 

What's hot (20)

magnetic materials (160760109010)
  magnetic materials (160760109010)  magnetic materials (160760109010)
magnetic materials (160760109010)
 
Band structure
Band structureBand structure
Band structure
 
L13 magnetic induction
L13   magnetic inductionL13   magnetic induction
L13 magnetic induction
 
06 laser-basics
06 laser-basics06 laser-basics
06 laser-basics
 
PART VII.2 - Quantum Electrodynamics
PART VII.2 - Quantum ElectrodynamicsPART VII.2 - Quantum Electrodynamics
PART VII.2 - Quantum Electrodynamics
 
Ese570 inv stat12
Ese570 inv stat12Ese570 inv stat12
Ese570 inv stat12
 
2 7plants
2 7plants2 7plants
2 7plants
 
Lecture 4 4521 semiconductor device physics - metal-semiconductor system
Lecture 4   4521 semiconductor device physics - metal-semiconductor systemLecture 4   4521 semiconductor device physics - metal-semiconductor system
Lecture 4 4521 semiconductor device physics - metal-semiconductor system
 
Two port networks
Two port networksTwo port networks
Two port networks
 
Energy band diagram of semiconductor
Energy band diagram of semiconductorEnergy band diagram of semiconductor
Energy band diagram of semiconductor
 
Physics barriers and tunneling
Physics barriers and tunnelingPhysics barriers and tunneling
Physics barriers and tunneling
 
SWITCH GEAR & PROTECTIVE DEVICE (EEN-437)
SWITCH GEAR & PROTECTIVE DEVICE (EEN-437)SWITCH GEAR & PROTECTIVE DEVICE (EEN-437)
SWITCH GEAR & PROTECTIVE DEVICE (EEN-437)
 
Kondo Effect.pptx
Kondo Effect.pptxKondo Effect.pptx
Kondo Effect.pptx
 
Maxwell equation
Maxwell equationMaxwell equation
Maxwell equation
 
Quantum Electronics Lecture 1
Quantum Electronics Lecture 1Quantum Electronics Lecture 1
Quantum Electronics Lecture 1
 
Introduction to Electron Correlation
Introduction to Electron CorrelationIntroduction to Electron Correlation
Introduction to Electron Correlation
 
statistic mechanics
statistic mechanicsstatistic mechanics
statistic mechanics
 
G Type Antiferromagnetic by Singhan Ganguly
G Type Antiferromagnetic by Singhan GangulyG Type Antiferromagnetic by Singhan Ganguly
G Type Antiferromagnetic by Singhan Ganguly
 
Charging C
Charging  CCharging  C
Charging C
 
電路學第七章 交流穩態分析
電路學第七章 交流穩態分析電路學第七章 交流穩態分析
電路學第七章 交流穩態分析
 

Viewers also liked

Aprendizaje colaborativo.
Aprendizaje colaborativo.Aprendizaje colaborativo.
Aprendizaje colaborativo.cribaoc
 
CMSI計算科学技術特論A (2015) 第12回 古典分子動力学法の高速化
CMSI計算科学技術特論A (2015) 第12回 古典分子動力学法の高速化CMSI計算科学技術特論A (2015) 第12回 古典分子動力学法の高速化
CMSI計算科学技術特論A (2015) 第12回 古典分子動力学法の高速化Computational Materials Science Initiative
 
Acontecimientos (3)
Acontecimientos (3)Acontecimientos (3)
Acontecimientos (3)daniel diaz
 
Precured Tyre Retreading machinery catalogue
Precured Tyre Retreading machinery cataloguePrecured Tyre Retreading machinery catalogue
Precured Tyre Retreading machinery cataloguemidhunzone
 
70-483 Programming in C# Complete Study
70-483 Programming in C# Complete Study70-483 Programming in C# Complete Study
70-483 Programming in C# Complete Studydrovioph
 
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPUA comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPUAlex Camargo
 
Social Media e Personal Branding
Social Media e Personal BrandingSocial Media e Personal Branding
Social Media e Personal BrandingMarco Massarotto
 
The united states and canada
The united states and canadaThe united states and canada
The united states and canadaMrO97
 
LE NUMÉRIQUE, POUR TRANSFORMER L’ÉTAT
LE NUMÉRIQUE, POUR TRANSFORMER L’ÉTATLE NUMÉRIQUE, POUR TRANSFORMER L’ÉTAT
LE NUMÉRIQUE, POUR TRANSFORMER L’ÉTATYves Buisson
 
Koller Zurich - Zeichnungen Auktion Dienstag, - Old Master Drawings Auction
Koller Zurich  -  Zeichnungen Auktion Dienstag, - Old Master Drawings AuctionKoller Zurich  -  Zeichnungen Auktion Dienstag, - Old Master Drawings Auction
Koller Zurich - Zeichnungen Auktion Dienstag, - Old Master Drawings AuctionKoller Auctions
 

Viewers also liked (14)

Aprendizaje colaborativo.
Aprendizaje colaborativo.Aprendizaje colaborativo.
Aprendizaje colaborativo.
 
CMSI計算科学技術特論C (2015) MODYLAS と古典MD②
CMSI計算科学技術特論C (2015) MODYLAS と古典MD②CMSI計算科学技術特論C (2015) MODYLAS と古典MD②
CMSI計算科学技術特論C (2015) MODYLAS と古典MD②
 
CMSI計算科学技術特論A (2015) 第12回 古典分子動力学法の高速化
CMSI計算科学技術特論A (2015) 第12回 古典分子動力学法の高速化CMSI計算科学技術特論A (2015) 第12回 古典分子動力学法の高速化
CMSI計算科学技術特論A (2015) 第12回 古典分子動力学法の高速化
 
Acontecimientos (3)
Acontecimientos (3)Acontecimientos (3)
Acontecimientos (3)
 
как себя вести в театре
как себя вести в театрекак себя вести в театре
как себя вести в театре
 
Precured Tyre Retreading machinery catalogue
Precured Tyre Retreading machinery cataloguePrecured Tyre Retreading machinery catalogue
Precured Tyre Retreading machinery catalogue
 
CMSI計算科学技術特論C (2015) MODYLAS と古典MD①
CMSI計算科学技術特論C (2015) MODYLAS と古典MD①CMSI計算科学技術特論C (2015) MODYLAS と古典MD①
CMSI計算科学技術特論C (2015) MODYLAS と古典MD①
 
70-483 Programming in C# Complete Study
70-483 Programming in C# Complete Study70-483 Programming in C# Complete Study
70-483 Programming in C# Complete Study
 
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPUA comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
 
Microsoft 20412D
Microsoft 20412DMicrosoft 20412D
Microsoft 20412D
 
Social Media e Personal Branding
Social Media e Personal BrandingSocial Media e Personal Branding
Social Media e Personal Branding
 
The united states and canada
The united states and canadaThe united states and canada
The united states and canada
 
LE NUMÉRIQUE, POUR TRANSFORMER L’ÉTAT
LE NUMÉRIQUE, POUR TRANSFORMER L’ÉTATLE NUMÉRIQUE, POUR TRANSFORMER L’ÉTAT
LE NUMÉRIQUE, POUR TRANSFORMER L’ÉTAT
 
Koller Zurich - Zeichnungen Auktion Dienstag, - Old Master Drawings Auction
Koller Zurich  -  Zeichnungen Auktion Dienstag, - Old Master Drawings AuctionKoller Zurich  -  Zeichnungen Auktion Dienstag, - Old Master Drawings Auction
Koller Zurich - Zeichnungen Auktion Dienstag, - Old Master Drawings Auction
 

Similar to CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics

El text.tokuron a(2019).jung190711
El text.tokuron a(2019).jung190711El text.tokuron a(2019).jung190711
El text.tokuron a(2019).jung190711RCCSRENKEI
 
第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)RCCSRENKEI
 
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...IOSRJVSP
 
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...ijp2p
 
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...ijp2p
 
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...ijp2p
 
Performance evaluation of energy
Performance evaluation of energyPerformance evaluation of energy
Performance evaluation of energyijp2p
 
An Improved Empirical Mode Decomposition Based On Particle Swarm Optimization
An Improved Empirical Mode Decomposition Based On Particle Swarm OptimizationAn Improved Empirical Mode Decomposition Based On Particle Swarm Optimization
An Improved Empirical Mode Decomposition Based On Particle Swarm OptimizationIJRES Journal
 
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...ijasuc
 
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...ijasuc
 
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...ijasuc
 
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...IJRESJOURNAL
 
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINESAPPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINEScseij
 
Application of particle swarm optimization to microwave tapered microstrip lines
Application of particle swarm optimization to microwave tapered microstrip linesApplication of particle swarm optimization to microwave tapered microstrip lines
Application of particle swarm optimization to microwave tapered microstrip linescseij
 

Similar to CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics (20)

El text.tokuron a(2019).jung190711
El text.tokuron a(2019).jung190711El text.tokuron a(2019).jung190711
El text.tokuron a(2019).jung190711
 
第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)
 
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
 
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
 
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
PERFORMANCE EVALUATION OF ENERGY EFFICIENT CLUSTERING PROTOCOL FOR CLUSTER HE...
 
Performance evaluation of energy
Performance evaluation of energyPerformance evaluation of energy
Performance evaluation of energy
 
An Improved Empirical Mode Decomposition Based On Particle Swarm Optimization
An Improved Empirical Mode Decomposition Based On Particle Swarm OptimizationAn Improved Empirical Mode Decomposition Based On Particle Swarm Optimization
An Improved Empirical Mode Decomposition Based On Particle Swarm Optimization
 
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
 
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
 
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
Energy-Efficient Target Coverage in Wireless Sensor Networks Based on Modifie...
 
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINESAPPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
 
Application of particle swarm optimization to microwave tapered microstrip lines
Application of particle swarm optimization to microwave tapered microstrip linesApplication of particle swarm optimization to microwave tapered microstrip lines
Application of particle swarm optimization to microwave tapered microstrip lines
 
bmi SP.ppt
bmi SP.pptbmi SP.ppt
bmi SP.ppt
 
Bz02516281633
Bz02516281633Bz02516281633
Bz02516281633
 
MD_course.ppt
MD_course.pptMD_course.ppt
MD_course.ppt
 
Modelling the Single Chamber Solid Oxide Fuel Cell by Artificial Neural Network
Modelling the Single Chamber Solid Oxide Fuel Cell by Artificial Neural NetworkModelling the Single Chamber Solid Oxide Fuel Cell by Artificial Neural Network
Modelling the Single Chamber Solid Oxide Fuel Cell by Artificial Neural Network
 

More from Computational Materials Science Initiative

More from Computational Materials Science Initiative (20)

MateriApps LIVE! の設定
MateriApps LIVE! の設定MateriApps LIVE! の設定
MateriApps LIVE! の設定
 
How to setup MateriApps LIVE!
How to setup MateriApps LIVE!How to setup MateriApps LIVE!
How to setup MateriApps LIVE!
 
How to setup MateriApps LIVE!
How to setup MateriApps LIVE!How to setup MateriApps LIVE!
How to setup MateriApps LIVE!
 
MateriApps LIVE!の設定
MateriApps LIVE!の設定MateriApps LIVE!の設定
MateriApps LIVE!の設定
 
MateriApps LIVE! の設定
MateriApps LIVE! の設定MateriApps LIVE! の設定
MateriApps LIVE! の設定
 
How to setup MateriApps LIVE!
How to setup MateriApps LIVE!How to setup MateriApps LIVE!
How to setup MateriApps LIVE!
 
How to setup MateriApps LIVE!
How to setup MateriApps LIVE!How to setup MateriApps LIVE!
How to setup MateriApps LIVE!
 
MateriApps LIVE! の設定
MateriApps LIVE! の設定MateriApps LIVE! の設定
MateriApps LIVE! の設定
 
MateriApps LIVE! の設定
MateriApps LIVE! の設定MateriApps LIVE! の設定
MateriApps LIVE! の設定
 
How to setup MateriApps LIVE!
How to setup MateriApps LIVE!How to setup MateriApps LIVE!
How to setup MateriApps LIVE!
 
How to setup MateriApps LIVE!
How to setup MateriApps LIVE!How to setup MateriApps LIVE!
How to setup MateriApps LIVE!
 
MateriApps LIVE! の設定
MateriApps LIVE! の設定MateriApps LIVE! の設定
MateriApps LIVE! の設定
 
How to setup MateriApps LIVE!
How to setup MateriApps LIVE!How to setup MateriApps LIVE!
How to setup MateriApps LIVE!
 
MateriApps LIVE!の設定
MateriApps LIVE!の設定MateriApps LIVE!の設定
MateriApps LIVE!の設定
 
ALPSチュートリアル
ALPSチュートリアルALPSチュートリアル
ALPSチュートリアル
 
How to setup MateriApps LIVE!
How to setup MateriApps LIVE!How to setup MateriApps LIVE!
How to setup MateriApps LIVE!
 
MateriApps LIVE!の設定
MateriApps LIVE!の設定MateriApps LIVE!の設定
MateriApps LIVE!の設定
 
MateriApps: OpenMXを利用した第一原理計算の簡単な実習
MateriApps: OpenMXを利用した第一原理計算の簡単な実習MateriApps: OpenMXを利用した第一原理計算の簡単な実習
MateriApps: OpenMXを利用した第一原理計算の簡単な実習
 
CMSI計算科学技術特論C (2015) ALPS と量子多体問題②
CMSI計算科学技術特論C (2015) ALPS と量子多体問題②CMSI計算科学技術特論C (2015) ALPS と量子多体問題②
CMSI計算科学技術特論C (2015) ALPS と量子多体問題②
 
CMSI計算科学技術特論C (2015) ALPS と量子多体問題①
CMSI計算科学技術特論C (2015) ALPS と量子多体問題①CMSI計算科学技術特論C (2015) ALPS と量子多体問題①
CMSI計算科学技術特論C (2015) ALPS と量子多体問題①
 

Recently uploaded

Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 

Recently uploaded (20)

Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 

CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics

  • 1. Parallelization of molecular dynamics Jaewoon Jung (RIKEN Advanced Institute for Computational Science)
  • 3. Molecular Dynamics (MD) 1. Energy/forces are described by classical molecular mechanics force field. 2. Update state according to equations of motion Long time MD trajectories are important to obtain      thermodynamic quantities of target systems. i i i i d dt m d dt   r p p F Equation of motion Long time MD trajectory => Ensemble generation ( ) ( ) ( ) ( ) i i i i i i t t t t m t t t t           p r r p p F Integration
  • 4. Potential energy in MD 4 2 total 0 bonds 2 0 angles dihedrals 12 61 0 0 1 1 ( ) ( ) [1 cos( )] 2 b a n N N ij ij i j ij ij ij ijj i j E k b b k V n r r q q r r r                                                O(N) O(N) O(N) O(N2) Main bottleneck in MD 12 6 2 2 0 0 2 0 erfc( ) exp( / 4 ) 2 FFT( ( )) ij ij i j ij ij ij ij iji j R r r q q r Q r r r                                    k k k k Real space, O(CN) Reciprocal space, O(NlogN) Total number of particles
  • 5. Non-bonded interaction 1. Non-bond energy calculation is reduced by introducing cutoff 2 4 2 U 2. The electrostatic energy calculation beyond cutoff will be done in the reciprocal space with FFT 3. Further, it could be reduced by properly distributing over parallel processors, in particular good domain decomposition scheme. 5 U 4 erfc | | 2 exp G 4⁄ G G 4 cos G ∙ 4 Real part Reciprocal part Self energy O(N2) O(N1)
  • 6. Current MD simulations of biological systems Protein Folding  by Anton D. E. Shaw (2011) (WW domain,  50 AAs) 50 nm 10 nm 5 nm 100 nm 6 10ns 100ns μs ms sec Virus system Schulten (2013) (HIV virus) Large biomolecule Sanbonmatsu (2013)  (Ribosome) The First MD Karplus (1977)  (BPTI, 3ps) Crowding system performed by GENESIS : Comparable to the largest system performed until now Time Size
  • 7. Difficulty to perform long time MD simulation 1. One time step length (Δt) is limited to 1-2 fs due to vibrations. 2. On the other hand, biologically meaningful events occur on the time scale of milliseconds or longer. fs ps ns μs ms sec vibrations Sidechain motions Mainchain motions Folding Protein global motions
  • 8. How to accelerate MD simulations? => Parallelization Serial Parallel 16 cpus X16? 1cpu 1cpu Good Parallelization ;  1) Small amount of computation in one CPU 2) Small amount of communication time C CPU Core MPI  Comm
  • 9. Overview of MPI and OpenMP parallelization
  • 10. Shared memory parallelization (OpenMP) M1 M2 M3 MP‐1 MP P1 P2 P3 PP‐1 PP …
  • 11. Distributed memory parallelization (MPI) M1 P1 M2 P2 M3 P3 MP‐1 PP‐1 MP PP …
  • 13. Parallelization of MD (real space)
  • 14. Parallelization scheme 1 : Replicated data approach 1. Each processor has a copy of all particle data. 2. Each processor works only part of the whole works by proper assign in do loops. do i = 1, N do j = i+1, N energy(i,j) force(i,j) end do end do my_rank = MPI_Rank proc = total MPI do i = my_rank+1, N, proc do j = i+1, N energy(i,j) force(i,j) end do end do MPI reduction (energy,force)
  • 15. Hybrid (MPI+OpenMP) parallelization of the Replicated data approach 1. Works are distributed over MPI and OpenMP threads. 2. Parallelization is increased by reducing the number of MPIs involved in communications. my_rank = MPI_Rank proc = total MPI do i = my_rank+1, N, proc do j = i+1, N energy(i,j) force(i,j) end do end do MPI reduction (energy,force) my_rank = MPI_Rank proc = total MPI nthread = total OMP thread !$omp parallel id = omp thread id my_id = my_rank*nthread + id do i = my_id+1,N,proc*nthread do j = i+1, N energy(i,j) force(i,j) end do end do Openmp reduciton !$omp end parallel MPI reduction (energy,force)
  • 16. Advantage/Disadvantage of the Replicated data approach 1. Advantage : easy to implement 2. Disadvantage 1) Parallel efficiency is not good a) Load imbalance b) Communication is not reduced by increasing the number of processors 2) Needs a lot of memory b) Memory usage is independent of the number of processors
  • 17. Parallelization scheme 2 : Domain decomposition 1. The simulation space is divided into subdomains according to MPI 2. Domain size is usually equal to or greater than the cutoff value.
  • 18. Advantage/Disadvantage of the domain decomposition approach 1. Advantage 1) Parallel efficiency is very good compared to the replicated data method due to small communicational cost 2) The amount of memory is reduced by increasing the number of processors 2. Disadvantage 1) Difficult to implement
  • 19. Comparison of two parallelization scheme Computation Communication Memory Replicated data O(N/P) O(N) O(N) Domain decomposition O(N/P) O((N/P)2/3) O(N/P)
  • 20. Domain decomposition of existing MD : (1) Gromacs Coordinates in zones 1 to 7 are communicated to the corner cell 0 1. Gromacs makes use of the 8-th shell scheme as the domain decomposition scheme Rc 8th shell scheme Ref : B. Hess et al. J. Chem. Theor. Comput. 4, 435 (2008)
  • 21. Domain decomposition of existing MD : (1) Gromacs 2. Multiple-Program, Multiple-Data PME parallelization 1) A subset of processors are assigned for reciprocal space PME 2) Other processors are assigned for real space and integration Ref : S. Pronk et al. Bioinformatics, btt055 (2013)
  • 22. Domain decomposition of existing MD : (2) NAMD 1. NAMD is based on the Charmm++ parallel program system and runtime library. 2. Subdomains named patch are decided according to MPI 3. Forces are calculated by independent compute objects Ref : L. Kale et al. J. Comput. Chem. 151, 283 ((1999)
  • 23. Domain decomposition of existing MD : (3) Desmond – midpoint method 1. Two particles interact on a particular box if and only if the midpoint of the segment connecting them falls within the region of space associated with that box 2. This scheme applies not only for non-bonded but also bonded interactions. Each pair of particles separated by a distance less than R (cutoff distance) is connected by a dashed line segment, with “x” at its center lying in the box which will compute the interaction of that pair Ref : KJ. Bowers et al, J. Chem. Phys. 124, 184109 (2006)
  • 24. Domain decomposition of existing MD : (4) GENESIS – midpoint cell method Midpoint method : interaction between two particles are decided from the midpoint position of them. Midpoint cell method : interaction between two particles are decided from the midpoint cells where each particle resides. Small communication, efficient energy/force  evaluations Ref : J. Jung et al, J. Comput. Chem. 35, 1064 (2014)
  • 26. Eighth shell v.s. Midpoint (cell) Rc/2 1. Midpoint (cell) and eighth shell method have the same amount of communication (import volumes are identical to each other) 2. Midpoint (cell) method could be move better than the eighth shell for certain communication network topologies like the toroidal mesh. 3. Midpoint (cell) could be more advantageous considering cubic decomposition parallelization for FFT (will be explained after) Eighth shell Midpoint Rc
  • 27. Hybrid parallelization in GENESIS 1. The basic is the Domain decomposition with the midpoint scheme 2. Each node consists of at least two cells in each direction. 3. Within nodes, thread calculation (OPEN MP) is used for parallelization and communication is not necessary 4. Only for different nodes, we allow point to point communication (MPI) for neighboring domains.
  • 28. Efficient shared memory calculation =>Cell-wise particle data 1. Each cell contains an array with the data of the particles that reside within it 2. This improves the locality of the particle data for operations on individual cells or pairs of cells. particle data in traditional cell lists Ref : P. Gonnet, JCC 33, 76-81 (2012) cell-wise arrays of particle data
  • 29. How to parallelize by OPENMP? 1. In the case of integrator, every cell indices are divided according to the thread id. 2. As for the non-bond interaction, cell-pair lists are first identified and cell-pair lists are distributed to each thread 1 5 2 13 9 10 6 3 14 4 7 11 8 12 1615 (1,2) (1,5) (1,6) (2,3) (2,5) (2,6) (2,7) (3,4) (3,6) (3,7) (3,8) (4,7) (4,8) (5,6) (5,9) (5,10) (6,7) (6,10) … thread1 thread2 thread3 thread4 thread1 thread2 thread3 thread4 …
  • 30. Recent trend of MD : CPU => GPU Image is from informaticamente.over‐blog.it 1. A CPU core can execute 4 or 8 32-bit instructions per clock, whereas a GPU can execute much more (>3200) 32-bit instructions per clock. 2. Unlike CPUs, GPUs have a parallel throughput architecture that emphasizes executing many concurrent threads slowly, rather than executing a single thread very quickly. => Parallelization on GPU is also important.
  • 32. Smooth particle mesh Ewald method Real part Reciprocal part Self energy The structure factor in the reciprocal part is approximated as Using Cardinal B-splines of order n Fourier Transform of charge   2 2 2 2 2 2 ' , 1 1 ' erfc1 1 exp( / ) ( ) ( ) 2 2 i j c N N i j i j i i j ii j r q q E S q V                       n k 0 r r n r r n k r k kr r n 1 2 3 1 1 2 2 3 3 1 2 3( , , ) ( ) ( ) ( ) ( )( , , )S k k k b k b k b k F Q k k k It is important to parallelize the Fast Fourier transform efficiently in PME!! Ref : U. Essmann et al, J. Chem. Phys. 103, 8577 (1995)
  • 33. Simple note of MPI_alltoall and MPI_allgather communications MPI_alltoall MPI_allgather
  • 34. Parallel 3D FFT – slab(1D) decomposition 1. Each processor is assigned a slob of size N × N × N/P for computing an N × N × N FFT on P processors. 2. The parallel scheme of FFTW 3. Even though it is easy to implement, the scalability is limited by N, the extent of the data along a single axis 4. In the case of FFTW, N should be divisible by P
  • 35. 1D decomposition of 3D FFT Reference: H. Jagode. Master’s thesis, The University of Edinburgh, 2005
  • 36. 1D decomposition of 3D FFT (continued) 1. Slab decomposition of 3D FFT has three steps  2D FFT (or two 1D FFT) along the two local dimension  Global transpose  1D FFT along third dimension 2. Advantage : The fastest on limited number of processors because it only needs one global transpose 3. Disadvantage : Maximum parallelization is limited to the length of the largest axis of the 3D data ( The maximum parallelization can be increased by using a hybrid method combining 1D decomposition with a thread based parallelization)
  • 37. Parallel 3D FFT –2D decomposition 1. Each processor is assigned a slob of size N × N/P × N/Q for computing an N × N × N FFT on P ×Q processors. 2. Current GENESIS adopt this scheme with 1D FFTW
  • 38. 2D decomposition of 3D FFT Reference: H. Jagode. Master’s thesis, The University of Edinburgh, 2005
  • 39. 2D decomposition of 3D FFT (continued) 1. 2D decomposition of 3D FFT has five steps  1D FFT along the local dimension  Global transpose  1D FFT along the second dimension  Global transpose  1D FFT along the third dimension  Global transpose 2. The global transpose requires communication only between subgroups of all nodes 3. Disadvantage : Slower than 1D decomposition for a number of processors possible with 1D decomposition 4. Advantage : Maximum parallelization is increased 5. Program with this scheme  Parallel FFT package by Steve Plimpoton (Using MPI_Send and MPI_Irecv)[1]  FFTE by Daisuke Takahashi (Using MPI_AlltoAll)[2]  P3DFFT by Dmitry Pekurovsky (Using MPI_Alltoallv)[3] [1] http://www.sandia.gov/~sjplimp/docs/fft/README.html [2] http://www.ffte.jp [3] http://www.sdsc.edu/us/resources/p3dfft.php
  • 40. 2D decomposition of 3D FFT (pseudo-code) ! compute Q factor do i = 1, natom/P compute Q_orig end do call mpi_alltoall(Q_orig, Q_new, …) accumulate Q from Q_new !FFT : F(Q) do iz = 1, zgrid(local) do iy = 1, ygrid(local) work_local = Q(my_rank) call fftw(work_local) Q(my_rank) = work_local end do end do call mpi_alltoall(Q, Q_new,…) do iz = 1, zgrid(local) do ix = 1, xgrid(local) work_local = Q_new(my_rank) call fftw(work_local) Q(my_rank) = work_local end do end do call mpi_alltoall(Q,Q_new,..) do iy = 1, ygrid(local) do ix = 1, xgrid(local) work_local = Q(my_rank) call fftw(work_local) Q(my_rank) = work_local end do end do ! compute energy and virial do iz = 1, zgrid do iy = 1, ygrid(local) do ix = 1, xgrid(local) energy = energy +  sum(Th*Q) virial = viral + .. end do end do end do ! X=F_1(Th)*F_1(Q) ! FFT (F(X)) do iy = 1, ygrid(local) do ix = 1, xgrid(local) work_local = Q(my_rank) call fftw(work_local) Q(my_rank) = work_local end do end do call mpi_alltoall(Q,Q_new,..) do iz = 1, zgrid(local) do ix = 1, xgrid(local) work_local = Q(my_rank) call fftw(work_local) Q(my_rank) = work_local end do end do call mpi_alltoall(Q,Q_new) do iz = 1, zgrid(local) do iy = 1, ygrid(local) work_local = Q(my_rank) call fftw(work_local) Q(my_rank) = work_local end do end do compute force
  • 41. 2 dimensional view of conventional method
  • 42. Benchmark result of existing MD programs
  • 43. Gromacs Ion channel /w virtual sites : 129,692 Ion channel /w.o virtual site : 141,677 Virus capsid : 1,091,164 Vesicle fusion : 2,511,403 Methanol : 7,680,000 Ref : S. Pronk et al. Bioinformatics, btt055 (2013)
  • 44. NAMD Titan (Ref : Y. Sun et al. SC12) Nodes Cores Timestep(ms) 2048 32768 98.8 4096 65536 55.4 8192 131072 30.3 16384 262144 17.9 20M and 100M system on Blue Gene/Q (Ref : K. Sundhakar et al. IPDPS, 2013 IEEE)
  • 45. GENESIS on K (1M atom system) Ref) J. Jung et al. WIREs CMS, 5, 310-323 (2015)
  • 46. ■ 12M System (Provided by Dr. Isseki Yu) □ atom size:  11737298 □ macro molecule size:  216 (43 type) □ metabolites size: 4212 (76 type) □ ion size:  23049  (Na+, Cl‐, Mg2+, K+) □ water size: 2944143 □ PBC size: 480 x 480 x 480(Å3) □ Volume fraction of water 75% ■ Simulation  □ MD program:  GENESIS □ Type of Model: All atom  □ Computer:  K (8192 (16x32x16) nodes) □ Parameter: CHARMM □ Performance 12 ns/day,  (without waiting time)
  • 47. GENESIS on K (12 M system) 2048 4096 8192 16384 32768 65536 131072 8 10 20 40 60 80 100 200 Simulationtime(ns/day) Simulationtime(msec/step) Number of cores 0.5 1 2 4 8 16 256 512 1024 2048 4096 8192 16384 32768 0.125 0.25 0.5 1 2 4 8 Number of cores Time(msec/step) FFT Forward FFT Forward communication FFT Backward FFT Backward communication Send/Receive before force Send/Receive after force
  • 48. Simulation of 12 M atoms system (provided by Dr. Isseki Yu)
  • 50. GENESIS on K (100 M atoms) 16384 32768 65536 131072 262144 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.2 0.3 Number of cores Timeperstep(sec) 16384 32768 65536 131072 262144 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 6 Simulationtime(ns/day) Number of cores
  • 51. GENESIS on K (100 M atoms, RESPA) 16384 32768 65536 131072 262144 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.2 0.3 Number of cores Timeperstep(sec) 16384 32768 65536 131072 262144 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 6 7 8 9 10 Simulationtime(ns/day) Number of cores
  • 52. Summary 1. Parallelization of MD is important to obtain the long time MD trajectory. 2. Domain decomposition scheme is used for minimal communicational and computational cost. Energy/forces are described by classical molecular mechanics force field. 3. Hybrid parallelization (MPI+OpenMP) is very helpful to increase the parallel efficiency. 4. Parallelization of FFT is very important using PME method in MD. 5. There are several decomposition schemes of parallel FFT, and we suggest that the cubic decomposition scheme with the midpoint cell is a good solution for efficient parallelization