SlideShare a Scribd company logo
1 of 20
Download to read offline
1/20 
University of Tokyo 
Huge-Scale Molecular Dynamics 
Simulation of Multi-bubble Nuclei 
ISSP, The University of Tokyo 
M. Suzuki Kyushu University 
H. Inaoka RIKEN AICS 
The University of Tokyo, RIKEN AICS 
Outline 
H. Watanabe 
N. Ito 
1. Introduction 
2. Benchmark results 
3. Cavitation 
4. Future of HPC 
5. Summary and Discussion 
May. 15th, 2014@NCSA Blue Waters Symposium
2/20 
University of Tokyo 
Gas-Liquid Multi-Phase Flow Hierarchical 
Phase 
Transi+on 
Modeling 
Bubble 
Interac+on 
Flow 
Par+cle 
Interac+on 
Direct 
Calcula+on 
Multi-Scale and Multi-Physics Problem 
Numerical Difficulties 
Moving, Annihilation and Creation 
of Boundary 
Hierarchical Modeling 
Divide a problem into sub problems 
Governing Equations for each scale 
Artificial Hierarchy 
Validity of Governing Equation 
Direct Simulation 
Movie 
Introduction (1/2)
3/20 
Classical Nucleation Theory 
Classical nucleation theory (CNT) predicts a nucleation rate of clusters. 
OK for droplet nuclei, bad for bubble nuclei. 
Interaction between clusters 
Interact via gas (diffusive) 
Interact via liquid (ballistic) 
University of Tokyo 
Introduction (2/2) 
Droplets and Bubbles 
Additional work to create a bubble 
Same as Droplet 
Work by Bubble 
It is difficult to investigate microscopic behavior directly. 
Huge scale molecular dynamics simulation 
to observe interaction between bubbles.
4/20 
University of Tokyo 
Simulation Method 
Molecular Dynamics (MD) method with Short-range Interaction 
General Approach 
Parallelization (1/3) 
MPI: Domain Decomposition 
OpenMP: Loop Decomposition 
DO I=1,N 
DO J in Pair(I) 
CalcForce(I,J) 
ENDDO 
ENDDO 
Loop Decomposition here 
or here 
Problems 
Conflicts on write back of momenta 
→ Prepare temporary buffer to avoid conflicts 
→ Do not use Newton’s third law (calculate every force twice) 
We have to thread parallelize other parts, such as pair-list construction. 
We have to treat SIMDization and thread parallelization simultaneously.
5/20 
University of Tokyo 
Pseudo-flat MPI 
Full domain decomposition for both intra- and inner-nodes. 
MDUnit class: 
Represent an OpenMP thread 
Responsible for calculation 
MDManager class: 
Represent an MPI process 
Responsible for communication 
DO I=1,THREAD_NUM 
CALL MDUnit[i]->Calculate() 
ENDDO 
Loop parallelization 
MDUnit 
MDUnit 
MDUnit 
MDUnit 
Thread 
MDUnit 
MDUnit 
MDUnit 
Process 
MDUnit 
Parallelization (2/3) 
Merits 
It is straightforward to implement pseudo-flat MPI codes from flat MPI codes. 
Memory locality is automatically satisfied. (not related to the K computer) 
We have to take care SIMDization only at the hot-spot (force calculation). 
→ We do not have to take care two kinds of load balance simultaneously.
6/20 
Parallelization (3/3) 
Demerits 
Two-level communication is required. 
→ Limitation of MPI call in thread. 
→ Data should be packed before being sent by a master thread. 
It is troublesome to compute physical quantities such as pressure, and so on. 
University of Tokyo 
MDManager 
MDManager 
MDUnit 
MDUnit 
MDUnit 
MDUnit 
Communication 
MDUnit 
MDUnit 
MDUnit 
MDUnit
7/20 
University of Tokyo 
Pair-list Construction 
Algorithms for MD 
Grid Search 
O(N) method to find interacting particle-pairs 
Bookkeeping method 
Reuse the pair-list by registering pairs in 
a length longer than truncation length. 
z 
Sorting of Pair-list 
Sort by i-particle in order to reduce memory access 
Truncation 
Registration
8/20 
University of Tokyo 
SIMDization 
Non-use of Newton’s third law 
We find that store operations involving indirect addressing is quite 
expensive for the K computer. 
We avoid indirect storing by not using Newton’s third law. 
Computations becomes double, but speed up is more than twofold. 
The value of FLOPS is meaningless. 
Hand-SIMDization 
Compiler cannot generate SIMD operations efficiently. 
We use intrinsic SIMD instructions explicitly. (emmintrin.h) 
Divide instruction (fdivd) cannot be specified as a SIMD instruction. 
We use a reciprocal approximation (frcpd) which is fast, but accuracy 
is only 8bit. 
We improve the precision by additional calculations. 
The loop is unrolled by four times to enhance the software pipelining. 
You can see the source codes from: http://mdacp.sourceforge.net/
9/20 
University of Tokyo 
Benchmark (1/3) - Conditions - 
Conditions 
Truncated Lennard-Jones potential with cutoff length 2.5 (in LJ units) 
Initial Condition is FCC and density is 0.5 
Integration Scheme: 2nd order symplectic with dt = 0.001 
0.5 million particles on a CPU-core (4 million particles on a node) 
After 150 steps, measure time required for 1000 steps. 
Parallelization 
Flat MPI : 8 processes /node up to 32768 nodes (4.2 PF) 
Hybrid : 8 threads/node   up to 82944 nodes (10.6 PF) 
Computations assigned to each core are perfectly identical for both scheme. 
1 CPU-core 
1 node 
8 nodes 
0.5 million 
particles 
4 million 
particles 
32 million 
particles
10/20 
University of Tokyo 
500 
400 
300 
200 
100 
0 
Benchmark (2/3) - Elapse Time - 
1 8 64 512 4096 82944 
Elapsed Time [sec] 
Nodes 
Flat-MPI 
Hybrid 
332 billion particles 
72.3% efficiency 
131 billion particles 
92.0% efficiency 
Nice scaling for the flat MPI (92% efficiency compared with a single node) 
Performance of the largest run: 1.76 PFLOPS (16.6%)
11/20 
University of Tokyo 
9 
8 
7 
6 
5 
4 
3 
2 
1 
0 
Benchmark (3/3) - Memory Usage - 
1 8 64 512 4096 82944 
Memory Usage [GiB] 
Nodes 
Flat-MPI 
Hybrid 
8.52 GiB 
39% increase 
5.66 GiB 
2.8% increase 
Usage of memory is almost flat for hybrid. 
extra 1GB/node for Flat-MPI 
We performed product runs using 4096 nodes adopting flat-MPI.
12/20 
University of Tokyo 
Time Evolution of Multi-bubble nuclei 
Equilibrium Liquid 
Adiabatic expansion 
(2.5% on a side) 
Multi-bubble nuclei 
Converge Ostwald-ripening to a single bubble 
Larger bubbles becomes larger, 
smaller bubbles becomes smaller
13/20 
University of Tokyo 
Density 
Temperature 
G 
Coexistence 
Liquid 
Condition 
Initial Condition : Pure Liquid 
Truncation Lenght:3.0 
Periodic Boundary Conditon: 
Time Step:0.005 
Integration:2nd order Symplectic (NVE) 
     2nd order RESPA (NVT) 
Computational Scale 
Resource:4096 nodes of K computer (0.5PF) 
Flat-MPI: 32768 processes 
Parallelization:Simple Domain Decomposition 
Particles: 400 〜 700 million particles 
System Size: L = 960 (in LJ units) 
Thermalize: 10,000 steps + Observation:1,000,000 steps 
Movie 
Details of Simulation 
24 hours / run 
1 run 〜 0.1million node hours 
10 samples =1 million node hours
14/20 
Multi-bubble Nuclei and Ostwald-like Ripening 
intermediate stage 
Final stage 
University of Tokyo 
23 million particles 
Initial stage 
Multi nuclei 
Ostwald-like ripening 
One bubble survives 
1.5 billion particles 
Bubbles appear a the results of particle interactions. 
Ostwald-like ripening is observed as the results of bubble interactions. 
→ Direct simulations of multi-scale and multi-physics phenomena.
15/20 
A type of binarization 
University of Tokyo 
Identification of Bubbles 
Definition of Bubble 
Divide a system into small cells. 
A cell with density smaller than some threshold is defined to be gas phase. 
Identify bubbles on the basis of site-percolation criterion (clustering). 
Process A 
Process B 
Liquid 
Gas 
Bubble 
We do not have to take 
care of byte-order! 
Data Format 
A number of particles in a cell is stored as unsigned char (1 Byte). 
A system is divided into 32.7 million cells → 32.7 MB / snap shots. 
1000 frames/run → 32.7GB/run. 
Clustering is performed in post process.
16/20 
1.5 billion particles 
23 million particles 
University of Tokyo 
Bubble-size distribution 
Bubble size distribution at the 10,000th step after the expansion. 
Accuracy of the smaller run is insufficient for further analysis. 
→ We need 1 billion particles to study population dynamics of bubbles 
which is impossible without Peta-scale computers.
17/20 
shared with the classical nucleation theory 
f(v, t) ⇠ ty ˜ f(vtx) 
University of Tokyo 
Lifshitz-Slyozov-Wagner (LSW) Theory 
Definitions 
a number of bubbles which has volume v at time t. 
f(v, t) 
@f 
@t 
=  
@ 
@v 
v˙(v, t) Kinetic Term (Microscopic Dynamics) 
Assumptions 
( ˙ vf) Governing Equation 
- Mean field approximation 
- Mechanical equilibrium 
- Conservation of the total volume of gas phase 
- Self-similarity of the distribution function 
Predictions 
Power-law behaviors 
n(t) ⇠ tx 
¯v(t) ⇠ tx 
The total number of bubbles 
The average volume of bubbles 
The scaling index x depends on microscopic dynamics.
18/20 
Scaling Region 
University of Tokyo 
Simulations Results (1/2) 
Number of Bubbles 
Average Volume of Bubbles 
Slope: x=1.5 
Scaling Region 
Power-law behavior in late-stage of time evolution. 
Sharing the value of an exponent as predicted by the theory. 
The LSW theory works fine. 
Assumptions are valid. 
We cannot obtain the scaling region using small systems. 
Huge-scale simulation is necessary.
19/20 
University of Tokyo 
Simulations Results (2/2) 
Scaling exponent changed from 1.5 to 1 
High T 
Low T 
Reaction-Limit 
x=3/2 
Diffusion-Limit 
x=1 
Microscopic dynamics can be investigated from macroscopic behavior.
20/20 
Simulations involving billions of particles allows us to investigate multi-scale 
University of Tokyo 
Summary and Discussion 
Summary 
We did physics using a peta-scale computer. 
References 
MDACP (Molecular Dynamics code for Avogadro Challenge Project) 
Source codes are available online: http://mdacp.sourceforge.net/ 
Financial Supports: 
CMSI 
KAUST GRP (KUK-I1-005-04) 
KAKENHI (23740287) 
and multi-physics problems directly. 
Toward Exa-scale Computing 
Please give us decent programming language! 
MPI (library), OpenMP (directive), SIMD (intrinsic functions) 
Acknowledgements 
Computational Resources: 
“Large-scale HPC Challenge” Project, ITC, Univ. of Tokyo 
Institute of Solid State Physics, Univ. of Tokyo 
HPCI project (hp130047) on the K computer, AICS, RIKEN

More Related Content

What's hot

Developing Multithreaded Applications
Developing Multithreaded ApplicationsDeveloping Multithreaded Applications
Developing Multithreaded Applications
Bharat17485
 
Multithreading In Java
Multithreading In JavaMultithreading In Java
Multithreading In Java
parag
 

What's hot (18)

Multi threading
Multi threadingMulti threading
Multi threading
 
Multi-threaded Programming in JAVA
Multi-threaded Programming in JAVAMulti-threaded Programming in JAVA
Multi-threaded Programming in JAVA
 
Learning Java 3 – Threads and Synchronization
Learning Java 3 – Threads and SynchronizationLearning Java 3 – Threads and Synchronization
Learning Java 3 – Threads and Synchronization
 
Java And Multithreading
Java And MultithreadingJava And Multithreading
Java And Multithreading
 
Multithreading in Java
Multithreading in JavaMultithreading in Java
Multithreading in Java
 
Life cycle-of-a-thread
Life cycle-of-a-threadLife cycle-of-a-thread
Life cycle-of-a-thread
 
Multithreading in java
Multithreading in javaMultithreading in java
Multithreading in java
 
Jeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYCJeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYC
 
Multithreading Concepts
Multithreading ConceptsMultithreading Concepts
Multithreading Concepts
 
Java Multithreading
Java MultithreadingJava Multithreading
Java Multithreading
 
Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYC
Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYCTed Willke, Senior Principal Engineer, Intel Labs at MLconf NYC
Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYC
 
Python Training in Bangalore | Multi threading | Learnbay.in
Python Training in Bangalore | Multi threading | Learnbay.inPython Training in Bangalore | Multi threading | Learnbay.in
Python Training in Bangalore | Multi threading | Learnbay.in
 
Developing Multithreaded Applications
Developing Multithreaded ApplicationsDeveloping Multithreaded Applications
Developing Multithreaded Applications
 
[DL輪読会]Pay Attention to MLPs (gMLP)
[DL輪読会]Pay Attention to MLPs	(gMLP)[DL輪読会]Pay Attention to MLPs	(gMLP)
[DL輪読会]Pay Attention to MLPs (gMLP)
 
L22 multi-threading-introduction
L22 multi-threading-introductionL22 multi-threading-introduction
L22 multi-threading-introduction
 
Shogun 2.0 @ PyData NYC 2012
Shogun 2.0 @ PyData NYC 2012Shogun 2.0 @ PyData NYC 2012
Shogun 2.0 @ PyData NYC 2012
 
Multithreading In Java
Multithreading In JavaMultithreading In Java
Multithreading In Java
 
Java threads
Java threadsJava threads
Java threads
 

Similar to Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei

A new Leach protocol based on ICH-Leach for adaptive image transferring using...
A new Leach protocol based on ICH-Leach for adaptive image transferring using...A new Leach protocol based on ICH-Leach for adaptive image transferring using...
A new Leach protocol based on ICH-Leach for adaptive image transferring using...
TELKOMNIKA JOURNAL
 
Quantum Computers
Quantum ComputersQuantum Computers
Quantum Computers
kathan
 
The Dielectric Relaxation Properties And Dipole Ordering...
The Dielectric Relaxation Properties And Dipole Ordering...The Dielectric Relaxation Properties And Dipole Ordering...
The Dielectric Relaxation Properties And Dipole Ordering...
Sarah Gordon
 
Design Novel CMOL Architecture in Terabit-scale Nano Electronic Memories Tech...
Design Novel CMOL Architecture in Terabit-scale Nano Electronic Memories Tech...Design Novel CMOL Architecture in Terabit-scale Nano Electronic Memories Tech...
Design Novel CMOL Architecture in Terabit-scale Nano Electronic Memories Tech...
IJMTST Journal
 

Similar to Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei (20)

El text.tokuron a(2019).jung190711
El text.tokuron a(2019).jung190711El text.tokuron a(2019).jung190711
El text.tokuron a(2019).jung190711
 
第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)
 
What can we learn from molecular dynamics simulations of carbon nanotube and ...
What can we learn from molecular dynamics simulations of carbon nanotube and ...What can we learn from molecular dynamics simulations of carbon nanotube and ...
What can we learn from molecular dynamics simulations of carbon nanotube and ...
 
CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics
CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics
CMSI計算科学技術特論A (2015) 第13回 Parallelization of Molecular Dynamics
 
Comparing ICH-leach and Leach Descendents on Image Transfer using DCT
Comparing ICH-leach and Leach Descendents on Image Transfer using DCT Comparing ICH-leach and Leach Descendents on Image Transfer using DCT
Comparing ICH-leach and Leach Descendents on Image Transfer using DCT
 
Entropy 12-02268-v2
Entropy 12-02268-v2Entropy 12-02268-v2
Entropy 12-02268-v2
 
Testo
TestoTesto
Testo
 
Nano mos25
Nano mos25Nano mos25
Nano mos25
 
Quantum computing1
Quantum computing1Quantum computing1
Quantum computing1
 
Quantum comput ing
Quantum comput ingQuantum comput ing
Quantum comput ing
 
A new Leach protocol based on ICH-Leach for adaptive image transferring using...
A new Leach protocol based on ICH-Leach for adaptive image transferring using...A new Leach protocol based on ICH-Leach for adaptive image transferring using...
A new Leach protocol based on ICH-Leach for adaptive image transferring using...
 
Physical Modeling and Design for Phase Change Memories
Physical Modeling and Design for Phase Change MemoriesPhysical Modeling and Design for Phase Change Memories
Physical Modeling and Design for Phase Change Memories
 
Quantum Computers
Quantum ComputersQuantum Computers
Quantum Computers
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...
 
A NOVEL FULL ADDER CELL BASED ON CARBON NANOTUBE FIELD EFFECT TRANSISTORS
A NOVEL FULL ADDER CELL BASED ON CARBON NANOTUBE FIELD EFFECT TRANSISTORSA NOVEL FULL ADDER CELL BASED ON CARBON NANOTUBE FIELD EFFECT TRANSISTORS
A NOVEL FULL ADDER CELL BASED ON CARBON NANOTUBE FIELD EFFECT TRANSISTORS
 
Energy Efficient Wireless Internet Access
Energy Efficient Wireless Internet AccessEnergy Efficient Wireless Internet Access
Energy Efficient Wireless Internet Access
 
The Dielectric Relaxation Properties And Dipole Ordering...
The Dielectric Relaxation Properties And Dipole Ordering...The Dielectric Relaxation Properties And Dipole Ordering...
The Dielectric Relaxation Properties And Dipole Ordering...
 
Design Novel CMOL Architecture in Terabit-scale Nano Electronic Memories Tech...
Design Novel CMOL Architecture in Terabit-scale Nano Electronic Memories Tech...Design Novel CMOL Architecture in Terabit-scale Nano Electronic Memories Tech...
Design Novel CMOL Architecture in Terabit-scale Nano Electronic Memories Tech...
 
Analytical instrumentation introduction
Analytical instrumentation   introductionAnalytical instrumentation   introduction
Analytical instrumentation introduction
 
Self-Balancing Multimemetic Algorithms in Dynamic Scale-Free Networks
Self-Balancing Multimemetic Algorithms in Dynamic Scale-Free NetworksSelf-Balancing Multimemetic Algorithms in Dynamic Scale-Free Networks
Self-Balancing Multimemetic Algorithms in Dynamic Scale-Free Networks
 

More from Hiroshi Watanabe

More from Hiroshi Watanabe (11)

Enjoy supercomputing
Enjoy supercomputingEnjoy supercomputing
Enjoy supercomputing
 
CMSI計算科学技術特論A(8) 高速化チューニングとその関連技術2
CMSI計算科学技術特論A(8) 高速化チューニングとその関連技術2CMSI計算科学技術特論A(8) 高速化チューニングとその関連技術2
CMSI計算科学技術特論A(8) 高速化チューニングとその関連技術2
 
CMSI計算科学技術特論A(8) 高速化チューニングとその関連技術1
CMSI計算科学技術特論A(8) 高速化チューニングとその関連技術1CMSI計算科学技術特論A(8) 高速化チューニングとその関連技術1
CMSI計算科学技術特論A(8) 高速化チューニングとその関連技術1
 
AVX命令を用いたLJの力計算のSIMD化
AVX命令を用いたLJの力計算のSIMD化AVX命令を用いたLJの力計算のSIMD化
AVX命令を用いたLJの力計算のSIMD化
 
MDとはなにか
MDとはなにかMDとはなにか
MDとはなにか
 
短距離ハイブリッド並列分子動力学コードの設計思想と説明のようなもの 〜並列編〜
短距離ハイブリッド並列分子動力学コードの設計思想と説明のようなもの 〜並列編〜短距離ハイブリッド並列分子動力学コードの設計思想と説明のようなもの 〜並列編〜
短距離ハイブリッド並列分子動力学コードの設計思想と説明のようなもの 〜並列編〜
 
短距離ハイブリッド並列分子動力学コードの設計思想と説明のようなもの
短距離ハイブリッド並列分子動力学コードの設計思想と説明のようなもの短距離ハイブリッド並列分子動力学コードの設計思想と説明のようなもの
短距離ハイブリッド並列分子動力学コードの設計思想と説明のようなもの
 
130613-debug
130613-debug130613-debug
130613-debug
 
ハイパースレッディングの並列化への影響
ハイパースレッディングの並列化への影響ハイパースレッディングの並列化への影響
ハイパースレッディングの並列化への影響
 
短距離古典分子動力学計算の 高速化と大規模並列化
短距離古典分子動力学計算の 高速化と大規模並列化短距離古典分子動力学計算の 高速化と大規模並列化
短距離古典分子動力学計算の 高速化と大規模並列化
 
Tuning, etc.
Tuning, etc.Tuning, etc.
Tuning, etc.
 

Recently uploaded

Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
AlMamun560346
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 

Recently uploaded (20)

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 

Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei

  • 1. 1/20 University of Tokyo Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei ISSP, The University of Tokyo M. Suzuki Kyushu University H. Inaoka RIKEN AICS The University of Tokyo, RIKEN AICS Outline H. Watanabe N. Ito 1. Introduction 2. Benchmark results 3. Cavitation 4. Future of HPC 5. Summary and Discussion May. 15th, 2014@NCSA Blue Waters Symposium
  • 2. 2/20 University of Tokyo Gas-Liquid Multi-Phase Flow Hierarchical Phase Transi+on Modeling Bubble Interac+on Flow Par+cle Interac+on Direct Calcula+on Multi-Scale and Multi-Physics Problem Numerical Difficulties Moving, Annihilation and Creation of Boundary Hierarchical Modeling Divide a problem into sub problems Governing Equations for each scale Artificial Hierarchy Validity of Governing Equation Direct Simulation Movie Introduction (1/2)
  • 3. 3/20 Classical Nucleation Theory Classical nucleation theory (CNT) predicts a nucleation rate of clusters. OK for droplet nuclei, bad for bubble nuclei. Interaction between clusters Interact via gas (diffusive) Interact via liquid (ballistic) University of Tokyo Introduction (2/2) Droplets and Bubbles Additional work to create a bubble Same as Droplet Work by Bubble It is difficult to investigate microscopic behavior directly. Huge scale molecular dynamics simulation to observe interaction between bubbles.
  • 4. 4/20 University of Tokyo Simulation Method Molecular Dynamics (MD) method with Short-range Interaction General Approach Parallelization (1/3) MPI: Domain Decomposition OpenMP: Loop Decomposition DO I=1,N DO J in Pair(I) CalcForce(I,J) ENDDO ENDDO Loop Decomposition here or here Problems Conflicts on write back of momenta → Prepare temporary buffer to avoid conflicts → Do not use Newton’s third law (calculate every force twice) We have to thread parallelize other parts, such as pair-list construction. We have to treat SIMDization and thread parallelization simultaneously.
  • 5. 5/20 University of Tokyo Pseudo-flat MPI Full domain decomposition for both intra- and inner-nodes. MDUnit class: Represent an OpenMP thread Responsible for calculation MDManager class: Represent an MPI process Responsible for communication DO I=1,THREAD_NUM CALL MDUnit[i]->Calculate() ENDDO Loop parallelization MDUnit MDUnit MDUnit MDUnit Thread MDUnit MDUnit MDUnit Process MDUnit Parallelization (2/3) Merits It is straightforward to implement pseudo-flat MPI codes from flat MPI codes. Memory locality is automatically satisfied. (not related to the K computer) We have to take care SIMDization only at the hot-spot (force calculation). → We do not have to take care two kinds of load balance simultaneously.
  • 6. 6/20 Parallelization (3/3) Demerits Two-level communication is required. → Limitation of MPI call in thread. → Data should be packed before being sent by a master thread. It is troublesome to compute physical quantities such as pressure, and so on. University of Tokyo MDManager MDManager MDUnit MDUnit MDUnit MDUnit Communication MDUnit MDUnit MDUnit MDUnit
  • 7. 7/20 University of Tokyo Pair-list Construction Algorithms for MD Grid Search O(N) method to find interacting particle-pairs Bookkeeping method Reuse the pair-list by registering pairs in a length longer than truncation length. z Sorting of Pair-list Sort by i-particle in order to reduce memory access Truncation Registration
  • 8. 8/20 University of Tokyo SIMDization Non-use of Newton’s third law We find that store operations involving indirect addressing is quite expensive for the K computer. We avoid indirect storing by not using Newton’s third law. Computations becomes double, but speed up is more than twofold. The value of FLOPS is meaningless. Hand-SIMDization Compiler cannot generate SIMD operations efficiently. We use intrinsic SIMD instructions explicitly. (emmintrin.h) Divide instruction (fdivd) cannot be specified as a SIMD instruction. We use a reciprocal approximation (frcpd) which is fast, but accuracy is only 8bit. We improve the precision by additional calculations. The loop is unrolled by four times to enhance the software pipelining. You can see the source codes from: http://mdacp.sourceforge.net/
  • 9. 9/20 University of Tokyo Benchmark (1/3) - Conditions - Conditions Truncated Lennard-Jones potential with cutoff length 2.5 (in LJ units) Initial Condition is FCC and density is 0.5 Integration Scheme: 2nd order symplectic with dt = 0.001 0.5 million particles on a CPU-core (4 million particles on a node) After 150 steps, measure time required for 1000 steps. Parallelization Flat MPI : 8 processes /node up to 32768 nodes (4.2 PF) Hybrid : 8 threads/node   up to 82944 nodes (10.6 PF) Computations assigned to each core are perfectly identical for both scheme. 1 CPU-core 1 node 8 nodes 0.5 million particles 4 million particles 32 million particles
  • 10. 10/20 University of Tokyo 500 400 300 200 100 0 Benchmark (2/3) - Elapse Time - 1 8 64 512 4096 82944 Elapsed Time [sec] Nodes Flat-MPI Hybrid 332 billion particles 72.3% efficiency 131 billion particles 92.0% efficiency Nice scaling for the flat MPI (92% efficiency compared with a single node) Performance of the largest run: 1.76 PFLOPS (16.6%)
  • 11. 11/20 University of Tokyo 9 8 7 6 5 4 3 2 1 0 Benchmark (3/3) - Memory Usage - 1 8 64 512 4096 82944 Memory Usage [GiB] Nodes Flat-MPI Hybrid 8.52 GiB 39% increase 5.66 GiB 2.8% increase Usage of memory is almost flat for hybrid. extra 1GB/node for Flat-MPI We performed product runs using 4096 nodes adopting flat-MPI.
  • 12. 12/20 University of Tokyo Time Evolution of Multi-bubble nuclei Equilibrium Liquid Adiabatic expansion (2.5% on a side) Multi-bubble nuclei Converge Ostwald-ripening to a single bubble Larger bubbles becomes larger, smaller bubbles becomes smaller
  • 13. 13/20 University of Tokyo Density Temperature G Coexistence Liquid Condition Initial Condition : Pure Liquid Truncation Lenght:3.0 Periodic Boundary Conditon: Time Step:0.005 Integration:2nd order Symplectic (NVE)      2nd order RESPA (NVT) Computational Scale Resource:4096 nodes of K computer (0.5PF) Flat-MPI: 32768 processes Parallelization:Simple Domain Decomposition Particles: 400 〜 700 million particles System Size: L = 960 (in LJ units) Thermalize: 10,000 steps + Observation:1,000,000 steps Movie Details of Simulation 24 hours / run 1 run 〜 0.1million node hours 10 samples =1 million node hours
  • 14. 14/20 Multi-bubble Nuclei and Ostwald-like Ripening intermediate stage Final stage University of Tokyo 23 million particles Initial stage Multi nuclei Ostwald-like ripening One bubble survives 1.5 billion particles Bubbles appear a the results of particle interactions. Ostwald-like ripening is observed as the results of bubble interactions. → Direct simulations of multi-scale and multi-physics phenomena.
  • 15. 15/20 A type of binarization University of Tokyo Identification of Bubbles Definition of Bubble Divide a system into small cells. A cell with density smaller than some threshold is defined to be gas phase. Identify bubbles on the basis of site-percolation criterion (clustering). Process A Process B Liquid Gas Bubble We do not have to take care of byte-order! Data Format A number of particles in a cell is stored as unsigned char (1 Byte). A system is divided into 32.7 million cells → 32.7 MB / snap shots. 1000 frames/run → 32.7GB/run. Clustering is performed in post process.
  • 16. 16/20 1.5 billion particles 23 million particles University of Tokyo Bubble-size distribution Bubble size distribution at the 10,000th step after the expansion. Accuracy of the smaller run is insufficient for further analysis. → We need 1 billion particles to study population dynamics of bubbles which is impossible without Peta-scale computers.
  • 17. 17/20 shared with the classical nucleation theory f(v, t) ⇠ ty ˜ f(vtx) University of Tokyo Lifshitz-Slyozov-Wagner (LSW) Theory Definitions a number of bubbles which has volume v at time t. f(v, t) @f @t = @ @v v˙(v, t) Kinetic Term (Microscopic Dynamics) Assumptions ( ˙ vf) Governing Equation - Mean field approximation - Mechanical equilibrium - Conservation of the total volume of gas phase - Self-similarity of the distribution function Predictions Power-law behaviors n(t) ⇠ tx ¯v(t) ⇠ tx The total number of bubbles The average volume of bubbles The scaling index x depends on microscopic dynamics.
  • 18. 18/20 Scaling Region University of Tokyo Simulations Results (1/2) Number of Bubbles Average Volume of Bubbles Slope: x=1.5 Scaling Region Power-law behavior in late-stage of time evolution. Sharing the value of an exponent as predicted by the theory. The LSW theory works fine. Assumptions are valid. We cannot obtain the scaling region using small systems. Huge-scale simulation is necessary.
  • 19. 19/20 University of Tokyo Simulations Results (2/2) Scaling exponent changed from 1.5 to 1 High T Low T Reaction-Limit x=3/2 Diffusion-Limit x=1 Microscopic dynamics can be investigated from macroscopic behavior.
  • 20. 20/20 Simulations involving billions of particles allows us to investigate multi-scale University of Tokyo Summary and Discussion Summary We did physics using a peta-scale computer. References MDACP (Molecular Dynamics code for Avogadro Challenge Project) Source codes are available online: http://mdacp.sourceforge.net/ Financial Supports: CMSI KAUST GRP (KUK-I1-005-04) KAKENHI (23740287) and multi-physics problems directly. Toward Exa-scale Computing Please give us decent programming language! MPI (library), OpenMP (directive), SIMD (intrinsic functions) Acknowledgements Computational Resources: “Large-scale HPC Challenge” Project, ITC, Univ. of Tokyo Institute of Solid State Physics, Univ. of Tokyo HPCI project (hp130047) on the K computer, AICS, RIKEN