SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
Using GPUs to accelerate
nonstiff and stiff chemical
kinetics in combustion
simulations
Kyle Niemeyer
School of Mechanical, Industrial, and Manufacturing Eng.
Oregon State University
20 April 2015
Benefit of GPU computing
(for combustion)
2
Two
avenues
https://www.olcf.ornl.gov/titan/
Exascale science on supercomputers
High-fidelity engineering
on workstations
Challenges of GPU computing
(for combustion)
3
Two
challenges
Design algorithms/strategies to
reduce computational expense
Identify appropriate algorithms
for equal or better performance
GPU
• Graphics Processing Unit
• Developed to process & display 1000s pixels
• Throughput over latency massive parallelism
4
TECHNICAL SPECIFICATIONS
FORM FACTOR> 9.75” PCIe x16 form factor
# OF CUDA CORES> 448
FRE
of high performance
Modern GPU hardware
architecture
Streaming
multiprocessor
(SM)
5
A.R. Brodtkorb et al. / J. Parallel Distrib. Comput. 73 (2013) 4–13
Fermi-class GPU hardware. The GPU consisting of up to 16 streaming multiprocessors (also known as SMs) is shown in (left), and (righ
the information in this article can be found in
different sources, including books, documentation,
nference presentations, and on Internet fora. Getting
of all this information is an arduous exercise that
bstantial effort. The aim of this article is therefore to
distributes thread blocks to multiprocessor thread sc
Fig. 4a). This scheduler handles concurrent kernel2
e
out-of-order thread block execution.
Each multiprocessor has 16 load/store units, all
and destination addresses to be calculated for 16 thre
Brodtkorb AR, Hagen TR, Sætra ML. J
Parallel Distrib Comput 2013;73:4–13.
Using GPUs
• Parallel function: “kernel”
• Hundreds–millions of concurrent threads
• Executed in 32-thread “warps”
• Challenge: thread divergence
6
if
if if
else
elseelse
7
if
if if
else
elseelse
8
Problem Description
9
ODE ODE ODE ODE ODE ODE ODE ODE ODE
ODE ODE ODE ODE ODE ODE ODE ODE ODE
ODE ODE ODE ODE ODE ODE ODE ODE ODE
ODE ODE ODE ODE ODE ODE ODE ODE ODE
ODE ODE ODE ODE ODE ODE ODE ODE ODE
ODE ODE ODE ODE ODE ODE ODE ODE ODE
ODE ODE ODE ODE ODE ODE ODE ODE ODE
• Large number of independent ODEs to
solve
• Can be even more for turbulent
combustion!
dYi
dt
=
Wi
⇢
!i
0
B
B
B
@
dY1
dt
dY2
dt
...
dYk
dt
1
C
C
C
A
10
• Traditionally solved with implicit algorithms:
complex logical flow
GPU Algorithms
significant thread divergence
11
Prior Efforts: Implicit
Implicit algorithms*: not well-suited for
GPU acceleration
12
Algorithm
Single-core CPU
speedup
Stone & Davis 2013 DVODE 7.7×
Sewerin &
Rigopoulos 2015
implicit Runge–
Kutta (Radau5)
4.8×
*Thus far
• Traditionally solved with implicit algorithms:
complex logical flow
• Instead, what about explicit algorithms?
GPU Algorithms
significant thread divergence
13
Prior Efforts: Explicit
14
Algorithm
Single-core
CPU speedup
# Species
Spafford et al.
2010
4th-order Runge–
Kutta*
9× 22
Niemeyer et al.
2011
4th-order Runge–
Kutta
75× 9
Shi et al. 2012 CHEMEQ2 3–13× 39 & 117
Stone & Davis
2013
4th-order Runge–
Kutta–Fehlberg
28.6× 19 (reduced)
*Species production terms only
GPU Algorithms
• For nonstiff chemistry: explicit Runge–Kutta–
Cash–Karp
- H2: 9 species & 38 reactions
• Moderately stiff* chemistry: stabilized explicit
Runge–Kutta–Chebyshev
- H2/CO: 13 species & 27 reactions
- CH4: 53 species & 325 reactions
- C2H4: 111 species & 784 reactions
• Custom CPU & GPU source code
15
Initial Condition Sampling
16
Runge–Kutta–Cash–Karp
• Fifth-order accuracy
• Adaptive time stepping
• Global time step: 1×10-8 sec
• “Nonstiff” hydrogen mechanism1
• Range number of ODEs from 10–107
1Yetter , Dryer, and Rabitz, CST 79 (1991) 97–128
17
18
10
-3
10
-2
10
-1
10
0
10
1
10
2
10
3
10
2
10
3
10
4
10
5
10
6
10
7
Computingtimeperglobaltimestep(s)
Number of independent ODEs
RKCK-CPU
RKCK-CPU × 6
RKCK-GPU
25×
126×
Dealing with Stiffness
• Standard explicit algorithms fail
• “Stabilized” explicit methods
19
Runge–Kutta–Chebyshev
• AKA “stabilized” Runge–Kutta
• Explicit, but capable of handling mild
stiffness
- Extended stability domain along real axis
• Second-order accuracy
• Jacobian free
• Time step: 1×10-6 sec
20
H2/CO – RKC
21
59×
10×
CH4 – RKC
22
69×
13×
C2H4 – RKC
23
18×
4.5×
CH4 – RKC vs. VODE
24
57×
C2H4 – RKC vs. VODE (stiff)
25
2.5×
Time step: 1×10-4 s
Takeaways
• For exascale (i.e., DNS):
- Explicit algorithms significantly faster on GPUs
• For high-fidelity engineering (i.e., LES):
- Implicit algorithms perform comparably to CPU, so
far… (but not much better)
- Stabilized explicit algorithms offer attractive
alternative
- Greater stiffness still a problem
26
More details: see Niemeyer & Sung, J
Comput Phys 256 (2014):854–871
?
27
Acknowledgements: Chih-Jen Sung
& Nick Curtis @ UConn
Thank you!
Questions?
28
10
0
10
1
10
2
10
3
10
4
10
3
10
4
10
5
10
6
Computingtimeperglobaltimestep(s)
Number of independent ODEs
4×
Similar conditions
Randomized conditions
CH4 – Thread divergence

Contenu connexe

Tendances

Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...Tokyo Institute of Technology
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwJan Holčapek
 
Hardware universe reference sheet v series net app vtl rc00330909 (sep 2009)
Hardware universe reference sheet v series net app vtl rc00330909 (sep 2009)Hardware universe reference sheet v series net app vtl rc00330909 (sep 2009)
Hardware universe reference sheet v series net app vtl rc00330909 (sep 2009)Pedro Siqueira
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Kohei KaiGai
 
uTensor COSCUP
uTensor COSCUPuTensor COSCUP
uTensor COSCUPNeil Tan
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012DefCamp
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDASavith Satheesh
 
DDR, GDDR, HBM Memory : Presentation
DDR, GDDR, HBM Memory : PresentationDDR, GDDR, HBM Memory : Presentation
DDR, GDDR, HBM Memory : PresentationSubhajit Sahu
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAprithan
 
Xdp and ebpf_maps
Xdp and ebpf_mapsXdp and ebpf_maps
Xdp and ebpf_mapslcplcp1
 
WSE 6A-Octo-X Terrain Mapping UAV
WSE 6A-Octo-X Terrain Mapping UAVWSE 6A-Octo-X Terrain Mapping UAV
WSE 6A-Octo-X Terrain Mapping UAVManuel De La Cruz
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
3 open power gcg_offering_jp_meetup_20181126_v1
3 open power gcg_offering_jp_meetup_20181126_v13 open power gcg_offering_jp_meetup_20181126_v1
3 open power gcg_offering_jp_meetup_20181126_v1Yutaka Kawai
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda enKohei KaiGai
 
DevOpsDaysRiga 2017 Ignite: Toshaan Bharvani - POWER your DC
DevOpsDaysRiga 2017 Ignite: Toshaan Bharvani - POWER your DCDevOpsDaysRiga 2017 Ignite: Toshaan Bharvani - POWER your DC
DevOpsDaysRiga 2017 Ignite: Toshaan Bharvani - POWER your DCDevOpsDays Riga
 
Accelerating Real-Time LiDAR Data Processing Using GPUs
Accelerating Real-Time LiDAR Data Processing Using GPUsAccelerating Real-Time LiDAR Data Processing Using GPUs
Accelerating Real-Time LiDAR Data Processing Using GPUsVivek Venugopalan
 

Tendances (20)

Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
 
Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better
Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, BetterMachine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better
Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdw
 
Hardware universe reference sheet v series net app vtl rc00330909 (sep 2009)
Hardware universe reference sheet v series net app vtl rc00330909 (sep 2009)Hardware universe reference sheet v series net app vtl rc00330909 (sep 2009)
Hardware universe reference sheet v series net app vtl rc00330909 (sep 2009)
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
Cuda
CudaCuda
Cuda
 
uTensor COSCUP
uTensor COSCUPuTensor COSCUP
uTensor COSCUP
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
DDR, GDDR, HBM Memory : Presentation
DDR, GDDR, HBM Memory : PresentationDDR, GDDR, HBM Memory : Presentation
DDR, GDDR, HBM Memory : Presentation
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
Xdp and ebpf_maps
Xdp and ebpf_mapsXdp and ebpf_maps
Xdp and ebpf_maps
 
WSE 6A-Octo-X Terrain Mapping UAV
WSE 6A-Octo-X Terrain Mapping UAVWSE 6A-Octo-X Terrain Mapping UAV
WSE 6A-Octo-X Terrain Mapping UAV
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
3 open power gcg_offering_jp_meetup_20181126_v1
3 open power gcg_offering_jp_meetup_20181126_v13 open power gcg_offering_jp_meetup_20181126_v1
3 open power gcg_offering_jp_meetup_20181126_v1
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
DevOpsDaysRiga 2017 Ignite: Toshaan Bharvani - POWER your DC
DevOpsDaysRiga 2017 Ignite: Toshaan Bharvani - POWER your DCDevOpsDaysRiga 2017 Ignite: Toshaan Bharvani - POWER your DC
DevOpsDaysRiga 2017 Ignite: Toshaan Bharvani - POWER your DC
 
Nvidia tesla-k80-overview
Nvidia tesla-k80-overviewNvidia tesla-k80-overview
Nvidia tesla-k80-overview
 
Accelerating Real-Time LiDAR Data Processing Using GPUs
Accelerating Real-Time LiDAR Data Processing Using GPUsAccelerating Real-Time LiDAR Data Processing Using GPUs
Accelerating Real-Time LiDAR Data Processing Using GPUs
 

Similaire à Using GPUs to accelerate nonstiff and stiff chemical kinetics in combustion simulations

Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsIgor Sfiligoi
 
C-SAW: A Framework for Graph Sampling and Random Walk on GPUs
C-SAW: A Framework for Graph Sampling and Random Walk on GPUsC-SAW: A Framework for Graph Sampling and Random Walk on GPUs
C-SAW: A Framework for Graph Sampling and Random Walk on GPUsPandey_G
 
RT15 Berkeley | ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
RT15 Berkeley |  ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...RT15 Berkeley |  ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
RT15 Berkeley | ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...OPAL-RT TECHNOLOGIES
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Fisnik Kraja
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAFacultad de Informática UCM
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...NVIDIA Taiwan
 
Advancements in the Real-Time Simulation of Large Active Distribution Systems...
Advancements in the Real-Time Simulation of Large Active Distribution Systems...Advancements in the Real-Time Simulation of Large Active Distribution Systems...
Advancements in the Real-Time Simulation of Large Active Distribution Systems...OPAL-RT TECHNOLOGIES
 
Valladolid final-septiembre-2010
Valladolid final-septiembre-2010Valladolid final-septiembre-2010
Valladolid final-septiembre-2010TELECOM I+D
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUsSri Ambati
 
HPC Accelerating Combustion Engine Design
HPC Accelerating Combustion Engine DesignHPC Accelerating Combustion Engine Design
HPC Accelerating Combustion Engine Designinside-BigData.com
 
QGATE 0.3: QUANTUM CIRCUIT SIMULATOR
QGATE 0.3: QUANTUM CIRCUIT SIMULATORQGATE 0.3: QUANTUM CIRCUIT SIMULATOR
QGATE 0.3: QUANTUM CIRCUIT SIMULATORNVIDIA Japan
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolversinside-BigData.com
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureCeph Community
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitecturePatrick McGarry
 

Similaire à Using GPUs to accelerate nonstiff and stiff chemical kinetics in combustion simulations (20)

Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
C-SAW: A Framework for Graph Sampling and Random Walk on GPUs
C-SAW: A Framework for Graph Sampling and Random Walk on GPUsC-SAW: A Framework for Graph Sampling and Random Walk on GPUs
C-SAW: A Framework for Graph Sampling and Random Walk on GPUs
 
RT15 Berkeley | ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
RT15 Berkeley |  ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...RT15 Berkeley |  ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
RT15 Berkeley | ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Advancements in the Real-Time Simulation of Large Active Distribution Systems...
Advancements in the Real-Time Simulation of Large Active Distribution Systems...Advancements in the Real-Time Simulation of Large Active Distribution Systems...
Advancements in the Real-Time Simulation of Large Active Distribution Systems...
 
Valladolid final-septiembre-2010
Valladolid final-septiembre-2010Valladolid final-septiembre-2010
Valladolid final-septiembre-2010
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
HPC Accelerating Combustion Engine Design
HPC Accelerating Combustion Engine DesignHPC Accelerating Combustion Engine Design
HPC Accelerating Combustion Engine Design
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
 
QGATE 0.3: QUANTUM CIRCUIT SIMULATOR
QGATE 0.3: QUANTUM CIRCUIT SIMULATORQGATE 0.3: QUANTUM CIRCUIT SIMULATOR
QGATE 0.3: QUANTUM CIRCUIT SIMULATOR
 
GPU Programming
GPU ProgrammingGPU Programming
GPU Programming
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolvers
 
Programming Models for Heterogeneous Chips
Programming Models for  Heterogeneous ChipsProgramming Models for  Heterogeneous Chips
Programming Models for Heterogeneous Chips
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
Dl2 computing gpu
Dl2 computing gpuDl2 computing gpu
Dl2 computing gpu
 

Dernier

(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 

Dernier (20)

(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 

Using GPUs to accelerate nonstiff and stiff chemical kinetics in combustion simulations

  • 1. Using GPUs to accelerate nonstiff and stiff chemical kinetics in combustion simulations Kyle Niemeyer School of Mechanical, Industrial, and Manufacturing Eng. Oregon State University 20 April 2015
  • 2. Benefit of GPU computing (for combustion) 2 Two avenues https://www.olcf.ornl.gov/titan/ Exascale science on supercomputers High-fidelity engineering on workstations
  • 3. Challenges of GPU computing (for combustion) 3 Two challenges Design algorithms/strategies to reduce computational expense Identify appropriate algorithms for equal or better performance
  • 4. GPU • Graphics Processing Unit • Developed to process & display 1000s pixels • Throughput over latency massive parallelism 4 TECHNICAL SPECIFICATIONS FORM FACTOR> 9.75” PCIe x16 form factor # OF CUDA CORES> 448 FRE of high performance
  • 5. Modern GPU hardware architecture Streaming multiprocessor (SM) 5 A.R. Brodtkorb et al. / J. Parallel Distrib. Comput. 73 (2013) 4–13 Fermi-class GPU hardware. The GPU consisting of up to 16 streaming multiprocessors (also known as SMs) is shown in (left), and (righ the information in this article can be found in different sources, including books, documentation, nference presentations, and on Internet fora. Getting of all this information is an arduous exercise that bstantial effort. The aim of this article is therefore to distributes thread blocks to multiprocessor thread sc Fig. 4a). This scheduler handles concurrent kernel2 e out-of-order thread block execution. Each multiprocessor has 16 load/store units, all and destination addresses to be calculated for 16 thre Brodtkorb AR, Hagen TR, Sætra ML. J Parallel Distrib Comput 2013;73:4–13.
  • 6. Using GPUs • Parallel function: “kernel” • Hundreds–millions of concurrent threads • Executed in 32-thread “warps” • Challenge: thread divergence 6
  • 9. Problem Description 9 ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE ODE
  • 10. • Large number of independent ODEs to solve • Can be even more for turbulent combustion! dYi dt = Wi ⇢ !i 0 B B B @ dY1 dt dY2 dt ... dYk dt 1 C C C A 10
  • 11. • Traditionally solved with implicit algorithms: complex logical flow GPU Algorithms significant thread divergence 11
  • 12. Prior Efforts: Implicit Implicit algorithms*: not well-suited for GPU acceleration 12 Algorithm Single-core CPU speedup Stone & Davis 2013 DVODE 7.7× Sewerin & Rigopoulos 2015 implicit Runge– Kutta (Radau5) 4.8× *Thus far
  • 13. • Traditionally solved with implicit algorithms: complex logical flow • Instead, what about explicit algorithms? GPU Algorithms significant thread divergence 13
  • 14. Prior Efforts: Explicit 14 Algorithm Single-core CPU speedup # Species Spafford et al. 2010 4th-order Runge– Kutta* 9× 22 Niemeyer et al. 2011 4th-order Runge– Kutta 75× 9 Shi et al. 2012 CHEMEQ2 3–13× 39 & 117 Stone & Davis 2013 4th-order Runge– Kutta–Fehlberg 28.6× 19 (reduced) *Species production terms only
  • 15. GPU Algorithms • For nonstiff chemistry: explicit Runge–Kutta– Cash–Karp - H2: 9 species & 38 reactions • Moderately stiff* chemistry: stabilized explicit Runge–Kutta–Chebyshev - H2/CO: 13 species & 27 reactions - CH4: 53 species & 325 reactions - C2H4: 111 species & 784 reactions • Custom CPU & GPU source code 15
  • 17. Runge–Kutta–Cash–Karp • Fifth-order accuracy • Adaptive time stepping • Global time step: 1×10-8 sec • “Nonstiff” hydrogen mechanism1 • Range number of ODEs from 10–107 1Yetter , Dryer, and Rabitz, CST 79 (1991) 97–128 17
  • 19. Dealing with Stiffness • Standard explicit algorithms fail • “Stabilized” explicit methods 19
  • 20. Runge–Kutta–Chebyshev • AKA “stabilized” Runge–Kutta • Explicit, but capable of handling mild stiffness - Extended stability domain along real axis • Second-order accuracy • Jacobian free • Time step: 1×10-6 sec 20
  • 24. CH4 – RKC vs. VODE 24 57×
  • 25. C2H4 – RKC vs. VODE (stiff) 25 2.5× Time step: 1×10-4 s
  • 26. Takeaways • For exascale (i.e., DNS): - Explicit algorithms significantly faster on GPUs • For high-fidelity engineering (i.e., LES): - Implicit algorithms perform comparably to CPU, so far… (but not much better) - Stabilized explicit algorithms offer attractive alternative - Greater stiffness still a problem 26 More details: see Niemeyer & Sung, J Comput Phys 256 (2014):854–871
  • 27. ? 27 Acknowledgements: Chih-Jen Sung & Nick Curtis @ UConn Thank you! Questions?
  • 28. 28 10 0 10 1 10 2 10 3 10 4 10 3 10 4 10 5 10 6 Computingtimeperglobaltimestep(s) Number of independent ODEs 4× Similar conditions Randomized conditions CH4 – Thread divergence