SlideShare une entreprise Scribd logo
1  sur  1
Télécharger pour lire hors ligne
Pontifical Catholic University of Rio Grande do Sul – PUCRS
Graduate Program in Computer Science
Faculty of Informatics
Parallelization Strategies for N-Body
Simulations on Multicore Architectures
Filipo Novo Mór
Thais Christina Webber dos Santos
César Augusto Missio Marcon
GPU implementation
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 1 2
Global Memory
Shared
Memory
Bank
Active Tasks
Active Transfers
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Global Memory
Shared
Memory
Bank
Active Tasks
Active Transfers
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
BARRIER
Global Memory
Shared
Memory
Bank
Active Tasks
Active Transfers
data bufferization
on shared
memory
data on buffer
consumption by
parallel threads
at the end, global
memory is
updated
Initially, information about particles is copied from RAM
to the GPU memory. Then, the code runs in a pipeline
where several data transferences between shared and
global memory will take place while data on shared
memory buffer is consumed. At the end of the process
all remaining data, now updated, is copied back to the
global memory and then back to the RAM on the CPU.
The computational cost is given by (1)
where C is the cost of the force calculation function; n
is the amount of particles; p is the amount of parallel
running threads; M is the cost of data transfer between
shared and global memories on GPU, and T is the cost
of data transfer between RAM and the GPU memory.
𝐶
𝑛2
𝑝
+ 2𝑀𝑛 + 2𝑇𝑛 (1)
Multicore CPU Implementation
𝐶
𝑛2
𝑝
For a multicore CPU, a standard serial code
was parallelized by adding OpenMP
directives directly on it. The computational
cost was reduced from n2 to (2), once there
is no need to memory transfers. p is the
parallel threads, which normally is one per
amount of
CPU core.
(2)
filipo.mor@acad.pucrs.br
The N-Body Simulation
The Particle-Particle method does not round
off the force summation, such that the
accuracy equals to the machine precision.
Energy Monitoring during the simulation.
Computational Cost: O(n2).
Partial Results and Perspectives
 Cost saving by real speedup achievement.
 Visualization module already implemented.
Next Steps:
 Cluster implementation (CUDA + MPI + GPUDirect).
 Exploration of hierarchical algorithms (such as
Barnes&Hut and mesh).
 OpenCL version.
 OpenACC version.
Execution quoted on Amazon EC2 service.
Version Cost
Serial 0,33$
OpenMP 0,08$
CUDA 0,05$

Contenu connexe

Tendances

Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontierinside-BigData.com
 
Map reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersMap reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersCleverence Kombe
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecturemohamedragabslideshare
 
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS An efficient-parallel-approach-fo...
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS  An efficient-parallel-approach-fo...IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS  An efficient-parallel-approach-fo...
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS An efficient-parallel-approach-fo...IEEEBEBTECHSTUDENTPROJECTS
 
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersMapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersAshraf Uddin
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...Adrian Florea
 
MapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large ClustersMapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large Clusterskazuma_sato
 
Distributed Processing Frameworks
Distributed Processing FrameworksDistributed Processing Frameworks
Distributed Processing FrameworksAntonios Katsarakis
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAprithan
 
MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersAbolfazl Asudeh
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)Ryousei Takano
 
A GPU Implementation of Generalized Graph Processing Algorithm GIM-V
A GPU Implementation of Generalized Graph Processing Algorithm GIM-VA GPU Implementation of Generalized Graph Processing Algorithm GIM-V
A GPU Implementation of Generalized Graph Processing Algorithm GIM-VKoichi Shirahata
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersAbhishek Singh
 
Introduction to Computing on GPU
Introduction to Computing on GPUIntroduction to Computing on GPU
Introduction to Computing on GPUIlya Kuzovkin
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clusteringSubhas Kumar Ghosh
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...NECST Lab @ Politecnico di Milano
 

Tendances (20)

Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
 
Map reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersMap reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clusters
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
 
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS An efficient-parallel-approach-fo...
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS  An efficient-parallel-approach-fo...IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS  An efficient-parallel-approach-fo...
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS An efficient-parallel-approach-fo...
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersMapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large Clusters
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
 
IEEE CLOUD \'11
IEEE CLOUD \'11IEEE CLOUD \'11
IEEE CLOUD \'11
 
MapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large ClustersMapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large Clusters
 
Distributed Processing Frameworks
Distributed Processing FrameworksDistributed Processing Frameworks
Distributed Processing Frameworks
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)
 
MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large Clusters
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)
 
A GPU Implementation of Generalized Graph Processing Algorithm GIM-V
A GPU Implementation of Generalized Graph Processing Algorithm GIM-VA GPU Implementation of Generalized Graph Processing Algorithm GIM-V
A GPU Implementation of Generalized Graph Processing Algorithm GIM-V
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
 
Introduction to Computing on GPU
Introduction to Computing on GPUIntroduction to Computing on GPU
Introduction to Computing on GPU
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
 

Similaire à Parallelization Strategies for Implementing Nbody Codes on Multicore Architectures

OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...Bomm Kim
 
lecture_GPUArchCUDA04-OpenMPHOMP.pdf
lecture_GPUArchCUDA04-OpenMPHOMP.pdflecture_GPUArchCUDA04-OpenMPHOMP.pdf
lecture_GPUArchCUDA04-OpenMPHOMP.pdfTigabu Yaya
 
Accelerating S3D A GPGPU Case Study
Accelerating S3D  A GPGPU Case StudyAccelerating S3D  A GPGPU Case Study
Accelerating S3D A GPGPU Case StudyMartha Brown
 
Map SMAC Algorithm onto GPU
Map SMAC Algorithm onto GPUMap SMAC Algorithm onto GPU
Map SMAC Algorithm onto GPUZhengjie Lu
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)PhtRaveller
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
System-on-Chip Design Flow for the Image Signal Processor of a Nonlinear CMOS...
System-on-Chip Design Flow for the Image Signal Processor of a Nonlinear CMOS...System-on-Chip Design Flow for the Image Signal Processor of a Nonlinear CMOS...
System-on-Chip Design Flow for the Image Signal Processor of a Nonlinear CMOS...Maikon
 
Efficient algorithm for rsa text encryption using cuda c
Efficient algorithm for rsa text encryption using cuda cEfficient algorithm for rsa text encryption using cuda c
Efficient algorithm for rsa text encryption using cuda ccsandit
 
Efficient algorithm for rsa text encryption using cuda c
Efficient algorithm for rsa text encryption using cuda cEfficient algorithm for rsa text encryption using cuda c
Efficient algorithm for rsa text encryption using cuda ccsandit
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...NVIDIA Taiwan
 

Similaire à Parallelization Strategies for Implementing Nbody Codes on Multicore Architectures (20)

OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
lecture_GPUArchCUDA04-OpenMPHOMP.pdf
lecture_GPUArchCUDA04-OpenMPHOMP.pdflecture_GPUArchCUDA04-OpenMPHOMP.pdf
lecture_GPUArchCUDA04-OpenMPHOMP.pdf
 
FrackingPaper
FrackingPaperFrackingPaper
FrackingPaper
 
A0270107
A0270107A0270107
A0270107
 
Todtree
TodtreeTodtree
Todtree
 
[IJCT-V3I2P17] Authors: Sheng Lai, Xiaohua Meng, Dongqin Zheng
[IJCT-V3I2P17] Authors: Sheng Lai, Xiaohua Meng, Dongqin Zheng[IJCT-V3I2P17] Authors: Sheng Lai, Xiaohua Meng, Dongqin Zheng
[IJCT-V3I2P17] Authors: Sheng Lai, Xiaohua Meng, Dongqin Zheng
 
Accelerating S3D A GPGPU Case Study
Accelerating S3D  A GPGPU Case StudyAccelerating S3D  A GPGPU Case Study
Accelerating S3D A GPGPU Case Study
 
Map SMAC Algorithm onto GPU
Map SMAC Algorithm onto GPUMap SMAC Algorithm onto GPU
Map SMAC Algorithm onto GPU
 
cug2011-praveen
cug2011-praveencug2011-praveen
cug2011-praveen
 
Ultra Fast SOM using CUDA
Ultra Fast SOM using CUDAUltra Fast SOM using CUDA
Ultra Fast SOM using CUDA
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)
 
Scolari's ICCD17 Talk
Scolari's ICCD17 TalkScolari's ICCD17 Talk
Scolari's ICCD17 Talk
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
cuTau Leaping
cuTau LeapingcuTau Leaping
cuTau Leaping
 
System-on-Chip Design Flow for the Image Signal Processor of a Nonlinear CMOS...
System-on-Chip Design Flow for the Image Signal Processor of a Nonlinear CMOS...System-on-Chip Design Flow for the Image Signal Processor of a Nonlinear CMOS...
System-on-Chip Design Flow for the Image Signal Processor of a Nonlinear CMOS...
 
Efficient algorithm for rsa text encryption using cuda c
Efficient algorithm for rsa text encryption using cuda cEfficient algorithm for rsa text encryption using cuda c
Efficient algorithm for rsa text encryption using cuda c
 
Efficient algorithm for rsa text encryption using cuda c
Efficient algorithm for rsa text encryption using cuda cEfficient algorithm for rsa text encryption using cuda c
Efficient algorithm for rsa text encryption using cuda c
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
 

Plus de Filipo Mór

Desenvolvendo Aplicações de Uso Geral para GPU com CUDA
Desenvolvendo Aplicações de Uso Geral para GPU com CUDADesenvolvendo Aplicações de Uso Geral para GPU com CUDA
Desenvolvendo Aplicações de Uso Geral para GPU com CUDAFilipo Mór
 
Master Thesis Defense
Master Thesis DefenseMaster Thesis Defense
Master Thesis DefenseFilipo Mór
 
Programaçao C - Aula 2
Programaçao C - Aula 2Programaçao C - Aula 2
Programaçao C - Aula 2Filipo Mór
 
Programação C - Aula 1
Programação C - Aula 1Programação C - Aula 1
Programação C - Aula 1Filipo Mór
 
Uma Abordagem Paralela da Evolução Diferencial em GPU
Uma Abordagem Paralela da Evolução Diferencial em GPUUma Abordagem Paralela da Evolução Diferencial em GPU
Uma Abordagem Paralela da Evolução Diferencial em GPUFilipo Mór
 
Aula 6 - Redes de Computadores A - Endereçamento IP
Aula 6 - Redes de Computadores A - Endereçamento IPAula 6 - Redes de Computadores A - Endereçamento IP
Aula 6 - Redes de Computadores A - Endereçamento IPFilipo Mór
 
Aula Especial - Redes de Computadores A - Sockets
Aula Especial - Redes de Computadores A - SocketsAula Especial - Redes de Computadores A - Sockets
Aula Especial - Redes de Computadores A - SocketsFilipo Mór
 
Aula 4 - Redes de Computadores A - Camadas Modelos TCP/IP e OSI. Camada Física.
Aula 4 - Redes de Computadores A - Camadas Modelos TCP/IP e OSI. Camada Física.Aula 4 - Redes de Computadores A - Camadas Modelos TCP/IP e OSI. Camada Física.
Aula 4 - Redes de Computadores A - Camadas Modelos TCP/IP e OSI. Camada Física.Filipo Mór
 
Auditoria e Segurança em TI - Aula 4
Auditoria e Segurança em TI - Aula 4Auditoria e Segurança em TI - Aula 4
Auditoria e Segurança em TI - Aula 4Filipo Mór
 
Aula 3 - Redes de Computadores A - Administração da Internet. Modelo TCP/IP.
Aula 3 - Redes de Computadores A - Administração da Internet. Modelo TCP/IP.Aula 3 - Redes de Computadores A - Administração da Internet. Modelo TCP/IP.
Aula 3 - Redes de Computadores A - Administração da Internet. Modelo TCP/IP.Filipo Mór
 
Auditoria e Segurança em TI - Aula 3
Auditoria e Segurança em TI - Aula 3Auditoria e Segurança em TI - Aula 3
Auditoria e Segurança em TI - Aula 3Filipo Mór
 
Aula 1 - Redes de Computadores A - Conceitos Básicos.
Aula 1 - Redes de Computadores A - Conceitos Básicos.Aula 1 - Redes de Computadores A - Conceitos Básicos.
Aula 1 - Redes de Computadores A - Conceitos Básicos.Filipo Mór
 
Aula 1 - Conceitos de TI e PDTI
Aula 1 - Conceitos de TI e PDTIAula 1 - Conceitos de TI e PDTI
Aula 1 - Conceitos de TI e PDTIFilipo Mór
 
Curso "Desenvolvendo aplicações de uso geral para GPU com CUDA".
Curso "Desenvolvendo aplicações de uso geral para GPU com CUDA".Curso "Desenvolvendo aplicações de uso geral para GPU com CUDA".
Curso "Desenvolvendo aplicações de uso geral para GPU com CUDA".Filipo Mór
 
Aula 12 - Gestão do Conhecimento
Aula 12 - Gestão do ConhecimentoAula 12 - Gestão do Conhecimento
Aula 12 - Gestão do ConhecimentoFilipo Mór
 
Aula 11 - Terceirização em TI
Aula 11 - Terceirização em TIAula 11 - Terceirização em TI
Aula 11 - Terceirização em TIFilipo Mór
 
Aula 10 - Acompanhamento de Projetos
Aula 10 - Acompanhamento de ProjetosAula 10 - Acompanhamento de Projetos
Aula 10 - Acompanhamento de ProjetosFilipo Mór
 
Aula 9 - Controle de Atividades e Custos
Aula 9 - Controle de Atividades e CustosAula 9 - Controle de Atividades e Custos
Aula 9 - Controle de Atividades e CustosFilipo Mór
 
Aula 8 - Técnicas de Negociação e Gestão de RH
Aula 8 - Técnicas de Negociação e Gestão de RHAula 8 - Técnicas de Negociação e Gestão de RH
Aula 8 - Técnicas de Negociação e Gestão de RHFilipo Mór
 
Aula 7 - Técnicas de Planejamento
Aula 7 - Técnicas de PlanejamentoAula 7 - Técnicas de Planejamento
Aula 7 - Técnicas de PlanejamentoFilipo Mór
 

Plus de Filipo Mór (20)

Desenvolvendo Aplicações de Uso Geral para GPU com CUDA
Desenvolvendo Aplicações de Uso Geral para GPU com CUDADesenvolvendo Aplicações de Uso Geral para GPU com CUDA
Desenvolvendo Aplicações de Uso Geral para GPU com CUDA
 
Master Thesis Defense
Master Thesis DefenseMaster Thesis Defense
Master Thesis Defense
 
Programaçao C - Aula 2
Programaçao C - Aula 2Programaçao C - Aula 2
Programaçao C - Aula 2
 
Programação C - Aula 1
Programação C - Aula 1Programação C - Aula 1
Programação C - Aula 1
 
Uma Abordagem Paralela da Evolução Diferencial em GPU
Uma Abordagem Paralela da Evolução Diferencial em GPUUma Abordagem Paralela da Evolução Diferencial em GPU
Uma Abordagem Paralela da Evolução Diferencial em GPU
 
Aula 6 - Redes de Computadores A - Endereçamento IP
Aula 6 - Redes de Computadores A - Endereçamento IPAula 6 - Redes de Computadores A - Endereçamento IP
Aula 6 - Redes de Computadores A - Endereçamento IP
 
Aula Especial - Redes de Computadores A - Sockets
Aula Especial - Redes de Computadores A - SocketsAula Especial - Redes de Computadores A - Sockets
Aula Especial - Redes de Computadores A - Sockets
 
Aula 4 - Redes de Computadores A - Camadas Modelos TCP/IP e OSI. Camada Física.
Aula 4 - Redes de Computadores A - Camadas Modelos TCP/IP e OSI. Camada Física.Aula 4 - Redes de Computadores A - Camadas Modelos TCP/IP e OSI. Camada Física.
Aula 4 - Redes de Computadores A - Camadas Modelos TCP/IP e OSI. Camada Física.
 
Auditoria e Segurança em TI - Aula 4
Auditoria e Segurança em TI - Aula 4Auditoria e Segurança em TI - Aula 4
Auditoria e Segurança em TI - Aula 4
 
Aula 3 - Redes de Computadores A - Administração da Internet. Modelo TCP/IP.
Aula 3 - Redes de Computadores A - Administração da Internet. Modelo TCP/IP.Aula 3 - Redes de Computadores A - Administração da Internet. Modelo TCP/IP.
Aula 3 - Redes de Computadores A - Administração da Internet. Modelo TCP/IP.
 
Auditoria e Segurança em TI - Aula 3
Auditoria e Segurança em TI - Aula 3Auditoria e Segurança em TI - Aula 3
Auditoria e Segurança em TI - Aula 3
 
Aula 1 - Redes de Computadores A - Conceitos Básicos.
Aula 1 - Redes de Computadores A - Conceitos Básicos.Aula 1 - Redes de Computadores A - Conceitos Básicos.
Aula 1 - Redes de Computadores A - Conceitos Básicos.
 
Aula 1 - Conceitos de TI e PDTI
Aula 1 - Conceitos de TI e PDTIAula 1 - Conceitos de TI e PDTI
Aula 1 - Conceitos de TI e PDTI
 
Curso "Desenvolvendo aplicações de uso geral para GPU com CUDA".
Curso "Desenvolvendo aplicações de uso geral para GPU com CUDA".Curso "Desenvolvendo aplicações de uso geral para GPU com CUDA".
Curso "Desenvolvendo aplicações de uso geral para GPU com CUDA".
 
Aula 12 - Gestão do Conhecimento
Aula 12 - Gestão do ConhecimentoAula 12 - Gestão do Conhecimento
Aula 12 - Gestão do Conhecimento
 
Aula 11 - Terceirização em TI
Aula 11 - Terceirização em TIAula 11 - Terceirização em TI
Aula 11 - Terceirização em TI
 
Aula 10 - Acompanhamento de Projetos
Aula 10 - Acompanhamento de ProjetosAula 10 - Acompanhamento de Projetos
Aula 10 - Acompanhamento de Projetos
 
Aula 9 - Controle de Atividades e Custos
Aula 9 - Controle de Atividades e CustosAula 9 - Controle de Atividades e Custos
Aula 9 - Controle de Atividades e Custos
 
Aula 8 - Técnicas de Negociação e Gestão de RH
Aula 8 - Técnicas de Negociação e Gestão de RHAula 8 - Técnicas de Negociação e Gestão de RH
Aula 8 - Técnicas de Negociação e Gestão de RH
 
Aula 7 - Técnicas de Planejamento
Aula 7 - Técnicas de PlanejamentoAula 7 - Técnicas de Planejamento
Aula 7 - Técnicas de Planejamento
 

Dernier

Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 

Dernier (20)

Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 

Parallelization Strategies for Implementing Nbody Codes on Multicore Architectures

  • 1. Pontifical Catholic University of Rio Grande do Sul – PUCRS Graduate Program in Computer Science Faculty of Informatics Parallelization Strategies for N-Body Simulations on Multicore Architectures Filipo Novo Mór Thais Christina Webber dos Santos César Augusto Missio Marcon GPU implementation 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 1 2 Global Memory Shared Memory Bank Active Tasks Active Transfers 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Global Memory Shared Memory Bank Active Tasks Active Transfers 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 BARRIER Global Memory Shared Memory Bank Active Tasks Active Transfers data bufferization on shared memory data on buffer consumption by parallel threads at the end, global memory is updated Initially, information about particles is copied from RAM to the GPU memory. Then, the code runs in a pipeline where several data transferences between shared and global memory will take place while data on shared memory buffer is consumed. At the end of the process all remaining data, now updated, is copied back to the global memory and then back to the RAM on the CPU. The computational cost is given by (1) where C is the cost of the force calculation function; n is the amount of particles; p is the amount of parallel running threads; M is the cost of data transfer between shared and global memories on GPU, and T is the cost of data transfer between RAM and the GPU memory. 𝐶 𝑛2 𝑝 + 2𝑀𝑛 + 2𝑇𝑛 (1) Multicore CPU Implementation 𝐶 𝑛2 𝑝 For a multicore CPU, a standard serial code was parallelized by adding OpenMP directives directly on it. The computational cost was reduced from n2 to (2), once there is no need to memory transfers. p is the parallel threads, which normally is one per amount of CPU core. (2) filipo.mor@acad.pucrs.br The N-Body Simulation The Particle-Particle method does not round off the force summation, such that the accuracy equals to the machine precision. Energy Monitoring during the simulation. Computational Cost: O(n2). Partial Results and Perspectives  Cost saving by real speedup achievement.  Visualization module already implemented. Next Steps:  Cluster implementation (CUDA + MPI + GPUDirect).  Exploration of hierarchical algorithms (such as Barnes&Hut and mesh).  OpenCL version.  OpenACC version. Execution quoted on Amazon EC2 service. Version Cost Serial 0,33$ OpenMP 0,08$ CUDA 0,05$