GPU Programming with CUDA


Filipo Mór

This is a brief introduction to GPU technology and the CUDA programming model.


CUDA
Programming

GPU programming with CUDA

Filipo Mór
Plauto Neto

PUCRS - PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIA DA COMPUTAÇÃO

AGENDA

• INTRODUCTION
• NVIDIA GPU ARCHITECTURE
• CUDA PROGRAMMING MODEL
• CASE STUDY

INTRODUCTION
NVIDIA'S FIRST GRAPHICS ACCELERATOR
• NVIDIA NV1 – 1990s
• DirectX – 1996
• First GPU – 1999
  • NVIDIA GeForce 256
  • 22 million transistors
  • 10 million polygons per second
  • 32/64 MB
  • T&L engine (vertex)
  • challenging to program
• CUDA – 2006
  • Compute Unified Device Architecture
  • GPGPU

MULTICORE vs MANY-CORE
CPU vs GPU

CPU
• Based on the pipeline philosophy
• Many structures for cache and control
• More flexible
• MIMD – task parallelism
• Latency-sensitive

GPU
• Large amounts of parallel data
• Fewer structures for cache and control
• Less flexible
• SIMD – data parallelism
• Latency-tolerant

NVIDIA GPU ROADMAP

GENERATIONS
• TESLA/GEFORCE
  • 2006
  • Floating-point arithmetic
  • CUDA
• FERMI
  • Improvements to shared memory
  • SLI
• KEPLER
  • Dynamic Parallelism
  • MIMD
• MAXWELL
  • Unified Virtual Memory
• VOLTA
  • Stacked DRAM

DYNAMIC PARALLELISM

COMPUTE CAPABILITY
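
The content of this slide did not survive extraction, so the following is only a minimal sketch of how a program might read the compute capability of the installed devices through the CUDA runtime. The helper name `computeCapability` and the `major * 10 + minor` encoding are illustrative assumptions, not taken from the slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the compute capability of device `dev`
// encoded as major * 10 + minor (e.g. 35 for 3.5), or -1 if the device
// cannot be queried.
int computeCapability(int dev) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
        return -1;
    }
    return prop.major * 10 + prop.minor;
}

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::printf("No CUDA-capable device found\n");
        return 0;
    }
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices that cannot be queried
        }
        std::printf("Device %d: %s, compute capability %d.%d, %d multiprocessors\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount);
    }
    return 0;
}
```

The major number identifies the architecture generation, so a check like this is one way to decide at run time whether a feature such as Dynamic Parallelism (compute capability 3.5 and later) is available.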

CUDA PROGRAMMING MODEL

[Figure: thread array with elements labeled by thread ID: 0, 1, 2, 3, 4, …]

[1] Keutzer, K., Malik, S., Newton, A.R., Rabaey, J.M. and Sangiovanni-Vincentelli, A.: System-level design: orthogonalization of concerns and platform-based design. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 2000, 19, (12), pp. 1523-1543

• Basic unit – kernel
• Synchronous or asynchronous
• Thread array
• Array / matrix / cube
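
The points above can be sketched with a small kernel launched over a one-dimensional thread array. This is an illustrative example, not code from the slides: the names `writeThreadIds` and `fillThreadIds` and the block size of 256 are assumptions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: every thread computes its global thread ID and stores it.
__global__ void writeThreadIds(int *out, int n) {
    // Global ID in a 1D grid. For a matrix or cube, threadIdx/blockIdx
    // also carry .y and .z components (launch with dim3 values).
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {  // the last block may have threads past the end
        out[tid] = tid;
    }
}

// Hypothetical host helper: launches the kernel over n threads and returns
// the result. Elements stay at -1 if the device is unavailable.
std::vector<int> fillThreadIds(int n) {
    std::vector<int> host(n, -1);
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) {
        return host;
    }

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up

    // The launch is asynchronous: control returns to the host immediately.
    writeThreadIds<<<blocks, threadsPerBlock>>>(dOut, n);

    // Block the host until the kernel has finished, then copy back.
    if (cudaDeviceSynchronize() == cudaSuccess) {
        cudaMemcpy(host.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dOut);
    return host;
}

int main() {
    std::vector<int> ids = fillThreadIds(1000);
    std::printf("ids[0] = %d, ids[999] = %d\n", ids[0], ids[999]);
    return 0;
}
```

Compiled with `nvcc`, this launches 4 blocks of 256 threads for 1000 elements; the bounds check in the kernel keeps the 24 surplus threads from writing out of range.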

Coalesced memory access!
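
To make the idea concrete, here is a hedged sketch contrasting a coalesced copy, where consecutive threads access consecutive addresses, with a strided one. The kernel names, the stride of 33, and the timing helper are illustrative assumptions; the measured times depend on the device and are not a careful benchmark.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Coalesced: thread i touches element i, so a warp reads one contiguous run.
__global__ void copyCoalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: neighbouring threads touch elements `stride` apart, so a warp's
// accesses are scattered over many memory segments.
__global__ void copyStrided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // With stride coprime to n this visits every element exactly once.
        int j = (int)(((long long)i * stride) % n);
        out[j] = in[j];
    }
}

// Times one kernel with CUDA events; returns milliseconds.
float timeCopy(bool coalesced, const float *dIn, float *dOut, int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    if (coalesced) copyCoalesced<<<blocks, threads>>>(dIn, dOut, n);
    else           copyStrided<<<blocks, threads>>>(dIn, dOut, n, 33);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Hypothetical driver: runs both copies and reports whether the data survived.
bool runCopyDemo() {
    const int n = 1 << 20;  // power of two, so the odd stride 33 is coprime to n
    std::vector<float> hIn(n), hOut(n, -1.0f);
    for (int i = 0; i < n; ++i) hIn[i] = (float)i;

    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, n * sizeof(float)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }
    cudaMemcpy(dIn, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    timeCopy(true, dIn, dOut, n);  // warm-up run, result discarded
    float msCoalesced = timeCopy(true, dIn, dOut, n);
    float msStrided = timeCopy(false, dIn, dOut, n);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    std::printf("coalesced: %.3f ms, strided: %.3f ms\n", msCoalesced, msStrided);
    cudaFree(dIn);
    cudaFree(dOut);
    return hOut == hIn;
}

int main() {
    return runCopyDemo() ? 0 : 1;
}
```

Both kernels move the same data; only the mapping from thread ID to address differs. On most devices the strided version is expected to be noticeably slower, which is the reason for the emphasis on coalescing.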

CASE STUDY
• DOT PRODUCT

$$A \cdot B = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$

$$A = (a_1, a_2, \dots, a_n), \qquad B = (b_1, b_2, \dots, b_n)$$
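
The code shown on the case-study slides is not part of the extracted text, so the following is a generic sketch of one common way to compute a dot product in CUDA: each thread accumulates some of the products, each block reduces its threads' sums in shared memory, and the host adds the per-block results. The names `dotPartial` and `dot`, the block size, and the block count are assumptions.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block; power of two for the reduction
constexpr int kBlocks = 32;    // number of blocks (illustrative choice)

// Each block writes the sum of its threads' partial products to partial[blockIdx.x].
__global__ void dotPartial(const float *a, const float *b, float *partial, int n) {
    __shared__ float cache[kThreads];

    // Grid-stride loop: thread t handles elements t, t + total threads, ...
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        sum += a[i] * b[i];
    }
    cache[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int s = blockDim.x / 2; s > 0; s /= 2) {
        if (threadIdx.x < s) {
            cache[threadIdx.x] += cache[threadIdx.x + s];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        partial[blockIdx.x] = cache[0];
    }
}

// Host wrapper; returns NaN if device memory cannot be allocated.
// Assumes a and b are non-empty and have the same length.
float dot(const std::vector<float> &a, const std::vector<float> &b) {
    const int n = (int)a.size();
    const size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dPartial = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dPartial, kBlocks * sizeof(float)) == cudaSuccess;
    float result = NAN;
    if (ok) {
        cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);
        dotPartial<<<kBlocks, kThreads>>>(dA, dB, dPartial, n);

        // This copy waits for the kernel to finish before transferring.
        std::vector<float> partial(kBlocks);
        cudaMemcpy(partial.data(), dPartial, kBlocks * sizeof(float),
                   cudaMemcpyDeviceToHost);
        result = 0.0f;
        for (float p : partial) result += p;  // final sum on the host
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dPartial);
    return result;
}

int main() {
    std::vector<float> a = {1.0f, 2.0f, 3.0f};
    std::vector<float> b = {4.0f, 5.0f, 6.0f};
    std::printf("A . B = %g\n", dot(a, b));  // 1*4 + 2*5 + 3*6 = 32
    return 0;
}
```

Finishing the sum on the host keeps the kernel simple; an alternative would be a second reduction kernel or an atomic add on the device, at the cost of more code.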

Questions!

