Computing Using
Graphics Cards

Shree Kumar, Hewlett Packard
http://www.shreekumar.in/
Speaker Intro

• High Performance Computing @ Hewlett‐Packard
  – VizStack (http://vizstack.sourceforge.net)
  – GPU Computing
• Big 3D enthusiast
• Travels a lot
• Blogs at http://www.shreekumar.in/
What we will cover

•   GPUs and their history
•   Why use GPUs
•   Architecture
•   Getting Started with GPU Programming
•   Challenges, Techniques & Pitfalls
•   Where not to use GPUs ?
•   Resources
•   The Future
What is a GPU

• Graphics Processing Unit
   – Coined in 1999 by NVidia
   – Specialized add‐on board
• Accelerates interactive 3D rendering
   – 60 image updates per second (or more) on large data
   – Solves an embarrassingly parallel problem
   – Game driven volume economics
       • NVidia v/s ATI, just like Intel v/s AMD
• Demand for better effects led to
   – programmable GPUs
   – floating point capabilities
   – this led to General Purpose GPU (GPGPU) computation
History of GPUs : a GPGPU Perspective
Date  Product          Trans  Cores  Flops (32 bit / 64 bit)  Technology

1997  RIVA 128         3 M                                    Rasterization
1999  GeForce 256      25 M                                   Transform & Lighting
2001  GeForce 3        60 M                                   Programmable shaders
2002  GeForce FX       125 M                                  16, 32 bit FP, long shaders
2004  GeForce 6800     222 M                                  Infinite length shaders, branching
2006  GeForce 8800     681 M  128                             Unified graphics & compute, CUDA, 64 bit FP
2008  GeForce GTX 280  1.4 B  240    933 G / 78 G             IEEE FP, CUDA C, OpenCL and DirectCompute,
                                                              PCI‐express Gen 2
2009  Tesla M2050      3.0 B  512    1.03 T / 515 G           Improved 64 bit perf, caching, ECC memory,
                                                              64‐bit unified addressing, asynchronous
                                                              bidirectional data transfer, multiple kernels

Source : Nickolls J. , Dally W.J. “The GPU Computing Era”, IEEE Micro, March-April 2010
The GPU Advantage




• 30x CPU FLOPS on latest GPUs
• 10x memory bandwidth
• Add to these a 3x performance/$
• Energy efficient : 5x performance/Watt
All graphs from: GPU4Vision : http://gpu4vision.icg.tugrz.at/
People use GPUs for…




    Source : Nickolls J. , Dally W.J. “The GPU Computing Era”, IEEE Micro, March-April 2010
More “why to use GPUs”

• Proliferation of GPUs
   – Mobile devices will have capable GPUs soon !
• Make more things possible
   – Make things real‐time
      • From seconds to real‐time interactive performance
   – Reduce offline processing overhead
• Research Opportunities
   – New & efficient algorithms
   – Pairing multi‐core CPUs and massively multi‐threaded GPUs
GPU Computing 1‐2‐3

1. A GPU isn’t a CPU replacement!
2. There ain’t no such thing as a FREE lunch!
3. You don’t always “port” a CPU algorithm to a GPU!
CPU versus GPU

• CPU
  – Optimized for latency
  – Speedup techniques
     • Vectorization (MMX, SSE, …)
     • Coarse Grained Parallelism using multiple CPUs and cores
  – Memory approaching a TB
• GPU
  – Optimized for throughput
  – Speedup techniques
     • Massive multithreading
     • Fine grained parallelism
  – A few GBs of memory max
Getting Started

• Software
  – CUDA (NVidia specific)
  – OpenCL (Cross‐platform, GPU/CPU)
  – DirectCompute (MS specific)
• Hardware
  – A system equipped with GPU
• OS no bar
  – But Windows and Red Hat Enterprise Linux seem better supported
CUDA
• Compute Unified Device Architecture
• Most popular GPGPU toolkit
• CUDA C extends C with constructs
     – Easy to write programs
• Lower level “driver” API is available
     – Provides more control
     – Use multiple GPUs in the same application
     – Mix graphics & compute code
• Language bindings available
     – PyCUDA, Java, .NET
• Toolkit provides conveniences
[Figure: CUDA Toolkit. Source: NVIDIA CUDA Architecture, Introduction and Overview]
CUDA Architecture
• 1 or more streaming multiprocessors (“cores”)
• Thread Blocks
   – Single Instruction, Multiple Thread (SIMT)
   – Hide latency by parallelism
• Memory Hierarchy
   – Fermi GPUs can access system memory
• Primitives for
   – Thread synchronization
   – Atomic operations on memory
Source : The GPU Computing Era
Simple Example : Vector Addition
C/C++ ‐ serial code
void VecAdd(const float *A, const float *B, float *C, int N) {
  for (int i = 0; i < N; i++)
    C[i] = A[i] + B[i];
}
VecAdd(A, B, C, N);

C/C++ with OpenMP – thread level parallelism
void VecAdd(const float *A, const float *B, float *C, int N) {
  // "parallel for" creates the thread team and splits the loop across it;
  // a bare "omp for" outside a parallel region runs on a single thread
  #pragma omp parallel for
  for (int i = 0; i < N; i++)
    C[i] = A[i] + B[i];
}
VecAdd(A, B, C, N);
Vector Addition using CUDA
CUDA C – element level parallelism
__global__ void VecAdd(const float *A, const float *B, float *C, int N) {
  // One thread per element: compute this thread's global index
  int i = blockDim.x * blockIdx.x + threadIdx.x;
  if (i < N)  // threads of the last block may fall past the end of the arrays
    C[i] = A[i] + B[i];
}

Invoking the function
// Allocate memory on GPU (size is the array size in bytes)
cudaMalloc((void**)&d_A, size);
cudaMalloc((void**)&d_B, size);
cudaMalloc((void**)&d_C, size);
// Copy arrays to GPU
cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice);
// Invoke function, rounding the block count up so that all N elements are covered
int threadsPerBlock = 256;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
// Copy result back to main memory
cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost);
// Free GPU memory
cudaFree(d_A);
cudaFree(d_B);
cudaFree(d_C);

Compilation
# nvcc vectorAdd.cu -I ../../common/inc
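
The index arithmetic behind the launch configuration can be checked on the CPU. The sketch below is plain C++, not CUDA: two nested loops stand in for blockIdx.x and threadIdx.x, and the names blocks_per_grid and emulate_vec_add are hypothetical helpers introduced only for illustration. It shows why the block count is rounded up and why the kernel needs the i < N guard.

```cpp
#include <cassert>
#include <vector>

// Ceiling division: smallest number of blocks that covers n elements.
int blocks_per_grid(int n, int threads_per_block) {
  assert(n >= 0 && threads_per_block > 0);
  return (n + threads_per_block - 1) / threads_per_block;
}

// CPU emulation of the VecAdd launch. The outer loop plays the role of
// blockIdx.x, the inner loop the role of threadIdx.x. Returns the number
// of emulated threads that fell past the end of the arrays and were
// masked out by the bounds check.
int emulate_vec_add(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c, int threads_per_block) {
  assert(b.size() == a.size() && c.size() == a.size());
  const int n = static_cast<int>(a.size());
  const int blocks = blocks_per_grid(n, threads_per_block);
  int idle_threads = 0;
  for (int block = 0; block < blocks; ++block) {
    for (int thread = 0; thread < threads_per_block; ++thread) {
      const int i = threads_per_block * block + thread;  // global index
      if (i < n) {
        c[i] = a[i] + b[i];
      } else {
        ++idle_threads;  // this thread has no element to work on
      }
    }
  }
  return idle_threads;
}
```

For N = 1000 and 256 threads per block, 4 blocks (1024 threads) are launched, so 24 threads do no work. Without the guard they would write past the end of C.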
GPU Programming Challenges

• Need high “occupancy” for best performance
• Extracting parallelism with limited resources
  – Limited Registers
  – Limited Shared Memory
• Preferred Approach
  – Small Kernels
  – Multiple Passes if needed
• Decompose Problem into Parallel Pieces
  – Write once, scale performance everywhere!
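
How limited registers and shared memory cap occupancy can be estimated with a little arithmetic. The following is a simplified sketch, not a vendor calculator: the SmLimits defaults are illustrative assumptions rather than the specification of any particular GPU, estimate_occupancy is a hypothetical name, and the allocation granularity that real hardware applies is ignored.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor resource limits. The defaults are illustrative
// assumptions only, not the limits of any specific GPU.
struct SmLimits {
  int warp_size = 32;
  int max_warps = 48;        // resident warps per multiprocessor
  int max_blocks = 8;        // resident blocks per multiprocessor
  int registers = 32768;     // 32-bit registers per multiprocessor
  int shared_bytes = 49152;  // shared memory per multiprocessor
};

// Occupancy estimate: resident warps divided by the maximum number of warps.
// The number of resident blocks is the tightest of the warp, block,
// register and shared memory limits.
double estimate_occupancy(const SmLimits& sm, int threads_per_block,
                          int regs_per_thread, int shared_per_block) {
  assert(threads_per_block > 0 && regs_per_thread >= 0 && shared_per_block >= 0);
  const int warps_per_block =
      (threads_per_block + sm.warp_size - 1) / sm.warp_size;
  int blocks = std::min(sm.max_blocks, sm.max_warps / warps_per_block);
  if (regs_per_thread > 0) {
    blocks = std::min(blocks,
                      sm.registers / (regs_per_thread * threads_per_block));
  }
  if (shared_per_block > 0) {
    blocks = std::min(blocks, sm.shared_bytes / shared_per_block);
  }
  return static_cast<double>(blocks * warps_per_block) / sm.max_warps;
}
```

Under these assumed limits, a 256-thread block using 20 registers per thread reaches full occupancy, while 32 registers per thread allows only 4 resident blocks and drops occupancy to two thirds. This is one reason small kernels are preferred.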
GPU Programming

• Use Shared Memory when possible
   – Cooperation between threads in a block
   – Reduce access to global memory
• Reduce Data Transfer over the Bus
• It’s still a GPU !
   – use textures to your advantage
   – use vector data types if you can
• Watch out for GPU capability differences!
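
Whether data transfer over the bus dominates can be estimated before writing any GPU code. This is a rough sketch under stated assumptions: transfer_seconds and offload_pays_off are hypothetical helper names, and the bus bandwidth and timings passed in are values the caller must measure or assume.

```cpp
#include <cassert>

// Time needed to move `bytes` over the bus at `bytes_per_second`.
double transfer_seconds(double bytes, double bytes_per_second) {
  assert(bytes >= 0.0 && bytes_per_second > 0.0);
  return bytes / bytes_per_second;
}

// Offloading is worthwhile only if copying the data plus computing on
// the GPU takes less time than computing on the CPU.
bool offload_pays_off(double bytes_moved, double bus_bytes_per_second,
                      double gpu_compute_seconds, double cpu_seconds) {
  const double gpu_total =
      transfer_seconds(bytes_moved, bus_bytes_per_second) + gpu_compute_seconds;
  return gpu_total < cpu_seconds;
}
```

Vector addition of N floats moves 3 * N * 4 bytes (two inputs to the GPU, one result back). For N = 100 million and an assumed bus rate of 8 GB/s, the copies alone take about 0.15 s, so the offload only pays off if the CPU version is slower than that.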
Enough Theory!

Demo Time & let’s do some programming
Watch out for

• Portability of programs across GPUs
   – Capabilities vary from GPU to GPU
   – Memory usage
• Arithmetic differences in the result
• Pay careful attention to demos…
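
One common cause of arithmetic differences is the order of floating point additions: a serial loop adds left to right, while a parallel reduction typically adds in a tree. The sketch below is plain C++ rather than GPU code, with hypothetical function names, and it assumes IEEE 754 single precision without excess intermediate precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Left-to-right accumulation, as a serial CPU loop would do it.
float sequential_sum(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) acc += x;
  return acc;
}

// Pairwise (tree) summation: split the range in half, sum each half,
// then add the two partial sums. Parallel reductions typically add
// in this kind of order.
float pairwise_sum(const float* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  if (n == 0) return 0.0f;
  if (n == 1) return data[0];
  const std::size_t half = n / 2;
  return pairwise_sum(data, half) + pairwise_sum(data + half, n - half);
}
```

For the input {1e8, 1, -1e8, 1} the exact sum is 2, but in single precision 1e8 + 1 rounds back to 1e8. The sequential loop therefore returns 1 and the pairwise version returns 0. Neither is "the" right answer, so compare CPU and GPU results with a tolerance rather than for equality.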
Resources

• CUDA
  – Tools on NVIDIA Developer Site 
    http://developer.nvidia.com/object/gpucomputing.html
  – CUDPP 
    http://code.google.com/p/cudpp/
• OpenCL
• Google Search !
The Future

• Better throughput
   – More GPU cores, scaling by Moore’s law
   – PCIe Gen 3
• Easier to program
• Arbitrary control and data access patterns
Questions ?

shree.shree@gmail.com

Computing using GPUs

  • 2. Speaker Intro • High Performance Computing @ Hewlett‐Packard – VizStack (http://vizstack.sourceforge.net) – GPU Computing • Big 3D enthusiast • Travels a lot • Blogs at http://www.shreekumar.in/
  • 3. What we will cover • GPUs and their history • Why use GPUs • Architecture • Getting Started with GPU Programming • Challenges, Techniques & Pitfalls • Where not to use GPUs ? • Resources • The Future
  • 4. What is a GPU • Graphics Programming Unit – Coined in 1999 by NVidia – Specialized add‐on board • Accelerates interactive 3D rendering – 60 image updates (or more) on large data – Solves embarrassingly parallel problem – Game driven volume economics • NVidia v/s ATI, just like Intel v/s AMD • Demand for better effects led to – programmable GPUs – floating point capabilities – this led to General Purpose GPU(GPGPU) Computation
  • 5. History of GPUs : a GPGPU Perspective Date Product Trans Cores Flops Technology 1997 RIVA 128 3 M Rasterization 1999 GeForce 256 25 M Transform & Lighting 2001 GeForce 3 60 M Programmable shaders 2002 GeForce FX 125 M 16, 32 bit FP, long shaders 2004 GeForce 6800 222 M Infinite length shaders, branching 2006 GeForce 8800 681 M 128 Unified graphics & compute, CUDA,  64 bit FP 2008 GeForce GTX  1.4 B 240 933 G IEEE FP, CUDA C, OpenCL and  280 78 M DirectCompute, PCI‐express Gen 2 2009 Tesla M2050 3.0 B 512 1.03 T Improved 64 bit perf, caching, ECC  515 G memory, 64‐bit unified addressing,  asynchronous bidirectional data  transfer, multiple kernels Source : Nickolls J. , Dally W.J. “The GPU Computing Era”, IEEE Micro, March-April 2010
  • 6. The GPU Advantage 30x CPU FLOPS on Latest GPUs 10x Memory Bandwidth Add to these a 3x Performance/$ Energy Efficient : 5x Performance/Watt All Graphs From: GPU4Vision : http://gpu4vision.icg.tugrz.at/
  • 7. People use GPUs for… Source : Nickolls J. , Dally W.J. “The GPU Computing Era”, IEEE Micro, March-April 2010
  • 8. More “why to use GPUs” • Proliferation of GPUs – Mobile devices will have capable GPUs soon! • Make more things possible – Make things real‐time • From seconds to real‐time interactive performance – Reduce offline processing overhead • Research Opportunities – New & efficient algorithms – Pairing multi‐core CPUs and massively multi‐threaded GPUs
  • 12. CPU versus GPU • CPU – Optimized for latency – Speedup techniques • Vectorization (MMX, SSE, …) • Coarse Grained Parallelism using multiple CPUs and cores – Memory approaching a TB • GPU – Optimized for throughput – Speedup techniques • Massive multithreading • Fine grained parallelism – A few GBs of memory max
  • 13. Getting Started • Software – CUDA (NVidia specific) – OpenCL (Cross‐platform, GPU/CPU) – DirectCompute (MS specific) • Hardware – A system equipped with a GPU • Any OS will do – But Windows and RedHat Enterprise Linux seem better supported
  • 14. CUDA • Compute Unified Device Architecture • Most popular GPGPU toolkit • CUDA C extends C with constructs – Easy to write programs • Lower level “driver” API is available – Provides more control – Use multiple GPUs in the same application – Mix graphics & compute code • Language bindings available – PyCUDA, Java, .NET • Toolkit provides conveniences • Figure : CUDA Toolkit (Source: NVIDIA CUDA Architecture, Introduction and Overview)
  • 15. CUDA Architecture • 1 or more streaming multiprocessors (“cores”) • Thread Blocks – Single Instruction, Multiple Thread (SIMT) – Hide latency by parallelism • Memory Hierarchy – Fermi GPUs can access system memory • Primitives for – Thread synchronization – Atomic Operations on memory Source : The GPU Computing Era
  • 16. Simple Example : Vector Addition
    C/C++ ‐ serial code
        void VecAdd(const float *A, const float *B, float *C, int N)
        {
            for (int i = 0; i < N; i++)
                C[i] = A[i] + B[i];
        }
        VecAdd(A, B, C, N);
    C/C++ with OpenMP – thread level parallelism
        void VecAdd(const float *A, const float *B, float *C, int N)
        {
            // "parallel for" creates the thread team and splits the loop across it;
            // a bare "omp for" outside a parallel region runs on a single thread
            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                C[i] = A[i] + B[i];
        }
        VecAdd(A, B, C, N);
  • 17. Vector Addition using CUDA
    CUDA C – element level parallelism
        __global__ void VecAdd(const float *A, const float *B, float *C, int N)
        {
            // One thread per element; the guard skips threads past the end
            int i = blockDim.x * blockIdx.x + threadIdx.x;
            if (i < N)
                C[i] = A[i] + B[i];
        }
    Invoking the function
        // Allocate memory on GPU
        cudaMalloc((void**)&d_A, size);
        cudaMalloc((void**)&d_B, size);
        cudaMalloc((void**)&d_C, size);
        // Copy arrays to GPU
        cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
        cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice);
        // Invoke function
        int threadsPerBlock = 256;
        int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
        VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
        // Copy result back to main memory
        cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost);
        // Free GPU memory
        cudaFree(d_A);
        cudaFree(d_B);
        cudaFree(d_C);
    Compilation
        # nvcc vectorAdd.cu -I ../../common/inc
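The index arithmetic on the vector addition slide can be checked without a GPU. The following is a minimal CPU-only sketch, not CUDA code: the two nested loops stand in for the grid of blocks and the threads inside each block, and they run sequentially here where a GPU would run them in parallel. The names `blocks_for` and `emulate_vec_add` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: number of blocks needed to cover n elements.
// Same ceiling division as blocksPerGrid on the slide.
int blocks_for(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// CPU emulation of the VecAdd launch (an illustration, not the CUDA runtime).
// Returns how many emulated threads were "launched", including idle ones.
int emulate_vec_add(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c, int threads_per_block) {
    assert(a.size() == b.size() && a.size() == c.size());
    const int n = static_cast<int>(a.size());
    const int blocks = blocks_for(n, threads_per_block);
    int launched = 0;
    for (int block_idx = 0; block_idx < blocks; ++block_idx) {
        for (int thread_idx = 0; thread_idx < threads_per_block; ++thread_idx) {
            // Same formula as the kernel: blockDim.x * blockIdx.x + threadIdx.x
            const int i = threads_per_block * block_idx + thread_idx;
            ++launched;
            if (i < n) {  // guard: the last block may reach past the end
                c[i] = a[i] + b[i];
            }
        }
    }
    return launched;
}
```

With 1000 elements and 256 threads per block, the sketch uses 4 blocks and 1024 emulated threads, so 24 of them fall outside the array. That is why the kernel needs its `if (i < N)` guard.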
  • 18. GPU Programming Challenges • Need high “occupancy” for best performance • Extracting parallelism with limited resources – Limited Registers – Limited Shared Memory • Preferred Approach – Small Kernels – Multiple Passes if needed • Decompose Problem into Parallel Pieces – Write once, scale performance everywhere!
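To see how limited registers and shared memory cap occupancy, a rough back-of-the-envelope model helps. The sketch below is a simplified estimate under stated assumptions: it ignores allocation granularity and other hardware rules, so treat the result as an upper bound rather than what a real GPU reports. `SmLimits`, `resident_blocks` and `occupancy` are hypothetical names, and every limit is supplied by the caller rather than taken from any specific GPU.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor limits. All values are assumptions supplied by the
// caller; real limits depend on the GPU generation.
struct SmLimits {
    int max_threads;       // resident threads per multiprocessor
    int max_blocks;        // resident blocks per multiprocessor
    int registers;         // registers per multiprocessor
    int shared_mem_bytes;  // shared memory per multiprocessor
};

// Simplified estimate of how many blocks fit on one multiprocessor:
// the tightest of the thread, register, shared memory and block limits.
int resident_blocks(const SmLimits& sm, int threads_per_block,
                    int regs_per_thread, int shared_bytes_per_block) {
    assert(threads_per_block > 0 && regs_per_thread > 0);
    const int by_threads = sm.max_threads / threads_per_block;
    const int by_regs = sm.registers / (regs_per_thread * threads_per_block);
    const int by_shared = shared_bytes_per_block > 0
                              ? sm.shared_mem_bytes / shared_bytes_per_block
                              : sm.max_blocks;
    return std::min({sm.max_blocks, by_threads, by_regs, by_shared});
}

// Occupancy as the fraction of the thread limit that is actually resident.
double occupancy(const SmLimits& sm, int threads_per_block,
                 int regs_per_thread, int shared_bytes_per_block) {
    const int blocks = resident_blocks(sm, threads_per_block,
                                       regs_per_thread, shared_bytes_per_block);
    return static_cast<double>(blocks * threads_per_block) / sm.max_threads;
}
```

In this model, doubling the registers used per thread can halve the number of resident blocks. This is one reason the slide prefers small kernels.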
  • 19. GPU Programming • Use Shared Memory when possible – Cooperation between threads in a block – Reduce access to global memory • Reduce Data Transfer over the Bus • It’s still a GPU ! – use textures to your advantage – use vector data types if you can • Watch out for GPU capability differences!
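The shared memory advice, together with the earlier "multiple passes if needed", can be illustrated with a sum reduction. This is a CPU-only sketch under assumptions, not GPU code: the `tile` buffer stands in for per-block shared memory, each outer loop iteration stands in for one thread block, and the inner summation is written sequentially for clarity where a real kernel would have the threads of a block cooperate. `reduce_pass` and `reduce_multipass` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One pass: each emulated block loads its tile of the input into a small
// local buffer (standing in for shared memory), reduces it there, and
// writes a single partial sum back. The output has one element per block.
std::vector<float> reduce_pass(const std::vector<float>& in, std::size_t block_size) {
    assert(block_size > 0);
    std::vector<float> partial;
    std::vector<float> tile(block_size);  // stand-in for shared memory
    for (std::size_t start = 0; start < in.size(); start += block_size) {
        const std::size_t count = std::min(block_size, in.size() - start);
        for (std::size_t t = 0; t < count; ++t) {
            tile[t] = in[start + t];  // one "global memory" read per element
        }
        float sum = 0.0f;
        for (std::size_t t = 0; t < count; ++t) {
            sum += tile[t];  // all further work stays in the tile
        }
        partial.push_back(sum);  // one "global memory" write per block
    }
    return partial;
}

// Repeat the pass until a single value remains (multiple passes if needed).
// If passes is non-null, it receives the number of passes that were run.
float reduce_multipass(std::vector<float> data, std::size_t block_size,
                       int* passes = nullptr) {
    assert(block_size > 1);  // a block size of 1 would never shrink the data
    int n_passes = 0;
    while (data.size() > 1) {
        data = reduce_pass(data, block_size);
        ++n_passes;
    }
    if (passes != nullptr) {
        *passes = n_passes;
    }
    return data.empty() ? 0.0f : data[0];
}
```

Each pass shrinks the data by roughly a factor of `block_size`, so 1000 elements with a block size of 256 need two passes: 1000 values become 4 partial sums, which then become 1.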
  • 20. Enough Theory! Demo Time & Let’s do some programming 
  • 21. Watch out for • Portability of programs across GPUs – Capabilities vary from GPU to GPU – Memory usage • Arithmetic differences in the result • Pay careful attention to demos…
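One common source of the arithmetic differences mentioned above is summation order: floating point addition is not associative, and a parallel reduction adds values in a different order than a serial loop. The sketch below is a CPU-only illustration with hypothetical helper names (`sequential_sum`, `pairwise_sum`, `nearly_equal`). It assumes ordinary IEEE single precision arithmetic with no fast-math style reordering by the compiler.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Left-to-right sum, as a serial CPU loop would compute it.
float sequential_sum(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) {
        acc += x;
    }
    return acc;
}

// Pairwise (tree) sum over v[lo, hi), the order a parallel reduction
// typically follows.
float pairwise_sum(const std::vector<float>& v, std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= v.size());
    if (hi - lo == 0) return 0.0f;
    if (hi - lo == 1) return v[lo];
    const std::size_t mid = lo + (hi - lo) / 2;
    return pairwise_sum(v, lo, mid) + pairwise_sum(v, mid, hi);
}

// Compare two results with a relative tolerance instead of ==.
// The scale is floored at 1.0 so values near zero use an absolute tolerance.
bool nearly_equal(float a, float b, float rel_tol) {
    const float scale = std::fmax(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * std::fmax(scale, 1.0f);
}
```

For the deliberately extreme input `{1e8f, 1.0f, -1e8f, 1.0f}` the two orders give different answers, and neither equals the exact sum of 2. When validating GPU output against a CPU reference, a tolerance check such as `nearly_equal` is usually more appropriate than exact equality, with the tolerance chosen for the problem at hand.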
  • 22. Resources • CUDA – Tools on NVIDIA Developer Site : http://developer.nvidia.com/object/gpucomputing.html – CUDPP : http://code.google.com/p/cudpp/ • OpenCL • Google Search!
  • 23. The Future • Better throughput – More GPU cores, scaling by Moore’s law – PCIe Gen 3 • Easier to program • Arbitrary control and data access patterns