INTRODUCTION TO CUDA
Prepared for Geek Camp Singapore 2011
                                  Raymond Tay
THE FREE LUNCH IS OVER – HERB SUTTER
WE NEED TO THINK BEYOND MULTI-CORE CPUS … WE NEED TO THINK MANY-CORE GPUS …
NVIDIA GPUS FLOPS
  • FLOPS: floating-point operations per second. A measure of how
    many floating-point operations a GPU can perform each second.
    More is better.

  (Chart: GPUs beat CPUs)
NVIDIA GPUS MEMORY BANDWIDTH
  • NVIDIA's GPUs are massively parallel processors, so high memory
    bandwidth is needed to keep them supplied with data. It plays a
    big role in high-performance computing.

  (Chart: GPUs beat CPUs)
GPU VS CPU
CPU
  • Optimised for low-latency access to cached data sets
  • Control logic for out-of-order and speculative execution
GPU
  • Optimised for data-parallel, throughput computation
  • Architecture tolerant of memory latency
  • More transistors dedicated to computation
I DON’T KNOW C/C++, SHOULD I LEAVE?
  Relax, no worries. Not to fret.

  Your brain asks: Wait a minute, why should I learn the C/C++ SDK?
  CUDA answers: Efficiency!!!
WHAT DO I NEED TO BEGIN WITH CUDA?
  • An NVIDIA CUDA-enabled graphics card, e.g. one based on the Fermi architecture
HOW DOES CUDA WORK?
  (Diagram: data moves between CPU memory and GPU memory over the PCI bus)

1.  Copy input data from CPU memory to GPU memory
2.  Load GPU program and execute, caching data on chip for performance
3.  Copy results from GPU memory to CPU memory
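
The three steps above can be sketched in host code roughly as follows. This is a minimal illustration, not code from the talk: the kernel `double_elements` and the wrapper `double_on_gpu` are hypothetical names, and the block size of 256 is an arbitrary assumption. The runtime calls used (`cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, `cudaFree`) are standard CUDA Runtime API functions. On-chip caching from step 2 is not shown.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel, for illustration only: doubles each element.
__global__ void double_elements(const float* in, float* out, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // surplus threads in the last block do nothing
        out[i] = 2.0f * in[i];
}

// Hypothetical host wrapper; returns false if any CUDA call fails.
bool double_on_gpu(const std::vector<float>& input, std::vector<float>& output)
{
    const unsigned int n = static_cast<unsigned int>(input.size());
    const size_t bytes = n * sizeof(float);
    output.resize(n);
    if (n == 0) return true;

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    // Step 1: copy input data from CPU memory to GPU memory.
    bool ok = cudaMemcpy(d_in, input.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    // Step 2: launch the GPU program.
    if (ok) {
        const unsigned int block_size = 256;  // assumed value
        const unsigned int blocks = (n + block_size - 1) / block_size;
        double_elements<<<blocks, block_size>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Step 3: copy results from GPU memory back to CPU memory.
    // This blocking copy also waits for the kernel to finish.
    if (ok)
        ok = cudaMemcpy(output.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

Calling `double_on_gpu(in, out)` with `in = {1, 2, 3}` should leave `out` holding the doubled values, provided a CUDA-capable device is present.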
EXAMPLE: BLOCK CYPHER
// CPU PROGRAM
void host_shift_cypher(unsigned int *input_array, unsigned int *output_array,
    unsigned int shift_amount, unsigned int alphabet_max,
    unsigned int array_length)
{
  // One thread walks the whole array
  for (unsigned int i = 0; i < array_length; i++)
  {
    unsigned int shifted = input_array[i] + shift_amount;
    if (shifted > alphabet_max)
    {
      shifted = shifted % (alphabet_max + 1);  // wrap around the alphabet
    }
    output_array[i] = shifted;
  }
}

int main() {
  // input_array, output_array and the other arguments are assumed to be set up elsewhere
  host_shift_cypher(input_array, output_array,
      shift_amount, alphabet_max, array_length);
}

// GPU PROGRAM
__global__ void shift_cypher(unsigned int *input_array, unsigned int *output_array,
    unsigned int shift_amount, unsigned int alphabet_max,
    unsigned int array_length)
{
  // Each thread handles one element
  unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
  if (tid < array_length)  // guard: the grid may hold more threads than elements
  {
    unsigned int shifted = input_array[tid] + shift_amount;
    if (shifted > alphabet_max)
      shifted = shifted % (alphabet_max + 1);
    output_array[tid] = shifted;
  }
}

int main() {
  // input_array and output_array are assumed to point to GPU memory here
  // Round the block count up; integer division alone would drop a partial block
  dim3 dimGrid((array_length + block_size - 1) / block_size);
  dim3 dimBlock(block_size);
  shift_cypher<<<dimGrid,dimBlock>>>(input_array, output_array,
      shift_amount, alphabet_max, array_length);
}
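EXAMPLE: VECTOR ADDITION
// CUDA CODE: each thread adds one pair of elements
__global__ void VecAdd(const float* A, const float* B, float* C,
    unsigned int N)
{
  // Global index of this thread across all blocks
  unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
  if (i < N)  // threads past the end of the arrays do nothing
    C[i] = A[i] + B[i];
}

// C CODE: a single thread loops over every element
void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
  for (unsigned int i = 0; i < N; ++i)
    C[i] = A[i] + B[i];
}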
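
The `if (i < N)` guard in the CUDA version exists because threads are launched in whole blocks, so the grid usually contains a few more threads than there are elements. The sketch below shows the arithmetic behind that, using two hypothetical helper names (`blocks_needed` and `idle_threads`) that are not part of the talk or of the CUDA API. It is host-only code and assumes `n + block_size - 1` does not overflow.

```cuda
// Hypothetical helper: smallest number of blocks such that
// blocks * block_size >= n (integer ceiling division).
inline unsigned int blocks_needed(unsigned int n, unsigned int block_size)
{
    return (n + block_size - 1) / block_size;
}

// Hypothetical helper: launched threads that have no element to process
// and are filtered out by the `if (i < N)` guard in the kernel.
inline unsigned int idle_threads(unsigned int n, unsigned int block_size)
{
    return blocks_needed(n, block_size) * block_size - n;
}
```

For example, with 1000 elements and 256 threads per block, `blocks_needed` gives 4 blocks (1024 threads), of which 24 are idle. A launch could then be written as `VecAdd<<<blocks_needed(N, 256), 256>>>(A, B, C, N);`, assuming `A`, `B` and `C` point to GPU memory.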
DEBUGGER
CUDA-GDB
  • Based on GDB
  • Linux
  • Mac OS X
Parallel Nsight
  • Plugin inside Visual Studio
VISUAL PROFILER & MEMCHECK
Profiler
  • Microsoft Windows
  • Linux
  • Mac OS X
  • Analyze performance
CUDA-MEMCHECK
  • Microsoft Windows
  • Linux
  • Mac OS X
  • Detect memory access errors
WHERE’S CUDA AT IN 2011?
  60,000 researchers use it to aid drug discovery
  470 universities teach CUDA
WHERE’S CUDA AT IN 2011? (PART 2)
  NVIDIA Showcase (1000+ applications)
ADDITIONAL RESOURCES
  • CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)
  • CUDA Tools & Ecosystem (http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)
  • CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)
  • NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)
  • GPGPU (http://gpgpu.org)
  • CUDA By Example (http://tegradeveloper.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0)
      Jason Sanders & Edward Kandrot
  • GPU Computing Gems Emerald Edition (http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/0123849888/)
      Editor in Chief: Prof Hwu Wen-Mei
CUDA LIBRARIES
  • Visit this site: http://developer.nvidia.com/cuda-tools-ecosystem#Libraries
  • Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV,
    GPU AI-Tree Search, GPU AI-Path Finding
  • A lot of the libraries are hosted on Google Code.
    Many more gems in there too!
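
To give a feel for what one of these libraries looks like in use, here is a small sketch of element-wise vector addition written with Thrust instead of a hand-written kernel. It is an illustration only, not code from the talk: the function name `thrust_vec_add` is hypothetical, and the sketch assumes both inputs have the same length and that the Thrust headers are available to the compiler.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/copy.h>
#include <vector>

// Hypothetical helper: returns a + b, computed on the GPU via Thrust.
// Assumes a.size() == b.size().
std::vector<float> thrust_vec_add(const std::vector<float>& a,
                                  const std::vector<float>& b)
{
    // Constructing a device_vector from host iterators copies host -> device.
    thrust::device_vector<float> d_a(a.begin(), a.end());
    thrust::device_vector<float> d_b(b.begin(), b.end());
    thrust::device_vector<float> d_c(a.size());

    // Apply plus<float> to each pair of elements on the device.
    thrust::transform(d_a.begin(), d_a.end(), d_b.begin(), d_c.begin(),
                      thrust::plus<float>());

    // Copy the result device -> host.
    std::vector<float> c(a.size());
    thrust::copy(d_c.begin(), d_c.end(), c.begin());
    return c;
}
```

The library handles memory allocation, the copies in both directions, and the launch configuration, which is the main appeal over writing the kernel by hand for simple operations like this one.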
THANK YOU
  @RaymondTayBL
