SlideShare a Scribd company logo
1 of 16
CUDAArchitecture
Prof. Shashikant V. Athawale
Assistant Professor | Computer Engineering
Department | AISSMS College of Engineering,
Kennedy Road, Pune , MH, India - 411001
Contents
❖ CUDAArchitecture
❖ Applications of CUDA
❖ Introduction to CUDA C-Write and launch CUDA C
kernels
❖ Manage GPU memory
❖ Manage communication and synchronization
❖ Parallel programming in CUDA- C.
Communication And Synchronization in
Thread
Communication And Synchronization in
Thread
CUDAArchitecture
CUDAArchitecture
Applications of CUDA
CUDA C : The Basics
❖ Based on industry-standard C
❖ A handful of language extensions to allow heterogeneous
programs
❖ Straightforward APIs to manage devices, memory, etc.
❖ Terminology:
➢ Host – The CPU and its memory (host memory)
➢ Device – The GPU and its memory (device memory)
Device
CUDA Kernels
GPU Memory Management
Data Transfer Directions Keywords
❖ cudaMemcpyHostToHost
❖ cudaMemcpyHostToDevice
❖ cudaMemcpyDeviceToHost
❖ cudaMemcpyDeviceToDevice
Parallel Programming in CUDA C
❖ CUDA brings data-parallel computing to the masses.
❖ CUDA is a scalable parallel programming model.
❖ Program runs on any number of processors without
recompiling.
Architecture Of Parallel CUDA Programming
CUDA Uses Extensive Multithreading
❖ CUDA threads express fine-grained data parallelism.
➢ Map threads to GPU threads.
➢ Virtualize the processors.
❖ CUDA thread blocks express coarse-grained parallelism.
➢ Blocks hold arrays of GPU threads, define shared
memory boundaries.
➢ Allow scaling between smaller and larger GPUs.
CUDA Uses Extensive Multithreading
❖ GPUs execute thousands of lightweight threads.
➢ In graphics, each thread computes one pixel.
➢ One CUDA thread computes one result (or several
results).
➢ Hardware multithreading & zero-overhead
scheduling.
Applications
❖ High bandwidth
❖ Visual computing
❖ High arithmetic intensity

More Related Content

What's hot

High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance ComputingDivyen Patel
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with GpuRohit Khatana
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computingVajira Thambawita
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSKathirvel Ayyaswamy
 
Parallel computing
Parallel computingParallel computing
Parallel computingVinay Gupta
 
CPU vs. GPU presentation
CPU vs. GPU presentationCPU vs. GPU presentation
CPU vs. GPU presentationVishal Singh
 
Introduction to High Performance Computing
Introduction to High Performance ComputingIntroduction to High Performance Computing
Introduction to High Performance ComputingUmarudin Zaenuri
 
Introduction to MPI
Introduction to MPI Introduction to MPI
Introduction to MPI Hanif Durad
 
High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...Pradeep Redddy Raamana
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDAMartin Peniak
 

What's hot (20)

Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
GPU Programming
GPU ProgrammingGPU Programming
GPU Programming
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
 
High performance computing
High performance computingHigh performance computing
High performance computing
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 
High–Performance Computing
High–Performance ComputingHigh–Performance Computing
High–Performance Computing
 
Parallel Computing on the GPU
Parallel Computing on the GPUParallel Computing on the GPU
Parallel Computing on the GPU
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computing
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
CPU vs. GPU presentation
CPU vs. GPU presentationCPU vs. GPU presentation
CPU vs. GPU presentation
 
Introduction to High Performance Computing
Introduction to High Performance ComputingIntroduction to High Performance Computing
Introduction to High Performance Computing
 
GPU Computing
GPU ComputingGPU Computing
GPU Computing
 
Parallel computing persentation
Parallel computing persentationParallel computing persentation
Parallel computing persentation
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
 
Introduction to MPI
Introduction to MPI Introduction to MPI
Introduction to MPI
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
 

Similar to CUDA Architecture

Compute Unified Device Architecture (CUDA) Programmimg
Compute Unified Device Architecture (CUDA) ProgrammimgCompute Unified Device Architecture (CUDA) Programmimg
Compute Unified Device Architecture (CUDA) ProgrammimgWinnie Thomas
 
IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanIIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanAnkita Dewan
 
IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanIIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanAnkita Dewan
 
CUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesCUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesSubhajit Sahu
 
Pycon2014 GPU computing
Pycon2014 GPU computingPycon2014 GPU computing
Pycon2014 GPU computingAshwin Ashok
 
GPU - An Introduction
GPU - An IntroductionGPU - An Introduction
GPU - An IntroductionDhan V Sagar
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering SoftwareYung-Yu Chen
 
The Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCThe Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCinside-BigData.com
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)dibyendu.das
 
S0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cudaS0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cudamistercteam
 
GPU Programming with CUDA
GPU Programming with CUDAGPU Programming with CUDA
GPU Programming with CUDAFilipo Mór
 
qCUDA-ARM : Virtualization for Embedded GPU Architectures
 qCUDA-ARM : Virtualization for Embedded GPU Architectures  qCUDA-ARM : Virtualization for Embedded GPU Architectures
qCUDA-ARM : Virtualization for Embedded GPU Architectures 柏瑀 黃
 
Everything as code
Everything as codeEverything as code
Everything as codeHepsiburada
 

Similar to CUDA Architecture (20)

Compute Unified Device Architecture (CUDA) Programmimg
Compute Unified Device Architecture (CUDA) ProgrammimgCompute Unified Device Architecture (CUDA) Programmimg
Compute Unified Device Architecture (CUDA) Programmimg
 
IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanIIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita Dewan
 
IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanIIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita Dewan
 
CUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesCUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : Notes
 
CUDA
CUDACUDA
CUDA
 
Pycon2014 GPU computing
Pycon2014 GPU computingPycon2014 GPU computing
Pycon2014 GPU computing
 
Cuda materials
Cuda materialsCuda materials
Cuda materials
 
GPU - An Introduction
GPU - An IntroductionGPU - An Introduction
GPU - An Introduction
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
Cuda lab manual
Cuda lab manualCuda lab manual
Cuda lab manual
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
 
The Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCThe Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACC
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance Computing
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)
 
Cuda intro
Cuda introCuda intro
Cuda intro
 
S0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cudaS0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cuda
 
GPU Programming with CUDA
GPU Programming with CUDAGPU Programming with CUDA
GPU Programming with CUDA
 
Resume
ResumeResume
Resume
 
qCUDA-ARM : Virtualization for Embedded GPU Architectures
 qCUDA-ARM : Virtualization for Embedded GPU Architectures  qCUDA-ARM : Virtualization for Embedded GPU Architectures
qCUDA-ARM : Virtualization for Embedded GPU Architectures
 
Everything as code
Everything as codeEverything as code
Everything as code
 

More from Dr Shashikant Athawale (20)

multi threaded and distributed algorithms
multi threaded and distributed algorithms multi threaded and distributed algorithms
multi threaded and distributed algorithms
 
Amortized analysis
Amortized analysisAmortized analysis
Amortized analysis
 
Complexity theory
Complexity theory Complexity theory
Complexity theory
 
Divide and Conquer
Divide and ConquerDivide and Conquer
Divide and Conquer
 
Model and Design
Model and Design Model and Design
Model and Design
 
Fundamental of Algorithms
Fundamental of Algorithms Fundamental of Algorithms
Fundamental of Algorithms
 
Parallel Algorithms- Sorting and Graph
Parallel Algorithms- Sorting and GraphParallel Algorithms- Sorting and Graph
Parallel Algorithms- Sorting and Graph
 
Analytical Models of Parallel Programs
Analytical Models of Parallel ProgramsAnalytical Models of Parallel Programs
Analytical Models of Parallel Programs
 
Basic Communication
Basic CommunicationBasic Communication
Basic Communication
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
 
Dynamic programming
Dynamic programmingDynamic programming
Dynamic programming
 
Parallel algorithms
Parallel algorithms Parallel algorithms
Parallel algorithms
 
Greedy method
Greedy method Greedy method
Greedy method
 
Divide and conquer
Divide and conquerDivide and conquer
Divide and conquer
 
Branch and bound
Branch and boundBranch and bound
Branch and bound
 
Asymptotic notation
Asymptotic notationAsymptotic notation
Asymptotic notation
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
Advanced Wireless Technologies
Advanced Wireless TechnologiesAdvanced Wireless Technologies
Advanced Wireless Technologies
 
Vo ip
Vo ipVo ip
Vo ip
 

Recently uploaded

Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 

Recently uploaded (20)

Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 

CUDA Architecture