Rohit khatana
Parallel Computing With GPU
Rohit Khatana
4344
Seminar guide
Prof. Aparna Joshi
ARMY INSTITUTE OF TECHNOLOGY
Content
1. What is parallel computing?
2. GPU
3. CUDA
4. Application
What is Parallel Computing?
Performing a task or executing a program
on more than one machine or processor at the same time.
Put simply: dividing a job among a group of workers.
For example
What kind of processors will we
build?
(major design constraint: power)
CPU: - Complex control hardware
Flexibility + performance
Expensive in terms of power
GPU: - Simpler control hardware
More hardware for computation
Potentially more power-efficient (ops/watt)
More restrictive programming model
Modern GPUs have more ALUs
Graphics Logical Pipeline
• The GPU receives geometry information
from the CPU as an input and provides
a picture as an output
• Let’s see how that happens
Host Interface
• The host interface is the communication bridge
between the CPU and the GPU
• It receives commands from the CPU and also
pulls geometry information from system
memory
• It outputs a stream of vertices in object space
with all their associated information (normals,
texture coordinates, per-vertex color, etc.)
Vertex Processing
• The vertex processing stage receives vertices from the
host interface in object space and outputs them in screen
space
• This may be a simple linear transformation, or a complex
operation involving morphing effects
• No new vertices are created in this stage, and no
vertices are discarded (input/output has 1:1 mapping)
Triangle Setup
• In this stage geometry information becomes raster
information (screen space geometry is the input,
pixels are the output)
• Prior to rasterization, triangles that are back-facing
or located outside the viewing frustum are
rejected
Triangle Setup (cont.)
• A fragment is generated if and only if its center
is inside the triangle
• Every fragment generated has its attributes
computed as the perspective-correct
interpolation of the three vertices that make up
the triangle
Fragment Processing
• Each fragment provided by triangle setup is fed
into fragment processing as a set of attributes
(position, normal, texture coordinates, etc.), which are used to
compute the final color for this pixel
• The computations taking place here include
texture mapping and math operations
Memory Interface
• Fragments provided by the last step are written to
the framebuffer.
• Before the final write occurs, some fragments are
rejected by the z-buffer, stencil, and alpha tests
Memory Model of GPU
Basic Architecture of GPU
CUDA (Compute Unified Device
Architecture)
• CUDA is a parallel computing platform and
programming model.
• It was created by NVIDIA and is implemented on the
GPUs that NVIDIA produces.
CUDA
• CUDA gives developers access to the
virtual instruction set and memory of the
parallel computational elements in CUDA
GPUs.
• CUDA supports standard programming
languages, including C++, Python, and Fortran.
Programming Model
• Threads are organized into blocks.
• Blocks are organized into a grid.
• Each block runs on a single multiprocessor; a
multiprocessor can hold several blocks at once.
• A warp is a group of threads that execute the same
instruction together.
• There are 32 threads in a warp.
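The thread, block, and grid hierarchy can be sketched as follows. This is a minimal illustration, not code from the slides: the kernel name `record_ids` and the host helper `launch_record_ids` are hypothetical, and error checking is left out for brevity. Each thread combines its block and thread coordinates into one global index and records which warp of its block it falls into.

```cuda
#include <cuda_runtime.h>

// Each thread computes its global index from its block and thread coordinates
// and records which warp of its block it belongs to.
__global__ void record_ids(int *global_id, int *warp_in_block, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // position in the whole grid
    if (idx < n) {                                    // the last block may be partly idle
        global_id[idx] = idx;
        warp_in_block[idx] = threadIdx.x / warpSize;  // warpSize is a built-in device variable
    }
}

// Hypothetical host helper: launches enough blocks to cover n elements
// and copies both result arrays back to the host.
void launch_record_ids(int *h_global_id, int *h_warp_in_block,
                       int n, int threads_per_block) {
    int *d_global_id = nullptr;
    int *d_warp = nullptr;
    size_t bytes = n * sizeof(int);
    cudaMalloc((void **)&d_global_id, bytes);
    cudaMalloc((void **)&d_warp, bytes);

    int blocks = (n + threads_per_block - 1) / threads_per_block;  // round up
    record_ids<<<blocks, threads_per_block>>>(d_global_id, d_warp, n);

    // cudaMemcpy waits for the kernel on the default stream to finish.
    cudaMemcpy(h_global_id, d_global_id, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_warp_in_block, d_warp, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_global_id);
    cudaFree(d_warp);
}
```

With 64 threads per block, for instance, element 100 is handled by thread 36 of block 1, which sits in the second warp (warp 1) of that block.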
Typical CUDA/GPU Program
1. CPU allocates storage on GPU (cudaMalloc).
2. CPU copies input data from CPU to GPU
(cudaMemcpy).
3. CPU launches kernel on GPU to process the data
(kernel<<<number of blocks, threads per block>>>(parameters)).
4. CPU copies results back to CPU from GPU
(cudaMemcpy).
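The four steps can be sketched with basic error checking added. This is an illustrative sketch under stated assumptions, not the presenter's code: the `CUDA_CHECK` macro, the `scale` kernel, and the `scale_on_gpu` helper are hypothetical names introduced here, and the block size of 256 is an arbitrary choice.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Illustrative kernel: multiply every element by a constant factor.
__global__ void scale(float *d_data, float factor, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) d_data[idx] *= factor;
}

// Hypothetical host function that follows the four steps in order.
void scale_on_gpu(float *h_data, float factor, int n) {
    float *d_data = nullptr;
    size_t bytes = n * sizeof(float);

    // Step 1: allocate storage on the GPU.
    CUDA_CHECK(cudaMalloc((void **)&d_data, bytes));
    // Step 2: copy input data from CPU to GPU.
    CUDA_CHECK(cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice));

    // Step 3: launch the kernel with enough blocks to cover n elements.
    int threads = 256;  // assumed block size
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, factor, n);
    CUDA_CHECK(cudaGetLastError());  // reports an invalid launch configuration

    // Step 4: copy the results back from GPU to CPU, then free GPU memory.
    CUDA_CHECK(cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));
}
```

Checking the return value of every runtime call is optional, but it tends to make failures such as an out-of-memory allocation visible at the call that caused them.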
Example: squaring the elements of an array
__global__ void square(float * d_out, float * d_in){
// each thread squares the one element that matches its index
int idx = threadIdx.x;
float f = d_in[idx];
d_out[idx] = f * f;
}
threadIdx.x gives the index of the current thread within its block
GPU/CUDA programming
Main program
int main(int argc, char **argv){
……………………
…………………….
float h_out[ARRAY_SIZE];
// declare GPU pointers
float * d_in;
float * d_out;
// allocate GPU memory (cudaMalloc expects a void** argument)
cudaMalloc( (void**) &d_in, ARRAY_BYTES);
cudaMalloc( (void**) &d_out, ARRAY_BYTES);
Main program (cont.)
// transfer the array to the GPU
cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);
// launch the kernel: one block of ARRAY_SIZE threads
square<<<1, ARRAY_SIZE>>>(d_out, d_in);
// copy back the result array to the CPU
cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);
// print out the resulting array, one value per line
for (int i = 0; i < ARRAY_SIZE; i++) {
printf("%f\n", h_out[i]);
}
// release GPU memory
cudaFree(d_in);
cudaFree(d_out);
return 0;
}
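The launch `square<<<1, ARRAY_SIZE>>>` uses a single block, so it only works while ARRAY_SIZE stays within the per-block thread limit (1024 on current NVIDIA GPUs). A possible variant for larger arrays is sketched below. The names `square_n` and `square_on_gpu` and the block size of 256 are assumptions made for this illustration, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Variant of the square kernel that works across many blocks.
// The bounds check keeps surplus threads in the last block from writing.
__global__ void square_n(float *d_out, const float *d_in, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float f = d_in[idx];
        d_out[idx] = f * f;
    }
}

// Hypothetical host helper: squares n floats from h_in into h_out on the GPU.
void square_on_gpu(const float *h_in, float *h_out, int n) {
    float *d_in = nullptr;
    float *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // assumed block size
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    square_n<<<blocks, threads>>>(d_out, d_in, n);

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

The only changes from the single-block version are the global index computation and the `idx < n` guard; the host-side sequence of allocate, copy in, launch, and copy out stays the same.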
Programming Model
GPU vs CPU Code
Conclusion
• GPU computing is a good choice for fine-grained
data-parallel programs with limited communication.
• GPU computing is less suitable for coarse-grained
programs with a lot of communication.
• The GPU has become a co-processor to the
CPU.
