By:
Ravi Kumar Parmar
Student, BCA V Semester (Roll-42)
School of Computer and Systems Sciences
Jaipur National University
Jaipur
• The graphics processing unit (GPU) has become
an integral part of today’s mainstream
computing systems. Over the past six years,
there has been a marked increase in the
performance and capabilities of GPUs.
• A GPU is a graphics processing unit that enables
you to run high-definition graphics on your PC,
which modern computing demands. Like
the CPU (central processing unit), it is a single-chip
processor. The GPU has hundreds of cores,
compared with the 4 or 8 in the latest CPUs. The
primary job of the GPU is to compute 3D
functions.
• Offloading video decoding to the GPU has already been
undertaken by a number of graphics processor vendors,
for example in NVIDIA's PureVideo technology and ATI's
Avivo. Both companies’ technologies offload a number of
the most computationally intensive aspects of MPEG
decoding to the GPU in order to speed up the process over
the CPU alone, so we will concentrate our work
on the post-decoding phases.
• With this paper we aim to fill the existing gap in the
literature using the broader perspective of SOA. We
do not restrict ourselves to specific problems but give
an overview of the multitude of existing application
examples and the promising future employment of
graphics hardware in SOAs. By classifying the use
cases into the layers of a reference architecture, we
show which specific advantage of GPUs is most
beneficial at each layer.
• Control hardware dominates processors
• Complex, difficult to build and verify
• Takes substantial fraction of die
• Scales poorly
• Pay for max throughput, sustain average
throughput
• Quadratic dependency checking
• Control hardware doesn’t do any math!
• Over the past few years, the GPU has
evolved from a fixed-function special-purpose
processor into a full-fledged parallel
programmable processor with additional
fixed-function special-purpose functionality.
• The Graphics Pipeline
- The input to the GPU is a list of
geometric primitives, typically triangles, in a
3-D world coordinate system. Through many
steps, these primitives are shaded and mapped
onto the screen.
- Vertex operations: The input primitives
are formed from individual vertices. Each
vertex must be transformed into screen space
and shaded, typically by computing its
interaction with the lights in the scene.
Because typical scenes have tens to hundreds
of thousands of vertices, and each vertex can
be computed independently, this stage is well
suited for parallel hardware.
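
The vertex stage can be illustrated with a minimal Python sketch. It is not how any real GPU or driver implements the stage: the perspective divide by z, the `scale` factor, the 640x480 viewport, and the function names are all assumptions made for illustration, and shading is left out. The point is only that each vertex is transformed without reference to any other vertex, so the work can be handed out in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_vertex(vertex, scale, width, height):
    """Map one (x, y, z) world-space vertex to integer pixel coordinates.

    Assumes z > 0 and a simple perspective divide; real pipelines
    use 4x4 matrices and homogeneous coordinates.
    """
    x, y, z = vertex
    # Perspective divide: distant points move toward the screen centre.
    ndc_x = scale * x / z
    ndc_y = scale * y / z
    # Viewport mapping from [-1, 1] to pixel coordinates (y grows downward).
    px = int((ndc_x + 1.0) * 0.5 * (width - 1))
    py = int((1.0 - ndc_y) * 0.5 * (height - 1))
    return px, py


def transform_all(vertices, scale=1.0, width=640, height=480):
    """Transform every vertex; no vertex depends on any other."""
    with ThreadPoolExecutor() as pool:
        # pool.map keeps the input order even though the calls are independent.
        return list(
            pool.map(lambda v: transform_vertex(v, scale, width, height), vertices)
        )
```

Calling `transform_all([(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)])` returns one pixel position per input vertex, in the same order as the input.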
• Evolution of GPU Architecture
- The fixed-function pipeline lacked the
generality to efficiently express the more complicated
shading and lighting operations that are essential
for complex effects. The key step was replacing
the fixed-function per-vertex and per-fragment
operations with user-specified programs run on
each vertex and fragment. Over the past six years,
these vertex programs and fragment programs
have become increasingly capable, with
larger limits on their size and resource
consumption, more fully featured instruction
sets, and more flexible control-flow
operations.
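
The shift from fixed-function to programmable stages can be sketched in a few lines of Python. This is an illustration only: the diffuse term standing in for the fixed-function operation, the banded "toon" program, and all function names are assumptions, not taken from any graphics API. The stage runs whatever per-fragment program it is given, and the user-supplied program is free to use its own control flow.

```python
def fixed_function_shade(normal, light_dir):
    """Hard-wired diffuse term: the only effect a fixed stage offers."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, dot)


def toon_program(normal, light_dir):
    """A user-specified fragment program with its own control flow."""
    diffuse = fixed_function_shade(normal, light_dir)
    if diffuse > 0.5:
        return 1.0   # fully lit band
    if diffuse > 0.0:
        return 0.5   # half-lit band
    return 0.0       # facing away from the light


def shade_fragments(normals, light_dir, fragment_program=fixed_function_shade):
    """Run the same program on every fragment; the program is replaceable."""
    return [fragment_program(normal, light_dir) for normal in normals]
```

Passing `fragment_program=toon_program` changes the effect without touching the stage itself, which is the generality the fixed-function pipeline lacked.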
• Architecture of the Modern GPU
- We noted that the GPU is built for different
application demands than the CPU: large parallel
computation requirements with an emphasis on
throughput rather than latency. Consequently, the
architecture of the GPU has progressed in a different
direction than that of the CPU.
- The CPU divides the pipeline in time, applying
all resources in the processor to each stage in turn.
GPUs have historically taken a different approach. The
GPU divides the resources of the processor among the
different stages, such that the pipeline is divided in
space, not time. The part of the processor working on
one stage feeds its output directly into a different part
that works on the next stage.
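
The difference between dividing a pipeline in time and in space can be modelled with a small Python sketch. The three arithmetic stages are hypothetical placeholders, and the generators run lazily in a single thread, so this models only the data flow between stages, not real concurrent hardware.

```python
def stage_a(x):
    return x + 1   # hypothetical first stage


def stage_b(x):
    return x * 2   # hypothetical second stage


def stage_c(x):
    return x - 3   # hypothetical third stage


STAGES = [stage_a, stage_b, stage_c]


def run_divided_in_time(items):
    """CPU-style: the whole processor works on one stage at a time."""
    batch = list(items)
    for stage in STAGES:
        batch = [stage(x) for x in batch]
    return batch


def stage_unit(stage, upstream):
    """A unit dedicated to one stage; it consumes upstream output item by item."""
    for x in upstream:
        yield stage(x)


def run_divided_in_space(items):
    """GPU-style: each stage has its own unit and feeds the next one directly."""
    stream = iter(items)
    for stage in STAGES:
        stream = stage_unit(stage, stream)
    return list(stream)
```

Both functions produce the same results. In the second, an item can enter the next stage as soon as the previous stage has produced it, without waiting for the rest of the batch.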
• NVIDIA
– Tesla HPC-specific GPUs have evolved
from the GeForce series
• AMD
– FireStream HPC-specific GPUs have
evolved from the (ATI) Radeon series
• Intel
– The Knights Corner many-core x86 chip is
like a hybrid between a GPU and a many-core CPU
• Large computational requirements
• Massive parallelism
• Graphics pipeline designed for
independent operations
• Long latencies tolerable
• Deep, feed-forward pipelines
• Hacks are OK: can tolerate lack of
accuracy
• GPUs are good at parallel, arithmetically
intense, streaming-memory problems
• In addition to query processing, large web search
engines need to perform many other operations,
including web crawling, index building, and
data mining steps for tasks such as link analysis
and spam and duplicate detection. We focus
here on query processing and in particular on
one phase of this step as explained further
below. We believe that this part is suitable for
implementation on GPUs as it is fairly simple in
structure but nonetheless consumes a
disproportionate amount of the overall system
resources. In contrast, we do not think that
implementation of a complete search engine on
a GPU is currently realistic.
• The GPU Programming Model
- The programmable units of the GPU
follow a single-program multiple-data (SPMD)
programming model. For efficiency, the GPU
processes many elements (vertices or
fragments) in parallel using the same program.
Each element is independent of the other
elements, and in the base programming model,
elements cannot communicate with each
other. All GPU programs must be structured in
this way: many parallel elements, each
processed in parallel by a single program.
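
A minimal Python sketch of the SPMD idea follows. The `launch` helper and the `saxpy_kernel` example are hypothetical names chosen for illustration and do not correspond to any GPU API. A single program is run once per element, each run touches only its own element, and so the order of execution cannot change the result.

```python
def saxpy_kernel(i, a, xs, ys, out):
    """The single program: element i reads its own inputs and writes its own output."""
    out[i] = a * xs[i] + ys[i]


def launch(kernel, n, *args, reverse=False):
    """Run the same kernel for every index.

    A GPU would run these instances in parallel; here they run one after
    another, in either order, to show that the order does not matter.
    """
    indices = range(n - 1, -1, -1) if reverse else range(n)
    for i in indices:
        kernel(i, *args)
```

Because no element reads another element's output, `launch(saxpy_kernel, n, a, xs, ys, out)` and the same call with `reverse=True` fill `out` with identical values.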
• General-Purpose Computing on the GPU
- The following steps show how a GPU is programmed for graphics,
the starting point for the simpler and more direct way that
today’s GPU computing applications are written.
1. Programming a GPU for graphics: We begin with the
same GPU pipeline that we described above,
concentrating on the programmable aspects of this
pipeline.
2. The programmer specifies geometry that covers a region
on the screen. The rasterizer generates a fragment at
each pixel location covered by that geometry.
3. Each fragment is shaded by the fragment program.
4. The fragment program computes the value of the
fragment by a combination of math operations and
global memory reads from a global texture memory.
5. The resulting image can then be used as texture on
future passes through the graphics pipeline.
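
The steps above can be imitated in plain Python to show the shape of a multi-pass computation. This is a sketch under stated assumptions: the "texture" is a list of lists, the "geometry" is taken to cover every pixel, and the fragment program (a neighbour average) is a made-up example. Nothing here calls a real graphics API.

```python
def fragment_program(texture, x, y):
    """Value of one fragment: the mean of its texel and the in-bounds 4-neighbours."""
    height, width = len(texture), len(texture[0])
    total = texture[y][x]
    count = 1
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            total += texture[ny][nx]   # global read from texture memory
            count += 1
    return total / count


def render_pass(texture):
    """Generate one fragment per covered pixel and shade each one."""
    height, width = len(texture), len(texture[0])
    return [
        [fragment_program(texture, x, y) for x in range(width)]
        for y in range(height)
    ]


def run_passes(texture, passes):
    """Feed each pass's output image back in as the next pass's texture."""
    for _ in range(passes):
        texture = render_pass(texture)
    return texture
```

Each pass reads only from the previous pass's texture and writes a fresh image, so every fragment within a pass can be computed independently.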
We observe that different use cases weight
the criteria differently. For example, a VDI
deployment values high VM-to-GPU
consolidation ratios (e.g., multiplexing),
while a consumer running a VM to access a
game or CAD application unavailable on the
host values performance and likely fidelity. A
tech support person maintaining a library of
different configurations and an IT
administrator running server VMs are both
likely to value portability and secure
isolation (interposition).
• Front-end Virtualization
- Front-end virtualization introduces a virtualization
boundary at a relatively high level in the stack, and runs the
graphics driver in the host/hypervisor. This approach does
not rely on any GPU vendor- or model-specific details.
• Back-end Virtualization
- The most obvious back-end virtualization technique
is fixed pass-through: the permanent association of a virtual
machine with full exclusive access to a physical GPU. Recent
chipset features, such as Intel’s VT-d, make fixed pass-through
practical without requiring any special knowledge of
a GPU’s programming interfaces. However, fixed pass-through
is not a general solution. It completely forgoes any
multiplexing, and packing machines with one GPU per virtual
machine (plus one for the host) is not feasible.
• This paper presents our evaluation and analysis of the efficiency of GPU
computing for data-parallel scientific applications. Starting with a
biomolecular code that calculates electrostatic properties in a data-parallel
manner (i.e., GEM), we evaluate our different implementations
of GEM across three metrics: performance, energy consumption, and
energy efficiency.
• In the future, we will continue this work by investigating the effects of
memory layout (global, constant, texture) on GPU performance and
efficiency. In addition, we will delve further into potential techniques for
proactively reducing power and conserving energy on the GPU.
• There is much future work in developing reliable benchmarks that
specifically stress the performance weaknesses of a virtualization layer.
Our tests show API overheads of about 2 to 120 times that of a native
GPU. As a result, the performance of a virtualized GPU can be highly
dependent on subtle implementation details of the application under
test.
• Back-end virtualization holds much promise for performance, breadth of
GPU feature support, and ease of driver maintenance. While fixed pass-through
is easy, none of the more advanced techniques have been
demonstrated.