CS 354
GPU Architecture
Mark Kilgard
University of Texas
March 6, 2012
CS 354                                                                2



         Today’s material
        In-class quiz
        Lecture topic
            Architecture of Graphics Processing Units (GPUs)
        Course work
            Homework #4 due today
            Review textbook reading
                 Chapters 5, 6, and 7
            Project #2 on texturing, shading, & lighting is coming
            Remember: Midterm in-class on March 8
CS 354                            3



         My Office Hours
        Tuesday, before class
            Painter (PAI) 5.35
            8:45 a.m. to 9:15
        Thursday, after class
            ACE 6.302
            11:00 a.m. to 12:00


        Randy’s office hours
            Monday & Wednesday
            11 a.m. to 12:00
            Painter (PAI) 5.33
CS 354                                           4



         Last time, this time
        Last lecture, we discussed
          Programmable shading
          Graphics hardware shading languages
        This lecture
            How do GPUs work?
CS 354                                                                                      5

         Daily Quiz
        On a sheet of paper
            Write your EID, name, and date
            Write #1, #2, #3, #4, each followed by its answer
        Pick the best choice: Shade trees are
            a) fractal trees with shadows
            b) OpenGL commands
            c) hierarchical arrangements of shading computations
            d) fractal patterns of all sorts
        Name one general-purpose programming language that GLSL borrows from.
        Multiple choice: The GLSL standard has built-in data types for
            a) vectors
            b) matrices
            c) texture samplers
            d) floating-point values
            e) pointers to malloc'ed memory
            f) a through e
            g) a through d
CS 354                                                                    6



          Key Trend in OpenGL Evolution
        Progression: simple configurability, then complex configurability, then shaders and high-level languages
        Overall shift: fixed-function to programmable

      Direct3D follows the same trend
      Also reflects trend in GPU architecture
              API and hardware co-evolving
CS 354                                                                                                          7



         Programming Shaders inside GPU
        Multiple programmable domains within the GPU
        Can be programmed in high-level languages
            Cg, HLSL, or OpenGL Shading Language (GLSL)
        [Diagram: OpenGL 3.3 pipeline; the legend marks each stage as programmable or fixed-function]
            CPU side: 3D Application or Game -> OpenGL API
            CPU-GPU boundary
            GPU stages: GPU Front End -> Vertex Assembly -> Vertex Shader (programmable) -> Primitive Assembly -> Geometry Program (programmable) -> Clipping, Setup, and Rasterization -> Fragment Shader (programmable) -> Raster Operations
            Memory Interface: Attribute Fetch, Parameter Buffer Read, Texture Fetch, Framebuffer Access
CS 354                              8



         Complex OpenGL Data Flow
CS 354                                                                                          9


   Six Years of GPU Architecture

Year   Product              OpenGL   Direct3D   New Features
2000   GeForce 256          1.3      DX7        Hardware transform & lighting, configurable fixed-point shading, cube maps, texture compression, anisotropic texture filtering
2001   GeForce3             1.4      DX8        Programmable vertex transformation, 4 texture units, dependent textures, 3D textures, shadow maps, multisampling, occlusion queries
2002   GeForce4 Ti 4600     1.4      DX8.1      Early Z culling, dual-monitor
2003   GeForce FX           1.5      DX9        Vertex program branching, floating-point fragment programs, 16 texture units, limited floating-point textures, color and depth compression
2004   GeForce 6800 Ultra   2.0      DX9c       Vertex textures, structured fragment branching, non-power-of-two textures, generalized floating-point textures, floating-point texture filtering and blending
2005   GeForce 7800 GTX     2.0      DX9c       Transparency antialiasing
CS 354                                                                                                                       10

         GeForce Peak Vertex Processing Trends
        [Bar chart: millions of vertices per second, axis 0 to 1,400, GeForce2 GTS through GeForce 7800 GTX]
        Rate shown is for a trivial 4x4 vertex transform
        Vertex rate exceeds peak setup rates, which allows excess vertex processing

   Product              Vertex units
   GeForce2 GTS         1
   GeForce3             1
   GeForce4 Ti 4600     2
   GeForce FX           3
   GeForce 6800 Ultra   6
   GeForce 7800 GTX     8
CS 354                                                                                                                    11

         GeForce Peak Memory Bandwidth Trends
        [Chart: gigabytes per second, axis 0 to 200, GeForce2 GTS through GeForce 7800 GTX]
        Series: raw bandwidth; effective raw bandwidth with compression
        Exponential trend lines fitted to both series
        Annotations: 128-bit interface, then 256-bit interface
CS 354                                                        12

         Effective GPU
         Memory Bandwidth
        Compression schemes
          Lossless depth and color compression (color when multisampling)
          Lossy texture compression (S3TC / DXTC)
          Typically assumes 4:1 compression
        Avoiding useless work
          Early killing of fragments (Z cull)
          Avoiding useless blending and texture fetches
        Very clever memory controller designs
          Combining memory accesses for improved coherency
          Caches for texture fetches
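
To make "early killing of fragments" concrete, here is a minimal Python sketch of a toy depth buffer that counts how many fragments reach the shader with and without an early Z test. The function name `shade_count`, the "less than" depth test, and the 4x4 tile are assumptions chosen for illustration; real Z-cull hardware works on coarse tiles and is not described at this level in the slides.

```python
def shade_count(fragments, width, height, early_z):
    """Count fragment-shader invocations for a stream of (x, y, depth) fragments.

    Toy model (an assumption, not the hardware algorithm): the depth test is
    'less than', so a smaller depth value is closer to the viewer.
    """
    far = float("inf")
    depth_buffer = [[far] * width for _ in range(height)]
    shaded = 0
    for x, y, z in fragments:
        visible = z < depth_buffer[y][x]
        if early_z and not visible:
            continue  # killed before shading: no texture fetch, no blending
        shaded += 1  # the expensive shader would run here
        if visible:
            depth_buffer[y][x] = z
    return shaded


# Three opaque layers covering the same 4x4 tile, drawn front to back.
layers = [0.2, 0.5, 0.8]
fragments = [(x, y, z) for z in layers for y in range(4) for x in range(4)]

print("shaded without early Z:", shade_count(fragments, 4, 4, early_z=False))
print("shaded with early Z:   ", shade_count(fragments, 4, 4, early_z=True))
```

The saving depends on draw order: if the same layers arrive back to front, every fragment passes the depth test and early Z removes nothing.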
CS 354                                                                                                                             13

         GeForce Core and Memory Clock Rates
        [Chart: megahertz (MHz), axis 0 to 1,400]
        Products: Riva ZX, Riva TNT2, GeForce2 GTS, GeForce3, GeForce4 Ti 4600, GeForce FX, GeForce 6800 Ultra, GeForce 7800 GTX
        Series: core clock; memory clock
        DDR memory transition: memory rates double the physical clock rate
CS 354                                                                                                                   14

         GeForce Peak Triangle Setup Trends
        [Bar chart: millions of triangles per second, axis 0 to 300, GeForce2 GTS through GeForce 7800 GTX]
        Assumes 50% face culling
CS 354                                                                                                                 15

         GeForce Peak Texture Fetch Trends
        [Bar chart: millions of texture fetches per second, axis 0 to 12,000, GeForce2 GTS through GeForce 7800 GTX]
        Assuming no texture cache misses

   Product              Texture units
   GeForce2 GTS         2×4
   GeForce3             2×4
   GeForce4 Ti 4600     2×4
   GeForce FX           2×4
   GeForce 6800 Ultra   16
   GeForce 7800 GTX     24
CS 354                                                                                                                             16

         GeForce Peak Depth/Stencil-only Fill
        [Bar chart: millions of depth/stencil pixel updates per second, axis 0 to 18,000, GeForce2 GTS through GeForce 7800 GTX]
        Assuming no read-modify-write
        Annotation: double-speed depth-stencil only

   Product              Raster Op units
   GeForce2 GTS         4
   GeForce3             4
   GeForce4 Ti 4600     4
   GeForce FX           4+4
   GeForce 6800 Ultra   16+16
   GeForce 7800 GTX     16+16
CS 354                                                                                                            17

         GeForce Transistor Count and Semiconductor Process
        [Bar chart: millions of transistors, axis 0 to 450]

   Product              Process (µm)
   Riva ZX              0.35
   Riva TNT2            0.22
   GeForce2 GTS         0.18
   GeForce3             0.18
   GeForce4 Ti 4600     0.15
   GeForce FX           0.13
   GeForce 6800 Ultra   0.13
   GeForce 7800 GTX     0.11
CS 354                                             18
 Hardware Unit                  GeForce FX 5900   GeForce 6800 Ultra   GeForce 7800 GTX
 Vertex                         3                 6                    8
 Fragment / 2nd Texture Fetch   4+4               16                   24
 Raster Color / Raster Depth    4+4               16+16                16+16
CS 354                                                              19

         GeForce 7800 GTX
         Board Details
        [Annotated board photo]
            SLI connector
            Single-slot cooling
            sVideo TV out
            DVI x 2
            16x PCI-Express
            256MB/256-bit DDR3 at 600 MHz, 8 pieces of 8Mx32
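
The board figures above are enough to estimate raw memory bandwidth. The sketch below assumes that "600 MHz" is the physical memory clock and that DDR transfers data twice per clock, as the clock-rate slide notes. The helper name `raw_bandwidth_gb_per_s` is made up for this example. The 4:1 factor is the "typical" compression assumption from the effective-bandwidth slide, applied here to all traffic as a rough upper bound only.

```python
def raw_bandwidth_gb_per_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Raw memory bandwidth in gigabytes per second (1 GB = 1e9 bytes).

    transfers_per_clock=2 models DDR memory (an assumption stated in the lead-in).
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# GeForce 7800 GTX board: 256-bit interface, 600 MHz DDR3.
raw = raw_bandwidth_gb_per_s(bus_width_bits=256, memory_clock_mhz=600)

# Hypothetical upper bound if every access compressed 4:1.
ASSUMED_COMPRESSION_RATIO = 4
effective_upper_bound = raw * ASSUMED_COMPRESSION_RATIO

print(f"raw bandwidth: {raw:.1f} GB/s")
print(f"effective bandwidth at 4:1 (upper bound): {effective_upper_bound:.1f} GB/s")
```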
CS 354                                                         20

         GeForce 7800 GTX
         GPU Details
                  302 million transistors
                  430 MHz core clock
                  256-bit memory interface

                  Notable Functionality
                  • Non-power-of-two textures with mipmaps
                  • Floating-point (fp16) blending and filtering
                  • sRGB color space texture filtering and
                    frame buffer blending
                  • Vertex textures
                  • 16x anisotropic texture filtering
                  • Dynamic vertex and fragment branching
                  • Double-rate depth/stencil-only rendering
                  • Early depth/stencil culling
                  • Transparency antialiasing
CS 354                                                                              21

   GeForce 7800 GTX
   Parallelism
        [Block diagram, top to bottom]
            8 Vertex Engines
            Z-Cull, Triangle Setup/Raster
            Shader Instruction Dispatch
            24 Fragment Shaders
            Fragment Crossbar
            16 Raster Operation Pipelines
            4 Memory Partitions
CS 354                                                             22



         GeForce Graphics Pipeline

        Separate dedicated units
            CPU -> Vertex Engine -> Setup -> Raster -> Fragment Shader -> Raster Ops -> Frame Buffer
            Z Cull sits alongside Raster; Texture sits alongside Fragment Shader
CS 354                                                                    23

         GeForce Graphics Pipeline
         Vertex Engine
            Vertex pulling
            Vector floating-point instructions
            Dynamic branching
            Vertex texture
            Vertex stream frequency


CS 354                                                                   24

         GeForce Graphics Pipeline
         Setup
                     Prepare triangle for
                     rasterization
                     215M triangles/sec setup
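
As a quick sanity check, the setup rate can be related to the core clock. The sketch below assumes the 215M triangles/sec figure applies to the GeForce 7800 GTX at its 430 MHz core clock (from the GPU Details slide). The 60 frames-per-second target is an arbitrary illustration value.

```python
CORE_CLOCK_HZ = 430e6          # GeForce 7800 GTX core clock, from the GPU Details slide
SETUP_RATE_PER_SECOND = 215e6  # peak triangle setup rate, from this slide

# Core clocks spent per triangle at the peak setup rate.
clocks_per_triangle = CORE_CLOCK_HZ / SETUP_RATE_PER_SECOND

# Peak triangle budget per frame at a hypothetical 60 frames per second.
ASSUMED_FRAMES_PER_SECOND = 60
triangles_per_frame = SETUP_RATE_PER_SECOND / ASSUMED_FRAMES_PER_SECOND

print(f"{clocks_per_triangle:.1f} clocks per triangle")
print(f"about {triangles_per_frame / 1e6:.2f} million triangles per frame at 60 Hz")
```

These are peak figures; real workloads rarely sustain them.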




CS 354                                                                   25

         GeForce Graphics Pipeline
         Raster
                             Compute coverage
                             Points, lines, and triangles
                             Rotated grid multisampling




CS 354                                                                     26

         GeForce Graphics Pipeline
         Z Cull


                             Discard fragments early based on Z
                             Up to 64 pixels/clock
                             Multisampled: 256 samples/clock
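
The per-clock figures can be turned into per-second rates. This sketch assumes the 430 MHz core clock from the GPU Details slide applies to the Z-cull unit. The function name `zcull_rate_per_second` is illustrative only.

```python
CORE_CLOCK_HZ = 430e6  # assumed to be the clock the Z-cull unit runs at

PIXELS_PER_CLOCK = 64
SAMPLES_PER_CLOCK = 256


def zcull_rate_per_second(items_per_clock, core_clock_hz=CORE_CLOCK_HZ):
    """Peak number of pixels (or samples) tested per second."""
    return items_per_clock * core_clock_hz


pixel_rate = zcull_rate_per_second(PIXELS_PER_CLOCK)
sample_rate = zcull_rate_per_second(SAMPLES_PER_CLOCK)

# 256 samples over 64 pixels is 4 samples per pixel, consistent with 4x multisampling.
samples_per_pixel = SAMPLES_PER_CLOCK // PIXELS_PER_CLOCK

print(f"{pixel_rate / 1e9:.2f} billion pixels per second")
print(f"{sample_rate / 1e9:.2f} billion samples per second")
print(f"{samples_per_pixel} samples per pixel")
```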

CS 354                                                                   27

         GeForce Graphics Pipeline
         Fragment Shader
                                      User-programmed fragment coloring
                                      Dynamic branching
                                      Long shaders
                                      Multiple render targets
                                      fp16 and fp32 vectors



CS 354                                                                     28

         GeForce Graphics Pipeline
         Texture
                                      fp16 and sRGB filtering
                                      16x anisotropic filtering
                                      Non-power-of-two mipmapping
                                      Shadow maps, cube maps, and 3D
                                      Floating-point textures


CS 354                                                                   29

         GeForce Graphics Pipeline
         Raster Ops
                                      2x and 4x multisampling
                                      fp16 and sRGB blending
                                      Multiple render targets
                                      Color and depth compression
                                      Double-speed depth/stencil only


CS 354                                                                       30

         Single GeForce 7800
         Vertex Unit
        Vertex Processing Engine
            MIMD architecture
            Dual issue
            Low-penalty branching
            Shader Model 3.0
            32 vector registers
            512 static instructions per program
            Indexed input and output registers
        Vertex Texture Fetch
            Non-stalling
            Up to 4 texture units
            Unlimited fetches
            Mipmapping, no filtering
        [Block diagram: Primitive Assembly + Attribute Processing; Vertex Texture Fetch, FP32 Scalar Unit, FP32 Vector Unit; Texture Cache; Branch Unit; Viewport Processing; output to Setup]
CS 354                                                           31



          Vertex Texturing Example
        [Diagram: flat tessellated mesh + height field texture -> vertex program -> displaced mesh]
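
The data flow in this example can be sketched on the CPU. The code below is a minimal Python stand-in for a vertex program that reads a height field and displaces a flat mesh. The names `fetch_height`, `displace_mesh`, and `flat_grid` are hypothetical, and displacing along +Y is an assumption. The nearest-texel lookup mirrors the "no filtering" limit listed for vertex texture fetch, but this is not the hardware's exact addressing.

```python
def fetch_height(height_field, u, v):
    """Nearest-texel fetch from a 2D height field; (u, v) are in [0, 1]."""
    rows = len(height_field)
    cols = len(height_field[0])
    col = min(int(u * cols), cols - 1)  # clamp u == 1.0 to the last texel
    row = min(int(v * rows), rows - 1)
    return height_field[row][col]


def displace_mesh(vertices, height_field, scale=1.0):
    """Displace each (x, y, z, u, v) vertex along +Y by the fetched height."""
    displaced = []
    for x, y, z, u, v in vertices:
        h = fetch_height(height_field, u, v)
        displaced.append((x, y + scale * h, z))
    return displaced


def flat_grid(n):
    """Flat tessellated mesh: (n + 1) x (n + 1) vertices on the unit square (y = 0)."""
    return [
        (i / n, 0.0, j / n, i / n, j / n)
        for j in range(n + 1)
        for i in range(n + 1)
    ]


height_field = [
    [0.0, 1.0],
    [2.0, 3.0],
]
for position in displace_mesh(flat_grid(2), height_field, scale=0.5):
    print(position)
```

On the GPU the loop body would run once per vertex inside the vertex program, with the height field bound as a vertex texture.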
CS 354                                                                                      32

         Vertex Textures for Dynamic
         Displacement Mapping




         Without Vertex Textures                        With Vertex Textures

  Images used with permission from Pacific Fighters. © 2004 Developed by 1C:Maddox Games.
                      All rights reserved. © 2004 Ubi Soft Entertainment.
CS 354                              33

         Vertex Textures to Drive
         Particle Systems
        Render-to-texture
            Simulation runs
             in floating-point
             frame buffer, also
             usable as texture
        Vertex textures
            Determines particle
             location with
             vertex texture
             fetch
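
The two-step scheme above can be sketched in Python, with a plain list standing in for the floating-point frame buffer. The names `simulation_pass` and `vertex_stage`, the gravity-only forces, and the simple Euler-style integration step are assumptions for illustration, not details from the slide.

```python
def simulation_pass(state, dt, gravity=-9.8):
    """One render-to-texture pass: read the old state and write a new one.

    Each entry stands in for a texel holding one particle's
    (position, velocity) as floating-point triples.
    """
    new_state = []
    for (px, py, pz), (vx, vy, vz) in state:
        vy_new = vy + gravity * dt
        new_position = (px + vx * dt, py + vy_new * dt, pz + vz * dt)
        new_state.append((new_position, (vx, vy_new, vz)))
    return new_state


def vertex_stage(state, particle_index):
    """Stand-in for a vertex texture fetch: look up a particle's position."""
    position, _velocity = state[particle_index]
    return position


# Two particles launched from the origin.
state = [
    ((0.0, 0.0, 0.0), (1.0, 2.0, 0.0)),
    ((0.0, 0.0, 0.0), (-1.0, 3.0, 0.5)),
]
for _ in range(3):
    state = simulation_pass(state, dt=0.1)  # the output becomes the next input

for index in range(len(state)):
    print(index, vertex_stage(state, index))
```

Returning a fresh list each pass mirrors the usual practice of rendering into a second buffer and swapping, so a pass never reads what it is writing.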
CS 354                                                                              34

          Single GeForce 7800
          Fragment Shader Pipeline
        Texture Processor
            16 texture units
            1 texture fetch at full speed
            Bilinear or tri-linear filtering
            16x anisotropic filtering
            Floating-point (fp16) texture filtering
        Shader Unit 1
            4 MULs + RCP
            Dual issue
            Texture address calculation
            Fast fp16 normalize
            Free: negate, abs, condition codes
        Shader Unit 2
            4 MADs or DP4
            Dual issue
            Free: negate, abs, condition codes
        [Block diagram: Texture Data and Input Fragment Data in; FP32 Shader Unit 1 with Texture Processor and Texture Cache; FP32 Shader Unit 2; Branch Processor; Fixed-function Fog Unit; Output Shaded Fragments]
CS 354                                                               35

         Operations Per Fragment
         Shader Pass
        Shader Unit 1
          4 components, 1 op / component
          4 ops / fragment per pass
          or 1 texture / fragment at full speed per pass
        Shader Unit 2
          4 components, 1 op / component
          4 ops / fragment per pass
        Total: 8 operations / fragment per pass
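The per-pass arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the constant and function names are illustrative, and treating the slide's "or" as exclusive (Shader Unit 1 does either math or a texture fetch in a pass) is an assumption.

```python
# Peak operations per fragment per pass for one GeForce 7800 fragment
# pipeline, using only the figures quoted on the slide.
# All names here are illustrative, not from the slides.
COMPONENTS = 4          # R, G, B, A
OPS_PER_COMPONENT = 1   # one operation per component per shader unit


def ops_per_fragment_per_pass(unit1_fetches_texture=False):
    """Return the peak component operations per fragment per pass.

    Assumption: when Shader Unit 1 spends the pass on a texture fetch,
    it contributes no arithmetic operations.
    """
    unit1 = 0 if unit1_fetches_texture else COMPONENTS * OPS_PER_COMPONENT
    unit2 = COMPONENTS * OPS_PER_COMPONENT
    return unit1 + unit2


if __name__ == "__main__":
    print(ops_per_fragment_per_pass())                            # both units doing math
    print(ops_per_fragment_per_pass(unit1_fetches_texture=True))  # unit 1 fetching a texel
```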

         Fragment Shader
         Component Co-issue
        Use 4 components various ways
          RGBA all together
          RGB and A
          RG and BA
        Both shader units
        Two operations per shader unit
          Shader Unit 1: Operation 1 and Operation 2 share R, G, B, A
          Shader Unit 2: Operation 3 and Operation 4 share R, G, B, A
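A small sketch of the co-issue rule, assuming the three splits listed on the slide are the only legal pairings (the slide does not say whether other splits exist). The names `ALLOWED_SPLITS` and `can_coissue` are hypothetical.

```python
# Component splits listed on the slide. Treating them as the complete set of
# legal co-issue pairings is an assumption of this sketch.
ALLOWED_SPLITS = [("RGBA",), ("RGB", "A"), ("RG", "BA")]

SHADER_UNITS = 2
OPS_PER_UNIT = 2  # "Two operations per shader unit"


def can_coissue(mask_a, mask_b):
    """True if two ops writing mask_a and mask_b can share one shader unit."""
    pair = {mask_a, mask_b}
    return any(len(split) == 2 and set(split) == pair for split in ALLOWED_SPLITS)


def max_coissued_ops():
    """Upper bound on operations issued per pass across both shader units."""
    return SHADER_UNITS * OPS_PER_UNIT


if __name__ == "__main__":
    print(can_coissue("RGB", "A"))   # a 3 + 1 split
    print(can_coissue("RGB", "BA"))  # overlapping masks
    print(max_coissued_ops())
```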

         Single GeForce 7800
         Raster Operations Pipeline
        Data flow: input shaded fragment data, pixel crossbar interconnect, multisample antialiasing, depth and color compression, depth and color raster operations, frame buffer partition, memory
        Functionality
          OpenEXR floating-point blending
          sRGB blending
          4x rotated grid multisampling
          Lossless color and depth compression
          Multiple render targets

         GeForce 7800
         Transparency Antialiasing




           Conventional 4x antialiasing   Transparency antialiasing
           with alpha tested context      with alpha tested context



         Scalable Link Interface (SLI)

        Gang two GeForce 6600, 6800, or 7800
         graphics boards together
            Can almost double your performance

            Pictured: two 6800 Ultras with SLI connector



         SLI Rendering Modes
        Split Frame Rendering (SFR)
          One GPU renders top of screen; other renders the bottom
          Scales fragment processing but not vertex processing
        Alternate Frame Rendering (AFR)
          Scales both vertex and fragment processing
          Adds frame latency
          Rendering must be free of CPU synchronization
        SLI Antialiasing: SLI8x and SLI16x
          Better antialiasing quality rather than performance
          Each card renders with slightly different sub-pixel offset
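The two work-splitting modes can be sketched as follows. This is a simplified illustration, not driver behavior: the even scanline split in `sfr_scanline_ranges` is an assumption (a real driver may balance the split by load), and both function names are hypothetical.

```python
def afr_gpu_for_frame(frame_index, num_gpus=2):
    """Alternate Frame Rendering: whole frames rotate round-robin across GPUs."""
    return frame_index % num_gpus


def sfr_scanline_ranges(height, num_gpus=2):
    """Split Frame Rendering: each GPU gets a contiguous band of scanlines.

    Returns half-open (first_row, end_row) ranges from top to bottom.
    An even split is assumed here.
    """
    base, extra = divmod(height, num_gpus)
    ranges, start = [], 0
    for gpu in range(num_gpus):
        rows = base + (1 if gpu < extra else 0)
        ranges.append((start, start + rows))
        start += rows
    return ranges


if __name__ == "__main__":
    print([afr_gpu_for_frame(i) for i in range(4)])
    print(sfr_scanline_ranges(1080))
```

In SFR both GPUs still receive every vertex, which is why it scales fragment work only; in AFR each GPU owns a complete frame, so vertex work scales too at the cost of an extra frame in flight.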

     PC Graphics Hardware Evolution
     Viable economics: 650 million GeForce GPUs since 1999
     1,000x complexity since 1995
     Moore’s Law at work
       RIVA 128: 3M transistors
       GeForce 256: 23M transistors
       GeForce 3: 60M transistors
       GeForce FX: 125M transistors
       GeForce 8800: 681M transistors
       GeForce 580 GTX: 3B transistors
     Chart time axis: 1997 to 2010
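As a rough check on the "Moore's Law at work" claim, the sketch below computes the growth factor and an implied doubling time from the transistor counts above. The 13-year span is an assumption that pairs the first and last chips with the chart's 1997 and 2010 axis endpoints; the function names are illustrative.

```python
import math

# Transistor counts from the slide, in millions.
TRANSISTORS_M = {
    "RIVA 128": 3,
    "GeForce 256": 23,
    "GeForce 3": 60,
    "GeForce FX": 125,
    "GeForce 8800": 681,
    "GeForce 580 GTX": 3000,
}


def growth_factor(first, last):
    """Ratio of transistor counts between two chips."""
    return TRANSISTORS_M[last] / TRANSISTORS_M[first]


def doubling_time_years(first, last, years_elapsed):
    """Average years per doubling over the given span."""
    doublings = math.log2(growth_factor(first, last))
    return years_elapsed / doublings


if __name__ == "__main__":
    print(growth_factor("RIVA 128", "GeForce 580 GTX"))
    # Assumed span: 1997 to 2010, read off the chart axis.
    print(round(doubling_time_years("RIVA 128", "GeForce 580 GTX", 13), 2))
```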



             Current High-end “Fermi” GPU
            Current high-end graphics card
            512 graphics “cores”
            1.5 GB memory
            System power: 600W
            OpenGL 4.2 / DirectX 11 functionality



         High-level “Fermi” Architecture
        GF100
        Four Graphics Processor Clusters (GPCs)
          Each is a self-contained graphics pipeline
          Smaller chips have fewer GPCs
        Shared L2 cache
        6 memory controllers
          1.5 GB

         Inside Each
         Graphics Processing Cluster
        Raster engine
        Four SMs
          Streaming Multiprocessor
        Texture fetch resources
        Tessellation and vertex processing resources
          Polymorph Engine

         Streaming
         Multiprocessor (SM)
        Multi-processor execution unit
          32 scalar processor cores
          Warp is a unit of thread execution of up to 32 threads
        Two workloads
            Graphics
                 Vertex shader
                 Tessellation
                 Geometry shader
                 Fragment shader
            Compute
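The 512 "cores" quoted for the high-end Fermi card follow from the hierarchy on the last three slides. A minimal sketch of that arithmetic, with illustrative names; the two-GPC example is a hypothetical smaller chip, not a specific product.

```python
GPCS_PER_CHIP = 4   # "Four Graphics Processor Clusters"
SMS_PER_GPC = 4     # "Four SMs"
CORES_PER_SM = 32   # "32 scalar processor cores"


def total_cores(gpcs=GPCS_PER_CHIP, sms_per_gpc=SMS_PER_GPC,
                cores_per_sm=CORES_PER_SM):
    """Scalar cores on a chip built from the given number of GPCs."""
    return gpcs * sms_per_gpc * cores_per_sm


if __name__ == "__main__":
    print(total_cores())        # full GF100 configuration
    print(total_cores(gpcs=2))  # hypothetical smaller chip with fewer GPCs
```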

          OpenGL Pipeline Programmable
          Domains run on Unified Hardware
            Unified Streaming Processor Array (SPA) architecture
             means same capabilities for all domains
                Plus tessellation + compute (not shown in the diagram)
            Fixed-function stages: GPU Front End, Vertex Assembly, Primitive Assembly, Clipping, Setup, and Rasterization, Raster Operations
            Programmable domains (can be unified hardware!): Vertex Program, Primitive Program, Fragment Program
            Memory access: Attribute Fetch, Parameter Buffer Read, Texture Fetch, Framebuffer Access, all through the Memory Interface



         Dual Warp Scheduling


32 threads launch!
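A back-of-the-envelope sketch of warp counts and dual issue. It assumes an idealized model in which each of the two schedulers issues one instruction from one ready warp per cycle and ignores dependencies and latency; the names are illustrative.

```python
WARP_SIZE = 32         # threads per warp, from the SM slide
SCHEDULERS_PER_SM = 2  # "dual" warp scheduling


def warps_needed(num_threads, warp_size=WARP_SIZE):
    """Number of warps required to cover num_threads (ceiling division)."""
    return (num_threads + warp_size - 1) // warp_size


def issue_cycles(num_warps, schedulers=SCHEDULERS_PER_SM):
    """Cycles to issue one instruction from every warp in the idealized model."""
    return (num_warps + schedulers - 1) // schedulers


if __name__ == "__main__":
    warps = warps_needed(256)
    print(warps, issue_cycles(warps))
```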

         Shader or CUDA Core,
         Same Unit but Two Personalities
        Execution unit
          Scalar floating-point
          Scalar integer



         Levels of Caching in Fermi GPU
        12 KB L1 texture cache
          Per texture unit
        64 KB cache per SM
          Split into dedicated 16 KB or 48 KB load/store cache
          Shared memory 48 KB or 16 KB
        L2 unifies texture cache, raster operation cache, and internal buffering in prior generation
          768 KB
          Read / write
          Fully coherent
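The per-SM split can be written down as a tiny lookup. This sketch assumes the two configurations named on the slide are the only ones; `shared_memory_kb` is a hypothetical helper, not an API call.

```python
SM_CACHE_KB = 64
# (load/store cache KB, shared memory KB); assumed to be the only two options.
VALID_SPLITS_KB = [(16, 48), (48, 16)]


def shared_memory_kb(l1_kb):
    """Shared memory left over once the load/store cache size is chosen."""
    for l1, shared in VALID_SPLITS_KB:
        if l1 == l1_kb:
            return shared
    raise ValueError(f"unsupported load/store cache size: {l1_kb} KB")


if __name__ == "__main__":
    for l1, _ in VALID_SPLITS_KB:
        print(l1, shared_memory_kb(l1))
```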

         Cache Use Strategies
         in Fermi GPU
        Pipeline stages can communicate efficiently through
         GPU’s L1 and L2 caches
          Buffering between stages stays all on chip
          Only vertex, texel, and pixel read/writes need to go to DRAM

         Vertex and Tessellation
         Processing Tasks
        Fixed-function graphics engines
            Pull attributes and assemble vertex
            Manage tessellation control and domain shader evaluation
            Viewport transform
            Attribute setup of plane equations for rasterization
            Stream out vertices into buffers



         Rasterization Tasks
        Turns primitives into fragments
            Computes edge equations
            Two-stage rasterization
                 Coarse raster finds tiles the primitive could be in
                 Fine raster evaluates sample positions within tiles
            Zcull efficiently eliminates occluded fragments
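The edge-equation and two-stage ideas above can be sketched in software. This is a teaching sketch under stated assumptions, not how the hardware is built: the coarse stage here uses only the triangle's bounding box, one sample per pixel center stands in for the sample positions, no fill rule is applied to shared edges, and Zcull is not modeled. The function names and the tile size are illustrative.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge equation: positive when P lies to the left of the edge A to B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covers(tri, px, py):
    """True if sample (px, py) is inside or on the triangle, either winding."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    e0 = edge(x0, y0, x1, y1, px, py)
    e1 = edge(x1, y1, x2, y2, px, py)
    e2 = edge(x2, y2, x0, y0, px, py)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)


def rasterize(tri, width, height, tile=8):
    """Return the set of (x, y) pixels whose centers the triangle covers."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    pixels = set()
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Coarse raster: conservatively skip tiles the primitive cannot touch.
            if tx > max_x or tx + tile < min_x or ty > max_y or ty + tile < min_y:
                continue
            # Fine raster: evaluate the edge equations at each pixel center.
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    if covers(tri, x + 0.5, y + 0.5):
                        pixels.add((x, y))
    return pixels


if __name__ == "__main__":
    triangle = [(1.0, 1.0), (14.0, 3.0), (5.0, 13.0)]
    print(len(rasterize(triangle, 16, 16, tile=4)))
```

The coarse test only ever rejects tiles that contain no covered pixel centers, so the result matches a brute-force test of every pixel while skipping most of the empty area.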



         Base Input Mesh




         From Metro 2033, © THQ and 4A Games



         Apply Phong Tessellation




             From Metro 2033, © THQ and 4A Games



         Apply Displacement Mapping




         From Metro 2033, © THQ and 4A Games



          GPUs as Compute Nodes
         Architecture of GPU has evolved into a high-performance, high-bandwidth compute node
           Small form factor compute
           Integrated CPU-GPU servers & blades
           OEM CPU server + compute 1U
           Workstations with 2 to 4 Tesla GPUs



         Compute Programming Model
        Cooperative Thread Array (CTA)
          Single Program, Multiple Data
          Organized in 1D, 2D, or 3D
        Programming APIs
            CUDA, OpenCL, DirectCompute
                APIs + language = parallel processing system
            OpenGL or Direct3D through shaders
                Cg, HLSL, GLSL
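To make the CTA idea concrete, the sketch below runs one "kernel" function once per thread index of a 1D, 2D, or 3D array, which is the single-program, multiple-data pattern. It executes sequentially in plain Python purely for illustration; `thread_ids`, `linear_id`, and `run_cta` are hypothetical names, and the x-fastest flattening follows the convention CUDA uses for thread layout.

```python
from itertools import product


def thread_ids(dim_x, dim_y=1, dim_z=1):
    """Yield (x, y, z) for every thread in a CTA of the given shape, x fastest."""
    for z, y, x in product(range(dim_z), range(dim_y), range(dim_x)):
        yield (x, y, z)


def linear_id(tid, dim_x, dim_y=1):
    """Flatten a 3D thread index into a single linear index, x fastest."""
    x, y, z = tid
    return x + dim_x * (y + dim_y * z)


def run_cta(kernel, data, dim_x, dim_y=1, dim_z=1):
    """Apply the same kernel once per thread; each thread handles one element."""
    out = list(data)
    for tid in thread_ids(dim_x, dim_y, dim_z):
        i = linear_id(tid, dim_x, dim_y)
        out[i] = kernel(data[i])
    return out


if __name__ == "__main__":
    # A 4 x 2 CTA doubling eight values.
    print(run_cta(lambda v: v * 2, list(range(8)), dim_x=4, dim_y=2))
```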




         Now in World’s Fastest Supercomputers
        Tianhe-1A
          2.507 Petaflop
          7168 Tesla M2050 GPUs
          National Supercomputing Center in Tianjin

         Opposite direction:
         Consumer mobile devices

             Low-power Mobile
             System on a Chip (SoC)
         Complete system on a chip
          4 ARM cores
          Integrated graphics
                OpenGL ES 2.0
            Power <1W



         Mid-term Next Class
        Mid-term
            Similar in format to the homeworks
            15% of your final grade
            Arrive on time
            Open textbook. Open notes, including lecture slides.
            Calculators allowed/encouraged.
            No smart phones, no computers, no Internet access.
            Show your work to justify your answer and provide a basis for partial
             credit.
        What to study
          All material in lecture slides
          Review in-class quiz questions
          Study homeworks
          Responsible for textbook readings
        Have a relaxing spring break
          Next lecture: Shadows
          Come back to Project 2

Contenu connexe

Tendances

Terrain in Battlefield 3: A Modern, Complete and Scalable System
Terrain in Battlefield 3: A Modern, Complete and Scalable SystemTerrain in Battlefield 3: A Modern, Complete and Scalable System
Terrain in Battlefield 3: A Modern, Complete and Scalable SystemElectronic Arts / DICE
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
Star Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingStar Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingumsl snfrzb
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The SurgePhilip Hammer
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringElectronic Arts / DICE
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Johan Andersson
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Tiago Sousa
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)Philip Hammer
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteElectronic Arts / DICE
 
Past, Present and Future Challenges of Global Illumination in Games
Past, Present and Future Challenges of Global Illumination in GamesPast, Present and Future Challenges of Global Illumination in Games
Past, Present and Future Challenges of Global Illumination in GamesColin Barré-Brisebois
 
Deferred rendering in Dying Light
Deferred rendering in Dying LightDeferred rendering in Dying Light
Deferred rendering in Dying LightMaciej Jamrozik
 
Hair animation by vertex shader
Hair animation by vertex shaderHair animation by vertex shader
Hair animation by vertex shader동석 김
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
 
Forward+ (EUROGRAPHICS 2012)
Forward+ (EUROGRAPHICS 2012)Forward+ (EUROGRAPHICS 2012)
Forward+ (EUROGRAPHICS 2012)Takahiro Harada
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3guest11b095
 
Multiprocessor Game Loops: Lessons from Uncharted 2: Among Thieves
Multiprocessor Game Loops: Lessons from Uncharted 2: Among ThievesMultiprocessor Game Loops: Lessons from Uncharted 2: Among Thieves
Multiprocessor Game Loops: Lessons from Uncharted 2: Among ThievesNaughty Dog
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3Electronic Arts / DICE
 

Tendances (20)

Terrain in Battlefield 3: A Modern, Complete and Scalable System
Terrain in Battlefield 3: A Modern, Complete and Scalable SystemTerrain in Battlefield 3: A Modern, Complete and Scalable System
Terrain in Battlefield 3: A Modern, Complete and Scalable System
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
Star Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingStar Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processing
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The Surge
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
Past, Present and Future Challenges of Global Illumination in Games
Past, Present and Future Challenges of Global Illumination in GamesPast, Present and Future Challenges of Global Illumination in Games
Past, Present and Future Challenges of Global Illumination in Games
 
Hair in Tomb Raider
Hair in Tomb RaiderHair in Tomb Raider
Hair in Tomb Raider
 
Deferred rendering in Dying Light
Deferred rendering in Dying LightDeferred rendering in Dying Light
Deferred rendering in Dying Light
 
Hair animation by vertex shader
Hair animation by vertex shaderHair animation by vertex shader
Hair animation by vertex shader
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
Forward+ (EUROGRAPHICS 2012)
Forward+ (EUROGRAPHICS 2012)Forward+ (EUROGRAPHICS 2012)
Forward+ (EUROGRAPHICS 2012)
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
 
Multiprocessor Game Loops: Lessons from Uncharted 2: Among Thieves
Multiprocessor Game Loops: Lessons from Uncharted 2: Among ThievesMultiprocessor Game Loops: Lessons from Uncharted 2: Among Thieves
Multiprocessor Game Loops: Lessons from Uncharted 2: Among Thieves
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
 

En vedette

CS 354 Introduction
CS 354 IntroductionCS 354 Introduction
CS 354 IntroductionMark Kilgard
 
Robot In OpenGL Using Line Function
Robot In OpenGL Using Line Function Robot In OpenGL Using Line Function
Robot In OpenGL Using Line Function Jannat Jamshed
 
Programming using opengl in visual c++
Programming   using opengl in visual c++Programming   using opengl in visual c++
Programming using opengl in visual c++Ta Nam
 
Creating a game using C++, OpenGL and Qt
Creating a game using C++, OpenGL and QtCreating a game using C++, OpenGL and Qt
Creating a game using C++, OpenGL and Qtguestd5d4ce
 
CS 354 Global Illumination
CS 354 Global IlluminationCS 354 Global Illumination
CS 354 Global IlluminationMark Kilgard
 
CS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingCS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingMark Kilgard
 
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...AMD Developer Central
 
General Programming on the GPU - Confoo
General Programming on the GPU - ConfooGeneral Programming on the GPU - Confoo
General Programming on the GPU - ConfooSirKetchup
 
CSTalks - GPGPU - 19 Jan
CSTalks  -  GPGPU - 19 JanCSTalks  -  GPGPU - 19 Jan
CSTalks - GPGPU - 19 Jancstalks
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeOfer Rosenberg
 
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...npinto
 
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...Storti Mario
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLinaro
 
Open CL For Haifa Linux Club
Open CL For Haifa Linux ClubOpen CL For Haifa Linux Club
Open CL For Haifa Linux ClubOfer Rosenberg
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteNVIDIA
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Rob Gillen
 

En vedette (20)

CS 354 Introduction
CS 354 IntroductionCS 354 Introduction
CS 354 Introduction
 
Robot In OpenGL Using Line Function
Robot In OpenGL Using Line Function Robot In OpenGL Using Line Function
Robot In OpenGL Using Line Function
 
Programming using opengl in visual c++
Programming   using opengl in visual c++Programming   using opengl in visual c++
Programming using opengl in visual c++
 
Robot by gulnaz
Robot by gulnazRobot by gulnaz
Robot by gulnaz
 
Creating a game using C++, OpenGL and Qt
Creating a game using C++, OpenGL and QtCreating a game using C++, OpenGL and Qt
Creating a game using C++, OpenGL and Qt
 
CS 354 Global Illumination
CS 354 Global IlluminationCS 354 Global Illumination
CS 354 Global Illumination
 
CS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingCS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and Culling
 
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
 
General Programming on the GPU - Confoo
General Programming on the GPU - ConfooGeneral Programming on the GPU - Confoo
General Programming on the GPU - Confoo
 
Gpgpu intro
Gpgpu introGpgpu intro
Gpgpu intro
 
CSTalks - GPGPU - 19 Jan
CSTalks  -  GPGPU - 19 JanCSTalks  -  GPGPU - 19 Jan
CSTalks - GPGPU - 19 Jan
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universe
 
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
[Harvard CS264] 06 - CUDA Ninja Tricks: GPU Scripting, Meta-programming & Aut...
 
Cliff sugerman
Cliff sugermanCliff sugerman
Cliff sugerman
 
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
 
Gpgpu
GpgpuGpgpu
Gpgpu
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience Report
 
Open CL For Haifa Linux Club
Open CL For Haifa Linux ClubOpen CL For Haifa Linux Club
Open CL For Haifa Linux Club
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 Keynote
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)
 

Similaire à CS 354 GPU Architecture

SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012Mark Kilgard
 
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and BeyondSIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and BeyondMark Kilgard
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyMark Kilgard
 
CS 354 Ray Casting & Tracing
CS 354 Ray Casting & TracingCS 354 Ray Casting & Tracing
CS 354 Ray Casting & TracingMark Kilgard
 
OpenGL 3.2 and More
OpenGL 3.2 and MoreOpenGL 3.2 and More
OpenGL 3.2 and MoreMark Kilgard
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityMark Kilgard
 
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...Persistent Systems Ltd.
 
The next generation of GPU APIs for Game Engines
The next generation of GPU APIs for Game EnginesThe next generation of GPU APIs for Game Engines
The next generation of GPU APIs for Game EnginesPooya Eimandar
 
Hardware Shaders
Hardware ShadersHardware Shaders
Hardware Shadersgueste52f1b
 
CS 354 Project 2 and Compression
CS 354 Project 2 and CompressionCS 354 Project 2 and Compression
CS 354 Project 2 and CompressionMark Kilgard
 
2D Games to HPC
2D Games to HPC2D Games to HPC
2D Games to HPCDVClub
 
GeForce 8800 OpenGL Extensions
GeForce 8800 OpenGL ExtensionsGeForce 8800 OpenGL Extensions
GeForce 8800 OpenGL Extensionsicastano
 
OpenGL Shading Language
OpenGL Shading LanguageOpenGL Shading Language
OpenGL Shading LanguageJungsoo Nam
 
Improving Shadows and Reflections via the Stencil Buffer
Improving Shadows and Reflections via the Stencil BufferImproving Shadows and Reflections via the Stencil Buffer
Improving Shadows and Reflections via the Stencil BufferMark Kilgard
 
Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360VIKAS SINGH BHADOURIA
 
Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Johan Andersson
 

Similaire à CS 354 GPU Architecture (20)

SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012SIGGRAPH 2012: NVIDIA OpenGL for 2012
CS 354 GPU Architecture

  • 1. CS 354 GPU Architecture Mark Kilgard University of Texas March 6, 2012
  • 2. Today’s material: in-class quiz. Lecture topic: architecture of Graphics Processing Units (GPUs). Course work: Homework #4 due today; review textbook reading (Chapters 5, 6, and 7); Project #2 on texturing, shading, & lighting is coming; remember: midterm in class on March 8.
  • 3. My Office Hours: Tuesday, before class (Painter (PAI) 5.35, 8:45 a.m. to 9:15); Thursday, after class (ACE 6.302, 11:00 a.m. to 12). Randy’s office hours: Monday & Wednesday, 11 a.m. to 12:00, Painter (PAI) 5.33.
  • 4. Last time, this time. Last lecture, we discussed programmable shading and graphics hardware shading languages. This lecture: How do GPUs work?
  • 5. Daily Quiz. On a sheet of paper: write your EID, name, and date; write #1, #2, #3, #4 followed by its answer. Pick the best choice: Shade trees are a) fractal trees with shadows, b) OpenGL commands, c) hierarchical arrangements of shading computations, d) fractal patterns of all sorts. Multiple choice: The GLSL standard has built-in data types for a) vectors, b) matrices, c) texture samplers, d) floating-point values, e) pointers to malloc’ed memory, f) a through e, g) a through d. Name one general purpose programming language that GLSL borrows from.
  • 6. Key Trend in OpenGL Evolution. [Diagram: progression from simple configurability to complex configurability to shaders in high-level languages, along an axis from fixed-function to programmable.] Direct3D follows the same trend. Also reflects trend in GPU architecture: API and hardware co-evolving.
  • 7. Programming Shaders inside GPU. Multiple programmable domains within the GPU; can be programmed in high-level languages: Cg, HLSL, or OpenGL Shading Language (GLSL). [Diagram, OpenGL 3.3: 3D application or game, OpenGL API, CPU-GPU boundary, then GPU front end, vertex assembly, primitive assembly, clipping/setup/rasterization, and raster operations, with vertex, geometry, and fragment shader programs; attribute fetch, parameter buffer read, texture fetch, and framebuffer access go through the memory interface. Legend: programmable vs. fixed-function.]
  • 8. Complex OpenGL Data Flow
  • 9. Six Years of GPU Architecture
      Year | Product | New features | OpenGL version | Direct3D version
      2000 | GeForce 256 | Hardware transform & lighting, configurable fixed-point shading, cube maps, texture compression, anisotropic texture filtering | 1.3 | DX7
      2001 | GeForce3 | Programmable vertex transformation, 4 texture units, dependent textures, 3D textures, shadow maps, multisampling, occlusion queries | 1.4 | DX8
      2002 | GeForce4 Ti 4600 | Early Z culling, dual-monitor | 1.4 | DX8.1
      2003 | GeForce FX | Vertex program branching, floating-point fragment programs, 16 texture units, limited floating-point textures, color and depth compression | 1.5 | DX9
      2004 | GeForce 6800 Ultra | Vertex textures, structured fragment branching, non-power-of-two textures, generalized floating-point textures, floating-point texture filtering and blending | 2.0 | DX9c
      2005 | GeForce 7800 GTX | Transparency antialiasing | 2.0 | DX9c
  • 10. GeForce Peak Vertex Processing Trends. [Chart: millions of vertices per second (axis 0 to 1,400) by GPU. Vertex units: GeForce2 GTS 1, GeForce3 1, GeForce4 Ti 4600 2, GeForce FX 3, GeForce 6800 Ultra 6, GeForce 7800 GTX 8.] Note: the rate for a trivial 4x4 vertex transform exceeds peak setup rates, which allows excess vertex processing.
  • 11. GeForce Peak Memory Bandwidth Trends. [Chart: gigabytes per second (axis 0 to 200) for GeForce2 GTS, GeForce3, GeForce4 Ti 4600, GeForce FX, GeForce 6800 Ultra, and GeForce 7800 GTX. Series: raw bandwidth and effective raw bandwidth with compression, each with an exponential trend line. Annotations: 128-bit interface, 256-bit interface.]
  • 12. Effective GPU Memory Bandwidth. Compression schemes: lossless depth and color (when multisampling) compression; lossy texture compression (S3TC / DXTC), typically assumes 4:1 compression. Avoiding useless work: early killing of fragments (Z cull); avoiding useless blending and texture fetches. Very clever memory controller designs: combining memory accesses for improved coherency; caches for texture fetches.
  • 13. GeForce Core and Memory Clock Rates. [Chart: core clock and memory clock in megahertz (MHz, axis 0 to 1,400), from Riva ZX through GeForce 7800 GTX.] Annotation: DDR memory transition, memory rates double the physical clock rate.
  • 14. GeForce Peak Triangle Setup Trends. [Chart: millions of triangles per second (axis 0 to 300) for GeForce2 GTS, GeForce3, GeForce4 Ti 4600, GeForce FX, GeForce 6800 Ultra, and GeForce 7800 GTX; assumes 50% face culling.]
  • 15. GeForce Peak Texture Fetch Trends. [Chart: millions of texture fetches per second (axis 0 to 12,000), assuming no texture cache misses. Texture units: GeForce2 GTS 2×4, GeForce3 2×4, GeForce4 Ti 4600 2×4, GeForce FX 2×4, GeForce 6800 Ultra 16, GeForce 7800 GTX 24.]
  • 16. GeForce Peak Depth/Stencil-only Fill. [Chart: millions of depth/stencil pixel updates per second (axis 0 to 18,000). Annotations: assuming no read-modify-write; double speed depth-stencil only. Raster Op units: GeForce2 GTS 4, GeForce3 4, GeForce4 Ti 4600 4, GeForce FX 4+4, GeForce 6800 Ultra 16+16, GeForce 7800 GTX 16+16.]
  • 17. GeForce Transistor Count and Semiconductor Process. [Chart: millions of transistors (axis 0 to 450). Process (µm): Riva ZX 0.35, Riva TNT2 0.22, GeForce2 GTS 0.18, GeForce3 0.18, GeForce4 Ti 4600 0.15, GeForce FX 0.13, GeForce 6800 Ultra 0.13, GeForce 7800 GTX 0.11.]
  • 18. Hardware Unit | GeForce FX 5900 | GeForce 6800 Ultra | GeForce 7800 GTX
      Vertex | 3 | 6 | 8
      Fragment (+ 2nd Texture Fetch) | 4+4 | 16 | 24
      Raster Color + Raster Depth | 4+4 | 16+16 | 16+16
  • 19. GeForce 7800 GTX Board Details: SLI connector; single slot cooling; sVideo TV out; DVI x 2; 256MB/256-bit DDR3 at 600 MHz (8 pieces of 8Mx32); 16x PCI-Express.
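The board figures on slide 19 (256-bit interface, 600 MHz DDR memory) are enough for a rough peak-bandwidth estimate. The sketch below assumes two transfers per clock for DDR memory, and it applies the 4:1 ratio from slide 12 to every access, which is a best-case assumption rather than a measured figure. The function name is illustrative only.

```python
def raw_bandwidth_gb_per_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak raw memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * 1e6 * transfers_per_clock / 1e9

# Figures from the board-details slide: 256-bit interface, 600 MHz DDR.
raw_gb_s = raw_bandwidth_gb_per_s(256, 600)

# Assumption: every access benefits from 4:1 compression (a best case only).
ASSUMED_COMPRESSION_RATIO = 4
effective_gb_s = raw_gb_s * ASSUMED_COMPRESSION_RATIO

print(f"raw: {raw_gb_s:.1f} GB/s, best-case effective: {effective_gb_s:.1f} GB/s")
```

Real workloads mix compressed and uncompressed traffic, so the effective figure should be read as an upper bound.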
  • 20. GeForce 7800 GTX GPU Details: 302 million transistors; 430 MHz core clock; 256-bit memory interface. Notable functionality: non-power-of-two textures with mipmaps; floating-point (fp16) blending and filtering; sRGB color space texture filtering and frame buffer blending; vertex textures; 16x anisotropic texture filtering; dynamic vertex and fragment branching; double-rate depth/stencil-only rendering; early depth/stencil culling; transparency antialiasing.
  • 21. GeForce 7800 GTX Parallelism: 8 vertex engines; triangle setup/raster with Z-cull; shader instruction dispatch; 24 fragment shaders; fragment crossbar; 16 raster operation pipelines; 4 memory partitions.
  • 22. GeForce Graphics Pipeline: separate dedicated units. [Diagram: CPU, Vertex Engine, Setup, Raster, Fragment Shader, Raster Ops, and Frame Buffer in sequence, with Z Cull and Texture as side units.]
  • 23. GeForce Graphics Pipeline, Vertex Engine: vertex pulling; vector floating-point instructions; dynamic branching; vertex texture; vertex stream frequency.
  • 24. GeForce Graphics Pipeline, Setup: prepare triangle for rasterization; 215M triangles/sec setup.
  • 25. GeForce Graphics Pipeline, Raster: compute coverage; points, lines, and triangles; rotated grid multisampling.
  • 26. GeForce Graphics Pipeline, Z Cull: discard fragments early based on Z; up to 64 pixels/clock; multisampled: 256 samples/clock.
  • 27. GeForce Graphics Pipeline, Fragment Shader: user-programmed fragment coloring; dynamic branching; long shaders; multiple render targets; fp16 and fp32 vectors.
  • 28. GeForce Graphics Pipeline, Texture: fp16 and sRGB filtering; 16x anisotropic filtering; non-power-of-two mipmapping; shadow maps, cube maps, and 3D; floating-point textures.
  • 29. GeForce Graphics Pipeline, Raster Ops: 2x and 4x multisampling; fp16 and sRGB blending; multiple render targets; color and depth compression; double-speed depth/stencil only.
  • 30. Single GeForce 7800 Vertex Unit. [Diagram: primitive assembly + attribute processing; vertex texture fetch; FP32 scalar unit; FP32 vector unit; texture cache; branch unit; viewport processing; output to setup.] Vertex processing engine: MIMD architecture; dual issue; low-penalty branching; Shader Model 3.0; 32 vector registers; 512 static instructions per program; indexed input and output registers. Vertex texture fetch: non-stalling; up to 4 texture units; unlimited fetches; mipmapping, no filtering.
  • 31. Vertex Texturing Example. [Figure: a vertex program combines a flat tessellated mesh with a height field texture to produce a displaced mesh.]
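The vertex texturing idea on slide 31 (flat tessellated mesh plus height field texture gives a displaced mesh) can be sketched on the CPU. This is a minimal illustration, not shader code: the helper names are hypothetical, the height field is a tiny made-up grid, and the fetch uses nearest-texel lookup, in line with the "no filtering" note for vertex texture fetch on slide 30.

```python
def fetch_nearest(height_field, u, v):
    """Unfiltered (nearest-texel) lookup with u, v in [0, 1]."""
    rows, cols = len(height_field), len(height_field[0])
    x = min(int(u * cols), cols - 1)
    y = min(int(v * rows), rows - 1)
    return height_field[y][x]

def displace_vertex(position, uv, height_field, scale=1.0):
    """What a displacement vertex program does per vertex: raise y by the fetched height."""
    x, y, z = position
    return (x, y + scale * fetch_nearest(height_field, uv[0], uv[1]), z)

def make_flat_mesh(n):
    """n x n grid of (position, uv) pairs lying in the y = 0 plane."""
    mesh = []
    for j in range(n):
        for i in range(n):
            u, v = i / (n - 1), j / (n - 1)
            mesh.append(((u, 0.0, v), (u, v)))
    return mesh

# Hypothetical 2x2 height field, for illustration only.
height_field = [[0.0, 0.5],
                [0.5, 1.0]]
mesh = make_flat_mesh(3)
displaced = [displace_vertex(p, uv, height_field, scale=2.0) for p, uv in mesh]
```

On the GPU the same per-vertex logic would run in the vertex program, with the height field bound as a vertex texture.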
  • 32. Vertex Textures for Dynamic Displacement Mapping. [Figure: comparison without vertex textures and with vertex textures.] Images used with permission from Pacific Fighters. © 2004 Developed by 1C:Maddox Games. All rights reserved. © 2004 Ubi Soft Entertainment.
  • 33. Vertex Textures to Drive Particle Systems. Render-to-texture: simulation runs in floating-point frame buffer, also usable as texture. Vertex textures: determines particle location with vertex texture fetch.
  • 34. Single GeForce 7800 Fragment Shader Pipeline. [Diagram: input fragment data and texture data; texture processor with texture cache; FP32 Shader Unit 1; FP32 Shader Unit 2; branch processor; fixed-function fog unit; output shaded fragments.] Texture processor: 16 texture units; 1 texture fetch at full speed; bilinear or tri-linear filtering; 16x anisotropic filtering; floating-point (fp16) texture filtering. Shader Unit 1: 4 MULs + RCP; dual issue; texture address calculation; fast fp16 normalize; free: negate, abs, condition codes. Shader Unit 2: 4 MADs or DP4; dual issue; free: negate, abs, condition codes.
  • 35. Operations Per Fragment Shader Pass. Shader Unit 1: 4 components, 1 op / component, so 4 ops / fragment per pass, or 1 texture / fragment at full speed per pass. Shader Unit 2: 4 components, 1 op / component, so 4 ops / fragment per pass. Total: 8 operations / fragment per pass.
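The per-pass count on slide 35 can be turned into a rough chip-wide peak by combining it with the 24 fragment shaders (slide 21) and the 430 MHz core clock (slide 20). The sketch assumes one pass per fragment pipeline per clock and counts a MAD as a single operation, as the slide does; both are simplifying assumptions, not figures from the lecture.

```python
SHADER_UNITS_PER_PIPELINE = 2   # Shader Unit 1 and Shader Unit 2
COMPONENTS_PER_UNIT = 4         # R, G, B, A
OPS_PER_COMPONENT = 1

ops_per_fragment_pass = (SHADER_UNITS_PER_PIPELINE
                         * COMPONENTS_PER_UNIT
                         * OPS_PER_COMPONENT)

FRAGMENT_PIPELINES = 24         # from the parallelism slide
CORE_CLOCK_HZ = 430e6           # from the GPU details slide

# Assumption: every pipeline completes one pass per clock.
peak_ops_per_second = ops_per_fragment_pass * FRAGMENT_PIPELINES * CORE_CLOCK_HZ

print(f"{ops_per_fragment_pass} ops/fragment/pass, "
      f"{peak_ops_per_second / 1e9:.2f} G component-ops/s peak")
```

Counting each MAD as two floating-point operations would raise the headline number, which is one reason published peak figures for the same chip can differ.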
  • 36. Fragment Shader Component Co-issue. Use 4 components various ways: RGBA all together; RGB and A; RG and BA. Both shader units: two operations per shader unit. [Diagram: Shader Unit 1 splits R G B A between Operation 1 and Operation 2; Shader Unit 2 splits R G B A between Operation 3 and Operation 4.]
  • 37. Single GeForce 7800 Raster Operations Pipeline. [Diagram: input shaded fragment data; pixel crossbar interconnect; multisample antialiasing; depth compression and color compression; depth raster operations and color raster operations; memory partition; frame buffer.] Functionality: OpenEXR floating-point blending; sRGB blending; 4x rotated grid multisampling; lossless color and depth compression; multiple render targets.
  • 38. GeForce 7800 Transparency Antialiasing. [Figure: conventional 4x antialiasing with alpha tested content vs. transparency antialiasing with alpha tested content.]
  • 39. Scalable Link Interface (SLI): gang two GeForce 6600, 6800, or 7800 graphics boards together; can almost double your performance. [Photo: two 6800 Ultras joined by an SLI connector.]
  • 40. SLI Rendering Modes. Split Frame Rendering (SFR): one GPU renders top of screen, other renders the bottom; scales fragment processing but not vertex processing. Alternate Frame Rendering (AFR): scales both vertex and fragment processing; adds frame latency; rendering must be free of CPU synchronization. SLI Antialiasing (SLI8x and SLI16x): better antialiasing quality rather than performance; each card renders with slightly different sub-pixel offset.
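The two performance-oriented SLI modes on slide 40 differ only in how work is divided between GPUs, which a small sketch can make concrete. The function names are hypothetical, and the fixed even split in the SFR helper is an assumption made for simplicity; a real driver may place the split line elsewhere.

```python
def afr_gpu_for_frame(frame_index, gpu_count=2):
    """Alternate Frame Rendering: whole frames go to GPUs in round-robin order."""
    return frame_index % gpu_count

def sfr_scanline_ranges(screen_height, gpu_count=2):
    """Split Frame Rendering: each GPU gets a horizontal band of scanlines.

    Returns half-open (start, end) ranges, top band first.
    Assumption: an even split; this is not taken from the lecture.
    """
    return [(gpu * screen_height // gpu_count,
             (gpu + 1) * screen_height // gpu_count)
            for gpu in range(gpu_count)]

print([afr_gpu_for_frame(i) for i in range(4)])
print(sfr_scanline_ranges(1080))
```

Under AFR each GPU processes all vertices and fragments of its own frames, which matches the slide's point that both stages scale. Under SFR both GPUs still transform the full scene's vertices, so only fragment work is divided.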
  • 41. PC Graphics Hardware Evolution. Viable economics: 650 million GeForce GPUs since 1999. 1,000x complexity since 1995: Moore’s Law at work. [Timeline, 1997 to 2010: RIVA 128, 3M transistors; GeForce 256, 23M transistors; GeForce 3, 60M transistors; GeForce FX, 125M transistors; GeForce 8800, 681M transistors; GeForce 580 GTX, 3B transistors.]
  • 42. Current High-end “Fermi” GPU: current high-end graphics card; 512 graphics “cores”; 1.5 GB memory; system power: 600W; OpenGL 4.2 / DirectX 11 functionality.
  • 43. High-level “Fermi” Architecture (GF100). Four Graphics Processor Clusters (GPCs): each is a self-contained graphics pipeline; smaller chips have fewer GPCs. Shared L2 cache. 6 memory controllers: 1.5 GB.
  • 44. Inside Each Graphics Processing Cluster: raster engine; four SMs (Streaming Multiprocessors); texture fetch resources; tessellation and vertex processing resources (Polymorph Engine).
  • 45. Streaming Multiprocessor (SM): multi-processor execution unit; 32 scalar processor cores; a warp is a unit of thread execution of up to 32 threads. Two workloads: graphics (vertex shader, tessellation, geometry shader, fragment shader) and compute.
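Slide 45 defines a warp as a unit of thread execution of up to 32 threads. As a minimal sketch of what that grouping means, the helper below (a hypothetical name) partitions a flat range of thread indices into warps; the last warp is partial when the thread count is not a multiple of 32.

```python
WARP_SIZE = 32  # "up to 32 threads" per warp, from the slide

def partition_into_warps(thread_count, warp_size=WARP_SIZE):
    """Group consecutive thread indices into warps of at most warp_size threads."""
    return [list(range(start, min(start + warp_size, thread_count)))
            for start in range(0, thread_count, warp_size)]

warps = partition_into_warps(70)
print([len(w) for w in warps])
```

With 32 scalar cores per SM, a full warp maps naturally onto one lane per core, though how the hardware actually issues warps is not specified in these slides.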
  • 46. OpenGL Pipeline Programmable Domains run on Unified Hardware. Unified Streaming Processor Array (SPA) architecture means same capabilities for all domains, plus tessellation + compute (not shown in the diagram). [Diagram: GPU front end, vertex assembly, primitive assembly, clipping/setup/rasterization, and raster operations; the vertex, primitive, and fragment programs can be unified hardware; attribute fetch, parameter buffer read, texture fetch, and framebuffer access go through the memory interface.]
  • 47. Dual Warp Scheduling: 32 threads launch!
  • 48. Shader or CUDA Core, Same Unit but Two Personalities. Execution unit: scalar floating-point; scalar integer.
  • 49. Levels of Caching in Fermi GPU. 12 KB L1 texture cache, per texture unit. SM 64 KB cache: split into dedicated 16 KB or 48 KB load/store cache and 48 KB or 16 KB shared memory. L2 (768 KB, read/write, fully coherent) unifies texture cache, raster operation cache, and internal buffering in prior generation.
  • 50. Cache Use Strategies in Fermi GPU. Pipeline stages can communicate efficiently through GPU’s L1 and L2 caches; buffering between stages stays all on chip; only vertex, texel, and pixel read/writes need to go to DRAM.
  • 51. Vertex and Tessellation Processing Tasks. Fixed-function graphics engines: pull attributes and assemble vertex; manage tessellation control and domain shader evaluation; viewport transform; attribute setup of plane equations for rasterization; stream out vertices into buffers.
  • 52. CS 354 52 Rasterization Tasks  Turns primitives into fragments  Computes edge equations  Two-stage rasterization  Coarse raster finds tiles the primitive could be in  Fine raster evaluates sample positions within tiles  Zcull efficiently eliminates occluded fragments
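The two-stage rasterization described on the slide above can be illustrated in software. The following is a rough Python sketch under stated assumptions, not a description of the actual hardware: the coarse stage here simply rejects tiles that miss the triangle's bounding box (a real coarse rasterizer would test tiles more tightly), the tile size of 8 is arbitrary, and the names `edge` and `rasterize` are made up for illustration.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge equation: non-negative when p lies on or to the left of edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, width, height, tile=8):
    """Return the set of (x, y) pixels whose centers the triangle covers.

    tri holds three (x, y) vertices ordered so that edge() of the third
    vertex against the first two is positive.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    min_x, max_x = min(x0, x1, x2), max(x0, x1, x2)
    min_y, max_y = min(y0, y1, y2), max(y0, y1, y2)
    covered = set()
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Coarse stage: skip tiles the primitive cannot possibly touch.
            if tx >= max_x or tx + tile <= min_x:
                continue
            if ty >= max_y or ty + tile <= min_y:
                continue
            # Fine stage: evaluate the three edge equations at each
            # sample position (the pixel center) inside the tile.
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    px, py = x + 0.5, y + 0.5
                    w0 = edge(x1, y1, x2, y2, px, py)
                    w1 = edge(x2, y2, x0, y0, px, py)
                    w2 = edge(x0, y0, x1, y1, px, py)
                    if w0 >= 0 and w1 >= 0 and w2 >= 0:
                        covered.add((x, y))
    return covered


if __name__ == "__main__":
    pixels = rasterize([(0, 0), (8, 0), (0, 8)], 16, 16)
    print(len(pixels), "pixels covered")
```

The coarse stage changes only how much work is done, not the result: any tile size yields the same covered set, because the fine stage makes the final per-sample decision.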
  • 53. CS 354 53 Base Input Mesh From Metro 2033, © THQ and 4A Games
  • 54. CS 354 54 Apply Phong Tessellation From Metro 2033, © THQ and 4A Games
  • 55. CS 354 55 Add Displacement Mapping From Metro 2033, © THQ and 4A Games
  • 56. CS 354 56 GPUs as Compute Nodes  Architecture of GPU has evolved into a high-performance, high-bandwidth compute node  [Figure labels: Small form factor; Workstations, 2 to 4 Tesla GPUs; OEM CPU Server + Compute 1U; Integrated CPU-GPU Servers & Blades]
  • 57. CS 354 57 Compute Programming Model  Cooperative Thread Array (CTA)  Single Program, Multiple Data  Organized in 1D, 2D, or 3D  Programming APIs  CUDA, OpenCL, DirectCompute  APIs + language = parallel processing system  OpenGL or Direct3D through shaders  Cg, HLSL, GLSL
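The single-program, multiple-data model on the slide above can be emulated sequentially to show how each thread derives its own work item from its CTA index and its thread index within the CTA. This is a Python sketch for illustration only: `launch` and `saxpy` are hypothetical names, not part of CUDA, OpenCL, or DirectCompute, and a real GPU runs these threads in parallel rather than in nested loops.

```python
from itertools import product


def launch(kernel, grid_dim, block_dim, *args):
    """Run kernel once per thread over a 3D grid of 3D thread arrays (CTAs).

    grid_dim and block_dim are (x, y, z) tuples. Threads run one after
    another here; on a GPU they would execute in parallel.
    """
    gx, gy, gz = grid_dim
    bx, by, bz = block_dim
    for block_z, block_y, block_x in product(range(gz), range(gy), range(gx)):
        for tz, ty, tx in product(range(bz), range(by), range(bx)):
            kernel((block_x, block_y, block_z), (tx, ty, tz), block_dim, *args)


def saxpy(block_idx, thread_idx, block_dim, a, x, y, out):
    """Every thread runs this same program on a different element."""
    i = block_idx[0] * block_dim[0] + thread_idx[0]  # global 1D index
    if i < len(x):  # threads past the end of the data do nothing
        out[i] = a * x[i] + y[i]


if __name__ == "__main__":
    x = list(range(10))
    y = [1] * 10
    out = [0] * 10
    # Three CTAs of four threads each: 12 threads cover 10 elements.
    launch(saxpy, (3, 1, 1), (4, 1, 1), 2, x, y, out)
    print(out)
```

The bounds check in the kernel matters because the thread count is rounded up to a whole number of CTAs, so the last CTA can contain threads with no element to process.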
  • 58. CS 354 58 Now in World’s Fastest Supercomputers Tianhe-1A 2.507 Petaflop 7168 Tesla M2050 GPUs National Supercomputing Center in Tianjin
  • 59. CS 354 59 Opposite direction: Consumer mobile devices
  • 60. CS 354 60 Low-power Mobile System on a Chip (SoC) Complete system on a chip  4 ARM cores  Integrated graphics  OpenGL ES 2.0  Power <1W
  • 61. CS 354 61 Mid-term Next Class  Mid-term  Similar in format to the homeworks  15% of your final grade  Arrive on time  Open textbook. Open notes, including lecture slides.  Calculators allowed/encouraged.  No smart phones, no computers, no Internet access.  Show your work to justify your answer and provide a basis for partial credit.  What to study  All material in lecture slides  Review in-class quiz questions  Study homeworks  Responsible for textbook readings  Have a relaxing spring break  Next lecture: Shadows  Come back to Project 2

Editor's notes

  1. The technology of graphics processors has evolved amazingly over the last 15 years or so. I’ve been at NVIDIA for more than 10 years and have seen a lot of this first hand. As the hardware increases in performance, the visual quality improves. This is driven by Moore’s law, which says that the number of transistors able to fit on a piece of silicon doubles roughly every 18 months. The great thing about graphics is that it has an insatiable appetite for computation. We’re clearly not at photo-realistic quality yet and still have a long way to go.
  2. World’s fastest known supercomputer today; the official Top500 list comes out next month. Peta = 10^15, so a petaflop is a thousand trillion floating-point operations per second.