2. First Generation - Wireframe
Vertex: transform, clip, and project
Rasterization: lines only
Pixel: no pixels! calligraphic display
Dates: prior to 1987
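The three vertex operations listed above can be sketched in a few lines. This is a minimal illustration, not the hardware of the period. It assumes row-major 4x4 matrices and an OpenGL-style clip volume (-w <= x, y, z <= w). It also tests whole points for brevity, whereas a line-drawing system would clip segments against the volume.

```python
# Sketch of a wireframe-era vertex stage: transform, clip, project.
# Assumptions (not from the slides): row-major 4x4 matrices, an OpenGL-style
# clip volume -w <= x, y, z <= w, and a point-only clip test.

def mat_vec(m, v):
    """Multiply a row-major 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def perspective(near, far):
    """Perspective matrix with a 90-degree field of view and square aspect."""
    a = (far + near) / (near - far)
    b = 2.0 * far * near / (near - far)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, a, b],
        [0.0, 0.0, -1.0, 0.0],
    ]

def transform_clip_project(m, point):
    """Return normalized device coordinates, or None if the point is clipped."""
    x, y, z, w = mat_vec(m, [point[0], point[1], point[2], 1.0])  # transform
    if w <= 0.0 or not all(-w <= c <= w for c in (x, y, z)):     # clip
        return None
    return (x / w, y / w, z / w)                                 # project

P = perspective(1.0, 10.0)
print(transform_clip_project(P, (2.5, -2.5, -5.0)))  # inside the view volume
print(transform_clip_project(P, (10.0, 0.0, -5.0)))  # outside, so None
```

Only the surviving (projected) endpoints would then be handed to the line rasterizer.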
3. Storage Tube Terminals
CRTs with analog charge “persistence”
Accumulate a detailed static image by writing points or line segments
Erase the stored image to start a new one
4. Early Framebuffers
By the mid-1970s, one could afford framebuffers with a few bits per pixel at modest resolution
“A Random Access Video Frame Buffer”,
Kajiya, Sutherland, Cheadle, 1975
Vector displays were still better for fine position detail
Framebuffers were used to emulate storage tube vector terminals on a raster display
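Emulating a vector terminal on a raster framebuffer means converting each line segment into pixels. The sketch below uses a 1-bit-per-pixel buffer and Bresenham's integer line algorithm. Bresenham is a representative choice here; the slides do not say which algorithm those systems used, and the list-of-rows layout is also an assumption.

```python
# Sketch: a 1-bit raster framebuffer that draws line segments, emulating a
# vector/storage-tube terminal. Bresenham's integer algorithm is an
# illustrative choice, not necessarily what 1970s systems used.

def make_framebuffer(width, height):
    """One bit per pixel, stored as a list of rows (illustrative layout)."""
    return [[0] * width for _ in range(height)]

def draw_line(fb, x0, y0, x1, y1):
    """Set every pixel along the segment (x0, y0)-(x1, y1), in any direction."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                   # combined error term for both axes
    while True:
        fb[y0][x0] = 1              # accumulate, like a storage tube
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:                # step in x
            err += dy
            x0 += sx
        if e2 <= dx:                # step in y
            err += dx
            y0 += sy

def erase(fb):
    """Clear the whole image, the raster analogue of a storage-tube erase."""
    for row in fb:
        row[:] = [0] * len(row)

fb = make_framebuffer(8, 4)
draw_line(fb, 0, 0, 7, 3)
print("\n".join("".join("#" if p else "." for p in row) for row in fb))
```

Unlike the storage tube, the framebuffer can also overwrite individual pixels, not only erase everything at once.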
5. Second Generation – Shaded Solids
Vertex: lighting
Rasterization: filled polygons
Pixel: depth buffer, color blending
Dates: 1987 - 1992
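The two pixel operations named above, a depth buffer and color blending, can be sketched per fragment as follows. Assumptions: smaller z means nearer, colors are RGB floats in [0, 1], blending is the common "over" operator, and depth is written even for translucent fragments (real APIs let you disable depth writes for that case).

```python
# Sketch of second-generation per-pixel work: depth (Z) test, then blending.
# Assumptions: smaller z is nearer; "over" blending with source alpha.

import math

def make_buffers(width, height):
    color = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    return color, depth

def write_fragment(color, depth, x, y, z, rgba):
    """Depth-test one fragment; if it passes, blend it into the color buffer."""
    if z >= depth[y][x]:
        return False                # hidden: something nearer is already there
    depth[y][x] = z
    r, g, b, a = rgba
    dr, dg, db = color[y][x]
    color[y][x] = (a * r + (1 - a) * dr,
                   a * g + (1 - a) * dg,
                   a * b + (1 - a) * db)
    return True

color, depth = make_buffers(1, 1)
write_fragment(color, depth, 0, 0, 0.5, (1.0, 0.0, 0.0, 1.0))  # opaque red
write_fragment(color, depth, 0, 0, 0.8, (0.0, 1.0, 0.0, 1.0))  # behind: rejected
write_fragment(color, depth, 0, 0, 0.2, (0.0, 0.0, 1.0, 0.5))  # half-blue in front
print(color[0][0])  # (0.5, 0.0, 0.5)
```

The depth test makes opaque geometry order-independent; blending, by contrast, still depends on drawing order.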
8. 1990s
Desktop 3D workstations under $5000
Single-board, multi-chip graphics subsystems
Rise of 3D on the PC
40-company free-for-all until intense competition knocked out all but a few players
Many were “decelerators” and easy to beat
Single-chip GPUs
Interesting hardware experimentation
PCs would take over the workstation business
Interesting consoles
3DO, Nintendo, Sega, Sony
13. Traditional Fixed-Function Graphics Pipeline
[Figure: fixed-function pipeline, one processor per function]
Vertex processing: T&L evolved to vertex shading
Triangle setup: triangle, point, and line setup
Pixel processing: flat shading, texturing, eventually pixel shading
Raster operations: blending, Z-buffering, antialiasing
Memory interface: wider and faster over the years
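The triangle setup and pixel processing stages above can be sketched in software. Setup computes a screen-clamped bounding box, and coverage is decided with edge functions evaluated at pixel centers. Edge functions are a common rasterization technique chosen for illustration; the slide does not say how any particular chip rasterized. The sketch accepts only counter-clockwise triangles and ignores tie-breaking (fill) rules.

```python
# Sketch of triangle setup plus coverage testing with edge functions.
# Assumptions: counter-clockwise vertices, samples at pixel centers,
# no top-left fill rule.

import math

def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of (a, b, p): >= 0 when p is left of a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def triangle_setup(v0, v1, v2, width, height):
    """Setup stage: a bounding box clamped to the screen."""
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    return (max(0, math.floor(min(xs))), min(width, math.ceil(max(xs))),
            max(0, math.floor(min(ys))), min(height, math.ceil(max(ys))))

def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers lie inside the triangle."""
    x_lo, x_hi, y_lo, y_hi = triangle_setup(v0, v1, v2, width, height)
    covered = []
    for y in range(y_lo, y_hi):
        for x in range(x_lo, x_hi):
            px, py = x + 0.5, y + 0.5
            if (edge(*v1, *v2, px, py) >= 0 and
                    edge(*v2, *v0, px, py) >= 0 and
                    edge(*v0, *v1, px, py) >= 0):
                covered.append((x, y))
    return covered

pixels = set(rasterize((0, 0), (6, 0), (0, 6), 8, 8))
for y in range(8):
    print("".join("#" if (x, y) in pixels else "." for x in range(8)))
```

Each pixel's test is independent of the others, which is why this stage maps naturally onto dedicated parallel hardware.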
21. Millions of Triangles, Millions of Pixels
Why are so many parallel operations needed?
[Figure: rendering a triangle into an image; labels: input triangle, tessellate, transform vertices, projection, rasterize, shade, camera, image plane]
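A rough count shows why the work has to be parallel. The numbers below are assumptions chosen for illustration, since the slide gives none: a 1920x1080 image, 60 frames per second, and one shading operation per pixel with no overdraw.

```python
# Back-of-the-envelope count of per-pixel work.
# Assumed (not from the slide): 1920x1080, 60 fps, one shade per pixel.

def pixel_shades_per_second(width, height, fps, shades_per_pixel=1):
    return width * height * fps * shades_per_pixel

rate = pixel_shades_per_second(1920, 1080, 60)
budget_ns = 1e9 / rate  # time per pixel if a single processor did everything
print(f"{rate:,} pixel shades/s, {budget_ns:.2f} ns per pixel on one processor")
```

About 124 million shades per second leaves roughly 8 ns per pixel for a single serial processor, before any vertex work, tessellation, or overdraw is counted. Because each pixel is shaded independently, the work can instead be split across many processors.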
22. GPU = More computational horsepower and bandwidth per watt
CPU: few complex processors, optimized for single-threaded performance
GPU: many simple processors with minimal overhead
Slow single-threaded performance but massive overall throughput
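The trade-off above can be made concrete with a toy latency-versus-throughput model. All numbers are hypothetical: a "CPU-like" part with 8 fast processors (1 ns per item) and a "GPU-like" part with 2048 slow processors (10 ns per item). The model also assumes independent items and zero scheduling overhead.

```python
# Toy latency-versus-throughput model. All figures are hypothetical.

import math

def batch_time_ns(n_items, n_procs, ns_per_item):
    """Each processor handles one item at a time; all processors run in parallel."""
    return math.ceil(n_items / n_procs) * ns_per_item

for n in (1, 1_000_000):
    cpu = batch_time_ns(n, 8, 1)
    gpu = batch_time_ns(n, 2048, 10)
    print(f"{n:>9,} items: CPU-like {cpu:,} ns, GPU-like {gpu:,} ns")
```

With one item, the GPU-like part is 10 times slower. With a million items, it finishes about 25 times sooner (4,890 ns versus 125,000 ns).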
24. GPU Architecture: Two Main Components
Streaming Multiprocessors (SMs)
Perform the actual computations
Each SM has its own:
Control units, registers, execution pipelines, caches
Global memory
Analogous to RAM in a CPU server
Accessible by both GPU and CPU
Currently up to 6 GB per GPU
Bandwidth currently up to 250 GB/s
[Figure: GPU block diagram with DRAM I/F blocks, Host I/F, GigaThread, and a shared L2 cache]
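The memory figures on this slide imply how quickly the whole of global memory can be streamed. This worked example uses the slide's 6 GB and 250 GB/s. It assumes decimal gigabytes and that peak bandwidth is actually sustained, which real kernels rarely achieve.

```python
# Worked arithmetic from the slide's figures: one full pass over 6 GB of
# global memory at a peak of 250 GB/s. Assumes decimal GB and peak bandwidth.

def full_sweep_ms(capacity_gb, bandwidth_gb_per_s):
    return capacity_gb * 1000.0 / bandwidth_gb_per_s

ms = full_sweep_ms(6, 250)
print(f"{ms:.1f} ms per full read of global memory, ~{1000 / ms:.0f} sweeps per second")
```

That is about 24 ms per full sweep, or roughly 42 sweeps per second, at the theoretical peak.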
27. WORLD’S #1 SUPERCOMPUTER
With a peak performance of 27 petaflops, the Titan supercomputer at Oak Ridge National Laboratory is the world’s fastest. Its 18,688 GPUs provide 90% of the machine’s computing power.
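The slide's three numbers can be cross-checked with simple arithmetic. This sketch assumes that "90% of the machine's computing power" means 90% of the 27-petaflop peak.

```python
# Cross-check of the Titan figures, assuming the GPUs supply 90% of peak FLOPS.

def per_gpu_peak_tflops(system_pflops, gpu_fraction, n_gpus):
    return system_pflops * 1000.0 * gpu_fraction / n_gpus

print(f"{per_gpu_peak_tflops(27.0, 0.90, 18_688):.2f} TFLOPS per GPU")
```

Under that reading, each GPU contributes roughly 1.3 teraflops of peak performance.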
29. The Graphics pipeline
Vertex and fragment processing are programmable
The programmer can write programs that are executed for every vertex as well as for every fragment
This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications
[Figure: host interface → vertex processing → triangle setup → pixel processing → memory interface]
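The idea of a program that runs once per vertex and another that runs once per fragment can be sketched with ordinary Python callables standing in for shaders. This is a toy, not any real graphics API. It assumes point primitives only, screen-space positions coming straight out of the vertex program, and a "rasterizer" that simply floors each position to a pixel. `draw_points`, `shift_right`, and `warm_tint` are hypothetical names.

```python
# Toy programmable pipeline: user-supplied vertex and fragment programs
# around a fixed-function point "rasterizer". All names are hypothetical.

import math

def draw_points(vertices, vertex_program, fragment_program, width, height):
    image = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    for v in vertices:
        x, y, attrs = vertex_program(v)           # programmable: once per vertex
        px, py = math.floor(x), math.floor(y)     # fixed function: point -> pixel
        if 0 <= px < width and 0 <= py < height:
            image[py][px] = fragment_program(px, py, attrs)  # once per fragment
    return image

def shift_right(v):
    """Example vertex program: move each point one pixel right, keep brightness."""
    return v[0] + 1, v[1], v[2]

def warm_tint(px, py, brightness):
    """Example fragment program: orange-ish color scaled by brightness."""
    return (brightness, 0.5 * brightness, 0.0)

img = draw_points([(0, 0, 1.0), (2, 1, 0.5), (9, 9, 1.0)], shift_right, warm_tint, 4, 4)
for row in img:
    print(row)
```

Swapping in different `vertex_program` or `fragment_program` functions changes the geometry or shading without touching the fixed-function code in between, which is the point the slide makes about programmability.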