Next-Generation Graphics
Programming on Xbox 360
Prerequisites
Basic system architecture
Caches, Bandwidth, etc.
Basic Direct3D
Drawing, render targets, render state
Vertex and pixel shaders
Overview
Xbox 360 System Architecture
Xbox 360 Graphics Architecture
Details
Direct3D on Xbox 360
Xbox 360 graphics APIs
Shader development
Tools for graphics debugging and
optimization
Xbox 360
512 MB system memory
IBM 3-way symmetric core processor
ATI GPU with embedded EDRAM
12x DVD
Optional Hard disk
The Xbox 360 GPU
Custom silicon designed by ATI
Technologies Inc.
500 MHz, 338 million transistors, 90nm
process
Supports vertex and pixel shader version
3.0+
Includes some Xbox 360 extensions
The Xbox 360 GPU
10 MB embedded DRAM (EDRAM) for
extremely high-bandwidth render targets
Alpha blending, Z testing, multisample antialiasing
are all free (even when combined)
Hierarchical Z logic and dedicated memory
for early Z/stencil rejection
GPU is also the memory hub for the whole
system
22.4 GB/sec to/from system memory
More About the Xbox 360 GPU
48 shader ALUs shared between pixel
and vertex shading (unified shaders)
Each ALU can co-issue one float4 op and
one scalar op each cycle
Non-traditional architecture
16 texture samplers
Dedicated branch instruction
execution
More About the Xbox 360 GPU
2x and 4x hardware multi-sample
anti-aliasing (MSAA)
Hardware tessellator
N-patches, triangular patches, and
rectangular patches
Can render to 4 render targets and a
depth/stencil buffer simultaneously
GPU: Work Flow
Consumes instructions and data from a
command buffer
Ring buffer in system memory
Managed by Direct3D, user configurable size
(default 2 MB)
Supports indirection for vertex data, index data,
shaders, textures, render state, and command
buffers
Up to 8 simultaneous contexts in-flight at
once
Changing shaders or render state is inexpensive,
since a new context can be started up easily
GPU: Work Flow
Threads work on units of 64 vertices or
pixels at once
Dedicated triangle setup, clipping, etc.
Pixels processed in 2x2 quads
Back buffers/render targets stored in
EDRAM
Alpha, Z, stencil test, and MSAA expansion done
in EDRAM module
EDRAM contents copied to system
memory by “resolve” hardware
GPU: Operations Per Clock
Write 8 pixels or 16 Z-only pixels to
EDRAM
With MSAA, up to 32 samples or 64 Z-only
samples
Reject up to 64 pixels that fail
Hierarchical Z testing
Vertex fetch sixteen 32-bit words from
up to two different vertex streams
GPU: Operations Per Clock
16 bilinear texture fetches
48 vector and scalar ALU operations
Interpolate 16 float4 shader
interpolants
32 control flow operations
Process one vertex, one triangle
Resolve 8 pixels to system memory
from EDRAM
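The per-clock figures on these two slides turn into peak per-second rates once they are multiplied by the 500 MHz clock quoted earlier. The sketch below does only that arithmetic, in Python for brevity. The dictionary keys are illustrative labels rather than hardware or API names, and the results are theoretical peaks, not measured throughput.

```python
# Per-clock figures from the "Operations Per Clock" slides, scaled by the
# 500 MHz GPU clock quoted earlier in the deck. Keys are illustrative labels.
GPU_CLOCK_HZ = 500_000_000

PER_CLOCK = {
    "pixels_written": 8,
    "z_only_pixels_written": 16,
    "bilinear_texture_fetches": 16,
    "alu_operations": 48,
    "triangles": 1,
    "pixels_resolved": 8,
}


def peak_per_second(per_clock, clock_hz=GPU_CLOCK_HZ):
    """Return the theoretical peak rate per second for each per-clock figure."""
    return {name: count * clock_hz for name, count in per_clock.items()}
```

Under these figures, `peak_per_second(PER_CLOCK)["pixels_written"]` comes to 4 billion pixels per second and `["triangles"]` to 500 million triangles per second.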
GPU: Hierarchical Z
Rough, low-resolution representation
of Z/stencil buffer contents
Provides early Z/stencil rejection for
pixel quads
11 bits of Z and 1 bit of stencil per
block
GPU: Hierarchical Z
NOT tied to compression
EDRAM BW advantage
Separate memory buffer on GPU
Enough memory for 1280x720 2x MSAA
Provides a big performance boost
when drawing complex scenes
Draw opaque objects front to back
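The advice to draw opaque objects front to back can be illustrated with a small depth-test model. This is a conceptual sketch in Python, not a model of the hardware: it tests individual pixels instead of coarse blocks, and the function name and layer depths are made up for the example.

```python
def count_shaded_pixels(layer_depths, pixel_count):
    """Draw full-screen layers in the given order with a less-than Z test.

    Returns how many pixels pass the depth test and would therefore be
    shaded. A smaller depth value means closer to the camera.
    """
    depth_buffer = [float("inf")] * pixel_count
    shaded = 0
    for depth in layer_depths:
        for i in range(pixel_count):
            if depth < depth_buffer[i]:  # rejected pixels are never shaded
                depth_buffer[i] = depth
                shaded += 1
    return shaded


layers = [0.2, 0.4, 0.6, 0.8]
front_to_back = count_shaded_pixels(sorted(layers), pixel_count=64)
back_to_front = count_shaded_pixels(sorted(layers, reverse=True), pixel_count=64)
```

With four overlapping layers, the front-to-back order shades each pixel once, while the back-to-front order shades each pixel four times because nothing is ever rejected.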
GPU: Xbox Procedural Synthesis
Dedicated hardware channel between
CPU L2 cache and GPU
10.8 GB/sec
Useful for dynamic geometry,
instancing, particle systems, etc.
Callbacks inserted into command
buffer spin up XPS threads on CPU
during rendering
GPU: Xbox Procedural Synthesis
XPS threads run your supplied code
GPU consumes data as CPU
generates it
Separate, small command buffer in locked
L2 cache section
GPU: Hardware Tessellation
Dedicated tessellation hardware
Occurs before vertex shading
Does not compute interpolated vertex
positions
Parametric values are sent to a special
vertex shader designed for tessellation
GPU: Hardware Tessellation
3 modes of tessellation
Discrete: integer parameter, triangles only
Continuous: float parameter, new vertices
are shifted to their final position
Per-edge: Special index buffer used to
provide per-edge factors, used for patches
only, cannot support traditional index data
as well
Discrete Tessellation Sample
Triangle with tessellation level 4
Continuous Tessellation Sample
Quad with tessellation levels 3, 4, 5
Per-Edge Tessellation Sample
Quad-patch with per-edge tessellation
factors of 1, 3, 5, and 7
GPU: Textures
16 bilinear texture samples per clock
64bpp runs at half rate, 128bpp at quarter rate
Trilinear at half rate
Unlimited dependent texture fetching
DXT decompression has 32-bit precision
Better than Xbox (16-bit precision)
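The rate figures on this slide can be combined into a rough fetch-rate estimate. The sketch below is in Python and rests on two assumptions that the slide does not state: that 32bpp formats run at the full rate, and that the format penalty and the trilinear penalty multiply when both apply. The names are illustrative.

```python
BASE_BILINEAR_FETCHES_PER_CLOCK = 16

# Rate multipliers from the slide, keyed by texel size in bits per pixel.
# Treating 32bpp as full rate is an assumption.
FORMAT_RATE = {32: 1.0, 64: 0.5, 128: 0.25}


def fetches_per_clock(bits_per_pixel=32, trilinear=False):
    """Estimate peak texture fetches per clock.

    Assumption (not stated in the deck): the format penalty and the
    trilinear penalty multiply when both apply.
    """
    rate = BASE_BILINEAR_FETCHES_PER_CLOCK * FORMAT_RATE[bits_per_pixel]
    if trilinear:
        rate *= 0.5
    return rate
```

For example, `fetches_per_clock(64)` gives 8.0, and `fetches_per_clock(128)` gives 4.0.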
GPU: Textures
Texture arrays – generalized version of
cube maps
Up to 64 surfaces within a texture, optional
MIPmaps for each surface
Surface is indexed with a [0..1] z coordinate in a
3D texture fetch
Mip tail packing
All MIPs smaller than 32 texels on a side are
packed into one memory page
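To see which levels end up in the packed tail, the sketch below walks the MIP chain of a square, power-of-two texture and splits it at the 32-texel threshold from the slide. It is written in Python, the function names are hypothetical, and it does not address non-square textures, which the slide does not specify.

```python
def mip_chain(size):
    """Return the edge lengths of every MIP level of a square texture."""
    levels = []
    while size >= 1:
        levels.append(size)
        size //= 2
    return levels


def split_mip_tail(size, tail_threshold=32):
    """Split a square texture's MIP chain into individually stored levels
    and levels that fall into the packed tail (edge < tail_threshold)."""
    levels = mip_chain(size)
    stored = [s for s in levels if s >= tail_threshold]
    tail = [s for s in levels if s < tail_threshold]
    return stored, tail
```

For a 256x256 texture, `split_mip_tail(256)` returns `([256, 128, 64, 32], [16, 8, 4, 2, 1])`, so five of the nine levels share the tail.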
GPU: Resolve
Copies surface data from EDRAM to a
texture in system memory
Required for render-to-texture and
presentation to the screen
Can perform MSAA sample averaging
or resolve individual samples
Can perform format conversions and
biasing
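The MSAA sample averaging mentioned above amounts to collapsing each pixel's samples into one value. The Python sketch below shows that step on a flat list of single-channel samples. The data layout and function name are assumptions made for the example and say nothing about how the resolve hardware stores or reads EDRAM.

```python
def resolve_average(samples, samples_per_pixel):
    """Average consecutive groups of MSAA samples into one value per pixel.

    `samples` is a flat list of single-channel sample values laid out
    pixel by pixel; this layout is an assumption made for the example.
    """
    if len(samples) % samples_per_pixel != 0:
        raise ValueError("sample count must be a multiple of samples_per_pixel")
    return [
        sum(samples[i:i + samples_per_pixel]) / samples_per_pixel
        for i in range(0, len(samples), samples_per_pixel)
    ]
```

A pixel on a polygon edge with 4x MSAA samples of 0.0, 1.0, 1.0, and 1.0 resolves to 0.75.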
Direct3D 9+ on Xbox 360
Similar API to PC Direct3D 9.0
Optimized for Xbox 360 hardware
No abstraction layers or drivers—it’s direct
to the metal
Exposes all Xbox 360 custom hardware
features
New state enums
New APIs for finer-grained control and
completely new features
Direct3D 9+ on Xbox 360
Communicates with GPU via a
command buffer
Ring buffer in system memory
Direct Command Buffer Playback support
Direct3D: Drawing
Points, lines, triangles, quads, rects,
polygons
Strips of lines, triangles, quads, polygons
Configurable reset index to reset strips
within an index buffer
DrawPrimitive, DrawIndexedPrimitive
DrawPrimitiveUP,
DrawIndexedPrimitiveUP
No vertex buffers or index buffers required
Direct3D: Drawing
BeginVertices / EndVertices
Direct allocation from the command buffer
Better than DrawPrimitiveUP
SetStreamSourceFrequency
Not directly supported on Xbox 360
Can be done with more flexibility in vertex
shader
RunCommandBuffer
Direct support for modifying and playing command
buffers
Direct3D: State
Render state and sampler state
New states added for Xbox 360
Half-pixel offset
High-precision blending
Primitive reset index
Unsupported states ignored by Direct3D
Direct3D: State
Lazy state
Direct3D batches up state in blocks that are
optimized for submission to the GPU
Uses special CPU instructions for very low
CPU overhead
State API extensions for Xbox 360
SetBlendState
Individual blend control for each render
target
Direct3D: Render Targets
New format for high dynamic range
2:10:10:10 float
7 bits mantissa, 3 bits exponent for each
of R, G, B
No matching texture format
Special mode which allows for 10 bits of
alpha precision on blends
Optional sRGB gamma correction
(2.2) on 8:8:8:8 fixed point
Direct3D: New Texture Formats
DXN
2-component format with 8 bits of precision
per component
Great for normal maps
CTX1
Also good for normal maps
Fewer bits per pixel than DXN
Direct3D: New Texture Formats
DXT3A, DXT5A
Single component textures made from a
DXT3/DXT5 alpha block
4 bits of precision
Direct3D: Command Buffer
Ring buffer that allows the CPU to safely
send commands to the GPU
Buffer is filled by CPU, and the GPU
consumes the data
[Figure: ring buffer of Draw commands, with labels CPU Write Pointer, GPU Read Pointer, Code Execution, and Rendering]
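The relationship between the CPU write pointer and the GPU read pointer can be modelled with a small single-producer, single-consumer ring. This is a conceptual Python sketch, not Direct3D code: the class and method names are hypothetical, and a real implementation would synchronize between processors instead of returning `False` when full.

```python
class CommandRing:
    """Fixed-size ring buffer with one producer (CPU) and one consumer (GPU)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.write = 0  # CPU write pointer: next slot to fill
        self.read = 0   # GPU read pointer: next slot to consume
        self.count = 0  # commands written but not yet consumed

    def submit(self, command):
        """CPU side: append a command; returns False if the ring is full."""
        if self.count == self.capacity:
            return False  # a real CPU would wait for the GPU to catch up
        self.slots[self.write] = command
        self.write = (self.write + 1) % self.capacity
        self.count += 1
        return True

    def consume(self):
        """GPU side: take the oldest pending command, or None if idle."""
        if self.count == 0:
            return None
        command = self.slots[self.read]
        self.read = (self.read + 1) % self.capacity
        self.count -= 1
        return command
```

The CPU can only overwrite a slot after the GPU has consumed it, which is what makes the scheme safe without copying commands.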
Direct3D: Predicated Tiling
EDRAM is not large enough for 720p
rendering with MSAA or multiple
render targets
Solution: Predicated tiling
Divide the surface into a handful of “tiles”
Direct3D replays an accumulated command
buffer to each tile
Direct3D stitches together a final image by
resolving each tile pass to a rect in a
texture in system memory
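A quick size estimate shows why tiling becomes necessary. The Python sketch below assumes 32-bit color and 32-bit depth/stencil per sample and ignores alignment padding, so it gives a lower bound on the tile count; the function names are hypothetical and real tile dimensions are subject to hardware constraints not covered here.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # 10 MB of EDRAM, per the GPU overview slide


def render_target_bytes(width, height, msaa_samples=1,
                        color_bytes=4, depth_bytes=4, color_targets=1):
    """Bytes needed in EDRAM for color plus depth/stencil at a given MSAA level.

    Assumes 32-bit color and 32-bit depth/stencil by default, and that every
    sample stores both; alignment padding is ignored.
    """
    bytes_per_sample = color_bytes * color_targets + depth_bytes
    return width * height * msaa_samples * bytes_per_sample


def tiles_needed(width, height, msaa_samples=1, **kwargs):
    """Lower bound on the number of tiles required to fit the surface in EDRAM."""
    total = render_target_bytes(width, height, msaa_samples, **kwargs)
    return math.ceil(total / EDRAM_BYTES)
```

Under these assumptions, 1280x720 fits in one pass without MSAA, needs at least two tiles with 2x MSAA, and at least three tiles with 4x MSAA.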
Direct3D: Predicated Tiling
Optimization
Uses GPU hardware predication and
screen-space extents capture
Only renders primitives on the tiles where
they appear
Cuts down significantly on duplicate vertex
processing
Direct3D: Predicated Tiling
What does it cost?
Increased vertex processing cost, for
primitives that span more than one tile
No pixel processing increase
Required for 720p and MSAA
XGraphics
Graphics API for direct access to Direct3D
resources and low-level texture data
manipulation
Create/update resources without the D3D
device
XGSetTextureHeader, XGSetVertexBufferHeader,
XGSetIndexBufferHeader, etc
Useful for threaded loading code
XGraphics
Direct management of resource memory
Use your own allocators, etc.
Texture tiling functions
Endian swapping functions
Win32 library is useful for tool pipelines
and offline resource packing
Shaders
Two options for writing shaders
HLSL (with Xbox 360 extensions)
GPU microcode (specific to the Xbox 360
GPU, similar to assembly but direct to
hardware)
Recommendation: Use HLSL
Easy to write and maintain
Replace individual shaders with microcode
if performance analysis warrants it
High-Level Shading Language
(HLSL)
Xbox 360 extensions
Texture sampler modes (offsets, filtering
override, etc.)
Memory export
Predicated tiling screen space offset
Vertex fetching/index manipulation
Tessellation
Shaders: GPU Microcode
Shader language that maps 1:1 with
the hardware
Similar to assembly shaders in
appearance
Somewhat esoteric, but extremely
flexible
Plan for HLSL, write microcode for
performance gains where needed
Shaders: Branching
Static branching
Branching based on bool or int constants
Example: Perform texture sample if
constant b15 is TRUE
Dynamic branching
Branching based on a computed or
interpolated value
Example: Compute dot product, light pixel if
result > 0
Shaders: Branching
Dynamic branching can be as fast as
static branching if all vertices/pixels in
a 64-unit vector take the same branch
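The coherence point can be made concrete with a simplified cost model for one 64-wide vector. This Python sketch assumes that a vector whose elements all agree executes only the taken side, and that a vector whose elements disagree pays for both sides. The function name and cost units are illustrative, and actual GPU costs depend on details the slides do not cover.

```python
VECTOR_SIZE = 64  # vertices or pixels processed together, per the slides


def branch_cost(conditions, cost_if_true, cost_if_false):
    """Cost of an if/else for one 64-wide vector in a simplified model.

    Assumption: when every element agrees, only the taken side executes;
    when they disagree, the vector pays for both sides.
    """
    if len(conditions) != VECTOR_SIZE:
        raise ValueError("expected one condition per element in the vector")
    if all(conditions):
        return cost_if_true
    if not any(conditions):
        return cost_if_false
    return cost_if_true + cost_if_false
```

In the lighting example, a vector of pixels that all face the light (dot product > 0) costs only the lighting path, while a vector straddling the terminator costs both paths.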
Shaders: Vertex Declarations and
Fetches
Xbox 360 GPU does not natively
support vertex declarations or streams
Shaders are patched at run time to
match the current vertex declaration
Vertex fetch instructions used to load vertex
data into the GPU
Shaders: Vertex Declarations and
Fetches
Shader patching code caches the
patched code using a shader-decl
pointer pair
Try not to create many identical vertex
declarations
More GPU-friendly shaders are available as
an extension but are not as flexible
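The warning about identical vertex declarations follows from caching on a pointer pair: two declarations with the same contents but different addresses look like different keys. The Python sketch below models that behavior using object identity in place of pointers. The class is hypothetical and stands in for the patching code; it is not the Direct3D implementation.

```python
class PatchedShaderCache:
    """Caches patched shaders by the identity of (shader, declaration)."""

    def __init__(self):
        self._cache = {}
        self.patch_count = 0

    def get(self, shader, declaration):
        key = (id(shader), id(declaration))  # pointer pair, not contents
        if key not in self._cache:
            self.patch_count += 1  # stands in for the patching work
            # Storing the objects keeps them alive, so their ids stay unique.
            self._cache[key] = (shader, declaration)
        return self._cache[key]
```

Requesting the same shader with two separately created but identical declarations triggers two patches, whereas reusing one declaration object triggers only one.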
PIX
The ultimate performance and
graphics debugging tool
Incorporates several modules
Performance monitor
CPU performance analysis
Command buffer capture, playback, and
analysis
Shader and mesh debuggers
PIX
Hardware accelerated
GPU has hundreds of performance
counters
Development kit has special hardware for
PIX
PIX: Performance Monitor
Scrolling graph display
Configurable graphs
CPU, GPU, memory, storage stats
Best place to accurately gauge frame
rate/frame time
Runs continuously while PIX is open
Works with any application running on
the development kit
PIX: CPU Timeline
Shows the CPU usage of graphics
calls
Also shows the CPU usage of your
title’s subsystems
Use PIXBeginNamedEvent /
PIXEndNamedEvent APIs to label
your subsystems by name and color
PIX: GPU Timeline
Shows the execution of each graphics
command on the GPU
Timing and statistics
Full GPU and Direct3D state
All input and output resources (textures,
surfaces, shaders, etc.)
Event breakdown
Render target views
Mesh debugger
PIX: Shader debuggers
Vertex shader debugger
Click on a vertex in the mesh debugger
HLSL and Microcode
Pixel shader debugger
Right-click on a pixel in a render target
Select the right draw call that influenced
that pixel
HLSL and Microcode
Conclusion
Direct3D customized and optimized
for the hardware
New APIs to take advantage of custom
features
Direct to the metal, no drivers or HAL
Conclusion
Extremely powerful graphics hardware
Unified shaders
Shader model 3.0 plus extensions for Xbox
360
Fast EDRAM provides a huge amount of
dedicated frame buffer bandwidth
Dedicated hardware for tessellation,
procedural geometry, and multi-sample
anti-aliasing
Conclusion
High performance shader
development
Brand new HLSL compiler for Xbox 360
GPU microcode to expose the hardware’s
abilities
FXLite for fast effects
Conclusion
Best performance tools in the industry
PIX is your best friend and your secret
weapon
Hardware-accelerated
Exposes every bit of performance info
Resources
GDC 2006 Presentations
http://msdn.com/directx/presentations
DirectX Developer Center
http://msdn.com/directx
XNA Developer Center
http://msdn.com/xna
XNA, DirectX, XACT Forums
http://msdn.com/directx/forums
Email addresses
directx@microsoft.com (DirectX & Windows development)
xna@microsoft.com (XNA Feedback)
© 2006 Microsoft Corporation. All rights reserved.
Microsoft, DirectX, Xbox 360, the Xbox logo, and XNA are either registered trademarks or trademarks of Microsoft Corporation in the United States and / or other countries.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
Next generation graphics programming on xbox 360

Contenu connexe

Tendances

Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overhead
Cass Everitt
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Johan Andersson
 

Tendances (20)

Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overhead
 
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
 
Lighting the City of Glass
Lighting the City of GlassLighting the City of Glass
Lighting the City of Glass
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering Pipeline
 
The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbitePhysically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-rendering
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 
Parallel Futures of a Game Engine
Parallel Futures of a Game EngineParallel Futures of a Game Engine
Parallel Futures of a Game Engine
 
OpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesOpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering Techniques
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 

En vedette

8 3 giraldo anderson y jhon trejos
8 3 giraldo anderson y jhon trejos8 3 giraldo anderson y jhon trejos
8 3 giraldo anderson y jhon trejos
jhonatanrulo
 

En vedette (20)

Dx11 performancereloaded
Dx11 performancereloadedDx11 performancereloaded
Dx11 performancereloaded
 
Social Media Strategy DRAFT
Social Media Strategy DRAFTSocial Media Strategy DRAFT
Social Media Strategy DRAFT
 
[SJSU] Social Media Strategy
[SJSU] Social Media Strategy[SJSU] Social Media Strategy
[SJSU] Social Media Strategy
 
ประชุมชี้แจ้งกรอบวิจัย งบประมาณปี 2558
ประชุมชี้แจ้งกรอบวิจัย งบประมาณปี 2558ประชุมชี้แจ้งกรอบวิจัย งบประมาณปี 2558
ประชุมชี้แจ้งกรอบวิจัย งบประมาณปี 2558
 
Internaliser et industrialiser ses études de satisfaction digitales - Qualtri...
Internaliser et industrialiser ses études de satisfaction digitales - Qualtri...Internaliser et industrialiser ses études de satisfaction digitales - Qualtri...
Internaliser et industrialiser ses études de satisfaction digitales - Qualtri...
 
Xarxes socials
Xarxes socialsXarxes socials
Xarxes socials
 
002 ethics
002   ethics002   ethics
002 ethics
 
Ipsos MORI Political Monitor - August 2014
Ipsos MORI Political Monitor - August 2014Ipsos MORI Political Monitor - August 2014
Ipsos MORI Political Monitor - August 2014
 
8 3 giraldo anderson y jhon trejos
8 3 giraldo anderson y jhon trejos8 3 giraldo anderson y jhon trejos
8 3 giraldo anderson y jhon trejos
 
일본에 불어닥친 클라우드 컴퓨팅 열풍의 현황과 전망
일본에 불어닥친 클라우드 컴퓨팅 열풍의 현황과 전망일본에 불어닥친 클라우드 컴퓨팅 열풍의 현황과 전망
일본에 불어닥친 클라우드 컴퓨팅 열풍의 현황과 전망
 
2nd annual io t summit 2016 - Bangalore India
2nd annual io t summit 2016 - Bangalore India2nd annual io t summit 2016 - Bangalore India
2nd annual io t summit 2016 - Bangalore India
 
Sesiones cs fuensanta 25 años
Sesiones cs fuensanta 25 añosSesiones cs fuensanta 25 años
Sesiones cs fuensanta 25 años
 
Social Media for Beginners and 4 Holiday Facebook Tips
Social Media for Beginners and 4 Holiday Facebook TipsSocial Media for Beginners and 4 Holiday Facebook Tips
Social Media for Beginners and 4 Holiday Facebook Tips
 
XC 2015
XC 2015XC 2015
XC 2015
 
Digital digest //24.06.2016
Digital digest //24.06.2016Digital digest //24.06.2016
Digital digest //24.06.2016
 
Box Office Best Practices [Webinar]
Box Office Best Practices [Webinar]Box Office Best Practices [Webinar]
Box Office Best Practices [Webinar]
 
Apc's customised digital signage solutions
Apc's customised digital signage solutionsApc's customised digital signage solutions
Apc's customised digital signage solutions
 
Вестник // Digital Blow Mind // Май 2016
Вестник // Digital Blow Mind // Май 2016Вестник // Digital Blow Mind // Май 2016
Вестник // Digital Blow Mind // Май 2016
 
Transforming Healthcare: Opportunities for Making a Difference and Pivoting Y...
Transforming Healthcare: Opportunities for Making a Difference and Pivoting Y...Transforming Healthcare: Opportunities for Making a Difference and Pivoting Y...
Transforming Healthcare: Opportunities for Making a Difference and Pivoting Y...
 
Facts & Numbers You Might Not Know about Wechat
Facts & Numbers  You Might Not  Know about WechatFacts & Numbers  You Might Not  Know about Wechat
Facts & Numbers You Might Not Know about Wechat
 

Similaire à Next generation graphics programming on xbox 360

Architectural Analysis of Game Machines
Architectural Analysis of Game MachinesArchitectural Analysis of Game Machines
Architectural Analysis of Game Machines
Praveen AP
 
Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphics
changehee lee
 
Threading Successes 06 Allegorithmic
Threading Successes 06   AllegorithmicThreading Successes 06   Allegorithmic
Threading Successes 06 Allegorithmic
guest40fc7cd
 
thu-blake-gdc-2014-final
thu-blake-gdc-2014-finalthu-blake-gdc-2014-final
thu-blake-gdc-2014-final
Robert Taylor
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
Thomas Goddard
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
Johan Andersson
 

Similaire à Next generation graphics programming on xbox 360 (20)

Xbox
XboxXbox
Xbox
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and Transparency
 
Architectural Analysis of Game Machines
Architectural Analysis of Game MachinesArchitectural Analysis of Game Machines
Architectural Analysis of Game Machines
 
Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphics
 
Threading Successes 06 Allegorithmic
Threading Successes 06   AllegorithmicThreading Successes 06   Allegorithmic
Threading Successes 06 Allegorithmic
 
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
 
graphics processing unit ppt
graphics processing unit pptgraphics processing unit ppt
graphics processing unit ppt
 
thu-blake-gdc-2014-final
thu-blake-gdc-2014-finalthu-blake-gdc-2014-final
thu-blake-gdc-2014-final
 
Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!
 
2.Hardware.ppt
2.Hardware.ppt2.Hardware.ppt
2.Hardware.ppt
 
Ge force fx
Ge force fxGe force fx
Ge force fx
 
OpenGL 3.2 and More
OpenGL 3.2 and MoreOpenGL 3.2 and More
OpenGL 3.2 and More
 
Look Ma, No Jutter! Optimizing Performance Across Oculus Mobile
Look Ma, No Jutter! Optimizing Performance Across Oculus MobileLook Ma, No Jutter! Optimizing Performance Across Oculus Mobile
Look Ma, No Jutter! Optimizing Performance Across Oculus Mobile
 
Gpu microprocessors
Gpu microprocessorsGpu microprocessors
Gpu microprocessors
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 
OpenGL 4 for 2010
OpenGL 4 for 2010OpenGL 4 for 2010
OpenGL 4 for 2010
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL Functionality
 
Destruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsDestruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance Fields
 

Plus de VIKAS SINGH BHADOURIA

Plus de VIKAS SINGH BHADOURIA (20)

Virtualization
VirtualizationVirtualization
Virtualization
 
Wireless Hacking
Wireless HackingWireless Hacking
Wireless Hacking
 
EDGE DETECTION
EDGE DETECTIONEDGE DETECTION
EDGE DETECTION
 
TCP /IP
TCP /IPTCP /IP
TCP /IP
 
Complete course on wireless
Complete course on wirelessComplete course on wireless
Complete course on wireless
 
Principles of programming languages. Detail notes
Principles of programming languages. Detail notesPrinciples of programming languages. Detail notes
Principles of programming languages. Detail notes
 
Network Security
Network  SecurityNetwork  Security
Network Security
 
Improve your communication skills
Improve your communication skillsImprove your communication skills
Improve your communication skills
 
Infrared spectroscopy
Infrared spectroscopy   Infrared spectroscopy
Infrared spectroscopy
 
Speed detection-of-moving-vehicle-using-speed-cameras
Speed detection-of-moving-vehicle-using-speed-camerasSpeed detection-of-moving-vehicle-using-speed-cameras
Speed detection-of-moving-vehicle-using-speed-cameras
 
Fingerprint based physical access control vehicle immobilizer
Fingerprint based physical access control vehicle immobilizer Fingerprint based physical access control vehicle immobilizer
Fingerprint based physical access control vehicle immobilizer
 
Intelligent agents
Intelligent agentsIntelligent agents
Intelligent agents
 
Interactive voice-response-system
Interactive voice-response-systemInteractive voice-response-system
Interactive voice-response-system
 
Data compression
Data compressionData compression
Data compression
 
Brain computer-interface-ppt
Brain computer-interface-pptBrain computer-interface-ppt
Brain computer-interface-ppt
 
Trees
TreesTrees
Trees
 
Introduction to parallel computing
Introduction to parallel computingIntroduction to parallel computing
Introduction to parallel computing
 
Bluetooth mobileip
Bluetooth mobileipBluetooth mobileip
Bluetooth mobileip
 
Parallel computing persentation
Parallel computing persentationParallel computing persentation
Parallel computing persentation
 
Full report on light peak technology
Full report on light peak technologyFull report on light peak technology
Full report on light peak technology
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology

Next-Generation Graphics Programming on Xbox 360

  • 1.
  • 3. Prerequisites. Basic system architecture (caches, bandwidth, etc.); basic Direct3D (drawing, render targets, render state; vertex and pixel shaders).
  • 4. Overview. Xbox 360 system architecture; Xbox 360 graphics architecture details; Direct3D on Xbox 360; Xbox 360 graphics APIs; shader development; tools for graphics debugging and optimization.
  • 5. Xbox 360. 512 MB system memory; IBM 3-way symmetric core processor; ATI GPU with embedded DRAM (EDRAM); 12x DVD; optional hard disk.
  • 6. The Xbox 360 GPU. Custom silicon designed by ATI Technologies Inc.; 500 MHz, 338 million transistors, 90 nm process; supports vertex and pixel shader version 3.0+ (includes some Xbox 360 extensions).
  • 7. The Xbox 360 GPU. 10 MB embedded DRAM (EDRAM) for extremely high-bandwidth render targets (alpha blending, Z testing, and multisample antialiasing are all free, even when combined); hierarchical Z logic and dedicated memory for early Z/stencil rejection; GPU is also the memory hub for the whole system (22.4 GB/sec to/from system memory).
  • 8. More About the Xbox 360 GPU. 48 shader ALUs shared between pixel and vertex shading (unified shaders); each ALU can co-issue one float4 op and one scalar op each cycle; non-traditional architecture; 16 texture samplers; dedicated branch instruction execution.
  • 9. More About the Xbox 360 GPU. 2x and 4x hardware multi-sample anti-aliasing (MSAA); hardware tessellator (N-patches, triangular patches, and rectangular patches); can render to 4 render targets and a depth/stencil buffer simultaneously.
  • 10. GPU: Work Flow. Consumes instructions and data from a command buffer: a ring buffer in system memory, managed by Direct3D, with a user-configurable size (default 2 MB); supports indirection for vertex data, index data, shaders, textures, render state, and command buffers. Up to 8 simultaneous contexts in flight at once; changing shaders or render state is inexpensive, since a new context can be started up easily.
  • 11. GPU: Work Flow. Threads work on units of 64 vertices or pixels at once; dedicated triangle setup, clipping, etc.; pixels processed in 2x2 quads; back buffers/render targets stored in EDRAM; alpha, Z, stencil test, and MSAA expansion done in EDRAM module; EDRAM contents copied to system memory by “resolve” hardware.
  • 12. GPU: Operations Per Clock. Write 8 pixels or 16 Z-only pixels to EDRAM (with MSAA, up to 32 samples or 64 Z-only samples); reject up to 64 pixels that fail hierarchical Z testing; vertex fetch sixteen 32-bit words from up to two different vertex streams.
  • 13. GPU: Operations Per Clock. 16 bilinear texture fetches; 48 vector and scalar ALU operations; interpolate 16 float4 shader interpolants; 32 control flow operations; process one vertex, one triangle; resolve 8 pixels to system memory from EDRAM.
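The per-clock figures on slides 12 and 13 turn into per-second peaks when multiplied by the 500 MHz clock quoted on slide 6. The following is a minimal sketch of that arithmetic only; the dictionary keys are illustrative names, and the results are theoretical ceilings rather than measured throughput.

```python
# Peak rates = per-clock figures (slides 12-13) x GPU clock (slide 6).
GPU_CLOCK_HZ = 500_000_000  # 500 MHz

# Illustrative names for a few of the per-clock figures quoted on the slides.
PER_CLOCK = {
    "pixels_written": 8,
    "z_only_pixels_written": 16,
    "hi_z_pixels_rejected": 64,
    "bilinear_texture_fetches": 16,
    "alu_operations": 48,
    "resolved_pixels": 8,
}


def peak_per_second(per_clock: int, clock_hz: int = GPU_CLOCK_HZ) -> int:
    """Theoretical peak operations per second for a per-clock figure."""
    return per_clock * clock_hz


PEAK = {name: peak_per_second(n) for name, n in PER_CLOCK.items()}

if __name__ == "__main__":
    for name, rate in PEAK.items():
        print(f"{name}: {rate / 1e9:.1f} G/s")
```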
  • 14. GPU: Hierarchical Z. Rough, low-resolution representation of Z/stencil buffer contents; provides early Z/stencil rejection for pixel quads; 11 bits of Z and 1 bit of stencil per block.
  • 15. GPU: Hierarchical Z. NOT tied to compression (EDRAM bandwidth advantage); separate memory buffer on GPU (enough memory for 1280x720 2x MSAA); provides a big performance boost when drawing complex scenes; draw opaque objects front to back.
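The idea behind slides 14 and 15 can be sketched as a conservative depth test against a coarse per-block value. This is an illustration under assumptions the slides do not state: depth lies in [0, 1], the depth test is "less than", and each block stores the farthest depth it contains, rounded so the test never rejects a visible pixel. The function names are hypothetical and do not describe the actual hardware layout.

```python
import math

Z_BITS = 11                      # slide 14: 11 bits of Z per block
Z_MAX_CODE = (1 << Z_BITS) - 1   # 2047


def quantize_up(z: float) -> int:
    """Quantize a depth in [0, 1] to 11 bits, rounding up so the stored
    value is never nearer than the block's true farthest depth."""
    return min(Z_MAX_CODE, math.ceil(z * Z_MAX_CODE))


def quantize_down(z: float) -> int:
    """Quantize a depth in [0, 1] to 11 bits, rounding down (toward near)."""
    return max(0, math.floor(z * Z_MAX_CODE))


def can_reject(block_far_code: int, quad_nearest_z: float) -> bool:
    """True when every pixel of the quad must fail a less-than depth test
    against the block, so the quad can be skipped before pixel shading."""
    return quantize_down(quad_nearest_z) > block_far_code
```

Under these assumptions, drawing opaque objects front to back fills blocks with near depth values early, so more of the later, farther quads are rejected.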
  • 16. GPU: Xbox Procedural Synthesis. Dedicated hardware channel between CPU L2 cache and GPU (10.8 GB/sec); useful for dynamic geometry, instancing, particle systems, etc.; callbacks inserted into command buffer spin up XPS threads on CPU during rendering.
  • 17. GPU: Xbox Procedural Synthesis. XPS threads run your supplied code; GPU consumes data as CPU generates it; separate, small command buffer in locked L2 cache section.
  • 18. GPU: Hardware Tessellation. Dedicated tessellation hardware; occurs before vertex shading; does not compute interpolated vertex positions; parametric values are sent to a special vertex shader designed for tessellation.
  • 19. GPU: Hardware Tessellation. 3 modes of tessellation. Discrete: integer parameter, triangles only. Continuous: float parameter, new vertices are shifted to their final position. Per-edge: special index buffer used to provide per-edge factors, used for patches only, cannot support traditional index data as well.
  • 20. Discrete Tessellation Sample. Triangle with tessellation level 4.
  • 21. Continuous Tessellation Sample. Quad with tessellation levels 3, 4, 5.
  • 22. Per-Edge Tessellation Sample. Quad-patch with per-edge tessellation factors of 1, 3, 5, and 7.
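Slide 18 says the tessellator emits parametric values and leaves position computation to a vertex shader. A minimal sketch of that split for a triangle follows. The slides do not say how a tessellation level maps to edge subdivisions, so `segments` here is simply an assumed number of pieces per edge, and both function names are hypothetical.

```python
from typing import List, Tuple

Bary = Tuple[float, float, float]
Vec3 = Tuple[float, float, float]


def barycentric_grid(segments: int) -> List[Bary]:
    """Parametric (barycentric) coordinates for a triangle whose edges are
    each split into `segments` equal pieces. No positions are computed."""
    coords = []
    for i in range(segments + 1):
        for j in range(segments + 1 - i):
            k = segments - i - j
            coords.append((i / segments, j / segments, k / segments))
    return coords


def shade_vertex(bary: Bary, p0: Vec3, p1: Vec3, p2: Vec3) -> Vec3:
    """Stand-in for the tessellation vertex shader: turn parametric values
    into a position by interpolating the triangle's corner positions."""
    u, v, w = bary
    return tuple(u * a + v * b + w * c for a, b, c in zip(p0, p1, p2))
```

A real tessellation shader could just as well evaluate a curved patch from the same parametric values; linear interpolation is only the simplest choice.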
  • 23. GPU: Textures. 16 bilinear texture samples per clock (64bpp runs at half rate, 128bpp at quarter rate; trilinear at half rate); unlimited dependent texture fetching; DXT decompression has 32-bit precision (better than Xbox, which has 16-bit precision).
  • 24. GPU: Textures. Texture arrays, a generalized version of cube maps: up to 64 surfaces within a texture, optional MIP maps for each surface; surface is indexed with a [0..1] z coordinate in a 3D texture fetch. Mip tail packing: all MIPs smaller than 32 texels on a side are packed into one memory page.
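To make the mip tail rule on slide 24 concrete, the sketch below lists which mip levels of a square, power-of-two texture would fall into the packed tail. It assumes "smaller than 32 texels on a side" means a side length strictly below 32 and ignores non-square textures; the function names are hypothetical.

```python
MIP_TAIL_THRESHOLD = 32  # slide 24: mips smaller than 32 texels on a side


def mip_chain(size: int):
    """Side lengths of every mip level of a square, power-of-two texture."""
    sizes = []
    while size >= 1:
        sizes.append(size)
        size //= 2
    return sizes


def split_mip_tail(size: int):
    """Return (levels stored individually, levels packed into the tail),
    each as a list of (mip level, side length) pairs."""
    chain = mip_chain(size)
    separate = [(lvl, s) for lvl, s in enumerate(chain) if s >= MIP_TAIL_THRESHOLD]
    tail = [(lvl, s) for lvl, s in enumerate(chain) if s < MIP_TAIL_THRESHOLD]
    return separate, tail
```

For a 256x256 texture this puts levels 0 to 3 (256 down to 32 texels) in their own storage and levels 4 to 8 (16 down to 1 texel) in the tail.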
  • 25. GPU: Resolve. Copies surface data from EDRAM to a texture in system memory; required for render-to-texture and presentation to the screen; can perform MSAA sample averaging or resolve individual samples; can perform format conversions and biasing.
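The two MSAA resolve behaviours named on slide 25, averaging the samples of a pixel or picking one sample, can be sketched on plain Python tuples. This only illustrates the arithmetic; the function names are hypothetical and are not part of the Xbox 360 API.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float, float]  # RGBA


def resolve_average(samples: Sequence[Color]) -> Color:
    """Average all MSAA samples of one pixel, channel by channel."""
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(4))


def resolve_single(samples: Sequence[Color], index: int) -> Color:
    """Resolve one individual sample of a pixel instead of averaging."""
    return samples[index]


def resolve_surface(msaa_pixels, average: bool = True, index: int = 0):
    """Resolve a list of per-pixel sample lists into one color per pixel."""
    if average:
        return [resolve_average(p) for p in msaa_pixels]
    return [resolve_single(p, index) for p in msaa_pixels]
```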
  • 26. Direct3D 9+ on Xbox 360. Similar API to PC Direct3D 9.0; optimized for Xbox 360 hardware (no abstraction layers or drivers, it’s direct to the metal); exposes all Xbox 360 custom hardware features (new state enums; new APIs for finer-grained control and completely new features).
  • 27. Direct3D 9+ on Xbox 360. Communicates with GPU via a command buffer (ring buffer in system memory); direct command buffer playback support.
  • 28. Direct3D: Drawing. Points, lines, triangles, quads, rects, polygons; strips of lines, triangles, quads, polygons (configurable reset index to reset strips within an index buffer); DrawPrimitive, DrawIndexedPrimitive; DrawPrimitiveUP, DrawIndexedPrimitiveUP (no vertex buffers or index buffers required).
  • 29. Direct3D: Drawing. BeginVertices / EndVertices: direct allocation from the command buffer, better than DrawPrimitiveUP. SetStreamSourceFrequency: not directly supported on Xbox 360, can be done with more flexibility in vertex shader. RunCommandBuffer: direct support for modifying and playing command buffers.
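The primitive reset index from slide 28 lets one index buffer hold several strips. The sketch below shows the idea on a list of indices. The reset value `0xFFFF` is an assumed example, since the slide only says the index is configurable, and the helper names are hypothetical.

```python
def split_strips(indices, reset_index):
    """Split one index buffer into separate strips at each reset index."""
    strips, current = [], []
    for idx in indices:
        if idx == reset_index:
            if current:
                strips.append(current)
            current = []
        else:
            current.append(idx)
    if current:
        strips.append(current)
    return strips


def strip_to_triangles(strip):
    """Expand a triangle strip into triangles, swapping the first two
    indices of every odd triangle to keep a consistent winding order."""
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles
```

Without a reset index, the same two strips would need either two draw calls or degenerate triangles to bridge them.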
  • 30. Direct3D: State. Render state and sampler state; new states added for Xbox 360 (half-pixel offset, high-precision blending, primitive reset index); unsupported states ignored by Direct3D.
  • 31. Direct3D: State. Lazy state: Direct3D batches up state in blocks that are optimized for submission to the GPU, using special CPU instructions for very low CPU overhead. State API extensions for Xbox 360: SetBlendState (individual blend control for each render target).
  • 32. Direct3D: Render Targets. New format for high dynamic range: 2:10:10:10 float (7 bits mantissa, 3 bits exponent for each of R, G, B); no matching texture format; special mode which allows for 10 bits of alpha precision on blends. Optional sRGB gamma correction (2.2) on 8:8:8:8 fixed point.
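Slide 32 gives the bit split of each 10-bit color channel (7 bits of mantissa, 3 bits of exponent) but not the encoding. The sketch below decodes one channel under two assumptions that the slide does not state: an exponent bias of 3 and IEEE-style denormals when the exponent field is zero. Treat it as an illustration of how such a small unsigned float works, not as the documented format.

```python
MANTISSA_BITS = 7   # slide 32
EXPONENT_BITS = 3   # slide 32
EXPONENT_BIAS = 3   # assumption: not stated on the slide


def decode_7e3(bits: int) -> float:
    """Decode one unsigned 10-bit float channel (7-bit mantissa, 3-bit exponent)."""
    bits &= (1 << (MANTISSA_BITS + EXPONENT_BITS)) - 1
    mantissa = bits & ((1 << MANTISSA_BITS) - 1)
    exponent = bits >> MANTISSA_BITS
    fraction = mantissa / float(1 << MANTISSA_BITS)
    if exponent == 0:
        # Assumed denormal range: no implicit leading 1.
        return fraction * 2.0 ** (1 - EXPONENT_BIAS)
    return (1.0 + fraction) * 2.0 ** (exponent - EXPONENT_BIAS)
```

With these assumptions the largest representable channel value is 31.875, which is what gives the format its high dynamic range compared with an 8-bit fixed-point channel capped at 1.0.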
  • 33. Direct3D: New Texture Formats. DXN: 2-component format with 8 bits of precision per component, great for normal maps. CTX1: also good for normal maps, fewer bits per pixel than DXN.
  • 34. Direct3D: New Texture Formats. DXT3A, DXT5A: single-component textures made from a DXT3/DXT5 alpha block; 4 bits of precision.
  • 35. Direct3D: Command Buffer. Ring buffer that allows the CPU to safely send commands to the GPU; buffer is filled by CPU, and the GPU consumes the data. [Diagram: ring of Draw commands, with the CPU write pointer advanced by code execution and the GPU read pointer advanced by rendering.]
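The producer/consumer relationship on slide 35 can be sketched as a small ring buffer with a CPU write pointer and a GPU read pointer. The class and method names are hypothetical, and real command buffers hold variable-sized packets rather than one command per slot.

```python
class CommandRing:
    """Fixed-size ring: the CPU writes commands, the GPU reads them in order."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.write = 0   # CPU write pointer
        self.read = 0    # GPU read pointer
        self.count = 0   # commands written but not yet consumed

    def cpu_submit(self, command) -> bool:
        """Queue a command. Returns False when the ring is full, which is
        the case where the CPU has to wait for the GPU."""
        if self.count == len(self.slots):
            return False
        self.slots[self.write] = command
        self.write = (self.write + 1) % len(self.slots)
        self.count += 1
        return True

    def gpu_consume(self):
        """Take the oldest command. Returns None when the ring is empty,
        which is the case where the GPU sits idle."""
        if self.count == 0:
            return None
        command = self.slots[self.read]
        self.read = (self.read + 1) % len(self.slots)
        self.count -= 1
        return command
```

A larger ring gives the CPU more room to run ahead before it has to wait, which is one reason the buffer size is user configurable.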
  • 36. Direct3D: Predicated Tiling. EDRAM is not large enough for 720p rendering with MSAA or multiple render targets. Solution: predicated tiling. Divide the surface into a handful of “tiles”; Direct3D replays an accumulated command buffer to each tile; Direct3D stitches together a final image by resolving each tile pass to a rect in a texture in system memory.
  • 37. Direct3D: Predicated Tiling. Optimization: uses GPU hardware predication and screen-space extents capture; only renders primitives on the tiles where they appear; cuts down significantly on duplicate vertex processing.
  • 38. Direct3D: Predicated Tiling. What does it cost? Increased vertex processing cost for primitives that span more than one tile; no pixel processing increase. Required for 720p with MSAA.
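A quick calculation shows why 720p with MSAA does not fit in the 10 MB of EDRAM from slide 7. The sketch assumes 4 bytes of color and 4 bytes of depth/stencil per sample, which the slides do not specify, and it ignores any tile alignment rules, so the tile counts are lower bounds.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # 10 MB (slide 7)


def surface_bytes(width, height, msaa_samples=1, color_bytes=4, depth_bytes=4):
    """EDRAM footprint of a color + depth/stencil surface.
    The 4 + 4 bytes per sample default is an assumption."""
    return width * height * msaa_samples * (color_bytes + depth_bytes)


def tiles_needed(width, height, msaa_samples=1, **formats):
    """Minimum number of EDRAM-sized tiles needed to cover the surface."""
    total = surface_bytes(width, height, msaa_samples, **formats)
    return math.ceil(total / EDRAM_BYTES)


if __name__ == "__main__":
    for samples in (1, 2, 4):
        mb = surface_bytes(1280, 720, samples) / (1024 * 1024)
        print(f"{samples}x: {mb:.1f} MB, {tiles_needed(1280, 720, samples)} tile(s)")
```

Under these assumptions, 1280x720 without MSAA fits in one pass, 2x MSAA needs at least two tiles, and 4x MSAA needs at least three.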
  • 39. XGraphics. Graphics API for direct access to Direct3D resources and low-level texture data manipulation; create/update resources without the D3D device (XGSetTextureHeader, XGSetVertexBufferHeader, XGSetIndexBufferHeader, etc.); useful for threaded loading code.
  • 40. XGraphics. Direct management of resource memory (use your own allocators, etc.); texture tiling functions; endian swapping functions; Win32 library is useful for tool pipelines and offline resource packing.
  • 41. Shaders. Two options for writing shaders: HLSL (with Xbox 360 extensions) and GPU microcode (specific to the Xbox 360 GPU, similar to assembly but direct to hardware). Recommendation: use HLSL; it is easy to write and maintain; replace individual shaders with microcode if performance analysis warrants it.
  • 42. High-Level Shading Language (HLSL). Xbox 360 extensions: texture sampler modes (offsets, filtering override, etc.); memory export; predicated tiling screen space offset; vertex fetching/index manipulation; tessellation.
  • 43. Shaders: GPU Microcode. Shader language that maps 1:1 with the hardware; similar to assembly shaders in appearance; somewhat esoteric, but extremely flexible; plan for HLSL, write microcode for performance gains where needed.
  • 44. Shaders: Branching. Static branching: branching based on bool or int constants (example: perform texture sample if constant b15 is TRUE). Dynamic branching: branching based on a computed or interpolated value (example: compute dot product, light pixel if result > 0).
  • 45. Shaders: Branching. Dynamic branching can be as fast as static branching if all vertices/pixels in a 64-unit vector take the same branch.
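The condition on slide 45 is about coherence within each 64-unit vector. The sketch below takes a list of per-pixel branch outcomes (for instance, whether a dot product is greater than 0) and reports which vectors disagree internally. How the hardware handles a divergent vector is not described on the slide, so the code only identifies them; the function names are hypothetical.

```python
VECTOR_SIZE = 64  # slides 11 and 45: threads work on units of 64


def vector_is_coherent(conditions) -> bool:
    """True when every element takes the same side of the branch."""
    return all(conditions) or not any(conditions)


def divergent_vectors(conditions, vector_size: int = VECTOR_SIZE):
    """Indices of the vectors whose elements disagree on the branch."""
    divergent = []
    for start in range(0, len(conditions), vector_size):
        group = conditions[start:start + vector_size]
        if not vector_is_coherent(group):
            divergent.append(start // vector_size)
    return divergent
```

Only the coherent vectors meet the slide's condition for dynamic branching running as fast as static branching.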
  • 46. Shaders: Vertex Declarations and Fetches. Xbox 360 GPU does not natively support vertex declarations or streams; shaders are patched at run time to match the current vertex declaration; vertex fetch instructions used to load vertex data into the GPU.
  • 47. Shaders: Vertex Declarations and Fetches. Shader patching code caches the patched code using a shader-decl pointer pair (try not to create many identical vertex declarations); more GPU-friendly shaders are added as an extension but are not as flexible.
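Slide 47's advice about identical vertex declarations follows from the cache being keyed on a pointer pair rather than on contents. The sketch below imitates that with Python object identity; the class, the `patch_fn` callback, and the dictionary-based shader are hypothetical stand-ins, not the Direct3D implementation.

```python
class ShaderPatchCache:
    """Cache of patched shaders keyed by (shader, declaration) identity."""

    def __init__(self, patch_fn):
        self._patch_fn = patch_fn   # hypothetical function doing the patching
        self._cache = {}
        self.patch_count = 0        # how many times patching actually ran

    def get(self, shader, decl):
        # Identity, not contents: id() plays the role of a pointer here.
        # The objects must stay alive for as long as the cache is used.
        key = (id(shader), id(decl))
        if key not in self._cache:
            self._cache[key] = self._patch_fn(shader, decl)
            self.patch_count += 1
        return self._cache[key]
```

Two declarations with the same contents but different addresses miss the cache separately, so each one costs a patch. Sharing a single declaration object avoids that.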
  • 48. PIX. The ultimate performance and graphics debugging tool; incorporates several modules: performance monitor; CPU performance analysis; command buffer capture, playback, and analysis; shader and mesh debuggers.
  • 49. PIX. Hardware accelerated: GPU has hundreds of performance counters; development kit has special hardware for PIX.
  • 50. PIX: Performance Monitor. Scrolling graph display; configurable graphs; CPU, GPU, memory, storage stats; best place to accurately gauge frame rate/frame time; runs continuously while PIX is open; works with any application running on the development kit.
  • 51. PIX: CPU Timeline. Shows the CPU usage of graphics calls; also shows the CPU usage of your title’s subsystems; use PIXBeginNamedEvent / PIXEndNamedEvent APIs to label your subsystems by name and color.
  • 52. PIX: GPU Timeline. Shows the execution of each graphics command on the GPU; timing and statistics; full GPU and Direct3D state; all input and output resources (textures, surfaces, shaders, etc.); event breakdown; render target views; mesh debugger.
  • 53. PIX: Shader Debuggers. Vertex shader debugger: click on a vertex in the mesh debugger; HLSL and microcode. Pixel shader debugger: right-click on a pixel in a render target; select the right draw call that influenced that pixel; HLSL and microcode.
  • 54. Conclusion. Direct3D customized and optimized for the hardware; new APIs to take advantage of custom features; direct to the metal, no drivers or HAL.
  • 55. Conclusion. Extremely powerful graphics hardware: unified shaders; shader model 3.0 plus extensions for Xbox 360; fast EDRAM provides a huge amount of dedicated frame buffer bandwidth; dedicated hardware for tessellation, procedural geometry, and multi-sample anti-aliasing.
  • 56. Conclusion. High-performance shader development: brand new HLSL compiler for Xbox 360; GPU microcode to expose the hardware’s abilities; FXLite for fast effects.
  • 57. Conclusion. Best performance tools in the industry: PIX is your best friend and your secret weapon; hardware-accelerated; exposes every bit of performance info.
  • 58. Resources. GDC 2006 presentations (http://msdn.com/directx/presentations); DirectX Developer Center (http://msdn.com/directx); XNA Developer Center (http://msdn.com/xna); XNA, DirectX, XACT forums (http://msdn.com/directx/forums); email addresses: directx@microsoft.com (DirectX & Windows development), xna@microsoft.com (XNA feedback).
  • 59. © 2006 Microsoft Corporation. All rights reserved. Microsoft, DirectX, Xbox 360, the Xbox logo, and XNA are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

Editor's notes

  1. One thing that is a bit unusual about continuous tessellation is that the number of vertices per edge is always odd. Notice how the middle quad has 5 vertices per edge. It would be nice to be able to demo this in real time.
  2. If the CPU gets too far ahead of the GPU, the CPU stalls. If the GPU catches up to the CPU, the GPU sits idle.