4. TressFX OVERVIEW
WHAT IS TressFX?
Realistic hair rendering and simulation
‒ Used in Tomb Raider
Goes beyond the simple shells-and-fins representation typically used in games
Hair is rendered as thousands of strands with self-shadowing, anti-aliasing, and transparency
Physical simulation for each strand using GPU compute shaders
Flexible enough to support different hair styles and different conditions
4 TressFX 2.0 and Beyond NOVEMBER 12, 2013 | AMD DEVELOPER SUMMIT
5. TressFX RENDERING
WHAT MAKES IT LOOK GOOD
What goes into good hair?
‒ Anti-aliasing
‒ Volumetric self-shadowing
‒ Transparency
[Figure: Basic Rendering → Antialiasing → Antialiasing + Self Shadowing → Antialiasing + Self Shadowing + Transparency]
7. TressFX RENDERING
LIGHTING MODEL
Kajiya-Kay Hair Lighting Model
‒ Anisotropic hair strand lighting model
‒ Uses the tangent along the strand instead of the normal for light reflections
‒ Instead of $\cos(N, H)$, use $\sin(T, H) = \sqrt{1 - (T \cdot H)^2}$
Marschner Model
‒ Two specular highlights
‒ Primary light-colored highlight shifted towards the tip
‒ Secondary hair-colored highlight shifted towards the root
‒ TressFX uses an approximation of the Marschner technique when rendering the two highlights
[Figure: Primary Highlights; Secondary Highlights]
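The two-highlight model above can be sketched on the CPU in C++ (the real code is an HLSL shader). This is a minimal sketch assuming the common shifted-tangent approach: each highlight uses the Kajiya-Kay term $\sin(T,H)^p$, with the tangent nudged along the normal to move the highlight toward the root or tip. `shiftTangent`, `strandSpecular`, `twoLobeSpecular` and their parameters are hypothetical names, not TressFX's shader API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 normalize(Vec3 v) { return (1.0f / std::sqrt(dot(v, v))) * v; }

// Hypothetical helper: shift the strand tangent T along the normal N so the
// highlight slides along the strand (sign convention depends on tangent direction).
Vec3 shiftTangent(Vec3 T, Vec3 N, float shift) {
    return normalize(T + shift * N);
}

// Kajiya-Kay specular: sin(T, H)^exponent instead of cos(N, H)^exponent.
// T and H are assumed to be unit vectors.
float strandSpecular(Vec3 T, Vec3 H, float exponent) {
    assert(exponent > 0.0f);
    float tDotH = dot(T, H);
    float sinTH = std::sqrt(std::max(0.0f, 1.0f - tDotH * tDotH));
    return std::pow(sinTH, exponent);
}

// Two-lobe approximation of Marschner: a light-colored primary highlight plus
// a hair-colored secondary highlight. Shifts and exponents are hypothetical tuning knobs.
Vec3 twoLobeSpecular(Vec3 T, Vec3 N, Vec3 H, Vec3 lightColor, Vec3 hairColor,
                     float primaryShift, float secondaryShift,
                     float primaryExp, float secondaryExp) {
    float s1 = strandSpecular(shiftTangent(T, N, primaryShift), H, primaryExp);
    float s2 = strandSpecular(shiftTangent(T, N, secondaryShift), H, secondaryExp);
    return s1 * lightColor + s2 * hairColor;
}
```

The specular term peaks when the half vector is perpendicular to the strand, which produces the characteristic highlight band across the hair.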
8. TressFX RENDERING
ANTI-ALIASING
Every hair strand is anti-aliased manually
‒ Not using Hardware MSAA!
Compute pixel coverage on the edges of hair strands and convert it to an alpha value
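One way to turn strand geometry into a coverage value is sketched below in C++. It assumes a 1D cross-section: the strand is a band of half-width `halfWidthPx` centered at distance `distToCenterPx` from the pixel center, and coverage is its overlap with the pixel's one-pixel footprint. This is an illustrative box-filter estimate, not TressFX's exact shader; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Fraction of a one-pixel-wide footprint covered by a strand, on a 1D
// cross-section perpendicular to the strand.
// distToCenterPx: distance (pixels) from the pixel center to the strand's center line.
// halfWidthPx:    half of the strand's projected width (pixels).
// Returns the length of the overlap of [d - w, d + w] with [-0.5, 0.5].
float strandCoverage(float distToCenterPx, float halfWidthPx) {
    assert(halfWidthPx >= 0.0f);
    float lo = std::max(-0.5f, distToCenterPx - halfWidthPx);
    float hi = std::min(0.5f, distToCenterPx + halfWidthPx);
    return std::max(0.0f, hi - lo);  // 0 when the strand misses the pixel entirely
}
```

A strand thinner than a pixel never reaches full coverage, so distant hair fades to partial alpha instead of aliasing.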
9. TressFX RENDERING
SELF SHADOWING
Self Shadowing
‒ Uses a simplified Deep Shadow Map technique
[Figure: No Self Shadows vs. With Self Shadows]
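The idea behind a simplified deep shadow map can be sketched in C++: store only the depth of the first hair fragment seen from the light, estimate how much hair lies between it and the shaded fragment, and attenuate by a per-fiber opacity. The parameters `fiberSpacing` and `fiberAlpha` are assumptions for illustration; TressFX's shader may compute this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Simplified deep-shadow-style attenuation (sketch, not TressFX's exact shader).
// shadowMapDepth: depth of the nearest hair fragment as seen from the light.
// fragmentDepth:  depth of the shaded fragment as seen from the light.
// fiberSpacing:   assumed average spacing between fibers along the light ray.
// fiberAlpha:     assumed opacity of a single fiber, in [0, 1].
float hairSelfShadow(float fragmentDepth, float shadowMapDepth,
                     float fiberSpacing, float fiberAlpha) {
    assert(fiberSpacing > 0.0f && fiberAlpha >= 0.0f && fiberAlpha <= 1.0f);
    // Thickness of hair between the first occluder and this fragment.
    float depthRange = std::max(0.0f, fragmentDepth - shadowMapDepth);
    // Estimated number of fibers the light passes through.
    float numFibers = depthRange / fiberSpacing;
    // Each fiber transmits (1 - alpha) of the light reaching it.
    return std::pow(1.0f - fiberAlpha, numFibers);
}
```

Unlike a binary shadow map, the result falls off gradually with depth into the hair volume, which is what gives the soft volumetric look.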
10. TressFX RENDERING
TRANSPARENCY
Order-Independent Transparency (OIT) using Per-Pixel Linked Lists (PPLL)
Fragments are stored in linked lists on the GPU
The nearest K fragments are rendered in back-to-front order
[Figure: No Transparency vs. With Transparency]
11. TressFX 1.0 RENDERING
How rendering was done in version 1.0
TressFX 1.0 Rendering
‒ Render hair strand geometry into A-buffer
‒ Do lighting, shadowing, and antialiasing
‒ Store fragment color with depth and coverage in per-pixel linked list (PPLL)
‒ Render the K nearest fragments (K-buffer) in back-to-front order
‒ Blend nearest K fragments in the correct order with transparency
‒ Blend the remaining fragments without sorting
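The per-pixel resolve described in these steps can be sketched on the CPU in C++ (not TressFX's shader). It assumes the fragments are already lit, as in TressFX 1.0, and uses straight (non-premultiplied) alpha. The farther fragments are blended first in arbitrary order, then the K nearest are blended back to front on top.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Fragment { float depth; float r, g, b, a; };  // pre-lit color + coverage alpha
struct Color { float r, g, b; };

// Standard "over" blend of fragment f onto the color accumulated behind it.
static Color blendOver(Color dst, const Fragment& f) {
    return {f.r * f.a + dst.r * (1.0f - f.a),
            f.g * f.a + dst.g * (1.0f - f.a),
            f.b * f.a + dst.b * (1.0f - f.a)};
}

// Resolve one pixel's fragment list with a K-buffer.
Color resolvePixel(std::vector<Fragment> frags, std::size_t K, Color background) {
    assert(K > 0);
    std::size_t k = std::min(K, frags.size());
    // Bring the k nearest fragments to the front, sorted by increasing depth;
    // the remaining fragments stay in unspecified order.
    std::partial_sort(frags.begin(), frags.begin() + k, frags.end(),
                      [](const Fragment& x, const Fragment& y) { return x.depth < y.depth; });
    Color c = background;
    // Remaining (farther) fragments: blended without sorting.
    for (std::size_t i = k; i < frags.size(); ++i) c = blendOver(c, frags[i]);
    // Nearest k fragments: back to front, i.e. farthest of them first.
    for (std::size_t i = k; i-- > 0;) c = blendOver(c, frags[i]);
    return c;
}
```

Only the front-most layers need exact ordering; errors among the buried layers are largely hidden by the fragments in front of them.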
12. TressFX 1.0 RENDERING
A-BUFFER PASS
[Diagram: A-buffer pass. Hair Geometry → Vertex Shader → Pixel Shader (coverage, lighting, shadows) → Head UAV + PPLL UAV; each PPLL node holds depth, color, coverage, next]
13. TressFX 1.0 RENDERING
PER-PIXEL LINKED LIST
GPU implementation of order-independent transparency (OIT)
‒ Head UAV: each pixel location has a “head pointer” to a linked list in the PPLL UAV
‒ As new fragments are rendered, they are added to the next open location in the PPLL (using the UAV counter)
‒ A link is created to the fragment pointed to by the head pointer
‒ The head pointer then points to the new fragment
[Figure: Head UAV and PPLL UAV]
// Reserve the next free slot in the PPLL buffer
// (IncrementCounter returns the counter value before the increment)
uint uPixelCount = LinkedListUAV.IncrementCounter();
uint uOldStartOffset;
// Atomically make the new slot this pixel's head pointer (address = this pixel's
// location in the head buffer) and get the previous head back
InterlockedExchange(LinkedListHeadUAV[address], uPixelCount, uOldStartOffset);
// Link the new element to the previous head and write it into the Fragment and Link Buffer
Element.uNext = uOldStartOffset;
LinkedListUAV[uPixelCount] = Element;
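The same insertion can be emulated on the CPU with C++ atomics, which makes the list structure easy to inspect. In this sketch `fetch_add` stands in for `IncrementCounter()` and `exchange` for `InterlockedExchange()`. The overflow check, the `0xFFFFFFFF` end-of-list marker, and the class name are assumptions for illustration, not details taken from TressFX.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kEndOfList = 0xFFFFFFFFu;  // assumed "no next node" marker

struct Node { float depth; std::uint32_t next; };

// CPU emulation of a per-pixel linked list (hypothetical class, for illustration).
class PerPixelLinkedList {
public:
    PerPixelLinkedList(std::size_t numPixels, std::size_t capacity)
        : heads_(numPixels), nodes_(capacity) {
        for (auto& h : heads_) h.store(kEndOfList);  // every pixel's list starts empty
    }

    // Safe to call from many threads at once. Returns false if the buffer is full.
    bool insert(std::size_t pixel, float depth) {
        std::uint32_t idx = counter_.fetch_add(1);            // like IncrementCounter()
        if (idx >= nodes_.size()) return false;               // overflow: drop the fragment
        std::uint32_t oldHead = heads_[pixel].exchange(idx);  // like InterlockedExchange()
        nodes_[idx] = Node{depth, oldHead};                   // link to the previous head
        return true;
    }

    // Walk one pixel's list; call only after all insertions have finished.
    std::vector<float> depths(std::size_t pixel) const {
        std::vector<float> out;
        for (std::uint32_t i = heads_[pixel].load(); i != kEndOfList; i = nodes_[i].next)
            out.push_back(nodes_[i].depth);
        return out;
    }

private:
    std::vector<std::atomic<std::uint32_t>> heads_;
    std::vector<Node> nodes_;
    std::atomic<std::uint32_t> counter_{0};
};
```

Because each insertion prepends, a pixel's list comes back in reverse insertion order. That is why the later pass has to sort (or partially sort) the fragments by depth.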
14. TressFX 1.0 RENDERING
K-BUFFER PASS
[Diagram: K-buffer pass. Full Screen Quad → Vertex Shader → Pixel Shader; the pixel shader reads each pixel's PPLL nodes (depth, color, coverage), keeps the nearest ones in the K-Buffer and blends them with transparency]
15. TressFX 1.0 RENDERING
HOW CAN WE MAKE IT FASTER?
Observation
‒ All fragments are lit and shadowed equally
‒ Even the ones buried under dozens of hair fragments that you can’t see
Solution
‒ Defer the lighting and shadowing until the K-buffer pass
‒ Render the nearest K fragments with high quality
‒ Render the remaining fragments with lower quality (but faster)
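The deferred approach can be sketched in C++: the A-buffer stores only what is needed to shade later, and the resolve picks the K nearest fragments for the expensive shader. The fragment fields follow the TressFX 2.0 A-buffer contents (depth, coverage, tangent). The two shading callbacks are placeholders for whatever high- and low-quality shading a renderer uses, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Fragment as stored by a TressFX 2.0-style A-buffer pass: no color yet.
struct HairFragment { float depth; float coverage; float tangent[3]; };

using HairShader = std::function<float(const HairFragment&)>;

// Shade one pixel's fragments: the K nearest with the expensive shader
// (e.g. full lighting + self-shadowing), all others with the cheap one.
// Returns one value per fragment; the first min(K, n) entries belong to the
// nearest fragments (in unspecified order). Scalar output keeps the sketch short.
std::vector<float> shadeDeferred(std::vector<HairFragment> frags, std::size_t K,
                                 const HairShader& highQuality, const HairShader& lowQuality) {
    assert(K > 0);
    const std::size_t k = std::min(K, frags.size());
    // Partition so the k nearest fragments come first; cheaper than a full sort.
    std::nth_element(frags.begin(), frags.begin() + k, frags.end(),
                     [](const HairFragment& a, const HairFragment& b) { return a.depth < b.depth; });
    std::vector<float> shaded;
    shaded.reserve(frags.size());
    for (std::size_t i = 0; i < frags.size(); ++i)
        shaded.push_back(i < k ? highQuality(frags[i]) : lowQuality(frags[i]));
    return shaded;
}
```

The expensive shader now runs at most K times per pixel, no matter how many layers of hair are buried underneath.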
16. TressFX 2.0 RENDERING
A-BUFFER PASS
[Diagram: TressFX 2.0 A-buffer pass. Hair Geometry → Vertex Shader → Pixel Shader (coverage; no lighting or shadowing in this pass) → PPLL; each node holds depth, coverage, tangent, next]
18. TressFX 2.0 IMPROVEMENTS
CONTINUOUS LODs
Distance to the camera can be used to reduce the density of the hair
‒ Uniformly remove hair strands from rendering
‒ To compensate for the missing strands, thicken the remaining hair
‒ Adjust the minimum pixel coverage with distance
[Figure: Full Density Hair; Reduced Density Hair; Reduced Density with Thicker Strands]
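A continuous LOD rule of this kind can be sketched in C++. All parameters here (`startDistance`, `endDistance`, `minDensity`), the linear fade, and the inverse-density thickening are hypothetical choices for illustration, not TressFX's values. The sketch also assumes strands are stored in a pre-shuffled order, so that drawing only a prefix removes strands uniformly. The distance-dependent minimum coverage from the slide is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical LOD parameters, chosen for illustration.
struct HairLodParams {
    float startDistance;  // full density up to this camera distance
    float endDistance;    // minimum density from this distance on
    float minDensity;     // fraction of strands kept at endDistance, in (0, 1]
};

struct HairLod {
    std::size_t strandCount;  // how many strands to draw (a prefix of the shuffled list)
    float widthScale;         // multiplier on strand thickness
};

HairLod computeHairLod(const HairLodParams& p, float distance, std::size_t totalStrands) {
    assert(p.endDistance > p.startDistance && p.minDensity > 0.0f && p.minDensity <= 1.0f);
    const float t = std::clamp((distance - p.startDistance) / (p.endDistance - p.startDistance),
                               0.0f, 1.0f);
    const float density = 1.0f + (p.minDensity - 1.0f) * t;  // linear fade from 1 to minDensity
    const auto count = static_cast<std::size_t>(std::ceil(density * static_cast<float>(totalStrands)));
    // Thicken the remaining strands so the total projected width stays roughly constant.
    return {std::min(count, totalStrands), 1.0f / density};
}
```

Scaling width by the inverse of density keeps the overall hair silhouette about the same while the strand count, and therefore the fragment count, drops.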
19. TressFX 2.0 IMPROVEMENTS
CODE RESTRUCTURING
TressFX11 Sample Code is much more modular
All of the necessary TressFX code is in separate files for
‒ Rendering
‒ Simulation
‒ Mesh management
‒ Asset loading
Code for head rendering and the sample framework is completely separate
‒ Take the “TressFX” files to get just what you need
Better variable names
Removal of dead code
[Diagram: module structure. Main, SceneRender, DX11Mesh, ObjImport, Gaussian Filter, and the TressFX Code group: TressFXSimulate, TressFXRender, TressFXMesh, TressFXAssetLoader]
20. TressFX 2.0 IMPROVEMENTS
MISCELLANEOUS IMPROVEMENTS
Vertex shader optimizations for rendering
‒ Draw call for hair now uses an index buffer with a triangle list instead of looking up indices from a buffer
PPLL head buffer uses a RWTexture2D for better caching (tiled)
Hair shadow on model is softer and less blocky
Various shader code optimizations
Porting Guide
Download the new TressFX 2.0 sample soon from our Radeon SDK:
http://developer.amd.com
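Building an index buffer with a triangle list for strands can be sketched in C++. The vertex layout is an assumption for illustration: strand vertex `v` is expanded by the vertex shader into two ribbon vertices, `2*v` and `2*v + 1` (one on each side of the strand), so every segment becomes a quad of two triangles. The function name and layout are hypothetical, not TressFX's.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// verticesPerStrand[s] = number of simulated vertices in strand s.
// Assumed layout: global strand vertex v -> ribbon vertices 2*v and 2*v + 1.
std::vector<std::uint32_t> buildHairIndexBuffer(const std::vector<std::uint32_t>& verticesPerStrand) {
    std::vector<std::uint32_t> indices;
    std::uint32_t first = 0;  // global index of the strand's first vertex
    for (std::uint32_t n : verticesPerStrand) {
        assert(n != 1);  // a strand needs at least one segment (or zero vertices)
        for (std::uint32_t s = 0; s + 1 < n; ++s) {  // one quad per segment (v, v + 1)
            std::uint32_t v = first + s;
            indices.insert(indices.end(), {2 * v, 2 * v + 1, 2 * v + 2,       // first triangle
                                           2 * v + 2, 2 * v + 1, 2 * v + 3}); // second triangle
        }
        first += n;
    }
    return indices;
}
```

The index buffer is built once, because strand topology never changes. Only the vertex positions produced by the simulation vary from frame to frame.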
21. TressFX 2.0 RENDERING
MEMORY CONSIDERATIONS
A-Buffer
‒ 2 UAVs
‒ Size determined by resolution
‒ Head of the Linked List UAV
‒ Screen resolution RWTexture2D, DXGI_FORMAT_R32_UINT
‒ Per-Pixel Linked List UAV
‒ Structured Buffer, size = (number of pixels) x (avg hair layers) x (sizeof(LinkedListStructure))
‒ Default average number of hair layers is 8
‒ Linked list structure is currently 3 DWORDs: depth, coverage, tangent
Limited memory, but unbounded linked list
‒ This means too many fragments for a given pixel can overflow the PPLL
‒ Can cause artifacts
‒ Typically this only happens if the camera gets too close
[Chart: Total A-Buffer Memory (MB) at 720p and 1080p, split into Linked List Head and Per-Pixel Linked List]
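The sizing formula above can be evaluated with a short C++ helper. The sketch assumes a 4-byte head entry per pixel (R32_UINT), 12-byte nodes (3 DWORDs), and the default 8 layers. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct ABufferMemory { std::uint64_t headBytes, ppllBytes, totalBytes; };

// Bytes needed for the two A-buffer UAVs at a given resolution.
ABufferMemory estimateABufferMemory(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t avgHairLayers = 8,
                                    std::uint64_t nodeBytes = 3 * 4) {
    assert(width > 0 && height > 0);
    const std::uint64_t pixels = width * height;
    const std::uint64_t head = pixels * 4;                          // one R32_UINT head pointer per pixel
    const std::uint64_t ppll = pixels * avgHairLayers * nodeBytes;  // pooled linked-list nodes
    return {head, ppll, head + ppll};
}
```

At the defaults this gives 207,360,000 bytes (about 198 MiB) at 1920x1080 and 92,160,000 bytes (about 88 MiB) at 1280x720. Almost all of it is the linked-list node pool, so the average-layers setting is the main memory knob.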
24. TressFX Simulation
Topics
TressFX 1.0 Simulation Overview
‒ Main Interest
‒ Simulation Overview
‒ Constraints
‒ Global shape constraints
‒ Local Shape Constraints
‒ Edge length constraints
‒ Problems
TressFX Beyond
‒ General Constraint Formulation
‒ Tridiagonal Matrix-free Formulation
‒ Solving Linear System
‒ Benefits
25. Main Interest
Main interest of TressFX simulation
‒ Performance, performance and performance! – DirectCompute
‒ Styled hair – bending and twisting forces are important
‒ Stability – position-based dynamics
‒ Conditions – wet, dry or heavy
‒ Wind – helps express dynamics even when the character is idle
26. Simulation Overview
CPU
‒ load hair data
‒ precompute rest-state values – can be offline
GPU – DirectCompute
while simulation running do
  apply gravity
  integrate
  apply GSC (Global Shape Constraints)
  apply LSC (Local Shape Constraints)
  apply wind
  apply ELC (Edge Length Constraints)
  collision handling
Results are written to a vertex buffer consumed by the GPU rendering pipeline
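The "apply gravity" and "integrate" steps can be sketched in C++ as a single position-Verlet update with damping. Verlet is an assumed choice here, common in position-based solvers, and is not necessarily TressFX's exact integrator. `damping` and `invMass` are hypothetical parameter names. Roots with zero inverse mass are left for the head transform to move.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// One position-Verlet step with gravity (sketch).
// damping in [0, 1]: fraction of the implied velocity removed each step (0 = none).
// Vertices with invMass == 0 (e.g. strand roots attached to the head) are not moved here.
void verletIntegrate(std::vector<Vec3>& pos, std::vector<Vec3>& prevPos,
                     const std::vector<float>& invMass, Vec3 gravity,
                     float damping, float dt) {
    assert(pos.size() == prevPos.size() && pos.size() == invMass.size());
    const float dt2 = dt * dt;
    const float keep = 1.0f - damping;
    for (std::size_t i = 0; i < pos.size(); ++i) {
        const Vec3 cur = pos[i];
        if (invMass[i] > 0.0f) {
            // x_new = x + (1 - damping) * (x - x_prev) + g * dt^2
            pos[i].x = cur.x + keep * (cur.x - prevPos[i].x) + gravity.x * dt2;
            pos[i].y = cur.y + keep * (cur.y - prevPos[i].y) + gravity.y * dt2;
            pos[i].z = cur.z + keep * (cur.z - prevPos[i].z) + gravity.z * dt2;
        }
        prevPos[i] = cur;  // this step's position becomes the next step's previous one
    }
}
```

Verlet stores velocity implicitly as the difference between two positions. This is convenient in a constraint-based pipeline, because every constraint that moves a vertex automatically changes its velocity as well.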
27. GLOBAL SHAPE CONSTRAINTS
GSC(Global Shape Constraints)
‒ The initial positions of the particles serve as the global goal positions
‒ The goal positions are rigid w.r.t. the character's head transform
‒ You can think of the initial positions as a cage in which the vertices are trapped during simulation
‒ Easy and cheap: helps maintain the global shape but loses simulation detail
[Figure: initial goal position, current position, final position]
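A global shape constraint reduces to a blend toward the rest position carried by the head, as sketched in C++ below. The `RigidTransform` layout (row-major 3x3 rotation followed by a translation), the `stiffness` parameter, and the function name are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Rigid head transform: rotation (row-major 3x3), then translation.
struct RigidTransform {
    float R[3][3];
    Vec3 t;
    Vec3 apply(Vec3 p) const {
        return {R[0][0] * p.x + R[0][1] * p.y + R[0][2] * p.z + t.x,
                R[1][0] * p.x + R[1][1] * p.y + R[1][2] * p.z + t.y,
                R[2][0] * p.x + R[2][1] * p.y + R[2][2] * p.z + t.z};
    }
};

// Pull each vertex toward its initial position moved rigidly with the head.
// stiffness in [0, 1] (hypothetical): 0 leaves the simulation as is, 1 snaps to the rest shape.
void applyGlobalShapeConstraints(std::vector<Vec3>& pos, const std::vector<Vec3>& restPos,
                                 const RigidTransform& head, float stiffness) {
    assert(pos.size() == restPos.size() && stiffness >= 0.0f && stiffness <= 1.0f);
    for (std::size_t i = 0; i < pos.size(); ++i) {
        const Vec3 goal = head.apply(restPos[i]);  // the "cage" position for this vertex
        pos[i].x += stiffness * (goal.x - pos[i].x);
        pos[i].y += stiffness * (goal.y - pos[i].y);
        pos[i].z += stiffness * (goal.z - pos[i].z);
    }
}
```

Each vertex is independent, so this is a single cheap pass. The cost is that a high stiffness suppresses the detailed motion the other steps produce.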
28. LOCAL SHAPE CONSTRAINTS
LSC(Local Shape Constraints)
‒ The goal positions are determined in the local frames.
‒ The goal positions are still transformed into the world frame and applied to the vertex positions.
[Figure: initial goal position, current position, final position]
29. LOCAL SHAPE CONSTRAINTS – CONT’
Local Transforms
‒ As in a robotic arm, an open-chain structure has joints, and each joint has a parent-child relationship with its connected joints.
‒ ${}^{i-1}T_i$ transforms (translates and rotates) child space $i$ into its parent space $i-1$
‒ With the local transforms of the chain, we can get the global transforms:
${}^{w}T_i = {}^{w}T_0 \cdot {}^{0}T_1 \cdots {}^{i-2}T_{i-1} \cdot {}^{i-1}T_i$
‒ Local frames should be updated at each particle
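The chain product above can be evaluated incrementally, ${}^{w}T_i = {}^{w}T_{i-1} \cdot {}^{i-1}T_i$, so each global transform costs one composition. A C++ sketch with rigid transforms stored as a rotation matrix plus a translation follows. The struct and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Rigid transform: p -> R * p + t.
struct Transform { Mat3 R; Vec3 t; };

static Vec3 mul(const Mat3& R, Vec3 v) {
    return {R[0][0] * v.x + R[0][1] * v.y + R[0][2] * v.z,
            R[1][0] * v.x + R[1][1] * v.y + R[1][2] * v.z,
            R[2][0] * v.x + R[2][1] * v.y + R[2][2] * v.z};
}

static Mat3 mul(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            for (int k = 0; k < 3; ++k) C[r][c] += A[r][k] * B[k][c];
    return C;
}

// (A * B)(p) = A(B(p)) = R_A (R_B p + t_B) + t_A
static Transform compose(const Transform& A, const Transform& B) {
    const Vec3 rt = mul(A.R, B.t);
    return {mul(A.R, B.R), {rt.x + A.t.x, rt.y + A.t.y, rt.z + A.t.z}};
}

// worldT0 = ^wT_0; local[i] = ^{i-1}T_i for i >= 1 (local[0] is unused).
std::vector<Transform> globalTransforms(const Transform& worldT0, const std::vector<Transform>& local) {
    assert(!local.empty());
    std::vector<Transform> global(local.size());
    global[0] = worldT0;
    for (std::size_t i = 1; i < local.size(); ++i)
        global[i] = compose(global[i - 1], local[i]);  // ^wT_i = ^wT_{i-1} * ^{i-1}T_i
    return global;
}
```

The loop is inherently serial along a strand, but different strands are independent, which matches how the update is parallelized on the GPU.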
30. LOCAL SHAPE CONSTRAINTS – CONT’
Initialize and update local and global transforms
‒ Initialization is performed only once, on the CPU or offline.
‒ The update is performed every frame on the GPU.
‒ The update is a serial process along each strand but independent of other strands, so many strands are updated in parallel on the GPU.
‒ With the local and global transforms, we can calculate the target vertex positions for the local shape constraints.
‒ Finally, update the two neighboring vertices to get stable convergence.
[Figure: updating position i using the local transform computed at i-1]
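The serial per-strand update can be sketched in C++, simplified to 2D so that a local frame is a single angle. Assumptions: the frame at vertex `i-1` is the direction of its incoming edge, and the root frame angle comes from the head. A 3D implementation would have to carry a full rotation per vertex. `stiffness` and both function names are hypothetical. Rest-local positions are computed once up front, mirroring the CPU/offline initialization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double x, y; };

static Vec2 rotate(Vec2 v, double a) {
    return {std::cos(a) * v.x - std::sin(a) * v.y, std::sin(a) * v.x + std::cos(a) * v.y};
}

// Frame angle at vertex i-1: head angle for the root, else direction of edge (i-2 -> i-1).
static double frameAngle(const std::vector<Vec2>& p, std::size_t i, double rootAngle) {
    return i == 1 ? rootAngle : std::atan2(p[i - 1].y - p[i - 2].y, p[i - 1].x - p[i - 2].x);
}

// Initialization (once, CPU/offline): express each rest vertex in its parent's frame.
std::vector<Vec2> computeRestLocal2D(const std::vector<Vec2>& rest, double rootAngle) {
    std::vector<Vec2> local(rest.size(), Vec2{0.0, 0.0});
    for (std::size_t i = 1; i < rest.size(); ++i)
        local[i] = rotate({rest[i].x - rest[i - 1].x, rest[i].y - rest[i - 1].y},
                          -frameAngle(rest, i, rootAngle));
    return local;
}

// Per-frame update: serial from root to tip, since each frame uses already-updated positions.
void applyLocalShapeConstraints2D(std::vector<Vec2>& x, const std::vector<Vec2>& restLocal,
                                  double rootAngle, double stiffness) {
    assert(x.size() == restLocal.size() && x.size() >= 2);
    for (std::size_t i = 1; i < x.size(); ++i) {
        const Vec2 r = rotate(restLocal[i], frameAngle(x, i, rootAngle));
        const Vec2 goal{x[i - 1].x + r.x, x[i - 1].y + r.y};  // target in world space
        const Vec2 d{stiffness * (goal.x - x[i].x), stiffness * (goal.y - x[i].y)};
        if (i == 1) {  // parent is the root (attached to the head): child takes it all
            x[i].x += d.x; x[i].y += d.y;
        } else {       // split the correction between the two neighboring vertices
            x[i].x += 0.5 * d.x;     x[i].y += 0.5 * d.y;
            x[i - 1].x -= 0.5 * d.x; x[i - 1].y -= 0.5 * d.y;
        }
    }
}
```

Because the goals are expressed in local frames, a strand that rotates rigidly with the head already satisfies its constraints. Only bending away from the styled rest shape is corrected.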
31. EDGE LENGTH CONSTRAINTS
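Each edge is corrected by moving both of its vertices along the unit edge vector by 0.5 times how much the edge is stretched or compressed:
$\Delta\mathbf{x}_i = 0.5\,\left(\lVert\mathbf{x}_{i+1}-\mathbf{x}_i\rVert - l_i\right)\dfrac{\mathbf{x}_{i+1}-\mathbf{x}_i}{\lVert\mathbf{x}_{i+1}-\mathbf{x}_i\rVert}, \qquad \Delta\mathbf{x}_{i+1} = -\Delta\mathbf{x}_i$
where $l_i$ is the rest length of edge $i$.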
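An iterative edge length pass can be sketched in C++. Each edge moves both endpoints half the stretch along the unit edge vector, and the pinned root hands its half to the child. This is a Gauss-Seidel-style sketch of the standard position-based constraint, not TressFX's shader code, and the function name is hypothetical. Fixing one edge disturbs its neighbor, so several passes are needed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// One pass of edge length constraints along a strand; x[0] is the pinned root.
void edgeLengthPass(std::vector<Vec3>& x, const std::vector<float>& restLength) {
    assert(restLength.size() + 1 == x.size());
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        const Vec3 e{x[i + 1].x - x[i].x, x[i + 1].y - x[i].y, x[i + 1].z - x[i].z};
        const float len = std::sqrt(e.x * e.x + e.y * e.y + e.z * e.z);
        if (len == 0.0f) continue;                  // degenerate edge: skip
        const float s = (len - restLength[i]) / len; // stretch times 1/len (unit vector scale)
        const float wi = (i == 0) ? 0.0f : 0.5f;    // vertex i's share (root does not move)
        const float wj = 1.0f - wi;                 // vertex i+1's share
        x[i].x += wi * s * e.x;     x[i].y += wi * s * e.y;     x[i].z += wi * s * e.z;
        x[i + 1].x -= wj * s * e.x; x[i + 1].y -= wj * s * e.y; x[i + 1].z -= wj * s * e.z;
    }
}
```

On a doubly stretched 3-vertex strand, a single pass leaves the first edge too long. It takes many passes to converge, which is the cost the following slides address.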
32. Problems
Extreme acceleration
‒ When a character makes a sudden move, it can generate extreme linear and angular accelerations that stretch the hair far beyond its rest length.
‒ Even with many Edge Length Constraint iterations, the hair doesn’t recover its original length, and as a result it can look too stretchy.
‒ One possible solution, used for Tomb Raider, was to enforce Edge Length Constraints serially from the root to the tip of the hair with extra damping
‒ We need a better way! And we did research!
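The serial root-to-tip idea can be sketched in C++. Only the child vertex moves, so every edge regains its exact rest length in one pass, but all the correction gets pushed toward the tip. This is an interpretation of the bullet above, not the actual Tomb Raider code. The extra damping, which would also be needed to absorb the velocity this displacement injects, is not modeled, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Serial edge length enforcement from root (x[0]) to tip.
void edgeLengthRootToTip(std::vector<Vec3>& x, const std::vector<float>& restLength) {
    assert(restLength.size() + 1 == x.size());
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        const Vec3 e{x[i + 1].x - x[i].x, x[i + 1].y - x[i].y, x[i + 1].z - x[i].z};
        const float len = std::sqrt(e.x * e.x + e.y * e.y + e.z * e.z);
        if (len == 0.0f) continue;           // degenerate edge: nothing sensible to do
        const float s = restLength[i] / len; // rescale the edge to its rest length
        x[i + 1] = Vec3{x[i].x + s * e.x, x[i].y + s * e.y, x[i].z + s * e.z};
    }
}
```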
36. Tridiagonal Matrix Formulation
Special Formulation for Chain Structures such as Hair
‒ We don’t want to solve a big matrix equation, especially on the GPU!
‒ Let’s take advantage of the linear topology and serial indexing
[Figure: general case ("We don’t want this!") vs. special tridiagonal case for chains ("Much simpler!"); annotations mark the known terms ("Easy to compute them") and the unknowns we are solving for]
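The constraint Jacobian shows why a chain leads to a tridiagonal system. The derivation below is a sketch that assumes one common linearized projection: solve $J M^{-1} J^{T}\boldsymbol{\lambda} = -\mathbf{C}$, then apply $\Delta\mathbf{x} = M^{-1} J^{T}\boldsymbol{\lambda}$, where $w_k = 1/m_k$ is the inverse mass of vertex $k$. Han et al. (VRIPHYS 2013) give the formulation actually used.

```latex
\begin{aligned}
C_i(\mathbf{x}) &= \lVert \mathbf{x}_{i+1} - \mathbf{x}_i \rVert - l_i,
\qquad
\hat{\mathbf{e}}_i = \frac{\mathbf{x}_{i+1} - \mathbf{x}_i}{\lVert \mathbf{x}_{i+1} - \mathbf{x}_i \rVert},
\qquad
\nabla_{\mathbf{x}_i} C_i = -\hat{\mathbf{e}}_i,\;\;
\nabla_{\mathbf{x}_{i+1}} C_i = \hat{\mathbf{e}}_i \\
\left(J M^{-1} J^{T}\right)_{ij} &= \sum_k w_k\, \nabla_{\mathbf{x}_k} C_i \cdot \nabla_{\mathbf{x}_k} C_j
= 0 \quad \text{if } |i - j| > 1 \;\; \text{(no shared vertex)} \\
\left(J M^{-1} J^{T}\right)_{ii} &= w_i + w_{i+1},
\qquad
\left(J M^{-1} J^{T}\right)_{i,i+1} = \left(J M^{-1} J^{T}\right)_{i+1,i} = -w_{i+1}\, \hat{\mathbf{e}}_i \cdot \hat{\mathbf{e}}_{i+1} \\
\sum_{j \ne i} \left| \left(J M^{-1} J^{T}\right)_{ij} \right|
&= w_i \left|\hat{\mathbf{e}}_{i-1} \cdot \hat{\mathbf{e}}_i\right| + w_{i+1} \left|\hat{\mathbf{e}}_i \cdot \hat{\mathbf{e}}_{i+1}\right|
\le w_i + w_{i+1} \quad \text{(interior rows)}
\end{aligned}
```

Each edge constraint shares a vertex only with its two neighbors, so only three diagonals are non-zero. The last line shows that every row is (weakly) diagonally dominant, since $|\hat{\mathbf{e}}_i \cdot \hat{\mathbf{e}}_{i+1}| \le 1$.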
37. SOLVING LINEAR SYSTEM
Solving Linear System
‒ The formulation doesn’t require an explicit matrix – good for the GPU!
‒ Only the diagonal, super-diagonal and sub-diagonal elements are non-zero – sparse!
‒ The system is diagonally dominant – a good fit for a direct solver!
‒ We can use the tridiagonal matrix algorithm (Thomas algorithm)
‒ So we can solve it on the GPU!
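The Thomas algorithm and a direct edge-length step built on it can be sketched in C++. Only the three diagonals are stored, never the full matrix. The sketch assumes the standard linearized projection: for edge constraints, the system matrix has diagonal $w_i + w_{i+1}$ and off-diagonal $-w_{i+1}\,\hat{\mathbf{e}}_i \cdot \hat{\mathbf{e}}_{i+1}$, with right-hand side $-C_i$. It is not claimed to be the exact formulation of Han et al. 2013, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Thomas algorithm, O(n). lower[i] multiplies x[i-1] (lower[0] ignored), diag[i] x[i],
// upper[i] x[i+1] (upper[n-1] ignored). No pivoting: assumes no zero pivots
// (e.g. a diagonally dominant or symmetric positive definite system).
std::vector<double> solveTridiagonal(const std::vector<double>& lower, const std::vector<double>& diag,
                                     const std::vector<double>& upper, std::vector<double> rhs) {
    const std::size_t n = diag.size();
    assert(n > 0 && lower.size() == n && upper.size() == n && rhs.size() == n);
    std::vector<double> c(n, 0.0);  // upper diagonal after forward elimination
    c[0] = upper[0] / diag[0];
    rhs[0] /= diag[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = diag[i] - lower[i] * c[i - 1];
        c[i] = upper[i] / m;
        rhs[i] = (rhs[i] - lower[i] * rhs[i - 1]) / m;
    }
    for (std::size_t i = n - 1; i-- > 0;) rhs[i] -= c[i] * rhs[i + 1];  // back substitution
    return rhs;  // now holds the solution
}

// One linearized edge-length projection; x[0] is the root. invMass[k] = w_k (0 pins a vertex).
void edgeLengthDirect(std::vector<Vec3>& x, const std::vector<double>& invMass,
                      const std::vector<double>& restLength) {
    const std::size_t m = restLength.size();  // edges = constraints
    assert(m >= 1 && x.size() == m + 1 && invMass.size() == m + 1);
    std::vector<Vec3> u(m);                   // unit edge vectors
    std::vector<double> lower(m, 0.0), diag(m), upper(m, 0.0), rhs(m);
    for (std::size_t i = 0; i < m; ++i) {
        const Vec3 e{x[i + 1].x - x[i].x, x[i + 1].y - x[i].y, x[i + 1].z - x[i].z};
        const double len = std::sqrt(dot(e, e));
        assert(len > 0.0);
        u[i] = Vec3{e.x / len, e.y / len, e.z / len};
        rhs[i] = -(len - restLength[i]);        // -C_i
        diag[i] = invMass[i] + invMass[i + 1];  // w_i + w_{i+1}
    }
    for (std::size_t i = 0; i + 1 < m; ++i)     // constraints i and i+1 share vertex i+1
        upper[i] = lower[i + 1] = -invMass[i + 1] * dot(u[i], u[i + 1]);
    const std::vector<double> lambda = solveTridiagonal(lower, diag, upper, rhs);
    // dx = M^-1 J^T lambda: vertex i moves along -u_i, vertex i+1 along +u_i.
    for (std::size_t i = 0; i < m; ++i) {
        const double si = invMass[i] * lambda[i], sj = invMass[i + 1] * lambda[i];
        x[i].x -= si * u[i].x;     x[i].y -= si * u[i].y;     x[i].z -= si * u[i].z;
        x[i + 1].x += sj * u[i].x; x[i + 1].y += sj * u[i].y; x[i + 1].z += sj * u[i].z;
    }
}
```

The cost per strand is fixed and linear in the number of vertices. Because the constraints are linearized, one solve is exact only when the edges do not rotate (as in a straight strand). Otherwise a small residual remains.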
41. BENEFITS
No more iterations for Edge Length Constraints
‒ No need to guess the number of iterations
‒ Fixed computation cost
‒ Fast convergence
42. TressFX 2.0
CONCLUSIONS
TressFX 2.0 renders hair faster than the previous version
‒ More than 2X faster in some cases
TressFX is now fast enough to use on consoles
More modular code structure means easier porting to your game
Realistic physics for hair simulation can now be extended to other objects
Stay tuned for more!
‒ Ongoing research to improve and expand the use of this technology
43. REFERENCE
Real-time Hair Simulation with Efficient Hair Style Preservation – Han et al., VRIPHYS 2012
Tridiagonal Matrix Formulation for Inextensible Hair Strand Simulation – Han et al., VRIPHYS 2013