In this talk, the authors give an overview of the deferred lighting approach used in CryENGINE 3, along with an in-depth description of the many techniques involved. Original file and videos at http://crytek.com/cryengine/presentations
Extensive tests across all platforms; many combinations researched: interleaved lighting storage, RG11B10 and other formats, different storage spaces (such as inverse exponential or sRGB), different encoding types. The simplest solutions turned out to be the most efficient, cleanest, most robust and general ones.
Slim G-Buffer: A8B8G8R8 normals + glossiness. Readback D24S8 depth + stencil magic.
2x A2B10G10R10F_EDRAM for the diffuse and specular buffers.
X360: no FP10 texture format, so resolved to A2B10G10R10 instead (XDK).
PS3: 2x A8B8G8R8 encoded as RGBK. Encoding means no HW blending is possible; workaround: read/write from the destination RT.
A2B10G10R10F_EDRAM for the scene target. PC: A16B16G16R16F. PS3: A8B8G8R8 RGBK-encoded for opaque, FP16 for transparents.
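To make the RGBK encoding mentioned above concrete, here is a minimal C++ sketch of one common RGBK/RGBM scheme: the colour is divided by a shared multiplier K stored in alpha, so an HDR value fits in an 8-bit-per-channel target. The `kMaxRange` value and the exact quantisation are assumptions for illustration, not Crytek's actual constants, and the 8-bit quantisation of the RGB channels themselves is omitted.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float r, g, b; };
struct Vec4 { float r, g, b, a; };

// Assumed maximum HDR range representable by the encoding.
const float kMaxRange = 4.0f;

// Encode an HDR colour into 8-bit-friendly RGBK (a.k.a. RGBM):
// the shared multiplier K goes into alpha.
Vec4 EncodeRGBK(Vec3 c) {
    float maxC = std::max({c.r, c.g, c.b, 1e-6f});
    float k = std::min(maxC / kMaxRange, 1.0f);   // normalised multiplier
    k = std::ceil(k * 255.0f) / 255.0f;           // quantise up so rgb stays <= 1
    float denom = k * kMaxRange;
    return { c.r / denom, c.g / denom, c.b / denom, k };
}

Vec3 DecodeRGBK(Vec4 e) {
    float s = e.a * kMaxRange;
    return { e.r * s, e.g * s, e.b * s };
}
```

This also shows why HW blending breaks: blending operates on the encoded rgb and alpha channels independently, which is not a linear operation on the decoded colour, hence the read/write-from-destination workaround.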
Add GI [7] (additive blend). Apply SSDO (multiplicative blend) and RLR (alpha blend). Add light sources.
Normalized Blinn-Phong. For C2, only projectors and point lights. Limited shadow-caster count on consoles; go bananas on PC. Resolve/restore requirement for Xbox 360.
Light rendering depends on a couple of heuristics: full-screen quads with stencil-volume pre-passes for global lights, or when the viewer is inside the light source; else convex light volumes, or even 2D quads + depth bounds when coverage is not significant.
Light frustum volumes are reconstructed in the vertex shader from special geometry of tessellated unit cubes and a frustum projection matrix.
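The heuristic choice above can be sketched as a simple decision function. This is an illustrative reading of the slide, not Crytek's code; the coverage threshold is made up.

```cpp
// Which path to draw a deferred light with, per the heuristics above.
enum class LightDrawPath { StencilFullscreen, ConvexVolume, QuadDepthBounds };

LightDrawPath ChooseLightPath(bool isGlobal, bool viewerInside,
                              float screenCoverage) {
    const float kSmallCoverage = 0.05f;  // assumed threshold, fraction of screen
    // Full-screen quad + stencil volume pre-pass: global lights, or the
    // camera sits inside the light volume (front faces would be clipped).
    if (isGlobal || viewerInside) return LightDrawPath::StencilFullscreen;
    // Tiny on screen: a 2D quad with depth-bounds test is cheapest.
    if (screenCoverage < kSmallCoverage) return LightDrawPath::QuadDepthBounds;
    // Otherwise rasterise the convex light volume itself.
    return LightDrawPath::ConvexVolume;
}
```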
Base idea: reuse the diffuse lighting accumulation. No redundant work as in the vanilla UV-space approach; the concept is also expandable to a UV-space approach. Approximate sub-surface scattering in screen space: Poisson distribution for taps, straightforward bilateral blur using depth to discard samples.
Best bits: no additional memory requirement for an atlas, and cost proportional to screen-area coverage versus a fixed cost per surface update. Plus, ZCull is your biggest friend. What happens when a face covers the entire screen? Exactly. Sharing is the keyword here.
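A minimal 1D sketch of the depth-aware (bilateral) blur described above: samples across a depth discontinuity are discarded so scattering does not bleed between surfaces. The tap offsets here are a plain stencil rather than the Poisson disk from the talk, and `kDepthTolerance` is an assumed threshold.

```cpp
#include <cmath>
#include <vector>

const float kDepthTolerance = 0.01f;  // assumed depth-discontinuity threshold

// Blur the accumulated diffuse lighting, rejecting taps whose depth
// differs too much from the centre pixel (bilateral discard).
std::vector<float> BlurDiffuse(const std::vector<float>& diffuse,
                               const std::vector<float>& depth) {
    const int taps[] = { -2, -1, 0, 1, 2 };
    std::vector<float> out(diffuse.size());
    for (size_t i = 0; i < diffuse.size(); ++i) {
        float sum = 0.0f, wsum = 0.0f;
        for (int t : taps) {
            int j = static_cast<int>(i) + t;
            if (j < 0 || j >= static_cast<int>(diffuse.size())) continue;
            // Discard samples across depth discontinuities.
            if (std::fabs(depth[j] - depth[i]) > kDepthTolerance) continue;
            sum += diffuse[j];
            wsum += 1.0f;
        }
        out[i] = sum / wsum;  // wsum >= 1: the centre tap always passes
    }
    return out;
}
```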
Super simple trick/approximation: ray cast along the screen-space light vector. Gives macro self-shadowing detail, even on consoles; an important visual hint, visible at different distances. Cost proportional to how big the surface gets on screen; unless in a cutscene, that is very rare. Used for skin, hair and eyes.
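The ray cast can be sketched as a short march through the depth buffer along the projected light direction; if any sample is closer to the camera than the ray, the pixel is darkened. This is a hedged 1D (single scanline) illustration with assumed names, bias and attenuation values, not the original shader.

```cpp
#include <vector>

// March along a scanline toward the light; return an attenuation
// factor (1.0 = fully lit, 0.5 = self-shadowed).
float ScreenSpaceShadow(const std::vector<float>& depthRow, int x,
                        int stepX, float depthStep, int numSteps) {
    const float kBias = 0.001f;     // assumed bias to avoid self-intersection
    float rayDepth = depthRow[x];
    for (int i = 1; i <= numSteps; ++i) {
        int sx = x + i * stepX;
        if (sx < 0 || sx >= static_cast<int>(depthRow.size())) break;
        rayDepth -= depthStep;      // the ray rises toward the light each step
        if (depthRow[sx] < rayDepth - kBias)
            return 0.5f;            // something closer blocks the light
    }
    return 1.0f;
}
```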
The usual alpha-blending issues: fog volumes, shadows, sorting, refraction. Efficient hair rendering is a pain on this HW generation. Hair is a particular case of alpha blending, albeit a very important one for character looks. Avoid it? Alpha test looks very circa 1999; ATOC doesn't look so good and requires HW MSAA. One solution: smooth hair geometry along the screen-space tangent vector. Cost proportional to how big the geometry is on screen.
Re-projection + velocity buffer. We got rid of our old approach of using a sphere around the camera, as it mainly worked for camera rotations.
For static geometry it is straightforward to compute how much a pixel moved compared to the previous frame: using depth and the previous view-projection matrix, compute the previous frame's pixel location and the delta (velocity) to the current frame. Handle dynamic geometry by outputting velocity.
Full resolution means at least ~3 ms spent, and not much can be done to speed up the worst case, so use half resolution instead, inspired by the KZ2 approach [5]. To save bandwidth we ended up storing 2 RGBA8 targets: one RGBK-encoded with the final result, the other with the composition mask. On PC, just straightforward FP16 with the mask in alpha.
Object velocity buffer dilation: to avoid unpleasant sharp silhouettes, dilate the velocity buffer slightly. Done on the fly, in the same pass as applying motion blur; good enough and no additional passes/resolves.
Do it in HDR, before tone mapping, for physical correctness: bright streaks are kept and propagated [1]. Looks great on bright particles.
For console specs we used 9 taps and clamping to a maximum range to avoid noticeable undersampling; on high-spec PC, 24 taps.
Just another blur kernel. Tap offsets are disk-shaped, with the kernel size scaled by the circle-of-confusion term. Reused the same 9 taps for consoles; 24 taps for higher PC specs.
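For reference, a common way to derive the circle-of-confusion term that scales the kernel is the standard thin-lens formula; the talk does not state which formula CryENGINE uses, so this is the textbook version with made-up parameter values in the comments.

```cpp
#include <cmath>

// Thin-lens circle of confusion (diameter, in the same units as the
// focal length) for an object at objectDist when focused at focusDist.
float CircleOfConfusion(float objectDist, float focusDist,
                        float focalLength, float aperture) {
    return std::fabs(aperture * focalLength * (objectDist - focusDist) /
                     (objectDist * (focusDist - focalLength)));
}
```

Objects at the focus distance get a CoC of zero (no blur), and the CoC grows with distance from the focal plane, which is exactly what the kernel-size scaling above needs.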
Go bananas: render a quad/sprite per pixel, scaled by the CoC factor. Accumulate results into intermediate targets, sorting quads into front/back layers; alpha stores the quad count, with dithering to minimize noticeable precision loss. Composite with the final scene by dividing each layer's result by its alpha. Quality can't get better than this (and it's slower: > 100 ms for a large defocus at 1080p).
Video
Video
The idea is simply to extrude the quads along the motion direction. Waiting for hardware to catch up.
Particularly important for deferred lighting: it minimizes light-range clipping (notice bright highlights being clipped by a dark alpha-blended surface), or any alpha-blending case on top of a bright surface, such as decals, fog volumes, darker smoke particles and the like. It minimizes precision artefacts in darker regions of the image, and it allows for physically based post-processes: motion blur, depth of field (and others) should be done before tone mapping. This final image would be almost impossible to achieve in LDR. Most importantly, it improves the art workflow significantly, as it minimizes the time artists waste working around such issues in more old-fashioned engines.
Linear correctness: all math in linear space (linear-space lighting, linear-correct blending). Used HW sRGB. Precision!
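The HW sRGB conversion the slide relies on is the standard IEC 61966-2-1 piecewise curve; the functions below only illustrate what "linear-correct blending" has to undo, they are not engine code.

```cpp
#include <cmath>

// sRGB -> linear (decode), per the IEC 61966-2-1 piecewise curve.
float SrgbToLinear(float c) {
    return (c <= 0.04045f) ? c / 12.92f
                           : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// linear -> sRGB (encode), the inverse curve.
float LinearToSrgb(float c) {
    return (c <= 0.0031308f) ? c * 12.92f
                             : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}
```

Blending two sRGB-encoded values directly is wrong because the curve is non-linear; decode, blend in linear space, then re-encode (which is what the sRGB hardware paths do for free).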
Code for computing the VPos basis for position reconstruction. cam is the view camera; camMatrix is the camera's orientation ortho-normal matrix. vStereoShift is the shift used to compute the reconstruction for stereo rendering (we use an off-center perspective projection). mShadowTexGen can represent any kind of homogeneous transformation, or be the identity matrix.
Can't afford every light casting shadows on consoles. Clip boxes/volumes: a tool for lighting artists to eliminate light bleeding, using artist-created convex volumes plus the regular types (cones, boxes, etc.).