OIT and Indirect Illumination using DX11 Linked Lists

Holger Gruen, AMD ISV Relations
Nicolas Thibieroz, AMD ISV Relations

Agenda
- Introduction
- Linked List Rendering
- Order Independent Transparency
- Indirect Illumination
- Q&A

Introduction
- Direct3D 11 hardware opens the door to many new rendering algorithms
- In particular, per-pixel linked lists allow for a number of new techniques: OIT, indirect shadows, ray tracing of dynamic scenes, REYES surface dicing, custom AA, irregular Z-buffering, custom blending, advanced depth of field, etc.
- This talk will walk you through a DX11 implementation of per-pixel linked lists and two effects that use this technique:
  - OIT
  - Indirect Illumination

Per-Pixel Linked Lists with Direct3D 11
Nicolas Thibieroz, European ISV Relations, AMD
[Figure: a chain of four elements, each followed by a link]

Why Linked Lists?
- Data structure useful for programming
- Very hard to implement efficiently with previous real-time graphics APIs
- DX11 allows efficient creation and parsing of linked lists
- Per-pixel linked lists
  - A collection of linked lists enumerating all pixels belonging to the same screen position
[Figure: a chain of four elements, each followed by a link]

Two-step process
1) Linked List Creation
  - Store incoming fragments into linked lists
2) Rendering from Linked List
  - Linked List traversal and processing of stored fragments

Creating Per-Pixel Linked Lists

PS5.0 and UAVs
- Uses a Pixel Shader 5.0 to store fragments into linked lists
  - Not a Compute Shader 5.0!
- Uses atomic operations
- Two UAV buffers required
  - “Fragment & Link” buffer
  - “Start Offset” buffer
UAV = Unordered Access View

Fragment & Link Buffer
- The “Fragment & Link” buffer contains data and link for all fragments to store
- Must be large enough to store all fragments
- Created with Counter support
  - D3D11_BUFFER_UAV_FLAG_COUNTER flag in UAV view
- Declaration:
    struct FragmentAndLinkBuffer_STRUCT
    {
        FragmentData_STRUCT FragmentData;  // Fragment data
        uint                uNext;         // Link to next fragment
    };
    RWStructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBuffer;

Start Offset Buffer
- The “Start Offset” buffer contains the offset of the last fragment written at every pixel location
- Screen-sized: (width * height * sizeof(UINT32))
- Initialized to magic value (e.g. -1)
  - Magic value indicates no more fragments are stored (i.e. end of the list)
- Declaration:
    RWByteAddressBuffer StartOffsetBuffer;

Linked List Creation (1)
- No color Render Target bound!
  - No rendering yet, just storing in L.L.
- Depth buffer bound if needed
  - OIT will need it in a few slides
- UAVs bound as input/output:
  - StartOffsetBuffer (R/W)
  - FragmentAndLinkBuffer (W)

Linked List Creation (2a)
[Figure: Start Offset Buffer covering the viewport, every entry -1; Fragment and Link Buffer empty; Counter = 0]

Linked List Creation (2b)
[Figure: first fragment stored in the Fragment and Link Buffer with link -1; its pixel in the Start Offset Buffer now holds 0; Counter = 1]

Linked List Creation (2c)
[Figure: three fragments stored in the Fragment and Link Buffer, each with link -1; Counter = 3]

Linked List Creation (2d)
[Figure: five fragments stored in the Fragment and Link Buffer; a fragment written to an already covered pixel links back to offset 0; Counter = 5]

Linked List Creation - Code
    float PS_StoreFragments(PS_INPUT input) : SV_Target
    {
        // Calculate fragment data (color, depth, etc.)
        FragmentData_STRUCT FragmentData = ComputeFragment();

        // Retrieve current pixel count and increase counter
        uint uPixelCount = FLBuffer.IncrementCounter();

        // Exchange offsets in StartOffsetBuffer
        uint2 vPos = uint2(input.vPos.xy);
        uint uStartOffsetAddress = 4 * ( (SCREEN_WIDTH * vPos.y) + vPos.x );
        uint uOldStartOffset;
        StartOffsetBuffer.InterlockedExchange(uStartOffsetAddress, uPixelCount, uOldStartOffset);

        // Add new fragment entry in Fragment & Link Buffer
        FragmentAndLinkBuffer_STRUCT Element;
        Element.FragmentData  = FragmentData;
        Element.uNext         = uOldStartOffset;
        FLBuffer[uPixelCount] = Element;

        // No color render target is bound, so the returned value is discarded
        return 0.0f;
    }
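
The creation step above can be mirrored on the CPU to make the data flow easier to follow. The following is a minimal C++ sketch, not the shader itself: `PixelLists`, `FragmentNode` and `store` are hypothetical names, `std::atomic::fetch_add` stands in for `IncrementCounter()`, and `std::atomic::exchange` stands in for `InterlockedExchange`. The overflow check is an added assumption; the slides only state that the buffer must be large enough.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;  // same magic value as the slides (-1)

// One entry of the "Fragment & Link" buffer (payload fields are illustrative).
struct FragmentNode {
    uint32_t color;
    float    depth;
    uint32_t next;  // offset of the next node, or kEndOfList
};

struct PixelLists {
    int width;
    int height;
    std::vector<std::atomic<uint32_t>> startOffset;  // one list head per pixel
    std::vector<FragmentNode> nodes;                 // "Fragment & Link" buffer
    std::atomic<uint32_t> counter{0};                // stands in for the UAV counter

    PixelLists(int w, int h, std::size_t maxNodes)
        : width(w), height(h), startOffset(std::size_t(w) * h), nodes(maxNodes) {
        for (auto& head : startOffset) head.store(kEndOfList);
    }

    // Stores one fragment; returns false if the node buffer is full (assumed policy).
    bool store(int x, int y, uint32_t color, float depth) {
        uint32_t slot = counter.fetch_add(1);  // reserve a slot
        if (slot >= nodes.size()) return false;
        // Make this fragment the new head and fetch the previous head.
        uint32_t oldHead = startOffset[std::size_t(y) * width + x].exchange(slot);
        nodes[slot] = {color, depth, oldHead};
        return true;
    }
};
```

Each list ends up in reverse submission order: the head always points at the most recently stored fragment, which is why the rendering pass has to sort.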

Traversing Per-Pixel Linked Lists

Rendering Pixels (1)
- “Start Offset” Buffer and “Fragment & Link” Buffer now bound as SRV
    Buffer<uint> StartOffsetBufferSRV;
    StructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBufferSRV;
- Render a fullscreen quad
- For each pixel, parse the linked list and retrieve fragments for this screen position
- Process list of fragments as required
  - Depends on algorithm
  - e.g. sorting, finding maximum, etc.
SRV = Shader Resource View

Rendering from Linked List
[Figure: each Start Offset Buffer entry points at the head of that pixel's list in the Fragment and Link Buffer; the links are followed until -1 and the result is written to the Render Target]

Rendering Pixels (2)
    float4 PS_RenderFragments(PS_INPUT input) : SV_Target
    {
        // Calculate UINT-aligned start offset buffer address
        uint2 vPos = uint2(input.vPos.xy);
        uint uStartOffsetAddress = SCREEN_WIDTH * vPos.y + vPos.x;

        // Fetch offset of first fragment for current pixel
        uint uOffset = StartOffsetBufferSRV.Load(uStartOffsetAddress);

        // Parse linked list for all fragments at this position
        float4 FinalColor = float4(0, 0, 0, 0);
        while (uOffset != 0xFFFFFFFF)   // 0xFFFFFFFF is magic value
        {
            // Retrieve pixel at current offset
            FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];
            // Process pixel as required
            ProcessPixel(Element, FinalColor);
            // Retrieve next offset
            uOffset = Element.uNext;
        }
        return FinalColor;
    }
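
The traversal loop can be sketched on the CPU as follows. This is an illustrative C++ version, assuming plain vectors in place of the two SRVs; `walkList` and `WalkResult` are hypothetical names, and "finding the maximum" is used as the per-fragment processing because the slides list it as one example.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;  // magic end-of-list value

struct FragmentNode {
    float    value;  // illustrative payload
    uint32_t next;   // offset of the next node, or kEndOfList
};

struct WalkResult {
    int   count;     // number of fragments visited
    float maxValue;  // largest payload seen (0 if the list is empty)
};

// Follows one pixel's list from its start offset until the end marker.
WalkResult walkList(const std::vector<uint32_t>& startOffset,
                    const std::vector<FragmentNode>& nodes,
                    int pixelIndex) {
    WalkResult result{0, 0.0f};
    uint32_t offset = startOffset[pixelIndex];
    while (offset != kEndOfList) {
        const FragmentNode& node = nodes[offset];
        if (result.count == 0 || node.value > result.maxValue) {
            result.maxValue = node.value;
        }
        ++result.count;
        offset = node.next;  // move on to the next link
    }
    return result;
}
```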

Order-Independent Transparency via Per-Pixel Linked Lists
Nicolas Thibieroz, European ISV Relations, AMD

Description
- Straight application of the linked list algorithm
- Stores transparent fragments into PPLL
- Rendering phase sorts pixels in a back-to-front order and blends them manually in a pixel shader
  - Blend mode can be unique per-pixel!
- Special case for MSAA support

Linked List Structure
- Optimize performance by reducing amount of data to write to/read from UAV
  - E.g. uint instead of float4 for color
- Example data structure for OIT:
    struct FragmentAndLinkBuffer_STRUCT
    {
        uint uPixelColor;  // Packed pixel color
        uint uDepth;       // Pixel depth
        uint uNext;        // Address of next link
    };
- May also get away with packing color and depth into the same uint (if all fragments share the same alpha)
  - 16 bits color (565) + 16 bits depth
  - Performance/memory/quality trade-off
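
The two packing options mentioned above can be illustrated with a small C++ sketch. The bit layouts are assumptions chosen for the example (the slides do not specify channel order or which half holds the depth), and `quantize`, `packRGBA8` and `packColor565Depth16` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantizes a value in [0, 1] to an unsigned integer with `bits` bits.
inline uint32_t quantize(float v, int bits) {
    float maxValue = float((1u << bits) - 1u);
    float clamped = std::min(std::max(v, 0.0f), 1.0f);
    return uint32_t(std::lround(clamped * maxValue));
}

// 8 bits per channel, red in the lowest byte (assumed layout).
inline uint32_t packRGBA8(float r, float g, float b, float a) {
    return quantize(r, 8) | (quantize(g, 8) << 8) |
           (quantize(b, 8) << 16) | (quantize(a, 8) << 24);
}

inline float unpackChannel8(uint32_t packed, int channel) {
    return float((packed >> (8 * channel)) & 0xFFu) / 255.0f;
}

// 5:6:5 color in the high 16 bits, 16-bit depth in the low 16 bits (assumed layout).
inline uint32_t packColor565Depth16(float r, float g, float b, float depth) {
    uint32_t color = (quantize(r, 5) << 11) | (quantize(g, 6) << 5) | quantize(b, 5);
    return (color << 16) | quantize(depth, 16);
}

inline float unpackDepth16(uint32_t packed) {
    return float(packed & 0xFFFFu) / 65535.0f;
}
```

The single-uint variant has no room for alpha, which is why it only works when every transparent fragment shares the same alpha value.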

Visible Fragments Only!
- Use [earlydepthstencil] in front of Linked List creation pixel shader
- This ensures only transparent fragments that pass the depth test are stored
  - i.e. visible fragments!
- Allows performance savings and rendering correctness!
    [earlydepthstencil]
    float PS_StoreFragments(PS_INPUT input) : SV_Target
    {
        ...
    }

Sorting Pixels
- Sorting in place requires R/W access to Linked List
  - Sparse memory accesses = slow!
- Better way is to copy all pixels into array of temp registers
  - Then do the sorting
- Temp array declaration means a hard limit on the number of pixels per screen coordinate
  - Required trade-off for performance

Sorting and Blending
[Figure: temp array holding fragments with depths 0.95, 0.93, 0.87 and 0.98, sorted and blended over the background color; the PS color is written to the render target]
- Blend fragments back to front in PS
- Blending algorithm up to app
  - Example: SRCALPHA-INVSRCALPHA
  - Or unique per pixel! (stored in fragment data)
- Background passed as input texture
- Actual HW blending mode disabled

Storing Pixels for Sorting
    (...)
    static uint2 SortedPixels[MAX_SORTED_PIXELS];

    // Parse linked list for all pixels at this position
    // and store them into temp array for later sorting
    int nNumPixels = 0;
    while (uOffset != 0xFFFFFFFF)
    {
        // Retrieve pixel at current offset
        FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];
        // Copy pixel data into temp array
        SortedPixels[nNumPixels++] = uint2(Element.uPixelColor, Element.uDepth);
        // Retrieve next offset
        [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ?
                            0xFFFFFFFF : Element.uNext;
    }

    // Sort pixels in-place
    SortPixelsInPlace(SortedPixels, nNumPixels);
    (...)
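
The slides do not show `SortPixelsInPlace`. One plausible choice for such short arrays is an insertion sort, sketched here in C++ together with the capped copy loop. This is an assumption, not the presenters' implementation: `gatherAndSort`, `Node`, `Frag` and `kMaxSortedPixels` are hypothetical names, and larger depth values are assumed to be farther from the camera.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;
constexpr int kMaxSortedPixels = 8;  // hypothetical hard limit per pixel

struct Node { uint32_t color; uint32_t depth; uint32_t next; };
struct Frag { uint32_t color; uint32_t depth; };

// Copies at most kMaxSortedPixels fragments of one list into `out`,
// then sorts them farthest first (descending depth). Returns the count.
int gatherAndSort(const std::vector<Node>& nodes, uint32_t head,
                  Frag out[kMaxSortedPixels]) {
    int count = 0;
    for (uint32_t offset = head;
         offset != kEndOfList && count < kMaxSortedPixels;
         offset = nodes[offset].next) {
        out[count++] = {nodes[offset].color, nodes[offset].depth};
    }
    // Insertion sort: cheap for the handful of fragments a pixel usually holds.
    for (int i = 1; i < count; ++i) {
        Frag key = out[i];
        int j = i - 1;
        while (j >= 0 && out[j].depth < key.depth) {
            out[j + 1] = out[j];
            --j;
        }
        out[j + 1] = key;
    }
    return count;
}
```

Fragments beyond the cap are silently dropped, which is the trade-off the "Sorting Pixels" slide refers to.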

Pixel Blending in PS
    (...)
    // Retrieve current color from background texture
    float4 vCurrentColor = BackgroundTexture.Load(int3(vPos.xy, 0));

    // Rendering pixels using SRCALPHA-INVSRCALPHA blending
    for (int k = 0; k < nNumPixels; k++)
    {
        // Retrieve next unblended furthermost pixel
        float4 vPixColor = UnpackFromUint(SortedPixels[k].x);
        // Manual blending between current fragment and previous one
        vCurrentColor.xyz = lerp(vCurrentColor.xyz, vPixColor.xyz, vPixColor.w);
    }

    // Return manually-blended color
    return vCurrentColor;
    }
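
As a worked example of the manual SRCALPHA-INVSRCALPHA blend, here is a small C++ sketch of the same loop. `Color` and `blendBackToFront` are hypothetical names; the fragments are assumed to be already sorted back to front.

```cpp
#include <array>
#include <vector>

using Color = std::array<float, 4>;  // r, g, b, a

// Blends fragments (sorted back to front) over the background color.
Color blendBackToFront(Color background, const std::vector<Color>& fragments) {
    Color result = background;
    for (const Color& frag : fragments) {
        for (int c = 0; c < 3; ++c) {
            // lerp(dst, src, src.a) = dst + (src - dst) * src.a
            result[c] = result[c] + (frag[c] - result[c]) * frag[3];
        }
    }
    return result;  // alpha of the background is left untouched
}
```

For a black background, a half-transparent red fragment followed by a nearer half-transparent green fragment gives (0.25, 0.5, 0.0): red is attenuated twice, green once.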

OIT via Per-Pixel Linked Lists with MSAA Support

Sample Coverage
- Storing individual samples into Linked Lists requires a huge amount of memory
  - ... and performance will suffer!
- Solution is to store transparent pixels into PPLL as before
- But including sample coverage too!
  - Requires as many bits as the MSAA mode has samples
- Declare SV_COVERAGE in PS structure
    struct PS_INPUT
    {
        float3 vNormal   : NORMAL;
        float2 vTex      : TEXCOORD;
        float4 vPos      : SV_POSITION;
        uint   uCoverage : SV_COVERAGE;
    };

Linked List Structure
- Almost unchanged from previously
  - Depth is now packed into 24 bits
  - 8 bits are used to store coverage
    struct FragmentAndLinkBuffer_STRUCT
    {
        uint uPixelColor;        // Packed pixel color
        uint uDepthAndCoverage;  // Depth + coverage
        uint uNext;              // Address of next link
    };

Sample Coverage Example
[Figure: one pixel with its pixel center and four sample positions; the third sample is covered]
- Third sample is covered
  - uCoverage = 0x04 (0100 in binary)
    // 16777215 = 2^24 - 1 (written out because ^ is XOR in HLSL)
    Element.uDepthAndCoverage =
        ( uint(In.vPos.z * 16777215.0f) << 8 ) | In.uCoverage;
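
The packing of 24-bit depth and 8-bit coverage can be checked with a small C++ sketch. `packDepthAndCoverage`, `unpackCoverage` and `unpackDepth24` are hypothetical helper names; the clamp is an added safety assumption.

```cpp
#include <cstdint>

// Packs a depth in [0, 1] into the upper 24 bits and the coverage mask
// into the lower 8 bits of one 32-bit word.
inline uint32_t packDepthAndCoverage(float depth01, uint32_t coverage) {
    float d = depth01 < 0.0f ? 0.0f : (depth01 > 1.0f ? 1.0f : depth01);
    uint32_t depth24 = uint32_t(d * 16777215.0f);  // 16777215 = 2^24 - 1
    return (depth24 << 8) | (coverage & 0xFFu);
}

inline uint32_t unpackCoverage(uint32_t packed) { return packed & 0xFFu; }

inline uint32_t unpackDepth24(uint32_t packed) { return packed >> 8; }
```

Because the depth occupies the high bits, comparing the packed words directly still orders fragments by depth, with the coverage bits acting only as a tie-breaker.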

Rendering Samples (1)
- Rendering phase needs to be able to write individual samples
- Thus PS is run at sample frequency
  - Can be done by declaring SV_SAMPLEINDEX in input structure
- Parse linked list and store pixels into temp array for later sorting
  - Similar to non-MSAA case
  - Difference is to only store a fragment if its coverage includes the sample index being rasterized

Rendering Samples (2)
    static uint2 SortedPixels[MAX_SORTED_PIXELS];

    // Parse linked list for all pixels at this position
    // and store them into temp array for later sorting
    int nNumPixels = 0;
    while (uOffset != 0xFFFFFFFF)
    {
        // Retrieve pixel at current offset
        FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];
        // Retrieve pixel coverage from linked list element
        uint uCoverage = UnpackCoverage(Element.uDepthAndCoverage);
        if ( uCoverage & (1 << In.uSampleIndex) )
        {
            // Coverage matches current sample so copy pixel
            SortedPixels[nNumPixels++] =
                uint2(Element.uPixelColor, Element.uDepthAndCoverage);
        }
        // Retrieve next offset
        [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ?
                            0xFFFFFFFF : Element.uNext;
    }
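
The per-sample filter can be sketched on the CPU like this. It is an illustrative C++ version only: `fragmentsForSample` and `Node` are hypothetical names, and the coverage mask is assumed to sit in the low 8 bits of `depthAndCoverage`, as in the packing shown earlier.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;

struct Node {
    uint32_t color;
    uint32_t depthAndCoverage;  // depth in bits 8..31, coverage in bits 0..7
    uint32_t next;
};

// Returns the colors of all fragments in one pixel's list whose coverage
// mask includes the given sample, in list order, up to maxFragments.
std::vector<uint32_t> fragmentsForSample(const std::vector<Node>& nodes,
                                         uint32_t head,
                                         uint32_t sampleIndex,
                                         std::size_t maxFragments) {
    std::vector<uint32_t> out;
    for (uint32_t offset = head;
         offset != kEndOfList && out.size() < maxFragments;
         offset = nodes[offset].next) {
        uint32_t coverage = nodes[offset].depthAndCoverage & 0xFFu;
        if (coverage & (1u << sampleIndex)) {
            out.push_back(nodes[offset].color);
        }
    }
    return out;
}
```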

DEMO
OIT Linked List Demo

Direct3D 11 Indirect Illumination
Holger Gruen, European ISV Relations, AMD

Indirect Illumination Introduction 1
- Real-time indirect illumination is an active research topic
- Numerous approaches exist
  - Reflective Shadow Maps (RSM) [Dachsbacher/Stamminger05]
  - Splatting Indirect Illumination [Dachsbacher/Stamminger2006]
  - Multi-Res Splatting of Illumination [Wyman2009]
  - Light propagation volumes [Kaplanyan2009]
  - Approximating Dynamic Global Illumination in Image Space [Ritschel2009]
- Only a few support indirect shadows
  - Imperfect Shadow Maps [Ritschel/Grosch2008]
  - Micro-Rendering for Scalable, Parallel Final Gathering (SSDO) [Ritschel2010]
  - Cascaded light propagation volumes for real-time indirect illumination [Kaplanyan/Dachsbacher2010]
- Most approaches somehow extend to multi-bounce lighting

Indirect Illumination Introduction 2
- This section will cover:
  - An efficient and simple DX9-compliant RSM-based implementation for smooth one-bounce indirect illumination
    - Indirect shadows are ignored here
  - A Direct3D 11 technique that traces rays to compute indirect shadows
    - Part of this technique could generally be used for ray tracing dynamic scenes

Indirect Illumination w/o Indirect Shadows
1) Draw scene g-buffer
2) Draw Reflective Shadow Map (RSM)
  - RSM shows the part of the scene that receives direct light from the light source
3) Draw Indirect Light buffer at ½ res
  - RSM texels are used as light sources on g-buffer pixels for indirect lighting
4) Upsample Indirect Light (IL)
5) Draw final image adding IL

Step 1
- G-Buffer needs to allow reconstruction of
  - World/Camera space position
  - World/Camera space normal
  - Color/Albedo
- DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows

Step 2
- RSM needs to allow reconstruction of
  - World/Camera space position
  - World/Camera space normal
  - Color/Albedo
- Only draw emitters of indirect light
- DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows

Step 3
- Render a ½ res IL as a deferred op
- Transform g-buffer pixel to RSM space
  - -> Light space -> project to RSM texel space
- Use a kernel of RSM texels as light sources
  - RSM texels are also called Virtual Point Lights (VPL)
- Kernel size depends on
  - Desired speed
  - Desired look of the effect
  - RSM resolution

Computing IL at a G-buf Pixel 1
- Sum up contribution of all VPLs in the kernel
[Equation shown as an image on the slide]

Computing IL at a G-buf Pixel 2
[Figure: an RSM texel/VPL and the g-buffer pixel it illuminates]
- This term is very similar to terms used in radiosity form factor computations
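
The per-VPL term itself is only shown as an image on the slide. As a stand-in, the sketch below uses the geometric weight from the Reflective Shadow Maps paper cited earlier, max(0, n_p·(x − x_p)) · max(0, n·(x_p − x)) / |x − x_p|^4, which may differ from what the presenters actually evaluate. It is written in C++ with hypothetical names (`Vec3`, `vplWeight`); the VPL's flux or color would be multiplied on top of this weight.

```cpp
#include <algorithm>
#include <array>

using Vec3 = std::array<float, 3>;

inline float dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

inline Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

// Geometric weight of one VPL (position xp, normal np) for a receiver
// at position x with normal n. Vectors are left unnormalised, which is
// why the denominator is the fourth power of the distance.
float vplWeight(const Vec3& x, const Vec3& n, const Vec3& xp, const Vec3& np) {
    Vec3 d = sub(x, xp);           // from the VPL to the receiver
    float dist2 = dot(d, d);
    if (dist2 <= 0.0f) return 0.0f;
    float cosEmit = std::max(0.0f, dot(np, d));   // VPL faces the receiver
    float cosRecv = std::max(0.0f, -dot(n, d));   // receiver faces the VPL
    return cosEmit * cosRecv / (dist2 * dist2);
}
```

Summing this weight times the VPL color over every RSM texel in the kernel gives the unshadowed indirect light at the g-buffer pixel.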

Computing IL at a G-buf Pixel 3
- A naive solution for smooth IL needs to consider four VPL kernels with centers at t0, t1, t2 and t3.
- stx : sub RSM texel x position [0.0, 1.0[
- sty : sub RSM texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 4
    IndirectLight =
      (1.0f-sty) * ((1.0f-stx) * [kernel] + stx * [kernel]) +
      (0.0f+sty) * ((1.0f-stx) * [kernel] + stx * [kernel])
[Figure: each [kernel] term is one of the four VPL kernels at t0, t1, t2 and t3, shown as an image on the slide]
- Evaluation of 4 big VPL kernels is slow
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 5
    SmoothIndirectLight =
      (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+
      (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4
- stx : sub RSM texel x position of g-buf pix [0.0, 1.0[
- sty : sub RSM texel y position of g-buf pix [0.0, 1.0[
- This trick is probably known to some of you already. See backup for a detailed explanation!
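
The factored formula can be checked against the naive bilinear blend of four overlapping kernels. The C++ sketch below assumes the nine blocks B0 to B8 form a 3x3 grid in row-major order (B0 top-left, B4 center, B8 bottom-right) and that each kernel is the sum of the 2x2 group of blocks it covers; that layout is inferred from the formula rather than stated on the slide, and the function names are hypothetical.

```cpp
#include <array>

// Per-block light sums B0..B8, assumed row-major in a 3x3 layout.
using Blocks = std::array<float, 9>;

// Naive version: evaluate four full kernels, then filter bilinearly.
float bilinearOfFourKernels(const Blocks& B, float stx, float sty) {
    float topLeft     = B[0] + B[1] + B[3] + B[4];
    float topRight    = B[1] + B[2] + B[4] + B[5];
    float bottomLeft  = B[3] + B[4] + B[6] + B[7];
    float bottomRight = B[4] + B[5] + B[7] + B[8];
    return (1.0f - sty) * ((1.0f - stx) * topLeft + stx * topRight) +
           sty * ((1.0f - stx) * bottomLeft + stx * bottomRight);
}

// Factored version from the slide: every block is summed only once.
float smoothIndirectLight(const Blocks& B, float stx, float sty) {
    return (1.0f - sty) * (((1.0f - stx) * (B[0] + B[3]) + stx * (B[2] + B[5])) + B[1]) +
           sty * (((1.0f - stx) * (B[6] + B[3]) + stx * (B[8] + B[5])) + B[7]) + B[4];
}
```

The center block B4 belongs to all four kernels, so its weights sum to one and it drops out of the interpolation entirely; B1 and B7 depend only on sty, and B3 and B5 only on stx.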

Indirect Light Buffer

Step 4
- Indirect Light buffer is ½ res
- Perform a bilateral upsampling step
  - See Peter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder. "Image-Based Proxy Accumulation for Real-Time Soft Global Illumination". Pacific Graphics 2007
- Result is a full resolution IL

Step 5
- Combine
  - Direct Illumination
  - Indirect Illumination
  - Shadows (not mentioned)

Scene without IL

Combined Image
DEMO
- ~280 FPS on a HD5970 @ 1280x1024 for a 15x15 VPL kernel
- Deferred IL pass + bilateral upsampling costs ~2.5 ms

How to add Indirect Shadows
- Use a CS and the linked lists technique
  - Insert blocker geometry of IL into 3D grid of lists; let's use the triangles of the blocker for now
  - See backup for alternative data structure
- Look at a kernel of VPLs again
- Only accumulate light of VPLs that are occluded by blocker tris
  - Trace rays through 3D grid to detect occluded VPLs
  - Render low res buffer only
- Subtract blocked indirect light from IL buffer
  - Blurred version of low res blocked IL is used
  - Blur is combined bilateral blurring/upsampling

Insert tris into 3D grid of triangle lists
- Rasterize dynamic blockers to 3D grid using a CS and atomics
[Figure: scene]

Insert tris into 3D grid of triangle lists
- Rasterize dynamic blockers to 3D grid using a CS and atomics
- World space 3D grid of triangle lists around IL blockers laid out in a UAV
[Figure: scene with the 3D grid; each occupied cell heads a list of triangle entries terminated by eol]
eol = End of list (0xffffffff)
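
Inserting triangles into a grid of lists follows the same counter-plus-exchange pattern as the per-pixel lists. The C++ sketch below is a CPU stand-in under several assumptions: the grid spans the unit cube, a triangle is conservatively added to every cell its bounding box touches (the slides do not say how the compute shader picks cells), and `TriangleGrid`, `ListNode` and `insert` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;  // "eol" on the slide

using Vec3 = std::array<float, 3>;
struct Triangle { Vec3 a, b, c; };
struct ListNode { uint32_t triangle; uint32_t next; };

struct TriangleGrid {
    int res;                                   // cells per axis over [0,1)^3
    std::vector<std::atomic<uint32_t>> heads;  // one list head per cell
    std::vector<ListNode> nodes;               // shared node pool
    std::atomic<uint32_t> counter{0};          // next free node

    TriangleGrid(int r, std::size_t maxNodes)
        : res(r), heads(std::size_t(r) * r * r), nodes(maxNodes) {
        for (auto& head : heads) head.store(kEndOfList);
    }

    int cellCoord(float v) const {
        return std::min(std::max(int(v * float(res)), 0), res - 1);
    }

    std::size_t cellIndex(int x, int y, int z) const {
        return (std::size_t(z) * res + y) * res + x;
    }

    // Adds the triangle index to every cell overlapped by its bounding box.
    void insert(uint32_t triIndex, const Triangle& t) {
        int lo[3], hi[3];
        for (int k = 0; k < 3; ++k) {
            lo[k] = cellCoord(std::min({t.a[k], t.b[k], t.c[k]}));
            hi[k] = cellCoord(std::max({t.a[k], t.b[k], t.c[k]}));
        }
        for (int z = lo[2]; z <= hi[2]; ++z)
            for (int y = lo[1]; y <= hi[1]; ++y)
                for (int x = lo[0]; x <= hi[0]; ++x) {
                    uint32_t slot = counter.fetch_add(1);
                    if (slot >= nodes.size()) return;  // pool exhausted
                    uint32_t old = heads[cellIndex(x, y, z)].exchange(slot);
                    nodes[slot] = {triIndex, old};
                }
    }
};
```

A ray marching through the grid then only needs to test the triangles listed in the cells it visits.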

3D Grid Demo

Indirect Light Buffer
[Figure labels: blocker of green light; emitter of green light; expected indirect shadow]

Blocked Indirect Light

Indirect Light Buffer

Subtracting Blocked IL

Final Image
DEMO
- ~70 FPS on a HD5970 @ 1280x1024
- ~300 million rays per second for Indirect Shadows
- Ray casting costs ~9 ms

Future directions
- Speed up IL rendering
  - Render IL at even lower res
  - Look into multi-res RSMs
- Speed up ray-tracing
  - Per pixel array of lists for depth buckets (see backup)
  - Other data structures
  - Raytrace other primitive types
    - Splats, fuzzy ellipsoids etc.
    - Proxy geometry or bounding volumes of blockers
- Get rid of Interlocked*() ops
  - Just mark grid cells as occupied
  - Lower quality but could work on earlier hardware

Q&A
Holger Gruen        holger.gruen@AMD.com
Nicolas Thibieroz   nicolas.thibieroz@AMD.com
Credits for the basic idea of how to implement PPLL under Direct3D 11 go to Jakub Klarowicz (Techland), Holger Gruen and Nicolas Thibieroz (AMD)

Backup Slides IL

Computing IL at a g-buf pixel 1
- Want to support low res RSMs
- Want to create smooth indirect light
- Goal is bi-linear filtering of four VPL kernels
  - Otherwise results don't look smooth

Computing IL at a g-buf pixel 2
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 3
- For smooth IL one needs to consider four VPL kernels with centers at t0, t1, t2 and t3.
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 4
[Figure: center at t0; VPL kernel at t0]
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 4
[Figure: center at t1; VPL kernels at t0 and t1]
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 5
[Figure: center at t2; VPL kernels at t0, t1 and t2]
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 6
[Figure: center at t3; VPL kernels at t0, t1, t2 and t3]
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 7
    IndirectLight =
      (1.0f-sty) * ((1.0f-stx) * [kernel] + stx * [kernel]) +
      (0.0f+sty) * ((1.0f-stx) * [kernel] + stx * [kernel])
[Figure: each [kernel] term is one of the four VPL kernels at t0, t1, t2 and t3, shown as an image on the slide]
- Evaluation of 4 big VPL kernels is slow
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 8
[Figure: the four overlapping VPL kernels at t0, t1, t2 and t3]
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 9
[Figure: the four overlapping VPL kernels at t0, t1, t2 and t3, split into blocks B0 to B8]
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 9
    IndirectLight =
      (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+
      (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4
- Evaluation of 7 small and 1 bigger VPL kernels is fast
- stx : sub texel x position [0.0, 1.0[
- sty : sub texel y position [0.0, 1.0[
Insert Tris into 2D Map of Lists of Tris,[object Object],Rasterize blockers of IL from view of light ,[object Object],Light,[object Object],2D buffer,[object Object],Scene,[object Object]
Insert Tris into 2D Map of Lists of Tris,[object Object],Rasterize  blockers of IL from view of light using a GS and conservative rasterization,[object Object],Light,[object Object],Scene,[object Object],2D buffer of lists of triangles written to by scattering PS,[object Object],eol = End of list (0xffffffff),[object Object]
Backup Slides PPLL,[object Object]
Linked List Creation (2),[object Object],For every pixel:,[object Object],Calculate pixel data (color, depth etc.),[object Object],	Retrieve current pixel count from Fragment & Link UAV and increment counter,[object Object],uintuPixelCount =FragmentAndLinkBuffer.IncrementCounter();,[object Object],Swap offsets in Start Offset UAV,[object Object],uintuOldStartOffset;   StartOffsetBuffer.InterlockedExchange(,[object Object],PixelScreenSpacePositionLinearAddress, ,[object Object],uPixelCount, uOldStartOffset);,[object Object],Add new entry to Fragment & Link UAV,[object Object],FragmentAndLinkBuffer_STRUCT Element;,[object Object],Element.FragmentData = FragmentData;,[object Object],Element.uNext        = uOldStartOffset;,[object Object],FragmentAndLinkBuffer[uPixelCount] = Element;,[object Object]

Linked List Creation (3d)
  [Figure: Start Offset Buffer (empty entries hold -1), Render Target, Fragment and Link Buffer]

PPLL Pros and Cons
  Pros:
    Allows storing of variable number of elements per pixel
    Only one atomic operation per pixel
    Using built-in HW UAV counter
    Good performance
  Cons:
    Traversal causes a lot of random access
    Special case for storing MSAA samples

Alternative Technique: Per Pixel Array
  There is a two-pass method that allows the construction of per-pixel arrays
  First described in 'Sample Based Visibility for Soft Shadows using Alias-free Shadow Maps' (E. Sintorn, E. Eisemann, U. Assarsson, EG 2008)
  Uses a prefix sum to allocate multiple entries into a buffer
  Elements belonging to the same location will therefore be contiguous in memory
  Faster traversal (less random access)
  Not as fast as PPLL for scenes with 'high' depth complexity in our testing
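The two-pass idea can be sketched on the CPU as count, prefix sum, then scatter. The Python below is only an illustration of that general scheme under assumed data layouts (fragments given as (pixel_index, data) pairs, hypothetical function name); it is not taken from the cited paper.

```python
def build_per_pixel_arrays(fragments, pixel_count):
    """Pack fragments so that each pixel's entries are contiguous.

    fragments: list of (pixel_index, data) in arbitrary rendering order.
    Returns (offsets, counts, packed); the entries of pixel p are
    packed[offsets[p] : offsets[p] + counts[p]].
    """
    # Pass 1: count how many fragments land on each pixel.
    counts = [0] * pixel_count
    for pixel, _ in fragments:
        counts[pixel] += 1

    # Exclusive prefix sum turns the counts into start offsets.
    offsets = []
    running = 0
    for count in counts:
        offsets.append(running)
        running += count

    # Pass 2: scatter every fragment into its pixel's reserved range.
    cursor = list(offsets)
    packed = [None] * len(fragments)
    for pixel, data in fragments:
        packed[cursor[pixel]] = data
        cursor[pixel] += 1
    return offsets, counts, packed
```

Traversal then reads one contiguous slice per pixel, at the price of touching every fragment twice during construction.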

Rendering Samples: Alternate Method
  Writes pixels straight into non-MSAA RT
  Only viable if access to samples is no longer required
  Samples must be resolved into pixels before writing
  This affects the manual blending process
  Samples with the same coverage are blended back-to-front
  ...then averaged (resolved) with other samples before being written out to RT
  This method can prove slightly faster than per-sample rendering

Rendering Fragments: Alternate Method (2x MSAA)
    // Retrieve current sample colors from background texture
    float4 vCurrentColor0 = BackgroundTexture.Load(int3(vPos.xy, 0));
    float4 vCurrentColor1 = BackgroundTexture.Load(int3(vPos.xy, 1));
    // Rendering pixels using SRCALPHA-INVSRCALPHA blending
    for (int k = 0; k < nNumPixels; k++)
    {
        // Retrieve next unblended furthermost pixel
        float4 vPixColor = UnpackFromUint(SortedPixels[k].x);
        // Retrieve sample coverage
        uint uCoverage = UnpackCoverage(SortedPixels[k].y);
        // Manual blending between current samples and previous ones
        if (uCoverage & (1 << 0)) vCurrentColor0.xyz = lerp(vCurrentColor0.xyz, vPixColor.xyz, vPixColor.w);
        if (uCoverage & (1 << 1)) vCurrentColor1.xyz = lerp(vCurrentColor1.xyz, vPixColor.xyz, vPixColor.w);
    }
    // Return resolved and manually-blended color (average of the two samples)
    return (vCurrentColor0 + vCurrentColor1) * 0.5;
}
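The per-sample blend followed by a resolve can be emulated on the CPU to check the arithmetic. This Python sketch uses hypothetical names and assumes the fragments arrive already sorted back to front, each carrying an RGBA color and a 2-bit coverage mask. It models only the blending and averaging, not the texture loads or the unpacking.

```python
def lerp3(a, b, t):
    """Component-wise linear interpolation from a to b, like HLSL lerp."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def blend_and_resolve_2x(background_samples, sorted_fragments):
    """Blend fragments into two MSAA samples, then average the samples.

    background_samples: two RGB tuples (sample 0 and sample 1).
    sorted_fragments: list of ((r, g, b, a), coverage_mask), back to front.
    """
    samples = [tuple(s) for s in background_samples]
    for (r, g, b, a), coverage in sorted_fragments:
        for i in range(2):
            # Only the samples flagged in the coverage mask are blended.
            if coverage & (1 << i):
                samples[i] = lerp3(samples[i], (r, g, b), a)
    # Resolve: average of the two samples.
    return tuple((s0 + s1) * 0.5 for s0, s1 in zip(samples[0], samples[1]))
```

An opaque red fragment covering only sample 0 over a black background resolves to (0.5, 0.0, 0.0), the half-covered edge pixel one would expect.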

Editor's notes

  1. A node is composed of an Element and a Link.
  2. Uses a Pixel Shader 5.0 to store fragments into the linked list, because we need triangles rasterized as normal through the rendering pipeline. Briefly describe UAVs. Uses atomic operations: only 1 such operation per pixel is used (although it has feedback).
  3. The "Fragment & Link" buffer actually only uses write access. FragmentData: typically fragment color, but also transparency, depth, blend mode id, etc. Must be large enough to store all fragments: allocation must take into account the maximum number of elements to store. E.g. "average overdraw per pixel" can be used for allocation: width * height * averageoverdraw * sizeof(Element & Link). The fragment data struct should be kept minimal to reduce memory requirements (and improve performance in the process by performing fewer memory operations).
  4. StartOffsetBuffer is a 1D buffer despite being screen-sized; this is because atomic operations are not supported on 2D UAV textures. Initialized to a magic value: thus by default the linked lists are empty since they point to -1 (end of list).
  5. "Viewport" is used to illustrate the incoming triangle (as no RT is bound). Here fragment data is just the pixel color, for simplification. Start Offset Buffer is initialized to the "magic value" (-1).
  6. We're going to exchange the offset in the StartOffsetBuffer with the current value of the counter (before incrementation). This is because the current value of the counter gives us the offset (address) of the fragment we're storing.
  7. Assuming a rasterization order of left to right for the green triangle.
  8. Assuming a rasterization order of left to right for the yellow triangle.
  9. Current pixel count is the TOTAL pixel count, not the pixel count per pixel location. Not using Append Buffers because they require the DXGI_FORMAT_R32_UNKNOWN format. Times 4 because startoffsetaddress is in bytes, not UINT (1 UINT = 4 bytes).
  10. No "4x" to calculate the offset address here because StartOffsetBufferSRV is declared with the uint type.
  11. No point storing transparent fragments that are hidden by opaque geometry, i.e. that fail the depth test.
  12. May also get away with packing color and depth into the same uint! This would make the element size a nice 64 bits. Deferred rendering engines may wish to store the same properties in the F&L structure as for opaque pixels, but watch out for the memory and performance impact of a large structure! If using different blend modes per pixel then you would need to store the blend mode as an id in the fragment data. Depth is stored as uint; we'll explain why later on.
  13. The default DX11 pixel pipeline executes the pixel shader before the depth test. [earlydepthstencil] allows the depth test to be performed *before* the pixel shader. Normally the HW will try to do this for you automatically, but a PS writing to UAVs cannot benefit from this treatment, a bit like alpha testing. Allows performance savings and rendering correctness! Fewer fragments to store = good!
  14. Sparse memory accesses = slow! Especially true as the original rendering order can mean that fragments belonging to the same screen coordinates can be (very) far apart from each other.
  15. Under-blending can be used instead. No need for background input then. Actual HW blending mode enabled. Blending mode to use with under-blending is ONE, SRCALPHA.
  16. SortedPixels[].x stores color, SortedPixels[].y stores depth. Use a (nNumPixels >= MAX_SORTED_PIXELS) test to prevent overflow. The actual sort algorithm used can be anything (e.g. insertion sort, etc.).
  17. This happens after sorting. vCurrentColor.xyz = vFragColor.w * vFragColor.xyz + (1.0 - vFragColor.w) * vCurrentColor.xyz;
  18. The problem with OIT via PPLL and MSAA is that fragment sorting and blending must happen on a per-sample basis, which means a lot of storage and processing. Storing individual samples into Linked Lists requires a huge amount of memory: as most pixels are not edge pixels, they would get all their samples covered. Unfortunately, schemes mixing edge and non-edge pixels proved difficult to implement and did not generate satisfactory results. SV_COVERAGE is DX11 only.
  19. Depth is now packed into 24 bits: this is why depth is stored as uint.
  20. Blue triangle covers one depth sample.
  21. Assuming a rasterization order of left to right for the yellow triangle.
  22. Traversal causes a lot of random access: elements belonging to the same pixel location can be far apart in the LL!
  23. Not as fast: probably because multiple passes are involved.
  24. Only viable if access to samples is no longer required (e.g. for HDR-correct tone mapping). Sorting is unaffected because stored samples share the same depth. Samples with the same coverage are blended back-to-front: this means blending happens on a per-sample basis as it normally would.
  25. Code illustration for 2 samples. SortedPixels[] contains pixel color in .x and depth+coverage in .y.
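
Notes 16 and 17 describe a capped sort followed by back-to-front blending. A minimal CPU-side sketch in Python is given below, under some assumptions: the cap value of 8 is hypothetical, a larger depth is taken to mean farther from the camera, and fragments are given as ((r, g, b, a), depth) pairs rather than packed uints. Insertion sort is used because the notes name it as one possible choice.

```python
MAX_SORTED_PIXELS = 8  # hypothetical cap on fragments sorted per pixel


def sort_back_to_front(fragments):
    """Insertion-sort fragments by depth, farthest first.

    fragments: list of ((r, g, b, a), depth). Fragments beyond
    MAX_SORTED_PIXELS are dropped, mirroring the overflow test in note 16.
    """
    sorted_pixels = []
    for fragment in fragments:
        if len(sorted_pixels) >= MAX_SORTED_PIXELS:
            break  # overflow guard
        sorted_pixels.append(fragment)
        j = len(sorted_pixels) - 1
        # Move the new fragment forward while it is farther than its neighbour.
        while j > 0 and sorted_pixels[j - 1][1] < sorted_pixels[j][1]:
            sorted_pixels[j - 1], sorted_pixels[j] = sorted_pixels[j], sorted_pixels[j - 1]
            j -= 1
    return sorted_pixels


def blend_back_to_front(background_rgb, sorted_pixels):
    """Apply the blend from note 17 to each fragment in sorted order."""
    color = tuple(background_rgb)
    for (r, g, b, a), _depth in sorted_pixels:
        color = tuple(a * f + (1.0 - a) * c for f, c in zip((r, g, b), color))
    return color
```

A half-transparent red fragment in front of an opaque green one over black blends to (0.5, 0.5, 0.0), regardless of the order in which the two were stored.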