PPLL Pros and Cons
Pros:
Allows storing of variable number of elements per pixel
Only one atomic operation ...

Alternative Technique: Per Pixel Array
There is a two pass method that allows the construction of per-pixel arrays
...

Rendering Samples: Alternate Method
Writes pixels straight into non-MSAA RT
Only viable if access to samples is no...

Rendering Fragments: Alternate Method (2x MSAA)
// Retrieve current sample colors from background texture
float4 ...
OIT and Indirect Illumination using DX11 Linked Lists

  2. 2. OIT and Indirect Illumination using DX11 Linked Lists
      Holger Gruen, AMD ISV Relations
      Nicolas Thibieroz, AMD ISV Relations
  3. 3. Agenda
      Introduction
      Linked List Rendering
      Order Independent Transparency
      Indirect Illumination
      Q&A
  4. 4. Introduction
      Direct3D 11 HW opens the door to many new rendering algorithms
      In particular, per-pixel linked lists allow for a number of new techniques: OIT, Indirect Shadows, Ray Tracing of dynamic scenes, REYES surface dicing, custom AA, Irregular Z-buffering, custom blending, Advanced Depth of Field, etc.
      This talk will walk you through a DX11 implementation of per-pixel linked lists and two effects that utilize this technique:
      OIT
      Indirect Illumination
  5. 5. Per-Pixel Linked Lists with Direct3D 11
      [Diagram: four elements chained by links]
      Nicolas Thibieroz
      European ISV Relations
      AMD
  6. 6. Why Linked Lists?
      Data structure useful for programming
      Very hard to implement efficiently with previous real-time graphics APIs
      DX11 allows efficient creation and parsing of linked lists
      Per-pixel linked lists
      A collection of linked lists enumerating all pixels belonging to the same screen position
      [Diagram: four elements chained by links]
  7. 7. Two-step process
      1) Linked List Creation
      Store incoming fragments into linked lists
      2) Rendering from Linked List
      Linked List traversal and processing of stored fragments
  8. 8. Creating Per-Pixel Linked Lists
  9. 9. PS5.0 and UAVs
      Uses a Pixel Shader 5.0 to store fragments into linked lists
      Not a Compute Shader 5.0!
      Uses atomic operations
      Two UAV buffers required
      - “Fragment & Link” buffer
      - “Start Offset” buffer
      UAV = Unordered Access View
  10. 10. Fragment & Link Buffer
      The “Fragment & Link” buffer contains data and link for all fragments to store
      Must be large enough to store all fragments
      Created with counter support
      D3D11_BUFFER_UAV_FLAG_COUNTER flag in UAV view
      Declaration:
          struct FragmentAndLinkBuffer_STRUCT
          {
              FragmentData_STRUCT FragmentData;  // Fragment data
              uint                uNext;         // Link to next fragment
          };
          RWStructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBuffer;
  11. 11. Start Offset Buffer
      The “Start Offset” buffer contains the offset of the last fragment written at every pixel location
      Screen-sized: (width * height * sizeof(UINT32))
      Initialized to magic value (e.g. -1)
      Magic value indicates no more fragments are stored (i.e. end of the list)
      Declaration:
          RWByteAddressBuffer StartOffsetBuffer;
  12. 12. Linked List Creation (1)
      No color Render Target bound!
      No rendering yet, just storing in L.L.
      Depth buffer bound if needed
      OIT will need it in a few slides
      UAVs bound as input/output:
      StartOffsetBuffer (R/W)
      FragmentAndLinkBuffer (W)
  13. 13. Linked List Creation (2a)
      [Diagram: Start Offset Buffer over the viewport with every entry at -1; Fragment and Link Buffer empty; counter = 0]
  14. 14. Linked List Creation (2b)
      [Diagram: Start Offset Buffer over the viewport; counter values 0 and 1 shown; Fragment and Link Buffer holds a first element whose link is -1]
  15. 15. Linked List Creation (2c)
      [Diagram: one Start Offset Buffer entry now holds 0; counter values 1, 2 and 3 shown; stored elements in the Fragment and Link Buffer link to -1]
  16. 16. Linked List Creation (2d)
      [Diagram: Start Offset Buffer entries hold offsets 0, 1 and 2; counter values 3, 4 and 5 shown; one element in the Fragment and Link Buffer links to offset 0, the others to -1]
  17. 17. Linked List Creation - Code
          float PS_StoreFragments(PS_INPUT input) : SV_Target
          {
              // Calculate fragment data (color, depth, etc.)
              FragmentData_STRUCT FragmentData = ComputeFragment();
              // Retrieve current pixel count and increase counter
              uint uPixelCount = FLBuffer.IncrementCounter();
              // Exchange offsets in StartOffsetBuffer (byte address = 4 * pixel index)
              uint2 vPos = uint2(input.vPos.xy);
              uint uStartOffsetAddress = 4 * ( (SCREEN_WIDTH*vPos.y) + vPos.x );
              uint uOldStartOffset;
              StartOffsetBuffer.InterlockedExchange(uStartOffsetAddress, uPixelCount, uOldStartOffset);
              // Add new fragment entry in Fragment & Link Buffer
              FragmentAndLinkBuffer_STRUCT Element;
              Element.FragmentData = FragmentData;
              Element.uNext = uOldStartOffset;
              FLBuffer[uPixelCount] = Element;
              // No color render target is bound, so the returned value is discarded
              return 0.0f;
          }
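The creation step can be mimicked on the CPU to make the data flow easier to follow. The following is only a sketch, not the presenters' code: `PerPixelLists`, `store` and `kEndOfList` are illustrative names, `std::atomic::fetch_add` stands in for `IncrementCounter()`, and `std::atomic::exchange` stands in for `InterlockedExchange`. The pool-exhaustion check is an added assumption; the slide only says the buffer must be large enough.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU stand-in for the two UAVs; all names here are illustrative.
constexpr uint32_t kEndOfList = 0xFFFFFFFFu;  // the "magic value" (-1)

struct FragmentAndLink {
    uint32_t color;  // packed fragment data
    float    depth;
    uint32_t next;   // link to the previously stored fragment
};

struct PerPixelLists {
    int width, height;
    std::vector<std::atomic<uint32_t>> startOffset;  // one list head per pixel
    std::vector<FragmentAndLink> fragments;          // fragment & link pool
    std::atomic<uint32_t> counter{0};                // UAV counter stand-in

    PerPixelLists(int w, int h, size_t capacity)
        : width(w), height(h), startOffset(size_t(w) * h), fragments(capacity) {
        for (auto& head : startOffset) head.store(kEndOfList);
    }

    // Allocate a slot, swap it in as the new list head for the pixel,
    // then link the new element to the old head.
    bool store(int x, int y, uint32_t color, float depth) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        uint32_t slot = counter.fetch_add(1);
        if (slot >= fragments.size()) return false;  // pool exhausted
        uint32_t oldHead = startOffset[size_t(y) * width + x].exchange(slot);
        fragments[slot] = {color, depth, oldHead};
        return true;
    }
};
```

Storing two fragments at the same pixel leaves the second one as the list head, with its `next` field pointing at the first, whose `next` is the end-of-list value.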
  18. 18. Traversing Per-Pixel Linked Lists
  19. 19. Rendering Pixels (1)
      “Start Offset” Buffer and “Fragment & Link” Buffer now bound as SRV
          Buffer<uint> StartOffsetBufferSRV;
          StructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBufferSRV;
      Render a fullscreen quad
      For each pixel, parse the linked list and retrieve fragments for this screen position
      Process list of fragments as required
      Depends on algorithm
      e.g. sorting, finding maximum, etc.
      SRV = Shader Resource View
  20. 20. Rendering from Linked List
      [Diagram: Start Offset Buffer entries (-1 where empty, otherwise the offset of the last stored fragment) are followed through the Fragment and Link Buffer until a -1 link is reached; results go to the Render Target]
  21. 21. Rendering Pixels (2)
          float4 PS_RenderFragments(PS_INPUT input) : SV_Target
          {
              // Calculate UINT-aligned start offset buffer address
              uint2 vPos = uint2(input.vPos.xy);
              uint uStartOffsetAddress = SCREEN_WIDTH*vPos.y + vPos.x;
              // Fetch offset of first fragment for current pixel
              uint uOffset = StartOffsetBufferSRV.Load(uStartOffsetAddress);
              // Parse linked list for all fragments at this position
              float4 FinalColor = float4(0,0,0,0);
              while (uOffset != 0xFFFFFFFF) // 0xFFFFFFFF is magic value
              {
                  // Retrieve pixel at current offset
                  FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];
                  // Process pixel as required
                  ProcessPixel(Element, FinalColor);
                  // Retrieve next offset
                  uOffset = Element.uNext;
              }
              return (FinalColor);
          }
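The traversal loop has a direct CPU analogue. This is a minimal sketch with hypothetical names (`forEachFragment`, `FragmentAndLink`), assuming the buffers have already been filled and are read-only at this point, as they are when bound as SRVs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;  // magic value terminating a list

struct FragmentAndLink { uint32_t color; float depth; uint32_t next; };

// Walks one pixel's list and calls visit(fragment) for each element,
// most recently stored first. Returns the number of fragments visited.
template <typename Visitor>
int forEachFragment(const std::vector<uint32_t>& startOffset,
                    const std::vector<FragmentAndLink>& fragments,
                    size_t pixelIndex, Visitor visit) {
    assert(pixelIndex < startOffset.size());
    int count = 0;
    uint32_t offset = startOffset[pixelIndex];
    while (offset != kEndOfList) {
        const FragmentAndLink& element = fragments[offset];
        visit(element);         // "process pixel as required"
        offset = element.next;  // follow the link
        ++count;
    }
    return count;
}
```

The visitor plays the role of `ProcessPixel`: it can accumulate, find a maximum, or copy fragments out for sorting.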
  22. 22. Order-Independent Transparency via Per-Pixel Linked Lists
      Nicolas Thibieroz
      European ISV Relations
      AMD
  23. 23. Description
      Straight application of the linked list algorithm
      Stores transparent fragments into PPLL
      Rendering phase sorts pixels in a back-to-front order and blends them manually in a pixel shader
      Blend mode can be unique per-pixel!
      Special case for MSAA support
  24. 24. Linked List Structure
      Optimize performance by reducing amount of data to write to/read from UAV
      E.g. uint instead of float4 for color
      Example data structure for OIT:
          struct FragmentAndLinkBuffer_STRUCT
          {
              uint uPixelColor;  // Packed pixel color
              uint uDepth;       // Pixel depth
              uint uNext;        // Address of next link
          };
      May also get away with packing color and depth into the same uint (if all fragments share the same alpha)
      16 bits color (565) + 16 bits depth
      Performance/memory/quality trade-off
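As an illustration of storing a `uint` instead of a `float4`, here is a small packing sketch. The 8-bits-per-channel RGBA layout and the names `packColor` / `unpackColor` are assumptions made for the example; the slides do not specify the packing used.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Color { float r, g, b, a; };

// Packs four [0,1] floats into one 32-bit value, 8 bits per channel,
// red in the lowest byte and alpha in the highest.
inline uint32_t packColor(float r, float g, float b, float a) {
    auto toByte = [](float v) -> uint32_t {
        float clamped = v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
        return static_cast<uint32_t>(std::lround(clamped * 255.0f));
    };
    return toByte(r) | (toByte(g) << 8) | (toByte(b) << 16) | (toByte(a) << 24);
}

// Inverse of packColor, up to 8-bit quantization error.
inline Color unpackColor(uint32_t packed) {
    return { (packed & 0xFFu) / 255.0f,
             ((packed >> 8) & 0xFFu) / 255.0f,
             ((packed >> 16) & 0xFFu) / 255.0f,
             ((packed >> 24) & 0xFFu) / 255.0f };
}
```

The round trip loses at most half a quantization step per channel, which is the quality side of the trade-off the slide mentions.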
  25. 25. Visible Fragments Only!
      Use [earlydepthstencil] in front of Linked List creation pixel shader
      This ensures only transparent fragments that pass the depth test are stored
      i.e. Visible fragments!
      Allows performance savings and rendering correctness!
          [earlydepthstencil]
          float PS_StoreFragments(PS_INPUT input) : SV_Target
          {
              ...
          }
  26. 26. Sorting Pixels
      Sorting in place requires R/W access to Linked List
      Sparse memory accesses = slow!
      Better way is to copy all pixels into array of temp registers
      Then do the sorting
      Temp array declaration means a hard limit on the number of pixels per screen coordinate
      Required trade-off for performance
  27. 27. Sorting and Blending
      [Diagram: fragments with depths 0.95, 0.93, 0.87 and 0.98 are copied from the linked list into the temp array, sorted, then blended over the background color to produce the PS color written to the Render Target]
      Blend fragments back to front in PS
      Blending algorithm up to app
      Example: SRCALPHA-INVSRCALPHA
      Or unique per pixel! (stored in fragment data)
      Background passed as input texture
      Actual HW blending mode disabled
  28. 28. Storing Pixels for Sorting
          (...)
          static uint2 SortedPixels[MAX_SORTED_PIXELS];
          // Parse linked list for all pixels at this position
          // and store them into temp array for later sorting
          int nNumPixels = 0;
          while (uOffset != 0xFFFFFFFF)
          {
              // Retrieve pixel at current offset
              Element = FLBufferSRV[uOffset];
              // Copy pixel data into temp array
              SortedPixels[nNumPixels++] =
                  uint2(Element.uPixelColor, Element.uDepth);
              // Retrieve next offset
              [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ?
                  0xFFFFFFFF : Element.uNext;
          }
          // Sort pixels in-place
          SortPixelsInPlace(SortedPixels, nNumPixels);
          (...)
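The slide leaves `SortPixelsInPlace` undefined. One plausible choice for short per-pixel lists is an insertion sort, sketched below on the CPU. The names (`gatherAndSort`, `SortedPixel`, `kMaxSortedPixels`) are hypothetical, the limit of 8 is arbitrary, and larger depth is assumed to mean farther from the camera, so back-to-front order puts the largest depth first.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;
constexpr int kMaxSortedPixels = 8;  // hard per-pixel limit (arbitrary here)

struct FragmentAndLink { uint32_t color; uint32_t depth; uint32_t next; };
struct SortedPixel { uint32_t color; uint32_t depth; };

// Copies up to kMaxSortedPixels fragments of one list into `out`, then
// sorts them back to front (largest depth first). Returns the count.
inline int gatherAndSort(const std::vector<FragmentAndLink>& fragments,
                         uint32_t head, SortedPixel (&out)[kMaxSortedPixels]) {
    int count = 0;
    for (uint32_t offset = head;
         offset != kEndOfList && count < kMaxSortedPixels;
         offset = fragments[offset].next) {
        assert(offset < fragments.size());
        out[count++] = {fragments[offset].color, fragments[offset].depth};
    }
    // Insertion sort: cheap for the handful of fragments expected per pixel.
    for (int i = 1; i < count; ++i) {
        SortedPixel key = out[i];
        int j = i - 1;
        while (j >= 0 && out[j].depth < key.depth) {
            out[j + 1] = out[j];
            --j;
        }
        out[j + 1] = key;
    }
    return count;
}
```

Fragments beyond the limit are simply dropped, which mirrors the hard limit the temp array imposes in the shader.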
  29. 29. Pixel Blending in PS
          (...)
          // Retrieve current color from background texture
          float4 vCurrentColor = BackgroundTexture.Load(int3(vPos.xy, 0));
          // Rendering pixels using SRCALPHA-INVSRCALPHA blending
          for (int k=0; k<nNumPixels; k++)
          {
              // Retrieve next unblended furthermost pixel
              float4 vPixColor = UnpackFromUint(SortedPixels[k].x);
              // Manual blending between current fragment and previous one
              vCurrentColor.xyz = lerp(vCurrentColor.xyz, vPixColor.xyz,
                                       vPixColor.w);
          }
          // Return manually-blended color
          return vCurrentColor;
          }
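The manual blend is a chain of linear interpolations, which the following CPU sketch reproduces. `Color` and `blendBackToFront` are illustrative names; the input is assumed to be already sorted back to front, and alpha of the result is left at the background's value, as in the shader above where only `.xyz` is updated.

```cpp
#include <cassert>
#include <cmath>

struct Color { float r, g, b, a; };

// Applies SRCALPHA-INVSRCALPHA blending of each fragment over the running
// color: result = lerp(current, source, source.a), farthest fragment first.
inline Color blendBackToFront(Color background, const Color* sorted, int count) {
    assert(count >= 0);
    Color current = background;
    for (int k = 0; k < count; ++k) {
        const Color& src = sorted[k];
        current.r += (src.r - current.r) * src.a;
        current.g += (src.g - current.g) * src.a;
        current.b += (src.b - current.b) * src.a;
    }
    return current;
}
```

For example, half-transparent red followed by half-transparent green over a black background gives (0.25, 0.5, 0): the red contribution is halved again by the green layer in front of it.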
  30. 30. OIT via Per-Pixel Linked Lists with MSAA Support
  31. 31. Sample Coverage
      Storing individual samples into Linked Lists requires a huge amount of memory
      ... and performance will suffer!
      Solution is to store transparent pixels into PPLL as before
      But including sample coverage too!
      Requires as many bits as MSAA mode
      Declare SV_COVERAGE in PS structure
          struct PS_INPUT
          {
              float3 vNormal   : NORMAL;
              float2 vTex      : TEXCOORD;
              float4 vPos      : SV_POSITION;
              uint   uCoverage : SV_COVERAGE;
          };
  32. 32. Linked List Structure
      Almost unchanged from previously
      Depth is now packed into 24 bits
      8 bits are used to store coverage
          struct FragmentAndLinkBuffer_STRUCT
          {
              uint uPixelColor;        // Packed pixel color
              uint uDepthAndCoverage;  // Depth + coverage
              uint uNext;              // Address of next link
          };
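  33. 33. Sample Coverage Example
      [Diagram: a pixel with its center and four sample positions; only the third sample is covered]
      Third sample is covered
      uCoverage = 0x04 (0100 in binary)
          // 16777215 = 2^24 - 1; depth goes in the upper 24 bits, coverage in the lower 8
          Element.uDepthAndCoverage =
              ( uint(In.vPos.z * 16777215.0f) << 8 ) | In.uCoverage;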
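The 24-bit depth plus 8-bit coverage layout can be checked on the CPU. This sketch assumes depth is already normalized to [0, 1]; `packDepthAndCoverage`, `unpackCoverage` and `unpackDepth` are illustrative helper names (the slides only name `UnpackCoverage`).

```cpp
#include <cassert>
#include <cstdint>

// Depth in the upper 24 bits, sample coverage mask in the lower 8 bits.
inline uint32_t packDepthAndCoverage(float depth01, uint32_t coverage) {
    assert(depth01 >= 0.0f && depth01 <= 1.0f);
    uint32_t depth24 = static_cast<uint32_t>(depth01 * 16777215.0f);  // 2^24 - 1
    return (depth24 << 8) | (coverage & 0xFFu);
}

inline uint32_t unpackCoverage(uint32_t packed) { return packed & 0xFFu; }

inline float unpackDepth(uint32_t packed) {
    return static_cast<float>(packed >> 8) / 16777215.0f;
}
```

Because depth occupies the high bits, comparing two packed values as plain unsigned integers orders them by depth first, so the packed field can be used directly as a sort key.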
  34. 34. Rendering Samples (1)
      Rendering phase needs to be able to write individual samples
      Thus PS is run at sample frequency
      Can be done by declaring SV_SAMPLEINDEX in input structure
      Parse linked list and store pixels into temp array for later sorting
      Similar to non-MSAA case
      Difference is to only store sample if coverage matches sample index being rasterized
  35. 35. Rendering Samples (2)
          static uint2 SortedPixels[MAX_SORTED_PIXELS];
          // Parse linked list for all pixels at this position
          // and store them into temp array for later sorting
          int nNumPixels = 0;
          while (uOffset != 0xFFFFFFFF)
          {
              // Retrieve pixel at current offset
              Element = FLBufferSRV[uOffset];
              // Retrieve pixel coverage from linked list element
              uint uCoverage = UnpackCoverage(Element.uDepthAndCoverage);
              if ( uCoverage & (1<<In.uSampleIndex) )
              {
                  // Coverage matches current sample so copy pixel
                  // (color and depth/coverage, to match the uint2 temp array)
                  SortedPixels[nNumPixels++] =
                      uint2(Element.uPixelColor, Element.uDepthAndCoverage);
              }
              // Retrieve next offset
              [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ?
                  0xFFFFFFFF : Element.uNext;
          }
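The per-sample filter reduces to one bit test per fragment. Below is a small CPU sketch with hypothetical names (`Fragment`, `fragmentsForSample`), assuming coverage lives in the low 8 bits of the packed depth/coverage field.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Fragment { uint32_t color; uint32_t depthAndCoverage; };

inline uint32_t unpackCoverage(uint32_t depthAndCoverage) {
    return depthAndCoverage & 0xFFu;
}

// Keeps only the fragments whose coverage mask includes the given sample,
// as the sample-frequency pixel shader does for its own sample index.
inline std::vector<Fragment> fragmentsForSample(
        const std::vector<Fragment>& pixelFragments, uint32_t sampleIndex) {
    assert(sampleIndex < 8);
    std::vector<Fragment> kept;
    for (const Fragment& f : pixelFragments) {
        if (unpackCoverage(f.depthAndCoverage) & (1u << sampleIndex)) {
            kept.push_back(f);
        }
    }
    return kept;
}
```

A fragment with coverage 0x04 contributes only to sample 2, while one with coverage 0x0F contributes to samples 0 through 3.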
  36. 36. DEMO
      OIT Linked List Demo
  37. 37. Direct3D 11 Indirect Illumination
      Holger Gruen
      European ISV Relations
      AMD
  38. 38. Indirect Illumination Introduction 1
      Real-time indirect illumination is an active research topic
      Numerous approaches exist:
      Reflective Shadow Maps (RSM) [Dachsbacher/Stamminger05]
      Splatting Indirect Illumination [Dachsbacher/Stamminger2006]
      Multi-Res Splatting of Illumination [Wyman2009]
      Light propagation volumes [Kaplanyan2009]
      Approximating Dynamic Global Illumination in Image Space [Ritschel2009]
      Only a few support indirect shadows:
      Imperfect Shadow Maps [Ritschel/Grosch2008]
      Micro-Rendering for Scalable, Parallel Final Gathering (SSDO) [Ritschel2010]
      Cascaded light propagation volumes for real-time indirect illumination [Kaplanyan/Dachsbacher2010]
      Most approaches somehow extend to multi-bounce lighting
  39. 39. Indirect Illumination Introduction 2
      This section will cover:
      An efficient and simple DX9-compliant RSM based implementation for smooth one bounce indirect illumination
      Indirect shadows are ignored here
      A Direct3D 11 technique that traces rays to compute indirect shadows
      Part of this technique could generally be used for ray-tracing dynamic scenes
  40. 40. Indirect Illumination w/o Indirect Shadows
      Draw scene g-buffer
      Draw Reflective Shadowmap (RSM)
      RSM shows the part of the scene that receives direct light from the light source
      Draw Indirect Light buffer at ½ res
      RSM texels are used as light sources on g-buffer pixels for indirect lighting
      Upsample Indirect Light (IL)
      Draw final image adding IL
  41. 41. Step 1
      G-Buffer needs to allow reconstruction of
      World/Camera space position
      World/Camera space normal
      Color/Albedo
      DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows
  42. 42. Step 2
      RSM needs to allow reconstruction of
      World/Camera space position
      World/Camera space normal
      Color/Albedo
      Only draw emitters of indirect light
      DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows
  43. 43. Step 3
      Render a ½ res IL as a deferred op
      Transform g-buffer pixel to RSM space
      -> Light Space -> project to RSM texel space
      Use a kernel of RSM texels as light sources
      RSM texels are also called Virtual Point Lights (VPLs)
      Kernel size depends on
      Desired speed
      Desired look of the effect
      RSM resolution
  44. 44. Computing IL at a G-buf Pixel 1
      Sum up contribution of all VPLs in the kernel
  45. 45. Computing IL at a G-buf Pixel 2
      [Figure: an RSM texel/VPL and a g-buffer pixel, with the per-VPL contribution term]
      This term is very similar to terms used in radiosity form factor computations
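The term itself was an image on the slide and is not recoverable from this transcript. As a stand-in, the sketch below uses the per-VPL weight from the cited Reflective Shadow Maps paper [Dachsbacher/Stamminger05]: the product of the two clamped cosines divided by the fourth power of the distance (normals are unit length, the direction vector is not normalized). Whether the talk used exactly this form is an assumption, and `Vec3` and `vplWeight` are illustrative names. The caller would multiply the weight by the VPL's reflected flux and sum over the kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Geometric weight of one VPL (position xp, unit normal np) as seen from a
// g-buffer pixel (position x, unit normal n), following the RSM formulation:
//   max(0, np . (x - xp)) * max(0, n . (xp - x)) / |x - xp|^4
inline float vplWeight(Vec3 x, Vec3 n, Vec3 xp, Vec3 np) {
    Vec3 d = sub(x, xp);
    float dist2 = dot(d, d);
    if (dist2 <= 0.0f) return 0.0f;  // coincident points: no contribution
    float cosEmit = std::max(0.0f, dot(np, d));
    float cosRecv = std::max(0.0f, -dot(n, d));
    return cosEmit * cosRecv / (dist2 * dist2);
}
```

A receiver facing away from the VPL, or a VPL facing away from the receiver, gets a weight of zero through the clamps.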
  46. 46. Computing IL at a G-buf Pixel 3
      A naive solution for smooth IL needs to consider four VPL kernels with centers at t0, t1, t2 and t3.
      stx : sub RSM texel x position [0.0, 1.0[
      sty : sub RSM texel y position [0.0, 1.0[
  47. 47. Computing IL at a g-buf pixel 4
      IndirectLight =
          (1.0f-sty) * ((1.0f-stx) * [kernel] + stx * [kernel]) +
          (0.0f+sty) * ((1.0f-stx) * [kernel] + stx * [kernel])
      Each [kernel] is one of the four VPL kernels at t0, t1, t2 and t3, drawn as an image on the slide
      Evaluation of 4 big VPL kernels is slow
      stx : sub texel x position [0.0, 1.0[
      sty : sub texel y position [0.0, 1.0[
  48. 48. Computing IL at a g-buf pixel 5
      SmoothIndirectLight =
          (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+
          (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4
      stx : sub RSM texel x position of g-buf pix [0.0, 1.0[
      sty : sub RSM texel y position of g-buf pix [0.0, 1.0[
      This trick is probably known to some of you already. See backup for a detailed explanation!
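The block formula can be checked against the naive bilinear blend of four overlapping kernels. The sketch below assumes (the figure is not in the transcript) that the four N x N kernels are offset by one texel in x and/or y, so their union is an (N+1) x (N+1) window, and that B0..B8 are the row-major 3 x 3 partition of that window into corners, edges and the shared interior. `Grid`, `boxSum`, `naiveBlend` and `blockBlend` are illustrative names, and each grid value stands for one VPL's already-evaluated contribution.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Grid = std::vector<std::vector<float>>;  // v[row][col], (N+1) x (N+1)

// Sum of the half-open rectangle [x0, x1) x [y0, y1).
inline float boxSum(const Grid& v, int x0, int y0, int x1, int y1) {
    float sum = 0.0f;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x) sum += v[y][x];
    return sum;
}

// Reference: bilinear blend of four full N x N kernels.
inline float naiveBlend(const Grid& v, float stx, float sty) {
    int n = static_cast<int>(v.size()) - 1;
    float k0 = boxSum(v, 0, 0, n, n);          // unshifted kernel
    float k1 = boxSum(v, 1, 0, n + 1, n);      // shifted one texel in x
    float k2 = boxSum(v, 0, 1, n, n + 1);      // shifted one texel in y
    float k3 = boxSum(v, 1, 1, n + 1, n + 1);  // shifted in both
    return (1.0f - sty) * ((1.0f - stx) * k0 + stx * k1) +
           sty * ((1.0f - stx) * k2 + stx * k3);
}

// Same result from nine disjoint blocks, each texel evaluated once.
inline float blockBlend(const Grid& v, float stx, float sty) {
    int n = static_cast<int>(v.size()) - 1;
    assert(n >= 1);
    float B0 = boxSum(v, 0, 0, 1, 1), B1 = boxSum(v, 1, 0, n, 1), B2 = boxSum(v, n, 0, n + 1, 1);
    float B3 = boxSum(v, 0, 1, 1, n), B4 = boxSum(v, 1, 1, n, n), B5 = boxSum(v, n, 1, n + 1, n);
    float B6 = boxSum(v, 0, n, 1, n + 1), B7 = boxSum(v, 1, n, n, n + 1), B8 = boxSum(v, n, n, n + 1, n + 1);
    return (1.0f - sty) * (((1.0f - stx) * (B0 + B3) + stx * (B2 + B5)) + B1) +
           sty * (((1.0f - stx) * (B6 + B3) + stx * (B8 + B5)) + B7) + B4;
}
```

Under this layout the interior block B4 belongs to all four kernels, so its bilinear weights sum to one and it is added unweighted; that is where the saving comes from, since B4 contains almost all of the texels.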
  49. 49. Indirect Light Buffer
  50. 50. Step 4
      Indirect Light buffer is ½ res
      Perform a bilateral upsampling step
      See Peter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder. "Image-Based Proxy Accumulation for Real-Time Soft Global Illumination". Pacific Graphics 2007
      Result is a full resolution IL
  51. 51. Step 5
      Combine
      Direct Illumination
      Indirect Illumination
      Shadows (not mentioned)
  52. 52. Scene without IL
  53. 53. Combined Image
      DEMO
      ~280 FPS on a HD5970 @ 1280x1024
      for a 15x15 VPL kernel
      Deferred IL pass + bilateral upsampling costs ~2.5 ms
  54. 54. How to add Indirect Shadows
      Use a CS and the linked lists technique
      Insert blocker geometry of IL into 3D grid of lists: let's use the triangles of the blocker for now
      see backup for alternative data structure
      Look at a kernel of VPLs again
      Only accumulate light of VPLs that are occluded by blocker tris
      Trace rays through 3D grid to detect occluded VPLs
      Render low res buffer only
      Subtract blocked indirect light from IL buffer
      Blurred version of low res blocked IL is used
      Blur is combined bilateral blurring/upsampling
  55. 55. Insert tris into 3D grid of triangle lists
      Rasterize dynamic blockers to 3D grid using a CS and atomics
      [Diagram: scene]
  56. 56. Insert tris into 3D grid of triangle lists
      Rasterize dynamic blockers to 3D grid using a CS and atomics
      World space 3D grid of triangle lists around IL blockers laid out in a UAV
      [Diagram: scene with the grid; cell (0,1,0) highlighted with its triangle list]
      eol = End of list (0xffffffff)
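The grid of triangle lists reuses the per-pixel linked list pattern with a cell index in place of a pixel index. The sketch below is a CPU approximation under stated assumptions: a cubic grid with a uniform cell size, and insertion of a triangle into every cell overlapped by its axis-aligned bounding box, which is a conservative simplification and not the rasterization used in the talk. `TriangleGrid`, `TriangleNode` and their members are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;  // "eol" on the slide

struct TriangleNode { uint32_t triangleId; uint32_t next; };

struct TriangleGrid {
    int dim;                 // cells per axis
    float origin, cellSize;  // same origin and cell size on every axis
    std::vector<std::atomic<uint32_t>> cellHead;  // one list head per cell
    std::vector<TriangleNode> nodes;              // shared node pool
    std::atomic<uint32_t> counter{0};

    TriangleGrid(int d, float o, float size, size_t capacity)
        : dim(d), origin(o), cellSize(size),
          cellHead(size_t(d) * d * d), nodes(capacity) {
        for (auto& head : cellHead) head.store(kEndOfList);
    }

    size_t cellIndex(int x, int y, int z) const {
        return (size_t(z) * dim + y) * dim + x;
    }

    // World coordinate to clamped cell coordinate along one axis.
    int cellCoord(float w) const {
        int c = static_cast<int>(std::floor((w - origin) / cellSize));
        return std::min(std::max(c, 0), dim - 1);
    }

    // Pushes the triangle onto the list of every cell its bounding box
    // [lo, hi] overlaps. Returns the number of cells it was added to.
    int insert(uint32_t triangleId, const float lo[3], const float hi[3]) {
        int inserted = 0;
        for (int z = cellCoord(lo[2]); z <= cellCoord(hi[2]); ++z)
            for (int y = cellCoord(lo[1]); y <= cellCoord(hi[1]); ++y)
                for (int x = cellCoord(lo[0]); x <= cellCoord(hi[0]); ++x) {
                    uint32_t slot = counter.fetch_add(1);
                    if (slot >= nodes.size()) return inserted;  // pool full
                    uint32_t oldHead = cellHead[cellIndex(x, y, z)].exchange(slot);
                    nodes[slot] = {triangleId, oldHead};
                    ++inserted;
                }
        return inserted;
    }
};
```

A ray traced through the grid would then walk the list of each visited cell exactly as the pixel shader walks a per-pixel list, testing the ray against each referenced triangle.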
  57. 57. 3D Grid Demo
  58. 58. Indirect Light Buffer
      [Figure labels: blocker of green light; emitter of green light; expected indirect shadow]
  59. 59. Blocked Indirect Light
  60. 60. Indirect Light Buffer
  61. 61. Subtracting Blocked IL
  62. 62. Final Image
      DEMO
      ~70 FPS on a HD5970 @ 1280x1024
      ~300 million rays per second for Indirect Shadows
      Ray casting costs ~9 ms
  63. 63. Future directions
      Speed up IL rendering
      Render IL at even lower res
      Look into multi-res RSMs
      Speed up ray-tracing
      Per pixel array of lists for depth buckets (see backup)
      Other data structures
      Raytrace other primitive types
      Splats, fuzzy ellipsoids etc.
      Proxy geometry or bounding volumes of blockers
      Get rid of Interlocked*() ops
      Just mark grid cells as occupied
      Lower quality but could work on earlier hardware
  64. 64. Q&A
      Holger Gruen: holger.gruen@AMD.com
      Nicolas Thibieroz: nicolas.thibieroz@AMD.com
      Credits for the basic idea of how to implement PPLL under Direct3D 11 go to Jakub Klarowicz (Techland), Holger Gruen and Nicolas Thibieroz (AMD)
  65. 65. Backup Slides IL
  66. 66. Computing IL at a g-buf pixel 1
      Want to support low res RSMs
      Want to create smooth indirect light
      Goal is bi-linear filtering of four VPL-Kernels
      Otherwise results don't look smooth
  67. 67. Computing IL at a g-buf pixel 2
      stx : sub texel x position [0.0, 1.0[
      sty : sub texel y position [0.0, 1.0[
  68. 68. Computing IL at a g-buf pixel 3
      For smooth IL one needs to consider four VPL kernels with centers at t0, t1, t2 and t3.
      stx : sub texel x position [0.0, 1.0[
      sty : sub texel y position [0.0, 1.0[
  69. 69. Computing IL at a g-buf pixel 4
      Center at t0
      [Figure: VPL kernel at t0, with stx and sty axes as on slide 67]
  70. 70. Computing IL at a g-buf pixel 4
      Center at t1
      [Figure: VPL kernels at t0 and t1, with stx and sty axes]
  71. 71. Computing IL at a g-buf pixel 5
      Center at t2
      [Figure: VPL kernels at t0, t1 and t2, with stx and sty axes]
  72. 72. Computing IL at a g-buf pixel 6
      Center at t3
      [Figure: VPL kernels at t0, t1, t2 and t3, with stx and sty axes]
  73. 73. Computing IL at a g-buf pixel 7
      IndirectLight =
          (1.0f-sty) * ((1.0f-stx) * [kernel] + stx * [kernel]) +
          (0.0f+sty) * ((1.0f-stx) * [kernel] + stx * [kernel])
      Each [kernel] is one of the four VPL kernels at t0, t1, t2 and t3, drawn as an image on the slide
      Evaluation of 4 big VPL kernels is slow
  74. 74. Computing IL at a g-buf pixel 8
      [Figure: the four overlapping VPL kernels at t0, t1, t2 and t3, with stx and sty axes]
  75. 75. Computing IL at a g-buf pixel 9
      [Figure: the area covered by the four VPL kernels at t0, t1, t2 and t3, split into blocks B0 to B8, with stx and sty axes]
  76. 76. Computing IL at a g-buf pixel 9<br />IndirectLight =<br /> (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+<br /> (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4<br />Evaluation of 7 small and 1 bigger VPL kernels is fast <br />stx : sub texel x position [0.0, 1.0[<br />sty : sub texel y position [0.0, 1.0[<br />
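The regrouping on this slide can be checked numerically. The following is a minimal Python sketch, not the presenters' shader code. It assumes that B0..B8 form a row-major 3x3 grid of sub-box sums and that each VPL kernel covers a 2x2 block of that grid (t0 top-left, t1 top-right, t2 bottom-left, t3 bottom-right). That layout is inferred from the formula, and the function names are hypothetical.

```python
def kernels_from_boxes(b):
    """Rebuild the four full kernel sums from nine sub-box sums (assumed layout)."""
    k0 = b[0] + b[1] + b[3] + b[4]  # kernel centered at t0 (top-left)
    k1 = b[1] + b[2] + b[4] + b[5]  # kernel centered at t1 (top-right)
    k2 = b[3] + b[4] + b[6] + b[7]  # kernel centered at t2 (bottom-left)
    k3 = b[4] + b[5] + b[7] + b[8]  # kernel centered at t3 (bottom-right)
    return k0, k1, k2, k3


def bilinear_four_kernels(k0, k1, k2, k3, stx, sty):
    """Slow path: bilinear blend of four separately evaluated kernels."""
    top = (1.0 - stx) * k0 + stx * k1
    bottom = (1.0 - stx) * k2 + stx * k3
    return (1.0 - sty) * top + sty * bottom


def bilinear_nine_boxes(b, stx, sty):
    """Fast path: the regrouped expression from the slide."""
    top = ((1.0 - stx) * (b[0] + b[3]) + stx * (b[2] + b[5])) + b[1]
    bottom = ((1.0 - stx) * (b[6] + b[3]) + stx * (b[8] + b[5])) + b[7]
    return (1.0 - sty) * top + sty * bottom + b[4]
```

Under these assumptions both functions return the same value for any stx and sty in [0.0, 1.0[, while the regrouped form evaluates each sub-box only once.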
  77. 77. Insert Tris into 2D Map of Lists of Tris<br />Rasterize blockers of IL from view of light <br />Light<br />2D buffer<br />Scene<br />
  78. 78. Insert Tris into 2D Map of Lists of Tris<br />Rasterize blockers of IL from view of light using a GS and conservative rasterization<br />Light<br />Scene<br />2D buffer of lists of triangles written to by scattering PS<br />eol = End of list (0xffffffff)<br />
  79. 79. Backup Slides PPLL<br />
  80. 80. Linked List Creation (2)<br />For every pixel:<br />Calculate pixel data (color, depth etc.)<br />Retrieve current pixel count from Fragment & Link UAV and increment counter<br />uint uPixelCount = FragmentAndLinkBuffer.IncrementCounter();<br />Swap offsets in Start Offset UAV<br />uint uOldStartOffset;<br />StartOffsetBuffer.InterlockedExchange(<br />PixelScreenSpacePositionLinearAddress, <br />uPixelCount, uOldStartOffset);<br />Add new entry to Fragment & Link UAV<br />FragmentAndLinkBuffer_STRUCT Element;<br />Element.FragmentData = FragmentData;<br />Element.uNext = uOldStartOffset;<br />FragmentAndLinkBuffer[uPixelCount] = Element;<br />
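The three steps above (increment counter, exchange start offset, write element) can be modeled on the CPU. This is a single-threaded Python sketch, not GPU code: the class and method names are hypothetical, and the end-of-list marker 0xFFFFFFFF is an assumption (the diagrams show it as -1). On the GPU the counter increment and the exchange are atomic; here they are plain assignments.

```python
EOL = 0xFFFFFFFF  # assumed end-of-list marker (shown as -1 in the diagrams)


class PerPixelLinkedLists:
    def __init__(self, width, height):
        self.width = width
        # Start Offset buffer: one list head per pixel, initially empty.
        self.start_offset = [EOL] * (width * height)
        # Fragment & Link buffer: (fragment_data, next_offset) entries.
        self.fragments = []

    def store(self, x, y, fragment_data):
        # Models IncrementCounter(): returns the pre-increment count.
        pixel_count = len(self.fragments)
        address = y * self.width + x
        # Models InterlockedExchange(): write new head, fetch old head.
        old_start_offset = self.start_offset[address]
        self.start_offset[address] = pixel_count
        # New element links to the previous head of this pixel's list.
        self.fragments.append((fragment_data, old_start_offset))

    def traverse(self, x, y):
        """Return the fragments at a pixel, most recently stored first."""
        offset = self.start_offset[y * self.width + x]
        result = []
        while offset != EOL:
            data, offset = self.fragments[offset]
            result.append(data)
        return result
```

Because each new element becomes the list head, traversal returns fragments in reverse submission order, which is why a sorting step is needed before blending.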
  81. 81. Linked List Creation (3d)<br />[Diagram: contents of the Start Offset Buffer, the Render Target and the Fragment and Link Buffer at this step; unused entries are shown as -1]<br />
  82. 82. PPLL Pros and Cons<br />Pros:<br />Allows storing of variable number of elements per pixel<br />Only one atomic operation per pixel<br />Using built-in HW UAV counter<br />Good performance<br />Cons:<br />Traversal causes a lot of random access<br />Special case for storing MSAA samples<br />
  83. 83. Alternative Technique: Per-Pixel Array<br />There is a two-pass method that allows the construction of per-pixel arrays<br />First described in 'Sample Based Visibility for Soft Shadows using Alias-free Shadow Maps' (E. Sintorn, E. Eisemann, U. Assarsson, EG 2008)<br />Uses a prefix sum to allocate multiple entries into a buffer<br />Elements belonging to the same location will therefore be contiguous in memory<br />Faster traversal (less random access)<br />Not as fast as PPLL for scenes with 'high' depth complexity in our testing<br />
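The two-pass idea can be illustrated with a generic count, scan and scatter scheme. This Python sketch is an assumption about how such an allocation typically works, not the method from the cited paper; the function name and the (pixel_index, data) input format are hypothetical.

```python
def build_per_pixel_arrays(fragments, pixel_count):
    """fragments: list of (pixel_index, data). Returns (offsets, counts, storage)."""
    fragments = list(fragments)

    # Pass 1: count how many fragments land on each pixel.
    counts = [0] * pixel_count
    for pixel, _ in fragments:
        counts[pixel] += 1

    # Exclusive prefix sum: offsets[p] is where pixel p's array begins.
    offsets = [0] * pixel_count
    running = 0
    for p in range(pixel_count):
        offsets[p] = running
        running += counts[p]

    # Pass 2: scatter each fragment into its pixel's contiguous range.
    cursor = list(offsets)
    storage = [None] * len(fragments)
    for pixel, data in fragments:
        storage[cursor[pixel]] = data
        cursor[pixel] += 1

    return offsets, counts, storage
```

Reading pixel p then means scanning storage[offsets[p] : offsets[p] + counts[p]], a contiguous range rather than a chain of links.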
  84. 84. Rendering Samples: Alternate Method<br />Writes pixels straight into non-MSAA RT<br />Only viable if access to samples is no longer required<br />Samples must be resolved into pixels before writing<br />This affects the manual blending process<br />Samples with the same coverage are blended back-to-front<br />...then averaged (resolved) with other samples before being written out to RT<br />This method can prove slightly faster than per-sample rendering<br />
  85. 85. Rendering Fragments: Alternate Method (2x MSAA)<br />// Retrieve current sample colors from background texture<br /> float4 vCurrentColor0 = BackgroundTexture.Load(int3(vPos.xy, 0));<br /> float4 vCurrentColor1 = BackgroundTexture.Load(int3(vPos.xy, 1)); <br /> // Rendering pixels using SRCALPHA-INVSRCALPHA blending<br /> for (int k = 0; k < nNumPixels; k++)<br /> {<br /> // Retrieve next unblended furthermost pixel<br /> float4 vPixColor = UnpackFromUint(SortedPixels[k].x);<br /> // Retrieve sample coverage<br /> uint uCoverage = UnpackCoverage(SortedPixels[k].y);<br /> // Manual blending between current samples and previous ones<br /> if (uCoverage & (1<<0)) vCurrentColor0.xyz = lerp(vCurrentColor0.xyz, vPixColor.xyz, vPixColor.w);<br /> if (uCoverage & (1<<1)) vCurrentColor1.xyz = lerp(vCurrentColor1.xyz, vPixColor.xyz, vPixColor.w);<br /> }<br /> // Return resolved (averaged) and manually-blended color<br /> return (vCurrentColor0 + vCurrentColor1) * 0.5;<br />} <br />
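The blend-then-resolve logic can be exercised on the CPU with a small Python model. This is a sketch under assumptions: colors are RGB tuples, fragments arrive already sorted back-to-front as (rgb, alpha, coverage_mask), and the function names are hypothetical rather than taken from the slides.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation, like HLSL lerp on float3."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def blend_and_resolve_2x(background_samples, sorted_fragments):
    """Blend fragments into two MSAA samples, then average them.

    background_samples: two RGB tuples (sample 0 and sample 1).
    sorted_fragments: back-to-front list of (rgb, alpha, coverage_mask).
    """
    samples = [tuple(s) for s in background_samples]
    for rgb, alpha, coverage in sorted_fragments:
        for i in range(2):
            # Only blend into samples this fragment actually covers.
            if coverage & (1 << i):
                samples[i] = lerp(samples[i], rgb, alpha)
    # Resolve: average the two samples into one pixel color.
    return tuple((a + b) * 0.5 for a, b in zip(samples[0], samples[1]))
```

A half-transparent red fragment covering only sample 0 over a black background resolves to a quarter-intensity red, since only one of the two samples received the blend.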
