OIT and Indirect Illumination Using DX11 Linked Lists
Editor's notes
A node is composed of an Element and a Link.
Uses a Pixel Shader 5.0 to store fragments into a linked list: we need triangles rasterized as normal through the rendering pipeline. Briefly describe UAVs. Uses atomic operations: only one such operation per pixel is used (although it has feedback).
The “Fragment & Link” buffer actually only uses write access. FragmentData is typically the fragment color, but can also include transparency, depth, a blend mode ID, etc. The buffer must be large enough to store all fragments: allocation must take into account the maximum number of elements to store. E.g. an “average overdraw per pixel” figure can be used for allocation: width * height * averageOverdraw * sizeof(Element & Link). The fragment data structure should be kept minimal to reduce memory requirements (and improve performance in the process by performing fewer memory operations).
The StartOffsetBuffer is a 1D buffer despite being screen-sized; this is because atomic operations are not supported on 2D UAV textures. It is initialized to a magic value (-1): by default the linked lists are therefore empty, since every head pointer indicates the end of the list.
The “viewport” is used to illustrate the incoming triangle (as no render target is bound). Here the fragment data is just the pixel color, for simplicity. The Start Offset Buffer is initialized to the “magic value” (-1).
We're going to exchange the offset in the StartOffsetBuffer with the current value of the counter (before incrementation), because the current value of the counter gives us the offset (address) of the fragment we're storing.
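The counter-exchange step above can be sketched on the CPU. This is a minimal single-threaded simulation of the GPU insertion, not the slide's actual HLSL; all names (FragmentAndLink, gStartOffset, the 4x4 viewport, the overdraw figure) are illustrative assumptions.

```c
/* CPU sketch of per-pixel linked-list (PPLL) fragment insertion. */
#include <stdint.h>

#define W 4
#define H 4
#define AVG_OVERDRAW 8
#define MAX_FRAGMENTS (W * H * AVG_OVERDRAW)  /* allocation rule from above */

typedef struct { uint32_t color; float depth; int32_t next; } FragmentAndLink;

static FragmentAndLink gFragments[MAX_FRAGMENTS];
static int32_t  gStartOffset[W * H];  /* head pointers; -1 = empty list */
static uint32_t gCounter;             /* stands in for the UAV counter */

static void ppll_reset(void) {
    gCounter = 0;
    for (int i = 0; i < W * H; ++i) gStartOffset[i] = -1;  /* magic value */
}

/* Mimics: newIndex = buffer.IncrementCounter();
           InterlockedExchange(head, newIndex, oldHead);  */
static void store_fragment(int x, int y, uint32_t color, float depth) {
    uint32_t newIndex = gCounter++;         /* value BEFORE incrementation */
    if (newIndex >= MAX_FRAGMENTS) return;  /* buffer full: drop fragment */
    int32_t oldHead = gStartOffset[y * W + x];
    gStartOffset[y * W + x] = (int32_t)newIndex;
    gFragments[newIndex] = (FragmentAndLink){ color, depth, oldHead };
}
```

Note how the new fragment always becomes the list head and links to the previous head, so each per-pixel list ends up ordered most-recent-first.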
Assuming a rasterization order of left to right for the green triangle
Assuming a rasterization order of left to right for the yellow triangle
The current pixel count is the TOTAL pixel count, not the pixel count per pixel location. Append buffers are not used because they require the DXGI_FORMAT_R32_UNKNOWN format. The multiply by 4 is needed because the start offset address is in bytes, not UINTs (1 UINT = 4 bytes).
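The "times 4" address calculation is simple arithmetic; a sketch with assumed names:

```c
/* RWByteAddressBuffer addresses are in bytes; each start-offset entry is a
   32-bit uint, hence the multiply by 4. Function name is illustrative. */
#include <stdint.h>

static uint32_t start_offset_byte_address(uint32_t x, uint32_t y,
                                          uint32_t renderTargetWidth) {
    return 4u * (y * renderTargetWidth + x);
}
```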
No “4x” is needed to calculate the offset address here, because StartOffsetBufferSRV is declared with the uint type.
There is no point storing transparent fragments that are hidden by opaque geometry, i.e. those that fail the depth test.
You may also get away with packing color and depth into the same uint: this would make the element size a nice 64 bits. Deferred rendering engines may wish to store the same properties in the Fragment & Link structure as for opaque pixels, but watch out for the memory and performance impact of a large structure! If using different blend modes per pixel, you would need to store a blend mode ID in the fragment data. Depth is stored as a uint; we'll explain why later on.
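One way to realize the 64-bit element mentioned above is to squeeze color and depth into a single 32-bit uint, leaving 32 bits for the link. The 16/16 split below (R5G6B5 color + 16-bit depth) is an assumed trade-off for illustration, not the slide's actual layout.

```c
/* Sketch: pack R5G6B5 color (low 16 bits) and quantized depth (high 16
   bits) into one uint, so data + 32-bit link = 64 bits per element. */
#include <stdint.h>

static uint32_t pack_color_depth(float r, float g, float b, float depth) {
    uint32_t c = ((uint32_t)(r * 31.0f + 0.5f) << 11) |  /* 5 bits red   */
                 ((uint32_t)(g * 63.0f + 0.5f) << 5)  |  /* 6 bits green */
                  (uint32_t)(b * 31.0f + 0.5f);          /* 5 bits blue  */
    uint32_t d = (uint32_t)(depth * 65535.0f);           /* depth in [0,1] */
    return (d << 16) | c;
}
```

The obvious cost is precision: 16-bit depth may not be enough to sort fragments reliably in deep scenes, which is why a larger per-fragment depth (see later notes) can be preferable.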
The default DX11 pixel pipeline executes the pixel shader before the depth test. [earlydepthstencil] allows the depth test to be performed *before* the pixel shader. Normally the HW will try to do this for you automatically, but a PS writing to UAVs cannot benefit from this treatment (a bit like alpha testing). This allows performance savings and rendering correctness: fewer fragments to store = good!
Sparse memory accesses = slow! This is especially true because the original rendering order can mean that fragments belonging to the same screen coordinates are (very) far apart from each other.
Under-blending can be used instead. No need for a background input then; the actual HW blending mode is enabled. The blending mode to use with under-blending is ONE, SRCALPHA.
SortedPixels[].x stores color, SortedPixels[].y stores depth. Use a (nNumPixels >= MAX_SORTED_PIXELS) test to prevent overflow. The actual sort algorithm used can be anything (e.g. insertion sort, etc.).
This happens after sorting: vCurrentColor.xyz = vFragColor.w * vFragColor.xyz + (1.0 - vFragColor.w) * vCurrentColor.xyz;
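The sort-then-blend resolve can be sketched on the CPU. This is a simplified stand-in for the shader loop, assuming an insertion sort and the back-to-front blend formula quoted above; the Frag struct and function names are illustrative.

```c
/* CPU sketch of the resolve pass: sort fragments nearest-first, then
   blend from the farthest fragment toward the nearest one. */
#include <stddef.h>

typedef struct { float r, g, b, a; float depth; } Frag;

static void sort_nearest_first(Frag *f, size_t n) {  /* insertion sort */
    for (size_t i = 1; i < n; ++i) {
        Frag key = f[i];
        size_t j = i;
        while (j > 0 && f[j - 1].depth > key.depth) { f[j] = f[j - 1]; --j; }
        f[j] = key;
    }
}

static void blend_back_to_front(const Frag *f, size_t n, float color[3]) {
    for (size_t i = n; i > 0; --i) {  /* farthest fragment first */
        const Frag *p = &f[i - 1];
        /* current = a * fragColor + (1 - a) * current  (slide formula) */
        color[0] = p->a * p->r + (1.0f - p->a) * color[0];
        color[1] = p->a * p->g + (1.0f - p->a) * color[1];
        color[2] = p->a * p->b + (1.0f - p->a) * color[2];
    }
}
```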
The problem with OIT via PPLL and MSAA is that fragment sorting and blending must happen on a per-sample basis, which means a lot of storage and processing. Storing individual samples in linked lists requires a huge amount of memory: since most pixels are not edge pixels, they would get all their samples covered. Unfortunately, schemes mixing edge and non-edge pixels proved difficult to implement and did not generate satisfactory results. SV_COVERAGE is DX11 only.
Depth is now packed into 24 bits: this is why depth is stored as a uint.
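A plausible layout for that packed uint, sketched here as an assumption: 24 bits of quantized depth in the high bits and the 8-bit MSAA coverage mask in the low bits.

```c
/* Assumed packing: [ depth:24 | coverage:8 ] in one 32-bit uint. */
#include <stdint.h>

static uint32_t pack_depth_coverage(float depth, uint32_t coverage) {
    uint32_t d24 = (uint32_t)(depth * 16777215.0f);  /* scale by 2^24 - 1 */
    return (d24 << 8) | (coverage & 0xFFu);
}

static uint32_t coverage_of(uint32_t packed) { return packed & 0xFFu; }

static float depth_of(uint32_t packed) {
    return (float)(packed >> 8) / 16777215.0f;
}
```

A nice side effect of putting depth in the high bits is that comparing the packed uints directly still orders fragments by depth, so the sort can work on the packed values.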
Blue triangle covers one depth sample
Assuming a rasterization order of left to right for the yellow triangle
Traversal causes a lot of random accesses: elements belonging to the same pixel location can be far apart in the linked list!
Not as fast: probably because multiple passes are involved.
Only viable if access to individual samples is no longer required (e.g. for HDR-correct tone mapping). Sorting is unaffected because stored samples share the same depth. Samples with the same coverage are blended back-to-front: this means blending happens on a per-sample basis, as it normally would.
Code illustration for 2 samples. SortedPixels[] contains the pixel color in .x and depth+coverage in .y.
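The per-sample blend described above can be sketched for 2 samples as follows. Each sorted pixel carries a coverage mask, and a given sample only blends (back to front) the fragments whose mask covers it; the struct and names are illustrative, not the slide's actual code.

```c
/* CPU sketch of the per-sample MSAA resolve with coverage masks. */
#include <stdint.h>

typedef struct { float r, g, b, a; uint32_t coverage; } SortedPixel;

/* px[] is assumed already sorted nearest-first; out[] starts as the
   background color for this sample. */
static void resolve_sample(const SortedPixel *px, int n, int sample,
                           float out[3]) {
    for (int i = n - 1; i >= 0; --i) {               /* back to front */
        if (px[i].coverage & (1u << sample)) {       /* covers this sample? */
            out[0] = px[i].a * px[i].r + (1.0f - px[i].a) * out[0];
            out[1] = px[i].a * px[i].g + (1.0f - px[i].a) * out[1];
            out[2] = px[i].a * px[i].b + (1.0f - px[i].a) * out[2];
        }
    }
}
```

For a fully covered (non-edge) pixel both samples walk the same fragments and produce the same color, which is why the optimization of blending same-coverage samples together works.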