/* Further Reading */
- Steven Tovey & Stephen McAuley, “Parallelized Light Pre-Pass Rendering with the Cell Broadband Engine”, GPU Pro
- Stephen McAuley & Steven Tovey, “A Bizarre Way to do Real-Time Lighting”, Develop in Liverpool 2009
Damage data consists of a position offset, a normal offset, and scratch and dent levels.
We stream the damage data in 16KB chunks, because 16KB is the maximum size of a single MFC DMA transfer.
Don’t want lumpiness if parallel read/write.
Rim lighting here.
Tyres use low-power specular.
Brake lights.
Used on alloys for low-power specular.
Used for the scratch lighting, again for low-power specular.
We need to look at the pipeline of the graphics card to work out how we can move more of our GPU work onto the SPUs. Two main areas we can insert data – either through vertices at the top, or textures at the fragment stage. Sadly, we can’t hook into the rasteriser, which would be ace.
Of course, these look-up textures end up being screen-space look-up textures, which means some sort of deferred rendering…
I have a problem with forward rendering. I think most people traditionally design their engine this way, especially on 360 and PC. But all the work is done in the fragment shader, so when you port to the PS3, with its slower fragment shader unit, your whole game runs slower. Although you can use EDGE to speed up your vertex processing and your post-processing, both only step around the core of the issue: you’re fragment-shader bound, and there’s no easy way of solving that.
We found a light pre-pass renderer suited our goals pretty well. It’s a halfway house between traditional and deferred rendering.
We render a rear-view mirror, cube map reflections for the cars and planar reflections for the road and water in addition to the pre-pass and main views. Multi-threaded rendering helps a lot!
Deferring by a frame isn’t ideal. Either you just use the previous frame’s lighting buffer for the next frame, with obvious artefacts (especially if you’re doing a racing game like us), or you have to add a frame of latency.

I don’t think adding frames of latency is ideal, especially for cross-platform games. If you add a frame of latency on the PS3, are you going to do the same on the 360? If you’re not, then gameplay could be different between the two platforms.

I’m not saying this is something I’d never do; in lots of circumstances you’ll have to. But avoid it where you can, and this is one instance.
If we wanted to take this further for future projects, we could add shadow maps in at the start of our pipeline, then do an exponential blur on the SPUs whilst we’re rendering the pre-pass geometry…
This is real multi-threaded graphics processing, with multiple processors doing different jobs at the same time. Therefore, architect your engine accordingly!

Having small graphics jobs allows you to spread the workload. Obviously, not everything can be done like this. Some things will most likely have to be deferred a frame, adding a frame of latency, such as post-processing or MLAA. But there are lots of smaller tasks that don’t have to be, from SSAO to blurring exponential shadow maps. You have to find things to parallelise with!

Think about the data again! Rendering has lots of stages, each with its own inputs and outputs. What could sync with what?
We combine the normals and depth into one 32-bit buffer. This is an optimisation as it halves the inputs into the SPU program, but also allows us to keep the depth buffer in local memory which is good for performance.
The first step, but the biggest stumbling block!
No blocking! Our jobs are optionally dependent on a label.
To be accurate, we have a jump-to-self per SPU.
When we load in a tile, we quickly iterate over every pixel and calculate minimum and maximum depth.

No need to use a stencil buffer to cull out the sky, as depth min and max will do it for us. (Remember, we don’t have the stencil buffer as we’re not using the depth buffer!)

This technique is really useful for a variety of things, including depth of field (check out Matt Swoboda’s optimisation in PhyreEngine).
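A scalar sketch of that per-tile pass (the real version runs on the SPU, vectorised; the assumption here is a 16-bit depth in the low half of each packed pixel, and the function name is ours):

```c
#include <stdint.h>

/* Scan a tile of packed normal/depth pixels and return the depth bounds.
   A tile whose min depth equals the far plane is all sky and can be
   skipped entirely -- no stencil buffer needed. */
static void tile_depth_bounds(const uint32_t *tile, int pixel_count,
                              uint16_t *out_min, uint16_t *out_max)
{
    uint16_t dmin = 0xffff, dmax = 0;
    for (int i = 0; i < pixel_count; ++i) {
        uint16_t d = (uint16_t)(tile[i] & 0xffff); /* depth in low 16 bits */
        if (d < dmin) dmin = d;
        if (d > dmax) dmax = d;
    }
    *out_min = dmin;
    *out_max = dmax;
}
```

The same bounds can then reject lights whose depth range doesn’t intersect the tile at all.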
This is actually the easiest bit: just write the lighting equations in intrinsics! However, they really have to be fast, otherwise performance just won’t be good enough. Next up are some helpful tips for optimisation.
So we triple buffer. It turns out we have plenty of local store left, as it’s a simple job and our job size was relatively small. Another reason to write in si intrinsics, though, as it keeps the code size down!
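The triple-buffering pattern, sketched in plain C with memcpy standing in for the MFC DMA calls (all names here are ours; on the SPU the gets and puts would be asynchronous mfc_get/mfc_put with tag waits, so the transfers genuinely overlap the processing):

```c
#include <stdint.h>
#include <string.h>

#define TILE_WORDS 16  /* arbitrary tile size for the sketch */

/* memcpy stands in for DMA; on the SPU these would be async and we'd
   wait on DMA tag groups instead of completing immediately. */
static void dma_get(uint32_t *ls, const uint32_t *mem, int n) { memcpy(ls, mem, n * 4); }
static void dma_put(uint32_t *mem, const uint32_t *ls, int n) { memcpy(mem, ls, n * 4); }

static void light_tile(uint32_t *tile, int n)
{
    for (int i = 0; i < n; ++i) tile[i] += 1; /* placeholder for the lighting */
}

void run_tiles(uint32_t *tiles, int tile_count)
{
    /* Three local-store buffers: incoming, processing, outgoing. */
    uint32_t buf[3][TILE_WORDS];
    for (int i = 0; i < tile_count; ++i) {
        uint32_t *cur = buf[i % 3];  /* cycle through the three buffers */
        /* With real async DMA, the get for tile i+1 and the put for
           tile i-1 would be in flight while we process tile i. */
        dma_get(cur, tiles + i * TILE_WORDS, TILE_WORDS);
        light_tile(cur, TILE_WORDS);
        dma_put(tiles + i * TILE_WORDS, cur, TILE_WORDS);
    }
}
```

Cycling the buffer index modulo 3 is the whole trick: a tile is never read into the buffer that’s still being written out.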
Just like Ste said earlier, this is a big win. Probably a good rule of thumb for most SPU jobs!
When kicking SPU jobs off on the RSX, you have to be careful as you can interfere with jobs the PPU is running. This is where sync-free systems are a win! We’re lucky as we just avoided the physics, but also, running only on 3 SPUs was a good idea so we had 3 free for other tasks. See how quick the rendering is even though we’re rendering so many views!