One of the key challenges for the 3D rendering engines powering the next generation of console games is to minimize the resources spent on assets that do not actually contribute to the user experience. More specifically, determining which surfaces are hidden behind (occluded by) other surfaces can be a very hard problem to solve in real time, but solving it typically yields significant performance gains.
Real-time occlusion culling typically requires either a vast amount of manual labor or a computationally intensive pre-processing step. In this talk, I will show how the occluder generation step can actually be considered embarrassingly parallel, and distributed across multiple nodes accordingly. I will also discuss how this model can be further improved.
2. Who are we?
• The only occlusion culling middleware
company in the world
• Founded in 2006
• Based in Helsinki
• 12 people
• Customers: Bungie (Halo), Guerrilla (Killzone),
Remedy (Alan Wake), Bioware (Mass Effect),
CD Projekt (Witcher), ArenaNet (Guild Wars)
and many more
3. We’re going to talk about
• The past
– Brief introduction to occlusion culling
– Traditional methods of visibility computation
• The present
– Umbra’s visibility computation algorithm
– How it can be distributed
• The future
– Challenges of modern games and engines
5. Graphics in games
• Game development process:
– Artists create content
– Engine runtime renders it
• Rendering
– Content consists of objects
– Which consist of triangles
– Which get rendered by the GPU
• Our business: rendering optimization
6. Occlusion culling explained
• ”Culling is the process of removing breeding
animals from a group based on specific criteria.”
(Wikipedia)
• Hidden surface removal: ”Which surfaces do not
contribute to the final rendered image on the
screen?”
• Some popular HSR methods:
– Frustum culling
– Backface culling
– Occlusion culling
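The first two methods in the list above are cheap geometric tests. A minimal Python sketch, assuming counter-clockwise triangle winding and a frustum given as inward-facing planes; the helper names and the `(normal, d)` plane layout are illustrative assumptions, not taken from the talk:

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def is_backfacing(v0, v1, v2, eye):
    """Backface culling: a counter-clockwise triangle is culled when
    its normal points away from the eye position."""
    normal = cross(sub(v1, v0), sub(v2, v0))
    return dot(normal, sub(eye, v0)) <= 0.0

def sphere_outside_frustum(center, radius, planes):
    """Frustum culling: `planes` is a list of (normal, d) with unit
    normals pointing into the frustum, so a point p is inside a plane
    when dot(normal, p) + d >= 0. The bounding sphere is culled when
    it lies entirely behind any one plane."""
    return any(dot(n, center) + d < -radius for n, d in planes)
```

Neither test looks at other geometry in the scene, which is why both are easy. Occlusion culling is the one that has to reason about what is in front of what.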
7. Occlusion culling explained
• Occlusion culling: ”Which surfaces are blocked
(occluded) by other surfaces?”
• Depth buffering is one way to do OC
– Very accurate (i.e. pixel level)
– Ubiquitous on hardware, easy problem to solve
– Occurs very late in the pipeline
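A minimal sketch of the depth test, written in Python for readability (real hardware does this per fragment in the GPU). The buffer layout and the "smaller z is closer" convention are assumptions made for the example:

```python
import math

def make_buffers(width, height):
    """Depth starts at infinity (nothing drawn yet); color starts empty."""
    depth = [[math.inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    return depth, color

def draw_fragment(depth, color, x, y, z, rgb):
    """Keep the fragment only if it is closer than what is already stored.
    Returns True when the fragment was written."""
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = rgb
        return True
    return False
```

By the time `draw_fragment` rejects anything, the object has already been submitted, transformed and rasterized. That is the sense in which the test is accurate but late.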
8. Occlusion culling explained
• Higher-level methods complement depth
buffering nicely
• These cull entire objects, groups of objects or
entire sections of the scene
– Not easy!
• The earlier, the better
10. ”Traditional” way to do OC
• Preprocess:
– Divide scene into cells
– Compute visibility between cells
• Results in a visibility matrix (PVS)
• Runtime:
– Locate the camera
– Do a lookup into the PVS matrix
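The runtime half of the traditional approach can be sketched in a few lines. The uniform 2D grid of cells, the row-major cell numbering and the function names are assumptions for illustration; the slides do not say how cells are laid out:

```python
def locate_cell(position, cell_size, grid_width):
    """Map a camera position (x, z) with non-negative coordinates to
    the index of the grid cell that contains it (row-major order)."""
    cx = int(position[0] // cell_size)
    cz = int(position[1] // cell_size)
    return cz * grid_width + cx

def visible_cells(pvs, camera_cell):
    """pvs[i][j] is True when cell j is potentially visible from cell i.
    The lookup is just reading one row of the matrix."""
    return [j for j, vis in enumerate(pvs[camera_cell]) if vis]
```

The lookup is trivial; all of the cost sits in the preprocess that fills in `pvs`.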
18. Problem?
• Solving visibility between cells is very difficult
– E.g. solving analytically is actually O(n^4)
• Global operation by nature
• Doesn’t play well with dynamic scenes
– Worst case: a change in one cell requires
recomputation of the entire matrix
20. Welcome to the 2010s
• Modern game worlds are huge
• So it’d be cool if you didn’t need the entire
scene in memory, ever
• It’d be even cooler if the heavy lifting could be
distributed. Or sent to the Cloud™
• Buildings collapse. Things change.
21. The Umbra approach
• Don’t actually compute visibility for the entire
scene
• Instead, process geometry to create a
data structure to solve visibility in the runtime
• Portal culling in the runtime
22. Data generation
• Data = portal graph
• Generate local graphs individually for reasonably
sized geometry chunks (tiles), in parallel
• Combine the results into a global portal graph
that can be quickly traversed
• Solve visibility quickly in the runtime using this
graph
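To make "solve visibility using this graph" concrete, here is a heavily simplified portal traversal in Python. It is a sketch under stated assumptions, not Umbra's data structure or algorithm: cells are named by strings, and each portal is reduced to a hypothetical angular interval (in degrees) as seen from the camera. Walking through a portal narrows the view to the intersection of the current view and the portal; an empty intersection means nothing beyond that portal can be seen.

```python
def portal_visible_cells(portals, start_cell, view=(-45.0, 45.0)):
    """portals: dict mapping a cell to a list of (neighbor, lo, hi),
    where (lo, hi) is the portal's angular extent seen from the camera.
    Returns the set of cells reachable through a non-empty view."""
    visible = {start_cell}
    stack = [(start_cell, view[0], view[1], frozenset([start_cell]))]
    while stack:
        cell, lo, hi, path = stack.pop()
        for neighbor, p_lo, p_hi in portals.get(cell, []):
            if neighbor in path:
                continue  # do not walk back along the current path
            n_lo, n_hi = max(lo, p_lo), min(hi, p_hi)
            if n_lo >= n_hi:
                continue  # portal is outside the remaining view
            visible.add(neighbor)
            stack.append((neighbor, n_lo, n_hi, path | {neighbor}))
    return visible
```

The work done is proportional to the part of the graph that is actually seen, rather than to the size of the whole scene.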
23. Will this work?
• Portal generation
– Is very hard, but possible to do automatically
– Only local geometry needed
→ Pretty much an embarrassingly parallel problem
• Runtime
– Not as simple as a PVS lookup, but still quite fast
30. What did we do here?
• Essentially a map-reduce
– Split scene into distributable tiles
– Generate local portal graph for each tile
– Combine results, link global portal graph
[Diagram: Scene → Map → Tile 0 ... Tile n → Portals 0 ... Portals n → Reduce → Global portal graph → Query (Runtime) → Visible objects]
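The split / generate / combine steps above can be sketched as follows. This is a toy model under loud assumptions: tiles form a single left-to-right strip ordered by id, each tile is a small character grid where "." is open space and "#" is solid, and a "portal" is simply an open position on a tile's left or right border. None of this is the actual portal generation algorithm, and a thread pool stands in for whatever distributes work across nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def local_portals(tile):
    """Map step: needs only this tile's own geometry."""
    rows = tile["rows"]
    return {
        "id": tile["id"],
        "left": {y for y, row in enumerate(rows) if row[0] == "."},
        "right": {y for y, row in enumerate(rows) if row[-1] == "."},
    }

def link_tiles(results):
    """Reduce step: link neighbouring tiles wherever both sides of the
    shared border are open at the same row."""
    graph = {r["id"]: set() for r in results}
    ordered = sorted(results, key=lambda r: r["id"])
    for a, b in zip(ordered, ordered[1:]):
        if a["right"] & b["left"]:
            graph[a["id"]].add(b["id"])
            graph[b["id"]].add(a["id"])
    return graph

def build_global_graph(tiles):
    """Run the map step for all tiles concurrently, then reduce."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(local_portals, tiles))
    return link_tiles(results)
```

Because `local_portals` never reads another tile, the map step can run on any number of workers without coordination; only `link_tiles` needs to see all results.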
32. Turns out...
• Even the initial ”map” is too much for large
game worlds
• A global graph of a vast world is too expensive
in the runtime
• You need to support multiple versions of some
chunks for dynamic content
– Quite a combinatorial problem
→ Next-gen games require an even better
solution!
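One way to read "combinatorial" here, offered as an interpretation rather than a figure from the talk: if every combination of tile versions needs its own prebuilt global graph, the number of graphs is the product of the per-tile version counts, while storing versions per tile only needs their sum.

```python
import math

def global_graph_variants(versions_per_tile):
    """Prebuilt global graphs needed if each combination gets its own."""
    return math.prod(versions_per_tile)

def per_tile_variants(versions_per_tile):
    """Per-tile results needed if tiles are stored independently."""
    return sum(versions_per_tile)
```

For example, ten tiles with two states each (say, intact and collapsed) give `global_graph_variants([2] * 10) == 1024` global graphs, against `per_tile_variants([2] * 10) == 20` per-tile results.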
33. So we did something like this
[Diagram: Tile 0 ... Tile n → Portals 0 ... Portals n → Combine → Graph A, Graph B, ... → Query → Visible objects; labelled "Runtime"]
34. Got rid of ”map”
[Diagram: Tile 0 ... Tile n → Portals 0 ... Portals n → Combine → Graph A, Graph B, ... → Query → Visible objects; labelled "Runtime"]
35. Split up ”reduce”, moved to runtime
[Diagram: Tile 0 ... Tile n → Portals 0 ... Portals n → Combine → Graph A, Graph B, ... → Query → Visible objects; labelled "Runtime"]
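A rough sketch of what "split up reduce, moved to runtime" could look like, assuming per-tile portal data is reduced to "which neighbouring tiles can be reached from this tile". The class and method names are hypothetical. Graphs are combined lazily for just the tiles a query needs, and swapping one tile's version invalidates only the combined graphs that contain it.

```python
class RegionGraphs:
    """Combines per-tile portal data into small graphs on demand."""

    def __init__(self, tile_portals):
        # tile id -> set of neighbouring tile ids it has a portal towards
        self.tile_portals = {t: set(p) for t, p in tile_portals.items()}
        self._cache = {}

    def graph_for(self, region):
        """Combine step, run at query time for one group of tiles.
        A link is kept only if both tiles agree that it exists."""
        key = frozenset(region)
        if key not in self._cache:
            self._cache[key] = {
                t: {n for n in self.tile_portals[t]
                    if n in key and t in self.tile_portals[n]}
                for t in key
            }
        return self._cache[key]

    def replace_tile(self, tile, portals):
        """Swap in a new version of one tile (e.g. a collapsed building)
        and drop only the combined graphs that include it."""
        self.tile_portals[tile] = set(portals)
        self._cache = {k: g for k, g in self._cache.items() if tile not in k}
```

With tiles `{0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}`, combining `[0, 1]` and `[2, 3]` yields two independent graphs. Replacing tile 3 rebuilds only the second one, and no global graph is ever built.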