Leszek Godlewski
Programmer, Nordic Games
Gamedev-grade debugging
Source: http://igetyourfail.blogspot.com/2009/01/reaching-out-tale-of-failed-skinning.html
Nordic Games GmbH
● Started in 2011 as a sister company to Nordic Games
Publishing (We Sing)
● Base IP acquired from JoWooD and DreamCatcher
(SpellForce, The Guild, Aquanox, Painkiller)
● Initially focusing on smaller, niche games
● Acquired THQ IPs in 2013 (Darksiders, Titan Quest, Red
Faction, MX vs. ATV)
● Now shifting towards being a production company with
internal devs
● Since fall 2013: internal studio in Munich, Germany
(Grimlore Games)
Who is this guy?
Leszek Godlewski
Programmer, Nordic Games (early 2014 – Nov 2014)
– Linux port of Darksiders
Freelance Programmer (Sep 2013 – early 2014)
– Linux port of Painkiller Hell & Damnation
– Linux port of Deadfall Adventures
Generalist Programmer, The Farm 51 (Mar 2010 – Aug 2013)
– Painkiller Hell & Damnation, Deadfall Adventures
Agenda
How is gamedev different?
Bug species
Case studies
Conclusions
How is gamedev different?
(Figure: the game loop. Start → Exit? Yes → End; No → Update → Draw → back to Exit?)
33 milliseconds
How much time you have to get shit done™
– 30 Hz → 33⅓ ms per frame
– 60 Hz → 16⅔ ms per frame
(Figure: components of a game project. Editor: Level tools, Asset tools. Engine: Physics, Rendering, Audio, Network, Platform, Input. Network back-end. Game: UI, Logic, AI.)
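The game-loop flowchart and the frame budget above can be sketched in C++ as a minimal loop. This is an illustration under assumptions, not engine code: `RunGameLoop`, `FrameBudgetMs` and the three callbacks are hypothetical names, and real engines add fixed-step simulation, vsync and frame pacing on top.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Frame budget in milliseconds for a target update rate (30 Hz -> 33.33 ms).
constexpr double FrameBudgetMs(double hz) { return 1000.0 / hz; }

// Minimal loop shaped like the flowchart: Exit? -> Update -> Draw -> repeat.
// shouldExit/update/draw are hypothetical stand-ins for engine subsystems.
// Returns how many frames took longer than the budget.
int RunGameLoop(double targetHz,
                const std::function<bool()>& shouldExit,
                const std::function<void(double)>& update,
                const std::function<void()>& draw) {
    using Clock = std::chrono::steady_clock;
    const double budgetMs = FrameBudgetMs(targetHz);
    int overBudget = 0;
    auto previous = Clock::now();
    while (!shouldExit()) {
        const auto frameStart = Clock::now();
        // Seconds since the previous frame started, handed to game logic.
        const double dt = std::chrono::duration<double>(frameStart - previous).count();
        previous = frameStart;
        update(dt);
        draw();
        const double frameMs =
            std::chrono::duration<double, std::milli>(Clock::now() - frameStart).count();
        if (frameMs > budgetMs) ++overBudget;  // this frame missed its deadline
    }
    return overBudget;
}
```

Everything a frame does, across every subsystem in the figure, has to fit between two iterations of that `while`.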
Interdisciplinary working environment
Designers
– Game, Level, Quest, Audio…
Artists
– Environment, Character, 2D, UI, Concept…
Programmers
– Gameplay, Engine, Tools, UI, Audio…
Writers
Composers
Actors
Producers
PR & Marketing Specialists
…
All of the above form tightly woven teams
Severe, fixed hardware constraints
Main reason for extensive use of native code
Different trade-offs
(Figure: Robustness, Cost, Performance and Fun/Coolness, weighted differently in Enterprise/B2B/webdev versus gamedev)
Indeterminism & complexity
Leads to poor testability
– Parts make no sense in isolation
– What exactly is correct?
– Performance regressions?
Source: https://github.com/memononen/recastnavigation
Aversion to general software engineering
Modelling
Object-Oriented Programming
Design patterns
C++ STL
Templates in general
…
Bug species
Source: http://benigoat.tumblr.com/post/100306422911/press-b-to-crouch
General programming bugs
Memory access violations
Memory stomping/buffer overflows
Infinite loops
Uninitialized variables
Reference cycles
Floating point precision errors
Out-Of-Memory/memory fragmentation
Memory leaks
Threading errors
Bad maths
Incorrect transform order
– Matrix multiplication not commutative
– AB ≠ BA
Incorrect transform space
Source: http://leadwerks.com/wiki/index.php?title=TFormQuat
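A small C++ check of AB ≠ BA with 2D homogeneous matrices, assuming column vectors (so in `A * B` the transform B is applied first). `Translation`, `Rotation` and the other helpers are illustrative names, not a particular engine's API.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;  // homogeneous 2D point (x, y, 1)

// Standard matrix product r = a * b.
Mat3 Multiply(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Applies m to a column vector: r = m * v.
Vec3 Transform(const Mat3& m, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

Mat3 Translation(double tx, double ty) { return {{{1, 0, tx}, {0, 1, ty}, {0, 0, 1}}}; }

// Counter-clockwise rotation about the origin.
Mat3 Rotation(double radians) {
    const double c = std::cos(radians), s = std::sin(radians);
    return {{{c, -s, 0}, {s, c, 0}, {0, 0, 1}}};
}
```

Rotating (1, 0) by 90° and then translating by (10, 0) lands at (10, 1); translating first and then rotating lands at (0, 11).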
Temporal bugs
Incorrect update order
– for (int i = 0; i < entities.size(); ++i)
entities[i].update();
Incorrect interpolation/blending
– Bad alpha term
– Bad blending mode (additive/modulate)
Deferred effects
– After n frames
– After n times an action happens
– n may be random, indeterministic
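To make "bad alpha term" and "bad blending mode" concrete, here is a sketch of the three classic colour blends in C++ (all names hypothetical). Swapping `alpha` and `1 - alpha` inverts a fade. Picking additive where modulate was meant changes which input acts as the identity: black for additive, white for modulate.

```cpp
#include <cassert>

struct Color { float r, g, b; };

// Interpolation: alpha = 0 keeps dst, alpha = 1 yields src.
Color AlphaBlend(Color src, Color dst, float alpha) {
    return {src.r * alpha + dst.r * (1.0f - alpha),
            src.g * alpha + dst.g * (1.0f - alpha),
            src.b * alpha + dst.b * (1.0f - alpha)};
}

// Additive: only brightens; a black source leaves dst unchanged.
Color Additive(Color src, Color dst) {
    return {src.r + dst.r, src.g + dst.g, src.b + dst.b};
}

// Modulate (multiply): only darkens; a white source leaves dst unchanged.
Color Modulate(Color src, Color dst) {
    return {src.r * dst.r, src.g * dst.g, src.b * dst.b};
}
```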
Graphical glitches
Incorrect render state
Shader code bugs
Precision
Source: http://igetyourfail.blogspot.com/2009/01/visit-lake-fail-this-weekend.html
Content bugs
Incorrect scripts
Buggy assets
Source: http://www.polycount.com/forum/showpost.php?p=1263124&postcount=10466
Worst part?
Most cases are two or more of the aforementioned,
intertwined
Case studies
Most material captured by
Video settings not updating
Incorrect weapon after demon mode foreshadowing
Post-death sprint camera anim
Corpses teleported on death
In normal gameplay, pawns have simplified movement
– Sweep the actor's collision primitive through the world
– Slide along slopes, stop against walls
Source: http://udn.epicgames.com/Three/PhysicalAnimation.html
Corpses teleported on death
Upon death, pawns switch to physics-based movement
(ragdoll)
Source: http://udn.epicgames.com/Three/PhysicalAnimation.html
Corpses teleported on death (cont.)
Physics bodies have separate state from the game actor
– The actor does not drive the physics bodies unless requested
– If the actor is driven by the physics simulation, its location is synchronized to that of the hips bone's body
Source: http://udn.epicgames.com/Three/PhysicalAnimation.html
Corpses teleported on death (cont.)
Idea: breakpoint in FarMove()?
– A single choke point, since it is where the world octree gets updated
– Function gets called a gazillion times per frame
– Terrible noise
Breakpoint condition?
– Teleport from arbitrary point A to arbitrary point B
– Distance?
Breakpoint sequence?
– Break on death instead
– When breakpoint hit, break in FarMove()
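The "break on death, then break in FarMove()" sequence can also be baked into code as an armed, conditional check. This is a sketch under assumptions: `OnPawnDied`, the global flags and the 500-unit threshold are hypothetical, and the check only logs where a real setup would trap into the debugger.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

struct Vec { float x, y, z; };

// Hypothetical debug state, armed on death so that the ordinary FarMove()
// calls made every frame before that are ignored.
static bool g_armTeleportCheck = false;
static int g_teleportsCaught = 0;
constexpr float kTeleportThreshold = 500.0f;  // assumed distance in world units

void OnPawnDied() { g_armTeleportCheck = true; }

// Stand-in for a FarMove()-style function that relocates an actor.
void FarMove(Vec& location, const Vec& destination) {
    if (g_armTeleportCheck) {
        const float dx = destination.x - location.x;
        const float dy = destination.y - location.y;
        const float dz = destination.z - location.z;
        if (std::sqrt(dx * dx + dy * dy + dz * dz) > kTeleportThreshold) {
            ++g_teleportsCaught;
            // A real setup would trigger a debugger break here instead of logging.
            std::fprintf(stderr, "Suspicious teleport after death\n");
        }
    }
    location = destination;
}
```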
Corpses teleported on death (cont.)
Cause: physics body driving the actor with out-of-date
state
Fix: request physics body state synchronization to
animation before switching to ragdoll
Weapons floating away from the player
Extremely rare, only encountered on consoles
– Reproduction rate of roughly 1 in 50 attempts
– And never on developer machines
Player pawn in a special state for the rollercoaster ride
– Many things could go wrong
Lacking a reliable repro, sprinkled the code with debug logs
Weapons floating away from the player (cont.)
Cause: incorrect update order
– for (int i = 0; i < entities.size(); ++i)
entities[i].update();
– Player pawn forced to update after rollercoaster car
– Possible for weapons to be updated before player pawns
Fix: enforce weapon update after player pawns
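One way to enforce such an order is to give each entity an explicit tick group and update the groups in sequence, instead of relying on list order. The groups and names below are hypothetical. This C++ sketch illustrates the idea and is not the fix as implemented in the game.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tick groups: every entity in a later group updates after
// all entities in earlier groups, regardless of position in the list.
enum class TickGroup { Vehicles = 0, Pawns = 1, Weapons = 2 };

struct Entity {
    std::string name;
    TickGroup group;
};

// Returns the order in which the entities would be updated.
std::vector<std::string> UpdateAll(std::vector<Entity> entities) {
    // stable_sort keeps the original relative order inside each group.
    std::stable_sort(entities.begin(), entities.end(),
                     [](const Entity& a, const Entity& b) { return a.group < b.group; });
    std::vector<std::string> order;
    for (const Entity& e : entities) order.push_back(e.name);  // e.update() in an engine
    return order;
}
```

With this scheme the rollercoaster car always moves first, then the player pawn riding it, then the weapons attached to the pawn.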
Characters with “rapiers”
UE3 has “content cooking” as part of its game build pipeline
– Redistributable builds are “cooked” builds
Artifact appears only in cooked builds
Characters with “rapiers” – cont.
Logs contained assertions for “out-of-bounds vertices”
Mesh vertex compression scheme
– 32-bit float → 16-bit short int (~50% savings)
– Find bounding sphere for all vertices
– Normalize all vertices to said sphere radius
– Map [-1; 1] floats to [-32768; 32767] 16-bit integers
Assert condition
– for (int i = 0; i < 3; ++i)
assert(v[i] >= -1.f && v[i] <= 1.f && "Out-of-bounds vertex!");
Characters with “rapiers” – cont.
v[i] was NaN
– Interesting property of NaN: every comparison involving it is false (except !=)
– Even comparing it with itself
● float f = nanf("");
bool b = (f == f);
// b is false
How did it get there?!
Tracked the NaN all the way down to the raw engine
asset!
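Here is a sketch of the per-component mapping described above, assuming components were already normalized to the bounding sphere. `QuantizeComponent` and its result enum are hypothetical. The point is that testing `std::isnan` first separates NaN from a genuinely out-of-range value, which a lone range assertion reports identically.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class QuantizeResult { Ok, OutOfBounds, NotANumber };

// Maps a normalized component in [-1, 1] to a 16-bit integer in [-32768, 32767].
QuantizeResult QuantizeComponent(float v, std::int16_t& out) {
    // Must come first: NaN fails every range comparison and would otherwise
    // be reported as "out of bounds".
    if (std::isnan(v)) return QuantizeResult::NotANumber;
    if (!(v >= -1.0f && v <= 1.0f)) return QuantizeResult::OutOfBounds;
    // (v + 1) / 2 lies in [0, 1]; scale to [0, 65535], round, shift to signed.
    const long q = std::lround((v + 1.0f) * 0.5f * 65535.0f) - 32768;
    out = static_cast<std::int16_t>(q);
    return QuantizeResult::Ok;
}
```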
Characters with “rapiers” (cont.)
Cause: ???
Fix: re-export the mesh from 3D software
– Magic!
Meta-case: undeniable assertion
Undeniable assertion
Happened while debugging “rapiers”
Texture compression library shipped without source code
Flood of non-critical assertions
– For almost every texture
– Could not ignore in bulk
– Terrible noise
Solution suggestion taken from [SINILO12]
Undeniable assertion (cont.)
Enter disassembly
Undeniable assertion (cont.)
Locate assert message function call instruction
Undeniable assertion (cont.)
Enter memory view and look up the address
– 0xE8 is the CALL opcode
– Followed by a 4-byte relative offset argument
Undeniable assertion (cont.)
NOP it out!
– 0x90 is the NOP opcode
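The same patch can be expressed in C++ on a plain byte buffer, using a hypothetical helper. A `CALL rel32` is five bytes (0xE8 plus a 4-byte relative offset), so all five must become 0x90 to keep the following instructions intact. Patching live code from a debugger's memory view follows the same idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kCallRel32 = 0xE8;  // CALL with 32-bit relative offset (x86)
constexpr std::uint8_t kNop = 0x90;        // one-byte NOP
constexpr std::size_t kCallSize = 5;       // opcode + 4-byte offset

// Replaces the 5-byte CALL at `offset` with five NOPs.
// Returns false if there is no CALL rel32 at that position.
bool NopOutCall(std::vector<std::uint8_t>& code, std::size_t offset) {
    if (offset + kCallSize > code.size() || code[offset] != kCallRel32)
        return false;
    for (std::size_t i = 0; i < kCallSize; ++i) code[offset + i] = kNop;
    return true;
}
```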
Undeniable assertion (cont.)
Incorrect player movement
Recreating player movement from one engine in another
(Pain Engine → Unreal Engine 3)
Different physics engines (Havok vs PhysX)
Many nuances
– Air control
– Jump and fall heights
– Slope & stair climbing & sliding down
Incorrect player movement (cont.)
Main nuance: capsule vs cylinder
Incorrect player movement (cont.)
Switching our pawn collision to capsule-based was not an
option
Emulate by sampling the ground under the cylinder
instead
No clever way to debug, just make it “bug out” and break
in the debugger
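Here is a sketch of what "sampling the ground under the cylinder" could look like, assuming a hypothetical height-query callback. At horizontal distance d from the axis, a capsule's hemispherical bottom of radius R sits R − √(R² − d²) above its lowest point. The estimated rest height of the lowest point is therefore the maximum of h − (R − √(R² − d²)) over all samples. `rings` and `samplesPerRing` are the radial sampling resolution. This is an illustration, not the game's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>

// Hypothetical ground query: terrain height at world position (x, y).
using GroundHeightFn = std::function<float(float x, float y)>;

// Estimates where the lowest point of a capsule centred at (cx, cy) would rest,
// by sampling the ground in concentric rings under a cylinder of that radius.
// Requires rings >= 1 and samplesPerRing >= 1.
float EstimateCapsuleRestHeight(const GroundHeightFn& ground, float cx, float cy,
                                float radius, int rings, int samplesPerRing) {
    const float kTwoPi = 6.2831853f;
    float best = -std::numeric_limits<float>::infinity();
    for (int ring = 0; ring <= rings; ++ring) {
        const float d = radius * static_cast<float>(ring) / static_cast<float>(rings);
        // Height of the hemisphere surface above its lowest point at distance d.
        const float lift = radius - std::sqrt(std::max(0.0f, radius * radius - d * d));
        const int count = (ring == 0) ? 1 : samplesPerRing;  // centre needs one sample
        for (int s = 0; s < count; ++s) {
            const float angle = kTwoPi * static_cast<float>(s) / static_cast<float>(count);
            const float h = ground(cx + d * std::cos(angle), cy + d * std::sin(angle));
            best = std::max(best, h - lift);
        }
    }
    return best;
}
```

With a 0.3-unit step 0.2 units off-centre and R = 0.5, a single ring misses the step entirely, while four rings pick it up at roughly 0.23. That is the undersampling failure mode in a nutshell.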
Incorrect player movement (cont.)
Situation when getting stuck
Cause: vanilla UE3 code sent a player locked between
non-walkable surfaces into the “falling” state
Fix: keep the player “walking”
Incorrect player movement (cont.)
Situation when moving without player intent
Added visualization of sampling, turned on collision
display
Cause: undersampling
Fix: increase radial sampling resolution
Blinking full-screen damage effects
Post-process effects are organized in one-way chains
Blinking full-screen damage effects (cont.)
No debugger available to observe the PP chain
Rolled my own overlay that walked and dumped the
chain contents
MaterialEffect 'Vignette'
Param 'Strength' 0.83 [IIIIIIII ]
MaterialEffect 'FilmGrain'
Param 'Strength' 0.00 [ ]
UberPostProcessEffect 'None'
SceneHighLights (X=0.80,Y=0.80,Z=0.80)
SceneMidTones (X=0.80,Y=0.80,Z=0.80)
…
MaterialEffect 'Blood'
Param 'Strength' 1.00 [IIIIIIIIII]
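An overlay like this boils down to walking a singly linked list. The C++ sketch below produces text in the same shape as the dump above. `PostProcessEffect` and its fields are hypothetical stand-ins for UE3's post-process classes, and only scalar parameters are handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical post-process effect node; each effect knows only the next one.
struct PostProcessEffect {
    std::string type;  // e.g. "MaterialEffect"
    std::string name;  // e.g. "Vignette"
    std::vector<std::pair<std::string, float>> scalarParams;
    const PostProcessEffect* next = nullptr;
};

// Ten-character bar: one 'I' per full 10% of the value clamped to [0, 1].
std::string Bar(float value) {
    const float clamped = std::fmin(std::fmax(value, 0.0f), 1.0f);
    const auto filled = static_cast<std::size_t>(clamped * 10.0f);
    return "[" + std::string(filled, 'I') + std::string(10 - filled, ' ') + "]";
}

// Walks the chain from its head and prints every effect and scalar parameter.
std::string DumpChain(const PostProcessEffect* effect) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(2);
    for (; effect != nullptr; effect = effect->next) {
        out << effect->type << " '" << effect->name << "'\n";
        for (const auto& param : effect->scalarParams)
            out << "  Param '" << param.first << "' " << param.second << ' '
                << Bar(param.second) << '\n';
    }
    return out.str();
}
```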
Blinking full-screen damage effects (cont.)
Cause: entire PP chain override
– Breakpoint in chain setting revealed the level script as the source
– Overeager level designer ticking one checkbox too many when setting
up thunderstorm effects
Fix: disable chain overriding altogether
– No use case for it in our game anyway
Incorrect animation states
Animation in UE3 is done by evaluating a tree
– Branches are weight-blended (either replacement or additive blend)
– Sequences (raw animations) for whole-skeleton poses
– Skeletal controls for fine-tuning of individual bones
Source: http://udn.epicgames.com/Three/AnimTreeEditorUserGuide.html
Incorrect animation states (cont.)
Prominent case for domain-specific debuggers
No tools for that in UE3, rolled my own visualizer
– Walks the animation tree and dumps active branches and their parameters
– Allows inspection of animation state, but not the reasons for transitions
– Still requires conventional debugging, but narrows it down greatly
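The tree walk can be sketched as a depth-first traversal that prunes zero-weight branches. `AnimNode` and `DumpActive` are hypothetical, and real animation nodes carry more parameters than a single blend weight.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical animation tree node: a blend weight and child branches.
struct AnimNode {
    std::string name;
    float weight = 1.0f;             // contribution to the parent's blend
    std::vector<AnimNode> children;  // vector of an incomplete type: OK since C++17
};

// Depth-first walk printing only active (non-zero weight) branches, indented
// by depth; an inactive node prunes its whole subtree.
void DumpActive(const AnimNode& node, int depth, std::ostringstream& out) {
    if (node.weight <= 0.0f) return;
    out << std::string(static_cast<std::size_t>(depth) * 2, ' ') << node.name
        << " weight=" << std::fixed << std::setprecision(2) << node.weight << '\n';
    for (const AnimNode& child : node.children) DumpActive(child, depth + 1, out);
}
```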
Incorrect animation states (cont.)
We developed a sort of animation bug checklist
Inspect the animation state in slow motion
– Is the correct blending mode used?
Inspect the AI and cutscene state
– Capable of animation overrides
Inspect the assets (animation sequences)
– Is the root bone correctly oriented?
– Is the root bone motion correct?
– Are inverse kinematics targets present and correctly placed?
– Is the mesh skeleton complete and correct?
Incorrect animation states (cont.)
Incorrect blend of reload animation
– Cause: bad root bone orientation in animation sequence
Left hand off the weapon
– Cause: left hand inverse kinematics was off
– Fix: revise IK state control code
Left hand incorrectly oriented
– Cause: bad IK target marker orientation on weapon mesh
Viewport stretched when portals are in view
Graphics debugging is:
– Tracing & recording graphics API (OpenGL/Direct3D) calls
– Replaying the trace
– Reviewing the renderer state and resources
Trace may be somewhat unreadable at first…
Viewport stretched when portals are… (cont.)
Traces may be annotated for clarity
– Direct3D: ID3DUserDefinedAnnotation
– OpenGL: GL_KHR_debug (more info: [GODLEWSKI01])
Viewport stretched when portals are… (cont.)
Quick renderer state inspection revealed that viewport
dimensions were off
– 1024x1024, 1:1 aspect ratio instead of 1280x720, 16:9
– Shadow map resolution?
Found the latest glViewport() call
– Shadow map indeed
Why wasn't the viewport updated for main scene
rendering?
Viewport stretched when portals are… (cont.)
Renderer state changes are expensive
– New state needs to be validated
– Modern graphics APIs are asynchronous
– State reading may require synchronization → stalls
Cache the current renderer state to avoid redundant calls
– Cache ↔ state divergence → bugs!
Viewport stretched when portals are… (cont.)
Cause: cache ↔ state divergence
– Difference between Direct3D and OpenGL: viewport dimensions as part
of render target state, or global state
Fix: tie viewport dimensions to render target in the cache
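Here is a sketch of a viewport cache tied to render-target binds, using a hypothetical fake device instead of a real graphics API. The buggy variant described in the comment is the divergence above: trusting Direct3D 9-style semantics (binding a target resets the viewport) on OpenGL, where the viewport is global state.

```cpp
#include <cassert>

struct Viewport { int x, y, width, height; };

bool operator==(const Viewport& a, const Viewport& b) {
    return a.x == b.x && a.y == b.y && a.width == b.width && a.height == b.height;
}

struct RenderTarget { int width, height; };

// Hypothetical stand-in for the graphics API: records the viewport the
// "GPU" actually has and counts calls (think glViewport()).
struct FakeDevice {
    Viewport viewport{0, 0, 0, 0};
    int viewportCalls = 0;
    void SetViewport(const Viewport& v) { viewport = v; ++viewportCalls; }
};

class StateCache {
public:
    explicit StateCache(FakeDevice& device) : device_(device) {}

    // Skips the API call when the requested viewport is already set.
    void SetViewport(const Viewport& v) {
        if (valid_ && cached_ == v) return;
        device_.SetViewport(v);
        cached_ = v;
        valid_ = true;
    }

    // Viewport tied to the render target: binding always goes through
    // SetViewport(), so cache and device cannot disagree. The buggy variant
    // would only write cached_ here, assuming the API resets the viewport on
    // bind; on OpenGL the device would keep the previous (shadow map) viewport.
    void BindRenderTarget(const RenderTarget& target) {
        // (the actual framebuffer bind would happen here)
        SetViewport(Viewport{0, 0, target.width, target.height});
    }

private:
    FakeDevice& device_;
    Viewport cached_{0, 0, 0, 0};
    bool valid_ = false;
};
```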
Black artifacts
First thing to do is to inspect the state
Nothing suspicious found, turned to shaders
On OpenGL 4.2+, shaders could be debugged in Nsight…
…but this was OpenGL 2.1, so had to resort to early returns from the shader
with debug colours
– Shader equivalent of debug logs, a.k.a. “Your Mum's Debugger”
“Shotgun debugging” with is*() functions
isnan() returned true!
Black artifacts (cont.)
Cause: pow() called with arguments for which GLSL leaves the result undefined
– “Results are undefined if x < 0.
Results are undefined if x = 0 and y <= 0.” [GLSL120]
– Undefined means the implementation is free to do whatever it wants
● NVIDIA returns QNaN the Barbarian (displayed as black, poisoning
all involved calculations)
● Other vendors usually return 0
Fix: for all pow() calls, clamp either:
– Arguments to their proper ranges
– Output to [0; ∞)
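Both fix variants are transplanted to C++ below for illustration; the real fix lives in GLSL. Host-side `std::pow` has its own edge cases for the same inputs: a negative base with a fractional exponent yields NaN, and zero to a negative power yields infinity. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Variant 1: clamp the argument range. Any base that is not positive
// (including NaN) yields 0, the value other vendors usually produced.
float SafePowClampArgs(float x, float y) {
    return (x > 0.0f) ? std::pow(x, y) : 0.0f;
}

// Variant 2: clamp the output to [0, inf). std::fmax returns the non-NaN
// argument, so a NaN result becomes 0; std::max(NaN, 0.0f) would return NaN.
// Note that an infinite result is passed through unchanged.
float SafePowClampOutput(float x, float y) {
    return std::fmax(std::pow(x, y), 0.0f);
}
```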
Mysterious crash
Game in content lock (feature freeze) for a while
PlayStation 3 port nearly done
Crash ~3-5 frames after entering a specific room
First report included a perfectly normal callstack but no
obvious cause
QA was reassigned to another task and could not pursue it further
Concluded it must have been an OOM crash
Mysterious crash (cont.)
Bug comes back, albeit with a wildly different callstack
Asked QA to reproduce it multiple times, including on other
platforms
– No crashes on X360 & Windows!
Totally different callstack each time
Confusion!
– OOM? Even in 512 MB developer mode (256 MB in retail units)?
– Bad content?
– Console OS bug?
– Audio thread?
– ???
Mysterious crash (cont.)
Reviewed a larger sample of callstacks
Most ended in dlmalloc's integrity checks
– Assertions triggered upon allocations and frees
Memory stomping…? Could it be…?
Mysterious crash (cont.)
Started researching memory debugging
No tools provided by Sony
Attempted to use debug allocators (dmalloc et al.)
– Most use the concept of memory fences
– Difficult to hook up to UE3
(Figure: regular allocation vs. fenced allocation, both obtained via malloc)
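The memory-fence idea fits in a minimal C++ sketch: each block gets guard bytes with a known pattern on both sides, and freeing verifies them. `FencedAlloc`/`FencedFree` are hypothetical names, not dmalloc's API. Many real debug allocators instead place the block against an inaccessible guard page so that an overflow faults immediately; that requires OS-specific calls and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Layout: [header: user size][front fence][user bytes][back fence]
constexpr std::size_t kHeaderSize = alignof(std::max_align_t);
constexpr std::size_t kFenceSize = 16;
constexpr std::uint8_t kFencePattern = 0xFD;
static_assert(kHeaderSize >= sizeof(std::size_t), "header must hold the size");
static_assert(kFenceSize % alignof(std::max_align_t) == 0, "keep user data aligned");

void* FencedAlloc(std::size_t size) {
    auto* base = static_cast<std::uint8_t*>(std::malloc(kHeaderSize + 2 * kFenceSize + size));
    if (base == nullptr) return nullptr;
    std::memcpy(base, &size, sizeof size);                                // remember user size
    std::memset(base + kHeaderSize, kFencePattern, kFenceSize);           // front fence
    std::memset(base + kHeaderSize + kFenceSize + size, kFencePattern, kFenceSize);  // back fence
    return base + kHeaderSize + kFenceSize;
}

// Frees the block; returns false if either fence was overwritten (stomped).
bool FencedFree(void* ptr) {
    if (ptr == nullptr) return true;
    auto* user = static_cast<std::uint8_t*>(ptr);
    std::uint8_t* base = user - kFenceSize - kHeaderSize;
    std::size_t size = 0;
    std::memcpy(&size, base, sizeof size);
    bool intact = true;
    for (std::size_t i = 0; i < kFenceSize; ++i) {
        if (base[kHeaderSize + i] != kFencePattern || user[size + i] != kFencePattern)
            intact = false;
    }
    std::free(base);
    return intact;
}
```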
Mysterious crash (cont.)
Found and integrated a community-developed tool, Heap
Inspector [VANDERBEEK14]
– Memory analyzer
– Focused on consumption and usage patterns monitoring
– Records callstacks for allocations and frees
Several reproduction attempts revealed a correlation
– Crash address
– Construction of a specific class
Gotcha!
Mysterious crash (cont.)
// class declaration
class Crasher extends ActorComponent;
var int DummyArray[1024];
// in ammo consumption code
Crash = new class'Crasher';
Comp = new class'ActorComponent'(Crash);
Mysterious crash (cont.)
Cause: buffer overflow vulnerability in UnrealScript VM
– No manifestation on X360 & Windows due to larger allocation
alignment value (8 vs 16 bytes)
Fix: make copy-construction with subclassed object as
template fail
I wish I had Valgrind! [GODLEWSKI02]
Takeaway
Time is of the essence!
Always on a tight schedule
Constantly in motion
– Temporal visualization is key
– Custom, domain-specific tools
Complex and indeterministic
– Difficult to automate testing
– Wide knowledge required
Prone to bugs outside the code
– Custom, domain-specific tools, again
Takeaway (cont.)
Rendering is a whole separate beast
– Absolutely custom tools in isolation from the rest of the game
– Still far from ideal usability
Good to know your machine down to the metal
Good memory debugging tools make a world of difference
You are never safe, not even in managed languages!
E-mail: lgodlewski@nordicgames.at
Twitter: @TheIneQuation
Web: www.inequation.org
Questions?
Further Nordic Games information:
www.nordicgames.at
Development information:
www.grimloregames.com
Thank you!
References
SINILO12 – Sinilo, M. “Coding in a debugger”
GODLEWSKI01 – Godlewski, L. “OpenGL (ES) debugging”
GLSL120 – Kessenich, J. “The OpenGL® Shading Language”, Language Version: 1.20, Document Revision: 8, p. 57
VANDERBEEK14 – van der Beek, J. “Heap Inspector”
GODLEWSKI02 – Godlewski, L. “Advanced Linux Game Programming”

Contenu connexe

Tendances

[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVMDouglas Chen
 
Alessandro Abbruzzetti - Kernal64
Alessandro Abbruzzetti - Kernal64Alessandro Abbruzzetti - Kernal64
Alessandro Abbruzzetti - Kernal64Scala Italy
 
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...44CON
 
Doom Technical Review
Doom Technical ReviewDoom Technical Review
Doom Technical ReviewAli Salehi
 
Advanced Evasion Techniques by Win32/Gapz
Advanced Evasion Techniques by Win32/GapzAdvanced Evasion Techniques by Win32/Gapz
Advanced Evasion Techniques by Win32/GapzAlex Matrosov
 
A Quick Introduction to Programmable Logic
A Quick Introduction to Programmable LogicA Quick Introduction to Programmable Logic
A Quick Introduction to Programmable LogicOmer Kilic
 
Deep dive into Android’s audio latency problem
Deep dive into Android’s audio latency problemDeep dive into Android’s audio latency problem
Deep dive into Android’s audio latency problemSirawat Pitaksarit
 
Binary art - Byte-ing the PE that fails you (extended offline version)
Binary art - Byte-ing the PE that fails you (extended offline version)Binary art - Byte-ing the PE that fails you (extended offline version)
Binary art - Byte-ing the PE that fails you (extended offline version)Ange Albertini
 
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...CanSecWest
 
Android On Development Boards at AnDevCon3
Android On Development Boards at AnDevCon3Android On Development Boards at AnDevCon3
Android On Development Boards at AnDevCon3Opersys inc.
 
[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory AnalysisMoabi.com
 
Reconstructing Gapz: Position-Independent Code Analysis Problem
Reconstructing Gapz: Position-Independent Code Analysis ProblemReconstructing Gapz: Position-Independent Code Analysis Problem
Reconstructing Gapz: Position-Independent Code Analysis ProblemAlex Matrosov
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering SoftwareYung-Yu Chen
 
CODE BLUE 2014 : BadXNU, A rotten apple! by PEDRO VILAÇA
CODE BLUE 2014 : BadXNU, A rotten apple! by PEDRO VILAÇACODE BLUE 2014 : BadXNU, A rotten apple! by PEDRO VILAÇA
CODE BLUE 2014 : BadXNU, A rotten apple! by PEDRO VILAÇACODE BLUE
 

Tendances (18)

[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM
 
Alessandro Abbruzzetti - Kernal64
Alessandro Abbruzzetti - Kernal64Alessandro Abbruzzetti - Kernal64
Alessandro Abbruzzetti - Kernal64
 
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
 
Doom Technical Review
Doom Technical ReviewDoom Technical Review
Doom Technical Review
 
Code Injection in Windows
Code Injection in WindowsCode Injection in Windows
Code Injection in Windows
 
Advanced Evasion Techniques by Win32/Gapz
Advanced Evasion Techniques by Win32/GapzAdvanced Evasion Techniques by Win32/Gapz
Advanced Evasion Techniques by Win32/Gapz
 
A Quick Introduction to Programmable Logic
A Quick Introduction to Programmable LogicA Quick Introduction to Programmable Logic
A Quick Introduction to Programmable Logic
 
There is more to C
There is more to CThere is more to C
There is more to C
 
Deep dive into Android’s audio latency problem
Deep dive into Android’s audio latency problemDeep dive into Android’s audio latency problem
Deep dive into Android’s audio latency problem
 
Android ndk
Android ndkAndroid ndk
Android ndk
 
Binary art - Byte-ing the PE that fails you (extended offline version)
Binary art - Byte-ing the PE that fails you (extended offline version)Binary art - Byte-ing the PE that fails you (extended offline version)
Binary art - Byte-ing the PE that fails you (extended offline version)
 
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
 
Android On Development Boards at AnDevCon3
Android On Development Boards at AnDevCon3Android On Development Boards at AnDevCon3
Android On Development Boards at AnDevCon3
 
[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis
 
Reconstructing Gapz: Position-Independent Code Analysis Problem
Reconstructing Gapz: Position-Independent Code Analysis ProblemReconstructing Gapz: Position-Independent Code Analysis Problem
Reconstructing Gapz: Position-Independent Code Analysis Problem
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
 
CODE BLUE 2014 : BadXNU, A rotten apple! by PEDRO VILAÇA
CODE BLUE 2014 : BadXNU, A rotten apple! by PEDRO VILAÇACODE BLUE 2014 : BadXNU, A rotten apple! by PEDRO VILAÇA
CODE BLUE 2014 : BadXNU, A rotten apple! by PEDRO VILAÇA
 
Asus Tinker Board
Asus Tinker BoardAsus Tinker Board
Asus Tinker Board
 

Similaire à Gamedev Debugging Techniques

Dynamic Wounds on Animated Characters in UE4
Dynamic Wounds on Animated Characters in UE4Dynamic Wounds on Animated Characters in UE4
Dynamic Wounds on Animated Characters in UE4Michał Kłoś
 
Gdc gameplay replication in acu with videos
Gdc   gameplay replication in acu with videosGdc   gameplay replication in acu with videos
Gdc gameplay replication in acu with videosCharles Lefebvre
 
Claudia Doppioslash - Time Travel for game development with Elm
Claudia Doppioslash - Time Travel for game development with ElmClaudia Doppioslash - Time Travel for game development with Elm
Claudia Doppioslash - Time Travel for game development with ElmCodemotion
 
Deterministic Simulation - What modern online games can learn from the Game B...
Deterministic Simulation - What modern online games can learn from the Game B...Deterministic Simulation - What modern online games can learn from the Game B...
Deterministic Simulation - What modern online games can learn from the Game B...David Salz
 
Umbra Ignite 2015: Thor Gunnarsson & Reynir Hardarson – Nailing AAA quality i...
Umbra Ignite 2015: Thor Gunnarsson & Reynir Hardarson – Nailing AAA quality i...Umbra Ignite 2015: Thor Gunnarsson & Reynir Hardarson – Nailing AAA quality i...
Umbra Ignite 2015: Thor Gunnarsson & Reynir Hardarson – Nailing AAA quality i...Umbra Software
 
Finding Monsters Adventure VR Experience
Finding Monsters Adventure VR ExperienceFinding Monsters Adventure VR Experience
Finding Monsters Adventure VR ExperienceRafael Ferrari
 
BSidesDelhi 2018: Headshot - Game Hacking on macOS
BSidesDelhi 2018: Headshot - Game Hacking on macOSBSidesDelhi 2018: Headshot - Game Hacking on macOS
BSidesDelhi 2018: Headshot - Game Hacking on macOSBSides Delhi
 
Minko stage3d workshop_20130525
Minko stage3d workshop_20130525Minko stage3d workshop_20130525
Minko stage3d workshop_20130525Minko3D
 
The Next Mainstream Programming Language: A Game Developer's Perspective
The Next Mainstream Programming Language: A Game Developer's PerspectiveThe Next Mainstream Programming Language: A Game Developer's Perspective
The Next Mainstream Programming Language: A Game Developer's Perspectivekfrdbs
 
Game development with Cocos2d-x Engine
Game development with Cocos2d-x EngineGame development with Cocos2d-x Engine
Game development with Cocos2d-x EngineDuy Tan Geek
 
ميهين
ميهينميهين
ميهينAhmed
 
Mastering Multiplayer Stage3d and AIR game development for mobile devices
Mastering Multiplayer Stage3d and AIR game development for mobile devicesMastering Multiplayer Stage3d and AIR game development for mobile devices
Mastering Multiplayer Stage3d and AIR game development for mobile devicesJean-Philippe Doiron
 
FGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie TycoonFGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie Tycoonmochimedia
 
Game development
Game developmentGame development
Game developmentAsido_
 
GDC 2015 でのハイエンドグラフィックス
GDC 2015 でのハイエンドグラフィックスGDC 2015 でのハイエンドグラフィックス
GDC 2015 でのハイエンドグラフィックスTakashi Imagire
 
The nitty gritty of game development
The nitty gritty of game developmentThe nitty gritty of game development
The nitty gritty of game developmentbasisspace
 
Visibility Optimization for Games
Visibility Optimization for GamesVisibility Optimization for Games
Visibility Optimization for GamesUmbra
 

Similaire à Gamedev Debugging Techniques (20)

Dynamic Wounds on Animated Characters in UE4
Dynamic Wounds on Animated Characters in UE4Dynamic Wounds on Animated Characters in UE4
Dynamic Wounds on Animated Characters in UE4
 
Gdc gameplay replication in acu with videos
Gdc   gameplay replication in acu with videosGdc   gameplay replication in acu with videos
Gdc gameplay replication in acu with videos
 
Claudia Doppioslash - Time Travel for game development with Elm
Claudia Doppioslash - Time Travel for game development with ElmClaudia Doppioslash - Time Travel for game development with Elm
Claudia Doppioslash - Time Travel for game development with Elm
 
Deterministic Simulation - What modern online games can learn from the Game B...
Deterministic Simulation - What modern online games can learn from the Game B...Deterministic Simulation - What modern online games can learn from the Game B...
Deterministic Simulation - What modern online games can learn from the Game B...
 
Umbra Ignite 2015: Thor Gunnarsson & Reynir Hardarson – Nailing AAA quality i...
Umbra Ignite 2015: Thor Gunnarsson & Reynir Hardarson – Nailing AAA quality i...Umbra Ignite 2015: Thor Gunnarsson & Reynir Hardarson – Nailing AAA quality i...
Umbra Ignite 2015: Thor Gunnarsson & Reynir Hardarson – Nailing AAA quality i...
 
Finding Monsters Adventure VR Experience
Finding Monsters Adventure VR ExperienceFinding Monsters Adventure VR Experience
Finding Monsters Adventure VR Experience
 
Unity
UnityUnity
Unity
 
BSidesDelhi 2018: Headshot - Game Hacking on macOS
BSidesDelhi 2018: Headshot - Game Hacking on macOSBSidesDelhi 2018: Headshot - Game Hacking on macOS
BSidesDelhi 2018: Headshot - Game Hacking on macOS
 
Minko stage3d workshop_20130525
Minko stage3d workshop_20130525Minko stage3d workshop_20130525
Minko stage3d workshop_20130525
 
The Next Mainstream Programming Language: A Game Developer's Perspective
The Next Mainstream Programming Language: A Game Developer's PerspectiveThe Next Mainstream Programming Language: A Game Developer's Perspective
The Next Mainstream Programming Language: A Game Developer's Perspective
 
Game development with Cocos2d-x Engine
Game development with Cocos2d-x EngineGame development with Cocos2d-x Engine
Game development with Cocos2d-x Engine
 
Md2010 jl-wp7-sl-game-dev
Md2010 jl-wp7-sl-game-devMd2010 jl-wp7-sl-game-dev
Md2010 jl-wp7-sl-game-dev
 
ميهين
ميهينميهين
ميهين
 
Mastering Multiplayer Stage3d and AIR game development for mobile devices
Mastering Multiplayer Stage3d and AIR game development for mobile devicesMastering Multiplayer Stage3d and AIR game development for mobile devices
Mastering Multiplayer Stage3d and AIR game development for mobile devices
 
FGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie TycoonFGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie Tycoon
 
Game development
Game developmentGame development
Game development
 
XRobots
XRobotsXRobots
XRobots
 
GDC 2015 でのハイエンドグラフィックス
GDC 2015 でのハイエンドグラフィックスGDC 2015 でのハイエンドグラフィックス
GDC 2015 でのハイエンドグラフィックス
 
The nitty gritty of game development
The nitty gritty of game developmentThe nitty gritty of game development
The nitty gritty of game development
 
Visibility Optimization for Games
Visibility Optimization for GamesVisibility Optimization for Games
Visibility Optimization for Games
 

Gamedev Debugging Techniques

  • 1. Leszek Godlewski Programmer, Nordic Games Gamedev-grade debugging Source: http://igetyourfail.blogspot.com/2009/01/reaching-out-tale-of-failed-skinning.html
  • 2. Nordic Games GmbH ● Started in 2011 as a sister company to Nordic Games Publishing (We Sing) ● Base IP acquired from JoWooD and DreamCatcher (SpellForce, The Guild, Aquanox, Painkiller) ● Initially focusing on smaller, niche games ● Acquired THQ IPs in 2013 (Darksiders, Titan Quest, Red Faction, MX vs. ATV) ● Now shifting towards being a production company with internal devs ● Since fall 2013: internal studio in Munich, Germany (Grimlore Games)
  • 3. Who is this guy? Leszek Godlewski Programmer, Nordic Games (early 2014 – Nov 2014) – Linux port of Darksiders Freelance Programmer (Sep 2013 – early 2014) – Linux port of Painkiller Hell & Damnation – Linux port of Deadfall Adventures Generalist Programmer, The Farm 51 (Mar 2010 – Aug 2013) – Painkiller Hell & Damnation, Deadfall Adventures
  • 4. Agenda How is gamedev different? Bug species Case studies Conclusions
  • 5. How is gamedev different? StartStart Exit?Exit? EndEnd Yes No UpdateUpdate DrawDraw
  • 6. 33 milliseconds How much time you have to get shit done™ – 30 Hz → 33⅓ ms per frame – 60 Hz → 16⅔ ms per frame EditorEditor Level toolsLevel tools Asset toolsAsset tools EngineEngine PhysicsPhysics RenderingRendering AudioAudio NetworkNetwork PlatformPlatform InputInput Network back-end Network back-end GameGame UIUI LogicLogic AIAI
  • 7. Interdisciplinary working environment Designers – Game, Level, Quest, Audio… Artists – Environment, Character, 2D, UI, Concept… Programmers – Gameplay, Engine, Tools, UI, Audio… Writers Composers Actors Producers PR & Marketing Specialists … }Tightly woven teams
  • 8. Severe, fixed hardware constraints Main reason for extensive use of native code
  • 10. Indeterminism & complexity Leads to poor testability – Parts make no sense in isolation – What exactly is correct? – Performance regressions? Source: https://github.com/memononen/recastnavigation
  • 11. Aversion to general software engineering Modelling Object-Oriented Programming Design patterns C++ STL Templates in general …
  • 12. Agenda How is gamedev different? Bug species Case studies Conclusions
  • 14. General programming bugs Memory access violations Memory stomping/buffer overflows Infinite loops Uninitialized variables Reference cycles Floating point precision errors Out-Of-Memory/memory fragmentation Memory leaks Threading errors
  • 15. Bad maths Incorrect transform order – Matrix multiplication not commutative – AB ≠ BA Incorrect transform space Source: http://leadwerks.com/wiki/index.php?title=TFormQuat
  • 16. Temporal bugs Incorrect update order – for (int i = 0; i < entities.size(); ++i) entities[i].update(); Incorrect interpolation/blending – Bad alpha term – Bad blending mode (additive/modulate) Deferred effects – After n frames – After n times an action happens – n may be random, indeterministic
  • 17. Graphical glitches Incorrect render state Shader code bugs Precision Source: http://igetyourfail.blogspot.com/2009/01/visit-lake-fail-this-weekend.html
  • 18. Content bugs Incorrect scripts Buggy assets Source: http://www.polycount.com/forum/showpost.php?p=1263124&postcount=10466
  • 19. Worst part? Most cases are two or more of the aforementioned, intertwined
  • 20. Agenda How is gamedev different? Bug species Case studies Conclusions
  • 22. Video settings not updating
  • 23. Incorrect weapon after demon mode foreshadowing
  • 26. Corpses teleported on death In normal gameplay, pawns have simplified movement – Sweep the actor's collision primitive through the world – Slide along slopes, stop against walls Source: http://udn.epicgames.com/Three/PhysicalAnimation.html
  • 27. Corpses teleported on death Upon death, pawns switch to physics-based movement (ragdoll) Source: http://udn.epicgames.com/Three/PhysicalAnimation.html
  • 28. Corpses teleported on death (cont.) Physics bodies have separate state from the game actor – Actor does not drive physics bodies, unless requested – If actor is driven by physics simulation, their location is synchronized to the hips bone body's Source: http://udn.epicgames.com/Three/PhysicalAnimation.html
  • 29. Corpses teleported on death (cont.) Idea: breakpoint in FarMove()? – One function because world octree is updated – Function gets called a gazillion times per frame � – Terrible noise Breakpoint condition? – Teleport from arbitrary point A to arbitrary point B – Distance? Breakpoint sequence? – Break on death instead – When breakpoint hit, break in FarMove()
  • 30. Corpses teleported on death (cont.) Cause: physics body driving the actor with out-of-date state Fix: request physics body state synchronization to animation before switching to ragdoll
  • 31. Weapons floating away from the player
  • 32. Weapons floating away from the player
  • 33. Weapons floating away from the player Extremely rare, only encountered on consoles – Reproduction rate somewhere at 1 in 50 attempts – And never on developer machines � Player pawn in a special state for the rollercoaster ride – Many things could go wrong For the lack of repro, sprinkled the code with debug logs
  • 34. Weapons floating away from the player (cont.) Cause: incorrect update order – for (int i = 0; i < entities.size(); ++i) entities[i].update(); – Player pawn forced to update after rollercoaster car – Possible for weapons to be updated before player pawns Fix: enforce weapon update after player pawns
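One way to enforce "weapons after player pawns" is to give each entity an update group and sort by it before updating. The group names below are hypothetical, and the sketch records names instead of calling `update()`. It models the fix and is not the game's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical update groups: lower values update first.
enum class TickGroup { Vehicles = 0, Pawns = 1, Weapons = 2 };

struct Entity {
    std::string name;
    TickGroup group;
};

// Updates entities in dependency order instead of array order and returns the
// order in which they were visited. std::stable_sort keeps the original relative
// order inside each group, so the result stays deterministic.
std::vector<std::string> updateInOrder(std::vector<Entity> entities) {
    std::stable_sort(entities.begin(), entities.end(),
                     [](const Entity& a, const Entity& b) { return a.group < b.group; });
    std::vector<std::string> visited;
    for (const Entity& e : entities)
        visited.push_back(e.name);  // stand-in for e.update()
    return visited;
}
```

Whatever order the entities happen to have in the array, the rollercoaster car now moves first, then the pawn riding it, then the weapon attached to the pawn.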
  • 36. Characters with “rapiers” UE3 has ”content cooking” as part of game build pipeline – Redistributable builds are ”cooked” builds Artifact appears only in cooked builds
  • 37. Characters with “rapiers” – cont. Logs contained assertions for “out-of-bounds vertices” Mesh vertex compression scheme – 32-bit float → 16-bit short int (~50% savings) – Find bounding sphere for all vertices – Normalize all vertices to said sphere radius – Map [-1; 1] floats to [-32768; 32767] 16-bit integers Assert condition – for (int i = 0; i < 3; ++i) assert(v[i] >= -1.f && v[i] <= 1.f && "Out-of-bound vertex!");
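The compression steps on the slide, sketched for a single vertex component. The slide does not give the exact rounding, so the mapping below (shift [-1; 1] to [0; 1], scale by 65535, shift down by 32768) is one assumed way to hit exactly [-32768; 32767]. The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Step 1: normalize a coordinate into [-1, 1] relative to the mesh's bounding sphere
// (center and radius are assumed to have been computed beforehand).
float normalizeToSphere(float coord, float center, float radius) {
    return (coord - center) / radius;
}

// Step 2: map [-1, 1] onto the full int16 range [-32768, 32767].
std::int16_t quantize(float v) {
    // Same check as on the slide; a NaN fails both comparisons, so it fires too.
    assert(v >= -1.f && v <= 1.f && "Out-of-bound vertex!");
    return static_cast<std::int16_t>(std::lround((v + 1.0f) * 0.5f * 65535.0f) - 32768);
}

// Inverse mapping; the round trip loses at most about one quantization step.
float dequantize(std::int16_t q) {
    return (static_cast<float>(q) + 32768.0f) / 65535.0f * 2.0f - 1.0f;
}
```

Each component shrinks from 4 bytes to 2, which is where the ~50% saving comes from; the price is a precision of roughly 2/65535 of the sphere radius.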
  • 38. Characters with “rapiers” – cont. v[i] was NaN – Interesting property of NaN: all comparisons fail – Even with itself ● float f = nanf(); bool b = (f == f); // b is false How did it get there?! Tracked the NaN all the way down to the raw engine asset!
  • 39. Characters with “rapiers” (cont.) Cause: ??? Fix: re-export the mesh from 3D software – Magic!
  • 41. Undeniable assertion Happened while debugging ”rapiers” Texture compression library without sources Flood of non-critical assertions – For almost every texture – Could not ignore in bulk � – Terrible noise Solution suggestion taken from [SINILO12]
  • 43. Undeniable assertion (cont.) Locate assert message function call instruction
  • 44. Undeniable assertion (cont.) Enter memory view and look up the adress – 0xE8 is the CALL opcode – 4-byte address argument
  • 45. Undeniable assertion (cont.) NOP it out! – 0x90 is the NOP opcode
  • 49. Incorrect player movement Recreating player movement from one engine in another (Pain Engine → Unreal Engine 3) Different physics engines (Havok vs PhysX) Many nuances – Air control – Jump and fall heights – Slope & stair climbing & sliding down
  • 50. Incorrect player movement (cont.) Main nuance: capsule vs cylinder
  • 51. Incorrect player movement (cont.) Switching our pawn collision to capsule-based was not an option Emulate by sampling the ground under the cylinder instead No clever way to debug, just make it ”bug out” and break in debugger
  • 52. Incorrect player movement (cont.) Situation when getting stuck Cause: vanilla UE3 code sent a player locked between non-walkable surfaces into the ”falling” state Fix: keep the player ”walking”
  • 53. Incorrect player movement (cont.) Situation when moving without player intent Added visualization of sampling, turned on collision display Cause: undersampling Fix: increase radial sampling resolution 1) 2)
  • 55. Blinking full-screen damage effects Post-process effects are organized in one-way chains
  • 56. Blinking full-screen damage effects (cont.) No debugger available to observe the PP chain Rolled my own overlay that walked and dumped the chain contents MaterialEffect 'Vignette' Param 'Strength' 0.83 [IIIIIIII ] MaterialEffect 'FilmGrain' Param 'Strength' 0.00 [ ] UberPostProcessEffect 'None' SceneHighLights (X=0.80,Y=0.80,Z=0.80) SceneMidTones (X=0.80,Y=0.80,Z=0.80) … MaterialEffect 'Blood' Param 'Strength' 1.00 [IIIIIIIIII]
  • 57. Blinking full-screen damage effects (cont.) Cause: entire PP chain override – Breakpoint in chain setting revealed the level script as the source – Overeager level designer ticking one checkbox too many when setting up thunderstorm effects Fix: disable chain overriding altogether – No use case for it in our game anyway
  • 61. Incorrect animation states Animation in UE3 is done by evaluating a tree – Branches are weight-blended (either replacement or additive blend) – Sequences (raw animations) for whole-skeleton poses – Skeletal controls for fine-tuning of individual bones Source: http://udn.epicgames.com/Three/AnimTreeEditorUserGuide.html
  • 62. Incorrect animation states (cont.) Prominent case for domain-specific debuggers No tools for that in UE3, rolled my own visualizer – Allows inspection of animation state, but not the reasons for transitions – Still requires conventional debugging, but narrows it down greatly – Walks the animation tree and dumps active branches and its parameters
  • 63. Incorrect animation states (cont.) We have developed sort of an animation bug checklist Inspect the animation state in slow motion – Is the correct blending mode used? Inspect the AI and cutscene state – Capable of animation overrides Inspect the assets (animation sequences) – Is the root bone correctly oriented? – Is the root bone motion correct? – Are inverse kinematics targets present and correctly placed? – Is the mesh skeleton complete and correct?
  • 64. Incorrect animation states (cont.) Incorrect blend of reload animation – Cause: bad root bone orientation in animation sequence Left hand off the weapon – Cause: left hand inverse kinematics was off – Fix: revise IK state control code Left hand incorrectly oriented – Cause: bad IK target marker orientation on weapon mesh
  • 65. Viewport stretched when portals are in view
  • 66. Viewport stretched when portals are in view Graphics debugging is: – Tracing & recording graphics API (OpenGL/Direct3D) calls – Replaying the trace – Reviewing the renderer state and resources Trace may be somewhat unreadable at first…
  • 67. Viewport stretched when portals are… (cont.) Traces may be annotated for clarity – Direct3D: ID3DUserDefinedAnnotation – OpenGL: GL_KHR_debug (more info: [GODLEWSKI01])
  • 68. Viewport stretched when portals are… (cont.) Quick renderer state inspection revealed that viewport dimensions were off – 1024x1024, 1:1 aspect ratio instead of 1280x720, 16:9 – Shadow map resolution? Found the latest glViewport() call – Shadow map indeed Why wasn't the viewport updated for main scene rendering?
  • 69. Viewport stretched when portals are… (cont.) Renderer state changes are expensive – New state needs to be validated – Modern graphics APIs are asynchronous – State reading may requrie synchronization → stalls Cache the current renderer state to avoid redundant calls – Cache ↔ state divergence → bugs!
  • 70. Viewport stretched when portals are… (cont.) Cause: cache ↔ state divergence – Difference between Direct3D and OpenGL: viewport dimensions as part of render target state, or global state Fix: tie viewport dimensions to render target in the cache
  • 76. Black artifacts First thing to do is to inspect the state Nothing suspicious found, turned to shaders On OpenGL 4.2+, shaders could be debugged in NSight… OpenGL 2.1, so had to resort to early returns from shader with debug colours – Shader equivalent of debug logs, a.k.a. ”Your Mum's Debugger” ”Shotgun debugging” with is*() functions isnan() returned true!
  • 77. Black artifacts (cont.) Cause: undefined behaviour in NVIDIA's pow() implementation – Results are undefined if x < 0. Results are undefined if x = 0 and y <= 0. [GLSL120] – Undefined means the implementation is free to do whatever ● NVIDIA returns QNaN the Barbarian (displayed as black, poisoning all involved calculations) ● Other vendors usually return 0 Fix: for all pow() calls, clamp either: – Arguments to their proper ranges – Output to [0; ∞)
  • 79. Mysterious crash Game in content lock (feature freeze) for a while Playstation 3 port nearly done Crash ~3-5 frames after entering a specific room First report included a perfectly normal callstack but no obvious reason QA reassigned to another task, could not pursue more Concluded it must've been an OOM crash
  • 80. Mysterious crash (cont.) Bug comes back, albeit with wildly different callstack Asked QA to reproduce mutliple times, including other platforms – No crashes on X360 & Windows! Totally different callstack each time Confusion! – OOM? Even in 512 MB developer mode (256 MB in retail units)? – Bad content? – Console OS bug? – Audio thread? – ???
  • 81. Mysterious crash (cont.) Reviewed a larger sample of callstacks Most ended in dlmalloc's integrity checks – Assertions triggered upon allocations and frees Memory stomping…? Could it be…?
  • 82. Mysterious crash (cont.) Started researching memory debugging No tools provided by Sony Attempted to use debug allocators (dmalloc et al.) – Most use the concept of memory fences – Difficult to hook up to UE3 malloc Regular allocation Fenced allocation malloc
  • 83. Mysterious crash (cont.) Found and integrated a community-developed tool, Heap Inspector [VANDERBEEK14] – Memory analyzer – Focused on consumption and usage patterns monitoring – Records callstacks for allocations and frees Several reproduction attempts revealed a correlation – Crash adress – Construction of a specific class Gotcha!
  • 84. Mysterious crash (cont.)
      // class declaration
      class Crasher extends ActorComponent;
      var int DummyArray[1024];
      // in ammo consumption code
      Crash = new class'Crasher';
      Comp = new class'ActorComponent' (Crash);
  • 86. Mysterious crash (cont.) Cause: buffer overflow vulnerability in UnrealScript VM – No manifestation on X360 & Windows due to larger allocation alignment value (8 vs 16 bytes) Fix: make copy-construction with subclassed object as template fail I wish I had Valgrind! [GODLEWSKI02]
  • 87. Agenda How is gamedev different? Bug species Case studies Conclusions
  • 88. Takeaway Time is of the essence! Always on a tight schedule Constantly in motion – Temporal visualization is key – Custom, domain-specific tools Complex and indeterministic – Difficult to automate testing – Wide knowledge required Prone to bugs outside the code – Custom, domain-specific tools, again
  • 89. Takeaway (cont.) Rendering is a whole separate beast – Absolutely custom tools in isolation from the rest of the game – Still far from ideal usability Good to know your machine down to the metal Good memory debugging tools make a world's difference You are never safe, not even in managed languages!
  • 90. lgodlewski@nordicgames.at, @TheIneQuation, www.inequation.org Questions?
  • 91. Further Nordic Games information: www.nordicgames.at Development information: www.grimloregames.com Thank you!
  • 92. References
      – SINILO12: Sinilo, M. “Coding in a debugger”
      – GODLEWSKI01: Godlewski, L. “OpenGL (ES) debugging”
      – GLSL120: Kessenich, J. “The OpenGL® Shading Language”, Language Version 1.20, Document Revision 8, p. 57
      – VANDERBEEK14: van der Beek, J. “Heap Inspector”
      – GODLEWSKI02: Godlewski, L. “Advanced Linux Game Programming”