by Marco Trivellato - In this presentation we will provide in-depth knowledge about the Unity runtime. The first part will focus on memory and how to deal with fragmentation and garbage collection. The second part will cover implementation details and their memory vs. cycles tradeoffs in both Unity 4 and the upcoming Unity 5.
6. Memory Overview
• Native (internal)
– Assets data, game objects and components
– Engine internals
• Managed (Mono)
– Script objects (managed DLLs)
– Wrappers for Unity objects
• Native DLLs
– User and third-party DLLs
7. Managed Memory Internals
• Allocates system heap blocks for its internal allocator
• Will allocate new heap blocks when needed
• Garbage collector cleans up
• Heap blocks are kept in Mono for later use
– Memory can be given back to the system after a while
– …but it depends on the platform, so don’t count on it
• Fragmentation can cause new heap blocks to be allocated even though memory is not exhausted
8. Reference vs Value Types
Value types (bool, int, float, struct, ...)
• Exist in stack memory when used as local variables (as fields of a class they are stored inline in that object)
• De-allocated when removed from the stack
• No garbage
Reference types (classes)
• Exist on the heap and are handled by the Mono/.NET GC
• De-allocated when no longer referenced
• Lots of garbage
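To make the difference concrete, here is a small plain C# sketch (PointStruct, PointClass and ValueVsReferenceDemo are hypothetical names; no UnityEngine types are involved). Assigning a struct copies its values, while assigning a class instance only copies the reference to a single heap object, which the GC has to clean up later.

```csharp
// Value type: assignment copies the whole struct; as a local it lives on the stack.
public struct PointStruct
{
    public float X;
    public float Y;
}

// Reference type: assignment copies the reference; the object lives on the managed heap.
public class PointClass
{
    public float X;
    public float Y;
}

public static class ValueVsReferenceDemo
{
    public static float StructCopyThenModify()
    {
        var a = new PointStruct { X = 1f, Y = 2f }; // no heap allocation
        var b = a;                                  // copies X and Y
        b.X = 100f;                                 // a is unaffected
        return a.X;                                 // 1
    }

    public static float ClassCopyThenModify()
    {
        var a = new PointClass { X = 1f, Y = 2f };  // heap allocation, becomes garbage later
        var b = a;                                  // both variables point to one object
        b.X = 100f;                                 // visible through a as well
        return a.X;                                 // 100
    }
}
```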
9. Garbage Collection
• Objects reachable from roots are not collected by GC.Collect. Roots include:
– Thread stacks
– CPU registers
– GC handles (used by Unity to hold onto managed objects)
– Static variables!!
• Collection time scales with managed heap size
– The more you allocate, the slower it gets
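The effect of a static root can be observed with a WeakReference, which does not keep its target alive by itself. The sketch below (GcRootsDemo is a hypothetical name; plain .NET, no Unity API) shows a buffer held by a static field surviving GC.Collect(), and clearing the field as the way to make it collectable. Whether an unreferenced object is reclaimed by one particular collection is up to the runtime, so only the rooted case is something you can rely on.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class GcRootsDemo
{
    // Static fields are GC roots: whatever they reference survives every collection.
    private static byte[] s_cache;

    public static bool StaticRootSurvivesCollect()
    {
        s_cache = new byte[1024 * 1024];
        var weak = new WeakReference(s_cache); // does not keep the array alive by itself
        GC.Collect();
        return weak.IsAlive; // true: the array is still reachable from the static root
    }

    // NoInlining keeps the array out of the caller's frame, so no stack slot roots it
    // once this method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AllocateUnrooted() => new WeakReference(new byte[1024 * 1024]);

    public static bool UnrootedSurvivesCollect()
    {
        WeakReference weak = AllocateUnrooted();
        GC.Collect();
        return weak.IsAlive; // usually false, but the runtime does not guarantee it
    }

    // Dropping the static reference removes the root; the next collection can reclaim the array.
    public static void ClearCache() => s_cache = null;
}
```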
10. Temporary Allocations
• Don’t use FindObjects or LINQ
• Use StringBuilder for string concatenation
• Reuse large temporary work buffers
• Avoid ToString() in frequently called code (it allocates a new string)
• Avoid reading .tag; use CompareTag() instead
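A minimal sketch of the StringBuilder advice, assuming a label that is rebuilt often (InventoryLabel is a hypothetical class). Building the text with += in a loop creates a new string on every iteration; a single reused builder avoids those intermediate strings. Resetting Length to 0 keeps the builder's internal buffer; StringBuilder.Clear() does the same on newer .NET profiles, while setting Length also works on older Mono ones.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical label that is rebuilt frequently, e.g. whenever the inventory changes.
public sealed class InventoryLabel
{
    // One builder for the lifetime of the label instead of a new string per "+=".
    private readonly StringBuilder _builder = new StringBuilder(256);

    public string Format(IList<string> itemNames)
    {
        _builder.Length = 0; // reset but keep the internal buffer (works on old Mono profiles too)
        for (int i = 0; i < itemNames.Count; i++) // indexer loop: no enumerator needed
        {
            if (i > 0)
                _builder.Append(", ");
            _builder.Append(itemNames[i]);
        }
        // Unless the builder has to grow, only the final ToString() allocates.
        return _builder.ToString();
    }
}
```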
11. Internal Temporary Allocations
Some Examples:
– GetComponents<T>
– Vector3[] Mesh.vertices
– Camera[] Camera.allCameras
– foreach
• does not allocate by definition
• However, there can be a small allocation, depending on the implementation of .GetEnumerator()
5.x: We are working on new non-allocating versions
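The GetEnumerator() point can be illustrated with List<T> in plain C#: List<T>.GetEnumerator() returns a struct (List<T>.Enumerator), while going through the IEnumerable<T> interface returns IEnumerator<T>, which boxes that struct onto the heap. ForeachDemo is a hypothetical name; the actual allocation behaviour also depends on the compiler and runtime in use, so it is worth confirming in the Profiler rather than assuming.

```csharp
using System.Collections.Generic;

public static class ForeachDemo
{
    // Static type is List<int>: foreach uses the struct List<int>.Enumerator,
    // so no heap-allocated enumerator is needed.
    public static int SumList(List<int> values)
    {
        int sum = 0;
        foreach (int v in values)
            sum += v;
        return sum;
    }

    // Static type is IEnumerable<int>: GetEnumerator() returns IEnumerator<int>,
    // so the struct enumerator is boxed, a small allocation per loop.
    public static int SumEnumerable(IEnumerable<int> values)
    {
        int sum = 0;
        foreach (int v in values)
            sum += v;
        return sum;
    }
}
```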
13. Memory Fragmentation
• Memory fragmentation is hard to account for
– Fully unload dynamically allocated content
– Switch to a blank scene before proceeding to the next level
• This scene could have a hook where you pause the game long enough to check whether anything significant is still in memory
• Ensure you clear out variables so GC.Collect will remove as much as possible
• Avoid allocations where possible
• Reuse objects where possible within a scene
• Clear them out before loading the next map to clean up the memory
14. Wrappers: Disposable Types
Some objects used in scripts have large native backing memory in Unity
– Memory not freed for some time…
Managed: WWW
Native: compressed file, decompression buffer, decompressed file
15. Garbage Collection
• GC.Collect
– Runs on the main thread when
• Mono exhausts the heap space
• Or user calls System.GC.Collect()
• Finalizers
– Run on a separate thread
• Controlled by mono
• Can have several seconds delay
• Unity native memory
– Dispose() cleans up internal memory
• Eventually called from the finalizer
• Manually call Dispose() to clean up
Main thread:
www = null;
new SomeClass(); // no more heap space
-> GC.Collect();
Finalizer thread:
www.Dispose();
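The same pattern can be reproduced outside Unity. This sketch assumes a hypothetical NativeBufferWrapper whose unmanaged buffer (allocated with Marshal.AllocHGlobal) stands in for the native backing memory of a class like WWW. Dispose() frees it immediately and calls GC.SuppressFinalize, so the finalizer round trip is skipped; the finalizer is only a safety net. GC.AddMemoryPressure is the .NET way of telling the collector that a small wrapper keeps a lot of memory alive; whether a given Unity Mono version takes it into account is an assumption to verify.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: a tiny managed object that owns a large unmanaged buffer.
public sealed class NativeBufferWrapper : IDisposable
{
    private IntPtr _buffer;
    private readonly long _size;

    public NativeBufferWrapper(int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));
        _size = size;
        _buffer = Marshal.AllocHGlobal(size);
        // The GC only sees the small wrapper; report the real cost explicitly.
        GC.AddMemoryPressure(_size);
    }

    public bool IsDisposed => _buffer == IntPtr.Zero;

    // Deterministic cleanup: frees the buffer now instead of after a GC + finalizer pass.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // nothing left for the finalizer thread to do
    }

    // Safety net: only runs if Dispose() was never called.
    ~NativeBufferWrapper()
    {
        Release();
    }

    private void Release()
    {
        if (_buffer == IntPtr.Zero)
            return; // already released; makes Dispose() safe to call twice
        Marshal.FreeHGlobal(_buffer);
        _buffer = IntPtr.Zero;
        GC.RemoveMemoryPressure(_size);
    }
}
```

In use, a using block guarantees that Dispose() runs even if an exception is thrown: `using (var data = new NativeBufferWrapper(16 * 1024 * 1024)) { /* work with the buffer */ }`.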
16. Wrappers for Unity Objects
• Inherit from UnityEngine.Object
• Types:
– GameObject
– Assets: Texture2D, AudioClip, Mesh, etc…
– Components: MeshRenderer, Transform, MonoBehaviour
• Native memory is released when Destroy is called
17. Best Practices
• Reuse objects: use object pools
• Prefer stack-based allocations: use struct instead of class
• System.GC.Collect can be used to trigger a collection
• Calling it 6 times returns the unused memory to the OS (platform-dependent)
• Manually call Dispose to clean up immediately
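A minimal object pool, as a sketch: SimplePool<T> is a hypothetical generic class, not the Asset Store implementation linked in the notes. Objects are pre-allocated, handed out with Get() and returned with Release(), so in steady state nothing becomes garbage.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal pool: inactive objects wait on a stack until they are needed again.
public sealed class SimplePool<T> where T : class
{
    private readonly Stack<T> _inactive = new Stack<T>();
    private readonly Func<T> _create;

    public SimplePool(Func<T> create, int preallocate)
    {
        _create = create;
        // Pre-allocate up front (e.g. during level load) so gameplay does not allocate.
        for (int i = 0; i < preallocate; i++)
            _inactive.Push(create());
    }

    public int InactiveCount => _inactive.Count;

    // Reuse an inactive object if there is one; allocate only when the pool is empty.
    public T Get() => _inactive.Count > 0 ? _inactive.Pop() : _create();

    // Hand the object back instead of dropping the reference, so it never becomes garbage.
    public void Release(T item) => _inactive.Push(item);
}
```

For pooled GameObjects such as bullets, Release() would typically be paired with SetActive(false) and Get() with SetActive(true).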
19. Mesh Read/Write Option
• It allows you to modify the mesh at run-time
• If enabled, a system-memory copy of the mesh will remain in memory
• It is enabled by default
• In some cases, disabling this option will not reduce the memory usage:
– Skinned meshes
– iOS
20. Non-Uniform scaled Meshes
We need to correctly transform vertex normals
• Unity 4.x:
– transform the mesh on the CPU
– create an extra copy of the data
• Unity 5.0
– Scaled on GPU
– Extra memory no longer needed
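For reference, the standard rule behind this: if M is the upper-left 3×3 part of the object-to-world matrix, normals must be transformed by the inverse transpose of M rather than by M itself. For a pure scale S = diag(s_x, s_y, s_z) this amounts to dividing by the scale factors. The worked example uses an illustrative surface with tangent t = (1, -1, 0) and normal n = (1, 1, 0), scaled by 2 along x.

```latex
\mathbf{n}' = \left(M^{-1}\right)^{\mathsf{T}} \mathbf{n},
\qquad
S = \operatorname{diag}(s_x, s_y, s_z) \;\Rightarrow\;
\left(S^{-1}\right)^{\mathsf{T}} = \operatorname{diag}\!\left(\tfrac{1}{s_x}, \tfrac{1}{s_y}, \tfrac{1}{s_z}\right)

% Example: S = diag(2, 1, 1), tangent t = (1, -1, 0), normal n = (1, 1, 0)
S\,\mathbf{t} = (2, -1, 0)

S\,\mathbf{n} = (2, 1, 0), \quad (2, 1, 0)\cdot(2, -1, 0) = 3 \neq 0 \quad \text{(not perpendicular: wrong)}

\left(S^{-1}\right)^{\mathsf{T}}\mathbf{n} = \left(\tfrac{1}{2}, 1, 0\right), \quad \left(\tfrac{1}{2}, 1, 0\right)\cdot(2, -1, 0) = 0 \quad \text{(correct)}
```

With a uniform scale s the inverse transpose only multiplies n by 1/s, which renormalization removes; that is why only non-uniform scale needs the extra work (done on the CPU with an extra copy in 4.x, in the shader in 5.0).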
21. Static Batching
What is it ?
• It’s an optimization that reduces number of draw calls
and state changes
How do I enable it ?
• Enable it in the Player Settings and mark the objects as Static
22. Static Batching cont.ed
How does it work internally ?
• Build-time: Vertices are transformed to world-space
• Run-time: an index buffer is created with the indices of visible objects
Unity 5.0:
• Re-implemented static batching without copying index buffers
• Beware of misleading stats
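The two steps above can be modelled in a few lines of C#. This is a simplified sketch, not Unity's implementation: StaticBatchSketch is hypothetical, and System.Numerics.Vector3/Matrix4x4 stand in for Unity's own math types. At build time each object's vertices are transformed to world space and appended to one shared buffer; at run time only the index ranges of visible objects are concatenated into the index buffer that gets drawn.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical, simplified model of static batching.
public sealed class StaticBatchSketch
{
    // Build time: all objects' vertices, already transformed to world space.
    public readonly List<Vector3> WorldVertices = new List<Vector3>();

    // Per object: its indices, offset so they point into WorldVertices.
    private readonly List<int[]> _objectIndices = new List<int[]>();

    // Returns an object id used later for visibility.
    public int AddObject(Vector3[] localVertices, int[] localIndices, Matrix4x4 localToWorld)
    {
        int baseVertex = WorldVertices.Count;
        foreach (Vector3 v in localVertices)
            WorldVertices.Add(Vector3.Transform(v, localToWorld)); // bake world-space position

        var indices = new int[localIndices.Length];
        for (int i = 0; i < localIndices.Length; i++)
            indices[i] = baseVertex + localIndices[i];
        _objectIndices.Add(indices);
        return _objectIndices.Count - 1;
    }

    // Run time: build one index buffer containing only the visible objects.
    public List<int> BuildIndexBuffer(IEnumerable<int> visibleObjectIds)
    {
        var indexBuffer = new List<int>();
        foreach (int id in visibleObjectIds)
            indexBuffer.AddRange(_objectIndices[id]);
        return indexBuffer;
    }
}
```

Because the vertices are baked in world space, the batched objects cannot move afterwards, which is why they must be marked as static; one copy of the vertices per instance is also part of the extra memory cost of static batching.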
23. Dynamic Batching
What is it ?
• Similar to Static Batching but it batches non-static
objects at run-time
How do I enable it ?
• In the player settings
• No need to tag objects; it works automatically (if the meshes meet the requirements)
24. Dynamic Batching cont.ed
How does it work internally ?
• Objects are transformed to world space on the CPU
• Temporary vertex and index buffers (VB & IB) are created
• Rendered in one draw call
29. Command Buffers
• Command buffers hold a list of rendering commands
• They can be set to execute at various points during camera rendering
30. Shadows
• Directional Light:
– Cascaded shadow maps (CSM), up to 4 cascades
– They are rendered into screen space, to a 32-bit render target (RT)
• Point Light:
• Render 6 cube faces
• Spot Light:
• One shadow map per light
31. Mesh Skinning
Different implementations depending on platform:
• x86: SSE
• iOS/Android/WP8: NEON optimizations
• D3D11/XBoxOne/GLES3.0: GPU
• XBox360, WiiU: GPU (memexport)
• PS3: SPU
• WiiU: GPU w/ stream out
Unity 5.0: Skinned meshes use less memory by
sharing index buffers between instances
32. Best Practices
• Try different Render Paths
– Performance depends on scene and platform
• Mix Realtime and Baked Lighting
• Use Level-Of-Detail Techniques
– Mesh, Texture, Shader
33. Scripting
Scripting API and JIT compilation: performance, allocations
34. GetComponent<T>
It asks the GameObject for a component of the specified type:
• The GO contains a list of Components
• Each Component type is compared to T
• The first Component of type T (or that derives from T) will be returned to the caller
• Not too much overhead, but it still needs to call into native code
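The lookup described above is essentially a linear scan with a type test. The sketch below models it with hypothetical plain C# classes (GameObjectSketch, ComponentSketch and so on, not the UnityEngine types) and leaves out the call into native code.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for Unity's component hierarchy.
public class ComponentSketch { }
public class ColliderSketch : ComponentSketch { }
public class BoxColliderSketch : ColliderSketch { }
public class RendererSketch : ComponentSketch { }

public sealed class GameObjectSketch
{
    private readonly List<ComponentSketch> _components = new List<ComponentSketch>();

    public void AddComponent(ComponentSketch component) => _components.Add(component);

    // Linear scan: the first component whose type is T, or derives from T, is returned.
    public T GetComponent<T>() where T : ComponentSketch
    {
        for (int i = 0; i < _components.Count; i++)
        {
            T match = _components[i] as T; // type test against T (includes subclasses)
            if (match != null)
                return match;
        }
        return null; // no component of that type
    }
}
```

Since every call repeats the scan, a common pattern is to call GetComponent<T>() once (for example in Awake) and keep the result in a field.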
35. Property Accessors
• Most accessors will be removed in Unity 5.0
• The objective is to reduce dependencies and therefore improve modularization
• Transform will remain
• Existing scripts will be converted. Example:
in 5.0:
36. Transform Component
• this.transform is the same as GetComponent<Transform>()
• transform.position/rotation needs to:
– find the Transform component
– traverse the hierarchy to calculate the absolute position
– apply the translation/rotation
• transform internally stores the position relative to the parent
– transform.localPosition = new Vector(…) is a simple assignment
– transform.position = new Vector(…) costs the same if there is no parent; otherwise it needs to traverse the hierarchy upwards to transform the absolute position into a local one
• Finally, other components (collider, rigid body, light, camera, etc.) are notified via messages
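A translation-only sketch of the difference between localPosition and position. TransformSketch is hypothetical, ignores rotation and scale, and uses System.Numerics.Vector3; it only illustrates why reading or writing the world position walks up the hierarchy while localPosition is a plain stored value.

```csharp
using System.Numerics;

// Hypothetical, translation-only model of a transform hierarchy.
public sealed class TransformSketch
{
    public TransformSketch Parent;
    public Vector3 LocalPosition; // what is actually stored: relative to the parent

    // World position: walk up the hierarchy accumulating the parents' offsets.
    public Vector3 Position
    {
        get
        {
            Vector3 world = LocalPosition;
            for (TransformSketch p = Parent; p != null; p = p.Parent)
                world += p.LocalPosition;
            return world;
        }
        // Without a parent this is a plain assignment; otherwise the parent's world
        // position (another hierarchy walk) is needed to convert back to local space.
        set => LocalPosition = Parent == null ? value : value - Parent.Position;
    }
}
```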
37. WWW class properties
WWW.texture: allocates a new Texture2D each time it is accessed
…another example is WWW.audioClip
39. Object.Instantiate cont.ed
• Awake can be expensive
• AwakeFromLoad (main thread)
– clear states
– internal state caching
– pre-compute
Unity 5.0:
• Allocations have been reduced
• Some inner loops for copying the data have been optimized
40. JIT Compilation
What is it ?
• The process in which machine code is generated from CIL code during the application's run-time
Pros:
• It generates optimized code for the current platform
Cons:
• Each time a method is called for the first time, the application will suffer a certain performance penalty because of the compilation
41. JIT compilation spikes
What about pre-JITting ?
• RuntimeHelpers.PrepareMethod does not work:
…better to use MethodHandle.GetFunctionPointer()
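A sketch of pre-JITting along these lines: JitWarmup is a hypothetical helper that uses reflection to visit the methods declared on a type and calls MethodHandle.GetFunctionPointer() on each, which asks the runtime for the method's entry point. Per the slide, this triggers compilation on Unity's Mono; other runtimes (for example with tiered compilation) may only hand back a stub, and on ahead-of-time compiled platforms it is unnecessary, so measure the effect with the Profiler. Abstract and open generic methods are skipped because there is no concrete body to compile.

```csharp
using System;
using System.Reflection;

public static class JitWarmup
{
    // Requests the native entry point of every concrete method declared on 'type'.
    // Returns how many methods were visited.
    public static int WarmUp(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                                   BindingFlags.Instance | BindingFlags.Static |
                                   BindingFlags.DeclaredOnly;
        int count = 0;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            if (method.IsAbstract || method.ContainsGenericParameters)
                continue; // nothing concrete to compile
            method.MethodHandle.GetFunctionPointer();
            count++;
        }
        return count;
    }
}
```

A natural place to call it is a loading screen, for the types whose first use would otherwise cause a spike during gameplay.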
43. Best Practices
• Don’t make assumptions
• Platform X != Platform Y
• Profile on target device
• Editor != Player
• Managed memory is not returned to native land!
• For best results…: Profile early and regularly
44. Want to know more ?
• Unite: http://unity3d.com/unite/archive
• Blog: http://blog.unity3d.com
• Forum: http://forum.unity3d.com
• Support: support@unity3d.com
The memory in Unity is split into managed and native memory.
The managed memory is what is used by Mono, and includes what you allocate in scripts and the shells for Unity objects.
This memory is garbage collected by Mono.
The native memory is what Unity allocates to hold everything in the engine. This includes asset data like textures, meshes, audio and animation. It includes game objects and components, and it also covers engine internals like rendering, culling, ShaderLab, particles, web streams, files, physics, etc.
This memory is not garbage collected, and sometimes needs a call to the UnloadUnusedAssets method to be freed.
*notes from Kim’s Unite Asia 2014 presentation
Native (internal)
Asset Data: Textures, AudioClips, Meshes
Game Objects & Components: Transform, etc..
Engine Internals: Managers, Rendering, Physics, etc..
Managed - Mono
Script objects (Managed dlls)
Wrappers for Unity objects: Game objects, assets, components
Native Dlls
Unity’s (e.g. DirectX) and users’ external DLLs
The managed memory used by Mono reserves a block of memory to use for the allocations requested from your scripts. If more memory is needed, Mono will allocate more heap space. This will grow to fit the peak memory usage of your application.
When the memory is being freed, Mono can in some cases give the heap space back to the system.
When the heap space is exhausted, Mono will run the garbage collector to reclaim memory that is no longer referenced. This can be a time-consuming operation, especially on large games. This is one of the reasons to keep memory activity low.
Another reason is that with high memory activity, memory is likely to fragment, and the small memory fragments will be unusable and cause memory to grow.
*slide and notes from Kim’s Unite Asia 2014 presentation
System.GC.Collect can be used to trigger a collection of managed memory
Don’t use FindObjects or LINQ
They both allocate temporary objects on the heap
In .NET before 2.0, "" created an object while String.Empty did not, so it WAS more efficient to use String.Empty.
For some types, GetEnumerator() returns a class; if you foreach over those, you'll have an allocation.
If it returns a struct, it does not allocate.
Everything is scanned. GC takes more time
Only array of strings is scanned. GC takes less time
Some objects that you can new in scripts have large native memory footprints in Unity. An example of that is the WWW class, which in Mono is only a wrapper, but in Unity it contains some very large allocations. This backing memory is not cleaned up until the finalizers are run, or Dispose is called manually. If this is left to the garbage collector and the finalizers, this memory can potentially live long after the reference is removed from Mono. The reason for this is that Mono does not see this as a large allocation because of the small footprint of the wrapper.
Slide and notes from Kim.
So the way the garbage collector works is shown here. A reference is removed from Mono by setting it to null or by the object going out of scope. After a while the garbage collector will run, either because the heap space gets exhausted or because we manually call GC.Collect().
When the memory is collected by the garbage collector, it is put on a queue for the finalizer. This is handled by another thread, and eventually the finalizer will call Dispose on the object. At this point the Unity native memory will be deallocated.
To skip this round trip, you can call Dispose manually on the object; then the object will not be given to the finalizers and the memory will be cleaned up immediately. I have a small example that shows this.
*slide and notes from Kim’s Unite Asia 2014 presentation
“C# is a language that allows you to quickly write powerful code. However the downside to this is that writing natural C# code produces a lot of garbage. The only way around this is to eliminate or reduce your heap allocations”
Garbage Collection: It’s very expensive, especially on mobile
- Pre-allocate and keep lists of inactive objects. Use object pools. An implementation is on the Asset Store: https://www.assetstore.unity3d.com/#/content/1010
Keep lists of inactive objects, like bullets for example
Calling GC.Collect 6 times: works on OS X, iOS and Android, but doesn’t seem to work on Windows and WP8. The platform needs to implement mmap.
REMOVED:
Using structs will benefit GC performance
GC only traverses managed allocations
Texture Read/Write Enabled: it will DOUBLE the amount of memory required for the texture asset.
- Unity 5.x: will be disabled by default
Non-uniform scaling has a negative impact on rendering performance + memory overhead:
It's only done every frame if the transform changes every frame.
Of course the memory is still used, as long as the object exists.
5.0:
- Non-uniformly scaled meshes no longer incur any memory cost or performance hit. Instead of being scaled on the CPU they are scaled and lit correctly in shaders.
- This also means that meshes that have the Read/Write Enabled option unchecked in their import settings will be lit correctly when they are scaled differently in x,y,z direction at runtime.
What is it ?
It’s an optimization that reduces the number of draw calls and state changes by batching static objects with the same material
reduces the number of draw calls (CPU/GPU time) and state changes… but it increases memory usage
Not supported on PS3
It costs memory and cycles at runtime to create index buffer
Unity 5.0: Re-implemented static batching without copying of index buffers (faster)
What is it ?
It’s an optimization that reduces the number of draw calls and state changes by dynamically batching objects with the same material
It batches “small” objects with the same material
Dynamic batching is done automatically (if mesh meets the requirements)
Conditions:
Objects must be affected by the same set of lights
There is a limit of 300 vertices per object.
There is a limit on the number of vertex channels * vertices per mesh = 900. E.g.:
If you have position + normal + uv: 300 vertices max (3 * 300 = 900)
If you have position + normal + uv0 + uv1 + tangent: 180 vertices max (5 * 180 = 900)
Non-uniformly scaled and uniformly scaled meshes are batched separately.
GameObjects with uniform scale must have the same scale. For example (1,1,1) and (2,2,2) cannot be batched together.
The combined mesh can only have up to 32000 indices (to work around an issue with MacBook Pros) and 2^16 vertices (the limit for any mesh in Unity).
Objects must share the same material
Multi-pass rendering: if one pass gets batched and another does not, there is a mismatch between vertex positions, resulting in Z-fighting. As a result, many per-pixel lights, shadows, etc. disable dynamic batching.
Not supported on PS3/X360
Unity 5.x: we are considering exposing per-platform parameters
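The vertex limits above can be written down as a small check. DynamicBatchingLimits is a hypothetical helper that encodes only the 300-vertex and 900 (channels * vertices) numbers quoted in these notes, applied per mesh (an assumption; these are Unity 4.x/5.0-era values, so check the batching documentation for your version). It does not model the material, lighting, scale or multi-pass conditions.

```csharp
// Hypothetical helper for the per-mesh vertex limits quoted above.
public static class DynamicBatchingLimits
{
    public const int MaxVerticesPerObject = 300;
    public const int MaxVertexAttributes = 900; // vertex channels * vertex count

    // e.g. 3 channels (position, normal, uv) -> 300; 5 channels -> 180
    public static int MaxVerticesFor(int vertexChannels)
    {
        if (vertexChannels <= 0)
            throw new System.ArgumentOutOfRangeException(nameof(vertexChannels));
        return System.Math.Min(MaxVerticesPerObject, MaxVertexAttributes / vertexChannels);
    }

    public static bool IsEligible(int vertexCount, int vertexChannels)
    {
        return vertexCount <= MaxVerticesFor(vertexChannels);
    }
}
```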
Fat G-buffer: currently 4 × 32-bit render targets plus a 32-bit Z buffer (160 bits/pixel); add 32 more bits with HDR (the emission/light buffer is RGBA16F in HDR).
- RT0: diffuse color (rgb), unused (a)
- RT1: spec color (rgb), roughness (a)
- RT2: normal (rgb), unused (a). 10.10.10.2 when available.
- RT3: emission/light (rgb), unused (a)
Z: depth buffer & stencil
It will be customizable in Unity 5.x
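As a worked example of what "fat" means here, assuming a 1920×1080 render target (the resolution is only an example; the per-pixel sizes come from the note above):

```latex
\text{LDR: } 4 \times 32 + 32 = 160 \text{ bits} = 20 \text{ bytes/pixel}
\qquad
\text{HDR: } 160 + 32 = 192 \text{ bits} = 24 \text{ bytes/pixel}

1920 \times 1080 = 2\,073\,600 \text{ pixels}

2\,073\,600 \times 20 = 41\,472\,000 \text{ bytes} \approx 39.6 \text{ MiB}
\qquad
2\,073\,600 \times 24 = 49\,766\,400 \text{ bytes} \approx 47.5 \text{ MiB}
```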
Render Loop Customization
It allows you to “inject” draw calls during the rendering loop
before/after specific camera “events”:
DepthTexture
Lighting
Image Effects
etc…
You can query for any Component using QueryComponent and GetComponent
This will ask every component in the GO whether it is derived from the wanted class
Internal:
/// The difference between QueryComponent and GetComponent is that
/// QueryComponent returns a ptr, GetComponent returns a reference
/// but asserts if no matching component could be found
/// Also you are not allowed to Query for Component classes
/// that are in the GameObject more than once.
Prefer WWW.LoadImageIntoTexture
Instantiate is SLOW
Prefabs, at a fundamental level, are a collection of predefined GameObjects & Components that are re-usable throughout your game.
Clones the object original and returns the clone.
When you clone a GameObject or Component, all child objects and components will also be cloned with their properties set like those of the original object. However, the parent of the new object will be null, so it will not be a "sibling" of the original. However, you can still set the parent explicitly if you wish. Also, the active status of a GameObject at the time of cloning will be passed on, so if the original is inactive then the clone will be created in an inactive state too.
Unity 5.0 optimizations: Biggest wins will be on big game object / transform hierarchies.
Some platforms, like consoles or iOS, use ahead-of-time (AOT) compilation of scripts, which means the scripts are compiled at build time.
Other platforms, like Android, use JIT compilation.
JIT Compilation: The process in which machine code is generated from Common Intermediate Language code during the application's run-time
Can we warm up specific functions, like we do for shaders?
Use Profiler.BeginSample/EndSample to profile your own code
Reserved Total: it's the managed memory high watermark