4. State of Flash
• Is Flash Dead?
• FB: Top 10 = 250M MAU
• Desktops: Flash 10 installed on 99%+
• SmartPhones: Flash/Air 200+M, 100 devices
• Streaming: 120 petabytes per month
• Advances in Flash for 3D games
• AS3
• 10.1, 10.2 …
• Molehill
5. Molehill’s API Presentation
• Pros:
– GPU Accelerated API
– Relies on DirectX 9 and OpenGL ES 2.0
– Native Software fallback
• Cons:
– No point sprite support, branching, MRT, depth buffer
– No CPU threading support
– Native Software fallback
8. Digging deeper into Molehill
• Assuming a basic knowledge of 3D development terminology
• Display Layers
• Model/Animation File Format
• Character Animation: Matrix vs Quaternion
• Texturing
• Optimizing the Particle System
• Fast Lights & Shadows
• CPU Post-Processing effects
• Profiling & Debugging tools
• Bonus!
– 3D GameDev Lexicon
– The math explaining all the numbers I’m going to talk about
– Cheat sheets
10. Frima 3D File Format
• Many 3D engines for Flash try to support multiple input formats
• …Or support only a generic format such as Collada XML
• Using a format optimized for 3D games made in Flash
– Small File Size
– Small Memory footprint
– No processing required
[Chart: Model & Animation File Processing on low-end computer. Time to process (ms): Collada XML = 5250, Frima Binary Format = 15]
11. Frima 3D File Format
• Export pipeline
[Diagram: 3DS Max Scene → Max Script Exporter → Collada XML → Build Tool]
12. Frima 3D File Format
• Export pipeline
[Diagram: Build Tool → Game Model / Animation Object → Serialize (AMF) → Compress → Game File]
13. Frima 3D File Format
• In-Game usage
[Diagram: Game File → Uncompress → Unserialize → Game Object → Add To Scene]
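The round trip on the export and in-game slides (serialize and compress at build time, uncompress and unserialize at load time) can be sketched as follows. This is a Python stand-in rather than the Frima tool: `json` replaces AMF, `zlib` replaces whatever compression the build tool uses, and the `model` layout and function names are hypothetical.

```python
import json
import zlib


def pack_model(model: dict) -> bytes:
    """Build-tool side: serialize the object, then compress it into file bytes."""
    # json is only a stand-in here; the slides use AMF serialization.
    serialized = json.dumps(model, separators=(",", ":")).encode("utf-8")
    return zlib.compress(serialized, 9)


def unpack_model(blob: bytes) -> dict:
    """Game side: uncompress the file bytes, then unserialize the object."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical model layout, for illustration only.
model = {
    "name": "zombie",
    "positions": [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    "indices": [0, 1, 2],
    "bones": 24,
}

blob = pack_model(model)
restored = unpack_model(blob)
```

The point of the design is that the loaded object is usable as soon as it is unserialized, with no parsing or conversion step in between.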
15. Animation techniques
• Matrix linear blending can cause loss of volume when joints are twisted or extremely bent
• When using matrices, each bone takes 3 constants
– Maximum number of bones is 40
• When using DualQuats, each bone takes only 2 constants
– Maximum number of bones is 60
Matrix (left) / Dual Quaternion (Right)
17. Transitions & interpolation
• Animation transitions require two sets of bones
• Idle blending to walk
• Same thing for frame interpolation (ex: Bullet time Animation)
[Chart: Vertex shader constants required for animating a character (24 bones), on a scale of 0 to 128. One animation: DualQuaternion = 48, matrix = 72. Two animations (transition or interpolation): DualQuaternion = Anim1 (48) + Anim2 (48); matrix = Anim1 (72) + Anim2 (72), which is too much]
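The constant budget behind these numbers can be checked with a few lines of arithmetic. This is a minimal sketch in Python, assuming 128 Float4 vertex constants of which 8 are reserved for the world and view-projection matrices (4 Float4 each), which leaves the 120 quoted in the bonus slide. The function names are hypothetical.

```python
TOTAL_CONSTANTS = 128  # Float4 vertex constants available
RESERVED = 4 + 4       # assumed: world matrix + view-projection matrix, 4 Float4 each
FLOAT4_PER_BONE = {"matrix": 3, "dual_quaternion": 2}


def constants_needed(bones, encoding, animation_sets=1):
    """Float4 constants used by `bones` bones for one or more sets of bones."""
    return bones * FLOAT4_PER_BONE[encoding] * animation_sets


def max_bones(encoding, animation_sets=1):
    """Largest bone count that fits in the constants left after the reserved ones."""
    budget = TOTAL_CONSTANTS - RESERVED
    return budget // (FLOAT4_PER_BONE[encoding] * animation_sets)


def fits(bones, encoding, animation_sets=1):
    """True if the character's bones fit in the remaining constant budget."""
    return constants_needed(bones, encoding, animation_sets) <= TOTAL_CONSTANTS - RESERVED
```

For the 24-bone character, `fits(24, "matrix", 2)` is false (144 constants) while `fits(24, "dual_quaternion", 2)` is true (96 constants), which is why transitions push the engine toward dual quaternions.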
20. Texturing in Molehill
• The first version of the engine was only using PNGs
• Adobe Texture Format (ATF)
– Textures are kept compressed in video memory
– Native support for multi-device publishing
– One file containing 3 encodings: DXT1, ETC1 and PVRTC
– 1.3x bigger than the original PNG
– Contains the mipmaps of the texture
– Does not support transparency
21. Texturing in Molehill
• Transparency
– Use PNGs with indexed color
– Sample an “alpha mask texture” in the pixel shader
ATF: Avatar = opaque / PNG: Fence = transparent
22. Texturing in Molehill
• Many effects can use ATF when using the right blend modes
• No need for transparency
Splatter = Multiply Fire = Additive
24. Particle System
• Using a divided workload (CPU/GPU) for better performance
– Each particle property update is computed on the CPU at each frame
• Alpha, color, direction, rotation, frame (if sprite sheet), etc.
– On the GPU
• Applying these properties
• Expanding billboard vertices to face the screen
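The billboard expansion step is done in the vertex shader, but the math is easier to read on the CPU. The sketch below, in Python, builds the four corners of a screen-facing quad by offsetting the particle center along the camera's right and up axes. The function name is hypothetical, and the camera axes are assumed to be unit-length vectors in world space.

```python
def expand_billboard(center, size, cam_right, cam_up):
    """Return the 4 corners of a quad centered on `center` that faces the camera.

    `cam_right` and `cam_up` are assumed to be unit-length world-space vectors.
    """
    half = size / 2.0
    corners = []
    # Corner signs: bottom-left, bottom-right, top-right, top-left.
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append(tuple(
            c + half * (sx * r + sy * u)
            for c, r, u in zip(center, cam_right, cam_up)
        ))
    return corners


# Camera looking down the Z axis: right = +X, up = +Y.
quad = expand_billboard((0.0, 0.0, 5.0), 2.0, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Because only the corner signs differ between the four vertices, the CPU can upload the same center and properties for all of them and let the GPU do the offset.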
25. Particle System : Optimization
• How many particles?
– Due to the VertexBuffer and IndexBuffer limits, in ZombieTycoon we were limited to around 16383 particles per draw call
• Using Fast ByteArray (also known as Alchemy memory or DomainMemory)
– Using Azoth, property updates were 10 times faster
• Batching draw calls using the same texture
• Using a 100% GPU particle system
– It’s expensive on the GPU
– Supports only linear transformations
– Zero CPU required
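The per-draw-call particle limit follows from whichever buffer fills up first. The sketch below is in Python and uses a hypothetical function name. It assumes quads with 4 vertices and 6 indices each, and no vertex re-use between particles. The index limit of 983040 is the one quoted in the bonus slide. The vertex limit of 65535 is an assumption that reproduces the 16383 figure, since a limit of 65536 would give 16384.

```python
def max_quads_per_draw_call(max_vertices, max_indices,
                            verts_per_quad=4, indices_per_quad=6):
    """Number of quad particles that fit in one vertex buffer / index buffer pair."""
    by_vertices = max_vertices // verts_per_quad
    by_indices = max_indices // indices_per_quad
    # The tighter of the two limits decides.
    return min(by_vertices, by_indices)


# Assumed vertex limit of 65535; index limit taken from the slides.
limit = max_quads_per_draw_call(65535, 983040)
```

With these numbers the vertex buffer is the bottleneck by a factor of ten, so batching particles that share a texture is bounded by vertex count, not index count.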
29. Lights & shadows
• ShadowMap & LightMap
– We used two textures, a “multiplied” ShadowMap and an “additive” LightMap
Diffuse * ShadowMap + LightMap = Composite
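The compositing formula can be sketched per pixel as below, in Python. In the engine this happens in the pixel shader. The function name is hypothetical, colors are assumed to be RGB triples in the 0 to 1 range, and the clamp to 1.0 is an assumption about how the output saturates.

```python
def composite(diffuse, shadow, light):
    """Diffuse * ShadowMap + LightMap for one RGB pixel, clamped to 1.0."""
    return tuple(min(1.0, d * s + l) for d, s, l in zip(diffuse, shadow, light))


# A mid-grey shadow halves the diffuse color, then the light map adds a little back.
pixel = composite((0.8, 0.6, 0.4), (0.5, 0.5, 0.5), (0.1, 0.1, 0.1))
```

A white shadow map texel (1.0) leaves the diffuse color untouched and a black light map texel (0.0) adds nothing, so both maps are neutral wherever nothing was baked.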
30. Lights & shadows
• Dynamic lighting
– Lighting requires an expensive pixel shader, currently limited to 256 instructions
– Zombie Tycoon supports up to 7-9 lights (spot or point) per object.
32. Lights & shadows
• Pixel Shader assembly code
– Per light, without Normal/Specular mapping.
33. Lights & shadows
• Fake Volumetric Lights
– Using a few billboard particles, it’s easy to fake nice and lightweight volumetric lighting
– All objects sample the shadow and light maps, and since the light particles are “additive”, an object behind the lights will look brighter
36. Lights & shadows
• Fake projected shadows
– We created a particle with a black gradient spot, aligned to the ground
– Orientation and scale of the particle depend on light position and intensity
38. CPU Post-Processing
• Possibility of reading the BackBuffer
– Strongly recommended not to use Readback
– Fast pipeline for data from the System memory to Video memory
– VERY slow pipeline from video to system memory
• Effects: Bloom, Blur, Depth of Field, etc.
Motion Blur
40. Profiling and Debugging tools (CPU)
• FlashDevelop (O.S.S.)
– Most of the production is using FlashDevelop
– Now with a profiler and a debugger, it’s very easy to work with it
41. Profiling and Debugging tools (CPU)
• Adobe Flash Builder Profiler
– Profile Function calls
– Profile Memory allocation
42. Profiling and Debugging tools (CPU)
• FlashPreloadProfiler (O.S.S.)
– Profile Function calls
– Profile Memory allocation
– Profile Loaders status
– Can be used in Debug/Release & browser/Projector
43. Profiling and Debugging tools (GPU)
• PIX for Windows
– List of API calls
– Shaders assembly code
– Pixel debugger
– Texture viewer
44. Profiling and Debugging tools (GPU)
• Intel® Graphics Performance Analyzers (GPA)
– Render in wireframe
– Profile Vertex and Pixel shader performance
– Visualize overdraw and draw call sequence
– Save a frame, and run real-time experiments on it
– Identification of bottlenecks
46. What does it mean?
• VertexBuffer
• IndexBuffer
• Vertex Constants
• MipMapping
• Quaternion
• Billboard
47. Bonus Slide: The maths!
• Character animation:
– Matrix linear blending:
• 128 Float4 vertex constants, minus the WorldMatrix and the ViewProj matrix (4 Float4 each) = 120 Float4
• 120 Float4 / 3 Float4 per bone = 40 bones in the constants
• Bullet time and transitions require two sets of bones: 40 / 2 = 20 bones per character max
– DualQuaternion linear blending:
• 128 Float4 vertex constants, minus the WorldMatrix and the ViewProj matrix (4 Float4 each) = 120 Float4
• 120 Float4 / 2 Float4 per bone = 60 bones in the constants
• Bullet time and transitions require two sets of bones: 60 / 2 = 30 bones per character max
• Max Particle Count
– The VertexBuffer is limited to 65536 vertices; the IndexBuffer is limited to 983040 indices of type SHORT
– In theory, you could have up to 327680 triangles in one draw call (983040 / 3)
– In practice, with no vertex re-use between particles and using quads (4 vertices each), the vertex limit is reached first: 65536 / 4 gives around 16383 particles max per draw call
• Lighting
– With the PixelShader limit of 256 instructions, we were able to fit around 7 to 9 dynamic lights per object (point or spot light)
Here is a quick overview of what we will go through in this session. I will start by looking at the current state of Flash, then present the Molehill API that Adobe is offering us. JP will then take over and dig deeper into the technical aspects of it all.
Is Flash dead? Look this up on Google and you’ll find many interesting discussions. I have no intention of going into the politics of it, so here are some facts.
According to AppData, the top two FB apps attract close to 150M MAU. Both are Flash games. From known metrics, this amounts to a conservative $30M per month, all going to the same company, but still.
WebGL: the final draft was done a month ago.
The Flash Player provided advances that « helped » 3D-based applications. AS3 was a big one, providing a much faster virtual machine, but it’s still non-native software rendering. Molehill provides the jump ahead to real 3D in the browser.
35 devices by 2010; close to 100 devices will ship with it in 2011.
Flash stats based on: http://www.adobe.com/products/player_census/flashplayer/version_penetration.html (December 2010) and http://tv.adobe.com/watch/industry-trends/strong-mobile-adoption-of-flash-platform-in-2011/
Molehill opens a low-level API to access the GPU. Adobe decided to limit this API to the lowest common denominator of the supported hardware, to keep a consistent experience across all supported platforms.
Those are some pros and cons; you can probably draw up your own list after playing with it.
There is no CPU threading API yet in the Flash Player. Adobe has been using threading more internally, and I’m sure they are working on something for us to use soon.
One thing to note is the software fallback. Be careful when designing Molehill games: decide as soon as possible whether you can afford the software renderer. If you don’t support it, always validate that hardware is used and show a prompt otherwise.
I will now let JP discuss the technical aspects of all of this.
Starting with the base, the file format. Then I will cover some of the systems that we have in ZombieTycoon.
If you have 1000 quad particles, it means you have around 6000 vertices to update, with around 10 values each, so in the end it’s 60,000 updates on your ByteArray. VB: of 64 DWORD