MM-4085, Designing a game audio engine for HSA, by Laurent Betbeder

DESIGNING A GAME AUDIO ENGINE FOR HSA
LAURENT BETBEDER
SCEA

WHAT’S SO SPECIAL ABOUT CONSOLE GAME DEV?
NOW THAT CONSOLES MOSTLY RUN PC HARDWARE

 Extreme performance optimizations
‒ Until gamers opt for shorter upgrade cycles (phone/tablet business model)?
‒ Can’t run sub-optimal audio code when competing for cycles on crowded compute queues

 Custom hardware, OS, drivers and compilers
‒ To extract max perf from fixed hardware
‒ Helps lengthen platform lifetime
‒ “But but… where’s my OpenCL runtime?”

 Low latency
‒ Music games on consoles need it as much as professional music production software on desktop
‒ But it is much harder to achieve reliably when a system is constantly overloaded

2 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL

GAME AUDIO DSP ON THE ACP
WHY?

 Heavy specialized DSP workloads
‒ Stuff games need badly but don’t really want to deal with
‒ Best fit for dedicated and/or fixed function hardware
‒ Codecs
  ‒ CELP codecs -> party chat
  ‒ 100s of MP3/AT9/AAC decode instances
  ‒ Huge impact on game assets footprint, down/load times
  ‒ Optional output bitstream encoding (AC3/DTS)
‒ Voice recognition
‒ Echo cancellation

 Platform wide IP licensing levels the playing field
‒ Good for indie developers
‒ And good for the platform!

 Available via asynchronous secure system APIs


GAME AUDIO DSP ON THE ACP
WHY NOT?

 Exotic hardware and dev environment
‒ Closed to games
‒ Closed to middleware
‒ Platform specific

 Asynchronous interface
‒ Can’t have sequential interleaving of DSP back and forth between CPU and ACP w/o latency buildup
‒ But ultimately, we want the DSP pipeline to be data driven (by artists who know nothing about this)
‒ Modularity
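
A rough worked example of the latency buildup described above. It assumes that each CPU-to-ACP-and-back round trip delays the signal by one full audio block; the block size, sample rate and hop counts are illustrative assumptions, not figures from the talk.

```python
def pipeline_latency_ms(round_trips, block_samples=256, sample_rate_hz=48000):
    """Estimate latency added when a DSP chain bounces between CPU and ACP.

    Assumption (not from the slides): every asynchronous round trip
    costs one full block, because the result of a submitted job is only
    picked up on the next audio tick.
    """
    block_ms = 1000.0 * block_samples / sample_rate_hz
    return round_trips * block_ms


# Compare a single round trip with three interleaved ones.
for trips in (1, 3):
    print(trips, "round trip(s):", round(pipeline_latency_ms(trips), 2), "ms")
```

Under these assumptions one round trip costs roughly 5.3 ms and three cost about 16 ms, which is why a chain that hops back and forth per effect is hard to reconcile with low latency.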

 Slow clock rate @ 800MHz, very limited SIMD and no FP support
‒ Tough sell against Jaguar for many DSP algorithms
‒ Very tight local memory shared by multiple DSP cores

 Already pretty busy with codec loads and system tasks


GAME AUDIO DSP ON THE GPU
WHY?

 Much more demand for real-time effects today and will keep growing

 CPU FLOPS likely to stagnate and could even decline in HSA as CUs take over SIMD workloads
 Flexibility: some games are CPU bound, others are GPU bound…
 hUMA is a game changer (removes NUMA’s main bottleneck: GPU write back)
 Compute queues with prioritized scheduling and even some form of preemption
 Many real-time audio DSP algorithms work well on wide SIMD units
‒ FFT convolution (spectral processing in general)
‒ Mixing, resampling, wave shaping, etc…

 Mostly coalesced mem accesses
 Low/med bandwidth (< 1GB/s)
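
A back-of-the-envelope check of the bandwidth figure above. The voice count, 32-bit float samples and one read plus one write per sample are assumptions chosen for illustration.

```python
def stream_bandwidth_mb_s(voices, sample_rate_hz=48000,
                          bytes_per_sample=4, passes=2):
    """Memory traffic for streaming mono voices through one DSP stage.

    passes=2 models one read and one write per sample (an assumption).
    Returns decimal megabytes per second.
    """
    bytes_per_second = voices * sample_rate_hz * bytes_per_sample * passes
    return bytes_per_second / 1e6


# 1000 mono float voices at 48 kHz, read once and written once.
print(stream_bandwidth_mb_s(1000), "MB/s")
```

Even a thousand voices stay well under 1 GB/s in this model, which is small next to what graphics workloads move.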


GAME AUDIO DSP ON THE GPU
WHY NOT?

 Some algorithms do not work (as) well on wide SIMD units
‒ IIR filters, ADPCM decodes, dynamics: data recursion causes thread interdependencies within wavefronts
‒ Typical AAA game runs 1000s of biquads at various stages in the filtergraph
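
To make the data recursion concrete, here is a minimal sketch of a single biquad in Direct Form I. The function name and argument order are illustrative; the point is that each output sample depends on the two previous outputs, so consecutive samples of one stream cannot simply be spread across the lanes of a wavefront.

```python
def biquad(samples, b0, b1, b2, a1, a2):
    """Direct Form I biquad.

    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # y depends on y1 and y2: this is the recursion that serializes
        # the loop along the time axis.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


# Impulse response of a simple recursive filter (feedback of 0.5).
print(biquad([1.0, 0.0, 0.0, 0.0], 1.0, 0.0, 0.0, -0.5, 0.0))
```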

 Workloads may require batch voice processing to achieve high CU efficiency
‒ Build 2D grids (channels x samples) or 3D grids (channels x subbands x samples)
‒ Swizzling is key but watch out for runtime cost as SIMD widens (static vs dynamic)
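
A minimal sketch of the batching idea, written in plain Python as a stand-in for a compute kernel. Voices are swizzled from voice-major to sample-major layout so that each "lane" holds one voice; a recursive filter then stays sequential in time but runs in parallel across voices. The one-pole low-pass and all names are illustrative assumptions, and all voices are assumed to have the same length.

```python
def swizzle(voices):
    """Voice-major [voice][sample] -> sample-major [sample][voice]."""
    return [list(frame) for frame in zip(*voices)]


def unswizzle(frames):
    """Sample-major [sample][voice] -> voice-major [voice][sample]."""
    return [list(voice) for voice in zip(*frames)]


def batched_one_pole(frames, coeffs):
    """Run one low-pass per lane: y = y_prev + k * (x - y_prev)."""
    state = [0.0] * len(coeffs)
    out = []
    for frame in frames:  # sequential along time (the recursion)
        # Independent per lane: this is the part a wide SIMD unit can do at once.
        state = [y + k * (x - y) for x, y, k in zip(frame, state, coeffs)]
        out.append(state)
    return out


voices = [[1.0, 1.0], [1.0, 1.0]]
filtered = unswizzle(batched_one_pole(swizzle(voices), [1.0, 0.5]))
print(filtered)
```

The swizzle itself is extra work on every block, which is the runtime cost the slide warns about when the layout cannot be fixed ahead of time.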

 Batch processing goes against free form MaxMSP model artists are pushing for
‒ Unique DSP chain for each sound “just because we can!”
‒ Data driven filtergraph and DSP pipeline

 Complex prioritized scheduling & dispatching compute queues
‒ Do not prevent intermittent CU saturation caused by large graphics workloads
‒ Risky for low latency direct path audio DSP

 Proprietary hardware, drivers and shader compilers (PSSL)
‒ Audio middleware will need some incentive to move up there
‒ Most will probably stay on the CPU

GAME AUDIO DSP ON JAGUAR
WHY?

 Well known and open x64 dev environment
‒ Middleware friendly
‒ CLANG/LLVM solid & stable

 Full FP unit with SSE4 support

 Early PA is surprisingly good for compiled intrinsics code
‒ ~10% slower than core i7 @ same clock rate
‒ GDDR5 latency is not an issue
‒ < ~50% of 1 core @ 1.6GHz running the entire KZSF filtergraph

 Only reliable solution for ultra low latency
‒ Music and rhythm games
‒ Run 100% on CPU (including decoding)


GAME AUDIO DSP ON JAGUAR
WHY NOT?

 “Weak laptop CPU” compared to top of the line on desktop
‒ No FMA4
‒ Slow clock @ 1.6GHz (compared to typical desktop)

 256bit AVX mostly useless
 Possible bottleneck down the line


GAME ENGINE CODE
THIN COMPUTE

 3D audio
‒ Sound emitters (distance, directionality and size modeling)
‒ Sound listeners (mic and ear modeling)
‒ Sound geometry (collision meshes)
‒ Deeper physical modeling of sound propagation
‒ Simple ray casting (occlusion, obstruction, indirect audio)
‒ Advanced ray casting (diffraction, real-time individual early reflection tracking)
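
As an illustration of the simplest form of ray casting for occlusion, the sketch below tests whether the straight path from an emitter to a listener passes through a sphere. Using a sphere in place of a collision mesh, and the function name itself, are assumptions made to keep the example short; a real engine would cast against actual sound geometry.

```python
def is_occluded(emitter, listener, sphere_center, sphere_radius):
    """True if the segment emitter -> listener passes through the sphere."""
    direction = [l - e for e, l in zip(emitter, listener)]
    to_center = [c - e for e, c in zip(emitter, sphere_center)]
    seg_len_sq = sum(d * d for d in direction)
    if seg_len_sq == 0.0:
        t = 0.0  # emitter and listener coincide
    else:
        # Parameter of the point on the segment closest to the sphere center.
        t = sum(f * d for f, d in zip(to_center, direction)) / seg_len_sq
        t = max(0.0, min(1.0, t))
    closest = [e + t * d for e, d in zip(emitter, direction)]
    dist_sq = sum((c - p) ** 2 for c, p in zip(sphere_center, closest))
    return dist_sq < sphere_radius ** 2


print(is_occluded((0, 0, 0), (10, 0, 0), (5, 0, 0), 1.0))  # obstacle on the path
print(is_occluded((0, 0, 0), (10, 0, 0), (5, 3, 0), 1.0))  # obstacle off to the side
```

Many such tests per frame, each independent of the others, is the kind of workload that maps naturally onto compute units.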

 Physics
‒ Rigid body dynamics (collisions, friction, destruction)
‒ Fluid dynamics (turbulence)

 Animation, special FX
‒ Inline audio sequencing and modulation
‒ Foley, coarse granular synthesis


CONCLUSIONS
 HSA + hUMA is a great combo for high perf game audio!
‒ Maximized perf per W from specialized hardware (CPU + GPU + ACP)
‒ Our challenge is to figure out what to run where and when

 ACP is a great fit for codecs and OS services
‒ But not for modular synthesis and highly customized DSP pipelines

 GPU is a great fit for mid/high latency DSP and high level 3D thin compute
‒ Indirect (reflected) audio
‒ Convolution reverb
‒ 3D ray casting for occlusion/obstruction/diffraction

 CPU is still the best fit for everything else:
‒ Open modular synthesis frameworks and middleware
‒ Low latency audio
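
The "what to run where" conclusions above can be summarized as a small lookup, sketched here purely for illustration. The workload names and the `place` function are hypothetical, and a real scheduler would also have to decide "when", which this sketch ignores.

```python
# Hypothetical workload names mapped to the unit the conclusions suggest.
PLACEMENT = {
    "codec": "ACP",
    "os_service": "ACP",
    "convolution_reverb": "GPU",
    "indirect_audio": "GPU",
    "ray_casting": "GPU",
}


def place(workload, low_latency=False):
    """Pick a processor for a workload, following the slide's conclusions.

    Low latency paths stay on the CPU regardless of workload type; anything
    not listed (e.g. modular synthesis, middleware) also defaults to the CPU.
    """
    if low_latency:
        return "CPU"
    return PLACEMENT.get(workload, "CPU")


for name in ("codec", "convolution_reverb", "modular_synth"):
    print(name, "->", place(name))
```

The low latency override mirrors the earlier point that music and rhythm games run entirely on the CPU, decoding included.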


AUDIO SYNTHESIZER SCHEDULING IN HSA


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names
are for informational purposes only and may be trademarks of their respective owners.
