Scalable Game Servers: a talk I gave at TGC 2017
The slides make the case for building distributed systems with a stateful architecture to achieve high throughput and low latency, using Microsoft Orleans, the project that introduced the virtual actor model and, in my view, its best implementation.
http://github.com/dotnet/orleans
2. Who am I?
• Have been coding games professionally for more than 8 years
• Worked on networking middleware at MuchDifferent and wrote a Unity emulator for servers there
• Worked on several multiplayer games for mobile and WebGL, and on a kids’ MMO for the South Korean market
• TakeCover www.gamajun-games.com
• Educational games for www.dimensionu.com as a subcontractor at MuchDifferent
• Smaller projects here and there, including a bit of open-source work
3. Chicken and egg
• I want to describe the case for Orleans
• You should know the definitions before the case makes sense
• For some people, definitions aren’t motivating without the case
• I have to start with one or the other
• I’ll go quickly over the definitions
4. Concurrency vs parallelism
• Concurrency is the decomposability property of a program, algorithm, or problem into order-independent or partially-ordered components or units (What!?)
• Concurrency comes from the need to execute many pieces of code on a small number of CPU cores
• Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously
5. Distribution
• A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages
6. Why concurrent, parallel and distributed code?
• For more than 10 years, individual CPU cores haven’t been getting much more powerful
• We get multiple cores instead
• A single machine has limited power
• Modern workloads often need more than one machine can offer
7. History
• We should know more about computer science and software engineering history, but we don’t
• All programs were sequential
• CPUs were getting more and more powerful, and programs were getting faster
• Some people used multiple processors, but only on servers
• Today nobody has a single-core processor, and the free lunch is over
• Academia worked on distributed systems
8. Who needed it?
• Not many people
• Heavy graphics processing
• AI
• Simulations and military-scale systems
• Telephone switches
• Researchers interested in the field
9. OK, so what did those people do?
• How to communicate?
• Memory sharing vs message passing
• Transactional memory, locks and semaphores
• Shared memory became the mainstream way
• Message passing and Erlang
• Stateless web
• Multi-threading, coroutines, the actor model and CSP
10. The usual way
• Thread
• Shared memory and locks
• Don’t do it unless you have to, and quit as soon as you have to debug it
• Transactional memory to the rescue?
• Things changed in the last 5-8 years
• Actor model rises again
11. Actor model
• Mathematical theory of computation
• Introduced in 1973 by Hewitt, Bishop, and Steiger
• A framework and basis for reasoning about concurrency
• Actors as primitives of concurrent computation
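To make the idea concrete, here is a minimal actor sketch in plain C#, assuming .NET 5 or later (records and System.Threading.Channels). It is not the API of Erlang, Akka or Orleans; CounterActor, Increment and GetValue are hypothetical names. The actor owns its state, receives messages through a mailbox and handles them one at a time.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical messages for a counter actor (C# 9 records).
public abstract record CounterMessage;
public sealed record Increment(int By) : CounterMessage;
public sealed record GetValue(TaskCompletionSource<int> Reply) : CounterMessage;

// Hypothetical actor: private state plus a mailbox, not a real framework type.
public sealed class CounterActor
{
    // The mailbox: senders only enqueue messages and never touch the state.
    private readonly Channel<CounterMessage> _mailbox =
        Channel.CreateUnbounded<CounterMessage>(new UnboundedChannelOptions { SingleReader = true });

    // Private state, read and written only by the processing loop below.
    private int _count;

    public CounterActor() => _ = Task.Run(ProcessAsync);

    // Safe to call from many threads at once.
    public void Send(CounterMessage message) => _mailbox.Writer.TryWrite(message);

    // Request/reply built on top of one-way messages.
    public Task<int> AskValueAsync()
    {
        var reply = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
        Send(new GetValue(reply));
        return reply.Task;
    }

    // One message at a time, in arrival order, so _count needs no lock.
    private async Task ProcessAsync()
    {
        await foreach (CounterMessage message in _mailbox.Reader.ReadAllAsync())
        {
            switch (message)
            {
                case Increment inc:
                    _count += inc.By;
                    break;
                case GetValue get:
                    get.Reply.SetResult(_count);
                    break;
            }
        }
    }
}
```

Because only the processing loop touches _count, many threads can call Send at the same time without any lock: Parallel.For(0, 1000, _ => actor.Send(new Increment(1))) followed by await actor.AskValueAsync() yields 1000.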
12. The Erlang way
• No shared memory
• Not even mutable state
• Message passing
• The actor model
• Why isn’t it popular?
• Elixir helps
14. Scalability
• Horizontal scalability
• Vertical scalability
• Vertical scalability is limited
• Big machines are expensive
• Horizontal scalability requires distributed software
• Horizontal scaling can also make your software more available and more fault tolerant
• Vertical scaling is not enough for many tasks
15. Stateless services for scaling
• Stateless services are easy to develop
• They offload the problem to the storage system
• Caches
• Low latency and high throughput usually dictate a stateful design
16. Stateful services for scalability
• Much faster responses
• Harder to develop and maintain
• Doesn’t offload all problems to storage
• Doesn’t hit storage as much as stateless services
• The actor model is a very good fit here in many cases
17. Actor model implementation issues
• Erlang, made at Ericsson, is the first and most robust industrial implementation
• Akka and Akka.NET are conceptually similar to it
• You need to manage resources and do load balancing
• You need to handle many failure cases
• Erlang, the most robust one, has a very special syntax, is dynamically typed, and its ecosystem is not as big as you would expect
20. Microsoft Orleans
• Distributed actor model runtime
• Virtual actor model
• Location transparency
• Based on .NET objects and interfaces
• Asynchronous, using async/await
• Error propagation
• Silo: the runtime execution container
• Implicit activation and life-cycle management
• Coordinated placement, multiplexed communication and failure recovery
21. What was it again?
• Distributed C# with remote objects
• Magically managed by Orleans runtime
• Magic doesn’t mean hidden stuff
• You are in control if you need
• It looks like C# and it acts like C#
• It scales
• It’s fault tolerant
• Guides you along the right path
22. Grains: virtual actors
1. Grain instances always exist, virtually
• Needn’t be created, looked up or deleted
• Code can always call methods on the grain
• Grains never fail
2. Activations are created on-demand
• If there is no existing activation, a message sent to it triggers instantiation
• Lifecycle is managed by the runtime
• Transparent recovery from server failures
• Runtime can create multiple activations of stateless grains (for performance)
3. Location transparency
• Grains can pass references to one another around
• References can be persisted to cold-storage
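The "always exists, activated on demand" idea can be imitated with a toy, single-threaded runtime. This is only an illustration under my own assumptions, not how Orleans is implemented: ToyRuntime, PlayerActivation and the dictionary standing in for storage are hypothetical, and there is no distribution, concurrency or real persistence.

```csharp
using System.Collections.Generic;

// Hypothetical activation: the in-memory state of one grain identity.
public sealed class PlayerActivation
{
    public string Id { get; }
    public int Score { get; private set; }
    public PlayerActivation(string id, int score) { Id = id; Score = score; }
    public int AddScore(int points) => Score += points;
}

// Toy, single-threaded runtime illustrating virtual actors (not Orleans internals).
public sealed class ToyRuntime
{
    private readonly Dictionary<string, PlayerActivation> _active = new();
    private readonly Dictionary<string, int> _storage = new(); // stand-in for persistent storage

    public int ActivationsCreated { get; private set; }

    // Callers never create or look up grains explicitly: a call by identity either
    // finds the live activation or creates one on demand from stored state.
    public int Call(string grainId, int points)
    {
        if (!_active.TryGetValue(grainId, out PlayerActivation? activation))
        {
            _storage.TryGetValue(grainId, out int saved); // 0 if nothing was stored yet
            activation = new PlayerActivation(grainId, saved);
            _active[grainId] = activation;
            ActivationsCreated++;
        }
        int score = activation.AddScore(points);
        _storage[grainId] = score; // write-through, so a lost activation can be rebuilt
        return score;
    }

    // Simulates idle collection or a server failure: the in-memory activation is gone.
    public void Deactivate(string grainId) => _active.Remove(grainId);

    public bool IsActive(string grainId) => _active.ContainsKey(grainId);
}
```

Callers just use Call("alice", 10); after Deactivate("alice") the next call rebuilds the activation from stored state without the caller noticing.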
23. Execution model
• Activations are single-threaded
• Optionally re-entrant
• Runtime schedules execution of methods
• Multiplexed across threads
• No shared state
• Avoid races
• No need for locks
• Cooperative multitasking
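A rough sketch of single-threaded, non-reentrant turns in plain C#, assuming .NET 5 or later (for the non-generic TaskCompletionSource). TurnScheduler and CounterActivation are hypothetical names; the real Orleans scheduler is different and does much more. The point is that calls are queued per activation, so grain code can do a read-modify-write across an await without locks, while the work still runs on ordinary thread-pool threads.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-activation scheduler: each call (a "turn") starts only after the
// previous one has finished completely, i.e. turns are not interleaved (non-reentrant).
public sealed class TurnScheduler
{
    // Completion signal of the most recently queued turn.
    private Task _tail = Task.CompletedTask;

    public Task<T> Enqueue<T>(Func<Task<T>> turn)
    {
        var done = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
        // Atomically append this turn to the queue: the scheduler synchronizes, not the grain code.
        Task previous = Interlocked.Exchange(ref _tail, done.Task);
        return RunAfterAsync(previous, turn, done);
    }

    private static async Task<T> RunAfterAsync<T>(Task previous, Func<Task<T>> turn, TaskCompletionSource done)
    {
        await previous.ConfigureAwait(false); // never faults: it is the previous turn's "done" signal
        try
        {
            return await turn().ConfigureAwait(false);
        }
        finally
        {
            done.SetResult(); // let the next turn start, even if this one threw
        }
    }
}

// Hypothetical grain-like class whose state is only touched inside turns.
public sealed class CounterActivation
{
    private readonly TurnScheduler _turns = new();
    private int _count;

    public Task<int> IncrementAsync() => _turns.Enqueue(async () =>
    {
        int current = _count;
        await Task.Yield();   // gives up the thread, but no other turn of this activation runs meanwhile
        _count = current + 1; // read-modify-write across an await, still race-free
        return _count;
    });
}
```

Without the scheduler, concurrent IncrementAsync calls would lose updates because of the await between reading and writing _count; with it, every call observes the result of the previous one.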
26. Grain implementation
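using System.Threading.Tasks;
using Orleans;

// The grain interface declares the remotely callable methods
public interface IHello : IGrainWithIntegerKey
{
    Task<string> SayHello(string greeting);
}

// Early Orleans releases used GrainBase; current releases derive from Grain
public class HelloGrain : Grain, IHello
{
    public Task<string> SayHello(string greeting)
    {
        var resp = $"You said: '{greeting}', I say: Hello!";
        return Task.FromResult(resp);
    }
}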
27. Orleans is built for
• Large scale distributed real-time systems
• High throughput, low latency
• Large number of independent entities
• Stateful computing
• Workloads where computing results doesn’t need much communication among many entities
28. Apadana game backend case study
• A scalable distributed backend for games
• No physics and game world logic for now
• Players, authentication, cloud save
• Leaderboards
• Matchmaking
• Real-time messaging in games
• Guilds and friend lists
30. Architecture
• Many silos in the back
• Storage using ArangoDB and Couchbase (customizable)
• Cloud ready
• Front ends using Web API and WebSockets; a custom UDP protocol is coming soon
31. Inside a silo
• Developer
• Game (title), used as the tenant
• Player
• RealtimeGame
• Leaderboard
• MatchMaker
32. Architecture
• Stateful grains
• Write to storage for fault tolerance
• Batch communications
• Distributed processing of user messages
• Index grains for managing resources
33. Leaderboard
• Stateful grain per leaderboard
• Sorting data in memory
• Defined resource
• Very low latency, without many database hits
• Hot leaderboards in memory
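A leaderboard grain’s in-memory state could look roughly like this. It is a sketch under assumptions: Leaderboard, Submit, Top and RankOf are hypothetical names, only each player’s best score is kept, and writing to storage is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory state of one leaderboard grain.
public sealed class Leaderboard
{
    // Ordered by score descending, then by player id, so entries are unique and stable.
    private static readonly Comparer<(long Score, string Player)> Order =
        Comparer<(long Score, string Player)>.Create((a, b) =>
        {
            int byScore = b.Score.CompareTo(a.Score);
            return byScore != 0 ? byScore : string.CompareOrdinal(a.Player, b.Player);
        });

    private readonly SortedSet<(long Score, string Player)> _sorted = new(Order);
    private readonly Dictionary<string, long> _best = new();

    // Keeps each player's best score; the sorted set is updated in memory, no database read.
    public void Submit(string player, long score)
    {
        if (_best.TryGetValue(player, out long old))
        {
            if (score <= old) return;
            _sorted.Remove((old, player));
        }
        _best[player] = score;
        _sorted.Add((score, player));
    }

    public IReadOnlyList<(long Score, string Player)> Top(int count) => _sorted.Take(count).ToList();

    // 1-based rank, or null if the player has no score. A linear scan keeps the sketch short.
    public int? RankOf(string player)
    {
        if (!_best.ContainsKey(player)) return null;
        int rank = 1;
        foreach (var entry in _sorted)
        {
            if (entry.Player == player) return rank;
            rank++;
        }
        return null;
    }
}
```

Reads such as Top(10) are served straight from memory. A very large board would want a structure with faster rank queries than the linear scan used here.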
34. MatchMaker
• One grain per matchmaking game mode
• Clients send requests
• The matchmaker processes them on a timer and sends responses
• Games are created here
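A timer-driven matchmaker for one game mode might hold state like the following. MatchMaker, Request and Tick are hypothetical names; the sketch assumes fixed-size matches filled first come, first served, with no skill or latency rules. In a grain, Tick would be called from a timer and the returned matches turned into games.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical state of one matchmaker grain (one per game mode).
public sealed class MatchMaker
{
    private readonly int _playersPerMatch;
    private readonly Queue<string> _waiting = new();   // first come, first served
    private readonly HashSet<string> _queued = new();  // guards against duplicate requests
    private int _nextMatchId;

    public MatchMaker(int playersPerMatch)
    {
        if (playersPerMatch < 1) throw new ArgumentOutOfRangeException(nameof(playersPerMatch));
        _playersPerMatch = playersPerMatch;
    }

    // Clients only enqueue a request; nothing is matched at request time.
    public void Request(string playerId)
    {
        if (_queued.Add(playerId)) _waiting.Enqueue(playerId);
    }

    // Called periodically: forms as many full matches as possible and returns them,
    // so the caller can create the games and notify the players.
    public List<(int MatchId, string[] Players)> Tick()
    {
        var created = new List<(int MatchId, string[] Players)>();
        while (_waiting.Count >= _playersPerMatch)
        {
            var players = new string[_playersPerMatch];
            for (int i = 0; i < players.Length; i++)
            {
                players[i] = _waiting.Dequeue();
                _queued.Remove(players[i]);
            }
            created.Add((++_nextMatchId, players));
        }
        return created;
    }
}
```

Batching the work in Tick, rather than matching on every request, keeps each request cheap and lets one timer pass see the whole queue.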
35. Real-time game
• One grain per match
• References all players
• Relays messages between them
• Doesn’t care about the message format unless your scripts need to
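A relay for one match can stay ignorant of the payload format by forwarding opaque bytes. In this sketch IPlayerEndpoint is a hypothetical stand-in for a player grain reference or client connection, and RecordingEndpoint exists only to exercise the relay.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a player grain reference or client connection.
public interface IPlayerEndpoint
{
    string Id { get; }
    void Deliver(string fromPlayerId, byte[] payload);
}

// One instance per match; payloads are opaque bytes and are never parsed here.
public sealed class RealtimeMatch
{
    private readonly List<IPlayerEndpoint> _players = new();

    public void Join(IPlayerEndpoint player) => _players.Add(player);

    // Forwards the message to every other player in the match; returns how many received it.
    public int Relay(string fromPlayerId, byte[] payload)
    {
        int delivered = 0;
        foreach (IPlayerEndpoint player in _players)
        {
            if (player.Id == fromPlayerId) continue;
            player.Deliver(fromPlayerId, payload);
            delivered++;
        }
        return delivered;
    }
}

// Simple in-memory endpoint that records what it receives (for testing only).
public sealed class RecordingEndpoint : IPlayerEndpoint
{
    public RecordingEndpoint(string id) => Id = id;
    public string Id { get; }
    public List<(string From, byte[] Payload)> Received { get; } = new();
    public void Deliver(string fromPlayerId, byte[] payload) => Received.Add((fromPlayerId, payload));
}
```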
36. Scripting
• Uses C# and Roslyn
• Code analysis for security
• Times out scripts that take too long
• Scripts are asynchronous as well
• Compiled and executed locally on each machine
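Timing out slow scripts could be done along these lines, assuming the script has already been compiled (for example with Roslyn) into a delegate that takes a CancellationToken. ScriptRunner.RunWithTimeoutAsync is a hypothetical helper; compilation and the security analysis are not shown.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Roslyn or Orleans.
public static class ScriptRunner
{
    // Runs an already compiled asynchronous script and stops waiting for it after `timeout`.
    public static async Task<(bool Completed, T Result)> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> script, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        // Task.Run shields the caller even if the script blocks before its first await.
        Task<T> run = Task.Run(() => script(cts.Token));
        Task timer = Task.Delay(timeout, cts.Token);

        Task first = await Task.WhenAny(run, timer).ConfigureAwait(false);
        cts.Cancel(); // cancels the timer, or asks a slow script to stop
        if (first == run)
            return (true, await run.ConfigureAwait(false)); // rethrows if the script failed
        // Cancellation is cooperative: a script that ignores the token keeps running in the background.
        return (false, default);
    }
}
```

.NET cannot forcibly abort a running task, so this relies on cooperative cancellation; a script that ignores the token is abandoned rather than killed.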
37. A note on microservices
• Not a silver bullet
• Each grain is a microservice
• Latency
• Do you really need multiple apps in multiple languages?
• Local per service storage
38. What about Node.js
• All the cool kids are using it
• Single threaded
• No concurrency and distribution abstractions
• CSP and hard to reason about
• Not good for CPU bound and stateful stuff
• Horrible idea for real-time games
• Add webhooks and you are doomed
• Brain meltdown coming soon
40. What else?
• Super awesome community
• More super awesome community
• Fascinating core team
• A great project lead
• Used in MSN apps for Windows 10, Skype, internal Azure services, Halo 4 and 5, Age of Empires: Castle Siege, and many other Microsoft and community projects
41. Acknowledgements
• Sergey Bykov
• Julian Dominguez
• Reuben Bond
• Gutemberg Ribeiro
• Other members of Orleans community
• Amir Reza Moghassemi (my colleague at Apadana)