For companies looking to build (or buy) cloud tech for mobile + social games. Based on experience gained developing the bitHeads brainCloud platform.
As presented at GDC 2013 in San Francisco, and in May at OIGC in Ottawa.
For more information about brainCloud, go to http://getbraincloud.com
2. This session is for...
• Developers looking to build or acquire cloud tech for their next game project
• Stakeholders who want to confirm that their cloud tech is ready for deployment
Thursday, 28 March, 13
3. Who am I?
• Paul Winterhalder, CDO of bitHeads and playBrains
• bitHeads
  • developers of custom cloud tech and mobile apps
  • worked on the back-ends for Trade Nations and The Simpsons: Tapped Out
• playBrains
  • for-hire developer of mobile, social, and digital games
  • developer of Eggies, Sideway: New York, Jaws, Madballs, etc.
• Product Manager for the brainCloud platform
5. You don’t want to be a headline...
(not this sort of headline anyway)
• SimCity Launch marred by Server Problems
- Forbes, March 5th, 2013
• Overwhelmed Diablo 3 Servers crash on launch day
- Digital Journal, May 15, 2012
• Curiosity’s servers overwhelmed by number of people chiseling away at cube
- Polygon, Nov 7, 2012
• EA pulls Simpsons iOS Game [over server issues]
- Edge, March 6, 2012
6. Server development is night-and-day different from client development.
7. Server development concerns...
Concern     | Server                        | Mobile
Scalability | thousands of concurrent users | one user
Reliability | 24-7 uptime                   | only while the user is using the app!
Security    | !!!                           | OS concern
8. Why risk it? Why have server features at all?
(They enhance the game for your users and your bottom line.)
9. Motivation: Key Gameplay
The very nature of the game dictates that you must be online.
• multiplayer
• online worlds
• server-based gameplay rules
10. Motivation: Deeper Engagement
Let the player play how they want and when they want.
• pick up and play from any device
• all key info in the cloud:
  • game progress
  • player stats & XP
  • virtual currencies
11. Motivation: Social Reach
Cloud tech can ease the burden of adding social features to your games.
• invites
• messages
• gifts
• leaderboards
• tournaments
12. Motivation: Feature Parity
Every platform is different - cloud tech can aid in delivering the same experience to all players.
• leaderboards
• achievements
• async multiplayer
• cloud saves
• etc.
13. Motivation: Unite Communities
Cloud tech can unite your player communities across platforms - and across IPs.
• multiplatform libraries
• cross-platform multiplayer
• global stats
• social identity
• cross-platform virtual currencies
• cross-IP identity and rewards
14. Motivation: Monetization
Your cloud solution will be a key part of your monetization strategy...
• cross-platform virtual currencies
• economy tuning
• sales & promotions
• aggregate ads
• analytics
15. Motivation: Improved iteration!
Today’s games must iterate or die!
• A/B testing
• gameplay tuning
• new quests / achievements
• additional content (DLC)
16. Cloud-based features
[Diagram: cloud-based features grouped under GAMIFICATION, SOCIAL, MULTIPLAYER, BASE DATA, E-COMMERCE, and EVENTS. Features shown: Player XP & Levels, Missions + Quests, Achievements, Tournaments, Async MP, Identity Management, Leaderboards, Downloader, Global Stats, Deep Entity, Multi-currency, Client Versioning, File Access, User Stats, Gifts & Notifications, Challenges, Transaction Log, Sharding Strategy, E-commerce Integrations, Analytics, Friend Data, Events]
17. So how does cloud tech help?
Cloud tech gives you easy access to key infrastructure - but you still need to construct a proper solution.
18. Key characteristics
The National Institute of Standards and Technology’s definition of cloud computing identifies “five essential characteristics”:
• On-demand self-service - administer yourself via the web
• Broad network access - ubiquitous access via the greater web
• Resource pooling - shared, multi-tenant resources
• Rapid elasticity - can scale up and down with demand
• Measured service - pay for what you use
19. 3 levels of cloud computing
• Software as a Service (SaaS) - you access software in the cloud, but don’t need to manage any of it... just use the APIs.
• Platform as a Service (PaaS) - you rent a development environment that has been specifically engineered with built-in services for scalability, rapid deployment, automated elasticity, etc.
• Infrastructure as a Service (IaaS) - you rent virtualized hardware in the cloud, and are fully in control of the software that runs upon it.
20. Cloud services - examples
SaaS: Flurry, iCloud, Facebook, brainCloud
PaaS: Amazon Elastic Beanstalk (Java, PHP, Python, .NET, and Node.js); Google App Engine (Java, Python, Go, etc.); Microsoft Azure Cloud Services (Java, Python, Node.js, .NET)
IaaS: Amazon EC2; Google Compute Engine; Microsoft Azure Virtual Machines
22. It’s all about the DATA.
In the end, it all comes down to how efficiently and reliably your games can store and retrieve data to/from the cloud.
23. Rise of the NoSQL database
• Non-relational
• Store documents or key-value pairs
• Most are not ACID (BASE instead)
• Simpler = higher performance
• Easier to shard
• Recognize that most queries are about a single player - not across the whole domain of the db
http://swrpgcronicas.blogspot.com
24. Database Sharding
[Diagram: AppServers 1-3 all using a single non-sharded DB - Not scalable!]
[Diagram: vertical sharding - Shard 1: User Profiles, Shard 2: Leaderboards, Shard 3: User Stats]
Vertical sharding - simple, but not optimal. Problems arise when one table ends up larger / with higher traffic than the others...
[Diagram: horizontal sharding - Shard 1: Users A-J, Shard 2: Users K-O, Shard 3: Users P-Z]
Horizontal sharding - most scalable!
Note* - for optimal results you may need to manually implement your own horizontal sharding.
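A minimal sketch of the horizontal layout above, routing each user to a shard by the first letter of their username. The A-J / K-O / P-Z ranges come from the slide; the shard names and the `shard_for_user` helper are hypothetical, not part of any particular database's API.

```python
# Hypothetical shard map mirroring the slide's horizontal layout:
# (first letter, last letter, shard name), inclusive ranges.
SHARD_RANGES = [
    ("A", "J", "shard1"),
    ("K", "O", "shard2"),
    ("P", "Z", "shard3"),
]


def shard_for_user(username: str) -> str:
    """Return the shard that holds all data for `username`.

    Because most queries are about a single player, the app server
    only needs to talk to one shard per request.
    """
    if not username:
        raise ValueError("username must be non-empty")
    initial = username[0].upper()
    for first, last, shard in SHARD_RANGES:
        if first <= initial <= last:
            return shard
    raise ValueError(f"no shard configured for initial {initial!r}")


if __name__ == "__main__":
    for name in ["alice", "Kevin", "zoe"]:
        print(name, "->", shard_for_user(name))
```

Letter ranges keep the example readable, but real user names cluster unevenly; routing on a hash of a stable user ID is a common alternative that spreads load more evenly across shards.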
25. In the land of NoSQL, MongoDB is king.
[Chart: relative popularity of NoSQL database skills, based on LinkedIn skills searches. Did not include a search on BigTable. Source: blogs.the451group.com]
27. App Servers
• Build on a PaaS solution if possible - gives you load balancing and elasticity
• AppServers ideally* should be stateless
• Use Memcached to cache session state
• Also consider caching static reference data (like game tuning, rules, etc.) in each server
[Diagram: Client(s) → Load Balancer(s) → App Server(s), hosted on a PaaS; the app servers use MongoDB and Memcached]
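The stateless-server idea can be sketched as a cache-aside lookup: an app server keeps no per-player state of its own, checks a shared cache for the session, and falls back to the database on a miss. In this sketch a dict-backed `FakeCache` stands in for a Memcached client, `load_session_from_db` is a hypothetical stand-in for a MongoDB query, and the 30-minute TTL is an arbitrary assumption. Static tuning data is held once per process, as the slide suggests.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class FakeCache:
    """Dict-backed stand-in for a shared Memcached client (get/set with TTL)."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (time.monotonic() + ttl_seconds, value)


class AppServer:
    """Holds no per-player state, so any instance behind the load balancer can serve any request."""

    def __init__(self, cache: FakeCache,
                 load_session_from_db: Callable[[str], dict],
                 tuning: dict) -> None:
        self.cache = cache  # shared by all app servers
        self.load_session_from_db = load_session_from_db
        self.tuning = tuning  # static reference data, cached in each server process

    def get_session(self, session_id: str) -> dict:
        key = f"session:{session_id}"
        session = self.cache.get(key)
        if session is None:  # cache miss: read from the DB, then populate the cache
            session = self.load_session_from_db(session_id)
            self.cache.set(key, session, ttl_seconds=1800)
        return session
```

Because both the session cache and the database are shared, two `AppServer` instances built on the same cache see the same session, and only the first lookup touches the database.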
28. Communications
HTTP-based communication is the way to go - nothing else gets through the myriad firewalls and routers that make up the web.
29. Build a Comms Manager
• Abstracts communications
• Responsible for serializing / de-serializing messages
• Handles HTTP 503 (service unavailable) errors, with automatic escalating retries
• Can queue requests (bundling them)
• Callback interface to the app for responses
[Diagram: Client Application ↔ Client Comms Manager ↔ App Server]
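A minimal sketch of such a comms manager, assuming a pluggable `transport` function that takes a JSON body and returns `(status_code, body)`; in a real client this would be an HTTP POST. Requests are queued and sent as one bundle, HTTP 503 responses trigger retries with escalating (doubling) delays, and each response is handed back through its callback. The class name, the `{"messages": [...]}` / `{"responses": [...]}` bundle format, and the retry limits are all hypothetical.

```python
import json
import time
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: takes a JSON request body, returns (HTTP status, response body).
Transport = Callable[[str], Tuple[int, str]]
Callback = Callable[[Dict[str, Any]], None]


class CommsManager:
    """Queues requests, sends them as one JSON bundle, retries on HTTP 503."""

    def __init__(self, transport: Transport, max_retries: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.transport = transport
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests need not really wait
        self._queue: List[Tuple[Dict[str, Any], Callback]] = []

    def queue(self, request: Dict[str, Any], callback: Callback) -> None:
        """Hold a request until the next flush(), bundling it with others."""
        self._queue.append((request, callback))

    def flush(self) -> None:
        """Serialize every queued request into one bundle and send it."""
        if not self._queue:
            return
        pending, self._queue = self._queue, []
        body = json.dumps({"messages": [req for req, _ in pending]})
        status, text = self.transport(body)
        delay, retries = self.base_delay, 0
        while status == 503 and retries < self.max_retries:
            self.sleep(delay)  # back off before retrying...
            delay *= 2         # ...and wait twice as long next time
            retries += 1
            status, text = self.transport(body)
        if status != 200:
            self._queue = pending + self._queue  # keep requests for a later flush
            raise RuntimeError(f"request bundle failed with HTTP {status}")
        # De-serialize and hand each response back through its callback, in order.
        responses = json.loads(text)["responses"]
        for (_, callback), response in zip(pending, responses):
            callback(response)
```

Injecting `sleep` keeps the back-off testable; in a game client the waits would more likely be scheduled on a timer than block the main thread.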
30. Sequence numbers
• Client
  • Add sequence numbers to all messages sent from client to server
  • If the client doesn’t receive a response in time, simply retry
• Server
  • Use the # to guarantee in-order processing and discard duplicates
  • Saves the last response sent to the client
  • If it receives a duplicate request, doesn’t process it, but returns the saved response again
Bonus: Helps prevent replay attacks! (On their own, sequence numbers don’t stop a man-in-the-middle.)
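The server side of this scheme can be sketched as a per-client record of the last sequence number processed and the response sent for it. A retried request carrying the same number gets the saved response back without being processed again; anything stale or out of order is discarded. The `SequencedServer` class and its `handle` signature are hypothetical, and a real server would keep this record in shared storage (such as the session cache) rather than in process memory.

```python
from typing import Callable, Dict, Optional, Tuple


class SequencedServer:
    def __init__(self, process: Callable[[str, dict], dict]) -> None:
        self.process = process  # the actual game logic
        # client_id -> (last sequence number processed, response sent for it)
        self._last: Dict[str, Tuple[int, dict]] = {}

    def handle(self, client_id: str, seq: int, payload: dict) -> Optional[dict]:
        last = self._last.get(client_id)
        if last is not None and seq == last[0]:
            return last[1]  # duplicate (client retry): resend saved response, don't reprocess
        expected = 1 if last is None else last[0] + 1
        if seq != expected:
            return None  # stale or out-of-order message: discard it
        response = self.process(client_id, payload)
        self._last[client_id] = (seq, response)
        return response
```

Returning `None` for a discarded message is a simplification; an implementation might instead answer with an explicit error so the client can resynchronize.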
31. Message Formatting
• JSON format is a good place to start (better than XML)
• Human readable - and aligns with MongoDB
• Can be compressed for more optimization
• Go to formal options if you need the extra performance
• Warning - formal options use more CPU on server
Formal messaging options:
• Google Protocol Buffers
• Apache Thrift
• Apache Avro
Good comparison here: www.slideshare.net/IgorAnishchenko/pb-vs-thrift-vs-avro
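As a rough illustration of the JSON-plus-compression option, the sketch below serializes a message with Python's standard `json` module, compresses it with `zlib`, and reverses both steps. The message fields are made up; actual savings depend heavily on payload size and repetition, and very small messages may not shrink at all.

```python
import json
import zlib


def encode_message(message: dict, compress: bool = True) -> bytes:
    # Compact separators drop the spaces json.dumps adds by default.
    raw = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw) if compress else raw


def decode_message(data: bytes, compressed: bool = True) -> dict:
    raw = zlib.decompress(data) if compressed else data
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical payload: a batch of repetitive entity updates compresses well.
    msg = {"messages": [{"service": "entity", "op": "update", "id": i,
                         "data": {"type": "tree", "level": 1}} for i in range(50)]}
    plain = encode_message(msg, compress=False)
    packed = encode_message(msg)
    print(len(plain), "bytes as JSON,", len(packed), "bytes compressed")
```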
32. Content Distribution Network (CDN)
• Also called a Content Delivery Network
• Distributed system of servers deployed throughout the world
• Content physically closer to the user = FASTER!
• Use for static files: media, configs, etc.
• Popular CDNs include: Akamai, Amazon CloudFront, CacheFly, etc.
Tip - to avoid caching issues, place all files under a <version> directory
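A small sketch of the versioned-directory tip: every static file URL is built from a base CDN address plus a `<version>` directory, so publishing a new version produces new URLs and stale cached copies are simply never requested. The base URL, version strings, and the `cdn_url` helper are hypothetical.

```python
from urllib.parse import quote

CDN_BASE = "https://cdn.example.com/mygame"  # hypothetical CDN origin


def cdn_url(version: str, path: str, base: str = CDN_BASE) -> str:
    """Build the URL for a static file stored under its <version> directory."""
    clean_path = path.lstrip("/")
    # quote() leaves "/" alone by default, so subdirectories are preserved.
    return f"{base.rstrip('/')}/{quote(version)}/{quote(clean_path)}"
```

Shipping updated configs or media then means uploading them under a new version directory and pointing clients at that version, rather than overwriting files the CDN may still have cached.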
36. Feature: Deep Entity
• For complex game data - like “ville” games
• Storing the data in the cloud:
  • lets players visit each other
  • safeguards the player’s time investment
  • allows the player to play on multiple devices (not concurrently)
37. Implementation: Deep Entity
• Data objects are stored in an entity collection
• When the game starts, the entire set of entities is retrieved (downloaded)
• As the player performs actions and operations, the client both:
  • updates the entities locally and...
  • sends updates for individual entities (e.g. creates, updates, deletes) to the server
• If the connection is terminated, the user would only lose maybe a few updates
[Diagram: sequence between the Game Client and the Cloud. Start Game → Retrieve Entities → All Entities; Plant Tree → create tree locally → New Entity (tree) → Ack; Level Up Tree → update tree locally → Update Entity (tree) → Ack]
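The client side of this flow might look like the sketch below: the full entity set is fetched once at startup, and each later change is applied to the local copy and then sent to the server as a single-entity create, update, or delete. The `fetch_all` and `send` callables stand in for the comms layer, and the message shape (`op`, `entity_id`, `data`) is hypothetical.

```python
import uuid
from typing import Callable, Dict, Optional


class EntityStore:
    """Client-side copy of the player's entities, kept in sync one entity at a time."""

    def __init__(self, fetch_all: Callable[[], Dict[str, dict]],
                 send: Callable[[dict], None]) -> None:
        self.send = send  # stands in for the comms layer (e.g. a queued HTTP request)
        # Start Game: retrieve the entire entity set in one call.
        self.entities: Dict[str, dict] = {eid: dict(d) for eid, d in fetch_all().items()}

    def create(self, data: dict, entity_id: Optional[str] = None) -> str:
        entity_id = entity_id or str(uuid.uuid4())
        self.entities[entity_id] = dict(data)  # update the local copy first...
        self.send({"op": "create", "entity_id": entity_id, "data": dict(data)})  # ...then the server
        return entity_id

    def update(self, entity_id: str, changes: dict) -> None:
        self.entities[entity_id].update(changes)
        # Send the whole updated entity, not a diff, so a resend is harmless.
        self.send({"op": "update", "entity_id": entity_id,
                   "data": dict(self.entities[entity_id])})

    def delete(self, entity_id: str) -> None:
        del self.entities[entity_id]
        self.send({"op": "delete", "entity_id": entity_id})
```

Because each message covers one entity, a dropped connection loses at most the few messages still in flight, which matches the slide's point about losing only a few updates.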
38. Feature: Player Stats
• Tracking of simple numeric stats on the server
• Stats act as the underlying data structure of all the meta-systems - XP, achievements, quests
• Allows concurrent access
39. Implementation: Player Stats
• Each “stat” is declared on the server - with a declared data type, initial value, and known min/max
• Client never *sets* the stat - it sends commands: increment, decrement, reset, etc.
• Rules are processed to ensure the stat value maintains consistency
• Implementation is further enhanced by the concept of StatsEvents <- server-based stats macros
Energy example:
• Client tells server to “Increment Hearts +1”
• Server looks at Hearts value - sees that it’s 15 - and increments it to 16
• Server then checks the rules - and sees that the Hearts max is supposed to be 15 - so it resets the value back to 15
(Note - simplified example - energy actually uses the IncrementToMax operation in our game - we only restore to 5 hearts)
40. Feature: Async Multiplayer
• Allows for a game match to be played between a list of players
• The currently active player determines which player goes next (to allow for turnovers)
• The next player should be notified when it’s their turn
41. Implementation: Async Multiplayer
• The initiator of the game owns the match data - the others have permission to manipulate it
• The state of the match, including whose turn it is, is stored in the status entity
• Only the player whose turn it is can manipulate the match (except for termination)
• The full play-by-play of the match is stored separately in MatchHistory
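A minimal in-memory sketch of these rules: the initiator owns the match, only the player whose turn it is may submit a move (and names who goes next), any participant may terminate, and every move is appended to a separate history list. `Match`, `submit_turn`, and the `notify` callback are hypothetical names; a real implementation would persist the status and history as server-side entities and send a push notification instead of calling a function.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Match:
    owner: str                     # the initiator owns the match data
    players: List[str]
    notify: Callable[[str], None]  # "it's your turn" notification hook
    current: str = ""              # whose turn it is (part of the status)
    state: dict = field(default_factory=dict)
    history: List[dict] = field(default_factory=list)  # play-by-play, kept separately
    finished: bool = False

    def __post_init__(self) -> None:
        if self.owner not in self.players:
            raise ValueError("owner must be one of the players")
        self.current = self.current or self.owner

    def submit_turn(self, player: str, new_state: dict, next_player: str) -> None:
        if self.finished:
            raise RuntimeError("match is over")
        if player != self.current:
            raise PermissionError(f"it is {self.current}'s turn, not {player}'s")
        if next_player not in self.players:
            raise ValueError(f"{next_player} is not in this match")
        self.state = new_state
        self.history.append({"player": player, "state": new_state})
        self.current = next_player  # the active player decides who goes next
        self.notify(next_player)

    def terminate(self, player: str) -> None:
        # Termination is the one action allowed out of turn.
        if player not in self.players:
            raise PermissionError(f"{player} is not in this match")
        self.finished = True
```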
43. Performance testing is hard.
• need to simulate tens to hundreds of thousands of users
• need to simulate the data environment as well (especially challenging for social applications)
• Every system is different - which can make it difficult to determine good vs. bad performance.
• Focus on changes in relative performance
www.mayhemandmuse.com
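One way to act on "focus on changes in relative performance" is to compare each test run against a baseline run of the same scenario rather than against an absolute target. The sketch computes a nearest-rank percentile of request latencies and flags a regression when the new run's p95 is more than a chosen fraction worse. The 10% tolerance and the function names are arbitrary assumptions.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of `samples`, for 0 < pct <= 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def regressed(baseline: Sequence[float], current: Sequence[float],
              pct: float = 95.0, tolerance: float = 0.10) -> bool:
    """True if `current` is more than `tolerance` slower than `baseline` at `pct`."""
    base = percentile(baseline, pct)
    return percentile(current, pct) > base * (1 + tolerance)
```

Comparing against your own previous run sidesteps the "what is good performance?" question: the baseline absorbs everything specific to your system, and only the change is judged.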
44. Grinder in the Cloud
• Grinder is a free Java-based load testing framework that makes it easy to run a distributed test using many load injector machines
• Scripts are written using dynamic scripting languages - Python, Clojure, Groovy, etc.
• Grinder in the Cloud allows you to temporarily spin up a cloud-based set of Grinder machines to massively test your system
• For more info - grinder.sourceforge.net
For Simpsons...
• >500 Grinder servers
• 408K concurrent users
• 4M DAU!
45. Gradual Roll-out
• These days, it’s more important than ever to do a staged roll-out
• Deploy to a small market to begin with (like Canada), monitor, and then add additional regions step-by-step, monitoring each as you go along
• You’re looking for situations where the performance of the system is degrading in an unexpected, non-linear fashion. Your job is to catch these issues before they become a problem!
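To spot the unexpected, non-linear degradation described above, one simple check is to track a load metric per active user (for example, server CPU-seconds or database operations per user over the same window) at each roll-out stage, and flag any stage whose per-user cost rises well above the first stage's. If the system scales linearly, that ratio stays roughly flat as regions are added. The function name, the 25% tolerance, and the stage data in the test are invented for illustration.

```python
from typing import List, Tuple


def nonlinear_stages(stages: List[Tuple[str, int, float]],
                     tolerance: float = 0.25) -> List[str]:
    """Return stage names whose cost per user exceeds the first stage's by > tolerance.

    Each stage is (name, active_users, total_cost); active_users must be positive,
    and total_cost is any load metric measured over the same window for every stage.
    """
    if not stages:
        return []
    _, users0, cost0 = stages[0]
    baseline = cost0 / users0  # per-user cost in the first (small) market
    flagged = []
    for name, users, cost in stages[1:]:
        if cost / users > baseline * (1 + tolerance):
            flagged.append(name)
    return flagged
```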
46. Publisher Review
What follows are actual questions from a publisher’s server due diligence that was done for a recent game that we produced. It’s useful as a review.
47. Questions publishers will ask:
• Provide a technical architecture diagram describing all server tiers
• What database are you using, and how is it set up?
• Is there a plan for DB sharding for scalability?
• What do you use for caching in front of the DB? What gets cached?
• What is your DB backup strategy?
• What is the minimum server setup for 50K DAU?
• If DAU reaches 200K, what would the server architecture look like?
48. Thanks for listening!
• For more info on the tech that we build:
www.bitheads.com/braincloud
paulw@bitheads.com
• Follow me on Twitter: @winterhalder
Special thanks to Scott Simpson, Rick McMullin, Chris Justus and Preston Jennings for their contributions and feedback!