3. Jesse Sanford, Joshua Kehn
Freeverse / ngmoco:) / DeNA
● Web guys brought in to design RESTful,
HTTP-based games for handheld clients.
● Platform concurrently being built by Ngmoco
team out in San Francisco.
● First games in company's history built
entirely on
○ EC2
○ Node.js
○ MongoDB
● There are a lot of firsts here!
4. Why Node.js?
● Already using JavaScript. Knowledge share!
● Fast growing ecosystem.
● Reasonable to bring libraries from client to
server and vice versa.
● Lots of JavaScript patterns and best practices
to follow.
● Growing talent pool.
5. Why MongoDB?
● Ever-changing schemas make document
stores attractive.
● Easy path to horizontal scalability.
● 10gen is very easy to work with.
● Lots of best-practice patterns for running on
EC2 infrastructure.
● JavaScript is a friendly interface.
6. Handling the change.
● Lots of patience.
● Many proofs of concept.
● Dedicated PoCs for different puzzle pieces.
(Platform services, Game Libraries)
● Developer training and evangelists.
● Performance testing and open source library
vetting.
● Lots of patience. Seriously.
7. Building from scratch.
● Lots of testing.
● Pre-flight environment for content.
● Duplicate of production for release staging.
● Full stack developer sandboxes on every
workstation.
○ Individual MongoDB and Node.js instances running.
○ Full client stack available as a browser based
handheld simulator for client interface.
8. "Physical" Infrastructure
● EC2 fabric managed by Rightscale
● Extensive library of "Rightscripts" and
"Server templates".
● Different deployments for each environment.
● Deployments a mix of single service
machines and arrays of machines.
● Arrays load balanced by HAProxy, not ELBs.
● Mongo clusters are largest expense.
11. Mongo Infrastructure
● Mongo cluster per environment.
● 3 config nodes split between 2 availability
zones.
● Currently only 1 shard.
● 3 db nodes split between 2 availability
zones.
● mongos processes running directly on app
servers.
12. Mongo Infrastructure cont.
● Config nodes on t1.micro instances.
● DB nodes on m1.xlarge instances.
● DB nodes running RAID 10 on EBS.
● XFS with LVM.
● Snapshots taken after forcing an fsync and
lock on the db, followed by an XFS freeze.
● Backups always done on secondary.
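The snapshot ordering above can be sketched as a small wrapper. This is a minimal sketch, not the team's actual tooling: the step names (`lockDb`, `freezeFs`, `snapshot`, `thawFs`, `unlockDb`) are hypothetical functions supplied by the caller, and the thaw and unlock steps are an assumption added so a failed snapshot does not leave the secondary frozen.

```javascript
// Sketch only: every step is a hypothetical, caller-supplied function.
// In practice they might wrap the mongo shell, xfs_freeze and the EBS API.
function snapshotSecondary(steps) {
  const completed = [];
  const run = function (name) {
    steps[name]();
    completed.push(name);
  };

  run('lockDb'); // force fsync and block writes on the secondary
  try {
    run('freezeFs'); // freeze XFS so the on-disk state is consistent
    try {
      run('snapshot'); // take the EBS snapshots
    } finally {
      run('thawFs'); // assumption: always unfreeze, even on failure
    }
  } finally {
    run('unlockDb'); // assumption: always release the db lock
  }
  return completed; // names of the steps that ran, in order
}
```

Nesting the two `try`/`finally` blocks keeps the release order the reverse of the acquire order: thaw the filesystem first, then unlock the database.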
13. Shrinking Mongo
● Staging and testing environments too costly.
● Logically the application sees no difference
between talking to a mongod and a mongos.
● Still single shard.
● Spinning instances is quick.
● Only used for smoke testing at the end of
every dev cycle.
● Moving to single master -> slave replication.
● Cost savings of 60% in these environments.
15. Log4js-syslog, Flume
● Centralized logging from all application
servers in the cluster.
● Configurable log levels at both the
application layer and filters on the stream
after that.
● Flume speaks syslog fluently
● Flume allows us to point the firehose
wherever we want.
● It's trivial to ingest the Flume output from S3
into Hadoop/Elastic MapReduce.
16. Daida, Beanstalkd
● Needed fast worker queue for push
messaging and out-of-band computation.
● Considered Redis and Resque
● Considered RabbitMQ/AMQP
● Beanstalkd was built for work queues.
● Beanstalkd is very simple.
● No real support for HA
● Workers needed to be written in JavaScript.
● No upfront knowledge about the runtime
activities of workers.
17. Daida, Beanstalkd cont.
● Developers define jobs (payload contains
variables needed for job to execute)
● Developers schedule jobs.
● Developers create "strategies" which know
how to execute the jobs.
● At runtime using some functional magic
Daida closes the developer defined strategy
around the payload variables that came with
the job.
● This is somewhat similar to the job being run
by a worker inside a container with a
18. Daida handler example.
var handlers = {
    bar: function(data, cb) {
        var callback = cb || function() { /* noOp */ }; // if callback wasn't passed
        console.log('test job passed data: ' + JSON.stringify(data));
        callback(); // always make sure to callback!!!!
    },
    foo: function(data, cb) {
        var callback = cb || function() { /* noOp */ };
        console.log('foo job passed name' + data.name);
        callback(); // again never forget to callback!!!
    }
};
exports.handlers = handlers;
exports.bar = handlers.bar;
exports.foo = handlers.foo;
//taken from https://github.com/ngmoco/daida.js
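The idea of closing a developer-defined strategy around a job's payload can be illustrated with a standalone sketch. This is not Daida's actual API: `bindStrategy`, the `strategies` map, and the `{ type, payload }` job shape are all hypothetical names chosen for illustration.

```javascript
// Not Daida's API: a standalone sketch of binding a strategy to a payload.
function bindStrategy(strategies, job) {
  const known = Object.prototype.hasOwnProperty.call(strategies, job.type);
  if (!known || typeof strategies[job.type] !== 'function') {
    throw new Error('no strategy for job type: ' + job.type);
  }
  const strategy = strategies[job.type];
  // The returned function closes over the payload, so the worker can run it
  // without any upfront knowledge of what the job does.
  return function run(done) {
    strategy(job.payload, done || function () { /* noOp */ });
  };
}

// Hypothetical strategy, shaped like the handlers shown earlier.
const strategies = {
  foo: function (data, cb) {
    cb(null, 'hello ' + data.name);
  }
};

const task = bindStrategy(strategies, { type: 'foo', payload: { name: 'ngmoco' } });
task(function (err, result) {
  console.log(result); // prints "hello ngmoco"
});
```

The worker only ever sees `task`, a function of one callback, which is what lets it stay generic across job types.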
19. Ejabberd
● Best multi-user-chat solution for the money.
● Considered IRC and other more custom
solutions.
● JavaScript handhelds can use JavaScript chat
client libraries!
● Capable of being run over plain HTTP.
(Comet/long-poll/BOSH)
● Widely used.
● Fine grained control over users and rooms.
● A little complex for our needs.
● Erlang/OTP is solid.
21. Megaphone load tester
● Written in Erlang/OTP to make use of its
lightweight processes and distributed nature.
● SSL-capable HTTP reverse proxy.
● Records sessions from handhelds.
● Proxy is transparent and handhelds are
stupid.
● Choose which sessions to replay.
● Write small scripts to manipulate req/resp
during replay. OAuth handshakes?
● Interact with replay in console.
● Record results of replay.
22. Megaphone load tester cont.
● Replay in bulk! (Load test).
● Centralized console can spawn http replay
processes on many headless machines.
Similar to headless JMeter.
● A single session (some number of individual
requests) is sent to the client process when
it is spawned.
● Responses are sent back to the centralized
databases as clients receive them.
● The same session can be sent to multiple
clients and played back concurrently.
24. Other notables
● Recently started using Python's Fabric library
for rolling releases.
● Node cluster for multiprocess Node.
● Node IPC with Linux signals to raise and lower
logging levels and trigger content updates.
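Signal-driven log levels might look something like the sketch below. The choice of signals and the level names are assumptions, not taken from the talk: SIGUSR2 steps through the levels (SIGUSR1 is reserved by Node.js for the debugger) and SIGHUP stands in for a content update. `reloadContent` is a hypothetical placeholder.

```javascript
// Assumed level names, ordered from least to most verbose.
const LEVELS = ['ERROR', 'WARN', 'INFO', 'DEBUG'];
let levelIndex = LEVELS.indexOf('INFO');

function currentLevel() {
  return LEVELS[levelIndex];
}

// Step to the next level, wrapping around, so a single signal can both
// raise and lower verbosity.
function cycleLevel() {
  levelIndex = (levelIndex + 1) % LEVELS.length;
  return currentLevel();
}

// Hypothetical stand-in for re-reading game content from disk or a store.
let contentVersion = 0;
function reloadContent() {
  contentVersion += 1;
  return contentVersion;
}

process.on('SIGUSR2', function () {
  console.log('log level now ' + cycleLevel());
});

process.on('SIGHUP', function () {
  console.log('content reloaded, version ' + reloadContent());
});
```

With this in place, `kill -USR2 <pid>` from a shell changes verbosity on a running process without a restart, and `kill -HUP <pid>` triggers the reload.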