Scaling High Traffic Web Applications
2. About Me
• Joined Achievers in June 2009
• Prior to Achievers, I was the CTO of ZipLocal
• I have spent the last 7 years worrying about
how to build scalable applications
• Academic Background:
– Ph.D. from the University of Toronto
– Naval Research Labs Post Doctoral Fellow of
Secure Systems at Cambridge University
3. Goals
• Tell you about our journey to a scalable
architecture
• Give you insight into common scaling
problems
• Give you a way to think about the issues of
scaling that you can apply today
5. What Does Achievers Do
• Achievers started in the rewards and recognition
space in 2007
• We provide reward and recognition software
– Points based system to reward performance
– Catalog to redeem the points
• Our mission is to “Change the way the world
works”
7. Our Traffic Growth
• From 2009 to today
– Visits up 903%
– Unique Visitors up 832%
• Last month we did 2.5 million page views
• During business hours we have about 250
people on the site at any given moment
8. Funding
• 3.3 million Series A from JLA Ventures
• 6.9 million Series B from Grandbanks
• 24 million Series C from Sequoia Capital
10. Definitions
• Performance
– Performance measures the speed at which a single
request can be executed
• Scalability
– Scalability is the ability to handle a growing
number of requests in a capable manner
Scalability != Performance
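A toy model can make the distinction concrete. This is only a sketch: the 200 ms request time, the server counts, and the assumption that work spreads perfectly across servers are all made up for illustration, not Achievers measurements.

```python
# Toy model: each server handles one request at a time, and every
# request takes the same time. All numbers are illustrative assumptions.

def latency_seconds(request_time: float) -> float:
    """Performance: how long a single request takes."""
    return request_time


def throughput_per_second(request_time: float, servers: int) -> float:
    """Scalability: how many requests per second the whole system can
    absorb, assuming work spreads perfectly across servers."""
    return servers / request_time


REQUEST_TIME = 0.2  # hypothetical 200 ms per request

for servers in (1, 2, 4):
    print(
        f"{servers} server(s): latency={latency_seconds(REQUEST_TIME):.1f}s, "
        f"throughput={throughput_per_second(REQUEST_TIME, servers):.0f} req/s"
    )
```

In this model adding servers never makes a single request faster; it only raises how many requests the system can take on at once.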
11. Which Language Scales the Best?
• Languages Don’t Scale, Architectures Do
• If you hear “language X doesn’t scale” then
turn around and walk away.
– That person doesn’t understand scalability
12. There is a bit more to Scalability
• Scalability is also about how you scale the
development team
• If you are successful and need to add people,
how easy is it for them to contribute?
• How fast can you write code?
– Your competitors are right behind you
– He who can develop good code fast wins!
14. The Achievers Platform
• Multi tenant architecture
– One code base
– One database
• Module based platform
– Hundreds of configuration options for each
module
– Lots of legacy configurations
15. Backend Processing
• We handle many millions of dollars of orders
every month
• We send out hundreds of thousands of emails
a month
17. The Stack
• Pretty Standard J2EE stack
• Hibernate
• Spring
• JMS
• MySql
• All running on Amazon EC2
18. Aside – Amazon EC2
• EC2 is great
• Spin up machines for testing then shut them
down
• A must for any startup
– Don’t manage your own servers when you are
small. It isn’t worth it
24. Scaling Was an Afterthought
• We had to scale vertically since the underlying
design did not consider what would happen if
we had 2 web servers
• We had the largest EC2 instance money could
buy
• You cannot retrofit scalability
– Your architecture and design either have it or they
don’t
25. Design Decisions
• Your basic approach and philosophy to a few
things will determine how hard it will be to
scale your infrastructure
27. Who doesn’t like magic
• Extensive use of Aspect Oriented
Programming (AOP)
– Allows you to define ‘pointcuts’ to insert code
before or after a function call
• As an academic, AOP is brilliant
• As a CTO, not so much
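The idea of inserting code before and after a call can be sketched with a Python decorator. This is an analogy only, not the Java AOP setup the original platform used; `around`, `award_points`, and `call_log` are hypothetical names invented for the example.

```python
import functools

call_log = []  # records what the advice did, so the effect is visible


def around(before, after):
    """Build a decorator that runs `before` ahead of the wrapped call
    and `after` once it has returned."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            before(func.__name__)
            result = func(*args, **kwargs)
            after(func.__name__)
            return result
        return wrapper
    return decorate


@around(
    before=lambda name: call_log.append(f"before {name}"),
    after=lambda name: call_log.append(f"after {name}"),
)
def award_points(balance, points):
    # The business logic shows no sign of the extra behaviour around it.
    return balance + points


print(award_points(100, 25))
print(call_log)
```

A decorator at least sits next to the function it changes. When the same wiring lives in a separate configuration file, nothing at the call site hints that extra code will run.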
28. There is a Pattern for That
• Use of design patterns for the sake of using a
design pattern
• Don’t get me wrong, every developer must
know and understand design patterns
• But it isn’t a competition to see who can use
the most design patterns in any given day
– The right tool for the right job
– Don’t force it!
29. Overly complex object model
• The Access Control model had so many
objects and relationships that other than the
original author no other person ever
understood it
30. Why is Complexity Bad?
• If the system dies at two o'clock in the
morning and I'm staring at your code, can I
easily figure out what's going on?
• People Forget about Magic
– Code needs to be in front of you not buried in an
XML file or magically invoked
31. What Does This Have To Do With
Scalability?
• Complex systems are really, really hard to
scale
– In a clustered environment you need to first figure
out if the problem is because of clustering or
because of your code
– This isn’t trivial even for simple systems
• Too many things to worry about
• When you hit a wall (and you will) it becomes
very hard to figure out what to do
32. Don’t Forget About the People
• As you grow your team you need to ramp
everybody up
• A complex system takes longer to learn than a
simple one
• Complexity ALWAYS increases over time. If
you start with something that is complex it
will quickly get beyond the scope of a mere
mortal
35. The Database
• ORMs make you stupid … kidding … sort of
• You need to understand your data
– Do not let an ORM define your database; you will be
sorry
• Generating reports out of an ORM is painful
• Developers must understand how a DB works
– You will forget about what a DB is good for if you
don’t consider it explicitly
– New developers usually do not understand the
importance of the DB in scaling
36. ORMs
• Can they scale?
– Sure
• Is it hard?
– Yup
• A quote from stackoverflow on scaling ORMs
– “… a good ORM will provide plenty of hooks that
allow you to optimize quite a bit. You just need to
spend some time learning it.”
37. Is that all?
• Initially ORMs might allow you to write code
quickly
– I would challenge this but that is another topic
• Your system runs into a brick wall. Customers
are complaining. Your CEO is chewing out the
CTO. The VP Engineering is curled up in a ball
in the corner. They turn to you as the
architect and you answer:
“We just need to learn how to use all the hooks”
38. Just Learn the ORM
• I have yet to meet somebody that could
convince me that they knew how to scale an
ORM
– It HAS been done, so yes it is possible but it takes
patience and a CEO that likes to wait
– I’ve had people tell me “we just have to rewrite
the ORM with a new ORM that could scale”
39. Know your database
• I believe that your DB should own all your
data
– Let it do what it is good at
• If that is true then simple replication
strategies and a little bit of coding can get you
reading data from a replica
• You can then start denormalizing the DB to get
better performance
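The "little bit of coding" for replica reads might look roughly like the sketch below. It is not the Achievers implementation: `FakeConnection` and `ReadWriteRouter` are hypothetical stand-ins, and treating every statement that starts with SELECT as a read is a deliberate simplification.

```python
import random


class FakeConnection:
    """Stand-in for a real database connection; it only records SQL."""
    def __init__(self, name):
        self.name = name
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)
        return self.name


class ReadWriteRouter:
    """Send SELECTs to a replica and everything else to the primary."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def execute(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            target = random.choice(self.replicas)
        else:
            target = self.primary
        return target.execute(sql)


primary = FakeConnection("primary")
replica = FakeConnection("replica")
db = ReadWriteRouter(primary, [replica])

print(db.execute("SELECT name FROM members WHERE id = 7"))
print(db.execute("UPDATE members SET points = points + 10 WHERE id = 7"))
```

With asynchronous replication a read issued right after a write can still see the old value, so reads that must be current would have to go to the primary.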
40. Scaling Your Data
• Scaling a DB is a well understood problem
with well understood solutions
• Don’t confuse this with easy!
42. Server Side Sessions
• Very developer friendly
• You have 2 choices to scale:
– Session replication
– Sticky Sessions
43. Session Replication
• Yuck!
• Lots of network chatter
• Slow propagation of the session means the
user has a bad experience
• You could be moving lots of data around
– Our sessions were huge
44. Sticky Sessions
• Works but you now need to worry about a
machine being overloaded while the others
are idle
• A machine failure logs out everybody from
that machine
• You have to be very careful when configuring
– If stickiness is based on IP address and everyone at a
customer shares one, you essentially have one company
per server
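The imbalance is easy to reproduce in a sketch. Everything here is assumed for illustration: the server names, the choice of a CRC32 hash of the client IP as the stickiness rule, and the traffic mix of one large office behind a single address.

```python
import zlib
from collections import Counter

SERVERS = ["web-1", "web-2", "web-3"]  # hypothetical server names


def pick_server(client_ip: str) -> str:
    """Sticky routing: the same IP always maps to the same server."""
    return SERVERS[zlib.crc32(client_ip.encode()) % len(SERVERS)]


# Hypothetical traffic: 200 employees of one customer share one office
# IP address, plus five home users with their own addresses.
requests = ["203.0.113.10"] * 200 + [f"198.51.100.{n}" for n in range(1, 6)]

load = Counter(pick_server(ip) for ip in requests)
print(load)
```

Whichever server the office address hashes to receives at least 200 of the 205 requests, while the others sit nearly idle.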
46. When to Cache
• Our platform made extensive use of caches
• That has to be good, right?
• Not in our case
– Items were cached by Java
– Shared state posed a problem when adding
another server
– Yes there are Java based solutions but all you are
doing is adding complexity
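The shared-state problem can be shown with two servers that each keep an in-process cache. This is a simplified sketch written in Python rather than Java; `WebServer`, the key name, and the point values are invented for the example.

```python
database = {"member:7:points": 100}  # stand-in for the shared database


class WebServer:
    """Each server keeps its own in-memory cache, as a JVM-local cache would."""
    def __init__(self):
        self.cache = {}

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = database[key]
        return self.cache[key]

    def write(self, key, value):
        database[key] = value
        self.cache[key] = value  # only this server's cache is updated


server_a = WebServer()
server_b = WebServer()

server_a.read("member:7:points")          # both servers warm their caches
server_b.read("member:7:points")
server_a.write("member:7:points", 150)    # the update lands on server A

print(server_a.read("member:7:points"))   # 150
print(server_b.read("member:7:points"))   # 100, stale
```

With one server the cache is always consistent with itself. The second server is what turns a harmless optimisation into a correctness problem.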
48. It Won’t Love You Back
• Never fall in love with your technology. It will
break your heart.
• You must always challenge your assumptions
and be prepared to throw away something
– Hard to throw away your ‘baby’
– Remember it is just a bunch of 1’s and 0’s
50. Basic Premise
• Every web application follows the same basic
flow:
1. User makes a request
2. Validate the request
3. Grab some data
4. Process it a bit
5. Build a Page for the user
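The five steps map onto a single request handler. A minimal sketch, with in-memory dictionaries standing in for the database and templates; `handle_request`, `PAGES`, and `MEMBERS` are hypothetical names.

```python
PAGES = {"/profile": "Profile for {name}: {points} points"}  # hypothetical templates
MEMBERS = {"7": {"name": "Sam", "points": 120}}  # stand-in for the database


def handle_request(path: str, params: dict) -> str:
    # 1. User makes a request: (path, params) arrive from the web server.
    # 2. Validate the request.
    if path not in PAGES or params.get("member_id") not in MEMBERS:
        return "404 Not Found"
    # 3. Grab some data.
    member = MEMBERS[params["member_id"]]
    # 4. Process it a bit.
    view = {"name": member["name"].upper(), "points": member["points"]}
    # 5. Build a page for the user.
    return PAGES[path].format(**view)


print(handle_request("/profile", {"member_id": "7"}))
print(handle_request("/profile", {"member_id": "99"}))
```

Nothing in the handler outlives the request, which is what makes it easy to run the same code on any number of servers.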
51. Guiding Architectural Principles
• Initial deployment would be on 3 machines
– Forcing us to understand how we are going to scale
upfront
• Servers must be stateless
• The database owns all the data
• Caching is an explicit choice to solve a real
problem
• Always use the right tool for the job
• Minimize complexity
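"Servers must be stateless" can be sketched as follows. The slides do not say where session data was kept, so the shared `session_store` dictionary is an assumption standing in for something like the database or Memcached; `StatelessServer` is a hypothetical name.

```python
import uuid

session_store = {}  # stand-in for a shared store such as the database or Memcached


class StatelessServer:
    """Holds no per-user state; everything lives in the shared store."""
    def __init__(self, name):
        self.name = name

    def login(self, user):
        token = uuid.uuid4().hex
        session_store[token] = {"user": user}
        return token

    def whoami(self, token):
        session = session_store.get(token)
        if session is None:
            return f"{self.name}: not logged in"
        return f"{self.name}: hello {session['user']}"


servers = [StatelessServer(f"web-{n}") for n in (1, 2, 3)]

token = servers[0].login("sam")   # log in through one server...
for server in servers:            # ...and every server recognises the session
    print(server.whoami(token))
```

Because any server can answer any request, the load balancer needs no stickiness and losing one machine logs nobody out.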
52. Other Goals
• Zero downtime deployments
• We wanted to be able to upgrade customers one
at a time
• Maximize developer productivity
53. The Target
[ Load Balancer ]
[ Web Server ] [ Web Server ] [ Web Server ]
[ MemcacheD Cluster ] [ NAS Device ] [ Background Processing ] [ MySql Master ] [ MySql Slave ]
54. The Language Choice
• Why PHP
– Faster code/debug cycles
• This has increased our productivity
– Zero downtime deployments
• We have patched running servers multiple times in a
day and nobody has noticed anything
– Shared nothing philosophy
• Forces a good frame of mind for server development
55. Doesn’t PHP Suck?
• Languages don’t suck, only the developers
using them do
• PHP isn’t perfect
– Google ‘why php sucks’ for an extensive list
• But PHP doesn’t scale
– Remember, languages don’t scale …
– If you don’t believe me ask
Wikipedia, Facebook, Digg etc.
56. Sure but PHP is Slow
• If your web application is not database bound
then you are probably doing it wrong
• Yes, Java might perform better at some things, but
that will not be a limiting factor
57. Surely There are Down Sides?
• Because PHP does not have strong typing you
need really good error detection and reporting
– We will do another talk on our struggles and
solutions
• Coding standards are a must since PHP lets
you pretty much do whatever you want
– Naming conventions are super important
– Don’t start a religious war over bracket placement.
There really is only one right way
58. The Framework
• We use CodeIgniter (CI)
• Simple MVC framework
– The code is very easy to follow
• Works out of the box, but is very extensible
– Strictly follows the Open/Closed principle
– We have extended CI a lot to meet our needs
• Doesn’t require learning anything but PHP
59. Using the Right Tool
• Have Apache (or a faster web server) serve all
static content
• A Network Attached Storage (NAS) device was
used for a shared file system.
– This makes life a TON easier
• Have your web servers serve requests
• Move background work to another server
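Moving background work off the web servers usually means the request only records what needs doing. A sketch under assumptions: the in-process `queue.Queue` stands in for a queue shared between machines, and `handle_order` and `run_worker` are hypothetical names.

```python
import queue

jobs = queue.Queue()  # stand-in for a real job queue shared between machines
sent = []


def handle_order(order_id: int) -> str:
    """Web request: record the work and return immediately."""
    jobs.put(("send_confirmation_email", order_id))
    return f"order {order_id} accepted"


def run_worker() -> None:
    """Background server: drain the queue outside the request path."""
    while not jobs.empty():
        task, order_id = jobs.get()
        sent.append(f"{task}:{order_id}")
        jobs.task_done()


print(handle_order(1001))
print(jobs.qsize())   # work is pending, but the user already has a response
run_worker()
print(sent)
```

The web server's response time no longer depends on how long the email or order processing takes.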
60. The Problem
• We had about 120 customers and we couldn’t
just go away to do what we needed to do
– Not a bad problem to have
62. Step 1
• We wrote a controller that would forward
requests to the new code base
• GET requests could be easily forwarded
• POST requests were a bit more complicated
• This step allowed us to start developing the
new platform AND keep releasing features
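A forwarding step of this kind might be sketched as below. The slides do not explain why POST was harder; the sketch assumes one plausible reason, that the request body has to be carried along with the URL. `NEW_PLATFORM` and `build_forward` are hypothetical, and the requests are built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

NEW_PLATFORM = "http://new-platform.internal"  # hypothetical address


def build_forward(method, path, query=None, form=None):
    """Build (but do not send) the request to pass on to the new code base."""
    url = NEW_PLATFORM + path
    if query:
        url += "?" + urlencode(query)
    if method == "GET":
        # A GET is fully described by its URL, so forwarding is just a re-target.
        return Request(url, method="GET")
    # A POST also has a body that must be re-encoded and carried along.
    body = urlencode(form or {}).encode()
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


get_req = build_forward("GET", "/catalog", query={"page": 2})
post_req = build_forward("POST", "/redeem", form={"item": "42"})
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```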
63. Step 2
• Start migrating customers to the new platform
• We put a proxy server in front of our old and
new platforms.
• We then proxied specific requests to the
version they were running on
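The routing decision the proxy makes can be sketched as a lookup table. This is Python rather than actual proxy configuration, and it assumes customers are identified by subdomain, which the slides do not state; the customer names and backend addresses are invented.

```python
# Hypothetical routing table: which platform each customer currently runs on.
CUSTOMER_PLATFORM = {
    "acme": "new",
    "globex": "old",
}

BACKENDS = {
    "old": "http://old-platform.internal",  # hypothetical addresses
    "new": "http://new-platform.internal",
}


def backend_for(host: str) -> str:
    """Pick a backend from the customer subdomain of the Host header."""
    customer = host.split(".")[0].lower()
    # Customers not yet migrated (or unknown) stay on the old platform.
    return BACKENDS[CUSTOMER_PLATFORM.get(customer, "old")]


def migrate(customer: str) -> None:
    """Moving a customer is a one-line routing change."""
    CUSTOMER_PLATFORM[customer] = "new"


print(backend_for("acme.example.com"))
print(backend_for("globex.example.com"))
migrate("globex")
print(backend_for("globex.example.com"))
```

Keeping the decision in one table is what allows customers to move one at a time, and to move back if something goes wrong.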
64. The Setup
[ HAProxy ]
[ Express Platform ] [ Achievers Platform ]
[ MySql ]
65. HAProxy
• If you don’t have it installed, go back to the
office, download it, and install it!
• It isn’t just a load balancer
– We can move specific traffic to specific machines
for whatever reason
– We have a machine with profiling capabilities that
we have used to profile production problems
– Fine-grained control over your requests
66. We did it!
• It took us almost 6 months to migrate every
customer but we did get there
• Our productivity has improved
• And we have an architecture that we know
can handle whatever we can throw at it
– At least in the short term
68. Scaling is Hard
• Don’t make it harder on yourself
– Reduce complexity
– Understand your database
– Have an upfront strategy to deal with state
• We picked stateless but you don’t have to
69. Never let anybody tell you a
language or framework does or
doesn’t scale.
It is all in the details