Scaling High Traffic Web Applications

About Me
• Joined Achievers in June 2009
• Prior to Achievers, I was the CTO of ZipLocal
• I have spent the last 7 years worrying about
how to build scalable applications
• Academic Background:
– Ph.D. from the University of Toronto
– Naval Research Labs Post Doctoral Fellow of
Secure Systems at Cambridge University

Goals
• Tell you about our journey to a scalable
architecture
• Give you insight into common scaling
problems
• Give you a way to think about the issues of
scaling that you can apply today

What Does Achievers Do
• Achievers started in the rewards and recognition
space in 2007
• We provide reward and recognition software
– Points based system to reward performance
– Catalog to redeem the points
• Our mission is to “Change the way the world
works”

Our Traffic Growth
• From 2009 to today
– Visits up 903%
– Unique Visitors up 832%
• Last month we did 2.5 million page views
• During business hours we have about 250
people on the site at any given moment

Funding
• 3.3 million Series A from JLA Ventures
• 6.9 million Series B from Grandbanks
• 24 million Series C from Sequoia Capital

Definitions
• Performance
– Performance measures the speed at which a single
request can be executed
• Scalability
– Scalability is the ability to handle a growing
number of requests in a capable manner

Scalability != Performance
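
One rough way to see the difference, as a sketch with made-up numbers (the 0.2 second latency, the 10 workers per server, and the perfectly linear scaling are all assumptions, not Achievers figures): adding servers raises how many requests the cluster can handle, but a single request takes just as long as before.

```python
# Illustrative numbers only; none of these come from the talk.
LATENCY_SECONDS = 0.2      # time to execute one request (performance)
WORKERS_PER_SERVER = 10    # requests one server can process at once


def max_throughput(servers: int) -> float:
    """Requests per second the whole cluster can sustain (scalability).

    Assumes ideal, linear scaling with no shared bottleneck.
    """
    return servers * WORKERS_PER_SERVER / LATENCY_SECONDS


def request_latency(servers: int) -> float:
    """Time for one request; adding servers does not change it."""
    return LATENCY_SECONDS


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "servers:", max_throughput(n), "req/s,",
              request_latency(n), "s per request")
```

Real systems rarely scale this cleanly, which is where the rest of the talk comes in.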

Which Language Scales the Best?
• Languages don’t scale, architectures do
• If you hear “language X doesn’t scale” then
turn around and walk away.
– That person doesn’t understand scalability

There is a bit more to Scalability
• Scalability is also about how you scale the
development team
• If you are successful and need to add people,
how easy is it for them to contribute?
• How fast can you write code?
– Your competitors are right behind you
– He who can develop good code fast wins!

The Achievers Platform
• Multi tenant architecture
– One code base
– One database
• Module based platform
– Hundreds of configuration options for each
module
– Lots of legacy configurations

Backend Processing
• We handle many millions of dollars of orders
every month
• We send out hundreds of thousands of emails
a month

The Stack
• Pretty Standard J2EE stack
• Hibernate
• Spring
• JMS
• MySql
• All running on Amazon EC2

Aside – Amazon EC2
• EC2 is great
• Spin up machines for testing then shut them
down
• A must for any startup
– Don’t manage your own servers when you are
small. It isn’t worth it

Architecture
[Diagram: Presentation and Business Logic layers. Components shown: HTML, JSP Pages, Servlet, Hibernate Objects, MySql]

LOOKS GREAT SO WHAT'S THE
PROBLEM?

Architecture – Data Center View

Server 1

But J2EE Scales
• Sure it does BUT
• The devil is in the details

Scaling Was an Afterthought
• We had to scale vertically since the underlying
design did not consider what would happen if
we had 2 web servers
• We had the largest EC2 instance money could
buy
• You cannot retrofit scalability
– Your architecture and design either have it or they
don’t

Design Decisions
• Your basic approach and philosophy to a few
things will determine how hard it will be to
scale your infrastructure

Who doesn’t like magic
• Extensive use of Aspect Oriented
Programming (AOP)
– Allows you to define ‘pointcuts’ to insert code
before or after a function call
• As an academic AOP is brilliant
• As a CTO not so much
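
To make the "magic" concrete, here is a minimal sketch that uses a Python decorator as a stand-in for before/after advice. It is not the Achievers code or Spring AOP; the names audited, audit_log and award_points are hypothetical.

```python
import functools

audit_log = []  # hypothetical side effect produced by the "aspect"


def audited(func):
    """Wrap func so extra code runs before and after every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        audit_log.append(f"before {func.__name__}")
        result = func(*args, **kwargs)
        audit_log.append(f"after {func.__name__}")
        return result
    return wrapper


@audited
def award_points(balance, points):
    # Reading this body alone gives no hint that audit_log is touched.
    return balance + points
```

Here the decorator is at least visible right above the function. When the same wiring lives in a separate configuration file, the person debugging award_points has nothing in front of them that says extra code runs at all.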

There is a Pattern for That
• Use of design patterns for the sake of using a
design pattern
• Don’t get me wrong, every developer must
know and understand design patterns
• But it isn’t a competition to see who can use
the most design patterns in any given day
– The right tool for the right job
– Don’t force it!

Overly complex object model
• The Access Control model had so many
objects and relationships that nobody other
than the original author ever understood it

Why is Complexity Bad?
• If the system dies at two o'clock in the
morning and I'm staring at your code, can I
easily figure out what's going on?
• People Forget about Magic
– Code needs to be in front of you not buried in an
XML file or magically invoked

What Does This Have To Do With
Scalability?
• Complex systems are really, really hard to
scale
– In a clustered environment you need to first figure
out if the problem is because of clustering or
because of your code
– This isn’t trivial even for simple systems
• Too many things to worry about
• When you hit a wall (and you will) it becomes
very hard to figure out what to do

Don’t Forget About the People
• As you grow your team you need to ramp
everybody up
• A complex system takes longer to learn than a
simple one
• Complexity ALWAYS increases over time. If
you start with something that is complex, it
will quickly get beyond the scope of a mere
mortal

Desire for Complex Solutions
[Chart with axes labeled Complexity and Experience]

The Database
• ORMs make you stupid … kidding … sort of
• You need to understand your data
– Do not let an ORM define your database; you will be
sorry
• Generating reports out of an ORM is painful
• Developers must understand how a DB works
– You will forget about what a DB is good for if you
don’t consider it explicitly
– New developers usually do not understand the
importance of the DB in scaling

ORMs
• Can they scale?
– Sure
• Is it hard?
– Yup
• A quote from Stack Overflow on scaling ORMs
– “… a good ORM will provide plenty of hooks that
allow you to optimize quite a bit. You just need to
spend some time learning it.”

Is that all?
• Initially ORMs might allow you to write code
quickly
– I would challenge this but that is another topic
• Your system runs into a brick wall. Customers
are complaining. Your CEO is chewing out the
CTO. The VP Engineering is curled up in a ball
in the corner. They turn to you as the
architect and you answer:
“We just need to learn how to use all the hooks”

Just Learn the ORM
• I have yet to meet somebody that could
convince me that they knew how to scale an
ORM
– It HAS been done, so yes it is possible but it takes
patience and a CEO that likes to wait
– I’ve had people tell me “we just have to rewrite
the ORM with a new ORM that could scale”

Know your database
• I believe that your DB should own all your
data
– Let it do what it is good at
• If that is true then simple replication
strategies and a little bit of coding can get you
reading data from a replica
• You can then start denormalizing the DB to get
better performance
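
The "little bit of coding" could look something like this sketch, which picks a database host per statement. The host names and the needs_fresh_data flag are assumptions for illustration; the flag exists because a replica can lag behind the primary.

```python
PRIMARY = "db-primary"  # hypothetical host names
REPLICA = "db-replica"


def host_for(sql: str, needs_fresh_data: bool = False) -> str:
    """Route plain SELECTs to the replica and everything else to the primary.

    A read that must see a write made a moment ago should pass
    needs_fresh_data=True, since the replica may not have caught up yet.
    """
    words = sql.split()
    first_word = words[0].upper() if words else ""
    if first_word == "SELECT" and not needs_fresh_data:
        return REPLICA
    return PRIMARY


if __name__ == "__main__":
    print(host_for("SELECT balance FROM points WHERE user_id = 1"))
    print(host_for("UPDATE points SET balance = 10 WHERE user_id = 1"))
```

This only works cleanly when the database is the single owner of the data; if application servers also hold their own copies, routing reads does not tell you which copy is right.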

Scaling Your Data
• Scaling a DB is a well understood problem
with well understood solutions
• Don’t confuse this with easy!

Server Side Sessions
• Very developer friendly
• You have 2 choices to scale:
– Session replication
– Sticky Sessions

Session Replication
• Yuck!
• Lots of network chatter
• Slow propagation of the session means the
user has a bad experience
• You could be moving lots of data around
– Our sessions were huge

Sticky Sessions
• Works but you now need to worry about a
machine being overloaded while the others
are idle
• A machine failure logs out everybody from
that machine
• You have to be very careful when configuring
– If all IP addresses go to one server then you
essentially have one company per server
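
A small sketch of how that can happen, assuming stickiness is decided by hashing the client IP address (one common approach, not necessarily what Achievers used). The server names and the IP address are made up.

```python
import hashlib
from collections import Counter

SERVERS = ["web1", "web2", "web3"]  # hypothetical server names


def server_for(client_ip: str) -> str:
    """Pin a client to a server by hashing its IP address."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


# 50 employees behind one corporate gateway share a single public IP.
office_requests = ["203.0.113.7"] * 50
load = Counter(server_for(ip) for ip in office_requests)

if __name__ == "__main__":
    print(load)  # every request from that office lands on the same server
```

For a product used by whole companies at once, that means one busy customer can saturate a single machine while the others sit idle.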

When to Cache
• Our platform made extensive use of caches
• That has to be good right?
• Not in our case
– Items were cached by Java
– Shared state posed a problem when adding
another server
– Yes there are Java based solutions but all you are
doing is adding complexity
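
The shared state problem can be reproduced in a few lines. This is a toy sketch, not the Achievers caching code: the WebServer class, the key name and the dictionary standing in for the database are all hypothetical.

```python
database = {"catalog_price": 100}  # stand-in for the shared database


class WebServer:
    def __init__(self):
        self.cache = {}  # lives in this server's memory only

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = database[key]
        return self.cache[key]

    def write(self, key, value):
        database[key] = value
        self.cache[key] = value  # only this server's cache is updated


server_a, server_b = WebServer(), WebServer()
server_a.read("catalog_price")
server_b.read("catalog_price")        # both servers now hold a cached 100
server_a.write("catalog_price", 120)
stale_value = server_b.read("catalog_price")  # server_b never saw the write
```

With one server this cache is invisible and harmless. With two, users see different answers depending on which machine they hit.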

It Won’t Love You Back
• Never fall in love with your technology. It will
break your heart.
• You must always challenge your assumptions
and be prepared to throw away something
– Hard to throw away your ‘baby’
– Remember it is just a bunch of 1’s and 0’s

Basic Premise
• Every web application follows the same basic
flow:
1. User makes a request
2. Validate the request
3. Grab some data
4. Process it a bit
5. Build a Page for the user
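
The five steps map onto a very small handler. This is only a sketch: the request is a plain dictionary, the "database" is a dictionary of point awards per user, and every name is hypothetical.

```python
def handle_request(request, database):
    """Follow the five-step flow for a 'show my points' page."""
    # 1. The user makes a request (passed in here as a dict).
    user_id = request.get("user_id")

    # 2. Validate the request.
    if user_id not in database:
        return "<h1>Unknown user</h1>"

    # 3. Grab some data.
    awards = database[user_id]

    # 4. Process it a bit.
    total = sum(awards)

    # 5. Build a page for the user.
    return f"<h1>User {user_id} has {total} points</h1>"


if __name__ == "__main__":
    print(handle_request({"user_id": 1}, {1: [10, 20]}))
```

Nothing in the handler survives between calls, so any server could run it for any request.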

Guiding Architectural Principles
• Initial deployment would be on 3 machines
– Forcing us to understand how we are going to scale
upfront
• Servers must be stateless
• The database owns all the data
• Caching is an explicit choice to solve a real
problem
• Always use the right tool for the job
• Minimize complexity
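
As an illustration of "servers must be stateless", here is a sketch in which session data lives in a shared store rather than on the web server. The dictionary stands in for whatever shared store you choose, and the class and method names are hypothetical.

```python
import secrets

shared_sessions = {}  # stand-in for a shared store reachable by every server


class StatelessServer:
    def __init__(self, name):
        self.name = name  # no per-user data is kept on the instance

    def login(self, user):
        """Create a session in the shared store and return its token."""
        token = secrets.token_hex(16)
        shared_sessions[token] = {"user": user}
        return token

    def whoami(self, token):
        """Look the session up in the shared store, not in local memory."""
        session = shared_sessions.get(token)
        return session["user"] if session else None


if __name__ == "__main__":
    first, second = StatelessServer("web1"), StatelessServer("web2")
    token = first.login("alice")
    print(second.whoami(token))  # a different server can serve the next request
```

Because no server holds anything the others lack, the load balancer can send each request anywhere, and losing a machine logs nobody out.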

Other Goals
• Zero downtime deployments
• We wanted to be able to upgrade customers one
at a time
• Maximize developer productivity

The Target
[Diagram: a Load Balancer in front of three Web Servers, backed by a MemcacheD Cluster, a NAS Device, Background Processing, a MySql Master and a MySql Slave]

The Language Choice
• Why PHP
– Faster code/debug cycles
• This has increased our productivity
– Zero downtime deployments
• We have patched running servers multiple times in a
day and nobody has noticed anything
– Shared nothing philosophy
• Forces a good frame of mind for server development

Doesn’t PHP Suck?
• Languages don’t suck, only the developers
using them do
• PHP isn’t perfect
– Google ‘why php sucks’ for an extensive list
• But PHP doesn’t scale
– Remember, languages don’t scale …
– If you don’t believe me ask
Wikipedia, Facebook, Digg etc.

Sure but PHP is Slow
• If your web application is not database bound
then you are probably doing it wrong
• Yes, Java might perform better at some things, but
that will not be the limiting factor

Surely There are Down Sides?
• Because PHP does not have strong typing you
need really good error detection and reporting
– We will do another talk on our struggles and
solutions
• Coding standards are a must since PHP lets
you pretty much do whatever you want
– Naming conventions are super important
– Don’t start a religious war over bracket placement.
There really is only one right way :)

The Framework
• We use CodeIgniter (CI)
• Simple MVC framework
– The code is very easy to follow
• Works out of the box, but is very extensible
– Strictly follows the Open/Closed principle
– We have extended CI a lot to meet our needs
• Doesn’t require learning anything but PHP

Using the Right Tool
• Have Apache (or a faster web server) serve all
static content
• A Network Attached Storage (NAS) device was
used for a shared file system.
– This makes life a TON easier
• Have your web servers serve requests
• Move background work to another server

The Problem
• We had about 120 customers and we couldn’t
just go away to do what we needed to do
– Not a bad problem to have

Step 1
• We wrote a controller that would forward
requests to the new code base
• GET requests could be easily forwarded
• POST requests were a bit more complicated
• This step allowed us to start developing the
new platform AND keep releasing features
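
A forwarding controller along these lines might be sketched as follows. The talk does not say how forwarding was implemented, so the migrated paths, the base URL and the three actions are assumptions. The POST branch shows one plausible reason POSTs are harder: a plain 302 redirect is usually followed as a GET, which drops the request body.

```python
MIGRATED_PATHS = {"/catalog", "/profile"}   # hypothetical migrated pages
NEW_BASE = "http://new-platform.internal"   # hypothetical address


def route(method, path, body=None):
    """Decide what the old platform should do with an incoming request."""
    if path not in MIGRATED_PATHS:
        # Not migrated yet: the old code base keeps handling it.
        return ("handle_locally", path, body)
    if method == "GET":
        # Everything a GET needs is in the URL, so a redirect is enough.
        return ("redirect", NEW_BASE + path, None)
    # The body has to be re-sent to the new code base by the controller.
    return ("replay", NEW_BASE + path, body)


if __name__ == "__main__":
    print(route("GET", "/catalog"))
    print(route("POST", "/profile", body="name=alice"))
    print(route("GET", "/orders"))
```

Pages can then move to the new code base one at a time by adding them to the migrated set.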

Step 2
• Start migrating customers to the new platform
• We put a proxy server in front of our old and
new platforms.
• We then proxied specific requests to the
version they were running on
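
The routing decision the proxy makes can be sketched as a lookup table. This is not HAProxy configuration, just the logic; it assumes each customer is identified by the first label of the host name, and the customer names and backend addresses are made up.

```python
# Which platform each customer currently runs on (hypothetical data).
PLATFORM_BY_CUSTOMER = {"acme": "new", "globex": "old"}
BACKENDS = {"old": "old-platform:8080", "new": "new-platform:8080"}


def backend_for(host_header: str) -> str:
    """Pick the backend from the customer's subdomain.

    Customers that are not listed stay on the old platform by default.
    """
    customer = host_header.split(".", 1)[0].lower()
    version = PLATFORM_BY_CUSTOMER.get(customer, "old")
    return BACKENDS[version]


if __name__ == "__main__":
    print(backend_for("acme.example.com"))
    print(backend_for("globex.example.com"))
```

Migrating a customer is then a one-line change to the table, and rolling back is the same line reverted.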

The Setup

[Diagram: HAProxy in front of the Express Platform and the Achievers Platform, both connected to MySql]

HAProxy
• If you don’t have it installed, go back to the
office, download it, and install it!
• It isn’t just a load balancer
– We can move specific traffic to specific machines
for whatever reason
– We have a machine with profiling capabilities that
we have used to profile production problems
– Fine-grained control over your requests

We did it!
• It took us almost 6 months to migrate every
customer but we did get there
• Our productivity has improved
• And we have an architecture that we know
can handle whatever we can throw at it
– At least in the short term

Scaling is Hard
• Don’t make it harder on yourself
– Reduce complexity
– Understand your database
– Have an upfront strategy to deal with state
• We picked stateless but you don’t have to

Never let anybody tell you a
language or framework does or
doesn’t scale.

It is all in the details
