This document provides an overview of best practices for scaling infrastructure on AWS from 1 user to 10 million users. It discusses starting with a single EC2 instance, then expanding horizontally by adding more instances and vertically by increasing instance sizes. As users grow from 1,000 to 500,000, the document recommends separating databases from web servers, using read replicas, caching with ElastiCache, and auto scaling. From 500,000 to 1 million users, it suggests moving to a service-oriented architecture and leveraging other AWS services. Scaling from 5 to 10 million users may require database sharding or moving some functions to NoSQL databases.
2. • ME: Craig S. Dickson – Solutions Architect – Amazon
Web Services – @craigsdickson
• YOU: Here to learn more about scaling infrastructure on
AWS
• TODAY: About best practices and things to think about
when building for large scale
10. Day One, User One
• A single EC2 Instance
– With full stack on this host
• Web app
• Database
• Management
• Etc.
• A single Elastic IP
• Route53 for DNS
[Diagram: User → Amazon Route 53 → Elastic IP → EC2 Instance]
11. “We’re gonna need a bigger box”
• Simplest approach
• Can now leverage Provisioned IOPS (PIOPS)
• High I/O instances
• High memory instances
• High CPU instances
• High storage instances
• Easy to change instance sizes
• Will hit a limit eventually (the largest instance size)
[Diagram: example instance types m1.small, m2.4xlarge, hi1.4xlarge]
13. Day One, User One
• We could potentially get
to a few hundred to a few
thousand depending on
application complexity
and traffic
• No failover
• No redundancy
• Too many eggs in one
basket
[Diagram: User → Amazon Route 53 → Elastic IP → EC2 Instance]
15. Day Two, Users > 1
First let’s separate out
our single host into
more than one.
• Web
• Database
– Make use of a database
service?
[Diagram: User → Amazon Route 53 → Elastic IP → Web Instance → Database Instance]
16. Database Options
Self-managed
• Database Server on Amazon EC2
– Your choice of database running on Amazon EC2
– Bring Your Own License (BYOL)
Fully managed
• Amazon DynamoDB
– Managed NoSQL database service using SSD storage
– Seamless scalability
– Zero administration
• Amazon RDS
– Microsoft SQL, Oracle, Postgres or MySQL as a managed service
– Flexible licensing: BYOL or license included
• Amazon Redshift
– Massively parallel, petabyte-scale data warehouse service
– Fast, powerful and easy to scale
17. But how do I choose
what DB technology I
need? SQL? NOSQL?
21. Why start with SQL?
• Established and well-worn technology
• Lots of existing code, communities, books, background,
tools, etc.
• You aren’t going to break SQL DBs in your first 10 million
users, but you might break parts of them (hence a blended
approach)
• Clear patterns to scalability
22. If your usage is such that you will be
generating several TB ( >5 ) of data
in the first year OR have an
incredibly data intensive workload
you might need NoSQL
23. Why might you need NOSQL?
• Super low latency applications
• Metadata driven datasets
• Highly non-relational data
• Need schema-less data constructs*
• Massive amounts of data (again, in the TB range)
• Rapid ingest of data (thousands of records/sec)
*Need != “it’s easier to do dev without schemas”
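The rule of thumb from the last few slides can be written down as a small check. This is only a sketch: the `Workload` fields and the `suggest_datastore` name are hypothetical, and the 1,000 records/sec cut-off is an assumed reading of "thousands of records/sec". The slides give guidance, not hard thresholds.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an application's data needs."""
    first_year_data_tb: float            # expected data volume in year one
    ingest_records_per_sec: int = 0      # sustained ingest rate
    needs_very_low_latency: bool = False
    needs_schemaless_data: bool = False  # a real need, not a dev convenience


def suggest_datastore(w: Workload) -> str:
    """Return "NoSQL" if any trigger from the slides applies, else "SQL"."""
    triggers = [
        w.first_year_data_tb > 5,          # "several TB (>5)" in the first year
        w.ingest_records_per_sec >= 1000,  # assumed cut-off for "thousands/sec"
        w.needs_very_low_latency,
        w.needs_schemaless_data,
    ]
    return "NoSQL" if any(triggers) else "SQL"
```

For most applications none of the triggers apply, which is why the deck's default is to start with SQL.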
25. Users > 100
First let’s separate out
our single host into
more than one
• Web
• Database
– Use RDS to make your life
easier
[Diagram: User → Amazon Route 53 → Elastic IP → Web Instance → RDS DB Instance]
26. Users > 1000
Next let’s address our
lack of failover and
redundancy issues
• Elastic Load Balancing
• Another web instance
– In another Availability Zone
• Enable Amazon RDS multi-AZ
[Diagram: User → Amazon Route 53 → Elastic Load Balancing → one Web Instance in each of two Availability Zones; RDS DB Instance Active (Multi-AZ) in one zone, RDS DB Instance Standby (Multi-AZ) in the other]
28. Users > 10ks – 100ks
[Diagram: User → Amazon Route 53 → Elastic Load Balancing → eight Web Instances across two Availability Zones; RDS DB Instance Active (Multi-AZ) and Standby (Multi-AZ), plus four RDS DB Instance Read Replicas]
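Read replicas only help if the application sends its reads to them. A minimal sketch of that routing follows. The `ReplicaRouter` class and the endpoint names are hypothetical, and the check for `SELECT` is a deliberate simplification. Because RDS read replicas replicate asynchronously, reads routed this way may be slightly stale.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._reads = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql: str) -> str:
        """Pick an endpoint based on whether the statement only reads."""
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._reads) if is_read else self.primary


# Hypothetical endpoint names, for illustration only.
router = ReplicaRouter(
    "primary.db.example",
    ["replica-1.db.example", "replica-2.db.example"],
)
```

Round-robin is the simplest policy. Reads that must see a just-committed write would still need to go to the primary.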
30. Shift Some Load Around
Let’s lighten the load on our
web and database instances:
• Move static content from the
web Instance to Amazon S3
and CloudFront
• Move dynamic content from
the Elastic Load Balancing to
CloudFront
• Move session/state and DB
caching to ElastiCache or
Amazon DynamoDB
[Diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancing, Web Instance, ElastiCache, Amazon DynamoDB, and RDS DB Instance Active (Multi-AZ) in one Availability Zone]
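Moving DB caching out of the database is commonly done with a cache-aside read path. The sketch below uses an in-memory dict as a stand-in for ElastiCache. The `CacheAside` class, the `load_from_db` callable, and the 60-second TTL are all assumptions made for illustration.

```python
import time


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self, load_from_db, ttl_seconds: float = 60.0):
        self._load = load_from_db   # hypothetical DB lookup callable
        self._ttl = ttl_seconds
        self._store = {}            # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]         # cache hit: no database query
        value = self._load(key)     # cache miss: query the database
        self._store[key] = (time.monotonic() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key after a write so the next read reloads it."""
        self._store.pop(key, None)
```

Repeated reads of the same key hit the database once per TTL window, which is where the load reduction on the database tier comes from.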
33. Now that our Web tier is
much more lightweight, we
can revisit the beginning of
our talk…
42. Users > 500k+
[Diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancing, and DynamoDB; two Availability Zones, each with an Auto Scaling Group of three Web Instances, ElastiCache, and an RDS DB Instance Read Replica; RDS DB Instance Active (Multi-AZ) in one zone and RDS DB Instance Standby (Multi-AZ) in the other]
47. AWS Marketplace & Partners Can Help
• Customers can find, research, and
buy software
• Simple pricing aligns with EC2
usage model
• Launch in minutes
• Marketplace billing integrated
into your AWS account
• 1,000+ products across 20+
categories
Learn more at: aws.amazon.com/marketplace
48. Spend Your Time Wisely
Managing your infrastructure will take up an
increasing share of your time. Use tools to
automate repetitive tasks.
• Tools to manage AWS resources
• Tools to manage the software on and
configuration of your instances
• Automated data analysis of logs and user actions
49. AWS Application Management Solutions
Elastic Beanstalk → AWS OpsWorks → AWS CloudFormation → EC2
Convenience (higher-level services) → Control (do it yourself)
50. Host-based Configuration Management
• Popular offerings
– Opscode Chef
– PuppetLabs Puppet
– Ansible
• Do similar things in slightly different ways
• Work well with tools from the previous slide
• Require some learning time
• Can’t scale easily without this kind of capability
51. From 500k to 1 Million Users
• Getting serious now
• Significant user base
• Plenty of attention if things go wrong
• Interesting phase for startups with funding
rounds
52. Time to make some
radical improvements at
the web & app layers
54. SOAing
Move services into their own tiers
or modules. Treat each of these
as 100% separate pieces of your
infrastructure and scale them
independently.
Amazon.com and AWS do this
extensively! It offers flexibility and
greater understanding of each
component.
55. Loose Coupling Sets You Free!
• The looser they're coupled, the bigger they scale
– Use independent components
– Design everything as a black box
– Decouple interactions
– Favor services with built-in redundancy and scalability rather than
building your own
[Diagram: Tight coupling: Controller A calls Controller B directly. Loose coupling: Controller A and Controller B exchange work through queues (Q); use Amazon SQS as buffers]
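The queue-as-buffer idea can be sketched in a few lines. Here Python's in-process `queue.Queue` stands in for Amazon SQS, and `controller_a` / `controller_b` are hypothetical names taken from the diagram labels. A real deployment would use the SQS API and separate processes or hosts.

```python
import queue
import threading

buffer = queue.Queue()   # stand-in for an Amazon SQS queue
results = []


def controller_a(jobs):
    """Producer: enqueue work and return without waiting on controller B."""
    for job in jobs:
        buffer.put(job)


def controller_b():
    """Consumer: drain the queue at its own pace."""
    while True:
        job = buffer.get()
        if job is None:              # sentinel used here to stop the worker
            buffer.task_done()
            break
        results.append(job.upper())  # placeholder for real processing
        buffer.task_done()


worker = threading.Thread(target=controller_b)
worker.start()
controller_a(["resize", "email", "index"])
buffer.put(None)   # tell the worker to stop after the queued jobs
buffer.join()      # wait until every queued item has been processed
worker.join()
```

Because the producer only talks to the queue, the consumer can be slow, restarted, or scaled out without the producer noticing.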
56. Loose Coupling + SOA = Winning
Examples:
• Email
• Queuing
• Transcoding
• Search
• Databases
• Monitoring
• Metrics
• Logging
AWS services shown: Amazon SES, Amazon SQS, Amazon SNS, Amazon Elastic Transcoder, Amazon CloudSearch, Amazon SWF
Don’t reinvent the wheel: in the early days, if someone has a service for it already,
use that instead of building it yourself
58. [Diagram: image-processing pipeline. User uploads to an Amazon S3 bucket for ingest; an Amazon SNS topic feeds three SQS queues (size for thumbnail, size image for mobile, size image for web), each with its own Auto Scaling group of instances; an Amazon S3 bucket (RRS) serves content to a CloudFront download distribution; a separate Amazon S3 bucket holds originals]
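The SNS-topic-to-several-SQS-queues arrangement in this diagram is a fan-out: one upload notification becomes one message per output size. The sketch below models it with plain Python objects. The `Topic` class, the queue names, and the message fields are stand-ins, not the SNS or SQS APIs.

```python
from collections import deque


class Topic:
    """Stand-in for an SNS topic: copies each message to every subscriber."""

    def __init__(self):
        self._queues = []

    def subscribe(self, q: deque) -> None:
        self._queues.append(q)

    def publish(self, message: dict) -> None:
        for q in self._queues:
            q.append(dict(message))   # each queue gets its own copy


# One queue per output size, as in the diagram.
queues = {"thumbnail": deque(), "mobile": deque(), "web": deque()}
uploads = Topic()
for q in queues.values():
    uploads.subscribe(q)

# Hypothetical notification for a newly ingested image.
uploads.publish({"bucket": "ingest", "key": "photos/cat.jpg"})
```

Each queue can then be drained by its own group of workers, so a backlog in one size does not hold up the others.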
59. Amazon Simple Workflow Service (SWF)
• Provides an orchestration tool across your infrastructure
• Can act as a middle layer to pass messages and setup tasks
• Lets you break down individual tasks into different workers
• Lets you define logic between workers
• Lets you make a worker task from anything that can be scripted
• Includes built-in retries, timeouts, logging
• Features built-in reliability, scalability, and low cost
Your code = Deciders & Workers
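To make the decider/worker split concrete, here is a plain-Python sketch. It does not use the SWF API. `run_workflow` is a hypothetical decider that runs worker functions in order, with a simple retry limit and an execution history standing in for the built-in retries and logging.

```python
def run_workflow(steps, payload, max_attempts=3):
    """Decider: run each worker in order, retrying failures up to a limit."""
    history = []                          # simple execution log
    for name, worker in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                payload = worker(payload)
                history.append((name, attempt, "ok"))
                break
            except Exception as exc:      # worker failed: record and retry
                history.append((name, attempt, f"failed: {exc}"))
        else:
            # Every attempt failed, so stop the workflow.
            raise RuntimeError(f"step {name!r} failed {max_attempts} times")
    return payload, history
```

A worker here is any callable that takes the current payload and returns the next one, which mirrors the point that a worker task can be made from anything that can be scripted.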
61. [Diagram: the same image-processing pipeline coordinated by SWF. User uploads to an Amazon S3 bucket for ingest; SWF, with an instance running the decider, coordinates three Auto Scaling groups of instances; an Amazon S3 bucket (RRS) serves content to a CloudFront download distribution; a separate Amazon S3 bucket holds originals]
62. Users > 1 Million
Reaching a million and above is going to require some of
all the previous things:
• Multi-AZ
• Elastic Load Balancing between tiers
• Auto Scaling
• Service-oriented architecture
• Serving content smartly (S3/CloudFront)
• Caching off DB
• Moving state off tiers that autoscale
63. Users > 1 Million
[Diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancing, four Web Instances, two Internal App Instances, two Worker Instances, Amazon SQS, Amazon SES, ElastiCache, Amazon DynamoDB, Amazon CloudWatch, RDS DB Instance Active (Multi-AZ), and two RDS DB Instance Read Replicas in one Availability Zone]
65. From 5 to 10 Million Users
You may start to run into issues with your database around
contention on the write master.
How can you solve it?
• Federation (splitting into multiple DBs based on function)
• Sharding (splitting one data set up across multiple hosts)
• Moving some functionality to other types of DBs
(NOSQL)
66. Database Federation
• Split up databases by function
or purpose
• Harder to do cross-function
queries
• Essentially delays the need for
something like sharding or
NOSQL until much further down
the line
• Won’t help with single huge
functions or tables
[Diagram: three separate databases: ForumsDB, UsersDB, ProductsDB]
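At the application layer, federation amounts to choosing a database by function. A minimal sketch, reusing the three database names from the diagram; the `DATABASES` mapping and the `database_for` helper are hypothetical.

```python
# Hypothetical mapping from application function to its own database.
DATABASES = {
    "forums": "ForumsDB",
    "users": "UsersDB",
    "products": "ProductsDB",
}


def database_for(function: str) -> str:
    """Return the database that owns the given function's tables."""
    try:
        return DATABASES[function]
    except KeyError:
        raise ValueError(f"no database registered for {function!r}") from None
```

A query that needs both forum posts and user profiles now touches two databases, which is the cross-function cost the slide mentions.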
67. Sharded Horizontal Scaling
• More complex at the
application layer
• ORM support can help
• No practical limit on
scalability
• Operational complexity and
sophistication
• Shard by function or key
space
• RDBMS or NOSQL
User      ShardID
002345    A
002346    B
002347    C
002348    B
002349    A
[Diagram: three shards: A, B, C]
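The User-to-ShardID table above is a directory-based shard lookup. The sketch below shows that lookup next to a hash-based alternative, which is not from the slides and is included only for comparison. The function names are hypothetical.

```python
import hashlib

SHARDS = ["A", "B", "C"]

# Directory-based lookup, mirroring the table on the slide.
DIRECTORY = {
    "002345": "A",
    "002346": "B",
    "002347": "C",
    "002348": "B",
    "002349": "A",
}


def shard_by_directory(user_id: str) -> str:
    """Look the user up in an explicit user -> shard table."""
    return DIRECTORY[user_id]


def shard_by_hash(user_id: str) -> str:
    """Derive the shard from a stable hash of the key (no lookup table)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

A directory lets individual users be moved between shards at the cost of an extra lookup. Simple modulo hashing needs no table, but most keys map to a different shard when the shard count changes.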
68. Shifting Functionality to NOSQL
• Similar in a sense to federation
• Again, think about the earlier points for when you
need NOSQL vs. SQL
• Leverage hosted services like Amazon DynamoDB
• Consider these use cases:
– Leaderboards and scoring
– Rapid ingest of clickstream or log data
– Temporary data needs (cart data)
– “Hot” tables
– Metadata or lookup tables
70. • Use Multi-AZ for your infrastructure
• Make use of self-scaling services (Elastic Load Balancing,
Amazon S3, Amazon SNS, SQS, Amazon SES, etc.)
• Build in redundancy at every level
• Blend SQL & NOSQL wisely
• Cache data both inside and outside your infrastructure
• Split tiers into individual services (SOA)
• Use autoscaling once you’re ready for it
• Use automation tools in your infrastructure
• Make sure you have good metrics, monitoring, and logging
tools in place
• Don’t reinvent the wheel
71. Putting all this together
means we should now
easily be able to handle
10+ million users!
72. Users > 10 Million
Iterating on top of the
patterns seen here will get
you up and over 100
million users.
73. Users > 10 Million
• More fine tuning of your application
• More SOA of features and functionality
• Going from multi-AZ to multi-region
• Needing to start building custom solutions
• Deep analysis of your whole stack
74. One More Thing
• There is also a great deal of FINANCIAL ENGINEERING
to do
• Reserved Instances
• Spot Instances
• Correct use of storage
• Scaling driven by queues
• Correct instance sizes
• Etc…
76. Next steps?
Ask for help!
• forums.aws.amazon.com
• aws.amazon.com/support
• Your friendly local account manager & solutions
architect
77. Expand your skills with AWS
• Certification (aws.amazon.com/certification)
– Exams: Validate your proven technical expertise with the AWS platform
• On-Demand Resources (aws.amazon.com/training/self-paced-labs)
– Videos & Labs: Get hands-on practice working with AWS technologies in a live environment
• Instructor-Led Courses (aws.amazon.com/training)
– Training Classes: Expand your technical expertise to design, deploy, and operate scalable, efficient applications on AWS