Cloud computing gives you a number of advantages in being able to scale on demand, easily replace whole parts of your infrastructure, and much more. As a new business looking to use the cloud, you inevitably ask yourself, Where do I start? Join us at this session to understand some of the common patterns and recommended areas of focus you can expect to work through while scaling an infrastructure to handle going from zero to millions of users. From leveraging highly scalable AWS services to making smart decisions on building out your application, you'll learn a number of best practices for scaling your infrastructure in the cloud. The patterns and practices reviewed in this session will get you there.
2. • ME: Simon Elisha – Principal Solutions Architect – Amazon Web Services – @simon_elisha
• YOU: Here to learn more about scaling infrastructure on AWS
• TODAY: About best practices and things to think about when building for large scale
11. Regions
• US West (Oregon)
• EU (Ireland)
• AWS GovCloud (US)
• Asia Pacific (Tokyo)
• US East (Virginia)
• Asia Pacific (Sydney)
• US West (N. California)
• South America (Sao Paulo)
• Asia Pacific (Singapore)
12. Availability Zones
[Map: Availability Zones shown within each region: US West (Oregon), EU (Ireland), AWS GovCloud (US), Asia Pacific (Tokyo), US East (Virginia), Asia Pacific (Sydney), US West (N. California), South America (Sao Paulo), Asia Pacific (Singapore)]
17. Day One, User One
• A single EC2 Instance
– With full stack on this host
  • Web app
  • Database
  • Management
  • Etc.
• A single Elastic IP
• Route53 for DNS
[Diagram: User, Amazon Route 53, Elastic IP, EC2 Instance]
18. “We’re gonna need a bigger box”
• Simplest approach
• Can now leverage PIOPS (Provisioned IOPS)
• High I/O instances
• High memory instances
• High CPU instances
• High storage instances
• Easy to change instance sizes
• Will hit a ceiling eventually
[Diagram: instance types shown: m1.small, m2.4xlarge, hi1.4xlarge]
20. Day One, User One
• We could potentially get to a few hundred to a few thousand users, depending on application complexity and traffic
• No failover
• No redundancy
• Too many eggs in one basket
[Diagram: User, Amazon Route 53, Elastic IP, EC2 Instance]
22. Day Two, User >1
First let’s separate out our single host into more than one.
• Web
• Database
– Make use of a database service?
[Diagram: User, Amazon Route 53, Elastic IP, Web Instance, Database Instance]
23. Database Options
Self-managed
• Database Server on Amazon EC2
– Your choice of database running on Amazon EC2
– Bring Your Own License (BYOL)
Fully Managed
• Amazon RDS
– Microsoft SQL Server, Oracle or MySQL as a managed service
– Flexible licensing: BYOL or license included
• Amazon DynamoDB
– Managed NoSQL database service using SSD storage
– Seamless scalability
– Zero administration
• Amazon Redshift
– Massively parallel, petabyte-scale, data warehouse service
– Fast, powerful and easy to scale
24. But how do I choose
what DB technology I
need? SQL? NoSQL?
28. Why start with SQL?
• Established and well-worn technology
• Lots of existing code, communities, books, background, tools, etc.
• You aren’t going to break SQL DBs in your first 10 million users. But you might break parts of them (hence the blended approach)
• Clear patterns to scalability
29. If your usage is such that you will be
generating several TB ( >5 ) of data
in the first year OR have an
incredibly data intensive workload
you might need NoSQL
30. Why else might you need NoSQL?
• Super low latency applications
• Metadata driven datasets
• Highly unrelational data
• Need schema-less data constructs*
• Massive amounts of data (again, in the TB range)
• Rapid ingest of data (thousands of records/sec)
*Need != “it’s easier to do dev without schemas”
32. User >100
First let’s separate out our single host into more than one
• Web
• Database
– Use RDS to make your life easier
[Diagram: User, Amazon Route 53, Elastic IP, Web Instance, RDS DB Instance]
33. User > 1000
Next let’s address our lack of failover and redundancy issues
• Elastic Load Balancing
• Another web instance
– In another Availability Zone
• Enable Amazon RDS multi-AZ
[Diagram: User, Amazon Route 53, Elastic Load Balancing, a Web Instance in each of two Availability Zones, RDS DB Instance Active (Multi-AZ), RDS DB Instance Standby (Multi-AZ)]
34. Elastic Load Balancing
• Create highly scalable applications
• Distribute load across EC2 instances in multiple Availability Zones
Feature: Details
• Available: Load balance across instances in multiple Availability Zones
• Health checks: Automatically checks health of instances and takes them in or out of service
• Session stickiness: Route requests to the same instance
• Secure sockets layer: Supports SSL offload from web and application servers with flexible cipher support
• Monitoring: Publishes metrics to CloudWatch
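The health-check and load-distribution behavior listed on this slide can be sketched in a few lines. This is only an illustration of the idea, not how Elastic Load Balancing is implemented: the `Backend` and `LoadBalancer` classes, the round-robin policy, and the `probe` callback are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Backend:
    """A web instance behind the load balancer (hypothetical model)."""
    name: str
    zone: str
    healthy: bool = True


@dataclass
class LoadBalancer:
    """Round-robin balancer that only routes to instances passing health checks."""
    backends: List[Backend]
    _next: int = 0

    def run_health_checks(self, probe: Callable[[Backend], bool]) -> None:
        # Take each instance in or out of service based on the probe result.
        for backend in self.backends:
            backend.healthy = probe(backend)

    def route(self) -> Backend:
        # Only instances currently in service are candidates for a request.
        in_service = [b for b in self.backends if b.healthy]
        if not in_service:
            raise RuntimeError("no healthy backends in service")
        chosen = in_service[self._next % len(in_service)]
        self._next += 1
        return chosen


if __name__ == "__main__":
    lb = LoadBalancer([Backend("web-1", "zone-a"), Backend("web-2", "zone-b")])
    lb.run_health_checks(lambda b: b.name != "web-1")  # pretend web-1 failed
    print(lb.route().name)
```

Placing the two backends in different zones mirrors the slide's point: losing one instance or zone leaves the other in service.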
36. User >10 ks–100 ks
[Diagram: User, Amazon Route 53, Elastic Load Balancing, multiple Web Instances in each of two Availability Zones, RDS DB Instance Active (Multi-AZ), RDS DB Instance Standby (Multi-AZ), and RDS DB Instance Read Replicas in each Availability Zone]
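Read replicas only help if the application sends reads to them. A minimal sketch of read/write splitting follows, assuming a hypothetical `ReplicaRouter` class and placeholder endpoint names; the "starts with SELECT" test is a deliberate simplification.

```python
import itertools
from typing import List


class ReplicaRouter:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Cycle through replicas in order; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        # Simplified classification: anything starting with SELECT is a read.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ReplicaRouter("primary-db", ["replica-1", "replica-2"])
    print(router.endpoint_for("SELECT * FROM users"))
    print(router.endpoint_for("UPDATE users SET name = 'x'"))
```

Because replicas can lag behind the primary, a read that must see a write made moments earlier is usually safer sent to the primary.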
37. This will take us pretty far
honestly, but we care about
performance and
efficiency, so let’s clean
this up a bit
38. Shift Some Load Around
Let’s lighten the load on our web and database instances:
• Move static content from the web instance to Amazon S3 and CloudFront
• Move dynamic content from Elastic Load Balancing to CloudFront
• Move session/state and DB caching to ElastiCache or Amazon DynamoDB
[Diagram: User, Amazon Route 53, Amazon CloudFront, Elastic Load Balancing, Amazon S3, Web Instance, ElastiCache, RDS DB Instance Active (Multi-AZ), Amazon DynamoDB, Availability Zone]
Check out Session: ARC309 – Dynamic Content Acceleration: Lightning Fast Web Apps with Amazon CloudFront and Amazon Route 53
39. Working with S3 – Amazon Simple Storage Service
• Object-based storage for the web
• 11 9s of durability
• Good for things like:
– Static assets (css, js, images, videos)
– Backups
– Logs
– Ingest of files for processing
• “Infinitely scalable”
• Supports fine-grained permission control
• Ties in well with CloudFront
• Ties in with Amazon EMR
• Acts as a logging endpoint for Amazon S3, CloudFront, Billing
• Supports encryption in transit and at rest
• Reduced redundancy 1/3 cheaper
• Amazon Glacier for super long term storage
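To put "11 9s of durability" into numbers, the arithmetic below treats it as a 99.999999999% chance that any given object survives a year and computes the expected loss rate. Treating objects as independent is an assumption made for the illustration, and the function names are hypothetical.

```python
from fractions import Fraction

# Eleven nines, expressed exactly to avoid floating-point rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the stated durability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    # Storing 10,000 objects works out to one expected loss per 10 million years.
    print(years_per_lost_object(10_000))
```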
40. Amazon CloudFront
Amazon CloudFront is a web service for scalable content delivery.
• Cache static content at the edge for faster delivery
• Helps lower load on origin infrastructure
• Dynamic and static content
• Streaming video
• Zone apex support
• Custom SSL certificates
• Low TTLs (as short as 0 seconds)
• Lower costs for origin fetches (between Amazon S3/EC2 and CloudFront)
• Optimized to work with EC2, Amazon S3, Elastic Load Balancing, and Route53
[Charts: Server Load and Response Time for three cases (No CDN, CDN for Static Content, CDN for Static & Dynamic Content); Volume of Data Delivered (Gbps) by time of day, 8:00 AM to 9:00 PM]
43. Amazon DynamoDB
• Provisioned throughput NoSQL database
• Fast, predictable performance
• Fully distributed, fault-tolerant
• Considerations for nonuniform data
Feature: Details
• Provisioned throughput: Dial up or down provisioned read/write capacity
• Predictable performance: Average single-digit millisecond latencies from SSD-backed infrastructure
• Strong consistency: Be sure you are reading the most up to date values
• Fault tolerant architecture: Data replicated across Availability Zones
• Monitoring: Integrated to CloudWatch
• Secure: Integrates with AWS Identity and Access Management (IAM)
• Elastic MapReduce: Integrates with Amazon Elastic MapReduce for complex analytics on large datasets
44. ElastiCache
• Hosted Memcached & Redis
– Speaks same API as traditional open source Memcached and Redis
• Scale from one to many nodes
• Self-healing (replaces dead instance)
• Very fast (single digit ms speeds usually)
• Local to a single AZ for Memcached, with no persistence or replication
• With Redis can put a replica in a different AZ with persistence
• Use AWS’s Auto Discovery client to simplify clusters growing and shrinking without affecting your application
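Moving DB caching into a cache tier is commonly done with a cache-aside lookup: check the cache, fall back to the database on a miss, then store the result. The sketch below uses an in-process `TTLCache` as a stand-in for a Memcached or Redis node; the class, the `cached_lookup` helper, and the TTL value are assumptions for illustration, not part of any AWS client.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a cache node; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)


def cached_lookup(cache: TTLCache, key: str,
                  load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read. Note: a stored value of None is treated as a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # only hit the database on a miss
        cache.set(key, value)
    return value
```

Each cache hit is one query the database never sees, which is how this pattern lightens the load on the RDS tier.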
45. Now that our Web tier is
much more lightweight, we
can revisit the beginning of
our talk…
47. Auto Scaling
Automatic resizing of compute clusters based on demand
[Diagram: Amazon CloudWatch triggers an autoscaling policy]
Feature: Details
• Control: Define minimum and maximum instance pool sizes and when scaling and cool down occur.
• Integrated to Amazon CloudWatch: Use metrics gathered by CloudWatch to drive scaling.
• Instance types: Run Auto Scaling for On-Demand and Spot Instances. Compatible with VPC.
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name MyGroup \
    --launch-configuration-name MyConfig \
    --min-size 4 \
    --max-size 200 \
    --availability-zones us-west-2c
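How minimum size, maximum size, and cooldown interact can be shown with a small simulation. This is a sketch of the general idea only: the `ScalingGroup` class, the CPU thresholds, the step size, and the 300-second cooldown are assumptions for the example and do not reproduce the Auto Scaling service's actual policy engine.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    """Toy model of a group whose size is clamped to [min_size, max_size]."""
    min_size: int
    max_size: int
    desired: int
    cooldown_seconds: float = 300.0
    last_scaled_at: float = float("-inf")

    def evaluate(self, avg_cpu: float, now: float,
                 scale_out_above: float = 70.0,
                 scale_in_below: float = 30.0,
                 step: int = 2) -> int:
        # During cooldown, ignore the metric so the last change can take effect.
        if now - self.last_scaled_at < self.cooldown_seconds:
            return self.desired
        if avg_cpu > scale_out_above:
            target = self.desired + step
        elif avg_cpu < scale_in_below:
            target = self.desired - step
        else:
            return self.desired
        # Never go below the minimum or above the maximum pool size.
        target = max(self.min_size, min(self.max_size, target))
        if target != self.desired:
            self.desired = target
            self.last_scaled_at = now
        return self.desired


if __name__ == "__main__":
    group = ScalingGroup(min_size=4, max_size=200, desired=4)
    print(group.evaluate(avg_cpu=90.0, now=0.0))
```

The min and max of 4 and 200 match the values in the command on this slide; everything else is illustrative.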
55. User >500k+
[Diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancing, DynamoDB, and two Availability Zones each with three Web Instances, ElastiCache, and RDS DB Instances (Active (Multi-AZ) plus Read Replica in one zone, Standby (Multi-AZ) plus Read Replica in the other)]
57. “Give me six hours to chop down a tree and I will spend
the first four sharpening the axe.” – Abraham Lincoln
58. “World of Hurt” If You Are Missing These
• Metrics & alarming
• Automated builds
• Automated deployment
• Centralized logging
Check out Session: ARC306 – Lumberjacking on AWS: Cutting Through Logs to Find What Matters
Check out Session: ARC307 – Continuous Integration and Deployment Best Practices on AWS
60. Not having proper monitoring
or metrics is like flying a
plane with an eye mask on in
a thunderstorm.
Oh and your wing is on fire.
61. AWS Marketplace & Partners Can Help
• Customers can find, research,
buy software
• Simple pricing aligns with EC2
usage model
• Launch in minutes
• Marketplace billing integrated
into your AWS account
• 1,000+ products across 20+
categories
Learn more at: aws.amazon.com/marketplace
62. Spend Your Time Wisely
Managing your infrastructure will become an
increasingly important part of your time. Use tools to
automate repetitive tasks
• Tools to manage AWS resources
• Tools to manage software on and configuration of
your instances
• Automated data analysis of logs and user actions
63. AWS Application Management Solutions
[Diagram: a spectrum running from Convenience to Control]
• Higher level services: Elastic Beanstalk, AWS OpsWorks
• Do it yourself: AWS CloudFormation, EC2
64. Host-based Configuration Management
Two big players
– Opscode Chef
– PuppetLabs Puppet
• Both do more or less the same thing
• They have similar syntax
• Work well with tools from the previous slide
• Require some learning time
• Can’t scale easily without this kind of capability
65. From 500K to 1 Million Users
• Getting serious now
• Significant user base
• Plenty of attention if things go wrong
• Interesting phase for startups with funding rounds
66. Time to make some
radical improvements at
the web & app layers
68. SOAing
Move services into their own tiers
or modules. Treat each of these
as 100% separate pieces of your
infrastructure and scale them
independently.
Amazon.com and AWS do this
extensively! It offers flexibility and
greater understanding of each
component.
69. Loose Coupling Sets You Free!
• The looser they're coupled, the bigger they scale
– Use independent components
– Design everything as a black box
– Decouple interactions
– Favor services with built-in redundancy and scalability rather than building your own
Use Amazon SQS as Buffers
[Diagram: Tight Coupling: Controller A calls Controller B directly. Loose Coupling: a queue (Q) sits in front of Controller A and in front of Controller B]
Check out Session: ARC301 – Controlling the Flood: Massive Message Processing with Amazon SQS & Amazon DynamoDB
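The buffering idea can be sketched with Python's standard `queue.Queue` standing in for an SQS queue. The names `controller_a`, `controller_b`, and `run_pipeline` are hypothetical, and the "work" is just upper-casing a string; the point is that the producer never calls the consumer directly.

```python
import queue
import threading
from typing import List, Optional


def controller_a(buffer: "queue.Queue[Optional[str]]", jobs: List[str]) -> None:
    """Producer: enqueue work and return immediately, without waiting on B."""
    for job in jobs:
        buffer.put(job)


def controller_b(buffer: "queue.Queue[Optional[str]]", results: List[str]) -> None:
    """Consumer: pull work at its own pace until it sees the None sentinel."""
    while True:
        job = buffer.get()
        try:
            if job is None:
                return
            results.append(job.upper())  # placeholder for real processing
        finally:
            buffer.task_done()


def run_pipeline(jobs: List[str]) -> List[str]:
    buffer: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[str] = []
    worker = threading.Thread(target=controller_b, args=(buffer, results))
    worker.start()
    controller_a(buffer, jobs)
    buffer.put(None)   # sentinel: tell the consumer to stop
    buffer.join()      # wait until every queued item has been processed
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["resize", "notify"]))
```

If the consumer slows down or restarts, work accumulates in the buffer instead of failing the producer, which is the property the slide attributes to loose coupling.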
70. Loose Coupling + SOA = Winning
In the early days, if someone has a service for it already, use that instead of building it yourself
Don’t reinvent the wheel
Examples:
• Email
• Queuing
• Transcoding
• Search
• Databases
• Monitoring
• Metrics
• Logging
[Services shown: Amazon SNS, Amazon SES, Amazon CloudSearch, Amazon SQS, Amazon SWF, Amazon Elastic Transcoder]
71. On reinventing the wheel: If
you find yourself writing
your own queue, DNS
server, database, storage
system, monitoring tool …
75. [Diagram: image-processing pipeline. Components shown: User; Amazon S3 Bucket for Ingest; Amazon SNS Topic; three SQS Queues (Size for Thumbnail, Size Image for Mobile, Size Image for Web), each with Instances in an Autoscaling Group; Amazon S3 Bucket for Originals; RRS Amazon S3 Bucket to Serve Content to CloudFront; CloudFront Download Distribution]
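The diagram labels suggest a fan-out pattern: one topic delivering each upload notification to several queues, each drained by its own worker group. The sketch below models that reading with in-memory structures; the `Topic` class, the `drain` helper, and the queue names are assumptions for illustration and do not use the SNS or SQS APIs.

```python
from collections import deque
from typing import Deque, Dict, List


class Topic:
    """In-memory stand-in for a topic that copies each message to every queue."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}

    def subscribe(self, queue_name: str) -> Deque[str]:
        # Return the existing queue for this name, or create a new one.
        return self._queues.setdefault(queue_name, deque())

    def publish(self, message: str) -> None:
        # Every subscribed queue receives its own copy of the message.
        for q in self._queues.values():
            q.append(message)


def drain(q: Deque[str], task: str) -> List[str]:
    """Worker loop for one queue: process messages in arrival order."""
    done = []
    while q:
        done.append(f"{task}:{q.popleft()}")
    return done


if __name__ == "__main__":
    topic = Topic()
    thumb = topic.subscribe("thumbnail")
    mobile = topic.subscribe("mobile")
    web = topic.subscribe("web")
    topic.publish("photo-001.jpg")
    print(drain(thumb, "thumbnail"), drain(mobile, "mobile"), drain(web, "web"))
```

Because each resize task has its own queue, each worker group can scale on its own queue depth without affecting the others.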
76. Amazon Simple Workflow Service (SWF)
• Provides an orchestration tool across your infrastructure
• Can act as a middle layer to pass messages and set up tasks
• Lets you break down individual tasks into different workers
• Lets you define logic between workers
• Lets you make a worker task from anything that can be scripted
• Includes built-in retries, timeouts, logging
• Features built-in reliability, scalability, and low cost
Your code = Deciders & Workers
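The decider/worker split can be illustrated without the service itself: a decider looks at what has completed and picks the next task, while workers perform individual tasks. Everything below (`decider`, `run_workflow`, the retry loop, the task names) is a hypothetical sketch of that division of labor and not the SWF API.

```python
from typing import Callable, Dict, List, Optional

Worker = Callable[[str], str]


def decider(completed: List[str], plan: List[str]) -> Optional[str]:
    """Return the next task to schedule, or None when the workflow is done."""
    for task in plan:
        if task not in completed:
            return task
    return None


def run_workflow(plan: List[str], workers: Dict[str, Worker],
                 payload: str, max_attempts: int = 3) -> str:
    """Drive the workflow: ask the decider, run the worker, retry on failure."""
    completed: List[str] = []
    while True:
        task = decider(completed, plan)
        if task is None:
            return payload
        for attempt in range(1, max_attempts + 1):
            try:
                payload = workers[task](payload)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up after the final attempt
        completed.append(task)


if __name__ == "__main__":
    workers = {"resize": lambda p: p + ">resized",
               "store": lambda p: p + ">stored"}
    print(run_workflow(["resize", "store"], workers, "img"))
```

Keeping ordering logic in the decider means a worker can be any script that takes an input and returns an output.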
79. Users > 1 Million
Reaching a million and above is going to require some of
all the previous things:
• Multi-AZ
• Elastic Load Balancing between tiers
• Auto Scaling
• Service-oriented architecture
• Serving content smartly (S3/CloudFront)
• Caching off DB
• Moving state off tiers that autoscale
80. Users > 1 Million
[Diagram: User, Amazon Route 53, Amazon CloudFront, Elastic Load Balancing, Amazon S3, Amazon SQS, Amazon DynamoDB, Amazon CloudWatch, Amazon SES, and an Availability Zone containing Web Instances, Worker Instances, Internal App Instances, ElastiCache, RDS DB Instance Active (Multi-AZ), and RDS DB Instance Read Replicas]
82. From 5 to 10 Million Users
You may start to run into issues with your database around
contention on the write master.
How can you solve it?
• Federation (splitting into multiple DBs based on function)
• Sharding (splitting one data set up across multiple hosts)
• Moving some functionality to other types of DBs (NoSQL)
83. Database Federation
• Split up databases by function
or purpose
• Harder to do cross-function
queries
• Essentially delays the need for
something like sharding or
NoSQL until much further down
the line
• Won’t help with single huge
functions or tables
ForumsDB
UsersDB
ProductsDB
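Federation amounts to a lookup from application function to database, with cross-function queries moved into application code. The mapping below reuses the database names from the slide; the function keys, `database_for`, and the application-side join helper are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical mapping from application function to its own database.
DATABASE_FOR_FUNCTION: Dict[str, str] = {
    "forums": "ForumsDB",
    "users": "UsersDB",
    "products": "ProductsDB",
}


def database_for(function: str) -> str:
    """Pick the database that owns a given function's tables."""
    try:
        return DATABASE_FOR_FUNCTION[function]
    except KeyError:
        raise ValueError(f"no database registered for function {function!r}") from None


def posts_with_author_names(posts: List[dict],
                            users_by_id: Dict[int, str]) -> List[dict]:
    """Application-side join: posts come from ForumsDB, names from UsersDB."""
    return [{**post, "author": users_by_id.get(post["user_id"], "unknown")}
            for post in posts]
```

The second function shows why cross-function queries get harder: what used to be one SQL join becomes two queries plus merging logic in the application.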
84. Sharded Horizontal Scaling
• More complex at the application layer
• ORM support can help
• No practical limit on scalability
• Operational complexity and sophistication
• Shard by function or key space
• RDBMS or NoSQL
User: ShardID
• 002345: A
• 002346: B
• 002347: C
• 002348: B
• 002349: A
[Diagram: shards A, B, C]
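The User-to-ShardID table on this slide is a directory lookup. A minimal sketch follows, using the slide's five sample rows; the hash-based fallback for users missing from the directory is an added assumption, as are the function and constant names.

```python
import hashlib
from typing import Dict, List

# Sample directory taken from the slide's User/ShardID table.
SHARD_DIRECTORY: Dict[str, str] = {
    "002345": "A",
    "002346": "B",
    "002347": "C",
    "002348": "B",
    "002349": "A",
}
SHARDS: List[str] = ["A", "B", "C"]


def shard_for(user_id: str,
              directory: Dict[str, str] = SHARD_DIRECTORY,
              shards: List[str] = SHARDS) -> str:
    """Look up a user's shard; fall back to a stable hash for unknown users."""
    if user_id in directory:
        return directory[user_id]
    # Assumption for this sketch: unknown users are placed by hashing the key.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


if __name__ == "__main__":
    print(shard_for("002347"))
```

A directory makes it possible to move individual users between shards, while pure hashing avoids the lookup table but makes rebalancing harder; this is part of the application-layer complexity the slide mentions.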
85. Shifting Functionality to NoSQL
• Similar in a sense to federation
• Again, think about the earlier points for when you need NoSQL vs SQL
• Leverage hosted services like Amazon DynamoDB
• Consider these use cases:
– Leaderboards and scoring
– Rapid ingest of clickstream or log data
– Temporary data needs (cart data)
– “Hot” tables
– Metadata or lookup tables
86. From 5 to 10 Million Users
You may start to run into issues with speed and performance
of your applications
• Make sure you have monitoring, metrics, & logging in place
– If you can’t build it internally, outsource it! (third-party SaaS)
• Pay attention to what customers are saying works well vs.
what doesn’t, and use this as direction
• Try to work on squeezing as much performance out of each
service or component
88. • Use Multi-AZ for your infrastructure
• Make use of self-scaling services (Elastic Load Balancing, Amazon S3, Amazon SNS, SQS, Amazon SES, etc.)
• Build in redundancy at every level
• Blend SQL & NoSQL wisely
• Cache data both inside and outside your infrastructure
• Split tiers into individual services (SOA)
• Use autoscaling once you’re ready for it
• Use automation tools in your infrastructure
• Make sure you have good metrics, monitoring, and logging tools in place
• Don’t reinvent the wheel
89. Putting all this together
means we should now
easily be able to handle
10+ million users!
90. Users > 10 Million
Iterating on top of the
patterns seen here will get
you up and over 100
million users.
91. Users > 10 Million
• More fine tuning of your application
• More SOA of features and functionality
• Going from Multi-AZ to multi-region
• Needing to start building custom solutions
• Deep analysis of your whole stack
Check out Session: ARC305 – How Netflix Leverages Multiple Regions to Increase Availability
92. One More Thing
• A fantastic amount of FINANCIAL ENGINEERING
to do as well
• Reserved Instances
• Spot Instances
• Correct use of storage
• Scaling driven by queues
• Correct instance sizes
• Etc…
Check out Session: ARC313
– Running Lean and Mean:
Designing Cost-Efficient
Architectures on AWS
94. Next steps?
Ask for help!
• forums.aws.amazon.com
• aws.amazon.com/support
• Your local account manager & solution architect
95. AWS re:Invent Pub Crawl
Join the AWS Startup Team this evening at the AWS Pub Crawl
When: Wednesday November 13, 5:30pm - 7:30pm
Where: Canaletto at The Venetian, 2nd Floor
Who Will Be There: Startups, The AWS Startup Team,
Startup Launch Companies and
AWS re:Invent Hackathon winners
96. Startup Spotlight Sessions with Dr. Werner Vogels
Thurs. Nov 14, Marcello Room 4406
SPOT 203 - Fireside Chats – Startup Founders, 1:30-2:30pm
– Eliot Horowitz, CTO of MongoDB
– Jeff Lawson, CEO of Twilio
– Valentino Volonghi, Chief Architect of AdRoll
SPOT 204 - Fireside Chats – Startup Influencers, 3:00-4:00pm
– Albert Wegner, Managing Partner at Union Square Ventures
– David Cohen, Founder and CEO of TechStars
SPOT 101 - Startup Launches, 4:15-5:15pm
– 5 companies powered by AWS launching at AWS re:Invent 2013
97. Please give us your feedback on this
presentation
ARC206
As a thank you, we will select prize
winners daily for completed surveys!