This presentation session from the Cloud Management, Services and Applications Theatre at Cloud Expo Europe 2014 explores the techniques and AWS services that you can use to build highly scalable web applications on AWS. It also features a great overview of a high-scalability mobile application built by Myriad Group, an AWS customer, that serves over 41 million users.
3. • US: Ian Massingham – Technical Evangelist, Amazon Web Services, @IanMmmm
• YOU: Here to learn more about scaling and managing applications on AWS
• THIS SESSION: About best practices and things to think about when building for large scale
6. Regions & Availability Zones
• US East (Virginia)
• US West (Oregon)
• US West (N. California)
• AWS GovCloud (US)
• EU (Ireland)
• Asia Pacific (Tokyo)
• Asia Pacific (Singapore)
• Asia Pacific (Sydney)
• South America (Sao Paulo)
11. Day One, User One
• A single EC2 Instance
• With the full stack on this host
  • Web app
  • Database
  • Management
  • Etc.
• A single Elastic IP
• Route 53 for DNS
[Diagram: User → Amazon Route 53 → Elastic IP → EC2 Instance]
12. “We’re gonna need a bigger box”
• Simplest approach
• Can now leverage PIOPS (Provisioned IOPS)
• High I/O instances
• High-memory instances
• High-CPU instances
• High-storage instances
• Easy to change instance sizes
• Will hit an endpoint eventually
[Diagram: scaling up from m1.small through m2.4xlarge to hi1.4xlarge]
14. Day One, User One
• We could potentially get to a few hundred to a few thousand users, depending on application complexity and traffic
• No failover
• No redundancy
• All our eggs in one basket
[Diagram: User → Amazon Route 53 → Elastic IP → EC2 Instance]
16. Day Two, User >1
First let’s separate out our single host into more than one:
• Web
• Database
• Make use of a database service?
[Diagram: User → Amazon Route 53 → Elastic IP → EC2 Instance → Database Instance]
17. Database Options
Self-managed:
• Database server on Amazon EC2 – your choice of database running on Amazon EC2; Bring Your Own License (BYOL)
Fully managed:
• Amazon RDS – Microsoft SQL Server, Oracle or MySQL as a managed service; flexible licensing (BYOL or license included)
• Amazon DynamoDB – managed NoSQL database service using SSD storage; seamless scalability; zero administration
• Amazon Redshift – massively parallel, petabyte-scale data warehouse service; fast, powerful and easy to scale
18. But how do I choose what DB technology I need? SQL? NoSQL?
Not a binary decision!
19. User >100
First let’s separate out our single host into more than one:
• Web
• Database
• Use RDS to make your life easier
[Diagram: User → Amazon Route 53 → Elastic IP → EC2 Instance → RDS DB Instance]
20. User >1000
Next let’s address our lack of failover and redundancy:
• Elastic Load Balancing
• Another web instance
• In another Availability Zone
• Enable Amazon RDS Multi-AZ
[Diagram: User → Amazon Route 53 → Elastic Load Balancing → one Web Instance in each of two Availability Zones; RDS DB Instance Active (Multi-AZ) in one zone, RDS DB Instance Standby (Multi-AZ) in the other]
21. Elastic Load Balancing
• Create highly scalable applications
• Distribute load across EC2 instances in multiple Availability Zones

Feature – Details:
• Availability Zones – load balance across instances in multiple Availability Zones
• Health checks – automatically checks the health of instances and takes them in or out of service
• Session stickiness – routes requests to the same instance
• Secure sockets layer – supports SSL offload from web and application servers, with flexible cipher support
• Monitoring – publishes metrics to CloudWatch
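The health-check behaviour in the table can be sketched in miniature. This toy Python model is not how ELB is implemented (instance IDs and the check interface are invented for illustration); it just shows instances being taken out of and returned to service while traffic keeps flowing:

```python
from itertools import cycle

class MiniLoadBalancer:
    """Toy round-robin balancer: unhealthy instances are skipped,
    mimicking how ELB takes instances out of service after failed
    health checks. Illustrative only."""

    def __init__(self, instances):
        self.instances = instances          # list of instance ids
        self.healthy = set(instances)       # start with all in service
        self._ring = cycle(instances)

    def health_check(self, instance, is_healthy):
        # ELB probes each instance itself; here the result is passed in.
        if is_healthy:
            self.healthy.add(instance)
        else:
            self.healthy.discard(instance)

    def route(self):
        # Walk the ring until we find an in-service instance.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances in service")

lb = MiniLoadBalancer(["i-web1", "i-web2"])
lb.health_check("i-web1", is_healthy=False)   # i-web1 fails its check
assert lb.route() == "i-web2"                 # traffic shifts to i-web2
lb.health_check("i-web1", is_healthy=True)    # i-web1 recovers
```

The point of the sketch is the contract, not the mechanics: callers only ever talk to the balancer, never to an individual instance.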
23. User >10k–100k
[Diagram: User → Amazon Route 53 → Elastic Load Balancing → several Web Instances in each of two Availability Zones; RDS DB Instance Active (Multi-AZ) with Read Replicas in one zone, RDS DB Instance Standby (Multi-AZ) with Read Replicas in the other]
24. This will take us pretty far, honestly, but we care about performance and efficiency, so let’s clean this up a bit.
25. Shift Some Load Around
Let’s lighten the load on our web and database instances:
• Move static content from the web instance to Amazon S3 and CloudFront
• Move dynamic content from the Elastic Load Balancer to CloudFront
• Move session/state and DB caching to ElastiCache or Amazon DynamoDB
[Diagram: User → Amazon Route 53 → Amazon CloudFront → Elastic Load Balancer and Amazon S3; Web Instance, ElastiCache, Amazon DynamoDB and RDS DB Instance Active (Multi-AZ) in the Availability Zone]
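The “move DB caching to ElastiCache” step usually means the cache-aside pattern. A minimal sketch, with a plain dict standing in for a memcached/Redis client and a hypothetical `query_db` helper standing in for a real RDS query:

```python
# Cache-aside: check the cache first, fall back to the database on a
# miss, and populate the cache on the way out.
cache = {}

def query_db(user_id):
    # Pretend this is an expensive SQL query against RDS.
    query_db.calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}
query_db.calls = 0

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:               # cache hit: skip the database
        return cache[key]
    row = query_db(user_id)        # cache miss: hit the database once...
    cache[key] = row               # ...then populate the cache
    return row

get_user(42)
get_user(42)                       # second call is served from the cache
assert query_db.calls == 1         # the database was queried only once
```

In production the dict becomes an ElastiCache client and the cached values get a TTL so they expire rather than going stale forever.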
26. Working with S3 – Amazon Simple Storage Service
• Object-based storage for the web
• Supports fine-grained permission control
• 11 9s of durability
• “Infinitely scalable”
• Ties in well with CloudFront
• Ties in with Amazon EMR
• Acts as a logging endpoint for Amazon S3, CloudFront and Billing
• Supports encryption in transit and at rest
• Reduced Redundancy Storage is 1/3 cheaper
• Amazon Glacier for super-long-term storage
• Good for things like:
  • Static assets (CSS, JS, images, videos)
  • Backups
  • Logs
  • Ingest of files for processing
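As a rough illustration of what 11 9s of durability means in practice (the durability figure is S3’s published design target; the object count is a hypothetical example):

```python
# Expected object loss per year for a bucket at S3's durability design
# target. The object count is an arbitrary example for illustration.
durability = 0.99999999999            # 11 nines
objects = 10_000_000

expected_losses_per_year = objects * (1 - durability)

# Even with ten million objects, the expected loss is ~0.0001 objects
# per year, i.e. roughly one object every ten thousand years.
assert expected_losses_per_year < 0.001
```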
27. Amazon CloudFront
Amazon CloudFront is a web service for scalable content delivery.
• Cache static content at the edge for faster delivery
• Helps lower load on origin infrastructure
• Dynamic and static content
• Low TTLs (as short as 0 seconds)
• Custom SSL certificates
• Zone apex support
• Streaming video
• Lower costs for origin fetches (between Amazon S3/EC2 and CloudFront)
• Optimized to work with EC2, Amazon S3, Elastic Load Balancing, and Route 53
[Charts: server load vs response time compared with no CDN, a CDN for static content, and a CDN for static & dynamic content; volume of data delivered (Gbps) ramping from 0 to roughly 70 Gbps between 8:00 AM and 9:00 PM]
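One practical way to use CloudFront’s TTL flexibility is to vary Cache-Control headers by content type, which CloudFront honours at the edge. A minimal sketch; the content types and TTL values here are illustrative choices, not AWS recommendations:

```python
def cache_headers(content_type):
    """Pick Cache-Control headers for an origin response so that
    CloudFront caches static assets for a long time while revalidating
    dynamic pages on every request. Values are illustrative only."""
    static_types = {"text/css", "application/javascript", "image/png",
                    "image/jpeg", "video/mp4"}
    if content_type in static_types:
        # Static assets: cache at the edge for a day.
        return {"Cache-Control": "public, max-age=86400"}
    # Dynamic content: a TTL of 0 still lets CloudFront terminate the
    # connection close to the user and revalidate with the origin.
    return {"Cache-Control": "public, max-age=0, must-revalidate"}

assert "max-age=86400" in cache_headers("text/css")["Cache-Control"]
assert "max-age=0" in cache_headers("text/html")["Cache-Control"]
```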
29. Amazon DynamoDB
• Provisioned-throughput NoSQL database
• Fast, predictable performance
• Fully distributed, fault-tolerant architecture
• Considerations for non-uniform data

Feature – Details:
• Provisioned throughput – dial up or down provisioned read/write capacity
• Predictable performance – average single-digit-millisecond latencies from SSD-backed infrastructure
• Strong consistency – be sure you are reading the most up-to-date values
• Fault tolerant – data replicated across Availability Zones
• Monitoring – integrated with CloudWatch
• Secure – integrates with AWS Identity and Access Management (IAM)
• Elastic MapReduce – integrates with Amazon Elastic MapReduce for complex analytics on large datasets
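Provisioned throughput is easier to reason about with the capacity-unit arithmetic from the DynamoDB documentation of the era: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one write capacity unit covers one write per second of an item up to 1 KB. A quick sketch of that arithmetic:

```python
import math

def read_capacity_units(item_kb, reads_per_sec, strongly_consistent=True):
    # Each read of an item costs ceil(size / 4 KB) units; eventually
    # consistent reads are charged at half the strongly consistent rate.
    units = math.ceil(item_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

def write_capacity_units(item_kb, writes_per_sec):
    # Each write of an item costs ceil(size / 1 KB) units.
    return math.ceil(item_kb / 1) * writes_per_sec

# e.g. 6 KB items read 100 times per second:
assert read_capacity_units(6, 100) == 200          # ceil(6/4) = 2 units each
assert read_capacity_units(6, 100, False) == 100   # eventually consistent: half
assert write_capacity_units(2, 50) == 100          # ceil(2/1) = 2 units each
```

This is why the Myriad section later tunes IO per service: an address book with small, rarely written items needs far fewer provisioned units than a message store.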
30. Now that our Web tier is much
more lightweight, we can
revisit the beginning of our
talk…
32. Auto Scaling
Automatic resizing of compute clusters based on demand: Amazon CloudWatch metrics trigger the Auto Scaling policy.

Feature – Details:
• Control – define minimum and maximum instance pool sizes and when scaling and cooldown occurs
• Integrated with Amazon CloudWatch – use metrics gathered by CloudWatch to drive scaling
• Instance types – run Auto Scaling for On-Demand and Spot Instances; compatible with VPC

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name MyGroup \
  --launch-configuration-name MyConfig \
  --min-size 4 \
  --max-size 200 \
  --availability-zones us-west-2c
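The decision loop Auto Scaling runs for you can be sketched as a pure function (a toy model; the thresholds, step size and the CPU metric are illustrative assumptions, not AWS defaults, and real policies live in CloudWatch alarms):

```python
def desired_capacity(current, metric, scale_up_at, scale_down_at,
                     min_size=4, max_size=200, step=2):
    """Toy scaling decision mirroring an Auto Scaling policy: grow when
    a CloudWatch metric (e.g. average CPU %) breaches the upper
    threshold, shrink below the lower one, and always clamp the result
    to the [min_size, max_size] pool limits."""
    if metric > scale_up_at:
        current += step
    elif metric < scale_down_at:
        current -= step
    return max(min_size, min(max_size, current))

assert desired_capacity(4, metric=85, scale_up_at=70, scale_down_at=30) == 6
assert desired_capacity(4, metric=10, scale_up_at=70, scale_down_at=30) == 4    # floor holds
assert desired_capacity(200, metric=90, scale_up_at=70, scale_down_at=30) == 200  # ceiling holds
```

The min/max clamp is the “Control” row of the table above: no matter what the metric does, the group never leaves the bounds you set.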
40. User >500k
[Diagram: User → Amazon Route 53 → Amazon CloudFront → Elastic Load Balancing and Amazon S3; several Web Instances, ElastiCache and an RDS Read Replica in each of two Availability Zones; RDS DB Instance Active (Multi-AZ) in one zone with its Standby (Multi-AZ) in the other; DynamoDB]
41. Want to go beyond 500K users?
Let me introduce an AWS
customer that has done that…
43. Myriad Overview
Myriad delivers consumer applications, messaging solutions, and
embedded software to leading OEMs, mobile operators, and pay TV
providers worldwide.
Messaging Services
– Instant messaging to tens of millions of users in Latin America
– Access to social networking applications via USSD on feature phones in Latin
America, Asia and Africa
– Working with some of the world’s most successful mobile phone operators.
Java
– Long established business providing Java VM for embedded devices
Connected Home Solutions
– Found in tens of millions of set-top boxes and Blu-ray players.
– Driving the convergence of mobile, tablets and TV through creation of a suite of
exciting connected home solutions.
Worldwide operation with 200 people located in
UK, France, Mexico, Brazil, India & China.
44. Myriad Messenger
Latin American chat service launched in
16 markets
41M users through current carrier
partnerships
Service platform and clients undergoing
redevelopment to support rapid growth
with complementary carrier and OTT
services
The challenge
– Build a new platform that will deliver
superior performance, features and
scale in a short time with a small team
45. High-level platform overview
• Platform is Java
• All services developed as OSGi bundles
• Apache Felix OSGi container
• OSGi allows us to separate functional interfaces from implementations dynamically
  – We can quickly and painlessly swap out implementations if they don’t perform or scale
• We have a single design pattern for all system components
• We have developed OSGi services over AWS
[Diagram: MSNGR clients → REST API servers → message queue → message queue handlers; user service, addressbook service, WAP service, message service, routing service; SMS aggregator and message notification via a notification queue and Amazon cloud messaging]
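The OSGi pattern described here, binding consumers to a functional interface so implementations can be swapped, can be sketched outside Java too. A minimal Python analogy (the class and service names are invented for illustration; the real platform uses Apache Felix and Java):

```python
class ServiceRegistry:
    """Toy stand-in for an OSGi service registry: consumers look up a
    service by interface name and never import the implementation."""
    def __init__(self):
        self._services = {}

    def register(self, interface, implementation):
        self._services[interface] = implementation   # hot-swappable

    def lookup(self, interface):
        return self._services[interface]

class InMemoryAddressBook:
    def contacts(self, user):
        return [f"{user}-friend"]

class DynamoAddressBook:
    def contacts(self, user):
        # Would call DynamoDB in the real system; stubbed here.
        return [f"{user}-friend", f"{user}-colleague"]

registry = ServiceRegistry()
registry.register("addressbook", InMemoryAddressBook())
assert registry.lookup("addressbook").contacts("ana") == ["ana-friend"]

# Swap the implementation without touching any consumer code, which is
# the "quickly and painlessly swap out implementations" point above:
registry.register("addressbook", DynamoAddressBook())
assert len(registry.lookup("addressbook").contacts("ana")) == 2
```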
46. AWS Application Services
Each of our persistent services runs over DynamoDB
– Gives us unlimited scale, predictable latency and the ability to individually tune the IO requirements to suit the service (e.g. the address book needs less IO than the message service, and greater read capacity than write)
We make extensive use of SQS to decouple system components and allow us to
– scale more effectively
– flatten daily system usage patterns
– give clients a very fast response to the REST API
SNS for internal message routing
ElastiCache provides caches for performance (and cost reduction) over system hotspots
S3 for storage of media, extended message bodies, branding packages and code deployments
EC2/CloudWatch/Auto Scaling for flexible processing on demand
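The SQS decoupling described above can be sketched with Python’s standard `queue` module standing in for SQS (the function names are invented; real code would use an SQS client’s send/receive/delete calls):

```python
from queue import Queue

# A queue decouples the REST API from slow message processing: the API
# enqueues and returns immediately, and workers drain the queue at
# their own pace. queue.Queue stands in for SQS here.
outbound = Queue()

def rest_api_send(message):
    outbound.put(message)      # fast path: no downstream work here
    return {"status": "accepted"}

def worker_drain(queue):
    delivered = []
    while not queue.empty():
        delivered.append(queue.get())   # SQS equivalent: ReceiveMessage
        queue.task_done()               # SQS equivalent: DeleteMessage
    return delivered

assert rest_api_send("hola")["status"] == "accepted"
assert rest_api_send("adios")["status"] == "accepted"
assert worker_drain(outbound) == ["hola", "adios"]
```

This is what “give clients a very fast response to the REST API” means in practice: the client’s request finishes as soon as the message is durably queued, not when it is delivered.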
47. Thank You
Read!
• aws.amazon.com/documentation
• aws.amazon.com/architecture
• aws.amazon.com/start-ups
Listen!
• aws.amazon.com/podcast
Ian Massingham
@IanMmmm
Technical Evangelist, Amazon Web Services
February 2014
Editor’s notes
Introduce yourself, who the crowd is, and our goal for today
Background intro: Scaling is a big topic, with lots of opinions, guides, how-tos, and 3rd parties. If you are new to scaling on AWS, you might ask yourself this question: “So how do I scale?”
We need some basics to lay the foundations we’ll need to build our knowledge of AWS on top of.
Next up we have Availability Zones; these are part of our regions and exist within them. There will be at a minimum two of these in every region, and generally speaking your infrastructure will live in one or more AZs inside a given region. We’ll be talking a lot about Multi-AZ architectures today, as it’s a core component of having a highly available, highly redundant, and highly durable infrastructure on AWS. https://aws.amazon.com/about-aws/globalinfrastructure
Last up we have our Edge locations, which are the homes of our CloudFront CDN and Route 53 DNS service. There are over 40 of these around the globe right now, and we’re constantly trying to add these in new places to help best meet our customers’ needs. We also sometimes have multiple edge locations in a single area to help handle the capacity, e.g. New York, which has 3 today. https://aws.amazon.com/about-aws/globalinfrastructure
Over 30 services as represented in groupings in the console
Over 30 services, the 25 major services as seen in the console
So let’s start from day one, user one, of our new infrastructure and application.
This here is the most basic set up you would need to serve up a web application. We have Route53 for DNS, an EC2 instance running our webapp and database, and an Elastic IP attached to the EC2 instance so Route53 can direct traffic to us. Now in scaling this infrastructure, the only real option we have is to get a bigger EC2 instance…
Scaling the one EC2 instance we have to a larger one is the simplest approach to start with. There are a lot of different AWS instance types to choose from depending on your workload. Some have high I/O, CPU, memory, or local storage. You can also make use of EBS-Optimized instances and Provisioned IOPS to help scale the storage for this instance quite a bit.
So while we could potentially reach a few hundred or a few thousand users supported by this single instance, it’s not a long-term play.
We also have to consider some other issues with this infrastructure: no failover, no redundancy, and too many eggs in one basket, since we have both the database and the webapp on the same instance.
The first thing we can do to address the issue of too many eggs in one basket, and to overcome the “no bigger boat” problem, is to split our webapp and database out into two instances. This gives us more flexibility in scaling these two things independently. And since we are breaking out the database, this is a great time to think about making use of a database service instead of managing this ourselves…
At AWS there are a lot of different options for running databases. One is to just install pretty much any database you can think of on an EC2 instance and manage all of it yourself. If you are really comfortable doing DBA-like activities, like backups, patching, security and tuning, this could be an option for you. If not, then we have a few options that we think are a better idea.

First is Amazon RDS, or Relational Database Service. With RDS you get a managed database instance of either MySQL, Oracle, or SQL Server, with features such as automated daily backups, simple scaling, patch management, snapshots and restores, high availability, and read replicas, depending on the engine you go with.

Next up we have DynamoDB, a NoSQL database built on top of SSDs. DynamoDB is based on the Dynamo whitepaper published by Amazon.com back in 2007, considered the grandfather of most modern NoSQL databases like Cassandra and Riak. The DynamoDB we have here at AWS is kind of like a cousin of the original paper. One of the key concepts of DynamoDB is what we call “zero administration”. With DynamoDB the only knobs to tweak are the reads and writes per second you want the DB to be able to perform at. You set it, and it will give you that capacity, with query responses averaging single-digit milliseconds. We’ve had customers with loads such as half a million reads and writes per second without DynamoDB even blinking.

Lastly we have Amazon Redshift, a multi-petabyte-scale data warehouse service. With Redshift, much like most AWS services, the idea is that you can start small and scale as you need to, while only paying for the scale you are at. What this means is that you can store 1 TB of data per year for less than a thousand dollars with Redshift. This is several times cheaper than most other data warehouse providers’ costs, and again, you can scale and grow as your business dictates without needing to sign an expensive contract upfront.
Given that we have all these different options, from running pretty much anything you want yourself, to making use of one of the database services AWS provides, how do you choose? How do you decide between SQL and NoSQL?
So for this scenario today, we’re going to go with RDS and MYSQL as our database engine.
Next up we need to address the lack of failover and redundancy in our infrastructure. We’re going to do this by adding in another webapp instance, and enabling the Multi-AZ feature of RDS, which will give us a standby instance in a different AZ from the Primary. We’re also going to replace our EIP with an Elastic Load Balancer to share the load between our two web instances
For those who aren’t familiar yet with ELB (Elastic Load Balancer), it is a highly scalable load balancing service that you can put in front of tiers of your application where you have multiple instances that you want to share load across. ELB is a really great service, in that it does a lot for you without you having to do much. It will create a self-healing, self-scaling LB that can do things such as SSL termination, handle sticky sessions, and have multiple listeners. It will also do health checks back to the instances behind it, and put a bunch of metrics into CloudWatch for you as well. This is a key service in building highly available infrastructures on AWS.
Read this slide.
Most of you will get to this point and be pretty well off, honestly. You can take this really pretty far for most web applications. We could scale this out over another AZ, maybe, or add in another tier of read replicas.
But it’s not that efficient in either performance or cost, and since those are important too, let’s clean up this infrastructure a bit.
The biggest things we can do, and these are incredibly important, are to lighten up some of the work our webapp is doing and to make life easier for our database. We can start by moving any static assets from our webapp instances to S3, and then serving those objects via CloudFront. We can also move things like session information, and any other temporary application data, to a memory-based cache like one supported by ElastiCache or DynamoDB. We can use this same cache to store some of our database query results, which will help keep us from hitting the database too much.
Talk about S3
Talk about CloudFront. Make sure to mention the two charts to the right. Static content will certainly speed up your site, but static & dynamic content is even better. The chart down below shows data from a real customer who went from very little traffic to a huge spike of over 60 gigabits per second, without having to do anything on their side or notify AWS at all.
Imagine for instance if you cached the search pages for highly requested queries. This could take load off your search, off your web application, your database, etc. So now we can see here that we’ve got CloudFront in front of both S3 and our ELB. Now that we’ve got that covered, lets move back to the session information, and database queries we can be caching as well.
Talk about DynamoDB. We could use it in this work flow to store session information from our web application.
Read slide
Read slide
Talk about auto-scaling.
Imagine a “typical” week of traffic to Amazon.com. This pattern might look a lot like your own traffic, with a peak during the middle of the day, and a low down in the middle of the night.
Given this pattern, it becomes really easy to do capacity planning. We can provision, say, 15% over what we see our normal peak to be, and be happy with the capacity we have for a while, so long as our traffic matches this pattern. But what if I told you that this was the first week of November for Amazon.com?
And this was the month of November! We can see a pretty big growth here at the end of the month with Black Friday and Cyber Monday sales in the US.
If we attempted to apply our “add 15% capacity for spikes” rule, we’d be in trouble for the month of November.
That’s a lot of potential wasted infrastructure and cost: 76% wasted potentially, while only 24% of it on average gets utilized over the month. Traditionally this is how IT did things: you bought servers based on a 6-12 month vision of what growth might be. So, since we can all agree this is bad, what is the solution?
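The waste figure in this note can be checked with simple arithmetic. A quick sketch (the peak value is an arbitrary example; the 24% average-utilization figure comes from the note itself):

```python
# Fixed provisioning sized for peak vs. what the workload actually uses.
peak = 100                            # capacity units needed at the monthly peak
provisioned = peak * 1.15             # the "peak + 15%" rule from the note
average_demand = provisioned * 0.24   # the note's 24% average utilization

# Whatever isn't used, on average, is paid-for waste.
wasted_fraction = 1 - average_demand / provisioned
assert abs(wasted_fraction - 0.76) < 1e-9   # matches the note's 76% figure
```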
Well what if we could map our capacity directly to our needs based on our end-users? We could make sure that at any point in time, we were only provisioning what we needed, vs some arbitrary capacity guess.
Read slide.
If we add in auto scaling, our caching layers (both inside and outside our infrastructure), and the read replicas with MySQL, we can now handle a pretty serious load. This could potentially even get us into the millions of users by itself if it continued to be scaled horizontally and vertically.