A behind-the-scenes look at key aspects of AWS infrastructure deployments. We cover some of the true differences between cloud infrastructure design and conventional enterprise infrastructure deployment, and why the cloud fundamentally changes application deployment speed and economics and provides more and better tools for delivering high-reliability applications. Few companies can afford to have a datacenter in every region in which they serve customers or have employees. Even fewer can afford multiple datacenters in each region where they have a presence. Fewer still can afford to invest in custom optimized network, server, storage, monitoring, cooling, and power distribution systems and software. We'll look more closely at these systems, how they work, how they are scaled, and the advantages they bring to customers.
3. Perspective on Scaling
On average, AWS adds enough
new server capacity every day
to support Amazon’s global
infrastructure when it was a
$7B business (2004).
10. “AWS is the overwhelming market
share leader, with more than five
times the compute capacity in
use than the aggregate total of
the other fourteen providers.”
12. Pace of Innovation
Infrastructure pace of
innovation increasing
– Driven by cloud service providers and
high-scale internet applications
– Cost of datacenter and H/W
infrastructure dominates
– Infrastructure more than just a cost
center
High focus on innovation
– Driving down cost
– Increasing aggregate reliability
– Reducing resource consumption
footprint
13. AWS Custom Server Designs
OEM Server Ecosystem
– Optimized for 10s to 100s of thousands of customers
– Broadly applicable servers can run a variety of workloads
Cloud Server Ecosystem
– Optimized for single customer
– Highly specialized servers optimized for specific workload
– Large scale deployments allow hardware specialization
– Move hot s/w kernels to hardware implementations
– Datacenters, servers, networking, storage all designed to an integrated spec.
14. AWS Custom Storage Designs
Commercial high-density storage:
• Quanta M4600H 4U Disk Enclosure
• Impressive best in class general purpose design
• We use custom design with still higher density
OEM storage & servers must target vast workload
diversity
High scale supports AWS-specific optimizations
– More space, power, & cost efficient
15. Networking Equipment
• Relative cost of networking
increasing quickly
• Profit margins high
• Ecosystem vertically
integrated
[Chart: Monthly Costs, with 3 year server & 10 year infrastructure amortization; labeled value: 8%]
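The amortization periods named on this slide can be illustrated with a little arithmetic. The sketch below assumes simple straight-line amortization and ignores the cost of money, which may differ from the model behind the chart. The capital figures are hypothetical and are not from the talk.

```python
def monthly_amortized_cost(capital_cost: float, years: int) -> float:
    """Straight-line amortization: spread a capital cost evenly over the period.

    Simplifying assumption: no interest or cost of money is included.
    """
    return capital_cost / (years * 12)


# Hypothetical capital costs, for illustration only (not from the talk).
SERVER_CAPEX = 3_600_000.0           # amortized over 3 years, per the slide
INFRASTRUCTURE_CAPEX = 12_000_000.0  # amortized over 10 years, per the slide

if __name__ == "__main__":
    servers = monthly_amortized_cost(SERVER_CAPEX, years=3)
    infra = monthly_amortized_cost(INFRASTRUCTURE_CAPEX, years=10)
    print(f"servers:        ${servers:,.0f} per month")
    print(f"infrastructure: ${infra:,.0f} per month")
```

With these made-up inputs, infrastructure that costs more than three times as much up front contributes the same monthly cost as the servers, because it is written off over a much longer period.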
16. Get the Network Out of the Way
Current Networks Over-Subscribed
Mainframe Model Goes Commodity
• Forces workload placement
restrictions
• Goal: Make all points in
datacenter equidistant
• Amazon custom routers &
protocol stacks
18. Procurement & Supply Chain Optimization
Procurement
Global demand allows
purchasing power at volume
Direct component purchasing
– Precise inventory control
– Better pricing
– Optimized designs
Supply Chain
Demand-driven supply chain
Shorter cycle time drives higher
utilization
– Predicting next week easier
than 4 to 6 months out
Less overbuy & less capacity risk
yielding lower costs
19. Utilization & Economics
Server Utilization Problem
On premise 30% utilization
VERY good & 10% to 20%
more common
Solution: Pool number of
heterogeneous services
Pay as You Grow
Don’t block the business
Don’t over-buy
Pay as You Go
Transfers capital expense
to variable expense
Apply capital for business
investments rather than
infrastructure
Chargeback Models Drive Good Behavior
Cost encourages prioritization
of work by application
developers
High scale needed to make a
spot market for low priority
work
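The utilization point on this slide (pooling heterogeneous services) can be made concrete with a small sketch. The load profiles, peak times, and the helper names `daily_profile`, `dedicated_utilization`, and `pooled_utilization` are all hypothetical, chosen only to show the effect. This is an illustration under those assumptions, not AWS data.

```python
def daily_profile(peak_start_hour, peak_hours=2, peak=100, base=10):
    """Hypothetical 24-hour load: `peak` servers during the peak window, else `base`."""
    return [
        peak if peak_start_hour <= hour < peak_start_hour + peak_hours else base
        for hour in range(24)
    ]


def dedicated_utilization(workloads):
    """Each workload gets its own servers, sized for its own peak."""
    capacity = sum(max(w) for w in workloads)
    combined = [sum(hour) for hour in zip(*workloads)]
    return sum(combined) / (len(combined) * capacity)


def pooled_utilization(workloads):
    """All workloads share one pool, sized for the peak of the combined load."""
    combined = [sum(hour) for hour in zip(*workloads)]
    capacity = max(combined)
    return sum(combined) / (len(combined) * capacity)


# Three hypothetical services whose peaks do not overlap.
workloads = [daily_profile(2), daily_profile(10), daily_profile(18)]

if __name__ == "__main__":
    print(f"dedicated: {dedicated_utilization(workloads):.1%}")
    print(f"pooled:    {pooled_utilization(workloads):.1%}")
```

In this toy case the dedicated fleets run at 17.5% average utilization, inside the 10% to 20% range the slide calls common, while the shared pool runs at 43.75%. The gain comes entirely from the peaks not coinciding, so it shrinks as workloads become more correlated.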
20. Amazon Cycle of Innovation
15+ years of
operational excellence
[Figure: cycle diagram with the stages Listen to Customers, Innovate, Improve Processes, Lower Costs, Reduce Prices, Re-invest in Features]
42 AWS price
reductions since 2006
21. AWS Pace of Innovation
New Service Announcements & Updates
[Bar chart by year, 2007 to 2014; values as extracted, in order: 9, 24, 48, 61, 82, 159, 280, 88]
23. Conventional Design: Cross-Region Replication
The 5th application availability “9” (99.999%) is achievable only via multi-datacenter replication
Conventional approach:
– Two datacenters in distant locations
– Replicate all data to both datacenters
The industry-wide dominant multi-DC availability approach
– Looks rock solid but performs remarkably poorly in practice
Acid Test: Are you willing to pull the plug on the primary server?
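To put the "nines" on this slide in perspective, the following sketch converts a number of nines into allowed downtime per year. The function names are illustrative, and an average year of 365.25 days is assumed.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def availability_for_nines(nines: int) -> float:
    """Availability as a fraction, e.g. 5 nines -> 0.99999."""
    return 1.0 - 10.0 ** (-nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year, in minutes, for a given number of nines."""
    return MINUTES_PER_YEAR * 10.0 ** (-nines)


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines ({availability_for_nines(n):.3%}): "
              f"{downtime_minutes_per_year(n):8.1f} min/year")
```

Five nines leaves a little over five minutes of downtime per year, far less time than it takes to decide on and carry out a manual fail-over.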
24. What is wrong with inter-regional replication?
Asynchronous replication between datacenters
– Committing to an SSD: on the order of 1 to 2 msec
– LA to New York: 74 msec round trip
On failure, a difficult & high skill decision:
– Fail-over & lose transactions, or
– Don’t fail-over & lose availability
I’ve been on these calls in the past
– No win situation
– Very hard to get right
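The latency figures on this slide explain why cross-country replication is asynchronous. The sketch below uses the slide's numbers and one simplifying assumption, that a synchronous commit costs the local commit time plus one network round trip. The function names are illustrative.

```python
SSD_COMMIT_MS = (1.0, 2.0)  # from the slide: local SSD commit, about 1 to 2 msec
LA_NY_RTT_MS = 74.0         # from the slide: LA to New York round trip


def sync_commit_ms(local_commit_ms: float, rtt_ms: float) -> float:
    """Latency of one commit when the primary waits for the remote copy.

    Simplifying assumption: total = local commit + one network round trip.
    """
    return local_commit_ms + rtt_ms


def slowdown(local_commit_ms: float, rtt_ms: float) -> float:
    """How many times slower a synchronous remote commit is than a local one."""
    return sync_commit_ms(local_commit_ms, rtt_ms) / local_commit_ms


if __name__ == "__main__":
    for local in SSD_COMMIT_MS:
        total = sync_commit_ms(local, LA_NY_RTT_MS)
        print(f"local {local:.0f} ms -> synchronous cross-country {total:.0f} ms "
              f"({slowdown(local, LA_NY_RTT_MS):.0f}x slower)")
```

Under this model a synchronous cross-country commit is roughly 38 to 75 times slower than a local one. Replicating asynchronously avoids that cost but leaves recently committed transactions at risk on fail-over, which is the no-win decision described above.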
25. What Else is Wrong with X-Country Replication?
Fragile: Active/Passive Doesn’t Work
– Failover to a system that hasn’t been taking operational load
– Passive secondary not recently tested
– Secondary config or S/W version different, incorrect load balancer config,
incorrect network ACLs, latent hardware problem, router problem,
resource shortage under load
– Can’t test without negative customer impact
– If you don’t test it, it won’t work
2-Way Redundancy Expensive:
– More than ½ capacity reserved to handle failure
– 3 datacenters much less expensive but impractical w/o high scale
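The redundancy cost comparison above follows from simple arithmetic. The sketch assumes load is spread evenly across N datacenters and that the survivors must carry the full peak load after losing any one site. It gives a lower bound, since real deployments add headroom on top. The function names are illustrative.

```python
from fractions import Fraction


def reserved_fraction(datacenters: int) -> Fraction:
    """Fraction of total capacity held in reserve to survive the loss of one site."""
    if datacenters < 2:
        raise ValueError("need at least 2 datacenters to survive a site loss")
    return Fraction(1, datacenters)


def provisioned_over_peak(datacenters: int) -> Fraction:
    """Total capacity to provision, as a multiple of peak load."""
    if datacenters < 2:
        raise ValueError("need at least 2 datacenters to survive a site loss")
    return Fraction(datacenters, datacenters - 1)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n} datacenters: reserve {reserved_fraction(n)} of capacity, "
              f"provision {float(provisioned_over_peak(n)):.2f}x peak load")
```

Two sites must provision at least twice the peak load, with half of it idle in reserve. Three sites need only 1.5 times the peak, which is why the three-datacenter design is much less expensive for anyone with the scale to run it.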
26. AWS Multi-Availability Zone Model
Choose Region to be close to user, close to data, or meeting jurisdictional
requirements
Synchronous replication to 2 (or better 3) Availability Zones
– Easy when less than 2 to 3 msec away
– Can failover w/o customer impact
ELB over EC2 instances in different AZs
Stateless EC2 apps easy
For persistent state use
– DynamoDB
– Simple Storage Service
– Multi-AZ RDS
27. New Research: Customers
Improve Availability by Migrating
Apps to AWS
32% reduction in total
application downtime
2013 AWS Customer Survey
Research Note: “Benchmarking availability and reliability
in the cloud: Amazon Web Services,” Nucleus Research,
November 2013, Document N168
28. Is Hosting On-premises Less Expensive?
Utilization fundamentally higher in cloud
– Aggregating non-correlated workloads,
scale, spot market
Amazon specific H/W designs
– ODM acquisition of custom servers & net
gear
– Direct purchasing of disk, memory, & CPU
– AWS controlled hypervisor & net protocol
layers
Deep R&D: Many new data centers built each
year
Immense scale
– Volume purchasing, highly automated,
specialists in all areas
Amazon margins are tiny compared to
enterprise margins
29. Summary
AWS Economics driven by scale & singular focus
– Economies of scale
– Increased availability through multiple-datacenter deployment
– Steadily declining price
Mega-scale advantages available to all customers regardless of size
– Datacenter presence near all customers world-wide
– Multiple datacenters in each region for high availability
– Deeper R&D investment & operational focus in datacenter, server, storage, &
networking than any IT organization in the world
– Buying power that rivals the biggest in the world
Cloud Model Fundamentally different from the last 30 years
– Even if rebranded as “cloud enabled”, “private cloud”, “cloud-like”