This session dives deep into techniques used by successful customers who optimized their use of AWS. Learn tricks and hear tips you can implement right away to reduce waste, choose the most efficient instance, and fine-tune your spending, often with improved performance and a better end-customer experience. We showcase innovative approaches and demonstrate easily applicable methods for cost optimizing Amazon EC2, Amazon S3, and a host of other services to save you time and money.
2. Introductions and Outline
• Tom Johnston (AWS)
Reducing Cost and Spending Smart
• Sean Simpson (Stitcher)
Moving to AWS – A Story
• Kingsley Wood (AWS)
Maximizing Efficiency and Cost Optimization
• Ashay Padwal (vServ.mobi)
A Spot Case Study
8. Instance types
Start
• Choose an instance that best meets your basic requirements
• Match memory & virtual cores
Tune
• Change instance size up or down based upon monitoring
• Use CloudWatch & Trusted Advisor to assess
9. Know your usage
[Diagram: an instance PUTs metrics (free memory, free CPU, free HDD, … and custom metrics) to Amazon CloudWatch at 1-minute intervals; CloudWatch is labeled "2 weeks" and feeds an alarm.]
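The flow in this diagram (an instance reporting its own usage figures once a minute) might be sketched as follows. This is a minimal sketch, not the speakers' tooling: the namespace `Custom/Usage`, the metric name `FreeDiskBytes`, the function name, and the placeholder instance ID are all assumptions made for illustration. The code only builds the payload, so it runs without AWS access; the trailing comment shows where boto3's `put_metric_data` call would take it.

```python
import shutil
from datetime import datetime, timezone

NAMESPACE = "Custom/Usage"  # hypothetical namespace, not from the slides


def build_free_disk_metric(instance_id, path="/"):
    """Build one CloudWatch-style metric datum for free disk space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "MetricName": "FreeDiskBytes",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(usage.free),
        "Unit": "Bytes",
    }


datum = build_free_disk_metric("i-0123456789abcdef0")  # placeholder ID
print(NAMESPACE, datum["MetricName"], datum["Value"])

# To publish (assumes boto3 is installed and AWS credentials are configured):
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace=NAMESPACE, MetricData=[datum])
```

Running this from cron or a small loop every 60 seconds would match the 1-minute interval on the slide.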
11. Instance types
Start
• Choose an instance that best meets your basic requirements
• Match memory & virtual cores
Tune
• Change instance size up or down based upon monitoring
• Use CloudWatch & Trusted Advisor to assess
Roll-Out
• Run multiple instances in multiple Availability Zones
13. Choose your metric
optimize for the metric
Cost per unit of work per instance (size)
• Workload A: optimal on 4x m1.xlarge
• Workload B: optimal on 10x m1.medium
• Workload C: optimal on 2x m3.xxlarge
14. Choose your metric
optimize for the metric
Cost per unit of work per instance (size)
100 concurrent jobs on 10 x m1.large @ $0.26 / hr = $ 0.026 / job
vs
300 concurrent jobs on 10 x m3.xlarge @ $0.58 / hr = $ 0.019 / job
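The arithmetic behind this comparison can be checked with a few lines of Python. The instance counts, prices, and job counts are taken from the slide; the function name is only illustrative, and "per job" here means per concurrent job per hour.

```python
def cost_per_job(instances, hourly_price, concurrent_jobs):
    """Hourly fleet cost divided by the jobs the fleet runs concurrently."""
    return instances * hourly_price / concurrent_jobs


m1_large = cost_per_job(instances=10, hourly_price=0.26, concurrent_jobs=100)
m3_xlarge = cost_per_job(instances=10, hourly_price=0.58, concurrent_jobs=300)

print(f"m1.large:  ${m1_large:.3f} per job")
print(f"m3.xlarge: ${m3_xlarge:.3f} per job")
```

The larger instance costs more per hour but less per job, which is why the slide optimizes for the metric rather than for the hourly price.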
17-23. Server Load
[Figure, built up across seven slides: server load by hour of day (x-axis 0 to 24), with "Capacity of 1 Server" and "Traditional capacity required" marked. Successive builds add "1 Server for 8 hours" blocks (four by slide 22); the final build labels the result "1/3rd Saving".]
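The comparison in this figure can be sketched numerically. The slides give no data for the load curve, so the hourly values below are hypothetical, as are the variable names; the point is only the method: provisioning for the peak all day versus running just enough servers each hour.

```python
import math

# Hypothetical load for each hour of the day (0-23), expressed in units of
# "capacity of 1 server". These values are NOT from the slides.
hourly_load = [
    0.6, 0.5, 0.5, 0.6, 0.8, 1.2, 1.8, 2.4, 2.9, 3.4, 3.8, 3.9,
    3.7, 3.5, 3.2, 3.0, 2.8, 2.6, 2.2, 1.9, 1.6, 1.2, 0.9, 0.7,
]

# Traditional approach: enough servers for the peak, running all 24 hours.
peak_servers = math.ceil(max(hourly_load))
traditional_server_hours = peak_servers * len(hourly_load)

# Elastic approach: each hour, run only as many servers as that hour needs.
elastic_server_hours = sum(math.ceil(load) for load in hourly_load)

saving = 1 - elastic_server_hours / traditional_server_hours
print(f"traditional: {traditional_server_hours} server-hours")
print(f"elastic:     {elastic_server_hours} server-hours")
print(f"saving:      {saving:.0%}")
```

With a different load shape the saving changes; the one-third figure on the slide belongs to the presenters' curve, not to this one.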
36. Reserved instances
On-demand instances
• Unix/Linux instances start at $0.02/hour
• Pay as you go for compute power
• Low cost and flexibility
• Pay only for what you use, no up-front commitments or long-term contracts
• Use cases: applications with short-term, spiky, or unpredictable workloads; application development or testing
Reserved instances
• 1- or 3-year terms
• Pay low up-front fee, receive significant hourly discount
• Low cost / predictability
• Helps ensure compute capacity is available when needed
• Use cases: applications with steady-state or predictable usage; applications that require reserved capacity, including disaster recovery
Heavy utilization RI
• > 80% utilization
• Lower costs up to 58%
• Use cases: databases, large-scale HPC, always-on infrastructure, baseline
Medium utilization RI
• 41-79% utilization
• Lower costs up to 49%
• Use cases: web applications, many heavy processing tasks, running much of the time
Light utilization RI
• 15-40% utilization
• Lower costs up to 34%
• Use cases: disaster recovery, weekly/monthly reporting, Elastic MapReduce
37-38. Best RI for Utilization
[Chart shown on two consecutive slides: cost (y-axis, $0 to $18,000) compared across On-Demand and Heavy, Medium, and Light utilization RIs.]
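One way to reason about which option fits a given utilization is to model yearly cost as an up-front fee plus hourly charges and pick the minimum. This is a minimal sketch under stated assumptions: every price below is a hypothetical placeholder (not an AWS list price), and the model assumes that Light and Medium RIs bill the hourly rate only for hours used while a Heavy RI bills it for every hour of the term.

```python
HOURS_PER_YEAR = 8760

# All prices are hypothetical placeholders chosen for illustration.
PLANS = {
    # name: (up-front fee, hourly rate, billed for every hour of the term?)
    "on-demand": (0.0, 0.26, False),
    "light": (150.0, 0.15, False),
    "medium": (360.0, 0.09, False),
    "heavy": (440.0, 0.063, True),
}


def annual_cost(plan, utilization):
    """Yearly cost of one instance that runs `utilization` (0..1) of the time."""
    upfront, hourly, always_billed = PLANS[plan]
    billed_hours = HOURS_PER_YEAR * (1.0 if always_billed else utilization)
    return upfront + hourly * billed_hours


def cheapest_plan(utilization):
    """Name of the plan with the lowest yearly cost at this utilization."""
    return min(PLANS, key=lambda plan: annual_cost(plan, utilization))


for u in (0.10, 0.30, 0.60, 0.95):
    print(f"{u:.0%} utilization -> {cheapest_plan(u)}")
```

Substituting real prices for the instance type and region in question gives the actual break-even points.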
39. Optimizing costs with RIs
[Chart: series for On Demand, Light Utilization RI, Medium Utilization RI, and Heavy Utilization RI; y-axis 0 to 14, x-axis 1 to 24.]
40. Spot instances
On-demand instances
• Unix/Linux instances start at $0.02/hour
• Pay as you go for compute power
• Low cost and flexibility
• Pay only for what you use, no up-front commitments or long-term contracts
• Use cases: applications with short-term, spiky, or unpredictable workloads; application development or testing
Reserved instances
• 1- or 3-year terms
• Pay low up-front fee, receive significant hourly discount
• Low cost / predictability
• Helps ensure compute capacity is available when needed
• Use cases: applications with steady-state or predictable usage; applications that require reserved capacity, including disaster recovery
Spot instances
• Bid on unused EC2 capacity
• Spot Price based on supply/demand, determined automatically
• Cost / large-scale, dynamic workload handling
• Use cases: applications with flexible start and end times; applications only feasible at very low compute prices
41. Governance Matters
• Who can create and launch instances?
• Who checks that only needed instances are
running?
• Have specific policies
• Use AWS tools such as IAM to help enforce
them
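As one illustration of enforcing a launch policy with IAM, the sketch below builds a policy document that denies `ec2:RunInstances` for any instance type outside an approved list. It is a sketch under assumptions, not a policy from the talk: the allow-list, the statement ID, and the function name are hypothetical, and a real policy should be checked against current IAM documentation before use.

```python
import json

ALLOWED_TYPES = ["m1.medium", "m1.large"]  # hypothetical allow-list


def instance_type_policy(allowed_types):
    """Build an IAM policy document denying launches of unapproved types."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnapprovedInstanceTypes",  # hypothetical name
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringNotEquals": {
                        "ec2:InstanceType": list(allowed_types)
                    }
                },
            }
        ],
    }


print(json.dumps(instance_type_policy(ALLOWED_TYPES), indent=2))
```

Attaching a document like this to a group answers the "who can launch what?" question in policy rather than by convention.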
42. Checklist
• Identify your goals
• Understand your workload & match to instances
• Scale up and down with demand
• Align purchasing methods & utilization
• Have governance appropriate to your goals
• Change in goals & workload will drive change in use of AWS
44. Moving to AWS – A Story
Sean Simpson
Director of Operations - Stitcher, Inc.
45. What is Stitcher?
• Stitcher is to news and talk radio what Pandora
is to music
• Stitcher is a content aggregator
• Stitcher is an on-demand service
• Stitcher is deployed on mobile, CE, and
automotive platforms
46. Stitcher by the Numbers
• 12 million downloads
• 20,000+ shows
• Over 1 million hours of listening weekly
• Over 100 TB outbound data monthly
47. With Growth Comes Pain
• DRBD database locked us into hardware
• Sublease of colocation facility restricted our
access to our servers
• Server leases and purchases constrained our
architecture
• Growth inhibited by human, server, and vendor
resources
48. What options did we consider?
• Move to another colocation facility
• Move to a cloud provider
• Move to a hybrid colocation/cloud provider
49. Why we chose Amazon Web Services
• Familiarity
– Already using Amazon Simple Storage Service for our RSS
feeds
– Already experimenting with Amazon Elastic Compute Cloud
– Recently implemented Amazon Simple Queue Service
50. Why we chose Amazon Web Services
• Flexibility / Scalability
– Ability to adjust resources quickly in our production environment
– Ability to create any number of environments
– Ability to design servers as we wanted with respect to operating
systems, systems software, etc.
51. Why we chose Amazon Web Services
• Cost
– Cost matches usage
– Bandwidth savings when using Amazon CloudFront as our CDN
– Many resources to assist in optimization
– Put simply, we got our solution for the lowest quote
52. Why we chose Amazon Web Services
• Documentation & Customer Service
– Knowledgeable solutions architects
– “Right-level” documentation
– Quick response to our needs
53. Architecting Change
• Ask yourself: What are we trying to achieve?
• Know yourself, know your systems
• Consider industry best practices (but don’t
blindly follow them)
• Read the documentation
54. Use Puppet or Chef
• Configuration management tools are both
enabling and liberating
• Build, destroy, and build again
• Write once, build many
• Nuances between node types are managed with
clearly written rules
• Naming conventions are your friend
62. How we save money
• Reserved instances
• Appropriate instance types
• CloudFront CDN
• Rapid reorganization using the API
• Monitor utilization
• Load test
• Housecleaning
63. On Deck Cost Savings
• Spot instances for processing tasks
• Auto Scaling
• In-app optimizations
• Instance type tuning
64. Parting Advice
• Architect for 10X
• Take the time to get it right the first time (or at
least, close enough)
• Plan on continuous evolution of systems
67. OFFLOAD all static content
• reduce your compute demand and costs
• improve end-user experience
• increase reliability and durability
69. ENTIRE SITE via CloudFront
• minimize client-server chatter (keep it at the edge)
• reduce server-database traffic (cache the common calls)
• speed up mobile app response (persistent connections)
70. Real World Example
Standard Setup
• 4 x Medium Instances: $485
• AWS Data Transfer 1 TB: $194
• Total = $679
Optimized
• 1 x Medium Instance: $121
• CloudFront Data 1 TB: $168
• CloudFront Requests: $1.89
• Total = $291
57% Lower Cost + 6X Faster
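The totals and the 57% figure follow directly from the line items on this slide, as this short check shows. The dollar amounts are the slide's; the variable names are only illustrative.

```python
standard = {
    "4 x Medium Instances": 485.00,
    "AWS Data Transfer 1 TB": 194.00,
}
optimized = {
    "1 x Medium Instance": 121.00,
    "CloudFront Data 1 TB": 168.00,
    "CloudFront Requests": 1.89,
}

standard_total = sum(standard.values())
optimized_total = sum(optimized.values())
saving = 1 - optimized_total / standard_total

print(f"standard:  ${standard_total:.2f}")
print(f"optimized: ${optimized_total:.2f}")
print(f"saving:    {saving:.0%}")
```

Most of the saving comes from dropping three of the four instances; the CloudFront charges replace, and slightly undercut, the direct data transfer cost.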
75. Offloading Tips
• Leverage S3, CloudFront, Route 53
• Eliminate repeated calls (edge and data cache)
• Static website hosting on S3
– No web server at all!
• Minimize your EC2 and database footprint
– Stand up Read Replicas for variable loads
76. Utilization and Auto-Scaling: Granularity
More small instances vs. fewer large instances
• 29 Large @ $0.32/hr = $9.28
• 59 Small @ $0.08/hr = $4.72
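Why finer granularity can cost less is easiest to see by rounding demand up to whole instances at each size. This is a minimal sketch: the hourly prices are the slide's, but the demand series is hypothetical and the 4:1 capacity ratio between Large and Small is an assumption (it mirrors the 4:1 price ratio).

```python
import math

SMALL_PRICE = 0.08   # $/hr, from the slide
LARGE_PRICE = 0.32   # $/hr, from the slide
SMALL_PER_LARGE = 4  # assumed capacity ratio, not stated on the slide

# Hypothetical demand per hour, in units of "one Small instance's capacity".
demand = [3, 5, 9, 14, 10, 6, 2, 1]

# With Small instances, capacity can track demand exactly.
small_hours = sum(demand)
# With Large instances, each hour is rounded up to a whole Large.
large_hours = sum(math.ceil(d / SMALL_PER_LARGE) for d in demand)

small_cost = small_hours * SMALL_PRICE
large_cost = large_hours * LARGE_PRICE

print(f"Small: {small_hours} instance-hours = ${small_cost:.2f}")
print(f"Large: {large_hours} instance-hours = ${large_cost:.2f}")
```

The gap is the capacity paid for but unused each time demand is rounded up to the next Large instance.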
77. Utilization – Trigger Actions by Event
Leverage CloudWatch to collect and measure metrics
79. The Straits Times Mobile App
REAL-TIME reaction response
• notification of pending News Flash (with audible alarm)
• on-demand ramp up of capacity (6 mins)
• subscriber alert push delivered
• mass response traffic handled (followed by ramp down)
80. Architecture
Amazon Web Services provides services and
infrastructure to build reliable, fault-tolerant, and
highly available systems in the cloud.
These qualities have been designed into our services
both by handling such aspects without any special
action by you and by providing features that must be
used explicitly and correctly.
91. What are Spot Instances?
• Value
– Pricing: up to 92% discount
– Elastic: capacity not otherwise available
– Minimum commitment: commit to 1 hour
• Tradeoff
– Potential for interruption
92. Key Points about Spot
• Spare capacity – supply and demand
• Be prepared for no availability at times
• Be willing to accept and deal with interruption
• Far greater potential scale
– starting at 5X default instance limits
• Massive possible capacity = new ideas…
93. Consider 2 Time-to-Value Scenarios
1) Value of results quickly diminishes
e.g., Engineering simulations
2) Value of result stable until deadline
e.g., Analytics before an M&A deal
94. Spot Applications
Ideal Applications
Batch Processing
Time-Delayable
Fault-Tolerant or Restartable
Compute-Intensive
Horizontally Scalable
Stateless Worker Nodes
Region and AZ Independent
Uses Deployment Automation
Less Ideal Applications
Interactive
Strict/Tight SLA for Completion
Expensive to Handle Terminations
Data-Intensive
In-Memory Scaling
Long-Running Worker Nodes
Requires a Single AZ
Manually Launched and Managed
95. Spot Advice and Tips
• Don’t build your reliability ENTIRELY on Spot
– vServ.mobi – exceptional and smart architecture
• With time flexibility, different approaches:
– delayed results, lower cost
– spend less, quicker answers
• Ask different questions:
– with enormous capacity, what is now possible?
96. Look at the World Differently
• Order of magnitude more capacity
• New experiments enabled = innovation!
• Lucky Oyster – recommendation exchange
• Prototyping a new search technology idea (using Common Crawl)
• 3.4 billion web pages > 1 TB of data > Index of 400 million entities
• “The cost? About $100... in about 14 hours”
97. A Spot Case Study
Ashay Padwal
CoFounder & CTO – vServ.mobi
100. 31 Bn Ad Requests / Month
• 11% Europe
• 11% Rest of Asia
• 7% North America
• 33% India
• 10% South America
• 14% Middle East & Africa
• 14% SE Asia
Over 200 Mn Unique Users / Month
101. Infrastructure: Requirements & Challenges
1) Requirement: Self Serve for Publisher On-boarding & Exit
Challenge: No Capacity Planning; Extreme Scalability
2) Requirement: Start Up
Challenge: No Capex, no Lock-in
3) Requirement: Least Latency & High Availability
Challenge: Suite of services – Compute, Load Balancing, DNS, CDN, Storage, Multiple DCs per location
4) Requirement: Global Setup management with small team
Challenge: Availability across Regions with extensive APIs
102. Infrastructure: Solution
1) AWS
2) AWS
3) EC2 & ELB – Multi-AZ; Route 53, CloudFront, S3
4) US East, US West, Europe, South America, Asia
For Middle East, we host in Turkey
For Africa, we host in South Africa
105. Now What? Reduce Cost without impacting Performance
• AWS is pretty cost-effective. But we were greedy!
• Saving more meant more money for other areas in our
business.
• We walked in the opposite direction... and it worked!
• We use spot instances in production extensively.
• Sounds risky? - Yes, but if you architect your system
correctly, you should be safe.
106. What we did
1) Selected the right instance type
– Use CloudWatch for CPU & memory usage
– Load test
2) Designed our servers to be self-sufficient and perishable
– Business logic & DB on same server
– Transaction logs written to EBS
– Auto setup on server
– Data collection module
3) We built a custom scaling solution
– Add/remove instances by checking present traffic & predicting traffic in the immediate future
– Based on trending of Spot prices, either try launching Spot or fall back to on-demand instances
– Remove servers if in use between 45-55 min
– Track Spot prices to shift to on-demand
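Two of the scaling rules on this slide can be sketched in a few lines. This is not vServ.mobi's implementation: the function names, the 80% headroom factor, and the two-point price extrapolation are hypothetical, and the sketch reads "in use between 45-55 min" as minutes into the instance's current billing hour, which is an interpretation rather than something the slide states.

```python
from datetime import datetime, timedelta


def minutes_into_billing_hour(launch_time, now):
    """Minutes elapsed in the instance's current hour since launch."""
    elapsed_minutes = (now - launch_time).total_seconds() / 60
    return elapsed_minutes % 60


def safe_to_remove(launch_time, now, window=(45, 55)):
    """True when the instance is inside the 45-55 minute removal window."""
    low, high = window
    return low <= minutes_into_billing_hour(launch_time, now) <= high


def choose_market(recent_spot_prices, on_demand_price, headroom=0.8):
    """Pick 'spot' only while the projected price stays well below on-demand.

    The projection is a naive extrapolation from the last two observations;
    `headroom` is a hypothetical safety factor.
    """
    if not recent_spot_prices:
        return "on-demand"
    latest = recent_spot_prices[-1]
    previous = recent_spot_prices[-2] if len(recent_spot_prices) > 1 else latest
    projected = latest + (latest - previous)
    return "spot" if projected < on_demand_price * headroom else "on-demand"


launch = datetime(2013, 11, 13, 10, 0)
print(safe_to_remove(launch, launch + timedelta(minutes=50)))
print(choose_market([0.030, 0.035], on_demand_price=0.26))
```

Removing instances late in the hour uses up time that has already been paid for, and the price check gives an early exit to on-demand before the Spot price catches up with it.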
107. What AWS did
1) Reduced pricing for EC2 (On Demand & Reserved) and S3
2) Cheap Archival System - Glacier
3) Pre warming of Load Balancer (ELB)
4) AMI movement across regions
5) ELB with equal distribution of traffic across instances spread in any Availability Zone
109. Closing – Key Takeaways
• Re-evaluate, revisit and re:Invent
– Evolve along with AWS
• Leverage
– Managed Services, CloudWatch
• Stay up to date
– RI modifications, Trusted Advisor
• AWS Blog: aws.typepad.com
110. Please give us your feedback on this presentation
CPN211
As a thank you, we will select prize winners daily for completed surveys!