Reducing Cost and Maximizing Efficiency on AWS

CPN211 - Reducing Cost and Maximizing
Efficiency: Tightening the Belt on AWS
Tom Johnston - Business Development Manager, Amazon Web Services
Sean Simpson - Director of Operations, Stitcher, Inc.
Kingsley Wood - Business Development Manager, Amazon Web Services
Ashay Padwal - CTO, Vserv.mobi

November 15, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Introductions and Outline
• Tom Johnston (AWS)
Reducing Cost and Spending Smart
• Sean Simpson (Stitcher)
Moving to AWS – A Story
• Kingsley Wood (AWS)
Maximizing Efficiency and Cost Optimization
• Ashay Padwal (vServ.mobi)
A Spot Case Study

Reducing Cost and Spending Smart
Tom Johnston – Business Development Manager, AWS

Fundamentals
• Explicit Objectives
• Match Instances with Workloads
• Match Scale & Use with Demand
• Match Purchasing with Utilization
• Governance Matters

Objectives

AWS provides you the ability to
match your architecture to your
objectives

Instance types

Start
Choose an instance
that best meets your
basic requirements
Match memory & virtual
cores

Instance types

Tune
Change instance size up or down based upon monitoring
Use CloudWatch & Trusted Advisor to assess
Know your usage
[Diagram: an Instance PUTs metrics (Free Memory, Free CPU, Free HDD, …, Custom Metrics, …) at 1-min intervals to Amazon CloudWatch (2 weeks), which feeds an Alarm]
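As a rough illustration of turning monitored utilization into a sizing decision, the sketch below classifies a series of CPU utilization samples. The function name `sizing_hint` and the 20% / 80% thresholds are hypothetical choices for this example, not values from the talk or from CloudWatch.

```python
from statistics import mean

def sizing_hint(cpu_percent_samples, low=20.0, high=80.0):
    """Suggest a size change from CPU utilization samples (0-100).

    The low/high thresholds are illustrative assumptions only.
    """
    average = mean(cpu_percent_samples)
    peak = max(cpu_percent_samples)
    if peak < low:
        # Even the busiest sample is far below capacity.
        return "consider a smaller instance"
    if average > high:
        # The instance is busy most of the time.
        return "consider a larger instance"
    return "keep current size"

print(sizing_hint([12, 8, 15, 10]))
print(sizing_hint([85, 92, 88, 95]))
```

In practice the samples would come from whatever monitoring data is collected; the same idea applies to memory or disk metrics.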

[Chart: instance families plotted by Memory (GB) against Processing Ability, from less to more of each: High-Mem Cluster Compute, High Storage, High I/O, High Mem, Cluster Compute, M3, C3, M1, High-CPU]

Instance types

Start
Choose an instance that best meets your basic requirements
Match memory & virtual cores

Tune
Change instance size up or down based upon monitoring
Use CloudWatch & Trusted Advisor to assess

Roll-Out
Run multiple instances in multiple Availability Zones
Choose your metric, optimize for the metric

Choose your metric
Cost per unit of work per instance (size)
• Workload A: Optimal on 4x m1.xlarge
• Workload B: Optimal on 10x m1.medium
• Workload C: Optimal on 2x m3.xxlarge

Choose your metric
Cost per unit of work per instance (size)

100 concurrent jobs on 10 x m1.large @ $0.26 / hr = $ 0.026 / job
vs
300 concurrent jobs on 10 x m3.xlarge @ $0.58 / hr = $ 0.019 / job
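The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures on the slide; the helper name `cost_per_job` is made up for the example.

```python
def cost_per_job(instances: int, hourly_rate: float, concurrent_jobs: int) -> float:
    """Hourly fleet cost divided by the jobs the fleet runs concurrently."""
    return instances * hourly_rate / concurrent_jobs

# Figures taken from the slide.
m1_large = cost_per_job(instances=10, hourly_rate=0.26, concurrent_jobs=100)
m3_xlarge = cost_per_job(instances=10, hourly_rate=0.58, concurrent_jobs=300)

print(f"m1.large:  ${m1_large:.3f} per job")
print(f"m3.xlarge: ${m3_xlarge:.3f} per job")
```

The larger instance costs more per hour but less per job, because it packs three times the work into roughly twice the hourly spend.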

Choose your metric
Think workload density
Don’t just focus on instance hourly rate

[Chart sequence: Server Load by Hour of day (0-24). Annotations, in order of appearance: Capacity of 1 Server; Traditional capacity required; 1 Server for 8 hours; 1/3rd Saving]

[Chart sequence: Instance Count (0-6) by Day of Month (0-30). Annotations, in order of appearance: Monthly predictable peak processing; Elastic Capacity; 75% Savings]
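To make the elasticity argument concrete, the sketch below compares provisioning for peak all month against provisioning only what each day needs. The daily profile is hypothetical (it is not the data behind the chart) and was chosen only so that the peak matches the chart's 0-6 axis.

```python
def instance_days(profile):
    """Total instance-days for a list of daily instance counts."""
    return sum(profile)

# Hypothetical month: 1 instance on most days, 6 during a 3-day peak.
demand = [1] * 27 + [6] * 3

fixed = max(demand) * len(demand)   # provision for the peak every day
elastic = instance_days(demand)     # provision only what each day needs
savings = 1 - elastic / fixed

print(f"fixed: {fixed} instance-days, elastic: {elastic}, savings: {savings:.0%}")
```

The shorter and taller the peak is relative to the baseline, the larger the saving from scaling down between peaks.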

Reserved instances

On-demand instances
Unix/Linux instances start at
$0.02/hour
Pay as you go for compute power
Low cost and flexibility

Pay only for what you use, no up-front
commitments or long-term contracts
Use Cases:
Applications with short term, spiky, or
unpredictable workloads;
Application development or testing

Reserved instances
1- or 3-year terms
Pay low up-front fee, receive significant hourly discount
Low Cost / Predictability
Helps ensure compute capacity is available when needed
Use Cases:
Applications with steady state or predictable usage
Applications that require reserved capacity, including disaster recovery

Reserved instances

Heavy utilization RI
> 80% utilization
Up to 58% Savings
Use Cases: Databases, Large Scale HPC, Always-on infrastructure, Baseline

Medium utilization RI
41-79% utilization
Up to 49% Savings
Use Cases: Web applications, many heavy processing tasks, running much of the time

Light utilization RI
15-40% utilization
Up to 34% Savings
Use Cases: Disaster Recovery, Weekly / Monthly reporting, Elastic Map Reduce
[Chart: Best RI for Utilization. Cost ($0 to $18,000) of Heavy, Medium, and Light utilization RIs compared with On-Demand]
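One way to reason about which purchasing option fits a given utilization is to compare total annual cost. The sketch below does that under stated assumptions: every price in `OPTIONS` is hypothetical, a 1-year term is assumed, and the Heavy option is modeled as billed for every hour of the term while the others are billed only for hours actually run.

```python
HOURS_PER_YEAR = 8760

# name: (upfront fee, hourly rate, billed for every hour of the term?)
# All prices are hypothetical placeholders, not AWS list prices.
OPTIONS = {
    "on-demand": (0.0, 0.10, False),
    "light RI": (60.0, 0.06, False),
    "medium RI": (140.0, 0.04, False),
    "heavy RI": (170.0, 0.028, True),
}

def annual_cost(option, utilization):
    """Upfront fee plus hourly charges for one instance over a year."""
    upfront, hourly, always_billed = OPTIONS[option]
    billed_hours = HOURS_PER_YEAR * (1.0 if always_billed else utilization)
    return upfront + hourly * billed_hours

def cheapest(utilization):
    """Name of the lowest-cost option at a utilization between 0 and 1."""
    return min(OPTIONS, key=lambda name: annual_cost(name, utilization))

for u in (0.1, 0.3, 0.6, 0.9):
    print(f"{u:.0%} utilization -> {cheapest(u)}")
```

With real prices substituted in, the crossover points between options show where each tier starts to pay off.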

[Chart: Optimizing costs with RIs. Instance count (0-14) across an axis of 1-24, layered as Medium Utilization RI, Light Utilization RI, and On Demand]

Spot instances
Bid on unused EC2 capacity
Spot Price based on supply/demand, determined automatically
Cost / Large Scale, dynamic workload handling
Use Cases:
Applications with flexible start and end times
Applications only feasible at very low compute prices


Governance Matters
• Who can create and launch instances?
• Who checks that only needed instances are
running?
• Have specific policies
• Use AWS tools such as IAM to help enforce
them

Checklist
• Identify your goals
• Understand your workload & match to instances
• Scale up and down with demand
• Align purchasing methods & utilization
• Have governance appropriate to your goals
• Change in goals & workload will drive change in use of AWS

Moving to AWS – A Story
Sean Simpson
Director of Operations - Stitcher, Inc.

What is Stitcher?
• Stitcher is to news and talk radio what Pandora
is to music
• Stitcher is a content aggregator
• Stitcher is an on-demand service
• Stitcher is deployed on mobile, CE, and
automotive platforms

Stitcher by the Numbers
• 12 million downloads
• 20,000+ shows
• Over 1 million hours of listening weekly
• Over 100 TB outbound data monthly

With Growth Comes Pain
• DRBD database locked us into hardware
• Sublease of colocation facility restricted our
access to our servers
• Server leases and purchases constrained our
architecture
• Growth inhibited by human, server, and vendor
resources

What options did we consider?
• Move to another colocation facility
• Move to a cloud provider
• Move to a hybrid colocation/cloud provider

Why we chose Amazon Web Services
• Familiarity
– Already using Amazon Simple Storage Service for our RSS
feeds
– Already experimenting with Amazon Elastic Compute Cloud
– Recently implemented Amazon Simple Queue Service

• Flexibility / Scalability
– Ability to adjust resources quickly in our production environment
– Ability to create any number of environments
– Ability to design servers as we wanted with respect to operating
systems, systems software, etc.

• Cost
– Cost matches usage
– Bandwidth savings when using Amazon CloudFront as our CDN
– Many resources to assist in optimization
– Put simply, we got our solution for the lowest quote

• Documentation & Customer Service
– Knowledgeable solutions architects
– “Right-level” documentation
– Quick response to our needs

Architecting Change
• Ask yourself: What are we trying to achieve?
• Know yourself, know your systems
• Consider industry best practices (but don’t
blindly follow them)
• Read the documentation

Use Puppet or Chef
• Configuration management tools are both
enabling and liberating
• Build, destroy, and build again
• Write once, build many
• Nuances between node types are managed with
clearly written rules
• Naming conventions are your friend

Looks nice, but what does it do?
• High Availability
• Scalability
• Security
• Performance
• Cost effectiveness

The Results – Database connections/sec
Before: 225
After: 450

The Results – GetStationPlaylist()
Before: 0.75
After: 0.1

The Results – Maximum throughput
Before: 5000
After: 20000

The Results – Downtime
Before: 1200
After: 15

Cost Optimization Results
• Twice the results for the same money

How we save money
• Reserved instances
• Appropriate instance types
• CloudFront CDN
• Rapid reorganization using the API
• Monitor utilization
• Load test
• Housecleaning

On Deck Cost Savings
• Spot instances for processing tasks
• Auto Scaling
• In-app optimizations
• Instance type tuning

Parting Advice
• Architect for 10X
• Take the time to get it right the first time (or at
least, close enough)
• Plan on continuous evolution of systems

Maximizing Efficiency and Cost Optimization
Kingsley Wood – Business Development Manager, AWS

Considerations
• Offloading – reduce footprint
• Utilization – your biggest lever
• Managed Services – leverage RDS, SQS, SES
• Consolidated Billing – pooling resources
• Flexible Evolution – continually revisit
• Spot Instances – think big, new possibilities

OFFLOAD all static content
• reduce your compute demand and costs
• improve end-user experience
• increase reliability and durability

+

ENTIRE SITE via CloudFront
• minimize client-server chatter (keep it at the edge)
• reduce server-database traffic (cache the common calls)
• speed up mobile app response (persistent connections)

+

Real World Example

Standard Setup
• 4 x Medium Instances: $485
• AWS Data Transfer 1 TB: $194
• Total = $679

Optimized
• 1 x Medium Instance: $121
• CloudFront Data 1 TB: $168
• CloudFront Requests: $1.89
• Total = $291

57% Lower Cost + 6X Faster
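The totals and the 57% figure in this example follow directly from the line items. A minimal check, using only the numbers on the slide:

```python
# Line items from the slide, in dollars.
standard = {
    "4 x Medium Instances": 485.00,
    "AWS Data Transfer 1 TB": 194.00,
}
optimized = {
    "1 x Medium Instance": 121.00,
    "CloudFront Data 1 TB": 168.00,
    "CloudFront Requests": 1.89,
}

standard_total = sum(standard.values())
optimized_total = sum(optimized.values())
reduction = 1 - optimized_total / standard_total

print(f"standard:  ${standard_total:.2f}")
print(f"optimized: ${optimized_total:.2f}")
print(f"reduction: {reduction:.0%}")
```

Most of the saving comes from running one instance instead of four; the data-transfer line changes comparatively little.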

Offloading Tips
• Leverage S3, CloudFront, Route 53
• Eliminate repeated calls (edge and data cache)
• Static website hosting on S3
No web server at all!
• Minimize your EC2 and database footprint
stand up Read Replicas for variable loads

Utilization and Auto-Scaling: Granularity
more small instances vs. fewer large instances
29 Large @ $0.32/hr = $9.28/hr
59 Small @ $0.08/hr = $4.72/hr
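The slide does not show how the two fleet sizes were derived, so the sketch below illustrates the general granularity effect instead. It assumes a hypothetical hourly demand profile measured in small-instance units, and assumes a large instance offers four such units at four times the price (matching the $0.08 and $0.32 rates).

```python
import math

def fleet_cost(demand_units, unit_capacity, hourly_rate):
    """Cost of running just enough instances each hour to cover demand."""
    return sum(math.ceil(d / unit_capacity) * hourly_rate for d in demand_units)

# Hypothetical demand per hour, in small-instance units.
demand = [1, 2, 5, 9, 6, 3]

small = fleet_cost(demand, unit_capacity=1, hourly_rate=0.08)
large = fleet_cost(demand, unit_capacity=4, hourly_rate=0.32)

print(f"small instances: ${small:.2f}")
print(f"large instances: ${large:.2f}")
```

Smaller instances let the fleet follow demand in finer steps, so less paid-for capacity sits idle when demand falls between multiples of the larger size.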

Utilization – Trigger Actions by Event
Leverage CloudWatch to collect and measure metrics

Buuuk for Singapore Press Holdings (SPH)

The Straits Times Mobile App
REAL-TIME reaction response
• notification of pending News Flash (with audible alarm)
• on-demand ramp up of capacity (6 mins)
• subscriber alert push delivered
• mass response traffic handled (followed by ramp down)

Architecture
Amazon Web Services provides services and
infrastructure to build reliable, fault-tolerant, and
highly available systems in the cloud.
These qualities have been designed into our services
both by handling such aspects without any special
action by you and by providing features that must be
used explicitly and correctly.

Managed Services
• Amazon Relational Database Service (RDS)
• Amazon ElastiCache
• Amazon Simple Queue Service (SQS)
• Elastic Load Balancing
• Amazon Elastic MapReduce
• Amazon Simple Email Service (SES)
• Amazon Simple Notification Service (SNS)

Elastic Load Balancing: $0.028 per hour
[Diagram: DNS, Elastic Load Balancing, Web Servers in an Availability Zone]
VS
EC2 instance + software LB: $0.08 per hour (small instance)
[Diagram: DNS, EC2 instance + software LB, Web Servers in an Availability Zone]

SQS queue: $0.50 per 1,000,000 Requests ($0.0000005 per Request)
[Diagram: Producer, SQS queue, Consumers]
VS
EC2 instance + software queue: $0.08 per hour (small instance)
[Diagram: Producer, EC2 instance + software queue, Consumers]
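Using only the two prices quoted here, a quick calculation shows how many requests per hour it takes before the per-request SQS charge reaches the hourly cost of a single small instance. This is a simplified sketch: it ignores the operational effort and redundancy a self-managed queue would also need.

```python
SQS_PRICE_PER_REQUEST = 0.50 / 1_000_000   # from the slide
SMALL_INSTANCE_PER_HOUR = 0.08             # from the slide

def sqs_hourly_cost(requests_per_hour):
    """SQS request charges for one hour at the quoted price."""
    return requests_per_hour * SQS_PRICE_PER_REQUEST

# Requests per hour at which SQS costs the same as one small instance.
break_even = SMALL_INSTANCE_PER_HOUR / SQS_PRICE_PER_REQUEST

print(f"10,000 requests/hour on SQS: ${sqs_hourly_cost(10_000):.4f}")
print(f"break-even: {break_even:,.0f} requests/hour")
```

Below the break-even rate the managed queue is cheaper on request charges alone, before counting the work of running the instance.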

[Chart: RI Purchases to grow a Resource Pool. Instance count (0-35) across an axis of 1-12; purchases A, B, C, D, and E stack up to form the Reserved Instance Pool]

Flexibility: Take advantage!
Architecture
vs.
Gardening
STOP/START
size changes
new instance types
vary capacity
rearrange, etc.

What are Spot Instances?
• Value
– Pricing: Up to 92% discount
– Elastic: Capacity not otherwise available
– Minimum Commitment: Commit to 1 hour
• Tradeoff
– Potential for interruption

Key Points about Spot
• Spare capacity – supply and demand
• Be prepared for no availability at times
• Be willing to accept and deal with interruption
• Far greater potential scale, starting at 5X default instance limits
• Massive possible capacity = new ideas…

Consider 2 Time-to-Value Scenarios
1) Value of results quickly diminishes
e.g., Engineering simulations
2) Value of result stable until deadline
e.g., Analytics before an M&A deal

Spot Applications
Ideal Applications
Batch Processing
Time-Delayable
Fault-Tolerant or Restartable
Compute-Intensive
Horizontally Scalable
Stateless Worker Nodes
Region and AZ Independent
Uses Deployment Automation

Less Ideal Applications
Interactive
Strict/Tight SLA for Completion
Expensive to Handle Terminations
Data-Intensive
In-Memory Scaling
Long-Running Worker Nodes
Requires a Single AZ
Manually Launched and Managed

Spot Advice and Tips
• Don’t build your reliability ENTIRELY on spot
vServ.mobi – exceptional and smart architecture
• With time flexibility, different approaches:
delayed results, lower cost
spend less, quicker answers
• Ask different questions:
with enormous capacity, what is now possible?

Look at the World Differently
• Order of magnitude more capacity
• New experiments enabled = innovation!
• Lucky Oyster – recommendation exchange
• Prototyping a new search technology idea (using Common Crawl)
• 3.4 billion web pages > 1 TB of data > Index of 400 million entities
• “The cost? About $100... in about 14 hours”

A Spot Case Study
Ashay Padwal
CoFounder & CTO – vServ.mobi

GLOBAL

INNOVATION

FOCUSED

Award Winning
Mobile Ad Exchange
across Emerging Markets

31 Bn Ad Requests / Month
• 33% India
• 14% SE Asia
• 14% Middle East & Africa
• 11% Europe
• 11% Rest of Asia
• 10% South America
• 7% North America
Over 200 Mn Unique Users / Month

Infrastructure: Requirements & Challenges
1. Requirement: Self Serve for Publisher On-boarding & Exit
Challenge: No Capacity Planning; Extreme Scalability
2. Requirement: Start Up
Challenge: No Capex, no Lock-in
3. Requirement: Least Latency & High Availability
Challenge: Suite of services – Compute, Load Balancing, DNS, CDN, Storage, Multiple DCs per location
4. Requirement: Global Setup management with small team
Challenge: Availability across Regions with extensive APIs

Infrastructure: Solution
1. AWS
2. AWS
3. EC2 & ELB – Multi-AZ
Route53, CloudFront, S3
4. US East, US West, Europe, South America, Asia
For Middle East, we host in Turkey
For Africa, we host in South Africa

Now What? Reduce Cost without impacting Performance
• AWS is pretty cost-effective. But we were greedy!

• Saving more meant more money for other areas in our
business.
• We walked in the opposite direction... and it worked!
• We use spot instances in production extensively.
• Sounds risky? - Yes, but if you architect your system
correctly, you should be safe.

What we did
1. Selected the right Instance Type
– Use CloudWatch for CPU & memory usage
– Load Test
2. Designed our servers to be self-sufficient and perishable
– Business logic & DB on same server
– Transaction Logs written to EBS
– Auto Setup on Server
– Data Collection module
3. We built a custom Scaling solution
– Add/Remove instances by checking present traffic & predicting traffic in the immediate future
– Based on trending of spot prices, either try launching spot or fall back to on-demand instances
– Remove servers if in use between 45-55 min
– Track spot prices to shift to on-demand
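The scaling rules listed above could be sketched roughly as follows. This is not vServ's implementation: the function names, the linear price extrapolation, and the per-instance capacity figure are all hypothetical, and the 45-55 minute rule is read here as "only remove an instance late in its current billed hour", which assumes hourly billing.

```python
import math

def instances_needed(current_rps, predicted_rps, rps_per_instance):
    """Size the fleet for the larger of present and predicted traffic."""
    return math.ceil(max(current_rps, predicted_rps) / rps_per_instance)

def choose_market(spot_prices, on_demand_price):
    """Try spot while the latest and projected spot price stay below on-demand.

    The projection is a simple linear extrapolation of the last two prices
    (an illustrative assumption, not a real pricing model).
    """
    latest = spot_prices[-1]
    trend = latest - spot_prices[-2] if len(spot_prices) > 1 else 0.0
    projected = latest + trend
    return "spot" if max(latest, projected) < on_demand_price else "on-demand"

def in_removal_window(minutes_running):
    """True when an instance is 45-55 minutes into its current billed hour."""
    return 45 <= minutes_running % 60 <= 55

print(instances_needed(current_rps=900, predicted_rps=1200, rps_per_instance=500))
print(choose_market([0.020, 0.020, 0.021], on_demand_price=0.08))
print(choose_market([0.03, 0.05, 0.07], on_demand_price=0.08))
print(in_removal_window(110))
```

Removing servers only near the end of a billed hour avoids giving up capacity that has already been paid for.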

What AWS did
1. Reduced pricing for EC2 (On Demand & Reserved) and S3
2. Cheap Archival System - Glacier
3. Pre warming of Load Balancer (ELB)
4. AMI movement across regions
5. ELB with equal distribution of traffic across instances spread in any Availability Zone

THANK YOU!
Ashay Padwal
CTO & Co-Founder
ashay@vserv.mobi

Closing – Key Takeaways
• Re-evaluate, revisit and re:Invent
Evolve along with AWS
• Leverage
Managed Services, CloudWatch
• Stay up to date
RI modifications, Trusted Advisor
• AWS Blog: aws.typepad.com

