This advanced session targets Amazon Simple Storage Service (Amazon S3) technical users. We will discuss the impact of object naming conventions and parallelism on S3 performance, provide real-world examples and code that implements best practices for naming objects and parallelizing both PUTs and GETs, cover multipart uploads and byte-range downloads, and introduce GNU parallel as a quick and easy way to improve S3 performance.
5. Choosing a Region
• Performance
– Proximity to your users
– Co-locating with compute, other AWS resources
• Other things to think about
– Legal and regulatory requirements
– Costs vary by region
6. Pay Attention to Your Naming Scheme If:
• You want consistent performance from a bucket
• You want a bucket capable of routinely exceeding 100 TPS
http://amzn.to/18oF5LC
7. Transactions Per Second (TPS)
100/8 = 12.5 events/sec
100,000 users @ 10 events an hour ≈ 278 TPS
8. Distributing Key Names
• Don’t do this
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-051033564.jpg
<my_bucket>/2013_11_13-061133789.jpg
<my_bucket>/2013_11_13-051033458.jpg
<my_bucket>/2013_11_12-063433125.jpg
<my_bucket>/2013_11_12-021033564.jpg
<my_bucket>/2013_11_12-065533789.jpg
<my_bucket>/2013_11_12-011033458.jpg
<my_bucket>/2013_11_11-022333125.jpg
<my_bucket>/2013_11_11-153433564.jpg
<my_bucket>/2013_11_11-065233789.jpg
<my_bucket>/2013_11_11-065633458.jpg
9. Distributing Key Names
• Add randomness to the beginning of the key name
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
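A minimal sketch of the randomized-prefix idea above (the function name and 9-digit width are illustrative): prepending a fixed-width random number means sequential uploads no longer share an ever-growing common prefix.

```python
import random

def randomized_key(date_stamp: str, ext: str = "jpg") -> str:
    """Prepend a fixed-width random number so sequential uploads
    spread across S3's index instead of clustering by date."""
    prefix = random.randint(0, 999_999_999)
    return f"{prefix:09d}-{date_stamp}.{ext}"

print(randomized_key("2013_11_13"))  # e.g. 521335461-2013_11_13.jpg
```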
10. Other Techniques for Distributing Key Names
• Store objects as a hash of their name
– add the original name as metadata
• “deadmau5_mix.mp3” 0aa316fb000eae52921aab1b4697424958a53ad9
– watch for duplicate names!
– prepend keyname with short hash
• 0aa3-deadmau5_mix.mp3
• Epoch time (reverse)
– 5321354831-deadmau5_mix.mp3
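Both techniques above can be sketched as follows (function names are illustrative). Note that reversing the digits of epoch 1384531235 yields exactly the 5321354831 prefix shown in the example, putting the fastest-changing digit first.

```python
import hashlib

def short_hash_key(name: str) -> str:
    """Prefix with the first 4 hex chars of a hash of the name;
    keeping the full original name avoids duplicate-name collisions."""
    return f"{hashlib.sha1(name.encode()).hexdigest()[:4]}-{name}"

def reversed_epoch_key(name: str, epoch: int) -> str:
    """Reverse the epoch's digit string so the fastest-changing
    digit leads the key, spreading writes across the index."""
    return f"{str(epoch)[::-1]}-{name}"

print(short_hash_key("deadmau5_mix.mp3"))
print(reversed_epoch_key("deadmau5_mix.mp3", 1384531235))
# second line prints: 5321354831-deadmau5_mix.mp3
```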
11. Randomness in a Key Name Can Be an Anti-Pattern
• Lifecycle policies
• LISTs with prefix filters
• Maintaining thumbnails of images
– craig.jpg -> stored as orig-09329jed0fc
– thumb-09329jed0fc
• When you need to recover a file with its original name
12. Solving for the Anti-Pattern
• Add additional prefixes to help sorting
<my_bucket>/images/521335461-2013_11_13.jpg
<my_bucket>/images/465330151-2013_11_13.jpg
<my_bucket>/movies/293924440-2013_11_13.jpg
<my_bucket>/movies/987331160-2013_11_13.jpg
<my_bucket>/thumbs-small/838434842-2013_11_13.jpg
<my_bucket>/thumbs-small/342532454-2013_11_13.jpg
<my_bucket>/thumbs-small/345233453-2013_11_13.jpg
<my_bucket>/thumbs-small/345453454-2013_11_13.jpg
• Amazon S3 maintains keys lexicographically in its internal indices
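Because keys are kept in lexicographic order, a prefix filter selects one of these groups without touching the rest. A self-contained sketch, using a plain sorted list to stand in for the bucket index (a real call would use `list_objects_v2` with its `Prefix` parameter):

```python
def keys_with_prefix(sorted_keys: list[str], prefix: str) -> list[str]:
    """Stand-in for an S3 LIST with a Prefix filter: in a sorted
    index, matching keys form one contiguous run."""
    return [k for k in sorted_keys if k.startswith(prefix)]

index = sorted([
    "images/521335461-2013_11_13.jpg",
    "movies/293924440-2013_11_13.jpg",
    "thumbs-small/838434842-2013_11_13.jpg",
    "thumbs-small/342532454-2013_11_13.jpg",
])

print(keys_with_prefix(index, "thumbs-small/"))
# ['thumbs-small/342532454-2013_11_13.jpg', 'thumbs-small/838434842-2013_11_13.jpg']
```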
13. Distributing Your Key Names Is Always a Good Idea!
It can take some time for improvements to manifest.
Open a support case if you need an immediate bump or if you’ve got any questions!
http://amzn.to/18oF5LC
15. Using Amazon CloudFront for Distribution
• Caches objects from Amazon S3
• Reduces the number of Amazon S3 GETs
• Low latency with multiple endpoints
• High transfer rate
• Two flavors:
– Web distribution (static content)
– RTMP distribution (on-demand streaming of media)
16. Multipart Upload Provides Parallelism
• Allows faster, more flexible uploads
• Allows you to upload a single object as a set of parts
• Upon upload, Amazon S3 then presents all parts as a single object
• Enables parallel uploads, pausing and resuming an object upload, and beginning uploads before you know the total object size
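A sketch of the mechanics, with the network call stubbed out (in a real client, `upload_part` would wrap S3's UploadPart operation; part numbering starts at 1, as in S3 multipart uploads):

```python
from concurrent.futures import ThreadPoolExecutor

def part_ranges(total_size: int, part_size: int):
    """Yield (part_number, offset, length) tuples covering the object."""
    for number, offset in enumerate(range(0, total_size, part_size), start=1):
        yield number, offset, min(part_size, total_size - offset)

def multipart_upload(data: bytes, part_size: int, upload_part):
    """Upload parts in parallel; S3 presents them as one object
    once the multipart upload is completed."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [
            pool.submit(upload_part, number, data[offset:offset + length])
            for number, offset, length in part_ranges(len(data), part_size)
        ]
        return [f.result() for f in futures]
```

In practice, boto3's `TransferConfig` (with `multipart_threshold` and `multipart_chunksize`) passed to `upload_file` performs this splitting and parallelism for you.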
17. Choose the Right Part Size
• Strike a balance between part size and number of parts
– Lots of small parts increase connection overhead, negating the benefits of parallelism
– Too few large parts forfeit the benefits of multipart and the resiliency to network errors
• We recommend parts of 25–50 MB on higher-bandwidth networks and parts of 10 MB on mobile networks
18. You Can Parallelize Your GETs, Too
• Use range-based GETs to get multithreaded performance when downloading objects
• Compensates for unreliable networks
• Benefits of multithreaded parallelism
• Align your ranges with your parts!
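A sketch of building the Range header values, one per parallel GET; setting `part_size` to the upload's part size aligns ranges with parts, as the last bullet suggests. Note that S3 byte ranges are inclusive on both ends.

```python
def range_headers(total_size: int, part_size: int) -> list[str]:
    """HTTP Range header values covering the whole object,
    with inclusive start and end offsets."""
    return [
        f"bytes={start}-{min(start + part_size, total_size) - 1}"
        for start in range(0, total_size, part_size)
    ]

print(range_headers(100, 40))  # ['bytes=0-39', 'bytes=40-79', 'bytes=80-99']
```

Each header would then be sent on its own GET (e.g. the `Range` parameter of a GetObject call) from a separate thread.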
19. If you’re using SSL and parallelizing…
• You’re likely to become CPU-constrained because encryption is CPU-intensive
• Amazon S3 recommends using AES-256 to optimize for security and performance
• You can leverage AES-NI hardware on your host to improve your performance
20. If Your Application Relies on LIST…
• Getting the objects your customers have stored
• Seeing sets of files (all animations, videos)
• Getting logs
• Viewing inventories
• Sorting keys based on metadata
21. What Should You Do?
• Parallelize LIST when you need a sequential list of your keys
• You should build a secondary index of your keys, such as with Amazon DynamoDB, to get a faster alternative to LIST when a sequential list isn’t sufficient
– Sorting by metadata
– Looking up by category
– Objects by time stamp
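Parallelizing LIST can be sketched as one prefix-filtered LIST per leading character of the randomized key space, merged into a single sorted sequence. The list function is injected here with an in-memory stand-in; a real one would call `list_objects_v2` with a `Prefix`.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_list(list_by_prefix, prefixes):
    """Issue one prefix-filtered LIST per shard concurrently,
    then merge into a single sorted key sequence."""
    with ThreadPoolExecutor(max_workers=len(prefixes)) as pool:
        shards = pool.map(list_by_prefix, prefixes)
    return sorted(key for shard in shards for key in shard)

# In-memory stand-in for the bucket's prefix-filtered LIST:
index = ["0a1-a.jpg", "0ff-d.jpg", "19c-c.jpg", "1b2-b.jpg"]
fake_list = lambda prefix: [k for k in index if k.startswith(prefix)]
print(parallel_list(fake_list, ["0", "1"]))
```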
22. LIST Operations with Amazon DynamoDB
• Maintain metadata in DynamoDB
– Keep data about what’s in your buckets in DynamoDB
• On PUTs, enter data about your objects in DynamoDB
• On GETs, use DynamoDB to assist in your search for
specific objects
• You can use DynamoDB to give you “LIST” based on
specific criteria
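A sketch of the pattern with an in-memory dict standing in for the DynamoDB table; real code would call `put_item` right after each S3 PUT and `query` to serve a criteria-based "LIST" (class and method names here are illustrative).

```python
from collections import defaultdict

class MetadataIndex:
    """In-memory stand-in for a DynamoDB table that is kept in
    sync with the bucket on every PUT."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def record_put(self, key: str, category: str, timestamp: int):
        # Real code: Table.put_item(...) right after the S3 PUT.
        self._by_category[category].append((timestamp, key))

    def list_category(self, category: str):
        # Real code: Table.query(...) - a "LIST" by criteria,
        # without paginating through the whole bucket.
        return [key for _, key in sorted(self._by_category[category])]

idx = MetadataIndex()
idx.record_put("465330151-2013_11_13.jpg", "images", 1384531235)
idx.record_put("293924440-2013_11_13.jpg", "movies", 1384531236)
print(idx.list_category("images"))
```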
23. Wrap up: Maximizing Amazon S3 Performance
• Architecture
– Choosing a region
– Building a naming scheme
– Considering LISTs
• Optimizing PUTs
– Multipart upload
• Optimizing GETs
– Using CloudFront
– Range-based GETs
24. Please give us your feedback on this presentation
STG304
As a thank you, we will select prize winners daily for completed surveys!