This document discusses using AWS for storage and archive solutions. It begins by outlining the business and technical benefits AWS can provide, such as reducing costs, reducing on-premise infrastructure needs, changing processes, and removing aging technologies. It then covers fundamental AWS storage services like EBS, S3, and Glacier. Examples of how these services can be used for different storage and archive use cases like backups, data distribution, databases, and long-term archives are provided. Finally, it discusses getting data into AWS and using database services like RDS and DynamoDB.
1. Journey through the Cloud:
Storage & Archive
Ryan Shuttleworth – Technical Evangelist
@ryanAWS
2. Journey through the cloud
Common use cases & stepping stones into the AWS cloud
Learning from customer journeys
Best practices to bootstrap your projects
3. Storage & Archive
Benefit from cloud economics with simple to implement use cases
Simplify the management of data assets
Eliminate technologies & processes
Gain performance and reliability improvements
4. Agenda
Why AWS for storage & archive
AWS fundamental services
Storage & archive – examples & patterns
Where to go next
6. Storage & Archive
AWS is used in a variety of ways…
Powers applications that allow customers to access historical stock price information
Stores its vast repository of music to feed to over 15 million active users
Estimates it has saved $500,000 in storage expenditures and cut its disk storage array costs in half
Digital assets and usage data behind publication sites and mobile applications
7. Business & technical drivers
You might be able to:
Reduce costs: slash storage & archive budgets
Reduce on-premise: eliminate on-premise equipment to manage archives
Change processes: remove the need to do capacity planning
Remove aging technologies: eliminate tape for backup and archive
11. Business & technical drivers
You might be able to:
Reduce costs: slash storage & archive budgets by up to 50%
  Reduce CAPEX while dramatically increasing scalability
  Eliminate the need for secondary sites
Reduce on-premise: eliminate on-premise equipment to manage archives
  Eliminate 30%+ of your storage footprint
  Consolidate on-premise and augment with cloud
Change processes: remove the need to do capacity planning
  Eliminate capacity planning
  Eliminate provisioning for peak demand
Remove aging technologies: eliminate tape for backup and archive
  Remove tape archives
  Cycle out aging disk arrays
13. Fundamental Storage Options
Elastic Block Store, S3 and Glacier
Elastic Block Store: high performance block storage device; 1GB to 1TB in size; mount as drives to instances with snapshot/cloning functionalities
Simple Storage Service: highly scalable object storage; 1 byte to 5TB in size; 99.999999999% durability
Glacier: long term object archive; extremely low cost per gigabyte; 99.999999999% durability
14. Fundamental Storage Options
Elastic Block Store, S3 and Glacier
Elastic Block Store: high performance block storage device; 1GB to 1TB in size; mount as drives to instances with snapshot/cloning functionalities. Very fast ‘instance’ disks.
Simple Storage Service: highly scalable object storage; 1 byte to 5TB in size; 99.999999999% durability. Fast web object storage.
Glacier: long term object archive; extremely low cost per gigabyte; 99.999999999% durability. Slow, rare access.
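To give a feel for what the 99.999999999% durability figure means, here is a minimal sketch. It assumes the figure is an annual, per-object durability and that losses are independent; both are assumptions made for illustration, not statements from the slides. The function name `expected_annual_losses` is hypothetical.

```python
from decimal import Decimal

# Durability figure quoted on the slides for S3 and Glacier.
DURABILITY = Decimal("0.99999999999")


def expected_annual_losses(object_count: int,
                           durability: Decimal = DURABILITY) -> Decimal:
    """Expected number of objects lost per year.

    Assumes (for illustration only) that the durability figure is annual,
    applies per object, and that losses are independent.
    """
    if object_count < 0:
        raise ValueError("object_count must not be negative")
    return Decimal(object_count) * (Decimal(1) - durability)


if __name__ == "__main__":
    for count in (10_000, 10_000_000, 1_000_000_000):
        losses = expected_annual_losses(count)
        print(f"{count:>13,} objects: {losses} expected losses per year")
```

Decimal arithmetic is used because the loss probability (one in 100 billion) is small enough that binary floating point rounding would show up in the result.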
16. Fundamental Storage Options
Elastic Block Store, S3 and Glacier
Elastic Block Store
Persistent storage: volume lifetime is independent of any particular EC2 instance.
General purpose: raw, unformatted block device. Use from Linux, Solaris or Windows.
High performance: equal to or better than local EC2 drive. Provisioned IOPS.
High reliability: built-in redundancy within availability zone. AFR (Annual Failure Rate) between 0.1% and 1%.
Scalable: volume sizes ranging from 1 GB to 1 TB.
Easy: easy to create, attach, back up, restore, and delete volumes.
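The AFR range quoted above can be turned into a fleet-level estimate. The sketch below assumes every volume fails independently with the same annual failure rate, which is a simplification made for illustration; `probability_of_any_failure` is a hypothetical helper, not an AWS API.

```python
def probability_of_any_failure(volume_count: int, afr: float) -> float:
    """Probability that at least one of `volume_count` volumes fails in a year.

    Assumes independent failures with an identical annual failure rate (AFR).
    """
    if volume_count < 0:
        raise ValueError("volume_count must not be negative")
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be between 0 and 1")
    # P(at least one failure) = 1 - P(no volume fails)
    return 1.0 - (1.0 - afr) ** volume_count


if __name__ == "__main__":
    # The two ends of the AFR range quoted on the slide: 0.1% and 1%.
    for afr in (0.001, 0.01):
        for count in (1, 10, 100):
            p = probability_of_any_failure(count, afr)
            print(f"AFR {afr:.1%}, {count:>3} volumes: {p:.2%}")
```

With many volumes a failure somewhere becomes likely even at a low AFR, which is one reason the deck pairs EBS with snapshots.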
17. Fundamental Storage Options
Elastic Block Store, S3 and Glacier
Elastic Block Store
Paradigm: File system
Performance: Very, very fast (~100 IOPS per volume)
Redundancy: Within data center
Security: Visible only to your EC2 instances
Pricing: $0.10/GB/Mo. allocated
Access from the Net? No
Typical use case: It’s a disk drive
19. Fundamental Storage Options
Elastic Block Store, S3 and Glacier
Amazon S3: Simple Storage Service
Highly scalable data storage in-the-cloud
Programmatic access via web services API
Is a Web Store, not a file system
Optimized for WORM
Eventually consistent
Fast, highly available
Durable
Economical
Paradigm: Object store
Performance: Very fast
Redundancy: Across data centers
Security: Public Key / Private Key
Pricing: $0.125/GB/month stored
Access from the Net? Yes
Typical use case: Write once, read many
21. Fundamental Storage Options
Elastic Block Store, S3 and Glacier
Use cases: Archive, Backup, DR
Amazon S3: more than ~10% of data accessed per month; shorter term data backup with rapid RTO; snapshots; expiration policies; 11 9s durability
Amazon S3 RRS: lower cost when 11 9s durability is not required
Amazon Glacier: infrequent data access (less than ~10% of data per month); long term archiving; use policies to move cold backup data for long term retention; retain a write once, read never copy in case of a worst case scenario
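The rule of thumb on this slide can be sketched as a small decision function. The 10% threshold comes from the slide; treating exactly 10% as "frequent" and the function name `suggest_storage_class` are assumptions made for illustration.

```python
# Threshold taken from the slide: roughly 10% of data accessed per month.
ACCESS_THRESHOLD = 0.10


def suggest_storage_class(fraction_accessed_per_month: float,
                          needs_eleven_nines: bool = True) -> str:
    """Suggest a storage option from monthly access frequency.

    - below ~10% of data accessed per month: Amazon Glacier
    - otherwise: Amazon S3, or S3 RRS when 11 9s durability is not required
    """
    if not 0.0 <= fraction_accessed_per_month <= 1.0:
        raise ValueError("fraction_accessed_per_month must be between 0 and 1")
    if fraction_accessed_per_month < ACCESS_THRESHOLD:
        return "Amazon Glacier"
    if needs_eleven_nines:
        return "Amazon S3"
    return "Amazon S3 RRS"


if __name__ == "__main__":
    print(suggest_storage_class(0.02))
    print(suggest_storage_class(0.50))
    print(suggest_storage_class(0.50, needs_eleven_nines=False))
```

A real decision would also weigh retrieval time and retrieval charges, which this sketch ignores.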
23. Use case journey
On-premise, On-instance, Object level, Long term
On-premise: locally accessible file systems; workloads with local data
24. Use case journey
On-premise, On-instance, Object level, Long term
On-premise: locally accessible file systems; workloads with local data
AWS
25. Use case journey
On-premise: locally accessible file systems; workloads with local data
On-instance: EC2 based applications; DR deployments
Object level: data distribution; durable media storage
Long term: system images; database backups; data archives
26. Use case journey
On-premise: locally accessible file systems; workloads with local data
  High IO performance; high network performance
On-instance: EC2 based applications; DR deployments
  High IO performance; Provisioned IOPS; backup & restore
Object level: data distribution; durable media storage
  Good performance; high durability; scalability
Long term: system images; database backups; data archives
  Very low price; high durability; slow access
28. Use case journey
On-premise, On-instance, Object level, Long term
(1) Getting data into the cloud
29. Getting data into the cloud
Direct Connect, Import/Export and Storage Gateway
AWS Direct Connect: dedicated bandwidth between your site and AWS
AWS Import/Export: physical transfer of media into and out of AWS
AWS Storage Gateway: shrink-wrapped gateway for volume synchronization
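Whether a network link or physical media is the better way in usually comes down to transfer time. The sketch below estimates how long a dataset takes to move over a link; the 80% usable-bandwidth default, decimal terabytes (10^12 bytes), and the name `transfer_days` are assumptions for illustration, not AWS guidance.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_terabytes: float,
                  link_megabits_per_second: float,
                  utilization: float = 0.8) -> float:
    """Estimate days needed to move `data_terabytes` over a network link.

    `utilization` is the assumed fraction of the link that is usable for
    the transfer. Terabytes are decimal (10**12 bytes).
    """
    if data_terabytes < 0:
        raise ValueError("data_terabytes must not be negative")
    if link_megabits_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("link speed must be positive and utilization in (0, 1]")
    total_bits = data_terabytes * 1e12 * 8
    usable_bits_per_second = link_megabits_per_second * 1e6 * utilization
    return total_bits / usable_bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for terabytes, mbps in ((1, 100), (100, 100), (100, 1000)):
        days = transfer_days(terabytes, mbps)
        print(f"{terabytes:>4} TB over {mbps:>5} Mbit/s: {days:.1f} days")
```

When the estimate runs to weeks, shipping media starts to look attractive; when it is hours or days, a dedicated link may be enough.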
30. Getting data into the cloud
Storage gateway
Snapshot of local volumes
Restoration from snapshots
31. “Amazon Web Services and AWS Storage
Gateway are great assets that help us scale
fast, store data in an ultra-secure
environment, spend more time on product
development (rather than disaster recovery
& backup), and achieve faster time-to-
market with minimal investment…
…By using AWS Storage Gateway, we went
to just hours instead of days to restore from
backup.”
Craig Link, Glympse Technology Manager
33. Use case journey
On-premise, On-instance, Object level, Long term
(1) Getting data into the cloud
(2) Disks and data
39. Curiosity
The mars.jpl.nasa.gov website
is based on the open-source
Content Management System
(CMS) Railo, running on
Amazon EC2
Shared storage for Railo is
provided by Amazon EC2
instances running Gluster on a
pool of Amazon Elastic Block
Store (EBS) volumes for
consistently high performance
disk I/O.
41. Use case journey
On-premise, On-instance, Object level, Long term
(1) Getting data into the cloud
(2) Disks and data
(3) Database as a service
53. DynamoDB
Provisioned throughput NoSQL database
Fast, predictable performance
Fully distributed, fault tolerant architecture
Feature details:
Provisioned throughput: dial up or down provisioned read/write capacity
Predictable performance: average single digit millisecond latencies from SSD backed infrastructure
Strong consistency: be sure you are reading the most up to date values
Fault tolerant: data replicated across availability zones
Monitoring: integrated with CloudWatch
Secure: integrates with AWS Identity and Access Management (IAM)
Elastic MapReduce: integrates with Elastic MapReduce for complex analytics on large datasets
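Provisioned throughput means you size read and write capacity up front. Here is a minimal sizing sketch. The unit sizes (4 KB per strongly consistent read, half the cost for eventually consistent reads, 1 KB per write) are taken from my understanding of the current DynamoDB documentation rather than from the slides, and they have changed since this deck was written, so treat them as assumptions. The function names are hypothetical.

```python
import math

# Assumed capacity unit sizes (not from the slides; check current docs).
READ_UNIT_BYTES = 4 * 1024
WRITE_UNIT_BYTES = 1024


def read_capacity_units(item_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Read capacity units needed for a steady read rate."""
    if item_bytes <= 0 or reads_per_second < 0:
        raise ValueError("item_bytes must be positive, rate non-negative")
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads are assumed to cost half as much.
        total = total / 2
    return math.ceil(total)


def write_capacity_units(item_bytes: int, writes_per_second: int) -> int:
    """Write capacity units needed for a steady write rate."""
    if item_bytes <= 0 or writes_per_second < 0:
        raise ValueError("item_bytes must be positive, rate non-negative")
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_second


if __name__ == "__main__":
    print("reads :", read_capacity_units(3000, 100))
    print("writes:", write_capacity_units(3000, 100))
```

The "dial up or down" feature on the slide corresponds to changing these two numbers on a table as load changes.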
55. “AWS gave us the flexibility to bring a massive
amount of capacity online in a short period of
time and allowed us to do so in an operationally
straightforward way.
AWS is now Shazam’s cloud provider of choice,”
Jason Titus, CTO
DynamoDB: over 500,000 writes per second
Amazon EMR: more than 1 million writes per second
57. Use case journey
On-premise, On-instance, Object level, Long term
(1) Getting data into the cloud
(2) Disks and data
(3) Database as a service
(4) Object serving and storage
59. You put it in S3
AWS stores with 99.999999999% durability
60. Highly scalable web access to objects
You put it in S3
AWS stores with 99.999999999% durability
Multiple redundant copies in a region
61. …not so simple
CloudFront integration
Access control lists
Server side encryption
Object expiry
Website support
Versioning
Browser Upload to S3
Logging
Requester Pays
Signed URLs
BitTorrent support
IAM
Metadata
Multi-object delete
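Object expiry, together with the policies mentioned earlier for moving cold backup data to long term retention, is expressed as a bucket lifecycle configuration. The sketch below only builds the configuration dictionary and makes no AWS call. The dictionary shape follows what I understand boto3's `put_bucket_lifecycle_configuration` to accept as `LifecycleConfiguration`; the prefix, day counts, rule ID, and the helper name `build_lifecycle_rule` are hypothetical.

```python
import json


def build_lifecycle_rule(prefix: str, archive_after_days: int,
                         expire_after_days: int,
                         rule_id: str = "archive-then-expire") -> dict:
    """Build one lifecycle rule: move objects to Glacier, then expire them."""
    if not 0 <= archive_after_days < expire_after_days:
        raise ValueError("need 0 <= archive_after_days < expire_after_days")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},      # which objects the rule covers
        "Status": "Enabled",
        "Transitions": [
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical example: archive backups after 30 days, delete after a year.
lifecycle_configuration = {
    "Rules": [build_lifecycle_rule("backups/", 30, 365)],
}

if __name__ == "__main__":
    print(json.dumps(lifecycle_configuration, indent=2))
```

Applying the configuration to a real bucket needs credentials and a client call, which are deliberately left out here.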
63. “Spotify needed a storage solution that
could scale very quickly without incurring
long lead times for upgrades. This led us to
cloud storage, and in that market, Amazon
Simple Storage Service (Amazon S3) is the
most mature large-scale product.
Amazon S3 gives us confidence in our
ability to expand storage quickly while also
providing high data durability.”
Emil Fredriksson, Operations Director
64. Need to store ‘something’?
S3 is a foundation building block
66. Use case journey
On-premise, On-instance, Object level, Long term
(1) Getting data into the cloud
(2) Disks and data
(3) Database as a service
(4) Object serving and storage
(5) Cold storage & archiving
67. Glacier
Long term cold storage
Reliable and cheap storage of data for:
Data with long retention periods
Multi-PB, infrequently accessed data sets
From $0.01 per GB/Month
99.999999999% durability
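Using the per-gigabyte prices quoted in this deck ($0.10 for EBS allocated, $0.125 for S3 stored, from $0.01 for Glacier), a rough monthly storage comparison can be sketched as follows. The prices are the slide-era figures and will have changed; the sketch ignores request, retrieval, and data transfer charges. `monthly_storage_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Per-GB monthly prices as quoted on the slides (historical figures).
PRICE_PER_GB_MONTH = {
    "EBS": Decimal("0.10"),      # per GB allocated
    "S3": Decimal("0.125"),      # per GB stored
    "Glacier": Decimal("0.01"),  # "from" price per GB stored
}


def monthly_storage_cost(service: str, gigabytes: int) -> Decimal:
    """Storage-only monthly cost in dollars for `gigabytes` on `service`."""
    if gigabytes < 0:
        raise ValueError("gigabytes must not be negative")
    return PRICE_PER_GB_MONTH[service] * Decimal(gigabytes)


if __name__ == "__main__":
    archive_gb = 50_000  # hypothetical 50 TB archive
    for service in PRICE_PER_GB_MONTH:
        cost = monthly_storage_cost(service, archive_gb)
        print(f"{service:<8} ${cost:,.2f} per month")
```

The gap between the S3 and Glacier lines is what makes Glacier attractive for data that is rarely read.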
68. Offsite archive
Glacier allows you to cost-effectively and securely store enterprise data offsite, making it simple, inexpensive and safe to retain archived data for as long as desired. Common use cases include enterprise data, media assets, and research and scientific data.
70. Offsite archive
Glacier allows you to cost-effectively and securely store enterprise data offsite, making it simple, inexpensive and safe to retain archived data for as long as desired. Common use cases include enterprise data, media assets, and research and scientific data.
Digital preservation
Libraries, historical societies, non-profit organizations and governments are increasing their efforts to preserve valuable but aging digital content such as websites, software source code, video games, user-generated content and other digital artifacts.
Tape replacement
Amazon Glacier is cost competitive, even at scale, and eliminates pain points like capacity planning, capital budgeting and investments, media formats, hardware refreshes, and off-site storage costs, shipping and retrieving.
72. “Every day our genome sequencers produce
terabytes of data. As our company moves into the
clinical space, we face a legal requirement to
archive patient data for years that would
drastically raise the cost of storage.
Thanks to Amazon Glacier’s secure and scalable
solution, we will be able to provide cost-effective,
long-term storage and thereby eliminate a barrier
to providing whole genome sequencing for
medical treatment of cancer and other genetic
diseases.”
Keith Raffel, Senior Vice President and Chief Commercial Officer, Complete Genomics
73. “An organization like ours thinks in centuries
when it comes to content retention, and long
term preservation of our Master Archives is a
critical part of our mission here at NYPR.
Storing these core assets on traditional media
such as local disk and off-site tape exposes us to
corruption and even outright-loss of data. We
are excited to move our archives to Amazon
Glacier, which will be a better long-term
solution.”
Steve Shultis, CTO, New York Public Radio
78. A wide range of use cases
AWS supports archive & storage across many application types…
Customer facing online storage: files, photos, downloads; streaming media
App Storage: smartphone apps; Facebook apps; file sharing
Big Data: log files; customer data; usage data
EC2 Instance Storage: file storage; block storage; usage data
Backup and Archive: data retention; tape replacement; offsite backup
On Premise Storage: NAS storage; SAN storage; offsite backups
79. AWS is a cost effective place to manage digital assets
There are many options for storing data based upon requirements
On-premise data assets can be integrated with cloud services
AWS storage and archive revolutionizes the technology behind long term data