Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. In this session we'll give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
2. Data warehousing done the AWS way
• No upfront costs, pay as you go
• Really fast performance at a really low price
• Open and flexible with support for popular tools
• Easy to provision and scale up massively
3. We set out to build…
A fast and powerful, petabyte-scale data warehouse that is:
Delivered as a managed service
A Lot Faster
A Lot Cheaper
A Lot Simpler
Amazon Redshift
6. Amazon Redshift dramatically reduces I/O
 ID  | Age | State
-----+-----+-------
 123 | 20  | CA
 345 | 25  | WA
 678 | 40  | FL
[Figure: scan direction for row storage vs. column storage]
7. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
 ID  | Age | State | Amount
-----+-----+-------+--------
 123 | 20  | CA    | 500
 345 | 25  | WA    | 250
 678 | 40  | FL    | 125
 957 | 37  | WA    | 375
• With row storage you do unnecessary I/O
• To get total amount, you have to read everything
8. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
 ID  | Age | State | Amount
-----+-----+-------+--------
 123 | 20  | CA    | 500
 345 | 25  | WA    | 250
 678 | 40  | FL    | 125
 957 | 37  | WA    | 375
• With column storage, you only read the data you need
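A minimal sketch of the row-versus-column point, using the four-row table above. The in-memory layouts and the "values read" counter are assumptions made for illustration; they only model how much data a total-amount query has to touch, not how Amazon Redshift stores blocks on disk.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# Illustrative only: layouts and the values_read counter are assumptions.

COLUMNS = ("id", "age", "state", "amount")

rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def total_amount_row_store(table):
    """Sum Amount from a row layout; every field of every row is read."""
    amount_index = COLUMNS.index("amount")
    values_read = 0
    total = 0
    for row in table:
        values_read += len(row)  # the whole row is fetched together
        total += row[amount_index]
    return total, values_read


def to_column_store(table):
    """Pivot the rows into one list per column."""
    return {name: [row[i] for row in table] for i, name in enumerate(COLUMNS)}


def total_amount_column_store(columns):
    """Sum Amount from a column layout; only the Amount column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = total_amount_row_store(rows)
col_total, col_reads = total_amount_column_store(to_column_store(rows))
print(f"row store:    total={row_total}, values read={row_reads}")
print(f"column store: total={col_total}, values read={col_reads}")
```

Both layouts return the same total, but the column layout touches 4 values instead of 16. The gap widens as the table gains columns that the query does not reference.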
9. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Columnar compression saves space & reduces I/O
• Amazon Redshift analyzes and compresses your data
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
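The output above names encodings such as delta and bytedict. The sketch below illustrates the general ideas behind delta encoding and dictionary encoding on small Python lists. The function names and sample data are hypothetical, and this is not a description of the on-disk format Amazon Redshift uses.

```python
# Simplified illustrations of two column-compression ideas.
# Assumption: these toy functions only convey the concept; they are not
# the encodings as implemented by Amazon Redshift.


def delta_encode(values):
    """Keep the first value, then store each difference from the previous value."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    decoded = []
    running = 0
    for position, delta in enumerate(encoded):
        running = delta if position == 0 else running + delta
        decoded.append(running)
    return decoded


def dict_encode(values):
    """Replace each value with a small integer code into a list of distinct values."""
    dictionary = []
    code_for = {}
    codes = []
    for value in values:
        if value not in code_for:
            code_for[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_for[value])
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Look each code up in the dictionary to recover the original values."""
    return [dictionary[code] for code in codes]


list_ids = [1000, 1001, 1002, 1005, 1006]  # hypothetical, mostly sequential IDs
states = ["CA", "WA", "FL", "WA"]          # few distinct values

print(delta_encode(list_ids))
print(dict_encode(states))
```

Sequential IDs turn into small differences that fit in fewer bytes, and a low-cardinality column turns into short codes plus one small dictionary.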
10. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Tracks the minimum and maximum value for each block
• Skips over blocks that don’t contain the data needed for a given query
• Minimizes unnecessary I/O
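A minimal sketch of block skipping with per-block minimum and maximum values, as described in the bullets above. The `Block` class, the block size of 4, and the sample ages are assumptions chosen for illustration.

```python
# Toy zone map: each block records its min and max so a range query can
# skip blocks that cannot contain matching values.
# Assumption: Block, build_blocks and scan_range are hypothetical names.
from dataclasses import dataclass


@dataclass
class Block:
    values: list
    min_value: int
    max_value: int


def build_blocks(values, block_size):
    """Split a column into fixed-size blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # the zone map proves no value in this block can match
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


ages = [20, 21, 23, 25, 30, 31, 37, 40, 41, 45, 52, 60]  # sorted sample column
blocks = build_blocks(ages, block_size=4)
matches, blocks_read = scan_range(blocks, low=36, high=41)
print(matches, f"blocks read: {blocks_read} of {len(blocks)}")
```

Here the first block (20 to 25) is skipped without being read. Skipping works best when the column is sorted, because sorted data gives each block a narrow, non-overlapping value range.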
11. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Use direct-attached storage to maximize throughput
• Hardware optimized for high performance data processing
• Large block sizes to make the most of each read
• Amazon Redshift manages durability for you
12. Amazon Redshift architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available
[Figure: architecture diagram. Labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]
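A rough sketch of the leader/compute split described above: rows are spread across compute nodes, each node aggregates its own slice in parallel, and the leader combines the partial results. Threads stand in for compute nodes, and the modulo-on-ID placement is an assumption for illustration, not the distribution scheme Amazon Redshift uses.

```python
# Toy scatter-gather aggregation.
# Assumptions: distribute, compute_node_sum and leader_total are hypothetical
# names; threads play the role of compute nodes.
from concurrent.futures import ThreadPoolExecutor

rows = [
    {"id": 123, "amount": 500},
    {"id": 345, "amount": 250},
    {"id": 678, "amount": 125},
    {"id": 957, "amount": 375},
]


def distribute(table, node_count):
    """Place each row on a node chosen from its ID (illustrative rule)."""
    nodes = [[] for _ in range(node_count)]
    for row in table:
        nodes[row["id"] % node_count].append(row)
    return nodes


def compute_node_sum(node_rows):
    """Work done on one compute node: aggregate only the local rows."""
    return sum(row["amount"] for row in node_rows)


def leader_total(nodes):
    """Leader: fan the query out, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(compute_node_sum, nodes))
    return sum(partials), partials


nodes = distribute(rows, node_count=2)
total, partials = leader_total(nodes)
print(f"partial sums per node: {partials}, total: {total}")
```

The leader never sees individual rows for this query, only one partial sum per node, which is why adding nodes can spread both storage and work.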
13. Amazon Redshift runs on optimized hardware
HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate
HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed user storage
• Optimized for I/O intensive workloads
• High disk density
• Runs in HPC - fast network
• HS1.8XL available on Amazon EC2
16. Amazon Redshift parallelizes and distributes everything
• Load in parallel from Amazon S3 or Amazon DynamoDB
• Data automatically distributed and sorted according to DDL
• Scales linearly with number of nodes
• Query
• Load
• Backup/Restore
• Resize
17. Amazon Redshift parallelizes and distributes everything
• Backups to Amazon S3 are automatic, continuous and incremental
• Configurable system snapshot retention period
• Take user snapshots on-demand
• Streaming restores enable you to resume querying faster
• Query
• Load
• Backup/Restore
• Resize
18. Amazon Redshift parallelizes and distributes everything
• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• Only charged for source cluster
• Query
• Load
• Backup/Restore
• Resize
19. Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via AWS Console or API
21. Amazon Redshift lets you start small and grow big
Extra Large Node (HS1.XL)
3 spindles, 2 TB, 16 GB RAM, 2 cores
Single Node (2 TB)
Cluster 2-32 Nodes (4 TB – 64 TB)
Eight Extra Large Node (HS1.8XL)
24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
Cluster 2-100 Nodes (32 TB – 1.6 PB)
Note: Nodes not to scale
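The cluster size ranges above follow from storage per node times node count. The sketch below reproduces that arithmetic for multi-node clusters using the figures on this slide. The dictionary layout and function names are hypothetical, and 1 PB is treated as 1,000 TB to match the slide's 1.6 PB figure.

```python
# Capacity arithmetic for multi-node clusters, using the slide's figures.
# Assumption: NODE_TYPES, cluster_capacity_tb and smallest_cluster are
# hypothetical names introduced for this illustration.
import math

NODE_TYPES = {
    "HS1.XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type, node_count):
    """Compressed storage (TB) for a cluster of the given type and size."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= node_count <= spec["max_nodes"]:
        raise ValueError(f"{node_type} clusters need "
                         f"{spec['min_nodes']}-{spec['max_nodes']} nodes")
    return spec["tb_per_node"] * node_count


def smallest_cluster(node_type, data_tb):
    """Fewest nodes of this type whose combined storage holds data_tb."""
    spec = NODE_TYPES[node_type]
    needed = max(spec["min_nodes"], math.ceil(data_tb / spec["tb_per_node"]))
    if needed > spec["max_nodes"]:
        raise ValueError(f"{data_tb} TB does not fit in an {node_type} cluster")
    return needed


for name, spec in NODE_TYPES.items():
    low = cluster_capacity_tb(name, spec["min_nodes"])
    high = cluster_capacity_tb(name, spec["max_nodes"])
    print(f"{name}: {low} TB to {high} TB")

print(smallest_cluster("HS1.8XL", data_tb=100))
```

The single-node HS1.XL option (2 TB) is outside this sketch, since it is listed separately from the cluster ranges.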
22. Amazon Redshift is priced to let you analyze all your data
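                    | Price Per Hour for HS1.XL Single Node | Effective Hourly Price Per TB | Effective Annual Price per TB
--------------------+---------------------------------------+-------------------------------+-------------------------------
 On-Demand          | $0.850                                | $0.425                        | $3,723
 1 Year Reservation | $0.500                                | $0.250                        | $2,190
 3 Year Reservation | $0.228                                | $0.114                        | $999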
Simple Pricing
Number of Nodes x Cost per Hour
No charge for Leader Node
No upfront costs
Pay as you go
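The per-TB columns in the pricing figures can be reproduced from the hourly node price, assuming 2 TB per HS1.XL node and 8,760 hours per year. The sketch below shows that arithmetic along with the "Number of Nodes x Cost per Hour" rule. The function names are hypothetical, and the prices are the ones quoted on this slide, not current pricing.

```python
# Reproduces the slide's effective per-TB prices from the hourly node price.
# Assumptions: 2 TB per HS1.XL node, 24 * 365 hours per year; function and
# constant names are hypothetical.
HOURS_PER_YEAR = 24 * 365
TB_PER_HS1_XL_NODE = 2

HOURLY_PRICE_PER_NODE = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_hourly_price_per_tb(option):
    """Hourly node price divided by the node's storage in TB."""
    return HOURLY_PRICE_PER_NODE[option] / TB_PER_HS1_XL_NODE


def effective_annual_price_per_tb(option):
    """Effective hourly price per TB over a full year."""
    return effective_hourly_price_per_tb(option) * HOURS_PER_YEAR


def cluster_cost_per_hour(option, node_count):
    """Simple pricing: number of nodes x cost per hour (leader node is free)."""
    return node_count * HOURLY_PRICE_PER_NODE[option]


for option in HOURLY_PRICE_PER_NODE:
    hourly = effective_hourly_price_per_tb(option)
    annual = effective_annual_price_per_tb(option)
    print(f"{option}: ${hourly:.3f}/TB/hour, ${annual:,.0f}/TB/year")
```

Rounded to whole dollars, the annual figures match the slide: $3,723, $2,190, and $999 per TB.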
23. Amazon Redshift is easy to use
• Provision in minutes
• Monitor query performance
• Point-and-click resize
• Built-in security
• Automatic backups
26. Amazon Redshift integrates with multiple data sources
Amazon DynamoDB
Amazon Elastic MapReduce
Amazon Simple Storage Service (S3)
Amazon Elastic Compute Cloud (EC2)
AWS Storage Gateway Service
Corporate Data Center
Amazon Relational Database Service (RDS)
Amazon Redshift
More coming soon…
27. Amazon Redshift provides multiple data loading options
• Upload to Amazon S3
• AWS Import/Export
• AWS Direct Connect
• Work with a partner
– Data Integration
– Systems Integrators
More coming soon…
28. Amazon Redshift works with your existing analysis tools
JDBC/ODBC
Amazon Redshift
More coming soon…