In this session, we introduce Amazon Redshift, a new petabyte-scale data warehouse service. We walk through the basics of the Redshift architecture, launch a new cluster, and run SQL queries across a large-scale public dataset. After demonstrating how easy it is to get started with Redshift, we show how to visualize and query large-scale datasets, running queries, reports, and analytics against millions of rows in just a few seconds.
Getting Started with Amazon Redshift
1. Getting Started with Amazon Redshift
AWS 2013 Summit
Ben Butler:
butlerb@amazon.com
@bensbutler
Federal Solutions Architect
World Wide Public Sector
2. Amazon Web Services managed data services
Simple Storage Service
Import/Export
Glacier (cold storage)
DynamoDB (NoSQL database)
Relational Database Service (RDS)
Elastic MapReduce (managed Hadoop)
Big Data
Redshift (data warehouse)
Data Pipeline (managed data workflows)
3. AWS Database Services
Fully managed SQL database service for OLTP workloads
Fully managed NoSQL service for massively scalable, high-throughput, low-latency workloads
Fully managed, fast and powerful, petabyte-scale data warehouse service
Fully managed Memcached-compliant in-memory caching service
4. Traditional data warehousing is expensive and complicated
Expensive Hardware and Software
Complex Tuning and Admin
Enterprises average between 3 and 4 DBAs per data warehouse
Sources:
Oracle technology global price list 11/1/2012
Gartner: Critical factors in calculating the data warehouse TCO, July 2009
5. Customers Aren’t Happy with Today’s Solutions
Large Companies Small Companies
Expensive
Hard to scale
Can't afford to have a data warehouse
6. Most data never makes it to a data warehouse
Chart: The Data Analysis Gap (Enterprise Data vs. Data in Warehouse, 1990 to 2020)
Enterprise data is growing at over 50% yearly
Data warehousing is growing at less than 10% yearly
Most data is left on the floor
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
7. Our customers have been asking us for data warehousing done the AWS way
No upfront costs, pay as you go
Really fast performance at a really low price
Open and flexible with support for popular BI tools
Easy to provision and scale up massively
8. Amazon Redshift is
A fast and powerful, petabyte-scale data warehouse that is
A Lot Faster
A Lot Cheaper
A Whole Lot Simpler
Delivered as a managed service
10. Amazon Redshift dramatically reduces IO
Column storage
Data compression
Zone maps
Direct-attached storage
Large data block sizes
Id Age State
123 20 CA
345 25 WA
678 40 FL
Row storage Column storage
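The I/O saving from column storage can be illustrated with the three-row table on this slide. The sketch below is a toy in-memory model, not Redshift's actual storage format; the function names and the "values read" counter are assumptions made for illustration only.

```python
# The table from the slide, stored row by row.
rows = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]

# The same table stored column by column.
columns = {
    "id": [123, 345, 678],
    "age": [20, 25, 40],
    "state": ["CA", "WA", "FL"],
}


def avg_age_row_store(rows):
    """Average age from row storage: every field of every row is read."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # id, age, and state all come along
        total += row[1]
    return total / len(rows), values_read


def avg_age_column_store(columns):
    """Average age from column storage: only the age column is read."""
    ages = columns["age"]
    return sum(ages) / len(ages), len(ages)


row_avg, row_reads = avg_age_row_store(rows)
col_avg, col_reads = avg_age_column_store(columns)
print("row store:   ", row_avg, "values read:", row_reads)
print("column store:", col_avg, "values read:", col_reads)
```

Both layouts give the same answer, but the column layout touches a third of the values here. The gap widens as tables gain more columns that a query does not reference.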
11. Amazon Redshift parallelizes and distributes everything
Query
Load
Backup
Restore
Resize
Diagram: Common BI Tools connect via JDBC/ODBC to the Leader Node; the Leader Node and three Compute Nodes are connected by a 10GigE mesh.
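The leader/compute split can be sketched as a scatter-gather aggregate. This is only a conceptual model under stated assumptions: the modulo distribution rule, the function names, and the toy data are hypothetical, and the partial results are computed one after another here, whereas compute nodes in a real cluster would work in parallel.

```python
from collections import defaultdict

NUM_COMPUTE_NODES = 3  # matches the three compute nodes in the diagram

# Toy (id, age) records; the values are made up for illustration.
records = [(i, 20 + (i % 30)) for i in range(1, 1001)]


def distribute(records, num_nodes):
    """Assign each record to a compute node by id modulo the node count."""
    slices = defaultdict(list)
    for rec in records:
        slices[rec[0] % num_nodes].append(rec)
    return slices


def compute_node_partial(node_records):
    """Work done independently on one node: a partial sum and a count."""
    return sum(age for _, age in node_records), len(node_records)


def leader_average_age(records, num_nodes):
    """Leader role: fan the work out, then merge the partial results."""
    slices = distribute(records, num_nodes)
    partials = [compute_node_partial(slices[n]) for n in range(num_nodes)]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(leader_average_age(records, NUM_COMPUTE_NODES))
```

The point of the design is that each node only scans its own slice, and the leader combines small partial results instead of moving raw rows.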
12. Amazon Redshift lets you start small and grow big
Extra Large Node
3 spindles, 2TB, 15GiB RAM
2 virtual cores, 10GigE
Single Node (2TB)
Cluster 2-32 Nodes (4TB – 64TB)
8 Extra Large Node
24 spindles, 16TB, 120GiB RAM
16 virtual cores, 10GigE
Cluster 2-100 Nodes (32TB – 1.6PB)
Note: Nodes not to scale
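The capacity ranges on this slide follow from multiplying node count by per-node storage. A minimal sketch, assuming the short labels "XL" and "8XL" as stand-ins for the two node sizes (the labels and the function name are illustrative, and the single-node 2TB option is not modeled):

```python
# Per-node storage and cluster size limits as listed on the slide.
NODE_TYPES = {
    "XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type, num_nodes):
    """Return total cluster storage in TB for a multi-node cluster."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= num_nodes <= spec["max_nodes"]:
        raise ValueError(
            f"{node_type} clusters support {spec['min_nodes']} to "
            f"{spec['max_nodes']} nodes, got {num_nodes}"
        )
    return spec["tb_per_node"] * num_nodes


for node_type, spec in NODE_TYPES.items():
    low = cluster_capacity_tb(node_type, spec["min_nodes"])
    high = cluster_capacity_tb(node_type, spec["max_nodes"])
    print(f"{node_type}: {low} TB to {high} TB")
```

The largest configuration, 100 nodes at 16TB each, gives 1,600 TB, which is the 1.6PB figure quoted on the slide.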
13. Amazon Redshift is priced to let you analyze all your data
Pricing option | Price per hour (HS1.XL single node) | Effective hourly price per TB | Effective annual price per TB
On-Demand | $0.850 | $0.425 | $3,723
1 Year Reservation | $0.500 | $0.250 | $2,190
3 Year Reservation | $0.228 | $0.114 | $999
Simple Pricing: Number of Nodes x Cost per Hour
No charge for Leader Node
Pay as you grow
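The per-TB figures on this slide can be reproduced from the hourly node price. The sketch assumes 2TB per HS1.XL node (from the node specifications earlier in the deck) and 8,760 hours per year; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760
TB_PER_NODE = 2            # HS1.XL storage per node

# Hourly price per HS1.XL node, as listed on the slide.
PRICE_PER_NODE_HOUR = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_prices(price_per_node_hour):
    """Return (hourly price per TB, annual price per TB)."""
    hourly_per_tb = price_per_node_hour / TB_PER_NODE
    annual_per_tb = hourly_per_tb * HOURS_PER_YEAR
    return hourly_per_tb, annual_per_tb


def cluster_cost_per_hour(num_nodes, price_per_node_hour):
    """Simple pricing: number of nodes x cost per hour (leader node is free)."""
    return num_nodes * price_per_node_hour


for option, price in PRICE_PER_NODE_HOUR.items():
    hourly, annual = effective_prices(price)
    print(f"{option}: ${hourly:.3f}/TB/hour, ${annual:,.0f}/TB/year")
```

For example, a 4-node on-demand cluster would come to 4 x $0.850 = $3.40 per hour under this model.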
14. Amazon Redshift simplifies provisioning
• Create a cluster in minutes
• Automatically patch your OS and data warehouse software
• Scale up to 1.6PB with a few clicks and no downtime
23. Amazon Redshift integrates with your data sources
Amazon DynamoDB
Amazon Elastic MapReduce
Amazon Simple Storage Service (S3)
Amazon EC2
AWS Storage Gateway Service
Corporate Data Center
Amazon Relational Database Service (RDS)
Amazon Redshift
25. Pilot results have been dramatic
Current environment: 32 nodes, 128 CPUs, 4.2TB RAM, 1.6 PB disk
Tested a 2-billion-row data set with 6 representative queries on a 2-node Amazon Redshift cluster
Queries ran between 12x and 150x faster
26. It’s still day one for Amazon Redshift
Increase Adoption
Scale Infrastructure
Increase Efficiency
Lower Price
Get Feedback
Add features that matter
Raise Value
27. And now for the demo…
Download this presentation and video at: http://bit.ly/aws-summit-redshift
• Install tools and drivers
• Grab Census data
• Prepare it and put in S3
• Prepare IAM credentials
• COPY the data into Redshift
• Run queries
• Intro to a BI tool
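The COPY step in the list above can be sketched as a small helper that assembles the SQL text. The slides do not show the actual demo commands, so the table name, S3 path, and credential placeholders below are hypothetical; the statement follows the Redshift COPY form with a CREDENTIALS string and a DELIMITER option. The helper only builds the string and would still need to be sent over a JDBC/ODBC or PostgreSQL-compatible connection.

```python
def build_copy_statement(table, s3_path, access_key_id, secret_access_key,
                         delimiter=","):
    """Build a Redshift COPY statement that loads delimited files from S3.

    Values are interpolated directly, so this is only suitable for
    trusted, hand-written inputs such as a demo script.
    """
    credentials = (
        f"aws_access_key_id={access_key_id};"
        f"aws_secret_access_key={secret_access_key}"
    )
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"CREDENTIALS '{credentials}'\n"
        f"DELIMITER '{delimiter}';"
    )


# Hypothetical names: replace with your own table, bucket, and IAM keys.
copy_sql = build_copy_statement(
    table="census",
    s3_path="s3://my-bucket/census/",
    access_key_id="<access-key-id>",
    secret_access_key="<secret-access-key>",
)
print(copy_sql)
```

Pointing COPY at an S3 prefix rather than a single file lets the cluster load multiple files under that prefix, which fits the parallel load model described earlier in the deck.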