© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Data Warehousing with Amazon Redshift
Karan Desai
deskaran@amazon.com
Solutions Architect
Neel Mitra
indranem@amazon.com
Solutions Architect
Amazon Redshift
• PostgreSQL • Columnar • MPP • OLAP
• Built on and integrated with: AWS IAM, Amazon VPC, Amazon S3, AWS KMS, Amazon Route 53, Amazon CloudWatch, Amazon EC2
February 2013 – January 2018
• > 125 significant patches
• > 165 significant features
Amazon Redshift Architecture
• Massively parallel, shared nothing columnar architecture
• Leader node
– SQL endpoint
– Stores metadata
– Coordinates parallel SQL processing
• Compute nodes
– Local, columnar storage
– Execute queries in parallel
– Load, unload, backup, restore
• Amazon Redshift Spectrum nodes
– Execute queries directly against Amazon Simple Storage Service (Amazon S3)
[Diagram: SQL clients/BI tools connect via JDBC/ODBC to the leader node, which coordinates compute nodes (each 128GB RAM, 16TB disk, 16 cores); Amazon Redshift Spectrum nodes 1...N query data in Amazon S3]
Redshift Cluster Architecture
• Massively parallel, shared nothing
• Leader node
– SQL endpoint
– Stores metadata
– Coordinates parallel SQL processing
• Compute nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore
[Diagram: SQL clients/BI tools connect via JDBC/ODBC to the leader node; compute nodes (each 128GB RAM, 16TB disk, 16 cores) interconnect over 10 GigE; ingestion, backup, and restore run against S3 / EMR / DynamoDB / SSH]
Leader Node detail:
• Parser & Rewriter
• Planner & Optimizer
• Code Generator
– Input: optimized plan
– Output: >= 1 C++ functions
• Compiler
• Task Scheduler
• Workload Management
– Admission
– Scheduling
• PostgreSQL Catalog Tables
Compute Node detail:
• Query execution processes
• Backup & restore processes
• Replication processes
• Local storage
– Disks
– Slices
– Tables
– Columns
– Blocks
Terminology and Concepts: Columnar
• Amazon Redshift uses a columnar architecture for storing data on disk
• Goal: reduce I/O for analytics queries
• Physically store data on disk by column rather than row
• Only read the column data that is required
Columnar Architecture: Example
Row-based storage
• Need to read everything
• Unnecessary I/O
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
SELECT min(dt) FROM deep_dive;
Columnar Architecture: Example
Column-based storage
• Only scan blocks for the relevant column
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
SELECT min(dt) FROM deep_dive;
Terminology and Concepts: Compression
• Goals:
– Allow more data to be stored within an Amazon Redshift cluster
– Improve query performance by decreasing I/O
• Impact:
– Allows two to four times more data to be stored within the cluster
• By default, COPY automatically analyzes and compresses data on first load into an empty table
• ANALYZE COMPRESSION is a built-in command that finds the optimal compression for each column of an existing table (see the sketch below)
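As a minimal sketch, ANALYZE COMPRESSION can be pointed at the running example table; it samples the data and reports a suggested encoding per column along with an estimated reduction (the output shown is indicative of shape, not exact values):

ANALYZE COMPRESSION deep_dive;

-- Typical output columns:
-- table     | column | encoding | est_reduction_pct
-- deep_dive | aid    | zstd     | ...

Note that the command takes a table lock while sampling, and applying a recommended encoding still requires a table rebuild.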
Compression: Example
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
Add 1 of 11 different encodings to each column
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
Compression: Example
• More efficient compression is due to storing the same data type in the columnar architecture
• Columns grow and shrink independently
• Reduces storage requirements
• Reduces I/O
CREATE TABLE deep_dive (
aid INT ENCODE ZSTD
,loc CHAR(3) ENCODE BYTEDICT
,dt DATE ENCODE RUNLENGTH
);
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
Best Practices: Compression
• Apply compression to all tables
• Use the ANALYZE COMPRESSION command to find optimal compression
– RAW (no compression) for sparse columns and small tables
• Changing column encoding requires a table rebuild:
https://github.com/awslabs/amazon-redshift-utils/tree/master/src/ColumnEncodingUtility
• Verifying columns are compressed:
SELECT "column", type, encoding FROM pg_table_def
WHERE tablename = 'deep_dive';
column | type | encoding
--------+--------------+----------
aid | integer | zstd
loc | character(3) | bytedict
dt | date | runlength
Terminology and Concepts: Blocks
• Column data is persisted to 1 MB immutable blocks
• Blocks are individually encoded with 1 of 11 encodings
• A full block can contain millions of values
Terminology and Concepts: Zone Maps
• Goal: eliminate unnecessary I/O
• In-memory block metadata
• Contains per-block min and max values
• All blocks automatically have zone maps
• Effectively prunes blocks that cannot contain data for a given query
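For a rough look under the hood, the per-block min/max metadata can be inspected through the STV_BLOCKLIST system table (a sketch; the subquery on SVV_TABLE_INFO resolves the table ID, and minvalue/maxvalue are stored as raw integer representations of the column values):

SELECT col, blocknum, num_values, minvalue, maxvalue
FROM stv_blocklist
WHERE tbl = (SELECT table_id FROM svv_table_info
             WHERE "table" = 'deep_dive')
ORDER BY col, blocknum;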
Terminology and Concepts: Data Sorting
• Goal: make queries run faster by increasing the effectiveness of zone maps and reducing I/O
• Impact: enables range-restricted scans to prune blocks by leveraging zone maps
• Achieved with the table property SORTKEY defined on one or more columns
• The optimal sort key depends on:
– Query patterns
– Business requirements
– Data profile
• Data is stored on disk in sorted order according to the sort key
• The Redshift query optimizer uses sort order when it determines optimal query plans
Sort Key: Example
Add a sort key to one or more columns to physically sort the data on disk:
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) SORTKEY (dt, loc);
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) SORTKEY (dt);
deep_dive
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
deep_dive (sorted)
aid loc dt
3 SFO 2017-04-01
4 JFK 2017-05-14
2 JFK 2017-10-20
1 SFO 2017-10-20
Zone Maps and Sorting: Example
SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017';
Sorted by date:
MIN: 01-JUNE-2017 MAX: 06-JUNE-2017
MIN: 07-JUNE-2017 MAX: 12-JUNE-2017
MIN: 13-JUNE-2017 MAX: 21-JUNE-2017
MIN: 21-JUNE-2017 MAX: 30-JUNE-2017
Unsorted table:
MIN: 01-JUNE-2017 MAX: 20-JUNE-2017
MIN: 08-JUNE-2017 MAX: 30-JUNE-2017
MIN: 12-JUNE-2017 MAX: 20-JUNE-2017
MIN: 02-JUNE-2017 MAX: 25-JUNE-2017
When the table is sorted by date, the zone maps let the scan skip every block except the one whose range covers 09-JUNE-2017; in the unsorted table, every block's range overlaps the predicate, so all blocks must be read.
Best Practices: Sort Keys
• Place the sort key on columns that are frequently filtered on, placing the lowest-cardinality columns first
– On most fact tables, the first sort key column should be a temporal column
– Columns added to a sort key after a high-cardinality column are not effective
• With an established workload, use the following scripts to help find sort key suggestions:
https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/filter_used.sql
https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/predicate_columns.sql
• Design considerations:
– Sort keys are less beneficial on small tables
– Define four or fewer sort key columns; more will yield marginal gains and increased ingestion overhead
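To see whether a sorted table has drifted, a quick sketch against SVV_TABLE_INFO (unsorted is the percentage of rows outside sorted order; VACUUM re-sorts and ANALYZE refreshes statistics):

SELECT "table", unsorted, stats_off
FROM svv_table_info
ORDER BY unsorted DESC;

-- When unsorted grows large on a hot table:
VACUUM deep_dive;
ANALYZE deep_dive;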
Terminology and Concepts: Slices
• A slice can be thought of as a virtual compute node
– Unit of data partitioning
– Parallel query processing
• Facts about slices:
– Each compute node has either 2, 16, or 32 slices
– Table rows are distributed to slices
– A slice processes only its own data
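A minimal sketch for listing the slice layout of a running cluster via the STV_SLICES system table (one row per slice):

SELECT node, slice
FROM stv_slices
ORDER BY node, slice;

-- e.g., a 2-node dc2.large cluster returns 4 rows: 2 slices per node.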
Data Distribution
• Distribution style is a table property that dictates how the table's data is distributed throughout the cluster:
– KEY: value is hashed, the same value goes to the same location (slice)
– ALL: full table data goes to the first slice of every node
– EVEN: round robin
• Goals:
– Distribute data evenly for parallel processing
– Minimize data movement during query processing
[Diagram: KEY, ALL, and EVEN row placement across Node 1 (slices 1–2) and Node 2 (slices 3–4)]
Data Distribution: Example
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE (EVEN|KEY|ALL);
[Diagram: an empty copy of table deep_dive on each slice (Node 1: slices 0–1; Node 2: slices 2–3), with user columns aid, loc, dt and hidden system columns ins, del, row]
Data Distribution: EVEN Example
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE EVEN;
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: the four rows are placed round robin, one row on each of the four slices]
Data Distribution: KEY Example #1
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (loc);
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: loc is hashed, so the two SFO rows land on one slice and the two JFK rows on another; the remaining slices hold no rows]
Data Distribution: KEY Example #2
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (aid);
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: aid is high cardinality, so hashing spreads the rows evenly, one row per slice]
Data Distribution: ALL Example
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE ALL;
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: all four rows are replicated to the first slice of each node (slices 0 and 2 hold 4 rows each; slices 1 and 3 hold none)]
Best Practices: Data Distribution
DISTSTYLE KEY
• Goals:
– Optimize JOIN performance between large tables
– Optimize INSERT INTO SELECT performance
– Optimize GROUP BY performance
• The column being distributed on should have high cardinality and not cause row skew:
SELECT diststyle, skew_rows FROM svv_table_info WHERE "table" = 'deep_dive';
diststyle | skew_rows
-----------+-----------
KEY(aid) | 1.07 <-- ratio between the slices with the most and the fewest rows
DISTSTYLE ALL
• Goals:
– Optimize JOIN performance with dimension tables
– Reduce disk usage on small tables
• Use for small and medium-size dimension tables (< 3M rows)
DISTSTYLE EVEN
• Use if neither KEY nor ALL applies (or you are unsure)
Best Practices: Table Design Summary
• Materialize often-filtered columns from dimension tables into fact tables
• Materialize often-calculated values into tables
• Avoid distribution keys on temporal columns
• Keep data types as wide as necessary, but no wider
– VARCHAR, CHAR, and NUMERIC
• Add compression to columns
– Optimal compression can be found using ANALYZE COMPRESSION
• Add sort keys on the primary columns that are filtered on
Workload Management (WLM)
Allows for the separation of different query workloads
• Goals:
– Prioritize important queries
– Throttle/abort less important queries
– Control the number of concurrently executing queries
– Divide cluster memory
– Set query timeouts to abort long-running queries
Terminology and Concepts: WLM
• Queues
– Assigned a percentage of cluster memory
– SQL queries execute in a queue based on
• User group: which groups the user belongs to
• Query group: a session-level variable
• Slots
– Division of memory within a WLM queue, correlated with the number of simultaneously running queries
– WLM_QUERY_SLOT_COUNT is a session-level variable, useful to increase for memory-intensive operations (see the sketch below)
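A minimal sketch of claiming extra slots for a memory-hungry statement in the current session (3 is an arbitrary slot count; the statement then gets three slots' worth of the queue's memory):

SET wlm_query_slot_count TO 3;
VACUUM deep_dive;               -- or any memory-intensive statement
SET wlm_query_slot_count TO 1;  -- return to the default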
Workload Management: Example
• Use case:
– Light ingestion/ELT on a continuous cadence of 10 minutes
– Peak reporting workload during business hours (7 a.m.–7 p.m.)
– Heavy ingestion/ELT nightly (11 p.m.–3 a.m.)
• User types:
– Business reporting and dashboards
– Analysts and data science teams
– Database administrators
Workload Management: Example
Create a queue for each workload type:

Queue Name         | Memory | Slots/Concurrency | Timeout (seconds)
-------------------+--------+-------------------+------------------
Ingestion          | 20%    | 2                 | None
Dashboard          | 50%    | 10                | 120
Default (Analysts) | 25%    | 3                 | None

• Unallocated memory goes into a general pool that can be used by any queue
• The hidden superuser queue can be used by admins, manually switched into:
SET query_group TO 'superuser';
• The superuser queue has a single slot, the equivalent of 5–7% memory allocation, and no timeout
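Queries can likewise be routed to a specific queue by query group; a sketch, where 'dashboard' is a hypothetical group name mapped to the Dashboard queue in the WLM configuration:

SET query_group TO 'dashboard';
SELECT count(*) FROM deep_dive;  -- runs in the Dashboard queue
RESET query_group;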
Query Monitoring Rules (QMR)
Extension of workload management (WLM)
Allows automatic handling of runaway (poorly written) queries
• Rules applied to a WLM queue allow queries to be:
– LOGGED
– ABORTED
– HOPPED
• Goals:
– Protect against wasteful use of the cluster
– Log resource-intensive queries
Query Monitoring Rules (QMR)
• Metrics with operators and values (e.g., query_cpu_time > 1000) create a predicate
• Multiple predicates can be AND-ed together to create a rule
• Multiple rules can be defined for a queue in WLM; these rules are OR-ed together
If { rule } then [action]
{ rule : metric operator value } e.g.: rows_scanned > 100000
• Metric: cpu_time, query_blocks_read, rows scanned, query execution time, CPU & I/O skew per slice, join_row_count, etc.
• Operator: <, >, ==
• Value: integer
• [action]: hop, log, abort
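To audit which rules have fired, a sketch against the STL_WLM_RULE_ACTION system log (one row is recorded per rule action taken):

SELECT query, service_class, rule, action, recordtime
FROM stl_wlm_rule_action
ORDER BY recordtime DESC
LIMIT 20;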
Best Practices: WLM and QMR
• Keep the number of WLM queues to a minimum, typically just three queues, to avoid having unused queues
https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/wlm_apex_hourly.sql
• Use WLM to limit ingestion/ELT concurrency to two to three
• To maximize query throughput, use WLM to throttle the number of concurrent queries to 15 or fewer
• Use QMR rather than WLM to set query timeouts
• Use QMR to log long-running queries
• Save the superuser queue for administration tasks and canceling queries
Terminology and Concepts: Node Types
• Dense Compute (DC2): solid state disks
• Dense Storage (DS2): magnetic disks

Instance Type | Disk Type | Size    | Memory | CPUs
--------------+-----------+---------+--------+-----
DC2 large     | NVMe SSD  | 160 GB  | 16 GB  | 2
DC2 8xlarge   | NVMe SSD  | 2.56 TB | 244 GB | 32
DS2 xlarge    | Magnetic  | 2 TB    | 32 GB  | 4
DS2 8xlarge   | Magnetic  | 16 TB   | 244 GB | 36
Best Practices: Cluster Sizing
• Use at least two compute nodes (a multi-node cluster) in production for data mirroring
– The leader node is provided at no additional cost
• Amazon Redshift is significantly faster in a VPC compared to EC2 Classic
• Maintain at least 20% free space, or three times the size of the largest table (see the sketch below)
– Scratch space for usage, rewriting tables
– Free space is required for VACUUM to re-sort tables
– Temporary tables used for intermediate query results
• The maximum number of available Amazon Redshift Spectrum nodes is a function of the number of slices in the Amazon Redshift cluster
• If you're using DC1 instances, upgrade to the DC2 instance type
– Same price as DC1, significantly faster
– Reserved Instances do not automatically transfer over
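To track usage against the 20% free-space guideline, a rough sketch using STV_PARTITIONS (used and capacity are reported in 1 MB blocks; the totals include mirrored copies, so treat the result as approximate):

SELECT sum(used) / 1024.0 AS used_gb,
       sum(capacity) / 1024.0 AS capacity_gb,
       round(100.0 * sum(used) / sum(capacity), 1) AS pct_used
FROM stv_partitions;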
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
• Fast at exabyte scale
• Elastic and highly available
• On-demand, pay-per-query
• High concurrency: multiple clusters access the same data
• No ETL: query data in place using open file formats
• Full Amazon Redshift SQL support
Query:
SELECT COUNT(*) FROM s3.ext_table;
[Diagram: JDBC/ODBC clients submit the query to Amazon Redshift, which fans it out to Amazon Redshift Spectrum nodes 1...N; those nodes scan Amazon S3 (exabyte-scale object storage), with table definitions resolved from a Data Catalog / Apache Hive Metastore]
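For context, a minimal sketch of how a table like s3.ext_table could be defined; the catalog database name, IAM role ARN, S3 path, and Parquet format below are illustrative assumptions, not taken from the deck:

CREATE EXTERNAL SCHEMA s3
FROM DATA CATALOG
DATABASE 'spectrum_db'                                   -- hypothetical catalog database
IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'   -- hypothetical role
CREATE EXTERNAL DATABASE IF NOT EXISTS;

CREATE EXTERNAL TABLE s3.ext_table (
  aid INT,
  loc CHAR(3),
  dt DATE
)
STORED AS PARQUET
LOCATION 's3://my-bucket/deep_dive/';                    -- hypothetical path

SELECT COUNT(*) FROM s3.ext_table;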
An Exabyte Query: Harry Potter
Roughly 140 TB of customer item order detail records for each day over the past 20 years.
190 million files across 15,000 partitions in S3. One partition per day for USA and rest of world.
Need a billion-fold reduction in data processed. Running this query using a 1,000-node Hive cluster would take over 5 years.*
• Compression: 5X
• Columnar file format: 10X
• Scanning with 2,500 nodes: 2,500X
• Static partition elimination: 2X
• Dynamic partition elimination: 350X
• Redshift's query optimizer: 40X
Total reduction: 3.5 billion X
* Estimated using a 20-node Hive cluster & 1.4 TB, assuming linear scaling
* Query used a 20-node DC1.8XLarge Amazon Redshift cluster
* Not actual sales data; generated for this demo based on the data format used by Amazon Retail
AWS Labs on GitHub—Amazon Redshift
• https://github.com/awslabs/amazon-redshift-utils
• https://github.com/awslabs/amazon-redshift-monitoring
• https://github.com/awslabs/amazon-redshift-udfs
• Admin Scripts: collection of utilities for running diagnostics on your cluster
• Admin Views: collection of utilities for managing your cluster, generating schema DDL, and so on
• Analyze Vacuum Utility: can be scheduled to vacuum and analyze the tables within your Amazon Redshift cluster
• Column Encoding Utility: applies optimal column encoding to an established schema with data already loaded
AWS Big Data Blog—Amazon Redshift
• Amazon Redshift Engineering's Advanced Table Design Playbook
https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-preamble-prerequisites-and-prioritization/
– Zach Christopherson
• Top 10 Performance Tuning Techniques for Amazon Redshift
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-techniques-for-amazon-redshift/
– Ian Meyers and Zach Christopherson
• 10 Best Practices for Amazon Redshift Spectrum
https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
– Po Hong and Peter Dalton
aws.amazon.com/activate
Everything and Anything Startups
Need to Get Started on AWS
Contenu connexe

Tendances

AWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the CloudAWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the CloudAmazon Web Services
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxSwathiPonugumati
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
AWS Cloud cost optimization
AWS Cloud cost optimizationAWS Cloud cost optimization
AWS Cloud cost optimizationYogesh Sharma
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Amazon Web Services
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Amazon Web Services
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeAmazon Web Services
 
SID201_IAM for Enterprises How Vanguard strikes the Balance Between Agility, ...
SID201_IAM for Enterprises How Vanguard strikes the Balance Between Agility, ...SID201_IAM for Enterprises How Vanguard strikes the Balance Between Agility, ...
SID201_IAM for Enterprises How Vanguard strikes the Balance Between Agility, ...Amazon Web Services
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWSSungmin Kim
 
Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon Web Services
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarAmazon Web Services
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineAmazon Web Services
 

Tendances (20)

AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 
AWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the CloudAWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the Cloud
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
AWS Cloud cost optimization
AWS Cloud cost optimizationAWS Cloud cost optimization
AWS Cloud cost optimization
 
ElastiCache & Redis
ElastiCache & RedisElastiCache & Redis
ElastiCache & Redis
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
 
Athena & Glue
Athena & GlueAthena & Glue
Athena & Glue
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
 
SID201_IAM for Enterprises How Vanguard strikes the Balance Between Agility, ...
SID201_IAM for Enterprises How Vanguard strikes the Balance Between Agility, ...SID201_IAM for Enterprises How Vanguard strikes the Balance Between Agility, ...
SID201_IAM for Enterprises How Vanguard strikes the Balance Between Agility, ...
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
 
Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
 

Similaire à Data Warehousing with Amazon Redshift

Data Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF LoftData Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF LoftAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Data Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SFData Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SFAmazon Web Services
 
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & SpectrumABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & SpectrumAmazon Web Services
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with LabAmazon Web Services
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFAmazon Web Services
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftAmazon Web Services
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon RedshiftBest Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon RedshiftAmazon Web Services
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Amazon Web Services
 
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech TalksData Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech TalksAmazon Web Services
 
Building High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataBuilding High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataAmazon Web Services
 

Similaire à Data Warehousing with Amazon Redshift (20)

Data Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF LoftData Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF Loft
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Data Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SFData Warehousing with Amazon Redshift: Data Analytics Week SF
Data Warehousing with Amazon Redshift: Data Analytics Week SF
 
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & SpectrumABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with Lab
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SF
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon RedshiftBest Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
 
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech TalksData Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
 
Building High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataBuilding High Performance Apps with In-memory Data
Building High Performance Apps with In-memory Data
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 

Plus de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Data Warehousing with Amazon Redshift

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Data Warehousing with Amazon Redshift Karan Desai deskaran@amazon.com Solutions Architect Neel Mitra indranem@amazon.com Solutions Architect
  • 2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved PostgreSQL Columnar MPP OLAP AWS IAMAmazon VPC Amazon S3 AWS KMS Amazon Route 53 Amazon CloudWatch Amazon EC2 Amazon Redshift
  • 3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved February 2013 January 2018 > 125 Significant Patches > 165 Significant Features
  • 4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Load Unload Backup Restore • Massively parallel, shared nothing columnar architecture • Leader node – SQL endpoint – Stores metadata – Coordinates parallel SQL processing • Compute nodes – Local, columnar storage – Executes queries in parallel – Load, unload, backup, restore • Amazon Redshift Spectrum nodes – Execute queries directly against Amazon Simple Storage Service (Amazon S3) SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores JDBC/ODBC 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node Leader Node Amazon S3 ... 1 2 3 4 N Amazon Redshift Spectrum Amazon Redshift Architecture
  • 5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Redshift Cluster Architecture • Massively parallel, shared nothing • Leader node – SQL endpoint – Stores metadata – Coordinates parallel SQL processing • Compute nodes – Local, columnar storage – Executes queries in parallel – Load, backup, restore 10 GigE Ingestion Backup Restore SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores S3 / EMR / DynamoDB / SSH JDBC/ODBC 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node Leader Node
  • 6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node
  • 7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node • Parser & Rewriter • Planner & Optimizer • Code Generator • Input: Optimized plan • Output: >=1 C++ functions • Compiler • Task Scheduler • Workload Management • Admission • Scheduling • PostgreSQL Catalog Tables
  • 8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node • Query execution processes • Backup & restore processes • Replication processes • Local Storage • Disks • Slices • Tables • Columns • Blocks
  • 9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node 128GB RAM 16TB disk 16 cores Compute Node Leader Node • Query execution processes • Backup & restore processes • Replication processes • Local Storage • Disks • Slices • Tables • Columns • Blocks
  • 10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Terminology and Concepts: Columnar • Amazon Redshift uses a columnar architecture for storing data on disk • Goal: reduce I/O for analytics queries • Physically store data on disk by column rather than row • Only read the column data that is required
  • 11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Columnar Architecture: Example Row-based storage • Need to read everything • Unnecessary I/O aid loc dt CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14 SELECT min(dt) FROM deep_dive;
  • 12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Columnar Architecture: Example Column-based storage • Only scan blocks for relevant column CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14 SELECT min(dt) FROM deep_dive aid loc dt
  • 13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Terminology and Concepts: Compression • Goals: • Allow more data to be stored within an Amazon Redshift cluster • Improve query performance by decreasing I/O • Impact: • Allows two to four times more data to be stored within the cluster • By default, COPY automatically analyzes and compresses data on first load into an empty table • ANALYZE COMPRESSION is a built-in command that will find the optimal compression for each column on an existing table
  • 14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Compression: Example aid loc dt aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14 Add 1 of 11 different encodings to each column CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date );
  • 15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Compression: Example • More efficient compression is due to storing the same data type in the columnar architecture • Columns grow and shrink independently • Reduces storage requirements • Reduces I/O aid loc dt CREATE TABLE deep_dive ( aid INT ENCODE ZSTD ,loc CHAR(3) ENCODE BYTEDICT ,dt DATE ENCODE RUNLENGTH ); aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14
  • 16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Best Practices: Compression • Apply compression to all tables • Use ANALYZE COMPRESSION command to find optimal compression – RAW (no compression) for sparse columns and small tables • Changing column encoding requires a table rebuild https://github.com/awslabs/amazon-redshift-utils/tree/master/src/ColumnEncodingUtility • Verifying columns are compressed: SELECT "column", type, encoding FROM pg_table_def WHERE tablename = 'deep_dive'; column | type | encoding --------+--------------+---------- aid | integer | zstd loc | character(3) | bytedict dt | date | runlength
  • 17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Terminology and Concepts: Blocks • Column data is persisted to 1 MB immutable blocks • Blocks are individually encoded with 1 of 11 encodings • A full block can contain millions of values
  • 18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Terminology and Concepts: Zone Maps • Goal: eliminates unnecessary I/O • In-memory block metadata • Contains per-block min and max values • All blocks automatically have zone maps • Effectively prunes blocks which cannot contain data for a given query
  • 19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Terminology and Concepts: Data Sorting Goal: make queries run faster by increasing the effectiveness of zone maps and reducing I/O Impact: enables range-restricted scans to prune blocks by leveraging zone maps Achieved with the table property SORTKEY defined on one or more columns Optimal sort key is dependent on: • Query patterns • Business requirements • Data profile • Data is stored on disk in sorted order according to the sort key. • Redshift query optimizer uses sort order when it determines optimal query plans
  • 20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Sort Key: Example Add a sort key to one or more columns to physically sort the data on disk CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) SORT KEY (dt, loc); CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) SORTKEY(dt); deep_dive aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14 deep_dive (sorted) aid loc dt 3 SFO 2017-04-01 4 JFK 2017-05-14 2 JFK 2017-10-20 1 SFO 2017-10-20 deep_dive (sorted) aid loc dt 3 SFO 2017-04-01 deep_dive (sorted) aid loc dt 3 SFO 2017-04-01 4 JFK 2017-05-14 deep_dive (sorted) aid loc dt 3 SFO 2017-04-01 4 JFK 2017-05-14 2 JFK 2017-10-20 deep_dive (sorted) aid loc dt 3 SFO 2017-04-01 4 JFK 2017-05-14 2 JFK 2017-10-20 1 SFO 2017-10-20
  • 21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017'; MIN: 01-JUNE-2017 MAX: 06-JUNE-2017 MIN: 07-JUNE-2017 MAX: 12-JUNE-2017 MIN: 13-JUNE-2017 MAX: 21-JUNE-2017 MIN: 21-JUNE-2017 MAX: 30-JUNE-2017 Sorted by date MIN: 01-JUNE-2017 MAX: 20-JUNE-2017 MIN: 08-JUNE-2017 MAX: 30-JUNE-2017 MIN: 12-JUNE-2017 MAX: 20-JUNE-2017 MIN: 02-JUNE-2017 MAX: 25-JUNE-2017 Unsorted table Zone Maps and Sorting: Example
  • 22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Best Practices: Sort Keys • Place the sort key on columns that are frequently filtered on, placing the lowest cardinality columns first – On most fact tables, the first sort key column should be a temporal column – Columns added to a sort key after a high-cardinality column are not effective • With an established workload, use the following scripts to help find sort key suggestions: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/filter_used.sql https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/predicate_columns.sql • Design considerations: • Sort keys are less beneficial on small tables • Define four or less sort key columns—more will result in marginal gains and increased ingestion overhead
  • 23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Terminology and Concepts: Slices • A slice can be thought of like a virtual compute node – Unit of data partitioning – Parallel query processing • Facts about slices: – Each compute node has either 2, 16, or 32 slices – Table rows are distributed to slices – A slice processes only its own data
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Data Distribution
• Distribution style is a table property which dictates how that table's data is distributed throughout the cluster:
  – KEY: value is hashed, same value goes to same location (slice)
  – ALL: full table data goes to the first slice of every node
  – EVEN: round robin
• Goals:
  – Distribute data evenly for parallel processing
  – Minimize data movement during query processing
(Diagram: KEY, ALL, and EVEN layouts across Node 1/Node 2, Slices 1–4)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Data Distribution: Example

  CREATE TABLE deep_dive (
     aid INT      --audience_id
    ,loc CHAR(3)  --location
    ,dt  DATE     --date
  ) DISTSTYLE (EVEN|KEY|ALL);

(Diagram: an empty deep_dive table on each of the four slices—Node 1: Slice 0, Slice 1; Node 2: Slice 2, Slice 3—with user columns aid, loc, dt and hidden system columns ins, del, row)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Data Distribution: EVEN Example

  CREATE TABLE deep_dive (
     aid INT      --audience_id
    ,loc CHAR(3)  --location
    ,dt  DATE     --date
  ) DISTSTYLE EVEN;

  INSERT INTO deep_dive VALUES
    (1, 'SFO', '2016-09-01'),
    (2, 'JFK', '2016-09-14'),
    (3, 'SFO', '2017-04-01'),
    (4, 'JFK', '2017-05-14');

Result: rows are distributed round robin, so each of the four slices ends up with one row (Rows: 1, 1, 1, 1).
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Data Distribution: KEY Example #1

  CREATE TABLE deep_dive (
     aid INT      --audience_id
    ,loc CHAR(3)  --location
    ,dt  DATE     --date
  ) DISTSTYLE KEY DISTKEY (loc);

  INSERT INTO deep_dive VALUES
    (1, 'SFO', '2016-09-01'),
    (2, 'JFK', '2016-09-14'),
    (3, 'SFO', '2017-04-01'),
    (4, 'JFK', '2017-05-14');

Result: each distinct loc value hashes to one slice, so the two 'SFO' rows land together and the two 'JFK' rows land together (Rows: 2, 2, 0, 0); a low-cardinality distribution key leaves slices idle.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Data Distribution: KEY Example #2

  CREATE TABLE deep_dive (
     aid INT      --audience_id
    ,loc CHAR(3)  --location
    ,dt  DATE     --date
  ) DISTSTYLE KEY DISTKEY (aid);

  INSERT INTO deep_dive VALUES
    (1, 'SFO', '2016-09-01'),
    (2, 'JFK', '2016-09-14'),
    (3, 'SFO', '2017-04-01'),
    (4, 'JFK', '2017-05-14');

Result: aid is high cardinality, so the rows hash evenly across the cluster, one row per slice (Rows: 1, 1, 1, 1).
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Data Distribution: ALL Example

  CREATE TABLE deep_dive (
     aid INT      --audience_id
    ,loc CHAR(3)  --location
    ,dt  DATE     --date
  ) DISTSTYLE ALL;

  INSERT INTO deep_dive VALUES
    (1, 'SFO', '2016-09-01'),
    (2, 'JFK', '2016-09-14'),
    (3, 'SFO', '2017-04-01'),
    (4, 'JFK', '2017-05-14');

Result: the full table (Rows: 4) is stored on the first slice of every node (Slice 0 and Slice 2); the remaining slices hold no rows.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Best Practices: Data Distribution

DISTSTYLE KEY
• Goals:
  – Optimize JOIN performance between large tables
  – Optimize INSERT INTO SELECT performance
  – Optimize GROUP BY performance
• The column being distributed on should have high cardinality and not cause row skew:

  SELECT diststyle, skew_rows FROM svv_table_info WHERE "table" = 'deep_dive';

   diststyle | skew_rows
  -----------+-----------
   KEY(aid)  |      1.07   <- ratio between the slices with the most and fewest rows

DISTSTYLE ALL
• Goals:
  – Optimize JOIN performance with dimension tables
  – Reduce disk usage on small tables
• Small and medium size dimension tables (< 3M rows)

DISTSTYLE EVEN
• If neither KEY nor ALL applies (or you are unsure)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Best Practices: Table Design Summary
• Materialize often-filtered columns from dimension tables into fact tables
• Materialize often-calculated values into tables
• Avoid distribution keys on temporal columns
• Keep data types as wide as necessary, but no wider
  – VARCHAR, CHAR, and NUMERIC
• Add compression to columns
  – Optimal compression can be found using ANALYZE COMPRESSION (see the sketch below)
• Add sort keys on the primary columns that are filtered on
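A minimal sketch of the encoding check, using the running deep_dive example. ANALYZE COMPRESSION samples the loaded data and reports a suggested encoding and estimated size reduction per column; note that it acquires an exclusive table lock while it runs, so avoid it during heavy ingestion:

  -- Sketch: ask Redshift to recommend a column encoding for each
  -- column of deep_dive, based on a sample of the loaded data.
  ANALYZE COMPRESSION deep_dive;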
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Workload Management (WLM)
Allows for the separation of different query workloads
• Goals:
  – Prioritize important queries
  – Throttle/abort less important queries
  – Control the number of concurrently executing queries
  – Divide cluster memory
  – Set query timeouts to abort long-running queries
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Terminology and Concepts: WLM
• Queues
  – Assigned a percentage of cluster memory
  – SQL queries execute in a queue based on:
    • User group: which groups the user belongs to
    • Query group: session-level variable
• Slots
  – Division of memory within a WLM queue, correlated with the number of simultaneously running queries
  – WLM_QUERY_SLOT_COUNT is a session-level variable
    • Useful to increase for memory-intensive operations (see the sketch below)
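A minimal sketch of claiming extra slots for a memory-hungry statement; the slot count of 3 and the deep_dive table are illustrative:

  -- Sketch: take 3 slots in the current queue so this session's next
  -- statement gets 3x the per-slot memory, then return to the default.
  SET wlm_query_slot_count TO 3;
  VACUUM deep_dive;
  SET wlm_query_slot_count TO 1;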
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Workload Management: Example
• Use case:
  – Light ingestion/ELT on a continuous cadence of 10 minutes
  – Peak reporting workload during business hours (7 a.m.–7 p.m.)
  – Heavy ingestion/ELT nightly (11 p.m.–3 a.m.)
• User types:
  – Business reporting and dashboards
  – Analysts and data science teams
  – Database administrators
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Workload Management: Example
Create a queue for each workload type:

  Queue Name          | Memory | Slots/Concurrency | Timeout (seconds)
  --------------------+--------+-------------------+------------------
  Ingestion           | 20%    | 2                 | None
  Dashboard           | 50%    | 10                | 120
  Default (Analysts)  | 25%    | 3                 | None

• Unallocated memory goes into a general pool that can be used by any queue
• Hidden superuser queue can be used by admins, manually switched into:
  SET query_group TO 'superuser'
• The superuser queue has a single slot, the equivalent of 5–7% memory allocation, and no timeout
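Routing works the same way for user-defined queues. A minimal sketch, assuming the WLM configuration above defines a query group named 'dashboard' for the Dashboard queue (the group name is an assumption for illustration):

  -- Sketch: route this session's queries into the Dashboard queue,
  -- assuming its WLM definition matches the query group 'dashboard'.
  SET query_group TO 'dashboard';
  SELECT count(*) FROM deep_dive;   -- runs in the Dashboard queue
  RESET query_group;                -- back to default routing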
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Query Monitoring Rules (QMR)
• Extension of workload management (WLM)
• Allows automatic handling of runaway (poorly written) queries
• Rules applied to a WLM queue allow queries to be:
  – LOGGED
  – ABORTED
  – HOPPED
• Goals:
  – Protect against wasteful use of the cluster
  – Log resource-intensive queries
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Query Monitoring Rules (QMR)
• A metric with an operator and a value (e.g. query_cpu_time > 1000) creates a predicate
• Multiple predicates can be AND-ed together to create a rule
• Multiple rules can be defined for a queue in WLM; these rules are OR-ed together

If { rule } then [action]
  { rule: metric operator value }   e.g.: rows_scanned > 100000
  • Metric: cpu_time, query_blocks_read, rows_scanned, query execution time, CPU & I/O skew per slice, join_row_count, etc.
  • Operator: <, >, ==
  • Value: integer
  [action]: hop, log, abort
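Whether a rule has fired can be audited after the fact from the STL_WLM_RULE_ACTION system log; a minimal sketch:

  -- Sketch: recent queries that tripped a QMR rule, with the
  -- action taken (log, hop, or abort).
  SELECT query, service_class, rule, action, recordtime
  FROM stl_wlm_rule_action
  ORDER BY recordtime DESC
  LIMIT 20;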
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Best Practices: WLM and QMR
• Keep the number of WLM queues to a minimum, typically just three queues, to avoid having unused queues
  https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/wlm_apex_hourly.sql
• Use WLM to limit ingestion/ELT concurrency to two to three
• To maximize query throughput, use WLM to throttle the number of concurrent queries to 15 or fewer
• Use QMR rather than WLM to set query timeouts
• Use QMR to log long-running queries
• Save the superuser queue for administration tasks and canceling queries
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Terminology and Concepts: Node Types
• Dense Compute—DC2: solid state disks
• Dense Storage—DS2: magnetic disks

  Instance Type | Disk Type | Size    | Memory | CPUs
  --------------+-----------+---------+--------+-----
  DC2 large     | NVMe SSD  | 160 GB  | 16 GB  | 2
  DC2 8xlarge   | NVMe SSD  | 2.56 TB | 244 GB | 32
  DS2 xlarge    | Magnetic  | 2 TB    | 32 GB  | 4
  DS2 8xlarge   | Magnetic  | 16 TB   | 244 GB | 36
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Best Practices: Cluster Sizing
• Use at least two compute nodes (multi-node cluster) in production for data mirroring
  – The leader node is provided at no additional cost
• Amazon Redshift is significantly faster in a VPC compared to EC2-Classic
• Maintain at least 20% free space, or three times the size of the largest table
  – Scratch space for usage, e.g. rewriting tables
  – Free space is required for VACUUM to re-sort tables
  – Temporary tables are used for intermediate query results
• The maximum number of available Amazon Redshift Spectrum nodes is a function of the number of slices in the Amazon Redshift cluster
• If you're using DC1 instances, upgrade to the DC2 instance type
  – Same price as DC1, significantly faster
  – Reserved Instances do not automatically transfer over
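Disk headroom can be tracked from the STV_PARTITIONS system table. A minimal sketch (the figure includes mirrored blocks, so it reflects raw capacity in use rather than logical table size):

  -- Sketch: approximate percentage of raw disk capacity in use
  -- across the cluster, including mirrored blocks.
  SELECT sum(used)::float8 / sum(capacity) * 100 AS pct_disk_used
  FROM stv_partitions;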
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
• Fast @ exabyte scale
• Elastic & highly available
• On-demand, pay-per-query
• High concurrency: multiple clusters access same data
• No ETL: query data in-place using open file formats
• Full Amazon Redshift SQL support
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Query: SELECT COUNT(*) FROM s3.ext_table
(Diagram: JDBC/ODBC clients submit the query to the Amazon Redshift leader node; compute nodes fan the scan out to the Amazon Redshift Spectrum fleet, which reads the data from Amazon S3, exabyte-scale object storage, using table definitions from the Data Catalog / Apache Hive Metastore)
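For context, a minimal sketch of how a table such as s3.ext_table could be exposed to Redshift Spectrum. The catalog database name, IAM role ARN, S3 location, columns, and file format below are all illustrative placeholders, not from the deck:

  -- Sketch: register an external schema backed by the Hive-compatible
  -- Data Catalog (database name and role ARN are placeholders).
  CREATE EXTERNAL SCHEMA s3
  FROM DATA CATALOG
  DATABASE 'spectrum_db'
  IAM_ROLE 'arn:aws:iam::123456789012:role/my-spectrum-role'
  CREATE EXTERNAL DATABASE IF NOT EXISTS;

  -- Sketch: define an external table over Parquet files in S3
  -- (columns reuse the deep_dive example; location is a placeholder).
  CREATE EXTERNAL TABLE s3.ext_table (
     aid INT
    ,loc CHAR(3)
    ,dt  DATE
  )
  STORED AS PARQUET
  LOCATION 's3://my-bucket/my-prefix/';

  -- The slide's query then runs against the S3 data in place:
  SELECT COUNT(*) FROM s3.ext_table;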
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
An Exabyte Query: Harry Potter
Roughly 140 TB of customer item order detail records for each day over the past 20 years. 190 million files across 15,000 partitions in S3. One partition per day for USA and rest of world.
Need a billion-fold reduction in data processed. Running this query using a 1000-node Hive cluster would take over 5 years.*
• Compression ........................... 5X
• Columnar file format .................. 10X
• Scanning with 2500 nodes .............. 2500X
• Static partition elimination .......... 2X
• Dynamic partition elimination ......... 350X
• Redshift's query optimizer ............ 40X
---------------------------------------------
Total reduction ......................... 3.5B X
* Estimated using 20 node Hive cluster & 1.4TB, assume linear
* Query used a 20 node DC1.8XLarge Amazon Redshift cluster
* Not actual sales data - generated for this demo based on data format used by Amazon Retail.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
AWS Labs on GitHub—Amazon Redshift
• https://github.com/awslabs/amazon-redshift-utils
• https://github.com/awslabs/amazon-redshift-monitoring
• https://github.com/awslabs/amazon-redshift-udfs
• Admin Scripts: collection of utilities for running diagnostics on your cluster
• Admin Views: collection of utilities for managing your cluster, generating schema DDL, and so on
• Analyze Vacuum Utility: utility that can be scheduled to vacuum and analyze the tables within your Amazon Redshift cluster
• Column Encoding Utility: utility that will apply optimal column encoding to an established schema with data already loaded
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
AWS Big Data Blog—Amazon Redshift
• Amazon Redshift Engineering's Advanced Table Design Playbook
  https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-preamble-prerequisites-and-prioritization/
  - Zach Christopherson
• Top 10 Performance Tuning Techniques for Amazon Redshift
  https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-techniques-for-amazon-redshift/
  - Ian Meyers and Zach Christopherson
• 10 Best Practices for Amazon Redshift Spectrum
  https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
  - Po Hong and Peter Dalton
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
aws.amazon.com/activate
Everything and Anything Startups Need to Get Started on AWS