© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Eric Ferreira
Principal Engineer, Amazon Redshift
Optimising Your Amazon Redshift
Cluster For Peak Performance
Agenda
Recent Features
Timeless Best Practices
Additional Resources
Are You An Amazon Redshift User?
PostgreSQL
Columnar
MPP
OLAP
AWS IAM
Amazon VPC
Amazon SWF
Amazon S3
AWS KMS
Amazon Route 53
Amazon CloudWatch
Amazon EC2
Amazon Redshift
Significant Features & Patches
February 2013 to April 2018
> 125 Significant Patches
> 165 Significant Features
Amazon Redshift Architecture
Massively parallel, shared-nothing columnar architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, unload, backup, restore
Amazon Redshift Spectrum nodes
• Execute queries directly against Amazon Simple Storage Service (Amazon S3)
[Diagram labels: SQL Clients/BI Tools, JDBC/ODBC, Leader Node, Compute Nodes (128GB RAM, 16TB disk, 16 cores each), Amazon Redshift Spectrum nodes 1 to N, Amazon S3, Load, Unload, Backup, Restore, Query]
Think: Toaster
• You submit your workload
• Choose a few options
• It runs fast and cheap
Recent Features
Short Query Acceleration
• New queue (#14) can be enabled via the console or API
• Total Concurrency should be less than 15
• Adds 6 new slots and equivalent query processors
• Only active if there are queries waiting in queue
• Machine learning algorithm picks queries that are eligible
• 3x throughput improvement on short-query workloads with
minimal effect on long-running queries
Results Caching
• Sub-second response time for repeat queries
• Fully ACID (we take care of invalidating the cache if data
changes)
• No queueing
• Cache size is proportional to your node type
• New view SVL_QLOG provides information on both regular
and cached queries
Late Materialisation
• Further reduces I/O on scanning tables when using
multiple predicates
• In some cases allows for compressing Sort Keys without
performance degradation on scans.
• Fully automatic
• New column (is_rlf_scan) on STL_SCAN
Improved Commit Speeds
• Read-Only transactions batch checkpoints
• Selects and CTAS/DML on Temporary objects
• Greatest improvement on complex but fast executing
queries (volt_tt temporary tables)
• We are constantly working to improve transaction speeds
Timeless Best Practices
Migration
• Lift-and-Shift is NOT an ideal approach
• Depending on where you are coming from, it is
sure to fail
• AWS has a rich ecosystem of solutions
• Your final solution will use other AWS services
• AWS Solution Architects, ProServ, and Partners
can help
Data Distribution
• Distribution Styles
• KEY: Value is hashed, same value goes to same location (slice)
• ALL: Full table data goes to first slice of every node
• EVEN: Round robin
• Goals
• Distribute data evenly for parallel processing
• Minimise data movement during query processing
[Diagram: the KEY, EVEN, and ALL styles, each shown on two nodes with two slices per node (Node 1: Slices 1 and 2; Node 2: Slices 3 and 4)]
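To make the three distribution styles concrete, the following Python sketch simulates how rows might land on slices. It is only an illustration: `distribute`, the sample rows, and the use of `zlib.crc32` as the hash are assumptions of this example, and the hash function Redshift actually uses is not described in this deck.

```python
import zlib
from collections import Counter

def distribute(rows, key, style, num_slices, slices_per_node=2):
    """Return a Counter mapping slice index -> row count for one style."""
    counts = Counter({s: 0 for s in range(num_slices)})
    for i, row in enumerate(rows):
        if style == "KEY":
            # The same key value always maps to the same slice.
            # crc32 is a stand-in hash chosen for this sketch.
            counts[zlib.crc32(str(row[key]).encode()) % num_slices] += 1
        elif style == "EVEN":
            # Round robin, independent of the row's contents.
            counts[i % num_slices] += 1
        elif style == "ALL":
            # A full copy goes to the first slice of every node.
            for first_slice in range(0, num_slices, slices_per_node):
                counts[first_slice] += 1
        else:
            raise ValueError(f"unknown distribution style: {style}")
    return counts

# Hypothetical sample data: 1000 orders, with a two-valued status column.
rows = [{"order_id": i, "status": "open" if i % 10 else "closed"}
        for i in range(1000)]

even = distribute(rows, "order_id", "EVEN", num_slices=4)
by_status = distribute(rows, "status", "KEY", num_slices=4)
all_copy = distribute(rows, "order_id", "ALL", num_slices=4)
```

Distributing on the two-valued `status` column can fill at most two of the four slices, which is the kind of skew the first goal warns against, while EVEN spreads the same rows as 250 per slice.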
Table Design Summary
• Materialise frequently filtered columns from dimension tables
into fact tables
• Materialise frequently calculated values into tables
• Avoid DIST KEYS on temporal columns
• Keep data types as wide as necessary (but no longer than
necessary)
• VARCHAR, CHAR and NUMERIC
• Add compression to columns
• Optimal compression can be found using ANALYZE COMPRESSION
• Add SORT KEYS on the primary columns that are filtered on
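As a rough illustration of the summary above, this Python sketch assembles a `CREATE TABLE` statement with explicit column encodings, a non-temporal distribution key, and a sort key on the filtered column. The helper `create_table_sql`, the `sales` table, its columns, and the chosen encodings are all hypothetical; on real data the encodings would come from ANALYZE COMPRESSION.

```python
def create_table_sql(name, columns, distkey=None, sortkeys=()):
    """Build a CREATE TABLE statement.

    columns: list of (column_name, data_type, encoding) tuples.
    """
    col_defs = ",\n  ".join(
        f"{col} {dtype} ENCODE {enc}" for col, dtype, enc in columns
    )
    sql = f"CREATE TABLE {name} (\n  {col_defs}\n)"
    if distkey:
        sql += f"\nDISTSTYLE KEY DISTKEY ({distkey})"
    if sortkeys:
        sql += f"\nSORTKEY ({', '.join(sortkeys)})"
    return sql + ";"

ddl = create_table_sql(
    "sales",
    [
        ("sale_date", "DATE", "raw"),          # filtered on, so used as sort key
        ("customer_id", "BIGINT", "zstd"),     # join column, not temporal
        ("region", "VARCHAR(32)", "zstd"),     # materialised from a dimension table
        ("amount", "NUMERIC(12,2)", "zstd"),   # no wider than the data needs
    ],
    distkey="customer_id",
    sortkeys=("sale_date",),
)
print(ddl)
```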
Copy & Unload
• Delimited files are recommended
• Split files so that the number of files is a multiple of the number of slices
• File sizes should be 1MB – 1GB after compression
• Use UNLOAD to extract large amounts of data from the
cluster
• Non-parallel UNLOAD only for very small amounts of data
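The file-splitting guidance above comes down to a little arithmetic. The sketch below picks a file count that is a multiple of the slice count while keeping each compressed file at or below 1 GB. The function name `plan_file_count` and the "smallest multiple that fits" strategy are assumptions of this example rather than anything stated in the deck.

```python
import math

MIN_FILE_BYTES = 1024 ** 2   # 1 MB lower bound from the guidance
MAX_FILE_BYTES = 1024 ** 3   # 1 GB upper bound from the guidance

def plan_file_count(total_compressed_bytes, num_slices):
    """Return (file_count, bytes_per_file) for a load of the given size."""
    if total_compressed_bytes <= 0 or num_slices <= 0:
        raise ValueError("size and slice count must be positive")
    # Fewest files that keep every file at or below 1 GB.
    min_files = math.ceil(total_compressed_bytes / MAX_FILE_BYTES)
    # Round up to the next multiple of the slice count.
    files = max(1, math.ceil(min_files / num_slices)) * num_slices
    per_file = total_compressed_bytes / files
    if per_file < MIN_FILE_BYTES:
        # Too little data to give every slice a file of at least 1 MB,
        # so fall back to fewer files of roughly 1 MB each.
        files = max(1, total_compressed_bytes // MIN_FILE_BYTES)
        per_file = total_compressed_bytes / files
    return files, per_file

# 100 GB of compressed data on a hypothetical 16-slice cluster.
files, per_file = plan_file_count(100 * 1024 ** 3, 16)
print(files, round(per_file / 1024 ** 2), "MB per file")
```

For this input the plan is 112 files, seven per slice, each a little under 1 GB.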
Extract, Load & Transform (ELT)
Wrap workflow/statements in an explicit transaction
Consider using DROP TABLE or TRUNCATE instead of DELETE
Staging Tables
• Use temporary table or permanent table with the “BACKUP NO” option
• If possible use DISTSTYLE KEY on both the staging table and production table to speed
up the INSERT AS SELECT statement
• Turn off automatic compression - COMPUPDATE OFF
• Copy compression settings from production table or use ANALYZE COMPRESSION
statement
• Use CREATE TABLE LIKE or write encodings into the DDL
• For copying a large number of rows (> hundreds of millions) consider using ALTER
TABLE APPEND instead of INSERT AS SELECT
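One way to picture the staging-table workflow above is as an ordered list of SQL statements. The Python sketch below generates such a list. The helper `staging_load_statements`, the table names, the S3 path, and the IAM role ARN are placeholders invented for this example, and the identifiers are interpolated without any escaping, so it is not meant for untrusted input.

```python
def staging_load_statements(target, s3_path, iam_role, stage="stage"):
    """Return the SQL for one staging-table load, in execution order."""
    return [
        # Wrap the whole workflow in one explicit transaction.
        "BEGIN;",
        # A temporary staging table modelled on the production table.
        f"CREATE TEMP TABLE {stage} (LIKE {target});",
        # COMPUPDATE OFF turns off automatic compression during the load.
        f"COPY {stage} FROM '{s3_path}' IAM_ROLE '{iam_role}' COMPUPDATE OFF;",
        f"INSERT INTO {target} SELECT * FROM {stage};",
        # DROP TABLE rather than DELETE to discard the staged rows.
        f"DROP TABLE {stage};",
        "COMMIT;",
    ]

statements = staging_load_statements(
    target="sales",
    s3_path="s3://example-bucket/sales/part_",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
for statement in statements:
    print(statement)
```

The sketch uses DROP TABLE rather than TRUNCATE inside the transaction because TRUNCATE commits the transaction in which it runs.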
Vacuum & Analyse
• VACUUM should be run as necessary
• Typically nightly or weekly
• Consider “Deep Copy” for larger or wide tables
• ANALYZE can be run periodically after ingestion on just predicate
columns
• Utility to VACUUM and ANALYZE all the tables in the cluster:
https://github.com/awslabs/amazon-redshift-utils/tree/master/src/AnalyzeVacuumUtility
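The maintenance options above can be sketched as two statement sequences: a regular VACUUM followed by ANALYZE on predicate columns, or a deep copy that rebuilds the table. The helper `maintenance_statements` and the `_copy` suffix are hypothetical, and the sketch assumes an unqualified table name with no dependent views.

```python
def maintenance_statements(table, deep_copy=False):
    """Return SQL statements for routine maintenance of one table."""
    analyze = f"ANALYZE {table} PREDICATE COLUMNS;"
    if not deep_copy:
        return [f"VACUUM {table};", analyze]
    tmp = f"{table}_copy"
    return [
        # Deep copy: rebuild the table by copying it into a fresh one.
        f"CREATE TABLE {tmp} (LIKE {table});",
        f"INSERT INTO {tmp} SELECT * FROM {table};",
        f"DROP TABLE {table};",
        f"ALTER TABLE {tmp} RENAME TO {table};",
        analyze,
    ]

nightly = maintenance_statements("sales")
rebuild = maintenance_statements("sales", deep_copy=True)
for statement in rebuild:
    print(statement)
```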
WLM & QMR
• Keep the number of WLM queues to a minimum, typically just 3 queues
to avoid having unused queues
• https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/wlm_apex_hourly.sql
• Use WLM to limit ingestion/ELT concurrency to 2-3
• To maximise query throughput, use WLM to throttle the number of
concurrent queries to 15 or fewer
• Use QMR rather than WLM to set query timeouts
• Use QMR to log long running queries
• Save the superuser queue for administration tasks and canceling
queries
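The numeric guidelines above (about three queues, ingestion/ELT concurrency of 2-3, and 15 or fewer concurrent queries overall) can be checked mechanically. The Python sketch below does so against a simplified queue description. The dictionary fields `name`, `concurrency`, and `etl` are invented for this example and are not the real WLM configuration format.

```python
def check_wlm(queues, max_queues=3, max_total=15, max_etl=3):
    """Return a list of warnings for a simplified WLM queue layout."""
    warnings = []
    if len(queues) > max_queues:
        warnings.append(
            f"{len(queues)} queues defined; aim for about {max_queues}"
        )
    total = sum(q["concurrency"] for q in queues)
    if total > max_total:
        warnings.append(f"total concurrency {total} exceeds {max_total}")
    for q in queues:
        if q.get("etl") and q["concurrency"] > max_etl:
            warnings.append(
                f"ETL queue '{q['name']}' concurrency "
                f"{q['concurrency']} exceeds {max_etl}"
            )
    return warnings

# Hypothetical layouts: one within the guidelines, one outside them.
good = [
    {"name": "etl", "concurrency": 2, "etl": True},
    {"name": "reporting", "concurrency": 8},
    {"name": "adhoc", "concurrency": 5},
]
bad = [
    {"name": "etl", "concurrency": 6, "etl": True},
    {"name": "reporting", "concurrency": 12},
]
print(check_wlm(good))
print(check_wlm(bad))
```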
Cluster Sizing
Use at least two compute nodes (multi-node cluster) in production for data mirroring
• The leader node is provided at no additional cost
Amazon Redshift is significantly faster in a VPC compared to EC2 Classic
Maintain at least 20% free space or 3x the size of the largest table
• Scratch space for re-writing tables
• Free space is required for VACUUM to re-sort tables
• Temporary tables used for intermediate query results
The maximum number of available Amazon Redshift Spectrum nodes is a function of the
number of slices in the Amazon Redshift cluster
If you’re using DC1 instances, upgrade to the DC2 instance type
• Same price as DC1, significantly faster
• Reserved Instances do not automatically transfer over
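The free-space guideline on this slide (at least 20% free, or 3x the size of the largest table) is simple arithmetic. The sketch below evaluates both thresholds separately, since the slide does not say whether both must hold. The function name `free_space_report` and the example figures are hypothetical.

```python
def free_space_report(capacity_gb, used_gb, largest_table_gb):
    """Compare free space against both thresholds from the guideline."""
    free_gb = capacity_gb - used_gb
    return {
        "free_gb": free_gb,
        "meets_20_percent": free_gb >= 0.20 * capacity_gb,
        "meets_3x_largest": free_gb >= 3 * largest_table_gb,
    }

# Hypothetical cluster: 1000 GB capacity, 700 GB used, largest table 120 GB.
report = free_space_report(1000, 700, 120)
print(report)
```

In this example 300 GB is free, which clears the 20% threshold (200 GB) but not three times the largest table (360 GB).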
Additional Resources
AWS Labs On GitHub – Amazon Redshift
https://github.com/awslabs/amazon-redshift-utils
https://github.com/awslabs/amazon-redshift-monitoring
https://github.com/awslabs/amazon-redshift-udfs
Admin Scripts
• Collection of utilities for running diagnostics on your cluster
Admin Views
• Collection of utilities for managing your cluster, generating schema DDL, etc.
Analyse Vacuum Utility
• Utility that can be scheduled to vacuum and analyse the tables within your Amazon Redshift cluster
Column Encoding Utility
• Utility that will apply optimal column encoding to an established schema with data already loaded
AWS Big Data Blog – Amazon Redshift
Amazon Redshift Engineering’s Advanced Table Design Playbook
https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-preamble-prerequisites-and-prioritization/
- Zach Christopherson
Top 10 Performance Tuning Techniques for Amazon Redshift
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-techniques-for-amazon-redshift/
- Ian Meyers and Zach Christopherson
10 Best Practices for Amazon Redshift Spectrum
https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
- Po Hong and Peter Dalton
Back To Our Toaster…
Thank You

Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Sydney 2018
