Understanding the Basics & Avoiding Common Mistakes
Presented by: Michael Krouze, CTO & VP Analytics, Charter Solutions, Inc.
Redshift 101
Charter Solutions’ Partnerships
What is Amazon Redshift?
Amazon Redshift is a cloud-hosted, fast, fully managed, petabyte-scale data warehouse.
Distributed rather than single node
Columnar rather than row-based
Enough intro, on to the meat of the presentation
Pick the right node type for your cluster
Redshift Node Options
Dense Compute
• dc1.large: 15 GB RAM, 2 cores, 2 slices, 160 GB SSD, 5.12 TB max/cluster
• dc1.8xlarge: 244 GB RAM, 32 cores, 32 slices, 2.56 TB SSD, 326 TB max/cluster
• Geared to high performance
• SSD storage (326 TB max)
• ~95 GB memory per TB of storage
• Starts at $0.25/hr
Dense Storage
• ds2.xlarge: 31 GB RAM, 4 cores, 2 slices, 2 TB HDD, 64 TB max/cluster
• ds2.8xlarge: 244 GB RAM, 36 cores, 16 slices, 16 TB HDD, 2 PB max/cluster
• Geared to large data sets
• HDD storage (2 PB max)
• ~15 GB memory per TB of storage
• Starts at $0.85/hr
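The memory-per-TB figures above can be checked with a little arithmetic. A minimal sketch in Python, using only RAM and storage numbers from the node list; the dictionary and function names are illustrative.

```python
# (RAM in GB, local storage in TB) per node, taken from the node list above.
NODES = {
    "dc1.large": (15, 0.16),
    "dc1.8xlarge": (244, 2.56),
    "ds2.8xlarge": (244, 16),
}

def memory_per_tb(ram_gb, storage_tb):
    """Return GB of RAM available per TB of local storage."""
    return ram_gb / storage_tb

ratios = {name: memory_per_tb(*spec) for name, spec in NODES.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} GB RAM per TB")
```

The Dense Compute nodes land in the mid-90s and the Dense Storage node near 15, which is the roughly sixfold gap the slide is pointing at.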
Understand and use sort keys properly
Zone Maps
Select count(*) from customers where age = 24
Unsorted (4 of 5 blocks read)
• Min: 5, Max: 45 (read)
• Min: 9, Max: 32 (read)
• Min: 30, Max: 42
• Min: 22, Max: 80 (read)
• Min: 18, Max: 50 (read)
Sorted (1 of 5 blocks read)
• Min: 1, Max: 10
• Min: 11, Max: 25 (read)
• Min: 26, Max: 40
• Min: 41, Max: 55
• Min: 56, Max: 95
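The block pruning in the zone map figure can be reproduced in a few lines. This is a sketch of the idea only, assuming each block is summarised by the min and max age it holds; the function name is illustrative and this is not how Redshift is implemented internally.

```python
# Each block is summarised by the (min, max) of the age values it holds,
# copied from the zone map figure above.
UNSORTED = [(5, 45), (9, 32), (30, 42), (22, 80), (18, 50)]
SORTED = [(1, 10), (11, 25), (26, 40), (41, 55), (56, 95)]

def blocks_to_read(zone_map, value):
    """Return the blocks whose [min, max] range could contain value."""
    return [(lo, hi) for lo, hi in zone_map if lo <= value <= hi]

unsorted_hits = blocks_to_read(UNSORTED, 24)
sorted_hits = blocks_to_read(SORTED, 24)
print(len(unsorted_hits), "of", len(UNSORTED), "unsorted blocks read")
print(len(sorted_hits), "of", len(SORTED), "sorted blocks read")
```

With unsorted data the ranges overlap, so almost every block might hold age = 24. Once the data is sorted the ranges are disjoint and only one block qualifies.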
Sort Key Options
Single Column Sort Key
• Table is sorted by 1 column
• Best for queries that use that column (e.g. date) as the primary filter
• Can speed up joins and group-bys
• Quickest to VACUUM
Compound Sort Key
• Table is sorted by the 1st column, then the 2nd column, etc.
• Best for queries that use the 1st column as the primary filter, then other columns
• Can speed up joins and group-bys
• Slower to VACUUM
Interleaved Sort Key
• Equal weight is given to each column
• Best for queries that use different columns in the filter
• Queries get faster the more sort key columns are used in the filter (up to 8)
• Slowest to VACUUM
• More effective with large tables (100M+ rows)
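Why a compound sort key favours its first column can be seen with a toy table. The sketch below assumes 100 hypothetical (day, customer_id) rows and a made-up block size of 10 rows. It models only the compound case, not the interleaved ordering.

```python
from itertools import product

BLOCK_SIZE = 10  # rows per block; an illustrative value, not a Redshift setting

# 100 hypothetical rows of (day, customer_id), sorted by day, then customer_id,
# which is the order a compound sort key on (day, customer_id) would give.
rows = sorted(product(range(10), range(10)))
blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]

def blocks_scanned(column, value):
    """Count blocks whose min/max range for the given column includes value."""
    count = 0
    for block in blocks:
        values = [row[column] for row in block]
        if min(values) <= value <= max(values):
            count += 1
    return count

day_scans = blocks_scanned(0, 3)       # filter on the leading sort column
customer_scans = blocks_scanned(1, 3)  # filter on the second column only
print(day_scans, customer_scans)
```

A filter on the leading column touches a single block, while a filter on the second column alone cannot skip any block because every block spans the full customer_id range.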
Understand and use distribution styles and keys properly
Distribution Style Options
(Diagram: each style is shown on two nodes with two slices each. Node 1 holds Slices 1 and 2, Node 2 holds Slices 3 and 4.)
All: all data on every node
Key: same key to same location
Even: round-robin distribution
• Tables with no joins or group-bys
• Small dimension tables (<1000 rows)
• Medium dimension tables (1K – 2M rows)
• Large fact tables
• Large dimension tables
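The three distribution styles can be mimicked on the two-node, four-slice layout from the diagram. This is a rough sketch: the rows are hypothetical, and a plain modulo stands in for whatever hash function Redshift actually applies to the distribution key.

```python
from collections import defaultdict

SLICES = [1, 2, 3, 4]                     # slice ids from the diagram
NODE_OF_SLICE = {1: 1, 2: 1, 3: 2, 4: 2}  # two slices per node

def distribute_even(rows):
    """Round robin: row i goes to slice i mod 4, whatever its key is."""
    placement = defaultdict(list)
    for i, row in enumerate(rows):
        placement[SLICES[i % len(SLICES)]].append(row)
    return dict(placement)

def distribute_key(rows, key):
    """Rows with the same key value always land on the same slice."""
    placement = defaultdict(list)
    for row in rows:
        # A plain modulo stands in for the hash function Redshift uses.
        placement[SLICES[key(row) % len(SLICES)]].append(row)
    return dict(placement)

def distribute_all(rows):
    """Every node keeps a full copy of the table."""
    return {node: list(rows) for node in sorted(set(NODE_OF_SLICE.values()))}

# Hypothetical (customer_id, amount) rows.
orders = [(7, 10.0), (2, 5.5), (7, 3.2), (4, 8.0), (2, 1.1), (9, 4.4)]
even = distribute_even(orders)
by_key = distribute_key(orders, key=lambda row: row[0])
everywhere = distribute_all(orders)
```

With key distribution both rows for customer 7 share a slice, so a join or group-by on customer_id needs no data movement between slices. Even distribution balances row counts but scatters matching keys.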
Primary keys and foreign keys don’t work the way you think
How are they different?
• Primary and foreign key constraints are not enforced by Redshift
• Indexes are not created (only sort keys exist for indexing)
• They do help with query plan optimization though
Compress your columns
Redshift Compression
• Each column can be compressed with the most appropriate algorithm for its content
• Many algorithms supported
  • Raw encoding, Byte-dictionary, Delta encoding, Mostly encoding, Run-length encoding, Text encoding, LZO encoding
• Compression ratios of 2-4x are common
• Can cut query time by as much as 50%
• Use ANALYZE COMPRESSION to get recommendations
Vacuum and analyze regularly
Adding new rows creates unsorted regions
Vacuum reclaims space and re-sorts tables
Vacuum
• 4 modes:
  • FULL – Reclaims space and re-sorts
  • DELETE ONLY – Reclaims space but does not re-sort
  • SORT ONLY – Re-sorts but does not reclaim space
  • REINDEX – Used for INTERLEAVED sort keys. Re-analyzes sort keys and then runs a full VACUUM
• Vacuum is I/O intensive and can take time to run
• Run regularly to minimize impact
Analyze
• Updates statistics used by the query planner
• Run regularly to keep statistics up to date
  • Especially after large data loads
Monitor and tune workload management
Workload Management
• Workload management is about creating queues for different workloads
(Diagram: User Group A, a Short Query Group, and a Long Query Group feed a short-running queue and a long-running queue.)
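Queue routing can be pictured as a first-match lookup on the user group or query group attached to a query. In the sketch below the queue and group names echo the diagram, but which group feeds which queue, the default queue name, and the data layout are all assumptions made for illustration. This is not the actual WLM configuration format.

```python
# Hypothetical routing rules; the group and queue names echo the diagram,
# but which group feeds which queue is an assumption made for illustration.
QUEUES = [
    {"name": "short-running", "user_groups": {"user_group_a"}, "query_groups": {"short"}},
    {"name": "long-running", "user_groups": set(), "query_groups": {"long"}},
]
DEFAULT_QUEUE = "default"

def route(user_group=None, query_group=None):
    """Return the first queue whose user group or query group matches."""
    for queue in QUEUES:
        if user_group in queue["user_groups"] or query_group in queue["query_groups"]:
            return queue["name"]
    return DEFAULT_QUEUE

print(route(user_group="user_group_a"))
print(route(query_group="long"))
print(route())
```

Keeping short and long work in separate queues means a slow report cannot occupy every slot that quick dashboard queries need.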
Thank you!
• Contact me:
  • michael.krouze@chartersolutions.com
  • @mjkrouze
• Resources:
  • www.chartersolutions.com
  • github.com/awslabs/amazon-redshift-utils
  • AWS YouTube channel
  • AWS on SlideShare
