Amazon Redshift offers many powerful features. Yet customers often encounter sluggish performance and costs that spiral out of control.
Scaling AWS Redshift clusters to meet growing compute and reporting needs, while maintaining cost, performance and security standards, is a challenge for many organizations.
This webinar covered the following:
• Understand key design/architectural considerations of AWS Redshift
• Tips & Tricks to optimize Cost & Performance
• How Agilisium helped clients reduce AWS Redshift run cost by up to 40%
Presented by:
Jay Palaniappan - CTO & Head of Innovation Labs || Smitha Basavaraju - Big Data Architect || Arun Chinnadurai - Associate Director – BD
Get the most out of your AWS Redshift investment while keeping cost down
1. How to get the most out of your AWS Redshift
Investment, while keeping cost down
WEBINAR SERIES : AWS OPTIMIZATION
Agilisium Innovation Labs
2. • Tens of thousands of customers and growing
• 3x faster than other CDWs
• 200+ new features in last 18 months
AWS Redshift : A Shift towards the Future
3. • Keep up with the rapid pace of innovation
• Lack of time to experiment
• Extend knowledge on best practices
But what seems to be the challenge?
All of the above have impeded organizations' ability to extract maximum value from their existing Redshift investments
4. In the next 30-35 mins…
• Key design/architectural considerations of AWS Redshift
• Strategies to optimize AWS Redshift for Cost & Performance
• Success Story : Reducing Redshift run cost by 40%
• How we can help you
What we would like to talk about today
5. Agilisium – Overview
U.S (60+) : Los Angeles(HQ), Chicago, Texas with global presence in
India (250+), Canada, Costa Rica, Netherlands and UK (30+)
We are a Big Data and Analytics company with clear focus
on helping organizations take the
“Data-to-Insights-Leap”
7. Our Redshift Experience
Proven Expertise
Top 3 AWS Redshift Competency Partner in the U.S. with razor focus on AWS Data & Analytics solutions
Demonstrated Capability
15+ PB migrated to AWS through $50 MN worth of successful Big Data Analytics projects
400-level AWS Experts
55+ AWS Certified Experts. Our SAs are regular attendees of AOD training by the Redshift Product team
8. MEET THE SPEAKERS
Jay Palaniappan
CTO & Head of Innovation Labs
Smitha Basavaraju
Big Data Architect
Arun Chinnadurai
Associate Director – BD
shukvina@amazon.com
13. 1. Reserved Instances
Reserved Instance
Immediate, Low, Up to 70% Cost Savings
Reserved Instances:
Duration: 1 year / 3 years
Payment options: No Upfront, Partial Upfront, All Upfront
14. 2. Pause & Resume
Pause & Resume
Immediate, Low, Up to 50% Cost Savings
Pause nonproduction instances
Pay only for storage
Applicable only for on-demand instances
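As a rough illustration of where a figure like "up to 50%" can come from, the sketch below estimates the share of on-demand compute cost avoided by pausing a cluster for part of the week. It assumes compute is billed only while the cluster is running and leaves storage charges out entirely; the function name and the 12-hours-a-day schedule are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24


def pause_savings_fraction(running_hours_per_week: float) -> float:
    """Fraction of weekly on-demand compute cost avoided by pausing.

    Assumes compute is billed only while the cluster runs.
    Storage charges continue while paused and are not modeled here.
    """
    if not 0 <= running_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("running hours must be between 0 and 168")
    return 1 - running_hours_per_week / HOURS_PER_WEEK


# Hypothetical dev cluster that is kept running 12 hours a day, every day.
print(pause_savings_fraction(12 * 7))  # 0.5
```

A cluster that is only needed during working hours on weekdays would run even fewer hours, so the actual saving depends on the schedule.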
15. 3. Elastic Resize
Elastic Resize
Immediate, Low, Surge in Data/Performance
Scales Redshift clusters up and down in minutes
Automate cluster resize on predictable loads
Optimize cost and plan for capacity
Schedule cluster resize using management console or API
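One way to automate resizing for predictable loads through the API is sketched below. It assumes a boto3 Redshift client is passed in and uses its `resize_cluster` call with `Classic=False` to request an elastic resize. The schedule, cluster name, and helper names are hypothetical.

```python
def target_nodes(hour: int, schedule: dict, default: int) -> int:
    """Node count for the latest schedule entry at or before `hour`.

    `schedule` maps hour of day (0-23) to a node count. Before the first
    entry of the day, `default` applies.
    """
    past = [h for h in schedule if h <= hour]
    return schedule[max(past)] if past else default


def elastic_resize(client, cluster_id: str, node_count: int):
    """Request an elastic (not classic) resize to `node_count` nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return client.resize_cluster(
        ClusterIdentifier=cluster_id,
        NumberOfNodes=node_count,
        Classic=False,
    )


# Hypothetical schedule: 8 nodes from 08:00, back to 4 nodes from 20:00.
SCHEDULE = {8: 8, 20: 4}

# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("redshift")
# elastic_resize(client, "my-cluster", target_nodes(10, SCHEDULE, default=4))
```

Passing the client in keeps the helper easy to exercise with a stub before pointing it at a real cluster.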
16. 4. Concurrency Scaling
Concurrency Scaling
Immediate, Low, Scalable capacity
Automatically adds transient clusters
Serves spike in concurrent requests
For every 24 hours the main cluster is in use, 1 hour of concurrency scaling is free
Ability to set usage limit
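The free-usage rule above can be turned into a quick estimate of billable concurrency-scaling time. This is a simplified sketch: it applies the 1-free-hour-per-24-hours ratio directly and ignores any detail of how AWS accrues or caps the credits. The function name and the figures are illustrative.

```python
def billable_scaling_hours(main_cluster_hours: float, scaling_hours_used: float) -> float:
    """Concurrency-scaling hours left to pay for after free credits.

    Assumes 1 free hour is earned per 24 hours of main-cluster use.
    """
    free_hours = main_cluster_hours / 24.0
    return max(0.0, scaling_hours_used - free_hours)


# Hypothetical month: cluster up for 30 days, 42 hours of concurrency scaling used.
print(billable_scaling_hours(30 * 24, 42))  # 12.0
```

Comparing this estimate against a budget is one way to pick a sensible usage limit.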
17. 5. Moving to RA3 Instance
RA3
Immediate, Zero, 2x performance uplift | 2x storage
Scale the data warehouse based on workload and peak demand
Pay for compute and storage independently
2x performance and 2x storage capacity in comparison to DS2.XLarge
18. 6. Right Sizing
• Instance Types
• Dense Compute (DC2)
• Dense Storage (DS2)
• RA3
• Sizing
• Size based on workload: CPU, disk, I/O
• Scale up by adding nodes to check
linear performance
• Move to Higher instance groups
19. 7. Table Design Considerations
Column Encoding
• ANALYZE COMPRESSION
• Compress all columns except for the first sort key column
• AZ64 is a new encoding
• Improves performance 2X-4X by reducing I/O
• Use PG_TABLE_DEF
Sort Key
• Zone maps store the min and max values of each block
• Order columns from low to high cardinality
• Number of sort columns < 4
• Interleaved sort key: BE CAUTIOUS
• More columns in interleaved sort key = longer VACUUM
• Use SVV_TABLE_INFO
Distribution Key
• Distribution keys should have high cardinality to avoid data skew and “hot” nodes
• Use date columns only if cardinality is high
• DISTSTYLE AUTO is a great go-to for all tables < ~5 million rows.
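To see why a low-cardinality distribution key produces skew and "hot" nodes, the sketch below spreads rows over slices by hashing the key and reports the ratio of the fullest slice to the emptiest one. Python's built-in `hash` modulo the slice count is only a stand-in for Redshift's internal hashing, and the slice count and key values are made up for illustration.

```python
from collections import Counter


def rows_per_slice(keys, n_slices: int) -> list:
    """Count rows landing on each slice when distributing by hash of the key.

    hash(key) % n_slices is a stand-in, not Redshift's actual hash function.
    """
    counts = Counter(hash(k) % n_slices for k in keys)
    return [counts.get(s, 0) for s in range(n_slices)]


def row_skew(per_slice) -> float:
    """Ratio of the fullest slice to the emptiest slice (1.0 is perfectly even)."""
    if not per_slice:
        raise ValueError("need at least one slice")
    lo, hi = min(per_slice), max(per_slice)
    return float("inf") if lo == 0 else hi / lo


high_cardinality = range(1000)                 # 1000 distinct key values
low_cardinality = [k % 3 for k in range(1000)]  # only 3 distinct key values

print(row_skew(rows_per_slice(high_cardinality, 4)))  # 1.0
print(row_skew(rows_per_slice(low_cardinality, 4)))   # inf: one slice holds no rows
```

With only three distinct key values and four slices, at least one slice is always empty, so the remaining slices carry all the work.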
20. Moving Towards AUTO Management
Table Stats
• Ensure that AUTO ANALYZE, AUTO SORT & AUTO VACUUM are enabled
• INTERLEAVED SORT KEYS: schedule regular VACUUM REINDEX runs
• Use SVV_TABLE_INFO for stats
WLM
• Use Auto WLM with SQA enabled
Manual WLM
• Number of queues < 4
• Use QMR to monitor the performance impact of bad queries
• Max concurrency level across all user queues <= 15
• Leave ~5% of memory unallocated
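The manual WLM guidelines above can be checked mechanically. The sketch below is a minimal checker, assuming each queue is described by a plain dict with hypothetical `concurrency` and `memory_percent` keys (these are not the real WLM JSON property names) and reading the concurrency limit as a total across all queues.

```python
def check_manual_wlm(queues: list) -> list:
    """Return warnings for a manual WLM layout that breaks the guidelines.

    Each queue is a dict with hypothetical keys 'concurrency' and
    'memory_percent'.
    """
    warnings = []
    if len(queues) >= 4:
        warnings.append("use fewer than 4 queues")
    if sum(q["concurrency"] for q in queues) > 15:
        warnings.append("keep total concurrency at 15 or below")
    if sum(q["memory_percent"] for q in queues) > 95:
        warnings.append("leave about 5% of memory unallocated")
    return warnings


# Hypothetical layout: three queues, 15 slots in total, 95% of memory assigned.
layout = [
    {"concurrency": 5, "memory_percent": 40},
    {"concurrency": 5, "memory_percent": 30},
    {"concurrency": 5, "memory_percent": 25},
]
print(check_manual_wlm(layout))  # []
```

An empty list means the layout stays within all three guidelines.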
22. AWS WAF-based Redshift assessment for M&E Giant
Fast Facts
Technologies: S3, Redshift, Redshift Spectrum
Source System: 25 TB
Team: Cloud Solutions Architect, Sr. Big Data Architect
Solution
• Comprehensive assessment of the candidate Redshift workload across 5
pillars of the AWS WAF, using Agilisium’s Redshift Inspector
• Several observations across all 5 pillars were made based on Findings &
Recommendations report (Redshift Inspector) and workshop
Recommendations
• Security : Database Encryption, Redshift in private cluster, Port
Obfuscation, S3 VPC endpoint
• Performance : Time series data model, Concurrency Scaling, Limited use
of Interleaved sort keys, right-size column width
• Cost Optimization : Data placement, RA3, Deletion of redundant backup
& Reserved Instances
• Reliability : Cross-region backup, Avoid Temp & Staging backup
• Operational Excellence : Audit Logging, CloudWatch Alerts and Auto Update
Objective
Client requested a holistic assessment of their 25 TB Redshift workload to identify avenues for improvement across all dimensions
Value delivered
30% faster query performance
More secure and resilient Redshift workload
40% cost reduction
24. How we can help?
Diagnose – 3 Days
• AWS WAF-based Redshift Assessment
✓ Findings & Recommendations Report
✓ Remediation Plan (Precursor for next phase – Optimize)
Optimize – 2+ Weeks
• Performance Optimization (Compelling pricing options)
• Cost Optimization (Outcome-based pricing)
Maintain – Quarterly
• Extend customer’s knowledge on new features and best practices
• Custom trend report with top 10 metrics for ongoing maintenance
25. Diagnose – AWS WAF-based Redshift Assessment – 3 Days
Customer Contribution
• Identification of business-critical Redshift workload
• Availability of Business & Tech SMEs tied to Redshift workload for workshop
• Availability of Client DBA to run Redshift diagnostic queries
• Read-only access to your Redshift cluster for additional investigation, if any
Agilisium’s Redshift Inspector: Automated Redshift WAF-based Assessment Toolkit
• Automated fact-based assessment: holistic 60-point check of your Redshift workload across 5 pillars of AWS WAF
• Rich corpus of best practices: toolkit is based on 100+ Redshift best practices identified from migrating 15+ PB to AWS in the last 7+ years
AWS WAF-based Assessment – Deliverables
• Findings & Recommendations Report – Get accurate observations by criticality (Critical, Needs Improvement, Well-Architected)
• Actionable Remediation Plan – Plan to implement top observations from the Findings & Recommendations report. Clients can choose to implement the plan internally or involve Agilisium
26. Agilisium’s Redshift Inspector – Key facets covered
Cost Optimization
• Right-fit cluster size
• On-demand to restore snapshots
• Underutilized/unused clusters
• Choice of Reserved Instances (standard vs convertible)
• Hot/cold/warm data strategy
• Intra-region Data Transfer
• Snapshot lifecycle management
Performance Efficiency
• Compression/Encoding of large datasets to improve network throughput
• Avoid Data Skew through right Distribution & Sort Keys
• Up-to-date stats via ANALYZE & VACUUM
• VACUUM strategies (Pre-sort & load)
• Data load optimization strategies (COPY commands)
• Track query performance (Integrity constraints as hints)
• Auto WLM vs Custom WLM
• Time-series data model for larger datasets
Security
• SSO & IAM Federations
• Ingress policy – Port 5832 open for internal IPs only
• All traffic routed via private subnets/VPCs
• Encrypt data-at-rest – KMS/HSM
• Encrypt data-in-motion – SSL/TLS
Reliability
• Multi-region cluster setup
• SLA-based manual backup – Restore data lost due to accidental
deletion
• Cross-region backups for HA
• Continuous monitoring of key metrics for HA (Disk utilization, ReadIOPS, WriteIOPS, CPU utilization etc.)
• Redshift user activity logged for RCA
Operational Excellence
• Deferred maintenance
• Redshift Advisor recommendations
• RA3 – Intelligent Data offload