This presentation summarizes the Amazon Redshift data warehouse service, its architecture, and best practices for application development with Amazon Redshift.
5. Client Applications
Integrates with various data loading and ETL (Extract, Transform, and
Load) tools and business intelligence (BI) reporting, data mining, and
analytics tools
Redshift is based on industry-standard PostgreSQL, so most existing
SQL client applications will work with only minimal changes
9. Compute Nodes
Execute the compiled code and send intermediate results back to the
leader node for final aggregation
Each compute node has its own dedicated CPU, memory, and attached disk
storage, which are determined by the node type
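The split of work between compute nodes and the leader node can be illustrated with a small Python sketch. It is only a toy model of the idea described above, not Redshift's actual execution engine: the function names, the (region, amount) row layout, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def compute_node_partial(rows):
    """Toy stand-in for one compute node: sum amounts per region locally."""
    partial = defaultdict(float)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)


def leader_final_aggregate(partials):
    """Toy stand-in for the leader node: merge the intermediate results."""
    total = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial.items():
            total[region] += subtotal
    return dict(total)


# Hypothetical data, already spread across three compute nodes.
node_rows = [
    [("east", 10.0), ("west", 5.0)],
    [("east", 2.5), ("north", 1.0)],
    [("west", 4.0)],
]

partials = [compute_node_partial(rows) for rows in node_rows]
result = leader_final_aggregate(partials)
print(result)
```

Each node only touches its own rows, and the leader only sees one small dictionary per node rather than every row.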
10. Databases
A cluster contains one or more databases
User data is stored on the compute nodes
Amazon Redshift is a Relational Database Management System
(RDBMS)
Amazon Redshift is optimized for high-performance analysis and
reporting of very large datasets
Amazon Redshift is based on PostgreSQL
11. Redshift reduces I/O
Column storage - read only the data you need
Data compression - analyzes and compresses your data
Zone Map
Keep track of the minimum and maximum values for each block
Skip over blocks that don't contain data needed for a given query
Minimize unnecessary I/O
Direct attached storage
Hardware optimized for high performance data processing
Large data block sizes
Large block sizes to make the most of each read
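The zone map idea from this slide can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Redshift's storage format: the Block class, the scan_range function, and the tiny three-value blocks are hypothetical and exist only to show how per-block minimum and maximum values let a range scan skip blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A toy storage block that records its own min and max (the zone map)."""
    values: list
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self):
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_range(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # Skip the block if its min/max range cannot overlap the query range.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# Hypothetical sorted column split into three blocks.
blocks = [Block([1, 3, 7]), Block([10, 12, 19]), Block([20, 25, 31])]
matches, blocks_read = scan_range(blocks, 11, 21)
print(matches, blocks_read)
```

In this sketch the first block is never read, because its maximum lies below the query range. Skipping works best when the column is sorted, so that each block covers a narrow range of values.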
15. Redshift has security built-in
SSL to secure data in transit
Encryption to secure data at rest
AES-256 - hardware accelerated
All blocks on disk and in Amazon S3 encrypted
No direct access to compute nodes
Amazon VPC support
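As an illustration of SSL in transit, a PostgreSQL-style client can ask for an encrypted connection in its connection string. The sketch below only builds such a string. It assumes a libpq-style keyword/value format with sslmode=require and Redshift's default port 5439; the host, database, and user names are hypothetical, and the slides do not say which client settings were used.

```python
def build_dsn(host, dbname, user, port=5439, sslmode="require"):
    """Build a libpq-style keyword/value connection string.

    sslmode="require" asks the client to refuse unencrypted connections.
    All values passed in below are placeholders, not real endpoints.
    """
    parts = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "sslmode": sslmode,
    }
    return " ".join(f"{key}={value}" for key, value in parts.items())


dsn = build_dsn("example-cluster.example.com", "analytics", "report_user")
print(dsn)
```

The password is deliberately left out of the string; it would normally come from the environment or a credentials store.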
23. Redshift Implementation
High Storage Extra Large (XL) DW Node
ETL Activities
Approx. 90 minutes including exports from RDBMS, copying to S3,
loading stage tables, loading target tables, vacuuming and
analysing tables
Schema
Compression
Retention
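The load portion of the ETL activities listed above (loading stage tables, loading target tables, vacuuming, and analysing) might be sequenced roughly as follows. This is a sketch under assumptions, not the implementation from the presentation: the table names, S3 prefix, IAM role, and the gzip-compressed CSV file format are all hypothetical, and the export from the RDBMS and the copy to S3 are assumed to have already happened.

```python
def build_load_steps(stage_table, target_table, s3_prefix, iam_role):
    """Return the SQL statements for one load cycle, in execution order."""
    return [
        # Bulk-load the exported files from S3 into the stage table.
        f"COPY {stage_table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV GZIP;",
        # Move the staged rows into the target table.
        f"INSERT INTO {target_table} SELECT * FROM {stage_table};",
        # Re-sort rows and reclaim space after the load.
        f"VACUUM {target_table};",
        # Refresh the statistics used by the query planner.
        f"ANALYZE {target_table};",
    ]


steps = build_load_steps(
    stage_table="stage_sales",          # hypothetical name
    target_table="sales",               # hypothetical name
    s3_prefix="s3://example-bucket/exports/sales/",
    iam_role="arn:aws:iam::123456789012:role/ExampleLoadRole",
)
for statement in steps:
    print(statement)
```

Running VACUUM and ANALYZE last means the sort order and planner statistics reflect the newly loaded rows.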
26. Best Practices
Avoid a large number of singleton Data Manipulation Language (DML)
statements where possible
Use COPY for uploading large datasets
Choose SORT and DISTRIBUTION keys with care
Encode date and time with the TIMESTAMP data type
Experiment with WLM (Workload Manager) settings
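Two of these practices can be made concrete with a short sketch: declaring distribution and sort keys alongside a TIMESTAMP column, and collapsing many singleton INSERT statements into one multi-row statement when COPY is not an option. The table, its columns, and the key choices are hypothetical, and the %s placeholders assume a psycopg2-style parameterized client; none of these come from the presentation.

```python
def build_create_table(table, columns, distkey, sortkey):
    """Build a CREATE TABLE statement with DISTKEY and SORTKEY attributes."""
    column_sql = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n{column_sql}\n)\n"
        f"DISTKEY ({distkey})\nSORTKEY ({sortkey});"
    )


def build_multirow_insert(table, columns, row_count):
    """Build one parameterized INSERT covering row_count rows."""
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    rows = ", ".join([row] * row_count)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {rows};"


ddl = build_create_table(
    "sales",  # hypothetical table
    [
        ("sale_id", "BIGINT"),
        ("customer_id", "BIGINT"),
        ("sold_at", "TIMESTAMP"),  # date and time kept in one TIMESTAMP column
        ("amount", "DECIMAL(12,2)"),
    ],
    distkey="customer_id",  # assumed join column
    sortkey="sold_at",      # assumed range-filter column
)
insert_sql = build_multirow_insert("sales", ["sale_id", "sold_at"], 2)
print(ddl)
print(insert_sql)
```

A reasonable starting point is a distribution key that matches the most common join column and a sort key that matches the most common range filter, but the right choice depends on the workload.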