2. My motivation
• Deliver an analytics platform in Amazon Redshift for randomly generated healthcare data.
• Take a deep dive into Amazon Redshift as a distributed data warehouse system.
• Redshift is widely used in business, and efficient analytics is important for supplying operational insight.
6. Columnar Compression
| Type | How it works | Examples |
|---|---|---|
| Raw | N/A (no compression); use for large domains | Identifiers |
| Bytedict | Creates a dictionary of unique values; optimal for a limited number of unique values | Dept. code |
| LZO | Creates a dictionary of repeating character sequences; use for very long character strings | Comments |
| Runlength | Stores repeat value counts; use for consecutive repeating values | Doctor code |
| Text255 & Text32k | Creates a dictionary of unique words for repeating text | Address |
| Delta | Records the difference between values that follow each other; optimal for consecutive integer values | Gender Code |
| Mostly | Stores values in a smaller standard storage size; optimal when the column's data type is larger than most values | BIGINT columns |
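To make the "How it works" column concrete, here is a minimal Python sketch of three of the ideas in the table: run-length, delta, and byte-dictionary encoding. It is a toy illustration only and does not reflect Redshift's on-disk format; the function names and sample values are hypothetical.

```python
from itertools import groupby


def runlength_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def delta_encode(values):
    """Keep the first value, then store each difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def bytedict_encode(values):
    """Replace each value with a small index into a dictionary of unique values."""
    dictionary = []   # unique values in first-seen order
    index_of = {}     # value -> position in dictionary
    encoded = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(index_of[value])
    return dictionary, encoded


if __name__ == "__main__":
    # Hypothetical column samples, chosen only for illustration.
    print(runlength_encode(["DR01", "DR01", "DR01", "DR02", "DR02"]))
    print(delta_encode([1000, 1001, 1002, 1005]))
    print(bytedict_encode(["CARD", "ONC", "CARD", "CARD"]))
```

Run-length pays off only when equal values sit next to each other, delta when neighbouring integers are close together, and the dictionary approach when the number of distinct values is small, which matches the use cases listed in the table.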
13. About me
• Worked in data migration for an IT system implementation project, and in Business Intelligence, at an NHS Trust.
• M.Eng. in Engineering Mathematics from the University of Bristol.
• Interests include hiking and swimming.
15. Encryption
• AWS Key Management Service (KMS)
  - Integrates automatically with Redshift
  - $1 a month
• Hardware Security Module (HSM)
  - Requires client and server certificates to configure a trusted connection to Amazon Redshift
  - Monthly fee plus $5,000 initial cost
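As a rough sketch of how the two options differ at cluster-creation time, the helper below builds the encryption-related keyword arguments one might pass to boto3's Redshift `create_cluster` call. The helper name `encryption_kwargs` and all identifier values are hypothetical; the parameter names (`Encrypted`, `KmsKeyId`, `HsmClientCertificateIdentifier`, `HsmConfigurationIdentifier`) are assumed to match the `create_cluster` API, and no AWS call is made here.

```python
def encryption_kwargs(method, kms_key_id=None,
                      hsm_client_cert_id=None, hsm_config_id=None):
    """Build encryption arguments for a Redshift create_cluster request.

    method: "kms" or "hsm" (hypothetical labels used only in this sketch).
    """
    if method == "kms":
        kwargs = {"Encrypted": True}
        # Without an explicit key, the account's default key is assumed to be used.
        if kms_key_id is not None:
            kwargs["KmsKeyId"] = kms_key_id
        return kwargs
    if method == "hsm":
        # HSM needs both a client certificate and an HSM configuration,
        # reflecting the trusted-connection setup described above.
        if not (hsm_client_cert_id and hsm_config_id):
            raise ValueError("HSM encryption needs a client certificate and an HSM configuration")
        return {
            "Encrypted": True,
            "HsmClientCertificateIdentifier": hsm_client_cert_id,
            "HsmConfigurationIdentifier": hsm_config_id,
        }
    raise ValueError(f"unknown encryption method: {method!r}")


if __name__ == "__main__":
    print(encryption_kwargs("kms"))
    print(encryption_kwargs("hsm", hsm_client_cert_id="example-cert",
                            hsm_config_id="example-hsm-config"))
    # With AWS credentials configured, the result could be merged into a call such as
    # boto3.client("redshift").create_cluster(**base_args, **encryption_kwargs("kms"))
```

The KMS path needs nothing beyond the `Encrypted` flag, while the HSM path fails early unless both certificate-related identifiers are supplied, which mirrors the extra setup the HSM option involves.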
16. Redshift cluster
• Set up a Redshift cluster with 4 dc1.large nodes: four nodes with two slices each.

| Node size | vCPU | ECU | RAM (GiB) | Slices per node | Storage per node | Node range | Total capacity |
|---|---|---|---|---|---|---|---|
| dc1.large | 2 | 7 | 15 | 2 | 160 GB SSD | 1-32 | 5.12 TB |
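The figures for this cluster follow directly from the dc1.large row. The short sketch below works through the arithmetic; the constant and function names are hypothetical, and the numbers are taken from the table.

```python
# Per-node figures for dc1.large, as listed in the table.
SLICES_PER_NODE = 2
STORAGE_PER_NODE_GB = 160
MAX_NODES = 32


def cluster_totals(node_count):
    """Return total slices and raw SSD storage (GB) for a dc1.large cluster."""
    if not 1 <= node_count <= MAX_NODES:
        raise ValueError(f"dc1.large clusters have 1 to {MAX_NODES} nodes")
    return {
        "slices": node_count * SLICES_PER_NODE,
        "storage_gb": node_count * STORAGE_PER_NODE_GB,
    }


if __name__ == "__main__":
    print(cluster_totals(4))          # the four-node cluster used here
    print(cluster_totals(MAX_NODES))  # the largest dc1.large cluster
```

For the four-node cluster this gives 8 slices and 640 GB of storage; at the 32-node maximum, 32 × 160 GB = 5,120 GB, which is the 5.12 TB total capacity in the table.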
17. Columnar Compression
| Type | How it works | Use case | Examples |
|---|---|---|---|
| Raw | N/A (no compression) | Large domain | Identifiers |
| Bytedict | Creates a dictionary of unique values | Limited unique values | Dept. code |
| LZO | Creates a dictionary of repeating character sequences | Very long character strings | Comments |
| Runlength | Stores repeated value counts | Consecutive repeating values | Doctor code |
| Text255 & Text32k | Creates a dictionary of unique words for repeating text | Repeating words within a string | Address |
| Delta | Records the difference between values that follow each other | Consecutive integer values | Gender Code |
| Mostly | Stores values in a smaller standard storage size | Column data type is larger than most values | BIGINT columns |
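The "Mostly" row can be illustrated with a small sketch, assuming a one-byte variant in which values that fit a signed 8-bit range are stored compactly and the rest are kept at full width. This is a simplified model with hypothetical names, not Redshift's actual storage layout, and the byte estimate ignores any block or bookkeeping overhead.

```python
INT8_MIN, INT8_MAX = -128, 127   # range of a signed one-byte integer
BIGINT_BYTES = 8                 # full width of a BIGINT value


def mostly8_split(values):
    """Split values into those that fit in one byte and those kept raw.

    Both results map row position -> value, so the column can be rebuilt.
    """
    small, raw = {}, {}
    for position, value in enumerate(values):
        if INT8_MIN <= value <= INT8_MAX:
            small[position] = value
        else:
            raw[position] = value
    return small, raw


def estimated_bytes(values):
    """Estimate storage: 1 byte per small value, 8 bytes per raw exception."""
    small, raw = mostly8_split(values)
    return len(small) + len(raw) * BIGINT_BYTES


if __name__ == "__main__":
    column = [3, 17, -5, 9_000_000_000, 42]   # hypothetical BIGINT column
    print(mostly8_split(column))
    print(estimated_bytes(column), "bytes vs", len(column) * BIGINT_BYTES, "uncompressed")
```

In this example four of the five values fit in a byte, so the estimate is 12 bytes against 40 bytes uncompressed. The saving shrinks as the share of large values grows, which is why the table recommends this encoding only when the column's data type is larger than most of its values.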