© 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
Amazon Redshift Best Practices
Part 2
May 2013
Eric Ferreira & John Loughlin
Agenda
Introduction & Recap
Best Practices for
• Workload Migration
• Copy Command Options
• Vacuum
• Space Management
Q&A
AWS Database Services
Scalable High Performance Application Storage in the Cloud
• Amazon DynamoDB: Fast, Predictable, Highly-Scalable NoSQL Data Store
• Amazon RDS: Managed Relational Database Service for MySQL, Oracle and SQL Server
• Amazon ElastiCache: In-Memory Caching Service
• Amazon Redshift: Fast, Powerful, Fully Managed, Petabyte-Scale Data Warehouse Service
[Diagram: the database services within the AWS stack of Compute, Storage, Database, Application Services, Deployment & Administration and Networking, built on the AWS Global Infrastructure]
Amazon Redshift architecture
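Leader Node
• SQL endpoint
• PostgreSQL-based
• Stores metadata
• Communicates with clients
• Compiles queries
• Coordinates query execution
Compute Nodes
• Local, columnar storage
• Execute queries in parallel across slices
• Load, backup and restore via Amazon S3
Everything is mirrored.
[Diagram: clients connect to the leader node over JDBC/ODBC; the leader and compute nodes communicate over a 10 GigE (HPC) network; ingestion, backup and restore flow between the compute nodes and Amazon S3]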
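To picture how work is spread across slices, here is a minimal Python sketch of hash distribution followed by a parallel aggregate. It is a conceptual model under stated assumptions, not Redshift internals: `zlib.crc32` stands in for Redshift's own hash function, the slice count is arbitrary, and `assign_slice`, `distribute` and `parallel_sum` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # hypothetical: e.g. 2 compute nodes with 2 slices each


def assign_slice(distkey_value, num_slices=NUM_SLICES):
    """Map a distribution key value to a slice (crc32 is an illustrative stand-in hash)."""
    return zlib.crc32(str(distkey_value).encode("utf-8")) % num_slices


def distribute(rows, key_col, num_slices=NUM_SLICES):
    """Group rows by the slice their distribution key hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[assign_slice(row[key_col], num_slices)].append(row)
    return slices


def parallel_sum(rows, key_col, value_col, num_slices=NUM_SLICES):
    """Each slice sums its own rows; the leader node adds the partial sums together."""
    slices = distribute(rows, key_col, num_slices)
    partials = [sum(r[value_col] for r in slice_rows) for slice_rows in slices.values()]
    return sum(partials)


rows = [{"customer": f"c{i}", "amount": i} for i in range(100)]
print(parallel_sum(rows, "customer", "amount"))  # 4950, same as a single-node sum
```

Because each slice only scans its own rows, adding nodes adds slices and spreads the scan, while the combine step on the leader stays small.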
In Part 1…
This is Part 2 of the Redshift Best Practices series. To watch Part 1, visit:
http://aws.amazon.com/resources/databaseservices/webinars/
Workload Migration
ELT/ETL Process
• Load Atomic Data (target table or staging area)
• Transform data (include cleanup and aggregation)
• Prepare target tables for query/reports
• Includes statistics gathering (ANALYZE) and VACUUM
• Includes the data retention policy
Re-evaluate this process to take advantage of cloud characteristics.
Workload Migration cont.
Make provision for testing multiple options before you
migrate the production workflow
• Different number of nodes
• Few large nodes versus many small nodes (16 XL nodes versus 2 8XL nodes)
• WLM Settings
• Concurrency versus response time
• Different Sort and Distribution Keys
• Test both queries and load/vacuum times
• Compression
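One way to make sure every option above actually gets tested is to enumerate the combinations up front. This is a minimal Python sketch; the option values, the column names `order_date` and `customer_id`, the `Scenario` type and the equal weighting of query, load and vacuum time in `best_scenario` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    cluster: str       # node layout under test
    wlm_slots: int     # WLM query concurrency for the queue
    sortkey: str       # candidate first sort key column
    distribution: str  # candidate distribution key, or "round-robin"
    compressed: bool   # whether column compression encodings are applied


def build_test_matrix():
    """Enumerate every combination of the options to benchmark (all values hypothetical)."""
    clusters = ["16 x XL", "2 x 8XL"]
    wlm_slots = [5, 15]
    sortkeys = ["order_date", "customer_id"]
    distributions = ["customer_id", "round-robin"]
    compression = [True, False]
    combos = product(clusters, wlm_slots, sortkeys, distributions, compression)
    return [Scenario(*combo) for combo in combos]


def best_scenario(results):
    """Pick the scenario with the lowest combined query + load + vacuum time.

    results maps Scenario -> (query_seconds, load_seconds, vacuum_seconds),
    measured however your own test harness runs them.
    """
    return min(results, key=lambda s: sum(results[s]))


matrix = build_test_matrix()
print(len(matrix))  # 32 = 2 * 2 * 2 * 2 * 2
```

Weighting the three timings equally is only a starting point; if response time matters more than load windows, weight the components accordingly.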
Workload Best Practices
Organizing and keeping your load files in S3 allows for re-run or scenario testing
as you evolve your workflow in the platform.
• Keep in S3 or Glacier for fiscal/legal reasons
Data that is updated in the short term
• Consider keeping a short-term version of the table for staging and a long-term version once the data becomes stable.
Round-robin distribution
• Use when you don’t have a good distribution key
• See Part 1 for a query that checks for distribution skew
• Trade-off: you give up collocated joins
Loading the target (final) table
• Use a chronological date/timestamp column as the first sort key column: vacuum is needed less often and runs faster
• When the first sort column has low cardinality/resolution (e.g., date instead of timestamp), the subsequent sort columns should match common filter and/or grouping columns
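The round-robin trade-off can be illustrated by comparing rows per slice under both schemes. A minimal Python sketch, assuming `zlib.crc32` as a stand-in hash and defining skew as largest slice divided by average slice; that metric and all function names are hypothetical and not necessarily what Part 1's query reports.

```python
import zlib
from collections import Counter


def hash_slice_counts(keys, num_slices):
    """Rows per slice when distributing on a key (crc32 is an illustrative stand-in hash)."""
    counts = Counter(zlib.crc32(str(k).encode("utf-8")) % num_slices for k in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


def round_robin_slice_counts(n_rows, num_slices):
    """Rows per slice when rows are dealt out in turn, ignoring their values."""
    base, extra = divmod(n_rows, num_slices)
    return [base + (1 if s < extra else 0) for s in range(num_slices)]


def skew_ratio(counts):
    """Largest slice divided by the average slice; 1.0 means perfectly even."""
    avg = sum(counts) / len(counts)
    return max(counts) / avg if avg else 0.0


# Hypothetical key column where 90% of the rows share one value.
keys = ["unknown"] * 900 + [f"k{i}" for i in range(100)]
print(skew_ratio(hash_slice_counts(keys, 4)))              # well above 1.0
print(skew_ratio(round_robin_slice_counts(len(keys), 4)))  # 1.0
```

Round-robin evens out the slices, but rows sharing a key no longer land on the same slice, so joins on that key cannot be collocated.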
Workload Best Practices cont.
Use the UNLOAD command to archive data that is no longer needed for business reasons
• Data that must be kept only for fiscal/legal reasons can be reloaded as needed.
Consider applying retention policies less often than the regular workflow
• Weekly/monthly process during a less busy time
• Provision space for data growth
• Make sure all queries have date/timestamp range filters (> and <)
• Keep a sliding window of data to minimize block rewrites during vacuum
Take manual snapshots to save state at specific milestones (e.g., year-end).
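A sliding window with a monthly retention run reduces to some date arithmetic. This minimal Python sketch assumes monthly granularity and a hypothetical date column `sale_date`; it only computes which months to archive and the range predicate queries should carry, and does not issue the UNLOAD itself. All function names are hypothetical.

```python
from datetime import date


def add_months(first_of_month, months):
    """Shift a first-of-month date by a whole number of months."""
    total = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def retention_window(today, keep_months):
    """Half-open [start, end) range: the current month plus keep_months - 1 earlier months."""
    end = add_months(today.replace(day=1), 1)
    start = add_months(end, -keep_months)
    return start, end


def months_to_archive(loaded_months, today, keep_months):
    """First-of-month dates that have slid out of the window (UNLOAD candidates)."""
    start, _ = retention_window(today, keep_months)
    return sorted(m for m in loaded_months if m < start)


def range_filter(column, start, end):
    """Date range predicate (>= and <) that keeps queries inside the window."""
    return f"{column} >= '{start.isoformat()}' and {column} < '{end.isoformat()}'"


today = date(2013, 5, 15)
loaded = [date(2013, m, 1) for m in range(1, 6)]
print(months_to_archive(loaded, today, keep_months=3))  # January and February 2013
print(range_filter("sale_date", *retention_window(today, 3)))
```

A half-open range (>= start, < end) avoids double-counting the boundary day when consecutive windows are queried.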
Workload Best Practices cont.
Ratio between load and query performance needs
• Low ratio: consider Load -> Snapshot -> spin up “Query” clusters -> Tear down
• High ratio: prioritize performance over space needs when choosing the number of nodes
Normalization Rule of Thumb
• Denormalize only to avoid non-collocated joins
• Slowly Changing Dimensions (Type II): keep them normalized and match their distkey with the fact table
COPY Command
COPY table_name [ (column1 [,column2, ...]) ]
FROM 's3://objectpath' [ WITH ] CREDENTIALS [AS] 'aws_access_credentials'
[ option [ ... ] ]
Options worth mentioning:
GZIP
• Using compressed files saves network bandwidth and can speed up loads.
MAXERROR and NOLOAD
• The default MAXERROR is 0. Set it to a larger value while troubleshooting a new data stream
• Use it with the NOLOAD option to validate files quickly without loading any data
STATUPDATE
• When loading a significant amount of data into a non-empty table, STATUPDATE can update statistics at the end of the load.
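When scripting loads it can help to assemble COPY statements with these options in one place. A minimal Python sketch: `build_copy` is a hypothetical helper, the bucket, key prefix and credential values are placeholders, and the credentials string uses the key-based `aws_access_key_id=...;aws_secret_access_key=...` form. Values are not quoted or escaped, so only pass trusted input.

```python
def build_copy(table, s3_path, access_key_id, secret_access_key, columns=None,
               gzip=False, max_error=None, noload=False, statupdate=None):
    """Assemble a COPY statement with the options discussed above (values are not escaped)."""
    column_list = f" ({', '.join(columns)})" if columns else ""
    credentials = f"aws_access_key_id={access_key_id};aws_secret_access_key={secret_access_key}"
    parts = [f"COPY {table}{column_list}", f"FROM '{s3_path}'", f"CREDENTIALS '{credentials}'"]
    if gzip:
        parts.append("GZIP")                        # input files are gzip-compressed
    if max_error is not None:
        parts.append(f"MAXERROR {int(max_error)}")  # tolerate up to N bad rows
    if noload:
        parts.append("NOLOAD")                      # validate the files without loading rows
    if statupdate is not None:
        parts.append("STATUPDATE ON" if statupdate else "STATUPDATE OFF")
    return "\n".join(parts) + ";"


# Validation pass for a new data stream: tolerate errors, load nothing.
print(build_copy("lineitem", "s3://my-bucket/lineitem.tbl.", "<key-id>", "<secret>",
                 gzip=True, max_error=1000, noload=True))
```

Once the validation pass is clean, drop NOLOAD and bring MAXERROR back down for the production load.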
COPY Command Common Issues
UTF-8
• Currently Redshift can only load well-formed UTF-8 characters of up to 3 bytes.
NULL AS and ESCAPE
• These options work around many common file-loading issues
• Narrow the problem down to a small set of rows and inspect them to identify what type of problem you have
• Note that the error message may point to a later part of the data. For example, “Delimiter not found” may be caused by an end-of-line character that was not escaped.
DATEFORMAT and TIMEFORMAT
• Currently all date/timestamp columns must use the same format, defined by the option
• With ACCEPTANYDATE, values that do not match the format are loaded as NULL instead of generating errors
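Some of these problems can be caught before the data reaches COPY. A minimal Python sketch of two pre-load checks: finding characters longer than the 3-byte UTF-8 limit stated above, and backslash-escaping fields for a load that uses ESCAPE. The `|` delimiter and `\N` null string are assumed to match the COPY defaults; all function names are hypothetical.

```python
def chars_over_utf8_limit(text, max_bytes=3):
    """Return (position, character) pairs whose UTF-8 encoding is longer than max_bytes."""
    return [(i, ch) for i, ch in enumerate(text) if len(ch.encode("utf-8")) > max_bytes]


def escape_field(value, delimiter="|"):
    """Backslash-escape backslashes, the delimiter and line breaks for COPY ... ESCAPE."""
    special = {"\\", delimiter, "\n", "\r"}
    return "".join("\\" + ch if ch in special else ch for ch in value)


def format_row(values, delimiter="|", null_as="\\N"):
    """Build one delimited line; None becomes the NULL AS string."""
    return delimiter.join(null_as if v is None else escape_field(v, delimiter) for v in values)


print(chars_over_utf8_limit("caf\u00e9 \U0001F600"))  # the 4-byte emoji at position 5
print(format_row(["a|b", None, "line1\nline2"]))
```

Escaping embedded line breaks at export time prevents the misleading “Delimiter not found” errors described above.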
COPY Command Troubleshooting
STL_LOAD_ERRORS / STL_LOADERROR_DETAIL
• Find errors during specific loads
• You can create a view to simplify troubleshooting process
create view loadview as
(select distinct tbl, trim(name) as table_name, query, starttime,
        trim(filename) as input, line_number, colname, err_code,
        trim(err_reason) as reason
 from stl_load_errors sl, stv_tbl_perm sp
 where sl.tbl = sp.id);
• Then, if you have any issues, run: select * from loadview where table_name = '<table>';
STL_LOAD_COMMITS / STL_FILE_SCAN / STL_S3CLIENT
• Load times for specific files; confirms that a given file was read
STL_S3CLIENT_ERROR
• Information about specific S3 or file transfer errors that happen during load
process
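Because an error may be reported on a line after the real culprit, it helps to look at the lines around the reported `line_number`. A minimal Python sketch, assuming you have a local copy of the input file read into a list of lines; `error_context` is a hypothetical helper.

```python
def error_context(lines, line_number, before=2, after=1):
    """Return (line_no, text) pairs around a 1-based line number reported for a load error."""
    start = max(1, line_number - before)
    end = min(len(lines), line_number + after)
    return [(n, lines[n - 1]) for n in range(start, end + 1)]


# Hypothetical file where record 2 contains an unescaped newline.
lines = ["1|a|x", "2|b", "continued|y", "3|c|z"]
for n, text in error_context(lines, 3):
    print(n, text)
```

Here the record split across lines 2 and 3 only becomes obvious when both lines are shown together.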
COPY Command – Historical Information
Look back to confirm the number of files and bytes loaded by each COPY statement
select substring(q.querytxt,1,40) as querytxt, s.n_files, size_mb, s.time_seconds,
       s.size_mb/decode(s.time_seconds,0,1,s.time_seconds) as mb_per_s
from (select query, count(*) as n_files,
             sum(transfer_size/(1024*1024)) as size_MB,
             (max(end_Time) - min(start_Time))/(1000000) as time_seconds,
             max(end_time) as end_time
      from stl_s3client
      where query > 0 and transfer_time > 0
      group by query) as s
left join stl_query as q on q.query = s.query
order by mb_per_s desc
limit 10;
COPY Command – Historical Information cont.
querytxt | n_files | size_mb | time_seconds | mb_per_s
--------------------------------------------------------------+---------+---------+--------------+----------
copy lineitem from 's3://tpc-h/100/lineitem.tbl.' credential | 603 | 22201 | 2390 | 9
copy lineitem from 's3://tpc-h/1/lineitem.tbl.' credentials | 34 | 192 | 21 | 8
copy customer from 's3://tpc-h/100/customer.tbl.' credential | 152 | 750 | 85 | 8
copy partsupp from 's3://tpc-h/100/partsupp.tbl.' credential | 82 | 2720 | 367 | 7
COPY ANALYZE part | 22 | 40 | 7 | 5
copy orders from 's3://tpc-h/100/orders.tbl.' credentials '' | 152 | 4800 | 1035 | 4
copy orders from 's3://tpc-h/1/orders.tbl.' credentials '' g | 34 | 32 | 7 | 4
copy part from 's3://tpc-h/100/part.tbl.' credentials '' gzi | 202 | 400 | 95 | 4
COPY ANALYZE supplier | 34 | 0 | 3 | 0
copy supplier from 's3://tpc-h/100/supplier.tbl.' credential | 102 | 0 | 10 | 0
(10 rows)
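The same per-statement throughput can be computed from exported STL_S3CLIENT rows. A minimal Python sketch, assuming each row carries `query`, `transfer_size` (bytes), `transfer_time`, `start_time` and `end_time`, with the times in microseconds as the query's division by 1000000 implies; `copy_throughput` is a hypothetical helper. One deliberate difference: the SQL divides each integer `transfer_size` by 1024*1024 before summing, which truncates files under 1 MB to 0 and may explain the 0 MB rows above, whereas the sketch sums bytes first.

```python
from collections import defaultdict


def copy_throughput(s3client_rows, limit=10):
    """Per-query files, MB, elapsed seconds and MB/s, fastest first."""
    by_query = defaultdict(list)
    for r in s3client_rows:
        if r["query"] > 0 and r["transfer_time"] > 0:  # same filter as the SQL
            by_query[r["query"]].append(r)
    results = []
    for query, rows in by_query.items():
        size_mb = sum(r["transfer_size"] for r in rows) / (1024 * 1024)  # sum bytes first
        elapsed_us = max(r["end_time"] for r in rows) - min(r["start_time"] for r in rows)
        seconds = elapsed_us / 1_000_000
        results.append({
            "query": query,
            "n_files": len(rows),
            "size_mb": size_mb,
            "time_seconds": seconds,
            "mb_per_s": size_mb / (seconds or 1),  # zero guard, like decode(time_seconds, 0, 1, ...)
        })
    results.sort(key=lambda d: d["mb_per_s"], reverse=True)
    return results[:limit]


MB = 1024 * 1024
sample = [
    {"query": 101, "transfer_size": 512 * MB, "transfer_time": 5, "start_time": 0, "end_time": 6_000_000},
    {"query": 101, "transfer_size": 512 * MB, "transfer_time": 5, "start_time": 1_000_000, "end_time": 10_000_000},
    {"query": 102, "transfer_size": 300 * 1024, "transfer_time": 1, "start_time": 0, "end_time": 2_000_000},
]
for row in copy_throughput(sample):
    print(row["query"], row["n_files"], round(row["mb_per_s"], 2))
```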
Vacuum
Before Vacuum
• Data inserted goes to a “non-sorted” area at the end of the table
• As this area grows, query times grow
• Deleted rows are only “marked” as deleted in a special column
• As marked rows accumulate, query times grow
What vacuum does
• Non-sorted area gets sorted and integrated into the table
• Deleted rows are removed and blocks reorganized
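The two steps above can be modeled in a few lines of Python. This is a minimal conceptual sketch, not Redshift's implementation: rows are hypothetical `(sort_key, row_id)` tuples, and the `deleted_ids` set stands in for the column that marks rows as deleted.

```python
import heapq


def vacuum(sorted_rows, unsorted_rows, deleted_ids):
    """Sort the unsorted region, merge it into the sorted region and drop deleted rows."""
    def sort_key(row):
        return row[0]

    newly_sorted = sorted(unsorted_rows, key=sort_key)
    merged = heapq.merge(sorted_rows, newly_sorted, key=sort_key)
    return [row for row in merged if row[1] not in deleted_ids]


sorted_region = [(1, "a"), (3, "b"), (5, "c")]
unsorted_region = [(4, "d"), (2, "e")]
print(vacuum(sorted_region, unsorted_region, deleted_ids={"b"}))
# [(1, 'a'), (2, 'e'), (4, 'd'), (5, 'c')]
```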
Vacuum cont.
• Vacuum takes advantage of the sort key and skips blocks that don’t need to be modified.
• Vacuum is a maintenance-type operation
• Only one vacuum can run at a time (cluster-wide)
• More memory = faster vacuum
– set wlm_query_slot_count to 4;
• Track vacuum progress (ETA)
– SVV_VACUUM_PROGRESS
• Record vacuum details afterwards to decide whether to adjust its frequency
– SVV_VACUUM_SUMMARY
[Diagram: month-labeled table blocks (March/2013 to June/2013) and an “Unsorted” region of newly loaded rows]
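Why a chronological first sort key makes vacuum cheaper can be shown with a deliberately simplified model: a sorted block can be skipped only when no newly loaded row sorts before its largest key. A minimal Python sketch under that assumption (real block handling is more involved), with one hypothetical block per month; `blocks_to_rewrite` is a hypothetical helper.

```python
from datetime import date


def blocks_to_rewrite(block_max_keys, new_keys):
    """Count sorted blocks a merge would need to touch in a simplified model.

    block_max_keys: largest sort key in each sorted block, in ascending order.
    new_keys: sort keys of the rows sitting in the unsorted region.
    """
    if not new_keys:
        return 0
    lowest_new = min(new_keys)
    return sum(1 for max_key in block_max_keys if max_key > lowest_new)


blocks = [date(2013, 3, 31), date(2013, 4, 30), date(2013, 5, 31)]
print(blocks_to_rewrite(blocks, [date(2013, 6, 1), date(2013, 6, 2)]))  # 0: chronological load
print(blocks_to_rewrite(blocks, [date(2013, 4, 15)]))                   # 2: back-dated rows
```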
Space Management
Redshift has a single pool of space used for tables and
temporary segments.
• Loads need 2.5 times the space of the data being loaded if the table has a sort key
• Vacuum may need 2.5 times the size of the table.
Monitor the free space
• Performance tab in the console
• CloudWatch alarms
• SQL
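The 2.5x rules of thumb translate into a simple pre-flight check. A minimal Python sketch: the 2.5 factors come from the slide, while treating loads into tables without a sort key as needing 1x their size is an assumption, and all function names are hypothetical.

```python
SORTED_LOAD_FACTOR = 2.5  # from the slide: loads into a table with a sort key
VACUUM_FACTOR = 2.5       # from the slide: vacuum may need up to this multiple


def load_space_needed(data_gb, has_sortkey):
    """Space to reserve for a load; 1x for tables without a sort key is an assumption."""
    return data_gb * (SORTED_LOAD_FACTOR if has_sortkey else 1.0)


def vacuum_space_needed(table_gb):
    """Space to reserve before vacuuming a table of table_gb."""
    return table_gb * VACUUM_FACTOR


def fits(free_gb, needed_gb):
    """True when the cluster has at least needed_gb free."""
    return needed_gb <= free_gb


free_gb = 500
print(fits(free_gb, load_space_needed(150, has_sortkey=True)))  # True: 375 GB needed
print(fits(free_gb, vacuum_space_needed(250)))                  # False: 625 GB needed
```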
Space Management cont.
Table Sizes
select trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema,
       trim(a.name) as Table, b.mbytes, a.rows
from (select db_id, id, name, sum(rows) as rows
      from stv_tbl_perm a group by db_id, id, name) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, count(*) as mbytes
      from stv_blocklist group by tbl) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
Free Space
select sum(capacity)/1024 as capacity_gbytes,
sum(used)/1024 as used_gbytes,
(sum(capacity) - sum(used))/1024
as free_gbytes
from stv_partitions
where part_begin=0;
• Redshift lets you resize your cluster up or down and across node types while the cluster stays online (read-only access).
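If the free-space query is polled from a script, the result can feed a usage alert. A minimal Python sketch that mirrors the query over exported STV_PARTITIONS rows, assuming `capacity` and `used` are counts of 1 MB blocks as the query's division by 1024 implies; Python's `/` keeps fractions where the SQL integer division truncates. The 80% threshold and both function names are hypothetical.

```python
def free_space(partitions):
    """Mirror the free-space query: capacity/used are counts of 1 MB blocks per partition."""
    rows = [p for p in partitions if p["part_begin"] == 0]
    capacity_gb = sum(p["capacity"] for p in rows) / 1024
    used_gb = sum(p["used"] for p in rows) / 1024
    return {
        "capacity_gbytes": capacity_gb,
        "used_gbytes": used_gb,
        "free_gbytes": capacity_gb - used_gb,
        "pct_used": 100 * used_gb / capacity_gb if capacity_gb else 0.0,
    }


def over_threshold(stats, pct=80.0):
    """Hypothetical alert rule: flag the cluster when usage reaches pct percent."""
    return stats["pct_used"] >= pct


stats = free_space([
    {"part_begin": 0, "capacity": 2048, "used": 1024},
    {"part_begin": 0, "capacity": 2048, "used": 512},
    {"part_begin": 5, "capacity": 999, "used": 999},  # excluded by part_begin = 0
])
print(stats)  # 4.0 GB capacity, 1.5 GB used, 2.5 GB free, 37.5% used
```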
Summary
• Experiment to optimize your workflows
• Various STL/STV system tables hold most of the information needed for troubleshooting
• Space management and the vacuum schedule should be planned during the implementation phase
More information
COPY Command
http://docs.aws.amazon.com/redshift/latest/dg/t_Loading_tables_with_the_COPY_command.html
Loads Troubleshooting
http://docs.aws.amazon.com/redshift/latest/dg/t_Troubleshooting_load_errors.html
Vacuum
http://docs.aws.amazon.com/redshift/latest/dg/t_Reclaiming_storage_space202.html
UNLOADING data
http://docs.aws.amazon.com/redshift/latest/dg/c_unloading_data.html
Q&A

AWS Webcast - Amazon Redshift Best Practices Part 2 – Performance

  • 1. © 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc. Amazon Redshift Best Practices Part 2 May 2013 Eric Ferreira & John Loughlin
  • 2. Agenda
    • Introduction & Recap
    • Best practices for:
      • Workload migration
      • COPY command options
      • Vacuum
      • Space management
    • Q&A
  • 3. AWS Database Services: Scalable High Performance Application Storage in the Cloud
    • Amazon DynamoDB: fast, predictable, highly scalable NoSQL data store
    • Amazon RDS: managed relational database service for MySQL, Oracle and SQL Server
    • Amazon ElastiCache: in-memory caching service
    • Amazon Redshift: fast, powerful, fully managed, petabyte-scale data warehouse service
    (Diagram: AWS stack with AWS Global Infrastructure; Compute, Storage, Database, and Networking; Application Services; Deployment & Administration.)
  • 4. Amazon Redshift architecture
    Leader node
    • SQL endpoint
    • PostgreSQL-based
    • Stores metadata
    • Communicates with clients
    • Compiles queries
    • Coordinates query execution
    Compute nodes
    • Local, columnar storage
    • Execute queries in parallel across slices
    • Load, backup, and restore via Amazon S3
    Everything is mirrored.
    (Diagram: clients connect to the leader node over JDBC/ODBC; leader and compute nodes are linked by 10 GigE (HPC); ingestion, backup, and restore go through Amazon S3.)
  • 5. In Part 1… This is Part 2 of the Redshift Best Practices series. Visit http://aws.amazon.com/resources/databaseservices/webinars/ to watch Part 1.
  • 6. Workload Migration
    ELT/ETL process
    • Load atomic data (into the target table or a staging area)
    • Transform data (including cleanup and aggregation)
    • Prepare target tables for queries/reports
      • Includes statistics gathering and vacuum
      • Includes the data retention policy
    Re-evaluate the process to take advantage of cloud characteristics.
  • 7. Workload Migration cont.
    Make provision for testing multiple options before you migrate the production workflow:
    • Different numbers of nodes
      • Few large nodes versus many small nodes (16 x XL versus 2 x 8XL, the same total storage)
    • WLM settings
      • Concurrency versus response time
    • Different sort and distribution keys
      • Test both queries and load/vacuum times
    • Compression
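Because each option multiplies the number of runs, it can help to enumerate the test matrix up front. The Python sketch below does that with `itertools.product`; the two node shapes come from the slide, while the WLM slot counts, the column names, and the `Scenario`/`build_test_matrix` names are hypothetical placeholders, not recommended settings.

```python
from itertools import product
from typing import List, NamedTuple, Sequence


class Scenario(NamedTuple):
    nodes: str         # cluster shape, e.g. "16 x XL"
    wlm_slots: int     # concurrency of the query queue (hypothetical values)
    sortkey: str       # leading sort key column (hypothetical column names)
    distkey: str       # distribution key column, or "EVEN" for round-robin
    compression: bool  # whether column encodings are applied


def build_test_matrix(
    node_shapes: Sequence[str],
    wlm_slots: Sequence[int],
    sortkeys: Sequence[str],
    distkeys: Sequence[str],
    compression: Sequence[bool] = (True, False),
) -> List[Scenario]:
    """Return every combination of the candidate settings."""
    return [
        Scenario(*combo)
        for combo in product(node_shapes, wlm_slots, sortkeys, distkeys, compression)
    ]


matrix = build_test_matrix(
    node_shapes=["16 x XL", "2 x 8XL"],
    wlm_slots=[5, 15],
    sortkeys=["order_date", "customer_id"],
    distkeys=["customer_id", "EVEN"],
)
print(len(matrix))  # 2 * 2 * 2 * 2 * 2 = 32 scenarios
```

With only two candidates per dimension the matrix already holds 32 scenarios, each needing both query and load/vacuum timings, so pruning obviously dominated combinations before spending cluster hours is usually worthwhile.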
  • 8. Workload Best Practices
    Organizing and keeping your load files in S3 allows you to re-run loads or test scenarios as you evolve your workflow on the platform.
    • Keep them in S3 or Glacier for fiscal/legal reasons
    Data updated in the short term
    • Consider a short-term version of the table for staging and a long-term version once the data is stable
    Round-robin distribution
    • Use it when you don't have a good distribution key
    • Check Part 1 for a query that checks for distribution skew
    • Trade-off: you lose collocated joins
    Loading the target (final) table
    • Use a chronological date/timestamp column as the first sort key column; vacuum is then needed less often and runs faster
    • When the first sort column has low cardinality/resolution (e.g., date instead of timestamp), subsequent columns should match common filter and/or grouping columns
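The sort key and distribution advice can be captured in a small DDL helper. This is a minimal sketch, assuming current Redshift syntax, where round-robin distribution is requested with `DISTSTYLE EVEN` and a key with `DISTKEY (column)`; the function name, table, and columns are hypothetical. The helper refuses a target table whose first sort key column is not a DATE or TIMESTAMP, following the slide's rule of thumb.

```python
from typing import List, Optional, Sequence, Tuple

CHRONOLOGICAL_TYPES = {"DATE", "TIMESTAMP"}


def target_table_ddl(
    name: str,
    columns: Sequence[Tuple[str, str]],
    sortkey: List[str],
    distkey: Optional[str] = None,
) -> str:
    """Build CREATE TABLE DDL for a final (target) table.

    columns: (column_name, sql_type) pairs.
    sortkey: sort key columns; the first should be a date/timestamp column.
    distkey: distribution key column, or None for round-robin (DISTSTYLE EVEN).
    """
    types = {col: typ.upper() for col, typ in columns}
    if not sortkey or types.get(sortkey[0]) not in CHRONOLOGICAL_TYPES:
        raise ValueError("first sort key column should be a DATE or TIMESTAMP column")
    col_sql = ",\n  ".join(f"{col} {typ}" for col, typ in columns)
    dist_sql = f"DISTKEY ({distkey})" if distkey else "DISTSTYLE EVEN"
    sort_sql = "SORTKEY (" + ", ".join(sortkey) + ")"
    return f"CREATE TABLE {name} (\n  {col_sql}\n)\n{dist_sql}\n{sort_sql};"


ddl = target_table_ddl(
    "orders_history",  # hypothetical long-term table
    [("order_date", "DATE"), ("customer_id", "INTEGER"), ("amount", "DECIMAL(12,2)")],
    sortkey=["order_date", "customer_id"],
)
print(ddl)
```

A short-term staging table would typically skip this check, since the chronological ordering matters mainly for the long-term table that gets vacuumed.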
  • 9. Workload Best Practices cont.
    Use the UNLOAD command to archive data that is no longer needed for business reasons.
    • Data that needs to exist only for fiscal/legal reasons can be re-loaded as needed
    Consider applying retention policies less often than the regular workflow
    • Weekly/monthly process during a less busy time
    • Make space provision for the data growth
    • Make sure all queries have date/timestamp range filters (> and <)
    • Keep a sliding window of data to minimize block rewrites during vacuum
    Take manual snapshots to save status at specific milestones (e.g., year-end).
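A monthly sliding-window job can be sketched as three statements: UNLOAD the rows that fell out of the window, DELETE them, and VACUUM. The Python below only generates the SQL text. It assumes a month-granular window that includes the current month, an S3 prefix and credentials string that are placeholders, and hypothetical table and column names; literals inside the UNLOAD query are escaped by doubling the single quotes.

```python
from datetime import date
from typing import List


def window_start(today: date, months_to_keep: int) -> date:
    """First day of the oldest month kept in the sliding window (current month included)."""
    months = today.year * 12 + (today.month - 1) - (months_to_keep - 1)
    return date(months // 12, months % 12 + 1, 1)


def retention_statements(
    table: str, date_col: str, today: date, months_to_keep: int,
    s3_prefix: str, credentials: str,
) -> List[str]:
    """UNLOAD rows that fell out of the window, delete them, then vacuum."""
    cutoff = window_start(today, months_to_keep).isoformat()
    # Literals inside the UNLOAD query text are escaped by doubling the quotes.
    query = f"select * from {table} where {date_col} < ''{cutoff}''"
    return [
        f"UNLOAD ('{query}') TO '{s3_prefix}' CREDENTIALS '{credentials}' GZIP;",
        f"DELETE FROM {table} WHERE {date_col} < '{cutoff}';",
        f"VACUUM {table};",
    ]


for stmt in retention_statements(
    "orders_history", "order_date", date(2013, 5, 15), 13,
    "s3://example-bucket/archive/orders_history_", "<aws_access_credentials>",
):
    print(stmt)
```

Because the job runs only weekly or monthly, the table can hold roughly one extra period of data between runs, which is part of the growth the slide says to provision space for.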
  • 10. Workload Best Practices cont.
    Ratio between load and query performance needs
    • Low ratio: consider Load -> Snapshot -> Spin up "query" clusters -> Tear down
    • High ratio: consider performance above space needs when choosing the number of nodes
    Normalization rule of thumb
    • De-normalize only to avoid non-collocated joins
    • Slowly changing dimensions (type II): keep normalized and match the distkey with the fact table
  • 11. COPY Command
      COPY table_name [ (column1 [, column2, ...]) ]
      FROM 's3://objectpath'
      [ WITH ] CREDENTIALS [AS] 'aws_access_credentials'
      [ option [ ... ] ]
    Options worth mentioning:
    GZIP
    • Using compressed files saves network bandwidth and can speed up loads
    MAXERROR and NOLOAD
    • The default MAXERROR is 0. Set it to a larger value while troubleshooting a new data stream
    • Use it with the NOLOAD option to speed up file validation
    STATUPDATE
    • When loading a significant amount of data into a non-empty table, STATUPDATE can update statistics at the end of the load
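The options above combine differently for a validation pass and a production load. A minimal sketch that only assembles the statement text, assuming pipe-delimited input (as in the TPC-H `.tbl` files that appear later in this deck) and a placeholder credentials string; `copy_statement` is a hypothetical helper, not part of any AWS library.

```python
from typing import Optional


def copy_statement(
    table: str,
    s3_path: str,
    credentials: str,
    *,
    delimiter: str = "|",
    gzip: bool = True,
    max_error: int = 0,
    noload: bool = False,
    statupdate: Optional[bool] = None,
) -> str:
    """Assemble a COPY statement using the options discussed on the slide."""
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"CREDENTIALS '{credentials}'",
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        parts.append("GZIP")
    if max_error:
        parts.append(f"MAXERROR {max_error}")  # default is 0: fail on the first bad row
    if noload:
        parts.append("NOLOAD")                 # validate the files without loading rows
    if statupdate is not None:
        parts.append("STATUPDATE ON" if statupdate else "STATUPDATE OFF")
    return "\n".join(parts) + ";"


# Validation pass for a new data stream: tolerate errors and skip the actual load.
print(copy_statement("lineitem", "s3://tpc-h/100/lineitem.tbl.", "<aws_access_credentials>",
                     max_error=1000, noload=True))
# Production pass into a non-empty table: refresh statistics at the end.
print(copy_statement("lineitem", "s3://tpc-h/100/lineitem.tbl.", "<aws_access_credentials>",
                     statupdate=True))
```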
  • 12. COPY Command Common Issues
    UTF-8
    • Currently Redshift can only load well-formed UTF-8 characters up to 3 bytes long
    NULL AS and ESCAPE
    • Common issues when loading files can be circumvented with these options
    • Narrow the problem down to a small set of rows and inspect them visually to find what type of problem you have
    • Note that the error message might refer to a later portion of the data. For example, "Delimiter not found" might be caused by an end-of-line character that was not escaped
    DATEFORMAT and TIMEFORMAT
    • Currently all date/timestamp columns have to use the same format, defined by the option
    • Using ACCEPTANYDATE will not generate errors, but loads NULL when the format does not match
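Before running COPY on a new feed, a quick local scan can narrow a failure down to specific lines. The sketch below flags bytes that are not valid UTF-8, characters that need 4 bytes in UTF-8 (beyond the 3-byte limit on the slide), and lines with the wrong number of fields, which is how an unescaped end-of-line usually shows up. It is a simplification: it ignores ESCAPE handling and quoting, so an escaped delimiter would be miscounted, and `validate_load_file` is a hypothetical name.

```python
from typing import List, Tuple


def validate_load_file(data: bytes, expected_fields: int, delimiter: str = "|") -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for lines likely to make COPY fail."""
    problems: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        if not raw:
            continue  # skip empty lines, including the one after a trailing newline
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((lineno, f"invalid UTF-8 at byte {exc.start}"))
            continue
        # Code points above U+FFFF take 4 bytes in UTF-8.
        if any(ord(ch) > 0xFFFF for ch in text):
            problems.append((lineno, "character needs 4 UTF-8 bytes"))
        # A short line often means an unescaped newline split one record in two.
        n_fields = len(text.split(delimiter))
        if n_fields != expected_fields:
            problems.append((lineno, f"expected {expected_fields} fields, found {n_fields}"))
    return problems


sample = (
    "1|alice|2013-05-01\n2|bob\n3|carol|2013-05-02\n".encode("utf-8")
    + b"4|\xff|2013-05-03\n"
    + "5|smile \U0001F600|2013-05-04\n".encode("utf-8")
)
for lineno, problem in validate_load_file(sample, expected_fields=3):
    print(lineno, problem)
```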
  • 13. COPY Command Troubleshooting
    STL_LOAD_ERRORS / STL_LOADERROR_DETAIL
    • Find errors during specific loads
    • You can create a view to simplify the troubleshooting process:
      create view loadview as
      (select distinct tbl, trim(name) as table_name, query, starttime,
              trim(filename) as input, line_number, colname, err_code,
              trim(err_reason) as reason
       from stl_load_errors sl, stv_tbl_perm sp
       where sl.tbl = sp.id);
    • Then run "select * from loadview where table_name = '<table>';" if you have any issues
    STL_LOAD_COMMITS / STL_FILE_SCAN / STL_S3CLIENT
    • Load times for specific files; confirms that a given file was read
    STL_S3CLIENT_ERROR
    • Information about specific S3 or file-transfer errors that happen during the load process
  • 14. COPY Command – Historical Information
    Look back to confirm the number of files and bytes loaded by each COPY statement:
      select substring(q.querytxt,1,40) as querytxt, s.n_files, size_mb, s.time_seconds,
             s.size_mb/decode(s.time_seconds,0,1,s.time_seconds) as mb_per_s
      from (select query, count(*) as n_files,
                   sum(transfer_size/(1024*1024)) as size_MB,
                   (max(end_Time) - min(start_Time))/(1000000) as time_seconds,
                   max(end_time) as end_time
            from stl_s3client
            where query > 0 and transfer_time > 0
            group by query) as s
      LEFT JOIN stl_Query as q on q.query = s.query
      order by mb_per_s desc
      limit 10;
  • 15. COPY Command – Historical Information cont.
      querytxt                                                     | n_files | size_mb | time_seconds | mb_per_s
      -------------------------------------------------------------+---------+---------+--------------+---------
      copy lineitem from 's3://tpc-h/100/lineitem.tbl.' credential |     603 |   22201 |         2390 |        9
      copy lineitem from 's3://tpc-h/1/lineitem.tbl.' credentials  |      34 |     192 |           21 |        8
      copy customer from 's3://tpc-h/100/customer.tbl.' credential |     152 |     750 |           85 |        8
      copy partsupp from 's3://tpc-h/100/partsupp.tbl.' credential |      82 |    2720 |          367 |        7
      COPY ANALYZE part                                            |      22 |      40 |            7 |        5
      copy orders from 's3://tpc-h/100/orders.tbl.' credentials '' |     152 |    4800 |         1035 |        4
      copy orders from 's3://tpc-h/1/orders.tbl.' credentials '' g |      34 |      32 |            7 |        4
      copy part from 's3://tpc-h/100/part.tbl.' credentials '' gzi |     202 |     400 |           95 |        4
      COPY ANALYZE supplier                                        |      34 |       0 |            3 |        0
      copy supplier from 's3://tpc-h/100/supplier.tbl.' credential |     102 |       0 |           10 |        0
      (10 rows)
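To see what the historical query computes, the aggregation can be replayed on a toy table. This sketch uses Python's built-in `sqlite3` as a stand-in for Redshift with hypothetical rows, replaces `DECODE` with an equivalent `CASE` (SQLite has no `DECODE`), and omits the join to STL_QUERY. It also deliberately sums bytes before converting to MB: in the slide's query `transfer_size/(1024*1024)` is evaluated per file with integer arithmetic, so every file under 1 MB contributes 0 MB.

```python
import sqlite3

# Hypothetical stand-in for STL_S3CLIENT with only the columns the query uses.
# start_time/end_time are in microseconds, as the division by 1000000 implies.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table stl_s3client ("
    "query integer, transfer_size integer, transfer_time integer,"
    " start_time integer, end_time integer)"
)
conn.executemany(
    "insert into stl_s3client values (?, ?, ?, ?, ?)",
    [
        (101, 300 * 1024 * 1024, 5, 0, 20_000_000),          # 300 MB file
        (101, 100 * 1024 * 1024, 5, 10_000_000, 40_000_000),  # 100 MB file
        (102, 600 * 1024, 5, 0, 1_000_000),                   # two 600 KB files
        (102, 600 * 1024, 5, 0, 2_000_000),
        (103, 1024 * 1024, 0, 0, 1_000_000),                  # transfer_time 0: filtered out
    ],
)

# Same aggregation as the slide, but bytes are summed before converting to MB.
rows = conn.execute(
    """
    select query, n_files, size_mb, time_seconds,
           size_mb / case when time_seconds = 0 then 1 else time_seconds end as mb_per_s
    from (select query, count(*) as n_files,
                 sum(transfer_size) / (1024 * 1024) as size_mb,
                 (max(end_time) - min(start_time)) / 1000000 as time_seconds
          from stl_s3client
          where query > 0 and transfer_time > 0
          group by query)
    order by mb_per_s desc
    """
).fetchall()
print(rows)  # [(101, 2, 400, 40, 10), (102, 2, 1, 2, 0)]
```

With the per-file division from the slide, query 102 would report 0 MB instead of 1 MB.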
  • 16. Vacuum
    Before vacuum
    • Inserted data goes to a "non-sorted" area at the end of the table
      • As this area grows, query times grow
    • Deleted data is "marked" in a special column
      • As the number of marked rows grows, query times grow
    What vacuum does
    • The non-sorted area gets sorted and integrated into the table
    • Deleted rows are removed and blocks are reorganized
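The logical effect of vacuum on one table can be modeled in a few lines: sort the unsorted tail, merge it into the sorted rows, and drop rows marked as deleted. This is only a model of the result under simplifying assumptions (whole rows held in lists, a hypothetical `(sort key, row id)` row shape); Redshift itself works on column blocks on each slice, and its internal algorithm is not shown here.

```python
import heapq
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, int]  # (sort key value, row id): a hypothetical row shape


def simulate_vacuum(sorted_region: List[Row], unsorted_region: Iterable[Row], deleted_ids: Set[int]) -> List[Row]:
    """Logical effect of VACUUM: sort the unsorted tail, merge it into the
    already-sorted rows, and drop rows that were marked as deleted."""
    tail = sorted(unsorted_region)
    return [row for row in heapq.merge(sorted_region, tail) if row[1] not in deleted_ids]


table = [("2013-03-01", 1), ("2013-04-01", 2), ("2013-06-01", 3)]  # sorted region
appended = [("2013-05-01", 5), ("2013-04-15", 4)]                  # inserted since last vacuum
print(simulate_vacuum(table, appended, deleted_ids={2}))
# [('2013-03-01', 1), ('2013-04-15', 4), ('2013-05-01', 5), ('2013-06-01', 3)]
```

In this example the appended rows sort into the middle of the table, so the merge touches existing data; when new rows carry the latest dates under a chronological sort key, they land at the end, which is why vacuum can skip more blocks.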
  • 17. Vacuum cont.
    • Vacuum takes advantage of the sort key and skips blocks that don't need to be modified
    • Vacuum is a maintenance-type operation
    • Only one vacuum can run at a time (cluster-wide)
    • More memory = faster vacuum: set wlm_query_slot_count to 4;
    • Keep track of vacuum progress (ETA): SVV_VACUUM_PROGRESS
    • Record vacuum details afterwards to help adjust the vacuum frequency: SVV_VACUUM_SUMMARY
    (Diagram: table blocks labeled March/2013 through June/2013 and an Unsorted region.)
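The practices on this slide can be strung together in one session script. A minimal sketch that only builds the SQL text, assuming hypothetical table names; the value of `wlm_query_slot_count` must not exceed the concurrency of the WLM queue the session runs in. While it runs, `select * from svv_vacuum_progress;` from a second session shows the progress estimate.

```python
from typing import Sequence


def vacuum_script(tables: Sequence[str], slots: int = 4) -> str:
    """Build a SQL script that vacuums tables one at a time with extra memory.

    Raising wlm_query_slot_count lets the session use several WLM slots'
    worth of memory; the value must not exceed the queue's concurrency.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    stmts = [f"set wlm_query_slot_count to {slots};"]
    # Only one vacuum runs at a time cluster-wide, so issue them sequentially.
    stmts += [f"vacuum {table};" for table in tables]
    stmts.append("set wlm_query_slot_count to 1;")  # back to the default of one slot
    # Review what each vacuum did, to tune how often it is scheduled.
    stmts.append("select * from svv_vacuum_summary;")
    return "\n".join(stmts)


print(vacuum_script(["orders_history", "lineitem"]))
```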
  • 18. Space Management
    Redshift has a single pool of space used for both tables and temporary segments.
    • Loads need 2.5 times the space of the data being loaded if the table has a sort key
    • Vacuum may need 2.5 times the size of the table
    Monitor the free space with:
    • The Performance tab in the console
    • CloudWatch alarms
    • SQL
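The 2.5x figures translate into a simple headroom check. A sketch, assuming the load and the vacuum run one after another (so the peak is the larger of the two; if they overlapped, you would add them) and assuming a load into an unsorted table needs about 1x its size, which the slide does not state. The function names are hypothetical.

```python
# Multipliers quoted on the slide; treat them as rough planning figures.
SORTED_LOAD_FACTOR = 2.5  # a load into a table with a sort key
VACUUM_FACTOR = 2.5       # a vacuum, relative to the table's size


def required_free_gb(load_gb: float = 0.0, sorted_target: bool = True, vacuum_table_gb: float = 0.0) -> float:
    """Peak free space to plan for when the load and the vacuum run one after another.

    Assumes a load into an unsorted table needs about 1x its size (not from the slide).
    """
    load_need = load_gb * (SORTED_LOAD_FACTOR if sorted_target else 1.0)
    vacuum_need = vacuum_table_gb * VACUUM_FACTOR
    return max(load_need, vacuum_need)


def has_headroom(free_gb: float, **kwargs) -> bool:
    """True if free_gb covers the estimate from required_free_gb(**kwargs)."""
    return free_gb >= required_free_gb(**kwargs)


print(required_free_gb(load_gb=100))                        # 250.0
print(required_free_gb(load_gb=100, vacuum_table_gb=300))   # 750.0
print(has_headroom(500, load_gb=100, vacuum_table_gb=300))  # False
```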
  • 19. Space Management cont.
    Table sizes
      select trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema,
             trim(a.name) as Table, b.mbytes, a.rows
      from (select db_id, id, name, sum(rows) as rows
            from stv_tbl_perm a
            group by db_id, id, name) as a
      join pg_class as pgc on pgc.oid = a.id
      join pg_namespace as pgn on pgn.oid = pgc.relnamespace
      join pg_database as pgdb on pgdb.oid = a.db_id
      join (select tbl, count(*) as mbytes
            from stv_blocklist
            group by tbl) b on a.id = b.tbl
      order by mbytes desc, a.db_id, a.name;
    Free space
      select sum(capacity)/1024 as capacity_gbytes,
             sum(used)/1024 as used_gbytes,
             (sum(capacity) - sum(used))/1024 as free_gbytes
      from stv_partitions
      where part_begin = 0;
    • Redshift allows you to resize your cluster up or down and across node types; the cluster stays online, with read-only access, during the resize.
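If free-space numbers feed a monitoring script, the free-space query's arithmetic can be reproduced client-side. In this sketch the `Partition` rows are hypothetical stand-ins for what the query would read from STV_PARTITIONS, where capacity and used are counted in 1 MB blocks; fetching the rows (for example with a PostgreSQL-compatible driver) is not shown.

```python
from typing import Iterable, NamedTuple, Tuple


class Partition(NamedTuple):
    part_begin: int  # partition offset; the query keeps only part_begin = 0
    capacity: int    # in 1 MB blocks
    used: int        # in 1 MB blocks


def free_space_gb(partitions: Iterable[Partition]) -> Tuple[int, int, int]:
    """Mirror the free-space query: (capacity_gbytes, used_gbytes, free_gbytes)."""
    rows = [p for p in partitions if p.part_begin == 0]
    capacity = sum(p.capacity for p in rows)
    used = sum(p.used for p in rows)
    # Integer division, like the SQL, so the three figures may not add up exactly.
    return capacity // 1024, used // 1024, (capacity - used) // 1024


sample = [
    Partition(0, 2048, 1024),
    Partition(0, 2048, 512),
    Partition(1000, 2048, 2000),  # non-zero offset: excluded, as in the SQL filter
]
print(free_space_gb(sample))  # (4, 1, 2)
```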
  • 20. Summary
    • Experiment to optimize your workflows
    • Various STL/STV tables hold most of the information needed for troubleshooting
    • Space management and the vacuum schedule should be considered during the implementation phase
  • 21. More information
    • COPY command: http://docs.aws.amazon.com/redshift/latest/dg/t_Loading_tables_with_the_COPY_command.html
    • Load troubleshooting: http://docs.aws.amazon.com/redshift/latest/dg/t_Troubleshooting_load_errors.html
    • Vacuum: http://docs.aws.amazon.com/redshift/latest/dg/t_Reclaiming_storage_space202.html
    • Unloading data: http://docs.aws.amazon.com/redshift/latest/dg/c_unloading_data.html
  • 23. Q&A

Editor's Notes

  1. Usual Progression: Steps that happen at a certain frequency (daily, hourly, weekly)
  2. If your data is updated in the short term, consider having a short-term version of the table for staging and a long-term version once the data is stable. Example: orders stay in a short-term table while in process and go to