Finding ways to make ETL loads faster is not always obvious. Moreover, tuning OLAP databases differs from tuning OLTP databases: some of the techniques learned through years of tuning EBS seem to have no effect on a BI ETL. This presentation discusses why this is the case, presents techniques for finding the bottlenecks in your BI ETL jobs, and shows how to tune the slow SQL statements behind them to speed up nightly ETL jobs. Attendees will learn the steps to monitor ETLs, capture problem SQL, and improve overall ETL performance.
1. Tuning ETLs for Better BI
Datavail is the largest provider of remote database administration in the U.S., with nearly 400 DBAs, 24/7 support, and onsite/offsite, onshore/offshore delivery.
Presented by Chuck Ezell
Performance, Tuning & Optimization Services, Datavail
2. www.datavail.com
Agenda
• OLTP and OLAP: an approach to tuning
• More than just data: peeling back the layers
• Components & Layers of Common ETLs
• Component Points of Failure
• Source, Transformation & Target Tuning Points
• High-Level Tuning Examples
• Monitoring ETL Activity (tools to make it easy)
• Recap & Questions
3. OLTP & OLAP
• OLTP: Online Transaction Processing
• Best for relational database transactions
• Emphasis on fast queries & relational data integrity
• Emphasis on highly normalized data
• Business process data (operational, workflows, etc.)
• Insert, update & delete activity
• OLAP: Online Analytical Processing
• Best for structured, sometimes redundant data
• Emphasis on the ability to aggregate & analyze
• Emphasis on denormalized data & fewer tables
• Data warehouse (trending, historical, analytical, etc.)
• Writes (loading) & reads (complex selects)
(Diagram: organization of data)
Most ETLs involve both OLTP and OLAP systems, but each is tuned differently.
4. The Essence of ETL
Extracting data from various sources, performing transformations, and loading the transformed data so it is ready for reporting.
(Diagram: Extraction → Transform → Load; Workflow / Task / Procedure)
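A minimal sketch of the three stages, assuming Python's built-in `csv` and `sqlite3` modules as stand-ins for a real source and target. The CSV content, the `sales_fact` table, and the `extract`/`transform`/`load` function names are hypothetical and not part of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; a real ETL might read EBS, cloud, or warehouse data.
SOURCE_CSV = """order_id,region,amount
1,east,100.50
2,west,
3,east,20.00
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types, default missing amounts to 0, normalize region names."""
    out = []
    for r in rows:
        amount = float(r["amount"]) if r["amount"] else 0.0
        out.append((int(r["order_id"]), r["region"].upper(), amount))
    return out

def load(conn, rows):
    """Load: write the transformed rows into a reporting table in one batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall())
```

Keeping each stage as a separate step mirrors how ETL workflows split work into tasks, which is what lets each stage be timed and tuned on its own.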
8. ETL Stage Component Points of Failure
(Diagram: Source(s) → Transform → Target(s))
• Source(s): data warehouse, files, cloud data, EBS data, flat files
• Transform: temp tables, lookup files, lookup tables
• Target(s): reporting data
Points of failure:
• Disk (I/O)
• Network latency
• Too much in-memory
• Limited RAM
• File system or cache fragmentation
• IOP & CPU bottlenecks
• Limited space
• Poor code
9. Source Bottlenecks & Tuning Ideas
• Source is often OLTP-structured data (but not always)
• A traditional tuning approach applies
• Factor in DML causing fragmentation & statistics problems
• Find poor plans and tune them in the traditional fashion
Tuning points:
• SQL code (better filtering, use of custom and vendor functions)
• Statistics
• Indexing & table fragmentation
• Conflicting sessions or processes during the ETL
• Offload or replicate data for better isolation
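To illustrate the "better filtering" point, here is a small sketch with SQLite standing in for the OLTP source; the `orders` table, its data, and the `OPEN` status are hypothetical. It compares pulling every row and filtering inside the ETL layer with pushing the filter into the source query, so only the needed rows are transferred.

```python
import sqlite3

# Hypothetical OLTP source table; SQLite stands in for the real source database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL)")
src.executemany(
    "INSERT INTO orders (status, amount) VALUES (?, ?)",
    [("CLOSED" if i % 10 else "OPEN", float(i)) for i in range(1, 10001)],
)
src.execute("CREATE INDEX orders_status_ix ON orders (status)")

def extract_then_filter(conn):
    """Anti-pattern: pull every row across, then filter inside the ETL tool."""
    rows = conn.execute("SELECT order_id, status, amount FROM orders").fetchall()
    return [r for r in rows if r[1] == "OPEN"], len(rows)

def filter_at_source(conn):
    """Better: let the source database filter (and use its index) before transfer."""
    rows = conn.execute(
        "SELECT order_id, status, amount FROM orders WHERE status = ?", ("OPEN",)
    ).fetchall()
    return rows, len(rows)

a, moved_a = extract_then_filter(src)
b, moved_b = filter_at_source(src)
print(len(a) == len(b), moved_a, moved_b)
```

Both approaches return the same rows, but the second moves a tenth of the data; over a real network link that difference is where the time goes.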
10. Transformation Bottlenecks & Tuning Ideas
• Depending on your ETL, a high percentage of the work could be in-memory
• RAM & temp space are critical (the more the better)
• Filesystem lookups can be slow (lack of indexing)
• Filesystems can become fragmented (depending on the OS)
Tuning points:
• SQL code (in-memory merges and joins)
• Statistics on temp tables can be a hindrance
• Indexing could slow a process down
• Lack of proper temp space will cause failures (watch the logs & ASM)
• Filesystem lookups perform better if they're converted to DB table lookups
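A rough sketch of converting a file lookup into a DB table lookup, assuming SQLite as the database. The lookup file contents, the `region_lkp` and `stg_sales` tables, and the `'UNKNOWN'` default are all hypothetical. The file is loaded once into a table with a primary-key index, and the transform then resolves every row with a single set-based join instead of searching the file per row.

```python
import csv
import io
import sqlite3

# Hypothetical lookup file (a region-code mapping) that an ETL might read from disk.
LOOKUP_FILE = "code,region_name\nE,East\nW,West\nN,North\n"

conn = sqlite3.connect(":memory:")

# Convert the file lookup into a database table; PRIMARY KEY gives it an index.
conn.execute("CREATE TABLE region_lkp (code TEXT PRIMARY KEY, region_name TEXT)")
conn.executemany(
    "INSERT INTO region_lkp VALUES (?, ?)",
    [(r["code"], r["region_name"]) for r in csv.DictReader(io.StringIO(LOOKUP_FILE))],
)

# Hypothetical staging rows produced earlier in the transform step.
conn.execute("CREATE TABLE stg_sales (sale_id INTEGER, code TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?, ?)",
    [(1, "E", 10.0), (2, "W", 5.0), (3, "X", 7.0)],
)

# One set-based join replaces a file scan per row; LEFT JOIN keeps unmatched rows.
result = conn.execute(
    """
    SELECT s.sale_id, COALESCE(l.region_name, 'UNKNOWN'), s.amount
    FROM stg_sales s
    LEFT JOIN region_lkp l ON l.code = s.code
    ORDER BY s.sale_id
    """
).fetchall()
print(result)
```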
11. Target Bottlenecks & Tuning Ideas
• OLAP write speeds and I/O are often overlooked
• Indexing and statistics can be problematic
• Loading could be done as single-row inserts in a loop
Tuning points:
• SQL code
• Inserts can benefit from the APPEND or APPEND_VALUES hint
• Inserts and updates could benefit from PARALLEL hinting
• Add statistics and indexing after loads, performed in parallel (split out tasks)
• Confirm async I/O settings in the OS and DB
• Use bulk loading where possible
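The bulk-loading and load-then-index ideas can be sketched with SQLite; the `fact_sales` table, row count, and index name are hypothetical. Oracle-specific features such as the APPEND/APPEND_VALUES hints, PARALLEL, and async I/O have no SQLite equivalent, so this only contrasts row-by-row inserts into an already-indexed table with a single batched insert followed by index creation and `ANALYZE`.

```python
import sqlite3
import time

# Hypothetical fact rows; SQLite stands in for the target warehouse.
ROWS = [(i, f"cust{i % 100}", i * 1.5) for i in range(20_000)]
DDL = "CREATE TABLE fact_sales (sale_id INTEGER, customer TEXT, amount REAL)"
INDEX_DDL = "CREATE INDEX fact_sales_cust_ix ON fact_sales (customer)"

def row_by_row(conn):
    """Anti-pattern: index exists during the load; one insert and commit per row."""
    conn.execute(INDEX_DDL)
    for row in ROWS:
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", row)
        conn.commit()

def bulk_then_index(conn):
    """Bulk-load in a single transaction, then build the index and gather statistics."""
    with conn:  # commits once when the block exits
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", ROWS)
    conn.execute(INDEX_DDL)
    conn.execute("ANALYZE")  # SQLite's rough analogue of gathering optimizer statistics

def run(load_fn):
    """Create an empty target table, run one load strategy, and time it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    start = time.perf_counter()
    load_fn(conn)
    return conn, time.perf_counter() - start

slow_conn, slow_secs = run(row_by_row)
fast_conn, fast_secs = run(bulk_then_index)
print(f"row-by-row: {slow_secs:.3f}s  bulk + index after: {fast_secs:.3f}s")
```

Actual timings depend on the machine and the database; the point is the shape of the load, not the specific numbers.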
13. What do we want from our ETLs?
Setting goals will affect our approach; however, there are two main goals for any and all ETLs: speed & consistency.
14. Common Problems Seen
• Doing too much in-memory
• Doing too much from the filesystem
• Not considering network or drive speeds
• Not considering system or session conflicts
• Not taking advantage of async features
• Not partitioning
• Not providing enough resources to the database
• Not reviewing workflow logs
• Not knowing the business purpose of the data or of each task
• Using hints too much or wrongly (ORDERED, CARDINALITY, PARALLEL)
15. Using ORDERED /*+ HINTs */
• ORDERED forces the table join order
• It instructs the optimizer to join tables in the order they appear in the FROM clause
• Use LEADING() instead, but only for investigation
Instead of: /*+ ORDERED */
Use: /*+ LEADING(FA_BOOK_TYPE_D, FIN_BUSN_LOCATION_D) */
16. Using CARDINALITY /*+ HINTs */
• CARDINALITY has been deprecated from 10g on
• Use OPT_ESTIMATE() instead
Wrong: CARDINALITY(5) or OPT_ESTIMATE(table tabname rows=5)
Actual row count: select count(*) from tabname;  Result = 35,754,849
Right (matches the actual count): CARDINALITY(35754849) or OPT_ESTIMATE(table tabname rows=35754849)
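The lesson of the slide is that the row estimate should come from a measured count, not a guess. The hypothetical helper `opt_estimate_hint` below makes that explicit by building the hint text from `COUNT(*)`; SQLite is used only to run the count, and the table name `tabname` follows the slide.

```python
import sqlite3

def opt_estimate_hint(conn, table):
    """Hypothetical helper: build an OPT_ESTIMATE hint from the table's real row count."""
    # Table names can't be bound as parameters, so only allow simple identifiers.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    (rows,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return f"/*+ OPT_ESTIMATE(table {table} rows={rows}) */"

# SQLite stands in for Oracle here, only to run the count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabname (id INTEGER)")
conn.executemany("INSERT INTO tabname VALUES (?)", [(i,) for i in range(12_345)])
print(opt_estimate_hint(conn, "tabname"))
```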
17. Using PARALLEL /*+ HINTs */
(Screenshot: original plan with full table scans)
PARALLEL(auto) or PARALLEL(32) could cause unpredictable runtimes.
18. Using PARALLEL /*+ HINTs */
(Screenshot: plan with parallelism introduced)
Time and cost are reduced, but parallel hinting also consumed CPU and didn't solve the plan problems.
19. Plan Improvement w/ Indexing
An NVL() function on the filter condition causes a full table scan, leading to long operations while filtering against almost 1 million rows.
20. Plan Improvement w/ Indexing
A function-based index immediately improved performance.
The index improved filtering performance by reducing read activity from 947k rows to 253 rows.
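The NVL() problem and its fix can be reproduced in miniature with SQLite, which supports indexes on expressions. This is a sketch under assumptions: the slides do not show the actual predicate, so the `location_d` table, `status` column, and `'N/A'` default are hypothetical, and `COALESCE(status, 'N/A')` stands in for Oracle's `NVL(status, 'N/A')`. In Oracle, the equivalent function-based index would be along the lines of `CREATE INDEX ... ON location_d (NVL(status, 'N/A'))`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical dimension table; SQLite stands in for Oracle.
conn.execute("CREATE TABLE location_d (loc_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO location_d (status) VALUES (?)",
    [(None if i % 3 else "ACTIVE",) for i in range(10_000)],
)

# Two-argument COALESCE() behaves like Oracle's NVL() here.
QUERY = "SELECT loc_id FROM location_d WHERE COALESCE(status, 'N/A') = 'ACTIVE'"

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings for a statement."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(QUERY)  # no index at all: full scan
conn.execute("CREATE INDEX location_d_status_ix ON location_d (status)")
with_plain_index = plan(QUERY)  # the function on the column still forces a scan
conn.execute("CREATE INDEX location_d_status_fbi ON location_d (COALESCE(status, 'N/A'))")
after = plan(QUERY)  # the expression index matches the predicate and is searched
print(before)
print(with_plain_index)
print(after)
```

As on the slides, the index only helps when the query's expression matches the indexed expression; a plain index on `status` does nothing for this predicate.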
21. Plan Improvement w/ Indexing
Parallel hints didn't reduce the long operations.
Parallel hinting could further improve performance on top of the indexing, but on its own it would only be a band-aid.
28. Monitor Tasks in DAC (Data Warehouse Administration Console)
DAC serves the following purposes:
- DAC is a metadata-driven administration and deployment tool
- Manages application configuration
- Manages the execution of warehouse loads
- Provides monitoring capabilities
30. In Closing
• OLTP and OLAP: an approach to tuning
• More than just data: peeling back the layers
• Components & Layers of Common ETLs
• Component Points of Failure
• Source, Transformation & Target Tuning Points
• High-Level Tuning Examples
• Monitoring ETL Activity (tools to make it easy)
• Recap & Questions
31. Questions?
Questions can also be sent to kelley.Weir@Datavail.com or chuck.Ezell@Datavail.com