Shaping Optimizer's Search Space

Shaping the Optimizer’s Search-Space
@MarkusWinand

The Optimizer’s Search-Space is Limited
“the query optimizer determines the most efficient execution plan*”
...most efficient? Out of what?
* http://docs.oracle.com/cd/E16655_01/server.121/e15857/pfgrf_perf_overview.htm#TGDBA94082

The Optimizer’s Search-Space is Limited
The Optimizer...
‣ Considers existing indexes only
➡ Other indexes might give even better performance
‣ Doesn’t de-obfuscate queries very well
➡ Writing it in simpler terms might improve performance
‣ Has built-in limitations
➡ Some theoretically possible plans are never considered

Bring the Best Plan into the Search-Space
... it determines the most efficient execution plan out of the remaining ones.
Before the optimizer can find the absolutely best plan, we must first make sure it is within these boundaries.

Two steps to get the absolutely best access path:
1. Maximize data-locality
‣ Plain old B-tree index is the #1 tool for that
‣ Partitions are greatly overrated
‣ Table clusters are slightly underrated
2. Write the query to exploit it
‣ Use explicit range conditions
‣ Use top-n based termination
‣ Exploit index order
It’s All About Matching Queries to Indexes
Thinking in Ordered Sets

Visualizing Indexes as Pyramids
Visualize Simplify

The Order of Multi-Column Indexes

Using Indexes:
Column Order Defines Row-Locality
Example: WHERE A > :a AND B = :b

Using Indexes:
Simple-man’s guidelines (best in ~97% of the cases):
‣ Conjunctive equality conditions are king
Column order doesn’t affect data-locality
➡ Put them first into the index and choose the column order so that other queries can use the index too.
‣ Conjunctive range conditions are tricky
Column order affects data-locality
➡ Put them after the equality columns. If there are multiple range conditions, put the most-selective first.
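A minimal sketch of these guidelines, applied to the example condition WHERE A > :a AND B = :b. The table name X is borrowed from the later index-only example; the index names are made up for illustration.

```sql
-- Hypothetical index names; table X with columns A and B is assumed.
-- Equality column first, range column second: all entries matching
-- B = :b AND A > :a sit next to each other in the index.
CREATE INDEX x_b_a ON x (b, a);

-- The reverse order scatters the B = :b entries across the whole A > :a
-- range, so B can only be checked while scanning that range.
-- CREATE INDEX x_a_b ON x (a, b);

SELECT *
  FROM x
 WHERE a > :a
   AND b = :b;
```

Which order actually wins also depends on the other queries that should share the index, as the first guideline says.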

Using Indexes:
Common mistakes:
‣ Arbitrary column order ☠ (bad)
“Just put all columns from the where-clause in the index”
➡ Works only for all-conjunctive all-equality searches
➡ Doesn’t make the index useful for other queries
‣ Most-selective first ☠ (bad)
“Order the columns according to the selectivity”
➡ Only valid to prioritize among multiple range conditions

Using Indexes:
Finding Bad Index Row-Locality
Index on (A, B): most efficient workaround
------------------------------------
| Id | Operation                   |
------------------------------------
|  0 | SELECT STATEMENT            |
|  1 | TABLE ACCESS BY INDEX ROWID |
|* 2 | INDEX SKIP SCAN             |
------------------------------------
Predicate Information:
------------------------------------
2 - access("B"=20 AND "A">25)
    filter("B"=20)
Index on (B, A): most efficient solution
------------------------------------
| Id | Operation                   |
------------------------------------
|  1 | TABLE ACCESS BY INDEX ROWID |
|* 2 | INDEX RANGE SCAN            |
------------------------------------
2 - access("B"=20 AND "A">25)
‣ Index filter predicates are a “bad smell”
‣ Index Skip Scan is a “bad smell”
‣ Index Fast Full Scan is a “bad smell”

Using Indexes:
Trailing-Columns to Avoid Table-Access
Example: SELECT C FROM X WHERE A > :a AND B = :b

Using Indexes:
Add all needed columns to the index to avoid table access: the so-called index-only scan.
‣ Useful to nullify a bad clustering factor
Consequently, not very useful if
➡ Clustering factor close to the number of table blocks or
➡ Selecting only a few rows
‣ A single non-indexed column breaks it
No matter where it is mentioned (SELECT, ORDER BY,...)
➡ All or nothing: no benefit from adding some SELECT columns to the index.
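As a sketch of the index-only scan for the example query SELECT C FROM X WHERE A > :a AND B = :b: the index name and the extra column D are assumptions used only for illustration.

```sql
-- Hypothetical index name. B (equality) and A (range) define the access
-- path; C is a trailing column added only so the table is never visited.
CREATE INDEX x_b_a_c ON x (b, a, c);

-- Every column the query mentions is in the index: index-only scan possible.
SELECT c
  FROM x
 WHERE a > :a
   AND b = :b;

-- One column outside the index (D is hypothetical) brings the table access
-- back for every selected row.
SELECT c, d
  FROM x
 WHERE a > :a
   AND b = :b;
```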

Using Indexes:
Common mistakes:
‣ Selecting unneeded columns* ☠ (bad)
SELECT * anybody? ORM-tools in use? Hooray!
➡ Adding many columns to many indexes is a no-no.
‣ Pushing too hard ☠ (bad)
➡ Index gets bigger, clustering factor (CF) gets worse
➡ Small benefit for low CF or if selecting a few rows only
➡ You’ll hit the hard limits (32 columns, 6398 bytes@8k)
* http://use-the-index-luke.com/blog/2013-08/its-not-about-the-star-stupid

Thinking in Ordered Sets ✓ ✓

Example:
List yesterday’s orders
CREATE TABLE orders ( 
..., 
order_dt DATE NOT NULL, 
... 
);
INSERT INTO orders 
(..., order_dt, ...) 
VALUES (..., sysdate , ...);
100k rows, evenly distributed over 4 weeks.

Example:
2. Write query using explicit range conditions
1. Lower bound: ORDER_DT >= TRUNC(sysdate-1)
2. Upper bound: ORDER_DT < TRUNC(sysdate)
----------------------------------------------
| Id | Operation                             |
----------------------------------------------
|* 1 | FILTER                                |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED   |
----------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TRUNC(SYSDATE@!)>TRUNC(SYSDATE@!-1))
3 - access("ORDER_DT">=TRUNC(SYSDATE@!-1)
    AND "ORDER_DT"<TRUNC(SYSDATE@!))
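Put together, the two bounds give a query along the following lines. This is a sketch: the index name is an assumption, since the slides only show the table definition.

```sql
-- Hypothetical index name; a plain B-tree index on the date column.
CREATE INDEX orders_dt ON orders (order_dt);

-- Yesterday as a half-open range: from midnight yesterday (inclusive)
-- up to midnight today (exclusive).
SELECT *
  FROM orders
 WHERE order_dt >= TRUNC(sysdate - 1)
   AND order_dt <  TRUNC(sysdate);
```

Both conditions compare the bare column against an expression, so they can serve as access predicates on the plain index.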

Example:
Common anti-pattern:
‣TRUNC(order_dt)=:yesterday ☠ (bad)
This is an “obfuscation” of the actual intention
➡ Requires function-based index
CREATE INDEX … (TRUNC(order_dt));
➡ Doesn’t support ordering by order_dt (index not ordered by that)
WHERE TRUNC(order_dt) = :yesterday
ORDER BY order_dt DESC;

Example:
List yesterday’s orders reverse chronologically
2. Write query - exploit index order
1. Lower & upper bounds:
ORDER_DT >= TRUNC(sysdate-1)
ORDER_DT < TRUNC(sysdate)
2. Order
ORDER BY ORDER_DT DESC
--------------------------------------
| Id | Operation                     |
--------------------------------------
|* 1 | FILTER                        |
|  2 | TABLE ACCESS BY INDEX ROWID   |
|* 3 | INDEX RANGE SCAN DESCENDING   |
--------------------------------------
3 - access("ORDER_DT"<TRUNC(SYSDATE@!)
    AND "ORDER_DT">=TRUNC(SYSDATE@!-1))
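A sketch of the query behind this plan, assuming a plain index on ORDER_DT exists:

```sql
-- Yesterday's range, read in reverse chronological order.
-- The INDEX RANGE SCAN DESCENDING delivers the rows already in the
-- requested order, which is why the plan contains no SORT step.
SELECT *
  FROM orders
 WHERE order_dt >= TRUNC(sysdate - 1)
   AND order_dt <  TRUNC(sysdate)
 ORDER BY order_dt DESC;
```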

Example:
TRUNC(ORDER_DT) = TRUNC(sysdate)-1
2. Order 
ORDER BY ORDER_DT DESC

Example:
TRUNC(ORDER_DT) = TRUNC(sysdate)-1
2. Order
----------------------------------------------
| Id | Operation                             |
----------------------------------------------
|  1 | SORT ORDER BY                         |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED   |
----------------------------------------------
3 - access("ORDERS"."SYS_NC00004$"=TRUNC(SYSDATE@!-1))
Tradeoff: CPU, Memory, IO
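For comparison, a sketch of the TRUNC variant that leads to the SORT ORDER BY above. The index name is made up; the function-based index itself is the one the anti-pattern slide calls for.

```sql
-- Hypothetical index name; function-based index on the truncated date.
CREATE INDEX orders_trunc_dt ON orders (TRUNC(order_dt));

-- The index is ordered by TRUNC(order_dt) only. All of yesterday's rows
-- share the same index key, so the ORDER BY needs an explicit sort.
SELECT *
  FROM orders
 WHERE TRUNC(order_dt) = TRUNC(sysdate) - 1
 ORDER BY order_dt DESC;
```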

Example:
List orders from last 24 hours
1. Data-locality for the TRUNC variant

Example:
1. Lower bound: ORDER_DT > sysdate - 1
2. Upper bound: none (unbounded)
To use the FBI, Oracle adds (since 11.2.0.2*)
TRUNC(ORDER_DT)>=TRUNC(sysdate-1)
-------------------------------------------------
| Id | Operation                                |
-------------------------------------------------
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED      |
|* 2 | INDEX RANGE SCAN on TRUNC(ORDER_DT)      |
-------------------------------------------------
1 - filter("ORDER_DT">SYSDATE@!-1)
2 - access("ORDERS"."SYS_NC00004$">=TRUNC(SYSDATE@!-1))
* http://www.sqlfail.com/2014/05/05/oracle-can-now-use-function-based-indexes-in-queries-without-functions/

Example:
1. Maximize data-locality using straight index

Example:
1. Lower bound: ORDER_DT > sysdate - 1
2. Upper bound: none (unbounded)
--------------------------------------------
| Id | Operation                           |
--------------------------------------------
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED |
--------------------------------------------
2 - access("ORDER_DT">SYSDATE@!-1)

Example:
Most efficient solution (straight index):
--------------------------------------------
| Id | Operation                           |
--------------------------------------------
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED |
--------------------------------------------
2 - access("ORDER_DT">SYSDATE@!-1)
Most efficient workaround (function-based index):
--------------------------------------------
| Id | Operation                           |
--------------------------------------------
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED |
--------------------------------------------
1 - filter("ORDER_DT">SYSDATE@!-1)
2 - access("ORDERS"."SYS_NC00004$">=TRUNC(SYSDATE@!-1))

Thinking in Ordered Sets ✓ ✓ ✓ ✓

Example:
List 10 Most Recent Orders

Example:
1. Lower bound...? After 10 rows...???
2. Upper bound? sysdate? Unbounded!

Example:
2. Write query using top-n based termination
3. Start with: most recent 
4. Stop after: 10 rows 
FETCH FIRST 10 ROWS ONLY (since 12c)
----------------------------------------------------------
| Id | Operation                    | A-Rows | Buffers |
----------------------------------------------------------
|  0 | SELECT STATEMENT             |     10 |       8 |
|* 1 | VIEW                         |     10 |       8 |
|* 2 | WINDOW NOSORT STOPKEY        |     10 |       8 |
|  3 | TABLE ACCESS BY INDEX ROWID  |     11 |       8 |
|  4 | INDEX FULL SCAN DESCENDING   |     11 |       3 |
----------------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
2 - filter(ROW_NUMBER() OVER (ORDER BY ORDER_DT DESC)<=10)
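A sketch of the top-n query that produces a plan of this shape, assuming an index on ORDER_DT and Oracle 12c or later:

```sql
-- Start with the most recent order, stop after 10 rows.
-- No lower bound is needed: the row limit terminates the index scan.
SELECT *
  FROM orders
 ORDER BY order_dt DESC
 FETCH FIRST 10 ROWS ONLY;
```

The filter on ROW_NUMBER() in the predicate section shows that Oracle rewrites the row-limiting clause into a window function internally.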

Window-Functions for Top-N Termination

SELECT *
  FROM (
       SELECT orders.*
            , ROW_NUMBER() OVER (
                 ORDER BY order_dt DESC
              ) rn
         FROM orders
       )
 WHERE rn <= 10
Select 10 rows

SELECT *
  FROM (
       SELECT orders.*
            , DENSE_RANK() OVER (
                 ORDER BY TRUNC(order_dt) DESC
              ) rn
         FROM orders
       )
 WHERE rn <= 1
Select 1 group
Useful to abort on edges

DENSE_RANK
---------------------------------------------------------------------------
| Id | Operation                    | E-Rows | A-Rows | Buffers | Reads |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |        |   2057 |     695 |   695 |
|  1 | SORT ORDER BY                |   100K |   2057 |     695 |   695 |
|* 2 | VIEW                         |   100K |   2057 |     695 |   695 |
|* 3 | WINDOW NOSORT STOPKEY        |   100K |   2057 |     695 |   695 |
|  4 | TABLE ACCESS BY INDEX ROWID  |   100K |   2058 |     695 |   695 |
|  5 | INDEX FULL SCAN DESCENDING   |   100K |   2058 |       8 |     8 |
---------------------------------------------------------------------------

SELECT *
  FROM orders
 WHERE TRUNC(order_dt)
     = (SELECT TRUNC(MAX(order_dt))
          FROM orders
       )
 ORDER BY order_dt;
SUB-SELECT
---------------------------------------------------------------------------------
| Id | Operation                            | E-Rows | A-Rows | Buffers | Reads |
---------------------------------------------------------------------------------
|  1 | SORT ORDER BY                        |   3448 |   2057 |    1038 |   694 |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED  |   3448 |   2057 |    1038 |   694 |
|* 3 | INDEX RANGE SCAN                     |   3448 |   2057 |      10 |     8 |
|  4 | SORT AGGREGATE                       |      1 |      1 |       2 |     2 |
|  5 | INDEX FULL SCAN (MIN/MAX)            |      1 |      1 |       2 |     2 |
---------------------------------------------------------------------------------
DENSE_RANK (for comparison)
---------------------------------------------------------------------------------
|  1 | SORT ORDER BY                        |   100K |   2057 |     695 |   695 |
|* 2 | VIEW                                 |   100K |   2057 |     695 |   695 |
---------------------------------------------------------------------------------

Top-N vs. Max()-Subquery
Common mistakes:
‣ Breaking ties with sub-queries ☠ (bad)
WHERE (a, b)= (select max(a), max(b) ...)
➡ max() values coming from different rows...
➡ No rows selected.
‣ Selecting Nth largest ☠ (bad)
WHERE X < (SELECT MAX()... 
WHERE X < (SELECT MAX()...))
WHERE (N-1) = (SELECT COUNT(DISTINCT(DT))...

Two steps to get the absolutely best execution plan:
Thinking in Ordered Sets ✓ ✓ ✓ ✓ ✓

Example:
List next 10 orders

Example:
List next 10 orders
2. Use explicit range condition & top-n abort
1. Lower bound: unbounded (top-n)
2. Upper bound: where we stopped: WHERE ORDER_DT < :prev_dt
3. ORDER BY ORDER_DT DESC
4. FETCH FIRST 10 ROWS ONLY
What about ties?
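The four steps can be sketched as one query. The bind variable :prev_dt is taken from the slide and holds the ORDER_DT of the last row on the previous page; an index on ORDER_DT is assumed.

```sql
-- Continue where the previous page stopped, newest first, 10 rows.
SELECT *
  FROM orders
 WHERE order_dt < :prev_dt
 ORDER BY order_dt DESC
 FETCH FIRST 10 ROWS ONLY;

-- Caveat (the "ties" question): if several orders share the ORDER_DT
-- value of :prev_dt and not all of them fit on the previous page,
-- the remaining ones are skipped by the strict < comparison.
```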

Explicit range conditions: the general case
Example:
List next 10 orders

Example:
List next 10 orders
1. Use definite sort order
2. Use Row-Value filter to remove what we have seen before (SQL:92)
3. Hit Enter
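In standard SQL the general case might look like the sketch below. ORDER_ID is a hypothetical unique column added as a tie-breaker and :prev_id is its value on the last row seen; neither appears in the slides. The row-value comparison works in databases that implement it, such as PostgreSQL.

```sql
-- 1. Definite sort order: ORDER_ID (hypothetical, unique) breaks ties.
-- 2. Row-value filter: skip everything up to and including the last
--    row of the previous page.
SELECT *
  FROM orders
 WHERE (order_dt, order_id) < (:prev_dt, :prev_id)
 ORDER BY order_dt DESC, order_id DESC
 FETCH FIRST 10 ROWS ONLY;
```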

Example:
List next 10 orders
(x,y) = (a,b) ✓
(x,y) IN ((a,b),(c,d)) ✓
(x,y) < (a,b) ✗ (Oracle limitation)
(x,y) > (a,b) ✗ (Oracle limitation)

Example:
List next 10 orders
Oracle limitation
Two semantically equivalent workarounds:
‣ X <= A AND NOT (X = A AND Y >= B)
‣ (X < A) OR (X = A AND Y < B) ☠ No proper index use*
* http://use-the-index-luke.com/sql/partial-results/fetch-next-page#sb-equivalent-logic
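Applied to the orders example, the first workaround could be written as follows. ORDER_ID, :prev_id, and the index name are assumptions made for this sketch.

```sql
-- Hypothetical index matching the definite sort order.
CREATE INDEX orders_dt_id ON orders (order_dt, order_id);

-- X <= A gives the optimizer a plain range condition on the leading
-- index column; the NOT (...) part removes the rows already seen.
SELECT *
  FROM orders
 WHERE order_dt <= :prev_dt
   AND NOT (order_dt = :prev_dt AND order_id >= :prev_id)
 ORDER BY order_dt DESC, order_id DESC
 FETCH FIRST 10 ROWS ONLY;
```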

Using OFFSET to fetch next rows
‣ After SQL:2008 added FETCH FIRST...ROWS ONLY, SQL:2011 introduced OFFSET to skip rows.
‣ Rows can be skipped with the ROWNUM pseudo column too (ROWNUM > :x)
‣ ROW_NUMBER() can do the trick too.
It doesn’t matter how you write it, ...

Using OFFSET to fetch next rows
OFFSET = SLEEP
The bigger the number, the slower the execution.
Even worse: it eats up resources and yields drifting results.
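For reference, a sketch of the OFFSET form in Oracle 12c syntax. The page size and offset are arbitrary values chosen for illustration.

```sql
-- Third page of 10 rows. The database still has to fetch the first
-- 20 rows in order before it can discard them.
SELECT *
  FROM orders
 ORDER BY order_dt DESC
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;

-- Drift: if new orders arrive between two page requests, every row
-- shifts down, so rows from the previous page show up again.
```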

Thinking in Ordered Sets ✓ ✓ ✓ ✓ ✓ ✓

About @MarkusWinand
‣Training for Developers
‣ SQL Performance (Indexing)
‣ Modern SQL
‣ On-Site or Online
‣SQL Tuning
‣ Index-Redesign
‣ Query Improvements
‣ On-Site or Online
http://winand.at/

About @MarkusWinand
@ModernSQL
http://modern-sql.com
@SQLPerfTips
http://use-the-index-luke.com
