MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013
Slide 1
Advanced query optimizer tuning and analysis
Sergei Petrunia
Timour Katchaounov
Monty Program Ab
MySQL Conference And Expo 2013
Slide 2
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Slide 3
Is there a problem with the query optimizer?
• Database performance is affected by many factors
• One of them is the query optimizer
• Is my performance problem caused by the optimizer?
Slide 4
Signs that there is a query optimizer problem
• Some (not all) queries are slow
• A query seems to run longer than it ought to
– And examines more records than it ought to
• Usually, the query remains slow regardless of other activity on the server
Slide 5
Catching slow queries, the old ways
● Watch the Slow query log
– Percona Server/MariaDB:
--log_slow_verbosity=query_plan
# Thread_id: 1 Schema: dbt3sf10 QC_hit: No
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
SET timestamp=1333385770;
select * from customer where c_acctbal < -1000;
• Run SHOW PROCESSLIST periodically
– Run pt-query-digest on the log
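The slow-log fields shown above lend themselves to quick scripting when pt-query-digest is not at hand. A minimal sketch (Python purely for illustration; the `flag_entries` helper and the 1000x ratio threshold are my own choices, not part of any standard tool):

```python
import re

# Parse "# Query_time: ... Rows_sent: ... Rows_examined: ..." comment lines
# from a slow-log excerpt and flag queries that examine far more rows than
# they return -- a typical sign of an optimizer/indexing problem.
ENTRY_RE = re.compile(
    r"# Query_time: (?P<t>[\d.]+) .*?"
    r"Rows_sent: (?P<sent>\d+)\s+Rows_examined: (?P<exam>\d+)"
)

def flag_entries(log_text, ratio=1000):
    """Yield (query_time, rows_sent, rows_examined) for suspicious entries."""
    for m in ENTRY_RE.finditer(log_text):
        sent, exam = int(m.group("sent")), int(m.group("exam"))
        # max(sent, 1) avoids division issues: a query that returns 0 rows
        # but scans millions is still suspicious.
        if exam > ratio * max(sent, 1):
            yield float(m.group("t")), sent, exam

log = """\
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000
SET timestamp=1333385770;
select * from customer where c_acctbal < -1000;
"""
print(list(flag_entries(log)))  # [(2.452373, 0, 1500000)]
```

The sample entry is flagged because it examines 1.5M rows to send 0.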
Slide 6
The new way: SHOW PROCESSLIST + SHOW EXPLAIN
• Available in MariaDB 10.0+
• Displays EXPLAIN of a running statement
MariaDB> show processlist;
+--+----+---------+-------+-------+----+------------+-------------------------...
|Id|User|Host |db |Command|Time|State |Info
+--+----+---------+-------+-------+----+------------+-------------------------...
| 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ...
| 2|root|localhost|dbt3sf1|Query | 0|init |show processlist
+--+----+---------+-------+-------+----+------------+-------------------------...
MariaDB> show explain for 1;
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where|
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
MariaDB [dbt3sf1]> show warnings;
+-----+----+-----------------------------------------------------------------+
|Level|Code|Message |
+-----+----+-----------------------------------------------------------------+
|Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995|
+-----+----+-----------------------------------------------------------------+
Slide 7
SHOW EXPLAIN usage
● Intended usage
– SHOW PROCESSLIST ...
– SHOW EXPLAIN FOR ...
● Why not just run EXPLAIN again?
– Difficult to replicate setups
● Temporary tables
● Optimizer settings
● Storage engine's index statistics
● ...
– No uncertainty about whether you're looking at
the same query plan or not.
Slide 9
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
• Modified Q18 from DBT3
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > ?
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey,
o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
• App executes Q18 many times with
? = 550000, 500000, 400000, ...
Slide 10
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Find candidate slow queries
● Simple tests: select_full_join > 0,
created_tmp_disk_tables > 0, etc
● Complex conditions:
max execution time > X sec OR
min/max time vary a lot:
select max_timer_wait/avg_timer_wait as max_ratio,
avg_timer_wait/min_timer_wait as min_ratio
from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2\G
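The same filter can be sanity-checked outside the server. This Python mirror of the query above is illustrative only (`is_candidate` is a made-up name), using the fact that performance_schema timer columns are in picoseconds:

```python
# Timer columns in performance_schema are in picoseconds: 1 second = 10**12.
SEC = 10**12

def is_candidate(min_wait, avg_wait, max_wait):
    """Flag a digest whose execution time is long or varies a lot."""
    return (max_wait > SEC                  # slowest run took over 1 second
            or max_wait / avg_wait > 2      # max much larger than average
            or avg_wait / min_wait > 2)     # average much larger than minimum

# Figures from the Q18 digest shown on the following slide:
print(is_candidate(3914209000, 1083919449000, 3204044053000))  # True
# A uniformly fast digest is not flagged:
print(is_candidate(SEC // 2, SEC // 2, SEC // 2))              # False
```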
Slide 11
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
*************************** 5. row ***************************
DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b
DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` ,
`o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE
`o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY
`c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice`
DESC , `o_orderdate` LIMIT ?
COUNT_STAR: 3
SUM_TIMER_WAIT: 3251758347000
MIN_TIMER_WAIT: 3914209000 → 0.0039 sec
AVG_TIMER_WAIT: 1083919449000
MAX_TIMER_WAIT: 3204044053000 → 3.2 sec
SUM_LOCK_TIME: 555000000
SUM_ROWS_SENT: 25
SUM_ROWS_EXAMINED: 0
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 3
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_RANGE: 3
SUM_SELECT_SCAN: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 25
SUM_SORT_SCAN: 3
SUM_NO_INDEX_USED: 0
SUM_NO_GOOD_INDEX_USED: 0
FIRST_SEEN: 1970-01-01 03:38:27
LAST_SEEN: 1970-01-01 03:38:43
max_ratio: 2.9560
min_ratio: 276.9192
High variance of execution time
Slide 12
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Check the actual queries and constants
● The events_statements_history table
select timer_wait/1000000000000 as exec_time, sql_text
from events_statements_history
where digest in
(select digest from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2)
order by timer_wait;
Slide 13
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
+-----------+-----------------------------------------------------------------------------------+
| exec_time | sql_text |
+-----------+-----------------------------------------------------------------------------------+
| 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 |
| 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 |
| 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 |
+-----------+-----------------------------------------------------------------------------------+
Observation:
orders.o_totalprice > ? is less and less selective
Slide 14
Actions after finding the slow query
• Bad query plan
– Rewrite the query
– Force a good query plan
• Bad optimizer settings
– Do tuning
• Query is inherently complex
– Don't waste time on it
– Look for other solutions.
Slide 15
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Slide 16
Consider a simple select
• 15M rows were scanned, 19 rows in output
• Query plan seems inefficient
– (note: this logic doesn't directly apply to group/order by queries).
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
● Check the query plan:
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
● Run the query:
19 rows in set (7.65 sec)
Slide 17
Query plan analysis
• Entire table is scanned
• WHERE condition checked after records are read
– Not used to limit #examined rows.
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
Slide 18
Let's add an index
• Outcome
– Down to reading 300K rows
– Still, 300K >> 19 rows.
alter table orders add key i_o_orderdate (o_orderdate);
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
● Query time:
19 rows in set (0.76 sec)
Slide 19
Finding out which indexes to add
Check selectivity of the conditions that will use the index:
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
● index (o_orderdate):
select count(*) from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
306322 rows
● index (o_clerk):
select count(*) from orders where o_clerk='Clerk#000009506';
1507 rows
Slide 20
Try adding composite indexes
● index (o_clerk, o_orderdate): Bingo! 100% efficiency
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
● index (o_orderdate, o_clerk): Much worse!
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
• If the condition uses multiple columns, a composite index will be most efficient
• Order of columns matters
– The explanation why is outside the scope of this tutorial; it was covered in last year's tutorial
Slide 21
Conditions must be in SARGable form
• The condition must represent a range
• It must have a form that is recognized by the optimizer
o_orderDate BETWEEN '1992-06-01' and '1992-06-30' (recognized)
year(o_orderDate)=1992 and month(o_orderdate)=6 (not recognized: function on column)
TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06') (not recognized)
o_clerk='Clerk#000009506' (recognized)
o_clerk LIKE 'Clerk#000009506' (recognized: no leading wildcard)
o_clerk LIKE '%Clerk#000009506%' (not recognized: leading wildcard)
column IN (1,10,15,21, ...) (recognized)
(col1, col2) IN ( (1,1), (2,2), (3,3), …) (not recognized as a range)
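The effect of a non-SARGable condition is easy to reproduce with any B-tree engine. A sketch using SQLite as a stand-in (the principle carries over to MySQL/MariaDB; `plan()` is a hypothetical helper):

```python
import sqlite3

# Wrapping the indexed column in a function hides the range from the
# optimizer and forces a full scan; the bare-column comparison uses the index.
con = sqlite3.connect(":memory:")
con.execute("create table orders (o_orderdate text, o_clerk text)")
con.execute("create index i_o_orderdate on orders (o_orderdate)")

def plan(where):
    """Return the EXPLAIN QUERY PLAN detail text for a WHERE clause."""
    rows = con.execute(
        "explain query plan select * from orders where " + where).fetchall()
    return " ".join(r[-1] for r in rows)  # last column is the plan detail

# SARGable: bare column compared to constants -> index range search.
print(plan("o_orderdate between '1992-06-06' and '1992-07-06'"))
# Not SARGable: function applied to the column -> full table scan.
print(plan("substr(o_orderdate, 1, 4) = '1992'"))
```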
Slide 22
New in MySQL-5.6: optimizer_trace
● Lets you see the ranges
set optimizer_trace=1;
explain select * from orders
where o_orderDATE between '1992-06-01' and '1992-07-03' and
o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04')
select * from information_schema.optimizer_trace\G
● Will print a big JSON struct
● Search for range_scan_alternatives.
Slide 23
New in MySQL-5.6: optimizer_trace
...
"range_scan_alternatives": [
{
"index": "i_o_orderdate",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 319082,
"cost": 382900,
"chosen": true
},
{
"index": "i_o_date_clerk",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 406336,
"cost": 487605,
"chosen": false,
"cause": "cost"
}
],
...
● The considered ranges are shown in the range_scan_alternatives section
● This is actually the original use case of optimizer_trace
● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
● Still, very useful.
Slide 24
Source of #rows estimates for range
select * from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
Where does the rows=306322 estimate come from?
• From the “records_in_range” estimate
• Done by diving into the index
• Usually fairly accurate
• Not affected by ANALYZE TABLE.
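Conceptually, records_in_range dives to both ends of the range in the sorted index and measures the distance between the two positions. A toy model of that idea (a sorted Python list standing in for the B-tree; not InnoDB's actual code):

```python
from bisect import bisect_left, bisect_right

# A synthetic "index" of date keys: 12 months with 28 days each = 336 rows.
index = sorted("1992-%02d-%02d" % (m, d)
               for m in range(1, 13) for d in range(1, 29))

def records_in_range(keys, lo, hi):
    """Estimate #rows with lo <= key <= hi by diving to both range ends."""
    return bisect_right(keys, hi) - bisect_left(keys, lo)

# June 6 .. July 6 inclusive: 23 June keys + 6 July keys = 29.
print(records_in_range(index, "1992-06-06", "1992-07-06"))  # 29
```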
Slide 25
Simple selects: conclusions
• Efficiency == “#rows_scanned is close to #rows_returned”
• Indexes and WHERE conditions reduce #rows scanned
• Index estimates are usually accurate
• Multi-column indexes
– “handle” conditions on multiple columns
– Order of columns in the index matters
• optimizer_trace lets you view the ranges
– But misrepresents ranges over multi-column indexes.
Slide 26
Now, we will skip some topics
One can also speed up simple selects with:
● the index_merge access method
● the index access method
● Index Condition Pushdown
We don't have time for these now; check out last year's tutorial.
Slide 27
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Slide 28
A simple join
select * from customer, orders where c_custkey=o_custkey
• “Customers with their orders”
Slide 29
Execution: Nested Loops join
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• Complexity:
– Scans table customer
– For each record in customer, scans table orders
• Is this ok?
Slide 30
Execution: Nested loops join (2)
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
Slide 31
Execution: Nested loops join (3)
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
rows=148749: rows to read from customer
rows=1493631: rows to read from orders, for each customer
Using where: the c_custkey=o_custkey check
Slide 32
Execution: Nested loops join (4)
select * from customer, orders where c_custkey=o_custkey
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
• Scan a 1,493,631-row table 148,749 times
– Consider 1,493,631 * 148,749 row combinations
• Is this query inherently complex?
– We know each customer has his own orders
– size(customer x orders) = size(orders)
– Lower bound is 1,493,631 + 148,749 + costs to match customer<->order.
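The row-combination arithmetic can be made concrete with a scaled-down model of the same nested-loops plan (a hypothetical 100-customer / 1000-order dataset, not the DBT3 data):

```python
# Plain nested-loops join: examines |customer| * |orders| row combinations,
# even though only |orders| of them match.
customers = [{"c_custkey": c} for c in range(100)]
orders = [{"o_custkey": o % 100, "o_orderkey": o} for o in range(1000)]

examined = 0
result = []
for c in customers:                 # outer loop: full scan of customer
    for o in orders:                # inner loop: full scan of orders, per customer
        examined += 1
        if c["c_custkey"] == o["o_custkey"]:
            result.append((c, o))

print(examined)      # 100000 combinations examined (100 * 1000)
print(len(result))   # 1000 matching rows produced
```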
Slide 33
Using index for join: ref access
alter table orders add index i_o_custkey(o_custkey)
select * from customer, orders where c_custkey=o_custkey
Slide 34
ref access - analysis
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| |
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
select * from customer, orders where c_custkey=o_custkey
● One ref lookup scans 7 rows.
● In total: 7 * 148,749=1,041,243 rows
– `orders` has 1.4M rows
– no redundant reads from `orders`
● The whole query plan
– Reads all customers
– Reads 1M orders (of 1.4M)
● Efficient!
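The ref-access behavior can be modeled the same way: with a lookup structure standing in for i_o_custkey, each lookup touches only that customer's orders (a dict-based sketch, not the server's B-tree):

```python
from collections import defaultdict

# Same toy dataset as before; now build an "index" on o_custkey first.
customers = [{"c_custkey": c} for c in range(100)]
orders = [{"o_custkey": o % 100, "o_orderkey": o} for o in range(1000)]

i_o_custkey = defaultdict(list)
for o in orders:
    i_o_custkey[o["o_custkey"]].append(o)   # build the lookup structure

examined = 0
result = []
for c in customers:                          # still scans customer
    for o in i_o_custkey[c["c_custkey"]]:    # ref access: ~10 rows per lookup
        examined += 1
        result.append((c, o))

print(examined, len(result))  # every examined row is a match
```

Compared with the nested-loops model, examined rows drop from |customer| * |orders| to roughly |orders|.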
Slide 35
Conditions that can be used for ref access
● Can use equalities
– tbl.key=other_table.col
– tbl.key=const
– tbl.key IS NULL
● For multipart keys, will use the largest prefix
– keypart1=... AND keypart2=... AND keypartK=...
Slide 36
Conditions that can't be used for ref access
● Doesn't work for non-equalities
t1.key BETWEEN t2.col1 AND t2.col2
● Doesn't work for OR-ed equalities
t1.key=t2.col1 OR t1.key=t2.col2
– Except for ref_or_null
t1.key=... OR t1.key IS NULL
● Doesn't “combine” ref and range access
– t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
– t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
Slide 37
Is ref always efficient?
● Efficient, if the column has many different values (good)
– Best case: a unique index (eq_ref)
● A few different values: not useful (bad)
● Skewed distribution: depends on which part of the distribution the join touches
Slide 38
ref access estimates - index statistics
• How many rows will match tbl.key_column = $value, for an arbitrary $value?
• Index statistics
show keys from orders where key_name='i_o_custkey'
*************************** 1. row ***************
Table: orders
Non_unique: 1
Key_name: i_o_custkey
Seq_in_index: 1
Column_name: o_custkey
Collation: A
Cardinality: 214462
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
show table status like 'orders'
*************************** 1. row ****
Name: orders
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1495152
Avg_row_length: 133
Data_length: 199966720
Max_data_length: 0
Index_length: 122421248
Data_free: 6291456
...
average = Rows /Cardinality = 1495152 / 214462 = 6.97.
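That average is the whole computation, spelled out with the figures from the two SHOW outputs above:

```python
# rows-per-key guess for ref access = table rows / index cardinality
rows = 1495152          # SHOW TABLE STATUS: Rows for `orders`
cardinality = 214462    # SHOW KEYS: Cardinality of i_o_custkey

avg_rows_per_key = rows / cardinality
print(round(avg_rows_per_key, 2))  # 6.97, shown as rows=7 in EXPLAIN
```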
Slide 39
ref access – conclusions
● Based on t.key=... equality conditions
● Can make joins very efficient
● Relies on index statistics for estimates.
Slide 40
Optimizer statistics
● MySQL/Percona Server
– Index statistics
– Persistent/transient InnoDB stats
● MariaDB
– Index statistics, persistent/transient
● Same as Percona Server (via XtraDB)
– Persistent,
engine-independent,
index-independent statistics.
Slide 41
Index statistics
● Cardinality lets you calculate a table-wide average #rows-per-key-prefix
● It is a statistical value (inexact)
● Exact collection procedure depends on the
storage engine
– InnoDB – random sampling
– MyISAM – index scan
– Engine-independent – index scan.
Slide 42
Index statistics in MySQL 5.6
● Sample [8] random index leaf pages
● Table statistics (stored)
– rows - estimated number of rows in a table
– Other stats not used by optimizer
● Index statistics (stored)
– fields - #fields in the index
– rows_per_key - rows per 1 key value, per prefix fields
([1 column value], [2 columns value], [3 columns value], …)
– Other stats not used by optimizer.
Slide 43
Index statistics updates
● Statistics updated when:
– ANALYZE TABLE tbl_name [, tbl_name] …
– SHOW TABLE STATUS, SHOW INDEX
– Access to INFORMATION_SCHEMA.[TABLES|
STATISTICS]
– A table is opened for the first time
(after server restart)
– A table has changed >10%
– When InnoDB Monitor is turned ON.
Slide 44
Displaying optimizer statistics
● MySQL 5.5, MariaDB 5.3, and older
– Issue SQL statements to count rows/keys
– Indirectly, look at EXPLAIN for simple queries
● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
– information_schema.[innodb_index_stats, innodb_table_stats]
– Read-only, always visible
● MySQL 5.6
– mysql.[innodb_index_stats, innodb_table_stats]
– User updateable
– Only available if innodb_analyze_is_persistent=ON
● MariaDB 10.0
– Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
– User updateable
– + current XtraDB mechanisms.
Slide 46
Controlling statistics (MySQL 5.6)
● Persistent and user-updateable InnoDB statistics
– innodb_analyze_is_persistent = ON,
– updated manually by ANALYZE TABLE or
– automatically by innodb_stats_auto_recalc = ON
● Control the precision of sampling [default 8]
– innodb_stats_persistent_sample_pages,
– innodb_stats_transient_sample_pages
● No new statistics compared to older versions.
Slide 47
Controlling statistics (MariaDB 10.0)
Current XtraDB index statistics, plus:
● Precise
● Additional statistics per column (even when there is no
index):
– min_value, max_value: minimum/maximum value per
column
– nulls_ratio: fraction of null values in a column
– avg_length: average size of values in a column
– avg_frequency: average number of rows with the same
value.
Slide 49
Join condition pushdown
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
Slide 52
Join condition pushdown
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
● Conjunctive (ANDed) conditions are split into parts
● Each part is attached as early as possible
– Either as “Using where”
– Or as table access method.
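The split-and-attach rule can be sketched as: walk the join order, and attach each ANDed part to the first table at which every column it references is bound (a toy model of the idea, not the server's data structures; the condition list is from the example above):

```python
# Split the ANDed WHERE into parts; attach each part to the earliest table
# in the join order at which all of its referenced tables are available.
join_order = ["customer", "orders"]
conditions = [
    ("c_acctbal < -500",             {"customer"}),
    ("o_orderpriority = '1-URGENT'", {"orders"}),
    ("c_custkey = o_custkey",        {"customer", "orders"}),
]

attached = {t: [] for t in join_order}
for cond, tables_used in conditions:
    bound = set()
    for t in join_order:
        bound.add(t)
        if tables_used <= bound:      # all referenced tables are now bound
            attached[t].append(cond)  # attach as early as possible
            break

print(attached["customer"])  # only the customer-local condition
print(attached["orders"])    # the orders-local and the join condition
```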
Slide 53
Observing join condition pushdown
EXPLAIN: {
"query_block": {
"select_id": 1,
"nested_loop": [
{
"table": {
"table_name": "orders",
"access_type": "ALL",
"possible_keys": [
"i_o_custkey"
],
"rows": 1499715,
"filtered": 100,
"attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` =
'1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
}
},
{
"table": {
"table_name": "customer",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"c_custkey"
],
"key_length": "4",
"ref": [
"dbt3sf1.orders.o_custkey"
],
"rows": 1,
"filtered": 100,
"attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` <
<cache>(-(500)))"
}
● Before mysql-5.6: EXPLAIN shows only “Using where”
– The condition itself is only visible in the debug trace
● Starting from 5.6: EXPLAIN FORMAT=JSON shows the attached conditions.
Slide 54
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
First table, “customer”
● type=ALL, 150K rows
● select count(*) from customer where c_acctbal < -500 gives 6804
● alter table customer add index (c_acctbal)
Slide 55
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
First table, “customer”
● type=ALL, 150K rows
● select count(*) from customer where c_acctbal < -500 gives 6804
● alter table customer add index (c_acctbal)
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Now, access to 'customer' is efficient.
Slide 56
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
Second table, “orders”
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
● ref access uses only c_custkey=o_custkey
● What about o_orderpriority='1-URGENT'?
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Slide 58
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
Second table, “orders”
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
● ref access uses only c_custkey=o_custkey
● What about o_orderpriority='1-URGENT'? Selectivity= 0.2
– Can examine 7*0.2=1.4 rows, 6802 times if we add an index:
alter table orders add index (o_custkey, o_orderpriority)
or
alter table orders add index (o_orderpriority, o_custkey)
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Slide 59
Reasoning about join plan efficiency - summary
Basic* approach to evaluating join plan efficiency:
for each table $T in the join order {
  Look at the conditions attached to table $T (a condition must
  use table $T, and may also use previous tables)
  Does the access method used with $T make good use
  of the attached conditions?
}
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
* some other details may also affect join performance
61.
Attached conditions
● Ideally, should be used for table access
● Not all conditions can be used [at the same time]
– Unused ones are still useful
– They reduce the number of scans of subsequent tables
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
62.
Informing optimizer about attached conditions
Currently: a range access that's too expensive to use
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |filtered|Extra |
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY,c_acctbal|NULL |NULL |NULL |150081| 36.22 |Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | 100.00 |Using where|
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
explain extended
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal > 8000 and
o_orderpriority='1-URGENT';
● `orders` will be scanned 150081 * 36.22%= 54359 times
● This reduces the cost of join
– Has an effect when comparing potential join plans
● => The index on c_acctbal is not used for access, but it may still help the optimizer estimate selectivity.
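The 36.22% estimate in the `filtered` column can be sanity-checked against the actual data. A sketch, assuming the DBT-3 `customer` table used throughout the slides:

```sql
-- Compute the real selectivity of the attached condition and compare
-- it with the `filtered` column of EXPLAIN EXTENDED
SELECT COUNT(IF(c_acctbal > 8000, 1, NULL)) / COUNT(*) AS real_selectivity
FROM customer;
```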
63.
Attached condition selectivity
● Unused indexes provide info about selectivity
– Works, but very expensive
● MariaDB 10.0 has engine-independent statistics
– Index statistics
– Non-indexed Column statistics
● Histograms
– Further info:
Tomorrow, 2:20 pm @ Ballroom D
Igor Babaev
Engine-independent persistent statistics with histograms
in MariaDB.
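In MariaDB 10.0 the engine-independent statistics mentioned above have to be collected explicitly. A minimal sketch; check the MariaDB knowledge base for the exact variable values in your version:

```sql
-- Tell the optimizer to prefer the engine-independent statistics tables
SET GLOBAL use_stat_tables = 'preferably';
-- Enable histogram collection, then gather column and index statistics
SET GLOBAL histogram_size = 100;
ANALYZE TABLE orders PERSISTENT FOR ALL;
```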
65.
Check if query plan is realistic
● EXPLAIN shows what optimizer
expects. It may be wrong
– Out-of-date index statistics
– Non-uniform data distribution
● Other DBMS: EXPLAIN ANALYZE
● MySQL: no equivalent. Instead, have
– Handler counters
– “User statistics” (Percona, MariaDB)
– PERFORMANCE_SCHEMA
66.
Join analysis: example query (Q18, DBT3)
<reset counters>
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > 500000
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
<collect statistics>
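With handler counters, the &lt;reset counters&gt; / &lt;collect statistics&gt; steps around the example query can look like this (a sketch; the counters are per-session, so run all three steps in the same connection):

```sql
FLUSH STATUS;                               -- <reset counters>

SELECT c_name, c_custkey, o_orderkey, o_orderdate,
       o_totalprice, SUM(l_quantity)
FROM customer, orders, lineitem
WHERE o_totalprice > 500000
  AND c_custkey = o_custkey
  AND o_orderkey = l_orderkey
GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
ORDER BY o_totalprice DESC, o_orderdate
LIMIT 10;

SHOW SESSION STATUS LIKE 'Handler_read%';   -- <collect statistics>
```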
69.
Join analysis: PERFORMANCE SCHEMA
[MySQL 5.6, MariaDB 10.0]
● summary tables with read/write statistics
– table_io_waits_summary_by_table
– table_io_waits_summary_by_index_usage
● Superset of the userstat tables
● More overhead
● Not possible to associate statistics with a query
=> truncate stats tables before running a query
● Possible bug
– the performance schema's own activity is not ignored
– Disable by
UPDATE setup_consumers SET ENABLED = 'NO'
where name = 'global_instrumentation';
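Putting the slide's advice together, a per-query measurement could look like the following sketch (the schema name dbt3sf10 is taken from the earlier slow-log example; substitute your own):

```sql
-- Reset the summary tables before running the query under analysis
TRUNCATE TABLE performance_schema.table_io_waits_summary_by_index_usage;

-- ... run the query ...

-- Which tables and indexes were actually read, and how often?
SELECT object_schema, object_name, index_name, count_read
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'dbt3sf10' AND count_read > 0
ORDER BY count_read DESC;
```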
72.
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
73.
Batched joins
● Optimization for analytical queries
● Analytic queries shovel through lots of data
– e.g. “average size of order in the last month”
– or “pairs of goods purchased together”
● Indexes, etc. won't help when you really need to
look at all the data
● More data means a greater chance of being I/O-bound
● Solution: batched joins
80.
Batched Key Access Idea
● Non-BKA join hits data at random
● Caches are not used efficiently
● Prefetching is not useful
81.
Batched Key Access Idea
● BKA implementation accesses data
in order
● Takes advantages of caches and
prefetching
82.
Batched Key access effect
set join_cache_level=6;
select max(l_extendedprice)
from orders, lineitem
where
l_orderkey=o_orderkey and
o_orderdate between $DATE1 and $DATE2
The benchmark was run with
● Various BKA buffer size
● Various size of $DATE1...$DATE2 range
83.
Batched Key Access Performance
[Chart: BKA join performance depending on buffer size.
X axis: buffer size, bytes. Y axis: query time, sec.
Curves: query_size = 1, 2, 3, each regular vs. BKA.
Flat lines: performance without BKA.
Dropping lines: performance with BKA, given sufficient buffer size.]
● 4x-10x speedup
● The more the data, the bigger the speedup
● The buffer size setting is very important.
84.
Batched Key Access settings
● Needs to be turned on
set join_buffer_size= 32*1024*1024;
set join_cache_level=6; -- MariaDB
set optimizer_switch='batched_key_access=on' -- MySQL 5.6
set optimizer_switch='mrr=on';
set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
● Further join_buffer_size tuning: watch
– query performance
– the Handler_mrr_init counter
and increase join_buffer_size until either saturates.
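A tuning loop following the settings above might look like this sketch (the 32M starting size is the value from the slide; the MariaDB flavor of the switches is shown):

```sql
-- Enable BKA (MariaDB) and pick a starting buffer size
SET join_cache_level = 6;
SET optimizer_switch = 'mrr=on,mrr_sort_keys=on';
SET join_buffer_size = 32*1024*1024;

FLUSH STATUS;
-- ... run the big join ...
-- Handler_mrr_init is incremented once per MRR batch: if it is still
-- large, a bigger join_buffer_size may still help
SHOW SESSION STATUS LIKE 'Handler_mrr_init';
```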
85.
Batched Key Access - conclusions
● Targeted at big joins
● Needs to be enabled manually
● @@join_buffer_size is the most important
setting
● MariaDB's implementation is a superset of
MySQL's.
86.
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
88.
Aggregate functions, no GROUP BY
● COUNT, SUM, AVG, etc. need to examine all rows
select SUM(column) from tbl needs to examine the whole tbl.
● MIN and MAX can use index for lookup
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|1 |SIMPLE |NULL |NULL|NULL |NULL|NULL |NULL|NULL|Select tables optimized away|
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
index (o_orderdate):
select max(o_orderdate) from orders;
select min(o_orderdate) from orders where o_orderdate > '1995-05-01';
index (o_orderpriority, o_orderdate):
select max(o_orderdate) from orders where o_orderpriority='1-URGENT';
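Whether a given MIN/MAX was resolved from the index alone is visible in EXPLAIN, as in the table above. A quick check (sketch):

```sql
EXPLAIN SELECT MAX(o_orderdate) FROM orders;
-- Extra: "Select tables optimized away" means the value came straight
-- from the index, without scanning any rows
```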
89.
ORDER BY … LIMIT
Three algorithms
● Use an index to read in order
● Read one table, sort, join - “Using filesort”
● Execute join into temporary table and then
sort - “Using temporary; Using filesort”
90.
Using index to read data in order
● No special indication
in EXPLAIN output
● LIMIT n: as soon as
we read n records,
we can stop!
91.
A problem with LIMIT N optimization
`orders` has 1.5 M rows
explain select * from orders order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 | |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
explain select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 |Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
● A problem:
– 1.5M rows, 300K of them 'URGENT'
– Scanning by date, when will we find 10 'URGENT' rows?
– No good solution so far.
92.
Using filesort strategy
● Have to read the entire
first table
● For remaining, can apply
LIMIT n
● ORDER BY can only use
columns of tbl1.
93.
Using temporary; Using filesort
● ORDER BY clause
can use columns of
any table
● LIMIT is applied only
after executing the
entire join and
sorting.
94.
ORDER BY - conclusions
● Resolving ORDER BY with index allows very
efficient handling for LIMIT
– Optimization for
WHERE unused_condition ORDER BY … LIMIT n
is challenging.
● Use sql_big_result, IGNORE INDEX FOR ORDER BY
● Using filesort
– Needs all ORDER BY columns in the first table
– Take advantage of LIMIT when doing join to non-first tables
● Using temporary; Using filesort is the least efficient.
95.
GROUP BY strategies
There are three strategies
● Ordered index scan
● Loose Index Scan (LooseScan)
● Groups table
(Using temporary; [Using filesort]).
96.
Ordered index scan
● Groups are
enumerated one after
another
● Can compute
aggregates on the fly
● Loose index scan is
also able to jump to
next group.
99.
Subquery optimizations
● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries”
● Queries that caused most of the pain
– SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins
– SELECT … FROM (SELECT …) - derived tables
● MariaDB 5.3 and MySQL 5.6
– Share common ancestry: the MySQL 6.0 alpha codebase
– Huge (100x, 1000x) speedups for painful areas
– Other kinds of subqueries received a speedup, too
– MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations
● 5.6 handles some un-handled edge cases, too
100.
Tuning for subqueries
● “Before”: one execution strategy
– No tuning possible
● “After”: similar to joins
– Reasonable execution strategies supported
– Need indexes
– Need selective conditions
– Support batching in most important cases
● Should be better 9x% of the time.
101.
What if it still picks a poor query plan?
For both MariaDB and MySQL:
● Check EXPLAIN [EXTENDED], find a keyword around a
subquery table
● Google “site:kb.askmonty.org $subquery_keyword”
or https://kb.askmonty.org/en/subquery-optimizations-map/
● Find which optimization it was
● set optimizer_switch='$subquery_optimization=off'
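For example, if EXPLAIN shows that a subquery was materialized and the plan is poor, the corresponding optimization can be switched off for the session and the plan re-checked (a sketch; flag names vary between versions, so consult the optimizer_switch documentation for yours):

```sql
SET SESSION optimizer_switch = 'materialization=off';
-- re-run EXPLAIN and the query, and compare timings
```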