Advanced query optimizer tuning and analysis
Sergei Petrunia
Timour Katchaounov
Monty Program Ab
MySQL Conference And Expo 2013
2 07:48:08 AM
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Is there a problem with the query optimizer?
• Database performance is affected by many factors
• One of them is the query optimizer
• Is my performance problem caused by the optimizer?
Signs that there is a query optimizer problem
• Some (not all) queries are slow
• A query seems to run longer than it ought to
– And examines more records than it ought to
• Usually, the query remains slow regardless of other activity on the server
Catching slow queries, the old ways
● Watch the slow query log
– Percona Server/MariaDB: --log_slow_verbosity=query_plan
– Run pt-query-digest on the log
# Thread_id: 1 Schema: dbt3sf10 QC_hit: No
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
SET timestamp=1333385770;
select * from customer where c_acctbal < -1000;
● Run SHOW PROCESSLIST periodically
The new way: SHOW PROCESSLIST + SHOW EXPLAIN
• Available in MariaDB 10.0+
• Displays EXPLAIN of a running statement
MariaDB> show processlist;
+--+----+---------+-------+-------+----+------------+-------------------------...
|Id|User|Host |db |Command|Time|State |Info
+--+----+---------+-------+-------+----+------------+-------------------------...
| 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ...
| 2|root|localhost|dbt3sf1|Query | 0|init |show processlist
+--+----+---------+-------+-------+----+------------+-------------------------...
MariaDB> show explain for 1;
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where|
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
MariaDB [dbt3sf1]> show warnings;
+-----+----+-----------------------------------------------------------------+
|Level|Code|Message |
+-----+----+-----------------------------------------------------------------+
|Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995|
+-----+----+-----------------------------------------------------------------+
SHOW EXPLAIN usage
● Intended usage
– SHOW PROCESSLIST ...
– SHOW EXPLAIN FOR ...
● Why not just run EXPLAIN again
– Difficult to replicate setups
● Temporary tables
● Optimizer settings
● Storage engine's index statistics
● ...
– No uncertainty about whether you're looking at the same query plan or not
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● use performance_schema
● Many ways to analyze via queries
– events_statements_summary_by_digest
● count_star, sum_timer_wait,
min_timer_wait, avg_timer_wait, max_timer_wait
● digest_text, digest
● sum_rows_examined, sum_created_tmp_disk_tables,
sum_select_full_join
– events_statements_history
● sql_text, digest_text, digest
● timer_start, timer_end, timer_wait
● rows_examined, created_tmp_disk_tables,
select_full_join
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
• Modified Q18 from DBT3
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > ?
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey,
o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
• App executes Q18 many times with
? = 550000, 500000, 400000, ...
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Find candidate slow queries
● Simple tests: select_full_join > 0,
created_tmp_disk_tables > 0, etc
● Complex conditions:
max execution time > X sec OR
min/max time vary a lot:
select max_timer_wait/avg_timer_wait as max_ratio,
avg_timer_wait/min_timer_wait as min_ratio
from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2\G
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
*************************** 5. row ***************************
DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b
DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` ,
`o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE
`o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY
`c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice`
DESC , `o_orderdate` LIMIT ?
COUNT_STAR: 3
SUM_TIMER_WAIT: 3251758347000
MIN_TIMER_WAIT: 3914209000 → 0.0039 sec
AVG_TIMER_WAIT: 1083919449000
MAX_TIMER_WAIT: 3204044053000 → 3.2 sec
SUM_LOCK_TIME: 555000000
SUM_ROWS_SENT: 25
SUM_ROWS_EXAMINED: 0
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 3
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_RANGE: 3
SUM_SELECT_SCAN: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 25
SUM_SORT_SCAN: 3
SUM_NO_INDEX_USED: 0
SUM_NO_GOOD_INDEX_USED: 0
FIRST_SEEN: 1970-01-01 03:38:27
LAST_SEEN: 1970-01-01 03:38:43
max_ratio: 2.9560
min_ratio: 276.9192
→ High variance of execution time
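The annotations and ratios in the digest row above can be checked by hand. A minimal sketch in Python, assuming the *_TIMER_WAIT columns are expressed in picoseconds (which matches the "→ sec" annotations above); the helper name timer_to_seconds is made up for illustration:

```python
PICOSECONDS_PER_SECOND = 10**12

def timer_to_seconds(timer_wait_ps):
    """Convert a Performance Schema timer value (picoseconds) to seconds."""
    return timer_wait_ps / PICOSECONDS_PER_SECOND

# Values copied from the digest row shown above.
min_timer_wait = 3914209000
avg_timer_wait = 1083919449000
max_timer_wait = 3204044053000

# Same expressions as in the SQL query: max/avg and avg/min.
max_ratio = max_timer_wait / avg_timer_wait
min_ratio = avg_timer_wait / min_timer_wait

print(f"min: {timer_to_seconds(min_timer_wait):.4f} sec")
print(f"max: {timer_to_seconds(max_timer_wait):.4f} sec")
print(f"max_ratio: {max_ratio:.4f}, min_ratio: {min_ratio:.4f}")
```

A min_ratio in the hundreds means the fastest run was far quicker than the average one, which is what flags this digest as a candidate.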
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Check the actual queries and constants
● The events_statements_history table
select timer_wait/1000000000000 as exec_time, sql_text
from events_statements_history
where digest in
(select digest from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2)
order by timer_wait;
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
+-----------+-----------------------------------------------------------------------------------+
| exec_time | sql_text |
+-----------+-----------------------------------------------------------------------------------+
| 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 |
| 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 |
| 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 |
+-----------+-----------------------------------------------------------------------------------+
Observation:
orders.o_totalprice > ? is less and less selective
Actions after finding the slow query
• Bad query plan
– Rewrite the query
– Force a good query plan
• Bad optimizer settings
– Do tuning
• Query is inherently complex
– Don't waste time with it
– Look for other solutions.
Consider a simple select
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
● Check the query plan:
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
● Run the query:
19 rows in set (7.65 sec)
• 15M rows were scanned, 19 rows in output
• Query plan seems inefficient
– (note: this logic doesn't directly apply to group/order by queries)
Query plan analysis
• Entire table is scanned
• WHERE condition checked after records are read
– Not used to limit #examined rows
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
Let's add an index
alter table orders add key i_o_orderdate (o_orderdate);
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
● Query time:
19 rows in set (0.76 sec)
• Outcome
– Down to reading 300K rows
– Still, 300K >> 19 rows
Finding out which indexes to add
● index (o_orderdate)
● index (o_clerk)
Check selectivity of conditions that will use the index
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
select count(*) from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
306322 rows
select count(*) from orders where o_clerk='Clerk#000009506'
1507 rows.
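The comparison of the two counts can be written as a small selectivity calculation. A hedged sketch in Python: the table size is an assumption (the EXPLAIN row estimate for the full scan, 15084733), and the selectivity helper is illustrative only:

```python
def selectivity(matching_rows, total_rows):
    """Fraction of the table's rows that satisfy a condition (lower is better)."""
    return matching_rows / total_rows

# Assumption: use the EXPLAIN estimate for the full table scan as the table size.
TOTAL_ROWS = 15084733

# Row counts returned by the two count(*) queries above.
candidates = {
    "o_orderDate BETWEEN ...": 306322,
    "o_clerk = 'Clerk#000009506'": 1507,
}

# Print the conditions from most selective to least selective.
for condition, matching in sorted(candidates.items(), key=lambda kv: kv[1]):
    print(f"{condition:30} {matching:>8} rows  "
          f"selectivity={selectivity(matching, TOTAL_ROWS):.6f}")

# The condition matching the fewest rows is the better single-column index candidate.
best = min(candidates, key=candidates.get)
print("most selective:", best)
```

The condition on o_clerk matches about 200 times fewer rows than the date range, so an index that starts with o_clerk has more to offer.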
Try adding composite indexes
● index (o_clerk, o_orderdate)
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
Bingo! 100% efficiency
● index (o_orderdate, o_clerk)
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
Much worse!
• If a condition uses multiple columns, a composite index will be most efficient
• The order of columns matters
– Why is outside the scope of this tutorial; it was covered in last year's tutorial
Conditions must be in SARGable form
• Condition must represent a range
• It must have a form that is recognized by the optimizer
o_orderDate BETWEEN '1992-06-01' and '1992-06-30'
year(o_orderDate)=1992 and month(o_orderdate)=6
TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06')
o_clerk='Clerk#000009506'
o_clerk LIKE 'Clerk#000009506'
o_clerk LIKE '%Clerk#000009506%'
column IN (1,10,15,21, ...)
(col1, col2) IN ( (1,1), (2,2), (3,3), …)
New in MySQL-5.6: optimizer_trace
● Lets you see the ranges
set optimizer_trace=1;
explain select * from orders
where o_orderDATE between '1992-06-01' and '1992-07-03' and
o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04');
select * from information_schema.optimizer_trace\G
● Will print a big JSON struct
● Search for range_scan_alternatives
New in MySQL-5.6: optimizer_trace
...
"range_scan_alternatives": [
{
"index": "i_o_orderdate",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 319082,
"cost": 382900,
"chosen": true
},
{
"index": "i_o_date_clerk",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 406336,
"cost": 487605,
"chosen": false,
"cause": "cost"
}
],
...
● Considered ranges are shown in the range_scan_alternatives section
● This is actually the original use case of optimizer_trace
● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
● Still, very useful
Source of #rows estimates for range
select * from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
• “records_in_range” estimate
• Done by diving into index
• Usually is fairly accurate
• Not affected by ANALYZE TABLE
Simple selects: conclusions
• Efficiency == “#rows_scanned is close to #rows_returned”
• Indexes and WHERE conditions reduce #rows scanned
• Index estimates are usually accurate
• Multi-column indexes
– “handle” conditions on multiple columns
– Order of columns in the index matters
• optimizer_trace lets you view the ranges
– But misrepresents ranges over multi-column indexes
Now, we will skip some topics
One can also speed up simple selects with
● index_merge access method
● index access method
● Index Condition Pushdown
We don't have time for these now; check out last year's tutorial.
A simple join
select * from customer, orders where c_custkey=o_custkey
• “Customers with their orders”
Execution: Nested Loops join
select * from customer, orders where c_custkey=o_custkey
for each customer C {
  for each order O {
    if (C.c_custkey == O.o_custkey)
      produce record(C, O);
  }
}
• Complexity:
– Scans table customer
– For each record in customer, scans table orders
• Is this ok?
Execution: Nested loops join (2)
select * from customer, orders where c_custkey=o_custkey
for each customer C {
  for each order O {
    if (C.c_custkey == O.o_custkey)
      produce record(C, O);
  }
}
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
Execution: Nested loops join (3)
• Reading the EXPLAIN output above:
– rows=148749: rows to read from customer
– rows=1493631: rows to read from orders
– Using where on orders: the condition c_custkey=o_custkey
Execution: Nested loops join (4)
select * from customer, orders where c_custkey=o_custkey
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
• Scan a 1,493,631-row table 148,749 times
– Consider 1,493,631 * 148,749 row combinations
• Is this query inherently complex?
– We know each customer has their own orders
– size(customer x orders) = size(orders)
– Lower bound is 1,493,631 + 148,749 + costs to match customer<->order
Using index for join: ref access
alter table orders add index i_o_custkey(o_custkey)
select * from customer, orders where c_custkey=o_custkey
ref access - analysis
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| |
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
select * from customer, orders where c_custkey=o_custkey
● One ref lookup scans 7 rows.
● In total: 7 * 148,749=1,041,243 rows
– `orders` has 1.4M rows
– no redundant reads from `orders`
● The whole query plan
– Reads all customers
– Reads 1M orders (of 1.4M)
● Efficient!
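The difference between a full nested loops join and ref access can be seen on a toy data set. A minimal sketch in Python, with a dict standing in for the i_o_custkey index; the data is synthetic (100 customers with 7 orders each) and the function names are illustrative, not server internals:

```python
from collections import defaultdict

def nested_loop_join(customers, orders):
    """Full nested loops: every (customer, order) combination is examined."""
    examined, result = 0, []
    for c in customers:
        for o in orders:
            examined += 1
            if c["c_custkey"] == o["o_custkey"]:
                result.append((c, o))
    return result, examined

def ref_access_join(customers, orders):
    """For each customer, look up only the matching orders via an index."""
    # The dict stands in for the i_o_custkey index. Building it is not
    # counted here, since a real index already exists when the query runs.
    index = defaultdict(list)
    for o in orders:
        index[o["o_custkey"]].append(o)
    examined, result = 0, []
    for c in customers:
        for o in index.get(c["c_custkey"], []):
            examined += 1  # only matching orders are read
            result.append((c, o))
    return result, examined

# Synthetic data: 100 customers, 700 orders, 7 orders per customer.
customers = [{"c_custkey": k} for k in range(100)]
orders = [{"o_orderkey": n, "o_custkey": n % 100} for n in range(700)]

nl_rows, nl_examined = nested_loop_join(customers, orders)
ref_rows, ref_examined = ref_access_join(customers, orders)
print("nested loops examined:", nl_examined)
print("ref access examined:  ", ref_examined)
print("rows in join result:  ", len(ref_rows))
```

Both functions return the same join result; only the number of examined order rows differs, which is the point of the "no redundant reads" bullet above.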
Conditions that can be used for ref access
● Can use equalities
– tbl.key=other_table.col
– tbl.key=const
– tbl.key IS NULL
● For multipart keys, will use largest prefix
– keypart1=... AND keypart2=... AND keypartK=...
Conditions that can't be used for ref access
● Doesn't work for non-equalities
t1.key BETWEEN t2.col1 AND t2.col2
● Doesn't work for OR-ed equalities
t1.key=t2.col1 OR t1.key=t2.col2
– Except for ref_or_null
t1.key=... OR t1.key IS NULL
● Doesn't “combine” ref and range access
– t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
– t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
Is ref always efficient?
● Efficient, if column has many different values (good)
– Best case – unique index (eq_ref)
● A few different values – not useful (bad)
● Skewed distribution: depends on which part the join touches (depends)
ref access estimates - index statistics
• How many rows will match
tbl.key_column = $value
for an arbitrary $value?
• Index statistics
show keys from orders where key_name='i_o_custkey'
*************************** 1. row ***************
Table: orders
Non_unique: 1
Key_name: i_o_custkey
Seq_in_index: 1
Column_name: o_custkey
Collation: A
Cardinality: 214462
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
show table status like 'orders'
*************************** 1. row ****
Name: orders
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1495152
Avg_row_length: 133
Data_length: 199966720
Max_data_length: 0
Index_length: 122421248
Data_free: 6291456
...
average = Rows /Cardinality = 1495152 / 214462 = 6.97.
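The same estimate as a tiny Python sketch. The function name avg_rows_per_key is made up for illustration, and both inputs are themselves estimates reported by the storage engine:

```python
def avg_rows_per_key(table_rows, index_cardinality):
    """Table-wide average number of rows matching one key value."""
    return table_rows / index_cardinality

# Rows comes from SHOW TABLE STATUS, Cardinality from SHOW KEYS.
estimate = avg_rows_per_key(1495152, 214462)
print(f"{estimate:.2f} rows per o_custkey value")
```

This is a table-wide average, so it says nothing about how many rows one particular key value matches.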
ref access – conclusions
● Based on t.key=... equality conditions
● Can make joins very efficient
● Relies on index statistics for estimates.
Optimizer statistics
● MySQL/Percona Server
– Index statistics
– Persistent/transient InnoDB stats
● MariaDB
– Index statistics, persistent/transient
● Same as Percona Server (via XtraDB)
– Persistent, engine-independent, index-independent statistics
Index statistics
● Cardinality lets you calculate a table-wide average #rows-per-key-prefix
● It is a statistical value (inexact)
● Exact collection procedure depends on the
storage engine
– InnoDB – random sampling
– MyISAM – index scan
– Engine-independent – index scan.
Index statistics in MySQL 5.6
● Sample [8] random index leaf pages
● Table statistics (stored)
– rows - estimated number of rows in a table
– Other stats not used by optimizer
● Index statistics (stored)
– fields - #fields in the index
– rows_per_key - rows per 1 key value, per prefix fields
([1 column value], [2 columns value], [3 columns value], …)
– Other stats not used by optimizer.
Index statistics updates
● Statistics are updated when:
– ANALYZE TABLE tbl_name [, tbl_name] … is run
– SHOW TABLE STATUS or SHOW INDEX is run
– INFORMATION_SCHEMA.[TABLES|STATISTICS] is accessed
– A table is opened for the first time (after server restart)
– A table has changed >10%
– InnoDB Monitor is turned ON
Displaying optimizer statistics
● MySQL 5.5, MariaDB 5.3, and older
– Issue SQL statements to count rows/keys
– Indirectly, look at EXPLAIN for simple queries
● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
– information_schema.[innodb_index_stats, innodb_table_stats]
– Read-only, always visible
● MySQL 5.6
– mysql.[innodb_index_stats, innodb_table_stats]
– User-updatable
– Only available if innodb_analyze_is_persistent=ON
● MariaDB 10.0
– Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
– User updateable
– + current XtraDB mechanisms.
Plan [in]stability
● Statistics may vary a lot (orders)
MariaDB [dbt3]> select * from information_schema.innodb_index_stats;
+------------+-----------------+--------------+ +---------------+
| table_name | index_name | rows_per_key | | rows_per_key | error (actual)
+------------+-----------------+--------------+ +---------------+
| partsupp | PRIMARY | 3, 1 | | 4, 1 | 25%
| partsupp | i_ps_partkey | 3, 0 | => | 4, 1 | 25% (4)
| partsupp | i_ps_suppkey | 64, 0 | | 91, 1 | 30% (80)
| orders | i_o_orderdate | 9597, 1 | | 1660956, 0 | 99% (6234)
| orders | i_o_custkey | 15, 1 | | 15, 0 | 0% (15)
| lineitem | i_l_receiptdate | 7425, 1, 1 | | 6665850, 1, 1 | 99.9% (23477)
+------------+-----------------+--------------+ +---------------+
MariaDB [dbt3]> select * from information_schema.innodb_table_stats;
+-----------------+----------+ +----------+
| table_name | rows | | rows |
+-----------------+----------+ +----------+
| partsupp | 6524766 | | 9101065 | 28% (8000000)
| orders | 15039855 | ==> | 14948612 | 0.6% (15000000)
| lineitem | 60062904 | | 59992655 | 0.1% (59986052)
+-----------------+----------+ +----------+
Controlling statistics (MySQL 5.6)
● Persistent and user-updatable InnoDB statistics
– innodb_analyze_is_persistent = ON,
– updated manually by ANALYZE TABLE or
– automatically by innodb_stats_auto_recalc = ON
● Control the precision of sampling [default 8]
– innodb_stats_persistent_sample_pages,
– innodb_stats_transient_sample_pages
● No new statistics compared to older versions
Controlling statistics (MariaDB 10.0)
Current XtraDB index statistics
+
● Engine-independent, persistent, user-updateable statistics
● Precise
● Additional statistics per column (even when there is no index):
– min_value, max_value: minimum/maximum value per column
– nulls_ratio: fraction of null values in a column
– avg_length: average size of values in a column
– avg_frequency: average number of rows with the same value
Join condition pushdown
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
● Conjunctive (ANDed) conditions are split into parts
● Each part is attached as early as possible
– Either as “Using where”
– Or as table access method.
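The "attached as early as possible" rule can be sketched in a few lines. This is a simplified model in Python, not the server's implementation: the set of tables each condition refers to is written out by hand, and the sketch only decides where a condition is attached, not whether it becomes an access method (such as ref) or a "Using where" check. The name attach_conditions is hypothetical.

```python
def attach_conditions(join_order, conjuncts):
    """Attach each ANDed condition to the first table in the join order
    at which every table it references has already been read."""
    position = {table: i for i, table in enumerate(join_order)}
    attached = {table: [] for table in join_order}
    for text, tables in conjuncts:
        # The condition can be checked once its latest table is reached.
        last = max(tables, key=position.__getitem__)
        attached[last].append(text)
    return attached

# The WHERE clause from the query above, split on AND.
conjuncts = [
    ("c_custkey=o_custkey", {"customer", "orders"}),
    ("c_acctbal < -500", {"customer"}),
    ("o_orderpriority='1-URGENT'", {"orders"}),
]

for order in (["customer", "orders"], ["orders", "customer"]):
    print(order, attach_conditions(order, conjuncts))
```

With the join order customer, orders, the condition on c_acctbal is checked while reading customer, and the other two conditions are attached to orders. Reversing the join order moves the join condition to customer.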
Observing join condition pushdown
EXPLAIN: {
"query_block": {
"select_id": 1,
"nested_loop": [
{
"table": {
"table_name": "orders",
"access_type": "ALL",
"possible_keys": [
"i_o_custkey"
],
"rows": 1499715,
"filtered": 100,
"attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` =
'1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
}
},
{
"table": {
"table_name": "customer",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"c_custkey"
],
"key_length": "4",
"ref": [
"dbt3sf1.orders.o_custkey"
],
"rows": 1,
"filtered": 100,
"attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` <
<cache>(-(500)))"
}
● Before mysql-5.6: EXPLAIN shows only “Using where”
– The condition itself is only visible in the debug trace
● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
First table, “customer”
● type=ALL, 150 K rows
● select count(*) from customer where c_acctbal < -500 gives 6804
● alter table customer add index (c_acctbal)
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Now, access to 'customer' is efficient.
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
Second table, “orders”
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
● ref access uses only c_custkey=o_custkey
● What about o_orderpriority='1-URGENT'?
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Selectivity of o_orderpriority='1-URGENT'
● select count(*) from orders – 1.5M rows
● select count(*) from orders where o_orderpriority='1-URGENT' – 300K rows
● 300K / 1.5M = 0.2
58 07:48:08 AM
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
Second table, “orders”
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
● ref access uses only c_custkey=o_custkey
● What about o_orderpriority='1-URGENT'? Selectivity = 0.2
– Can examine 7*0.2=1.4 rows, 6802 times, if we add an index:
alter table orders add index (o_custkey, o_orderpriority);
or
alter table orders add index (o_orderpriority, o_custkey);
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
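The arithmetic on this slide can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not the optimizer's cost model: the variable names are ours, and it assumes that o_orderpriority is independent of o_custkey, so that the estimates simply multiply.

```python
# Figures taken from the slides: EXPLAIN row estimates and the two COUNT(*) results.
customer_rows = 6802        # rows read from `customer` via range access on c_acctbal
orders_per_customer = 7     # EXPLAIN estimate for ref access on i_o_custkey
orders_total = 1_500_000
orders_urgent = 300_000

# Fraction of `orders` rows that satisfy o_orderpriority='1-URGENT'.
selectivity = orders_urgent / orders_total

# With ref access on o_custkey only, every matching order is read and then filtered.
rows_read_now = customer_rows * orders_per_customer

# With an index on both columns, only the URGENT orders of each customer are read.
# Assumption: priority is independent of customer, so the estimates multiply.
rows_read_with_index = customer_rows * orders_per_customer * selectivity

print(f"selectivity          = {selectivity:.1f}")
print(f"rows read today      = {rows_read_now}")
print(f"rows read with index = {rows_read_with_index:.0f}")
```

Under these assumptions the composite index cuts the rows read from `orders` to about a fifth.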
Reasoning about join plan efficiency - summary
Basic* approach to evaluation of join plan efficiency:
for each table $T in the join order {
Look at the conditions attached to table $T (a condition must
use table $T, and may also use previous tables)
Does the access method used with $T make good use
of the attached conditions?
}
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
* some other details may also affect join performance
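The loop above can be mimicked in Python on a hand-written description of the plan. This is only an illustration: `PlanStep` and `unused_conditions` are hypothetical names, and the lists of attached and used conditions are filled in by reading the EXPLAIN output, not obtained from the server.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    """One table in the join order, described by hand from EXPLAIN output."""
    table: str
    attached_conditions: list                            # checkable once this table is read
    used_by_access: list = field(default_factory=list)   # exploited by the access method


def unused_conditions(plan):
    """Return {table: conditions that are only checked after the row is read}."""
    report = {}
    for step in plan:  # walk the tables in join order
        leftover = [c for c in step.attached_conditions
                    if c not in step.used_by_access]
        if leftover:
            report[step.table] = leftover
    return report


plan = [
    # range access on c_acctbal uses the only attached condition
    PlanStep("customer", ["c_acctbal < -500"], ["c_acctbal < -500"]),
    # ref access on i_o_custkey uses just the join condition
    PlanStep("orders",
             ["c_custkey = o_custkey", "o_orderpriority = '1-URGENT'"],
             ["c_custkey = o_custkey"]),
]

print(unused_conditions(plan))
```

Any table that shows up in the report is a candidate for a better index or a different access method.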
Attached conditions
● Ideally, should be used for table access
● Not all conditions can be used [at the same time]
– Unused ones are still useful
– They reduce the number of scans for subsequent tables
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
Informing optimizer about attached conditions
Currently: a range access that's too expensive to use
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |filtered|Extra |
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY,c_acctbal|NULL |NULL |NULL |150081| 36.22 |Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | 100.00 |Using where|
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
explain extended
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal > 8000 and
o_orderpriority='1-URGENT';
● `orders` will be scanned 150081 * 36.22% ≈ 54359 times
● This reduces the cost of join
– Has an effect when comparing potential join plans
● => Index c_acctbal is not used for access, but it still helps the optimizer.
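As a rough check of how the `filtered` column feeds into the estimate, the numbers from the EXPLAIN EXTENDED output can be multiplied out. This is a simplified sketch with our own variable names, not the optimizer's actual cost formula.

```python
customer_rows = 150081   # `rows` for customer (full table scan)
filtered_pct = 36.22     # `filtered` for customer, from the c_acctbal range estimate
orders_per_lookup = 7    # `rows` for orders (ref access on i_o_custkey)

# Customer rows expected to survive the attached condition; this is also
# the number of index lookups made into `orders`.
lookups = round(customer_rows * filtered_pct / 100)

# Estimated rows read from `orders`, with and without the selectivity information.
orders_rows_with_filter = lookups * orders_per_lookup
orders_rows_without_filter = customer_rows * orders_per_lookup

print(lookups, orders_rows_with_filter, orders_rows_without_filter)
```

Knowing the selectivity of c_acctbal > 8000 lowers the estimated work on `orders` by almost two thirds, which matters when this join order is compared with alternatives.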
Attached condition selectivity
● Unused indexes provide info about selectivity
– Works, but very expensive
● MariaDB 10.0 has engine-independent statistics
– Index statistics
– Non-indexed Column statistics
● Histograms
– Further info:
Tomorrow, 2:20 pm @ Ballroom D
Igor Babaev
Engine-independent persistent statistics with histograms
in MariaDB.
How to check if the query plan matches reality
Check if query plan is realistic
● EXPLAIN shows what the optimizer expects. It may be wrong
– Out-of-date index statistics
– Non-uniform data distribution
● Other DBMS: EXPLAIN ANALYZE
● MySQL: no equivalent. Instead, we have
– Handler counters
– “User statistics” (Percona, MariaDB)
– PERFORMANCE_SCHEMA
Join analysis: example query (Q18, DBT3)
<reset counters>
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > 500000
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
<collect statistics>
Join analysis: handler counters (old)
FLUSH STATUS;
=> RUN QUERY
SHOW STATUS LIKE "Handler%";
+----------------------------+-------+
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 1646 |
| Handler_read_last | 0 |
| Handler_read_next | 1462 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 10 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 184 |
| Handler_tmp_update | 1096 |
| Handler_tmp_write | 183 |
| Handler_update | 0 |
| Handler_write | 0 |
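Where resetting the counters with FLUSH STATUS is not desirable, a common alternative is to take a snapshot of the counters before and after the query and subtract. The sketch below shows only the subtraction; `counter_delta` is a hypothetical helper, the `before` values are made up, and collecting the snapshots (for example by parsing SHOW STATUS output) is left out.

```python
def counter_delta(before, after):
    """Return the counters that changed between two snapshots, with their increase."""
    delta = {}
    for name, value in after.items():
        change = value - before.get(name, 0)
        if change != 0:
            delta[name] = change
    return delta


# Hypothetical snapshot taken before the query (these numbers are made up).
before = {"Handler_read_key": 100, "Handler_read_next": 50, "Handler_read_rnd_next": 16}

# Snapshot after the query: `before` plus the non-zero read counters from the slide.
after = {
    "Handler_read_first": 0,
    "Handler_read_key": 100 + 1646,
    "Handler_read_next": 50 + 1462,
    "Handler_read_rnd": 10,
    "Handler_read_rnd_next": 16 + 184,
}

for name, change in sorted(counter_delta(before, after).items()):
    print(f"{name:24} {change}")
```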
Join analysis: USERSTAT by Facebook
MariaDB, Percona Server
SET GLOBAL USERSTAT=1;
FLUSH TABLE_STATISTICS;
FLUSH INDEX_STATISTICS;
=> RUN QUERY
SHOW TABLE_STATISTICS;
+--------------+------------+-----------+--------------+-------------------------+
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+--------------+------------+-----------+--------------+-------------------------+
| dbt3 | orders | 183 | 0 | 0 |
| dbt3 | lineitem | 1279 | 0 | 0 |
| dbt3 | customer | 183 | 0 | 0 |
+--------------+------------+-----------+--------------+-------------------------+
SHOW INDEX_STATISTICS;
+--------------+------------+-----------------------+-----------+
| Table_schema | Table_name | Index_name | Rows_read |
+--------------+------------+-----------------------+-----------+
| dbt3 | customer | PRIMARY | 183 |
| dbt3 | lineitem | i_l_orderkey_quantity | 1279 |
| dbt3 | orders | i_o_totalprice | 183 |
+--------------+------------+-----------------------+-----------+
Join analysis: PERFORMANCE SCHEMA
[MySQL 5.6, MariaDB 10.0]
● summary tables with read/write statistics
– table_io_waits_summary_by_table
– table_io_waits_summary_by_index_usage
● Superset of the userstat tables
● More overhead
● Not possible to associate statistics with a query
=> truncate stats tables before running a query
● Possible bug
– performance schema not ignored
– Disable by
UPDATE setup_consumers SET ENABLED = 'NO'
where name = 'global_instrumentation';
Analyze joins via PERFORMANCE SCHEMA:
SHOW TABLE_STATISTICS analogue
select object_schema, object_name, count_read, count_write,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_table
where object_schema = 'dbt3' and count_star > 0;
+---------------+-------------+------------+-------------+
| object_schema | object_name | count_read | count_write |
+---------------+-------------+------------+-------------+
| dbt3 | customer | 183 | 0 |
| dbt3 | lineitem | 1462 | 0 |
| dbt3 | orders | 184 | 0 |
+---------------+-------------+------------+-------------+
+----------------+-----------------+
| sum_timer_read | sum_timer_write | ...
+----------------+-----------------+
| 8326528406 | 0 |
| 12117332778 | 0 |
| 7946312812 | 0 |
+----------------+-----------------+
Analyze joins via PERFORMANCE SCHEMA:
SHOW INDEX_STATISTICS analogue
select object_schema, object_name, index_name, count_read,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_index_usage
where object_schema = 'dbt3' and count_star > 0
and index_name is not null;
+---------------+-------------+-----------------------+------------+
| object_schema | object_name | index_name | count_read |
+---------------+-------------+-----------------------+------------+
| dbt3 | customer | PRIMARY | 183 |
| dbt3 | lineitem | i_l_orderkey_quantity | 1462 |
| dbt3 | orders | i_o_totalprice | 184 |
+---------------+-------------+-----------------------+------------+
+----------------+-----------------+
| sum_timer_read | sum_timer_write | ...
+----------------+-----------------+
| 8326528406 | 0 |
| 12117332778 | 0 |
| 7946312812 | 0 |
+----------------+-----------------+
Batched joins
● Optimization for analytical queries
● Analytic queries shovel through lots of data
– e.g. “average size of order in the last month”
– or “pairs of goods purchased together”
● Indexes, etc. won't help when you really need to look at all the data
● More data means a greater chance of being I/O-bound
● Solution: batched joins
Batched Key Access Idea
● Non-BKA join hits data at random
● Caches are not used efficiently
● Prefetching is not useful
● BKA implementation accesses data in order
● Takes advantage of caches and prefetching
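The difference between the two access patterns can be illustrated with a toy simulation. This is a sketch under strong assumptions, not how the server implements BKA: keys map to pages by simple integer division (as if the inner table were stored in key order), `ROWS_PER_PAGE` and all function names are hypothetical, and a "page switch" stands in for a random read.

```python
import random

ROWS_PER_PAGE = 100  # hypothetical: inner-table rows that fit in one page


def page_switches(keys):
    """Count how often consecutive lookups land on a different page."""
    switches, current = 0, None
    for key in keys:
        page = key // ROWS_PER_PAGE
        if page != current:
            switches += 1
            current = page
    return switches


def regular_join(outer_keys):
    """Look up each key as soon as the outer row is read."""
    return page_switches(outer_keys)


def batched_join(outer_keys, buffer_size):
    """Collect up to buffer_size keys, sort them, then do the lookups in key order."""
    total = 0
    for start in range(0, len(outer_keys), buffer_size):
        batch = sorted(outer_keys[start:start + buffer_size])
        total += page_switches(batch)
    return total


rng = random.Random(42)
outer_keys = [rng.randrange(100_000) for _ in range(10_000)]  # keys arrive in random order

print("regular:", regular_join(outer_keys))
for size in (100, 1_000, 10_000):
    print(f"batched, buffer of {size} keys:", batched_join(outer_keys, size))
```

In this model a larger buffer means fewer page switches, and once the buffer holds every key each page is visited exactly once.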
Batched Key Access effect
set join_cache_level=6;
select max(l_extendedprice)
from orders, lineitem
where
l_orderkey=o_orderkey and
o_orderdate between $DATE1 and $DATE2
The benchmark was run with
● Various BKA buffer sizes
● Various sizes of the $DATE1...$DATE2 range
Batched Key Access Performance
[Figure: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, 2 and 3, each with regular and BKA joins. Annotations: "Performance without BKA" and "Performance with BKA, given sufficient buffer size".]
● 4x-10x speedup
● The more the data, the bigger the speedup
● Buffer size setting is very important.
Batched Key Access settings
● Needs to be turned on
set join_buffer_size= 32*1024*1024;
set join_cache_level=6; -- MariaDB
set optimizer_switch='batched_key_access=on'; -- MySQL 5.6
set optimizer_switch='mrr=on';
set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
● Tune join_buffer_size further by watching
– Query performance
– Handler_mrr_init counter
and increasing join_buffer_size until either saturates.
Batched Key Access - conclusions
● Targeted at big joins
● Needs to be enabled manually
● @@join_buffer_size is the most important setting
● MariaDB's implementation is a superset of MySQL's.
ORDER BY, GROUP BY, aggregates
Aggregate functions, no GROUP BY
● COUNT, SUM, AVG, etc. need to examine all rows
– e.g. select SUM(column) from tbl needs to examine the whole tbl.
● MIN and MAX can use index for lookup
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|1 |SIMPLE |NULL |NULL|NULL |NULL|NULL |NULL|NULL|Select tables optimized away|
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
With index (o_orderdate):
select max(o_orderdate) from orders
select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
With index (o_orderpriority, o_orderdate):
select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
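Why MIN and MAX can be answered by a single index lookup is easy to see if the index is pictured as a sorted list. The sketch below uses Python's bisect module on small made-up data; the lists stand in for index (o_orderdate) and index (o_orderpriority, o_orderdate), and the function names are ours.

```python
import bisect
from datetime import date

# Stand-in for index (o_orderdate): values kept in sorted order (made-up data).
index_o_orderdate = [date(1995, 3, 17), date(1995, 5, 1),
                     date(1995, 5, 2), date(1996, 1, 9)]

# Stand-in for index (o_orderpriority, o_orderdate): sorted tuples (made-up data).
index_priority_date = [
    ("1-URGENT", date(1995, 3, 17)),
    ("1-URGENT", date(1995, 5, 2)),
    ("2-HIGH", date(1995, 5, 1)),
    ("2-HIGH", date(1996, 1, 9)),
]


def max_value(index):
    """select max(col): read the last index entry."""
    return index[-1] if index else None


def min_greater_than(index, bound):
    """select min(col) where col > bound: one lookup for the first entry after bound."""
    pos = bisect.bisect_right(index, bound)
    return index[pos] if pos < len(index) else None


def max_for_priority(index, priority):
    """select max(date) where priority = const: the last entry with that prefix."""
    pos = bisect.bisect_right(index, (priority, date.max))
    if pos == 0 or index[pos - 1][0] != priority:
        return None
    return index[pos - 1][1]


print(max_value(index_o_orderdate))
print(min_greater_than(index_o_orderdate, date(1995, 5, 1)))
print(max_for_priority(index_priority_date, "1-URGENT"))
```

No rows are scanned in any of the three cases, which is what "Select tables optimized away" reflects in the EXPLAIN output.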
ORDER BY … LIMIT
Three algorithms
● Use an index to read in order
● Read one table, sort, join - “Using filesort”
● Execute join into temporary table and then sort - “Using temporary; Using filesort”
Using index to read data in order
● No special indication in EXPLAIN output
● LIMIT n: as soon as we read n records, we can stop!
A problem with LIMIT N optimization
`orders` has 1.5 M rows
explain select * from orders order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 | |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
explain select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 |Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
● A problem:
– 1.5M rows, 300K of them 'URGENT'
– Scanning by date, when will we find 10 'URGENT' rows?
– No good solution so far.
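How many rows the scan really reads depends on where the 'URGENT' rows sit in date order, which is why rows=10 in EXPLAIN can be misleading. A rough sketch of the two extremes, using the row counts from the slide and our own variable names:

```python
total_rows = 1_500_000
urgent_rows = 300_000
limit = 10

selectivity = urgent_rows / total_rows

# If URGENT rows are spread evenly over the dates, roughly limit / selectivity
# index entries are read before 10 matches are found.
expected_scan_uniform = limit / selectivity

# Worst case: every non-URGENT row has a later date than every URGENT row,
# so the descending scan reads all of them before the first match.
worst_case_scan = (total_rows - urgent_rows) + limit

print(f"evenly spread: about {expected_scan_uniform:.0f} rows")
print(f"worst case:    {worst_case_scan} rows")
```

The gap between about 50 rows and 1.2 million rows is the risk the optimizer takes when it picks the ordered index scan here.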
Using filesort strategy
● Have to read the entire first table
● For the remaining tables, can apply LIMIT n
● ORDER BY can only use columns of tbl1.
Using temporary; Using filesort
● ORDER BY clause can use columns of any table
● LIMIT is applied only after executing the entire join and sorting.
ORDER BY - conclusions
● Resolving ORDER BY with an index allows very efficient handling of LIMIT
– Optimization for
WHERE unused_condition ORDER BY … LIMIT n
is challenging.
● Use sql_big_result, IGNORE INDEX FOR ORDER BY
● Using filesort
– Needs all ORDER BY columns in the first table
– Takes advantage of LIMIT when doing join to non-first tables
● Using temporary; Using filesort is least efficient.
GROUP BY strategies
There are three strategies
● Ordered index scan
● Loose Index Scan (LooseScan)
● Groups table (Using temporary; [Using filesort]).
Ordered index scan
● Groups are enumerated one after another
● Can compute aggregates on the fly
● Loose index scan is also able to jump to the next group.
Execution of GROUP BY with temptable
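The difference between the ordered index scan and the temporary-table strategy can be sketched in Python. This is an illustration on made-up rows, not the server's implementation: `sorted()` stands in for reading rows through an index, the dictionary stands in for the groups table, and the function names are ours.

```python
from itertools import groupby

# Made-up (o_orderpriority, o_totalprice) rows in arbitrary order.
rows = [("1-URGENT", 10.0), ("2-HIGH", 5.0), ("1-URGENT", 2.5),
        ("3-MEDIUM", 7.0), ("2-HIGH", 1.0)]


def group_by_ordered_scan(rows_in_group_order):
    """Rows arrive group by group; only one running aggregate is needed at a time."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(rows_in_group_order, key=lambda r: r[0])]


def group_by_temp_table(rows_in_any_order):
    """Rows arrive in any order; a lookup table keeps one aggregate per group."""
    groups = {}
    for key, value in rows_in_any_order:
        groups[key] = groups.get(key, 0.0) + value
    return sorted(groups.items())  # the optional "Using filesort" step


# sorted() here plays the role of an index that returns rows in group order.
ordered = group_by_ordered_scan(sorted(rows, key=lambda r: r[0]))
via_temp = group_by_temp_table(rows)

print(ordered)
print(via_temp)
```

Both produce the same result; the ordered scan needs no per-group storage, while the groups table has to hold every group until the input is exhausted.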
Subqueries
Subquery optimizations
● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries”
● Queries that caused most of the pain
– SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins
– SELECT … FROM (SELECT …) - derived tables
● MariaDB 5.3 and MySQL 5.6
– Share a common ancestor, the MySQL 6.0 alpha
– Huge (100x, 1000x) speedups for painful areas
– Other kinds of subqueries received a speedup, too
– MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations
● 5.6 handles some un-handled edge cases, too
Tuning for subqueries
● “Before”: one execution strategy
– No tuning possible
● “After”: similar to joins
– Reasonable execution strategies supported
– Need indexes
– Need selective conditions
– Support batching in most important cases
● Should be better 9x% of the time.
What if it still picks a poor query plan?
For both MariaDB and MySQL:
● Check EXPLAIN [EXTENDED], find a keyword around a subquery table
● Google “site:kb.askmonty.org $subquery_keyword”
or https://kb.askmonty.org/en/subquery-optimizations-map/
● Find which optimization it was
● set optimizer_switch='$subquery_optimization=off'
Thanks!
Q & A

Contenu connexe

Tendances

How to use histograms to get better performance
How to use histograms to get better performanceHow to use histograms to get better performance
How to use histograms to get better performanceMariaDB plc
 
MariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixesMariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixesSergey Petrunya
 
Oracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptOracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptChien Chung Shen
 
Chasing the optimizer
Chasing the optimizerChasing the optimizer
Chasing the optimizerMauro Pagano
 
The MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer TraceThe MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer Traceoysteing
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...Altinity Ltd
 
Using Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query PerformanceUsing Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query Performanceoysteing
 
Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Mydbops
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MYXPLAIN
 
MySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMorgan Tocker
 
Online index rebuild automation
Online index rebuild automationOnline index rebuild automation
Online index rebuild automationCarlos Sierra
 
Memoizeの仕組み(第41回PostgreSQLアンカンファレンス@オンライン 発表資料)
Memoizeの仕組み(第41回PostgreSQLアンカンファレンス@オンライン 発表資料)Memoizeの仕組み(第41回PostgreSQLアンカンファレンス@オンライン 発表資料)
Memoizeの仕組み(第41回PostgreSQLアンカンファレンス@オンライン 発表資料)NTT DATA Technology & Innovation
 
More mastering the art of indexing
More mastering the art of indexingMore mastering the art of indexing
More mastering the art of indexingYoshinori Matsunobu
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Aaron Shilo
 
Oracle Performance Tuning Fundamentals
Oracle Performance Tuning FundamentalsOracle Performance Tuning Fundamentals
Oracle Performance Tuning FundamentalsEnkitec
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Your tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
Your tuning arsenal: AWR, ADDM, ASH, Metrics and AdvisorsYour tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
Your tuning arsenal: AWR, ADDM, ASH, Metrics and AdvisorsJohn Kanagaraj
 
Oracle Database Performance Tuning Concept
Oracle Database Performance Tuning ConceptOracle Database Performance Tuning Concept
Oracle Database Performance Tuning ConceptChien Chung Shen
 

Tendances (20)

How to use histograms to get better performance
How to use histograms to get better performanceHow to use histograms to get better performance
How to use histograms to get better performance
 
MariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixesMariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixes
 
Oracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptOracle Database SQL Tuning Concept
Oracle Database SQL Tuning Concept
 
Chasing the optimizer
Chasing the optimizerChasing the optimizer
Chasing the optimizer
 
The MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer TraceThe MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer Trace
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
 
How to Design Indexes, Really
How to Design Indexes, ReallyHow to Design Indexes, Really
How to Design Indexes, Really
 
Using Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query PerformanceUsing Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query Performance
 
Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6
 
MySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer Guide
 
Online index rebuild automation
Online index rebuild automationOnline index rebuild automation
Online index rebuild automation
 
Memoizeの仕組み(第41回PostgreSQLアンカンファレンス@オンライン 発表資料)
Memoizeの仕組み(第41回PostgreSQLアンカンファレンス@オンライン 発表資料)Memoizeの仕組み(第41回PostgreSQLアンカンファレンス@オンライン 発表資料)
Memoizeの仕組み(第41回PostgreSQLアンカンファレンス@オンライン 発表資料)
 
More mastering the art of indexing
More mastering the art of indexingMore mastering the art of indexing
More mastering the art of indexing
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
 
Oracle Performance Tuning Fundamentals
Oracle Performance Tuning FundamentalsOracle Performance Tuning Fundamentals
Oracle Performance Tuning Fundamentals
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
SQL Tuning 101
SQL Tuning 101SQL Tuning 101
SQL Tuning 101
 
Your tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
Your tuning arsenal: AWR, ADDM, ASH, Metrics and AdvisorsYour tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
Your tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
 
Oracle Database Performance Tuning Concept
Oracle Database Performance Tuning ConceptOracle Database Performance Tuning Concept
Oracle Database Performance Tuning Concept
 

En vedette

Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricksQuery Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricksJaime Crespo
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer OverviewMYXPLAIN
 
MariaDB Optimizer
MariaDB OptimizerMariaDB Optimizer
MariaDB OptimizerJongJin Lee
 
Capturing Network Traffic into Database
Capturing Network Traffic into Database Capturing Network Traffic into Database
Capturing Network Traffic into Database Tigran Tsaturyan
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)Karthik .P.R
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer OverviewOlav Sandstå
 

En vedette (8)

Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricksQuery Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
 
Mysql Optimization
Mysql OptimizationMysql Optimization
Mysql Optimization
 
Cost-Based query optimization
Cost-Based query optimizationCost-Based query optimization
Cost-Based query optimization
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
MariaDB Optimizer
MariaDB OptimizerMariaDB Optimizer
MariaDB Optimizer
 
Capturing Network Traffic into Database
Capturing Network Traffic into Database Capturing Network Traffic into Database
Capturing Network Traffic into Database
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 

Similaire à MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013

Advanced Query Optimizer Tuning and Analysis
Advanced Query Optimizer Tuning and AnalysisAdvanced Query Optimizer Tuning and Analysis
Advanced Query Optimizer Tuning and AnalysisMYXPLAIN
 
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...Sergey Petrunya
 
Need for Speed: MySQL Indexing
Need for Speed: MySQL IndexingNeed for Speed: MySQL Indexing
Need for Speed: MySQL IndexingMYXPLAIN
 
Percona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuningPercona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuningSergey Petrunya
 
Window functions in MySQL 8.0
Window functions in MySQL 8.0Window functions in MySQL 8.0
Window functions in MySQL 8.0Mydbops
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 
Workshop 20140522 BigQuery Implementation
Workshop 20140522   BigQuery ImplementationWorkshop 20140522   BigQuery Implementation
Workshop 20140522 BigQuery ImplementationSimon Su
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query OptimizationAnju Garg
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf TuningHighLoad2009
 
MariaDB 10.0 Query Optimizer
MariaDB 10.0 Query OptimizerMariaDB 10.0 Query Optimizer
MariaDB 10.0 Query OptimizerSergey Petrunya
 
Adapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12cAdapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12cMauro Pagano
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)Valeriy Kravchuk
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 

Similaire à MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013 (20)

Advanced Query Optimizer Tuning and Analysis
Advanced Query Optimizer Tuning and AnalysisAdvanced Query Optimizer Tuning and Analysis
Advanced Query Optimizer Tuning and Analysis
 
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
 
Need for Speed: MySQL Indexing
Need for Speed: MySQL IndexingNeed for Speed: MySQL Indexing
Need for Speed: MySQL Indexing
 
Percona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuningPercona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuning
 
Window functions in MySQL 8.0
Window functions in MySQL 8.0Window functions in MySQL 8.0
Window functions in MySQL 8.0
 
MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013

  • 1. Advanced query optimizer tuning and analysis Sergei Petrunia Timour Katchaounov Monty Program Ab MySQL Conference And Expo 2013
  • 2. 2 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 3. 3 07:48:08 AM Is there a problem with query optimizer? • Database performance is affected by many factors • One of them is the query optimizer • Is my performance problem caused by the optimizer?
  • 4. 4 07:48:08 AM Signs that there is a query optimizer problem • Some (not all) queries are slow • A query seems to run longer than it ought to – And examines more records than it ought to • Usually, the query remains slow regardless of other activity on the server
  • 5. 5 07:48:08 AM Catching slow queries, the old ways ● Watch the Slow query log – Percona Server/MariaDB: --log_slow_verbosity=query_plan # Thread_id: 1 Schema: dbt3sf10 QC_hit: No # Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000 # Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No # Filesort: No Filesort_on_disk: No Merge_passes: 0 SET timestamp=1333385770; select * from customer where c_acctbal < -1000; • Run SHOW PROCESSLIST periodically – Run pt-query-digest on the log
  • 6. 6 07:48:08 AM The new way: SHOW PROCESSLIST + SHOW EXPLAIN • Available in MariaDB 10.0+ • Displays EXPLAIN of a running statement MariaDB> show processlist; +--+----+---------+-------+-------+----+------------+-------------------------... |Id|User|Host |db |Command|Time|State |Info +--+----+---------+-------+-------+----+------------+-------------------------... | 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ... | 2|root|localhost|dbt3sf1|Query | 0|init |show processlist +--+----+---------+-------+-------+----+------------+-------------------------... MariaDB> show explain for 1; +--+-----------+------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where| +--+-----------+------+----+-------------+----+-------+----+-------+-----------+ MariaDB [dbt3sf1]> show warnings; +-----+----+-----------------------------------------------------------------+ |Level|Code|Message | +-----+----+-----------------------------------------------------------------+ |Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995| +-----+----+-----------------------------------------------------------------+
  • 7. 7 07:48:08 AM SHOW EXPLAIN usage ● Intended usage – SHOW PROCESSLIST ... – SHOW EXPLAIN FOR ... ● Why not just run EXPLAIN again – Difficult to replicate setups ● Temporary tables ● Optimizer settings ● Storage engine's index statistics ● ... – No uncertainty about whether you're looking at the same query plan or not.
  • 8. 8 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● use performance_schema ● Many ways to analyze via queries – events_statements_summary_by_digest ● count_star, sum_timer_wait, min_timer_wait, avg_timer_wait, max_timer_wait ● digest_text, digest ● sum_rows_examined, sum_created_tmp_disk_tables, sum_select_full_join – events_statements_history ● sql_text, digest_text, digest ● timer_start, timer_end, timer_wait ● rows_examined, created_tmp_disk_tables, select_full_join
  • 9. 9 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] • Modified Q18 from DBT3 select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > ? and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 10; • App executes Q18 many times with ? = 550000, 500000, 400000, ...
  • 10. 10 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Find candidate slow queries ● Simple tests: select_full_join > 0, created_tmp_disk_tables > 0, etc. ● Complex conditions: max execution time > X sec OR min/max time vary a lot: select max_timer_wait/avg_timer_wait as max_ratio, avg_timer_wait/min_timer_wait as min_ratio from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2\G
  • 11. 11 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] *************************** 5. row *************************** DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE `o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice` DESC , `o_orderdate` LIMIT ? COUNT_STAR: 3 SUM_TIMER_WAIT: 3251758347000 MIN_TIMER_WAIT: 3914209000 → 0.0039 sec AVG_TIMER_WAIT: 1083919449000 MAX_TIMER_WAIT: 3204044053000 → 3.2 sec SUM_LOCK_TIME: 555000000 SUM_ROWS_SENT: 25 SUM_ROWS_EXAMINED: 0 SUM_CREATED_TMP_DISK_TABLES: 0 SUM_CREATED_TMP_TABLES: 3 SUM_SELECT_FULL_JOIN: 0 SUM_SELECT_RANGE: 3 SUM_SELECT_SCAN: 0 SUM_SORT_RANGE: 0 SUM_SORT_ROWS: 25 SUM_SORT_SCAN: 3 SUM_NO_INDEX_USED: 0 SUM_NO_GOOD_INDEX_USED: 0 FIRST_SEEN: 1970-01-01 03:38:27 LAST_SEEN: 1970-01-01 03:38:43 max_ratio: 2.9560 min_ratio: 276.9192 High variance of execution time
  • 12. 12 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Check the actual queries and constants ● The events_statements_history table select timer_wait/1000000000000 as exec_time, sql_text from events_statements_history where digest in (select digest from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2) order by timer_wait;
  • 13. 13 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] +-----------+-----------------------------------------------------------------------------------+ | exec_time | sql_text | +-----------+-----------------------------------------------------------------------------------+ | 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 | | 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 | | 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 | +-----------+-----------------------------------------------------------------------------------+ Observation: orders.o_totalprice > ? is less and less selective
  • 14. 14 07:48:08 AM Actions after finding the slow query • Bad query plan – Rewrite the query – Force a good query plan • Bad optimizer settings – Do tuning • Query is inherently complex – Don't waste time with it – Look for other solutions.
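The candidate test from slides 10 and 11 can be sketched outside the database as follows. This is only an illustration of the arithmetic: the function names `to_seconds` and `is_candidate` are hypothetical, the thresholds mirror the ones in the slide's query, and the three timer values are copied from the digest row on slide 11. Performance Schema timer columns are reported in picoseconds.

```python
# Performance Schema TIMER_WAIT values are in picoseconds (1e-12 sec).
PICOSECONDS_PER_SECOND = 10**12


def to_seconds(picoseconds):
    """Convert a Performance Schema timer value to seconds."""
    return picoseconds / PICOSECONDS_PER_SECOND


def is_candidate(min_wait, avg_wait, max_wait, max_seconds=1.0, ratio_limit=2.0):
    """Hypothetical helper mirroring the WHERE clause on slide 10:
    flag a digest if it was ever slow, or if its timings vary a lot."""
    return (
        to_seconds(max_wait) > max_seconds
        or max_wait / avg_wait > ratio_limit
        or avg_wait / min_wait > ratio_limit
    )


# Values taken from the digest row shown on slide 11.
MIN_TIMER_WAIT = 3914209000
AVG_TIMER_WAIT = 1083919449000
MAX_TIMER_WAIT = 3204044053000

max_ratio = MAX_TIMER_WAIT / AVG_TIMER_WAIT
min_ratio = AVG_TIMER_WAIT / MIN_TIMER_WAIT

print(f"min: {to_seconds(MIN_TIMER_WAIT):.4f} sec, max: {to_seconds(MAX_TIMER_WAIT):.4f} sec")
print(f"max_ratio: {max_ratio:.4f}, min_ratio: {min_ratio:.4f}")
print("candidate:", is_candidate(MIN_TIMER_WAIT, AVG_TIMER_WAIT, MAX_TIMER_WAIT))
```

The large `min_ratio` is what the slide labels "High variance of execution time": the same statement text runs quickly with some constants and slowly with others.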
  • 15. 15 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 16. 16 07:48:08 AM Consider a simple select • 15M rows were scanned, 19 rows in output • Query plan seems inefficient – (note: this logic doesn't directly apply to group/order by queries). select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ 19 rows in set (7.65 sec) ● Check the query plan: ● Run the query:
  • 17. 17 07:48:08 AM Query plan analysis • Entire table is scanned • WHERE condition checked after records are read – Not used to limit #examined rows. +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506'
  • 18. 18 07:48:08 AM Let's add an index • Outcome – Down to reading 300K rows – Still, 300K >> 19 rows. alter table orders add key i_o_orderdate (o_orderdate); select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ 19 rows in set (0.76 sec) ● Query time:
  • 19. 19 07:48:08 AM Finding out which indexes to add ● index (o_orderdate) ● index (o_clerk) Check selectivity of conditions that will use the index select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' select count(*) from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'; 306322 rows select count(*) from orders where o_clerk='Clerk#000009506' 1507 rows.
  • 20. 20 07:48:08 AM Try adding composite indexes ● index (o_clerk, o_orderdate): Bingo! 100% efficiency +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where| +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ ● index (o_orderdate, o_clerk): Much worse! +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where| +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ • If a condition uses multiple columns, a composite index will be most efficient • Order of columns matters – Explaining why is outside the scope of this tutorial; it was covered in last year's tutorial
  • 21. 21 07:48:08 AM Conditions must be in SARGable form • Condition must represent a range • It must have a form that is recognized by the optimizer ✓ o_orderDate BETWEEN '1992-06-01' and '1992-06-30' ✗ year(o_orderDate)=1992 and month(o_orderdate)=6 ✗ TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06') ✓ o_clerk='Clerk#000009506' ✓ o_clerk LIKE 'Clerk#000009506' ✗ o_clerk LIKE '%Clerk#000009506%' ✓ column IN (1,10,15,21, ...) ✓ (col1, col2) IN ( (1,1), (2,2), (3,3), …)
  • 22. 22 07:48:08 AM New in MySQL-5.6: optimizer_trace ● Lets you see the ranges set optimizer_trace=1; explain select * from orders where o_orderDATE between '1992-06-01' and '1992-07-03' and o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04'); select * from information_schema.optimizer_trace\G ● Will print a big JSON struct ● Search for range_scan_alternatives.
  • 23. 23 07:48:08 AM New in MySQL-5.6: optimizer_trace ... "range_scan_alternatives": [ { "index": "i_o_orderdate", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 319082, "cost": 382900, "chosen": true }, { "index": "i_o_date_clerk", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 406336, "cost": 487605, "chosen": false, "cause": "cost" } ], ... ● Considered ranges are shown in range_scan_alternatives section ● This is actually original use case of optimizer_trace ● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug) ● Still, very useful.
  • 24. 24 07:48:08 AM Source of #rows estimates for range select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ ? • “records_in_range” estimate • Done by diving into index • Usually is fairly accurate • Not affected by ANALYZE TABLE.
  • 25. 25 07:48:08 AM Simple selects: conclusions • Efficiency == “#rows_scanned is close to #rows_returned” • Indexes and WHERE conditions reduce #rows scanned • Index estimates are usually accurate • Multi-column indexes – “handle” conditions on multiple columns – Order of columns in the index matters • optimizer_trace lets you view the ranges – But misrepresents ranges over multi-column indexes.
  • 26. 26 07:48:08 AM Now, we will skip some topics. One can also speed up simple selects with ● index_merge access method ● index access method ● Index Condition Pushdown We don't have time for these now; check out last year's tutorial.
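The effect of column order in a composite index (slide 20) can be illustrated with a toy model. This is a minimal sketch, not how the server implements indexes: a sorted Python list stands in for a B-tree, the data is synthetic (20,000 random rows, 50 clerks), and all names are made up for the example. It counts how many index entries fall inside the single contiguous range a scan would have to read for each column order.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta
import random

# Synthetic stand-in for the orders table: (o_clerk, o_orderdate) pairs.
random.seed(0)
first_day = date(1992, 1, 1)
rows = [
    ("Clerk#%09d" % random.randrange(50),
     first_day + timedelta(days=random.randrange(365)))
    for _ in range(20000)
]

clerk = "Clerk#000000007"
lo, hi = date(1992, 6, 6), date(1992, 7, 6)

# Ground truth: rows satisfying both conditions.
matching = sum(1 for c, d in rows if c == clerk and lo <= d <= hi)

# index (o_clerk, o_orderdate): equality on the first column, range on the
# second, so the matching entries form one contiguous slice of the index.
index_clerk_date = sorted(rows)
scanned_clerk_date = (
    bisect_right(index_clerk_date, (clerk, hi))
    - bisect_left(index_clerk_date, (clerk, lo))
)

# index (o_orderdate, o_clerk): the range is on the first column, so the
# scan covers every entry in the date range, whatever its clerk.
index_date_clerk = sorted((d, c) for c, d in rows)
scanned_date_clerk = (
    bisect_left(index_date_clerk, (hi + timedelta(days=1),))
    - bisect_left(index_date_clerk, (lo,))
)

print("matching rows:               ", matching)
print("scanned with (clerk, date):  ", scanned_clerk_date)
print("scanned with (date, clerk):  ", scanned_date_clerk)
```

In this model the (clerk, date) order reads exactly the matching entries, while the (date, clerk) order reads everything in the date range, which is the same pattern as the 19 versus 360354 row estimates on the slide.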
  • 28. 28 07:48:08 AM A simple join select * from customer, orders where c_custkey=o_custkey • “Customers with their orders”
  • 29. Execution: Nested Loops join
    select * from customer, orders where c_custkey=o_custkey

    for each customer C {
      for each order O {
        if (C.c_custkey == O.o_custkey)
          produce record(C, O);
      }
    }

    • Complexity:
      – Scans table customer
      – For each record in customer, scans table orders
    • Is this ok?
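The nested-loops pseudocode can be tried out on toy data. The sketch below is only an illustration in Python: the table contents are made up, and `nested_loops_join` is a hypothetical helper, not server code. It counts how many row combinations get examined when no index is available.

```python
# Toy tables; column names follow the slide, the data is invented.
customer = [{"c_custkey": k, "c_name": f"Customer#{k}"} for k in range(1, 5)]
orders = [
    {"o_orderkey": 100 + i, "o_custkey": k}
    for i, k in enumerate([1, 1, 2, 4, 4, 4])
]


def nested_loops_join(outer, inner):
    """Join without an index: every inner row is checked for every outer row."""
    result = []
    comparisons = 0
    for c in outer:
        for o in inner:
            comparisons += 1
            if c["c_custkey"] == o["o_custkey"]:
                result.append((c, o))
    return result, comparisons


rows, comparisons = nested_loops_join(customer, orders)
print(len(rows), "result rows,", comparisons, "row combinations examined")
```

The number of combinations is always len(customer) * len(orders), no matter how few rows actually match.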
  • 31. Execution: Nested loops join (3)
    select * from customer, orders where c_custkey=o_custkey

    for each customer C {
      for each order O {
        if (C.c_custkey == O.o_custkey)
          produce record(C, O);
      }
    }

    • EXPLAIN:
    +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
    |id|select_type|table   |type|possible_keys|key |key_len|ref |rows   |Extra      |
    +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
    |1 |SIMPLE     |customer|ALL |NULL         |NULL|NULL   |NULL|148749 |           |
    |1 |SIMPLE     |orders  |ALL |NULL         |NULL|NULL   |NULL|1493631|Using where|
    +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
    • rows=148749: rows to read from customer
    • rows=1493631: rows to read from orders
    • Using where: c_custkey=o_custkey
  • 32. Execution: Nested loops join (4)
    select * from customer, orders where c_custkey=o_custkey
    +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
    |id|select_type|table   |type|possible_keys|key |key_len|ref |rows   |Extra      |
    +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
    |1 |SIMPLE     |customer|ALL |NULL         |NULL|NULL   |NULL|148749 |           |
    |1 |SIMPLE     |orders  |ALL |NULL         |NULL|NULL   |NULL|1493631|Using where|
    +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
    • Scan a 1,493,631-row table 148,749 times
      – Consider 1,493,631 * 148,749 row combinations
    • Is this query inherently complex?
      – We know each customer has their own orders
      – size(customer x orders)= size(orders)
      – Lower bound is 1,493,631 + 148,749 + costs to match customer<->order.
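To get a feel for the gap between the plan and the lower bound, the two figures can be computed directly. This is a back-of-the-envelope sketch using the row estimates from the EXPLAIN output; the variable names are illustrative, and the matching cost is left out.

```python
# Row estimates taken from the EXPLAIN output on the slide.
customer_rows = 148_749
orders_rows = 1_493_631

# Nested loops without an index: every (customer, order) pair is considered.
combinations = customer_rows * orders_rows

# Lower bound from the slide: read each table once (matching cost ignored here).
lower_bound = customer_rows + orders_rows

print(f"row combinations: {combinations:,}")
print(f"lower bound:      {lower_bound:,}")
print(f"ratio:            {combinations // lower_bound:,}x")
```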
  • 33. 33 07:48:08 AM Using index for join: ref access alter table orders add index i_o_custkey(o_custkey) select * from customer, orders where c_custkey=o_custkey
  • 34. ref access - analysis
    select * from customer, orders where c_custkey=o_custkey
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |148749|     |
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |     |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
    ● One ref lookup scans 7 rows.
    ● In total: 7 * 148,749=1,041,243 rows
      – `orders` has 1.4M rows
      – no redundant reads from `orders`
    ● The whole query plan
      – Reads all customers
      – Reads 1M orders (of 1.4M)
    ● Efficient!
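The effect of ref access can be mimicked on toy data. In this sketch a Python dict stands in for the i_o_custkey index (a real B-tree index behaves differently in detail); the data and the helpers `build_index` and `ref_join` are hypothetical.

```python
from collections import defaultdict

# Toy tables; column names follow the slide, the data is invented.
customer = [{"c_custkey": k} for k in range(1, 5)]
orders = [
    {"o_orderkey": 100 + i, "o_custkey": k}
    for i, k in enumerate([1, 1, 2, 4, 4, 4])
]


def build_index(rows, column):
    """Map each key value to the rows that carry it (a stand-in for an index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index


def ref_join(outer, inner_index, outer_column):
    """For each outer row, read only the inner rows with a matching key."""
    result = []
    inner_rows_read = 0
    for c in outer:
        matches = inner_index.get(c[outer_column], [])
        inner_rows_read += len(matches)
        result.extend((c, o) for o in matches)
    return result, inner_rows_read


i_o_custkey = build_index(orders, "o_custkey")
rows, inner_rows_read = ref_join(customer, i_o_custkey, "c_custkey")
print(len(rows), "result rows,", inner_rows_read, "order rows read")
```

Every order row that is read ends up in the result, which is the "no redundant reads" property from the slide.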
  • 35. 35 07:48:08 AM Conditions that can be used for ref access ● Can use equalities – tbl.key=other_table.col – tbl.key=const – tbl.key IS NULL ● For multipart keys, will use largest prefix – keypart1=... AND keypart2= … AND keypartK=... .
  • 36. 36 07:48:08 AM Conditions that can't be used for ref access ● Doesn't work for non-equalities t1.key BETWEEN t2.col1 AND t2.col2 ● Doesn't work for OR-ed equalities t1.key=t2.col1 OR t1.key=t2.col2 – Except for ref_or_null t1.key=... OR t1.key IS NULL ● Doesn't “combine” ref and range access – t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col – t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col .
  • 37. Is ref always efficient?
    ● Efficient, if column has many different values (good)
      – Best case – unique index (eq_ref)
    ● A few different values – not useful (bad)
    ● Skewed distribution: depends on which part the join touches (depends)
  • 38. ref access estimates - index statistics
    • How many rows will match tbl.key_column = $value for an arbitrary $value?
    • Index statistics

    show keys from orders where key_name='i_o_custkey'
    *************************** 1. row ***************
    Table: orders
    Non_unique: 1
    Key_name: i_o_custkey
    Seq_in_index: 1
    Column_name: o_custkey
    Collation: A
    Cardinality: 214462
    Sub_part: NULL
    Packed: NULL
    Null: YES
    Index_type: BTREE

    show table status like 'orders'
    *************************** 1. row ****
    Name: orders
    Engine: InnoDB
    Version: 10
    Row_format: Compact
    Rows: 1495152
    Avg_row_length: 133
    Data_length: 199966720
    Max_data_length: 0
    Index_length: 122421248
    Data_free: 6291456
    ...

    average = Rows / Cardinality = 1495152 / 214462 = 6.97.
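The same average can be reproduced outside the server. A minimal sketch, assuming the Rows and Cardinality values shown above; `rows_per_key` is an illustrative helper name, not a server function.

```python
def rows_per_key(table_rows, index_cardinality):
    """Table-wide average number of rows per distinct key value."""
    return table_rows / index_cardinality


# Values from SHOW TABLE STATUS (Rows) and SHOW KEYS (Cardinality) above.
estimate = rows_per_key(1_495_152, 214_462)
print(round(estimate, 2))
```

Rounded to a whole number, this is the rows=7 that EXPLAIN reports for the ref lookup on i_o_custkey.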
  • 39. 39 07:48:08 AM ref access – conclusions ● Based on t.key=... equality conditions ● Can make joins very efficient ● Relies on index statistics for estimates.
  • 40. 40 07:48:08 AM Optimizer statistics ● MySQL/Percona Server – Index statistics – Persistent/transient InnoDB stats ● MariaDB – Index statistics, persistent/transient ● Same as Percona Server (via XtraDB) – Persistent, engine-independent, index-independent statistics.
  • 41. 41 07:48:08 AM Index statistics ● Cardinality allows to calculate a table-wide average #rows-per-key-prefix ● It is a statistical value (inexact) ● Exact collection procedure depends on the storage engine – InnoDB – random sampling – MyISAM – index scan – Engine-independent – index scan.
  • 42. 42 07:48:08 AM Index statistics in MySQL 5.6 ● Sample [8] random index leaf pages ● Table statistics (stored) – rows - estimated number of rows in a table – Other stats not used by optimizer ● Index statistics (stored) – fields - #fields in the index – rows_per_key - rows per 1 key value, per prefix fields ([1 column value], [2 columns value], [3 columns value], …) – Other stats not used by optimizer.
  • 43. Index statistics updates
    ● Statistics updated when:
      – ANALYZE TABLE tbl_name [, tbl_name] …
      – SHOW TABLE STATUS, SHOW INDEX
      – Access to INFORMATION_SCHEMA.[TABLES|STATISTICS]
      – A table is opened for the first time (after server restart)
      – A table has changed >10%
      – When InnoDB Monitor is turned ON.
  • 44. Displaying optimizer statistics
    ● MySQL 5.5, MariaDB 5.3, and older
      – Issue SQL statements to count rows/keys
      – Indirectly, look at EXPLAIN for simple queries
    ● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
      – information_schema.[innodb_index_stats, innodb_table_stats]
      – Read-only, always visible
    ● MySQL 5.6
      – mysql.[innodb_index_stats, innodb_table_stats]
      – User updateable
      – Only available if innodb_analyze_is_persistent=ON
    ● MariaDB 10.0
      – Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
      – User updateable
      – + current XtraDB mechanisms.
  • 45. Plan [in]stability
    ● Statistics may vary a lot (orders)

    MariaDB [dbt3]> select * from information_schema.innodb_index_stats;
    +------------+-----------------+--------------+    +---------------+
    | table_name | index_name      | rows_per_key |    | rows_per_key  | error (actual)
    +------------+-----------------+--------------+    +---------------+
    | partsupp   | PRIMARY         | 3, 1         |    | 4, 1          | 25%
    | partsupp   | i_ps_partkey    | 3, 0         | => | 4, 1          | 25% (4)
    | partsupp   | i_ps_suppkey    | 64, 0        |    | 91, 1         | 30% (80)
    | orders     | i_o_orderdate   | 9597, 1      |    | 1660956, 0    | 99% (6234)
    | orders     | i_o_custkey     | 15, 1        |    | 15, 0         | 0% (15)
    | lineitem   | i_l_receiptdate | 7425, 1, 1   |    | 6665850, 1, 1 | 99.9% (23477)
    +------------+-----------------+--------------+    +---------------+

    MariaDB [dbt3]> select * from information_schema.innodb_table_stats;
    +-----------------+----------+     +----------+
    | table_name      | rows     |     | rows     |
    +-----------------+----------+     +----------+
    | partsupp        | 6524766  |     | 9101065  | 28% (8000000)
    | orders          | 15039855 | ==> | 14948612 | 0.6% (15000000)
    | lineitem        | 60062904 |     | 59992655 | 0.1% (59986052)
    +-----------------+----------+     +----------+
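The error column is not defined on the slide. The percentages shown are consistent with the difference between the two samples divided by the larger one, so the sketch below uses that formula as an assumption; `relative_error` is an illustrative name.

```python
def relative_error(estimate_a, estimate_b):
    """Difference between two sampled estimates, relative to the larger one.

    Assumed formula: the slide does not say how its error column was computed.
    """
    larger = max(estimate_a, estimate_b)
    return abs(estimate_a - estimate_b) / larger if larger else 0.0


# (first sample, second sample) of rows_per_key, taken from the table above.
samples = {
    "partsupp.PRIMARY": (3, 4),
    "orders.i_o_orderdate": (9597, 1660956),
    "orders.i_o_custkey": (15, 15),
}

for name, (first, second) in samples.items():
    print(f"{name}: {relative_error(first, second):.1%}")
```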
  • 46. Controlling statistics (MySQL 5.6)
    ● Persistent and user-updateable InnoDB statistics
      – innodb_analyze_is_persistent = ON,
      – updated manually by ANALYZE TABLE or
      – automatically by innodb_stats_auto_recalc = ON
    ● Control the precision of sampling [default 8]
      – innodb_stats_persistent_sample_pages,
      – innodb_stats_transient_sample_pages
    ● No new statistics compared to older versions.
  • 47. 47 07:48:08 AM Controlling statistics (MariaDB 10.0) Current XtraDB index statistics + ● Engine-independent, persistent, user-updateable statistics ● Precise ● Additional statistics per column (even when there is no index): – min_value, max_value: minimum/maximum value per column – nulls_ratio: fraction of null values in a column – avg_length: average size of values in a column – avg_frequency: average number of rows with the same value.
  • 48. 48 07:48:08 AM Join condition pushdown
  • 52. Join condition pushdown
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    ● Conjunctive (ANDed) conditions are split into parts
    ● Each part is attached as early as possible
      – Either as “Using where”
      – Or as table access method.
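The "attach as early as possible" rule can be sketched in a few lines. This is a simplified illustration, not the optimizer's code: each AND-ed part is described by hand as a (text, tables it references) pair, and `attach_conditions` is a hypothetical helper that places each part at the first table in the join order where all referenced tables are available.

```python
def attach_conditions(join_order, conditions):
    """Attach each AND-ed part to the earliest table where it can be checked."""
    attached = {table: [] for table in join_order}
    for text, tables_used in conditions:
        # A part can be evaluated once the last of its tables has been read.
        position = max(join_order.index(t) for t in tables_used)
        attached[join_order[position]].append(text)
    return attached


# The WHERE clause from the slide, split into its AND-ed parts.
conditions = [
    ("c_custkey=o_custkey", {"customer", "orders"}),
    ("c_acctbal < -500", {"customer"}),
    ("o_orderpriority='1-URGENT'", {"orders"}),
]

plan = attach_conditions(["customer", "orders"], conditions)
for table, parts in plan.items():
    print(table, "<-", " and ".join(parts))
```

With the join order customer, orders, the join equality and the o_orderpriority condition both land on orders, and only c_acctbal < -500 can be checked while reading customer.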
  • 53. Observing join condition pushdown
    EXPLAIN: {
      "query_block": {
        "select_id": 1,
        "nested_loop": [
          {
            "table": {
              "table_name": "orders",
              "access_type": "ALL",
              "possible_keys": [ "i_o_custkey" ],
              "rows": 1499715,
              "filtered": 100,
              "attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
            }
          },
          {
            "table": {
              "table_name": "customer",
              "access_type": "eq_ref",
              "possible_keys": [ "PRIMARY" ],
              "key": "PRIMARY",
              "used_key_parts": [ "c_custkey" ],
              "key_length": "4",
              "ref": [ "dbt3sf1.orders.o_custkey" ],
              "rows": 1,
              "filtered": 100,
              "attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` < <cache>(-(500)))"
            }
    ● Before mysql-5.6: EXPLAIN shows only “Using where”
      – The condition itself only visible in debug trace
    ● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions.
  • 55. Reasoning about join plan efficiency
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    First table, “customer”
    ● type=ALL, 150 K rows
    ● select count(*) from customer where c_acctbal < -500 gives 6804.
    ● alter table customer add index (c_acctbal)
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Now, access to 'customer' is efficient.
  • 56. Reasoning about join plan efficiency
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Second table, “orders”
    ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
    ● ref access uses only c_custkey=o_custkey
    ● What about o_orderpriority='1-URGENT'?
  • 57. Selectivity of o_orderpriority='1-URGENT'
    ● select count(*) from orders: 1.5M rows
    ● select count(*) from orders where o_orderpriority='1-URGENT': 300K rows
    ● 300K / 1.5M = 0.2
  • 58. Reasoning about join plan efficiency
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Second table, “orders”
    ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
    ● ref access uses only c_custkey=o_custkey
    ● What about o_orderpriority='1-URGENT'? Selectivity= 0.2
      – Can examine 7*0.2=1.4 rows, 6802 times if we add an index:
        alter table orders add index (o_custkey, o_orderpriority)
        or
        alter table orders add index (o_orderpriority, o_custkey)
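The saving from the two-column index can be estimated with the numbers already on the slides. A rough sketch: the variable names are illustrative, and the estimate assumes that '1-URGENT' orders are spread evenly across customers.

```python
# Numbers taken from the slides; variable names are illustrative.
outer_rows = 6802                     # customer rows after the c_acctbal range scan
rows_per_ref_lookup = 7               # EXPLAIN estimate for i_o_custkey
selectivity = 300_000 / 1_500_000     # share of orders that are '1-URGENT'

# Current plan: every ref lookup reads all orders of the customer.
rows_read_now = outer_rows * rows_per_ref_lookup

# With an index covering both columns, only the matching orders are read.
rows_read_with_index = rows_read_now * selectivity

print("orders rows read now:       ", rows_read_now)
print("orders rows read with index:", round(rows_read_with_index))
```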
  • 59. Reasoning about join plan efficiency - summary
    Basic* approach to evaluation of join plan efficiency:

    for each table $T in the join order {
      Look at conditions attached to table $T
        (condition must use table $T, may also use previous tables)
      Does access method used with $T make a good use of attached conditions?
    }

    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    * some other details may also affect join performance
  • 61. Attached conditions
    ● Ideally, should be used for table access
    ● Not all conditions can be used [at the same time]
      – Unused ones are still useful
      – They reduce number of scans for subsequent tables
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
  • 62. Informing optimizer about attached conditions
    Currently: a range access that's too expensive to use
    explain extended select * from customer, orders where c_custkey=o_custkey and c_acctbal > 8000 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
    |id|select_type|table   |type|possible_keys    |key        |key_len|ref               |rows  |filtered|Extra      |
    +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY,c_acctbal|NULL       |NULL   |NULL              |150081|   36.22|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey      |i_o_custkey|5      |customer.c_custkey|7     |  100.00|Using where|
    +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
    ● `orders` will be scanned 150081 * 36.22%= 54359 times
    ● This reduces the cost of join
      – Has an effect when comparing potential join plans
    ● => The index on c_acctbal is not used for access, but it still helps the optimizer.
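The filtered column feeds straight into the join estimate. A short sketch of that arithmetic, using the rows and filtered values from the EXPLAIN EXTENDED output; the variable names are illustrative.

```python
# Values from the EXPLAIN EXTENDED output on the slide.
customer_rows = 150081
filtered_percent = 36.22     # share of customer rows expected to pass c_acctbal > 8000
rows_per_ref_lookup = 7      # estimate for one lookup through i_o_custkey

# Only the customer rows that pass the attached condition trigger a lookup in orders.
orders_lookups = int(customer_rows * filtered_percent / 100)
orders_rows_read = orders_lookups * rows_per_ref_lookup

print("lookups into orders:", orders_lookups)
print("orders rows read:   ", orders_rows_read)
```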
  • 63. 63 07:48:08 AM Attached condition selectivity ● Unused indexes provide info about selectivity – Works, but very expensive ● MariaDB 10.0 has engine-independent statistics – Index statistics – Non-indexed Column statistics ● Histograms – Further info: Tomorrow, 2:20 pm @ Ballroom D Igor Babaev Engine-independent persistent statistics with histograms in MariaDB.
  • 64. 64 07:48:08 AM How to check if the query plan matches the reality
  • 65. 65 07:48:08 AM Check if query plan is realistic ● EXPLAIN shows what optimizer expects. It may be wrong – Out-of-date index statistics – Non-uniform data distribution ● Other DBMS: EXPLAIN ANALYZE ● MySQL: no equivalent. Instead, have – Handler counters – “User statistics” (Percona, MariaDB) – PERFORMANCE_SCHEMA
  • 66. 66 07:48:08 AM Join analysis: example query (Q18, DBT3) <reset counters> select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 500000 and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 10; <collect statistics>
  • 67. Join analysis: handler counters (old)
    FLUSH STATUS;
    => RUN QUERY
    SHOW STATUS LIKE "Handler%";
    +----------------------------+-------+
    | Handler_mrr_key_refills    | 0     |
    | Handler_mrr_rowid_refills  | 0     |
    | Handler_read_first         | 0     |
    | Handler_read_key           | 1646  |
    | Handler_read_last          | 0     |
    | Handler_read_next          | 1462  |
    | Handler_read_prev          | 0     |
    | Handler_read_rnd           | 10    |
    | Handler_read_rnd_deleted   | 0     |
    | Handler_read_rnd_next      | 184   |
    | Handler_tmp_update         | 1096  |
    | Handler_tmp_write          | 183   |
    | Handler_update             | 0     |
    | Handler_write              | 0     |
  • 68. Join analysis: USERSTAT by Facebook
    MariaDB, Percona Server
    SET GLOBAL USERSTAT=1;
    FLUSH TABLE_STATISTICS;
    FLUSH INDEX_STATISTICS;
    => RUN QUERY
    SHOW TABLE_STATISTICS;
    +--------------+------------+-----------+--------------+-------------------------+
    | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
    +--------------+------------+-----------+--------------+-------------------------+
    | dbt3         | orders     |       183 |            0 |                       0 |
    | dbt3         | lineitem   |      1279 |            0 |                       0 |
    | dbt3         | customer   |       183 |            0 |                       0 |
    +--------------+------------+-----------+--------------+-------------------------+
    SHOW INDEX_STATISTICS;
    +--------------+------------+-----------------------+-----------+
    | Table_schema | Table_name | Index_name            | Rows_read |
    +--------------+------------+-----------------------+-----------+
    | dbt3         | customer   | PRIMARY               |       183 |
    | dbt3         | lineitem   | i_l_orderkey_quantity |      1279 |
    | dbt3         | orders     | i_o_totalprice        |       183 |
    +--------------+------------+-----------------------+-----------+
  • 69. 69 07:48:08 AM Join analysis: PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● summary tables with read/write statistics – table_io_waits_summary_by_table – table_io_waits_summary_by_index_usage ● Superset of the userstat tables ● More overhead ● Not possible to associate statistics with a query => truncate stats tables before running a query ● Possible bug – performance schema not ignored – Disable by UPDATE setup_consumers SET ENABLED = 'NO' where name = 'global_instrumentation';
  • 70. Analyze joins via PERFORMANCE SCHEMA: SHOW TABLE_STATISTICS analogue
    select object_schema, object_name, count_read, count_write, sum_timer_read, sum_timer_write, ...
    from table_io_waits_summary_by_table
    where object_schema = 'dbt3' and count_star > 0;
    +---------------+-------------+------------+-------------+
    | object_schema | object_name | count_read | count_write |
    +---------------+-------------+------------+-------------+
    | dbt3          | customer    |        183 |           0 |
    | dbt3          | lineitem    |       1462 |           0 |
    | dbt3          | orders      |        184 |           0 |
    +---------------+-------------+------------+-------------+
    +----------------+-----------------+
    | sum_timer_read | sum_timer_write | ...
    +----------------+-----------------+
    |     8326528406 |               0 |
    |    12117332778 |               0 |
    |     7946312812 |               0 |
    +----------------+-----------------+
  • 71. Analyze joins via PERFORMANCE SCHEMA: SHOW INDEX_STATISTICS analogue
    select object_schema, object_name, index_name, count_read, sum_timer_read, sum_timer_write, ...
    from table_io_waits_summary_by_index_usage
    where object_schema = 'dbt3' and count_star > 0 and index_name is not null;
    +---------------+-------------+-----------------------+------------+
    | object_schema | object_name | index_name            | count_read |
    +---------------+-------------+-----------------------+------------+
    | dbt3          | customer    | PRIMARY               |        183 |
    | dbt3          | lineitem    | i_l_orderkey_quantity |       1462 |
    | dbt3          | orders      | i_o_totalprice        |        184 |
    +---------------+-------------+-----------------------+------------+
    +----------------+-----------------+
    | sum_timer_read | sum_timer_write | ...
    +----------------+-----------------+
    |     8326528406 |               0 |
    |    12117332778 |               0 |
    |     7946312812 |               0 |
    +----------------+-----------------+
  • 73. Batched joins
    ● Optimization for analytical queries
    ● Analytic queries shovel through lots of data
      – e.g. “average size of order in the last month”
      – or “pairs of goods purchased together”
    ● Indexes, etc. won't help when you really need to look at all data
    ● More data means greater chance of being I/O-bound
    ● Solution: batched joins
  • 74. 74 07:48:08 AM Batched Key Access Idea
  • 80. 80 07:48:08 AM Batched Key Access Idea ● Non-BKA join hits data at random ● Caches are not used efficiently ● Prefetching is not useful
  • 81. 81 07:48:08 AM Batched Key Access Idea ● BKA implementation accesses data in order ● Takes advantages of caches and prefetching
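The difference between the two access patterns can be shown with a toy model. This is a heavy simplification, not the server's BKA/MRR code: lookup keys are plain integers, "in order" means sorted by key, and a backward jump stands in for a random disk seek. The helper names and the key values are made up.

```python
def lookups_batched(keys, buffer_size):
    """Collect up to buffer_size keys, then do the lookups of each batch in key order."""
    order = []
    for start in range(0, len(keys), buffer_size):
        batch = keys[start:start + buffer_size]
        order.extend(sorted(batch))
    return order


def backward_jumps(order):
    """Count lookups that land before the previous one (a proxy for random seeks)."""
    return sum(1 for prev, cur in zip(order, order[1:]) if cur < prev)


# Keys in the order the outer table produces them.
keys = [42, 7, 99, 3, 58, 21, 77, 15]

print("no batching: ", backward_jumps(keys))
print("buffer of 4: ", backward_jumps(lookups_batched(keys, 4)))
print("buffer of 8: ", backward_jumps(lookups_batched(keys, 8)))
```

A larger buffer leaves fewer backward jumps, which is the intuition behind the buffer size being the setting that matters most.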
  • 82. Batched Key access effect
    set join_cache_level=6;
    select max(l_extendedprice) from orders, lineitem where l_orderkey=o_orderkey and o_orderdate between $DATE1 and $DATE2
    The benchmark was run with
    ● Various BKA buffer sizes
    ● Various sizes of the $DATE1...$DATE2 range
  • 83. Batched Key Access Performance
    [Chart: BKA join performance depending on buffer size. Axes: buffer size, bytes; query time, sec. Series: query_size=1, 2 and 3, each as regular and BKA. Callouts: performance without BKA; performance with BKA, given sufficient buffer size.]
    ● 4x-10x speedup
    ● The more the data, the bigger the speedup
    ● Buffer size setting is very important.
  • 84. Batched Key Access settings
    ● Needs to be turned on
      set join_buffer_size= 32*1024*1024;
      set join_cache_level=6; -- MariaDB
      set optimizer_switch='batched_key_access=on' -- MySQL 5.6
      set optimizer_switch='mrr=on';
      set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
    ● Further join_buffer_size tuning: watch
      – Query performance
      – Handler_mrr_init counter
      and keep increasing join_buffer_size until either saturates.
  • 85. 85 07:48:08 AM Batched Key Access - conclusions ● Targeted at big joins ● Needs to be enabled manually ● @@join_buffer_size is the most important setting ● MariaDB's implementation is a superset of MySQL's.
  • 87. 87 07:48:08 AM ORDER BY GROUP BY aggregates
  • 88. Aggregate functions, no GROUP BY
    ● COUNT, SUM, AVG, etc need to examine all rows
      select SUM(column) from tbl needs to examine the whole tbl.
    ● MIN and MAX can use index for lookup
    +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
    |id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra                       |
    +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
    |1 |SIMPLE     |NULL |NULL|NULL         |NULL|NULL   |NULL|NULL|Select tables optimized away|
    +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
    With index (o_orderdate):
      select max(o_orderdate) from orders
      select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
    With index (o_orderpriority, o_orderdate):
      select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
  • 89. 89 07:48:08 AM ORDER BY … LIMIT Three algorithms ● Use an index to read in order ● Read one table, sort, join - “Using filesort” ● Execute join into temporary table and then sort - “Using temporary; Using filesort”
  • 90. 90 07:48:08 AM Using index to read data in order ● No special indication in EXPLAIN output ● LIMIT n: as soon as we read n records, we can stop!
  • 91. A problem with LIMIT N optimization
    `orders` has 1.5 M rows
    explain select * from orders order by o_orderdate desc limit 10;
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
    |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra|
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
    |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |     |
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
    select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
    |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra      |
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
    |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |Using where|
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
    ● A problem:
      – 1.5M rows, 300K of them 'URGENT'
      – Scanning by date, when will we find 10 'URGENT' rows?
      – No good solution so far.
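How many rows the scan really reads depends on where the 'URGENT' rows sit in date order, which the optimizer does not know. The sketch below only brackets the answer under two assumptions that are not from the slides: an even spread of 'URGENT' rows over dates, and a worst case where they all come last in the scan.

```python
# Figures from the slide.
total_rows = 1_500_000
matching_rows = 300_000       # rows with o_orderpriority='1-URGENT'
limit = 10

selectivity = matching_rows / total_rows

# Assumption 1: 'URGENT' rows are spread evenly over o_orderdate.
expected_if_uniform = limit / selectivity

# Assumption 2 (worst case): every non-matching row is scanned first.
worst_case = (total_rows - matching_rows) + limit

print("EXPLAIN estimate:      ", limit)
print("evenly spread:         ", round(expected_if_uniform))
print("worst case:            ", worst_case)
```

The EXPLAIN estimate of 10 rows ignores the attached condition entirely, so even the friendly case reads several times more rows than shown.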
  • 92. 92 07:48:08 AM Using filesort strategy ● Have to read the entire first table ● For remaining, can apply LIMIT n ● ORDER BY can only use columns of tbl1.
  • 93. 93 07:48:08 AM Using temporary; Using filesort ● ORDER BY clause can use columns of any table ● LIMIT is applied only after executing the entire join and sorting.
  • 94. ORDER BY - conclusions
    ● Resolving ORDER BY with an index allows very efficient handling of LIMIT
      – Optimization for WHERE unused_condition ORDER BY … LIMIT n is challenging.
        ● Use sql_big_result, IGNORE INDEX FOR ORDER BY
    ● Using filesort
      – Needs all ORDER BY columns in the first table
      – Takes advantage of LIMIT when joining to non-first tables
    ● Using temporary; Using filesort is least efficient.
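For the `WHERE … ORDER BY … LIMIT n` case, the index hint named on this slide might be applied as follows. This is a sketch, assuming the index is called `i_o_orderdate` as in the earlier EXPLAIN output; whether sorting beats the ordered scan depends on how the matching rows are distributed.

```sql
-- Ask the optimizer not to use i_o_orderdate for resolving ORDER BY,
-- so it reads the matching rows and sorts them instead of scanning by date.
EXPLAIN
SELECT * FROM orders IGNORE INDEX FOR ORDER BY (i_o_orderdate)
WHERE o_orderpriority = '1-URGENT'
ORDER BY o_orderdate DESC
LIMIT 10;
```

The hint only rules the index out for ordering; the optimizer may still consider it for other purposes, such as a range condition on `o_orderdate`.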
  • 95. GROUP BY strategies
    There are three strategies
    ● Ordered index scan
    ● Loose Index Scan (LooseScan)
    ● Groups table (Using temporary; [Using filesort]).
  • 96. Ordered index scan
    ● Groups are enumerated one after another
    ● Can compute aggregates on the fly
    ● Loose index scan is also able to jump to next group.
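A loose index scan can usually be recognised in EXPLAIN by "Using index for group-by" in the Extra column. The sketch below assumes the DBT-3 `orders` table and a hypothetical two-column index; the name `i_o_priority_date` is made up for illustration.

```sql
-- Hypothetical index: grouping column first, aggregated column second.
-- Skip this statement if an equivalent index already exists.
CREATE INDEX i_o_priority_date ON orders (o_orderpriority, o_orderdate);

-- MIN/MAX per group: the server can jump from group to group in the index
-- and read only the first or last entry of each (loose index scan).
EXPLAIN
SELECT o_orderpriority, MAX(o_orderdate)
FROM orders
GROUP BY o_orderpriority;

-- COUNT needs every entry of each group, so the index is read in full,
-- in group order, with aggregates computed on the fly (ordered index scan).
EXPLAIN
SELECT o_orderpriority, COUNT(*)
FROM orders
GROUP BY o_orderpriority;
```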
  • 97. Execution of GROUP BY with temptable
  • 99. Subquery optimizations
    ● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries”
    ● Queries that caused most of the pain
      – SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins
      – SELECT … FROM (SELECT …) - derived tables
    ● MariaDB 5.3 and MySQL 5.6
      – Share a common ancestor, the MySQL 6.0 alpha
      – Huge (100x, 1000x) speedups for painful areas
      – Other kinds of subqueries received a speedup, too
      – MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations
        ● 5.6 handles some un-handled edge cases, too
  • 100. Tuning for subqueries
    ● “Before”: one execution strategy
      – No tuning possible
    ● “After”: similar to joins
      – Reasonable execution strategies supported
      – Need indexes
      – Need selective conditions
      – Support batching in most important cases
    ● Should be better 9x% of the time.
  • 101. What if it still picks a poor query plan?
    For both MariaDB and MySQL:
    ● Check EXPLAIN [EXTENDED], find a keyword around a subquery table
    ● Google “site:kb.askmonty.org $subquery_keyword” or https://kb.askmonty.org/en/subquery-optimizations-map/
    ● Find which optimization it was
    ● set optimizer_switch='$subquery_optimization=off'
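The steps on this slide might look like the following in practice. This is a sketch, assuming the DBT-3 `customer` and `orders` tables and an arbitrary price threshold; `materialization` is used only as an example of an optimizer_switch flag, and the flag that matters for a given query is the one found in the second step of the slide.

```sql
-- 1. Look at the plan and note the keyword shown for the subquery's table
--    (for example MATERIALIZED, or FirstMatch in the Extra column).
EXPLAIN EXTENDED
SELECT * FROM customer
WHERE c_custkey IN (SELECT o_custkey FROM orders WHERE o_totalprice > 500000);
SHOW WARNINGS;  -- shows the query as rewritten by the optimizer

-- 2. See which optimizations are currently enabled.
SELECT @@optimizer_switch;

-- 3. Turn one off for this session only, then re-run the EXPLAIN above.
SET SESSION optimizer_switch = 'materialization=off';

-- 4. Put that flag back to the server default afterwards.
SET SESSION optimizer_switch = 'materialization=default';
```

Changing the switch at session level keeps the experiment from affecting other connections.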