MariaDB 10.3 Optimizer - where does it stand
1. Santa Clara, California | April 23rd – 25th, 2018
Sergey Petrunia, MariaDB Project
Vicențiu Ciorbaru, MariaDB Foundation
MariaDB Optimizer in 10.3, where does it stand?
2. 2
Agenda
● New releases of MySQL and MariaDB
– MariaDB 10.2 and 10.3
– MySQL 8.0
● Optimizer related features
– Histograms
– Derived table optimizations
– Non-recursive CTEs
– Window Functions
● Let’s look and compare
– Also look at PostgreSQL and SQL Server
7. 7
Condition Selectivity
Query optimizer needs to decide on a plan to execute the query
Goal is to get the shortest running time
• Choose access method
- Index Access, Hash Join, BKA, etc.
• Choose correct join order to minimize the cost of reading rows
- Usually, minimizing rows read minimizes execution time
- Sometimes reading more rows is advantageous, if table / index is all in memory
Use a cost model to estimate how long an execution plan would take
For each condition in the WHERE (and HAVING) clause we compute
• Condition selectivity
- What fraction of the table's rows will this condition accept: 10%, 20%, 90%?
Getting the estimates right is important!
11. 11
Condition Selectivity
Suppose we have a query with 10 tables: T1, T2, T3, … T10
Query optimizer will:
• Estimate the number of rows that it will read from each table
• Based on the conditions in the where (and having) clauses
Assume estimates have an average error coefficient e
• Total number of estimated rows read is:
- (e * #T1) * (e * #T2) * (e * #T3) * … * (e * #T10)
• Where #T1..#T10 is the actual number of rows read for each table
The estimation error is amplified the more tables there are in a join
• If we under- or overestimate by a factor of 2, the final error factor is 2^10 = 1024!
• If the error is only 1.5 (off by 50%), the final error factor is 1.5^10 ≈ 58 (roughly 60)
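The amplification can be written out explicitly. A short derivation, assuming (as the slide does) the same error factor e for each of the n = 10 tables; N_i for the true row count of table T_i is notation introduced here:

```latex
\[
\prod_{i=1}^{n} \left(e \cdot N_i\right) \;=\; e^{n} \prod_{i=1}^{n} N_i ,
\qquad
e = 2,\ n = 10 \;\Rightarrow\; e^{n} = 1024,
\qquad
e = 1.5,\ n = 10 \;\Rightarrow\; e^{n} \approx 57.7
\]
```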
12. 12
Condition Selectivity
How does optimizer produce estimates?
• Condition analysis:
- Is it possible to satisfy conditions? t1.a > 10 and t1.a < 5
- Equality condition on a unique column?
• Index dives to get number of rows in a range
• Guesstimates (MySQL)
• Histograms for non-indexed columns
18. 18
Histograms
Histograms approximate the distribution of values in a column
Multiple types of histograms
• Equi-Width Histograms
- Information is not uniform across buckets
- Some buckets hold many values (5 in the example)
- Other buckets hold few values (1 in the example)
• Equi-Height Histograms
- All bins have the same number of values
- More bins where there are more values
• Most Common Values Histograms
- Useful for ENUM columns
- One bin per value
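The three bucket layouts can be illustrated with plain SQL against the `city` table of the `world` sample database used later in this talk. This only sketches what each histogram type records, not how any server builds its histograms. The bucket width of 1,000,000, the bucket count of 4 and the `countrycode` column are assumptions made for illustration, and the window function needs MariaDB 10.2+, MySQL 8.0 or PostgreSQL.

```sql
-- Equi-width: fixed value ranges, so the row count per bucket varies widely.
select floor(population / 1000000) as bucket,
       count(*) as rows_in_bucket
from city
group by floor(population / 1000000)
order by bucket;

-- Equi-height: every bucket holds (almost) the same number of rows,
-- so bucket boundaries are dense where the values are dense.
select bucket,
       min(population) as lower_bound,
       max(population) as upper_bound,
       count(*) as rows_in_bucket
from (select population,
             ntile(4) over (order by population) as bucket
      from city) as t
group by bucket
order by bucket;

-- Most common values: one bucket per distinct value.
-- countrycode is assumed to exist, as in the standard world schema.
select countrycode, count(*) as rows_with_value
from city
group by countrycode
order by rows_with_value desc;
```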
19. 19
Histograms in MariaDB
MariaDB histograms are collected by doing a full table scan
• Needs to be done manually using ANALYZE TABLE … PERSISTENT
Stored inside
• mysql.table_stats, mysql.column_stats, mysql.index_stats
• As a binary value (max 255 bytes), single or double precision (SINGLE_PREC_HB / DOUBLE_PREC_HB)
• Decoded with a special function, DECODE_HISTOGRAM()
Can be manually updated
• One can run data collection on a slave, then propagate results
Not enabled by default, needs a few switches turned on to work
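For reference, the switches and the collection step look roughly like this in MariaDB 10.3. This is a sketch, not a tuning recommendation: the variable names are MariaDB system variables, while the values (254 bytes, selectivity level 4) and the `city` table are assumptions chosen for illustration.

```sql
-- Session-level switches; without them histograms are neither collected nor used.
set session histogram_size = 254;                     -- bytes per histogram, 0 disables collection
set session histogram_type = 'DOUBLE_PREC_HB';        -- or 'SINGLE_PREC_HB'
set session use_stat_tables = 'preferably';           -- let the optimizer read the mysql.*_stats tables
set session optimizer_use_condition_selectivity = 4;  -- level 4 takes histograms into account

-- Full table scan that fills mysql.table_stats, column_stats and index_stats.
analyze table city persistent for all;

-- Inspect what was stored for the table.
select column_name, min_value, max_value, hist_size, hist_type,
       decode_histogram(hist_type, histogram) as buckets
from mysql.column_stats
where table_name = 'city';
```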
20. 20
Histograms in MySQL
MySQL histograms are collected by doing a full table scan
• Needs to be done manually using ANALYZE TABLE … UPDATE HISTOGRAM
• Can collect all data or perform sampling by skipping rows, based on the maximum memory allocation
Stored inside data dictionary
• Can be viewed through INFORMATION_SCHEMA.column_statistics
• Stored as Singleton (one bucket per value) or Equi-Height
• Visible as JSON
Cannot be manually updated
• No obvious easy way to share statistics
Enabled by default, will be used when available
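The MySQL 8.0 counterpart, again as a sketch: the `city` table, the `population` column and the bucket count of 100 are assumptions for illustration.

```sql
-- Build (or rebuild) a histogram with at most 100 buckets on one column.
analyze table city update histogram on population with 100 buckets;

-- The histogram is exposed as a JSON document.
select schema_name, table_name, column_name, histogram
from information_schema.column_statistics
where table_name = 'city';

-- Remove it again.
analyze table city drop histogram on population;
```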
21. 21
Histograms in PostgreSQL
PostgreSQL histograms are collected from a true random sample of rows
• Can be collected manually with ANALYZE
• Also collected automatically by the autovacuum daemon (or with VACUUM ANALYZE)
Stores equal-height and most common values at the same time
• Equal-height histogram doesn’t cover the values already listed as MCV
Can be manually updated
• One could import histograms from slave instances
• VACUUM auto-collection seems to cover the use case
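In PostgreSQL both structures can be seen side by side in the `pg_stats` view. A minimal sketch, assuming the `city` table from the `world` sample database has been loaded:

```sql
-- Sample the table and refresh its statistics.
analyze city;

-- Most common values with their frequencies, plus the equal-height
-- bucket boundaries for the remaining values.
select most_common_vals, most_common_freqs, histogram_bounds
from pg_stats
where tablename = 'city' and attname = 'population';
```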
22. 22
Using Histograms
Histograms are useful for range conditions
• Equi-width or equi-height:
- COLUMN > constant
• Most Common Values (Singleton):
- COLUMN = constant
Problematic when multiple columns are involved:
• t1.COL1 > 100 AND t1.COL2 > 1000
Most optimizers assume column values are independent
• P(A ∩ B) = P(A) * P(B) vs P(A ∩ B) = P(A) * P(B | A)
PostgreSQL 10 has added support for multi-variable distributions.
MySQL assumes independent values.
MariaDB doesn’t handle multi-variable case well either.
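In PostgreSQL 10 the multi-column support is exposed through CREATE STATISTICS. A minimal sketch for a two-column table `t1` with columns `a` and `b`; the statistics name `t1_a_b_dep` is made up here, and functional-dependency statistics in that release are only applied to equality conditions against constants:

```sql
-- Record how strongly column b is determined by column a.
create statistics t1_a_b_dep (dependencies) on a, b from t1;

-- Extended statistics are gathered by the next ANALYZE.
analyze t1;

-- With the dependency known, the row estimate should no longer be
-- derived as P(a = 5) * P(b = 5).
explain select * from t1 where a = 5 and b = 5;
```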
23. 23
Using Histograms
Sample database world:
select city.name
from city
where (city.population > 10000000 or
       city.population < 10000)
                          MariaDB   MySQL   PostgreSQL
Estimated Rows Filtered   1.95%     1.09%   1.05%
Actual Rows Filtered      1.05%
24. 24
Using Histograms
Table with 2 columns A and B
• t1.A always equals t1.B
• 10 distinct values, each value occurs with 10% probability
select t1.A, t1.B
from t1
where t1.A = t1.B and t1.A = 5
                          MariaDB   MySQL   PostgreSQL
Estimated Rows Filtered   1.03%     1%      10%
Actual Rows Filtered      10%
25. 25
Conclusions
MariaDB
• Slightly less precise than MySQL, but smaller in size
• Same problem with correlated data as MySQL
• Performs full-table-scan, no sampling support
• Easy to share between instances
MySQL
• Histograms provide good estimates for real world data
• Poor performance with highly correlated data
• Performs full-table-scan, supports sampling
PostgreSQL
• Estimates on par with MySQL and MariaDB
• Support for multi-variable distributions!
• True sampling
27. 27
A set of related optimizations
Some are new, some are old:
● Derived table merge
● Condition pushdown
– Condition pushdown through window functions
● GROUP BY splitting
28. 28
Background – derived table merge
● “VIP customers and their big orders from October”
select *
from
  vip_customer,
  (select *
   from orders
   where order_date BETWEEN '2017-10-01' and '2017-10-31'
  ) as OCT_ORDERS
where
  OCT_ORDERS.amount > 1000000 and
  OCT_ORDERS.customer_id = vip_customer.customer_id
29. 29
Naive execution
[Diagram: 1 – compute OCT_ORDERS from orders; 2 – join OCT_ORDERS with vip_customer, keeping rows with amount > 1M]
30. 30
Derived table merge
The derived table is merged into the parent query:
select *
from
  vip_customer,
  (select *
   from orders
   where order_date BETWEEN '2017-10-01' and '2017-10-31'
  ) as OCT_ORDERS
where
  OCT_ORDERS.amount > 1000000 and
  OCT_ORDERS.customer_id = vip_customer.customer_id
becomes
select *
from
  vip_customer,
  orders
where
  order_date BETWEEN '2017-10-01' and '2017-10-31' and
  orders.amount > 1000000 and
  orders.customer_id = vip_customer.customer_id
31. 31
Execution after merge
[Diagram: vip_customer is joined directly with orders; the conditions “made in October” and amount > 1M are applied to orders]
● Allows the optimizer to join vip_customer→orders or orders→vip_customer
● More join orders to choose from is good for optimization
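One way to observe the difference is to compare EXPLAIN output with merging switched off. A sketch assuming the `vip_customer` and `orders` tables from the slides exist; `derived_merge` is an `optimizer_switch` flag in MariaDB and in MySQL 5.7 and later.

```sql
-- Force the naive plan: OCT_ORDERS is materialized before the join.
set optimizer_switch = 'derived_merge=off';
explain
select *
from vip_customer,
     (select * from orders
      where order_date BETWEEN '2017-10-01' and '2017-10-31') as OCT_ORDERS
where OCT_ORDERS.amount > 1000000 and
      OCT_ORDERS.customer_id = vip_customer.customer_id;

-- Back to the default: the derived table is merged into the parent join.
set optimizer_switch = 'derived_merge=on';
explain
select *
from vip_customer,
     (select * from orders
      where order_date BETWEEN '2017-10-01' and '2017-10-31') as OCT_ORDERS
where OCT_ORDERS.amount > 1000000 and
      OCT_ORDERS.customer_id = vip_customer.customer_id;
```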
32. 32
What if the subquery has a GROUP BY ?
● Merging is only possible when the “final” operation of the subquery is a join
● Can’t merge if it’s a GROUP BY/DISTINCT/ORDER BY LIMIT/etc
create view OCT_TOTALS as
select
  customer_id,
  SUM(amount) as TOTAL_AMT
from orders
where
  order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
  customer_id;
select * from OCT_TOTALS where customer_id=1;
33. 33
Execution is inefficient
select * from OCT_TOTALS where customer_id=1
[Diagram: 1 – compute the Sum over orders for all customers into OCT_TOTALS; 2 – pick the row with customer_id=1]
34. 34
Condition pushdown optimization
select *
from OCT_TOTALS
where customer_id=1
● Can push down conditions on GROUP BY columns
● … to filter out rows that go into groups we don’t care about
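Written out by hand, the pushed-down form of this query looks as follows. This sketches the effect of the optimization, assuming the OCT_TOTALS view body shown on the earlier slides; the optimizer applies the rewrite internally and the user does not have to.

```sql
select
  customer_id,
  SUM(amount) as TOTAL_AMT
from orders
where
  order_date BETWEEN '2017-10-01' and '2017-10-31' and
  customer_id = 1            -- condition pushed below the GROUP BY
group by
  customer_id;
```

The rewrite is safe because customer_id is a GROUP BY column: every row of a group has the same value, so filtering rows before grouping removes whole groups and never changes the SUM of the group that remains.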
35. 35
Condition pushdown
select *
from OCT_TOTALS
where customer_id=1
[Diagram: 1 – find the rows with customer_id=1 in orders, then Sum only those rows into OCT_TOTALS]
● Looking only at groups you’re interested in is much more efficient
– Pushing into HAVING clause is useful, too.
36. 36
Pushdown for inferred conditions (in MariaDB)
select
  customer.customer_name,
  TOTAL_AMT
from
  customer, OCT_TOTALS
where
  customer.customer_id=OCT_TOTALS.customer_id and
  customer.customer_id=1
● From these two conditions the optimizer infers OCT_TOTALS.customer_id=1 and pushes it into the view
37. 37
Condition Pushdown through Window Functions
● “Customer’s biggest orders”
create view top_three_orders as
select *
from
  (
    select
      customer_id,
      amount,
      rank() over (partition by customer_id
                   order by amount desc
                  ) as order_rank
    from orders
  ) as ordered_orders
where order_rank<=3
select * from top_three_orders where customer_id=1
Contents of top_three_orders (all customers):
+-------------+--------+------------+
| customer_id | amount | order_rank |
+-------------+--------+------------+
|           1 |  10000 |          1 |
|           1 |   9500 |          2 |
|           1 |    400 |          3 |
|           2 |   3200 |          1 |
|           2 |   1000 |          2 |
|           2 |    400 |          3 |
...
38. 38
Condition pushdown through Window Functions
select * from top_three_orders where customer_id=1
Without condition pushdown
● Compute top_three_orders for all customers
● Select rows with customer_id=1
With condition pushdown
● Only compute top_three_orders for customer_id=1
– This is much faster
– Can take advantage of index on customer_id
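Pushing the condition through the window function amounts to evaluating the query below. It is a hand-written sketch of the rewritten form; the rank limit of `<= 3` is assumed from the view's name. The rewrite is valid because customer_id is the PARTITION BY column, so removing other customers' rows cannot change any rank inside customer 1's partition.

```sql
select *
from
  (
    select
      customer_id,
      amount,
      rank() over (partition by customer_id
                   order by amount desc
                  ) as order_rank
    from orders
    where customer_id = 1    -- pushed below the window function
  ) as ordered_orders
where order_rank <= 3;
```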
39. 39
Summary so far
● Derived table merge
– Available since MySQL/MariaDB 5.1 and in most other databases
● Condition pushdown
– Available in PostgreSQL, MariaDB 10.2
– Not available in MySQL 5.7 or 8.0
– Limitations:
● MariaDB doesn’t push from HAVING into WHERE (MDEV-7486)
● PostgreSQL doesn’t push inferred conditions
● Condition pushdown through window functions
– Available in PostgreSQL, MariaDB 10.3
41. 41
Split grouping use case
select *
from
  customer, OCT_TOTALS
where
  customer.customer_id=OCT_TOTALS.customer_id and
  customer.customer_name IN ('Customer 1', 'Customer 2')
● Compute a table of groups (OCT_TOTALS)
● Join the groups to another table (customer)
● The other table has a selective restriction (only need two customers)
● But condition pushdown can’t be used: the restriction is on customer_name, which is not a column of OCT_TOTALS
42. 42
Execution, the old way
[Diagram: Sum over orders builds OCT_TOTALS rows for Customer 1, Customer 2, Customer 3, … Customer 100; the join with customer then keeps only Customer 1 and Customer 2]
● Inefficient: OCT_TOTALS is computed for *all* customers.
45. 45
Split grouping execution
[Diagram: for each selected customer row, only that customer's orders rows are read and summed]
● Similar to “LATERAL DERIVED”
● Pick Customer 1, compute that customer's part of the OCT_TOTALS table
● Pick Customer 2, compute that customer's part of the OCT_TOTALS table
● ...
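Conceptually, this execution strategy corresponds to a lateral join. The sketch below uses standard LATERAL syntax as accepted by PostgreSQL, purely for illustration; whether a given server accepts this syntax depends on product and version, and the split grouping optimization applies the idea internally to the ordinary query without the user writing LATERAL.

```sql
select *
from
  customer,
  lateral (
    select
      orders.customer_id,
      SUM(amount) as TOTAL_AMT
    from orders
    where
      order_date BETWEEN '2017-10-01' and '2017-10-31' and
      orders.customer_id = customer.customer_id   -- one group per outer row
    group by
      orders.customer_id
  ) as OCT_TOTALS
where
  customer.customer_name IN ('Customer 1', 'Customer 2');
```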
46. 46
Split Grouping prerequisites
[Diagram: rows of customer drive lookups into orders; each lookup sums one customer's group of OCT_TOTALS]
● There is a join condition that “selects” one GROUP BY group:
– OCT_TOTALS.customer_id=customer.customer_id
● The join order allows making “lookups” in the grouped temp table
– customer→OCT_TOTALS
● There is an index that allows reading only one GROUP BY group
– INDEX(orders.customer_id)
47. 47
Split grouping execution
● Available since MariaDB 10.3
● The optimizer makes a criteria- and cost-based choice whether to use the optimization
● EXPLAIN shows “LATERAL DERIVED”
● @@optimizer_switch flag: split_materialization (ON by default)
select *
from
  customer, OCT_TOTALS
where
  customer.customer_id=OCT_TOTALS.customer_id and
  customer.customer_name IN ('Customer 1', 'Customer 2')
create view OCT_TOTALS as
select
  customer_id,
  SUM(amount) as TOTAL_AMT
from orders
where
  order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
  customer_id
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
| 1 | PRIMARY | customer | ALL | PRIMARY | NULL | NULL | NULL | 1000 | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | customer.customer_id | 36 | |
| 2 | LATERAL DERIVED | orders | ref | customer_id | customer_id | 4 | customer.customer_id | 365 | Using where |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
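To compare plans with and without the optimization, the flag named above can be toggled per session. A sketch assuming the `customer` table and the OCT_TOTALS view from the slides exist; the actual EXPLAIN output depends on data and indexes.

```sql
-- Default in MariaDB 10.3: split grouping is considered.
set optimizer_switch = 'split_materialization=on';
explain
select *
from customer, OCT_TOTALS
where customer.customer_id = OCT_TOTALS.customer_id and
      customer.customer_name IN ('Customer 1', 'Customer 2');

-- Disabled: OCT_TOTALS is materialized for all customers before the join.
set optimizer_switch = 'split_materialization=off';
explain
select *
from customer, OCT_TOTALS
where customer.customer_id = OCT_TOTALS.customer_id and
      customer.customer_name IN ('Customer 1', 'Customer 2');
```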
48. 48
Summary so far
● Derived table merge
– Available since MySQL/MariaDB 5.1 and in most other databases
● Condition pushdown
– Available in PostgreSQL, MariaDB 10.2
– Not available in MySQL 5.7 or 8.0
● Condition pushdown through window functions
– Available in PostgreSQL, MariaDB 10.3
● Split grouping optimization
– MariaDB 10.3 only
50. 50
CTE syntax
with engineers as (          -- WITH keyword, then the CTE name
  select *                   -- CTE body
  from employees
  where dept='Engineering'
)
select *                     -- CTE usage
from engineers
where ...
Similar to DERIVED tables
“Query-local VIEWs”
51. 51
CTEs are like derived tables
Derived table:
select *
from
  (
    select *
    from employees
    where
      dept='Engineering'
  ) as engineers
where
  ...
CTE:
with engineers as (
  select *
  from employees
  where dept='Engineering'
)
select *
from engineers
where
  ...
52. 52
Use case #1: CTEs refer to CTEs
with engineers as (
  select * from employees
  where dept in ('Development','Support')
),
eu_engineers as (
  select * from engineers where country IN ('NL',...)
)
select
  ...
from
  eu_engineers;
More readable than nested FROM(SELECT …)
53. 53
Use case #2: Multiple uses of CTE
Anti-self-join
with engineers as (
  select * from employees
  where dept in ('Development','Support')
)
select *
from
  engineers E1
where not exists (select 1
                  from engineers E2
                  where E2.country=E1.country
                  and E2.name <> E1.name);
54. 54
Use case #2: example 2
Year-over-year comparisons
with sales_product_year as (
  select
    product,
    year(ship_date) as year,
    sum(price) as total_amt
  from
    item_sales
  group by
    product, year
)
select *
from
  sales_product_year CUR,
  sales_product_year PREV
where
  CUR.product=PREV.product and
  CUR.year=PREV.year + 1 and
  CUR.total_amt > PREV.total_amt
55. 55
Optimizations for non-recursive CTEs
1. The same set as for derived tables
– Merge
– Condition pushdown
● through window functions
– Lateral derived
2. Compute CTE once if it is used multiple times
56. 56
CTE Optimizations
                Merge   Condition pushdown   Lateral derived   CTE reuse
MariaDB 10.3    ✔       ✔                    ✔                 ✘
MS SQL Server   ✔       ✔                    ?                 ✘
PostgreSQL      ✘       ✘                    ✘                 ✔
MySQL 8.0       ✔       ✘                    ✘                 ✔
Merge and Condition Pushdown are the most important
MariaDB supports them, like MS SQL.
PostgreSQL’s approach is *weird*
“CTEs are optimization barriers”
MySQL 8.0: “try merging, otherwise reuse”
58. 58
Window functions optimizations
● Window functions introduced in
– MariaDB 10.2
– MySQL 8.0
● Optimizations for window functions
– Condition pushdown
– Reduce the number of sorting passes
– Streamed computation
– ORDER BY-like optimizations
59. 59
Reduce the number of sorting passes
select
  rank() over (order by col1),
  ntile(4) over (order by col2),
  rank() over (order by ...)
from
  tbl1
  join tbl2 on ...
[Diagram: the tables are joined, then the join result is sorted before each window function is computed]
● Each window function requires a sort
● Identical PARTITION/ORDER BY must share the sort step
● Compatible ones may share the sort step
● Supported by all: MariaDB, MySQL 8, PostgreSQL, ...
60. 60
Streamed computation
win_func(...)
  over (partition by ...
        order by ...
        rows between N1 preceding
        and N2 following)
● Window function is computed from rows in the window frame
– O(n_rows * frame_size)
● Frame moves down with the current row
● For most functions, one can update the value after the frame has moved – this is streamed computation
– SUM, COUNT, AVG
● For some, this doesn’t hold (e.g. MAX)
[Diagram: the frame slides down with cur_row; old_val is updated to new_val]
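For SUM the update step can be written down directly. A short derivation for a frame of N_1 preceding and N_2 following rows; the symbols x_k (the value in row k of the partition) and S_i (the frame sum for current row i) are notation introduced here:

```latex
\[
S_i = \sum_{k=i-N_1}^{i+N_2} x_k
\qquad\Longrightarrow\qquad
S_{i+1} = S_i - x_{i-N_1} + x_{i+N_2+1}
\]
```

Each step costs O(1) instead of O(frame_size), so the whole pass is O(n_rows). MAX has no such inverse operation: when the row leaving the frame held the maximum, the new maximum cannot be derived from the old value alone.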
61. 61
ORDER BY [LIMIT] like optimizations
● Skip sorting if the rows come already sorted
● ORDER BY … LIMIT and descending window function
select
row_number() over (...) as RN
from
...
order by RN limit 10
● Restriction on ROW_NUMBER
select *
from (select row_number() over (...) as RN
from ...
) as T
where RN < 10
62. 62
Window functions optimization summary
                Reuse compatible sorts   Streamed computation   Condition pushdown   ORDER BY LIMIT-like optimizations
MariaDB 10.3    ✔                        ~✔                     ✔                    ✘
MS SQL Server   ✔                        ~✔                     ✔                    ✔
PostgreSQL      ✔                        ~✔                     ✔                    ✘
MySQL 8.0       ✔                        ~✔                     ✘                    ✘
● Reuse compatible sorts: everyone has this since it’s mandatory for identical sorts
● Streamed computation: essential, otherwise O(N) computation becomes O(N^2)
● Condition pushdown: very nice to have for analytic queries
● ORDER BY LIMIT-like optimizations: sometimes used for TOP-n queries by those with “big database” background
63. 63
Summary
● Both MariaDB and MySQL now have histograms
– MySQL’s are larger and more precise
– Both are lagging behind PostgreSQL, still
● Derived tables: MariaDB got condition pushdown
– MariaDB 10.3: Pushdown for window functions, Split grouping
– Caught up with PostgreSQL and exceeded it.
● Non-recursive CTEs
– See derived tables
– PostgreSQL and MySQL 8 have made a weird choice
● Window functions
– Similar optimizations in all three
– MySQL lacks condition pushdown (careful with VIEWs).