Using histograms to get better performance
Sergei Petrunia
Varun Gupta
Database performance
● Performance is a product of many factors
● One of them is the query optimizer
● It produces query plans
– A “good” query plan only reads rows that contribute to the query result
– A “bad” query plan means unnecessary work is done
Do my queries use bad query plans?
● Queries take a long time
● Some are just inherently hard to compute
● Some look good but turn out bad due to factors that were not accounted for
Query plan cost depends on data statistics
select *
from
lineitem, orders
where
o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
● orders->lineitem vs lineitem->orders
● Depends on condition selectivity
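A rough way to see why the join order depends on condition selectivity is to count the index lookups each order would trigger. The Python sketch below is only an illustration: the row counts, the selectivities, and the `index_lookups` helper are assumptions made for the example, not figures taken from the optimizer.

```python
def index_lookups(first_table_rows, first_table_selectivity):
    """Lookups into the second table when the join starts from the first one.

    Only rows that pass the first table's condition trigger a lookup.
    """
    return first_table_rows * first_table_selectivity

# Assumed figures, for illustration only.
orders_rows, lineitem_rows = 1_500_000, 6_000_000
sel_orderdate = 1.0        # the date range covers every order
sel_extendedprice = 0.001  # very few line items are that expensive

orders_first = index_lookups(orders_rows, sel_orderdate)
lineitem_first = index_lookups(lineitem_rows, sel_extendedprice)
better = "lineitem->orders" if lineitem_first < orders_first else "orders->lineitem"
print(better)
```

With these assumed numbers, starting from lineitem triggers far fewer lookups. If the optimizer believed both conditions were unselective, it could not tell the two orders apart this way.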
Another choice optimizer has to make
select *
from
orders
where
o_orderstatus='F'
order by
o_orderdate
limit 10
● Use index(o_orderdate)
– Stop as soon as we find 10 matches
● Find rows with o_orderstatus='F'
– Sort by o_orderdate, picking the first 10
● Again, it depends on condition selectivity.
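The trade-off between the two strategies can be sketched with simple arithmetic. This is a simplified model, not the optimizer's cost formula: it assumes matching rows are spread evenly through the index order, and that the second strategy can read just the matching rows (for example through an index on o_orderstatus). Both function names are hypothetical.

```python
def rows_read_by_ordered_index(total_rows, selectivity, limit):
    """Rows scanned in index order before `limit` matches are expected."""
    if selectivity <= 0:
        return total_rows  # no match expected: the whole index is scanned
    return min(total_rows, limit / selectivity)

def rows_read_by_filter_then_sort(total_rows, selectivity):
    """Rows fetched when matching rows are found first and then sorted."""
    return total_rows * selectivity

total = 1_500_000  # assumed table size

# Unselective condition: half the rows match, the ordered scan stops early.
common_scan = rows_read_by_ordered_index(total, 0.5, 10)
common_sort = rows_read_by_filter_then_sort(total, 0.5)

# Selective condition: the ordered scan reads most of the table.
rare_scan = rows_read_by_ordered_index(total, 0.00001, 10)
rare_sort = rows_read_by_filter_then_sort(total, 0.00001)
```

Under these assumptions the ordered index wins when the condition matches many rows, and the filter-then-sort plan wins when it matches few.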
Data statistics in MariaDB
● Table: #rows in the table
● Index
– cardinality: AVG(#lineitems per order)
– “range estimates” - #rows(t.key BETWEEN const1 and const2)
● Non-index column? Histogram
Histogram
● Partition the value space into buckets
– Store bucket bounds and #values in the bucket
– Imprecise
– Very compact
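To make the bucket idea concrete, here is a minimal Python sketch of a height-balanced histogram, where every bucket holds roughly the same number of values and only the bucket bounds are stored. It illustrates the principle only; it is not MariaDB's storage format or estimation code, and the function names are hypothetical.

```python
from bisect import bisect_right

def build_histogram(values, n_buckets):
    """Return the upper bounds of n_buckets buckets of (roughly) equal height."""
    ordered = sorted(values)
    n = len(ordered)
    # Bucket i ends at the value sitting at fraction (i+1)/n_buckets of the sorted data.
    return [ordered[max(0, (i + 1) * n // n_buckets - 1)] for i in range(n_buckets)]

def estimate_le(bounds, x):
    """Estimate the fraction of rows with value <= x, counting whole buckets only."""
    return bisect_right(bounds, x) / len(bounds)

values = list(range(1, 101))         # 100 rows holding the values 1..100
bounds = build_histogram(values, 4)  # four buckets of 25 values each
estimate_50 = estimate_le(bounds, 50)
estimate_60 = estimate_le(bounds, 60)
```

Here `estimate_50` is exact because 50 is a bucket bound, while `estimate_60` is off because the sketch cannot see inside a bucket. That is the "imprecise but very compact" trade-off: four numbers stand in for a hundred values.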
Summary so far
● Good database performance requires good query plans
● To pick those, optimizer needs statistics about the data
– Condition selectivity is important
● Certain kinds of statistics are always available
– Indexes
– For non-indexed columns, histograms may be needed.
Do my query plans suffer from bad statistics?
Will my queries benefit?
● Very complex question
● No definite answer
● Suggestions
– ANALYZE for statements, r_filtered.
– Slow query log
ANALYZE for statements and r_filtered
● filtered – % of rows left after applying the condition (the optimizer's expectation)
– r_filtered – the same percentage as measured during execution (the reality)
● r_filtered << filtered – the optimizer didn’t know the condition is selective
– Happens on a non-first table? We are filtering out late!
● Add a histogram on the column (check the condition in the FORMAT=JSON output)
analyze select *
from lineitem, orders
where o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
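The check described above can be applied mechanically to the filtered and r_filtered columns. The Python sketch below uses the two rows from the ANALYZE output; the `overestimated` helper and its factor of 10 are arbitrary assumptions for illustration, not a MariaDB rule.

```python
def overestimated(filtered, r_filtered, factor=10.0):
    """True when the measured percentage is at least `factor` times below the estimate."""
    return r_filtered * factor <= filtered

# (table, filtered, r_filtered) copied from the ANALYZE output above.
plan = [("orders", 50.00, 100.00), ("lineitem", 100.00, 0.00)]

suspects = [table for table, filtered, r_filtered in plan
            if overestimated(filtered, r_filtered)]
print(suspects)
```

Only lineitem is flagged: the optimizer expected every row to pass the l_extendedprice condition, while in reality almost none did, and it is the second table in the join.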
Slow Query Log
my.cnf
slow-query-log
long-query-time=...
log-slow-verbosity=query_plan,explain
hostname-slow.log
# Query_time: 1.961549 Lock_time: 0.011164 Rows_sent: 1 Rows_examined: 11745000
# Rows_affected: 0 Bytes_sent: 73
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
#
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
# explain: 1 SIMPLE inventory ALL NULL NULL NULL NULL 11837024 11745000.00 100.00 0.00 Using where
#
SET timestamp=1551155484;
select count(inv_date_sk) from inventory where inv_quantity_on_hand>10000;
● Rows_examined >> Rows_sent? Grouping, or a poor query plan
● log_slow_verbosity=explain shows ANALYZE output
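The Rows_examined versus Rows_sent comparison is easy to automate. The Python sketch below parses one header line in the format of the sample log entry from this slide; the `examined_to_sent_ratio` helper is hypothetical, and what counts as "too high" is left to the reader.

```python
import re

def examined_to_sent_ratio(header_line):
    """Return Rows_examined / Rows_sent parsed from a slow-log header line."""
    sent = int(re.search(r"Rows_sent:\s*(\d+)", header_line).group(1))
    examined = int(re.search(r"Rows_examined:\s*(\d+)", header_line).group(1))
    # Treat zero rows sent as one, to avoid dividing by zero.
    return examined / max(sent, 1)

line = "# Query_time: 1.961549 Lock_time: 0.011164 Rows_sent: 1 Rows_examined: 11745000"
ratio = examined_to_sent_ratio(line)
print(ratio)
```

A large ratio alone does not prove a poor plan, since grouping and aggregation (such as the count() in the sample query) also read many rows to return one.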
Histograms in MariaDB
● Available since MariaDB 10.0 (Yes)
● Used by advanced users
● Not enabled by default
● Have limitations, not user-friendly
● MariaDB 10.4
– Fixes some of the limitations
– Makes histograms easier to use
Collecting histograms
Configuration for collecting histograms
● MariaDB before 10.4: change the default histogram size
histogram_size=0 # default
histogram_type=SINGLE_PREC_HB # default
histogram_size=254 # suggested
histogram_type=DOUBLE_PREC_HB # suggested
● MariaDB 10.4: enable automatic sampling
histogram_size=254 # default
histogram_type=DOUBLE_PREC_HB # default
analyze_sample_percentage=100 # default
analyze_sample_percentage=0 # suggested
Histograms are [still] not collected by default
● “ANALYZE TABLE” will not collect a histogram
MariaDB> analyze table t1;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status | OK |
+---------+---------+----------+----------+
● This will collect only
– Total #rows in table
– Index cardinalities (#different values)
ANALYZE ... PERSISTENT collects histograms
analyze table t1 persistent
for columns (col1,...) indexes (idx1,...);
+---------+---------+----------+-----------------------------------------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+-----------------------------------------+
| test.t1 | analyze | status | Engine-independent statistics collected |
| test.t1 | analyze | status | OK |
+---------+---------+----------+-----------------------------------------+
– Collect statistics for everything:
analyze table t1 persistent for all;
Can make histogram collection automatic
set use_stat_tables='preferably';
analyze table t1;
+---------+---------+----------+-----------------------------------------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+-----------------------------------------+
| test.t1 | analyze | status | Engine-independent statistics collected |
| test.t1 | analyze | status | OK |
+---------+---------+----------+-----------------------------------------+
● Beware: this may be *much* slower than the ANALYZE TABLE you’re used to
● Great for migrations
Histogram collection performance
● MariaDB 10.0: uses all data in the table to build histogram
– Precise, but expensive
– Particularly so for VARCHARs
● A test on a real table:
– Real table, 740M rows, 90GB
– CHECKSUM TABLE: 5 min
– ANALYZE TABLE ... PERSISTENT FOR ALL – 30 min
MariaDB 10.4: Bernoulli sampling
● Default: analyze_sample_percentage=100
– Uses the entire table, slow
● Suggested: analyze_sample_percentage=0
– “Roll the dice” sampling, size picked automatically
analyze table t1 persistent for columns (...) indexes();
– does a full table scan
analyze table t1 persistent for all;
– full table and secondary index scans
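"Roll the dice" sampling can be sketched in a few lines of Python: every row is visited, and a random draw decides whether it enters the sample. This is a generic illustration of Bernoulli sampling, not MariaDB's code. The `percentage` argument here is a literal percentage, whereas analyze_sample_percentage=0 asks the server to pick the size itself, which the sketch does not model.

```python
import random

def bernoulli_sample(rows, percentage, seed=None):
    """Keep each row independently with probability percentage/100."""
    rng = random.Random(seed)
    p = percentage / 100.0
    # Every row is still visited; only the kept rows feed the histogram.
    return [row for row in rows if rng.random() < p]

sample = bernoulli_sample(range(100_000), 1, seed=42)
print(len(sample))  # close to 1000, but not exactly: the size itself is random
```

Because each row still has to be read to roll the dice, this kind of sampling saves histogram-building work but not the table scan.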
Further plans: genuine sampling
● Work on avoiding full table scans is in progress
● Will make it practical for ANALYZE TABLE to collect all histograms
Making the optimizer use histograms
Make the optimizer use histograms
● MariaDB before 10.4: does not use histograms
@@use_stat_tables=NEVER # default
@@optimizer_use_condition_selectivity=1 # default
@@use_stat_tables=PREFERABLY # suggested; also affects ANALYZE!
@@optimizer_use_condition_selectivity=4 # suggested
● MariaDB 10.4: uses histograms if they are collected
@@use_stat_tables=PREFERABLY_FOR_QUERIES # default
@@optimizer_use_condition_selectivity=4 # default
– remember to re-collect!
Conclusions: how to start using histograms
● MariaDB before 10.4
histogram_size=254 # No risk
histogram_type=DOUBLE_PREC_HB #
use_stat_tables=PREFERABLY # Changes optimizer
optimizer_use_condition_selectivity=4 # behavior
● MariaDB 10.4
analyze_sample_percentage=0
● Both: ANALYZE TABLE ... PERSISTENT FOR ...
Can I just have histograms for all columns?
A stored procedure to analyze every table
DELIMITER |
CREATE PROCEDURE analyze_persistent_for_all(db_name VARCHAR(64))
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE x VARCHAR(64);
  -- All base tables (no views) in the given schema
  DECLARE cur1 CURSOR FOR
    SELECT TABLE_NAME
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = db_name;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
  OPEN cur1;
  read_loop: LOOP
    FETCH cur1 INTO x;
    IF done THEN
      LEAVE read_loop;
    END IF;
    -- Qualify and quote the name so the call works from any current database
    SET @sql = CONCAT('analyze table `', db_name, '`.`', x, '` persistent for all');
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
  END LOOP;
  CLOSE cur1;
END|
DELIMITER ;
Should I ANALYZE ... PERSISTENT every table?
● New application
– Worth giving it a try
– Provision for periodic ANALYZE
– Column correlations?
● Existing application
– Performance fixes on a case-by-case basis.
Tests and benchmarks
TPC-DS benchmark
● scale=1
● The same dataset
– without histograms: ~20 min
– after call analyze_persistent_for_all('tpcds') (the procedure from two slides prior): 5 min.
TPC-DS benchmark run
A customer case with ORDER BY ... LIMIT
● table/column names replaced
CREATE TABLE cars (
type varchar(10),
company varchar(20),
model varchar(20),
quantity int,
KEY quantity (quantity),
KEY type (type)
);
select * from cars
where
type='electric' and
company='audi'
order by
quantity
limit 3;
● The quantity index matches the ORDER BY, but rows still need to be checked against the WHERE condition
● type is a restrictive index
A customer case with ORDER BY ... LIMIT
● Uses ORDER-BY compatible index by default
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cars
type: index
possible_keys: type
key: quantity
key_len: 5
ref: const
rows: 994266
r_rows: 700706.00
filtered: 0.20
r_filtered: 0.00
Extra: Using where
1 row in set (2.098 sec)
select * from cars
where
type='electric' and
company='audi'
order by
quantity
limit 3;
A customer case with ORDER BY ... LIMIT
● Providing the optimizer with histogram
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cars
type: ref
possible_keys: type
key: type
key_len: 13
ref: const
rows: 2022
r_rows: 3.00
filtered: 100.00
r_filtered: 100.00
Extra: Using index condition; Using where; Using filesort
1 row in set (0.010 sec)
analyze table cars persistent for all;
select * from cars
where
type='electric' and
company='audi'
order by
quantity
limit 3;
Operations
Histograms are stored in a table
CREATE TABLE mysql.column_stats (
db_name varchar(64) NOT NULL,
table_name varchar(64) NOT NULL,
column_name varchar(64) NOT NULL,
min_value varbinary(255) DEFAULT NULL,
max_value varbinary(255) DEFAULT NULL,
nulls_ratio decimal(12,4) DEFAULT NULL,
avg_length decimal(12,4) DEFAULT NULL,
avg_frequency decimal(12,4) DEFAULT NULL,
hist_size tinyint unsigned,
hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
histogram varbinary(255),
PRIMARY KEY (db_name,table_name,column_name)
);
TPC-DS benchmark
● Can save/restore histograms
● Can set @@optimizer_use_condition_selectivity to disable histogram use per-thread
Caveat: correlations
Problem with correlated conditions
● Possible selectivities
– MIN(1/n, 1/m)
– (1/n) * (1/m)
– 0
select ...
from order_items
where shipdate='2015-12-15' AND item_name='christmas light' -- or 'swimsuit'
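The three candidate selectivities can be compared side by side. In the Python sketch below the two inputs stand for 1/n and 1/m; the function name, the dictionary keys, and the sample values are chosen for illustration and do not come from MariaDB.

```python
def combined_selectivity(sel_a, sel_b):
    """Selectivity of `a AND b` under three assumptions about how a and b relate."""
    return {
        "fully_correlated": min(sel_a, sel_b),  # one condition implies the other
        "independent": sel_a * sel_b,           # the usual optimizer assumption
        "mutually_exclusive": 0.0,              # the two never occur together
    }

# Assumed single-column selectivities (1/n and 1/m).
estimates = combined_selectivity(1 / 4, 1 / 8)
print(estimates)
```

Per-column histograms supply the two inputs but say nothing about which of the three cases applies, so the estimate for the combined condition can still be far off.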
Problem with correlated conditions
● PostgreSQL: Multi-variate statistics
– Detects functional dependencies, col1=F(col2)
– Only used for equality predicates
– Also #DISTINCT(a,b)
● MariaDB: MDEV-11107: Use table check constraints in optimizer
– In development
select ...
from order_items
where shipdate='2015-12-15' AND item_name='christmas light' -- or 'swimsuit'
Thanks!

SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 

Dernier (20)

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 

Using histograms to optimize database queries

  • 1. Using histograms to get better performance Sergei Petrunia Varun Gupta
  • 2. Database performance ● Performance is a product of many factors ● One of them is Query optimizer ● It produces query plans – A “good” query plan only reads rows that contribute to the query result – A “bad” query plan means unnecessary work is done
  • 3. Do my queries use bad query plans? ● Queries take a long time ● Some are just inherently hard to compute ● Some look good but turn out bad due to factors that were not accounted for
  • 4. Query plan cost depends on data statistics select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000 ● orders->lineitem vs lineitem->orders ● Depends on condition selectivity
  • 5. Another choice the optimizer has to make select * from orders where o_orderstatus='F' order by o_orderdate limit 10 ● Use index(o_orderdate) – Stop as soon as we find 10 matches ● Find rows with o_orderstatus='F' – Sort by o_orderdate picking first 10 ● Again, it depends on condition selectivity.
  • 6. Data statistics in MariaDB ● Table: #rows in the table ● Index – cardinality: AVG(#lineitems per order) – “range estimates” - #rows(t.key BETWEEN const1 and const2) ● Non-index column? Histogram
  • 7. Histogram ● Partition the value space into buckets – Store bucket bounds and #values in the bucket – Imprecise – Very compact
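As a rough illustration of the bucket idea (not how the server builds its histograms internally), equal-height bucket bounds can be computed by hand with a window function. The sketch assumes the TPC-H lineitem table used elsewhere in the deck, an arbitrarily chosen 10 buckets, and MariaDB 10.2 or later for NTILE:

```sql
-- Split l_extendedprice into 10 buckets holding roughly the same number of rows
-- and report each bucket's bounds, i.e. the kind of information a histogram stores.
SELECT bucket,
       MIN(l_extendedprice) AS lower_bound,
       MAX(l_extendedprice) AS upper_bound,
       COUNT(*)             AS rows_in_bucket
FROM (
  SELECT l_extendedprice,
         NTILE(10) OVER (ORDER BY l_extendedprice) AS bucket
  FROM lineitem
) AS t
GROUP BY bucket
ORDER BY bucket;
```

With bounds like these, the selectivity of a condition such as l_extendedprice > X can be approximated by the share of buckets lying above X, which is why the estimate is compact but imprecise.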
  • 8. Summary so far ● Good database performance requires good query plans ● To pick those, optimizer needs statistics about the data – Condition selectivity is important ● Certain kinds of statistics are always available – Indexes – For non-indexed columns, histograms may be needed.
  • 9. Do my query plans suffer from bad statistics?
  • 10. Will my queries benefit? ● Very complex question ● No definite answer ● Suggestions – ANALYZE for statements, r_filtered. – Slow query log
  • 11. ANALYZE for statements and r_filtered
    ● filtered – % of rows left after applying the condition (the expectation)
      – r_filtered – the same percentage as actually observed (the reality)
    ● r_filtered << filtered – the optimizer didn’t know the condition is selective
      – Happens on a non-first table? We are filtering out late!
    ● Add a histogram on the column (check the condition in FORMAT=JSON output)
    analyze select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000
    +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
    |id|select_type|table   |type|possible_keys|key    |key_len|ref              |rows   |r_rows |filtered|r_filtered|Extra      |
    +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
    |1 |SIMPLE     |orders  |ALL |PRIMARY,i_...|NULL   |NULL   |NULL             |1504278|1500000|   50.00|    100.00|Using where|
    |1 |SIMPLE     |lineitem|ref |PRIMARY,i_...|PRIMARY|4      |orders.o_orderkey|2      |4.00   |  100.00|      0.00|Using where|
    +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
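The hint about FORMAT=JSON can be followed roughly like this. ANALYZE FORMAT=JSON (available in MariaDB 10.1 and later) executes the query and prints, for each table, the condition attached to it next to the filtered and r_filtered figures, so the column that needs a histogram can be read off the output. The exact JSON member names may differ between versions, so treat the comment below as a guide rather than a specification:

```sql
-- Runs the query for real, then reports estimates and actuals per table.
-- Look for the table whose r_filtered is far below filtered and read its
-- attached condition to find the column the optimizer misjudged.
ANALYZE FORMAT=JSON
SELECT *
FROM lineitem, orders
WHERE o_orderkey = l_orderkey
  AND o_orderdate BETWEEN '1990-01-01' AND '1998-12-06'
  AND l_extendedprice > 1000000;
```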
  • 12. Slow Query Log
    my.cnf:
      slow-query-log
      long-query-time=...
      log-slow-verbosity=query_plan,explain
    hostname-slow.log:
      # Query_time: 1.961549 Lock_time: 0.011164 Rows_sent: 1 Rows_examined: 11745000
      # Rows_affected: 0 Bytes_sent: 73
      # Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
      # Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
      #
      # explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
      # explain: 1 SIMPLE inventory ALL NULL NULL NULL NULL 11837024 11745000.00 100.00 0.00 Using where
      #
      SET timestamp=1551155484;
      select count(inv_date_sk) from inventory where inv_quantity_on_hand>10000;
    ● Rows_examined >> Rows_sent? Grouping, or a poor query plan
    ● log-slow-verbosity=explain shows ANALYZE output
  • 14. Histograms in MariaDB ● Available since MariaDB 10.0 (Yes) ● Used by advanced users ● Not enabled by default ● Have limitations, not user-friendly ● MariaDB 10.4 – Fixes some of the limitations – Makes histograms easier to use
  • 16. Configuration for collecting histograms
    ● MariaDB before 10.4: change the default histogram size
        default:   histogram_size=0    histogram_type=SINGLE_PREC_HB
        suggested: histogram_size=254  histogram_type=DOUBLE_PREC_HB
    ● MariaDB 10.4: enable automatic sampling
        default:   histogram_size=254  histogram_type=DOUBLE_PREC_HB  analyze_sample_percentage=100
        suggested: analyze_sample_percentage=0
  • 17. Histograms are [still] not collected by default
    ● “ANALYZE TABLE” will not collect a histogram
    MariaDB> analyze table t1;
    +---------+---------+----------+----------+
    | Table   | Op      | Msg_type | Msg_text |
    +---------+---------+----------+----------+
    | test.t1 | analyze | status   | OK       |
    +---------+---------+----------+----------+
    ● This will collect only
      – Total #rows in table
      – Index cardinalities (#different values)
  • 18. ANALYZE ... PERSISTENT collects histograms
    analyze table t1 persistent for columns (col1,...) indexes (idx1,...);
    +---------+---------+----------+-----------------------------------------+
    | Table   | Op      | Msg_type | Msg_text                                |
    +---------+---------+----------+-----------------------------------------+
    | test.t1 | analyze | status   | Engine-independent statistics collected |
    | test.t1 | analyze | status   | OK                                      |
    +---------+---------+----------+-----------------------------------------+
    – Collect statistics for everything:
    analyze table t1 persistent for all;
  • 19. Can make histogram collection automatic
    set use_stat_tables='preferably';
    analyze table t1;
    +---------+---------+----------+-----------------------------------------+
    | Table   | Op      | Msg_type | Msg_text                                |
    +---------+---------+----------+-----------------------------------------+
    | test.t1 | analyze | status   | Engine-independent statistics collected |
    | test.t1 | analyze | status   | OK                                      |
    +---------+---------+----------+-----------------------------------------+
    ● Beware: this may be *much* slower than the ANALYZE TABLE you’re used to
    ● Great for migrations
  • 20. Histogram collection performance ● MariaDB 10.0: uses all data in the table to build histogram – Precise, but expensive – Particularly so for VARCHARs ● A test on a real table: – Real table, 740M rows, 90GB – CHECKSUM TABLE: 5 min – ANALYZE TABLE ... PERSISTENT FOR ALL – 30 min
  • 21. MariaDB 10.4: Bernoulli sampling
    ● Default: analyze_sample_percentage=100
      – Uses the entire table, slow
    ● Suggested: analyze_sample_percentage=0
      – “Roll the dice” sampling, size picked automatically
    analyze table t1 persistent for columns (...) indexes();
      – does a full table scan
    analyze table t1 persistent for all;
      – full table and secondary index scans
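A minimal sketch of the suggested 10.4 setup, assuming session scope is sufficient and using hypothetical column names col1 and col2 on the deck's example table t1:

```sql
-- MariaDB 10.4+: 0 lets the server choose the sample size automatically
SET SESSION analyze_sample_percentage = 0;

-- Collect histograms for two columns only; the empty INDEXES () list
-- skips index statistics, so only the table itself is scanned
ANALYZE TABLE t1 PERSISTENT FOR COLUMNS (col1, col2) INDEXES ();

-- Back to the default, which reads every row
SET SESSION analyze_sample_percentage = 100;
```

Limiting the column list is the cheaper option when only a few non-indexed columns appear in WHERE clauses.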
  • 22. Further plans: genuine sampling ● Work on avoiding full table scans is in progress ● This will make it possible for ANALYZE TABLE to collect all histograms
  • 24. Make the optimizer use histograms
    ● MariaDB before 10.4: does not use histograms
        default:   @@use_stat_tables=NEVER  @@optimizer_use_condition_selectivity=1
        suggested: @@use_stat_tables=PREFERABLY // also affects ANALYZE!
                   @@optimizer_use_condition_selectivity=4
    ● MariaDB 10.4: uses histograms if they are collected
        default:   @@use_stat_tables=PREFERABLY_FOR_QUERIES  @@optimizer_use_condition_selectivity=4
      – remember to re-collect!
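On a server older than 10.4 the two variables can be changed for the current session before deciding whether to make them permanent in my.cnf. This is a sketch under the assumption that session scope is acceptable for a first test:

```sql
-- Let the optimizer read engine-independent statistics
-- (note: PREFERABLY also makes plain ANALYZE TABLE collect them)
SET SESSION use_stat_tables = 'PREFERABLY';

-- Level 4 lets selectivity estimates take histograms into account
SET SESSION optimizer_use_condition_selectivity = 4;

-- Confirm what the session is actually using
SELECT @@use_stat_tables, @@optimizer_use_condition_selectivity;
```

Re-running the problem query with ANALYZE afterwards shows whether filtered has moved closer to r_filtered.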
  • 25. Conclusions: how to start using histograms
    ● MariaDB before 10.4
        histogram_size=254                     # No risk
        histogram_type=DOUBLE_PREC_HB          #
        use_stat_tables=PREFERABLY             # Changes optimizer
        optimizer_use_condition_selectivity=4  # behavior
    ● MariaDB 10.4
        analyze_sample_percentage=0
    ● Both: ANALYZE TABLE ... PERSISTENT FOR ...
  • 26. Can I just have histograms for all columns?
  • 27. A stored procedure to analyze every table
    DELIMITER |
    CREATE PROCEDURE analyze_persistent_for_all(db_name VARCHAR(64))
    BEGIN
      DECLARE done INT DEFAULT FALSE;
      DECLARE x VARCHAR(64);
      -- All base tables (not views) of the requested schema
      DECLARE cur1 CURSOR FOR
        SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = db_name;
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
      OPEN cur1;
      read_loop: LOOP
        FETCH cur1 INTO x;
        IF done THEN
          LEAVE read_loop;
        END IF;
        -- Qualify and quote the name so the call also works when db_name
        -- is not the current database
        SET @sql = CONCAT('analyze table `', db_name, '`.`', x, '` persistent for all');
        PREPARE stmt FROM @sql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
      END LOOP;
      CLOSE cur1;
    END|
    DELIMITER ;
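Calling the procedure and checking the result might look like the following. The schema name tpcds is taken from the benchmark slide, and reading mysql.column_stats assumes the account has privileges on the mysql schema:

```sql
-- Collect engine-independent statistics for every base table in the schema
CALL analyze_persistent_for_all('tpcds');

-- One row per column that now has statistics; hist_size > 0 means a histogram exists
SELECT table_name, column_name, hist_type, hist_size
FROM mysql.column_stats
WHERE db_name = 'tpcds'
ORDER BY table_name, column_name;
```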
  • 28. Should I ANALYZE ... PERSISTENT every table? ● New application – Worth giving it a try – Provision for periodic ANALYZE – Column correlations? ● Existing application – Performance fixes on a case-by-case basis.
  • 30. TPC-DS benchmark ● scale=1 ● The same dataset – without histograms: ~20 min – after call analyze_persistent_for_all('tpcds') (the stored procedure from slide 27): 5 min.
  • 32. A customer case with ORDER BY ... LIMIT
    ● table/column names replaced
    CREATE TABLE cars (
      type varchar(10),
      company varchar(20),
      model varchar(20),
      quantity int,
      KEY quantity (quantity),
      KEY type (type)
    );
    select * from cars where type='electric' and company='audi' order by quantity limit 3;
    ● The quantity index matches the ORDER BY, but rows still need to match the condition
    ● type is a restrictive index
  • 33. A customer case with ORDER BY ... LIMIT
    ● Uses ORDER-BY compatible index by default
    select * from cars where type='electric' and company='audi' order by quantity limit 3;
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: cars
             type: index
    possible_keys: type
              key: quantity
          key_len: 5
              ref: const
             rows: 994266
           r_rows: 700706.00
         filtered: 0.20
       r_filtered: 0.00
            Extra: Using where
    1 row in set (2.098 sec)
  • 34. A customer case with ORDER BY ... LIMIT
    ● Providing the optimizer with a histogram
    analyze table cars persistent for all;
    select * from cars where type='electric' and company='audi' order by quantity limit 3;
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: cars
             type: ref
    possible_keys: type
              key: type
          key_len: 13
              ref: const
             rows: 2022
           r_rows: 3.00
         filtered: 100.00
       r_filtered: 100.00
            Extra: Using index condition; Using where; Using filesort
    1 row in set (0.010 sec)
  • 36. Histograms are stored in a table
    CREATE TABLE mysql.column_stats (
      db_name varchar(64) NOT NULL,
      table_name varchar(64) NOT NULL,
      column_name varchar(64) NOT NULL,
      min_value varbinary(255) DEFAULT NULL,
      max_value varbinary(255) DEFAULT NULL,
      nulls_ratio decimal(12,4) DEFAULT NULL,
      avg_length decimal(12,4) DEFAULT NULL,
      avg_frequency decimal(12,4) DEFAULT NULL,
      hist_size tinyint unsigned,
      hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
      histogram varbinary(255),
      PRIMARY KEY (db_name,table_name,column_name)
    );
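Because the histogram column is binary, it is easier to inspect through the built-in DECODE_HISTOGRAM function, which returns the stored distribution as a comma-separated list of numbers. The sketch assumes the cars table from the customer case lives in a schema named test:

```sql
-- Show the collected column statistics and a readable form of each histogram
SELECT column_name, min_value, max_value, hist_type, hist_size,
       DECODE_HISTOGRAM(hist_type, histogram) AS decoded_histogram
FROM mysql.column_stats
WHERE db_name = 'test' AND table_name = 'cars';
```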
  • 37. TPC-DS benchmark ● Can save/restore histograms ● Can set @@optimizer_use_condition_selectivity to disable histogram use per-thread
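Since the statistics are ordinary rows, saving and restoring them can be sketched with plain SQL. The backup table name is hypothetical, and the FLUSH TABLES step is an assumption about how to make the server re-read statistics it has already cached:

```sql
-- Save the current statistics of one schema
CREATE TABLE test.column_stats_saved AS
  SELECT * FROM mysql.column_stats WHERE db_name = 'tpcds';

-- Later: put the saved rows back, overwriting whatever was collected since
REPLACE INTO mysql.column_stats
  SELECT * FROM test.column_stats_saved;
FLUSH TABLES;  -- assumption: forces cached table statistics to be reloaded

-- Per-session switch: level 1 makes this connection estimate selectivity
-- without column statistics or histograms
SET SESSION optimizer_use_condition_selectivity = 1;
```

The session-level switch is handy for A/B comparisons: run the same query in two connections, one at level 1 and one at level 4, and compare the plans.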
  • 39. Problem with correlated conditions
    select ... from order_items where shipdate='2015-12-15' AND item_name='christmas light'
    (alternative value on the slide: item_name='swimsuit')
    ● Possible selectivities
      – MIN(1/n, 1/m)
      – (1/n) * (1/m)
      – 0
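One way to see how far the data is from the independence assumption is to measure the selectivities directly. This is a sketch against the slide's order_items table; it relies on MariaDB evaluating a comparison to 1 or 0, so AVG of the comparison is the fraction of rows that match:

```sql
-- Compare the combined selectivity an optimizer would assume for independent
-- columns with the selectivity actually present in the data
SELECT
  AVG(shipdate = '2015-12-15')                                       AS sel_shipdate,
  AVG(item_name = 'christmas light')                                 AS sel_item_name,
  AVG(shipdate = '2015-12-15') * AVG(item_name = 'christmas light')  AS sel_if_independent,
  AVG(shipdate = '2015-12-15' AND item_name = 'christmas light')     AS sel_actual
FROM order_items;
```

If sel_actual is much larger than sel_if_independent the columns are positively correlated; if it is near zero (as one would expect for 'swimsuit' in December) they are negatively correlated. Single-column histograms cannot capture either case.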
  • 40. Problem with correlated conditions
    select ... from order_items where shipdate='2015-12-15' AND item_name='christmas light'
    (alternative value on the slide: item_name='swimsuit')
    ● PostgreSQL: Multi-variate statistics
      – Detects functional dependencies, col1=F(col2)
      – Only used for equality predicates
      – Also #DISTINCT(a,b)
    ● MariaDB: MDEV-11107: Use table check constraints in optimizer
      – In development