Using histograms to get better performance
Sergei Petrunia
Varun Gupta
Database performance
● Performance is a product of many factors
● One of them is the query optimizer
● It produces query plans
– A “good” query plan only reads rows that contribute to the query result
– A “bad” query plan means unnecessary work is done
Do my queries use bad query plans?
● Queries take a long time
● Some are just inherently hard to compute
● Some look good but turn out bad due to factors that were not accounted for
Query plan cost depends on data statistics
select *
from
lineitem, orders
where
o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
● orders->lineitem vs lineitem->orders
● Depends on condition selectivity
Another choice the optimizer has to make
select *
from
orders
where
o_orderstatus='F'
order by
o_orderdate
limit 10
● Use index(o_orderdate)
– Stop as soon as we find 10 matches
● Find rows with o_orderstatus='F'
– Sort by o_orderdate picking first 10
● Again, it depends on condition selectivity.
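The trade-off between the two strategies can be sketched in Python as a rough row-count comparison. This is only an illustration, not the optimizer's cost model: it assumes matching rows are spread evenly through the ORDER BY index, that the matching rows can be located directly (for example through an index on the condition column), and it ignores the cost of the sort itself. All function names are hypothetical.

```python
def rows_read_walking_order_index(limit, matching_rows, table_rows):
    """Expected rows read when walking the ORDER BY index until `limit` matches are found."""
    if matching_rows == 0:
        return table_rows  # nothing matches, so the whole index is walked
    # With evenly spread matches, one row in (table_rows / matching_rows) qualifies.
    return min(table_rows, limit * table_rows / matching_rows)


def rows_read_filter_then_sort(matching_rows):
    """Rows read when only the matching rows are fetched and then sorted."""
    return matching_rows


def cheaper_plan(limit, matching_rows, table_rows):
    """Pick the strategy that reads fewer rows under the assumptions above."""
    walk = rows_read_walking_order_index(limit, matching_rows, table_rows)
    sort = rows_read_filter_then_sort(matching_rows)
    return "walk ORDER BY index" if walk <= sort else "filter, then sort"


# Half of a 1,000,000-row table matches: the index walk stops after about 20 rows.
print(cheaper_plan(10, 500_000, 1_000_000))
# Only 100 rows match: the index walk would read about 100,000 rows first.
print(cheaper_plan(10, 100, 1_000_000))
```

The same query flips from one plan to the other purely because of how many rows satisfy the condition, which is why the optimizer needs a selectivity estimate.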
Data statistics in MariaDB
● Table: #rows in the table
● Index
– cardinality: AVG(#lineitems per order)
– “range estimates” - #rows(t.key BETWEEN const1 and const2)
● Non-index column? Histogram
Histogram
● Partition the value space into buckets
– Store bucket bounds and #values in the bucket
– Imprecise
– Very compact
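A minimal Python sketch of the idea, assuming one common layout in which every bucket holds roughly the same number of values, so only the bucket bounds need to be stored. It is not MariaDB's on-disk format, and the function names are hypothetical.

```python
from bisect import bisect_right


def build_equal_height_histogram(values, n_buckets):
    """Return the upper bound of each bucket; each bucket covers ~len(values)/n_buckets values."""
    ordered = sorted(values)
    n = len(ordered)
    # Bucket i ends at the value found (i + 1) / n_buckets of the way through the sorted data.
    return [ordered[min(n - 1, (i + 1) * n // n_buckets - 1)] for i in range(n_buckets)]


def estimate_le_selectivity(bounds, x):
    """Estimate the fraction of values <= x at whole-bucket granularity."""
    # Each bucket whose upper bound is <= x stands for 1 / len(bounds) of the rows.
    return bisect_right(bounds, x) / len(bounds)


values = list(range(1, 1001))                 # 1000 evenly spread values
bounds = build_equal_height_histogram(values, 10)
print(bounds)                                 # ten bounds instead of 1000 values
print(estimate_le_selectivity(bounds, 250))   # estimate 0.2, true fraction is 0.25
```

Ten numbers summarize a thousand values (compact), and the estimate for `<= 250` is off by a quarter of a bucket (imprecise). More buckets narrow that error at the cost of more storage.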
Summary so far
● Good database performance requires good query plans
● To pick those, the optimizer needs statistics about the data
– Condition selectivity is important
● Certain kinds of statistics are always available
– Indexes
– For non-indexed columns, histograms may be needed.
Do my query plans suffer from bad statistics?
Will my queries benefit?
● Very complex question
● No definite answer
● Suggestions
– ANALYZE for statements, r_filtered.
– Slow query log
ANALYZE for statements and r_filtered
● filtered – % of rows left after applying condition (expectation)
– r_filtered - ... - the reality
● r_filtered << filtered – the optimizer didn’t know the condition is selective
– Happens on a non-first table? We are filtering out late!
● Add histogram on the column (check the condition in the FORMAT=JSON output)
analyze select *
from lineitem, orders
where o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
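The check described above (r_filtered far below filtered) can be automated once the ANALYZE rows are available as data. The Python sketch below assumes the rows have already been fetched into dictionaries; the helper name and the factor-of-10 threshold are hypothetical choices, not anything MariaDB defines.

```python
def misestimated_tables(plan_rows, ratio=10.0):
    """Return tables where far fewer rows survived the condition than the optimizer expected."""
    flagged = []
    for row in plan_rows:
        expected, actual = row["filtered"], row["r_filtered"]
        # r_filtered << filtered: the condition was more selective than assumed.
        if actual * ratio < expected:
            flagged.append(row["table"])
    return flagged


# The two rows of the ANALYZE output above, reduced to the relevant columns.
plan = [
    {"table": "orders", "filtered": 50.0, "r_filtered": 100.0},
    {"table": "lineitem", "filtered": 100.0, "r_filtered": 0.0},
]
print(misestimated_tables(plan))  # ['lineitem']
```

Here lineitem is flagged: the optimizer expected every joined row to pass, while almost none did, and the table is second in the join order, so the filtering happens late.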
Slow Query Log
my.cnf
slow-query-log
long-query-time=...
log-slow-verbosity=query_plan,explain
hostname-slow.log
# Query_time: 1.961549 Lock_time: 0.011164 Rows_sent: 1 Rows_examined: 11745000
# Rows_affected: 0 Bytes_sent: 73
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
#
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
# explain: 1 SIMPLE inventory ALL NULL NULL NULL NULL 11837024 11745000.00 100.00 0.00 Using where
#
SET timestamp=1551155484;
select count(inv_date_sk) from inventory where inv_quantity_on_hand>10000;
● Rows_examined >> Rows_sent? Grouping, or a poor query plan
● log_slow_verbosity=explain shows ANALYZE output
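To scan a slow log for entries where Rows_examined is much larger than Rows_sent, a short Python sketch along these lines may help. It only looks at the `Rows_sent` / `Rows_examined` header line in the format shown above; the function name is hypothetical, and a high ratio is a hint to look closer, not proof of a bad plan (grouping queries also produce it).

```python
import re

# Matches the counters on the "# Query_time: ..." header line of a slow log entry.
HEADER = re.compile(r"Rows_sent:\s*(\d+)\s+Rows_examined:\s*(\d+)")


def examined_per_sent(log_text):
    """Return Rows_examined / Rows_sent for every entry found in the log text."""
    ratios = []
    for match in HEADER.finditer(log_text):
        sent, examined = int(match.group(1)), int(match.group(2))
        # Treat zero rows sent as one to avoid dividing by zero.
        ratios.append(examined / max(sent, 1))
    return ratios


sample = (
    "# Query_time: 1.961549 Lock_time: 0.011164 "
    "Rows_sent: 1 Rows_examined: 11745000\n"
)
print(examined_per_sent(sample))
```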
Histograms in MariaDB
● Available since MariaDB 10.0 (Yes)
● Used by advanced users
● Not enabled by default
● Have limitations, not user-friendly
● MariaDB 10.4
– Fixes some of the limitations
– Makes histograms easier to use
Collecting histograms
Configuration for collecting histograms
● MariaDB before 10.4: change the default histogram size
– Default:
histogram_size=0
histogram_type=SINGLE_PREC_HB
– Suggested:
histogram_size=254
histogram_type=DOUBLE_PREC_HB
● MariaDB 10.4: enable automatic sampling
– Already the default:
histogram_size=254
histogram_type=DOUBLE_PREC_HB
– Default:
analyze_sample_percentage=100
– Suggested:
analyze_sample_percentage=0
Histograms are [still] not collected by default
● “ANALYZE TABLE” will not collect a histogram
MariaDB> analyze table t1;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status | OK |
+---------+---------+----------+----------+
● This will collect only
– Total #rows in table
– Index cardinalities (#different values)
ANALYZE ... PERSISTENT collects histograms
– Collect statistics for the listed columns and indexes:
analyze table t1 persistent
for columns (col1,...) indexes (idx1,...);
– Collect statistics for everything:
analyze table t1 persistent for all;
+---------+---------+----------+-----------------------------------------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+-----------------------------------------+
| test.t1 | analyze | status | Engine-independent statistics collected |
| test.t1 | analyze | status | OK |
+---------+---------+----------+-----------------------------------------+
Can make histogram collection automatic
set use_stat_tables='preferably';
analyze table t1;
+---------+---------+----------+-----------------------------------------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+-----------------------------------------+
| test.t1 | analyze | status | Engine-independent statistics collected |
| test.t1 | analyze | status | OK |
+---------+---------+----------+-----------------------------------------+
● Beware: this may be *much* slower than the ANALYZE TABLE you’re used to
● Great for migrations
Histogram collection performance
● MariaDB 10.0: uses all data in the table to build histogram
– Precise, but expensive
– Particularly so for VARCHARs
● A test on a real table:
– 740M rows, 90GB
– CHECKSUM TABLE: 5 min
– ANALYZE TABLE ... PERSISTENT FOR ALL – 30 min
MariaDB 10.4: Bernoulli sampling
● Default: analyze_sample_percentage=100
– Uses the entire table, slow
● Suggested: analyze_sample_percentage=0
– “Roll the dice” sampling, size picked automatically
analyze table t1 persistent for columns (...) indexes();
– does a full table scan
analyze table t1 persistent for all;
– full table and secondary index scans
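The “roll the dice” idea can be sketched in Python as follows. This is a generic Bernoulli sample, not MariaDB's implementation: every row is still visited, and each one is kept independently with the given probability. The function name is hypothetical, and the sketch takes an explicit percentage; it does not model the automatic size selection that analyze_sample_percentage=0 enables.

```python
import random


def bernoulli_sample(rows, percentage, seed=None):
    """Keep each row independently with probability percentage / 100."""
    rng = random.Random(seed)
    p = percentage / 100.0
    # Every row is read once; only the kept rows feed the histogram.
    return [row for row in rows if rng.random() < p]


rows = list(range(10_000))
sample = bernoulli_sample(rows, 10, seed=42)
print(len(sample))  # close to 1000, but not exactly: the sample size is itself random
```

Because every row gets its own coin flip, the table is still scanned in full; the saving comes from building the histogram over far fewer values.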
Further plans: genuine sampling
● Work on avoiding full table scans is in progress
● Will allow ANALYZE TABLE to collect all histograms
Making the optimizer use histograms
Make the optimizer use histograms
● MariaDB before 10.4: does not use histograms
– Default:
@@use_stat_tables=NEVER
@@optimizer_use_condition_selectivity=1
– Suggested:
@@use_stat_tables=PREFERABLY // also affects ANALYZE!
@@optimizer_use_condition_selectivity=4
● MariaDB 10.4: uses histograms if they are collected
– Default:
@@use_stat_tables=PREFERABLY_FOR_QUERIES
@@optimizer_use_condition_selectivity=4
– remember to re-collect!
Conclusions: how to start using histograms
● MariaDB before 10.4
histogram_size=254 # No risk
histogram_type=DOUBLE_PREC_HB #
use_stat_tables=PREFERABLY # Changes optimizer
optimizer_use_condition_selectivity=4 # behavior
● MariaDB 10.4
analyze_sample_percentage=0
● Both: ANALYZE TABLE ... PERSISTENT FOR ...
Can I just have histograms for all columns?
A stored procedure to analyze every table
DELIMITER |
CREATE PROCEDURE analyze_persistent_for_all(db_name VARCHAR(64))
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE x VARCHAR(64);
  -- Every base table in the requested schema
  DECLARE cur1 CURSOR FOR
    SELECT TABLE_NAME
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = db_name;
  -- FETCH past the last row sets done instead of raising an error
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
  OPEN cur1;
  read_loop: LOOP
    FETCH cur1 INTO x;
    IF done THEN
      LEAVE read_loop;
    END IF;
    -- Qualify and quote the name so the table is found whatever the current database is
    SET @sql = CONCAT('analyze table `', db_name, '`.`', x, '` persistent for all');
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
  END LOOP;
  CLOSE cur1;
END|
DELIMITER ;
Should I ANALYZE ... PERSISTENT every table?
● New application
– Worth giving it a try
– Provision for periodic ANALYZE
– Column correlations?
● Existing application
– Performance fixes on a case-by-case basis.
Tests and benchmarks
TPC-DS benchmark
● scale=1
● The same dataset
– without histograms: ~20 min
– after call analyze_persistent_for_all('tpcds') (the procedure shown above): 5 min.
TPC-DS benchmark run
A customer case with ORDER BY ... LIMIT
● table/column names replaced
CREATE TABLE cars (
type varchar(10),
company varchar(20),
model varchar(20),
quantity int,
KEY quantity (quantity),
KEY type (type)
);
select * from cars
where
type='electric' and
company='audi'
order by
quantity
limit 3;
● Index on quantity matches the ORDER BY, but rows still need to match the condition
● Index on type is restrictive
A customer case with ORDER BY ... LIMIT
● Uses ORDER-BY compatible index by default
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cars
type: index
possible_keys: type
key: quantity
key_len: 5
ref: const
rows: 994266
r_rows: 700706.00
filtered: 0.20
r_filtered: 0.00
Extra: Using where
1 row in set (2.098 sec)
select * from cars
where
type='electric' and
company='audi'
order by
quantity
limit 3;
A customer case with ORDER BY ... LIMIT
● Providing the optimizer with histogram
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cars
type: ref
possible_keys: type
key: type
key_len: 13
ref: const
rows: 2022
r_rows: 3.00
filtered: 100.00
r_filtered: 100.00
Extra: Using index condition; Using where; Using filesort
1 row in set (0.010 sec)
analyze table cars persistent for all;
select * from cars
where
type='electric' and
company='audi'
order by
quantity
limit 3;
Operations
Histograms are stored in a table
CREATE TABLE mysql.column_stats (
db_name varchar(64) NOT NULL,
table_name varchar(64) NOT NULL,
column_name varchar(64) NOT NULL,
min_value varbinary(255) DEFAULT NULL,
max_value varbinary(255) DEFAULT NULL,
nulls_ratio decimal(12,4) DEFAULT NULL,
avg_length decimal(12,4) DEFAULT NULL,
avg_frequency decimal(12,4) DEFAULT NULL,
hist_size tinyint unsigned,
hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
histogram varbinary(255),
PRIMARY KEY (db_name,table_name,column_name)
);
TPC-DS benchmark
● Can save/restore histograms
● Can set @@optimizer_use_condition_selectivity to disable
histogram use per-thread
Caveat: correlations
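Problem with correlated conditions
● Possible selectivities
– MIN(1/n, 1/m): the conditions are fully correlated
– (1/n) * (1/m): the conditions are independent
– 0: the conditions never hold together
select ...
from order_items
where shipdate='2015-12-15' AND item_name='christmas light' -- or item_name='swimsuit'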
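A small Python sketch of how far apart the three candidates are, taking n and m as on the slide (each condition on its own matches 1/n and 1/m of the rows). The concrete values 365 and 1000 are made up for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def combined_selectivity(n, m):
    """Candidate selectivities for `cond1 AND cond2`, where each alone matches 1/n and 1/m."""
    s1, s2 = Fraction(1, n), Fraction(1, m)
    return {
        "fully correlated": min(s1, s2),   # one condition implies the other
        "independent": s1 * s2,            # the assumption per-column histograms lead to
        "mutually exclusive": Fraction(0), # the two never hold together
    }


# Hypothetical numbers: 365 distinct ship dates, 1000 distinct item names.
for name, value in combined_selectivity(365, 1000).items():
    print(name, value)
```

With these numbers the estimates range from 1/1000 down to 1/365000 and then 0, so single-column histograms alone cannot tell the optimizer which case applies to 'christmas light' versus 'swimsuit' on a December date.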
Problem with correlated conditions
● PostgreSQL: Multi-variate statistics
– Detects functional dependencies, col1=F(col2)
– Only used for equality predicates
– Also #DISTINCT(a,b)
● MariaDB: MDEV-11107: Use table check constraints in optimizer
– In development
select ...
from order_items
where shipdate='2015-12-15' AND item_name='christmas light' -- or item_name='swimsuit'
Thanks!

How to use histograms to get better performance

  • 1. Using histograms to get better performance Sergei Petrunia Varun Gupta
  • 2. Database performance ● Performance is a product of many factors ● One of them is Query optimizer ● It produces query plans – A “good” query plan only reads rows that contribute to the query result – A “bad” query plan means unnecessary work is done
  • 3. Do my queries use bad query plans? ● Queries take a long time ● Some are just inherently hard to compute ● Some look good but turn out bad due to factors that were not accounted for
  • 4. Query plan cost depends on data statistics select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000 ● orders->lineitem vs lineitem->orders ● Depends on condition selectivity
  • 5. Another choice optimizer has to make select * from orders where o_orderstatus='F' order by order_date limit 10 ● Use index(order_date) – Stop as soon as we find 10 matches ● Find rows with o_orderstatus='F' – Sort by o_orderdate picking first 10 ● Again, it depends on condition selectivity.
  • 6. Data statistics in MariaDB ● Table: #rows in the table ● Index – cardinality: AVG(#lineitems per order) – “range estimates” - #rows(t.key BETWEEN const1 and const2) ● Non-index column? Histogram
  • 7. Histogram ● Partition the value space into buckets – Store bucket bounds and #values in the bucket – Imprecise – Very compact
  • 8. Summary so far ● Good database performance requires good query plans ● To pick those, optimizer needs statistics about the data – Condition selectivity is important ● Certain kinds of statistics are always available – Indexes – For non-indexed columns, histograms may be needed.
  • 9. Do my query plans suffer from bad statistics?
  • 10. Will my queries benefit? ● Very complex question ● No definite answer ● Suggestions – ANALYZE for statements, r_filtered. – Slow query log
  • 11. ANALYZE for statements and r_filtered ● filtered – % of rows left after applying condition (expectation) – r_filtered - ... - the reality ● r_filtered << filtered – the optimizer didn’t know the condition is selective – Happens on a non-first table? We are filtering out late! ● Add histogram on the column (Check the cond in FORMAT=JSON) analyze select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000 +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra | +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where| |1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where| +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
  • 12. # Query_time: 1.961549 Lock_time: 0.011164 Rows_sent: 1 Rows_examined: 11745000 # Rows_affected: 0 Bytes_sent: 73 # Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No # Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No # # explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra # explain: 1 SIMPLE inventory ALL NULL NULL NULL NULL 11837024 11745000.00 100.00 0.00 Using where # SET timestamp=1551155484; select count(inv_date_sk) from inventory where inv_quantity_on_hand>10000; Slow Query Log slow-query-log long-query-time=... log-slow-verbosity=query_plan,explain my.cnf hostname-slow.log ● Rows_examined >> Rows_sent? Grouping,or a poor query plan ● log_slow_query=explain will shows ANALYZE output
  • 14. Histograms in MariaDB
● Available since MariaDB 10.0 (Yes)
● Used by advanced users
● Not enabled by default
● Have limitations, not user-friendly
● MariaDB 10.4
– Fixes some of the limitations
– Makes histograms easier to use
  • 16. Configuration for collecting histograms
● MariaDB before 10.4: change the default histogram size
Default:
histogram_size=0
histogram_type=SINGLE_PREC_HB
Change to:
histogram_size=254
histogram_type=DOUBLE_PREC_HB
● MariaDB 10.4: enable automatic sampling
Default:
histogram_size=254
histogram_type=DOUBLE_PREC_HB
analyze_sample_percentage=100
Change to:
analyze_sample_percentage=0
  • 17. Histograms are [still] not collected by default
● “ANALYZE TABLE” will not collect a histogram
MariaDB> analyze table t1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status   | OK       |
+---------+---------+----------+----------+
● This will collect only
– Total #rows in table
– Index cardinalities (#different values)
  • 18. ANALYZE ... PERSISTENT collects histograms
analyze table t1 persistent for columns (col1,...) indexes (idx1,...);
+---------+---------+----------+-----------------------------------------+
| Table   | Op      | Msg_type | Msg_text                                |
+---------+---------+----------+-----------------------------------------+
| test.t1 | analyze | status   | Engine-independent statistics collected |
| test.t1 | analyze | status   | OK                                      |
+---------+---------+----------+-----------------------------------------+
– Collect statistics for everything:
analyze table t1 persistent for all;
  • 19. Can make histogram collection automatic
set use_stat_tables='preferably';
analyze table t1;
+---------+---------+----------+-----------------------------------------+
| Table   | Op      | Msg_type | Msg_text                                |
+---------+---------+----------+-----------------------------------------+
| test.t1 | analyze | status   | Engine-independent statistics collected |
| test.t1 | analyze | status   | OK                                      |
+---------+---------+----------+-----------------------------------------+
● Beware: this may be *much* slower than the ANALYZE TABLE you’re used to
● Great for migrations
  • 20. Histogram collection performance
● MariaDB 10.0: uses all data in the table to build histogram
– Precise, but expensive
– Particularly so for VARCHARs
● A test on a real table:
– Real table, 740M rows, 90GB
– CHECKSUM TABLE: 5 min
– ANALYZE TABLE ... PERSISTENT FOR ALL: 30 min
  • 21. MariaDB 10.4: Bernoulli sampling
● Default: analyze_sample_percentage=100
– Uses the entire table, slow
● Suggested: analyze_sample_percentage=0
– “Roll the dice” sampling, size picked automatically
analyze table t1 persistent for columns (...) indexes();
– does a full table scan
analyze table t1 persistent for all;
– full table and secondary index scans
  • 22. Further plans: genuine sampling
● Work on avoiding full table scans is in progress
● Will allow ANALYZE TABLE to collect all histograms
  • 24. Make the optimizer use histograms
● MariaDB before 10.4: does not use histograms
Default:
@@use_stat_tables=NEVER
@@optimizer_use_condition_selectivity=1
Change to:
@@use_stat_tables=PREFERABLY // also affects ANALYZE!
@@optimizer_use_condition_selectivity=4
● MariaDB 10.4: uses histograms if they are collected
Default:
@@use_stat_tables=PREFERABLY_FOR_QUERIES
@@optimizer_use_condition_selectivity=4
– remember to re-collect!
  • 25. Conclusions: how to start using histograms
● MariaDB before 10.4
histogram_size=254 # No risk
histogram_type=DOUBLE_PREC_HB #
use_stat_tables=PREFERABLY # Changes optimizer
optimizer_use_condition_selectivity=4 # behavior
● MariaDB 10.4
analyze_sample_percentage=0
● Both: ANALYZE TABLE ... PERSISTENT FOR ...
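The settings named in these slides can also be tried in a single session before touching my.cnf. The following is a minimal sketch that assumes the table t1 from the earlier slides and a server where these variables can be set per session; adjust it to the server version as noted in the comments.

```sql
-- Collection settings (per the earlier slide, already the defaults in 10.4).
SET SESSION histogram_size = 254;
SET SESSION histogram_type = 'DOUBLE_PREC_HB';

-- MariaDB 10.4 only: let the server pick the sample size.
SET SESSION analyze_sample_percentage = 0;

-- Collect engine-independent statistics, including histograms.
ANALYZE TABLE t1 PERSISTENT FOR ALL;

-- Before 10.4: make the optimizer use them (this changes query plans).
SET SESSION use_stat_tables = 'PREFERABLY';
SET SESSION optimizer_use_condition_selectivity = 4;
```

Session scope keeps the plan changes limited to one connection, which makes it easy to compare plans with and without histograms before changing the global configuration.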
  • 26. Can I just have histograms for all columns?
  • 27. A stored procedure to analyze every table
CREATE PROCEDURE analyze_persistent_for_all(db_name VARCHAR(64))
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE x VARCHAR(64);
  -- All base tables (no views) in the given database
  DECLARE cur1 CURSOR FOR
    SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA=db_name;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
  OPEN cur1;
  read_loop: LOOP
    FETCH cur1 INTO x;
    IF done THEN
      LEAVE read_loop;
    END IF;
    -- Qualify and quote the name so the right table is analyzed
    -- even when db_name is not the current database
    SET @sql = CONCAT('analyze table `', db_name, '`.`', x, '` persistent for all');
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
  END LOOP;
  CLOSE cur1;
END|
  • 28. Should I ANALYZE ... PERSISTENT every table?
● New application
– Worth giving it a try
– Provision for periodic ANALYZE
– Column correlations?
● Existing application
– Performance fixes on a case-by-case basis
  • 30. TPC-DS benchmark
● scale=1
● The same dataset
– without histograms: ~20 min
– after call analyze_persistent_for_all('tpcds') (the stored procedure shown earlier): 5 min
  • 32. A customer case with ORDER BY ... LIMIT
● table/column names replaced
CREATE TABLE cars (
  type varchar(10),
  company varchar(20),
  model varchar(20),
  quantity int,
  KEY quantity (quantity),
  KEY type (type)
);
select * from cars
where type='electric' and company='audi'
order by quantity
limit 3;
● quantity matches the ORDER BY, but rows still need to match the condition
● type is a restrictive index
  • 33. A customer case with ORDER BY ... LIMIT
● Uses ORDER-BY compatible index by default
select * from cars
where type='electric' and company='audi'
order by quantity
limit 3;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cars
         type: index
possible_keys: type
          key: quantity
      key_len: 5
          ref: const
         rows: 994266
       r_rows: 700706.00
     filtered: 0.20
   r_filtered: 0.00
        Extra: Using where
1 row in set (2.098 sec)
  • 34. A customer case with ORDER BY ... LIMIT
● Providing the optimizer with histogram
analyze table cars persistent for all;
select * from cars
where type='electric' and company='audi'
order by quantity
limit 3;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cars
         type: ref
possible_keys: type
          key: type
      key_len: 13
          ref: const
         rows: 2022
       r_rows: 3.00
     filtered: 100.00
   r_filtered: 100.00
        Extra: Using index condition; Using where; Using filesort
1 row in set (0.010 sec)
  • 36. Histograms are stored in a table
CREATE TABLE mysql.column_stats (
  db_name varchar(64) NOT NULL,
  table_name varchar(64) NOT NULL,
  column_name varchar(64) NOT NULL,
  min_value varbinary(255) DEFAULT NULL,
  max_value varbinary(255) DEFAULT NULL,
  nulls_ratio decimal(12,4) DEFAULT NULL,
  avg_length decimal(12,4) DEFAULT NULL,
  avg_frequency decimal(12,4) DEFAULT NULL,
  hist_size tinyint unsigned,
  hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
  histogram varbinary(255),
  PRIMARY KEY (db_name,table_name,column_name)
);
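Because the histogram column is binary, it is easier to inspect through the built-in DECODE_HISTOGRAM() function. The query below is a sketch that assumes statistics were collected for the table test.t1 used on the earlier slides; substitute your own schema and table names.

```sql
-- List collected column statistics and show each histogram in readable form.
-- DECODE_HISTOGRAM() returns a comma-separated string of numbers
-- describing the value distribution the histogram represents.
SELECT table_name,
       column_name,
       min_value,
       max_value,
       hist_size,
       hist_type,
       DECODE_HISTOGRAM(hist_type, histogram) AS decoded_histogram
FROM mysql.column_stats
WHERE db_name = 'test'
  AND table_name = 't1';
```

Columns with no histogram collected will show NULL in the histogram-related fields.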
  • 37. TPC-DS benchmark
● Can save/restore histograms
● Can set @@optimizer_use_condition_selectivity to disable histogram use per-thread
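Since histograms live in an ordinary table, saving and restoring them can be sketched with plain SQL. The backup table name test.column_stats_backup is hypothetical, the schema name tpcds is taken from the benchmark slide, and the FLUSH TABLES step reflects an assumption that the server caches statistics and re-reads them when tables are reopened.

```sql
-- Save the current column statistics (hypothetical backup table name).
CREATE TABLE test.column_stats_backup AS
  SELECT * FROM mysql.column_stats WHERE db_name = 'tpcds';

-- Later: put the saved rows back, replacing whatever is there now.
REPLACE INTO mysql.column_stats
  SELECT * FROM test.column_stats_backup;

-- Assumption: cached statistics are re-read after tables are reopened.
FLUSH TABLES;

-- Per-session switch: level 1 is the pre-10.4 default, which does not use histograms.
SET SESSION optimizer_use_condition_selectivity = 1;
```

Setting the variable back to 4 in the same session re-enables histogram use, which allows a side-by-side comparison of plans.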
  • 39. Problem with correlated conditions
select ...
from order_items
where shipdate='2015-12-15' AND
      item_name='christmas light'   -- or: 'swimsuit'
● Possible selectivities
– MIN(1/n, 1/m)
– (1/n) * (1/m)
– 0
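A small worked example shows how far apart these three estimates can be. It assumes (the slide does not say so) that n and m are the numbers of distinct values of shipdate and item_name, and the figures 365 and 1000 are made up for illustration.

```latex
\begin{align*}
n &= 365, \qquad m = 1000 \\
\text{fully correlated:} \quad
  & \min\!\left(\tfrac{1}{n}, \tfrac{1}{m}\right) = \tfrac{1}{1000} = 10^{-3} \\
\text{independent:} \quad
  & \tfrac{1}{n} \cdot \tfrac{1}{m} = \tfrac{1}{365\,000} \approx 2.7 \times 10^{-6} \\
\text{mutually exclusive:} \quad
  & 0
\end{align*}
```

Under these assumptions the first two estimates differ by a factor of 365, so on a large table the choice of assumption alone could change which plan looks cheapest.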
  • 40. Problem with correlated conditions
select ...
from order_items
where shipdate='2015-12-15' AND
      item_name='christmas light'   -- or: 'swimsuit'
● PostgreSQL: Multi-variate statistics
– Detects functional dependencies, col1=F(col2)
– Only used for equality predicates
– Also #DISTINCT(a,b)
● MariaDB: MDEV-11107: Use table check constraints in optimizer
– In development