SlideShare une entreprise Scribd logo
1  sur  46
Télécharger pour lire hors ligne
MySQL查询调优实践

            David Jiang
        Weibo: insidemysql
Gtalk: jiangchengyao@gmail.com

                            DTCC2012
主题
• B+树索引
• 简单查询优化
 – OLTP
• 复杂查询优化
 – OLAP




                DTCC2012
关于我
• 近10年MySQL数据库使用经验
 – MySQL 3.23 ~ MySQL 5.6
• InnoSQL分支版本创始人
 – www.innomysql.org
• 独立数据库咨询顾问
 – www.innosql.com
• 《MySQL技术内幕》系列作者
 – 《MySQL技术内幕: InnoDB存储引擎》(已出版)
 – 《MySQL技术内幕: SQL编程》(已出版)
 – 《MySQL技术内幕: 性能调优》(待出版)

                            DTCC2012
B+树索引
• 常用的索引
 – B+ Tree Index
 – T Tree Index
 – Hash Index
• 什么是索引?
 – 提高查询速度???
 – It depends
 – 减少IO次数
                           DTCC2012
B+树索引
• 聚集索引(Clustered Index)
  – 叶子节点存放整行记录
• 辅助索引(Secondary Index)
  – 叶子节点存放row identifier
    • InnoDB:primary key
       – 书签查找(bookmark lookup)
          » 查找代价大
    • MyISAM:物理位置(偏移量)
       – 更新代价大
• B+树的高度=> IO次数=>随机IO
  – 3~4层
                                 DTCC2012
B+树索引-InnoDB存储引擎




              DTCC2012
B+树索引-MyISAM存储引擎




             DTCC2012
B+树索引
•   聚集索引 VS 辅助索引
    –   Clustered index key = 4 bytes
    –   Secondary index key = 4 bytes
    –   Key pointer = 6 bytes
    –   Average row length = 300 bytes
    –   Page size = 16K = 16384 bytes
    –   Average node occupancy = 70% ( both for leaf and index page )
    –   Fan-out for clustered index = 16384 * 70% / ( 4+6 ) = 1000
    –   Fan-out for secondary index = 16384 * 70% / ( 4+ 6 ) = 1000
    –   Average row per page for clustered index = 16384 * 70% / 300 = 35
    –   Average row per page for clustered index = 16384 * 70% / ( 4 + 6 ) = 1000

              H     Clustered Index                       Secondary Index
              2     1000*35 = 35,000                       1000 * 1000 = 1,000,000
              3     (1000)2*35 = 35,000,000               (1000)2*1000 = 1,000,000,000
              4     (1000)3*35 = 35,000,000,000           (1000)3*1000 = 1,000,000,000,000
                                                                             DTCC2012
B+树索引
• 辅助索引的优势
 – 树的高度较小=>需要的IO次数少
 – 树的大小较小=>scan需要扫描的页较少
 – 优化器倾向于使用辅助索引
• 辅助索引的劣势
 – 查找完整记录还需查询
  • InnoDB:查询聚集索引
  • MyISAM:直接查找MYD物理位置

                         DTCC2012
B+树索引-InnoDB索引
     CREATE TABLE UserInfo (
     userid INT NOT NULL AUTO_INCREMENT,
     username VARCHAR(30),
     registdate DATETIME,
     email     VARCHAR(50),
     PRIMARY KEY (userid),
     UNIQUE KEY idx_username (username),
     KEY idx_registdate (registdate)
     )Engine=InnoDB;


CREATE TABLE UserInfo (                    CREATE TABLE idx_username (   CREATE TABLE idx_registdate (
userid INT NOT NULL AUTO_INCREMENT,        userid INT NOT NULL,          userid INT NOT NULL,
username VARCHAR(30),                      username VARCHAR(30),         registdate DATETIME),
registdate DATETIME,                       PRIMARY KEY (username,userid) PRIMARY KEY (registdate,userid)
email      VARCHAR(50),                    );                            );
PRIMARY KEY (userid)                                                            DTCC2012
);
B+树索引-InnoDB索引




             DTCC2012
B+树索引-插入顺序问题
• 聚集索引插入
 – 主键是自增长的
 – 插入是顺序的
 – 每页中的填充率高(15/16)
 – 顺序扫描(Scan)可以达到磁盘顺序读的速率
 – 一般不推荐使用UUID
  • 插入非顺序



                    DTCC2012
B+树索引-插入顺序问题
• 辅助索引插入
 – 插入的顺序是乱序的
  • 插入(’David’, ‘Monty’, ‘Jimmy’, ‘Amy’, ‘Michael’ )
 – 插入的顺序是顺序的
  • 插入时间
 – 需要产生B+树的分裂
  • 需要较大的开销
 – 每页的填充率较低(60%~70%)
 – 顺序扫描不能达到磁盘顺序读的速率
  • 若插入是乱序的
                                            DTCC2012
B+树索引-插入顺序问题
• 辅助索引顺序扫描速度
   – select count(1) from stock
        • KEY `fkey_stock_2` (`s_i_id`), INT
        • Avg row length: 355


辅助索引:6分42秒
  •~4M/秒
  •Logical_reads: 12700001 Physical_reads: 100057

强制聚集索引:4分38秒
  •~120~130M/秒
  •Logical_reads: 14670405 Physical_reads: 2170333
                                                     DTCC2012
简单查询优化
• 简单查询
 – OLTP
• 简单查询特点
 – SQL语句较为简单
 – 返回少部分数据
 – 并发量大
• 优化原则
 – 减少随机读=> 使用索引
   • High Cardinality
                        DTCC2012
简单查询优化
• SELECT … FROM table where primary_key = ???
• SELECT … FROM table where key = ???




                                   DTCC2012
简单查询优化-复合索引
• 索引键值为多个列
 – (a,b)




 a : 1,1,2,2,3,3
 (a, b): (1,1),(1,2),(2,1),(2,4),(3,1),(3,2)
 b: 1,2,1,4,1,2
                                               DTCC2012
简单查询优化-复合索引
• 复合索引(a,b)可被使用于:
 •   SELECT * FROM t WHERE a = ?
 •   SELECT * FROM t WHERE a = ? AND b = ?
 •   SELECT * FROM t WHERE a = ? ORDER BY b
 •   索引覆盖
     – 查询b也可以使用该索引
     – WHERE b = ???




                                              DTCC2012
简单查询优化-索引覆盖
• 从辅助索引直接得到结果
  – 不需要书签查找
• (primary key1,primary key2,…,key1,
  key2,…)
  – SELECT key2 FROM table WHERE key1=xxx;
  – SELECT primary key2,key2 FROM table WHERE
    key1=xxx;
  – SELECT primary key1,key2 FROM table WHERE
    key1=xxx;
  – SELECT primary key1,primary key2,key2 FROM table
    WHERE key1=xxx;

                                          DTCC2012
简单查询优化-索引覆盖
CREATE TABLE ItemLog(
logId INT NOT NULL AUTO_INCREMENT,
userId VARCHAR(100) NOT NULL,
itemId INT NOT NULL,
date DATETIME NOT NULL,
……
PRIMARY KEY(logId),
KEY idx_userId_date (userId,date)
)ENGINE=INNODB;


SELECT COUNT(1) FROM ItemLog WHERE date>='2012-04-01' AND date<'2012-05-01';




                                                             DTCC2012
简单查询优化-书签查找优化
CREATE TABLE UserInfo (
userid INT NOT NULL AUTO_INCREMENT,
username VARCHAR(30),
registdate DATETIME,
email     VARCHAR(50),
PRIMARY KEY (userid),
UNIQUE KEY idx_username (username),
KEY idx_registdate (registdate)
)Engine=InnoDB;

SELECT email FROM UserInfo WHERE username = ‘David’

If:聚集索引高度:4 && 辅助索引高度:3
Then: 一共需要7次逻辑IO


                                                      DTCC2012
简单查询优化-书签查找优化
• 分表
   – Like Index Coverage
CREATE TABLE UserInfo (               CREATE TABLE UserInfoDetail (
userid INT NOT NULL AUTO_INCREMENT,   userid INT NOT NULL AUTO_INCREMENT,
username VARCHAR(30),                 username VARCHAR(30),
registdate DATETIME,                  email     VARCHAR(50),
PRIMARY KEY (userid),                 PRIMARY KEY (userid),
UNIQUE KEY idx_username (username),   UNIQUE KEY idx_username (username)
KEY idex_registdate (registdate)      );
);

SELECT email FROM UserInfoDetail WHERE username = ‘David’
If:辅助索引高度:3
Then:逻辑IO减少为3
                                                            DTCC2012
简单查询优化-总结
• 每个页填充率高
 – 包含的记录多
• 减少IO次数
 – IO => 性能
 – OLTP only
• 利用索引覆盖技术避免书签查找
• 利用分表技术避免书签查找

                    DTCC2012
复杂查询优化
• 复杂查询
 – OLAP
     • JOIN
     • Subquery
• 复杂查询特点
 –   数据量大
 –   并发少
 –   需访问较多的数据
 –   索引并不再是唯一的优化方向
 –   调优工作复杂
                           DTCC2012
复杂查询优化-JOIN
• MySQL JOIN 类型
  – Simple Nested Loops Join
  – Block Nested Loops Join
     • MySQL 5.5+
  – Classic Hash Join
     • MariaDB 5.3+
     • Block Hash Nested Loops Join



                                      DTCC2012
复杂查询优化-SNLJ
For each tuple r in R do
     For each tuple s in S do
          If r and s satisfy the join condition
                Then output the tuple <r,s>


For each tuple r in R do
    lookup r join condition in S index
         if found s == r
                    Then output the tuple <r,s>

Scan cost (no index) = Rn+Rn×Sn = O(Rn×Sn)

Scan cost (with index) = Rn + Rn × SBH = = O(Rn)

                                                   DTCC2012
复杂查询优化-SNLJ
• INNER JOIN with Index
   – INNER JOIN联接顺序可更改
   – 优化器喜欢选择较小的表作为外部表
   – Scan cost (with index) = Rn + Rn × SBH = = O(Rn)
  SELECT b.emp_no,a.title,a.from_date,a.to_date
  FROM titles a
  INNER JOIN employees b
  on a.emp_no = b.emp_no;



  mysql> SELECT COUNT(1) FROM employeesG;
  *************************** 1. row ***************************
  count(1): 300024

  mysql> SELECT COUNT(1) FROM titlesG;
  *************************** 1. row ***************************
                                                                   DTCC2012
  count(1): 443308
复杂查询优化-SNLJ
SELECT b.emp_no,a.title,a.from_date,a.to_date
FROM titles a
STRAIGHT_JOIN employees b
on a.emp_no = b.emp_no;




                                                DTCC2012
复杂查询优化-BNLJ
For each tuple r in R do
   store used columns as p from R in join buffer
   For each tuple s in S do
           If p and s satisfy the join condition
                   Then output the tuple <p,s>


•The join_buffer_size system variable determines the size of each join buffer
•Join buffering can be used when the join is of type ALL or index or range.
•One buffer is allocated for each join that can be buffered, so a given query might be
processed using multiple join buffers.
•Only columns of interest to the join are stored in the join buffer, not whole rows.

  For join with no index
  Reduce inner table scan times
                                                                      DTCC2012
复杂查询优化-Example
• Outer table R
  – (1,’a’),(2,’b’),(3,’c’),(4,’d’)
• Inner table S:
  – (1,’2010-01-01’),(2, ’2010-01-01’),(3, ’2010-01-01’)

                             NLJ         BNLJ
        Outer table scan     1           1
        Inner table scan     4           1
        Compare times        12          12

                                                DTCC2012
复杂查询优化-BNLJ
SELECT b.emp_no,a.title,a.from_date,a.to_date
FROM employees_noindex b
INNER JOIN titles_noindex a ON a.emp_no = b.emp_no
WHERE b.birth_date >= '1965-01-01';




                                             DTCC2012
复杂查询优化-BNLJ
SELECT b.emp_no,a.title,a.from_date,a.to_date
FROM titles_noindex a
LEFT OUTER JOIN employees_noindex b ON a.emp_no = b.emp_no
WHERE b.birth_date >= '1965-01-01';



MySQL 5.5 709.645 sec (MySQL 5.5不支持OUTER JOIN)




MySQL 5.6 57.483 sec ~12x




                                                             DTCC2012
复杂查询优化-BHNJ
For each tuple r in R do
    store used columns as p from R in join buffer
    build hash table according join buffer
     for each tuple s in S do
           probe hash table
           if find
                   Then output the tuple <r,s>

• 通过哈希表减少内部表的比较次数
• Join Buffer Size的大小决定了Classic Hash Join的效率
• 只能用于等值联接

 Scan cost = Rn + Sn = = O(Rn + Sn )


                                                    DTCC2012
复杂查询优化-Example
• Outer table
  – (1,’a’),(2,’b’),(3,’c’),(4,’d’)
• Inner table:
  – (1,’2010-01-01’),(2, ’2010-01-01’),(3, ’2010-01-01’)
                           NLJ        BNLJ   BNLJH
        Outer table scan   1          1      1
        Inner table scan   4          1      1
        Compare times      12         12     3



                                             DTCC2012
复杂查询优化-BHNJ
SELECT MAX(l_extendedprice)
FROM orders, lineitem WHERE
o_orderdate BETWEEN '1995-01-01' AND '1995-01-31' AND
l_orderkey=o_orderkey;

MySQL 5.5 125.3sec




MariaDB 5.3 23.104sec ~5x




                                                        DTCC2012
复杂查询优化-BHNJ

                                 执行速度(秒)    逻辑IO

MySQL 5.5                        125.3      157061

MySQL 5.5 (强制使用i_l_orderkey索引)   153.021    406879

MariaDB 5.3 (BNLH)               23.104     6077083




                                           DTCC2012
复杂查询优化-JOIN总结
• SNLJ
  – 适合较小表之间的联接
  – 联接的列需包含有索引
• BHNLJ
  – 大表之间的等值联接操作
  – 大表与小表之间的等值联接操作
  – Join Buffer Size的大小决定了内部表的扫描次数
  – 推荐MariaDB作为数据集市或者数据仓库
                           DTCC2012
查询优化-子查询
• 子查询
 – 独立子查询
 – 相关子查询
• MySQL子查询
 – 独立子查询转换为相关子查询(SQL重写 LAZY)
   • SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
   • SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2
     WHERE t2.b = t1.a);
   • Scan cost: O(A+A*B)
 – 优化视情况而定

                                                 DTCC2012
查询优化-子查询
SELECT orderid,customerid,employeeid,orderdate
FROM orders
WHERE orderdate IN
         ( SELECT MAX(orderdate)
                  FROM orders
                  GROUP BY (DATE_FORMAT(orderdate,'%Y%m'))
         )
# Time: 111227 23:49:16
# User@Host: root[root] @ localhost [127.0.0.1]
# Query_time: 6.081214 Lock_time: 0.046800 Rows_sent: 42 Rows_examined: 727558 Logical_reads: 91584 Physical_reads: 19
use tpcc;
SET timestamp=1325000956;
SELECT orderid,customerid,employeeid,orderdate
FROM orders
WHERE orderdate IN
              ( SELECT MAX(orderdate)
                            FROM orders
                            GROUP BY (DATE_FORMAT(orderdate,'%Y%M'))
              );                                                                                DTCC2012
查询优化-子查询


SELECT orderid,customerid,employeeid,orderdate
FROM orders AS A
WHERE EXISTS
         ( SELECT *
                   FROM orders
                   GROUP BY(DATE_FORMAT(orderdate,'%Y%M'))
                   HAVING MAX(orderdate)= A.OrderDate
         );




                                                             DTCC2012
查询优化-子查询
SELECT orderid,customerid,employeeid,orderdate
FROM orders A
WHERE EXISTS
         ( SELECT * FROM (SELECT MAX(orderdate) AS orderdate
                  FROM orders
                  GROUP BY (DATE_FORMAT(orderdate,'%Y%M')) ) B
                  WHERE A.orderdate = B.orderdate
         );




# Time: 111227 23:45:49
# User@Host: root[root] @ localhost [127.0.0.1]
# Query_time: 0.251133 Lock_time: 0.052001 Rows_sent: 42 Rows_examined: 1729 Logical_reads: 1923
Physical_reads: 25                                                               DTCC2012
查询优化-子查询
SELECT orderid,customerid,employeeid,A.orderdate
FROM orders AS A,
( SELECT MAX(orderdate) AS orderdate
                  FROM orders
                  GROUP BY (DATE_FORMAT(orderdate,'%Y%m'))
) AS B
WHERE A.orderdate = B.orderdate;




# User@Host: root[root] @ localhost [127.0.0.1]
# Thread_id: 1 Schema: tpcc QC_hit: No
# Query_time: 0.296897 Lock_time: 0.212167 Rows_sent: 42 Rows_examined: 941 Logical_reads: 1258,
Physical_reads: 28.
                                                                                 DTCC2012
查询优化-子查询
  •    MySQL 5.6/MariaDB对子查询的优化
        – 支持对于独立子查询的优化
              •   SEMI JOIN
        – 支持各种类型的子查询优化
        – SET optimizer_switch='semijoin=on, materialization=on';




# User@Host: root[root] @ localhost [127.0.0.1]
# Thread_id: 1 Schema: tpcc QC_hit: No
# Query_time: 0.176819 Lock_time: 0.147888 Rows_sent: 42 Rows_examined: 1729 Logical_IO: 1927, Physical_IO: 29.




                                                                                     DTCC2012
查询优化-子查询总结
•   通过EXPLAIN分析执行计划
•   避免IN => EXISTS转换带来的高额开销
•   避免多次的关联查询操作
•   使用MySQL 5.6/MariaDB 5.3处理复杂子查询
    操作
    – SQL优化器自动优化



                            DTCC2012
参考资料
• http://dev.mysql.com/doc/
• http://kb.askmonty.org/en/
• 《MySQL技术内幕》
  – 《MySQL技术内幕:InnoDB存储引擎》
  – 《MySQL技术内幕:SQL编程》




                               DTCC2012
Q&A

      DTCC2012

Contenu connexe

Tendances

Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace WalkthroughSergey Petrunya
 
Using JSON with MariaDB and MySQL
Using JSON with MariaDB and MySQLUsing JSON with MariaDB and MySQL
Using JSON with MariaDB and MySQLAnders Karlsson
 
MySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table Expressions MySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table Expressions oysteing
 
How to analyze and tune sql queries for better performance
How to analyze and tune sql queries for better performanceHow to analyze and tune sql queries for better performance
How to analyze and tune sql queries for better performanceoysteing
 
How to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinarHow to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinaroysteing
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesSergey Petrunya
 
PostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practicePostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practiceJano Suchal
 
MySQL Optimizer: What’s New in 8.0
MySQL Optimizer: What’s New in 8.0MySQL Optimizer: What’s New in 8.0
MySQL Optimizer: What’s New in 8.0oysteing
 
SQL window functions for MySQL
SQL window functions for MySQLSQL window functions for MySQL
SQL window functions for MySQLDag H. Wanvik
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Sergey Petrunya
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performanceSergey Petrunya
 
Common Table Expressions (CTE) & Window Functions in MySQL 8.0
Common Table Expressions (CTE) & Window Functions in MySQL 8.0Common Table Expressions (CTE) & Window Functions in MySQL 8.0
Common Table Expressions (CTE) & Window Functions in MySQL 8.0oysteing
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0oysteing
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLSergey Petrunya
 
How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016oysteing
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015Dave Stokes
 
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015Dave Stokes
 
Indexing and Query Optimizer (Richard Kreuter)
Indexing and Query Optimizer (Richard Kreuter)Indexing and Query Optimizer (Richard Kreuter)
Indexing and Query Optimizer (Richard Kreuter)MongoDB
 

Tendances (20)

Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
 
Using JSON with MariaDB and MySQL
Using JSON with MariaDB and MySQLUsing JSON with MariaDB and MySQL
Using JSON with MariaDB and MySQL
 
MySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table Expressions MySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table Expressions
 
How to analyze and tune sql queries for better performance
How to analyze and tune sql queries for better performanceHow to analyze and tune sql queries for better performance
How to analyze and tune sql queries for better performance
 
How to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinarHow to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinar
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
 
PostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practicePostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practice
 
MySQL Optimizer: What’s New in 8.0
MySQL Optimizer: What’s New in 8.0MySQL Optimizer: What’s New in 8.0
MySQL Optimizer: What’s New in 8.0
 
Sql
SqlSql
Sql
 
SQL window functions for MySQL
SQL window functions for MySQLSQL window functions for MySQL
SQL window functions for MySQL
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
 
Common Table Expressions (CTE) & Window Functions in MySQL 8.0
Common Table Expressions (CTE) & Window Functions in MySQL 8.0Common Table Expressions (CTE) & Window Functions in MySQL 8.0
Common Table Expressions (CTE) & Window Functions in MySQL 8.0
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQL
 
How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
 
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015
 
ROracle
ROracle ROracle
ROracle
 
Indexing and Query Optimizer (Richard Kreuter)
Indexing and Query Optimizer (Richard Kreuter)Indexing and Query Optimizer (Richard Kreuter)
Indexing and Query Optimizer (Richard Kreuter)
 

Similaire à My sql查询优化实践

Covering indexes
Covering indexesCovering indexes
Covering indexesMYXPLAIN
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Citus Data
 
Why Use EXPLAIN FORMAT=JSON?
 Why Use EXPLAIN FORMAT=JSON?  Why Use EXPLAIN FORMAT=JSON?
Why Use EXPLAIN FORMAT=JSON? Sveta Smirnova
 
Automated Slow Query Analysis: Dex the Index Robot
Automated Slow Query Analysis: Dex the Index RobotAutomated Slow Query Analysis: Dex the Index Robot
Automated Slow Query Analysis: Dex the Index RobotMongoDB
 
30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practices30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practicesDavid Dhavan
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentationMichael Keane
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksMYXPLAIN
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDBMongoDB
 
SQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis ReznikSQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis ReznikSigma Software
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB
 
How mysql choose the execution plan
How mysql choose the execution planHow mysql choose the execution plan
How mysql choose the execution plan辛鹤 李
 
SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Databasewangzhonnew
 
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...EDB
 
Migration from mysql to elasticsearch
Migration from mysql to elasticsearchMigration from mysql to elasticsearch
Migration from mysql to elasticsearchRyosuke Nakamura
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL IndexingHoang Long
 
The internals of Spark SQL Joins, Dmytro Popovich
The internals of Spark SQL Joins, Dmytro PopovichThe internals of Spark SQL Joins, Dmytro Popovich
The internals of Spark SQL Joins, Dmytro PopovichSigma Software
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at ScaleMongoDB
 

Similaire à My sql查询优化实践 (20)

Covering indexes
Covering indexesCovering indexes
Covering indexes
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 
Why Use EXPLAIN FORMAT=JSON?
 Why Use EXPLAIN FORMAT=JSON?  Why Use EXPLAIN FORMAT=JSON?
Why Use EXPLAIN FORMAT=JSON?
 
Quick Wins
Quick WinsQuick Wins
Quick Wins
 
Automated Slow Query Analysis: Dex the Index Robot
Automated Slow Query Analysis: Dex the Index RobotAutomated Slow Query Analysis: Dex the Index Robot
Automated Slow Query Analysis: Dex the Index Robot
 
30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practices30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practices
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Se2017 query-optimizer
Se2017 query-optimizerSe2017 query-optimizer
Se2017 query-optimizer
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
 
JSSUG: SQL Sever Index Tuning
JSSUG: SQL Sever Index TuningJSSUG: SQL Sever Index Tuning
JSSUG: SQL Sever Index Tuning
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
SQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis ReznikSQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis Reznik
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
How mysql choose the execution plan
How mysql choose the execution planHow mysql choose the execution plan
How mysql choose the execution plan
 
SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Database
 
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
 
Migration from mysql to elasticsearch
Migration from mysql to elasticsearchMigration from mysql to elasticsearch
Migration from mysql to elasticsearch
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL Indexing
 
The internals of Spark SQL Joins, Dmytro Popovich
The internals of Spark SQL Joins, Dmytro PopovichThe internals of Spark SQL Joins, Dmytro Popovich
The internals of Spark SQL Joins, Dmytro Popovich
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 

Dernier

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Dernier (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

My sql查询优化实践

  • 1. MySQL查询调优实践 David Jiang Weibo: insidemysql Gtalk: jiangchengyao@gmail.com DTCC2012
  • 2. 主题 • B+树索引 • 简单查询优化 – OLTP • 复杂查询优化 – OLAP DTCC2012
  • 3. 关于我 • 近10年MySQL数据库使用经验 – MySQL 3.23 ~ MySQL 5.6 • InnoSQL分支版本创始人 – www.innomysql.org • 独立数据库咨询顾问 – www.innosql.com • 《MySQL技术内幕》系列作者 – 《MySQL技术内幕: InnoDB存储引擎》(已出版) – 《MySQL技术内幕: SQL编程》(已出版) – 《MySQL技术内幕: 性能调优》(待出版) DTCC2012
  • 4. B+树索引 • 常用的索引 – B+ Tree Index – T Tree Index – Hash Index • 什么是索引? – 提高查询速度??? – It depends – 减少IO次数 DTCC2012
  • 5. B+树索引 • 聚集索引(Clustered Index) – 叶子节点存放整行记录 • 辅助索引(Secondary Index) – 叶子节点存放row identifier • InnoDB:primary key – 书签查找(bookmark lookup) » 查找代价大 • MyISAM:物理位置(偏移量) – 更新代价大 • B+树的高度=> IO次数=>随机IO – 3~4层 DTCC2012
  • 8. B+树索引 • 聚集索引 VS 辅助索引 – Clustered index key = 4 bytes – Secondary index key = 4 bytes – Key pointer = 6 bytes – Average row length = 300 bytes – Page size = 16K = 16384 bytes – Average node occupancy = 70% ( both for leaf and index page ) – Fan-out for clustered index = 16384 * 70% / ( 4+6 ) = 1000 – Fan-out for secondary index = 16384 * 70% / ( 4+ 6 ) = 1000 – Average row per page for clustered index = 16384 * 70% / 300 = 35 – Average row per page for clustered index = 16384 * 70% / ( 4 + 6 ) = 1000 H Clustered Index Secondary Index 2 1000*35 = 35,000 1000 * 1000 = 1,000,000 3 (1000)2*35 = 35,000,000 (1000)2*1000 = 1,000,000,000 4 (1000)3*35 = 35,000,000,000 (1000)3*1000 = 1,000,000,000,000 DTCC2012
  • 9. B+树索引 • 辅助索引的优势 – 树的高度较小=>需要的IO次数少 – 树的大小较小=>scan需要扫描的页较少 – 优化器倾向于使用辅助索引 • 辅助索引的劣势 – 查找完整记录还需查询 • InnoDB:查询聚集索引 • MyISAM:直接查找MYD物理位置 DTCC2012
  • 10. B+树索引-InnoDB索引 CREATE TABLE UserInfo ( userid INT NOT NULL AUTO_INCREMENT, username VARCHAR(30), registdate DATETIME, email VARCHAR(50), PRIMARY KEY (userid), UNIQUE KEY idx_username (username), KEY idx_registdate (registdate) )Engine=InnoDB; CREATE TABLE UserInfo ( CREATE TABLE idx_username ( CREATE TABLE idx_registdate ( userid INT NOT NULL AUTO_INCREMENT, userid INT NOT NULL, userid INT NOT NULL, username VARCHAR(30), username VARCHAR(30), registdate DATETIME), registdate DATETIME, PRIMARY KEY (username,userid) PRIMARY KEY (registdate,userid) email VARCHAR(50), ); ); PRIMARY KEY (userid) DTCC2012 );
  • 12. B+树索引-插入顺序问题 • 聚集索引插入 – 主键是自增长的 – 插入是顺序的 – 每页中的填充率高(15/16) – 顺序扫描(Scan)可以达到磁盘顺序读的速率 – 一般不推荐使用UUID • 插入非顺序 DTCC2012
  • 13. B+树索引-插入顺序问题 • 辅助索引插入 – 插入的顺序是乱序的 • 插入(’David’, ‘Monty’, ‘Jimmy’, ‘Amy’, ‘Michael’ ) – 插入的顺序是顺序的 • 插入时间 – 需要产生B+树的分裂 • 需要较大的开销 – 每页的填充率较低(60%~70%) – 顺序扫描不能达到磁盘顺序读的速率 • 若插入是乱序的 DTCC2012
  • 14. B+树索引-插入顺序问题 • 辅助索引顺序扫描速度 – select count(1) from stock • KEY `fkey_stock_2` (`s_i_id`), INT • Avg row length: 355 辅助索引:6分42秒 •~4M/秒 •Logical_reads: 12700001 Physical_reads: 100057 强制聚集索引:4分38秒 •~120~130M/秒 •Logical_reads: 14670405 Physical_reads: 2170333 DTCC2012
  • 15. 简单查询优化 • 简单查询 – OLTP • 简单查询特点 – SQL语句较为简单 – 返回少部分数据 – 并发量大 • 优化原则 – 减少随机读=> 使用索引 • High Cardinality DTCC2012
  • 16. 简单查询优化 • SELECT … FROM table where primary_key = ??? • SELECT … FROM table where key = ??? DTCC2012
  • 17. 简单查询优化-复合索引 • 索引键值为多个列 – (a,b) a : 1,1,2,2,3,3 (a, b): (1,1),(1,2),(2,1),(2,4),(3,1),(3,2) b: 1,2,1,4,1,2 DTCC2012
  • 18. 简单查询优化-复合索引 • 复合索引(a,b)可被使用于: • SELECT * FROM t WHERE a = ? • SELECT * FROM t WHERE a = ? AND b = ? • SELECT * FROM t WHERE a = ? ORDER BY b • 索引覆盖 – 查询b也可以使用该索引 – WHERE b = ??? DTCC2012
  • 19. 简单查询优化-索引覆盖 • 从辅助索引直接得到结果 – 不需要书签查找 • (primary key1,primary key2,…,key1, key2,…) – SELECT key2 FROM table WHERE key1=xxx; – SELECT primary key2,key2 FROM table WHERE key1=xxx; – SELECT primary key1,key2 FROM table WHERE key1=xxx; – SELECT primary key1,primary key2,key2 FROM table WHERE key1=xxx; DTCC2012
  • 20. 简单查询优化-索引覆盖 CREATE TABLE ItemLog( logId INT NOT NULL AUTO_INCREMENT, userId VARCHAR(100) NOT NULL, itemId INT NOT NULL, date DATETIME NOT NULL, …… PRIMARY KEY(logId), KEY idx_userId_date (userId,date) )ENGINE=INNODB; SELECT COUNT(1) FROM ItemLog WHERE date>='2012-04-01' AND date<'2012-05-01'; DTCC2012
  • 21. 简单查询优化-书签查找优化 CREATE TABLE UserInfo ( userid INT NOT NULL AUTO_INCREMENT, username VARCHAR(30), registdate DATETIME, email VARCHAR(50), PRIMARY KEY (userid), UNIQUE KEY idx_username (username), KEY idx_registdate (registdate) )Engine=InnoDB; SELECT email FROM UserInfo WHERE username = ‘David’ If:聚集索引高度:4 && 辅助索引高度:3 Then: 一共需要7次逻辑IO DTCC2012
  • 22. 简单查询优化-书签查找优化 • 分表 – Like Index Coverage CREATE TABLE UserInfo ( CREATE TABLE UserInfoDetail ( userid INT NOT NULL AUTO_INCREMENT, userid INT NOT NULL AUTO_INCREMENT, username VARCHAR(30), username VARCHAR(30), registdate DATETIME, email VARCHAR(50), PRIMARY KEY (userid), PRIMARY KEY (userid), UNIQUE KEY idx_username (username), UNIQUE KEY idx_username (username) KEY idex_registdate (registdate) ); ); SELECT email FROM UserInfoDetail WHERE username = ‘David’ If:辅助索引高度:3 Then:逻辑IO减少为3 DTCC2012
  • 23. 简单查询优化-总结 • 每个页填充率高 – 包含的记录多 • 减少IO次数 – IO => 性能 – OLTP only • 利用索引覆盖技术避免书签查找 • 利用分表技术避免书签查找 DTCC2012
  • 24. 复杂查询优化 • 复杂查询 – OLAP • JOIN • Subquery • 复杂查询特点 – 数据量大 – 并发少 – 需访问较多的数据 – 索引并不再是唯一的优化方向 – 调优工作复杂 DTCC2012
  • 25. 复杂查询优化-JOIN • MySQL JOIN 类型 – Simple Nested Loops Join – Block Nested Loops Join • MySQL 5.5+ – Classic Hash Join • MariaDB 5.3+ • Block Hash Nested Loops Join DTCC2012
  • 26. 复杂查询优化-SNLJ For each tuple r in R do For each tuple s in S do If r and s satisfy the join condition Then output the tuple <r,s> For each tuple r in R do lookup r join condition in S index if found s == r Then output the tuple <r,s> Scan cost (no index) = Rn+Rn×Sn = O(Rn×Sn) Scan cost (with index) = Rn + Rn × SBH = = O(Rn) DTCC2012
  • 27. 复杂查询优化-SNLJ • INNER JOIN with Index – INNER JOIN联接顺序可更改 – 优化器喜欢选择较小的表作为外部表 – Scan cost (with index) = Rn + Rn × SBH = = O(Rn) SELECT b.emp_no,a.title,a.from_date,a.to_date FROM titles a INNER JOIN employees b on a.emp_no = b.emp_no; mysql> SELECT COUNT(1) FROM employeesG; *************************** 1. row *************************** count(1): 300024 mysql> SELECT COUNT(1) FROM titlesG; *************************** 1. row *************************** DTCC2012 count(1): 443308
  • 28. 复杂查询优化-SNLJ SELECT b.emp_no,a.title,a.from_date,a.to_date FROM titles a STRAIGHT_JOIN employees b on a.emp_no = b.emp_no; DTCC2012
  • 29. 复杂查询优化-BNLJ For each tuple r in R do store used columns as p from R in join buffer For each tuple s in S do If p and s satisfy the join condition Then output the tuple <p,s> •The join_buffer_size system variable determines the size of each join buffer •Join buffering can be used when the join is of type ALL or index or range. •One buffer is allocated for each join that can be buffered, so a given query might be processed using multiple join buffers. •Only columns of interest to the join are stored in the join buffer, not whole rows.  For join with no index  Reduce inner table scan times DTCC2012
  • 30. 复杂查询优化-Example • Outer table R – (1,’a’),(2,’b’),(3,’c’),(4,’d’) • Inner table S: – (1,’2010-01-01’),(2, ’2010-01-01’),(3, ’2010-01-01’) NLJ BNLJ Outer table scan 1 1 Inner table scan 4 1 Compare times 12 12 DTCC2012
  • 31. 复杂查询优化-BNLJ SELECT b.emp_no,a.title,a.from_date,a.to_date FROM employees_noindex b INNER JOIN titles_noindex a ON a.emp_no = b.emp_no WHERE b.birth_date >= '1965-01-01'; DTCC2012
  • 32. 复杂查询优化-BNLJ SELECT b.emp_no,a.title,a.from_date,a.to_date FROM titles_noindex a LEFT OUTER JOIN employees_noindex b ON a.emp_no = b.emp_no WHERE b.birth_date >= '1965-01-01'; MySQL 5.5 709.645 sec (MySQL 5.5不支持OUTER JOIN) MySQL 5.6 57.483 sec ~12x DTCC2012
  • 33. 复杂查询优化-BHNJ For each tuple r in R do store used columns as p from R in join buffer build hash table according join buffer for each tuple s in S do probe hash table if find Then output the tuple <r,s> • 通过哈希表减少内部表的比较次数 • Join Buffer Size的大小决定了Classic Hash Join的效率 • 只能用于等值联接 Scan cost = Rn + Sn = = O(Rn + Sn ) DTCC2012
  • 34. 复杂查询优化-Example • Outer table – (1,’a’),(2,’b’),(3,’c’),(4,’d’) • Inner table: – (1,’2010-01-01’),(2, ’2010-01-01’),(3, ’2010-01-01’) NLJ BNLJ BNLJH Outer table scan 1 1 1 Inner table scan 4 1 1 Compare times 12 12 3 DTCC2012
  • 35. 复杂查询优化-BHNJ SELECT MAX(l_extendedprice) FROM orders, lineitem WHERE o_orderdate BETWEEN '1995-01-01' AND '1995-01-31' AND l_orderkey=o_orderkey; MySQL 5.5 125.3sec MariaDB 5.3 23.104sec ~5x DTCC2012
  • 36. 复杂查询优化-BHNJ 执行速度(秒) 逻辑IO MySQL 5.5 125.3 157061 MySQL 5.5 (强制使用i_l_orderkey索引) 153.021 406879 MariaDB 5.3 (BNLH) 23.104 6077083 DTCC2012
  • 37. 复杂查询优化-JOIN总结 • SNLJ – 适合较小表之间的联接 – 联接的列需包含有索引 • BHNLJ – 大表之间的等值联接操作 – 大表与小表之间的等值联接操作 – Join Buffer Size的大小决定了内部表的扫描次数 – 推荐MariaDB作为数据集市或者数据仓库 DTCC2012
  • 38. 查询优化-子查询 • 子查询 – 独立子查询 – 相关子查询 • MySQL子查询 – 独立子查询转换为相关子查询(SQL重写 LAZY) • SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2); • SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a); • Scan cost: O(A+A*B) – 优化视情况而定 DTCC2012
  • 39. 查询优化-子查询 SELECT orderid,customerid,employeeid,orderdate FROM orders WHERE orderdate IN ( SELECT MAX(orderdate) FROM orders GROUP BY (DATE_FORMAT(orderdate,'%Y%m')) ) # Time: 111227 23:49:16 # User@Host: root[root] @ localhost [127.0.0.1] # Query_time: 6.081214 Lock_time: 0.046800 Rows_sent: 42 Rows_examined: 727558 Logical_reads: 91584 Physical_reads: 19 use tpcc; SET timestamp=1325000956; SELECT orderid,customerid,employeeid,orderdate FROM orders WHERE orderdate IN ( SELECT MAX(orderdate) FROM orders GROUP BY (DATE_FORMAT(orderdate,'%Y%M')) ); DTCC2012
  • 40. 查询优化-子查询 SELECT orderid,customerid,employeeid,orderdate FROM orders AS A WHERE EXISTS ( SELECT * FROM orders GROUP BY(DATE_FORMAT(orderdate,'%Y%M')) HAVING MAX(orderdate)= A.OrderDate ); DTCC2012
  • 41. 查询优化-子查询 SELECT orderid,customerid,employeeid,orderdate FROM orders A WHERE EXISTS ( SELECT * FROM (SELECT MAX(orderdate) AS orderdate FROM orders GROUP BY (DATE_FORMAT(orderdate,'%Y%M')) ) B WHERE A.orderdate = B.orderdate ); # Time: 111227 23:45:49 # User@Host: root[root] @ localhost [127.0.0.1] # Query_time: 0.251133 Lock_time: 0.052001 Rows_sent: 42 Rows_examined: 1729 Logical_reads: 1923 Physical_reads: 25 DTCC2012
  • 42. 查询优化-子查询 SELECT orderid,customerid,employeeid,A.orderdate FROM orders AS A, ( SELECT MAX(orderdate) AS orderdate FROM orders GROUP BY (DATE_FORMAT(orderdate,'%Y%m')) ) AS B WHERE A.orderdate = B.orderdate; # User@Host: root[root] @ localhost [127.0.0.1] # Thread_id: 1 Schema: tpcc QC_hit: No # Query_time: 0.296897 Lock_time: 0.212167 Rows_sent: 42 Rows_examined: 941 Logical_reads: 1258, Physical_reads: 28. DTCC2012
  • 43. 查询优化-子查询 • MySQL 5.6/MariaDB对子查询的优化 – 支持对于独立子查询的优化 • SEMI JOIN – 支持各种类型的子查询优化 – SET optimizer_switch='semijoin=on, materialization=on'; # User@Host: root[root] @ localhost [127.0.0.1] # Thread_id: 1 Schema: tpcc QC_hit: No # Query_time: 0.176819 Lock_time: 0.147888 Rows_sent: 42 Rows_examined: 1729 Logical_IO: 1927, Physical_IO: 29. DTCC2012
  • 44. 查询优化-子查询总结 • 通过EXPLAIN分析执行计划 • 避免IN => EXISTS转换带来的高额开销 • 避免多次的关联查询操作 • 使用MySQL 5.6/MariaDB 5.3处理复杂子查询 操作 – SQL优化器自动优化 DTCC2012
  • 45. 参考资料 • http://dev.mysql.com/doc/ • http://kb.askmonty.org/en/ • 《MySQL技术内幕》 – 《MySQL技术内幕:InnoDB存储引擎》 – 《MySQL技术内幕:SQL编程》 DTCC2012
  • 46. Q&A DTCC2012