大家好,得益于摩尔定律,计算机性能已大幅提升,加上数据库的进步以及微服务所倡导的各种反模式设计,因此现在编写复杂SQL查询的机会越来越少。业界已经开始提倡不要进行专门的SQL优化,因为节省下来的资源并不足以抵消员工的工资成本。本文将介绍7个常见的SQL慢查询语句,并解释如何优化它们的性能。
1.LIMIT语句
分页是最常用的方案之一,但也容易出现问题。例如对于以下简单的语句,DBA通常建议的解决方案是添加一个包含type
、name
和create_time
字段的复合索引。这样,条件和排序就可以有效利用索引,从而显著提高性能。
sql
SELECT *
FROM operation
WHERE type = 'SQLStats'
AND name = 'SlowLog'
ORDER BY create_time
LIMIT 1000, 10;
这可能会解决90%以上DBA的问题,但是当LIMIT子句变成"LIMIT 1000000, 10"时,程序员仍会抱怨"为什么在只查询10条记录的时候,速度还这么慢" 。要知道,数据库不知道第1000000条记录从何处开始,所以即使有索引,它仍需要从头开始计算。在大多数情况下,这个性能问题是由于懒惰编程造成的。
在前端数据浏览或批量导出大量数据的场景中,可以使用上一页的最大值作为查询参数。SQL可以重新设计如下:
sql
SELECT *
FROM operation
WHERE type = 'SQLStats'
AND name = 'SlowLog'
AND create_time > '2017-03-16 14:00:00'
ORDER BY create_time
LIMIT 10;
采用这种新设计后,查询时间保持不变,不会随着数据量的增加而变化。
2.隐式转换
SQL语句中另一个常见的错误是查询变量和字段定义的类型不匹配,以下面的语句为例:
sql
mysql> explain extended SELECT *
> FROM my_balance b
> WHERE b.bpn = 14000000123
> AND b.isverified IS NULL ;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'
在这种情况下,字段bpn
被定义为varchar(20)
,而MySQL的策略是在比较之前将字符串转换为数字。这会导致函数被应用到表字段上,从而使索引失效。
这种情况可能是由应用程序框架自动填充参数造成的,而不是程序员的本意。如今,应用程序框架通常都很复杂,虽然它们提供了便利,但也可能带来隐患。
3.连接更新和删除
尽管MySQL 5.6引入了物化,但它只优化了SELECT语句。对于UPDATE或DELETE语句,需要使用JOIN手动重写。
例如,请看下面的UPDATE语句。MySQL实际上执行了一个循环/嵌套子查询(DEPENDENT SUBQUERY),执行时间可想而知。
sql
UPDATE operation o
SET status = 'applying'
WHERE o.id IN (SELECT id
FROM (SELECT o.id,
o.status
FROM operation o
WHERE o.group = 123
AND o.status NOT IN ( 'done' )
ORDER BY o.parent,
o.id
LIMIT 1) t);
执行计划如下:
sql
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY | o | index | | PRIMARY | 8 | | 24 | Using where; Using temporary |
| 2 | DEPENDENT SUBQUERY | | | | | | | | Impossible WHERE noticed after reading const tables |
| 3 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
将其重写为JOIN后,子查询的选择类型从DEPENDENT SUBQUERY变为DERIVED,执行时间显著得从7秒缩短到2毫秒。
sql
UPDATE operation o
JOIN (SELECT o.id,
o.status
FROM operation o
WHERE o.group = 123
AND o.status NOT IN ( 'done' )
ORDER BY o.parent,
o.id
LIMIT 1) t
ON o.id = t.id
SET status = 'applying';
简化后的执行计划如下:
sql
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY | | | | | | | | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
4.混合排序
MySQL无法利用索引进行混合排序,但是在某些场景下,仍然可以使用特殊方法来提高性能。
sql
SELECT *
FROM my_order o
INNER JOIN my_appraise a ON a.orderid = o.id
ORDER BY a.is_reply ASC,
a.appraise_time DESC
LIMIT 0, 20;
执行计划显示的是全表扫描:
sql
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| 1 | SIMPLE | a | ALL | idx_orderid | NULL | NULL | NULL | 1967647 | Using filesort |
| 1 | SIMPLE | o | eq_ref | PRIMARY | PRIMARY | 122 | a.orderid | 1 | NULL |
+----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+
由于is_reply
只有0和1两种状态,可以将其重写如下,从而将执行时间从1.58秒缩短到2毫秒:
sql
SELECT *
FROM ((SELECT *
FROM my_order o
INNER JOIN my_appraise a
ON a.orderid = o.id
AND is_reply = 0
ORDER BY appraise_time DESC
LIMIT 0, 20)
UNION ALL
(SELECT *
FROM my_order o
INNER JOIN my_appraise a
ON a.orderid = o.id
AND is_reply = 1
ORDER BY appraise_time DESC
LIMIT 0, 20)) t
ORDER BY is_reply ASC,
appraisetime DESC
LIMIT 20;
5.EXISTS语句
在处理EXISTS子句时,MySQL仍然使用嵌套子查询进行执行。以下面的SQL语句为例:
sql
SELECT *
FROM my_neighbor n
LEFT JOIN my_neighbor_apply sra
ON n.id = sra.neighbor_id
AND sra.user_id = 'xxx'
WHERE n.topic_status < 4
AND EXISTS(SELECT 1
FROM message_info m
WHERE n.id = m.neighbor_id
AND m.inuser = 'xxx')
AND n.topic_type <> 5;
sql
+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
| 1 | PRIMARY | n | ALL | | NULL | NULL | NULL | 1086041 | Using where |
| 1 | PRIMARY | sra | ref | | idx_user_id | 123 | const | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | m | ref | | idx_message_info | 122 | const | 1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
通过删除EXISTS子句并将其更改为JOIN, 我们可以避免嵌套子查询,并将执行时间从1.93秒减少到1毫秒。
sql
SELECT *
FROM my_neighbor n
INNER JOIN message_info m
ON n.id = m.neighbor_id
AND m.inuser = 'xxx'
LEFT JOIN my_neighbor_apply sra
ON n.id = sra.neighbor_id
AND sra.user_id = 'xxx'
WHERE n.topic_status < 4
AND n.topic_type <> 5;
新的执行计划如下:
sql
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition |
| 1 | SIMPLE | n | eq_ref | | PRIMARY | 122 | ighbor_id | 1 | Using where |
| 1 | SIMPLE | sra | ref | | idx_user_id | 123 | const | 1 | Using where |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
6.条件下推
在某些情况下,外部查询条件无法下推到复杂的视图或子查询中:
-
聚合子查询。
-
带有LIMIT的子查询。
-
UNION或UNION ALL子查询。
-
输出字段中的子查询。
请看下面的语句,其中的条件会影响聚合子查询:
sql
SELECT *
FROM (SELECT target,
Count(*)
FROM operation
GROUP BY target) t
WHERE target = 'rm-xxxx';
sql
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| 1 | PRIMARY | n | ALL | NULL | NULL | NULL | NULL | 1086041 | Using where |
| 1 | PRIMARY | sra | ref | NULL | idx_user_id | 123 | const | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | m | ref | NULL | idx_message_info | 122 | const | 1 | Using index condition; Using where |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
通过删除EXISTS子句并将其更改为JOIN,我们可以避免嵌套子查询并将执行时间从1.93秒减少到1毫秒。
sql
SELECT *
FROM my_neighbor n
INNER JOIN message_info m
ON n.id = m.neighbor_id
AND m.inuser = 'xxx'
LEFT JOIN my_neighbor_apply sra
ON n.id = sra.neighbor_id
AND sra.user_id = 'xxx'
WHERE n.topic_status < 4
AND n.topic_type <> 5;
新的执行计划如下:
sql
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition |
| 1 | SIMPLE | n | eq_ref | | PRIMARY | 122 | ighbor_id | 1 | Using where |
| 1 | SIMPLE | sra | ref | | idx_user_id | 123 | const | 1 | Using where |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
7.提前缩小范围
以下经过部分优化的示例(左连接中的主表作为主查询条件):
sql
SELECT a.*,
c.allocated
FROM (
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20) a
LEFT JOIN
(
SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
FROM my_resources
GROUP BY resourcesid) c
ON a.resourceid = c.resourcesid;
很明显,子查询c是对整个表进行聚合查询,在处理大量表时可能会导致性能下降。
事实上,对于子查询c,左连接的结果集只关心可以与主表的resourceid
匹配的数据。因此我们可以将语句重写如下,将执行时间从2秒减少到2毫秒:
sql
SELECT a.*,
c.allocated
FROM (
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20) a
LEFT JOIN
(
SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
FROM my_resources r,
(
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20) a
WHERE r.resourcesid = a.resourcesid
GROUP BY resourcesid) c
ON a.resourceid = c.resourcesid;
然而子查询a在SQL语句中出现了多次,这种方法不仅会产生额外的成本,而且也会使语句变得更加复杂。可以使用WITH语句来简化它:
sql
WITH a AS
(
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20)
SELECT a.*,
c.allocated
FROM a
LEFT JOIN
(
SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
FROM my_resources r,
a
WHERE r.resourcesid = a.resourcesid
GROUP BY resourcesid) c
ON a.resourceid = c.resourcesid;
数据库编译器生成的执行计划决定了SQL语句的实际执行方式,但是编译器只能尽力提供服务,没有一个数据库编译器是完美的,上述情况在其他数据库中也同样存在。了解了数据库编译器的特性,我们就能绕过它的限制,编写出高性能的SQL语句。
在设计数据模型和编写SQL语句时,将算法思维或算法意识引入到这个过程非常重要。在编写复杂的SQL语句时,养成使用WITH语句的习惯可以简化语句,减轻数据库的负担。
最后,下面是SQL语句的执行顺序:
sql
FROM
ON
JOIN
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY
LIMIT