Structure analysis
This SQL takes 4-5 seconds to run and needs optimizing. Start by analyzing the structure of the query:
select
t1.taskBeginTime as '时间',
case
t1.taskType
when '1' then '转发'
when '2' then '原创'
when '3' then '互动'
when '4' then '留资'
when '8' then '内容共创'
else ''
end as '任务类型',
t2.taskPlatFormName as '任务平台',
t1.taskName as '任务名称',
t1.taskJoinNum as '任务领取量',
'' as '领取率',
-- Subquery A (table p1)
(
select count(DISTINCT p1.open_id)
from bus_wechat_message_send_log p1
where p1.task_notice_id = t1.id
and template_id = '新工单提醒'
and p1.create_time BETWEEN '2025-03-01 00:00:00' AND '2025-03-07 23:59:59'
) as '消息推送人数',
-- Subquery B (tables p2 + p3)
(
select count(*)
from bus_partnertask p2
inner join bus_partner_new p3 on p2.kox_id = p3.id
and p3.is_enable = '1'
where t1.id = p2.taskId
and p2.auditStatus = '1'
and p2.isDeleted = '0'
and p2.tenant_id = 66
) as '任务完成量',
-- Subquery C (table p4)
(
select COALESCE(sum(p4.score), 0)
from bus_send_score_log p4
where p4.task_id = t1.id
and p4.source_id is null
and p4.res_code = 0
and p4.tenant_id = 66
) as '实际发放积分',
-- Subquery D (table p5)
(
select COALESCE(
sum(
p5.total_volume_num
),
0
)
from bus_partnertask p5
where p5.taskId = t1.id
and p5.isDeleted = '0'
) as '声量'
from
-- Main join (tables t1 + t2): Nested loop left join
bus_task t1
LEFT JOIN bus_task_platform_mapping t2 on t1.id = t2.taskId
and t2.isDeleted = '0'
and t2.tenant_id = 66
where
1 = 1
and t1.isDeleted = '0'
and t1.taskStatus not in (1, 4)
and t1.tenant_id = 66
and t1.taskBeginTime BETWEEN '2025-03-01 00:00:00' AND '2025-03-07 23:59:59'
and t1.taskType not in (8)
order by t1.taskBeginTime desc;
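Broken down, the query consists of roughly:
- the main join (t1 + t2)
- four subqueries (p1, p2 + p3, p4, p5)
- the sort (t1.taskBeginTime desc)
The remaining columns all come straight from the result of the main join, so they are not the focus.
The detailed analysis below concentrates on these three parts.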
Index analysis (EXPLAIN)
Prefix the SQL statement above with explain:
explain [sql]
Running it returns the execution plan:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | t1 | | range | bus_task_taskBeginTime_index | bus_task_taskBeginTime_index | 6 | | 24 | 4.0 | Using index condition; Using where; Backward index scan |
1 | PRIMARY | t2 | | ref | bus_task_platform_mapping_taskId_index | bus_task_platform_mapping_taskId_index | 8 | voyah_koc.t1.id | 1 | 100.0 | Using where |
5 | DEPENDENT SUBQUERY | p5 | | ref | koxId_taskid_index | koxId_taskid_index | 9 | voyah_koc.t1.id | 1083 | 50.0 | Using where |
4 | DEPENDENT SUBQUERY | p4 | | ref | taskId_userid_index | taskId_userid_index | 9 | voyah_koc.t1.id | 2648 | 0.1 | Using where |
3 | DEPENDENT SUBQUERY | p2 | | ref | koxId_taskid_index | koxId_taskid_index | 9 | voyah_koc.t1.id | 1083 | 0.5 | Using index condition; Using where |
3 | DEPENDENT SUBQUERY | p3 | | eq_ref | PRIMARY,idx_userid | PRIMARY | 8 | voyah_koc.p2.kox_id | 1 | 10.0 | Using where |
2 | DEPENDENT SUBQUERY | p1 | | ALL | openid_createTime_index | null | null | null | 309587 | 0.11 | Using where |
Key things to look at: type, rows together with filtered, and Extra.
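As a rough way to read rows and filtered together: rows is the optimizer's estimate of how many rows it will examine, and filtered is the estimated percentage of those that survive the remaining conditions, so rows x filtered / 100 approximates what each step passes on. The sketch below only redoes that arithmetic on the numbers copied from the plan above; the derived table and its column names (plan_rows, est_rows, filtered_pct) are made up for illustration.

```sql
-- rows and filtered are copied from the EXPLAIN output above;
-- plan_rows, est_rows and filtered_pct are illustrative names only.
SELECT
    tbl,
    est_rows,
    filtered_pct,
    ROUND(est_rows * filtered_pct / 100, 1) AS est_rows_after_filter
FROM (
    SELECT 't1' AS tbl, 24 AS est_rows, 4.0 AS filtered_pct
    UNION ALL SELECT 't2', 1, 100.0
    UNION ALL SELECT 'p5', 1083, 50.0
    UNION ALL SELECT 'p4', 2648, 0.1
    UNION ALL SELECT 'p2', 1083, 0.5
    UNION ALL SELECT 'p3', 1, 10.0
    UNION ALL SELECT 'p1', 309587, 0.11
) AS plan_rows
ORDER BY est_rows DESC;
```

For p1 this works out to about 340 surviving rows out of roughly 310,000 examined, which is the kind of gap that usually points to a missing index.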
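Avoiding full table scans
In the type column, almost every step is ref or eq_ref, which means an index is already being used.
The last row, however, the subquery on table p1, has type=ALL. That is a full table scan with rows at about 310,000, and it is what slows the query down.
Adding an index that gets rid of the ALL in the type column should solve most of the problem.
Back in the original SQL, pull out the subquery that does the full scan: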
-- Subquery A (table p1)
(
select count(DISTINCT p1.open_id)
from bus_wechat_message_send_log p1
where p1.task_notice_id = t1.id
and template_id = '新工单提醒'
and p1.create_time BETWEEN '2025-03-01 00:00:00' AND '2025-03-07 23:59:59'
) as '消息推送人数',
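It filters on three columns: task_notice_id, create_time and template_id.
task_notice_id filters best (it narrows the scan from the whole table to one task's rows), so it goes first in the composite index, following the leftmost-prefix rule.
As a small extra, create_time is stored in descending order in the index. This query does not need that, but other list queries might use it.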
ALTER TABLE `bus_wechat_message_send_log`
ADD INDEX `taskId_createTime_template` (`task_notice_id` ASC,`create_time` DESC,`template_id` ASC) COMMENT 'Composite index: search by task id + create time + template type' ;
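Finally, run explain on the SQL again to check the effect of the index: the full table scan is gone (ALL -> ref), and the estimated row count drops from about 310,000 to 500.
Running the slow SQL again, the time drops from 5 s to 100 ms, which is 2% of the original.
Conclusion: 98% of the query time is saved.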
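The verification step can also be done on the subquery alone, which makes the plan easier to read. This is only a sketch: 1001 is a hypothetical task id standing in for the correlated value t1.id, and the exact type reported (ref or range) depends on how the optimizer chooses to use the index.

```sql
-- Confirm that the new index exists and check its column order.
SHOW INDEX FROM bus_wechat_message_send_log
WHERE Key_name = 'taskId_createTime_template';

-- Re-check subquery A in isolation.
-- 1001 is a hypothetical task id used in place of t1.id.
EXPLAIN
SELECT COUNT(DISTINCT p1.open_id)
FROM bus_wechat_message_send_log p1
WHERE p1.task_notice_id = 1001
  AND p1.template_id = '新工单提醒'
  AND p1.create_time BETWEEN '2025-03-01 00:00:00' AND '2025-03-07 23:59:59';
```

The thing to look for is that type is no longer ALL and that key shows taskId_createTime_template. One design note: a common rule of thumb is to put equality columns before range columns, so an ordering such as (task_notice_id, template_id, create_time) might let all three conditions narrow the index seek. That variant is not something the measurements above cover.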
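Fine-tuning
1. Composite index efficiency
Looking at the rows and filtered columns, the subqueries on p5, p4 and p2 stand out: rows is fairly large while filtered is small.
Then check the Extra column and ask whether the composite index covers the WHERE conditions well, and whether it could also include the selected columns to avoid going back to the table.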
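Extra value | When it appears | Performance impact | Optimization direction |
---|---|---|---|
Using index | The query is answered from a covering index | Best (no lookup back to the table) | Design covering indexes where possible to avoid table lookups |
Using index condition | Index condition pushdown (ICP) is in effect | Good (fewer rows fetched from the table) | Make sure ICP is enabled; extend the index to cover the query conditions |
Using where | The server layer filters on conditions the index does not cover | Moderate (a second filtering pass) | Extend the index or optimize the WHERE conditions |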
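To make the covering-index idea concrete, here is a sketch for subquery D (p5). The index name idx_task_deleted_volume and the task id 1001 are hypothetical, and this index is not one the measurements above were taken with. The column order is lookup column first, then the filter column, then the summed column, so that the query can be answered from the index alone.

```sql
-- Hypothetical covering index for subquery D (p5):
-- taskId for the lookup, isDeleted for the filter, total_volume_num for the SUM.
ALTER TABLE `bus_partnertask`
  ADD INDEX `idx_task_deleted_volume` (`taskId`, `isDeleted`, `total_volume_num`);

-- With such an index in place, Extra for p5 would be expected to show "Using index".
EXPLAIN
SELECT COALESCE(SUM(p5.total_volume_num), 0)
FROM bus_partnertask p5
WHERE p5.taskId = 1001      -- hypothetical task id in place of t1.id
  AND p5.isDeleted = '0';
```

Every extra index costs storage and slows down writes, so whether this is worth adding depends on how much time the subquery actually takes.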
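2. Sort column
A sort column can be backed by an index so that rows come out of the scan already ordered, which avoids a separate sort. Here the sort is t1.taskBeginTime desc.
The Extra column for t1 shows Backward index scan, meaning the index is read in reverse: an index on taskBeginTime exists, but it is stored in ascending order.
So the result is already ordered; switching the index to desc is still an option.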
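If one did want to try the desc variant, a minimal sketch could look like the following. The index name idx_taskBeginTime_desc is hypothetical, and descending indexes need MySQL 8.0 or later (older versions accept the DESC keyword but ignore it).

```sql
-- Hypothetical index name; requires MySQL 8.0+ for a real descending index.
ALTER TABLE `bus_task`
  ADD INDEX `idx_taskBeginTime_desc` (`taskBeginTime` DESC);

-- Simplified version of the main query, just to inspect the scan direction.
EXPLAIN
SELECT t1.id, t1.taskBeginTime
FROM bus_task t1
WHERE t1.taskBeginTime BETWEEN '2025-03-01 00:00:00' AND '2025-03-07 23:59:59'
ORDER BY t1.taskBeginTime DESC;
```

If the optimizer picks the new index, Backward index scan should no longer appear in Extra. Since the range scan on t1 touches only 24 rows here, any gain would likely be too small to measure.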
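Timing analysis (EXPLAIN ANALYZE)
Prefix the SQL statement above with explain analyze:
explain analyze [sql]
Running it returns a more detailed execution plan that includes the time spent in each part: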
-> Nested loop left join (cost=30.6 rows=1.35) (actual time=0.206..1.04 rows=16 loops=1)
    -> Filter: ((t1.isDeleted = '0') and (t1.taskStatus not in (1,4)) and (t1.tenant_id = 66) and (t1.taskType <> 8)) (cost=29.2 rows=0.96) (actual time=0.157..0.719 rows=16 loops=1)
        -> Index range scan on t1 using bus_task_taskBeginTime_index over ('2025-03-01 00:00:00' <= taskBeginTime <= '2025-03-07 23:59:59') (reverse), with index condition: (t1.taskBeginTime between '2025-03-01 00:00:00' and '2025-03-07 23:59:59') (cost=29.2 rows=24) (actual time=0.149..0.612 rows=24 loops=1)
    -> Filter: ((t2.isDeleted = '0') and (t2.tenant_id = 66)) (cost=1.48 rows=1.41) (actual time=0.0147..0.0188 rows=1 loops=16)
        -> Index lookup on t2 using bus_task_platform_mapping_taskId_index (taskId=t1.id) (cost=1.48 rows=1.41) (actual time=0.012..0.0165 rows=2.06 loops=16)
-> Select #2 (subquery in projection; dependent)
    -> Aggregate: count(distinct p1.open_id) (cost=32262 rows=1) (actual time=271..271 rows=1 loops=16)
        -> Filter: ((p1.template_id = '新工单提醒') and (p1.task_notice_id = t1.id) and (p1.create_time between '2025-03-01 00:00:00' and '2025-03-07 23:59:59')) (cost=32227 rows=344) (actual time=268..271 rows=158 loops=16)
            -> Table scan on p1 (cost=32227 rows=309587) (actual time=0.0215..163 rows=400997 loops=16)
-> Select #3 (subquery in projection; dependent)
    -> Aggregate: count(0) (cost=968 rows=1) (actual time=0.0796..0.0797 rows=1 loops=16)
        -> Nested loop inner join (cost=968 rows=0.542) (actual time=0.0767..0.0769 rows=0.125 loops=16)
            -> Filter: ((p2.auditStatus = 1) and (p2.isDeleted = '0') and (p2.tenant_id = 66)) (cost=962 rows=5.42) (actual time=0.0716..0.0717 rows=0.125 loops=16)
                -> Index lookup on p2 using koxId_taskid_index (taskId=t1.id), with index condition: (p2.kox_id is not null) (cost=962 rows=1083) (actual time=0.0679..0.0693 rows=1.38 loops=16)
            -> Filter: (p3.is_enable = 1) (cost=0.997 rows=0.1) (actual time=0.0347..0.0348 rows=1 loops=2)
                -> Single-row index lookup on p3 using PRIMARY (id=p2.kox_id) (cost=0.997 rows=1) (actual time=0.0332..0.0333 rows=1 loops=2)
-> Select #4 (subquery in projection; dependent)
    -> Aggregate: sum(p4.score) (cost=2649 rows=1) (actual time=0.0189..0.019 rows=1 loops=16)
        -> Filter: ((p4.source_id is null) and (p4.res_code = 0) and (p4.tenant_id = 66)) (cost=2649 rows=2.65) (actual time=0.0178..0.0178 rows=0 loops=16)
            -> Index lookup on p4 using taskId_userid_index (task_id=t1.id) (cost=2649 rows=2649) (actual time=0.0172..0.0173 rows=0.125 loops=16)
-> Select #5 (subquery in projection; dependent)
    -> Aggregate: sum(p5.total_volume_num) (cost=1069 rows=1) (actual time=0.0208..0.0208 rows=1 loops=16)
        -> Filter: (p5.isDeleted = '0') (cost=1015 rows=542) (actual time=0.0151..0.0159 rows=1.38 loops=16)
            -> Index lookup on p5 using koxId_taskid_index (taskId=t1.id) (cost=1015 rows=1083) (actual time=0.0135..0.0142 rows=1.38 loops=16)
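To make it easier to read, split the plan into chunks by arrow and nesting level and go through them one by one:
- Chunk 1
The operator is Nested loop left join, which corresponds to the main join (t1 + t2).
The timing to read is: (actual time=0.206..1.04 rows=16 loops=1)
Meaning: 16 rows are returned in total; the first row arrives after 0.206 ms and all 16 rows after 1.04 ms, in 1 loop. When loops is greater than 1, these times are averages per loop.
Calculation: total time = time to return all rows x number of loops
So t1 and t2 together take 1.04 x 1 = 1.04 ms.
That is fast, so this is clearly not where the time goes.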
-> Nested loop left join (cost=30.6 rows=1.35) (actual time=0.206..1.04 rows=16 loops=1)
    -> Filter: ((t1.isDeleted = '0') and (t1.taskStatus not in (1,4)) and (t1.tenant_id = 66) and (t1.taskType <> 8)) (cost=29.2 rows=0.96) (actual time=0.157..0.719 rows=16 loops=1)
        -> Index range scan on t1 using bus_task_taskBeginTime_index over ('2025-03-01 00:00:00' <= taskBeginTime <= '2025-03-07 23:59:59') (reverse), with index condition: (t1.taskBeginTime between '2025-03-01 00:00:00' and '2025-03-07 23:59:59') (cost=29.2 rows=24) (actual time=0.149..0.612 rows=24 loops=1)
    -> Filter: ((t2.isDeleted = '0') and (t2.tenant_id = 66)) (cost=1.48 rows=1.41) (actual time=0.0147..0.0188 rows=1 loops=16)
        -> Index lookup on t2 using bus_task_platform_mapping_taskId_index (taskId=t1.id) (cost=1.48 rows=1.41) (actual time=0.012..0.0165 rows=2.06 loops=16)
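- Chunk 2
At the second level (Aggregate), total time = 271 x 16 = 4336 ms, about 4.3 s.
This chunk is clearly the culprit, so dig further down.
At the third level (Filter), total time = 271 x 16 = 4336 ms, about 4.3 s.
At the fourth level (Table scan on p1), total time = 163 x 16 = 2608 ms, about 2.6 s.
Level 3 includes level 4. The full table scan at level 4 reads 400,997 rows per loop (the optimizer estimated 309,587) and takes 2.6 s, so the filtering itself at level 3 takes 4.3 - 2.6 = 1.7 s.
In other words, the main cost is the full scan of table p1 plus the filtering on top of it.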
-> Select #2 (subquery in projection; dependent)
    -> Aggregate: count(distinct p1.open_id) (cost=32262 rows=1) (actual time=271..271 rows=1 loops=16)
        -> Filter: ((p1.template_id = '新工单提醒') and (p1.task_notice_id = t1.id) and (p1.create_time between '2025-03-01 00:00:00' and '2025-03-07 23:59:59')) (cost=32227 rows=344) (actual time=268..271 rows=158 loops=16)
            -> Table scan on p1 (cost=32227 rows=309587) (actual time=0.0215..163 rows=400997 loops=16)
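The level-by-level arithmetic for this chunk can be written out as a small query. It is only a sketch that redoes the calculation on numbers copied from the plan: each node's time includes its child, so subtracting the child's time gives the node's own share. The derived table and column names (subquery_a_nodes, last_row_ms, child_last_row_ms) are illustrative.

```sql
-- Times in ms are copied from the "actual time" of each node in Select #2.
-- inclusive_ms = time including children; exclusive_ms = the node's own work.
SELECT
    node,
    last_row_ms * loops                       AS inclusive_ms,
    (last_row_ms - child_last_row_ms) * loops AS exclusive_ms
FROM (
    SELECT 'Aggregate' AS node, 271 AS last_row_ms, 271 AS child_last_row_ms, 16 AS loops
    UNION ALL SELECT 'Filter', 271, 163, 16
    UNION ALL SELECT 'Table scan on p1', 163, 0, 16
) AS subquery_a_nodes;
```

This gives 2608 ms for the table scan itself and (271 - 163) x 16 = 1728 ms for the filter on top of it, with the aggregate adding essentially nothing.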
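- Chunks 3, 4 and 5
Following the same approach:
(Select #3) is subquery B on tables p2 + p3: 0.0797 x 16 = 1.28 ms
(Select #4) is subquery C on table p4: 0.019 x 16 = 0.3 ms
(Select #5) is subquery D on table p5: 0.0208 x 16 = 0.33 ms
None of these subqueries contributes meaningfully to the total, so they can be ignored.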
-> Select #3 (subquery in projection; dependent)
    -> Aggregate: count(0) (cost=968 rows=1) (actual time=0.0796..0.0797 rows=1 loops=16)
        -> Nested loop inner join (cost=968 rows=0.542) (actual time=0.0767..0.0769 rows=0.125 loops=16)
            -> Filter: ((p2.auditStatus = 1) and (p2.isDeleted = '0') and (p2.tenant_id = 66)) (cost=962 rows=5.42) (actual time=0.0716..0.0717 rows=0.125 loops=16)
                -> Index lookup on p2 using koxId_taskid_index (taskId=t1.id), with index condition: (p2.kox_id is not null) (cost=962 rows=1083) (actual time=0.0679..0.0693 rows=1.38 loops=16)
            -> Filter: (p3.is_enable = 1) (cost=0.997 rows=0.1) (actual time=0.0347..0.0348 rows=1 loops=2)
                -> Single-row index lookup on p3 using PRIMARY (id=p2.kox_id) (cost=0.997 rows=1) (actual time=0.0332..0.0333 rows=1 loops=2)
-> Select #4 (subquery in projection; dependent)
    -> Aggregate: sum(p4.score) (cost=2649 rows=1) (actual time=0.0189..0.019 rows=1 loops=16)
        -> Filter: ((p4.source_id is null) and (p4.res_code = 0) and (p4.tenant_id = 66)) (cost=2649 rows=2.65) (actual time=0.0178..0.0178 rows=0 loops=16)
            -> Index lookup on p4 using taskId_userid_index (task_id=t1.id) (cost=2649 rows=2649) (actual time=0.0172..0.0173 rows=0.125 loops=16)
-> Select #5 (subquery in projection; dependent)
    -> Aggregate: sum(p5.total_volume_num) (cost=1069 rows=1) (actual time=0.0208..0.0208 rows=1 loops=16)
        -> Filter: (p5.isDeleted = '0') (cost=1015 rows=542) (actual time=0.0151..0.0159 rows=1.38 loops=16)
            -> Index lookup on p5 using koxId_taskid_index (taskId=t1.id) (cost=1015 rows=1083) (actual time=0.0135..0.0142 rows=1.38 loops=16)
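- Summary
To sum up: subquery A on table p1 takes 4.3 s, while the main join and the other subqueries together take about 3 ms.
Bringing down the time spent on p1 is enough to fix this slow SQL.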
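For reference, the whole comparison can be reproduced with one query that applies total time = time to return all rows x loops to the top node of each part. It is a sketch over numbers copied from the EXPLAIN ANALYZE output; the derived table and column names (plan_parts, last_row_ms) are illustrative.

```sql
-- last_row_ms and loops are copied from the top node of each part of the plan.
SELECT
    part,
    last_row_ms,
    loops,
    ROUND(last_row_ms * loops, 2) AS total_ms
FROM (
    SELECT 'main join (t1+t2)' AS part, 1.04 AS last_row_ms, 1 AS loops
    UNION ALL SELECT 'subquery A (p1)', 271, 16
    UNION ALL SELECT 'subquery B (p2+p3)', 0.0797, 16
    UNION ALL SELECT 'subquery C (p4)', 0.019, 16
    UNION ALL SELECT 'subquery D (p5)', 0.0208, 16
) AS plan_parts
ORDER BY total_ms DESC;
```

Sorted this way, subquery A sits at the top with 4336 ms and everything else adds up to roughly 3 ms.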