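The count problem in paginated queries
Paginated queries are among the most common features in business applications. Combined with the query parameters, a paginated query typically uses select count(*) to get the total and limit/offset to jump to a given page.
Take the notice (announcement) table as an example.
The parent_id column is used to query the tree structure, and the single table holds 3 million rows. Let's see what problems show up.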
java
// Query using QueryDSL together with BlazeJPAQuery
@Override
public PageR<NoticeDTO> getPage(AntdPage page, NoticeVO notice) {
QNotice qNotice = QNotice.notice;
BooleanBuilder where = getWhere(notice);
BlazeJPAQuery<Notice> query = queryFactory.selectFrom(qNotice).where(where);
long size = query.fetchCount();
query.limit(page.getPageSize()).offset(page.getOffset());
List<NoticeDTO> list = mapstruct.toDto(query.fetch());
return toPageR(buildTree(list), size);
}
Execution log
log
2023-11-08 11:21:27.863 INFO 14700 --- [nio-8088-exec-6] i.g.s.s.common.log.LogPrintAspect : ********************* [/biz/notice/page] Request Log *********************
[2023-11-08 11:21:30] [statement-0] [耗时: 2657ms] 执行SQL:
select count(*) from biz_notice n1_0 where n1_0.title like '通知公告%' escape '!' and n1_0.parent_id is null
[2023-11-08 11:21:30] [statement-0] [耗时: 1ms] 执行SQL:
select n1_0.id,n1_0.create_by,n1_0.create_time,n1_0.json_str,n1_0.notice_time,n1_0.parent_id,n1_0.title,n1_0.update_by,n1_0.update_time from biz_notice n1_0 where n1_0.title like '通知公告%' escape '!' and n1_0.parent_id is null limit 20
2023-11-08 11:21:30.537 INFO 14700 --- [nio-8088-exec-6] i.g.s.s.common.log.LogPrintAspect : ********************* [/biz/notice/page] 请求过程 *********************
Request: GET http://127.0.0.1:8088/biz/notice/page
Params: /biz/notice/page?current=1&pageSize=20&title=通知公告&createTimeRange=2023-11-01 10:01:23,2023-11-15 10:01:23
IP: 127.0.0.1
UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36
User: userId:2927901808918528,userName:admin
Method: io.github.smilexizheng.smileboot.modules.testgen.controller.NoticeController.getNoticePage
Args[pageable]:{"current":0,"offset":0,"pageSize":20,"pageable":{"offset":0,"pageNumber":0,"pageSize":20,"paged":true,"sort":{"empty":true,"sorted":false,"unsorted":true},"unpaged":false}}
Args[notice]:{"title":"通知公告"}
Time: 2671ms
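The web client measured 2.69 s and the method itself took 2.67 s. The log shows that the select count statement (2657 ms) is what causes the serious performance problem.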
java
// Without the count query, the method takes only 15 ms
long size = 5000;
log
2023-11-08 11:49:46.101 INFO 14700 --- [nio-8088-exec-9] i.g.s.s.common.log.LogPrintAspect : ********************* [/biz/notice/page] Request Log *********************
[2023-11-08 11:49:46] [statement-6] [耗时: 5ms] 执行SQL:
select n1_0.id,n1_0.create_by,n1_0.create_time,n1_0.json_str,n1_0.notice_time,n1_0.parent_id,n1_0.title,n1_0.update_by,n1_0.update_time from biz_notice n1_0 where n1_0.title like '通知公告%' escape '!' and n1_0.parent_id is null limit 4900,20
2023-11-08 11:49:46.118 INFO 14700 --- [nio-8088-exec-9] i.g.s.s.common.log.LogPrintAspect : ********************* [/biz/notice/page] 请求过程 *********************
Request: GET http://127.0.0.1:8088/biz/notice/page
Params: /biz/notice/page?current=246&pageSize=20&title=通知公告&createTimeRange=2023-11-01 10:01:23,2023-11-15 10:01:23
IP: 127.0.0.1
UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36
User: userId:2927901808918528,userName:admin
Method: io.github.smilexizheng.smileboot.modules.testgen.controller.NoticeController.getNoticePage
Args[pageable]:{"current":245,"offset":4900,"pageSize":20,"pageable":{"offset":4900,"pageNumber":245,"pageSize":20,"paged":true,"sort":{"empty":true,"sorted":false,"unsorted":true},"unpaged":false}}
Args[notice]:{"title":"通知公告"}
Time: 15ms
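Summary
Do only as much as your resources allow. Depending on the business scenario, the general idea is to use a (fake) total that is good enough for the business and to avoid executing the select count statement. Just make sure the query conditions hit an index.
Why is count slow? See the referenced Zhihu link.
Below is what the official MySQL documentation says about count queries: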
InnoDB does not keep an internal count of rows in a table because concurrent transactions might "see" different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
As of MySQL 8.0.13, SELECT COUNT(*) FROM tbl_name query performance for InnoDB tables is optimized for single-threaded workloads if there are no extra clauses such as WHERE or GROUP BY. InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index.
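The main reason the storage engine does not keep a separate count for fast lookups is transactional concurrency: different transactions may see different numbers of rows. MyISAM, which does not support transactions, simply stores the count separately and can return it quickly: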
For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.
In addition, count(*) and count(1) are no different, as the official documentation also explains.
InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.
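PostgreSQL's notes on count:
Users accustomed to working with other SQL database management systems might be disappointed by the performance of the count aggregate when it is applied to the entire table. A query like: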
SELECT count(*) FROM sometable;
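will require effort proportional to the size of the table: PostgreSQL will need to scan either the entire table or the entirety of an index that includes all rows in the table.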
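Possible solutions
1. Approximate pagination over a capped result set
Paging information is required but does not have to be exact. For example, some features only show a ranking, and some scenarios only need the first 1000 rows. Capping the count in SQL is enough for these cases.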
select count(*) from (select * from product where category=? order by id asc limit 5000) as t;
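A minimal sketch of how the capped count might be turned into paging metadata on the Java side. `CappedTotal`, `CAP`, `of`, and `label` are hypothetical names introduced for illustration; the cap of 5000 mirrors the `limit 5000` in the SQL above, and the value passed in is assumed to be the result of that capped count query.

```java
/** Paging metadata derived from a count that was already capped in SQL. */
final class CappedTotal {
    /** Assumed cap; mirrors the "limit 5000" of the inner query. */
    static final long CAP = 5000;

    final long total;     // value handed to the pager, never above CAP
    final boolean capped; // true when the cap was reached, so the real total may be larger
    final long pages;     // number of pages the UI can offer

    private CappedTotal(long total, boolean capped, long pages) {
        this.total = total;
        this.capped = capped;
        this.pages = pages;
    }

    /** cappedCount is assumed to be the result of the capped count query. */
    static CappedTotal of(long cappedCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long total = Math.min(Math.max(cappedCount, 0), CAP);
        long pages = (total + pageSize - 1) / pageSize; // ceiling division
        return new CappedTotal(total, total == CAP, pages);
    }

    /** Label for the UI, for example "5000+" once the cap is reached. */
    String label() {
        return capped ? total + "+" : Long.toString(total);
    }
}
```

With a page size of 20, a capped count of 5000 yields 250 pages and the label "5000+", so the pager never offers a page beyond what the capped query covers.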
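2. No total shown; the frontend only offers "load more"
The total is not shown and jumping to a page is not supported. The user gets a "load more" option that loads one page at a time,
or more rows load automatically when scrolling reaches the bottom. Many apps and public-facing websites use this approach.
Even without the count query, offset should be avoided as well in order to get good performance. After many page turns,
offset also has to scan too much index data, which makes the query inefficient.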
select * from product order by id asc limit 20;
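The statement above fetches the first page, ordered by id. To fetch the second page, use the id of the last row on the previous page as the condition.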
select * from product where id > ? order by id asc limit 20; -- ? = id of the last row on the previous page
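The "load more" flow can be sketched in plain Java. This is an in-memory stand-in, not database code: a `TreeMap` plays the role of the primary-key index, and `KeysetPager`, `Page`, `fetch`, and `nextCursor` are hypothetical names. It also reads one row beyond the limit to learn whether another page exists, which is one possible way to drive a "load more" button without a count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory stand-in for "where id > ? order by id asc limit ?". */
final class KeysetPager {
    /** One page of ids plus the cursor for the next request. */
    static final class Page {
        final List<Long> ids;
        final Long nextCursor; // null when there is nothing more to load

        Page(List<Long> ids, Long nextCursor) {
            this.ids = ids;
            this.nextCursor = nextCursor;
        }
    }

    // The sorted map can seek directly past lastId, like an index range scan.
    private final NavigableMap<Long, String> rowsById = new TreeMap<>();

    void insert(long id, String title) {
        rowsById.put(id, title);
    }

    /** Pass null as lastId to fetch the first page. */
    Page fetch(Long lastId, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        NavigableMap<Long, String> tail =
                lastId == null ? rowsById : rowsById.tailMap(lastId, false); // id > lastId
        List<Long> ids = new ArrayList<>();
        boolean hasMore = false;
        for (Long id : tail.keySet()) {
            if (ids.size() == limit) { // a row beyond the page exists
                hasMore = true;
                break;
            }
            ids.add(id);
        }
        Long next = hasMore ? ids.get(ids.size() - 1) : null;
        return new Page(ids, next);
    }
}
```

The client keeps only `nextCursor` between requests. The work per page stays proportional to the page size, however deep the user scrolls.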
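3. Can the count be stored redundantly?
The total can be stored ahead of time, but keeping redundant data in sync is expensive: every kind of update or insert has to update it too.
When the conditions are very complex, a count would have to be stored for every combination of conditions, so this is clearly not feasible in most scenarios.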
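To make the maintenance cost concrete, here is a small in-memory sketch. `RedundantCounts` and its methods are hypothetical; a real system would more likely keep these counters in a table and update them in the same transaction as the write, which is an assumption and not something the article describes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Counters that every write path has to keep in sync (illustrative only). */
final class RedundantCounts {
    static final String ALL = "all";
    static final String TOP_LEVEL = "topLevel"; // rows where parent_id is null

    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    void onInsert(Long parentId) {
        bump(ALL, 1);
        if (parentId == null) {
            bump(TOP_LEVEL, 1);
        }
    }

    void onDelete(Long parentId) {
        bump(ALL, -1);
        if (parentId == null) {
            bump(TOP_LEVEL, -1);
        }
    }

    /** Even an update needs care when it moves a row into or out of a counted set. */
    void onParentChanged(Long oldParentId, Long newParentId) {
        if (oldParentId == null && newParentId != null) {
            bump(TOP_LEVEL, -1);
        }
        if (oldParentId != null && newParentId == null) {
            bump(TOP_LEVEL, 1);
        }
    }

    long get(String key) {
        LongAdder adder = counts.get(key);
        return adder == null ? 0 : adder.sum();
    }

    private void bump(String key, long delta) {
        counts.computeIfAbsent(key, k -> new LongAdder()).add(delta);
    }
}
```

Two fixed filters already need three hooks. A free-form filter such as `title like '通知公告%'` has no fixed key at all, which is the case the text above calls infeasible.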
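4. Large data volumes where the business really requires exact pagination
Back it with powerful hardware or adopt a distributed solution. Investing in high-end hardware also means reworking the project architecture. In short, supporting large data volumes and high concurrency is very expensive.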