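2) Without an index, distinct tends to be more efficient than group by. Both distinct and group by perform a grouping operation, but group by may additionally sort the result, which can trigger a filesort and slow the SQL statement down (MySQL versions before 8.0, for example, sort group by results implicitly).
The two also differ in purpose: distinct returns only the unique combinations of the selected columns, whereas group by first partitions the rows into groups and then returns one row per group, so deduplication is done on the columns listed after group by.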
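The relationship between the two forms can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in database, and the table `bal` and its columns are hypothetical names chosen only for this illustration, not the tables used in the experiments.

```python
import sqlite3

# Hypothetical toy table; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bal (code TEXT, ym TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO bal VALUES (?, ?, ?)",
    [("A", "2023-01", 10), ("A", "2023-01", 20),
     ("A", "2023-02", 5), ("B", "2023-01", 7)],
)

# Deduplicate on (code, ym) in both styles.
distinct_rows = conn.execute("SELECT DISTINCT code, ym FROM bal").fetchall()
group_rows = conn.execute("SELECT code, ym FROM bal GROUP BY code, ym").fetchall()

# Only group by can also aggregate within each group.
totals = conn.execute(
    "SELECT code, ym, SUM(amount) FROM bal GROUP BY code, ym"
).fetchall()

# Compare as sets: neither statement promises an order without ORDER BY.
print(set(distinct_rows) == set(group_rows))
print(sorted(totals))
```

When no aggregate is needed, both statements return the same set of rows, which is why the experiments compare them purely on execution time.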
2. Experiments
Test table: test_subject_bal, 10 million rows
select count(*) from test_subject_bal t
2.1 Without indexes
1) distinct experiment
Statement:
select distinct t.social_credit_code,t.year_month
from test_subject_bal t
where t.data_flag='M' --and t.social_credit_code='014011024205200001'
and not exists(select 1 from validate_dtl_book b
where b.orgno_fz = t.social_credit_code
and b.kid = t.book_id)
Timings from the result screenshots (five runs): 11.375s 6.31s 7.108s 6.769s 6.660s
Execution plan of the SQL statement: 125316
2) group by experiment
Statement:
select t.social_credit_code,t.year_month
from test_subject_bal t
where t.data_flag='M' --and t.social_credit_code='014011024205200001'
and not exists(select 1 from validate_dtl_book b
where b.orgno_fz = t.social_credit_code
and b.kid = t.book_id)
group by social_credit_code,t.year_month
Timings from the result screenshots (five runs): 7.458s 6.570s 7.123s 7.041s 6.206s
Execution plan of the SQL statement: 125316
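A comparison of this shape can be reproduced at small scale with a timing helper. The following is only a sketch under stated assumptions: SQLite (via Python's sqlite3) stands in for the real database, the tables `bal` and `book`, their columns, and the helper `time_query` are hypothetical, and the synthetic data is far smaller than the 10 million rows used above, so absolute timings are not comparable.

```python
import sqlite3
import time

def time_query(conn, sql, runs=5):
    """Run `sql` several times, fetching all rows; return (timings, row_count)."""
    timings, row_count = [], 0
    for _ in range(runs):
        start = time.perf_counter()
        row_count = len(conn.execute(sql).fetchall())
        timings.append(time.perf_counter() - start)
    return timings, row_count

# Hypothetical schema loosely mirroring the two tables in the experiment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bal (code TEXT, ym TEXT, flag TEXT, book_id INTEGER)")
conn.execute("CREATE TABLE book (orgno TEXT, kid INTEGER)")
conn.executemany(
    "INSERT INTO bal VALUES (?, ?, ?, ?)",
    [(f"org{i % 50}", f"2023-{i % 12 + 1:02d}", "M" if i % 3 else "Y", i % 200)
     for i in range(20000)],
)
conn.executemany("INSERT INTO book VALUES (?, ?)",
                 [(f"org{i}", i) for i in range(0, 50, 2)])

# Same filter and anti-join for both statements; only the dedup style differs.
body = """
FROM bal t
WHERE t.flag = 'M'
  AND NOT EXISTS (SELECT 1 FROM book b
                  WHERE b.orgno = t.code AND b.kid = t.book_id)
"""
distinct_sql = "SELECT DISTINCT t.code, t.ym" + body
group_sql = "SELECT t.code, t.ym" + body + "GROUP BY t.code, t.ym"

d_times, d_rows = time_query(conn, distinct_sql)
g_times, g_rows = time_query(conn, group_sql)
print("distinct:", d_rows, "rows,", [f"{t:.4f}s" for t in d_times])
print("group by:", g_rows, "rows,", [f"{t:.4f}s" for t in g_times])
```

Repeating each statement several times, as the experiments do, helps separate a slow first run (for example from a cold cache) from the steady-state cost.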
2.2 With indexes
create table test_subject_bal2 as select * from test_subject_bal t;
CREATE INDEX idx_orgno_test_subject_bal2 ON test_subject_bal2 (social_credit_code);
CREATE INDEX idx_ymonth_test_subject_bal2 ON test_subject_bal2 (year_month);
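Whether single-column indexes like these are actually picked up can be checked by asking the database for its plan. The sketch below shows the idea with SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3. The table `bal2`, its columns, and the helper `plan_details` are hypothetical, and SQLite's planner is not the one used in the experiments, so the output only illustrates the method and says nothing about the plans reported here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one index per deduplicated column, as in the setup above.
conn.execute("CREATE TABLE bal2 (code TEXT, ym TEXT, flag TEXT)")
conn.execute("CREATE INDEX idx_code_bal2 ON bal2 (code)")
conn.execute("CREATE INDEX idx_ym_bal2 ON bal2 (ym)")

def plan_details(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

distinct_plan = plan_details(
    "SELECT DISTINCT code, ym FROM bal2 WHERE flag = 'M'")
group_plan = plan_details(
    "SELECT code, ym FROM bal2 WHERE flag = 'M' GROUP BY code, ym")

# Print each plan step so the chosen access path is visible.
for label, plan in (("distinct", distinct_plan), ("group by", group_plan)):
    print(label, plan)
```

If the plan still shows a full scan, the new indexes are not helping the statement, and similar timings with and without them are to be expected.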
1) distinct experiment
Statement:
select distinct t.social_credit_code,t.year_month
from test_subject_bal2 t
where t.data_flag='M' --and t.social_credit_code='014011024205200001'
and not exists(select 1 from validate_dtl_book b
where b.orgno_fz = t.social_credit_code
and b.kid = t.book_id)
Timings from the result screenshots (five runs): 7.142s 6.911s 6.867s 7.908s 6.636s
Execution plan of the SQL statement: 125319
2) group by experiment
Statement:
select t.social_credit_code,t.year_month
from test_subject_bal2 t
where t.data_flag='M' --and t.social_credit_code='014011024205200001'
and not exists(select 1 from validate_dtl_book b
where b.orgno_fz = t.social_credit_code
and b.kid = t.book_id)
group by social_credit_code,t.year_month
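Timings from the result screenshots (five runs): 6.827s 7.285s 7.415s 6.415s 6.384s
Execution plan of the SQL statement: 125319
2.3 With indexes, where the indexed column is the one used in the filter condition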
CREATE INDEX idx_data_flag_test_subject_bal2 ON test_subject_bal2 (data_flag);
1) distinct experiment
Statement:
select distinct t.social_credit_code,t.year_month
from test_subject_bal2 t
where t.data_flag='M' --and t.social_credit_code='014011024205200001'
and not exists(select 1 from validate_dtl_book b
where b.orgno_fz = t.social_credit_code
and b.kid = t.book_id)
Timings from the result screenshots (five runs): 6.352s 6.729s 6.242s 6.163s 6.126s
Execution plan of the SQL statement: 125319
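The timings reported so far are easier to compare once they are summarized. The sketch below computes the mean and median of each five-run series. The numbers are copied from the results above, while the labels and the choice of statistics are an illustrative assumption rather than part of the original experiment.

```python
import statistics

# Timings in seconds, copied from the experiment results above.
timings = {
    "2.1 no index, distinct": [11.375, 6.31, 7.108, 6.769, 6.660],
    "2.1 no index, group by": [7.458, 6.570, 7.123, 7.041, 6.206],
    "2.2 indexed, distinct": [7.142, 6.911, 6.867, 7.908, 6.636],
    "2.2 indexed, group by": [6.827, 7.285, 7.415, 6.415, 6.384],
    "2.3 filter index, distinct": [6.352, 6.729, 6.242, 6.163, 6.126],
}

# The median is less sensitive than the mean to a single slow run.
summary = {
    name: (statistics.mean(runs), statistics.median(runs))
    for name, runs in timings.items()
}

for name, (mean_s, median_s) in summary.items():
    print(f"{name}: mean={mean_s:.3f}s median={median_s:.3f}s")
```

The first distinct run in 2.1 (11.375s) stands well apart from the other four, so the median is probably the fairer basis for comparing these series.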
2) group by experiment
Statement:
select t.social_credit_code,t.year_month
from test_subject_bal2 t
where t.data_flag='M' --and t.social_credit_code='014011024205200001'
and not exists(select 1 from validate_dtl_book b
where b.orgno_fz = t.social_credit_code
and b.kid = t.book_id)
group by social_credit_code,t.year_month