DISTINCT vs. GROUP BY in MySQL: which is more efficient?

Conclusion

Here is the conclusion up front.

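With an index: GROUP BY and DISTINCT can both use the index, and their efficiency is the same.
Without an index: DISTINCT is more efficient than GROUP BY. Both perform a grouping operation, but GROUP BY may also sort the result (implicitly, before MySQL 8.0). The sort triggers a filesort and slows the query down.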

Recommendation: use GROUP BY.

Using DISTINCT


SELECT DISTINCT columns FROM table_name WHERE where_conditions;


mysql> select distinct age from student;
+------+
| age  |
+------+
|   10 |
|   12 |
|   11 |
| NULL |
+------+
4 rows in set (0.01 sec)

Note:

If a column contains NULL values and DISTINCT is applied to it, MySQL keeps one NULL and removes the rest, because DISTINCT treats all NULL values as equal.
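To see this NULL behavior without a MySQL server, the sketch below runs the same kind of query through Python's built-in sqlite3 module. This is a stand-in under an assumption: SQLite is not MySQL, although its DISTINCT also treats NULLs as equal. The student rows are hypothetical sample data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption: SQLite's
# DISTINCT also treats all NULLs as equal, matching the behavior above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, sex TEXT, age INTEGER)")

# Hypothetical sample rows; two of them have a NULL age.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("a", "male", 10), ("b", "female", 12), ("c", "male", None),
     ("d", "male", None), ("e", "female", 12)],
)

# DISTINCT collapses both NULL ages into a single NULL row.
ages = [row[0] for row in conn.execute("SELECT DISTINCT age FROM student")]
print(ages.count(None))  # 1
```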

Deduplicating on multiple columns with DISTINCT


SELECT DISTINCT column1,column2 FROM table_name WHERE where_conditions;


mysql> select distinct sex,age from student;
+--------+------+
| sex    | age  |
+--------+------+
| male   |   10 |
| female |   12 |
| male   |   11 |
| male   | NULL |
| female |   11 |
+--------+------+
5 rows in set (0.02 sec)
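Multi-column DISTINCT compares whole rows: a row is dropped only when its entire (sex, age) combination has already appeared, which is why male appears three times in the result. The minimal Python model below uses hypothetical rows chosen to reproduce that output; it models the semantics only, not how MySQL executes the query.

```python
# Hypothetical (sex, age) rows as they might come out of the student table;
# None plays the role of SQL NULL.
rows = [
    ("male", 10), ("female", 12), ("male", 11), ("male", 10),
    ("male", None), ("female", 11), ("female", 12), ("male", None),
]

# DISTINCT over several columns drops a row only if its whole (sex, age)
# tuple was seen before; dict.fromkeys keeps the first-seen order.
distinct_rows = list(dict.fromkeys(rows))
print(distinct_rows)
# [('male', 10), ('female', 12), ('male', 11), ('male', None), ('female', 11)]
```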

Using GROUP BY

Works much like DISTINCT.

Syntax


SELECT columns FROM table_name WHERE where_conditions GROUP BY columns;


mysql> select age from student group by age;
+------+
| age  |
+------+
|   10 |
|   12 |
|   11 |
| NULL |
+------+
4 rows in set (0.02 sec)

Deduplicating on multiple columns


mysql> select sex,age from student group by sex,age;
+--------+------+
| sex    | age  |
+--------+------+
| male   |   10 |
| female |   12 |
| male   |   11 |
| male   | NULL |
| female |   11 |
+--------+------+
5 rows in set (0.03 sec)

Differences

The main differences between the two:

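GROUP BY deduplicates only on the columns listed after GROUP BY, so it can group on a single column even when the query returns several columns.
GROUP BY works by grouping the rows and returning one row per group. If the query also selects columns that are neither grouped nor aggregated, MySQL takes their values from an arbitrary row of each group (not necessarily the first), and with the ONLY_FULL_GROUP_BY SQL mode, enabled by default since MySQL 5.7.5, such a query is rejected.
Before MySQL 8.0, GROUP BY also implicitly sorted the result by the grouped columns by default.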
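The difference in the deduplication key can be modeled in a few lines of Python. This is a simplified sketch with hypothetical rows, not MySQL's executor: DISTINCT over (sex, age) keeps every distinct pair, while grouping on age alone (modeled as sort-then-group with itertools.groupby, mirroring the pre-8.0 implicit sort) yields one group per age in sorted order.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (sex, age) rows.
rows = [("male", 10), ("female", 12), ("male", 11), ("female", 11), ("male", 10)]

# SELECT DISTINCT sex, age: deduplicate on the whole selected tuple.
distinct_pairs = sorted(set(rows))

# SELECT age ... GROUP BY age (pre-8.0 model): sort on the grouping column,
# then emit one group per age; the sort is why the output comes out ordered.
by_age = itemgetter(1)
age_groups = [age for age, _ in groupby(sorted(rows, key=by_age), key=by_age)]

print(len(distinct_pairs))  # 4
print(age_groups)           # [10, 11, 12]
```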

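Implicit sorting

Official documentation before 8.0 (MySQL 5.7 manual): https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html

GROUP BY sorts implicitly by default (that is, it sorts even when the GROUP BY columns have no ASC or DESC designators). However, relying on implicit or explicit GROUP BY sorting is deprecated. To produce a given sort order, provide an ORDER BY clause.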

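Official documentation for 8.0: https://dev.mysql.com/doc/refman/8.0/en/order-by-optimization.html

Previously (MySQL 5.7 and earlier), GROUP BY sorted implicitly under certain conditions. MySQL 8.0 removed this behavior, so it is no longer necessary to add ORDER BY NULL to suppress implicit sorting; however, query results may differ from those of earlier MySQL versions. To produce results in a given order, specify the sort columns with ORDER BY.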

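Summary

With the same semantics and an index available:

GROUP BY and DISTINCT can both use the index, and their efficiency is the same, because the two are nearly equivalent: DISTINCT can be viewed as a special case of GROUP BY.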
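One way to check that the two forms are semantically equivalent is to run both and compare the result sets. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption). The table, the index name idx_sex_age, and the rows are hypothetical, and the index can change only the query plan, never the result.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption); data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, sex TEXT, age INTEGER)")
# Hypothetical composite index on the deduplicated columns.
conn.execute("CREATE INDEX idx_sex_age ON student (sex, age)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("a", "male", 10), ("b", "female", 12), ("c", "male", 11),
     ("d", "male", None), ("e", "female", 11), ("f", "male", 10)],
)

# Same semantics, two spellings. Compare as sets because neither query
# has an ORDER BY, so the row order is not guaranteed.
distinct_rows = set(conn.execute("SELECT DISTINCT sex, age FROM student"))
grouped_rows = set(conn.execute("SELECT sex, age FROM student GROUP BY sex, age"))
print(distinct_rows == grouped_rows)  # True
print(len(distinct_rows))             # 5
```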

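With the same semantics and no index:

DISTINCT is more efficient than GROUP BY. Both perform a grouping operation, but before MySQL 8.0 GROUP BY also sorts implicitly, which triggers a filesort and slows the query down.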
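The efficiency argument can be sketched as a conceptual model (an assumption, not MySQL's actual implementation): both operations do the same deduplication pass, but pre-8.0 GROUP BY adds a sort on top of it, and that extra step is what the filesort stands for. The function names are hypothetical.

```python
# Conceptual model (assumption, not MySQL internals):
# DISTINCT = one deduplication pass; pre-8.0 GROUP BY = same pass + a sort.

def distinct_hash(values):
    """Deduplicate in a single pass with a hash set; keeps first-seen order."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out


def group_by_with_implicit_sort(values):
    """Same deduplication, followed by the extra sort that models filesort."""
    return sorted(distinct_hash(values))


ages = [10, 12, 11, 10, 12, 11]
print(distinct_hash(ages))                # [10, 12, 11]
print(group_by_with_implicit_sort(ages))  # [10, 11, 12]
```

Both return the same set of values; only the sorted variant pays for ordering. In MySQL 8.0 that sort is gone unless requested with ORDER BY, which is why the two converge.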

Starting with MySQL 8.0, however, implicit sorting has been removed, so with the same semantics and no index, GROUP BY and DISTINCT now perform nearly the same.

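Why GROUP BY is recommended

GROUP BY has clearer semantics.

GROUP BY can perform more complex processing on the data, such as computing aggregates (COUNT, SUM, AVG) per group and filtering groups with HAVING.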
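As an illustration of that extra capability, the query below groups by sex, counts rows, averages age, and keeps only groups with at least two rows, which DISTINCT cannot express on its own. It runs through Python's sqlite3 module as a stand-in for MySQL (an assumption; the SQL itself is standard), with hypothetical data. AVG ignores NULL ages.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption); data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, sex TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("a", "male", 10), ("b", "female", 12), ("c", "male", 11),
     ("d", "male", 10), ("e", "female", 11), ("f", "male", None)],
)

# Aggregate per group, filter groups with HAVING, and request the order
# explicitly with ORDER BY instead of relying on implicit sorting.
query = """
    SELECT sex, COUNT(*) AS cnt, AVG(age) AS avg_age
    FROM student
    GROUP BY sex
    HAVING COUNT(*) >= 2
    ORDER BY sex
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```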