MySQL Index Optimization (Part 1)

Principles for Creating Indexes

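Write the code first, then build the indexes
Don't create indexes right after creating a table. Wait until the main business features are developed, analyze the SQL statements that touch the table, and then decide which indexes to build.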
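Make composite indexes cover the query conditions
When designing a composite index, have it include as many of the columns used in the statement's WHERE, ORDER BY, and GROUP BY clauses as possible. Order those columns so the queries satisfy the leftmost-prefix rule wherever possible.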
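To make the leftmost-prefix rule concrete, here is a toy Python model. It is an illustrative assumption, not how InnoDB is implemented. A composite index on (name, age, school) is treated as a sorted list of key tuples, like the ordered leaf level of a B+ tree. A seek can only use a leading prefix of the columns, and age on its own is not sorted across the whole index. The sample data and helper names are hypothetical.

```python
from bisect import bisect_left

# Toy model (assumption): a composite index on (name, age, school) as key tuples
# kept in sorted order, mimicking the ordered leaf level of a B+ tree.
index = sorted([
    ("alice", 30, "north"),
    ("alice", 25, "south"),
    ("bob", 20, "north"),
    ("bob", 25, "east"),
    ("carol", 25, "north"),
])

def seek_prefix(index, prefix):
    """Return all entries whose leading columns equal `prefix` (a leftmost prefix)."""
    n = len(prefix)
    start = bisect_left(index, prefix)  # binary search to the first candidate
    result = []
    for entry in index[start:]:
        if entry[:n] != prefix:
            break  # matching entries are contiguous, so stop at the first mismatch
        result.append(entry)
    return result

def column_is_sorted(index, pos):
    """True if column `pos` alone is in sorted order across the whole index."""
    values = [entry[pos] for entry in index]
    return values == sorted(values)

print(seek_prefix(index, ("alice",)))    # WHERE name = 'alice'
print(seek_prefix(index, ("bob", 25)))   # WHERE name = 'bob' AND age = 25
print(column_is_sorted(index, 0), column_is_sorted(index, 1))
```

A condition on age alone has no sorted run to binary-search into, so it would have to visit every entry. This is why a WHERE clause that skips name cannot seek into this index.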
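Index columns with many distinct values (high selectivity)
Suppose a user table has a gender column with only two values, male and female. An index on it cannot narrow the search much, because each value still matches about half the table. The optimizer will then often prefer a full table scan, and the index loses its point. Choose columns with many distinct values so the B+ tree lookup can actually narrow the search.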
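Selectivity is commonly measured as the number of distinct values divided by the number of rows; in SQL this is COUNT(DISTINCT col) / COUNT(*). A quick Python sketch with made-up data compares a two-valued gender column against a unique name column. The data and function name are illustrative assumptions.

```python
def selectivity(values):
    """Distinct values / total rows; closer to 1.0 means an index narrows the search more."""
    return len(set(values)) / len(values)

# Illustrative data (assumption): 100,000 rows.
genders = ["M", "F"] * 50_000
names = [f"user{i}" for i in range(100_000)]

print(selectivity(genders))  # 2e-05: each value matches about half the table
print(selectivity(names))    # 1.0: every value identifies a single row
```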
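Use prefix indexes for long strings
Prefer to index columns with small data types: a smaller column takes less disk space, and searches on it tend to be faster. If you need to index a long string column such as a varchar(255), you can index only its first 20 characters. Only the first 20 characters of each value are then stored in the index tree, for example KEY idx_name_age_position(name(20), age, position). Keep in mind that a prefix index cannot serve as a covering index for that column, and cannot be used to satisfy ORDER BY or GROUP BY on it.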
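A common way to pick the prefix length is to compare the selectivity of the first n characters with that of the full column; in SQL this is COUNT(DISTINCT LEFT(col, n)) / COUNT(*). You then take the shortest n that comes close enough. The sketch below does this in Python. The sample values, the `threshold` parameter, and the function names are illustrative assumptions.

```python
def prefix_selectivity(values, n):
    """Selectivity if only the first n characters were indexed, as with col(n)."""
    return len({v[:n] for v in values}) / len(values)

def shortest_prefix(values, threshold=1.0):
    """Smallest n whose prefix selectivity reaches `threshold` times the full column's."""
    full = len(set(values)) / len(values)
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if prefix_selectivity(values, n) >= threshold * full:
            return n
    return longest

# Hypothetical column values.
emails = ["alice@example.com", "alan@example.com", "bob@example.com", "bobby@example.com"]
print(shortest_prefix(emails))        # shortest prefix that is as selective as the full value
print(shortest_prefix(emails, 0.75))  # accept 75% of the full selectivity
```

On real data you would usually accept a threshold a little below 1.0, trading some selectivity for a much smaller index.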
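When WHERE and ORDER BY conflict, favor WHERE
If the index that would serve the WHERE clause conflicts with the one that would serve ORDER BY, let the WHERE conditions use the index first to quickly filter out a subset of rows, then sort that subset. In most cases, filtering through an index narrows the data down to a small set fastest, and sorting a small set is much cheaper.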
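Build indexes once the data volume is large enough
As a rough rule of thumb, add indexes once a single table exceeds about 100,000 rows, to keep the user experience responsive.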

Index Optimization Examples

Sample Table Structure and Data

CREATE TABLE `student` (
	`id` INT(11) NOT NULL AUTO_INCREMENT,
	`name` VARCHAR(24) NOT NULL DEFAULT '' COMMENT 'name',
	`age` INT(11) NOT NULL DEFAULT '0' COMMENT 'age',
	`school` VARCHAR(20) NOT NULL DEFAULT '' COMMENT 'school name',
	`start_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'enrollment time',
	PRIMARY KEY (`id`),
	KEY `idx_name_age_school` (`name`,`age`,`school`) USING BTREE
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='student table';

INSERT INTO student(NAME,age,school,start_time) VALUES('老大',11,'老大小学',NOW());
INSERT INTO student(NAME,age,school,start_time) VALUES('老二', 12,'老大小学',NOW());
INSERT INTO student(NAME,age,school,start_time) VALUES('老三',13,'老大小学',NOW());


DROP PROCEDURE IF EXISTS insert_student;
DELIMITER ;;
CREATE PROCEDURE insert_student()
BEGIN
	DECLARE i INT;
	SET i=1;
	WHILE(i<=100000)DO
		INSERT INTO student(NAME,age,school) VALUES(CONCAT('老',i),i,'老大小学');
		SET i=i+1;
	END WHILE;
END;;
DELIMITER ;
CALL insert_student();

Covering Index Optimization

First example: a range condition on the first column of a composite index may not use the index

EXPLAIN SELECT * FROM student WHERE NAME > '老1' AND age = 11 AND school='老大小学';

    id  select_type  table    partitions  type    possible_keys          key     key_len  ref       rows  filtered  Extra        
------  -----------  -------  ----------  ------  ---------------------  ------  -------  ------  ------  --------  -------------
     1  SIMPLE       student  (NULL)      ALL     idx_name_age_school    (NULL)  (NULL)   (NULL)   97664      0.50  Using where

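The output shows that no index is used. MySQL probably estimated that the result set is large. Fetching each matching row from the primary-key index afterward (a "table lookup") would then be inefficient, so it chose a full table scan instead. Next, let's force the index: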

EXPLAIN SELECT * FROM student FORCE INDEX(idx_name_age_school) WHERE NAME > '老1' AND age = 11 AND school='老大小学';

    id  select_type  table    partitions  type    possible_keys        key                  key_len  ref       rows  filtered  Extra                  
------  -----------  -------  ----------  ------  -------------------  -------------------  -------  ------  ------  --------  -----------------------
     1  SIMPLE       student  (NULL)      range   idx_name_age_school  idx_name_age_school  74       (NULL)   48832      1.00  Using index condition

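The estimated row count is smaller, but every matching index entry still needs a table lookup. As a result, the final query is not necessarily faster than the full table scan. Try both statements yourself and compare their execution times; I won't post screenshots here.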

Optimize the same query with a covering index by selecting only the indexed columns:

EXPLAIN SELECT NAME, age, school FROM student WHERE NAME > '老1' AND age = 11 AND school='老大小学';

    id  select_type  table    partitions  type    possible_keys        key                  key_len  ref       rows  filtered  Extra                     
------  -----------  -------  ----------  ------  -------------------  -------------------  -------  ------  ------  --------  --------------------------
     1  SIMPLE       student  (NULL)      range   idx_name_age_school  idx_name_age_school  74       (NULL)   48832      1.00  Using where; Using index

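Now the idx_name_age_school index is used and the estimated row count is lower. The Extra column shows Using index, meaning the query is answered from the index alone without table lookups, which makes it faster.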
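Why the covering version is cheaper can be shown with a toy Python model. This is an illustrative assumption, not InnoDB code. In InnoDB a secondary index entry stores its key columns plus the primary key. A query that needs only name, age, school (or id) can therefore be answered from the index, while any other column forces a lookup into the primary-key index. The rows and helper names below are hypothetical.

```python
# Toy model (assumption): hypothetical rows; the primary (clustered) index maps id -> full row.
rows = [
    {"id": 1, "name": "n1", "age": 11, "school": "s1", "start_time": "2024-09-01"},
    {"id": 2, "name": "n2", "age": 12, "school": "s1", "start_time": "2024-09-01"},
    {"id": 3, "name": "n3", "age": 11, "school": "s1", "start_time": "2024-09-02"},
]
primary = {r["id"]: r for r in rows}

# Secondary index on (name, age, school); each entry also carries the primary key.
secondary = sorted((r["name"], r["age"], r["school"], r["id"]) for r in rows)

def query(columns, predicate):
    """Scan the secondary index; return (results, number of primary-key lookups)."""
    results, lookups = [], 0
    for name, age, school, pk in secondary:
        entry = {"name": name, "age": age, "school": school, "id": pk}
        if not predicate(entry):
            continue
        if set(columns) <= set(entry):   # covered: answer from the index alone
            source = entry
        else:                            # not covered: go back to the table
            lookups += 1
            source = primary[pk]
        results.append({c: source[c] for c in columns})
    return results, lookups

age_is_11 = lambda e: e["age"] == 11
print(query(("name", "age", "school"), age_is_11))  # covered: no lookups
print(query(("name", "start_time"), age_is_11))     # start_time forces a lookup per match
```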

LIKE 'kk%' generally uses the index

EXPLAIN SELECT * FROM student_copy WHERE NAME LIKE '老%' AND age = 11 AND school='老大小学';

    id  select_type  table         partitions  type    possible_keys        key                  key_len  ref       rows  filtered  Extra                  
------  -----------  ------------  ----------  ------  -------------------  -------------------  -------  ------  ------  --------  -----------------------
     1  SIMPLE       student_copy  (NULL)      range   idx_name_age_school  idx_name_age_school  140      (NULL)       3     33.33  Using index condition

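The key column shows that the index is used. This mainly relies on the index condition pushdown optimization, which is what Using index condition in the Extra column indicates.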

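Normally, following the leftmost-prefix rule, this query searches the composite index (name, age, school) on name. Within the index entries that pass the name filter, age and school are not ordered, so they cannot be used to narrow the index search further. Before MySQL 5.6, this query could only find the index entries whose name starts with 老. It then had to take their primary keys and go back to the table one by one, fetching each full row from the primary-key index before checking whether age and school match.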

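MySQL 5.6 introduced index condition pushdown (ICP). While traversing the index, MySQL first evaluates the conditions on every column the index contains and discards non-matching entries before going back to the table, which reduces the number of table lookups. With ICP, this query first matches the entries whose name starts with 老 in the composite index. It also filters on age and school inside the index, and only uses the primary-key ids of the surviving entries to fetch full rows.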
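The difference can be sketched with a toy Python model; it is illustrative only, not MySQL's implementation. The model scans the index entries whose name starts with a prefix, with "lao" standing in for 老. It counts how many times we go back to the table with and without pushing the age and school checks down into the index scan. The sample data and the function name are hypothetical.

```python
from bisect import bisect_left

# Toy model (assumption): hypothetical rows keyed by primary key, plus a
# secondary index on (name, age, school) whose entries carry the id.
rows = {
    1: {"name": "lao1", "age": 11, "school": "s1"},
    2: {"name": "lao2", "age": 12, "school": "s1"},
    3: {"name": "lao3", "age": 11, "school": "s2"},
    4: {"name": "lao4", "age": 11, "school": "s1"},
    5: {"name": "other", "age": 11, "school": "s1"},
}
index = sorted((r["name"], r["age"], r["school"], pk) for pk, r in rows.items())

def like_prefix_scan(prefix, age, school, pushdown):
    """WHERE name LIKE 'prefix%' AND age = ? AND school = ?; return (ids, lookups)."""
    ids, lookups = [], 0
    for name, a, s, pk in index[bisect_left(index, (prefix,)):]:
        if not name.startswith(prefix):
            break  # entries sharing the prefix are contiguous in sorted order
        if pushdown and (a != age or s != school):
            continue  # ICP: rejected inside the index, no table lookup
        lookups += 1  # fetch the full row from the primary-key index
        row = rows[pk]
        if row["age"] == age and row["school"] == school:
            ids.append(pk)
    return ids, lookups

print(like_prefix_scan("lao", 11, "s1", pushdown=False))  # 4 lookups
print(like_prefix_scan("lao", 11, "s1", pushdown=True))   # 2 lookups
```

Both variants return the same rows; pushdown only changes how many full rows are fetched to get there.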

Optimizing ORDER BY

EXPLAIN SELECT * FROM student WHERE NAME = '老1' AND school = '老大小学' ORDER BY age;

    id  select_type  table    partitions  type    possible_keys        key                  key_len  ref       rows  filtered  Extra                  
------  -----------  -------  ----------  ------  -------------------  -------------------  -------  ------  ------  --------  -----------------------
     1  SIMPLE       student  (NULL)      ref     idx_name_age_school  idx_name_age_school  74       const        1     10.00  Using index condition

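By the leftmost-prefix rule, only name can be used for the index lookup here: the WHERE clause has conditions on name and school but none on age, which sits between them. This matches key_len=74, the length of the name column alone. Within a single name, index entries are already ordered by age, so ORDER BY age needs no extra sort. Extra shows Using index condition (school is filtered inside the index) rather than Using filesort.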
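The key_len values in these plans can be checked with simple arithmetic. The sketch below encodes the commonly used rules, stated here as assumptions:

- utf8 (utf8mb3) reserves 3 bytes per character.
- A VARCHAR key part adds 2 length bytes.
- INT takes 4 bytes.
- A nullable column adds 1 more byte (every column in student is NOT NULL).

The function name is hypothetical.

```python
# Assumed rules (simplified): max bytes per character by charset.
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}

def key_part_len(col_type, chars=0, charset="utf8", nullable=False):
    """Bytes one key part contributes to EXPLAIN's key_len (common rules, simplified)."""
    if col_type == "INT":
        size = 4
    elif col_type == "VARCHAR":
        size = chars * BYTES_PER_CHAR[charset] + 2  # max bytes + 2-byte length prefix
    else:
        raise ValueError(f"type not covered by this sketch: {col_type}")
    return size + (1 if nullable else 0)

name = key_part_len("VARCHAR", 24)    # `name` VARCHAR(24) NOT NULL
age = key_part_len("INT")             # `age` INT NOT NULL
school = key_part_len("VARCHAR", 20)  # `school` VARCHAR(20) NOT NULL

print(name, name + age, name + age + school)  # 74 78 140
```

So key_len=74 means only name was used, 78 means name and age, and 140 means all three columns.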

EXPLAIN SELECT * FROM student WHERE NAME = '老1' ORDER BY school;

    id  select_type  table    partitions  type    possible_keys        key                  key_len  ref       rows  filtered  Extra           
------  -----------  -------  ----------  ------  -------------------  -------------------  -------  ------  ------  --------  ----------------
     1  SIMPLE       student  (NULL)      ref     idx_name_age_school  idx_name_age_school  74       const        1    100.00  Using filesort

Here key_len=74, so only name is used for the lookup. Because the sort is on school, which skips age in the index order, Extra shows Using filesort.

Compare it with the next statement:

EXPLAIN SELECT * FROM student WHERE NAME = '老1' ORDER BY age, school;

    id  select_type  table    partitions  type    possible_keys        key                  key_len  ref       rows  filtered  Extra   
------  -----------  -------  ----------  ------  -------------------  -------------------  -------  ------  ------  --------  --------
     1  SIMPLE       student  (NULL)      ref     idx_name_age_school  idx_name_age_school  74       const        1    100.00  (NULL)

The lookup uses only name, while age and school provide the sort order, so there is no Using filesort.

Now swap age and school:

EXPLAIN SELECT * FROM student WHERE NAME = '老1' ORDER BY school, age;

    id  select_type  table    partitions  type    possible_keys        key                  key_len  ref       rows  filtered  Extra           
------  -----------  -------  ----------  ------  -------------------  -------------------  -------  ------  ------  --------  ----------------
     1  SIMPLE       student  (NULL)      ref     idx_name_age_school  idx_name_age_school  74       const        1    100.00  Using filesort

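Using filesort appears again. The index is defined in the order name, age, school, but the ORDER BY reverses age and school. This breaks the leftmost-prefix rule, so MySQL falls back to Using filesort.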

EXPLAIN SELECT * FROM student WHERE NAME = '老1' AND age = 11 ORDER BY school, age;

    id  select_type  table    partitions  type    possible_keys        key                  key_len  ref            rows  filtered  Extra   
------  -----------  -------  ----------  ------  -------------------  -------------------  -------  -----------  ------  --------  --------
     1  SIMPLE       student  (NULL)      ref     idx_name_age_school  idx_name_age_school  78       const,const       1    100.00  (NULL)

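The ORDER BY columns are still in a different order from the index, but age is a constant in the WHERE clause, so the optimizer drops it from the sort. What remains is ORDER BY school, which follows name and age in the index, so there is no Using filesort. The key_len of 78 also shows that both name and age are used for the lookup.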
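The rule these examples follow can be written as a small Python function. It is a simplified sketch and an assumption, not the optimizer's real logic. It walks the index columns in order, skips columns fixed by an equality condition, and requires the remaining ORDER BY columns to appear next in index order. It ignores sort direction and cost-based choices such as a full table scan. The function name is hypothetical.

```python
def index_satisfies_order(index_cols, equal_cols, order_cols):
    """Toy check (assumption): does reading the index in order already give ORDER BY order?"""
    # Columns fixed by `col = constant` do not affect the order and can be dropped.
    remaining = [c for c in order_cols if c not in equal_cols]
    pos = 0
    for col in index_cols:
        if pos < len(remaining) and col == remaining[pos]:
            pos += 1          # next ORDER BY column matched in index order
        elif col in equal_cols:
            continue          # constant column: safe to skip
        else:
            break             # a gap in the index order: cannot continue
    return pos == len(remaining)

IDX = ["name", "age", "school"]
cases = [
    ({"name", "school"}, ["age"]),         # WHERE name=, school= ORDER BY age
    ({"name"}, ["school"]),                # WHERE name= ORDER BY school
    ({"name"}, ["age", "school"]),         # WHERE name= ORDER BY age, school
    ({"name"}, ["school", "age"]),         # WHERE name= ORDER BY school, age
    ({"name", "age"}, ["school", "age"]),  # WHERE name=, age= ORDER BY school, age
]
for equal_cols, order_cols in cases:
    print(order_cols, index_satisfies_order(IDX, equal_cols, order_cols))
```

The printed flags line up with the EXPLAIN results above: True where no Using filesort appeared, False where it did.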

EXPLAIN SELECT * FROM student WHERE NAME = '老1' ORDER BY age ASC, school DESC;

    id  select_type  table    partitions  type    possible_keys        key                  key_len  ref       rows  filtered  Extra           
------  -----------  -------  ----------  ------  -------------------  -------------------  -------  ------  ------  --------  ----------------
     1  SIMPLE       student  (NULL)      ref     idx_name_age_school  idx_name_age_school  74       const        1    100.00  Using filesort

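This time the ORDER BY columns follow the index order, but age is sorted ascending and school descending. That does not match the order in which the index stores them, so Using filesort appears.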
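Sort direction adds one more condition. For an index whose key parts are all ascending (the default), MySQL can read the index forward for an all-ASC ORDER BY or backward for an all-DESC one. A mix of directions needs a filesort. Below is a minimal Python sketch of that check; the function name is hypothetical, and constant columns are ignored as before. MySQL 8.0 also supports descending key parts, e.g. KEY (name, age ASC, school DESC), which this sketch does not model.

```python
def scan_direction(order_by, equal_cols=frozenset()):
    """Toy check (assumption) for an all-ascending index: 'forward', 'backward', or None (filesort)."""
    # Directions of constant columns do not matter, since their value never changes.
    directions = {d for col, d in order_by if col not in equal_cols}
    if directions <= {"ASC"}:
        return "forward"
    if directions == {"DESC"}:
        return "backward"
    return None

print(scan_direction([("age", "ASC"), ("school", "ASC")]))    # forward
print(scan_direction([("age", "DESC"), ("school", "DESC")]))  # backward
print(scan_direction([("age", "ASC"), ("school", "DESC")]))   # None: filesort
```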

EXPLAIN SELECT * FROM student WHERE NAME > '老1' ORDER BY NAME;

    id  select_type  table    partitions  type    possible_keys        key     key_len  ref       rows  filtered  Extra                        
------  -----------  -------  ----------  ------  -------------------  ------  -------  ------  ------  --------  -----------------------------
     1  SIMPLE       student  (NULL)      ALL     idx_name_age_school  (NULL)  (NULL)   (NULL)   97664     50.00  Using where; Using filesort

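With a range condition and SELECT *, MySQL chose a full table scan plus a filesort. This case can be optimized with a covering index, which avoids Using filesort: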

EXPLAIN SELECT NAME, age, school FROM student WHERE NAME > '老1' ORDER BY NAME;

    id  select_type  table    partitions  type    possible_keys        key                  key_len  ref       rows  filtered  Extra                     
------  -----------  -------  ----------  ------  -------------------  -------------------  -------  ------  ------  --------  --------------------------
     1  SIMPLE       student  (NULL)      range   idx_name_age_school  idx_name_age_school  74       (NULL)   48832    100.00  Using where; Using index

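GROUP BY works much like ORDER BY, and the cases above apply to it in the same way. In essence it sorts first and then groups, so it follows the leftmost-prefix rule of the index column order. If a GROUP BY does not need sorted output, add ORDER BY NULL to suppress the sort. This applies to MySQL 5.7 and earlier; since MySQL 8.0, GROUP BY no longer sorts implicitly. Conditions that can be written in WHERE should not be put in HAVING.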
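The point that grouping relies on ordered input can be illustrated with Python's itertools.groupby. Like a streaming GROUP BY, it only merges adjacent equal keys. Entries read in index order group correctly in one pass. Unordered input has to be sorted first, which is the extra work a filesort or temporary table does. The sample entries and the helper name are hypothetical.

```python
from itertools import groupby

def count_per_name(entries):
    """Toy GROUP BY name (assumption): one pass that merges adjacent entries with the same name."""
    return [(name, sum(1 for _ in group)) for name, group in groupby(entries, key=lambda e: e[0])]

# Entries as read from an index on (name, age, school): already ordered by name.
in_index_order = [("a", 11, "s1"), ("a", 12, "s1"), ("b", 11, "s2"), ("b", 13, "s1")]
unordered = [("a", 11, "s1"), ("b", 11, "s2"), ("a", 12, "s1")]

print(count_per_name(in_index_order))     # one group per name
print(count_per_name(unordered))          # 'a' is split into two groups
print(count_per_name(sorted(unordered)))  # sorting first fixes it
```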

ORDER BY Summary

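MySQL can sort in two ways: filesort and index. Sorting via the index means the order is obtained simply by scanning the index itself. Index sorting is efficient; filesort is less efficient.

ORDER BY is satisfied by the index (no Using filesort) in two cases:

The ORDER BY clause itself satisfies the leftmost-prefix rule.
The WHERE clause and the ORDER BY clause together satisfy the leftmost-prefix rule.
Sort on index columns wherever possible, following the leftmost-prefix rule of the order in which the index was defined.
Use covering indexes whenever possible.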

Summary

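This article covered the principles for creating indexes, query optimization with composite indexes, LIKE 'XX%' queries, and the optimization of ORDER BY and GROUP BY queries. The most important part is understanding the leftmost-prefix rule: once you understand it thoroughly, all of these optimizations fall into place. The next article continues with index optimization.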