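1. Prefix indexes
When a column holds strings (varchar, text, and so on), you sometimes need to index very long values. That makes the index large, so queries spend a lot of disk I/O reading it and run slower. In that case you can index only a leading part (a prefix) of the string, which saves considerable index space and so makes the index more efficient.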
create index idx_xxxx on table_name(column(n))
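2. Prefix length
The prefix length can be chosen from the index's selectivity. Selectivity is the ratio of the number of distinct index values (the cardinality) to the total number of rows in the table. The higher the selectivity, the more efficient the lookup. A unique index has a selectivity of 1, which is the best possible value and also gives the best performance.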
select count(distinct email)/count(*) from tb_user;
select count(distinct substring(email,1,5))/count(*) from tb_user;
3. Query the table data
mysql> select * from tb_user;
+----+--------+-------------+-----------------------+----------------------+------+--------+--------+---------------------+
| id | name | phone | email | profession | age | gender | status | createtime |
+----+--------+-------------+-----------------------+----------------------+------+--------+--------+---------------------+
| 1 | 吕布 | 17799990000 | lvbu666@163.com | 软件工程 | 23 | 1 | 6 | 2001-02-02 00:00:00 |
| 2 | 曹操 | 17799990001 | caocao666@qq.com | 通讯工程 | 33 | 1 | 0 | 2001-03-05 00:00:00 |
| 3 | 赵云 | 17799990002 | 17799990@139.com | 英语 | 34 | 1 | 2 | 2002-03-02 00:00:00 |
| 4 | 孙悟空 | 17799990003 | 17799990@sina.com | 工程造价 | 54 | 1 | 0 | 2001-07-02 00:00:00 |
| 5 | 花木兰 | 17799990004 | 19980729@sina.com | 软件工程 | 23 | 2 | 1 | 2001-04-22 00:00:00 |
| 6 | 大乔 | 17799990005 | daqiao666@sina.com | 舞蹈 | 22 | 2 | 0 | 2001-02-07 00:00:00 |
| 7 | 露娜 | 17799990006 | luna_love@sina.com | 应用数学 | 24 | 2 | 0 | 2001-02-08 00:00:00 |
| 8 | 程咬金 | 17799990007 | chengyaojin@163.com | 化工 | 38 | 1 | 5 | 2001-05-23 00:00:00 |
| 9 | 项羽 | 17799990008 | xiaoyu666@qq.com | 金属材料 | 43 | 1 | 0 | 2001-09-18 00:00:00 |
| 10 | 白起 | 17799990009 | baiqi666@sina.com | 机械工程及其自动化 | 27 | 1 | 2 | 2001-08-16 00:00:00 |
| 11 | 韩信 | 17799990010 | hanxin520@163.com | 无机非金属材料工程 | 27 | 1 | 0 | 2001-06-12 00:00:00 |
| 12 | 荆轲 | 17799990011 | jingke123@163.com | 会计 | 29 | 1 | 0 | 2001-05-11 00:00:00 |
| 13 | 兰陵王 | 17799990012 | lanlinwang666@126.com | 工程造价 | 44 | 1 | 1 | 2001-04-09 00:00:00 |
| 14 | 狂铁 | 17799990013 | kuangtie@sina.com | 应用数学 | 43 | 1 | 2 | 2001-04-10 00:00:00 |
| 15 | 貂蝉 | 17799990014 | 84958948374@qq.com | 软件工程 | 40 | 2 | 3 | 2001-02-12 00:00:00 |
| 16 | 妲己 | 17799990015 | 2783238293@qq.com | 软件工程 | 31 | 2 | 0 | 2001-01-30 00:00:00 |
| 17 | 芈月 | 17799990016 | xiaomin2001@sina.com | 工业经济 | 35 | 2 | 0 | 2000-05-03 00:00:00 |
| 18 | 嬴政 | 17799990017 | 8839434342@qq.com | 化工 | 38 | 1 | 1 | 2001-08-08 00:00:00 |
| 19 | 狄仁杰 | 17799990018 | jujiamlm8166@163.com | 国际贸易 | 30 | 1 | 0 | 2007-03-12 00:00:00 |
| 20 | 安琪拉 | 17799990019 | jdodm1h@126.com | 城市规划 | 51 | 2 | 0 | 2001-08-15 00:00:00 |
| 21 | 典韦 | 17799990020 | ycaunanjian@163.com | 城市规划 | 52 | 1 | 2 | 2000-04-12 00:00:00 |
| 22 | 廉颇 | 17799990021 | lianpo321@126.com | 土木工程 | 19 | 1 | 3 | 2002-07-18 00:00:00 |
| 23 | 后羿 | 17799990022 | altycj2000@139.com | 城市园林 | 20 | 1 | 0 | 2002-03-10 00:00:00 |
| 24 | 姜子牙 | 17799990023 | 37483844@qq.com | 工程造价 | 29 | 1 | 4 | 2003-05-26 00:00:00 |
+----+--------+-------------+-----------------------+----------------------+------+--------+--------+---------------------+
24 rows in set (0.00 sec)
mysql>
4. Count the total number of rows in the table
mysql> select count(*) from tb_user;
+----------+
| count(*) |
+----------+
| 24 |
+----------+
1 row in set (0.00 sec)
mysql>
5. Count the users that have an email address (rows where email is not NULL)
mysql> select count(email) from tb_user;
+--------------+
| count(email) |
+--------------+
| 24 |
+--------------+
1 row in set (0.00 sec)
mysql>
6. Count the distinct email addresses in the tb_user table
mysql> select count(distinct email) from tb_user;
+-----------------------+
| count(distinct email) |
+-----------------------+
| 24 |
+-----------------------+
1 row in set (0.00 sec)
mysql>
7. Compute the ratio of distinct email addresses to the total number of rows
mysql> select count(distinct email)/count(*) from tb_user;
+--------------------------------+
| count(distinct email)/count(*) |
+--------------------------------+
| 1.0000 |
+--------------------------------+
1 row in set (0.00 sec)
mysql>
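A result of 1 means every user has a unique email address. A result of 0 would mean no user has an email address at all (unlikely in practice), because count(distinct email) ignores NULL values.
This ratio is the selectivity of the email column: the share of distinct email values among all rows.
It shows how unique the email addresses are, and therefore whether many accounts share the same address. A value close to 1 means almost every row has its own email address; a value well below 1 means many addresses are duplicated.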
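To see which addresses, if any, are actually duplicated, a grouping query along these lines can help. This is only a sketch against the same tb_user table; the alias cnt is an illustrative name, not something from the original session.

```sql
-- List every email value that appears more than once in tb_user.
-- An empty result corresponds to a selectivity of 1 for the full column.
-- Rows with a NULL email form a single group and would be listed too.
select email, count(*) as cnt
from tb_user
group by email
having count(*) > 1;
```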
8. Take the first 10 characters of each email address and compute the ratio of distinct prefixes to the total number of users
mysql> select count(distinct substring(email,1,10))/count(*) from tb_user;
+------------------------------------------------+
| count(distinct substring(email,1,10))/count(*) |
+------------------------------------------------+
| 1.0000 |
+------------------------------------------------+
1 row in set (0.00 sec)
mysql>
9. Ratio of distinct 9-character email prefixes to the total number of users
mysql> select count(distinct substring(email,1,9))/count(*) from tb_user;
+-----------------------------------------------+
| count(distinct substring(email,1,9))/count(*) |
+-----------------------------------------------+
| 0.9583 |
+-----------------------------------------------+
1 row in set (0.00 sec)
mysql>
10. The first 8 characters of the email address are as unique as the first 9
mysql> select count(distinct substring(email,1,8))/count(*) from tb_user;
+-----------------------------------------------+
| count(distinct substring(email,1,8))/count(*) |
+-----------------------------------------------+
| 0.9583 |
+-----------------------------------------------+
1 row in set (0.00 sec)
mysql>
11. Ratio of distinct 6-character prefixes to the total number of rows
mysql> select count(distinct substring(email,1,6))/count(*) from tb_user;
+-----------------------------------------------+
| count(distinct substring(email,1,6))/count(*) |
+-----------------------------------------------+
| 0.9583 |
+-----------------------------------------------+
1 row in set (0.00 sec)
mysql>
12. Ratio of distinct 5-character prefixes to the total number of rows
mysql> select count(distinct substring(email,1,5))/count(*) from tb_user;
+-----------------------------------------------+
| count(distinct substring(email,1,5))/count(*) |
+-----------------------------------------------+
| 0.9583 |
+-----------------------------------------------+
1 row in set (0.00 sec)
mysql>
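The value 0.9583 is 23/24, so with 24 rows exactly one 5-character prefix is shared by two rows. A query like the following sketch would show which one; the aliases email_prefix and cnt are illustrative. Judging from the data in section 3, it should report the prefix 17799, shared by the rows with id 3 and id 4.

```sql
-- Find 5-character email prefixes that occur in more than one row.
-- left(email, 5) is equivalent to substring(email, 1, 5).
select left(email, 5) as email_prefix, count(*) as cnt
from tb_user
group by email_prefix
having cnt > 1;
```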
13. As the prefix gets shorter, the uniqueness of the email prefix drops as well
mysql> select count(distinct substring(email,1,4))/count(*) from tb_user;
+-----------------------------------------------+
| count(distinct substring(email,1,4))/count(*) |
+-----------------------------------------------+
| 0.9167 |
+-----------------------------------------------+
1 row in set (0.00 sec)
mysql>
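The individual selectivity checks can also be run in a single pass, which makes it easier to compare candidate lengths side by side. The sketch below assumes the same tb_user table; the sel_* aliases are made up for this example, and left(email, n) is used as shorthand for substring(email, 1, n).

```sql
-- Selectivity of the full column and of several candidate prefix lengths.
select
    count(distinct email) / count(*)           as sel_full,
    count(distinct left(email, 10)) / count(*) as sel_10,
    count(distinct left(email, 9)) / count(*)  as sel_9,
    count(distinct left(email, 8)) / count(*)  as sel_8,
    count(distinct left(email, 6)) / count(*)  as sel_6,
    count(distinct left(email, 5)) / count(*)  as sel_5,
    count(distinct left(email, 4)) / count(*)  as sel_4
from tb_user;
```

On the data shown in this article it should reproduce the results already obtained one by one: 1.0000 for the full column and for 10 characters, 0.9583 for 9 down to 5 characters, and 0.9167 for 4. A common rule of thumb is to pick the shortest length whose selectivity is still close to that of the full column.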
14. View the indexes on the tb_user table in MySQL
mysql> show index from tb_user;
+---------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+---------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| tb_user | 0 | PRIMARY | 1 | id | A | 24 | NULL | NULL | | BTREE | | | YES | NULL |
| tb_user | 1 | idx_user_pro_age_sta | 1 | profession | A | 16 | NULL | NULL | YES | BTREE | | | YES | NULL |
| tb_user | 1 | idx_user_pro_age_sta | 2 | age | A | 22 | NULL | NULL | YES | BTREE | | | YES | NULL |
| tb_user | 1 | idx_user_pro_age_sta | 3 | status | A | 24 | NULL | NULL | YES | BTREE | | | YES | NULL |
+---------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
4 rows in set (0.01 sec)
mysql>
15. Create a prefix index on the email column of tb_user that covers only the first 5 characters of email
mysql> create index idx_email_5 on tb_user(email(5));
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show index from tb_user;
+---------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+---------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| tb_user | 0 | PRIMARY | 1 | id | A | 24 | NULL | NULL | | BTREE | | | YES | NULL |
| tb_user | 1 | idx_user_pro_age_sta | 1 | profession | A | 16 | NULL | NULL | YES | BTREE | | | YES | NULL |
| tb_user | 1 | idx_user_pro_age_sta | 2 | age | A | 22 | NULL | NULL | YES | BTREE | | | YES | NULL |
| tb_user | 1 | idx_user_pro_age_sta | 3 | status | A | 24 | NULL | NULL | YES | BTREE | | | YES | NULL |
| tb_user | 1 | idx_email_5 | 1 | email | A | 23 | 5 | NULL | YES | BTREE | | | YES | NULL |
+---------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
5 rows in set (0.01 sec)
mysql>
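In the output above, the Sub_part column is what marks idx_email_5 as a prefix index: it holds the number of indexed characters (5) and is NULL when the whole column is indexed. As a sketch, the same information can be read from information_schema, assuming tb_user lives in the currently selected database.

```sql
-- Show which indexes on tb_user are prefix indexes (sub_part is not NULL).
select index_name, seq_in_index, column_name, sub_part, cardinality
from information_schema.statistics
where table_schema = database()
  and table_name = 'tb_user'
order by index_name, seq_in_index;
```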
16. Query the user with email='daqiao666@sina.com'
mysql> select * from tb_user where email='daqiao666@sina.com';
+----+------+-------------+--------------------+------------+------+--------+--------+---------------------+
| id | name | phone | email | profession | age | gender | status | createtime |
+----+------+-------------+--------------------+------------+------+--------+--------+---------------------+
| 6 | 大乔 | 17799990005 | daqiao666@sina.com | 舞蹈 | 22 | 2 | 0 | 2001-02-07 00:00:00 |
+----+------+-------------+--------------------+------------+------+--------+--------+---------------------+
1 row in set (0.00 sec)
mysql>
17. Execution plan for email='daqiao666@sina.com'
mysql> explain select * from tb_user where email='daqiao666@sina.com';
+----+-------------+---------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | tb_user | NULL | ref | idx_email_5 | idx_email_5 | 23 | const | 1 | 100.00 | Using where |
+----+-------------+---------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql>
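The plan shows that the lookup uses idx_email_5 (type ref). The key_len of 23 is consistent with a 5-character prefix if email is a nullable varchar column in the utf8mb4 character set. The show index output marks email as nullable, but the column type and character set are not shown in this article, so treat them as assumptions. The arithmetic can be sketched as:

```sql
-- Assumed: email is varchar, nullable, character set utf8mb4 (up to 4 bytes per character).
--   5 characters * 4 bytes          = 20
--   + 2 bytes for the length prefix =  2
--   + 1 byte for the NULL flag      =  1
select 5 * 4 + 2 + 1 as expected_key_len;  -- 23
```

Because the index stores only the first 5 characters, MySQL still has to read each matching row and compare the full email value, which is why Extra shows Using where.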