MySQL JOIN and GROUP BY

MySQL JOIN

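In MySQL, a JOIN combines rows from two or more tables in a database.

For example, a school system might have a student information table and a student score table, related through a student ID column. To look up students' scores, we need to join the two tables so that student information and scores can be queried together.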

MySQL join types

MySQL supports the following types of joins:

Inner join (INNER JOIN)

Left join (LEFT JOIN)

Right join (RIGHT JOIN)

Cross join (CROSS JOIN)

MySQL does not currently support full outer joins (FULL OUTER JOIN).

Sample tables and data

All of the join examples use two tables, student and student_score.

First, create the student and student_score tables with the following SQL statements:

CREATE TABLE `student` (
  `student_id` int NOT NULL,
  `name` varchar(45) NOT NULL,
  PRIMARY KEY (`student_id`)
);

CREATE TABLE `student_score` (
  `student_id` int NOT NULL,
  `subject` varchar(45) NOT NULL,
  `score` int NOT NULL
);

Next, insert data into both tables:

INSERT INTO `student` (`student_id`, `name`)
VALUES (1,'Tim'),(2,'Jim'),(3,'Lucy');

INSERT INTO `student_score` (`student_id`, `subject`, `score`)
VALUES (1,'English',90),(1,'Math',80),(2,'English',85),
       (2,'Math',88),(5,'English',92);

Finally, query the tables to verify that the data was inserted successfully:

SELECT * FROM student;

+------------+------+
| student_id | name |
+------------+------+
|          1 | Tim  |
|          2 | Jim  |
|          3 | Lucy |
+------------+------+
3 rows in set (0.01 sec)

SELECT * FROM student_score;

+------------+---------+-------+
| student_id | subject | score |
+------------+---------+-------+
|          1 | English |    90 |
|          1 | Math    |    80 |
|          2 | English |    85 |
|          2 | Math    |    88 |
|          5 | English |    92 |
+------------+---------+-------+
5 rows in set (0.00 sec)

Note: the last row of student_score has student_id 5, but the student table has no row with student_id 5.

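Cross join

A cross join returns the Cartesian product of two sets, that is, every possible combination of the rows of the two tables. It is equivalent to an inner join with no join condition, or with a join condition that is always true.

If one table has m rows and the other has n rows, their cross join returns m * n rows.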
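To make the m * n rule concrete, here is a minimal Python sketch that models the cross join of the sample tables with itertools.product. It illustrates the semantics only and is not how MySQL executes joins. Rows are plain tuples copied from the data inserted above, the helper name cross_join is hypothetical, and the row order is not guaranteed to match MySQL's output.

```python
from itertools import product

# Rows copied from the student and student_score sample tables above
student = [(1, "Tim"), (2, "Jim"), (3, "Lucy")]
student_score = [
    (1, "English", 90), (1, "Math", 80), (2, "English", 85),
    (2, "Math", 88), (5, "English", 92),
]


def cross_join(left, right):
    """Hypothetical helper: pair every left row with every right row (Cartesian product)."""
    return [left_row + right_row for left_row, right_row in product(left, right)]


rows = cross_join(student, student_score)
print(len(rows))  # 15, i.e. 3 * 5
```

No condition is evaluated anywhere, which is why the result size depends only on the two table sizes.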

An explicit cross join of the student and student_score tables:

SELECT
    student.*,
    student_score.*
FROM
    student CROSS JOIN student_score;

An implicit cross join of the student and student_score tables:

SELECT
  student.*,
  student_score.*
FROM
  student, student_score;

+------------+------+------------+---------+-------+
| student_id | name | student_id | subject | score |
+------------+------+------------+---------+-------+
|          3 | Lucy |          1 | English |    90 |
|          2 | Jim  |          1 | English |    90 |
|          1 | Tim  |          1 | English |    90 |
|          3 | Lucy |          1 | Math    |    80 |
|          2 | Jim  |          1 | Math    |    80 |
|          1 | Tim  |          1 | Math    |    80 |
|          3 | Lucy |          2 | English |    85 |
|          2 | Jim  |          2 | English |    85 |
|          1 | Tim  |          2 | English |    85 |
|          3 | Lucy |          2 | Math    |    88 |
|          2 | Jim  |          2 | Math    |    88 |
|          1 | Tim  |          2 | Math    |    88 |
|          3 | Lucy |          5 | English |    92 |
|          2 | Jim  |          5 | English |    92 |
|          1 | Tim  |          5 | English |    92 |
+------------+------+------------+---------+-------+
15 rows in set (0.00 sec)

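Inner join

An inner join combines data from two tables based on a join condition. It is equivalent to a cross join with a filter condition added.

An inner join compares each row of the first table with each row of the second table. Whenever a pair of rows satisfies the given join condition, the two rows are combined into one row of the result set.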
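The row-by-row comparison described above can be modeled as a nested loop in Python. This is a sketch that assumes rows are tuples with student_id in the first position. The names inner_join and same_student are hypothetical. MySQL's actual execution plan may use indexes or other join algorithms, but the rows it returns are the same.

```python
student = [(1, "Tim"), (2, "Jim"), (3, "Lucy")]
student_score = [
    (1, "English", 90), (1, "Math", 80), (2, "English", 85),
    (2, "Math", 88), (5, "English", 92),
]


def same_student(s, sc):
    """Join condition: student.student_id = student_score.student_id."""
    return s[0] == sc[0]


def inner_join(left, right, on):
    """Compare every left row with every right row; keep the pairs that satisfy `on`."""
    return [left_row + right_row
            for left_row in left
            for right_row in right
            if on(left_row, right_row)]


rows = inner_join(student, student_score, same_student)
for row in rows:
    print(row)
```

Written this way, an inner join is visibly a cross join followed by a filter. The student with id 3 and the score with id 5 never pass the condition.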

The following SQL statement inner joins the student table with the student_score table:

SELECT
  student.*,
  student_score.*
FROM
  student
  INNER JOIN student_score
  ON student.student_id = student_score.student_id;

This is equivalent to:

SELECT
  student.*,
  student_score.*
FROM
  student, student_score
WHERE
  student.student_id = student_score.student_id;

+------------+------+------------+---------+-------+
| student_id | name | student_id | subject | score |
+------------+------+------------+---------+-------+
|          1 | Tim  |          1 | English |    90 |
|          1 | Tim  |          1 | Math    |    80 |
|          2 | Jim  |          2 | English |    85 |
|          2 | Jim  |          2 | Math    |    88 |
+------------+------+------------+---------+-------+
4 rows in set (0.00 sec)

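Note that the student row with student_id 3 and the student_score row with student_id 5 do not appear in the output, because they do not satisfy the join condition student.student_id = student_score.student_id.

Because both tables are matched on equality of a column with the same name, you can use the USING clause instead, as shown in the following query: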

SELECT
  student.*,
  student_score.*
FROM
  student
  INNER JOIN student_score USING(student_id);

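Left join

Left join is short for left outer join. A left join requires a join condition.

When two tables are left joined, the first table is called the left table and the second the right table. For example, in A LEFT JOIN B, A is the left table and B is the right table.

A left join starts from the rows of the left table and matches each one against the rows of the right table using the join condition. If a match is found, the left and right rows are combined into a new row. If no match is found, the left row is combined with NULL values for the right table's columns.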
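Here is a minimal Python sketch of that rule, using None to stand in for SQL NULL. The helper left_join and its right_width parameter (the number of right-table columns to pad) are hypothetical. The sketch models the semantics only, not MySQL's execution.

```python
student = [(1, "Tim"), (2, "Jim"), (3, "Lucy")]
student_score = [
    (1, "English", 90), (1, "Math", 80), (2, "English", 85),
    (2, "Math", 88), (5, "English", 92),
]


def same_student(s, sc):
    """Join condition: student.student_id = student_score.student_id."""
    return s[0] == sc[0]


def left_join(left, right, on, right_width):
    """Keep every left row; pad with None (NULL) when no right row matches."""
    result = []
    for left_row in left:
        matches = [left_row + right_row for right_row in right if on(left_row, right_row)]
        # Each match yields its own output row; no match yields one NULL-padded row
        result.extend(matches or [left_row + (None,) * right_width])
    return result


rows = left_join(student, student_score, same_student, right_width=3)
for row in rows:
    print(row)
# Last row printed: (3, 'Lucy', None, None, None)
```

The `matches or [...]` line captures both points made about the result set below: a left row with several matches appears several times, and a left row with none appears once, padded with NULLs.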

The following SQL statement left joins the student table with the student_score table:

SELECT
    student.*,
    student_score.*
FROM
    student
    LEFT JOIN student_score
    ON student.student_id = student_score.student_id;

+------------+------+------------+---------+-------+
| student_id | name | student_id | subject | score |
+------------+------+------------+---------+-------+
|          1 | Tim  |          1 | Math    |    80 |
|          1 | Tim  |          1 | English |    90 |
|          2 | Jim  |          2 | Math    |    88 |
|          2 | Jim  |          2 | English |    85 |
|          3 | Lucy |       NULL | NULL    |  NULL |
+------------+------+------------+---------+-------+
5 rows in set (0.00 sec)

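Note:

The result set contains every row of the student table.

student_score contains no row with student_id = 3, so the columns from student_score in the last row of the result set are NULL.

student_score contains multiple rows with student_id 1 and with student_id 2, so each of those student rows appears multiple times in the result.

Because both tables are matched on equality of a column with the same name, you can use the USING clause instead, as shown in the following query: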

SELECT
  student.*,
  student_score.*
FROM
  student
  LEFT JOIN student_score USING(student_id);

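Right join

Right join is short for right outer join. A right join requires a join condition.

A right join works the opposite way from a left join. It starts from the rows of the right table and matches rows of the left table using the condition. If no matching left row is found, the left table's columns are NULL.

The following SQL statement right joins the student table with the student_score table: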

SELECT
    student.*,
    student_score.*
FROM
    student
    RIGHT JOIN student_score
    ON student.student_id = student_score.student_id;

+------------+------+------------+---------+-------+
| student_id | name | student_id | subject | score |
+------------+------+------------+---------+-------+
|          1 | Tim  |          1 | English |    90 |
|          1 | Tim  |          1 | Math    |    80 |
|          2 | Jim  |          2 | English |    85 |
|          2 | Jim  |          2 | Math    |    88 |
|       NULL | NULL |          5 | English |    92 |
+------------+------+------------+---------+-------+
5 rows in set (0.00 sec)

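The result set shows that no left-table row matches the right-table row with student_id = 5, so the left table's columns in the last row are NULL.

A right join is simply a left join with the two tables swapped: A RIGHT JOIN B returns the same rows as B LEFT JOIN A. For this reason, right joins are rarely used.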
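The equivalence can be checked with a small Python model. This is a sketch under the same assumptions as before: tuples for rows, None for NULL, and hypothetical helper names left_join, right_join and same_student. The swapped left join produces student_score's columns first, so each row is reordered before comparing, mirroring the explicit `student.*, student_score.*` select list.

```python
student = [(1, "Tim"), (2, "Jim"), (3, "Lucy")]
student_score = [
    (1, "English", 90), (1, "Math", 80), (2, "English", 85),
    (2, "Math", 88), (5, "English", 92),
]


def same_student(s, sc):
    """Join condition: student.student_id = student_score.student_id."""
    return s[0] == sc[0]


def left_join(left, right, on, right_width):
    """Keep every left row; pad the right columns with None when nothing matches."""
    result = []
    for left_row in left:
        matches = [left_row + right_row for right_row in right if on(left_row, right_row)]
        result.extend(matches or [left_row + (None,) * right_width])
    return result


def right_join(left, right, on, left_width):
    """Keep every right row; pad the left columns with None when nothing matches."""
    result = []
    for right_row in right:
        matches = [left_row + right_row for left_row in left if on(left_row, right_row)]
        result.extend(matches or [(None,) * left_width + right_row])
    return result


# student RIGHT JOIN student_score
right_rows = right_join(student, student_score, same_student, left_width=2)

# student_score LEFT JOIN student (condition arguments swapped to match the new order)
swapped = left_join(student_score, student, lambda sc, s: same_student(s, sc), right_width=2)
# Move the two student columns back in front of the three student_score columns
reordered = [row[3:] + row[:3] for row in swapped]

print(right_rows == reordered)  # True
print(right_rows[-1])           # (None, None, 5, 'English', 92)
```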

The right join in the example above can be rewritten as the following left join:

SELECT
  student.*,
  student_score.*
FROM
  student_score
  LEFT JOIN student
  ON student.student_id = student_score.student_id;

+------------+------+------------+---------+-------+
| student_id | name | student_id | subject | score |
+------------+------+------------+---------+-------+
|          1 | Tim  |          1 | English |    90 |
|          1 | Tim  |          1 | Math    |    80 |
|          2 | Jim  |          2 | English |    85 |
|          2 | Jim  |          2 | Math    |    88 |
|       NULL | NULL |          5 | English |    92 |
+------------+------+------------+---------+-------+
5 rows in set (0.00 sec)

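The output shows that this left join returns the same result set as the right join in the previous example.

Conclusion

A join combines data from two tables.

A cross join returns every possible combination of the rows of two tables.

An inner join combines data from two tables based on a join condition.

A left join combines data from two tables, keeping every row of the left table.

A right join combines data from two tables, keeping every row of the right table.

Left and right joins are interchangeable once the left and right tables are swapped.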

MySQL GROUP BY

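In MySQL, the GROUP BY clause groups the rows of a result set by the specified columns or expressions.

Sometimes we need to summarize a result set along some dimension. This is common when computing statistics, as in the following scenarios:

Average score per class.

Total score per student.

Sales per year or per month.

Number of users per country or region.

This is exactly where the GROUP BY clause comes in.

GROUP BY syntax

The GROUP BY clause is an optional clause of the SELECT statement. Its syntax is as follows: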

SELECT column1[, column2, ...], aggregate_function(ci)
FROM table
[WHERE clause]
GROUP BY column1[, column2, ...]
[HAVING clause];

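column1[, column2, ...] are the grouping columns. At least one is required, and more may be given.

aggregate_function(ci) is an aggregate function applied to column ci. It is optional, but in practice it is almost always used.

Apart from aggregate function calls, the columns in the SELECT list should be grouping columns. With the ONLY_FULL_GROUP_BY SQL mode, which is enabled by default since MySQL 5.7.5, MySQL rejects any other non-aggregated column unless it is functionally dependent on the grouping columns.

The WHERE clause is optional and filters rows before they are grouped.

The HAVING clause is optional and filters the groups.

Commonly used aggregate functions include:

SUM(): sum

AVG(): average

MAX(): maximum

MIN(): minimum

COUNT(): count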
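The following Python sketch shows how grouping and the five aggregate functions fit together: rows are bucketed by key, then each bucket is collapsed to one value per aggregate. It reuses the student_score sample rows from the join section. The grouping query it models, roughly `SELECT student_id, SUM(score), AVG(score), MAX(score), MIN(score), COUNT(*) FROM student_score GROUP BY student_id`, is an illustrative assumption and does not appear in the examples below. Note that MySQL's AVG() returns a DECIMAL, while Python's division returns a float here.

```python
from collections import defaultdict

student_score = [
    (1, "English", 90), (1, "Math", 80), (2, "English", 85),
    (2, "Math", 88), (5, "English", 92),
]

# GROUP BY student_id: collect the score of every row into its group
groups = defaultdict(list)
for student_id, subject, score in student_score:
    groups[student_id].append(score)

# One output row per group, with each aggregate computed over that group only
summary = {
    student_id: {
        "SUM": sum(scores),
        "AVG": sum(scores) / len(scores),
        "MAX": max(scores),
        "MIN": min(scores),
        "COUNT": len(scores),
    }
    for student_id, scores in groups.items()
}

for student_id, aggregates in summary.items():
    print(student_id, aggregates)
```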

GROUP BY examples

The following examples use the actor and payment tables from the Sakila sample database.

A simple GROUP BY example

We use the GROUP BY clause to list the last names in the actor table.

SELECT last_name
FROM actor
GROUP BY last_name;

+--------------+
| last_name    |
+--------------+
| AKROYD       |
| ALLEN        |
| ASTAIRE      |
| BACALL       |
| BAILEY       |
...
| ZELLWEGER    |
+--------------+
121 rows in set (0.00 sec)

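In this example, the GROUP BY clause groups the rows by the last_name column.

This query returns the same rows as the following query using DISTINCT. Without ORDER BY, neither query guarantees a particular row order: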

SELECT DISTINCT last_name FROM actor;
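Grouping without an aggregate keeps one row per distinct key, which is why the two queries agree. The Python sketch below illustrates this on a short hypothetical list of last names, not the real Sakila actor data. The results are compared as sets because row order is not guaranteed.

```python
# Hypothetical last_name values (NOT the real Sakila actor data)
last_names = ["AKROYD", "ALLEN", "AKROYD", "KILMER", "ALLEN", "KILMER", "KILMER", "WRAY"]

# GROUP BY last_name: one bucket per distinct key
groups = {}
for name in last_names:
    groups.setdefault(name, []).append(name)

group_by_result = set(groups)      # models SELECT last_name FROM actor GROUP BY last_name
distinct_result = set(last_names)  # models SELECT DISTINCT last_name FROM actor

print(group_by_result == distinct_result)  # True
print(sorted(group_by_result))             # ['AKROYD', 'ALLEN', 'KILMER', 'WRAY']
```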

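GROUP BY with an aggregate function

We use the GROUP BY clause with the aggregate function COUNT() to list the last names in the actor table together with the number of times each last name occurs.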

SELECT last_name, COUNT(*)
FROM actor
GROUP BY last_name
ORDER BY COUNT(*) DESC;

+--------------+----------+
| last_name    | COUNT(*) |
+--------------+----------+
| KILMER       |        5 |
| NOLTE        |        4 |
| TEMPLE       |        4 |
| AKROYD       |        3 |
| ALLEN        |        3 |
| BERRY        |        3 |
...
| WRAY         |        1 |
+--------------+----------+
121 rows in set (0.00 sec)

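The query is processed in the following order:

First, the GROUP BY clause groups the rows by the last_name column.

Then, the aggregate function COUNT(*) counts the rows for each last name.

Finally, the ORDER BY clause sorts the groups by COUNT(*) in descending order.

As a result, the most common last names appear first.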
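The three steps map directly onto a few lines of Python. The sketch below uses collections.Counter on hypothetical data rather than the real Sakila rows. MySQL does not define an order among tied counts, so the sketch adds a tie-break on the name purely to make its own output deterministic.

```python
from collections import Counter

# Hypothetical last_name values (NOT the real Sakila actor data)
last_names = ["AKROYD", "ALLEN", "AKROYD", "KILMER", "ALLEN", "KILMER", "KILMER", "WRAY"]

# Steps 1 and 2: group by last_name and count the rows in each group (COUNT(*))
counts = Counter(last_names)

# Step 3: ORDER BY COUNT(*) DESC; ties are broken by name only to keep this output stable
ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for name, n in ordered:
    print(name, n)
# KILMER 3
# AKROYD 2
# ALLEN 2
# WRAY 1
```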

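GROUP BY, LIMIT, and aggregate function example

The following example uses the GROUP BY clause, the LIMIT clause, and the aggregate function SUM() to return the 10 customers with the highest total payment amounts.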

SELECT customer_id, SUM(amount) total
FROM payment
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;

+-------------+--------+
| customer_id | total  |
+-------------+--------+
|         526 | 221.55 |
|         148 | 216.54 |
|         144 | 195.58 |
|         137 | 194.61 |
|         178 | 194.61 |
|         459 | 186.62 |
|         469 | 177.60 |
|         468 | 175.61 |
|         236 | 175.58 |
|         181 | 174.66 |
+-------------+--------+
10 rows in set (0.02 sec)

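The query is processed in the following order:

First, the GROUP BY clause groups the rows by the customer_id column.

Then, the aggregate function SUM(amount) sums the amount column for each customer, using total as the alias.

Next, the ORDER BY clause sorts the rows by total in descending order.

Finally, the LIMIT 10 clause returns the first 10 rows.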
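The same pipeline of group, sum, sort, and limit is sketched in Python below. The payment rows are hypothetical, not real Sakila data, and the helper top_customers is an illustrative name. Amounts use Decimal to mirror a DECIMAL money column and to avoid floating-point rounding in the totals.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical (customer_id, amount) rows (NOT real Sakila payment data)
payment = [(customer_id, Decimal(amount)) for customer_id, amount in [
    (1, "2.99"), (2, "4.99"), (1, "0.99"), (3, "9.99"),
    (2, "5.99"), (4, "1.99"), (3, "2.99"),
]]


def top_customers(rows, limit):
    """Model of: SELECT customer_id, SUM(amount) total ... ORDER BY total DESC LIMIT n."""
    totals = defaultdict(Decimal)        # Decimal() starts each total at 0
    for customer_id, amount in rows:     # GROUP BY customer_id
        totals[customer_id] += amount    # SUM(amount) AS total
    # ORDER BY total DESC
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]                # LIMIT n


for customer_id, total in top_customers(payment, 2):
    print(customer_id, total)
# 3 12.98
# 2 10.98
```

LIMIT is applied last, after sorting. Cutting the list before sorting would return arbitrary customers rather than the top ones.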

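GROUP BY and HAVING example

The following example uses the GROUP BY clause, the HAVING clause, and the aggregate function SUM() to return the customers whose total payment amount is greater than 180.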

SELECT customer_id, SUM(amount) total
FROM payment
GROUP BY customer_id
HAVING total > 180
ORDER BY total DESC;

+-------------+--------+
| customer_id | total  |
+-------------+--------+
|         526 | 221.55 |
|         148 | 216.54 |
|         144 | 195.58 |
|         137 | 194.61 |
|         178 | 194.61 |
|         459 | 186.62 |
+-------------+--------+
6 rows in set (0.02 sec)

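The query is processed in the following order:

First, the GROUP BY clause groups the rows by the customer_id column.

Then, the aggregate function SUM(amount) sums the amount column for each customer, using total as the alias.

Then, HAVING keeps only the groups whose total is greater than 180.

Finally, the ORDER BY clause sorts the rows by total in descending order.

The HAVING clause filters the groups produced by GROUP BY. Its condition is a logical expression that may reference only the grouping columns and aggregate functions. MySQL also accepts aliases of SELECT-list expressions, such as total in this example.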
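The difference between filtering groups with HAVING and filtering rows with WHERE is easiest to see side by side. The Python sketch below runs both on hypothetical payment rows (not real Sakila data). Because the sample amounts are small, it uses a threshold of 10 instead of 180. The helper grouped_totals is an illustrative name.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical (customer_id, amount) rows (NOT real Sakila payment data)
payment = [(customer_id, Decimal(amount)) for customer_id, amount in [
    (1, "2.99"), (2, "4.99"), (1, "0.99"), (3, "9.99"),
    (2, "5.99"), (4, "1.99"), (3, "2.99"),
]]


def grouped_totals(rows):
    """GROUP BY customer_id with SUM(amount) AS total."""
    totals = defaultdict(Decimal)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return dict(totals)


# HAVING total > 10: the condition is tested against each group's aggregate, after grouping
having = {cid: total for cid, total in grouped_totals(payment).items() if total > 10}

# WHERE amount > 5: the condition is tested against individual rows, before grouping
where = grouped_totals([(cid, amount) for cid, amount in payment if amount > 5])

print(having)  # {2: Decimal('10.98'), 3: Decimal('12.98')}
print(where)   # {3: Decimal('9.99'), 2: Decimal('5.99')}
```

Customer 2 passes the HAVING filter only because its two small payments add up to more than 10. The WHERE version discards the 4.99 payment before any sum is taken.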

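Conclusion

The GROUP BY clause groups the rows of a result set by the specified columns or expressions.

GROUP BY takes at least one grouping column or expression, and may take several.

The HAVING clause is optional and filters the groups.

The GROUP BY clause is commonly used for statistical summaries, usually together with aggregate functions.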