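Contents
[WHERE vs. HAVING](#WHERE vs. HAVING)
[DISTINCT vs. GROUP BY](#DISTINCT vs. GROUP BY)
[LIMIT and OFFSET](#LIMIT and OFFSET)
[Checking for NULL](#Checking for NULL)
[COUNT(*), COUNT(1), and COUNT(column)](#COUNT(*), COUNT(1), and COUNT(column))
[SUM/AVG and NULL](#SUM/AVG and NULL)
[Groups whose average exceeds 100](#Groups whose average exceeds 100)
[Multi-table JOIN](#Multi-table JOIN)
[The four JOIN types](#The four JOIN types)
[Does LEFT JOIN always keep every left-table row?](#Does LEFT JOIN always keep every left-table row?)
[Why JOINs inflate row counts](#Why JOINs inflate row counts)
[ON vs. WHERE in a LEFT JOIN](#ON vs. WHERE in a LEFT JOIN)
[Subqueries and WITH](#Subqueries and WITH)
[Correlated vs. non-correlated subqueries](#Correlated vs. non-correlated subqueries)
[EXISTS vs. IN](#EXISTS vs. IN)
[CASE WHEN](#CASE WHEN)
[Where CASE can appear](#Where CASE can appear)
[MySQL IF vs. CASE WHEN](#MySQL IF vs. CASE WHEN)
[Can window functions appear in WHERE?](#Can window functions appear in WHERE?)
[Top N per group](#Top N per group)
[Window SUM vs. GROUP BY](#Window SUM vs. GROUP BY)
[UNION vs. UNION ALL](#UNION vs. UNION ALL)
[Where NULLs sort](#Where NULLs sort)
[SQL execution order](#SQL execution order)
[Why aliases cannot be used in WHERE](#Why aliases cannot be used in WHERE)
[SQL optimization](#SQL optimization)
[Why SELECT * is discouraged](#Why SELECT * is discouraged)
[Optimizing JOINs on large tables](#Optimizing JOINs on large tables)
[Grouped statistics with filtering](#Grouped statistics with filtering)
[Top 3 per group](#Top 3 per group)
[N consecutive login days (general template)](#N consecutive login days (general template))
[GROUP BY aggregation](#GROUP BY aggregation)
[Multi-table JOIN](#Multi-table JOIN)
[Subqueries & HAVING](#Subqueries & HAVING)
[CASE WHEN](#CASE WHEN)
[Further study / graduation / registration queries](#Further study / graduation / registration queries)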
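Basic syntax
WHERE vs. HAVING
- WHERE: filters the raw rows, before any grouping takes place
- HAVING: filters the grouped result; it follows GROUP BY and may use aggregate functions
- WHERE cannot contain aggregate functions; HAVING can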
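The difference can be tried out with a small sketch. It uses Python's built-in sqlite3 module instead of MySQL, and the `orders` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (dept TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('a', 50), ('a', 300), ('b', 20), ('b', 30), ('c', 500);
""")

# WHERE removes individual rows before grouping.
where_rows = conn.execute(
    "SELECT dept, SUM(amount) FROM orders "
    "WHERE amount > 25 GROUP BY dept ORDER BY dept"
).fetchall()

# HAVING removes whole groups after aggregation.
having_rows = conn.execute(
    "SELECT dept, SUM(amount) FROM orders "
    "GROUP BY dept HAVING SUM(amount) > 100 ORDER BY dept"
).fetchall()

print(where_rows)
print(having_rows)
```

With WHERE, group `b` survives with a smaller sum because only one of its rows was dropped; with HAVING, group `b` disappears entirely.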
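DISTINCT vs. GROUP BY
- DISTINCT: removes duplicate rows from the result set and keeps only unique values; it does not aggregate
- GROUP BY: groups rows by one or more columns, normally together with aggregate functions
Conclusion: use DISTINCT when you only need deduplication, and GROUP BY when you need statistics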
sql
SELECT DISTINCT col FROM t;
sql
SELECT col, COUNT(*) FROM t GROUP BY col;
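LIMIT and OFFSET
- LIMIT n: return at most n rows
- OFFSET m: skip the first m rows
- In MySQL, LIMIT 10 OFFSET 5 is equivalent to LIMIT 5,10 (skip 5 rows, then return 10)
Use case: pagination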
sql
SELECT * FROM t LIMIT 10 OFFSET 5;
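Checking for NULL
- Wrong: col = NULL never evaluates to true, because any comparison with NULL yields NULL (NULL is not equal to anything, including itself)
- Right: test for missing values with IS NULL / IS NOT NULL
sql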
SELECT * FROM t WHERE col IS NULL;
SELECT * FROM t WHERE col IS NOT NULL;
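A quick way to see this behaviour is the sketch below. It runs on Python's built-in sqlite3 module rather than MySQL (both follow the same NULL comparison rule), and the three sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, col INTEGER);
    INSERT INTO t VALUES (1, 10), (2, NULL), (3, NULL);
""")

def count_where(condition):
    """Return how many rows of t satisfy the given WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {condition}").fetchone()[0]

eq_null = count_where("col = NULL")          # comparison yields NULL, never true
is_null = count_where("col IS NULL")         # the two rows with a missing value
is_not_null = count_where("col IS NOT NULL") # the one row with a value

print(eq_null, is_null, is_not_null)
```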
Can an alias be used in WHERE?
No. In the logical execution order, WHERE runs before SELECT, so the alias does not exist yet
Wrong
sql
SELECT col AS a FROM t WHERE a > 10;
Right
sql
SELECT col AS a FROM t WHERE col > 10;
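Aggregate functions
COUNT(*), COUNT(1), and COUNT(column)
- COUNT(*): counts all rows, including rows that contain NULL
- COUNT(1): behaves the same as COUNT(*), with essentially the same performance
- COUNT(column): counts only the rows where that column is not NULL
Interview takeaway: use COUNT(*) for the total row count and COUNT(column) for the non-NULL count
SUM/AVG and NULL
- SUM, AVG, MAX, and MIN all skip NULL values
- For example, AVG over the values 10, NULL, 20 is (10+20)/2 = 15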
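The counting and averaging rules above can be checked on the same three values (10, NULL, 20). This is a minimal sketch on Python's built-in sqlite3 module, not MySQL; the table name `t` and column `col` are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col INTEGER);
    INSERT INTO t VALUES (10), (NULL), (20);
""")

# COUNT(*) sees every row; COUNT(col), SUM and AVG skip the NULL.
count_all, count_col, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(col), SUM(col), AVG(col) FROM t"
).fetchone()

print(count_all, count_col, total, average)
```

Note that the average divides by 2, not 3: the NULL row is excluded from both the numerator and the denominator.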
Can aggregate functions be nested?
Not directly: SUM(AVG(col)) raises an error. Use a subquery or a window function instead
sql
SELECT SUM(avg_col)
FROM (SELECT AVG(col) AS avg_col FROM t GROUP BY id) tmp;
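Count, sum, and average per group
Note: every non-aggregated column in SELECT must appear in GROUP BY (MySQL enforces this under its default ONLY_FULL_GROUP_BY mode)
sql
SELECT
    col1,                 -- grouping column
    COUNT(*) AS cnt,      -- rows per group
    SUM(col2) AS sum_col, -- sum
    AVG(col2) AS avg_col  -- average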
FROM t
GROUP BY col1;
Groups whose average exceeds 100
HAVING filters the grouped result
sql
SELECT col1, AVG(col2) AS avg_col
FROM t
GROUP BY col1
HAVING AVG(col2) > 100;
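Multi-table JOIN
The four JOIN types
- INNER JOIN: keeps only rows that match on both sides
- LEFT JOIN: keeps every left-table row; unmatched right-table columns are filled with NULL
- RIGHT JOIN: keeps every right-table row; unmatched left-table columns are filled with NULL
- FULL JOIN: keeps rows from both tables, filling the unmatched side with NULL (MySQL does not support it; emulate it with a UNION of a LEFT JOIN and a RIGHT JOIN)
Does LEFT JOIN always keep every left-table row?
Not necessarily. A WHERE filter on a right-table column (IS NOT NULL, or = some value) discards the NULL-filled rows, which effectively turns the LEFT JOIN into an INNER JOIN
Why JOINs inflate row counts
One-to-many relationships: one left-table row matches several right-table rows
Example: one user has 5 orders, so the JOIN yields 5 rows for that user
Fix: aggregate the right table first and then JOIN. DISTINCT can also remove duplicates, but only when the repeated rows are identical in every selected column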
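The inflation and the "aggregate first, then JOIN" fix can be sketched as follows. The example runs on Python's built-in sqlite3 module rather than MySQL, and the `users` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (user_id INTEGER, amount INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (1, 10), (1, 20), (1, 30), (2, 5);
""")

# Plain join: 2 users become 4 rows because user 1 has three orders.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM users u JOIN orders o ON u.id = o.user_id"
).fetchone()[0]

# Aggregate the "many" side first, then join: one row per user again.
per_user = conn.execute("""
    SELECT u.id, u.name, s.order_cnt, s.total
    FROM users u
    JOIN (
        SELECT user_id, COUNT(*) AS order_cnt, SUM(amount) AS total
        FROM orders
        GROUP BY user_id
    ) AS s ON u.id = s.user_id
    ORDER BY u.id
""").fetchall()

print(joined_rows)
print(per_user)
```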
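ON vs. WHERE in a LEFT JOIN
- ON: the match condition applied during the JOIN; it never removes left-table rows
- WHERE: filters the joined result as a whole, so it can remove left-table rows
Example: keep every left-table row and match only right-table rows with t2.status = 1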
sql
SELECT *
FROM t1
LEFT JOIN t2 ON t1.id = t2.id AND t2.status = 1;
Example: equivalent to an INNER JOIN; left-table rows without a matching t2.status = 1 row are dropped
sql
SELECT *
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
WHERE t2.status = 1;
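To make the row counts concrete, here is a small sketch that runs both queries on invented data. It uses Python's built-in sqlite3 module rather than MySQL; the contents of `t1` and `t2` are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, status INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 1), (2, 0);
""")

# Condition in ON: every t1 row is kept, unmatched rows get NULL.
on_rows = conn.execute("""
    SELECT t1.id, t2.status
    FROM t1
    LEFT JOIN t2 ON t1.id = t2.id AND t2.status = 1
    ORDER BY t1.id
""").fetchall()

# Condition in WHERE: NULL-filled rows fail the filter and vanish.
where_rows = conn.execute("""
    SELECT t1.id, t2.status
    FROM t1
    LEFT JOIN t2 ON t1.id = t2.id
    WHERE t2.status = 1
    ORDER BY t1.id
""").fetchall()

print(on_rows)
print(where_rows)
```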
Subqueries and WITH
Correlated vs. non-correlated subqueries
Non-correlated subquery: can run on its own and is evaluated once
sql
SELECT * FROM t WHERE col IN (SELECT col FROM tmp);
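Correlated subquery: references the outer query, so it is conceptually evaluated once per outer row (the optimizer may rewrite it)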
sql
SELECT * FROM t1
WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.id = t2.id);
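EXISTS vs. IN
- IN: a good fit when the subquery result is small
- EXISTS: a good fit when the subquery table is large; it checks row by row and can stop at the first match
- Rule of thumb: prefer EXISTS when the inner table is large, and confirm with EXPLAIN, since modern optimizers often execute both forms the same way
Finding the full row that holds the maximum value
Method 1: subquery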
sql
SELECT * FROM t
WHERE col = (SELECT MAX(col) FROM t);
Method 2: window function (RANK returns every row tied for the maximum)
sql
SELECT * FROM (
SELECT *, RANK() OVER(ORDER BY col DESC) AS rk
FROM t
) tmp WHERE rk = 1;
Rows above their own group's average
sql
SELECT t.*
FROM t
JOIN (
SELECT group_id, AVG(col) AS group_avg
FROM t GROUP BY group_id
) tmp ON t.group_id = tmp.group_id
WHERE t.col > tmp.group_avg;
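WITH ... AS (temporary named results)
A common table expression (CTE) names an intermediate result, improves readability, and can be referenced several times in the same statement (MySQL 8.0+)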
sql
WITH tmp AS (
SELECT group_id, AVG(col) AS avg_col FROM t GROUP BY group_id
)
SELECT * FROM tmp WHERE avg_col > 100;
CASE WHEN
Ordered matching
Branches are tested in order, and evaluation stops at the first one that matches
sql
SELECT
    score,
    CASE
        WHEN score >= 90 THEN 'Excellent'
        WHEN score >= 70 THEN 'Good'
        WHEN score >= 60 THEN 'Pass'
        ELSE 'Fail'
    END AS level
FROM t;
Pivoting rows to columns with counts
When the condition fails, CASE returns NULL, and COUNT skips NULL
sql
SELECT
COUNT(CASE WHEN gender = 1 THEN 1 END) AS male_cnt,
COUNT(CASE WHEN gender = 0 THEN 1 END) AS female_cnt
FROM t;
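Where CASE can appear
CASE is allowed in SELECT, WHERE, GROUP BY, HAVING, and ORDER BY
MySQL IF vs. CASE WHEN
IF(condition, value_if_true, value_if_false) tests a single condition; CASE WHEN handles multiple conditions and is portable across databases
sql
SELECT score, IF(score >= 60, 'Pass', 'Fail') AS result FROM t;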
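Window functions
The three ranking functions
- ROW_NUMBER(): consecutive, never repeated: 1,2,3,4
- RANK(): ties share a rank and the next rank is skipped: 1,1,3,4
- DENSE_RANK(): ties share a rank with no gaps: 1,1,2,3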
sql
SELECT
id,
score,
ROW_NUMBER() OVER(ORDER BY score DESC) AS rn,
RANK() OVER(ORDER BY score DESC) AS rk,
DENSE_RANK() OVER(ORDER BY score DESC) AS drk
FROM t;
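The three numbering patterns can be reproduced on four invented scores (90, 90, 80, 70). The sketch uses Python's built-in sqlite3 module instead of MySQL and assumes a SQLite build with window-function support (3.25 or later). `id` is added as a tie-breaker for ROW_NUMBER() so that the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, score INTEGER);
    INSERT INTO t VALUES (1, 90), (2, 90), (3, 80), (4, 70);
""")

rows = conn.execute("""
    SELECT
        id,
        score,
        ROW_NUMBER() OVER (ORDER BY score DESC, id) AS rn,  -- always unique
        RANK()       OVER (ORDER BY score DESC)     AS rk,  -- ties, then a gap
        DENSE_RANK() OVER (ORDER BY score DESC)     AS drk  -- ties, no gap
    FROM t
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```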
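Anatomy of a window function
A window function does not change the number of rows; it computes one value per row
sql
OVER (
    PARTITION BY group_id -- partition (optional)
    ORDER BY col DESC     -- ordering (needed for meaningful ranking, LAG/LEAD, and running totals; optional for plain aggregates)
)
Can window functions appear in WHERE?
No. Logical execution order: WHERE → GROUP BY → HAVING → window functions → ORDER BY → LIMIT
Window functions are evaluated after WHERE, so compute them in a subquery and filter in the outer query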
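A short sketch of both the failure and the workaround, run on Python's built-in sqlite3 module rather than MySQL (SQLite 3.25+ assumed). The exact error type and message differ between databases; only the fact that the direct form is rejected carries over. The table `t` and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, score INTEGER);
    INSERT INTO t VALUES (1, 70), (2, 95), (3, 80);
""")

# Filtering on a window function directly in WHERE is rejected.
try:
    conn.execute(
        "SELECT id FROM t WHERE ROW_NUMBER() OVER (ORDER BY score DESC) <= 2"
    ).fetchall()
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Computing it in a subquery and filtering in the outer query works.
top2 = conn.execute("""
    SELECT id FROM (
        SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rn FROM t
    ) AS ranked
    WHERE rn <= 2
    ORDER BY rn
""").fetchall()

print(rejected, top2)
```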
Top N per group
sql
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY group_id ORDER BY score DESC) AS rn
FROM t
) tmp
WHERE rn <= 3;
Running total
sql
SELECT
    id,
    val,
    SUM(val) OVER(ORDER BY id) AS cum_sum -- accumulates in id order
FROM t;
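LAG and LEAD
- LAG(col, n): the value from n rows before the current row
- LEAD(col, n): the value from n rows after the current row
Commonly used for period-over-period and year-over-year comparisons and for consecutive-streak problems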
sql
SELECT
id,
LAG(val, 1) OVER(ORDER BY id) AS pre_val,
LEAD(val, 1) OVER(ORDER BY id) AS next_val
FROM t;
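Window SUM vs. GROUP BY
- GROUP BY: collapses each group, so the result has fewer rows
- SUM() OVER(PARTITION BY): keeps every row and attaches the per-group value to each one
Deduplication, merging, sorting
UNION vs. UNION ALL
- UNION: merges the results and removes duplicates, which costs extra work; it does not guarantee any output order
- UNION ALL: simply concatenates the results without deduplication, so it is faster
Prefer UNION ALL unless duplicates must be removed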
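The difference in row counts is easy to check. Below is a minimal sketch on Python's built-in sqlite3 module (not MySQL) with two invented single-column tables that share the value 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION drops the duplicate 2; sort in Python because SQL gives no order here.
union_rows = sorted(conn.execute("SELECT x FROM a UNION SELECT x FROM b").fetchall())

# UNION ALL keeps both copies of 2.
union_all_rows = sorted(conn.execute("SELECT x FROM a UNION ALL SELECT x FROM b").fetchall())

print(union_rows)
print(union_all_rows)
```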
Finding duplicate rows
sql
SELECT col, COUNT(*)
FROM t
GROUP BY col
HAVING COUNT(*) > 1;
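Deleting duplicates while keeping one row
Keep the row with the smallest id
sql
DELETE t1
FROM t t1
JOIN t t2 ON t1.col = t2.col AND t1.id > t2.id; -- t1 is deleted whenever a row with the same col and a smaller id exists
Sorting by multiple columns
Sort by col1 descending, then by col2 ascending within ties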
sql
SELECT * FROM t
ORDER BY col1 DESC, col2 ASC;
Where NULLs sort
MySQL treats NULL as the smallest value: first in ascending order, last in descending order. To put NULLs last in ascending order:
sql
ORDER BY ISNULL(col), col;
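SQL execution order
Standard logical order
sql
1. FROM / JOIN
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT (including window functions and aliases)
6. DISTINCT
7. ORDER BY
8. LIMIT / OFFSET
Why aliases cannot be used in WHERE
Aliases are created in the SELECT step, and WHERE runs before SELECT, so WHERE cannot see them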
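SQL optimization
Locating slow queries
- Enable the slow query log
- Use EXPLAIN to inspect the execution plan
- Check whether an index is used, whether a full table scan occurs, and whether filesort appears
What indexes do
An index is a sorted structure, much like a book's table of contents
Purpose: avoid full table scans and speed up WHERE / ORDER BY / GROUP BY
When indexes are not used
- A function wraps the column: WHERE YEAR(date) = 2024
- Implicit type conversion
- Leading-wildcard LIKE: LIKE '%xxx%'
- OR combined with a column that has no index
- The optimizer estimates that a full scan is cheaper (small table or low selectivity)
Why SELECT * is discouraged
- Reads unneeded columns and adds I/O
- Usually rules out a covering index
- Breaks easily when the table structure changes
Optimizing JOINs on large tables
- Let the smaller table drive the larger one
- Index the join columns
- Filter before joining
- Avoid one-to-many row inflation
- Query in batches when necessary
Frequently hand-written composite problems (annotated)
Grouped statistics with filtering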
sql
SELECT
dept_id,
COUNT(*) AS cnt,
AVG(salary) AS avg_sal
FROM employee
WHERE salary > 0
GROUP BY dept_id
HAVING avg_sal > 5000
ORDER BY avg_sal DESC
LIMIT 10;
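Pivoting rows to columns
sql
SELECT
    user_id,
    SUM(CASE WHEN type = 1 THEN amount ELSE 0 END) AS income,  -- ELSE 0 avoids NULL when a user has no type 1 rows
    SUM(CASE WHEN type = 2 THEN amount ELSE 0 END) AS outcome  -- expenses
FROM bill
GROUP BY user_id;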
Top 3 per group
sql
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY dept_id ORDER BY salary DESC) AS rn
FROM employee
) tmp
WHERE rn <= 3;
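N consecutive login days (general template)
sql
SELECT DISTINCT user_id -- a user with several qualifying streaks is listed once
FROM (
    SELECT
        user_id,
        dt,
        -- date minus row number: consecutive dates map to the same value
        -- assumes one row per user per day; deduplicate login_log first otherwise
        DATE_SUB(dt, INTERVAL ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY dt) DAY) AS grp
    FROM login_log
) tmp
GROUP BY user_id, grp
HAVING COUNT(*) >= 7; -- N = 7 here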
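The "date minus row number" idea can be exercised end to end with the sketch below. It is an adaptation, not the MySQL query itself: it runs on Python's built-in sqlite3 module (SQLite 3.25+ assumed), replaces DATE_SUB with `julianday(dt) - ROW_NUMBER()`, deduplicates logins first, and uses N = 3 with made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE login_log (user_id INTEGER, dt TEXT);
    INSERT INTO login_log VALUES
        (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-02'), (1, '2024-01-03'),
        (2, '2024-01-01'), (2, '2024-01-03'), (2, '2024-01-05');
""")

def users_with_streak(n):
    """Return ids of users who logged in on at least n consecutive days."""
    rows = conn.execute("""
        SELECT DISTINCT user_id
        FROM (
            SELECT
                user_id,
                dt,
                -- consecutive days share the same (day number - row number)
                julianday(dt)
                    - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY dt) AS grp
            FROM (SELECT DISTINCT user_id, dt FROM login_log) AS d
        ) AS tmp
        GROUP BY user_id, grp
        HAVING COUNT(*) >= ?
        ORDER BY user_id
    """, (n,)).fetchall()
    return [user_id for (user_id,) in rows]

streak_users = users_with_streak(3)
print(streak_users)
```

User 1 logs in on three consecutive days (the duplicate row for 2 January is removed first), while user 2 logs in every other day and never forms a streak.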
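Period-over-period change
sql
SELECT
    dt,
    val,
    LAG(val, 1) OVER(ORDER BY dt) AS pre_val,
    -- (current - previous) / previous; NULLIF guards against a previous value of 0
    ROUND((val - LAG(val, 1) OVER(ORDER BY dt)) / NULLIF(LAG(val, 1) OVER(ORDER BY dt), 0), 2) AS ratio
FROM daily_data;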
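SQL case study: higher education
Shared table definitions
1. Admission table: stores each student's admission, place of origin, and major
sql
CREATE TABLE student_admit (
    student_id VARCHAR(20) PRIMARY KEY, -- student ID (primary key, uniquely identifies a student)
    college VARCHAR(50),                -- college
    major VARCHAR(50),                  -- major
    province VARCHAR(50),               -- home province
    score INT,                          -- admission score
    gender VARCHAR(10),                 -- gender
    is_first_choice TINYINT,            -- 1 = admitted to first choice, 0 = reassigned
    admit_year INT,                     -- admission year
    is_recruit TINYINT                  -- 1 = unified admission, 0 = independent enrollment / comprehensive evaluation
);
2. Registration table: stores whether each new student has registered on arrival
sql
CREATE TABLE student_register (
    student_id VARCHAR(20) PRIMARY KEY, -- student ID (links to the admission table)
    is_register TINYINT,                -- 1 = registered, 0 = not registered
    register_time DATETIME              -- registration time
);
3. Graduation table: stores graduation status, credits, and failed courses
sql
CREATE TABLE student_graduate (
    student_id VARCHAR(20) PRIMARY KEY, -- student ID
    graduate_year INT,                  -- graduation year
    is_graduate TINYINT,                -- 1 = graduated normally, 0 = not graduated
    is_degree TINYINT,                  -- 1 = degree awarded, 0 = not awarded
    fail_course INT,                    -- number of failed courses
    credit_complete DECIMAL(5,2)        -- credits completed
);
4. Further-study table: stores recommendation, postgraduate exam, and study-abroad outcomes
sql
CREATE TABLE student_advance (
    student_id VARCHAR(20) PRIMARY KEY, -- student ID
    is_recommend TINYINT,               -- 1 = recommended for postgraduate admission without exam, 0 = not
    is_postgrad TINYINT,                -- 1 = admitted through the postgraduate entrance exam, 0 = not
    is_abroad TINYINT,                  -- 1 = going abroad, 0 = not
    target_school_type VARCHAR(20)      -- type of admitting school: 985/211/双一流 (Double First-Class)/普通 (regular)
);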
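Basic queries
Core information for the 2024 cohort
Requirement: select students admitted in 2024 and show student ID, college, major, score, and province
sql
SELECT student_id, college, major, score, province
FROM student_admit
WHERE admit_year = 2024; -- filter by admission year
Unregistered new students (by score, descending)
Requirement: join the admission and registration tables, select students who have not registered, and sort by admission score
sql
SELECT a.* -- all columns from the admission table
FROM student_admit a
LEFT JOIN student_register r ON a.student_id = r.student_id -- LEFT JOIN keeps every admitted student
WHERE r.is_register = 0 OR r.is_register IS NULL -- 0 = not registered; NULL = no registration record (treated as not registered)
ORDER BY a.score DESC; -- admission score, descending
Students from Henan scoring above 500
Requirement: combine two conditions, province = '河南省' (Henan) and score > 500
sql
SELECT *
FROM student_admit
WHERE province = '河南省' AND score > 500;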
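Majors containing "工程" (engineering) and their enrollment
Requirement: fuzzy-match the major name and count enrollment per major
sql
SELECT major, COUNT(*) AS enroll_num -- students per major
FROM student_admit
WHERE major LIKE '%工程%' -- filter rows before grouping; HAVING would also work but groups every major first
GROUP BY major;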
Counting distinct home provinces
Requirement: count the number of distinct provinces
sql
SELECT COUNT(DISTINCT province) AS province_cnt
FROM student_admit; -- DISTINCT removes duplicates, COUNT counts what remains
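GROUP BY aggregation
Admitted count, registered count, and registration rate per college
Requirement: group by college and compute the registration rate (registered / admitted)
sql
SELECT
    a.college,
    COUNT(*) AS admit_cnt, -- total admitted
    COALESCE(SUM(r.is_register), 0) AS register_cnt, -- registered (sum of is_register = 1); COALESCE covers colleges with no registration rows
    -- registration rate to 2 decimals; * 100.0 forces decimal arithmetic where / would otherwise truncate
    ROUND(COALESCE(SUM(r.is_register), 0)*100.0/COUNT(*),2) AS register_rate
FROM student_admit a
LEFT JOIN student_register r ON a.student_id = r.student_id
GROUP BY a.college; -- group by college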
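Admission score statistics per major
Requirement: group by major and compute the average, highest, and lowest score
sql
SELECT
    major,
    AVG(score) AS avg_score, -- average score
    MAX(score) AS max_score, -- highest score
    MIN(score) AS min_score  -- lowest score
FROM student_admit
GROUP BY major;
Student counts per province, sorted
Requirement: count students per province and sort by count, descending
sql
SELECT province, COUNT(*) AS cnt
FROM student_admit
GROUP BY province
ORDER BY cnt DESC;
Student counts and shares by gender
Requirement: group by gender and compute the count and the share of the total
sql
SELECT
    gender,
    COUNT(*) AS cnt,
    -- the window function SUM(COUNT(*)) OVER() gives the overall total for the share
    ROUND(COUNT(*)*100.0/SUM(COUNT(*))OVER(),2) AS ratio
FROM student_admit
GROUP BY gender;
First-choice rate per major
Requirement: group by major and compute the first-choice admission rate
sql
SELECT
    major,
    SUM(is_first_choice) AS first_cnt, -- admitted to first choice
    COUNT(*) AS total, -- total admitted
    ROUND(SUM(is_first_choice)*100.0/COUNT(*),2) AS first_rate -- first-choice rate
FROM student_admit
GROUP BY major;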
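NULLs, deduplication, COUNT
Number of students with a missing admission score
sql
SELECT COUNT(*)
FROM student_admit
WHERE score IS NULL;
Number of students with a known province
sql
SELECT COUNT(*)
FROM student_admit
WHERE province IS NOT NULL;
Finding duplicated student IDs
Requirement: group by student ID and keep the IDs that appear more than once
sql
SELECT student_id, COUNT(*)
FROM student_admit
GROUP BY student_id
HAVING COUNT(*) > 1;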
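Deleting duplicate student IDs, keeping one row
Requirement: a self-join delete that keeps one row per student_id. All duplicates share the same student_id, so a separate row identifier is needed to choose the survivor
sql
-- Assumes a hypothetical auto-increment column id that is not part of the schema above
-- (PostgreSQL's ctid cannot be combined with MySQL's multi-table DELETE syntax)
DELETE t1
FROM student_admit t1
JOIN student_admit t2 ON t1.student_id = t2.student_id
WHERE t1.id > t2.id; -- keep the row with the smallest id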
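Multi-table JOIN
Registration rate per major
Requirement: group by major and join the registration table to compute the registration rate
sql
SELECT
    a.major,
    COUNT(*) AS admit_cnt,
    SUM(r.is_register) AS register_cnt,
    ROUND(SUM(r.is_register)*100.0/COUNT(*),2) AS register_rate
FROM student_admit a
LEFT JOIN student_register r USING(student_id) -- USING shortens the join condition when both columns share a name
GROUP BY a.major;
All admitted students with their registration status
Requirement: LEFT JOIN keeps every admitted student; a missing registration record is shown as 0
sql
SELECT
    a.*,
    COALESCE(r.is_register, 0) AS is_register -- COALESCE replaces NULL with 0
FROM student_admit a
LEFT JOIN student_register r USING(student_id);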
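Degree award rate per major
Requirement: join the graduation table and compute the share of students awarded a degree
sql
SELECT
    a.major,
    COUNT(*) AS total, -- students with a graduation record
    SUM(g.is_degree) AS degree_cnt, -- degrees awarded
    ROUND(SUM(g.is_degree)*100.0/COUNT(*),2) AS degree_rate -- degree award rate
FROM student_admit a
JOIN student_graduate g USING(student_id)
GROUP BY a.major;
Three-table join: further-study counts per province
Requirement: join the admission, registration, and further-study tables and count further-study students per province
sql
SELECT
    a.province,
    COUNT(DISTINCT a.student_id) AS student_cnt, -- distinct students
    -- CASE marks a student as 1 when any further-study condition holds; the sum is the further-study count
    SUM(CASE WHEN adv.is_recommend=1 OR adv.is_postgrad=1 OR adv.is_abroad=1 THEN 1 ELSE 0 END) AS advance_cnt
FROM student_admit a
LEFT JOIN student_register r USING(student_id)
LEFT JOIN student_advance adv USING(student_id)
GROUP BY a.province;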
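Subqueries & HAVING
Students scoring above the school-wide average
Requirement: compute the school-wide average in a subquery, then filter students
sql
SELECT *
FROM student_admit
WHERE score > (SELECT AVG(score) FROM student_admit);
Majors with a registration rate below 85%
Requirement: compute the registration rate per major in a subquery and filter in the outer query
sql
SELECT major, register_rate
FROM (
    SELECT
        major,
        ROUND(SUM(r.is_register)*100.0/COUNT(*),2) AS register_rate
    FROM student_admit a
    LEFT JOIN student_register r USING(student_id)
    GROUP BY major
) t -- derived table
WHERE register_rate < 85;
Students scoring above their own major's average
Requirement: compute the average per major first, then join back to the base table and filter
sql
SELECT a.*
FROM student_admit a
JOIN (
    SELECT major, AVG(score) AS avg_major_score
    FROM student_admit
    GROUP BY major
) m ON a.major = m.major -- attach each major's average
WHERE a.score > m.avg_major_score; -- score above the major's average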
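Majors whose admissions rose every year over the last three years
Requirement: use LAG to fetch the previous year's count, then keep majors that increased in both 2023 and 2024
sql
SELECT major
FROM (
    SELECT
        major,
        admit_year,
        COUNT(*) AS cnt,
        -- LAG fetches the previous year's count within each major, ordered by year
        LAG(COUNT(*),1) OVER(PARTITION BY major ORDER BY admit_year) AS pre_cnt
    FROM student_admit
    WHERE admit_year IN (2022,2023,2024)
    GROUP BY major, admit_year
) t
GROUP BY major
-- all three years present, and both year-over-year comparisons show an increase
HAVING COUNT(*) = 3
   AND SUM(CASE WHEN cnt > pre_cnt THEN 1 ELSE 0 END) = 2;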
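Number of students who registered, graduated, and went on to further study
Requirement: inner-join the three tables and keep students who satisfy all three conditions
sql
SELECT COUNT(*) AS total
FROM student_register r
JOIN student_graduate g USING(student_id)
JOIN student_advance adv USING(student_id)
WHERE r.is_register=1 -- registered
  AND g.is_graduate=1 -- graduated
  -- any further-study condition
  AND (adv.is_recommend=1 OR adv.is_postgrad=1 OR adv.is_abroad=1);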
CASE WHEN
Overall further-study rate per major
Requirement: further study = recommendation + postgraduate exam + study abroad; compute the share per major
sql
SELECT
    a.major,
    COUNT(*) AS graduate_cnt,
    SUM(CASE WHEN adv.is_recommend=1 OR adv.is_postgrad=1 OR adv.is_abroad=1 THEN 1 ELSE 0 END) AS advance_cnt,
    -- same condition as advance_cnt, so the rate always agrees with the count
    ROUND(SUM(CASE WHEN adv.is_recommend=1 OR adv.is_postgrad=1 OR adv.is_abroad=1 THEN 1 ELSE 0 END)*100.0/COUNT(*),2) AS advance_rate
FROM student_admit a
JOIN student_graduate g USING(student_id)
LEFT JOIN student_advance adv USING(student_id)
GROUP BY a.major;
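Score bands with further-study rate
Requirement: bucket scores with CASE and report the count and further-study rate per band
sql
SELECT
    CASE
        WHEN score < 500 THEN 'Low band'
        WHEN score BETWEEN 500 AND 550 THEN 'Middle band'
        ELSE 'High band'
    END AS score_level, -- alias for the score band
    COUNT(*) AS cnt,
    ROUND(SUM(CASE WHEN is_recommend+is_postgrad+is_abroad>=1 THEN 1 ELSE 0 END)*100.0/COUNT(*),2) AS advance_rate
FROM student_admit a
LEFT JOIN student_advance adv USING(student_id)
GROUP BY score_level; -- grouping by a SELECT alias works in MySQL
Flagging high scorers (above the school-wide average)
A subquery computes the school-wide average, and CASE decides whether a student is a high scorer
sql
SELECT
    *,
    -- 1 when the score is above the school-wide average, otherwise 0
    CASE WHEN score > (SELECT AVG(score) FROM student_admit) THEN 1 ELSE 0 END AS is_high_score
FROM student_admit;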
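Postgraduate exam application rate and pass rate
sql
-- Assumes student_advance has an extra is_postgrad_apply column (1 = applied for the exam); it is not in the schema above
SELECT
    major,
    SUM(is_postgrad_apply) AS apply_cnt, -- applicants
    SUM(is_postgrad) AS pass_cnt, -- admitted
    ROUND(SUM(is_postgrad_apply)*100.0/COUNT(*),2) AS apply_rate, -- application rate
    ROUND(SUM(is_postgrad)*100.0/NULLIF(SUM(is_postgrad_apply),0),2) AS pass_rate -- pass rate; NULLIF avoids dividing by 0
FROM student_admit a
JOIN student_advance adv USING(student_id)
GROUP BY major;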
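Graduation outcomes by category
Requirement: classify by graduation status and credits, then count each category
sql
SELECT
    CASE
        WHEN is_graduate=1 THEN 'Graduated normally'
        WHEN credit_complete < 120 THEN 'Delayed graduation'
        ELSE 'Completed without diploma / withdrew'
    END AS graduate_type,
    COUNT(*) AS cnt
FROM student_graduate
GROUP BY graduate_type;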
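Window functions
Top 10 scores within each major
Requirement: partition by major, rank by score descending, and keep the top 10 of each group
sql
SELECT *
FROM (
    SELECT
        *,
        -- rank within the partition, consecutive and never repeated
        ROW_NUMBER() OVER(PARTITION BY major ORDER BY score DESC) AS rn
    FROM student_admit
) t
WHERE rn <= 10;
Score percentile within the major
Requirement: compute each student's percentile rank by score within the major
sql
SELECT
    student_id, major, score,
    PERCENT_RANK() OVER(PARTITION BY major ORDER BY score) AS pct
FROM student_admit;
Each major's share of its college
Requirement: group by college and major, then compute the major's share of the college total
sql
SELECT
    college,
    major,
    COUNT(*) AS cnt,
    -- the window function sums the counts per college to get the denominator
    ROUND(COUNT(*)*100.0/SUM(COUNT(*))OVER(PARTITION BY college),2) AS ratio_in_college
FROM student_admit
GROUP BY college, major;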
Further study / graduation / registration queries
Graduation rate and delayed-graduation count per major
sql
SELECT
    a.major,
    COUNT(*) AS total,
    SUM(g.is_graduate) AS graduate_cnt,
    ROUND(SUM(g.is_graduate)*100.0/COUNT(*),2) AS graduate_rate,
    -- delayed students: not graduated and short of credits
    SUM(CASE WHEN g.is_graduate=0 AND g.credit_complete<120 THEN 1 ELSE 0 END) AS delay_cnt
FROM student_admit a
JOIN student_graduate g USING(student_id)
GROUP BY a.major;
Further-study breakdown and overall rate per major
sql
SELECT
    major,
    SUM(is_recommend) AS recommend_cnt, -- recommended without exam
    SUM(is_postgrad) AS postgrad_cnt, -- admitted through the postgraduate exam
    SUM(is_abroad) AS abroad_cnt, -- going abroad
    -- overall rate: students meeting any condition / students with a further-study record
    -- (the inner JOIN drops students without a record; use LEFT JOIN to divide by all admitted students)
    ROUND(SUM(CASE WHEN is_recommend+is_postgrad+is_abroad>=1 THEN 1 ELSE 0 END)*100.0/COUNT(*),2) AS total_advance_rate
FROM student_admit a
JOIN student_advance adv USING(student_id)
GROUP BY major;
Further-study rate per province
sql
SELECT
    province,
    COUNT(*) AS cnt,
    ROUND(SUM(CASE WHEN is_recommend+is_postgrad+is_abroad>=1 THEN 1 ELSE 0 END)*100.0/COUNT(*),2) AS advance_rate
FROM student_admit a
LEFT JOIN student_advance adv USING(student_id)
GROUP BY province;
Share of each admitting-school type
sql
SELECT
    target_school_type,
    COUNT(*) AS cnt,
    ROUND(COUNT(*)*100.0/SUM(COUNT(*))OVER(),2) AS ratio -- share of all further-study students with a known school type
FROM student_advance
WHERE target_school_type IS NOT NULL
GROUP BY target_school_type;
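Frequently asked short-answer questions
1. WHERE vs. HAVING:
WHERE filters raw rows before grouping and cannot use aggregates; HAVING filters the grouped result and can use aggregates.
2. JOIN vs. subquery:
A JOIN is often easier for the optimizer to run with indexes, so it is the usual default; modern optimizers rewrite many subqueries into joins, so confirm with EXPLAIN.
3. Window function restriction:
Window functions cannot appear in WHERE because they are evaluated after WHERE.
4. SQL optimization:
Index the join columns, filter before joining, avoid SELECT *, limit row inflation, and analyze the plan with EXPLAIN.
5. Troubleshooting inconsistent metrics:
Check the time range, whether rows were deduplicated, whether unregistered or delayed students are included, the join logic, and the metric definition.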