MySQL Window Functions: OVER() Explained

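Contents

  • I. Overview of Window Functions
    • [1. What Is a Window Function?](#1. What Is a Window Function?)
    • [2. Window Functions vs. Aggregate Functions](#2. Window Functions vs. Aggregate Functions)
    • [3. Basic Syntax](#3. Basic Syntax)
  • II. Core Components of a Window Function
    • [1. PARTITION BY - Partition Clause](#1. PARTITION BY - Partition Clause)
    • [2. ORDER BY - Ordering Clause](#2. ORDER BY - Ordering Clause)
    • 3. Aggregate Window Functions
    • 4. Dedicated Window Functions
    • [5. ROWS BETWEEN - Window Frame Clause](#5. ROWS BETWEEN - Window Frame Clause)
  • III. Window Function Categories in Detail
    • [1. Ranking Functions](#1. Ranking Functions)
    • [2. Distribution Functions](#2. Distribution Functions)
    • [3. Value Functions](#3. Value Functions)
    • [4. Aggregate Functions as Window Functions](#4. Aggregate Functions as Window Functions)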

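I. Overview of Window Functions

1. What Is a Window Function?

A window function performs a calculation over a set of rows (called a "window") and returns one value for every row. Unlike an aggregate function used with GROUP BY, a window function does not reduce the number of rows.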

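2. Window Functions vs. Aggregate Functions

Property          | Window function                             | Aggregate function
------------------|---------------------------------------------|-------------------
Rows returned     | Same as the number of input rows            | Usually fewer (one per GROUP BY group)
Grouping effect   | Keeps every row and adds the computed value | Returns one row per group
Where it appears  | SELECT list or ORDER BY clause              | SELECT list, HAVING clause, or ORDER BY clause
Typical functions | ROW_NUMBER(), RANK(), SUM() OVER()          | SUM(), COUNT(), AVG()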
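
The row-count difference in the table can be seen with a small sketch. The temporary table `demo_orders` and its three rows are hypothetical, introduced only for this illustration.

```sql
-- Hypothetical demo data: three orders for two customers.
CREATE TEMPORARY TABLE demo_orders (customer VARCHAR(10), amount INT);
INSERT INTO demo_orders VALUES ('A', 100), ('A', 200), ('B', 50);

-- Aggregate function with GROUP BY: collapses to one row per customer (2 rows).
SELECT customer, SUM(amount) AS total
FROM demo_orders
GROUP BY customer
ORDER BY customer;

-- Window function: keeps all 3 rows and attaches the per-customer total to each.
SELECT customer, amount,
       SUM(amount) OVER (PARTITION BY customer) AS total
FROM demo_orders
ORDER BY customer, amount;
```

The first query returns (A, 300) and (B, 50). The second returns (A, 100, 300), (A, 200, 300), and (B, 50, 50).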

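3. Basic Syntax

window_function([arguments]) OVER (
  [PARTITION BY <partition columns>]
  [ORDER BY <sort columns ASC/DESC>]
  [ROWS BETWEEN <frame start> AND <frame end>]
)
  • GROUP BY cannot be placed inside OVER(); use PARTITION BY there instead.
  • The PARTITION BY clause specifies the columns that split the rows into partitions.
  • The ORDER BY clause specifies the columns that order the rows within each partition.
  • The ROWS BETWEEN clause specifies the extent of the window frame through its frame start and frame end.

The ROWS BETWEEN clause is used less often in practice, so some references omit it from the syntax description and concentrate on PARTITION BY and ORDER BY.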

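II. Core Components of a Window Function

1. PARTITION BY - Partition Clause

PARTITION BY divides the rows into partitions, and the function is computed independently within each partition.

-- Create test data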
CREATE TABLE sales (
    id INT PRIMARY KEY AUTO_INCREMENT,
    salesperson VARCHAR(50),
    region VARCHAR(50),
    sale_date DATE,
    amount DECIMAL(10, 2)
);

INSERT INTO sales (salesperson, region, sale_date, amount) VALUES
('张三', '北京', '2024-01-01', 1000.00),
('张三', '北京', '2024-01-02', 1500.00),
('李四', '上海', '2024-01-01', 2000.00),
('李四', '上海', '2024-01-02', 2500.00),
('王五', '北京', '2024-01-01', 1200.00),
('王五', '北京', '2024-01-03', 1800.00),
('赵六', '广州', '2024-01-02', 2200.00);

-- Compute totals per partition
SELECT 
    salesperson,
    sale_date,
    amount,
    -- Total sales for each salesperson
    SUM(amount) OVER (PARTITION BY salesperson) AS total_by_person,
    -- Total sales for each region
    SUM(amount) OVER (PARTITION BY region) AS total_by_region,
    -- No partitioning (grand total over all rows)
    SUM(amount) OVER () AS grand_total
FROM sales
ORDER BY salesperson, sale_date;

Output:

salesperson | sale_date  | amount | total_by_person | total_by_region | grand_total
------------|------------|--------|-----------------|-----------------|------------
张三        | 2024-01-01 | 1000.00| 2500.00         | 5500.00         | 12200.00
张三        | 2024-01-02 | 1500.00| 2500.00         | 5500.00         | 12200.00
李四        | 2024-01-01 | 2000.00| 4500.00         | 4500.00         | 12200.00
李四        | 2024-01-02 | 2500.00| 4500.00         | 4500.00         | 12200.00
王五        | 2024-01-01 | 1200.00| 3000.00         | 5500.00         | 12200.00
王五        | 2024-01-03 | 1800.00| 3000.00         | 5500.00         | 12200.00
赵六        | 2024-01-02 | 2200.00| 2200.00         | 2200.00         | 12200.00

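2. ORDER BY - Ordering Clause

ORDER BY sorts the rows within each partition, which affects ranking functions and cumulative calculations.

SELECT 
    salesperson,
    sale_date,
    amount,
    -- Row number within each salesperson's partition, highest amount first
    ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY amount DESC) AS rn,
    -- Running total within the partition, ordered by date
    SUM(amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS running_total,
    -- Moving average over the current row and the 2 preceding rows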
    AVG(amount) OVER (PARTITION BY salesperson ORDER BY sale_date 
         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3
FROM sales
ORDER BY salesperson, sale_date;
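
To isolate the effect of ORDER BY inside OVER(), here is a minimal sketch on hypothetical inline data (the CTE `daily` and its values are assumptions made for illustration). Without ORDER BY the sum covers the whole window; with ORDER BY it becomes a running total.

```sql
-- Hypothetical inline data: one amount per day.
WITH daily (d, amount) AS (
    SELECT 1, 100 UNION ALL
    SELECT 2, 200 UNION ALL
    SELECT 3, 300
)
SELECT
    d,
    amount,
    SUM(amount) OVER ()           AS grand_total,   -- 600, 600, 600
    SUM(amount) OVER (ORDER BY d) AS running_total  -- 100, 300, 600
FROM daily
ORDER BY d;
```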

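3. Aggregate Window Functions

Many tutorials divide the commonly used window functions into two groups: aggregate window functions and dedicated window functions. Aggregate window functions have the same names and do the same job as the ordinary aggregate functions. In use, the difference is that they accept the window clauses (PARTITION BY, ORDER BY, and a frame inside OVER), which makes many analytical queries simpler to write. The main ones are:

Function | Purpose
---------|--------
SUM      | Sums the values of a column
AVG      | Averages the values of a column
COUNT    | Counts rows or non-NULL values
MAX      | Returns the largest value of a column
MIN      | Returns the smallest value of a column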

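4. Dedicated Window Functions

Common dedicated window functions:

Function     | Category              | Description
-------------|-----------------------|------------
RANK         | Ranking function      | Ranks rows; tied rows share a number and the sequence has gaps (e.g. 1, 2, 2, 4)
DENSE_RANK   | Ranking function      | Ranks rows; tied rows share a number and the sequence has no gaps (e.g. 1, 2, 2, 3)
ROW_NUMBER   | Ranking function      | Assigns a unique, consecutive row number within each partition (e.g. 1, 2, 3, 4)
PERCENT_RANK | Distribution function | Computes (rank - 1) / (rows - 1) for each row; the result lies between 0 and 1
CUME_DIST    | Distribution function | Number of rows in the partition whose rank is less than or equal to the current row's, divided by the total rows in the partition; the result is greater than 0 and at most 1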
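
The number patterns in the table can be reproduced with a short sketch. The CTE `scores` and its four values are hypothetical, chosen so that two rows tie.

```sql
-- Hypothetical inline data with one tie (20 appears twice).
WITH scores (score) AS (
    SELECT 10 UNION ALL
    SELECT 20 UNION ALL
    SELECT 20 UNION ALL
    SELECT 40
)
SELECT
    score,
    RANK()         OVER (ORDER BY score) AS rnk,        -- 1, 2, 2, 4
    DENSE_RANK()   OVER (ORDER BY score) AS dense_rnk,  -- 1, 2, 2, 3
    ROW_NUMBER()   OVER (ORDER BY score) AS row_num,    -- 1, 2, 3, 4
    PERCENT_RANK() OVER (ORDER BY score) AS pct_rank,   -- 0, 1/3, 1/3, 1
    CUME_DIST()    OVER (ORDER BY score) AS cume        -- 0.25, 0.75, 0.75, 1
FROM scores
ORDER BY score, row_num;
```

For the tied rows, PERCENT_RANK is (2 - 1) / (4 - 1) = 1/3 and CUME_DIST is 3 / 4 = 0.75, because three of the four rows have a score of 20 or less.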

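5. ROWS BETWEEN - Window Frame Clause

The frame clause defines the range of rows, relative to the current row, over which the window function is computed.

-- Examples of different window frames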
SELECT 
    salesperson,
    sale_date,
    amount,
    -- No ORDER BY and no frame clause: the frame is every row in the partition
    SUM(amount) OVER (PARTITION BY salesperson) AS total_all,
    
    -- ROWS mode: the frame is counted in physical rows
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_rows,
    
    -- RANGE mode: the frame is defined by ORDER BY values, so rows that tie with the current row (its peers) are included together
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_range,
    
    -- Sliding window: the current row and the 2 rows before it
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS sum_last_3,
    
    -- One row before and one row after the current row
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS sum_neighbors
FROM sales
ORDER BY salesperson, sale_date;
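
In the sales data no salesperson has two rows on the same date, so the ROWS and RANGE columns above come out identical. The two modes only differ when rows tie on the ORDER BY value. A minimal sketch on hypothetical inline data (the CTE `payments` is an assumption made for illustration):

```sql
-- Hypothetical inline data: two payments share pay_day = 1, so they are peers.
WITH payments (pay_day, amount) AS (
    SELECT 1, 10 UNION ALL
    SELECT 1, 20 UNION ALL
    SELECT 2, 30
)
SELECT
    pay_day,
    amount,
    -- ROWS counts physical rows, so the two pay_day = 1 rows get different totals
    -- (which of them is processed first is not fixed by ORDER BY pay_day alone).
    SUM(amount) OVER (ORDER BY pay_day
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_rows,
    -- RANGE includes every peer of the current row: 30, 30, 60.
    SUM(amount) OVER (ORDER BY pay_day
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_range
FROM payments
ORDER BY pay_day, amount;
```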

III. Window Function Categories in Detail

1. Ranking Functions

-- Create test data
CREATE TABLE employees (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50),
    department VARCHAR(50),
    salary DECIMAL(10, 2)
);

INSERT INTO employees (name, department, salary) VALUES
('张三', '技术部', 8000.00),
('李四', '技术部', 9000.00),
('王五', '技术部', 9500.00),
('赵六', '技术部', 9000.00),
('钱七', '销售部', 7000.00),
('孙八', '销售部', 8500.00),
('周九', '销售部', 8500.00),
('吴十', '销售部', 7500.00);

-- 1. ROW_NUMBER(): consecutive, non-repeating row numbers
SELECT 
    name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

-- 2. RANK(): ranking with gaps (equal values share a rank, and the next rank skips ahead)
SELECT 
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_num
FROM employees;

-- 3. DENSE_RANK(): ranking without gaps (equal values share a rank, and the next rank follows consecutively)
SELECT 
    name,
    department,
    salary,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank_num
FROM employees;

-- 4. NTILE(n): splits the rows of each partition into n buckets
SELECT 
    name,
    department,
    salary,
    NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) AS quartile
FROM employees;

Output comparison (for rows with equal salaries, the order in which ROW_NUMBER and NTILE number them is not guaranteed):

Department | Name | Salary | ROW_NUMBER | RANK | DENSE_RANK | NTILE(4)
-----------|------|--------|------------|------|------------|---------
技术部 | 王五 | 9500   | 1          | 1    | 1          | 1
技术部 | 李四 | 9000   | 2          | 2    | 2          | 2
技术部 | 赵六 | 9000   | 3          | 2    | 2          | 3
技术部 | 张三 | 8000   | 4          | 4    | 3          | 4
销售部 | 孙八 | 8500   | 1          | 1    | 1          | 1
销售部 | 周九 | 8500   | 2          | 1    | 1          | 2
销售部 | 吴十 | 7500   | 3          | 3    | 2          | 3
销售部 | 钱七 | 7000   | 4          | 4    | 3          | 4
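
A common use of ranking functions is a "top N per group" query. Window functions are evaluated after WHERE, so the rank cannot be filtered in the same query level; one way around this is to compute it in a derived table first. The sketch below assumes the `employees` table and rows created above; the alias names `ranked` and `salary_rank` are chosen for illustration.

```sql
-- Assumes the employees table and rows created above.
-- Returns everyone whose salary rank within the department is 1 or 2.
SELECT name, department, salary, salary_rank
FROM (
    SELECT
        name,
        department,
        salary,
        RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM employees
) AS ranked
WHERE salary_rank <= 2
ORDER BY department, salary_rank, name;
```

With the sample data this keeps 王五, 李四, and 赵六 in 技术部 (ranks 1, 2, 2) and 孙八 and 周九 in 销售部 (both rank 1). Swapping RANK() for ROW_NUMBER() would return exactly two rows per department, with ties broken arbitrarily.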

2. Distribution Functions

-- 5. PERCENT_RANK(): percentage rank, (rank - 1) / (total_rows - 1)
SELECT 
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary) AS rank_num,
    -- PERCENT_RANK is a reserved word in MySQL 8.0, so the alias must use another name
    PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary) AS pct_rank
FROM employees;

-- 6. CUME_DIST(): cumulative distribution (rows with a value less than or equal to the current row's / total rows)
SELECT 
    name,
    department,
    salary,
    -- CUME_DIST is a reserved word in MySQL 8.0, so the alias must use another name
    CUME_DIST() OVER (PARTITION BY department ORDER BY salary) AS cume_dist_value
FROM employees;

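-- 7. PERCENTILE_CONT(): continuous percentile (interpolates between values)
-- Note: MySQL 8.0 does not provide PERCENTILE_CONT() or PERCENTILE_DISC().
-- The syntax below is the form used by systems such as MariaDB and SQL Server; it fails in MySQL.
-- SELECT DISTINCT is used instead of GROUP BY so that duplicate salaries are not collapsed before the percentile is computed.
SELECT DISTINCT
    department,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) 
        OVER (PARTITION BY department) AS median_salary
FROM employees;

-- 8. PERCENTILE_DISC(): discrete percentile (returns a value that exists in the data; same caveat as above)
SELECT DISTINCT
    department,
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) 
        OVER (PARTITION BY department) AS median_salary
FROM employees;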
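
In MySQL itself, a median per department can be approximated with window functions that do exist there. The following is one possible sketch, not a built-in percentile function: it numbers the rows of each department, keeps the middle row (or the two middle rows when the count is even), and averages them. It assumes the `employees` table and rows created above; `numbered`, `rn`, and `cnt` are names chosen for illustration.

```sql
-- Assumes the employees table and rows created above.
SELECT department, AVG(salary) AS median_salary
FROM (
    SELECT
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS rn,
        COUNT(*)     OVER (PARTITION BY department)                 AS cnt
    FROM employees
) AS numbered
-- Odd cnt: both expressions give the single middle row.
-- Even cnt: they give the two middle rows, which AVG then averages.
WHERE rn IN (FLOOR((cnt + 1) / 2), FLOOR((cnt + 2) / 2))
GROUP BY department
ORDER BY department;
```

With the sample data each department has four rows, so rows 2 and 3 are averaged: 技术部 gives (9000 + 9000) / 2 = 9000 and 销售部 gives (7500 + 8500) / 2 = 8000.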

3. Value Functions

-- 9. LAG(column, n, default): returns the value from n rows before the current row
SELECT 
    name,
    department,
    salary,
    LAG(salary, 1, 0) OVER (PARTITION BY department ORDER BY salary) AS prev_salary,
    salary - LAG(salary, 1, 0) OVER (PARTITION BY department ORDER BY salary) AS salary_diff
FROM employees;

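-- 10. LEAD(column, n, default): returns the value from n rows after the current row
-- (the sales table has salesperson and region columns, not name and department)
SELECT 
    salesperson,
    region,
    sale_date,
    amount,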
    LEAD(amount, 1, 0) OVER (PARTITION BY salesperson ORDER BY sale_date) AS next_amount,
    LEAD(sale_date, 1, NULL) OVER (PARTITION BY salesperson ORDER BY sale_date) AS next_date
FROM sales;

-- 11. FIRST_VALUE(column): the first value in the window frame
SELECT 
    name,
    department,
    salary,
    FIRST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS lowest_salary,
    salary - FIRST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS diff_from_lowest
FROM employees;

-- 12. LAST_VALUE(column): the last value in the window frame (mind the default frame!)
SELECT 
    name,
    department,
    salary,
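    -- Pitfall: with ORDER BY and no frame clause, the default frame is
    -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the frame ends at the
    -- current row (and its peers) and this returns the row's own salary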
    LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS wrong_last_value,
    
    -- Correct usage: specify a frame that spans the whole partition
    LAST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS correct_last_value,
    
    -- Alternatively, use NTH_VALUE to pick the n-th value in the frame
    NTH_VALUE(salary, 1) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_salary,
    
    NTH_VALUE(salary, 2) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS second_salary
FROM employees;
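
The queries above repeat the same window definition several times. MySQL 8.0 also accepts a named window declared in a WINDOW clause, which can shorten such queries. A minimal sketch, assuming the `employees` table and rows created above; the window name `w` and the column aliases are chosen for illustration.

```sql
-- Assumes the employees table and rows created above.
SELECT
    name,
    department,
    salary,
    FIRST_VALUE(salary)  OVER w AS lowest_salary,
    LAST_VALUE(salary)   OVER w AS highest_salary,
    NTH_VALUE(salary, 2) OVER w AS second_lowest_salary
FROM employees
-- One definition, including the full-partition frame, shared by all three functions.
WINDOW w AS (
    PARTITION BY department
    ORDER BY salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY department, salary;
```

Because the frame spans the whole partition, every 技术部 row shows 8000, 9500, and 9000, and every 销售部 row shows 7000, 8500, and 7500.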

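4. Aggregate Functions as Window Functions

-- Most aggregate functions can also be used as window functions (GROUP_CONCAT() is one that cannot)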
SELECT 
    salesperson,
    region,
    sale_date,
    amount,
    -- Aggregate functions
    COUNT(*) OVER (PARTITION BY salesperson) AS total_transactions,
    SUM(amount) OVER (PARTITION BY salesperson) AS total_amount,
    AVG(amount) OVER (PARTITION BY salesperson) AS avg_amount,
    MAX(amount) OVER (PARTITION BY salesperson) AS max_amount,
    MIN(amount) OVER (PARTITION BY salesperson) AS min_amount,
    
    -- Standard deviation and variance (MySQL 8.0+)
    STDDEV(amount) OVER (PARTITION BY salesperson) AS std_amount,
    VARIANCE(amount) OVER (PARTITION BY salesperson) AS var_amount,
    
    -- Running total
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    
    -- Moving average
    AVG(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3,
    
    -- Each sale as a percentage of the salesperson's total
    amount * 100.0 / SUM(amount) OVER (PARTITION BY salesperson) AS percentage
FROM sales
ORDER BY salesperson, sale_date;
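
GROUP BY cannot go inside OVER(), but the two can be combined in one query, because window functions are evaluated after grouping. A sketch of that combination, assuming the `sales` table and rows created in section II; the alias names are chosen for illustration.

```sql
-- Assumes the sales table and rows created in section II.
SELECT
    region,
    SUM(amount) AS region_total,
    -- The inner SUM(amount) is the GROUP BY aggregate; the outer SUM(...) OVER ()
    -- is a window function that adds up the per-region totals.
    ROUND(SUM(amount) * 100 / SUM(SUM(amount)) OVER (), 2) AS pct_of_total
FROM sales
GROUP BY region
ORDER BY region_total DESC;
```

With the sample data this returns one row per region: 北京 with 5500.00 (45.08), 上海 with 4500.00 (36.89), and 广州 with 2200.00 (18.03).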