MySQL Window Functions: OVER() Explained

Contents

  • I. Window Function Overview
    • 1. What Is a Window Function?
    • 2. Window Functions vs. Aggregate Functions
    • 3. Basic Syntax
  • II. Core Components of a Window Function
    • 1. PARTITION BY - Partition Clause
    • 2. ORDER BY - Ordering Clause
    • 3. Aggregate Window Functions
    • 4. Dedicated Window Functions
    • 5. ROWS BETWEEN - Window Frame Clause
  • III. Window Function Categories in Detail
    • 1. Ranking Functions
    • 2. Distribution Functions
    • 3. Value Functions
    • 4. Aggregate Functions as Window Functions

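I. Window Function Overview

1. What Is a Window Function?

A window function performs a calculation over a set of rows (called a "window") and returns one value for each row. Unlike an aggregate function used with GROUP BY, a window function does not reduce the number of rows.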

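2. Window Functions vs. Aggregate Functions

Feature           | Window function                                | Aggregate function
------------------|------------------------------------------------|-------------------------------------------
Rows returned     | Same as the number of input rows               | Usually fewer (one row per GROUP BY group)
Grouping effect   | Keeps every row and adds the computed value    | Returns one row per group
Where it can go   | SELECT list (MySQL also allows it in ORDER BY) | SELECT list, HAVING, or ORDER BY
Typical functions | ROW_NUMBER(), RANK(), SUM() OVER()             | SUM(), COUNT(), AVG()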
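The row-count difference in the table can be seen with a small sketch. The `demo_sales` data below is hypothetical and defined inline so that each statement runs on its own; it assumes MySQL 8.0 or later.

```sql
-- Aggregate function with GROUP BY: 3 input rows collapse to 2 output rows.
WITH demo_sales AS (
    SELECT 'A' AS salesperson, 100 AS amount
    UNION ALL SELECT 'A', 200
    UNION ALL SELECT 'B', 300
)
SELECT salesperson, SUM(amount) AS total
FROM demo_sales
GROUP BY salesperson
ORDER BY salesperson;
-- A | 300
-- B | 300

-- Window function: all 3 input rows are kept, each carrying its partition total.
WITH demo_sales AS (
    SELECT 'A' AS salesperson, 100 AS amount
    UNION ALL SELECT 'A', 200
    UNION ALL SELECT 'B', 300
)
SELECT salesperson, amount,
       SUM(amount) OVER (PARTITION BY salesperson) AS total
FROM demo_sales
ORDER BY salesperson, amount;
-- A | 100 | 300
-- A | 200 | 300
-- B | 300 | 300
```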

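3. Basic Syntax

window_function([arguments]) OVER (
  [PARTITION BY <partition columns>]
  [ORDER BY <sort columns ASC/DESC>]
  [ROWS BETWEEN <frame start> AND <frame end>]
)
  • GROUP BY cannot be placed inside OVER(); use PARTITION BY instead.
  • The PARTITION BY clause specifies the columns that divide the rows into partitions.
  • The ORDER BY clause specifies how rows are ordered within each partition.
  • The ROWS BETWEEN clause specifies the window frame, that is, its start row and end row.

ROWS BETWEEN is used somewhat less often in practice, so some references leave it out of the syntax description and focus on PARTITION BY and ORDER BY.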
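As a minimal sketch of the three clauses working together, the query below uses a hypothetical inline table `demo` (columns `grp`, `seq`, and `val` are made up for illustration) and assumes MySQL 8.0 or later.

```sql
WITH demo AS (
    SELECT 'A' AS grp, 1 AS seq, 10 AS val
    UNION ALL SELECT 'A', 2, 20
    UNION ALL SELECT 'A', 3, 30
    UNION ALL SELECT 'B', 1, 5
)
SELECT
    grp,
    seq,
    val,
    SUM(val) OVER (
        PARTITION BY grp                          -- restart the calculation for each grp
        ORDER BY seq                              -- order rows inside the partition
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW  -- frame: previous row plus current row
    ) AS sum_prev_and_current
FROM demo
ORDER BY grp, seq;
-- A | 1 | 10 | 10
-- A | 2 | 20 | 30
-- A | 3 | 30 | 50
-- B | 1 |  5 |  5
```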

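II. Core Components of a Window Function

1. PARTITION BY - Partition Clause

Divides the rows into partitions; the function is computed independently within each partition.

-- Create test data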
CREATE TABLE sales (
    id INT PRIMARY KEY AUTO_INCREMENT,
    salesperson VARCHAR(50),
    region VARCHAR(50),
    sale_date DATE,
    amount DECIMAL(10, 2)
);

INSERT INTO sales (salesperson, region, sale_date, amount) VALUES
('张三', '北京', '2024-01-01', 1000.00),
('张三', '北京', '2024-01-02', 1500.00),
('李四', '上海', '2024-01-01', 2000.00),
('李四', '上海', '2024-01-02', 2500.00),
('王五', '北京', '2024-01-01', 1200.00),
('王五', '北京', '2024-01-03', 1800.00),
('赵六', '广州', '2024-01-02', 2200.00);

-- Totals computed per partition
SELECT 
    salesperson,
    sale_date,
    amount,
    -- Total sales for each salesperson
    SUM(amount) OVER (PARTITION BY salesperson) AS total_by_person,
    -- Total sales for each region
    SUM(amount) OVER (PARTITION BY region) AS total_by_region,
    -- No partitioning (grand total)
    SUM(amount) OVER () AS grand_total
FROM sales
ORDER BY salesperson, sale_date;

Output:

salesperson | sale_date  | amount | total_by_person | total_by_region | grand_total
------------|------------|--------|-----------------|-----------------|------------
张三        | 2024-01-01 | 1000.00| 2500.00         | 5500.00         | 12200.00
张三        | 2024-01-02 | 1500.00| 2500.00         | 5500.00         | 12200.00
李四        | 2024-01-01 | 2000.00| 4500.00         | 4500.00         | 12200.00
李四        | 2024-01-02 | 2500.00| 4500.00         | 4500.00         | 12200.00
王五        | 2024-01-01 | 1200.00| 3000.00         | 5500.00         | 12200.00
王五        | 2024-01-03 | 1800.00| 3000.00         | 5500.00         | 12200.00
赵六        | 2024-01-02 | 2200.00| 2200.00         | 2200.00         | 12200.00

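2. ORDER BY - Ordering Clause

Orders the rows within each partition, which affects ranking functions and cumulative calculations.

SELECT 
    salesperson,
    sale_date,
    amount,
    -- Row number by amount, descending (within the partition)
    ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY amount DESC) AS rn,
    -- Running total (ordered by date within the partition)
    SUM(amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS running_total,
    -- Moving average (the current row and the 2 rows before it)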
    AVG(amount) OVER (PARTITION BY salesperson ORDER BY sale_date 
         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3
FROM sales
ORDER BY salesperson, sale_date;
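To make the effect of ORDER BY concrete, the sketch below computes the same SUM twice: once without ORDER BY (the frame is the whole partition) and once with it (the default frame then ends at the current row, which yields a running total). The inline `demo_sales` data is hypothetical, and MySQL 8.0 or later is assumed.

```sql
WITH demo_sales AS (
    SELECT 'A' AS salesperson, DATE '2024-01-01' AS sale_date, 100 AS amount
    UNION ALL SELECT 'A', DATE '2024-01-02', 200
    UNION ALL SELECT 'A', DATE '2024-01-03', 300
    UNION ALL SELECT 'B', DATE '2024-01-01', 50
)
SELECT
    salesperson,
    sale_date,
    amount,
    -- No ORDER BY: every row sees the whole partition
    SUM(amount) OVER (PARTITION BY salesperson) AS partition_total,
    -- With ORDER BY: every row sees the rows up to and including itself
    SUM(amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS running_total
FROM demo_sales
ORDER BY salesperson, sale_date;
-- A | 2024-01-01 | 100 | 600 | 100
-- A | 2024-01-02 | 200 | 600 | 300
-- A | 2024-01-03 | 300 | 600 | 600
-- B | 2024-01-01 |  50 |  50 |  50
```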

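3. Aggregate Window Functions

Many tutorials divide the commonly used window functions into two groups: aggregate window functions and dedicated window functions. Aggregate window functions have the same names and the same behavior as the ordinary aggregate functions. In use, the difference is that they accept the window-specific clauses inside OVER(), which makes analyzing and retrieving data more convenient. The main ones are:

Function | Purpose
---------|------------------------------------------------
SUM      | Sums the values of the specified column
AVG      | Computes the average of the specified column
COUNT    | Counts rows or non-NULL values
MAX      | Finds the maximum value of the specified column
MIN      | Finds the minimum value of the specified column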

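4. Dedicated Window Functions

Common dedicated window functions:

Function     | Category              | Description
-------------|-----------------------|------------
RANK         | Ranking function      | Ranks the rows; tied rows share a rank and the ranks that follow are skipped, so the sequence has gaps (e.g. 1, 2, 2, 4)
DENSE_RANK   | Ranking function      | Ranks the rows; tied rows share a rank and the sequence has no gaps (e.g. 1, 2, 2, 3)
ROW_NUMBER   | Ranking function      | Assigns a unique, consecutive row number to every row within the partition (e.g. 1, 2, 3, 4)
PERCENT_RANK | Distribution function | Computed for each row as (rank - 1) / (rows - 1); the result is between 0 and 1
CUME_DIST    | Distribution function | Number of rows in the partition whose value is less than or equal to the current row's value, divided by the total number of rows in the partition; the result is greater than 0 and at most 1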

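5. ROWS BETWEEN - Window Frame Clause

Defines the range of rows over which the window function is computed.

-- Examples of different window frames
SELECT 
    salesperson,
    sale_date,
    amount,
    -- No ORDER BY and no frame clause: the frame is the whole partition
    SUM(amount) OVER (PARTITION BY salesperson) AS total_all,
    
    -- ROWS mode: the frame is counted in physical rows
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_rows,
    
    -- RANGE mode: the frame is defined by ORDER BY values
    -- (rows with the same value as the current row are included together)
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_range,
    
    -- Sliding window: the current row and the 2 rows before it
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS sum_last_3,
    
    -- One row before and one row after the current row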
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS sum_neighbors
FROM sales
ORDER BY salesperson, sale_date;
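In the sales data above, each salesperson has distinct dates, so cumulative_rows and cumulative_range come out the same. The difference between ROWS and RANGE only shows when the ORDER BY column has ties. The sketch below uses hypothetical inline data (`sale_day` and `amount` are made-up columns) with two rows on day 1, and assumes MySQL 8.0 or later.

```sql
WITH demo AS (
    SELECT 1 AS sale_day, 10 AS amount
    UNION ALL SELECT 1, 20
    UNION ALL SELECT 2, 30
)
SELECT
    sale_day,
    amount,
    -- ROWS: counts physical rows, so the two day-1 rows get different sums
    SUM(amount) OVER (
        ORDER BY sale_day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_rows,
    -- RANGE: both day-1 rows are peers, so each one sees 10 + 20
    SUM(amount) OVER (
        ORDER BY sale_day
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_range
FROM demo
ORDER BY sale_day, amount;
-- cumulative_range: 30 and 30 for the two day-1 rows, 60 for day 2.
-- cumulative_rows:  one day-1 row gets only its own amount and the other gets 30
--                   (which is which is not guaranteed for tied rows); day 2 gets 60.
```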

III. Window Function Categories in Detail

1. Ranking Functions

-- Create test data
CREATE TABLE employees (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50),
    department VARCHAR(50),
    salary DECIMAL(10, 2)
);

INSERT INTO employees (name, department, salary) VALUES
('张三', '技术部', 8000.00),
('李四', '技术部', 9000.00),
('王五', '技术部', 9500.00),
('赵六', '技术部', 9000.00),
('钱七', '销售部', 7000.00),
('孙八', '销售部', 8500.00),
('周九', '销售部', 8500.00),
('吴十', '销售部', 7500.00);

-- 1. ROW_NUMBER(): consecutive numbers with no repeats
SELECT 
    name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

-- 2. RANK(): ranking with gaps (equal values share a rank; the next rank jumps)
SELECT 
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_num
FROM employees;

-- 3. DENSE_RANK(): ranking without gaps (equal values share a rank; the next rank is consecutive)
SELECT 
    name,
    department,
    salary,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank_num
FROM employees;

-- 4. NTILE(n): splits the rows of each partition into n buckets
SELECT 
    name,
    department,
    salary,
    NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) AS quartile
FROM employees;

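Output comparison:

department | name | salary | ROW_NUMBER | RANK | DENSE_RANK | NTILE(4)
-----------|------|--------|------------|------|------------|---------
技术部 | 王五 | 9500   | 1          | 1    | 1          | 1
技术部 | 李四 | 9000   | 2          | 2    | 2          | 2
技术部 | 赵六 | 9000   | 3          | 2    | 2          | 3
技术部 | 张三 | 8000   | 4          | 4    | 3          | 4
销售部 | 孙八 | 8500   | 1          | 1    | 1          | 1
销售部 | 周九 | 8500   | 2          | 1    | 1          | 2
销售部 | 吴十 | 7500   | 3          | 3    | 2          | 3
销售部 | 钱七 | 7000   | 4          | 4    | 3          | 4

Each department has exactly 4 rows, so NTILE(4) puts one row in each bucket. Rows with equal salaries (李四 and 赵六, 孙八 and 周九) may receive their ROW_NUMBER and NTILE values in either order, because ORDER BY salary DESC does not break the tie.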
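A common use of these ranking functions is picking the top N rows per group. MySQL does not allow a window function in a WHERE clause, so the ranking is computed in an inner query and filtered outside it. The sketch below uses hypothetical inline data (`demo_emp` and the names in it are made up) and assumes MySQL 8.0 or later.

```sql
WITH demo_emp AS (
    SELECT 'Ann' AS name, 'Tech' AS department, 9500 AS salary
    UNION ALL SELECT 'Bob', 'Tech', 9000
    UNION ALL SELECT 'Cid', 'Tech', 8000
    UNION ALL SELECT 'Dee', 'Sales', 8500
    UNION ALL SELECT 'Eve', 'Sales', 7000
),
ranked AS (
    -- Number the rows of each department from highest to lowest salary
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM demo_emp
)
-- Keep the two highest-paid employees of each department
SELECT name, department, salary
FROM ranked
WHERE rn <= 2
ORDER BY department, salary DESC;
-- Dee | Sales | 8500
-- Eve | Sales | 7000
-- Ann | Tech  | 9500
-- Bob | Tech  | 9000
```

Swapping ROW_NUMBER() for RANK() or DENSE_RANK() would keep all tied rows instead of cutting ties arbitrarily.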

2. Distribution Functions

-- 5. PERCENT_RANK(): percentage rank, (rank - 1) / (total_rows - 1)
SELECT 
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary) AS rank_num,
    -- PERCENT_RANK is a reserved word in MySQL 8.0, so the alias uses a different name
    PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary) AS pct_rank
FROM employees;

-- 6. CUME_DIST(): cumulative distribution (rows with a value <= the current value / total rows)
SELECT 
    name,
    department,
    salary,
    -- CUME_DIST is also a reserved word, so the alias uses a different name
    CUME_DIST() OVER (PARTITION BY department ORDER BY salary) AS cume_dist_value
FROM employees;

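-- 7/8. PERCENTILE_CONT() and PERCENTILE_DISC() are not available in MySQL 8.0.
-- They exist in some other databases (for example Oracle and SQL Server).
-- In MySQL, a median can be computed with ROW_NUMBER() and COUNT(*) OVER ():
SELECT 
    department,
    AVG(salary) AS median_salary
FROM (
    SELECT 
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS rn,
        COUNT(*) OVER (PARTITION BY department) AS cnt
    FROM employees
) AS numbered
-- Middle row for an odd count, the two middle rows for an even count
WHERE rn IN (FLOOR((cnt + 1) / 2), FLOOR((cnt + 2) / 2))
GROUP BY department;
-- With the test data: 9000 for 技术部, 8000 for 销售部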
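The PERCENT_RANK and CUME_DIST formulas from the comments above can be checked by hand on a small case. The sketch below uses four hypothetical salaries in a single partition and assumes MySQL 8.0 or later; the aliases avoid the reserved words PERCENT_RANK and CUME_DIST.

```sql
WITH demo_emp AS (
    SELECT 8000 AS salary
    UNION ALL SELECT 9000
    UNION ALL SELECT 9000
    UNION ALL SELECT 9500
)
SELECT
    salary,
    RANK()         OVER (ORDER BY salary) AS rank_num,
    PERCENT_RANK() OVER (ORDER BY salary) AS pct_rank,
    CUME_DIST()    OVER (ORDER BY salary) AS cume_dist_value
FROM demo_emp
ORDER BY salary;
-- salary | rank_num | pct_rank                 | cume_dist_value
-- 8000   | 1        | (1-1)/(4-1) = 0          | 1/4 = 0.25
-- 9000   | 2        | (2-1)/(4-1) = about 0.33 | 3/4 = 0.75
-- 9000   | 2        | (2-1)/(4-1) = about 0.33 | 3/4 = 0.75
-- 9500   | 4        | (4-1)/(4-1) = 1          | 4/4 = 1
```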

3. Value Functions

-- 9. LAG(column, n, default): value from the row n rows before the current row
SELECT 
    name,
    department,
    salary,
    LAG(salary, 1, 0) OVER (PARTITION BY department ORDER BY salary) AS prev_salary,
    salary - LAG(salary, 1, 0) OVER (PARTITION BY department ORDER BY salary) AS salary_diff
FROM employees;

-- 10. LEAD(column, n, default): value from the row n rows after the current row
SELECT 
    salesperson,
    region,
    sale_date,
    amount,
    LEAD(amount, 1, 0) OVER (PARTITION BY salesperson ORDER BY sale_date) AS next_amount,
    LEAD(sale_date, 1, NULL) OVER (PARTITION BY salesperson ORDER BY sale_date) AS next_date
FROM sales;

-- 11. FIRST_VALUE(column): first value in the window frame
SELECT 
    name,
    department,
    salary,
    FIRST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS lowest_salary,
    salary - FIRST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS diff_from_lowest
FROM employees;

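-- 12. LAST_VALUE(column): last value in the window frame (watch out for the default frame!)
SELECT 
    name,
    department,
    salary,
    -- Pitfall: with ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
    -- so the "last" row of the frame is the current row (or its last peer)
    LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS wrong_last_value,
    
    -- Correct: specify a frame that covers the whole partition
    LAST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS correct_last_value,
    
    -- NTH_VALUE(column, n) returns the value from the n-th row of the frame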
    NTH_VALUE(salary, 1) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_salary,
    
    NTH_VALUE(salary, 2) OVER (
        PARTITION BY department 
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS second_salary
FROM employees;
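The LAST_VALUE pitfall is easiest to see on a tiny case. The sketch below uses three hypothetical salaries in a single partition and assumes MySQL 8.0 or later: with the default frame, the "last" value is just the current row's own salary.

```sql
WITH demo_emp AS (
    SELECT 7000 AS salary
    UNION ALL SELECT 8000
    UNION ALL SELECT 9000
)
SELECT
    salary,
    -- Default frame ends at the current row
    LAST_VALUE(salary) OVER (ORDER BY salary) AS default_frame_last,
    -- Explicit frame covers the whole partition
    LAST_VALUE(salary) OVER (
        ORDER BY salary
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS full_frame_last
FROM demo_emp
ORDER BY salary;
-- 7000 | 7000 | 9000
-- 8000 | 8000 | 9000
-- 9000 | 9000 | 9000
```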

4. Aggregate Functions as Window Functions

-- Most aggregate functions can be used as window functions
-- (GROUP_CONCAT is one that cannot in MySQL 8.0)
SELECT 
    salesperson,
    region,
    sale_date,
    amount,
    -- Basic aggregates
    COUNT(*) OVER (PARTITION BY salesperson) AS total_transactions,
    SUM(amount) OVER (PARTITION BY salesperson) AS total_amount,
    AVG(amount) OVER (PARTITION BY salesperson) AS avg_amount,
    MAX(amount) OVER (PARTITION BY salesperson) AS max_amount,
    MIN(amount) OVER (PARTITION BY salesperson) AS min_amount,
    
    -- Standard deviation and variance (MySQL 8.0+)
    STDDEV(amount) OVER (PARTITION BY salesperson) AS std_amount,
    VARIANCE(amount) OVER (PARTITION BY salesperson) AS var_amount,
    
    -- Running total
    SUM(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    
    -- Moving average
    AVG(amount) OVER (
        PARTITION BY salesperson 
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3,
    
    -- Each sale as a percentage of the salesperson's total
    amount * 100.0 / SUM(amount) OVER (PARTITION BY salesperson) AS percentage
FROM sales
ORDER BY salesperson, sale_date;
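The query above repeats `PARTITION BY salesperson` many times. One way to cut the repetition, sketched here on hypothetical inline data, is the named WINDOW clause that MySQL 8.0 supports: the window is defined once and referenced with `OVER w` (the name `w` is arbitrary).

```sql
WITH demo_sales AS (
    SELECT 'A' AS salesperson, 100 AS amount
    UNION ALL SELECT 'A', 300
    UNION ALL SELECT 'B', 200
)
SELECT
    salesperson,
    amount,
    COUNT(*)    OVER w AS total_transactions,
    SUM(amount) OVER w AS total_amount,
    MAX(amount) OVER w AS max_amount
FROM demo_sales
WINDOW w AS (PARTITION BY salesperson)  -- defined once, reused by every OVER w
ORDER BY salesperson, amount;
-- A | 100 | 2 | 400 | 300
-- A | 300 | 2 | 400 | 300
-- B | 200 | 1 | 200 | 200
```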