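SQL Window Functions

Window Functions: A Summary

I. Window function concepts

In a relational database management system (RDBMS), window functions are a special kind of SQL function: they compute a value over a set of rows related to the current row (its "window") without changing the number of rows returned.

A window function keeps every original row and attaches an additional analytic value to it; it does not "collapse" rows. This is the key difference from ordinary aggregation: used with GROUP BY, aggregate functions such as SUM() and AVG() reduce each group to a single row, whereas a window function returns one result per input row.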
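
The contrast between collapsing and non-collapsing behavior can be seen on a tiny data set. The following is a minimal sketch that runs both forms through Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window function support); the `sales` table and its three rows are made up for illustration.

```python
import sqlite3

# Hypothetical demo table; names and values are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, department TEXT, sales_amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "A", 100), (2, "A", 200), (3, "B", 50)],
)

# GROUP BY collapses each department into a single row.
grouped = conn.execute(
    "SELECT department, SUM(sales_amount) FROM sales "
    "GROUP BY department ORDER BY department"
).fetchall()

# The window version keeps all three rows and attaches the department total to each.
windowed = conn.execute(
    "SELECT id, department, sales_amount, "
    "SUM(sales_amount) OVER (PARTITION BY department) AS dept_total "
    "FROM sales ORDER BY id"
).fetchall()

print(grouped)   # [('A', 300), ('B', 50)]
print(windowed)  # [(1, 'A', 100, 300), (2, 'A', 200, 300), (3, 'B', 50, 50)]
```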

II. Basic syntax

Syntax structure

sql
SELECT
    column1,
    column2,
    window_function(column3) OVER (
        PARTITION BY column4
        ORDER BY column5
        ROWS/RANGE BETWEEN start AND end
    ) AS result
FROM
    table_name;

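Syntax components:

  • window_function: the function to apply (ranking, running totals, and so on); common choices include RANK(), ROW_NUMBER(), SUM(), and AVG()
  • OVER: the keyword that marks the call as a window function and introduces the window definition
  • PARTITION BY: optional; splits the rows into partitions by the given columns, and each partition is processed independently. It is similar to GROUP BY, but does not collapse each group into one row
  • ORDER BY: optional; defines the order of rows within each partition, which ranking, offset, and running calculations depend on
  • ROWS/RANGE BETWEEN: optional; the frame clause, which limits the rows within the partition that the function is applied to

Frame boundary keywords:

Keyword | Meaning
UNBOUNDED PRECEDING | The frame starts at the first row of the partition
UNBOUNDED FOLLOWING | The frame ends at the last row of the partition
CURRENT ROW | The current row (with RANGE, the current row and its peers)
n PRECEDING | n rows before the current row (with RANGE, the current value minus n)
n FOLLOWING | n rows after the current row (with RANGE, the current value plus n)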

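Examples (row counts are maximums; a frame is cut short at the edges of the partition):

Current row and the 5 rows before it: ROWS BETWEEN 5 PRECEDING AND CURRENT ROW -- 6 rows

Current row and the 5 rows after it: ROWS BETWEEN CURRENT ROW AND 5 FOLLOWING -- 6 rows

The 5 rows before through the 5 rows after: ROWS BETWEEN 5 PRECEDING AND 5 FOLLOWING -- 11 rows

Current row and the 6 rows before it: ROWS 6 PRECEDING (shorthand for BETWEEN 6 PRECEDING AND CURRENT ROW) -- 7 rows

The current day and the 6 days before it: RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW -- 7 days (MySQL interval syntax)

The current day and the 6 days before it: RANGE INTERVAL 6 DAY PRECEDING (shorthand for BETWEEN ... AND CURRENT ROW) -- 7 days

Rows whose ORDER BY value lies between the current value - 100 and the current value + 200: RANGE BETWEEN 100 PRECEDING AND 200 FOLLOWING -- a span of 301 integer values; the number of rows depends on the data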
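
The difference between counting rows (ROWS) and comparing values (RANGE) is easiest to see when the ORDER BY column has ties and gaps. The sketch below is an illustration only: it uses Python's sqlite3 module rather than MySQL (SQLite 3.28 or later is assumed for RANGE with a numeric offset), and the table `t` with values 1, 2, 2, 4 is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")  # hypothetical one-column table
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,), (4,)])

frames = conn.execute(
    """
    SELECT v,
           -- ROWS: the current row plus the one physical row before it
           SUM(v) OVER (ORDER BY v ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS rows_sum,
           -- RANGE: every row whose v lies in [current v - 1, current v]
           SUM(v) OVER (ORDER BY v RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS range_sum
    FROM t
    ORDER BY v, rows_sum
    """
).fetchall()

for row in frames:
    print(row)
# (1, 1, 1)
# (2, 3, 5)   RANGE sees 1, 2 and 2
# (2, 4, 5)
# (4, 6, 4)   RANGE sees only 4, because no row has v = 3
```

With ROWS the frame always holds at most two rows here; with RANGE its size depends on how many rows fall inside the value interval.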

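III. Categories of window functions

Function groups

Aggregate functions: SUM(), AVG(), COUNT(), MAX(), MIN()

Analytic functions: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()

Ranking functions: ROW_NUMBER(), RANK(), DENSE_RANK()

3.1 Aggregate functions

  • SUM, COUNT, MAX, MIN, and AVG can all be used as window functions; MEDIAN is available only in some databases (MySQL, for example, does not provide it)

Example:
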
sql
-- Monthly sales amount and the running total up to each month
SELECT 
    MONTH(pay_time) AS pay_month, 
    SUM(amount) AS amount,
    SUM(SUM(amount)) OVER (
        ORDER BY MONTH(pay_time) ASC 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS amount_total 
FROM order_detail
GROUP BY pay_month
ORDER BY pay_month;
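
The nested SUM(SUM(amount)) works because grouping happens first and the window function then runs over the grouped rows. A minimal sketch of the same two steps, written with an explicit CTE and run through Python's sqlite3 module, follows. It assumes a simplified `order_detail` table that already stores an integer month column (SQLite has no MONTH() function), and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of order_detail with a precomputed month.
conn.execute("CREATE TABLE order_detail (pay_month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO order_detail VALUES (?, ?)",
    [(1, 100), (1, 50), (2, 30), (3, 20)],
)

running = conn.execute(
    """
    WITH monthly AS (            -- step 1: one row per month
        SELECT pay_month, SUM(amount) AS amount
        FROM order_detail
        GROUP BY pay_month
    )
    SELECT pay_month,
           amount,
           SUM(amount) OVER (    -- step 2: running total over the monthly rows
               ORDER BY pay_month
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS amount_total
    FROM monthly
    ORDER BY pay_month
    """
).fetchall()

print(running)  # [(1, 150, 150), (2, 30, 180), (3, 20, 200)]
```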

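3.2 Ranking functions

  • ROW_NUMBER(): assigns each row a unique sequential number
  • RANK(): tied rows share a rank, and gaps are left in the ranks that follow
  • DENSE_RANK(): tied rows share a rank, and no gaps are left
  • CUME_DIST(): cumulative distribution, i.e. (number of rows ordered at or before the current row, ties included) / (total rows in the partition)
  • PERCENT_RANK(): the relative rank of the current row, (rank - 1) / (rows in the partition - 1)
  • NTILE(n): splits the ordered rows of a partition into n buckets of as equal size as possible

Example:

sql
-- Number the rows within each department by sales amount, highest first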
SELECT 
    id, department, sales_amount,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sales_amount DESC) AS rank_num
FROM sales;

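-- Running total of sales within each department (with the default frame, rows with equal sales_amount share the same running total)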
SELECT 
    department, sales_amount,
    SUM(sales_amount) OVER (PARTITION BY department ORDER BY sales_amount DESC) AS cumulative_sales
FROM sales;
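
CUME_DIST and NTILE from the list above can be tried on five scores. This is a small sketch using Python's sqlite3 module (SQLite 3.25 or later assumed); the `scores` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 90), ("B", 85), ("C", 85), ("D", 80), ("E", 75)],
)

raw = conn.execute(
    """
    SELECT name,
           CUME_DIST() OVER (ORDER BY score DESC)       AS cume,   -- ties count together
           NTILE(2)    OVER (ORDER BY score DESC, name) AS bucket  -- name breaks ties
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

# Round the fractions for display.
dist = [(name, round(cume, 2), bucket) for name, cume, bucket in raw]
print(dist)
# [('A', 0.2, 1), ('B', 0.6, 1), ('C', 0.6, 1), ('D', 0.8, 2), ('E', 1.0, 2)]
```

B and C are tied, so both get 3/5 = 0.6. NTILE(2) puts three of the five rows in the first bucket because the larger buckets come first when the rows do not divide evenly.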

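[Function] ROW_NUMBER assigns each row a unique sequence number that starts at 1 within each partition and increases by 1; rows with equal values still receive different numbers.

ROW_NUMBER() OVER (PARTITION BY partition_columns ORDER BY order_columns)

[Notes]

  • Unique: every row in a partition gets its own number
  • Consecutive: numbering starts at 1 and increases without gaps
  • No ties: equal values still get different numbers, and the order among them is arbitrary unless ORDER BY includes a tiebreaker column
Application examples
sql
-- Rank employees by salary within each department (unique numbers)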
SELECT 
    department,
    employee_name,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_rank
FROM employees;

-- Deduplicate: return only the first row (lowest id) for each email; this query selects rows and deletes nothing
SELECT * FROM (
    SELECT 
        id, name, email,
        ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM users
) t WHERE rn = 1;

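[Function] RANK assigns ranks according to the sort order; rows with equal values share a rank, and the following rank skips ahead by the number of tied rows.

RANK() OVER (PARTITION BY partition_columns ORDER BY order_columns)

[Notes]

  • Ties: equal values receive the same rank
  • Gaps: the rank after a tie skips ahead
  • Non-consecutive: the rank sequence can therefore have holes
Application example
sql
-- Rank students by score (ties cause later ranks to be skipped)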
SELECT 
    student_name,
    score,
    RANK() OVER (ORDER BY score DESC) AS rank_score
FROM students;

-- Sample result:
-- Zhang San 95 rank=1
-- Li Si     90 rank=2
-- Wang Wu   90 rank=2  (tied for second)
-- Zhao Liu  85 rank=4  (skips to fourth)

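[Function] DENSE_RANK is similar to RANK, but after a tie the following rank does not skip, so the ranks stay consecutive.

DENSE_RANK() OVER (PARTITION BY partition_columns ORDER BY order_columns)

[Notes]

  • Ties: equal values receive the same rank
  • No gaps: the rank after a tie is the next integer
  • Dense: the rank values form a consecutive sequence
Application example
sql
-- Rank students by score (ranks continue without a gap after a tie)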
SELECT 
    student_name,
    score,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank_score
FROM students;

-- Sample result:
-- Zhang San 95 rank=1
-- Li Si     90 rank=2
-- Wang Wu   90 rank=2  (tied for second)
-- Zhao Liu  85 rank=3  (continues with third)

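[Function] PERCENT_RANK computes the relative position of the current row within its partition and returns a value from 0 to 1.

PERCENT_RANK() OVER (PARTITION BY partition_columns ORDER BY order_columns)

Formula: PERCENT_RANK = (RANK of the current row - 1) / (rows in the partition - 1)

[Notes]

  • Range: the result is always between 0 and 1
  • Normalized: ranks become comparable across partitions of different sizes
  • Endpoints: the first row is always 0 (including the only row of a single-row partition); the last row is 1 only when it is not tied with the rows before it
Application examples
sql
-- Percentile rank of each employee's salary across the company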
SELECT 
    employee_name,
    salary,
    PERCENT_RANK() OVER (ORDER BY salary) AS salary_percentile
FROM employees;

-- Percentile rank of each employee's salary within the department
SELECT 
    department,
    employee_name,
    salary,
    PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary) AS dept_salary_percentile
FROM employees;
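
The PERCENT_RANK formula, (rank - 1) / (rows in the partition - 1), can be checked by hand on two small partitions, one with a tie at the bottom and one with a single row. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later assumed); the `results` table and its values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (grp TEXT, score INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("x", 90), ("x", 80), ("x", 80), ("y", 70)],
)

pr = conn.execute(
    """
    SELECT grp, score,
           PERCENT_RANK() OVER (PARTITION BY grp ORDER BY score DESC) AS pr
    FROM results
    ORDER BY grp, score DESC
    """
).fetchall()

print(pr)
# [('x', 90, 0.0), ('x', 80, 0.5), ('x', 80, 0.5), ('y', 70, 0.0)]
```

In partition x the two tied rows both have rank 2, so each gets (2 - 1) / (3 - 1) = 0.5 and no row reaches 1. The single row of partition y gets 0.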

Function comparison example

sql
-- Assume the following data:
-- name    score
-- A       90
-- B       85
-- C       85
-- D       80
-- E       75

SELECT 
    name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
    RANK() OVER (ORDER BY score DESC) AS rank_val,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank_val,
    PERCENT_RANK() OVER (ORDER BY score DESC) AS percent_rank_val
FROM scores;

-- Result (B and C are tied, so which of them gets row_num 2 or 3 is arbitrary):
-- name | score | row_num | rank_val | dense_rank_val | percent_rank_val
-- A    | 90    | 1       | 1        | 1              | 0.0
-- B    | 85    | 2       | 2        | 2              | 0.25
-- C    | 85    | 3       | 2        | 2              | 0.25
-- D    | 80    | 4       | 4        | 3              | 0.75
-- E    | 75    | 5       | 5        | 4              | 1.0

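Ranking functions compared

Function | Handling of ties | Rank sequence | Example (scores 90, 85, 85, 80)
ROW_NUMBER() | Different numbers | Consecutive | 1, 2, 3, 4
RANK() | Same rank, then a gap | Non-consecutive | 1, 2, 2, 4
DENSE_RANK() | Same rank, no gap | Consecutive | 1, 2, 2, 3

1. Top-N queries
sql
-- Top 3 earners in each department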
SELECT * FROM (
    SELECT 
        department,
        employee_name,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) t WHERE rn <= 3;
2. Quantile analysis
sql
-- Assign each salary to a quartile bucket by its percent rank
SELECT 
    employee_name,
    salary,
    CASE 
        WHEN PERCENT_RANK() OVER (ORDER BY salary) <= 0.25 THEN 'Q1'
        WHEN PERCENT_RANK() OVER (ORDER BY salary) <= 0.50 THEN 'Q2'
        WHEN PERCENT_RANK() OVER (ORDER BY salary) <= 0.75 THEN 'Q3'
        ELSE 'Q4'
    END AS quartile
FROM employees;
3. Tied awards
sql
-- Competition ranking that accounts for ties
SELECT 
    athlete_name,
    score,
    RANK() OVER (ORDER BY score DESC) AS overall_rank,
    CASE 
        WHEN RANK() OVER (ORDER BY score DESC) <= 3 THEN 'Podium'
        ELSE 'Other'
    END AS award_status
FROM competition_results;
4. Sales performance ranking
sql
-- Dense ranking by sales amount within each department
SELECT 
    salesperson,
    department,
    sales_amount,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY sales_amount DESC) AS performance_rank
FROM sales_performance;
5. Product rating ranking
sql
-- Rank products by average rating within each category
SELECT 
    product_name,
    category,
    average_rating,
    DENSE_RANK() OVER (PARTITION BY category ORDER BY average_rating DESC) AS rating_rank
FROM product_reviews;
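6. Award selection
sql
-- Pick the top N award ranks, allowing ties.
-- A window function cannot appear in WHERE, so rank in a subquery and filter outside.
SELECT employee_name, score, award_rank
FROM (
    SELECT 
        employee_name,
        score,
        DENSE_RANK() OVER (ORDER BY score DESC) AS award_rank
    FROM contest_results
) AS ranked
WHERE award_rank <= 5;  -- top 5 ranks win awards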
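
Window functions are evaluated after WHERE, so a filter on a rank has to be applied one query level further out. The following sketch illustrates this with Python's sqlite3 module (SQLite 3.25 or later assumed) on an invented `contest_results` table; the error behavior shown is SQLite's, and other databases reject the same query with their own messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contest_results (employee_name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO contest_results VALUES (?, ?)",
    [("a", 90), ("b", 90), ("c", 80), ("d", 70)],  # made-up rows
)

# Filtering on the window function directly in WHERE is rejected.
try:
    conn.execute(
        "SELECT employee_name FROM contest_results "
        "WHERE DENSE_RANK() OVER (ORDER BY score DESC) <= 2"
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Computing the rank in a subquery and filtering outside works.
winners = conn.execute(
    """
    SELECT employee_name, score, award_rank
    FROM (
        SELECT employee_name, score,
               DENSE_RANK() OVER (ORDER BY score DESC) AS award_rank
        FROM contest_results
    ) AS ranked
    WHERE award_rank <= 2
    ORDER BY award_rank, employee_name
    """
).fetchall()

print(rejected)  # True
print(winners)   # [('a', 90, 1), ('b', 90, 1), ('c', 80, 2)]
```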
7. Performance review
sql
-- Classify employees into performance levels
SELECT 
    employee_name,
    performance_score,
    CASE 
        WHEN DENSE_RANK() OVER (ORDER BY performance_score DESC) <= 10 THEN 'Top Performer'
        WHEN DENSE_RANK() OVER (ORDER BY performance_score DESC) <= 30 THEN 'High Performer'
        WHEN DENSE_RANK() OVER (ORDER BY performance_score DESC) <= 60 THEN 'Average Performer'
        ELSE 'Needs Improvement'
    END AS performance_level
FROM employee_performance;
8. Leaderboards
sql
-- Game leaderboard: players with the same score share a position
SELECT 
    player_name,
    score,
    DENSE_RANK() OVER (ORDER BY score DESC) AS leaderboard_position
FROM game_scores;

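Choosing a function

  • ROW_NUMBER: a unique number per row is required and ties are not allowed
  • RANK: ties are allowed and gaps in the ranks are acceptable
  • DENSE_RANK: ties are allowed and the ranks must stay consecutive
  • PERCENT_RANK: a normalized position between 0 and 1 is needed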

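3.3 Analytic functions

  • FIRST_VALUE(column): the value of column in the first row of the window frame, after ordering within the partition
  • LAST_VALUE(column): the value of column in the last row of the window frame. With ORDER BY and no explicit frame, the frame ends at the current row (and its peers), so LAST_VALUE returns the current row's value; add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the last value of the whole partition
  • LEAD(column, offset, default_value): offset function; returns the value of column from the row offset rows after the current row
  • LAG(column, offset, default_value): offset function; returns the value of column from the row offset rows before the current row
    Note: FIRST_VALUE/LAST_VALUE look at an absolute position (the first or last row of the frame), while LEAD/LAG look at a position relative to the current row (n rows before or after)

Example:

sql
-- For each user and product, fetch the time of the previous purchase (the basis for the gap between consecutive purchases of the same product)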
SELECT 
    user_id, product, amount, pay_time,
    LAG(pay_time) OVER (PARTITION BY user_id, product ORDER BY pay_time) last_pay_time
FROM order_detail
ORDER BY user_id, product
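
FIRST_VALUE and LAST_VALUE, described at the start of this section, depend on the window frame, which is a common source of surprise with LAST_VALUE. A minimal sketch using Python's sqlite3 module (SQLite 3.25 or later assumed) on a made-up three-row table `t`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, v INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

values = conn.execute(
    """
    SELECT id,
           FIRST_VALUE(v) OVER (ORDER BY id) AS first_v,
           -- default frame ends at the current row, so this is just v
           LAST_VALUE(v)  OVER (ORDER BY id) AS last_default,
           -- explicit full-partition frame gives the true last value
           LAST_VALUE(v)  OVER (
               ORDER BY id
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_full
    FROM t
    ORDER BY id
    """
).fetchall()

print(values)  # [(1, 10, 10, 30), (2, 10, 20, 30), (3, 10, 30, 30)]
```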

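[Function] LEAD reads data from the nth row after the current row; it suits look-ahead use such as forecasting and comparison with following values.

LEAD(column, offset, default_value) OVER (PARTITION BY partition_columns ORDER BY order_columns)

[Parameters]

column: the column (or expression) to read

offset: how many rows ahead to look; defaults to 1 (the next row)

default_value: optional; the value returned when the offset falls outside the partition (NULL if omitted)

[Function] LAG reads data from the nth row before the current row; it suits comparison with history and change analysis.

LAG(column, offset, default_value) OVER (PARTITION BY partition_columns ORDER BY order_columns)

[Parameters]

column: the column (or expression) to read

offset: how many rows back to look; defaults to 1 (the previous row)

default_value: optional; the value returned when the offset falls outside the partition (NULL if omitted)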
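
The three parameters can be seen side by side on four rows. This is a small sketch using Python's sqlite3 module (SQLite 3.25 or later assumed); the `readings` table, its columns, and the default values 0 and -1 are arbitrary choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day INTEGER, value INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

offsets = conn.execute(
    """
    SELECT day, value,
           LAG(value)          OVER (ORDER BY day) AS prev_1,      -- offset 1, NULL at the edge
           LAG(value, 2, 0)    OVER (ORDER BY day) AS prev_2_or_0, -- offset 2, default 0
           LEAD(value, 1, -1)  OVER (ORDER BY day) AS next_or_neg  -- offset 1, default -1
    FROM readings
    ORDER BY day
    """
).fetchall()

for row in offsets:
    print(row)
# (1, 10, None, 0, 20)
# (2, 20, 10, 0, 30)
# (3, 30, 20, 10, 40)
# (4, 40, 30, 20, -1)
```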

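[Performance tips]

An index on the PARTITION BY and ORDER BY columns may let some databases avoid a sort; whether it helps depends on the engine and the query plan, so check with EXPLAIN

Choose PARTITION BY columns deliberately and avoid partitioning more finely than the analysis needs

Keep offsets modest; very large offsets may affect query performance

1. Time series analysis
sql
-- Analyze each user's consecutive purchases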
SELECT 
    user_id,
    purchase_date,
    amount,
    LAG(purchase_date) OVER (PARTITION BY user_id ORDER BY purchase_date) AS last_purchase_date,
    LEAD(purchase_date) OVER (PARTITION BY user_id ORDER BY purchase_date) AS next_purchase_date,
    DATEDIFF(purchase_date, LAG(purchase_date) OVER (PARTITION BY user_id ORDER BY purchase_date)) AS days_since_last
FROM purchases;
2. Change analysis
sql
-- Month-over-month sales growth
SELECT 
    month,
    sales_amount,
    LAG(sales_amount) OVER (ORDER BY month) AS prev_month_sales,
    sales_amount - LAG(sales_amount) OVER (ORDER BY month) AS difference,
    (sales_amount - LAG(sales_amount) OVER (ORDER BY month)) / LAG(sales_amount) OVER (ORDER BY month) * 100 AS growth_rate
FROM monthly_sales;
3. Inventory turnover analysis
sql
-- Analyze day-to-day inventory changes
SELECT 
    date,
    inventory_count,
    LEAD(inventory_count) OVER (ORDER BY date) AS next_day_inventory,
    inventory_count - LEAD(inventory_count) OVER (ORDER BY date) AS inventory_change
FROM inventory_log;
4. Price movement analysis
sql
-- Stock price analysis
SELECT 
    trade_date,
    closing_price,
    LAG(closing_price, 1) OVER (ORDER BY trade_date) AS prev_close,
    LAG(closing_price, 5) OVER (ORDER BY trade_date) AS prev_week_close,
    LEAD(closing_price, 1) OVER (ORDER BY trade_date) AS next_close,
    (closing_price - LAG(closing_price, 1) OVER (ORDER BY trade_date)) / LAG(closing_price, 1) OVER (ORDER BY trade_date) * 100 AS daily_return
FROM stock_prices;
5. Default values
sql
-- Use 0 as the default when there is no previous row
SELECT 
    date,
    value,
    LAG(value, 1, 0) OVER (ORDER BY date) AS prev_value_with_default
FROM data_table;
6. Larger offsets
sql
-- Year-over-year comparison (same month last year); assumes one row per month with no missing months
SELECT 
    year,
    month,
    revenue,
    LAG(revenue, 12) OVER (ORDER BY year, month) AS last_year_revenue,
    CASE 
        WHEN LAG(revenue, 12) OVER (ORDER BY year, month) IS NOT NULL
        THEN (revenue - LAG(revenue, 12) OVER (ORDER BY year, month)) / LAG(revenue, 12) OVER (ORDER BY year, month) * 100
        ELSE NULL 
    END AS yoy_growth_rate
FROM monthly_revenue;
7. Combining with partitions
sql
-- Analyze sales changes within each product category
SELECT 
    product_category,
    sale_date,
    sales,
    LAG(sales) OVER (PARTITION BY product_category ORDER BY sale_date) AS prev_sales,
    LEAD(sales) OVER (PARTITION BY product_category ORDER BY sale_date) AS next_sales
FROM sales_data;

3.4 Ratio analysis

Used to compute a row's share of its group's total

Example:

sql
-- Each row's sales as a percentage of its department's total sales
SELECT 
    department, sales_amount,
    ROUND(
        sales_amount * 1.0 / SUM(sales_amount) OVER (PARTITION BY department) * 100, 2
    ) AS percentage
FROM sales;

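3.5 Sliding windows

Sliding calculations use a frame defined with ROWS (a count of physical rows relative to the current row) or RANGE (a range of ORDER BY values relative to the current value)

Example:

sql
-- For each row, sum the current row and the two rows before it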
SELECT 
    id, department, sales_amount,
    SUM(sales_amount) OVER (
        PARTITION BY department 
        ORDER BY sales_amount 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS sliding_sum
FROM sales;

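IV. Practical use cases

1. Ranking analysis

Sales ranking
sql
-- Rank by sales amount three ways: RANK (ties share a rank, then a gap), DENSE_RANK (ties share a rank, no gap), ROW_NUMBER (always unique)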
SELECT 
    employee_name,
    sales_amount,
    RANK() OVER (ORDER BY sales_amount DESC) AS rank_by_sales,
    DENSE_RANK() OVER (ORDER BY sales_amount DESC) AS dense_rank_by_sales,
    ROW_NUMBER() OVER (ORDER BY sales_amount DESC) AS row_number_by_sales
FROM sales_records;
Student score ranking
sql
-- Rank students by score within each class
SELECT 
    class,
    student_name,
    score,
    RANK() OVER (PARTITION BY class ORDER BY score DESC) AS class_rank
FROM student_scores;

2. Running totals

Monthly cumulative sales
sql
-- Cumulative sales by month
SELECT 
    sale_month,
    monthly_sales,
    SUM(monthly_sales) OVER (
        ORDER BY sale_month 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_sales
FROM monthly_sales_data;
Cumulative user count
sql
-- Cumulative registered users by day
SELECT 
    register_date,
    new_users,
    SUM(new_users) OVER (
        ORDER BY register_date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS total_users
FROM daily_registrations;

3. Comparative analysis

Comparison with the previous period
sql
-- Difference from the previous month's sales
SELECT 
    month,
    sales,
    LAG(sales, 1) OVER (ORDER BY month) AS prev_month_sales,
    sales - LAG(sales, 1) OVER (ORDER BY month) AS diff_from_prev
FROM monthly_sales;
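Year-over-year and period-over-period analysis
sql
-- Year-over-year growth rate (assumes one row per month with no missing months)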
SELECT 
    year,
    month,
    revenue,
    LAG(revenue, 12) OVER (ORDER BY year, month) AS last_year_revenue,
    CASE 
        WHEN LAG(revenue, 12) OVER (ORDER BY year, month) > 0 
        THEN (revenue - LAG(revenue, 12) OVER (ORDER BY year, month)) / LAG(revenue, 12) OVER (ORDER BY year, month) * 100
        ELSE NULL 
    END AS yoy_growth_rate
FROM monthly_revenue;

4. Group statistics

Statistics across different dimensions
sql
-- Sales statistics by region and product category
SELECT 
    region,
    product_category,
    salesperson,
    sales_amount,
    AVG(sales_amount) OVER (PARTITION BY region) AS avg_sales_by_region,
    AVG(sales_amount) OVER (PARTITION BY product_category) AS avg_sales_by_category,
    COUNT(*) OVER (PARTITION BY region, product_category) AS count_in_group
FROM sales_data;
Multi-level analysis
sql
-- Multi-level grouping: country -> city -> employee
SELECT 
    country,
    city,
    employee_name,
    sales_amount,
    -- Sales rank within each country
    ROW_NUMBER() OVER (PARTITION BY country ORDER BY sales_amount DESC) AS country_rank,
    -- Sales rank within each city
    ROW_NUMBER() OVER (PARTITION BY country, city ORDER BY sales_amount DESC) AS city_rank,
    -- Global rank
    ROW_NUMBER() OVER (ORDER BY sales_amount DESC) AS global_rank
FROM employee_sales;

Combined example: e-commerce analysis

sql
-- Combined analysis of user purchasing behavior
SELECT 
    user_id,
    order_date,
    order_amount,
    -- Cumulative spending per user (orders on the same order_date are peers under the default frame and share one running total)
    SUM(order_amount) OVER (PARTITION BY user_id ORDER BY order_date) AS cumulative_user_spending,
    -- Sequence number of the order for the user
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS order_sequence,
    -- Amount of the previous order
    LAG(order_amount) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_order_amount,
    -- Difference from the previous order amount
    order_amount - LAG(order_amount) OVER (PARTITION BY user_id ORDER BY order_date) AS amount_diff,
    -- Running order amount within the calendar month, across all users
    SUM(order_amount) OVER (PARTITION BY YEAR(order_date), MONTH(order_date) ORDER BY order_date) AS monthly_cumulative
FROM orders
ORDER BY user_id, order_date;

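V. Usage notes

  1. PARTITION BY and ORDER BY are both optional
    • Without PARTITION BY, the whole result set is treated as a single partition
    • Without ORDER BY, the rows in a partition have no defined order and the default frame is the entire partition, so aggregates return the partition total and order-dependent functions (ROW_NUMBER, LAG, and so on) give non-deterministic results
  2. A frame clause (ROWS/RANGE) should be paired with ORDER BY: RANGE with a numeric or interval offset requires it, and ROWS without it counts rows in an undefined order
  3. Performance: window functions add sorting and buffering cost, especially with complex PARTITION BY and ORDER BY combinations. One option is to materialize the base query into a temporary table first and apply the window functions to that result
  4. MySQL supports window functions from version 8.0, but its aggregate window functions do not accept DISTINCT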
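
The effect of ORDER BY on the default frame, mentioned in note 1, can be checked with one query. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later assumed) and an invented table `t` in which two rows share the same amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, amt INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])

totals = conn.execute(
    """
    SELECT id,
           -- no ORDER BY: the frame is the whole partition
           SUM(amt) OVER () AS total,
           -- ORDER BY with the default frame: tied rows are peers and share a value
           SUM(amt) OVER (ORDER BY amt) AS running_default,
           -- explicit ROWS frame with a tiebreaker: one step per row
           SUM(amt) OVER (
               ORDER BY amt, id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_rows
    FROM t
    ORDER BY id
    """
).fetchall()

print(totals)  # [(1, 40, 20, 10), (2, 40, 20, 20), (3, 40, 40, 40)]
```

Rows 1 and 2 have the same amt, so the default frame gives both the running total 20, while the explicit ROWS frame advances row by row.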