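[Database] [Oracle] Analytic Functions and Window Functions

The Complete Guide to Oracle Analytic and Window Functions

Analytic functions are among the most powerful SQL features in Oracle. They compute values over the current row and the rows related to it without a self-join or subquery, which greatly simplifies complex data analysis tasks.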


1. Core Concepts and Value

1.1 Window Functions vs. Aggregate Functions

-- Aggregate function: collapses many rows into one
SELECT department_id, AVG(salary) 
FROM employees 
GROUP BY department_id;
-- Result: one row per department, holding the department average

-- Window function: keeps every row and adds a computed column
SELECT employee_id, department_id, salary,
       AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
FROM employees;
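-- Result: one row per employee, each carrying its department average

Key differences

  • Aggregate functions: reduce the row count, usually together with GROUP BY
  • Window functions: preserve the row count and define the window with an OVER() clause
  • Performance: the result can often be computed in a single pass over the data, avoiding a self-join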

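1.2 The Three Core Clauses

FUNCTION() OVER (
    [PARTITION BY col1, col2]    -- partitioning (similar to GROUP BY)
    [ORDER BY col3 [ASC|DESC]]   -- ordering of the rows within each partition
    [windowing clause]           -- frame of rows (ROWS/RANGE); requires ORDER BY
)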

Sample data set

CREATE TABLE sales_data AS
SELECT 1 AS product_id, '2023-01' AS month, 100 AS sales FROM dual UNION ALL
SELECT 1, '2023-02', 150 FROM dual UNION ALL
SELECT 1, '2023-03', 200 FROM dual UNION ALL
SELECT 2, '2023-01', 80 FROM dual UNION ALL
SELECT 2, '2023-02', 120 FROM dual;

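2. Ranking Functions

2.1 RANK / DENSE_RANK / ROW_NUMBER

Function | Behavior | Example result (two tied salaries, then a lower one)
RANK | ties share a rank; gaps follow | 1, 1, 3
DENSE_RANK | ties share a rank; no gaps | 1, 1, 2
ROW_NUMBER | unique sequence number; no ties | 1, 2, 3
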
SELECT 
    employee_id,
    department_id,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees
WHERE department_id = 80;
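
To make the table above concrete, here is a minimal sketch that runs without the HR schema. The inline emp data set and its column names are made up for illustration; two employees tie at 9000.

```sql
-- Hypothetical inline data: A and B tie on salary
WITH emp AS (
    SELECT 'A' AS name, 9000 AS salary FROM dual UNION ALL
    SELECT 'B', 9000 FROM dual UNION ALL
    SELECT 'C', 8000 FROM dual
)
SELECT name,
       salary,
       RANK()       OVER (ORDER BY salary DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
       -- name is added as a tiebreaker so the numbering is deterministic
       ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num
FROM emp
ORDER BY salary DESC, name;
-- Expected rows (name, salary, rnk, dense_rnk, row_num):
-- A  9000  1  1  1
-- B  9000  1  1  2
-- C  8000  3  2  3
```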

2.2 Ranking Within Partitions

-- Rank within each department
SELECT 
    employee_id,
    department_id,
    salary,
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_rank
FROM employees;

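2.3 NTILE Bucketing

NTILE distributes the ordered rows into the given number of buckets as evenly as possible. When the row count does not divide evenly, the lower-numbered buckets each receive one extra row.

-- Split employees into 4 salary tiers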
SELECT 
    employee_id,
    salary,
    NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
-- Result (approximately): 1 = top 25%, 2 = 25-50%, 3 = 50-75%, 4 = bottom 25%
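
The bucket sizes are only equal when the row count is a multiple of the bucket count. A small sketch, using generated numbers 1 to 10 rather than the employees table (the nums and tiled names are placeholders), shows how 10 rows fall into 4 buckets:

```sql
-- Generate 10 rows, bucket them, then count the rows per bucket
WITH nums AS (
    SELECT LEVEL AS n FROM dual CONNECT BY LEVEL <= 10
),
tiled AS (
    SELECT n, NTILE(4) OVER (ORDER BY n) AS bucket FROM nums
)
SELECT bucket,
       COUNT(*) AS rows_in_bucket,
       MIN(n)   AS min_n,
       MAX(n)   AS max_n
FROM tiled
GROUP BY bucket
ORDER BY bucket;
-- Expected rows (bucket, rows_in_bucket, min_n, max_n):
-- 1  3  1   3
-- 2  3  4   6
-- 3  2  7   8
-- 4  2  9  10
```

So the "25% each" reading of the quartile column is an approximation whenever the row count is not divisible by 4.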

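2.4 In Practice: Top-N Queries

-- Employees at the 3 highest salary levels in each department
-- (correlated subquery vs. analytic function)

-- Traditional approach (usually slower)
-- COUNT(DISTINCT ...) counts higher salary levels, matching DENSE_RANK below
SELECT * FROM employees e1
WHERE 3 > (SELECT COUNT(DISTINCT e2.salary) FROM employees e2 
           WHERE e2.department_id = e1.department_id 
           AND e2.salary > e1.salary);

-- Analytic function approach (recommended)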
SELECT * FROM (
    SELECT 
        employee_id,
        department_id,
        salary,
        DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
    FROM employees
) WHERE rnk <= 3;

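3. Aggregate Window Functions

3.1 Basic Aggregates + OVER()

The common aggregate functions (SUM, AVG, COUNT, MAX, MIN) can all be used as window functions.

-- Running totals by hire date
-- With ORDER BY and no windowing clause, the default frame is
-- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows sharing a hire_date get the same value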
SELECT 
    employee_id,
    hire_date,
    salary,
    SUM(salary) OVER (ORDER BY hire_date) AS running_total,
    AVG(salary) OVER (ORDER BY hire_date) AS running_avg,
    COUNT(*) OVER (ORDER BY hire_date) AS running_count
FROM employees;

3.2 Partitioned Aggregates

-- Each employee's share of the department salary total
SELECT 
    employee_id,
    department_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id) AS dept_total,
    salary / SUM(salary) OVER (PARTITION BY department_id) AS ratio,
    COUNT(*) OVER (PARTITION BY department_id) AS dept_emp_count
FROM employees;

4. Value Functions

4.1 LAG / LEAD: Values from Preceding and Following Rows

-- Salary difference from the previously hired employee
SELECT 
    employee_id,
    hire_date,
    salary,
    LAG(salary, 1) OVER (ORDER BY hire_date) AS prev_salary,
    salary - LAG(salary, 1) OVER (ORDER BY hire_date) AS increase
FROM employees
WHERE department_id = 80;

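Parameters

  • LAG(column, offset, default): reads the value offset rows before the current row; offset defaults to 1 and default to NULL
  • LEAD(column, offset, default): takes the same arguments but reads from the rows after the current row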
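
A short sketch of the offset and default arguments, using three rows shaped like the sales_data sample (the column aliases are illustrative):

```sql
WITH sales_data AS (
    SELECT 1 AS product_id, '2023-01' AS month, 100 AS sales FROM dual UNION ALL
    SELECT 1, '2023-02', 150 FROM dual UNION ALL
    SELECT 1, '2023-03', 200 FROM dual
)
SELECT month,
       sales,
       LAG(sales)        OVER (ORDER BY month) AS prev_sales,    -- offset 1, default NULL
       LAG(sales, 1, 0)  OVER (ORDER BY month) AS prev_or_zero,  -- 0 when there is no previous row
       LEAD(sales, 1, 0) OVER (ORDER BY month) AS next_or_zero   -- 0 when there is no next row
FROM sales_data
ORDER BY month;
-- Expected rows (month, sales, prev_sales, prev_or_zero, next_or_zero):
-- 2023-01  100  NULL    0  150
-- 2023-02  150   100  100  200
-- 2023-03  200   150  150    0
```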

4.2 FIRST_VALUE / LAST_VALUE

-- Highest and lowest salary in each department
SELECT 
    employee_id,
    department_id,
    salary,
    FIRST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC 
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS dept_highest,
    LAST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC 
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS dept_lowest
FROM employees;

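⚠️ Key pitfall: when ORDER BY is present, the default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the value of the current row (or of its last peer). Specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING explicitly to get the last value of the whole partition.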
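
The effect is easy to see side by side. This is a sketch over a made-up three-row emp set, not the HR schema:

```sql
WITH emp AS (
    SELECT 'A' AS name, 9000 AS salary FROM dual UNION ALL
    SELECT 'B', 8000 FROM dual UNION ALL
    SELECT 'C', 7000 FROM dual
)
SELECT name,
       salary,
       -- default frame: stops at the current row
       LAST_VALUE(salary) OVER (ORDER BY salary DESC) AS default_window,
       -- explicit frame: covers the whole partition
       LAST_VALUE(salary) OVER (ORDER BY salary DESC
                                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS full_window
FROM emp
ORDER BY salary DESC;
-- Expected rows (name, salary, default_window, full_window):
-- A  9000  9000  7000
-- B  8000  8000  7000
-- C  7000  7000  7000
```

With the default frame, LAST_VALUE simply echoes the current salary, which is rarely what was intended.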

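4.3 NTH_VALUE

Returns the N-th value in the window.

-- Second-highest salary in each department
-- (the 2nd row by salary DESC; equals the top salary when the top two tie)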
SELECT 
    employee_id,
    department_id,
    salary,
    NTH_VALUE(salary, 2) OVER (PARTITION BY department_id 
                               ORDER BY salary DESC 
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second_highest
FROM employees;

5. The Windowing Clause

5.1 ROWS vs. RANGE

-- ROWS: a physical row offset (independent of the ORDER BY values)
SELECT 
    employee_id,
    hire_date,
    salary,
    SUM(salary) OVER (ORDER BY hire_date 
                      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_sum
FROM employees;
-- Result: current row + 1 preceding row + 1 following row

-- RANGE: a logical value range (based on the values of the ORDER BY column)
SELECT 
    employee_id,
    hire_date,
    salary,
    SUM(salary) OVER (ORDER BY hire_date 
                      RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS range_sum
FROM employees;
-- Result: all rows whose hire_date falls within [current hire_date - 30 days, current hire_date]
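
The difference matters most when the ORDER BY column has duplicates. In the sketch below (table t and its columns are invented for illustration), two rows share the same date: RANGE treats them as peers and gives both the same running sum, while ROWS counts them one at a time.

```sql
WITH t AS (
    SELECT 1 AS id, DATE '2023-01-01' AS d, 10 AS amt FROM dual UNION ALL
    SELECT 2, DATE '2023-01-01', 20 FROM dual UNION ALL
    SELECT 3, DATE '2023-01-02', 30 FROM dual
)
SELECT id,
       d,
       amt,
       SUM(amt) OVER (ORDER BY d
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum,
       -- id is added to the ordering so the ROWS result is deterministic
       SUM(amt) OVER (ORDER BY d, id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  AS rows_sum
FROM t
ORDER BY d, id;
-- Expected rows (id, d, amt, range_sum, rows_sum):
-- 1  2023-01-01  10  30  10
-- 2  2023-01-01  20  30  30
-- 3  2023-01-02  30  60  60
```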

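5.2 Common Window Definitions

-- From the start of the partition to the current row (running total)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- The 3 preceding rows through the current row (moving average)
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

-- 1 row before through 1 row after (3-row window)
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

-- The preceding year up to the current row (RANGE example; needs a date ORDER BY column)
RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND CURRENT ROW

-- Every row in the partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING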
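
As a sketch of two of these frames in use, the query below applies a running total and a centered 3-row average to three rows shaped like the sales_data sample (aliases are illustrative):

```sql
WITH sales_data AS (
    SELECT '2023-01' AS month, 100 AS sales FROM dual UNION ALL
    SELECT '2023-02', 150 FROM dual UNION ALL
    SELECT '2023-03', 200 FROM dual
)
SELECT month,
       sales,
       SUM(sales) OVER (ORDER BY month
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
       -- at the edges the frame simply holds fewer rows
       AVG(sales) OVER (ORDER BY month
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)         AS centered_avg
FROM sales_data
ORDER BY month;
-- Expected rows (month, sales, running_total, centered_avg):
-- 2023-01  100  100  125
-- 2023-02  150  250  150
-- 2023-03  200  450  175
```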

6. Distribution Functions

6.1 CUME_DIST and PERCENT_RANK

-- Cumulative distribution and percent rank
SELECT 
    employee_id,
    department_id,
    salary,
    CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) AS cume_dist,
    PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS percent_rank
FROM employees;

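-- CUME_DIST: (rows with a value <= the current value) / (total rows in the partition)
-- PERCENT_RANK: (RANK of the current row - 1) / (total rows in the partition - 1); 0 when the partition has a single row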
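
A worked example of both formulas on four made-up salaries, two of which tie (the emp data is hypothetical):

```sql
WITH emp AS (
    SELECT 1000 AS salary FROM dual UNION ALL
    SELECT 2000 FROM dual UNION ALL
    SELECT 2000 FROM dual UNION ALL
    SELECT 4000 FROM dual
)
SELECT salary,
       CUME_DIST()           OVER (ORDER BY salary)     AS cd,
       ROUND(PERCENT_RANK() OVER (ORDER BY salary), 2) AS pr
FROM emp
ORDER BY salary;
-- Expected rows (salary, cd, pr):
-- 1000  0.25  0      -- 1 of 4 rows <= 1000; (rank 1 - 1) / 3
-- 2000  0.75  0.33   -- 3 of 4 rows <= 2000; (rank 2 - 1) / 3
-- 2000  0.75  0.33
-- 4000  1     1      -- 4 of 4 rows <= 4000; (rank 4 - 1) / 3
```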

6.2 PERCENTILE_CONT / PERCENTILE_DISC

-- Median salary per department (analytic form: the value repeats on every employee row)
SELECT 
    department_id,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) AS median_salary,
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) AS median_disc
FROM employees;

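-- CONT: continuous; interpolates between neighboring values, so the result may not exist in the data
-- DISC: discrete; returns a value that actually exists in the data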
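
The two only differ when the requested percentile falls between two rows. A sketch with an even number of made-up values, using the aggregate form for brevity:

```sql
WITH emp AS (
    SELECT 10 AS salary FROM dual UNION ALL
    SELECT 20 FROM dual UNION ALL
    SELECT 30 FROM dual UNION ALL
    SELECT 40 FROM dual
)
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_cont,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) AS median_disc
FROM emp;
-- Expected row (median_cont, median_disc):
-- 25  20
-- CONT interpolates halfway between 20 and 30;
-- DISC returns the first value whose cumulative distribution reaches 0.5
```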

7. Advanced Analytic Features

7.1 LISTAGG: String Aggregation

-- Employee list per department (11gR2+)
SELECT 
    department_id,
    LISTAGG(first_name, ',') WITHIN GROUP (ORDER BY hire_date) AS employees
FROM employees
GROUP BY department_id;

-- Window version (does not reduce the row count)
SELECT 
    employee_id,
    department_id,
    LISTAGG(first_name, ',') WITHIN GROUP (ORDER BY hire_date) 
        OVER (PARTITION BY department_id) AS dept_employees
FROM employees;

7.2 PIVOT / UNPIVOT

-- Rows to columns (PIVOT)
SELECT * FROM (
    SELECT department_id, job_id, salary FROM employees
)
PIVOT (
    AVG(salary) FOR job_id IN ('IT_PROG', 'SA_MAN', 'AD_PRES')
);

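-- Columns to rows (UNPIVOT)
-- Assumes a hypothetical wide table sales_wide(product_id, jan, feb, mar);
-- the sales_data sample defined earlier is already in long form
SELECT * FROM sales_wide
UNPIVOT (
    amount FOR month IN (jan, feb, mar)
);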
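
For a PIVOT that can be run as-is, the sketch below inlines the five sample rows of sales_data and turns the months into columns. The m01/m02/m03 aliases are chosen here for illustration.

```sql
WITH sales_data AS (
    SELECT 1 AS product_id, '2023-01' AS month, 100 AS sales FROM dual UNION ALL
    SELECT 1, '2023-02', 150 FROM dual UNION ALL
    SELECT 1, '2023-03', 200 FROM dual UNION ALL
    SELECT 2, '2023-01', 80 FROM dual UNION ALL
    SELECT 2, '2023-02', 120 FROM dual
)
SELECT *
FROM sales_data
PIVOT (
    -- one output column per listed month; columns not named here (product_id) become the grouping key
    SUM(sales) FOR month IN ('2023-01' AS m01, '2023-02' AS m02, '2023-03' AS m03)
)
ORDER BY product_id;
-- Expected rows (product_id, m01, m02, m03):
-- 1  100  150  200
-- 2   80  120  NULL
```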

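7.3 The MODEL Clause

MODEL is a query clause rather than a function, and the most complex feature covered here. It supports spreadsheet-style calculations over the result set.

-- Cumulative sales per product
SELECT product_id, month, sales, cum_sales
FROM sales_data
MODEL 
  RETURN UPDATED ROWS
  PARTITION BY (product_id)
  -- number the months within each product so the previous cell is rn - 1
  DIMENSION BY (ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY month) AS rn)
  MEASURES (month, sales, 0 AS cum_sales)
  RULES SEQUENTIAL ORDER (
    -- evaluate cells in ascending rn so the previous cumulative value is already set
    cum_sales[ANY] ORDER BY rn = sales[CV(rn)] + NVL(cum_sales[CV(rn) - 1], 0)
  )
ORDER BY product_id, month;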

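8. Performance Tuning and Pitfalls

8.1 Performance Considerations

Advantages

  • The result can often be computed in a single pass, avoiding self-joins
  • Parallel execution is supported
  • Less network traffic (the computation runs on the server)

Disadvantages

  • Sorting is required (for PARTITION BY / ORDER BY); large result sets consume temporary tablespace
  • Complex windows over large partitions may exhaust sort memory and spill to disk

Tuning strategies

-- 1. Keep the PARTITION BY column list as small as possible
-- 2. Use an index to avoid the sort (if the ORDER BY column is already indexed)
-- 3. Bound the window size (avoid UNBOUNDED where the logic allows)
-- 4. Precompute with a materialized view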
CREATE MATERIALIZED VIEW mv_sales_summary AS
SELECT 
    product_id,
    month,
    sales,
    SUM(sales) OVER (PARTITION BY product_id ORDER BY month) AS running_total
FROM sales_data;

-- 5. Inspect the execution plan (window steps appear as WINDOW SORT / WINDOW BUFFER operations)
EXPLAIN PLAN FOR
SELECT ... OVER (PARTITION BY ... ORDER BY ...)
FROM ...;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

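8.2 Common Pitfalls

Pitfall 1: missing or non-unique ORDER BY

-- Wrong: in Oracle, ROW_NUMBER requires ORDER BY inside OVER
SELECT ROW_NUMBER() OVER () FROM employees;  -- ORA-30485: missing ORDER BY expression

-- Risky: rows with equal salary are numbered in an arbitrary order that can change between runs
SELECT ROW_NUMBER() OVER (ORDER BY salary DESC) FROM employees;

-- Correct: order by a unique key, or add one as a tiebreaker
SELECT ROW_NUMBER() OVER (ORDER BY employee_id) FROM employees;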

Pitfall 2: relying on the default RANGE window

-- Wrong: the default window ends at CURRENT ROW
SELECT LAST_VALUE(salary) OVER (ORDER BY hire_date) FROM employees;  -- not the true last value

-- Correct: specify the full window
SELECT LAST_VALUE(salary) OVER (
    ORDER BY hire_date 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) FROM employees;

Pitfall 3: using a window function in the WHERE clause

-- Wrong: window functions cannot be used directly in WHERE
SELECT * FROM employees 
WHERE ROW_NUMBER() OVER (ORDER BY salary DESC) <= 3;  -- ORA-30483

-- Correct: wrap it in a subquery
SELECT * FROM (
    SELECT e.*, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn FROM employees e
) WHERE rn <= 3;

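9. Worked Examples

Case 1: Year-over-Year / Month-over-Month Comparison

-- Year-over-year growth of monthly sales
-- Assumes monthly_sales has exactly one row per month with no gaps, so 12 rows back is the same month last year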
SELECT 
    month,
    sales,
    LAG(sales, 12) OVER (ORDER BY month) AS sales_last_year,
    (sales - LAG(sales, 12) OVER (ORDER BY month)) / 
        LAG(sales, 12) OVER (ORDER BY month) * 100 AS yoy_growth
FROM monthly_sales;
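
The month-over-month variant uses an offset of 1 instead of 12. The sketch below runs on three made-up rows and wraps the divisor in NULLIF so a zero in the previous month yields NULL rather than a divide-by-zero error; the column names follow the query above, the figures are invented.

```sql
WITH monthly_sales AS (
    SELECT '2023-01' AS month, 100 AS sales FROM dual UNION ALL
    SELECT '2023-02', 150 FROM dual UNION ALL
    SELECT '2023-03', 120 FROM dual
)
SELECT month,
       sales,
       LAG(sales) OVER (ORDER BY month) AS prev_sales,
       ROUND((sales - LAG(sales) OVER (ORDER BY month))
             / NULLIF(LAG(sales) OVER (ORDER BY month), 0) * 100, 1) AS mom_growth_pct
FROM monthly_sales
ORDER BY month;
-- Expected rows (month, sales, prev_sales, mom_growth_pct):
-- 2023-01  100  NULL  NULL
-- 2023-02  150   100    50
-- 2023-03  120   150   -20
```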

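Case 2: Finding Users with Consecutive Logins

-- Users who logged in on 3 consecutive days
-- Assumes one row per user per day, with login_date holding no time component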
SELECT DISTINCT user_id FROM (
    SELECT 
        user_id,
        login_date,
        LAG(login_date, 2) OVER (PARTITION BY user_id ORDER BY login_date) AS lag_2_date
    FROM user_logins
) WHERE login_date - lag_2_date = 2;
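
When a user can log in more than once a day, the LAG comparison above can miss a streak. One common alternative, sketched here on made-up data, is the "gaps and islands" technique: deduplicate to one row per day, then subtract ROW_NUMBER from the date so that consecutive days collapse to the same group key. The CTE names days and grouped are placeholders.

```sql
WITH user_logins AS (
    SELECT 1 AS user_id, DATE '2023-01-01' AS login_date FROM dual UNION ALL
    SELECT 1, DATE '2023-01-02' FROM dual UNION ALL
    SELECT 1, DATE '2023-01-02' FROM dual UNION ALL   -- second login on the same day
    SELECT 1, DATE '2023-01-03' FROM dual UNION ALL
    SELECT 2, DATE '2023-01-01' FROM dual UNION ALL
    SELECT 2, DATE '2023-01-03' FROM dual UNION ALL
    SELECT 2, DATE '2023-01-04' FROM dual
),
days AS (
    -- one row per user per calendar day
    SELECT DISTINCT user_id, TRUNC(login_date) AS login_day FROM user_logins
),
grouped AS (
    -- date minus row number is constant within a run of consecutive days
    SELECT user_id,
           login_day,
           login_day - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_day) AS grp
    FROM days
)
SELECT user_id,
       MIN(login_day) AS streak_start,
       COUNT(*)       AS streak_days
FROM grouped
GROUP BY user_id, grp
HAVING COUNT(*) >= 3
ORDER BY user_id;
-- Expected row (user_id, streak_start, streak_days):
-- 1  2023-01-01  3
-- User 2 has a gap on 2023-01-02, so the longest streak is 2 days and no row is returned
```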

Case 3: Dynamic Grouping (Bucketing)

-- Split employees into 5 salary grades
SELECT 
    employee_id,
    salary,
    NTILE(5) OVER (ORDER BY salary DESC) AS salary_grade,
    CASE NTILE(5) OVER (ORDER BY salary DESC)
        WHEN 1 THEN 'Top 20%'
        WHEN 2 THEN '20%-40%'
        ...
        ELSE 'Bottom 20%'
    END AS grade_desc
FROM employees;

Case 4: Customer Lifetime Value (LTV)

-- Cumulative spend per customer and days since the first purchase
SELECT 
    customer_id,
    order_date,
    order_amount,
    SUM(order_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS customer_lifetime_value,
    order_date - FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS days_since_first
FROM orders;

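10. Summary and Best Practices

10.1 Key Points

Function category | Main uses | Performance impact
Ranking functions | Top-N, ranking within groups | Medium (requires a sort)
Aggregate windows | Running totals, shares, moving averages | Medium (requires a sort)
Value functions | Row-to-row comparison, first/last values | Medium (requires a sort)
Distribution functions | Percentiles, medians | High (sort plus computation)
Advanced features | String aggregation, MODEL calculations | High (complex computation)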

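10.2 Oracle Version Support

Version | Key features
Oracle 8i | Analytic functions introduced (ROW_NUMBER, RANK, LAG/LEAD, FIRST_VALUE/LAST_VALUE); ROLLUP, CUBE
Oracle 9i | PERCENTILE_CONT/PERCENTILE_DISC; GROUPING SETS
Oracle 10g | MODEL clause
Oracle 11g | PIVOT/UNPIVOT (11gR1); LISTAGG, NTH_VALUE (11gR2)
Oracle 12c | Row-limiting clause (OFFSET / FETCH), an alternative for some Top-N cases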
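
For a plain "first N rows overall" query on 12c or later, the row-limiting clause can replace the ROW_NUMBER subquery. A minimal sketch on made-up data:

```sql
WITH emp AS (
    SELECT 'A' AS name, 9000 AS salary FROM dual UNION ALL
    SELECT 'B', 8000 FROM dual UNION ALL
    SELECT 'C', 8000 FROM dual UNION ALL
    SELECT 'D', 7000 FROM dual
)
SELECT name, salary
FROM emp
ORDER BY salary DESC
FETCH FIRST 2 ROWS WITH TIES;   -- WITH TIES also returns rows equal to the last one fetched
-- Expected rows: A 9000, B 8000, C 8000 (B and C tie for second place)
```

The clause applies to the whole result set, so per-group Top-N (for example, the top 3 in each department) still calls for the analytic approach from section 2.4.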

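10.3 Best-Practice Checklist

  • Prefer window functions over self-joins; the code is more concise
  • Specify ORDER BY explicitly, with a tiebreaker where needed, to avoid nondeterministic results
  • Define the window frame in full, especially for LAST_VALUE
  • Use PARTITION BY sensibly to reduce the amount of data sorted together
  • Monitor execution plans to catch temporary tablespace overflows
  • Precompute frequently used complex windows with materialized views
  • Do not use window functions directly in WHERE; use a subquery instead

Analytic functions mark the step from "querying data" to "analyzing data" in SQL. Mastering them will noticeably raise your Oracle SQL skills and let you handle complex analysis requirements with ease.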
