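[Database][Oracle] Analytic Functions and Window Functions

A Complete Guide to Oracle Analytic Functions and Window Functions

Analytic functions are among the most powerful SQL features in Oracle. For each row, they compute a value from a set of related rows (the window) without self-joins or subqueries. This greatly simplifies complex data analysis.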


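1. Core Concepts and Value

1.1 Window Functions vs. Aggregate Functions

sql
-- Aggregate function: collapses multiple rows into one
SELECT department_id, AVG(salary)
FROM employees
GROUP BY department_id;
-- Result: one row per department, holding the department average

-- Window function: keeps every row and adds a computed column
SELECT employee_id, department_id, salary,
       AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
FROM employees;
-- Result: one row per employee, each carrying its department's average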
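The row-count difference is easiest to see by running both queries side by side. Below is a minimal sketch using Python's built-in sqlite3 module. SQLite (3.25 or later) accepts the same OVER (PARTITION BY ...) syntax here, but it is not Oracle. The employees table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical employees table with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 200), (3, 20, 300), (4, 20, 500), (5, 20, 400)],
)

# Aggregate: one row per department
grouped = conn.execute(
    "SELECT department_id, AVG(salary) FROM employees "
    "GROUP BY department_id ORDER BY department_id"
).fetchall()

# Window: one row per employee, with the department average attached
windowed = conn.execute(
    "SELECT employee_id, department_id, salary, "
    "AVG(salary) OVER (PARTITION BY department_id) AS dept_avg "
    "FROM employees ORDER BY employee_id"
).fetchall()

print(grouped)        # [(10, 150.0), (20, 400.0)]
print(len(windowed))  # 5
```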

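Key differences

  • Aggregate functions: reduce the number of rows; typically used with GROUP BY
  • Window functions: keep the number of rows; the window is defined by the OVER() clause
  • Performance: the result is computed in a single pass over the data, avoiding self-joins

1.2 The Three Core Clauses

sql
FUNCTION() OVER (
    [PARTITION BY col1, col2]    -- partitioning (similar to GROUP BY)
    [ORDER BY col3 [ASC|DESC]]   -- ordering of rows within each partition
    [windowing_clause]           -- row range (ROWS/RANGE)
)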

Sample dataset

sql
CREATE TABLE sales_data AS
SELECT 1 AS product_id, '2023-01' AS month, 100 AS sales FROM dual UNION ALL
SELECT 1, '2023-02', 150 FROM dual UNION ALL
SELECT 1, '2023-03', 200 FROM dual UNION ALL
SELECT 2, '2023-01', 80 FROM dual UNION ALL
SELECT 2, '2023-02', 120 FROM dual;

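2. Ranking Functions

2.1 RANK / DENSE_RANK / ROW_NUMBER

Function      Behavior                                   Example result (first two salaries equal)
RANK          Ties share a rank; a gap follows the tie   1, 1, 3
DENSE_RANK    Ties share a rank; no gaps                 1, 1, 2
ROW_NUMBER    Unique sequential number, no ties          1, 2, 3

sql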
SELECT 
    employee_id,
    department_id,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees
WHERE department_id = 80;
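The example results in the table can be reproduced with a small sketch in Python's sqlite3. SQLite (not Oracle) implements the same three functions. The rows are made up so that the first two salaries tie. An employee_id tiebreaker is added to ROW_NUMBER so its output is deterministic.

```python
import sqlite3

# Made-up rows reproducing the table above: two employees share the top salary
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, 300), (2, 300), (3, 200)])

rows = conn.execute(
    """
    SELECT employee_id, salary,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk,
           -- employee_id breaks the tie so ROW_NUMBER is deterministic
           ROW_NUMBER() OVER (ORDER BY salary DESC, employee_id) AS rn
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 300, 1, 1, 1)
# (2, 300, 1, 1, 2)
# (3, 200, 3, 2, 3)
```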

2.2 Ranking Within Partitions

sql
-- Rank employees within each department
SELECT 
    employee_id,
    department_id,
    salary,
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_rank
FROM employees;

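2.3 NTILE Bucketing

NTILE distributes the ordered rows as evenly as possible into the specified number of buckets. If the row count is not divisible by the number of buckets, the remainder is handed out one row per bucket, starting with bucket 1.

sql
-- Split employees into 4 salary quartiles
SELECT 
    employee_id,
    salary,
    NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
-- Result (salary descending): 1 = top 25%, 2 = 25-50%, 3 = 50-75%, 4 = bottom 25%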
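Ten rows cannot be split evenly into four buckets. Here is a small sqlite3 sketch of how the remainder lands. It runs in SQLite, not Oracle, but SQLite documents the same "larger groups first" rule. The salaries are made up.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
# 10 made-up employees with salaries 1000, 900, ..., 100
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(i, 100 * (11 - i)) for i in range(1, 11)],
)

buckets = [
    q for (q,) in conn.execute(
        "SELECT NTILE(4) OVER (ORDER BY salary DESC) FROM employees ORDER BY salary DESC"
    )
]
sizes = Counter(buckets)
print(buckets)      # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
print(dict(sizes))  # {1: 3, 2: 3, 3: 2, 4: 2}
```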

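2.4 Practical Scenario: Top-N Queries

sql
-- Find the 3 highest-paid employees in each department (correlated subquery vs. analytic function)

-- Traditional approach: the subquery runs once per row (slow on large tables)
SELECT * FROM employees e1
WHERE 3 > (SELECT COUNT(*) FROM employees e2 
           WHERE e2.department_id = e1.department_id 
           AND e2.salary > e1.salary);

-- Analytic approach (recommended): a single pass
-- RANK returns the same rows as the subquery above when salaries tie;
-- DENSE_RANK would return the top 3 distinct salary levels (possibly more rows),
-- and ROW_NUMBER exactly 3 rows per department
SELECT * FROM (
    SELECT 
        employee_id,
        department_id,
        salary,
        RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
    FROM employees
) WHERE rnk <= 3;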
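The claim that RANK, not DENSE_RANK, matches the correlated subquery can be checked with a tie. This sqlite3 sketch runs in SQLite rather than Oracle, on made-up rows where department 10 has two employees at 400.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER)"
)
# Made-up data: department 10 has a tie at 400
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 500), (2, 10, 400), (3, 10, 400), (4, 10, 300), (5, 10, 200),
     (6, 20, 250), (7, 20, 150)],
)

def ids(sql):
    """Run a query whose first column is employee_id and return the sorted IDs."""
    return sorted(row[0] for row in conn.execute(sql))

correlated = ids("""
    SELECT employee_id FROM employees e1
    WHERE 3 > (SELECT COUNT(*) FROM employees e2
               WHERE e2.department_id = e1.department_id
                 AND e2.salary > e1.salary)""")

def top3(rank_fn):
    # Same inline-view pattern as the Oracle query, with the ranking function swapped
    return ids(f"""
        SELECT employee_id FROM (
            SELECT employee_id,
                   {rank_fn}() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
            FROM employees)
        WHERE rnk <= 3""")

ranked = top3("RANK")
dense = top3("DENSE_RANK")
print(correlated)  # [1, 2, 3, 6, 7]
print(ranked)      # [1, 2, 3, 6, 7]
print(dense)       # [1, 2, 3, 4, 6, 7]
```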

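3. Aggregate Window Functions

3.1 Basic Aggregates + OVER()

The common aggregate functions (SUM, AVG, COUNT, MAX, MIN) can all be used as window functions.

sql
-- Running totals by hire date
-- With ORDER BY and no windowing clause, the frame defaults to
-- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so employees
-- hired on the same date get the same running value
SELECT 
    employee_id,
    hire_date,
    salary,
    SUM(salary) OVER (ORDER BY hire_date) AS running_total,
    AVG(salary) OVER (ORDER BY hire_date) AS running_avg,
    COUNT(*) OVER (ORDER BY hire_date) AS running_count
FROM employees;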
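The effect of the default RANGE frame on tied hire dates shows up clearly against an explicit ROWS frame. Here is a sketch in Python's sqlite3. SQLite uses the same default frame but is not Oracle. The rows are made up, with employees 2 and 3 sharing a hire date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, hire_date TEXT, salary INTEGER)")
# Made-up rows: employees 2 and 3 share a hire date
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "2020-01-01", 100), (2, "2020-02-01", 200),
     (3, "2020-02-01", 300), (4, "2020-03-01", 400)],
)

rows = conn.execute(
    """
    SELECT employee_id,
           -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (peers included)
           SUM(salary) OVER (ORDER BY hire_date) AS range_total,
           -- explicit ROWS frame plus a tiebreaker: a strict row-by-row running total
           SUM(salary) OVER (ORDER BY hire_date, employee_id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

range_totals = [r[1] for r in rows]
rows_totals = [r[2] for r in rows]
print(range_totals)  # [100, 600, 600, 1000]
print(rows_totals)   # [100, 300, 600, 1000]
```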

3.2 Partitioned Aggregates

sql
-- Each employee's share of the department's salary total
SELECT 
    employee_id,
    department_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id) AS dept_total,
    salary / SUM(salary) OVER (PARTITION BY department_id) AS ratio,
    COUNT(*) OVER (PARTITION BY department_id) AS dept_emp_count
FROM employees;

4. Value Functions

4.1 LAG / LEAD: Values from Preceding / Following Rows

sql
-- Compare each employee's salary with that of the previously hired employee
SELECT 
    employee_id,
    hire_date,
    salary,
    LAG(salary, 1) OVER (ORDER BY hire_date) AS prev_salary,
    salary - LAG(salary, 1) OVER (ORDER BY hire_date) AS diff_from_prev
FROM employees
WHERE department_id = 80;

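Parameters

  • LAG(column, offset, default): looks back offset rows; offset defaults to 1, and default (returned when that row does not exist) defaults to NULL
  • LEAD(column, offset, default): same parameters, but looks forward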
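Here is a short sketch of the offset and default parameters, run through Python's sqlite3. SQLite's LAG/LEAD take the same three arguments, but this is not Oracle. The three rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, hire_date TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "2020-01-01", 100), (2, "2020-02-01", 150), (3, "2020-03-01", 120)],
)

rows = conn.execute(
    """
    SELECT employee_id,
           LAG(salary)       OVER (ORDER BY hire_date) AS prev_salary,   -- offset 1, default NULL
           LAG(salary, 1, 0) OVER (ORDER BY hire_date) AS prev_or_zero,  -- explicit default
           LEAD(salary, 2)   OVER (ORDER BY hire_date) AS two_ahead,     -- looks 2 rows forward
           salary - LAG(salary) OVER (ORDER BY hire_date) AS diff_from_prev
    FROM employees
    ORDER BY hire_date
    """
).fetchall()

for row in rows:
    print(row)
# (1, None, 0, 120, None)
# (2, 100, 100, None, 50)
# (3, 150, 150, None, -30)
```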

4.2 FIRST_VALUE / LAST_VALUE

sql
-- Highest and lowest salary in each department
SELECT 
    employee_id,
    department_id,
    salary,
    FIRST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC 
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS dept_highest,
    LAST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC 
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS dept_lowest
FROM employees;

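⚠️ Key pitfall: when ORDER BY is specified without a windowing clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. LAST_VALUE therefore returns the last value of the current row's peer group, not of the whole partition. Specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING explicitly to get the partition's last value.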
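The pitfall can be reproduced with a sketch in Python's sqlite3. SQLite applies the same default frame, but it is not Oracle. The department and salaries are made up and have no ties, so the default frame simply echoes the current row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 300), (2, 10, 200), (3, 10, 100)],
)

rows = conn.execute(
    """
    SELECT employee_id,
           -- default frame ends at the current row, so this echoes the current salary
           LAST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS default_frame,
           -- full frame: the true lowest salary in the department
           LAST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS full_frame
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

print(rows)  # [(1, 300, 100), (2, 200, 100), (3, 100, 100)]
```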

4.3 NTH_VALUE

Returns the Nth value in the window (Oracle 11gR2 and later).

sql
-- Second-highest salary in each department
SELECT 
    employee_id,
    department_id,
    salary,
    NTH_VALUE(salary, 2) OVER (PARTITION BY department_id 
                               ORDER BY salary DESC 
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second_highest
FROM employees;
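NTH_VALUE depends on the frame in the same way. Under the default frame, the first row's window contains only one value, so the second value is NULL. Here is a sqlite3 sketch (SQLite, not Oracle) on made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 300), (2, 10, 200), (3, 10, 100)],
)

rows = conn.execute(
    """
    SELECT employee_id,
           -- default frame: the first row sees only itself, so there is no 2nd value yet
           NTH_VALUE(salary, 2) OVER (PARTITION BY department_id ORDER BY salary DESC) AS default_frame,
           -- full frame: every row sees the whole department
           NTH_VALUE(salary, 2) OVER (PARTITION BY department_id ORDER BY salary DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS full_frame
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

print(rows)  # [(1, None, 200), (2, 200, 200), (3, 200, 200)]
```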

5. The Windowing Clause

5.1 ROWS vs. RANGE

sql
-- ROWS: a physical range of rows (independent of the ORDER BY values)
SELECT 
    employee_id,
    hire_date,
    salary,
    SUM(salary) OVER (ORDER BY hire_date 
                      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_sum
FROM employees;
-- Result: current row + 1 preceding row + 1 following row

-- RANGE: a logical range based on the ORDER BY values
-- (an offset like this requires a single numeric or date ORDER BY expression)
SELECT 
    employee_id,
    hire_date,
    salary,
    SUM(salary) OVER (ORDER BY hire_date 
                      RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS range_sum
FROM employees;
-- Result: all rows whose hire_date falls in [current hire_date - 30 days, current hire_date]
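Here is a sketch contrasting the two frame types in Python's sqlite3. It needs SQLite 3.28 or later for RANGE offsets, and it is not Oracle. SQLite has no INTERVAL literal, so a made-up integer hire_day column stands in for hire_date, and "30 PRECEDING" means 30 days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# hire_day is a plain day number standing in for hire_date (assumption for SQLite)
conn.execute("CREATE TABLE employees (employee_id INTEGER, hire_day INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 10, 200), (3, 35, 300), (4, 40, 400)],
)

rows = conn.execute(
    """
    SELECT employee_id,
           -- physical: previous row, current row, next row
           SUM(salary) OVER (ORDER BY hire_day
                             ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_sum,
           -- logical: every row whose hire_day is at most 30 days before the current one
           SUM(salary) OVER (ORDER BY hire_day
                             RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) AS range_sum
    FROM employees
    ORDER BY hire_day
    """
).fetchall()

print(rows)  # [(1, 300, 100), (2, 600, 300), (3, 900, 500), (4, 700, 900)]
```

Row 4 (day 40) shows the difference. ROWS stops one row back at day 35, while RANGE reaches back to day 10 because 40 - 30 = 10.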

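5.2 Common Frame Definitions

sql
-- From the start of the partition to the current row (running total)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- The 3 preceding rows plus the current row (4-row moving average)
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

-- One row before to one row after (3-row centered window)
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

-- Rolling one-year window (RANGE example; ORDER BY must be a date)
RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND CURRENT ROW

-- Every row in the partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING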

6. Distribution Functions

6.1 CUME_DIST and PERCENT_RANK

sql
-- Cumulative distribution and percent rank
SELECT 
    employee_id,
    department_id,
    salary,
    CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) AS cume_dist,
    PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS percent_rank
FROM employees;

-- CUME_DIST: (rows in the partition with a value <= the current value) / (rows in the partition)
-- PERCENT_RANK: (RANK - 1) / (rows in the partition - 1); the first row always gets 0
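The two formulas can be checked row by row. This sqlite3 sketch runs in SQLite (not Oracle) on made-up salaries, with a tie at 200 so that peers matter.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, 100), (2, 200), (3, 200), (4, 300)])

rows = conn.execute(
    """
    SELECT salary,
           CUME_DIST()    OVER (ORDER BY salary) AS cd,
           PERCENT_RANK() OVER (ORDER BY salary) AS pr,
           RANK()         OVER (ORDER BY salary) AS rnk
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

n = len(rows)
salaries = [r[0] for r in rows]
for salary, cd, pr, rnk in rows:
    # CUME_DIST = rows with a value <= the current value / total rows
    assert math.isclose(cd, sum(s <= salary for s in salaries) / n)
    # PERCENT_RANK = (RANK - 1) / (total rows - 1)
    assert math.isclose(pr, (rnk - 1) / (n - 1))

print([r[1] for r in rows])  # [0.25, 0.75, 0.75, 1.0]
```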

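6.2 PERCENTILE_CONT / PERCENTILE_DISC

sql
-- Median salary per department
-- The analytic form returns one row per employee, so DISTINCT collapses the duplicates
SELECT DISTINCT
    department_id,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) AS median_cont,
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) AS median_disc
FROM employees;

-- CONT: continuous; interpolates between neighboring values, so the result may not occur in the data
-- DISC: discrete; always returns a value that actually occurs in the data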
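SQLite has no PERCENTILE_CONT, so this is a pure-Python sketch of the calculation as Oracle documents it. PERCENTILE_CONT interpolates around row number RN = 1 + P * (N - 1). PERCENTILE_DISC takes the first value, in ascending order, whose cumulative distribution reaches P. The function names are illustrative only.

```python
import math

def percentile_cont(values, p):
    """Linear interpolation around RN = 1 + p * (N - 1), ascending order."""
    xs = sorted(values)
    rn = 1 + p * (len(xs) - 1)          # 1-based "row number" of the percentile
    frn, crn = math.floor(rn), math.ceil(rn)
    if frn == crn:
        return float(xs[frn - 1])
    return (crn - rn) * xs[frn - 1] + (rn - frn) * xs[crn - 1]

def percentile_disc(values, p):
    """First value (ascending) whose cumulative distribution i / N reaches p."""
    xs = sorted(values)
    n = len(xs)
    for i, x in enumerate(xs, start=1):
        if i / n >= p:
            return x
    return xs[-1]

even = [400, 100, 300, 200]
odd = [300, 100, 200]
print(percentile_cont(even, 0.5), percentile_disc(even, 0.5))  # 250.0 200
print(percentile_cont(odd, 0.5), percentile_disc(odd, 0.5))    # 200.0 200
```

With an even number of rows, CONT returns 250.0, which is not any employee's salary, while DISC returns 200, which is.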

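7. Advanced Analytic Features

7.1 LISTAGG: String Aggregation

sql
-- Employee names per department (11gR2+)
SELECT 
    department_id,
    LISTAGG(first_name, ',') WITHIN GROUP (ORDER BY hire_date) AS employees
FROM employees
GROUP BY department_id;

-- Analytic version (keeps every row)
SELECT 
    employee_id,
    department_id,
    LISTAGG(first_name, ',') WITHIN GROUP (ORDER BY hire_date) 
        OVER (PARTITION BY department_id) AS dept_employees
FROM employees;
-- Note: the result is a VARCHAR2; if it grows too long Oracle raises ORA-01489
-- (12cR2 adds LISTAGG ... ON OVERFLOW TRUNCATE)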

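7.2 PIVOT / UNPIVOT (11g+)

sql
-- Rows to columns (PIVOT): average salary per department for three job IDs
SELECT * FROM (
    SELECT department_id, job_id, salary FROM employees
)
PIVOT (
    AVG(salary) FOR job_id IN ('IT_PROG', 'SA_MAN', 'AD_PRES')
);

-- Columns to rows (UNPIVOT)
-- Requires a wide table with one column per month; one is built inline here for illustration
WITH sales_wide AS (
    SELECT 1 AS product_id, 100 AS jan, 150 AS feb, 200 AS mar FROM dual UNION ALL
    SELECT 2, 80, 120, NULL FROM dual
)
SELECT * FROM sales_wide
UNPIVOT (
    amount FOR month IN (jan, feb, mar)
);
-- NULL values are skipped by default (UNPIVOT INCLUDE NULLS keeps them)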
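SQLite has no PIVOT or UNPIVOT. This sketch therefore uses the portable equivalents: conditional aggregation with one AVG(CASE ...) per column, and a UNION ALL per month column. It is a different technique from Oracle's syntax that produces the same shape of result. Tables and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, job_id TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(10, "IT_PROG", 100), (10, "IT_PROG", 200), (10, "SA_MAN", 300),
     (20, "SA_MAN", 500), (20, "AD_PRES", 900)],
)

# PIVOT equivalent: one AVG(CASE ...) column per job_id (AVG ignores the NULLs)
pivoted = conn.execute(
    """
    SELECT department_id,
           AVG(CASE WHEN job_id = 'IT_PROG' THEN salary END) AS it_prog,
           AVG(CASE WHEN job_id = 'SA_MAN'  THEN salary END) AS sa_man,
           AVG(CASE WHEN job_id = 'AD_PRES' THEN salary END) AS ad_pres
    FROM employees
    GROUP BY department_id
    ORDER BY department_id
    """
).fetchall()
print(pivoted)  # [(10, 150.0, 300.0, None), (20, None, 500.0, 900.0)]

# UNPIVOT equivalent: one SELECT per month column, skipping NULLs like Oracle's default
conn.execute("CREATE TABLE sales_wide (product_id INTEGER, jan INTEGER, feb INTEGER, mar INTEGER)")
conn.executemany("INSERT INTO sales_wide VALUES (?, ?, ?, ?)",
                 [(1, 100, 150, 200), (2, 80, 120, None)])
unpivoted = conn.execute(
    """
    SELECT product_id, 'JAN' AS month, jan AS amount FROM sales_wide WHERE jan IS NOT NULL
    UNION ALL
    SELECT product_id, 'FEB', feb FROM sales_wide WHERE feb IS NOT NULL
    UNION ALL
    SELECT product_id, 'MAR', mar FROM sales_wide WHERE mar IS NOT NULL
    ORDER BY product_id, amount
    """
).fetchall()
print(unpivoted)  # [(1, 'JAN', 100), (1, 'FEB', 150), (1, 'MAR', 200), (2, 'JAN', 80), (2, 'FEB', 120)]
```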

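7.3 The MODEL Clause

The most complex of these features. MODEL treats the query result as a multidimensional array and supports spreadsheet-style calculations (Oracle 10g and later).

sql
-- Running total of sales per product
SELECT product_id, month, sales, cum_sales
FROM sales_data
MODEL 
  RETURN UPDATED ROWS
  PARTITION BY (product_id)
  -- Number the months within each product (ROWNUM neither restarts per product
  -- nor is guaranteed to follow month order)
  DIMENSION BY (ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY month) AS rn)
  MEASURES (month, sales, 0 AS cum_sales)
  RULES SEQUENTIAL ORDER (
    -- Evaluate cells in rn order; cum_sales[0] does not exist, so NVL turns it into 0
    cum_sales[ANY] ORDER BY rn = sales[CV(rn)] + NVL(cum_sales[CV(rn) - 1], 0)
  )
ORDER BY product_id, month;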
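For a plain running total, the MODEL query should give the same numbers as a windowed SUM, which is much simpler. Here is a sketch of the windowed version in Python's sqlite3. SQLite has no MODEL clause and is not Oracle. The script loads the same rows as the sample dataset defined earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (product_id INTEGER, month TEXT, sales INTEGER)")
# Same rows as the sample dataset
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [(1, "2023-01", 100), (1, "2023-02", 150), (1, "2023-03", 200),
     (2, "2023-01", 80), (2, "2023-02", 120)],
)

rows = conn.execute(
    """
    SELECT product_id, month, sales,
           SUM(sales) OVER (PARTITION BY product_id ORDER BY month) AS cum_sales
    FROM sales_data
    ORDER BY product_id, month
    """
).fetchall()

for row in rows:
    print(row)
# (1, '2023-01', 100, 100)
# (1, '2023-02', 150, 250)
# (1, '2023-03', 200, 450)
# (2, '2023-01', 80, 80)
# (2, '2023-02', 120, 200)
```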

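8. Performance Optimization and Pitfalls

8.1 Performance Considerations

Advantages

  • Results are computed in a single pass, avoiding self-joins
  • Supports parallel execution
  • Less network traffic (the computation stays on the server)

Disadvantages

  • Requires sorting (PARTITION BY / ORDER BY); large sorts spill to temporary tablespace
  • Each distinct window specification in a query may need its own sort, increasing memory use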

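Optimization strategies

sql
-- 1. Keep PARTITION BY / ORDER BY lists as short as the logic allows
-- 2. An index on the PARTITION BY / ORDER BY columns can sometimes let the
--    optimizer skip the sort (shown as WINDOW NOSORT in the plan)
-- 3. Keep window frames no larger than the logic requires
-- 4. Precompute frequently used results in a materialized view
CREATE MATERIALIZED VIEW mv_sales_summary AS
SELECT 
    product_id,
    month,
    sales,
    SUM(sales) OVER (PARTITION BY product_id ORDER BY month) AS running_total
FROM sales_data;

-- 5. Check the execution plan
EXPLAIN PLAN FOR
SELECT product_id, month,
       SUM(sales) OVER (PARTITION BY product_id ORDER BY month) AS running_total
FROM sales_data;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);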

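8.2 Common Pitfalls

Pitfall 1: A missing or non-unique ORDER BY

sql
-- Wrong: Oracle rejects ROW_NUMBER without ORDER BY (ORA-30485)
SELECT ROW_NUMBER() OVER () FROM employees;  -- avoid!

-- Still risky: employees with equal salaries are numbered in arbitrary order
SELECT ROW_NUMBER() OVER (ORDER BY salary DESC) FROM employees;

-- Correct: add a unique tiebreaker so the order is fully determined
SELECT ROW_NUMBER() OVER (ORDER BY salary DESC, employee_id) FROM employees;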

Pitfall 2: Misusing the default RANGE frame

sql
-- Wrong: with ORDER BY alone the frame is RANGE ... CURRENT ROW, ending at the current row
SELECT LAST_VALUE(salary) OVER (ORDER BY hire_date) FROM employees;  -- not the true last value

-- Correct: specify the full frame
SELECT LAST_VALUE(salary) OVER (
    ORDER BY hire_date 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) FROM employees;

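Pitfall 3: Using window functions in the WHERE clause

sql
-- Wrong: window functions are evaluated after WHERE, GROUP BY and HAVING, so they cannot appear in WHERE
SELECT * FROM employees 
WHERE ROW_NUMBER() OVER (ORDER BY salary DESC) <= 3;  -- ORA-30483

-- Correct: compute the value in an inline view, then filter in the outer query
SELECT * FROM (
    SELECT e.*, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn FROM employees e
) WHERE rn <= 3;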
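SQLite enforces the same restriction, so the pattern can be tried without an Oracle instance. SQLite reports "misuse of window function" instead of ORA-30483. The rows in this sketch are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, 500), (2, 400), (3, 300), (4, 200)])

# A window function in WHERE is rejected when the statement is prepared
try:
    conn.execute(
        "SELECT * FROM employees WHERE ROW_NUMBER() OVER (ORDER BY salary DESC) <= 3"
    ).fetchall()
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# Compute the number in an inline view, then filter in the outer query
top3 = [
    emp_id for (emp_id,) in conn.execute(
        """
        SELECT employee_id FROM (
            SELECT employee_id, ROW_NUMBER() OVER (ORDER BY salary DESC, employee_id) AS rn
            FROM employees)
        WHERE rn <= 3
        ORDER BY rn
        """
    )
]
print(where_rejected, top3)  # True [1, 2, 3]
```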

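9. Comprehensive Examples

Example 1: Year-over-Year / Month-over-Month Growth

sql
-- Monthly sales growth in percent; assumes monthly_sales has exactly one row per month
-- with no gaps, otherwise LAG(sales, 12) is not "the same month last year"
-- NULLIF avoids division by zero
SELECT 
    month,
    sales,
    LAG(sales, 12) OVER (ORDER BY month) AS sales_last_year,
    (sales - LAG(sales, 12) OVER (ORDER BY month))
        / NULLIF(LAG(sales, 12) OVER (ORDER BY month), 0) * 100 AS yoy_growth,
    (sales - LAG(sales, 1) OVER (ORDER BY month))
        / NULLIF(LAG(sales, 1) OVER (ORDER BY month), 0) * 100 AS mom_growth
FROM monthly_sales;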
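Here is a sketch of the growth query on 24 made-up months, run through Python's sqlite3 (SQLite, not Oracle). Every 2022 month is 100 and every 2023 month is 120, so YoY growth for 2023 should be 20%. One SQLite-specific change: the query multiplies by 100.0 before dividing, because SQLite divides integers as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, sales INTEGER)")
# Made-up data: 24 consecutive months, 100 per month in 2022 and 120 per month in 2023
months = [f"{y}-{m:02d}" for y in (2022, 2023) for m in range(1, 13)]
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [(mo, 100 if mo.startswith("2022") else 120) for mo in months],
)

rows = conn.execute(
    """
    SELECT month,
           -- multiplying by 100.0 first forces real (not integer) division in SQLite
           (sales - LAG(sales, 12) OVER (ORDER BY month)) * 100.0
               / NULLIF(LAG(sales, 12) OVER (ORDER BY month), 0) AS yoy_growth,
           (sales - LAG(sales, 1) OVER (ORDER BY month)) * 100.0
               / NULLIF(LAG(sales, 1) OVER (ORDER BY month), 0) AS mom_growth
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()

yoy = [r[1] for r in rows]
mom = [r[2] for r in rows]
print(yoy[11], yoy[12])  # None 20.0
print(mom[11:13])        # [0.0, 20.0]
```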

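Example 2: Finding Users Who Logged In on Consecutive Days

sql
-- Users who logged in on 3 consecutive days
-- TRUNC drops any time part and DISTINCT keeps one row per user per day;
-- without this, repeated or timestamped logins would break the date arithmetic
SELECT DISTINCT user_id FROM (
    SELECT 
        user_id,
        login_date,
        LAG(login_date, 2) OVER (PARTITION BY user_id ORDER BY login_date) AS lag_2_date
    FROM (SELECT DISTINCT user_id, TRUNC(login_date) AS login_date FROM user_logins)
) WHERE login_date - lag_2_date = 2;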
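This sketch checks the consecutive-login logic, including why one row per day matters. It uses Python's sqlite3, which is not Oracle. SQLite stores dates as text, so julianday() stands in for Oracle's date subtraction. The login rows are made up. User 3 logs in twice on one day, and without the inner DISTINCT that duplicate would shift the LAG offset and hide the streak.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logins (user_id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO user_logins VALUES (?, ?)",
    [
        (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-03"),  # 3 days in a row
        (2, "2024-01-01"), (2, "2024-01-02"), (2, "2024-01-04"),  # gap on 01-03
        (3, "2024-01-05"), (3, "2024-01-06"), (3, "2024-01-06"),  # duplicate day...
        (3, "2024-01-07"),                                        # ...still 3 days in a row
    ],
)

users = [
    uid for (uid,) in conn.execute(
        """
        SELECT DISTINCT user_id FROM (
            SELECT user_id, login_date,
                   LAG(login_date, 2) OVER (PARTITION BY user_id ORDER BY login_date) AS lag_2_date
            -- one row per user per day, otherwise duplicates shift the LAG offset
            FROM (SELECT DISTINCT user_id, login_date FROM user_logins))
        WHERE julianday(login_date) - julianday(lag_2_date) = 2
        ORDER BY user_id
        """
    )
]
print(users)  # [1, 3]
```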

Example 3: Dynamic Grouping (Bucketing)

sql
-- Split employees into 5 salary grades
SELECT 
    employee_id,
    salary,
    NTILE(5) OVER (ORDER BY salary DESC) AS salary_grade,
    CASE NTILE(5) OVER (ORDER BY salary DESC)
        WHEN 1 THEN 'Top 20%'
        WHEN 2 THEN '20%-40%'
        WHEN 3 THEN '40%-60%'
        WHEN 4 THEN '60%-80%'
        ELSE 'Bottom 20%'
    END AS grade_desc
FROM employees;

Example 4: Customer Lifetime Value (LTV)

sql
-- Each customer's cumulative spend and the days elapsed since their first purchase
SELECT 
    customer_id,
    order_date,
    order_amount,
    SUM(order_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS customer_lifetime_value,
    order_date - FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS days_since_first
FROM orders;

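10. Summary and Best Practices

10.1 Key Points

Category                 Main uses                                     Performance impact
Ranking functions        Top-N, ranking within groups                  Moderate (sort required)
Windowed aggregates      Running totals, shares, moving averages       Moderate (sort required)
Value functions          Comparing adjacent rows, first/last values    Moderate (sort required)
Distribution functions   Percentiles, medians                          High (sort plus computation)
Advanced features        String aggregation, MODEL calculations        High (complex computation)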

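10.2 Oracle Version Support

Version       Key features
Oracle 8i     Analytic functions introduced (8.1.6): ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG/LEAD, FIRST_VALUE/LAST_VALUE, CUME_DIST, PERCENT_RANK, windowed aggregates; ROLLUP and CUBE
Oracle 9i     PERCENTILE_CONT/PERCENTILE_DISC, GROUPING SETS
Oracle 10g    MODEL clause
Oracle 11g    PIVOT/UNPIVOT; LISTAGG and NTH_VALUE in 11gR2
Oracle 12c    Row limiting clause (OFFSET ... FETCH), which can replace ROW_NUMBER in simple Top-N queries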

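10.3 Best-Practice Checklist

  • Prefer window functions over self-joins; the code is simpler and needs only one pass
  • Always specify ORDER BY, with a unique tiebreaker, to avoid nondeterministic results
  • Define the window frame explicitly, especially for LAST_VALUE
  • Use PARTITION BY deliberately to limit how much data each sort handles
  • Monitor execution plans to avoid exhausting temporary tablespace
  • Precompute frequently used complex window calculations in materialized views
  • Never use window functions directly in WHERE; filter in an outer query instead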

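Analytic functions take SQL from "querying data" to "analyzing data". Mastering them substantially raises what you can do with Oracle SQL and makes complex analysis requirements much easier to handle.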
