🍋🍋 Big Data Learning 🍋🍋
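Categories of common Hive analytic functions
- Ranking functions
- Aggregate functions (used as analytic functions)
- Offset functions
- Distribution functions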
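1. Ranking functions
Ranking functions assign a rank to each row within its partition (or within the whole result set if there is no PARTITION BY), according to the ORDER BY in the OVER clause.
Common ranking functions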
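ROW_NUMBER()
- Assigns each row a unique sequential number within its partition, starting from 1. Rows with equal sort values still get different numbers, and their relative order is not guaranteed unless the ORDER BY includes a tie-breaker.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;
```
- Note: partitions by department, orders by salary descending, and numbers the employees within each department.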
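A small sketch makes the per-partition numbering concrete. It runs the query through Python's built-in `sqlite3` module against an in-memory SQLite database, assuming SQLite's window functions (available since SQLite 3.25) behave like Hive's here. The `employees` rows are made-up sample data. `employee_id` is appended to the ORDER BY as a tie-breaker. Without it, the two equal IT salaries could be numbered in either order.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# employee_id breaks the 7000 tie, so the numbering is deterministic.
rows = conn.execute("""
    SELECT employee_id, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department
                              ORDER BY salary DESC, employee_id) AS row_num
    FROM employees
    ORDER BY department, row_num
""").fetchall()

for row in rows:
    print(row)
```

Keeping only `row_num = 1` in an outer query is a common way to pick one row per department.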
RANK()
- Assigns each row a rank. Rows with equal values share the same rank, and the next rank skips ahead by the number of tied rows, which leaves gaps.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;
```
- Note: if two employees in the same department have the same salary, they get the same rank and the next rank is skipped (for example 1, 2, 2, 4).
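The gap after a tie shows up clearly on sample data. This sketch assumes SQLite (3.25+, via Python's `sqlite3`) as a stand-in for Hive. It uses a hypothetical `employees` table in which two IT employees both earn 7000.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# The two 7000 salaries share rank 2, so the next IT rank is 4.
rows = conn.execute("""
    SELECT employee_id, department, salary,
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY department, salary_rank, employee_id
""").fetchall()

for row in rows:
    print(row)
```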
DENSE_RANK()
- Assigns each row a rank. Rows with equal values share the same rank, but the following ranks are not skipped, so there are no gaps.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_salary_rank
FROM employees;
```
- Note: similar to RANK(), but without gaps (for example 1, 2, 2, 3).
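Putting RANK() and DENSE_RANK() side by side on the same hypothetical data shows the difference. An in-memory SQLite database (3.25+) stands in for Hive. Only the IT department is queried, to keep the output short.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# Same ordering, two ranking functions: RANK leaves a gap after the tie, DENSE_RANK does not.
rows = conn.execute("""
    SELECT employee_id, salary,
           RANK()       OVER (ORDER BY salary DESC) AS salary_rank,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_salary_rank
    FROM employees
    WHERE department = 'IT'
    ORDER BY salary DESC, employee_id
""").fetchall()

for row in rows:
    print(row)
```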
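NTILE(n)
- Divides the ordered rows of each partition (here, the whole result set) into n buckets of as equal a size as possible, and returns each row's bucket number (1 to n). When the rows do not divide evenly, the lower-numbered buckets get one extra row each.
- Example:
```sql
SELECT
    employee_id,
    salary,
    NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
```
- Note: orders employees by salary descending and assigns them to four buckets (quartiles), so bucket 1 holds the highest earners.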
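When the row count is not a multiple of n, the buckets differ in size. The sketch below assumes SQLite (3.25+) as a stand-in for Hive. It places 10 hypothetical employees into NTILE(4) and counts the rows in each bucket. Under the standard SQL definition, which SQLite follows, the first two buckets get 3 rows and the last two get 2.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
# 10 hypothetical employees with salaries 10000, 9000, ..., 1000.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(i, (11 - i) * 1000) for i in range(1, 11)],
)

# Count how many rows land in each of the 4 buckets.
bucket_sizes = conn.execute("""
    SELECT quartile, COUNT(*) AS n
    FROM (SELECT NTILE(4) OVER (ORDER BY salary DESC) AS quartile FROM employees) t
    GROUP BY quartile
    ORDER BY quartile
""").fetchall()

print(bucket_sizes)
```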
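2. Aggregate functions (used as analytic functions)
Hive lets you use an aggregate function as an analytic function by adding an OVER clause. The aggregate is then computed over each row's window, and unlike GROUP BY, every input row is kept in the output.
Common aggregate functions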
SUM()
- Computes the sum over the window.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department) AS total_salary
FROM employees;
```
- Note: computes each department's total salary. Because there is no ORDER BY, the window is the whole partition, so every employee in a department gets the same total.
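The key difference from GROUP BY is that the windowed SUM keeps every employee row. This sketch uses an in-memory SQLite database (3.25+) as a stand-in for Hive, with made-up data. It runs both forms so their outputs can be compared.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# Window form: one output row per employee, department total repeated on each.
rows = conn.execute("""
    SELECT employee_id, department, salary,
           SUM(salary) OVER (PARTITION BY department) AS total_salary
    FROM employees
    ORDER BY employee_id
""").fetchall()

# GROUP BY form: one output row per department.
grouped = conn.execute("""
    SELECT department, SUM(salary) FROM employees GROUP BY department ORDER BY department
""").fetchall()

print(len(rows), "window rows vs", len(grouped), "grouped rows")
```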
AVG()
- Computes the average over the window.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS avg_salary
FROM employees;
```
- Note: computes the average salary of each department and attaches it to every employee in that department.
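A windowed average is often used to compare each row with its group. The following sketch computes how far each salary is from its department's average. It uses SQLite 3.25+ as a stand-in for Hive and hypothetical data. The column name `diff_from_avg` is illustrative.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# AVG returns a floating-point value, so the difference is a float as well.
rows = conn.execute("""
    SELECT employee_id, department, salary,
           salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
    FROM employees
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```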
MIN() / MAX()
- Computes the minimum or maximum over the window.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    MIN(salary) OVER (PARTITION BY department) AS min_salary,
    MAX(salary) OVER (PARTITION BY department) AS max_salary
FROM employees;
```
- Note: computes the lowest and highest salary in each department.
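Window functions are evaluated after WHERE, so a condition such as "salary equals the department maximum" has to go in an outer query. The sketch below uses that pattern to keep only the employees who earn their department's lowest or highest salary. It assumes SQLite 3.25+ as a stand-in for Hive and uses hypothetical data.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# The window results are computed in the subquery, then filtered outside it.
rows = conn.execute("""
    SELECT employee_id, department, salary, min_salary, max_salary
    FROM (
        SELECT employee_id, department, salary,
               MIN(salary) OVER (PARTITION BY department) AS min_salary,
               MAX(salary) OVER (PARTITION BY department) AS max_salary
        FROM employees
    ) t
    WHERE salary IN (min_salary, max_salary)
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```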
COUNT()
- Counts the rows in the window.
- Example:
```sql
SELECT
    employee_id,
    department,
    COUNT(*) OVER (PARTITION BY department) AS employee_count
FROM employees;
```
- Note: counts the employees in each department; the count is repeated on every employee's row.
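An empty OVER () makes the window the whole result set, which is handy next to a per-department count. This sketch assumes SQLite 3.25+ as a stand-in for Hive and uses made-up employees. Check that your Hive version accepts an empty OVER () before relying on it.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# dept_count: rows in the employee's department; total_count: rows in the whole table.
rows = conn.execute("""
    SELECT employee_id, department,
           COUNT(*) OVER (PARTITION BY department) AS dept_count,
           COUNT(*) OVER () AS total_count
    FROM employees
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```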
3. Offset functions
Offset functions read values from other rows relative to the current row.
Common offset functions
LEAD(column, offset, default)
- Returns the value of column from the row that is offset rows after the current row, or default if no such row exists. If omitted, offset is 1 and default is NULL.
- Example:
```sql
SELECT
    employee_id,
    salary,
    LEAD(salary, 1, 0) OVER (ORDER BY employee_id) AS next_salary
FROM employees;
```
- Note: returns the salary of the next employee in employee_id order; the last employee gets 0.
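This sketch shows both an explicit default and the implicit NULL at the end of the window. It runs on hypothetical data in an in-memory SQLite database (3.25+) that stands in for Hive. `salary_after_next` uses an offset of 2 and no default.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# next_salary falls back to 0; salary_after_next falls back to NULL (None in Python).
rows = conn.execute("""
    SELECT employee_id, salary,
           LEAD(salary, 1, 0) OVER (ORDER BY employee_id) AS next_salary,
           LEAD(salary, 2)    OVER (ORDER BY employee_id) AS salary_after_next
    FROM employees
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```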
LAG(column, offset, default)
- Returns the value of column from the row that is offset rows before the current row, or default if no such row exists. If omitted, offset is 1 and default is NULL.
- Example:
```sql
SELECT
    employee_id,
    salary,
    LAG(salary, 1, 0) OVER (ORDER BY employee_id) AS prev_salary
FROM employees;
```
- Note: returns the salary of the previous employee in employee_id order; the first employee gets 0.
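A typical use of LAG() is a row-to-row difference. The sketch below computes each employee's salary change relative to the previous employee in the same department. It uses SQLite 3.25+ as a stand-in for Hive and hypothetical data. Without a default, the first row of each partition gets NULL (None in Python).

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# salary minus NULL is NULL, so the first employee of each department has no change.
rows = conn.execute("""
    SELECT employee_id, department, salary,
           salary - LAG(salary) OVER (PARTITION BY department
                                      ORDER BY employee_id) AS change_from_prev
    FROM employees
    ORDER BY department, employee_id
""").fetchall()

for row in rows:
    print(row)
```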
FIRST_VALUE(column)
- Returns the value of column from the first row of the window frame.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_salary
FROM employees;
```
- Note: returns the highest salary in each department, that is, the salary of the department's top earner.
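FIRST_VALUE() can return any column of the first row, not only the column used for ordering. This sketch fetches the top earner's `employee_id` per department and each employee's gap to the top salary. It uses SQLite 3.25+ as a stand-in for Hive and made-up data. `top_earner_id` and `gap_to_top` are illustrative names.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# The first row of each descending-salary window is the department's top earner.
rows = conn.execute("""
    SELECT employee_id, department,
           FIRST_VALUE(employee_id) OVER (PARTITION BY department
                                          ORDER BY salary DESC, employee_id) AS top_earner_id,
           FIRST_VALUE(salary) OVER (PARTITION BY department
                                     ORDER BY salary DESC) - salary AS gap_to_top
    FROM employees
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```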
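LAST_VALUE(column)
- Returns the value of column from the last row of the window frame. When ORDER BY is given without an explicit frame, the default frame ends at the current row (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). To reach the true last row of the partition, you usually need to extend the frame to UNBOUNDED FOLLOWING.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lowest_salary
FROM employees;
```
- Note: returns the lowest salary in each department; the explicit frame makes the window cover the whole partition.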
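The frame matters because of the default. With ORDER BY and no frame, the window ends at the current row (and its ties), so LAST_VALUE() simply returns the current row's salary. This sketch compares both versions for the IT department of a hypothetical table, using SQLite 3.25+ as a stand-in for Hive.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

rows = conn.execute("""
    SELECT employee_id, salary,
           -- default frame: stops at the current row and its peers
           LAST_VALUE(salary) OVER (ORDER BY salary DESC) AS default_frame,
           -- explicit frame: the whole partition
           LAST_VALUE(salary) OVER (ORDER BY salary DESC
                                    ROWS BETWEEN UNBOUNDED PRECEDING
                                             AND UNBOUNDED FOLLOWING) AS full_frame
    FROM employees
    WHERE department = 'IT'
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```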
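4. Distribution functions
Distribution functions express the relative position of the current row within its partition as a value between 0 and 1.
Common distribution functions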
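CUME_DIST()
- Computes the cumulative distribution of the current row: the number of rows ordered at or before it (including rows tied with it), divided by the number of rows in the partition.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    CUME_DIST() OVER (PARTITION BY department ORDER BY salary DESC) AS cume_dist
FROM employees;
```
- Note: with ORDER BY salary DESC, this is the fraction of employees in the department whose salary is greater than or equal to the current employee's.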
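To check the definition, the sketch below computes CUME_DIST() next to the ratio it is defined as. The numerator is a COUNT(*) over the default frame (rows up to and including the current row's ties) and the denominator is the partition size. It assumes SQLite 3.25+ as a stand-in for Hive and uses hypothetical data. The alias `manual` is illustrative.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

# 1.0 * forces floating-point division in the manual ratio.
rows = conn.execute("""
    SELECT employee_id, department, salary,
           CUME_DIST() OVER (PARTITION BY department ORDER BY salary DESC) AS cume_dist_value,
           1.0 * COUNT(*) OVER (PARTITION BY department ORDER BY salary DESC)
               / COUNT(*) OVER (PARTITION BY department) AS manual
    FROM employees
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```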
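PERCENT_RANK()
- Computes the relative rank of the current row as (RANK() - 1) / (number of rows in the partition - 1), so the first row gets 0.
- Example:
```sql
SELECT
    employee_id,
    department,
    salary,
    PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS percent_rank
FROM employees;
```
- Note: computes each employee's percentile rank by salary within their department.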
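PERCENT_RANK() can be checked against its formula, (RANK() - 1) / (rows in partition - 1). The sketch below assumes SQLite 3.25+ as a stand-in for Hive and uses hypothetical data. `1.0 *` forces floating-point division. A one-row partition would make the manual denominator 0, which the sample data avoids.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

rows = conn.execute("""
    SELECT employee_id, department, salary,
           PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS pct_rank,
           -- the same value from its definition: (rank - 1) / (partition size - 1)
           1.0 * (RANK() OVER (PARTITION BY department ORDER BY salary DESC) - 1)
               / (COUNT(*) OVER (PARTITION BY department) - 1) AS manual
    FROM employees
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```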
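Window frame
The window frame defines which rows, relative to the current row, an analytic function operates on. If ORDER BY is given without a frame, Hive defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If neither ORDER BY nor a frame is given, the frame is the whole partition.
Common window frames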
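ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- From the first row of the partition to the current row.
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
- From the current row to the last row of the partition.
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
- The previous row, the current row, and the next row (at most three rows).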
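Two of the frames listed above can be compared on the IT department of a hypothetical table. This sketch uses SQLite 3.25+ as a stand-in for Hive. It computes a three-row moving sum with 1 PRECEDING AND 1 FOLLOWING, the number of rows each moving frame actually contains, and a remaining total with CURRENT ROW AND UNBOUNDED FOLLOWING. At the edges of the partition, the moving frame shrinks to two rows.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

rows = conn.execute("""
    SELECT employee_id, salary,
           SUM(salary) OVER (ORDER BY employee_id
                             ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS window_sum,
           COUNT(*)    OVER (ORDER BY employee_id
                             ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS window_rows,
           SUM(salary) OVER (ORDER BY employee_id
                             ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining_sum
    FROM employees
    WHERE department = 'IT'
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```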
Example
```sql
SELECT
    employee_id,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department
                      ORDER BY salary
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM employees;
```
- Note: computes a running total of salaries within each department, in ascending salary order.
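When the ORDER BY column has ties, ROWS and the default RANGE frame give different running totals. This sketch computes both, using SQLite 3.25+ as a stand-in for Hive and hypothetical data with two IT salaries of 7000. ROWS adds one physical row at a time; within equal salaries it is ordered by employee_id to stay deterministic. RANGE adds all rows with the same salary at once.

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for Hive; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "IT", 9000), (2, "IT", 7000), (3, "IT", 7000), (4, "IT", 5000),
     (5, "HR", 6000), (6, "HR", 4000)],
)

rows = conn.execute("""
    SELECT employee_id, salary,
           -- physical rows; employee_id breaks the salary tie
           SUM(salary) OVER (PARTITION BY department ORDER BY salary, employee_id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total,
           -- default frame with ORDER BY: RANGE, so tied salaries are summed together
           SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS range_total
    FROM employees
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```

Employees 2 and 3 show the difference: ROWS gives them 12000 and 19000, while RANGE gives both 19000.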