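Background
SQL window analytic functions (LEAD and LAG explained)
Hive analytic functions LEAD and LAG: worked examples
- lag: returns the value n rows above (before) the current row within its window partition
- lead: returns the value n rows below (after) the current row within its window partition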
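```sql
-- lead: read the value 1 row after the current row in its partition, 0 if there is none
lead(column_name, 1, 0) over (partition by group_column order by sort_column)
```
Because the offset is counted from the current row, lag and lead do not use a frame clause such as "rows between ... preceding and ... following". MySQL accepts one but ignores it for these two functions, and Hive does not support a window frame with them.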
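lag and lead take three arguments:
- the column to read;
- the offset n (1 if omitted);
- the default value returned when the target row falls outside the partition (NULL if omitted).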
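The sketch below shows all three arguments in action. It runs the SQL through Python's standard sqlite3 module; SQLite has supported LAG/LEAD since 3.25.0, and other engines may differ in details. The visits table and its values are hypothetical. They were chosen to show the offset, the default value, and the fact that the lookup never crosses a partition boundary.

```python
import sqlite3

# LAG/LEAD need SQLite >= 3.25.0 (window function support).
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this example needs SQLite 3.25.0 or newer")

conn = sqlite3.connect(":memory:")
# Hypothetical table: pages viewed per user per day.
conn.execute("CREATE TABLE visits (user_id INTEGER, day INTEGER, pages INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 20), (1, 3, 30), (2, 1, 5), (2, 2, 7)],
)

rows = conn.execute(
    """
    SELECT user_id, day, pages,
           -- 1 row above the current row; 0 if there is no such row
           LAG(pages, 1, 0)  OVER (PARTITION BY user_id ORDER BY day) AS prev_pages,
           -- 1 row below the current row; 0 if there is no such row
           LEAD(pages, 1, 0) OVER (PARTITION BY user_id ORDER BY day) AS next_pages,
           -- 2 rows below, no default given: NULL (None) past the partition edge
           LEAD(pages, 2)    OVER (PARTITION BY user_id ORDER BY day) AS pages_in_2
    FROM visits
    ORDER BY user_id, day
    """
).fetchall()

for row in rows:
    print(row)
```

On user 2's first day, prev_pages is 0 rather than user 1's last value (30). Each user_id is its own partition, so lag stops at the partition edge and falls back to the default.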
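Problem description
LeetCode problem: 1661. Average Time of Process per Machine
The Activity table has the columns machine_id, process_id, activity_type ('start' or 'end') and timestamp. Each (machine_id, process_id) pair has exactly one 'start' row and one 'end' row, and the start timestamp is earlier than the end timestamp. For each machine, compute processing_time: the average of (end timestamp - start timestamp) over its processes, rounded to 3 decimal places.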
Code
Approach 1: self-join
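```sql
select A.machine_id,
       round(avg(B.timestamp - A.timestamp), 3) as processing_time
from Activity A, Activity B
where A.machine_id = B.machine_id   -- same machine
  and A.process_id = B.process_id   -- same process
  and A.activity_type = 'start'     -- A is the start row
  and B.activity_type = 'end'       -- B is the end row
group by A.machine_id               -- qualified: both aliases have a machine_id column
```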
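Walkthrough: join the table to itself, then keep only the pairs where A is a process's 'start' row and B is the same process's 'end' row. B.timestamp - A.timestamp is then that process's running time.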
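To check the self-join query, here is a sketch that loads a small Activity table into SQLite through Python's sqlite3 module and runs the query. The sample rows are illustrative values, not official test data. The grouping column is written as A.machine_id to avoid an ambiguous reference between the two aliases.

```python
import sqlite3

# Illustrative sample rows: (machine_id, process_id, activity_type, timestamp).
# Each process has one 'start' and one 'end' row, as the problem guarantees.
SAMPLE = [
    (0, 0, "start", 0.712), (0, 0, "end", 1.520),
    (0, 1, "start", 3.140), (0, 1, "end", 4.120),
    (1, 0, "start", 0.550), (1, 0, "end", 1.550),
    (1, 1, "start", 0.430), (1, 1, "end", 1.420),
    (2, 0, "start", 4.100), (2, 0, "end", 4.512),
    (2, 1, "start", 2.500), (2, 1, "end", 5.000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    "machine_id INTEGER, process_id INTEGER, activity_type TEXT, timestamp REAL)"
)
conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", SAMPLE)

# Approach 1: pair every 'start' row (A) with the 'end' row (B) of the same process.
SELF_JOIN = """
SELECT A.machine_id,
       ROUND(AVG(B.timestamp - A.timestamp), 3) AS processing_time
FROM Activity A, Activity B
WHERE A.machine_id = B.machine_id
  AND A.process_id = B.process_id
  AND A.activity_type = 'start'
  AND B.activity_type = 'end'
GROUP BY A.machine_id
ORDER BY A.machine_id
"""

result = dict(conn.execute(SELF_JOIN).fetchall())
print(result)  # approximately {0: 0.894, 1: 0.995, 2: 1.456}
```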
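Approach 2: take the end and start times from the max and min within each group
Key idea: each (machine_id, process_id) group holds exactly one start row and one end row, and the start comes first. So within a group, end time - start time == max(timestamp) - min(timestamp).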
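```sql
select machine_id,
       round(avg(timm), 3) as processing_time
from (
    -- one row per process; list only grouped columns
    -- (select * with group by fails under ONLY_FULL_GROUP_BY)
    select machine_id, process_id,
           max(timestamp) - min(timestamp) as timm
    from Activity
    group by machine_id, process_id
) A
group by machine_id
```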
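Here is the same check for the max/min approach, again as a sketch on illustrative sample rows in SQLite. It also prints the inner per-process durations so that the max(timestamp) - min(timestamp) step can be inspected on its own.

```python
import sqlite3

# Illustrative sample rows: (machine_id, process_id, activity_type, timestamp).
SAMPLE = [
    (0, 0, "start", 0.712), (0, 0, "end", 1.520),
    (0, 1, "start", 3.140), (0, 1, "end", 4.120),
    (1, 0, "start", 0.550), (1, 0, "end", 1.550),
    (1, 1, "start", 0.430), (1, 1, "end", 1.420),
    (2, 0, "start", 4.100), (2, 0, "end", 4.512),
    (2, 1, "start", 2.500), (2, 1, "end", 5.000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    "machine_id INTEGER, process_id INTEGER, activity_type TEXT, timestamp REAL)"
)
conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", SAMPLE)

# Inner query: one row per (machine, process), end - start computed as max - min.
PER_PROCESS = """
SELECT machine_id, process_id,
       MAX(timestamp) - MIN(timestamp) AS timm
FROM Activity
GROUP BY machine_id, process_id
"""

# Outer query: average the per-process durations for each machine.
MAX_MIN = f"""
SELECT machine_id,
       ROUND(AVG(timm), 3) AS processing_time
FROM ({PER_PROCESS}) A
GROUP BY machine_id
ORDER BY machine_id
"""

per_process = conn.execute(PER_PROCESS + " ORDER BY machine_id, process_id").fetchall()
for row in per_process:
    print(row)

result = dict(conn.execute(MAX_MIN).fetchall())
print(result)  # approximately {0: 0.894, 1: 0.995, 2: 1.456}
```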
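Approach 3: CASE WHEN
Idea: negate the start timestamps, so that summing a process's two rows gives end - start. This makes the sum/average easy to compute.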
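```sql
select machine_id,
       -- avg over 2n signed rows = total / (2n), so multiply by 2
       round(avg(timm) * 2, 3) as processing_time
from (
    select *,
           case
               when activity_type = 'end'
               then timestamp
               else -timestamp   -- start times become negative
           end as timm
    from Activity
) A
group by machine_id
```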
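Walkthrough: a machine with n processes has 2n rows, and their signed values add up to the sum of (end - start) over all n processes. avg() divides that total by 2n, so multiplying by 2 gives the total divided by n, which is the average processing time. This only works because every process has exactly one 'start' row and one 'end' row.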
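The sketch below verifies the sign trick on the illustrative sample data, again using SQLite through Python's sqlite3 module. For machine 0, it first checks that the four signed values sum to the total running time. It then applies avg() * 2 to all machines.

```python
import sqlite3

# Illustrative sample rows: (machine_id, process_id, activity_type, timestamp).
SAMPLE = [
    (0, 0, "start", 0.712), (0, 0, "end", 1.520),
    (0, 1, "start", 3.140), (0, 1, "end", 4.120),
    (1, 0, "start", 0.550), (1, 0, "end", 1.550),
    (1, 1, "start", 0.430), (1, 1, "end", 1.420),
    (2, 0, "start", 4.100), (2, 0, "end", 4.512),
    (2, 1, "start", 2.500), (2, 1, "end", 5.000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    "machine_id INTEGER, process_id INTEGER, activity_type TEXT, timestamp REAL)"
)
conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", SAMPLE)

# Signed rows: 'end' timestamps stay positive, 'start' timestamps are negated.
SIGNED = """
SELECT *,
       CASE WHEN activity_type = 'end' THEN timestamp
            ELSE -timestamp
       END AS timm
FROM Activity
"""

# Sanity check on machine 0: 4 rows whose signed sum is the total running time.
total, row_count = conn.execute(
    f"SELECT SUM(timm), COUNT(*) FROM ({SIGNED}) S WHERE S.machine_id = 0"
).fetchone()
print(total, row_count)  # roughly 1.788 and 4

# avg over 2n signed rows = total / (2n); multiplying by 2 gives total / n.
CASE_WHEN = f"""
SELECT machine_id,
       ROUND(AVG(timm) * 2, 3) AS processing_time
FROM ({SIGNED}) A
GROUP BY machine_id
ORDER BY machine_id
"""

result = dict(conn.execute(CASE_WHEN).fetchall())
print(result)  # approximately {0: 0.894, 1: 0.995, 2: 1.456}
```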
Approach 4: window function lead()
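```sql
with t as (
    select *,
           -- next row's timestamp within the same machine; 0 after the last row
           lead(timestamp, 1, 0) over (partition by machine_id
                                       order by process_id asc, timestamp asc) as end_time
    from Activity
)
select t.machine_id,
       round(avg(end_time - timestamp), 3) as processing_time
from t
-- on a 'start' row, the next row is the same process's 'end' row
where t.activity_type = 'start'
group by t.machine_id
```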
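Walkthrough: within each machine, ordering by process_id and then timestamp puts every process's 'end' row immediately after its 'start' row. On a 'start' row, lead(timestamp) is therefore that process's end time. The default 0 can only land on the last row of a partition, which is an 'end' row and is dropped by the where clause. This approach is borrowed from an expert's solution (WITH + LEAD window function).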
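Here is a sketch of the lead() version in SQLite (window functions need SQLite 3.25.0 or newer), using the same illustrative sample rows. It first prints machine 0's rows with their end_time column, so you can see each 'start' row pick up its own process's end time. It then runs the full query.

```python
import sqlite3

if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("LEAD() needs SQLite 3.25.0 or newer")

# Illustrative sample rows: (machine_id, process_id, activity_type, timestamp).
SAMPLE = [
    (0, 0, "start", 0.712), (0, 0, "end", 1.520),
    (0, 1, "start", 3.140), (0, 1, "end", 4.120),
    (1, 0, "start", 0.550), (1, 0, "end", 1.550),
    (1, 1, "start", 0.430), (1, 1, "end", 1.420),
    (2, 0, "start", 4.100), (2, 0, "end", 4.512),
    (2, 1, "start", 2.500), (2, 1, "end", 5.000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    "machine_id INTEGER, process_id INTEGER, activity_type TEXT, timestamp REAL)"
)
conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", SAMPLE)

# Inspect machine 0: each 'start' row's end_time is its own process's end timestamp.
MACHINE0 = """
SELECT process_id, activity_type, timestamp,
       LEAD(timestamp, 1, 0) OVER (
           PARTITION BY machine_id ORDER BY process_id ASC, timestamp ASC
       ) AS end_time
FROM Activity
WHERE machine_id = 0
ORDER BY process_id, timestamp
"""
machine0 = conn.execute(MACHINE0).fetchall()
for row in machine0:
    print(row)

# Approach 4: keep only 'start' rows, where end_time - timestamp is the running time.
LEAD_QUERY = """
WITH t AS (
    SELECT *,
           LEAD(timestamp, 1, 0) OVER (
               PARTITION BY machine_id
               ORDER BY process_id ASC, timestamp ASC
           ) AS end_time
    FROM Activity
)
SELECT t.machine_id,
       ROUND(AVG(end_time - timestamp), 3) AS processing_time
FROM t
WHERE t.activity_type = 'start'
GROUP BY t.machine_id
ORDER BY t.machine_id
"""
result = dict(conn.execute(LEAD_QUERY).fetchall())
print(result)  # approximately {0: 0.894, 1: 0.995, 2: 1.456}
```

On the 'end' rows, end_time holds the next process's start time, or the default 0 on the partition's last row. Neither value means anything, which is why the query keeps only 'start' rows.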