Hive in Depth (4)

程序员小郭同学, 2024-04-04 2:34

Hive

Window Functions

Analytic Functions

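Aggregate functions, such as sum, avg, max, and min
Shift functions
1. lag(colName, n): returns the value of colName from the row n rows before the current row within the window (NULL if there is no such row)
2. lead(colName, n): returns the value of colName from the row n rows after the current row within the window (NULL if there is no such row)
3. ntile(n): requires ordered data; distributes the ordered rows into n buckets in sequence so that the bucket sizes are as equal as possible, differing by at most 1 row
Ranking functions
1. row_number: after sorting, numbers the rows sequentially; rows with equal values still receive different numbers
2. rank: after sorting, numbers the rows sequentially; rows with equal values receive the same number, which leaves gaps in the sequence (for example 1, 2, 2, 4)
3. dense_rank: after sorting, numbers the rows sequentially; rows with equal values receive the same number, and no gaps are left (for example 1, 2, 2, 3)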
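
The list above mentions aggregate functions without showing one used as a window function. A minimal sketch, assuming a small inline data set (the CTE name `demo` and its four rows are made up for illustration): with an `over` clause, the aggregate is computed per partition and attached to every row instead of collapsing the rows into one.

```sql
-- Hypothetical inline data, used only for this illustration.
with demo as (
    select 'Bob' as name, 'Chinese' as subject, 85 as score
    union all
    select 'Alex', 'Chinese', 76
    union all
    select 'Bob', 'Maths', 91
    union all
    select 'Alex', 'Maths', 82
)
select name,
       subject,
       score,
       -- average over the whole subject partition, repeated on each row
       avg(score) over (partition by subject) as subject_avg,
       -- highest score in the subject, repeated on each row
       max(score) over (partition by subject) as subject_max
from demo;
```

All four rows are kept; the two Chinese rows carry subject_avg 80.5 and subject_max 85, and the two Maths rows carry subject_avg 86.5 and subject_max 91.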

Shift function examples

Requirement 2: query each customer's purchase details together with the date of their previous purchase

select *,
       lag(order_date, 1) over (partition by name order by order_date) as last_order_date
from orders;
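
lead works the same way in the opposite direction. A minimal sketch, assuming the same `orders` table with the `name` and `order_date` columns used in the query above; the alias `next_order_date` is chosen here for illustration.

```sql
-- Assumes the orders table from the query above (columns name, order_date).
select *,
       -- date of the same customer's next purchase; NULL on their last purchase
       lead(order_date, 1) over (partition by name order by order_date) as next_order_date
from orders;
```

Both lag and lead return NULL when the requested row falls outside the partition, so each customer's first purchase has no last_order_date and their last purchase has no next_order_date.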

Requirement 3: query the customers whose purchases fall in the earliest 20% of all orders

select * from (
    select *,
           ntile(5) over (order by order_date) as n
    from orders
) t1 where n = 1;
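
To see how ntile handles a row count that does not divide evenly, here is a small sketch on inline data (the CTE `t` and its seven ids are hypothetical, not taken from the `orders` table).

```sql
-- Hypothetical inline data: seven rows split into three buckets.
with t as (
    select 1 as id
    union all select 2
    union all select 3
    union all select 4
    union all select 5
    union all select 6
    union all select 7
)
select id,
       -- bucket number from 1 to 3, assigned in id order
       ntile(3) over (order by id) as bucket
from t;
```

Seven rows cannot be split evenly into three buckets, so one bucket receives 3 rows and the other two receive 2 rows each. For the same reason, `ntile(5)` with `n = 1` returns only approximately the first 20% when the number of orders is not a multiple of 5.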

Ranking function examples

Raw data

Bob Chinese 85
Alex Chinese 76
Bill Chinese 78
David Chinese 92
Jack Chinese 69
Lucy Chinese 74
LiLy Chinese 78
Bob Maths 91
Alex Maths 82
Bill Maths 69
David Maths 60
Jack Maths 69
Lucy Maths 71
LiLy Maths 82
Bob English 60
Alex English 62
Bill English 85
David English 85
Jack English 69
Lucy English 78
LiLy English 93

Example

-- Create the table (fields in the data file are separated by a single space)
create table scores (
    name    string,
    subject string,
    score   int
) row format delimited fields terminated by ' ';
-- Load the data from the local file system
load data local inpath '/opt/hive_data/scores' into table scores;
-- Sample a few rows to check the load
select *
from scores tablesample (5 rows);
-- Rank the scores within each subject in descending order
select *,
       row_number() over (partition by subject order by score desc) as rn,
       rank() over (partition by subject order by score desc)       as ra,
       dense_rank() over (partition by subject order by score desc) as dr
from scores;
-- Get the top three of each subject (rank keeps ties, so a subject can return more than three rows)
select * from (
    select *, rank() over (partition by subject order by score desc) as n from scores
) t where n <= 3;
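
To make the difference between the three ranking functions concrete, the sketch below restricts the ranking query to one subject and lists, as comments, the values that follow from the raw data above. It also shows a variant of the top-three query based on row_number. The tie-breaker on `name` is an assumption added here so that the result is deterministic; it is not part of the original query.

```sql
-- Ranking columns for the Chinese scores only.
select name,
       score,
       row_number() over (order by score desc) as rn,
       rank()       over (order by score desc) as ra,
       dense_rank() over (order by score desc) as dr
from scores
where subject = 'Chinese';
-- name   score  rn  ra  dr
-- David  92     1   1   1
-- Bob    85     2   2   2
-- Bill   78     3   3   3   (Bill and LiLy tie; which of them gets rn 3 is arbitrary)
-- LiLy   78     4   3   3
-- Alex   76     5   5   4
-- Lucy   74     6   6   5
-- Jack   69     7   7   6

-- Exactly three rows per subject: row_number never repeats a number.
-- Ordering by name as a second key is an assumed tie-breaker.
select * from (
    select *,
           row_number() over (partition by subject order by score desc, name) as n
    from scores
) t where n <= 3;
```

With `rank() <= 3`, the Chinese subject returns four rows because Bill and LiLy share rank 3. Use row_number when exactly three rows per subject are required, and rank or dense_rank when tied scores should be treated equally.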