Window Functions
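An ordinary aggregate function collapses the rows it processes into a single result row (one row per group when GROUP BY is used). A window function also computes across multiple rows, but it returns a result for every input row instead of collapsing them.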
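The difference can be reproduced with a small sketch. It uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for the target database (an assumption: SQLite 3.25 or later supports window functions), and it reuses the id/c_type values from the example later in this section. GROUP BY collapses each c_type into one row, while count(*) OVER (PARTITION BY c_type) keeps all nine rows and attaches the group's count to each of them.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the target database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pinko_0410 (id TEXT, c_type TEXT)")
conn.executemany(
    "INSERT INTO pinko_0410 VALUES (?, ?)",
    [("1", "P1"), ("2", "P1"), ("3", "P1"), ("4", "P1"),
     ("5", "P2"), ("6", "P2"), ("7", "P2"), ("8", "P2"), ("9", "P2")],
)

# Ordinary aggregate: one output row per group.
grouped = conn.execute(
    "SELECT c_type, count(*) FROM pinko_0410 GROUP BY c_type ORDER BY c_type"
).fetchall()

# The same aggregate used as a window function: every input row is kept,
# and each row carries the count of its own partition.
windowed = conn.execute(
    "SELECT id, c_type, count(*) OVER (PARTITION BY c_type) AS cnt "
    "FROM pinko_0410 ORDER BY id"
).fetchall()

print(grouped)  # [('P1', 4), ('P2', 5)]
for row in windowed:
    print(row)
```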
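• A window function call performs a calculation similar to an aggregate function over some portion of the rows selected by the query, which is why aggregate functions can also be used as window functions. A window function can scan all rows and uses its PARTITION BY option to divide the query's rows into groups (partitions).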
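• Column-store tables currently support only the window functions rank() and row_number(), plus the aggregate functions sum, count, avg, min, and max; row-store tables have no such restriction.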
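• A window function is invoked with the OVER keyword, which defines the window. The OVER clause groups the data (PARTITION BY) and sorts the rows within each group (ORDER BY). Ranking window functions such as row_number() then assign sequence numbers to the rows within each group.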
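• The ORDER BY inside a window function must be followed by a column name. If it is followed by a number, the number is treated as a constant, so it does not sort the rows by the corresponding target column.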
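The partition-then-number behavior described above can be modeled in plain Python. This is a conceptual sketch, not how the database implements it: `row_number` here is a hypothetical helper that mimics row_number() OVER (PARTITION BY ... ORDER BY ...), and the rows come from the example later in this section. Passing a constant sort key mimics ORDER BY 1: every row compares equal, so no ordering is imposed.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows (id, c_type, create_time) taken from the example in this section.
ROWS = [
    ("1", "P1", "2022-04-01 11:22:33"), ("2", "P1", "2022-04-01 11:22:36"),
    ("3", "P1", "2022-04-01 11:22:28"), ("4", "P1", "2022-04-01 11:22:57"),
    ("5", "P2", "2022-04-01 11:22:45"), ("6", "P2", "2022-04-03 11:22:33"),
    ("7", "P2", "2022-04-03 11:22:38"), ("8", "P2", "2022-04-03 11:22:20"),
    ("9", "P2", "2022-04-03 11:22:11"),
]


def row_number(rows, partition_key, order_key):
    """Hypothetical model of row_number() OVER (PARTITION BY ... ORDER BY ...).

    Returns a dict mapping each row's id to its row number.
    """
    result = {}
    # Bring rows of the same partition together, then walk each partition.
    for _, part in groupby(sorted(rows, key=partition_key), key=partition_key):
        # Order the rows inside the partition and number them from 1.
        for n, row in enumerate(sorted(part, key=order_key), start=1):
            result[row[0]] = n
    return result


by_time = row_number(ROWS, itemgetter(1), itemgetter(2))

# Mimics "ORDER BY 1" inside OVER: every row gets the same constant key.
# In SQL the resulting order is unspecified; this model simply keeps the
# input order because Python's sort is stable.
by_constant = row_number(ROWS, itemgetter(1), lambda row: 1)

print(by_time["3"], by_time["4"])  # 1 4
```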
Window Function Syntax
function_name ([expression [, expression ... ]]) OVER ( window_definition )
function_name ([expression [, expression ... ]]) OVER window_name
function_name ( * ) OVER ( window_definition )
function_name ( * ) OVER window_name
where window_definition is:
[ existing_window_name ] [ PARTITION BY expression [, ...] ] [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ frame_clause ]
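The OVER window_name form refers to a window defined once and reused by several functions. The sketch below assumes the standard SQL WINDOW clause (also available in PostgreSQL) as the place where window_name is defined; whether the target database accepts it is an assumption to verify. It runs on an in-memory SQLite database as a stand-in, with a subset of the example rows from this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pinko_0410 (id TEXT, c_type TEXT, create_time TEXT)")
conn.executemany(
    "INSERT INTO pinko_0410 VALUES (?, ?, ?)",
    [("1", "P1", "2022-04-01 11:22:33"), ("3", "P1", "2022-04-01 11:22:28"),
     ("5", "P2", "2022-04-01 11:22:45"), ("9", "P2", "2022-04-03 11:22:11")],
)

# Two functions share one named window w: it is defined once in the
# WINDOW clause and referenced with the "OVER window_name" form.
rows = conn.execute(
    """
    SELECT id,
           row_number() OVER w AS rn,
           max(id)      OVER w AS running_max
    FROM pinko_0410
    WINDOW w AS (PARTITION BY c_type ORDER BY create_time)
    ORDER BY c_type, rn
    """
).fetchall()

for row in rows:
    print(row)
```

Defining the window once keeps the PARTITION BY and ORDER BY lists consistent across functions instead of repeating them in every OVER clause.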
The frame_clause is one of:
{ RANGE | ROWS } frame_start
{ RANGE | ROWS } BETWEEN frame_start AND frame_end
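The window frame can be defined in two modes, RANGE and ROWS. ROWS specifies the frame in physical units (rows). RANGE specifies it as a logical offset based on the ORDER BY value, so rows that share the current row's ORDER BY value (its peers) fall into the frame together.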
In either mode, BETWEEN frame_start AND frame_end specifies the frame boundaries. If frame_end is omitted, it defaults to CURRENT ROW.
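The difference between ROWS and RANGE shows up only when the ORDER BY column has ties. In this sketch (an in-memory SQLite database stands in for the target database; the table t and its values are hypothetical), both frames use the single-bound form UNBOUNDED PRECEDING, so frame_end defaults to CURRENT ROW. ROWS produces a row-by-row running sum, while RANGE adds both rows with v = 2 at once because they are peers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
# Two rows share v = 2, so they are peers under ORDER BY v.
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,), (3,)])

# frame_end is omitted in both frames, so it defaults to CURRENT ROW.
query = """
SELECT v,
       sum(v) OVER (ORDER BY v ROWS  UNBOUNDED PRECEDING) AS rows_sum,
       sum(v) OVER (ORDER BY v RANGE UNBOUNDED PRECEDING) AS range_sum
FROM t
ORDER BY v
"""
result = conn.execute(query).fetchall()

# The two tied rows may come back in either order, so compare sorted sums.
rows_sums = sorted(r[1] for r in result)
range_sums = sorted(r[2] for r in result)
print(rows_sums)   # [1, 3, 5, 8]
print(range_sums)  # [1, 5, 5, 8]
```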
frame_start and frame_end can take the following values:
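• CURRENT ROW: the current row.
• N PRECEDING: the Nth row before the current row.
• UNBOUNDED PRECEDING: the first row of the current partition.
• N FOLLOWING: the Nth row after the current row.
• UNBOUNDED FOLLOWING: the last row of the current partition.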
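The boundary values above can be modeled for ROWS mode in plain Python. This is a conceptual sketch of one partition only; `rows_frame` and its offset encoding (None for UNBOUNDED, a negative number for N PRECEDING, 0 for CURRENT ROW, a positive number for N FOLLOWING) are hypothetical conventions, and RANGE mode is not modeled.

```python
from typing import Callable, List, Optional, Sequence


def rows_frame(values: Sequence[float],
               start: Optional[int],
               end: Optional[int],
               agg: Callable[[Sequence[float]], float] = sum) -> List[Optional[float]]:
    """Apply agg over a ROWS frame for every row of one partition.

    start/end are offsets relative to the current row:
      None as start -> UNBOUNDED PRECEDING
      None as end   -> UNBOUNDED FOLLOWING
      -n -> n PRECEDING, 0 -> CURRENT ROW, +n -> n FOLLOWING
    """
    n = len(values)
    out: List[Optional[float]] = []
    for i in range(n):
        # Clamp the frame to the partition boundaries.
        lo = 0 if start is None else max(0, i + start)
        hi = n - 1 if end is None else min(n - 1, i + end)
        frame = list(values[lo:hi + 1]) if lo <= hi else []
        # SQL aggregates such as sum() return NULL (None here) for an empty frame.
        out.append(agg(frame) if frame else None)
    return out


vals = [10, 20, 30, 40]
print(rows_frame(vals, -1, 1))    # 1 PRECEDING .. 1 FOLLOWING:           [30, 60, 90, 70]
print(rows_frame(vals, None, 0))  # UNBOUNDED PRECEDING .. CURRENT ROW:   [10, 30, 60, 100]
print(rows_frame(vals, 0, None))  # CURRENT ROW .. UNBOUNDED FOLLOWING:   [100, 90, 70, 40]
```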
Example:
create table pinko_0410(
id varchar
,c_type varchar
,create_time date
);
insert into pinko_0410 values ('1','P1','2022-04-01 11:22:33');
insert into pinko_0410 values ('2','P1','2022-04-01 11:22:36');
insert into pinko_0410 values ('3','P1','2022-04-01 11:22:28');
insert into pinko_0410 values ('4','P1','2022-04-01 11:22:57');
insert into pinko_0410 values ('5','P2','2022-04-01 11:22:45');
insert into pinko_0410 values ('6','P2','2022-04-03 11:22:33');
insert into pinko_0410 values ('7','P2','2022-04-03 11:22:38');
insert into pinko_0410 values ('8','P2','2022-04-03 11:22:20');
insert into pinko_0410 values ('9','P2','2022-04-03 11:22:11');
-- Query the table
select * from pinko_0410;
id c_type create_time
5 P2 2022-04-01 11:22:45
8 P2 2022-04-03 11:22:20
2 P1 2022-04-01 11:22:36
7 P2 2022-04-03 11:22:38
4 P1 2022-04-01 11:22:57
9 P2 2022-04-03 11:22:11
3 P1 2022-04-01 11:22:28
6 P2 2022-04-03 11:22:33
1 P1 2022-04-01 11:22:33
-- Generate row numbers
select * ,row_number() over() as rn
from pinko_0410;
id c_type create_time rn
1 P1 2022-04-01 11:22:33 1
8 P2 2022-04-03 11:22:20 2
7 P2 2022-04-03 11:22:38 3
2 P1 2022-04-01 11:22:36 4
5 P2 2022-04-01 11:22:45 5
4 P1 2022-04-01 11:22:57 6
9 P2 2022-04-03 11:22:11 7
3 P1 2022-04-01 11:22:28 8
6 P2 2022-04-03 11:22:33 9
-- Partition by c_type
select * ,row_number() over(partition by c_type) as rn
from pinko_0410;
id c_type create_time rn
8 P2 2022-04-03 11:22:20 1
9 P2 2022-04-03 11:22:11 2
6 P2 2022-04-03 11:22:33 3
7 P2 2022-04-03 11:22:38 4
5 P2 2022-04-01 11:22:45 5
1 P1 2022-04-01 11:22:33 1
2 P1 2022-04-01 11:22:36 2
4 P1 2022-04-01 11:22:57 3
3 P1 2022-04-01 11:22:28 4
-- Partition by c_type, order by create_time ascending
select * ,row_number() over(partition by c_type order by create_time) as rn
from pinko_0410;
id c_type create_time rn
5 P2 2022-04-01 11:22:45 1
9 P2 2022-04-03 11:22:11 2
8 P2 2022-04-03 11:22:20 3
6 P2 2022-04-03 11:22:33 4
7 P2 2022-04-03 11:22:38 5
3 P1 2022-04-01 11:22:28 1
1 P1 2022-04-01 11:22:33 2
2 P1 2022-04-01 11:22:36 3
4 P1 2022-04-01 11:22:57 4
-- Largest id in each c_type partition (id is varchar, so values compare as strings)
select * ,max(id) over(partition by c_type) as rn
from pinko_0410;
id c_type create_time rn
1 P1 2022-04-01 11:22:33 4
2 P1 2022-04-01 11:22:36 4
4 P1 2022-04-01 11:22:57 4
3 P1 2022-04-01 11:22:28 4
8 P2 2022-04-03 11:22:20 9
6 P2 2022-04-03 11:22:33 9
5 P2 2022-04-01 11:22:45 9
9 P2 2022-04-03 11:22:11 9
7 P2 2022-04-03 11:22:38 9
-- With ORDER BY, max(id) becomes a running maximum in create_time order
select * ,max(id) over(partition by c_type order by create_time) as rn
from pinko_0410;
id c_type create_time rn
5 P2 2022-04-01 11:22:45 5
9 P2 2022-04-03 11:22:11 9
8 P2 2022-04-03 11:22:20 9
6 P2 2022-04-03 11:22:33 9
7 P2 2022-04-03 11:22:38 9
3 P1 2022-04-01 11:22:28 3
1 P1 2022-04-01 11:22:33 3
2 P1 2022-04-01 11:22:36 3
4 P1 2022-04-01 11:22:57 4
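The two max(id) queries above differ only in the ORDER BY. Without ORDER BY, the frame is the whole partition. With ORDER BY but no frame_clause, PostgreSQL and the SQL standard use the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which turns max into a running maximum; the sample output is consistent with that. The sketch below reproduces both results on an in-memory SQLite database used as a stand-in (an assumption), storing create_time as text that sorts chronologically. The helper `run` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pinko_0410 (id TEXT, c_type TEXT, create_time TEXT)")
conn.executemany(
    "INSERT INTO pinko_0410 VALUES (?, ?, ?)",
    [("1", "P1", "2022-04-01 11:22:33"), ("2", "P1", "2022-04-01 11:22:36"),
     ("3", "P1", "2022-04-01 11:22:28"), ("4", "P1", "2022-04-01 11:22:57"),
     ("5", "P2", "2022-04-01 11:22:45"), ("6", "P2", "2022-04-03 11:22:33"),
     ("7", "P2", "2022-04-03 11:22:38"), ("8", "P2", "2022-04-03 11:22:20"),
     ("9", "P2", "2022-04-03 11:22:11")],
)


def run(window_definition):
    """Return {id: max(id) OVER (window_definition)} for every row."""
    # The window text is a fixed literal from this script, not user input.
    sql = f"SELECT id, max(id) OVER ({window_definition}) FROM pinko_0410"
    return dict(conn.execute(sql).fetchall())


whole = run("PARTITION BY c_type")
running = run("PARTITION BY c_type ORDER BY create_time")
# Spelling out the frame that ORDER BY implies gives the same result.
explicit = run("PARTITION BY c_type ORDER BY create_time "
               "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")

print(whole)
print(running)
print(explicit == running)  # True
```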
-- lag(id): id of the previous row in the partition (NULL for the first row)
select * ,lag(id) over(partition by c_type order by create_time rows between 1 PRECEDING and CURRENT ROW) as rn
from pinko_0410;
id c_type create_time rn
3 P1 2022-04-01 11:22:28
1 P1 2022-04-01 11:22:33 3
2 P1 2022-04-01 11:22:36 1
4 P1 2022-04-01 11:22:57 2
5 P2 2022-04-01 11:22:45
9 P2 2022-04-03 11:22:11 5
8 P2 2022-04-03 11:22:20 9
6 P2 2022-04-03 11:22:33 8
7 P2 2022-04-03 11:22:38 6
-- lag(id,2): id of the row two positions back; the frame clause does not limit it
select * ,lag(id,2) over(partition by c_type order by create_time rows between 1 PRECEDING and CURRENT ROW) as rn
from pinko_0410;
id c_type create_time rn
5 P2 2022-04-01 11:22:45
9 P2 2022-04-03 11:22:11
8 P2 2022-04-03 11:22:20 5
6 P2 2022-04-03 11:22:33 9
7 P2 2022-04-03 11:22:38 8
3 P1 2022-04-01 11:22:28
1 P1 2022-04-01 11:22:33
2 P1 2022-04-01 11:22:36 3
4 P1 2022-04-01 11:22:57 1
-- lag(id,2,'hello'): same as lag(id,2), but returns 'hello' instead of NULL
select * ,lag(id,2,'hello') over(partition by c_type order by create_time rows between 1 PRECEDING and CURRENT ROW) as rn
from pinko_0410;
id c_type create_time rn
5 P2 2022-04-01 11:22:45 hello
9 P2 2022-04-03 11:22:11 hello
8 P2 2022-04-03 11:22:20 5
6 P2 2022-04-03 11:22:33 9
7 P2 2022-04-03 11:22:38 8
3 P1 2022-04-01 11:22:28 hello
1 P1 2022-04-01 11:22:33 hello
2 P1 2022-04-01 11:22:36 3
4 P1 2022-04-01 11:22:57 1
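The sample output shows lag(id,2) returning the row two positions back even though the frame is ROWS BETWEEN 1 PRECEDING AND CURRENT ROW. This indicates that lag looks up rows by position in the ordered partition and is not limited by the frame clause. The following plain-Python sketch models that behavior; `lag` is a hypothetical helper with the partition (c_type) and order (create_time) columns fixed, and it is a conceptual model rather than the database's implementation.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows (id, c_type, create_time) from the example above.
ROWS = [
    ("1", "P1", "2022-04-01 11:22:33"), ("2", "P1", "2022-04-01 11:22:36"),
    ("3", "P1", "2022-04-01 11:22:28"), ("4", "P1", "2022-04-01 11:22:57"),
    ("5", "P2", "2022-04-01 11:22:45"), ("6", "P2", "2022-04-03 11:22:33"),
    ("7", "P2", "2022-04-03 11:22:38"), ("8", "P2", "2022-04-03 11:22:20"),
    ("9", "P2", "2022-04-03 11:22:11"),
]


def lag(rows, offset=1, default=None):
    """Hypothetical model of lag(id, offset, default)
    OVER (PARTITION BY c_type ORDER BY create_time).

    Returns {id: lagged value}. No frame is consulted: the lookup is purely
    by position inside the ordered partition.
    """
    result = {}
    for _, part in groupby(sorted(rows, key=itemgetter(1)), key=itemgetter(1)):
        ordered = sorted(part, key=itemgetter(2))
        for pos, row in enumerate(ordered):
            back = pos - offset
            # Before the start of the partition, fall back to the default.
            result[row[0]] = ordered[back][0] if back >= 0 else default
    return result


print(lag(ROWS))              # default None plays the role of NULL
print(lag(ROWS, 2, "hello"))
```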