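Window Functions
An ordinary function computes its result within a single row, and an ordinary aggregate function collapses all rows into one result row. A window function, by contrast, computes across rows and writes a result into every row.
•A window function call operates on a subset of the rows selected by the query and performs a calculation similar to an aggregate function, which is why aggregate functions can also be used as window functions. A window function can scan all rows and uses its PARTITION BY option to divide the queried rows into groups.
•Column-store tables currently support only the window functions rank(expression) and row_number(expression), plus the aggregate functions sum, count, avg, min, and max. Row-store tables have no such restriction.
•A window function requires the special keyword OVER, which specifies the window and is what makes the call a window function call. The OVER clause groups the data and sorts the elements within each group. The window function then generates sequence numbers for the values in each group.
•ORDER BY inside a window function must be followed by a column name. If it is followed by a number, the number is treated as a constant and has no sorting effect on the target column.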
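The points above can be sketched with a small example. The table demo_scores and its columns are hypothetical and exist only for illustration; the sketch assumes a row-store table, where any of these functions is allowed.

```sql
-- Hypothetical table, used only for illustration.
create table demo_scores (
    name  varchar,
    team  varchar,
    score int
);
insert into demo_scores values ('a', 'T1', 10);
insert into demo_scores values ('b', 'T1', 30);
insert into demo_scores values ('c', 'T2', 20);

-- Plain aggregate: the rows of each team collapse into one result row.
select team, sum(score) as team_total
from demo_scores
group by team;

-- The same aggregate used as a window function: all three rows are kept,
-- and each row carries the total of its own team (40 for T1, 20 for T2).
select name, team, score,
       sum(score) over (partition by team) as team_total
from demo_scores;

-- ORDER BY a column name: rows are numbered by descending score within each team.
select name, team, score,
       row_number() over (partition by team order by score desc) as rn
from demo_scores;

-- ORDER BY a number: 1 is treated as a constant, not as "the first column",
-- so it does not determine the numbering order.
select name, team, score,
       row_number() over (partition by team order by 1) as rn
from demo_scores;
```

The last query still runs, but the numbering within each team is not tied to any column, which is usually not what was intended.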
Window Function Syntax
function_name ([expression [, expression ... ]]) OVER ( window_definition )
function_name ([expression [, expression ... ]]) OVER window_name
function_name ( * ) OVER ( window_definition )
function_name ( * ) OVER window_name
The options of the window_definition clause are:
[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]
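As a rough illustration of the calling forms above, the sketch below uses an inline window definition, a named window, and a named window extended through existing_window_name. The table demo_events is hypothetical, and the sketch assumes that the SELECT statement accepts a WINDOW clause for declaring window_name, as in standard SQL; check this against the SELECT syntax of your database version.

```sql
-- Hypothetical table, used only for illustration.
create table demo_events (
    id       int,
    category varchar,
    amount   int
);
insert into demo_events values (1, 'A', 10);
insert into demo_events values (2, 'A', 20);
insert into demo_events values (3, 'B', 5);

-- OVER ( window_definition ): the window is defined inline.
select id, category, amount,
       sum(amount) over (partition by category order by id) as running_amount
from demo_events;

-- OVER window_name: the window is declared once (WINDOW clause assumed)
-- and shared by two functions, including the function_name ( * ) form.
select id, category, amount,
       row_number() over w as rn,
       count(*)     over w as cnt_so_far
from demo_events
window w as (partition by category order by id);

-- existing_window_name inside OVER ( ... ): start from the named window w
-- and add an ORDER BY to it.
select id, category, amount,
       row_number() over (w order by amount desc) as rn_by_amount
from demo_events
window w as (partition by category);
```

Naming a window keeps the PARTITION BY and ORDER BY in one place when several functions need the same definition.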
The options of the frame_clause clause are:
[ RANGE | ROWS ] frame_start
[ RANGE | ROWS ] BETWEEN frame_start AND frame_end
The window frame supports two modes, RANGE and ROWS. ROWS specifies the window in physical units (rows). RANGE specifies the window as a logical offset.
In both modes, BETWEEN frame_start AND frame_end specifies the frame boundaries. If frame_end is omitted, it defaults to CURRENT ROW.
frame_start and frame_end can take the following values:
•CURRENT ROW: the current row.
•N PRECEDING: the Nth row before the current row.
•UNBOUNDED PRECEDING: the first row of the current partition.
•N FOLLOWING: the Nth row after the current row.
•UNBOUNDED FOLLOWING: the last row of the current partition.
Example:
```sql
create table pinko_0410(
 id varchar
,c_type varchar
,create_time date
);
insert into pinko_0410 values ('1','P1','2022-04-01 11:22:33');
insert into pinko_0410 values ('2','P1','2022-04-01 11:22:36');
insert into pinko_0410 values ('3','P1','2022-04-01 11:22:28');
insert into pinko_0410 values ('4','P1','2022-04-01 11:22:57');
insert into pinko_0410 values ('5','P2','2022-04-01 11:22:45');
insert into pinko_0410 values ('6','P2','2022-04-03 11:22:33');
insert into pinko_0410 values ('7','P2','2022-04-03 11:22:38');
insert into pinko_0410 values ('8','P2','2022-04-03 11:22:20');
insert into pinko_0410 values ('9','P2','2022-04-03 11:22:11');

-- Query the table
select * from pinko_0410;
id  c_type  create_time
5   P2      2022-04-01 11:22:45
8   P2      2022-04-03 11:22:20
2   P1      2022-04-01 11:22:36
7   P2      2022-04-03 11:22:38
4   P1      2022-04-01 11:22:57
9   P2      2022-04-03 11:22:11
3   P1      2022-04-01 11:22:28
6   P2      2022-04-03 11:22:33
1   P1      2022-04-01 11:22:33

-- Generate row numbers
select * ,row_number() over() as rn from pinko_0410;
id  c_type  create_time          rn
1   P1      2022-04-01 11:22:33  1
8   P2      2022-04-03 11:22:20  2
7   P2      2022-04-03 11:22:38  3
2   P1      2022-04-01 11:22:36  4
5   P2      2022-04-01 11:22:45  5
4   P1      2022-04-01 11:22:57  6
9   P2      2022-04-03 11:22:11  7
3   P1      2022-04-01 11:22:28  8
6   P2      2022-04-03 11:22:33  9

-- Partition by c_type
select * ,row_number() over(partition by c_type) as rn from pinko_0410;
id  c_type  create_time          rn
8   P2      2022-04-03 11:22:20  1
9   P2      2022-04-03 11:22:11  2
6   P2      2022-04-03 11:22:33  3
7   P2      2022-04-03 11:22:38  4
5   P2      2022-04-01 11:22:45  5
1   P1      2022-04-01 11:22:33  1
2   P1      2022-04-01 11:22:36  2
4   P1      2022-04-01 11:22:57  3
3   P1      2022-04-01 11:22:28  4

-- Partition by c_type, order by create_time ascending
select * ,row_number() over(partition by c_type order by create_time) as rn from pinko_0410;
id  c_type  create_time          rn
5   P2      2022-04-01 11:22:45  1
9   P2      2022-04-03 11:22:11  2
8   P2      2022-04-03 11:22:20  3
6   P2      2022-04-03 11:22:33  4
7   P2      2022-04-03 11:22:38  5
3   P1      2022-04-01 11:22:28  1
1   P1      2022-04-01 11:22:33  2
2   P1      2022-04-01 11:22:36  3
4   P1      2022-04-01 11:22:57  4

-- Maximum id within each c_type partition
select * ,max(id) over(partition by c_type) as rn from pinko_0410;
id  c_type  create_time          rn
1   P1      2022-04-01 11:22:33  4
2   P1      2022-04-01 11:22:36  4
4   P1      2022-04-01 11:22:57  4
3   P1      2022-04-01 11:22:28  4
8   P2      2022-04-03 11:22:20  9
6   P2      2022-04-03 11:22:33  9
5   P2      2022-04-01 11:22:45  9
9   P2      2022-04-03 11:22:11  9
7   P2      2022-04-03 11:22:38  9

-- Maximum id from the start of the partition up to the current row, ordered by create_time
select * ,max(id) over(partition by c_type order by create_time) as rn from pinko_0410;
id  c_type  create_time          rn
5   P2      2022-04-01 11:22:45  5
9   P2      2022-04-03 11:22:11  9
8   P2      2022-04-03 11:22:20  9
6   P2      2022-04-03 11:22:33  9
7   P2      2022-04-03 11:22:38  9
3   P1      2022-04-01 11:22:28  3
1   P1      2022-04-01 11:22:33  3
2   P1      2022-04-01 11:22:36  3
4   P1      2022-04-01 11:22:57  4

-- lag(id): id of the previous row in the partition; empty for the first row
select * ,lag(id) over(partition by c_type order by create_time rows between 1 PRECEDING and CURRENT ROW) as rn from pinko_0410;
id  c_type  create_time          rn
3   P1      2022-04-01 11:22:28
1   P1      2022-04-01 11:22:33  3
2   P1      2022-04-01 11:22:36  1
4   P1      2022-04-01 11:22:57  2
5   P2      2022-04-01 11:22:45
9   P2      2022-04-03 11:22:11  5
8   P2      2022-04-03 11:22:20  9
6   P2      2022-04-03 11:22:33  8
7   P2      2022-04-03 11:22:38  6

-- lag(id,2): id from two rows earlier in the partition
select * ,lag(id,2) over(partition by c_type order by create_time rows between 1 PRECEDING and CURRENT ROW) as rn from pinko_0410;
id  c_type  create_time          rn
5   P2      2022-04-01 11:22:45
9   P2      2022-04-03 11:22:11
8   P2      2022-04-03 11:22:20  5
6   P2      2022-04-03 11:22:33  9
7   P2      2022-04-03 11:22:38  8
3   P1      2022-04-01 11:22:28
1   P1      2022-04-01 11:22:33
2   P1      2022-04-01 11:22:36  3
4   P1      2022-04-01 11:22:57  1

-- lag(id,2,'hello'): same as above, with 'hello' returned when no such row exists
select * ,lag(id,2,'hello') over(partition by c_type order by create_time rows between 1 PRECEDING and CURRENT ROW) as rn from pinko_0410;
id  c_type  create_time          rn
5   P2      2022-04-01 11:22:45  hello
9   P2      2022-04-03 11:22:11  hello
8   P2      2022-04-03 11:22:20  5
6   P2      2022-04-03 11:22:33  9
7   P2      2022-04-03 11:22:38  8
3   P1      2022-04-01 11:22:28  hello
1   P1      2022-04-01 11:22:33  hello
2   P1      2022-04-01 11:22:36  3
4   P1      2022-04-01 11:22:57  1
```
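To make the effect of frame_clause more visible, the following sketch applies three different frames to aggregate functions. It reuses the pinko_0410 table from the example above; the column aliases cnt_prev_cur, cnt_running, and max_all are illustrative names only.

```sql
-- Reuses the pinko_0410 table created in the example above.
select id, c_type, create_time
      -- Frame of at most two rows: the previous row and the current row.
      ,count(*) over(partition by c_type order by create_time
                     rows between 1 preceding and current row) as cnt_prev_cur
      -- Frame from the first row of the partition up to the current row.
      ,count(*) over(partition by c_type order by create_time
                     rows between unbounded preceding and current row) as cnt_running
      -- Frame covering the whole partition, even though ORDER BY is present.
      ,max(id)  over(partition by c_type order by create_time
                     rows between unbounded preceding and unbounded following) as max_all
from pinko_0410;
```

With the data above, cnt_prev_cur should be 1 for the first row of each partition and 2 for every later row, cnt_running should count up from 1 within each partition, and max_all should be 4 for every P1 row and 9 for every P2 row. Comparing max_all with the earlier max(id) over(partition by c_type order by create_time) result suggests that, when ORDER BY is given without a frame_clause, the frame ends at the current row, which matches the standard SQL default; treat this as an assumption to verify on your version.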