hive中explode相关的列转行总结
explode
explode 的输入只能是 array 或者map格式,按行输出array或map中的元素,比如:
sql
select explode(split('1,2,3',','))
输出
explode (split ('1,2,3', ',')) |
---|
1 |
2 |
3 |
sql
-- map explode
select explode(map('A','1','B','2','C','3'))
输出:
key | value |
---|---|
A | 1 |
B | 2 |
C | 3 |
posexplode()
在explode的基础上,输出了pos,结合lateral view 细腻表适用于对于多列进行转成多行的情况。
如:
sql
select posexplode(split('1,2,3',','))
输出:
pos | col |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
laterval view 虚拟表
结合udtf产出虚拟表:
sql
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
lateral view explode(array)示例
sql
select id,tim,single_tim
from test.a lateral view explode(split(tim,',')) t as single_tim
--输出:
id tim single_tim
a,b,c,d 2:00,3:00,4:00,5:00 2:00
a,b,c,d 2:00,3:00,4:00,5:00 3:00
a,b,c,d 2:00,3:00,4:00,5:00 4:00
a,b,c,d 2:00,3:00,4:00,5:00 5:00
f,b,c,d 1:10,2:20,3:30,4:40 1:10
f,b,c,d 1:10,2:20,3:30,4:40 2:20
f,b,c,d 1:10,2:20,3:30,4:40 3:30
f,b,c,d 1:10,2:20,3:30,4:40 4:40
Time taken: 51.289 seconds, Fetched: 8 row(s)
lateral view posexplode(array)示例
sql
select id,tim,single_id,single_tim from test.a
lateral view posexplode(split(id,',')) t as single_id_index, single_id
lateral view posexplode(split(tim,',')) t as single_yim_index, single_tim
where single_id_index = single_yim_index;
-- 输出:
id tim single_id single_tim
a,b,c,d 2:00,3:00,4:00,5:00 a 2:00
a,b,c,d 2:00,3:00,4:00,5:00 b 3:00
a,b,c,d 2:00,3:00,4:00,5:00 c 4:00
a,b,c,d 2:00,3:00,4:00,5:00 d 5:00
f,b,c,d 1:10,2:20,3:30,4:40 f 1:10
f,b,c,d 1:10,2:20,3:30,4:40 b 2:20
f,b,c,d 1:10,2:20,3:30,4:40 c 3:30
f,b,c,d 1:10,2:20,3:30,4:40 d 4:40
注意事项
- lateral view [outer] explode(),如果要拆分的字段有null值,需要使用lateral view outer 替代,避免数据缺失,参考文档见:[https://blog.csdn.net/weixin_45857425/article/details/117933039\];(https://blog.csdn.net/weixin_45857425/article/details/117933039)
- lateral view的位置是from后where条件前;
- 生成的虚拟表的表名不可省略;
- from后可带多个lateral view;
参考文档:
重点阅读:Hive--sql中的explode()函数和posexplode()函数
explode和lateral view explode使用记录