Hive explode: usage and exercises

Logan_addoil, 2024-01-26 18:12

Turning one row into multiple rows with explode

Example: a temporary table temp_table with a single column named 1st

1st
1,2,3
4,5,6

becomes one value per row: 1, 2, 3, 4, 5, 6.

Method 1: use explode directly

-- The column name 1st starts with a digit, so it is backtick-quoted to be safe.
select
    explode(split(`1st`, ','))
from temp_table;

Method 2: use lateral view explode(...) table_alias as column_alias

-- tmp is the alias of the generated virtual table, type the alias of its column.
select
    type
from temp_table
lateral view explode(split(`1st`, ',')) tmp as type;
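
One practical difference between the two methods: Hive does not allow other expressions in the select list next to a table-generating function such as explode, so method 1 cannot return the original column alongside the exploded values, while lateral view can. A minimal sketch against the same temp_table (the aliases tmp and item are illustrative, not from the original post):

```sql
-- Keep the original column next to each exploded value.
select
    `1st`,   -- original comma-separated string
    item     -- one element per output row
from temp_table
lateral view explode(split(`1st`, ',')) tmp as item;

-- With "outer", a row whose array is NULL or empty is kept
-- (item is NULL) instead of being dropped from the result.
select
    `1st`,
    item
from temp_table
lateral view outer explode(split(`1st`, ',')) tmp as item;
```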

Exercises:

The table is default.classinfo; its columns hold the class, the student names, and the scores.

class	student	score
Class 1	A,B,C	88,90,77
Class 2	D,E	80,92
Class 3	F,G,H	95,75,66
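
To try the exercises, the sample table can be created along these lines. This is a sketch that assumes Hive 0.14 or later (for insert ... values) and string types for all three columns, which the original post does not state; the class labels are written as 'Class 1' and so on.

```sql
-- Assumed schema: every column is a plain string.
create table if not exists default.classinfo (
    class   string,   -- class label
    student string,   -- comma-separated student names
    score   string    -- comma-separated scores, same order as student
);

insert into table default.classinfo values
    ('Class 1', 'A,B,C', '88,90,77'),
    ('Class 2', 'D,E',   '80,92'),
    ('Class 3', 'F,G,H', '95,75,66');
```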

Exercise 1: split the names so that each row holds one name

select
    class, student_name
from
    default.classinfo
    lateral view explode(split(student, ',')) t as student_name;

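Exercise 2: give each student a number, in name order, starting from 1

Use the posexplode function, which returns each element together with its position.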

select
    class, student_index + 1 as stu_index, student_name
from
    default.classinfo
    lateral view posexplode(split(student, ',')) t as student_index, student_name;

The +1 is needed because the positions returned by posexplode start at 0.

Exercise 3: match each student name with the corresponding score

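Note: exploding both columns pairs every name with every score, so Class 1 alone would produce 3 × 3 = 9 rows, which is clearly wrong. Use posexplode on both columns instead, then use where to keep only the rows whose positions match.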

select
    class, stu_name, stu_score
from
    default.classinfo
    lateral view posexplode(split(student, ',')) sn as stu_n_index, stu_name
    lateral view posexplode(split(score, ','))   ss as stu_s_index, stu_score
where
    -- keep only the pairs in which name and score share the same position
    stu_n_index = stu_s_index;
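
To see why the position filter matters, the cross product can be checked by counting the rows per class without it. A sketch assuming the sample data above (the output column name pairs is illustrative):

```sql
-- Without a position filter, each name is paired with every score of its class.
select
    class,
    count(*) as pairs   -- 9 for a class with 3 students, 4 for a class with 2
from default.classinfo
lateral view explode(split(student, ',')) sn as stu_name
lateral view explode(split(score, ','))   ss as stu_score
group by class;
```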

Exercise 4: rank the scores within each class

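Note:

row_number(): tied values still get distinct, consecutive numbers, e.g. 1, 2, 3, 4, 5 (the two tied rows get 2 and 3).

rank(): tied values share a number and the following numbers are skipped, e.g. 1, 2, 2, 4, 5 (two rows tie at 2, 3 is skipped, numbering continues at 4).

dense_rank(): tied values share a number and the next number follows without a gap, e.g. 1, 2, 2, 3, 4 (two rows tie at 2, numbering continues at 3).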
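
The difference between the three functions is easiest to see on data that contains a tie. A minimal sketch using an inline array of made-up scores (not taken from the classinfo table), assuming a Hive version that accepts select without a from clause (0.13 or later):

```sql
-- Five made-up scores with one tie at 90.
select
    score,
    row_number() over (order by score desc) as rn,    -- 1, 2, 3, 4, 5
    rank()       over (order by score desc) as rk,    -- 1, 2, 2, 4, 5
    dense_rank() over (order by score desc) as drk    -- 1, 2, 2, 3, 4
from (
    select explode(array(95, 90, 90, 80, 70)) as score
) t;
```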

Here we use rank().

select
    class,
    stu_name,
    stu_score,
    -- split() yields strings, so cast to int to rank numerically, not lexicographically
    rank() over (partition by class order by cast(stu_score as int) desc) as stu_rank
from
    default.classinfo
    lateral view posexplode(split(student, ',')) sn as stu_n_index, stu_name
    lateral view posexplode(split(score, ','))   ss as stu_s_index, stu_score
where
    stu_n_index = stu_s_index;