HiveSQL高级进阶技巧

目录

掌握下面的技巧,你的SQL水平将有一个质的提升!

1.删除

正常hive删除操作基本都是覆盖原数据;

sql 复制代码
insert overwrite tmp 
select * from tmp where id != '666';

2.更新:

更新也是覆盖操作;

sql 复制代码
insert overwrite tmp 
select id,label,
       if(id = '1' and label = 'grade','25',value) as value 
from tmp where id != '666';

3.行转列:

思路1:

先通过concat函数把多列数据拼接成一个长的字符串,分割符为逗号,再通过explode函数炸裂成多行,然后使用split函数根据分隔符进行切割;

sql 复制代码
-- Step03:最后将info的内容切分
select id,split(info,':')[0] as label,split(info,':')[1] as value
from 
(
-- Step01:先将数据拼接成"heit:180,weit:60,age:26"
    select id,concat('heit',':',height,',','weit',':',weight,',','age',':',age) as value 
    from tmp
) as tmp
-- Step02:然后在借用explode函数将数据膨胀至多行
lateral view explode(split(value,',')) mytable as info;

思路2:使用union all函数,多段union

sql 复制代码
select id,'heit' as label,height as value
union all 
select id,'weit' as label,weight as value
union all 
select id,'age' as label,age as value

4.列转行:

思路1:多表join,进行关联

sql 复制代码
select 
tmp1.id as id,tmp1.value as height,tmp2.value as weight,tmp3.value as age 
from 
(select id,label,value from tmp2 where label = 'heit') as tmp1
join
on tmp1.id = tmp2.id
(select id,label,value from tmp2 where label = 'weit') as tmp2
join
on tmp1.id = tmp2.id
(select id,label,value from tmp2 where label = 'age') as tmp3
on tmp1.id = tmp3.id;

思路2:使用max(if) 或max(case when ),可以根据实际情况换成sum函数

sql 复制代码
select 
id,
max(case when label = 'heit' then value  end) as height,
max(case when label = 'weit' then value  end) as weight,
max(case when label = 'age' then value  end) as age 
from tmp2 
group by
id;

思路3:map的思想,先拼接成map的形式,再取下标

sql 复制代码
select
id,tmpmap['height'] as height,tmpmap['weight'] as weight,tmpmap['age'] as age
from 
(
    select id,
           str_to_map(concat_ws(',',collect_set(concat(label,':',value))),',',':') as tmpmap  
    from tmp2 group by id
) as tmp1;

5.分析函数:

sql 复制代码
select id,label,value,
       lead(value,1,0)over(partition by id order by label) as lead,
       lag(value,1,999)over(partition by id order by label) as lag,
       first_value(value)over(partition by id order by label) as first_value,
       last_value(value)over(partition by id order by label) as last_value
from tmp;
sql 复制代码
select id,label,value,
       row_number()over(partition by id order by value) as row_number,
       rank()over(partition by id order by value) as rank,
       dense_rank()over(partition by id order by value) as dense_rank
from tmp;

6.多维分析

sql 复制代码
select col1,col2,col3,count(1),
       Grouping__ID 
from tmp 
group by col1,col2,col3
grouping sets(col1,col2,col3,(col1,col2),(col1,col3),(col2,col3),())
sql 复制代码
select col1,col2,col3,count(1),
       Grouping__ID 
from tmp 
group by col1,col2,col3
with cube;

7.数据倾斜

groupby:

sql 复制代码
select label,sum(cnt) as all from 
(
    select rd,label,sum(1) as cnt from 
    (
        select id,label,round(rand(),2) as rd,value from tmp1
    ) as tmp
    group by rd,label
) as tmp
group by label;

join:

sql 复制代码
select label,sum(value) as all from 
(
    select rd,label,sum(value) as cnt from
    (
        select tmp1.rd as rd,tmp1.label as label,tmp1.value*tmp2.value as value 
        from 
        (
            select id,round(rand(),1) as rd,label,value from tmp1
        ) as tmp1
        join
        (
            select id,rd,label,value from tmp2
            lateral view explode(split('0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9',',')) mytable as rd
        ) as tmp2
        on tmp1.rd = tmp2.rd and tmp1.label = tmp2.label
    ) as tmp1
    group by rd,label
) as tmp1
group by label;
相关推荐
zzzzzz3102 天前
9K Star 炸裂开源!这个 C 语言写的代码知识图谱,把 Linux 内核索引压缩到了 3 分钟
linux·服务器·sql
云技纵横4 天前
唯一索引 INSERT 死锁实战:5 秒复现交叉插入的 S 锁循环等待
sql·mysql
王小王-1236 天前
基于 Hive 的网易云音乐数据分析及可视化系统
hive·hadoop·数据分析·音乐数据分析·网易云音乐分析·hive音乐分析·hadoop网易云
BD_Marathon6 天前
SQL学习指南——视图
数据库·sql
2601_962072556 天前
李梦娇常识4600问|题库|打印版
sql·华为od·华为·c#·华为云·.net·harmonyos
HackTwoHub6 天前
Sqli-Scanner SQL注入SKILL自动化挖掘SQL注入,零依赖自动化SQL注入挖掘,赏金猎人
数据库·人工智能·sql·web安全·网络安全·自动化·系统安全
Volunteer Technology6 天前
Flink Table API与SQL(一)
大数据·sql·flink
持敬chijing6 天前
Web渗透之SQL注入-常用sql语句
sql·安全·web安全·网络安全
Theo·Chan6 天前
更换 Kingbase V9 License 踩坑记
sql·信创·kingbase
yangshicong6 天前
第16章:AI数据分析与Text-to-SQL
人工智能·python·sql·数据分析·langchain