Advanced HiveSQL Techniques

Master the techniques below and your SQL will improve markedly.

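1. Delete

Hive tables are normally not modified in place, so a delete is usually done by overwriting the table with the rows you want to keep:

-- rows whose id is NULL are dropped as well, because NULL != '666' is not true
insert overwrite table tmp
select * from tmp where id != '666';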
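
When the table is partitioned, it is often enough to rewrite only the affected partition instead of the whole table. A minimal sketch, assuming a hypothetical partition column `dt` and that the non-partition columns of `tmp` are `id`, `label` and `value` (neither assumption comes from the original text; the date is a placeholder):

```sql
-- Assumption: tmp is partitioned by a hypothetical column dt,
-- and its non-partition columns are id, label, value.
insert overwrite table tmp partition (dt = '2024-01-01')
select id, label, value          -- the partition column is not selected for a static partition
from tmp
where dt = '2024-01-01'          -- read only the partition being rewritten
  and id != '666';               -- keep everything except the rows to delete
```

Only the named partition is replaced; the other partitions are left untouched.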

2. Update

An update is also an overwrite: rewrite every row and change the target value on the way through.

insert overwrite table tmp
select id,label,
       if(id = '1' and label = 'grade','25',value) as value
from tmp;
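
When many rows change at once, the new values can come from a second table instead of a hard-coded `if`. A sketch under stated assumptions: `tmp_updates(id, label, value)` is a hypothetical staging table that is not part of the original, and it holds at most one row per `(id, label)`:

```sql
-- Assumption: tmp_updates(id, label, value) is a hypothetical table with the new values,
-- at most one row per (id, label); duplicates there would multiply rows in tmp.
insert overwrite table tmp
select t.id,
       t.label,
       coalesce(u.value, t.value) as value   -- new value if one exists, else keep the old one
from tmp t
left join tmp_updates u
  on t.id = u.id and t.label = u.label;
```

Because `coalesce` falls back to the old value, this form cannot set a value to NULL; that case would need an explicit flag column.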

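3. Columns to rows (wide to long)

Approach 1:

Use concat to join the columns into one long string of label:value pairs separated by commas, split that string on the commas and explode it into one row per pair, then split each pair on the colon to get the label and the value.

-- Step 3: split each info pair into label and value
select id,split(info,':')[0] as label,split(info,':')[1] as value
from 
(
-- Step 1: concatenate the columns into a string such as "heit:180,weit:60,age:26"
    select id,concat('heit',':',height,',','weit',':',weight,',','age',':',age) as value 
    from tmp
) as tmp
-- Step 2: explode the string into one row per pair
lateral view explode(split(value,',')) mytable as info;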
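
The concat/split round trip can be avoided, because `explode` also accepts a map and then emits one row per key with two output columns. A minimal sketch of that alternative (it is not the original author's method), assuming `height`, `weight` and `age` have the same type:

```sql
-- Assumption: height, weight and age share one type (cast them otherwise),
-- because all values of a Hive map must have the same type.
select id, label, value
from tmp
lateral view explode(map('heit', height, 'weit', weight, 'age', age)) kv as label, value;
```

This keeps the values in their original type instead of turning them into strings, and it is not affected by commas or colons inside the data.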

Approach 2: use union all, with one select per column

select id,'heit' as label,height as value from tmp
union all 
select id,'weit' as label,weight as value from tmp
union all 
select id,'age' as label,age as value from tmp;

4. Rows to columns (long to wide)

Approach 1: join the table to itself, once per label

select 
h.id as id,h.value as height,w.value as weight,a.value as age 
from 
(select id,value from tmp2 where label = 'heit') as h
join
(select id,value from tmp2 where label = 'weit') as w
on h.id = w.id
join
(select id,value from tmp2 where label = 'age') as a
on h.id = a.id;

Approach 2: use max(if(...)) or max(case when ...); switch to sum when the matching values should be added up

select 
id,
max(case when label = 'heit' then value end) as height,
max(case when label = 'weit' then value end) as weight,
max(case when label = 'age' then value end) as age 
from tmp2 
group by
id;
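
The same pivot can be written with `if`. The sketch below also casts `value` to a number, on the assumption that `value` is stored as a string holding numeric text: `max` on strings compares them lexicographically, which matters as soon as one id has several rows for the same label.

```sql
-- Assumption: tmp2(id, label, value) stores value as a string containing a number.
select id,
       max(if(label = 'heit', cast(value as double), null)) as height,
       max(if(label = 'weit', cast(value as double), null)) as weight,
       max(if(label = 'age',  cast(value as double), null)) as age
from tmp2
group by id;
```

Rows that do not match a label contribute NULL, which `max` ignores. For the sum variant, replace `max` with `sum` in the same positions.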

Approach 3: build a map from the label:value pairs, then look up each key

select
id,tmpmap['heit'] as height,tmpmap['weit'] as weight,tmpmap['age'] as age
from 
(
    select id,
           str_to_map(concat_ws(',',collect_set(concat(label,':',value))),',',':') as tmpmap  
    from tmp2 group by id
) as tmp1;

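5. Analytic functions

select id,label,value,
       -- value of the next row in the partition, 0 if there is none
       lead(value,1,0) over(partition by id order by label) as next_value,
       -- value of the previous row in the partition, 999 if there is none
       lag(value,1,999) over(partition by id order by label) as prev_value,
       first_value(value) over(partition by id order by label) as first_val,
       -- with the default frame, last_value stops at the current row, so the frame is widened
       last_value(value) over(partition by id order by label
                              rows between unbounded preceding and unbounded following) as last_val
from tmp;

select id,label,value,
       row_number() over(partition by id order by value) as rn,
       rank() over(partition by id order by value) as rnk,
       dense_rank() over(partition by id order by value) as dense_rnk
from tmp;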
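
The three ranking functions differ only in how they treat ties. For values 10, 10, 20 in one partition, `row_number` gives 1, 2, 3, `rank` gives 1, 1, 3 and `dense_rank` gives 1, 1, 2. A common use is picking the top row per group; a minimal sketch, reusing `tmp(id, label, value)` with the illustrative aliases `rn` and `ranked`:

```sql
-- Keep, for each id, the single row with the largest value.
select id, label, value
from (
    select id, label, value,
           row_number() over (partition by id order by value desc) as rn
    from tmp
) ranked
where rn = 1;
```

With `row_number`, ties are broken arbitrarily, so exactly one row per id survives. Using `rank` instead would keep every row tied for first place.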

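6. Multidimensional analysis

grouping sets computes the aggregate for each listed combination of columns in one query; grouping__id identifies which combination produced each row.

select col1,col2,col3,count(1) as cnt,
       grouping__id 
from tmp 
group by col1,col2,col3
grouping sets(col1,col2,col3,(col1,col2),(col1,col3),(col2,col3),());

with cube computes every combination of the group by columns:

select col1,col2,col3,count(1) as cnt,
       grouping__id 
from tmp 
group by col1,col2,col3
with cube;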
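
When the columns form a hierarchy, `with rollup` is a narrower relative of `with cube`: it only aggregates along prefixes of the group by list. A sketch showing the rollup and the grouping sets it stands for, using the same `tmp(col1, col2, col3)`:

```sql
-- Aggregates for (col1, col2, col3), (col1, col2), (col1) and the grand total.
select col1, col2, col3, count(1) as cnt, grouping__id
from tmp
group by col1, col2, col3
with rollup;

-- The same result spelled out with grouping sets.
select col1, col2, col3, count(1) as cnt, grouping__id
from tmp
group by col1, col2, col3
grouping sets ((col1, col2, col3), (col1, col2), (col1), ());
```

The numeric values of `grouping__id` depend on the Hive version (the computation changed in Hive 2.3.0), so it is safer to check them on your own cluster before hard-coding them in filters.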

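7. Data skew

group by: add a random salt to each row, aggregate by (salt, label) first so that a hot label is spread over many reducers, then aggregate the partial results by label.

select label,sum(cnt) as total from 
(
    select rd,label,sum(1) as cnt from 
    (
        select id,label,round(rand(),2) as rd,value from tmp1
    ) as salted
    group by rd,label
) as partial
group by label;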
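
Hive can also apply a similar two-stage plan by itself through configuration. A sketch of the relevant session settings; whether they help depends on the data and the Hive version, so treat them as something to try rather than a guaranteed fix:

```sql
-- Partial aggregation on the map side before data is shuffled.
set hive.map.aggr=true;
-- Ask Hive to add an extra job that distributes group by keys randomly
-- before the final aggregation.
set hive.groupby.skewindata=true;

select label, count(1) as total
from tmp1
group by label;
```

The manual salting shown earlier gives more control (for example over the number of salt buckets), while the settings require no change to the query.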

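join: salt the skewed table with a random bucket number, replicate the other table once per bucket, join on (bucket, label), then aggregate in two steps.

select label,sum(value) as total from 
(
    select rd,label,sum(value) as value from
    (
        select salted.rd as rd,salted.label as label,salted.value*replicated.value as value 
        from 
        (
            -- integer salt 0..9, so every salt value has a matching copy on the other side
            select id,cast(floor(rand()*10) as int) as rd,label,value from tmp1
        ) as salted
        join
        (
            -- one copy of each tmp2 row per salt value
            select id,rd,label,value from tmp2
            lateral view explode(array(0,1,2,3,4,5,6,7,8,9)) mytable as rd
        ) as replicated
        on salted.rd = replicated.rd and salted.label = replicated.label
    ) as joined
    group by rd,label
) as partial
group by label;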
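
Before salting a join by hand, it can be worth trying Hive's built-in options. A sketch of the session settings involved; the threshold shown is Hive's documented default, not a tuned recommendation:

```sql
-- Handle heavily repeated join keys in a separate follow-up job.
set hive.optimize.skewjoin=true;
-- A key is treated as skewed once it has more than this many rows.
set hive.skewjoin.key=100000;
-- Convert the join to a map join automatically when one side is small enough,
-- which removes the shuffle on the join key altogether.
set hive.auto.convert.join=true;

select a.label, sum(a.value * b.value) as total
from tmp1 a
join tmp2 b
  on a.label = b.label
group by a.label;
```

If `tmp2` is small, the map join usually removes the skew problem on its own; salting is mainly needed when both sides are large.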