hive和presto的求数组长度函数区别及注意事项

1、任务

获取邮箱字符串'@'后字符串 ,求长度

2、hive & spark-sql 求数组长度的函数 size

sql 复制代码
hive & spark-sql 求数组长度的函数 size


select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
FROM 
(select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;

select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
FROM 
(select 'jack@126.com' as email union select 'tom@126.com.cn' as email) tb_mid;


2	["tom","126.com.cn"]	tom	126.com.cn
2	["jack","126.com"]	jack	126.com
Time taken: 0.723 seconds, Fetched 2 row(s)

3、presto 求数组长度的函数 cardinality

sql 复制代码
presto  求数组长度的函数 cardinality

select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM 
(select 'jack@126.com' as email union select 'tom@126.com.cn' as email) tb_mid;

_col0 |       _col1       | _col2 |   _col3    
-------+-------------------+-------+------------
     2 | [tom, 126.com.cn] | tom   | 126.com.cn 
     2 | [jack, 126.com]   | jack  | 126.com    
(2 rows)


select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM 
(select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;


Query 20231019_070945_20009_n9u2s failed: line 3:9: Column 'jack@126.com' cannot be resolved
select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM
(select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid

4、注意事项

1)、在计算数组长度的时候,hive和presto的函数不同

其中hive的size函数默认数组的下标从0开始

presto的cardinality函数默认数组的下标从1开始

2)、presto 不支持双引号 ,而hive 既支持单引号,也支持双引号

sql 复制代码
presto> SELECT 
     -> email,
     -> (case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
     -> FROM 
     -> (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;
Query 20231016_070153_17958_p9f2s failed: line 5:9: Column 'jack@126.com' cannot be resolved
SELECT
email,
(case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
FROM
(select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid
相关推荐
随心............17 小时前
sqoop采集完成后导致hdfs数据与Oracle数据量不符的问题。怎么解决?
hive·hadoop·sqoop
随心............2 天前
yarn面试题
大数据·hive·spark
随心............2 天前
在开发过程中遇到问题如何解决,以及两个经典问题
hive·hadoop·spark
yumgpkpm3 天前
CMP (类ClouderaCDP7.3(404次编译) )华为鲲鹏Aarch64(ARM)信创环境 查询2100w行 hive 查询策略
数据库·数据仓库·hive·hadoop·flink·mapreduce·big data
starfalling10244 天前
【hive】一种高效增量表的实现
hive
D明明就是我4 天前
Hive 拉链表
数据仓库·hive·hadoop
嘉禾望岗5035 天前
hive join优化和数据倾斜处理
数据仓库·hive·hadoop
yumgpkpm5 天前
华为鲲鹏 Aarch64 环境下多 Oracle 数据库汇聚操作指南 CMP(类 Cloudera CDP 7.3)
大数据·hive·hadoop·elasticsearch·zookeeper·big data·cloudera
忧郁火龙果5 天前
六、Hive的基本使用
数据仓库·hive·hadoop
忧郁火龙果5 天前
五、安装配置hive
数据仓库·hive·hadoop