Hive vs. Presto: array length functions, differences, and caveats

1. Task

Split an email address on '@', take the part after the '@', and get the length of the resulting array.

2. Hive & spark-sql: the array length function is size

select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
FROM 
(select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid;

select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
FROM 
(select 'tom@126.com.cn' as email union select 'jack@126.com' as email) tb_mid;


2	["tom","126.com.cn"]	tom	126.com.cn
2	["jack","126.com"]	jack	126.com
Time taken: 0.723 seconds, Fetched 2 row(s)

3. Presto: the array length function is cardinality

select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM 
(select 'tom@126.com.cn' as email union select 'jack@126.com' as email) tb_mid;

_col0 |       _col1       | _col2 |   _col3    
-------+-------------------+-------+------------
     2 | [tom, 126.com.cn] | tom   | 126.com.cn 
     2 | [jack, 126.com]   | jack  | 126.com    
(2 rows)


select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM 
(select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid;


Query 20231019_070945_20009_n9u2s failed: line 3:9: Column 'tom@126.com.cn' cannot be resolved
select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM
(select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid

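4. Caveats

1) Hive and Presto use different functions to compute array length.

Hive uses size, and Hive array subscripts start at 0, so split(email, '@')[1] is the part after the '@'.

Presto uses cardinality, and Presto array subscripts start at 1, so split(email, '@')[2] is the part after the '@'.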
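
To make the index difference concrete, here is a minimal side-by-side sketch of the same task in both engines. The aliases `n_parts` and `email_suffix` are illustrative names, not from the original queries, and the sample addresses are the ones visible in the query results above.

```sql
-- Hive / spark-sql: size() for the length, 0-based subscripts.
-- Index 1 is the second element, i.e. the part after '@'.
SELECT size(split(email, '@'))  AS n_parts,
       split(email, '@')[1]     AS email_suffix
FROM (SELECT 'tom@126.com.cn' AS email
      UNION
      SELECT 'jack@126.com' AS email) tb_mid;

-- Presto: cardinality() for the length, 1-based subscripts.
-- Index 2 is the second element, i.e. the part after '@'.
SELECT cardinality(split(email, '@')) AS n_parts,
       split(email, '@')[2]           AS email_suffix
FROM (SELECT 'tom@126.com.cn' AS email
      UNION
      SELECT 'jack@126.com' AS email) tb_mid;
```

When porting a query from one engine to the other, both the function name and every array subscript need to be adjusted.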

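2) Presto does not accept double quotes around string literals. It treats double-quoted text as an identifier such as a column name, which is why the queries fail with "Column ... cannot be resolved". String literals in Presto must use single quotes. Hive accepts both single and double quotes for string literals.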

presto> SELECT 
     -> email,
     -> (case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
     -> FROM 
     -> (select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid;
Query 20231016_070153_17958_p9f2s failed: line 5:9: Column 'tom@126.com.cn' cannot be resolved
SELECT
email,
(case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
FROM
(select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid
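
For comparison, a sketch of how the failing Presto query could be written so that it runs. It makes two changes, and the second is an assumption about intent: the string literals use single quotes, and the subscript is [2] rather than [1], on the assumption that the alias email_suffix means the part after the '@' (in Presto, [1] returns the part before it).

```sql
-- Presto: single-quoted string literals, 1-based array subscript.
SELECT
  email,
  CASE
    -- Only addresses with exactly one '@' split into two parts.
    WHEN cardinality(split(email, '@')) = 2 THEN split(email, '@')[2]
    ELSE ''
  END AS email_suffix
FROM (
  SELECT 'tom@126.com.cn' AS email
  UNION
  SELECT 'jack@126.com' AS email
) tb_mid;
```

The cardinality check guards the subscript, because in Presto a subscript beyond the end of the array fails the query.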