Array-length functions in Hive and Presto: differences and caveats

1. Task

Split an email string on '@', take the part after the '@', and compute the length of the resulting array.

2. Array length in Hive and Spark SQL: size



-- Hive accepts double-quoted string literals.
select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
FROM
(select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid;

-- Hive also accepts single-quoted string literals; the result is the same.
select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
FROM
(select 'tom@126.com.cn' as email union select 'jack@126.com' as email) tb_mid;


2	["tom","126.com.cn"]	tom	126.com.cn
2	["jack","126.com"]	jack	126.com
Time taken: 0.723 seconds, Fetched 2 row(s)

3. Array length in Presto: cardinality


-- Presto: single-quoted string literals; array subscripts start at 1.
select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM
(select 'tom@126.com.cn' as email union select 'jack@126.com' as email) tb_mid;

_col0 |       _col1       | _col2 |   _col3    
-------+-------------------+-------+------------
     2 | [tom, 126.com.cn] | tom   | 126.com.cn 
     2 | [jack, 126.com]   | jack  | 126.com    
(2 rows)


-- The same query with double quotes fails in Presto.
select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM
(select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid;


Query 20231019_070945_20009_n9u2s failed: line 3:9: Column 'tom@126.com.cn' cannot be resolved
select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
FROM
(select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid

4. Caveats

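1) Hive and Presto use different functions for array length: size in Hive (and Spark SQL), cardinality in Presto. Both return the same count for the same array.

Array subscripts also differ. In Hive, array indexes start at 0, so split(email, '@')[1] is the part after the '@'.

In Presto, array indexes start at 1, so the same part is split(email, '@')[2]. The index base is a property of the subscript operator, not of size or cardinality.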
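To make the index difference concrete, here is a minimal sketch that extracts the part after the '@' in each engine. The two statements are meant to be run separately, each in its own engine. The sample addresses are inferred from the query results shown earlier, and the aliases n_parts, domain, domain_alt and t are illustrative names, not taken from the original queries.

```sql
-- Hive / Spark SQL: size() for the length, 0-based subscripts.
SELECT
    email,
    size(split(email, '@'))  AS n_parts,
    split(email, '@')[1]     AS domain       -- [1] is the second element
FROM (
    SELECT 'tom@126.com.cn' AS email
    UNION ALL
    SELECT 'jack@126.com' AS email
) t;

-- Presto: cardinality() for the length, 1-based subscripts.
SELECT
    email,
    cardinality(split(email, '@'))  AS n_parts,
    split(email, '@')[2]            AS domain,      -- [2] is the second element
    split_part(email, '@', 2)       AS domain_alt   -- field indexes also start at 1
FROM (
    SELECT 'tom@126.com.cn' AS email
    UNION ALL
    SELECT 'jack@126.com' AS email
) t;
```

In Presto, split_part returns NULL when the requested field does not exist, which may be preferable to a subscript when some addresses contain no '@'.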

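2) Presto does not accept double quotes around string literals. It treats a double-quoted token as an identifier, such as a column name, which is why the query below fails with "Column ... cannot be resolved". Hive accepts both single and double quotes for strings.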

presto> SELECT 
     -> email,
     -> (case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
     -> FROM 
     -> (select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid;
Query 20231016_070153_17958_p9f2s failed: line 5:9: Column 'tom@126.com.cn' cannot be resolved
SELECT
email,
(case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
FROM
(select "tom@126.com.cn" as email union select "jack@126.com" as email) tb_mid
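One way the failing statement above could be written so that Presto accepts it is sketched here. It assumes the intent is to return the part after the '@', so it uses single-quoted literals and subscript 2 rather than 1. The sample addresses are inferred from the results shown earlier. The double-quoted alias is only there to show where Presto does expect double quotes.

```sql
-- Presto: string literals take single quotes; double quotes delimit identifiers.
SELECT
    email,
    CASE
        -- Subscripts are 1-based in Presto, so [2] is the part after the '@'.
        WHEN cardinality(split(email, '@')) = 2 THEN split(email, '@')[2]
        ELSE ''
    END AS "email_suffix"
FROM (
    SELECT 'tom@126.com.cn' AS email
    UNION
    SELECT 'jack@126.com' AS email
) tb_mid;
```

Guarding the subscript with the cardinality check matters in Presto, because a subscript beyond the end of the array raises an error instead of returning NULL.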