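Common Hive SQL Functions with Examples

1 Introduction to Functions

Hive packages commonly used logic into functions for users to call, much like functions (methods) in Java.

Benefit: users do not have to write the same logic over and over; they can call the function directly.

Key point: users need to know what a function is called and what it does.

Hive provides a large number of built-in functions. By their characteristics they fall roughly into the following categories: single-row functions, aggregate functions, explode (table-generating) functions, and window functions.

The following commands display information about the built-in functions.

(1) List the built-in functions

show functions;

(2) Show the usage of a built-in function

desc function upper;

(3) Show detailed information about a built-in function

desc function extended upper;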

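2 Single-Row Functions

A single-row function is one-in, one-out: it takes one row as input and produces one row as output.

By purpose, single-row functions fall into categories such as date functions, string functions, collection functions, math functions, and flow-control functions.

(1) Arithmetic operations

Example: query every employee's salary and display it increased by 1.

select sal + 1 from emp;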

3 Numeric Functions

(1) round: rounds half up to the nearest value

select round(3.3);

(2) ceil: rounds up to the nearest integer

select ceil(3.1);

(3) floor: rounds down to the nearest integer

select floor(4.8);
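The three functions above differ most visibly on decimals and negative numbers. The following is a small sketch with made-up input values; the two-argument form of round, which keeps the given number of decimal places, is not used elsewhere in this article.

```sql
-- round with a second argument keeps that many decimal places: returns 3.14
select round(3.14159, 2);

-- ceil moves toward positive infinity: returns -3
select ceil(-3.1);

-- floor moves toward negative infinity: returns -4
select floor(-3.1);
```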

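4 String Functions

(1) substring: extracts part of a string

Syntax 1: substring(string A, int start)

Return type: string

Description: returns the part of string A from position start to the end. Positions start at 1, and a negative start counts from the end of the string.

Syntax 2: substring(string A, int start, int len)

Return type: string

Description: returns the part of string A that begins at position start and has length len.

Example: get all characters from the second character onward

select substring("atguigu",2);

Example: get all characters from the third-to-last character onward

select substring("atguigu",-3);

Example: starting at the third character, get 2 characters

select substring("atguigu",3,2);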

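(2) replace: replaces a substring

Syntax: replace(string A, string B, string C)

Return type: string

Description: replaces every occurrence of substring B in string A with C.

select replace('atguigu', 'a', 'A');

(3) regexp_replace: regular-expression replacement

Syntax: regexp_replace(string A, string B, string C)

Return type: string

Description: replaces the parts of string A that match the Java regular expression B with C. Note that escape characters are sometimes required; for example, \d is written as \\d inside the string literal.

select regexp_replace('100-200', '(\\d+)', 'num');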

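(4) regexp: regular-expression match

Syntax: A regexp B, where A is a string and B is a regular expression

Return type: boolean

Description: returns true if the string matches the regular expression, otherwise false. A match anywhere in the string is sufficient.

Example: the match succeeds and true is returned

select 'dfsaaaa' regexp 'dfsa+';

Example: the match fails and false is returned

select 'dfsaaaa' regexp 'dfsb+';

(5) repeat: repeats a string

Syntax: repeat(string A, int n)

Return type: string

Description: repeats string A n times.

select repeat('123', 3);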

(6) split: splits a string

Syntax: split(string str, string pat)

Return type: array

Description: splits str wherever the regular expression pat matches and returns the resulting pieces as an array.
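The split entry has no example, so here is a minimal sketch. The input strings are made up for illustration. Because pat is a regular expression, a metacharacter such as the dot has to be escaped when it is meant literally.

```sql
-- Split on a literal hyphen; this should yield the array ["a","b","c","d"].
select split('a-b-c-d', '-');

-- The dot is a regex metacharacter, so it is escaped to split on a literal dot.
select split('192.168.1.1', '\\.');
```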

(7) nvl: replaces null values

Syntax: nvl(A, B)

Description: returns A if A is not null; otherwise returns B.

select nvl(null,1);

(8) concat: concatenates strings

Syntax: concat(string A, string B, string C, ...)

Return type: string

Description: concatenates A, B, C, ... into a single string.

select concat('beijing','-','shanghai','-','shenzhen');

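(9) concat_ws: concatenates strings or an array of strings with a given separator

Syntax: concat_ws(string A, string... | array(string))

Return type: string

Description: joins several strings, or all elements of an array, using separator A.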
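A short sketch of both call forms, reusing the city names from the concat example. The first statement passes the strings one by one; the second passes a single array.

```sql
-- Separate string arguments: returns beijing-shanghai-shenzhen
select concat_ws('-', 'beijing', 'shanghai', 'shenzhen');

-- A single array argument: returns beijing-shenzhen-shanghai
select concat_ws('-', array('beijing', 'shenzhen', 'shanghai'));
```

Compared with concat, the separator is written once instead of being repeated between every pair of arguments.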

(10) get_json_object: parses a JSON string

Syntax: get_json_object(string json_string, string path)

Return type: string

Description: parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.

Example: get a specific field of an object inside a JSON array

select get_json_object('[{"name":"大海海","sex":"男","age":"25"},{"name":"小宋宋","sex":"男","age":"47"}]','$.[0].name');

Example: get a whole element of a JSON array

select get_json_object('[{"name":"大海海","sex":"男","age":"25"},{"name":"小宋宋","sex":"男","age":"47"}]','$.[0]');
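The same path form can address other elements and fields. The sketch below reuses the JSON array from the examples above and reads the age field of the second element (index 1). It assumes the '$.[n]' path form used in this article is accepted by the Hive version in use; if so, the result should be the string 47, since get_json_object returns a string.

```sql
-- Index 1 selects the second object in the array; .age selects its age field.
select get_json_object('[{"name":"大海海","sex":"男","age":"25"},{"name":"小宋宋","sex":"男","age":"47"}]','$.[1].age');
```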

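5 Date Functions

(1) unix_timestamp: returns the timestamp of the current time or of a specified time

Syntax: unix_timestamp([string date[, string pattern]])

Return type: bigint

Description: the first argument is the date-time string, and the second argument is the pattern that describes the format of that string.

select unix_timestamp('2022/08/08 08-08-08','yyyy/MM/dd HH-mm-ss');

(2) from_unixtime: converts a UNIX timestamp (the number of seconds from 1970-01-01 00:00:00 UTC to the given time) into a time string in the current time zone

Syntax: from_unixtime(bigint unixtime[, string format])

Return type: string

select from_unixtime(1659946088);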
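The two functions are often chained to convert a date string from one format to another. The sketch below reuses the values from the examples above; no expected output is given because the result depends on the time zone handling of the Hive version and session in use.

```sql
-- Parse a custom-formatted string into epoch seconds, then format it again.
select from_unixtime(
  unix_timestamp('2022/08/08 08-08-08', 'yyyy/MM/dd HH-mm-ss'),
  'yyyy-MM-dd HH:mm:ss'
);

-- The optional second argument of from_unixtime selects the output format.
select from_unixtime(1659946088, 'yyyy-MM-dd');
```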

(3) current_date: the current date

select current_date;

(4) current_timestamp: the current date and time, accurate to the millisecond

select current_timestamp;

(5) month: extracts the month from a date

Syntax: month(string date)

Return type: int

select month('2022-08-08 08:08:08');

(6) day: extracts the day of the month from a date

Syntax: day(string date)

Return type: int

select day('2022-08-08 08:08:08');

(7) hour: extracts the hour from a date

Syntax: hour(string date)

Return type: int

select hour('2022-08-08 08:08:08');

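(8) datediff: the number of days between two dates (the end date minus the start date)

Syntax: datediff(string enddate, string startdate)

Return type: int

Description: the first argument is the end date. In this example the first argument is the earlier date, so the result is negative.

select datediff('2021-08-08','2022-10-09');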
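The example above passes the earlier date as the first argument. For comparison, this sketch swaps the two dates so that the later date comes first, which gives the positive day count.

```sql
-- End date first, start date second: returns 427
select datediff('2022-10-09', '2021-08-08');
```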

(9) date_add: adds days to a date

Syntax: date_add(string startdate, int days)

Return type: string

Description: returns the date that is days days after the start date startdate.

select date_add('2022-08-08',2);

(10) date_sub: subtracts days from a date

Syntax: date_sub(string startdate, int days)

Return type: string

Description: returns the date that is days days before the start date startdate.

select date_sub('2022-08-08',2);

(11) date_format: formats a standard date as a string in the specified format

select date_format('2022-08-08','yyyy年-MM月-dd日');

6 Flow-Control Functions

(1) case when: conditional expression

Syntax 1: case when a then b [when c then d]* [else e] end

Return type: T

Description: returns b if a is true; returns d if c is true; otherwise returns e.

select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end from location;

Syntax 2: case a when b then c [when d then e]* [else f] end

Return type: T

Description: returns c if a equals b; returns e if a equals d; otherwise returns f.

select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end from location;
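The examples above use constants only. As a sketch of the more typical use on a column, the query below classifies salaries. It assumes the emp table with a numeric sal column from the earlier "sal + 1" example; the thresholds, the labels, and the sal_level alias are hypothetical.

```sql
-- Branches are checked in order, so a row gets the first label whose condition holds.
select
  sal,
  case
    when sal >= 3000 then 'high'
    when sal >= 1500 then 'medium'
    else 'low'
  end as sal_level
from emp;
```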

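(2) if: conditional test, similar to the ternary operator in Java

Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)

Return type: T

Description: returns valueTrue when testCondition is true; otherwise returns valueFalseOrNull.

Example: the condition holds, so '正确' ("correct") is returned

select if(10 > 5,'正确','错误');

Example: the condition does not hold, so '错误' ("wrong") is returned

select if(10 < 5,'正确','错误');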

7 Collection Functions

(1) size: the number of elements in a collection

select size(array('beijing','shenzhen','shanghai')) from location;

(2) map: creates a map

Syntax: map(key1, value1, key2, value2, ...)

Description: builds a map from the given key and value pairs.

select map('xiaohai',1,'dahai',2);

(3) map_keys: returns the keys of a map

select map_keys(map('xiaohai',1,'dahai',2));

(4) map_values: returns the values of a map

select map_values(map('xiaohai',1,'dahai',2));

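(5) array: creates an array

Syntax: array(val1, val2, ...)

Description: builds an array from the given arguments.

select array('1','2','3','4');

(6) array_contains: checks whether an array contains a given element

select array_contains(array('a','b','c','d'),'a');

(7) sort_array: sorts the elements of an array in ascending order

select sort_array(array('a','d','c'));

(8) struct: creates a struct from the given values

Syntax: struct(val1, val2, val3, ...)

Description: builds a struct from the given arguments. The arguments are the field values; the fields themselves are named col1, col2, col3, and so on.

select struct('name','age','weight');

(9) named_struct: creates a struct with the given field names and values

select named_struct('name','xiaosong','age',18,'weight',80);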
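Once built, the three collection types are read with different syntax: an index for an array, a key for a map, and a dot for a struct field. The sketch below reuses the values from the examples above; the aliases arr, m, s, and t are names chosen here for illustration.

```sql
-- Array indexes start at 0, so arr[0] is beijing;
-- m['dahai'] looks up a map key and is 2;
-- s.name reads a struct field and is xiaosong.
select
  t.arr[0],
  t.m['dahai'],
  t.s.name
from (
  select
    array('beijing','shenzhen','shanghai') as arr,
    map('xiaohai',1,'dahai',2) as m,
    named_struct('name','xiaosong','age',18) as s
) t;
```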

8 Advanced Aggregate Functions

(1) collect_list: collects values into a list, without removing duplicates

select
  sex,
  collect_list(job)
from
  employee
group by
  sex;

(2) collect_set: collects values into a set, removing duplicates

select
  sex,
  collect_set(job)
from
  employee
group by
  sex;
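The collected array is often flattened into a single string or counted. The sketch below combines collect_set with concat_ws and size from the earlier sections. It assumes the employee table from the examples above and that job is a string column; the aliases jobs and job_count are hypothetical.

```sql
-- One row per sex: the distinct jobs joined by commas, and how many there are.
select
  sex,
  concat_ws(',', collect_set(job)) as jobs,
  size(collect_set(job)) as job_count
from
  employee
group by
  sex;
```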

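9 Common Window Functions

See the following articles:

Window functions explained in detail (aggregate functions, illustrated)

Original link: https://blog.csdn.net/m0_52606060/article/details/129150481

Window functions explained in detail (window ranges ROWS and RANGE, illustrated)

Original link: https://blog.csdn.net/m0_52606060/article/details/129132985

10 User-Defined Functions

See the following article:

Hive user-defined functions with examples

Original link: https://blog.csdn.net/m0_52606060/article/details/134826464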
