A Detailed Summary of Common Hive Functions

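This article collects the built-in Hive functions that come up most often in day-to-day development and in interviews. Each function comes with a syntax note and a code example. Topics covered: string processing, date and time, conditional logic, aggregation, window (analytic) functions, collection operations, type conversion, and JSON parsing.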

Contents

I. String Functions
1. concat / concat_ws
2. substr / substring
3. length
4. upper / lower
5. trim / ltrim / rtrim
6. regexp_replace
7. regexp_extract
8. split
9. instr / locate
10. get_json_object

II. Date and Time Functions
11. current_date / current_timestamp
12. year / month / day / hour / minute / second
13. datediff
14. date_add / date_sub
15. to_date
16. from_unixtime / unix_timestamp
17. date_format
18. trunc / last_day / next_day

III. Conditional Functions
19. if
20. case when
21. coalesce
22. nvl
23. nullif / isnull / isnotnull

IV. Aggregate Functions
24. count / sum / avg / max / min
25. collect_set / collect_list
26. array_distinct / size
27. percentile / percentile_approx
28. histogram_numeric

V. Window Functions (Analytic Functions)
29. row_number / rank / dense_rank
30. lag / lead
31. first_value / last_value
32. Aggregate functions with over
33. ntile

VI. Collection and Complex-Type Functions
34. array_contains
35. sort_array
36. struct / named_struct
37. map functions: map_keys / map_values / map_contains_key

VII. Type Conversion and Math Functions
38. cast
39. round / floor / ceil
40. rand
41. abs / pow / sqrt

VIII. Other Useful Functions
42. explode (with Lateral View)
43. posexplode
44. reflect (calling Java methods)
45. hash
46. md5 / sha1 / sha2

IX. UDFs and Macros
47. Temporary functions (TEMPORARY FUNCTION)
48. Macros (CREATE TEMPORARY MACRO)

I. String Functions

1. `concat` / `concat_ws`

Purpose: concatenate strings.
concat(str1, str2, ...): concatenates the arguments directly; if any argument is NULL, the result is NULL.
concat_ws(separator, str1, str2, ...): joins the arguments with a separator and skips NULL arguments.

SELECT concat('Hello', ' ', 'Hive');        -- 'Hello Hive'
SELECT concat_ws('-', '2025', '04', '18');  -- '2025-04-18'
SELECT concat_ws(',', 'a', NULL, 'b');      -- 'a,b'
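The NULL behaviour described above is easy to trip over with nullable columns. The following is a small sketch, not a definitive recipe; `cast(NULL AS STRING)` merely stands in for a nullable column, and the placeholder text 'unknown' is an arbitrary choice.

```sql
-- concat returns NULL as soon as one argument is NULL
SELECT concat('user_', cast(NULL AS STRING));                        -- NULL

-- Substitute a placeholder first when NULL should not wipe out the result
SELECT concat('user_', coalesce(cast(NULL AS STRING), 'unknown'));   -- 'user_unknown'

-- concat_ws also accepts an array of strings
SELECT concat_ws('/', array('2025', '04', '18'));                    -- '2025/04/18'
```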

2. `substr` / `substring`

Purpose: extract a substring. substr(string, start[, length]); start is 1-based, and a negative start counts back from the end of the string. substring is a synonym.

SELECT substr('Apache Hive', 8, 4);    -- 'Hive'
SELECT substr('Apache Hive', -4);      -- 'Hive'

3. `length`

Purpose: return the length of a string, counted in characters.

SELECT length('Hive');   -- 4

4. `upper` / `lower`

Purpose: convert a string to upper or lower case.

SELECT upper('hive'), lower('HIVE');   -- 'HIVE', 'hive'

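5. `trim` / `ltrim` / `rtrim`

Purpose: remove spaces from both ends (trim), from the left end (ltrim), or from the right end (rtrim). To strip characters other than spaces, regexp_replace is a portable alternative.

SELECT trim('  Hive  ');      -- 'Hive'
SELECT ltrim('  Hive');       -- 'Hive'
SELECT rtrim('Hive  ');       -- 'Hive'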

6. `regexp_replace`

Purpose: replace every match of a regular expression. regexp_replace(string, pattern, replacement)

SELECT regexp_replace('2025-04-18', '-', '/');   -- '2025/04/18'
-- Remove every character that is not a digit
SELECT regexp_replace('abc123def', '[^0-9]', ''); -- '123'

7. `regexp_extract`

Purpose: extract the n-th capture group of a regular-expression match. regexp_extract(string, pattern, n)

SELECT regexp_extract('Hive 3.1.2', '(\\d+)\\.(\\d+)\\.(\\d+)', 2); -- '1'
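To make the group index concrete, here is the same pattern queried with other indexes. Index 0 refers to the whole match rather than a capture group. This is only an illustration on the literal from the example above.

```sql
SELECT regexp_extract('Hive 3.1.2', '(\\d+)\\.(\\d+)\\.(\\d+)', 0);  -- '3.1.2' (entire match)
SELECT regexp_extract('Hive 3.1.2', '(\\d+)\\.(\\d+)\\.(\\d+)', 1);  -- '3'     (first group)
SELECT regexp_extract('Hive 3.1.2', '(\\d+)\\.(\\d+)\\.(\\d+)', 3);  -- '2'     (third group)
```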

8. `split`

Purpose: split a string on a regular expression and return an array. split(string, pattern)

SELECT split('a,b,c', ',')[1];   -- 'b' (array indexes start at 0)
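Because the second argument is a regular expression, separators such as `.` and `|` have to be escaped. A short sketch with made-up literals:

```sql
-- '.' and '|' are regex metacharacters, so escape them with a double backslash
SELECT split('192.168.1.10', '\\.')[0];   -- '192'
SELECT split('a|b|c', '\\|')[2];          -- 'c'

-- Empty fields are kept as empty strings
SELECT size(split('a,b,,c', ','));        -- 4
```

Without the escape, `'.'` matches every character, which is a common source of arrays full of empty strings.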

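9. `instr` / `locate`

Purpose: return the 1-based position of the first occurrence of a substring. instr(str, substr); locate(substr, str[, pos]). Note that the two functions take their arguments in opposite order.

SELECT instr('Hive is great', 'is');     -- 6
SELECT locate('great', 'Hive is great'); -- 9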

10. `get_json_object`

Purpose: extract the value at a given path from a JSON string. get_json_object(json_string, path)

SELECT get_json_object('{"name":"Alice","age":25}', '$.name');  -- 'Alice'
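The path argument can also reach into nested objects and arrays. The JSON documents below are invented purely for illustration:

```sql
-- Nested field
SELECT get_json_object('{"user":{"name":"Alice","tags":["hive","sql"]}}', '$.user.name');     -- 'Alice'

-- Array element (0-based index)
SELECT get_json_object('{"user":{"name":"Alice","tags":["hive","sql"]}}', '$.user.tags[1]');  -- 'sql'

-- A path that does not exist yields NULL
SELECT get_json_object('{"user":{"name":"Alice"}}', '$.user.email');                          -- NULL
```

The result is always a string, so numeric fields usually need a cast afterwards.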

II. Date and Time Functions

11. `current_date` / `current_timestamp`

SELECT current_date;        -- e.g. 2026-04-18
SELECT current_timestamp;   -- e.g. 2026-04-18 10:30:00.123

12. `year` / `month` / `day` / `hour` / `minute` / `second`

SELECT year('2025-04-18'), month('2025-04-18'), day('2025-04-18');  -- 2025, 4, 18
SELECT hour('2025-04-18 14:30:00'), minute('2025-04-18 14:30:00'), second('2025-04-18 14:30:00');  -- 14, 30, 0

13. `datediff`

Purpose: return the number of days from startDate to endDate. datediff(endDate, startDate)

SELECT datediff('2025-04-18', '2025-04-10');  -- 8

14. `date_add` / `date_sub`

Purpose: add days to or subtract days from a date. date_add(date, days) / date_sub(date, days)

SELECT date_add('2025-04-18', 5);   -- '2025-04-23'
SELECT date_sub('2025-04-18', 3);   -- '2025-04-15'

15. `to_date`

Purpose: extract the date part of a timestamp.

SELECT to_date('2025-04-18 13:20:00');  -- '2025-04-18'

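16. `from_unixtime` / `unix_timestamp`

unix_timestamp([string[, pattern]]): converts a date string to a Unix timestamp in seconds.
from_unixtime(bigint[, pattern]): converts a Unix timestamp to a date string.
The numeric value depends on the time zone in effect; the results below assume UTC.

SELECT unix_timestamp('2025-04-18 10:00:00');       -- 1744970400 (UTC)
SELECT from_unixtime(1744970400, 'yyyy-MM-dd');     -- '2025-04-18'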
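A frequent use of this pair is converting a date string from one layout to another by going through a timestamp. The sketch below assumes both functions apply the same time zone in your Hive version; if they do not, the result can shift by the zone offset, so it is worth checking on your cluster.

```sql
-- 'yyyyMMdd' string -> epoch seconds -> 'yyyy-MM-dd' string
SELECT from_unixtime(unix_timestamp('20250418', 'yyyyMMdd'), 'yyyy-MM-dd');
-- expected '2025-04-18' when both calls use the same time zone

-- Slash-separated timestamp -> standard layout
SELECT from_unixtime(unix_timestamp('2025/04/18 10:00:00', 'yyyy/MM/dd HH:mm:ss'),
                     'yyyy-MM-dd HH:mm:ss');
-- expected '2025-04-18 10:00:00' under the same assumption
```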

17. `date_format`

Purpose: format a date with a pattern. date_format(date, format)

SELECT date_format('2025-04-18', 'yyyy年MM月dd日');   -- '2025年04月18日'

18. `trunc` / `last_day` / `next_day`

trunc(date, 'MM'): returns the first day of the month.
last_day(date): returns the last day of the month.
next_day(date, 'Monday'): returns the first Monday after the given date.

SELECT trunc('2025-04-18', 'MM');         -- '2025-04-01'
SELECT last_day('2025-04-18');            -- '2025-04-30'
SELECT next_day('2025-04-18', 'SUNDAY');  -- '2025-04-20' (the next Sunday; 2025-04-18 is a Friday)

III. Conditional Functions

19. `if`

Purpose: return true_value when the condition holds, otherwise false_value. if(condition, true_value, false_value)

SELECT if(1=1, 'Yes', 'No');   -- 'Yes'

20. `case when`

Purpose: multi-branch conditional logic. Branches are tested in order and the first one that matches wins.

SELECT case when score >= 90 then 'A'
            when score >= 60 then 'B'
            else 'C' end as grade
FROM scores;
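Conditional expressions are often combined with aggregates to count several categories in one pass. The following is a sketch on inline sample rows (the four scores are invented), using the same grade boundaries as above:

```sql
SELECT sum(CASE WHEN score >= 90 THEN 1 ELSE 0 END)                AS a_cnt,
       sum(CASE WHEN score >= 60 AND score < 90 THEN 1 ELSE 0 END) AS b_cnt,
       sum(if(score < 60, 1, 0))                                   AS c_cnt
FROM (
  SELECT 95 AS score
  UNION ALL SELECT 72 AS score
  UNION ALL SELECT 60 AS score
  UNION ALL SELECT 41 AS score
) s;
-- a_cnt = 1, b_cnt = 2, c_cnt = 1
```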

21. `coalesce`

Purpose: return the first non-NULL argument. coalesce(v1, v2, ...)

SELECT coalesce(NULL, NULL, 'Hive', 'Spark');  -- 'Hive'

22. `nvl`

Purpose: nvl(value, default) returns default when value is NULL, otherwise value.

SELECT nvl(NULL, 'default');   -- 'default'

23. `nullif` / `isnull` / `isnotnull`

nullif(a, b): returns NULL when a = b, otherwise a.
isnull(a): equivalent to a is null.
isnotnull(a): equivalent to a is not null.

SELECT nullif(5,5);      -- NULL
SELECT isnull(NULL);     -- true
SELECT isnotnull(NULL);  -- false

IV. Aggregate Functions

24. `count` / `sum` / `avg` / `max` / `min`

SELECT count(1), sum(sales), avg(price), max(price), min(price)
FROM orders;

25. `collect_set` / `collect_list`

Purpose: collect the values of a column within each group into an array. collect_set removes duplicates; collect_list keeps them.

SELECT dept, collect_set(name) AS distinct_names
FROM employees GROUP BY dept;
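A common follow-up is turning the collected array into a single delimited string. The sketch below runs on inline sample rows (names and departments are invented) and sorts the array so the output is deterministic:

```sql
SELECT dept,
       concat_ws(',', sort_array(collect_set(name))) AS names
FROM (
  SELECT 'eng' AS dept, 'Bob'   AS name
  UNION ALL SELECT 'eng' AS dept, 'Alice' AS name
  UNION ALL SELECT 'eng' AS dept, 'Bob'   AS name
  UNION ALL SELECT 'ops' AS dept, 'Carol' AS name
) e
GROUP BY dept;
-- eng  Alice,Bob
-- ops  Carol
```

Neither collect function guarantees element order on its own, which is why sort_array is applied before joining.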

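26. `array_distinct` / `size`

array_distinct(array): removes duplicate elements from an array. Availability depends on the version: it is missing from older Hive releases, where collect_set gives the same result during aggregation.
size(array|map): returns the number of elements.

SELECT array_distinct(collect_list(city)) FROM users;   -- same elements as collect_set(city)
SELECT size(collect_set(uid)) FROM logs;                -- number of distinct uid values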

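27. `percentile` / `percentile_approx`

percentile(col, p): exact percentile; col must be an integer type.
percentile_approx(col, p[, B]): approximate percentile; p is a value between 0 and 1 or an array of such values, and the optional B trades memory for accuracy.

SELECT percentile_approx(salary, 0.5) AS median FROM emp;   -- median
SELECT percentile_approx(salary, array(0.25,0.5,0.75)) FROM emp;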

28. `histogram_numeric`

Purpose: compute an approximate histogram with nbins bins. histogram_numeric(col, nbins)

SELECT histogram_numeric(age, 10) FROM users;  -- returns an array of structs {x, y}: bin center and height

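V. Window Functions (Analytic Functions)

29. `row_number` / `rank` / `dense_rank`

row_number(): consecutive numbering starting at 1; ties still get distinct numbers.
rank(): tied rows share a rank and the following ranks are skipped (1,2,2,4).
dense_rank(): tied rows share a rank and no ranks are skipped (1,2,2,3).

SELECT name, dept, salary,
       row_number() over (partition by dept order by salary desc) as rn
FROM emp;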
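The difference between the three functions only shows up when there are ties. The sketch below puts them side by side on inline sample rows (names and salaries are invented), followed by the usual top-N-per-group pattern on the `emp` table used above.

```sql
-- Side-by-side comparison with a tie on salary = 200
SELECT name, salary,
       row_number() OVER (ORDER BY salary DESC) AS rn,
       rank()       OVER (ORDER BY salary DESC) AS rk,
       dense_rank() OVER (ORDER BY salary DESC) AS drk
FROM (
  SELECT 'Ann' AS name, 300 AS salary
  UNION ALL SELECT 'Bob' AS name, 200 AS salary
  UNION ALL SELECT 'Cid' AS name, 200 AS salary
  UNION ALL SELECT 'Dee' AS name, 100 AS salary
) e;
-- salary 300 -> rn 1,       rk 1, drk 1
-- salary 200 -> rn 2 and 3, rk 2, drk 2   (which tied row gets 2 is arbitrary)
-- salary 100 -> rn 4,       rk 4, drk 3

-- Top 3 salaries per department: rank in a subquery, filter outside
SELECT name, dept, salary
FROM (
  SELECT name, dept, salary,
         row_number() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
  FROM emp
) ranked
WHERE rn <= 3;
```

The subquery is needed because a window function cannot be referenced in the WHERE clause of the same query level.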

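30. `lag` / `lead`

lag(col, n, default): value of col from the row n rows before the current row; default is returned when no such row exists.
lead(col, n, default): value of col from the row n rows after the current row.

SELECT dt, amount,
       lag(amount, 1, 0) over (order by dt) as prev_amount
FROM sales;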
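A typical application is a day-over-day change. The following sketch uses inline sample rows (dates and amounts are invented) and leaves out the default, so the first row has no previous value and its delta is NULL:

```sql
SELECT dt, amount,
       amount - lag(amount, 1) OVER (ORDER BY dt) AS delta,
       lead(dt, 1) OVER (ORDER BY dt)             AS next_dt
FROM (
  SELECT '2025-04-16' AS dt, 100 AS amount
  UNION ALL SELECT '2025-04-17' AS dt, 150 AS amount
  UNION ALL SELECT '2025-04-18' AS dt, 120 AS amount
) s;
-- 2025-04-16  100  NULL  2025-04-17
-- 2025-04-17  150  50    2025-04-18
-- 2025-04-18  120  -30   NULL
```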

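31. `first_value` / `last_value`

Return the first or last value in the window frame. Watch the default frame with last_value: when ORDER BY is present and no frame is given, the frame ends at the current row, so last_value returns the current row's value unless the frame is extended to UNBOUNDED FOLLOWING.

SELECT name, dept,
       first_value(salary) over (partition by dept order by salary) as min_salary
FROM emp;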
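The effect of the default frame on last_value is easiest to see next to an explicit full-partition frame. A minimal sketch on inline sample rows (names and salaries are invented):

```sql
SELECT name, salary,
       -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       last_value(salary) OVER (ORDER BY salary) AS lv_default,
       -- explicit frame covering the whole partition
       last_value(salary) OVER (ORDER BY salary
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                         AND UNBOUNDED FOLLOWING) AS lv_full
FROM (
  SELECT 'Ann' AS name, 100 AS salary
  UNION ALL SELECT 'Bob' AS name, 200 AS salary
  UNION ALL SELECT 'Cid' AS name, 300 AS salary
) e;
-- Ann  100  100  300
-- Bob  200  200  300
-- Cid  300  300  300
```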

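32. Aggregate functions with `over`

A typical pattern is the running total: sum(salary) over (partition by dept order by hire_date rows between unbounded preceding and current row). With ORDER BY but no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows with the same hire_date all receive the same total; spelling out the ROWS frame avoids that.

SELECT name, dept, salary,
       sum(salary) over (partition by dept order by hire_date
                         rows between unbounded preceding and current row) as running_total
FROM emp;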
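The same frame syntax also gives sliding windows. As an illustrative sketch on inline sample rows (dates and amounts are invented), a three-row moving average looks like this:

```sql
SELECT dt, amount,
       avg(amount) OVER (ORDER BY dt
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_3
FROM (
  SELECT '2025-04-15' AS dt, 100 AS amount
  UNION ALL SELECT '2025-04-16' AS dt, 150 AS amount
  UNION ALL SELECT '2025-04-17' AS dt, 120 AS amount
  UNION ALL SELECT '2025-04-18' AS dt, 90  AS amount
) s;
-- 2025-04-15  100  100.0
-- 2025-04-16  150  125.0
-- 2025-04-17  120  about 123.33
-- 2025-04-18  90   120.0
```

The first two rows average over fewer than three values because the frame is cut off at the start of the partition.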

33. `ntile`

Purpose: split the ordered rows of each partition into n buckets of as equal size as possible and return the bucket number (1 to n).

SELECT name, score, ntile(4) over (order by score) as quartile
FROM students;

VI. Collection and Complex-Type Functions

34. `array_contains`

Purpose: test whether an array contains a given element.

SELECT array_contains(array(1,2,3), 2);   -- true

35. `sort_array`

Purpose: sort an array in ascending order.

SELECT sort_array(array(3,1,2));   -- [1,2,3]

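36. `struct` / `named_struct`

Create a struct. struct(v1, v2, ...) names the fields col1, col2, and so on; named_struct(name1, v1, name2, v2, ...) lets you choose the field names.

SELECT named_struct('name', 'Alice', 'age', 25) as person;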

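37. `map` functions

map_keys(map): returns all keys as an array.
map_values(map): returns all values as an array.
map_contains_key(map, key): whether the map contains the key. This function may not exist in your Hive version; array_contains(map_keys(map), key) is a portable equivalent.

SELECT map_keys(map('a',1,'b',2));  -- ['a','b']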

VII. Type Conversion and Math Functions

38. `cast`

Purpose: explicit type conversion. cast(value AS type)

SELECT cast('123' AS INT);           -- 123
SELECT cast('2025-04-18' AS DATE);   -- 2025-04-18
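It helps to know what cast does with values that do not fit the target type. The following sketch shows two such cases plus a common guard; the fallback value 0 is an arbitrary choice for illustration.

```sql
-- A string that is not a number converts to NULL rather than raising an error
SELECT cast('abc' AS INT);                 -- NULL

-- A fractional number is truncated toward zero when cast to INT
SELECT cast(12.7 AS INT);                  -- 12

-- Guard against unparseable input with a fallback value
SELECT coalesce(cast('abc' AS INT), 0);    -- 0
```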

39. `round` / `floor` / `ceil`

round(col, d): rounds to d decimal places.
floor / ceil: round down / up to the nearest integer.

SELECT round(3.14159, 2);   -- 3.14
SELECT floor(3.9);          -- 3
SELECT ceil(3.1);           -- 4

40. `rand`

Purpose: return a random number between 0 and 1. rand([seed]); passing a seed makes the sequence reproducible.

SELECT rand();   -- a random decimal

41. `abs` / `pow` / `sqrt`

SELECT abs(-5), pow(2,3), sqrt(9);   -- 5, 8.0, 3.0 (pow and sqrt return DOUBLE)

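VIII. Other Useful Functions

42. `explode` (with Lateral View)

Purpose: expand an array or map into multiple rows. It is usually combined with LATERAL VIEW so that the other columns of the source row are kept.

SELECT id, hobby
FROM user_hobbies
LATERAL VIEW explode(hobbies) t AS hobby;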
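For readers without a `user_hobbies` table at hand, here is a self-contained sketch on inline data (the ids, hobbies, and map entries are invented). It also shows the map form, which produces a key column and a value column, and the OUTER variant.

```sql
-- Array: one output row per element
SELECT t.id, h.hobby
FROM (SELECT 1 AS id, array('reading', 'chess') AS hobbies) t
LATERAL VIEW explode(t.hobbies) h AS hobby;
-- 1  reading
-- 1  chess

-- Map: one output row per entry, with two column aliases
SELECT kv.k, kv.v
FROM (SELECT map('a', 1, 'b', 2) AS m) t
LATERAL VIEW explode(t.m) kv AS k, v;

-- OUTER keeps source rows whose array is empty or NULL (hobby is then NULL)
SELECT id, hobby
FROM user_hobbies
LATERAL VIEW OUTER explode(hobbies) h AS hobby;
```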

43. `posexplode`

Purpose: expand an array and also return the position index of each element.

SELECT pos, val
FROM (SELECT array('a','b','c') AS arr) t
LATERAL VIEW posexplode(arr) pe AS pos, val;   -- the table alias (pe) is required
-- Result: (0,'a'), (1,'b'), (2,'c')

44. `reflect`

Purpose: call a Java static method. reflect(class, method, arg1, ...)

SELECT reflect('java.util.UUID', 'randomUUID');  -- generates a random UUID

45. `hash`

Purpose: compute a hash value (int) of the arguments.

SELECT hash('Hive');   -- returns an integer

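46. `md5` / `sha1` / `sha2`

Cryptographic hash functions; each returns the digest as a hex string. sha2(string, bitLength) additionally takes the digest length in bits.

SELECT md5('password');        -- 5f4dcc3b5aa765d61d8327deb882cf99
SELECT sha2('password', 256);  -- SHA-256 digest as a 64-character hex string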

IX. UDFs and Macros

47. Temporary functions

Register a custom function (UDF) that is valid only for the current session.

ADD JAR /path/to/my-udf.jar;
CREATE TEMPORARY FUNCTION my_len AS 'com.example.MyLengthUDF';
SELECT my_len('hello');

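48. Macros

Purpose: define a reusable expression for the current session. CREATE TEMPORARY MACRO macro_name([param type, ...]) expression. Every parameter needs a declared type.

CREATE TEMPORARY MACRO square(x INT) x * x;
SELECT square(5);   -- 25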
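Macros can take several typed parameters and wrap any built-in expression. As a sketch, the macro name `full_name` and its parameter names below are hypothetical, chosen only for illustration:

```sql
-- Two typed parameters, body built from a built-in function
CREATE TEMPORARY MACRO full_name(given STRING, family STRING)
  concat_ws(' ', given, family);

SELECT full_name('Ada', 'Lovelace');   -- 'Ada Lovelace'

-- Remove the macro before the session ends, if desired
DROP TEMPORARY MACRO IF EXISTS full_name;
```

Because a macro is expanded as an expression, it suits short formulas; anything needing loops or external libraries still calls for a UDF.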
