| No. | Type | Link |
|---|---|---|
| 1 | Spark functions | 1. Spark Functions_Symbols |
| 2 | Spark functions | 2. Spark Functions_a/b/c |
| 3 | Spark functions | 3. Spark Functions_d/e/f/g/h/i/j/k/l |
| 4 | Spark functions | 4. Spark Functions_m/n/o/p/q/r |
| 5 | Spark functions | 5. Spark Functions_s/t |
| 6 | Spark functions | 6. Spark Functions_u/v/w/x/y/z |
21、U
ucase
ucase(str) - Returns str with all characters changed to uppercase.
Examples:
```sql
> SELECT ucase('SparkSql');
SPARKSQL
```
Since: 1.0.1
unbase64
unbase64(str) - Converts the argument from a base64 string str to binary.
Examples:
```sql
> SELECT unbase64('U3BhcmsgU1FM');
Spark SQL
```
Since: 1.5.0
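To see what the decoding step does outside Spark, here is a minimal Python sketch. It uses the standard base64 module as a stand-in; this is an assumption for illustration, not Spark's implementation, and malformed input may be handled differently:

```python
import base64


def unbase64(s: str) -> bytes:
    # Decode a base64 string into raw bytes, as unbase64 returns BINARY
    return base64.b64decode(s)


raw = unbase64("U3BhcmsgU1FM")
print(raw)                  # b'Spark SQL'
print(raw.decode("utf-8"))  # Spark SQL
```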
unhex
unhex(expr) - Converts hexadecimal expr to binary.
Examples:
```sql
> SELECT decode(unhex('537061726B2053514C'), 'UTF-8');
Spark SQL
```
Since: 1.5.0
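The example above decodes the resulting bytes as UTF-8 with decode(). Below is a Python sketch of the same two steps, using bytes.fromhex as an assumed stand-in. bytes.fromhex raises ValueError on invalid or odd-length input; this sketch does not model what Spark returns in those cases.

```python
def unhex(s: str) -> bytes:
    # Every pair of hexadecimal digits becomes one byte
    return bytes.fromhex(s)


raw = unhex("537061726B2053514C")
print(raw.decode("utf-8"))  # Spark SQL
```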
uniform
uniform(min, max[, seed]) - Returns a random value with independent and identically distributed (i.i.d.) values within the specified range of numbers. The random seed is optional. The numbers specifying the minimum and maximum of the range must be constants. If both of these numbers are integers, the result will also be an integer. Otherwise, if one or both of them are floating-point numbers, the result will also be a floating-point number.
Examples:
```sql
> SELECT uniform(10, 20, 0) > 0 AS result;
true
```
Since: 4.0.0
unix_date
unix_date(date) - Returns the number of days since 1970-01-01.
Examples:
```sql
> SELECT unix_date(DATE("1970-01-02"));
1
```
Since: 3.1.0
unix_micros
unix_micros(timestamp) - Returns the number of microseconds since 1970-01-01 00:00:00 UTC.
Examples:
```sql
> SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z'));
1000000
```
Since: 3.1.0
unix_millis
unix_millis(timestamp) - Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.
Examples:
```sql
> SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z'));
1000
```
Since: 3.1.0
unix_seconds
unix_seconds(timestamp) - Returns the number of seconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.
Examples:
```sql
> SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'));
1
```
Since: 3.1.0
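The four functions above differ only in the unit they count. Below is a Python sketch of the arithmetic, assuming timezone-aware UTC datetimes as input. The helper names mirror the SQL functions but are hypothetical Python, not a Spark API. Sub-unit precision is dropped with floor division, which matches "truncates" for timestamps after 1970; how Spark rounds pre-1970 values is not verified here.

```python
from datetime import date, datetime, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_date(d: date) -> int:
    # Whole days since 1970-01-01
    return (d - EPOCH_DATE).days


def unix_micros(ts: datetime) -> int:
    # Exact microseconds since the epoch, using integer timedelta fields
    delta = ts - EPOCH_TS
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def unix_millis(ts: datetime) -> int:
    # Drop sub-millisecond precision (floor division)
    return unix_micros(ts) // 1_000


def unix_seconds(ts: datetime) -> int:
    # Drop sub-second precision (floor division)
    return unix_micros(ts) // 1_000_000


one_sec = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
print(unix_date(date(1970, 1, 2)))  # 1
print(unix_micros(one_sec))         # 1000000
print(unix_millis(one_sec))         # 1000
print(unix_seconds(one_sec))        # 1
```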
unix_timestamp
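unix_timestamp([timeExp[, fmt]]) - Returns the UNIX timestamp (seconds since 1970-01-01 00:00:00 UTC) of the current or specified time. String input is interpreted in the session time zone, so the example outputs depend on the session time zone. For the no-argument form, they also depend on when the query runs.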
Arguments:
- timeExp - A date/timestamp or string. If not provided, this defaults to current time.
- fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is "yyyy-MM-dd HH:mm:ss". See Datetime Patterns for valid date and time format patterns.
Examples:
```sql
> SELECT unix_timestamp();
1476884637
> SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd');
1460041200
```
Since: 1.5.0
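A string timeExp is read as wall-clock time in the session time zone, so the same input maps to different epoch seconds in different zones. The Python sketch below models this with a fixed UTC offset. The utc_offset_hours parameter is hypothetical and stands in for the session time zone. Python's strptime codes such as %Y-%m-%d replace Spark's yyyy-MM-dd pattern letters. With an offset of +9 hours, the sketch reproduces the 1460041200 shown above, which is consistent with that example having been run in a UTC+9 session time zone.

```python
from datetime import datetime, timedelta, timezone


def unix_timestamp(text: str, fmt: str = "%Y-%m-%d %H:%M:%S",
                   utc_offset_hours: float = 0) -> int:
    # Interpret the parsed wall-clock time in a fixed-offset zone,
    # then count whole seconds since 1970-01-01 00:00:00 UTC
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(text, fmt).replace(tzinfo=tz)
    return int(local.timestamp())


print(unix_timestamp("2016-04-08", "%Y-%m-%d", utc_offset_hours=9))   # 1460041200
print(unix_timestamp("2016-04-08", "%Y-%m-%d", utc_offset_hours=-7))  # 1460098800
```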
upper
upper(str) - Returns str with all characters changed to uppercase.
Examples:
```sql
> SELECT upper('SparkSql');
SPARKSQL
```
Since: 1.0.1
url_decode
url_decode(str) - Decodes a str in 'application/x-www-form-urlencoded' format using a specific encoding scheme (UTF-8).
Arguments:
- str - a string expression to decode
Examples:
```sql
> SELECT url_decode('https%3A%2F%2Fspark.apache.org');
https://spark.apache.org
```
Since: 3.4.0
url_encode
url_encode(str) - Translates a string into 'application/x-www-form-urlencoded' format using a specific encoding scheme (UTF-8).
Arguments:
- str - a string expression to be translated
Examples:
```sql
> SELECT url_encode('https://spark.apache.org');
https%3A%2F%2Fspark.apache.org
```
Since: 3.4.0
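A round trip through the two functions can be sketched with Python's urllib.parse, assuming UTF-8. This is a stand-in, not Spark's code. quote_plus and Java's form encoder agree on the URL above but not on every character. For example, Python leaves ~ unescaped and escapes *, while java.net.URLEncoder does the opposite.

```python
from urllib.parse import quote_plus, unquote_plus


def url_encode(s: str) -> str:
    # Form-urlencode as UTF-8: spaces become '+', reserved chars become %XX
    return quote_plus(s, encoding="utf-8")


def url_decode(s: str) -> str:
    # Inverse: '+' becomes a space, %XX sequences are decoded as UTF-8
    return unquote_plus(s, encoding="utf-8")


print(url_encode("https://spark.apache.org"))        # https%3A%2F%2Fspark.apache.org
print(url_decode("https%3A%2F%2Fspark.apache.org"))  # https://spark.apache.org
```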
user
user() - Returns the user name of the current execution context.
Examples:
```sql
> SELECT user();
mockingjay
```
Since: 3.4.0
uuid
uuid() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical 36-character UUID string.
Examples:
```sql
> SELECT uuid();
46707d92-02f4-4817-8116-a4c3b23e6266
```
Note:
The function is non-deterministic.
Since: 2.3.0
22、V
validate_utf8
validate_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise throws an exception.
Arguments:
- str - a string expression
Examples:
```sql
> SELECT validate_utf8('Spark');
Spark
> SELECT validate_utf8(x'61');
a
```
Since: 4.0.0
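The check behaves like a strict UTF-8 decode. Below is a small Python sketch (a hypothetical helper, not Spark's implementation) in which invalid input raises an exception, as validate_utf8 does:

```python
def validate_utf8(data: bytes) -> str:
    # Strict decoding raises UnicodeDecodeError on invalid UTF-8,
    # mirroring "otherwise throws an exception"
    return data.decode("utf-8", errors="strict")


print(validate_utf8(b"Spark"))  # Spark
print(validate_utf8(b"\x61"))   # a
try:
    validate_utf8(b"\xff")      # 0xFF can never start a UTF-8 sequence
except UnicodeDecodeError as e:
    print("invalid:", e.reason)
```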
var_pop
var_pop(expr) - Returns the population variance calculated from values of a group.
Examples:
```sql
> SELECT var_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
0.6666666666666666
```
Since: 1.6.0
var_samp
var_samp(expr) - Returns the sample variance calculated from values of a group.
Examples:
```sql
> SELECT var_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
```
Since: 1.6.0
variance
variance(expr) - Returns the sample variance calculated from values of a group.
Examples:
```sql
> SELECT variance(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
```
Since: 1.6.0
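var_pop divides the sum of squared deviations from the mean by n, while var_samp and variance divide by n - 1. Below is a direct Python sketch of both formulas, written as plain arithmetic rather than Spark's aggregation code. It assumes a list of at least two numbers and ignores NULL handling.

```python
def var_pop(xs):
    # Population variance: sum of squared deviations divided by n
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def var_samp(xs):
    # Sample variance (what variance() also returns): divided by n - 1
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


print(var_pop([1, 2, 3]))   # 0.6666666666666666
print(var_samp([1, 2, 3]))  # 1.0
```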
variant_explode
variant_explode(expr) - Separates a variant object/array into multiple rows containing its fields/elements. Its result schema is struct<pos int, key string, value variant>. pos is the position of the field/element in its parent object/array, and value is the field/element value. key is the field name when exploding a variant object, or is NULL when exploding a variant array. It ignores any input that is not a variant array/object, including SQL NULL, variant null, and any other variant values.
Examples:
```sql
> SELECT * from variant_explode(parse_json('["hello", "world"]'));
0 NULL "hello"
1 NULL "world"
> SELECT * from variant_explode(input => parse_json('{"a": true, "b": 3.14}'));
0 a true
1 b 3.14
```
Since: 4.0.0
variant_explode_outer
variant_explode_outer(expr) - Separates a variant object/array into multiple rows containing its fields/elements. Its result schema is struct<pos int, key string, value variant>. pos is the position of the field/element in its parent object/array, and value is the field/element value. key is the field name when exploding a variant object, or is NULL when exploding a variant array. As with other _outer generators such as explode_outer, input that would produce no rows (SQL NULL, variant null, any other non-array/object variant value, or an empty array/object) yields a single row in which every column is NULL.
Examples:
```sql
> SELECT * from variant_explode_outer(parse_json('["hello", "world"]'));
0 NULL "hello"
1 NULL "world"
> SELECT * from variant_explode_outer(input => parse_json('{"a": true, "b": 3.14}'));
0 a true
1 b 3.14
```
Since: 4.0.0
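The row layout (pos, key, value) can be sketched over parsed JSON in Python. This illustration rests on three assumptions: json.loads stands in for parse_json, Python dict order stands in for Spark's object field order, and the _outer variant follows the usual explode_outer convention of emitting one all-NULL row when no rows would otherwise be produced.

```python
import json
from typing import Any, List, Optional, Tuple

Row = Tuple[Optional[int], Optional[str], Any]


def variant_explode(v: Any) -> List[Row]:
    # Objects yield (pos, key, value); arrays yield (pos, None, value);
    # anything else (including JSON null) yields no rows
    if isinstance(v, dict):
        return [(pos, key, value) for pos, (key, value) in enumerate(v.items())]
    if isinstance(v, list):
        return [(pos, None, value) for pos, value in enumerate(v)]
    return []


def variant_explode_outer(v: Any) -> List[Row]:
    # Assumed outer semantics: one all-NULL row instead of no rows
    rows = variant_explode(v)
    return rows if rows else [(None, None, None)]


print(variant_explode(json.loads('["hello", "world"]')))
# [(0, None, 'hello'), (1, None, 'world')]
print(variant_explode(json.loads('{"a": true, "b": 3.14}')))
# [(0, 'a', True), (1, 'b', 3.14)]
print(variant_explode_outer(json.loads("42")))
# [(None, None, None)]
```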
variant_get
variant_get(v, path[, type]) - Extracts a sub-variant from v according to path, and then casts the sub-variant to type. When type is omitted, it defaults to variant. Returns null if the path does not exist. Throws an exception if the cast fails.
Examples:
```sql
> SELECT variant_get(parse_json('{"a": 1}'), '$.a', 'int');
1
> SELECT variant_get(parse_json('{"a": 1}'), '$.b', 'int');
NULL
> SELECT variant_get(parse_json('[1, "2"]'), '$[1]', 'string');
2
> SELECT variant_get(parse_json('[1, "2"]'), '$[2]', 'string');
NULL
> SELECT variant_get(parse_json('[1, "hello"]'), '$[1]');
"hello"
```
Since: 4.0.0
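Path extraction can be sketched in Python for a simplified subset of the path syntax: a leading $, .name field steps, and [index] array steps. Missing fields and out-of-range indexes return None, matching the NULL results above. The final cast to type is not modeled. The helper is hypothetical, and json.loads stands in for parse_json.

```python
import json
import re
from typing import Any

# One path step: ".name" (group 1) or "[index]" (group 2)
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def variant_get(v: Any, path: str) -> Any:
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    pos = 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at position {pos}")
        key, idx = m.group(1), m.group(2)
        if key is not None:
            if not isinstance(v, dict) or key not in v:
                return None  # missing field -> NULL
            v = v[key]
        else:
            i = int(idx)
            if not isinstance(v, list) or i >= len(v):
                return None  # out-of-range index -> NULL
            v = v[i]
        pos = m.end()
    return v


arr = json.loads('[1, "2"]')
print(variant_get(json.loads('{"a": 1}'), "$.a"))  # 1
print(variant_get(json.loads('{"a": 1}'), "$.b"))  # None
print(variant_get(arr, "$[1]"))                    # 2
print(variant_get(arr, "$[2]"))                    # None
```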
version
version() - Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision.
Examples:
```sql
> SELECT version();
3.1.0 a6d6ea3efedbad14d99c24143834cd4e2e52fb40
```
Since: 3.0.0
23、W
weekday
weekday(date) - Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).
Examples:
```sql
> SELECT weekday('2009-07-30');
3
```
Since: 2.4.0
weekofyear
weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday, and week 1 is the first week with more than 3 days (the ISO 8601 week definition).
Examples:
```sql
> SELECT weekofyear('2008-02-20');
8
```
Since: 1.5.0
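Python's date.weekday() uses the same 0 = Monday numbering. The week rule described for weekofyear is the ISO 8601 definition, which date.isocalendar() implements. Below is a sketch built on that correspondence; the wrapper functions are hypothetical.

```python
from datetime import date


def weekday(d: date) -> int:
    # 0 = Monday ... 6 = Sunday, same numbering as date.weekday()
    return d.weekday()


def weekofyear(d: date) -> int:
    # ISO 8601 week number: weeks start on Monday, and week 1 is the
    # first week with more than 3 days in the new year
    return d.isocalendar()[1]


print(weekday(date(2009, 7, 30)))     # 3
print(weekofyear(date(2008, 2, 20)))  # 8
print(weekofyear(date(2021, 1, 1)))   # 53
```

Early-January dates can belong to the last week of the previous year, as the third call shows.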
when
CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When expr1 = true, returns expr2; else when expr3 = true, returns expr4; else returns expr5.
Arguments:
- expr1, expr3 - the branch condition expressions, which must all be of boolean type.
- expr2, expr4, expr5 - the branch value expressions and the else value expression, which must all be of the same type or coercible to a common type.
Examples:
```sql
> SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
1.0
> SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
2.0
> SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 END;
NULL
```
Since: 1.0.1
width_bucket
width_bucket(value, min_value, max_value, num_bucket) - Returns the bucket number to which value would be assigned in an equi-width histogram with num_bucket buckets, in the range min_value to max_value.
Examples:
```sql
> SELECT width_bucket(5.3, 0.2, 10.6, 5);
3
> SELECT width_bucket(-2.1, 1.3, 3.4, 3);
0
> SELECT width_bucket(8.1, 0.0, 5.7, 4);
5
> SELECT width_bucket(-0.9, 5.2, 0.5, 2);
3
> SELECT width_bucket(INTERVAL '0' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10);
1
> SELECT width_bucket(INTERVAL '1' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10);
2
> SELECT width_bucket(INTERVAL '0' DAY, INTERVAL '0' DAY, INTERVAL '10' DAY, 10);
1
> SELECT width_bucket(INTERVAL '1' DAY, INTERVAL '0' DAY, INTERVAL '10' DAY, 10);
2
```
Since: 3.1.0
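The bucket arithmetic can be written out directly. The Python sketch below is reconstructed from the description and the numeric examples above, not taken from Spark's source. Values below the range fall in bucket 0, and values at or beyond the upper end fall in bucket num_bucket + 1. A reversed range (min_value > max_value) counts buckets downward. Returning None for a non-positive bucket count, an empty range, or NaN input is an assumption.

```python
import math
from typing import Optional


def width_bucket(value: float, min_value: float, max_value: float,
                 num_bucket: int) -> Optional[int]:
    # Assumed: invalid specifications yield None (NULL)
    if num_bucket <= 0 or min_value == max_value:
        return None
    if any(math.isnan(x) for x in (value, min_value, max_value)):
        return None
    lower, upper = min(min_value, max_value), max(min_value, max_value)
    if min_value < max_value:
        # Ascending range: below -> 0, at or above the upper end -> n + 1
        if value < lower:
            return 0
        if value >= upper:
            return num_bucket + 1
        return int(num_bucket * (value - lower) / (upper - lower)) + 1
    # Descending range (min_value > max_value): buckets count downward
    if value > upper:
        return 0
    if value <= lower:
        return num_bucket + 1
    return int(num_bucket * (upper - value) / (upper - lower)) + 1


print(width_bucket(5.3, 0.2, 10.6, 5))  # 3
print(width_bucket(-2.1, 1.3, 3.4, 3))  # 0
print(width_bucket(8.1, 0.0, 5.7, 4))   # 5
print(width_bucket(-0.9, 5.2, 0.5, 2))  # 3
```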
window
window(time_column, window_duration[, slide_duration[, start_time]]) - Buckets rows into one or more time windows, given a column that specifies the timestamp. Window starts are inclusive but window ends are exclusive; for example, 12:05 falls in the window [12:05,12:10) but not in [12:00,12:05). Windows support microsecond precision. Windows on the order of months are not supported. See 'Window Operations on Event Time' in the Structured Streaming guide for a detailed explanation and examples.
Arguments:
- time_column - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.
- window_duration - A string specifying the width of the window represented as "interval value". (See Interval Literal for more details.) Note that the duration is a fixed length of time, and does not vary over time according to a calendar.
- slide_duration - A string specifying the sliding interval of the window represented as "interval value". A new window will be generated every slide_duration. Must be less than or equal to the window_duration. This duration is likewise absolute, and does not vary according to a calendar.
- start_time - The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15..., provide start_time as 15 minutes.
Examples:
```sql
> SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, start;
A1 2021-01-01 00:00:00 2021-01-01 00:05:00 2
A1 2021-01-01 00:05:00 2021-01-01 00:10:00 1
A2 2021-01-01 00:00:00 2021-01-01 00:05:00 1
> SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '10 minutes', '5 minutes') ORDER BY a, start;
A1 2020-12-31 23:55:00 2021-01-01 00:05:00 2
A1 2021-01-01 00:00:00 2021-01-01 00:10:00 3
A1 2021-01-01 00:05:00 2021-01-01 00:15:00 1
A2 2020-12-31 23:55:00 2021-01-01 00:05:00 1
A2 2021-01-01 00:00:00 2021-01-01 00:10:00 1
```
Since: 2.0.0
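Which windows a row lands in follows from aligning window starts to 1970-01-01 00:00:00 UTC plus start_time and stepping by slide_duration. The Python sketch below lists every [start, end) window that contains one timestamp. windows_for is a hypothetical helper; it treats naive datetimes as UTC and uses fixed timedeltas as durations. For 00:04:30 with a 10-minute window and a 5-minute slide, it yields the two A1 windows that this row contributes to in the second example.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def windows_for(ts: datetime, duration: timedelta, slide: timedelta,
                start_offset: timedelta = timedelta(0)) -> List[Tuple[datetime, datetime]]:
    # Window starts are aligned to the epoch shifted by start_offset
    origin = datetime(1970, 1, 1) + start_offset
    last_start = ts - ((ts - origin) % slide)  # latest window start <= ts
    result = []
    start = last_start
    while start > ts - duration:  # [start, start + duration) still contains ts
        result.append((start, start + duration))
        start -= slide
    return sorted(result)


t = datetime(2021, 1, 1, 0, 4, 30)
for s, e in windows_for(t, timedelta(minutes=10), timedelta(minutes=5)):
    print(s, e)
# 2020-12-31 23:55:00 2021-01-01 00:05:00
# 2021-01-01 00:00:00 2021-01-01 00:10:00
```

Passing the same value for duration and slide gives tumbling windows, and start_offset plays the role of start_time.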
window_time
window_time(window_column) - Extracts the time value from a time/session window column, which can be used as the event time value of the window. The extracted time is window.end minus one microsecond, which reflects the fact that aggregating windows have an exclusive upper bound, [start, end). See 'Window Operations on Event Time' in the Structured Streaming guide for a detailed explanation and examples.
Arguments:
- window_column - The column representing time/session window.
Examples:
```sql
> SELECT a, window.start as start, window.end as end, window_time(window), cnt FROM (SELECT a, window, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, window.start);
A1 2021-01-01 00:00:00 2021-01-01 00:05:00 2021-01-01 00:04:59.999999 2
A1 2021-01-01 00:05:00 2021-01-01 00:10:00 2021-01-01 00:09:59.999999 1
A2 2021-01-01 00:00:00 2021-01-01 00:05:00 2021-01-01 00:04:59.999999 1
```
Since: 3.4.0
24、X
xpath
xpath(xml, xpath) - Returns a string array of values within the nodes of xml that match the XPath expression.
Examples:
```sql
> SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()');
["b1","b2","b3"]
> SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b');
[null,null,null]
```
Since: 2.0.0
xpath_boolean
xpath_boolean(xml, xpath) - Returns true if the XPath expression evaluates to true, or if a matching node is found.
Examples:
```sql
> SELECT xpath_boolean('<a><b>1</b></a>','a/b');
true
```
Since: 2.0.0
xpath_double
xpath_double(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Examples:
```sql
> SELECT xpath_double('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3.0
```
Since: 2.0.0
xpath_float
xpath_float(xml, xpath) - Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Examples:
```sql
> SELECT xpath_float('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3.0
```
Since: 2.0.0
xpath_int
xpath_int(xml, xpath) - Returns an integer value, or the value zero if no match is found or if a match is found but the value is non-numeric.
Examples:
```sql
> SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3
```
Since: 2.0.0
xpath_long
xpath_long(xml, xpath) - Returns a long integer value, or the value zero if no match is found or if a match is found but the value is non-numeric.
Examples:
```sql
> SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3
```
Since: 2.0.0
xpath_number
xpath_number(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Examples:
```sql
> SELECT xpath_number('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3.0
```
Since: 2.0.0
xpath_short
xpath_short(xml, xpath) - Returns a short integer value, or the value zero if no match is found or if a match is found but the value is non-numeric.
Examples:
```sql
> SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3
```
Since: 2.0.0
xpath_string
xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the XPath expression.
Examples:
```sql
> SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c');
cc
```
Since: 2.0.0
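Python's xml.etree.ElementTree supports only a limited subset of XPath (no text() steps and no functions such as sum()). The Python sketch below therefore evaluates paths relative to the root element and does the summing in Python. It approximates xpath with a text() step and xpath_string for simple child paths. The helpers are hypothetical, and the sketch does not model what Spark returns when nothing matches.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def xpath_texts(xml: str, child_path: str) -> List[Optional[str]]:
    # Text of every element matching child_path, relative to the root element
    root = ET.fromstring(xml)
    return [el.text for el in root.findall(child_path)]


def xpath_first_text(xml: str, child_path: str) -> Optional[str]:
    # Text of the first match, like xpath_string; None if nothing matches
    el = ET.fromstring(xml).find(child_path)
    return None if el is None else el.text


doc = "<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>"
print(xpath_texts(doc, "b"))                              # ['b1', 'b2', 'b3']
print(xpath_first_text("<a><b>b</b><c>cc</c></a>", "c"))  # cc
# 'sum(a/b)' from the numeric examples, done in Python
print(sum(float(t) for t in xpath_texts("<a><b>1</b><b>2</b></a>", "b")))  # 3.0
```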
xxhash64
xxhash64(expr1, expr2, ...) - Returns a 64-bit hash value of the arguments. Hash seed is 42.
Examples:
```sql
> SELECT xxhash64('Spark', array(123), 2);
5602566077635097486
```
Since: 3.0.0
25、Y
year
year(date) - Returns the year component of the date/timestamp.
Examples:
```sql
> SELECT year('2016-07-30');
2016
```
Since: 1.5.0
26、Z
zeroifnull
zeroifnull(expr) - Returns zero if expr is null, or expr otherwise.
Examples:
```sql
> SELECT zeroifnull(NULL);
0
> SELECT zeroifnull(2);
2
```
Since: 4.0.0
zip_with
zip_with(left, right, func) - Merges the two given arrays, element-wise, into a single array using func. If one array is shorter, nulls are appended at the end to match the length of the longer array before func is applied.
Examples:
```sql
> SELECT zip_with(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x));
[{"y":"a","x":1},{"y":"b","x":2},{"y":"c","x":3}]
> SELECT zip_with(array(1, 2), array(3, 4), (x, y) -> x + y);
[4,6]
> SELECT zip_with(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y));
["ad","be","cf"]
```
Since: 2.4.0
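The padding rule maps directly onto itertools.zip_longest, with None standing in for NULL. Below is a Python sketch; zip_with here is a hypothetical helper, and ordinary Python callables take the place of SQL lambda functions.

```python
from itertools import zip_longest
from typing import Callable, List, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def zip_with(left: List[A], right: List[B],
             func: Callable[[Optional[A], Optional[B]], R]) -> List[R]:
    # Pad the shorter list with None (NULL), then apply func pairwise
    return [func(x, y) for x, y in zip_longest(left, right, fillvalue=None)]


print(zip_with([1, 2], [3, 4], lambda x, y: x + y))                    # [4, 6]
print(zip_with(["a", "b", "c"], ["d", "e", "f"], lambda x, y: x + y))  # ['ad', 'be', 'cf']
print(zip_with([1, 2, 3], ["a"], lambda x, y: (x, y)))                 # [(1, 'a'), (2, None), (3, None)]
```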