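6. Spark Functions: u/v/w/x/y/z

No.	Type	Link
1	Spark functions	1. Spark Functions: symbols
2	Spark functions	2. Spark Functions: a/b/c
3	Spark functions	3. Spark Functions: d/e/f/g/h/i/j/k/l
4	Spark functions	4. Spark Functions: m/n/o/p/q/r
5	Spark functions	5. Spark Functions: s/t
6	Spark functions	6. Spark Functions: u/v/w/x/y/z

Table of Contents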

- 21. U
  - ucase
  - unbase64
  - unhex
  - uniform
  - unix_date
  - unix_micros
  - unix_millis
  - unix_seconds
  - unix_timestamp
  - upper
  - url_decode
  - url_encode
  - user
  - uuid
- 22. V
- 23. W
  - weekday
  - weekofyear
  - when
  - width_bucket
  - window
  - window_time
- 24. X
- 25. Y
  - year
- 26. Z
  - zeroifnull
  - zip_with

21. U

uniform

uniform(min, max[, seed]) - Returns a random value drawn independently and identically distributed (i.i.d.) from the specified range of numbers. The random seed is optional. The numbers specifying the minimum and maximum values of the range must be constants. If both numbers are integers, the result is also an integer. If either of them is a floating-point number, the result is a floating-point number.

Examples:

> SELECT uniform(10, 20, 0) > 0 AS result;
 true

Since: 4.0.0
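
The integer versus floating-point rule described above can be modelled in a few lines of Python. This is only a sketch: `uniform_sketch` is a hypothetical helper, Python's generator will not reproduce Spark's values for a given seed, and treating the upper bound as inclusive for integers is an assumption that the description does not settle.

```python
import random

def uniform_sketch(min_value, max_value, seed=None):
    """Hypothetical model of uniform(min, max[, seed]); not Spark's generator."""
    rng = random.Random(seed)  # the seed is optional, as in the SQL function
    if isinstance(min_value, int) and isinstance(max_value, int):
        # Both bounds are integers, so the result is an integer.
        # Assumption: the upper bound is inclusive in this sketch.
        return rng.randint(min_value, max_value)
    # At least one bound is floating-point, so the result is a float.
    return rng.uniform(min_value, max_value)

print(type(uniform_sketch(10, 20, 0)).__name__)    # int
print(type(uniform_sketch(10, 20.0, 0)).__name__)  # float
```

Reusing the same seed returns the same value within this sketch, which mirrors why the SQL example pins the seed to 0.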

unix_date

unix_date(date) - Returns the number of days since 1970-01-01.

Examples:

> SELECT unix_date(DATE("1970-01-02"));
 1

Since: 3.1.0
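
As a cross-check of the day count, the same arithmetic can be written with Python's `datetime.date`. The helper name `unix_date_sketch` is hypothetical and only illustrates the definition given above.

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def unix_date_sketch(d):
    """Whole days between 1970-01-01 and d."""
    return (d - EPOCH).days

print(unix_date_sketch(date(1970, 1, 2)))  # 1
print(unix_date_sketch(date(1970, 1, 1)))  # 0
```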

unix_micros

unix_micros(timestamp) - Returns the number of microseconds since 1970-01-01 00:00:00 UTC.

Examples:

> SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z'));
 1000000

Since: 3.1.0

unix_millis

unix_millis(timestamp) - Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.

Examples:

> SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z'));
 1000

Since: 3.1.0

unix_seconds

unix_seconds(timestamp) - Returns the number of seconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.

Examples:

> SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'));
 1

Since: 3.1.0
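
The difference between unix_micros, unix_millis and unix_seconds is only how much precision is dropped. The sketch below models that with Python timedelta floor division; the helper names are hypothetical, and the behavior for timestamps before the epoch is not covered by the descriptions above, so it is not exercised here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def unix_micros_sketch(ts):
    # Whole microseconds between the epoch and ts.
    return (ts - EPOCH) // timedelta(microseconds=1)

def unix_millis_sketch(ts):
    # Floor division drops the digits below one millisecond.
    return (ts - EPOCH) // timedelta(milliseconds=1)

def unix_seconds_sketch(ts):
    # Floor division drops the fractional second entirely.
    return (ts - EPOCH) // timedelta(seconds=1)

ts = datetime(1970, 1, 1, 0, 0, 1, 999999, tzinfo=timezone.utc)
print(unix_micros_sketch(ts))   # 1999999
print(unix_millis_sketch(ts))   # 1999
print(unix_seconds_sketch(ts))  # 1
```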

unix_timestamp

unix_timestamp([timeExp[, fmt]]) - Returns the UNIX timestamp of the current or specified time.

Arguments:

timeExp - A date/timestamp or string. If not provided, this defaults to the current time.
fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is "yyyy-MM-dd HH:mm:ss". See Datetime Patterns for valid date and time format patterns.

Examples:

> SELECT unix_timestamp();
 1476884637
> SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd');
 1460041200

Since: 1.5.0
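
When timeExp is a string without a zone, the result depends on the time zone used to interpret it. The Python sketch below illustrates this with a fixed UTC offset, which is an assumption made for the illustration: the sample value 1460041200 above corresponds to midnight at UTC+09:00, while midnight UTC gives a different number. `unix_timestamp_sketch` is a hypothetical helper, and it takes Python `strptime` patterns rather than Spark datetime patterns.

```python
from datetime import datetime, timedelta, timezone

def unix_timestamp_sketch(text, fmt, utc_offset_hours):
    """Parse text with a strptime pattern and a fixed UTC offset (an assumption)."""
    # strptime patterns (%Y-%m-%d) differ from Spark's patterns (yyyy-MM-dd).
    local = datetime.strptime(text, fmt)
    zone = timezone(timedelta(hours=utc_offset_hours))
    return int(local.replace(tzinfo=zone).timestamp())

print(unix_timestamp_sketch("2016-04-08", "%Y-%m-%d", 0))  # 1460073600
print(unix_timestamp_sketch("2016-04-08", "%Y-%m-%d", 9))  # 1460041200
```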

upper

upper(str) - Returns str with all characters changed to uppercase.

Examples:

> SELECT upper('SparkSql');
 SPARKSQL

Since: 1.0.1

url_decode

url_decode(str) - Decodes a str in 'application/x-www-form-urlencoded' format using a specific encoding scheme.

Arguments:

str - a string expression to decode

Examples:

> SELECT url_decode('https%3A%2F%2Fspark.apache.org');
 https://spark.apache.org

Since: 3.4.0

url_encode

url_encode(str) - Translates a string into 'application/x-www-form-urlencoded' format using a specific encoding scheme.

Arguments:

str - a string expression to be translated

Examples:

> SELECT url_encode('https://spark.apache.org');
 https%3A%2F%2Fspark.apache.org

Since: 3.4.0
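
The 'application/x-www-form-urlencoded' round trip of url_encode and url_decode can be reproduced with Python's `urllib.parse`. This is a sketch of the format rather than of Spark's implementation, so edge cases (for example, exactly which punctuation characters are left unescaped) may differ.

```python
from urllib.parse import quote_plus, unquote_plus

encoded = quote_plus("https://spark.apache.org")
print(encoded)                # https%3A%2F%2Fspark.apache.org
print(unquote_plus(encoded))  # https://spark.apache.org

# In form encoding a space becomes '+', and '&' is percent-escaped.
print(quote_plus("a b&c"))    # a+b%26c
```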

user

user() - Returns the user name of the current execution context.

Examples:

> SELECT user();
 mockingjay

Since: 3.4.0

uuid

uuid() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical 36-character UUID string.

Examples:

> SELECT uuid();
 46707d92-02f4-4817-8116-a4c3b23e6266

Note:

The function is non-deterministic.

Since: 2.3.0

22. V

validate_utf8

validate_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise throws an exception.

Arguments:

str - a string expression

Examples:

> SELECT validate_utf8('Spark');
 Spark
> SELECT validate_utf8(x'61');
 a

Since: 4.0.0

var_pop

var_pop(expr) - Returns the population variance calculated from values of a group.

Examples:

> SELECT var_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
 0.6666666666666666

Since: 1.6.0

var_samp

var_samp(expr) - Returns the sample variance calculated from values of a group.

Examples:

> SELECT var_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0

variance

variance(expr) - Returns the sample variance calculated from values of a group.

Examples:

> SELECT variance(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0
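
var_pop differs from var_samp and variance only in the divisor: n for the population variance, n - 1 for the sample variance. The following Python sketch, with hypothetical helper names, reproduces the three results above for the values 1, 2, 3.

```python
def var_pop_sketch(values):
    # Population variance: divide the squared deviations by n.
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def var_samp_sketch(values):
    # Sample variance: divide by n - 1 instead (needs at least two values).
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

data = [1, 2, 3]
print(var_pop_sketch(data))   # 0.6666666666666666
print(var_samp_sketch(data))  # 1.0
```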

variant_explode

variant_explode(expr) - It separates a variant object/array into multiple rows containing its fields/elements. Its result schema is struct<pos int, key string, value variant>. pos is the position of the field/element in its parent object/array, and value is the field/element value. key is the field name when exploding a variant object, or is NULL when exploding a variant array. It ignores any input that is not a variant array/object, including SQL NULL, variant null, and any other variant values.

Examples:

> SELECT * from variant_explode(parse_json('["hello", "world"]'));
 0  NULL    "hello"
 1  NULL    "world"
> SELECT * from variant_explode(input => parse_json('{"a": true, "b": 3.14}'));
 0  a   true
 1  b   3.14

Since: 4.0.0
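
The (pos, key, value) row shape described above can be modelled on plain JSON in Python. `variant_explode_sketch` is a hypothetical helper; it returns Python values rather than variants, and it takes field positions from the order of the JSON text, which is an assumption.

```python
import json

def variant_explode_sketch(json_text):
    """Return (pos, key, value) rows for a JSON array or object, else no rows."""
    value = json.loads(json_text)
    if isinstance(value, list):
        # Array elements have no field name, so key is None (SQL NULL).
        return [(pos, None, item) for pos, item in enumerate(value)]
    if isinstance(value, dict):
        # Object fields carry their name in key.
        return [(pos, key, item) for pos, (key, item) in enumerate(value.items())]
    # Scalars and JSON null are ignored and produce no rows.
    return []

print(variant_explode_sketch('["hello", "world"]'))
print(variant_explode_sketch('{"a": true, "b": 3.14}'))
print(variant_explode_sketch('null'))
```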

variant_explode_outer

variant_explode_outer(expr) - It separates a variant object/array into multiple rows containing its fields/elements. Its result schema is struct<pos int, key string, value variant>. pos is the position of the field/element in its parent object/array, and value is the field/element value. key is the field name when exploding a variant object, or is NULL when exploding a variant array. It ignores any input that is not a variant array/object, including SQL NULL, variant null, and any other variant values.

Examples:

> SELECT * from variant_explode_outer(parse_json('["hello", "world"]'));
 0  NULL    "hello"
 1  NULL    "world"
> SELECT * from variant_explode_outer(input => parse_json('{"a": true, "b": 3.14}'));
 0  a   true
 1  b   3.14

Since: 4.0.0

variant_get

variant_get(v, path[, type]) - Extracts a sub-variant from v according to path, and then casts the sub-variant to type. When type is omitted, it defaults to variant. Returns null if the path does not exist. Throws an exception if the cast fails.

Examples:

> SELECT variant_get(parse_json('{"a": 1}'), '$.a', 'int');
 1
> SELECT variant_get(parse_json('{"a": 1}'), '$.b', 'int');
 NULL
> SELECT variant_get(parse_json('[1, "2"]'), '$[1]', 'string');
 2
> SELECT variant_get(parse_json('[1, "2"]'), '$[2]', 'string');
 NULL
> SELECT variant_get(parse_json('[1, "hello"]'), '$[1]');
 "hello"

Since: 4.0.0

version

version() - Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision.

Examples:

> SELECT version();
 3.1.0 a6d6ea3efedbad14d99c24143834cd4e2e52fb40

Since: 3.0.0

23. W

weekday

weekday(date) - Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).

Examples:

> SELECT weekday('2009-07-30');
 3

Since: 2.4.0

weekofyear

weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday, and week 1 is the first week with more than 3 days.

Examples:

> SELECT weekofyear('2008-02-20');
 8

Since: 1.5.0
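
The numbering used by weekday (0 = Monday) and the week rule used by weekofyear (weeks start on Monday, week 1 is the first week with more than 3 days) coincide with Python's `date.weekday()` and ISO 8601 week numbers. The sketch below relies on that correspondence; the helper names are hypothetical.

```python
from datetime import date

def weekday_sketch(d):
    # date.weekday() uses 0 = Monday ... 6 = Sunday.
    return d.weekday()

def weekofyear_sketch(d):
    # ISO 8601 week number: Monday start, week 1 has more than 3 days.
    return d.isocalendar()[1]

print(weekday_sketch(date(2009, 7, 30)))     # 3
print(weekofyear_sketch(date(2008, 2, 20)))  # 8
print(weekofyear_sketch(date(2021, 1, 1)))   # 53
```

The last call shows a consequence of the week 1 rule: the first days of January can belong to the last week of the previous year.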

when

CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When expr1 = true, returns expr2; else when expr3 = true, returns expr4; else returns expr5.

Arguments:

expr1, expr3 - the branch condition expressions should all be boolean type.
expr2, expr4, expr5 - the branch value expressions and else value expression should all be the same type or coercible to a common type.

Examples:

> SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
 1.0
> SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
 2.0
> SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 END;
 NULL

Since: 1.0.1

width_bucket

width_bucket(value, min_value, max_value, num_bucket) - Returns the bucket number to which value would be assigned in an equiwidth histogram with num_bucket buckets, in the range min_value to max_value.

Examples:

> SELECT width_bucket(5.3, 0.2, 10.6, 5);
 3
> SELECT width_bucket(-2.1, 1.3, 3.4, 3);
 0
> SELECT width_bucket(8.1, 0.0, 5.7, 4);
 5
> SELECT width_bucket(-0.9, 5.2, 0.5, 2);
 3
> SELECT width_bucket(INTERVAL '0' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10);
 1
> SELECT width_bucket(INTERVAL '1' YEAR, INTERVAL '0' YEAR, INTERVAL '10' YEAR, 10);
 2
> SELECT width_bucket(INTERVAL '0' DAY, INTERVAL '0' DAY, INTERVAL '10' DAY, 10);
 1
> SELECT width_bucket(INTERVAL '1' DAY, INTERVAL '0' DAY, INTERVAL '10' DAY, 10);
 2

Since: 3.1.0
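
The numeric examples above, including the out-of-range results 0 and num_bucket + 1 and the case with reversed bounds, follow from a short formula. The Python function below is a sketch inferred from those examples; `width_bucket_sketch` is a hypothetical name, it handles plain numbers only, and returning None for invalid input is an assumption.

```python
import math

def width_bucket_sketch(value, min_value, max_value, num_bucket):
    """Hypothetical model of width_bucket for plain numbers."""
    if num_bucket <= 0 or min_value == max_value:
        return None  # assumption: invalid input has no bucket
    if min_value < max_value:
        if value < min_value:
            return 0               # below the range
        if value >= max_value:
            return num_bucket + 1  # at or above the range
        offset, width = value - min_value, max_value - min_value
    else:
        # Reversed bounds: buckets are counted from min_value down to max_value.
        if value > min_value:
            return 0
        if value <= max_value:
            return num_bucket + 1
        offset, width = min_value - value, min_value - max_value
    return math.floor(num_bucket * offset / width) + 1

print(width_bucket_sketch(5.3, 0.2, 10.6, 5))  # 3
print(width_bucket_sketch(-2.1, 1.3, 3.4, 3))  # 0
print(width_bucket_sketch(8.1, 0.0, 5.7, 4))   # 5
print(width_bucket_sketch(-0.9, 5.2, 0.5, 2))  # 3
```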

window

window(time_column, window_duration[, slide_duration[, start_time]]) - Bucketizes rows into one or more time windows based on a timestamp column. Window starts are inclusive but window ends are exclusive, e.g. 12:05 falls in the window [12:05,12:10) but not in [12:00,12:05). Windows support microsecond precision. Windows on the order of months are not supported. See 'Window Operations on Event Time' in the Structured Streaming guide for a detailed explanation and examples.

Arguments:

time_column - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.
window_duration - A string specifying the width of the window represented as "interval value". (See Interval Literal for more details.) Note that the duration is a fixed length of time, and does not vary over time according to a calendar.
slide_duration - A string specifying the sliding interval of the window represented as "interval value". A new window will be generated every slide_duration. Must be less than or equal to the window_duration. This duration is likewise absolute, and does not vary according to a calendar.
start_time - The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide start_time as 15 minutes.

Examples:

> SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, start;
  A1    2021-01-01 00:00:00 2021-01-01 00:05:00 2
  A1    2021-01-01 00:05:00 2021-01-01 00:10:00 1
  A2    2021-01-01 00:00:00 2021-01-01 00:05:00 1
> SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '10 minutes', '5 minutes') ORDER BY a, start;
  A1    2020-12-31 23:55:00 2021-01-01 00:05:00 2
  A1    2021-01-01 00:00:00 2021-01-01 00:10:00 3
  A1    2021-01-01 00:05:00 2021-01-01 00:15:00 1
  A2    2020-12-31 23:55:00 2021-01-01 00:05:00 1
  A2    2021-01-01 00:00:00 2021-01-01 00:10:00 1

Since: 2.0.0
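
Which windows a timestamp falls into follows from the inclusive start, the exclusive end, and the alignment to 1970-01-01 00:00:00 plus start_time. The Python sketch below models that assignment with naive datetimes, ignoring time zones as a simplifying assumption; `windows_for` is a hypothetical helper, not part of Spark.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def windows_for(ts, window, slide=None, start_time=timedelta(0)):
    """Return the [start, end) windows that contain ts, earliest first."""
    slide = slide or window  # tumbling windows slide by their own width
    # Latest window start at or before ts, aligned to EPOCH + start_time.
    start = ts - ((ts - EPOCH - start_time) % slide)
    result = []
    while start + window > ts:  # the end bound is exclusive
        result.append((start, start + window))
        start -= slide
    return sorted(result)

# 00:04:30 belongs to two sliding windows of 10 minutes that slide every 5 minutes.
for start, end in windows_for(datetime(2021, 1, 1, 0, 4, 30),
                              timedelta(minutes=10), timedelta(minutes=5)):
    print(start, end)
```

This matches the second SQL example, where the row at 00:04:30 is counted in both the 23:55 and the 00:00 windows.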

window_time

window_time(window_column) - Extracts the time value from a time/session window column, which can be used as the event time value of the window. The extracted time is (window.end - 1), reflecting the fact that aggregating windows have an exclusive upper bound: [start, end). See 'Window Operations on Event Time' in the Structured Streaming guide for a detailed explanation and examples.

Arguments:

window_column - The column representing time/session window.

Examples:

> SELECT a, window.start as start, window.end as end, window_time(window), cnt FROM (SELECT a, window, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, window.start);
  A1    2021-01-01 00:00:00 2021-01-01 00:05:00 2021-01-01 00:04:59.999999  2
  A1    2021-01-01 00:05:00 2021-01-01 00:10:00 2021-01-01 00:09:59.999999  1
  A2    2021-01-01 00:00:00 2021-01-01 00:05:00 2021-01-01 00:04:59.999999  1

Since: 3.4.0
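
Judging by the sample output, the "- 1" in (window.end - 1) is one microsecond, the smallest timestamp unit. A minimal Python sketch with a hypothetical helper name:

```python
from datetime import datetime, timedelta

def window_time_sketch(window_end):
    # The window is [start, end), so the last instant inside it
    # is one microsecond before the end.
    return window_end - timedelta(microseconds=1)

print(window_time_sketch(datetime(2021, 1, 1, 0, 5)))  # 2021-01-01 00:04:59.999999
```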

24. X

xpath

xpath(xml, xpath) - Returns a string array of values within the nodes of xml that match the XPath expression.

Examples:

> SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()');
 ["b1","b2","b3"]
> SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b');
 [null,null,null]

Since: 2.0.0

xpath_boolean

xpath_boolean(xml, xpath) - Returns true if the XPath expression evaluates to true, or if a matching node is found.

Examples:

> SELECT xpath_boolean('<a><b>1</b></a>','a/b');
 true

Since: 2.0.0

xpath_double

xpath_double(xml, xpath) - Returns a double value, or zero if no match is found, or NaN if a match is found but the value is non-numeric.

Examples:

> SELECT xpath_double('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3.0

Since: 2.0.0
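
The three outcomes described above (a number, zero when nothing matches, NaN for a non-numeric match) can be imitated for the specific expression sum(a/b) with Python's `xml.etree.ElementTree`. This is a rough model, not an XPath evaluator: `sum_children_sketch` is a hypothetical helper, and because ElementTree paths are relative to the root element `a`, the path a/b becomes a search for `b` children.

```python
import math
import xml.etree.ElementTree as ET

def sum_children_sketch(xml_text, child_tag):
    """Rough model of xpath_double(xml, 'sum(a/<child_tag>)') with root element a."""
    root = ET.fromstring(xml_text)
    total = 0.0  # with no matching node the sum stays at zero
    for node in root.findall(child_tag):
        try:
            total += float(node.text)
        except (TypeError, ValueError):
            return math.nan  # a non-numeric match yields NaN
    return total

print(sum_children_sketch('<a><b>1</b><b>2</b></a>', 'b'))  # 3.0
print(sum_children_sketch('<a><c>1</c></a>', 'b'))          # 0.0
print(sum_children_sketch('<a><b>x</b></a>', 'b'))          # nan
```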

xpath_float

xpath_float(xml, xpath) - Returns a float value, or zero if no match is found, or NaN if a match is found but the value is non-numeric.

Examples:

> SELECT xpath_float('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3.0

Since: 2.0.0

xpath_int

xpath_int(xml, xpath) - Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Examples:

> SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3

Since: 2.0.0

xpath_long

xpath_long(xml, xpath) - Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Examples:

> SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3

Since: 2.0.0

xpath_number

xpath_number(xml, xpath) - Returns a double value, or zero if no match is found, or NaN if a match is found but the value is non-numeric.

Examples:

> SELECT xpath_number('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3.0

Since: 2.0.0

xpath_short

xpath_short(xml, xpath) - Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Examples:

> SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3

Since: 2.0.0

xpath_string

xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the XPath expression.

Examples:

> SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c');
 cc

Since: 2.0.0

xxhash64

xxhash64(expr1, expr2, ...) - Returns a 64-bit hash value of the arguments. Hash seed is 42.

Examples:

> SELECT xxhash64('Spark', array(123), 2);
 5602566077635097486

Since: 3.0.0

25. Y

year

year(date) - Returns the year component of the date/timestamp.

Examples:

> SELECT year('2016-07-30');
 2016

Since: 1.5.0

26. Z

zeroifnull

zeroifnull(expr) - Returns zero if expr is equal to null, or expr otherwise.

Examples:

> SELECT zeroifnull(NULL);
 0
> SELECT zeroifnull(2);
 2

Since: 4.0.0

zip_with

zip_with(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.

Examples:

> SELECT zip_with(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x));
 [{"y":"a","x":1},{"y":"b","x":2},{"y":"c","x":3}]
> SELECT zip_with(array(1, 2), array(3, 4), (x, y) -> x + y);
 [4,6]
> SELECT zip_with(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y));
 ["ad","be","cf"]

Since: 2.4.0
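
The padding rule for arrays of different lengths is easiest to see in a small model. The Python sketch below uses `itertools.zip_longest`, which pads the shorter list with None in the same way the description pads with nulls; `zip_with_sketch` is a hypothetical helper name.

```python
from itertools import zip_longest

def zip_with_sketch(left, right, func):
    # The shorter list is padded with None before func is applied.
    return [func(x, y) for x, y in zip_longest(left, right, fillvalue=None)]

print(zip_with_sketch([1, 2], [3, 4], lambda x, y: x + y))
# [4, 6]
print(zip_with_sketch(["a", "b", "c"], ["d", "e", "f"], lambda x, y: x + y))
# ['ad', 'be', 'cf']
print(zip_with_sketch([1, 2, 3], ["a"], lambda x, y: (y, x)))
# [('a', 1), (None, 2), (None, 3)]
```

The function passed in has to tolerate None for the padded positions, as the last call shows.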