2. Spark Functions_a/b/c

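No.	Type	Address
1	Spark Functions	1. Spark Functions_Symbols
2	Spark Functions	2. Spark Functions_a/b/c
3	Spark Functions	3. Spark Functions_d/e/f/g/h/i/j/k/l
4	Spark Functions	4. Spark Functions_m/n/o/p/q/r
5	Spark Functions	5. Spark Functions_s/t
6	Spark Functions	6. Spark Functions_u/v/w/x/y/z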

Table of Contents

- 1. A
  - abs
  - acos
  - acosh
  - add_months
  - aes_decrypt
  - aes_encrypt
  - aggregate
  - and
  - any
  - any_value
  - approx_count_distinct
  - approx_percentile
  - array
  - array_agg
  - array_append
  - array_compact
  - array_contains
  - array_distinct
  - array_except
  - array_insert
  - array_intersect
  - array_join
  - array_max
  - array_min
  - array_position
  - array_prepend
  - array_remove
  - array_repeat
  - array_size
  - array_sort
  - array_union
  - arrays_overlap
  - arrays_zip
  - ascii
  - asin
  - asinh
  - assert_true
  - atan
  - atan2
  - atanh
  - avg
- 2. B
  - base64
  - between
  - bigint
  - bin
  - binary
  - bit_and
  - bit_count
  - bit_get
  - bit_length
  - bit_or
  - bit_xor
  - bitmap_bit_position
  - bitmap_bucket_number
  - bitmap_construct_agg
  - bitmap_count
  - bitmap_or_agg
  - bool_and
  - bool_or
  - boolean
  - bround
  - btrim
- 3. C
  - cardinality
  - case
  - cast
  - cbrt
  - ceil
  - ceiling
  - char
  - char_length
  - character_length
  - chr
  - coalesce
  - collate
  - collation
  - collations
  - collect_list
  - collect_set
  - concat
  - concat_ws
  - contains
  - conv
  - convert_timezone
  - corr
  - cos
  - cosh
  - cot
  - count
  - count_if
  - count_min_sketch
  - covar_pop
  - covar_samp
  - crc32
  - csc
  - cume_dist
  - curdate
  - current_catalog
  - current_database
  - current_date
  - current_schema
  - current_timestamp
  - current_timezone
  - current_user

1. A

aes_decrypt

aes_decrypt(expr, key[, mode[, padding[, aad]]]) - Returns a decrypted value of expr using AES in mode with padding. Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.

Arguments:

expr - The binary value to decrypt.
key - The passphrase to use to decrypt the data.
mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC.
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.

Examples:

> SELECT aes_decrypt(unhex('83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94'), '0000111122223333');
 Spark
> SELECT aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
 Spark SQL
> SELECT aes_decrypt(unbase64('3lmwu+Mw0H3fi5NDvcu9lg=='), '1234567890abcdef', 'ECB', 'PKCS');
 Spark SQL
> SELECT aes_decrypt(unbase64('2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo='), '1234567890abcdef', 'CBC');
 Apache Spark
> SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg='), 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT');
 Spark
> SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4'), 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', 'This is an AAD mixed into the input');
 Spark

Since: 3.3.0

aes_encrypt

aes_encrypt(expr, key[, mode[, padding[, iv[, aad]]]]) - Returns an encrypted value of expr using AES in the given mode with the specified padding. Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional initialization vectors (IVs) are only supported for CBC and GCM modes. These must be 16 bytes for CBC and 12 bytes for GCM. If not provided, a random vector will be generated and prepended to the output. Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.

Arguments:

expr - The binary value to encrypt.
key - The passphrase to use to encrypt the data.
mode - Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB, GCM, CBC.
padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
iv - Optional initialization vector. Only supported for CBC and GCM modes. Valid values: None or ''. 16-byte array for CBC mode. 12-byte array for GCM mode.
aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.

Examples:

> SELECT hex(aes_encrypt('Spark', '0000111122223333'));
 83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94
> SELECT hex(aes_encrypt('Spark SQL', '0000111122223333', 'GCM'));
 6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210
> SELECT base64(aes_encrypt('Spark SQL', '1234567890abcdef', 'ECB', 'PKCS'));
 3lmwu+Mw0H3fi5NDvcu9lg==
> SELECT base64(aes_encrypt('Apache Spark', '1234567890abcdef', 'CBC', 'DEFAULT'));
 2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo=
> SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT', unhex('00000000000000000000000000000000')));
 AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg=
> SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', unhex('000000000000000000000000'), 'This is an AAD mixed into the input'));
 AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4

Since: 3.3.0
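
The CBC and GCM examples above suggest that the initialization vector travels at the front of the output: both Base64 results begin with the all-zero IV that was passed in. The following is a minimal Python sketch (standard library only, no decryption) that splits those two documented outputs into IV and remainder. The helper name `split_iv` is hypothetical, and the IV lengths are the ones stated in the aes_encrypt description.

```python
import base64

# IV lengths in bytes, as stated in the aes_encrypt description.
IV_LENGTH = {"CBC": 16, "GCM": 12}

def split_iv(b64_output, mode):
    """Split a Base64-encoded aes_encrypt output into (iv, remainder)."""
    raw = base64.b64decode(b64_output)
    n = IV_LENGTH[mode]
    return raw[:n], raw[n:]

# Outputs copied from the CBC and GCM examples above.
cbc_iv, cbc_rest = split_iv("AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg=", "CBC")
gcm_iv, gcm_rest = split_iv("AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4", "GCM")

print(cbc_iv.hex(), len(cbc_rest))
print(gcm_iv.hex(), len(gcm_rest))
```

In both cases the leading bytes are all zero, matching the `unhex('00...')` IV arguments used in the examples.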

aggregate

aggregate(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.

Examples:

> SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x);
 6
> SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10);
 60

Since: 2.4.0
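
For readers who know functional folds, aggregate behaves like a reduce followed by an optional post-processing step. A rough Python model of that reading (not Spark code; the function name simply mirrors the SQL one):

```python
from functools import reduce

def aggregate(values, start, merge, finish=lambda acc: acc):
    """Fold `values` into a single state with `merge`, then apply `finish`."""
    return finish(reduce(merge, values, start))

# Mirrors the two SQL examples above.
print(aggregate([1, 2, 3], 0, lambda acc, x: acc + x))                        # 6
print(aggregate([1, 2, 3], 0, lambda acc, x: acc + x, lambda acc: acc * 10))  # 60
```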

and

expr1 and expr2 - Logical AND.

Examples:

> SELECT true and true;
 true
> SELECT true and false;
 false
> SELECT true and NULL;
 NULL
> SELECT false and NULL;
 false

Since: 1.0.0
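
The four examples above follow three-valued logic: a false operand decides the result even when the other operand is NULL. A small Python sketch of that truth table, with `None` standing in for NULL (the name `sql_and` is hypothetical):

```python
def sql_and(a, b):
    """SQL-style AND over True, False and None (None models NULL)."""
    if a is False or b is False:
        return False          # one false operand is enough
    if a is None or b is None:
        return None           # otherwise an unknown operand leaves the result unknown
    return True
```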

any

any(expr) - Returns true if at least one value of expr is true.

Examples:

> SELECT any(col) FROM VALUES (true), (false), (false) AS tab(col);
 true
> SELECT any(col) FROM VALUES (NULL), (true), (false) AS tab(col);
 true
> SELECT any(col) FROM VALUES (false), (false), (NULL) AS tab(col);
 false

Since: 3.0.0

any_value

any_value(expr[, isIgnoreNull]) - Returns some value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

Examples:

> SELECT any_value(col) FROM VALUES (10), (5), (20) AS tab(col);
 10
> SELECT any_value(col) FROM VALUES (NULL), (5), (20) AS tab(col);
 NULL
> SELECT any_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
 5

Note:

The function is non-deterministic.

Since: 3.4.0

approx_count_distinct

approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++. relativeSD defines the maximum relative standard deviation allowed.

Examples:

> SELECT approx_count_distinct(col1) FROM VALUES (1), (1), (2), (2), (3) tab(col1);
 3

Since: 1.6.0

approx_percentile

approx_percentile(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ansi interval column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy yields better accuracy, 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.

Examples:

> SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
 [1,1,0]
> SELECT approx_percentile(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col);
 7
> SELECT approx_percentile(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-1
> SELECT approx_percentile(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col);
 [0 00:00:01.000000000,0 00:00:02.000000000]

Since: 2.1.0
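
To make the percentile definition concrete, here is an exact (non-approximate) Python sketch. It assumes the reading "the smallest value v in the sorted column such that at least `percentage` of the values are less than or equal to v", which is the reading that reproduces the integer examples above. It does not model the accuracy parameter or Spark's sketch-based algorithm, and the name `exact_percentile` is hypothetical.

```python
import math

def exact_percentile(values, percentage):
    """Smallest value with at least `percentage` of the values at or below it."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)
    # 1-based rank of the first element that covers the requested share.
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]

print([exact_percentile([0, 1, 2, 10], p) for p in (0.5, 0.4, 0.1)])
print(exact_percentile([0, 6, 7, 9, 10], 0.5))
```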


array

array(expr, ...) - Returns an array with the given elements.

Examples:

> SELECT array(1, 2, 3);
 [1,2,3]

Since: 1.1.0

array_agg

array_agg(expr) - Collects and returns a list of non-unique elements.

Examples:

> SELECT array_agg(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2,1]

Note:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 3.3.0


array_append

array_append(array, element) - Adds element to the end of the array passed as the first argument. The type of element should be similar to the type of the array's elements. A null element is also appended to the array. If the array itself is NULL, the output is NULL.

Examples:

> SELECT array_append(array('b', 'd', 'c', 'a'), 'd');
 ["b","d","c","a","d"]
> SELECT array_append(array(1, 2, 3, null), null);
 [1,2,3,null,null]
> SELECT array_append(CAST(null as Array<Int>), 2);
 NULL

Since: 3.4.0

array_compact

array_compact(array) - Removes null values from the array.

Examples:

> SELECT array_compact(array(1, 2, 3, null));
 [1,2,3]
> SELECT array_compact(array("a", "b", "c"));
 ["a","b","c"]

Since: 3.4.0

array_contains

array_contains(array, value) - Returns true if the array contains the value.

Examples:

> SELECT array_contains(array(1, 2, 3), 2);
 true

Since: 1.5.0

array_distinct

array_distinct(array) - Removes duplicate values from the array.

Examples:

> SELECT array_distinct(array(1, 2, 3, null, 3));
 [1,2,3,null]

Since: 2.4.0

array_except

array_except(array1, array2) - Returns an array of the elements in array1 but not in array2, without duplicates.

Examples:

> SELECT array_except(array(1, 2, 3), array(1, 3, 5));
 [2]

Since: 2.4.0

array_insert

array_insert(x, pos, val) - Places val into index pos of array x. Array indices start at 1. The maximum negative index is -1 for which the function inserts new element after the current last element. Index above array size appends the array, or prepends the array if index is negative, with 'null' elements.

Examples:

> SELECT array_insert(array(1, 2, 3, 4), 5, 5);
 [1,2,3,4,5]
> SELECT array_insert(array(5, 4, 3, 2), -1, 1);
 [5,4,3,2,1]
> SELECT array_insert(array(5, 3, 2, 1), -4, 4);
 [5,4,3,2,1]

Since: 3.4.0
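
The index rules for array_insert are easier to follow in code. The Python sketch below models them on lists: positive positions are 1-based, -1 inserts after the last element, and out-of-range positions pad with `None`. The padding layout for out-of-range positions and the error on position 0 are assumptions inferred from the description, not behavior shown by the examples.

```python
def array_insert(xs, pos, val):
    """Model of array_insert on a Python list; None stands in for null."""
    if pos == 0:
        raise ValueError("array indices start at 1")  # assumption: 0 is rejected
    n = len(xs)
    if pos > 0:
        idx = pos - 1
        if idx > n:
            # Position beyond the end: pad with nulls, then append.
            return xs + [None] * (idx - n) + [val]
        return xs[:idx] + [val] + xs[idx:]
    # Negative position: -1 means "after the current last element".
    idx = n + pos + 1
    if idx < 0:
        # Position before the start: prepend, padding with nulls.
        return [val] + [None] * (-idx) + xs
    return xs[:idx] + [val] + xs[idx:]
```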

array_intersect

array_intersect(array1, array2) - Returns an array of the elements in the intersection of array1 and array2, without duplicates.

Examples:

> SELECT array_intersect(array(1, 2, 3), array(1, 3, 5));
 [1,3]

Since: 2.4.0

array_join

array_join(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered.

Examples:

> SELECT array_join(array('hello', 'world'), ' ');
 hello world
> SELECT array_join(array('hello', null ,'world'), ' ');
 hello world
> SELECT array_join(array('hello', null ,'world'), ' ', ',');
 hello , world

Since: 2.4.0

array_max

array_max(array) - Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.

Examples:

> SELECT array_max(array(1, 20, null, 3));
 20

Since: 2.4.0

array_min

array_min(array) - Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.

Examples:

> SELECT array_min(array(1, 20, null, 3));
 1

Since: 2.4.0

array_position

array_position(array, element) - Returns the (1-based) index of the first matching element of the array as long, or 0 if no match is found.

Examples:

> SELECT array_position(array(312, 773, 708, 708), 708);
 3
> SELECT array_position(array(312, 773, 708, 708), 414);
 0

Since: 2.4.0

array_prepend

array_prepend(array, element) - Adds element to the beginning of the array passed as the first argument. The type of element should be the same as the type of the array's elements. A null element is also prepended to the array. If the array itself is NULL, the output is NULL.

Examples:

> SELECT array_prepend(array('b', 'd', 'c', 'a'), 'd');
 ["d","b","d","c","a"]
> SELECT array_prepend(array(1, 2, 3, null), null);
 [null,1,2,3,null]
> SELECT array_prepend(CAST(null as Array<Int>), 2);
 NULL

Since: 3.5.0

array_remove

array_remove(array, element) - Removes all elements equal to element from the array.

Examples:

> SELECT array_remove(array(1, 2, 3, null, 3), 3);
 [1,2,null]

Since: 2.4.0

array_repeat

array_repeat(element, count) - Returns the array containing element count times.

Examples:

> SELECT array_repeat('123', 2);
 ["123","123"]

Since: 2.4.0

array_size

array_size(expr) - Returns the size of an array. The function returns null for null input.

Examples:

> SELECT array_size(array('b', 'd', 'c', 'a'));
 4

Since: 3.3.0

array_sort

array_sort(expr, func) - Sorts the input array. If func is omitted, sort in ascending order. The elements of the input array must be orderable. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the end of the returned array. Since 3.0.0 this function also sorts and returns the array based on the given comparator function. The comparator will take two arguments representing two elements of the array. It returns a negative integer, 0, or a positive integer as the first element is less than, equal to, or greater than the second element. If the comparator function returns null, the function will fail and raise an error.

Examples:

> SELECT array_sort(array(5, 6, 1), (left, right) -> case when left < right then -1 when left > right then 1 else 0 end);
 [1,5,6]
> SELECT array_sort(array('bc', 'ab', 'dc'), (left, right) -> case when left is null and right is null then 0 when left is null then -1 when right is null then 1 when left < right then 1 when left > right then -1 else 0 end);
 ["dc","bc","ab"]
> SELECT array_sort(array('b', 'd', null, 'c', 'a'));
 ["a","b","c","d",null]

Since: 2.4.0
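
The comparator contract described above (negative, zero, or positive result) is the same one Python exposes through `functools.cmp_to_key`. A rough Python model of array_sort under that reading, with `None` standing in for null and nulls placed last when no comparator is given:

```python
from functools import cmp_to_key

def array_sort(values, comparator=None):
    """Model of array_sort: ascending with nulls last, or comparator-driven."""
    if comparator is None:
        non_null = sorted(v for v in values if v is not None)
        return non_null + [None] * (len(values) - len(non_null))
    return sorted(values, key=cmp_to_key(comparator))

ascending = lambda left, right: (left > right) - (left < right)
descending = lambda left, right: (left < right) - (left > right)

print(array_sort([5, 6, 1], ascending))
print(array_sort(["bc", "ab", "dc"], descending))
print(array_sort(["b", "d", None, "c", "a"]))
```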

array_union

array_union(array1, array2) - Returns an array of the elements in the union of array1 and array2, without duplicates.

Examples:

> SELECT array_union(array(1, 2, 3), array(1, 3, 5));
 [1,2,3,5]

Since: 2.4.0

arrays_overlap

arrays_overlap(a1, a2) - Returns true if a1 contains at least one non-null element that is also present in a2. If the arrays have no common element, both are non-empty, and either of them contains a null element, null is returned; otherwise, false is returned.

Examples:

> SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5));
 true

Since: 2.4.0
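
The three possible outcomes of arrays_overlap can be written as a short decision procedure. This Python sketch follows the description literally, with `None` standing in for null; it is a reading of the text, not Spark's implementation.

```python
def arrays_overlap(a1, a2):
    """Return True, False, or None (unknown) following the description above."""
    common = {x for x in a1 if x is not None} & {x for x in a2 if x is not None}
    if common:
        return True
    # No shared non-null element: a null on either side makes the answer unknown,
    # but only when both arrays are non-empty.
    if a1 and a2 and (None in a1 or None in a2):
        return None
    return False
```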

arrays_zip

arrays_zip(a1, a2, ...) - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

Examples:

> SELECT arrays_zip(array(1, 2, 3), array(2, 3, 4));
 [{"0":1,"1":2},{"0":2,"1":3},{"0":3,"1":4}]
> SELECT arrays_zip(array(1, 2), array(2, 3), array(3, 4));
 [{"0":1,"1":2,"2":3},{"0":2,"1":3,"2":4}]

Since: 2.4.0

ascii

ascii(str) - Returns the numeric value of the first character of str.

Examples:

> SELECT ascii('222');
 50
> SELECT ascii(2);
 50

Since: 1.5.0

asin

asin(expr) - Returns the inverse sine (a.k.a. arc sine) of expr, as if computed by java.lang.Math.asin.

Examples:

> SELECT asin(0);
 0.0
> SELECT asin(2);
 NaN

Since: 1.4.0

asinh

asinh(expr) - Returns inverse hyperbolic sine of expr.

Examples:

> SELECT asinh(0);
 0.0

Since: 3.0.0

assert_true

assert_true(expr [, message]) - Throws an exception if expr is not true.

Examples:

> SELECT assert_true(0 < 1);
 NULL

Since: 2.0.0

atan

atan(expr) - Returns the inverse tangent (a.k.a. arc tangent) of expr, as if computed by java.lang.Math.atan.

Examples:

> SELECT atan(0);
 0.0

Since: 1.4.0

atan2

atan2(exprY, exprX) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (exprX, exprY), as if computed by java.lang.Math.atan2.

Arguments:

exprY - coordinate on y-axis
exprX - coordinate on x-axis

Examples:

> SELECT atan2(0, 0);
 0.0

Since: 1.4.0

atanh

atanh(expr) - Returns inverse hyperbolic tangent of expr.

Examples:

> SELECT atanh(0);
 0.0
> SELECT atanh(2);
 NaN

Since: 3.0.0

avg

avg(expr) - Returns the mean calculated from values of a group.

Examples:

> SELECT avg(col) FROM VALUES (1), (2), (3) AS tab(col);
 2.0
> SELECT avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
 1.5

Since: 1.0.0

2. B

base64

base64(bin) - Converts the argument from a binary bin to a base 64 string.

Examples:

> SELECT base64('Spark SQL');
 U3BhcmsgU1FM
> SELECT base64(x'537061726b2053514c');
 U3BhcmsgU1FM

Since: 1.5.0

between

input [NOT] between lower AND upper - Evaluates whether input is [not] between lower and upper.

Arguments:

input - An expression that is being compared with lower and upper bound.
lower - Lower bound of the between check.
upper - Upper bound of the between check.

Examples:

> SELECT 0.5 between 0.1 AND 1.0;
  true

Since: 1.0.0

bigint

bigint(expr) - Casts the value expr to the target data type bigint.

Since: 2.0.1

bin

bin(expr) - Returns the string representation of the long value expr represented in binary.

Examples:

> SELECT bin(13);
 1101
> SELECT bin(-13);
 1111111111111111111111111111111111111111111111111111111111110011
> SELECT bin(13.3);
 1101

Since: 1.5.0
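
The 64-character result for `bin(-13)` is the two's-complement form of a 64-bit long. A minimal Python sketch that reproduces the three examples, assuming the argument is first truncated to an integer (which is what the `bin(13.3)` example suggests); the name `spark_bin` is hypothetical:

```python
def spark_bin(value):
    """Binary string of `value` viewed as a 64-bit two's-complement long."""
    as_long = int(value)                        # assumption: fractional part is dropped
    return format(as_long & 0xFFFFFFFFFFFFFFFF, "b")

print(spark_bin(13))
print(spark_bin(-13))
print(spark_bin(13.3))
```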

binary

binary(expr) - Casts the value expr to the target data type binary.

Since: 2.0.1

bit_and

bit_and(expr) - Returns the bitwise AND of all non-null input values, or null if none.

Examples:

> SELECT bit_and(col) FROM VALUES (3), (5) AS tab(col);
 1

Since: 3.0.0

bit_count

bit_count(expr) - Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer, or NULL if the argument is NULL.

Examples:

> SELECT bit_count(0);
 0

Since: 3.0.0

bit_get

bit_get(expr, pos) - Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative.

Examples:

> SELECT bit_get(11, 0);
 1
> SELECT bit_get(11, 2);
 0

Since: 3.2.0

bit_length

bit_length(expr) - Returns the bit length of string data or number of bits of binary data.

Examples:

> SELECT bit_length('Spark SQL');
 72
> SELECT bit_length(x'537061726b2053514c');
 72

Since: 2.3.0

bit_or

bit_or(expr) - Returns the bitwise OR of all non-null input values, or null if none.

Examples:

> SELECT bit_or(col) FROM VALUES (3), (5) AS tab(col);
 7

Since: 3.0.0

bit_xor

bit_xor(expr) - Returns the bitwise XOR of all non-null input values, or null if none.

Examples:

> SELECT bit_xor(col) FROM VALUES (3), (5) AS tab(col);
 6

Since: 3.0.0

bitmap_bit_position

bitmap_bit_position(child) - Returns the bit position for the given input child expression.

Examples:

> SELECT bitmap_bit_position(1);
 0
> SELECT bitmap_bit_position(123);
 122

Since: 3.5.0

bitmap_bucket_number

bitmap_bucket_number(child) - Returns the bucket number for the given input child expression.

Examples:

> SELECT bitmap_bucket_number(123);
 1
> SELECT bitmap_bucket_number(0);
 0

Since: 3.5.0

bitmap_construct_agg

bitmap_construct_agg(child) - Returns a bitmap with the positions of the bits set from all the values from the child expression. The child expression will most likely be bitmap_bit_position().

Examples:

> SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (2), (3) AS tab(col);
 070000
> SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (1), (1) AS tab(col);
 010000

Since: 3.5.0

bitmap_count

bitmap_count(child) - Returns the number of set bits in the child bitmap.

Examples:

> SELECT bitmap_count(X '1010');
 2
> SELECT bitmap_count(X 'FFFF');
 16
> SELECT bitmap_count(X '0');
 0

Since: 3.5.0

bitmap_or_agg

bitmap_or_agg(child) - Returns a bitmap that is the bitwise OR of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg().

Examples:

> SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '20'), (X '40') AS tab(col);
 700000
> SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '10'), (X '10') AS tab(col);
 100000

Since: 3.5.0

bool_and

bool_and(expr) - Returns true if all values of expr are true.

Examples:

> SELECT bool_and(col) FROM VALUES (true), (true), (true) AS tab(col);
 true
> SELECT bool_and(col) FROM VALUES (NULL), (true), (true) AS tab(col);
 true
> SELECT bool_and(col) FROM VALUES (true), (false), (true) AS tab(col);
 false

Since: 3.0.0

bool_or

bool_or(expr) - Returns true if at least one value of expr is true.

Examples:

> SELECT bool_or(col) FROM VALUES (true), (false), (false) AS tab(col);
 true
> SELECT bool_or(col) FROM VALUES (NULL), (true), (false) AS tab(col);
 true
> SELECT bool_or(col) FROM VALUES (false), (false), (NULL) AS tab(col);
 false

Since: 3.0.0

boolean

boolean(expr) - Casts the value expr to the target data type boolean.

Since: 2.0.1

bround

bround(expr, d) - Returns expr rounded to d decimal places using HALF_EVEN rounding mode.

Examples:

> SELECT bround(2.5, 0);
 2
> SELECT bround(25, -1);
 20

Since: 2.0.0
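
HALF_EVEN ("banker's") rounding sends a tie to the neighbour whose last kept digit is even, which is why 2.5 becomes 2 and 25 becomes 20 above. Python's `decimal` module offers the same rounding mode, so the examples can be checked with a small sketch (the helper names are hypothetical; HALF_UP is shown only for contrast):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_to(value, d, mode):
    """Round `value` to `d` decimal places (d may be negative) using `mode`."""
    quantum = Decimal(1).scaleb(-d)             # d=0 -> 1, d=-1 -> 1E+1, d=2 -> 0.01
    return Decimal(str(value)).quantize(quantum, rounding=mode)

def bround(value, d=0):
    return round_to(value, d, ROUND_HALF_EVEN)

print(bround(2.5, 0), round_to(2.5, 0, ROUND_HALF_UP))
print(int(bround(25, -1)), int(round_to(25, -1, ROUND_HALF_UP)))
```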

btrim

btrim(str) - Removes the leading and trailing space characters from str.

btrim(str, trimStr) - Removes the leading and trailing trimStr characters from str.

Arguments:

str - a string expression
trimStr - the characters to trim; the default value is a single space

Examples:

> SELECT btrim('    SparkSQL   ');
 SparkSQL
> SELECT btrim(encode('    SparkSQL   ', 'utf-8'));
 SparkSQL
> SELECT btrim('SSparkSQLS', 'SL');
 parkSQ
> SELECT btrim(encode('SSparkSQLS', 'utf-8'), encode('SL', 'utf-8'));
 parkSQ

Since: 3.2.0
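
Note that trimStr acts as a set of characters rather than a substring: in the `'SL'` example both `S` and `L` are stripped from either end in any order. Python's `str.strip` has the same convention, so a one-line model (strings only, not binary input) reproduces the string examples:

```python
def btrim(s, trim_chars=" "):
    """Strip any of `trim_chars` from both ends of `s`."""
    return s.strip(trim_chars)

print(btrim("    SparkSQL   "))
print(btrim("SSparkSQLS", "SL"))
```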

3. C

cardinality

cardinality(expr) - Returns the size of an array or a map. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input.

Examples:

> SELECT cardinality(array('b', 'd', 'c', 'a'));
 4
> SELECT cardinality(map('a', 1, 'b', 2));
 2

Since: 2.4.0

case

CASE expr1 WHEN expr2 THEN expr3 [WHEN expr4 THEN expr5]* [ELSE expr6] END - When expr1 = expr2, returns expr3; when expr1 = expr4, returns expr5; otherwise returns expr6.

Arguments:

expr1 - the expression which is one operand of comparison.
expr2, expr4 - the expressions each of which is the other operand of comparison.
expr3, expr5, expr6 - the branch value expressions and else value expression should all be same type or coercible to a common type.

Examples:

> SELECT CASE col1 WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE '?' END FROM VALUES 1, 2, 3;
 one
 two
 ?
> SELECT CASE col1 WHEN 1 THEN 'one' WHEN 2 THEN 'two' END FROM VALUES 1, 2, 3;
 one
 two
 NULL

Since: 1.0.1

cast

cast(expr AS type) - Casts the value expr to the target data type type. expr :: type alternative casting syntax is also supported.

Examples:

> SELECT cast('10' as int);
 10
> SELECT '10' :: int;
 10

Since: 1.0.0

cbrt

cbrt(expr) - Returns the cube root of expr.

Examples:

> SELECT cbrt(27.0);
 3.0

Since: 1.4.0

ceil

ceil(expr[, scale]) - Returns the smallest number after rounding up that is not smaller than expr. An optional scale parameter can be specified to control the rounding behavior.

Examples:

> SELECT ceil(-0.1);
 0
> SELECT ceil(5);
 5
> SELECT ceil(3.1411, 3);
 3.142
> SELECT ceil(3.1411, -3);
 1000

Since: 3.3.0
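
The scale argument moves the rounding position: a positive scale keeps that many decimal places, while a negative scale rounds up to a power of ten. A Python sketch using `decimal` with ceiling rounding reproduces the four examples; the name `ceil_scale` is hypothetical and result types are not modelled.

```python
from decimal import Decimal, ROUND_CEILING

def ceil_scale(value, scale=0):
    """Round `value` up (toward +infinity) at the given decimal scale."""
    quantum = Decimal(1).scaleb(-scale)         # scale=3 -> 0.001, scale=-3 -> 1E+3
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_CEILING)

print(ceil_scale(3.1411, 3))
print(int(ceil_scale(3.1411, -3)))
```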

ceiling

ceiling(expr[, scale]) - Returns the smallest number after rounding up that is not smaller than expr. An optional scale parameter can be specified to control the rounding behavior.

Examples:

> SELECT ceiling(-0.1);
 0
> SELECT ceiling(5);
 5
> SELECT ceiling(3.1411, 3);
 3.142
> SELECT ceiling(3.1411, -3);
 1000

Since: 3.3.0

char

char(expr) - Returns the ASCII character having the binary equivalent to expr. If expr is larger than 256, the result is equivalent to chr(expr % 256).

Examples:

> SELECT char(65);
 A

Since: 2.3.0

char_length

char_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.

Examples:

> SELECT char_length('Spark SQL ');
 10
> SELECT char_length(x'537061726b2053514c');
 9
> SELECT CHAR_LENGTH('Spark SQL ');
 10
> SELECT CHARACTER_LENGTH('Spark SQL ');
 10

Since: 2.3.0

character_length

character_length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.

Examples:

> SELECT character_length('Spark SQL ');
 10
> SELECT character_length(x'537061726b2053514c');
 9
> SELECT CHAR_LENGTH('Spark SQL ');
 10
> SELECT CHARACTER_LENGTH('Spark SQL ');
 10

Since: 2.3.0

chr

chr(expr) - Returns the ASCII character having the binary equivalent to expr. If expr is larger than 256, the result is equivalent to chr(expr % 256).

Examples:

> SELECT chr(65);
 A

Since: 2.3.0

coalesce

coalesce(expr1, expr2, ...) - Returns the first non-null argument if exists. Otherwise, null.

Examples:

> SELECT coalesce(NULL, 1, NULL);
 1

Since: 1.0.0

collate

collate(expr, collationName) - Marks a given expression with the specified collation.

Arguments:

expr - String expression to perform collation on.
collationName - Foldable string expression that specifies the collation name.

Examples:

> SELECT COLLATION('Spark SQL' collate UTF8_LCASE);
 SYSTEM.BUILTIN.UTF8_LCASE

Since: 4.0.0

collation

collation(expr) - Returns the collation name of a given expression.

Arguments:

expr - String expression to perform collation on.

Examples:

> SELECT collation('Spark SQL');
 SYSTEM.BUILTIN.UTF8_BINARY

Since: 4.0.0

collations

collations() - Returns all of the Spark SQL string collations.

Examples:

> SELECT * FROM collations() WHERE NAME = 'UTF8_BINARY';
 SYSTEM  BUILTIN  UTF8_BINARY NULL  NULL  ACCENT_SENSITIVE  CASE_SENSITIVE  NO_PAD  NULL

Since: 4.0.0

collect_list

collect_list(expr) - Collects and returns a list of non-unique elements.

Examples:

> SELECT collect_list(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2,1]

Note:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0

collect_set

collect_set(expr) - Collects and returns a set of unique elements.

Examples:

> SELECT collect_set(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2]

Note:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0

concat

concat(col1, col2, ..., colN) - Returns the concatenation of col1, col2, ..., colN.

Examples:

> SELECT concat('Spark', 'SQL');
 SparkSQL
> SELECT concat(array(1, 2, 3), array(4, 5), array(6));
 [1,2,3,4,5,6]

Note:

Concat logic for arrays is available since 2.4.0.

Since: 1.5.0

concat_ws

concat_ws(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by sep, skipping null values.

Examples:

> SELECT concat_ws(' ', 'Spark', 'SQL');
  Spark SQL
> SELECT concat_ws('s');

> SELECT concat_ws('/', 'foo', null, 'bar');
  foo/bar
> SELECT concat_ws(null, 'Spark', 'SQL');
  NULL

Since: 1.5.0

contains

contains(left, right) - Returns a boolean. The value is True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type.

Examples:

> SELECT contains('Spark SQL', 'Spark');
 true
> SELECT contains('Spark SQL', 'SPARK');
 false
> SELECT contains('Spark SQL', null);
 NULL
> SELECT contains(x'537061726b2053514c', x'537061726b');
 true

Since: 3.3.0

conv

conv(num, from_base, to_base) - Convert num from from_base to to_base.

Examples:

> SELECT conv('100', 2, 10);
 4
> SELECT conv(-10, 16, -10);
 -16

Since: 1.5.0
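
Base conversion is a two-step process: parse the digits in from_base, then re-emit them in to_base. The Python sketch below reproduces the two examples. It assumes that a negative to_base means "treat the number as signed", and that with a positive to_base a negative number is reinterpreted as an unsigned 64-bit value; the second assumption is not covered by the examples above.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def conv(num, from_base, to_base):
    """Convert `num` (string or int) from `from_base` to `to_base`."""
    value = int(str(num), from_base)            # step 1: parse in the source base
    signed = to_base < 0                        # assumption: negative base = signed output
    base = abs(to_base)
    if not signed:
        value &= 0xFFFFFFFFFFFFFFFF             # assumption: unsigned 64-bit view
    sign = "-" if value < 0 else ""
    value = abs(value)
    digits = ""
    while True:                                 # step 2: emit digits in the target base
        value, remainder = divmod(value, base)
        digits = DIGITS[remainder] + digits
        if value == 0:
            break
    return sign + digits

print(conv("100", 2, 10))
print(conv(-10, 16, -10))
```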

convert_timezone

convert_timezone([sourceTz, ]targetTz, sourceTs) - Converts the timestamp without time zone sourceTs from the sourceTz time zone to targetTz.

Arguments:

sourceTz - the time zone for the input timestamp. If it is omitted, the current session time zone is used as the source time zone.
targetTz - the time zone to which the input timestamp should be converted
sourceTs - a timestamp without time zone

Examples:

> SELECT convert_timezone('Europe/Brussels', 'America/Los_Angeles', timestamp_ntz'2021-12-06 00:00:00');
 2021-12-05 15:00:00
> SELECT convert_timezone('Europe/Brussels', timestamp_ntz'2021-12-05 15:00:00');
 2021-12-06 00:00:00

Since: 3.4.0

corr

corr(expr1, expr2) - Returns Pearson coefficient of correlation between a set of number pairs.

Examples:

> SELECT corr(c1, c2) FROM VALUES (3, 2), (3, 3), (6, 4) as tab(c1, c2);
 0.8660254037844387

Since: 1.6.0

cos

cos(expr) - Returns the cosine of expr, as if computed by java.lang.Math.cos.

Arguments:

expr - angle in radians

Examples:

> SELECT cos(0);
 1.0

Since: 1.4.0

cosh

cosh(expr) - Returns the hyperbolic cosine of expr, as if computed by java.lang.Math.cosh.

Arguments:

expr - hyperbolic angle

Examples:

> SELECT cosh(0);
 1.0

Since: 1.4.0

cot

cot(expr) - Returns the cotangent of expr, as if computed by 1/java.lang.Math.tan.

Arguments:

expr - angle in radians

Examples:

> SELECT cot(1);
 0.6420926159343306

Since: 2.3.0

count

count(*) - Returns the total number of retrieved rows, including rows containing null.

count(expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are all non-null.

count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-null.

Examples:

> SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col);
 4
> SELECT count(col) FROM VALUES (NULL), (5), (5), (20) AS tab(col);
 3
> SELECT count(DISTINCT col) FROM VALUES (NULL), (5), (5), (10) AS tab(col);
 2

Since: 1.0.0
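
The difference between the three forms of count comes down to how NULLs and duplicates are treated. A small Python model over a single column, with `None` standing in for NULL (helper names are hypothetical):

```python
def count_star(rows):
    """count(*): every row counts, NULL or not."""
    return len(rows)

def count_col(rows):
    """count(col): only non-NULL values count."""
    return sum(1 for v in rows if v is not None)

def count_distinct(rows):
    """count(DISTINCT col): distinct non-NULL values."""
    return len({v for v in rows if v is not None})

rows = [None, 5, 5, 20]
print(count_star(rows), count_col(rows), count_distinct(rows))
```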

count_if

count_if(expr) - Returns the number of TRUE values for the expression.

Examples:

> SELECT count_if(col % 2 = 0) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col);
 2
> SELECT count_if(col IS NULL) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col);
 1

Since: 3.0.0

count_min_sketch

count_min_sketch(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.

Examples:

> SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col);
 0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000

Since: 2.2.0

covar_pop

covar_pop(expr1, expr2) - Returns the population covariance of a set of number pairs.

Examples:

> SELECT covar_pop(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2);
 0.6666666666666666

Since: 2.0.0

covar_samp

covar_samp(expr1, expr2) - Returns the sample covariance of a set of number pairs.

Examples:

> SELECT covar_samp(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2);
 1.0

Since: 2.0.0
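
The corr, covar_pop and covar_samp entries differ only in their normalisation: population covariance divides by n, sample covariance by n - 1, and the Pearson coefficient divides the covariance by the product of the standard deviations. A plain Python sketch using the textbook formulas (not Spark's streaming implementation) reproduces the three example values up to floating-point rounding:

```python
import math

def _centered_sums(pairs):
    """Return (n, sum dx*dy, sum dx^2, sum dy^2) around the means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return n, sxy, sxx, syy

def covar_pop(pairs):
    n, sxy, _, _ = _centered_sums(pairs)
    return sxy / n

def covar_samp(pairs):
    n, sxy, _, _ = _centered_sums(pairs)
    return sxy / (n - 1)

def corr(pairs):
    _, sxy, sxx, syy = _centered_sums(pairs)
    return sxy / math.sqrt(sxx * syy)

print(covar_pop([(1, 1), (2, 2), (3, 3)]))
print(covar_samp([(1, 1), (2, 2), (3, 3)]))
print(corr([(3, 2), (3, 3), (6, 4)]))
```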

crc32

crc32(expr) - Returns a cyclic redundancy check value of the expr as a bigint.

Examples:

> SELECT crc32('Spark');
 1557323817

Since: 1.5.0

csc

csc(expr) - Returns the cosecant of expr, as if computed by 1/java.lang.Math.sin.

Arguments:

expr - angle in radians

Examples:

> SELECT csc(1);
 1.1883951057781212

Since: 3.3.0

cume_dist

cume_dist() - Computes the position of a value relative to all values in the partition.

Examples:

> SELECT a, b, cume_dist() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
 A1 1   0.6666666666666666
 A1 1   0.6666666666666666
 A1 2   1.0
 A2 3   1.0

Since: 2.0.0
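
In the example above, each row's value is the share of rows in its partition whose ordering value is less than or equal to its own, which is why the two tied `A1 1` rows both get 2/3. A Python sketch of that computation over `(partition_key, order_value)` tuples (the tuple layout and function name are illustrative assumptions):

```python
from collections import defaultdict

def cume_dist(rows):
    """Return (key, value, cumulative distribution) for each row, sorted."""
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[key].append(value)
    result = []
    for key, value in sorted(rows):
        peers = partitions[key]
        at_or_below = sum(1 for v in peers if v <= value)
        result.append((key, value, at_or_below / len(peers)))
    return result

for row in cume_dist([("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)]):
    print(*row)
```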

curdate

curdate() - Returns the current date at the start of query evaluation. All calls of curdate within the same query return the same value.

Examples:

> SELECT curdate();
 2022-09-06

Since: 3.4.0

current_catalog

current_catalog() - Returns the current catalog.

Examples:

> SELECT current_catalog();
 spark_catalog

Since: 3.1.0

current_database

current_database() - Returns the current database.

Examples:

> SELECT current_database();
 default

Since: 1.6.0

current_date

current_date() - Returns the current date at the start of query evaluation. All calls of current_date within the same query return the same value.

current_date - Returns the current date at the start of query evaluation.

Examples:

> SELECT current_date();
 2020-04-25
> SELECT current_date;
 2020-04-25

Note:

The syntax without parentheses has been supported since 2.0.1.

Since: 1.5.0

current_schema

current_schema() - Returns the current database.

Examples:

> SELECT current_schema();
 default

Since: 3.4.0

current_timestamp

current_timestamp() - Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value.

current_timestamp - Returns the current timestamp at the start of query evaluation.

Examples:

> SELECT current_timestamp();
 2020-04-25 15:49:11.914
> SELECT current_timestamp;
 2020-04-25 15:49:11.914

Note:

The syntax without parentheses has been supported since 2.0.1.

Since: 1.5.0

current_timezone

current_timezone() - Returns the current session local timezone.

Examples:

> SELECT current_timezone();
 Asia/Shanghai

Since: 3.1.0

current_user

current_user() - Returns the user name of the current execution context.

Examples:

> SELECT current_user();
 mockingjay

Since: 3.2.0