| No. | Type | Link |
|---|---|---|
| 1 | Spark Functions | 1. Spark Functions_Symbols |
| 2 | Spark Functions | 2. Spark Functions_a/b/c |
| 3 | Spark Functions | 3. Spark Functions_d/e/f/g/h/i/j/k/l |
| 4 | Spark Functions | 4. Spark Functions_m/n/o/p/q/r |
| 5 | Spark Functions | 5. Spark Functions_s/t |
| 6 | Spark Functions | 6. Spark Functions_u/v/w/x/y/z |
Table of Contents
- 19. S
- schema_of_avro
- schema_of_csv
- schema_of_json
- schema_of_variant
- schema_of_variant_agg
- schema_of_xml
- sec
- second
- sentences
- sequence
- session_user
- session_window
- sha
- sha1
- sha2
- shiftleft
- shiftright
- shiftrightunsigned
- shuffle
- sign
- signum
- sin
- sinh
- size
- skewness
- slice
- smallint
- some
- sort_array
- soundex
- space
- spark_partition_id
- split
- split_part
- sql_keywords
- sqrt
- stack
- startswith
- std
- stddev
- stddev_pop
- stddev_samp
- str_to_map
- string
- string_agg
- struct
- substr
- substring
- substring_index
- sum
- 20. T
- tan
- tanh
- timestamp
- timestamp_micros
- timestamp_millis
- timestamp_seconds
- tinyint
- to_avro
- to_binary
- to_char
- to_csv
- to_date
- to_json
- to_number
- to_protobuf
- to_timestamp
- to_timestamp_ltz
- to_timestamp_ntz
- to_unix_timestamp
- to_utc_timestamp
- to_varchar
- to_variant_object
- to_xml
- transform
- transform_keys
- transform_values
- translate
- trim
- trunc
- try_add
- try_aes_decrypt
- try_avg
- try_divide
- try_element_at
- try_make_interval
- try_make_timestamp
- try_make_timestamp_ltz
- try_make_timestamp_ntz
- try_mod
- try_multiply
- try_parse_json
- try_parse_url
- try_reflect
- try_subtract
- try_sum
- try_to_binary
- try_to_number
- try_to_timestamp
- try_url_decode
- try_validate_utf8
- try_variant_get
- typeof
19. S
schema_of_avro
schema_of_avro(jsonFormatSchema, options) - Returns schema in the DDL format of the avro schema in JSON string format.
Examples:
sql
> SELECT schema_of_avro('{"type": "record", "name": "struct", "fields": [{"name": "u", "type": ["int", "string"]}]}', map());
STRUCT<u: STRUCT<member0: INT, member1: STRING> NOT NULL>
Since: 4.0.0
schema_of_csv
schema_of_csv(csv[, options]) - Returns schema in the DDL format of CSV string.
Examples:
sql
> SELECT schema_of_csv('1,abc');
STRUCT<_c0: INT, _c1: STRING>
Since: 3.0.0
schema_of_json
schema_of_json(json[, options]) - Returns schema in the DDL format of JSON string.
Examples:
sql
> SELECT schema_of_json('[{"col":0}]');
ARRAY<STRUCT<col: BIGINT>>
> SELECT schema_of_json('[{"col":01}]', map('allowNumericLeadingZeros', 'true'));
ARRAY<STRUCT<col: BIGINT>>
Since: 2.4.0
schema_of_variant
schema_of_variant(v) - Returns schema in the SQL format of a variant.
Examples:
sql
> SELECT schema_of_variant(parse_json('null'));
VOID
> SELECT schema_of_variant(parse_json('[{"b":true,"a":0}]'));
ARRAY<OBJECT<a: BIGINT, b: BOOLEAN>>
Since: 4.0.0
schema_of_variant_agg
schema_of_variant_agg(v) - Returns the merged schema in the SQL format of a variant column.
Examples:
sql
> SELECT schema_of_variant_agg(parse_json(j)) FROM VALUES ('1'), ('2'), ('3') AS tab(j);
BIGINT
> SELECT schema_of_variant_agg(parse_json(j)) FROM VALUES ('{"a": 1}'), ('{"b": true}'), ('{"c": 1.23}') AS tab(j);
OBJECT<a: BIGINT, b: BOOLEAN, c: DECIMAL(3,2)>
Since: 4.0.0
schema_of_xml
schema_of_xml(xml[, options]) - Returns schema in the DDL format of XML string.
Examples:
sql
> SELECT schema_of_xml('<p><a>1</a></p>');
STRUCT<a: BIGINT>
> SELECT schema_of_xml('<p><a attr="2">1</a><a>3</a></p>', map('excludeAttribute', 'true'));
STRUCT<a: ARRAY<BIGINT>>
Since: 4.0.0
sec
sec(expr) - Returns the secant of expr, as if computed by 1/java.lang.Math.cos.
Arguments:
- expr - angle in radians
Examples:
sql
> SELECT sec(0);
1.0
Since: 3.3.0
second
second(timestamp) - Returns the second component of the string/timestamp.
Examples:
sql
> SELECT second('2009-07-30 12:58:59');
59
Since: 1.5.0
sentences
sentences(str[, lang[, country]]) - Splits str into an array of arrays of words.
Arguments:
- str - A STRING expression to be parsed.
- lang - An optional STRING expression with a language code from ISO 639 Alpha-2 (e.g. 'DE'), Alpha-3, or a language subtag of up to 8 characters.
- country - An optional STRING expression with a country code from ISO 3166 alpha-2 country code or a UN M.49 numeric-3 area code.
Examples:
sql
> SELECT sentences('Hi there! Good morning.');
[["Hi","there"],["Good","morning"]]
> SELECT sentences('Hi there! Good morning.', 'en');
[["Hi","there"],["Good","morning"]]
> SELECT sentences('Hi there! Good morning.', 'en', 'US');
[["Hi","there"],["Good","morning"]]
Since: 2.0.0
sequence
sequence(start, stop, step) - Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.
Supported types are: byte, short, integer, long, date, timestamp.
The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the 'date' or 'timestamp' type then the step expression must resolve to the 'interval' or 'year-month interval' or 'day-time interval' type, otherwise to the same type as the start and stop expressions.
Arguments:
- start - an expression. The start of the range.
- stop - an expression. The end of the range (inclusive).
- step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it's 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa.
Examples:
sql
> SELECT sequence(1, 5);
[1,2,3,4,5]
> SELECT sequence(5, 1);
[5,4,3,2,1]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month);
[2018-01-01,2018-02-01,2018-03-01]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval '0-1' year to month);
[2018-01-01,2018-02-01,2018-03-01]
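-- Additional illustrative call (not from the official reference), with an explicit numeric step:
> SELECT sequence(1, 9, 2);
[1,3,5,7,9]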
Since: 2.4.0
session_user
session_user() - user name of current execution context.
Examples:
sql
> SELECT session_user();
mockingjay
Since: 4.0.0
session_window
session_window(time_column, gap_duration) - Generates session window given a timestamp specifying column and gap duration. See 'Types of time windows' in Structured Streaming guide doc for detailed explanation and examples.
Arguments:
- time_column - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.
- gap_duration - A string specifying the timeout of the session represented as "interval value" (See Interval Literal for more details.) for the fixed gap duration, or an expression which is applied for each input and evaluated to the "interval value" for the dynamic gap duration.
Examples:
sql
> SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, session_window(b, '5 minutes') ORDER BY a, start;
A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2
A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1
A2 2021-01-01 00:01:00 2021-01-01 00:06:00 1
> SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00'), ('A2', '2021-01-01 00:04:30') AS tab(a, b) GROUP by a, session_window(b, CASE WHEN a = 'A1' THEN '5 minutes' WHEN a = 'A2' THEN '1 minute' ELSE '10 minutes' END) ORDER BY a, start;
A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2
A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1
A2 2021-01-01 00:01:00 2021-01-01 00:02:00 1
A2 2021-01-01 00:04:30 2021-01-01 00:05:30 1
Since: 3.2.0
sha
sha(expr) - Returns a sha1 hash value as a hex string of the expr.
Examples:
sql
> SELECT sha('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
Since: 1.5.0
sha1
sha1(expr) - Returns a sha1 hash value as a hex string of the expr.
Examples:
sql
> SELECT sha1('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
Since: 1.5.0
sha2
sha2(expr, bitLength) - Returns a checksum of SHA-2 family as a hex string of expr. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.
Examples:
sql
> SELECT sha2('Spark', 256);
529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b
Since: 1.5.0
shiftleft
shiftleft(base, exp) - Bitwise left shift.
Examples:
sql
> SELECT shiftleft(2, 1);
4
> SELECT 2 << 1;
4
Note:
<< operator is added in Spark 4.0.0 as an alias for shiftleft.
Since: 1.5.0
shiftright
shiftright(base, exp) - Bitwise (signed) right shift.
Examples:
sql
> SELECT shiftright(4, 1);
2
> SELECT 4 >> 1;
2
Note:
>> operator is added in Spark 4.0.0 as an alias for shiftright.
Since: 1.5.0
shiftrightunsigned
shiftrightunsigned(base, exp) - Bitwise unsigned right shift.
Examples:
sql
> SELECT shiftrightunsigned(4, 1);
2
> SELECT 4 >>> 1;
2
Note:
>>> operator is added in Spark 4.0.0 as an alias for shiftrightunsigned.
Since: 1.5.0
shuffle
shuffle(array) - Returns a random permutation of the given array.
Examples:
sql
> SELECT shuffle(array(1, 20, 3, 5));
[3,1,5,20]
> SELECT shuffle(array(1, 20, null, 3));
[20,null,3,1]
Note:
The function is non-deterministic.
Since: 2.4.0
sign
sign(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive.
Examples:
sql
> SELECT sign(40);
1.0
> SELECT sign(INTERVAL -'100' YEAR);
-1.0
Since: 1.4.0
signum
signum(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive.
Examples:
sql
> SELECT signum(40);
1.0
> SELECT signum(INTERVAL -'100' YEAR);
-1.0
Since: 1.4.0
sin
sin(expr) - Returns the sine of expr, as if computed by java.lang.Math.sin.
Arguments:
- expr - angle in radians
Examples:
sql
> SELECT sin(0);
0.0
Since: 1.4.0
sinh
sinh(expr) - Returns hyperbolic sine of expr, as if computed by java.lang.Math.sinh.
Arguments:
- expr - hyperbolic angle
Examples:
sql
> SELECT sinh(0);
0.0
Since: 1.4.0
size
size(expr) - Returns the size of an array or a map. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input.
Examples:
sql
> SELECT size(array('b', 'd', 'c', 'a'));
4
> SELECT size(map('a', 1, 'b', 2));
2
Since: 1.5.0
skewness
skewness(expr) - Returns the skewness value calculated from values of a group.
Examples:
sql
> SELECT skewness(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col);
1.1135657469022011
> SELECT skewness(col) FROM VALUES (-1000), (-100), (10), (20) AS tab(col);
-1.1135657469022011
Since: 1.6.0
slice
slice(x, start, length) - Subsets array x starting from index start (array indices start at 1, or starting from the end if start is negative) with the specified length.
Examples:
sql
> SELECT slice(array(1, 2, 3, 4), 2, 2);
[2,3]
> SELECT slice(array(1, 2, 3, 4), -2, 2);
[3,4]
Since: 2.4.0
smallint
smallint(expr) - Casts the value expr to the target data type smallint.
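An illustrative example (not taken from the official reference; output assumes default casting behavior):
sql
> SELECT smallint('100');
100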
Since: 2.0.1
some
some(expr) - Returns true if at least one value of expr is true.
Examples:
sql
> SELECT some(col) FROM VALUES (true), (false), (false) AS tab(col);
true
> SELECT some(col) FROM VALUES (NULL), (true), (false) AS tab(col);
true
> SELECT some(col) FROM VALUES (false), (false), (NULL) AS tab(col);
false
Since: 3.0.0
sort_array
sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
Examples:
sql
> SELECT sort_array(array('b', 'd', null, 'c', 'a'), true);
[null,"a","b","c","d"]
> SELECT sort_array(array('b', 'd', null, 'c', 'a'), false);
["d","c","b","a",null]
Since: 1.5.0
soundex
soundex(str) - Returns Soundex code of the string.
Examples:
sql
> SELECT soundex('Miller');
M460
Since: 1.5.0
space
space(n) - Returns a string consisting of n spaces.
Examples:
sql
> SELECT concat(space(2), '1');
1
Since: 1.5.0
spark_partition_id
spark_partition_id() - Returns the current partition id.
Examples:
sql
> SELECT spark_partition_id();
0
Since: 1.4.0
split
split(str, regex, limit) - Splits str around occurrences that match regex and returns an array with a length of at most limit
Arguments:
- str - a string expression to split.
- regex - a string representing a regular expression. The regex string should be a Java regular expression.
- limit - an integer expression which controls the number of times the regex is applied.
  - limit > 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.
  - limit <= 0: regex will be applied as many times as possible, and the resulting array can be of any size.
Examples:
sql
> SELECT split('oneAtwoBthreeC', '[ABC]');
["one","two","three",""]
> SELECT split('oneAtwoBthreeC', '[ABC]', -1);
["one","two","three",""]
> SELECT split('oneAtwoBthreeC', '[ABC]', 2);
["one","twoBthreeC"]
Since: 1.5.0
split_part
split_part(str, delimiter, partNum) - Splits str by delimiter and returns the requested part of the split (1-based). If any input is null, returns null. If partNum is out of range of split parts, returns an empty string. If partNum is 0, throws an error. If partNum is negative, the parts are counted backward from the end of the string. If the delimiter is an empty string, the str is not split.
Examples:
sql
> SELECT split_part('11.12.13', '.', 3);
13
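-- Additional illustrative calls (not from the official reference), showing the negative partNum behavior described above:
> SELECT split_part('11.12.13', '.', -1);
13
> SELECT split_part('11.12.13', '.', -2);
12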
Since: 3.3.0
sql_keywords
sql_keywords() - Get Spark SQL keywords
Examples:
sql
> SELECT * FROM sql_keywords() LIMIT 2;
ADD false
AFTER false
Since: 3.5.0
sqrt
sqrt(expr) - Returns the square root of expr.
Examples:
sql
> SELECT sqrt(4);
2.0
Since: 1.1.1
stack
stack(n, expr1, ..., exprk) - Separates expr1, ..., exprk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise.
Examples:
sql
> SELECT stack(2, 1, 2, 3);
1 2
3 NULL
Since: 2.0.0
startswith
startswith(left, right) - Returns a boolean. The value is True if left starts with right. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type.
Examples:
sql
> SELECT startswith('Spark SQL', 'Spark');
true
> SELECT startswith('Spark SQL', 'SQL');
false
> SELECT startswith('Spark SQL', null);
NULL
> SELECT startswith(x'537061726b2053514c', x'537061726b');
true
> SELECT startswith(x'537061726b2053514c', x'53514c');
false
Since: 3.3.0
std
std(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
sql
> SELECT std(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
stddev
stddev(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
sql
> SELECT stddev(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
stddev_pop
stddev_pop(expr) - Returns the population standard deviation calculated from values of a group.
Examples:
sql
> SELECT stddev_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
0.816496580927726
Since: 1.6.0
stddev_samp
stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
sql
> SELECT stddev_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
str_to_map
str_to_map(text[, pairDelim[, keyValueDelim]]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim and ':' for keyValueDelim. Both pairDelim and keyValueDelim are treated as regular expressions.
Examples:
sql
> SELECT str_to_map('a:1,b:2,c:3', ',', ':');
{"a":"1","b":"2","c":"3"}
> SELECT str_to_map('a');
{"a":null}
Since: 2.0.1
string
string(expr) - Casts the value expr to the target data type string.
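An illustrative example (not taken from the official reference; output assumes default casting behavior):
sql
> SELECT string(1);
1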
Since: 2.0.1
string_agg
string_agg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter ordered by key. If all values are NULL, NULL is returned.
Arguments:
- expr - a string or binary expression to be concatenated.
- delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL.
- key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic.
Examples:
sql
> SELECT string_agg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
abc
> SELECT string_agg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
cba
> SELECT string_agg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col);
ab
> SELECT string_agg(col) FROM VALUES ('a'), ('a') AS tab(col);
aa
> SELECT string_agg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
ab
> SELECT string_agg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col);
a, b, c
> SELECT string_agg(col) FROM VALUES (NULL), (NULL) AS tab(col);
NULL
Note:
- If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle.
- If DISTINCT is specified, then expr and key must be the same expression.
Since: 4.0.0
struct
struct(col1, col2, col3, ...) - Creates a struct with the given field values.
Examples:
sql
> SELECT struct(1, 2, 3);
{"col1":1,"col2":2,"col3":3}
Since: 1.4.0
substr
substr(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
substr(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
Examples:
sql
> SELECT substr('Spark SQL', 5);
k SQL
> SELECT substr('Spark SQL', -3);
SQL
> SELECT substr('Spark SQL', 5, 1);
k
> SELECT substr('Spark SQL' FROM 5);
k SQL
> SELECT substr('Spark SQL' FROM -3);
SQL
> SELECT substr('Spark SQL' FROM 5 FOR 1);
k
> SELECT substr(encode('Spark SQL', 'utf-8'), 5);
k SQL
Since: 1.5.0
substring
substring(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
substring(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
Examples:
sql
> SELECT substring('Spark SQL', 5);
k SQL
> SELECT substring('Spark SQL', -3);
SQL
> SELECT substring('Spark SQL', 5, 1);
k
> SELECT substring('Spark SQL' FROM 5);
k SQL
> SELECT substring('Spark SQL' FROM -3);
SQL
> SELECT substring('Spark SQL' FROM 5 FOR 1);
k
> SELECT substring(encode('Spark SQL', 'utf-8'), 5);
k SQL
Since: 1.5.0
substring_index
substring_index(str, delim, count) - Returns the substring from str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delim.
Examples:
sql
> SELECT substring_index('www.apache.org', '.', 2);
www.apache
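-- Additional illustrative call (not from the official reference), showing the negative count behavior described above:
> SELECT substring_index('www.apache.org', '.', -2);
apache.org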
Since: 1.5.0
sum
sum(expr) - Returns the sum calculated from values of a group.
Examples:
sql
> SELECT sum(col) FROM VALUES (5), (10), (15) AS tab(col);
30
> SELECT sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
25
> SELECT sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
NULL
Since: 1.0.0
20. T
tan
tan(expr) - Returns the tangent of expr, as if computed by java.lang.Math.tan.
Arguments:
- expr - angle in radians
Examples:
sql
> SELECT tan(0);
0.0
Since: 1.4.0
tanh
tanh(expr) - Returns the hyperbolic tangent of expr, as if computed by java.lang.Math.tanh.
Arguments:
- expr - hyperbolic angle
Examples:
sql
> SELECT tanh(0);
0.0
Since: 1.4.0
timestamp
timestamp(expr) - Casts the value expr to the target data type timestamp.
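An illustrative example (not taken from the official reference; output assumes default casting behavior):
sql
> SELECT timestamp('2020-04-30 12:25:13.45');
2020-04-30 12:25:13.45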
Since: 2.0.1
timestamp_micros
timestamp_micros(microseconds) - Creates timestamp from the number of microseconds since UTC epoch.
Examples:
sql
> SELECT timestamp_micros(1230219000123123);
2008-12-25 07:30:00.123123
Since: 3.1.0
timestamp_millis
timestamp_millis(milliseconds) - Creates timestamp from the number of milliseconds since UTC epoch.
Examples:
sql
> SELECT timestamp_millis(1230219000123);
2008-12-25 07:30:00.123
Since: 3.1.0
timestamp_seconds
timestamp_seconds(seconds) - Creates timestamp from the number of seconds (can be fractional) since UTC epoch.
Examples:
sql
> SELECT timestamp_seconds(1230219000);
2008-12-25 07:30:00
> SELECT timestamp_seconds(1230219000.123);
2008-12-25 07:30:00.123
Since: 3.1.0
tinyint
tinyint(expr) - Casts the value expr to the target data type tinyint.
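An illustrative example (not taken from the official reference; output assumes default casting behavior):
sql
> SELECT tinyint('5');
5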
Since: 2.0.1
to_avro
to_avro(child[, jsonFormatSchema]) - Converts a Catalyst binary input value into its corresponding Avro format result.
Examples:
sql
> SELECT to_avro(s, '{"type": "record", "name": "struct", "fields": [{ "name": "u", "type": ["int","string"] }]}') IS NULL FROM (SELECT NULL AS s);
[true]
> SELECT to_avro(s) IS NULL FROM (SELECT NULL AS s);
[true]
Since: 4.0.0
to_binary
to_binary(str[, fmt]) - Converts the input str to a binary value based on the supplied fmt. fmt can be a case-insensitive string literal of "hex", "utf-8", "utf8", or "base64". By default, the binary format for conversion is "hex" if fmt is omitted. The function returns NULL if at least one of the input parameters is NULL.
Examples:
sql
> SELECT to_binary('abc', 'utf-8');
abc
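-- Additional illustrative call (not from the official reference; assumes 'hex' is the default format when fmt is omitted, per the description above):
> SELECT to_binary('537061726B');
Spark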
Since: 3.3.0
to_char
to_char(expr, format) - Convert expr to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
- '$': Specifies the location of the currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
- 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>').
If expr is a datetime, format shall be a valid datetime pattern, see Datetime Patterns. If expr is a binary, it is converted to a string in one of the formats:
- 'base64': a base 64 string.
- 'hex': a string in the hexadecimal format.
- 'utf-8': the input binary is decoded to UTF-8 string.
Examples:
sql
> SELECT to_char(454, '999');
454
> SELECT to_char(454.00, '000D00');
454.00
> SELECT to_char(12454, '99G999');
12,454
> SELECT to_char(78.12, '$99.99');
$78.12
> SELECT to_char(-12454.8, '99G999D9S');
12,454.8-
> SELECT to_char(date'2016-04-08', 'y');
2016
> SELECT to_char(x'537061726b2053514c', 'base64');
U3BhcmsgU1FM
> SELECT to_char(x'537061726b2053514c', 'hex');
537061726B2053514C
> SELECT to_char(encode('abc', 'utf-8'), 'utf-8');
abc
Since: 3.4.0
to_csv
to_csv(expr[, options]) - Returns a CSV string with a given struct value
Examples:
sql
> SELECT to_csv(named_struct('a', 1, 'b', 2));
1,2
> SELECT to_csv(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
26/08/2015
Since: 3.0.0
to_date
to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted.
Arguments:
- date_str - A string to be parsed to date.
- fmt - Date format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_date('2009-07-30 04:17:52');
2009-07-30
> SELECT to_date('2016-12-31', 'yyyy-MM-dd');
2016-12-31
Since: 1.5.0
to_json
to_json(expr[, options]) - Returns a JSON string with a given struct value
Examples:
sql
> SELECT to_json(named_struct('a', 1, 'b', 2));
{"a":1,"b":2}
> SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
{"time":"26/08/2015"}
> SELECT to_json(array(named_struct('a', 1, 'b', 2)));
[{"a":1,"b":2}]
> SELECT to_json(map('a', named_struct('b', 1)));
{"a":{"b":1}}
> SELECT to_json(map(named_struct('a', 1),named_struct('b', 2)));
{"[1]":{"b":2}}
> SELECT to_json(map('a', 1));
{"a":1}
> SELECT to_json(array(map('a', 1)));
[{"a":1}]
Since: 2.2.0
to_number
to_number(expr, fmt) - Convert string 'expr' to a number based on the string format 'fmt'. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input string. If the 0/9 sequence starts with 0 and is before the decimal point, it can only match a digit sequence of the same size. Otherwise, if the sequence starts with 9 or is after the decimal point, it can match a digit sequence that has the same or smaller size.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. 'expr' must match the grouping separator relevant for the size of the number.
- '$': Specifies the location of the currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not.
- 'PR': Only allowed at the end of the format string; specifies that 'expr' indicates a negative number with wrapping angled brackets. ('<1>').
Examples:
sql
> SELECT to_number('454', '999');
454
> SELECT to_number('454.00', '000.00');
454.00
> SELECT to_number('12,454', '99,999');
12454
> SELECT to_number('$78.12', '$99.99');
78.12
> SELECT to_number('12,454.8-', '99,999.9S');
-12454.8
Since: 3.3.0
to_protobuf
to_protobuf(child, messageName, descFilePath, options) - Converts a Catalyst binary input value into its corresponding Protobuf format result.
Examples:
sql
> SELECT to_protobuf(s, 'Person', '/path/to/descriptor.desc', map('emitDefaultValues', 'true')) IS NULL FROM (SELECT NULL AS s);
[true]
Since: 4.0.0
to_timestamp
to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted. The result data type is consistent with the value of configuration spark.sql.timestampType.
Arguments:
- timestamp_str - A string to be parsed to timestamp.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_timestamp('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
Since: 2.2.0
to_timestamp_ltz
to_timestamp_ltz(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp with local time zone. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted.
Arguments:
- timestamp_str - A string to be parsed to timestamp with local time zone.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_timestamp_ltz('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT to_timestamp_ltz('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
Since: 3.4.0
to_timestamp_ntz
to_timestamp_ntz(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp without time zone. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted.
Arguments:
- timestamp_str - A string to be parsed to timestamp without time zone.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_timestamp_ntz('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT to_timestamp_ntz('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
Since: 3.4.0
to_unix_timestamp
to_unix_timestamp(timeExp[, fmt]) - Returns the UNIX timestamp of the given time.
Arguments:
- timeExp - A date/timestamp or string which is returned as a UNIX timestamp.
- fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is "yyyy-MM-dd HH:mm:ss". See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd');
1460098800
Since: 1.6.0
to_utc_timestamp
to_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
Examples:
sql
> SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-30 15:00:00
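-- Illustrative call (not from the official reference) matching the 'GMT+1' case described above:
> SELECT to_utc_timestamp('2017-07-14 02:40:00.0', 'GMT+1');
2017-07-14 01:40:00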
Since: 1.5.0
to_varchar
to_varchar(expr, format) - Convert expr to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
- '$': Specifies the location of the currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
- 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>').
If expr is a datetime, format shall be a valid datetime pattern, see Datetime Patterns. If expr is a binary, it is converted to a string in one of the formats:
- 'base64': a base 64 string.
- 'hex': a string in the hexadecimal format.
- 'utf-8': the input binary is decoded to UTF-8 string.
Examples:
sql
> SELECT to_varchar(454, '999');
454
> SELECT to_varchar(454.00, '000D00');
454.00
> SELECT to_varchar(12454, '99G999');
12,454
> SELECT to_varchar(78.12, '$99.99');
$78.12
> SELECT to_varchar(-12454.8, '99G999D9S');
12,454.8-
> SELECT to_varchar(date'2016-04-08', 'y');
2016
> SELECT to_varchar(x'537061726b2053514c', 'base64');
U3BhcmsgU1FM
> SELECT to_varchar(x'537061726b2053514c', 'hex');
537061726B2053514C
> SELECT to_varchar(encode('abc', 'utf-8'), 'utf-8');
abc
Since: 3.5.0
to_variant_object
to_variant_object(expr) - Convert a nested input (array/map/struct) into a variant where maps and structs are converted to variant objects which are unordered unlike SQL structs. Input maps can only have string keys.
Examples:
sql
> SELECT to_variant_object(named_struct('a', 1, 'b', 2));
{"a":1,"b":2}
> SELECT to_variant_object(array(1, 2, 3));
[1,2,3]
> SELECT to_variant_object(array(named_struct('a', 1)));
[{"a":1}]
> SELECT to_variant_object(array(map("a", 2)));
[{"a":2}]
Since: 4.0.0
to_xml
to_xml(expr[, options]) - Returns an XML string with a given struct value
Examples:
sql
> SELECT to_xml(named_struct('a', 1, 'b', 2));
<ROW>
<a>1</a>
<b>2</b>
</ROW>
> SELECT to_xml(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
<ROW>
<time>26/08/2015</time>
</ROW>
Since: 4.0.0
transform
transform(expr, func) - Transforms elements in an array using the function.
Examples:
sql
> SELECT transform(array(1, 2, 3), x -> x + 1);
[2,3,4]
> SELECT transform(array(1, 2, 3), (x, i) -> x + i);
[1,3,5]
Since: 2.4.0
transform_keys
transform_keys(expr, func) - Transforms elements in a map using the function.
Examples:
sql
> SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
{2:1,3:2,4:3}
> SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
{2:1,4:2,6:3}
Since: 3.0.0
transform_values
transform_values(expr, func) - Transforms values in the map using the function.
Examples:
sql
> SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1);
{1:2,2:3,3:4}
> SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
{1:2,2:4,3:6}
Since: 3.0.0
translate
translate(input, from, to) - Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string.
Examples:
sql
> SELECT translate('AaBbCc', 'abc', '123');
A1B2C3
Since: 1.5.0
trim
trim(str) - Removes the leading and trailing space characters from str.
trim(BOTH FROM str) - Removes the leading and trailing space characters from str.
trim(LEADING FROM str) - Removes the leading space characters from str.
trim(TRAILING FROM str) - Removes the trailing space characters from str.
trim(trimStr FROM str) - Remove the leading and trailing trimStr characters from str.
trim(BOTH trimStr FROM str) - Remove the leading and trailing trimStr characters from str.
trim(LEADING trimStr FROM str) - Remove the leading trimStr characters from str.
trim(TRAILING trimStr FROM str) - Remove the trailing trimStr characters from str.
Arguments:
- str - a string expression
- trimStr - the trim string characters to trim, the default value is a single space
- BOTH, FROM - these are keywords to specify trimming string characters from both ends of the string
- LEADING, FROM - these are keywords to specify trimming string characters from the left end of the string
- TRAILING, FROM - these are keywords to specify trimming string characters from the right end of the string
Examples:
sql
> SELECT trim(' SparkSQL ');
SparkSQL
> SELECT trim(BOTH FROM ' SparkSQL ');
SparkSQL
> SELECT trim(LEADING FROM ' SparkSQL ');
SparkSQL
> SELECT trim(TRAILING FROM ' SparkSQL ');
SparkSQL
> SELECT trim('SL' FROM 'SSparkSQLS');
parkSQ
> SELECT trim(BOTH 'SL' FROM 'SSparkSQLS');
parkSQ
> SELECT trim(LEADING 'SL' FROM 'SSparkSQLS');
parkSQLS
> SELECT trim(TRAILING 'SL' FROM 'SSparkSQLS');
SSparkSQ
Since: 1.5.0
trunc
trunc(date, fmt) - Returns date with the time portion of the day truncated to the unit specified by the format model fmt.
Arguments:
- date - date value or valid date string
- fmt - the format representing the unit to be truncated to
- "YEAR", "YYYY", "YY" - truncate to the first date of the year that the
datefalls in - "QUARTER" - truncate to the first date of the quarter that the
datefalls in - "MONTH", "MM", "MON" - truncate to the first date of the month that the
datefalls in - "WEEK" - truncate to the Monday of the week that the
datefalls in
- "YEAR", "YYYY", "YY" - truncate to the first date of the year that the
Examples:
sql
> SELECT trunc('2019-08-04', 'week');
2019-07-29
> SELECT trunc('2019-08-04', 'quarter');
2019-07-01
> SELECT trunc('2009-02-12', 'MM');
2009-02-01
> SELECT trunc('2015-10-27', 'YEAR');
2015-01-01
Since: 1.5.0
try_add
try_add(expr1, expr2) - Returns the sum of expr1 and expr2, and the result is null on overflow. The acceptable input types are the same as the + operator.
Examples:
sql
> SELECT try_add(1, 2);
3
> SELECT try_add(2147483647, 1);
NULL
> SELECT try_add(date'2021-01-01', 1);
2021-01-02
> SELECT try_add(date'2021-01-01', interval 1 year);
2022-01-01
> SELECT try_add(timestamp'2021-01-01 00:00:00', interval 1 day);
2021-01-02 00:00:00
> SELECT try_add(interval 1 year, interval 2 year);
3-0
Since: 3.2.0
try_aes_decrypt
try_aes_decrypt(expr, key[, mode[, padding[, aad]]]) - This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed.
Examples:
sql
> SELECT try_aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
Spark SQL
> SELECT try_aes_decrypt(unhex('----------468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
NULL
Since: 3.5.0
try_avg
try_avg(expr) - Returns the mean calculated from values of a group and the result is null on overflow.
Examples:
sql
> SELECT try_avg(col) FROM VALUES (1), (2), (3) AS tab(col);
2.0
> SELECT try_avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
1.5
> SELECT try_avg(col) FROM VALUES (interval '2147483647 months'), (interval '1 months') AS tab(col);
NULL
Since: 3.3.0
try_divide
try_divide(dividend, divisor) - Returns dividend/divisor. It always performs floating point division. Its result is always null if divisor is 0. dividend must be a numeric or an interval. divisor must be a numeric.
Examples:
sql
> SELECT try_divide(3, 2);
1.5
> SELECT try_divide(2L, 2L);
1.0
> SELECT try_divide(1, 0);
NULL
> SELECT try_divide(interval 2 month, 2);
0-1
> SELECT try_divide(interval 2 month, 0);
NULL
Since: 3.2.0
try_element_at
try_element_at(array, index) - Returns the element of array at the given (1-based) index. If index is 0, Spark will throw an error. If index < 0, accesses elements from the last to the first. The function always returns NULL if the index exceeds the length of the array.
try_element_at(map, key) - Returns value for given key. The function always returns NULL if the key is not contained in the map.
Examples:
sql
> SELECT try_element_at(array(1, 2, 3), 2);
2
> SELECT try_element_at(map(1, 'a', 2, 'b'), 2);
b
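-- Additional illustrative calls (not from the official reference): an out-of-range index and a missing key both yield NULL, per the description above:
> SELECT try_element_at(array(1, 2, 3), 5);
NULL
> SELECT try_element_at(map(1, 'a', 2, 'b'), 3);
NULL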
Since: 3.3.0
try_make_interval
try_make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) - This is a special version of make_interval that performs the same operation, but returns NULL when an overflow occurs.
Arguments:
- years - the number of years, positive or negative
- months - the number of months, positive or negative
- weeks - the number of weeks, positive or negative
- days - the number of days, positive or negative
- hours - the number of hours, positive or negative
- mins - the number of minutes, positive or negative
- secs - the number of seconds with the fractional part in microsecond precision.
Examples:
sql
> SELECT try_make_interval(100, 11, 1, 1, 12, 30, 01.001001);
100 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds
> SELECT try_make_interval(100, null, 3);
NULL
> SELECT try_make_interval(0, 1, 0, 1, 0, 0, 100.000001);
1 months 1 days 1 minutes 40.000001 seconds
> SELECT try_make_interval(2147483647);
NULL
Since: 4.0.0
try_make_timestamp
try_make_timestamp(year, month, day, hour, min, sec[, timezone]) - Try to create a timestamp from year, month, day, hour, min, sec and timezone fields. The result data type is consistent with the value of configuration spark.sql.timestampType. The function returns NULL on invalid inputs.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
- hour - the hour-of-day to represent, from 0 to 23
- min - the minute-of-hour to represent, from 0 to 59
- sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. The value can be either an integer like 13, or a fraction like 13.123. If the sec argument equals to 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
- timezone - the time zone identifier. For example, CET, UTC, etc.
Examples:
sql
> SELECT try_make_timestamp(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT try_make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET');
2014-12-27 21:30:45.887
> SELECT try_make_timestamp(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT try_make_timestamp(2019, 6, 30, 23, 59, 1);
2019-06-30 23:59:01
> SELECT try_make_timestamp(null, 7, 22, 15, 30, 0);
NULL
> SELECT try_make_timestamp(2024, 13, 22, 15, 30, 0);
NULL
Since: 4.0.0
try_make_timestamp_ltz
try_make_timestamp_ltz(year, month, day, hour, min, sec[, timezone]) - Try to create the current timestamp with local time zone from year, month, day, hour, min, sec and timezone fields. The function returns NULL on invalid inputs.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
- hour - the hour-of-day to represent, from 0 to 23
- min - the minute-of-hour to represent, from 0 to 59
- sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals to 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
- timezone - the time zone identifier. For example, CET, UTC, etc.
Examples:
sql
> SELECT try_make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT try_make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887, 'CET');
2014-12-27 21:30:45.887
> SELECT try_make_timestamp_ltz(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT try_make_timestamp_ltz(null, 7, 22, 15, 30, 0);
NULL
> SELECT try_make_timestamp_ltz(2024, 13, 22, 15, 30, 0);
NULL
Since: 4.0.0
try_make_timestamp_ntz
try_make_timestamp_ntz(year, month, day, hour, min, sec) - Try to create local date-time from year, month, day, hour, min, sec fields. The function returns NULL on invalid inputs.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
- hour - the hour-of-day to represent, from 0 to 23
- min - the minute-of-hour to represent, from 0 to 59
- sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals to 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
Examples:
sql
> SELECT try_make_timestamp_ntz(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT try_make_timestamp_ntz(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT try_make_timestamp_ntz(null, 7, 22, 15, 30, 0);
NULL
> SELECT try_make_timestamp_ntz(2024, 13, 22, 15, 30, 0);
NULL
Since: 4.0.0
try_mod
try_mod(dividend, divisor) - Returns the remainder after dividend/divisor. dividend must be a numeric. divisor must be a numeric.
Examples:
sql
> SELECT try_mod(3, 2);
1
> SELECT try_mod(2L, 2L);
0
> SELECT try_mod(3.0, 2.0);
1.0
> SELECT try_mod(1, 0);
NULL
Since: 4.0.0
try_multiply
try_multiply(expr1, expr2) - Returns expr1*expr2 and the result is null on overflow. The acceptable input types are the same as the * operator.
Examples:
sql
> SELECT try_multiply(2, 3);
6
> SELECT try_multiply(-2147483648, 10);
NULL
> SELECT try_multiply(interval 2 year, 3);
6-0
Since: 3.3.0
try_parse_json
try_parse_json(jsonStr) - Parses a JSON string as a Variant value. Returns NULL when the string is not a valid JSON value.
Examples:
sql
> SELECT try_parse_json('{"a":1,"b":0.8}');
{"a":1,"b":0.8}
> SELECT try_parse_json('{"a":1,');
NULL
Since: 4.0.0
try_parse_url
try_parse_url(url, partToExtract[, key]) - This is a special version of parse_url that performs the same operation, but returns a NULL value instead of raising an error if the parsing cannot be performed.
Examples:
sql
> SELECT try_parse_url('http://spark.apache.org/path?query=1', 'HOST');
spark.apache.org
> SELECT try_parse_url('http://spark.apache.org/path?query=1', 'QUERY');
query=1
> SELECT try_parse_url('inva lid://spark.apache.org/path?query=1', 'QUERY');
NULL
> SELECT try_parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query');
1
Since: 4.0.0
try_reflect
try_reflect(class, method[, arg1[, arg2 ...]]) - This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoked method throws an exception.
Examples:
sql
> SELECT try_reflect('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT try_reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
> SELECT try_reflect('java.net.URLDecoder', 'decode', '%');
NULL
Since: 4.0.0
try_subtract
try_subtract(expr1, expr2) - Returns expr1-expr2 and the result is null on overflow. The acceptable input types are the same as the - operator.
Examples:
sql
> SELECT try_subtract(2, 1);
1
> SELECT try_subtract(-2147483648, 1);
NULL
> SELECT try_subtract(date'2021-01-02', 1);
2021-01-01
> SELECT try_subtract(date'2021-01-01', interval 1 year);
2020-01-01
> SELECT try_subtract(timestamp'2021-01-02 00:00:00', interval 1 day);
2021-01-01 00:00:00
> SELECT try_subtract(interval 2 year, interval 1 year);
1-0
Since: 3.3.0
try_sum
try_sum(expr) - Returns the sum calculated from values of a group and the result is null on overflow.
Examples:
sql
> SELECT try_sum(col) FROM VALUES (5), (10), (15) AS tab(col);
30
> SELECT try_sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
25
> SELECT try_sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
NULL
> SELECT try_sum(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col);
NULL
Since: 3.3.0
try_to_binary
try_to_binary(str[, fmt]) - This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.
Examples:
sql
> SELECT try_to_binary('abc', 'utf-8');
abc
> select try_to_binary('a!', 'base64');
NULL
> select try_to_binary('abc', 'invalidFormat');
NULL
Since: 3.3.0
try_to_number
try_to_number(expr, fmt) - Convert string 'expr' to a number based on the string format fmt. Returns NULL if the string 'expr' does not match the expected format. The format follows the same semantics as the to_number function.
Examples:
sql
> SELECT try_to_number('454', '999');
454
> SELECT try_to_number('454.00', '000.00');
454.00
> SELECT try_to_number('12,454', '99,999');
12454
> SELECT try_to_number('$78.12', '$99.99');
78.12
> SELECT try_to_number('12,454.8-', '99,999.9S');
-12454.8
Since: 3.3.0
try_to_timestamp
try_to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. The function always returns null on an invalid input with/without ANSI SQL mode enabled. By default, it follows casting rules to a timestamp if the fmt is omitted. The result data type is consistent with the value of configuration spark.sql.timestampType.
Arguments:
- timestamp_str - A string to be parsed to timestamp.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT try_to_timestamp('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT try_to_timestamp('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
> SELECT try_to_timestamp('foo', 'yyyy-MM-dd');
NULL
Since: 3.4.0
try_url_decode
try_url_decode(str) - This is a special version of url_decode that performs the same operation, but returns a NULL value instead of raising an error if the decoding cannot be performed.
Arguments:
- str - a string expression to decode
Examples:
sql
> SELECT try_url_decode('https%3A%2F%2Fspark.apache.org');
https://spark.apache.org
Since: 4.0.0
try_validate_utf8
try_validate_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise returns NULL.
Arguments:
- str - a string expression
Examples:
sql
> SELECT try_validate_utf8('Spark');
Spark
> SELECT try_validate_utf8(x'61');
a
> SELECT try_validate_utf8(x'80');
NULL
> SELECT try_validate_utf8(x'61C262');
NULL
Since: 4.0.0
try_variant_get
try_variant_get(v, path[, type]) - Extracts a sub-variant from v according to path, and then casts the sub-variant to type. When type is omitted, it defaults to variant. Returns null if the path does not exist or the cast fails.
Examples:
sql
> SELECT try_variant_get(parse_json('{"a": 1}'), '$.a', 'int');
1
> SELECT try_variant_get(parse_json('{"a": 1}'), '$.b', 'int');
NULL
> SELECT try_variant_get(parse_json('[1, "2"]'), '$[1]', 'string');
2
> SELECT try_variant_get(parse_json('[1, "2"]'), '$[2]', 'string');
NULL
> SELECT try_variant_get(parse_json('[1, "hello"]'), '$[1]');
"hello"
> SELECT try_variant_get(parse_json('[1, "hello"]'), '$[1]', 'int');
NULL
Since: 4.0.0
typeof
typeof(expr) - Returns the DDL-formatted type string for the data type of the input.
Examples:
sql
> SELECT typeof(1);
int
> SELECT typeof(array(1));
array<int>
Since: 3.0.0