| No. | Type | Link |
|---|---|---|
| 1 | Spark Functions | 1. Spark Functions_Symbols |
| 2 | Spark Functions | 2. Spark Functions_a/b/c |
| 3 | Spark Functions | 3. Spark Functions_d/e/f/g/h/i/j/k/l |
| 4 | Spark Functions | 4. Spark Functions_m/n/o/p/q/r |
| 5 | Spark Functions | 5. Spark Functions_s/t |
| 6 | Spark Functions | 6. Spark Functions_u/v/w/x/y/z |
Table of Contents
- 19. S
- schema_of_avro
- schema_of_csv
- schema_of_json
- schema_of_variant
- schema_of_variant_agg
- schema_of_xml
- sec
- second
- sentences
- sequence
- session_user
- session_window
- sha
- sha1
- sha2
- shiftleft
- shiftright
- shiftrightunsigned
- shuffle
- sign
- signum
- sin
- sinh
- size
- skewness
- slice
- smallint
- some
- sort_array
- soundex
- space
- spark_partition_id
- split
- split_part
- sql_keywords
- sqrt
- stack
- startswith
- std
- stddev
- stddev_pop
- stddev_samp
- str_to_map
- string
- string_agg
- struct
- substr
- substring
- substring_index
- sum
- 20. T
- tan
- tanh
- timestamp
- timestamp_micros
- timestamp_millis
- timestamp_seconds
- tinyint
- to_avro
- to_binary
- to_char
- to_csv
- to_date
- to_json
- to_number
- to_protobuf
- to_timestamp
- to_timestamp_ltz
- to_timestamp_ntz
- to_unix_timestamp
- to_utc_timestamp
- to_varchar
- to_variant_object
- to_xml
- transform
- transform_keys
- transform_values
- translate
- trim
- trunc
- try_add
- try_aes_decrypt
- try_avg
- try_divide
- try_element_at
- try_make_interval
- try_make_timestamp
- try_make_timestamp_ltz
- try_make_timestamp_ntz
- try_mod
- try_multiply
- try_parse_json
- try_parse_url
- try_reflect
- try_subtract
- try_sum
- try_to_binary
- try_to_number
- try_to_timestamp
- try_url_decode
- try_validate_utf8
- try_variant_get
- typeof
19. S
schema_of_avro
schema_of_avro(jsonFormatSchema, options) - Returns schema in the DDL format of the avro schema in JSON string format.
Examples:
sql
> SELECT schema_of_avro('{"type": "record", "name": "struct", "fields": [{"name": "u", "type": ["int", "string"]}]}', map());
STRUCT<u: STRUCT<member0: INT, member1: STRING> NOT NULL>
Since: 4.0.0
schema_of_csv
schema_of_csv(csv[, options]) - Returns schema in the DDL format of CSV string.
Examples:
sql
> SELECT schema_of_csv('1,abc');
STRUCT<_c0: INT, _c1: STRING>
Since: 3.0.0
schema_of_json
schema_of_json(json[, options]) - Returns schema in the DDL format of JSON string.
Examples:
sql
> SELECT schema_of_json('[{"col":0}]');
ARRAY<STRUCT<col: BIGINT>>
> SELECT schema_of_json('[{"col":01}]', map('allowNumericLeadingZeros', 'true'));
ARRAY<STRUCT<col: BIGINT>>
Since: 2.4.0
schema_of_variant
schema_of_variant(v) - Returns schema in the SQL format of a variant.
Examples:
sql
> SELECT schema_of_variant(parse_json('null'));
VOID
> SELECT schema_of_variant(parse_json('[{"b":true,"a":0}]'));
ARRAY<OBJECT<a: BIGINT, b: BOOLEAN>>
Since: 4.0.0
schema_of_variant_agg
schema_of_variant_agg(v) - Returns the merged schema in the SQL format of a variant column.
Examples:
sql
> SELECT schema_of_variant_agg(parse_json(j)) FROM VALUES ('1'), ('2'), ('3') AS tab(j);
BIGINT
> SELECT schema_of_variant_agg(parse_json(j)) FROM VALUES ('{"a": 1}'), ('{"b": true}'), ('{"c": 1.23}') AS tab(j);
OBJECT<a: BIGINT, b: BOOLEAN, c: DECIMAL(3,2)>
Since: 4.0.0
schema_of_xml
schema_of_xml(xml[, options]) - Returns schema in the DDL format of XML string.
Examples:
sql
> SELECT schema_of_xml('<p><a>1</a></p>');
STRUCT<a: BIGINT>
> SELECT schema_of_xml('<p><a attr="2">1</a><a>3</a></p>', map('excludeAttribute', 'true'));
STRUCT<a: ARRAY<BIGINT>>
Since: 4.0.0
sec
sec(expr) - Returns the secant of expr, as if computed by 1/java.lang.Math.cos.
Arguments:
- expr - angle in radians
Examples:
sql
> SELECT sec(0);
1.0
Since: 3.3.0
second
second(timestamp) - Returns the second component of the string/timestamp.
Examples:
sql
> SELECT second('2009-07-30 12:58:59');
59
Since: 1.5.0
sentences
sentences(str[, lang[, country]]) - Splits str into an array of arrays of words.
Arguments:
- str - A STRING expression to be parsed.
- lang - An optional STRING expression with a language code from ISO 639 Alpha-2 (e.g. 'DE'), Alpha-3, or a language subtag of up to 8 characters.
- country - An optional STRING expression with a country code from ISO 3166 alpha-2 country code or a UN M.49 numeric-3 area code.
Examples:
sql
> SELECT sentences('Hi there! Good morning.');
[["Hi","there"],["Good","morning"]]
> SELECT sentences('Hi there! Good morning.', 'en');
[["Hi","there"],["Good","morning"]]
> SELECT sentences('Hi there! Good morning.', 'en', 'US');
[["Hi","there"],["Good","morning"]]
Since: 2.0.0
sequence
sequence(start, stop, step) - Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.
Supported types are: byte, short, integer, long, date, timestamp.
The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the 'date' or 'timestamp' type then the step expression must resolve to the 'interval' or 'year-month interval' or 'day-time interval' type, otherwise to the same type as the start and stop expressions.
Arguments:
- start - an expression. The start of the range.
- stop - an expression. The end of the range (inclusive).
- step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it's 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa.
Examples:
sql
> SELECT sequence(1, 5);
[1,2,3,4,5]
> SELECT sequence(5, 1);
[5,4,3,2,1]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month);
[2018-01-01,2018-02-01,2018-03-01]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval '0-1' year to month);
[2018-01-01,2018-02-01,2018-03-01]
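-- Additional illustrative call (not from the official reference), with an explicit numeric step:
> SELECT sequence(1, 9, 2);
[1,3,5,7,9]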
Since: 2.4.0
session_user
session_user() - user name of current execution context.
Examples:
sql
> SELECT session_user();
mockingjay
Since: 4.0.0
session_window
session_window(time_column, gap_duration) - Generates session window given a timestamp specifying column and gap duration. See 'Types of time windows' in Structured Streaming guide doc for detailed explanation and examples.
Arguments:
- time_column - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.
- gap_duration - A string specifying the timeout of the session represented as "interval value" (See Interval Literal for more details.) for the fixed gap duration, or an expression which is applied for each input and evaluated to the "interval value" for the dynamic gap duration.
Examples:
sql
> SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, session_window(b, '5 minutes') ORDER BY a, start;
A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2
A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1
A2 2021-01-01 00:01:00 2021-01-01 00:06:00 1
> SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00'), ('A2', '2021-01-01 00:04:30') AS tab(a, b) GROUP by a, session_window(b, CASE WHEN a = 'A1' THEN '5 minutes' WHEN a = 'A2' THEN '1 minute' ELSE '10 minutes' END) ORDER BY a, start;
A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2
A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1
A2 2021-01-01 00:01:00 2021-01-01 00:02:00 1
A2 2021-01-01 00:04:30 2021-01-01 00:05:30 1
Since: 3.2.0
sha
sha(expr) - Returns a sha1 hash value as a hex string of the expr.
Examples:
sql
> SELECT sha('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
Since: 1.5.0
sha1
sha1(expr) - Returns a sha1 hash value as a hex string of the expr.
Examples:
sql
> SELECT sha1('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
Since: 1.5.0
sha2
sha2(expr, bitLength) - Returns a checksum of SHA-2 family as a hex string of expr. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.
Examples:
sql
> SELECT sha2('Spark', 256);
529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b
Since: 1.5.0
shiftleft
shiftleft(base, exp) - Bitwise left shift.
Examples:
sql
> SELECT shiftleft(2, 1);
4
> SELECT 2 << 1;
4
Note:
<< operator is added in Spark 4.0.0 as an alias for shiftleft.
Since: 1.5.0
shiftright
shiftright(base, exp) - Bitwise (signed) right shift.
Examples:
sql
> SELECT shiftright(4, 1);
2
> SELECT 4 >> 1;
2
Note:
>> operator is added in Spark 4.0.0 as an alias for shiftright.
Since: 1.5.0
shiftrightunsigned
shiftrightunsigned(base, exp) - Bitwise unsigned right shift.
Examples:
sql
> SELECT shiftrightunsigned(4, 1);
2
> SELECT 4 >>> 1;
2
Note:
>>> operator is added in Spark 4.0.0 as an alias for shiftrightunsigned.
Since: 1.5.0
shuffle
shuffle(array) - Returns a random permutation of the given array.
Examples:
sql
> SELECT shuffle(array(1, 20, 3, 5));
[3,1,5,20]
> SELECT shuffle(array(1, 20, null, 3));
[20,null,3,1]
Note:
The function is non-deterministic.
Since: 2.4.0
sign
sign(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive.
Examples:
sql
> SELECT sign(40);
1.0
> SELECT sign(INTERVAL -'100' YEAR);
-1.0
Since: 1.4.0
signum
signum(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive.
Examples:
sql
> SELECT signum(40);
1.0
> SELECT signum(INTERVAL -'100' YEAR);
-1.0
Since: 1.4.0
sin
sin(expr) - Returns the sine of expr, as if computed by java.lang.Math.sin.
Arguments:
- expr - angle in radians
Examples:
sql
> SELECT sin(0);
0.0
Since: 1.4.0
sinh
sinh(expr) - Returns hyperbolic sine of expr, as if computed by java.lang.Math.sinh.
Arguments:
- expr - hyperbolic angle
Examples:
sql
> SELECT sinh(0);
0.0
Since: 1.4.0
size
size(expr) - Returns the size of an array or a map. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input.
Examples:
sql
> SELECT size(array('b', 'd', 'c', 'a'));
4
> SELECT size(map('a', 1, 'b', 2));
2
Since: 1.5.0
skewness
skewness(expr) - Returns the skewness value calculated from values of a group.
Examples:
sql
> SELECT skewness(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col);
1.1135657469022011
> SELECT skewness(col) FROM VALUES (-1000), (-100), (10), (20) AS tab(col);
-1.1135657469022011
Since: 1.6.0
slice
slice(x, start, length) - Subsets array x starting from index start (array indices start at 1, or starting from the end if start is negative) with the specified length.
Examples:
sql
> SELECT slice(array(1, 2, 3, 4), 2, 2);
[2,3]
> SELECT slice(array(1, 2, 3, 4), -2, 2);
[3,4]
Since: 2.4.0
smallint
smallint(expr) - Casts the value expr to the target data type smallint.
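An illustrative example (not taken from the official reference; output assumes default casting behavior):
sql
> SELECT smallint('100');
100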
Since: 2.0.1
some
some(expr) - Returns true if at least one value of expr is true.
Examples:
sql
> SELECT some(col) FROM VALUES (true), (false), (false) AS tab(col);
true
> SELECT some(col) FROM VALUES (NULL), (true), (false) AS tab(col);
true
> SELECT some(col) FROM VALUES (false), (false), (NULL) AS tab(col);
false
Since: 3.0.0
sort_array
sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
Examples:
sql
> SELECT sort_array(array('b', 'd', null, 'c', 'a'), true);
[null,"a","b","c","d"]
> SELECT sort_array(array('b', 'd', null, 'c', 'a'), false);
["d","c","b","a",null]
Since: 1.5.0
soundex
soundex(str) - Returns Soundex code of the string.
Examples:
sql
> SELECT soundex('Miller');
M460
Since: 1.5.0
space
space(n) - Returns a string consisting of n spaces.
Examples:
sql
> SELECT concat(space(2), '1');
1
Since: 1.5.0
spark_partition_id
spark_partition_id() - Returns the current partition id.
Examples:
sql
> SELECT spark_partition_id();
0
Since: 1.4.0
split
split(str, regex, limit) - Splits str around occurrences that match regex and returns an array with a length of at most limit
Arguments:
- str - a string expression to split.
- regex - a string representing a regular expression. The regex string should be a Java regular expression.
- limit - an integer expression which controls the number of times the regex is applied.
  - limit > 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.
  - limit <= 0: regex will be applied as many times as possible, and the resulting array can be of any size.
Examples:
sql
> SELECT split('oneAtwoBthreeC', '[ABC]');
["one","two","three",""]
> SELECT split('oneAtwoBthreeC', '[ABC]', -1);
["one","two","three",""]
> SELECT split('oneAtwoBthreeC', '[ABC]', 2);
["one","twoBthreeC"]
Since: 1.5.0
split_part
split_part(str, delimiter, partNum) - Splits str by delimiter and returns the requested part of the split (1-based). If any input is null, returns null. If partNum is out of range of split parts, returns an empty string. If partNum is 0, throws an error. If partNum is negative, the parts are counted backward from the end of the string. If the delimiter is an empty string, the str is not split.
Examples:
sql
> SELECT split_part('11.12.13', '.', 3);
13
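-- Additional illustrative calls (not from the official reference), showing the negative partNum behavior described above:
> SELECT split_part('11.12.13', '.', -1);
13
> SELECT split_part('11.12.13', '.', -2);
12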
Since: 3.3.0
sql_keywords
sql_keywords() - Get Spark SQL keywords
Examples:
sql
> SELECT * FROM sql_keywords() LIMIT 2;
ADD false
AFTER false
Since: 3.5.0
sqrt
sqrt(expr) - Returns the square root of expr.
Examples:
sql
> SELECT sqrt(4);
2.0
Since: 1.1.1
stack
stack(n, expr1, ..., exprk) - Separates expr1, ..., exprk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise.
Examples:
sql
> SELECT stack(2, 1, 2, 3);
1 2
3 NULL
Since: 2.0.0
startswith
startswith(left, right) - Returns a boolean. The value is True if left starts with right. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type.
Examples:
sql
> SELECT startswith('Spark SQL', 'Spark');
true
> SELECT startswith('Spark SQL', 'SQL');
false
> SELECT startswith('Spark SQL', null);
NULL
> SELECT startswith(x'537061726b2053514c', x'537061726b');
true
> SELECT startswith(x'537061726b2053514c', x'53514c');
false
Since: 3.3.0
std
std(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
sql
> SELECT std(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
stddev
stddev(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
sql
> SELECT stddev(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
stddev_pop
stddev_pop(expr) - Returns the population standard deviation calculated from values of a group.
Examples:
sql
> SELECT stddev_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
0.816496580927726
Since: 1.6.0
stddev_samp
stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
sql
> SELECT stddev_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
str_to_map
str_to_map(text[, pairDelim[, keyValueDelim]]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim and ':' for keyValueDelim. Both pairDelim and keyValueDelim are treated as regular expressions.
Examples:
sql
> SELECT str_to_map('a:1,b:2,c:3', ',', ':');
{"a":"1","b":"2","c":"3"}
> SELECT str_to_map('a');
{"a":null}
Since: 2.0.1
string
string(expr) - Casts the value expr to the target data type string.
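An illustrative example (not taken from the official reference; output assumes default casting behavior):
sql
> SELECT string(1);
1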
Since: 2.0.1
string_agg
string_agg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter ordered by key. If all values are NULL, NULL is returned.
Arguments:
- expr - a string or binary expression to be concatenated.
- delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL.
- key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic.
Examples:
sql
> SELECT string_agg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
abc
> SELECT string_agg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
cba
> SELECT string_agg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col);
ab
> SELECT string_agg(col) FROM VALUES ('a'), ('a') AS tab(col);
aa
> SELECT string_agg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
ab
> SELECT string_agg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col);
a, b, c
> SELECT string_agg(col) FROM VALUES (NULL), (NULL) AS tab(col);
NULL
Note:
- If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle.
- If DISTINCT is specified, then expr and key must be the same expression.
Since: 4.0.0
struct
struct(col1, col2, col3, ...) - Creates a struct with the given field values.
Examples:
sql
> SELECT struct(1, 2, 3);
{"col1":1,"col2":2,"col3":3}
Since: 1.4.0
substr
substr(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
substr(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
Examples:
sql
> SELECT substr('Spark SQL', 5);
k SQL
> SELECT substr('Spark SQL', -3);
SQL
> SELECT substr('Spark SQL', 5, 1);
k
> SELECT substr('Spark SQL' FROM 5);
k SQL
> SELECT substr('Spark SQL' FROM -3);
SQL
> SELECT substr('Spark SQL' FROM 5 FOR 1);
k
> SELECT substr(encode('Spark SQL', 'utf-8'), 5);
k SQL
Since: 1.5.0
substring
substring(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
substring(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
Examples:
sql
> SELECT substring('Spark SQL', 5);
k SQL
> SELECT substring('Spark SQL', -3);
SQL
> SELECT substring('Spark SQL', 5, 1);
k
> SELECT substring('Spark SQL' FROM 5);
k SQL
> SELECT substring('Spark SQL' FROM -3);
SQL
> SELECT substring('Spark SQL' FROM 5 FOR 1);
k
> SELECT substring(encode('Spark SQL', 'utf-8'), 5);
k SQL
Since: 1.5.0
substring_index
substring_index(str, delim, count) - Returns the substring from str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delim.
Examples:
sql
> SELECT substring_index('www.apache.org', '.', 2);
www.apache
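-- Additional illustrative call (not from the official reference), showing the negative count behavior described above:
> SELECT substring_index('www.apache.org', '.', -2);
apache.org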
Since: 1.5.0
sum
sum(expr) - Returns the sum calculated from values of a group.
Examples:
sql
> SELECT sum(col) FROM VALUES (5), (10), (15) AS tab(col);
30
> SELECT sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
25
> SELECT sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
NULL
Since: 1.0.0
20. T
tan
tan(expr) - Returns the tangent of expr, as if computed by java.lang.Math.tan.
Arguments:
- expr - angle in radians
Examples:
sql
> SELECT tan(0);
0.0
Since: 1.4.0
tanh
tanh(expr) - Returns the hyperbolic tangent of expr, as if computed by java.lang.Math.tanh.
Arguments:
- expr - hyperbolic angle
Examples:
sql
> SELECT tanh(0);
0.0
Since: 1.4.0
timestamp
timestamp(expr) - Casts the value expr to the target data type timestamp.
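An illustrative example (not taken from the official reference; output assumes default casting behavior):
sql
> SELECT timestamp('2020-04-30 12:25:13.45');
2020-04-30 12:25:13.45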
Since: 2.0.1
timestamp_micros
timestamp_micros(microseconds) - Creates timestamp from the number of microseconds since UTC epoch.
Examples:
sql
> SELECT timestamp_micros(1230219000123123);
2008-12-25 07:30:00.123123
Since: 3.1.0
timestamp_millis
timestamp_millis(milliseconds) - Creates timestamp from the number of milliseconds since UTC epoch.
Examples:
sql
> SELECT timestamp_millis(1230219000123);
2008-12-25 07:30:00.123
Since: 3.1.0
timestamp_seconds
timestamp_seconds(seconds) - Creates timestamp from the number of seconds (can be fractional) since UTC epoch.
Examples:
sql
> SELECT timestamp_seconds(1230219000);
2008-12-25 07:30:00
> SELECT timestamp_seconds(1230219000.123);
2008-12-25 07:30:00.123
Since: 3.1.0
tinyint
tinyint(expr) - Casts the value expr to the target data type tinyint.
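An illustrative example (not taken from the official reference; output assumes default casting behavior):
sql
> SELECT tinyint('5');
5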
Since: 2.0.1
to_avro
to_avro(child[, jsonFormatSchema]) - Converts a Catalyst binary input value into its corresponding Avro format result.
Examples:
sql
> SELECT to_avro(s, '{"type": "record", "name": "struct", "fields": [{ "name": "u", "type": ["int","string"] }]}') IS NULL FROM (SELECT NULL AS s);
[true]
> SELECT to_avro(s) IS NULL FROM (SELECT NULL AS s);
[true]
Since: 4.0.0
to_binary
to_binary(str[, fmt]) - Converts the input str to a binary value based on the supplied fmt. fmt can be a case-insensitive string literal of "hex", "utf-8", "utf8", or "base64". By default, the binary format for conversion is "hex" if fmt is omitted. The function returns NULL if at least one of the input parameters is NULL.
Examples:
sql
> SELECT to_binary('abc', 'utf-8');
abc
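-- Additional illustrative call (not from the official reference; assumes 'hex' is the default format when fmt is omitted, per the description above):
> SELECT to_binary('537061726B');
Spark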
Since: 3.3.0
to_char
to_char(expr, format) - Convert expr to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
- '$': Specifies the location of the currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
- 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>').
If expr is a datetime, format shall be a valid datetime pattern, see Datetime Patterns. If expr is a binary, it is converted to a string in one of the formats:
- 'base64': a base 64 string.
- 'hex': a string in the hexadecimal format.
- 'utf-8': the input binary is decoded to UTF-8 string.
Examples:
sql
> SELECT to_char(454, '999');
454
> SELECT to_char(454.00, '000D00');
454.00
> SELECT to_char(12454, '99G999');
12,454
> SELECT to_char(78.12, '$99.99');
$78.12
> SELECT to_char(-12454.8, '99G999D9S');
12,454.8-
> SELECT to_char(date'2016-04-08', 'y');
2016
> SELECT to_char(x'537061726b2053514c', 'base64');
U3BhcmsgU1FM
> SELECT to_char(x'537061726b2053514c', 'hex');
537061726B2053514C
> SELECT to_char(encode('abc', 'utf-8'), 'utf-8');
abc
Since: 3.4.0
to_csv
to_csv(expr[, options]) - Returns a CSV string with a given struct value
Examples:
sql
> SELECT to_csv(named_struct('a', 1, 'b', 2));
1,2
> SELECT to_csv(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
26/08/2015
Since: 3.0.0
to_date
to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted.
Arguments:
- date_str - A string to be parsed to date.
- fmt - Date format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_date('2009-07-30 04:17:52');
2009-07-30
> SELECT to_date('2016-12-31', 'yyyy-MM-dd');
2016-12-31
Since: 1.5.0
to_json
to_json(expr[, options]) - Returns a JSON string with a given struct value
Examples:
sql
> SELECT to_json(named_struct('a', 1, 'b', 2));
{"a":1,"b":2}
> SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
{"time":"26/08/2015"}
> SELECT to_json(array(named_struct('a', 1, 'b', 2)));
[{"a":1,"b":2}]
> SELECT to_json(map('a', named_struct('b', 1)));
{"a":{"b":1}}
> SELECT to_json(map(named_struct('a', 1),named_struct('b', 2)));
{"[1]":{"b":2}}
> SELECT to_json(map('a', 1));
{"a":1}
> SELECT to_json(array(map('a', 1)));
[{"a":1}]
Since: 2.2.0
to_number
to_number(expr, fmt) - Convert string 'expr' to a number based on the string format 'fmt'. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input string. If the 0/9 sequence starts with 0 and is before the decimal point, it can only match a digit sequence of the same size. Otherwise, if the sequence starts with 9 or is after the decimal point, it can match a digit sequence that has the same or smaller size.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. 'expr' must match the grouping separator relevant for the size of the number.
- '$': Specifies the location of the currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not.
- 'PR': Only allowed at the end of the format string; specifies that 'expr' indicates a negative number with wrapping angled brackets. ('<1>').
Examples:
sql
> SELECT to_number('454', '999');
454
> SELECT to_number('454.00', '000.00');
454.00
> SELECT to_number('12,454', '99,999');
12454
> SELECT to_number('$78.12', '$99.99');
78.12
> SELECT to_number('12,454.8-', '99,999.9S');
-12454.8
Since: 3.3.0
to_protobuf
to_protobuf(child, messageName, descFilePath, options) - Converts a Catalyst binary input value into its corresponding Protobuf format result.
Examples:
sql
> SELECT to_protobuf(s, 'Person', '/path/to/descriptor.desc', map('emitDefaultValues', 'true')) IS NULL FROM (SELECT NULL AS s);
[true]
Since: 4.0.0
to_timestamp
to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted. The result data type is consistent with the value of configuration spark.sql.timestampType.
Arguments:
- timestamp_str - A string to be parsed to timestamp.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_timestamp('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
Since: 2.2.0
to_timestamp_ltz
to_timestamp_ltz(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp with local time zone. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted.
Arguments:
- timestamp_str - A string to be parsed to timestamp with local time zone.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_timestamp_ltz('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT to_timestamp_ltz('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
Since: 3.4.0
to_timestamp_ntz
to_timestamp_ntz(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp without time zone. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted.
Arguments:
- timestamp_str - A string to be parsed to timestamp without time zone.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_timestamp_ntz('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT to_timestamp_ntz('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
Since: 3.4.0
to_unix_timestamp
to_unix_timestamp(timeExp[, fmt]) - Returns the UNIX timestamp of the given time.
Arguments:
- timeExp - A date/timestamp or string which is returned as a UNIX timestamp.
- fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is "yyyy-MM-dd HH:mm:ss". See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd');
1460098800
Since: 1.6.0
to_utc_timestamp
to_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
Examples:
sql
> SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-30 15:00:00
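-- Illustrative call (not from the official reference) matching the 'GMT+1' case described above:
> SELECT to_utc_timestamp('2017-07-14 02:40:00.0', 'GMT+1');
2017-07-14 01:40:00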
Since: 1.5.0
to_varchar
to_varchar(expr, format) - Convert expr to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
- '$': Specifies the location of the currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
- 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>').
If expr is a datetime, format shall be a valid datetime pattern, see Datetime Patterns. If expr is a binary, it is converted to a string in one of the formats:
- 'base64': a base 64 string.
- 'hex': a string in the hexadecimal format.
- 'utf-8': the input binary is decoded to UTF-8 string.
Examples:
sql
> SELECT to_varchar(454, '999');
454
> SELECT to_varchar(454.00, '000D00');
454.00
> SELECT to_varchar(12454, '99G999');
12,454
> SELECT to_varchar(78.12, '$99.99');
$78.12
> SELECT to_varchar(-12454.8, '99G999D9S');
12,454.8-
> SELECT to_varchar(date'2016-04-08', 'y');
2016
> SELECT to_varchar(x'537061726b2053514c', 'base64');
U3BhcmsgU1FM
> SELECT to_varchar(x'537061726b2053514c', 'hex');
537061726B2053514C
> SELECT to_varchar(encode('abc', 'utf-8'), 'utf-8');
abc
Since: 3.5.0
to_variant_object
to_variant_object(expr) - Convert a nested input (array/map/struct) into a variant where maps and structs are converted to variant objects which are unordered unlike SQL structs. Input maps can only have string keys.
Examples:
sql
> SELECT to_variant_object(named_struct('a', 1, 'b', 2));
{"a":1,"b":2}
> SELECT to_variant_object(array(1, 2, 3));
[1,2,3]
> SELECT to_variant_object(array(named_struct('a', 1)));
[{"a":1}]
> SELECT to_variant_object(array(map("a", 2)));
[{"a":2}]
Since: 4.0.0
to_xml
to_xml(expr[, options]) - Returns an XML string with a given struct value
Examples:
sql
> SELECT to_xml(named_struct('a', 1, 'b', 2));
<ROW>
<a>1</a>
<b>2</b>
</ROW>
> SELECT to_xml(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
<ROW>
<time>26/08/2015</time>
</ROW>
Since: 4.0.0
transform
transform(expr, func) - Transforms elements in an array using the function.
Examples:
sql
> SELECT transform(array(1, 2, 3), x -> x + 1);
[2,3,4]
> SELECT transform(array(1, 2, 3), (x, i) -> x + i);
[1,3,5]
Since: 2.4.0
transform_keys
transform_keys(expr, func) - Transforms elements in a map using the function.
Examples:
sql
> SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
{2:1,3:2,4:3}
> SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
{2:1,4:2,6:3}
Since: 3.0.0
transform_values
transform_values(expr, func) - Transforms values in the map using the function.
Examples:
sql
> SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1);
{1:2,2:3,3:4}
> SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
{1:2,2:4,3:6}
Since: 3.0.0
translate
translate(input, from, to) - Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string.
Examples:
sql
> SELECT translate('AaBbCc', 'abc', '123');
A1B2C3
Since: 1.5.0
trim
trim(str) - Removes the leading and trailing space characters from str.
trim(BOTH FROM str) - Removes the leading and trailing space characters from str.
trim(LEADING FROM str) - Removes the leading space characters from str.
trim(TRAILING FROM str) - Removes the trailing space characters from str.
trim(trimStr FROM str) - Remove the leading and trailing trimStr characters from str.
trim(BOTH trimStr FROM str) - Remove the leading and trailing trimStr characters from str.
trim(LEADING trimStr FROM str) - Remove the leading trimStr characters from str.
trim(TRAILING trimStr FROM str) - Remove the trailing trimStr characters from str.
Arguments:
- str - a string expression
- trimStr - the trim string characters to trim, the default value is a single space
- BOTH, FROM - these are keywords to specify trimming string characters from both ends of the string
- LEADING, FROM - these are keywords to specify trimming string characters from the left end of the string
- TRAILING, FROM - these are keywords to specify trimming string characters from the right end of the string
Examples:
sql
> SELECT trim(' SparkSQL ');
SparkSQL
> SELECT trim(BOTH FROM ' SparkSQL ');
SparkSQL
> SELECT trim(LEADING FROM ' SparkSQL ');
SparkSQL
> SELECT trim(TRAILING FROM ' SparkSQL ');
SparkSQL
> SELECT trim('SL' FROM 'SSparkSQLS');
parkSQ
> SELECT trim(BOTH 'SL' FROM 'SSparkSQLS');
parkSQ
> SELECT trim(LEADING 'SL' FROM 'SSparkSQLS');
parkSQLS
> SELECT trim(TRAILING 'SL' FROM 'SSparkSQLS');
SSparkSQ
Since: 1.5.0
trunc
trunc(date, fmt) - Returns date with the time portion of the day truncated to the unit specified by the format model fmt.
Arguments:
- date - date value or valid date string
- fmt - the format representing the unit to be truncated to
- "YEAR", "YYYY", "YY" - truncate to the first date of the year that the
datefalls in - "QUARTER" - truncate to the first date of the quarter that the
datefalls in - "MONTH", "MM", "MON" - truncate to the first date of the month that the
datefalls in - "WEEK" - truncate to the Monday of the week that the
datefalls in
- "YEAR", "YYYY", "YY" - truncate to the first date of the year that the
Examples:
sql
> SELECT trunc('2019-08-04', 'week');
2019-07-29
> SELECT trunc('2019-08-04', 'quarter');
2019-07-01
> SELECT trunc('2009-02-12', 'MM');
2009-02-01
> SELECT trunc('2015-10-27', 'YEAR');
2015-01-01
Since: 1.5.0
try_add
try_add(expr1, expr2) - Returns the sum of expr1 and expr2, and the result is null on overflow. The acceptable input types are the same as the + operator.
Examples:
sql
> SELECT try_add(1, 2);
3
> SELECT try_add(2147483647, 1);
NULL
> SELECT try_add(date'2021-01-01', 1);
2021-01-02
> SELECT try_add(date'2021-01-01', interval 1 year);
2022-01-01
> SELECT try_add(timestamp'2021-01-01 00:00:00', interval 1 day);
2021-01-02 00:00:00
> SELECT try_add(interval 1 year, interval 2 year);
3-0
Since: 3.2.0
try_aes_decrypt
try_aes_decrypt(expr, key[, mode[, padding[, aad]]]) - This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed.
Examples:
sql
> SELECT try_aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
Spark SQL
> SELECT try_aes_decrypt(unhex('----------468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
NULL
Since: 3.5.0
try_avg
try_avg(expr) - Returns the mean calculated from values of a group and the result is null on overflow.
Examples:
sql
> SELECT try_avg(col) FROM VALUES (1), (2), (3) AS tab(col);
2.0
> SELECT try_avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
1.5
> SELECT try_avg(col) FROM VALUES (interval '2147483647 months'), (interval '1 months') AS tab(col);
NULL
Since: 3.3.0
try_divide
try_divide(dividend, divisor) - Returns dividend/divisor. It always performs floating point division. Its result is always null if divisor is 0. dividend must be a numeric or an interval. divisor must be a numeric.
Examples:
sql
> SELECT try_divide(3, 2);
1.5
> SELECT try_divide(2L, 2L);
1.0
> SELECT try_divide(1, 0);
NULL
> SELECT try_divide(interval 2 month, 2);
0-1
> SELECT try_divide(interval 2 month, 0);
NULL
Since: 3.2.0
try_element_at
try_element_at(array, index) - Returns the element of array at the given (1-based) index. If index is 0, Spark will throw an error. If index < 0, accesses elements from the last to the first. The function always returns NULL if the index exceeds the length of the array.
try_element_at(map, key) - Returns value for given key. The function always returns NULL if the key is not contained in the map.
Examples:
sql
> SELECT try_element_at(array(1, 2, 3), 2);
2
> SELECT try_element_at(map(1, 'a', 2, 'b'), 2);
b
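-- Additional illustrative calls (not from the official reference): an out-of-range index and a missing key both yield NULL, per the description above:
> SELECT try_element_at(array(1, 2, 3), 5);
NULL
> SELECT try_element_at(map(1, 'a', 2, 'b'), 3);
NULL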
Since: 3.3.0
try_make_interval
try_make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) - This is a special version of make_interval that performs the same operation, but returns NULL when an overflow occurs.
Arguments:
- years - the number of years, positive or negative
- months - the number of months, positive or negative
- weeks - the number of weeks, positive or negative
- days - the number of days, positive or negative
- hours - the number of hours, positive or negative
- mins - the number of minutes, positive or negative
- secs - the number of seconds with the fractional part in microsecond precision.
Examples:
sql
> SELECT try_make_interval(100, 11, 1, 1, 12, 30, 01.001001);
100 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds
> SELECT try_make_interval(100, null, 3);
NULL
> SELECT try_make_interval(0, 1, 0, 1, 0, 0, 100.000001);
1 months 1 days 1 minutes 40.000001 seconds
> SELECT try_make_interval(2147483647);
NULL
Since: 4.0.0
try_make_timestamp
try_make_timestamp(year, month, day, hour, min, sec[, timezone]) - Try to create a timestamp from year, month, day, hour, min, sec and timezone fields. The result data type is consistent with the value of configuration spark.sql.timestampType. The function returns NULL on invalid inputs.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
- hour - the hour-of-day to represent, from 0 to 23
- min - the minute-of-hour to represent, from 0 to 59
- sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. The value can be either an integer like 13, or a fraction like 13.123. If the sec argument equals to 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
- timezone - the time zone identifier. For example, CET, UTC, etc.
Examples:
sql
> SELECT try_make_timestamp(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT try_make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET');
2014-12-27 21:30:45.887
> SELECT try_make_timestamp(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT try_make_timestamp(2019, 6, 30, 23, 59, 1);
2019-06-30 23:59:01
> SELECT try_make_timestamp(null, 7, 22, 15, 30, 0);
NULL
> SELECT try_make_timestamp(2024, 13, 22, 15, 30, 0);
NULL
Since: 4.0.0
try_make_timestamp_ltz
try_make_timestamp_ltz(year, month, day, hour, min, sec[, timezone]) - Try to create the current timestamp with local time zone from year, month, day, hour, min, sec and timezone fields. The function returns NULL on invalid inputs.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
- hour - the hour-of-day to represent, from 0 to 23
- min - the minute-of-hour to represent, from 0 to 59
- sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals to 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
- timezone - the time zone identifier. For example, CET, UTC, etc.
Examples:
sql
> SELECT try_make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT try_make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887, 'CET');
2014-12-27 21:30:45.887
> SELECT try_make_timestamp_ltz(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT try_make_timestamp_ltz(null, 7, 22, 15, 30, 0);
NULL
> SELECT try_make_timestamp_ltz(2024, 13, 22, 15, 30, 0);
NULL
Since: 4.0.0
try_make_timestamp_ntz
try_make_timestamp_ntz(year, month, day, hour, min, sec) - Try to create local date-time from year, month, day, hour, min, sec fields. The function returns NULL on invalid inputs.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
- hour - the hour-of-day to represent, from 0 to 23
- min - the minute-of-hour to represent, from 0 to 59
- sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals to 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
Examples:
sql
> SELECT try_make_timestamp_ntz(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT try_make_timestamp_ntz(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT try_make_timestamp_ntz(null, 7, 22, 15, 30, 0);
NULL
> SELECT try_make_timestamp_ntz(2024, 13, 22, 15, 30, 0);
NULL
Since: 4.0.0
try_mod
try_mod(dividend, divisor) - Returns the remainder after dividend/divisor. dividend must be a numeric. divisor must be a numeric.
Examples:
sql
> SELECT try_mod(3, 2);
1
> SELECT try_mod(2L, 2L);
0
> SELECT try_mod(3.0, 2.0);
1.0
> SELECT try_mod(1, 0);
NULL
Since: 4.0.0
try_multiply
try_multiply(expr1, expr2) - Returns expr1*expr2 and the result is null on overflow. The acceptable input types are the same as the * operator.
Examples:
sql
> SELECT try_multiply(2, 3);
6
> SELECT try_multiply(-2147483648, 10);
NULL
> SELECT try_multiply(interval 2 year, 3);
6-0
Since: 3.3.0
try_parse_json
try_parse_json(jsonStr) - Parses a JSON string as a Variant value. Returns NULL when the string is not a valid JSON value.
Examples:
sql
> SELECT try_parse_json('{"a":1,"b":0.8}');
{"a":1,"b":0.8}
> SELECT try_parse_json('{"a":1,');
NULL
Since: 4.0.0
try_parse_url
try_parse_url(url, partToExtract[, key]) - This is a special version of parse_url that performs the same operation, but returns a NULL value instead of raising an error if the parsing cannot be performed.
Examples:
sql
> SELECT try_parse_url('http://spark.apache.org/path?query=1', 'HOST');
spark.apache.org
> SELECT try_parse_url('http://spark.apache.org/path?query=1', 'QUERY');
query=1
> SELECT try_parse_url('inva lid://spark.apache.org/path?query=1', 'QUERY');
NULL
> SELECT try_parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query');
1
Since: 4.0.0
try_reflect
try_reflect(class, method[, arg1[, arg2 ...]]) - This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoked method throws an exception.
Examples:
sql
> SELECT try_reflect('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT try_reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
> SELECT try_reflect('java.net.URLDecoder', 'decode', '%');
NULL
Since: 4.0.0
try_subtract
try_subtract(expr1, expr2) - Returns expr1-expr2 and the result is null on overflow. The acceptable input types are the same as the - operator.
Examples:
sql
> SELECT try_subtract(2, 1);
1
> SELECT try_subtract(-2147483648, 1);
NULL
> SELECT try_subtract(date'2021-01-02', 1);
2021-01-01
> SELECT try_subtract(date'2021-01-01', interval 1 year);
2020-01-01
> SELECT try_subtract(timestamp'2021-01-02 00:00:00', interval 1 day);
2021-01-01 00:00:00
> SELECT try_subtract(interval 2 year, interval 1 year);
1-0
Since: 3.3.0
try_sum
try_sum(expr) - Returns the sum calculated from values of a group and the result is null on overflow.
Examples:
sql
> SELECT try_sum(col) FROM VALUES (5), (10), (15) AS tab(col);
30
> SELECT try_sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
25
> SELECT try_sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
NULL
> SELECT try_sum(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col);
NULL
Since: 3.3.0
try_to_binary
try_to_binary(str[, fmt]) - This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.
Examples:
sql
> SELECT try_to_binary('abc', 'utf-8');
abc
> select try_to_binary('a!', 'base64');
NULL
> select try_to_binary('abc', 'invalidFormat');
NULL
Since: 3.3.0
try_to_number
try_to_number(expr, fmt) - Convert string 'expr' to a number based on the string format fmt. Returns NULL if the string 'expr' does not match the expected format. The format follows the same semantics as the to_number function.
Examples:
sql
> SELECT try_to_number('454', '999');
454
> SELECT try_to_number('454.00', '000.00');
454.00
> SELECT try_to_number('12,454', '99,999');
12454
> SELECT try_to_number('$78.12', '$99.99');
78.12
> SELECT try_to_number('12,454.8-', '99,999.9S');
-12454.8
Since: 3.3.0
try_to_timestamp
try_to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. The function always returns null on an invalid input with/without ANSI SQL mode enabled. By default, it follows casting rules to a timestamp if the fmt is omitted. The result data type is consistent with the value of configuration spark.sql.timestampType.
Arguments:
- timestamp_str - A string to be parsed to timestamp.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
sql
> SELECT try_to_timestamp('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT try_to_timestamp('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
> SELECT try_to_timestamp('foo', 'yyyy-MM-dd');
NULL
Since: 3.4.0
try_url_decode
try_url_decode(str) - This is a special version of url_decode that performs the same operation, but returns a NULL value instead of raising an error if the decoding cannot be performed.
Arguments:
- str - a string expression to decode
Examples:
sql
> SELECT try_url_decode('https%3A%2F%2Fspark.apache.org');
https://spark.apache.org
Since: 4.0.0
try_validate_utf8
try_validate_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise returns NULL.
Arguments:
- str - a string expression
Examples:
sql
> SELECT try_validate_utf8('Spark');
Spark
> SELECT try_validate_utf8(x'61');
a
> SELECT try_validate_utf8(x'80');
NULL
> SELECT try_validate_utf8(x'61C262');
NULL
Since: 4.0.0
try_variant_get
try_variant_get(v, path[, type]) - Extracts a sub-variant from v according to path, and then casts the sub-variant to type. When type is omitted, it defaults to variant. Returns null if the path does not exist or the cast fails.
Examples:
sql
> SELECT try_variant_get(parse_json('{"a": 1}'), '$.a', 'int');
1
> SELECT try_variant_get(parse_json('{"a": 1}'), '$.b', 'int');
NULL
> SELECT try_variant_get(parse_json('[1, "2"]'), '$[1]', 'string');
2
> SELECT try_variant_get(parse_json('[1, "2"]'), '$[2]', 'string');
NULL
> SELECT try_variant_get(parse_json('[1, "hello"]'), '$[1]');
"hello"
> SELECT try_variant_get(parse_json('[1, "hello"]'), '$[1]', 'int');
NULL
Since: 4.0.0
typeof
typeof(expr) - Returns the DDL-formatted type string for the data type of the input.
Examples:
sql
> SELECT typeof(1);
int
> SELECT typeof(array(1));
array<int>
Since: 3.0.0