| No. | Type | Link |
|---|---|---|
| 1 | Spark Functions | 1. Spark Functions: Symbols |
| 2 | Spark Functions | 2. Spark Functions: a/b/c |
| 3 | Spark Functions | 3. Spark Functions: d/e/f/g/h/i/j/k/l |
| 4 | Spark Functions | 4. Spark Functions: m/n/o/p/q/r |
| 5 | Spark Functions | 5. Spark Functions: s/t |
| 6 | Spark Functions | 6. Spark Functions: u/v/w/x/y/z |
Table of Contents
- 13. M
- make_date
- make_dt_interval
- make_interval
- make_timestamp
- make_timestamp_ltz
- make_timestamp_ntz
- make_valid_utf8
- make_ym_interval
- map
- map_concat
- map_contains_key
- map_entries
- map_filter
- map_from_arrays
- map_from_entries
- map_keys
- map_values
- map_zip_with
- mask
- max
- max_by
- md5
- mean
- median
- min
- min_by
- minute
- mod
- mode
- monotonically_increasing_id
- month
- monthname
- months_between
- 14. N
- 15. O
- 16. P
- 17. Q
- 18. R
- radians
- raise_error
- rand
- randn
- random
- randstr
- range
- rank
- reduce
- reflect
- regexp
- regexp_count
- regexp_extract
- regexp_extract_all
- regexp_instr
- regexp_like
- regexp_replace
- regexp_substr
- regr_avgx
- regr_avgy
- regr_count
- regr_intercept
- regr_r2
- regr_slope
- regr_sxx
- regr_sxy
- regr_syy
- repeat
- replace
- reverse
- right
- rint
- rlike
- round
- row_number
- rpad
- rtrim
13. M
make_date
make_date(year, month, day) - Create date from year, month and day fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
Examples:
sql
> SELECT make_date(2013, 7, 15);
2013-07-15
> SELECT make_date(2019, 7, NULL);
NULL
Since: 3.0.0
make_dt_interval
make_dt_interval([days[, hours[, mins[, secs]]]]) - Make DayTimeIntervalType duration from days, hours, mins and secs.
Arguments:
- days - the number of days, positive or negative
- hours - the number of hours, positive or negative
- mins - the number of minutes, positive or negative
- secs - the number of seconds with the fractional part in microsecond precision.
Examples:
sql
> SELECT make_dt_interval(1, 12, 30, 01.001001);
1 12:30:01.001001000
> SELECT make_dt_interval(2);
2 00:00:00.000000000
> SELECT make_dt_interval(100, null, 3);
NULL
Since: 3.2.0
make_interval
make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) - Make interval from years, months, weeks, days, hours, mins and secs.
Arguments:
- years - the number of years, positive or negative
- months - the number of months, positive or negative
- weeks - the number of weeks, positive or negative
- days - the number of days, positive or negative
- hours - the number of hours, positive or negative
- mins - the number of minutes, positive or negative
- secs - the number of seconds with the fractional part in microsecond precision.
Examples:
sql
> SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001);
100 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds
> SELECT make_interval(100, null, 3);
NULL
> SELECT make_interval(0, 1, 0, 1, 0, 0, 100.000001);
1 months 1 days 1 minutes 40.000001 seconds
Since: 3.0.0
make_timestamp
make_timestamp(year, month, day, hour, min, sec[, timezone]) - Create timestamp from year, month, day, hour, min, sec and timezone fields. The result data type is consistent with the value of configuration spark.sql.timestampType. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
- hour - the hour-of-day to represent, from 0 to 23
- min - the minute-of-hour to represent, from 0 to 59
- sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. The value can be either an integer like 13, or a fraction like 13.123. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
- timezone - the time zone identifier. For example, CET or UTC.
Examples:
sql
> SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET');
2014-12-27 21:30:45.887
> SELECT make_timestamp(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT make_timestamp(2019, 6, 30, 23, 59, 1);
2019-06-30 23:59:01
> SELECT make_timestamp(null, 7, 22, 15, 30, 0);
NULL
Since: 3.0.0
make_timestamp_ltz
make_timestamp_ltz(year, month, day, hour, min, sec[, timezone]) - Create the current timestamp with local time zone from year, month, day, hour, min, sec and timezone fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
- hour - the hour-of-day to represent, from 0 to 23
- min - the minute-of-hour to represent, from 0 to 59
- sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
- timezone - the time zone identifier. For example, CET or UTC.
Examples:
sql
> SELECT make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT make_timestamp_ltz(2014, 12, 28, 6, 30, 45.887, 'CET');
2014-12-27 21:30:45.887
> SELECT make_timestamp_ltz(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT make_timestamp_ltz(null, 7, 22, 15, 30, 0);
NULL
Since: 3.4.0
make_timestamp_ntz
make_timestamp_ntz(year, month, day, hour, min, sec) - Create local date-time from year, month, day, hour, min, sec fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
Arguments:
- year - the year to represent, from 1 to 9999
- month - the month-of-year to represent, from 1 (January) to 12 (December)
- day - the day-of-month to represent, from 1 to 31
- hour - the hour-of-day to represent, from 0 to 23
- min - the minute-of-hour to represent, from 0 to 59
- sec - the second-of-minute and its micro-fraction to represent, from 0 to 60. If the sec argument equals 60, the seconds field is set to 0 and 1 minute is added to the final timestamp.
Examples:
sql
> SELECT make_timestamp_ntz(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT make_timestamp_ntz(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT make_timestamp_ntz(null, 7, 22, 15, 30, 0);
NULL
Since: 3.4.0
make_valid_utf8
make_valid_utf8(str) - Returns the original string if str is a valid UTF-8 string, otherwise returns a new string whose invalid UTF-8 byte sequences are replaced using the Unicode replacement character U+FFFD.
Arguments:
- str - a string expression
Examples:
sql
> SELECT make_valid_utf8('Spark');
Spark
> SELECT make_valid_utf8(x'61');
a
> SELECT make_valid_utf8(x'80');
�
> SELECT make_valid_utf8(x'61C262');
a�b
Since: 4.0.0
make_ym_interval
make_ym_interval([years[, months]]) - Make year-month interval from years, months.
Arguments:
- years - the number of years, positive or negative
- months - the number of months, positive or negative
Examples:
sql
> SELECT make_ym_interval(1, 2);
1-2
> SELECT make_ym_interval(1, 0);
1-0
> SELECT make_ym_interval(-1, 1);
-0-11
> SELECT make_ym_interval(2);
2-0
Since: 3.2.0
map
map(key0, value0, key1, value1, ...) - Creates a map with the given key/value pairs.
Examples:
sql
> SELECT map(1.0, '2', 3.0, '4');
{1.0:"2",3.0:"4"}
Since: 2.0.0
map_concat
map_concat(map, ...) - Returns the union of all the given maps
Examples:
sql
> SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c'));
{1:"a",2:"b",3:"c"}
Since: 2.4.0
map_contains_key
map_contains_key(map, key) - Returns true if the map contains the key.
Examples:
sql
> SELECT map_contains_key(map(1, 'a', 2, 'b'), 1);
true
> SELECT map_contains_key(map(1, 'a', 2, 'b'), 3);
false
Since: 3.3.0
map_entries
map_entries(map) - Returns an unordered array of all entries in the given map.
Examples:
sql
> SELECT map_entries(map(1, 'a', 2, 'b'));
[{"key":1,"value":"a"},{"key":2,"value":"b"}]
Since: 3.0.0
map_filter
map_filter(expr, func) - Filters entries in a map using the function.
Examples:
sql
> SELECT map_filter(map(1, 0, 2, 2, 3, -1), (k, v) -> k > v);
{1:0,3:-1}
Since: 3.0.0
map_from_arrays
map_from_arrays(keys, values) - Creates a map with a pair of the given key/value arrays. All elements in keys should not be null
Examples:
sql
> SELECT map_from_arrays(array(1.0, 3.0), array('2', '4'));
{1.0:"2",3.0:"4"}
Since: 2.4.0
map_from_entries
map_from_entries(arrayOfEntries) - Returns a map created from the given array of entries.
Examples:
sql
> SELECT map_from_entries(array(struct(1, 'a'), struct(2, 'b')));
{1:"a",2:"b"}
Since: 2.4.0
map_keys
map_keys(map) - Returns an unordered array containing the keys of the map.
Examples:
sql
> SELECT map_keys(map(1, 'a', 2, 'b'));
[1,2]
Since: 2.0.0
map_values
map_values(map) - Returns an unordered array containing the values of the map.
Examples:
sql
> SELECT map_values(map(1, 'a', 2, 'b'));
["a","b"]
Since: 2.0.0
map_zip_with
map_zip_with(map1, map2, function) - Merges two given maps into a single map by applying function to the pair of values with the same key. For keys only present in one map, NULL will be passed as the value for the missing key. If an input map contains duplicated keys, only the first entry of the duplicated key is passed into the lambda function.
Examples:
sql
> SELECT map_zip_with(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2));
{1:"ax",2:"by"}
> SELECT map_zip_with(map('a', 1, 'b', 2), map('b', 3, 'c', 4), (k, v1, v2) -> coalesce(v1, 0) + coalesce(v2, 0));
{"a":1,"b":5,"c":4}
Since: 3.0.0
mask
mask(input[, upperChar, lowerChar, digitChar, otherChar]) - Masks the given string value. The function replaces characters with 'X' or 'x', and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed.
Arguments:
- input - string value to mask. Supported types: STRING, VARCHAR, CHAR
- upperChar - character to replace upper-case characters with. Specify NULL to retain original character. Default value: 'X'
- lowerChar - character to replace lower-case characters with. Specify NULL to retain original character. Default value: 'x'
- digitChar - character to replace digit characters with. Specify NULL to retain original character. Default value: 'n'
- otherChar - character to replace all other characters with. Specify NULL to retain original character. Default value: NULL
Examples:
sql
> SELECT mask('abcd-EFGH-8765-4321');
xxxx-XXXX-nnnn-nnnn
> SELECT mask('abcd-EFGH-8765-4321', 'Q');
xxxx-QQQQ-nnnn-nnnn
> SELECT mask('AbCD123-@$#', 'Q', 'q');
QqQQnnn-@$#
> SELECT mask('AbCD123-@$#');
XxXXnnn-@$#
> SELECT mask('AbCD123-@$#', 'Q');
QxQQnnn-@$#
> SELECT mask('AbCD123-@$#', 'Q', 'q');
QqQQnnn-@$#
> SELECT mask('AbCD123-@$#', 'Q', 'q', 'd');
QqQQddd-@$#
> SELECT mask('AbCD123-@$#', 'Q', 'q', 'd', 'o');
QqQQdddoooo
> SELECT mask('AbCD123-@$#', NULL, 'q', 'd', 'o');
AqCDdddoooo
> SELECT mask('AbCD123-@$#', NULL, NULL, 'd', 'o');
AbCDdddoooo
> SELECT mask('AbCD123-@$#', NULL, NULL, NULL, 'o');
AbCD123oooo
> SELECT mask(NULL, NULL, NULL, NULL, 'o');
NULL
> SELECT mask(NULL);
NULL
> SELECT mask('AbCD123-@$#', NULL, NULL, NULL, NULL);
AbCD123-@$#
Since: 3.4.0
max
max(expr) - Returns the maximum value of expr.
Examples:
sql
> SELECT max(col) FROM VALUES (10), (50), (20) AS tab(col);
50
Since: 1.0.0
max_by
max_by(x, y) - Returns the value of x associated with the maximum value of y.
Examples:
sql
> SELECT max_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
b
Note:
The function is non-deterministic: when multiple rows share the same maximum value of y, any of the associated values of x may be returned.
Since: 3.0.0
md5
md5(expr) - Returns an MD5 128-bit checksum as a hex string of expr.
Examples:
sql
> SELECT md5('Spark');
8cde774d6f7333752ed72cacddb05126
Since: 1.5.0
mean
mean(expr) - Returns the mean calculated from values of a group.
Examples:
sql
> SELECT mean(col) FROM VALUES (1), (2), (3) AS tab(col);
2.0
> SELECT mean(col) FROM VALUES (1), (2), (NULL) AS tab(col);
1.5
Since: 1.0.0
median
median(col) - Returns the median of numeric or ANSI interval column col.
Examples:
sql
> SELECT median(col) FROM VALUES (0), (10) AS tab(col);
5.0
> SELECT median(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
0-5
Since: 3.4.0
min
min(expr) - Returns the minimum value of expr.
Examples:
sql
> SELECT min(col) FROM VALUES (10), (-1), (20) AS tab(col);
-1
Since: 1.0.0
min_by
min_by(x, y) - Returns the value of x associated with the minimum value of y.
Examples:
sql
> SELECT min_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
a
Note:
The function is non-deterministic: when multiple rows share the same minimum value of y, any of the associated values of x may be returned.
Since: 3.0.0
minute
minute(timestamp) - Returns the minute component of the string/timestamp.
Examples:
sql
> SELECT minute('2009-07-30 12:58:59');
58
Since: 1.5.0
mod
expr1 % expr2, or mod(expr1, expr2) - Returns the remainder after expr1/expr2.
Examples:
sql
> SELECT 2 % 1.8;
0.2
> SELECT MOD(2, 1.8);
0.2
Since: 2.3.0
mode
mode(col[, deterministic]) - Returns the most frequent value for the values within col. NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency, an arbitrary one of them is returned if deterministic is false or not defined, and the lowest value is returned if deterministic is true.
mode() WITHIN GROUP (ORDER BY col) - Returns the most frequent value for the values within col (specified in the ORDER BY clause). NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency, only one value is returned, chosen by the sort direction: the smallest value if the direction is asc, or the largest value if it is desc.
Examples:
sql
> SELECT mode(col) FROM VALUES (0), (10), (10) AS tab(col);
10
> SELECT mode(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH), (INTERVAL '10' MONTH) AS tab(col);
0-10
> SELECT mode(col) FROM VALUES (0), (10), (10), (null), (null), (null) AS tab(col);
10
> SELECT mode(col, false) FROM VALUES (-10), (0), (10) AS tab(col);
0
> SELECT mode(col, true) FROM VALUES (-10), (0), (10) AS tab(col);
-10
> SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10) AS tab(col);
10
> SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10), (20), (20) AS tab(col);
10
> SELECT mode() WITHIN GROUP (ORDER BY col DESC) FROM VALUES (0), (10), (10), (20), (20) AS tab(col);
20
Since: 3.4.0
monotonically_increasing_id
monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. The function is non-deterministic because its result depends on partition IDs.
Examples:
sql
> SELECT monotonically_increasing_id();
0
Since: 1.4.0
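The bit layout above can be reproduced by hand: an ID is partitionId * 2^33 + recordNumber. A quick sanity check (the partition ID 1 and record number 5 below are illustrative values, not produced by the function):
sql
> SELECT shiftleft(CAST(1 AS BIGINT), 33) + 5;
8589934597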
month
month(date) - Returns the month component of the date/timestamp.
Examples:
sql
> SELECT month('2016-07-30');
7
Since: 1.5.0
monthname
monthname(date) - Returns the three-letter abbreviated month name from the given date.
Examples:
sql
> SELECT monthname('2008-02-20');
Feb
Since: 4.0.0
months_between
months_between(timestamp1, timestamp2[, roundOff]) - If timestamp1 is later than timestamp2, then the result is positive. If timestamp1 and timestamp2 are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false.
Examples:
sql
> SELECT months_between('1997-02-28 10:30:00', '1996-10-30');
3.94959677
> SELECT months_between('1997-02-28 10:30:00', '1996-10-30', false);
3.9495967741935485
Since: 1.5.0
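The 31-days-per-month rule can be checked by hand for the first example, assuming it decomposes as the whole-month difference plus (dayOfMonth1 - dayOfMonth2 + timeFraction) / 31:
sql
> SELECT round(4 + (28 - 30 + 10.5D / 24) / 31, 8);
3.94959677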
14. N
named_struct
named_struct(name1, val1, name2, val2, ...) - Creates a struct with the given field names and values.
Examples:
sql
> SELECT named_struct("a", 1, "b", 2, "c", 3);
{"a":1,"b":2,"c":3}
Since: 1.5.0
nanvl
nanvl(expr1, expr2) - Returns expr1 if it's not NaN, or expr2 otherwise.
Examples:
sql
> SELECT nanvl(cast('NaN' as double), 123);
123.0
Since: 1.5.0
negative
negative(expr) - Returns the negated value of expr.
Examples:
sql
> SELECT negative(1);
-1
Since: 1.0.0
next_day
next_day(start_date, day_of_week) - Returns the first date which is later than start_date and named as indicated. The function returns NULL if at least one of the input parameters is NULL. When both input parameters are not NULL and day_of_week is an invalid input, the function throws SparkIllegalArgumentException if spark.sql.ansi.enabled is set to true, and otherwise returns NULL.
Examples:
sql
> SELECT next_day('2015-01-14', 'TU');
2015-01-20
Since: 1.5.0
not
not expr - Logical not.
Examples:
sql
> SELECT not true;
false
> SELECT not false;
true
> SELECT not NULL;
NULL
Since: 1.0.0
now
now() - Returns the current timestamp at the start of query evaluation.
Examples:
sql
> SELECT now();
2020-04-25 15:49:11.914
Since: 1.6.0
nth_value
nth_value(input[, offset]) - Returns the value of input at the row that is the offset-th row from the beginning of the window frame. Offset starts at 1. If ignoreNulls=true, we will skip nulls when finding the offset-th row. Otherwise, every row counts for the offset. If there is no such offset-th row (e.g., when the offset is 10 and the size of the window frame is less than 10), null is returned.
Arguments:
- input - the target column or expression that the function operates on.
- offset - a positive int literal to indicate the offset in the window frame. It starts with 1.
- ignoreNulls - an optional specification that indicates the NthValue should skip null values in the determination of which row to use.
Examples:
sql
> SELECT a, b, nth_value(b, 2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 1
A2 3 NULL
Since: 3.1.0
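The ignoreNulls behavior has no example above; a hedged sketch using the IGNORE NULLS syntax (the output assumes the default window frame and that NULLs sort first in ascending order):
sql
> SELECT a, b, nth_value(b, 2) IGNORE NULLS OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', NULL), ('A1', 1), ('A1', 2) tab(a, b);
A1 NULL NULL
A1 1 NULL
A1 2 2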
ntile
ntile(n) - Divides the rows for each window partition into n buckets ranging from 1 to at most n.
Arguments:
- n - an int expression which is the number of buckets to divide the rows into. Default value is 1.
Examples:
sql
> SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 2
A2 3 1
Since: 2.0.0
nullif
nullif(expr1, expr2) - Returns null if expr1 equals to expr2, or expr1 otherwise.
Examples:
sql
> SELECT nullif(2, 2);
NULL
Since: 2.0.0
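nullif follows standard SQL semantics, so it can be read as shorthand for the equivalent CASE expression below (a rewrite for illustration, not a separate feature):
sql
> SELECT CASE WHEN 2 = 2 THEN NULL ELSE 2 END;
NULL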
nullifzero
nullifzero(expr) - Returns null if expr is equal to zero, or expr otherwise.
Examples:
sql
> SELECT nullifzero(0);
NULL
> SELECT nullifzero(2);
2
Since: 4.0.0
nvl
nvl(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise.
Examples:
sql
> SELECT nvl(NULL, array('2'));
["2"]
Since: 2.0.0
nvl2
nvl2(expr1, expr2, expr3) - Returns expr2 if expr1 is not null, or expr3 otherwise.
Examples:
sql
> SELECT nvl2(NULL, 2, 1);
1
Since: 2.0.0
15. O
octet_length
octet_length(expr) - Returns the byte length of string data or number of bytes of binary data.
Examples:
sql
> SELECT octet_length('Spark SQL');
9
> SELECT octet_length(x'537061726b2053514c');
9
Since: 2.3.0
or
expr1 or expr2 - Logical OR.
Examples:
sql
> SELECT true or false;
true
> SELECT false or false;
false
> SELECT true or NULL;
true
> SELECT false or NULL;
NULL
Since: 1.0.0
overlay
overlay(input, replace, pos[, len]) - Replace input with replace that starts at pos and is of length len.
Examples:
sql
> SELECT overlay('Spark SQL' PLACING '_' FROM 6);
Spark_SQL
> SELECT overlay('Spark SQL' PLACING 'CORE' FROM 7);
Spark CORE
> SELECT overlay('Spark SQL' PLACING 'ANSI ' FROM 7 FOR 0);
Spark ANSI SQL
> SELECT overlay('Spark SQL' PLACING 'tructured' FROM 2 FOR 4);
Structured SQL
> SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('_', 'utf-8') FROM 6);
Spark_SQL
> SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('CORE', 'utf-8') FROM 7);
Spark CORE
> SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('ANSI ', 'utf-8') FROM 7 FOR 0);
Spark ANSI SQL
> SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4);
Structured SQL
Since: 3.0.0
16. P
parse_json
parse_json(jsonStr) - Parses a JSON string as a Variant value. Throws an exception when the string is not a valid JSON value.
Examples:
sql
> SELECT parse_json('{"a":1,"b":0.8}');
{"a":1,"b":0.8}
Since: 4.0.0
parse_url
parse_url(url, partToExtract[, key]) - Extracts a part from a URL.
Examples:
sql
> SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST');
spark.apache.org
> SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY');
query=1
> SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query');
1
Since: 2.0.0
percent_rank
percent_rank() - Computes the percentage ranking of a value in a group of values.
Arguments:
- children - this is to base the rank on; a change in the value of one of the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser.
Examples:
sql
> SELECT a, b, percent_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 0.0
A1 1 0.0
A1 2 1.0
A2 3 0.0
Since: 2.0.0
percentile
percentile(col, percentage [, frequency]) - Returns the exact percentile value of the numeric or ANSI interval column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be a positive integer.
percentile(col, array(percentage1 [, percentage2]...) [, frequency]) - Returns the exact percentile value array of the numeric column col at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of frequency should be a positive integer.
Examples:
sql
> SELECT percentile(col, 0.3) FROM VALUES (0), (10) AS tab(col);
3.0
> SELECT percentile(col, array(0.25, 0.75)) FROM VALUES (0), (10) AS tab(col);
[2.5,7.5]
> SELECT percentile(col, 0.5) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
0-5
> SELECT percentile(col, array(0.2, 0.5)) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '10' SECOND) AS tab(col);
[0 00:00:02.000000000,0 00:00:05.000000000]
Since: 2.1.0
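The exact percentile interpolates linearly between adjacent sorted values; the first example above can be reproduced by hand, assuming that rule: with values 0 and 10, percentage 0.3 lands at rank 0.3 * (2 - 1), i.e. 0 + 0.3 * (10 - 0):
sql
> SELECT 0 + 0.3 * (10 - 0);
3.0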
percentile_approx
percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ANSI interval column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory; a higher value of accuracy yields better accuracy, and 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.
Examples:
sql
> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
[1,1,0]
> SELECT percentile_approx(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col);
7
> SELECT percentile_approx(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col);
0-1
> SELECT percentile_approx(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col);
[0 00:00:01.000000000,0 00:00:02.000000000]
Since: 2.1.0
percentile_cont
percentile_cont(percentage) WITHIN GROUP (ORDER BY col) - Return a percentile value based on a continuous distribution of numeric or ANSI interval column col at the given percentage (specified in ORDER BY clause).
Examples:
sql
> SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col);
2.5
> SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
0-2
Since: 4.0.0
percentile_disc
percentile_disc(percentage) WITHIN GROUP (ORDER BY col) - Return a percentile value based on a discrete distribution of numeric or ANSI interval column col at the given percentage (specified in ORDER BY clause).
Examples:
sql
> SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col);
0.0
> SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
0-0
Since: 4.0.0
pi
pi() - Returns pi.
Examples:
sql
> SELECT pi();
3.141592653589793
Since: 1.5.0
pmod
pmod(expr1, expr2) - Returns the positive value of expr1 mod expr2.
Examples:
sql
> SELECT pmod(10, 3);
1
> SELECT pmod(-10, 3);
2
Since: 1.5.0
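pmod differs from % only in the sign of the result; it matches the standard double-mod identity (shown here as a cross-check, not an additional feature):
sql
> SELECT ((-10 % 3) + 3) % 3;
2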
posexplode
posexplode(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map.
Examples:
sql
> SELECT posexplode(array(10,20));
0 10
1 20
> SELECT posexplode(collection => array(10,20));
0 10
1 20
Since: 2.0.0
posexplode_outer
posexplode_outer(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map.
Examples:
sql
> SELECT posexplode_outer(array(10,20));
0 10
1 20
> SELECT posexplode_outer(collection => array(10,20));
0 10
1 20
Since: 2.0.0
position
position(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos. The given pos and return value are 1-based.
Examples:
sql
> SELECT position('bar', 'foobarbar');
4
> SELECT position('bar', 'foobarbar', 5);
7
> SELECT POSITION('bar' IN 'foobarbar');
4
Since: 2.3.0
positive
positive(expr) - Returns the value of expr.
Examples:
sql
> SELECT positive(1);
1
Since: 1.5.0
pow
pow(expr1, expr2) - Raises expr1 to the power of expr2.
Examples:
sql
> SELECT pow(2, 3);
8.0
Since: 1.4.0
power
power(expr1, expr2) - Raises expr1 to the power of expr2.
Examples:
sql
> SELECT power(2, 3);
8.0
Since: 1.4.0
printf
printf(strfmt, obj, ...) - Returns a formatted string from printf-style format strings.
Examples:
sql
> SELECT printf("Hello World %d %s", 100, "days");
Hello World 100 days
Since: 1.5.0
17. Q
quarter
quarter(date) - Returns the quarter of the year for date, in the range 1 to 4.
Examples:
sql
> SELECT quarter('2016-08-31');
3
Since: 1.5.0
18. R
radians
radians(expr) - Converts degrees to radians.
Arguments:
- expr - angle in degrees
Examples:
sql
> SELECT radians(180);
3.141592653589793
Since: 1.4.0
raise_error
raise_error(expr) - Throws a USER_RAISED_EXCEPTION with expr as the message.
Examples:
sql
> SELECT raise_error('custom error message');
[USER_RAISED_EXCEPTION] custom error message
Since: 3.1.0
rand
rand([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
Examples:
sql
> SELECT rand();
0.9629742951434543
> SELECT rand(0);
0.7604953758285915
> SELECT rand(null);
0.7604953758285915
Note:
The function is non-deterministic in the general case.
Since: 1.5.0
randn
randn([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.
Examples:
sql
> SELECT randn();
-0.3254147983080288
> SELECT randn(0);
1.6034991609278433
> SELECT randn(null);
1.6034991609278433
Note:
The function is non-deterministic in the general case.
Since: 1.5.0
random
random([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
Examples:
sql
> SELECT random();
0.9629742951434543
> SELECT random(0);
0.7604953758285915
> SELECT random(null);
0.7604953758285915
Note:
The function is non-deterministic in the general case.
Since: 3.0.0
randstr
randstr(length[, seed]) - Returns a string of the specified length whose characters are chosen uniformly at random from the following pool of characters: 0-9, a-z, A-Z. The random seed is optional. The string length must be a constant two-byte or four-byte integer (SMALLINT or INT, respectively).
Examples:
sql
> SELECT randstr(3, 0) AS result;
ceV
Since: 4.0.0
range
range(start[, end[, step[, numSlices]]]) / range(end) - Returns a table of values within a specified range.
Arguments:
- start - An optional BIGINT literal defaulted to 0, marking the first value generated.
- end - A BIGINT literal marking endpoint (exclusive) of the number generation.
- step - An optional BIGINT literal defaulted to 1, specifying the increment used when generating values.
- numSlices - An optional INTEGER literal specifying how the production of rows is spread across partitions.
Examples:
sql
> SELECT * FROM range(1);
+---+
| id|
+---+
| 0|
+---+
> SELECT * FROM range(0, 2);
+---+
|id |
+---+
|0 |
|1 |
+---+
> SELECT * FROM range(0, 4, 2);
+---+
|id |
+---+
|0 |
|2 |
+---+
Since: 2.0.0
rank
rank() - Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence.
Arguments:
- children - this is to base the rank on; a change in the value of one of the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser.
Examples:
sql
> SELECT a, b, rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 3
A2 3 1
Since: 2.0.0
reduce
reduce(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
Examples:
sql
> SELECT reduce(array(1, 2, 3), 0, (acc, x) -> acc + x);
6
> SELECT reduce(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10);
60
Since: 3.4.0
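Because the accumulator may be a struct, reduce can carry more than one running value; a sketch computing an average (the field names sum and cnt are arbitrary):
sql
> SELECT reduce(array(1, 2, 3), named_struct('sum', 0, 'cnt', 0), (acc, x) -> named_struct('sum', acc.sum + x, 'cnt', acc.cnt + 1), acc -> acc.sum / acc.cnt);
2.0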
reflect
reflect(class, method[, arg1[, arg2 ...]]) - Calls a method with reflection.
Examples:
sql
> SELECT reflect('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
Since: 2.0.0
regexp
regexp(str, regexp) - Returns true if str matches regexp, or false otherwise.
Arguments:
- str - a string expression
- regexp - a string expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal. For example, to match "\abc", a regular expression for regexp can be "^\\abc$". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match "\abc" is "^\abc$". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any.
Examples:
sql
> SET spark.sql.parser.escapedStringLiterals=true;
spark.sql.parser.escapedStringLiterals true
> SELECT regexp('%SystemDrive%\Users\John', '%SystemDrive%\\Users.*');
true
> SET spark.sql.parser.escapedStringLiterals=false;
spark.sql.parser.escapedStringLiterals false
> SELECT regexp('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*');
true
> SELECT regexp('%SystemDrive%\\Users\\John', r'%SystemDrive%\\Users.*');
true
Note:
Use LIKE to match with simple string pattern.
Since: 3.2.0
regexp_count
regexp_count(str, regexp) - Returns a count of the number of times that the regular expression pattern regexp is matched in the string str.
Arguments:
- str - a string expression.
- regexp - a string representing a regular expression. The regex string should be a Java regular expression.
Examples:
sql
> SELECT regexp_count('Steven Jones and Stephen Smith are the best players', 'Ste(v|ph)en');
2
> SELECT regexp_count('abcdefghijklmnopqrstuvwxyz', '[a-z]{3}');
8
Since: 3.4.0
regexp_extract
regexp_extract(str, regexp[, idx]) - Extracts the first string in str that matches the regexp expression and corresponds to the regex group index.
Arguments:
- str - a string expression.
- regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal. For example, to match "\abc", a regular expression for regexp can be "^\\abc$". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match "\abc" is "^\abc$". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any.
- idx - an integer expression representing the group index. The regex may contain multiple groups; idx indicates which regex group to extract. The group index should be non-negative. The minimum value of idx is 0, which means matching the entire regular expression. If idx is not specified, the default group index value is 1. The idx parameter is the Java regex Matcher group() method index.
Examples:
sql
> SELECT regexp_extract('100-200', '(\\d+)-(\\d+)', 1);
100
> SELECT regexp_extract('100-200', r'(\d+)-(\d+)', 1);
100
Since: 1.5.0
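The idx semantics above can be checked directly: group 0 is the entire match and group 2 is the second parenthesized group:
sql
> SELECT regexp_extract('100-200', r'(\d+)-(\d+)', 0);
100-200
> SELECT regexp_extract('100-200', r'(\d+)-(\d+)', 2);
200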
regexp_extract_all
regexp_extract_all(str, regexp[, idx]) - Extracts all strings in str that match the regexp expression and correspond to the regex group index.
Arguments:
- str - a string expression.
- regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal. For example, to match "\abc", a regular expression for regexp can be "^\\abc$". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match "\abc" is "^\abc$". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any.
- idx - an integer expression representing the group index. The regex may contain multiple groups; idx indicates which regex group to extract. The group index should be non-negative. The minimum value of idx is 0, which means matching the entire regular expression. If idx is not specified, the default group index value is 1. The idx parameter is the Java regex Matcher group() method index.
Examples:
sql
> SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)', 1);
["100","300"]
> SELECT regexp_extract_all('100-200, 300-400', r'(\d+)-(\d+)', 1);
["100","300"]
Since: 3.1.0
regexp_instr
regexp_instr(str, regexp) - Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. Positions are 1-based, not 0-based. If no match is found, returns 0.
Arguments:
- str - a string expression.
- regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal. For example, to match "\abc", a regular expression for regexp can be "^\\abc$". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match "\abc" is "^\abc$". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any.
Examples:
sql
> SELECT regexp_instr(r"\abc", r"^\\abc$");
1
> SELECT regexp_instr('user@spark.apache.org', '@[^.]*');
5
Since: 3.4.0
regexp_like
regexp_like(str, regexp) - Returns true if str matches regexp, or false otherwise.
Arguments:
- str - a string expression
- regexp - a string expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal. For example, to match "\abc", a regular expression for regexp can be "^\\abc$". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match "\abc" is "^\abc$". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any.
Examples:
sql
> SET spark.sql.parser.escapedStringLiterals=true;
spark.sql.parser.escapedStringLiterals true
> SELECT regexp_like('%SystemDrive%\Users\John', '%SystemDrive%\\Users.*');
true
> SET spark.sql.parser.escapedStringLiterals=false;
spark.sql.parser.escapedStringLiterals false
> SELECT regexp_like('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*');
true
> SELECT regexp_like('%SystemDrive%\\Users\\John', r'%SystemDrive%\\Users.*');
true
Note:
Use LIKE to match with simple string pattern.
Since: 3.2.0
regexp_replace
regexp_replace(str, regexp, rep[, position]) - Replaces all substrings of str that match regexp with rep.
Arguments:
- str - a string expression to search for a regular expression pattern match.
- regexp - a string representing a regular expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal. For example, to match "\abc", a regular expression for regexp can be "^\\abc$". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match "\abc" is "^\abc$". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any.
- rep - a string expression to replace matched substrings.
- position - a positive integer literal that indicates the position within str to begin searching. The default is 1. If position is greater than the number of characters in str, the result is str.
Examples:
sql
> SELECT regexp_replace('100-200', '(\\d+)', 'num');
num-num
> SELECT regexp_replace('100-200', r'(\d+)', 'num');
num-num
Since: 1.5.0
regexp_substr
regexp_substr(str, regexp) - Returns the substring that matches the regular expression regexp within the string str. If the regular expression is not found, the result is null.
Arguments:
- str - a string expression.
- regexp - a string representing a regular expression. The regex string should be a Java regular expression.
Examples:
sql
> SELECT regexp_substr('Steven Jones and Stephen Smith are the best players', 'Ste(v|ph)en');
Steven
> SELECT regexp_substr('Steven Jones and Stephen Smith are the best players', 'Jeck');
NULL
Since: 3.4.0
regr_avgx
regr_avgx(y, x) - Returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
Examples:
sql
> SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
2.75
> SELECT regr_avgx(y, x) FROM VALUES (1, null) AS tab(y, x);
NULL
> SELECT regr_avgx(y, x) FROM VALUES (null, 1) AS tab(y, x);
NULL
> SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
3.0
> SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
3.0
Since: 3.3.0
regr_avgy
regr_avgy(y, x) - Returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
Examples:
sql
> SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
1.75
> SELECT regr_avgy(y, x) FROM VALUES (1, null) AS tab(y, x);
NULL
> SELECT regr_avgy(y, x) FROM VALUES (null, 1) AS tab(y, x);
NULL
> SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
1.6666666666666667
> SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
1.5
Since: 3.3.0
regr_count
regr_count(y, x) - Returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.
Examples:
sql
> SELECT regr_count(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
4
> SELECT regr_count(y, x) FROM VALUES (1, null) AS tab(y, x);
0
> SELECT regr_count(y, x) FROM VALUES (null, 1) AS tab(y, x);
0
> SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
3
> SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
2
Since: 3.3.0
regr_intercept
regr_intercept(y, x) - Returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
Examples:
sql
> SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x);
0.0
> SELECT regr_intercept(y, x) FROM VALUES (1, null) AS tab(y, x);
NULL
> SELECT regr_intercept(y, x) FROM VALUES (null, 1) AS tab(y, x);
NULL
> SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x);
0.0
> SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x);
0.0
Since: 3.4.0
regr_r2
regr_r2(y, x) - Returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
Examples:
sql
> SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
0.2727272727272727
> SELECT regr_r2(y, x) FROM VALUES (1, null) AS tab(y, x);
NULL
> SELECT regr_r2(y, x) FROM VALUES (null, 1) AS tab(y, x);
NULL
> SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
0.7500000000000001
> SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
1.0
Since: 3.3.0
regr_slope
regr_slope(y, x) - Returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
Examples:
sql
> SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x);
1.0
> SELECT regr_slope(y, x) FROM VALUES (1, null) AS tab(y, x);
NULL
> SELECT regr_slope(y, x) FROM VALUES (null, 1) AS tab(y, x);
NULL
> SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x);
1.0
> SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x);
1.0
Since: 3.4.0
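regr_slope and regr_intercept follow the textbook least-squares formulas, so they can be cross-checked against covar_pop, var_pop and avg (an equivalence check under the standard definitions):
sql
> SELECT covar_pop(y, x) / var_pop(x), avg(y) - covar_pop(y, x) / var_pop(x) * avg(x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x);
1.0 0.0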
regr_sxx
regr_sxx(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
Examples:
sql
> SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
2.75
> SELECT regr_sxx(y, x) FROM VALUES (1, null) AS tab(y, x);
NULL
> SELECT regr_sxx(y, x) FROM VALUES (null, 1) AS tab(y, x);
NULL
> SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
2.0
> SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
2.0
Since: 3.4.0
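The REGR_COUNT(y, x) * VAR_POP(x) definition can be verified on the first example's data (no pair is null here, so var_pop sees the same rows as regr_sxx):
sql
> SELECT regr_count(y, x) * var_pop(x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
2.75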
regr_sxy
regr_sxy(y, x) - Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
Examples:
sql
> SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
0.75
> SELECT regr_sxy(y, x) FROM VALUES (1, null) AS tab(y, x);
NULL
> SELECT regr_sxy(y, x) FROM VALUES (null, 1) AS tab(y, x);
NULL
> SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
1.0
> SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
1.0
Since: 3.4.0
regr_syy
regr_syy(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
Examples:
sql
> SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
0.75
> SELECT regr_syy(y, x) FROM VALUES (1, null) AS tab(y, x);
NULL
> SELECT regr_syy(y, x) FROM VALUES (null, 1) AS tab(y, x);
NULL
> SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
0.6666666666666666
> SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
0.5
Since: 3.4.0
repeat
repeat(str, n) - Returns the string which repeats the given string value n times.
Examples:
sql
> SELECT repeat('123', 2);
123123
Since: 1.5.0
replace
replace(str, search[, replace]) - Replaces all occurrences of search with replace.
Arguments:
- str - a string expression
- search - a string expression. If search is not found in str, str is returned unchanged.
- replace - a string expression. If replace is not specified or is an empty string, nothing replaces the string that is removed from str.
Examples:
sql
> SELECT replace('ABCabc', 'abc', 'DEF');
ABCDEF
Since: 2.3.0
reverse
reverse(array) - Returns a reversed string or an array with reverse order of elements.
Examples:
sql
> SELECT reverse('Spark SQL');
LQS krapS
> SELECT reverse(array(2, 1, 4, 3));
[3,4,1,2]
Note:
Reverse logic for arrays is available since 2.4.0.
Since: 1.5.0
right
right(str, len) - Returns the rightmost len characters from the string str (len can be string type); if len is less than or equal to 0, the result is an empty string.
Examples:
sql
> SELECT right('Spark SQL', 3);
SQL
Since: 2.3.0
rint
rint(expr) - Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
Examples:
sql
> SELECT rint(12.3456);
12.0
Since: 1.4.0
rlike
rlike(str, regexp) - Returns true if str matches regexp, or false otherwise.
Arguments:
- str - a string expression
- regexp - a string expression. The regex string should be a Java regular expression. Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser; see the unescaping rules at String Literal. For example, to match "\abc", a regular expression for regexp can be "^\\abc$". There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match "\abc" is "^\abc$". It's recommended to use a raw string literal (with the r prefix) to avoid escaping special characters in the pattern string, if any.
Examples:
sql
> SET spark.sql.parser.escapedStringLiterals=true;
spark.sql.parser.escapedStringLiterals true
> SELECT rlike('%SystemDrive%\Users\John', '%SystemDrive%\\Users.*');
true
> SET spark.sql.parser.escapedStringLiterals=false;
spark.sql.parser.escapedStringLiterals false
> SELECT rlike('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*');
true
> SELECT rlike('%SystemDrive%\\Users\\John', r'%SystemDrive%\\Users.*');
true
Note:
Use LIKE to match with simple string pattern.
Since: 1.0.0
round
round(expr, d) - Returns expr rounded to d decimal places using HALF_UP rounding mode.
Examples:
sql
> SELECT round(2.5, 0);
3
Since: 1.5.0
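HALF_UP rounds .5 cases away from zero, which also applies to negative inputs (a quick check of the rounding mode):
sql
> SELECT round(-2.5, 0);
-3
> SELECT round(2.35, 1);
2.4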
row_number
row_number() - Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.
Examples:
sql
> SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 2
A1 2 3
A2 3 1
Since: 2.0.0
rpad
rpad(str, len[, pad]) - Returns str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. If pad is not specified, str will be padded to the right with space characters if it is a character string, and with zeros if it is a binary string.
Examples:
sql
> SELECT rpad('hi', 5, '??');
hi???
> SELECT rpad('hi', 1, '??');
h
> SELECT rpad('hi', 5);
hi
> SELECT hex(rpad(unhex('aabb'), 5));
AABB000000
> SELECT hex(rpad(unhex('aabb'), 5, unhex('1122')));
AABB112211
Since: 1.5.0
rtrim
rtrim(str) - Removes the trailing space characters from str.
Arguments:
- str - a string expression
- trimStr - the trim string characters to trim, the default value is a single space
Examples:
sql
> SELECT rtrim(' SparkSQL ');
SparkSQL
Since: 1.5.0