SUBSTRING_INDEX is a handy string function in Spark SQL that returns the part of a string before or after a given occurrence of a delimiter.
Function syntax
```sql
SUBSTRING_INDEX(str, delim, count)
```
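Parameters
str: the original string to process
delim: the delimiter (a single character or a multi-character string)
count: which occurrence of the delimiter to cut at:
Positive: counts from the left and returns everything before the count-th delimiter
Negative: counts from the right and returns everything after the |count|-th delimiter
Zero: returns an empty string
If |count| is greater than the number of times delim occurs in str, or if delim does not occur in str at all, the whole string is returned unchanged.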
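The rules above can be checked with a few literal calls. This is a minimal sketch: the input strings `'a,b,c'` and `'a::b::c'` and all column aliases are made up for illustration, and the comments show the value each expression should return under the rules just described.

```sql
SELECT
  SUBSTRING_INDEX('a,b,c', ',', 2)    AS first_two,      -- 'a,b'   (before the 2nd ',')
  SUBSTRING_INDEX('a,b,c', ',', -2)   AS last_two,       -- 'b,c'   (after the 2nd ',' from the right)
  SUBSTRING_INDEX('a,b,c', ',', 0)    AS zero_count,     -- ''      (count = 0 gives an empty string)
  SUBSTRING_INDEX('a,b,c', ',', 10)   AS count_too_big,  -- 'a,b,c' (only 2 delimiters exist)
  SUBSTRING_INDEX('a,b,c', ';', 1)    AS delim_missing,  -- 'a,b,c' (';' never occurs)
  SUBSTRING_INDEX('a::b::c', '::', 1) AS multi_char;     -- 'a'     (multi-character delimiter)
```

Note that an out-of-range count and a missing delimiter are indistinguishable from the result alone, since both return the input unchanged.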
Examples
Example 1
```sql
SELECT
  url,
  SUBSTRING_INDEX(url, '.', 1) AS domain,
  SUBSTRING_INDEX(url, '.', 2) AS domain1,
  SUBSTRING_INDEX(SUBSTRING_INDEX(url, '.', -2), '.', 1) AS name,
  SUBSTRING_INDEX(url, '.', -1) AS tld
FROM (
  SELECT 'www.mysql.com' AS url
) t;
```
Result:
| url | domain | domain1 | name | tld |
| --- | --- | --- | --- | --- |
| www.mysql.com | www | www.mysql | mysql | com |
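The nested call used for `name` suggests a more general pattern for picking the n-th field of a delimited string: take everything up to the n-th delimiter, then keep only the last piece of that. The following is a sketch of that idea, not something from the original example; the aliases `second_field` and `out_of_range` are hypothetical.

```sql
SELECT
  -- inner call yields 'www.mysql'; outer call keeps the part after the last '.'
  SUBSTRING_INDEX(SUBSTRING_INDEX(url, '.', 2), '.', -1) AS second_field,  -- 'mysql'
  -- n = 5 exceeds the 3 available fields: the inner call returns the whole
  -- string, so the outer call silently yields the last field instead of ''
  SUBSTRING_INDEX(SUBSTRING_INDEX(url, '.', 5), '.', -1) AS out_of_range   -- 'com'
FROM (
  SELECT 'www.mysql.com' AS url
) t;
```

Because an out-of-range n falls back to the last field rather than an empty value, this pattern is probably best used only when the number of fields is known in advance.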
Example 2
```sql
SELECT
  path,
  SUBSTRING_INDEX(path, '/', -1) AS filename,
  SUBSTRING_INDEX(path, '/', 2) AS directory,
  SUBSTRING_INDEX(path, '/', 5) AS directory1,
  SUBSTRING_INDEX(path, '*', 2) AS directory2
FROM (
  SELECT '/home/user/documents/file.txt' AS path
) t;
```
Result:
| path | filename | directory | directory1 | directory2 |
| --- | --- | --- | --- | --- |
| /home/user/documents/file.txt | file.txt | /home | /home/user/documents/file.txt | /home/user/documents/file.txt |