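1. Problem
Treat January 1 as the start of the first calendar week of each year.
(Year boundaries are ignored: with Monday as the first day of the week, if January 1 falls on a Wednesday, then January 1 through January 5 (Sunday) is week 1 of that year.)
Given a date, how do we compute which week of the year it falls in with Spark SQL?
2. Analysis
Difficulties:
- Spark SQL's DAYOFWEEK function treats Sunday as the first day of the week.
- Handling the boundary cases: which dates belong to week 1, and on which day week 2 starts.
The corresponding pseudocode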
java
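// Map Spark's DAYOFWEEK (Sunday = 1 ... Saturday = 7) to Monday = 1 ... Sunday = 7
int day_of_week(int day) {
    if (day == 1) {
        return 7;
    } else {
        return day - 1;
    }
}
dayofyear = DAYOFYEAR(your_date_column)
first_day_of_year_week_number = day_of_week(DAYOFWEEK(first_day_of_year))
if (dayofyear <= 8 - first_day_of_year_week_number) {
    return 1;
} else {
    return ceil((dayofyear - day_of_week(DAYOFWEEK(your_date_column))) / 7.0) + 1;
}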
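The pseudocode above can be turned into a small runnable sketch. This assumes plain Java with `java.time`; the class name `WeekNumberSketch` and the helper `sparkDayOfWeek` (which emulates Spark's Sunday = 1 numbering) are made up for illustration and are not part of Spark.

```java
import java.time.LocalDate;

class WeekNumberSketch {
    // Map Spark's DAYOFWEEK (Sunday = 1 ... Saturday = 7) to Monday = 1 ... Sunday = 7.
    static int dayOfWeek(int sparkDayOfWeek) {
        return sparkDayOfWeek == 1 ? 7 : sparkDayOfWeek - 1;
    }

    // Hypothetical helper: emulate Spark's DAYOFWEEK from java.time (Monday = 1 ... Sunday = 7).
    static int sparkDayOfWeek(LocalDate date) {
        return date.getDayOfWeek().getValue() % 7 + 1;
    }

    // Week of the year, where week 1 always starts on January 1 and later weeks start on Monday.
    static int weekNumber(LocalDate date) {
        int dayOfYear = date.getDayOfYear();
        int firstDayWeekday = dayOfWeek(sparkDayOfWeek(date.withDayOfYear(1)));
        if (dayOfYear <= 8 - firstDayWeekday) {
            return 1;
        }
        int weekday = dayOfWeek(sparkDayOfWeek(date));
        return (int) Math.ceil((dayOfYear - weekday) / 7.0) + 1;
    }

    public static void main(String[] args) {
        System.out.println(weekNumber(LocalDate.of(2023, 1, 1)));   // Sunday, alone in week 1
        System.out.println(weekNumber(LocalDate.of(2023, 1, 2)));   // Monday, week 2 begins
        System.out.println(weekNumber(LocalDate.of(2023, 12, 31))); // last day of 2023
    }
}
```

The three dates in `main` are the same boundary dates used in the verification section of this post.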
First, the key SQL logic
sql
CASE
WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
ELSE DAYOFWEEK(your_date_column) - 1
END AS day_of_week,
CASE
WHEN DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
ELSE DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
END AS first_day_of_year_week_number,
to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd') as first_day_of_year,
-- The expressions above belong to the inner query; the CASE below belongs to the outer query
CASE
WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
ELSE CEIL( (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ) + 1
END AS week_number,
Test this against a range of boundary values.
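One way to cover many boundary values at once is a brute-force comparison outside Spark. The sketch below is an assumption-laden illustration, not the author's test: it re-implements the formula in Java and compares it, day by day, with `WeekFields.of(DayOfWeek.MONDAY, 1).weekOfYear()` from `java.time`, which numbers Monday-first weeks so that week 1 is the week containing January 1. The class and method names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.WeekFields;

class WeekNumberBoundaryCheck {
    // Closed form from the SQL, using java.time's Monday = 1 ... Sunday = 7 numbering directly.
    static int weekNumber(LocalDate date) {
        int dayOfYear = date.getDayOfYear();
        int firstDayWeekday = date.withDayOfYear(1).getDayOfWeek().getValue();
        if (dayOfYear <= 8 - firstDayWeekday) {
            return 1;
        }
        int weekday = date.getDayOfWeek().getValue();
        return (int) Math.ceil((dayOfYear - weekday) / 7.0) + 1;
    }

    // Independent reference: Monday-first weeks, week 1 contains January 1.
    static int referenceWeekNumber(LocalDate date) {
        return date.get(WeekFields.of(DayOfWeek.MONDAY, 1).weekOfYear());
    }

    // Count the dates from January 1 of fromYear to December 31 of toYear where the two disagree.
    static int countMismatches(int fromYear, int toYear) {
        int mismatches = 0;
        LocalDate end = LocalDate.of(toYear, 12, 31);
        for (LocalDate d = LocalDate.of(fromYear, 1, 1); !d.isAfter(end); d = d.plusDays(1)) {
            if (weekNumber(d) != referenceWeekNumber(d)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(2000, 2030));
    }
}
```

A 31-year range includes years starting on every weekday, both leap and non-leap, so each first-week shape is exercised several times.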
DAYOFWEEK(your_date_column) returns the following values:
Sun Mon Tue Wed Thu Fri Sat
1   2   3   4   5   6   7
To make Monday the first day of the week, the values need to be shifted
java
// Sunday (1) becomes 7; every other day moves down by one
int day_of_week(int day) {
    if (day == 1) {
        return 7;
    } else {
        return day - 1;
    }
}
After the adjustment, the function returns
Mon Tue Wed Thu Fri Sat Sun
1   2   3   4   5   6   7
The SQL logic
sql
CASE
WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
ELSE DAYOFWEEK(your_date_column) - 1
END AS day_of_week,
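As a quick sanity check on the mapping, the CASE expression can be compared with a branch-free modulo form. The modulo variant is an alternative introduced here for illustration, not something from the original query; the class and method names are hypothetical.

```java
class DayOfWeekMapping {
    // CASE-style mapping from the SQL: Sunday (1) becomes 7, every other day shifts down by one.
    static int viaCase(int sparkDayOfWeek) {
        return sparkDayOfWeek == 1 ? 7 : sparkDayOfWeek - 1;
    }

    // Branch-free alternative (an assumption, not in the original): shift by 5 modulo 7.
    static int viaModulo(int sparkDayOfWeek) {
        return (sparkDayOfWeek + 5) % 7 + 1;
    }

    public static void main(String[] args) {
        // Print both mappings side by side for Spark's values 1 (Sunday) through 7 (Saturday).
        for (int d = 1; d <= 7; d++) {
            System.out.println(d + " -> " + viaCase(d) + " / " + viaModulo(d));
        }
    }
}
```

If the two agree for all seven inputs, the same arithmetic, `(DAYOFWEEK(your_date_column) + 5) % 7 + 1`, could stand in for the CASE expression; whether that reads better than the CASE is a matter of taste.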
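2023-01-01 is a Sunday,
so DAYOFWEEK(your_date_column) returns 1, the first day of the week in Spark's numbering.
WEEKOFYEAR(your_date_column) returns 52, the last week of 2022, because WEEKOFYEAR starts weeks on Monday and counts week 1 as the first week with more than 3 days in the year.
But the result we want is week 1 of 2023.
2023-01-02 is a Monday,
so DAYOFWEEK(your_date_column) returns 2, the second day of the week in Spark's numbering.
WEEKOFYEAR(your_date_column) returns 1, the first week of 2023.
But the result we want is week 2 of 2023.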
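The gap between the two conventions can be reproduced outside Spark. The sketch below assumes that `IsoFields.WEEK_OF_WEEK_BASED_YEAR` in `java.time` follows the same ISO 8601 rule as WEEKOFYEAR, and uses `WeekFields.of(DayOfWeek.MONDAY, 1)` as a stand-in for the numbering wanted here; the class and method names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.time.temporal.WeekFields;

class IsoWeekComparison {
    // ISO 8601 week number: weeks start on Monday, week 1 is the first week with at least 4 days.
    static int isoWeek(LocalDate date) {
        return date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    // Numbering wanted in this post: Monday-first weeks, week 1 always contains January 1.
    static int wantedWeek(LocalDate date) {
        return date.get(WeekFields.of(DayOfWeek.MONDAY, 1).weekOfYear());
    }

    public static void main(String[] args) {
        LocalDate sunday = LocalDate.of(2023, 1, 1);
        LocalDate monday = LocalDate.of(2023, 1, 2);
        System.out.println(sunday + " iso=" + isoWeek(sunday) + " wanted=" + wantedWeek(sunday));
        System.out.println(monday + " iso=" + isoWeek(monday) + " wanted=" + wantedWeek(monday));
    }
}
```

For these two dates the ISO numbers are 52 and 1 while the wanted numbers are 1 and 2, which is exactly the mismatch described above.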
3. Verification
sql
DROP TABLE IF EXISTS your_table;
CREATE TABLE your_table (
id INT,
your_date_column DATE
);
CREATE OR REPLACE TEMPORARY VIEW temp_view AS
SELECT 1 as id, to_date('2023-01-01', 'yyyy-MM-dd') as your_date_column
UNION ALL SELECT 2, to_date('2023-01-02', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-03', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-04', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-05', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-06', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-07', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-08', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-09', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-10', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-11', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-12', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-13', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-14', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-15', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-16', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-17', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-18', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-19', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-20', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-21', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-22', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-23', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-24', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-25', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-26', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-27', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-28', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-29', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-30', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-31', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-01', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-02', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-03', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-04', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-05', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-06', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-07', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-08', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-09', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-15', 'yyyy-MM-dd')
UNION ALL SELECT 4, to_date('2023-12-31', 'yyyy-MM-dd')
UNION ALL SELECT 5, to_date('2024-01-01', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-02', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-03', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-04', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-05', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-06', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-07', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-08', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-09', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-10', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-11', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-12', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-13', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-14', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-15', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-16', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-17', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-18', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-19', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-20', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-21', 'yyyy-MM-dd')
;
INSERT INTO your_table
SELECT * FROM temp_view;
SELECT
your_date_column,
DAYOFYEAR(your_date_column),
8 - first_day_of_year_week_number,
(DAYOFYEAR(your_date_column) - day_of_week ),
(DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ,
CEIL( (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ),
CEIL( (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ) + 1,
CASE
WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
ELSE CEIL( (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ) + 1
END AS week_number, -- the result we want
*
FROM (
SELECT
'|',
your_date_column,
DAYOFWEEK(your_date_column),
DAYOFYEAR(your_date_column),
CASE
WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
ELSE DAYOFWEEK(your_date_column) - 1
END AS day_of_week,
CASE
WHEN DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
ELSE DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
END AS first_day_of_year_week_number, -- weekday of January 1: 1 for Monday, 7 for Sunday
to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd') as first_day_of_year, -- date of the first day of the year
date_format(your_date_column, 'EEEE') as WEEK
FROM
your_table
);
Output. The columns before the | come from the outer SELECT list, in order: your_date_column, DAYOFYEAR, 8 - first_day_of_year_week_number, DAYOFYEAR - day_of_week, that difference divided by 7.0, its CEIL, CEIL + 1, week_number. The columns after the | come from the inner query: your_date_column, DAYOFWEEK, DAYOFYEAR, day_of_week, first_day_of_year_week_number, first_day_of_year, WEEK.
2023-01-01 1 1 -6 -0.857143 0 1 1 | 2023-01-01 1 1 7 7 2023-01-01 Sunday
2023-01-02 2 1 1 0.142857 1 2 2 | 2023-01-02 2 2 1 7 2023-01-01 Monday
2023-01-03 3 1 1 0.142857 1 2 2 | 2023-01-03 3 3 2 7 2023-01-01 Tuesday
2023-01-04 4 1 1 0.142857 1 2 2 | 2023-01-04 4 4 3 7 2023-01-01 Wednesday
2023-01-05 5 1 1 0.142857 1 2 2 | 2023-01-05 5 5 4 7 2023-01-01 Thursday
2023-01-06 6 1 1 0.142857 1 2 2 | 2023-01-06 6 6 5 7 2023-01-01 Friday
2023-01-07 7 1 1 0.142857 1 2 2 | 2023-01-07 7 7 6 7 2023-01-01 Saturday
2023-01-08 8 1 1 0.142857 1 2 2 | 2023-01-08 1 8 7 7 2023-01-01 Sunday
2023-01-09 9 1 8 1.142857 2 3 3 | 2023-01-09 2 9 1 7 2023-01-01 Monday
2023-01-10 10 1 8 1.142857 2 3 3 | 2023-01-10 3 10 2 7 2023-01-01 Tuesday
2023-01-11 11 1 8 1.142857 2 3 3 | 2023-01-11 4 11 3 7 2023-01-01 Wednesday
2023-01-12 12 1 8 1.142857 2 3 3 | 2023-01-12 5 12 4 7 2023-01-01 Thursday
2023-01-13 13 1 8 1.142857 2 3 3 | 2023-01-13 6 13 5 7 2023-01-01 Friday
2023-01-14 14 1 8 1.142857 2 3 3 | 2023-01-14 7 14 6 7 2023-01-01 Saturday
2023-01-15 15 1 8 1.142857 2 3 3 | 2023-01-15 1 15 7 7 2023-01-01 Sunday
2023-01-16 16 1 15 2.142857 3 4 4 | 2023-01-16 2 16 1 7 2023-01-01 Monday
2023-01-17 17 1 15 2.142857 3 4 4 | 2023-01-17 3 17 2 7 2023-01-01 Tuesday
2023-01-18 18 1 15 2.142857 3 4 4 | 2023-01-18 4 18 3 7 2023-01-01 Wednesday
2023-01-19 19 1 15 2.142857 3 4 4 | 2023-01-19 5 19 4 7 2023-01-01 Thursday
2023-01-20 20 1 15 2.142857 3 4 4 | 2023-01-20 6 20 5 7 2023-01-01 Friday
2023-01-21 21 1 15 2.142857 3 4 4 | 2023-01-21 7 21 6 7 2023-01-01 Saturday
2023-01-22 22 1 15 2.142857 3 4 4 | 2023-01-22 1 22 7 7 2023-01-01 Sunday
2023-01-23 23 1 22 3.142857 4 5 5 | 2023-01-23 2 23 1 7 2023-01-01 Monday
2023-01-24 24 1 22 3.142857 4 5 5 | 2023-01-24 3 24 2 7 2023-01-01 Tuesday
2023-01-25 25 1 22 3.142857 4 5 5 | 2023-01-25 4 25 3 7 2023-01-01 Wednesday
2023-01-26 26 1 22 3.142857 4 5 5 | 2023-01-26 5 26 4 7 2023-01-01 Thursday
2023-01-27 27 1 22 3.142857 4 5 5 | 2023-01-27 6 27 5 7 2023-01-01 Friday
2023-01-28 28 1 22 3.142857 4 5 5 | 2023-01-28 7 28 6 7 2023-01-01 Saturday
2023-01-29 29 1 22 3.142857 4 5 5 | 2023-01-29 1 29 7 7 2023-01-01 Sunday
2023-01-30 30 1 29 4.142857 5 6 6 | 2023-01-30 2 30 1 7 2023-01-01 Monday
2023-01-31 31 1 29 4.142857 5 6 6 | 2023-01-31 3 31 2 7 2023-01-01 Tuesday
2023-02-01 32 1 29 4.142857 5 6 6 | 2023-02-01 4 32 3 7 2023-01-01 Wednesday
2023-02-02 33 1 29 4.142857 5 6 6 | 2023-02-02 5 33 4 7 2023-01-01 Thursday
2023-02-03 34 1 29 4.142857 5 6 6 | 2023-02-03 6 34 5 7 2023-01-01 Friday
2023-02-04 35 1 29 4.142857 5 6 6 | 2023-02-04 7 35 6 7 2023-01-01 Saturday
2023-02-05 36 1 29 4.142857 5 6 6 | 2023-02-05 1 36 7 7 2023-01-01 Sunday
2023-02-06 37 1 36 5.142857 6 7 7 | 2023-02-06 2 37 1 7 2023-01-01 Monday
2023-02-07 38 1 36 5.142857 6 7 7 | 2023-02-07 3 38 2 7 2023-01-01 Tuesday
2023-02-08 39 1 36 5.142857 6 7 7 | 2023-02-08 4 39 3 7 2023-01-01 Wednesday
2023-02-09 40 1 36 5.142857 6 7 7 | 2023-02-09 5 40 4 7 2023-01-01 Thursday
2023-02-15 46 1 43 6.142857 7 8 8 | 2023-02-15 4 46 3 7 2023-01-01 Wednesday
2023-12-31 365 1 358 51.142857 52 53 53 | 2023-12-31 1 365 7 7 2023-01-01 Sunday
2024-01-01 1 7 0 0.000000 0 1 1 | 2024-01-01 2 1 1 1 2024-01-01 Monday
2024-01-02 2 7 0 0.000000 0 1 1 | 2024-01-02 3 2 2 1 2024-01-01 Tuesday
2024-01-03 3 7 0 0.000000 0 1 1 | 2024-01-03 4 3 3 1 2024-01-01 Wednesday
2024-01-04 4 7 0 0.000000 0 1 1 | 2024-01-04 5 4 4 1 2024-01-01 Thursday
2024-01-05 5 7 0 0.000000 0 1 1 | 2024-01-05 6 5 5 1 2024-01-01 Friday
2024-01-06 6 7 0 0.000000 0 1 1 | 2024-01-06 7 6 6 1 2024-01-01 Saturday
2024-01-07 7 7 0 0.000000 0 1 1 | 2024-01-07 1 7 7 1 2024-01-01 Sunday
2024-01-08 8 7 7 1.000000 1 2 2 | 2024-01-08 2 8 1 1 2024-01-01 Monday
2024-01-09 9 7 7 1.000000 1 2 2 | 2024-01-09 3 9 2 1 2024-01-01 Tuesday
2024-01-10 10 7 7 1.000000 1 2 2 | 2024-01-10 4 10 3 1 2024-01-01 Wednesday
2024-01-11 11 7 7 1.000000 1 2 2 | 2024-01-11 5 11 4 1 2024-01-01 Thursday
2024-01-12 12 7 7 1.000000 1 2 2 | 2024-01-12 6 12 5 1 2024-01-01 Friday
2024-01-13 13 7 7 1.000000 1 2 2 | 2024-01-13 7 13 6 1 2024-01-01 Saturday
2024-01-14 14 7 7 1.000000 1 2 2 | 2024-01-14 1 14 7 1 2024-01-01 Sunday
2024-01-15 15 7 14 2.000000 2 3 3 | 2024-01-15 2 15 1 1 2024-01-01 Monday
2024-01-16 16 7 14 2.000000 2 3 3 | 2024-01-16 3 16 2 1 2024-01-01 Tuesday
2024-01-17 17 7 14 2.000000 2 3 3 | 2024-01-17 4 17 3 1 2024-01-01 Wednesday
2024-01-18 18 7 14 2.000000 2 3 3 | 2024-01-18 5 18 4 1 2024-01-01 Thursday
2024-01-19 19 7 14 2.000000 2 3 3 | 2024-01-19 6 19 5 1 2024-01-01 Friday
2024-01-20 20 7 14 2.000000 2 3 3 | 2024-01-20 7 20 6 1 2024-01-01 Saturday
2024-01-21 21 7 14 2.000000 2 3 3 | 2024-01-21 1 21 7 1 2024-01-01 Sunday
Time taken: 8.512 seconds, Fetched 63 row(s)
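In this query:
date_format with the pattern 'EEEE' returns the full weekday name (Monday, Tuesday, and so on).
DAYOFYEAR(your_date_column) returns the day of the year.
DAYOFWEEK(your_date_column) returns the day of the week (with Sunday as the first day of the week).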
sql
-- Compute the result directly: the cleaned-up SQL
SELECT
your_date_column,
CASE
WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
ELSE CEIL( (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ) + 1
END AS week_number
FROM (
SELECT
your_date_column,
CASE
WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
ELSE DAYOFWEEK(your_date_column) - 1
END AS day_of_week,
CASE
WHEN DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
ELSE DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
END AS first_day_of_year_week_number,
to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd') as first_day_of_year,
date_format(your_date_column, 'EEEE') as WEEK
FROM
your_table
);
2023-01-01 1
2023-01-02 2
2023-01-03 2
2023-01-04 2
2023-01-05 2
2023-01-06 2
2023-01-07 2
2023-01-08 2
2023-01-09 3
2023-01-10 3
2023-01-11 3
2023-01-12 3
2023-01-13 3
2023-01-14 3
2023-01-15 3
2023-01-16 4
2023-01-17 4
2023-01-18 4
2023-01-19 4
2023-01-20 4
2023-01-21 4
2023-01-22 4
2023-01-23 5
2023-01-24 5
2023-01-25 5
2023-01-26 5
2023-01-27 5
2023-01-28 5
2023-01-29 5
2023-01-30 6
2023-01-31 6
2023-02-01 6
2023-02-02 6
2023-02-03 6
2023-02-04 6
2023-02-05 6
2023-02-06 7
2023-02-07 7
2023-02-08 7
2023-02-09 7
2023-02-15 8
2023-12-31 53
2024-01-01 1
2024-01-02 1
2024-01-03 1
2024-01-04 1
2024-01-05 1
2024-01-06 1
2024-01-07 1
2024-01-08 2
2024-01-09 2
2024-01-10 2
2024-01-11 2
2024-01-12 2
2024-01-13 2
2024-01-14 2
2024-01-15 3
2024-01-16 3
2024-01-17 3
2024-01-18 3
2024-01-19 3
2024-01-20 3
2024-01-21 3
Time taken: 0.493 seconds, Fetched 63 row(s)
23/11/14 14:27:07 INFO SparkSQLCLIDriver: Time taken: 0.493 seconds, Fetched 63 row(s)
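One observation from the debug output in the verification section: the CEIL(...) + 1 column already equals week_number on every row, including the week-1 rows. That suggests the first CASE branch may be redundant, since for week-1 dates DAYOFYEAR minus day_of_week lies between -6 and 0 and the ceiling is 0. The sketch below checks this in Java under that assumption; it is an exploratory check with hypothetical names, not a change to the query above.

```java
import java.time.LocalDate;

class SingleExpressionCheck {
    // The formula as written in the SQL, with the explicit week-1 branch.
    static int withCase(LocalDate date) {
        int dayOfYear = date.getDayOfYear();
        int firstDayWeekday = date.withDayOfYear(1).getDayOfWeek().getValue();
        if (dayOfYear <= 8 - firstDayWeekday) {
            return 1;
        }
        return withoutCase(date);
    }

    // Only the ELSE branch: ceil((day of year - Monday-based weekday) / 7.0) + 1.
    static int withoutCase(LocalDate date) {
        int weekday = date.getDayOfWeek().getValue(); // Monday = 1 ... Sunday = 7
        return (int) Math.ceil((date.getDayOfYear() - weekday) / 7.0) + 1;
    }

    // True if both variants agree on every date from fromYear-01-01 to toYear-12-31.
    static boolean agreeForYears(int fromYear, int toYear) {
        LocalDate end = LocalDate.of(toYear, 12, 31);
        for (LocalDate d = LocalDate.of(fromYear, 1, 1); !d.isAfter(end); d = d.plusDays(1)) {
            if (withCase(d) != withoutCase(d)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("variants agree: " + agreeForYears(2000, 2030));
    }
}
```

Keeping the CASE in the SQL is still a reasonable choice, since it states the week-1 rule explicitly for readers.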