Spark SQL: treating January 1 as the start of the year's first natural week, compute which week of the year a given date falls in

1. Problem

January 1 of each year starts that year's first natural week.

(Cross-year spillover is ignored. Taking Monday as the first day of a week: if January 1 falls on a Wednesday, then January 1 through January 5, a Sunday, form the year's first natural week.)

Given a date, how do we write Spark SQL that computes which week of the year it falls in?

2. Analysis

Difficulties:

  1. Spark SQL's DAYOFWEEK function numbers days with Sunday as the first day of the week (Sunday = 1).
  2. Boundary handling: how to identify the first week, and on which day the second week starts.

The corresponding pseudocode:

Java pseudocode:
// Convert Spark's DAYOFWEEK value (Sunday = 1, ..., Saturday = 7)
// to a Monday-first value (Monday = 1, ..., Sunday = 7).
int day_of_week(int day) {
    if (day == 1) {
        return 7;
    } else {
        return day - 1;
    }
}

// dayofyear = DAYOFYEAR(your_date_column)
// dow       = day_of_week(DAYOFWEEK(your_date_column))
// first_dow = day_of_week(DAYOFWEEK(January 1 of the same year))
// (dow and first_dow correspond to day_of_week and first_day_of_year_week_number in the SQL below)
if (dayofyear <= 8 - first_dow) {
    return 1;                                   // still inside the first natural week
} else {
    return (int) Math.ceil((dayofyear - dow) / 7.0) + 1;
}
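
As a quick sanity check of the pseudocode (a one-off calculation with hard-coded numbers, not part of the final query): take 2023-01-09, a Monday. 2023-01-01 is a Sunday, so first_dow = 7 and dow = 1; the first branch does not apply and the result should be week 3, which matches the verification output further below.

SQL:
-- 2023-01-09: dayofyear = 9, dow = 1 (Monday), first_dow = 7 (2023-01-01 is a Sunday)
-- 9 <= 8 - 7 is false, so week = CEIL((9 - 1) / 7.0) + 1 = 2 + 1 = 3
SELECT CASE
           WHEN 9 <= 8 - 7 THEN 1
           ELSE CAST(CEIL((9 - 1) / 7.0) + 1 AS INT)
       END AS week_number;   -- expected: 3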

Here is the key SQL logic:

SQL:
CASE 
    WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
    ELSE DAYOFWEEK(your_date_column) - 1
END AS day_of_week,

CASE 
    WHEN DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
    ELSE DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
END AS first_day_of_year_week_number,


to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd') as first_day_of_year,



-- the expressions above go in the inner query; the expression below sits in the outer query and references their aliases

CASE 
    WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
    ELSE CEIL(  (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ) + 1
END AS week_number,

Test against a good number of boundary values.

DAYOFWEEK(your_date_column) returns:

Sunday    Monday    Tuesday    Wednesday    Thursday    Friday    Saturday
1         2         3          4            5           6         7
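
A quick way to confirm this mapping in a spark-sql shell (the sample dates are arbitrary and simply cover a Sunday, a Monday and a Saturday; Spark 3.x DATE literals are assumed, to_date() works just as well):

SQL:
-- 2023-01-01 is a Sunday, 2023-01-02 a Monday, 2023-01-07 a Saturday
SELECT DAYOFWEEK(DATE '2023-01-01') AS sun,   -- 1
       DAYOFWEEK(DATE '2023-01-02') AS mon,   -- 2
       DAYOFWEEK(DATE '2023-01-07') AS sat;   -- 7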

To make Monday the first day of the week, the offset has to be adjusted:

Java pseudocode:
// Map DAYOFWEEK's Sunday-first value to a Monday-first value (Monday = 1, ..., Sunday = 7)
int day_of_week(int day) {
    if (day == 1) {
        return 7;
    } else {
        return day - 1;
    }
}

After the adjustment, the mapping becomes:

Monday    Tuesday    Wednesday    Thursday    Friday    Saturday    Sunday
1         2          3            4           5         6           7

The SQL logic:
 CASE 
     WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
     ELSE DAYOFWEEK(your_date_column) - 1
 END AS day_of_week,

2023-01-01 is a Sunday:

DAYOFWEEK(your_date_column) returns 1, i.e. the first day of a Sunday-first week.

WEEKOFYEAR(your_date_column) returns 52, i.e. the last ISO week of 2022.
But the result we actually want is week 1 of 2023.

2023-01-02 is a Monday:

DAYOFWEEK(your_date_column) returns 2, i.e. the second day of a Sunday-first week.

WEEKOFYEAR(your_date_column) returns 1, i.e. ISO week 1 of 2023.
But the result we actually want is week 2 of 2023.
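
This mismatch is easy to reproduce; WEEKOFYEAR follows the ISO-8601 week definition, which is exactly why it cannot be used for the natural-week rule here (a minimal check, again assuming DATE literals):

SQL:
SELECT WEEKOFYEAR(DATE '2023-01-01') AS w1,   -- 52: last ISO week of 2022
       WEEKOFYEAR(DATE '2023-01-02') AS w2;   -- 1:  ISO week 1 of 2023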


3. Verification

SQL:
DROP TABLE IF EXISTS your_table;

CREATE TABLE your_table (
    id INT,
    your_date_column DATE
);


CREATE OR REPLACE TEMPORARY VIEW temp_view AS 
SELECT 1 as id, to_date('2023-01-01', 'yyyy-MM-dd') as your_date_column
UNION ALL SELECT 2, to_date('2023-01-02', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-03', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-04', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-05', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-06', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-07', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-08', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-09', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-10', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-11', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-12', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-13', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-14', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-15', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-16', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-17', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-18', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-19', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-20', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-21', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-22', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-23', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-24', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-25', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-26', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-27', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-28', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-29', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-30', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-31', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-01', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-02', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-03', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-04', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-05', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-06', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-07', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-08', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-09', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-15', 'yyyy-MM-dd')
UNION ALL SELECT 4, to_date('2023-12-31', 'yyyy-MM-dd')
UNION ALL SELECT 5, to_date('2024-01-01', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-02', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-03', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-04', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-05', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-06', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-07', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-08', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-09', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-10', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-11', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-12', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-13', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-14', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-15', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-16', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-17', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-18', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-19', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-20', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-21', 'yyyy-MM-dd')
;



INSERT INTO your_table
SELECT * FROM temp_view;





SELECT 
    your_date_column,
    DAYOFYEAR(your_date_column),
    8 - first_day_of_year_week_number,
    (DAYOFYEAR(your_date_column) - day_of_week ),
    (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ,
    CEIL(  (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ),
    CEIL(  (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ) + 1,
    CASE 
        WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
        ELSE CEIL(  (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ) + 1
    END AS week_number, -- the week number we are after
    *
FROM (
    SELECT
    '|',
    your_date_column,
    DAYOFWEEK(your_date_column),
    DAYOFYEAR(your_date_column),
    CASE 
        WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
        ELSE DAYOFWEEK(your_date_column) - 1
    END AS day_of_week,

    CASE 
        WHEN DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
        ELSE DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
    END AS first_day_of_year_week_number, -- day of week of January 1 (Monday = 1, Sunday = 7)


    to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd') as first_day_of_year, -- the date of January 1 of that year
    date_format(your_date_column, 'EEEE') as WEEK
    
FROM
    your_table
);
2023-01-01	1	1	-6	-0.857143	0	1	1	|	2023-01-01	1	1	7	7	2023-01-01	Sunday
2023-01-02	2	1	1	0.142857	1	2	2	|	2023-01-02	2	2	1	7	2023-01-01	Monday
2023-01-03	3	1	1	0.142857	1	2	2	|	2023-01-03	3	3	2	7	2023-01-01	Tuesday
2023-01-04	4	1	1	0.142857	1	2	2	|	2023-01-04	4	4	3	7	2023-01-01	Wednesday
2023-01-05	5	1	1	0.142857	1	2	2	|	2023-01-05	5	5	4	7	2023-01-01	Thursday
2023-01-06	6	1	1	0.142857	1	2	2	|	2023-01-06	6	6	5	7	2023-01-01	Friday
2023-01-07	7	1	1	0.142857	1	2	2	|	2023-01-07	7	7	6	7	2023-01-01	Saturday
2023-01-08	8	1	1	0.142857	1	2	2	|	2023-01-08	1	8	7	7	2023-01-01	Sunday
2023-01-09	9	1	8	1.142857	2	3	3	|	2023-01-09	2	9	1	7	2023-01-01	Monday
2023-01-10	10	1	8	1.142857	2	3	3	|	2023-01-10	3	10	2	7	2023-01-01	Tuesday
2023-01-11	11	1	8	1.142857	2	3	3	|	2023-01-11	4	11	3	7	2023-01-01	Wednesday
2023-01-12	12	1	8	1.142857	2	3	3	|	2023-01-12	5	12	4	7	2023-01-01	Thursday
2023-01-13	13	1	8	1.142857	2	3	3	|	2023-01-13	6	13	5	7	2023-01-01	Friday
2023-01-14	14	1	8	1.142857	2	3	3	|	2023-01-14	7	14	6	7	2023-01-01	Saturday
2023-01-15	15	1	8	1.142857	2	3	3	|	2023-01-15	1	15	7	7	2023-01-01	Sunday
2023-01-16	16	1	15	2.142857	3	4	4	|	2023-01-16	2	16	1	7	2023-01-01	Monday
2023-01-17	17	1	15	2.142857	3	4	4	|	2023-01-17	3	17	2	7	2023-01-01	Tuesday
2023-01-18	18	1	15	2.142857	3	4	4	|	2023-01-18	4	18	3	7	2023-01-01	Wednesday
2023-01-19	19	1	15	2.142857	3	4	4	|	2023-01-19	5	19	4	7	2023-01-01	Thursday
2023-01-20	20	1	15	2.142857	3	4	4	|	2023-01-20	6	20	5	7	2023-01-01	Friday
2023-01-21	21	1	15	2.142857	3	4	4	|	2023-01-21	7	21	6	7	2023-01-01	Saturday
2023-01-22	22	1	15	2.142857	3	4	4	|	2023-01-22	1	22	7	7	2023-01-01	Sunday
2023-01-23	23	1	22	3.142857	4	5	5	|	2023-01-23	2	23	1	7	2023-01-01	Monday
2023-01-24	24	1	22	3.142857	4	5	5	|	2023-01-24	3	24	2	7	2023-01-01	Tuesday
2023-01-25	25	1	22	3.142857	4	5	5	|	2023-01-25	4	25	3	7	2023-01-01	Wednesday
2023-01-26	26	1	22	3.142857	4	5	5	|	2023-01-26	5	26	4	7	2023-01-01	Thursday
2023-01-27	27	1	22	3.142857	4	5	5	|	2023-01-27	6	27	5	7	2023-01-01	Friday
2023-01-28	28	1	22	3.142857	4	5	5	|	2023-01-28	7	28	6	7	2023-01-01	Saturday
2023-01-29	29	1	22	3.142857	4	5	5	|	2023-01-29	1	29	7	7	2023-01-01	Sunday
2023-01-30	30	1	29	4.142857	5	6	6	|	2023-01-30	2	30	1	7	2023-01-01	Monday
2023-01-31	31	1	29	4.142857	5	6	6	|	2023-01-31	3	31	2	7	2023-01-01	Tuesday
2023-02-01	32	1	29	4.142857	5	6	6	|	2023-02-01	4	32	3	7	2023-01-01	Wednesday
2023-02-02	33	1	29	4.142857	5	6	6	|	2023-02-02	5	33	4	7	2023-01-01	Thursday
2023-02-03	34	1	29	4.142857	5	6	6	|	2023-02-03	6	34	5	7	2023-01-01	Friday
2023-02-04	35	1	29	4.142857	5	6	6	|	2023-02-04	7	35	6	7	2023-01-01	Saturday
2023-02-05	36	1	29	4.142857	5	6	6	|	2023-02-05	1	36	7	7	2023-01-01	Sunday
2023-02-06	37	1	36	5.142857	6	7	7	|	2023-02-06	2	37	1	7	2023-01-01	Monday
2023-02-07	38	1	36	5.142857	6	7	7	|	2023-02-07	3	38	2	7	2023-01-01	Tuesday
2023-02-08	39	1	36	5.142857	6	7	7	|	2023-02-08	4	39	3	7	2023-01-01	Wednesday
2023-02-09	40	1	36	5.142857	6	7	7	|	2023-02-09	5	40	4	7	2023-01-01	Thursday
2023-02-15	46	1	43	6.142857	7	8	8	|	2023-02-15	4	46	3	7	2023-01-01	Wednesday
2023-12-31	365	1	358	51.142857	52	53	53	|	2023-12-31	1	365	7	7	2023-01-01	Sunday
2024-01-01	1	7	0	0.000000	0	1	1	|	2024-01-01	2	1	1	1	2024-01-01	Monday
2024-01-02	2	7	0	0.000000	0	1	1	|	2024-01-02	3	2	2	1	2024-01-01	Tuesday
2024-01-03	3	7	0	0.000000	0	1	1	|	2024-01-03	4	3	3	1	2024-01-01	Wednesday
2024-01-04	4	7	0	0.000000	0	1	1	|	2024-01-04	5	4	4	1	2024-01-01	Thursday
2024-01-05	5	7	0	0.000000	0	1	1	|	2024-01-05	6	5	5	1	2024-01-01	Friday
2024-01-06	6	7	0	0.000000	0	1	1	|	2024-01-06	7	6	6	1	2024-01-01	Saturday
2024-01-07	7	7	0	0.000000	0	1	1	|	2024-01-07	1	7	7	1	2024-01-01	Sunday
2024-01-08	8	7	7	1.000000	1	2	2	|	2024-01-08	2	8	1	1	2024-01-01	Monday
2024-01-09	9	7	7	1.000000	1	2	2	|	2024-01-09	3	9	2	1	2024-01-01	Tuesday
2024-01-10	10	7	7	1.000000	1	2	2	|	2024-01-10	4	10	3	1	2024-01-01	Wednesday
2024-01-11	11	7	7	1.000000	1	2	2	|	2024-01-11	5	11	4	1	2024-01-01	Thursday
2024-01-12	12	7	7	1.000000	1	2	2	|	2024-01-12	6	12	5	1	2024-01-01	Friday
2024-01-13	13	7	7	1.000000	1	2	2	|	2024-01-13	7	13	6	1	2024-01-01	Saturday
2024-01-14	14	7	7	1.000000	1	2	2	|	2024-01-14	1	14	7	1	2024-01-01	Sunday
2024-01-15	15	7	14	2.000000	2	3	3	|	2024-01-15	2	15	1	1	2024-01-01	Monday
2024-01-16	16	7	14	2.000000	2	3	3	|	2024-01-16	3	16	2	1	2024-01-01	Tuesday
2024-01-17	17	7	14	2.000000	2	3	3	|	2024-01-17	4	17	3	1	2024-01-01	Wednesday
2024-01-18	18	7	14	2.000000	2	3	3	|	2024-01-18	5	18	4	1	2024-01-01	Thursday
2024-01-19	19	7	14	2.000000	2	3	3	|	2024-01-19	6	19	5	1	2024-01-01	Friday
2024-01-20	20	7	14	2.000000	2	3	3	|	2024-01-20	7	20	6	1	2024-01-01	Saturday
2024-01-21	21	7	14	2.000000	2	3	3	|	2024-01-21	1	21	7	1	2024-01-01	Sunday
Time taken: 8.512 seconds, Fetched 63 row(s)


In this query:
The second argument to date_format, 'EEEE', returns the full weekday name (Monday, Tuesday, and so on).
DAYOFYEAR(your_date_column) returns the day of the year.
DAYOFWEEK(your_date_column) returns the day of the week, with Sunday as the first day of the week.
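
As a side note, the CONCAT/to_date construction of first_day_of_year could likely be replaced by Spark's built-in trunc function; this is just a sketch of the alternative and is not what the queries above and below actually use:

SQL:
-- trunc(d, 'YEAR') returns the first day of d's year,
-- equivalent to to_date(CONCAT(cast(YEAR(d) as string), '-01-01'), 'yyyy-MM-dd')
SELECT trunc(DATE '2023-02-15', 'YEAR') AS first_day_of_year;   -- 2023-01-01
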
SQL:
-- the cleaned-up SQL that returns the result directly
SELECT 
    your_date_column,
    CASE 
        WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
        ELSE CEIL(  (DAYOFYEAR(your_date_column) - day_of_week ) / 7.0 ) + 1
    END AS week_number
FROM (
    SELECT
    your_date_column,
    CASE 
        WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
        ELSE DAYOFWEEK(your_date_column) - 1
    END AS day_of_week,
    CASE 
        WHEN DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
        ELSE DAYOFWEEK(to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
    END AS first_day_of_year_week_number,
    to_date(CONCAT( cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd') as first_day_of_year,
    date_format(your_date_column, 'EEEE') as WEEK
    
FROM
    your_table
);

2023-01-01	1
2023-01-02	2
2023-01-03	2
2023-01-04	2
2023-01-05	2
2023-01-06	2
2023-01-07	2
2023-01-08	2
2023-01-09	3
2023-01-10	3
2023-01-11	3
2023-01-12	3
2023-01-13	3
2023-01-14	3
2023-01-15	3
2023-01-16	4
2023-01-17	4
2023-01-18	4
2023-01-19	4
2023-01-20	4
2023-01-21	4
2023-01-22	4
2023-01-23	5
2023-01-24	5
2023-01-25	5
2023-01-26	5
2023-01-27	5
2023-01-28	5
2023-01-29	5
2023-01-30	6
2023-01-31	6
2023-02-01	6
2023-02-02	6
2023-02-03	6
2023-02-04	6
2023-02-05	6
2023-02-06	7
2023-02-07	7
2023-02-08	7
2023-02-09	7
2023-02-15	8
2023-12-31	53
2024-01-01	1
2024-01-02	1
2024-01-03	1
2024-01-04	1
2024-01-05	1
2024-01-06	1
2024-01-07	1
2024-01-08	2
2024-01-09	2
2024-01-10	2
2024-01-11	2
2024-01-12	2
2024-01-13	2
2024-01-14	2
2024-01-15	3
2024-01-16	3
2024-01-17	3
2024-01-18	3
2024-01-19	3
2024-01-20	3
2024-01-21	3
Time taken: 0.493 seconds, Fetched 63 row(s)
23/11/14 14:27:07 INFO SparkSQLCLIDriver: Time taken: 0.493 seconds, Fetched 63 row(s)
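
As a closing note, because week 1 always ends on the first Sunday (day-of-year 8 - first_day_of_year_week_number), the CASE plus CEIL(...) + 1 pair can be collapsed into a single expression, week_number = CEIL((DAYOFYEAR(d) + first_day_of_year_week_number - 1) / 7.0). The sketch below is that equivalent form under the same Monday-first natural-week rule; it was not run as part of the verification above, so re-check it against the same test data before relying on it.

SQL:
SELECT
    your_date_column,
    CAST(CEIL(
        (DAYOFYEAR(your_date_column)
         + CASE
               WHEN DAYOFWEEK(to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
               ELSE DAYOFWEEK(to_date(CONCAT(cast(YEAR(your_date_column) as string), '-01-01'), 'yyyy-MM-dd')) - 1
           END
         - 1) / 7.0
    ) AS INT) AS week_number
FROM your_table;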