官网链接:
0 问题描述
基于试卷作答记录表exam_record、试卷信息表examination_info , 统计2021年每个未完成试卷作答数大于1的有效用户的数据(有效用户指完成试卷作答数至少为1且未完成数小于5)
1 数据准备
sql
drop table if exists examination_info;
CREATE TABLE examination_info (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
tag varchar(32) COMMENT '类别标签',
difficulty varchar(8) COMMENT '难度',
duration int NOT NULL COMMENT '时长',
release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
drop table if exists exam_record;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
(9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
(9002, 'SQL', 'easy', 60, '2020-02-01 10:00:00'),
(9003, '算法', 'medium', 80, '2020-08-02 10:00:00');
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2021-07-02 09:01:01', '2021-07-02 09:21:01', 80),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 81),
(1002, 9002, '2021-09-02 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', null, null),
(1002, 9001, '2021-07-02 19:01:01', '2021-07-02 19:30:01', 82),
(1002, 9002, '2021-07-05 18:01:01', '2021-07-05 18:59:02', 90),
(1003, 9002, '2021-07-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1004, 9003, '2021-09-06 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 81),
(1005, 9001, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1005, 9002, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1006, 9002, '2021-09-02 12:11:01', '2021-09-02 12:31:01', 89);
2 数据分析
**step1:**获取各用户的tag,start_time及未完成标记和已完成标记,如果该作答记录交卷了则已完成标记为1,未完成标记为0,否则相反:if(submit_time is null, 1, null) as incomplete
代码如下:
sql
select er.uid,
ei.tag,
er.start_time,
-- incomplete 未提交试卷的标记
if(er.submit_time is null, 1, null) as incomplete,
-- complete已提交试卷的标记
if(er.submit_time is not null, 1,null) as complete
from exam_record er
left join examination_info ei on er.exam_id =ei.exam_id
where year(er.start_time)=2021
step2:
-
用户分组:group by uid,统计:未完成试卷作答数incomplete_cnt 、已完成试卷作答数complete_cnt_cnt
-
筛选出有效用户:having complete_cnt >= 1 and incomplete_cnt >1
and
incomplete_cnt <5 -
对于每条作答tag,用符号 ":" 来拼接 日期字段和tag字段:concat_ws
(':', date(start_time), tag);
对于一个人(组内)的多条作答,需去重 distinct concat_ws(':', date(start_time), tag); -
group_concat ( 要连接的字段 [separator '分隔符'])
最终代码如下:
sql
SELECT uid,
-- 未完成试卷的作答数
count(incomplete) as incomplete_cnt,
-- 已完成试卷的作答数
count(complete) as complete_cnt,
-- distinct concat_ws(':', date(start_time), tag) as cw
-- group_concat ( cw separator ';')
group_concat(distinct concat_ws(':', date(start_time), tag) separator ';') as detail
from (
SELECT er.uid,
ei.tag,
er.start_time,
-- incomplete 未提交试卷的标记
if(er.submit_time is null, 1, null) as incomplete,
-- complete已提交试卷的标记
if(er.submit_time is not null, 1,null) as complete
from exam_record er
left join examination_info ei on er.exam_id =ei.exam_id
where year(er.start_time)=2021
) as t1
group by uid
-- 有效用户:完成试卷的作答数至少为1 and 未完试卷的作答数小于5且大于1
having complete_cnt >= 1 and incomplete_cnt >1 and incomplete_cnt <5
order by incomplete_cnt desc;
3 小结
本案例涉及到:if条件判断+count,即条件聚合。 此外还涉及到date日期函数、concat_ws、group_concat函数的使用
concat_ws(带分隔符的字符串连接函数)
- 语法:concat_ws(string SEP, string A ,string B.......)
- 返回值:string
- 说明:返回输入字符串连接后的结果,SEP表示各个字符串的分隔符
- 举例:select concat_ws('|','ad','cv','op') ;---> ad|cv|op
group_concat函数
- 语法:group_concat([distinct] 要连接的字段 [order by 排序字段 asc/desc] [separator '分隔符'])
- 说明:将group by产生的同一个分组中的值连接起来,返回一个字符串结果
- 参数解释:distinct排除重复值; 如果需要对结果中的值进行排序,可以使用order by子句;separator '分隔符':是拼接符号,默认为逗号
- 返回值:string
- sql举例:
sql#--- 对buyer字段进行分组,把去除重复冗余的spending字段的值打印在同一行,'+'加号分隔 select buyer, group_concat(distinct spending separator '+') from spend group by buyer;