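Description:
The operations team wants to know the retention rate of users who practice on some day and come back to practice again the next day. Retrieve the corresponding data.
Retention rate calculation:
Example: question_practice_detail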
| id | device_id | question_id | result | date |
|----|-----------|-------------|--------|------------|
| 1 | 2138 | 111 | wrong | 2021-05-03 |
| 2 | 3214 | 112 | wrong | 2021-05-09 |
| 3 | 3214 | 113 | wrong | 2021-06-15 |
| 4 | 6543 | 111 | right | 2021-08-13 |
| 5 | 2315 | 115 | right | 2021-08-13 |
| 6 | 2315 | 116 | right | 2021-08-14 |
| 7 | 2315 | 117 | wrong | 2021-08-15 |
| 8 | 3214 | 112 | wrong | 2021-05-09 |
| 9 | 3214 | 113 | wrong | 2021-08-15 |
| 10 | 6543 | 111 | right | 2021-08-13 |
| 11 | 2315 | 115 | right | 2021-08-13 |
| 12 | 2315 | 116 | right | 2021-08-14 |
| 13 | 2315 | 117 | wrong | 2021-08-15 |
| 14 | 3214 | 112 | wrong | 2021-08-16 |
| 15 | 3214 | 113 | wrong | 2021-08-18 |
| 16 | 6543 | 111 | right | 2021-08-13 |
Based on the example, your query should return the following result:
| avg_ret |
|---------|
| 0.3000 |
Sample input:
```sql
drop table if exists `question_practice_detail`;
CREATE TABLE `question_practice_detail` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`question_id` int NOT NULL,
`result` varchar(32) NOT NULL,
`date` date NOT NULL
);
INSERT INTO question_practice_detail VALUES(1,2138,111,'wrong','2021-05-03');
INSERT INTO question_practice_detail VALUES(2,3214,112,'wrong','2021-05-09');
INSERT INTO question_practice_detail VALUES(3,3214,113,'wrong','2021-06-15');
INSERT INTO question_practice_detail VALUES(4,6543,111,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(5,2315,115,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(6,2315,116,'right','2021-08-14');
INSERT INTO question_practice_detail VALUES(7,2315,117,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(8,3214,112,'wrong','2021-05-09');
INSERT INTO question_practice_detail VALUES(9,3214,113,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(10,6543,111,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(11,2315,115,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(12,2315,116,'right','2021-08-14');
INSERT INTO question_practice_detail VALUES(13,2315,117,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(14,3214,112,'wrong','2021-08-16');
INSERT INTO question_practice_detail VALUES(15,3214,113,'wrong','2021-08-18');
INSERT INTO question_practice_detail VALUES(16,6543,111,'right','2021-08-13');
```
Output:
```
avg_ret
0.3000
```
Solution:
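Put simply, the problem asks for the number of users who practiced on a given day and came back the next day, divided by the total number of practice records. The question_practice_detail table above holds the users' practice details and contains the following columns: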
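- id: the record ID; irrelevant for this problem
- device_id: crucial; in this problem it identifies a unique user
- question_id: not very important; the problem mainly cares about the user and the practice date
- result: the outcome of this practice attempt; also not very important
- date: an important column, because the problem is all about dates: for each day on which a user has a record, we need to check whether that user also has a record on that day + 1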
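To make the check in the date bullet concrete, here is a minimal plain-Python sketch. It is not part of the original solution. It keeps only device_id and date from the sample rows, collapses them into a set of distinct (device_id, day) pairs, and counts the pairs whose next day is also present. ROWS and next_day_retention are names made up for this illustration. The set collapses same-day duplicates, the same job that deduplication does on the SQL side.

```python
from datetime import date, timedelta

# (device_id, date) of the 16 sample records, in id order; id, question_id
# and result do not affect the metric, so they are left out.
ROWS = [
    (2138, "2021-05-03"), (3214, "2021-05-09"), (3214, "2021-06-15"),
    (6543, "2021-08-13"), (2315, "2021-08-13"), (2315, "2021-08-14"),
    (2315, "2021-08-15"), (3214, "2021-05-09"), (3214, "2021-08-15"),
    (6543, "2021-08-13"), (2315, "2021-08-13"), (2315, "2021-08-14"),
    (2315, "2021-08-15"), (3214, "2021-08-16"), (3214, "2021-08-18"),
    (6543, "2021-08-13"),
]

def next_day_retention(rows):
    """Share of distinct (device_id, day) pairs whose user also practiced on day + 1."""
    active = {(dev, date.fromisoformat(d)) for dev, d in rows}
    retained = sum(1 for dev, d in active if (dev, d + timedelta(days=1)) in active)
    return retained / len(active)

print(round(next_day_retention(ROWS), 4))  # 0.3
```

The 16 records shrink to 10 distinct pairs, and 3 of them have a next-day pair, which gives the expected 0.3.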
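The heart of the problem is this: if a user has a record on some date, does that user also have a record on date + 1 day? A natural tool for this is a self-join, i.e. left-joining the table with itself. I'll illustrate with a few rows from this problem's data: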

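When we join the table with itself to match "this user practiced today and also practiced tomorrow", only records belonging to the same user are useful, so the join must require:
- qpd1.device_id = qpd2.device_id
Here qpd is short for question_practice_detail, qpd1 is the left table and qpd2 is the right table. What we intend is:
- qpd1 left join qpd2
This condition alone is not enough, because we still have to pick out the days on which the user practiced and then practiced again the next day. Still, let's first look at what the plain join produces: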
```
mysql> select * from question_practice_detail qpd1 left join question_practice_detail qpd2 on qpd1.device_id = qpd2.device_id order by qpd1.device_id;
+----+-----------+-------------+--------+------------+------+-----------+-------------+--------+------------+
| id | device_id | question_id | result | date | id | device_id | question_id | result | date |
+----+-----------+-------------+--------+------------+------+-----------+-------------+--------+------------+
| 1 | 2138 | 111 | wrong | 2021-05-03 | 1 | 2138 | 111 | wrong | 2021-05-03 |
| 13 | 2315 | 117 | wrong | 2021-08-15 | 11 | 2315 | 115 | right | 2021-08-13 |
| 13 | 2315 | 117 | wrong | 2021-08-15 | 7 | 2315 | 117 | wrong | 2021-08-15 |
| 13 | 2315 | 117 | wrong | 2021-08-15 | 6 | 2315 | 116 | right | 2021-08-14 |
| 13 | 2315 | 117 | wrong | 2021-08-15 | 5 | 2315 | 115 | right | 2021-08-13 |
| 11 | 2315 | 115 | right | 2021-08-13 | 11 | 2315 | 115 | right | 2021-08-13 |
| 11 | 2315 | 115 | right | 2021-08-13 | 13 | 2315 | 117 | wrong | 2021-08-15 |
| 11 | 2315 | 115 | right | 2021-08-13 | 12 | 2315 | 116 | right | 2021-08-14 |
| 11 | 2315 | 115 | right | 2021-08-13 | 7 | 2315 | 117 | wrong | 2021-08-15 |
| 11 | 2315 | 115 | right | 2021-08-13 | 6 | 2315 | 116 | right | 2021-08-14 |
| 5 | 2315 | 115 | right | 2021-08-13 | 6 | 2315 | 116 | right | 2021-08-14 |
| 11 | 2315 | 115 | right | 2021-08-13 | 5 | 2315 | 115 | right | 2021-08-13 |
| 12 | 2315 | 116 | right | 2021-08-14 | 13 | 2315 | 117 | wrong | 2021-08-15 |
| 12 | 2315 | 116 | right | 2021-08-14 | 12 | 2315 | 116 | right | 2021-08-14 |
| 12 | 2315 | 116 | right | 2021-08-14 | 11 | 2315 | 115 | right | 2021-08-13 |
| 12 | 2315 | 116 | right | 2021-08-14 | 7 | 2315 | 117 | wrong | 2021-08-15 |
| 5 | 2315 | 115 | right | 2021-08-13 | 13 | 2315 | 117 | wrong | 2021-08-15 |
| 5 | 2315 | 115 | right | 2021-08-13 | 12 | 2315 | 116 | right | 2021-08-14 |
| 5 | 2315 | 115 | right | 2021-08-13 | 11 | 2315 | 115 | right | 2021-08-13 |
| 5 | 2315 | 115 | right | 2021-08-13 | 7 | 2315 | 117 | wrong | 2021-08-15 |
| 12 | 2315 | 116 | right | 2021-08-14 | 6 | 2315 | 116 | right | 2021-08-14 |
| 5 | 2315 | 115 | right | 2021-08-13 | 5 | 2315 | 115 | right | 2021-08-13 |
| 6 | 2315 | 116 | right | 2021-08-14 | 13 | 2315 | 117 | wrong | 2021-08-15 |
| 6 | 2315 | 116 | right | 2021-08-14 | 12 | 2315 | 116 | right | 2021-08-14 |
| 6 | 2315 | 116 | right | 2021-08-14 | 11 | 2315 | 115 | right | 2021-08-13 |
| 6 | 2315 | 116 | right | 2021-08-14 | 7 | 2315 | 117 | wrong | 2021-08-15 |
| 6 | 2315 | 116 | right | 2021-08-14 | 6 | 2315 | 116 | right | 2021-08-14 |
| 6 | 2315 | 116 | right | 2021-08-14 | 5 | 2315 | 115 | right | 2021-08-13 |
| 7 | 2315 | 117 | wrong | 2021-08-15 | 13 | 2315 | 117 | wrong | 2021-08-15 |
| 7 | 2315 | 117 | wrong | 2021-08-15 | 12 | 2315 | 116 | right | 2021-08-14 |
| 7 | 2315 | 117 | wrong | 2021-08-15 | 11 | 2315 | 115 | right | 2021-08-13 |
| 7 | 2315 | 117 | wrong | 2021-08-15 | 7 | 2315 | 117 | wrong | 2021-08-15 |
| 7 | 2315 | 117 | wrong | 2021-08-15 | 6 | 2315 | 116 | right | 2021-08-14 |
| 7 | 2315 | 117 | wrong | 2021-08-15 | 5 | 2315 | 115 | right | 2021-08-13 |
| 12 | 2315 | 116 | right | 2021-08-14 | 5 | 2315 | 115 | right | 2021-08-13 |
| 13 | 2315 | 117 | wrong | 2021-08-15 | 13 | 2315 | 117 | wrong | 2021-08-15 |
| 13 | 2315 | 117 | wrong | 2021-08-15 | 12 | 2315 | 116 | right | 2021-08-14 |
| 2 | 3214 | 112 | wrong | 2021-05-09 | 15 | 3214 | 113 | wrong | 2021-08-18 |
| 2 | 3214 | 112 | wrong | 2021-05-09 | 14 | 3214 | 112 | wrong | 2021-08-16 |
| 2 | 3214 | 112 | wrong | 2021-05-09 | 9 | 3214 | 113 | wrong | 2021-08-15 |
| 2 | 3214 | 112 | wrong | 2021-05-09 | 8 | 3214 | 112 | wrong | 2021-05-09 |
| 2 | 3214 | 112 | wrong | 2021-05-09 | 3 | 3214 | 113 | wrong | 2021-06-15 |
| 2 | 3214 | 112 | wrong | 2021-05-09 | 2 | 3214 | 112 | wrong | 2021-05-09 |
| 3 | 3214 | 113 | wrong | 2021-06-15 | 15 | 3214 | 113 | wrong | 2021-08-18 |
| 3 | 3214 | 113 | wrong | 2021-06-15 | 14 | 3214 | 112 | wrong | 2021-08-16 |
| 3 | 3214 | 113 | wrong | 2021-06-15 | 9 | 3214 | 113 | wrong | 2021-08-15 |
| 3 | 3214 | 113 | wrong | 2021-06-15 | 3 | 3214 | 113 | wrong | 2021-06-15 |
| 3 | 3214 | 113 | wrong | 2021-06-15 | 2 | 3214 | 112 | wrong | 2021-05-09 |
| 15 | 3214 | 113 | wrong | 2021-08-18 | 8 | 3214 | 112 | wrong | 2021-05-09 |
| 15 | 3214 | 113 | wrong | 2021-08-18 | 3 | 3214 | 113 | wrong | 2021-06-15 |
| 15 | 3214 | 113 | wrong | 2021-08-18 | 2 | 3214 | 112 | wrong | 2021-05-09 |
| 3 | 3214 | 113 | wrong | 2021-06-15 | 8 | 3214 | 112 | wrong | 2021-05-09 |
| 8 | 3214 | 112 | wrong | 2021-05-09 | 15 | 3214 | 113 | wrong | 2021-08-18 |
| 8 | 3214 | 112 | wrong | 2021-05-09 | 14 | 3214 | 112 | wrong | 2021-08-16 |
| 8 | 3214 | 112 | wrong | 2021-05-09 | 9 | 3214 | 113 | wrong | 2021-08-15 |
| 8 | 3214 | 112 | wrong | 2021-05-09 | 8 | 3214 | 112 | wrong | 2021-05-09 |
| 8 | 3214 | 112 | wrong | 2021-05-09 | 3 | 3214 | 113 | wrong | 2021-06-15 |
| 8 | 3214 | 112 | wrong | 2021-05-09 | 2 | 3214 | 112 | wrong | 2021-05-09 |
| 9 | 3214 | 113 | wrong | 2021-08-15 | 15 | 3214 | 113 | wrong | 2021-08-18 |
| 9 | 3214 | 113 | wrong | 2021-08-15 | 14 | 3214 | 112 | wrong | 2021-08-16 |
| 9 | 3214 | 113 | wrong | 2021-08-15 | 3 | 3214 | 113 | wrong | 2021-06-15 |
| 9 | 3214 | 113 | wrong | 2021-08-15 | 2 | 3214 | 112 | wrong | 2021-05-09 |
| 9 | 3214 | 113 | wrong | 2021-08-15 | 9 | 3214 | 113 | wrong | 2021-08-15 |
| 9 | 3214 | 113 | wrong | 2021-08-15 | 8 | 3214 | 112 | wrong | 2021-05-09 |
| 14 | 3214 | 112 | wrong | 2021-08-16 | 15 | 3214 | 113 | wrong | 2021-08-18 |
| 14 | 3214 | 112 | wrong | 2021-08-16 | 14 | 3214 | 112 | wrong | 2021-08-16 |
| 14 | 3214 | 112 | wrong | 2021-08-16 | 9 | 3214 | 113 | wrong | 2021-08-15 |
| 14 | 3214 | 112 | wrong | 2021-08-16 | 8 | 3214 | 112 | wrong | 2021-05-09 |
| 14 | 3214 | 112 | wrong | 2021-08-16 | 3 | 3214 | 113 | wrong | 2021-06-15 |
| 14 | 3214 | 112 | wrong | 2021-08-16 | 2 | 3214 | 112 | wrong | 2021-05-09 |
| 15 | 3214 | 113 | wrong | 2021-08-18 | 15 | 3214 | 113 | wrong | 2021-08-18 |
| 15 | 3214 | 113 | wrong | 2021-08-18 | 14 | 3214 | 112 | wrong | 2021-08-16 |
| 15 | 3214 | 113 | wrong | 2021-08-18 | 9 | 3214 | 113 | wrong | 2021-08-15 |
| 4 | 6543 | 111 | right | 2021-08-13 | 16 | 6543 | 111 | right | 2021-08-13 |
| 4 | 6543 | 111 | right | 2021-08-13 | 10 | 6543 | 111 | right | 2021-08-13 |
| 4 | 6543 | 111 | right | 2021-08-13 | 4 | 6543 | 111 | right | 2021-08-13 |
| 10 | 6543 | 111 | right | 2021-08-13 | 4 | 6543 | 111 | right | 2021-08-13 |
| 10 | 6543 | 111 | right | 2021-08-13 | 16 | 6543 | 111 | right | 2021-08-13 |
| 10 | 6543 | 111 | right | 2021-08-13 | 10 | 6543 | 111 | right | 2021-08-13 |
| 16 | 6543 | 111 | right | 2021-08-13 | 16 | 6543 | 111 | right | 2021-08-13 |
| 16 | 6543 | 111 | right | 2021-08-13 | 10 | 6543 | 111 | right | 2021-08-13 |
| 16 | 6543 | 111 | right | 2021-08-13 | 4 | 6543 | 111 | right | 2021-08-13 |
+----+-----------+-------------+--------+------------+------+-----------+-------------+--------+------------+
```
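The row count above follows from simple arithmetic. Joining on device_id alone pairs every record with every record of the same device, itself included. A device with n records therefore contributes n * n rows. Since every record matches at least itself, the LEFT JOIN behaves like an inner join here. A quick Python check on the sample data (DEVICE_IDS and self_join_rows are illustrative names) gives 1 + 36 + 36 + 9 = 82 rows, the size of the output above.

```python
from collections import Counter

# device_id of each of the 16 sample records (ids 1..16, in order).
DEVICE_IDS = [2138, 3214, 3214, 6543, 2315, 2315, 2315, 3214,
              3214, 6543, 2315, 2315, 2315, 3214, 3214, 6543]

def self_join_rows(device_ids):
    """Rows per device produced by joining the table to itself on device_id only.

    Each record pairs with every record of the same device (itself included),
    so a device with n records contributes n * n rows.
    """
    counts = Counter(device_ids)
    return {dev: n * n for dev, n in counts.items()}

per_device = self_join_rows(DEVICE_IDS)
print(per_device)                # {2138: 1, 3214: 36, 6543: 9, 2315: 36}
print(sum(per_device.values()))  # 82
```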
That is far too cluttered, so let's trim it down:
```sql
select
qpd1.device_id,
qpd1.date,
qpd2.device_id,
qpd2.date
from question_practice_detail qpd1
left join question_practice_detail qpd2
on qpd1.device_id = qpd2.device_id
where qpd1.device_id = 3214
order by qpd1.device_id;
```
- The query above keeps only the key identifier and the date, sorts the rows, and looks only at the user with device ID 3214.
Output:
```
+-----------+------------+-----------+------------+
| device_id | date | device_id | date |
+-----------+------------+-----------+------------+
| 3214 | 2021-05-09 | 3214 | 2021-08-18 |
| 3214 | 2021-05-09 | 3214 | 2021-08-16 |
| 3214 | 2021-05-09 | 3214 | 2021-08-15 |
| 3214 | 2021-05-09 | 3214 | 2021-05-09 |
| 3214 | 2021-05-09 | 3214 | 2021-06-15 |
| 3214 | 2021-05-09 | 3214 | 2021-05-09 |
| 3214 | 2021-06-15 | 3214 | 2021-08-18 |
| 3214 | 2021-06-15 | 3214 | 2021-08-16 |
| 3214 | 2021-06-15 | 3214 | 2021-08-15 |
| 3214 | 2021-06-15 | 3214 | 2021-05-09 |
| 3214 | 2021-06-15 | 3214 | 2021-06-15 |
| 3214 | 2021-06-15 | 3214 | 2021-05-09 |
| 3214 | 2021-05-09 | 3214 | 2021-08-18 |
| 3214 | 2021-05-09 | 3214 | 2021-08-16 |
| 3214 | 2021-05-09 | 3214 | 2021-08-15 |
| 3214 | 2021-05-09 | 3214 | 2021-05-09 |
| 3214 | 2021-05-09 | 3214 | 2021-06-15 |
| 3214 | 2021-05-09 | 3214 | 2021-05-09 |
| 3214 | 2021-08-15 | 3214 | 2021-08-18 |
| 3214 | 2021-08-15 | 3214 | 2021-08-16 |
| 3214 | 2021-08-15 | 3214 | 2021-08-15 |
| 3214 | 2021-08-15 | 3214 | 2021-05-09 |
| 3214 | 2021-08-15 | 3214 | 2021-06-15 |
| 3214 | 2021-08-15 | 3214 | 2021-05-09 |
| 3214 | 2021-08-16 | 3214 | 2021-08-18 |
| 3214 | 2021-08-16 | 3214 | 2021-08-16 |
| 3214 | 2021-08-16 | 3214 | 2021-08-15 |
| 3214 | 2021-08-16 | 3214 | 2021-05-09 |
| 3214 | 2021-08-16 | 3214 | 2021-06-15 |
| 3214 | 2021-08-16 | 3214 | 2021-05-09 |
| 3214 | 2021-08-18 | 3214 | 2021-08-18 |
| 3214 | 2021-08-18 | 3214 | 2021-08-16 |
| 3214 | 2021-08-18 | 3214 | 2021-08-15 |
| 3214 | 2021-08-18 | 3214 | 2021-05-09 |
| 3214 | 2021-08-18 | 3214 | 2021-06-15 |
| 3214 | 2021-08-18 | 3214 | 2021-05-09 |
+-----------+------------+-----------+------------+
```
Here are user 3214's practice records in the original table:

Taking this user on his own, the join actually works like this:

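As you can see, the single practice on 2021-05-09 is joined with every one of his records. That indiscriminate pairing is not what we want: the 2021-05-09 record should match only a 2021-05-10 record. Because this is a left join, if there is no 2021-05-10 record, the right-hand columns of the 2021-05-09 row are simply NULL.
Likewise, take 2021-08-15: he practiced again on 2021-08-16, so the right-hand side of the 2021-08-15 row will not be NULL.
So on top of the device_id condition we also need a condition on the practice date. But how do we express it? How do we take qpd1's date plus one day and compare it with qpd2's date?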

Simple: MySQL provides a date function for exactly this, called DATE_ADD:

Its syntax is:
```sql
DATE_ADD(date, INTERVAL expr type)
```
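- date: the date value or column you want to shift
- INTERVAL: a fixed keyword
- expr: the amount to add
- type: the unit of the amount, for example:
  - day
  - month
  - year
  - hour
  - second
  - minute
  - ... (there are a few more, such as week and quarter, which you can look up; for this problem day is all we need)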
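Day arithmetic has to be calendar-aware, which is why a function like DATE_ADD beats tinkering with the date string. As a rough analogy only, here is Python's datetime rather than MySQL: adding timedelta(days=1) rolls over month and year boundaries, the same way DATE_ADD(d, INTERVAL 1 DAY) does. add_one_day is a hypothetical helper for this illustration.

```python
from datetime import date, timedelta

def add_one_day(d: date) -> date:
    """Python analogue of MySQL's DATE_ADD(d, INTERVAL 1 DAY)."""
    return d + timedelta(days=1)

print(add_one_day(date(2021, 8, 15)))   # 2021-08-16
print(add_one_day(date(2021, 5, 31)))   # 2021-06-01 (month boundary)
print(add_one_day(date(2021, 12, 31)))  # 2022-01-01 (year boundary)
```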
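With that in hand, the join conditions become:
- qpd1.device_id = qpd2.device_id (the basic join condition)
- qpd2.date = DATE_ADD(qpd1.date, interval 1 day)
This way, if the user practiced again on the day after a given day, that day's row gets matching right-hand columns; if not, the right-hand columns are NULL.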
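Here is a small Python sketch that mimics this LEFT JOIN for user 3214 alone, using his six record dates in id order. Each left-hand record is paired with every record dated exactly one day later. When there is none, the record is kept once with None, standing in for NULL. left_join_next_day is an illustrative name, not anything MySQL provides. Only the 2021-08-15 record finds a partner (2021-08-16), in line with the discussion above.

```python
from datetime import date, timedelta

# Dates of device 3214's six records, in id order (ids 2, 3, 8, 9, 14, 15).
DATES_3214 = ["2021-05-09", "2021-06-15", "2021-05-09",
              "2021-08-15", "2021-08-16", "2021-08-18"]

def left_join_next_day(dates):
    """Mimic LEFT JOIN ... ON same device AND right.date = left.date + 1 day.

    Each left record yields one row per next-day match, or a single
    (left, None) row when no record falls on the following day.
    """
    days = [date.fromisoformat(d) for d in dates]
    out = []
    for left in days:
        matches = [right for right in days if right == left + timedelta(days=1)]
        out.extend((left, m) for m in matches or [None])
    return out

for left, right in left_join_next_day(DATES_3214):
    print(left, right)
```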
The query:
```sql
-- No WHERE filter: the result below covers all users, and without ORDER BY the row order is not guaranteed.
select
qpd1.device_id,
qpd1.date,
qpd2.device_id,
qpd2.date
from question_practice_detail qpd1
left join question_practice_detail qpd2
on qpd1.device_id = qpd2.device_id
and qpd2.date = DATE_ADD(qpd1.date, interval 1 day);
```
Result:
```
+-----------+------------+-----------+------------+
| device_id | date | device_id | date |
+-----------+------------+-----------+------------+
| 2138 | 2021-05-03 | NULL | NULL |
| 3214 | 2021-05-09 | NULL | NULL |
| 3214 | 2021-06-15 | NULL | NULL |
| 6543 | 2021-08-13 | NULL | NULL |
| 2315 | 2021-08-13 | 2315 | 2021-08-14 |
| 2315 | 2021-08-13 | 2315 | 2021-08-14 |
| 2315 | 2021-08-14 | 2315 | 2021-08-15 |
| 2315 | 2021-08-14 | 2315 | 2021-08-15 |
| 2315 | 2021-08-15 | NULL | NULL |
| 3214 | 2021-05-09 | NULL | NULL |
| 3214 | 2021-08-15 | 3214 | 2021-08-16 |
| 6543 | 2021-08-13 | NULL | NULL |
| 2315 | 2021-08-13 | 2315 | 2021-08-14 |
| 2315 | 2021-08-13 | 2315 | 2021-08-14 |
| 2315 | 2021-08-14 | 2315 | 2021-08-15 |
| 2315 | 2021-08-14 | 2315 | 2021-08-15 |
| 2315 | 2021-08-15 | NULL | NULL |
| 3214 | 2021-08-16 | NULL | NULL |
| 3214 | 2021-08-18 | NULL | NULL |
| 6543 | 2021-08-13 | NULL | NULL |
+-----------+------------+-----------+------------+
```
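We're getting closer...
Look closely at this data. When a user submits more than once on the same day, every one of those submissions gets matched against the next day's records, so the same day pair shows up several times. For example, device 2315's two 2021-08-13 records each match its two 2021-08-14 records, which gives four rows. So we should deduplicate first: instead of reading the table directly, we use a query result as the table:
from question_practice_detail qpd1
becomes:
from (select distinct device_id, date from question_practice_detail) as qpd1
The derived table needs an alias such as qpd1, because MySQL requires every derived table to have its own alias.
Both sides of the join need deduplicating, so the right-hand table of the left join gets the same treatment. The code is: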
```sql
select
qpd1.device_id,
qpd1.date,
qpd2.device_id,
qpd2.date
from (select distinct device_id, date from question_practice_detail) as qpd1
left join (select distinct device_id, date from question_practice_detail) as qpd2
on qpd1.device_id = qpd2.device_id
and qpd2.date = DATE_ADD(qpd1.date, interval 1 day);
```
Result:
```
+-----------+------------+-----------+------------+
| device_id | date | device_id | date |
+-----------+------------+-----------+------------+
| 2138 | 2021-05-03 | NULL | NULL |
| 3214 | 2021-05-09 | NULL | NULL |
| 3214 | 2021-06-15 | NULL | NULL |
| 6543 | 2021-08-13 | NULL | NULL |
| 2315 | 2021-08-13 | 2315 | 2021-08-14 |
| 2315 | 2021-08-14 | 2315 | 2021-08-15 |
| 2315 | 2021-08-15 | NULL | NULL |
| 3214 | 2021-08-15 | 3214 | 2021-08-16 |
| 3214 | 2021-08-16 | NULL | NULL |
| 3214 | 2021-08-18 | NULL | NULL |
+-----------+------------+-----------+------------+
```
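To see how much the duplicates distort things, the sketch below reruns the join with and without the DISTINCT subqueries. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so two things change. DATE_ADD(qpd1.date, interval 1 day) becomes SQLite's date(qpd1.date, '+1 day'), and dates are stored as ISO text. make_db, RAW and DEDUP are made-up names. On the sample data, the raw join gives 20 rows with 9 matches (9/20 = 0.45). The deduplicated join gives 10 rows with 3 matches (3/10 = 0.3).

```python
import sqlite3

# The 16 sample rows: (id, device_id, question_id, result, date).
SAMPLE = [
    (1, 2138, 111, "wrong", "2021-05-03"), (2, 3214, 112, "wrong", "2021-05-09"),
    (3, 3214, 113, "wrong", "2021-06-15"), (4, 6543, 111, "right", "2021-08-13"),
    (5, 2315, 115, "right", "2021-08-13"), (6, 2315, 116, "right", "2021-08-14"),
    (7, 2315, 117, "wrong", "2021-08-15"), (8, 3214, 112, "wrong", "2021-05-09"),
    (9, 3214, 113, "wrong", "2021-08-15"), (10, 6543, 111, "right", "2021-08-13"),
    (11, 2315, 115, "right", "2021-08-13"), (12, 2315, 116, "right", "2021-08-14"),
    (13, 2315, 117, "wrong", "2021-08-15"), (14, 3214, 112, "wrong", "2021-08-16"),
    (15, 3214, 113, "wrong", "2021-08-18"), (16, 6543, 111, "right", "2021-08-13"),
]

def make_db():
    """In-memory SQLite copy of question_practice_detail (dates as ISO text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE question_practice_detail "
        "(id INTEGER, device_id INTEGER, question_id INTEGER, result TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO question_practice_detail VALUES (?, ?, ?, ?, ?)", SAMPLE)
    return conn

# Join on the raw table; date(x, '+1 day') is SQLite's counterpart of DATE_ADD.
RAW = """
SELECT qpd1.device_id, qpd1.date, qpd2.device_id, qpd2.date
FROM question_practice_detail AS qpd1
LEFT JOIN question_practice_detail AS qpd2
  ON qpd1.device_id = qpd2.device_id
 AND qpd2.date = date(qpd1.date, '+1 day')
"""

# Same join on distinct (device_id, date) pairs on both sides.
DEDUP = """
SELECT qpd1.device_id, qpd1.date, qpd2.device_id, qpd2.date
FROM (SELECT DISTINCT device_id, date FROM question_practice_detail) AS qpd1
LEFT JOIN (SELECT DISTINCT device_id, date FROM question_practice_detail) AS qpd2
  ON qpd1.device_id = qpd2.device_id
 AND qpd2.date = date(qpd1.date, '+1 day')
"""

def row_and_match_counts(conn, sql):
    """Return (total joined rows, rows whose right-hand device_id is not NULL)."""
    rows = conn.execute(sql).fetchall()
    return len(rows), sum(1 for r in rows if r[2] is not None)

conn = make_db()
print(row_and_match_counts(conn, RAW))    # (20, 9)
print(row_and_match_counts(conn, DEDUP))  # (10, 3)
```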
All that's left is the counting. First, compare with the original table:

Putting the two side by side:

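As you can see, the rows marked with a red cross and an arrow are the ones that find no match.
Only the last step remains: aggregate the data into the shape the answer expects.
Let's recall the requirement:
The numerator of the retention rate, the total number of users with a next-day practice record, is the number of rows whose right-hand qpd2.device_id is not NULL. The denominator is the total number of rows: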
```sql
-- numerator: day rows with a next-day match; denominator: all distinct (device_id, date) rows
select
round(sum(case when qpd2.device_id is not null then 1 else 0 end) / count(*), 4) as avg_ret
from (select distinct device_id, date from question_practice_detail) as qpd1
left join (select distinct device_id, date from question_practice_detail) as qpd2
on qpd1.device_id = qpd2.device_id
and qpd2.date = DATE_ADD(qpd1.date, interval 1 day);
```
Result:
```
+---------+
| avg_ret |
+---------+
| 0.3000 |
+---------+
1 row in set (0.00 sec)
```
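For readers without a MySQL instance at hand, here is a sketch that runs the same final query against an in-memory SQLite database through Python's sqlite3 module. Two assumptions differ from the MySQL version. First, date(qpd1.date, '+1 day') replaces DATE_ADD. Second, the sum is multiplied by 1.0, because SQLite's / truncates when both operands are integers, whereas MySQL's / returns a decimal result.

```python
import sqlite3

# The 16 sample rows: (id, device_id, question_id, result, date).
SAMPLE = [
    (1, 2138, 111, "wrong", "2021-05-03"), (2, 3214, 112, "wrong", "2021-05-09"),
    (3, 3214, 113, "wrong", "2021-06-15"), (4, 6543, 111, "right", "2021-08-13"),
    (5, 2315, 115, "right", "2021-08-13"), (6, 2315, 116, "right", "2021-08-14"),
    (7, 2315, 117, "wrong", "2021-08-15"), (8, 3214, 112, "wrong", "2021-05-09"),
    (9, 3214, 113, "wrong", "2021-08-15"), (10, 6543, 111, "right", "2021-08-13"),
    (11, 2315, 115, "right", "2021-08-13"), (12, 2315, 116, "right", "2021-08-14"),
    (13, 2315, 117, "wrong", "2021-08-15"), (14, 3214, 112, "wrong", "2021-08-16"),
    (15, 3214, 113, "wrong", "2021-08-18"), (16, 6543, 111, "right", "2021-08-13"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE question_practice_detail "
    "(id INTEGER, device_id INTEGER, question_id INTEGER, result TEXT, date TEXT)"
)
conn.executemany("INSERT INTO question_practice_detail VALUES (?, ?, ?, ?, ?)", SAMPLE)

# Same shape as the MySQL query, with SQLite's date(x, '+1 day') for DATE_ADD
# and 1.0 * ... so the division is not truncated to an integer.
AVG_RET_SQL = """
SELECT ROUND(1.0 * SUM(CASE WHEN qpd2.device_id IS NOT NULL THEN 1 ELSE 0 END) / COUNT(*), 4) AS avg_ret
FROM (SELECT DISTINCT device_id, date FROM question_practice_detail) AS qpd1
LEFT JOIN (SELECT DISTINCT device_id, date FROM question_practice_detail) AS qpd2
  ON qpd1.device_id = qpd2.device_id
 AND qpd2.date = date(qpd1.date, '+1 day')
"""

avg_ret = conn.execute(AVG_RET_SQL).fetchone()[0]
print(avg_ret)  # 0.3 (MySQL displays it as 0.3000)
```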
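One caveat remains. I personally think the problem statement is worded inaccurately (my solution follows the official interpretation exactly). The numerator should be "the number of records (user-days) followed by a next-day practice", not "the total number of users who practiced the next day": it counts records, not people. The official statement says users, but the official reference solution actually counts records, so the two don't quite match.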
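The mismatch described above is easy to quantify on the sample data. The sketch below uses plain Python and illustrative names to compute both readings. The first is the record-based ratio the official solution uses: distinct (device_id, day) pairs followed by a next-day pair, over all such pairs. The second is a hypothetical user-based ratio: users who ever came back the next day, over all users. They come out as 0.3 and 0.5 respectively.

```python
from datetime import date, timedelta

# (device_id, date) of the 16 sample records; the other columns do not matter here.
ROWS = [
    (2138, "2021-05-03"), (3214, "2021-05-09"), (3214, "2021-06-15"),
    (6543, "2021-08-13"), (2315, "2021-08-13"), (2315, "2021-08-14"),
    (2315, "2021-08-15"), (3214, "2021-05-09"), (3214, "2021-08-15"),
    (6543, "2021-08-13"), (2315, "2021-08-13"), (2315, "2021-08-14"),
    (2315, "2021-08-15"), (3214, "2021-08-16"), (3214, "2021-08-18"),
    (6543, "2021-08-13"),
]

def both_readings(rows):
    """Return (record_based, user_based) next-day retention.

    record_based: distinct (device_id, day) pairs that have a next-day pair,
                  over all distinct pairs (what the official query computes).
    user_based:   users with at least one such pair, over all users
                  (a hypothetical alternative reading of "number of users").
    """
    active = {(dev, date.fromisoformat(d)) for dev, d in rows}
    retained = {(dev, d) for dev, d in active if (dev, d + timedelta(days=1)) in active}
    users = {dev for dev, _ in active}
    retained_users = {dev for dev, _ in retained}
    return len(retained) / len(active), len(retained_users) / len(users)

print(both_readings(ROWS))  # (0.3, 0.5)
```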
