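Contents
- [1 Problem](#1 Problem)
- [2 Table creation statements](#2 Table creation statements)
- [3 Solution](#3 Solution)
Source: ByteDance.
1 Problem
Given a user login log table that records the IP address of each login, find the user pairs that have used 3 or more IP addresses in common.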
Sample data
+----------+-----------------+----------------------+
| user_id  | ip              | time_stamp           |
+----------+-----------------+----------------------+
| 2        | 223.104.41.101  | 2023-08-24 07:00:00  |
| 4        | 223.104.41.122  | 2023-08-24 10:00:00  |
| 5        | 223.104.41.126  | 2023-08-24 11:00:00  |
| 4        | 223.104.41.126  | 2023-08-24 13:00:00  |
| 1        | 223.104.41.101  | 2023-08-24 16:00:00  |
| 3        | 223.104.41.101  | 2023-08-24 16:02:00  |
| 2        | 223.104.41.104  | 2023-08-24 16:30:00  |
| 1        | 223.104.41.121  | 2023-08-24 17:00:00  |
| 2        | 223.104.41.122  | 2023-08-24 17:05:00  |
| 3        | 223.104.41.103  | 2023-08-24 18:11:00  |
| 2        | 223.104.41.103  | 2023-08-24 19:00:00  |
| 1        | 223.104.41.104  | 2023-08-24 19:00:00  |
| 3        | 223.104.41.122  | 2023-08-24 19:07:00  |
| 1        | 223.104.41.122  | 2023-08-24 21:00:00  |
+----------+-----------------+----------------------+
2 Table creation statements
sql
-- Create the table
CREATE TABLE t_login_log (
    user_id    bigint COMMENT 'User ID',
    ip         string COMMENT 'Login IP address',
    time_stamp string COMMENT 'Login time'
) COMMENT 'User login log table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
;
-- Insert data
insert into t_login_log(user_id,ip,time_stamp)
values
(1,'223.104.41.101','2023-08-24 16:00:00'),
(1,'223.104.41.121','2023-08-24 17:00:00'),
(1,'223.104.41.104','2023-08-24 19:00:00'),
(1,'223.104.41.122','2023-08-24 21:00:00'),
(1,'223.104.41.122','2023-08-24 22:00:00'),
(2,'223.104.41.101','2023-08-24 07:00:00'),
(2,'223.104.41.103','2023-08-24 19:00:00'),
(2,'223.104.41.104','2023-08-24 16:30:00'),
(2,'223.104.41.122','2023-08-24 17:05:00'),
(3,'223.104.41.103','2023-08-24 18:11:00'),
(3,'223.104.41.122','2023-08-24 19:07:00'),
(3,'223.104.41.101','2023-08-24 16:02:00'),
(4,'223.104.41.126','2023-08-24 13:00:00'),
(5,'223.104.41.126','2023-08-24 11:00:00'),
(4,'223.104.41.122','2023-08-24 10:00:00');
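3 Solution
(1) Deduplicate the login records by user ID and login IP. A user can log in from the same IP more than once (in the inserted data, user 1 logs in from 223.104.41.122 twice), and without this step the repeated rows would inflate the shared-IP counts in the later steps.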
sql
select user_id,
       ip
from t_login_log
group by user_id, ip
Result
+----------+-----------------+
| user_id  | ip              |
+----------+-----------------+
| 1        | 223.104.41.101  |
| 1        | 223.104.41.104  |
| 1        | 223.104.41.121  |
| 1        | 223.104.41.122  |
| 2        | 223.104.41.101  |
| 2        | 223.104.41.103  |
| 2        | 223.104.41.104  |
| 2        | 223.104.41.122  |
| 3        | 223.104.41.101  |
| 3        | 223.104.41.103  |
| 3        | 223.104.41.122  |
| 4        | 223.104.41.122  |
| 4        | 223.104.41.126  |
| 5        | 223.104.41.126  |
+----------+-----------------+
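The effect of this step can be checked directly. The following is a minimal sketch, assuming the table was populated with the insert statement from section 2; the alias `login_cnt` is an illustrative name and not part of the original solution. It lists the (user, IP) combinations that occur more than once in the raw log.

```sql
-- Sketch: find (user_id, ip) combinations that appear more than once.
-- login_cnt is an illustrative alias.
select user_id,
       ip,
       count(*) as login_cnt
from t_login_log
group by user_id, ip
having count(*) > 1;
```

With the inserted data this should return a single row: user 1 with 223.104.41.122 and a count of 2. That is the repeated login that the `group by` in step (1) collapses.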
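(2) Self-join the deduplicated records on IP address. The condition t1.user_id < t2.user_id drops the rows that pair a user with itself and keeps each pair in one order only, so (1, 2) is kept and (2, 1) is not.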
sql
with tmp as
(select user_id,
        ip
 from t_login_log
 group by user_id, ip)
select t1.user_id,
       t2.user_id,
       t1.ip
from tmp as t1
join
     tmp as t2
on t1.ip = t2.ip
where t1.user_id < t2.user_id
Result
+-------------+-------------+-----------------+
| t1.user_id  | t2.user_id  | t1.ip           |
+-------------+-------------+-----------------+
| 1           | 2           | 223.104.41.101  |
| 1           | 3           | 223.104.41.101  |
| 2           | 3           | 223.104.41.101  |
| 2           | 3           | 223.104.41.103  |
| 1           | 2           | 223.104.41.104  |
| 1           | 2           | 223.104.41.122  |
| 1           | 3           | 223.104.41.122  |
| 1           | 4           | 223.104.41.122  |
| 2           | 3           | 223.104.41.122  |
| 2           | 4           | 223.104.41.122  |
| 3           | 4           | 223.104.41.122  |
| 4           | 5           | 223.104.41.126  |
+-------------+-------------+-----------------+
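To see what the `t1.user_id < t2.user_id` filter removes, the sketch below counts the joined rows under three conditions. It reuses the same `tmp` CTE; the column aliases `all_rows`, `ordered_pairs` and `unordered_pairs` are illustrative names introduced here.

```sql
with tmp as
(select user_id,
        ip
 from t_login_log
 group by user_id, ip)
select count(*) as all_rows,  -- every match, including a user joined to itself
       sum(case when t1.user_id <> t2.user_id then 1 else 0 end) as ordered_pairs,   -- (1, 2) and (2, 1) both counted
       sum(case when t1.user_id < t2.user_id then 1 else 0 end) as unordered_pairs   -- each pair counted once per shared IP
from tmp as t1
join tmp as t2
  on t1.ip = t2.ip;
```

On the data from section 2 these should come out as 38, 24 and 12. The 14 self-matches disappear with `<>`, and `<` additionally halves the remainder, which leaves the 12 rows of the result for this step.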
(3) Group by user pair and count the number of shared IPs.
sql
with tmp as
(select user_id,
        ip
 from t_login_log
 group by user_id, ip)
select t1.user_id,
       t2.user_id,
       count(t1.ip)
from tmp as t1
join
     tmp as t2
on t1.ip = t2.ip
where t1.user_id < t2.user_id
group by t1.user_id,
         t2.user_id
Result
+-------------+-------------+------+
| t1.user_id  | t2.user_id  | _c2  |
+-------------+-------------+------+
| 1           | 2           | 3    |
| 1           | 3           | 2    |
| 1           | 4           | 1    |
| 2           | 3           | 3    |
| 2           | 4           | 1    |
| 3           | 4           | 1    |
| 4           | 5           | 1    |
+-------------+-------------+------+
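The count is only correct because the input was deduplicated first. As an alternative sketch (not the approach used in this post), the CTE can be dropped and the raw table joined to itself, provided the aggregate becomes `count(distinct ...)`. The aliases `user_id_1`, `user_id_2` and `shared_ip_cnt` are illustrative names.

```sql
-- Sketch: same counts without the CTE.
-- count(distinct t1.ip) absorbs the repeated logins that the join multiplies.
select t1.user_id            as user_id_1,
       t2.user_id            as user_id_2,
       count(distinct t1.ip) as shared_ip_cnt
from t_login_log as t1
join t_login_log as t2
  on t1.ip = t2.ip
where t1.user_id < t2.user_id
group by t1.user_id,
         t2.user_id;
```

This should yield the same pair counts as the result above. The trade-off is that the join runs over every raw login row, so on a large log the deduplicate-first version will usually join far fewer rows.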
(4) Keep only the user pairs that share 3 or more IPs.
sql
with tmp as
(select user_id,
        ip
 from t_login_log
 group by user_id, ip)
select t1.user_id,
       t2.user_id
from tmp as t1
join
     tmp as t2
on t1.ip = t2.ip
where t1.user_id < t2.user_id
group by t1.user_id,
         t2.user_id
having count(t1.ip) >= 3
Result
+-------------+-------------+
| t1.user_id  | t2.user_id  |
+-------------+-------------+
| 1           | 2           |
| 2           | 3           |
+-------------+-------------+
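When reviewing the answer it can help to see which IPs each pair actually shares. The sketch below is one way to do that, assuming a Hive environment where the built-in functions `collect_set`, `sort_array` and `concat_ws` are available; the aliases `user_id_1`, `user_id_2` and `shared_ips` are illustrative names.

```sql
with tmp as
(select user_id,
        ip
 from t_login_log
 group by user_id, ip)
select t1.user_id as user_id_1,
       t2.user_id as user_id_2,
       -- gather the shared IPs, sort them, and join them into one string
       concat_ws(',', sort_array(collect_set(t1.ip))) as shared_ips
from tmp as t1
join tmp as t2
  on t1.ip = t2.ip
where t1.user_id < t2.user_id
group by t1.user_id,
         t2.user_id
having count(t1.ip) >= 3;
```

On the data from section 2, pair (1, 2) should list 223.104.41.101, 223.104.41.104 and 223.104.41.122, and pair (2, 3) should list 223.104.41.101, 223.104.41.103 and 223.104.41.122, which agrees with the rows of step (2).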