A table has three fields: user_id, ip and log_time. The requirement is to find the pairs of users who have used at least 3 IPs in common.
1. Prepare the table data
```sql
-- Create the table
create table dms.user_login_log(
user_id string
,ip string
,log_time string
);
-- Insert sample data (user_id is quoted to match the string column type)
insert into dms.user_login_log values
('102','192.168.10.101','2022-05-10 11:04:30'),
('102','192.168.10.102','2022-05-10 11:05:00'),
('102','192.168.10.103','2022-05-10 11:06:00'),
('102','192.168.10.104','2022-05-10 11:07:00'),
('101','192.168.10.101','2022-05-10 11:00:00'),
('101','192.168.10.101','2022-05-10 11:01:00'),
('101','192.168.10.102','2022-05-10 11:02:00'),
('101','192.168.10.103','2022-05-10 11:03:00'),
('101','192.168.10.104','2022-05-10 11:04:00'),
('103','192.168.10.102','2022-05-10 11:08:00'),
('103','192.168.10.103','2022-05-10 11:08:00'),
('103','192.168.10.104','2022-05-10 11:10:00'),
('104','192.168.10.103','2022-05-10 11:11:00'),
('104','192.168.10.104','2022-05-10 11:12:00'),
('105','192.168.10.105','2022-05-10 11:13:00');
```
2. Requirement analysis
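What we ultimately want are the user pairs that share at least 3 IPs. For example, users 101 and 102 share 4 IPs, users 101 and 103 share 3, and users 102 and 103 share 3. This can be done with a self-join on ip. Each user's (user_id, ip) combinations must be deduplicated first; otherwise repeated logins from the same IP (user 101 logged in from 192.168.10.101 twice) would inflate the shared count.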
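The pairing logic can be sanity-checked outside the database. The following is a minimal Python sketch, not part of the original solution: it copies the sample rows, collects a set of IPs per user (which plays the role of the `group by user_id,ip` deduplication), and intersects the sets for every unordered user pair. The function name `shared_ip_pairs` and its `min_shared` parameter are hypothetical.

```python
from itertools import combinations

# Sample rows copied from the INSERT statement: (user_id, ip, log_time)
ROWS = [
    ("102", "192.168.10.101", "2022-05-10 11:04:30"),
    ("102", "192.168.10.102", "2022-05-10 11:05:00"),
    ("102", "192.168.10.103", "2022-05-10 11:06:00"),
    ("102", "192.168.10.104", "2022-05-10 11:07:00"),
    ("101", "192.168.10.101", "2022-05-10 11:00:00"),
    ("101", "192.168.10.101", "2022-05-10 11:01:00"),
    ("101", "192.168.10.102", "2022-05-10 11:02:00"),
    ("101", "192.168.10.103", "2022-05-10 11:03:00"),
    ("101", "192.168.10.104", "2022-05-10 11:04:00"),
    ("103", "192.168.10.102", "2022-05-10 11:08:00"),
    ("103", "192.168.10.103", "2022-05-10 11:08:00"),
    ("103", "192.168.10.104", "2022-05-10 11:10:00"),
    ("104", "192.168.10.103", "2022-05-10 11:11:00"),
    ("104", "192.168.10.104", "2022-05-10 11:12:00"),
    ("105", "192.168.10.105", "2022-05-10 11:13:00"),
]


def shared_ip_pairs(rows, min_shared=3):
    """Return {(user_a, user_b): shared_ip_count} for pairs sharing >= min_shared IPs."""
    ips_by_user = {}
    for user_id, ip, _log_time in rows:
        # A set keeps each IP once per user, like group by user_id, ip
        ips_by_user.setdefault(user_id, set()).add(ip)

    result = {}
    # combinations() yields each unordered pair once and never pairs a user with itself
    for a, b in combinations(sorted(ips_by_user), 2):
        shared = len(ips_by_user[a] & ips_by_user[b])
        if shared >= min_shared:
            result[(a, b)] = shared
    return result


if __name__ == "__main__":
    for (a, b), n in sorted(shared_ip_pairs(ROWS).items()):
        print(a, b, n)
```

This sketch compares every pair of users, so it only suits small data. The SQL self-join on ip scales better because it only ever pairs users who actually logged in from the same IP.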
3. SQL implementation
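```sql
select
A1.user_id as user_id_1
,A2.user_id as user_id_2
,count(1) as shared_ip_cnt
from
-- Deduplicate (user_id, ip) so repeated logins from one IP count once
(select
user_id
,ip
from dms.user_login_log
group by user_id,ip) A1
join
(select
user_id
,ip
from dms.user_login_log
group by user_id,ip) A2
-- Pair up users who logged in from the same IP
ON A1.ip = A2.ip
-- Keep each pair only once and drop a user paired with itself
where A1.user_id>A2.user_id
group by A1.user_id,A2.user_id
having count(1)>=3;
```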
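Running the query on the sample data returns three rows: users 102 and 101 with 4 shared IPs, users 103 and 101 with 3, and users 103 and 102 with 3.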
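To check the query without a Hive cluster, the sketch below runs the same self-join against an in-memory SQLite database using Python's standard `sqlite3` module. It is a stand-in, not Hive, and rests on a few assumptions: the `dms.` database prefix is dropped, the columns are TEXT to mirror the Hive `string` columns, and an `order by` is added so the output is deterministic. The helper name `run_query` is hypothetical.

```python
import sqlite3

# Same sample rows as the INSERT statement: (user_id, ip, log_time)
ROWS = [
    ("102", "192.168.10.101", "2022-05-10 11:04:30"),
    ("102", "192.168.10.102", "2022-05-10 11:05:00"),
    ("102", "192.168.10.103", "2022-05-10 11:06:00"),
    ("102", "192.168.10.104", "2022-05-10 11:07:00"),
    ("101", "192.168.10.101", "2022-05-10 11:00:00"),
    ("101", "192.168.10.101", "2022-05-10 11:01:00"),
    ("101", "192.168.10.102", "2022-05-10 11:02:00"),
    ("101", "192.168.10.103", "2022-05-10 11:03:00"),
    ("101", "192.168.10.104", "2022-05-10 11:04:00"),
    ("103", "192.168.10.102", "2022-05-10 11:08:00"),
    ("103", "192.168.10.103", "2022-05-10 11:08:00"),
    ("103", "192.168.10.104", "2022-05-10 11:10:00"),
    ("104", "192.168.10.103", "2022-05-10 11:11:00"),
    ("104", "192.168.10.104", "2022-05-10 11:12:00"),
    ("105", "192.168.10.105", "2022-05-10 11:13:00"),
]

# The query from section 3, minus the dms. prefix, plus an order by for stable output
QUERY = """
select A1.user_id as user_id_1, A2.user_id as user_id_2, count(1) as shared_ip_cnt
from (select user_id, ip from user_login_log group by user_id, ip) A1
join (select user_id, ip from user_login_log group by user_id, ip) A2
  on A1.ip = A2.ip
where A1.user_id > A2.user_id
group by A1.user_id, A2.user_id
having count(1) >= 3
order by user_id_1, user_id_2
"""


def run_query(rows):
    """Load rows into a throwaway SQLite table and return the query's result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table user_login_log (user_id text, ip text, log_time text)")
        conn.executemany("insert into user_login_log values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query(ROWS):
        print(row)
```

Because user_id is a string, `A1.user_id > A2.user_id` compares lexicographically. That is fine here: the filter only needs some consistent ordering so that each pair appears exactly once.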