SQL Interview Practice: Merging User Browsing Sessions

Contents

  • [1 Problem](#1 Problem)
  • [2 Table creation](#2 Table creation)
  • [3 Solution](#3 Solution)

1 Problem

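There is a user access log table that records the user id and the access time (a Unix timestamp in seconds). If the interval between two consecutive accesses by the same user is less than 60 s, they count as one browsing session, so a gap of exactly 60 s or more starts a new session. Merge each user's accesses into browsing sessions.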

Sample data

+----------+--------------+
| user_id  | access_time  |
+----------+--------------+
| 1        | 1736337600   |
| 1        | 1736337660   |
| 2        | 1736337670   |
| 1        | 1736337710   |
| 3        | 1736337715   |
| 2        | 1736337750   |
| 1        | 1736337760   |
| 3        | 1736337820   |
| 2        | 1736337850   |
| 1        | 1736337910   |
+----------+--------------+

2 Table creation

```sql
-- Create the table
CREATE TABLE user_access_log (
  user_id INT,
  access_time BIGINT  -- Unix timestamp in seconds
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
-- Insert the sample data
insert into user_access_log (user_id,access_time)
values
(1,1736337600),
(1,1736337660),
(2,1736337670),
(1,1736337710),
(3,1736337715),
(2,1736337750),
(1,1736337760),
(3,1736337820),
(2,1736337850),
(1,1736337910);
```
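The raw epoch values are hard to read at a glance. As an optional sanity check, the sketch below lists the loaded rows with a human-readable time column using Hive's `from_unixtime`. The alias `access_time_readable` is only illustrative, and the rendered time depends on the session time zone (the first sample value corresponds to 2025-01-08 12:00:00 UTC).

```sql
-- Optional check: show each access with a readable timestamp.
-- from_unixtime interprets access_time as seconds since the Unix epoch.
select user_id,
       access_time,
       from_unixtime(access_time, 'yyyy-MM-dd HH:mm:ss') as access_time_readable
from user_access_log
order by user_id, access_time;
```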

3 Solution

(1) For each user, compute the time difference between each access and the previous one.

```sql
select user_id,
       access_time,
       last_access_time,
       access_time - last_access_time as time_diff
from (select user_id,
             access_time,
             lag(access_time) over (partition by user_id order by access_time) as last_access_time
      from user_access_log) t;
```

Result

+----------+--------------+-------------------+------------+
| user_id  | access_time  | last_access_time  | time_diff  |
+----------+--------------+-------------------+------------+
| 1        | 1736337600   | NULL              | NULL       |
| 1        | 1736337660   | 1736337600        | 60         |
| 1        | 1736337710   | 1736337660        | 50         |
| 1        | 1736337760   | 1736337710        | 50         |
| 1        | 1736337910   | 1736337760        | 150        |
| 2        | 1736337670   | NULL              | NULL       |
| 2        | 1736337750   | 1736337670        | 80         |
| 2        | 1736337850   | 1736337750        | 100        |
| 3        | 1736337715   | NULL              | NULL       |
| 3        | 1736337820   | 1736337715        | 105        |
+----------+--------------+-------------------+------------+

(2) Flag whether each access starts a new session.

```sql
select user_id,
       access_time,
       last_access_time,
       if(access_time - last_access_time >= 60, 1, 0) as is_new_group
from (select user_id,
             access_time,
             lag(access_time) over (partition by user_id order by access_time) as last_access_time
      from user_access_log) t;
```

Result

+----------+--------------+-------------------+---------------+
| user_id  | access_time  | last_access_time  | is_new_group  |
+----------+--------------+-------------------+---------------+
| 1        | 1736337600   | NULL              | 0             |
| 1        | 1736337660   | 1736337600        | 1             |
| 1        | 1736337710   | 1736337660        | 0             |
| 1        | 1736337760   | 1736337710        | 0             |
| 1        | 1736337910   | 1736337760        | 1             |
| 2        | 1736337670   | NULL              | 0             |
| 2        | 1736337750   | 1736337670        | 1             |
| 2        | 1736337850   | 1736337750        | 1             |
| 3        | 1736337715   | NULL              | 0             |
| 3        | 1736337820   | 1736337715        | 1             |
+----------+--------------+-------------------+---------------+
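Note that each user's first row gets 0 only because `NULL >= 60` is not true, so `if()` falls through to its third argument. If that implicit behavior feels fragile, one possible variation (not the query used in the rest of this post) spells the cases out with `case when` and flags the first access as 1, so that the running sum in the next step would start at 1 instead of 0.

```sql
-- Variation: make the NULL case explicit and let each user's first access open session 1.
select user_id,
       access_time,
       last_access_time,
       case
           when last_access_time is null then 1              -- first access of this user
           when access_time - last_access_time >= 60 then 1  -- gap of 60 s or more: new session
           else 0                                            -- gap under 60 s: same session
       end as is_new_group
from (select user_id,
             access_time,
             lag(access_time) over (partition by user_id order by access_time) as last_access_time
      from user_access_log) t;
```

With this variation only the first row of each user changes (0 becomes 1); the grouping itself is the same.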

(3) Derive the result.

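Use sum() over (partition by ... order by ...) as a running sum to assign a group ID. When an aggregate function is used as a window function with an order by, the value for each row is computed from the start of the partition up to the current row.

The trick: set the flag to 1 on a row that must start a new group and to 0 otherwise. The running sum then increases exactly on the rows that start a new group, so rows sharing the same running sum belong to the same group, and grouping by that value gives the merged result.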
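To see the running sum in isolation, here is a small sketch on hand-written data. The inline rows built with `stack` and the column names `seq` and `flag` are illustrative only (the flags mirror user 1's `is_new_group` values), and it assumes a Hive version that allows a subquery without a `from` clause. The frame is written out explicitly; without it Hive defaults to a `range` frame from the start of the partition to the current row, which differs from `rows` only when the ordering column has ties.

```sql
-- Hypothetical inline data: seq gives the row order, flag is the 0/1 new-group marker.
select seq,
       flag,
       sum(flag) over (order by seq
                       rows between unbounded preceding and current row) as group_id
from (select stack(5, 1, 0, 2, 1, 3, 0, 4, 0, 5, 1) as (seq, flag)) t;
-- flags    0, 1, 0, 0, 1
-- group_id 0, 1, 1, 1, 2  (the value changes only where flag = 1)
```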

```sql
with t_group as
         (select user_id,
                 access_time,
                 last_access_time,
                 if(access_time - last_access_time >= 60, 1, 0) as is_new_group
          from (select user_id,
                       access_time,
                       lag(access_time) over (partition by user_id order by access_time) as last_access_time
                from user_access_log) t)
select user_id,
       access_time,
       last_access_time,
       is_new_group,
       sum(is_new_group) over (partition by user_id order by access_time asc) as group_id
from t_group;
```

Result

+----------+--------------+-------------------+---------------+-----------+
| user_id  | access_time  | last_access_time  | is_new_group  | group_id  |
+----------+--------------+-------------------+---------------+-----------+
| 1        | 1736337600   | NULL              | 0             | 0         |
| 1        | 1736337660   | 1736337600        | 1             | 1         |
| 1        | 1736337710   | 1736337660        | 0             | 1         |
| 1        | 1736337760   | 1736337710        | 0             | 1         |
| 1        | 1736337910   | 1736337760        | 1             | 2         |
| 2        | 1736337670   | NULL              | 0             | 0         |
| 2        | 1736337750   | 1736337670        | 1             | 1         |
| 2        | 1736337850   | 1736337750        | 1             | 2         |
| 3        | 1736337715   | NULL              | 0             | 0         |
| 3        | 1736337820   | 1736337715        | 1             | 1         |
+----------+--------------+-------------------+---------------+-----------+
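
The result above still has one row per access. If the goal is one row per browsing session, a possible last step is to aggregate by user_id and group_id. The sketch below is one way to do that; the CTE name `t_session` and the output columns `session_start`, `session_end` and `access_cnt` are names chosen here for illustration, not part of the original solution.

```sql
-- Sketch: collapse each (user_id, group_id) pair into a single session row.
with t_group as
         (select user_id,
                 access_time,
                 if(access_time - last_access_time >= 60, 1, 0) as is_new_group
          from (select user_id,
                       access_time,
                       lag(access_time) over (partition by user_id order by access_time) as last_access_time
                from user_access_log) t),
     t_session as
         (select user_id,
                 access_time,
                 sum(is_new_group) over (partition by user_id order by access_time) as group_id
          from t_group)
select user_id,
       group_id,
       min(access_time) as session_start,  -- first access in the session
       max(access_time) as session_end,    -- last access in the session
       count(*)         as access_cnt      -- number of accesses merged into the session
from t_session
group by user_id, group_id
order by user_id, group_id;
```

On the sample data this should give eight sessions: user 1's accesses at 1736337660, 1736337710 and 1736337760 merge into one session, and every other access forms a session of its own.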