SQL面试题练习 —— 合并用户浏览行为

目录

  • [1 题目](#1 题目)
  • [2 建表语句](#2 建表语句)
  • [3 题解](#3 题解)

1 题目

有一份用户访问记录表,记录用户id和访问时间,如果用户访问时间间隔小于60s则认为时一次浏览,请合并用户的浏览行为。

样例数据

复制代码
+----------+--------------+
| user_id  | access_time  |
+----------+--------------+
| 1        | 1736337600   |
| 1        | 1736337660   |
| 2        | 1736337670   |
| 1        | 1736337710   |
| 3        | 1736337715   |
| 2        | 1736337750   |
| 1        | 1736337760   |
| 3        | 1736337820   |
| 2        | 1736337850   |
| 1        | 1736337910   |
+----------+--------------+

2 建表语句

sql 复制代码
--建表语句
CREATE TABLE user_access_log (
  user_id INT,
  access_time BIGINT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
--插入数据
insert into user_access_log (user_id,access_time)
values
(1,1736337600),
(1,1736337660),
(2,1736337670),
(1,1736337710),
(3,1736337715),
(2,1736337750),
(1,1736337760),
(3,1736337820),
(2,1736337850),
(1,1736337910);

3 题解

(1)分用户计算出每次点击时间差;

sql 复制代码
select user_id,
       access_time,
       last_access_time,
       access_time - last_access_time as time_diff
from (select user_id,
             access_time,
             lag(access_time) over (partition by user_id order by access_time) as last_access_time
      from user_access_log) t

执行结果

复制代码
+----------+--------------+-------------------+------------+
| user_id  | access_time  | last_access_time  | time_diff  |
+----------+--------------+-------------------+------------+
| 1        | 1736337600   | NULL              | NULL       |
| 1        | 1736337660   | 1736337600        | 60         |
| 1        | 1736337710   | 1736337660        | 50         |
| 1        | 1736337760   | 1736337710        | 50         |
| 1        | 1736337910   | 1736337760        | 150        |
| 2        | 1736337670   | NULL              | NULL       |
| 2        | 1736337750   | 1736337670        | 80         |
| 2        | 1736337850   | 1736337750        | 100        |
| 3        | 1736337715   | NULL              | NULL       |
| 3        | 1736337820   | 1736337715        | 105        |
+----------+--------------+-------------------+------------+

(2)确认是否是新的访问

sql 复制代码
select user_id,
       access_time,
       last_access_time,
       if(access_time - last_access_time >= 60, 1, 0) as is_new_group
from (select user_id,
             access_time,
             lag(access_time) over (partition by user_id order by access_time) as last_access_time
      from user_access_log) t

执行结果

复制代码
+----------+--------------+-------------------+---------------+
| user_id  | access_time  | last_access_time  | is_new_group  |
+----------+--------------+-------------------+---------------+
| 1        | 1736337600   | NULL              | 0             |
| 1        | 1736337660   | 1736337600        | 1             |
| 1        | 1736337710   | 1736337660        | 0             |
| 1        | 1736337760   | 1736337710        | 0             |
| 1        | 1736337910   | 1736337760        | 1             |
| 2        | 1736337670   | NULL              | 0             |
| 2        | 1736337750   | 1736337670        | 1             |
| 2        | 1736337850   | 1736337750        | 1             |
| 3        | 1736337715   | NULL              | 0             |
| 3        | 1736337820   | 1736337715        | 1             |
+----------+--------------+-------------------+---------------+

(3)得出结果

使用sum()over(partition by ...... order by ......)累加计算,给出组ID。聚合函数开窗使用order by 计算结果是从分组开始计算到当前行的结果。

这里的技巧:需要新建组的时候就给标签赋值1,否则0,然后累加计算结果在新建组的时候值就会变化,根据聚合值分组,得到合并结果。

sql 复制代码
with t_group as
         (select user_id,
                 access_time,
                 last_access_time,
                 if(access_time - last_access_time >= 60, 1, 0) as is_new_group
          from (select user_id,
                       access_time,
                       lag(access_time) over (partition by user_id order by access_time) as last_access_time
                from user_access_log) t)
select user_id,
       access_time,
       last_access_time,
       is_new_group,
       sum(is_new_group) over (partition by user_id order by access_time asc) as group_id
from t_group

执行结果

复制代码
+----------+--------------+-------------------+---------------+-----------+
| user_id  | access_time  | last_access_time  | is_new_group  | group_id  |
+----------+--------------+-------------------+---------------+-----------+
| 1        | 1736337600   | NULL              | 0             | 0         |
| 1        | 1736337660   | 1736337600        | 1             | 1         |
| 1        | 1736337710   | 1736337660        | 0             | 1         |
| 1        | 1736337760   | 1736337710        | 0             | 1         |
| 1        | 1736337910   | 1736337760        | 1             | 2         |
| 2        | 1736337670   | NULL              | 0             | 0         |
| 2        | 1736337750   | 1736337670        | 1             | 1         |
| 2        | 1736337850   | 1736337750        | 1             | 2         |
| 3        | 1736337715   | NULL              | 0             | 0         |
| 3        | 1736337820   | 1736337715        | 1             | 1         |
+----------+--------------+-------------------+---------------+-----------+
相关推荐
曹牧3 分钟前
oracle kv字符串转换为多行两列
数据库·oracle
CV艺术家14 分钟前
java原mysql切换国产达梦数据库
数据库·mysql
好大哥呀14 分钟前
如何在Spring Boot中配置数据库连接?
数据库·spring boot·后端
xcLeigh21 分钟前
IoTDB数据订阅API实战:实时消费数据+TsFile订阅全攻略
数据库·api·iotdb·数据备份·tsfile·数据订阅
许杰小刀24 分钟前
使用 Python 将 Excel 数据批量导入到数据库中(SQLite)
数据库·python·excel
一个天蝎座 白勺 程序猿25 分钟前
Apache IoTDB(16):时序数据库的数据删除从单点精准清除到企业级数据生命周期管理
数据库·apache·时序数据库·iotdb
努力进修28 分钟前
【MySQL】90% 的 MySQL 性能问题都和它有关!索引的正确打开方式,看完少走 3 年弯路
数据库·mysql
架构师老Y29 分钟前
005、数据库选型与ORM技术:SQLAlchemy深度解析
数据库·python
清水白石00830 分钟前
Python 在数据栈中的边界:何时高效原型、何时切换到 SQL、Spark、Rust 或数据库原生能力
数据库·python·自动化
dishugj31 分钟前
sqlplus / as sysdba登录数据库报错ora-01017解决办法
数据库·oracle