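Just finished the interview, and I am well and truly dead.
On the earlier rote-knowledge questions, I was also told to go off and study layering and understand what each layer is for.
Let me start with the two SQL questions at the end.
Counting mutual-follow pairs
I didn't read the question carefully and missed that the followed-users field is an array! I was stunned. The interviewer asked me about the explode function, and all I could say was that I had never used it in my practice exercises!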
sql
I thought something like this would be enough, but Hive kept throwing an error on explode, and that was it for me...
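WITH exploded_fans AS (
  -- Hive rejects explode() next to other columns in a SELECT list;
  -- LATERAL VIEW is the supported way to keep from_user beside each element.
  -- If to_user_array is really a comma-separated string, use explode(split(f.to_user_array, ',')).
  SELECT f.from_user, t.to_user
  FROM fans f
  LATERAL VIEW explode(f.to_user_array) t AS to_user
)
SELECT COUNT(*) AS mutual_follows_count
FROM exploded_fans ef1
JOIN exploded_fans ef2
  ON ef1.from_user = ef2.to_user
 AND ef1.to_user = ef2.from_user
-- without this filter every pair is counted twice, once as (a, b) and once as (b, a)
WHERE ef1.from_user < ef1.to_user;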
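Consecutive logins
For this one I only got as far as ROW_NUMBER, and even that was wrong: I left out PARTITION BY at first! Then the interviewer reminded me about the "consecutive" part and I froze. Completely froze.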
sql
This should work, I think.
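WITH ranked_logs AS (
  SELECT
    uid,
    dt,
    ROW_NUMBER() OVER (PARTITION BY uid ORDER BY dt) AS row_num
  FROM (
    -- one row per user per day; assumes dt holds a date with no time part
    SELECT DISTINCT uid, dt FROM user_log
  ) d
),
date_diff AS (
  -- compare each login day with the day 6 rows earlier for the same user
  SELECT
    a.uid,
    a.dt,
    DATEDIFF(a.dt, b.dt) AS diff
  FROM ranked_logs a
  JOIN ranked_logs b
    ON a.uid = b.uid
   AND a.row_num = b.row_num + 6
)
-- 7 distinct days that span exactly 6 days must be 7 consecutive days.
-- Counting adjacent pairs (diff = 1, HAVING COUNT(*) = 6) would also match
-- two separate shorter streaks and would miss streaks longer than 7.
SELECT DISTINCT uid
FROM date_diff
WHERE diff = 6;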
sql
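There is a user fans table, fans, with two fields:
from_user: the follower's user ID,
to_user_array: an array of the users being followed (the prompt describes it as the follower's set of fan IDs).
If two people follow each other, they count as one pair. Find the number of mutual-follow pairs.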
1 2,3
2 1
3 1
# split
# explode
# t t1 join t t2 on t1 in t2.to_user_array
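The last note above, joining the table to itself on membership in to_user_array, could be sketched roughly as follows in Hive. This is only a sketch under assumptions: to_user_array is a real ARRAY whose element type matches from_user, and each from_user appears in a single row. If the column is actually a comma-separated string, it would need split(to_user_array, ',') and a CAST of from_user to STRING first.

```sql
-- Sketch only: assumes fans(from_user, to_user_array ARRAY<...>),
-- one row per from_user, element type equal to from_user's type.
SELECT COUNT(*) AS mutual_follows_count
FROM fans a
CROSS JOIN fans b
WHERE a.from_user < b.from_user                       -- count each pair once
  AND array_contains(a.to_user_array, b.from_user)    -- a follows b
  AND array_contains(b.to_user_array, a.from_user);   -- b follows a
```

On the three sample rows this gives 2 (the pairs 1-2 and 1-3). The cross join is quadratic in the number of users, and Hive may reject it when strict checks on cartesian products are enabled, so the explode version is usually the safer choice on a large table.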
There is a user log table, user_log, with 2 fields:
uid: user ID
dt: user login date
Find the users who logged in on 7 consecutive days.
# CTE
# consecutive!
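-- Interview attempt: this only finds users with at least 7 login rows;
-- it does not check that the days are consecutive.
with t as (
  select
    uid,
    dt,
    row_number() over(partition by uid order by dt asc) as rk
  from user_log
)
select distinct uid from t where rk >= 7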
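For the "consecutive" part that the attempt above never reached, one common approach is the row-number-minus-date trick (often called gaps and islands): within one user, consecutive days all map to the same value of dt minus the row number. Below is a rough sketch, not necessarily what the interviewer had in mind. It assumes dt is a DATE or a 'yyyy-MM-dd' style string; the names days, numbered, streaks, login_day, anchor, and streak_len are made up for illustration.

```sql
-- Sketch only: assumes user_log(uid, dt) with dt as DATE or a 'yyyy-MM-dd...' string.
WITH days AS (
  -- one row per user per calendar day, so repeated logins on a day do not break the numbering
  SELECT DISTINCT uid, to_date(dt) AS login_day
  FROM user_log
),
numbered AS (
  SELECT
    uid,
    login_day,
    ROW_NUMBER() OVER (PARTITION BY uid ORDER BY login_day) AS rk
  FROM days
),
streaks AS (
  -- days in one unbroken run share the same (login_day - rk) anchor date
  SELECT
    uid,
    date_sub(login_day, rk) AS anchor,
    COUNT(*) AS streak_len
  FROM numbered
  GROUP BY uid, date_sub(login_day, rk)
)
SELECT DISTINCT uid
FROM streaks
WHERE streak_len >= 7;
```

Using >= 7 rather than = 7 keeps users whose streak is longer than a week, which seems to be what "logged in on 7 consecutive days" is asking for.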