HIVE实战处理(二十四)留存用户数

bash 复制代码
留存概念:
次X日活跃留存,次X日新增留存,也就是看今天的新增或活跃用户在后续几天的留存情况

一、留存表的生成逻辑

因为用户活跃日期和留存的日期无法对齐所以搞了2级分区(dt,static_day)

1)首先获得计算日D、根据要出的次X日留存,推算出前面的DT ,整体从活跃表里根据这些日期生成临时活跃表tmp1

2)分别把计算DT和前X日的DT进行匹配,按相差的天数进行匹配,如果匹配一直分别得到对应的次X日留存标识。

3)需要使用1个新的字段存储留存指标的的日期,比如20250701号的留存keep1_num只能等20250702号过完才能计算,那对应也是7.1号算留存日期,是指在DT=20250702的留存时间。

所以根据dt往前推算的日期都是留存日期,不能写到dt这个字段里,因为除了留存指标外还要计算统计日的指标。

如果留存日期=统计日期的,出的当日活跃。留存日期< 统计日期的话,出的是次X日留存指标。

bash 复制代码
--活跃临时表
create table tmp1 as 
select 
    ,t1.uuid
    ,t1.dt                                  as statis_day
    ,case when t1.dt='${DT}' then 'Y' else 'N' end                      as keep_0d_active_flag
	,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-1), '-', '') then 'Y'
          else 'N' end                      as keep_1d_active_flag
    ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-2), '-', '') then 'Y'
          else 'N' end                      as keep_2d_active_flag     
    ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-3), '-', '') then 'Y'
          else 'N' end                      as keep_3d_active_flag      
    ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-4), '-', '') then 'Y'
          else 'N' end                      as keep_4d_active_flag      
    ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-5), '-', '') then 'Y'
          else 'N' end                      as keep_5d_active_flag      
    ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-6), '-', '') then 'Y'
          else 'N' end                      as keep_6d_active_flag      
    ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-7), '-', '') then 'Y'
          else 'N' end                      as keep_7d_active_flag        
         
from 活跃表 t1
where t1.dt in ( 
${DT}
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-1), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-2), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-3), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-4), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-5), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-6), '-', '')    
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-7), '-', '')
)


;

--当日活跃以及留存指标
insert overwrite table 留存表 partition(dt='${DT}')
select   
    group_id
   ,statis_day
   ,channel
   ,version
   ,sum(case when keep_0d_active_flag='Y' then 1 else 0 end)  as av
   ,sum(case when keep_1d_active_flag='Y' then 1 else 0 end)  as keep_1d_av
   ,sum(case when keep_2d_active_flag='Y' then 1 else 0 end)  as keep_2d_av
   ,sum(case when keep_3d_active_flag='Y' then 1 else 0 end)  as keep_3d_av
   ,sum(case when keep_4d_active_flag='Y' then 1 else 0 end)  as keep_4d_av
   ,sum(case when keep_5d_active_flag='Y' then 1 else 0 end)  as keep_5d_av
   ,sum(case when keep_6d_active_flag='Y' then 1 else 0 end)  as keep_6d_av
   ,sum(case when keep_7d_active_flag='Y' then 1 else 0 end)  as keep_7d_av
from(
        select 
             cast(grouping__id as bigint)& 7 & 3  as group_id
            ,channel
			,uuid
            ,statis_day
            ,max(keep_1d_active_flag)      as keep_1d_active_flag
            ,max(keep_2d_active_flag)      as keep_2d_active_flag
            ,max(keep_3d_active_flag)      as keep_3d_active_flag
            ,max(keep_4d_active_flag)      as keep_4d_active_flag
            ,max(keep_5d_active_flag)      as keep_5d_active_flag
            ,max(keep_6d_active_flag)      as keep_6d_active_flag
            ,max(keep_7d_active_flag)      as keep_7d_active_flag 
        from tmp1
        group by 
                ,channel  --1
				,version	--2		
				,uuid       -- 4
				,statis_day --8				
                             
        grouping sets(          
             (channel,uuid,statis_day)    
            ,(version,uuid,statis_day)
			,(uuid,statis_day)			
			)
) ta
group by  
    group_id
   ,statis_day
   ,channel
   ,version

二、对于留存的表的查询处理

1)非留存指标的话,直接使用where dt between '20250701' and '20250707'

2)对于留存指标要取static_day,这个static_day是代表留存日期在dt的不同留存指标。

select

dt

,sum(active_num)

,sum(keep1_num)

,sum(keep2_num)

,sum(keep3_num)

,sum(keep4_num)

from

(select

dt,

,active_num

,0 as keep1_num

,0 as keep2_num

,0 as keep3_num

,0 as keep4_num

from 留存表 where dt between '20250701' and '20250704'

union all

select

static_day dt,

,0 as active_num

,keep1_num

,keep2_num

,keep3_num

,keep4_num

from 留存表 where static_day between '20250701' and '20250704'

) t group by dt

相关推荐
WHD30624 分钟前
苏州数据库(SQL Oracle)文件损坏修复
hadoop·sql·sqlite·flume·memcached
ClouderaHadoop34 分钟前
CDH集群机房搬迁方案
大数据·hadoop·cloudera·cdh
心疼你的一切2 小时前
基于CANN仓库打造轻量级AIGC:一键生成图片语义描述
数据仓库·aigc·cann
AC赳赳老秦6 小时前
代码生成超越 GPT-4:DeepSeek-V4 编程任务实战与 2026 开发者效率提升指南
数据库·数据仓库·人工智能·科技·rabbitmq·memcache·deepseek
心疼你的一切7 小时前
拆解 CANN 仓库:实现 AIGC 文本生成昇腾端部署
数据仓库·深度学习·aigc·cann
心疼你的一切7 小时前
模态交响:CANN驱动的跨模态AIGC统一架构
数据仓库·深度学习·架构·aigc·cann
心疼你的一切8 小时前
解锁CANN仓库核心能力:从零搭建AIGC轻量文本生成实战(附代码+流程图)
数据仓库·深度学习·aigc·流程图·cann
秃了也弱了。10 小时前
StarRocks:高性能分析型数据仓库
数据仓库
心疼你的一切10 小时前
数字智人:CANN加速的实时数字人生成与交互
数据仓库·深度学习·aigc·交互·cann
心疼你的一切11 小时前
语音革命:CANN驱动实时语音合成的技术突破
数据仓库·开源·aigc·cann