Using ClickHouse's ReplacingMergeTree engine for an active-player information table

Source:

https://www.lmlyz.online/index/detail/id/132.html

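Purpose:

A scheduled job runs every day and records the information of all players who were active that day. The data feeds later statistics, for example the follow-up retention of players grouped by how much they recharged on the server's opening day.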

ReplacingMergeTree overview:

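1. MergeTree supports a primary key, but the key mainly serves to narrow the range a query has to scan. It carries no uniqueness constraint, so rows with the same key can be written without error. In some cases, though, you need a table without rows that share a key. ReplacingMergeTree adds deduplication on top of MergeTree: rows with the same sorting key (ORDER BY) are collapsed into one, but only when data parts within the same partition are merged. Writing a duplicate row never raises an exception, and rows in different partitions are never deduplicated against each other.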

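2. The engine is declared as ENGINE = ReplacingMergeTree([ver]). ver is an optional parameter naming a column of type UInt*, Date, DateTime or DateTime64, and it determines which row survives deduplication. If ver is not set, a merge keeps the last inserted row of each group of duplicates; if ver is set, it keeps the row with the largest ver value.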
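
The two points above can be tried out on a throwaway table. The following is a minimal sketch, not part of the original article: the table name rmt_demo and its columns are hypothetical and exist only for illustration.

```sql
-- Hypothetical demo table; name and columns are assumptions for illustration.
CREATE TABLE rmt_demo
(
    id      UInt64,
    value   String,
    version UInt32
)
ENGINE = ReplacingMergeTree(version)
ORDER BY id;

-- Two separate INSERTs create two data parts that share the sorting key id = 1.
-- Neither INSERT raises an error.
INSERT INTO rmt_demo VALUES (1, 'old', 1);
INSERT INTO rmt_demo VALUES (1, 'new', 2);

-- Until the parts are merged, a plain SELECT may still return both rows.
SELECT * FROM rmt_demo;

-- FINAL applies the replacing logic at query time, so only the row
-- with the largest version (value = 'new') is returned.
SELECT * FROM rmt_demo FINAL;
```

Because the merge time is not under your control, the plain SELECT can show one or two rows depending on when it runs; only the FINAL query is stable.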

Creating the active-player table:

CREATE TABLE player_active_data (
    `create_time` UInt64 DEFAULT toUnixTimestamp(now()) COMMENT 'Insert time',
    `log_date` Date DEFAULT toDate(report_time) COMMENT 'Date',
    `report_time` UInt64 COMMENT 'Statistics date',
    `server_id` UInt64 DEFAULT 0 COMMENT 'Server ID',
    `server_open_time` UInt64 DEFAULT 0 COMMENT 'Server open time',
    `open_days` UInt32 DEFAULT 0 COMMENT 'Day N since server open',
    `open_register_days` UInt32 DEFAULT 0 COMMENT 'Registered on day N since server open',
    `account` String DEFAULT '' COMMENT 'Player account',
    `player_id` UInt64 DEFAULT 0 COMMENT 'Player ID',
    `player_create_time` UInt64 DEFAULT 0 COMMENT 'Player registration time',
    `guild_id` UInt64 DEFAULT 0 COMMENT 'Guild ID',
    `player_name` String DEFAULT '' COMMENT 'Player name',
    `player_level` UInt32 DEFAULT 1 COMMENT 'Player level',
    `player_vip_level` UInt32 DEFAULT 0 COMMENT 'Player VIP level',
    `player_fight` UInt64 DEFAULT 0 COMMENT 'Player combat power',
    `player_charge_money` UInt32 DEFAULT 0 COMMENT 'Cumulative recharge amount',
    `player_first_charge_time` UInt64 DEFAULT 0 COMMENT 'First recharge time',
    `player_first_charge_money` UInt64 DEFAULT 0 COMMENT 'First recharge amount',
    `player_open_day_charge_money` UInt64 DEFAULT 0 COMMENT 'Recharge amount on the server opening day',
    `player_day_charge_money` UInt32 DEFAULT 0 COMMENT 'Recharge amount for the day',
    -- ......
    -- ...... remaining player fields
    -- ......
    ) ENGINE = ReplacingMergeTree(create_time)  -- create_time is the version column
PARTITION BY toYYYYMMDD(fromUnixTimestamp(report_time))  -- one partition per day
ORDER BY (report_time, server_id, player_id)  -- sorting key = deduplication key
SETTINGS index_granularity = 8192;

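This table resolves duplicates by insert time (create_time): for the same day (same report_time), the same server and the same player, a merge keeps only the row with the largest create_time. The job can therefore insert the same player's data repeatedly, and once the parts are merged only the latest row remains.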
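
As a rough illustration of that behaviour, the sketch below re-inserts one player's row for the same day. It assumes the table above has been created with its full column list; all values are made up, and columns not listed fall back to their defaults.

```sql
-- First run of the daily job (illustrative values only).
INSERT INTO player_active_data (create_time, report_time, server_id, player_id, player_level)
VALUES (1747933200, 1747929600, 11111, 1001, 10);

-- The job is re-run later for the same day: same sorting key
-- (report_time, server_id, player_id), larger create_time.
INSERT INTO player_active_data (create_time, report_time, server_id, player_id, player_level)
VALUES (1747936800, 1747929600, 11111, 1001, 12);

-- Until the two parts are merged this may count 2 rows;
-- after a merge only the row with create_time = 1747936800 is left.
SELECT count()
FROM player_active_data
WHERE report_time = 1747929600
  AND server_id = 11111
  AND player_id = 1001;
```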

Notes:

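1. Parts are not merged immediately. Merges run in the background at a time ClickHouse chooses, so a query against this table may still return several rows for the same key. Queries should therefore deduplicate on read, for example with the aggregate function anyLast() to take the last row. Note that anyLast() returns the last value the query happens to encounter, which is not guaranteed to be the row with the largest create_time; argMax(column, create_time) or the FINAL modifier does give that guarantee.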

2. You can also force a merge manually (an expensive operation on large tables):

OPTIMIZE TABLE player_active_data FINAL;

3. Inspect the current data parts:

SELECT
    *
FROM
    system.parts
WHERE
    table = 'player_active_data'
    AND active
ORDER BY
    modification_time DESC;
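
The three notes above can be combined into a few day-to-day queries. This is a hedged sketch rather than the author's own workflow: the filter values are made up, and the partition value 20250523 is an assumption that depends on the server time zone used by fromUnixTimestamp.

```sql
-- Read-time deduplication, option A: FINAL applies the replacing logic
-- while reading, at some extra query cost.
SELECT player_id, player_level, create_time
FROM player_active_data FINAL
WHERE report_time = 1747929600 AND server_id = 11111;

-- Read-time deduplication, option B: argMax picks, per key, the value
-- from the row with the largest create_time.
SELECT
    report_time,
    server_id,
    player_id,
    argMax(player_level, create_time) AS player_level,
    max(create_time) AS latest_create_time
FROM player_active_data
WHERE report_time = 1747929600 AND server_id = 11111
GROUP BY report_time, server_id, player_id;

-- Active parts per partition: a partition that still has more than one
-- active part may contain duplicates that have not been merged yet.
SELECT partition, count() AS part_count, sum(rows) AS row_count
FROM system.parts
WHERE table = 'player_active_data' AND active
GROUP BY partition
ORDER BY partition;

-- Merge a single partition instead of the whole table
-- (20250523 is an assumed partition value).
OPTIMIZE TABLE player_active_data PARTITION 20250523 FINAL;
```

Limiting OPTIMIZE to the partition the job just wrote keeps the forced merge much cheaper than rewriting every partition of the table.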

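Best practice:

Use the table above to compute the follow-up retention of players who recharged on the server's opening day, grouped by opening-day recharge tier.

The SQL is as follows: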

WITH LastData AS (
    -- One row per (report_time, server_id, player_id), even if parts are not merged yet
    SELECT
        report_time,
        server_id,
        player_id,
        toDate ( player_create_time ) AS create_date,
        anyLast ( player_open_day_charge_money ) AS player_open_day_charge_money,
        anyLast ( open_days ) AS open_days 
    FROM
        player_active_data 
    WHERE
        `server_id` IN (11111,222222,333333) 
        AND `report_time` BETWEEN 1747929600 AND 1755791999 
        AND `player_create_time` BETWEEN 1747929600 AND 1748015999 
    GROUP BY
        report_time,
        server_id,
        player_id,
        create_date 
    ) SELECT
    create_date,
    -- Opening-day recharge tier; the alias is lowercase because identifiers are case-sensitive
    CASE
        WHEN player_open_day_charge_money BETWEEN 0 AND 0 THEN '0' 
        WHEN player_open_day_charge_money BETWEEN 1 AND 6 THEN '1' 
        WHEN player_open_day_charge_money BETWEEN 7 AND 30 THEN '2' 
        WHEN player_open_day_charge_money BETWEEN 31 AND 98 THEN '3' 
        WHEN player_open_day_charge_money BETWEEN 99 AND 200 THEN '4' 
        WHEN player_open_day_charge_money BETWEEN 201 AND 500 THEN '5' 
        WHEN player_open_day_charge_money BETWEEN 501 AND 2000 THEN '6' 
        WHEN player_open_day_charge_money BETWEEN 2001 AND 999999 THEN '7' 
        END AS level,
    -- Distinct players active on day N since server open
    countIf ( DISTINCT player_id, open_days = 1 ) AS player_active_1,
    countIf ( DISTINCT player_id, open_days = 2 ) AS player_active_2,
    countIf ( DISTINCT player_id, open_days = 3 ) AS player_active_3,
    countIf ( DISTINCT player_id, open_days = 4 ) AS player_active_4,
    countIf ( DISTINCT player_id, open_days = 5 ) AS player_active_5,
    countIf ( DISTINCT player_id, open_days = 6 ) AS player_active_6,
    countIf ( DISTINCT player_id, open_days = 7 ) AS player_active_7,
    countIf ( DISTINCT player_id, open_days = 8 ) AS player_active_8,
    countIf ( DISTINCT player_id, open_days = 9 ) AS player_active_9,
    countIf ( DISTINCT player_id, open_days = 10 ) AS player_active_10,
    countIf ( DISTINCT player_id, open_days = 11 ) AS player_active_11,
    countIf ( DISTINCT player_id, open_days = 12 ) AS player_active_12,
    countIf ( DISTINCT player_id, open_days = 13 ) AS player_active_13,
    countIf ( DISTINCT player_id, open_days = 14 ) AS player_active_14,
    countIf ( DISTINCT player_id, open_days = 21 ) AS player_active_21,
    countIf ( DISTINCT player_id, open_days = 30 ) AS player_active_30,
    countIf ( DISTINCT player_id, open_days = 60 ) AS player_active_60,
    countIf ( DISTINCT player_id, open_days = 90 ) AS player_active_90 
FROM LastData GROUP BY level, create_date

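The WITH clause ensures that the query reads a single row per key even when the parts have not been merged yet. CASE WHEN then buckets the opening-day recharge amount into tiers, and finally the active players are counted, grouped by tier and registration date.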
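
To check the tier boundaries in isolation, the same CASE expression can be run against a few literal amounts. This is only a sketch; the sample amounts are made up, and no table is needed.

```sql
-- Apply the tiering expression to a handful of sample amounts.
SELECT
    amount,
    CASE
        WHEN amount BETWEEN 0 AND 0 THEN '0'
        WHEN amount BETWEEN 1 AND 6 THEN '1'
        WHEN amount BETWEEN 7 AND 30 THEN '2'
        WHEN amount BETWEEN 31 AND 98 THEN '3'
        WHEN amount BETWEEN 99 AND 200 THEN '4'
        WHEN amount BETWEEN 201 AND 500 THEN '5'
        WHEN amount BETWEEN 501 AND 2000 THEN '6'
        WHEN amount BETWEEN 2001 AND 999999 THEN '7'
    END AS level
FROM
(
    -- Values on both sides of several boundaries, plus one above the last tier.
    SELECT arrayJoin([0, 1, 6, 7, 98, 99, 2000, 2001, 1000000]) AS amount
);
```

An amount above 999999 matches no branch, so its level comes back as NULL unless an ELSE branch is added; whether that case can occur depends on the game's recharge limits.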
