Source:
https://www.lmlyz.online/index/detail/id/132.html
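Purpose:
A scheduled task runs every day and records information about all players who were active that day. The data feeds later statistics, for example the subsequent retention of players grouped by how much they recharged on the server's opening day.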
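Introduction to the ReplacingMergeTree table engine:
1. MergeTree supports a primary key, but the key mainly serves to narrow the range a query has to scan. It carries no uniqueness constraint, so rows with the same primary key can be written without error. In some cases, though, you want the table to hold no rows with duplicate keys. ReplacingMergeTree adds deduplication on top of MergeTree: rows that share the same sorting key (the ORDER BY expression) are collapsed, but only when data parts within a partition are merged. Writing duplicate rows never raises an exception.
2. The engine is declared as ENGINE = ReplacingMergeTree([ver]), where ver is optional. It must name a column of type UInt*, Date, DateTime or DateTime64, and it determines which row survives deduplication: without it, the merge keeps the last row in each group of duplicates (the most recently inserted one); with it, the merge keeps the row with the largest ver value.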
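The two points above can be tried out with a minimal sketch. The table `rmt_demo` and its columns are hypothetical and exist only for illustration. The row with the larger `ver` is inserted first, which shows that the version column, not insertion order, decides which row survives:

```sql
-- Hypothetical demo table; not part of the original article.
CREATE TABLE rmt_demo
(
    id  UInt64,
    val String,
    ver UInt64
)
ENGINE = ReplacingMergeTree(ver)
ORDER BY id;

-- Two separate inserts create two data parts with the same sorting key.
INSERT INTO rmt_demo VALUES (1, 'newer', 2);
INSERT INTO rmt_demo VALUES (1, 'older', 1);

-- Until a merge happens, both rows may still be returned.
SELECT * FROM rmt_demo;

-- Force a merge, then read again: only the row with the largest ver remains.
OPTIMIZE TABLE rmt_demo FINAL;
SELECT * FROM rmt_demo;
```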
Creating the active-player table:
sql
CREATE TABLE player_active_data (
    `create_time` UInt64 DEFAULT toUnixTimestamp(now()) COMMENT 'insertion time',
    `log_date` Date DEFAULT toDate(report_time) COMMENT 'date',
    `report_time` UInt64 COMMENT 'statistics date',
    `server_id` UInt64 DEFAULT 0 COMMENT 'server id',
    `server_open_time` UInt64 DEFAULT 0 COMMENT 'server opening time',
    `open_days` UInt32 DEFAULT 0 COMMENT 'Nth day since the server opened',
    `open_register_days` UInt32 DEFAULT 0 COMMENT 'registered on the Nth day since the server opened',
    `account` String DEFAULT '' COMMENT 'player account',
    `player_id` UInt64 DEFAULT 0 COMMENT 'player id',
    `player_create_time` UInt64 DEFAULT 0 COMMENT 'player registration time',
    `guild_id` UInt64 DEFAULT 0 COMMENT 'guild id',
    `player_name` String DEFAULT '' COMMENT 'player name',
    `player_level` UInt32 DEFAULT 1 COMMENT 'player level',
    `player_vip_level` UInt32 DEFAULT 0 COMMENT 'player VIP level',
    `player_fight` UInt64 DEFAULT 0 COMMENT 'player combat power',
    `player_charge_money` UInt32 DEFAULT 0 COMMENT 'cumulative recharge amount',
    `player_first_charge_time` UInt64 DEFAULT 0 COMMENT 'time of first recharge',
    `player_first_charge_money` UInt64 DEFAULT 0 COMMENT 'amount of first recharge',
    `player_open_day_charge_money` UInt64 DEFAULT 0 COMMENT 'recharge amount on the server opening day',
    `player_day_charge_money` UInt32 DEFAULT 0 COMMENT 'recharge amount for the current day',
    -- ......
    -- remaining player fields
    -- ......
) ENGINE = ReplacingMergeTree(create_time)  -- create_time is the version column
PARTITION BY toYYYYMMDD(fromUnixTimestamp(report_time))  -- one partition per day
ORDER BY (report_time, server_id, player_id)  -- sorting key used for deduplication
SETTINGS index_granularity = 8192;
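This table uses the insertion time (create_time) as its version column. When data parts are merged, only the row with the largest create_time is kept for each player on each server on each day. The same data can therefore be inserted repeatedly; once the parts have merged, only the most recently inserted row remains.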
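As a rough illustration of re-running the daily job, the sketch below inserts the same key twice and then lists keys that still have unmerged copies. The player id `1001` and the level values are made-up sample data, and the inserts rely on the column defaults from the table definition above:

```sql
-- First run of the daily job for one player (omitted columns use their defaults).
INSERT INTO player_active_data (report_time, server_id, player_id, player_level)
VALUES (1747929600, 11111, 1001, 10);

-- Later re-run with fresher data; create_time defaults to the new insertion time.
INSERT INTO player_active_data (report_time, server_id, player_id, player_level)
VALUES (1747929600, 11111, 1001, 12);

-- List sorting keys that currently have more than one row, i.e. not yet merged.
SELECT report_time, server_id, player_id, count() AS copies
FROM player_active_data
GROUP BY report_time, server_id, player_id
HAVING copies > 1;
```

If the last query returns no rows, every key has already been collapsed to a single row.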
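Notes:
1. Merges are not immediate: ClickHouse runs them in the background at a time it chooses, so a query against this table may still return two rows with the same key. When reading, use the aggregate function anyLast() to take the last row for each key. Because anyLast() returns the last value it encounters rather than the one with the largest create_time, argMax(column, create_time) is the stricter choice when the exact latest row matters.
2. A merge can also be triggered manually right away: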
sql
OPTIMIZE TABLE player_active_data FINAL;
3. To inspect the table's current data parts:
sql
SELECT
*
FROM
system.parts
WHERE
table = 'player_active_data'
AND active
ORDER BY
modification_time DESC;
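The notes above can be complemented with a few variations, sketched here under stated assumptions. The `FINAL` modifier deduplicates at read time, `OPTIMIZE ... PARTITION` limits the merge to one day, and the last query narrows the parts listing to a few columns. The partition value `20250523` is an assumption: it presumes the server time zone maps report_time 1747929600 to 2025-05-23.

```sql
-- Query-time deduplication: FINAL collapses rows with the same sorting key while reading.
SELECT player_id, player_level, create_time
FROM player_active_data FINAL
WHERE report_time = 1747929600
  AND server_id = 11111;

-- Merge a single partition instead of the whole table.
-- 20250523 is an assumed partition value; check system.parts for the real ones.
OPTIMIZE TABLE player_active_data PARTITION 20250523 FINAL;

-- A narrower view of the parts list: one row per active part.
SELECT partition, name, rows, modification_time
FROM system.parts
WHERE table = 'player_active_data'
  AND active
ORDER BY partition, name;
```

`FINAL` makes reads more expensive because the merge work happens at query time, so it tends to suit ad hoc checks better than heavy reports.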
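Best practice:
Use the table above to measure the subsequent retention of players who recharged on the server's opening day, grouped by opening-day recharge tier.
The SQL is as follows: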
sql
-- Step 1: collapse to one row per key, so unmerged duplicates are not counted twice.
WITH LastData AS (
    SELECT
        report_time,
        group_id,  -- assumed to be among the columns elided from the CREATE TABLE
        server_id,
        player_id,
        toDate(player_create_time) AS create_date,
        anyLast(player_open_day_charge_money) AS player_open_day_charge_money,
        anyLast(open_days) AS open_days
    FROM
        player_active_data
    WHERE
        `server_id` IN (11111, 222222, 333333)
        AND `report_time` BETWEEN 1747929600 AND 1755791999
        AND `player_create_time` BETWEEN 1747929600 AND 1748015999
    GROUP BY
        report_time,
        group_id,
        server_id,
        player_id,
        create_date
)
-- Step 2: bucket players by opening-day recharge amount and count actives per day.
SELECT
    create_date,
    CASE
        WHEN player_open_day_charge_money = 0 THEN '0'
        WHEN player_open_day_charge_money BETWEEN 1 AND 6 THEN '1'
        WHEN player_open_day_charge_money BETWEEN 7 AND 30 THEN '2'
        WHEN player_open_day_charge_money BETWEEN 31 AND 98 THEN '3'
        WHEN player_open_day_charge_money BETWEEN 99 AND 200 THEN '4'
        WHEN player_open_day_charge_money BETWEEN 201 AND 500 THEN '5'
        WHEN player_open_day_charge_money BETWEEN 501 AND 2000 THEN '6'
        WHEN player_open_day_charge_money BETWEEN 2001 AND 999999 THEN '7'
    END AS level,  -- lower case: ClickHouse identifiers are case-sensitive
    countIf(DISTINCT player_id, open_days = 1) AS player_active_1,
    countIf(DISTINCT player_id, open_days = 2) AS player_active_2,
    countIf(DISTINCT player_id, open_days = 3) AS player_active_3,
    countIf(DISTINCT player_id, open_days = 4) AS player_active_4,
    countIf(DISTINCT player_id, open_days = 5) AS player_active_5,
    countIf(DISTINCT player_id, open_days = 6) AS player_active_6,
    countIf(DISTINCT player_id, open_days = 7) AS player_active_7,
    countIf(DISTINCT player_id, open_days = 8) AS player_active_8,
    countIf(DISTINCT player_id, open_days = 9) AS player_active_9,
    countIf(DISTINCT player_id, open_days = 10) AS player_active_10,
    countIf(DISTINCT player_id, open_days = 11) AS player_active_11,
    countIf(DISTINCT player_id, open_days = 12) AS player_active_12,
    countIf(DISTINCT player_id, open_days = 13) AS player_active_13,
    countIf(DISTINCT player_id, open_days = 14) AS player_active_14,
    countIf(DISTINCT player_id, open_days = 21) AS player_active_21,
    countIf(DISTINCT player_id, open_days = 30) AS player_active_30,
    countIf(DISTINCT player_id, open_days = 60) AS player_active_60,
    countIf(DISTINCT player_id, open_days = 90) AS player_active_90
FROM LastData
GROUP BY level, create_date
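The WITH clause at the top ensures that the query reads only the last row for each key even when the parts have not yet been merged. The CASE WHEN expression then sorts players into tiers by their opening-day recharge amount. Finally, the active players are counted and grouped by tier and registration date.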
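Where the deduplication step must return exactly the row with the largest create_time, one possible variant of the inner query uses argMax instead of anyLast. This is a sketch rather than the article's original method, and the `latest_` aliases are names introduced here for illustration:

```sql
-- Deduplicate per sorting key, taking each value from the row with the largest create_time.
SELECT
    report_time,
    server_id,
    player_id,
    argMax(player_open_day_charge_money, create_time) AS latest_open_day_charge_money,
    argMax(open_days, create_time) AS latest_open_days
FROM player_active_data
WHERE server_id IN (11111, 222222, 333333)
  AND report_time BETWEEN 1747929600 AND 1755791999
GROUP BY
    report_time,
    server_id,
    player_id;
```

Grouping by the same columns as the table's ORDER BY key mirrors what the engine itself does at merge time.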