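User Behavior Analysis System for a Shopping-Guide E-commerce Platform: A Flink-Based Real-Time Data Processing Architecture
Hi everyone, I'm 阿宝, a developer of the 省赚客 app! On the 省赚客 shopping-guide cashback platform, behavior data such as clicks, views, orders, and shares is key to tuning the recommendation algorithm and raising conversion rates. To update user profiles within seconds and trigger marketing actions in real time, we built a stream processing system on Apache Flink. It ingests raw events from Kafka, cleans, aggregates, and enriches them, then writes the results to ClickHouse and Redis, which back the BI dashboards and the real-time strategy engine.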
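Event Collection and Kafka Topic Design
The front-end SDK reports user behavior over HTTP. The requests land in Nginx logs, which Logstash collects and writes to Kafka. The core topics are:
user_click_events: user clicks on an item or task
user_order_events: order created or completed
user_share_events: successful share events
Each message is a JSON object containing fields such as userId, eventType, itemId, timestamp, and sessionId.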
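
To make the message shape concrete, the events could be modeled in Java roughly as follows. This is a minimal sketch rather than the project's actual model: the class name BehaviorEvent, the field types, and the reading of timestamp as epoch milliseconds are assumptions, and only the field names come from the list above. The isValid() check mirrors the null test on userId and itemId that the Flink job applies when it cleans the click stream.

```java
/** Hypothetical in-memory form of one behavior event; field names follow the JSON keys. */
final class BehaviorEvent {
    final String userId;
    final String eventType;
    final String itemId;
    final long timestamp;   // assumed to be epoch milliseconds
    final String sessionId;

    BehaviorEvent(String userId, String eventType, String itemId,
                  long timestamp, String sessionId) {
        this.userId = userId;
        this.eventType = eventType;
        this.itemId = itemId;
        this.timestamp = timestamp;
        this.sessionId = sessionId;
    }

    /** An event is usable only if it identifies both a user and an item. */
    boolean isValid() {
        return userId != null && itemId != null;
    }
}
```

Keeping the validity rule on the event type itself lets every consumer of the topic apply the same check.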

Flink Job Skeleton
The Flink job is written in Java, and its main class is juwatech.cn.flink.UserBehaviorAnalysisJob:
java
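package juwatech.cn.flink;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import juwatech.cn.flink.function.UserPreferenceProcessFunction;
import juwatech.cn.flink.model.ClickEvent;
import juwatech.cn.flink.sink.ClickHouseSink;
// KafkaSourceBuilder, ClickEventMapper, PvUvAgg, PvUvResultWindowFunction and
// RedisUserPreferenceSink are project classes; import them from their own packages.

public class UserBehaviorAnalysisJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10000); // checkpoint every 10 seconds

        // 1. Read the click stream from Kafka
        var clickStream = KafkaSourceBuilder.buildClickSource(env);

        // 2. Clean the events and convert them to internal objects
        var cleanClicks = clickStream
                .filter(event -> event.getUserId() != null && event.getItemId() != null)
                .map(new ClickEventMapper());

        // 3. Real-time PV/UV over 5-minute tumbling windows.
        //    The constant key sends every event to a single parallel instance.
        var pvuv = cleanClicks
                .keyBy(click -> "global")
                .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
                .aggregate(new PvUvAgg(), new PvUvResultWindowFunction());

        // 4. Write the aggregates to ClickHouse
        pvuv.addSink(new ClickHouseSink());

        // 5. Compute per-user preferences in real time
        var userPreference = cleanClicks
                .keyBy(ClickEvent::getUserId)
                .process(new UserPreferenceProcessFunction());
        userPreference.addSink(new RedisUserPreferenceSink());

        env.execute("UserBehaviorAnalysisJob");
    }
}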
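
The article does not show PvUvAgg, so the following is only a sketch of the accumulator logic such an aggregate might hold: it counts every click for PV and tracks distinct user IDs for UV. The class name PvUvAccumulator and its methods are hypothetical. In a real job the same three operations would sit behind the add, merge, and getResult methods of Flink's AggregateFunction.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-window accumulator: page views plus the set of distinct users. */
final class PvUvAccumulator {
    private long pv;
    private final Set<String> userIds = new HashSet<>();

    /** Called once per click event in the window. */
    void add(String userId) {
        pv++;
        userIds.add(userId);
    }

    /** Folds another partial accumulator into this one and returns this. */
    PvUvAccumulator merge(PvUvAccumulator other) {
        pv += other.pv;
        userIds.addAll(other.userIds);
        return this;
    }

    long pv() {
        return pv;
    }

    long uv() {
        return userIds.size();
    }
}
```

An exact set keeps UV precise but grows with the number of users seen in the window, which is worth keeping in mind because the job routes all events to one key.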
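Updating User Preferences with a Custom ProcessFunction
We use a KeyedProcessFunction to keep each user's recently clicked categories in keyed state and to update the preference weights on every click: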
java
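package juwatech.cn.flink.function;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import juwatech.cn.flink.model.ClickEvent;
import juwatech.cn.flink.model.UserPreference;

public class UserPreferenceProcessFunction
        extends KeyedProcessFunction<String, ClickEvent, UserPreference> {

    // Clear a user's state after 10 minutes without clicks
    private static final long IDLE_TIMEOUT_MS = 600_000L;

    private transient ValueState<UserPreference> preferenceState;
    // Timestamp of the cleanup timer currently registered for this user
    private transient ValueState<Long> cleanupTimerState;

    @Override
    public void open(Configuration parameters) {
        preferenceState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("userPref", UserPreference.class));
        cleanupTimerState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("cleanupTimer", Long.class));
    }

    @Override
    public void processElement(ClickEvent event, Context ctx, Collector<UserPreference> out)
            throws Exception {
        UserPreference current = preferenceState.value();
        if (current == null) {
            current = new UserPreference(event.getUserId());
        }
        // Decay the old weights, then add the new click
        current.decayAndAdd(event.getCategoryId(), 0.95, 1.0);
        preferenceState.update(current);
        out.collect(current);

        // Move the cleanup timer forward so that only idle users are cleared;
        // without this, the timer from an earlier click would wipe an active user
        Long previousTimer = cleanupTimerState.value();
        if (previousTimer != null) {
            ctx.timerService().deleteProcessingTimeTimer(previousTimer);
        }
        long cleanupAt = ctx.timerService().currentProcessingTime() + IDLE_TIMEOUT_MS;
        ctx.timerService().registerProcessingTimeTimer(cleanupAt);
        cleanupTimerState.update(cleanupAt);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<UserPreference> out) {
        // The user has been idle for the full timeout: drop all state for this key
        preferenceState.clear();
        cleanupTimerState.clear();
    }
}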
The decayAndAdd method applies exponential decay to the historical category weights so that recent behavior dominates.
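
The body of decayAndAdd is not shown in the article. One plausible reading, sketched below under that assumption, multiplies every stored weight by the decay factor and then adds the increment to the clicked category. With a decay of 0.95, a click made k events ago contributes 0.95^k to its category's weight. The class name CategoryWeights and the use of String category IDs are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder of per-category preference weights for one user. */
final class CategoryWeights {
    private final Map<String, Double> weights = new HashMap<>();

    /**
     * Scales all existing weights by {@code decay}, then adds {@code increment}
     * to the category that was just clicked.
     */
    void decayAndAdd(String categoryId, double decay, double increment) {
        weights.replaceAll((category, weight) -> weight * decay);
        weights.merge(categoryId, increment, Double::sum);
    }

    /** Read-only view of the current weights. */
    Map<String, Double> asMap() {
        return Collections.unmodifiableMap(weights);
    }
}
```

Note that this sketch decays per event, not per unit of time, so a user's weights only shrink when that user clicks again. A time-based decay would need the elapsed time between clicks as an extra input.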
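Dimension Table Join: Enriching Events with Item Information
Click events carry only an itemId, so we join them against the item dimension table to obtain the category, price, and other attributes. We query MySQL asynchronously with an AsyncFunction: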
java
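package juwatech.cn.flink.async;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import juwatech.cn.flink.model.ClickEvent;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
// EnrichedClickEvent, ItemInfo and JdbcItemService are project classes;
// import them from their own packages.

public class ItemInfoAsyncLookup
        extends RichAsyncFunction<ClickEvent, EnrichedClickEvent> {

    private transient JdbcItemService itemService;
    // Dedicated pool so blocking JDBC calls do not occupy the shared common pool
    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) {
        itemService = new JdbcItemService("jdbc:mysql://meta.juwatech.cn:3306/item_db");
        executor = Executors.newFixedThreadPool(10); // pool size is an illustrative value
    }

    @Override
    public void asyncInvoke(ClickEvent input, ResultFuture<EnrichedClickEvent> resultFuture) {
        CompletableFuture
                .supplyAsync(() -> {
                    ItemInfo item = itemService.getById(input.getItemId());
                    return new EnrichedClickEvent(input, item);
                }, executor)
                .whenComplete((enriched, error) -> {
                    // Always complete the future, with either the result or the failure
                    if (error != null) {
                        resultFuture.completeExceptionally(error);
                    } else {
                        resultFuture.complete(Collections.singletonList(enriched));
                    }
                });
    }

    @Override
    public void close() {
        if (executor != null) {
            executor.shutdown();
        }
    }
}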
It is called from the main pipeline like this:
java
var enrichedClicks = AsyncDataStream.unorderedWait(
        cleanClicks,
        new ItemInfoAsyncLookup(),
        2000, TimeUnit.MILLISECONDS, // timeout for each lookup
        100                          // maximum number of in-flight requests
);
Result Output: ClickHouse and Redis
Aggregated metrics are written to ClickHouse for BI queries:
java
package juwatech.cn.flink.sink;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
// PvUvResult is a project class; import it from its own package.

public class ClickHouseSink extends RichSinkFunction<PvUvResult> {

    private static final String INSERT_SQL =
            "INSERT INTO user_behavior_metrics (window_end, pv, uv) VALUES (?, ?, ?)";

    private transient Connection conn;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Plain JDBC; relies on the ClickHouse JDBC driver being on the classpath
        conn = DriverManager.getConnection("jdbc:clickhouse://ch.juwatech.cn:8123/analytics");
    }

    @Override
    public void invoke(PvUvResult value, Context context) throws Exception {
        // Bind parameters instead of splicing values into the SQL string
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            stmt.setLong(1, value.getWindowEnd()); // assumes the getters return numeric values
            stmt.setLong(2, value.getPv());
            stmt.setLong(3, value.getUv());
            stmt.executeUpdate();
        }
    }

    @Override
    public void close() throws Exception {
        if (conn != null) {
            conn.close();
        }
    }
}
User preferences are written to a Redis hash in real time:
java
// Inside RedisUserPreferenceSink.invoke()
String key = "user:pref:" + pref.getUserId();
jedis.hset(key, "category_weights", JsonUtil.toJson(pref.getCategoryWeights()));
jedis.expire(key, 86400); // expire after 24 hours
Copyright of this article belongs to the 聚娃科技 省赚客 app development team. Please credit the source when republishing!