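Data Middle Platform Design for a Taoke (Affiliate) Cashback App: A Unified Data Service Architecture Bridging Multi-Platform Data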
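Hi everyone, I'm 阿可, a developer of 省赚客APP! 省赚客APP (juwatech.cn) integrates with more than ten external data sources, including Taobao Alliance, JD Union, the Pinduoduo Open Platform, and Douyin Selected Alliance, and processes over 5 million order, product, and commission records every day. In the early days each business line pulled data on its own, which led to inconsistent metric definitions, repeated computation, and redundant storage. To deliver "one copy of the data, one unified service", we built a data middle platform on Flink + Iceberg + Doris that closes the loop from raw data ingestion, through standardized modeling, to a unified API service. This article walks through the key practices of multi-platform data fusion, using the core Java code and the architecture design.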
Data Ingestion Layer: Unified Collection of Heterogeneous Multi-Source Data
Raw data is collected through a hybrid of Flink CDC and HTTP polling:
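```java
// Taobao Alliance order pull task (scheduled)
@Scheduled(fixedDelay = 300_000) // runs 5 minutes after the previous run finishes
public void pullTaobaoOrders() {
    // Capture the clock once so both window bounds come from the same instant.
    LocalDateTime now = LocalDateTime.now();
    String nextPageToken = offsetStore.get("taobao_order_cursor");
    TaobaoOrderResponse response = taobaoApiClient.queryOrders(
            now.minusHours(24),
            now,
            nextPageToken
    );
    for (TaobaoOrder order : response.getData()) {
        // Convert to the unified format
        UnifiedOrder unified = TaobaoOrderConverter.toUnified(order);
        kafkaProducer.send("raw.orders", JsonUtil.toJson(unified));
    }
    // Persist the cursor for the next run. After the last page, reset it so a stale
    // cursor is not reused against a new time window (assumes the store treats
    // null as "no cursor").
    offsetStore.set("taobao_order_cursor",
            response.hasNext() ? response.getNextCursor() : null);
}
```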
All raw events are written to the Kafka topic raw.orders for downstream consumers.
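The conversion step is where platform-specific fields become unified ones. The sketch below shows one way a converter like TaobaoOrderConverter.toUnified could look. The class names, field names, and the tk_status code mapping are assumptions made for illustration (they are not taken from the production code), and the status codes should be checked against the platform's own documentation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.LocalDateTime;

// Hypothetical raw Taobao order holding only the fields used in this sketch.
final class TaobaoOrderSketch {
    final String tradeId;
    final int tkStatus;
    final String pubSharePreFee;
    final LocalDateTime createTime;

    TaobaoOrderSketch(String tradeId, int tkStatus, String pubSharePreFee, LocalDateTime createTime) {
        this.tradeId = tradeId;
        this.tkStatus = tkStatus;
        this.pubSharePreFee = pubSharePreFee;
        this.createTime = createTime;
    }
}

// Hypothetical unified record shared by all platforms.
final class UnifiedOrderSketch {
    final String platform;
    final String outerOrderId;
    final String status;
    final BigDecimal commissionAmount;
    final LocalDateTime orderTime;

    UnifiedOrderSketch(String platform, String outerOrderId, String status,
                       BigDecimal commissionAmount, LocalDateTime orderTime) {
        this.platform = platform;
        this.outerOrderId = outerOrderId;
        this.status = status;
        this.commissionAmount = commissionAmount;
        this.orderTime = orderTime;
    }
}

final class TaobaoConverterSketch {
    private TaobaoConverterSketch() {}

    static UnifiedOrderSketch toUnified(TaobaoOrderSketch o) {
        return new UnifiedOrderSketch(
                "taobao",
                o.tradeId,
                mapStatus(o.tkStatus),
                // Parse the amount as a decimal string and round to 2 places.
                new BigDecimal(o.pubSharePreFee).setScale(2, RoundingMode.HALF_UP),
                o.createTime);
    }

    // Assumed code mapping, for illustration only.
    static String mapStatus(int tkStatus) {
        switch (tkStatus) {
            case 3:
                return "SETTLED";
            case 12:
            case 14:
                return "VALID";
            case 13:
                return "INVALID";
            default:
                throw new IllegalArgumentException("Unknown tk_status: " + tkStatus);
        }
    }
}
```

Rejecting unknown status codes, rather than silently defaulting them, makes upstream changes visible before they reach the shared topic.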

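Real-Time Cleansing and Standardization with Flink
The job consumes the Kafka data and performs field alignment, status completion, and deduplication: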
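```java
public class OrderStandardizationJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every 60 s; the Iceberg sink commits on checkpoints

        // kafkaProps (bootstrap servers, group id) is assumed to be defined elsewhere.
        DataStream<String> rawStream = env.addSource(
                new FlinkKafkaConsumer<>("raw.orders", new SimpleStringSchema(), kafkaProps));

        DataStream<UnifiedOrder> standardized = rawStream
                .map(json -> JsonUtil.fromJson(json, UnifiedOrder.class))
                .returns(UnifiedOrder.class)
                // One key per (platform, external order ID) pair.
                .keyBy(order -> order.getPlatform() + "_" + order.getOuterOrderId())
                .process(new DeduplicationProcessFunction(Time.minutes(10)));

        // Write to the Iceberg table. The Iceberg Flink sink consumes RowData, so the POJO
        // is converted first; UnifiedOrderRowMapper is a hypothetical helper not shown here.
        DataStream<RowData> rows = standardized.map(UnifiedOrderRowMapper::toRowData);
        FlinkSink.forRowData(rows)
                .tableLoader(TableLoader.fromHadoopTable("hdfs://warehouse/iceberg/db/orders"))
                .append();

        env.execute("Order Standardization Job");
    }
}
```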
Deduplication is keyed on the combination of platform and order ID, and only the first record seen within a 10-minute window is kept:
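```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class DeduplicationProcessFunction extends KeyedProcessFunction<String, UnifiedOrder, UnifiedOrder> {
    private final long ttlMillis;
    private transient ValueState<Boolean> seen;

    public DeduplicationProcessFunction(Time ttl) {
        this.ttlMillis = ttl.toMilliseconds();
    }

    @Override
    public void open(Configuration parameters) {
        seen = getRuntimeContext().getState(new ValueStateDescriptor<>("seen", Boolean.class));
    }

    @Override
    public void processElement(UnifiedOrder order, Context ctx, Collector<UnifiedOrder> out) throws Exception {
        if (seen.value() == null) {
            seen.update(true);
            out.collect(order);
            // Processing-time timer: no watermarks are assigned upstream, so an
            // event-time timer would never fire and the state would never be cleared.
            long now = ctx.timerService().currentProcessingTime();
            ctx.timerService().registerProcessingTimeTimer(now + ttlMillis);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<UnifiedOrder> out) {
        seen.clear(); // forget the key once the window has passed
    }
}
```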
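To make the keep-first semantics concrete without a Flink runtime, here is a plain-Java sketch of the same rule: the first record for a key is emitted, repeats are dropped until the window set by that first record expires, and duplicates do not extend the window. TtlDeduplicatorSketch is a hypothetical name and the in-memory map only stands in for Flink keyed state.

```java
import java.util.HashMap;
import java.util.Map;

final class TtlDeduplicatorSketch {
    private final long ttlMillis;
    // key -> timestamp at which the "seen" marker expires
    private final Map<String, Long> expiryByKey = new HashMap<>();

    TtlDeduplicatorSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if the record should be emitted (first sighting within the window). */
    boolean firstSeen(String platform, String outerOrderId, long nowMillis) {
        String key = platform + "_" + outerOrderId;
        Long expiry = expiryByKey.get(key);
        if (expiry != null && nowMillis < expiry) {
            return false; // duplicate inside the window; the window is not extended
        }
        expiryByKey.put(key, nowMillis + ttlMillis);
        return true;
    }
}
```

One consequence of this rule is that a duplicate arriving after the window has expired is emitted again, so downstream writes still benefit from being idempotent on the platform and order ID.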
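Unified Data Model and Layered Iceberg Storage
The model is organized in layers:
- ODS layer: raw snapshots (Iceberg table ods.orders_raw);
- DWD layer: cleansed detail records (dwd.order_detail), with unified fields such as user_id, commission_amount, and status;
- DWS layer: aggregated metrics (dws.user_daily_summary), summarized by user and date.
Example DWD table schema: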
```sql
CREATE TABLE dwd.order_detail (
    order_id          STRING,
    user_id           BIGINT,
    platform          STRING,
    item_title        STRING,
    price             DECIMAL(12,2),
    commission_rate   DECIMAL(5,4),
    commission_amount DECIMAL(12,2),
    order_time        TIMESTAMP,
    status            STRING -- VALID / INVALID / SETTLED
) PARTITIONED BY (days(order_time));
```
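The step from DWD detail rows to a DWS summary by user and date can be sketched in plain Java. This is only an illustration of the grouping logic: the class names are hypothetical, the real aggregation would run in the engine rather than in application code, and excluding INVALID orders from the total is an assumption about the metric definition.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory view of a dwd.order_detail row (only the fields used here).
final class OrderDetailRowSketch {
    final long userId;
    final BigDecimal commissionAmount;
    final LocalDateTime orderTime;
    final String status;

    OrderDetailRowSketch(long userId, BigDecimal commissionAmount, LocalDateTime orderTime, String status) {
        this.userId = userId;
        this.commissionAmount = commissionAmount;
        this.orderTime = orderTime;
        this.status = status;
    }
}

final class DailySummarySketch {
    private DailySummarySketch() {}

    /** Sums commission per "userId|yyyy-MM-dd" key, skipping INVALID orders (assumed rule). */
    static Map<String, BigDecimal> summarize(List<OrderDetailRowSketch> rows) {
        Map<String, BigDecimal> totals = new TreeMap<>();
        for (OrderDetailRowSketch r : rows) {
            if ("INVALID".equals(r.status)) {
                continue;
            }
            String key = r.userId + "|" + r.orderTime.toLocalDate();
            totals.merge(key, r.commissionAmount, BigDecimal::add);
        }
        return totals;
    }
}
```

Grouping on the calendar date of order_time mirrors the days(order_time) partitioning of the detail table, so a daily summary only needs to read one partition per day.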
Building the Unified Query Service on Doris
The DWD/DWS tables are synced to Apache Doris, which serves millisecond-level OLAP queries:
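```java
// Real-time write via the Flink Doris Connector.
// `stream` is assumed to be a DataStream<String> of JSON rows; the FE address
// and credentials below are placeholders.
DorisOptions dorisOptions = DorisOptions.builder()
        .setFenodes("doris-fe:8030")
        .setTableIdentifier("dwd.order_detail")
        .setUsername("user")
        .setPassword("password")
        .build();

stream.sinkTo(DorisSink.<String>builder()
        .setDorisReadOptions(DorisReadOptions.builder().build())
        .setDorisExecutionOptions(DorisExecutionOptions.builder()
                .setLabelPrefix("order_detail_sync")
                .setStreamLoadProp(streamLoadProps) // e.g. format=json, read_json_by_line=true
                .build())
        .setSerializer(new SimpleStringSerializer())
        .setDorisOptions(dorisOptions)
        .build());
```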
The Java service queries Doris over JDBC to expose the unified API:
```java
@RestController
@RequestMapping("/api/data")
public class UnifiedDataService {

    @Autowired
    private JdbcTemplate dorisTemplate;

    // Per-platform commission totals for one user; startDate and endDate are both inclusive.
    @GetMapping("/user/commission")
    public List<UserCommissionStat> getUserCommission(
            @RequestParam Long userId,
            @RequestParam @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate startDate,
            @RequestParam @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate endDate) {
        return dorisTemplate.query(
                "SELECT platform, SUM(commission_amount) AS total, COUNT(*) AS order_count " +
                "FROM dwd.order_detail " +
                "WHERE user_id = ? AND order_time >= ? AND order_time < ? AND status = 'VALID' " +
                "GROUP BY platform",
                (rs, rowNum) -> new UserCommissionStat(
                        rs.getString("platform"),
                        rs.getBigDecimal("total"),
                        rs.getInt("order_count")
                ),
                // Half-open range [start 00:00, day after end 00:00) covers the whole end date.
                userId, startDate.atStartOfDay(), endDate.plusDays(1).atStartOfDay()
        );
    }
}
```
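The query turns an inclusive pair of dates into a half-open timestamp range, which avoids guessing the last instant of the end date. A minimal sketch of that conversion, using a hypothetical DateRangeSketch helper that is not part of the service above:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

final class DateRangeSketch {
    final LocalDateTime fromInclusive;
    final LocalDateTime toExclusive;

    private DateRangeSketch(LocalDateTime fromInclusive, LocalDateTime toExclusive) {
        this.fromInclusive = fromInclusive;
        this.toExclusive = toExclusive;
    }

    /** Builds [startDate 00:00, endDate + 1 day 00:00) from two inclusive dates. */
    static DateRangeSketch of(LocalDate startDate, LocalDate endDate) {
        if (endDate.isBefore(startDate)) {
            throw new IllegalArgumentException("endDate must not be before startDate");
        }
        return new DateRangeSketch(startDate.atStartOfDay(), endDate.plusDays(1).atStartOfDay());
    }

    /** Same predicate as "order_time >= ? AND order_time < ?" in the SQL. */
    boolean contains(LocalDateTime t) {
        return !t.isBefore(fromInclusive) && t.isBefore(toExclusive);
    }
}
```

Validating the date order up front would let the API answer a reversed range with a clear error instead of an empty result.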
Metadata Management and Lineage Tracking
Our in-house metadata platform records the field mappings, for example:
| Platform field (Taobao) | Unified field | Type |
|---|---|---|
| tk_status | status | STRING |
| pub_share_pre_fee | commission_amount | DECIMAL |
This keeps the data explainable and traceable.
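A field-mapping record like the one above supports two lookups: forward (which unified field does a platform field feed) and backward (which platform fields feed a unified field, i.e. its lineage). The following is a small in-memory sketch of that idea. FieldMappingRegistrySketch and its methods are hypothetical and say nothing about how the in-house metadata platform is actually built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

final class FieldMappingRegistrySketch {
    // Key: "platform.sourceField"; value: {unifiedField, type}.
    // A TreeMap keeps lineage results in a stable, sorted order.
    private final Map<String, String[]> mappings = new TreeMap<>();

    void register(String platform, String sourceField, String unifiedField, String type) {
        mappings.put(platform + "." + sourceField, new String[] {unifiedField, type});
    }

    /** Forward lookup: the unified field a platform field maps to, if registered. */
    Optional<String> unifiedFieldOf(String platform, String sourceField) {
        String[] mapping = mappings.get(platform + "." + sourceField);
        return mapping == null ? Optional.empty() : Optional.of(mapping[0]);
    }

    /** Backward lookup (lineage): every "platform.field" feeding the unified field. */
    List<String> sourcesOf(String unifiedField) {
        List<String> sources = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : mappings.entrySet()) {
            if (entry.getValue()[0].equals(unifiedField)) {
                sources.add(entry.getKey());
            }
        }
        return sources;
    }
}
```

Registering the two rows from the table would make sourcesOf("status") return a list containing "taobao.tk_status", which is the kind of answer a lineage query needs.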
Copyright of this article belongs to the 聚娃科技 省赚客app developer team. Please credit the source when reposting!