Flink 流式读取 Debezium CDC 数据写入 Hudi 表无法处理 -D / Delete 消息

问题场景是:使用 Kafka Connect 的 Debezium MySQL Source Connector 将 MySQL 的 CDC 数据 (Avro 格式)接入到 Kafka 之后,通过 Flink 读取并解析这些 CDC 数据,然后以流式方式写入到 Hudi 表中,测试中发现,INSERT 和 UPDATE 消息都能很好的处理,但是,-D 类型的 Delete 消息被忽略了,即使已经开启了 'changelog.enabled' = 'true' ,既然无效。测试版本:Flink 1.17.1, Hudi 0.14.0, 具体测试工作和脚本在 <> 中已经完整记录,以下是问题表现:

数据库使用的是 Debezium 官方提供的 Docker 镜像,测试表为内置的 inventory 数据库中的 orders 表。操作记录:初始 4 条记录 10001 - 10004 => 添加 10005 => 更新 10001 => 删除 10004,以下是推送至 Kafka 中的全部 CDC 数据:

json 复制代码
Struct{order_number=10001} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10001,"order_date":16816,"purchaser":1001,"quantity":1,"product_id":102}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706686648000,"snapshot":{"string":"first_in_data_collection"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":0,"gtid":null,"file":"mysql-bin.000005","pos":154,"row":0,"thread":null,"query":null},"op":"r","ts_ms":{"long":1706686648863},"transaction":null}
Struct{order_number=10002} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10002,"order_date":16817,"purchaser":1002,"quantity":2,"product_id":105}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706686648000,"snapshot":{"string":"true"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":0,"gtid":null,"file":"mysql-bin.000005","pos":154,"row":0,"thread":null,"query":null},"op":"r","ts_ms":{"long":1706686648864},"transaction":null}
Struct{order_number=10003} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10003,"order_date":16850,"purchaser":1002,"quantity":2,"product_id":106}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706686648000,"snapshot":{"string":"true"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":0,"gtid":null,"file":"mysql-bin.000005","pos":154,"row":0,"thread":null,"query":null},"op":"r","ts_ms":{"long":1706686648897},"transaction":null}
Struct{order_number=10004} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10004,"order_date":16852,"purchaser":1003,"quantity":1,"product_id":107}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706686648000,"snapshot":{"string":"last_in_data_collection"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":0,"gtid":null,"file":"mysql-bin.000005","pos":154,"row":0,"thread":null,"query":null},"op":"r","ts_ms":{"long":1706686648898},"transaction":null}
Struct{order_number=10005} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10005,"order_date":19753,"purchaser":1003,"quantity":3,"product_id":105}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706687538000,"snapshot":{"string":"false"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":223344,"gtid":null,"file":"mysql-bin.000005","pos":354,"row":0,"thread":{"long":6},"query":null},"op":"c","ts_ms":{"long":1706687539115},"transaction":null}
Struct{order_number=10001} | {"before":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10001,"order_date":16816,"purchaser":1001,"quantity":1,"product_id":102}},"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10001,"order_date":16816,"purchaser":1002,"quantity":5,"product_id":104}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706687601000,"snapshot":{"string":"false"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":223344,"gtid":null,"file":"mysql-bin.000005","pos":640,"row":0,"thread":{"long":6},"query":null},"op":"u","ts_ms":{"long":1706687601997},"transaction":null}
Struct{order_number=10004} | {"before":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10004,"order_date":16852,"purchaser":1003,"quantity":1,"product_id":107}},"after":null,"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706687635000,"snapshot":{"string":"false"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":223344,"gtid":null,"file":"mysql-bin.000005","pos":947,"row":0,"thread":{"long":6},"query":null},"op":"d","ts_ms":{"long":1706687636121},"transaction":null}
Struct{order_number=10004} | null

读取 Kafka Debezium 源表结果如下:

使用 Tableau / Changelog 模式读取 Hudi Sink 表,无 -D 数据

上述行为和 《Flink Hudi 构建流式数据湖》 这篇文章对应不起来,正常是应该有 -D 记录的! 目前尚未找到原因,欢迎了解情况的朋友留言。

相关推荐
从头再来的码农4 小时前
大数据Flink相关面试题(一)
大数据·flink
MarkHD20 小时前
第四天 从CAN总线到Spark/Flink实时处理
大数据·flink·spark
SparkSql1 天前
FlinkCDC采集MySQL8.4报错
大数据·flink
james的分享1 天前
Flink之Table API
flink·table api
涤生大数据2 天前
带你玩转 Flink TumblingWindow:从理论到代码的深度探索
flink·理论·代码·tumblingwindow
Apache Flink2 天前
网易游戏 Flink 云原生实践
游戏·云原生·flink
SunTecTec3 天前
SQL Server To Paimon Demo by Flink standalone cluster mode
java·大数据·flink
工作中的程序员4 天前
flink监控指标
flink
小马爱打代码4 天前
SpringBoot整合Kafka、Flink实现流式处理
spring boot·flink·kafka