Flink 流式读取 Debezium CDC 数据写入 Hudi 表无法处理 -D / Delete 消息

问题场景是:使用 Kafka Connect 的 Debezium MySQL Source Connector 将 MySQL 的 CDC 数据 (Avro 格式)接入到 Kafka 之后,通过 Flink 读取并解析这些 CDC 数据,然后以流式方式写入到 Hudi 表中,测试中发现,INSERT 和 UPDATE 消息都能很好的处理,但是,-D 类型的 Delete 消息被忽略了,即使已经开启了 'changelog.enabled' = 'true' ,既然无效。测试版本:Flink 1.17.1, Hudi 0.14.0, 具体测试工作和脚本在 <> 中已经完整记录,以下是问题表现:

数据库使用的是 Debezium 官方提供的 Docker 镜像,测试表为内置的 inventory 数据库中的 orders 表。操作记录:初始 4 条记录 10001 - 10004 => 添加 10005 => 更新 10001 => 删除 10004,以下是推送至 Kafka 中的全部 CDC 数据:

json 复制代码
Struct{order_number=10001} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10001,"order_date":16816,"purchaser":1001,"quantity":1,"product_id":102}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706686648000,"snapshot":{"string":"first_in_data_collection"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":0,"gtid":null,"file":"mysql-bin.000005","pos":154,"row":0,"thread":null,"query":null},"op":"r","ts_ms":{"long":1706686648863},"transaction":null}
Struct{order_number=10002} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10002,"order_date":16817,"purchaser":1002,"quantity":2,"product_id":105}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706686648000,"snapshot":{"string":"true"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":0,"gtid":null,"file":"mysql-bin.000005","pos":154,"row":0,"thread":null,"query":null},"op":"r","ts_ms":{"long":1706686648864},"transaction":null}
Struct{order_number=10003} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10003,"order_date":16850,"purchaser":1002,"quantity":2,"product_id":106}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706686648000,"snapshot":{"string":"true"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":0,"gtid":null,"file":"mysql-bin.000005","pos":154,"row":0,"thread":null,"query":null},"op":"r","ts_ms":{"long":1706686648897},"transaction":null}
Struct{order_number=10004} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10004,"order_date":16852,"purchaser":1003,"quantity":1,"product_id":107}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706686648000,"snapshot":{"string":"last_in_data_collection"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":0,"gtid":null,"file":"mysql-bin.000005","pos":154,"row":0,"thread":null,"query":null},"op":"r","ts_ms":{"long":1706686648898},"transaction":null}
Struct{order_number=10005} | {"before":null,"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10005,"order_date":19753,"purchaser":1003,"quantity":3,"product_id":105}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706687538000,"snapshot":{"string":"false"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":223344,"gtid":null,"file":"mysql-bin.000005","pos":354,"row":0,"thread":{"long":6},"query":null},"op":"c","ts_ms":{"long":1706687539115},"transaction":null}
Struct{order_number=10001} | {"before":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10001,"order_date":16816,"purchaser":1001,"quantity":1,"product_id":102}},"after":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10001,"order_date":16816,"purchaser":1002,"quantity":5,"product_id":104}},"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706687601000,"snapshot":{"string":"false"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":223344,"gtid":null,"file":"mysql-bin.000005","pos":640,"row":0,"thread":{"long":6},"query":null},"op":"u","ts_ms":{"long":1706687601997},"transaction":null}
Struct{order_number=10004} | {"before":{"osci.mysql-server-3.inventory.orders.Value":{"order_number":10004,"order_date":16852,"purchaser":1003,"quantity":1,"product_id":107}},"after":null,"source":{"version":"2.2.0.Final","connector":"mysql","name":"osci.mysql-server-3","ts_ms":1706687635000,"snapshot":{"string":"false"},"db":"inventory","sequence":null,"table":{"string":"orders"},"server_id":223344,"gtid":null,"file":"mysql-bin.000005","pos":947,"row":0,"thread":{"long":6},"query":null},"op":"d","ts_ms":{"long":1706687636121},"transaction":null}
Struct{order_number=10004} | null

读取 Kafka Debezium 源表结果如下:

使用 Tableau / Changelog 模式读取 Hudi Sink 表,无 -D 数据

上述行为和 《Flink Hudi 构建流式数据湖》 这篇文章对应不起来,正常是应该有 -D 记录的! 目前尚未找到原因,欢迎了解情况的朋友留言。

相关推荐
老周聊架构12 分钟前
聊聊Flink:Flink的状态管理
大数据·flink
ssxueyi18 小时前
如何查看flink错误信息
大数据·flink
core5121 天前
flink SQL实现mysql source sink
mysql·flink·jdbc·source·cdc·sink·mysql-cdc
天冬忘忧1 天前
Flink调优----资源配置调优与状态及Checkpoint调优
大数据·flink
Ekine2 天前
【Flink-scala】DataStream编程模型之状态编程
大数据·flink·scala
别这么骄傲3 天前
使用Flinkcdc 采集mysql数据
大数据·数据库·flink
core5124 天前
flink实现复杂kafka数据读取
大数据·flink·kafka·cdc
xyz20114 天前
Flink State面试题和参考答案-(下)
flink·状态模式
ssxueyi5 天前
Flink CDC 读取oracle库数据性能优化
大数据·oracle·性能优化·flink
ssxueyi5 天前
Flink SQL保留关键字
大数据·sql·flink·保留关键字