37、Flink 的CDC 格式:debezium部署以及mysql示例(1)-debezium的部署与示例

一、Flink 专栏

Flink 专栏系统介绍某一知识点,并辅以具体的示例进行说明。

  • 1、Flink 部署系列

    本部分介绍Flink的部署、配置相关基础内容。

  • 2、Flink基础系列

    本部分介绍Flink 的基础部分,比如术语、架构、编程模型、编程指南、基本的datastream api用法、四大基石等内容。

  • 3、Flik Table API和SQL基础系列

    本部分介绍Flink Table Api和SQL的基本用法,比如Table API和SQL创建库、表用法、查询、窗口函数、catalog等等内容。

  • 4、Flik Table API和SQL提高与应用系列

    本部分是table api 和sql的应用部分,和实际的生产应用联系更为密切,以及有一定开发难度的内容。

  • 5、Flink 监控系列

    本部分和实际的运维、监控工作相关。

二、Flink 示例专栏

Flink 示例专栏是 Flink 专栏的辅助说明,一般不会介绍知识点的信息,更多的是提供一个一个可以具体使用的示例。本专栏不再分目录,通过链接即可看出介绍的内容。

两专栏的所有文章入口点击:Flink 系列文章汇总索引


文章目录


本文详细的介绍了debezium的部署、验证以及通过一个示例介绍其使用。

如果需要了解更多内容,可以在本人Flink 专栏中了解更新系统的内容。

本文除了maven依赖外,还依赖kafka、flink、debezium。

本专题文章分为如下几篇:

37、Flink 的CDC 格式:debezium部署以及mysql示例(1)-debezium的部署与示例
37、Flink 的CDC 格式:debezium部署以及mysql示例(2)-Flink 与Debezium 实践
37、Flink 的CDC 格式:debezium部署以及mysql示例(完整版)

一、Debezium Format

1、Debezium介绍

Debezium 是一个 CDC(Changelog Data Capture,变更数据捕获)的工具,可以把来自 MySQL、PostgreSQL、Oracle、Microsoft SQL Server 和许多其他数据库的更改实时流式传输到 Kafka 中。 Debezium 为变更日志提供了统一的格式结构,并支持使用 JSON 和 Apache Avro 序列化消息。

Flink 支持将 Debezium JSON 和 Avro 消息解析为 INSERT / UPDATE / DELETE 消息到 Flink SQL 系统中。在很多情况下,利用这个特性非常的有用,例如

  • 将增量数据从数据库同步到其他系统
  • 日志审计
  • 数据库的实时物化视图
  • 关联维度数据库的变更历史,等等。

Flink 还支持将 Flink SQL 中的 INSERT / UPDATE / DELETE 消息编码为 Debezium 格式的 JSON 或 Avro 消息,输出到 Kafka 等存储中。 但需要注意的是,目前 Flink 还不支持将 UPDATE_BEFORE 和 UPDATE_AFTER 合并为一条 UPDATE 消息。因此,Flink 将 UPDATE_BEFORE 和 UPDATE_AFTER 分别编码为 DELETE 和 INSERT 类型的 Debezium 消息。

2、binlog设置及验证

设置binlog需要监控的数据库,本示例使用的数据库是mysql5.7

1)、配置

本示例设置的参数参考下面的配置

bash 复制代码
[root@server4 ~]# cat /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html

[mysqld]
......

log-bin=mysql-bin  # log-bin的名称,可以是任意名称
binlog-format=row  # 推荐该参数,其他的参数视情况而定,比如mixed、statement
server_id=1 # mysql集群环境中不要重复
binlog_do_db=test # test是mysql的数据库名称,如果监控多个数据库,可以添加多个binlog_do_db即可,例如下面示例
# binlog_do_db=test2
# binlog_do_db=test3
.....
  • STATEMENT模式(SBR)

    每一条会修改数据的sql语句会记录到binlog中。优点是并不需要记录每一条sql语句和每一行的数据变化,减少了binlog日志量,节约IO,提高性能。缺点是在某些情况下会导致master-slave中的数据不一致(如sleep()函数, last_insert_id(),以及user-defined functions(udf)等会出现问题)

  • ROW模式(RBR)

    不记录每条sql语句的上下文信息,仅需记录哪条数据被修改了,修改成什么样了。而且不会出现某些特定情况下的存储过程、或function、或trigger的调用和触发无法被正确复制的问题。缺点是会产生大量的日志,尤其是alter table的时候会让日志暴涨。

  • MIXED模式(MBR)

    以上两种模式的混合使用,一般的复制使用STATEMENT模式保存binlog,对于STATEMENT模式无法复制的操作使用ROW模式保存binlog,MySQL会根据执行的SQL语句选择日志保存方式。

2)、重启mysql

保存配置后重启mysql

bash 复制代码
service mysqld restart

3)、验证

重启后,可以通过2个简单的方法验证是否设置成功。

mysql默认的安装目录:cd /var/lib/mysql

bash 复制代码
[root@server4 ~]# cd /var/lib/mysql
[root@server4 mysql]# ll
......
-rw-r----- 1 mysql mysql    154 1月  10 2022 mysql-bin.000001
-rw-r----- 1 mysql mysql       1197 1月  16 12:21 mysql-bin.index
.....
  • 查看mysql-bin.000001文件是否生成,且其大小为154字节。mysql-bin.000001是mysql重启的次数,重启2次则为mysql-bin.000002
  • 在test数据库中创建或添加数据,mysql-bin.000001的大小是否增加

以上情况满足,则说明binlog配置正常

3、debezium部署及验证

1)、下载-mysql connector连接器

去其官网:https://debezium.io/releases/下载需要的版本。

本示例使用的是:debezium-connector-mysql-1.7.2.Final-plugin.tar.gz

2)、解压

创建解压目录:/usr/local/bigdata/debezium/connector

解压

bash 复制代码
tar zxvf /usr/local/bigdata/debezium-connector-mysql-1.7.2.Final-plugin.tar.gz -C /usr/local/bigdata/debezium/connector
## 解压后
[alanchan@server3 connector]$ ll
总用量 4
drwxr-xr-x 2 alanchan root 4096 1月  16 07:20 debezium-connector-mysql
[alanchan@server3 connector]$ cd debezium-connector-mysql/
[alanchan@server3 debezium-connector-mysql]$ ll
总用量 10312
-rw-rw-r-- 1 alanchan root  337864 12月 14 2021 antlr4-runtime-4.8.jar
-rw-rw-r-- 1 alanchan root  308966 12月 14 2021 CHANGELOG.md
-rw-rw-r-- 1 alanchan root   19228 12月 14 2021 CONTRIBUTE.md
-rw-rw-r-- 1 alanchan root    4981 12月 14 2021 COPYRIGHT.txt
-rw-rw-r-- 1 alanchan root   20682 12月 14 2021 debezium-api-1.7.2.Final.jar
-rw-rw-r-- 1 alanchan root  400546 12月 14 2021 debezium-connector-mysql-1.7.2.Final.jar
-rw-rw-r-- 1 alanchan root  886363 12月 14 2021 debezium-core-1.7.2.Final.jar
-rw-rw-r-- 1 alanchan root 2825430 12月 14 2021 debezium-ddl-parser-1.7.2.Final.jar
-rw-rw-r-- 1 alanchan root    4617 12月 14 2021 failureaccess-1.0.1.jar
-rw-rw-r-- 1 alanchan root 2858426 12月 14 2021 guava-30.0-jre.jar
-rw-rw-r-- 1 alanchan root  129157 12月 14 2021 LICENSE-3rd-PARTIES.txt
-rw-rw-r-- 1 alanchan root   11357 12月 14 2021 LICENSE.txt
-rw-rw-r-- 1 alanchan root  193386 12月 14 2021 mysql-binlog-connector-java-0.25.3.jar
-rw-rw-r-- 1 alanchan root 2475087 12月 14 2021 mysql-connector-java-8.0.27.jar
-rw-rw-r-- 1 alanchan root   19520 12月 14 2021 README_JA.md
-rw-rw-r-- 1 alanchan root   15286 12月 14 2021 README.md
-rw-rw-r-- 1 alanchan root   13114 12月 14 2021 README_ZH.md

3)、kafka配置

因为配置的是kafka的插件,所以需要修改kafka的插件配置,同时需要注意的是,debezium的安装目录需要kafka能找到。

本示例中kafka的安装目录:/usr/local/bigdata/kafka_2.12-3.0.0

  • 修改kafka插件配置文件connect-distributed.properties
    修改内容如下,其他的根据情况进行配置,否则就是默认的
bash 复制代码
bootstrap.servers=server1:9092,server2:9092,server3:9092
group.id=connect-cluster
status.storage.replication.factor=2

plugin.path=/usr/local/bigdata/debezium/connector

根据实际的应用环境决定是否分发该配置文件

  • 重启kafka集群

4)、启动kafak的插件

需要 在部署debezium的机器上进行此操作

bash 复制代码
#在kafka的/usr/local/bigdata/kafka_2.12-3.0.0/bin目录下启动
#执行命令:
connect-distributed.sh -daemon /usr/local/bigdata/kafka_2.12-3.0.0/config/connect-distributed.properties

[alanchan@server3 config]$ cd /usr/local/bigdata/kafka_2.12-3.0.0/bin
[alanchan@server3 bin]$ connect-distributed.sh -daemon /usr/local/bigdata/kafka_2.12-3.0.0/config/connect-distributed.properties
[alanchan@server3 bin]$ jps
8980 ConnectDistributed
9271 Jps
826 Kafka
# ConnectDistributed 进程名称即为kafka插件
# 也可以通过下面的方式验证
[alanchan@server3 bin]$ curl server3:8083
{"version":"3.0.0","commit":"8cb0a5e9d3441962","kafka_cluster_id":"dVRZjBtQQnum1bb7pu_ljg"}
# 也可以查看有哪些连接器在工作,由于当前还未注册任何的连接器,故为空
[alanchan@server3 bin]$ curl 192.168.10.43:8083/connectors/
[]

5)、注册mysql的连接器

  • 参数说明
bash 复制代码
{
    "name": "alan-debezium-mysql-connector", // 向 Kafka Connect 服务注册时的连接器名称。
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector", // 连接器的类名,不能修改。
        "database.hostname": "192.168.10.44", //MySQL 服务器地址
        "database.port": "3306", // MySQL 服务器端口号
        "database.user": "root", // 具有适当权限的 MySQL 用户
        "database.password": "123456", // MySQL 用户的密码
        "database.server.id": "184054", // 连接器的唯一 ID,随便写,但不应该重复
        "database.server.name": "ALAN", // MySQL 服务器或集群的逻辑名称,将来作为kafka的topic前缀
        "database.include.list": "cdctest", // 指定服务器托管的数据库列表,多个数据库可以用逗号分隔
        "database.history.kafka.bootstrap.servers": "server1:9092,server2:9092,server3:9092", // 连接器用于将 DDL 语句写入和恢复到数据库历史主题的 Kafka 代理列表
        "database.history.kafka.topic": "alan.historydb", // 数据库历史主题的名称。本主题仅供内部使用,消费者不得使用
        "include.schema.changes": "true" // 指定连接器是否应为 DDL 更改生成事件并将它们发送到fulfillment架构更改主题以供使用者使用的标志
    }
}
  • 命令行执行
bash 复制代码
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.10.43:8083/connectors/ -d {"name": "alan-debezium-mysql-connector","config": {"connector.class": "io.debezium.connector.mysql.MySqlConnector","database.hostname": "192.168.10.44","database.port": "3306","database.user": "root","database.password": "123456","database.server.id": "184054", "database.server.name": "ALAN", "database.include.list": "cdctest", "database.history.kafka.bootstrap.servers": "server1:9092,server2:9092,server3:9092", "database.history.kafka.topic": "alan.historydb","include.schema.changes": "true"}}
  • postman等工具执行

    执行后,返回

    也可以在kafka的可视化工具中查看,比如offset explorer

    也可以通过命令行查看是否注册成功
bash 复制代码
[alanchan@server3 bin]$ curl 192.168.10.43:8083/connectors/
["alan-debezium-mysql-connector"]

6)、数据验证

启动成功后,debezium会将监控 的数据库表中的数据同步到kafka的消息队列中。

本示例中,mysql中的原始数据如下

启动插件成功后,kafka对应的topic中的数据如下

以上,则表示完成debezium的初步验证成功。

4、示例:通过Debezium CDC 将mysql数据变化输出至kafka

以下是针对表userscoressink新增、修改和删除的数据后kafka主题ALAN.cdctest.userscoressink的变化情况

bash 复制代码
[alanchan@server1 bin]$ kafka-console-consumer.sh --bootstrap-server server1:9092 --topic ALAN.cdctest.userscoressink --from-beginning
......
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"before"},{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"ALAN.cdctest.userscoressink.Envelope"},"payload":{"before":null,"after":{"name":"alan_test","scores":666.0},"source":{"version":"1.7.2.Final","connector":"mysql","name":"ALAN","ts_ms":1705717276000,"snapshot":"false","db":"cdctest","sequence":null,"table":"userscoressink","server_id":1,"gtid":null,"file":"alan_master_logbin.000004","pos":4645,"row":0,"thread":null,"query":null},"op":"c","ts_ms":1705717750512,"transaction":null}}
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"before"},{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"ALAN.cdctest.userscoressink.Envelope"},"payload":{"before":{"name":"alan_test","scores":666.0},"after":{"name":"alan_test","scores":888.0},"source":{"version":"1.7.2.Final","connector":"mysql","name":"ALAN","ts_ms":1705717298000,"snapshot":"false","db":"cdctest","sequence":null,"table":"userscoressink","server_id":1,"gtid":null,"file":"alan_master_logbin.000004","pos":4931,"row":0,"thread":null,"query":null},"op":"u","ts_ms":1705717772785,"transaction":null}}
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"before"},{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"ALAN.cdctest.userscoressink.Envelope"},"payload":{"before":{"name":"alan_test","scores":888.0},"after":null,"source":{"version":"1.7.2.Final","connector":"mysql","name":"ALAN","ts_ms":1705717322000,"snapshot":"false","db":"cdctest","sequence":null,"table":"userscoressink","server_id":1,"gtid":null,"file":"alan_master_logbin.000004","pos":5234,"row":0,"thread":null,"query":null},"op":"d","ts_ms":1705717796886,"transaction":null}}

以上,本文详细的介绍了debezium的部署、验证以及通过一个示例介绍其使用。

本专题文章分为如下几篇:

37、Flink 的CDC 格式:debezium部署以及mysql示例(1)-debezium的部署与示例
37、Flink 的CDC 格式:debezium部署以及mysql示例(2)-Flink 与Debezium 实践
37、Flink 的CDC 格式:debezium部署以及mysql示例(完整版)

相关推荐
Data跳动5 小时前
Spark内存都消耗在哪里了?
大数据·分布式·spark
woshiabc1116 小时前
windows安装Elasticsearch及增删改查操作
大数据·elasticsearch·搜索引擎
lucky_syq6 小时前
Saprk和Flink的区别
大数据·flink
lucky_syq6 小时前
流式处理,为什么Flink比Spark Streaming好?
大数据·flink·spark
袋鼠云数栈6 小时前
深入浅出Flink CEP丨如何通过Flink SQL作业动态更新Flink CEP作业
大数据
小白学大数据8 小时前
如何使用Selenium处理JavaScript动态加载的内容?
大数据·javascript·爬虫·selenium·测试工具
15年网络推广青哥8 小时前
国际抖音TikTok矩阵运营的关键要素有哪些?
大数据·人工智能·矩阵
节点。csn9 小时前
Hadoop yarn安装
大数据·hadoop·分布式
arnold669 小时前
探索 ElasticSearch:性能优化之道
大数据·elasticsearch·性能优化
saynaihe9 小时前
安全地使用 Docker 和 Systemctl 部署 Kafka 的综合指南
运维·安全·docker·容器·kafka