Flink 系列文章
一、Flink 专栏
Flink 专栏系统介绍某一知识点,并辅以具体的示例进行说明。
-
1、Flink 部署系列
本部分介绍Flink的部署、配置相关基础内容。
-
2、Flink基础系列
本部分介绍Flink 的基础部分,比如术语、架构、编程模型、编程指南、基本的datastream api用法、四大基石等内容。
-
3、Flik Table API和SQL基础系列
本部分介绍Flink Table Api和SQL的基本用法,比如Table API和SQL创建库、表用法、查询、窗口函数、catalog等等内容。
-
4、Flik Table API和SQL提高与应用系列
本部分是table api 和sql的应用部分,和实际的生产应用联系更为密切,以及有一定开发难度的内容。
-
5、Flink 监控系列
本部分和实际的运维、监控工作相关。
二、Flink 示例专栏
Flink 示例专栏是 Flink 专栏的辅助说明,一般不会介绍知识点的信息,更多的是提供一个一个可以具体使用的示例。本专栏不再分目录,通过链接即可看出介绍的内容。
两专栏的所有文章入口点击:Flink 系列文章汇总索引
文章目录
- [Flink 系列文章](#Flink 系列文章)
- [一、Debezium Format](#一、Debezium Format)
-
- 1、Debezium介绍
- 2、binlog设置及验证
- 3、debezium部署及验证
-
- [1)、下载-mysql connector连接器](#1)、下载-mysql connector连接器)
- 2)、解压
- 3)、kafka配置
- 4)、启动kafak的插件
- 5)、注册mysql的连接器
- 6)、数据验证
- [4、示例:通过Debezium CDC 将mysql数据变化输出至kafka](#4、示例:通过Debezium CDC 将mysql数据变化输出至kafka)
本文详细的介绍了debezium的部署、验证以及通过一个示例介绍其使用。
如果需要了解更多内容,可以在本人Flink 专栏中了解更新系统的内容。
本文除了maven依赖外,还依赖kafka、flink、debezium。
本专题文章分为如下几篇:
37、Flink 的CDC 格式:debezium部署以及mysql示例(1)-debezium的部署与示例
37、Flink 的CDC 格式:debezium部署以及mysql示例(2)-Flink 与Debezium 实践
37、Flink 的CDC 格式:debezium部署以及mysql示例(完整版)
一、Debezium Format
1、Debezium介绍
Debezium 是一个 CDC(Changelog Data Capture,变更数据捕获)的工具,可以把来自 MySQL、PostgreSQL、Oracle、Microsoft SQL Server 和许多其他数据库的更改实时流式传输到 Kafka 中。 Debezium 为变更日志提供了统一的格式结构,并支持使用 JSON 和 Apache Avro 序列化消息。
Flink 支持将 Debezium JSON 和 Avro 消息解析为 INSERT / UPDATE / DELETE 消息到 Flink SQL 系统中。在很多情况下,利用这个特性非常的有用,例如
- 将增量数据从数据库同步到其他系统
- 日志审计
- 数据库的实时物化视图
- 关联维度数据库的变更历史,等等。
Flink 还支持将 Flink SQL 中的 INSERT / UPDATE / DELETE 消息编码为 Debezium 格式的 JSON 或 Avro 消息,输出到 Kafka 等存储中。 但需要注意的是,目前 Flink 还不支持将 UPDATE_BEFORE 和 UPDATE_AFTER 合并为一条 UPDATE 消息。因此,Flink 将 UPDATE_BEFORE 和 UPDATE_AFTER 分别编码为 DELETE 和 INSERT 类型的 Debezium 消息。
2、binlog设置及验证
设置binlog需要监控的数据库,本示例使用的数据库是mysql5.7
1)、配置
本示例设置的参数参考下面的配置
bash
[root@server4 ~]# cat /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html
[mysqld]
......
log-bin=mysql-bin # log-bin的名称,可以是任意名称
binlog-format=row # 推荐该参数,其他的参数视情况而定,比如mixed、statement
server_id=1 # mysql集群环境中不要重复
binlog_do_db=test # test是mysql的数据库名称,如果监控多个数据库,可以添加多个binlog_do_db即可,例如下面示例
# binlog_do_db=test2
# binlog_do_db=test3
.....
-
STATEMENT模式(SBR)
每一条会修改数据的sql语句会记录到binlog中。优点是并不需要记录每一条sql语句和每一行的数据变化,减少了binlog日志量,节约IO,提高性能。缺点是在某些情况下会导致master-slave中的数据不一致(如sleep()函数, last_insert_id(),以及user-defined functions(udf)等会出现问题)
-
ROW模式(RBR)
不记录每条sql语句的上下文信息,仅需记录哪条数据被修改了,修改成什么样了。而且不会出现某些特定情况下的存储过程、或function、或trigger的调用和触发无法被正确复制的问题。缺点是会产生大量的日志,尤其是alter table的时候会让日志暴涨。
-
MIXED模式(MBR)
以上两种模式的混合使用,一般的复制使用STATEMENT模式保存binlog,对于STATEMENT模式无法复制的操作使用ROW模式保存binlog,MySQL会根据执行的SQL语句选择日志保存方式。
2)、重启mysql
保存配置后重启mysql
bash
service mysqld restart
3)、验证
重启后,可以通过2个简单的方法验证是否设置成功。
mysql默认的安装目录:cd /var/lib/mysql
bash
[root@server4 ~]# cd /var/lib/mysql
[root@server4 mysql]# ll
......
-rw-r----- 1 mysql mysql 154 1月 10 2022 mysql-bin.000001
-rw-r----- 1 mysql mysql 1197 1月 16 12:21 mysql-bin.index
.....
- 查看mysql-bin.000001文件是否生成,且其大小为154字节。mysql-bin.000001是mysql重启的次数,重启2次则为mysql-bin.000002
- 在test数据库中创建或添加数据,mysql-bin.000001的大小是否增加
以上情况满足,则说明binlog配置正常
3、debezium部署及验证
1)、下载-mysql connector连接器
去其官网:https://debezium.io/releases/下载需要的版本。
本示例使用的是:debezium-connector-mysql-1.7.2.Final-plugin.tar.gz
2)、解压
创建解压目录:/usr/local/bigdata/debezium/connector
解压
bash
tar zxvf /usr/local/bigdata/debezium-connector-mysql-1.7.2.Final-plugin.tar.gz -C /usr/local/bigdata/debezium/connector
## 解压后
[alanchan@server3 connector]$ ll
总用量 4
drwxr-xr-x 2 alanchan root 4096 1月 16 07:20 debezium-connector-mysql
[alanchan@server3 connector]$ cd debezium-connector-mysql/
[alanchan@server3 debezium-connector-mysql]$ ll
总用量 10312
-rw-rw-r-- 1 alanchan root 337864 12月 14 2021 antlr4-runtime-4.8.jar
-rw-rw-r-- 1 alanchan root 308966 12月 14 2021 CHANGELOG.md
-rw-rw-r-- 1 alanchan root 19228 12月 14 2021 CONTRIBUTE.md
-rw-rw-r-- 1 alanchan root 4981 12月 14 2021 COPYRIGHT.txt
-rw-rw-r-- 1 alanchan root 20682 12月 14 2021 debezium-api-1.7.2.Final.jar
-rw-rw-r-- 1 alanchan root 400546 12月 14 2021 debezium-connector-mysql-1.7.2.Final.jar
-rw-rw-r-- 1 alanchan root 886363 12月 14 2021 debezium-core-1.7.2.Final.jar
-rw-rw-r-- 1 alanchan root 2825430 12月 14 2021 debezium-ddl-parser-1.7.2.Final.jar
-rw-rw-r-- 1 alanchan root 4617 12月 14 2021 failureaccess-1.0.1.jar
-rw-rw-r-- 1 alanchan root 2858426 12月 14 2021 guava-30.0-jre.jar
-rw-rw-r-- 1 alanchan root 129157 12月 14 2021 LICENSE-3rd-PARTIES.txt
-rw-rw-r-- 1 alanchan root 11357 12月 14 2021 LICENSE.txt
-rw-rw-r-- 1 alanchan root 193386 12月 14 2021 mysql-binlog-connector-java-0.25.3.jar
-rw-rw-r-- 1 alanchan root 2475087 12月 14 2021 mysql-connector-java-8.0.27.jar
-rw-rw-r-- 1 alanchan root 19520 12月 14 2021 README_JA.md
-rw-rw-r-- 1 alanchan root 15286 12月 14 2021 README.md
-rw-rw-r-- 1 alanchan root 13114 12月 14 2021 README_ZH.md
3)、kafka配置
因为配置的是kafka的插件,所以需要修改kafka的插件配置,同时需要注意的是,debezium的安装目录需要kafka能找到。
本示例中kafka的安装目录:/usr/local/bigdata/kafka_2.12-3.0.0
- 修改kafka插件配置文件connect-distributed.properties
修改内容如下,其他的根据情况进行配置,否则就是默认的
bash
bootstrap.servers=server1:9092,server2:9092,server3:9092
group.id=connect-cluster
status.storage.replication.factor=2
plugin.path=/usr/local/bigdata/debezium/connector
根据实际的应用环境决定是否分发该配置文件
- 重启kafka集群
4)、启动kafak的插件
需要 在部署debezium的机器上进行此操作
bash
#在kafka的/usr/local/bigdata/kafka_2.12-3.0.0/bin目录下启动
#执行命令:
connect-distributed.sh -daemon /usr/local/bigdata/kafka_2.12-3.0.0/config/connect-distributed.properties
[alanchan@server3 config]$ cd /usr/local/bigdata/kafka_2.12-3.0.0/bin
[alanchan@server3 bin]$ connect-distributed.sh -daemon /usr/local/bigdata/kafka_2.12-3.0.0/config/connect-distributed.properties
[alanchan@server3 bin]$ jps
8980 ConnectDistributed
9271 Jps
826 Kafka
# ConnectDistributed 进程名称即为kafka插件
# 也可以通过下面的方式验证
[alanchan@server3 bin]$ curl server3:8083
{"version":"3.0.0","commit":"8cb0a5e9d3441962","kafka_cluster_id":"dVRZjBtQQnum1bb7pu_ljg"}
# 也可以查看有哪些连接器在工作,由于当前还未注册任何的连接器,故为空
[alanchan@server3 bin]$ curl 192.168.10.43:8083/connectors/
[]
5)、注册mysql的连接器
- 参数说明
bash
{
"name": "alan-debezium-mysql-connector", // 向 Kafka Connect 服务注册时的连接器名称。
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector", // 连接器的类名,不能修改。
"database.hostname": "192.168.10.44", //MySQL 服务器地址
"database.port": "3306", // MySQL 服务器端口号
"database.user": "root", // 具有适当权限的 MySQL 用户
"database.password": "123456", // MySQL 用户的密码
"database.server.id": "184054", // 连接器的唯一 ID,随便写,但不应该重复
"database.server.name": "ALAN", // MySQL 服务器或集群的逻辑名称,将来作为kafka的topic前缀
"database.include.list": "cdctest", // 指定服务器托管的数据库列表,多个数据库可以用逗号分隔
"database.history.kafka.bootstrap.servers": "server1:9092,server2:9092,server3:9092", // 连接器用于将 DDL 语句写入和恢复到数据库历史主题的 Kafka 代理列表
"database.history.kafka.topic": "alan.historydb", // 数据库历史主题的名称。本主题仅供内部使用,消费者不得使用
"include.schema.changes": "true" // 指定连接器是否应为 DDL 更改生成事件并将它们发送到fulfillment架构更改主题以供使用者使用的标志
}
}
- 命令行执行
bash
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 192.168.10.43:8083/connectors/ -d {"name": "alan-debezium-mysql-connector","config": {"connector.class": "io.debezium.connector.mysql.MySqlConnector","database.hostname": "192.168.10.44","database.port": "3306","database.user": "root","database.password": "123456","database.server.id": "184054", "database.server.name": "ALAN", "database.include.list": "cdctest", "database.history.kafka.bootstrap.servers": "server1:9092,server2:9092,server3:9092", "database.history.kafka.topic": "alan.historydb","include.schema.changes": "true"}}
- postman等工具执行
执行后,返回
也可以在kafka的可视化工具中查看,比如offset explorer
也可以通过命令行查看是否注册成功
bash
[alanchan@server3 bin]$ curl 192.168.10.43:8083/connectors/
["alan-debezium-mysql-connector"]
6)、数据验证
启动成功后,debezium会将监控 的数据库表中的数据同步到kafka的消息队列中。
本示例中,mysql中的原始数据如下
启动插件成功后,kafka对应的topic中的数据如下
以上,则表示完成debezium的初步验证成功。
4、示例:通过Debezium CDC 将mysql数据变化输出至kafka
以下是针对表userscoressink新增、修改和删除的数据后kafka主题ALAN.cdctest.userscoressink的变化情况
bash
[alanchan@server1 bin]$ kafka-console-consumer.sh --bootstrap-server server1:9092 --topic ALAN.cdctest.userscoressink --from-beginning
......
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"before"},{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"ALAN.cdctest.userscoressink.Envelope"},"payload":{"before":null,"after":{"name":"alan_test","scores":666.0},"source":{"version":"1.7.2.Final","connector":"mysql","name":"ALAN","ts_ms":1705717276000,"snapshot":"false","db":"cdctest","sequence":null,"table":"userscoressink","server_id":1,"gtid":null,"file":"alan_master_logbin.000004","pos":4645,"row":0,"thread":null,"query":null},"op":"c","ts_ms":1705717750512,"transaction":null}}
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"before"},{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"ALAN.cdctest.userscoressink.Envelope"},"payload":{"before":{"name":"alan_test","scores":666.0},"after":{"name":"alan_test","scores":888.0},"source":{"version":"1.7.2.Final","connector":"mysql","name":"ALAN","ts_ms":1705717298000,"snapshot":"false","db":"cdctest","sequence":null,"table":"userscoressink","server_id":1,"gtid":null,"file":"alan_master_logbin.000004","pos":4931,"row":0,"thread":null,"query":null},"op":"u","ts_ms":1705717772785,"transaction":null}}
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"before"},{"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"double","optional":true,"field":"scores"}],"optional":true,"name":"ALAN.cdctest.userscoressink.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"ALAN.cdctest.userscoressink.Envelope"},"payload":{"before":{"name":"alan_test","scores":888.0},"after":null,"source":{"version":"1.7.2.Final","connector":"mysql","name":"ALAN","ts_ms":1705717322000,"snapshot":"false","db":"cdctest","sequence":null,"table":"userscoressink","server_id":1,"gtid":null,"file":"alan_master_logbin.000004","pos":5234,"row":0,"thread":null,"query":null},"op":"d","ts_ms":1705717796886,"transaction":null}}
以上,本文详细的介绍了debezium的部署、验证以及通过一个示例介绍其使用。
本专题文章分为如下几篇:
37、Flink 的CDC 格式:debezium部署以及mysql示例(1)-debezium的部署与示例
37、Flink 的CDC 格式:debezium部署以及mysql示例(2)-Flink 与Debezium 实践
37、Flink 的CDC 格式:debezium部署以及mysql示例(完整版)