Flink CDC from Getting Started to Taking Off, Part 1: A Simple Flink CDC Application

1: What is CDC

Terminology: CDC stands for Change Data Capture.

Technical essence: capture database change events (inserts, updates, deletes) and record them in the order they occur, thereby enabling data synchronization.

Application scenarios: widely used for data warehousing, real-time analytics, data synchronization, and similar use cases.

2: Types of CDC

|                             | Query-based CDC           | Binlog-based CDC                                                          |
|-----------------------------|---------------------------|---------------------------------------------------------------------------|
| Open-source products        | Sqoop, Kafka JDBC Source  | Canal (open-sourced by Alibaba), Maxwell, Debezium (built into Flink CDC) |
| Execution mode              | Batch                     | Streaming                                                                 |
| Captures every data change  | No                        | Yes                                                                       |
| Latency                     | High                      | Low                                                                       |
| Adds load to the database   | Yes                       | No                                                                        |

Binlog-based CDC is a natural fit for MySQL. Flink CDC ships with it built in and reads the binlog directly, which makes it an excellent solution for MySQL OLTP databases.

With Canal and Maxwell already available, why do we need Flink CDC?

Canal and Maxwell focus on CDC capture and forwarding, whereas Flink CDC is a connector that unifies capture with real-time computation. It simplifies the architecture, guarantees exactly-once semantics, supports unified full-plus-incremental reads, and handles complex stream processing, making it a perfect fit for modern real-time scenarios such as real-time data warehouses and data lakes. That is why it has become the mainstream choice.

The Flink community developed the flink-cdc-connectors component, a source connector that reads both full snapshots and incremental changes directly from databases such as MySQL and PostgreSQL.

It is open source as well; repository: https://github.com/ververica/flink-cdc-connectors

Technology comparison: understand where each of the two APIs fits and how they differ in performance.

Practice focus: master the implementation in both the DataStream and the SQL style.

Production considerations: learn to choose the right API for the actual business requirements.

1: Using the DataStream API

1: Create a Maven project

Nothing special here.

2: Add the Maven dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>flink-cdc</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <flink.version>1.17.1</flink.version>
        <scala.binary.version>2.12</scala.binary.version>
    </properties>

    <dependencies>
        <!-- Flink core dependencies -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients</artifactId>
            <version>${flink.version}</version> <!-- must match the Flink version above -->
            <!-- keep "provided" commented out for local runs; re-enable it when packaging for the cluster -->
            <!-- <scope>provided</scope> -->
        </dependency>

        <!-- MySQL CDC connector -->
        <dependency>
            <groupId>com.ververica</groupId>
            <artifactId>flink-connector-mysql-cdc</artifactId>
            <version>2.4.0</version>
        </dependency>

        <!-- MySQL JDBC driver -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.33</version>
        </dependency>


        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.75</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
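
For local debugging you can simply launch the main class from the IDE. When submitting to a cluster, switch the Flink core dependencies to provided scope as noted in the comments and build the job jar with standard Maven:

mvn clean package -DskipTests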

3: Write the code

package com.dashu;

import com.ververica.cdc.connectors.mysql.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.DebeziumSourceFunction;
import com.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FlinkCDC1 {
    public static void main(String[] args) throws Exception {

        // Get the Flink execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Build a SourceFunction with Flink CDC
        DebeziumSourceFunction<String> sourceFunction = MySqlSource.<String>builder()
                .hostname("192.168.67.133")
                .port(3306)
                .username("root")
                .password("root")
                // varargs: several databases can be monitored at once
                .databaseList("cdc_test")
                // varargs: several tables can be monitored at once,
                // each written in "database.table" format
                .tableList("cdc_test.user_info")
                // deserializer: we will swap this for a custom one later; shown here for demonstration
                .deserializer(new StringDebeziumDeserializationSchema())
                // startup mode: initial() takes a full snapshot first, then reads the incremental binlog
                .startupOptions(StartupOptions.initial())
                .build();

        DataStreamSource<String> dataStreamSource = env.addSource(sourceFunction);

        // We now have a stream: the data could be transformed further, written to
        // another database, or pushed to middleware. For now, just print it.
        dataStreamSource.print();

        // Start the job
        env.execute("FlinkCDC");
    }
}
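
The StringDebeziumDeserializationSchema above just calls toString() on the raw SourceRecord, which is hard to consume downstream. As a preview of the custom deserializer we will build later, here is a minimal sketch that emits a JSON string per change event instead. It relies only on the fastjson dependency already in the pom; the class name JsonDebeziumDeserializer and its output layout are our own choices, not part of Flink CDC:

package com.dashu;

import com.alibaba.fastjson.JSONObject;
import com.ververica.cdc.debezium.DebeziumDeserializationSchema;
import io.debezium.data.Envelope;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.util.Collector;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;

public class JsonDebeziumDeserializer implements DebeziumDeserializationSchema<String> {

    @Override
    public void deserialize(SourceRecord record, Collector<String> out) {
        JSONObject result = new JSONObject();

        // the topic looks like "mysql_binlog_source.cdc_test.user_info"
        String[] parts = record.topic().split("\\.");
        result.put("db", parts[1]);
        result.put("table", parts[2]);

        Struct value = (Struct) record.value();

        // row image before the change (null for inserts and snapshot reads)
        Struct before = value.getStruct("before");
        JSONObject beforeJson = new JSONObject();
        if (before != null) {
            for (Field field : before.schema().fields()) {
                beforeJson.put(field.name(), before.get(field));
            }
        }
        result.put("before", beforeJson);

        // row image after the change (null for deletes)
        Struct after = value.getStruct("after");
        JSONObject afterJson = new JSONObject();
        if (after != null) {
            for (Field field : after.schema().fields()) {
                afterJson.put(field.name(), after.get(field));
            }
        }
        result.put("after", afterJson);

        // operation code: r = snapshot read, c = create, u = update, d = delete
        result.put("op", Envelope.operationFor(record).code());

        out.collect(result.toJSONString());
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}

Wiring it in is a one-line change: replace the deserializer line above with .deserializer(new JsonDebeziumDeserializer()).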

4: Enable the MySQL binlog

1: MySQL 8 enables the binlog for all schemas by default.

2: The MySQL configuration file is /etc/my.cnf; edit it with vim /etc/my.cnf.
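
On older versions (such as MySQL 5.7) the binlog must be enabled explicitly. A minimal sketch of the relevant my.cnf settings; the server-id value and database name here are illustrative:

[mysqld]
# a unique server id is required once the binlog is enabled
server-id=1
# enable the binlog (already on by default in MySQL 8)
log-bin=mysql-bin
# Flink CDC / Debezium requires row-based logging (the MySQL 8 default)
binlog_format=ROW
# optional: restrict the binlog to the databases being monitored
binlog-do-db=cdc_test

Restart MySQL after changing the file for the settings to take effect.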

3: Inspect the binlog files

[root@localhost ~]# cd /var/lib/mysql
[root@localhost mysql]# ls | grep binlog
binlog.000018
binlog.000019
binlog.000020
binlog.000021
binlog.000022
binlog.000023
binlog.000024
binlog.index
[root@localhost mysql]# ls -l -h -t
total 93M
-rw-r-----. 1 mysql mysql 192K Feb  5 09:46 #ib_16384_0.dblwr
-rw-r-----. 1 mysql mysql  16M Feb  5 09:46 undo_002
-rw-r-----. 1 mysql mysql  12M Feb  5 09:45 ibdata1
-rw-r-----. 1 mysql mysql  16M Feb  5 09:45 undo_001
-rw-r-----. 1 mysql mysql 1.1K Feb  5 09:45 binlog.000024
-rw-r-----. 1 mysql mysql  28M Feb  5 09:45 mysql.ibd
drwxr-x---. 2 mysql mysql   27 Feb  5 09:44 cdc_test
-rw-r-----. 1 mysql mysql  12M Feb  5 09:30 ibtmp1
srwxrwxrwx. 1 mysql mysql    0 Feb  5 09:30 mysql.sock
-rw-------. 1 mysql mysql    5 Feb  5 09:30 mysql.sock.lock
-rw-r-----. 1 mysql mysql  112 Feb  5 09:30 binlog.index
-rw-r-----. 1 mysql mysql  14K Feb  5 09:30 binlog.000023
drwxr-x---. 2 mysql mysql 4.0K Feb  5 09:30 #innodb_redo
drwxr-x---. 2 mysql mysql  187 Feb  5 09:30 #innodb_temp
drwxr-x---. 2 mysql mysql 4.0K Feb  4 03:16 ems
drwxr-x---. 2 mysql mysql    6 Feb  4 03:09 nacos_config
-rw-r-----. 1 mysql mysql 3.5K Feb  4 02:06 ib_buffer_pool
-rw-r-----. 1 mysql mysql  180 Feb  4 02:06 binlog.000022
-rw-r-----. 1 mysql mysql  157 Feb  4 02:06 binlog.000021
-rw-r-----. 1 mysql mysql  496 Feb  3 06:34 binlog.000020
-rw-r-----. 1 mysql mysql 1.5K Jan 12 07:26 binlog.000019
-rw-r-----. 1 mysql mysql  157 Jan  7 09:38 binlog.000018
drwxr-x---. 2 mysql mysql   28 Nov 26 09:56 sys
drwxr-x---. 2 mysql mysql  143 Nov 26 09:55 mysql
-rw-------. 1 mysql mysql 1.7K Nov 26 09:55 private_key.pem
-rw-r--r--. 1 mysql mysql  452 Nov 26 09:55 public_key.pem
-rw-r--r--. 1 mysql mysql 1.1K Nov 26 09:55 client-cert.pem
-rw-------. 1 mysql mysql 1.7K Nov 26 09:55 client-key.pem
-rw-r--r--. 1 mysql mysql 1.1K Nov 26 09:55 server-cert.pem
-rw-------. 1 mysql mysql 1.7K Nov 26 09:55 server-key.pem
-rw-r--r--. 1 mysql mysql 1.1K Nov 26 09:55 ca.pem
-rw-------. 1 mysql mysql 1.7K Nov 26 09:55 ca-key.pem
drwxr-x---. 2 mysql mysql 8.0K Nov 26 09:55 performance_schema
-rw-r-----. 1 mysql mysql 8.2M Nov 26 09:55 #ib_16384_1.dblwr
-rw-r-----. 1 mysql mysql   56 Nov 26 09:55 auto.cnf
[root@localhost mysql]#
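
Alternatively, you can confirm from the MySQL client that the binlog is enabled and row-based; these are standard MySQL statements:

mysql> SHOW VARIABLES LIKE 'log_bin';
mysql> SHOW VARIABLES LIKE 'binlog_format';
mysql> SHOW BINARY LOGS;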

5: Verify the result

D:\soft\JDK\JDK18\bin\java.exe "-javaagent:D:\soft\IntelliJ IDEA 2023.2.8\lib\idea_rt.jar=45242:D:\soft\IntelliJ IDEA 2023.2.8\bin" -Dfile.encoding=UTF-8 -classpath <IDE classpath omitted> com.dashu.FlinkCDC1
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
log4j:WARN No appenders could be found for logger (org.apache.flink.shaded.netty4.io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
SLF4J: Defaulting to no-operation MDCAdapter implementation.
SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
Feb 06, 2026 8:24:29 AM com.github.shyiko.mysql.binlog.BinaryLogClient connect
INFO: Connected to 192.168.67.133:3306 at binlog.000024/1095 (sid:5805, cid:25)
SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={ts_sec=1770337470, file=binlog.000024, pos=1095}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1001}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{after=Struct{id=1001,name=张三,sex=male},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770337470000,snapshot=last,db=cdc_test,table=user_info,server_id=0,file=binlog.000024,pos=1095,row=0},op=r,ts_ms=1770337469306}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}

6: Watch the data change

First, insert a row into the table by hand.

The console picks it up right away:

SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={ts_sec=1770337470, file=binlog.000024, pos=1095}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1001}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{after=Struct{id=1001,name=张三,sex=male},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770337470000,snapshot=last,db=cdc_test,table=user_info,server_id=0,file=binlog.000024,pos=1095,row=0},op=r,ts_ms=1770337469306}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}
SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={transaction_id=null, ts_sec=1770338014, file=binlog.000024, pos=1174, row=1, server_id=1, event=2}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1002}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{after=Struct{id=1002,name=李四,sex=famal},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770338014000,db=cdc_test,table=user_info,server_id=1,file=binlog.000024,pos=1322,row=0,thread=26},op=c,ts_ms=1770338013346}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}

Next, update the name 李四 to lisi:

SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={ts_sec=1770337470, file=binlog.000024, pos=1095}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1001}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{after=Struct{id=1001,name=张三,sex=male},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770337470000,snapshot=last,db=cdc_test,table=user_info,server_id=0,file=binlog.000024,pos=1095,row=0},op=r,ts_ms=1770337469306}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}
SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={transaction_id=null, ts_sec=1770338014, file=binlog.000024, pos=1174, row=1, server_id=1, event=2}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1002}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{after=Struct{id=1002,name=李四,sex=famal},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770338014000,db=cdc_test,table=user_info,server_id=1,file=binlog.000024,pos=1322,row=0,thread=26},op=c,ts_ms=1770338013346}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}
SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={transaction_id=null, ts_sec=1770338180, file=binlog.000024, pos=1489, row=1, server_id=1, event=2}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1002}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{before=Struct{id=1002,name=李四,sex=famal},after=Struct{id=1002,name=lisi,sex=famal},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770338180000,db=cdc_test,table=user_info,server_id=1,file=binlog.000024,pos=1646,row=0,thread=26},op=u,ts_ms=1770338178961}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}

Then delete a row:

SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={ts_sec=1770337470, file=binlog.000024, pos=1095}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1001}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{after=Struct{id=1001,name=张三,sex=male},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770337470000,snapshot=last,db=cdc_test,table=user_info,server_id=0,file=binlog.000024,pos=1095,row=0},op=r,ts_ms=1770337469306}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}
SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={transaction_id=null, ts_sec=1770338014, file=binlog.000024, pos=1174, row=1, server_id=1, event=2}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1002}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{after=Struct{id=1002,name=李四,sex=famal},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770338014000,db=cdc_test,table=user_info,server_id=1,file=binlog.000024,pos=1322,row=0,thread=26},op=c,ts_ms=1770338013346}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}
SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={transaction_id=null, ts_sec=1770338180, file=binlog.000024, pos=1489, row=1, server_id=1, event=2}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1002}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{before=Struct{id=1002,name=李四,sex=famal},after=Struct{id=1002,name=lisi,sex=famal},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770338180000,db=cdc_test,table=user_info,server_id=1,file=binlog.000024,pos=1646,row=0,thread=26},op=u,ts_ms=1770338178961}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}
SourceRecord{sourcePartition={server=mysql_binlog_source}, sourceOffset={transaction_id=null, ts_sec=1770338337, file=binlog.000024, pos=1834, row=1, server_id=1, event=2}} ConnectRecord{topic='mysql_binlog_source.cdc_test.user_info', kafkaPartition=null, key=Struct{id=1002}, keySchema=Schema{mysql_binlog_source.cdc_test.user_info.Key:STRUCT}, value=Struct{before=Struct{id=1002,name=lisi,sex=famal},source=Struct{version=1.9.7.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1770338337000,db=cdc_test,table=user_info,server_id=1,file=binlog.000024,pos=1982,row=0,thread=26},op=d,ts_ms=1770338335944}, valueSchema=Schema{mysql_binlog_source.cdc_test.user_info.Envelope:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}

At this point we have used Flink CDC to monitor MySQL data changes as a stream. Note the op field in each record: r = snapshot read, c = create (insert), u = update, d = delete.

In production we enable checkpointing by default. If the job dies, it resumes from the last checkpoint, so recovery does not start reading the data from the beginning again.

7: Enable checkpointing

package com.dashu;

import com.ververica.cdc.connectors.mysql.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.DebeziumSourceFunction;
import com.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FlinkCDC1 {
    public static void main(String[] args) throws Exception {

        // Get the Flink execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Enable checkpointing every 5 seconds here; in production we use minute-level intervals
        env.enableCheckpointing(5000);
        // Set the checkpoint timeout
        env.getCheckpointConfig().setCheckpointTimeout(10000);
        // Set the checkpointing mode to exactly-once
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

        // We skip the state backend here since HDFS is not installed;
        // see the sketch after this listing for a filesystem-based setup.
        // env.setStateBackend(new HashMapStateBackend());

        // Build a SourceFunction with Flink CDC
        DebeziumSourceFunction<String> sourceFunction = MySqlSource.<String>builder()
                .hostname("192.168.67.133")
                .port(3306)
                .username("root")
                .password("root")
                // varargs: several databases can be monitored at once
                .databaseList("cdc_test")
                // varargs: several tables can be monitored at once,
                // each written in "database.table" format
                .tableList("cdc_test.user_info")
                // deserializer: we will swap this for a custom one later; shown here for demonstration
                .deserializer(new StringDebeziumDeserializationSchema())
                // startup mode: initial() takes a full snapshot first, then reads the incremental binlog
                .startupOptions(StartupOptions.initial())
                .build();

        DataStreamSource<String> dataStreamSource = env.addSource(sourceFunction);

        // We now have a stream: the data could be transformed further, written to
        // another database, or pushed to middleware. For now, just print it.
        dataStreamSource.print();

        // Start the job
        env.execute("FlinkCDC");
    }
}
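
For the state backend skipped above, here is a minimal sketch assuming no HDFS: it writes checkpoints to the local filesystem and retains them when the job is cancelled. The path is illustrative; on a cluster you would use an hdfs:// URI. Add the imports and place the lines right after the checkpoint configuration:

import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;

        // keep working state on the JVM heap (the default backend in recent Flink versions)
        env.setStateBackend(new HashMapStateBackend());
        // persist checkpoint data to the local filesystem
        env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");
        // retain the last checkpoint on cancellation so the job can be restarted from it
        env.getCheckpointConfig().setExternalizedCheckpointCleanup(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

To resume, submit the job with the checkpoint path, e.g. flink run -s file:///tmp/flink-checkpoints/<job-id>/chk-<n> ..., where the <job-id>/chk-<n> layout is Flink's default checkpoint directory structure.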