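Flink CDC Series: KafkaDataSinkOptions, the Configuration Options Class for the Kafka Data Sink

KafkaDataSinkOptions is the configuration options class of the Kafka data sink. It defines the parameters that can be configured when Flink CDC writes to Kafka.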

Class overview

/** Options for {@link KafkaDataSinkOptions}. */
public class KafkaDataSinkOptions {
    // Configuration constants are defined here
}

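This class holds the configuration options of the Kafka sink and is built on the Flink CDC configuration framework.

Configuration constants
Prefix and delimiters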

// Prefix for Kafka-specific properties
public static final String PROPERTIES_PREFIX = "properties.";

// Delimiter between table mappings
public static final String DELIMITER_TABLE_MAPPINGS = ";";

// Delimiter between a table selector and its topic
public static final String DELIMITER_SELECTOR_TOPIC = ":";
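
To make the role of PROPERTIES_PREFIX concrete, here is a minimal sketch of how options starting with "properties." could be turned into Kafka producer properties by dropping the prefix. The class PropertiesPrefixSketch and the method extractKafkaProperties are hypothetical names introduced for illustration; this is not the connector's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Illustrative sketch only; names are hypothetical and not part of Flink CDC. */
class PropertiesPrefixSketch {
    // Same value as KafkaDataSinkOptions.PROPERTIES_PREFIX
    static final String PROPERTIES_PREFIX = "properties.";

    /** Copies every "properties.*" entry into a Properties object, dropping the prefix. */
    static Properties extractKafkaProperties(Map<String, String> sinkOptions) {
        Properties kafkaProps = new Properties();
        for (Map.Entry<String, String> entry : sinkOptions.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PROPERTIES_PREFIX)) {
                kafkaProps.setProperty(key.substring(PROPERTIES_PREFIX.length()), entry.getValue());
            }
        }
        return kafkaProps;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("properties.bootstrap.servers", "localhost:9092");
        options.put("properties.acks", "all");
        options.put("topic", "cdc-events"); // no prefix, so it is not forwarded
        Properties kafkaProps = extractKafkaProperties(options);
        System.out.println(kafkaProps.getProperty("bootstrap.servers"));
        System.out.println(kafkaProps.getProperty("acks"));
    }
}
```

Under this assumption, an option such as properties.acks reaches the producer as acks, while sink-level options such as topic are left out.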

Delivery guarantee

public static final ConfigOption<DeliveryGuarantee> DELIVERY_GUARANTEE =
        key("sink.delivery-guarantee")
                .enumType(DeliveryGuarantee.class)
                .defaultValue(DeliveryGuarantee.AT_LEAST_ONCE)
                .withDescription("Optional delivery guarantee when committing.");

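Notes:

Option name: sink.delivery-guarantee
Type: DeliveryGuarantee enum
Default: AT_LEAST_ONCE

Possible values:

- AT_LEAST_ONCE: at least once; records are not lost but may be duplicated
- EXACTLY_ONCE: exactly once
- NONE: no guarantee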

Partitioning strategy

public static final ConfigOption<PartitionStrategy> PARTITION_STRATEGY =
        key("partition.strategy")
                .enumType(PartitionStrategy.class)
                .defaultValue(PartitionStrategy.ALL_TO_ZERO)
                .withDescription(
                        "Defines the strategy for sending record to kafka topic, "
                                + "available options are `all-to-zero` and `hash-by-key`, default option is `all-to-zero`.");

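Notes:

Option name: partition.strategy
Type: PartitionStrategy enum
Default: ALL_TO_ZERO (every record goes to partition 0)
Possible values:
- ALL_TO_ZERO (`all-to-zero`): all records are sent to partition 0
- HASH_BY_KEY (`hash-by-key`): records are distributed across partitions by a hash of the primary key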
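
The difference between the two strategies can be sketched as follows. This is a conceptual illustration, not the connector's implementation: the nested Strategy enum stands in for PartitionStrategy, choosePartition is a hypothetical helper, and Arrays.hashCode is a placeholder for whatever hash is really applied to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Conceptual sketch of the two partitioning strategies; names are hypothetical. */
class PartitionStrategySketch {
    /** Stand-in for the connector's PartitionStrategy enum. */
    enum Strategy { ALL_TO_ZERO, HASH_BY_KEY }

    static int choosePartition(Strategy strategy, byte[] serializedKey, int numPartitions) {
        if (strategy == Strategy.ALL_TO_ZERO) {
            // Everything lands in partition 0, which preserves a single global order
            return 0;
        }
        // Placeholder hash; floorMod keeps the result in [0, numPartitions)
        return Math.floorMod(Arrays.hashCode(serializedKey), numPartitions);
    }

    public static void main(String[] args) {
        byte[] key = "{\"id\":42}".getBytes(StandardCharsets.UTF_8);
        System.out.println(choosePartition(Strategy.ALL_TO_ZERO, key, 4));
        // The same key always maps to the same partition
        System.out.println(choosePartition(Strategy.HASH_BY_KEY, key, 4));
    }
}
```

The trade-off is the usual one: all-to-zero keeps every event in one ordered partition, while hash-by-key spreads the load and still keeps the events of a given primary key together.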

Key format

public static final ConfigOption<KeyFormat> KEY_FORMAT =
        key("key.format")
                .enumType(KeyFormat.class)
                .defaultValue(KeyFormat.JSON)
                .withDescription(
                        "Defines the format identifier for encoding key data, "
                                + "available options are `csv` and `json`, default option is `json`.");

Notes:

Option name: key.format
Type: KeyFormat enum
Default: JSON
Possible values:
- JSON (`json`): the key is encoded as JSON
- CSV (`csv`): the key is encoded as CSV

Value format

public static final ConfigOption<JsonSerializationType> VALUE_FORMAT =
        key("value.format")
                .enumType(JsonSerializationType.class)
                .defaultValue(JsonSerializationType.DEBEZIUM_JSON)
                .withDescription(
                        "Defines the format identifier for encoding value data, "
                                + "available options are `debezium-json` and `canal-json`, default option is `debezium-json`.");

Notes:

Option name: value.format
Type: JsonSerializationType enum
Default: DEBEZIUM_JSON

Possible values:

- DEBEZIUM_JSON (`debezium-json`): Debezium JSON format
- CANAL_JSON (`canal-json`): Canal JSON format

Single topic for all events

public static final ConfigOption<String> TOPIC =
        key("topic")
                .stringType()
                .noDefaultValue()
                .withDescription(
                        "Optional. If this parameter is configured, all events will be sent to this topic.");

Notes:

Option name: topic
Type: String
Default: none (the option is optional)
Effect: when set, all events are sent to the specified topic

Table information headers

public static final ConfigOption<Boolean> SINK_ADD_TABLEID_TO_HEADER_ENABLED =
        key("sink.add-tableId-to-header-enabled")
                .booleanType()
                .defaultValue(false)
                .withDescription(
                        "Optional. If this parameter is configured, a header with key of 'namespace','schemaName','tableName' will be added for each Kafka record.");

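Notes:

Option name: sink.add-tableId-to-header-enabled
Type: Boolean
Default: false
Effect: when enabled, headers with the keys namespace, schemaName, and tableName are added to each Kafka record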
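
As a rough picture of what these headers contain, the sketch below builds the three entries for one table. It uses a plain Map<String, byte[]> instead of Kafka's header classes so that it runs without dependencies. TableIdHeaderSketch and tableIdHeaders are hypothetical names, and encoding a missing part as an empty value is an assumption of this sketch, not confirmed connector behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical and not part of Flink CDC. */
class TableIdHeaderSketch {
    /** Builds the three table-identifying headers named in the option description. */
    static Map<String, byte[]> tableIdHeaders(String namespace, String schemaName, String tableName) {
        Map<String, byte[]> headers = new LinkedHashMap<>();
        headers.put("namespace", toBytes(namespace));
        headers.put("schemaName", toBytes(schemaName));
        headers.put("tableName", toBytes(tableName));
        return headers;
    }

    // Assumption of this sketch: a missing part is encoded as an empty value
    private static byte[] toBytes(String part) {
        return (part == null ? "" : part).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        tableIdHeaders(null, "mydb", "users")
                .forEach((key, value) ->
                        System.out.println(key + " -> " + new String(value, StandardCharsets.UTF_8)));
    }
}
```

Headers of this kind let a consumer route or filter records by source table without parsing the record value.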

Custom headers

public static final ConfigOption<String> SINK_CUSTOM_HEADER =
        key("sink.custom-header")
                .stringType()
                .defaultValue("")
                .withDescription(
                        "custom headers for each kafka record. Each header are separated by ',', separate key and value by ':'. For example, we can set headers like 'key1:value1,key2:value2'.");

Notes:

Option name: sink.custom-header
Type: String
Default: empty string
Format: key1:value1,key2:value2 (headers separated by ',', key and value separated by ':')
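
The format above can be parsed with a few lines of plain Java. The following is a minimal sketch, not the connector's parser: CustomHeaderSketch and parseCustomHeaders are hypothetical names, and splitting each pair at its first ':' (so that values may themselves contain colons) is a design choice of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical and not part of Flink CDC. */
class CustomHeaderSketch {
    /** Parses "key1:value1,key2:value2" into an ordered map of header names to values. */
    static Map<String, String> parseCustomHeaders(String option) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (option == null || option.trim().isEmpty()) {
            return headers; // the default value "" means no custom headers
        }
        for (String pair : option.split(",")) {
            int separator = pair.indexOf(':');
            if (separator <= 0) {
                throw new IllegalArgumentException("Expected 'key:value' but got: " + pair);
            }
            headers.put(pair.substring(0, separator).trim(), pair.substring(separator + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        // prints {key1=value1, key2=value2}
        System.out.println(parseCustomHeaders("key1:value1,key2:value2"));
    }
}
```

Note that ';' is not a header separator here: a value such as source:mysql;version:1.0 would be read as a single header named source.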

Table-to-topic mapping

public static final ConfigOption<String> SINK_TABLE_ID_TO_TOPIC_MAPPING =
        key("sink.tableId-to-topic.mapping")
                .stringType()
                .noDefaultValue()
                .withDescription(
                        Description.builder()
                                .text(
                                        "Custom table mappings for each table from upstream tableId to downstream Kafka topic. Each mapping is separated by ")
                                .text(DELIMITER_TABLE_MAPPINGS)
                                .text(
                                        ", separate upstream tableId selectors and downstream Kafka topic by ")
                                .text(DELIMITER_SELECTOR_TOPIC)
                                .text(
                                        ". For example, we can set 'sink.tableId-to-topic.mapping' like 'mydb.mytable1:topic1;mydb.mytable2:topic2'.")
                                .build());

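Notes:

Option name: sink.tableId-to-topic.mapping
Type: String
Default: none
Format: database.table:topic;database.table:topic (mappings separated by ';', table selector and topic separated by ':')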
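
A minimal sketch of how such a mapping string could be parsed and applied is shown below. It is not the connector's code: TableTopicMappingSketch, parseMappings, and resolveTopic are hypothetical names, selectors are matched by exact string equality only, and falling back to the table ID as the topic name for unmapped tables is an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical and not part of Flink CDC. */
class TableTopicMappingSketch {
    // Same values as the delimiter constants in KafkaDataSinkOptions
    static final String DELIMITER_TABLE_MAPPINGS = ";";
    static final String DELIMITER_SELECTOR_TOPIC = ":";

    /** Parses "mydb.t1:topic1;mydb.t2:topic2" into a map from table ID to topic. */
    static Map<String, String> parseMappings(String option) {
        Map<String, String> mappings = new LinkedHashMap<>();
        if (option == null || option.trim().isEmpty()) {
            return mappings;
        }
        for (String mapping : option.split(DELIMITER_TABLE_MAPPINGS)) {
            String[] parts = mapping.split(DELIMITER_SELECTOR_TOPIC);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'tableId:topic' but got: " + mapping);
            }
            mappings.put(parts[0].trim(), parts[1].trim());
        }
        return mappings;
    }

    /** Exact-match lookup; unmapped tables fall back to the table ID (an assumption). */
    static String resolveTopic(Map<String, String> mappings, String tableId) {
        return mappings.getOrDefault(tableId, tableId);
    }

    public static void main(String[] args) {
        Map<String, String> mappings =
                parseMappings("mydb.mytable1:topic1;mydb.mytable2:topic2");
        System.out.println(resolveTopic(mappings, "mydb.mytable1")); // prints topic1
        System.out.println(resolveTopic(mappings, "mydb.other"));    // prints mydb.other
    }
}
```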

Including the Debezium schema

public static final ConfigOption<Boolean> DEBEZIUM_JSON_INCLUDE_SCHEMA_ENABLED =
        key("debezium-json.include-schema.enabled")
                .booleanType()
                .defaultValue(false)
                .withDescription(
                        "Optional. If this parameter is configured, each debezium record will contain debezium schema information. Is only supported when using debezium-json.");

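Notes:

Option name: debezium-json.include-schema.enabled
Type: Boolean
Default: false
Effect: when enabled, each record carries the Debezium schema information; only supported when value.format is debezium-json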

Configuration examples
Complete configuration example

# Delivery guarantee
sink.delivery-guarantee = exactly-once

# Partitioning strategy
partition.strategy = hash-by-key

# Data formats
key.format = json
value.format = debezium-json

# Topic (when set, all events are sent to this topic)
topic = unified-topic

# Headers (custom headers are separated by ',', key and value by ':')
sink.add-tableId-to-header-enabled = true
sink.custom-header = source:mysql,version:1.0

# Table-to-topic mapping
sink.tableId-to-topic.mapping = mydb.users:user-events;mydb.orders:order-events

# Debezium schema
debezium-json.include-schema.enabled = true

# Kafka producer properties
properties.bootstrap.servers = localhost:9092
properties.acks = all

Programmatic usage example

// Create the configuration
Configuration config = new Configuration();
config.set(KafkaDataSinkOptions.DELIVERY_GUARANTEE, DeliveryGuarantee.EXACTLY_ONCE);
config.set(KafkaDataSinkOptions.PARTITION_STRATEGY, PartitionStrategy.HASH_BY_KEY);
config.set(KafkaDataSinkOptions.TOPIC, "cdc-events");
config.set(KafkaDataSinkOptions.SINK_ADD_TABLEID_TO_HEADER_ENABLED, true);

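// Illustrative only: KafkaDataSink.builder() and setConfig() are placeholders, not a confirmed API.
// In a pipeline job the sink is normally created by the connector's factory from the sink options.
KafkaDataSink sink = KafkaDataSink.builder()
    .setConfig(config)
    .build();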

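Design characteristics

Type safety: enums and concrete types are used, which guards against invalid configuration values
Clear descriptions: every option carries its own description text
Sensible defaults: commonly used options ship with reasonable default values
Extensibility: any Kafka producer property can be passed through with PROPERTIES_PREFIX
Ease of use: the descriptions include example formats, which makes the options easier to understand and apply

Together these options cover Kafka sink setups ranging from a single topic with default settings to per-table topic routing with custom headers and exactly-once delivery.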