# Flink CDC Series: KafkaDataSinkOptions, the Kafka Data Sink Configuration Options Class

This is the Kafka data sink configuration options class: it defines every configurable parameter of the Flink CDC Kafka sink.
## Class Overview

```java
/** Options for {@link KafkaDataSink}. */
public class KafkaDataSinkOptions {
    // configuration constants are defined here
}
```

This class collects all configuration options of the Kafka sink and is built on Flink CDC's configuration framework.
## Configuration Constants

### Prefix and Delimiters

```java
// prefix for pass-through Kafka producer properties
public static final String PROPERTIES_PREFIX = "properties.";

// delimiter between table-to-topic mappings
public static final String DELIMITER_TABLE_MAPPINGS = ";";

// delimiter between a table selector and its topic
public static final String DELIMITER_SELECTOR_TOPIC = ":";
```
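Options that start with `properties.` are stripped of the prefix and handed to the Kafka producer as-is. A minimal sketch of that pass-through, assuming a plain `Map` of options (the class and method names here are illustrative, not part of Flink CDC):

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaPropertyForwarder {
    public static final String PROPERTIES_PREFIX = "properties.";

    // Collect all "properties.*" entries and strip the prefix so they can be
    // passed straight to the Kafka producer configuration.
    public static Map<String, String> extractKafkaProperties(Map<String, String> options) {
        Map<String, String> kafkaProps = new HashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (e.getKey().startsWith(PROPERTIES_PREFIX)) {
                kafkaProps.put(e.getKey().substring(PROPERTIES_PREFIX.length()), e.getValue());
            }
        }
        return kafkaProps;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("properties.bootstrap.servers", "localhost:9092");
        options.put("topic", "cdc-events");
        System.out.println(extractKafkaProperties(options));
    }
}
```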
### Delivery Guarantee

```java
public static final ConfigOption<DeliveryGuarantee> DELIVERY_GUARANTEE =
        key("sink.delivery-guarantee")
                .enumType(DeliveryGuarantee.class)
                .defaultValue(DeliveryGuarantee.AT_LEAST_ONCE)
                .withDescription("Optional delivery guarantee when committing.");
```

Notes:
- Option key: `sink.delivery-guarantee`
- Type: `DeliveryGuarantee` enum
- Default: `AT_LEAST_ONCE`
- Valid values:
  - `AT_LEAST_ONCE`: records are never lost but may be duplicated
  - `EXACTLY_ONCE`: records are delivered exactly once (uses Kafka transactions)
  - `NONE`: no delivery guarantee
### Partition Strategy

```java
public static final ConfigOption<PartitionStrategy> PARTITION_STRATEGY =
        key("partition.strategy")
                .enumType(PartitionStrategy.class)
                .defaultValue(PartitionStrategy.ALL_TO_ZERO)
                .withDescription(
                        "Defines the strategy for sending record to kafka topic, "
                                + "available options are `all-to-zero` and `hash-by-key`, default option is `all-to-zero`.");
```

Notes:
- Option key: `partition.strategy`
- Type: `PartitionStrategy` enum
- Default: `ALL_TO_ZERO` (send everything to partition 0)
- Valid values:
  - `ALL_TO_ZERO`: send all records to partition 0
  - `HASH_BY_KEY`: choose the partition by hashing the primary key
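The two strategies can be sketched as follows. This is a simplified illustration of the idea, not the connector's actual partitioner; the `numPartitions` parameter and the use of `String.hashCode()` are assumptions:

```java
public class PartitionStrategySketch {
    // all-to-zero: every record goes to partition 0, which preserves a single
    // global ordering at the cost of parallelism.
    public static int allToZero() {
        return 0;
    }

    // hash-by-key: derive the partition from the record key's hash, so records
    // with the same primary key always land in the same partition.
    public static int hashByKey(String key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(allToZero());
        System.out.println(hashByKey("mydb.users.42", 4));
    }
}
```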
### Key Format

```java
public static final ConfigOption<KeyFormat> KEY_FORMAT =
        key("key.format")
                .enumType(KeyFormat.class)
                .defaultValue(KeyFormat.JSON)
                .withDescription(
                        "Defines the format identifier for encoding key data, "
                                + "available options are `csv` and `json`, default option is `json`.");
```

Notes:
- Option key: `key.format`
- Type: `KeyFormat` enum
- Default: `JSON`
- Valid values:
  - `JSON`: encode the record key as JSON
  - `CSV`: encode the record key as CSV
### Value Format

```java
public static final ConfigOption<JsonSerializationType> VALUE_FORMAT =
        key("value.format")
                .enumType(JsonSerializationType.class)
                .defaultValue(JsonSerializationType.DEBEZIUM_JSON)
                .withDescription(
                        "Defines the format identifier for encoding value data, "
                                + "available options are `debezium-json` and `canal-json`, default option is `debezium-json`.");
```

Notes:
- Option key: `value.format`
- Type: `JsonSerializationType` enum
- Default: `DEBEZIUM_JSON`
- Valid values:
  - `DEBEZIUM_JSON`: Debezium-style JSON change events
  - `CANAL_JSON`: Canal-style JSON change events
### Unified Topic

```java
public static final ConfigOption<String> TOPIC =
        key("topic")
                .stringType()
                .noDefaultValue()
                .withDescription(
                        "Optional. If this parameter is configured, all events will be sent to this topic.");
```

Notes:
- Option key: `topic`
- Type: String
- Default: none (the option is optional; when unset, the target topic is derived from each event's table ID)
- Effect: when set, all events are sent to this single topic
### Table-ID Headers

```java
public static final ConfigOption<Boolean> SINK_ADD_TABLEID_TO_HEADER_ENABLED =
        key("sink.add-tableId-to-header-enabled")
                .booleanType()
                .defaultValue(false)
                .withDescription(
                        "Optional. If this parameter is configured, a header with key of 'namespace','schemaName','tableName' will be added for each Kafka record.");
```

Notes:
- Option key: `sink.add-tableId-to-header-enabled`
- Type: Boolean
- Default: `false`
- Effect: adds headers with the source table's `namespace`, `schemaName`, and `tableName` to every Kafka record
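When the option is enabled, the sink attaches the three parts of the table ID as record headers. A minimal sketch of what gets attached (the `TableIdHeaderSketch` helper is illustrative, not the actual serializer code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TableIdHeaderSketch {
    // Build the 'namespace'/'schemaName'/'tableName' headers that would be
    // attached to each Kafka record when the option is enabled.
    public static Map<String, String> tableIdHeaders(
            String namespace, String schemaName, String tableName) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("namespace", namespace);
        headers.put("schemaName", schemaName);
        headers.put("tableName", tableName);
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(tableIdHeaders("default", "mydb", "users"));
    }
}
```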
### Custom Headers

```java
public static final ConfigOption<String> SINK_CUSTOM_HEADER =
        key("sink.custom-header")
                .stringType()
                .defaultValue("")
                .withDescription(
                        "custom headers for each kafka record. Each header are separated by ',', separate key and value by ':'. For example, we can set headers like 'key1:value1,key2:value2'.");
```

Notes:
- Option key: `sink.custom-header`
- Type: String
- Default: empty string
- Format: `key1:value1,key2:value2` (headers separated by `,`, key and value separated by `:`)
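How such a value might be parsed can be sketched as follows. This is an illustration of the documented format, not the sink's actual parsing code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CustomHeaderParser {
    // Parse "key1:value1,key2:value2" into an ordered key/value map.
    public static Map<String, String> parse(String spec) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (spec == null || spec.isEmpty()) {
            return headers;
        }
        for (String pair : spec.split(",")) {
            String[] kv = pair.split(":", 2); // limit 2: value may contain ':'
            headers.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
        }
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(parse("source:mysql,version:1.0"));
    }
}
```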
### Table-to-Topic Mapping

```java
public static final ConfigOption<String> SINK_TABLE_ID_TO_TOPIC_MAPPING =
        key("sink.tableId-to-topic.mapping")
                .stringType()
                .noDefaultValue()
                .withDescription(
                        Description.builder()
                                .text(
                                        "Custom table mappings for each table from upstream tableId to downstream Kafka topic. Each mapping is separated by ")
                                .text(DELIMITER_TABLE_MAPPINGS)
                                .text(
                                        ", separate upstream tableId selectors and downstream Kafka topic by ")
                                .text(DELIMITER_SELECTOR_TOPIC)
                                .text(
                                        ". For example, we can set 'sink.tableId-to-topic.mapping' like 'mydb.mytable1:topic1;mydb.mytable2:topic2'.")
                                .build());
```

Notes:
- Option key: `sink.tableId-to-topic.mapping`
- Type: String
- Default: none
- Format: `database.table:topic;database.table:topic`
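Splitting such a value with the two delimiter constants can be sketched as follows. Note this only splits the string; the real implementation additionally matches each tableId selector against incoming table IDs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TableTopicMappingParser {
    public static final String DELIMITER_TABLE_MAPPINGS = ";";
    public static final String DELIMITER_SELECTOR_TOPIC = ":";

    // Split "mydb.t1:topic1;mydb.t2:topic2" into selector -> topic entries.
    public static Map<String, String> parse(String spec) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String entry : spec.split(DELIMITER_TABLE_MAPPINGS)) {
            String[] parts = entry.split(DELIMITER_SELECTOR_TOPIC, 2);
            mapping.put(parts[0].trim(), parts[1].trim());
        }
        return mapping;
    }

    public static void main(String[] args) {
        System.out.println(parse("mydb.mytable1:topic1;mydb.mytable2:topic2"));
    }
}
```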
### Debezium Schema Inclusion

```java
public static final ConfigOption<Boolean> DEBEZIUM_JSON_INCLUDE_SCHEMA_ENABLED =
        key("debezium-json.include-schema.enabled")
                .booleanType()
                .defaultValue(false)
                .withDescription(
                        "Optional. If this parameter is configured, each debezium record will contain debezium schema information. Is only supported when using debezium-json.");
```

Notes:
- Option key: `debezium-json.include-schema.enabled`
- Type: Boolean
- Default: `false`
- Effect: embeds Debezium schema information in each record; only takes effect when `value.format` is `debezium-json`
## Usage Examples

### Full Configuration Example

```properties
# delivery guarantee
sink.delivery-guarantee = exactly-once
# partition strategy
partition.strategy = hash-by-key
# data formats
key.format = json
value.format = debezium-json
# unified topic
topic = unified-topic
# header options
sink.add-tableId-to-header-enabled = true
sink.custom-header = source:mysql,version:1.0
# table-to-topic mapping
sink.tableId-to-topic.mapping = mydb.users:user-events;mydb.orders:order-events
# Debezium option
debezium-json.include-schema.enabled = true
# Kafka producer pass-through properties
properties.bootstrap.servers = localhost:9092
properties.acks = all
```
### Programmatic Usage

```java
// build the sink configuration programmatically
Configuration config = new Configuration();
config.set(KafkaDataSinkOptions.DELIVERY_GUARANTEE, DeliveryGuarantee.EXACTLY_ONCE);
config.set(KafkaDataSinkOptions.PARTITION_STRATEGY, PartitionStrategy.HASH_BY_KEY);
config.set(KafkaDataSinkOptions.TOPIC, "cdc-events");
config.set(KafkaDataSinkOptions.SINK_ADD_TABLEID_TO_HEADER_ENABLED, true);

// hand the configuration to the sink (builder shown as a sketch; in practice
// the sink is usually created through the Kafka DataSinkFactory)
KafkaDataSink sink = KafkaDataSink.builder()
        .setConfig(config)
        .build();
```
## Design Highlights

- Type safety: enums and concrete types prevent invalid configuration values
- Clear descriptions: every option ships with documentation text
- Sensible defaults: common options come with reasonable default values
- Extensibility: the `properties.` prefix forwards arbitrary Kafka producer properties
- User friendliness: the descriptions include example formats, making the options easy to understand and use

This options class gives the Kafka sink a complete and flexible set of configuration parameters, covering everything from simple setups to complex topologies.