[Flink-Sql-Kafka-To-ClickHouse] Writing Kafka Data to ClickHouse with Flink SQL

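1) Requirements

1. The data source is Kafka; the Kafka topic is declared as a dynamic (temporary) table.

2. The data is written to ClickHouse through a user-defined sink table.

3. Both the source and the sink use Flink connectors.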

2) Implementation

Add the ClickHouse connector dependency:

<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>flink-connector-clickhouse</artifactId>
    <version>1.14.0</version>
</dependency>

If the job runs on a server, the connector jar also needs to be placed in Flink's lib directory.

3) Preparation

3.1. Kafka

1. Create the topic.

2. Prepare the test data:

{
  "id": 1,
  "eventId": "TEST123",
  "eventStDt": "2022-11-30 23:37:49",
  "bak6": "测试",
  "bak7": "https://test?user",
  "businessId": "17279811111111111111111111111111",
  "phone": "12345678910",
  "bak1": "1234",
  "bak2": "2022-12-01 00:00:00",
  "bak13": "17279811111111111111111111111111",
  "bak14": "APP",
  "bak11": "TEST"
}

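3.2. ClickHouse

1. Create the tables (cluster mode is used here because it is the common setup in production).

Note: in cluster mode the table has to be created twice, once as the local table and once as the cluster (Distributed) table.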

  • local
CREATE TABLE test.kafka2ck_test_local on cluster test_cluster 
(
  `id` UInt32,
  `eventId` LowCardinality(Nullable(String)),
  `eventStDt` LowCardinality(Nullable(String)),
  `bak6` LowCardinality(Nullable(String)),
  `bak7` LowCardinality(Nullable(String)),
  `businessId` LowCardinality(Nullable(String)),
  `phone` LowCardinality(Nullable(String)),
  `bak1` LowCardinality(Nullable(String)),
  `bak2` LowCardinality(Nullable(String)),
  `bak13` LowCardinality(Nullable(String)),
  `bak14` LowCardinality(Nullable(String)),
  `bak11` LowCardinality(Nullable(String))
)
ENGINE = ReplicatedMergeTree
PARTITION BY id
PRIMARY KEY id
ORDER BY id
SETTINGS index_granularity = 8192;
  • cluster
CREATE TABLE test.kafka2ck_test on cluster test_cluster 
(
  `id` UInt32,
  `eventId` LowCardinality(Nullable(String)),
  `eventStDt` LowCardinality(Nullable(String)),
  `bak6` LowCardinality(Nullable(String)),
  `bak7` LowCardinality(Nullable(String)),
  `businessId` LowCardinality(Nullable(String)),
  `phone` LowCardinality(Nullable(String)),
  `bak1` LowCardinality(Nullable(String)),
  `bak2` LowCardinality(Nullable(String)),
  `bak13` LowCardinality(Nullable(String)),
  `bak14` LowCardinality(Nullable(String)),
  `bak11` LowCardinality(Nullable(String))
)
ENGINE = Distributed('test_cluster', 'test', 'kafka2ck_test_local', rand());
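
Before moving on, it can help to confirm that the cluster name used in the DDL is actually defined and that both tables were created. The following is only a minimal sketch, assuming a clickhouse-client session on any node of `test_cluster`; the selected columns are illustrative.

```sql
-- Shards and replicas that ClickHouse knows for the cluster named in the DDL above.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'test_cluster';

-- Both tables should be listed: the local ReplicatedMergeTree table
-- and the Distributed table that points at it.
SELECT name, engine
FROM system.tables
WHERE database = 'test'
  AND name IN ('kafka2ck_test_local', 'kafka2ck_test');
```

If the first query returns no rows, the cluster name does not match the server configuration, and the `on cluster` statements above would fail.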

4) Flink SQL

  • source
CREATE TABLE source_kafka_test (
  id INT,
  eventId STRING,
  eventStDt STRING,
  bak6 STRING,
  bak7 STRING,
  businessId STRING,
  phone STRING,
  bak1 STRING,
  bak2 STRING,
  bak13 STRING,
  bak14 STRING,
  bak11 STRING
 ) WITH (
    'connector' = 'kafka',
    'topic' = 'test',
    'format'='json',
    'properties.bootstrap.servers' = '${kafka-bootstrap-server}',
    'properties.group.id' = 'test01',
    'scan.startup.mode' = 'earliest-offset',
    'properties.security.protocol' = 'SASL_PLAINTEXT',
    'properties.sasl.kerberos.service.name' = 'kafka'
);
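
To check that the source table can read and parse the topic before the sink is involved, a plain query in the Flink SQL client is usually enough. This is only a sketch; it assumes the DDL above has been executed in the same session and that the topic already contains JSON messages.

```sql
-- Streams rows from the Kafka-backed table; the query keeps running until it is cancelled.
SELECT id, eventId, eventStDt, phone
FROM source_kafka_test;
```

With the `json` format, a field that is missing from a message is read as NULL by default, so a column that is always NULL usually points to a field name that does not match the JSON key (the match is case-sensitive).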
  • sink
CREATE TABLE sink_ck_test (
  id INT,
  eventId STRING,
  eventStDt STRING,
  bak6 STRING,
  bak7 STRING,
  businessId STRING,
  phone STRING,
  bak1 STRING,
  bak2 STRING,
  bak13 STRING,
  bak14 STRING,
  bak11 STRING,
  PRIMARY KEY (id) NOT ENFORCED
 ) WITH (
    'connector' = 'clickhouse',
    'url' = 'jdbc:clickhouse://123.1.1.1:9090',
    'database-name'='test',
    'table-name' = 'kafka2ck_test_local',
    'username' = 'test',
    'password' = '123456',
    'sink.batch-size' = '100',
    'sink.flush-interval' = '1000',
    'sink.max-retries' = '3'
);
  • insert
insert into sink_ck_test select * from source_kafka_test;
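
`select *` works here because the source and sink tables declare the same columns in the same order: Flink matches the columns of an `INSERT INTO ... SELECT` by position, not by name. As a sketch, an equivalent statement with the columns spelled out makes that mapping visible and keeps working if the source table definition is reordered later. Run one statement or the other, not both.

```sql
-- Same result as `select *`, but the positional mapping to the sink columns is explicit.
INSERT INTO sink_ck_test
SELECT
  id, eventId, eventStDt, bak6, bak7, businessId,
  phone, bak1, bak2, bak13, bak14, bak11
FROM source_kafka_test;
```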

5) Verification

Write JSON test data that matches the ClickHouse table schema to the Kafka topic, then check whether the row shows up in ClickHouse.

{
  "id": 1,
  "eventId": "TEST123",
  "eventStDt": "2022-11-30 23:37:49",
  "bak6": "测试",
  "bak7": "https://test?user",
  "businessId": "17279811111111111111111111111111",
  "phone": "12345678910",
  "bak1": "1234",
  "bak2": "2022-12-01 00:00:00",
  "bak13": "17279811111111111111111111111111",
  "bak14": "APP",
  "bak11": "TEST"
}
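
Once the job is running and the message has been sent, the result can be checked on the ClickHouse side. This is a minimal sketch using the table names from the DDL above. Note that the sink writes into `kafka2ck_test_local` on the node given in `url`, while a query against the Distributed table `kafka2ck_test` reads across all shards.

```sql
-- Total number of rows visible through the Distributed table.
SELECT count() FROM test.kafka2ck_test;

-- The test record sent above, looked up by its id.
SELECT id, eventId, eventStDt, phone, bak2
FROM test.kafka2ck_test
WHERE id = 1;
```

Rows may take a moment to appear, because the sink batches its writes (`sink.batch-size`, `sink.flush-interval`).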