Preface
Kafka Connect is a component of Apache Kafka that makes it easy to connect other systems, such as databases, cloud services, and file systems, to Kafka. Data can flow from other systems into Kafka through Kafka Connect, and from Kafka out to other systems. A plugin that reads data from another system is called a Source Connector; a plugin that writes data to another system is called a Sink Connector. Neither Source nor Sink Connectors connect to Kafka brokers directly: a Source Connector hands its data to Kafka Connect, and a Sink Connector receives data from Kafka Connect.
Install the TDengine Connector plugin
git, maven, and unzip need to be installed first.
Build the plugin
```shell
cd /tmp
git clone --branch 3.0 https://github.com/taosdata/kafka-connect-tdengine.git
cd kafka-connect-tdengine
mvn clean package -Dmaven.test.skip=true
unzip -d $KAFKA_HOME/components/ target/components/packages/taosdata-kafka-connect-tdengine-*.zip
```
Configure the plugin
1. Distributed-mode configuration
Edit the `$KAFKA_HOME/config/connect-distributed.properties` configuration file. The key settings are:

- `bootstrap.servers`: the Kafka cluster address, in the form `host1:port1,host2:port2,...`.
- `group.id`: the consumer group ID used by Kafka Connect.
- `key.converter` and `value.converter`: the converters that translate record keys and values between Kafka's format and Connect's data format; `org.apache.kafka.connect.json.JsonConverter` is commonly used.
- `offset.storage.topic`: the topic where Kafka Connect stores offsets.
- `config.storage.topic`: the topic where Kafka Connect stores connector configurations.
- `status.storage.topic`: the topic where Kafka Connect stores connector and task status.
My configuration is as follows.
First create the `/usr/share/java` directory for `plugin.path` (do not point it at the Java installation directory, or the plugin will fail to load):

```shell
mkdir -p /usr/share/java
```

Then edit the properties file:

```ini
# Kafka cluster address
bootstrap.servers=192.168.174.131:9092,192.168.174.130:9092

# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
# This ID must be unique; otherwise I ran into "no route to host" errors
group.id=connect-cluster-1

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
offset.storage.topic=connect-offsets
# Set to 2 because I only have two Kafka nodes
offset.storage.replication.factor=2

# Topic to use for storing connector and task configurations; note that this should be a single-partition,
# highly replicated, and compacted topic.
config.storage.topic=connect-configs
# Set to 2 because I only have two Kafka nodes
config.storage.replication.factor=2

# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
status.storage.topic=connect-status
# Set to 2 because I only have two Kafka nodes
status.storage.replication.factor=2

# List of comma-separated URIs the REST API will listen on. The supported protocols are HTTP and HTTPS.
# Specify hostname as 0.0.0.0 to bind to all interfaces.
# Leave hostname empty to bind to default interface.
# Examples of legal listener lists: HTTP://myhost:8083,HTTPS://myhost:8084
# You can also use the host's hostname, e.g. hostname:8083
listeners=HTTP://:8083

plugin.path=/usr/share/java,/usr/kafka/kafka_2.12-3.7.0/components
```
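As the stock comments above note, Connect creates these internal topics automatically, but you can pre-create them before starting Connect if you need specific settings. A sketch with `kafka-topics.sh` (the broker address matches the config above; the partition counts are illustrative, and replication factor 2 matches the two-node cluster):

```shell
kafka-topics.sh --create --topic connect-offsets --partitions 25 --replication-factor 2 \
  --config cleanup.policy=compact --bootstrap-server 192.168.174.131:9092
kafka-topics.sh --create --topic connect-configs --partitions 1 --replication-factor 2 \
  --config cleanup.policy=compact --bootstrap-server 192.168.174.131:9092
kafka-topics.sh --create --topic connect-status --partitions 5 --replication-factor 2 \
  --config cleanup.policy=compact --bootstrap-server 192.168.174.131:9092
```

Note that `cleanup.policy=compact` is required for these topics, since Connect relies on log compaction to keep the latest offsets and configs.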
Start Kafka
```shell
zookeeper-server-start.sh -daemon $KAFKA_HOME/config/zookeeper.properties
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
connect-distributed.sh -daemon $KAFKA_HOME/config/connect-distributed.properties
```
- `-daemon`: run the process as a daemon. If you are not sure the configuration is correct, remove this flag and run in the foreground first.
Note: on node kafka01, running

```shell
curl -X POST -d @sink-demo.json http://localhost:8083/connectors -H "Content-Type: application/json"
```

worked fine, but on kafka02 the same command kept failing with `{"error_code":500,"message":"IO Error trying to forward REST request: java.net.NoRouteToHostException: No route to host"}`. I opened port 8083 in the firewall and changed `group.id` so it differed from kafka01's, after which the error became `{"error_code":500,"message":"IO Error trying to forward REST request: java.net.ConnectException: Connection refused"}`. I then removed `-daemon` and ran in the foreground, which revealed the following error:
```shell
org.apache.kafka.connect.errors.ConnectException: Unable to initialize REST server
    at org.apache.kafka.connect.runtime.rest.RestServer.initializeServer(RestServer.java:199)
    at org.apache.kafka.connect.cli.AbstractConnectCli.startConnect(AbstractConnectCli.java:129)
    at org.apache.kafka.connect.cli.AbstractConnectCli.run(AbstractConnectCli.java:94)
    at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:116)
Caused by: java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:8083
    at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:349)
    at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:310)
    at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
    at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:234)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
    at org.eclipse.jetty.server.Server.doStart(Server.java:401)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
    at org.apache.kafka.connect.runtime.rest.RestServer.initializeServer(RestServer.java:197)
    ... 3 more
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:438)
    at sun.nio.ch.Net.bind(Net.java:430)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:225)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:344)
    ... 10 more
```
Check what is using port 8083:

```shell
lsof -i :8083
```
The port was already in use, so kill the offending process:

```shell
kill -9 63250
```
Re-run in the foreground (without `-daemon`):

```shell
connect-distributed.sh $KAFKA_HOME/config/connect-distributed.properties
```
This time it started successfully.
Test Kafka Connect
Test on both kafka01 and kafka02.
Verify that Kafka Connect started successfully
```shell
curl http://localhost:8083/connectors
```
If all components started successfully, you will get the following output:
```shell
[]
```
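You can also confirm that the TDengine plugin was picked up from `plugin.path` by listing the installed connector plugins (a standard Connect REST endpoint):

```shell
curl http://localhost:8083/connector-plugins
```

The output should include `com.taosdata.kafka.connect.sink.TDengineSinkConnector`; if it does not, recheck the `plugin.path` setting and the unzip step above.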
Add the Sink Connector configuration file
```bash
mkdir ~/test
cd ~/test
vi sink-demo.json
```
sink-demo.json configures schemaless writing and uses the TAOS-RS (REST) connection. Its content is:
```json
{
  "name": "TDengineSinkConnector",
  "config": {
    "connector.class": "com.taosdata.kafka.connect.sink.TDengineSinkConnector",
    "tasks.max": "1",
    "topics": "meters",
    "connection.url": "jdbc:TAOS-RS://127.0.0.1:6041/?user=root&password=taosdata&batchfetch=true",
    "connection.user": "root",
    "connection.password": "taosdata",
    "connection.database": "power",
    "db.schemaless": "line",
    "data.precision": "ns",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.name": "dead_letter_topic",
    "errors.deadletterqueue.topic.replication.factor": 1
  }
}
```
Key settings:

- `"topics": "meters"` and `"connection.database": "power"`: subscribe to the topic meters and write its data into the database power.
- `"db.schemaless": "line"`: the data uses the InfluxDB Line protocol format.
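With `"db.schemaless": "line"` and `"data.precision": "ns"`, each Kafka record is expected to be an InfluxDB line-protocol string with a nanosecond timestamp. A minimal sketch of composing such a record (the values are illustrative):

```shell
# Compose one line-protocol record:
#   <measurement>,<tag_set> <field_set> <timestamp>
measurement="meters"
tags="location=California.LosAngeles,groupid=2"
fields="current=11.8,voltage=221,phase=0.28"
ts="1648432611249000000"   # nanoseconds, matching data.precision=ns
line="${measurement},${tags} ${fields} ${ts}"
echo "$line"
```

Every record the connector consumes from the topic must follow this shape, or it will be routed to the dead letter topic under `errors.tolerance=all`.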
Create the Sink Connector instance
```shell
curl -X POST -d @sink-demo.json http://localhost:8083/connectors -H "Content-Type: application/json"
```
If the command succeeds, the output looks like this:
```json
{
  "name": "TDengineSinkConnector",
  "config": {
    "connection.database": "power",
    "connection.password": "taosdata",
    "connection.url": "jdbc:TAOS-RS://127.0.0.1:6041/?user=root&password=taosdata&batchfetch=true",
    "connection.user": "root",
    "connector.class": "com.taosdata.kafka.connect.sink.TDengineSinkConnector",
    "data.precision": "ns",
    "db.schemaless": "line",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "tasks.max": "1",
    "topics": "meters",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "name": "TDengineSinkConnector",
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.name": "dead_letter_topic",
    "errors.deadletterqueue.topic.replication.factor": "1"
  },
  "tasks": [],
  "type": "sink"
}
```
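Before writing data, it is worth checking that the connector's task is actually RUNNING, using the standard Connect status endpoint:

```shell
curl http://localhost:8083/connectors/TDengineSinkConnector/status
```

The response reports the state of the connector and each of its tasks; a FAILED task includes a stack trace in the `trace` field, which is useful for diagnosing connection problems with TDengine.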
Write test data
Prepare a text file with the test data:

```shell
vi ~/test/test-data.txt
```

with the following content:

```
meters,location=California.LosAngeles,groupid=2 current=11.8,voltage=221,phase=0.28 1648432611249000000
meters,location=California.LosAngeles,groupid=2 current=13.4,voltage=223,phase=0.29 1648432611250000000
meters,location=California.LosAngeles,groupid=3 current=10.8,voltage=223,phase=0.29 1648432611249000000
meters,location=California.LosAngeles,groupid=3 current=11.3,voltage=221,phase=0.35 1648432611250000000
```
Use kafka-console-producer to feed the test data into the topic meters.
```shell
cat test-data.txt | kafka-console-producer.sh --broker-list localhost:9092 --topic meters
```
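Before producing, you can sanity-check the records locally: every line-protocol record above has exactly three space-separated parts (measurement plus tags, fields, timestamp). A quick check with awk (two of the records are inlined here for illustration; run it against `~/test/test-data.txt` in practice):

```shell
# Count records that do NOT have exactly three space-separated parts.
bad=$(awk 'NF != 3 { n++ } END { print n+0 }' <<'EOF'
meters,location=California.LosAngeles,groupid=2 current=11.8,voltage=221,phase=0.28 1648432611249000000
meters,location=California.LosAngeles,groupid=3 current=10.8,voltage=223,phase=0.29 1648432611249000000
EOF
)
echo "$bad"   # 0 means every record is well-formed
```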
Verify the synchronization
Use the TDengine CLI to verify that the data was synchronized.
```shell
taos> use power;
Database changed.

taos> select * from meters;
              _ts               |   current    |   voltage    |    phase     | groupid |       location        |
===============================================================================================================
 2022-03-28 09:56:51.249000000 | 11.800000000 | 221.000000000 |  0.280000000 | 2       | California.LosAngeles |
 2022-03-28 09:56:51.250000000 | 13.400000000 | 223.000000000 |  0.290000000 | 2       | California.LosAngeles |
 2022-03-28 09:56:51.249000000 | 10.800000000 | 223.000000000 |  0.290000000 | 3       | California.LosAngeles |
 2022-03-28 09:56:51.250000000 | 11.300000000 | 221.000000000 |  0.350000000 | 3       | California.LosAngeles |
Query OK, 4 row(s) in set (0.004208s)
```
Unload the plugin
After testing, stop the loaded connector with an unload (DELETE) request.
List the currently active connectors:
```shell
curl http://localhost:8083/connectors
```
If you followed the steps above, there should be one active connector:
```shell
curl -X DELETE http://localhost:8083/connectors/TDengineSinkConnector
```
Final notes
With schemaless writing, or when writing data directly through the consumer API, the auto-generated child table names are unreadable. You can refer to the main processing logic of schemaless writing for details; note that the setting below only takes effect for REST connections.
My configuration is as follows:
```shell
vi /etc/taos/taos.cfg
```

and add:

```ini
# add smlAutoChildTableNameDelimiter
smlAutoChildTableNameDelimiter _
```
Restart taosd and check the configuration:

```shell
systemctl stop taosd
systemctl start taosd
systemctl status taosd
taos -C
```
At this point, if you created the Sink Connector before making this change, you need to delete it and add it again for the change to take effect; the child table names will then no longer be the default MD5-style names.
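Deleting and re-creating the connector uses the same REST calls as before:

```shell
curl -X DELETE http://localhost:8083/connectors/TDengineSinkConnector
curl -X POST -d @sink-demo.json http://localhost:8083/connectors -H "Content-Type: application/json"
```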