Network Security Tools on CentOS (34): Portainer Kafka-ClickHouse Deployment (3), Connecting Kafka and ClickHouse

In Network Security Tools on CentOS (20), ClickHouse Containerized Cluster Deployment with Swarm, we already deployed ClickHouse on a swarm cluster. This time we deploy it through Portainer, which is essentially the same as the swarm deployment; the one real difference is that newer ClickHouse versions ship ClickHouse Keeper, a built-in Raft-based coordination service, so a separate ZooKeeper is no longer needed.

Below we use six virtual machine nodes (node1~node6) to deploy a three-shard ClickHouse cluster in which each shard has one primary and one backup replica; the three nodes node1~node3 also double as the Keeper nodes.

I. Keeper Deployment

Current ClickHouse versions use ClickHouse Keeper, a Raft-based coordination service, instead of ZooKeeper. For a Portainer deployment we can simply pull the clickhouse-keeper image, but just as on bare metal, the configuration file still has to be adjusted node by node.

1. Downloading the Keeper image

As before, the simplest approach is to pull the image from a domestic mirror registry and then retag it:
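A minimal sketch of the pull-and-retag step. The mirror host docker.m.daocloud.io is only an example and should be replaced with whichever mirror is reachable from your network:

# pull the keeper image through a mirror registry (example mirror host, adjust as needed)
docker pull docker.m.daocloud.io/clickhouse/clickhouse-keeper:head-alpine
# retag it to the name referenced later in the stack file
docker tag docker.m.daocloud.io/clickhouse/clickhouse-keeper:head-alpine clickhouse/clickhouse-keeper:head-alpine
# optionally drop the mirror-prefixed tag
docker rmi docker.m.daocloud.io/clickhouse/clickhouse-keeper:head-alpine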

2. Keeper configuration file

Create a directory on each host that will run a server or keeper node, to hold that node's configuration files:

[root@node1 share]# cd /root/share
[root@node1 share]# mkdir ckconfig

(1) Exporting the configuration file

First start a temporary container from the image and copy the configuration file out; it lives under /etc/clickhouse-keeper:

[root@node1 share]# docker run -it --name tmp clickhouse/clickhouse-keeper:head-alpine bash
55d75f7fa95f:/# 
55d75f7fa95f:/# cd /etc
55d75f7fa95f:/etc# cd clickhouse-keeper/
55d75f7fa95f:/etc/clickhouse-keeper# ls
keeper_config.xml
55d75f7fa95f:/etc/clickhouse-keeper# cat keeper_config.xml 
<clickhouse>
    <logger>
        <!-- Possible levels [1]:

          - none (turns off logging)
          - fatal
          - critical
          - error
          - warning
          - notice
          - information
          - debug
          - trace

            [1]: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105-L114
        -->
        <level>trace</level>
        <log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
        <errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
        <!-- Rotation policy
             See https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/FileChannel.h#L54-L85
          -->
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
    </logger>

    <max_connections>4096</max_connections>

    <keeper_server>
            <tcp_port>9181</tcp_port>

            <!-- Must be unique among all keeper serves -->
            <server_id>1</server_id>

            <log_storage_path>/var/lib/clickhouse/coordination/logs</log_storage_path>
            <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

            <coordination_settings>
                <operation_timeout_ms>10000</operation_timeout_ms>
                <min_session_timeout_ms>10000</min_session_timeout_ms>
                <session_timeout_ms>100000</session_timeout_ms>
                <raft_logs_level>information</raft_logs_level>
                <compress_logs>false</compress_logs>
                <!-- All settings listed in https://github.com/ClickHouse/ClickHouse/blob/master/src/Coordination/CoordinationSettings.h -->
            </coordination_settings>

            <!-- enable sanity hostname checks for cluster configuration (e.g. if localhost is used with remote endpoints) -->
            <hostname_checks_enabled>true</hostname_checks_enabled>
            <raft_configuration>
                <server>
                    <id>1</id>

                    <!-- Internal port and hostname -->
                    <hostname>localhost</hostname>
                    <port>9234</port>
                </server>

                <!-- Add more servers here -->

            </raft_configuration>
    </keeper_server>


    <openSSL>
      <server>
            <!-- Used for secure tcp port -->
            <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
            <!-- <certificateFile>/etc/clickhouse-keeper/server.crt</certificateFile> -->
            <!-- <privateKeyFile>/etc/clickhouse-keeper/server.key</privateKeyFile> -->
            <!-- dhparams are optional. You can delete the <dhParamsFile> element.
                 To generate dhparams, use the following command:
                  openssl dhparam -out /etc/clickhouse-keeper/dhparam.pem 4096
                 Only file format with BEGIN DH PARAMETERS is supported.
              -->
            <!-- <dhParamsFile>/etc/clickhouse-keeper/dhparam.pem</dhParamsFile> -->
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>
    </openSSL>

</clickhouse>
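Instead of poking around in an interactive shell, the default configuration can also be copied straight to the host. This is just an alternative sketch; the temporary container name tmpkeeper is arbitrary:

# create (but do not start) a throwaway container from the keeper image
docker create --name tmpkeeper clickhouse/clickhouse-keeper:head-alpine
# copy the default config out to the shared folder created above
docker cp tmpkeeper:/etc/clickhouse-keeper/keeper_config.xml /root/share/ckconfig/clickhouse-keeper.xml
# remove the throwaway container
docker rm tmpkeeper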

(2) Modifying the configuration

Parameter notes:

For this test only four settings need to change; apart from listen_host (which sits at the top level), they all live in the keeper_server section:

① Widen the listen address

<listen_host>0.0.0.0</listen_host>

② Set the listen port

<tcp_port>9181</tcp_port>

③ Set server_id per node

The server's unique identifier, which must be unique within the cluster. Since node1, node2 and node3 will form a three-node keeper cluster, they should get server_id 1, 2 and 3 respectively.

<!-- Must be unique among all keeper serves -->

<server_id>1</server_id>
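If you keep a single template and generate a per-node copy, a quick sed pass is enough. This is only a sketch; the template file name is hypothetical and the paths follow the ckconfig folder created above:

# on node2: derive its config from the template by bumping server_id (node3 is analogous)
sed 's|<server_id>1</server_id>|<server_id>2</server_id>|' \
    /root/share/ckconfig/clickhouse-keeper.xml.template \
    > /root/share/ckconfig/clickhouse-keeper.xml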

④ Raft quorum configuration

raft_configuration: every keeper node uses this section to find the other nodes, so it lists the server id, hostname and Raft port of every keeper node in the cluster.

            <raft_configuration>
                <server>
                    <id>1</id>
                    <!-- Internal port and hostname -->
                    <hostname>keeper1</hostname>
                    <port>9234</port>
                </server>
                <server>
                    <id>2</id>
                    <!-- Internal port and hostname -->
                    <hostname>keeper2</hostname>
                    <port>9234</port>
                </server>
                <server>
                    <id>3</id>
                    <!-- Internal port and hostname -->
                    <hostname>keeper3</hostname>
                    <port>9234</port>
                </server>

                <!-- Add more servers here -->

            </raft_configuration>

(3) Example XML configuration file

The final clickhouse-keeper.xml file looks like this:

<clickhouse>
    <logger>
        <!-- Possible levels [1]:

          - none (turns off logging)
          - fatal
          - critical
          - error
          - warning
          - notice
          - information
          - debug
          - trace

            [1]: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105-L114
        -->
        <level>trace</level>
        <log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
        <errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
        <!-- Rotation policy
             See https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/FileChannel.h#L54-L85
          -->
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
    </logger>
    <listen_host>0.0.0.0</listen_host>
    <max_connections>4096</max_connections>

    <keeper_server>
            <tcp_port>9181</tcp_port>

            <!-- Must be unique among all keeper serves -->
            <server_id>1</server_id>

            <log_storage_path>/var/lib/clickhouse/coordination/logs</log_storage_path>
            <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

            <coordination_settings>
                <operation_timeout_ms>10000</operation_timeout_ms>
                <min_session_timeout_ms>10000</min_session_timeout_ms>
                <session_timeout_ms>100000</session_timeout_ms>
                <raft_logs_level>information</raft_logs_level>
                <compress_logs>false</compress_logs>
                <!-- All settings listed in https://github.com/ClickHouse/ClickHouse/blob/master/src/Coordination/CoordinationSettings.h -->
            </coordination_settings>

            <!-- enable sanity hostname checks for cluster configuration (e.g. if localhost is used with remote endpoints) -->
            <hostname_checks_enabled>true</hostname_checks_enabled>
            <raft_configuration>
                <server>
                    <id>1</id>
                    <!-- Internal port and hostname -->
                    <hostname>keeper1</hostname>
                    <port>9234</port>
                </server>
                <server>
                    <id>2</id>
                    <!-- Internal port and hostname -->
                    <hostname>keeper2</hostname>
                    <port>9234</port>
                </server>
                <server>
                    <id>3</id>
                    <!-- Internal port and hostname -->
                    <hostname>keeper3</hostname>
                    <port>9234</port>
                </server>

                <!-- Add more servers here -->

            </raft_configuration>
    </keeper_server>

    <openSSL>
      <server>
            <!-- Used for secure tcp port -->
            <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
            <!-- <certificateFile>/etc/clickhouse-keeper/server.crt</certificateFile> -->
            <!-- <privateKeyFile>/etc/clickhouse-keeper/server.key</privateKeyFile> -->
            <!-- dhparams are optional. You can delete the <dhParamsFile> element.
                 To generate dhparams, use the following command:
                  openssl dhparam -out /etc/clickhouse-keeper/dhparam.pem 4096
                 Only file format with BEGIN DH PARAMETERS is supported.
              -->
            <!-- <dhParamsFile>/etc/clickhouse-keeper/dhparam.pem</dhParamsFile> -->
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>
    </openSSL>

</clickhouse>

3. Deploying with Portainer

(1) Example YAML file

Once the keeper configuration file has been adjusted on each of node1~node3, the three keeper services can be started directly from Portainer with the following YAML:

version: '3.8'

services:
  keeper1:
    image: clickhouse/clickhouse-keeper:head-alpine
    ports:
      - "9181:9181"
    volumes:
      - keeper_data1:/var/lib/clickhouse-keeper
      - /root/share/ckconfig/clickhouse-keeper.xml:/etc/clickhouse-keeper/keeper_config.xml:ro
      - /root/share/cklog:/var/log:rw
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 1
    networks:
      - kafka_kafka

  keeper2:
    image: clickhouse/clickhouse-keeper:head-alpine
    ports:
      - "9182:9181"
    volumes:
      - keeper_data2:/var/lib/clickhouse-keeper
      - /root/share/ckconfig/clickhouse-keeper.xml:/etc/clickhouse-keeper/keeper_config.xml:ro
      - /root/share/cklog:/var/log:rw
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 2
    networks:
      - kafka_kafka

  keeper3:
    image: clickhouse/clickhouse-keeper:head-alpine
    ports:
      - "9183:9181"
    volumes:
      - keeper_data3:/var/lib/clickhouse-keeper
      - /root/share/ckconfig/clickhouse-keeper.xml:/etc/clickhouse-keeper/keeper_config.xml:ro
      - /root/share/cklog:/var/log:rw
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 3    
    networks:
      - kafka_kafka

volumes:
  keeper_data1:
    driver: local
  keeper_data2:
    driver: local
  keeper_data3:
    driver: local

networks:
  kafka_kafka:
    external: true

(2) Parameter notes

The parameters in this YAML have little to do with keeper itself; they mainly pin each service to its node and map in the configuration file:

① volumes

Maps the clickhouse-keeper.xml we prepared on the host into the container, replacing the default /etc/clickhouse-keeper/keeper_config.xml. Pay very close attention to the difference between "-" and "_" in these names; that one tripped me up for quite a while.

② deploy

The constraints entry pins each keeper to its intended node via the value of node.Labels.sn (the labels themselves are assigned as sketched below).
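The sn label has to exist on the swarm nodes before the constraint can match anything. A sketch of assigning it, assuming the swarm node names match the hostnames node1~node6:

# run on a swarm manager: label each node so the placement constraints can match it
docker node update --label-add sn=1 node1
docker node update --label-add sn=2 node2
docker node update --label-add sn=3 node3
# verify the labels
docker node inspect node1 --format '{{ .Spec.Labels }}'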

4. Checking the Keeper cluster

If Keeper started successfully, the listen port 9181 configured above should be held open by the dockerd process:

[root@node1 ckconfig]# netstat -lntup|grep 9181
tcp6       0      0 :::9181                 :::*                    LISTEN      68710/dockerd     

However, an open 9181 port by itself does not mean the keeper cluster has started and elected a leader. Fortunately this port also understands ZooKeeper's four-letter commands, which can be used to check the cluster state.

(1) Keeper four-letter commands

Sending a four-letter command to the listen port of a keeper node returns status information about that node. There are quite a few of these commands; stat is the most commonly used, and there is also the amusingly named "ruok" (are you ok).

The Runoob (菜鸟教程) tutorial gives the full list of ZooKeeper four-letter commands:
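Besides stat, ruok and mntr are handy for a quick health check. A sketch, assuming nc is installed on the host and a keeper is reachable on 192.168.76.11:9181 as in the examples below:

# should answer "imok" if the node is serving requests
echo ruok | nc 192.168.76.11 9181
# dumps detailed metrics, including zk_server_state (leader/follower)
echo mntr | nc 192.168.76.11 9181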

(2) Checking keeper node roles

Sending the four-letter command to different keeper nodes shows whether each one is currently the leader or a follower. Note, however, that because the stack uses an overlay network, requests to the node IPs are load-balanced by the routing mesh: querying any host IP on port 9181 always reaches the one service that publishes port 9181:

[root@node1 ckconfig]# echo stat|nc 192.168.76.12 9181
ClickHouse Keeper version: v25.11.1.2618-testing-0e8b02fb9a8a99a7e88641ff1254673a44a78f72
Clients:
 10.0.0.3:46746(recved=0,sent=0)

Latency min/avg/max: 0/0/0
Received: 0
Sent: 0
Connections: 0
Outstanding: 0
Zxid: 0x
Mode: follower
Node count: 4
[root@node1 ckconfig]# echo stat|nc 192.168.76.11 9181
ClickHouse Keeper version: v25.11.1.2618-testing-0e8b02fb9a8a99a7e88641ff1254673a44a78f72
Clients:
 10.0.0.2:52132(recved=0,sent=0)

Latency min/avg/max: 0/0/0
Received: 0
Sent: 0
Connections: 0
Outstanding: 0
Zxid: 0x
Mode: follower
Node count: 4
[root@node1 ckconfig]# echo stat|nc 192.168.76.13 9181
ClickHouse Keeper version: v25.11.1.2618-testing-0e8b02fb9a8a99a7e88641ff1254673a44a78f72
Clients:
 10.0.0.4:36430(recved=0,sent=0)

Latency min/avg/max: 0/0/0
Received: 0
Sent: 0
Connections: 0
Outstanding: 0
Zxid: 0x
Mode: follower
Node count: 4

So to query a specific keeper node, use the distinct port each service publishes on the hosts:

[root@node1 ckconfig]# echo stat|nc 192.168.76.11 9181
ClickHouse Keeper version: v25.11.1.2618-testing-0e8b02fb9a8a99a7e88641ff1254673a44a78f72
Clients:
 10.0.0.2:53598(recved=0,sent=0)

Latency min/avg/max: 0/0/0
Received: 0
Sent: 0
Connections: 0
Outstanding: 0
Zxid: 0x
Mode: follower
Node count: 4
[root@node1 ckconfig]# echo stat|nc 192.168.76.11 9182
ClickHouse Keeper version: v25.11.1.2618-testing-0e8b02fb9a8a99a7e88641ff1254673a44a78f72
Clients:
 10.0.0.2:33330(recved=0,sent=0)

Latency min/avg/max: 0/0/0
Received: 0
Sent: 0
Connections: 0
Outstanding: 0
Zxid: 0x
Mode: leader
Node count: 4
[root@node1 ckconfig]# echo stat|nc 192.168.76.11 9183
ClickHouse Keeper version: v25.11.1.2618-testing-0e8b02fb9a8a99a7e88641ff1254673a44a78f72
Clients:
 10.0.0.2:37114(recved=0,sent=0)

Latency min/avg/max: 0/0/0
Received: 0
Sent: 0
Connections: 0
Outstanding: 0
Zxid: 0x
Mode: follower
Node count: 4

II. ClickHouse Cluster Deployment

1. Downloading the ClickHouse Server image
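The server image is fetched the same way as the keeper image; again, the mirror host is only an example:

# pull the server image through a mirror registry (example host, adjust to your mirror) and retag it
docker pull docker.m.daocloud.io/clickhouse/clickhouse-server:head-alpine
docker tag docker.m.daocloud.io/clickhouse/clickhouse-server:head-alpine clickhouse/clickhouse-server:head-alpine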

2. Server configuration files

(1) Copying out the configuration files

[root@node1 ckconfig]# docker run -it --name ckserver -v /root/share/ckconfig:/opt:rw clickhouse/clickhouse-server:head-alpine bash
0d2fbbd44e03:/# cd /etc/clickhouse-server/
0d2fbbd44e03:/etc/clickhouse-server# cp config.xml /opt/server_config.xml
0d2fbbd44e03:/etc/clickhouse-server# cp users.xml /opt/server_users.xml
0d2fbbd44e03:/etc/clickhouse-server# exit

(2) Modifying the server configuration file config.xml

① Listen ports

The HTTP port is the query interface used from outside the cluster; browser access to the web UI and DBeaver connections both go through it:

<http_port>8123</http_port>

The port used for communication between the cluster nodes:

<tcp_port>9000</tcp_port>

Several other ports are set by default; there is no need to change them, just make sure the ports you configure and map in Docker later do not clash with them.

② Listen address
    <listen_host>0.0.0.0</listen_host>

This enables listening on 0.0.0.0 (IPv4) for external connections. It is better not to enable the :: (IPv6) entry as well, otherwise clients may end up connecting over IPv6, which makes later debugging harder to reason about.

③ Keeper addresses
    <zookeeper>
        <node>
            <host>keeper1</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper2</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper3</host>
            <port>9181</port>
        </node>
    </zookeeper>

These are the keeper service names and listen port we configured earlier. Note that both the keeper stack and the server stack join the overlay network created earlier for Kafka, so they share one internal network; the servers connect to port 9181 directly, not to the externally mapped ports.
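Once a server is up, you can confirm it actually reaches the keeper quorum by querying the system.zookeeper table (ClickHouse exposes Keeper through the same interface). A sketch, run inside any ck container with the credentials configured in this article:

# inside a clickhouse-server container: list the znodes under the keeper root
clickhouse-client --user default --password 123456 \
    --query "SELECT name FROM system.zookeeper WHERE path = '/'"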

Also, do not place a keeper and a Kafka controller on the same node; in our tests the two Raft services clashed. One might hope that ClickHouse server could simply reuse Kafka's KRaft controller and skip deploying keeper altogether, but they are independent Raft implementations, so keeper still has to be deployed; we did not experiment further with co-locating them.

④ remote_servers list
    <remote_servers>
        <pighome>
            <shard>
                <replica>
                    <host>ck1</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>123456</password>
                </replica>
                <replica>
                    <host>ck2</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>123456</password>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>ck3</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>123456</password>
                </replica>
                <replica>
                    <host>ck4</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>123456</password>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>ck5</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>123456</password>
                </replica>
                <replica>
                    <host>ck6</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>123456</password>
                </replica>
            </shard>
        </pighome>
    </remote_servers>

We deployed three shards across six nodes, each shard with one primary and one backup replica; all of them must be listed in the remote_servers section.

The related settings are:

  • **Cluster name:** the element we named <pighome> inside remote_servers is the cluster name; it is referenced later when creating distributed tables
  • **internal_replication:** with false the Distributed table writes to every replica itself; with true only one copy is written and the replicated tables handle synchronization
  • **weight:** the relative weight that controls how much data each shard receives
  • **Host name:** the service name from the swarm YAML file
  • **Port:** must match the ClickHouse inter-node TCP port, i.e. 9000
  • **User name:** the database login user (we use the simple user/password approach here, which also spares us from configuring and mapping users.xml)
  • **Password:** the password for that user

⑤ Shard macros
    <macros>
        <shard>ck1ck4</shard>
        <replica>ck1</replica>
    </macros>

The macros determine the role each node plays in the ClickHouse cluster: the shard and replica names, and the way they map onto each other, define how the cluster is split into shards and which nodes form the primary/backup pair of each shard.

Of all the settings above, macros is the only one whose values differ from node to node. With six nodes split into three shards of one primary and one backup each, the macros per node are as follows:

| No. | Service | shard  | replica |
|-----|---------|--------|---------|
| 1   | ck1     | ck1ck4 | ck1     |
| 2   | ck2     | ck2ck5 | ck2     |
| 3   | ck3     | ck3ck6 | ck3     |
| 4   | ck4     | ck1ck4 | ck4     |
| 5   | ck5     | ck2ck5 | ck5     |
| 6   | ck6     | ck3ck6 | ck6     |

The shard and replica names can be chosen freely, as long as they remain unique and the shard/replica pairing stays consistent.
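To verify that each node picked up its own macros, query the system.macros table on that node. A sketch using the credentials configured in this article:

# inside a given ck container: show the macros it loaded from config.xml
clickhouse-client --user default --password 123456 \
    --query "SELECT * FROM system.macros"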

⑥ internal_replication
<internal_replication>false</internal_replication>

When <internal_replication>false</internal_replication> is set, a Distributed table in the ClickHouse cluster writes incoming data to all replicas itself, without guaranteeing consistency between them.

Specifically:

  1. Write behaviour: with false, the Distributed table writes the data to every replica of the shard.

  2. Data consistency: the data reaches every replica, but since no consistency is enforced, the replicas may drift apart over time.

  3. Typical use: this setting is normally used when the underlying tables are non-replicated tables.

  4. Difference from replicated tables: if the underlying tables are replicated tables, it should be set to true; the Distributed table then writes only one copy per shard and the replicated tables synchronize the replicas themselves.

  5. Synchronization: with true, replica synchronization is handled by the replicated tables through ZooKeeper/Keeper; with false, the Distributed table simply writes to every replica and nothing reconciles them afterwards.

⑦ Shard weight
<weight>1</weight>

Setting it to 1 is fine in most cases.

3. Deploying with Portainer

(1) Example YAML file

version: '3.8'

services:
  keeper1:
    hostname: keeper1
    image: clickhouse/clickhouse-keeper:head-alpine
    ports:
      - "9181:9181"
    volumes:
      - keeper_data1:/var/lib/clickhouse-keeper
      - /root/share/ckconfig/clickhouse-keeper.xml:/etc/clickhouse-keeper/keeper_config.xml:ro
      - /root/share/cklog:/var/log:rw
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 1
    networks:
      - cknet

  keeper2:
    hostname: keeper2
    image: clickhouse/clickhouse-keeper:head-alpine
    ports:
      - "9182:9181"
    volumes:
      - keeper_data2:/var/lib/clickhouse-keeper
      - /root/share/ckconfig/clickhouse-keeper.xml:/etc/clickhouse-keeper/keeper_config.xml:ro
      - /root/share/cklog:/var/log:rw
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 2
    networks:
      - cknet

  keeper3:
    hostname: keeper3
    image: clickhouse/clickhouse-keeper:head-alpine
    ports:
      - "9183:9181"
    volumes:
      - keeper_data3:/var/lib/clickhouse-keeper
      - /root/share/ckconfig/clickhouse-keeper.xml:/etc/clickhouse-keeper/keeper_config.xml:ro
      - /root/share/cklog:/var/log:rw
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 3    
    networks:
      - cknet

  ck1:
    hostname: ck1
    image: clickhouse/clickhouse-server:head-alpine
    ports:
      - "8123:8123"
      - "9000:9000"
    environment:
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=123456
    volumes:
      - ck_data1:/var/lib/clickhouse
      - /root/share/ckconfig/server_config.xml:/etc/clickhouse-server/config.xml:ro
      - /root/share/ckconfig/logs:/var/log/clickhouse-server:rw
    depends_on:
      - keeper1
      - keeper2
      - keeper3
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 1
    networks:
      - cknet

  ck2:
    hostname: ck2
    image: clickhouse/clickhouse-server:head-alpine
    ports:
      - "8223:8123"
      - "9100:9000"
    environment:
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=123456
    volumes:
      - ck_data2:/var/lib/clickhouse
      - /root/share/ckconfig/server_config.xml:/etc/clickhouse-server/config.xml:ro
      - /root/share/ckconfig/logs:/var/log/clickhouse-server:rw
    depends_on:
      - keeper1
      - keeper2
      - keeper3
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 2
    networks:
      - cknet

  ck3:
    hostname: ck3
    image: clickhouse/clickhouse-server:head-alpine
    ports:
      - "8323:8123"
      - "9200:9000"
    environment:
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=123456
    volumes:
      - ck_data3:/var/lib/clickhouse
      - /root/share/ckconfig/server_config.xml:/etc/clickhouse-server/config.xml:ro
      - /root/share/ckconfig/logs:/var/log/clickhouse-server:rw
    depends_on:
      - keeper1
      - keeper2
      - keeper3    
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 3
    networks:
      - cknet

  ck4:
    hostname: ck4
    image: clickhouse/clickhouse-server:head-alpine
    ports:
      - "8423:8123"
      - "9300:9000"
    environment:
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=123456
    volumes:
      - ck_data4:/var/lib/clickhouse
      - /root/share/ckconfig/server_config.xml:/etc/clickhouse-server/config.xml:ro
      - /root/share/ckconfig/logs:/var/log/clickhouse-server:rw
    depends_on:
      - keeper1
      - keeper2
      - keeper3
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 4
    networks:
      - cknet

  ck5:
    hostname: ck5
    image: clickhouse/clickhouse-server:head-alpine
    ports:
      - "8523:8123"
      - "9400:9000"
    environment:
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=123456
    volumes:
      - ck_data5:/var/lib/clickhouse
      - /root/share/ckconfig/server_config.xml:/etc/clickhouse-server/config.xml:ro
      - /root/share/ckconfig/logs:/var/log/clickhouse-server:rw
    depends_on:
      - keeper1
      - keeper2
      - keeper3
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 5
    networks:
      - cknet

  ck6:
    hostname: ck6
    image: clickhouse/clickhouse-server:head-alpine
    ports:
      - "8623:8123"
      - "9500:9000"
    environment:
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=123456
    volumes:
      - ck_data6:/var/lib/clickhouse
      - /root/share/ckconfig/server_config.xml:/etc/clickhouse-server/config.xml:ro
      - /root/share/ckconfig/logs:/var/log/clickhouse-server:rw
    depends_on:
      - keeper1
      - keeper2
      - keeper3
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.Labels.sn == 6
    networks:
      - cknet

volumes:
  ck_data1:
    driver: local
  ck_data2:
    driver: local
  ck_data3:
    driver: local
  ck_data4:
    driver: local
  ck_data5:
    driver: local
  ck_data6:
    driver: local
  keeper_data1:
    driver: local
  keeper_data2:
    driver: local
  keeper_data3:
    driver: local

networks:
  cknet:
    driver: overlay
    attachable: true

(2) Parameter notes

The environment section sets the user name and default password the server starts with, i.e. the ones we put in the remote_servers section of config.xml.

Port 9000 probably does not need to be published externally; we did not bother removing the mapping, so it stays.

The important part is mapping each node's modified config.xml into the container as /etc/clickhouse-server/config.xml.

And because the data should stay on its own node to avoid unnecessary migration, the services are again pinned with the node.Labels.sn constraint.

(3) A few remarks

Setting up the swarm version of ClickHouse always feels a bit like black magic. Every node needs its own edited configuration file, the slightest mistake can break things, and the logs are not always easy to interpret, so you end up re-checking the configuration and tweaking this and that until it suddenly works again. Below are a few settings that changed between the non-working and working states; which one actually made the difference, I honestly cannot say.

① hostname

In a stack the service name normally doubles as a hostname, and you can indeed ping the service name from inside a container, so strictly speaking hostname should not be mandatory. I am not sure whether leaving it unset affects swarm's load balancing in any way; since it costs nothing, you might as well set it.

② depends_on

The servers depend on keeper at startup, so this probably does need to be set. Its effect may not be decisive, though: the cluster has also come up without it, but that may simply be because keeper happened to start before the servers.

③ CLICKHOUSE_USER

CLICKHOUSE_USER is set to default. Other user names such as "root" should also work, as long as they match the remote_servers entries in config.xml; whether a separate users.xml then needs to be configured to match, I have not checked.

None of this would be hard to test, but after all the back and forth I am burned out; I will leave it for the next time things get weird.

4. Checking the ClickHouse deployment

(1) Access from the web

From outside the cluster, open port 8123 on any cluster node in a browser:

Click the Web SQL UI:

Log in with the configured user name and password and run the SELECT statement:

SELECT * FROM system.clusters WHERE cluster = 'pighome';

The shards shown match what we put in the configuration file, so the cluster started successfully.
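The same check works from the command line over the HTTP interface. A sketch using the node IP and credentials from this article:

# query the cluster layout over HTTP (any node, port 8123)
curl 'http://192.168.76.11:8123/' --user default:123456 \
    --data-binary "SELECT host_name, shard_num, replica_num FROM system.clusters WHERE cluster = 'pighome'"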

(2) Access from DBeaver

The result is the same:

III. Configuring the Kafka-to-ClickHouse Connection

ClickHouse can connect to Kafka natively through its Kafka table engine. By creating a Kafka engine table, wiring it via a materialized view to a local table, and then querying the distributed table built on that local table, data from Kafka flows into the ClickHouse cluster and becomes queryable.

1. Joining Kafka to the ClickHouse network

In the Kafka stack we tested earlier, change the network settings so that the three Kafka nodes join the ClickHouse network, paying attention to the network name and its stack prefix. And to repeat the earlier warning: do not put Kafka and keeper on the same host, or the two Raft services will clash.
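To double-check that the Kafka containers really joined the ClickHouse overlay network, inspect it on one of the nodes. The network name testck_cknet is an assumption based on the stack name seen in the container names below and must match your actual stack:

# list the containers attached to the overlay network on this node (stack name prefix is assumed)
docker network inspect testck_cknet --format '{{range .Containers}}{{.Name}} {{end}}'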

Once started, the whole swarm cluster looks like this:

2. Connecting Kafka to ClickHouse

(1) Create the Kafka engine table

create table kafkaQueue
( id UInt16,
  line String )
engine = Kafka
settings
	kafka_broker_list   = 'kafka1:9092,kafka2:9092,kafka3:9092',
	kafka_topic_list    = 'testout',
	kafka_group_name    = 'lib',
	kafka_format        = 'CSV',
	kafka_num_consumers = 6,
	kafka_skip_broken_messages = 100;

(2) Create the local table

create table localLines on cluster pighome
( id UInt16,
  line String )
engine = ReplicatedReplacingMergeTree
partition by id
order by line;

(3) Create the distributed table

create table distLines as localLines
engine = Distributed(pighome,default,localLines,rand());

(4) Create the materialized view

create materialized view kafka2table
to distLines as
select * from kafkaQueue;

After that, a SELECT on the distributed table returns whatever data arrives from Kafka; at this point, of course, there is nothing yet:

3. Testing

On node4, enter the Kafka container, start a producer, and connect it to the testout topic:

[root@node4 ckconfig]# docker container ls
CONTAINER ID   IMAGE                                      COMMAND                   CREATED          STATUS          PORTS                          NAMES
511800aaaa1c   bitnami/kafka:latest                       "/opt/bitnami/script..."   37 seconds ago   Up 34 seconds   9092/tcp                       kafka_kafka1.1.gr5dc19ocrflakv35lu0xukzd
c85c8ed53fcb   clickhouse/clickhouse-server:head-alpine   "/entrypoint.sh"          3 hours ago      Up 3 hours      8123/tcp, 9000/tcp, 9009/tcp   testck_ck4.1.ghx7hjzwi0mpcm9mpc33ypcwz
[root@node4 ckconfig]# docker exec -it 5118 bash
I have no name!@511800aaaa1c:/$ kafka-topics.sh --list --bootstrap-server 'kafka1:9092,kafka2:9092,kafka3:9092'
__consumer_offsets
I have no name!@511800aaaa1c:/$ kafka-topics.sh --create --topic testout --bootstrap-server 'kafka1:9092,kafka2:9092,kafka3:9092'
Created topic testout.
I have no name!@511800aaaa1c:/$ kafka-console-producer.sh --topic testout --bootstrap-server 'kafka1:9092,kafka2:9092,kafka3:9092'
>1,a
>2,b
>3,c
>1,a
>2,b
>3,c
>1,a
>2,b
>3,c
>

On node5, enter the Kafka container and start a consumer, to confirm that Kafka itself is working:

[root@node5 ckconfig]# docker container ls
CONTAINER ID   IMAGE                                      COMMAND                   CREATED         STATUS         PORTS                          NAMES
814de9f38cca   bitnami/kafka:latest                       "/opt/bitnami/script..."   3 minutes ago   Up 3 minutes   9092/tcp                       kafka_kafka2.1.0wbapmx2bvyyts6qh4k1pj95l
6e37c6c92293   clickhouse/clickhouse-server:head-alpine   "/entrypoint.sh"          3 hours ago     Up 3 hours     8123/tcp, 9000/tcp, 9009/tcp   testck_ck5.1.y8ypybs3uuyotangvn65hgi28
[root@node5 ckconfig]# docker exec -it 814d bash
I have no name!@814de9f38cca:/$ kafka-console-consumer.sh --topic testout --from-beginning --bootstrap-server 'kafka1:9092,kafka2:9092,kafka3:9092'
1,a
2,b
3,c
1,a
2,b
3,c
1,a
2,b
3,c

In DBeaver, query the distributed table: select * from distLines

The result looks like this:
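The same verification can also be done without DBeaver, by running clickhouse-client inside any ck container. A sketch with the credentials used throughout this article, assuming the stack was deployed under the name testck as shown in the container names above:

# count the rows that arrived from Kafka via the materialized view
docker exec -it $(docker ps -q -f name=testck_ck1) clickhouse-client \
    --user default --password 123456 \
    --query "SELECT count() FROM distLines"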
