ClickHouse migration with clickhouse-copier

The original cluster had no high availability and ran on low-spec hardware, so we planned to migrate a 3-shard, 1-replica cluster to a 1-shard, 2-replica cluster on higher-spec machines. The data volume was fairly large.

Although clickhouse-copier has been officially deprecated, we used it anyway for convenience and efficiency, and it worked well.

The steps are as follows:

  1. Download and extract clickhouse-copier on the server, then prepare a keeper.xml file. The target cluster runs clickhouse-keeper; the config section is still named zookeeper, but that makes no difference.
```xml
<!-- keeper.xml -->
<clickhouse>
    <logger>
        <level>trace</level>
        <size>100M</size>
        <log>./log/clickhouse-copier/copier/log.log</log>
        <errorlog>./log/clickhouse-copier/copier/log.err.log</errorlog>
        <count>3</count>
    </logger>

    <!-- Fill in the addresses of your keeper nodes -->
    <zookeeper>
        <node index="1">
            <host>172.23.3.44</host>
            <port>9181</port>
        </node>
        <node index="2">
            <host>172.23.3.55</host>
            <port>9181</port>
        </node>
        <node index="3">
            <host>172.23.3.66</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>
```
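Before starting, it can be worth confirming that every keeper node in the config is reachable. A minimal sketch, assuming ClickHouse Keeper's four-letter-word commands are enabled (`ruok` is in the default whitelist) and `nc` is installed:

```shell
# Probe each keeper node from keeper.xml; a healthy node answers "imok".
for host in 172.23.3.44 172.23.3.55 172.23.3.66; do
    printf 'checking %s: ' "$host"
    echo "ruok" | nc -w 2 "$host" 9181
    echo
done
```

If a node does not answer, check the `four_letter_word_white_list` setting and firewall rules before running copier, since copier stores all of its task coordination state in keeper.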
  2. Prepare the migration task file task_migrate_all.xml. The tables to migrate can all go in one file or be split across several. Every table being migrated must exist on every node of the source cluster; you can create empty tables to pass this check, otherwise copier reports an error.
```xml
<clickhouse>
    <remote_servers>
        <!-- Source cluster addresses -->
        <source_cluster>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.11</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.12</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.13</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
        </source_cluster>
        <!-- Destination cluster addresses -->
        <destination_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>172.23.3.44</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
                <replica>
                    <host>172.23.3.55</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
        </destination_cluster>
    </remote_servers>

    <!-- Number of concurrent worker processes -->
    <max_workers>12</max_workers>

    <!-- Read-only access to the source cluster -->
    <settings_pull>
        <readonly>1</readonly>
    </settings_pull>

    <!-- Settings used to insert (push) data to destination cluster tables -->
    <settings_push>
        <readonly>0</readonly>
    </settings_push>

    <settings>
        <connect_timeout>3</connect_timeout>
        <!-- Sync insert is set forcibly, leave it here just in case. -->
        <distributed_foreground_insert>1</distributed_foreground_insert>
    </settings>

    <!-- Tables to migrate; multiple table entries may be listed under <tables> -->
    <tables>
        <table_sms_send>
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test_qwe</database_pull>
            <table_pull>sms_send</table_pull>

            <cluster_push>destination_cluster</cluster_push>
            <database_push>test_qwe</database_push>
            <table_push>sms_send</table_push>

            <!-- The destination has multiple replicas, so a Replicated* engine is
                 required; otherwise only a single node would hold the data -->
            <engine>
            ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}')
            PARTITION BY toYYYYMM(create_time)
            ORDER BY (id, phone_number, state, create_time)
            </engine>
            <!-- A single shard needs no sharding key, so use the constant 1 -->
            <sharding_key>1</sharding_key>
        </table_sms_send>

        <table_use_car>
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test_qwe</database_pull>
            <table_pull>use_car</table_pull>

            <cluster_push>destination_cluster</cluster_push>
            <database_push>test_qwe</database_push>
            <table_push>use_car</table_push>

            <engine>
            ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}', add_time)
            ORDER BY (id, detail_id)
            </engine>
            <sharding_key>1</sharding_key>
        </table_use_car>
    </tables>

</clickhouse>
```
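Once the migration completes, it is worth comparing row counts between the two clusters. A rough check, assuming `clickhouse-client` is available and using the hosts and credentials from the task file above, e.g. for `sms_send`:

```shell
# Sum the rows across the three source shards
for host in 172.23.3.11 172.23.3.12 172.23.3.13; do
    clickhouse-client --host "$host" --user user_name --password pwd \
        --query "SELECT count() FROM test_qwe.sms_send"
done

# Count on one destination replica (both replicas hold the same data)
clickhouse-client --host 172.23.3.44 --user user_name --password pwd \
    --query "SELECT count() FROM test_qwe.sms_send"
```

Note that with ReplacingMergeTree, duplicates are collapsed only at merge time, so counts on either side can differ slightly until background merges catch up; `SELECT count() FROM ... FINAL` gives a deduplicated figure if an exact comparison is needed.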
  3. From the directory where clickhouse-copier was extracted, run the following command to start the migration.
```bash
# --task-path is the keeper path under which the migration task state is stored
./clickhouse-copier --config keeper.xml --task-path /clickhouse/copier_task/mig_test --task-file=task_migrate_all.xml
```
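For a large migration, one option is to run copier in the background and follow its log; the log directory below matches the `<logger>` section of keeper.xml and must exist before launch (copier is assumed to be in the current directory, as in the command above):

```shell
# Create the log directory configured in keeper.xml, then detach copier
mkdir -p ./log/clickhouse-copier/copier
nohup ./clickhouse-copier --config keeper.xml \
    --task-path /clickhouse/copier_task/mig_test \
    --task-file=task_migrate_all.xml > copier.out 2>&1 &

# Watch progress
tail -f ./log/clickhouse-copier/copier/log.log
```

Because copier coordinates through keeper, the same command can be launched on several machines with the same `--task-path`; up to `max_workers` (12 in the task file above) processes will share the work.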