clickhouse-copier: a ClickHouse migration tool

The original cluster had no high availability and ran on low-spec hardware, so we planned to migrate a 3-shard, 1-replica cluster to a higher-spec 1-shard, 2-replica cluster. The data volume was fairly large.

Although clickhouse-copier has officially been marked as deprecated, we still used it for convenience and efficiency, and it worked quite well.

The steps are as follows:

  1. Download and extract clickhouse-copier on the server, then prepare the keeper.xml file. The destination cluster runs clickhouse-keeper; the config section is still named zookeeper, but that makes no difference.
xml
<!-- keeper.xml -->
<clickhouse>
    <logger>
        <level>trace</level>
        <size>100M</size>
        <log>./log/clickhouse-copier/copier/log.log</log>
        <errorlog>./log/clickhouse-copier/copier/log.err.log</errorlog>
        <count>3</count>
    </logger>
    
    <!-- Fill in the addresses of your Keeper nodes -->
    <zookeeper>
        <node index="1">
            <host>172.23.3.44</host>
            <port>9181</port>
        </node>
        <node index="2">
            <host>172.23.3.55</host>
            <port>9181</port>
        </node>
        <node index="3">
            <host>172.23.3.66</host>
            <port>9181</port>
        </node>
    </zookeeper>

</clickhouse>
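
Before preparing the task file, it is worth checking that every Keeper node is reachable from the machine that will run the copier. A minimal sketch using clickhouse-keeper's four-letter-word commands on the client port (ruok is in the default whitelist; nc is assumed to be installed, and the IPs are the ones from the config above):
bash
# Each healthy Keeper node answers "imok" to the ruok probe
for host in 172.23.3.44 172.23.3.55 172.23.3.66; do
    printf '%s: ' "$host"
    echo ruok | nc -w 2 "$host" 9181
    echo
done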
  2. Prepare the migration task file task_migrate_all.xml. The tables to migrate can all go into one file or be split across several, but every table must exist on all nodes of the source cluster, otherwise the copier reports an error; creating empty tables is enough to pass the check (a sketch follows the config below).
xml
<clickhouse>
    <remote_servers>
        <!-- Source cluster addresses -->
        <source_cluster>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.11</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.12</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.13</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
        </source_cluster>
        <!-- Destination cluster addresses -->
        <destination_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>172.23.3.44</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
                <replica>
                    <host>172.23.3.55</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
        </destination_cluster>
    </remote_servers>
    <!-- Number of concurrent copier worker threads -->
    <max_workers>12</max_workers>
    <!-- Read-only settings for pulling from the source cluster -->
    <settings_pull>
        <readonly>1</readonly>
    </settings_pull>

    <!-- Setting used to insert (push) data to destination cluster tables -->
    <settings_push>
        <readonly>0</readonly>
    </settings_push>

    <settings>
        <connect_timeout>3</connect_timeout>
        <!-- Sync insert is set forcibly, leave it here just in case. -->
        <distributed_foreground_insert>1</distributed_foreground_insert>
    </settings>
    <!-- Tables to migrate; multiple tables may be listed inside <tables> -->
    <tables>
        <table_sms_send>
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test_qwe</database_pull>
            <table_pull>sms_send</table_pull>
    
            <cluster_push>destination_cluster</cluster_push>
            <database_push>test_qwe</database_push>
            <table_push>sms_send</table_push>
            <!-- With multiple replicas a Replicated* engine is required; otherwise only a single node would hold the data -->
            <engine>
            ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}')
            PARTITION BY toYYYYMM(create_time)
            ORDER BY (id, phone_number, state, create_time)
            </engine>
            <!-- A single shard needs no sharding key, so a constant 1 is used -->
            <sharding_key>1</sharding_key>
        </table_sms_send>
    
        <table_use_car>
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test_qwe</database_pull>
            <table_pull>use_car</table_pull>
    
            <cluster_push>destination_cluster</cluster_push>
            <database_push>test_qwe</database_push>
            <table_push>use_car</table_push>
    
            <engine>
            ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}', add_time)
            ORDER BY (id, detail_id)
            </engine>
            <sharding_key>1</sharding_key>
        </table_use_car>
    </tables>
 
</clickhouse>
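
On the note in step 2 about empty tables: if a table is missing on some source node, the copier's check can be satisfied by creating an empty table with the same structure there. A hypothetical sketch for sms_send; the column list below is illustrative only (derived from the PARTITION BY/ORDER BY keys in the task file), so copy the real DDL from SHOW CREATE TABLE on a node that actually has the table:
bash
# Run against each source node that is missing the table;
# adjust the column definitions to match your real schema
clickhouse-client --host 172.23.3.12 --user user_name --password pwd --query "
CREATE TABLE IF NOT EXISTS test_qwe.sms_send
(
    id UInt64,
    phone_number String,
    state UInt8,
    create_time DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(create_time)
ORDER BY (id, phone_number, state, create_time)"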
  3. Run the command below from the clickhouse-copier directory to start the migration.
bash
# --task-path is the Keeper path under which the migration task state is stored
./clickhouse-copier --config keeper.xml --task-path /clickhouse/copier_task/mig_test --task-file=task_migrate_all.xml
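
Once the copier exits, a quick sanity check is to compare row counts between the two clusters. A sketch using the hosts and credentials from the task file above (with a ReplacingMergeTree destination the counts may differ slightly until background merges deduplicate rows):
bash
# Source total: sum the local counts of the three shards
src=0
for host in 172.23.3.11 172.23.3.12 172.23.3.13; do
    n=$(clickhouse-client --host "$host" --user user_name --password pwd \
        --query "SELECT count() FROM test_qwe.sms_send")
    src=$((src + n))
done

# Destination: one replica is enough, replication keeps them in sync
dst=$(clickhouse-client --host 172.23.3.44 --user user_name --password pwd \
    --query "SELECT count() FROM test_qwe.sms_send")

echo "source=$src destination=$dst"

While the copy is still running, progress can also be inspected from the destination cluster through the system.zookeeper table, e.g. SELECT name FROM system.zookeeper WHERE path = '/clickhouse/copier_task/mig_test'.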