ClickHouse migration tool: clickhouse-copier

The original cluster had no high availability and was running on low-spec hardware, so we decided to migrate a 3-shard, 1-replica cluster to a higher-spec 1-shard, 2-replica cluster. The data volume was fairly large.

Although clickhouse-copier has officially been marked as deprecated, we still used it for convenience and efficiency, and it worked quite well.

The steps are as follows:

  1. Download and extract clickhouse-copier on the server, then prepare the keeper.xml file. The destination cluster uses clickhouse-keeper; the configuration section is still named zookeeper, but that makes no difference.
```xml
<!-- keeper.xml -->
<clickhouse>
    <logger>
        <level>trace</level>
        <size>100M</size>
        <log>./log/clickhouse-copier/copier/log.log</log>
        <errorlog>./log/clickhouse-copier/copier/log.err.log</errorlog>
        <count>3</count>
    </logger>
    
    <!-- Fill in the IP addresses of your keeper nodes -->
    <zookeeper>
        <node index="1">
            <host>172.23.3.44</host>
            <port>9181</port>
        </node>
        <node index="2">
            <host>172.23.3.55</host>
            <port>9181</port>
        </node>
        <node index="3">
            <host>172.23.3.66</host>
            <port>9181</port>
        </node>
    </zookeeper>

</clickhouse>
```
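Before kicking off a long copy, it is worth confirming that the copier host can actually reach the keeper ensemble. A minimal sketch, assuming clickhouse-keeper has four-letter-word commands enabled (the four_letter_word_white_list setting) and using the hosts and port from keeper.xml above:

```bash
# Prints "imok" when the keeper node is reachable and serving requests
echo ruok | nc 172.23.3.44 9181
# Dumps detailed server metrics for a closer look
echo mntr | nc 172.23.3.44 9181
```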
  2. Prepare the migration task file task_migrate_all.xml. The tables to migrate can go in a single file or be split across several, but every table must exist on every node of the source cluster, or copier will report an error; you can create empty tables on the missing nodes to pass this check (see the sketch after the task file below).
```xml
<clickhouse>
    <remote_servers>
        <!-- Source cluster addresses -->
        <source_cluster>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.11</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.12</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.13</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
        </source_cluster>
        <!-- Destination cluster addresses -->
        <destination_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>172.23.3.44</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
                <replica>
                    <host>172.23.3.55</host> 
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
        </destination_cluster>
    </remote_servers>
    <!-- Number of concurrent copier workers -->
    <max_workers>12</max_workers>
    <!-- Pull from the source cluster in read-only mode -->
    <settings_pull>
        <readonly>1</readonly>
    </settings_pull>

    <!-- Setting used to insert (push) data to destination cluster tables -->
    <settings_push>
        <readonly>0</readonly>
    </settings_push>

    <settings>
        <connect_timeout>3</connect_timeout>
        <!-- Sync insert is set forcibly, leave it here just in case. -->
        <distributed_foreground_insert>1</distributed_foreground_insert>
    </settings>
    <!-- Tables to migrate; multiple table entries can be listed under tables -->
    <tables>
        <table_sms_send>
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test_qwe</database_pull>
            <table_pull>sms_send</table_pull>
    
            <cluster_push>destination_cluster</cluster_push>
            <database_push>test_qwe</database_push>
            <table_push>sms_send</table_push>
            <!-- The destination has multiple replicas, so a Replicated* engine is required; otherwise only a single node would hold the data -->
            <engine>
            ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}')
            PARTITION BY toYYYYMM(create_time)
            ORDER BY (id, phone_number, state, create_time)
            </engine>
            <!-- With a single shard no sharding key is needed, so a constant 1 is used -->
            <sharding_key>1</sharding_key>
        </table_sms_send>
    
        <table_use_car>
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test_qwe</database_pull>
            <table_pull>use_car</table_pull>
    
            <cluster_push>destination_cluster</cluster_push>
            <database_push>test_qwe</database_push>
            <table_push>use_car</table_push>
    
            <engine>
            ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}', add_time)
            ORDER BY (id, detail_id)
            </engine>
            <sharding_key>1</sharding_key>
        </table_use_car>
    </tables>
 
</clickhouse>
```
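If a table is missing on some source node, one way to pass copier's existence check is to create an empty table with the same structure there. A minimal sketch using sms_send as the example; the column list below is hypothetical, so take the real DDL from SHOW CREATE TABLE on a node that has the table:

```bash
# On the source node that lacks the table, create an empty shell so
# copier's per-node existence check passes; no data needs to be inserted.
clickhouse-client --host 172.23.3.13 --user user_name --password pwd --query "
CREATE TABLE IF NOT EXISTS test_qwe.sms_send
(
    id UInt64,               -- hypothetical columns: copy the real
    phone_number String,     -- schema from SHOW CREATE TABLE instead
    state UInt8,
    create_time DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(create_time)
ORDER BY (id, phone_number, state, create_time)"
```

Note that the engine definitions above rely on the {cluster}, {shard} and {replica} macros, which must be defined on every destination node. A quick check, assuming the default port and the credentials above:

```bash
# Each destination node should list its cluster/shard/replica substitutions
clickhouse-client --host 172.23.3.44 --user user_name --password pwd --query "SELECT * FROM system.macros"
```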
  3. Run the following command from the clickhouse-copier extraction directory to start the migration.
```bash
# --task-path is the path in keeper where the migration task state is stored
./clickhouse-copier --config keeper.xml --task-path /clickhouse/copier_task/mig_test --task-file=task_migrate_all.xml
```
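While the task runs, progress can be followed through the task node in keeper, and the result verified afterwards by comparing row counts. A minimal sketch, assuming the task path above and that the destination node can reach the source hosts via the remote() table function with the same credentials:

```bash
# Peek at the state copier records under the task path in keeper
clickhouse-client --host 172.23.3.44 --user user_name --password pwd --query "
SELECT name, value FROM system.zookeeper
WHERE path = '/clickhouse/copier_task/mig_test'"

# After the copy finishes, compare source and destination row counts
clickhouse-client --host 172.23.3.44 --user user_name --password pwd --query "
SELECT
    (SELECT count() FROM remote('172.23.3.11,172.23.3.12,172.23.3.13', test_qwe.sms_send, 'user_name', 'pwd')) AS source_rows,
    (SELECT count() FROM test_qwe.sms_send) AS dest_rows"
```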