ClickHouse migration with clickhouse-copier

The original cluster had no high availability and ran on low-spec hardware, so we planned to migrate a 3-shard, 1-replica cluster to a 1-shard, 2-replica cluster on higher-spec machines. The data volume was fairly large.

Although clickhouse-copier has been officially deprecated, we used it anyway for convenience and efficiency, and it worked well.

The steps are as follows:

  1. Download and extract clickhouse-copier on the server, then prepare a keeper.xml file. The target cluster runs clickhouse-keeper; the config section is still named zookeeper, but that makes no difference.
```xml
<!-- keeper.xml -->
<clickhouse>
    <logger>
        <level>trace</level>
        <size>100M</size>
        <log>./log/clickhouse-copier/copier/log.log</log>
        <errorlog>./log/clickhouse-copier/copier/log.err.log</errorlog>
        <count>3</count>
    </logger>

    <!-- Fill in the addresses of your keeper nodes -->
    <zookeeper>
        <node index="1">
            <host>172.23.3.44</host>
            <port>9181</port>
        </node>
        <node index="2">
            <host>172.23.3.55</host>
            <port>9181</port>
        </node>
        <node index="3">
            <host>172.23.3.66</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>
```
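Before starting, it can be worth confirming that every keeper node in the config is reachable. A minimal sketch, assuming ClickHouse Keeper's four-letter-word commands are enabled (`ruok` is in the default whitelist) and `nc` is installed:

```shell
# Probe each keeper node from keeper.xml; a healthy node answers "imok".
for host in 172.23.3.44 172.23.3.55 172.23.3.66; do
    printf 'checking %s: ' "$host"
    echo "ruok" | nc -w 2 "$host" 9181
    echo
done
```

If a node does not answer, check the `four_letter_word_white_list` setting and firewall rules before running copier, since copier stores all of its task coordination state in keeper.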
  2. Prepare the migration task file task_migrate_all.xml. The tables to migrate can all go in one file or be split across several. Every table being migrated must exist on every node of the source cluster; you can create empty tables to pass this check, otherwise copier reports an error.
```xml
<clickhouse>
    <remote_servers>
        <!-- Source cluster addresses -->
        <source_cluster>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.11</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.12</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>172.23.3.13</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
        </source_cluster>
        <!-- Destination cluster addresses -->
        <destination_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>172.23.3.44</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
                <replica>
                    <host>172.23.3.55</host>
                    <port>9000</port>
                    <user>user_name</user>
                    <password>pwd</password>
                </replica>
            </shard>
        </destination_cluster>
    </remote_servers>

    <!-- Number of concurrent worker processes -->
    <max_workers>12</max_workers>

    <!-- Read-only access to the source cluster -->
    <settings_pull>
        <readonly>1</readonly>
    </settings_pull>

    <!-- Settings used to insert (push) data to destination cluster tables -->
    <settings_push>
        <readonly>0</readonly>
    </settings_push>

    <settings>
        <connect_timeout>3</connect_timeout>
        <!-- Sync insert is set forcibly, leave it here just in case. -->
        <distributed_foreground_insert>1</distributed_foreground_insert>
    </settings>

    <!-- Tables to migrate; multiple table entries may be listed under <tables> -->
    <tables>
        <table_sms_send>
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test_qwe</database_pull>
            <table_pull>sms_send</table_pull>

            <cluster_push>destination_cluster</cluster_push>
            <database_push>test_qwe</database_push>
            <table_push>sms_send</table_push>

            <!-- The destination has multiple replicas, so a Replicated* engine is
                 required; otherwise only a single node would hold the data -->
            <engine>
            ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}')
            PARTITION BY toYYYYMM(create_time)
            ORDER BY (id, phone_number, state, create_time)
            </engine>
            <!-- A single shard needs no sharding key, so use the constant 1 -->
            <sharding_key>1</sharding_key>
        </table_sms_send>

        <table_use_car>
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test_qwe</database_pull>
            <table_pull>use_car</table_pull>

            <cluster_push>destination_cluster</cluster_push>
            <database_push>test_qwe</database_push>
            <table_push>use_car</table_push>

            <engine>
            ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}', add_time)
            ORDER BY (id, detail_id)
            </engine>
            <sharding_key>1</sharding_key>
        </table_use_car>
    </tables>

</clickhouse>
```
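Once the migration completes, it is worth comparing row counts between the two clusters. A rough check, assuming `clickhouse-client` is available and using the hosts and credentials from the task file above, e.g. for `sms_send`:

```shell
# Sum the rows across the three source shards
for host in 172.23.3.11 172.23.3.12 172.23.3.13; do
    clickhouse-client --host "$host" --user user_name --password pwd \
        --query "SELECT count() FROM test_qwe.sms_send"
done

# Count on one destination replica (both replicas hold the same data)
clickhouse-client --host 172.23.3.44 --user user_name --password pwd \
    --query "SELECT count() FROM test_qwe.sms_send"
```

Note that with ReplacingMergeTree, duplicates are collapsed only at merge time, so counts on either side can differ slightly until background merges catch up; `SELECT count() FROM ... FINAL` gives a deduplicated figure if an exact comparison is needed.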
  3. From the directory where clickhouse-copier was extracted, run the following command to start the migration.
```bash
# --task-path is the keeper path under which the migration task state is stored
./clickhouse-copier --config keeper.xml --task-path /clickhouse/copier_task/mig_test --task-file=task_migrate_all.xml
```
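For a large migration, one option is to run copier in the background and follow its log; the log directory below matches the `<logger>` section of keeper.xml and must exist before launch (copier is assumed to be in the current directory, as in the command above):

```shell
# Create the log directory configured in keeper.xml, then detach copier
mkdir -p ./log/clickhouse-copier/copier
nohup ./clickhouse-copier --config keeper.xml \
    --task-path /clickhouse/copier_task/mig_test \
    --task-file=task_migrate_all.xml > copier.out 2>&1 &

# Watch progress
tail -f ./log/clickhouse-copier/copier/log.log
```

Because copier coordinates through keeper, the same command can be launched on several machines with the same `--task-path`; up to `max_workers` (12 in the task file above) processes will share the work.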