How to Replace an OCP Node (Part 2): Using the antman Script | OceanBase in Practice

Preface:

OceanBase Cloud Platform (OCP) is the dedicated enterprise-grade management platform for the OceanBase database.

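In production, installing OCP is usually the first step: the OCP platform is set up first, and the production clusters are then created, managed and monitored through it. Later on, however, a data-center reorganization or other requirements may make it necessary to migrate or replace OCP servers.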

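The previous article described how to replace OCP with the OAT platform; this article describes how to replace an OCP server with the antman script. (Note: in this environment OCP's load balancing is handled by F5, so the new machine has to be added to the F5 configuration first. Other load-balancing setups work the same way.)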

Background:

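Anyone with experience of OceanBase production environments may know that in early versions, installing OCP required three Docker packages: the OCP software, the MetaDB and OBProxy. Later OCP versions bundle the DB and the proxy into a single Docker package. OAT can only take over a MetaDB in which the DB and proxy are bundled; when they are deployed separately, the antman script is still needed for the replacement.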

> This article focuses on replacement with antman. The software versions in my environment are:

1. OCP software: ocp-all-in-one:3.3.3-20220906114643

2. MetaDB: OB2276_x86_20210409

3. OBProxy: OBP186_20210315

4. antman: t-oceanbase-antman-1.4.3-20220807073355.alios7.x86_64

Procedure:

(1) Environment check and preparation

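  • Prepare the replacement machine (disk partitioning, creating the admin user, installing Docker, etc.), then verify it with antman's precheck script: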

    cd /root/t-oceanbase-antman/clonescripts/
    sh precheck.sh -m ocp

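  • Log in to the MetaDB and check whether any tenant has the zone of the node being replaced (META_OB_ZONE_1, which holds 10.10.100.9) as its top-priority primary zone. If so, move the primary zone (and therefore the leaders) away in advance: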

    MySQL [oceanbase]> select * from __all_Server;
    +----------------------------+----------------------------+--------------+----------+----+----------------+------------+-----------------+--------+-----------------------+--------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+-------------------+
    | gmt_create | gmt_modified | svr_ip | svr_port | id | zone | inner_port | with_rootserver | status | block_migrate_in_time | build_version | stop_time | start_service_time | first_sessid | with_partition | last_offline_time |
    +----------------------------+----------------------------+--------------+----------+----+----------------+------------+-----------------+--------+-----------------------+--------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+-------------------+
    | 2022-03-17 22:59:19.979627 | 2023-03-20 10:27:01.147283 | 10.10.100.87 | 2882 | 6 | META_OB_ZONE_2 | 2881 | 0 | active | 0 | 2.2.76_20210406232249-a1e144bdc179fbf473cea37f199e8a76c736b8d4(Apr 6 2021 23:55:12) | 0 | 1679279220991796 | 0 | 1 | 1679278517144838 |
    | 2022-03-17 23:54:49.277939 | 2023-03-20 09:29:22.079578 | 10.10.100.9 | 2882 | 7 | META_OB_ZONE_1 | 2881 | 1 | active | 0 | 2.2.76_20210406232249-a1e144bdc179fbf473cea37f199e8a76c736b8d4(Apr 6 2021 23:55:12) | 0 | 1679275725595691 | 0 | 1 | 0 |
    | 2021-12-21 22:44:16.476503 | 2023-03-20 09:29:22.080425 | 122.44.11.2 | 2882 | 5 | META_OB_ZONE_3 | 2881 | 0 | active | 0 | 2.2.76_20210406232249-a1e144bdc179fbf473cea37f199e8a76c736b8d4(Apr 6 2021 23:55:12) | 0 | 1640097866698859 | 0 | 1 | 0 |
    +----------------------------+----------------------------+--------------+----------+----+----------------+------------+-----------------+--------+-----------------------+--------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+-------------------+
    MySQL [oceanbase]> select tenant_name,primary_zone from __all_tenant;
    +----------------+----------------------------------------------+
    | tenant_name | primary_zone |
    +----------------+----------------------------------------------+
    | sys | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
    | ocp_meta | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
    | ocp_monitor | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
    | oms_tt_tenant | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
    | oms_cc7_tenant | META_OB_ZONE_3;META_OB_ZONE_2;META_OB_ZONE_1 |
    | oms_ff9_tenant | META_OB_ZONE_1;META_OB_ZONE_2,META_OB_ZONE_3 |
    | oms_cc9_tenant | META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2 |
    | oms_dd_tenant | META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2 |
    | obdw_meta | META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2 |
    +----------------+----------------------------------------------+
    MySQL [oceanbase]> alter tenant sys primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
    Query OK, 0 rows affected (0.04 sec)

    MySQL [oceanbase]> alter tenant ocp_meta primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
    Query OK, 0 rows affected (1.27 sec)

    MySQL [oceanbase]> alter tenant ocp_monitor primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
    Query OK, 0 rows affected (0.02 sec)

    MySQL [oceanbase]> alter tenant oms_tt_tenant primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
    Query OK, 0 rows affected (0.03 sec)

    MySQL [oceanbase]> alter tenant oms_ff9_tenant primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
    Query OK, 0 rows affected (0.03 sec)

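  • Because the migration is driven by antman, obcluster.conf must be prepared on the machine that runs the script: either edit it there, or copy it from an existing OCP server and review it. The Docker image packages also have to be uploaded to /root/t-oceanbase-antman on that machine.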

    55obffocp:~/t-oceanbase-antman # cat obcluster.conf
    ZONE1_RS_IP=10.10.100.9
    ZONE2_RS_IP=10.10.100.87
    ZONE3_RS_IP=122.44.11.2

    ## AUTO-CONFIGURATION (no changes needed)

    OBSERVER01_HOSTNAME=OCP_META_SERVER_1
    OBSERVER02_HOSTNAME=OCP_META_SERVER_2
    OBSERVER03_HOSTNAME=OCP_META_SERVER_3
    ZONE1_NAME=META_OB_ZONE_1   # note: zone names used in later commands must match this file
    ZONE2_NAME=META_OB_ZONE_2
    ZONE3_NAME=META_OB_ZONE_3
    ##there must be more than half zone within same region
    ZONE1_REGION=OCP_META_REGION
    ZONE2_REGION=OCP_META_REGION
    ZONE3_REGION=OCP_META_REGION
    MYSQL_PORT=2881
    RPC_PORT=2882

    OCP_VERSION=3.3.3

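  • On the machine that runs antman, check that the stored default cluster passwords are correct: cd ~/t-oceanbase-antman/tools and run getpass.sh. If they are wrong, fix them with setpass.sh, because the passwords are verified after the OBProxy Docker container is migrated and again before the OCP Docker container is migrated. In the session below, /root/.key already held other passwords, so it was moved aside and the sys password (-s) and ocp_meta password (-c) were set again: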

    55obffocp:~/t-oceanbase-antman/tools # bash setpass.sh -s 0Aa255yK^F
    password file sys in /root/.key already exist!


    Password of root@sys is CqVgg9}Aut
    Password of root@ocp_meta is r6kS^EINTU
    Password of root@ocp_monitor is pkJv1a{7J7
    Password of root@odc is j{fjdd3X9f
    Password of root@oms is {oOIsE9fdQ

    55obffocp:~ # mv .key .key_bak
    55obffocp:~ # cd /root/t-oceanbase-antman/tools/
    55obffocp:~/t-oceanbase-antman/tools # bash setpass.sh -s 0Aa255yK^F


    Password of root@sys is 0Aa255yK^F
    Password of root@ocp_meta is
    Password of root@ocp_monitor is
    Password of root@odc is
    Password of root@oms is
    55obffocp:~/t-oceanbase-antman/tools # bash setpass.sh -c rSf@jO%6EO


    Password of root@sys is 0Aa255yK^F
    Password of root@ocp_meta is rSf@jO%6EO
    Password of root@ocp_monitor is
    Password of root@odc is
    Password of root@oms is
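The primary-zone check in the preparation list above can be scripted. The bash sketch below (the function name gen_primary_zone_sql is hypothetical, not part of antman or OceanBase) reads tenant_name/primary_zone pairs and prints an ALTER TENANT statement only for tenants whose highest-priority group contains the zone being replaced. In a primary_zone value, ';' separates priority levels and ',' lists zones of equal priority, which is why oms_cc7_tenant (META_OB_ZONE_3 first) did not need to change. It only prints SQL for review and never connects to the database.

```shell
#!/usr/bin/env bash
# Sketch: print ALTER TENANT statements for tenants whose top-priority
# primary_zone group contains the zone that is about to be replaced.
# stdin: tab-separated "tenant_name<TAB>primary_zone" rows.

# Usage: gen_primary_zone_sql <zone_being_replaced> <new_primary_zone>
gen_primary_zone_sql() {
  local old_zone=$1 new_pz=$2 tenant pz first_group z
  local -a zones
  while IFS=$'\t' read -r tenant pz; do
    [[ -n $tenant ]] || continue
    first_group=${pz%%;*}                       # ';' separates priority levels
    IFS=',' read -ra zones <<< "$first_group"   # ',' joins zones of equal priority
    for z in "${zones[@]}"; do
      if [[ $z == "$old_zone" ]]; then
        printf "alter tenant %s primary_zone='%s';\n" "$tenant" "$new_pz"
        break
      fi
    done
  done
}
```

For example, `mysql -N -e "select tenant_name, primary_zone from oceanbase.__all_tenant" | gen_primary_zone_sql META_OB_ZONE_1 'META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1'` (connection options omitted) relies on mysql printing tab-separated rows when its output is piped.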

(2) Run antman's manage.sh to add the new machine

  • Run antman's manage.sh to add the new machine. (Note: manage.sh in this version reports an error; details are shared at the end of the article.)

    55obffocp:~/t-oceanbase-antman # ./manage.sh -i ob,ocp,obproxy -l 133.55.22.19 -z 1 -R Jnydzycscc@123 -A OceanBase#123
    [2023-06-16 16:31:45.375633] INFO [check conf file /root/t-oceanbase-antman/obcluster.conf format ...]
    [2023-06-16 16:31:45.381844] INFO [conf file is upper case format.]
    [2023-06-16 16:31:45.391446] INFO [SSH_AUTH=password SSH_USER=root SSH_PORT=22 SSH_PASSWORD= SSH_KEY_FILE=/root/.ssh/id_rsa]
    LB_MODE=f5
    INSTALL_COMPONENTS componets: ob obproxy ocp
    CLEAR_COMPONENTS:
    IP_LIST: 133.55.22.19
    ZONE_LIST: 1
    ROOT_PASSWORD_LIST: Jnydzycscc@123
    ADMIN_PASSWORD_LIST: OceanBase#123
    [2023-06-16 16:31:45.746503] INFO [INSTALL_COMPONENT: ob START ######################################]
    [2023-06-16 16:31:45.751057] INFO [deploy_ob: check whether OBSERVER port 2881,2882 are in use or not on 133.55.22.19]
    [2023-06-16 16:31:45.806500] INFO [deploy_ob: OBSERVER port 2881,2882 are idle on 133.55.22.19]
    [2023-06-16 16:31:45.810773] INFO [deploy_ob: installing ob cluster, logfile: /root/t-oceanbase-antman/logs/deploy_ob.log]
    cp: '/root/t-oceanbase-antman/OB2276_x86_20210409.tar.gz' and '/root/t-oceanbase-antman/OB2276_x86_20210409.tar.gz' are the same file
    skip copy same file
    cp: '/root/t-oceanbase-antman/install_OB_docker.sh' and '/root/t-oceanbase-antman/install_OB_docker.sh' are the same file
    skip copy same file
    cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
    skip copy same file
    cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
    skip copy same file
    cp: '/root/.key' and '/root/.key' are the same file
    skip copy same file
    nohup: ignoring input
    [2023-06-16 16:31:45.841348] INFO [installing OB docker and starting OB server on 133.55.22.19, pid: 144513, log: /root/t-oceanbase-antman/logs/install_OB_docker.log and /home/admin/logs/ob-server/ inside docker]
    [2023-06-16 16:31:45.925592] INFO [load docker image: docker load -i /root/t-oceanbase-antman/OB2276_x86_20210409.tar.gz]
    [2023-06-16 16:31:45.930723] INFO [install_OB_docker.sh is still running on 133.55.22.19]
    [2023-06-16 16:31:56.021465] INFO [install_OB_docker.sh is still running on 133.55.22.19]
    Loaded image: reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409
    [2023-06-16 16:32:06.111458] INFO [install_OB_docker.sh is still running on 133.55.22.19]
    [2023-06-16 16:32:06.359285] INFO [start container: docker run -d -it --cap-add SYS_RESOURCE --name META_OB_ZONE_1 --net=host -e OBCLUSTER_NAME=obcluster -e DEV_NAME=bond0 -e ROOTSERVICE_LIST="10.10.100.9:2882:2881;10.10.100.87:2882:2881;122.44.11.2:2882:2881" -e DATAFILE_DISK_PERCENTAGE=90 -e CLUSTER_ID=1632654636 -e ZONE_NAME=META_OB_ZONE_1 -e OBPROXY_PORT=2883 -e MYSQL_PORT=2881 -e RPC_PORT=2882 -e OCP_VIP=134.80.173.57 -e OCP_VPORT=80 -e app.password_root='Jnydzycscc@123' -e app.password_admin='OceanBase#123' -e OBPROXY_OPTSTR="" -e OPTSTR="cpu_count=64,system_memory=50G,memory_limit=254G,__min_full_resource_pool_memory=1073741824,_ob_enable_prepared_statement=false,memory_limit_percentage=90" --cpu-period 100000 --cpu-quota 6400000 --cpuset-cpus 0-63 --memory 256G -v /home/admin/oceanbase:/home/admin/oceanbase -v /data/log1:/data/log1 -v /data/1:/data/1 --restart on-failure:5 reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409]
    WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
    4f1c15e8194cc1ae2fffcc124ea9c982b3fda87ce1a6d0038db88435c737af89
    [2023-06-16 16:32:16.209761] INFO [install_OB_docker.sh finished and reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409 started on 133.55.22.19]
    [2023-06-16 16:32:16.214771] INFO [waiting on observer ready on 133.55.22.19]
    [2023-06-16 16:35:16.244133] INFO [waiting on observer ready on 133.55.22.19 for 3 Minitues]
    [2023-06-16 16:36:16.264776] INFO [waiting on observer ready on 133.55.22.19 for 4 Minitues]
    [2023-06-16 16:37:16.285808] INFO [waiting on observer ready on 133.55.22.19 for 5 Minitues]
    [2023-06-16 16:37:16.579057] INFO [observer on 133.55.22.19 is ready]
    [2023-06-16 16:37:16.584583] INFO [deploy_ob: installation of ob cluster done]
    [2023-06-16 16:37:16.588617] INFO [INSTALL_COMPONENT: ob DONE ######################################]
    [2023-06-16 16:37:16.593604] INFO [INSTALL_COMPONENT: obproxy START ######################################]

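#### The log is long, so it is not pasted in full. As shown above, once the MetaDB Docker service has been added, the script moves on to adding the OBProxy Docker service ###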

Notes:

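1) "133.55.22.19" is the actual physical IP of the server that replaces the OCP node.

2) The "-z 1" option adds 133.55.22.19 to the first zone of the OCP environment, i.e. the same zone as 10.10.100.9 found above.

Zones only apply to the meta_ob Docker container on each OCP server; the obproxy and ocp containers have no notion of zones.

To see which zone the meta_ob container on each OCP server belongs to, refer to the three variables ZONE1_RS_IP, ZONE2_RS_IP and ZONE3_RS_IP in obcluster.conf.

3) "-R" and "-A" take the root password and the admin password of 133.55.22.19, respectively.

4) -i installs; replacing it with -c clears (uninstalls) the components instead.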
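The mapping between "-z N" and obcluster.conf described above can be checked before running manage.sh. This bash sketch uses hypothetical helpers (conf_get, zone_for_index; they are not part of antman) that read ZONEn_NAME and ZONEn_RS_IP without sourcing the file and print the zone and the RS IP currently registered for that index. It assumes simple KEY=VALUE lines and treats everything after '#' as a comment, so it is not suitable for values that contain '#'.

```shell
#!/usr/bin/env bash
# Sketch: resolve manage.sh's "-z N" to the zone name and current RS IP
# defined in obcluster.conf (ZONEn_NAME / ZONEn_RS_IP).

# Usage: conf_get <conf_file> <KEY>  -> prints the first token of KEY's value
conf_get() {
  local conf=$1 want=$2 line key value
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%%#*}                    # drop comments
    [[ $line == *=* ]] || continue
    key=${line%%=*}
    key=${key//[[:space:]]/}            # trim spaces around the key
    value=${line#*=}
    value=${value%%[[:space:]]*}        # keep the first token only
    if [[ $key == "$want" ]]; then
      printf '%s\n' "$value"
      return 0
    fi
  done < "$conf"
  return 1
}

# Usage: zone_for_index <conf_file> <N>
zone_for_index() {
  local conf=$1 n=$2 name ip
  name=$(conf_get "$conf" "ZONE${n}_NAME") || return 1
  ip=$(conf_get "$conf" "ZONE${n}_RS_IP") || return 1
  printf 'zone=%s current_rs_ip=%s\n' "$name" "$ip"
}
```

With the obcluster.conf shown earlier, `zone_for_index /root/t-oceanbase-antman/obcluster.conf 1` would report META_OB_ZONE_1 and 10.10.100.9, the zone and server being replaced here.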

#### If everything went well, you can now log in to the OCP web console at <new node IP>:8080, and connect to the MetaDB through port 2883 on this machine.
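A quick way to confirm the two entry points mentioned above from a shell. This sketch relies on bash's /dev/tcp pseudo-device and the coreutils timeout command; port_open and check_new_ocp_node are hypothetical names. An open port only shows that something is listening, not that OCP or OBProxy is healthy.

```shell
#!/usr/bin/env bash
# Sketch: check that the OCP web port (8080) and the OBProxy port (2883)
# on the new node accept TCP connections.

# Usage: port_open <host> <port> [timeout_seconds]  (exit 0 = connected)
port_open() {
  local host=$1 port=$2 secs=${3:-3}
  # The child bash opens fd 3 on /dev/tcp/<host>/<port>; timeout bounds the wait.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage: check_new_ocp_node <host>  (exit 1 if any port is unreachable)
check_new_ocp_node() {
  local host=$1 port rc=0
  for port in 8080 2883; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port NOT reachable"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_new_ocp_node 133.55.22.19`.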

(3) Log in to the sys tenant of the OCP MetaDB and bring the new meta_ob Docker server online

    MySQL [oceanbase]> alter system add server '133.55.22.19:2882' zone 'META_OB_ZONE_1';
    Query OK, 0 rows affected (0.02 sec)

    MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
    +--------------+----------------+-----------------+--------+--------------------+
    | svr_ip       | zone           | with_rootserver | status | start_service_time |
    +--------------+----------------+-----------------+--------+--------------------+
    | 10.10.100.87 | META_OB_ZONE_2 |               1 | active |   1679279220991796 |
    | 10.10.100.9  | META_OB_ZONE_1 |               0 | active |   1679275725595691 |
    | 122.44.11.2  | META_OB_ZONE_3 |               0 | active |   1640097866698859 |
    | 133.55.22.19 | META_OB_ZONE_1 |               0 | active |                  0 |
    +--------------+----------------+-----------------+--------+--------------------+
    4 rows in set (0.00 sec)

    MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
    +--------------+----------------+-----------------+--------+--------------------+
    | svr_ip       | zone           | with_rootserver | status | start_service_time |
    +--------------+----------------+-----------------+--------+--------------------+
    | 10.10.100.87 | META_OB_ZONE_2 |               1 | active |   1679279220991796 |
    | 10.10.100.9  | META_OB_ZONE_1 |               0 | active |   1679275725595691 |
    | 122.44.11.2  | META_OB_ZONE_3 |               0 | active |   1640097866698859 |
    | 133.55.22.19 | META_OB_ZONE_1 |               0 | active |   1686908200404755 |
    +--------------+----------------+-----------------+--------+--------------------+
    4 rows in set (0.01 sec)
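The difference between the two queries above is start_service_time: 0 right after ADD SERVER, and a timestamp once the new observer has started serving. The bash sketch below waits for that transition. MYSQL_CMD is an assumed bash array holding a working mysql command line for the MetaDB sys tenant, and server_serving / wait_until_serving are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: wait until a newly added observer is serving, i.e. status is
# 'active' and start_service_time is no longer 0.
# MYSQL_CMD is a hypothetical array, e.g.
#   MYSQL_CMD=(mysql -h10.10.100.87 -P2883 -uroot@sys#obcluster -p'...')

# Reads tab-separated "svr_ip status start_service_time" rows from stdin.
# Usage: server_serving <ip>  (exit 0 = serving)
server_serving() {
  local want=$1 ip status start
  while IFS=$'\t' read -r ip status start; do
    if [[ $ip == "$want" ]]; then
      [[ $status == active && $start != 0 ]]
      return
    fi
  done
  return 1
}

# Usage: wait_until_serving <ip> [max_tries]
wait_until_serving() {
  local ip=$1 tries=${2:-60} out
  while (( tries-- > 0 )); do
    out=$("${MYSQL_CMD[@]}" -N -e \
      "select svr_ip, status, start_service_time from oceanbase.__all_server") ||
      { echo "query failed" >&2; return 2; }
    server_serving "$ip" <<< "$out" && return 0
    sleep 10
  done
  echo "$ip is not serving yet" >&2
  return 1
}
```

The query result is captured first so that a failed connection is reported as an error instead of being read as "not serving yet".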

(4) Log in to the sys tenant of the OCP MetaDB and take the replaced meta_ob Docker server offline

    MySQL [oceanbase]>  alter system delete server '10.10.100.9:2882' zone 'META_OB_ZONE_1';
    Query OK, 0 rows affected (0.19 sec)

    MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
    +--------------+----------------+-----------------+----------+--------------------+
    | svr_ip       | zone           | with_rootserver | status   | start_service_time |
    +--------------+----------------+-----------------+----------+--------------------+
    | 10.10.100.87 | META_OB_ZONE_2 |               1 | active   |   1679279220991796 |
    | 10.10.100.9  | META_OB_ZONE_1 |               0 | deleting |   1679275725595691 |
    | 122.44.11.2  | META_OB_ZONE_3 |               0 | active   |   1640097866698859 |
    | 133.55.22.19 | META_OB_ZONE_1 |               0 | active   |   1686908200404755 |
    +--------------+----------------+-----------------+----------+--------------------+
    4 rows in set (0.01 sec)

    MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
    +--------------+----------------+-----------------+--------+--------------------+
    | svr_ip       | zone           | with_rootserver | status | start_service_time |
    +--------------+----------------+-----------------+--------+--------------------+
    | 10.10.100.87 | META_OB_ZONE_2 |               1 | active |   1679279220991796 |
    | 122.44.11.2  | META_OB_ZONE_3 |               0 | active |   1640097866698859 |
    | 133.55.22.19 | META_OB_ZONE_1 |               0 | active |   1686908200404755 |
    +--------------+----------------+-----------------+--------+--------------------+
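Whether the old server has actually left can be polled instead of re-running the query by hand. The sketch below prints the server's status, or "gone" once it is no longer listed. The helpers are hypothetical, and MYSQL_CMD is an assumed bash array holding a working mysql command line for the MetaDB sys tenant. If a server stays in "deleting" for a long time, units that cannot be migrated off it may be the cause; see the memory_limit issue in the error notes at the end of this article.

```shell
#!/usr/bin/env bash
# Sketch: poll __all_server until the replaced observer has disappeared.
# MYSQL_CMD is a hypothetical array, e.g.
#   MYSQL_CMD=(mysql -h10.10.100.87 -P2883 -uroot@sys#obcluster -p'...')

# Reads tab-separated "svr_ip status ..." rows from stdin and prints the
# status of <ip>, or "gone" if the server is no longer listed.
server_state() {
  local want=$1 ip status
  while IFS=$'\t' read -r ip status _; do
    if [[ $ip == "$want" ]]; then
      printf '%s\n' "$status"
      return 0
    fi
  done
  printf 'gone\n'
}

# Usage: wait_until_deleted <ip> [max_tries]
wait_until_deleted() {
  local ip=$1 tries=${2:-90} out state=unknown
  while (( tries-- > 0 )); do
    out=$("${MYSQL_CMD[@]}" -N -e "select svr_ip, status from oceanbase.__all_server") ||
      { echo "query failed" >&2; return 2; }
    state=$(server_state "$ip" <<< "$out")
    [[ $state == gone ]] && return 0
    sleep 20
  done
  echo "$ip is still '$state'; check gv\$unit for units left on it" >&2
  return 1
}
```

A failed query returns early on purpose: an empty result would otherwise be misread as "gone".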

(5) Log in to the ocp_meta tenant and manually update the OCP server information

#### After the steps above, the OCP web console still shows stale information about the old host, which has to be updated #####

    55obffocp:~/t-oceanbase-antman # mysql -h10.10.100.87 -P2883 -uroot@ocp_meta#obcluster -p'rSf@jO%6EO' -Docp -c

    MySQL [ocp]>  select * from compute_host where inner_ip_address='10.10.100.9'\G
    *************************** 1. row ***************************
                  id: 1
                name: ocp1a
         description: NULL
    operating_system: 4.12.14-120-default
        architecture: x86_64
    inner_ip_address: 10.10.100.9
            ssh_port: 2022
                kind: DEDICATED_PHYSICAL_MACHINE
       publish_ports: NULL
              status: ONLINE
              vpc_id: 1
              idc_id: 1
        host_type_id: 1
       serial_number: NULL
               alias: NULL
         create_time: 2021-09-26 21:04:11
         update_time: 2023-03-20 11:01:58
    1 row in set (0.00 sec)

    MySQL [ocp]> update compute_host set inner_ip_address='133.55.22.19', name='55obffocp' where inner_ip_address='10.10.100.9';
    Query OK, 1 row affected (0.01 sec)
    Rows matched: 1  Changed: 1  Warnings: 0

    MySQL [ocp]> select * from compute_host where id =1;
    +----+-----------+-------------+---------------------+--------------+------------------+----------+----------------------------+---------------+--------+--------+--------+--------------+---------------+-------+---------------------+---------------------+
    | id | name      | description | operating_system    | architecture | inner_ip_address | ssh_port | kind                       | publish_ports | status | vpc_id | idc_id | host_type_id | serial_number | alias | create_time         | update_time         |
    +----+-----------+-------------+---------------------+--------------+------------------+----------+----------------------------+---------------+--------+--------+--------+--------------+---------------+-------+---------------------+---------------------+
    |  1 | 55obffocp | NULL        | 4.12.14-120-default | x86_64       | 133.55.22.19     |     2022 | DEDICATED_PHYSICAL_MACHINE | NULL          | ONLINE |      1 |      1 |            1 | NULL          | NULL  | 2021-09-26 21:04:11 | 2023-06-16 17:47:19 |
    +----+-----------+-------------+---------------------+--------------+------------------+----------+----------------------------+---------------+--------+--------+--------+--------------+---------------+-------+---------------------+---------------------+
    1 row in set (0.00 sec)
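Before editing compute_host by hand, it can help to generate the statements and review them first. The sketch below (the function compute_host_sql is hypothetical) prints the same UPDATE used above, bracketed by SELECTs that show the before and after state, and refuses malformed IPs or host names so that nothing unexpected ends up inside the quoted SQL literals. The validation is an extra safety measure of this sketch, not something OCP requires. It only prints SQL; pipe it into a mysql session on the ocp_meta tenant (database ocp) after checking it.

```shell
#!/usr/bin/env bash
# Sketch: print the SQL for step (5) so it can be reviewed before execution.

# Usage: is_ipv4 <addr>  (exit 0 for a dotted-quad IPv4 address)
is_ipv4() {
  local ip=$1 part
  [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for part in "${BASH_REMATCH[@]:1}"; do
    (( 10#$part <= 255 )) || return 1   # 10# avoids octal parsing of "08"
  done
}

# Usage: compute_host_sql <old_ip> <new_ip> <new_hostname>
compute_host_sql() {
  local old_ip=$1 new_ip=$2 new_name=$3
  is_ipv4 "$old_ip" && is_ipv4 "$new_ip" || { echo "invalid IP" >&2; return 1; }
  [[ $new_name =~ ^[A-Za-z0-9._-]+$ ]] || { echo "invalid hostname" >&2; return 1; }
  cat <<EOF
select id, name, inner_ip_address from compute_host where inner_ip_address='$old_ip';
update compute_host set inner_ip_address='$new_ip', name='$new_name' where inner_ip_address='$old_ip';
select id, name, inner_ip_address, update_time from compute_host where inner_ip_address='$new_ip';
EOF
}
```

Usage: `compute_host_sql 10.10.100.9 133.55.22.19 55obffocp`.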

(6) Clean up the remaining services on the replaced machine

    ocp1a:~/t-oceanbase-antman #  ./manage.sh -c ob,ocp,obproxy -l 10.10.100.9 -z 1 -R 'Dt!n(Rg4Av!t' -A OceanBase#123
    grep: /etc/system-release: No such file or directory
    [2023-06-16 22:45:44.101400] INFO [check conf file /root/t-oceanbase-antman/obcluster.conf format ...]
    [2023-06-16 22:45:44.106779] INFO [conf file is upper case format.]
    [2023-06-16 22:45:44.114437] INFO [SSH_AUTH=password SSH_USER=root SSH_PORT=22 SSH_PASSWORD= SSH_KEY_FILE=/root/.ssh/id_rsa]
    LB_MODE=f5
    INSTALL_COMPONENTS componets: 
    CLEAR_COMPONENTS: ob obproxy ocp
    IP_LIST: 10.10.100.9
    ZONE_LIST: 1
    ROOT_PASSWORD_LIST: Dt!n(Rg4Av!t
    ADMIN_PASSWORD_LIST: OceanBase#123
    [2023-06-16 22:45:44.474268] INFO [CLEAR_COMPONENT: ob START  ######################################]
    cp: '/root/t-oceanbase-antman/uninstall.sh' and '/root/t-oceanbase-antman/uninstall.sh' are the same file
    skip copy same file
    cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
    skip copy same file
    cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
    skip copy same file
    grep: /etc/system-release: No such file or directory
    [2023-06-16 22:45:44.504069] INFO [remove OB server and docker on host: 10.10.100.9]
    [2023-06-16 22:45:44.548697] INFO [docker rm -f 62ab623cb4ed]
    62ab623cb4ed
    [2023-06-16 22:46:01.260706] INFO [remove OB server and docker on host: 10.10.100.9 done!]
    [2023-06-16 22:46:01.370808] INFO [uninstall.sh ob  finished and reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409 removed on 10.10.100.9]
    [2023-06-16 22:46:01.375914] INFO [OB docker on 10.10.100.9 is removed]
    [2023-06-16 22:46:01.380667] INFO [CLEAR_COMPONENT: ob DONE  ######################################]
    [2023-06-16 22:46:01.385398] INFO [CLEAR_COMPONENT: obproxy START  ######################################]
    cp: '/root/t-oceanbase-antman/uninstall.sh' and '/root/t-oceanbase-antman/uninstall.sh' are the same file
    skip copy same file
    cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
    skip copy same file
    cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
    skip copy same file
    grep: /etc/system-release: No such file or directory
    [2023-06-16 22:46:01.416495] INFO [remove obproxy docker on host:10.10.100.9]
    [2023-06-16 22:46:01.514215] INFO [docker rm -f 01bdcadf2e11]
    01bdcadf2e11
    [2023-06-16 22:46:01.765459] INFO [remove obproxy docker on host:10.10.100.9 done!]
    [2023-06-16 22:46:01.858848] INFO [uninstall.sh obproxy  finished and reg.docker.alibaba-inc.com/antman/obproxy:OBP186_20210315 removed on 10.10.100.9]
    [2023-06-16 22:46:01.863806] INFO [obproxy docker on 10.10.100.9 is removed]
    [2023-06-16 22:46:01.868778] INFO [CLEAR_COMPONENT: obproxy DONE  ######################################]
    [2023-06-16 22:46:01.873368] INFO [CLEAR_COMPONENT: ocp START  ######################################]
    cp: '/root/t-oceanbase-antman/uninstall.sh' and '/root/t-oceanbase-antman/uninstall.sh' are the same file
    skip copy same file
    cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
    skip copy same file
    cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
    skip copy same file
    grep: /etc/system-release: No such file or directory
    [2023-06-16 22:46:01.906934] INFO [remove ocp docker on host:10.10.100.9]
    [2023-06-16 22:46:01.944811] INFO [docker rm -f 8b044744a92e]
    8b044744a92e
    [2023-06-16 22:46:26.162467] INFO [remove ocp docker on host:10.10.100.9 done]
    [2023-06-16 22:46:26.253927] INFO [uninstall.sh ocp  finished and reg.docker.alibaba-inc.com/oceanbase/ocp-all-in-one:3.3.3-20220906114643 removed on 10.10.100.9]
    [2023-06-16 22:46:26.258281] INFO [ocp docker on 10.10.100.9 is removed]
    [2023-06-16 22:46:26.263047] INFO [CLEAR_COMPONENT: ocp DONE  ######################################]
    ocp1a:~/t-oceanbase-antman # docker ps
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
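`docker ps` above only lists running containers. A slightly stricter check also covers stopped ones via `docker ps -a` and looks for the three antman images that appear in the uninstall log. leftover_containers is a hypothetical helper that reads the output of `docker ps -a --format '{{.ID}} {{.Image}}'`.

```shell
#!/usr/bin/env bash
# Sketch: after "manage.sh -c ...", list any container (running or stopped)
# still based on the ob-docker, obproxy or ocp-all-in-one images.
# stdin: lines of "<container_id> <image>".

# Usage: leftover_containers  (prints offending lines; exit 1 if any were found)
leftover_containers() {
  local id image found=0
  while read -r id image; do
    case $image in
      */ob-docker:*|*/obproxy:*|*/ocp-all-in-one:*)
        echo "$id $image"
        found=1
        ;;
    esac
  done
  return "$found"
}
```

Usage: `docker ps -a --format '{{.ID}} {{.Image}}' | leftover_containers && echo "no antman containers left"`.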

Errors encountered and how they were handled:

  • manage.sh fails

    55obffocp:~/t-oceanbase-antman # ./manage.sh -i ob,ocp,obproxy -l 133.55.22.19 -z 1 -R Jnydzycscc@123 -A OceanBase#123
    [2023-06-16 16:31:03.305062] INFO [check conf file /root/t-oceanbase-antman/obcluster.conf format ...]
    [2023-06-16 16:31:03.309079] INFO [conf file is upper case format.]
    [2023-06-16 16:31:03.315290] INFO [SSH_AUTH=password SSH_USER=root SSH_PORT=22 SSH_PASSWORD= SSH_KEY_FILE=/root/.ssh/id_rsa]
    LB_MODE=f5
    INSTALL_COMPONENTS componets: ob obproxy ocp
    CLEAR_COMPONENTS:
    IP_LIST: 133.55.22.19
    ZONE_LIST: 1
    ROOT_PASSWORD_LIST: Jnydzycscc@123
    ADMIN_PASSWORD_LIST: OceanBase#123
    /root/t-oceanbase-antman/common/utils.sh: line 484: -e: command not found
    [2023-06-16 16:31:03.636862] ERROR [: ssh authorization to 133.55.22.19 failed, Please check SSH affinity environment varialbes.]

###### This issue has to be fixed by modifying the script code #########

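  • Long after running alter system delete server, the replaced server was still not removed and stayed in the deleting state, with 6 units left on it. Investigation showed that the memory parameter of the OCP MetaDB had been adjusted (memory_limit is 300G on the existing servers), while the newly added server was started with a smaller value (254G), so unit migration got stuck.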

    MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
    +--------------+----------------+-----------------+----------+--------------------+
    | svr_ip | zone | with_rootserver | status | start_service_time |
    +--------------+----------------+-----------------+----------+--------------------+
    | 10.10.100.87 | META_OB_ZONE_2 | 1 | active | 1679279220991796 |
    | 10.10.100.9 | META_OB_ZONE_1 | 0 | deleting | 1679275725595691 |
    | 122.44.11.2 | META_OB_ZONE_3 | 0 | active | 1640097866698859 |
    | 133.55.22.19 | META_OB_ZONE_1 | 0 | active | 1686908200404755 |
    +--------------+----------------+-----------------+----------+--------------------+
    4 rows in set (0.01 sec)

    MySQL [oceanbase]> select count(*),svr_ip from gv$unit group by svr_ip;
    +----------+--------------+
    | count(*) | svr_ip       |
    +----------+--------------+
    |       27 | 133.55.22.19 |
    |       33 | 10.10.100.87 |
    |       33 | 122.44.11.2  |
    |        6 | 10.10.100.9  |
    +----------+--------------+
    4 rows in set (0.01 sec)

    MySQL [oceanbase]> select * from gv$unit where svr_ip='10.10.100.9';
    +---------+----------------+---------------------------------------------+------------------+----------------------------------------+----------------+-----------+-------------+-------------+----------+---------------------+-----------------------+---------+---------+-------------+-------------+----------+----------+---------------+-----------------+
    | unit_id | unit_config_id | unit_config_name | resource_pool_id | resource_pool_name | zone | tenant_id | tenant_name | svr_ip | svr_port | migrate_from_svr_ip | migrate_from_svr_port | max_cpu | min_cpu | max_memory | min_memory | max_iops | min_iops | max_disk_size | max_session_num |
    +---------+----------------+---------------------------------------------+------------------+----------------------------------------+----------------+-----------+-------------+-------------+----------+---------------------+-----------------------+---------+---------+-------------+-------------+----------+----------+---------------+-----------------+
    | 1106 | 1090 | config_oms_tt_tenant_META_OB_ZONE_1_S2_gpa | 1080 | pool_oms_tt_tenant_META_OB_ZONE_1_gpa | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 3 | 3 | 12884901888 | 12884901888 | 2500 | 2500 | 536870912000 | 750 |
    | 1139 | 1094 | oms_unit | 1129 | oms_ff9_tenant_resource_pool | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 2 | 2 | 5368709120 | 4294967296 | 128 | 128 | 5368709120 | 10000 |
    | 1122 | 1097 | config_oms_c55_tenant_META_OB_ZONE_1_S1_ifu | 1088 | pool_oms_c55_tenant_META_OB_ZONE_1_ifu | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 1.5 | 1.5 | 6442450944 | 6442450944 | 1250 | 1250 | 536870912000 | 375 |
    | 1126 | 1100 | config_oms_ff6_tenant_META_OB_ZONE_1_S1_uzz | 1092 | pool_oms_ff6_tenant_META_OB_ZONE_1_uzz | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 1.5 | 1.5 | 6442450944 | 6442450944 | 1250 | 1250 | 536870912000 | 375 |
    | 1127 | 1101 | config_oms_ff7_tenant_META_OB_ZONE_1_S1_gkj | 1093 | pool_oms_ff7_tenant_META_OB_ZONE_1_gkj | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 1.5 | 1.5 | 6442450944 | 6442450944 | 1250 | 1250 | 536870912000 | 375 |
    | 1135 | 1108 | config_oms_cc8_tenant_META_OB_ZONE_1_S1_wwo | 1101 | pool_oms_cc8_tenant_META_OB_ZONE_1_wwo | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 1.5 | 1.5 | 6442450944 | 6442450944 | 1250 | 1250 | 536870912000 | 375 |
    +---------+----------------+---------------------------------------------+------------------+----------------------------------------+----------------+-----------+-------------+-------------+----------+---------------------+-----------------------+---------+---------+-------------+-------------+----------+----------+---------------+-----------------+
    6 rows in set (0.02 sec)

    MySQL [oceanbase]> alter system migrate unit=1106 destination='133.55.22.19:2882';
    ERROR 4624 (HY000): machine resource is not enough to hold a new unit   -- migrating the unit manually fails: not enough resources

    MySQL [oceanbase]> select zone,svr_ip, cpu_total, cpu_assigned,cpu_assigned_percent cpu_ass_pct, round(mem_total/1024/1024/1024) mem_total_gb,
    -> round(mem_assigned/1024/1024/1024) mem_ass_gb, mem_assigned_percent mem_ass_pct, unit_num, migrating_unit_num, leader_count, round(load,2) load
    -> from __all_virtual_server_stat
    -> order by zone, svr_ip;   -- checking resources shows that memory is insufficient
    +----------------+--------------+-----------+--------------+-------------+--------------+------------+-------------+----------+--------------------+--------------+------+
    | zone | svr_ip | cpu_total | cpu_assigned | cpu_ass_pct | mem_total_gb | mem_ass_gb | mem_ass_pct | unit_num | migrating_unit_num | leader_count | load |
    +----------------+--------------+-----------+--------------+-------------+--------------+------------+-------------+----------+--------------------+--------------+------+
    | META_OB_ZONE_1 | 10.10.100.9 | 62 | 11 | 17 | 250 | 40 | 16 | 6 | 0 | 0 | 0.17 |
    | META_OB_ZONE_1 | 133.55.22.19 | 62 | 48 | 77 | 204 | 196 | 96 | 27 | 0 | 0 | 0.87 |
    | META_OB_ZONE_2 | 10.10.100.87 | 62 | 59 | 95 | 250 | 236 | 94 | 33 | 0 | 2935 | 0.95 |
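    | META_OB_ZONE_3 | 122.44.11.2 | 62 | 59 | 95 | 250 | 236 | 94 | 33 | 0 | 1051 | 0.95 |
    +----------------+--------------+-----------+--------------+-------------+--------------+------------+-------------+----------+--------------------+--------------+------+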

    MySQL [oceanbase]> show parameters like '%memory_limit%';
    | META_OB_ZONE_3 | observer | 122.44.11.2 | 2882 | memory_limit | NULL | 300G | the size of the memory reserved for internal use(for testing purpose). Range: [0M,) | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
    | META_OB_ZONE_1 | observer | 133.55.22.19 | 2882 | memory_limit | NULL | 254G | the size of the memory reserved for internal use(for testing purpose). Range: [0M,) | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
    | META_OB_ZONE_2 | observer | 10.10.100.87 | 2882 | memory_limit | NULL | 300G | the size of the memory reserved for internal use(for testing purpose). Range: [0M,) | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
    | META_OB_ZONE_1 | observer | 10.10.100.9 | 2882 | memory_limit | NULL | 300G | the size of the memory reserved for internal use(for testing purpose). Range: [0M,) | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |

    MySQL [oceanbase]> alter system set memory_limit ='300G' ;
    Query OK, 0 rows affected (0.05 sec)

    MySQL [oceanbase]> select count(*),svr_ip from gv$unit group by svr_ip;
    +----------+--------------+
    | count(*) | svr_ip |
    +----------+--------------+
    | 33 | 133.55.22.19 |
    | 33 | 10.10.100.87 |
    | 33 | 122.44.11.2 |
    +----------+--------------+
    3 rows in set (0.00 sec)

    MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
    +--------------+----------------+-----------------+--------+--------------------+
    | svr_ip | zone | with_rootserver | status | start_service_time |
    +--------------+----------------+-----------------+--------+--------------------+
    | 10.10.100.87 | META_OB_ZONE_2 | 1 | active | 1679279220991796 |
    | 122.44.11.2 | META_OB_ZONE_3 | 0 | active | 1640097866698859 |
    | 133.55.22.19 | META_OB_ZONE_1 | 0 | active | 1686908200404755 |
    +--------------+----------------+-----------------+--------+--------------------+
    3 rows in set (0.00 sec)
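The ERROR 4624 above comes down to simple arithmetic on the __all_virtual_server_stat numbers. The bash sketch below reproduces it: a unit can only move to a server whose unassigned memory (mem_total minus mem_assigned) covers the unit's memory size (min_memory from gv$unit; for unit 1106, min and max are both 12884901888 bytes, so the choice does not matter here). This is a simplification, since OceanBase also checks CPU, disk and other limits, and the helper names are hypothetical. It also assumes, as the numbers here suggest, that mem_total is roughly memory_limit minus system_memory: with system_memory=50G from the docker run OPTSTR, that gives 254G - 50G = 204G on the new server and 300G - 50G = 250G on the others, assuming the older servers also use 50G.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the "machine resource is not enough" arithmetic for one unit.
# Simplified: only memory is checked.

GIB=$((1024 * 1024 * 1024))

# Assumption: memory available to tenant units ~= memory_limit - system_memory.
# Usage: tenant_mem_gb <memory_limit_gb> <system_memory_gb>
tenant_mem_gb() {
  echo $(( $1 - $2 ))
}

# Usage: unit_fits <mem_total_gb> <mem_assigned_gb> <unit_memory_bytes>
# Prints "fits" (exit 0) or "insufficient: ..." (exit 1).
unit_fits() {
  local total_gb=$1 assigned_gb=$2 unit_bytes=$3 free_bytes
  free_bytes=$(( (total_gb - assigned_gb) * GIB ))
  if (( free_bytes >= unit_bytes )); then
    echo "fits"
  else
    echo "insufficient: free $(( free_bytes / GIB ))G < unit $(( unit_bytes / GIB ))G"
    return 1
  fi
}

# Numbers from the output above (unit 1106 needs 12884901888 bytes = 12G):
#   unit_fits 204 196 12884901888   # before: only 8G free on 133.55.22.19
#   unit_fits 250 196 12884901888   # after memory_limit='300G' (mem_total ~250G)
```

With 204G total and 196G assigned, only 8G is free on 133.55.22.19, which cannot hold a 12G unit; raising memory_limit to 300G lifts mem_total to about 250G and leaves room for the remaining units.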

Summary:

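This concludes the replacement of an OCP machine with the antman script. This article and the previous one about replacing an OCP node with OAT may make the procedure look easy, but I went through the whole process several times, and wrote these two articles so that others can avoid the same pitfalls. If you have read the previous article, you will know that when replacing OCP with OAT, the new machine is added to the MetaDB as a newly created zone and the replaced machine is then removed, which also involves creating a new resource pool, modifying the locality, increasing the replica count and so on. The antman approach is different: it adds the new machine to the same zone as the machine being replaced, migrates the units within that zone, and then takes the replaced machine offline. At present, replacing with antman has a relatively smaller impact on OCP's metadata, whereas OAT needs fewer command-line operations. For early deployments in which OBProxy runs in its own Docker container, antman is mandatory; with later versions, choose whichever suits your situation.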

Keep walking where you are headed, and do not ask how far it is.
