Case 2: Changing the Physical IP and VIP of a Cluster

KingbaseES V8R6 Cluster Operations Case: Changing the Physical IP and VIP (V2.0)

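Case description:
In a KingbaseES V8R6 cluster, IP addresses are configured in repmgr.conf and kingbase.auto.conf, so changing the cluster's physical IP and VIP means editing both files. The cluster services must be stopped for the change. In a production environment, plan a downtime window before changing the IP so that application access is not disrupted.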

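Summary of steps:

1. Identify the primary and standby, then stop the cluster services (cluster and db).

2. Change the system IP and the IP entries in /etc/hosts.

3. Update the physical IP and VIP information in the cluster configuration file repmgr.conf.

4. Restart the system network service to apply the new physical IP.

5. Start the database service on the primary and standby.

6. Register the primary with the cluster.

7. Stop the database service on the standby, rejoin the standby node to the cluster, and register the standby with the cluster.

8. Check the cluster service status (cluster and db) and start the repmgrd service on the primary and standby.

9. Restart the cluster service (sys_monitor.sh) to verify.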

Cluster IP information:

Primary: 192.168.158.24, new primary IP: 192.168.158.34

Standby: 192.168.158.25, new standby IP: 192.168.158.35

Gateway: 192.168.158.2, new gateway: 192.168.158.2 (unchanged)
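Because the gateway stays the same and also appears as `trusted_servers` in repmgr.conf, it can be worth confirming before the change that the new addresses sit on the gateway's network. The following is only a sketch: the helper name `same_subnet24` is made up for illustration, and it assumes the /24 prefix seen in the `ip a` output of this case.

```shell
#!/usr/bin/env bash
# Sketch: check that two IPv4 addresses share the same /24 network.
# Assumption: the prefix length is /24, as in the `ip a` output of this case.
same_subnet24() {
    # ${1%.*} strips the last dotted octet, leaving the first three.
    [ "${1%.*}" = "${2%.*}" ]
}

gateway=192.168.158.2
for ip in 192.168.158.34 192.168.158.35; do
    if same_subnet24 "$ip" "$gateway"; then
        echo "$ip is on the same /24 as gateway $gateway"
    else
        echo "$ip is NOT on the same /24 as gateway $gateway" >&2
    fi
done
```

For any other prefix length a plain string comparison like this is not enough, and the addresses would have to be compared as masked integers.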

I. Check the cluster's primary and standby status

[kingbase@localhost bin]$ ./sys_monitor.sh start
2025-12-17 18:22:24 execute to start DB on "[192.168.158.25]" success, connect to check it.
2025-12-17 18:22:25 DB on "[192.168.158.25]" start success.
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.158.24 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.158.25 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

Primary:

[root@localhost ~]# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:cf:e3:d4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.158.24/24 brd 192.168.158.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::d9f2:f13c:d39c:6eef/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

View the repmgr.conf file

[kingbase@localhost bin]$ cat ../etc/repmgr.conf
use_scmd=on
ha_running_mode='DG'
node_id=1
node_name='node1'
conninfo='host=192.168.158.24 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
connection_check_type='mix'

data_directory='/home/kingbase/cluster/kingbase/data'
log_file='/home/kingbase/cluster/kingbase/log/hamgr.log'
kbha_log_file='/home/kingbase/cluster/kingbase/log/kbha.log'
sys_bindir='/home/kingbase/cluster/kingbase/bin'
scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 8890 -o ServerAliveInterval=2 -o ServerAliveCountMax=3'

trusted_servers='192.168.158.2'
running_under_failure_trusted_servers='on'
repmgrd_pid_file='/home/kingbase/cluster/kingbase/etc/hamgrd.pid'
kbha_pid_file='/home/kingbase/cluster/kingbase/etc/kbha.pid'

failover='automatic'
synchronous='quorum'
recovery='standby'
auto_cluster_recovery_level='1'
monitoring_history='no'
reconnect_attempts=10
reconnect_interval=6

promote_command='/home/kingbase/cluster/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
ping_path='/bin'
use_check_disk='off'

View the kingbase.auto.conf file

[kingbase@localhost bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
synchronous_standby_names = 'ANY 1( node2)'

Standby:

[root@localhost ~]# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:9d:f7:73 brd ff:ff:ff:ff:ff:ff
    inet 192.168.158.25/24 brd 192.168.158.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::d9f2:f13c:d39c:6eef/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::1dc4:530f:98e:4ace/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

View the repmgr.conf file

[kingbase@localhost ~]$ cd /home/kingbase/cluster/kingbase/bin/
[kingbase@localhost bin]$ cat ../etc/repmgr.conf
use_scmd=on
ha_running_mode='DG'
node_id=2
node_name='node2'
conninfo='host=192.168.158.25 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
connection_check_type='mix'

data_directory='/home/kingbase/cluster/kingbase/data'
log_file='/home/kingbase/cluster/kingbase/log/hamgr.log'
kbha_log_file='/home/kingbase/cluster/kingbase/log/kbha.log'
sys_bindir='/home/kingbase/cluster/kingbase/bin'
scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 8890 -o ServerAliveInterval=2 -o ServerAliveCountMax=3'

trusted_servers='192.168.158.2'
running_under_failure_trusted_servers='on'
repmgrd_pid_file='/home/kingbase/cluster/kingbase/etc/hamgrd.pid'
kbha_pid_file='/home/kingbase/cluster/kingbase/etc/kbha.pid'

failover='automatic'
synchronous='quorum'
recovery='standby'
auto_cluster_recovery_level='1'
monitoring_history='no'
reconnect_attempts=10
reconnect_interval=6

promote_command='/home/kingbase/cluster/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
ping_path='/bin'
use_check_disk='off'

View the kingbase.auto.conf file

[kingbase@localhost bin]$ cat ../data/kingbase.auto.conf 
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'host=192.168.158.24 user=esrep port=54321 application_name=node2 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
primary_slot_name = 'repmgr_slot_2'
[kingbase@localhost bin]$ 
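Before changing anything, it can help to record which host each configuration file currently points to. The sketch below pulls the `host=` value out of a `conninfo` or `primary_conninfo` line. The function name `conn_host` is hypothetical, and the demo file simply combines shortened lines from the two standby files shown above.

```shell
#!/usr/bin/env bash
# Sketch: print the host= value of a given key (conninfo or primary_conninfo)
# from a repmgr.conf / kingbase.auto.conf style file.
# Assumption: host= is the first keyword inside the quoted value, as in this case.
conn_host() {
    # $1 = file, $2 = key name; the ^ anchor keeps "conninfo" from matching
    # "primary_conninfo".
    sed -n "s/^${2} *= *'host=\([^ ']*\).*/\1/p" "$1"
}

# Demo on a throwaway file (not a real cluster file).
demo=$(mktemp)
cat > "$demo" <<'EOF'
node_id=2
conninfo='host=192.168.158.25 user=esrep dbname=esrep port=54321 connect_timeout=10'
primary_conninfo = 'host=192.168.158.24 user=esrep port=54321 application_name=node2'
EOF
echo "conninfo host:         $(conn_host "$demo" conninfo)"
echo "primary_conninfo host: $(conn_host "$demo" primary_conninfo)"
rm -f "$demo"
```

On a real node the same function could be pointed at `../etc/repmgr.conf` and `../data/kingbase.auto.conf`.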

II. Change the physical IPs of the hosts

Stop the cluster and the database

# Stop the cluster and the database
[kingbase@localhost bin]$ ./sys_monitor.sh stop

Change the primary's physical IP

# Change the IP to 192.168.158.34
[root@localhost ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33 

Apply the new IP

[root@localhost ~]# ifdown ens33 && ifup ens33

[root@localhost ~]# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:cf:e3:d4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.158.34/24 brd 192.168.158.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::d9f2:f13c:d39c:6eef/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

Change the standby's physical IP

# Change the IP to 192.168.158.35
[root@localhost ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33 

Apply the new IP

[root@localhost ~]# ifdown ens33 && ifup ens33

[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:9d:f7:73 brd ff:ff:ff:ff:ff:ff
    inet 192.168.158.35/24 brd 192.168.158.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::d9f2:f13c:d39c:6eef/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::1dc4:530f:98e:4ace/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
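The manual `vim` edit of ifcfg-ens33 on each host could also be scripted. The following is a rough sketch only: the article does not show the file's contents, so it assumes a static configuration with a single `IPADDR=` line, and the function name `set_ifcfg_ip` is invented for illustration. The demo runs against a throwaway file rather than the real interface file.

```shell
#!/usr/bin/env bash
# Sketch: set the IPADDR= line of an ifcfg-style file to a new address.
# Assumption: the file holds a static config with one unquoted IPADDR= line.
set_ifcfg_ip() {
    local file="$1" new_ip="$2" tmp rc
    if ! grep -q '^IPADDR=' "$file"; then
        echo "no IPADDR= line in $file" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Write to a temp file first, then copy the content back so the original
    # file keeps its owner and permissions.
    sed "s/^IPADDR=.*/IPADDR=$new_ip/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Demo on a throwaway file.
demo=$(mktemp)
printf '%s\n' 'DEVICE=ens33' 'BOOTPROTO=static' 'IPADDR=192.168.158.24' 'GATEWAY=192.168.158.2' > "$demo"
set_ifcfg_ip "$demo" 192.168.158.34
grep '^IPADDR=' "$demo"
rm -f "$demo"
```

On the real hosts this would be run as root against /etc/sysconfig/network-scripts/ifcfg-ens33, followed by the same `ifdown ens33 && ifup ens33` used above.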

III. Update the repmgr.conf and kingbase.auto.conf configuration files

Primary: repmgr.conf file

# Change this IP to the new physical IP
conninfo='host=192.168.158.34 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'

Standby: repmgr.conf file

# Change this IP to the new physical IP
conninfo='host=192.168.158.35 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'

Standby: kingbase.auto.conf file

# Point this IP at the primary's new IP
primary_conninfo = 'host=192.168.158.34 user=esrep port=54321 application_name=node2 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
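The three edits above all amount to swapping one `host=` value for another, so they can be sketched as a single helper. The name `replace_host` is hypothetical. The pattern only matches when the old IP is followed by a space or a closing quote, so 192.168.158.24 does not accidentally match 192.168.158.240. Note that kingbase.auto.conf carries a "Do not edit this file manually!" header, so this mirrors the article's manual edit made while the database is stopped rather than a generally recommended practice.

```shell
#!/usr/bin/env bash
# Sketch: replace host=<old> with host=<new> in a config file.
replace_host() {
    local file="$1" old="$2" new="$3" old_re tmp rc
    # Escape the dots so "." in the old IP matches only a literal dot.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    # Require a space or a single quote after the IP to avoid prefix matches.
    sed "s/host=$old_re\([ ']\)/host=$new\1/g" "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Demo on a throwaway file holding shortened standby settings.
demo=$(mktemp)
cat > "$demo" <<'EOF'
conninfo='host=192.168.158.25 user=esrep dbname=esrep port=54321'
primary_conninfo = 'host=192.168.158.24 user=esrep port=54321 application_name=node2'
EOF
replace_host "$demo" 192.168.158.25 192.168.158.35
replace_host "$demo" 192.168.158.24 192.168.158.34
cat "$demo"
rm -f "$demo"
```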

IV. Start the database service on the primary and standby

[kingbase@localhost bin]$ ./sys_ctl -D /home/kingbase/cluster/kingbase/data start 
等待服务器进程启动 ....2025-12-17 18:38:43.873 CST [4322] LOG:  sepapower extension initialized
2025-12-17 18:38:43.880 CST [4322] LOG:  sysaudit extension initialized
2025-12-17 18:38:43.880 CST [4322] LOG:  starting KingbaseES V009R001C002B0014 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
2025-12-17 18:38:43.880 CST [4322] LOG:  listening on IPv4 address "0.0.0.0", port 54321
2025-12-17 18:38:43.880 CST [4322] LOG:  listening on IPv6 address "::", port 54321
2025-12-17 18:38:43.881 CST [4322] LOG:  listening on Unix socket "/tmp/.s.KINGBASE.54321"
2025-12-17 18:38:44.161 CST [4322] LOG:  redirecting log output to logging collector process
2025-12-17 18:38:44.161 CST [4322] HINT:  Future log output will appear in directory "../sys_log".
 完成
服务器进程已经启动

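Do not start the cluster at this point: the node information stored in the database's metadata tables no longer matches the new configuration, so start the database manually first.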

V. Register the primary with the cluster

1. Register the primary with the cluster

[kingbase@localhost bin]$ ./repmgr primary register -F
[INFO] connecting to primary database...
[INFO] "repmgr" extension is already installed
[NOTICE] primary node record (ID: 1) updated

2. Check the cluster node status

[kingbase@localhost bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+---------------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running     |          | default  | 100      | 1        |         | host=192.168.158.34 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby | ? unreachable | ? node1  | default  | 100      |          | ?       | host=192.168.158.25 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

[WARNING] following issues were detected
  - unable to connect to node "node2" (ID: 2)
  - node "node2" (ID: 2) is registered as an active standby but is unreachable

[HINT] execute with --verbose option to see connection error messages

VI. Register the standby with the cluster

1) Stop the database service

[kingbase@localhost bin]$ ./sys_ctl -D /home/kingbase/cluster/kingbase/data stop
等待服务器进程关闭 .... 完成
服务器进程已经关闭

2) Rejoin the standby node to the cluster

[kingbase@localhost bin]$ ./repmgr node rejoin -h 192.168.158.34 -U esrep -d esrep
[NOTICE] rejoin target is node "node1" (ID: 1)
[INFO] timelines are same, this server is not ahead
[DETAIL] local node lsn is 0/120007D8, rejoin target lsn is 0/120007D8
[INFO] creating replication slot as user "esrep"
[NOTICE] setting node 2's upstream to node 1
[WARNING] unable to ping "host=192.168.158.35 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000"
[DETAIL] KCIping() returned "KCIPING_NO_RESPONSE"
[NOTICE] begin to start server at 2025-12-17 18:46:35.167440
[NOTICE] starting server using "/home/kingbase/cluster/kingbase/bin/sys_ctl  -w -t 90 -D '/home/kingbase/cluster/kingbase/data' -l /home/kingbase/cluster/kingbase/bin/logfile start"
[NOTICE] start server finish at 2025-12-17 18:46:35.379881
[NOTICE] NODE REJOIN successful

3) Register the standby with the cluster

[kingbase@localhost bin]$ ./repmgr standby register -h 192.168.158.34 -U esrep -d esrep -F
[INFO] connecting to local node "node2" (ID: 2)
[WARNING] database connection parameters not required when the standby to be registered is running
[DETAIL] repmgr uses the "conninfo" parameter in "repmgr.conf" to connect to the standby
[INFO] connecting to primary database
[INFO] standby registration complete
[NOTICE] standby node "node2" (ID: 2) successfully registered

On the primary, check the cluster status and the streaming replication status

1) Check the cluster node status

[kingbase@localhost bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.158.34 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.158.35 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
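A check like the one above can be automated by reading the Status column of the `repmgr cluster show` table. The sketch below is one possible way to do it: the function name `all_nodes_running` is invented, it relies on the column layout shown in this case (Status as the fourth `|`-separated field), and the demo feeds it a shortened copy of the table instead of calling repmgr.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every node row in `repmgr cluster show` output
# (read from stdin) has the status "running".
all_nodes_running() {
    awk -F'|' '
        # Data rows have a purely numeric node ID in the first column;
        # headers, separators and warning lines are skipped.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            rows++
            status = $4
            gsub(/[* \t]/, "", status)   # drop padding and the "*" marker
            if (status != "running") bad++
        }
        END { if (rows > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Demo with a shortened copy of the table (connection strings omitted).
sample=' ID | Name  | Role    | Status    | Upstream
----+-------+---------+-----------+----------
 1  | node1 | primary | * running |
 2  | node2 | standby |   running | node1'
if printf '%s\n' "$sample" | all_nodes_running; then
    echo "all nodes running"
else
    echo "cluster not healthy"
fi
```

On a cluster node the same function could be used as `./repmgr cluster show | all_nodes_running`. A row such as `? unreachable` makes it return a non-zero status.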

2) Check the primary/standby streaming replication status

[kingbase@localhost bin]$ ./ksql test system
输入 "help" 来获取帮助信息.

test=# 
test=# 
test=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 4785 |    16385 | esrep   | node2            | 192.168.158.35 |                 |       36808 | 2025-12-17 18:46:35.338953+08 |              | streaming | 0/12000C40 | 0/12000C40 | 0/12000C40 | 0/12000C40 |           |           |            |             1 | quorum     | 2025-12-17 18:50:12.292499+08
(1 行记录)

test=# 

Start the repmgrd service on the primary and standby

[kingbase@localhost bin]$ ./repmgrd -d
[2025-12-17 18:50:46] [NOTICE] redirecting logging output to "/home/kingbase/cluster/kingbase/log/hamgr.log"
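Since `repmgrd -d` detaches immediately, one way to confirm that the daemon stayed up is to look at the PID file named by `repmgrd_pid_file` in repmgr.conf. The helper below is a sketch under the assumption that the file holds the daemon's PID on its first line. The name `pid_file_alive` is made up, and the demo uses the current shell's own PID in a temporary file.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the PID stored in a PID file belongs to a live process.
# Assumption: the first line of the file is the PID.
pid_file_alive() {
    local pid
    [ -r "$1" ] || return 1
    # Accept a file with or without a trailing newline.
    read -r pid < "$1" || [ -n "$pid" ] || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    # Signal 0 sends nothing; it only tests whether the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}

# Demo: the current shell is certainly alive.
demo=$(mktemp)
echo $$ > "$demo"
if pid_file_alive "$demo"; then
    echo "process $(cat "$demo") is alive"
fi
rm -f "$demo"
```

In this case the file to check would be /home/kingbase/cluster/kingbase/etc/hamgrd.pid, run as the kingbase user on each node.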

VII. Restart the cluster service to verify

1. Restart the cluster with sys_monitor.sh

[kingbase@localhost bin]$ ./sys_monitor.sh restart 

2. Check the cluster node status

[kingbase@localhost bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.158.34 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.158.35 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

3. Check the primary/standby streaming replication status

test=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 5597 |    16385 | esrep   | node2            | 192.168.158.35 |                 |       36830 | 2025-12-17 18:51:41.469861+08 |              | streaming | 0/130004D0 | 0/130004D0 | 0/130004D0 | 0/130004D0 |           |           |            |             1 | quorum     | 2025-12-17 18:54:44.170276+08
(1 行记录)

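VIII. Summary

Changing a cluster's IP requires stopping the cluster services (cluster and db), which interrupts normal business operation. IP addressing should therefore be planned before the cluster is deployed, to avoid the impact of changing it later.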

