KingbaseES V8R6 Cluster Operations Case: Changing the Physical IP and VIP (V2.0)
Contents
- I. Check the status of the cluster's primary and standby nodes
- II. Change the physical IP of the hosts
- III. Update the repmgr.conf and kingbase.auto.conf configuration files
- IV. Start the database service on the primary and standby
- V. Register the primary with the cluster
- VI. Register the standby with the cluster
- VII. Restart the cluster service to verify
- VIII. Summary
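Case description:
In a KingbaseES V8R6 cluster, node IP addresses are stored in repmgr.conf and kingbase.auto.conf, so changing the cluster's physical IP or VIP means editing both files. The cluster service must be stopped while the addresses are changed; in a production environment, plan a downtime window beforehand so that application access is not disrupted unexpectedly.
Summary of steps:
1. Identify the primary and standby, then stop the cluster (both the cluster and db services).
2. Change the system IP and the IP entries in /etc/hosts.
3. Update the physical IP and VIP settings in the cluster configuration file repmgr.conf.
4. Restart the system network service to apply the new physical IP.
5. Start the database service on the primary and standby.
6. Register the primary with the cluster.
7. Stop the database service on the standby, rejoin the standby node to the cluster, and register the standby.
8. Check the cluster status (cluster and db) and start the repmgrd service on the primary and standby.
9. Restart the cluster with sys_monitor.sh to verify.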
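Steps 2 and 3 depend on finding every file that still mentions an old address. The sketch below shows one way to list such lines; `find_old_ip` is a hypothetical helper name, not a KingbaseES tool, and the file list in the usage note is only an assumption based on the files named in this case.

```shell
#!/bin/sh
# Hypothetical helper: print "file:line:text" for every line that still
# contains the given address as a whole word.
# -F treats the dots literally; -w stops 192.168.158.2 from matching
# inside 192.168.158.24.
find_old_ip() {
    old_ip=$1
    shift
    # /dev/null forces grep to print file names even for a single file.
    grep -nwF -- "$old_ip" "$@" /dev/null
}
```

For example, `find_old_ip 192.168.158.24 /etc/hosts ../etc/repmgr.conf ../data/kingbase.auto.conf` would list the remaining references on a node; an exit status of 1 means none were found.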
Cluster IP information:
Primary: 192.168.158.24, new primary IP: 192.168.158.34
Standby: 192.168.158.25, new standby IP: 192.168.158.35
Gateway: 192.168.158.2 (unchanged)
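Because the gateway stays the same, the new addresses need to remain in the gateway's network. The sketch below checks this for a /24 prefix, which is what the interface output in this case shows; `same_subnet24` is a hypothetical helper and only compares the first three octets.

```shell
#!/bin/sh
# Hypothetical helper: succeed when two IPv4 addresses share the first
# three octets, i.e. they are in the same /24 network.
same_subnet24() {
    # ${1%.*} strips the last dot and everything after it.
    [ "${1%.*}" = "${2%.*}" ]
}

# Planned addresses from this case.
GATEWAY=192.168.158.2
for ip in 192.168.158.34 192.168.158.35; do
    if same_subnet24 "$ip" "$GATEWAY"; then
        echo "$ip: same /24 as gateway $GATEWAY"
    else
        echo "$ip: NOT in the gateway's /24" >&2
    fi
done
```

A different prefix length would need a real netmask calculation; this shortcut is only valid for /24.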
I. Check the status of the cluster's primary and standby nodes
[kingbase@localhost bin]$ ./sys_monitor.sh start
2025-12-17 18:22:24 execute to start DB on "[192.168.158.25]" success, connect to check it.
2025-12-17 18:22:25 DB on "[192.168.158.25]" start success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.158.24 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.158.25 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
Primary:
[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:cf:e3:d4 brd ff:ff:ff:ff:ff:ff
inet 192.168.158.24/24 brd 192.168.158.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::d9f2:f13c:d39c:6eef/64 scope link noprefixroute
valid_lft forever preferred_lft forever
View the repmgr.conf file
[kingbase@localhost bin]$ cat ../etc/repmgr.conf
use_scmd=on
ha_running_mode='DG'
node_id=1
node_name='node1'
conninfo='host=192.168.158.24 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
connection_check_type='mix'
data_directory='/home/kingbase/cluster/kingbase/data'
log_file='/home/kingbase/cluster/kingbase/log/hamgr.log'
kbha_log_file='/home/kingbase/cluster/kingbase/log/kbha.log'
sys_bindir='/home/kingbase/cluster/kingbase/bin'
scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 8890 -o ServerAliveInterval=2 -o ServerAliveCountMax=3'
trusted_servers='192.168.158.2'
running_under_failure_trusted_servers='on'
repmgrd_pid_file='/home/kingbase/cluster/kingbase/etc/hamgrd.pid'
kbha_pid_file='/home/kingbase/cluster/kingbase/etc/kbha.pid'
failover='automatic'
synchronous='quorum'
recovery='standby'
auto_cluster_recovery_level='1'
monitoring_history='no'
reconnect_attempts=10
reconnect_interval=6
promote_command='/home/kingbase/cluster/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
ping_path='/bin'
use_check_disk='off'
View the kingbase.auto.conf file
[kingbase@localhost bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
synchronous_standby_names = 'ANY 1( node2)'
Standby:
[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:9d:f7:73 brd ff:ff:ff:ff:ff:ff
inet 192.168.158.25/24 brd 192.168.158.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::d9f2:f13c:d39c:6eef/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::1dc4:530f:98e:4ace/64 scope link noprefixroute
valid_lft forever preferred_lft forever
View the repmgr.conf file
[kingbase@localhost ~]$ cd /home/kingbase/cluster/kingbase/bin/
[kingbase@localhost bin]$ cat ../etc/repmgr.conf
use_scmd=on
ha_running_mode='DG'
node_id=2
node_name='node2'
conninfo='host=192.168.158.25 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
connection_check_type='mix'
data_directory='/home/kingbase/cluster/kingbase/data'
log_file='/home/kingbase/cluster/kingbase/log/hamgr.log'
kbha_log_file='/home/kingbase/cluster/kingbase/log/kbha.log'
sys_bindir='/home/kingbase/cluster/kingbase/bin'
scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 8890 -o ServerAliveInterval=2 -o ServerAliveCountMax=3'
trusted_servers='192.168.158.2'
running_under_failure_trusted_servers='on'
repmgrd_pid_file='/home/kingbase/cluster/kingbase/etc/hamgrd.pid'
kbha_pid_file='/home/kingbase/cluster/kingbase/etc/kbha.pid'
failover='automatic'
synchronous='quorum'
recovery='standby'
auto_cluster_recovery_level='1'
monitoring_history='no'
reconnect_attempts=10
reconnect_interval=6
promote_command='/home/kingbase/cluster/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
ping_path='/bin'
use_check_disk='off'
View the kingbase.auto.conf file
[kingbase@localhost bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'host=192.168.158.24 user=esrep port=54321 application_name=node2 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
primary_slot_name = 'repmgr_slot_2'
[kingbase@localhost bin]$
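The listings above show that the node addresses live in the `host=` part of `conninfo` (repmgr.conf) and `primary_conninfo` (kingbase.auto.conf). As a quick inventory before changing anything, a sketch like the following could print just those hosts; `list_conninfo_hosts` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: for each readable file given, print "file: host"
# for every conninfo or primary_conninfo line that carries a host= value.
list_conninfo_hosts() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        sed -n -E "s/^(primary_)?conninfo[[:space:]]*=.*host=([^ ']+).*/\2/p" "$f" |
            while read -r host; do
                printf '%s: %s\n' "$f" "$host"
            done
    done
}
```

Run from the bin directory used in this case, `list_conninfo_hosts ../etc/repmgr.conf ../data/kingbase.auto.conf` would report the address each node currently advertises and, on the standby, the primary it follows.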
II. Change the physical IP of the hosts
Primary
Stop the cluster and the database
# Stop the cluster and the database
[kingbase@localhost bin]$ ./sys_monitor.sh stop
Change the physical IP of the primary
# Change the IP to 192.168.158.34
[root@localhost ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
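The edit here is made interactively in vim. If the change needs to be repeatable, the same edit could be scripted. The sketch below assumes the ifcfg file already contains a static `IPADDR=...` line (the original does not show the file's contents), and `set_ifcfg_ipaddr` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: set IPADDR in an ifcfg-style file and keep the
# previous version as "<file>.bak".
# Assumes a static configuration with an existing "IPADDR=" line.
set_ifcfg_ipaddr() {
    file=$1
    new_ip=$2
    cp -p "$file" "$file.bak" || return 1
    # Rewrite from the backup so the original file keeps its permissions.
    sed "s/^IPADDR=.*/IPADDR=$new_ip/" "$file.bak" > "$file"
}
```

On the primary this would be called as root with `set_ifcfg_ipaddr /etc/sysconfig/network-scripts/ifcfg-ens33 192.168.158.34`, and on the standby with 192.168.158.35.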
Apply the new IP
[root@localhost ~]# ifdown ens33 && ifup ens33
[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:cf:e3:d4 brd ff:ff:ff:ff:ff:ff
inet 192.168.158.34/24 brd 192.168.158.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::d9f2:f13c:d39c:6eef/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Change the physical IP of the standby
# Change the IP to 192.168.158.35
[root@localhost ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
Apply the new IP
[root@localhost ~]# ifdown ens33 && ifup ens33
[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:9d:f7:73 brd ff:ff:ff:ff:ff:ff
inet 192.168.158.35/24 brd 192.168.158.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::d9f2:f13c:d39c:6eef/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::1dc4:530f:98e:4ace/64 scope link noprefixroute
valid_lft forever preferred_lft forever
III. Update the repmgr.conf and kingbase.auto.conf configuration files
Primary: repmgr.conf
# Change this IP to the new physical IP
conninfo='host=192.168.158.34 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
Standby: repmgr.conf
# Change this IP to the new physical IP
conninfo='host=192.168.158.35 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
Standby: kingbase.auto.conf
# Point this IP at the primary's new IP
primary_conninfo = 'host=192.168.158.34 user=esrep port=54321 application_name=node2 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
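The three edits in this section all replace the address after `host=`. A scripted version might look like the sketch below; `replace_ip` is a hypothetical helper, and it deliberately touches only `host=<old address>` so that other settings such as `trusted_servers` are left alone. It assumes the address is followed by at least one more character on the line, as it is in the conninfo strings shown here.

```shell
#!/bin/sh
# Hypothetical helper: replace host=<old_ip> with host=<new_ip> in a
# config file, keeping the previous version as "<file>.bak".
replace_ip() {
    file=$1
    old_ip=$2
    new_ip=$3
    # Escape the dots so that "." only matches a literal dot.
    old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    cp -p "$file" "$file.bak" || return 1
    # Require a non-digit after the address so 192.168.158.2 cannot
    # match the front of 192.168.158.24.
    sed "s/host=$old_re\([^0-9]\)/host=$new_ip\1/g" "$file.bak" > "$file"
}
```

On the standby, for instance, `replace_ip ../etc/repmgr.conf 192.168.158.25 192.168.158.35` followed by `replace_ip ../data/kingbase.auto.conf 192.168.158.24 192.168.158.34` would reproduce the edits above. As in the original procedure, kingbase.auto.conf is only edited while the database is stopped.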
IV. Start the database service on the primary and standby
[kingbase@localhost bin]$ ./sys_ctl -D /home/kingbase/cluster/kingbase/data start
waiting for server to start ....2025-12-17 18:38:43.873 CST [4322] LOG: sepapower extension initialized
2025-12-17 18:38:43.880 CST [4322] LOG: sysaudit extension initialized
2025-12-17 18:38:43.880 CST [4322] LOG: starting KingbaseES V009R001C002B0014 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
2025-12-17 18:38:43.880 CST [4322] LOG: listening on IPv4 address "0.0.0.0", port 54321
2025-12-17 18:38:43.880 CST [4322] LOG: listening on IPv6 address "::", port 54321
2025-12-17 18:38:43.881 CST [4322] LOG: listening on Unix socket "/tmp/.s.KINGBASE.54321"
2025-12-17 18:38:44.161 CST [4322] LOG: redirecting log output to logging collector process
2025-12-17 18:38:44.161 CST [4322] HINT: Future log output will appear in directory "../sys_log".
done
server started
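Do not start the cluster at this point. The node information stored in the database's metadata tables no longer matches the current configuration, so start only the database manually first.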
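Since the next step connects to the database that was just started by hand, a script may want to wait until it is actually up. The sketch below is a generic retry loop; `retry` is a hypothetical helper, and the `sys_ctl ... status` call in the usage note is an assumption that sys_ctl offers a status subcommand.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempt
# budget is used up.
# Usage: retry ATTEMPTS DELAY_SECONDS CMD [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        # Sleep only if another attempt will follow.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

Under that assumption, `retry 10 2 ./sys_ctl -D /home/kingbase/cluster/kingbase/data status` would poll for up to roughly 20 seconds before giving up.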
V. Register the primary with the cluster
1. Register the primary with the cluster
[kingbase@localhost bin]$ ./repmgr primary register -F
[INFO] connecting to primary database...
[INFO] "repmgr" extension is already installed
[NOTICE] primary node record (ID: 1) updated
2. Check the cluster node status
[kingbase@localhost bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.158.34 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | ? unreachable | ? node1 | default | 100 | | ? | host=192.168.158.25 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[WARNING] following issues were detected
- unable to connect to node "node2" (ID: 2)
- node "node2" (ID: 2) is registered as an active standby but is unreachable
[HINT] execute with --verbose option to see connection error messages
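At this stage node2 is expected to be unreachable, because its record still carries the old address 192.168.158.25. When scripting the procedure, the table could be checked mechanically. The sketch below parses `repmgr cluster show` output from standard input; `not_running_nodes` is a hypothetical helper and relies only on the column layout visible in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read "repmgr cluster show" output on stdin and
# print the name of every node whose Status column is not "running".
# Exit status is 0 only when every node is running.
not_running_nodes() {
    awk -F'[|]' '
        # Data rows start with a numeric node ID in the first column.
        $1 ~ /^[[:space:]]*[0-9]+[[:space:]]*$/ {
            name = $2
            status = $4
            gsub(/[[:space:]]/, "", name)
            # Drop padding and the leading "*" or "?" marker.
            gsub(/^[[:space:]*?]+|[[:space:]]+$/, "", status)
            if (status != "running") { print name; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

With the output shown here, `./repmgr cluster show | not_running_nodes` would print `node2` and return a non-zero status; after the standby has been rejoined and registered it should print nothing.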
VI. Register the standby with the cluster
1) Stop the database service
[kingbase@localhost bin]$ ./sys_ctl -D /home/kingbase/cluster/kingbase/data stop
waiting for server to shut down .... done
server stopped
2) Rejoin the standby node to the cluster
[kingbase@localhost bin]$ ./repmgr node rejoin -h 192.168.158.34 -U esrep -d esrep
[NOTICE] rejoin target is node "node1" (ID: 1)
[INFO] timelines are same, this server is not ahead
[DETAIL] local node lsn is 0/120007D8, rejoin target lsn is 0/120007D8
[INFO] creating replication slot as user "esrep"
[NOTICE] setting node 2's upstream to node 1
[WARNING] unable to ping "host=192.168.158.35 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000"
[DETAIL] KCIping() returned "KCIPING_NO_RESPONSE"
[NOTICE] begin to start server at 2025-12-17 18:46:35.167440
[NOTICE] starting server using "/home/kingbase/cluster/kingbase/bin/sys_ctl -w -t 90 -D '/home/kingbase/cluster/kingbase/data' -l /home/kingbase/cluster/kingbase/bin/logfile start"
[NOTICE] start server finish at 2025-12-17 18:46:35.379881
[NOTICE] NODE REJOIN successful
3) Register the standby with the cluster
[kingbase@localhost bin]$ ./repmgr standby register -h 192.168.158.34 -U esrep -d esrep -F
[INFO] connecting to local node "node2" (ID: 2)
[WARNING] database connection parameters not required when the standby to be registered is running
[DETAIL] repmgr uses the "conninfo" parameter in "repmgr.conf" to connect to the standby
[INFO] connecting to primary database
[INFO] standby registration complete
[NOTICE] standby node "node2" (ID: 2) successfully registered
On the primary, check the cluster status and the streaming replication status
1) Check the cluster node status
[kingbase@localhost bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.158.34 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.158.35 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2) Check the streaming replication status
[kingbase@localhost bin]$ ./ksql test system
Type "help" for help.
test=#
test=#
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
4785 | 16385 | esrep | node2 | 192.168.158.35 | | 36808 | 2025-12-17 18:46:35.338953+08 | | streaming | 0/12000C40 | 0/12000C40 | 0/12000C40 | 0/12000C40 | | | | 1 | quorum | 2025-12-17 18:50:12.292499+08
(1 row)
test=#
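The full `select *` output is wide, and only a few columns matter for this check: the standby's `application_name`, its `state`, and its `sync_state`. The sketch below tests those from pipe-separated rows; `is_streaming` is a hypothetical helper, and the ksql flags in the usage note are an assumption that ksql accepts the same `-A`, `-t` and `-c` options as psql.

```shell
#!/bin/sh
# Hypothetical helper: read rows of the form
#   application_name|state|sync_state
# on stdin and succeed only if the named standby is in state "streaming".
is_streaming() {
    awk -F'[|]' -v node="$1" '
        $1 == node && $2 == "streaming" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Under that assumption, `./ksql -At -c "select application_name, state, sync_state from sys_stat_replication" test system | is_streaming node2` would succeed for the row shown above.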
Start the repmgrd service on the primary and standby
[kingbase@localhost bin]$ ./repmgrd -d
[2025-12-17 18:50:46] [NOTICE] redirecting logging output to "/home/kingbase/cluster/kingbase/log/hamgr.log"
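Because `repmgrd -d` detaches immediately, it can be worth confirming on each node that the daemon is still alive. The repmgr.conf shown earlier sets `repmgrd_pid_file`, so one option is to test the PID recorded there. This is a sketch: `pid_file_alive` is a hypothetical helper, and it assumes the pid file holds the PID on its first line.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the PID stored on the first line of a
# pid file belongs to a live process that the current user may signal.
pid_file_alive() {
    [ -r "$1" ] || return 1
    pid=$(head -n 1 "$1" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    # Signal 0 checks for existence without affecting the process.
    kill -0 "$pid" 2>/dev/null
}
```

With the paths from this case, `pid_file_alive /home/kingbase/cluster/kingbase/etc/hamgrd.pid && echo "repmgrd is running"` could be run as the kingbase user on both nodes.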
VII. Restart the cluster service to verify
1. Restart the cluster with sys_monitor.sh
[kingbase@localhost bin]$ ./sys_monitor.sh restart
2. Check the cluster node status
[kingbase@localhost bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.158.34 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.158.35 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3. Check the streaming replication status
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
5597 | 16385 | esrep | node2 | 192.168.158.35 | | 36830 | 2025-12-17 18:51:41.469861+08 | | streaming | 0/130004D0 | 0/130004D0 | 0/130004D0 | 0/130004D0 | | | | 1 | quorum | 2025-12-17 18:54:44.170276+08
(1 row)
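VIII. Summary
Changing the cluster IP requires stopping the cluster services (cluster and db), which interrupts normal business operation. Plan the IP addressing before deploying the cluster so that later changes do not disrupt the business.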