Today, while troubleshooting a backup issue together with the Veeam vendor, I noticed something odd when checking the RAC node IPs: node 1 showed only a single IP. What is going on? Follow along as we track it down and fix it.
Node 1: only the public IP is present; the VIP and SCAN IP are missing.
[root@racDB01 ~]# ip ad
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:8a:1e:48 brd ff:ff:ff:ff:ff:ff
    inet 10.247.41.81/24 brd 10.247.41.255 scope global ens192
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8a:1e48/64 scope link
       valid_lft forever preferred_lft forever
3: ens256: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:8a:fe:da brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.101/24 brd 192.168.1.255 scope global ens256
       valid_lft forever preferred_lft forever
    inet 169.254.24.84/19 brd 169.254.31.255 scope global ens256:1
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8a:feda/64 scope link
       valid_lft forever preferred_lft forever
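Before going further, it is worth asking the clusterware what it thinks it is running. A quick hedged check, run as the grid user (the grep pattern just narrows the output to the VIP resources):

crsctl stat res -t | grep -B1 -A2 -i vip    # all VIP resources and their hosting nodes
srvctl config scan                          # the SCAN name and IPs registered with the cluster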
Check node 2: it has the public IP and the SCAN IP, but no VIP either.
[oracle@racDB02 ~]$ ip ad
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:8a:36:ce brd ff:ff:ff:ff:ff:ff
    inet 10.247.41.82/24 brd 10.247.41.255 scope global ens192
       valid_lft forever preferred_lft forever
    inet 10.247.41.85/24 brd 10.247.41.255 scope global secondary ens192:1
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8a:36ce/64 scope link
       valid_lft forever preferred_lft forever
3: ens256: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:8a:12:00 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.102/24 brd 192.168.1.255 scope global ens256
       valid_lft forever preferred_lft forever
    inet 169.254.3.36/19 brd 169.254.31.255 scope global ens256:1
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8a:1200/64 scope link
       valid_lft forever preferred_lft forever
Checking the session connections confirms the load is unbalanced: most sessions are running on node 2, where the SCAN IP lives.
[oracle@racDB01 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Oct 12 15:45:12 2024
Version 19.20.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.20.0.0.0

SYS@racdb1> select inst_id, count(1) from gv$session group by inst_id order by 1;

   INST_ID   COUNT(1)
---------- ----------
         1        128
         2        391
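To break the imbalance down further, the same query can be grouped by service as well; a minimal sketch using only standard gv$session columns (TYPE = 'USER' filters out background processes):

-- Session distribution per instance and service; only application
-- connections are counted, so SCAN-routed traffic stands out
select inst_id, service_name, count(*) as sessions
  from gv$session
 where type = 'USER'
 group by inst_id, service_name
 order by 1, 2;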
Check VIP connectivity: neither VIP responds to ping.
[oracle@racDB01 ~]$ ping 10.247.41.83
PING 10.247.41.83 (10.247.41.83) 56(84) bytes of data.
From 10.247.41.81 icmp_seq=1 Destination Host Unreachable
From 10.247.41.81 icmp_seq=2 Destination Host Unreachable
From 10.247.41.81 icmp_seq=3 Destination Host Unreachable
From 10.247.41.81 icmp_seq=4 Destination Host Unreachable
^C
--- 10.247.41.83 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 2999ms
pipe 4
[oracle@racDB01 ~]$ ping 10.247.41.84
PING 10.247.41.84 (10.247.41.84) 56(84) bytes of data.
From 10.247.41.81 icmp_seq=1 Destination Host Unreachable
From 10.247.41.81 icmp_seq=2 Destination Host Unreachable
From 10.247.41.81 icmp_seq=3 Destination Host Unreachable
From 10.247.41.81 icmp_seq=4 Destination Host Unreachable
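Since both VIPs are dead on the wire, it is worth ruling out a stale neighbor entry or another host squatting on the address before blaming the cluster. A minimal sketch, assuming iputils arping is installed and ens192 is the public interface:

ip neigh | grep 10.247.41.83          # any cached ARP entry for the VIP?
arping -c 3 -I ens192 10.247.41.83    # replies here would mean another host owns the IP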
Check the VIP status in the clusterware: the VIP resources all show as ONLINE and stable.
ora.scan1.vip
      1        ONLINE  ONLINE       racdb02                  STABLE
ora.racdb01.vip
      1        ONLINE  ONLINE       racdb01                  STABLE
ora.racdb02.vip
      1        ONLINE  ONLINE       racdb02                  STABLE
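srvctl gives the same per-node view; a hedged cross-check using the node names from above:

srvctl status vip -n racdb01    # reports which node the VIP is running on, if any
srvctl status vip -n racdb02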
Next, check the VIP configuration itself: the VIP turns out to be configured with the public IP address.
Node 2 has the same problem.
[oracle@racDB01 ~]$ srvctl config vip -n racdb01
VIP exists: network number 1, hosting node racdb01
VIP Name: racdb01
VIP IPv4 Address: 10.247.41.81
VIP IPv6 Address:
VIP is enabled.

## IP settings in /etc/hosts
[oracle@racDB01 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# public IP
10.247.41.81 racDB01
10.247.41.82 racDB02
# VIP
10.247.41.83 racDB01-vip
10.247.41.84 racDB02-vip
# private IP
192.168.1.101 racDB01-priv
192.168.1.102 racDB02-priv
# scan IP
10.247.41.85 racf-scan
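Comparing the VIP each node should have (per /etc/hosts) with what the clusterware is actually configured to use makes the mismatch obvious. A small hedged loop over both nodes, assuming the -vip naming convention above:

for n in racdb01 racdb02; do
  echo "== $n =="
  grep -i "${n}-vip" /etc/hosts                      # expected VIP from the hosts file
  srvctl config vip -n "$n" | grep "IPv4 Address"    # VIP the clusterware is using
done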
Solution
Although this state is abnormal, it has not affected the front-end application so far. It still needs to be fixed, though; the solution is as follows:
Handle one node at a time and the application will not be affected.
1. Stop the listener
su - grid
[grid@RACDB01 ~]$ lsnrctl stop

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 12-OCT-2024 15:53:32

Copyright (c) 1991, 2023, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
The command completed successfully
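In a Grid-managed setup the listener can just as well be stopped through srvctl, which keeps the clusterware's resource state in sync; a hedged alternative to lsnrctl:

srvctl stop listener -n racdb01
srvctl status listener -n racdb01    # should now report the listener is not running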
2. Stop the VIP
srvctl stop vip -n racdb01
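A quick check that the VIP resource really went offline before touching its configuration:

srvctl status vip -n racdb01    # expect it to report the VIP is not running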
3. Modify the VIP back to its original, correct IP address (srvctl modify nodeapps typically has to be run as root while the VIP is stopped)
srvctl modify nodeapps -n racdb01 -A 10.247.41.83/255.255.255.0/ens192
4. Start the VIP and verify
srvctl start vip -n racdb01
srvctl config vip -n racdb01

[root@RACDB01 ~]# srvctl config vip -n racdb01
VIP exists: network number 1, hosting node racdb01
VIP IPv4 Address: 10.247.41.83
VIP IPv6 Address:
VIP is enabled.
VIP is individually enabled on nodes:
VIP is individually disabled on nodes:
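The earlier connectivity test can now be repeated to confirm the VIP is actually plumbed and answering; a quick re-check:

ping -c 3 10.247.41.83                     # should now get replies
ip addr show ens192 | grep 10.247.41.83    # the VIP should appear as a secondary address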
5. Start the listener and verify
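No output was captured for this step; as a hedged sketch, run as grid on node 1 it would look like:

lsnrctl start
lsnrctl status    # the listening endpoints should now include 10.247.41.83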
6. Repeat the same steps on node 2
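For completeness, a hedged sketch of the node-2 pass, mirroring steps 1-5 (the address comes from /etc/hosts above; run the modify as root):

srvctl stop listener -n racdb02
srvctl stop vip -n racdb02
srvctl modify nodeapps -n racdb02 -A 10.247.41.84/255.255.255.0/ens192
srvctl start vip -n racdb02
srvctl start listener -n racdb02
srvctl config vip -n racdb02    # verify the IPv4 Address is now 10.247.41.84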
But why did the VIP end up running on the public IP in the first place? That part is odd; I combed through the logs and could not find the cause.
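Two places I would still check for the root cause, as a hedged sketch (the alert-log path assumes the 12.2+ ADR layout under $ORACLE_BASE):

crsctl stat res ora.racdb01.vip -p | grep -i usr_ora_vip                # the address the VIP agent was told to plumb
grep -i vip $ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace/alert.log    # traces of when the nodeapps config changed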