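In a production system, node 3 of a three-node RAC database went down because of a failure, and CRS was then restarted on it. After the restart completed, the database listener had not come up, and attempts to start it reported a CRS-5005 error.
1. Problem description
Checking the listener shows that it supports no services: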
bash
ywdb03@/oracle/grid/crs>lsnrctl stat
LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.1.0 - Production on 11-APR-2014 18:52:56
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.1.0 - Production
Start Date 11-APR-2014 18:52:50
Uptime 0 days 0 hr. 0 min. 6 sec
Trace Level off
Security ON: Local OS Authentication
SNMP ON
Listener Parameter File /oracle/grid/product/network/admin/listener.ora
Listener Log File /home/grid/diag/tnslsnr/ywdb03/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
The listener supports no services
The command completed successfully
Starting the listener with srvctl fails every time with a CRS-5005 error:
bash
ywdb03@/oracle/grid/crs>srvctl start listener -n ywdb03
PRCR-1013 : Failed to start resource ora.LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.LISTENER.lsnr on node ywdb03
CRS-5005: IP Address: 192.168.10.6 is already in use in the network
CRS-2674: Start of 'ora.ywdb03.vip' on 'ywdb03' failed
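The address named in the CRS-5005 line is what has to be tracked down on the other nodes. As a minimal sketch, the helper below pulls that address out of the error text so it can be reused in later checks. The function name `crs5005_ip` is hypothetical, and the pattern assumes the message wording shown above.

```shell
# Hypothetical helper: extract the conflicting address from a CRS-5005 message.
# Reads command output on stdin and prints the IPv4 address, if the line is present.
# Assumes the wording "CRS-5005: IP Address: <addr> is already in use ...".
crs5005_ip() {
  sed -n 's/.*CRS-5005: IP Address: \([0-9.]*\) is already in use.*/\1/p'
}
```

It could be fed directly from the failing command, for example `srvctl start listener -n ywdb03 2>&1 | crs5005_ip`.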
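2. Resolution
We checked the other two database nodes to see whether a network interface on either of them was holding this address, and found it on node 2: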
bash
ywdb02@/tmp/ibmsupt#ifconfig -a
en8: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
inet 192.168.10.2 netmask 0xffffff80 broadcast 192.168.10.127
inet 192.168.10.7 netmask 0xffffff80 broadcast 192.168.10.127
inet 192.168.10.6 netmask 0xffffff80 broadcast 192.168.10.127
inet 192.168.10.5 netmask 0xffffff80 broadcast 192.168.10.127
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en9: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
inet 172.16.1.2 netmask 0xffffff00 broadcast 172.16.1.255
tcp_sendspace 65536 tcp_recvspace 65536 rfc1323 1
en0: flags=5e080822,c0<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
en1: flags=5e080822,c0<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
en4: flags=5e080822,c0<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
en5: flags=5e080822,c0<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
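Scanning a long `ifconfig -a` listing by eye is error-prone. The following is a rough sketch of a filter that reports which interface carries a given address. The function name `find_ip_owner` is hypothetical, and it assumes output in the layout shown above (an interface header such as `en8: flags=...` followed by `inet <address> ...` lines).

```shell
# Hypothetical helper: report which interface holds a given IPv4 address.
# Reads `ifconfig -a` style output on stdin; $1 is the address to look for.
find_ip_owner() {
  awk -v ip="$1" '
    # Interface header line, e.g. "en8: flags=...": remember the interface name.
    /^[a-z]+[0-9]+:/ { iface = $1; sub(/:$/, "", iface) }
    # Address line belonging to the most recent header: print the owner on a match.
    $1 == "inet" && $2 == ip { print iface }
  '
}
```

On node 2 this could be run as `ifconfig -a | find_ip_owner 192.168.10.6`; with the listing above it would print `en8`. No output means the address is not configured on that node.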
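The address 192.168.10.6 belongs to node 3's VIP resource (ora.ywdb03.vip, the resource that failed to start above). It had presumably failed over to node 2 while node 3 was down and was not released afterwards. Remove the address from the en8 interface on node 2: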
bash
ywdb02@/tmp/ibmsupt#ifconfig en8 delete 192.168.10.6
ywdb02@/tmp/ibmsupt#ifconfig -a
en8: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
inet 192.168.10.2 netmask 0xffffff80 broadcast 192.168.10.127
inet 192.168.10.7 netmask 0xffffff80 broadcast 192.168.10.127
inet 192.168.10.5 netmask 0xffffff80 broadcast 192.168.10.127
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en9: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
inet 172.16.1.2 netmask 0xffffff00 broadcast 172.16.1.255
tcp_sendspace 65536 tcp_recvspace 65536 rfc1323 1
en0: flags=5e080822,c0<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
en1: flags=5e080822,c0<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
en4: flags=5e080822,c0<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
en5: flags=5e080822,c0<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
Restart the listener on node 3; it now comes up normally:
bash
ywdb03@/oracle/grid/crs>srvctl stop listener -n ywdb03
ywdb03@/oracle/grid/crs>srvctl start listener -n ywdb03
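To confirm the fix rather than rely on the absence of errors, the `lsnrctl stat` output can be checked for registered services. This is only a sketch: the function name `listener_has_services` is hypothetical, and it assumes the usual `Service "<name>" has N instance(s).` lines that lsnrctl prints once instances have registered, which can take a short while after the listener starts.

```shell
# Hypothetical check: succeed only if `lsnrctl stat` output (on stdin)
# lists at least one registered service.
# A listener that prints "The listener supports no services" fails the check,
# and so does empty input (for example, when lsnrctl could not connect).
listener_has_services() {
  grep -q 'Service ".*" has [0-9][0-9]* instance(s)'
}
```

For example: `lsnrctl stat | listener_has_services && echo "listener OK"`.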
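3. Notes
Restarting the listener services on all nodes would also release the address and bring node 3's listener back. We could not use that approach here, because the other two database nodes were serving production traffic at the time.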