Background
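In a 19c RAC test environment running on virtual machines, applying an RU took an acceptable amount of time on one node but a very long time on the other. (Most of that time was spent stopping and starting the clusterware.)
Even in day-to-day use, starting and stopping the RAC was slow, and stopping it was especially slow.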
-- Applying 19.19: node 1 took 78 minutes, node 2 took 50 minutes
OPatchauto session completed at Tue Jun 6 16:59:30 2023
Time taken to complete the session 78 minutes, 26 seconds
[root@node19c01 grid]#
OPatchauto session completed at Tue Jun 6 15:38:29 2023
Time taken to complete the session 50 minutes, 11 seconds
[root@node19c02 psu]#
-- Applying 19.20: node 2 took 72 minutes
OPatchauto session completed at Thu Aug 17 10:15:58 2023
Time taken to complete the session 72 minutes, 13 seconds
[root@node19c02 bin]#
-- Applying 19.21 (patch 35642822): node 1 took 43 minutes, node 2 took 223 minutes
OPatchauto session completed at Sat Nov 4 17:47:11 2023
Time taken to complete the session 43 minutes, 41 seconds
[root@node19c01 ~]#
OPatchauto session completed at Sat Nov 4 21:34:26 2023
Time taken to complete the session 223 minutes, 51 seconds
[root@node19c02 35642822]#
-- Watching the patching output showed that these two steps take most of the time, especially the "bring down" step that stops the cluster:
Preparing to bring down database service on home /u01/app/oracle/product/19.0.0/db_1
Performing postpatch operations on CRS - starting CRS service on home /u01/app/19.0.0/grid
-- Starting the cluster manually is relatively slow but acceptable; once it hung at CRS-4537: Cluster Ready Services is online
-- Stopping the cluster manually is very slow; it hung at CRS-2677: Stop of 'ora.chad' on 'node19c02' succeeded
-- The most recent long stop hung at CRS-2677: Stop of 'ora.node19c02.vip' on 'node19c02' succeeded
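-- The GI alert log contains many messages about ora.asm; GI took more than 2 hours to shut down. There are repeated aborted 'check' commands for ora.asm, and messages about name lookups against the DNS server 192.168.2.1: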
2020-05-01 16:04:14.508 [ORAAGENT(10234)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:10:10} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2020-05-01 16:04:45.505 [ORAAGENT(10234)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:1:17} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2020-05-01 16:05:35.642 [ORAAGENT(10234)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:10:12} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2020-05-01 16:17:54.224 [ORAAGENT(10234)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:10:32} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2020-05-01 16:18:12.596 [ORAAGENT(17237)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 17237
2020-05-01 16:18:25.224 [ORAAGENT(10234)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:1:11} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2020-05-01 16:18:25.632 [OHASD(1354)]CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node19c02'
2020-05-01 16:19:15.368 [ORAAGENT(10234)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:10:34} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2023-06-06 10:56:30.111 [ORAAGENT(9379)]CRS-5818: Aborted command 'res_attr_modified' for resource 'ora.ons'. Details at (:CRSAGF00113:) {0:6:2} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2023-06-06 10:57:30.737 [ORAAGENT(10450)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 10450
2023-06-06 10:58:01.065 [ORAAGENT(10450)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:7:2} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2023-06-06 11:07:48.782 [ORAAGENT(11765)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 11765
2023-06-06 11:11:47.098 [OHASD(1310)]CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node19c02'
2023-06-06 11:12:20.649 [ORAROOTAGENT(2214)]CRS-5822: Agent '/u01/app/19.0.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:70} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_orarootagent_root.trc.
2023-06-06 11:12:20.656 [ORAAGENT(10450)]CRS-5822: Agent '/u01/app/19.0.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:7:9} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2023-06-06 11:12:37.449 [OCTSSD(2043)]CRS-2405: The Cluster Time Synchronization Service on host node19c02 is shutdown by user
2023-06-06 11:12:37.450 [OCTSSD(2043)]CRS-8504: Oracle Clusterware OCTSSD process with operating system process ID 2043 is exiting
2023-06-06 11:12:38.572 [OCSSD(1712)]CRS-1603: CSSD on node node19c02 has been shut down.
2023-06-06 11:12:41.601 [GPNPD(1551)]CRS-2329: GPNPD on node node19c02 shut down.
2023-06-06 11:13:03.869 [OHASD(1310)]CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node19c02' has completed
2023-06-06 11:13:03.892 [ORAROOTAGENT(1411)]CRS-5822: Agent '/u01/app/19.0.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:4:15} in /u01/app/grid/diag/crs/node19c02/crs/trace/ohasd_orarootagent_root.trc.
2023-06-06 16:15:22.172 [CVUD(1751)]CRS-10051: CVU found following errors with Clusterware setup : PRVG-10048 : Name "node19c01" was not resolved to an address of the specified type by name servers "192.168.2.1".
2023-06-06 23:29:08.319 [CVUD(69952)]CRS-10051: CVU found following errors with Clusterware setup : PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
PRVG-11372 : Number of SCAN IP addresses that SCAN "scan19c" resolved to did not match the number of SCAN VIP resources
PRVG-1101 : SCAN name "scan19c" failed to resolve
2023-08-16 13:16:20.626 [ORAAGENT(15597)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {2:57504:1091} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2023-08-16 13:17:00.706 [ORAAGENT(15597)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:9:2} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2023-08-16 13:17:40.731 [ORAAGENT(15597)]CRS-5818: Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:9:2} in /u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc.
2023-08-16 13:18:06.887 [CVUD(3151)]CRS-10051: CVU found following errors with Clusterware setup : Refer to My Oracle Support notes "1357657.1" for more details regarding errors "PRVG-11067".
Refer to My Oracle Support notes "1357657.1" for more details regarding errors "PRVG-11067".
PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
Refer to My Oracle Support notes "1357657.1" for more details regarding errors "PRVG-11067".
PRVG-11372 : Number of SCAN IP addresses that SCAN "scan19c" resolved to did not match the number of SCAN VIP resources
PRVG-1101 : SCAN name "scan19c" failed to resolve
2023-08-17 08:08:02.411 [CRSD(3019)]CRS-2771: Maximum restart attempts reached for resource 'ora.node19c02.vip'; will not restart.
2023-08-17 08:08:02.737 [ORAAGENT(15554)]CRS-5016: Process "/u01/app/19.0.0/grid/bin/lsnrctl" spawned by agent "ORAAGENT" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/grid/diag/crs/node19c02/crs/trace/crsd_oraagent_grid.trc"
2023-08-17 08:08:03.537 [GIPCD(1795)]CRS-42216: No interfaces are configured on the local node for interface definition ens34(:.*)?:10.10.10.0: available interface definitions are [ens33(:.*)?:192.168.2.0][ens34:1(:.*)?:169.254.0.0][ens33(:.*)?:[fe80:0:0:0:0:0:0:0]][ens34(:.*)?:[fe80:0:0:0:0:0:0:0]].
2023-08-17 08:08:30.185 [GIPCD(1795)]CRS-42216: No interfaces are configured on the local node for interface definition ens34(:.*)?:10.10.10.0: available interface definitions are [ens33(:.*)?:192.168.2.0][ens34:1(:.*)?:169.254.0.0][ens33(:.*)?:[fe80:0:0:0:0:0:0:0]][ens34(:.*)?:[fe80:0:0:0:0:0:0:0]].
2023-08-17 08:08:31.144 [OCSSD(1891)]CRS-1609: This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/grid/diag/crs/node19c02/crs/trace/ocssd.trc.
2023-08-17 08:08:31.186 [OCSSD(1891)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/node19c02/crs/trace/ocssd.trc
2023-08-17 08:08:35.699 [GIPCD(1795)]CRS-42216: No interfaces are configured on the local node for interface definition ens34(:.*)?:10.10.10.0: available interface definitions are [ens33(:.*)?:192.168.2.0][ens34:1(:.*)?:169.254.0.0][ens33(:.*)?:[fe80:0:0:0:0:0:0:0]][ens34(:.*)?:[fe80:0:0:0:0:0:0:0]].
2023-08-17 08:08:36.383 [ORAAGENT(1666)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/grid/diag/crs/node19c02/crs/trace/ohasd_oraagent_grid.trc"
2023-11-04 11:10:07.792 [OHASD(1553)]CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node19c02'
2023-11-04 11:10:27.634 [OCSSD(2111)]CRS-1625: Node node19c01, number 1, was shut down
2023-11-04 13:33:01.295 [OCTSSD(2517)]CRS-8504: Oracle Clusterware OCTSSD process with operating system process ID 2517 is exiting
2023-11-04 13:33:02.427 [OCSSD(2111)]CRS-1603: CSSD on node node19c02 has been shut down.
2023-11-04 13:33:04.790 [ORAROOTAGENT(98836)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 98836
2023-11-04 13:33:05.415 [GPNPD(1902)]CRS-2329: Grid Plug and Play Daemon(GPNPD) on node node19c02 shut down.
2023-11-04 13:33:28.198 [OHASD(1553)]CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node19c02' has completed
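To put a number on the slow shutdowns, each CRS-2791 (shutdown started) / CRS-2793 (shutdown completed) pair in the alert log can be turned into a duration. A minimal POSIX shell sketch, assuming GNU `date` and a copy of the GI alert log; `shutdown_durations` is a hypothetical helper name, not an Oracle tool:

```shell
#!/bin/sh
# Print the duration of each OHASD shutdown found in a GI alert log,
# pairing a CRS-2791 line with the next CRS-2793 line.
# Assumes GNU date (for "date -d").
shutdown_durations() {
  # $1: path to a copy of alert.log (hypothetical argument)
  grep -E 'CRS-279[13]:' "$1" | {
    start=""
    while IFS= read -r line; do
      ts=${line%%.*}   # keep "YYYY-MM-DD HH:MM:SS", drop fractional seconds
      case $line in
        *CRS-2791:*) start=$ts ;;   # shutdown started
        *CRS-2793:*)                # shutdown completed
          if [ -n "$start" ]; then
            # Treat both stamps as UTC so the subtraction ignores local TZ rules.
            secs=$(( $(TZ=UTC date -d "$ts" +%s) - $(TZ=UTC date -d "$start" +%s) ))
            echo "$start -> $ts: ${secs}s"
            start=""
          fi ;;
      esac
    done
  }
}
```

Applied to the 2023-11-04 pair in the excerpt above, this reports 8601s (about 2 h 23 min), in line with the "more than 2 hours" observation.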
-- The ASM instance alert log shows nothing abnormal
-- The related trace files do not contain much useful information:
2023-11-08 15:40:23.814 : USRTHRD:930125568: [ INFO] {0:9:2} dumpAsmLsnrReloadVec AsmLsnr Res = ora.ASMNET1LSNR_ASM.lsnr, Reload done = 1
2023-11-08 15:40:24.029 :CLSDYNAM:1897694976: [ora.DATA.dg]{0:9:2} [check] DgpAgent::runCheck 220 check if ASM failed
2023-11-08 15:40:24.029 :CLSDYNAM:1897694976: [ora.DATA.dg]{0:9:2} [check] DgpAgent::queryDgStatus 130 dgName DGStatus is not cached.
2023-11-08 15:40:24.030 : USRTHRD:1897694976: [ INFO] {0:9:2} Thread:DGStatusUpdater thread constructor exit this:4c11f680 m_pThnd:0 m_thndMX:4c11f6a0, m_tintMX:4c11f6f0 &m_postMX:0x7f6b4c11f6d0
2023-11-08 15:40:24.030 :CLSDYNAM:1912403712: [ora.OCR.dg]{0:9:2} [check] DgpAgent::runCheck 220 check if ASM failed
2023-11-08 15:40:24.030 :CLSDYNAM:1912403712: [ora.OCR.dg]{0:9:2} [check] DgpAgent::queryDgStatus 130 dgName DGStatus is not cached.
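-- Search the logs on node 1 and node 2 for resolv.conf. There are many DNS resolution messages, many of them mentioning 192.168.2.1, which is also the gateway address: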
[grid@node19c01 trace]$ more * | grep resolv.conf
2020-05-01 11:32:37.545 [CVUD(3455)]CRS-10051: CVU found following errors with Clusterware setup : PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
2020-05-01 12:29:34.334 [CVUD(4030)]CRS-10051: CVU found following errors with Clusterware setup : PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
2023-11-08 16:46:43.956 [CVUD(42984)]CRS-10051: CVU found following errors with Clusterware setup : PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
[grid@node19c01 trace]$
[root@node19c02 trace]# more * | grep resolv.conf
2023-06-06 16:15:22.170 [CVUD(1751)]CRS-10051: CVU found following errors with Clusterware setup : PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
2023-06-06 23:29:08.319 [CVUD(69952)]CRS-10051: CVU found following errors with Clusterware setup : PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
2023-08-17 09:00:22.018 [CVUD(3098)]CRS-10051: CVU found following errors with Clusterware setup : PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
2023-10-24 09:58:56.141 [CVUD(4051)]CRS-10051: CVU found following errors with Clusterware setup : PRVF-5622 : The 'search' entry does not exist in file "/etc/resolv.conf" on nodes: "node19c01".
[root@node19c02 trace]#
[grid@node19c01 trace]$ more alert.log | grep 192.168.2.1
PRVG-10048 : Name "node19c01" was not resolved to an address of the specified type by name servers "192.168.2.1".
PRVG-10048 : Name "node19c02" was not resolved to an address of the specified type by name servers "192.168.2.1".
PRVG-10048 : Name "node19c01" was not resolved to an address of the specified type by name servers "192.168.2.1".
PRVG-10048 : Name "node19c01" was not resolved to an address of the specified type by name servers "192.168.2.1".
PRVG-10048 : Name "node19c02" was not resolved to an address of the specified type by name servers "192.168.2.1".
PRVG-10048 : Name "node19c01" was not resolved to an address of the specified type by name servers "192.168.2.1".
[grid@node19c01 trace]$
[root@node19c02 trace]# more alert.log | grep 192.168.2.1
PRVG-10048 : Name "node19c02" was not resolved to an address of the specified type by name servers "192.168.2.1".
PRVG-10048 : Name "node19c02" was not resolved to an address of the specified type by name servers "192.168.2.1".
2023-06-06 16:15:22.172 [CVUD(1751)]CRS-10051: CVU found following errors with Clusterware setup : PRVG-10048 : Name "node19c01" was not resolved to an address of the specified type by name servers "192.168.2.1".
PRVG-10048 : Name "node19c02" was not resolved to an address of the specified type by name servers "192.168.2.1".
PRVG-10048 : Name "node19c01" was not resolved to an address of the specified type by name servers "192.168.2.1".
PRVG-10048 : Name "node19c02" was not resolved to an address of the specified type by name servers "192.168.2.1".
2023-06-06 23:29:08.320 [CVUD(69952)]CRS-10051: CVU found following errors with Clusterware setup : PRVG-10048 : Name "node19c01" was not resolved to an address of the specified type by name servers "192.168.2.1".
[root@node19c02 trace]#
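Instead of `more * | grep ...`, a recursive `grep -o` can tally how often each DNS-related CVU code appears across a trace directory. A sketch: the function name is hypothetical, and the list of codes is simply the ones seen in this case (PRVF-5622, PRVG-10048, PRVG-1101, PRVG-11372):

```shell
#!/bin/sh
# Count DNS-related CVU message codes across all files in a trace directory.
count_dns_errors() {
  # $1: trace directory, e.g. /u01/app/grid/diag/crs/<node>/crs/trace
  # -r recurse, -h no file names, -o print each match on its own line
  grep -rhoE 'PRV[FG]-(5622|10048|1101|11372)' "$1" | sort | uniq -c | sort -rn
}
```

For example, `count_dns_errors /u01/app/grid/diag/crs/node19c02/crs/trace` prints one "count code" line per message code, most frequent first.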
-- Check the DNS configuration: DNS is configured at the OS level
[root@node19c02 trace]# cat /etc/resolv.conf
# Generated by NetworkManager
search localdomain
nameserver 192.168.2.1
nameserver 192.168.71.2
[root@node19c02 trace]#
[grid@node19c01 trace]$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 192.168.2.1
[grid@node19c01 trace]$
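The two files differ: node19c02 has a `search` line and two nameservers, while node19c01 only has `nameserver 192.168.2.1`, which is exactly the condition PRVF-5622 reports. A small sketch that summarises a resolv.conf the same way; `check_resolv` is a hypothetical name:

```shell
#!/bin/sh
# List the nameserver entries of a resolv.conf and warn when it has no
# 'search' line (the condition CVU reports as PRVF-5622).
check_resolv() {
  resolv_file=${1:-/etc/resolv.conf}
  awk '$1 == "nameserver" { print "nameserver: " $2 }' "$resolv_file"
  if ! grep -q '^[[:space:]]*search[[:space:]]' "$resolv_file"; then
    echo "WARNING: no 'search' entry in $resolv_file"
  fi
}
```

To see whether lookups themselves are slow, timing something like `time getent hosts <name>` on each node, for a name that is not in /etc/hosts, may help: if a listed nameserver does not answer, such lookups can stall for the resolver timeout.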
-- Check the NIC configuration: the gateway is set to 192.168.2.1
[grid@node19c01 network-scripts]$ more ifcfg-ens33 | grep GATEWAY
GATEWAY=192.168.2.1
[grid@node19c01 network-scripts]$
[root@node19c02 network-scripts]# more ifcfg-ens33 | grep GATEWAY
GATEWAY=192.168.2.1
[root@node19c02 network-scripts]#
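The check above only looks at ifcfg-ens33. A sketch that lists every `GATEWAY=` entry under network-scripts, so the private interface (ens34) is covered too; `list_gateways` is a hypothetical name:

```shell
#!/bin/sh
# Show every GATEWAY= line in the ifcfg files, prefixed with the file name.
list_gateways() {
  ifcfg_dir=${1:-/etc/sysconfig/network-scripts}
  # -H forces the file name even if only one ifcfg file exists
  grep -H '^GATEWAY=' "$ifcfg_dir"/ifcfg-* 2>/dev/null
}
```

`ip route show default` shows the default route that is actually in effect, which is worth comparing with the files.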
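-- Solution
1 Clear the entries in /etc/resolv.conf, i.e. remove the DNS configuration (this was the main cause).
2 Remove the gateway from the NIC configuration.
After removing the DNS and gateway settings, the cluster starts and stops normally, and both operations complete quickly.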
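The two fix steps can be sketched as a shell function that backs up both files, comments out the `nameserver` lines and deletes the `GATEWAY=` line. This is a sketch under assumptions, not the author's exact procedure: the function name is hypothetical, it assumes node, VIP and SCAN names resolve from /etc/hosts, and it is best tried on copies first:

```shell
#!/bin/sh
# Hypothetical helper: disable DNS lookups and drop the gateway for one NIC.
# Keeps .bak copies so the change can be reverted. Assumes GNU sed (-i).
disable_dns_and_gateway() {
  resolv=$1   # e.g. /etc/resolv.conf
  ifcfg=$2    # e.g. /etc/sysconfig/network-scripts/ifcfg-ens33
  cp -p "$resolv" "$resolv.bak" || return 1
  cp -p "$ifcfg" "$ifcfg.bak" || return 1
  # "nameserver 192.168.2.1" -> "# nameserver 192.168.2.1"
  sed -i 's/^[[:space:]]*nameserver/# &/' "$resolv"
  # remove the GATEWAY= line entirely
  sed -i '/^GATEWAY=/d' "$ifcfg"
}
```

Because the file header says `# Generated by NetworkManager`, NetworkManager may rewrite /etc/resolv.conf when a connection is re-activated; setting `dns=none` in the `[main]` section of /etc/NetworkManager/NetworkManager.conf is one way to prevent that. The ifcfg change only takes effect once the connection is reloaded and re-activated (for example `nmcli connection reload`, then `nmcli connection up <name>`).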
END