Analysis of a Storage Cell Reboot on Oracle Exadata X9

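We recently had a storage cell on an Oracle Exadata X9 reboot. Because the storage cells in an Exadata machine are redundant, the failure had no impact on database workloads.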

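Our monitoring system detected a fluctuation in the database's ASM capacity at that time and raised an alert. Analysis pointed to a problem with the RAID controller (possibly a bug): the system log contains the message "megaraid_sas 0000:4b:00.0: Application firmware crash dump mode set success". After the reboot the storage cell returned to normal on its own, and on the database side the ASM disks belonging to that cell were automatically added back. This is a good illustration of how well Oracle Exadata handles this kind of failure.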

Checking the ILOM showed no alerts.

Checking the system messages log:

Mar 13 02:50:01 celadm05 systemd: Started Session 64057 of user root.

Mar 13 02:50:01 celadm05 systemd: Started Session 64058 of user root.

Mar 13 03:00:01 celadm05 systemd: Started Session 64059 of user root.

Mar 13 03:00:01 celadm05 systemd: Started Session 64060 of user root.

Mar 13 03:00:01 celadm05 systemd: Started Session 64061 of user root.

Mar 13 03:01:01 celadm05 systemd: Started Session 64062 of user root.

Mar 13 03:03:01 celadm05 systemd: Started Session 64063 of user root.

Mar 13 03:06:06 celadm05 systemd: Stopping LSB: Monitors LSI MegaRAID health....

Mar 13 03:06:06 celadm05 kernel: megaraid_sas 0000:4b:00.0: Application firmware crash dump mode set success

Mar 13 03:06:06 celadm05 kernel: megaraid_sas 0000:4b:00.0: Application firmware crash dump mode set success

Mar 13 03:06:06 celadm05 mrdiag: Stopping mrdiagd: [ OK ]

Mar 13 03:06:06 celadm05 systemd: Stopped LSB: Monitors LSI MegaRAID health..

Mar 13 03:06:06 celadm05 systemd: Starting LSB: Monitors LSI MegaRAID health....

Mar 13 03:06:06 celadm05 mrdiag: Starting mrdiagd: [ OK ]#015[ OK ]

Mar 13 03:06:06 celadm05 systemd: Started LSB: Monitors LSI MegaRAID health..

Mar 13 03:06:06 celadm05 kernel: megaraid_sas 0000:4b:00.0: Application firmware crash dump mode set success

Mar 13 03:06:09 celadm05 kernel: XFS (md25p1): Mounting V4 Filesystem

Mar 13 03:06:09 celadm05 kernel: XFS (md25p1): Ending clean mount

Mar 13 03:06:12 celadm05 kernel: XFS (md25p1): Unmounting Filesystem

Mar 13 03:06:12 celadm05 setup_disks.py: Starting: command line: ['/opt/oracle.cellos/setup_disks.py', '--get-device', '--label', 'var', '--sys-dev', '/dev/md24p5', '--layout', 'cell_layout_1']

Mar 13 03:06:14 celadm05 setup_disks.py: Starting: command line: ['/opt/oracle.cellos/setup_disks.py', '--get-device', '--label', 'var', '--sys-dev', '/dev/md24p5', '--inactive', '--layout', 'cell_layout_1']

Mar 13 03:06:18 celadm05 kernel: XFS (md25p1): Mounting V4 Filesystem

Mar 13 03:06:18 celadm05 kernel: XFS (md25p1): Ending clean mount

Mar 13 03:06:21 celadm05 kernel: XFS (md25p1): Unmounting Filesystem

Mar 13 03:06:55 celadm05 kernel: md25: p1 p2

Mar 13 03:06:59 celadm05 kernel: md25: p1 p2

Mar 13 03:07:01 celadm05 systemd: Stopping ExaWatcher...

Mar 13 03:07:02 celadm05 systemd: Stopped ExaWatcher.

Mar 13 03:07:02 celadm05 systemd: Started ExaWatcher.

Mar 13 03:07:02 celadm05 rsyslogd: [origin software="rsyslogd" swVersion="8.24.0-57.0.1.el7_9.2" x-pid="44858" x-info="http://www.rsyslog.com"] rsyslogd was HUPed

Mar 13 03:10:01 celadm05 systemd: Started Session 64064 of user root.

Mar 13 03:10:01 celadm05 systemd: Started Session 64065 of user root.

Mar 13 03:20:01 celadm05 systemd: Started Session 64066 of user root.

Mar 13 03:20:01 celadm05 systemd: Started Session 64067 of user root.

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring file `/etc/passwd` (1)

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring directory `/etc` (2)

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring file `/etc/group` (3)

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring directory `/etc` (2)

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring file `/etc/hosts` (4)

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring directory `/etc` (2)

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring file `/etc/resolv.conf` (5)

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring directory `/etc` (2)

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring file `/etc/services` (6)

Mar 13 03:29:39 celadm05 nscd: 6064 monitoring directory `/etc` (2)

Mar 13 03:30:01 celadm05 systemd: Started Session 64068 of user root.

Mar 13 03:30:01 celadm05 systemd: Started Session 64069 of user root.

Mar 13 03:33:01 celadm05 systemd: Started Session 64070 of user root.

Mar 13 03:40:01 celadm05 systemd: Started Session 64071 of user root.

Mar 13 03:40:01 celadm05 systemd: Started Session 64072 of user root.

Mar 13 03:50:01 celadm05 systemd: Started Session 64073 of user root.

Mar 13 03:50:01 celadm05 systemd: Started Session 64074 of user root.

Mar 13 04:00:01 celadm05 systemd: Started Session 64075 of user root.

Mar 13 04:00:01 celadm05 systemd: Started Session 64076 of user root.

Mar 13 04:00:01 celadm05 systemd: Started Session 64077 of user root.

Mar 13 04:01:01 celadm05 systemd: Started Session 64078 of user root.

Mar 13 04:03:01 celadm05 systemd: Started Session 64079 of user root.

Mar 13 04:10:01 celadm05 systemd: Started Session 64080 of user root.

Mar 13 04:10:01 celadm05 systemd: Started Session 64081 of user root.

Mar 13 04:20:01 celadm05 systemd: Started Session 64082 of user root.

Mar 13 04:20:01 celadm05 systemd: Started Session 64083 of user root.

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring file `/etc/passwd` (1)

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring directory `/etc` (2)

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring file `/etc/group` (3)

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring directory `/etc` (2)

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring file `/etc/hosts` (4)

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring directory `/etc` (2)

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring file `/etc/resolv.conf` (5)

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring directory `/etc` (2)

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring file `/etc/services` (6)

Mar 13 04:29:39 celadm05 nscd: 11183 monitoring directory `/etc` (2)

Mar 13 04:30:01 celadm05 systemd: Started Session 64084 of user root.

Mar 13 04:30:01 celadm05 systemd: Started Session 64085 of user root.

Mar 13 04:33:01 celadm05 systemd: Started Session 64086 of user root.
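In a messages log like the one above, the few lines that matter are buried among routine session and nscd entries. The following minimal Python sketch, which is not part of the original analysis, filters them out. It assumes the classic rsyslog line format shown above (Mon DD HH:MM:SS host tag: message); the keyword list KEYWORDS and the function name storage_events are hypothetical choices based on the relevant lines here (MegaRAID driver, mrdiagd, XFS and setup_disks.py messages).

```python
import re
from typing import Iterable, List, Tuple

# Classic rsyslog line, e.g. "Mar 13 03:06:06 celadm05 kernel: message text"
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)$"
)

# Hypothetical keyword list, picked from the interesting lines in the log above
KEYWORDS = ("megaraid_sas", "MegaRAID", "mrdiag", "XFS", "setup_disks")

# A few lines copied from the log above, used as sample input
SAMPLE = [
    "Mar 13 03:03:01 celadm05 systemd: Started Session 64063 of user root.",
    "Mar 13 03:06:06 celadm05 systemd: Stopping LSB: Monitors LSI MegaRAID health....",
    "Mar 13 03:06:06 celadm05 kernel: megaraid_sas 0000:4b:00.0: Application firmware crash dump mode set success",
    "Mar 13 03:06:06 celadm05 mrdiag: Stopping mrdiagd: [ OK ]",
    "Mar 13 03:06:09 celadm05 kernel: XFS (md25p1): Mounting V4 Filesystem",
    "Mar 13 03:29:39 celadm05 nscd: 6064 monitoring file `/etc/passwd` (1)",
]


def storage_events(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, tag, message) for lines that mention any keyword."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip blank lines and lines that do not fit the format
        text = m.group("tag") + ": " + m.group("msg")
        if any(k in text for k in KEYWORDS):
            events.append((m.group("ts"), m.group("tag"), m.group("msg")))
    return events


if __name__ == "__main__":
    for ts, tag, msg in storage_events(SAMPLE):
        print(ts, tag, msg)
```

On the sample it keeps the four MegaRAID, mrdiag and XFS lines from 03:06 and drops the routine systemd session and nscd lines. On a cell, the same function could be fed the lines of the messages file directly (the usual path /var/log/messages is an assumption here).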

Disk status check on the storage cell:

[root@celadm05 ~]# cellcli -e list physicaldisk
252:0       0X0U5B                   normal
252:1       05PL0G                   normal
252:2       0MWSDB                   normal
252:3       0VD58B                   normal
252:4       0404TG                   normal
252:5       0W0WTB                   normal
252:6       0405RG                   normal
252:7       0W0UJB                   normal
252:8       0KB7ZB                   normal
252:9       0LV4NB                   normal
252:10      0LZVBB                   normal
252:11      0LUZ4B                   normal
FLASH_1_1   PHAG2096014A6P4CGN-2     normal
FLASH_1_2   PHAG2096014A6P4CGN-1     normal
FLASH_4_1   PHAG210500JH6P4CGN-1     normal
FLASH_4_2   PHAG210500JH6P4CGN-2     normal
FLASH_6_1   PHAG210500126P4CGN-1     normal
FLASH_6_2   PHAG210500126P4CGN-2     normal
FLASH_8_1   PHAG212000HA6P4CGN-2     normal
FLASH_8_2   PHAG212000HA6P4CGN-1     normal
M2_SYS_0    22033422D05F             normal
M2_SYS_1    2203342D7EE1             normal
PMEM_0_1    8089-a2-2145-00008c90    normal
PMEM_0_3    8089-a2-2145-00008ba0    normal
PMEM_0_7    8089-a2-2145-00008cba    normal
PMEM_0_8    8089-a2-2145-00008cec    normal
PMEM_0_12   8089-a2-2145-00008eb8    normal
PMEM_0_14   8089-a2-2145-00008ef7    normal
PMEM_1_1    8089-a2-2145-00008f8e    normal
PMEM_1_3    8089-a2-2145-00008c9e    normal
PMEM_1_7    8089-a2-2145-00008eba    normal
PMEM_1_8    8089-a2-2145-00008fe4    normal
PMEM_1_12   8089-a2-2145-00008c9c    normal
PMEM_1_14   8089-a2-2145-00008fce    normal
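The check above boils down to confirming that every physical disk on the cell (hard disks, flash cards, M.2 system disks and PMEM modules) reports normal. Below is a minimal Python sketch of that check, assuming the three-column name/serial/status layout shown above. The function names are hypothetical, a status may contain spaces so everything after the serial is treated as the status, and grouping by name prefix is a heuristic of this sketch rather than anything cellcli defines.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A few rows copied from the cellcli output above
SAMPLE = """\
252:0       0X0U5B                   normal
252:1       05PL0G                   normal
FLASH_1_1   PHAG2096014A6P4CGN-2     normal
M2_SYS_0    22033422D05F             normal
PMEM_0_1    8089-a2-2145-00008c90    normal
"""


def parse_physicaldisks(output: str) -> List[Tuple[str, str, str]]:
    """Parse 'name serial status' rows; the status may contain spaces."""
    disks = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or unexpected lines
        disks.append((parts[0], parts[1], " ".join(parts[2:])))
    return disks


def disk_type(name: str) -> str:
    """Heuristic grouping: 'enclosure:slot' names are hard disks, else the name prefix."""
    if ":" in name:
        return "HardDisk"
    return name.split("_")[0]  # FLASH, M2 or PMEM


def summarize(disks: List[Tuple[str, str, str]]) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Count disks per type and list (name, status) for any disk that is not 'normal'."""
    counts = Counter(disk_type(name) for name, _, _ in disks)
    not_normal = [(name, status) for name, _, status in disks if status != "normal"]
    return dict(counts), not_normal


if __name__ == "__main__":
    # On a cell, the text could come from (assumption, requires cellcli on PATH):
    #   subprocess.run(["cellcli", "-e", "list", "physicaldisk"],
    #                  capture_output=True, text=True, check=True).stdout
    counts, bad = summarize(parse_physicaldisks(SAMPLE))
    print(counts)
    print("not normal:", bad)
```

For the sample rows every disk is normal, so the list of problem disks is empty; fed the full output above, the counts would be 12 hard disks, 8 flash, 2 M2 and 12 PMEM entries.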
