Analysis and Handling of a False Alert (Affects: /SYS/MB) on Oracle ODA X9-2

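On several Oracle ODA X9-2 systems that we maintain, we ran into the same alert on a compute node: "Description: The Service Processor power-on self test has detected a problem. Probability: 100, UUID: cd1ebbdf-f099-61de-ca44-ef646defe034, Resource: /SYS/MB". The description makes the alert look serious. In fact the hosts were running normally, so after analysis we concluded it was a false positive. ILOM, the hardware management platform of Oracle ODA, provides an interface for clearing alerts. After we cleared it with the steps below, the alert disappeared, and the systems have continued to run normally during follow-up observation.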

The procedure is as follows:

1. View the alert information:

Click the alert to view its details:

2. View the alert from the command-line interface (serial numbers have been masked, so please do not compare them):

root@aaadb1 ~# ipmitool sunoem cli
Connected. Use ^D to exit.
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> fmadm faulty

Time                UUID                                 msgid          Severity
2023-07-31/08:55:39 cd1ebbdf-f099-61de-ca44-ef646defe034 ILOM-8000-4T   Critical

Problem Status    : open
Diag Engine       : fdd 1.0
System
    Manufacturer  : Oracle Corporation
    Name          : ORACLE SERVER X9-2L
    Part_Number   : 7603
    Serial_Number : 23000

System Component
    Firmware_Manufacturer : Oracle Corporation
    Firmware_Version      : (ILOM)5.1.0.23 r147470,(BIOS)62070300
    Firmware_Release      : (ILOM)2022.09.03,(BIOS)2022.08.17

Suspect 1 of 1
    Problem class : fault.chassis.device.sppost
    Certainty     : 100%
    Affects       : /SYS/MB
    Status        : faulted

    FRU
        Status            : faulty
        Location          : /SYS/MB
        Manufacturer      : Oracle Corporation
        Name              : ASM,MTHRBD,2U
        Part_Number       : 820000
        Revision          : 12
        Serial_Number     : 4650000
        Chassis
            Manufacturer  : Oracle Corporation
            Name          : ORACLE SERVER X9-2L
            Part_Number   : 76000000
            Serial_Number : 2300000

Description : The Service Processor power-on self test has detected a
              problem.

Response    : The service-required LED may be illuminated on the affected
              FRU and chassis.

Impact      : The Service Processor may not be able to perform necessary
              functions to power on, monitor, or manage the system.

Action      : Please refer to the associated reference document at
              http://support.oracle.com/msg/ILOM-8000-4T for the latest
              service procedures and policies regarding this diagnosis.
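Reading this output by eye gets tedious when several ODA nodes need checking. Below is a minimal Python sketch that assumes the text layout shown above: a `Time UUID msgid Severity` row followed by `Key : value` lines. `parse_fmadm_faulty` and `Fault` are hypothetical names and are not part of ILOM or any Oracle tool. The sketch extracts the UUID, msgid, severity, problem class and affected FRUs, which are the values `fmadm repair` works with.

```python
import re
from dataclasses import dataclass, field

# Summary row under the "Time UUID msgid Severity" header (layout assumed
# from the sample output), e.g.
# "2023-07-31/08:55:39 cd1ebbdf-... ILOM-8000-4T Critical".
ROW_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msgid>\S+)\s+(?P<severity>\S+)\s*$"
)
# Indented "Key : value" lines such as "Affects : /SYS/MB".
FIELD_RE = re.compile(r"^\s*(?P<key>[A-Za-z_ ]+?)\s*:\s*(?P<value>.*?)\s*$")


@dataclass
class Fault:
    time: str
    uuid: str
    msgid: str
    severity: str
    problem_class: str = ""
    affects: list[str] = field(default_factory=list)


def parse_fmadm_faulty(text: str) -> list[Fault]:
    """Turn `fmadm faulty` text into Fault records (hypothetical helper)."""
    faults: list[Fault] = []
    for line in text.splitlines():
        row = ROW_RE.match(line.strip())
        if row:
            # A new summary row starts a new fault record.
            faults.append(Fault(**row.groupdict()))
            continue
        if not faults:
            continue  # skip prompts and headers before the first fault
        m = FIELD_RE.match(line)
        if not m:
            continue  # wrapped Description/Impact lines and blank lines
        key, value = m.group("key").strip(), m.group("value")
        if key == "Problem class":
            faults[-1].problem_class = value
        elif key == "Affects":
            faults[-1].affects.append(value)
    return faults
```

For example, you could save the output of one session to a file and pass its contents to `parse_fmadm_faulty`. It should return one `Fault` per open problem, and an empty list for `No faults found`. Because the format is an assumption, check it against your own ILOM version before relying on it.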

3. Clear the alert

The cleanup procedure follows PSH Procedural Article for ILOM-Based Diagnosis (Doc ID 1155200.1). The 'fmadm repair' command is all that is needed:

  • Enter the fault management shell.

-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n) ? y
faultmgmtsp>

  • Use 'fmadm repair' to clear the fault.

Rather than the UUID, the FRU path (such as /SYS/FANBD/FM0 in the Oracle note) can also be used.

Example 3
Example 3 shows the 'fmadm repair' command required after the suspect FRU has been replaced. Using the UUID from the 'fmadm faulty' output in Example 1 of the Oracle note, the command would be:

faultmgmtsp> fmadm repair 9df39f93-f356-6d26-e081-e4f3a9872c2f

Example 4
Example 4 shows the 'fmadm repair' command required after the FRU has been replaced, this time using the FRU path instead of the UUID. For the alert in this article the FRU path is /SYS/MB, so the command would be:

faultmgmtsp> fmadm repair /SYS/MB
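The note accepts either a UUID or a FRU path as the argument. The Python sketch below uses hypothetical helper names. It validates the target before building the command, so a mistyped UUID is caught before it reaches ILOM. The accepted patterns are assumptions based on the values seen in this article: UUIDs such as cd1ebbdf-f099-61de-ca44-ef646defe034 and paths under /SYS. `repair_session` only lists, in order, what is typed at the `->` and `faultmgmtsp>` prompts in the session. It does not connect to ILOM.

```python
import re

# Assumed formats, based on the values seen in this article:
# a lowercase hex fault UUID, or a FRU path under /SYS.
UUID_RE = re.compile(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$")
FRU_PATH_RE = re.compile(r"^/SYS(?:/[A-Za-z0-9_]+)+$")


def repair_command(target: str) -> str:
    """Build the `fmadm repair` command for a fault UUID or FRU path
    (hypothetical helper; rejects anything else)."""
    target = target.strip()
    if not (UUID_RE.match(target) or FRU_PATH_RE.match(target)):
        raise ValueError(f"not a fault UUID or /SYS FRU path: {target!r}")
    return f"fmadm repair {target}"


def repair_session(targets: list[str]) -> list[str]:
    """Hypothetical list of what to type after `ipmitool sunoem cli`:
    enter the shell, confirm, repair each target, re-check, then leave."""
    lines = ["start /SP/faultmgmt/shell", "y"]
    lines += [repair_command(t) for t in targets]
    lines += ["fmadm faulty", "exit", "exit"]
    return lines
```

Validating up front costs little. A typo in the UUID would otherwise only show up as an ILOM error, or as a fault that is still listed afterwards.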

The actual cleanup log is shown below (the repair uses the UUID of the alert event):

faultmgmtsp> fmadm repair cd1ebbdf-f099-61de-ca44-ef646defe034
faultmgmtsp> fmadm faulty
No faults found
faultmgmtsp> exit
-> exit
Disconnected
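The final `fmadm faulty` in the log is the verification step. Below is a small Python sketch of that check. It assumes, as in the log, that a clean ILOM prints `No faults found` and that any still-open fault's UUID appears verbatim in the listing. `still_open` is a hypothetical helper name.

```python
def still_open(repaired_uuids: list[str], faulty_output: str) -> list[str]:
    """Return the repaired UUIDs that still appear in a later
    `fmadm faulty` listing (hypothetical helper)."""
    if "No faults found" in faulty_output:
        return []  # clean listing, as in the log above
    text = faulty_output.lower()
    # Case-insensitive match, in case the UUIDs were copied in upper case.
    return [u for u in repaired_uuids if u.lower() in text]
```

An empty list only says that the repaired UUIDs are gone. If the listing is not `No faults found`, any other open fault in it still deserves a look during the follow-up observation.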
