Handling Steps After an ORACLE RAC ASM Disk Group Goes OFFLINE

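Recently the ASM disks of a customer's standby database cluster developed a problem and the database shut down abnormally. Investigation showed that the storage disks went OFFLINE after I/O errors, which in turn took the disk group OFFLINE. Because the system is a standby, a long time had passed before the problem was noticed. For NORMAL redundancy (or an active-active storage configuration), a disk that has been disconnected for longer than the REPAIR TIME has to be resynchronized when it is added back to the ASM disk group.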

The full analysis and handling process follows:

1. Check the disk group state

SQL> select NAME,GROUP_NUMBER,TYPE,state from v$asm_diskgroup;

NAME         GROUP_NUMBER TYPE   STATE
------------ ------------ ------ -----------
test_DBDATA             0        DISMOUNTED
test_OCR                2 NORMAL MOUNTED

2. Errors when trying to MOUNT the disk groups

SQL> alter diskgroup all mount;
alter diskgroup all mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "1"
ORA-15042: ASM disk "1" is missing from group number "1"
ORA-15017: diskgroup "test_OCR" cannot be mounted
ORA-15013: diskgroup "test_OCR" is already mounted

SQL> ALTER DISKGROUP test_DBDATA MOUNT ;
ALTER DISKGROUP test_DBDATA MOUNT
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "1"
ORA-15042: ASM disk "1" is missing from group number "1"

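3. Check the current ASM disks (note that this query discovers disks dynamically: a disk that is currently missing does not appear in the output, so the number of disks shown may differ from what you see during normal operation)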

SQL> set linesize 200 pagesize 200
SQL> col name for a20
SQL> col path for a30
SQL> col HEADER_STATUS for a12
SQL> select NAME,GROUP_NUMBER,HEADER_STATUS,PATH from V$ASM_DISK order by 2,4;

NAME                 GROUP_NUMBER HEADER_STATU PATH
-------------------- ------------ ------------ ------------------------------
                                0 MEMBER       /dev/raw/raw6
                                0 MEMBER       /dev/raw/raw7
                                0 MEMBER       /dev/raw/raw8
test_OCR_0001                   2 MEMBER       /dev/raw/raw5
test_OCR_0002                   2 MEMBER       /dev/raw/raw9
_DROPPED_0000_test_OC           2 UNKNOWN
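
Because V$ASM_DISK only reports what discovery finds at that moment, it can help to compare the discovered paths with the paths you expect to see. The following is a minimal sketch, not part of the original procedure. It assumes two plain text files with one device path per line; the function name missing_asm_paths and the file names in the usage comment are hypothetical.

```shell
# Print every path listed in the "expected" file ($1) that does not
# appear in the "discovered" file ($2). One device path per line.
missing_asm_paths() {
    # The first file awk reads is the discovered list; remember its lines,
    # then print only the expected lines that were never seen.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }
         !($0 in seen)' "$2" "$1"
}

# Usage (hypothetical file names):
#   missing_asm_paths expected_paths.txt discovered_paths.txt
```

The discovered list could, for example, be the PATH column of the query above spooled to a file. In this incident such a comparison would have pointed at /dev/raw/raw1 through /dev/raw/raw4, which only show up again in step 6.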

4. Check the operating system log for disk failure messages

Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] CDB: Write(10): 2a 00 08 9a 05 1c 00 00 01 00
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Unhandled error code
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] CDB: Write(10): 2a 00 08 9a 05 1a 00 00 02 00
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Unhandled error code
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] CDB: Write(10): 2a 00 08 9a 05 19 00 00 01 00
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 144311575
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 929664
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 927744
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 1133312
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 1133056
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 930816
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 929792
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 4146944
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 4146688
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 1134464
Jul 25 13:26:48 test1 kernel: sd 2:0:1:2: rejecting I/O to offline device
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 7267456
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 7274992
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 144311573
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 144311571
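
To see at a glance which devices are affected, the I/O error lines can be tallied per device. This is a rough sketch that assumes syslog lines in the format shown above. The function name count_io_errors is illustrative, and the log path /var/log/messages in the usage comment is an assumption about this system.

```shell
# Count "end_request: I/O error" kernel messages per block device.
# Reads syslog text on stdin and prints "<device> <count>", sorted by device.
count_io_errors() {
    awk '/end_request: I\/O error, dev / {
             # The device name is the field right after the word "dev";
             # strip the trailing comma before counting it.
             for (i = 1; i < NF; i++)
                 if ($i == "dev") { d = $(i + 1); sub(/,$/, "", d); n[d]++ }
         }
         END { for (d in n) print d, n[d] }' | sort
}

# Usage (log path is an assumption):
#   count_io_errors < /var/log/messages
```

A device that dominates the output, as sdac does in the excerpt above, is the first candidate to check on the storage side.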

5. Inspect and repair the ASM disks

[root@test1 rules.d]# cat 60-raw.rules
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160076042009e3fff15cc1de711", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="36006016007604200e69910facb1de711", RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="36006016007604200e4f075cdcb1de711", RUN+="/bin/raw /dev/raw/raw3 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160076042009bc20b31cc1de711", RUN+="/bin/raw /dev/raw/raw4 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160068044001b35a19bd01de711", RUN+="/bin/raw /dev/raw/raw5 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="36006016006804400818e5d63d01de711", RUN+="/bin/raw /dev/raw/raw6 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160068044000b4a946ed01de711", RUN+="/bin/raw /dev/raw/raw7 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="36006016006804400ebb2b17cd01de711", RUN+="/bin/raw /dev/raw/raw8 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160068044003f43ddb3851fe711", RUN+="/bin/raw /dev/raw/raw9 %N"
KERNEL=="raw[1-9]", OWNER="grid", GROUP="asmadmin", MODE="660"
[root@test1 rules.d]# start_udev
Starting udev: [ OK ]
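
The rules file ties each LUN's WWID (the RESULT value) to a raw device. A small sketch for listing that mapping, so that each binding can be checked against the devices that actually exist, might look like the following. list_raw_bindings is a hypothetical helper, and it assumes the rules file holds one rule per line in the format shown above.

```shell
# Print "<raw device> <WWID>" for every raw binding found in a udev
# rules file read from stdin. Lines without a RESULT/RUN pair are skipped.
list_raw_bindings() {
    sed -n 's|.*RESULT=="\([^"]*\)".*RUN+="/bin/raw \(/dev/raw/raw[0-9]*\) %N".*|\2 \1|p'
}

# Usage: flag bindings whose raw device is not present as a character device.
#   list_raw_bindings < 60-raw.rules | while read -r dev wwid; do
#       [ -c "$dev" ] || echo "missing: $dev (WWID $wwid)"
#   done
```

Any binding reported as missing after udev has been restarted suggests that the LUN with that WWID is still not visible to the host.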

[root@test1 rules.d]# ls -al /dev/raw/raw*
crw-rw---- 1 grid asmadmin 162, 1 Jul 25 17:23 /dev/raw/raw1
crw-rw---- 1 grid asmadmin 162, 2 Jul 25 17:23 /dev/raw/raw2
crw-rw---- 1 grid asmadmin 162, 3 Jul 25 17:23 /dev/raw/raw3
crw-rw---- 1 grid asmadmin 162, 4 Jul 25 17:23 /dev/raw/raw4
crw-rw---- 1 grid asmadmin 162, 5 Jul 25 17:23 /dev/raw/raw5
crw-rw---- 1 grid asmadmin 162, 6 Jul 25 17:23 /dev/raw/raw6
crw-rw---- 1 grid asmadmin 162, 7 Jul 25 17:23 /dev/raw/raw7
crw-rw---- 1 grid asmadmin 162, 8 Jul 25 17:23 /dev/raw/raw8
crw-rw---- 1 grid asmadmin 162, 9 Jul 25 17:23 /dev/raw/raw9
crw-rw---- 1 root disk     162, 0 Jul 25 17:23 /dev/raw/rawctl

6. Check the disks and MOUNT the disk groups (all disks that ASM needs are now visible)

SQL> select NAME,GROUP_NUMBER,HEADER_STATUS,PATH from V$ASM_DISK order by 2,4;

NAME                 GROUP_NUMBER HEADER_STATU PATH
-------------------- ------------ ------------ ------------------------------
                                0 MEMBER       /dev/raw/raw1
                                0 MEMBER       /dev/raw/raw2
                                0 MEMBER       /dev/raw/raw3
                                0 MEMBER       /dev/raw/raw4
                                0 MEMBER       /dev/raw/raw6
                                0 MEMBER       /dev/raw/raw7
                                0 MEMBER       /dev/raw/raw8
test_OCR_0001                   2 MEMBER       /dev/raw/raw5
test_OCR_0002                   2 MEMBER       /dev/raw/raw9
_DROPPED_0000_test_OCR          2 UNKNOWN

Now mount the disk groups. The DATA disk group was dismounted as a whole, so a plain MOUNT is enough:

SQL> ALTER DISKGROUP test_DBDATA MOUNT ;

Diskgroup altered.

The OCR disk group uses NORMAL redundancy and only some of its disks went OFFLINE. Trying to bring them back online raises an error, because the current attribute setting does not support the operation:

SQL> ALTER DISKGROUP test_OCR ONLINE DISKS IN FAILGROUP test_OCR_0000 NOWAIT;
ALTER DISKGROUP test_OCR ONLINE DISKS IN FAILGROUP test_OCR_0000 NOWAIT
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15283: ASM operation requires compatible.rdbms of 11.1.0.0.0 or higher

The disk has to be added back with FORCE:

SQL> alter diskgroup test_OCR add disk '/dev/raw/raw4' force;

Diskgroup altered.

This is also a reminder that with active-active storage, the compatible.rdbms and disk_repair_time attributes should be set in advance so that this kind of disk OFFLINE event can be handled.
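
As a hedged illustration of that last recommendation, the sketch below only generates the SQL for setting the two attributes and then listing them; it does not run anything by itself. The helper name gen_repair_attr_sql is made up. The values 11.2.0.0.0 and 24h in the usage comment are placeholders, not settings taken from this system. Two points to keep in mind: compatible.asm must already be at 11.1 or higher for disk_repair_time to apply, and advancing a compatibility attribute cannot be reversed, so choose the values deliberately.

```shell
# Emit SQL that sets compatible.rdbms and disk_repair_time for one disk
# group and then lists the resulting attribute values.
# Arguments: $1 = disk group name, $2 = compatible.rdbms, $3 = disk_repair_time
gen_repair_attr_sql() {
    cat <<EOF
ALTER DISKGROUP ${1} SET ATTRIBUTE 'compatible.rdbms' = '${2}';
ALTER DISKGROUP ${1} SET ATTRIBUTE 'disk_repair_time' = '${3}';
SELECT g.name AS diskgroup, a.name AS attribute, a.value
  FROM v\$asm_attribute a
  JOIN v\$asm_diskgroup g ON g.group_number = a.group_number
 WHERE UPPER(g.name) = UPPER('${1}')
   AND a.name IN ('compatible.asm', 'compatible.rdbms', 'disk_repair_time');
EOF
}

# Usage (placeholder values; review the generated SQL before running it):
#   gen_repair_attr_sql test_OCR 11.2.0.0.0 24h | sqlplus -S "/ as sysasm"
```

With both attributes in place, a disk that returns within the repair window can be brought back with ONLINE and only the changed extents are resynchronized, instead of being dropped and re-added with FORCE.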
