Handling an ORACLE RAC ASM Disk Group After It Goes OFFLINE

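Recently, the ASM disks of a customer's standby database cluster ran into trouble and the database shut down abnormally. Investigation showed that the storage disks had hit I/O errors and gone OFFLINE, which in turn took the disk group OFFLINE. Because this is a standby system, a long time passed before anyone noticed. With NORMAL redundancy (or an active-active storage configuration), a disk that has been disconnected for longer than the repair time (DISK_REPAIR_TIME) must be resynchronized when it is added back to the ASM disk group, instead of simply being brought back ONLINE.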

The full analysis and handling process is as follows:

1. Check the disk group status

SQL> select NAME,GROUP_NUMBER,TYPE,state from v$asm_diskgroup;

NAME                 GROUP_NUMBER TYPE   STATE
test_DBDATA                     0        DISMOUNTED
test_OCR                        2 NORMAL MOUNTED

2. Errors returned when trying to MOUNT the disk groups

SQL> alter diskgroup all mount;
alter diskgroup all mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "1"
ORA-15042: ASM disk "1" is missing from group number "1"
ORA-15017: diskgroup "test_OCR" cannot be mounted
ORA-15013: diskgroup "test_OCR" is already mounted

SQL> ALTER DISKGROUP test_DBDATA MOUNT ;
ALTER DISKGROUP test_DBDATA MOUNT
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "1"
ORA-15042: ASM disk "1" is missing from group number "1"
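The ORA-15042 lines name the disk numbers ASM cannot find and the group each belongs to. When several groups fail at once, collecting them per group shows at a glance how many disks to look for. Below is a minimal Python sketch that only parses the message text in the format shown above. The function name `missing_disks` is hypothetical.

```python
from __future__ import annotations

import re

# Matches e.g.: ORA-15042: ASM disk "2" is missing from group number "1"
MISSING_DISK = re.compile(r'ORA-15042: ASM disk "(\d+)" is missing from group number "(\d+)"')


def missing_disks(error_text: str) -> dict[int, list[int]]:
    """Return {group_number: [disk_number, ...]} for every ORA-15042 line in error_text."""
    result: dict[int, set[int]] = {}
    for disk, group in MISSING_DISK.findall(error_text):
        result.setdefault(int(group), set()).add(int(disk))
    # Sort the disk numbers so the output is stable.
    return {group: sorted(disks) for group, disks in result.items()}


if __name__ == "__main__":
    mount_error = """ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "1"
ORA-15042: ASM disk "1" is missing from group number "1"
"""
    print(missing_disks(mount_error))  # {1: [1, 2]}
```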

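3. Check the current ASM disks. This query only reflects the disks ASM can discover right now. A disk that is currently missing is simply absent from the output, so the number of disks shown may differ from what you see during normal operation.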

SQL> set linesize 200 pagesize 200
SQL> col name for a20
SQL> col path for a30
SQL> col HEADER_STATUS for a12
SQL> select NAME,GROUP_NUMBER,HEADER_STATUS,PATH from V$ASM_DISK order by 2,4;

NAME                   GROUP_NUMBER HEADER_STATU PATH
                                  0 MEMBER       /dev/raw/raw6
                                  0 MEMBER       /dev/raw/raw7
                                  0 MEMBER       /dev/raw/raw8
test_OCR_0001                     2 MEMBER       /dev/raw/raw5
test_OCR_0002                     2 MEMBER       /dev/raw/raw9
_DROPPED_0000_test_OC             2 UNKNOWN
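Because missing disks are simply absent from V$ASM_DISK, the useful check is to compare the output against the full list of disks you expect. Below is a minimal Python sketch of that comparison. It assumes the complete set is /dev/raw/raw1 through raw9, the devices bound by the udev rules shown in step 5. The function name `missing_paths` is hypothetical.

```python
import re


def missing_paths(expected: list, query_output: str) -> list:
    """Return the expected device paths that do not appear in a V$ASM_DISK listing."""
    # Every /dev/... token printed in the PATH column counts as "visible".
    visible = set(re.findall(r"/dev/\S+", query_output))
    return [path for path in expected if path not in visible]


if __name__ == "__main__":
    # Assumption: the complete disk set is raw1..raw9 (see the udev rules in step 5).
    expected = [f"/dev/raw/raw{i}" for i in range(1, 10)]
    step3_output = """
                                  0 MEMBER       /dev/raw/raw6
                                  0 MEMBER       /dev/raw/raw7
                                  0 MEMBER       /dev/raw/raw8
test_OCR_0001                     2 MEMBER       /dev/raw/raw5
test_OCR_0002                     2 MEMBER       /dev/raw/raw9
_DROPPED_0000_test_OC             2 UNKNOWN
"""
    print(missing_paths(expected, step3_output))
    # ['/dev/raw/raw1', '/dev/raw/raw2', '/dev/raw/raw3', '/dev/raw/raw4']
```

The comparison uses exact path matches rather than substring matches, so /dev/raw/raw1 would not be mistaken for present just because a raw10 exists.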

4. Check the operating system log for disk failure messages

Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] CDB: Write(10): 2a 00 08 9a 05 1c 00 00 01 00
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Unhandled error code
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] CDB: Write(10): 2a 00 08 9a 05 1a 00 00 02 00
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Unhandled error code
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] CDB: Write(10): 2a 00 08 9a 05 19 00 00 01 00
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 144311575
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 929664
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 927744
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 1133312
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 1133056
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 930816
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 929792
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 4146944
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 4146688
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 1134464
Jul 25 13:26:48 test1 kernel: sd 2:0:1:2: rejecting I/O to offline device
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 7267456
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 7274992
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 144311573
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 144311571
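A log like this can run to thousands of lines, so it helps to reduce it to three facts:
- which devices failed requests and how often,
- which SCSI addresses the kernel marked offline,
- which devices lost their path (DID_NO_CONNECT).

Below is a minimal Python sketch that matches only the message formats shown above. Kernels and distributions word these messages differently, so treat the patterns as assumptions. The function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# end_request: I/O error, dev sdac, sector 929664
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector \d+")
# sd 2:0:1:2: rejecting I/O to offline device
OFFLINE = re.compile(r"sd (\d+:\d+:\d+:\d+): rejecting I/O to offline device")
# sd 2:0:1:1: [sdac] Result: hostbyte=DID_NO_CONNECT ...
NO_CONNECT = re.compile(r"\[(\w+)\] Result: hostbyte=DID_NO_CONNECT")


def summarize(log_text: str) -> dict:
    """Count I/O errors per device and list devices with lost paths or offline state."""
    return {
        # Number of failed requests per kernel device name (e.g. sdac).
        "io_errors": dict(Counter(IO_ERROR.findall(log_text))),
        # SCSI host:channel:target:lun addresses the kernel marked offline.
        "offline_hctl": sorted(set(OFFLINE.findall(log_text))),
        # Devices whose host adapter reported the target as unreachable.
        "no_connect": sorted(set(NO_CONNECT.findall(log_text))),
    }


if __name__ == "__main__":
    sample = """Jul 25 13:26:48 test1 kernel: sd 2:0:1:1: [sdac] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 929664
Jul 25 13:26:48 test1 kernel: sd 2:0:1:2: rejecting I/O to offline device
Jul 25 13:26:48 test1 kernel: end_request: I/O error, dev sdac, sector 7267456
"""
    print(summarize(sample))
```

On a RHEL-style system the same function can be fed the contents of the syslog file (typically /var/log/messages).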

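5. Inspect and repair the ASM disks. On this system the raw devices are bound to the LUNs by udev rules that match each disk's SCSI WWID. Once the storage paths are back, re-run udev (start_udev) so that the raw bindings are recreated: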

[root@test1 rules.d]# cat 60-raw.rules
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160076042009e3fff15cc1de711", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="36006016007604200e69910facb1de711", RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="36006016007604200e4f075cdcb1de711", RUN+="/bin/raw /dev/raw/raw3 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160076042009bc20b31cc1de711", RUN+="/bin/raw /dev/raw/raw4 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160068044001b35a19bd01de711", RUN+="/bin/raw /dev/raw/raw5 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="36006016006804400818e5d63d01de711", RUN+="/bin/raw /dev/raw/raw6 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160068044000b4a946ed01de711", RUN+="/bin/raw /dev/raw/raw7 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="36006016006804400ebb2b17cd01de711", RUN+="/bin/raw /dev/raw/raw8 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="360060160068044003f43ddb3851fe711", RUN+="/bin/raw /dev/raw/raw9 %N"
KERNEL=="raw[1-9]", OWNER="grid", GROUP="asmadmin", MODE="660"
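Each rule ties one raw device to one SCSI WWID. A quick way to see which raw devices share a storage array is to extract those pairs and group them by a WWID prefix. Below is a minimal Python sketch. The 16-character prefix is a heuristic assumption, not something udev or the array vendor defines, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# One udev rule line: RESULT=="<WWID>" ... RUN+="/bin/raw /dev/raw/rawN %N"
RULE = re.compile(r'RESULT=="([0-9a-f]+)".*?RUN\+="/bin/raw (/dev/raw/raw\d+) %N"')


def raw_bindings(rules_text: str) -> dict[str, str]:
    """Map each /dev/raw/rawN device to the SCSI WWID its udev rule matches."""
    return {raw: wwid for wwid, raw in RULE.findall(rules_text)}


def group_by_prefix(bindings: dict[str, str], prefix_len: int = 16) -> dict[str, list[str]]:
    """Group raw devices by WWID prefix (a heuristic for 'same storage array')."""
    groups: dict[str, list[str]] = {}
    for raw, wwid in sorted(bindings.items()):
        groups.setdefault(wwid[:prefix_len], []).append(raw)
    return groups


if __name__ == "__main__":
    rules = (
        'ACTION=="add", KERNEL=="sd*", RESULT=="360060160076042009e3fff15cc1de711", '
        'RUN+="/bin/raw /dev/raw/raw1 %N"\n'
        'ACTION=="add", KERNEL=="sd*", RESULT=="360060160068044001b35a19bd01de711", '
        'RUN+="/bin/raw /dev/raw/raw5 %N"\n'
        'KERNEL=="raw[1-9]", OWNER="grid", GROUP="asmadmin", MODE="660"\n'
    )
    print(group_by_prefix(raw_bindings(rules)))
```

Applied to the full 60-raw.rules above, this prefix separates raw1 to raw4 (3600601600760420...) from raw5 to raw9 (3600601600680440...). Those are the same four disks that were missing in step 3, which suggests that they sit on one of two arrays. The article itself does not state this, so confirm it with the storage team.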

[root@test1 rules.d]# start_udev
Starting udev: [ OK ]
[root@test1 rules.d]# ls -al /dev/raw/raw*
crw-rw---- 1 grid asmadmin 162, 1 Jul 25 17:23 /dev/raw/raw1
crw-rw---- 1 grid asmadmin 162, 2 Jul 25 17:23 /dev/raw/raw2
crw-rw---- 1 grid asmadmin 162, 3 Jul 25 17:23 /dev/raw/raw3
crw-rw---- 1 grid asmadmin 162, 4 Jul 25 17:23 /dev/raw/raw4
crw-rw---- 1 grid asmadmin 162, 5 Jul 25 17:23 /dev/raw/raw5
crw-rw---- 1 grid asmadmin 162, 6 Jul 25 17:23 /dev/raw/raw6
crw-rw---- 1 grid asmadmin 162, 7 Jul 25 17:23 /dev/raw/raw7
crw-rw---- 1 grid asmadmin 162, 8 Jul 25 17:23 /dev/raw/raw8
crw-rw---- 1 grid asmadmin 162, 9 Jul 25 17:23 /dev/raw/raw9
crw-rw---- 1 root disk 162, 0 Jul 25 17:23 /dev/raw/rawctl
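After start_udev, ASM can only open the raw devices if the last udev rule took effect. That rule sets owner grid, group asmadmin and MODE 660, which ls shows as crw-rw----. Below is a minimal Python sketch that checks an `ls -al /dev/raw/raw*` listing against those values. It skips the rawctl control device, which stays root:disk. The function name and default values are assumptions taken from this system's rules.

```python
def check_raw_permissions(ls_output: str, owner: str = "grid",
                          group: str = "asmadmin", mode: str = "crw-rw----") -> list:
    """Return raw devices whose mode/owner/group differ from what the udev rule sets."""
    bad = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Expected layout: mode links owner group major, minor month day time path
        if len(fields) < 10 or not fields[-1].startswith("/dev/raw/raw"):
            continue
        path = fields[-1]
        if path.endswith("rawctl"):
            continue  # control device, intentionally root:disk
        if (fields[0], fields[2], fields[3]) != (mode, owner, group):
            bad.append(path)
    return bad


if __name__ == "__main__":
    listing = """crw-rw---- 1 grid asmadmin 162, 1 Jul 25 17:23 /dev/raw/raw1
crw-rw---- 1 grid asmadmin 162, 9 Jul 25 17:23 /dev/raw/raw9
crw-rw---- 1 root disk 162, 0 Jul 25 17:23 /dev/raw/rawctl"""
    print(check_raw_permissions(listing) or "all raw devices OK")
```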

6. Check the disks again and MOUNT the disk groups (all the disks ASM needs are visible now)

SQL> select NAME,GROUP_NUMBER,HEADER_STATUS,PATH from V$ASM_DISK order by 2,4;

NAME                   GROUP_NUMBER HEADER_STATU PATH
                                  0 MEMBER       /dev/raw/raw1
                                  0 MEMBER       /dev/raw/raw2
                                  0 MEMBER       /dev/raw/raw3
                                  0 MEMBER       /dev/raw/raw4
                                  0 MEMBER       /dev/raw/raw6
                                  0 MEMBER       /dev/raw/raw7
                                  0 MEMBER       /dev/raw/raw8
test_OCR_0001                     2 MEMBER       /dev/raw/raw5
test_OCR_0002                     2 MEMBER       /dev/raw/raw9
_DROPPED_0000_test_OCR            2 UNKNOWN

Mount the disk groups. The DATA disk group was dismounted as a whole, so a plain MOUNT is enough:

SQL> ALTER DISKGROUP test_DBDATA MOUNT ;
Diskgroup altered.

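The OCR disk group uses NORMAL redundancy, and only some of its disks were OFFLINE. Trying to bring them back ONLINE fails because the disk group's compatibility attribute does not support the operation: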

SQL> ALTER DISKGROUP test_OCR ONLINE DISKS IN FAILGROUP test_OCR_0000 NOWAIT;
ALTER DISKGROUP test_OCR ONLINE DISKS IN FAILGROUP test_OCR_0000 NOWAIT
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15283: ASM operation requires compatible.rdbms of 11.1.0.0.0 or higher
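ORA-15283 states the rule directly: ONLINE DISKS needs compatible.rdbms of 11.1.0.0.0 or higher. The current value can be read, for example, from the DATABASE_COMPATIBILITY column of V$ASM_DISKGROUP. Comparing dotted versions as plain strings is error-prone, so the minimal Python sketch below compares them as zero-padded integer tuples. The function names are hypothetical.

```python
def version_tuple(version: str, width: int = 5) -> tuple:
    """'11.1' -> (11, 1, 0, 0, 0); pad with zeros so versions of different lengths compare correctly."""
    parts = [int(p) for p in version.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


# Threshold taken from the ORA-15283 message above.
REQUIRED_FOR_ONLINE = "11.1.0.0.0"


def can_online_disks(compatible_rdbms: str) -> bool:
    """True if compatible.rdbms is high enough for ALTER DISKGROUP ... ONLINE DISKS."""
    return version_tuple(compatible_rdbms) >= version_tuple(REQUIRED_FOR_ONLINE)


if __name__ == "__main__":
    for value in ("10.1.0.0.0", "11.1", "11.2.0.0.0"):
        print(value, can_online_disks(value))
```

Raising the attribute is not free. compatible.rdbms cannot be set higher than compatible.asm, and compatibility attributes cannot be lowered again once advanced. Plan the change rather than making it in the middle of an incident.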

So the disk has to be added back with FORCE:

SQL> alter diskgroup test_OCR add disk '/dev/raw/raw4' force;
Diskgroup altered.

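One more reminder: with active-active storage, set the disk group attributes compatible.rdbms (11.1 or higher) and disk_repair_time in advance. A disk that goes OFFLINE like this can then be brought back ONLINE within the repair window instead of having to be re-added.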
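Once compatible.asm and compatible.rdbms are at 11.1 or higher, the repair window can be set with, for example, ALTER DISKGROUP test_OCR SET ATTRIBUTE 'disk_repair_time' = '24h'. ASM keeps an OFFLINE disk for that long and drops it afterwards, which is when the FORCE re-add above becomes necessary. The minimal Python sketch below models only that decision. It assumes the value format is a number with an m or h suffix (hours when no unit is given) and that the 11g default is 3.6h. The function names are hypothetical.

```python
from datetime import timedelta


def parse_repair_time(value: str) -> timedelta:
    """Parse a disk_repair_time value such as '3.6h' or '90m' (no unit means hours)."""
    v = value.strip().lower()
    if v.endswith("m"):
        return timedelta(minutes=float(v[:-1]))
    if v.endswith("h"):
        v = v[:-1]
    return timedelta(hours=float(v))


def exceeded(outage: timedelta, repair_time: str = "3.6h") -> bool:
    """True if a disk stayed OFFLINE longer than the repair window (ASM would drop it)."""
    return outage > parse_repair_time(repair_time)


if __name__ == "__main__":
    outage = timedelta(hours=5)  # hypothetical outage length
    for setting in ("3.6h", "24h"):
        action = "re-add with FORCE" if exceeded(outage, setting) else "ONLINE the disk"
        print(setting, action)
```

On a standby that nobody watches closely, a longer window (hours or a day rather than the default) gives operators time to notice the problem before the disk is dropped. The trade-off is running with reduced redundancy for longer.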
