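Introduction
Problem description: in an Oracle 19c RAC cluster, after the CRS stack is started, the ASM instance on one node does not start automatically and has to be started by hand with the startup command. Once the instance is up, the cluster services run normally.
This symptom may be caused by a faulty ASM password file. It can appear when the ASM password file is lost, corrupted, or manually replaced.
Routine backups, such as database backups, OCR backups, and OLR backups, do not include the ASM password file, so when the file goes bad it cannot be restored from them.
I will therefore describe four different ways of recovering the ASM password file, split across four articles, and verify each one with a test.
1. Recover with the asmcmd --nocp credfix command.
2. Recover from a backup of the password file.
3. No password file backup and a version below 19.8: apply the patch, then recover with the asmcmd --nocp credfix command.
4. Recover by creating a new password file directly.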
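Test 1:
Recover the ASM password file with the asmcmd --nocp credfix command.
Test environment:
Oracle 19c two-node RAC cluster
rac1: node 1
rac2: node 2
Version 19.16
Applicable scenario:
Oracle 19.8 or later
The ASM password file is lost or corrupted, and the other cluster components are healthy
root privileges and SSH user equivalence between the nodes are required
Fix procedure:
Use asmcmd --nocp credverify to verify the state of the user credentials.
Use asmcmd --nocp credfix to repair the user credentials.
This method applies only to version 19.8 and later, because the credfix command exists only in 19.8 and later.
Alternatively, see Test 3: apply the patch after the problem has occurred, then run credfix to repair the credentials.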
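The 19.8 requirement can be checked before attempting the repair. The following is only a sketch: `version_ge` and `credfix_available` are hypothetical helper names, the comparison relies on GNU `sort -V`, and the release string is passed in by hand because how it is obtained is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on GNU sort's -V (version sort), which not every sort provides.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum release that ships credfix, according to the article.
CREDFIX_MIN="19.8"

# The release string is an argument; detecting it automatically is
# outside the scope of this sketch.
credfix_available() {
    version_ge "$1" "$CREDFIX_MIN"
}

if credfix_available "19.16"; then
    echo "credfix should be available"
else
    echo "patch first (see Test 3)"
fi
```

A plain string comparison would rank 19.16 below 19.8, which is why a version-aware sort is used.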
1. Check the cluster status and configuration:
bash
[grid@rac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
ora.chad
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
ora.net1.network
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
ora.ons
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 ONLINE OFFLINE STABLE
ora.DATANEW.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_DATA.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_FRA.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_OCR.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1 STABLE
ora.asm(ora.asmgroup)
1 ONLINE ONLINE rac1 Started,STABLE
2 ONLINE ONLINE rac2 Started,STABLE
3 OFFLINE OFFLINE STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.cvu
1 ONLINE ONLINE rac1 STABLE
ora.emrep.db
1 ONLINE ONLINE rac1 Open,HOME=/u01/app/o
racle/product/19.3.0
/dbhome_1,STABLE
2 ONLINE ONLINE rac2 Open,HOME=/u01/app/o
racle/product/19.3.0
/dbhome_1,STABLE
ora.qosmserver
1 ONLINE ONLINE rac1 STABLE
ora.rac1.vip
1 ONLINE ONLINE rac1 STABLE
ora.rac2.vip
1 ONLINE ONLINE rac2 STABLE
ora.scan1.vip
1 ONLINE ONLINE rac1 STABLE
--------------------------------------------------------------------------------
bash
asmcmd pwget --asm
+DG_OCR/orapwasm
bash
[grid@rac1 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
SYS TRUE TRUE TRUE
ASMSNMP TRUE FALSE FALSE
ORACLE_148 TRUE FALSE FALSE
CRSUSER__ASM_004 TRUE FALSE TRUE
2. Damage the password file to simulate the failure
Replacing the password file outright effectively destroys the original one.
bash
orapwd file='+dg_ocr/orapwasm' asm=y force=y password=Password123*
credfix needs an existing password file in order to repair the credentials; it cannot create a password file from nothing.
bash
# Check the privileges
# The re-created SYS user lacks the sysasm privilege.
[grid@rac1 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
SYS TRUE TRUE FALSE
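The two things visible in this listing, SYS without sysasm and no CRSUSER__ASM_00X entry at all, can be checked mechanically. A minimal sketch, assuming the four-column layout shown above; `check_pwfile_listing` is a hypothetical helper that reads the `asmcmd lspwusr` output from standard input so that it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical check: read `asmcmd lspwusr` output on stdin, report what
# looks wrong, and return non-zero if anything was reported.
check_pwfile_listing() {
    awk '
        BEGIN { bad = 0; crs = 0 }
        $1 == "SYS" && $4 != "TRUE"  { print "SYS lacks sysasm"; bad = 1 }
        $1 ~ /^CRSUSER__ASM_[0-9]+$/ { crs = 1 }
        END {
            if (!crs) { print "no CRSUSER__ASM_ entry"; bad = 1 }
            exit bad
        }
    '
}

# Demo with the listing from this step pasted in as a here-document.
check_pwfile_listing <<'EOF' || echo "password file needs attention"
Username sysdba sysoper sysasm
SYS TRUE TRUE FALSE
EOF
```

On a live system the input would come from `asmcmd lspwusr` piped into the function instead of the here-document.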
3. Restart CRS and observe the abnormal state
Restart CRS on all nodes
bash
[root@rac1 bin]# crsctl stop crs
[root@rac2 bin]# crsctl stop crs
[root@rac1 bin]# crsctl start crs
[root@rac2 bin]# crsctl start crs
Node 1
bash
[grid@rac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE rac1 STABLE
ora.chad
ONLINE ONLINE rac1 STABLE
ora.net1.network
ONLINE ONLINE rac1 STABLE
ora.ons
ONLINE ONLINE rac1 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE OFFLINE STABLE
3 ONLINE OFFLINE STABLE
ora.DATANEW.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 OFFLINE OFFLINE STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_DATA.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 OFFLINE OFFLINE STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_FRA.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 OFFLINE OFFLINE STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_OCR.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 OFFLINE OFFLINE STABLE
3 OFFLINE OFFLINE STABLE
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1 STABLE
ora.asm(ora.asmgroup)
1 ONLINE ONLINE rac1 Started,STABLE
2 ONLINE OFFLINE STABLE
3 OFFLINE OFFLINE STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE OFFLINE STABLE
3 OFFLINE OFFLINE STABLE
ora.cvu
1 ONLINE ONLINE rac1 STABLE
ora.emrep.db
1 ONLINE ONLINE rac1 Open,HOME=/u01/app/o
racle/product/19.3.0
/dbhome_1,STABLE
2 ONLINE OFFLINE STABLE
ora.qosmserver
1 ONLINE ONLINE rac1 STABLE
ora.rac1.vip
1 ONLINE ONLINE rac1 STABLE
ora.rac2.vip
1 ONLINE INTERMEDIATE rac1 FAILED OVER,STABLE
ora.scan1.vip
1 ONLINE ONLINE rac1 STABLE
--------------------------------------------------------------------------------
Node 2
bash
[grid@rac2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[grid@rac2 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE rac2 STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE rac2 STABLE
ora.crf
1 ONLINE ONLINE rac2 STABLE
ora.crsd
1 ONLINE OFFLINE STABLE
ora.cssd
1 ONLINE ONLINE rac2 STABLE
ora.cssdmonitor
1 ONLINE ONLINE rac2 STABLE
ora.ctssd
1 ONLINE ONLINE rac2 OBSERVER,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.evmd
1 ONLINE ONLINE rac2 STABLE
ora.gipcd
1 ONLINE ONLINE rac2 STABLE
ora.gpnpd
1 ONLINE ONLINE rac2 STABLE
ora.mdnsd
1 ONLINE ONLINE rac2 STABLE
ora.storage
1 ONLINE OFFLINE rac2 STARTING
--------------------------------------------------------------------------------
Do not be misled by ora.storage showing STARTING: it can no longer start on its own. The related log files contain errors such as ORA-01017: invalid username/password; logon denied.
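Searching for that error can be scripted. The sketch below is an illustration only: `find_logon_denied` is a hypothetical helper, and the log files to pass to it are left to the reader, since the article does not name them. The demo writes a throwaway file so that it runs without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list ORA-01017 occurrences (file name and line
# number) in the log files given as arguments.
find_logon_denied() {
    grep -H -n 'ORA-01017' "$@"
}

# Demo on a temporary file standing in for a real log.
demo_log="$(mktemp)"
printf '%s\n' 'some earlier line' \
    'ORA-01017: invalid username/password; logon denied' > "$demo_log"
find_logon_denied "$demo_log"
rm -f "$demo_log"
```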
4. Start the ASM instance manually
Node 2 cannot bring ASM up automatically, so start the ASM instance on node 2 by hand with startup.
bash
[grid@rac2 trace]$ sqlplus / as sysasm
SQL>startup
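If the cluster has no other faults, the instance starts normally at this point.
If you hit a similar error but startup cannot bring the ASM instance up, there are other faults as well. Deal with those first, and fix the ASM password file last.
After the instance is started, the cluster status is back to normal.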
bash
[grid@rac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
ora.chad
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
ora.net1.network
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
ora.ons
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 ONLINE OFFLINE STABLE
ora.DATANEW.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_DATA.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_FRA.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_OCR.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1 STABLE
ora.asm(ora.asmgroup)
1 ONLINE ONLINE rac1 Started,STABLE
2 ONLINE ONLINE rac2 Started,STABLE
3 OFFLINE OFFLINE STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.cvu
1 ONLINE ONLINE rac1 STABLE
ora.emrep.db
1 ONLINE ONLINE rac1 Open,HOME=/u01/app/o
racle/product/19.3.0
/dbhome_1,STABLE
2 ONLINE ONLINE rac2 Open,HOME=/u01/app/o
racle/product/19.3.0
/dbhome_1,STABLE
ora.qosmserver
1 ONLINE ONLINE rac1 STABLE
ora.rac1.vip
1 ONLINE ONLINE rac1 STABLE
ora.rac2.vip
1 ONLINE ONLINE rac2 STABLE
ora.scan1.vip
1 ONLINE ONLINE rac1 STABLE
--------------------------------------------------------------------------------
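Instead of reading the listing by eye, the ora.asm rows can be filtered out of it. A minimal sketch, assuming the resource is named ora.asm(ora.asmgroup) as in the listings above and that each instance row starts with its number followed by Target and State; `asm_not_started` is a hypothetical helper that reads `crsctl stat res -t` output from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ora.asm instance numbers whose Target is
# ONLINE but whose State is not ONLINE.
asm_not_started() {
    awk '
        $1 ~ /^ora\./ { in_asm = ($1 == "ora.asm(ora.asmgroup)"); next }
        $1 ~ /^-+$/   { in_asm = 0; next }
        in_asm && $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" { print $1 }
    '
}

# Demo with the ora.asm rows seen on node 1 before the manual startup.
asm_not_started <<'EOF'
ora.asm(ora.asmgroup)
1 ONLINE ONLINE rac1 Started,STABLE
2 ONLINE OFFLINE STABLE
3 OFFLINE OFFLINE STABLE
EOF
```

For the demo input this prints 2. Instance 3 is ignored because its Target is OFFLINE. With the healthy listing above, the function prints nothing.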
5. Run credfix to repair the credentials
If no password file has been created beforehand, first create one with the orapwd command, using a statement such as orapwd file='+dg_ocr/orapwasm' asm=y force=y password=Password123*.
bash
# Check the credential status
[grid@rac1 ~]$ asmcmd --nocp credverify
credverify: No credentials in password file, please run 'credfix' to fix the credentials.
# credverify asks for credfix to be run to repair the credentials
[grid@rac1 ~]$ asmcmd --nocp credfix
credfix: Credentials for CRSUSER__ASM_004 not in password file, trying next credential.
op=addcrscreds wrap=/tmp/creds0.xml
KFOD-00610: Internal error [610] [kfodCredExport] [export]
CRSUSER__ASM_004
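# Running credfix as the grid user fails; it has to be run as root.
# Before running asmcmd --nocp credfix as root, SSH user equivalence must also be set up for the root user on all nodes.
# Command to set up the equivalence:
# /u01/app/19.3.0/grid/oui/prov/resources/scripts/sshUserSetup.sh -user root -hosts "rac1 rac2 " -advanced -noPromptPassphrase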
[grid@rac1 ~]$ logout
[root@rac1 ~]# asmcmd --nocp credfix
credfix: Connected successfully using credentials for CRSUSER__ASM_005
op=credstoxml wrap=/tmp/new_creds.xml
op=credimport wrap=/tmp/new_creds.xml olr=true force=true
credfix: OLR for rac1 has been fixed if credentials were created incorrectly.
credfix: Starting SSH session on node rac2.
credfix: OLR for rac2 has been fixed if credentials were created incorrectly. Exiting SSH session.
op=delcrscreds crs_user=CRSUSER__ASM_004
credfix: Deleted CRSUSER__ASM_004 from OCR.
credverify: Credentials created correctly on rac1.
credverify: Starting SSH session on node rac2
credverify: Credentials created correctly on rac2. Exiting SSH session.
credfix: Credentials have been fixed if they were created incorrectly.
Check the credential status
bash
[grid@rac1 ~]$ asmcmd --nocp credverify
credverify: Credentials created correctly on rac1.
credverify: Starting SSH session on node rac2
credverify: Credentials created correctly on rac2. Exiting SSH session.
[grid@rac1 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
SYS TRUE TRUE FALSE
CRSUSER__ASM_005 TRUE FALSE TRUE
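After credfix has run and repaired the credentials, a new CRSUSER__ASM_00X user is generated, and the number goes up by 1 with every repair.
On a default RAC installation this user is CRSUSER__ASM_001. Because I have run the test several times, it is now CRSUSER__ASM_005.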
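The credverify messages shown in this step can also be evaluated by a script, for example as a gate before restarting CRS. This is a sketch only: `credverify_ok` is a hypothetical helper, and it matches the message wording exactly as it appears in the transcripts above, which may differ in other releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `asmcmd --nocp credverify` output on stdin.
# Succeeds only if at least one node reports correct credentials and no
# line asks for credfix to be run.
credverify_ok() {
    awk '
        /run .credfix. to fix/             { bad = 1 }
        /Credentials created correctly on/ { good++ }
        END { exit ((bad || good == 0) ? 1 : 0) }
    '
}

# Demo with the output seen after the repair.
if credverify_ok <<'EOF'
credverify: Credentials created correctly on rac1.
credverify: Starting SSH session on node rac2
credverify: Credentials created correctly on rac2. Exiting SSH session.
EOF
then
    echo "credentials look fine"
else
    echo "run credfix as root"
fi
```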
6. Re-create the users and grant privileges
bash
asmcmd orapwusr --add ASMSNMP
asmcmd orapwusr --grant sysdba ASMSNMP
asmcmd orapwusr --add ORACLE_148
asmcmd orapwusr --grant sysdba ORACLE_148
asmcmd orapwusr --grant sysasm SYS
Check the user privileges
bash
[grid@rac1 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
SYS TRUE TRUE TRUE
CRSUSER__ASM_005 TRUE FALSE TRUE
ASMSNMP TRUE FALSE FALSE
ORACLE_148 TRUE FALSE FALSE
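If the `asmcmd lspwusr` listing from before the failure was saved, the commands in this step can be derived from it rather than typed by hand. A minimal sketch under these assumptions: `emit_recreate_cmds` is a hypothetical helper, it only prints the commands (a dry run) and executes nothing, it skips CRSUSER__ASM_ entries because credfix re-creates them, and for SYS it only restores sysasm, as done in this step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a saved `asmcmd lspwusr` listing on stdin and
# print the asmcmd orapwusr commands needed to restore users and grants.
emit_recreate_cmds() {
    awk '
        $1 == "Username"     { next }   # header row
        $1 ~ /^CRSUSER__ASM_/ { next }  # handled by credfix
        NF < 4               { next }
        {
            if ($1 != "SYS") {
                print "asmcmd orapwusr --add " $1
                if ($2 == "TRUE") print "asmcmd orapwusr --grant sysdba " $1
                if ($3 == "TRUE") print "asmcmd orapwusr --grant sysoper " $1
            }
            if ($4 == "TRUE") print "asmcmd orapwusr --grant sysasm " $1
        }
    '
}

# Demo with the listing taken in step 1.
emit_recreate_cmds <<'EOF'
Username sysdba sysoper sysasm
SYS TRUE TRUE TRUE
ASMSNMP TRUE FALSE FALSE
ORACLE_148 TRUE FALSE FALSE
CRSUSER__ASM_004 TRUE FALSE TRUE
EOF
```

For the step 1 listing this yields the same five commands as above, in a different order. Review the printed commands before running them.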
7. Restart CRS to verify the recovery
Restart CRS
bash
[root@rac1 bin]# ./crsctl stop crs
[root@rac2 bin]# ./crsctl stop crs
[root@rac1 bin]# ./crsctl start crs
[root@rac2 bin]# ./crsctl start crs
Check the status.
bash
[grid@rac1 asm]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
ora.chad
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
ora.net1.network
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
ora.ons
ONLINE ONLINE rac1 STABLE
ONLINE ONLINE rac2 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 ONLINE OFFLINE STABLE
ora.DATANEW.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_DATA.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_FRA.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.DG_OCR.dg(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac2 STABLE
ora.asm(ora.asmgroup)
1 ONLINE ONLINE rac1 Started,STABLE
2 ONLINE ONLINE rac2 Started,STABLE
3 OFFLINE OFFLINE STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 ONLINE ONLINE rac1 STABLE
2 ONLINE ONLINE rac2 STABLE
3 OFFLINE OFFLINE STABLE
ora.cvu
1 ONLINE ONLINE rac2 STABLE
ora.emrep.db
1 ONLINE ONLINE rac1 Open,HOME=/u01/app/o
racle/product/19.3.0
/dbhome_1,STABLE
2 ONLINE ONLINE rac2 Open,HOME=/u01/app/o
racle/product/19.3.0
/dbhome_1,STABLE
ora.qosmserver
1 ONLINE ONLINE rac2 STABLE
ora.rac1.vip
1 ONLINE ONLINE rac1 STABLE
ora.rac2.vip
1 ONLINE ONLINE rac2 STABLE
ora.scan1.vip
1 ONLINE ONLINE rac2 STABLE
--------------------------------------------------------------------------------
The cluster starts normally, so the recovery succeeded.