Oracle RAC ADG standby reports ORA-04021: timeout occurred while waiting to lock object

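Problem: over the past two days the instances of a core disaster-recovery RAC ADG standby database have restarted frequently, reporting the errors below. A search on MOS shows that this is a known bug.

The alert log of the ADG standby shows the following errors: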

Errors in file /u01/app/oracle/diag/rdbms/hxxxsz/hxxxsz1/trace/hxxxsz1_lgwr_69711.trc:

ORA-04021: timeout occurred while waiting to lock object

Mon Dec 16 16:26:15 2024

ORA-01555 caused by SQL statement below (SQL ID: 87gaftwrm2h68, Query Duration=899 sec, SCN: 0x05cf.7a01a7dc):

select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname from obj$ o where o.obj#=:1

LGWR (ospid: 69711): terminating the instance due to error 4021

Mon Dec 16 16:26:15 2024

System state dump requested by (instance=1, osid=69711 (LGWR)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/hxxxsz/hxxxsz1/trace/hxxxsz1_diag_69557_20241216162615.trc

Mon Dec 16 16:26:15 2024

ORA-1092 : opitsk aborting process

Mon Dec 16 16:26:16 2024

License high water mark = 1321

Instance terminated by LGWR, pid = 69711

USER (ospid: 42412): terminating the instance

Instance terminated by USER, pid = 42412

Mon Dec 16 16:26:23 2024

Starting ORACLE instance (normal)

************************ Large Pages Information *******************

Per process system memlock (soft) limit = UNLIMITED

Solution:

  1. Check the hidden parameter:

SELECT ksppinm, ksppstvl, ksppdesc FROM x$ksppi x, x$ksppcv y WHERE x.indx = y.indx AND ksppinm = '_adg_parselock_timeout';

KSPPINM                 KSPPSTVL  KSPPDESC
----------------------  --------  ------------------------------------------------
_adg_parselock_timeout  0         timeout for parselock get on ADG in centiseconds

  2. Run the following statement (500 centiseconds = 5 seconds):

alter system set "_adg_parselock_timeout"=500 scope=both sid='*';
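After the change, the new value and the retry counter mentioned in the MOS note can be checked. The queries below are a minimal sketch, not part of the original fix: they assume a SYS connection (the x$ views are visible only to SYS and are instance-local) and assume the counter is exposed through gv$sysstat under a name beginning with 'ADG parselock', as the note's "ADG parselock X get attempts" suggests.

```sql
-- Confirm the value now in effect; x$ views are per instance,
-- so run this on each RAC instance of the standby.
SELECT x.ksppinm, y.ksppstvl
  FROM x$ksppi x, x$ksppcv y
 WHERE x.indx = y.indx
   AND x.ksppinm = '_adg_parselock_timeout';

-- Watch the parse-lock counters on all instances.
-- The LIKE pattern is an assumption based on the statistic name quoted in the note.
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name LIKE 'ADG parselock%'
 ORDER BY inst_id, name;
```

According to the note, a get-attempts value that climbs quickly after the change would suggest the timeout is too small, because LGWR keeps timing out and retrying.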

The referenced MOS note reads as follows:

ORA-04021: timeout occurred while waiting to lock object : DR Instance terminated by LGWR (Doc ID 2183882.1)

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Information in this document applies to any platform.

Symptoms

DR database crashed with the errors below:

Client address: (ADDRESS=(PROTOCOL=<protocol>)(HOST=<hostname>)(PORT=<port>))
WARNING: inbound connection timed out (ORA-3136)
Wed Jul 13 13:43:24 2016
Errors in file /<path>/diag/rdbms/<db_name>/<oracle_sid>/trace/<oracle_sid>_lgwr_<pid>.trc:
ORA-04021: timeout occurred while waiting to lock object
LGWR (ospid: 31312): terminating the instance due to error 4021
Wed Jul 13 13:43:24 2016
System state dump requested by (instance=1, osid=31312 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /<path>/diag/rdbms/<db_name>/<oracle_sid>/trace/<oracle_sid>_diag_<pid>.trc
Wed Jul 13 13:43:25 2016
License high water mark = 318
Instance terminated by LGWR, pid = 31312
USER (ospid: 20898): terminating the instance
Instance terminated by USER, pid = 20898
Wed Jul 13 13:43:39 2016
Starting ORACLE instance (normal)

Cause

Bug 16717701 - ADG SHOULD GET THE INSTANCE PARSE LOCK WITH A TIMEOUT ------> Superseded by bug fix Bug 17018214
Bug 11712267 - ACTIVE DATA GUARD DATABASE HUNG ON 'LIBRARY CACHE: MUTEX X' WAIT EVENT

LGWR trace file (RXEPRR1_lgwr_31312.trc):

*** 2016-07-13 13:43:24.498
*** SESSION ID:(6709.1) 2016-07-13 13:43:24.498
*** CLIENT ID:() 2016-07-13 13:43:24.498
*** SERVICE NAME:(SYS$BACKGROUND) 2016-07-13 13:43:24.498
*** MODULE NAME:() 2016-07-13 13:43:24.498
*** ACTION NAME:() 2016-07-13 13:43:24.498

error 4021 detected in background process
ORA-04021: timeout occurred while waiting to lock object
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+1296<-kjzdicrshnfy()+364<-ksuitm()+1688<-ksbrdp()+4296<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
----- End of Abridged Call Stack Trace -----

Solution

The issue matches bug 11712267 and bug 16717701. Since two bugs match the case, you can try option (1) first.

Option (1): As per Bug 11712267, change cursor_sharing to FORCE on the Active Data Guard (ADG) database. Monitor your environment for some time. If it crashes again, proceed with option (2).

Option (2): As per the bug description, LGWR can request the DBINSTANCE lock in X mode without any timeout, which can lead to a hang / deadlock. Both fixes are already included in 11.2.0.4, but the fix is DISABLED by default.

==> To ENABLE the fix one has to set
==> "_adg_parselock_timeout"
==> to the number of centiseconds
==> LGWR should wait before backing off and retrying the request.

The value should be in centiseconds.

==> I don't think there is really any hard and fast rule for a value - at the default (0) it will not time out. A value representing a few seconds seems reasonable - if LGWR has been stuck for, say, 5 seconds waiting, it seems a reasonable guess that it is not going to get the lock. The parameter just causes it to abort the current attempt and retry.

If you want to play safe, you can start with a higher value and decrease it later. A higher value will just mean more sessions blocked for longer in case of the deadlock situation. 500 seems reasonable, but I have no data to base it on. There should be a statistic "ADG parselock X get attempts". If the parameter gets set too small, that value would likely increase a lot because LGWR keeps timing out and retrying.

This is a dynamic parameter.

Follow option (1): change cursor_sharing to FORCE on ADG.
If the issue reappears, follow option (2) as below.
Please set "_adg_parselock_timeout" to 500:

SQL> alter system set "_adg_parselock_timeout"=500 scope=both sid='*';
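Option (1) from the note can be sketched as follows. The note gives no exact command for it, so the statements below are an assumption: they reuse the scope=both sid='*' clauses from the note's own command for the hidden parameter and presume the standby runs from an spfile.

```sql
-- Record the current setting on every instance before changing it,
-- so it can be restored later if needed.
SELECT inst_id, name, value
  FROM gv$parameter
 WHERE name = 'cursor_sharing';

-- Option (1): force cursor sharing on all ADG instances.
-- SCOPE and SID clauses are assumptions, mirroring the note's command
-- for "_adg_parselock_timeout".
ALTER SYSTEM SET cursor_sharing = FORCE SCOPE=BOTH SID='*';
```

FORCE replaces literals with system-generated bind variables, which can change execution plans for read-only reporting queries on the standby, so it is worth monitoring query behaviour after the change.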
