Oracle19C低版本一天遭遇两BUG(ORA-04031/ORA-600)

昨天帮朋友看一个系统异常卡顿的案例,在这里分享给大家

环境:Exadata X8M 数据库版本19.11

1.系统报错信息

表象为系统卡顿,页面无法刷出,登陆到主机上看到节点1 系统等待存在大量的 cursor: pin S wait on X等待

查看两个节点的alert log 看到有大量的ORA-04031报错

复制代码
2025-04-15T14:43:53.522183+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_m000_342669.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01000: maximum open cursors exceeded
2025-04-15T14:44:50.515707+08:00
DDE: Problem Key 'ORA 4031' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2025-04-15T14:46:29.968518+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_mz08_287162.trc:
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2798","kglseshtTable")
2025-04-15T14:46:30.005517+08:00
Process MZ08 died, see its trace file

trace file信息
Trace file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_mz08_287162.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.11.0.0.0
*** 2025-04-15T14:46:29.968405+08:00 (CDB$ROOT(1))
<error barrier> at 0x7ffceea153e8 placed ksv.c@7147
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2798","kglseshtTable")
OPIRIP: Uncaught error 447. Error stack:

*** 2025-04-15T14:46:29.969405+08:00
ORA-00447: fatal error in background process
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2798","kglseshtTable")

MOS新加AI搜索能力

ORA-04031涉及到的bug非常多,遇到这类问题 优先查MOS,这里简单介绍一下MOS新推出的 AI搜索能力,可以根据你提供的报错 比较精准的给出搜索结果,并根据相关性将引用的文档列出来,这个功能非常好

根据AI的提示 简单翻译一下

请注意:服务提供的信息可能存在不准确之处,应进行独立验证。本解答适用于:Oracle Database - Enterprise Edition - 版本19.3.0.0.0至19.14.0.0.0

错误消息ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2798","kglseshtTable")表明数据库无法从共享池分配所需的共享内存。

根据《数据库错误消息》文档,该错误的成因是所需共享内存超过了当前可用量。

解决方案如下:

  1. 增加初始化参数SGA_TARGET、MEMORY_SIZE或MEMORY_TARGET的值

  2. 若无法增加上述参数,则降低DB_CACHE_SIZE的值(如已设置)

  3. 若未使用SGA_TARGET、MEMORY_SIZE或MEMORY_TARGET,则增加对应内存池的大小:

    • 共享池不足时增加SHARED_POOL_SIZE

    • IMC堆不足时增加INMEMORY_SIZE

另一份相关文档《ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1489","kglseshtTable")》提供了更具体的错误分析。

该错误由产品缺陷引起,已在未公开的Bug 33647820中记录。解决方案为:

  1. 应用19.15或更高版本的补丁集(该问题已标记为修复)

  2. 或应用适用于您版本和平台的补丁33647820(如存在)

检查补丁冲突请使用My Oracle Support (MOS)的Patch Planner工具。若对应平台和版本无可用补丁,请联系Oracle技术支持申请修复。

2.原因

很明显这个报错是因为触发了一个为 unpublished 的Bug ,根据文档 ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1489","kglseshtTable") (Doc ID 2923103.1)

这个报错和文档中的描述完全一致,大概猜测是数据库实例运行太久 造成的share pool imbalance,该问题在19.15被解决,19.15之前版本BUG还是挺多的,如果有条件可以考虑升级到19.20+

话说Exadata稳定性还是非常强大的,这台机器从入场至今1387天(接近四年了),没有重启过,出问题这个实例也没有重启过, 如果不是这次遭遇BUG 应该还能跑很久。

3.解决方案

根据以上MOS的信息,有一下几种处理方式

紧急处理方式,强制刷新share pool

复制代码
alter system flush shared_pool;

或者重启数据库 很多内存相关的bug可以通过重启数据库来解决,毕竟打补丁现在来不及;我这里选择的处理方案是轮流重启两个节点;

然而因为这个实例已经运行了太久 shutdown用了好长时间,并有大量pid 都需要手动kill,一下报出几百个PID,还好现在AI比较强大直接将这部分log丢给deepseek,让它把pid筛选出来就好了。

复制代码
PDBPRO(6):Active process 279643 user 'grid' program 'oracle@test.com.cn', waiting for 'SQL*Net message from client'
PDBPRO(6):
PDBPRO(6):Active process 124660 user 'grid' program 'oracle@test.com.cn', waiting for 'SQL*Net message from client'
PDBPRO(6):
PDBPRO(6):Active process 246421 user 'grid' program 'oracle@test.com.cn', waiting for 'SQL*Net message from client'
PDBPRO(6):
PDBPRO(6):Active process 224818 user 'grid' program 'oracle@test.com.cn', waiting for 'read by other session'
PDBPRO(6):
PDBPRO(6):Active process 124650 user 'grid' program 'oracle@test.com.cn', waiting for 'SQL*Net message from client'
PDBPRO(6):

4.重启再遇到BUG

节点2重启正常,但是在节点1重启时发现只能mount 无法OPEN 关键部分报错如下

复制代码
ALTER DATABASE OPEN /* db agent *//* {0:4:346} */

2025-04-15T19:47:52.586367+08:00
CTWR started with pid=89, OS id=33744
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc (incident=246935) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 52011112 bytes of shared memory ("large pool","unknown object","large pool","CTWR dba buffer")
Incident details in: /u01/app/oracle/diag/rdbms/test1/test11/incident/incdir_246935/test11_ctwr_33744_i246935.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-04-15T19:47:53.513579+08:00
ORA-04031 heap dump being written to trace file /u01/app/oracle/diag/rdbms/test1/test11/incident/incdir_246935/test11_ctwr_33744_i246935.trc
2025-04-15T19:47:54.106979+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc  (incident=246936) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [krcpasb_initial_alloc_failure], [3250176], [], [], [], [], [], [], [], [], [], []
ORA-04031: unable to allocate 52011112 bytes of shared memory ("large pool","unknown object","large pool","CTWR dba buffer")
Incident details in: /u01/app/oracle/diag/rdbms/test1/test11/incident/incdir_246936/test11_ctwr_33744_i246936.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-04-15T19:47:54.598420+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc:
ORA-00600: internal error code, arguments: [krcpasb_initial_alloc_failure], [3250176], [], [], [], [], [], [], [], [], [], []
ORA-04031: unable to allocate 52011112 bytes of shared memory ("large pool","unknown object","large pool","CTWR dba buffer")
2025-04-15T19:47:54.598589+08:00
The change tracking error 600.
2025-04-15T19:47:54.598742+08:00
Stopping background process CTWR
2025-04-15T19:47:54.599120+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc:
ORA-00600: internal error code, arguments: [krcpasb_initial_alloc_failure], [3250176], [], [], [], [], [], [], [], [], [], []
ORA-04031: unable to allocate 52011112 bytes of shared memory ("large pool","unknown object","large pool","CTWR dba buffer")
2025-04-15T19:47:54.600494+08:00
Dumping diagnostic data in directory=[cdmp_20250415194754], requested by (instance=1, osid=33744 (CTWR)), summary=[incident=246936].
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc  (incident=246937) (PDBNAME=CDB$ROOT):
ORA-487 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/oracle/diag/rdbms/test1/test11/incident/incdir_246937/test11_ctwr_33744_i246937.trc

错误发生在启动 CTWR(Change Tracking Writer)进程时,最终导致 实例终止(instance crash)

4.1 CTWR 进程启动失败

CTWR 是 Change Tracking Writer,用于实现 增量备份变更跟踪(Block Change Tracking) 功能。它启动时尝试在 large pool 中分配大块内存失败,引发了 ORA-4031:

"large pool", "CTWR dba buffer"

4.2 ORA-00600 + ORA-4031 的组合说明这是一个严重的系统级错误

  • ORA-00600 [krcpasb_initial_alloc_failure]内部内存分配失败

  • 错误位置在 Oracle kernel 模块 krcp* 系列,属于 change tracking 内部模块

  • 后续的 进程中止、系统状态转储、实例终止 都是级联故障结果

4.3是否命中 Oracle 官方 Bug?

查MOS 很快找到和这个报错和BUG Bug 32428097 高度一致!

Bug 32428097 - ORA-600 [krcpasb_initial_alloc_failure] during CTWR startup

说明

  • Oracle 19.x 在使用 change tracking 时,CTWR 在启动期间分配内存失败,触发 ORA-04031 + ORA-00600 + 实例崩溃。

  • 这是 Oracle 确认的回归问题(Regression Bug)在19.13修复。

  • 常见于:

    • 大量数据变更(如恢复、测试环境还原)

    • large pool 不足

    • 某些版本升级后首次启用 change tracking

4.4解决方案

暂时关闭block change track

复制代码
ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;

5.总结

截止至2025年4月16日 Oracle19C已经更新至19.27,我认为至少在未来五年内,19c仍然会是主力版本;当然拉如果没有遭遇BUG,理论上可以不打补丁的,但是为了系统的稳定,仍然建议将19C升级至19.20+ (保守点19.15+)

附录oracle各版本支持时间线。

参考文档:

ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1489","kglseshtTable") (Doc ID 2923103.1)

Bug 32428097 - BCT: CTWR crashes during "_bct_public_dba_buffer_size" reset with ORA-00600 [krcpasb_initial_alloc_failure] & ORA-4031 (Doc ID 32428097.8)

Release Schedule of Current Database Releases (Doc ID 742060.1)

相关推荐
柊二三38 分钟前
XML的简略知识点
xml·数据库·oracle
每天敲200行代码3 小时前
MySQL 事务管理
数据库·mysql·事务
巴里巴气3 小时前
MongoDB索引及其原理
数据库·mongodb
程序员勋勋13 小时前
Redis的String数据类型底层实现
数据库·redis·缓存
不修×蝙蝠4 小时前
MySQL 全详解:从入门到精通的实战指南
数据库·mysql·索引·最左前缀
我的ID配享太庙呀5 小时前
Django 科普介绍:从入门到了解其核心魅力
数据库·后端·python·mysql·django·sqlite
不辉放弃6 小时前
kafka的消费者负载均衡机制
数据库·分布式·kafka·负载均衡
拉姆哥的小屋6 小时前
用 Flask 打造宠物店线上平台:从 0 到 1 的全栈开发实践
数据库·oracle·flask
liliangcsdn7 小时前
mac neo4j install & verifcation
数据库·neo4j
Cyanto7 小时前
MyBatis-Plus高效开发实战
java·开发语言·数据库