Datafile read errors in a RAC environment causing an instance restart

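Recently, at one customer site, a datafile hit read/write errors that caused the database instance to restart. The read/write errors themselves were probably caused by a storage I/O problem at the time. After the restart, however, the header of the affected datafile was no longer being updated, and reading data stored in that file returned errors. The symptoms were as follows: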

1. Alert log: the datafile header read fails and the CKPT process terminates the instance

Wed Nov 05 13:05:31 2025

Read of datafile '+DATA/hisdb/datafile/emr5202204.dbf' (fno 136) header failed with ORA-01208

Rereading datafile 136 header failed with ORA-01208

Errors in file /u01/app/oracle/diag/rdbms/hisdb/hisdb1/trace/hisdb1_ckpt_24373.trc:

ORA-63999: data file suffered media failure

ORA-01122: database file 136 failed verification check

ORA-01110: data file 136: '+DATA/hisdb/datafile/emr5202204.dbf'

ORA-01208: data file is an old version - not accessing current version

Errors in file /u01/app/oracle/diag/rdbms/hisdb/hisdb1/trace/hisdb1_ckpt_24373.trc:

ORA-63999: data file suffered media failure

ORA-01122: database file 136 failed verification check

ORA-01110: data file 136: '+DATA/hisdb/datafile/emr5202204.dbf'

ORA-01208: data file is an old version - not accessing current version

CKPT (ospid: 24373): terminating the instance due to error 63999

Wed Nov 05 13:05:31 2025

System state dump requested by (instance=1, osid=24373 (CKPT)), summary=[abnormal instance termination].
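To spot this failure pattern in other alert logs, a small scanner can pull the file number, path and error code out of the header-read message. The following is a minimal Python sketch. It assumes the message format is exactly as shown in the excerpt above, and the function name `find_header_failures` is hypothetical. The regex may need adjusting for other Oracle versions.

```python
import re
from typing import List, Tuple

# Matches alert-log lines such as (format assumed from the excerpt above):
# Read of datafile '+DATA/.../x.dbf' (fno 136) header failed with ORA-01208
HEADER_FAIL = re.compile(
    r"Read of datafile '(?P<path>[^']+)' \(fno (?P<fno>\d+)\) "
    r"header failed with (?P<err>ORA-\d{5})"
)


def find_header_failures(lines: List[str]) -> List[Tuple[int, str, str]]:
    """Return (file number, datafile path, ORA error) for each header-read failure."""
    hits = []
    for line in lines:
        m = HEADER_FAIL.search(line)
        if m:
            hits.append((int(m.group("fno")), m.group("path"), m.group("err")))
    return hits


if __name__ == "__main__":
    sample = [
        "Read of datafile '+DATA/hisdb/datafile/emr5202204.dbf' (fno 136) header failed with ORA-01208",
        "Rereading datafile 136 header failed with ORA-01208",
        "CKPT (ospid: 24373): terminating the instance due to error 63999",
    ]
    print(find_header_failures(sample))
```

Only the first sample line matches. The "Rereading ..." line has a different shape and is intentionally ignored.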

2. After the restart: comparing the checkpoint SCN in each datafile header with the SCN recorded in the controlfile

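Primary database (columns: FILE#, controlfile checkpoint SCN, datafile header checkpoint SCN, STATUS, FUZZY):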

134 88813504782 88813504782 ONLINE YES

135 88813504782 88813504782 ONLINE YES

136 88547412368 ====>> 88546077727 ONLINE YES ====>>

137 88813504782 88813504782 ONLINE YES

138 88813504782 88813504782 ONLINE YES

139 88813504782 88813504782 ONLINE YES

Standby database:

134 88815822076 88815822076 ONLINE YES

135 88815822076 88815822076 ONLINE YES

136 88815822076 ====>> 88815822076 ONLINE YES ====>>

137 88815822076 88815822076 ONLINE YES

138 88815822076 88815822076 ONLINE YES

139 88815822076 88815822076 ONLINE YES

140 88815822076 88815822076 ONLINE YES
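The comparison above comes down to a simple rule. A file whose header checkpoint SCN is behind its own controlfile entry, or behind the checkpoint SCN that the other files share, is the odd one out. The following is a minimal Python sketch of that rule. It assumes the rows have already been fetched from v$datafile and v$datafile_header. The function `lagging_files` is hypothetical and is only a heuristic, not Oracle's own verification logic.

```python
from collections import Counter
from typing import List, Tuple

# (file#, controlfile checkpoint SCN, datafile-header checkpoint SCN)
Row = Tuple[int, int, int]


def lagging_files(rows: List[Row]) -> List[int]:
    """Return file numbers whose header checkpoint SCN looks stale."""
    if not rows:
        return []
    # Treat the most common controlfile checkpoint SCN as the "current" one.
    current = Counter(ctl for _, ctl, _ in rows).most_common(1)[0][0]
    stale = []
    for fno, ctl_scn, hdr_scn in rows:
        # Behind its own controlfile entry (file 136 on the primary),
        # or behind the checkpoint that the other files have reached.
        if hdr_scn < ctl_scn or hdr_scn < current:
            stale.append(fno)
    return stale


if __name__ == "__main__":
    primary = [(n, 88813504782, 88813504782) for n in (134, 135, 137, 138, 139)]
    primary.append((136, 88547412368, 88546077727))
    standby = [(n, 88815822076, 88815822076) for n in range(134, 141)]
    print(lagging_files(primary), lagging_files(standby))
```

With the primary rows from the table above, only file 136 is reported. The standby rows produce an empty list, which matches the observation that the standby copy of the file is consistent.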

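After some analysis, we decided on the following approach:

1. Without restarting the database, take the datafile OFFLINE, recover it by applying archived logs, and bring it back ONLINE once recovery succeeds. (This approach recovered the file successfully.)

2. If approach 1 fails, back up this datafile on the standby and restore it on the primary.

First, confirm that the required archived logs still exist. Fortunately, they did.

Query for the corresponding archived logs: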

SQL> select * from (select THREAD#,SEQUENCE#,FIRST_TIME,NEXT_TIME,FIRST_CHANGE# from gv$archived_log where FIRST_CHANGE#>88546077727) where rownum<11;

THREAD# SEQUENCE# FIRST_TIME NEXT_TIME FIRST_CHANGE#


1 39432 2025-11-05 12:14:47 2025-11-05 13:00:08 8.8547E+10

1 39432 2025-11-05 12:14:47 2025-11-05 13:00:08 8.8547E+10

1 39433 2025-11-05 13:00:08 2025-11-05 13:06:16 8.8551E+10

1 39433 2025-11-05 13:00:08 2025-11-05 13:06:16 8.8551E+10

2 37816 2025-11-05 12:44:47 2025-11-05 13:06:17 8.8550E+10

1 39434 2025-11-05 13:06:16 2025-11-05 13:06:16 8.8551E+10

1 39434 2025-11-05 13:06:16 2025-11-05 13:06:16 8.8551E+10

2 37816 2025-11-05 12:44:47 2025-11-05 13:06:17 8.8550E+10

1 39435 2025-11-05 13:07:04 2025-11-05 13:07:04 8.8552E+10

1 39435 2025-11-05 13:07:04 2025-11-05 13:07:04 8.8552E+10
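The query above filters on FIRST_CHANGE# > 88546077727. As a result, it does not list the log that actually contains that SCN. Those are thread 1 seq 39431 and thread 2 seq 37815, which RECOVER asks for later. An archived log covers the SCN range [FIRST_CHANGE#, NEXT_CHANGE#), so filtering on NEXT_CHANGE# catches the first log needed in each thread. The following is a minimal Python sketch of that selection. The helper name `logs_needed` is hypothetical, and the SCN boundaries in the sample are made-up values chosen only for illustration.

```python
from typing import Dict, List, NamedTuple, Set


class ArchivedLog(NamedTuple):
    thread: int
    sequence: int
    first_change: int  # gv$archived_log.FIRST_CHANGE#
    next_change: int   # gv$archived_log.NEXT_CHANGE#


def logs_needed(logs: List[ArchivedLog], start_scn: int) -> Dict[int, List[int]]:
    """Per thread, the sequences that recovery starting at start_scn must read."""
    needed: Dict[int, Set[int]] = {}
    for log in logs:
        # A log is needed if its range ends after start_scn; this includes
        # the log containing start_scn itself.
        if log.next_change > start_scn:
            needed.setdefault(log.thread, set()).add(log.sequence)
    # gv$archived_log can return the same log more than once (each log appears
    # twice in the output above); the sets remove those duplicates.
    return {t: sorted(seqs) for t, seqs in sorted(needed.items())}


if __name__ == "__main__":
    # Hypothetical SCN boundaries, for illustration only.
    logs = [
        ArchivedLog(1, 39431, 88_540_000_000, 88_547_000_000),
        ArchivedLog(1, 39432, 88_547_000_000, 88_551_000_000),
        ArchivedLog(1, 39432, 88_547_000_000, 88_551_000_000),
        ArchivedLog(2, 37814, 88_500_000_000, 88_530_000_000),
        ArchivedLog(2, 37815, 88_530_000_000, 88_550_000_000),
        ArchivedLog(2, 37816, 88_550_000_000, 88_552_000_000),
    ]
    print(logs_needed(logs, 88_546_077_727))
```

With these sample values, the result includes 39431 and 37815. A FIRST_CHANGE# filter would have skipped both logs.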

3. Take the datafile OFFLINE and recover it

SQL> select file_id,file_name,tablespace_name,status,online_status from dba_data_files where file_id=136;

FILE_ID FILE_NAME                           TABLESPACE_NAME STATUS    ONLINE_

136     +DATA/hisdb/datafile/emr5202204.dbf EMR52022        AVAILABLE ONLINE

SQL> alter database datafile 136 offline;

Database altered.

SQL> select file_id,file_name,tablespace_name,status,online_status from dba_data_files where file_id=136;

FILE_ID FILE_NAME                           TABLESPACE_NAME STATUS    ONLINE_

136     +DATA/hisdb/datafile/emr5202204.dbf EMR52022        AVAILABLE RECOVER

SQL> set linesize 180 pagesize 100

SQL> col ctl-CHECKPOINT_CHANGE# for 99999999999999999

SQL> col dbf-CHECKPOINT_CHANGE# for 99999999999999999

SQL> select ctl.FILE#,ctl.CHECKPOINT_CHANGE# as "ctl-CHECKPOINT_CHANGE#",ctl.LAST_CHANGE# as "ctl-LAST_CHANGE#",

2 dbf.CHECKPOINT_CHANGE# as "dbf-CHECKPOINT_CHANGE#",dbf.status,dbf.fuzzy from v$datafile ctl,v$datafile_header dbf where ctl.file#=dbf.file# and dbf.file# in(135,136);

FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ


135 88822205014 88822205014 ONLINE YES

136 88547412368 8.8824E+10 88546077727 OFFLINE YES

SQL> recover datafile 136;

ORA-00279: change 88546077727 generated at 11/05/2025 12:02:45 needed for thread 2

ORA-00289: suggestion : +ARCH/hisdb/archivelog/2025_11_05/thread_2_seq_37815.4146.1216385087

ORA-00280: change 88546077727 for thread 2 is in sequence #37815

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

auto

ORA-00279: change 88546077727 generated at 11/05/2025 11:45:44 needed for thread 1

ORA-00289: suggestion : +ARCH/hisdb/archivelog/2025_11_05/thread_1_seq_39431.4039.1216383287

ORA-00280: change 88546077727 for thread 1 is in sequence #39431

ORA-00279: change 88818174303 generated at 11/07/2025 16:07:01 needed for thread 1

ORA-00289: suggestion : +ARCH/hisdb/archivelog/2025_11_07/thread_1_seq_39527.4437.1216571273

ORA-00280: change 88818174303 for thread 1 is in sequence #39527

Log applied.

Media recovery complete.

SQL>

SQL>

SQL> select file_id,file_name,tablespace_name,status,online_status from dba_data_files where file_id=136;

FILE_ID FILE_NAME                           TABLESPACE_NAME STATUS    ONLINE_

136     +DATA/hisdb/datafile/emr5202204.dbf EMR52022        AVAILABLE OFFLINE

SQL> set linesize 180 pagesize 100

SQL> col ctl-CHECKPOINT_CHANGE# for 99999999999999999

SQL> col dbf-CHECKPOINT_CHANGE# for 99999999999999999

SQL> select ctl.FILE#,ctl.CHECKPOINT_CHANGE# as "ctl-CHECKPOINT_CHANGE#",ctl.LAST_CHANGE# as "ctl-LAST_CHANGE#",

2 dbf.CHECKPOINT_CHANGE# as "dbf-CHECKPOINT_CHANGE#",dbf.status,dbf.fuzzy from v$datafile ctl,v$datafile_header dbf where ctl.file#=dbf.file# and dbf.file# in(135,136);

FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ


135 88822357274 88822357274 ONLINE YES

136 88824117167 8.8824E+10 88824117167 OFFLINE NO
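The two snapshots of file 136 taken before and after RECOVER differ in the way that matters for the next step. The header checkpoint SCN has caught up with the controlfile, and FUZZY has changed from YES to NO. The following is a minimal Python sketch of that pre-check, assuming the values come from the same v$datafile / v$datafile_header query. `looks_ready_to_online` is a hypothetical helper and only a heuristic. The authoritative signals are whether the file still appears in v$recover_file and whether ALTER DATABASE DATAFILE 136 ONLINE succeeds.

```python
from typing import NamedTuple


class FileState(NamedTuple):
    file_no: int
    ctl_checkpoint_scn: int  # v$datafile.CHECKPOINT_CHANGE#
    hdr_checkpoint_scn: int  # v$datafile_header.CHECKPOINT_CHANGE#
    status: str              # v$datafile_header.STATUS
    fuzzy: str               # v$datafile_header.FUZZY


def looks_ready_to_online(f: FileState) -> bool:
    """Heuristic: an offline file whose header matches the controlfile and is not fuzzy."""
    return (
        f.status == "OFFLINE"
        and f.hdr_checkpoint_scn == f.ctl_checkpoint_scn
        and f.fuzzy == "NO"
    )


if __name__ == "__main__":
    before = FileState(136, 88547412368, 88546077727, "OFFLINE", "YES")
    after = FileState(136, 88824117167, 88824117167, "OFFLINE", "NO")
    print(looks_ready_to_online(before), looks_ready_to_online(after))
```

Applied to the rows above, the pre-recovery snapshot fails the check and the post-recovery snapshot passes it.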

SQL> alter database datafile 136 online;

Database altered.

SQL> set linesize 180 pagesize 100

SQL> col ctl-CHECKPOINT_CHANGE# for 99999999999999999

SQL> col dbf-CHECKPOINT_CHANGE# for 99999999999999999

SQL> select ctl.FILE#,ctl.CHECKPOINT_CHANGE# as "ctl-CHECKPOINT_CHANGE#",ctl.LAST_CHANGE# as "ctl-LAST_CHANGE#",

2 dbf.CHECKPOINT_CHANGE# as "dbf-CHECKPOINT_CHANGE#",dbf.status,dbf.fuzzy from v$datafile ctl,v$datafile_header dbf where ctl.file#=dbf.file# and dbf.file# in(135,136);

FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ


135 88822357274 88822357274 ONLINE YES

136 88824767977 88824767977 ONLINE YES

At this point the SCN in the datafile header had been recovered. After a manual checkpoint, its SCN matched that of the other datafiles:

SQL> alter system checkpoint;

System altered.

SQL> set linesize 180 pagesize 100

SQL> col ctl-CHECKPOINT_CHANGE# for 99999999999999999

SQL> col dbf-CHECKPOINT_CHANGE# for 99999999999999999

SQL> select ctl.FILE#,ctl.CHECKPOINT_CHANGE# as "ctl-CHECKPOINT_CHANGE#",ctl.LAST_CHANGE# as "ctl-LAST_CHANGE#",

2 dbf.CHECKPOINT_CHANGE# as "dbf-CHECKPOINT_CHANGE#",dbf.status,dbf.fuzzy from v$datafile ctl,v$datafile_header dbf where ctl.file#=dbf.file# and dbf.file# in(135,136,138);

FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ


135 88825188862 88825188862 ONLINE YES

136 88825188862 88825188862 ONLINE YES

138 88825188862 88825188862 ONLINE YES

Queries against data in this tablespace now returned normally, and an RMAN validation also found no problems:

RMAN> backup validate check logical datafile 136;
