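Recently, at a customer site, a datafile hit read/write errors that caused the database instance to restart. The I/O errors themselves were probably due to a storage problem at the time, but after the restart the affected datafile's header was no longer being updated, and reading data from that file raised errors. The symptoms were as follows:
1. Alert log entries showing the CKPT process terminating the instance after the read error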
Wed Nov 05 13:05:31 2025
Read of datafile '+DATA/hisdb/datafile/emr5202204.dbf' (fno 136) header failed with ORA-01208
Rereading datafile 136 header failed with ORA-01208
Errors in file /u01/app/oracle/diag/rdbms/hisdb/hisdb1/trace/hisdb1_ckpt_24373.trc:
ORA-63999: data file suffered media failure
ORA-01122: database file 136 failed verification check
ORA-01110: data file 136: '+DATA/hisdb/datafile/emr5202204.dbf'
ORA-01208: data file is an old version - not accessing current version
Errors in file /u01/app/oracle/diag/rdbms/hisdb/hisdb1/trace/hisdb1_ckpt_24373.trc:
ORA-63999: data file suffered media failure
ORA-01122: database file 136 failed verification check
ORA-01110: data file 136: '+DATA/hisdb/datafile/emr5202204.dbf'
ORA-01208: data file is an old version - not accessing current version
CKPT (ospid: 24373): terminating the instance due to error 63999
Wed Nov 05 13:05:31 2025
System state dump requested by (instance=1, osid=24373 (CKPT)), summary=[abnormal instance termination].
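2. After the restart, compare each datafile's header checkpoint SCN with the checkpoint SCN recorded in the controlfile (columns: file#, controlfile checkpoint SCN, datafile header checkpoint SCN, status, fuzzy; the arrows mark file 136)
Primary: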
134 88813504782 88813504782 ONLINE YES
135 88813504782 88813504782 ONLINE YES
136 88547412368 ====>> 88546077727 ONLINE YES ====>>
137 88813504782 88813504782 ONLINE YES
138 88813504782 88813504782 ONLINE YES
139 88813504782 88813504782 ONLINE YES
Standby:
134 88815822076 88815822076 ONLINE YES
135 88815822076 88815822076 ONLINE YES
136 88815822076 ====>> 88815822076 ONLINE YES ====>>
137 88815822076 88815822076 ONLINE YES
138 88815822076 88815822076 ONLINE YES
139 88815822076 88815822076 ONLINE YES
140 88815822076 88815822076 ONLINE YES
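The comparison above can also be done mechanically once the rows are exported. The following is a minimal sketch, not part of the original procedure: it assumes the rows have already been pulled out as (file#, controlfile checkpoint SCN, header checkpoint SCN) tuples, and the function name `find_stale_headers` is made up for illustration. The sample values are copied from the primary-side listing.

```python
# Rows copied from the primary-side listing above:
# (file#, controlfile checkpoint SCN, datafile header checkpoint SCN)
ROWS = [
    (134, 88813504782, 88813504782),
    (135, 88813504782, 88813504782),
    (136, 88547412368, 88546077727),
    (137, 88813504782, 88813504782),
]


def find_stale_headers(rows):
    """Return (file#, ctl_scn, hdr_scn, gap) for files whose header SCN lags the controlfile SCN."""
    stale = []
    for file_no, ctl_scn, hdr_scn in rows:
        if hdr_scn < ctl_scn:
            stale.append((file_no, ctl_scn, hdr_scn, ctl_scn - hdr_scn))
    return stale


stale_files = find_stale_headers(ROWS)
for file_no, ctl_scn, hdr_scn, gap in stale_files:
    print(f"file {file_no}: header SCN {hdr_scn} is {gap} behind controlfile SCN {ctl_scn}")
```

With these rows only file 136 is reported. Note that the controlfile SCN for file 136 is itself behind the other files, which this simple check does not flag.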
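After some analysis, we decided to handle the problem as follows:
1. Without restarting the database, take the datafile OFFLINE, recover it from the archived logs, and bring it back ONLINE once recovery succeeds. (This option recovered the file successfully.)
2. If option 1 fails, back up the datafile from the standby and use it to restore and recover the file on the primary.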
First, check whether the required archived logs still exist. Fortunately, they did.
Query for the relevant archived logs:
SQL> select * from (select THREAD#,SEQUENCE#,FIRST_TIME,NEXT_TIME,FIRST_CHANGE# from gv$archived_log where FIRST_CHANGE#>88546077727) where rownum<11;
THREAD# SEQUENCE# FIRST_TIME NEXT_TIME FIRST_CHANGE#
1 39432 2025-11-05 12:14:47 2025-11-05 13:00:08 8.8547E+10
1 39432 2025-11-05 12:14:47 2025-11-05 13:00:08 8.8547E+10
1 39433 2025-11-05 13:00:08 2025-11-05 13:06:16 8.8551E+10
1 39433 2025-11-05 13:00:08 2025-11-05 13:06:16 8.8551E+10
2 37816 2025-11-05 12:44:47 2025-11-05 13:06:17 8.8550E+10
1 39434 2025-11-05 13:06:16 2025-11-05 13:06:16 8.8551E+10
1 39434 2025-11-05 13:06:16 2025-11-05 13:06:16 8.8551E+10
2 37816 2025-11-05 12:44:47 2025-11-05 13:06:17 8.8550E+10
1 39435 2025-11-05 13:07:04 2025-11-05 13:07:04 8.8552E+10
1 39435 2025-11-05 13:07:04 2025-11-05 13:07:04 8.8552E+10
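One detail worth checking against this output: the query filters on FIRST_CHANGE# greater than the header SCN, so the log that actually contains that SCN (its FIRST_CHANGE# is lower) is not listed, and `rownum<11` truncates the result anyway. The sketch below is only an illustration of that point. It takes the (thread#, sequence#) pairs shown above and the first sequences that RECOVER asks for later in this post (thread 1 sequence 39431, thread 2 sequence 37815) and reports which sequences are absent from the listing. The helper name `missing_sequences` is hypothetical.

```python
from collections import defaultdict

# (thread#, sequence#) pairs from the gv$archived_log output above;
# the duplicate rows are kept exactly as they appeared.
ARCHIVED = [
    (1, 39432), (1, 39432), (1, 39433), (1, 39433), (2, 37816),
    (1, 39434), (1, 39434), (2, 37816), (1, 39435), (1, 39435),
]

# First sequence per thread requested by RECOVER (from the ORA-00280 messages in this post).
FIRST_NEEDED = {1: 39431, 2: 37815}


def missing_sequences(rows, first_needed):
    """Return {thread: [sequence#s not listed]} from first_needed[thread] up to the highest listed sequence."""
    seen = defaultdict(set)
    for thread, seq in rows:
        seen[thread].add(seq)
    gaps = {}
    for thread, start in first_needed.items():
        have = seen.get(thread, set())
        last = max(have) if have else start
        missing = [s for s in range(start, last + 1) if s not in have]
        if missing:
            gaps[thread] = missing
    return gaps


gaps = missing_sequences(ARCHIVED, FIRST_NEEDED)
print(gaps)
```

Here the "missing" entries are exactly the two logs that contain the starting SCN. They were present on disk in this incident; they are simply excluded by the query's filter, so a check based on NEXT_CHANGE# would be a safer way to confirm coverage.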
3. Take the datafile OFFLINE and recover it
SQL> select file_id,file_name,tablespace_name,status,online_status from dba_data_files where file_id=136;
FILE_ID
FILE_NAME
TABLESPACE_NAME STATUS ONLINE_
136
+DATA/hisdb/datafile/emr5202204.dbf
EMR52022 AVAILABLE ONLINE
SQL> alter database datafile 136 offline;
Database altered.
SQL> select file_id,file_name,tablespace_name,status,online_status from dba_data_files where file_id=136;
FILE_ID
FILE_NAME
TABLESPACE_NAME STATUS ONLINE_
136
+DATA/hisdb/datafile/emr5202204.dbf
EMR52022 AVAILABLE RECOVER
SQL> set linesize 180 pagesize 100
SQL> col ctl-CHECKPOINT_CHANGE# for 99999999999999999
SQL> col dbf-CHECKPOINT_CHANGE# for 99999999999999999
SQL> select ctl.FILE#,ctl.CHECKPOINT_CHANGE# as "ctl-CHECKPOINT_CHANGE#",ctl.LAST_CHANGE# as "ctl-LAST_CHANGE#",
2 dbf.CHECKPOINT_CHANGE# as "dbf-CHECKPOINT_CHANGE#",dbf.status,dbf.fuzzy from v$datafile ctl,v$datafile_header dbf where ctl.file#=dbf.file# and dbf.file# in(135,136);
FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ
135 88822205014 88822205014 ONLINE YES
136 88547412368 8.8824E+10 88546077727 OFFLINE YES
SQL> recover datafile 136;
ORA-00279: change 88546077727 generated at 11/05/2025 12:02:45 needed for thread 2
ORA-00289: suggestion : +ARCH/hisdb/archivelog/2025_11_05/thread_2_seq_37815.4146.1216385087
ORA-00280: change 88546077727 for thread 2 is in sequence #37815
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00279: change 88546077727 generated at 11/05/2025 11:45:44 needed for thread 1
ORA-00289: suggestion : +ARCH/hisdb/archivelog/2025_11_05/thread_1_seq_39431.4039.1216383287
ORA-00280: change 88546077727 for thread 1 is in sequence #39431
ORA-00279: change 88818174303 generated at 11/07/2025 16:07:01 needed for thread 1
ORA-00289: suggestion : +ARCH/hisdb/archivelog/2025_11_07/thread_1_seq_39527.4437.1216571273
ORA-00280: change 88818174303 for thread 1 is in sequence #39527
Log applied.
Media recovery complete.
SQL>
SQL>
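When a recovery session spans many logs, it can help to extract the requested thread and sequence numbers from the RECOVER output for later cross-checking against the archive destination. The following is a rough sketch that only parses text already captured from the session. The regular expression is written against the ORA-00280 lines exactly as they appear above, and `requested_logs` is a made-up helper name.

```python
import re

# Message lines copied from the recovery session above.
RECOVER_OUTPUT = """\
ORA-00279: change 88546077727 generated at 11/05/2025 12:02:45 needed for thread 2
ORA-00289: suggestion : +ARCH/hisdb/archivelog/2025_11_05/thread_2_seq_37815.4146.1216385087
ORA-00280: change 88546077727 for thread 2 is in sequence #37815
ORA-00279: change 88546077727 generated at 11/05/2025 11:45:44 needed for thread 1
ORA-00289: suggestion : +ARCH/hisdb/archivelog/2025_11_05/thread_1_seq_39431.4039.1216383287
ORA-00280: change 88546077727 for thread 1 is in sequence #39431
"""

# Captures, in order: change (SCN), thread#, sequence#.
ORA_00280 = re.compile(r"ORA-00280: change (\d+) for thread (\d+) is in sequence #(\d+)")


def requested_logs(text):
    """Return one (thread#, sequence#, scn) tuple per ORA-00280 line, in order of appearance."""
    return [(int(thread), int(seq), int(scn)) for scn, thread, seq in ORA_00280.findall(text)]


for thread, seq, scn in requested_logs(RECOVER_OUTPUT):
    print(f"thread {thread} sequence {seq} needed for change {scn}")
```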
SQL> select file_id,file_name,tablespace_name,status,online_status from dba_data_files where file_id=136;
FILE_ID
FILE_NAME
TABLESPACE_NAME STATUS ONLINE_
136
+DATA/hisdb/datafile/emr5202204.dbf
EMR52022 AVAILABLE OFFLINE
SQL> set linesize 180 pagesize 100
SQL> col ctl-CHECKPOINT_CHANGE# for 99999999999999999
SQL> col dbf-CHECKPOINT_CHANGE# for 99999999999999999
SQL> select ctl.FILE#,ctl.CHECKPOINT_CHANGE# as "ctl-CHECKPOINT_CHANGE#",ctl.LAST_CHANGE# as "ctl-LAST_CHANGE#",
2 dbf.CHECKPOINT_CHANGE# as "dbf-CHECKPOINT_CHANGE#",dbf.status,dbf.fuzzy from v$datafile ctl,v$datafile_header dbf where ctl.file#=dbf.file# and dbf.file# in(135,136);
FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ
135 88822357274 88822357274 ONLINE YES
136 88824117167 8.8824E+10 88824117167 OFFLINE NO
SQL> alter database datafile 136 online;
Database altered.
SQL> set linesize 180 pagesize 100
SQL> col ctl-CHECKPOINT_CHANGE# for 99999999999999999
SQL> col dbf-CHECKPOINT_CHANGE# for 99999999999999999
SQL> select ctl.FILE#,ctl.CHECKPOINT_CHANGE# as "ctl-CHECKPOINT_CHANGE#",ctl.LAST_CHANGE# as "ctl-LAST_CHANGE#",
2 dbf.CHECKPOINT_CHANGE# as "dbf-CHECKPOINT_CHANGE#",dbf.status,dbf.fuzzy from v$datafile ctl,v$datafile_header dbf where ctl.file#=dbf.file# and dbf.file# in(135,136);
FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ
135 88822357274 88822357274 ONLINE YES
136 88824767977 88824767977 ONLINE YES
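At this point the datafile header SCN has been recovered. After manually issuing a checkpoint, the SCN of file 136 is consistent with the other datafiles: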
SQL> alter system checkpoint;
System altered.
SQL> set linesize 180 pagesize 100
SQL> col ctl-CHECKPOINT_CHANGE# for 99999999999999999
SQL> col dbf-CHECKPOINT_CHANGE# for 99999999999999999
SQL> select ctl.FILE#,ctl.CHECKPOINT_CHANGE# as "ctl-CHECKPOINT_CHANGE#",ctl.LAST_CHANGE# as "ctl-LAST_CHANGE#",
2 dbf.CHECKPOINT_CHANGE# as "dbf-CHECKPOINT_CHANGE#",dbf.status,dbf.fuzzy from v$datafile ctl,v$datafile_header dbf where ctl.file#=dbf.file# and dbf.file# in(135,136,138);
FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ
135 88825188862 88825188862 ONLINE YES
136 88825188862 88825188862 ONLINE YES
138 88825188862 88825188862 ONLINE YES
Queries against the data in this tablespace now return normally, and an RMAN validation of the datafile also reports no problems:
RMAN> backup validate check logical datafile 136;