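1. Problem Analysis
Problem description: an Oracle 11.2.0.1 database on Windows shut down abnormally and then could not be started. There is no backup, and archiving was not enabled.
Fault analysis:
1. A first look shows that the datafiles and REDO files are all present. Checking the last-modified times of the datafiles, all of them except SYSAUX02.DBF are close to each other.
2. Based on the database checkpoint mechanism: assuming the REDO files are undamaged, at least one checkpoint must have completed during a full cycle of REDO log switches, so the online REDO logs cover every change made since the last checkpoint. Therefore, as long as all REDO files are intact, the database can be OPENed normally. In this scenario, neither of the following two cases causes data loss:
   1. During instance recovery, the database automatically uses the REDO files to perform crash recovery and then OPENs.
   2. When the control file is damaged and a backup control file is used or the control file is recreated, the log file paths are specified manually during RECOVER DATABASE so that the database can recover.
Based on this analysis, we can follow the usual approach for a database that fails to OPEN and work through it step by step.
Procedure:
1. Analyze the datafile checkpoint SCNs recorded in the datafile headers and in the control file to judge datafile consistency and possible data loss.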
FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ
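2 1176659225 1036813104 ONLINE NO ====>> The checkpoint SCN in this file's header differs hugely from the value recorded in the control file.
10 1176708807 1176708807 OFFLINE NO ====>> The header checkpoint SCN of this file matches the value in the control file, but the file is OFFLINE; this is suspected to be the result of earlier manual troubleshooting.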
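2. Bring datafile 10 ONLINE, then attempt a normal OPEN to gather information for further handling.
3. The normal OPEN reports that the database must be opened with the RESETLOGS (or NORESETLOGS) option, which suggests that the control file was recreated earlier.
At this point, recreate the control file directly and run RECOVER DATABASE.
4. During ALTER DATABASE RECOVER database using backup controlfile, the following error appears: ORA-00310: archived log contains sequence 29326; sequence 26024 required. The error shows that recovering the database needs the very old archived log with sequence 26024, while the current REDO LOG sequence is 29326. Combined with the earlier comparison of datafile last-modified dates and datafile header SCNs, the problem is attributed to datafile 2.
Analysis:
(1) File 2 belongs to SYSAUX. Judging by the SCNs, this file is far behind the current state. Modifying the datafile header with BBED could get past the checkpoint check at OPEN, but applying the REDO LOG during the recovery phase would very likely raise data inconsistency errors, and the transaction rollback phase after OPEN would also run into UNDO problems. This method is therefore generally not used.
(2) Ignoring the header SCN mismatch and forcing the database open with the hidden parameter "_allow_resetlogs_corruption"=true fails with ORA-01248: file 4 was created in the future of incomplete recovery. Because the control file was recreated and no recovery was performed, the database takes the SCN of the abnormal file 2 (several months old) as the recovery point, and datafiles were added after that point, hence the error. Some of the newly added files belong to the SYSTEM tablespace, so it is also impossible to take those later-added files OFFLINE and then open the database.
(3) From how database startup works: the database cannot be opened when SYSTEM or UNDO is damaged, but it can be forced open when a SYSAUX file is damaged. After the analysis and attempts above, we decided to try opening the database with file 2 (SYSAUX01.DBF) OFFLINE.
5. Take file 2 (SYSAUX01.DBF) OFFLINE and run RECOVER.
Supplying the 3 REDO files by file name, one at a time, lets the recovery complete normally.
6. The database OPENs successfully; the temporary tablespace still needs to be handled.
Because the SYSAUX datafile is OFFLINE, later exports cannot use EXPDP or EXP in user mode; EXP in table mode can be used to export the tables in batches.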
============
1. Analyze the datafile checkpoint SCNs recorded in the datafile headers and in the control file to judge datafile consistency and possible data loss.
FILE# ctl-CHECKPOINT_CHANGE# ctl-LAST_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ
1 1176708807 1176708807 ONLINE NO
2 1176659225 1036813104 ONLINE NO ====>> The checkpoint SCN in this file's header differs hugely from the value recorded in the control file.
3 1176659225 1176618204 ONLINE YES
4 1176659225 1176618204 ONLINE YES
5 1176659225 1176618204 ONLINE YES
6 1176659225 1176618204 ONLINE YES
7 1176659225 1176618204 ONLINE YES
8 1176659225 1176618204 ONLINE YES
9 1176659225 1176618204 ONLINE YES
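10 1176708807 1176708807 OFFLINE NO ====>> The header checkpoint SCN of this file matches the value in the control file, but the file is OFFLINE; this is suspected to be the result of earlier manual troubleshooting.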
11 1176659225 1176618204 ONLINE YES
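The comparison in the listing above can be reproduced with a few lines of Python. This is only a sketch for illustration: it assumes that in each row the first SCN is the control-file checkpoint and the second is the datafile-header checkpoint (the ctl-LAST_CHANGE# column being empty), and the names ROWS, header_lag and most_stale_file are made up for this example.

```python
# Rows copied from the listing: (file#, control-file checkpoint SCN,
# datafile-header checkpoint SCN, status, fuzzy flag).
# Assumption: the two numeric columns are ctl-CHECKPOINT_CHANGE# and
# dbf-CHECKPOINT_CHANGE#, in that order.
ROWS = [
    (1, 1176708807, 1176708807, "ONLINE", "NO"),
    (2, 1176659225, 1036813104, "ONLINE", "NO"),
    (3, 1176659225, 1176618204, "ONLINE", "YES"),
    (4, 1176659225, 1176618204, "ONLINE", "YES"),
    (5, 1176659225, 1176618204, "ONLINE", "YES"),
    (6, 1176659225, 1176618204, "ONLINE", "YES"),
    (7, 1176659225, 1176618204, "ONLINE", "YES"),
    (8, 1176659225, 1176618204, "ONLINE", "YES"),
    (9, 1176659225, 1176618204, "ONLINE", "YES"),
    (10, 1176708807, 1176708807, "OFFLINE", "NO"),
    (11, 1176659225, 1176618204, "ONLINE", "YES"),
]


def header_lag(rows):
    """Return {file#: control-file SCN minus datafile-header SCN}."""
    return {
        file_no: ctl_scn - dbf_scn
        for file_no, ctl_scn, dbf_scn, _status, _fuzzy in rows
    }


def most_stale_file(rows):
    """Return the file# whose header is furthest behind the control file."""
    lags = header_lag(rows)
    return max(lags, key=lags.get)


if __name__ == "__main__":
    for file_no, lag in sorted(header_lag(ROWS).items()):
        print(f"file {file_no:>2}: header behind control file by {lag} SCNs")
    print("most stale file:", most_stale_file(ROWS))
```

With these values, files 3 to 9 and 11 are behind by 41021 SCNs, while file 2 is behind by 139846121, which is why it stands out.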
2. Bring datafile 10 ONLINE, then attempt a normal OPEN to gather information for further handling.
Sun Oct 10 20:24:55 2021
alter database datafile 10 online
Completed: alter database datafile 10 online
Sun Oct 10 20:25:19 2021
alter database open
Errors in file d:\app\admin\diag\rdbms\orcl\orcl\trace\orcl_ora_6860.trc:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open
ORA-1589 signalled during: alter database open...
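3. The normal OPEN reports that the database must be opened with the RESETLOGS (or NORESETLOGS) option, which suggests that the control file was recreated earlier.
At this point, recreate the control file directly and run RECOVER DATABASE.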
Sun Oct 10 20:29:23 2021
Successful mount of redo thread 1, with mount id 1613376419
Completed: CREATE CONTROLFILE REUSE DATABASE "ORCL" RESETLOGS NOARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 292
LOGFILE
GROUP 1 'D:\app\Administrator\oradata\orcl\REDO01.log' SIZE 50M BLOCKSIZE 512,
GROUP 2 'D:\app\Administrator\oradata\orcl\REDO02.log' SIZE 50M BLOCKSIZE 512,
GROUP 3 'D:\app\Administrator\oradata\orcl\REDO03.log' SIZE 50M BLOCKSIZE 512
DATAFILE
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX01.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS01.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\FDC.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\ZCGL.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM02.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\ZCGL2.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX02.DBF',
'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM03.DBF'
CHARACTER SET AL32UTF8
ALTER DATABASE RECOVER database using backup controlfile
Media Recovery Start
started logmerger process
Sun Oct 10 20:33:35 2021
......
Parallel Media Recovery started with 4 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER database using backup controlfile ...
Sun Oct 10 20:34:05 2021
ALTER DATABASE RECOVER LOGFILE 'd:\app\administrator\oradata\orcl\redo01.log'
Media Recovery Log d:\app\administrator\oradata\orcl\redo01.log
Sun Oct 10 20:34:05 2021
Errors with log d:\app\administrator\oradata\orcl\redo01.log
Errors in file d:\app\admin\diag\rdbms\orcl\orcl\trace\orcl_pr00_9952.trc:
ORA-00310: archived log contains sequence 29326; sequence 26024 required
ORA-00334: archived log: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG'
ORA-310 signalled during: ALTER DATABASE RECOVER LOGFILE 'd:\app\administrator\oradata\orcl\redo01.log' ...
ALTER DATABASE RECOVER CANCEL
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL
ALTER DATABASE RECOVER database using backup controlfile
Media Recovery Start
started logmerger process
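4. During ALTER DATABASE RECOVER database using backup controlfile, the following error appears: ORA-00310: archived log contains sequence 29326; sequence 26024 required. The error shows that recovering the database needs the very old archived log with sequence 26024, while the current REDO LOG sequence is 29326. Combined with the earlier comparison of datafile last-modified dates and datafile header SCNs, the problem is attributed to datafile 2.
Analysis:
(1) File 2 belongs to SYSAUX. Judging by the SCNs, this file is far behind the current state. Modifying the datafile header with BBED could get past the checkpoint check at OPEN, but applying the REDO LOG during the recovery phase would very likely raise data inconsistency errors, and the transaction rollback phase after OPEN would also run into UNDO problems. This method is therefore generally not used.
(2) Ignoring the header SCN mismatch and forcing the database open with the hidden parameter "_allow_resetlogs_corruption"=true fails with ORA-01248: file 4 was created in the future of incomplete recovery. Because the control file was recreated and no recovery was performed, the database takes the SCN of the abnormal file 2 (several months old) as the recovery point, and datafiles were added after that point, hence the error. Some of the newly added files belong to the SYSTEM tablespace, so it is also impossible to take those later-added files OFFLINE and then open the database.
(3) From how database startup works: the database cannot be opened when SYSTEM or UNDO is damaged, but it can be forced open when a SYSAUX file is damaged. After the analysis and attempts above, we decided to try opening the database with file 2 (SYSAUX01.DBF) OFFLINE.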
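To make the size of the gap in the ORA-00310 message concrete, the two sequence numbers can be pulled out of the alert-log line and subtracted. The following is a small illustrative sketch; the function name sequence_gap and the regular expression are this example's own, not part of any Oracle tool.

```python
import re

# Matches the ORA-00310 text as it appears in the alert log.
ORA_00310 = re.compile(
    r"ORA-00310: archived log contains sequence (\d+); sequence (\d+) required"
)


def sequence_gap(line):
    """Return (offered, required, offered - required), or None if no match.

    A positive gap means recovery is asking for a log older than the one
    supplied; a negative gap means it is asking for a newer one.
    """
    match = ORA_00310.search(line)
    if match is None:
        return None
    offered, required = int(match.group(1)), int(match.group(2))
    return offered, required, offered - required


if __name__ == "__main__":
    line = ("ORA-00310: archived log contains sequence 29326; "
            "sequence 26024 required")
    print(sequence_gap(line))
```

For the message above the gap is 29326 - 26024 = 3302 log sequences. With only three online REDO groups and no archived logs, a gap of that size cannot be closed from the files on disk.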
FILE# ctl-CHECKPOINT_CHANGE# dbf-CHECKPOINT_CHANGE# STATUS FUZ
1 1176708807 1176708807 ONLINE NO
2 1176659225 1036813104 ONLINE NO ====>> The checkpoint SCN in this file's header differs hugely from the value recorded in the control file.
SQL> select * from v$dbfile order by 1;
FILE# NAME
1 D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF
2 D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX01.DBF
5. Take file 2 (SYSAUX01.DBF) OFFLINE and run RECOVER.
Supplying the 3 REDO files by file name, one at a time, lets the recovery complete normally.
Sun Oct 10 20:38:59 2021
alter database datafile 2 offline
Completed: alter database datafile 2 offline
Sun Oct 10 20:39:22 2021
ALTER DATABASE RECOVER database using backup controlfile
Media Recovery Start
started logmerger process
Sun Oct 10 20:39:22 2021
............
Parallel Media Recovery started with 4 slaves
Warning: Datafile 2 (D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX01.DBF) is offline during full database recovery and will not be recovered
ORA-279 signalled during: ALTER DATABASE RECOVER database using backup controlfile ...
ALTER DATABASE RECOVER LOGFILE 'd:\app\administrator\oradata\orcl\redo01.log'
Media Recovery Log d:\app\administrator\oradata\orcl\redo01.log
Errors with log d:\app\administrator\oradata\orcl\redo01.log
Errors in file d:\app\admin\diag\rdbms\orcl\orcl\trace\orcl_pr00_1364.trc:
ORA-00310: archived log contains sequence 29326; sequence 29327 required =====>>>>>
ORA-00334: archived log: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG'
ORA-310 signalled during: ALTER DATABASE RECOVER LOGFILE 'd:\app\administrator\oradata\orcl\redo01.log' ...
ALTER DATABASE RECOVER CANCEL
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL
Sun Oct 10 20:39:34 2021
ALTER DATABASE RECOVER database using backup controlfile
Media Recovery Start
started logmerger process
Sun Oct 10 20:39:34 2021
..................
Parallel Media Recovery started with 4 slaves
Warning: Datafile 2 (D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX01.DBF) is offline during full database recovery and will not be recovered
ORA-279 signalled during: ALTER DATABASE RECOVER database using backup controlfile ...
ALTER DATABASE RECOVER LOGFILE 'd:\app\administrator\oradata\orcl\redo02.log'
Media Recovery Log d:\app\administrator\oradata\orcl\redo02.log
Sun Oct 10 20:39:41 2021
Hex dump of (file 11, block 1796922) in trace file d:\app\admin\diag\rdbms\orcl\orcl\trace\orcl_pr04_8636.trc
Corrupt block relative dba: 0x02db6b3a (file 11, block 1796922)
Fractured block found during media recovery
Data in bad block:
type: 6 format: 2 rdba: 0x02db6b3a
last change scn: 0x0000.46225752 seq: 0x2 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x00000001
check value in block header: 0xe22f
computed block checksum: 0xd561
Reading datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM03.DBF' for corruption at rdba: 0x02db6b3a (file 11, block 1796922)
Reread (file 11, block 1796922) found same corrupt data =====>>>>> Corrupt block found; this file is SYSTEM03, and the base (dictionary) tables are not stored in it.
ORA-279 signalled during: ALTER DATABASE RECOVER LOGFILE 'd:\app\administrator\oradata\orcl\redo02.log' ...
Sun Oct 10 20:39:54 2021
ALTER DATABASE RECOVER LOGFILE 'd:\app\administrator\oradata\orcl\redo03.log'
Media Recovery Log d:\app\administrator\oradata\orcl\redo03.log
Completed: ALTER DATABASE RECOVER LOGFILE 'd:\app\administrator\oradata\orcl\redo03.log'
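The hex values in the corrupt-block message above can be decoded by hand. The sketch below assumes the usual smallfile-tablespace layout of a relative data block address (upper 10 bits relative file number, lower 22 bits block number) and the wrap.base notation for SCNs; the function names are illustrative only.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative DBA into (relative file#, block#).

    Assumes the smallfile layout: 10 bits of file number followed by
    22 bits of block number.
    """
    return rdba >> 22, rdba & 0x3FFFFF


def decode_scn(text):
    """Convert 'wrap.base' hex notation, e.g. '0x0000.46225752', to an int."""
    if text.lower().startswith("0x"):
        text = text[2:]
    wrap, base = text.split(".")
    return (int(wrap, 16) << 32) | int(base, 16)


if __name__ == "__main__":
    file_no, block_no = decode_rdba(0x02DB6B3A)
    print(f"rdba 0x02db6b3a -> file {file_no}, block {block_no}")
    print("last change scn ->", decode_scn("0x0000.46225752"))
```

The rdba decodes to file 11, block 1796922, matching the log line. The last change SCN decodes to 1176655698, which lies between the header checkpoint SCN of file 11 (1176618204) and its control-file checkpoint SCN (1176659225) in the earlier listing.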
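6. The database OPENs successfully; the temporary tablespace still needs to be handled.
Because the SYSAUX datafile is OFFLINE, later exports cannot use EXPDP or EXP in user mode; EXP in table mode can be used to export the tables in batches.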
Sun Oct 10 20:40:09 2021
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1176708807
Resetting resetlogs activation ID 1548501587 (0x5c4c4253)
Sun Oct 10 20:40:09 2021
Setting recovery target incarnation to 2
Sun Oct 10 20:40:09 2021
Assigning activation ID 1613415321 (0x602ac399)
Thread 1 opened at log sequence 1
Current log# 1 seq# 1 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
Sun Oct 10 20:40:09 2021
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #2 is offline, but is part of an online tablespace.
data file 2: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX01.DBF'
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
............
Updating character set in controlfile to ZHS16GBK
No Resource Manager plan active
............
replication_dependency_tracking turned off (no async multimaster replication found)
Errors in file d:\app\admin\diag\rdbms\orcl\orcl\trace\orcl_ora_4692.trc (incident=140969):
ORA-25319: Queue table repartitioning aborted
Incident details in: d:\app\admin\diag\rdbms\orcl\orcl\incident\incdir_140969\orcl_ora_4692_i140969.trc
error 25319 happened during Queue table repartitioning
Starting background process QMNC
Sun Oct 10 20:40:11 2021
QMNC started with pid=22, OS id=8456
XDB UNINITIALIZED: XDB$SCHEMA not accessible
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Sun Oct 10 20:40:11 2021
Sweep [inc][140969]: completed
Sweep [inc2][140969]: completed
Sun Oct 10 20:40:11 2021
ORA-376 encountered when generating server alert SMG-4120
Sun Oct 10 20:40:11 2021
Checker run found 1 new persistent data failures
Sun Oct 10 20:40:11 2021
Trace dumping is performing id=[cdmp_20211010204011]
Completed: alter database open resetlogs
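For the batched table-mode export mentioned in step 6, one possible approach is to generate one EXP parameter file per batch of tables. This is only a sketch under assumptions: the table names, batch size and file-name prefix are hypothetical, and only the standard tables, file and log parameters of the original exp utility are used.

```python
def build_parfiles(tables, batch_size, prefix="batch"):
    """Return {parfile name: parfile text} for table-mode exp runs.

    Each parfile lists up to batch_size tables and names its own dump
    and log file. All names here are placeholders for illustration.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    parfiles = {}
    for start in range(0, len(tables), batch_size):
        chunk = tables[start:start + batch_size]
        name = f"{prefix}_{start // batch_size + 1:03d}"
        lines = [
            f"tables=({','.join(chunk)})",
            f"file={name}.dmp",
            f"log={name}.log",
        ]
        parfiles[f"{name}.par"] = "\n".join(lines) + "\n"
    return parfiles


if __name__ == "__main__":
    # Hypothetical table names; in practice the list would come from
    # the application schemas that need to be rescued.
    for par_name, text in build_parfiles(["T1", "T2", "T3"], 2).items():
        print(f"--- {par_name} ---")
        print(text, end="")
```

Each generated file would then be passed to exp with its parfile parameter, supplying the credentials separately. Keeping the batches small means one table that fails to export does not invalidate the rest.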