Oracle in NOARCHIVELOG mode: what to do when a datafile gets corrupted?

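Late last night a colleague on the night shift at our site called to say a reporting database could not be reached. I got up, connected over VPN, saw that the instance had crashed, and first brought it back up with startup.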

1. Check the error messages

Environment: Red Hat 6.9, Oracle 11.2.0.4, No Archive Mode

The key errors in the alert log:

Thread 1 advanced to log sequence 4231012 (LGWR switch)
  Current log# 2 seq# 4231012 mem# 0: /oradata/rtp/redo02.log
Thu May 08 23:22:56 2025
KCF: read, write or open error, block=0x240ab online=1
        file=118 '/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf'
        error=27072 txt: 'Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 147627
Additional information: -1'
Errors in file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_dbw0_3300.trc:
Errors in file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_dbw0_3300.trc:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 118 (block # 147627)
ORA-01110: data file 118: '/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf'
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 147627
Additional information: -1
DBW0 (ospid: 3300): terminating the instance due to error 63999
Thu May 08 23:22:57 2025
System state dump requested by (instance=1, osid=3300 (DBW0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_diag_3292_20250508232257.trc
Instance terminated by DBW0, pid = 3300
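The KCF line reports the failing block in hex (block=0x240ab), while ORA-01114 reports it in decimal (block # 147627). A small bash helper, hypothetical and not part of the original troubleshooting, pulls the hex value out of such a line and prints it in decimal, which confirms that both messages refer to the same block:

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract "block=0x..." from one alert-log KCF line
# and print the block number in decimal.
kcf_block() {
  local hex
  hex=$(grep -o 'block=0x[0-9a-fA-F]*' <<<"$1" | cut -d= -f2)
  [ -n "$hex" ] || return 1      # the line has no KCF block field
  printf '%d\n' "$hex"           # bash printf accepts 0x-prefixed hex
}

kcf_block "KCF: read, write or open error, block=0x240ab online=1"
```

For the line above it prints 147627, the same block number as in ORA-01114.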

Next, check the trace file named in the error:

Trace file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_dbw0_3300.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name:    Linux
Node name:      rtpdb
Release:        2.6.32-696.el6.x86_64
Version:        #1 SMP Tue Feb 21 00:53:17 EST 2017
Machine:        x86_64
VM name:        VMWare Version: 6
Instance name: rtp
Redo thread mounted by this instance: 1
Oracle process number: 10
Unix process pid: 3300, image: oracle@rtpdb (DBW0)

*** 2025-05-08 23:22:56.680
*** SESSION ID:(1521.1) 2025-05-08 23:22:56.680
*** CLIENT ID:() 2025-05-08 23:22:56.680
*** SERVICE NAME:(SYS$BACKGROUND) 2025-05-08 23:22:56.680
*** MODULE NAME:() 2025-05-08 23:22:56.680
*** ACTION NAME:() 2025-05-08 23:22:56.680

KCF: read, write or open error, block=0x240ab online=1
file=118 '/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf'
error=27072 txt: 'Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 147627
Additional information: -1'
Encountered write error
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
error 63999 detected in background process
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 118 (block # 147627)
ORA-01110: data file 118: '/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf'
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 147627
Additional information: -1
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5570<-ksbrdp()+3507<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
----- End of Abridged Call Stack Trace -----

*** 2025-05-08 23:22:56.750
DBW0 (ospid: 3300): terminating the instance due to error 63999
ksuitm: waiting up to [5] seconds before killing DIAG(3292)
[oracle@rtpdb ~]$

2. Check the I/O errors at the OS level

2.1 Check whether the /oradata2 mount point is healthy. The kernel log shows a large number of I/O errors:

[oracle@rtpdb ~]$ dmesg | grep -i error
end_request: I/O error, dev sdc, sector 716711160
Buffer I/O error on device dm-0, logical block 89588639
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588640
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588641
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588642
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588643
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588644
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588645
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588646
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 89588647
lost page write due to I/O error on dm-0
JBD2: Detected IO errors while flushing file data on dm-0-8
[root@rtpdb ~]# tail -f /var/log/messages
May  8 23:22:50 rtpdb kernel: Buffer I/O error on device dm-0, logical block 89588645
May  8 23:22:50 rtpdb kernel: lost page write due to I/O error on dm-0
May  8 23:22:50 rtpdb kernel: Buffer I/O error on device dm-0, logical block 89588646
May  8 23:22:50 rtpdb kernel: lost page write due to I/O error on dm-0
May  8 23:22:50 rtpdb kernel: Buffer I/O error on device dm-0, logical block 89588647
May  8 23:22:50 rtpdb kernel: lost page write due to I/O error on dm-0
May  8 23:22:50 rtpdb kernel: JBD2: Detected IO errors while flushing file data on dm-0-8
May  9 03:40:04 rtpdb rhsmd: In order for Subscription Manager to provide your system with updates, your system must be registered with the Customer Portal. Please enter your Red Hat login to ensure your system is up-to-date.
May  9 12:38:21 rtpdb kernel: NET: Unregistered protocol family 36
May  9 12:38:21 rtpdb kernel: NET: Registered protocol family 36
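The kernel messages repeat the same few logical blocks many times. The short awk filter below is a sketch, and its function name is hypothetical. It condenses the "Buffer I/O error" lines into one line per device with the error count and the range of logical blocks hit. Because it matches on the message text rather than on fixed field positions, it works on both dmesg output and /var/log/messages:

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize "Buffer I/O error on device X, logical block N"
# lines read on stdin as: <device> errors=<count> blocks=<min>-<max>
io_error_summary() {
  awk '
    /Buffer I\/O error on device/ {
      match($0, /device [^,]+/)
      dev = substr($0, RSTART + 7, RLENGTH - 7)   # text after "device "
      blk = $NF + 0                              # last field is the block number
      n[dev]++
      if (!(dev in lo) || blk < lo[dev]) lo[dev] = blk
      if (!(dev in hi) || blk > hi[dev]) hi[dev] = blk
    }
    END {
      for (d in n) printf "%s errors=%d blocks=%d-%d\n", d, n[d], lo[d], hi[d]
    }'
}
```

Applied to the dmesg output above (dmesg | io_error_summary), it would report dm-0 errors=9 blocks=89588639-89588647. Confirming that dm-0 is the device behind /oradata2, for example by comparing dmsetup ls with df /oradata2, is a separate check.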

3. Use RMAN to check the failing datafile for corrupt blocks

RMAN> VALIDATE DATAFILE 118;

Starting validate at 09-MAY-25
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=647 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00118 name=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:15
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
118  FAILED 0              85421        268800          74401038924
  File Name: /oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              99872           
  Index      0              82856           
  Other      49             651             

validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_ora_22883.trc for details
Finished validate at 09-MAY-25

RMAN> list backup summary;

specification does not match any backup in the repository

RMAN> 

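The trace file /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_ora_22883.trc lists exactly which blocks are corrupt, and it turns up a lot of them in one go.

To make matters worse, this is a NOARCHIVELOG database and LIST BACKUP SUMMARY shows no physical backup at all. How should this be handled?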

[oracle@rtpdb ~]$ cat /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_ora_22883.trc | grep -i "Corrupt"
Corrupt block relative dba: 0x1d82e5cf (file 118, block 189903)
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189903, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
...
Corrupt block relative dba: 0x1d82e5ff (file 118, block 189951)
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data
Reread of blocknum=189951, file=/oradata2/rtp/RTP/datafile/o1_mf_tbs_ods_n1qx02j0_.dbf. found same corrupt data

The trace shows a contiguous run of corrupt blocks from block 189903 to block 189918, at least 16 corrupt blocks in total.
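Each "relative dba" in the trace packs the relative file number and the block number into one 32-bit value. Assuming the usual smallfile-tablespace layout (high 10 bits for the file, low 22 bits for the block), the sketch below decodes it in bash. It also collapses the distinct block numbers from the "Corrupt block" lines into contiguous ranges, which is quicker than reading a long trace by eye. Both function names are hypothetical; inside the database, DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK perform the same decoding.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a relative DBA, assuming the smallfile layout
# (high 10 bits = relative file number, low 22 bits = block number).
decode_rdba() {
  local rdba=$(( $1 ))                     # accepts 0x-prefixed hex
  echo "file $(( rdba >> 22 )), block $(( rdba & 0x3FFFFF ))"
}

# Hypothetical helper: read a trace file on stdin and print the distinct
# corrupt block numbers, collapsing consecutive numbers into "first-last".
corrupt_ranges() {
  grep 'Corrupt block relative dba' |
    sed -n 's/.*block \([0-9][0-9]*\)).*/\1/p' |
    sort -n -u |
    awk '
      NR == 1        { start = prev = $1; next }
      $1 == prev + 1 { prev = $1; next }
      { print ((start == prev) ? start : (start "-" prev)); start = prev = $1 }
      END { if (NR) print ((start == prev) ? start : (start "-" prev)) }'
}

decode_rdba 0x1d82e5cf   # first corrupt block reported in the trace
decode_rdba 0x1d82e5ff
```

The two calls print file 118, block 189903 and file 118, block 189951, matching the trace, so the relative and absolute file numbers coincide here. The ranges would come from corrupt_ranges < /u01/app/oracle/diag/rdbms/rtp/rtp/trace/rtp_ora_22883.trc.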

4. Find the object that owns the corrupt blocks

Check which object the corrupt blocks belong to; it turns out to be a table:

SELECT tablespace_name, segment_type, owner, segment_name 
FROM dba_extents 
WHERE file_id = 118 AND 
      (block_id BETWEEN 189903 AND 189918 OR
       (block_id + blocks - 1) BETWEEN 189903 AND 189918 OR
       block_id < 189903 AND (block_id + blocks - 1) > 189918);
TABLESPACE_NAME                SEGMENT_TYPE       OWNER                          SEGMENT_NAME
------------------------------ ------------------ ------------------------------ ---------------------------------------------------------------------------------
TBS_ODS                        TABLE              ODS                            LOT_MATERIAL_MASTER
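The three OR branches in the query above are one way of saying "the extent overlaps blocks 189903 to 189918". Two ranges overlap exactly when each one starts no later than the other ends, so the same lookup needs only two comparisons. Below is a hypothetical bash helper that prints this query for any file and block range, for example block 189951, which the trace also reports. Its output can be piped into sqlplus:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a dba_extents lookup for blocks first..last of one
# datafile. An extent [block_id, block_id + blocks - 1] overlaps [first, last]
# exactly when block_id <= last AND block_id + blocks - 1 >= first.
extent_query() {
  local file_id=$1 first=$2 last=$3
  cat <<EOF
SELECT tablespace_name, segment_type, owner, segment_name
FROM   dba_extents
WHERE  file_id = ${file_id}
AND    block_id <= ${last}
AND    block_id + blocks - 1 >= ${first};
EOF
}

extent_query 118 189903 189918
```

For example, extent_query 118 189951 189951 | sqlplus -S / as sysdba would check the other corrupt block seen in the trace.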

5. Mark the corrupt blocks to be skipped so operations do not fail

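Mark the table's corrupt blocks as skippable. This is not a standard recovery procedure. It is acceptable here only because this is a reporting database whose data is all pulled from another database, so after setting the skip flag I asked the reporting team to rebuild the table.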

BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
    schema_name   => 'ODS',
    object_name   => 'LOT_MATERIAL_MASTER',
    object_type   => DBMS_REPAIR.TABLE_OBJECT,
    flags         => DBMS_REPAIR.SKIP_FLAG);
END;
/
PL/SQL procedure successfully completed.

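5.1 About DBMS_REPAIR.SKIP_CORRUPT_BLOCKS

DBMS_REPAIR.SKIP_CORRUPT_BLOCKS tells the database to skip known corrupt blocks when it scans a specific table or index, so that those blocks no longer abort the operation with an error.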


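Main purpose

  • Enables skipping of corrupt blocks during table and index scans of the specified object: when an application or query reaches one of those blocks, Oracle skips it instead of raising an error.

  • Applies to:

    • tables (TABLE_OBJECT)

    • indexes (INDEX_OBJECT)

  • Typically used after datafile corruption, disk failure, or an incomplete backup, to bypass the bad blocks temporarily so the business can keep running or the data can be exported.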


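🧠 Typical scenarios:

  • Some data blocks in a table are corrupt, so full table scans fail.

  • The intact data needs to be exported temporarily for migration or recovery.

  • DBMS_REPAIR.CHECK_OBJECT has detected corrupt blocks and the business logic needs to keep running.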


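📌 How it works

Once enabled, queries and other operations on the object behave as follows:

  • On reaching a corrupt block, Oracle skips it instead of reading it.

  • This preserves, and lets you export and access, as much of the intact data as possible.

  • The actual contents of the blocks are not touched (the corrupt blocks are not repaired, only skipped).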


Typical call:

BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
    schema_name => 'SCOTT',
    object_name => 'EMP',
    object_type => DBMS_REPAIR.TABLE_OBJECT,
    flags       => DBMS_REPAIR.SKIP_FLAG);    -- enable skipping
END;
/

To turn skipping off again:

BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
    schema_name => 'SCOTT',
    object_name => 'EMP',
    object_type => DBMS_REPAIR.TABLE_OBJECT,
    flags       => DBMS_REPAIR.NOSKIP_FLAG);  -- disable skipping
END;
/


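⚠️ Notes

  1. This does not repair the corrupt blocks; it only ignores them, so any rows stored in those blocks are silently missing from query results.

  2. Use it together with DBMS_REPAIR.CHECK_OBJECT, which identifies the corrupt blocks first.

  3. It is an emergency measure and should not be relied on long term.

  4. Afterwards, recover the data or rebuild the table as soon as possible.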

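Summary

Because this database runs in NOARCHIVELOG mode, it had no physical backup, which makes block corruption very painful to deal with. Any important database should have archiving enabled and be backed up with RMAN.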

SQL> archive log list;
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            /u01/app/oracle/product/11.2.0/db_1/dbs/arch
Oldest online log sequence     4231143
Current log sequence           4231147
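Switching to ARCHIVELOG mode needs a short outage, because the switch can only be made while the database is mounted but not open. The sketch below prints the SQL*Plus commands for the switch; the function name and the example destination /oradata/arch are assumptions, and it also assumes the instance runs from an spfile and that the directory already exists. Once archiving is on, an RMAN command such as BACKUP DATABASE PLUS ARCHIVELOG gives the database the physical backup it lacked here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the SQL*Plus commands that switch a database to
# ARCHIVELOG mode. Assumes an spfile and an existing archive directory;
# /oradata/arch is only an example path.
archivelog_switch_sql() {
  local dest=${1:-/oradata/arch}
  cat <<EOF
ALTER SYSTEM SET log_archive_dest_1='LOCATION=${dest}' SCOPE=SPFILE;
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
ARCHIVE LOG LIST
EOF
}

archivelog_switch_sql
```

Piping the output into sqlplus -S / as sysdba during a maintenance window would run it; the final ARCHIVE LOG LIST should then report Database log mode as Archive Mode.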

For reference, here is the script that backs up selected tables with expdp:

[oracle@rtpdb ~]$ cat $HOME/jobs/expback.sh
#!/bin/bash
#backup table on noarchive db
#create by norton.fan 20220729
PATH=$PATH:$HOME/bin
ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1
ORACLE_SID=rtp
PATH=$ORACLE_HOME/bin:$PATH
ORACLE_BASE=/u01/app/oracle
export ORACLE_BASE ORACLE_HOME ORACLE_SID
export PATH
NLS_LANG=AMERICAN_AMERICA.UTF8
export NLS_LANG
#export DELTIME=`date -d "15 days ago" +%Y%m%d`
export BACKUPTIME=`date +%Y%m%d%H%M%S`
expdp ods/ods dumpfile=ods$BACKUPTIME.dmp logfile=ods$BACKUPTIME.log parfile=/home/oracle/jobs/exp.par
# Retention: remove .dmp files older than 1 day and .log files older than 7 days
find /oradata/backup/ -mtime +1 -name '*.dmp' -exec rm -f {} ';'
find /oradata/backup/ -mtime +7 -name '*.log' -exec rm -f {} ';'
[oracle@rtpdb ~]$ 
[oracle@rtpdb ~]$ cat /home/oracle/jobs/exp.par
DIRECTORY = dmpdir
SCHEMAS = ods
# Put the names of the tables to back up into the exptab table
INCLUDE = TABLE:"IN (select table_name from exptab)"
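Because this expdp job is the only backup these tables have, it is worth checking every run. A Data Pump log normally ends with a line containing "successfully completed". The hypothetical check below looks for that wording in the newest ods*.log. It assumes the logs land in /oradata/backup, that is, that DIRECTORY dmpdir points at the same directory the cleanup find commands work on.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if the newest ods*.log in the backup
# directory reports a successful Data Pump job. Assumes DIRECTORY dmpdir
# points at /oradata/backup, as the cleanup find commands suggest.
check_last_export() {
  local dir=${1:-/oradata/backup} log
  log=$(ls -t "$dir"/ods*.log 2>/dev/null | head -n 1)
  if [ -z "$log" ]; then
    echo "no export log found in $dir" >&2
    return 1
  fi
  if grep -q 'successfully completed' "$log"; then
    echo "OK: $log"
  else
    echo "FAILED: $log" >&2
    return 1
  fi
}
```

It could be called at the end of expback.sh, right after the expdp line, so that a non-zero exit status flags a failed export.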