前言
今天应用反馈有一套报表库查询有延时,几十分钟没有同步数据了,但是我检查 DG 同步状态之后发现一切正常,手动触发切换日志可以正常同步数据,但是无法实时查询主库的在线重做日志。
不是 DG 配置问题,不是 standby logfile 问题,不是网络问题,能是什么问题呢?
本文记录一下完整的分析过程以及解决方案。
检查 DG 状态
首先,先检查 DG 同步状态是否正常:
sql
set linesize 2222 heading on wrap on
column dest_name format a20
column status format a8
column database_mode format a15
column recovery_mode format a35
column protection_mode format a25
column destination format a15
column archived_seq# format 999999999
column applied_seq# format 999999999
column error format a10
column srl format a5
column db_unique_name format a15
column gap_status format a10
SELECT inst_id, dest_name, status, database_mode, protection_mode,
recovery_mode, gap_status, archived_seq#, applied_seq#, srl,
db_unique_name, destination, error
FROM gv$archive_dest_status
WHERE status <> 'INACTIVE'
AND type = 'PHYSICAL';
INST_ID DEST_NAME STATUS DATABASE_MODE PROTECTION_MODE RECOVERY_MODE GAP_STATUS ARCHIVED_SEQ# APPLIED_SEQ# SRL DB_UNIQUE_NAME DESTINATION ERROR
---------- -------------------- -------- --------------- ------------------------- ----------------------------------- ---------- ------------- ------------ ----- --------------- --------------- ----------
2 LOG_ARCHIVE_DEST_2 VALID OPEN_READ-ONLY MAXIMUM PERFORMANCE MANAGED REAL TIME APPLY NO GAP 123726 123725 YES orcldg orcldg
1 LOG_ARCHIVE_DEST_2 VALID OPEN_READ-ONLY MAXIMUM PERFORMANCE MANAGED REAL TIME APPLY NO GAP 123726 123725 YES orcldg orcldg
检查备库同步进程:
sql
set line2222 pages1000
select inst_id,pid,process,thread#,sequence#,status,delay_mins from gv$managed_standby;
INST_ID PID PROCESS THREAD# SEQUENCE# STATUS DELAY_MINS
---------- ---------- --------- ---------- ---------- ------------ ----------
1 10386 RFS 0 0 IDLE 0
1 3728 ARCH 2 123538 CLOSING 0
1 3730 ARCH 0 0 CONNECTED 0
1 3732 ARCH 2 123543 CLOSING 0
1 3921 ARCH 2 123537 CLOSING 0
1 4613 RFS 0 0 IDLE 0
1 4195 RFS 0 0 IDLE 0
1 4286 RFS 0 0 IDLE 0
1 10384 RFS 1 123728 IDLE 0
1 10381 RFS 2 123544 IDLE 0
1 10388 RFS 0 0 IDLE 0
1 4647 RFS 0 0 IDLE 0
1 5573 MRP0 1 123728 APPLYING_LOG 0
检查 DG 配置:
sql
-- 主库
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_1 string LOCATION=+ARCH VALID_FOR=(ALL_
LOGFILES,ALL_ROLES) DB_UNIQUE_
NAME=ORCL
log_archive_dest_2 string service=orcldg LGWR ASYNC vali
d_for=(online_logfiles,primary
_role) db_unique_name=orcldg
-- 备库
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_1 string
log_archive_dest_2 string service=orcl LGWR ASYNC valid_
for=(online_logfiles,primary_r
ole) db_unique_name=orcl
检查 standby logfile 状态:
sql
THREAD# GROUP# MEMBER STATUS ARC T2.BYTES/1024/1024
---------- ---------- ---------------------------------------------------------------------------------------------------- ---------- --- ------------------
1 5 /u01/app/oracle/oradata/redostd05.log UNASSIGNED NO 200
1 6 /u01/app/oracle/oradata/redostd06.log UNASSIGNED NO 200
1 9 /u01/app/oracle/oradata/redostd09.log UNASSIGNED NO 200
2 7 /u01/app/oracle/oradata/redostd07.log UNASSIGNED NO 200
2 8 /u01/app/oracle/oradata/redostd08.log UNASSIGNED NO 200
2 10 /u01/app/oracle/oradata/redostd10.log UNASSIGNED NO 200
通过以上查询可以发现,DG 配置以及同步都是正常的,应该不会出现无法实时查询的问题,那为什么会延时呢?
尝试在主库手动触发日志切换:
sql
-- 主库执行
SQL> alter system archive log current;
发现数据可以正常同步过来,那就是说同步是没问题的。
检查 alert 日志
查看数据库 alert 日志看看是不是有报错:
bash
## 在主库发现有报错
LNS: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (270)
LNS: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 270 for archive log file 2 to 'orcldg'
ORA-00270: error creating archive log
oerr 看一下报错的意思:
bash
[oracle@orcl:/home/oracle]$ oerr ora 270
00270, 00000, "error creating archive log %s"
// *Cause: An error was encountered when either creating or opening
// the destination file for archiving.
// *Action: Check that the archive destination is valid and that there
// is sufficient space on the destination device.
难道是 DG 的磁盘空间满了?
bash
[oracle@orcldg:/]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_jxsemesdg01-lv_root
3.1T 2.6T 334G 89% /
tmpfs 63G 72K 63G 1% /dev/shm
/dev/sda1 477M 79M 373M 18% /boot
磁盘空间还没满,那为什么无法创建归档?检查一下归档目录位置:
sql
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 123726
Next log sequence to archive 0
Current log sequence 123727
没有设置归档目录,默认存放到闪回目录下,检查一下闪回配置:
sql
SQL> show parameter db_recover
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest string /u01/app/oracle/fast_recovery_
area
db_recovery_file_dest_size big integer 100G
好的,破案了,闪回配置的空间上限设置了 100G,然后归档也是存放在闪回目录,当闪回目录达到上限之后,归档自然就无法写入了。
解决方案
解决方法也很简单,就是把归档日志从闪回目录独立出去:
bash
[oracle@orcldg:/u01/app/oracle]$ mkdir archivelog
[oracle@orcldg:/u01/app/oracle]$ cd archivelog/
[oracle@orcldg:/u01/app/oracle/archivelog]$ pwd
/u01/app/oracle/archivelog
备库设置归档目录:
sql
SQL> alter system set log_archive_dest_1='LOCATION=/u01/app/oracle/archivelog VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=orcldg';
System altered.
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u01/app/oracle/archivelog
Oldest online log sequence 123726
Next log sequence to archive 0
Current log sequence 123727
主库重新激活同步:
sql
SQL> alter system set log_archive_dest_state_2=defer;
System altered.
SQL> alter system set log_archive_dest_state_2=enable;
System altered.
再从检查备库 standby logfile 状态:
sql
SQL> select t2.thread#,t1.group#,t1.member,t2.STATUS,t2.ARCHIVED,t2.bytes/1024/1024 from v$logfile t1,v$standby_log t2 where t1.group#=t2.group# order by 1,2;
THREAD# GROUP# MEMBER STATUS ARC T2.BYTES/1024/1024
---------- ---------- ---------------------------------------------------------------------------------------------------- ---------- --- ------------------
1 5 /u01/app/oracle/oradata/redostd05.log UNASSIGNED NO 200
1 6 /u01/app/oracle/oradata/redostd06.log UNASSIGNED NO 200
1 9 /u01/app/oracle/oradata/redostd09.log UNASSIGNED NO 200
2 7 /u01/app/oracle/oradata/redostd07.log ACTIVE YES 200
2 8 /u01/app/oracle/oradata/redostd08.log UNASSIGNED NO 200
2 10 /u01/app/oracle/oradata/redostd10.log UNASSIGNED NO 200
ADG 恢复实时查询,问题解决。
写在最后
一句话总结:闪回恢复区与归档日志共享空间时,必须做好空间规划和监控,否则会导致 DG 实时应用中断而难以察觉。生产环境建议将归档独立部署,避免空间竞争。