After an Oracle ADG switchover, how do you recover OGG?

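About the author: Lao Su (老苏), a DBA with more than 10 years of operations experience, specializing in Oracle, MySQL, PostgreSQL, and MongoDB operations (installation and migration, performance tuning, emergency troubleshooting, and so on).
WeChat official account: 老苏畅谈运维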

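When an Oracle primary and standby go through a switchover, do the OGG processes that were running on the primary fail, and if so, how are they recovered?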

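Environment

Hostname  IP address    Instance  Database role  OGG role  Database version  OGG version
ora11g    10.10.10.160  ora11g    Primary        Source    11.2.0.4          19.1.0.0.4
ora11gdg  10.10.10.41   dgora11g  Standby        /         11.2.0.4          /
hostname  10.10.10.44   dbbbed    /              Target    11.2.0.4          19.1.0.0.4

Note: 10.10.10.160 is the primary database and currently the OGG source; 10.10.10.41 is the standby of 10.10.10.160; 10.10.10.44 is the OGG target.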

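OGG status after the switchover

Before the switchover, 10.10.10.160 is the primary and the OGG source, running an Extract process (EXTCS) and a data pump process (DPCS).

After the switchover, 10.10.10.160 becomes the standby and 10.10.10.41 becomes the primary. Can EXTCS and DPCS on 10.10.10.160 still work normally? For the switchover procedure itself, see the earlier article; here we look at the process status once the switchover is done:

--A test table, test_job, receives a new row every minute from a background stored procedure.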
sys@ora11g> select max(C_DATE) from szr.test_job;

MAX(C_DATE)
-------------------
2024-08-19 15:34:24

--Process status after the switchover
GGSCI (ora11g) 12> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     DPCS        00:00:00      00:00:02    
EXTRACT     ABENDED     EXTCS       00:00:01      00:10:14    

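After the primary/standby switchover, the Extract process on the OGG source (the former primary, now a physical standby) abends with the following errors: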

2024-08-19T15:48:19.056+0800  INFO    OGG-06604  Oracle GoldenGate Capture for Oracle, extcs.prm:  Database ORA11G CPU info: CPU Count 2, CPU Core Count 2, CPU Socket Count 2.
2024-08-19T15:48:19.056+0800  WARNING OGG-00727  Oracle GoldenGate Capture for Oracle, extcs.prm:  Switch extract to archived log only mode on physical standby database.
2024-08-19T15:48:19.056+0800  WARNING OGG-01830  Oracle GoldenGate Capture for Oracle, extcs.prm:  LOGRETENTION is disabled by default in ARCHIVEDLOGONLY mode.
2024-08-19T15:48:19.154+0800  ERROR   OGG-00060  Oracle GoldenGate Capture for Oracle, extcs.prm:  Extract requires a value specified for parameter ALTARCHIVELOGDEST when in archived log only mode.
2024-08-19T15:48:19.159+0800  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, extcs.prm:  PROCESS ABENDING.
2024-08-19T15:48:20.973+0800  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): info all.
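
If this check is scripted instead of run by hand, the info all output shown above can be scanned for groups in ABENDED status. The following is only a minimal sketch: it assumes the output has already been captured as text, and the function name abended_groups is hypothetical, not part of OGG.

```python
def abended_groups(info_all_output):
    """Return the group names whose Status column reads ABENDED."""
    groups = []
    for line in info_all_output.splitlines():
        fields = line.split()
        # Process rows look like: Program Status Group Lag Time.
        # The MANAGER row has no group, so it has fewer than 3 fields.
        if len(fields) >= 3 and fields[1] == "ABENDED":
            groups.append(fields[2])
    return groups


# Sample text copied from the info all output in this article.
SAMPLE = """\
Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     DPCS        00:00:00      00:00:02
EXTRACT     ABENDED     EXTCS       00:00:01      00:10:14
"""

print(abended_groups(SAMPLE))  # ['EXTCS']
```

Here only EXTCS is reported, which matches the status seen after the switchover: the data pump keeps running while the Extract abends.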

Recovery method 1

By default, Extract cannot mine redo in an ADG environment. The fix is to add this parameter to the Extract process:

TRANLOGOPTIONS MINEFROMACTIVEDG

--See the MOS note below; this parameter is only available in OGG 12.1.2.1.0 and later
Does GoldenGate Support Extracting Redo from Active Data Guard? (Doc ID 1299805.1)
 Does GoldenGate support extract reading redo from generated from an active data guard?
 Yes, this is supported in 12.1.2.1.0 (but not previous version) GoldenGate classic extract, with parameter:
 TRANLOGOPTIONS MINEFROMACTIVEDG
 For versions older than 12.1.2.1.0 the below error can occur:
 ERROR   OGG-00303  Unrecognized option (MINEFROMACTIVEDG) for TRANLOGOPTIONS.
 So Goldengate must be upgraded to 12.1.2.1.0 or later versions to resolve it.

GGSCI (ora11g) 15> edit param extcs

EXTRACT extcs
userid ggadmin,password ggadmin123
TRANLOGOPTIONS  dblogreader
TRANLOGOPTIONS MINEFROMACTIVEDG
LOGALLSUPCOLS
UPDATERECORDFORMAT compact
GETTRUNCATES
GETUPDATEBEFORES
NOCOMPRESSDELETES
NOCOMPRESSUPDATES
DBOPTIONS ALLOWUNUSEDCOLUMN
TRANLOGOPTIONS EXCLUDEUSER ggadmin
FETCHOPTIONS NOUSESNAPSHOT
EXTTRAIL ./dirdat/cs
DISCARDFILE ./dirrpt/extcs.dsc, APPEND, MEGABYTES 1024
discardrollover at 3:00
warnlongtrans 1h,checkinterval 5m
CACHEMGR CACHESIZE 1024MB, CACHEDIRECTORY ./dirtmp
REPORTCOUNT EVERY 60 SECONDS, RATE
DDL INCLUDE MAPPED
--DDLOPTIONS ADDTRANDATA
DDLOPTIONS REPORT
TABLE SZR.*;

DDLOPTIONS ADDTRANDATA must be commented out; otherwise Extract reports:

DDLOPTIONS ADDTRANDATA is not supported on read only database

After changing the parameters above, the following error still occurs:

2024-08-19T16:01:23.948+0800  INFO    OGG-01517  Oracle GoldenGate Capture for Oracle, extcs.prm:  Position of first record processed Sequence 342, RBA 625168, SCN 0.3730950 (3730950), 2024 M08 19 15:35:00.
2024-08-19T16:01:23.949+0800  ERROR   OGG-02803  Oracle GoldenGate Capture for Oracle, extcs.prm:  Encountered a Data Guard role transition. Alter Extract to SCN 3,751,013 and restart Extract, or recreate Extract with the correct number of threads at SCN 3,751,013.
2024-08-19T16:01:23.949+0800  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, extcs.prm:  PROCESS ABENDING.
2024-08-19T16:01:24.913+0800  INFO    OGG-02232  Oracle GoldenGate Capture for Oracle, dpcs.prm:  Switching to next trail file /ogg/dirdat/cs000000003 at 2024-08-19 16:01:24.909392 due to EOF. with current RBA 1,793,635.
2024-08-19T16:01:25.014+0800  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): info all.

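3751013 is the SCN at which the standby became the primary. It appears in the alert log of the former standby (the new primary) as SwitchOver after complete recovery through change 3751013, and it can also be read from the STANDBY_BECAME_PRIMARY_SCN column of the v$database view.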
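
The SCN can also be taken straight from the OGG-02803 message, which prints it with thousands separators (3,751,013), whereas ALTER EXTRACT takes plain digits. The sketch below is one way to do that conversion, assuming the error line has been read from the OGG error log; the helper names role_transition_scn and alter_extract_command are hypothetical.

```python
import re

# Captures the SCN in an OGG-02803 message, e.g. "Alter Extract to SCN 3,751,013".
SCN_PATTERN = re.compile(r"OGG-02803.*?Alter Extract to SCN ([\d,]+)")


def role_transition_scn(log_line):
    """Return the SCN from an OGG-02803 line as an int, or None if absent."""
    match = SCN_PATTERN.search(log_line)
    if match is None:
        return None
    # Strip the thousands separators before converting.
    return int(match.group(1).replace(",", ""))


def alter_extract_command(group, scn):
    """Build the GGSCI command that repositions the Extract to the given SCN."""
    return f"ALTER EXTRACT {group} SCN {scn}"


# Error line copied from this article's log excerpt.
LINE = (
    "2024-08-19T16:01:23.949+0800  ERROR   OGG-02803  Oracle GoldenGate Capture "
    "for Oracle, extcs.prm:  Encountered a Data Guard role transition. Alter "
    "Extract to SCN 3,751,013 and restart Extract, or recreate Extract with the "
    "correct number of threads at SCN 3,751,013."
)

scn = role_transition_scn(LINE)
print(alter_extract_command("extcs", scn))  # ALTER EXTRACT extcs SCN 3751013
```

The generated command is the same one issued by hand in the fix that follows.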

Fix:

--Set the Extract capture SCN to the SCN shown in the error
GGSCI (ora11g) 22> ALTER EXTRACT extcs  scn 3751013;
EXTRACT altered.

--Restart the Extract
GGSCI (ora11g) 23> start extcs

Sending START request to MANAGER ...
EXTRACT EXTCS starting


GGSCI (ora11g) 24> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     DPCS        00:00:00      00:00:08    
EXTRACT     RUNNING     EXTCS       00:00:00      00:00:00    

Check the Replicat process on the target:

GGSCI (11gbbed) 24> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     REPCS       00:00:00      00:00:02   

SQL> select max(C_DATE) from szr.test_job;

MAX(C_DATE)
-------------------
2024-08-19 16:16:18

The Replicat process is healthy, and the data in the test table is advancing again.

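Recovery method 2

The method above modifies the existing OGG configuration. Could OGG instead be copied to the new primary and run there, so that extraction happens on the post-switchover primary with the Extract configured on that host? The procedure looks like this:

--Stop all OGG processes on the source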

GGSCI (ora11g) 35> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     DPCS        00:00:00      00:00:08    
EXTRACT     ABENDED     EXTCS       00:00:00      00:02:16    


GGSCI (ora11g) 36> stop *

Sending STOP request to EXTRACT DPCS ...
Request processed.
EXTRACT EXTCS is already stopped.


GGSCI (ora11g) 37> stop mgr
Manager process is required by other GGS processes.
Are you sure you want to stop it (y/n)?y

Sending STOP request to MANAGER ...
Request processed.
Manager stopped.

GGSCI (ora11g) 38> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     STOPPED                                           
EXTRACT     STOPPED     DPCS        00:00:00      00:00:45    
EXTRACT     ABENDED     EXTCS       00:00:00      00:03:10  

sys@ora11g> select max(C_DATE) from szr.test_job;

MAX(C_DATE)
-------------------
2024-08-19 16:33:37

--Copy the installation directory to the new primary (10.10.10.41)

--Copy it to the same path
[oracle@ora11g:/ogg]$ scp -r * 10.10.10.41:/ogg
Note: check the owner and permissions of the copied files.

--Add the environment variables on the new primary

$ vi .bash_profile
###OGG_HOME
export OGG_HOME=/ogg
export PATH=$ORACLE_HOME/bin:$PATH:$ORACLE_HOME/OPatch:$OGG_HOME

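--If the database installation directory or instance name differs between the two hosts, update ORACLE_HOME and ORACLE_SID in the configuration files accordingly

--Start the processes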

GGSCI (ora11gdg) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     STOPPED                                           
EXTRACT     STOPPED     DPCS        00:00:00      00:07:40    
EXTRACT     ABENDED     EXTCS       00:00:00      00:10:05    


GGSCI (ora11gdg) 5> start mgr
Manager started.


GGSCI (ora11gdg) 6> start *

Sending START request to MANAGER ...
EXTRACT DPCS starting

Sending START request to MANAGER ...
EXTRACT EXTCS starting

GGSCI (ora11gdg) 9> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     DPCS        00:00:00      00:00:09    
EXTRACT     ABENDED     EXTCS       00:00:00      00:10:38  

EXTCS fails to start, with the following error:

2024-08-19 16:46:50  ERROR   OGG-02803  Encountered a Data Guard role transition. Alter Extract to SCN 3,793,698 and restart Extract, or recreate Extract with the correct number of threads at SCN 3,793,698.

***********************************************************************
*                   ** Run Time Statistics **                         *
***********************************************************************

Handle it the same way as before:

--Set the Extract capture SCN to the SCN shown in the error
GGSCI (ora11gdg) 15>  ALTER EXTRACT extcs  scn 3793698;
EXTRACT altered.

 
GGSCI (ora11gdg) 16> start extcs

Sending START request to MANAGER ...
EXTRACT EXTCS starting


GGSCI (ora11gdg) 17> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     DPCS        00:00:00      00:00:05    
EXTRACT     RUNNING     EXTCS       00:00:00      00:00:05    

Once the Extract process is running normally, check the Replicat process and the data in the test table:

GGSCI (11gbbed) 26> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     REPCS       00:00:04      00:00:06    

SQL> select sysdate,max(C_DATE) from szr.test_job;

SYSDATE 	    MAX(C_DATE)
------------------- -------------------
2024-08-19 16:50:14 2024-08-19 16:49:32

As shown, data is replicating normally.
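
One rough way to read the query result above: the test table receives a row once a minute, so a gap of less than about one insert interval between SYSDATE and MAX(C_DATE) is consistent with replication being caught up. The sketch below just does that arithmetic on the two values shown; it assumes the source and target clocks are in sync, and lag_seconds is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp layout used in the SQL*Plus output above.
FMT = "%Y-%m-%d %H:%M:%S"


def lag_seconds(sysdate, max_c_date):
    """Seconds between the target's SYSDATE and the newest replicated row."""
    delta = datetime.strptime(sysdate, FMT) - datetime.strptime(max_c_date, FMT)
    return delta.total_seconds()


lag = lag_seconds("2024-08-19 16:50:14", "2024-08-19 16:49:32")
print(lag)  # 42.0
```

A 42-second gap is below the one-minute insert interval, so the newest row on the target is the most recent one the job could have produced.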

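Summary: after a switchover, the OGG Extract process does not recover on its own. It has to be repositioned to the role-transition SCN with ALTER EXTRACT (and, if it keeps running against the new standby, given TRANLOGOPTIONS MINEFROMACTIVEDG) before it resumes normal operation.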