Oracle service automatically fails back to the original node after failover

After database service resources in CRS failed over to another node in a RAC cluster, they were stopped there and automatically failed back to the original node, which was not expected.

CAUSE

The "fail back" was actually a relocation of the service resource, initiated by a user (client) process and recorded by the UiServer component in the crsd trace:


crsd_148.trc:2017-12-19 12:24:32.701587 :UiServer:1239340800: {1:54916:896} Container [ Name: UI_RELOCATE

crsd_148.trc:2017-12-19 12:24:32.701630 :UiServer:1239340800: {1:54916:896} Sending to PE. ctx= 0x7f830003a9f0, ClientPID=96087 ===========>

crsd_148.trc:2017-12-19 12:24:32.707319 :UiServer:1239340800: {1:54916:896} Response: c4|5!ORDERk7|MESSAGEt79|CRS-2673: Attempting to stop 'ora.orcl.orclsrv.svc' on 'node1'k7|MSGTYPEt1|3k5|OBJIDt33|ora.orcl.orclsrv.svc 1 1k4|WAITt1|0

crsd_148.trc:2017-12-19 12:24:32.777894 :UiServer:1239340800: {1:54916:896} Response: c4|5!ORDERk7|MESSAGEt78|CRS-2677: Stop of 'ora.orcl.orclsrv.svc' on 'node1' succeededk7|MSGTYPEt1|3k5|OBJIDt33|ora.orcl.orclsrv.svc 1 1k4|WAITt1|0

crsd_148.trc:2017-12-19 12:24:32.780142 : AGFW:1251948288: {1:54916:896} Agfw Proxy Server received the message: RESOURCE_START[ora.orcl.orclsrv.svc 1 1] ID 4098:8698047

crsd_148.trc:2017-12-19 12:24:32.780238 : AGFW:1251948288: {1:54916:896} Creating the resource: ora.orcl.orclsrv.svc 1 1

crsd_148.trc:2017-12-19 12:24:32.780339 : AGFW:1251948288: {1:54916:896} Initializing the resource ora.orcl.orclsrv.svc 1 1 for type ora.service.type

crsd_148.trc:2017-12-19 12:24:32.780380 : AGFW:1251948288: {1:54916:896} SR: acl = owner:oracle:rwx,pgrp:dba:r--,other::r--,group:dba:r-x,user:oracle:r-x

crsd_148.trc:2017-12-19 12:24:32.780665 : AGFW:1251948288: {1:54916:896} Agfw Proxy Server sending message: RESOURCE_ADD[ora.orcl.orclsrv.svc 1 1] ID 4356:2828 to the agent /u01/app/12.1.0.2/grid/bin/oraagent_oracle
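To look for the same kind of request in other crsd trace files, the UI_RELOCATE line and the ClientPID line can be paired by the request tag in braces (every line quoted above carries {1:54916:896}). The following is a minimal sketch that assumes exactly this trace line format, which may differ between Grid Infrastructure versions; the function name find_relocations is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report UI_RELOCATE requests in crsd trace files together with the
# ClientPID that sent them. Assumes the line format quoted above, where every
# line of one request carries the same {a:b:c} tag.
find_relocations() {
  awk '
    # Remember the tag of each relocate request
    /Name: UI_RELOCATE/ {
      if (match($0, /[{][0-9]+:[0-9]+:[0-9]+[}]/))
        pending[substr($0, RSTART, RLENGTH)] = 1
      next
    }
    # Report the ClientPID line that belongs to a remembered request
    /ClientPID=/ {
      if (!match($0, /[{][0-9]+:[0-9]+:[0-9]+[}]/)) next
      tag = substr($0, RSTART, RLENGTH)
      if (!(tag in pending)) next
      match($0, /ClientPID=[0-9]+/)
      print tag, substr($0, RSTART, RLENGTH)
      delete pending[tag]
    }
  ' "$@"
}
```

Run it as `find_relocations crsd_*.trc` in the trace directory. For the excerpt above it would report `{1:54916:896} ClientPID=96087`; ClientPID lines that do not belong to a relocate request are ignored.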

The relocation was requested by client PID 96087, whose client name is "java" (srvctl itself runs as a Java program, so a srvctl command is reported this way):

2017-12-19 12:24:32.701587 :UiServer:1239340800: {1:54916:896} Container [ Name: UI_RELOCATE

API_HDR_VER:
TextMessage[3]

CLIENT:
TextMessage[]

CLIENT_NAME:
TextMessage[java] ==============>

CLIENT_PID:
TextMessage[96087] ================>

CLIENT_PRIMARY_GROUP:
TextMessage[dba]
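When the full container dump is available, the client name and PID can be extracted from it mechanically. This sketch assumes the layout shown above (a `KEY:` line followed, possibly after blank lines, by a `TextMessage[value]` line); client_info is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print CLIENT_NAME and CLIENT_PID from a UI_RELOCATE container dump.
# Assumes each "KEY:" line is followed, possibly after blank lines, by a
# "TextMessage[value]" line, as in the dump above. Reads files or stdin.
client_info() {
  awk '
    # A line consisting only of an upper-case key and a colon starts a field
    /^[A-Z_]+:[[:space:]]*$/ { key = $1; sub(/:$/, "", key); next }
    # The next TextMessage[...] line holds the value of that field
    index($0, "TextMessage[") > 0 && key != "" {
      rest = substr($0, index($0, "TextMessage[") + 12)
      val = substr(rest, 1, index(rest, "]") - 1)
      if (key == "CLIENT_NAME" || key == "CLIENT_PID") print key "=" val
      key = ""
    }
  ' "$@"
}
```

Applied to the dump above, it prints `CLIENT_NAME=java` and `CLIENT_PID=96087`, one per line.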

Further analysis showed that this was caused by a custom FAN (Fast Application Notification) callout script in the $GRID_HOME/racg/usrco directory. The following excerpt from that script shows that it forcibly relocates a service back to the preferred node after the service has failed over to an available node.
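A server-side FAN callout in $GRID_HOME/racg/usrco is run by Oracle Clusterware when FAN events occur, with the event type as the first argument and name=value properties (service, database, instance, host, status and so on) after it. That is how a script like this one learns which service and instance an event refers to. The sketch below shows only that parsing step; the FAN_* variable names are hypothetical, and the exact property set depends on the event type and version.

```shell
#!/usr/bin/env bash
# Sketch of the argument parsing at the start of a server-side FAN callout.
# Assumption: $1 is the event type (e.g. SERVICEMEMBER) and the remaining
# arguments are name=value properties. The FAN_* names are hypothetical.
parse_fan_event() {
  FAN_EVENT_TYPE=$1
  shift
  local arg
  for arg in "$@"; do
    [[ $arg == *=* ]] || continue        # ignore tokens that are not name=value
    case ${arg%%=*} in                   # property name: text before the first "="
      service|SERVICE)   FAN_SERVICE=${arg#*=} ;;
      database|DATABASE) FAN_DATABASE=${arg#*=} ;;
      instance|INSTANCE) FAN_INSTANCE=${arg#*=} ;;
      host|HOST)         FAN_HOST=${arg#*=} ;;
      status|STATUS)     FAN_STATUS=${arg#*=} ;;
      reason|REASON)     FAN_REASON=${arg#*=} ;;
    esac
  done
}
```

With an illustrative event such as `parse_fan_event SERVICEMEMBER VERSION=1.0 service=orclsrv database=orcl instance=orcl1 host=node1 status=up`, `echo "$FAN_SERVICE on $FAN_INSTANCE is $FAN_STATUS"` prints `orclsrv on orcl1 is up`.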

Custom script:

# (Excerpt: the enclosing loop and tests of the full script are not shown.)
# If service is not running, then start it
# (the "if" test that opens this branch is not shown in the original excerpt)
    echo " service stopped, starting" >> "$LOGFILE"
    $ORACLE_HOME/bin/srvctl start service -d "$DATABASE" -s "$service" >> "$LOGFILE"
else
    # Service is running, but is it running on the preferred instance?
    RUNNING=( $(echo "$SRVSTATUS" | sed -rne "s/.* ([a-zA-Z0-9]+)/\1/p" | tr "," "\n") )
    echo "${RUNNING[@]} = ${PREFERRED[@]}"
    if ! in_array "$INSTANCE" "${RUNNING[@]}" ; then
        echo " not running on preferred $INSTANCE" >> "$LOGFILE"
        # Find the first non-preferred running instance
        CURRENT=""
        for inst in "${RUNNING[@]}"; do
            if ! in_array "$inst" "${PREFERRED[@]}" ; then
                CURRENT="$inst"
                break
            fi
        done
        # Relocate: this srvctl call is what moves the service back
        if [[ -n "$CURRENT" ]]; then
            echo " relocate $CURRENT -> $INSTANCE" >> "$LOGFILE"
            $ORACLE_HOME/bin/srvctl relocate service -d "$DATABASE" -s "$service" -i "$CURRENT" -t "$INSTANCE" >> "$LOGFILE"
        fi
    else
        # Service is already running on the preferred instance, no need to do anything
        echo " running on preferred $INSTANCE" >> "$LOGFILE"
    fi
fi
# The remaining lines close blocks opened earlier in the full script
fi
fi
done
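The decision logic of the excerpt can be exercised on its own. In this sketch srvctl is only printed, never run. The script calls in_array without showing it, so the version here is a hypothetical helper with the same calling convention, and the status text is assumed to look like `srvctl status service` output such as "Service orclsrv is running on instance(s) orcl1,orcl2".

```shell
#!/usr/bin/env bash
# Sketch of the fail-back decision made by the custom script, with srvctl only
# printed so it is safe to run. in_array is a hypothetical helper.

# in_array NEEDLE ITEM... : succeed if NEEDLE equals one of the ITEMs
in_array() {
  local needle=$1 item
  shift
  for item in "$@"; do
    [[ $item == "$needle" ]] && return 0
  done
  return 1
}

# running_instances STATUS : one instance per line, using the script's sed/tr
# pipeline on text like "Service orclsrv is running on instance(s) orcl1,orcl2"
running_instances() {
  echo "$1" | sed -nE 's/.* ([a-zA-Z0-9]+)/\1/p' | tr ',' '\n'
}

# failback_command STATUS TARGET PREFERRED... : print the relocate command the
# script would issue (TARGET plays the role of $INSTANCE), or nothing at all
failback_command() {
  local status=$1 target=$2 inst current=""
  shift 2
  local -a preferred=("$@") running=()
  while read -r inst; do
    running+=("$inst")
  done < <(running_instances "$status")

  # Already running on the target instance: nothing to do
  if in_array "$target" "${running[@]}"; then
    return 0
  fi
  # First running instance that is not in the preferred list
  for inst in "${running[@]}"; do
    if ! in_array "$inst" "${preferred[@]}"; then
      current=$inst
      break
    fi
  done
  if [[ -n $current ]]; then
    echo "srvctl relocate service -d <db> -s <service> -i $current -t $target"
  fi
}
```

For example, with orcl1 as the only preferred instance and the service running only on orcl2 after a failover, `failback_command "Service orclsrv is running on instance(s) orcl2" orcl1 orcl1` prints a relocate from orcl2 to orcl1: the same kind of relocation request that showed up in the crsd trace.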

SOLUTION

  1. If the custom script is required in the environment, the relocation is expected behavior and can be ignored.

  2. If the custom script is not required or not wanted, review the '$GRID_HOME/racg/usrco' directory and remove any user-defined scripts that were created to fail services back.
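For option 2, a cautious approach is to list what is in usrco and move unwanted callouts out of the directory instead of deleting them, so they can be restored later. A minimal sketch; the usrco.disabled backup location and the function names are hypothetical choices, and since the Grid home is normally local to each node, the check has to be repeated on every node of the cluster.

```shell
#!/usr/bin/env bash
# Sketch: list user-defined FAN callouts and move them aside instead of
# deleting them. The "usrco.disabled" backup directory is a hypothetical choice.

# list_callouts DIR : every regular file directly in DIR, sorted
list_callouts() {
  local dir=$1
  find "$dir" -maxdepth 1 -type f -print | sort
}

# disable_callout FILE : move FILE into a sibling "<dir>.disabled" directory,
# outside usrco, so it is no longer picked up as a callout
disable_callout() {
  local file=$1 backup
  backup="$(dirname "$file").disabled"
  mkdir -p "$backup" && mv -- "$file" "$backup/"
}
```

Review the output of `list_callouts "$GRID_HOME/racg/usrco"` before moving anything, because other callouts in that directory may still be needed.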
