Redis Source Code Analysis: Sentinel
1. Sentinel Startup
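A Sentinel can be started with the redis-server command, as shown below, or with redis-sentinel. A Sentinel configuration file is required; a Sentinel started without one exits immediately.
redis-server /path/to/sentinel.conf --sentinel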
The Sentinel-related flow in main is as follows:
//server.c#main
int main(int argc, char **argv) {
...
//检测是否以sentinel模式启动
server.sentinel_mode = checkForSentinelMode(argc,argv);
...
if (server.sentinel_mode) {
//sets server.port to 26379, the default Sentinel port
initSentinelConfig();
//mainly replaces the command table with the Sentinel commands and initializes Sentinel state
initSentinel();
}
...
//in Sentinel mode, this calls sentinelHandleConfiguration to parse the Sentinel directives
loadServerConfig(configfile,options);
...
//generates a random 40-hex-character Sentinel ID if none is configured, then logs startup information
sentinelIsRunning();
}
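initSentinel replaces the set of commands the Sentinel accepts. The publish command is handled by sentinelPublishCommand, which only accepts messages for the __sentinel__:hello channel and processes them exactly as if they had arrived through a subscription to that channel.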
//sentinel.c
struct redisCommand sentinelcmds[] = {
{"ping",pingCommand,1,"",0,NULL,0,0,0,0,0},
{"sentinel",sentinelCommand,-2,"",0,NULL,0,0,0,0,0},
{"subscribe",subscribeCommand,-2,"",0,NULL,0,0,0,0,0},
{"unsubscribe",unsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
{"psubscribe",psubscribeCommand,-2,"",0,NULL,0,0,0,0,0},
{"punsubscribe",punsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
{"publish",sentinelPublishCommand,3,"",0,NULL,0,0,0,0,0},
{"info",sentinelInfoCommand,-1,"",0,NULL,0,0,0,0,0},
{"role",sentinelRoleCommand,1,"l",0,NULL,0,0,0,0,0},
{"client",clientCommand,-2,"rs",0,NULL,0,0,0,0,0},
{"shutdown",shutdownCommand,-1,"",0,NULL,0,0,0,0,0},
{"auth",authCommand,2,"sltF",0,NULL,0,0,0,0,0}
};
In sentinelHandleConfiguration, each monitor directive in the configuration is parsed, and the master is added to sentinel.masters:
//sentinel.c#sentinelHandleConfiguration
if (!strcasecmp(argv[0],"monitor") && argc == 5) {
/* monitor <name> <host> <port> <quorum> */
int quorum = atoi(argv[4]);
if (quorum <= 0) return "Quorum must be 1 or greater.";
if (createSentinelRedisInstance(argv[1],SRI_MASTER,argv[2],
atoi(argv[3]),quorum,NULL) == NULL) {
......
}
}
main only performs initialization; the actual monitoring of the masters happens in the periodic timer task:
//server.c#serverCron
if (server.sentinel_mode) sentinelTimer();
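sentinelTimer mainly does the following:
1) Creates the command connections and the message (Pub/Sub) connections. Once a message connection is established, it subscribes to the __sentinel__:hello channel of the monitored Redis server.
2) Over the command connection, it sends INFO to masters and slaves every 10s to collect information, including the list of slaves; sends PING every 1s to check liveness; and every 2s publishes a hello message with the following format:
sentinel_ip,sentinel_port,sentinel_runid,current_epoch,master_name,master_ip,master_port,master_epoch
3) Checks whether each instance is subjectively down.
4) For masters, checks whether they are objectively down and, if so, performs a failover.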
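To make the hello format concrete, here is a minimal C99 sketch that formats and parses the eight comma-separated fields. The struct hello_msg and the functions hello_format and hello_parse are hypothetical names, and the fixed buffer sizes are assumptions; Sentinel itself splits the payload on commas and only accepts it when there are exactly 8 tokens.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the 8 fields of a hello message. */
struct hello_msg {
    char sentinel_ip[64];
    int sentinel_port;
    char sentinel_runid[41];          /* 40 hex chars + NUL */
    unsigned long long current_epoch;
    char master_name[64];
    char master_ip[64];
    int master_port;
    unsigned long long master_epoch;  /* the master's config epoch */
};

/* Build the payload in the documented field order; -1 if buf is too small. */
int hello_format(char *buf, size_t len, const struct hello_msg *h) {
    assert(buf != NULL && h != NULL);
    int n = snprintf(buf, len, "%s,%d,%s,%llu,%s,%s,%d,%llu",
                     h->sentinel_ip, h->sentinel_port, h->sentinel_runid,
                     h->current_epoch, h->master_name, h->master_ip,
                     h->master_port, h->master_epoch);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Parse a payload; returns 0 only if exactly 8 fields were read. */
int hello_parse(const char *msg, struct hello_msg *h) {
    assert(msg != NULL && h != NULL);
    int consumed = 0;
    int fields = sscanf(msg, "%63[^,],%d,%40[^,],%llu,%63[^,],%63[^,],%d,%llu%n",
                        h->sentinel_ip, &h->sentinel_port, h->sentinel_runid,
                        &h->current_epoch, h->master_name, h->master_ip,
                        &h->master_port, &h->master_epoch, &consumed);
    /* Reject short payloads and trailing data such as a 9th field. */
    if (fields != 8 || msg[consumed] != '\0') return -1;
    return 0;
}
```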
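How does a Sentinel learn what the other Sentinels think about a master's state?
It sends the following command, over the command connection, to every other Sentinel monitoring the same master:
SENTINEL is-master-down-by-addr master_ip master_port current_epoch sentinel_runid
The last argument is the sender's runid when it is asking for a vote, and * otherwise.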
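A small sketch of how the request line is assembled. format_is_master_down is a hypothetical helper; the real code passes the arguments to hiredis' redisAsyncCommand (mapping the command name through sentinelInstanceMapCommand) rather than building a single string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* runid == NULL means a pure state query ("*");
 * otherwise the sender's own runid is sent to request a vote. */
int format_is_master_down(char *buf, size_t len, const char *master_ip,
                          int master_port, unsigned long long current_epoch,
                          const char *runid) {
    assert(buf != NULL && master_ip != NULL);
    int n = snprintf(buf, len, "SENTINEL is-master-down-by-addr %s %d %llu %s",
                     master_ip, master_port, current_epoch,
                     runid ? runid : "*");
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```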
2. Main Flows
The following subsections walk through each of the steps above.
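2.1 Establishing Connections
Connections are created in sentinelReconnectInstance. A command connection, stored in link->cc, is created for masters, slaves and Sentinels. A message (Pub/Sub) connection, stored in link->pc, is created only for masters and slaves, not for Sentinels. On the message connection, Sentinel subscribes to the __sentinel__:hello channel, and incoming messages are handled by sentinelReceiveHelloMessages.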
//sentinel.c#sentinelReconnectInstance
//create the command connection
if (link->cc == NULL) {
link->cc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
...
}
//create the message (Pub/Sub) connection
/* Pub / Sub */
if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && link->pc == NULL) {
link->pc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
...
retval = redisAsyncCommand(link->pc,
sentinelReceiveHelloMessages, ri, "%s %s",
sentinelInstanceMapCommand(ri,"SUBSCRIBE"),
SENTINEL_HELLO_CHANNEL);
}
sentinelReceiveHelloMessages calls sentinelProcessHelloMessage, which discovers other Sentinels and adds them to the master's list of Sentinels:
//sentinel.c#sentinelProcessHelloMessage
//token[2] is sentinel_runid, token[0] is sentinel_ip
si = createSentinelRedisInstance(token[2],SRI_SENTINEL,
token[0],port,master->quorum,master);
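2.2 Sending Periodic Commands
INFO is sent to masters and slaves, every 10s by default. The interval is info_period; in some situations it drops to 1s (for example, for a slave whose master is objectively down or under failover). INFO is sent asynchronously, and its reply is handled by sentinelInfoReplyCallback: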
//sentinel.c#sentinelSendPeriodicCommands
/* Send INFO to masters and slaves, not sentinels. */
if ((ri->flags & SRI_SENTINEL) == 0 &&
(ri->info_refresh == 0 ||
(now - ri->info_refresh) > info_period))
{
retval = redisAsyncCommand(ri->link->cc,
sentinelInfoReplyCallback, ri, "%s",
sentinelInstanceMapCommand(ri,"INFO"));
if (retval == C_OK) ri->link->pending_commands++;
}
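sentinelInfoReplyCallback calls sentinelRefreshInstanceInfo, which parses the INFO output. The most important part is discovering slaves from a master's INFO: each listed slave is looked up among the known slaves by IP and port, and if it is not found, a new instance is created for it (the connection itself is then established by sentinelReconnectInstance).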
//sentinel.c#sentinelRefreshInstanceInfo
/* Check if we already have this slave into our table,
* otherwise add it. */
if (sentinelRedisInstanceLookupSlave(ri,ip,atoi(port)) == NULL) {
if ((slave = createSentinelRedisInstance(NULL,SRI_SLAVE,ip,
atoi(port), ri->quorum, ri)) != NULL) {
sentinelEvent(LL_NOTICE,"+slave",slave,"%@");
sentinelFlushConfig();
}
}
createSentinelRedisInstance creates a sentinelRedisInstance. It is a generic function: the table the new instance is stored in depends on its flags. For a slave discovered from INFO, as above, the flag is SRI_SLAVE, so the instance is stored in the master's slaves table.
//sentinel.c#createSentinelRedisInstance
if (flags & SRI_MASTER) table = sentinel.masters;
else if (flags & SRI_SLAVE) table = master->slaves;
else if (flags & SRI_SENTINEL) table = master->sentinels;
...
dictAdd(table, ri->name, ri);
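By default, a PING is sent to every master, slave and Sentinel once per second (ping_period is the smaller of down_after_period and 1s). sentinelSendPing sends the PING and records the related timestamps: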
//sentinel.c#sentinelSendPeriodicCommands
if ((now - ri->link->last_pong_time) > ping_period &&
(now - ri->link->last_ping_time) > ping_period/2) {
sentinelSendPing(ri);
}
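Every 2s by default, a hello message is published to the __sentinel__:hello channel. sentinelSendHello builds the message in the format shown earlier and sends it with PUBLISH. Masters and slaves forward the message to their subscribers (the other Sentinels); a Sentinel receiving the PUBLISH handles the message directly, with the same effect as a subscription.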
//sentinel.c#sentinelSendPeriodicCommands
if ((now - ri->last_pub_time) > SENTINEL_PUBLISH_PERIOD) {
sentinelSendHello(ri);
}
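2.3 Subjective Down Check
The subjective down check applies to masters, slaves and Sentinels.
Rule 1: elapsed is the time since the oldest PING still waiting for a reply was sent (act_ping_time), or, if the link is disconnected, the time since the instance was last available (last_avail_time). If elapsed exceeds ri->down_after_period (down-after-milliseconds, 30s by default), the instance is subjectively down, so by default a dead server is detected after about 30s.
Rule 2: Sentinel believes the instance is a master, but the instance has been reporting itself as a slave in its INFO replies (ri->role_reported == SRI_SLAVE) for longer than ri->down_after_period plus two INFO periods (50s by default). Presumably this catches a master that has been turned into a slave and can no longer act as the master, even though it still answers PING.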
//sentinel.c#sentinelCheckSubjectivelyDown
void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
mstime_t elapsed = 0;
if (ri->link->act_ping_time)
elapsed = mstime() - ri->link->act_ping_time;
else if (ri->link->disconnected)
elapsed = mstime() - ri->link->last_avail_time;
...
if (elapsed > ri->down_after_period ||
(ri->flags & SRI_MASTER &&
ri->role_reported == SRI_SLAVE &&
mstime() - ri->role_reported_time >
(ri->down_after_period+SENTINEL_INFO_PERIOD*2))) {
/* Is subjectively down */
if ((ri->flags & SRI_S_DOWN) == 0) {
sentinelEvent(LL_WARNING,"+sdown",ri,"%@");
ri->s_down_since_time = mstime();
ri->flags |= SRI_S_DOWN;
}
}
}
2.4 Objective Down Check
The objective down check applies only to masters.
//sentinel.c#sentinelHandleRedisInstance
if (ri->flags & SRI_MASTER) {
sentinelCheckObjectivelyDown(ri);
if (sentinelStartFailoverIfNeeded(ri))
sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
sentinelFailoverStateMachine(ri);
sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
}
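The objective check is done in sentinelCheckObjectivelyDown. It only counts votes if this Sentinel already sees the master as subjectively down, and it counts itself as the first vote. It then iterates over the other Sentinels and increments the count for each one that has reported the master as down (SRI_MASTER_DOWN). When the count is greater than or equal to the configured quorum, the master is considered objectively down and SRI_O_DOWN is set.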
//sentinel.c#sentinelCheckObjectivelyDown
void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
dictIterator *di;
dictEntry *de;
unsigned int quorum = 0, odown = 0;
if (master->flags & SRI_S_DOWN) {
/* Is down for enough sentinels? */
quorum = 1; /* the current sentinel. */
/* Count all the other sentinels. */
di = dictGetIterator(master->sentinels);
while((de = dictNext(di)) != NULL) {
sentinelRedisInstance *ri = dictGetVal(de);
if (ri->flags & SRI_MASTER_DOWN) quorum++;
}
dictReleaseIterator(di);
if (quorum >= master->quorum) odown = 1;
}
/* Set the flag accordingly to the outcome. */
if (odown) {
if ((master->flags & SRI_O_DOWN) == 0) {
sentinelEvent(LL_WARNING,"+odown",master,"%@ #quorum %d/%d",
quorum, master->quorum);
master->flags |= SRI_O_DOWN;
master->o_down_since_time = mstime();
}
}
...
}
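The counting step reduced to a function, assuming the other Sentinels' SRI_MASTER_DOWN flags are given as a bool array (a hypothetical representation; Redis iterates over the master->sentinels dict). When the result is false, the real code also clears SRI_O_DOWN if it was set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* peers_master_down[i] mirrors SRI_MASTER_DOWN on the i-th other Sentinel. */
bool is_objectively_down(bool self_sdown, const bool *peers_master_down,
                         size_t npeers, unsigned int configured_quorum,
                         unsigned int *votes_out) {
    assert(configured_quorum >= 1);           /* enforced when parsing config */
    assert(peers_master_down != NULL || npeers == 0);
    unsigned int votes = 0;
    if (self_sdown) {
        votes = 1;                            /* this Sentinel */
        for (size_t i = 0; i < npeers; i++)
            if (peers_master_down[i]) votes++;
    }
    if (votes_out) *votes_out = votes;
    return self_sdown && votes >= configured_quorum;
}
```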
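Following the flow in sentinelHandleRedisInstance: when a Sentinel first finds the master subjectively down, it checks the other Sentinels' opinions, but at that point none of them carries the SRI_MASTER_DOWN flag yet. The other Sentinels are actually queried in the second call, sentinelAskMasterStateToOtherSentinels(ri, SENTINEL_NO_FLAGS), after a few internal checks (for example, the master must be subjectively down in this Sentinel's view).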
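It iterates over the other Sentinels and sends each one SENTINEL is-master-down-by-addr <master-ip> <master-port> <epoch> <runid>. Here it is only asking about the master's state, so the runid argument is * (it becomes the current Sentinel's runid once a failover is in progress and the Sentinel needs votes to be elected leader). The call is asynchronous; the reply is handled by sentinelReceiveIsMasterDownReply.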
//sentinel.c#sentinelAskMasterStateToOtherSentinels
void sentinelAskMasterStateToOtherSentinels(sentinelRedisInstance *master, int flags) {
dictIterator *di;
dictEntry *de;
di = dictGetIterator(master->sentinels);
while((de = dictNext(di)) != NULL) {
sentinelRedisInstance *ri = dictGetVal(de);
//some checks omitted
...
/* Ask */
ll2string(port,sizeof(port),master->addr->port);
//send SENTINEL is-master-down-by-addr <master-ip> <master-port> <epoch> <runid>
//runid is * for a pure state query, or this Sentinel's runid when a failover is in progress and it asks for a vote
retval = redisAsyncCommand(ri->link->cc,
sentinelReceiveIsMasterDownReply, ri,
"%s is-master-down-by-addr %s %s %llu %s",
sentinelInstanceMapCommand(ri,"SENTINEL"),
master->addr->ip, port,
sentinel.current_epoch,
(master->failover_state > SENTINEL_FAILOVER_STATE_NONE) ?
sentinel.myid : "*");
if (retval == C_OK) ri->link->pending_commands++;
}
dictReleaseIterator(di);
}
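Before looking at the callback sentinelReceiveIsMasterDownReply, let's see how the other Sentinels answer SENTINEL is-master-down-by-addr. The command is handled in sentinelCommand. The receiver looks up the master by IP and port; if the instance exists, is a master, and is subjectively down in the receiver's own view (and the receiver is not in TILT mode), isdown is 1. A pure state query carries * as its runid, so no vote is requested and sentinelVoteLeader is not called. The reply has three elements: <master-isdown> <leader-runid> <leader-epoch>.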
//sentinel.c#sentinelCommand
void sentinelCommand(client *c) {
//other subcommands omitted
...
else if (!strcasecmp(c->argv[1]->ptr,"is-master-down-by-addr")) {
sentinelRedisInstance *ri;
long long req_epoch;
uint64_t leader_epoch = 0;
char *leader = NULL;
long port;
int isdown = 0;
if (c->argc != 6) goto numargserr;
if (getLongFromObjectOrReply(c,c->argv[3],&port,NULL) != C_OK ||
getLongLongFromObjectOrReply(c,c->argv[4],&req_epoch,NULL)
!= C_OK)
return;
ri = getSentinelRedisInstanceByAddrAndRunID(sentinel.masters,
c->argv[2]->ptr,port,NULL);
/* It exists? Is actually a master? Is subjectively down? It's down.
* Note: if we are in tilt mode we always reply with "0". */
if (!sentinel.tilt && ri && (ri->flags & SRI_S_DOWN) &&
(ri->flags & SRI_MASTER))
isdown = 1;
/* Vote for the master (or fetch the previous vote) if the request
* includes a runid, otherwise the sender is not seeking for a vote. */
if (ri && ri->flags & SRI_MASTER && strcasecmp(c->argv[5]->ptr,"*")) {
leader = sentinelVoteLeader(ri,(uint64_t)req_epoch,
c->argv[5]->ptr,
&leader_epoch);
}
/* Reply with a three-elements multi-bulk reply:
* down state, leader, vote epoch. */
addReplyMultiBulkLen(c,3);
addReply(c, isdown ? shared.cone : shared.czero);
addReplyBulkCString(c, leader ? leader : "*");
addReplyLonglong(c, (long long)leader_epoch);
if (leader) sdsfree(leader);
}
}
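The decision part of this handler, isolated as a sketch. struct local_view and is_master_down_reply are hypothetical names; the vote itself is cast by sentinelVoteLeader and is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of what the replying Sentinel knows locally. */
struct local_view {
    bool tilt;          /* this Sentinel is in TILT mode */
    bool known_master;  /* an instance with that ip:port exists and is a master */
    bool sdown;         /* this Sentinel sees it as subjectively down */
};

/* Returns the <master-isdown> element; *wants_vote tells whether
 * sentinelVoteLeader would be called for this request. */
int is_master_down_reply(const struct local_view *v, const char *req_runid,
                         bool *wants_vote) {
    assert(v != NULL && req_runid != NULL && wants_vote != NULL);
    int isdown = (!v->tilt && v->known_master && v->sdown) ? 1 : 0;
    /* Only a request carrying a real runid (not "*") asks for a vote. */
    *wants_vote = v->known_master && strcmp(req_runid, "*") != 0;
    return isdown;
}
```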
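sentinelReceiveIsMasterDownReply first validates the shape of the reply. If the first element is 1, it sets SRI_MASTER_DOWN on that Sentinel's flags (with a bitwise OR); otherwise it clears the flag. Because this round only asks whether the other Sentinels also consider the master down, the reply's runid is *, and the branch that records a vote is skipped.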
void sentinelReceiveIsMasterDownReply(redisAsyncContext *c, void *reply, void *privdata) {
sentinelRedisInstance *ri = privdata;
...
/* Ignore every error or unexpected reply.
* Note that if the command returns an error for any reason we'll
* end clearing the SRI_MASTER_DOWN flag for timeout anyway. */
if (r->type == REDIS_REPLY_ARRAY && r->elements == 3 &&
r->element[0]->type == REDIS_REPLY_INTEGER &&
r->element[1]->type == REDIS_REPLY_STRING &&
r->element[2]->type == REDIS_REPLY_INTEGER)
{
ri->last_master_down_reply_time = mstime();
if (r->element[0]->integer == 1) {
ri->flags |= SRI_MASTER_DOWN;
} else {
ri->flags &= ~SRI_MASTER_DOWN;
}
if (strcmp(r->element[1]->str,"*")) {
/* If the runid in the reply is not "*" the Sentinel actually
* replied with a vote. */
sdsfree(ri->leader);
if ((long long)ri->leader_epoch != r->element[2]->integer)
serverLog(LL_WARNING,
"%s voted for %s %llu", ri->name,
r->element[1]->str,
(unsigned long long) r->element[2]->integer);
ri->leader = sdsnew(r->element[1]->str);
ri->leader_epoch = r->element[2]->integer;
}
}
}
This completes the objective down logic. Note that the whole process spans several calls to sentinelTimer rather than finishing within a single one.
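3. Failover
When the master is objectively down, a failover is needed to keep the deployment highly available: one slave is promoted to master, and the other slaves are reconfigured to replicate from it. The failover is driven by a state machine with the following states: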
//no failover in progress
#define SENTINEL_FAILOVER_STATE_NONE 0 /* No failover in progress. */
//waiting for the failover to start; the Sentinels elect a leader
#define SENTINEL_FAILOVER_STATE_WAIT_START 1 /* Wait for failover_start_time*/
//select a slave to become the new master
#define SENTINEL_FAILOVER_STATE_SELECT_SLAVE 2 /* Select slave to promote */
//turn the selected slave into a master
#define SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE 3 /* Slave -> Master */
//wait for the selected slave to report its new role
#define SENTINEL_FAILOVER_STATE_WAIT_PROMOTION 4 /* Wait slave to change role */
//make the other slaves replicate from the new master
#define SENTINEL_FAILOVER_STATE_RECONF_SLAVES 5 /* SLAVEOF newmaster */
//reset the master: set its IP:PORT to those of the promoted slave
#define SENTINEL_FAILOVER_STATE_UPDATE_CONFIG 6 /* Monitor promoted slave. */
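The successful path through these states can be written as a transition function. This is a sketch only: enum failover_state and failover_next are hypothetical names, and aborts (sentinelAbortFailover, which returns the master to SENTINEL_FAILOVER_STATE_NONE) and timeouts are not modeled.

```c
#include <assert.h>

enum failover_state {
    FAILOVER_NONE = 0,
    FAILOVER_WAIT_START,
    FAILOVER_SELECT_SLAVE,
    FAILOVER_SEND_SLAVEOF_NOONE,
    FAILOVER_WAIT_PROMOTION,
    FAILOVER_RECONF_SLAVES,
    FAILOVER_UPDATE_CONFIG
};

/* Next state when every step succeeds. */
enum failover_state failover_next(enum failover_state s) {
    switch (s) {
    case FAILOVER_NONE:               return FAILOVER_WAIT_START;         /* ODOWN, failover started */
    case FAILOVER_WAIT_START:         return FAILOVER_SELECT_SLAVE;       /* elected leader */
    case FAILOVER_SELECT_SLAVE:       return FAILOVER_SEND_SLAVEOF_NOONE; /* slave chosen */
    case FAILOVER_SEND_SLAVEOF_NOONE: return FAILOVER_WAIT_PROMOTION;     /* commands sent */
    case FAILOVER_WAIT_PROMOTION:     return FAILOVER_RECONF_SLAVES;      /* INFO reports master */
    case FAILOVER_RECONF_SLAVES:      return FAILOVER_UPDATE_CONFIG;      /* all slaves done */
    case FAILOVER_UPDATE_CONFIG:      return FAILOVER_NONE;               /* master reset */
    }
    assert(!"unknown failover state");
    return FAILOVER_NONE;
}
```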
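3.1 Starting the Failover
sentinelStartFailoverIfNeeded starts a failover once the master is objectively down. It calls sentinelStartFailover, which sets failover_state, increments the current epoch, and records the failover start time (plus a random delay of up to SENTINEL_MAX_DESYNC):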
//sentinel.c#sentinelStartFailover
void sentinelStartFailover(sentinelRedisInstance *master) {
serverAssert(master->flags & SRI_MASTER);
master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START;
master->flags |= SRI_FAILOVER_IN_PROGRESS;
master->failover_epoch = ++sentinel.current_epoch;
sentinelEvent(LL_WARNING,"+new-epoch",master,"%llu",
(unsigned long long) sentinel.current_epoch);
sentinelEvent(LL_WARNING,"+try-failover",master,"%@");
master->failover_start_time = mstime()+rand()%SENTINEL_MAX_DESYNC;
master->failover_state_change_time = mstime();
}
After that, the first call to sentinelAskMasterStateToOtherSentinels sends is-master-down-by-addr to the other Sentinels; this time the request carries this Sentinel's runid, asking them to vote for it.
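3.2 Electing the Sentinel Leader
After the failover starts, the Sentinel asks the others to vote for it. It calls sentinelAskMasterStateToOtherSentinels with the same request used to query the master's state, except that the last argument is its own runid. Each Sentinel grants at most one vote per epoch, to the first Sentinel that asks. When a Sentinel that has not started its own failover votes for another Sentinel, it also pushes back its own failover_start_time, so it will not start a competing failover (and try to become leader) for a while.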
if (sentinelStartFailoverIfNeeded(ri))
sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
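sentinelFailoverStateMachine then dispatches on the failover state and enters sentinelFailoverWaitStart, which checks the votes by calling sentinelGetLeader.
First, sentinelGetLeader tallies the votes the other Sentinels cast in the current epoch and finds the Sentinel with the most votes: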
//sentinel.c#sentinelGetLeader
di = dictGetIterator(master->sentinels);
while((de = dictNext(di)) != NULL) {
sentinelRedisInstance *ri = dictGetVal(de);
if (ri->leader != NULL && ri->leader_epoch == sentinel.current_epoch)
sentinelLeaderIncr(counters,ri->leader);
}
dictReleaseIterator(di);
di = dictGetIterator(counters);
while((de = dictNext(di)) != NULL) {
uint64_t votes = dictGetUnsignedIntegerVal(de);
if (votes > max_votes) {
max_votes = votes;
winner = dictGetKey(de);
}
}
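Next, if such a Sentinel was found, the current Sentinel votes for it; otherwise it votes for itself. (sentinelVoteLeader only grants a new vote if this Sentinel has not voted yet in this epoch; otherwise it returns the existing vote.) The Sentinel with the most votes is then determined again: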
//sentinel.c#sentinelGetLeader
if (winner)
myvote = sentinelVoteLeader(master,epoch,winner,&leader_epoch);
else
myvote = sentinelVoteLeader(master,epoch,sentinel.myid,&leader_epoch);
if (myvote && leader_epoch == epoch) {
uint64_t votes = sentinelLeaderIncr(counters,myvote);
if (votes > max_votes) {
max_votes = votes;
winner = myvote;
}
}
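Finally, the Sentinel with the most votes becomes the leader only if its vote count is at least voters/2 + 1 (a strict majority of all known Sentinels, including the current one) and also at least the configured quorum: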
//sentinel.c#sentinelGetLeader
voters_quorum = voters/2+1;
if (winner && (max_votes < voters_quorum || max_votes < master->quorum))
winner = NULL;
winner = winner ? sdsnew(winner) : NULL;
return winner;
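The three steps of sentinelGetLeader can be simulated without Redis' dict type. A sketch under assumptions: get_leader, struct tally and the fixed MAX_CANDIDATES limit are hypothetical, the input is assumed to contain only votes from the current epoch, and ties go to the first candidate seen (in Redis they depend on dict iteration order).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CANDIDATES 16

/* Vote tally per candidate runid (hypothetical stand-in for the counters dict). */
struct tally {
    const char *runid[MAX_CANDIDATES];
    unsigned votes[MAX_CANDIDATES];
    size_t n;
};

static unsigned tally_incr(struct tally *t, const char *runid) {
    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->runid[i], runid) == 0) return ++t->votes[i];
    assert(t->n < MAX_CANDIDATES);
    t->runid[t->n] = runid;
    t->votes[t->n] = 1;
    return t->votes[t->n++];
}

/* peer_votes[i]: runid the i-th other Sentinel voted for (NULL if none).
 * my_prior_vote: our own vote if already cast in this epoch, else NULL. */
const char *get_leader(const char **peer_votes, size_t npeers,
                       const char *myid, const char *my_prior_vote,
                       unsigned quorum) {
    struct tally t = {0};
    const char *winner = NULL;
    unsigned max_votes = 0;

    /* Step 1: count the other Sentinels' votes, find the front runner. */
    for (size_t i = 0; i < npeers; i++)
        if (peer_votes[i]) tally_incr(&t, peer_votes[i]);
    for (size_t i = 0; i < t.n; i++)
        if (t.votes[i] > max_votes) { max_votes = t.votes[i]; winner = t.runid[i]; }

    /* Step 2: our vote (one per epoch): prior vote, else winner, else ourselves. */
    const char *myvote = my_prior_vote ? my_prior_vote : (winner ? winner : myid);
    unsigned v = tally_incr(&t, myvote);
    if (v > max_votes) { max_votes = v; winner = myvote; }

    /* Step 3: require a strict majority of voters and the configured quorum. */
    unsigned voters = (unsigned)npeers + 1;
    if (winner && (max_votes < voters / 2 + 1 || max_votes < quorum)) winner = NULL;
    return winner;
}
```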
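Back in sentinelFailoverWaitStart, the Sentinel checks whether it is the leader. If it is not, it checks whether the election has timed out: the timeout is the smaller of election_timeout (10s by default) and failover_timeout (3 minutes by default), and once it expires the failover is aborted.
If the current Sentinel is the leader, it moves to the next state: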
//sentinel.c#sentinelFailoverWaitStart
if (!isleader && !(ri->flags & SRI_FORCE_FAILOVER)) {
int election_timeout = SENTINEL_ELECTION_TIMEOUT;
/* The election timeout is the MIN between SENTINEL_ELECTION_TIMEOUT
* and the configured failover timeout. */
if (election_timeout > ri->failover_timeout)
election_timeout = ri->failover_timeout;
/* Abort the failover if I'm not the leader after some time. */
if (mstime() - ri->failover_start_time > election_timeout)
{
sentinelEvent(LL_WARNING,"-failover-abort-not-elected",ri,"%@");
sentinelAbortFailover(ri);
}
return;
}
sentinelEvent(LL_WARNING,"+elected-leader",ri,"%@");
if (sentinel.simfailure_flags & SENTINEL_SIMFAILURE_CRASH_AFTER_ELECTION)
sentinelSimFailureCrash();
ri->failover_state = SENTINEL_FAILOVER_STATE_SELECT_SLAVE;
ri->failover_state_change_time = mstime();
sentinelEvent(LL_WARNING,"+failover-state-select-slave",ri,"%@");
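3.3 Selecting a Slave
Next, sentinelFailoverSelectSlave runs and calls sentinelSelectSlave to pick a suitable slave.
max_master_down_time is 10 times down_after_period (down-after-milliseconds, 30s by default), plus, if the master is subjectively down, the time it has been in that state. A slave is skipped if any of the following holds:
1) It is subjectively or objectively down.
2) Its link is disconnected.
3) Its last_avail_time is more than 5 PING periods (5s) ago, that is, it has not replied validly to PING in 5s.
4) Its priority is 0.
5) Its last INFO reply is too old: older than 5s if the master is subjectively down, otherwise older than 30s (3 INFO periods).
6) Its replication link to the master has been down for longer than max_master_down_time (by default more than 300s, i.e. 10 times 30s), so its data is likely too stale.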
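The remaining slaves are then sorted with compareSlavesForPromotion and the first one is selected. The order is:
1) Lower slave_priority first.
2) Larger replication offset (slave_repl_offset) first.
3) Lexicographically smaller runid first.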
//sentinel.c#sentinelSelectSlave
sentinelRedisInstance *sentinelSelectSlave(sentinelRedisInstance *master) {
sentinelRedisInstance **instance =
zmalloc(sizeof(instance[0])*dictSize(master->slaves));
sentinelRedisInstance *selected = NULL;
int instances = 0;
dictIterator *di;
dictEntry *de;
mstime_t max_master_down_time = 0;
if (master->flags & SRI_S_DOWN)
max_master_down_time += mstime() - master->s_down_since_time;
max_master_down_time += master->down_after_period * 10;
di = dictGetIterator(master->slaves);
while((de = dictNext(di)) != NULL) {
sentinelRedisInstance *slave = dictGetVal(de);
mstime_t info_validity_time;
if (slave->flags & (SRI_S_DOWN|SRI_O_DOWN)) continue;
if (slave->link->disconnected) continue;
if (mstime() - slave->link->last_avail_time > SENTINEL_PING_PERIOD*5) continue;
if (slave->slave_priority == 0) continue;
/* If the master is in SDOWN state we get INFO for slaves every second.
* Otherwise we get it with the usual period so we need to account for
* a larger delay. */
if (master->flags & SRI_S_DOWN)
info_validity_time = SENTINEL_PING_PERIOD*5;
else
info_validity_time = SENTINEL_INFO_PERIOD*3;
if (mstime() - slave->info_refresh > info_validity_time) continue;
if (slave->master_link_down_time > max_master_down_time) continue;
instance[instances++] = slave;
}
dictReleaseIterator(di);
if (instances) {
qsort(instance,instances,sizeof(sentinelRedisInstance*),
compareSlavesForPromotion);
selected = instance[0];
}
zfree(instance);
return selected;
}
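A self-contained sketch of the filter-then-sort selection. struct slave, select_slave and compare_for_promotion are hypothetical; master_sdown_since == 0 stands for "master not subjectively down", and strcmp replaces the case-insensitive strcasecmp that Redis uses on runids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef long long mstime_t;
#define PING_PERIOD 1000
#define INFO_PERIOD 10000

/* Hypothetical flattened slave record with the fields the selection reads. */
struct slave {
    const char *runid;              /* may be NULL on very old Redis versions */
    bool down;                      /* SDOWN or ODOWN */
    bool disconnected;
    mstime_t last_avail_time;
    int priority;                   /* 0 = never promote */
    mstime_t info_refresh;
    mstime_t master_link_down_time;
    long long repl_offset;
};

static int compare_for_promotion(const void *a, const void *b) {
    const struct slave *sa = *(const struct slave *const *)a;
    const struct slave *sb = *(const struct slave *const *)b;
    if (sa->priority != sb->priority)            /* lower priority first */
        return sa->priority < sb->priority ? -1 : 1;
    if (sa->repl_offset != sb->repl_offset)      /* larger offset first */
        return sa->repl_offset > sb->repl_offset ? -1 : 1;
    if (!sa->runid || !sb->runid)                /* NULL runid sorts last */
        return (sa->runid == NULL) - (sb->runid == NULL);
    return strcmp(sa->runid, sb->runid);
}

struct slave *select_slave(struct slave *slaves, size_t n, mstime_t now,
                           mstime_t down_after_period, mstime_t master_sdown_since) {
    assert(slaves != NULL || n == 0);
    struct slave **cand = malloc(n * sizeof *cand);
    struct slave *selected = NULL;
    size_t count = 0;
    if (n > 0 && cand == NULL) return NULL;

    mstime_t max_master_down_time = down_after_period * 10;
    if (master_sdown_since) max_master_down_time += now - master_sdown_since;
    mstime_t info_validity = master_sdown_since ? PING_PERIOD * 5 : INFO_PERIOD * 3;

    for (size_t i = 0; i < n; i++) {
        struct slave *s = &slaves[i];
        if (s->down || s->disconnected) continue;
        if (now - s->last_avail_time > PING_PERIOD * 5) continue;
        if (s->priority == 0) continue;
        if (now - s->info_refresh > info_validity) continue;
        if (s->master_link_down_time > max_master_down_time) continue;
        cand[count++] = s;
    }
    if (count) {
        qsort(cand, count, sizeof *cand, compare_for_promotion);
        selected = cand[0];
    }
    free(cand);
    return selected;
}
```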
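Back in sentinelFailoverSelectSlave, if no slave qualifies, the failover is aborted.
Otherwise, the selected slave is flagged with SRI_PROMOTED, recorded as the promoted slave, and the failover state advances: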
//sentinel.c#sentinelFailoverSelectSlave
sentinelRedisInstance *slave = sentinelSelectSlave(ri);
if (slave == NULL) {
sentinelEvent(LL_WARNING,"-failover-abort-no-good-slave",ri,"%@");
sentinelAbortFailover(ri);
} else {
sentinelEvent(LL_WARNING,"+selected-slave",slave,"%@");
slave->flags |= SRI_PROMOTED;
ri->promoted_slave = slave;
ri->failover_state = SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE;
ri->failover_state_change_time = mstime();
sentinelEvent(LL_NOTICE,"+failover-state-send-slaveof-noone",
slave, "%@");
}
3.4 Promoting the Slave
To promote the selected slave to master, the Sentinel sends it the following commands:
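MULTI //start a transaction
SLAVEOF NO ONE //stop replication and turn the slave into a master
CONFIG REWRITE //rewrite the server's redis.conf to match its running configuration
CLIENT KILL TYPE normal //disconnect the normal clients connected to this server
EXEC //execute the transaction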
sentinelFailoverSendSlaveOfNoOne calls sentinelSendSlaveOf, shown below, which wraps and sends these commands. Afterwards, sentinelFailoverSendSlaveOfNoOne sets the failover state to SENTINEL_FAILOVER_STATE_WAIT_PROMOTION.
//sentinel.c#sentinelSendSlaveOf
int sentinelSendSlaveOf(sentinelRedisInstance *ri, char *host, int port) {
char portstr[32];
int retval;
ll2string(portstr,sizeof(portstr),port);
/* If host is NULL we send SLAVEOF NO ONE that will turn the instance
* into a master. */
if (host == NULL) {
host = "NO";
memcpy(portstr,"ONE",4);
}
retval = redisAsyncCommand(ri->link->cc,
sentinelDiscardReplyCallback, ri, "%s",
sentinelInstanceMapCommand(ri,"MULTI"));
if (retval == C_ERR) return retval;
ri->link->pending_commands++;
retval = redisAsyncCommand(ri->link->cc,
sentinelDiscardReplyCallback, ri, "%s %s %s",
sentinelInstanceMapCommand(ri,"SLAVEOF"),
host, portstr);
if (retval == C_ERR) return retval;
ri->link->pending_commands++;
retval = redisAsyncCommand(ri->link->cc,
sentinelDiscardReplyCallback, ri, "%s REWRITE",
sentinelInstanceMapCommand(ri,"CONFIG"));
if (retval == C_ERR) return retval;
ri->link->pending_commands++;
retval = redisAsyncCommand(ri->link->cc,
sentinelDiscardReplyCallback, ri, "%s KILL TYPE normal",
sentinelInstanceMapCommand(ri,"CLIENT"));
if (retval == C_ERR) return retval;
ri->link->pending_commands++;
retval = redisAsyncCommand(ri->link->cc,
sentinelDiscardReplyCallback, ri, "%s",
sentinelInstanceMapCommand(ri,"EXEC"));
if (retval == C_ERR) return retval;
ri->link->pending_commands++;
return C_OK;
}
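The command sequence produced by sentinelSendSlaveOf, sketched as plain strings. build_slaveof_txn is a hypothetical helper; it ignores command renaming through sentinelInstanceMapCommand and does not talk to a server.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NCMDS 5

/* Fill out[] with the five commands of the transaction.
 * host == NULL means SLAVEOF NO ONE (promote to master). */
void build_slaveof_txn(char out[NCMDS][64], const char *host, int port) {
    char portstr[32];
    assert(out != NULL);
    snprintf(portstr, sizeof portstr, "%d", port);
    if (host == NULL) {          /* same trick as the original code */
        host = "NO";
        strcpy(portstr, "ONE");
    }
    snprintf(out[0], 64, "MULTI");
    snprintf(out[1], 64, "SLAVEOF %s %s", host, portstr);
    snprintf(out[2], 64, "CONFIG REWRITE");
    snprintf(out[3], 64, "CLIENT KILL TYPE normal");
    snprintf(out[4], 64, "EXEC");
}
```

The same helper covers both uses in this section: promotion (host == NULL) and, in 3.6, pointing the remaining slaves at the promoted one.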
3.5 Waiting for the Promotion
sentinelFailoverWaitPromotion only checks whether the current failover step has exceeded the configured failover timeout:
void sentinelFailoverWaitPromotion(sentinelRedisInstance *ri) {
/* Just handle the timeout. Switching to the next state is handled
* by the function parsing the INFO command of the promoted slave. */
if (mstime() - ri->failover_state_change_time > ri->failover_timeout) {
sentinelEvent(LL_WARNING,"-failover-abort-slave-timeout",ri,"%@");
sentinelAbortFailover(ri);
}
}
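The actual state transition happens when the promoted slave's INFO reply is processed in sentinelRefreshInstanceInfo: once the promoted slave has become a master, the role it reports changes, and the failover state is advanced there: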
//sentinel.c#sentinelRefreshInstanceInfo
if ((ri->flags & SRI_SLAVE) && role == SRI_MASTER) {
/* If this is a promoted slave we can change state to the
* failover state machine. */
if ((ri->flags & SRI_PROMOTED) &&
(ri->master->flags & SRI_FAILOVER_IN_PROGRESS) &&
(ri->master->failover_state ==
SENTINEL_FAILOVER_STATE_WAIT_PROMOTION)) {
ri->master->config_epoch = ri->master->failover_epoch;
ri->master->failover_state = SENTINEL_FAILOVER_STATE_RECONF_SLAVES;
ri->master->failover_state_change_time = mstime();
sentinelFlushConfig();
sentinelEvent(LL_WARNING,"+promoted-slave",ri,"%@");
if (sentinel.simfailure_flags &
SENTINEL_SIMFAILURE_CRASH_AFTER_PROMOTION)
sentinelSimFailureCrash();
sentinelEvent(LL_WARNING,"+failover-state-reconf-slaves",
ri->master,"%@");
sentinelCallClientReconfScript(ri->master,SENTINEL_LEADER,
"start",ri->master->addr,ri->addr);
sentinelForceHelloUpdateForMaster(ri->master);
}
}
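3.6 Reconfiguring the Other Slaves
sentinelFailoverReconfNextSlave sends the following commands to the other slaves through sentinelSendSlaveOf (at most parallel-syncs slaves are reconfigured at the same time):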
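MULTI //start a transaction
SLAVEOF <promoted_slave->addr->ip> <promoted_slave->addr->port> //replicate from the new master
CONFIG REWRITE //rewrite the server's redis.conf to match its running configuration
CLIENT KILL TYPE normal //disconnect the normal clients connected to this server
EXEC //execute the transaction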
Finally, sentinelFailoverDetectEnd() checks whether all the slaves have been reconfigured; if so, the failover state becomes SENTINEL_FAILOVER_STATE_UPDATE_CONFIG.
3.7 Switching to the Promoted Slave
When the state is SENTINEL_FAILOVER_STATE_UPDATE_CONFIG, sentinelFailoverSwitchToPromotedSlave is called:
//sentinel.c#sentinelHandleDictOfRedisInstances
...
while((de = dictNext(di)) != NULL) {
sentinelRedisInstance *ri = dictGetVal(de);
sentinelHandleRedisInstance(ri);
if (ri->flags & SRI_MASTER) {
sentinelHandleDictOfRedisInstances(ri->slaves);
sentinelHandleDictOfRedisInstances(ri->sentinels);
if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
switch_to_promoted = ri;
}
}
}
if (switch_to_promoted)
sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);
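sentinelResetMasterAndChangeAddress resets the old master's connections, and sentinelResetMaster sets the failover state back to SENTINEL_FAILOVER_STATE_NONE. The master's address is then replaced with the promoted slave's, and the other slaves, plus the old master, are added back as slaves. All connections are re-established in the next sentinelTimer call.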
//sentinel.c#sentinelResetMasterAndChangeAddress
sentinelResetMaster(master,SENTINEL_RESET_NO_SENTINELS);
oldaddr = master->addr;
master->addr = newaddr;
master->o_down_since_time = 0;
master->s_down_since_time = 0;
/* Add slaves back. */
for (j = 0; j < numslaves; j++)
{
sentinelRedisInstance *slave;
slave = createSentinelRedisInstance(NULL,SRI_SLAVE,slaves[j]->ip,
slaves[j]->port, master->quorum, master);
releaseSentinelAddr(slaves[j]);
if (slave) sentinelEvent(LL_NOTICE,"+slave",slave,"%@");
}
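The way the slave list is rebuilt can be sketched as follows; struct addr and rebuild_slave_list are hypothetical names. The promoted slave is dropped from the list, and the old master is appended so that Sentinel keeps watching it and can later reconfigure it as a slave, as described in 4.1 below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct addr { const char *ip; int port; };

static bool addr_equal(struct addr a, struct addr b) {
    return a.port == b.port && strcmp(a.ip, b.ip) == 0;
}

/* Writes the new slave list into out (capacity nslaves + 1), returns its length. */
size_t rebuild_slave_list(const struct addr *old_slaves, size_t nslaves,
                          struct addr old_master, struct addr new_master,
                          struct addr *out) {
    size_t n = 0;
    assert(out != NULL);
    for (size_t i = 0; i < nslaves; i++) {
        if (addr_equal(old_slaves[i], new_master)) continue; /* the promoted one */
        out[n++] = old_slaves[i];
    }
    /* Keep the old master as a slave so it can be reconfigured when it returns. */
    if (!addr_equal(old_master, new_master)) out[n++] = old_master;
    return n;
}
```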
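4. Questions
4.1 When the old master restarts, how does it learn that it is no longer the master?
After the old master comes back, Sentinel connects to it again and periodically sends INFO. Sentinel has it recorded as a slave, but the INFO reply says it is a master; in that case Sentinel reconfigures the old master to replicate from the new master. The main logic is: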
//sentinel.c#sentinelRefreshInstanceInfo
if ((ri->flags & SRI_SLAVE) && role == SRI_MASTER) {
/* If this is a promoted slave we can change state to the
* failover state machine. */
if ((ri->flags & SRI_PROMOTED) &&
(ri->master->flags & SRI_FAILOVER_IN_PROGRESS) &&
(ri->master->failover_state ==
SENTINEL_FAILOVER_STATE_WAIT_PROMOTION)) {
//promoted-slave handling, omitted
...
} else {
/* A slave turned into a master. We want to force our view and
* reconfigure as slave. Wait some time after the change before
* going forward, to receive new configs if any. */
mstime_t wait_time = SENTINEL_PUBLISH_PERIOD*4;
if (!(ri->flags & SRI_PROMOTED) &&
sentinelMasterLooksSane(ri->master) &&
sentinelRedisInstanceNoDownFor(ri,wait_time) &&
mstime() - ri->role_reported_time > wait_time) {
int retval = sentinelSendSlaveOf(ri,
ri->master->addr->ip,
ri->master->addr->port);
if (retval == C_OK)
sentinelEvent(LL_NOTICE,"+convert-to-slave",ri,"%@");
}
}
}
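4.2 How do the other Sentinels learn the new master's address and start monitoring it?
Sentinels subscribe to the __sentinel__:hello channel on masters and slaves and publish a hello message every 2s, which includes the master's information. Once the master has changed, the other Sentinels learn about it through the Pub/Sub channel of the master or the slaves. sentinelProcessHelloMessage handles the master switch when the received configuration epoch is newer, and then starts monitoring the new master: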
//sentinel.c#sentinelProcessHelloMessage
/* Update master info if received configuration is newer. */
if (si && master->config_epoch < master_config_epoch) {
master->config_epoch = master_config_epoch;
if (master_port != master->addr->port ||
strcmp(master->addr->ip, token[5])) {
sentinelAddr *old_addr;
sentinelEvent(LL_WARNING,"+config-update-from",si,"%@");
sentinelEvent(LL_WARNING,"+switch-master",
master,"%s %s %d %s %d",
master->name,
master->addr->ip, master->addr->port,
token[5], master_port);
old_addr = dupSentinelAddr(master->addr);
sentinelResetMasterAndChangeAddress(master, token[5], master_port);
sentinelCallClientReconfScript(master,
SENTINEL_OBSERVER,"start",
old_addr,master->addr);
releaseSentinelAddr(old_addr);
}
}
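The epoch rule, sketched as a function. struct master_cfg and apply_hello_master are hypothetical; the real code only applies the update when the hello comes from a known Sentinel (si != NULL), and it switches the address through sentinelResetMasterAndChangeAddress rather than a plain copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical local record of a monitored master. */
struct master_cfg {
    unsigned long long config_epoch;
    char ip[64];
    int port;
};

/* Apply the master part of a hello message.
 * Returns true when the address was switched (the +switch-master case). */
bool apply_hello_master(struct master_cfg *m, unsigned long long epoch,
                        const char *ip, int port) {
    assert(m != NULL && ip != NULL);
    if (m->config_epoch >= epoch) return false;   /* not newer: ignore */
    m->config_epoch = epoch;
    if (port == m->port && strcmp(m->ip, ip) == 0) return false;
    snprintf(m->ip, sizeof m->ip, "%s", ip);
    m->port = port;
    return true;
}
```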
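5. References
1) Redis 5 Design and Source Code Analysis (《Redis5 设计与源码分析》)
2) The Design and Implementation of Redis (《Redis设计与实现》)
3) Redis Sentinel leader election: principles and source code analysis, in light of the Raft algorithm
4) Redis source code, 5.0 branch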