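Redis Source Code Analysis: Sentinel

1. Starting Sentinel

Sentinel can be started with redis-server in sentinel mode, as shown below, or with the redis-sentinel executable. A configuration file is mandatory: Sentinel persists its state in that file and refuses to start if no configuration file is given or the file is not writable.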

redis-server /path/to/sentinel.conf --sentinel
redis-sentinel /path/to/sentinel.conf

In main, the Sentinel-specific startup steps are:

//server.c#main
int main(int argc, char **argv) {
 ...
 //check whether the server was started in sentinel mode
 server.sentinel_mode = checkForSentinelMode(argc,argv);
 ...
 if (server.sentinel_mode) {
    //sets server.port to 26379, the default Sentinel port
    initSentinelConfig();
    //mainly replaces the command table and initializes Sentinel state
    initSentinel();
 }
 ...
 //in sentinel mode, sentinel directives are parsed by sentinelHandleConfiguration
 loadServerConfig(configfile,options);
 ...
 //generates a random 40-character hex sentinel id if none is configured, and logs startup
 sentinelIsRunning();
}

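initSentinel replaces the command table, so a sentinel only accepts the commands listed below. PUBLISH is handled by sentinelPublishCommand, which accepts messages only for the __sentinel__:hello channel and processes them exactly like hello messages received through a subscription to that channel.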

//sentinel.c
struct redisCommand sentinelcmds[] = {
    {"ping",pingCommand,1,"",0,NULL,0,0,0,0,0},
    {"sentinel",sentinelCommand,-2,"",0,NULL,0,0,0,0,0},
    {"subscribe",subscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"unsubscribe",unsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"psubscribe",psubscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"punsubscribe",punsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"publish",sentinelPublishCommand,3,"",0,NULL,0,0,0,0,0},
    {"info",sentinelInfoCommand,-1,"",0,NULL,0,0,0,0,0},
    {"role",sentinelRoleCommand,1,"l",0,NULL,0,0,0,0,0},
    {"client",clientCommand,-2,"rs",0,NULL,0,0,0,0,0},
    {"shutdown",shutdownCommand,-1,"",0,NULL,0,0,0,0,0},
    {"auth",authCommand,2,"sltF",0,NULL,0,0,0,0,0}
};

sentinelHandleConfiguration parses each monitor directive in the configuration and adds the master to sentinel.masters:

//sentinel.c#sentinelHandleConfiguration
if (!strcasecmp(argv[0],"monitor") && argc == 5) {
	/* monitor <name> <host> <port> <quorum> */
	int quorum = atoi(argv[4]);
	if (quorum <= 0) return "Quorum must be 1 or greater.";
	if (createSentinelRedisInstance(argv[1],SRI_MASTER,argv[2],
	                                        atoi(argv[3]),quorum,NULL) == NULL) {
		......
	}
}

main only performs initialization. The actual monitoring of the masters is driven by the periodic timer:

//server.c#serverCron
if (server.sentinel_mode) sentinelTimer();

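sentinelTimer mainly does the following:

1) Establishes a command connection and a pub/sub (message) connection to each instance. Once the pub/sub connection is up, it subscribes to the instance's __sentinel__:hello channel.

2) Sends commands over the command connection: INFO every 10s to collect information, including the list of slaves; PING every 1s to check liveness; and, every 2s, a hello message published in the following format (master_epoch is the master's config epoch):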

sentinel_ip,sentinel_port,sentinel_runid,current_epoch,master_name,master_ip,master_port,master_epoch

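3) Checks whether each instance is subjectively down.

4) Checks whether a master is objectively down and, if so, performs a failover.

How does a sentinel learn what the other sentinels think about the state of a master?

Over the command connection, it sends the following command to every other sentinel monitoring the same master: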

SENTINEL is-master-down-by-addr master_ip master_port current_epoch sentinel_runid

The last argument is the sender's runid (sentinel_runid) when it is asking for a vote, and * otherwise.

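2. Main flows

The following subsections walk through each of the flows above.

2.1 Establishing connections

Connections are created in sentinelReconnectInstance. A command connection, stored in link->cc, is created for masters, slaves and sentinels alike. A pub/sub (message) connection, stored in link->pc, is created only for masters and slaves, not for sentinels. On the pub/sub connection the sentinel subscribes to the __sentinel__:hello channel, with sentinelReceiveHelloMessages as the reply callback.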

//sentinel.c#sentinelReconnectInstance
//create the command connection
if (link->cc == NULL) {
    link->cc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
    ...
}
//create the pub/sub (message) connection
/* Pub / Sub */
if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && link->pc == NULL) {
    link->pc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
    ...
     retval = redisAsyncCommand(link->pc,
                sentinelReceiveHelloMessages, ri, "%s %s",
                sentinelInstanceMapCommand(ri,"SUBSCRIBE"),
                SENTINEL_HELLO_CHANNEL);
}

sentinelReceiveHelloMessages calls sentinelProcessHelloMessage, which discovers other sentinels from the hello messages and adds any new one to the master's sentinels table:

//sentinel.c#sentinelProcessHelloMessage
//token[2] is sentinel_runid, token[0] is sentinel_ip
si = createSentinelRedisInstance(token[2],SRI_SENTINEL,
                            token[0],port,master->quorum,master);
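The hello payload in the format shown earlier is split on commas into exactly eight tokens; in Redis 5.0, sentinelProcessHelloMessage uses sdssplitlen and ignores messages that do not yield 8 tokens. The sketch below does the same with plain C strings, as an illustration only: hello_t, parse_hello and the buffer sizes are hypothetical, and ports are parsed with atoi as the Redis code does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the 8 comma-separated hello fields. */
typedef struct {
    char sentinel_ip[64];
    int sentinel_port;
    char sentinel_runid[41];              /* 40 hex chars + NUL */
    unsigned long long current_epoch;
    char master_name[64];
    char master_ip[64];
    int master_port;
    unsigned long long master_config_epoch;
} hello_t;

/* Splits payload on ',' and fills *h.
 * Returns 0 on success, -1 if the payload is too long or does not have
 * exactly 8 fields (Sentinel ignores such messages). */
int parse_hello(const char *payload, hello_t *h) {
    char buf[512];
    char *tok[8];
    int n = 0;
    size_t len = strlen(payload);

    if (len >= sizeof(buf)) return -1;
    memcpy(buf, payload, len + 1);        /* writable copy, including NUL */

    tok[n++] = buf;
    for (char *p = buf; *p != '\0'; p++) {
        if (*p != ',') continue;
        if (n == 8) return -1;            /* a 9th field would start here */
        *p = '\0';
        tok[n++] = p + 1;
    }
    if (n != 8) return -1;

    snprintf(h->sentinel_ip, sizeof(h->sentinel_ip), "%s", tok[0]);
    h->sentinel_port = atoi(tok[1]);
    snprintf(h->sentinel_runid, sizeof(h->sentinel_runid), "%s", tok[2]);
    h->current_epoch = strtoull(tok[3], NULL, 10);
    snprintf(h->master_name, sizeof(h->master_name), "%s", tok[4]);
    snprintf(h->master_ip, sizeof(h->master_ip), "%s", tok[5]);
    h->master_port = atoi(tok[6]);
    h->master_config_epoch = strtoull(tok[7], NULL, 10);
    return 0;
}
```

The token indices match the ones used in sentinelProcessHelloMessage: token[0] and token[2] identify the sender, and token[5] and token[6] carry the master's address.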

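2.2 Sending periodic commands

INFO is sent to masters and slaves (never to other sentinels), every 10s by default. The period is held in info_period; it drops to 1s when the instance is a slave whose master is ODOWN or has a failover in progress, or a slave reporting that its link to the master is down. INFO is sent asynchronously, and the reply is handled by sentinelInfoReplyCallback.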

//sentinel.c#sentinelSendPeriodicCommands
/* Send INFO to masters and slaves, not sentinels. */
    if ((ri->flags & SRI_SENTINEL) == 0 &&
        (ri->info_refresh == 0 ||
        (now - ri->info_refresh) > info_period))
    {
        retval = redisAsyncCommand(ri->link->cc,
            sentinelInfoReplyCallback, ri, "%s",
            sentinelInstanceMapCommand(ri,"INFO"));
        if (retval == C_OK) ri->link->pending_commands++;
    }

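sentinelInfoReplyCallback calls sentinelRefreshInstanceInfo, which parses the INFO output. Its most important job is discovering slaves from a master's INFO reply: for each listed slave, Sentinel first looks for an existing instance with the same ip and port, and creates a new one if there is none. The connections to a newly created slave are then opened by sentinelReconnectInstance on a later timer run.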

//sentinel.c#sentinelRefreshInstanceInfo
/* Check if we already have this slave into our table,
             * otherwise add it. */
if (sentinelRedisInstanceLookupSlave(ri,ip,atoi(port)) == NULL) {
	if ((slave = createSentinelRedisInstance(NULL,SRI_SLAVE,ip,
	                            atoi(port), ri->quorum, ri)) != NULL) {
		sentinelEvent(LL_NOTICE,"+slave",slave,"%@");
		sentinelFlushConfig();
	}
}

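createSentinelRedisInstance creates a sentinelRedisInstance. It is a general-purpose function, and the table the new instance is stored in depends on flags. Slaves discovered through INFO are created with SRI_SLAVE, so they are stored in the master instance's slaves table, keyed by their address.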

//sentinel.c#createSentinelRedisInstance
if (flags & SRI_MASTER) table = sentinel.masters;
else if (flags & SRI_SLAVE) table = master->slaves;
else if (flags & SRI_SENTINEL) table = master->sentinels;
...
dictAdd(table, ri->name, ri);

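PING is sent to masters, slaves and sentinels every 1s by default (the period is down_after_period, capped at 1s). sentinelSendPing sends the PING and records the related timestamps: last_ping_time always, and act_ping_time if no earlier PING is still awaiting a reply.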

//sentinel.c#sentinelSendPeriodicCommands
if ((now - ri->link->last_pong_time) > ping_period &&
               (now - ri->link->last_ping_time) > ping_period/2) {
        sentinelSendPing(ri);
    }

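Every 2s by default, a hello message is published to the __sentinel__:hello channel: sentinelSendHello builds the message in the format above and sends it as a PUBLISH. Masters and slaves forward it to their subscribers (the other sentinels). A sentinel receiving the PUBLISH processes the message directly in sentinelPublishCommand, with the same effect as receiving it through a subscription.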

//sentinel.c#sentinelSendPeriodicCommands
if ((now - ri->last_pub_time) > SENTINEL_PUBLISH_PERIOD) {
        sentinelSendHello(ri);
    }
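The three timing rules in sentinelSendPeriodicCommands can be modeled as pure functions of the current time. This is a simplified sketch, not the Redis code: instance_t, info_due, ping_due and hello_due are hypothetical names, the fast_info flag stands in for the 1s INFO condition (a slave whose master is ODOWN or failing over, or whose link to its master is down), and the real function's early returns for disconnected links and too many pending commands are left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t mstime_t;

/* Periods in milliseconds, matching SENTINEL_INFO_PERIOD,
 * SENTINEL_PING_PERIOD and SENTINEL_PUBLISH_PERIOD in Redis 5.0. */
#define INFO_PERIOD    10000
#define PING_PERIOD    1000
#define PUBLISH_PERIOD 2000

/* Hypothetical, simplified view of one monitored instance. */
typedef struct {
    bool is_sentinel;            /* INFO is never sent to other sentinels */
    bool fast_info;              /* stands in for the 1s INFO condition */
    mstime_t down_after_period;
    mstime_t info_refresh;       /* 0 means no INFO reply yet */
    mstime_t last_ping_time;
    mstime_t last_pong_time;
    mstime_t last_pub_time;
} instance_t;

bool info_due(const instance_t *ri, mstime_t now) {
    mstime_t info_period = ri->fast_info ? 1000 : INFO_PERIOD;
    return !ri->is_sentinel &&
           (ri->info_refresh == 0 || now - ri->info_refresh > info_period);
}

bool ping_due(const instance_t *ri, mstime_t now) {
    /* The PING period is down_after_period, capped at 1s. */
    mstime_t ping_period = ri->down_after_period;
    if (ping_period > PING_PERIOD) ping_period = PING_PERIOD;
    /* No pong for a full period, and no PING sent in the last half period. */
    return now - ri->last_pong_time > ping_period &&
           now - ri->last_ping_time > ping_period / 2;
}

bool hello_due(const instance_t *ri, mstime_t now) {
    return now - ri->last_pub_time > PUBLISH_PERIOD;
}
```

The half-period condition in ping_due keeps a sentinel from flooding an unresponsive instance with PINGs while still retrying roughly twice per period.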

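2.3 Subjective down

Subjective-down (SDOWN) detection applies to masters, slaves and sentinels.

Condition 1: elapsed is the time since the oldest PING still awaiting a reply was sent (act_ping_time); if no PING is pending but the link is disconnected, it is the time since the instance was last available (last_avail_time). ri->down_after_period comes from the down-after-milliseconds setting and defaults to 30s, so by default a failed server is detected after about 30s.

Condition 2: Sentinel considers the instance a master (SRI_MASTER), but the instance's own INFO reply reports its role as slave (ri->role_reported == SRI_SLAVE), and it has kept reporting that for longer than ri->down_after_period plus two INFO periods (30s + 2 × 10s = 50s by default). The source comment describes this as an instance "we believe is a master" that "reports to be a slave for enough time"; the two extra INFO periods leave room for at least two INFO reports confirming the mismatch.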

//sentinel.c#sentinelCheckSubjectivelyDown
void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
	mstime_t elapsed = 0;
	if (ri->link->act_ping_time)
	        elapsed = mstime() - ri->link->act_ping_time; 
    else if (ri->link->disconnected)
	        elapsed = mstime() - ri->link->last_avail_time;
	
    ...
    if (elapsed > ri->down_after_period ||
	        (ri->flags & SRI_MASTER &&
	         ri->role_reported == SRI_SLAVE &&
	         mstime() - ri->role_reported_time >
	          (ri->down_after_period+SENTINEL_INFO_PERIOD*2))) {
		/* Is subjectively down */
		if ((ri->flags & SRI_S_DOWN) == 0) {
			sentinelEvent(LL_WARNING,"+sdown",ri,"%@");
			ri->s_down_since_time = mstime();
			ri->flags |= SRI_S_DOWN;
		}
	}
}
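The two SDOWN conditions can be restated as a pure predicate over a snapshot of the fields sentinelCheckSubjectivelyDown reads. This is a hedged sketch: sdown_input_t and is_subjectively_down are hypothetical, and the side effects (emitting +sdown, setting SRI_S_DOWN and s_down_since_time, and clearing the flag again) are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t mstime_t;

#define INFO_PERIOD 10000   /* SENTINEL_INFO_PERIOD in Redis 5.0 */

/* Hypothetical snapshot of the fields the SDOWN check reads. */
typedef struct {
    mstime_t act_ping_time;      /* send time of oldest unanswered PING, 0 if none */
    bool disconnected;
    mstime_t last_avail_time;
    mstime_t down_after_period;
    bool we_think_master;        /* ri->flags & SRI_MASTER */
    bool reports_slave;          /* ri->role_reported == SRI_SLAVE */
    mstime_t role_reported_time;
} sdown_input_t;

bool is_subjectively_down(const sdown_input_t *in, mstime_t now) {
    mstime_t elapsed = 0;
    if (in->act_ping_time)
        elapsed = now - in->act_ping_time;
    else if (in->disconnected)
        elapsed = now - in->last_avail_time;

    /* Condition 1: no valid reply for longer than down-after-milliseconds. */
    if (elapsed > in->down_after_period) return true;

    /* Condition 2: a "master" that has reported itself as a slave for too long. */
    return in->we_think_master && in->reports_slave &&
           now - in->role_reported_time > in->down_after_period + INFO_PERIOD * 2;
}
```

With down_after_period = 30000 ms, a PING sent at t = 1000 ms and still unanswered makes the instance SDOWN only from t = 31001 ms on, because the comparison is strict.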

2.4 Objective down

Objective-down (ODOWN) detection applies only to masters.

//sentinel.c#sentinelHandleRedisInstance
if (ri->flags & SRI_MASTER) {
        sentinelCheckObjectivelyDown(ri);
        if (sentinelStartFailoverIfNeeded(ri))
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        sentinelFailoverStateMachine(ri);
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
 }

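Objective-down detection is done in sentinelCheckObjectivelyDown. While this sentinel sees the master as SDOWN, it counts itself plus every other sentinel whose flags contain SRI_MASTER_DOWN (that is, which replied that it also sees the master as down). When the count reaches the configured quorum (count >= quorum), the master is considered objectively down and SRI_O_DOWN is set.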

//sentinel.c#sentinelCheckObjectivelyDown
void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
    dictIterator *di;
    dictEntry *de;
    unsigned int quorum = 0, odown = 0;

    if (master->flags & SRI_S_DOWN) {
        /* Is down for enough sentinels? */
        quorum = 1; /* the current sentinel. */
        /* Count all the other sentinels. */
        di = dictGetIterator(master->sentinels);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);

            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);
        if (quorum >= master->quorum) odown = 1;
    }

    /* Set the flag accordingly to the outcome. */
    if (odown) {
        if ((master->flags & SRI_O_DOWN) == 0) {
            sentinelEvent(LL_WARNING,"+odown",master,"%@ #quorum %d/%d",
                quorum, master->quorum);
            master->flags |= SRI_O_DOWN;
            master->o_down_since_time = mstime();
        }
    } 
    ...
}
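The quorum count reduces to a few lines. In this hypothetical sketch, other_down[i] plays the role of SRI_MASTER_DOWN on the i-th other known sentinel; the function only illustrates the counting rule and is not a Redis API.

```c
#include <stdbool.h>

/* Hypothetical: other_down[i] mirrors SRI_MASTER_DOWN on the i-th other
 * sentinel, i.e. that sentinel replied that it also sees the master down. */
bool is_objectively_down(bool self_sdown, const bool *other_down,
                         int n_others, unsigned int quorum) {
    if (!self_sdown) return false;        /* only evaluated while we see SDOWN */
    unsigned int votes = 1;               /* this sentinel */
    for (int i = 0; i < n_others; i++)
        if (other_down[i]) votes++;
    return votes >= quorum;
}
```

For example, with quorum 2 a single other sentinel agreeing is enough (1 + 1 = 2), while with quorum 3 it is not.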

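Following the flow in sentinelHandleRedisInstance: when a sentinel first sees the master as SDOWN, sentinelCheckObjectivelyDown looks at the other sentinels' opinions, but at that point none of them has SRI_MASTER_DOWN set yet, so (unless quorum is 1) the master cannot be ODOWN and sentinelStartFailoverIfNeeded returns 0. The other sentinels are actually asked in the second call, sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS), which only sends the question if the master is SDOWN, the link to that sentinel is connected, and (unless SENTINEL_ASK_FORCED is passed) the previous reply is older than SENTINEL_ASK_PERIOD (1s).

The sentinel iterates over the other sentinels and sends SENTINEL is-master-down-by-addr <master-ip> <master-port> <epoch> <runid>. Here it is only asking about the master's state, so the runid argument is * (it is the sentinel's own runid only when it wants to be voted leader). The command is asynchronous, and the reply is handled by sentinelReceiveIsMasterDownReply.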

//sentinel.c#sentinelAskMasterStateToOtherSentinels
void sentinelAskMasterStateToOtherSentinels(sentinelRedisInstance *master, int flags) {
    dictIterator *di;
    dictEntry *de;

    di = dictGetIterator(master->sentinels);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);
        char port[32];
        int retval;
        //some checks omitted
        ...
        /* Ask */
        ll2string(port,sizeof(port),master->addr->port);
        //send SENTINEL is-master-down-by-addr <master-ip> <master-port> <epoch> <runid> to this sentinel
        //when only asking about the master's state the runid is *; when asking for a vote it is this sentinel's runid
        retval = redisAsyncCommand(ri->link->cc,
                    sentinelReceiveIsMasterDownReply, ri,
                    "%s is-master-down-by-addr %s %s %llu %s",
                    sentinelInstanceMapCommand(ri,"SENTINEL"),
                    master->addr->ip, port,
                    sentinel.current_epoch,
                    (master->failover_state > SENTINEL_FAILOVER_STATE_NONE) ?
                    sentinel.myid : "*");
        if (retval == C_OK) ri->link->pending_commands++;
    }
    dictReleaseIterator(di);
}
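The last argument is the only part that differs between a plain state query and a vote request. The sketch below formats the command as one string to make this visible; it is hypothetical (Redis sends it through hiredis with separate arguments, and sentinelInstanceMapCommand may rename SENTINEL), and failover_in_progress stands for master->failover_state > SENTINEL_FAILOVER_STATE_NONE.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the request as one string. The last argument
 * is this sentinel's runid while a failover is in progress (asking for a
 * vote) and "*" otherwise (only asking about the master's state). */
int build_is_master_down_cmd(char *buf, size_t size,
                             const char *master_ip, int master_port,
                             unsigned long long current_epoch,
                             int failover_in_progress, const char *myid) {
    return snprintf(buf, size, "SENTINEL is-master-down-by-addr %s %d %llu %s",
                    master_ip, master_port, current_epoch,
                    failover_in_progress ? myid : "*");
}
```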

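Before looking at the callback sentinelReceiveIsMasterDownReply, here is how the other sentinels answer SENTINEL is-master-down-by-addr. The command is handled in sentinelCommand. The receiver looks up the master by ip and port and sets isdown to 1 if it also sees the master as subjectively down (SRI_S_DOWN) and is not in TILT mode. Since this request only asks about the master's state (the runid is *), no vote is cast and sentinelVoteLeader is not called. The reply carries three elements: <master-isdown> <leader-runid> <leader-epoch>.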

//sentinel.c#sentinelCommand
void sentinelCommand(client *c) { 
    //other subcommands omitted
    ...
    else if (!strcasecmp(c->argv[1]->ptr,"is-master-down-by-addr")) {
	
		sentinelRedisInstance *ri;
		long long req_epoch;
		uint64_t leader_epoch = 0;
		char *leader = NULL;
		long port;
		int isdown = 0;
		if (c->argc != 6) goto numargserr;
		if (getLongFromObjectOrReply(c,c->argv[3],&port,NULL) != C_OK ||
		            getLongLongFromObjectOrReply(c,c->argv[4],&req_epoch,NULL)
		                                                              != C_OK)
		            return;
		ri = getSentinelRedisInstanceByAddrAndRunID(sentinel.masters,
		            c->argv[2]->ptr,port,NULL);
		/* It exists? Is actually a master? Is subjectively down? It's down.
         * Note: if we are in tilt mode we always reply with "0". */
		if (!sentinel.tilt && ri && (ri->flags & SRI_S_DOWN) &&
		                                    (ri->flags & SRI_MASTER))
		            isdown = 1;
		/* Vote for the master (or fetch the previous vote) if the request
         * includes a runid, otherwise the sender is not seeking for a vote. */
		if (ri && ri->flags & SRI_MASTER && strcasecmp(c->argv[5]->ptr,"*")) {
			leader = sentinelVoteLeader(ri,(uint64_t)req_epoch,
			                                            c->argv[5]->ptr,
			                                            &leader_epoch);
		}
		/* Reply with a three-elements multi-bulk reply:
         * down state, leader, vote epoch. */
		addReplyMultiBulkLen(c,3);
		addReply(c, isdown ? shared.cone : shared.czero);
		addReplyBulkCString(c, leader ? leader : "*");
		addReplyLonglong(c, (long long)leader_epoch);
		if (leader) sdsfree(leader);
	}
}

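sentinelReceiveIsMasterDownReply first validates the shape of the reply. If the first element is 1, it sets SRI_MASTER_DOWN in that sentinel's flags (with a bitwise OR); otherwise it clears the flag. Because this round only asks whether the other sentinels also consider the master down, the leader runid in the reply is *, so the vote-recording branch that follows is not executed.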

void sentinelReceiveIsMasterDownReply(redisAsyncContext *c, void *reply, void *privdata) {
    sentinelRedisInstance *ri = privdata;
    ...
    /* Ignore every error or unexpected reply.
     * Note that if the command returns an error for any reason we'll
     * end clearing the SRI_MASTER_DOWN flag for timeout anyway. */
    if (r->type == REDIS_REPLY_ARRAY && r->elements == 3 &&
        r->element[0]->type == REDIS_REPLY_INTEGER &&
        r->element[1]->type == REDIS_REPLY_STRING &&
        r->element[2]->type == REDIS_REPLY_INTEGER)
    {
        ri->last_master_down_reply_time = mstime();
        if (r->element[0]->integer == 1) {
            ri->flags |= SRI_MASTER_DOWN;
        } else {
            ri->flags &= ~SRI_MASTER_DOWN;
        }
        if (strcmp(r->element[1]->str,"*")) {
            /* If the runid in the reply is not "*" the Sentinel actually
             * replied with a vote. */
            sdsfree(ri->leader);
            if ((long long)ri->leader_epoch != r->element[2]->integer)
                serverLog(LL_WARNING,
                    "%s voted for %s %llu", ri->name,
                    r->element[1]->str,
                    (unsigned long long) r->element[2]->integer);
            ri->leader = sdsnew(r->element[1]->str);
            ri->leader_epoch = r->element[2]->integer;
        }
    }
}
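The state a sentinel keeps about each peer after this reply can be summarized as follows. peer_t and handle_is_master_down_reply are hypothetical names; the sketch assumes the reply already passed the shape check above (integer, string, integer) and omits last_master_down_reply_time and the log line.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-peer state kept by the asking sentinel. */
typedef struct {
    bool master_down;        /* mirrors SRI_MASTER_DOWN */
    char leader[41];         /* runid this peer voted for, "" if none */
    uint64_t leader_epoch;
} peer_t;

/* Applies a validated <isdown> <leader-runid> <leader-epoch> reply. */
void handle_is_master_down_reply(peer_t *peer, long long isdown,
                                 const char *leader_runid,
                                 long long leader_epoch) {
    peer->master_down = (isdown == 1);
    if (strcmp(leader_runid, "*") != 0) {
        /* Not "*": the peer actually replied with a vote. */
        snprintf(peer->leader, sizeof(peer->leader), "%s", leader_runid);
        peer->leader_epoch = (uint64_t)leader_epoch;
    }
}
```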

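This completes the objective-down logic. Note that the whole process spans several sentinelTimer runs rather than completing within a single one.

3. Failover

When the master is objectively down, Sentinel must perform a failover to keep the service available: one slave is promoted to master, and the other slaves are reconfigured to replicate from it. The failover progresses through the following states: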

//no failover in progress
#define SENTINEL_FAILOVER_STATE_NONE 0  /* No failover in progress. */
//waiting to start; the sentinels elect a leader
#define SENTINEL_FAILOVER_STATE_WAIT_START 1  /* Wait for failover_start_time*/
//select a slave to become the new master
#define SENTINEL_FAILOVER_STATE_SELECT_SLAVE 2 /* Select slave to promote */
//turn the selected slave into a master
#define SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE 3 /* Slave -> Master */
//wait for the selected slave to report its new role
#define SENTINEL_FAILOVER_STATE_WAIT_PROMOTION 4 /* Wait slave to change role */
//reconfigure the other slaves to replicate from the new master
#define SENTINEL_FAILOVER_STATE_RECONF_SLAVES 5 /* SLAVEOF newmaster */
//reset the master, replacing its IP:PORT with the promoted slave's IP:PORT
#define SENTINEL_FAILOVER_STATE_UPDATE_CONFIG 6 /* Monitor promoted slave. */

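3.1 Starting the failover

sentinelStartFailoverIfNeeded starts a failover when the master is objectively down, no failover is already in progress, and the previous attempt started at least 2 × failover_timeout ago. It calls sentinelStartFailover, which sets failover_state, increments the current epoch, and records the failover start time (with a random delay of up to SENTINEL_MAX_DESYNC added).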

//sentinel.c#sentinelStartFailover
void sentinelStartFailover(sentinelRedisInstance *master) {
    serverAssert(master->flags & SRI_MASTER);

    master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START;
    master->flags |= SRI_FAILOVER_IN_PROGRESS;
    master->failover_epoch = ++sentinel.current_epoch;
    sentinelEvent(LL_WARNING,"+new-epoch",master,"%llu",
        (unsigned long long) sentinel.current_epoch);
    sentinelEvent(LL_WARNING,"+try-failover",master,"%@");
    master->failover_start_time = mstime()+rand()%SENTINEL_MAX_DESYNC;
    master->failover_state_change_time = mstime();
}

Next, the first sentinelAskMasterStateToOtherSentinels call (with SENTINEL_ASK_FORCED) sends is-master-down-by-addr to the other sentinels, this time including its runid to ask them to vote for it.
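The checks sentinelStartFailoverIfNeeded makes before calling sentinelStartFailover can be sketched as a predicate. This assumes a hypothetical master_view_t holding only the relevant fields; it illustrates the conditions and is not the Redis function itself, which also logs when an attempt is delayed.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t mstime_t;

/* Hypothetical snapshot of the master fields involved. */
typedef struct {
    bool odown;                    /* SRI_O_DOWN */
    bool failover_in_progress;     /* SRI_FAILOVER_IN_PROGRESS */
    mstime_t failover_start_time;
    mstime_t failover_timeout;
} master_view_t;

bool should_start_failover(const master_view_t *m, mstime_t now) {
    if (!m->odown) return false;                 /* must be objectively down */
    if (m->failover_in_progress) return false;   /* one failover at a time */
    /* The previous attempt must have started at least 2 * failover_timeout ago. */
    if (now - m->failover_start_time < m->failover_timeout * 2) return false;
    return true;
}
```

With the default failover_timeout of 180000 ms, a new attempt on the same master therefore waits about 360000 ms (6 minutes) after the previous one began.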

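3.2 Electing the Sentinel leader

Once the failover has started, the sentinel asks the others to vote for it by calling sentinelAskMasterStateToOtherSentinels. This is the same request used to check the master's state, except that the last argument is the sentinel's runid. Each sentinel votes at most once per epoch, for the first candidate that asks. A sentinel that has not started its own failover yet and votes for another sentinel therefore cannot become leader in this round; voting for another sentinel also delays its own next failover attempt.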

if (sentinelStartFailoverIfNeeded(ri))
     sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
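The voting rule used by sentinelVoteLeader can be sketched as first-come, first-served within an epoch: a sentinel adopts any newer epoch it hears about and records a vote only if it has not voted in that epoch yet. vote_state_t and vote_leader are hypothetical; the real function also persists the config, logs +vote-for-leader, and, when voting for another sentinel, resets failover_start_time so that its own failover attempt is delayed.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified per-master voting state. */
typedef struct {
    uint64_t current_epoch;   /* sentinel.current_epoch */
    uint64_t leader_epoch;    /* epoch of our last vote for this master */
    char leader[41];          /* runid we voted for, "" if none */
} vote_state_t;

/* At most one vote per epoch. Returns the runid we voted for (possibly an
 * earlier vote) or NULL, and stores that vote's epoch in *out_epoch. */
const char *vote_leader(vote_state_t *s, uint64_t req_epoch,
                        const char *req_runid, uint64_t *out_epoch) {
    if (req_epoch > s->current_epoch)
        s->current_epoch = req_epoch;             /* adopt the newer epoch */
    if (s->leader_epoch < req_epoch && s->current_epoch <= req_epoch) {
        snprintf(s->leader, sizeof(s->leader), "%s", req_runid);
        s->leader_epoch = s->current_epoch;       /* first request wins */
    }
    *out_epoch = s->leader_epoch;
    return s->leader[0] ? s->leader : NULL;
}
```

A second candidate asking in the same epoch simply receives the earlier vote back, which is how competing candidates learn they have lost that round.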

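sentinelFailoverStateMachine then dispatches on the failover state; in the WAIT_START state it calls sentinelFailoverWaitStart, which checks the election result through sentinelGetLeader.

First, sentinelGetLeader tallies the votes the other sentinels reported for the current epoch and finds the candidate with the most votes.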

//sentinel.c#sentinelGetLeader
di = dictGetIterator(master->sentinels);
while((de = dictNext(di)) != NULL) {
	sentinelRedisInstance *ri = dictGetVal(de);
	if (ri->leader != NULL && ri->leader_epoch == sentinel.current_epoch)
	            sentinelLeaderIncr(counters,ri->leader);
}
dictReleaseIterator(di);

di = dictGetIterator(counters);
while((de = dictNext(di)) != NULL) {
	uint64_t votes = dictGetUnsignedIntegerVal(de);
	if (votes > max_votes) {
		max_votes = votes;
		winner = dictGetKey(de);
	}
}

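Next, if such a candidate exists, the current sentinel votes for it; otherwise it votes for itself (if it has already voted in this epoch, sentinelVoteLeader returns that earlier vote instead). Its own vote is added to the tally, and the candidate with the most votes is recomputed.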

//sentinel.c#sentinelGetLeader
if (winner)
    myvote = sentinelVoteLeader(master,epoch,winner,&leader_epoch); 
else
    myvote = sentinelVoteLeader(master,epoch,sentinel.myid,&leader_epoch);
if (myvote && leader_epoch == epoch) {
	uint64_t votes = sentinelLeaderIncr(counters,myvote);
	if (votes > max_votes) {
		max_votes = votes;
		winner = myvote;
	}
}

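Finally, the candidate with the most votes is accepted as leader only if its vote count is at least voters/2+1, where voters is the number of known sentinels including the current one (a strict majority), and also at least the configured quorum.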

//sentinel.c#sentinelGetLeader
voters_quorum = voters/2+1;
if (winner && (max_votes < voters_quorum || max_votes < master->quorum))
        winner = NULL;
winner = winner ? sdsnew(winner) : NULL;
return winner;
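The final check is easiest to see with concrete numbers. This hypothetical helper assumes voters counts every known sentinel for the master plus the current one, as dictSize(master->sentinels)+1 does in Redis 5.0.

```c
#include <stdbool.h>

/* Hypothetical helper. voters = known sentinels for the master plus this one. */
bool wins_election(unsigned int max_votes, unsigned int voters,
                   unsigned int quorum) {
    unsigned int voters_quorum = voters / 2 + 1;   /* strict majority */
    return max_votes >= voters_quorum && max_votes >= quorum;
}
```

With 4 sentinels the leader needs 3 votes (4/2+1), so a 2-2 split elects nobody; with 5 sentinels it also needs 3. A configured quorum larger than the majority raises the bar further.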

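Back in sentinelFailoverWaitStart: if the current sentinel is not the leader (and the failover was not forced with SENTINEL FAILOVER, which sets SRI_FORCE_FAILOVER), it checks whether the election has timed out. The timeout is the smaller of SENTINEL_ELECTION_TIMEOUT (10s) and failover_timeout (3 minutes by default); when it expires, the failover is aborted.

If the current sentinel is the leader, it moves the failover state to SELECT_SLAVE.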

//sentinel.c#sentinelFailoverWaitStart
if (!isleader && !(ri->flags & SRI_FORCE_FAILOVER)) {
	int election_timeout = SENTINEL_ELECTION_TIMEOUT;
	/* The election timeout is the MIN between SENTINEL_ELECTION_TIMEOUT
         * and the configured failover timeout. */
	if (election_timeout > ri->failover_timeout)
	            election_timeout = ri->failover_timeout;
	/* Abort the failover if I'm not the leader after some time. */
	if (mstime() - ri->failover_start_time > election_timeout) 
	{
		sentinelEvent(LL_WARNING,"-failover-abort-not-elected",ri,"%@");
		sentinelAbortFailover(ri);
	}
	return;
}
sentinelEvent(LL_WARNING,"+elected-leader",ri,"%@");
if (sentinel.simfailure_flags & SENTINEL_SIMFAILURE_CRASH_AFTER_ELECTION)
        sentinelSimFailureCrash();
ri->failover_state = SENTINEL_FAILOVER_STATE_SELECT_SLAVE;
ri->failover_state_change_time = mstime();
sentinelEvent(LL_WARNING,"+failover-state-select-slave",ri,"%@");

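3.3 Selecting a slave

Next, sentinelFailoverSelectSlave runs and calls sentinelSelectSlave to choose a suitable slave.

max_master_down_time is ten times down_after_period (the down-after-milliseconds setting, 30s by default), plus, if the master is SDOWN, the time it has been SDOWN. A slave is not selected if:

1) It is subjectively or objectively down.

2) Its link is disconnected.

3) Its last_avail_time is more than 5 PING periods (5s) ago, i.e. it has not given a valid PING reply for 5s.

4) Its slave_priority is 0.

5) Its last INFO reply is too old: more than 5 PING periods (5s) if the master is SDOWN, otherwise more than 3 INFO periods (30s).

6) Its replication link to the master has been down for longer than max_master_down_time (more than 300s = 10 × 30s by default, plus the master's SDOWN time), so it is considered unsuitable because its data may be too stale.

Finally, the remaining slaves are sorted with compareSlavesForPromotion, using these keys in order of precedence:

1) slave_priority: the lower value wins.

2) Replication offset (slave_repl_offset): the larger offset wins.

3) runid: the lexicographically smaller runid wins.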

//sentinel.c#sentinelSelectSlave
sentinelRedisInstance *sentinelSelectSlave(sentinelRedisInstance *master) {
    sentinelRedisInstance **instance =
        zmalloc(sizeof(instance[0])*dictSize(master->slaves));
    sentinelRedisInstance *selected = NULL;
    int instances = 0;
    dictIterator *di;
    dictEntry *de;
    mstime_t max_master_down_time = 0;

    if (master->flags & SRI_S_DOWN)
        max_master_down_time += mstime() - master->s_down_since_time;
    max_master_down_time += master->down_after_period * 10;

    di = dictGetIterator(master->slaves);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *slave = dictGetVal(de);
        mstime_t info_validity_time;

        if (slave->flags & (SRI_S_DOWN|SRI_O_DOWN)) continue;
        if (slave->link->disconnected) continue;
        if (mstime() - slave->link->last_avail_time > SENTINEL_PING_PERIOD*5) continue;
        if (slave->slave_priority == 0) continue;

        /* If the master is in SDOWN state we get INFO for slaves every second.
         * Otherwise we get it with the usual period so we need to account for
         * a larger delay. */
        if (master->flags & SRI_S_DOWN)
            info_validity_time = SENTINEL_PING_PERIOD*5;
        else
            info_validity_time = SENTINEL_INFO_PERIOD*3;
        if (mstime() - slave->info_refresh > info_validity_time) continue;
        if (slave->master_link_down_time > max_master_down_time) continue;
        instance[instances++] = slave;
    }
    dictReleaseIterator(di);
    if (instances) {
        qsort(instance,instances,sizeof(sentinelRedisInstance*),
            compareSlavesForPromotion);
        selected = instance[0];
    }
    zfree(instance);
    return selected;
}
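compareSlavesForPromotion itself is not listed above. The following is a hedged reconstruction of its ordering over a hypothetical candidate_t, together with a hypothetical pick_best that sorts with qsort and takes the first element, as sentinelSelectSlave does. strcasecmp comes from the POSIX <strings.h> header; a NULL runid (old Redis versions that do not report one) sorts last.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical, simplified promotion candidate. */
typedef struct {
    const char *name;
    int slave_priority;
    unsigned long long slave_repl_offset;
    const char *runid;                     /* NULL if not reported */
} candidate_t;

/* qsort comparator over an array of candidate_t pointers; the best
 * candidate sorts first. */
int compare_candidates(const void *a, const void *b) {
    const candidate_t *sa = *(candidate_t *const *)a;
    const candidate_t *sb = *(candidate_t *const *)b;

    if (sa->slave_priority != sb->slave_priority)
        return sa->slave_priority - sb->slave_priority;   /* lower value first */
    if (sa->slave_repl_offset > sb->slave_repl_offset) return -1; /* more data first */
    if (sa->slave_repl_offset < sb->slave_repl_offset) return 1;
    if (sa->runid == NULL && sb->runid == NULL) return 0;
    if (sa->runid == NULL) return 1;                       /* NULL runid sorts last */
    if (sb->runid == NULL) return -1;
    return strcasecmp(sa->runid, sb->runid);              /* smaller runid first */
}

/* Sorts the candidates in place and returns the best one, or NULL if n == 0. */
candidate_t *pick_best(candidate_t **cands, size_t n) {
    if (n == 0) return NULL;
    qsort(cands, n, sizeof(cands[0]), compare_candidates);
    return cands[0];
}
```

Priority dominates: a slave with priority 10 beats one with priority 100 even if the latter has a larger replication offset.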

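Back in sentinelFailoverSelectSlave: if no slave was selected, the failover is aborted.

If a slave was selected, it is flagged SRI_PROMOTED and recorded as the master's promoted_slave, and the failover state advances to SEND_SLAVEOF_NOONE.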

//sentinel.c#sentinelFailoverSelectSlave
sentinelRedisInstance *slave = sentinelSelectSlave(ri);
if (slave == NULL) {
	sentinelEvent(LL_WARNING,"-failover-abort-no-good-slave",ri,"%@");
	sentinelAbortFailover(ri);
} else {
	sentinelEvent(LL_WARNING,"+selected-slave",slave,"%@");
	slave->flags |= SRI_PROMOTED;
	ri->promoted_slave = slave;
	ri->failover_state = SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE;
	ri->failover_state_change_time = mstime();
	sentinelEvent(LL_NOTICE,"+failover-state-send-slaveof-noone",
	            slave, "%@");
}

3.4 Promoting the slave

The selected slave is promoted to master: the sentinel sends it the following commands:

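MULTI  //start a transaction
SLAVEOF NO ONE //stop replication and turn the slave into a master
CONFIG REWRITE //rewrite the server's redis.conf from its current running configuration
CLIENT KILL TYPE normal //disconnect the normal clients connected to this server
EXEC //execute the transaction

sentinelFailoverSendSlaveOfNoOne calls sentinelSendSlaveOf, which builds and sends these commands as shown in the code below. sentinelFailoverSendSlaveOfNoOne then sets the failover state to SENTINEL_FAILOVER_STATE_WAIT_PROMOTION.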

//sentinel.c#sentinelSendSlaveOf
int sentinelSendSlaveOf(sentinelRedisInstance *ri, char *host, int port) {
    char portstr[32];
    int retval;

    ll2string(portstr,sizeof(portstr),port);

    /* If host is NULL we send SLAVEOF NO ONE that will turn the instance
     * into a master. */
    if (host == NULL) {
        host = "NO";
        memcpy(portstr,"ONE",4);
    }

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s",
        sentinelInstanceMapCommand(ri,"MULTI"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s %s %s",
        sentinelInstanceMapCommand(ri,"SLAVEOF"),
        host, portstr);
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s REWRITE",
        sentinelInstanceMapCommand(ri,"CONFIG"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s KILL TYPE normal",
        sentinelInstanceMapCommand(ri,"CLIENT"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s",
        sentinelInstanceMapCommand(ri,"EXEC"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    return C_OK;
}

3.5 Waiting for the slave to become master

sentinelFailoverWaitPromotion only checks whether the failover has exceeded the configured failover timeout, and aborts it if so.

void sentinelFailoverWaitPromotion(sentinelRedisInstance *ri) {
    /* Just handle the timeout. Switching to the next state is handled
     * by the function parsing the INFO command of the promoted slave. */
    if (mstime() - ri->failover_state_change_time > ri->failover_timeout) {
        sentinelEvent(LL_WARNING,"-failover-abort-slave-timeout",ri,"%@");
        sentinelAbortFailover(ri);
    }
}

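The actual state transition happens while processing the promoted slave's INFO reply in sentinelRefreshInstanceInfo: once the slave has become a master, the role it reports changes, and that method advances the failover state.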

//sentinel.c#sentinelRefreshInstanceInfo
if ((ri->flags & SRI_SLAVE) && role == SRI_MASTER) {
	/* If this is a promoted slave we can change state to the
         * failover state machine. */
	if ((ri->flags & SRI_PROMOTED) &&
	            (ri->master->flags & SRI_FAILOVER_IN_PROGRESS) &&
	            (ri->master->failover_state ==
	                SENTINEL_FAILOVER_STATE_WAIT_PROMOTION)) {
		ri->master->config_epoch = ri->master->failover_epoch;
		ri->master->failover_state = SENTINEL_FAILOVER_STATE_RECONF_SLAVES;
		ri->master->failover_state_change_time = mstime();
		sentinelFlushConfig();
		sentinelEvent(LL_WARNING,"+promoted-slave",ri,"%@");
		if (sentinel.simfailure_flags &
		                SENTINEL_SIMFAILURE_CRASH_AFTER_PROMOTION)
		                sentinelSimFailureCrash();
		sentinelEvent(LL_WARNING,"+failover-state-reconf-slaves",
		                ri->master,"%@");
		sentinelCallClientReconfScript(ri->master,SENTINEL_LEADER,
		                "start",ri->master->addr,ri->addr);
		sentinelForceHelloUpdateForMaster(ri->master);
	}
}

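3.6 Reconfiguring the other slaves

sentinelFailoverReconfNextSlave sends the following commands to each of the other slaves through sentinelSendSlaveOf, reconfiguring at most parallel_syncs slaves at a time: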

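MULTI  //start a transaction
SLAVEOF <promoted_slave->addr->ip> <promoted_slave->addr->port> //replicate from the new master
CONFIG REWRITE //rewrite the server's redis.conf from its current running configuration
CLIENT KILL TYPE normal //disconnect the normal clients connected to this server
EXEC //execute the transaction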

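Finally, it calls sentinelFailoverDetectEnd(), which checks whether all slaves have been reconfigured (or the failover timeout has elapsed); if so, it sets the failover state to SENTINEL_FAILOVER_STATE_UPDATE_CONFIG.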
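The batching rule in sentinelFailoverReconfNextSlave can be sketched as: count the slaves already being reconfigured, then start new ones until parallel_syncs are in flight. reconf_t, reconf_next_slaves and all_reconfigured are hypothetical; the array is assumed to exclude the promoted slave, and the real code's timeout handling for slaves stuck in RECONF_SENT, its skipping of disconnected slaves, and the end check's special case for SDOWN slaves are omitted.

```c
#include <stdbool.h>

/* Hypothetical per-slave reconfiguration state (mirrors SRI_RECONF_*). */
typedef enum { RECONF_NONE, RECONF_SENT, RECONF_INPROG, RECONF_DONE } reconf_t;

/* Starts reconfiguring slaves until parallel_syncs are in flight; moving a
 * slave to RECONF_SENT stands for sending it SLAVEOF <new master>.
 * Returns the number of slaves started in this call. */
int reconf_next_slaves(reconf_t *state, int n, int parallel_syncs) {
    int in_progress = 0, started = 0;
    for (int i = 0; i < n; i++)
        if (state[i] == RECONF_SENT || state[i] == RECONF_INPROG) in_progress++;
    for (int i = 0; i < n && in_progress < parallel_syncs; i++) {
        if (state[i] != RECONF_NONE) continue;
        state[i] = RECONF_SENT;
        in_progress++;
        started++;
    }
    return started;
}

/* Simplified end condition: every (non-promoted) slave is done. */
bool all_reconfigured(const reconf_t *state, int n) {
    for (int i = 0; i < n; i++)
        if (state[i] != RECONF_DONE) return false;
    return true;
}
```

With parallel_syncs = 1 (the default), slaves are resynchronized one at a time, which limits how many of them are busy loading data from the new master at once.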

3.7 Switching to the new master

When the failover state is SENTINEL_FAILOVER_STATE_UPDATE_CONFIG, sentinelFailoverSwitchToPromotedSlave is called:

//sentinel.c#sentinelHandleDictOfRedisInstances
...
while((de = dictNext(di)) != NULL) {
	sentinelRedisInstance *ri = dictGetVal(de);
	sentinelHandleRedisInstance(ri);
	if (ri->flags & SRI_MASTER) {
		sentinelHandleDictOfRedisInstances(ri->slaves);
		sentinelHandleDictOfRedisInstances(ri->sentinels);
		if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
			switch_to_promoted = ri;
		}
	}
}
if (switch_to_promoted)
        sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);

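sentinelFailoverSwitchToPromotedSlave calls sentinelResetMasterAndChangeAddress, which resets the old master's connections; sentinelResetMaster sets the failover state back to SENTINEL_FAILOVER_STATE_NONE. The master's address is then replaced with the new master's, and the former slaves and the old master are added back as slaves. All connections are re-established on the next sentinelTimer run.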

//sentinel.c#sentinelResetMasterAndChangeAddress
sentinelResetMaster(master,SENTINEL_RESET_NO_SENTINELS);
oldaddr = master->addr;
master->addr = newaddr;
master->o_down_since_time = 0;
master->s_down_since_time = 0;
/* Add slaves back. */
for (j = 0; j < numslaves; j++) 
{
	sentinelRedisInstance *slave;
	slave = createSentinelRedisInstance(NULL,SRI_SLAVE,slaves[j]->ip,
	                    slaves[j]->port, master->quorum, master);
	releaseSentinelAddr(slaves[j]);
	if (slave) sentinelEvent(LL_NOTICE,"+slave",slave,"%@");
}

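4. Questions

4.1 When the old master restarts, how does it know it is no longer the master?

After the old master comes back, Sentinel reconnects to it and sends INFO periodically. Sentinel now has it recorded as a slave, but the old master reports itself as a master, so after a short wait Sentinel tells it to replicate from the new master. The main logic is: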

//sentinel.c#sentinelRefreshInstanceInfo
if ((ri->flags & SRI_SLAVE) && role == SRI_MASTER) {
	/* If this is a promoted slave we can change state to the
         * failover state machine. */
	if ((ri->flags & SRI_PROMOTED) &&
	            (ri->master->flags & SRI_FAILOVER_IN_PROGRESS) &&
	            (ri->master->failover_state ==
	                SENTINEL_FAILOVER_STATE_WAIT_PROMOTION)) {
        //promoted-slave handling (see 3.5)
		...
	} else {
		/* A slave turned into a master. We want to force our view and
             * reconfigure as slave. Wait some time after the change before
             * going forward, to receive new configs if any. */
		mstime_t wait_time = SENTINEL_PUBLISH_PERIOD*4;
		if (!(ri->flags & SRI_PROMOTED) &&
		                 sentinelMasterLooksSane(ri->master) &&
		                 sentinelRedisInstanceNoDownFor(ri,wait_time) &&
		                 mstime() - ri->role_reported_time > wait_time) {
			int retval = sentinelSendSlaveOf(ri,
			                        ri->master->addr->ip,
			                        ri->master->addr->port);
			if (retval == C_OK)
			                    sentinelEvent(LL_NOTICE,"+convert-to-slave",ri,"%@");
		}
	}
}

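4.2 How do the other sentinels learn the new master's address and start monitoring it?

Sentinels subscribe to the __sentinel__:hello channel of masters and slaves, and each sentinel publishes a hello message every 2s that includes the master's address and config epoch. When the master has been switched, the other sentinels learn about it through the pub/sub channel of the master or the slaves. sentinelProcessHelloMessage handles the new master information when the received config epoch is newer than the local one, and switches monitoring to the new master.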

//sentinel.c#sentinelProcessHelloMessage
/* Update master info if received configuration is newer. */
if (si && master->config_epoch < master_config_epoch) {
	master->config_epoch = master_config_epoch;
	if (master_port != master->addr->port ||
	                strcmp(master->addr->ip, token[5])) {
		sentinelAddr *old_addr;
		sentinelEvent(LL_WARNING,"+config-update-from",si,"%@");
		sentinelEvent(LL_WARNING,"+switch-master",
		                    master,"%s %s %d %s %d",
		                    master->name,
		                    master->addr->ip, master->addr->port,
		                    token[5], master_port);
		old_addr = dupSentinelAddr(master->addr);
		sentinelResetMasterAndChangeAddress(master, token[5], master_port);
		sentinelCallClientReconfScript(master,
		                    SENTINEL_OBSERVER,"start",
		                    old_addr,master->addr);
		releaseSentinelAddr(old_addr);
	}
}
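The rule applied to a received hello reduces to an epoch comparison followed by an address comparison. hello_action_t and apply_hello are hypothetical; the sketch assumes the hello came from a known sentinel (the si check in the code) and returns an action in place of the real calls to sentinelResetMasterAndChangeAddress and the client reconfiguration script.

```c
#include <stdint.h>
#include <string.h>

typedef enum { HELLO_IGNORE, HELLO_EPOCH_ONLY, HELLO_SWITCH_MASTER } hello_action_t;

/* Hypothetical reduction of the config-update rule. *local_epoch is this
 * sentinel's config_epoch for the master; it is updated whenever the
 * received epoch is newer. */
hello_action_t apply_hello(uint64_t *local_epoch,
                           const char *local_ip, int local_port,
                           uint64_t hello_epoch,
                           const char *hello_ip, int hello_port) {
    if (*local_epoch >= hello_epoch) return HELLO_IGNORE;   /* not newer */
    *local_epoch = hello_epoch;
    if (local_port != hello_port || strcmp(local_ip, hello_ip) != 0)
        return HELLO_SWITCH_MASTER;   /* address changed: monitor the new master */
    return HELLO_EPOCH_ONLY;
}
```

Because only a strictly newer config epoch is accepted, a stale hello carrying the old master's address cannot switch a sentinel back after a failover.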

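5. References

1) 《Redis5 设计与源码分析》 (Redis 5 Design and Source Code Analysis)

2) 《Redis设计与实现》 (Redis Design and Implementation)

3) Redis Sentinel的选举Leader原理及源码解析（结合Raft算法） (Redis Sentinel leader election: principles and source code analysis, with reference to the Raft algorithm)

4) Redis source code, 5.0 branch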