一文带你了解熟悉 Zookeeper 选举机制

1. Zookeeper 选举机制概述

  • SID:服务器ID。用来唯一标识一台ZooKeeper集群中的机器,每台机器不能重复,和myid一致。
  • ZXID:事务ID。ZXID是一个事务ID,用来标识一次服务器状态的变更。在某一时刻,集群中的每台机器的ZXID值不一定完全一致,这和ZooKeeper服务器对于客户端"更新请求"的处理逻辑有关。
  • Epoch:每个Leader任期的代号。没有Leader时同一轮投票过程中的逻辑时钟值是相同的。每投完一次票这个数据就会增加

1.1. Zookeeper 选举机制

当第一次启动时,Zookeeper的选举机制如下图:

(1)服务器1启动,发起一次选举。服务器1投自己一票。此时服务器1票数一票,不够半数以上(3票),选举无法完成,服务器1状态保持为LOOKING;

(2)服务器2启动,再发起一次选举。服务器1和2分别投自己一票并交换选票信息:此时服务器1发现服务器2的myid比自己目前投票推举的(服务器1)大,更改选票为推举服务器2。此时服务器1票数0票,服务器2票数2票,没有半数以上结果,选举无法完成,服务器1,2状态保持LOOKING

(3)服务器3启动,发起一次选举。此时服务器1和2都会更改选票为服务器3。此次投票结果:服务器1为0票,服务器2为0票,服务器3为3票。此时服务器3的票数已经超过半数,服务器3当选Leader。服务器1,2更改状态为FOLLOWING,服务器3更改状态为LEADING;

(4)服务器4启动,发起一次选举。此时服务器1,2,3已经不是LOOKING状态,不会更改选票信息。交换选票信息结果:服务器3为3票,服务器4为1票。此时服务器4服从多数,更改选票信息为服务器3,并更改状态为FOLLOWING;

(5)服务器5启动,同4一样当小弟。

当非第一次启动时,选举过程如下所示:

(1)当ZooKeeper集群中的一台服务器出现以下两种情况之一时,就会开始进入Leader选举:

•服务器初始化启动。

•服务器运行期间无法和Leader保持连接。

(2)而当一台机器进入Leader选举流程时,当前集群也可能会处于以下两种状态:

•集群中本来就已经存在一个Leader。

对于第一种已经存在Leader的情况,机器试图去选举Leader时,会被告知当前服务器的Leader信息,对于该机器来说,仅仅需要和Leader机器建立连接,并进行状态同步即可。

•集群中确实不存在Leader。

假设ZooKeeper由5台服务器组成,SID分别为1、2、3、4、5,ZXID分别为8、8、8、7、7,并且此时SID为3的服务器是Leader。某一时刻,3和5服务器出现故障,因此开始进行Leader选举。

SID为1、2、4的机器投票情况:

(EPOCH,ZXID,SID) (EPOCH,ZXID,SID) (EPOCH,ZXID,SID)
(1,8,1) (1,8,2) (1,7,4)

1.2. Leader 选举规则

选举Leader规则:

  1. EPOCH大的直接胜出
  2. EPOCH相同,事务id大的胜出
  3. 事务id相同,服务器id大的胜出

1.3. 选举流程分析

2. 选举准备

2.1. 选举准备源码概览

2.2. QuorumPeer.java 源码

java 复制代码
public synchronized void start() {
    if (!getView().containsKey(myid)) {
        throw new RuntimeException("My id" + myid + " not in the peer list");
     }
    loadDataBase();
    startServerCnxnFactory();
    try {
        adminServer.start();
    } catch (AdminServerException e) {
        LOG.warn("Problem startingAdminServer", e);
        System.out.println(e);
    }
       // 选举准备
    startLeaderElection();
    super.start();
}
 
synchronized public void startLeaderElection() {
   try {
       if (getPeerState() ==ServerState.LOOKING) {
                //  创建选票
                // (1)选票组件:epoch(leader的任期代号)、zxid(某个leader当选期间执行的事务编号)、myid(serverid)
                // (2)开始选票时,都是先投自己
           currentVote = new Vote(myid,getLastLoggedZxid(), getCurrentEpoch());
       }
   } catch(IOException e) {
       RuntimeException re = newRuntimeException(e.getMessage());
       re.setStackTrace(e.getStackTrace());
       throw re;
   }
 
   // if (!getView().containsKey(myid)) {
  //     throw new RuntimeException("My id " + myid + " not in thepeer list");
    //}
    if (electionType == 0) {
        try {
            udpSocket = new DatagramSocket(getQuorumAddress().getPort());
            responder = new ResponderThread();
            responder.start();
        } catch (SocketException e) {
            throw new RuntimeException(e);
        }
    }
       // 创建选举算法实例
    this.electionAlg = createElectionAlgorithm(electionType);
}
 
protectedElection createElectionAlgorithm(intelectionAlgorithm){
    Election le=null;
 
    //TODO: use a factory rather than a switch
    switch (electionAlgorithm) {
    case 0:
        le = new LeaderElection(this);
        break;
    case 1:
        le = new AuthFastLeaderElection(this);
        break;
    case 2:
        le = new AuthFastLeaderElection(this,true);
        break;
    case 3:
              // 1创建QuorumCnxnManager,负责选举过程中的所有网络通信
        QuorumCnxManager qcm = createCnxnManager();
        QuorumCnxManager oldQcm =qcmRef.getAndSet(qcm);
             
        if (oldQcm != null) {
            LOG.warn("Clobberingalready-set QuorumCnxManager (restarting leader election?)");
            oldQcm.halt();
        }
        QuorumCnxManager.Listener listener = qcm.listener;
        if(listener != null){
                     // 2启动监听线程
            listener.start();
                    
                     // 3准备开始选举
            FastLeaderElection fle = new FastLeaderElection(this,qcm);
            fle.start();
            le = fle;
        } else {
            LOG.error("Null listener wheninitializing cnx manager");
        }
        break;
    default:
        assert false;
    }
    return le;
}

2.2.1. 网络通信组件初始化

kotlin 复制代码
public QuorumCnxManager createCnxnManager() {
    return new QuorumCnxManager(this,
            this.getId(),
            this.getView(),
            this.authServer,
            this.authLearner,
            this.tickTime * this.syncLimit,
            this.getQuorumListenOnAllIPs(),
            this.quorumCnxnThreadsSize,
            this.isQuorumSaslAuthEnabled());
}
 
public QuorumCnxManager(QuorumPeerself,
                        final long mySid,
                        Map<Long,QuorumPeer.QuorumServer> view,
                        QuorumAuthServer authServer,
                        QuorumAuthLearner authLearner,
                        int socketTimeout,
                        boolean listenOnAllIPs,
                        int quorumCnxnThreadsSize,
                        boolean quorumSaslAuthEnabled) {
       // 创建各种队列                              
    this.recvQueue= new ArrayBlockingQueue<Message>(RECV_CAPACITY);
    this.queueSendMap= new ConcurrentHashMap<Long,ArrayBlockingQueue<ByteBuffer>>();
    this.senderWorkerMap= new ConcurrentHashMap<Long,SendWorker>();
    this.lastMessageSent= new ConcurrentHashMap<Long,ByteBuffer>();
 
    String cnxToValue =System.getProperty("zookeeper.cnxTimeout");
    if(cnxToValue != null){
        this.cnxTO =Integer.parseInt(cnxToValue);
    }
 
    this.self = self;
 
    this.mySid = mySid;
    this.socketTimeout = socketTimeout;
    this.view = view;
    this.listenOnAllIPs = listenOnAllIPs;
 
    initializeAuth(mySid, authServer, authLearner,quorumCnxnThreadsSize,
            quorumSaslAuthEnabled);
 
    // Starts listener thread that waits forconnection requests
    listener = new Listener();
   listener.setName("QuorumPeerListener");
}

2.2.2. 监听线程初始化

  • 点击QuorumCnxManager.Listener,找到对应的run方法
scss 复制代码
public void run() {
    int numRetries = 0;
    InetSocketAddress addr;
    Socket client = null;
    Exception exitException = null;
      
    while ((!shutdown) &&(portBindMaxRetry == 0 || numRetries < portBindMaxRetry)) {
        try {
            if(self.shouldUsePortUnification()) {
                LOG.info("CreatingTLS-enabled quorum server socket");
                ss = newUnifiedServerSocket(self.getX509Util(), true);
            } else if (self.isSslQuorum()) {
                LOG.info("CreatingTLS-only quorum server socket");
                ss = newUnifiedServerSocket(self.getX509Util(), false);
            } else {
                ss = new ServerSocket();
            }
 
            ss.setReuseAddress(true);
 
            if (self.getQuorumListenOnAllIPs()){
                int port =self.getElectionAddress().getPort();
                addr = newInetSocketAddress(port);
            } else {
                // Resolve hostname for thisserver in case the
                // underlying ip address haschanged.
               self.recreateSocketAddresses(self.getId());
                addr =self.getElectionAddress();
            }
            LOG.info("My election bindport: " + addr.toString());
            setName(addr.toString());
                     // 绑定服务器地址
            ss.bind(addr);
                     // 死循环
            while (!shutdown) {
                try {
                                   // 阻塞,等待处理请求
                    client = ss.accept();
                                  
                    setSockOpts(client);
                    LOG.info("Receivedconnection request "
                            +formatInetAddr((InetSocketAddress)client.getRemoteSocketAddress()));
                    // Receive and handle theconnection request
                    // asynchronously if thequorum sasl authentication is
                    // enabled. This isrequired because sasl server
                    // authentication processmay take few seconds to finish,
                    // this may delay next peer connectionrequests.
                    if (quorumSaslAuthEnabled){
                       receiveConnectionAsync(client);
                    } else {
                       receiveConnection(client);
                    }
                    numRetries = 0;
                } catch (SocketTimeoutExceptione) {
                    LOG.warn("The socketis listening for the election accepted "
                             + "and ittimed out unexpectedly, but will retry."
                             + "seeZOOKEEPER-2836");
                }
            }
        } catch (IOException e) {
            ... ...
            closeSocket(client);
        }
    }
    ... ..
.

}

2.2.3. 选举准备 :点击点击FastLeaderElection

ini 复制代码
public FastLeaderElection(QuorumPeerself, QuorumCnxManager manager){
    this.stop = false;
    this.manager = manager;
 
    starter(self, manager);
}
 
private void starter(QuorumPeer self,QuorumCnxManager manager) {
    this.self = self;
    proposedLeader = -1;
    proposedZxid = -1;
      
       // 初始化队列和信息
    sendqueue = new LinkedBlockingQueue<ToSend>();
    recvqueue = new LinkedBlockingQueue<Notification>();
    this.messenger= new Messenger(manager);
}

3. 选举执行

3.1. 选举执行源码概览

3.2. QuorumPeer.java

scss 复制代码
public synchronized void start() {
    if (!getView().containsKey(myid)) {
        throw new RuntimeException("My id" + myid + " not in the peer list");
    }
       // 冷启动数据恢复
    loadDataBase();
    startServerCnxnFactory();
    try {
              // 启动通信工厂实例对象
        adminServer.start();
    } catch (AdminServerException e) {
        LOG.warn("Problem startingAdminServer", e);
        System.out.println(e);
    }
       // 准备选举环境
    startLeaderElection();
       // 执行选举
    super.start();
}


1)执行super.start();就相当于执行QuorumPeer.java类中的run()方法

当Zookeeper启动后,首先都是Looking状态,通过选举,让其中一台服务器成为Leader,其他的服务器成为Follower。

QuorumPeer.java


public void run() {
    updateThreadName();
 
    LOG.debug("Starting quorumpeer");
    try {
        jmxQuorumBean = new QuorumBean(this);
       MBeanRegistry.getInstance().register(jmxQuorumBean, null);
        for(QuorumServer s:getView().values()){
            ZKMBeanInfo p;
            if (getId() == s.id) {
                p = jmxLocalPeerBean = new LocalPeerBean(this);
                try {
                   MBeanRegistry.getInstance().register(p, jmxQuorumBean);
                } catch (Exception e) {
                    LOG.warn("Failed to register withJMX", e);
                    jmxLocalPeerBean = null;
                }
            } else {
                RemotePeerBean rBean = new RemotePeerBean(this, s);
                try {
                    MBeanRegistry.getInstance().register(rBean, jmxQuorumBean);
                    jmxRemotePeerBean.put(s.id, rBean);
                } catch (Exception e) {
                    LOG.warn("Failed toregister with JMX", e);
                }
            }
        }
    } catch (Exception e) {
        LOG.warn("Failed to register withJMX", e);
        jmxQuorumBean = null;
    }
 
    try {
        /*
         * Main loop
         */
        while (running) {
            switch (getPeerState()) {
            case LOOKING:
                LOG.info("LOOKING");
 
                if(Boolean.getBoolean("readonlymode.enabled")) {
                    LOG.info("Attemptingto start ReadOnlyZooKeeperServer");
 
                    // Create read-only serverbut don't start it immediately
                    final ReadOnlyZooKeeperServer roZk =
                        new ReadOnlyZooKeeperServer(logFactory, this, this.zkDb);
 
                    // Instead of starting roZkimmediately, wait some grace
                    // period before we decidewe're partitioned.
                    //
                    // Thread is used herebecause otherwise it would require
                    // changes in each ofelection strategy classes which is
                    // unnecessary codecoupling.
                    Thread roZkMgr = newThread() {
                        public void run() {
                            try {
                                // lower-boundgrace period to 2 secs
                               sleep(Math.max(2000, tickTime));
                                if(ServerState.LOOKING.equals(getPeerState())) {
                                   roZk.startup();
                                }
                            } catch(InterruptedException e) {
                               LOG.info("Interrupted while attempting to startReadOnlyZooKeeperServer, not started");
                            } catch (Exceptione) {
                               LOG.error("FAILED to start ReadOnlyZooKeeperServer", e);
                            }
                        }
                    };
                    try {
                        roZkMgr.start();
                        reconfigFlagClear();
                        if (shuttingDownLE) {
                            shuttingDownLE =false;
                           startLeaderElection();
                        }
                        // 进行选举,选举结束,返回最终成为Leader胜选的那张选票
                        setCurrentVote(makeLEStrategy().lookForLeader());
                    } catch (Exception e) {
                       LOG.warn("Unexpected exception", e);
                       setPeerState(ServerState.LOOKING);
                    } finally {
                        // If the thread is inthe the grace period, interrupt
                        // to come out ofwaiting.
                        roZkMgr.interrupt();
                        roZk.shutdown();
                    }
                } else {
                    try {
                       reconfigFlagClear();
                        if (shuttingDownLE) {
                           shuttingDownLE =false;
                          startLeaderElection();
                           }
                       setCurrentVote(makeLEStrategy().lookForLeader());
                    } catch (Exception e) {
                       LOG.warn("Unexpected exception", e);
                       setPeerState(ServerState.LOOKING);
                    }                       
                }
                break;
            case OBSERVING:
                try {
                   LOG.info("OBSERVING");
                    setObserver(makeObserver(logFactory));
                    observer.observeLeader();
                } catch (Exception e) {
                    LOG.warn("Unexpectedexception",e );
                } finally {
                    observer.shutdown();
                    setObserver(null); 
                   updateServerState();
                }
                break;
            case FOLLOWING:
                try {
                  LOG.info("FOLLOWING");
                    setFollower(makeFollower(logFactory));
                    follower.followLeader();
                } catch (Exception e) {
                   LOG.warn("Unexpectedexception",e);
                } finally {
                   follower.shutdown();
                   setFollower(null);
                   updateServerState();
                }
                break;
            case LEADING:
                LOG.info("LEADING");
                try {
                    setLeader(makeLeader(logFactory));
                    leader.lead();
                    setLeader(null);
                } catch (Exception e) {
                    LOG.warn("Unexpectedexception",e);
                } finally {
                    if (leader != null) {
                       leader.shutdown("Forcing shutdown");
                        setLeader(null);
                    }
                    updateServerState();
                }
                break;
            }
            start_fle =Time.currentElapsedTime();
        }
    } finally {
        ... ...
    }
}


2)ctrl+alt+b点击lookForLeader()的实现类FastLeaderElection.java


public Vote lookForLeader()throws InterruptedException {
    try {
        self.jmxLeaderElectionBean = newLeaderElectionBean();
        MBeanRegistry.getInstance().register(
                self.jmxLeaderElectionBean,self.jmxLocalPeerBean);
    } catch (Exception e) {
        LOG.warn("Failed to register withJMX", e);
        self.jmxLeaderElectionBean = null;
    }
    if (self.start_fle == 0) {
       self.start_fle =Time.currentElapsedTime();
    }
    try {
              // 正常启动中,所有其他服务器,都会给我发送一个投票
              // 保存每一个服务器的最新合法有效的投票
        HashMap<Long, Vote> recvset = new HashMap<Long, Vote>();
             
              // 存储合法选举之外的投票结果
        HashMap<Long, Vote> outofelection= new HashMap<Long, Vote>();
 
              // 一次选举的最大等待时间,默认值是0.2s
        int notTimeout = finalizeWait;
             
              // 每发起一轮选举,logicalclock++
              // 在没有合法的epoch数据之前,都使用逻辑时钟代替
              // 选举leader的规则:依次比较 epoch(任期)  zxid(事务id)  serverid(myid)  谁大谁当选leader
        synchronized(this){
                     // 更新逻辑时钟,每进行一次选举,都需要更新逻辑时钟
                     // logicalclock = epoch
            logicalclock.incrementAndGet();
                     // 更新选票(serverid,zxid, epoch),
            updateProposal(getInitId(), getInitLastLoggedZxid(),getPeerEpoch());
        }
 
        LOG.info("New election. My id=  " + self.getId() +
                ", proposed zxid=0x"+ Long.toHexString(proposedZxid));
              // 广播选票,把自己的选票发给其他服务器
        sendNotifications();
 
        /*
         * Loop in which we exchangenotifications until we find a leader
         */
              // 一轮一轮的选举直到选举成功
        while ((self.getPeerState() ==ServerState.LOOKING) &&
                (!stop)){
            ... ...
        }
        return null;
    } finally {
        ... ...
    }
}


3)点击sendNotifications,广播选票,把自己的选票发给其他服务器


private void sendNotifications() {
       // 遍历投票参与者,给每台服务器发送选票
    for (long sid :self.getCurrentAndNextConfigVoters()) {
        QuorumVerifier qv =self.getQuorumVerifier();
              // 创建发送选票
        ToSend notmsg = new ToSend(ToSend.mType.notification,
                proposedLeader,
                proposedZxid,
                logicalclock.get(),
                QuorumPeer.ServerState.LOOKING,
                sid,
                proposedEpoch,qv.toString().getBytes());
                           
        if(LOG.isDebugEnabled()){
            LOG.debug("SendingNotification: " + proposedLeader + " (n.leader), 0x"  +
                 Long.toHexString(proposedZxid) + " (n.zxid), 0x" +Long.toHexString(logicalclock.get())  +
                  " (n.round), " +sid + " (recipient), " + self.getId() +
                  " (myid), 0x" +Long.toHexString(proposedEpoch) + " (n.peerEpoch)");
        }
              // 把发送选票放入发送队列
        sendqueue.offer(notmsg);
    }
}


4)在FastLeaderElection.java类中查找WorkerSender线程。


class WorkerSender extends ZooKeeperThread {
    volatile boolean stop;
    QuorumCnxManager manager;
 
    WorkerSender(QuorumCnxManager manager){
        super("WorkerSender");
        this.stop = false;
        this.manager = manager;
    }
 
    public void run() {
        while (!stop) {
            try {
                            // 队列阻塞,时刻准备接收要发送的选票
                ToSend m = sendqueue.poll(3000,TimeUnit.MILLISECONDS);
                if(m == null) continue;
                            // 处理要发送的选票
                process(m);
            } catch (InterruptedException e) {
               break;
            }
        }
        LOG.info("WorkerSender isdown");
    }
 
    /**
     * Called by run() once there is a newmessage to send.
     * @param m     message to send
     */
    void process(ToSend m) {
        ByteBuffer requestBuffer =buildMsg(m.state.ordinal(),
                                           m.leader,
                                           m.zxid,
                                           m.electionEpoch,
                                            m.peerEpoch,
                                           m.configData);
              // 发送选票
        manager.toSend(m.sid, requestBuffer);
    }
}
 
 
public void toSend(Long sid,ByteBuffer b) {
    /*
     * If sending message to myself, thensimply enqueue it (loopback).
     */
       // 判断如果是发给自己的消息,直接进入自己的RecvQueue
    if (this.mySid == sid) {
         b.position(0);
         addToRecvQueue(new Message(b.duplicate(), sid));
        /*
         * Otherwise send to the correspondingthread to send.
         */
    } else {
         /*
          * Start a new connection if doesn'thave one already.
          */
               // 如果是发给其他服务器,创建对应的发送队列或者获取已经存在的发送队列
               // ,并把要发送的消息放入该队列
         ArrayBlockingQueue<ByteBuffer>bq = new ArrayBlockingQueue<ByteBuffer>(
            SEND_CAPACITY);
         ArrayBlockingQueue<ByteBuffer>oldq = queueSendMap.putIfAbsent(sid, bq);
         if (oldq != null) {
             addToSendQueue(oldq, b);
         } else {
             addToSendQueue(bq, b);
         }
               // 将选票发送出去
         connectOne(sid);
    }
}


5)如果数据是发送给自己的,添加到自己的接收队列


public void addToRecvQueue(Messagemsg) {
    synchronized(recvQLock) {
        if (recvQueue.remainingCapacity() == 0){
            try {
                recvQueue.remove();
            } catch (NoSuchElementException ne){
                // element could be removed bypoll()
                 LOG.debug("Trying toremove from an empty " +
                     "recvQueue. Ignoringexception " + ne);
            }
        }
        try {
                     // 将发送给自己的选票添加到recvQueue队列
            recvQueue.add(msg);
        } catch (IllegalStateException ie) {
            // This should never happen
            LOG.error("Unable to insert elementin the recvQueue " + ie);
        }
    }
}    
    

 

6)数据添加到发送队列


private void addToSendQueue(ArrayBlockingQueue<ByteBuffer>queue,
     ByteBuffer buffer) {
    if(queue.remainingCapacity() == 0) {
        try {
           queue.remove();
        } catch(NoSuchElementException ne) {
            //element could be removed by poll()
           LOG.debug("Trying to remove from an empty " +
                   "Queue. Ignoring exception " + ne);
        }
    }
    try {
              //将要发送的消息添加到发送队列
        queue.add(buffer);
    } catch(IllegalStateException ie) {
        // Thisshould never happen
       LOG.error("Unable to insert an element in the queue " + ie);
    }
}


7)与要发送的服务器节点建立通信连接


synchronized void connectOne(long sid){
    if(senderWorkerMap.get(sid) != null) {
       LOG.debug("There is a connection already for server " + sid);
        return;
    }
   synchronized (self.QV_LOCK) {
        boolean knownId = false;
        //Resolve hostname for the remote server before attempting to
        //connect in case the underlying ip address has changed.
       self.recreateSocketAddresses(sid);
       Map<Long, QuorumPeer.QuorumServer> lastCommittedView =self.getView();
       QuorumVerifier lastSeenQV = self.getLastSeenQuorumVerifier();
       Map<Long, QuorumPeer.QuorumServer> lastProposedView =lastSeenQV.getAllMembers();
             
        if(lastCommittedView.containsKey(sid)) {
           knownId = true;
            if(connectOne(sid,lastCommittedView.get(sid).electionAddr))
               return;
        }
        if(lastSeenQV != null && lastProposedView.containsKey(sid)
               && (!knownId || (lastProposedView.get(sid).electionAddr !=
               lastCommittedView.get(sid).electionAddr))) {
           knownId = true;
            if(connectOne(sid,lastProposedView.get(sid).electionAddr))
               return;
        }
        if(!knownId) {
           LOG.warn("Invalid server id: " + sid);
           return;
        }
    }
}
 
 
synchronized private boolean connectOne(long sid, InetSocketAddresselectionAddr){
    if(senderWorkerMap.get(sid) != null) {
       LOG.debug("There is a connection already for server " + sid);
        returntrue;
    }
 
    Socket sock= null;
    try {
       LOG.debug("Opening channel to server " + sid);
        if(self.isSslQuorum()) {
            SSLSocket sslSock = self.getX509Util().createSSLSocket();
            setSockOpts(sslSock);
            sslSock.connect(electionAddr, cnxTO);
            sslSock.startHandshake();
            sock = sslSock;
            LOG.info("SSL handshake complete with {} - {} - {}",sslSock.getRemoteSocketAddress(), sslSock.getSession().getProtocol(),sslSock.getSession().getCipherSuite());
         } else{
            sock = new Socket();
            setSockOpts(sock);
             sock.connect(electionAddr, cnxTO);
 
         }
        LOG.debug("Connected to server " + sid);
        //Sends connection request asynchronously if the quorum
        // saslauthentication is enabled. This is required because
        // saslserver authentication process may take few seconds to
        //finish, this may delay next peer connection requests.
        if(quorumSaslAuthEnabled) {
            initiateConnectionAsync(sock, sid);
        } else{
                     //处理连接
            initiateConnection(sock,sid);
        }
        returntrue;
    } catch(UnresolvedAddressException e) {
        ... ...
    }
}
 
public void initiateConnection(final Socket sock, final Long sid) {
    try {
        startConnection(sock,sid);
    } catch(IOException e) {
       LOG.error("Exception while connecting, id: {}, addr: {}, closinglearner connection",
               new Object[] { sid, sock.getRemoteSocketAddress() }, e);
       closeSocket(sock);
        return;
    }
}


8)创建并启动发送器线程和接收器线程


private boolean startConnection(Socket sock, Long sid)
        throws IOException {
   DataOutputStream dout = null;
   DataInputStream din = null;
    try {
        // UseBufferedOutputStream to reduce the number of IP packets. This is
        //important for x-DC scenarios.
              //通过输出流,向服务器发送数据
       BufferedOutputStream buf = new BufferedOutputStream(sock.getOutputStream());
        dout = newDataOutputStream(buf);
 
        //Sending id and challenge
        //represents protocol version (in other words - message type)
       dout.writeLong(PROTOCOL_VERSION);
       dout.writeLong(self.getId());
        String addr = formatInetAddr(self.getElectionAddress());
        byte[] addr_bytes = addr.getBytes();
       dout.writeInt(addr_bytes.length);
       dout.write(addr_bytes);
       dout.flush();
              //通过输入流读取对方发送过来的选票
        din = new DataInputStream(
                newBufferedInputStream(sock.getInputStream()));
    } catch(IOException e) {
       LOG.warn("Ignoring exception reading or writing challenge: ",e);
       closeSocket(sock);
        returnfalse;
    }
 
    // authenticatelearner
   QuorumPeer.QuorumServer qps = self.getVotingView().get(sid);
    if (qps !=null) {
        // TODO- investigate why reconfig makes qps null.
       authLearner.authenticate(sock, qps.hostname);
    }
 
    // If lostthe challenge, then drop the new connection
           //如果对方的id比我的大,我是没有资格给对方发送连接请求的,直接关闭自己的客户端
    if (sid > self.getId()) {
       LOG.info("Have smaller server identifier, so dropping the " +
               "connection: (" + sid + ", " + self.getId() +")");
       closeSocket(sock);
        //Otherwise proceed with the connection
    } else {
              //初始化,发送器和接收器
        SendWorker sw = new SendWorker(sock,sid);
        RecvWorker rw = new RecvWorker(sock,din, sid, sw);
       sw.setRecv(rw);
 
       SendWorker vsw = senderWorkerMap.get(sid);
 
        if(vsw!= null)
           vsw.finish();
             
       senderWorkerMap.put(sid, sw);
       queueSendMap.putIfAbsent(sid, new ArrayBlockingQueue<ByteBuffer>(
               SEND_CAPACITY));
                           
              //启动发送器线程和接收器线程
        sw.start();
        rw.start();
 
        return true;
    }
    return false;
}


9)点击SendWorker,并查找该类下的run方法

QuorumCnxManager.java

 


public void run() {
   thread Cnt.incrementAndGet();
    try {
        /**
         * If there is nothing in the queue to send, then we
         * send the lastMessage to ensure that the last message
         * was received by the peer. The message could be dropped
         * incase self or the peer shutdown their connection
         * (and exit the thread) prior toreading/processing
         * the last message. Duplicate messages are handled correctly
         * by the peer.
         *
         * If the send queue is non-empty, then we have a recent
         *message than that stored in lastMessage. To avoid sending
         *stale message, we should send the message in the send queue.
         */
       Array BlockingQueue<ByteBuffer> bq = queueSendMap.get(sid);
        if (bq== null || isSendQueueEmpty(bq)) {
          ByteBuffer b = lastMessageSent.get(sid);
           if(b != null) {
              LOG.debug("Attempting to send lastMessage to sid=" + sid);
              send(b);
           }
        }
    } catch(IOException e) {
        LOG.error("Failedto send last message. Shutting down thread.", e);
       this.finish();
    }
 
    try {
              //只要连接没有断开
        while(running && !shutdown && sock != null) {
 
           ByteBuffer b = null;
            try{
               ArrayBlockingQueue<ByteBuffer> bq = queueSendMap
                        .get(sid);
               if (bq != null) {
                                   //不断从发送队列SendQueue中,获取发送消息,并执行发送
                   b = pollSendQueue(bq,1000, TimeUnit.MILLISECONDS);
               } else {
                   LOG.error("No queue of incoming messages for " +
                              "server" + sid);
                   break;
               }
 
               if(b != null){
                                   //更新对于sid这台服务器的最近一条消息
                    lastMessageSent.put(sid, b);
                                  
                                   //执行发送
                   send(b);
               }
            }catch (InterruptedException e) {
               LOG.warn("Interrupted while waiting for message on queue",
                        e);
            }
        }
    } catch(Exception e) {
       LOG.warn("Exception when using channel: for id " + sid
                + " my id = " + QuorumCnxManager.this.mySid
                + " error = " + e);
    }
   this.finish();
   LOG.warn("Send worker leaving thread " + " id " +sid + " my id = " + self.getId());
}
 
synchronized void send(ByteBuffer b) throws IOException {
    byte[] msgBytes = new byte[b.capacity()];
    try {
       b.position(0);
       b.get(msgBytes);
    } catch(BufferUnderflowException be) {
       LOG.error("BufferUnderflowException ", be);
        return;
    }
   // 输出流向外发送
   dout.writeInt(b.capacity());
    dout.write(b.array());
   dout.flush();
}


10)点击RecvWorker,并查找该类下的run方法

QuorumCnxManager.java

 


public void run() {
   threadCnt.incrementAndGet();
    try {
              //只要连接没有断开
        while(running && !shutdown && sock != null) {
            /**
             *Reads the first int to determine the length of the
             *message
             */
            int length = din.readInt();
            if(length <= 0 || length > PACKETMAXSIZE) {
               throw new IOException(
                        "Received packetwith invalid packet: "
                               + length);
            }
            /**
             *Allocates a new ByteBuffer to receive the message
             */
           byte[] msgArray = new byte[length];
            //输入流接收消息
            din.readFully(msgArray, 0, length);
           ByteBuffer message = ByteBuffer.wrap(msgArray);
                     //接收对方发送过来的选票
            addToRecvQueue(newMessage(message.duplicate(), sid));
        }
    } catch(Exception e) {
       LOG.warn("Connection broken for id " + sid + ", my id ="
                + QuorumCnxManager.this.mySid + ", error = " , e);
    } finally {
       LOG.warn("Interrupting SendWorker");
       sw.finish();
       closeSocket(sock);
    }
}
 
public void addToRecvQueue(Message msg) {
   synchronized(recvQLock) {
        if(recvQueue.remainingCapacity() == 0) {
            try{
               recvQueue.remove();
            }catch (NoSuchElementException ne) {
               // element could be removed by poll()
                LOG.debug("Trying to remove from an empty " +
                    "recvQueue. Ignoring exception " + ne);
            }
        }
        try {
                     //将接收到的消息,放入接收消息队列
            recvQueue.add(msg);
        } catch(IllegalStateException ie) {
            //This should never happen
           LOG.error("Unable to insert element in the recvQueue " + ie);
        }
    }
}


11)在FastLeaderElection.java类中查找WorkerReceiver线程。


class WorkerReceiver extends ZooKeeperThread  {
    volatileboolean stop;
   QuorumCnxManager manager;
 
   WorkerReceiver(QuorumCnxManager manager) {
       super("WorkerReceiver");
       this.stop = false;
       this.manager = manager;
    }
 
    public void run() {
 
        Message response;
        while(!stop) {
            //Sleeps on receive
            try{
                            //从RecvQueue中取出选举投票消息(其他服务器发送过来的)
               response = manager.pollRecvQueue(3000, TimeUnit.MILLISECONDS);
               ... ...
            }catch (InterruptedException e) {
               LOG.warn("Interrupted Exception while waiting for new message"+
                        e.toString());
            }
        }
       LOG.info("WorkerReceiver is down");
    }
}
相关推荐
Elastic 中国社区官方博客23 分钟前
使用 Vertex AI Gemini 模型和 Elasticsearch Playground 快速创建 RAG 应用程序
大数据·人工智能·elasticsearch·搜索引擎·全文检索
极客先躯39 分钟前
高级java每日一道面试题-2024年10月3日-分布式篇-分布式系统中的容错策略都有哪些?
java·分布式·版本控制·共识算法·超时重试·心跳检测·容错策略
夜月行者1 小时前
如何使用ssm实现基于SSM的宠物服务平台的设计与实现+vue
java·后端·ssm
程序猿小D1 小时前
第二百六十七节 JPA教程 - JPA查询AND条件示例
java·开发语言·前端·数据库·windows·python·jpa
CHICX12291 小时前
【Hadoop】改一下core-site.xml和hdfs-site.xml配置就可以访问Web UI
xml·大数据·hadoop
潘多编程1 小时前
Java中的状态机实现:使用Spring State Machine管理复杂状态流转
java·开发语言·spring
_阿伟_2 小时前
SpringMVC
java·spring
代码在改了2 小时前
springboot厨房达人美食分享平台(源码+文档+调试+答疑)
java·spring boot
权^2 小时前
MySQL--聚合查询、联合查询、子查询、合并查询(上万字超详解!!!)
大数据·数据库·学习·mysql
猿java2 小时前
使用 Kafka面临的挑战
java·后端·kafka