3. HAService Startup Flow Explained

Starting from the Standalone (non-HA) and HA deployment modes, this article walks through HighAvailabilityServices, the unified interface Flink abstracts for its high-availability capabilities, and its startup flow. It also covers two closely related foundational interfaces: leader election (LeaderElection) and leader retrieval (LeaderRetrievalService).

1. Scope and Core Component Responsibilities

This article analyzes how the HA services are initialized during Flink cluster startup, and the responsibilities of the core components inside those services.

  • HighAvailabilityServices (interface: HA facade): the single entry point that provides leader election/retrieval for the master-process components, metadata persistence (JobGraph/JobResult/BlobStore), and cleanup. Typical implementations: StandaloneHaServices (NONE), ZooKeeperLeaderElectionHaServices (ZOOKEEPER), KubernetesLeaderElectionHaServices (KUBERNETES).
  • LeaderElectionService (interface: election backend/factory): creates a dedicated LeaderElection handle for a given componentId and binds the election logic to the corresponding HA backend. Typical implementation: DefaultLeaderElectionService (a generic implementation built on a LeaderElectionDriverFactory/driver). In Standalone mode, handles are usually not created through this interface; instead HighAvailabilityServices returns a StandaloneLeaderElection directly.
  • LeaderElection (interface: leader-election handle): the bridge (proxy) between LeaderElectionService and a LeaderContender, abstracting a uniform protocol of "start participating in the election, confirm leadership and publish the address, check leadership, close". Typical implementations: DefaultLeaderElection (driver-based) and StandaloneLeaderElection (degenerate).
  • LeaderRetrievalService (interface: leader retrieval): discovers the current leader and notifies listeners (LeaderRetrievalListener). Typical implementations: StandaloneLeaderRetrievalService (notifies right on start) and DefaultLeaderRetrievalService (driver-based; ZooKeeperLeaderRetrievalDriverFactory for ZK, KubernetesLeaderRetrievalDriverFactory for K8s).
  • LeaderContender (interface: leader callback): the business-component interface called back by LeaderElection, notifying components (e.g. Dispatcher/ResourceManager) that leadership has been granted or revoked. Typical implementations: DefaultDispatcherRunner, ResourceManagerServiceImpl, JobMasterServiceLeadershipRunner, WebMonitorEndpoint.
  • LeaderRetrievalListener (interface: leader-retrieval listener): the listener interface called back by LeaderRetrievalService; receives the leader's address and fencing token whenever the leader changes.
  • LeaderRetriever (implementation: leader-info cache/forwarder): implements LeaderRetrievalListener; translates each notifyLeaderAddress(address, leaderId) event into a CompletableFuture<Tuple2<String, UUID>> leaderFuture, using an AtomicReference to hold the "current round" leaderFuture (a leader change swaps in a new future).
  • LeaderGatewayRetriever (abstract implementation: leaderFuture → gatewayFuture): extends LeaderRetriever and implements GatewayRetriever<T>; maps the leaderFuture to a CompletableFuture<T>. On a leader change, notifyNewLeaderAddress(...) triggers createGateway(newLeaderFuture) to produce a new gatewayFuture; when the gatewayFuture completes exceptionally, getFuture() triggers a reconnect, using CAS to avoid concurrent duplicate creation.
  • RpcGatewayRetriever<F, T> (concrete implementation: connects to the leader via RpcService): extends LeaderGatewayRetriever<T>; inside createGateway(...) it establishes the RPC connection to the leader with rpcService.connect(leaderAddress, fencingToken, gatewayType) and retries failures via a RetryStrategy.
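The future-swapping behavior described for LeaderRetriever can be sketched in isolation. This is a minimal stand-in with simplified types (a plain String instead of Tuple2<String, UUID>, and no null-leader handling), not Flink's actual class:

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of the LeaderRetriever pattern: an AtomicReference holds the
// "current round" future, and each leader change swaps in a fresh one.
class LeaderFutureHolder {
    private final AtomicReference<CompletableFuture<String>> leaderFuture =
            new AtomicReference<>(new CompletableFuture<>());

    // Called by the retrieval service when a (new) leader is discovered.
    void notifyLeaderAddress(String address, UUID leaderId) {
        CompletableFuture<String> completed =
                CompletableFuture.completedFuture(address + " @ " + leaderId);
        // Swap in this round's future; forward the result to waiters that are
        // still parked on the previous round's future (no-op if already done).
        CompletableFuture<String> previous = leaderFuture.getAndSet(completed);
        previous.complete(address + " @ " + leaderId);
    }

    CompletableFuture<String> getLeaderFuture() {
        return leaderFuture.get();
    }
}
```

Callers always observe a future for the latest known leader, and early callers that grabbed the initial, not-yet-completed future are completed retroactively.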

2. Foundational Services (What HA Builds On)

The HA services are initialized early during Flink cluster startup (initializeServices), and they themselves depend on a few more basic services:

  • Configuration: supplies the HA mode (NONE, ZOOKEEPER, KUBERNETES) plus settings such as storage paths and ZK addresses.
  • RpcSystemUtils / RpcService: used in Standalone mode to assemble the RPC URLs of components such as ResourceManager/Dispatcher.
  • Executor (ioExecutor): the background thread pool for HA-internal async callbacks and I/O.
  • BlobStoreService: HA is tightly coupled with the BlobServer; the HA services create and expose a highly available BlobStore.

3. Startup Entry Point and Class Diagrams

The HA entry point is ClusterEntrypoint#initializeServices. Its core action: call createHaServices, which goes into HighAvailabilityServicesUtils.createHighAvailabilityServices(...), instantiates a concrete HighAvailabilityServices implementation based on the configuration, and later serves components such as Dispatcher/ResourceManager/WebMonitor with their "election handle + retrieval service".

3.1 Entry point: initializeServices → createHaServices

// flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/ClusterEntrypoint.java
protected HighAvailabilityServices createHaServices(
        Configuration configuration, Executor executor, RpcSystemUtils rpcSystemUtils)
        throws Exception {
    return HighAvailabilityServicesUtils.createHighAvailabilityServices(
            configuration,
            executor,
            AddressResolution.NO_ADDRESS_RESOLUTION,
            rpcSystemUtils,
            this);
}
  • Dependency flow: Configuration determines the HA mode and backend parameters, Executor runs I/O and callbacks, and RpcSystemUtils assembles the RPC URLs in the NONE (Standalone) mode.

3.2 UML: entry-point dependencies + HA services implementations

(Class diagram: ClusterEntrypoint depends on Configuration, Executor, RpcSystemUtils, RpcService, FatalErrorHandler, and BlobStoreService; through HighAvailabilityServicesUtils (createHaServices(), getRpcUrl(), createBlobStore()) it obtains an implementation of the HighAvailabilityServices interface: StandaloneHaServices, ZooKeeperLeaderElectionHaServices, or KubernetesLeaderElectionHaServices.)

3.3 UML: leader election/retrieval (interfaces + default implementations)

(Class diagram: LeaderElectionService.createLeaderElection(componentId) produces a LeaderElection, whose startLeaderElection(contender) registers a LeaderContender; implementations are DefaultLeaderElectionService, DefaultLeaderElection, and StandaloneLeaderElection. On the retrieval side, LeaderRetrievalService.start(listener) registers a LeaderRetrievalListener; implementations are DefaultLeaderRetrievalService and StandaloneLeaderRetrievalService.)

3.4 UML: leader address → gateway (retriever inheritance chain)

(Class diagram: LeaderRetrievalListener ← LeaderRetriever ← LeaderGatewayRetriever (which also implements GatewayRetriever) ← RpcGatewayRetriever.)

4. Interfaces and Implementations: Key Methods

This section supplements the class diagrams: it walks through the key responsibilities and main methods layer by layer (interface → implementation), so you know not only what the classes are called but also how their methods collaborate.

4.1 HighAvailabilityServices: the facade interface and its three implementations

Interface responsibilities

  • Externally: provides each master-process component's LeaderElection / LeaderRetrievalService, plus storage and cleanup for metadata such as JobGraph/Checkpoint/JobResult/BlobStore
  • Internally: hides the differences between the Standalone / ZooKeeper / Kubernetes backends

Typical implementations

  • NONE: StandaloneHaServices
  • ZOOKEEPER: ZooKeeperLeaderElectionHaServices (extends AbstractHaServices)
  • KUBERNETES: KubernetesLeaderElectionHaServices (extends AbstractHaServices)

Key methods (taking the ResourceManager as the example)

  • getResourceManagerLeaderElection(): returns the RM's LeaderElection
  • getResourceManagerLeaderRetriever(): returns the RM's LeaderRetrievalService
  • createBlobStore() / getJobResultStore() / getJobGraphStore() / getCheckpointRecoveryFactory(): metadata and BlobStore concerns
  • close() / cleanupAllData() / globalCleanupAsync(...): shutdown and cleanup

The key point of AbstractHaServices: it holds a single DefaultLeaderElectionService internally and uses createLeaderElection(componentId) to create a separate LeaderElection handle for each component:

// flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/AbstractHaServices.java
this.leaderElectionService = new DefaultLeaderElectionService(driverFactory);

@Override
public LeaderElection getResourceManagerLeaderElection() {
    return leaderElectionService.createLeaderElection(getLeaderPathForResourceManager());
}

4.2 LeaderElectionService / LeaderElection / LeaderContender: "handle + callback" on the election side

LeaderElectionService (factory / backend binding)

  • Key method: createLeaderElection(String componentId)
  • Typical implementation: DefaultLeaderElectionService (composes a LeaderElectionDriverFactory and pushes the concrete backend interaction down into the driver)

LeaderElection (per-component election handle)

  • Key method: startLeaderElection(LeaderContender contender)
  • Key method: confirmLeadershipAsync(UUID leaderSessionID, String leaderAddress)
  • Key method: hasLeadershipAsync(UUID leaderSessionId)
  • Key method: close()
  • Typical implementations: DefaultLeaderElection (driver-based), StandaloneLeaderElection (degenerate)

LeaderContender (business-component callback)

  • Key method: grantLeadership(UUID leaderSessionID)
  • Key method: revokeLeadership()
  • Typical implementations: ResourceManagerServiceImpl, DefaultDispatcherRunner, JobMasterServiceLeadershipRunner, WebMonitorEndpoint

The core of the Standalone "degenerate election": leadership is granted the instant a contender registers, and confirm is a no-op:

// flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/StandaloneLeaderElection.java
this.leaderContender = contender;
this.leaderContender.grantLeadership(sessionID);
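Stripped of Flink's types, the degenerate protocol fits in a few lines. This is a sketch with a made-up minimal Contender interface, not the real class:

```java
import java.util.UUID;

// Sketch of the Standalone degenerate election: registering a contender
// grants leadership immediately with a fixed session id.
class TinyStandaloneElection {
    interface Contender {
        void grantLeadership(UUID sessionId);
    }

    private final UUID sessionId;
    private Contender contender;

    TinyStandaloneElection(UUID sessionId) {
        this.sessionId = sessionId;
    }

    void startLeaderElection(Contender c) {
        if (contender != null) {
            throw new IllegalStateException("contender already registered");
        }
        contender = c;
        contender.grantLeadership(sessionId); // grant right on registration
    }
}
```

There is no backend, no session renewal, and no revocation path: with a single process there is nobody to lose leadership to.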

4.3 LeaderRetrievalService / LeaderRetrievalListener: the notification protocol on the retrieval side

LeaderRetrievalService (retrieval service)

  • Key method: start(LeaderRetrievalListener listener)
  • Key method: stop()
  • Typical implementations:
    • StandaloneLeaderRetrievalService: calls notifyLeaderAddress(...) right inside start
    • DefaultLeaderRetrievalService: driver-based; the driver is created by a LeaderRetrievalDriverFactory (ZK and K8s each ship their own factory)

LeaderRetrievalListener (listener protocol)

  • Key method: notifyLeaderAddress(@Nullable String leaderAddress, @Nullable UUID leaderSessionID)
  • Key method: handleError(Exception exception)
  • Typical implementation: LeaderRetriever (and its subclasses LeaderGatewayRetriever / RpcGatewayRetriever)

The key point of DefaultLeaderRetrievalService: the driver fetches the LeaderInformation from the backend, while the service checks whether it actually changed and only then calls back the listener:

// flink-runtime/src/main/java/org/apache/flink/runtime/leaderretrieval/DefaultLeaderRetrievalService.java
leaderListener.notifyLeaderAddress(newLeaderAddress, newLeaderSessionID);
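The change-detection half of that division of labor can be sketched like this (illustrative names; the real service additionally guards against notifications arriving after stop()):

```java
import java.util.Objects;
import java.util.UUID;
import java.util.function.BiConsumer;

// Sketch of "only notify the listener when leader info actually changed".
class ChangeFilteringNotifier {
    private String lastAddress;
    private UUID lastSessionId;
    private final BiConsumer<String, UUID> listener;

    ChangeFilteringNotifier(BiConsumer<String, UUID> listener) {
        this.listener = listener;
    }

    // Called by the driver whenever the backend reports leader information.
    void onLeaderInformation(String address, UUID sessionId) {
        if (Objects.equals(lastAddress, address)
                && Objects.equals(lastSessionId, sessionId)) {
            return; // unchanged: suppress the duplicate notification
        }
        lastAddress = address;
        lastSessionId = sessionId;
        listener.accept(address, sessionId);
    }
}
```

Filtering here keeps downstream components (e.g. gateway retrievers) from tearing down and rebuilding connections on redundant backend events.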

4.4 LeaderRetriever → LeaderGatewayRetriever → RpcGatewayRetriever: turning leader info into a usable gateway

LeaderRetriever (implementation: leader-info cache/forwarder)

  • implements: LeaderRetrievalListener
  • Key method: notifyLeaderAddress(...) (updates the leaderFuture and triggers the notifyNewLeaderAddress(...) hook)
  • Key methods: getLeaderFuture() / getLeaderNow()

LeaderGatewayRetriever (abstract implementation: leaderFuture → gatewayFuture)

  • extends: LeaderRetriever
  • implements: GatewayRetriever<T>
  • Key method: getFuture() (triggers a reconnect on failure, using CAS to avoid concurrent duplicate creation)
  • Key method: notifyNewLeaderAddress(...) (builds a new gatewayFuture when the leader changes and forwards the result to the old future)
  • Abstract method: createGateway(CompletableFuture<Tuple2<String, UUID>> leaderFuture)

RpcGatewayRetriever<F, T> (concrete implementation: rpcService.connect + retry)

  • extends: LeaderGatewayRetriever<T>
  • Key method: createGateway(...) (combines rpcService.connect(...) with a RetryStrategy to connect and retry)
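The retry composition behind createGateway(...) can be sketched independently of Flink. This is a simplified stand-in for the spirit of FutureUtils.retryWithDelay; the real utility schedules each retry with a delay on a ScheduledExecutor, which is omitted here:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Sketch: retry an async operation (e.g. an RPC connect) a bounded number
// of times, surfacing the result through a single outer future.
class RetryingConnector {
    static <T> CompletableFuture<T> retry(
            Supplier<CompletableFuture<T>> op, int maxRetries) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(op, maxRetries, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> op,
            int retriesLeft,
            CompletableFuture<T> result) {
        op.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (retriesLeft > 0) {
                attempt(op, retriesLeft - 1, result); // retry the failed attempt
            } else {
                result.completeExceptionally(error);
            }
        });
    }
}
```

The caller only ever holds the outer future, exactly as callers of RpcGatewayRetriever only hold the gatewayFuture while connects are retried underneath.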

5. Core Startup Flow: Source Walkthrough

Viewed along the startup chain, the HA flow splits into two phases:

  • initializeServices phase: create the HighAvailabilityServices (NONE/ZK/K8s)
  • Component startup phase: components obtain their LeaderElection/LeaderRetrievalService from HighAvailabilityServices and enter the uniform "election + retrieval" path

(Flowchart: ClusterEntrypoint.initializeServices → createHaServices → HighAvailabilityServicesUtils.createHighAvailabilityServices → branch on HighAvailabilityMode: NONE → StandaloneHaServices; ZOOKEEPER → ZooKeeperLeaderElectionHaServices; KUBERNETES → KubernetesLeaderElectionHaServices. Components then call get*LeaderElection/get*LeaderRetriever and enter startLeaderElection / start retrieval.)

5.1 HighAvailabilityServicesUtils: dispatching the creation logic

Based on the high-availability option in the configuration, it routes to the matching HA services implementation.

// flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/HighAvailabilityServicesUtils.java
public static HighAvailabilityServices createHighAvailabilityServices(
        Configuration configuration, Executor executor, AddressResolution addressResolution,
        RpcSystemUtils rpcSystemUtils, FatalErrorHandler fatalErrorHandler) throws Exception {

    HighAvailabilityMode highAvailabilityMode = HighAvailabilityMode.fromConfig(configuration);

    switch (highAvailabilityMode) {
        case NONE:
            // Using the JobManager's (host, port) as the base, assemble the
            // RPC URLs / web address of the master-process components
            final Tuple2<String, Integer> hostnamePort = getJobManagerAddress(configuration);
            final String resourceManagerRpcUrl =
                    rpcSystemUtils.getRpcUrl(
                            hostnamePort.f0,
                            hostnamePort.f1,
                            RpcServiceUtils.createWildcardName(ResourceManager.RESOURCE_MANAGER_NAME),
                            addressResolution,
                            configuration);
            final String dispatcherRpcUrl =
                    rpcSystemUtils.getRpcUrl(
                            hostnamePort.f0,
                            hostnamePort.f1,
                            RpcServiceUtils.createWildcardName(Dispatcher.DISPATCHER_NAME),
                            addressResolution,
                            configuration);
            final String webMonitorAddress = getWebMonitorAddress(configuration, addressResolution);

            return new StandaloneHaServices(resourceManagerRpcUrl, dispatcherRpcUrl, webMonitorAddress);

        case ZOOKEEPER:
            return createZooKeeperHaServices(configuration, executor, fatalErrorHandler);

        case KUBERNETES:
            return createCustomHAServices(
                    "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory",
                    configuration, executor);
            // ...
    }
}
  • Key point of the NONE (Standalone) branch: the ResourceManager/Dispatcher RPC URLs and the WebMonitor address are computed up front and packed into StandaloneHaServices; the StandaloneLeaderRetrievalService later hands this already-known address to its listener.
  • Key point of the ZOOKEEPER/KUBERNETES branches: create a real backend driver and delegate leader persistence and watching to the external coordination system.
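To make the NONE branch concrete, here is an illustrative sketch of deriving a deterministic component RPC URL from (host, port, endpoint name). The scheme and path layout below are assumptions for illustration only; the authoritative format is owned by the RPC system behind rpcSystemUtils.getRpcUrl(...), and the wildcard suffix is meant to echo what RpcServiceUtils.createWildcardName produces:

```java
// Illustrative only: the URL shape is an assumption, not Flink's guaranteed format.
class RpcUrlSketch {
    static String rpcUrl(String scheme, String host, int port, String endpointName) {
        // A wildcard endpoint name such as "resourcemanager_*" lets the
        // retriever address whichever concrete instance registered at startup.
        return scheme + "://flink@" + host + ":" + port + "/user/rpc/" + endpointName;
    }
}
```

The point is that in Standalone mode the leader address is a pure function of static configuration, which is why no coordination backend is needed.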

5.2 Standalone: wrapping addresses into leader retrieval (RM/Dispatcher/WebMonitor)

In the Standalone case, StandaloneHaServices wraps the addresses assembled in the previous step into each component's "election handle + retrieval service":

// flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/nonha/standalone/StandaloneHaServices.java
public class StandaloneHaServices extends AbstractNonHaServices {
    @Override
    public LeaderElection getResourceManagerLeaderElection() {
        return new StandaloneLeaderElection(DEFAULT_LEADER_ID);
    }

    @Override
    public LeaderRetrievalService getResourceManagerLeaderRetriever() {
        return new StandaloneLeaderRetrievalService(resourceManagerAddress, DEFAULT_LEADER_ID);
    }
}

5.3 Standalone: walking through "election + retrieval + gateway setup" with the ResourceManager as the example

(Sequence: StandaloneLeaderElection (from StandaloneHaServices) calls grantLeadership(DEFAULT_LEADER_ID) on ResourceManagerServiceImpl (the LeaderContender), which starts the ResourceManager; StandaloneLeaderRetrievalService (holding the RM RPC URL) calls notifyLeaderAddress(...) on the RpcGatewayRetriever (the LeaderRetrievalListener), which runs rpcService.connect(...) to obtain the ResourceManagerGateway.)

5.3.1 The component factory starts the ResourceManagerService and triggers the election
// flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/component/DefaultDispatcherResourceManagerComponentFactory.java
@Override
public DispatcherResourceManagerComponent create(
        Configuration configuration,
        ResourceID resourceId,
        Executor ioExecutor,
        RpcService rpcService,
        HighAvailabilityServices highAvailabilityServices,
        BlobServer blobServer,
        HeartbeatServices heartbeatServices,
        DelegationTokenManager delegationTokenManager,
        MetricRegistry metricRegistry,
        ExecutionGraphInfoStore executionGraphInfoStore,
        MetricQueryServiceRetriever metricQueryServiceRetriever,
        Collection<FailureEnricher> failureEnrichers,
        FatalErrorHandler fatalErrorHandler)
        throws Exception {
    LeaderRetrievalService dispatcherLeaderRetrievalService = null;
    LeaderRetrievalService resourceManagerRetrievalService = null;
    WebMonitorEndpoint<?> webMonitorEndpoint = null;
    ResourceManagerService resourceManagerService = null;
    DispatcherRunner dispatcherRunner = null;

    try {
        dispatcherLeaderRetrievalService =
                highAvailabilityServices.getDispatcherLeaderRetriever();
        resourceManagerRetrievalService =
                highAvailabilityServices.getResourceManagerLeaderRetriever();

        final LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever =
                new RpcGatewayRetriever<>(
                        rpcService,
                        DispatcherGateway.class,
                        DispatcherId::fromUuid,
                        new ExponentialBackoffRetryStrategy(
                                12, Duration.ofMillis(10), Duration.ofMillis(50)));

        final LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever =
                new RpcGatewayRetriever<>(
                        rpcService,
                        ResourceManagerGateway.class,
                        ResourceManagerId::fromUuid,
                        new ExponentialBackoffRetryStrategy(
                                12, Duration.ofMillis(10), Duration.ofMillis(50)));

        webMonitorEndpoint =
                restEndpointFactory.createRestEndpoint(
                        configuration,
                        dispatcherGatewayRetriever,
                        resourceManagerGatewayRetriever,
                        ...);

        ...

        resourceManagerService =
                ResourceManagerServiceImpl.create(
                        ...);

        dispatcherRunner =
                dispatcherRunnerFactory.createDispatcherRunner(
                        ...);

        final DispatcherOperationCaches dispatcherOperationCaches = ...;

        log.debug("Starting ResourceManagerService.");
        resourceManagerService.start();

        resourceManagerRetrievalService.start(resourceManagerGatewayRetriever);
        dispatcherLeaderRetrievalService.start(dispatcherGatewayRetriever);

        return new DispatcherResourceManagerComponent(
                dispatcherRunner,
                resourceManagerService,
                dispatcherLeaderRetrievalService,
                resourceManagerRetrievalService,
                webMonitorEndpoint,
                fatalErrorHandler,
                dispatcherOperationCaches);
    } catch (Exception exception) {
        ...
        throw new FlinkException("Could not create the DispatcherResourceManagerComponent.", exception);
    }
}
  • webMonitorEndpoint: createRestEndpoint(...) receives dispatcherGatewayRetriever and resourceManagerGatewayRetriever, which the web/REST layer uses to interact with the Dispatcher and ResourceManager (e.g. routing REST requests to the right gateway, or driving metric fetching).
// flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImpl.java
@Override
public void start() throws Exception {
    synchronized (lock) {
        if (running) {
            LOG.debug("Resource manager service has already started.");
            return;
        }
        running = true;
    }

    LOG.info("Starting resource manager service.");
    
    // This starts the leader election service.
    // In standalone mode there is no external HA backend, so the service is
    // granted leadership immediately and starts itself as the leader.
    leaderElection.startLeaderElection(this);
}
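The running flag above is a standard start-once guard. As a generic idiom (not the exact Flink class), it looks like:

```java
// Sketch of the start-once guard: a second start() call is a no-op.
class StartOnce {
    private final Object lock = new Object();
    private boolean running;
    private int starts;

    void start() {
        synchronized (lock) {
            if (running) {
                return; // already started: no-op, mirrors the early return above
            }
            running = true;
            starts++; // the real start work (e.g. startLeaderElection) goes here
        }
    }

    int startCount() {
        return starts;
    }
}
```

Guarding under the lock makes concurrent start() calls safe: exactly one caller performs the actual startup work.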
5.3.2 StandaloneLeaderElection: registration immediately grants leadership
// flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/StandaloneLeaderElection.java
@Override
public void startLeaderElection(LeaderContender contender) throws Exception {
    synchronized (lock) {
        Preconditions.checkState(
                leaderContender == null,
                "No LeaderContender should have been registered with this LeaderElection, yet.");
        this.leaderContender = contender;

        this.leaderContender.grantLeadership(sessionID);
    }
}
5.3.3 StandaloneLeaderRetrievalService: start immediately calls notifyLeaderAddress
// flink-runtime/src/main/java/org/apache/flink/runtime/leaderretrieval/StandaloneLeaderRetrievalService.java
@Override
public void start(LeaderRetrievalListener listener) {
    checkNotNull(listener, "Listener must not be null.");

    synchronized (startStopLock) {
        checkState(!started, "StandaloneLeaderRetrievalService can only be started once.");
        started = true;

        // directly notify the listener, because we already know the leading JobManager's
        // address
        listener.notifyLeaderAddress(leaderAddress, leaderId);
    }
}
5.3.4 Who the listener is: RpcGatewayRetriever (where LeaderRetriever → LeaderGatewayRetriever finally lands)
// flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/retriever/impl/RpcGatewayRetriever.java
@Override
protected CompletableFuture<T> createGateway(
        CompletableFuture<Tuple2<String, UUID>> leaderFuture) {
    return FutureUtils.retryWithDelay(
            () ->
                    leaderFuture.thenCompose(
                            (Tuple2<String, UUID> addressLeaderTuple) ->
                                    rpcService.connect(
                                            addressLeaderTuple.f0,
                                            fencingTokenMapper.apply(addressLeaderTuple.f1),
                                            gatewayType)),
            retryStrategy,
            rpcService.getScheduledExecutor());
}

6. Summary

  • HighAvailabilityServices is the HA facade: upper layers depend on it alone to obtain election handles, retrieval services, and metadata storage
  • ZK/K8s push coordination, persistence, and watching down to external systems via AbstractHaServices + drivers; Standalone preserves the same interface semantics with the degenerate "fixed leaderId + immediate callback" implementation
  • The LeaderRetriever → LeaderGatewayRetriever → RpcGatewayRetriever chain translates "leader change notifications" into "a usable RPC gateway future", and is the key bridge through which WebMonitor/REST/metrics components obtain their gateways