An Analysis of the HAService Startup Flow
Starting from the Standalone (non-HA) and HA modes, this article walks through HighAvailabilityServices, the unified interface Flink abstracts for its "high availability capabilities", and its startup flow. It also covers two closely related foundational interfaces: leader election (LeaderElection) and leader retrieval (LeaderRetrievalService).
1. Topic and Core Component Responsibilities
This article analyzes how the HA services are initialized during Flink cluster startup, and the responsibilities of the core components inside the HA services.
- HighAvailabilityServices (interface: HA facade): the single entry point through which the main-process components obtain leader election/retrieval, metadata persistence (JobGraph/JobResult/BlobStore), and cleanup capabilities. Typical implementations: StandaloneHaServices (NONE), ZooKeeperLeaderElectionHaServices (ZOOKEEPER), KubernetesLeaderElectionHaServices (KUBERNETES).
- LeaderElectionService (interface: election backend/factory): creates a dedicated LeaderElection handle for a given componentId and binds the election logic to the corresponding HA backend. Typical implementation: DefaultLeaderElectionService (a generic implementation built on a LeaderElectionDriverFactory/driver). In Standalone mode, handles are usually not created through this interface; HighAvailabilityServices returns a StandaloneLeaderElection directly.
- LeaderElection (interface: leader-election handle): the bridge (proxy) between LeaderElectionService and LeaderContender, abstracting the uniform protocol of "start participating in the election, confirm leadership and publish the address, check leadership, close". Typical implementations: DefaultLeaderElection (driver-based), StandaloneLeaderElection (degenerate).
- LeaderRetrievalService (interface: leader retrieval): discovers the current leader and notifies listeners (LeaderRetrievalListener). Typical implementations: StandaloneLeaderRetrievalService (notifies on start), DefaultLeaderRetrievalService (driver-based; ZooKeeperLeaderRetrievalDriverFactory for ZK, KubernetesLeaderRetrievalDriverFactory for K8s).
- LeaderContender (interface: leadership callback): the business-component interface called back by LeaderElection to notify components (such as the Dispatcher/ResourceManager) that leadership has been granted or revoked. Typical implementations: DefaultDispatcherRunner, ResourceManagerServiceImpl, JobMasterServiceLeadershipRunner, WebMonitorEndpoint.
- LeaderRetrievalListener (interface: retrieval listener): called back by LeaderRetrievalService with the leader's address and fencing token whenever the leader changes.
- LeaderRetriever (implementation: leader-info cache/forwarder): implements LeaderRetrievalListener; translates notifyLeaderAddress(address, leaderId) events into a CompletableFuture<Tuple2<String, UUID>> leaderFuture, and keeps "the current round's leaderFuture" in an AtomicReference (switching to a new future when the leader changes).
- LeaderGatewayRetriever (abstract implementation: leaderFuture → gatewayFuture): extends LeaderRetriever and implements GatewayRetriever<T>; maps leaderFuture to a CompletableFuture<T>. On a leader change, notifyNewLeaderAddress(...) triggers createGateway(newLeaderFuture) to produce a new gatewayFuture; when the gatewayFuture completes exceptionally, getFuture() triggers a reconnect, using CAS to avoid concurrent duplicate attempts.
- RpcGatewayRetriever<F, T> (concrete implementation: connects to the leader via RpcService): extends LeaderGatewayRetriever<T>; inside createGateway(...) it calls rpcService.connect(leaderAddress, fencingToken, gatewayType) to establish the RPC connection to the leader, retrying on failure via a RetryStrategy.
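To make the LeaderRetriever idea concrete, here is a minimal, self-contained sketch of keeping "the current round's leader future" in an AtomicReference and swapping it on a leader change. All Mini* names are assumptions for illustration, not Flink's actual classes:

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the LeaderRetriever idea: translate leader notifications into
// a future, and start a new "round" (a fresh future) on every leader change.
class MiniLeaderRetriever {
    static final class LeaderInfo {
        final String address;
        final UUID sessionId;
        LeaderInfo(String address, UUID sessionId) {
            this.address = address;
            this.sessionId = sessionId;
        }
    }

    private final AtomicReference<CompletableFuture<LeaderInfo>> atomicLeaderFuture =
            new AtomicReference<>(new CompletableFuture<>());

    /** Called by the retrieval service when a (new) leader is announced. */
    void notifyLeaderAddress(String address, UUID sessionId) {
        LeaderInfo info = new LeaderInfo(address, sessionId);
        // Install the new round first; new callers of getLeaderFuture() see it.
        CompletableFuture<LeaderInfo> oldFuture =
                atomicLeaderFuture.getAndSet(CompletableFuture.completedFuture(info));
        if (!oldFuture.isDone()) {
            // Wake up anyone still waiting on the previous round.
            oldFuture.complete(info);
        }
    }

    CompletableFuture<LeaderInfo> getLeaderFuture() {
        return atomicLeaderFuture.get();
    }
}
```

The real class additionally exposes a notifyNewLeaderAddress(...) hook (used by LeaderGatewayRetriever below) and error handling, but the AtomicReference-of-futures pattern is the core.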
2. Foundational Service Components (the groundwork HA depends on)
The HA services are initialized early in Flink cluster startup (initializeServices), and they in turn depend on a few more basic services:
- Configuration: supplies the HA mode (NONE, ZOOKEEPER, KUBERNETES) as well as storage paths, ZK addresses, and other settings.
- RpcSystemUtils / RpcService: used in Standalone mode to assemble the RPC URLs of components such as the ResourceManager/Dispatcher.
- Executor (ioExecutor): a background thread pool for the HA services' asynchronous callbacks and I/O operations.
- BlobStoreService: HA is closely tied to the BlobServer; the HA services are responsible for creating and exposing a highly available BlobStore.
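As a concrete example, a typical ZooKeeper HA configuration might look like the fragment below. Key names are quoted from memory of recent Flink releases (older versions use `high-availability` instead of `high-availability.type`); verify against the docs for your version:

```yaml
# flink-conf.yaml -- ZooKeeper HA (illustrative values)
high-availability.type: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink/ha
```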
3. Startup Entry Point and Class-Diagram Analysis
The entry point for the HA services is ClusterEntrypoint#initializeServices. Its core action: call createHaServices, which goes into HighAvailabilityServicesUtils.createHighAvailabilityServices(...), instantiates the concrete HighAvailabilityServices implementation according to the configuration, and later serves components such as the Dispatcher/ResourceManager/WebMonitor with their "election handle + retrieval service".
3.1 Startup entry point: initializeServices → createHaServices
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/ClusterEntrypoint.java
protected HighAvailabilityServices createHaServices(
        Configuration configuration, Executor executor, RpcSystemUtils rpcSystemUtils)
        throws Exception {
    return HighAvailabilityServicesUtils.createHighAvailabilityServices(
            configuration,
            executor,
            AddressResolution.NO_ADDRESS_RESOLUTION,
            rpcSystemUtils,
            this);
}
```
- Dependency flow: Configuration determines the HA mode and backend parameters; Executor runs I/O and callbacks; RpcSystemUtils assembles RPC URLs in the NONE (Standalone) case.
3.2 UML: entry-point dependencies + HA services implementations
(Class diagram: ClusterEntrypoint#createHaServices() calls HighAvailabilityServicesUtils (getRpcUrl(...), createBlobStore()), which depends on Configuration, Executor, RpcSystemUtils, RpcService, FatalErrorHandler, and BlobStoreService, and produces an implementation of the HighAvailabilityServices interface: StandaloneHaServices, ZooKeeperLeaderElectionHaServices, or KubernetesLeaderElectionHaServices.)
3.3 UML: leader election/retrieval (interfaces + default implementations)
(Class diagram: LeaderElectionService#createLeaderElection(componentId) produces a LeaderElection, whose startLeaderElection(contender) binds a LeaderContender; implementations are DefaultLeaderElectionService, DefaultLeaderElection, and StandaloneLeaderElection. On the retrieval side, LeaderRetrievalService#start(listener) binds a LeaderRetrievalListener; implementations are DefaultLeaderRetrievalService and StandaloneLeaderRetrievalService.)
3.4 UML: leader address → Gateway (the Retriever inheritance chain)
(Class diagram: LeaderRetriever implements LeaderRetrievalListener; LeaderGatewayRetriever extends LeaderRetriever and implements GatewayRetriever; RpcGatewayRetriever extends LeaderGatewayRetriever.)
4. Interfaces and Implementations: Key Methods
This section complements the class diagrams: it walks through the key responsibilities and methods layer by layer, from interface to implementation, so that you know not just what the classes are called but how their methods cooperate.
4.1 HighAvailabilityServices: the facade interface and its three implementations
Interface responsibilities
- Outward: provides the main-process components with LeaderElection/LeaderRetrievalService, plus metadata storage and cleanup for JobGraph/Checkpoint/JobResult/BlobStore
- Inward: hides the differences between the Standalone / ZooKeeper / Kubernetes backends
Typical implementations
- NONE: StandaloneHaServices
- ZOOKEEPER: ZooKeeperLeaderElectionHaServices (extends AbstractHaServices)
- KUBERNETES: KubernetesLeaderElectionHaServices (extends AbstractHaServices)
Key methods (taking the ResourceManager as an example)
- getResourceManagerLeaderElection(): returns the LeaderElection for the RM
- getResourceManagerLeaderRetriever(): returns the LeaderRetrievalService for the RM
- createBlobStore() / getJobResultStore() / getJobGraphStore() / getCheckpointRecoveryFactory(): metadata / BlobStore access
- close() / cleanupAllData() / globalCleanupAsync(...): shutdown and cleanup
The key point of AbstractHaServices: it holds a single DefaultLeaderElectionService internally, and creates a dedicated LeaderElection handle per component via createLeaderElection(componentId):
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/AbstractHaServices.java
this.leaderElectionService = new DefaultLeaderElectionService(driverFactory);

@Override
public LeaderElection getResourceManagerLeaderElection() {
    return leaderElectionService.createLeaderElection(getLeaderPathForResourceManager());
}
```
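The "one election service, many component handles" pattern above can be sketched with a toy model. All Mini* names are assumptions, and the fan-out to all registered contenders is a simplification of what the real driver/session machinery does:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Toy model: one election service owns the backend session and hands out one
// lightweight handle per componentId (resourcemanager, dispatcher, ...).
class MiniElectionService {
    interface Contender {
        void grantLeadership(UUID sessionId);
    }

    private final Map<String, Contender> contenders = new ConcurrentHashMap<>();

    /** One handle per component; all handles share this service's backend session. */
    MiniLeaderElectionHandle createLeaderElection(String componentId) {
        return new MiniLeaderElectionHandle(this, componentId);
    }

    void register(String componentId, Contender contender) {
        contenders.put(componentId, contender);
    }

    /** Called when the shared backend session acquires leadership: fan out to components. */
    void onBackendLeadershipAcquired(UUID sessionId) {
        contenders.values().forEach(c -> c.grantLeadership(sessionId));
    }
}

class MiniLeaderElectionHandle {
    private final MiniElectionService service;
    private final String componentId;

    MiniLeaderElectionHandle(MiniElectionService service, String componentId) {
        this.service = service;
        this.componentId = componentId;
    }

    void startLeaderElection(MiniElectionService.Contender contender) {
        service.register(componentId, contender);
    }
}
```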
4.2 LeaderElectionService / LeaderElection / LeaderContender: "handle + callback" on the election side
LeaderElectionService (factory / backend binding)
- Key method: createLeaderElection(String componentId)
- Typical implementation: DefaultLeaderElectionService (composes a LeaderElectionDriverFactory, delegating the concrete backend interaction to the driver)
LeaderElection (the per-component election handle)
- Key methods: startLeaderElection(LeaderContender contender), confirmLeadershipAsync(UUID leaderSessionID, String leaderAddress), hasLeadershipAsync(UUID leaderSessionId), close()
- Typical implementations: DefaultLeaderElection (driver-based), StandaloneLeaderElection (degenerate)
LeaderContender (the business-component callback)
- Key methods: grantLeadership(UUID leaderSessionID), revokeLeadership()
- Typical implementations: ResourceManagerServiceImpl, DefaultDispatcherRunner, JobMasterServiceLeadershipRunner, WebMonitorEndpoint
The core of the Standalone "degenerate election" is that leadership is granted the instant the contender registers, and confirm is a no-op:
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/StandaloneLeaderElection.java
this.leaderContender = contender;
this.leaderContender.grantLeadership(sessionID);
```
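The whole degenerate election fits in a few lines. A runnable toy version that mirrors the snippet above (Mini* names are assumptions, not Flink classes):

```java
import java.util.UUID;

// Toy model of the Standalone "election": there is no backend and no rival,
// so registering a contender immediately grants it leadership with a fixed id.
interface MiniContender {
    void grantLeadership(UUID leaderSessionId);
}

class MiniStandaloneLeaderElection {
    static final UUID DEFAULT_LEADER_ID = new UUID(0L, 0L);

    private final Object lock = new Object();
    private MiniContender contender;

    void startLeaderElection(MiniContender contender) {
        synchronized (lock) {
            if (this.contender != null) {
                throw new IllegalStateException("A contender is already registered.");
            }
            this.contender = contender;
            // Standalone: nothing to coordinate -- grant immediately.
            contender.grantLeadership(DEFAULT_LEADER_ID);
        }
    }

    /** Confirming leadership is a no-op: there is no backend to publish to. */
    void confirmLeadership(UUID leaderSessionId, String leaderAddress) {
        // intentionally empty
    }
}
```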
4.3 LeaderRetrievalService / LeaderRetrievalListener: the notification protocol on the retrieval side
LeaderRetrievalService (retrieval service)
- Key methods: start(LeaderRetrievalListener listener), stop()
- Typical implementations: StandaloneLeaderRetrievalService (calls notifyLeaderAddress(...) on start); DefaultLeaderRetrievalService (driver-based; the driver is created by a LeaderRetrievalDriverFactory, with separate factories for ZK and K8s)
LeaderRetrievalListener (the listener protocol)
- Key methods: notifyLeaderAddress(@Nullable String leaderAddress, @Nullable UUID leaderSessionID), handleError(Exception exception)
- Typical implementation: LeaderRetriever (and its subclasses LeaderGatewayRetriever / RpcGatewayRetriever)
The key point of DefaultLeaderRetrievalService: the driver is responsible for fetching LeaderInformation from the backend, while the service checks whether it actually changed before calling back the listener:
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/leaderretrieval/DefaultLeaderRetrievalService.java
leaderListener.notifyLeaderAddress(newLeaderAddress, newLeaderSessionID);
```
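That division of labor can be sketched as follows (Mini* names are assumptions for illustration): the driver pushes whatever it observes in the backend, and the service deduplicates before notifying the listener:

```java
import java.util.Objects;
import java.util.UUID;

// Toy model of "compare before notify": the backend driver may report the
// same leader repeatedly; the service forwards only actual changes.
class MiniRetrievalService {
    interface Listener {
        void notifyLeaderAddress(String leaderAddress, UUID leaderSessionId);
    }

    private final Listener listener;
    private String lastAddress;
    private UUID lastSessionId;

    MiniRetrievalService(Listener listener) {
        this.listener = listener;
    }

    /** Called by the backend driver (a ZK/K8s watcher in the real service). */
    void onLeaderInformationChange(String newAddress, UUID newSessionId) {
        if (!Objects.equals(lastAddress, newAddress)
                || !Objects.equals(lastSessionId, newSessionId)) {
            lastAddress = newAddress;
            lastSessionId = newSessionId;
            listener.notifyLeaderAddress(newAddress, newSessionId);
        }
    }
}
```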
4.4 LeaderRetriever → LeaderGatewayRetriever → RpcGatewayRetriever: turning leader info into a usable Gateway
LeaderRetriever (implementation: leader-info cache/forwarder)
- implements: LeaderRetrievalListener
- Key methods: notifyLeaderAddress(...) (updates leaderFuture and fires the notifyNewLeaderAddress(...) hook), getLeaderFuture() / getLeaderNow()
LeaderGatewayRetriever (abstract implementation: leaderFuture → gatewayFuture)
- extends: LeaderRetriever; implements: GatewayRetriever<T>
- Key methods: getFuture() (triggers a reconnect on failure, with CAS to avoid duplicate concurrent attempts), notifyNewLeaderAddress(...) (creates a new gatewayFuture on a leader change and forwards the result to the old future)
- Abstract method: createGateway(CompletableFuture<Tuple2<String, UUID>> leaderFuture)
RpcGatewayRetriever<F, T> (concrete implementation: rpcService.connect + retry)
- extends: LeaderGatewayRetriever<T>
- Key method: createGateway(...) (combines rpcService.connect(...) with a RetryStrategy for connection and retry)
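The CAS-based reconnect in getFuture() can be sketched like this (MiniGatewayRetriever is an assumed name; a plain String stands in for the gateway):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Toy model of getFuture(): if the cached gateway future failed, atomically
// replace it with a fresh connection attempt. The CAS guarantees that, of
// several concurrent callers, only one installs the retry.
class MiniGatewayRetriever {
    private final AtomicReference<CompletableFuture<String>> atomicGatewayFuture;
    private final Supplier<CompletableFuture<String>> connect;

    MiniGatewayRetriever(Supplier<CompletableFuture<String>> connect) {
        this.connect = connect;
        this.atomicGatewayFuture = new AtomicReference<>(connect.get());
    }

    CompletableFuture<String> getFuture() {
        CompletableFuture<String> current = atomicGatewayFuture.get();
        if (current.isCompletedExceptionally()) {
            CompletableFuture<String> fresh = connect.get();
            // Only the caller that wins the CAS installs the new attempt.
            if (atomicGatewayFuture.compareAndSet(current, fresh)) {
                return fresh;
            }
            return atomicGatewayFuture.get();
        }
        return current;
    }
}
```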
5. Core Startup Flow: Source Walkthrough
Viewed along the startup chain, the core HA flow splits into two phases:
- The initializeServices phase: create the HighAvailabilityServices (NONE/ZK/K8s)
- The core-component startup phase: components obtain LeaderElection / LeaderRetrievalService from the HighAvailabilityServices and enter the unified "election + retrieval" path
(Flow diagram: ClusterEntrypoint.initializeServices → createHaServices → HighAvailabilityServicesUtils.createHighAvailabilityServices, which switches on HighAvailabilityMode: NONE → StandaloneHaServices, ZOOKEEPER → ZooKeeperLeaderElectionHaServices, KUBERNETES → KubernetesLeaderElectionHaServices. Components then call get*LeaderElection / get*LeaderRetriever and start leader election / retrieval.)
5.1 HighAvailabilityServicesUtils: dispatching the creation logic
Based on the high-availability setting in the configuration, the request is routed to the matching HA service implementation.
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/HighAvailabilityServicesUtils.java
public static HighAvailabilityServices createHighAvailabilityServices(
        Configuration configuration, Executor executor, AddressResolution addressResolution,
        RpcSystemUtils rpcSystemUtils, FatalErrorHandler fatalErrorHandler) throws Exception {
    HighAvailabilityMode highAvailabilityMode = HighAvailabilityMode.fromConfig(configuration);
    switch (highAvailabilityMode) {
        case NONE:
            // Starting from the JobManager's (host, port), assemble the RPC URLs /
            // web address of each main-process component
            final Tuple2<String, Integer> hostnamePort = getJobManagerAddress(configuration);
            final String resourceManagerRpcUrl =
                    rpcSystemUtils.getRpcUrl(
                            hostnamePort.f0,
                            hostnamePort.f1,
                            RpcServiceUtils.createWildcardName(ResourceManager.RESOURCE_MANAGER_NAME),
                            addressResolution,
                            configuration);
            final String dispatcherRpcUrl =
                    rpcSystemUtils.getRpcUrl(
                            hostnamePort.f0,
                            hostnamePort.f1,
                            RpcServiceUtils.createWildcardName(Dispatcher.DISPATCHER_NAME),
                            addressResolution,
                            configuration);
            final String webMonitorAddress = getWebMonitorAddress(configuration, addressResolution);
            return new StandaloneHaServices(resourceManagerRpcUrl, dispatcherRpcUrl, webMonitorAddress);
        case ZOOKEEPER:
            return createZooKeeperHaServices(configuration, executor, fatalErrorHandler);
        case KUBERNETES:
            return createCustomHAServices(
                    "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory",
                    configuration, executor);
        // ...
    }
}
```
- Key point of the NONE (Standalone) branch: the RPC URLs of the ResourceManager/Dispatcher and the WebMonitor address are computed up front and packed into StandaloneHaServices; later, StandaloneLeaderRetrievalService simply pushes this already-known address to its listener.
- Key point of the ZOOKEEPER/KUBERNETES branches: a real backend driver is created, delegating leader-info persistence and watching to the external coordination system.
5.2 Standalone: wrapping addresses into leader retrieval (RM/Dispatcher/WebMonitor)
In the Standalone case, StandaloneHaServices wraps the addresses assembled in the previous step into each component's "election handle + retrieval service":
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/nonha/standalone/StandaloneHaServices.java
public class StandaloneHaServices extends AbstractNonHaServices {

    @Override
    public LeaderElection getResourceManagerLeaderElection() {
        return new StandaloneLeaderElection(DEFAULT_LEADER_ID);
    }

    @Override
    public LeaderRetrievalService getResourceManagerLeaderRetriever() {
        return new StandaloneLeaderRetrievalService(resourceManagerAddress, DEFAULT_LEADER_ID);
    }
}
```
5.3 Standalone: walking through "election + retrieval + gateway setup" with the ResourceManager
(Sequence diagram: StandaloneHaServices hands a StandaloneLeaderElection to ResourceManagerServiceImpl (the LeaderContender), which receives grantLeadership(DEFAULT_LEADER_ID) and starts the ResourceManager; StandaloneLeaderRetrievalService (holding the RM RPC URL) calls notifyLeaderAddress(...) on the RpcGatewayRetriever (the LeaderRetrievalListener), which runs rpcService.connect(...) to obtain a ResourceManagerGateway.)
5.3.1 The component factory starts the ResourceManagerService and triggers the election
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/component/DefaultDispatcherResourceManagerComponentFactory.java
@Override
public DispatcherResourceManagerComponent create(
        Configuration configuration,
        ResourceID resourceId,
        Executor ioExecutor,
        RpcService rpcService,
        HighAvailabilityServices highAvailabilityServices,
        BlobServer blobServer,
        HeartbeatServices heartbeatServices,
        DelegationTokenManager delegationTokenManager,
        MetricRegistry metricRegistry,
        ExecutionGraphInfoStore executionGraphInfoStore,
        MetricQueryServiceRetriever metricQueryServiceRetriever,
        Collection<FailureEnricher> failureEnrichers,
        FatalErrorHandler fatalErrorHandler)
        throws Exception {

    LeaderRetrievalService dispatcherLeaderRetrievalService = null;
    LeaderRetrievalService resourceManagerRetrievalService = null;
    WebMonitorEndpoint<?> webMonitorEndpoint = null;
    ResourceManagerService resourceManagerService = null;
    DispatcherRunner dispatcherRunner = null;

    try {
        dispatcherLeaderRetrievalService =
                highAvailabilityServices.getDispatcherLeaderRetriever();
        resourceManagerRetrievalService =
                highAvailabilityServices.getResourceManagerLeaderRetriever();

        final LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever =
                new RpcGatewayRetriever<>(
                        rpcService,
                        DispatcherGateway.class,
                        DispatcherId::fromUuid,
                        new ExponentialBackoffRetryStrategy(
                                12, Duration.ofMillis(10), Duration.ofMillis(50)));
        final LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever =
                new RpcGatewayRetriever<>(
                        rpcService,
                        ResourceManagerGateway.class,
                        ResourceManagerId::fromUuid,
                        new ExponentialBackoffRetryStrategy(
                                12, Duration.ofMillis(10), Duration.ofMillis(50)));

        webMonitorEndpoint =
                restEndpointFactory.createRestEndpoint(
                        configuration,
                        dispatcherGatewayRetriever,
                        resourceManagerGatewayRetriever,
                        ...);
        ...
        resourceManagerService =
                ResourceManagerServiceImpl.create(
                        ...);
        dispatcherRunner =
                dispatcherRunnerFactory.createDispatcherRunner(
                        ...);
        final DispatcherOperationCaches dispatcherOperationCaches = ...;

        log.debug("Starting ResourceManagerService.");
        resourceManagerService.start();

        resourceManagerRetrievalService.start(resourceManagerGatewayRetriever);
        dispatcherLeaderRetrievalService.start(dispatcherGatewayRetriever);

        return new DispatcherResourceManagerComponent(
                dispatcherRunner,
                resourceManagerService,
                dispatcherLeaderRetrievalService,
                resourceManagerRetrievalService,
                webMonitorEndpoint,
                fatalErrorHandler,
                dispatcherOperationCaches);
    } catch (Exception exception) {
        ...
        throw new FlinkException("Could not create the DispatcherResourceManagerComponent.", exception);
    }
}
```
webMonitorEndpoint is handed dispatcherGatewayRetriever and resourceManagerGatewayRetriever in createRestEndpoint(...); the web/REST layer uses them to communicate with the Dispatcher and ResourceManager (for example, routing REST requests to the right gateway, or driving metric fetching).
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImpl.java
@Override
public void start() throws Exception {
    synchronized (lock) {
        if (running) {
            LOG.debug("Resource manager service has already started.");
            return;
        }
        running = true;
    }
    LOG.info("Starting resource manager service.");
    // This starts the leader election service.
    // In standalone mode there is no real HA backend, so the service
    // simply makes itself leader and starts directly.
    leaderElection.startLeaderElection(this);
}
```
5.3.2 StandaloneLeaderElection: registration immediately grants leadership
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/StandaloneLeaderElection.java
@Override
public void startLeaderElection(LeaderContender contender) throws Exception {
    synchronized (lock) {
        Preconditions.checkState(
                leaderContender == null,
                "No LeaderContender should have been registered with this LeaderElection, yet.");
        this.leaderContender = contender;
        this.leaderContender.grantLeadership(sessionID);
    }
}
```
5.3.3 StandaloneLeaderRetrievalService: notifyLeaderAddress on start
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/leaderretrieval/StandaloneLeaderRetrievalService.java
@Override
public void start(LeaderRetrievalListener listener) {
    checkNotNull(listener, "Listener must not be null.");

    synchronized (startStopLock) {
        checkState(!started, "StandaloneLeaderRetrievalService can only be started once.");
        started = true;

        // directly notify the listener, because we already know the leading JobManager's
        // address
        listener.notifyLeaderAddress(leaderAddress, leaderId);
    }
}
```
5.3.4 Who the listener is: RpcGatewayRetriever (the final link of LeaderRetriever → LeaderGatewayRetriever)
```java
// flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/retriever/impl/RpcGatewayRetriever.java
@Override
protected CompletableFuture<T> createGateway(
        CompletableFuture<Tuple2<String, UUID>> leaderFuture) {
    return FutureUtils.retryWithDelay(
            () ->
                    leaderFuture.thenCompose(
                            (Tuple2<String, UUID> addressLeaderTuple) ->
                                    rpcService.connect(
                                            addressLeaderTuple.f0,
                                            fencingTokenMapper.apply(addressLeaderTuple.f1),
                                            gatewayType)),
            retryStrategy,
            rpcService.getScheduledExecutor());
}
```
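The retry composition used here can be sketched without Flink as follows. MiniRetry is an assumed name; Flink's FutureUtils.retryWithDelay additionally schedules each retry with a delay on the RpcService's scheduled executor, which this sketch omits:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Toy model of async retry: run an operation that yields a future; on failure,
// re-run it until it succeeds or the retry budget is exhausted.
class MiniRetry {
    static <T> CompletableFuture<T> retry(Supplier<CompletableFuture<T>> op, int maxRetries) {
        CompletableFuture<T> result = new CompletableFuture<>();
        runAttempt(op, maxRetries, result);
        return result;
    }

    private static <T> void runAttempt(
            Supplier<CompletableFuture<T>> op, int retriesLeft, CompletableFuture<T> result) {
        op.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (retriesLeft > 0) {
                runAttempt(op, retriesLeft - 1, result); // retry on failure
            } else {
                result.completeExceptionally(error);
            }
        });
    }
}
```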
6. Summary
- HighAvailabilityServices is the HA facade: upper layers depend only on it to obtain election handles, retrieval services, and metadata storage
- ZK/K8s push coordination, persistence, and watching down to external systems via AbstractHaServices + driver; Standalone preserves the same interface semantics with a degenerate "fixed leaderId + immediate callback" implementation
- The LeaderRetriever → LeaderGatewayRetriever → RpcGatewayRetriever inheritance chain further translates "leader notifications" into "a usable RPC Gateway future", and is the key bridge through which WebMonitor/REST/metric components obtain gateways