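WebMonitorEndpoint startup walkthrough (using the JM-side DispatcherRestEndpoint as the example)
This post answers one question: when the JobManager starts, how is the Web/REST layer (Web UI + REST API) created, bound to a port, wired up with routes, and made available to clients?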
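1. Topic and core component responsibilities (overview)
- org.apache.flink.runtime.webmonitor.WebMonitorEndpoint: the base class of the JM-side REST service. It assembles most of the REST handlers and, as a LeaderContender, takes part in the REST endpoint leader election and announces restBaseUrl to the outside.
- org.apache.flink.runtime.rest.RestServerEndpoint: the Netty-based REST server foundation. It handles Router/handler registration, port binding, and building restBaseUrl, and calls startInternal() as its last step.
- org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint: the WebMonitorEndpoint implementation for the Dispatcher case. It adds JobSubmitHandler (job submission) on top of the common handlers.
- org.apache.flink.runtime.rest.SessionRestEndpointFactory: the factory that creates the DispatcherRestEndpoint in session cluster mode.
- org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory: the place in the JobManager startup chain where the WebMonitorEndpoint is created and start() is called (before the Dispatcher and RM are started).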
2. Startup entry point: who creates and starts the WebMonitorEndpoint in the JM?
During the component assembly phase of JobManager startup, DefaultDispatcherResourceManagerComponentFactory creates the Dispatcher, the RM, and the WebMonitorEndpoint, and controls the startup order: the REST endpoint is started first (so that a usable restBaseUrl exists), and only then the ResourceManagerService and DispatcherRunner.
Path: <flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/component/DefaultDispatcherResourceManagerComponentFactory.java>
FQCN: org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory#create
```java
webMonitorEndpoint =
        restEndpointFactory.createRestEndpoint(
                configuration,
                dispatcherGatewayRetriever,
                resourceManagerGatewayRetriever,
                blobServer,
                executor,
                metricFetcher,
                highAvailabilityServices.getClusterRestEndpointLeaderElection(),
                fatalErrorHandler);

webMonitorEndpoint.start();
```
- dispatcherGatewayRetriever and resourceManagerGatewayRetriever are injected into the REST endpoint here: handlers use them to reach the RPC gateways of the Dispatcher and the RM.
- highAvailabilityServices.getClusterRestEndpointLeaderElection() is injected as well: the REST endpoint takes part in leader election and publishes restBaseUrl as the leader address (this matters most under HA).
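To make the role of the injected retrievers more concrete, here is a minimal JDK-only sketch of a handler that resolves its gateway per request instead of holding a fixed reference. The names GatewayRetrieverDemo, ClusterGateway, SimpleGatewayRetriever, and OverviewHandler are hypothetical stand-ins invented for this illustration; they are not Flink's classes, and the real handlers are considerably more involved.

```java
import java.util.concurrent.CompletableFuture;

final class GatewayRetrieverDemo {
    /** Hypothetical stand-in for an RPC gateway; not Flink's DispatcherGateway. */
    interface ClusterGateway {
        CompletableFuture<String> requestClusterOverview();
    }

    /** Hypothetical stand-in for a gateway retriever: resolves whoever is the leader right now. */
    interface SimpleGatewayRetriever<T> {
        CompletableFuture<T> getFuture();
    }

    /** The handler keeps the retriever, not a fixed gateway, so each request asks for the current leader. */
    static final class OverviewHandler {
        private final SimpleGatewayRetriever<ClusterGateway> retriever;

        OverviewHandler(SimpleGatewayRetriever<ClusterGateway> retriever) {
            this.retriever = retriever;
        }

        CompletableFuture<String> handleRequest() {
            // Resolve the gateway first, then forward the call to it asynchronously.
            return retriever.getFuture().thenCompose(ClusterGateway::requestClusterOverview);
        }
    }

    /** Wires a fake gateway and retriever together and returns the handler's response. */
    static String demo() {
        ClusterGateway gateway = () -> CompletableFuture.completedFuture("{\"taskmanagers\": 1}");
        SimpleGatewayRetriever<ClusterGateway> retriever =
                () -> CompletableFuture.completedFuture(gateway);
        return new OverviewHandler(retriever).handleRequest().join();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the handler only ever sees the retriever, a change of leader would be picked up on the next request without recreating the handler; that is the design idea this sketch tries to show.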
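3. Core class relationships (class diagram)
- WebMonitorEndpoint<T extends RestfulGateway> extends RestServerEndpoint
- DispatcherRestEndpoint extends WebMonitorEndpoint
- SessionRestEndpointFactory implements RestEndpointFactory<T>
- SessionRestEndpointFactory creates DispatcherRestEndpoint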
4. Creation phase: SessionRestEndpointFactory → DispatcherRestEndpoint → WebMonitorEndpoint
4.1 SessionRestEndpointFactory: creating the DispatcherRestEndpoint from configuration
Path: <flink-runtime/src/main/java/org/apache/flink/runtime/rest/SessionRestEndpointFactory.java>
FQCN: org.apache.flink.runtime.rest.SessionRestEndpointFactory#createRestEndpoint
```java
public WebMonitorEndpoint<DispatcherGateway> createRestEndpoint(
        Configuration configuration,
        LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever,
        LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElection leaderElection,
        FatalErrorHandler fatalErrorHandler)
        throws Exception {
    final RestHandlerConfiguration restHandlerConfiguration =
            RestHandlerConfiguration.fromConfiguration(configuration);

    return new DispatcherRestEndpoint(
            dispatcherGatewayRetriever,
            configuration,
            restHandlerConfiguration,
            resourceManagerGatewayRetriever,
            transientBlobService,
            executor,
            metricFetcher,
            leaderElection,
            RestEndpointFactory.createExecutionGraphCache(restHandlerConfiguration),
            fatalErrorHandler);
}
```
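- The key product of this step is RestHandlerConfiguration: it determines whether the Web UI is enabled, the handler timeout, refreshInterval, webUiDir, and similar settings.
- The concrete type returned is DispatcherRestEndpoint, the default JM-side Web/REST endpoint implementation.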
4.2 DispatcherRestEndpoint: adding job submission on top of the common handlers
Path: <flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/DispatcherRestEndpoint.java>
FQCN: org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint#DispatcherRestEndpoint
```java
public DispatcherRestEndpoint(
        GatewayRetriever<DispatcherGateway> leaderRetriever,
        Configuration clusterConfiguration,
        RestHandlerConfiguration restConfiguration,
        GatewayRetriever<ResourceManagerGateway> resourceManagerRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElection leaderElection,
        ExecutionGraphCache executionGraphCache,
        FatalErrorHandler fatalErrorHandler)
        throws IOException, ConfigurationException {
    super(
            leaderRetriever,
            clusterConfiguration,
            restConfiguration,
            resourceManagerRetriever,
            transientBlobService,
            executor,
            metricFetcher,
            leaderElection,
            executionGraphCache,
            fatalErrorHandler);
}
```
Path: <flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/DispatcherRestEndpoint.java>
FQCN: org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint#initializeHandlers
```java
protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(
        final CompletableFuture<String> localAddressFuture) {
    List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers =
            super.initializeHandlers(localAddressFuture);

    final Time timeout = restConfiguration.getTimeout();

    JobSubmitHandler jobSubmitHandler =
            new JobSubmitHandler(
                    leaderRetriever, timeout, responseHeaders, executor, clusterConfiguration);

    handlers.add(Tuple2.of(jobSubmitHandler.getMessageHeaders(), jobSubmitHandler));

    return handlers;
}
```
- super.initializeHandlers(...) first assembles all the common handlers (cluster, job, taskmanager, metrics, and so on).
- JobSubmitHandler is then appended: only the Dispatcher endpoint needs a job-submission handler.
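The "call super, then append" pattern described above can be sketched in isolation as follows. This is a simplified illustration rather than Flink code: HandlerListDemo, BaseEndpoint, and DispatcherEndpoint are hypothetical names, and each handler is reduced to a plain "METHOD /path" string with illustrative routes.

```java
import java.util.ArrayList;
import java.util.List;

final class HandlerListDemo {
    /** Plays the role of the base endpoint: assembles the handlers every endpoint needs. */
    static class BaseEndpoint {
        protected List<String> initializeHandlers() {
            List<String> handlers = new ArrayList<>();
            handlers.add("GET /overview");
            handlers.add("GET /jobs/:jobid");
            handlers.add("GET /taskmanagers");
            return handlers;
        }
    }

    /** Plays the role of the dispatcher endpoint: reuses the common list and adds one handler. */
    static class DispatcherEndpoint extends BaseEndpoint {
        @Override
        protected List<String> initializeHandlers() {
            // The base class returns a mutable list, so the subclass can extend it in place.
            List<String> handlers = super.initializeHandlers();
            handlers.add("POST /jobs");
            return handlers;
        }
    }

    public static void main(String[] args) {
        new DispatcherEndpoint().initializeHandlers().forEach(System.out::println);
    }
}
```

The pattern only works if the base method hands back a mutable list; returning an unmodifiable one would make the subclass's add call fail at runtime.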
4.3 WebMonitorEndpoint: home of the common handlers
Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>
FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint
```java
public class WebMonitorEndpoint<T extends RestfulGateway> extends RestServerEndpoint
        implements LeaderContender, JsonArchivist {
```
Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>
FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#WebMonitorEndpoint
```java
public WebMonitorEndpoint(
        GatewayRetriever<? extends T> leaderRetriever,
        Configuration clusterConfiguration,
        RestHandlerConfiguration restConfiguration,
        GatewayRetriever<ResourceManagerGateway> resourceManagerRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElection leaderElection,
        ExecutionGraphCache executionGraphCache,
        FatalErrorHandler fatalErrorHandler)
        throws IOException, ConfigurationException {
    super(clusterConfiguration);
    this.leaderRetriever = Preconditions.checkNotNull(leaderRetriever);
    this.clusterConfiguration = Preconditions.checkNotNull(clusterConfiguration);
    this.restConfiguration = Preconditions.checkNotNull(restConfiguration);
    this.resourceManagerRetriever = Preconditions.checkNotNull(resourceManagerRetriever);
    this.transientBlobService = Preconditions.checkNotNull(transientBlobService);
    this.executor = Preconditions.checkNotNull(executor);

    this.executionGraphCache = executionGraphCache;
    this.metricFetcher = metricFetcher;

    this.leaderElection = Preconditions.checkNotNull(leaderElection);
    this.fatalErrorHandler = Preconditions.checkNotNull(fatalErrorHandler);
}
```
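- The dependencies fall roughly into four groups: gateway retrieval (Dispatcher/RM), REST configuration, the asynchronous executor, and supporting capabilities (metrics fetching, blob upload/download, leader election, ExecutionGraph caching). Of these, only metricFetcher and executionGraphCache are accepted without a null check in the constructor above.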
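5. Startup phase: what does RestServerEndpoint.start() do?
DispatcherRestEndpoint does not override start() (the method is declared final in RestServerEndpoint). The actual startup logic lives in the parent class RestServerEndpoint.start(): prepare the router → register handlers → bind the port with Netty → build restBaseUrl → call back into startInternal().
Path: <flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestServerEndpoint.java>
FQCN: org.apache.flink.runtime.rest.RestServerEndpoint#start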
```java
public final void start() throws Exception {
    synchronized (lock) {
        Preconditions.checkState(
                state == State.CREATED, "The RestServerEndpoint cannot be restarted.");

        final Router router = new Router();
        final CompletableFuture<String> restAddressFuture = new CompletableFuture<>();

        handlers = initializeHandlers(restAddressFuture);
        Collections.sort(handlers, RestHandlerUrlComparator.INSTANCE);
        checkAllEndpointsAndHandlersAreUnique(handlers);
        handlers.forEach(handler -> registerHandler(router, handler, log));

        MultipartRoutes multipartRoutes = createMultipartRoutes(handlers);

        ChannelInitializer<SocketChannel> initializer =
                new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) throws ConfigurationException {
                        RouterHandler handler = new RouterHandler(router, responseHeaders);

                        if (isHttpsEnabled()) {
                            ch.pipeline()
                                    .addLast(
                                            "ssl",
                                            new RedirectingSslHandler(
                                                    restAddress,
                                                    restAddressFuture,
                                                    sslHandlerFactory));
                        }

                        ch.pipeline()
                                .addLast(new HttpServerCodec())
                                .addLast(new FileUploadHandler(uploadDir, multipartRoutes))
                                .addLast(
                                        new FlinkHttpObjectAggregator(
                                                maxContentLength, responseHeaders));

                        for (InboundChannelHandlerFactory factory :
                                inboundChannelHandlerFactories) {
                            Optional<ChannelHandler> channelHandler =
                                    factory.createHandler(configuration, responseHeaders);
                            if (channelHandler.isPresent()) {
                                ch.pipeline().addLast(channelHandler.get());
                            }
                        }

                        ch.pipeline()
                                .addLast(new ChunkedWriteHandler())
                                .addLast(handler.getName(), handler)
                                .addLast(new PipelineErrorHandler(log, responseHeaders));
                    }
                };

        bootstrap = new ServerBootstrap();
        bootstrap
                .group(bossGroup, workerGroup)
                .channel(NioServerSocketChannel.class)
                .childHandler(initializer);

        // Bind the port (iterate over the configured port range until one succeeds)
        // ...

        restBaseUrl = new URL(determineProtocol(), advertisedAddress, port, "").toString();
        restAddressFuture.complete(restBaseUrl);

        state = State.RUNNING;
        startInternal();
    }
}
```
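The excerpt elides the port-binding loop, so the following is only a sketch of the general idea under stated assumptions: walk a list of candidate ports, take the first one that can be bound, and build the base URL from it with the same java.net.URL constructor the excerpt uses. PortRangeBindDemo, firstBindablePort, baseUrl, and isFree are hypothetical names, and the ports 8081 to 8083 are arbitrary. The probe here opens and immediately closes a ServerSocket, whereas a real server would keep its channel bound.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.ServerSocket;
import java.net.URL;
import java.util.List;
import java.util.OptionalInt;
import java.util.function.IntPredicate;

final class PortRangeBindDemo {
    /** Returns the first candidate port for which tryBind succeeds, or empty if none does. */
    static OptionalInt firstBindablePort(List<Integer> candidatePorts, IntPredicate tryBind) {
        for (int port : candidatePorts) {
            if (tryBind.test(port)) {
                return OptionalInt.of(port);
            }
        }
        return OptionalInt.empty();
    }

    /** Builds a base URL from protocol, host, and port with an empty path, as in the excerpt. */
    static String baseUrl(String protocol, String advertisedAddress, int port) {
        try {
            return new URL(protocol, advertisedAddress, port, "").toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot build base URL", e);
        }
    }

    /** Probe only: checks whether a local port is free right now, then releases it again. */
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        OptionalInt port = firstBindablePort(List.of(8081, 8082, 8083), PortRangeBindDemo::isFree);
        System.out.println(
                port.isPresent()
                        ? baseUrl("http", "localhost", port.getAsInt())
                        : "no free port in range");
    }
}
```

Passing the bind attempt in as an IntPredicate keeps the iteration logic testable without touching the network.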
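- initializeHandlers(restAddressFuture): delegates to the subclass (here WebMonitorEndpoint/DispatcherRestEndpoint), which returns a list of (specification, handler) pairs.
- RestHandlerUrlComparator: sorts the handlers by URL pattern so that more specific paths are matched first.
- registerHandler(router, handler, log): registers each handler's URL pattern with the Router.
- Netty pipeline: (SSL handler, only when HTTPS is enabled) → HTTP codec → upload/aggregator → (pluggable inbound handlers) → chunked write → router handler → error handler.
- restBaseUrl creation and publication: once binding completes, the URL is assembled and restAddressFuture is completed; handlers and leader election use it later.
- startInternal() runs last: it is the hook for subclasses to start the endpoint's own auxiliary services (for example leader election and scheduled tasks).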
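To see why the sort order matters, consider a router that picks the first matching pattern: if /jobs/:jobid were registered before /jobs/overview, a request for /jobs/overview would be captured by the parameterized route. The comparator below is a simplified stand-in written for this illustration, not the logic of Flink's RestHandlerUrlComparator; it assumes path parameters are written with a leading colon and simply ranks literal segments ahead of parameter segments.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class UrlOrderingDemo {
    /** Orders URL patterns segment by segment; a literal segment sorts before a ":param" segment. */
    static final Comparator<String> LITERAL_BEFORE_PARAM =
            (a, b) -> {
                String[] left = a.split("/");
                String[] right = b.split("/");
                int common = Math.min(left.length, right.length);
                for (int i = 0; i < common; i++) {
                    boolean leftParam = left[i].startsWith(":");
                    boolean rightParam = right[i].startsWith(":");
                    if (leftParam != rightParam) {
                        return leftParam ? 1 : -1; // the literal segment wins
                    }
                    int cmp = left[i].compareTo(right[i]);
                    if (cmp != 0) {
                        return cmp;
                    }
                }
                // One pattern is a prefix of the other: the shorter one comes first.
                return Integer.compare(left.length, right.length);
            };

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<String> sortPatterns(List<String> patterns) {
        List<String> sorted = new ArrayList<>(patterns);
        sorted.sort(LITERAL_BEFORE_PARAM);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortPatterns(List.of("/jobs/:jobid", "/jobs/overview", "/jobs")));
    }
}
```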
6. WebMonitorEndpoint.startInternal(): why does the Web endpoint also elect a leader?
Once started, the Web endpoint takes part in a separate leader election: clusterRestEndpointLeaderElection. The goal is not to decide who handles HTTP requests (the port is already bound in this JVM), but to publish, in HA setups, the address of the current REST endpoint leader (that is, restBaseUrl).
Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>
FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#startInternal
```java
public void startInternal() throws Exception {
    leaderElection.startLeaderElection(this);
    startExecutionGraphCacheCleanupTask();

    if (hasWebUI) {
        log.info("Web frontend listening at {}.", getRestBaseUrl());
    }
}
```
Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>
FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#grantLeadership
```java
public void grantLeadership(final UUID leaderSessionID) {
    leaderElection.confirmLeadershipAsync(leaderSessionID, getRestBaseUrl());
}
```
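- startLeaderElection(this): registers the WebMonitorEndpoint itself as the LeaderContender.
- In the grantLeadership(...) callback, getRestBaseUrl() is confirmed and published as the leader address, so other components (or clients) can discover where the current REST leader is.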
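The register, grant, confirm handshake can be modelled in a few lines. The sketch below assumes a single-process, in-memory election that grants leadership immediately; RestLeaderDemo, Contender, InMemoryElection, and Endpoint are hypothetical simplifications and do not mirror the signatures of Flink's LeaderElection or LeaderContender.

```java
import java.util.Optional;
import java.util.UUID;

final class RestLeaderDemo {
    /** Hypothetical, heavily simplified contender callback. */
    interface Contender {
        void grantLeadership(UUID leaderSessionId);
    }

    /** Hypothetical in-memory election: grants leadership at once and records the confirmed address. */
    static final class InMemoryElection {
        private UUID currentSession;
        private String publishedAddress;

        void startLeaderElection(Contender contender) {
            currentSession = UUID.randomUUID();
            contender.grantLeadership(currentSession);
        }

        void confirmLeadership(UUID leaderSessionId, String leaderAddress) {
            // Ignore confirmations that belong to an outdated session.
            if (leaderSessionId.equals(currentSession)) {
                publishedAddress = leaderAddress;
            }
        }

        Optional<String> publishedAddress() {
            return Optional.ofNullable(publishedAddress);
        }
    }

    /** Plays the role of the REST endpoint: contends for leadership and confirms with its own URL. */
    static final class Endpoint implements Contender {
        private final InMemoryElection election;
        private final String restBaseUrl;

        Endpoint(InMemoryElection election, String restBaseUrl) {
            this.election = election;
            this.restBaseUrl = restBaseUrl;
        }

        void startInternal() {
            election.startLeaderElection(this);
        }

        @Override
        public void grantLeadership(UUID leaderSessionId) {
            election.confirmLeadership(leaderSessionId, restBaseUrl);
        }
    }

    public static void main(String[] args) {
        InMemoryElection election = new InMemoryElection();
        new Endpoint(election, "http://localhost:8081").startInternal();
        System.out.println(election.publishedAddress().orElse("no leader yet"));
    }
}
```

Nothing is published until the contender confirms, which is why the endpoint must already know its restBaseUrl (that is, must already be bound) before the election starts.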
7. Putting the startup order together (where WebMonitorEndpoint sits in the JM startup chain)
DefaultDispatcherResourceManagerComponentFactory.create
→ SessionRestEndpointFactory.createRestEndpoint
→ new DispatcherRestEndpoint (a WebMonitorEndpoint)
→ RestServerEndpoint.start()
→ initializeHandlers(): register the REST handlers
→ Netty binds the port → restBaseUrl
→ WebMonitorEndpoint.startInternal() → leaderElection.startLeaderElection
→ grantLeadership → confirmLeadershipAsync(restBaseUrl)
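8. Back to the topic (wrap-up)
- WebMonitorEndpoint startup has two layers: RestServerEndpoint.start() handles Netty, routing, and the port; WebMonitorEndpoint.startInternal() handles leader election and background tasks.
- Session mode creates a DispatcherRestEndpoint, which adds Dispatcher-specific capabilities such as job submission on top of the common handlers.
- In the JM startup chain the REST endpoint is started first, so that the RM and Dispatcher started afterwards can obtain a stable restBaseUrl to advertise and to reference one another.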