8.WebMonitorEndpoint解析.md (WebMonitorEndpoint Analysis)

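WebMonitorEndpoint Startup Flow Walkthrough (Using the JM-Side DispatcherRestEndpoint as the Example)

This article answers a single question: when the JobManager starts, how is the Web/REST layer (Web UI + REST API) created, bound to a port, given its routes, and made available to clients?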

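1. Topic and Core Component Responsibilities (Overview)

  • org.apache.flink.runtime.webmonitor.WebMonitorEndpoint: the common base class of the JM-side REST service. It assembles most of the REST handlers and, as a LeaderContender, takes part in the leader election for the REST endpoint so that restBaseUrl can be announced externally.
  • org.apache.flink.runtime.rest.RestServerEndpoint: the Netty-based foundation of the REST server. It registers the Router and handlers, binds the port, builds restBaseUrl, and finally calls back into startInternal().
  • org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint: the WebMonitorEndpoint implementation for the Dispatcher scenario. On top of the common handlers it adds JobSubmitHandler (job submission).
  • org.apache.flink.runtime.rest.SessionRestEndpointFactory: the factory that creates DispatcherRestEndpoint in session cluster mode.
  • org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory: the place in the JobManager startup path where the WebMonitorEndpoint is created and started via start(), before the Dispatcher and RM are started.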

2. Startup Entry Point: Who Creates and Starts WebMonitorEndpoint in the JM?

During the component assembly phase of JobManager startup, DefaultDispatcherResourceManagerComponentFactory creates the Dispatcher, the RM, and the WebMonitorEndpoint, and controls their startup order: the REST endpoint is started first (so that a usable restBaseUrl is available), and only then are ResourceManagerService and DispatcherRunner started.

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/component/DefaultDispatcherResourceManagerComponentFactory.java>

FQCN: org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory#create

Java:
webMonitorEndpoint =
        restEndpointFactory.createRestEndpoint(
                configuration,
                dispatcherGatewayRetriever,
                resourceManagerGatewayRetriever,
                blobServer,
                executor,
                metricFetcher,
                highAvailabilityServices.getClusterRestEndpointLeaderElection(),
                fatalErrorHandler);

webMonitorEndpoint.start();
  • dispatcherGatewayRetriever and resourceManagerGatewayRetriever are injected into the REST endpoint here: handlers use them to reach the RPC gateways of the Dispatcher and the RM.
  • highAvailabilityServices.getClusterRestEndpointLeaderElection() is injected as well: the REST endpoint takes part in leader election and publishes restBaseUrl as the leader address, which matters most in HA setups.

3. Core Class Relationships (Class Diagram)

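  • WebMonitorEndpoint<T extends RestfulGateway> extends RestServerEndpoint
  • DispatcherRestEndpoint extends WebMonitorEndpoint<T extends RestfulGateway>
  • SessionRestEndpointFactory implements RestEndpointFactory<T>
  • SessionRestEndpointFactory creates DispatcherRestEndpoint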

4. Creation Phase: SessionRestEndpointFactory → DispatcherRestEndpoint → WebMonitorEndpoint

4.1 SessionRestEndpointFactory: Creating DispatcherRestEndpoint from the Configuration

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/rest/SessionRestEndpointFactory.java>

FQCN: org.apache.flink.runtime.rest.SessionRestEndpointFactory#createRestEndpoint

Java:
public WebMonitorEndpoint<DispatcherGateway> createRestEndpoint(
        Configuration configuration,
        LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever,
        LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElection leaderElection,
        FatalErrorHandler fatalErrorHandler)
        throws Exception {
    final RestHandlerConfiguration restHandlerConfiguration =
            RestHandlerConfiguration.fromConfiguration(configuration);

    return new DispatcherRestEndpoint(
            dispatcherGatewayRetriever,
            configuration,
            restHandlerConfiguration,
            resourceManagerGatewayRetriever,
            transientBlobService,
            executor,
            metricFetcher,
            leaderElection,
            RestEndpointFactory.createExecutionGraphCache(restHandlerConfiguration),
            fatalErrorHandler);
}
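  • The key product of this step is RestHandlerConfiguration: it determines whether the Web UI is enabled, as well as the handlers' timeout, refreshInterval, webUiDir, and similar settings.
  • The concrete return type is DispatcherRestEndpoint: the default implementation of the JM-side Web/REST endpoint.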

4.2 DispatcherRestEndpoint: Adding "Job Submission" on Top of the Common Handlers

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/DispatcherRestEndpoint.java>

FQCN: org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint#DispatcherRestEndpoint

Java:
public DispatcherRestEndpoint(
        GatewayRetriever<DispatcherGateway> leaderRetriever,
        Configuration clusterConfiguration,
        RestHandlerConfiguration restConfiguration,
        GatewayRetriever<ResourceManagerGateway> resourceManagerRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElection leaderElection,
        ExecutionGraphCache executionGraphCache,
        FatalErrorHandler fatalErrorHandler)
        throws IOException, ConfigurationException {
    super(
            leaderRetriever,
            clusterConfiguration,
            restConfiguration,
            resourceManagerRetriever,
            transientBlobService,
            executor,
            metricFetcher,
            leaderElection,
            executionGraphCache,
            fatalErrorHandler);
}

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/DispatcherRestEndpoint.java>

FQCN: org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint#initializeHandlers

Java:
protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(
        final CompletableFuture<String> localAddressFuture) {
    List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers =
            super.initializeHandlers(localAddressFuture);

    final Time timeout = restConfiguration.getTimeout();

    JobSubmitHandler jobSubmitHandler =
            new JobSubmitHandler(
                    leaderRetriever, timeout, responseHeaders, executor, clusterConfiguration);

    handlers.add(Tuple2.of(jobSubmitHandler.getMessageHeaders(), jobSubmitHandler));
    return handlers;
}
  • super.initializeHandlers(...) first assembles all the common handlers (cluster, job, taskmanager, metrics, and so on).
  • JobSubmitHandler is then appended: only the Dispatcher endpoint needs a handler for submitting jobs.
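The "call super, then append" pattern above can be sketched in isolation. This is a simplified illustration, not Flink code: BaseEndpointSketch, SubmitEndpointSketch, and HandlerExtensionDemo are hypothetical names, and plain strings stand in for the (RestHandlerSpecification, ChannelInboundHandler) pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base endpoint: contributes the handlers every endpoint shares.
class BaseEndpointSketch {
    protected List<String> initializeHandlers() {
        List<String> handlers = new ArrayList<>();
        handlers.add("GET /overview");
        handlers.add("GET /jobs");
        handlers.add("GET /taskmanagers");
        return handlers;
    }
}

// Hypothetical dispatcher-specific endpoint: reuses the base list, appends one entry.
class SubmitEndpointSketch extends BaseEndpointSketch {
    @Override
    protected List<String> initializeHandlers() {
        List<String> handlers = super.initializeHandlers();
        handlers.add("POST /jobs"); // stands in for the job submission handler
        return handlers;
    }
}

final class HandlerExtensionDemo {
    static List<String> dispatcherHandlers() {
        return new SubmitEndpointSketch().initializeHandlers();
    }

    public static void main(String[] args) {
        System.out.println(dispatcherHandlers());
    }
}
```

Because the subclass only appends to the list returned by the base class, any handler added to the base class later is picked up by every subclass without further changes.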

4.3 WebMonitorEndpoint: Home of the Common Handlers

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>

FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint

Java:
public class WebMonitorEndpoint<T extends RestfulGateway> extends RestServerEndpoint
        implements LeaderContender, JsonArchivist {

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>

FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#WebMonitorEndpoint

Java:
public WebMonitorEndpoint(
        GatewayRetriever<? extends T> leaderRetriever,
        Configuration clusterConfiguration,
        RestHandlerConfiguration restConfiguration,
        GatewayRetriever<ResourceManagerGateway> resourceManagerRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElection leaderElection,
        ExecutionGraphCache executionGraphCache,
        FatalErrorHandler fatalErrorHandler)
        throws IOException, ConfigurationException {
    super(clusterConfiguration);
    this.leaderRetriever = Preconditions.checkNotNull(leaderRetriever);
    this.clusterConfiguration = Preconditions.checkNotNull(clusterConfiguration);
    this.restConfiguration = Preconditions.checkNotNull(restConfiguration);
    this.resourceManagerRetriever = Preconditions.checkNotNull(resourceManagerRetriever);
    this.transientBlobService = Preconditions.checkNotNull(transientBlobService);
    this.executor = Preconditions.checkNotNull(executor);
    this.executionGraphCache = executionGraphCache;
    this.metricFetcher = metricFetcher;
    this.leaderElection = Preconditions.checkNotNull(leaderElection);
    this.fatalErrorHandler = Preconditions.checkNotNull(fatalErrorHandler);
}
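  • The dependencies fall roughly into four groups: gateway retrieval (Dispatcher/RM), REST configuration, the asynchronous executor, and supporting services (metrics fetching, blob upload/download, leader election, ExecutionGraph caching). Of the last group, only metricFetcher and executionGraphCache are assigned without a checkNotNull in this constructor.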

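5. Startup Phase: What Does RestServerEndpoint.start() Do?

DispatcherRestEndpoint does not override start(), and cannot, because the method is declared final. The real startup logic lives in the parent class method RestServerEndpoint.start(): prepare the router → register handlers → bind the port with Netty → build restBaseUrl → call back into startInternal().

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestServerEndpoint.java>

FQCN: org.apache.flink.runtime.rest.RestServerEndpoint#start

Java: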
public final void start() throws Exception {
    synchronized (lock) {
        Preconditions.checkState(
                state == State.CREATED, "The RestServerEndpoint cannot be restarted.");

        final Router router = new Router();
        final CompletableFuture<String> restAddressFuture = new CompletableFuture<>();

        handlers = initializeHandlers(restAddressFuture);
        Collections.sort(handlers, RestHandlerUrlComparator.INSTANCE);
        checkAllEndpointsAndHandlersAreUnique(handlers);
        handlers.forEach(handler -> registerHandler(router, handler, log));

        MultipartRoutes multipartRoutes = createMultipartRoutes(handlers);

        ChannelInitializer<SocketChannel> initializer =
                new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) throws ConfigurationException {
                        RouterHandler handler = new RouterHandler(router, responseHeaders);
                        if (isHttpsEnabled()) {
                            ch.pipeline()
                                    .addLast(
                                            "ssl",
                                            new RedirectingSslHandler(
                                                    restAddress,
                                                    restAddressFuture,
                                                    sslHandlerFactory));
                        }

                        ch.pipeline()
                                .addLast(new HttpServerCodec())
                                .addLast(new FileUploadHandler(uploadDir, multipartRoutes))
                                .addLast(
                                        new FlinkHttpObjectAggregator(
                                                maxContentLength, responseHeaders));

                        for (InboundChannelHandlerFactory factory :
                                inboundChannelHandlerFactories) {
                            Optional<ChannelHandler> channelHandler =
                                    factory.createHandler(configuration, responseHeaders);
                            if (channelHandler.isPresent()) {
                                ch.pipeline().addLast(channelHandler.get());
                            }
                        }

                        ch.pipeline()
                                .addLast(new ChunkedWriteHandler())
                                .addLast(handler.getName(), handler)
                                .addLast(new PipelineErrorHandler(log, responseHeaders));
                    }
                };

        bootstrap = new ServerBootstrap();
        bootstrap.group(bossGroup, workerGroup).channel(NioServerSocketChannel.class).childHandler(initializer);

        // Bind the port (try each port in the configured port range in turn)
        // ...

        restBaseUrl = new URL(determineProtocol(), advertisedAddress, port, "").toString();
        restAddressFuture.complete(restBaseUrl);
        state = State.RUNNING;

        startInternal();
    }
}
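  • initializeHandlers(restAddressFuture): delegated to the subclass (here WebMonitorEndpoint/DispatcherRestEndpoint), which returns a list of (headersSpec, handler) pairs.
  • RestHandlerUrlComparator: sorts the handlers by URL pattern so that more specific paths are matched first.
  • registerHandler(router, handler, log): registers each handler's URL pattern with the Router.
  • Netty pipeline: (SSL handler, only if HTTPS is enabled) → HTTP codec → file upload → aggregator → (pluggable inbound handlers) → chunked write → router handler → error handler.
  • Building and publishing restBaseUrl: once the bind succeeds, the URL is assembled and restAddressFuture is completed; handlers and leader election use it afterwards.
  • Finally startInternal(): left to subclasses for starting the services that belong to the REST endpoint itself (for example leader election and scheduled tasks).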
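Why the sort order matters is easier to see with a small example. The comparator below is a simplified assumption, not the logic of RestHandlerUrlComparator itself: it only places literal path segments before ":param" segments, and a shorter path before a longer one that extends it. RouteOrderSketch and its first-match lookup are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RouteOrderSketch {
    // Simplified ordering: literal segments before ":param" segments,
    // and a path before any longer path that it is a prefix of.
    static final Comparator<String> SPECIFIC_FIRST = (a, b) -> {
        String[] left = a.split("/");
        String[] right = b.split("/");
        int shared = Math.min(left.length, right.length);
        for (int i = 0; i < shared; i++) {
            boolean leftParam = left[i].startsWith(":");
            boolean rightParam = right[i].startsWith(":");
            if (leftParam != rightParam) {
                return leftParam ? 1 : -1;
            }
            int cmp = left[i].compareTo(right[i]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(left.length, right.length);
    };

    static List<String> sorted(List<String> patterns) {
        List<String> copy = new ArrayList<>(patterns);
        copy.sort(SPECIFIC_FIRST);
        return copy;
    }

    // Returns the first pattern that matches the path, or null if none does.
    static String firstMatch(List<String> orderedPatterns, String path) {
        String[] actual = path.split("/");
        for (String pattern : orderedPatterns) {
            String[] expected = pattern.split("/");
            if (expected.length != actual.length) {
                continue;
            }
            boolean matches = true;
            for (int i = 0; i < expected.length; i++) {
                if (!expected[i].startsWith(":") && !expected[i].equals(actual[i])) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                return pattern;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> ordered = sorted(Arrays.asList(
                "/jobs/:jobid/config", "/jobs/:jobid", "/jobs/overview", "/jobs"));
        System.out.println(ordered);
        System.out.println(firstMatch(ordered, "/jobs/overview"));
    }
}
```

With an unsorted list, a first-match lookup could route /jobs/overview to /jobs/:jobid and treat "overview" as a job ID; sorting literal segments first avoids that.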

6. WebMonitorEndpoint.startInternal(): Why Does the Web Endpoint Need a Leader Election Too?

Once started, the web endpoint takes part in a separate leader election: clusterRestEndpointLeaderElection. Its purpose is not to decide who handles HTTP requests (the HTTP port is already bound in this JVM). It exists so that, in HA setups, the address of the current REST endpoint leader (that is, restBaseUrl) is published for others to discover.

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>

FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#startInternal

Java:
public void startInternal() throws Exception {
    leaderElection.startLeaderElection(this);
    startExecutionGraphCacheCleanupTask();
    if (hasWebUI) {
        log.info("Web frontend listening at {}.", getRestBaseUrl());
    }
}

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>

FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#grantLeadership

Java:
public void grantLeadership(final UUID leaderSessionID) {
    leaderElection.confirmLeadershipAsync(leaderSessionID, getRestBaseUrl());
}
  • startLeaderElection(this): registers the WebMonitorEndpoint itself as the LeaderContender.
  • The grantLeadership(...) callback confirms and publishes getRestBaseUrl() as the leader address, so other components (or clients) can discover where the current REST leader is.
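The register-then-confirm handshake can be sketched with an in-memory toy. None of this is Flink's leader election API: ToyContender, ToyElection, ToyRestEndpoint, and ElectionDemo are hypothetical, and the "election" is a single-process stub in which the only contender always wins.

```java
import java.util.UUID;

// Hypothetical contender callback, loosely modeled on the grantLeadership call above.
interface ToyContender {
    void grantLeadership(UUID leaderSessionId);
}

// Hypothetical single-process election: the only contender always wins.
final class ToyElection {
    private UUID issuedSessionId;
    private String publishedAddress;

    void startLeaderElection(ToyContender contender) {
        issuedSessionId = UUID.randomUUID();
        contender.grantLeadership(issuedSessionId);
    }

    void confirmLeadership(UUID leaderSessionId, String leaderAddress) {
        // Publish the address only if it confirms the session issued most recently.
        if (leaderSessionId.equals(issuedSessionId)) {
            publishedAddress = leaderAddress;
        }
    }

    String publishedAddress() {
        return publishedAddress;
    }
}

// Hypothetical endpoint: answers the grant by confirming with its own base URL.
final class ToyRestEndpoint implements ToyContender {
    private final ToyElection election;
    private final String restBaseUrl;

    ToyRestEndpoint(ToyElection election, String restBaseUrl) {
        this.election = election;
        this.restBaseUrl = restBaseUrl;
    }

    void startInternal() {
        election.startLeaderElection(this);
    }

    @Override
    public void grantLeadership(UUID leaderSessionId) {
        election.confirmLeadership(leaderSessionId, restBaseUrl);
    }
}

final class ElectionDemo {
    static String run(String restBaseUrl) {
        ToyElection election = new ToyElection();
        new ToyRestEndpoint(election, restBaseUrl).startInternal();
        return election.publishedAddress();
    }

    public static void main(String[] args) {
        System.out.println(run("http://localhost:8081"));
    }
}
```

The session ID check in the toy reflects the general idea that a confirmation belongs to one specific grant; how Flink's HA services enforce this is outside the scope of the sketch.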

7. Putting the Startup Order Together (Where WebMonitorEndpoint Sits in the JM Startup Path)

DefaultDispatcherResourceManagerComponentFactory.create
→ SessionRestEndpointFactory.createRestEndpoint
→ new DispatcherRestEndpoint (a WebMonitorEndpoint)
→ RestServerEndpoint.start()
→ initializeHandlers(): register the REST handlers
→ Netty binds the port → restBaseUrl
→ WebMonitorEndpoint.startInternal() → leaderElection.startLeaderElection
→ grantLeadership → confirmLeadershipAsync(restBaseUrl)
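The middle of this chain (a final start() that registers handlers, walks a port range, builds the base URL, and only then invokes the subclass hook) can be sketched without Netty. Everything here is hypothetical: MiniEndpoint and LifecycleDemo are invented for this illustration, the IntPredicate stands in for a real bind attempt, and the host and port numbers are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical miniature of the start() sequence; no Netty involved.
class MiniEndpoint {
    enum State { CREATED, RUNNING }

    private final int firstPort;
    private final int lastPort;
    private final IntPredicate tryBind; // stand-in for a real bind attempt
    private final List<String> events = new ArrayList<>();
    private State state = State.CREATED;
    private String restBaseUrl;

    MiniEndpoint(int firstPort, int lastPort, IntPredicate tryBind) {
        this.firstPort = firstPort;
        this.lastPort = lastPort;
        this.tryBind = tryBind;
    }

    // final: subclasses customize startInternal(), never the sequence itself.
    final void start() {
        if (state != State.CREATED) {
            throw new IllegalStateException("The endpoint cannot be restarted.");
        }
        events.add("handlers registered");

        int boundPort = -1;
        for (int port = firstPort; port <= lastPort; port++) {
            if (tryBind.test(port)) {
                boundPort = port;
                break;
            }
        }
        if (boundPort < 0) {
            throw new IllegalStateException("No free port in the configured range.");
        }

        restBaseUrl = "http://localhost:" + boundPort;
        events.add("bound " + restBaseUrl);
        state = State.RUNNING;

        startInternal();
    }

    // Hook for endpoint-specific services such as leader election.
    protected void startInternal() {
        events.add("startInternal");
    }

    List<String> events() {
        return events;
    }
}

final class LifecycleDemo {
    static List<String> run() {
        // Pretend 8081 and 8082 are taken, so the third candidate wins.
        MiniEndpoint endpoint = new MiniEndpoint(8081, 8090, port -> port >= 8083);
        endpoint.start();
        return endpoint.events();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The ordering is what the sketch is meant to show: the hook runs only after the base URL exists, which is why a subclass can safely publish that URL from inside startInternal().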

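8. Back to the Topic (Wrap-Up)

  • "Starting" the WebMonitorEndpoint happens in two layers: RestServerEndpoint.start() handles Netty, routing, and the port, while WebMonitorEndpoint.startInternal() handles leader election and background tasks.
  • In session mode the endpoint created is DispatcherRestEndpoint: it adds Dispatcher-specific capabilities such as job submission on top of the common handlers.
  • In the JM startup path the REST endpoint is started first, so that the RM and Dispatcher started afterwards can obtain a stable restBaseUrl to announce externally and to reference one another.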