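WebMonitorEndpoint Startup Walkthrough (Using the JM-Side DispatcherRestEndpoint as the Example)

This article answers one question: when the JobManager starts, how is the Web/REST layer (Web UI plus REST API) created, bound to a port, wired with routes, and made available to clients?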

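1. Topic and Core Component Responsibilities (Overview)

  • org.apache.flink.runtime.webmonitor.WebMonitorEndpoint: the base class of the JM-side REST service (declared as a plain public class, not abstract, in the source quoted in 4.3). It assembles most REST handlers and, as a LeaderContender, takes part in the REST endpoint leader election so that restBaseUrl is announced externally.
  • org.apache.flink.runtime.rest.RestServerEndpoint: the Netty-based REST server foundation. It registers the Router and handlers, binds the port, builds restBaseUrl, and finally calls back into startInternal().
  • org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint: the WebMonitorEndpoint implementation for the Dispatcher case. It adds JobSubmitHandler (job submission) on top of the generic handlers.
  • org.apache.flink.runtime.rest.SessionRestEndpointFactory: the factory that creates DispatcherRestEndpoint in session cluster mode.
  • org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory: the place in the JobManager startup path where the WebMonitorEndpoint is created and start() is called on it (before the Dispatcher and RM are started).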

2. Startup Entry Point: Who Creates and Starts the WebMonitorEndpoint in the JM?

During the component assembly phase of JobManager startup, DefaultDispatcherResourceManagerComponentFactory creates the Dispatcher, the RM, and the WebMonitorEndpoint, and controls their startup order: the REST endpoint is started first (yielding a usable restBaseUrl), and only then are ResourceManagerService and DispatcherRunner started.

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/component/DefaultDispatcherResourceManagerComponentFactory.java>

FQCN: org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory#create

webMonitorEndpoint =
        restEndpointFactory.createRestEndpoint(
                configuration,
                dispatcherGatewayRetriever,
                resourceManagerGatewayRetriever,
                blobServer,
                executor,
                metricFetcher,
                highAvailabilityServices.getClusterRestEndpointLeaderElection(),
                fatalErrorHandler);

webMonitorEndpoint.start();
  • dispatcherGatewayRetriever and resourceManagerGatewayRetriever are injected into the REST endpoint here, so handlers can reach the Dispatcher and RM RPC gateways through them.
  • highAvailabilityServices.getClusterRestEndpointLeaderElection() is injected as well: the REST endpoint takes part in leader election and publishes restBaseUrl as the leader address (especially important under HA).
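
To illustrate why a retriever is injected rather than a gateway instance, here is a minimal sketch. It assumes, as a simplification, that a retriever resolves the current leader each time it is asked. The types Gateway, Retriever, and OverviewHandler are hypothetical stand-ins written for this example, not Flink classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

final class GatewayRetrieverSketch {

    /** Hypothetical stand-in for an RPC gateway such as DispatcherGateway. */
    interface Gateway {
        CompletableFuture<String> requestClusterOverview();
    }

    /** Hypothetical stand-in for a gateway retriever: resolves the current leader on demand. */
    interface Retriever<T> {
        CompletableFuture<T> getFuture();
    }

    /** The handler keeps the retriever, not a gateway, so every request sees the current leader. */
    static final class OverviewHandler {
        private final Retriever<Gateway> retriever;

        OverviewHandler(Retriever<Gateway> retriever) {
            this.retriever = retriever;
        }

        CompletableFuture<String> handle() {
            // Resolve the gateway first, then issue the call against it.
            return retriever.getFuture().thenCompose(Gateway::requestClusterOverview);
        }
    }

    /** Builds a fake gateway whose answer carries its name, to make leader changes visible. */
    static Gateway gatewayNamed(String name) {
        return () -> CompletableFuture.completedFuture("overview from " + name);
    }

    public static void main(String[] args) {
        AtomicReference<Gateway> leader = new AtomicReference<>(gatewayNamed("dispatcher-1"));
        OverviewHandler handler =
                new OverviewHandler(() -> CompletableFuture.completedFuture(leader.get()));

        System.out.println(handler.handle().join());
        leader.set(gatewayNamed("dispatcher-2")); // simulated leader change
        System.out.println(handler.handle().join());
    }
}
```

The same handler instance serves both calls; only the object behind the retriever changes.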

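3. Core Class Relationships (Class Diagram)

  • WebMonitorEndpoint<T extends RestfulGateway> extends RestServerEndpoint
  • DispatcherRestEndpoint extends WebMonitorEndpoint<T extends RestfulGateway>
  • SessionRestEndpointFactory implements RestEndpointFactory<T>
  • SessionRestEndpointFactory creates DispatcherRestEndpoint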

4. Creation Phase: SessionRestEndpointFactory → DispatcherRestEndpoint → WebMonitorEndpoint

4.1 SessionRestEndpointFactory: creating DispatcherRestEndpoint from configuration

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/rest/SessionRestEndpointFactory.java>

FQCN: org.apache.flink.runtime.rest.SessionRestEndpointFactory#createRestEndpoint

public WebMonitorEndpoint<DispatcherGateway> createRestEndpoint(
        Configuration configuration,
        LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever,
        LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElection leaderElection,
        FatalErrorHandler fatalErrorHandler)
        throws Exception {
    final RestHandlerConfiguration restHandlerConfiguration =
            RestHandlerConfiguration.fromConfiguration(configuration);

    return new DispatcherRestEndpoint(
            dispatcherGatewayRetriever,
            configuration,
            restHandlerConfiguration,
            resourceManagerGatewayRetriever,
            transientBlobService,
            executor,
            metricFetcher,
            leaderElection,
            RestEndpointFactory.createExecutionGraphCache(restHandlerConfiguration),
            fatalErrorHandler);
}
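  • The key product of this step is RestHandlerConfiguration, which determines whether the Web UI is enabled, the handler timeout, refreshInterval, webUiDir, and similar settings.
  • The concrete return type is DispatcherRestEndpoint, the Web/REST endpoint implementation the JM uses in session mode.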

4.2 DispatcherRestEndpoint: adding job submission on top of the generic handlers

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/DispatcherRestEndpoint.java>

FQCN: org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint#DispatcherRestEndpoint

public DispatcherRestEndpoint(
        GatewayRetriever<DispatcherGateway> leaderRetriever,
        Configuration clusterConfiguration,
        RestHandlerConfiguration restConfiguration,
        GatewayRetriever<ResourceManagerGateway> resourceManagerRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElection leaderElection,
        ExecutionGraphCache executionGraphCache,
        FatalErrorHandler fatalErrorHandler)
        throws IOException, ConfigurationException {
    super(
            leaderRetriever,
            clusterConfiguration,
            restConfiguration,
            resourceManagerRetriever,
            transientBlobService,
            executor,
            metricFetcher,
            leaderElection,
            executionGraphCache,
            fatalErrorHandler);
}

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/DispatcherRestEndpoint.java>

FQCN: org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint#initializeHandlers

protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(
        final CompletableFuture<String> localAddressFuture) {
    List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers =
            super.initializeHandlers(localAddressFuture);

    final Time timeout = restConfiguration.getTimeout();

    JobSubmitHandler jobSubmitHandler =
            new JobSubmitHandler(
                    leaderRetriever, timeout, responseHeaders, executor, clusterConfiguration);

    handlers.add(Tuple2.of(jobSubmitHandler.getMessageHeaders(), jobSubmitHandler));
    return handlers;
}
  • super.initializeHandlers(...) first assembles all generic handlers (cluster, job, taskmanager, metrics, and so on).
  • JobSubmitHandler is then appended: only the Dispatcher endpoint needs a "submit job" handler.
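
The override pattern described above (call super, then append) can be sketched without any Flink dependency. In this minimal sketch GenericEndpoint and SubmittingEndpoint are hypothetical stand-ins for the two endpoint classes, and handlers are reduced to plain strings chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class HandlerExtensionSketch {

    /** Hypothetical stand-in for the generic endpoint (the role WebMonitorEndpoint plays). */
    static class GenericEndpoint {
        protected List<String> initializeHandlers() {
            // Illustrative route strings; the real method returns specification/handler pairs.
            List<String> handlers = new ArrayList<>();
            handlers.add("GET /overview");
            handlers.add("GET /jobs");
            handlers.add("GET /taskmanagers");
            return handlers;
        }
    }

    /** Hypothetical stand-in for the Dispatcher-specific endpoint. */
    static class SubmittingEndpoint extends GenericEndpoint {
        @Override
        protected List<String> initializeHandlers() {
            // Reuse everything the parent assembled, then add the one extra capability.
            List<String> handlers = super.initializeHandlers();
            handlers.add("POST /jobs");
            return handlers;
        }
    }

    public static void main(String[] args) {
        System.out.println(new SubmittingEndpoint().initializeHandlers());
    }
}
```

Because the parent returns a mutable list, the subclass can extend it without knowing which handlers it already contains.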

4.3 WebMonitorEndpoint: home of the generic handlers

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>

FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint

public class WebMonitorEndpoint<T extends RestfulGateway> extends RestServerEndpoint
        implements LeaderContender, JsonArchivist {

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>

FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#WebMonitorEndpoint

public WebMonitorEndpoint(
        GatewayRetriever<? extends T> leaderRetriever,
        Configuration clusterConfiguration,
        RestHandlerConfiguration restConfiguration,
        GatewayRetriever<ResourceManagerGateway> resourceManagerRetriever,
        TransientBlobService transientBlobService,
        ScheduledExecutorService executor,
        MetricFetcher metricFetcher,
        LeaderElection leaderElection,
        ExecutionGraphCache executionGraphCache,
        FatalErrorHandler fatalErrorHandler)
        throws IOException, ConfigurationException {
    super(clusterConfiguration);
    this.leaderRetriever = Preconditions.checkNotNull(leaderRetriever);
    this.clusterConfiguration = Preconditions.checkNotNull(clusterConfiguration);
    this.restConfiguration = Preconditions.checkNotNull(restConfiguration);
    this.resourceManagerRetriever = Preconditions.checkNotNull(resourceManagerRetriever);
    this.transientBlobService = Preconditions.checkNotNull(transientBlobService);
    this.executor = Preconditions.checkNotNull(executor);
    this.executionGraphCache = executionGraphCache;
    this.metricFetcher = metricFetcher;
    this.leaderElection = Preconditions.checkNotNull(leaderElection);
    this.fatalErrorHandler = Preconditions.checkNotNull(fatalErrorHandler);
}
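  • The dependencies here fall roughly into four groups: gateway retrieval (Dispatcher/RM), REST configuration, the asynchronous executor, and supporting services (metrics fetching, transient blob access, leader election, and the ExecutionGraph cache). In the constructor above only metricFetcher and executionGraphCache are assigned without checkNotNull; the other dependencies, including leaderElection and transientBlobService, are required.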

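5. Startup Phase: What Does RestServerEndpoint.start() Do?

DispatcherRestEndpoint does not override start() (the method is declared final, as the code below shows). The real startup logic lives in the parent class RestServerEndpoint.start(): prepare the router → register handlers → bind the port with Netty → build restBaseUrl → call back into startInternal().

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestServerEndpoint.java>

FQCN: org.apache.flink.runtime.rest.RestServerEndpoint#start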

public final void start() throws Exception {
    synchronized (lock) {
        Preconditions.checkState(
                state == State.CREATED, "The RestServerEndpoint cannot be restarted.");

        final Router router = new Router();
        final CompletableFuture<String> restAddressFuture = new CompletableFuture<>();

        handlers = initializeHandlers(restAddressFuture);
        Collections.sort(handlers, RestHandlerUrlComparator.INSTANCE);
        checkAllEndpointsAndHandlersAreUnique(handlers);
        handlers.forEach(handler -> registerHandler(router, handler, log));

        MultipartRoutes multipartRoutes = createMultipartRoutes(handlers);

        ChannelInitializer<SocketChannel> initializer =
                new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) throws ConfigurationException {
                        RouterHandler handler = new RouterHandler(router, responseHeaders);
                        if (isHttpsEnabled()) {
                            ch.pipeline()
                                    .addLast(
                                            "ssl",
                                            new RedirectingSslHandler(
                                                    restAddress,
                                                    restAddressFuture,
                                                    sslHandlerFactory));
                        }

                        ch.pipeline()
                                .addLast(new HttpServerCodec())
                                .addLast(new FileUploadHandler(uploadDir, multipartRoutes))
                                .addLast(
                                        new FlinkHttpObjectAggregator(
                                                maxContentLength, responseHeaders));

                        for (InboundChannelHandlerFactory factory :
                                inboundChannelHandlerFactories) {
                            Optional<ChannelHandler> channelHandler =
                                    factory.createHandler(configuration, responseHeaders);
                            if (channelHandler.isPresent()) {
                                ch.pipeline().addLast(channelHandler.get());
                            }
                        }

                        ch.pipeline()
                                .addLast(new ChunkedWriteHandler())
                                .addLast(handler.getName(), handler)
                                .addLast(new PipelineErrorHandler(log, responseHeaders));
                    }
                };

        bootstrap = new ServerBootstrap();
        bootstrap.group(bossGroup, workerGroup).channel(NioServerSocketChannel.class).childHandler(initializer);

        // bind the port (iterating over the configured port range)
        // ...

        restBaseUrl = new URL(determineProtocol(), advertisedAddress, port, "").toString();
        restAddressFuture.complete(restBaseUrl);
        state = State.RUNNING;

        startInternal();
    }
}
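  • initializeHandlers(restAddressFuture): delegated to the subclass (here WebMonitorEndpoint/DispatcherRestEndpoint), which returns a list of (RestHandlerSpecification, handler) pairs.
  • RestHandlerUrlComparator: sorts handlers by URL pattern so that more specific paths are matched first.
  • registerHandler(router, handler, log): registers each handler's URL pattern with the Router.
  • Netty pipeline: optional SSL handler (only when HTTPS is enabled) → HTTP codec → file upload → aggregator → (pluggable inbound handlers) → chunked write → router handler → error handler.
  • Building and publishing restBaseUrl: once binding completes, the URL is assembled and restAddressFuture is completed; handlers and leader election use it later.
  • Finally, startInternal(): left to subclasses to start the REST endpoint's own auxiliary services (for example leader election and scheduled tasks).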
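
The sorting step is easier to picture with a concrete ordering. The sketch below is a simplified stand-in for the idea, not the logic of Flink's RestHandlerUrlComparator: it assumes path parameters are written with a leading colon (for example :jobid) and orders literal segments ahead of parameter segments. The class name and sample URLs are chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class UrlOrderSketch {

    /** Orders URL patterns so that a literal segment sorts before a ":param" segment. */
    static final Comparator<String> LITERAL_BEFORE_PARAM = (a, b) -> {
        String[] left = a.split("/");
        String[] right = b.split("/");
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            boolean leftParam = left[i].startsWith(":");
            boolean rightParam = right[i].startsWith(":");
            if (leftParam != rightParam) {
                return leftParam ? 1 : -1; // the parameterized pattern goes last
            }
            int cmp = left[i].compareTo(right[i]);
            if (cmp != 0) {
                return cmp;
            }
        }
        // One pattern is a prefix of the other: the shorter one goes first.
        return Integer.compare(left.length, right.length);
    };

    static List<String> sorted(String... urls) {
        List<String> result = new ArrayList<>(Arrays.asList(urls));
        result.sort(LITERAL_BEFORE_PARAM);
        return result;
    }

    public static void main(String[] args) {
        // prints [/jobs, /jobs/overview, /jobs/:jobid]
        System.out.println(sorted("/jobs/:jobid", "/jobs/overview", "/jobs"));
    }
}
```

If a router tries patterns in registration order, this ordering keeps a request for /jobs/overview from being captured by /jobs/:jobid with jobid set to "overview".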

6. WebMonitorEndpoint.startInternal(): Why Does the Web Endpoint Also Elect a Leader?

After the web endpoint starts, it joins a separate leader election: clusterRestEndpointLeaderElection. The purpose is not to decide who handles HTTP requests (the port is already bound in this JVM). It is to publish, in HA setups, the address of the current REST endpoint leader, which is restBaseUrl.

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>

FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#startInternal

public void startInternal() throws Exception {
    leaderElection.startLeaderElection(this);
    startExecutionGraphCacheCleanupTask();
    if (hasWebUI) {
        log.info("Web frontend listening at {}.", getRestBaseUrl());
    }
}

Path: <flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java>

FQCN: org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#grantLeadership

public void grantLeadership(final UUID leaderSessionID) {
    leaderElection.confirmLeadershipAsync(leaderSessionID, getRestBaseUrl());
}
  • startLeaderElection(this): registers the WebMonitorEndpoint itself as a LeaderContender.
  • The grantLeadership(...) callback confirms and publishes getRestBaseUrl() as the leader address, so other components (or clients) can discover where the current REST leader is.
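
The register, grant, and confirm handshake can be modeled in a few lines. This is a minimal single-process sketch under strong assumptions: there is only one contender, it always wins, and the grant happens synchronously. Contender, Election, and Endpoint are hypothetical types that mirror the roles described above, and the host name jm-host is made up.

```java
import java.util.UUID;

final class RestLeaderElectionSketch {

    /** Hypothetical, simplified contender callback (the LeaderContender role). */
    interface Contender {
        void grantLeadership(UUID leaderSessionId);
    }

    /** Hypothetical election service: the single contender is granted leadership at once. */
    static final class Election {
        private UUID confirmedSessionId;
        private String confirmedAddress; // what a discovery client would read

        void startLeaderElection(Contender contender) {
            contender.grantLeadership(UUID.randomUUID());
        }

        void confirmLeadership(UUID leaderSessionId, String leaderAddress) {
            this.confirmedSessionId = leaderSessionId;
            this.confirmedAddress = leaderAddress;
        }

        String publishedAddress() {
            return confirmedAddress;
        }
    }

    /** Hypothetical endpoint: on grant, it confirms with its own REST base URL. */
    static final class Endpoint implements Contender {
        private final Election election;
        private final String restBaseUrl;

        Endpoint(Election election, String restBaseUrl) {
            this.election = election;
            this.restBaseUrl = restBaseUrl;
        }

        void startInternal() {
            election.startLeaderElection(this);
        }

        @Override
        public void grantLeadership(UUID leaderSessionId) {
            election.confirmLeadership(leaderSessionId, restBaseUrl);
        }
    }

    public static void main(String[] args) {
        Election election = new Election();
        new Endpoint(election, "http://jm-host:8081").startInternal();
        System.out.println(election.publishedAddress()); // prints http://jm-host:8081
    }
}
```

The point of the model is that the election never decides who serves HTTP; it only records which address should be advertised.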

7. Putting the Startup Order Together (Where WebMonitorEndpoint Sits in the JM Startup Path)

DefaultDispatcherResourceManagerComponentFactory.create
→ SessionRestEndpointFactory.createRestEndpoint
→ new DispatcherRestEndpoint (a WebMonitorEndpoint)
→ RestServerEndpoint.start()
→ initializeHandlers() registers the REST handlers
→ Netty binds the port → restBaseUrl
→ WebMonitorEndpoint.startInternal() → leaderElection.startLeaderElection
→ grantLeadership → confirmLeadershipAsync(restBaseUrl)
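
The chain above boils down to a fixed start() sequence in the base class plus a hook that the subclass fills in. The following sketch models only that ordering and the "cannot be restarted" guard. It opens no socket, treats the leadership grant as synchronous, and uses a hypothetical address; ServerBase and WebEndpoint are stand-ins written for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class StartSequenceSketch {

    /** Hypothetical stand-in for the server base class; it owns the fixed start() sequence. */
    abstract static class ServerBase {
        private enum State { CREATED, RUNNING }

        private final Object lock = new Object();
        private State state = State.CREATED;
        final List<String> events = new ArrayList<>();
        String restBaseUrl;

        final void start() {
            synchronized (lock) {
                if (state != State.CREATED) {
                    throw new IllegalStateException("The endpoint cannot be restarted.");
                }
                events.add("initializeHandlers");
                events.add("bind");
                restBaseUrl = "http://localhost:8081"; // hypothetical address, nothing is bound
                events.add("restBaseUrl");
                state = State.RUNNING;
                startInternal(); // the subclass hook runs last
            }
        }

        abstract void startInternal();
    }

    /** Hypothetical stand-in for the web endpoint: it only contributes the hook. */
    static final class WebEndpoint extends ServerBase {
        @Override
        void startInternal() {
            events.add("startLeaderElection");
            events.add("confirmLeadership(" + restBaseUrl + ")");
        }
    }

    public static void main(String[] args) {
        WebEndpoint endpoint = new WebEndpoint();
        endpoint.start();
        endpoint.events.forEach(System.out::println);
    }
}
```

Because the hook runs only after restBaseUrl is set, the subclass can safely use the URL when it confirms leadership.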

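8. Back to the Question (Wrap-Up)

  • WebMonitorEndpoint startup has two layers: RestServerEndpoint.start() handles Netty, routing, and the port; WebMonitorEndpoint.startInternal() handles leader election and background tasks.
  • Session mode creates a DispatcherRestEndpoint, which adds Dispatcher-specific capabilities such as job submission on top of the generic handlers.
  • In the JM startup path the REST endpoint is started first, so that the RM and Dispatcher started afterwards can obtain a stable restBaseUrl for external announcement and cross-referencing.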