Flink from Beginner to Expert, Part 5: Flink Cluster Deployment Modes

I: Deployment Modes

Some application scenarios have particular requirements for how cluster resources are allocated and occupied. Flink provides different deployment modes for these scenarios. There are three main ones:

session mode, per-job mode, and application mode.

They differ mainly in the lifecycle of the cluster, the way resources are allocated, and where the application's main method actually runs: on the client or on the JobManager.

1: Session Mode

Session mode is the most intuitive. We first start a cluster and keep a session alive, then submit jobs to it through a client. All the cluster's resources are fixed when it starts, so every submitted job competes for the resources in the cluster.

Session mode suits a large number of jobs that are individually small and short-running.

2: Per-Job Mode

Because resource sharing in session mode causes many problems, we can isolate resources better by starting a dedicated cluster for each submitted job. This is the per-job mode.

When the job finishes, the cluster shuts down and all its resources are released.

These properties make per-job mode more stable in production, so it has been the preferred mode in real applications. Note that Flink cannot run this way on its own: per-job mode generally needs an external resource management framework, such as YARN or Kubernetes (K8s), to start the cluster. The Flink cluster we deployed earlier runs in session mode.

3: Application Mode

In both of the modes above, the application code is executed on the client, which then submits the job to the JobManager.

But this means the client consumes a lot of network bandwidth downloading dependencies and sending binary data to the JobManager. Since we often submit jobs from the same client, this puts a heavy load on that client's node.

The solution is to drop the client altogether and submit the application directly to the JobManager. This means starting a dedicated JobManager for each submitted application, that is, creating a cluster. This JobManager exists only to execute that one application and shuts down when the application finishes. This is the application mode.

Application mode is the way forward: per-job mode has been marked deprecated in newer versions (since Flink 1.15), and application mode was designed precisely to address its pain points.
The deployment modes discussed here are fairly abstract concepts. In practice they are usually combined with a resource management platform, choosing a particular mode to allocate resources and deploy applications. Next, we look at how Flink is deployed with different resource providers.

II: Standalone Mode

The run mode determines who manages the cluster's resources.

In Standalone mode, Flink manages them itself.

Standalone mode runs independently, without relying on any external resource management platform. Independence has a price, though: if resources run short or a failure occurs, there is no automatic scaling or resource reallocation, and everything must be handled manually. So standalone mode is generally used only for development, testing, or scenarios with very few jobs.

In this run mode, the number of TaskManagers and the number of physical machines are fixed in advance.

1: Session Mode Deployment

What we used in Section 3.2 was exactly a Standalone cluster in session mode.

Start the cluster ahead of time and submit tasks through the web UI client (multiple tasks are possible, but the cluster's resources are fixed).

2: Per-Job Mode

A Flink Standalone cluster does not support per-job deployment, because per-job mode requires a resource management platform.

3: Application Mode

In application mode the cluster is not created in advance, so we cannot call the start-cluster.sh script. Instead, we use standalone-job.sh, also in the bin directory, to create a JobManager.

1: Prepare the environment.

nc -lk 7777

2: Put the job jar into the lib/ directory.

mv FlinkTutorial-1.0-SNAPSHOT.jar lib/

3: Start the JobManager.

./standalone-job.sh start --job-classname com.dahsu.wc.SocketStreamWordCount

Here we specify the job's entry class directly; the script scans all the jars under the lib directory to find it.

4: Start the TaskManager.

./taskmanager.sh start

5: Send some word data to simulate input.
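The steps above, plus tear-down, can be sketched as a single dry-run script. It only prints the commands in order so the sequence can be inspected before touching a real cluster; the install path and entry class are the ones from this walkthrough, and both scripts accept start and stop:

```shell
# Dry-run sketch of the standalone application-mode lifecycle.
# FLINK_HOME and JOB_CLASS are the values used in the walkthrough above.
FLINK_HOME=${FLINK_HOME:-/usr/local/src/flink-1.17.0}
JOB_CLASS=com.dahsu.wc.SocketStreamWordCount

standalone_app_cmds() {
  # Start-up: JobManager first (scans lib/ for the job jar), then TaskManager.
  echo "$FLINK_HOME/bin/standalone-job.sh start --job-classname $JOB_CLASS"
  echo "$FLINK_HOME/bin/taskmanager.sh start"
  # Tear-down mirrors start-up in reverse order.
  echo "$FLINK_HOME/bin/taskmanager.sh stop"
  echo "$FLINK_HOME/bin/standalone-job.sh stop"
}
standalone_app_cmds
```

Replace the echoes with direct invocations once the printed sequence looks right.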

III: YARN Mode

1: Preparing the Environment

Most big-data architectures are built on top of the Hadoop ecosystem.

Hadoop's three pillars are HDFS, YARN, and MapReduce. YARN is the one responsible for resource scheduling and management, and Flink can submit its tasks to YARN to be managed.

Deployment on YARN works like this: the client submits the Flink application to YARN's ResourceManager, and the ResourceManager requests containers from YARN's NodeManagers.

In these containers, Flink deploys JobManager and TaskManager instances, thereby starting the cluster. Flink allocates TaskManager resources dynamically, based on the number of slots required by the jobs running on the JobManager.

This is the advantage of the YARN mode.

Preparation:

1: Install and deploy Hadoop.

2: Make Flink aware of Hadoop. This only requires setting environment variables, which also keeps Hadoop and Flink decoupled.

Edit the environment variables:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/usr/local/src/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
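Before starting anything on YARN it is worth verifying that these variables actually took effect in the current shell. A small sanity-check sketch (the variable list mirrors the export block above; the function name is our own):

```shell
# Fail fast if any Hadoop-integration variable is missing, before trying
# yarn-session.sh or flink run. Prints each missing name and returns 1.
check_hadoop_env() {
  local missing=0
  for v in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR HADOOP_CLASSPATH; do
    if [ -z "$(eval echo "\$$v")" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}
```

Run `check_hadoop_env` right before `yarn-session.sh`; a non-zero exit means the exports above were not sourced in this shell.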

Start Hadoop's HDFS and YARN.

2: Session Mode Deployment

Run the script to request resources from the YARN cluster, open a YARN session, and start a Flink cluster.

The command:

bin/yarn-session.sh -nm test

Available parameters:

-d: Detached mode. Use this if you don't want the Flink YARN client to keep running in the foreground; the YARN session stays up in the background even after you close the current terminal.

-jm (--jobManagerMemory): Memory for the JobManager, in MB by default.

-nm (--name): The application name shown in the YARN UI.

-qu (--queue): The YARN queue to submit to.

-tm (--taskManagerMemory): Memory for each TaskManager.

Since Flink 1.11.0, the -n and -s parameters, which used to set the number of TaskManagers and slots respectively, are no longer used; YARN allocates TaskManagers and slots dynamically on demand. In this sense, even YARN session mode does not pin down the cluster's resources; they too are allocated dynamically.
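Putting these options together, a typical invocation can be assembled like this. The function below only builds and prints the command (a dry run); the memory and queue values are illustrative, not prescriptive:

```shell
# Assemble a yarn-session.sh command from the options documented above.
# Arguments: app name, JobManager MB, TaskManager MB, YARN queue.
build_session_cmd() {
  local name=$1 jm_mb=$2 tm_mb=$3 queue=$4
  echo "bin/yarn-session.sh -d -nm $name -jm $jm_mb -tm $tm_mb -qu $queue"
}
build_session_cmd test 1024 1728 default
```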

[root@bigdata137 bin]# ./yarn-session.sh -nm test
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-13 14:35:16,215 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2026-02-13 14:35:16,220 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.bind-host, 0.0.0.0
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, 192.168.67.137
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1024m
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.bind-address, 0.0.0.0
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.port, 8081
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.bind-host, 0.0.0.0
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.host, 192.168.67.137
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.address, 192.168.67.137
2026-02-13 14:35:16,420 WARN  org.apache.hadoop.util.NativeCodeLoader                      [] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2026-02-13 14:35:16,445 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Hadoop user set to root (auth:SIMPLE)
2026-02-13 14:35:16,445 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Kerberos security is disabled.
2026-02-13 14:35:16,455 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-2140832887329830020.conf.
2026-02-13 14:35:16,484 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-13 14:35:16,531 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-13 14:35:16,722 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 14:35:16,737 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (172.800mb (181193935 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 14:35:16,860 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-13 14:35:16,860 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-13 14:35:16,932 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-13 14:35:16,932 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-13 14:35:16,989 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-datadog
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: external-resource-gpu
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-graphite
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-influx
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-slf4j
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-prometheus
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-statsd
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-jmx
2026-02-13 14:35:21,040 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 14:35:21,051 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-13 14:35:21,055 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1770964001601_0001
2026-02-13 14:35:21,385 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1770964001601_0001
2026-02-13 14:35:21,385 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-13 14:35:21,388 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-13 14:35:26,197 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-13 14:35:26,198 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata139:8081 of application 'application_1770964001601_0001'.
JobManager Web Interface: http://bigdata139:8081

At this point Flink and the YARN cluster are truly wired together. In session mode, the cluster is started first and is managed by YARN as a single application; in other words, resources are no longer allocated by Flink itself but by YARN.

At start-up a YARN application is launched; that application effectively acts as the cluster, and all subsequent jobs run on it.

After that, we submit a real job:

1: Submitting a Job via the UI

Before submitting this job, port 7777 on host 137 must already be listening; otherwise the job cannot connect to the port and fails automatically.

[root@bigdata137 bin]# nc -lt 7777
big shit
hello java
hello flink
hello hadoop
hello baby
hello shit
hello aolige

Let's check the execution result:

In YARN session mode, resources are also allocated dynamically: with no job running, the Overview page shows 0 TaskManagers; after the job runs, it becomes 1.
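One way to watch this dynamic allocation from the shell is to poll the session's REST API. /taskmanagers is a standard Flink REST endpoint; the parsing below is deliberately crude (dependency-free) and assumes each TaskManager entry carries exactly one "id" key:

```shell
# Count the TaskManagers registered with a Flink session.
count_tm_ids() {   # reads a /taskmanagers JSON payload on stdin
  grep -o '"id"' | wc -l
}
tm_count() {       # e.g. tm_count bigdata139:8081 (the Web Interface address)
  curl -s "http://$1/taskmanagers" | count_tm_ids
}
```

Calling `tm_count` before and after submitting the job should show the count go from 0 to 1.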

[root@bigdata138 cuillei]# jpsall
=============== bigdata137 ===============
49904 YarnTaskExecutorRunner
36513 DataNode
36498 Jps
36283 NameNode
36910 NodeManager
=============== bigdata138 ===============
13333 ResourceManager
13094 DataNode
16614 YarnSessionClusterEntrypoint
13596 NodeManager
27887 Jps
=============== bigdata139 ===============
17313 Jps
11366 DataNode
11590 NodeManager
11498 SecondaryNameNode
[root@bigdata138 cuillei]#

If we now cancel the job manually, the TaskManager's container is released again:

[root@bigdata137 bin]# jpsall
=============== bigdata137 ===============
36513 DataNode
37333 Jps
36283 NameNode
36910 NodeManager
=============== bigdata138 ===============
13333 ResourceManager
13094 DataNode
16614 YarnSessionClusterEntrypoint
13596 NodeManager
33263 Jps
=============== bigdata139 ===============
11366 DataNode
11590 NodeManager
17561 Jps
11498 SecondaryNameNode

In YARN session mode, all of Flink's core processes, including the session itself and the tasks submitted to it, run inside containers that YARN allocates for you. Let's break this runtime architecture down to make it completely clear:

Process layout in YARN session mode (everything runs in containers)

YARN is essentially a resource scheduler. It does not run programs directly; instead it allocates them containers (think of a container as an independent resource unit in the YARN cluster, holding a specified amount of CPU and memory), and all Flink processes run inside these containers:

JobManager (ApplicationMaster), in 1 dedicated container allocated by YARN: (1) it acts as YARN's ApplicationMaster (AM) and requests resources from YARN; (2) it acts as the brain of the Flink session, receiving and scheduling tasks; (3) this container is the core of the session: as long as it is alive, the session is alive.

TaskManager, in N containers allocated by YARN (since Flink 1.11 the number is determined dynamically by the jobs' slot demand rather than fixed at start-up): (1) it runs the tasks' concrete operators (map, reduce, window computations, and so on); (2) each TaskManager gets its own container, so resources are isolated; (3) every task submitted to the session ultimately executes inside these TaskManager containers.

In YARN session mode, every subsequently submitted Flink task competes for the resources (CPU, memory, slots) of this one session application; when the existing TaskManager containers run out of slots, the session requests additional containers from YARN on demand.

2: Submitting a Job from the Command Line

The key points have basically all been covered above; here we simply try submitting a job under YARN session mode from the command line.

Submit the job via the command line:

Check the status before submitting:

Then we stop the Hadoop cluster, start it again, and open a YARN session:

[root@bigdata137 bin]# yarn-session.sh -d -nm flinkdemo
bash: yarn-session.sh: command not found...
[root@bigdata137 bin]# ./yarn-session.sh -d -nm flinkdemo
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-13 23:06:38,973 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.bind-host, 0.0.0.0
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, 192.168.67.137
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1024m
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.bind-address, 0.0.0.0
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.port, 8081
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.bind-host, 0.0.0.0
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.host, 192.168.67.137
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.address, 192.168.67.137
2026-02-13 23:06:38,993 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Found Yarn properties file under /tmp/.yarn-properties-root.
2026-02-13 23:06:39,162 WARN  org.apache.hadoop.util.NativeCodeLoader                      [] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2026-02-13 23:06:39,178 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Hadoop user set to root (auth:SIMPLE)
2026-02-13 23:06:39,179 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Kerberos security is disabled.
2026-02-13 23:06:39,186 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-776283340901102364.conf.
2026-02-13 23:06:39,203 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-13 23:06:39,243 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-13 23:06:39,418 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 23:06:39,427 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (172.800mb (181193935 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 23:06:39,536 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-13 23:06:39,536 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-13 23:06:39,604 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-13 23:06:39,604 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-13 23:06:39,635 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-datadog
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: external-resource-gpu
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-graphite
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-influx
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-slf4j
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-prometheus
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-statsd
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-jmx
2026-02-13 23:06:43,624 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 23:06:43,633 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-13 23:06:43,638 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1770995005496_0001
2026-02-13 23:06:43,971 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1770995005496_0001
2026-02-13 23:06:43,971 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-13 23:06:44,002 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-13 23:06:48,570 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-13 23:06:48,571 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata138:8081 of application 'application_1770995005496_0001'.
JobManager Web Interface: http://bigdata138:8081
2026-02-13 23:06:48,748 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - The Flink YARN session cluster has been started in detached mode. In order to stop Flink gracefully, use the following command:
$ echo "stop" | ./bin/yarn-session.sh -id application_1770995005496_0001
If this should not be possible, then you can also kill Flink via YARN's web interface or via:
$ yarn application -kill application_1770995005496_0001
Note that killing Flink might not clean up all job artifacts and temporary files.
[root@bigdata137 bin]# jpsall
=============== bigdata137 ===============
90326 DataNode
90731 NodeManager
90092 NameNode
92575 Jps
=============== bigdata138 ===============
37271 YarnSessionClusterEntrypoint
36266 ResourceManager
37354 Jps
36460 NodeManager
36030 DataNode
=============== bigdata139 ===============
20630 NodeManager
20537 SecondaryNameNode
20925 Jps
20415 DataNode

Next we submit the task from the command line:

Before submitting the task, start the listener on the 137 machine:

nc -lk 7777

With the service listening on port 7777, the job won't fail with "connection refused" at start-up.

The yarn session start-up output above tells us that jobs should be submitted to bigdata138:8081.

First put our job jar under lib/, then submit the task:

[root@bigdata137 bin]# ./flink run -m 192.168.67.138:8081 -c com.dashu.worldcount.wordCountUnboundedStream ../lib/flink170-1.0-SNAPSHOT.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-13 23:09:32,706 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Found Yarn properties file under /tmp/.yarn-properties-root.
2026-02-13 23:09:32,706 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Found Yarn properties file under /tmp/.yarn-properties-root.
Job has been submitted with JobID e591cdd8d259fd2202a888d6844185bd
^C[root@bigdata137 bin]#

(In the web UI overview, the counters that were previously all 0 have now changed.)

Send data and check the final result:

3: Notes

Why does a command-line submission default to the YARN session even when no IP and port are given?

[root@bigdata137 tmp]# cat .yarn-properties-root
#Generated YARN properties file
#Fri Feb 13 23:06:48 CST 2026
dynamicPropertiesString=
applicationID=application_1770995005496_0001
[root@bigdata137 tmp]#

Because it was recorded in a temporary file. When a task is submitted without an explicit IP and port, the client first looks it up in this file, and therefore matches the YARN session by default; only if it is absent does the submission fall back to the standalone session cluster.
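The client's lookup can be reproduced by hand with a couple of lines of shell. The path pattern /tmp/.yarn-properties-&lt;user&gt; matches the file shown above; the function name is our own:

```shell
# Extract the applicationID the Flink client would use by default.
yarn_session_app_id() {
  local props=${1:-/tmp/.yarn-properties-$(whoami)}
  # Lines look like: applicationID=application_1770995005496_0001
  sed -n 's/^applicationID=//p' "$props"
}
```

With the session above running, `yarn_session_app_id` prints application_1770995005496_0001.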

In YARN session mode, cancelling a task does not affect the cluster. So what if we want to shut the cluster itself down?

1: In Hadoop's ResourceManager console, find the application (i.e. the cluster) and kill it directly.

2: Use the stop command printed in the start-up log:

echo "stop" | ./yarn-session.sh -id application_1770995005496_0001

3: Per-Job Mode Deployment

1: Per-Job Deployment in Detail

In a YARN environment, with an external platform handling resource scheduling, we can also submit a single job directly to YARN, which starts a Flink cluster for it.

That is, unlike yarn session, there are no longer two steps (create the yarn session cluster, then submit the job).

Submit the job:

[root@bigdata137 bin]# ./flink run -t yarn-per-job -c com.dashu.worldcount.wordCountUnboundedStream ../lib/flink170-1.0-SNAPSHOT.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-14 13:58:09,093 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 13:58:09,149 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 13:58:09,442 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 13:58:09,461 WARN  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Job Clusters are deprecated since Flink 1.15. Please use an Application Cluster/Application Mode instead.
2026-02-14 13:58:09,663 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-14 13:58:09,664 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-14 13:58:09,817 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-14 13:58:09,818 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-14 13:58:16,562 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-14 13:58:16,567 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1771048519804_0001
2026-02-14 13:58:16,965 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1771048519804_0001
2026-02-14 13:58:16,965 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-14 13:58:17,009 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-14 13:58:25,393 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-14 13:58:25,394 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata139:8081 of application 'application_1771048519804_0001'.
Job has been submitted with JobID 17e4520479aadb5dcdbe31bbe080bf4d

Here we submitted the task and an application, i.e. a Flink cluster, was created for it. But because the listener on port 7777 of host 137 was not running, the job failed to start and the cluster shut down.

Next we open the listener and submit again:

2: Stopping a Per-Job Deployment

1: Cancel directly in the Flink web UI (after a per-job deployment is stopped, the application shuts down and its resources are reclaimed).

2: Cancel the job via the command line:

[root@bigdata137 bin]# flink list -t yarn-per-job -Dyarn.application.id=application_1771048519804_0005
bash: flink: command not found...
Similar command is: 'link'
[root@bigdata137 bin]# ./flink list -t yarn-per-job -Dyarn.application.id=application_1771048519804_0005
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-14 14:28:18,433 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 14:28:18,525 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 14:28:18,670 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 14:28:18,762 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata138:8081 of application 'application_1771048519804_0005'.
Waiting for response...
------------------ Running/Restarting Jobs -------------------
14.02.2026 14:18:34 : aa315a1351ef01f19a1e178b457adf4d : Flink Streaming Job (RUNNING)
--------------------------------------------------------------
No scheduled jobs.
[root@bigdata137 bin]# ./flink cancel -t yarn-per-job -Dyarn.application.id=application_1771048519804_0005 aa315a1351ef01f19a1e178b457adf4d
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Cancelling job aa315a1351ef01f19a1e178b457adf4d.
2026-02-14 14:30:29,759 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 14:30:29,852 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 14:30:30,004 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 14:30:30,094 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata138:8081 of application 'application_1771048519804_0005'.
Cancelled job aa315a1351ef01f19a1e178b457adf4d.
[root@bigdata137 bin]#

3: Cancelling from the command line works just as well as cancelling from the web UI.

4: Application Mode Deployment

1: Application-Mode Deployment in Detail

Application mode is just as simple and very similar to per-job mode: just run the flink run-application command.

The difference is that the application's main method is no longer executed by the client but on the JobManager inside the cluster, which saves the client's network bandwidth.
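Side by side, the three YARN submission styles used in this part differ only in the flink invocation. A dry-run helper for comparison (class, jar, and address are the ones from this walkthrough; submit_cmd is our own name):

```shell
# Print the submission command for each YARN deployment style.
JAR=../lib/flink170-1.0-SNAPSHOT.jar
CLASS=com.dashu.worldcount.wordCountUnboundedStream

submit_cmd() {
  case $1 in
    # Session: target the running session's JobManager address directly.
    session)     echo "./flink run -m 192.168.67.138:8081 -c $CLASS $JAR" ;;
    # Per-job: one cluster per job (deprecated since Flink 1.15).
    per-job)     echo "./flink run -t yarn-per-job -c $CLASS $JAR" ;;
    # Application: main() runs on the cluster-side JobManager.
    application) echo "./flink run-application -t yarn-application -c $CLASS $JAR" ;;
    *)           echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```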

Submit the job:

[root@bigdata137 bin]# ./flink run-application -t yarn-application -c com.dashu.worldcount.wordCountUnboundedStream ../lib/flink170-1.0-SNAPSHOT.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-14 15:01:16,657 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 15:01:16,706 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 15:01:16,897 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 15:01:17,037 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-14 15:01:17,037 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-14 15:01:17,087 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-14 15:01:17,088 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-14 15:01:20,778 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-14 15:01:20,782 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1771048519804_0007
2026-02-14 15:01:20,832 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1771048519804_0007
2026-02-14 15:01:20,832 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-14 15:01:20,834 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-14 15:01:26,158 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-14 15:01:26,159 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata139:8081 of application 'application_1771048519804_0007'.
[root@bigdata137 bin]#

2: Stopping an application-mode job

1: Cancel the job on the Flink web UI

2: List and cancel jobs

List or cancel jobs from the command line:

./flink list -t yarn-application -Dyarn.application.id=application_XXXX_YY

./flink cancel -t yarn-application -Dyarn.application.id=application_XXXX_YY <jobId>
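As a minimal sketch, the two commands can be parameterized by application id and job id. The helper functions below are our own (not part of the Flink CLI); they only assemble the command strings, and the ids are placeholders taken from this article's logs:

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that assemble the list/cancel
# command lines shown above. Nothing here talks to YARN or Flink.
build_list_cmd() {
  printf './flink list -t yarn-application -Dyarn.application.id=%s\n' "$1"
}
build_cancel_cmd() {
  printf './flink cancel -t yarn-application -Dyarn.application.id=%s %s\n' "$1" "$2"
}

# Placeholder ids from the logs earlier in this article.
LIST_CMD=$(build_list_cmd "application_1771048519804_0007")
CANCEL_CMD=$(build_cancel_cmd "application_1771048519804_0007" "aa315a1351ef01f19a1e178b457adf4d")
echo "$LIST_CMD"
echo "$CANCEL_CMD"
```

In practice you would copy the application id from the submission log (the "Submitted application ..." line) or from the YARN web UI.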

3: Submitting with dependencies pre-uploaded to HDFS

The yarn.provided.lib.dirs configuration option can point to a remote directory to which Flink's dependencies have already been uploaded.

In yarn-application mode, YARN's own mechanism uploads the Flink dependencies and the job jar to HDFS on every submission and then reads them back from HDFS. Uploading them to HDFS ourselves once, in advance, saves that bandwidth.

With this approach, Flink's own dependencies and the user jar are uploaded to HDFS ahead of time instead of being shipped to the cluster on each submission, which makes job submission much more lightweight.

This option is not limited to application mode; it also works with per-job mode and the other deployment targets.
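For example, the same pre-uploaded library directory could be reused from per-job mode. The sketch below only assembles the command string (the host, paths, and class name are the ones used in this article; whether you actually run it against a live cluster is up to you):

```shell
#!/bin/sh
# Sketch only: builds (but does not execute) a submission command that
# reuses the dependencies already uploaded to /flink-dist on HDFS.
TARGET="yarn-per-job"   # could also be yarn-application or yarn-session
LIB_DIRS="hdfs://bigdata137:8020/flink-dist"
CMD="./flink run -t ${TARGET} -Dyarn.provided.lib.dirs=${LIB_DIRS} -c com.dashu.worldcount.wordCountUnboundedStream ../lib/flink170-1.0-SNAPSHOT.jar"
echo "$CMD"
```

Note that with per-job mode the user jar is typically still a local path; only the Flink framework dependencies come from the shared HDFS directory.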

1: Upload the jars to HDFS

[root@bigdata137 bin]# hadoop fs -mkdir /flink-dist
[root@bigdata137 flink-1.17.0]# hadoop fs -put lib/ /flink-dist
[root@bigdata137 flink-1.17.0]# hadoop fs -put plugins/ /flink-dist
[root@bigdata137 flink-1.17.0]# hadoop fs -mkdir /flink-jars
[root@bigdata137 flink-1.17.0]# hadoop fs -put flink170-1.0-SNAPSHOT.jar /flink-jars
[root@bigdata137 flink-1.17.0]#

2: Submit the job

[root@bigdata137 bin]# ./flink run-application -t yarn-application -Dyarn.provided.lib.dirs="hdfs://bigdata137:8020/flink-dist" -c com.dashu.worldcount.wordCountUnboundedStream hdfs://bigdata137:8020/flink-jars/flink170-1.0-SNAPSHOT.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-14 16:12:08,743 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 16:12:08,796 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 16:12:08,981 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 16:12:09,121 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-14 16:12:09,122 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-14 16:12:09,171 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-14 16:12:09,171 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-14 16:12:09,974 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-14 16:12:09,980 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1771048519804_0010
2026-02-14 16:12:10,028 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1771048519804_0010
2026-02-14 16:12:10,028 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-14 16:12:10,031 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-14 16:12:14,358 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-14 16:12:14,359 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata138:8081 of application 'application_1771048519804_0010'.
[root@bigdata137 bin]#