Flink from Beginner to Expert, Part 5: Flink Cluster Deployment Modes

I: Deployment Modes

Some application scenarios have particular requirements for how cluster resources are allocated and occupied. Flink provides different deployment modes for these scenarios. There are three main ones:

session mode, per-job mode, and application mode.

They differ mainly in the lifecycle of the cluster, the way resources are allocated, and where the application's main method actually runs: on the client or on the JobManager.

1: Session Mode

Session mode is the most intuitive. We first start a cluster and keep a session alive, then submit jobs to it through a client. All the cluster's resources are fixed when it starts, so every submitted job competes for the resources in the cluster.

Session mode suits a large number of jobs that are individually small and short-running.

2: Per-Job Mode

Because resource sharing in session mode causes many problems, we can isolate resources better by starting a dedicated cluster for each submitted job. This is the per-job mode.

When the job finishes, the cluster shuts down and all its resources are released.

These properties make per-job mode more stable in production, so it has been the preferred mode in real applications. Note that Flink cannot run this way on its own: per-job mode generally needs an external resource management framework, such as YARN or Kubernetes (K8s), to start the cluster. The Flink cluster we deployed earlier runs in session mode.

3: Application Mode

In both of the modes above, the application code is executed on the client, which then submits the job to the JobManager.

But this means the client consumes a lot of network bandwidth downloading dependencies and sending binary data to the JobManager. Since we often submit jobs from the same client, this puts a heavy load on that client's node.

The solution is to drop the client altogether and submit the application directly to the JobManager. This means starting a dedicated JobManager for each submitted application, that is, creating a cluster. This JobManager exists only to execute that one application and shuts down when the application finishes. This is the application mode.

Application mode is the way forward: per-job mode has been marked deprecated in newer versions (since Flink 1.15), and application mode was designed precisely to address its pain points.
The deployment modes discussed here are fairly abstract concepts. In practice they are usually combined with a resource management platform, choosing a particular mode to allocate resources and deploy applications. Next, we look at how Flink is deployed with different resource providers.

II: Standalone Mode

The run mode determines who manages the cluster's resources.

In Standalone mode, Flink manages them itself.

Standalone mode runs independently, without relying on any external resource management platform. Independence has a price, though: if resources run short or a failure occurs, there is no automatic scaling or resource reallocation, and everything must be handled manually. So standalone mode is generally used only for development, testing, or scenarios with very few jobs.

In this run mode, the number of TaskManagers and the number of physical machines are fixed in advance.

1: Session Mode Deployment

What we used in Section 3.2 was exactly a Standalone cluster in session mode.

Start the cluster ahead of time and submit tasks through the web UI client (multiple tasks are possible, but the cluster's resources are fixed).

2: Per-Job Mode

A Flink Standalone cluster does not support per-job deployment, because per-job mode requires a resource management platform.

3: Application Mode

In application mode the cluster is not created in advance, so we cannot call the start-cluster.sh script. Instead, we use standalone-job.sh, also in the bin directory, to create a JobManager.

1: Prepare the environment.

nc -lk 7777

2: Put the job jar into the lib/ directory.

mv FlinkTutorial-1.0-SNAPSHOT.jar lib/

3: Start the JobManager.

./standalone-job.sh start --job-classname com.dahsu.wc.SocketStreamWordCount

Here we specify the job's entry class directly; the script scans all the jars under the lib directory to find it.

4: Start the TaskManager.

./taskmanager.sh start

5: Send some word data to simulate input.
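The steps above, plus tear-down, can be sketched as a single dry-run script. It only prints the commands in order so the sequence can be inspected before touching a real cluster; the install path and entry class are the ones from this walkthrough, and both scripts accept start and stop:

```shell
# Dry-run sketch of the standalone application-mode lifecycle.
# FLINK_HOME and JOB_CLASS are the values used in the walkthrough above.
FLINK_HOME=${FLINK_HOME:-/usr/local/src/flink-1.17.0}
JOB_CLASS=com.dahsu.wc.SocketStreamWordCount

standalone_app_cmds() {
  # Start-up: JobManager first (scans lib/ for the job jar), then TaskManager.
  echo "$FLINK_HOME/bin/standalone-job.sh start --job-classname $JOB_CLASS"
  echo "$FLINK_HOME/bin/taskmanager.sh start"
  # Tear-down mirrors start-up in reverse order.
  echo "$FLINK_HOME/bin/taskmanager.sh stop"
  echo "$FLINK_HOME/bin/standalone-job.sh stop"
}
standalone_app_cmds
```

Replace the echoes with direct invocations once the printed sequence looks right.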

III: YARN Mode

1: Preparing the Environment

Most big-data architectures are built on top of the Hadoop ecosystem.

Hadoop's three pillars are HDFS, YARN, and MapReduce. YARN is the one responsible for resource scheduling and management, and Flink can submit its tasks to YARN to be managed.

Deployment on YARN works like this: the client submits the Flink application to YARN's ResourceManager, and the ResourceManager requests containers from YARN's NodeManagers.

In these containers, Flink deploys JobManager and TaskManager instances, thereby starting the cluster. Flink allocates TaskManager resources dynamically, based on the number of slots required by the jobs running on the JobManager.

This is the advantage of the YARN mode.

Preparation:

1: Install and deploy Hadoop.

2: Make Flink aware of Hadoop. This only requires setting environment variables, which also keeps Hadoop and Flink decoupled.

Edit the environment variables:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/usr/local/src/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
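Before starting anything on YARN it is worth verifying that these variables actually took effect in the current shell. A small sanity-check sketch (the variable list mirrors the export block above; the function name is our own):

```shell
# Fail fast if any Hadoop-integration variable is missing, before trying
# yarn-session.sh or flink run. Prints each missing name and returns 1.
check_hadoop_env() {
  local missing=0
  for v in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR HADOOP_CLASSPATH; do
    if [ -z "$(eval echo "\$$v")" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}
```

Run `check_hadoop_env` right before `yarn-session.sh`; a non-zero exit means the exports above were not sourced in this shell.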

Start Hadoop's HDFS and YARN.

2: Session Mode Deployment

Run the script to request resources from the YARN cluster, open a YARN session, and start a Flink cluster.

The command:

bin/yarn-session.sh -nm test

Available parameters:

-d: Detached mode. Use this if you don't want the Flink YARN client to keep running in the foreground; the YARN session stays up in the background even after you close the current terminal.

-jm (--jobManagerMemory): Memory for the JobManager, in MB by default.

-nm (--name): The application name shown in the YARN UI.

-qu (--queue): The YARN queue to submit to.

-tm (--taskManagerMemory): Memory for each TaskManager.

Since Flink 1.11.0, the -n and -s parameters, which used to set the number of TaskManagers and slots respectively, are no longer used; YARN allocates TaskManagers and slots dynamically on demand. In this sense, even YARN session mode does not pin down the cluster's resources; they too are allocated dynamically.
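Putting these options together, a typical invocation can be assembled like this. The function below only builds and prints the command (a dry run); the memory and queue values are illustrative, not prescriptive:

```shell
# Assemble a yarn-session.sh command from the options documented above.
# Arguments: app name, JobManager MB, TaskManager MB, YARN queue.
build_session_cmd() {
  local name=$1 jm_mb=$2 tm_mb=$3 queue=$4
  echo "bin/yarn-session.sh -d -nm $name -jm $jm_mb -tm $tm_mb -qu $queue"
}
build_session_cmd test 1024 1728 default
```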

[root@bigdata137 bin]# ./yarn-session.sh -nm test
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-13 14:35:16,215 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2026-02-13 14:35:16,220 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.bind-host, 0.0.0.0
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, 192.168.67.137
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1024m
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.bind-address, 0.0.0.0
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.port, 8081
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.bind-host, 0.0.0.0
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.host, 192.168.67.137
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2026-02-13 14:35:16,221 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.address, 192.168.67.137
2026-02-13 14:35:16,420 WARN  org.apache.hadoop.util.NativeCodeLoader                      [] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2026-02-13 14:35:16,445 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Hadoop user set to root (auth:SIMPLE)
2026-02-13 14:35:16,445 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Kerberos security is disabled.
2026-02-13 14:35:16,455 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-2140832887329830020.conf.
2026-02-13 14:35:16,484 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-13 14:35:16,531 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-13 14:35:16,722 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 14:35:16,737 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (172.800mb (181193935 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 14:35:16,860 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-13 14:35:16,860 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-13 14:35:16,932 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-13 14:35:16,932 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-13 14:35:16,989 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-datadog
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: external-resource-gpu
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-graphite
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-influx
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-slf4j
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-prometheus
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-statsd
2026-02-13 14:35:16,992 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-jmx
2026-02-13 14:35:21,040 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 14:35:21,051 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-13 14:35:21,055 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1770964001601_0001
2026-02-13 14:35:21,385 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1770964001601_0001
2026-02-13 14:35:21,385 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-13 14:35:21,388 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-13 14:35:26,197 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-13 14:35:26,198 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata139:8081 of application 'application_1770964001601_0001'.
JobManager Web Interface: http://bigdata139:8081

At this point Flink and the YARN cluster are truly wired together. In session mode, the cluster is started first and is managed by YARN as a single application; in other words, resources are no longer allocated by Flink itself but by YARN.

At start-up a YARN application is launched; that application effectively acts as the cluster, and all subsequent jobs run on it.

After that, we submit a real job:

1: Submitting a Job via the UI

Before submitting this job, port 7777 on host 137 must already be listening; otherwise the job cannot connect to the port and fails automatically.

[root@bigdata137 bin]# nc -lt 7777
big shit
hello java
hello flink
hello hadoop
hello baby
hello shit
hello aolige

Let's check the execution result:

In YARN session mode, resources are also allocated dynamically: with no job running, the Overview page shows 0 TaskManagers; after the job runs, it becomes 1.
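One way to watch this dynamic allocation from the shell is to poll the session's REST API. /taskmanagers is a standard Flink REST endpoint; the parsing below is deliberately crude (dependency-free) and assumes each TaskManager entry carries exactly one "id" key:

```shell
# Count the TaskManagers registered with a Flink session.
count_tm_ids() {   # reads a /taskmanagers JSON payload on stdin
  grep -o '"id"' | wc -l
}
tm_count() {       # e.g. tm_count bigdata139:8081 (the Web Interface address)
  curl -s "http://$1/taskmanagers" | count_tm_ids
}
```

Calling `tm_count` before and after submitting the job should show the count go from 0 to 1.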

[root@bigdata138 cuillei]# jpsall
=============== bigdata137 ===============
49904 YarnTaskExecutorRunner
36513 DataNode
36498 Jps
36283 NameNode
36910 NodeManager
=============== bigdata138 ===============
13333 ResourceManager
13094 DataNode
16614 YarnSessionClusterEntrypoint
13596 NodeManager
27887 Jps
=============== bigdata139 ===============
17313 Jps
11366 DataNode
11590 NodeManager
11498 SecondaryNameNode
[root@bigdata138 cuillei]#

If we now cancel the job manually, the TaskManager's container is released again:

[root@bigdata137 bin]# jpsall
=============== bigdata137 ===============
36513 DataNode
37333 Jps
36283 NameNode
36910 NodeManager
=============== bigdata138 ===============
13333 ResourceManager
13094 DataNode
16614 YarnSessionClusterEntrypoint
13596 NodeManager
33263 Jps
=============== bigdata139 ===============
11366 DataNode
11590 NodeManager
17561 Jps
11498 SecondaryNameNode

In YARN session mode, all of Flink's core processes, including the session itself and the tasks submitted to it, run inside containers that YARN allocates for you. Let's break this runtime architecture down to make it completely clear:

Process layout in YARN session mode (everything runs in containers)

YARN is essentially a resource scheduler. It does not run programs directly; instead it allocates them containers (think of a container as an independent resource unit in the YARN cluster, holding a specified amount of CPU and memory), and all Flink processes run inside these containers:

JobManager (ApplicationMaster), in 1 dedicated container allocated by YARN: (1) it acts as YARN's ApplicationMaster (AM) and requests resources from YARN; (2) it acts as the brain of the Flink session, receiving and scheduling tasks; (3) this container is the core of the session: as long as it is alive, the session is alive.

TaskManager, in N containers allocated by YARN (since Flink 1.11 the number is determined dynamically by the jobs' slot demand rather than fixed at start-up): (1) it runs the tasks' concrete operators (map, reduce, window computations, and so on); (2) each TaskManager gets its own container, so resources are isolated; (3) every task submitted to the session ultimately executes inside these TaskManager containers.

In YARN session mode, every subsequently submitted Flink task competes for the resources (CPU, memory, slots) of this one session application; when the existing TaskManager containers run out of slots, the session requests additional containers from YARN on demand.

2: Submitting a Job from the Command Line

The key points have basically all been covered above; here we simply try submitting a job under YARN session mode from the command line.

Submit the job via the command line:

Check the status before submitting:

Then we stop the Hadoop cluster, start it again, and open a YARN session:

[root@bigdata137 bin]# yarn-session.sh -d -nm flinkdemo
bash: yarn-session.sh: command not found...
[root@bigdata137 bin]# ./yarn-session.sh -d -nm flinkdemo
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-13 23:06:38,973 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.bind-host, 0.0.0.0
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, 192.168.67.137
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1024m
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.bind-address, 0.0.0.0
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.port, 8081
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.bind-host, 0.0.0.0
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.host, 192.168.67.137
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2026-02-13 23:06:38,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.address, 192.168.67.137
2026-02-13 23:06:38,993 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Found Yarn properties file under /tmp/.yarn-properties-root.
2026-02-13 23:06:39,162 WARN  org.apache.hadoop.util.NativeCodeLoader                      [] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2026-02-13 23:06:39,178 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Hadoop user set to root (auth:SIMPLE)
2026-02-13 23:06:39,179 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Kerberos security is disabled.
2026-02-13 23:06:39,186 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-776283340901102364.conf.
2026-02-13 23:06:39,203 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-13 23:06:39,243 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-13 23:06:39,418 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 23:06:39,427 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (172.800mb (181193935 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 23:06:39,536 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-13 23:06:39,536 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-13 23:06:39,604 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-13 23:06:39,604 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-13 23:06:39,635 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-datadog
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: external-resource-gpu
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-graphite
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-influx
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-slf4j
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-prometheus
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-statsd
2026-02-13 23:06:39,637 INFO  org.apache.flink.core.plugin.DefaultPluginManager            [] - Plugin loader with ID not found, creating it: metrics-jmx
2026-02-13 23:06:43,624 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2026-02-13 23:06:43,633 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-13 23:06:43,638 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1770995005496_0001
2026-02-13 23:06:43,971 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1770995005496_0001
2026-02-13 23:06:43,971 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-13 23:06:44,002 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-13 23:06:48,570 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-13 23:06:48,571 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata138:8081 of application 'application_1770995005496_0001'.
JobManager Web Interface: http://bigdata138:8081
2026-02-13 23:06:48,748 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - The Flink YARN session cluster has been started in detached mode. In order to stop Flink gracefully, use the following command:
$ echo "stop" | ./bin/yarn-session.sh -id application_1770995005496_0001
If this should not be possible, then you can also kill Flink via YARN's web interface or via:
$ yarn application -kill application_1770995005496_0001
Note that killing Flink might not clean up all job artifacts and temporary files.
[root@bigdata137 bin]# jpsall
=============== bigdata137 ===============
90326 DataNode
90731 NodeManager
90092 NameNode
92575 Jps
=============== bigdata138 ===============
37271 YarnSessionClusterEntrypoint
36266 ResourceManager
37354 Jps
36460 NodeManager
36030 DataNode
=============== bigdata139 ===============
20630 NodeManager
20537 SecondaryNameNode
20925 Jps
20415 DataNode

Next we submit the task from the command line:

Before submitting the task, start the listener on the 137 machine:

nc -lk 7777

With the service listening on port 7777, the job won't fail with "connection refused" at start-up.

The yarn session start-up output above tells us that jobs should be submitted to bigdata138:8081.

First put our job jar under lib/, then submit the task:

[root@bigdata137 bin]# ./flink run -m 192.168.67.138:8081 -c com.dashu.worldcount.wordCountUnboundedStream ../lib/flink170-1.0-SNAPSHOT.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-13 23:09:32,706 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Found Yarn properties file under /tmp/.yarn-properties-root.
2026-02-13 23:09:32,706 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Found Yarn properties file under /tmp/.yarn-properties-root.
Job has been submitted with JobID e591cdd8d259fd2202a888d6844185bd
^C[root@bigdata137 bin]#

(In the web UI overview, the counters that were previously all 0 have now changed.)

Send data and check the final result:

3: Notes

Why does a command-line submission default to the YARN session even when no IP and port are given?

[root@bigdata137 tmp]# cat .yarn-properties-root
#Generated YARN properties file
#Fri Feb 13 23:06:48 CST 2026
dynamicPropertiesString=
applicationID=application_1770995005496_0001
[root@bigdata137 tmp]#

Because it was recorded in a temporary file. When a task is submitted without an explicit IP and port, the client first looks it up in this file, and therefore matches the YARN session by default; only if it is absent does the submission fall back to the standalone session cluster.
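The client's lookup can be reproduced by hand with a couple of lines of shell. The path pattern /tmp/.yarn-properties-&lt;user&gt; matches the file shown above; the function name is our own:

```shell
# Extract the applicationID the Flink client would use by default.
yarn_session_app_id() {
  local props=${1:-/tmp/.yarn-properties-$(whoami)}
  # Lines look like: applicationID=application_1770995005496_0001
  sed -n 's/^applicationID=//p' "$props"
}
```

With the session above running, `yarn_session_app_id` prints application_1770995005496_0001.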

In YARN session mode, cancelling a task does not affect the cluster. So what if we want to shut the cluster itself down?

1: In Hadoop's ResourceManager console, find the application (i.e. the cluster) and kill it directly.

2: Use the stop command printed in the start-up log:

echo "stop" | ./yarn-session.sh -id application_1770995005496_0001

3: Per-Job Mode Deployment

1: Per-Job Deployment in Detail

In a YARN environment, with an external platform handling resource scheduling, we can also submit a single job directly to YARN, which starts a Flink cluster for it.

That is, unlike yarn session, there are no longer two steps (create the yarn session cluster, then submit the job).

Submit the job:

[root@bigdata137 bin]# ./flink run -t yarn-per-job -c com.dashu.worldcount.wordCountUnboundedStream ../lib/flink170-1.0-SNAPSHOT.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-14 13:58:09,093 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 13:58:09,149 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 13:58:09,442 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 13:58:09,461 WARN  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Job Clusters are deprecated since Flink 1.15. Please use an Application Cluster/Application Mode instead.
2026-02-14 13:58:09,663 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-14 13:58:09,664 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-14 13:58:09,817 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-14 13:58:09,818 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-14 13:58:16,562 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-14 13:58:16,567 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1771048519804_0001
2026-02-14 13:58:16,965 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1771048519804_0001
2026-02-14 13:58:16,965 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-14 13:58:17,009 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-14 13:58:25,393 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-14 13:58:25,394 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata139:8081 of application 'application_1771048519804_0001'.
Job has been submitted with JobID 17e4520479aadb5dcdbe31bbe080bf4d

Here we submitted the task and an application, i.e. a Flink cluster, was created for it. But because the listener on port 7777 of host 137 was not running, the job failed to start and the cluster shut down.

Next we open the listener and submit again:

2: Stopping a Per-Job Deployment

1: Cancel directly in the Flink web UI (after a per-job deployment is stopped, the application shuts down and its resources are reclaimed).

2: Cancel the job via the command line:

[root@bigdata137 bin]# flink list -t yarn-per-job -Dyarn.application.id=application_1771048519804_0005
bash: flink: command not found...
Similar command is: 'link'
[root@bigdata137 bin]# ./flink list -t yarn-per-job -Dyarn.application.id=application_1771048519804_0005
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-14 14:28:18,433 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 14:28:18,525 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 14:28:18,670 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 14:28:18,762 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata138:8081 of application 'application_1771048519804_0005'.
Waiting for response...
------------------ Running/Restarting Jobs -------------------
14.02.2026 14:18:34 : aa315a1351ef01f19a1e178b457adf4d : Flink Streaming Job (RUNNING)
--------------------------------------------------------------
No scheduled jobs.
[root@bigdata137 bin]# ./flink cancel -t yarn-per-job -Dyarn.application.id=application_1771048519804_0005 aa315a1351ef01f19a1e178b457adf4d
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Cancelling job aa315a1351ef01f19a1e178b457adf4d.
2026-02-14 14:30:29,759 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 14:30:29,852 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 14:30:30,004 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 14:30:30,094 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata138:8081 of application 'application_1771048519804_0005'.
Cancelled job aa315a1351ef01f19a1e178b457adf4d.
[root@bigdata137 bin]#

3: Cancelling from the command line works just as well as cancelling from the web UI.

4: Application Mode Deployment

1: Application-Mode Deployment in Detail

Application mode is just as simple and very similar to per-job mode: just run the flink run-application command.

The difference is that the application's main method is no longer executed by the client but on the JobManager inside the cluster, which saves the client's network bandwidth.
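Side by side, the three YARN submission styles used in this part differ only in the flink invocation. A dry-run helper for comparison (class, jar, and address are the ones from this walkthrough; submit_cmd is our own name):

```shell
# Print the submission command for each YARN deployment style.
JAR=../lib/flink170-1.0-SNAPSHOT.jar
CLASS=com.dashu.worldcount.wordCountUnboundedStream

submit_cmd() {
  case $1 in
    # Session: target the running session's JobManager address directly.
    session)     echo "./flink run -m 192.168.67.138:8081 -c $CLASS $JAR" ;;
    # Per-job: one cluster per job (deprecated since Flink 1.15).
    per-job)     echo "./flink run -t yarn-per-job -c $CLASS $JAR" ;;
    # Application: main() runs on the cluster-side JobManager.
    application) echo "./flink run-application -t yarn-application -c $CLASS $JAR" ;;
    *)           echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```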

Submit the job:

[root@bigdata137 bin]# ./flink run-application -t yarn-application -c com.dashu.worldcount.wordCountUnboundedStream ../lib/flink170-1.0-SNAPSHOT.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-14 15:01:16,657 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 15:01:16,706 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 15:01:16,897 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 15:01:17,037 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-14 15:01:17,037 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-14 15:01:17,087 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-14 15:01:17,088 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-14 15:01:20,778 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-14 15:01:20,782 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1771048519804_0007
2026-02-14 15:01:20,832 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1771048519804_0007
2026-02-14 15:01:20,832 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-14 15:01:20,834 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-14 15:01:26,158 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-14 15:01:26,159 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata139:8081 of application 'application_1771048519804_0007'.
[root@bigdata137 bin]#

2: Stopping an application-mode job

1: Cancel the job on the Flink web UI

2: List and cancel jobs

List or cancel jobs from the command line:

./flink list -t yarn-application -Dyarn.application.id=application_XXXX_YY

./flink cancel -t yarn-application -Dyarn.application.id=application_XXXX_YY <jobId>
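As a minimal sketch, the two commands can be parameterized by application id and job id. The helper functions below are our own (not part of the Flink CLI); they only assemble the command strings, and the ids are placeholders taken from this article's logs:

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that assemble the list/cancel
# command lines shown above. Nothing here talks to YARN or Flink.
build_list_cmd() {
  printf './flink list -t yarn-application -Dyarn.application.id=%s\n' "$1"
}
build_cancel_cmd() {
  printf './flink cancel -t yarn-application -Dyarn.application.id=%s %s\n' "$1" "$2"
}

# Placeholder ids from the logs earlier in this article.
LIST_CMD=$(build_list_cmd "application_1771048519804_0007")
CANCEL_CMD=$(build_cancel_cmd "application_1771048519804_0007" "aa315a1351ef01f19a1e178b457adf4d")
echo "$LIST_CMD"
echo "$CANCEL_CMD"
```

In practice you would copy the application id from the submission log (the "Submitted application ..." line) or from the YARN web UI.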

3: Submitting with dependencies pre-uploaded to HDFS

The yarn.provided.lib.dirs configuration option can point to a remote directory to which Flink's dependencies have already been uploaded.

In yarn-application mode, YARN's own mechanism uploads the Flink dependencies and the job jar to HDFS on every submission and then reads them back from HDFS. Uploading them to HDFS ourselves once, in advance, saves that bandwidth.

With this approach, Flink's own dependencies and the user jar are uploaded to HDFS ahead of time instead of being shipped to the cluster on each submission, which makes job submission much more lightweight.

This option is not limited to application mode; it also works with per-job mode and the other deployment targets.
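For example, the same pre-uploaded library directory could be reused from per-job mode. The sketch below only assembles the command string (the host, paths, and class name are the ones used in this article; whether you actually run it against a live cluster is up to you):

```shell
#!/bin/sh
# Sketch only: builds (but does not execute) a submission command that
# reuses the dependencies already uploaded to /flink-dist on HDFS.
TARGET="yarn-per-job"   # could also be yarn-application or yarn-session
LIB_DIRS="hdfs://bigdata137:8020/flink-dist"
CMD="./flink run -t ${TARGET} -Dyarn.provided.lib.dirs=${LIB_DIRS} -c com.dashu.worldcount.wordCountUnboundedStream ../lib/flink170-1.0-SNAPSHOT.jar"
echo "$CMD"
```

Note that with per-job mode the user jar is typically still a local path; only the Flink framework dependencies come from the shared HDFS directory.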

1: Upload the jars to HDFS

[root@bigdata137 bin]# hadoop fs -mkdir /flink-dist
[root@bigdata137 flink-1.17.0]# hadoop fs -put lib/ /flink-dist
[root@bigdata137 flink-1.17.0]# hadoop fs -put plugins/ /flink-dist
[root@bigdata137 flink-1.17.0]# hadoop fs -mkdir /flink-jars
[root@bigdata137 flink-1.17.0]# hadoop fs -put flink170-1.0-SNAPSHOT.jar /flink-jars
[root@bigdata137 flink-1.17.0]#

2: Submit the job

[root@bigdata137 bin]# ./flink run-application -t yarn-application -Dyarn.provided.lib.dirs="hdfs://bigdata137:8020/flink-dist" -c com.dashu.worldcount.wordCountUnboundedStream hdfs://bigdata137:8020/flink-jars/flink170-1.0-SNAPSHOT.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/flink-1.17.0/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-02-14 16:12:08,743 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/usr/local/src/flink-1.17.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2026-02-14 16:12:08,796 INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - Connecting to ResourceManager at bigdata138/192.168.67.138:8032
2026-02-14 16:12:08,981 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2026-02-14 16:12:09,121 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2026-02-14 16:12:09,122 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2026-02-14 16:12:09,171 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2026-02-14 16:12:09,171 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2026-02-14 16:12:09,974 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cannot use kerberos delegation token manager, no valid kerberos credentials provided.
2026-02-14 16:12:09,980 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1771048519804_0010
2026-02-14 16:12:10,028 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1771048519804_0010
2026-02-14 16:12:10,028 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2026-02-14 16:12:10,031 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2026-02-14 16:12:14,358 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2026-02-14 16:12:14,359 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface bigdata138:8081 of application 'application_1771048519804_0010'.
[root@bigdata137 bin]#