Multiple ways to start Flink locally

Application mode: submitting to YARN from code

// Set up the YARN client
YarnClient yarnClient = ;
Configuration configuration = new Configuration();
if (customConfiguration != null) {
  configuration.addAll(customConfiguration);
}
configuration.set(JobManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024m"));
configuration.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024m"));
configuration.set(DeploymentOptions.TARGET, YarnDeploymentTarget.APPLICATION.getName());
// Path to flink-dist-???.jar
String distPath = ;
configuration.set(YarnConfigOptions.FLINK_DIST_JAR, distPath);
// Path to the jar to execute
String examplePath = ;
configuration.set(PipelineOptions.JARS, Collections.singletonList(examplePath));
FileSystem fileSystem = FileSystem.get(hadoopClusterTest.getConfig());
// Flink lib directory
String dirPath = ;
// Upload the Flink lib jars to HDFS
fileSystem.copyFromLocalFile(new Path(dirPath), new Path(dirPath));
configuration.set(YarnConfigOptions.PROVIDED_LIB_DIRS, Collections.singletonList(dirPath));
setIfAbsent(configuration, PipelineOptions.JARS, new ArrayList<>());
YarnConfiguration yarnConfiguration = new YarnConfiguration();
YarnClientYarnClusterInformationRetriever yarnClientYarnClusterInformationRetriever =
  YarnClientYarnClusterInformationRetriever.create(yarnClient);
YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor(
  configuration,
  yarnConfiguration,
  yarnClient,
  yarnClientYarnClusterInformationRetriever,
  true
);
ClusterSpecification clusterSpecification = new ClusterSpecification
  .ClusterSpecificationBuilder()
  .setSlotsPerTaskManager(1)
  .createClusterSpecification();

ApplicationConfiguration applicationConfiguration = new ApplicationConfiguration(
  new String[]{},
  // fully qualified name of the class to execute
);
try {
  // Start the application cluster
  yarnClusterDescriptor.deployApplicationCluster(clusterSpecification, applicationConfiguration);
} catch (ClusterDeploymentException e) {
  e.printStackTrace();
}
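
The listing above leaves the creation of the YarnClient open. As a minimal sketch, assuming the Hadoop YARN client library is on the classpath and that the cluster settings are picked up from the yarn-site.xml visible to YarnConfiguration, a client is commonly created, initialized, and started as follows. The class and method names YarnClientSketch and createStartedClient are hypothetical and only serve the illustration; the original author may have obtained the client differently.

```java
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical helper class, not part of the original listing.
class YarnClientSketch {

    // Creates a YarnClient, initializes it with the given configuration and starts it.
    static YarnClient createStartedClient(YarnConfiguration yarnConfiguration) {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(yarnConfiguration);
        yarnClient.start();
        return yarnClient;
    }

    public static void main(String[] args) {
        // Assumption: yarn-site.xml on the classpath describes the target cluster.
        YarnClient yarnClient = createStartedClient(new YarnConfiguration());
        System.out.println("YarnClient state: " + yarnClient.getServiceState());
        // Stop the client once it is no longer needed.
        yarnClient.stop();
    }
}
```

The started client can then be passed both to YarnClientYarnClusterInformationRetriever.create(...) and to the YarnClusterDescriptor constructor, as in the listing above.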

Session mode: submitting to YARN from code

public class YarnFlinkSessionTest {
    ClusterClient<ApplicationId> clusterClient;
    @Test
    void test() throws ExecutionException, InterruptedException {
        YarnClient yarnClient = // create the YARN client
        Configuration configuration = new Configuration();
        configuration.set(JobManagerOptions.TOTAL_PROCESS_MEMORY,
            MemorySize.parse("1024m"));
        configuration.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY,
            MemorySize.parse("1024m"));
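        // Placeholder path: Java does not expand ${FLINK_HOME}, so substitute the real location of flink-dist
        configuration.set(YarnConfigOptions.FLINK_DIST_JAR, "${FLINK_HOME}/lib/flink-dist-1.16.2.jar");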
        YarnConfiguration yarnConfiguration = new YarnConfiguration();
        YarnClientYarnClusterInformationRetriever yarnClientYarnClusterInformationRetriever =
            YarnClientYarnClusterInformationRetriever.create(yarnClient);
        YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor(
            configuration,
            yarnConfiguration,
            yarnClient,
            yarnClientYarnClusterInformationRetriever,
            true
        );
        ClusterSpecification clusterSpecification = new ClusterSpecification.ClusterSpecificationBuilder()
            .setMasterMemoryMB(1024)
            .setTaskManagerMemoryMB(1024)
            .setSlotsPerTaskManager(1)
            .createClusterSpecification();
        try {
            ClusterClientProvider<ApplicationId> applicationIdClusterClientProvider = yarnClusterDescriptor.deploySessionCluster(clusterSpecification);
            clusterClient = applicationIdClusterClientProvider.getClusterClient();
        } catch (ClusterDeploymentException e) {
            e.printStackTrace();
        }
        Thread.sleep(10000000);
    }
}

In its start method, MiniCluster starts the QueryService, RPCService, Zookeeper, BlobServer, TaskManager, DispatcherLeader, ResourceManager, DispatcherGateway, and WebMonitor, which communicate with each other over RPC.

Once MiniCluster has started, call submitJob to submit the job.

RpcTaskManagerGateway, TaskExecutor
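
The start-then-submit sequence described above can be sketched as follows. This is a minimal illustration, assuming Flink 1.16 to 1.20 with flink-streaming-java and flink-runtime on the classpath; the class name MiniClusterSketch, the method runOnMiniCluster, and the three-element test pipeline are hypothetical. Two settings are choices made for this sketch and not taken from the text: the parallelism is set to 1 so that the job fits into the single slot, and the REST bind port is set to 0 so that a free port is chosen.

```java
import org.apache.flink.api.common.JobID;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.jobmaster.JobResult;
import org.apache.flink.runtime.minicluster.MiniCluster;
import org.apache.flink.runtime.minicluster.MiniClusterConfiguration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical example class, not part of the original notes.
class MiniClusterSketch {

    // Starts a MiniCluster, submits a tiny job and waits for its result.
    static JobResult runOnMiniCluster() throws Exception {
        // Build a trivial pipeline and translate it into a JobGraph.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1); // one slot is enough for the whole job
        env.fromElements(1, 2, 3).print();
        JobGraph jobGraph = env.getStreamGraph().getJobGraph();

        Configuration configuration = new Configuration();
        configuration.set(RestOptions.BIND_PORT, "0"); // let the REST endpoint pick a free port

        MiniClusterConfiguration miniClusterConfiguration = new MiniClusterConfiguration.Builder()
            .setConfiguration(configuration)
            .setNumTaskManagers(1)
            .setNumSlotsPerTaskManager(1)
            .build();

        // close() shuts the cluster down when the block exits.
        try (MiniCluster miniCluster = new MiniCluster(miniClusterConfiguration)) {
            miniCluster.start();
            // submitJob returns as soon as the job is accepted.
            JobID jobId = miniCluster.submitJob(jobGraph).get().getJobID();
            // requestJobResult completes once the job has reached a terminal state.
            return miniCluster.requestJobResult(jobId).get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Job succeeded: " + runOnMiniCluster().isSuccess());
    }
}
```

Because submitJob is asynchronous, the sketch waits on requestJobResult before closing the cluster; otherwise the try block could shut the cluster down while the job is still running.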

Starting Flink in local Standalone mode from the command line

Run a job:

./bin/flink run ./examples/streaming/TopSpeedWindowing.jar

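  1. This command invokes the CliFrontend.main() method.

  2. CliFrontend.main() then calls its internal run() method, which in turn calls the internal executeProgram() method.

  3. CliFrontend.executeProgram() then calls ClientUtils.executeProgram().

  4. Finally, Flink is started through the main method of StandaloneSessionClusterEntrypoint.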

When RestServerEndpoint runs its start() method, it registers the Netty ChannelHandlers; the concrete handler types and their implementations can be found in WebMonitorEndpoint.
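
Each handler registered this way serves one REST path, so a quick way to see them in action is to call the REST API of a running cluster. The following is a minimal sketch using only the JDK HTTP client (Java 11 or later), assuming a cluster whose REST endpoint listens on http://localhost:8081, the default REST port. The class name FlinkRestProbe and its helper methods are hypothetical; /overview and /jobs/overview are two paths of Flink's monitoring REST API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical example class, not part of the original notes.
class FlinkRestProbe {

    // Joins the base URL and a REST path with exactly one slash between them.
    static URI endpoint(String baseUrl, String path) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        String suffix = path.startsWith("/") ? path : "/" + path;
        return URI.create(base + suffix);
    }

    // Sends a GET request and returns the response body (JSON for Flink's REST API).
    static String get(URI uri) throws Exception {
        HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
            .timeout(Duration.ofSeconds(10))
            .GET()
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("GET " + uri + " returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the REST endpoint address; pass another base URL as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8081";
        System.out.println(get(endpoint(baseUrl, "/overview")));
        System.out.println(get(endpoint(baseUrl, "/jobs/overview")));
    }
}
```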

JobMaster::onStart -> JobMaster::startJobExecution

Official documentation on starting from the command line

yarn: https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/resource-providers/yarn/

kubernetes: https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/resource-providers/native_kubernetes/

standalone: https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/resource-providers/standalone/overview/
