flink StreamGraph 构造flink任务

文章目录

背景

通常使用flink 提供的高级算子来编写flink 任务,对底层不是很了解,尤其是如何生成作业图的细节

下面通过构造一个有向无环图,来实际看一下

主要步骤

1.增加source

2.增加operator

  1. 增加一条边,连接source和operator

  2. 增加sink

  3. 增加一条边,连接operator和sink

代码

bash 复制代码
 // Step 1: Create basic configurations
        Configuration configuration = new Configuration();
        ExecutionConfig executionConfig = new ExecutionConfig();
        CheckpointConfig checkpointConfig = new CheckpointConfig();
        SavepointRestoreSettings savepointRestoreSettings = SavepointRestoreSettings.none();

        // Step 2: Create a new StreamGraph instance
        StreamGraph streamGraph = new StreamGraph(configuration, executionConfig, checkpointConfig, savepointRestoreSettings);

        // Step 3: Add a source operator

        GeneratorFunction<Long, String> generatorFunction = index -> "Number: " + index;
        DataGeneratorSource<String> source = new DataGeneratorSource<>(generatorFunction, Long.MAX_VALUE, RateLimiterStrategy.perSecond(1), Types.STRING);
        SourceOperatorFactory<String> sourceOperatorFactory = new SourceOperatorFactory<>(source, WatermarkStrategy.noWatermarks());
        streamGraph.addSource(1, "sourceNode", "sourceDescription", sourceOperatorFactory, TypeInformation.of(String.class), TypeInformation.of(String.class), "sourceSlot");

        // Step 4: Add a map operator to transform the data
        StreamMap<String, String> mapOperator = new StreamMap<>(new MapFunction<String, String>() {
            @Override
            public String map(String value) throws Exception {
                return value;
            }
        });
        SimpleOperatorFactory<String> mapOperatorFactory = SimpleOperatorFactory.of(mapOperator);
        streamGraph.addOperator(2, "mapNode", "mapDescription", mapOperatorFactory, TypeInformation.of(String.class), TypeInformation.of(String.class), "mapSlot");

        // Step 5: Connect source and map operator
        streamGraph.addEdge(1, 2, 0);

        // Step 6: Add a sink operator to consume the data
        StreamMap<String, String> sinkOperator = new StreamMap<>(new MapFunction<String, String>() {
            @Override
            public String map(String value) throws Exception {
                System.out.println(value);
                return value;
            }
        });
        SimpleOperatorFactory<String> sinkOperatorFactory = SimpleOperatorFactory.of(sinkOperator);
        streamGraph.addSink(3, "sinkNode", "sinkDescription", sinkOperatorFactory, TypeInformation.of(String.class), TypeInformation.of(String.class), "sinkSlot");

        // Step 7: Connect map and sink operator
        streamGraph.addEdge(2, 3, 0);
        streamGraph.setTimeCharacteristic(TimeCharacteristic.ProcessingTime);
        streamGraph.setMaxParallelism(1,1);
        streamGraph.setMaxParallelism(2,1);
        streamGraph.setMaxParallelism(3,1);
        streamGraph.setGlobalStreamExchangeMode(GlobalStreamExchangeMode.ALL_EDGES_PIPELINED);


        // Step 8: Convert StreamGraph to JobGraph
        JobGraph jobGraph = streamGraph.getJobGraph();


        // Step 9: Set up a MiniCluster for local execution
        MiniClusterConfiguration miniClusterConfig = new MiniClusterConfiguration.Builder()
                .setNumTaskManagers(10)
                .setNumSlotsPerTaskManager(10)
                .build();
        MiniCluster miniCluster = new MiniCluster(miniClusterConfig);

        // Step 10: Start the MiniCluster
        miniCluster.start();

        // Step 11: Submit the job to the MiniCluster
        JobExecutionResult result = miniCluster.executeJobBlocking(jobGraph);
        System.out.println("Job completed with result: " + result);

        // Step 12: Stop the MiniCluster
        miniCluster.close();
相关推荐
SelectDB技术团队1 小时前
Apache Doris 2025 Roadmap:构建 GenAI 时代实时高效统一的数据底座
大数据·数据库·数据仓库·人工智能·ai·数据分析·湖仓一体
你觉得2051 小时前
浙江大学朱霖潮研究员:《人工智能重塑科学与工程研究》以蛋白质结构预测为例|附PPT下载方法
大数据·人工智能·机器学习·ai·云计算·aigc·powerpoint
益莱储中国1 小时前
世界通信大会、嵌入式展及慕尼黑上海光博会亮点回顾
大数据
Loving_enjoy2 小时前
基于Hadoop的明星社交媒体影响力数据挖掘平台:设计与实现
大数据·hadoop·数据挖掘
浮尘笔记2 小时前
go-zero使用elasticsearch踩坑记:时间存储和展示问题
大数据·elasticsearch·golang·go
碳基学AI3 小时前
哈尔滨工业大学DeepSeek公开课:探索大模型原理、技术与应用从GPT到DeepSeek|附视频与讲义免费下载方法
大数据·人工智能·python·gpt·算法·语言模型·集成学习
一个天蝎座 白勺 程序猿4 小时前
大数据(4.6)Hive执行引擎选型终极指南:MapReduce/Tez/Spark性能实测×万亿级数据资源配置公式
大数据·hive·mapreduce
HelpHelp同学5 小时前
信息混乱难查找?三步搭建高效帮助中心解决难题
大数据·人工智能·知识库管理系统
TDengine (老段)11 小时前
TDengine 中的关联查询
大数据·javascript·网络·物联网·时序数据库·tdengine·iotdb
直裾15 小时前
Mapreduce的使用
大数据·数据库·mapreduce