Flink state TTL parameter settings

Setup

The code examples below consume Kafka with Flink and inspect the data held in list state, in order to pin down what each StateTtlConfig parameter actually means.

Kafka producer code: sends records for two keys, one record per key per second.

	// send 100 records per key: key 1 gets values 0..99, key 2 gets 100..199
	for (int i = 0; i < 100; i++) {
	    JSONObject object = new JSONObject();
	    object.put("id", 1);
	    object.put("value", i);
	    String s = object.toJSONString();
	    kafkaProducer.send(new ProducerRecord<>("test_topic_partition_one", s.getBytes(StandardCharsets.UTF_8))).get();
	
	    object = new JSONObject();
	    object.put("id", 2);
	    object.put("value", 100 + i);
	    s = object.toJSONString();
	    kafkaProducer.send(new ProducerRecord<>("test_topic_partition_one", s.getBytes(StandardCharsets.UTF_8))).get();
	    Thread.sleep(1000);
	}

Flink Kafka consumer example:

	final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.enableCheckpointing(10 * 1000);
	KafkaSource<String> source = KafkaSource.<String>builder()
	                .setBootstrapServers("broker:9092")
	                .setProperties(properties)
	                .setTopics("test_topic_partition_one")
	                .setGroupId("my-group")
	                .setStartingOffsets(OffsetsInitializer.latest())
	                .setValueOnlyDeserializer(new SimpleStringSchema())
	                .build();
	DataStreamSource<String> kafkaSource = env
		.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
		.setParallelism(2);
	
	DataStream<Tuple2<String, Integer>> dataStream = kafkaSource.map(
	                new MapFunction<String, Tuple2<String, Integer>>() {
	                    @Override
	                    public Tuple2<String, Integer> map(String value) throws Exception {
	                        JSONObject object = JSONObject.parseObject(value);
	                        return new Tuple2<String, Integer>(object.getString("id"), object.getInteger("value"));
	                    }
	                });
	
	DataStream<String> resultStream = dataStream
	                .keyBy(value -> value.f0) // group by the first field (the key)
	                .process(new ListValueProcess());
	
	// print the results
	resultStream.print();

The ListValueProcess stateful function:

	public class ListValueProcess extends KeyedProcessFunction<String, Tuple2<String, Integer>, String> {
	
	    private transient ListState<Integer> listState;
	
	    @Override
	    public void processElement(Tuple2<String, Integer> value, KeyedProcessFunction<String, Tuple2<String, Integer>, String>.Context ctx, Collector<String> out) throws Exception {
	        // append the element to the ListState
	        listState.add(value.f1);
	
	        // read all elements currently in the ListState and emit them
	        String key = value.f0;
	        List<Integer> list = new ArrayList<>();
	        for (Integer integer : listState.get()) {
	            list.add(integer);
	        }
	        String result = "key:" + key + ", value:" + list;
	        out.collect(result);
	    }
	
	    @Override
	    public void open(Configuration parameters) throws Exception {
	        super.open(parameters);
	
	        StateTtlConfig ttlConfig = StateTtlConfig
	                .newBuilder(Time.seconds(10))
	                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
	                .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
	                .build();
	
	        // initialize the ListState: each key gets its own ListState,
	        // which stores multiple values for that key
	        ListStateDescriptor<Integer> integerListStateDescriptor = new ListStateDescriptor<>("my-list-state", Integer.class);
	        integerListStateDescriptor.enableTimeToLive(ttlConfig);
	
	        listState = getRuntimeContext().getListState(integerListStateDescriptor);
	    }
	}

As the code shows, StateTtlConfig takes three main settings:

  • the TTL itself, i.e. how long state is retained
  • setUpdateType, which controls when the TTL timer is refreshed: OnCreateAndWrite or OnReadAndWrite
  • setStateVisibility, which controls whether expired but not-yet-cleaned-up state is visible: ReturnExpiredIfNotCleanedUp or NeverReturnExpired

Here the TTL is 10 s.
OnCreateAndWrite: the TTL of an entry is refreshed when the state is created or written.
OnReadAndWrite: the TTL of an entry is refreshed when the state is created, written, or read.
ReturnExpiredIfNotCleanedUp: state that has expired but has not yet been physically removed can still be read.
NeverReturnExpired: expired state is never returned.
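To make the four combinations concrete, here is a minimal toy model in plain Java (not Flink's implementation; all names are hypothetical). Flink's list state tracks a timestamp per list element, which this sketch imitates:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of per-entry TTL semantics for list state (illustration only).
public class TtlListModel {
    enum UpdateType { OnCreateAndWrite, OnReadAndWrite }
    enum Visibility { ReturnExpiredIfNotCleanedUp, NeverReturnExpired }

    static class Entry {
        int value; long ts;
        Entry(int v, long t) { value = v; ts = t; }
    }

    final long ttlMs;
    final UpdateType updateType;
    final Visibility visibility;
    final List<Entry> entries = new ArrayList<>();

    TtlListModel(long ttlMs, UpdateType u, Visibility v) {
        this.ttlMs = ttlMs; this.updateType = u; this.visibility = v;
    }

    // a write stamps the new entry with the current time
    void add(int value, long now) { entries.add(new Entry(value, now)); }

    // reads apply the visibility rule; with OnReadAndWrite, reading also
    // refreshes each returned entry's timestamp
    List<Integer> get(long now, boolean cleanedUp) {
        List<Integer> out = new ArrayList<>();
        for (Entry e : entries) {
            boolean expired = now - e.ts >= ttlMs;
            if (!expired || (visibility == Visibility.ReturnExpiredIfNotCleanedUp && !cleanedUp)) {
                out.add(e.value);
                if (updateType == UpdateType.OnReadAndWrite) e.ts = now;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // NeverReturnExpired: entries older than the TTL vanish from reads
        TtlListModel m = new TtlListModel(10_000, UpdateType.OnCreateAndWrite, Visibility.NeverReturnExpired);
        for (int t = 0; t < 15; t++) m.add(t, t * 1000L);
        System.out.println(m.get(15_000, true)); // [6, 7, 8, 9, 10, 11, 12, 13, 14]

        // OnReadAndWrite: each read refreshes every entry, so nothing expires
        TtlListModel r = new TtlListModel(10_000, UpdateType.OnReadAndWrite, Visibility.NeverReturnExpired);
        for (int t = 0; t < 15; t++) { r.add(t, t * 1000L); r.get(t * 1000L, true); }
        System.out.println(r.get(15_000, true)); // [0, 1, ..., 14]
    }
}
```

The model reproduces the shapes seen in the outputs below: a sliding window of recent values for OnCreateAndWrite + NeverReturnExpired, and an ever-growing list for OnReadAndWrite.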

Result samples:

With OnCreateAndWrite + ReturnExpiredIfNotCleanedUp:

1> key:1, value:[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124]
1> key:1, value:[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]
1> key:1, value:[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]

As the output shows, expired entries are only removed periodically, so values older than the 10 s TTL can remain visible until cleanup actually runs.
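If stale values must not linger, StateTtlConfig also exposes explicit cleanup strategies on the same builder; a configuration sketch (which strategies apply depends on the state backend in use):

```java
StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.seconds(10))
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        // drop expired entries from full snapshots taken for checkpoints/savepoints
        .cleanupFullSnapshot()
        // heap backends: lazily check a batch of 10 entries per state access
        .cleanupIncrementally(10, false)
        // RocksDB backend: filter expired entries during compaction
        .cleanupInRocksdbCompactFilter(1000)
        .build();
```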

OnCreateAndWrite + NeverReturnExpired:

1> key:2, value:[109, 110, 111, 112, 113, 114, 115, 116, 117]
1> key:1, value:[10, 11, 12, 13, 14, 15, 16, 17, 18]
1> key:2, value:[110, 111, 112, 113, 114, 115, 116, 117, 118]
1> key:1, value:[11, 12, 13, 14, 15, 16, 17, 18, 19]
1> key:2, value:[111, 112, 113, 114, 115, 116, 117, 118, 119]
1> key:1, value:[12, 13, 14, 15, 16, 17, 18, 19, 20]
1> key:2, value:[112, 113, 114, 115, 116, 117, 118, 119, 120]
1> key:1, value:[13, 14, 15, 16, 17, 18, 19, 20, 21]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121]
1> key:1, value:[14, 15, 16, 17, 18, 19, 20, 21, 22]
1> key:2, value:[114, 115, 116, 117, 118, 119, 120, 121, 122]

As the output shows, the state only returns values written within the last 10 s.

OnReadAndWrite + ReturnExpiredIfNotCleanedUp:

1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]

As the output shows, the state keeps all values: every processElement call reads the state, and each read refreshes the TTL, so the entries never expire.

OnReadAndWrite + NeverReturnExpired:

1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131]

As above, the state keeps all values: since every read refreshes the TTL, no entry ever expires, and the visibility setting makes no difference here.
