Flink state TTL parameter settings

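Setup

The example consumes a Kafka topic from Flink and prints the contents of a ListState on every record, to determine what each TTL parameter actually means.

Kafka producer code: sends records for two keys (id 1 and id 2), one record per key per second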

	for (int i = 0; i < 100; i++) {
	    // Record for key 1: values 0..99
	    JSONObject object = new JSONObject();
	    object.put("id", 1);
	    object.put("value", i);
	    String s = object.toJSONString();
	    // get() blocks until the broker acknowledges the record
	    kafkaProducer.send(new ProducerRecord<>("test_topic_partition_one", s.getBytes(StandardCharsets.UTF_8))).get();
	
	    // Record for key 2: values 100..199
	    object = new JSONObject();
	    object.put("id", 2);
	    object.put("value", 100 + i);
	    s = object.toJSONString();
	    kafkaProducer.send(new ProducerRecord<>("test_topic_partition_one", s.getBytes(StandardCharsets.UTF_8))).get();
	    // One pair of records per second
	    Thread.sleep(1000);
	}

Flink job that consumes the Kafka topic:

	final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	env.enableCheckpointing(10 * 1000);
	KafkaSource<String> source = KafkaSource.<String>builder()
	                .setBootstrapServers("broker:9092")
	                .setProperties(properties)
	                .setTopics("test_topic_partition_one")
	                .setGroupId("my-group")
	                .setStartingOffsets(OffsetsInitializer.latest())
	                .setValueOnlyDeserializer(new SimpleStringSchema())
	                .build();
	DataStreamSource<String> kafkaSource = env
		.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
		.setParallelism(2);
	
	DataStream<Tuple2<String, Integer>> dataStream = kafkaSource.map(
	                new MapFunction<String, Tuple2<String, Integer>>() {
	                    @Override
	                    public Tuple2<String, Integer> map(String value) throws Exception {
	                        JSONObject object = JSONObject.parseObject(value);
	                        return new Tuple2<String, Integer>(object.getString("id"), object.getInteger("value"));
	                    }
	                });
	
	DataStream<String> resultStream = dataStream
	                .keyBy(value -> value.f0) // group by the first field (the key)
	                .process(new ListValueProcess());
	
	 // Print the results
	 resultStream.print();

The ListValueProcess state function:

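    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class ListValueProcess extends KeyedProcessFunction<String, Tuple2<String, Integer>, String> {

        // Keyed state: each key has its own list
        private transient ListState<Integer> listState;

        @Override
        public void processElement(Tuple2<String, Integer> value, Context ctx, Collector<String> out) throws Exception {
            // Append the incoming value to the ListState (a write)
            listState.add(value.f1);

            // Read every element currently visible in the ListState (a read)
            String key = value.f0;
            List<Integer> list = new ArrayList<>();
            for (Integer integer : listState.get()) {
                list.add(integer);
            }
            String result = "key:" + key + ", value:" + list;
            // Emit the result
            out.collect(result);
        }

        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);

            // TTL of 10 s; swap the update type and visibility to test each combination
            StateTtlConfig ttlConfig = StateTtlConfig
                    .newBuilder(Time.seconds(10))
                    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                    .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
                    .build();

            // Initialize the ListState, which stores multiple values per key
            ListStateDescriptor<Integer> integerListStateDescriptor = new ListStateDescriptor<>("my-list-state", Integer.class);
            integerListStateDescriptor.enableTimeToLive(ttlConfig);

            listState = getRuntimeContext().getListState(integerListStateDescriptor);
        }
    }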

As the code shows, StateTtlConfig is mainly configured through three settings:

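  • The time-to-live: how long the state is kept
  • setUpdateType: when the TTL timestamp is refreshed, either OnCreateAndWrite or OnReadAndWrite
  • setStateVisibility: whether expired state can still be read, either ReturnExpiredIfNotCleanedUp or NeverReturnExpired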

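Here the time-to-live is 10 s.
OnCreateAndWrite: the TTL timestamp is refreshed when the state is created or written.
OnReadAndWrite: the TTL timestamp is refreshed when the state is created, written, or read.
ReturnExpiredIfNotCleanedUp: state that has expired but has not yet been removed can still be read.
NeverReturnExpired: expired state is never returned, even if it has not been removed yet.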
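
To make the four combinations easier to reason about, here is a minimal plain-Java sketch of the semantics described above. It is a simplified model, not Flink's implementation: the class `TtlListSketch`, its boolean flags, and the per-entry timestamp layout are all assumptions made for illustration, and the model never runs any background cleanup.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of a TTL list; not Flink's implementation. */
final class TtlListSketch {
    private final long ttlMs;
    private final boolean refreshOnRead; // true models OnReadAndWrite
    private final boolean returnExpired; // true models ReturnExpiredIfNotCleanedUp
    private final List<long[]> entries = new ArrayList<>(); // {value, lastAccessMs}

    TtlListSketch(long ttlMs, boolean refreshOnRead, boolean returnExpired) {
        this.ttlMs = ttlMs;
        this.refreshOnRead = refreshOnRead;
        this.returnExpired = returnExpired;
    }

    /** A write stamps the new entry with the current time. */
    void add(long value, long nowMs) {
        entries.add(new long[] {value, nowMs});
    }

    /** A read hides expired entries unless returnExpired is set, and may refresh live ones. */
    List<Long> read(long nowMs) {
        List<Long> visible = new ArrayList<>();
        for (long[] entry : entries) {
            boolean expired = nowMs - entry[1] >= ttlMs;
            if (!expired && refreshOnRead) {
                entry[1] = nowMs;
            }
            if (!expired || returnExpired) {
                visible.add(entry[0]);
            }
        }
        return visible;
    }

    /** Writes one value per second, reads after every write, returns the final visible count. */
    static int visibleAfter(int seconds, boolean refreshOnRead, boolean returnExpired) {
        TtlListSketch state = new TtlListSketch(10_000, refreshOnRead, returnExpired);
        int visible = 0;
        for (int t = 1; t <= seconds; t++) {
            long nowMs = t * 1000L;
            state.add(t, nowMs);
            visible = state.read(nowMs).size();
        }
        return visible;
    }

    public static void main(String[] args) {
        System.out.println(visibleAfter(30, false, true));  // 30: nothing is ever cleaned up in this model
        System.out.println(visibleAfter(30, false, false)); // 10: only values 21..30 are still live
        System.out.println(visibleAfter(30, true, true));   // 30
        System.out.println(visibleAfter(30, true, false));  // 30: every read refreshes every live entry
    }
}
```

The model writes one value per second and reads after every write, the same access pattern as the job above. Under that pattern the two OnReadAndWrite variants behave identically, because no entry ever goes 10 s without being read.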

Sample results:

With OnCreateAndWrite + ReturnExpiredIfNotCleanedUp:

1> key:1, value:[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124]
1> key:1, value:[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]
1> key:1, value:[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]

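As the output shows, expired values are removed in batches rather than at the moment they expire (key 1 jumps from [6, ..., 26] to [18, ..., 27]), so a read can return values older than 10 s until cleanup runs.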
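
The batch-removal pattern can be reproduced with a small plain-Java model. This is only a sketch under stated assumptions: the class `LazyCleanupSketch` and the fixed 12-second cleanup interval are hypothetical, chosen to make the effect visible, and say nothing about when Flink actually cleans up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of ReturnExpiredIfNotCleanedUp with periodic cleanup; not Flink code. */
final class LazyCleanupSketch {
    static final long TTL_MS = 10_000;

    /** Writes one value per second and returns everything still stored after `seconds`. */
    static List<Integer> visibleAt(int seconds, int cleanupEverySeconds) {
        List<int[]> entries = new ArrayList<>(); // {value, writeSecond}
        for (int t = 1; t <= seconds; t++) {
            final int now = t;
            // Assumed cleanup schedule: expired entries are dropped only on these ticks
            if (t % cleanupEverySeconds == 0) {
                entries.removeIf(e -> (now - e[1]) * 1000L >= TTL_MS);
            }
            entries.add(new int[] {t, t});
        }
        // Reads return every stored entry, expired or not
        List<Integer> visible = new ArrayList<>();
        for (int[] e : entries) {
            visible.add(e[0]);
        }
        return visible;
    }

    public static void main(String[] args) {
        System.out.println(visibleAt(23, 12)); // starts at 3: values up to 20 s old are still returned
        System.out.println(visibleAt(24, 12)); // starts at 15: the cleanup at t=24 dropped 3..14
    }
}
```

Between cleanups the list keeps growing past the 10 s window, then shrinks abruptly, which is the same shape as the key 1 output above.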

With OnCreateAndWrite + NeverReturnExpired:

1> key:2, value:[109, 110, 111, 112, 113, 114, 115, 116, 117]
1> key:1, value:[10, 11, 12, 13, 14, 15, 16, 17, 18]
1> key:2, value:[110, 111, 112, 113, 114, 115, 116, 117, 118]
1> key:1, value:[11, 12, 13, 14, 15, 16, 17, 18, 19]
1> key:2, value:[111, 112, 113, 114, 115, 116, 117, 118, 119]
1> key:1, value:[12, 13, 14, 15, 16, 17, 18, 19, 20]
1> key:2, value:[112, 113, 114, 115, 116, 117, 118, 119, 120]
1> key:1, value:[13, 14, 15, 16, 17, 18, 19, 20, 21]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121]
1> key:1, value:[14, 15, 16, 17, 18, 19, 20, 21, 22]
1> key:2, value:[114, 115, 116, 117, 118, 119, 120, 121, 122]

As the output shows, each read returns only the values written within roughly the last 10 s.

With OnReadAndWrite + ReturnExpiredIfNotCleanedUp:

1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]

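As the output shows, the state keeps every value: each incoming record reads the whole list, and each read refreshes the TTL, so nothing expires.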

With OnReadAndWrite + NeverReturnExpired:

1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131]

Again the state keeps every value, because each incoming record reads the list and refreshes the TTL before anything can expire.
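
The "nothing expires" result depends on records arriving more often than the TTL. The plain-Java sketch below models what would be expected if a key went quiet: the class `ReadRefreshSketch` and its idle-gap scenario are hypothetical and were not part of the test run above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of OnReadAndWrite + NeverReturnExpired; not Flink code. */
final class ReadRefreshSketch {
    static final int TTL_SECONDS = 10;

    /** Writes and reads once per second for `busySeconds`, then reads once after an idle gap. */
    static int visibleAfterIdle(int busySeconds, int idleSeconds) {
        List<int[]> entries = new ArrayList<>(); // {value, lastAccessSecond}
        for (int t = 1; t <= busySeconds; t++) {
            entries.add(new int[] {t, t});
            read(entries, t); // every record triggers a full read of the list
        }
        return read(entries, busySeconds + idleSeconds);
    }

    /** Counts live entries and refreshes their timestamps; expired entries are skipped. */
    private static int read(List<int[]> entries, int now) {
        int visible = 0;
        for (int[] e : entries) {
            if (now - e[1] < TTL_SECONDS) {
                e[1] = now;
                visible++;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        System.out.println(visibleAfterIdle(30, 1));  // 30: the usual one-second cadence
        System.out.println(visibleAfterIdle(30, 10)); // 0: a gap of one full TTL expires everything
    }
}
```

In this model, a gap of 9 s still returns all 30 values, while a gap of 10 s or more returns none, since every entry shares the timestamp of the last read.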
