Flink 1.14.*: Source Code Behind Window Creation and Window Computation

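This article walks through how Flink creates windows and, taking an aggregate function as the example, how a window computes the aggregation.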

I. The builder classes behind the different window types

The following input stream is used as the example throughout:

java
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Tuple2<String, Integer>> input = env.fromElements(
                Tuple2.of("key1", 1),
                Tuple2.of("key1", 3),
                Tuple2.of("key2", 2),
                Tuple2.of("key2", 4)
 );

1. Global (non-keyed) window

The code below creates a global window:

java
AllWindowedStream<Tuple2<String, Integer>, TimeWindow> windowed = input.windowAll(TumblingProcessingTimeWindows.of(Time.seconds(5)));
java
@PublicEvolving
public <W extends Window> AllWindowedStream<T, W> windowAll(WindowAssigner<? super T, W> assigner) {
    return new AllWindowedStream(this, assigner);
 }
java
@Public
public class AllWindowedStream<T, W extends Window> {
    private final KeyedStream<T, Byte> input;
    private final WindowAssigner<? super T, W> windowAssigner;
    private Trigger<? super T, ? super W> trigger;
    private Evictor<? super T, ? super W> evictor;
    private long allowedLateness = 0L;
    private OutputTag<T> lateDataOutputTag;

    @PublicEvolving
    public AllWindowedStream(DataStream<T> input, WindowAssigner<? super T, W> windowAssigner) {
        // Key the input with NullByteKeySelector, which returns the same key (0) for every element
        this.input = input.keyBy(new NullByteKeySelector());
        this.windowAssigner = windowAssigner;
        this.trigger = windowAssigner.getDefaultTrigger(input.getExecutionEnvironment());
    }
}

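AllWindowedStream is the abstraction for applying a window operation to the whole data stream without grouping it by key. In other words, it windows the stream globally.

Use cases:

  1. Use AllWindowedStream when you do not need to group the stream by key and want to apply a window operation to the stream as a whole.
  2. It suits global statistics, global aggregation, and similar scenarios.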
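
To make the effect of a constant key concrete, here is a minimal plain-Java sketch. It does not use Flink: `partitionFor` is a hypothetical stand-in for key-based routing (Flink's real routing goes through key groups), and the class and method names are made up for illustration. It only shows that a selector returning the same key for every element sends all elements to one place, while a per-element key can spread them out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class ConstantKeySketch {

    // Hypothetical stand-in for key-based routing: hash the key into one of
    // `parallelism` partitions. This is a simplification, not Flink's algorithm.
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Returns the distinct partitions the elements land in under a key selector.
    static <T> Set<Integer> partitionsUsed(List<T> elements, Function<T, ?> keySelector, int parallelism) {
        Set<Integer> used = new TreeSet<>();
        for (T element : elements) {
            used.add(partitionFor(keySelector.apply(element), parallelism));
        }
        return used;
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("a", "b", "c", "d");
        // Constant key, in the spirit of NullByteKeySelector returning 0 for every element.
        Set<Integer> constantKey = partitionsUsed(elements, e -> Byte.valueOf((byte) 0), 4);
        // Per-element key, in the spirit of keyBy(value -> value.f0).
        Set<Integer> perElementKey = partitionsUsed(elements, e -> e, 4);
        System.out.println("constant key partitions: " + constantKey);
        System.out.println("per-element key partitions: " + perElementKey);
    }
}
```

With the constant key, every element ends up in a single partition regardless of the parallelism passed in, which is why a global window cannot spread its work across subtasks.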

2. Window on a keyed stream

The code below keys the stream by the first field and creates a separate window for each key's data:

java
KeyedStream<Tuple2<String, Integer>, String> keyed = input.keyBy(value -> value.f0);
WindowedStream<Tuple2<String, Integer>, String, TimeWindow> windowed = keyed.window(TumblingProcessingTimeWindows.of(Time.seconds(5)));
java
 @PublicEvolving
    public <W extends Window> WindowedStream<T, KEY, W> window(WindowAssigner<? super T, W> assigner) {
        return new WindowedStream(this, assigner);
    }
java
@Public
public class WindowedStream<T, K, W extends Window> {
    private final KeyedStream<T, K> input;
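    // WindowOperatorBuilder is an internal Flink utility for building and configuring the window operator (WindowOperator); it is not meant to be used directly in user code. It offers a flexible way to configure the operator's details, including the window assigner, trigger, evictor, and allowed lateness.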
    private final WindowOperatorBuilder<T, K, W> builder;

    @PublicEvolving
    public WindowedStream(KeyedStream<T, K> input, WindowAssigner<? super T, W> windowAssigner) {
        // Only the input needs to be stored here; keyBy was already applied upstream
        this.input = input;
        // input.getKeySelector() returns the key function that was set on the KeyedStream
        this.builder = new WindowOperatorBuilder(windowAssigner, windowAssigner.getDefaultTrigger(input.getExecutionEnvironment()), input.getExecutionConfig(), input.getType(), input.getKeySelector(), input.getKeyType());
    }
    // Calling trigger() on WindowedStream simply delegates to WindowOperatorBuilder.trigger()
    @PublicEvolving
    public WindowedStream<T, K, W> trigger(Trigger<? super T, ? super W> trigger) {
        this.builder.trigger(trigger);
        return this;
    }

}   
java
public class WindowOperatorBuilder<T, K, W extends Window> {
    private static final String WINDOW_STATE_NAME = "window-contents";
    private final ExecutionConfig config;
    private final WindowAssigner<? super T, W> windowAssigner;
    private final TypeInformation<T> inputType;
    private final KeySelector<T, K> keySelector;
    private final TypeInformation<K> keyType;
    private Trigger<? super T, ? super W> trigger;
    @Nullable
    private Evictor<? super T, ? super W> evictor;
    private long allowedLateness = 0L;
    @Nullable
    private OutputTag<T> lateDataOutputTag;

    public WindowOperatorBuilder(WindowAssigner<? super T, W> windowAssigner, Trigger<? super T, ? super W> trigger, ExecutionConfig config, TypeInformation<T> inputType, KeySelector<T, K> keySelector, TypeInformation<K> keyType) {
        this.windowAssigner = windowAssigner;
        this.config = config;
        this.inputType = inputType;
        // Store the KeyedStream's keySelector in the WindowOperatorBuilder
        this.keySelector = keySelector;
        this.keyType = keyType;
        this.trigger = trigger;
    }
}

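WindowedStream is the abstraction for applying a window operation to each key's sub-stream after the data stream has been grouped by key. That is, each key gets its own independent window operation.

Use cases:

  1. Use WindowedStream when you need to group the stream by key and apply a window operation to each key's sub-stream.
  2. It suits scenarios that need independent statistics and aggregation per key.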
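
The idea of "one window per key" can be sketched without Flink. The example below is a simplified model, not Flink's implementation: `windowStart` aligns a timestamp to a tumbling window of a given size, and `assign` groups values into panes identified by key plus window start. The timestamps are hypothetical; the keys and values are the ones from the example input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KeyedTumblingWindowSketch {

    // Start of the tumbling window that contains `timestamp` (windows are aligned to 0).
    static long windowStart(long timestamp, long windowSize) {
        return timestamp - Math.floorMod(timestamp, windowSize);
    }

    // Groups values into panes named "key@windowStart", so each key has its own windows.
    static Map<String, List<Integer>> assign(String[] keys, int[] values, long[] timestamps, long windowSize) {
        Map<String, List<Integer>> panes = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            String pane = keys[i] + "@" + windowStart(timestamps[i], windowSize);
            panes.computeIfAbsent(pane, p -> new ArrayList<>()).add(values[i]);
        }
        return panes;
    }

    public static void main(String[] args) {
        String[] keys = {"key1", "key1", "key2", "key2"};
        int[] values = {1, 3, 2, 4};
        long[] timestamps = {1_000L, 6_000L, 2_000L, 4_000L}; // hypothetical arrival times in ms
        // prints {key1@0=[1], key1@5000=[3], key2@0=[2, 4]}
        System.out.println(assign(keys, values, timestamps, 5_000L));
    }
}
```

Two elements share a pane only when both their key and their window match, which is the behavior the keyed window provides.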

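II. Every window type ends up creating the same operator: WindowOperator

Taking an aggregate function as the example, let's see which operator each window type creates.

1. Implementing the aggregate function

java
 // Define the aggregate function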
  AggregateFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple2<String, Integer>> aggregateFunction =
  new AggregateFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple2<String, Integer>>() {

      @Override
      public Tuple2<String, Integer> createAccumulator() {
          return new Tuple2<>("", 0);
      }

      @Override
      public Tuple2<String, Integer> add(Tuple2<String, Integer> value, Tuple2<String, Integer> accumulator) {
          return new Tuple2<>(value.f0, value.f1 + accumulator.f1);
      }

      @Override
      public Tuple2<String, Integer> getResult(Tuple2<String, Integer> accumulator) {
          return accumulator;
      }

      @Override
      public Tuple2<String, Integer> merge(Tuple2<String, Integer> a, Tuple2<String, Integer> b) {
          return new Tuple2<>(a.f0, a.f1 + b.f1);
      }
  };
  // The AggregateFunction interface
  public interface AggregateFunction<IN, ACC, OUT> extends Function, Serializable {
    ACC createAccumulator();

    ACC add(IN var1, ACC var2);

    OUT getResult(ACC var1);

    ACC merge(ACC var1, ACC var2);
}
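
To see how the four methods cooperate, here is a small plain-Java sketch. `SimpleAggregate` is a hypothetical local interface that mirrors the four method signatures shown above so the example runs without Flink, and `accumulate` is an illustrative helper, not a Flink API. An average is used instead of a sum because its accumulator type differs from both the input and the output type.

```java
import java.util.Arrays;
import java.util.List;

final class AggregateLifecycleSketch {

    // Hypothetical plain-Java mirror of the four AggregateFunction methods.
    interface SimpleAggregate<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC accumulator);
        OUT getResult(ACC accumulator);
        ACC merge(ACC a, ACC b);
    }

    // Average: the accumulator is {sum, count}, the result is sum / count.
    static final class Average implements SimpleAggregate<Integer, long[], Double> {
        @Override public long[] createAccumulator() { return new long[] {0L, 0L}; }
        @Override public long[] add(Integer value, long[] acc) { return new long[] {acc[0] + value, acc[1] + 1}; }
        @Override public Double getResult(long[] acc) { return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]; }
        @Override public long[] merge(long[] a, long[] b) { return new long[] {a[0] + b[0], a[1] + b[1]}; }
    }

    // Folds the values into a fresh accumulator, one add() call per element.
    static <IN, ACC, OUT> ACC accumulate(SimpleAggregate<IN, ACC, OUT> fn, List<IN> values) {
        ACC acc = fn.createAccumulator();
        for (IN value : values) {
            acc = fn.add(value, acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        Average avg = new Average();
        long[] left = accumulate(avg, Arrays.asList(1, 3));
        long[] right = accumulate(avg, Arrays.asList(2, 4));
        System.out.println(avg.getResult(left));                   // prints 2.0
        System.out.println(avg.getResult(avg.merge(left, right))); // prints 2.5
    }
}
```

Only the accumulator has to be kept per window, not the individual elements; `merge` matters when two partial accumulators must be combined, for example when windows are merged.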

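2. Global window (the argument passed is a NullByteKeySelector)

As shown above, the AllWindowedStream constructor keys the input with NullByteKeySelector, so aggregate() hands that selector to the WindowOperator:

java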
@Public
public class AllWindowedStream<T, W extends Window> {
	@PublicEvolving
    public AllWindowedStream(DataStream<T> input, WindowAssigner<? super T, W> windowAssigner) {
        // Key the input with NullByteKeySelector, which returns the same key (0) for every element
        this.input = input.keyBy(new NullByteKeySelector());
        this.windowAssigner = windowAssigner;
        this.trigger = windowAssigner.getDefaultTrigger(input.getExecutionEnvironment());
    }
    @PublicEvolving
    public <ACC, V, R> SingleOutputStreamOperator<R> aggregate(AggregateFunction<T, ACC, V> aggregateFunction, AllWindowFunction<V, R, W> windowFunction, TypeInformation<ACC> accumulatorType, TypeInformation<R> resultType) {
        // Per the constructor above, this.input.getKeySelector() is the NullByteKeySelector
        KeySelector<T, Byte> keySel = this.input.getKeySelector();

        // unrelated code omitted
        AggregatingStateDescriptor<T, ACC, V> stateDesc = new AggregatingStateDescriptor("window-contents", aggregateFunction, accumulatorType.createSerializer(this.getExecutionEnvironment().getConfig()));
      
        operator = new WindowOperator(this.windowAssigner, this.windowAssigner.getWindowSerializer(this.getExecutionEnvironment().getConfig()), keySel, this.input.getKeyType().createSerializer(this.getExecutionEnvironment().getConfig()), stateDesc, new InternalSingleValueAllWindowFunction(windowFunction), this.trigger, this.allowedLateness, this.lateDataOutputTag);
        // unrelated code omitted
        return this.input.transform(opName, resultType, (OneInputStreamOperator)operator).forceNonParallel();
        
    }

}
@PublicEvolving
public class AggregatingStateDescriptor<IN, ACC, OUT> extends StateDescriptor<AggregatingState<IN, OUT>, ACC> {
    private final AggregateFunction<IN, ACC, OUT> aggFunction;
    public AggregatingStateDescriptor(String name, AggregateFunction<IN, ACC, OUT> aggFunction, TypeSerializer<ACC> typeSerializer) {
        super(name, typeSerializer, (Object)null);
        this.aggFunction = (AggregateFunction)Preconditions.checkNotNull(aggFunction);
    }
}

3. Window on a keyed stream (the argument passed is the KeyedStream's KeySelector)

java
public class WindowedStream<T, K, W extends Window> {
	 @PublicEvolving
    public WindowedStream(KeyedStream<T, K> input, WindowAssigner<? super T, W> windowAssigner) {
        this.input = input;
        this.builder = new WindowOperatorBuilder(windowAssigner, windowAssigner.getDefaultTrigger(input.getExecutionEnvironment()), input.getExecutionConfig(), input.getType(), input.getKeySelector(), input.getKeyType());
    }
    
    public <ACC, V, R> SingleOutputStreamOperator<R> aggregate(AggregateFunction<T, ACC, V> aggregateFunction, WindowFunction<V, R, K, W> windowFunction, TypeInformation<ACC> accumulatorType, TypeInformation<R> resultType) {
        // unrelated code removed
        aggregateFunction = (AggregateFunction)this.input.getExecutionEnvironment().clean(aggregateFunction);
        String opName = this.builder.generateOperatorName(aggregateFunction, windowFunction);
        OneInputStreamOperator<T, R> operator = this.builder.aggregate(aggregateFunction, windowFunction, accumulatorType);
        return this.input.transform(opName, resultType, operator);
    }

}

As shown above, builder is a WindowOperatorBuilder, and the keySelector passed to its constructor is the KeyedStream's keySelector.

java
public class WindowOperatorBuilder<T, K, W extends Window> {

	public WindowOperatorBuilder(WindowAssigner<? super T, W> windowAssigner, Trigger<? super T, ? super W> trigger, ExecutionConfig config, TypeInformation<T> inputType, KeySelector<T, K> keySelector, TypeInformation<K> keyType) {
        this.windowAssigner = windowAssigner;
        this.config = config;
        this.inputType = inputType;
        // This keySelector is the KeyedStream's keySelector
        this.keySelector = keySelector;
        this.keyType = keyType;
        this.trigger = trigger;
    }
    		

    public <ACC, V, R> WindowOperator<K, T, ?, R, W> aggregate(AggregateFunction<T, ACC, V> aggregateFunction, WindowFunction<V, R, K, W> windowFunction, TypeInformation<ACC> accumulatorType) {
         // unrelated code removed
          AggregatingStateDescriptor<T, ACC, V> stateDesc = new AggregatingStateDescriptor("window-contents", aggregateFunction, accumulatorType.createSerializer(this.config));
         return this.buildWindowOperator(stateDesc, new InternalSingleValueWindowFunction(windowFunction));
          
        }
        private <ACC, R> WindowOperator<K, T, ACC, R, W> buildWindowOperator(StateDescriptor<? extends AppendingState<T, ACC>, ?> stateDesc, InternalWindowFunction<ACC, R, K, W> function) {
            return new WindowOperator(this.windowAssigner, this.windowAssigner.getWindowSerializer(this.config), this.keySelector, this.keyType.createSerializer(this.config), stateDesc, function, this.trigger, this.allowedLateness, this.lateDataOutputTag);
        }
}

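Both kinds of window end up building a WindowOperator; only the arguments differ. For the global window the keySelector is a NullByteKeySelector, while for the keyed window it is the KeySelector taken from the KeyedStream. Note also that the global-window path ends with forceNonParallel(), so that operator runs with a parallelism of 1.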
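
The point that one operator serves both cases, with only the key selector changing, can be illustrated with a plain-Java sketch. Nothing here is Flink code: `sumPerKey` is a hypothetical helper that plays the role of the operator, and the two lambdas stand in for NullByteKeySelector and for the KeyedStream's selector.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class OneOperatorTwoSelectorsSketch {

    // The same aggregation logic for both cases; only the key selector differs.
    static <K> Map<K, Integer> sumPerKey(
            List<Map.Entry<String, Integer>> elements,
            Function<Map.Entry<String, Integer>, K> keySelector) {
        Map<K, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> element : elements) {
            sums.merge(keySelector.apply(element), element.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = Arrays.asList(
                Map.entry("key1", 1), Map.entry("key1", 3),
                Map.entry("key2", 2), Map.entry("key2", 4));

        // Constant key, standing in for NullByteKeySelector: one global result.
        Map<Byte, Integer> global = sumPerKey(input, e -> Byte.valueOf((byte) 0));
        // Real key, standing in for keyBy(value -> value.f0): one result per key.
        Map<String, Integer> keyed = sumPerKey(input, Map.Entry::getKey);

        System.out.println(global); // prints {0=10}
        System.out.println(keyed);  // prints {key1=4, key2=6}
    }
}
```

Treating the global window as "keyed by a constant" is what lets a single operator implementation cover both stream types.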

4. WindowOperator

java
@Internal
public class WindowOperator<K, IN, ACC, OUT, W extends Window> extends AbstractUdfStreamOperator<OUT, InternalWindowFunction<ACC, OUT, K, W>> implements OneInputStreamOperator<IN, OUT>, Triggerable<K, W> {
    private final KeySelector<IN, K> keySelector;
    private transient InternalAppendingState<K, W, IN, ACC, ACC> windowState;
    private final StateDescriptor<? extends AppendingState<IN, ACC>, ?> windowStateDescriptor;


    public WindowOperator(WindowAssigner<? super IN, W> windowAssigner, TypeSerializer<W> windowSerializer, KeySelector<IN, K> keySelector, TypeSerializer<K> keySerializer, StateDescriptor<? extends AppendingState<IN, ACC>, ?> windowStateDescriptor, InternalWindowFunction<ACC, OUT, K, W> windowFunction, Trigger<? super IN, ? super W> trigger, long allowedLateness, OutputTag<IN> lateDataOutputTag) {
        // unrelated code removed
        this.windowStateDescriptor = windowStateDescriptor;
        this.keySelector = (KeySelector)Preconditions.checkNotNull(keySelector);

    }
    public void open() throws Exception {
        if (this.windowStateDescriptor != null) {
            this.windowState = (InternalAppendingState)this.getOrCreateKeyedState(this.windowSerializer, this.windowStateDescriptor);
        }

    }
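    // Called for each incoming element
    public void processElement(StreamRecord<IN> element) throws Exception {
        // Ask the assigner which windows this element belongs to; the loop below iterates over them
        Collection<W> elementWindows = this.windowAssigner.assignWindows(element.getValue(), element.getTimestamp(), this.windowAssignerContext);
        // The original method also has a branch for assigners that merge windows dynamically (such as session windows); it is removed here to keep the flow readable
        // unrelated code removed (it also declares the locals key and isSkippedElement used below)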
        Iterator var12 = elementWindows.iterator();
        label59:
        while(true) {
            Window window;
            TriggerResult triggerResult;
            while(true) {
                // Each iteration skips windows that are already late (isWindowLate)
                do {
                    if (!var12.hasNext()) {
                        break label59;
                    }

                    window = (Window)var12.next();
                } while(this.isWindowLate(window));
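                // Update the window state: add the element to it, then set the key and window on the trigger context
                isSkippedElement = false;
                this.windowState.setCurrentNamespace(window);
                // add() folds the element into the window's state (for an AggregatingState, via AggregateFunction.add)
                this.windowState.add(element.getValue());
                this.triggerContext.key = key;
                this.triggerContext.window = window;
                // Let the trigger see the element and check its result
                triggerResult = this.triggerContext.onElement(element);
                if (!triggerResult.isFire()) {
                    // The trigger did not fire (isFire() returns false): leave the inner loop
                    break;
                }
                // If the window contents are non-null, emit them and leave the inner loop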
                ACC contents = this.windowState.get();
                if (contents != null) {
                    this.emitWindowContents(window, contents);
                    break;
                }
            }
            // If the trigger result asks for a purge (isPurge() returns true), clear the window state
            if (triggerResult.isPurge()) {
                this.windowState.clear();
            }

            this.registerCleanupTimer(window);
        }
    
    }
    // Lateness check against the watermark: only event-time windows can be late
    protected boolean isWindowLate(W window) {
        return this.windowAssigner.isEventTime() && this.cleanupTime(window) <= this.internalTimerService.currentWatermark();
    }

}    
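
The decompiled loop above is hard to read, so here is a simplified plain-Java model of the same flow: drop the element if its window is late, add it to the state for (key, window), ask the trigger, emit on fire, clear on purge. This is a sketch under stated assumptions, not Flink's code. It assumes event time and a single tumbling window per element; the count-based trigger, the `fireEvery` parameter, and the string pane names are hypothetical, and the `TriggerResult` enum is a local mirror of Flink's four outcomes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MiniWindowOperatorSketch {

    // Local mirror of the four trigger outcomes: CONTINUE, FIRE, PURGE, FIRE_AND_PURGE.
    enum TriggerResult {
        CONTINUE(false, false), FIRE(true, false), PURGE(false, true), FIRE_AND_PURGE(true, true);
        private final boolean fire;
        private final boolean purge;
        TriggerResult(boolean fire, boolean purge) { this.fire = fire; this.purge = purge; }
        boolean isFire() { return fire; }
        boolean isPurge() { return purge; }
    }

    private final long windowSize;
    private final long allowedLateness;
    private final int fireEvery; // hypothetical count trigger: fire and purge every N elements
    private final Map<String, int[]> windowState = new HashMap<>(); // pane -> {sum, count}
    private long currentWatermark = Long.MIN_VALUE;
    final List<String> emitted = new ArrayList<>();

    MiniWindowOperatorSketch(long windowSize, long allowedLateness, int fireEvery) {
        this.windowSize = windowSize;
        this.allowedLateness = allowedLateness;
        this.fireEvery = fireEvery;
    }

    void advanceWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
    }

    // Late when the cleanup time (last timestamp in the window plus allowed lateness)
    // is at or before the current watermark.
    boolean isWindowLate(long windowStart) {
        long maxTimestamp = windowStart + windowSize - 1;
        return maxTimestamp + allowedLateness <= currentWatermark;
    }

    // Returns false when the element is dropped because its window is late.
    boolean processElement(String key, int value, long timestamp) {
        long windowStart = timestamp - Math.floorMod(timestamp, windowSize);
        if (isWindowLate(windowStart)) {
            return false;
        }
        String pane = key + "@" + windowStart; // state namespace: key plus window
        int[] acc = windowState.computeIfAbsent(pane, p -> new int[2]);
        acc[0] += value; // add the element to the window state
        acc[1] += 1;
        TriggerResult result = acc[1] % fireEvery == 0 ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
        if (result.isFire()) {
            emitted.add(pane + " -> " + acc[0]);
        }
        if (result.isPurge()) {
            windowState.remove(pane);
        }
        return true;
    }

    public static void main(String[] args) {
        MiniWindowOperatorSketch op = new MiniWindowOperatorSketch(5_000L, 0L, 2);
        op.processElement("key1", 1, 1_000L);
        op.processElement("key1", 3, 2_000L);  // second element for key1@0: trigger fires
        op.advanceWatermark(4_999L);           // window [0, 5000) is now late
        boolean accepted = op.processElement("key2", 4, 3_000L);
        System.out.println(op.emitted);        // prints [key1@0 -> 4]
        System.out.println(accepted);          // prints false
    }
}
```

The sketch leaves out cleanup timers and merging windows; it only mirrors the order of the checks in processElement.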

Here a familiar interface shows up again: OneInputStreamOperator<IN, OUT>. Its processElement method is actually declared in the parent interface Input<IN>.

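WindowOperator's class hierarchy (the class diagram from the original post is not reproduced here) has the same parent class, AbstractUdfStreamOperator, as the operators behind basic transformations such as flatMap and filter, which were covered in the earlier article on the source code of basic transformations in Flink 1.14.* (RichFlatMapFunction, RichFilterFunction). What WindowOperator adds is extra interfaces, such as Triggerable<K, W>.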

With this in mind the picture should be clear: whether it is FlatMap, Filter, or a window, each operator is an extension of this same class hierarchy.
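
As a rough illustration of that shared shape, the plain-Java sketch below models the hierarchy with made-up types. `ElementInput`, `OneInputOperator`, and `UdfOperator` are hypothetical stand-ins for Input<IN>, OneInputStreamOperator<IN, OUT>, and AbstractUdfStreamOperator; their signatures are simplified and do not match Flink's. The point is only that different operators wrap a user function and are all driven through the same inherited processElement entry point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class OperatorHierarchySketch {

    // Stand-in for Input<IN>: the single per-element entry point.
    interface ElementInput<IN> {
        void processElement(IN element);
    }

    // Stand-in for OneInputStreamOperator: inherits processElement from its parent.
    interface OneInputOperator<IN, OUT> extends ElementInput<IN> {
        List<OUT> output();
    }

    // Stand-in for AbstractUdfStreamOperator: holds the user function.
    abstract static class UdfOperator<IN, OUT, F> implements OneInputOperator<IN, OUT> {
        protected final F userFunction;
        protected final List<OUT> out = new ArrayList<>();
        UdfOperator(F userFunction) { this.userFunction = userFunction; }
        @Override public List<OUT> output() { return out; }
    }

    static final class FilterOperator<T> extends UdfOperator<T, T, Predicate<T>> {
        FilterOperator(Predicate<T> predicate) { super(predicate); }
        @Override public void processElement(T element) {
            if (userFunction.test(element)) {
                out.add(element);
            }
        }
    }

    static final class FlatMapOperator<IN, OUT> extends UdfOperator<IN, OUT, Function<IN, List<OUT>>> {
        FlatMapOperator(Function<IN, List<OUT>> function) { super(function); }
        @Override public void processElement(IN element) {
            out.addAll(userFunction.apply(element));
        }
    }

    // Drives any operator through the shared entry point.
    static <IN, OUT> List<OUT> run(OneInputOperator<IN, OUT> operator, List<IN> input) {
        for (IN element : input) {
            operator.processElement(element);
        }
        return operator.output();
    }

    public static void main(String[] args) {
        List<Integer> evens = run(new FilterOperator<Integer>(n -> n % 2 == 0), Arrays.asList(1, 2, 3, 4));
        List<String> words = run(
                new FlatMapOperator<String, String>(s -> Arrays.asList(s.split(" "))),
                Arrays.asList("a b", "c"));
        System.out.println(evens); // prints [2, 4]
        System.out.println(words); // prints [a, b, c]
    }
}
```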
