Flink Backpressure (2): Source Code Analysis, Flow, Part 1

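Preface

Backpressure is not implemented by any single operator; it comes from the upstream and the downstream working together. The source code therefore has to be read one piece at a time rather than as a single coordinated whole, which makes it hard to follow. This walkthrough splits the flow into the following steps:

  1. The downstream parses the data messages sent by the upstream and occupies buffers with them until the downstream consumer processes the data
  2. The downstream consumer finishes processing, recycles the buffers, and updates the credit (the number of available buffers)
  3. The downstream computes the credit and sends it to the upstream
  4. The upstream receives the credit and sends data accordingly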
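The four steps above can be pictured with a toy model. This is only a sketch under simplifying assumptions: `CreditFlowSketch`, `Sender`, and `Receiver` are hypothetical names, not Flink classes, and one credit is taken to mean one free buffer downstream.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of credit-based flow control; all names are hypothetical, not Flink classes.
final class CreditFlowSketch {
    /** Upstream side: may send only while it holds credit. */
    static final class Sender {
        final Deque<String> pending = new ArrayDeque<>();
        int credit;

        /** Sends as many pending records as the current credit allows. */
        int drainTo(Receiver receiver) {
            int sent = 0;
            while (credit > 0 && !pending.isEmpty()) {
                receiver.onBuffer(pending.poll());
                credit--;
                sent++;
            }
            return sent;
        }
    }

    /** Downstream side: each received record occupies one buffer until consumed. */
    static final class Receiver {
        final Deque<String> received = new ArrayDeque<>();

        void onBuffer(String record) {
            received.add(record);
        }

        /** Consumes one record and returns the credit (0 or 1) to announce upstream. */
        int consumeOne() {
            return received.poll() == null ? 0 : 1;
        }
    }

    /** Runs the four steps once and returns {sentFirst, sentWhileStalled, sentAfterCredit}. */
    static int[] demo() {
        Sender sender = new Sender();
        Receiver receiver = new Receiver();
        sender.pending.add("a");
        sender.pending.add("b");
        sender.pending.add("c");
        sender.credit = 2;                       // two free buffers downstream
        int first = sender.drainTo(receiver);    // step 1: "a" and "b" occupy buffers
        int stalled = sender.drainTo(receiver);  // no credit left: the sender is backpressured
        sender.credit += receiver.consumeOne();  // steps 2-3: a buffer is freed, credit announced
        int second = sender.drainTo(receiver);   // step 4: "c" can now be sent
        return new int[] {first, stalled, second};
    }
}
```

With credit exhausted, the sender sends nothing until the receiver frees a buffer, which is the backpressure effect the rest of this article traces through the real code.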

I. The downstream parses data from the upstream

The core classes involved are:

  1. CreditBasedPartitionRequestClientHandler
  2. RemoteInputChannel

1.CreditBasedPartitionRequestClientHandler

(1) channelRead()

Parses the message sent by the upstream and dispatches it by calling decodeMsg(msg).

// Receive path: parse the message sent by the upstream and dispatch it via decodeMsg(msg)
@Override
public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
    try {
        decodeMsg(msg);
    } catch (Throwable t) {
        notifyAllChannelsOfErrorAndClose(t);
    }
}

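(2) decodeMsg(), called from channelRead()

Each of the three message types gets its own handling:

  1. BufferResponse (data buffer message): look up the matching RemoteInputChannel, then call decodeBufferOrEvent() to process the message
  2. ErrorResponse (error message):
    • Fatal error: notify all channels and close the connection
    • Non-fatal error: notify only the affected channel
  3. BacklogAnnouncement (backlog message): likewise look up the matching RemoteInputChannel, then call its onSenderBacklog() to handle the backlog

/* Decodes and dispatches a message. There are three message types:
*  1. BufferResponse: a data buffer (carries the actual data)
*  2. ErrorResponse: an error notification (e.g. partition not found, remote task failed)
*  3. BacklogAnnouncement: a backlog notification (tells the downstream how much data is queued upstream)
* */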
private void decodeMsg(Object msg) {
    final Class<?> msgClazz = msg.getClass();

    // ---- Buffer --------------------------------------------------------
    // Case 1: BufferResponse (data buffer)
    if (msgClazz == NettyMessage.BufferResponse.class) {
        NettyMessage.BufferResponse bufferOrEvent = (NettyMessage.BufferResponse) msg;
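        // Look up the target input channel. Each upstream result subpartition (RSP) is tied to the ID of its downstream input channel (IC), which is how the handler knows where to deliver the data
        RemoteInputChannel inputChannel = inputChannels.get(bufferOrEvent.receiverId);
        // If the channel is missing or released, release the buffer and cancel the request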
        if (inputChannel == null || inputChannel.isReleased()) {
            bufferOrEvent.releaseBuffer();

            cancelRequestFor(bufferOrEvent.receiverId);

            return;
        }
        // The channel is valid: call decodeBufferOrEvent() to process the buffer
        try {
            decodeBufferOrEvent(inputChannel, bufferOrEvent);
        } catch (Throwable t) {
            inputChannel.onError(t);
        }

    }
    // Case 2: ErrorResponse (error notification)
    else if (msgClazz == NettyMessage.ErrorResponse.class) {
        // ---- Error ---------------------------------------------------------
        NettyMessage.ErrorResponse error = (NettyMessage.ErrorResponse) msg;

        SocketAddress remoteAddr = ctx.channel().remoteAddress();
        // Fatal error: notify all channels and close the connection
        if (error.isFatalError()) {
            notifyAllChannelsOfErrorAndClose(
                    new RemoteTransportException(
                            "Fatal error at remote task manager '"
                                    + remoteAddr
                                    + " [ "
                                    + connectionID.getResourceID().getStringWithMetadata()
                                    + " ] "
                                    + "'.",
                            remoteAddr,
                            error.cause));
        } else { // Non-fatal error: notify only the affected channel
            RemoteInputChannel inputChannel = inputChannels.get(error.receiverId);

            if (inputChannel != null) {
                if (error.cause.getClass() == PartitionNotFoundException.class) { // Partition not found: handled separately
                    inputChannel.onFailedPartitionRequest();
                } else { // Any other error
                    inputChannel.onError(
                            new RemoteTransportException(
                                    "Error at remote task manager '"
                                            + remoteAddr
                                            + " [ "
                                            + connectionID
                                                    .getResourceID()
                                                    .getStringWithMetadata()
                                            + " ] "
                                            + "'.",
                                    remoteAddr,
                                    error.cause));
                }
            }
        }
    }
    // Case 3: BacklogAnnouncement (backlog notification)
    else if (msgClazz == NettyMessage.BacklogAnnouncement.class) {
        NettyMessage.BacklogAnnouncement announcement = (NettyMessage.BacklogAnnouncement) msg;
        // Likewise, look up the target input channel
        RemoteInputChannel inputChannel = inputChannels.get(announcement.receiverId);
        // If the channel is missing or released, cancel the request
        if (inputChannel == null || inputChannel.isReleased()) {
            cancelRequestFor(announcement.receiverId);
            return;
        }
        // The channel is valid: call onSenderBacklog() to handle the backlog
        try {
            inputChannel.onSenderBacklog(announcement.backlog);
        } catch (Throwable throwable) {
            inputChannel.onError(throwable);
        }
    } else { // Any other message type: throw an exception
        throw new IllegalStateException(
                "Received unknown message from producer: " + msg.getClass());
    }
}
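The routing pattern in decodeMsg() can be reduced to a small sketch: compare the exact message class, look up the receiver by ID, and cancel when the receiver is gone. The message classes and the string results below are hypothetical stand-ins chosen to make the routing observable; they are not Flink's NettyMessage types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the three message types; not Flink's NettyMessage classes.
final class DispatchSketch {
    static final class DataMsg {
        final int receiverId;
        DataMsg(int receiverId) { this.receiverId = receiverId; }
    }

    static final class ErrorMsg {
        final boolean fatal;
        ErrorMsg(boolean fatal) { this.fatal = fatal; }
    }

    static final class BacklogMsg {
        final int receiverId;
        final int backlog;
        BacklogMsg(int receiverId, int backlog) {
            this.receiverId = receiverId;
            this.backlog = backlog;
        }
    }

    private final Map<Integer, String> channels = new HashMap<>();

    void register(int receiverId, String channelName) {
        channels.put(receiverId, channelName);
    }

    /** Returns a description of the action taken, so the routing is easy to observe. */
    String dispatch(Object msg) {
        Class<?> type = msg.getClass();
        if (type == DataMsg.class) {
            String channel = channels.get(((DataMsg) msg).receiverId);
            return channel == null ? "cancel request" : "buffer -> " + channel;
        } else if (type == ErrorMsg.class) {
            return ((ErrorMsg) msg).fatal ? "close all channels" : "fail one channel";
        } else if (type == BacklogMsg.class) {
            BacklogMsg announcement = (BacklogMsg) msg;
            String channel = channels.get(announcement.receiverId);
            return channel == null
                    ? "cancel request"
                    : "backlog " + announcement.backlog + " -> " + channel;
        }
        throw new IllegalStateException("Received unknown message: " + type);
    }
}
```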

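(3) decodeBufferOrEvent(), called from decodeMsg()

A BufferResponse falls into one of the following cases:

  1. Empty data buffer (isBuffer() is true and bufferSize is 0, so there is no payload; possibly a heartbeat or a boundary marker): call inputChannel.onEmptyBuffer()
  2. Buffer with data: if the response packs several partial buffers (numOfPartialBuffers > 0), slice it with the zero-copy sliceBuffer() and attach a custom recycler that calls bufferOrEvent.getBuffer().recycleBuffer() once every slice has been released; otherwise pass the whole buffer to inputChannel.onBuffer()
  3. Buffer is null: throw an IllegalStateException

// Decodes a BufferResponse received from the upstream and hands it to the matching input channel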
private void decodeBufferOrEvent(
        RemoteInputChannel inputChannel, NettyMessage.BufferResponse bufferOrEvent)
        throws Throwable {
    // 1. Empty buffer: possibly a heartbeat or a buffer boundary marker
    if (bufferOrEvent.isBuffer() && bufferOrEvent.bufferSize == 0) {
        inputChannel.onEmptyBuffer(bufferOrEvent.sequenceNumber, bufferOrEvent.backlog);
    }
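    // 2. Non-empty buffer: carries data
    else if (bufferOrEvent.getBuffer() != null) {
        // The response packs several partial buffers: process it slice by slice
        if (bufferOrEvent.numOfPartialBuffers > 0) {
            int offset = 0; // Start position of the current slice within the original buffer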

            int seq = bufferOrEvent.sequenceNumber; // Starting sequence number
            AtomicInteger waitToBeReleased =
                    new AtomicInteger(bufferOrEvent.numOfPartialBuffers); // Slices still to be released; counts down
            AtomicInteger processedPartialBuffers = new AtomicInteger(0); // Slices processed so far; counts up
            try {
                for (int i = 0; i < bufferOrEvent.numOfPartialBuffers; i++) {
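                    int size = bufferOrEvent.getPartialBufferSizes().get(i); // Size of the current slice
                    // 1. Count the slice as processed
                    processedPartialBuffers.incrementAndGet();
                    // 2. Create the slice and hand it to the input channel, which keeps it in its local queue
                    inputChannel.onBuffer(
                            sliceBuffer(
                                    bufferOrEvent,
                                    memorySegment -> { // Custom recycler: once every slice has been released, recycle the whole original buffer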
                                        if (waitToBeReleased.decrementAndGet() == 0) {
                                            bufferOrEvent.getBuffer().recycleBuffer();
                                        }
                                    },
                                    offset,
                                    size), // Create the slice buffer (zero-copy)
                            seq++, // Advance the sequence number
                            i == bufferOrEvent.numOfPartialBuffers - 1
                                    ? bufferOrEvent.backlog
                                    : -1, // Only the last slice carries the backlog
                            -1);
                    // 3. Advance the offset
                    offset += size;
                }
            } catch (Throwable throwable) {
                LOG.error("Failed to process partial buffers.", throwable);
                if (processedPartialBuffers.get() != bufferOrEvent.numOfPartialBuffers) {
                    bufferOrEvent.getBuffer().recycleBuffer();
                }
                throw throwable;
            }
        } else {
            inputChannel.onBuffer(
                    bufferOrEvent.getBuffer(),
                    bufferOrEvent.sequenceNumber,
                    bufferOrEvent.backlog,
                    bufferOrEvent.subpartitionId);
        }

    }
    // 3. Any other case: throw an exception
    else {
        throw new IllegalStateException(
                "The read buffer is null in credit-based input channel.");
    }
}
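The custom recycler in the loop above is a shared countdown: every slice gets a release callback, and only the last release recycles the parent buffer. A minimal sketch of that idea, where `SharedRecycleSketch` and `releaseHandles` are hypothetical names and a plain `Runnable` stands in for Flink's BufferRecycler:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; Runnable stands in for a buffer recycler.
final class SharedRecycleSketch {
    /**
     * Returns one release callback per slice. The parent is recycled exactly once,
     * when the last of the callbacks has run.
     */
    static List<Runnable> releaseHandles(int numSlices, Runnable recycleParent) {
        AtomicInteger waitToBeReleased = new AtomicInteger(numSlices);
        List<Runnable> handles = new ArrayList<>();
        for (int i = 0; i < numSlices; i++) {
            handles.add(() -> {
                // Every slice decrements the shared counter; only the last one sees 0.
                if (waitToBeReleased.decrementAndGet() == 0) {
                    recycleParent.run();
                }
            });
        }
        return handles;
    }
}
```

Because the counter is atomic, the slices may be released from different threads and in any order without recycling the parent early or twice.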

(4) sliceBuffer(), called from decodeBufferOrEvent()

Zero-copy: the slice only references the original memory.

private static NetworkBuffer sliceBuffer(
        NettyMessage.BufferResponse bufferOrEvent,
        BufferRecycler recycler,
        int offset,
        int size) {
    // 1. Get a ByteBuffer view of the requested range of the original buffer
    ByteBuffer nioBuffer = bufferOrEvent.getBuffer().getNioBuffer(offset, size);

    // 2. Wrap the nioBuffer in a MemorySegment (zero-copy, reference only)
    MemorySegment segment;
    if (nioBuffer.isDirect()) {
        segment = MemorySegmentFactory.wrapOffHeapMemory(nioBuffer);
    } else {
        byte[] bytes = nioBuffer.array();
        segment = MemorySegmentFactory.wrap(bytes);
    }
    // 3. Create a new NetworkBuffer that uses the custom recycler passed in above
    return new NetworkBuffer(
            segment, recycler, bufferOrEvent.dataType, bufferOrEvent.isCompressed, size);
}
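What "zero-copy, reference only" means can be shown with plain java.nio: a slice is a view over the same memory, so a write through the parent is visible in the slice. This is an illustration with the standard ByteBuffer API only; `ZeroCopySliceSketch` and `sliceOf` are hypothetical names, and the sketch does not involve Flink's MemorySegment.

```java
import java.nio.ByteBuffer;

// Illustration with java.nio only; does not use Flink's MemorySegment.
final class ZeroCopySliceSketch {
    /** Returns a view of [offset, offset + size) that shares memory with the parent. */
    static ByteBuffer sliceOf(ByteBuffer parent, int offset, int size) {
        ByteBuffer view = parent.duplicate(); // independent position/limit, same memory
        view.limit(offset + size);
        view.position(offset);
        return view.slice();                  // index 0 of the slice is parent[offset]
    }

    /** Writes through the parent after slicing and reads the value back through the slice. */
    static byte demo() {
        ByteBuffer parent = ByteBuffer.allocateDirect(8);
        ByteBuffer slice = sliceOf(parent, 2, 4);
        parent.put(3, (byte) 42);             // absolute write into the parent
        return slice.get(1);                  // parent index 3 is slice index 1
    }
}
```

Since no bytes are copied, the parent memory must stay alive until every slice is done with it, which is why the slices above share one countdown recycler.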

At this point it is clear that CreditBasedPartitionRequestClientHandler only classifies and wraps messages; the actual handling is delegated to the methods of RemoteInputChannel.

2.RemoteInputChannel

(1) onBuffer()

public void onBuffer(Buffer buffer, int sequenceNumber, int backlog, int subpartitionId)
        throws IOException {
    boolean recycleBuffer = true;

    try {
        // Verify buffer ordering
        if (expectedSequenceNumber != sequenceNumber) {
            onError(new BufferReorderingException(expectedSequenceNumber, sequenceNumber));
            return;
        }
        // Special data types (such as a barrier) block the upstream
        if (buffer.getDataType().isBlockingUpstream()) {
            onBlockingUpstream();
            // The backlog must be 0 in this case
            checkArgument(backlog == 0, "Illegal number of backlog: %s, should be 0.", backlog);
        }

        final boolean wasEmpty;
        boolean firstPriorityEvent = false;
        // Handle the received buffer under the lock
        synchronized (receivedBuffers) {
            // Trace-log the received buffer
            NetworkActionsLogger.traceInput(
                    "RemoteInputChannel#onBuffer",
                    buffer,
                    inputGate.getOwningTaskName(),
                    channelInfo,
                    channelStatePersister,
                    sequenceNumber);
            // Similar to notifyBufferAvailable(), make sure that we never add a buffer
            // after releaseAllResources() released all buffers from receivedBuffers
            // (see above for details).
            // If the channel has already been released, return; otherwise enqueue the buffer
            if (isReleased.get()) {
                return;
            }

            wasEmpty = receivedBuffers.isEmpty();
            // Wrap the buffer in a SequenceBuffer
            SequenceBuffer sequenceBuffer =
                    new SequenceBuffer(buffer, sequenceNumber, subpartitionId);
            DataType dataType = buffer.getDataType();
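            // Once the buffer is enqueued, recycleBuffer is set to false so the finally block does not recycle it: the data now occupies the buffer
            // Priority events (such as a barrier) go to the priority part of receivedBuffers
            if (dataType.hasPriority()) {
                firstPriorityEvent = addPriorityBuffer(sequenceBuffer);
                recycleBuffer = false;
            } else { // Everything else goes to the regular part of receivedBuffers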
                receivedBuffers.add(sequenceBuffer);
                recycleBuffer = false;
                if (dataType.requiresAnnouncement()) {
                    firstPriorityEvent = addPriorityBuffer(announce(sequenceBuffer));
                }
            }

            // Update the total queue size in bytes
            totalQueueSizeInBytes += buffer.getSize();
            // Check for a barrier
            final OptionalLong barrierId =
                    channelStatePersister.checkForBarrier(sequenceBuffer.buffer);
            if (barrierId.isPresent() && barrierId.getAsLong() > lastBarrierId) {
                // checkpoint was not yet started by task thread,
                // so remember the numbers of buffers to spill for the time when
                // it will be started
                lastBarrierId = barrierId.getAsLong();
                lastBarrierSequenceNumber = sequenceBuffer.sequenceNumber;
            }
            // Persist channel state if needed
            channelStatePersister.maybePersist(buffer);
            // Advance the expected sequence number
            ++expectedSequenceNumber;
        }
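        // If this is the first priority event (e.g. a barrier), call notifyPriorityEvent() so it is handled first
        if (firstPriorityEvent) {
            notifyPriorityEvent(sequenceNumber);
        }
        // If the queue was empty before, call notifyChannelNonEmpty() so the consumer knows data is available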
        if (wasEmpty) {
            notifyChannelNonEmpty();
        }
        // Backpressure feedback: pass the sender's backlog on to onSenderBacklog()
        if (backlog >= 0) {
            onSenderBacklog(backlog);
        }
    } finally {
        // If recycleBuffer is still true, the buffer was never enqueued, so recycle it here, which in turn updates the credit
        if (recycleBuffer) {
            buffer.recycleBuffer();
        }
    }
}

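So onBuffer() makes the incoming data occupy a buffer and calls buffer.recycleBuffer() only in exceptional cases. Recycling the buffer after its data has been consumed must therefore happen somewhere else.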
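The core of onBuffer() (check the sequence number, enqueue, notify only on the empty-to-non-empty transition) can be sketched as follows. `InputChannelSketch` is a hypothetical class; strings stand in for buffers and a counter stands in for notifyChannelNonEmpty(), and priority events, barriers, and backlog handling are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical reduction of the receive path; strings stand in for buffers.
final class InputChannelSketch {
    private final Deque<String> receivedBuffers = new ArrayDeque<>();
    private int expectedSequenceNumber = 0;
    int nonEmptyNotifications = 0; // stand-in for notifyChannelNonEmpty()

    /** Returns true if the buffer was enqueued; false means the caller must recycle it. */
    boolean onBuffer(String buffer, int sequenceNumber) {
        if (sequenceNumber != expectedSequenceNumber) {
            return false; // out of order: the real code reports an error here
        }
        boolean wasEmpty;
        synchronized (receivedBuffers) {
            wasEmpty = receivedBuffers.isEmpty();
            receivedBuffers.add(buffer);
            expectedSequenceNumber++;
        }
        if (wasEmpty) {
            nonEmptyNotifications++; // notify only when the queue goes from empty to non-empty
        }
        return true;
    }

    /** Consumer side: takes the oldest buffer, or null if none is queued. */
    String poll() {
        synchronized (receivedBuffers) {
            return receivedBuffers.poll();
        }
    }
}
```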

II. After the buffered data is consumed, recycle the buffer and update the credit

StreamTask serves as the example.

1. What the operator does

(1) StreamTask.processInput()

It simply delegates to inputProcessor.processInput().

protected void processInput(MailboxDefaultAction.Controller controller) throws Exception {
    DataInputStatus status = inputProcessor.processInput();
    // The rest of the method is not relevant here and is omitted
    // ...
}

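(2) processInput() in a StreamInputProcessor implementation

StreamInputProcessor is an interface. Taking the StreamOneInputProcessor implementation as an example, its processInput() is shown below.

It in turn calls input.emitNext(output), where input is a StreamTaskInput implementation.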

@Override
public DataInputStatus processInput() throws Exception {
    DataInputStatus status = input.emitNext(output);

    if (status == DataInputStatus.END_OF_DATA) {
        endOfInputAware.endInput(input.getInputIndex() + 1);
        output = new FinishedDataOutput<>();
    } else if (status == DataInputStatus.END_OF_RECOVERY) {
        if (input instanceof RecoverableStreamTaskInput) {
            input = ((RecoverableStreamTaskInput<IN>) input).finishRecovery();
        }
        return DataInputStatus.MORE_AVAILABLE;
    }

    return status;
}

(3) emitNext() in a StreamTaskInput implementation

StreamTaskInput is an interface. Taking AbstractStreamTaskNetworkInput as an example, its emitNext() is shown below.

It in turn calls currentRecordDeserializer.getNextRecord(), where currentRecordDeserializer is a RecordDeserializer implementation.

@Override
public DataInputStatus emitNext(DataOutput<T> output) throws Exception {

    while (true) {
        // get the stream element from the deserializer
        if (currentRecordDeserializer != null) {
            RecordDeserializer.DeserializationResult result;
            try {
                result = currentRecordDeserializer.getNextRecord(deserializationDelegate);
            } catch (IOException e) {
                throw new IOException(
                        String.format("Can't get next record for channel %s", lastChannel), e);
            }
            // Once the buffer's data has been fully consumed, drop the reference to the deserializer
            if (result.isBufferConsumed()) {
                currentRecordDeserializer = null;
            }

            if (result.isFullRecord()) {
                final boolean breakBatchEmitting =
                        processElement(deserializationDelegate.getInstance(), output);
                if (canEmitBatchOfRecords.check() && !breakBatchEmitting) {
                    continue;
                }
                return DataInputStatus.MORE_AVAILABLE;
            }
        }

        Optional<BufferOrEvent> bufferOrEvent = checkpointedInputGate.pollNext();
        if (bufferOrEvent.isPresent()) {
            // return to the mailbox after receiving a checkpoint barrier to avoid processing of
            // data after the barrier before checkpoint is performed for unaligned checkpoint
            // mode
            // processBuffer() hands the BufferOrEvent to a RecordDeserializer, which then parses the data.
            if (bufferOrEvent.get().isBuffer()) {
                processBuffer(bufferOrEvent.get());
            } else {
                DataInputStatus status = processEvent(bufferOrEvent.get(), output);
                if (status == DataInputStatus.MORE_AVAILABLE && canEmitBatchOfRecords.check()) {
                    continue;
                }
                return status;
            }
        } else {
            if (checkpointedInputGate.isFinished()) {
                checkState(
                        checkpointedInputGate.getAvailableFuture().isDone(),
                        "Finished BarrierHandler should be available");
                return DataInputStatus.END_OF_INPUT;
            }
            return DataInputStatus.NOTHING_AVAILABLE;
        }
    }
}

(4) getNextRecord() in a RecordDeserializer implementation

RecordDeserializer is also an interface. Taking SpillingAdaptiveSpanningRecordDeserializer as an example, its getNextRecord() is shown below.

@Override
public DeserializationResult getNextRecord(T target) throws IOException {
    // always check the non-spanning wrapper first.
    // this should be the majority of the cases for small records
    // for large records, this portion of the work is very small in comparison anyways

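    // 1. Try to read the next complete record from the current buffer
    final DeserializationResult result = readNextRecord(target);
    // 2. If the current buffer has been fully consumed (all of its data has been parsed)
    if (result.isBufferConsumed()) {
        // 2.1 Recycle the current buffer
        currentBuffer.recycleBuffer();
        // 2.2 Drop the reference to the buffer so it can be garbage collected
        currentBuffer = null;
    }
    // 3. Return the result (whether a full record was read, whether the buffer was consumed, and so on)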
    return result;
}

Here the deserializer calls the buffer's recycleBuffer() to recycle it, so the next question is what the buffer does.
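The "recycle once the buffer is fully consumed" rule can be tried out on a much simpler format. The sketch below assumes int-length-prefixed records that never span buffers, which is a deliberate simplification of what SpillingAdaptiveSpanningRecordDeserializer handles; `RecordReaderSketch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for int-length-prefixed records; records never span buffers here.
final class RecordReaderSketch {
    private ByteBuffer currentBuffer;
    private Runnable recycler;

    void setNextBuffer(ByteBuffer buffer, Runnable recycler) {
        this.currentBuffer = buffer;
        this.recycler = recycler;
    }

    /** Reads the next record, or returns null when no buffer is set. */
    byte[] getNextRecord() {
        if (currentBuffer == null) {
            return null;
        }
        int length = currentBuffer.getInt();
        byte[] record = new byte[length];
        currentBuffer.get(record);
        if (!currentBuffer.hasRemaining()) {
            recycler.run();       // buffer fully consumed: hand it back
            currentBuffer = null; // and drop the reference
        }
        return record;
    }
}
```

The buffer is recycled by the call that reads its last record, not by a later call, which matches the order of operations in getNextRecord() above.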

2. What Buffer does

Buffer is an interface. Taking the NetworkBuffer implementation as an example, its recycleBuffer() is shown below.

(1) NetworkBuffer

public class NetworkBuffer extends AbstractReferenceCountedByteBuf implements Buffer {
  private BufferRecycler recycler;
  // ...

  // Calls release() of AbstractReferenceCountedByteBuf
  @Override
  public void recycleBuffer() {
      release();
  }
  // ...

  // Called by AbstractReferenceCountedByteBuf.handleRelease()
  @Override
  protected void deallocate() {
      // Hand the memory segment back to the BufferRecycler
      recycler.recycle(memorySegment);
  }
}


AbstractReferenceCountedByteBuf is an abstract class: its release() calls handleRelease(), which calls the subclass's deallocate() once the reference count reaches zero:
  public boolean release() {
      return this.handleRelease(updater.release(this));
  }

  public boolean release(int decrement) {
      return this.handleRelease(updater.release(this, decrement));
  }

  private boolean handleRelease(boolean result) {
      if (result) {
          this.deallocate();
      }

      return result;
  }
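The release-then-deallocate chain is ordinary reference counting. A minimal sketch of the pattern follows, with `RefCountedSketch` as a hypothetical class and a `Runnable` standing in for deallocate(); Netty's real implementation uses a field updater and differs in detail.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted resource; Runnable stands in for deallocate().
final class RefCountedSketch {
    private final AtomicInteger refCnt = new AtomicInteger(1); // a new object starts at 1
    private final Runnable deallocate;

    RefCountedSketch(Runnable deallocate) {
        this.deallocate = deallocate;
    }

    /** Adds one reference, e.g. before handing the object to another consumer. */
    RefCountedSketch retain() {
        refCnt.incrementAndGet();
        return this;
    }

    /** Drops one reference; returns true if this call released the last one. */
    boolean release() {
        int remaining = refCnt.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more often than retained");
        }
        if (remaining == 0) {
            deallocate.run(); // only the final release triggers deallocation
            return true;
        }
        return false;
    }
}
```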

(2) BufferRecycler.recycle(), called from deallocate() (important)

BufferRecycler is an interface. Taking the BufferManager implementation as an example:

<1> BufferManager.recycle()

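The flow is as follows:

  1. If the input channel has been released: return the memory segment directly to the global buffer pool (globalPool); no credit update is needed
  2. If the input channel is still active: wrap the memory segment in a NetworkBuffer, add it to the exclusive buffer queue, then check whether a floating buffer should be released
  3. If a floating buffer was released: recycle it by calling its recycleBuffer(), which goes to LocalBufferPool
  4. If no floating buffer was released: increase the credit by 1 and notify the upstream

@Override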
public void recycle(MemorySegment segment) {
    @Nullable Buffer releasedFloatingBuffer = null;
    synchronized (bufferQueue) {
        try {
            // Similar to notifyBufferAvailable(), make sure that we never add a buffer
            // after channel released all buffers via releaseAllResources().
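            // Case 1: the input channel has already been released
            if (inputChannel.isReleased()) {
                // The channel is closed, so return the memory segment directly to the global buffer pool (globalPool); no credit update is needed.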
                globalPool.recycleUnpooledMemorySegments(Collections.singletonList(segment));
                return;
            }
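            // Case 2: the input channel is still active
            else {
                // Wrap the recycled memory segment in a new NetworkBuffer (with this BufferManager as its recycler).
                // bufferQueue.addExclusiveBuffer() adds it to the exclusive buffer queue and checks whether a floating buffer should be released; the result is assigned to releasedFloatingBuffer.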
                releasedFloatingBuffer =
                        bufferQueue.addExclusiveBuffer(
                                new NetworkBuffer(segment, this), numRequiredBuffers);
            }
        } catch (Throwable t) {
            ExceptionUtils.rethrow(t);
        } finally {
            bufferQueue.notifyAll();
        }
    }
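    // If a floating buffer was released, recycle it through its own recycler
    if (releasedFloatingBuffer != null) {
        // For a floating buffer this ends up in LocalBufferPool's recycle logic; the next article on floating and exclusive buffers explains why
        releasedFloatingBuffer.recycleBuffer();
    }
    // If no floating buffer was released, notify the input channel that one more buffer is available, i.e. tell the upstream that the credit grows by 1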
    else {
        try {
            inputChannel.notifyBufferAvailable(1);
        } catch (Throwable t) {
            ExceptionUtils.rethrow(t);
        }
    }
}
<2> BufferManager#AvailableBufferQueue.addExclusiveBuffer()

AvailableBufferQueue is an inner class of BufferManager. Its main fields are:

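static final class AvailableBufferQueue {
    // 1. Floating buffers currently available, taken from the fixed buffer pool
    final ArrayDeque<Buffer> floatingBuffers;
    // 2. Exclusive buffers currently available, taken from the global buffer pool
    final ArrayDeque<Buffer> exclusiveBuffers;
    // ...
}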

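The flow of addExclusiveBuffer() is as follows:

  1. Add the new buffer to the exclusive buffer queue
  2. If the number of available buffers now exceeds the number of required buffers, release one floating buffer

@Nullable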
Buffer addExclusiveBuffer(Buffer buffer, int numRequiredBuffers) {
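    // 1. Add the new buffer to the exclusive buffer queue
    exclusiveBuffers.add(buffer);
    // 2. If the number of available buffers exceeds the number of required buffers, release one floating buffer (exclusive buffers are never released here)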
    if (getAvailableBufferSize() > numRequiredBuffers) {
        return floatingBuffers.poll();
    }
    return null;
}
<3> BufferManager#AvailableBufferQueue.getAvailableBufferSize()

Available buffers = floating buffers + exclusive buffers.

// Number of currently available buffers = floating buffers + exclusive buffers
int getAvailableBufferSize() {
    return floatingBuffers.size() + exclusiveBuffers.size();
}
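Putting recycle() and addExclusiveBuffer() together, the decision "release a floating buffer or announce one credit" can be sketched as below. `BufferQueueSketch` is a hypothetical class, strings stand in for buffers, and the returned int stands in for the argument of notifyBufferAvailable(); locking and the released-channel case are left out.

```java
import java.util.ArrayDeque;

// Hypothetical reduction of the queue logic; strings stand in for buffers.
final class BufferQueueSketch {
    final ArrayDeque<String> floatingBuffers = new ArrayDeque<>();
    final ArrayDeque<String> exclusiveBuffers = new ArrayDeque<>();

    /** Adds an exclusive buffer; returns a floating buffer to give back, or null. */
    String addExclusiveBuffer(String buffer, int numRequiredBuffers) {
        exclusiveBuffers.add(buffer);
        if (floatingBuffers.size() + exclusiveBuffers.size() > numRequiredBuffers) {
            return floatingBuffers.poll(); // null when no floating buffer is held
        }
        return null;
    }

    /** Returns the credit to announce upstream after one exclusive buffer comes back. */
    int recycle(String buffer, int numRequiredBuffers) {
        String releasedFloating = addExclusiveBuffer(buffer, numRequiredBuffers);
        // A released floating buffer offsets the returned exclusive one: no new credit.
        return releasedFloating != null ? 0 : 1;
    }
}
```

In the first case the channel's total number of available buffers is unchanged (one exclusive in, one floating out), so there is nothing new to announce; in the second case the total grows by one, hence a credit of 1.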