Flink Backpressure, Part 2: Source Code Analysis, Flow 1

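Preface

The backpressure mechanism is not implemented by a single operator; the upstream and the downstream cooperate to make it work. For that reason this walkthrough breaks the source code into separate parts instead of following one end-to-end path, which would be hard to follow. It is divided into the following steps:

  1. The downstream parses the data messages sent by the upstream; the data occupies buffers while it waits for the downstream consumer to process it
  2. Once the downstream consumer has processed the data, the buffer is recycled and the credit (the number of available buffers) is updated
  3. The downstream computes the credit and sends it to the upstream
  4. The upstream receives the credit and sends data according to it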
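The four steps above can be pictured with a toy, single-threaded model. This is a minimal sketch, not Flink's implementation: `CreditFlowSketch`, `Sender`, `Receiver` and their fields are hypothetical names, and the real exchange happens over Netty between separate tasks.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy model of credit-based flow control; all names are hypothetical, not Flink classes. */
class CreditFlowSketch {

    /** Upstream side: may only send while it holds credit. */
    static class Sender {
        final Deque<String> backlog = new ArrayDeque<>();
        int credit;

        /** Step 4: send as many records as the current credit allows. */
        int sendTo(Receiver receiver) {
            int sent = 0;
            while (credit > 0 && !backlog.isEmpty()) {
                receiver.onBuffer(backlog.poll());
                credit--;
                sent++;
            }
            return sent;
        }
    }

    /** Downstream side: owns a fixed number of buffers. */
    static class Receiver {
        final Deque<String> received = new ArrayDeque<>();
        int freeBuffers;
        int unannouncedCredit;

        Receiver(int buffers) {
            this.freeBuffers = buffers;
            this.unannouncedCredit = buffers;
        }

        /** Step 1: incoming data occupies one buffer until it is consumed. */
        void onBuffer(String data) {
            if (freeBuffers == 0) {
                throw new IllegalStateException("sender exceeded its credit");
            }
            freeBuffers--;
            received.add(data);
        }

        /** Step 2: consuming a record recycles its buffer and earns one credit. */
        String consume() {
            String data = received.poll();
            if (data != null) {
                freeBuffers++;
                unannouncedCredit++;
            }
            return data;
        }

        /** Step 3: announce the accumulated credit to the sender. */
        void announceCreditTo(Sender sender) {
            sender.credit += unannouncedCredit;
            unannouncedCredit = 0;
        }
    }

    /** Runs a small scenario and returns how many records were sent in each round. */
    public static int[] demo() {
        Sender sender = new Sender();
        for (int i = 0; i < 5; i++) {
            sender.backlog.add("record-" + i);
        }
        Receiver receiver = new Receiver(2);

        receiver.announceCreditTo(sender);     // initial credit = 2
        int first = sender.sendTo(receiver);   // 2 records sent
        int blocked = sender.sendTo(receiver); // 0 sent: no credit left
        receiver.consume();                    // frees one buffer
        receiver.announceCreditTo(sender);     // credit = 1
        int second = sender.sendTo(receiver);  // 1 record sent
        return new int[] {first, blocked, second};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [2, 0, 1]
    }
}
```

With two buffers on the receiving side, `demo()` sends two records, then stalls with zero credit (this stall is what shows up as backpressure), and can send one more only after a buffer has been consumed and its credit announced.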

I. The downstream parses data from the upstream

The core classes involved are:

  1. CreditBasedPartitionRequestClientHandler
  2. RemoteInputChannel

1.CreditBasedPartitionRequestClientHandler

(1) channelRead()

Parses the messages sent by the upstream and dispatches them by calling decodeMsg(msg).

// Receive path: parse the message from the upstream and dispatch it via decodeMsg(msg)
@Override
public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
    try {
        decodeMsg(msg);
    } catch (Throwable t) {
        notifyAllChannelsOfErrorAndClose(t);
    }
}

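(2) decodeMsg(), called from channelRead()

Each of the three message types gets its own handling:

  1. BufferResponse (data buffer message): look up the corresponding RemoteInputChannel, then call decodeBufferOrEvent() to process the msg
  2. ErrorResponse (error message):
    • Fatal error: notify all channels and close the connection
    • Non-fatal error: notify only the affected channel
  3. BacklogAnnouncement (backlog message): likewise, look up the corresponding RemoteInputChannel, then call its onSenderBacklog() to handle the backlog

/* Decodes and dispatches a message. There are three message types:
 *  1. BufferResponse: a data buffer (carries the actual data)
 *  2. ErrorResponse: an error notification (e.g. partition not found, remote task failure)
 *  3. BacklogAnnouncement: a backlog notification (tells the downstream how much data the upstream has queued)
 */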
private void decodeMsg(Object msg) {
    final Class<?> msgClazz = msg.getClass();

    // ---- Buffer --------------------------------------------------------
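    // Case 1: BufferResponse (data buffer)
    if (msgClazz == NettyMessage.BufferResponse.class) {
        NettyMessage.BufferResponse bufferOrEvent = (NettyMessage.BufferResponse) msg;
        // Look up the target input channel. Each upstream ResultSubpartition (RSP) is tied to the id of
        // a downstream InputChannel (IC), which is how the data knows where to go.
        RemoteInputChannel inputChannel = inputChannels.get(bufferOrEvent.receiverId);
        // Channel is invalid: release the buffer and cancel the request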
        if (inputChannel == null || inputChannel.isReleased()) {
            bufferOrEvent.releaseBuffer();

            cancelRequestFor(bufferOrEvent.receiverId);

            return;
        }
        // Channel is valid: call decodeBufferOrEvent() to process the buffer
        try {
            decodeBufferOrEvent(inputChannel, bufferOrEvent);
        } catch (Throwable t) {
            inputChannel.onError(t);
        }

    }
    // Case 2: ErrorResponse (error notification)
    else if (msgClazz == NettyMessage.ErrorResponse.class) {
        // ---- Error ---------------------------------------------------------
        NettyMessage.ErrorResponse error = (NettyMessage.ErrorResponse) msg;

        SocketAddress remoteAddr = ctx.channel().remoteAddress();
        // Fatal error: notify all channels and close the connection
        if (error.isFatalError()) {
            notifyAllChannelsOfErrorAndClose(
                    new RemoteTransportException(
                            "Fatal error at remote task manager '"
                                    + remoteAddr
                                    + " [ "
                                    + connectionID.getResourceID().getStringWithMetadata()
                                    + " ] "
                                    + "'.",
                            remoteAddr,
                            error.cause));
        } else { // Non-fatal error: notify only the affected channel
            RemoteInputChannel inputChannel = inputChannels.get(error.receiverId);

            if (inputChannel != null) {
                if (error.cause.getClass() == PartitionNotFoundException.class) { // Partition not found: handled separately
                    inputChannel.onFailedPartitionRequest();
                } else { // Any other error
                    inputChannel.onError(
                            new RemoteTransportException(
                                    "Error at remote task manager '"
                                            + remoteAddr
                                            + " [ "
                                            + connectionID
                                                    .getResourceID()
                                                    .getStringWithMetadata()
                                            + " ] "
                                            + "'.",
                                    remoteAddr,
                                    error.cause));
                }
            }
        }
    }
    // Case 3: BacklogAnnouncement (backlog notification)
    else if (msgClazz == NettyMessage.BacklogAnnouncement.class) {
        NettyMessage.BacklogAnnouncement announcement = (NettyMessage.BacklogAnnouncement) msg;
        // Likewise, look up the target input channel
        RemoteInputChannel inputChannel = inputChannels.get(announcement.receiverId);
        // Channel is invalid: cancel the request
        if (inputChannel == null || inputChannel.isReleased()) {
            cancelRequestFor(announcement.receiverId);
            return;
        }
        // Channel is valid: call onSenderBacklog() to handle the backlog
        try {
            inputChannel.onSenderBacklog(announcement.backlog);
        } catch (Throwable throwable) {
            inputChannel.onError(throwable);
        }
    } else { // Anything else: throw an exception
        throw new IllegalStateException(
                "Received unknown message from producer: " + msg.getClass());
    }
}
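The dispatch logic of decodeMsg() can be reduced to a small model. This is only a sketch under stated assumptions: `DecodeMsgSketch`, `BufferMsg`, `ErrorMsg`, `BacklogMsg` and `Channel` are hypothetical stand-ins for the Flink types, a null `receiverId` is used here to mark a fatal error, and buffer release and request cancellation are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified dispatch model; every type here is a hypothetical stand-in, not a Flink class. */
class DecodeMsgSketch {

    interface Message {}

    static final class BufferMsg implements Message {
        final int receiverId;
        final String payload;

        BufferMsg(int receiverId, String payload) {
            this.receiverId = receiverId;
            this.payload = payload;
        }
    }

    static final class ErrorMsg implements Message {
        final Integer receiverId; // assumption of this sketch: null marks a fatal error
        final String cause;

        ErrorMsg(Integer receiverId, String cause) {
            this.receiverId = receiverId;
            this.cause = cause;
        }

        boolean isFatal() {
            return receiverId == null;
        }
    }

    static final class BacklogMsg implements Message {
        final int receiverId;
        final int backlog;

        BacklogMsg(int receiverId, int backlog) {
            this.receiverId = receiverId;
            this.backlog = backlog;
        }
    }

    /** Stand-in for an input channel; it only records what it was told. */
    static final class Channel {
        final List<String> log = new ArrayList<>();
    }

    final Map<Integer, Channel> channels = new HashMap<>();

    void decode(Message msg) {
        if (msg instanceof BufferMsg) {
            BufferMsg m = (BufferMsg) msg;
            Channel channel = channels.get(m.receiverId);
            if (channel != null) {
                channel.log.add("buffer:" + m.payload);
            }
        } else if (msg instanceof ErrorMsg) {
            ErrorMsg m = (ErrorMsg) msg;
            if (m.isFatal()) {
                // Fatal: every channel on this connection is notified
                for (Channel channel : channels.values()) {
                    channel.log.add("error:" + m.cause);
                }
            } else {
                // Non-fatal: only the addressed channel is notified
                Channel channel = channels.get(m.receiverId);
                if (channel != null) {
                    channel.log.add("error:" + m.cause);
                }
            }
        } else if (msg instanceof BacklogMsg) {
            BacklogMsg m = (BacklogMsg) msg;
            Channel channel = channels.get(m.receiverId);
            if (channel != null) {
                channel.log.add("backlog:" + m.backlog);
            }
        } else {
            throw new IllegalStateException("Received unknown message: " + msg.getClass());
        }
    }
}
```

The point to notice is the asymmetry in the error branch: a fatal error fans out to every registered channel, while every other message touches exactly one channel selected by its receiver id.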

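(3) decodeBufferOrEvent(), called from decodeMsg()

BufferResponse (data buffer) messages are further split into the following cases:

  1. Empty buffer (isBuffer() is true and bufferSize is 0; possibly a heartbeat or a buffer boundary marker): call inputChannel.onEmptyBuffer()
  2. Buffer with data: if it contains partial buffers, split it with the zero-copy sliceBuffer(), using a custom recycling hook that calls bufferOrEvent.getBuffer().recycleBuffer() once every slice has been released; otherwise pass the whole buffer to inputChannel.onBuffer()

// Decodes a BufferResponse received from the upstream and dispatches it to the matching input channel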
private void decodeBufferOrEvent(
        RemoteInputChannel inputChannel, NettyMessage.BufferResponse bufferOrEvent)
        throws Throwable {
    // 1. Empty buffer: possibly a heartbeat or a buffer boundary marker
    if (bufferOrEvent.isBuffer() && bufferOrEvent.bufferSize == 0) {
        inputChannel.onEmptyBuffer(bufferOrEvent.sequenceNumber, bufferOrEvent.backlog);
    }
    // 2. Non-empty buffer: carries data
    else if (bufferOrEvent.getBuffer() != null) {
        // Split into slices when the message contains partial buffers
        if (bufferOrEvent.numOfPartialBuffers > 0) {
            int offset = 0; // start position of the current slice within the original buffer

            int seq = bufferOrEvent.sequenceNumber; // starting sequence number
            AtomicInteger waitToBeReleased =
                    new AtomicInteger(bufferOrEvent.numOfPartialBuffers); // slices still to be released (counts down)
            AtomicInteger processedPartialBuffers = new AtomicInteger(0); // slices processed so far (counts up)
            try {
                for (int i = 0; i < bufferOrEvent.numOfPartialBuffers; i++) {
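                    int size = bufferOrEvent.getPartialBufferSizes().get(i); // size of the current slice
                    // 1. Count this slice as processed
                    processedPartialBuffers.incrementAndGet();
                    // 2. Create the slice and hand it to the input channel, which keeps it in its local queue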
                    inputChannel.onBuffer(
                            sliceBuffer(
                                    bufferOrEvent,
                                    memorySegment -> { // custom recycling hook: once every slice has been released, recycle the whole parent buffer
                                        if (waitToBeReleased.decrementAndGet() == 0) {
                                            bufferOrEvent.getBuffer().recycleBuffer();
                                        }
                                    },
                                    offset,
                                    size), // create the slice buffer (zero-copy)
                            seq++, // advance the sequence number
                            i == bufferOrEvent.numOfPartialBuffers - 1
                                    ? bufferOrEvent.backlog
                                    : -1, // only the last slice carries the backlog
                            -1);
                    // 3. Advance the offset
                    offset += size;
                }
            } catch (Throwable throwable) {
                LOG.error("Failed to process partial buffers.", throwable);
                if (processedPartialBuffers.get() != bufferOrEvent.numOfPartialBuffers) {
                    bufferOrEvent.getBuffer().recycleBuffer();
                }
                throw throwable;
            }
        } else {
            inputChannel.onBuffer(
                    bufferOrEvent.getBuffer(),
                    bufferOrEvent.sequenceNumber,
                    bufferOrEvent.backlog,
                    bufferOrEvent.subpartitionId);
        }

    }
    // 3. Anything else: throw an exception
    else {
        throw new IllegalStateException(
                "The read buffer is null in credit-based input channel.");
    }
}
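The shared countdown used for partial buffers can be isolated in a few lines. The sketch below is illustrative only: `PartialBufferSketch`, `ParentBuffer` and `Slice` are hypothetical names, and the slices here only record an offset and size instead of wrapping real memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of "recycle the parent once the last slice is released"; names are hypothetical. */
class PartialBufferSketch {

    static final class ParentBuffer {
        final byte[] data;
        boolean recycled;

        ParentBuffer(byte[] data) {
            this.data = data;
        }

        void recycle() {
            recycled = true;
        }
    }

    static final class Slice {
        final int offset;
        final int size;
        private final Runnable onRecycle;

        Slice(int offset, int size, Runnable onRecycle) {
            this.offset = offset;
            this.size = size;
            this.onRecycle = onRecycle;
        }

        void recycle() {
            onRecycle.run();
        }
    }

    /** Splits the parent into consecutive slices that share one release counter. */
    public static List<Slice> split(ParentBuffer parent, int[] sizes) {
        AtomicInteger waitToBeReleased = new AtomicInteger(sizes.length);
        List<Slice> slices = new ArrayList<>();
        int offset = 0;
        for (int size : sizes) {
            slices.add(
                    new Slice(
                            offset,
                            size,
                            () -> {
                                // Only the last released slice recycles the parent
                                if (waitToBeReleased.decrementAndGet() == 0) {
                                    parent.recycle();
                                }
                            }));
            offset += size;
        }
        return slices;
    }
}
```

Because every slice decrements the same `AtomicInteger`, the order in which the slices are released does not matter; the parent is recycled exactly once, by whichever slice is released last.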

(4) sliceBuffer(), called from decodeBufferOrEvent()

Zero-copy: the slice only references the original memory.

private static NetworkBuffer sliceBuffer(
        NettyMessage.BufferResponse bufferOrEvent,
        BufferRecycler recycler,
        int offset,
        int size) {
    // 1. Get a ByteBuffer for the given range of the original buffer
    ByteBuffer nioBuffer = bufferOrEvent.getBuffer().getNioBuffer(offset, size);

    // 2. Wrap the nioBuffer in a MemorySegment (zero-copy, only a reference)
    MemorySegment segment;
    if (nioBuffer.isDirect()) {
        segment = MemorySegmentFactory.wrapOffHeapMemory(nioBuffer);
    } else {
        byte[] bytes = nioBuffer.array();
        segment = MemorySegmentFactory.wrap(bytes);
    }
    // 3. Create a new NetworkBuffer that uses the custom recycler passed in above
    return new NetworkBuffer(
            segment, recycler, bufferOrEvent.dataType, bufferOrEvent.isCompressed, size);
}
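What "zero-copy, only a reference" means can be shown with the JDK alone. This is a small sketch using `java.nio.ByteBuffer`, not Flink's `MemorySegment`; the class `ZeroCopySliceSketch` and its `slice` helper are hypothetical.

```java
import java.nio.ByteBuffer;

/** Demonstrates a zero-copy view over part of a buffer using only the JDK. */
class ZeroCopySliceSketch {

    /** Returns a view of [offset, offset + size) that shares memory with the parent. */
    public static ByteBuffer slice(ByteBuffer parent, int offset, int size) {
        ByteBuffer view = parent.duplicate(); // independent position/limit, same memory
        view.limit(offset + size);
        view.position(offset);
        return view.slice();
    }

    public static void main(String[] args) {
        ByteBuffer parent = ByteBuffer.allocate(8);
        for (int i = 0; i < 8; i++) {
            parent.put(i, (byte) i);
        }
        ByteBuffer view = slice(parent, 2, 3);
        parent.put(2, (byte) 42);        // write through the parent...
        System.out.println(view.get(0)); // ...and the view sees it: prints 42
    }
}
```

No bytes are copied when the view is created, which is why the parent buffer must stay alive until every slice has been released.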

At this point we can see that CreditBasedPartitionRequestClientHandler only classifies and wraps the messages; the actual handling is delegated to a series of RemoteInputChannel methods.

2.RemoteInputChannel

(1) onBuffer()

public void onBuffer(Buffer buffer, int sequenceNumber, int backlog, int subpartitionId)
        throws IOException {
    boolean recycleBuffer = true;

    try {
        // Check that buffers arrive in order
        if (expectedSequenceNumber != sequenceNumber) {
            onError(new BufferReorderingException(expectedSequenceNumber, sequenceNumber));
            return;
        }
        // For data types that block the upstream (e.g. a barrier), mark the upstream as blocked
        if (buffer.getDataType().isBlockingUpstream()) {
            onBlockingUpstream();
            // The backlog must be 0 in this case
            checkArgument(backlog == 0, "Illegal number of backlog: %s, should be 0.", backlog);
        }

        final boolean wasEmpty;
        boolean firstPriorityEvent = false;
        // Handle the received buffer under the lock
        synchronized (receivedBuffers) {
            // Trace-log the received buffer
            NetworkActionsLogger.traceInput(
                    "RemoteInputChannel#onBuffer",
                    buffer,
                    inputGate.getOwningTaskName(),
                    channelInfo,
                    channelStatePersister,
                    sequenceNumber);
            // Similar to notifyBufferAvailable(), make sure that we never add a buffer
            // after releaseAllResources() released all buffers from receivedBuffers
            // (see above for details).
            // If the channel has already been released, return; otherwise continue with enqueueing
            if (isReleased.get()) {
                return;
            }

            wasEmpty = receivedBuffers.isEmpty();
            // Wrap the buffer in a SequenceBuffer
            SequenceBuffer sequenceBuffer =
                    new SequenceBuffer(buffer, sequenceNumber, subpartitionId);
            DataType dataType = buffer.getDataType();
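            // Once the buffer is enqueued, recycleBuffer is set to false so it is not recycled: the data now occupies the buffer
            // Priority events such as barriers go to the priority part of receivedBuffers
            if (dataType.hasPriority()) {
                firstPriorityEvent = addPriorityBuffer(sequenceBuffer);
                recycleBuffer = false;
            } else { // Ordinary buffers go to the regular part of receivedBuffers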
                receivedBuffers.add(sequenceBuffer);
                recycleBuffer = false;
                if (dataType.requiresAnnouncement()) {
                    firstPriorityEvent = addPriorityBuffer(announce(sequenceBuffer));
                }
            }

            // Update the total queue size in bytes
            totalQueueSizeInBytes += buffer.getSize();
            // Check whether the buffer is a barrier
            final OptionalLong barrierId =
                    channelStatePersister.checkForBarrier(sequenceBuffer.buffer);
            if (barrierId.isPresent() && barrierId.getAsLong() > lastBarrierId) {
                // checkpoint was not yet started by task thread,
                // so remember the numbers of buffers to spill for the time when
                // it will be started
                lastBarrierId = barrierId.getAsLong();
                lastBarrierSequenceNumber = sequenceBuffer.sequenceNumber;
            }
            // Persist channel state if needed
            channelStatePersister.maybePersist(buffer);
            // Advance the expected sequence number
            ++expectedSequenceNumber;
        }
        // Call notifyPriorityEvent() so that priority events such as barriers are handled first
        if (firstPriorityEvent) {
            notifyPriorityEvent(sequenceNumber);
        }
        // Call notifyChannelNonEmpty() so that ordinary data can be consumed
        if (wasEmpty) {
            notifyChannelNonEmpty();
        }
        // Backpressure feedback: pass the sender's backlog on
        if (backlog >= 0) {
            onSenderBacklog(backlog);
        }
    } finally {
        // If recycleBuffer is still true, the buffer was never enqueued and can be recycled here, which updates the credit
        if (recycleBuffer) {
            buffer.recycleBuffer();
        }
    }
}

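At this point we can see that onBuffer() makes the data occupy the buffer; only in special cases, when the buffer was never enqueued, does it call buffer.recycleBuffer() itself. So the recycling that happens after the buffered data has been consumed must be done somewhere else.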
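The core of onBuffer(), stripped of priority events, barrier tracking and backlog handling, is a sequence check followed by an enqueue and a "became non-empty" notification. The following is a minimal sketch of just that part; `InputChannelSketch` and its members are hypothetical, and a rejected buffer is reported with a boolean instead of the BufferReorderingException used by the real code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of in-order enqueueing with a "became non-empty" notification; names are hypothetical. */
class InputChannelSketch {

    private final Deque<String> receivedBuffers = new ArrayDeque<>();
    private int expectedSequenceNumber = 0;
    int nonEmptyNotifications = 0;

    /** Returns true if the buffer was queued, false if it arrived out of order. */
    public boolean onBuffer(String buffer, int sequenceNumber) {
        final boolean wasEmpty;
        synchronized (receivedBuffers) {
            if (sequenceNumber != expectedSequenceNumber) {
                return false;
            }
            wasEmpty = receivedBuffers.isEmpty();
            receivedBuffers.add(buffer); // the data now occupies this buffer
            expectedSequenceNumber++;
        }
        // Notify outside the lock, and only on the empty -> non-empty transition
        if (wasEmpty) {
            nonEmptyNotifications++;
        }
        return true;
    }

    /** Consumer side: takes the next buffer, or null if none is queued. */
    public String poll() {
        synchronized (receivedBuffers) {
            return receivedBuffers.poll();
        }
    }
}
```

Notifying only on the empty to non-empty transition keeps the consumer from being woken once per buffer; a consumer that is already draining the queue will see the new entries anyway.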

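II. After the buffered data is consumed: recycle the buffer and update the credit

We use StreamTask as the example.

1. What the operator does

(1) StreamTask.processInput()

It simply delegates to inputProcessor.processInput().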

protected void processInput(MailboxDefaultAction.Controller controller) throws Exception {
    DataInputStatus status = inputProcessor.processInput();
    // The rest of the method is not relevant here and is omitted
    // ...
}

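(2) processInput() of the StreamInputProcessor implementation

StreamInputProcessor is an interface. Taking its implementation StreamOneInputProcessor as an example, processInput() looks like this.

It in turn calls input.emitNext(output), where input is a StreamTaskInput implementation.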

@Override
public DataInputStatus processInput() throws Exception {
    DataInputStatus status = input.emitNext(output);

    if (status == DataInputStatus.END_OF_DATA) {
        endOfInputAware.endInput(input.getInputIndex() + 1);
        output = new FinishedDataOutput<>();
    } else if (status == DataInputStatus.END_OF_RECOVERY) {
        if (input instanceof RecoverableStreamTaskInput) {
            input = ((RecoverableStreamTaskInput<IN>) input).finishRecovery();
        }
        return DataInputStatus.MORE_AVAILABLE;
    }

    return status;
}

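(3) emitNext() of the StreamTaskInput implementation

StreamTaskInput is an interface. Taking its implementation AbstractStreamTaskNetworkInput as an example, emitNext() looks like this.

It in turn calls currentRecordDeserializer.getNextRecord(), where currentRecordDeserializer is a RecordDeserializer implementation.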

@Override
public DataInputStatus emitNext(DataOutput<T> output) throws Exception {

    while (true) {
        // get the stream element from the deserializer
        if (currentRecordDeserializer != null) {
            RecordDeserializer.DeserializationResult result;
            try {
                result = currentRecordDeserializer.getNextRecord(deserializationDelegate);
            } catch (IOException e) {
                throw new IOException(
                        String.format("Can't get next record for channel %s", lastChannel), e);
            }
            // Once the buffer's data has been fully consumed, drop the reference to the deserializer
            if (result.isBufferConsumed()) {
                currentRecordDeserializer = null;
            }

            if (result.isFullRecord()) {
                final boolean breakBatchEmitting =
                        processElement(deserializationDelegate.getInstance(), output);
                if (canEmitBatchOfRecords.check() && !breakBatchEmitting) {
                    continue;
                }
                return DataInputStatus.MORE_AVAILABLE;
            }
        }

        Optional<BufferOrEvent> bufferOrEvent = checkpointedInputGate.pollNext();
        if (bufferOrEvent.isPresent()) {
            // return to the mailbox after receiving a checkpoint barrier to avoid processing of
            // data after the barrier before checkpoint is performed for unaligned checkpoint
            // mode
            // processBuffer() selects the RecordDeserializer for the buffer's channel and hands it the buffer to parse.
            if (bufferOrEvent.get().isBuffer()) {
                processBuffer(bufferOrEvent.get());
            } else {
                DataInputStatus status = processEvent(bufferOrEvent.get(), output);
                if (status == DataInputStatus.MORE_AVAILABLE && canEmitBatchOfRecords.check()) {
                    continue;
                }
                return status;
            }
        } else {
            if (checkpointedInputGate.isFinished()) {
                checkState(
                        checkpointedInputGate.getAvailableFuture().isDone(),
                        "Finished BarrierHandler should be available");
                return DataInputStatus.END_OF_INPUT;
            }
            return DataInputStatus.NOTHING_AVAILABLE;
        }
    }
}

(4) getNextRecord() of the RecordDeserializer implementation

RecordDeserializer is also an interface. Taking its implementation SpillingAdaptiveSpanningRecordDeserializer as an example, getNextRecord() looks like this:

@Override
public DeserializationResult getNextRecord(T target) throws IOException {
    // always check the non-spanning wrapper first.
    // this should be the majority of the cases for small records
    // for large records, this portion of the work is very small in comparison anyways

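    // 1. Try to read the next complete record from the current buffer
    final DeserializationResult result = readNextRecord(target);
    // 2. If the current buffer has been fully consumed (all of its data has been parsed)
    if (result.isBufferConsumed()) {
        // 2.1 Recycle the current buffer
        currentBuffer.recycleBuffer();
        // 2.2 Drop the reference to the buffer so it can be garbage-collected
        currentBuffer = null;
    }
    // 3. Return the result (whether a full record was read, whether the buffer was consumed, and so on)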
    return result;
}
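The "recycle as soon as the buffer is fully consumed" behaviour of getNextRecord() can be reproduced with a much simpler reader. The sketch below makes strong simplifying assumptions: records are length-prefixed UTF-8 strings that never span two buffers (the real deserializer does handle spanning records), and `DeserializerSketch`, `encode` and `setNextBuffer` are hypothetical names for this example only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch of a reader that recycles its buffer after the last record; names are hypothetical. */
class DeserializerSketch {

    private ByteBuffer currentBuffer;
    private Runnable recycler;

    /** Test helper: encodes records as [int length][UTF-8 bytes] pairs. */
    public static ByteBuffer encode(String... records) {
        int total = 0;
        for (String record : records) {
            total += Integer.BYTES + record.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (String record : records) {
            byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        buffer.flip();
        return buffer;
    }

    public void setNextBuffer(ByteBuffer buffer, Runnable recycler) {
        this.currentBuffer = buffer;
        this.recycler = recycler;
    }

    public boolean hasBuffer() {
        return currentBuffer != null;
    }

    /** Reads one record; recycles and drops the buffer once nothing is left in it. */
    public String getNextRecord() {
        int length = currentBuffer.getInt();
        byte[] bytes = new byte[length];
        currentBuffer.get(bytes);
        if (!currentBuffer.hasRemaining()) {
            recycler.run();       // corresponds to currentBuffer.recycleBuffer()
            currentBuffer = null; // drop the reference
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The recycling is a side effect of reading the last record, so the consumer never has to return buffers explicitly; finishing the data is what frees the memory.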

Here we see that it calls the buffer's recycleBuffer() to recycle the buffer. Next, let's look at what the buffer does.

2. What the Buffer does

Buffer is an interface.

Taking its implementation NetworkBuffer as an example, recycleBuffer() is implemented as follows.

(1) NetworkBuffer

public class NetworkBuffer extends AbstractReferenceCountedByteBuf implements Buffer {
  private BufferRecycler recycler;
  // ...

  // Delegates to release() of AbstractReferenceCountedByteBuf
  @Override
  public void recycleBuffer() {
      release();
  }
  // ...

  // Invoked by AbstractReferenceCountedByteBuf.handleRelease()
  @Override
  protected void deallocate() {
      // Have the BufferRecycler recycle the memory segment
      recycler.recycle(memorySegment);
  }
}


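AbstractReferenceCountedByteBuf is an abstract class. Its release() calls handleRelease(), which in turn calls the subclass's deallocate() once the reference count has dropped to zero: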
  public boolean release() {
      return this.handleRelease(updater.release(this));
  }

  public boolean release(int decrement) {
      return this.handleRelease(updater.release(this, decrement));
  }

  private boolean handleRelease(boolean result) {
      if (result) {
          this.deallocate();
      }

      return result;
  }
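The release() to deallocate() chain is ordinary reference counting. The sketch below shows the idea with a plain `AtomicInteger`; it is not Netty's implementation (which uses a field updater and more careful overflow checks), and `RefCountSketch` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal reference-counting sketch; not Netty's actual implementation. */
class RefCountSketch {

    private final AtomicInteger refCnt = new AtomicInteger(1); // starts with one owner
    private final Runnable deallocate;

    RefCountSketch(Runnable deallocate) {
        this.deallocate = deallocate;
    }

    /** Registers one more owner of this object. */
    public RefCountSketch retain() {
        refCnt.incrementAndGet();
        return this;
    }

    /** Drops one reference; returns true if this call freed the object. */
    public boolean release() {
        int remaining = refCnt.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more often than retained");
        }
        if (remaining == 0) {
            deallocate.run(); // in NetworkBuffer this is recycler.recycle(memorySegment)
            return true;
        }
        return false;
    }
}
```

This is why recycleBuffer() does not always return memory immediately: only the call that drops the count to zero reaches the BufferRecycler.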

(2) BufferRecycler.recycle(), called from deallocate() (important)

BufferRecycler is an interface.

Take its implementation BufferManager as an example.

<1> BufferManager.recycle()

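The flow is as follows:

  1. If the input channel has been released: return the memory segment directly to the global buffer pool (globalPool); no credit update is needed
  2. If the input channel is still active: wrap the memory segment in a NetworkBuffer, add it to the exclusive buffer queue, then check whether a floating buffer should be released
  3. If a floating buffer was released, recycle it by calling its recycleBuffer(), which hands it back to the LocalBufferPool
  4. If no floating buffer was released, increase the credit by 1 and notify the upstream

@Override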
public void recycle(MemorySegment segment) {
    @Nullable Buffer releasedFloatingBuffer = null;
    synchronized (bufferQueue) {
        try {
            // Similar to notifyBufferAvailable(), make sure that we never add a buffer
            // after channel released all buffers via releaseAllResources().
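            // Case 1: the input channel has already been released
            if (inputChannel.isReleased()) {
                // The channel is closed, so return the memory segment directly to the global buffer pool (globalPool); no credit update is needed.
                globalPool.recycleUnpooledMemorySegments(Collections.singletonList(segment));
                return;
            }
            // Case 2: the input channel is still active
            else {
                // Wrap the recycled memory segment in a new NetworkBuffer (with this BufferManager as its recycler).
                // bufferQueue.addExclusiveBuffer() adds it to the exclusive queue and checks whether a floating buffer should be released; the result is assigned to releasedFloatingBuffer.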
                releasedFloatingBuffer =
                        bufferQueue.addExclusiveBuffer(
                                new NetworkBuffer(segment, this), numRequiredBuffers);
            }
        } catch (Throwable t) {
            ExceptionUtils.rethrow(t);
        } finally {
            bufferQueue.notifyAll();
        }
    }
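    // If a floating buffer was released, recycle it
    if (releasedFloatingBuffer != null) {
        // For a floating buffer this ends up in LocalBufferPool's recycling logic; the next article, on floating vs. exclusive buffers, explains why
        releasedFloatingBuffer.recycleBuffer();
    }
    // Otherwise, tell the input channel that one more buffer is available, i.e. tell the upstream that the credit grew by 1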
    else {
        try {
            inputChannel.notifyBufferAvailable(1);
        } catch (Throwable t) {
            ExceptionUtils.rethrow(t);
        }
    }
}
<2> BufferManager#AvailableBufferQueue.addExclusiveBuffer()

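AvailableBufferQueue is an inner class of BufferManager. Its main fields are:

static final class AvailableBufferQueue {
    // 1. Queue of floating buffers, which come from the fixed buffer pool
    final ArrayDeque<Buffer> floatingBuffers;
    // 2. Queue of exclusive buffers, which come from the global buffer pool
    final ArrayDeque<Buffer> exclusiveBuffers;
    // ... (remaining members omitted)
}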

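addExclusiveBuffer() works as follows:

  1. Add the new buffer to the exclusive buffer queue
  2. If the number of available buffers now exceeds the number of required buffers, release one floating buffer

@Nullable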
Buffer addExclusiveBuffer(Buffer buffer, int numRequiredBuffers) {
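    // 1. Add the new buffer to the exclusive buffer queue
    exclusiveBuffers.add(buffer);
    // 2. If more buffers are available than required, release one floating buffer (exclusive buffers are never released here)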
    if (getAvailableBufferSize() > numRequiredBuffers) {
        return floatingBuffers.poll();
    }
    return null;
}
<3> BufferManager#AvailableBufferQueue.getAvailableBufferSize()

Number of available buffers = number of floating buffers + number of exclusive buffers

// Available buffers = floating buffers + exclusive buffers
int getAvailableBufferSize() {
    return floatingBuffers.size() + exclusiveBuffers.size();
}
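Putting recycle(), addExclusiveBuffer() and getAvailableBufferSize() together, the decision between "hand a floating buffer back" and "announce one more credit" can be modelled as below. This is a hedged sketch, not the Flink class: `BufferQueueSketch`, `announcedCredit` and `returnedToPool` are hypothetical, buffers are plain strings, and locking and the released-channel branch are omitted.

```java
import java.util.ArrayDeque;

/** Sketch of the exclusive/floating bookkeeping; names and counters are hypothetical. */
class BufferQueueSketch {

    final ArrayDeque<String> floatingBuffers = new ArrayDeque<>();
    final ArrayDeque<String> exclusiveBuffers = new ArrayDeque<>();
    int announcedCredit = 0; // stands in for inputChannel.notifyBufferAvailable(1)
    int returnedToPool = 0;  // stands in for releasedFloatingBuffer.recycleBuffer()

    public int getAvailableBufferSize() {
        return floatingBuffers.size() + exclusiveBuffers.size();
    }

    /** Adds an exclusive buffer; returns a floating buffer to release, or null. */
    public String addExclusiveBuffer(String buffer, int numRequiredBuffers) {
        exclusiveBuffers.add(buffer);
        if (getAvailableBufferSize() > numRequiredBuffers) {
            return floatingBuffers.poll(); // null when no floating buffer is held
        }
        return null;
    }

    /** Called when an exclusive buffer's memory comes back after consumption. */
    public void recycle(String buffer, int numRequiredBuffers) {
        String releasedFloating = addExclusiveBuffer(buffer, numRequiredBuffers);
        if (releasedFloating != null) {
            returnedToPool++;  // one in, one out: available count unchanged, no new credit
        } else {
            announcedCredit++; // available count grew by one: credit + 1
        }
    }
}
```

For example, starting with one floating buffer and `numRequiredBuffers = 2`, the first recycled exclusive buffer raises the credit by one, while the second pushes the total to three and therefore releases the floating buffer instead of announcing credit.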