Flink Backpressure 2: Source Code Analysis, Flow, Part 1

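Preface

Backpressure is not implemented by any single operator; upstream and downstream cooperate to produce it. The source code therefore has to be read one piece at a time, because the whole interaction cannot be followed in a single pass. This article splits the flow into the following steps:

  1. The downstream decodes the data messages sent by the upstream and places them in buffers, where they wait for the downstream consumer
  2. The downstream consumer finishes processing, recycles the buffer, and updates the credit (the number of available buffers)
  3. The downstream computes the credit and sends it to the upstream
  4. The upstream receives the credit and sends data according to it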
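
The four steps can be pictured with a toy model before diving into the real classes. The sketch below is only an illustration under simplifying assumptions: Sender, Receiver and all of their members are hypothetical names, one credit stands for one free buffer, and there is no network or threading involved.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of credit-based flow control; all names are hypothetical, not Flink classes. */
final class CreditFlowSketch {

    /** Upstream side: may only send while it holds credit. */
    static final class Sender {
        final Queue<String> backlog = new ArrayDeque<>();
        int credit; // number of free buffers the receiver has announced

        /** Step 4: send as many queued records as the credit allows. */
        int sendTo(Receiver receiver) {
            int sent = 0;
            while (credit > 0 && !backlog.isEmpty()) {
                receiver.onBuffer(backlog.poll()); // step 1 happens on the receiver
                credit--;
                sent++;
            }
            return sent;
        }
    }

    /** Downstream side: every recycled buffer becomes one unit of credit. */
    static final class Receiver {
        final Queue<String> received = new ArrayDeque<>();
        int unannouncedCredit;

        /** Step 1: the data occupies a buffer until it is consumed. */
        void onBuffer(String data) {
            received.add(data);
        }

        /** Step 2: the consumer finishes one buffer and recycles it. */
        void consumeOne() {
            if (received.poll() != null) {
                unannouncedCredit++;
            }
        }

        /** Step 3: announce the accumulated credit to the sender. */
        void announceCreditTo(Sender sender) {
            sender.credit += unannouncedCredit;
            unannouncedCredit = 0;
        }
    }

    public static void main(String[] args) {
        Sender sender = new Sender();
        Receiver receiver = new Receiver();
        for (int i = 0; i < 5; i++) {
            sender.backlog.add("record-" + i);
        }
        sender.credit = 2;
        System.out.println(sender.sendTo(receiver)); // 2
        System.out.println(sender.sendTo(receiver)); // 0, the sender is backpressured
        receiver.consumeOne();
        receiver.announceCreditTo(sender);
        System.out.println(sender.sendTo(receiver)); // 1
    }
}
```

With no credit left the sender simply stops, which is the backpressure effect; sending resumes only after the receiver has consumed a buffer and announced the freed credit.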

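I. The downstream decodes data from the upstream

The core classes involved are:

  1. CreditBasedPartitionRequestClientHandler
  2. RemoteInputChannel

1.CreditBasedPartitionRequestClientHandler

(1) channelRead()

Decodes a message sent by the upstream and dispatches it by calling decodeMsg(msg).

// Data receiving logic: decode the message sent by the upstream and dispatch it via decodeMsg(msg)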
@Override
public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
    try {
        decodeMsg(msg);
    } catch (Throwable t) {
        notifyAllChannelsOfErrorAndClose(t);
    }
}

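(2) decodeMsg(), called by channelRead()

Three kinds of messages are handled, each in its own way:

  1. BufferResponse (data buffer message): look up the matching RemoteInputChannel, then call decodeBufferOrEvent() to process the message
  2. ErrorResponse (error message):
    • Fatal error: notify all channels and close the connection
    • Non-fatal error: notify only the affected channel (onFailedPartitionRequest() for a missing partition, onError() otherwise)
  3. BacklogAnnouncement (backlog message): likewise, look up the matching RemoteInputChannel, then call its onSenderBacklog() to handle the backlog

/* Decodes and dispatches a message. Three kinds of messages are handled:
*  1.BufferResponse: a data buffer (carries the actual data)
*  2.ErrorResponse: an error notification (e.g. partition not found, remote task failure)
*  3.BacklogAnnouncement: a backlog notification (tells the downstream how much data is queued upstream)
* */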
private void decodeMsg(Object msg) {
    final Class<?> msgClazz = msg.getClass();

    // ---- Buffer --------------------------------------------------------
    // Case 1: BufferResponse, a data buffer
    if (msgClazz == NettyMessage.BufferResponse.class) {
        NettyMessage.BufferResponse bufferOrEvent = (NettyMessage.BufferResponse) msg;
        // Look up the target input channel. Every upstream ResultSubpartition (RSP) carries the ID
        // of its downstream InputChannel (IC), which is how the receiver of the data is identified
        RemoteInputChannel inputChannel = inputChannels.get(bufferOrEvent.receiverId);
        // Channel missing or released: release the buffer and cancel the request
        if (inputChannel == null || inputChannel.isReleased()) {
            bufferOrEvent.releaseBuffer();

            cancelRequestFor(bufferOrEvent.receiverId);

            return;
        }
        // Channel is valid: let decodeBufferOrEvent() process the buffer
        try {
            decodeBufferOrEvent(inputChannel, bufferOrEvent);
        } catch (Throwable t) {
            inputChannel.onError(t);
        }

    }
    // Case 2: ErrorResponse, an error notification
    else if (msgClazz == NettyMessage.ErrorResponse.class) {
        // ---- Error ---------------------------------------------------------
        NettyMessage.ErrorResponse error = (NettyMessage.ErrorResponse) msg;

        SocketAddress remoteAddr = ctx.channel().remoteAddress();
        // Fatal error: notify all channels and close the connection
        if (error.isFatalError()) {
            notifyAllChannelsOfErrorAndClose(
                    new RemoteTransportException(
                            "Fatal error at remote task manager '"
                                    + remoteAddr
                                    + " [ "
                                    + connectionID.getResourceID().getStringWithMetadata()
                                    + " ] "
                                    + "'.",
                            remoteAddr,
                            error.cause));
        } else { // Non-fatal error: notify only the affected channel
            RemoteInputChannel inputChannel = inputChannels.get(error.receiverId);

            if (inputChannel != null) {
                if (error.cause.getClass() == PartitionNotFoundException.class) { // partition not found: handled separately
                    inputChannel.onFailedPartitionRequest();
                } else { // any other error
                    inputChannel.onError(
                            new RemoteTransportException(
                                    "Error at remote task manager '"
                                            + remoteAddr
                                            + " [ "
                                            + connectionID
                                                    .getResourceID()
                                                    .getStringWithMetadata()
                                            + " ] "
                                            + "'.",
                                    remoteAddr,
                                    error.cause));
                }
            }
        }
    }
    // Case 3: BacklogAnnouncement, a backlog notification
    else if (msgClazz == NettyMessage.BacklogAnnouncement.class) {
        NettyMessage.BacklogAnnouncement announcement = (NettyMessage.BacklogAnnouncement) msg;
        // Again, look up the target input channel
        RemoteInputChannel inputChannel = inputChannels.get(announcement.receiverId);
        // Channel missing or released: cancel the request
        if (inputChannel == null || inputChannel.isReleased()) {
            cancelRequestFor(announcement.receiverId);
            return;
        }
        // Channel is valid: let onSenderBacklog() handle the backlog
        try {
            inputChannel.onSenderBacklog(announcement.backlog);
        } catch (Throwable throwable) {
            inputChannel.onError(throwable);
        }
    } else { // Any other message type: throw an exception
        throw new IllegalStateException(
                "Received unknown message from producer: " + msg.getClass());
    }
}
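
The dispatch pattern above (compare the exact message class, look up the channel by receiver ID, cancel the request when the channel is gone) can be reduced to a small sketch. Everything in it is a hypothetical stand-in: BufferMsg, BacklogMsg and Channel are not Flink types, and the error branch is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified dispatch model; every type here is hypothetical, not a Flink class. */
final class DispatchSketch {

    static final class BufferMsg {
        final int receiverId;
        final String payload;

        BufferMsg(int receiverId, String payload) {
            this.receiverId = receiverId;
            this.payload = payload;
        }
    }

    static final class BacklogMsg {
        final int receiverId;
        final int backlog;

        BacklogMsg(int receiverId, int backlog) {
            this.receiverId = receiverId;
            this.backlog = backlog;
        }
    }

    static final class Channel {
        final List<String> events = new ArrayList<>();
        boolean released;
    }

    final Map<Integer, Channel> inputChannels = new HashMap<>();
    final List<Integer> cancelledRequests = new ArrayList<>();

    /** Dispatches on the exact message class, as decodeMsg() does. */
    void decodeMsg(Object msg) {
        Class<?> msgClazz = msg.getClass();
        if (msgClazz == BufferMsg.class) {
            BufferMsg m = (BufferMsg) msg;
            Channel channel = liveChannelOrCancel(m.receiverId);
            if (channel != null) {
                channel.events.add("buffer:" + m.payload);
            }
        } else if (msgClazz == BacklogMsg.class) {
            BacklogMsg m = (BacklogMsg) msg;
            Channel channel = liveChannelOrCancel(m.receiverId);
            if (channel != null) {
                channel.events.add("backlog:" + m.backlog);
            }
        } else {
            throw new IllegalStateException("Received unknown message: " + msgClazz);
        }
    }

    /** Returns the channel, or records a cancelled request if it is missing or released. */
    private Channel liveChannelOrCancel(int receiverId) {
        Channel channel = inputChannels.get(receiverId);
        if (channel == null || channel.released) {
            cancelledRequests.add(receiverId);
            return null;
        }
        return channel;
    }

    public static void main(String[] args) {
        DispatchSketch handler = new DispatchSketch();
        Channel channel = new Channel();
        handler.inputChannels.put(1, channel);
        handler.decodeMsg(new BufferMsg(1, "a"));
        handler.decodeMsg(new BacklogMsg(1, 3));
        handler.decodeMsg(new BufferMsg(2, "b")); // no channel 2: request is cancelled
        System.out.println(channel.events);            // [buffer:a, backlog:3]
        System.out.println(handler.cancelledRequests); // [2]
    }
}
```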

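(3) decodeBufferOrEvent(), called by decodeMsg()

A BufferResponse message falls into one of the following cases:

  1. Empty buffer (a data buffer whose size is 0; events such as barriers are not data buffers, so they do not take this branch): call inputChannel.onEmptyBuffer()
  2. Non-empty buffer: if it carries partial buffers, cut it into slices with sliceBuffer() (zero-copy) and install a custom recycler that calls bufferOrEvent.getBuffer().recycleBuffer() once every slice has been released; otherwise pass the whole buffer to inputChannel.onBuffer()

// Decodes a BufferResponse received from the upstream and hands it to the matching input channel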
private void decodeBufferOrEvent(
        RemoteInputChannel inputChannel, NettyMessage.BufferResponse bufferOrEvent)
        throws Throwable {
    // 1. Empty buffer: a data buffer of size 0
    if (bufferOrEvent.isBuffer() && bufferOrEvent.bufferSize == 0) {
        inputChannel.onEmptyBuffer(bufferOrEvent.sequenceNumber, bufferOrEvent.backlog);
    }
    // 2. Non-empty buffer: carries data
    else if (bufferOrEvent.getBuffer() != null) {
        // The buffer consists of several partial buffers and is processed slice by slice
        if (bufferOrEvent.numOfPartialBuffers > 0) {
            int offset = 0; // start position of the current slice inside the original buffer

            int seq = bufferOrEvent.sequenceNumber; // first sequence number
            AtomicInteger waitToBeReleased =
                    new AtomicInteger(bufferOrEvent.numOfPartialBuffers); // slices still to be released, counts down
            AtomicInteger processedPartialBuffers = new AtomicInteger(0); // slices processed so far, counts up
            try {
                for (int i = 0; i < bufferOrEvent.numOfPartialBuffers; i++) {
                    int size = bufferOrEvent.getPartialBufferSizes().get(i); // size of the current slice
                    // 1. Count the slice as processed
                    processedPartialBuffers.incrementAndGet();
                    // 2. Create the slice and hand it to the input channel, which keeps it locally
                    inputChannel.onBuffer(
                            sliceBuffer(
                                    bufferOrEvent,
                                    memorySegment -> { // custom recycler: recycle the whole parent buffer once the last slice is released
                                        if (waitToBeReleased.decrementAndGet() == 0) {
                                            bufferOrEvent.getBuffer().recycleBuffer();
                                        }
                                    },
                                    offset,
                                    size), // create the slice buffer, zero-copy
                            seq++, // advance the sequence number
                            i == bufferOrEvent.numOfPartialBuffers - 1
                                    ? bufferOrEvent.backlog
                                    : -1, // only the last slice carries the backlog
                            -1);
                    // 3. Advance the offset
                    offset += size;
                }
            } catch (Throwable throwable) {
                LOG.error("Failed to process partial buffers.", throwable);
                if (processedPartialBuffers.get() != bufferOrEvent.numOfPartialBuffers) {
                    bufferOrEvent.getBuffer().recycleBuffer();
                }
                throw throwable;
            }
        } else {
            inputChannel.onBuffer(
                    bufferOrEvent.getBuffer(),
                    bufferOrEvent.sequenceNumber,
                    bufferOrEvent.backlog,
                    bufferOrEvent.subpartitionId);
        }

    }
    // 3. Anything else: throw an exception
    else {
        throw new IllegalStateException(
                "The read buffer is null in credit-based input channel.");
    }
}
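
The recycling trick in the sliced path is a shared countdown: every slice gets a recycler callback, and only the callback that brings the counter to zero recycles the parent buffer. A minimal sketch of that idea follows; Parent and slice() are hypothetical stand-ins, and a Runnable plays the role of each slice's recycler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of "recycle the parent when the last slice is released"; names are hypothetical. */
final class SharedReleaseSketch {

    /** Stand-in for the parent network buffer. */
    static final class Parent {
        int recycleCount;

        void recycle() {
            recycleCount++;
        }
    }

    /** Returns one recycle callback per slice; all callbacks share a single counter. */
    static List<Runnable> slice(Parent parent, int numSlices) {
        AtomicInteger waitToBeReleased = new AtomicInteger(numSlices);
        List<Runnable> recyclers = new ArrayList<>();
        for (int i = 0; i < numSlices; i++) {
            recyclers.add(() -> {
                // Only the last released slice actually recycles the parent
                if (waitToBeReleased.decrementAndGet() == 0) {
                    parent.recycle();
                }
            });
        }
        return recyclers;
    }

    public static void main(String[] args) {
        Parent parent = new Parent();
        List<Runnable> recyclers = slice(parent, 3);
        recyclers.get(0).run();
        recyclers.get(1).run();
        System.out.println(parent.recycleCount); // 0, one slice is still in use
        recyclers.get(2).run();
        System.out.println(parent.recycleCount); // 1
    }
}
```

Using an AtomicInteger keeps the countdown correct even if the slices are released from different threads.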

(4) sliceBuffer(), called by decodeBufferOrEvent()

Zero-copy: the slice only references the original memory, no data is copied.

private static NetworkBuffer sliceBuffer(
        NettyMessage.BufferResponse bufferOrEvent,
        BufferRecycler recycler,
        int offset,
        int size) {
    // 1. Get a ByteBuffer view of the requested range of the original buffer
    ByteBuffer nioBuffer = bufferOrEvent.getBuffer().getNioBuffer(offset, size);

    // 2. Wrap nioBuffer in a MemorySegment (zero-copy, only a reference)
    MemorySegment segment;
    if (nioBuffer.isDirect()) {
        segment = MemorySegmentFactory.wrapOffHeapMemory(nioBuffer);
    } else {
        byte[] bytes = nioBuffer.array();
        segment = MemorySegmentFactory.wrap(bytes);
    }
    // 3. Create a new NetworkBuffer that uses the custom recycler passed in above
    return new NetworkBuffer(
            segment, recycler, bufferOrEvent.dataType, bufferOrEvent.isCompressed, size);
}
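
What "zero-copy" means here can be shown with plain java.nio: a slice is a view with its own position and limit over the same memory, so a write through the parent is visible through the slice. This is only an analogy built on ByteBuffer; the helper slice() is a hypothetical name and does not reproduce Flink's MemorySegment wrapping.

```java
import java.nio.ByteBuffer;

/** Zero-copy slicing with java.nio; an analogy, not Flink's implementation. */
final class SliceSketch {

    /** Returns a view of [offset, offset + size) that shares memory with the parent. */
    static ByteBuffer slice(ByteBuffer parent, int offset, int size) {
        ByteBuffer view = parent.duplicate(); // own position/limit, same memory
        view.limit(offset + size);
        view.position(offset);
        return view.slice(); // index 0 of the slice maps to parent index `offset`
    }

    public static void main(String[] args) {
        ByteBuffer parent = ByteBuffer.allocateDirect(8);
        for (int i = 0; i < 8; i++) {
            parent.put(i, (byte) i);
        }
        ByteBuffer slice = slice(parent, 4, 3);
        System.out.println(slice.remaining()); // 3
        System.out.println(slice.get(0));      // 4
        parent.put(5, (byte) 99);              // write through the parent...
        System.out.println(slice.get(1));      // 99, ...and it shows up in the slice
    }
}
```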

At this point it is clear that CreditBasedPartitionRequestClientHandler only classifies and wraps the messages; the actual handling is delegated to the methods of RemoteInputChannel.

2.RemoteInputChannel

(1) onBuffer()

public void onBuffer(Buffer buffer, int sequenceNumber, int backlog, int subpartitionId)
        throws IOException {
    boolean recycleBuffer = true;

    try {
        // Check that buffers arrive in order
        if (expectedSequenceNumber != sequenceNumber) {
            onError(new BufferReorderingException(expectedSequenceNumber, sequenceNumber));
            return;
        }
        // Special data types (such as aligned checkpoint barriers) block the upstream data
        if (buffer.getDataType().isBlockingUpstream()) {
            onBlockingUpstream();
            // The backlog must be 0 in this case
            checkArgument(backlog == 0, "Illegal number of backlog: %s, should be 0.", backlog);
        }

        final boolean wasEmpty;
        boolean firstPriorityEvent = false;
        // Handle the received buffer under the lock
        synchronized (receivedBuffers) {
            // Trace the received buffer
            NetworkActionsLogger.traceInput(
                    "RemoteInputChannel#onBuffer",
                    buffer,
                    inputGate.getOwningTaskName(),
                    channelInfo,
                    channelStatePersister,
                    sequenceNumber);
            // Similar to notifyBufferAvailable(), make sure that we never add a buffer
            // after releaseAllResources() released all buffers from receivedBuffers
            // (see above for details).
            // If the channel has already been released, return; otherwise continue with the enqueue logic
            if (isReleased.get()) {
                return;
            }

            wasEmpty = receivedBuffers.isEmpty();
            // Wrap the buffer in a SequenceBuffer
            SequenceBuffer sequenceBuffer =
                    new SequenceBuffer(buffer, sequenceNumber, subpartitionId);
            DataType dataType = buffer.getDataType();
            // Once the buffer is queued, recycleBuffer is set to false so that the finally block
            // does not recycle it: the data now occupies the buffer
            // Priority events (such as barriers) go into the priority part of receivedBuffers
            if (dataType.hasPriority()) {
                firstPriorityEvent = addPriorityBuffer(sequenceBuffer);
                recycleBuffer = false;
            } else { // Ordinary buffers go into the regular part of receivedBuffers
                receivedBuffers.add(sequenceBuffer);
                recycleBuffer = false;
                if (dataType.requiresAnnouncement()) {
                    firstPriorityEvent = addPriorityBuffer(announce(sequenceBuffer));
                }
            }

            // Update the total queue size
            totalQueueSizeInBytes += buffer.getSize();
            // Check whether the buffer is a barrier
            final OptionalLong barrierId =
                    channelStatePersister.checkForBarrier(sequenceBuffer.buffer);
            if (barrierId.isPresent() && barrierId.getAsLong() > lastBarrierId) {
                // checkpoint was not yet started by task thread,
                // so remember the numbers of buffers to spill for the time when
                // it will be started
                lastBarrierId = barrierId.getAsLong();
                lastBarrierSequenceNumber = sequenceBuffer.sequenceNumber;
            }
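            // Persist the buffer as channel state if needed
            channelStatePersister.maybePersist(buffer);
            // Advance the expected sequence number
            ++expectedSequenceNumber;
        }
        // A priority event (such as a barrier) arrived: notifyPriorityEvent() lets it be handled first
        if (firstPriorityEvent) {
            notifyPriorityEvent(sequenceNumber);
        }
        // The queue went from empty to non-empty: notifyChannelNonEmpty() signals that data is available
        if (wasEmpty) {
            notifyChannelNonEmpty();
        }
        // Process the sender's backlog, which feeds the credit (backpressure) calculation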
        if (backlog >= 0) {
            onSenderBacklog(backlog);
        }
    } finally {
        // If recycleBuffer is still true, the buffer was never queued and is recycled right away,
        // which in turn updates the credit
        if (recycleBuffer) {
            buffer.recycleBuffer();
        }
    }
}

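As the code shows, onBuffer() puts the data into the receive queue, so the buffer stays occupied; buffer.recycleBuffer() is called only in the exceptional cases (sequence mismatch or released channel). Recycling the buffer after its data has been consumed must therefore happen somewhere else.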
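
Two details of onBuffer() are easy to miss: out-of-order buffers are rejected through the expected sequence number, and the consumer is notified only when the queue goes from empty to non-empty. The following sketch isolates those two checks. It is a single-threaded illustration with hypothetical names; the notifications counter merely stands in for notifyChannelNonEmpty().

```java
import java.util.ArrayDeque;

/** Sketch of the sequence check and the empty-to-non-empty notification; names are hypothetical. */
final class InputQueueSketch {
    private final ArrayDeque<String> receivedBuffers = new ArrayDeque<>();
    private int expectedSequenceNumber = 0;
    int notifications; // stands in for notifyChannelNonEmpty()

    /** Returns false for an out-of-order buffer, true once the buffer is queued. */
    boolean onBuffer(String buffer, int sequenceNumber) {
        final boolean wasEmpty;
        synchronized (receivedBuffers) {
            if (sequenceNumber != expectedSequenceNumber) {
                return false; // the real code reports a reordering error instead
            }
            wasEmpty = receivedBuffers.isEmpty();
            receivedBuffers.add(buffer);
            expectedSequenceNumber++;
        }
        // Notify outside the lock, and only on the empty -> non-empty transition
        if (wasEmpty) {
            notifications++;
        }
        return true;
    }

    String poll() {
        synchronized (receivedBuffers) {
            return receivedBuffers.poll();
        }
    }

    public static void main(String[] args) {
        InputQueueSketch queue = new InputQueueSketch();
        System.out.println(queue.onBuffer("a", 0)); // true
        System.out.println(queue.onBuffer("b", 1)); // true
        System.out.println(queue.onBuffer("d", 3)); // false, sequence number 2 was expected
        System.out.println(queue.notifications);    // 1, only the first buffer triggered it
    }
}
```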

II. After the buffered data is consumed: recycle the buffer and update the credit

StreamTask is used as the example.

1.What the operator does

(1) StreamTask.processInput()

It simply delegates to inputProcessor.processInput().

protected void processInput(MailboxDefaultAction.Controller controller) throws Exception {
    DataInputStatus status = inputProcessor.processInput();
    // The rest of the method is not relevant here and is omitted
    // ...
}

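(2) processInput() of the StreamInputProcessor implementation

StreamInputProcessor is an interface. Taking its implementation StreamOneInputProcessor as an example, processInput() is shown below.

It in turn calls input.emitNext(output), where input is a StreamTaskInput implementation.

@Override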
public DataInputStatus processInput() throws Exception {
    DataInputStatus status = input.emitNext(output);

    if (status == DataInputStatus.END_OF_DATA) {
        endOfInputAware.endInput(input.getInputIndex() + 1);
        output = new FinishedDataOutput<>();
    } else if (status == DataInputStatus.END_OF_RECOVERY) {
        if (input instanceof RecoverableStreamTaskInput) {
            input = ((RecoverableStreamTaskInput<IN>) input).finishRecovery();
        }
        return DataInputStatus.MORE_AVAILABLE;
    }

    return status;
}

(3) emitNext() of the StreamTaskInput implementation

StreamTaskInput is an interface as well. Taking AbstractStreamTaskNetworkInput as an example, its emitNext() is shown below.

It in turn calls currentRecordDeserializer.getNextRecord(), where currentRecordDeserializer is a RecordDeserializer implementation.

@Override
public DataInputStatus emitNext(DataOutput<T> output) throws Exception {

    while (true) {
        // get the stream element from the deserializer
        if (currentRecordDeserializer != null) {
            RecordDeserializer.DeserializationResult result;
            try {
                result = currentRecordDeserializer.getNextRecord(deserializationDelegate);
            } catch (IOException e) {
                throw new IOException(
                        String.format("Can't get next record for channel %s", lastChannel), e);
            }
            // Once the buffer has been fully consumed, drop the reference to the deserializer
            if (result.isBufferConsumed()) {
                currentRecordDeserializer = null;
            }

            if (result.isFullRecord()) {
                final boolean breakBatchEmitting =
                        processElement(deserializationDelegate.getInstance(), output);
                if (canEmitBatchOfRecords.check() && !breakBatchEmitting) {
                    continue;
                }
                return DataInputStatus.MORE_AVAILABLE;
            }
        }

        Optional<BufferOrEvent> bufferOrEvent = checkpointedInputGate.pollNext();
        if (bufferOrEvent.isPresent()) {
            // return to the mailbox after receiving a checkpoint barrier to avoid processing of
            // data after the barrier before checkpoint is performed for unaligned checkpoint
            // mode
            // processBuffer() hands the BufferOrEvent to the matching RecordDeserializer, which then parses the data.
            if (bufferOrEvent.get().isBuffer()) {
                processBuffer(bufferOrEvent.get());
            } else {
                DataInputStatus status = processEvent(bufferOrEvent.get(), output);
                if (status == DataInputStatus.MORE_AVAILABLE && canEmitBatchOfRecords.check()) {
                    continue;
                }
                return status;
            }
        } else {
            if (checkpointedInputGate.isFinished()) {
                checkState(
                        checkpointedInputGate.getAvailableFuture().isDone(),
                        "Finished BarrierHandler should be available");
                return DataInputStatus.END_OF_INPUT;
            }
            return DataInputStatus.NOTHING_AVAILABLE;
        }
    }
}

(4) getNextRecord() of the RecordDeserializer implementation

RecordDeserializer is also an interface. Taking SpillingAdaptiveSpanningRecordDeserializer as an example, its getNextRecord() is shown below.

@Override
public DeserializationResult getNextRecord(T target) throws IOException {
    // always check the non-spanning wrapper first.
    // this should be the majority of the cases for small records
    // for large records, this portion of the work is very small in comparison anyways

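    // 1. Try to read the next complete record from the current buffer
    final DeserializationResult result = readNextRecord(target);
    // 2. If the current buffer has been fully consumed (all of its data has been parsed)
    if (result.isBufferConsumed()) {
        // 2.1 Recycle the current buffer
        currentBuffer.recycleBuffer();
        // 2.2 Drop the reference to the buffer
        currentBuffer = null;
    }
    // 3. Return the result (whether a full record was read, whether the buffer was consumed, and so on)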
    return result;
}

Here getNextRecord() calls the buffer's recycleBuffer() to recycle it, so the next step is to look at what the buffer does.
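
The "recycle once fully consumed" rule can be tried out in isolation. The sketch below assumes fixed-size int records that never span two buffers and that every buffer handed in is non-empty, which is far simpler than the real spanning deserializer. All names are hypothetical, and the recycledBuffers counter stands in for currentBuffer.recycleBuffer().

```java
import java.nio.ByteBuffer;

/** Hypothetical reader of fixed-size int records; assumes records never span buffers. */
final class RecycleOnConsumedSketch {
    private ByteBuffer currentBuffer;
    int recycledBuffers; // stands in for currentBuffer.recycleBuffer()

    /** Expects a non-empty buffer that is ready for reading. */
    void setNextBuffer(ByteBuffer buffer) {
        currentBuffer = buffer;
    }

    /** Returns the next record, or null when no buffer is set. */
    Integer getNextRecord() {
        if (currentBuffer == null) {
            return null;
        }
        int record = currentBuffer.getInt();
        // Buffer fully consumed: recycle it and drop the reference
        if (!currentBuffer.hasRemaining()) {
            recycledBuffers++;
            currentBuffer = null;
        }
        return record;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.putInt(7).putInt(9);
        buffer.flip();

        RecycleOnConsumedSketch reader = new RecycleOnConsumedSketch();
        reader.setNextBuffer(buffer);
        System.out.println(reader.getNextRecord());  // 7
        System.out.println(reader.recycledBuffers);  // 0, one record is still unread
        System.out.println(reader.getNextRecord());  // 9
        System.out.println(reader.recycledBuffers);  // 1
    }
}
```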

2.What the Buffer does

Buffer is an interface. Taking the NetworkBuffer implementation as an example, its recycleBuffer() is shown below.

(1) NetworkBuffer

public class NetworkBuffer extends AbstractReferenceCountedByteBuf implements Buffer {
  private BufferRecycler recycler;
  // ...

  // Calls release() of AbstractReferenceCountedByteBuf
  @Override
  public void recycleBuffer() {
      release();
  }
  // ...
  
  // Called by AbstractReferenceCountedByteBuf.handleRelease()
  @Override
  protected void deallocate() {
      // Hand the memory segment back through the BufferRecycler
      recycler.recycle(memorySegment);
  }
}


AbstractReferenceCountedByteBuf is an abstract class: its release() calls handleRelease(), which calls the subclass's deallocate() once the reference count has dropped to zero.
  public boolean release() {
      return this.handleRelease(updater.release(this));
  }

  public boolean release(int decrement) {
      return this.handleRelease(updater.release(this, decrement));
  }

  private boolean handleRelease(boolean result) {
      if (result) {
          this.deallocate();
      }

      return result;
  }
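
The release chain boils down to reference counting: each release() decrements a counter, and deallocate() runs exactly once, when the counter reaches zero. The sketch below shows that contract with a plain AtomicInteger. It is a deliberately simplified assumption and does not reproduce Netty's actual updater-based implementation; RefCounted and its members are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified reference counting; not Netty's actual implementation. */
final class RefCountSketch {

    static class RefCounted {
        private final AtomicInteger refCnt = new AtomicInteger(1);
        int deallocations; // how many times deallocate() has run

        RefCounted retain() {
            refCnt.incrementAndGet();
            return this;
        }

        /** Returns true only for the call that drops the count to zero. */
        boolean release() {
            int remaining = refCnt.decrementAndGet();
            if (remaining < 0) {
                throw new IllegalStateException("released more often than retained");
            }
            if (remaining == 0) {
                deallocate();
                return true;
            }
            return false;
        }

        /** In NetworkBuffer this is where recycler.recycle(memorySegment) is called. */
        protected void deallocate() {
            deallocations++;
        }
    }

    public static void main(String[] args) {
        RefCounted buffer = new RefCounted().retain(); // count is now 2
        System.out.println(buffer.release());     // false, one reference is left
        System.out.println(buffer.release());     // true, deallocate() ran
        System.out.println(buffer.deallocations); // 1
    }
}
```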

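(2) BufferRecycler.recycle(), called from deallocate() -- important

BufferRecycler is an interface; BufferManager is used as the example implementation.

<1> BufferManager.recycle()

The flow is:

  1. If the input channel has been released: return the memory segment directly to the global buffer pool (globalPool); the credit is not updated
  2. If the input channel is still active: wrap the memory segment in a NetworkBuffer, add it to the exclusive buffer queue, then check whether a floating buffer has to be released
  3. If a floating buffer was released: recycle it through its recycleBuffer(), which ends up in LocalBufferPool
  4. If no floating buffer was released: increase the credit by 1 and notify the upstream

@Override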
public void recycle(MemorySegment segment) {
    @Nullable Buffer releasedFloatingBuffer = null;
    synchronized (bufferQueue) {
        try {
            // Similar to notifyBufferAvailable(), make sure that we never add a buffer
            // after channel released all buffers via releaseAllResources().
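            // Case 1: the input channel has been released
            if (inputChannel.isReleased()) {
                // The channel is closed, so the memory segment goes straight back to the global
                // buffer pool (globalPool); the credit is not updated.
                globalPool.recycleUnpooledMemorySegments(Collections.singletonList(segment));
                return;
            }
            // Case 2: the input channel is still active
            else {
                // Wrap the recycled memory segment in a new NetworkBuffer (with this BufferManager as its recycler).
                // bufferQueue.addExclusiveBuffer() adds it to the exclusive queue and checks whether a floating
                // buffer has to be released; the result is stored in releasedFloatingBuffer.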
                releasedFloatingBuffer =
                        bufferQueue.addExclusiveBuffer(
                                new NetworkBuffer(segment, this), numRequiredBuffers);
            }
        } catch (Throwable t) {
            ExceptionUtils.rethrow(t);
        } finally {
            bufferQueue.notifyAll();
        }
    }
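    // A floating buffer was released: recycle it
    if (releasedFloatingBuffer != null) {
        // For a floating buffer this ends up in LocalBufferPool's recycling logic; the next article
        // on floating and exclusive buffers explains why
        releasedFloatingBuffer.recycleBuffer();
    }
    // No floating buffer was released: tell the input channel that one more buffer is available,
    // i.e. the credit announced to the upstream grows by 1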
    else {
        try {
            inputChannel.notifyBufferAvailable(1);
        } catch (Throwable t) {
            ExceptionUtils.rethrow(t);
        }
    }
}
<2> BufferManager#AvailableBufferQueue.addExclusiveBuffer()

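AvailableBufferQueue is an inner class of BufferManager. Its main fields are:

static final class AvailableBufferQueue {
    // 1. Queue of floating buffers, which come from the fixed buffer pool
    final ArrayDeque<Buffer> floatingBuffers;
    // 2. Queue of exclusive buffers, which come from the global buffer pool
    final ArrayDeque<Buffer> exclusiveBuffers;
    // ... (remaining members omitted)
}

addExclusiveBuffer() works as follows:

  1. Add the new buffer to the exclusive buffer queue
  2. If the number of available buffers now exceeds the number of required buffers, release one floating buffer

@Nullable
Buffer addExclusiveBuffer(Buffer buffer, int numRequiredBuffers) {
    // 1. Add the new buffer to the exclusive buffer queue
    exclusiveBuffers.add(buffer);
    // 2. If more buffers are available than required, release one floating buffer (exclusive buffers are never released)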
    if (getAvailableBufferSize() > numRequiredBuffers) {
        return floatingBuffers.poll();
    }
    return null;
}
<3> BufferManager#AvailableBufferQueue.getAvailableBufferSize()

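Number of available buffers = number of floating buffers + number of exclusive buffers

// Number of available buffers = number of floating buffers + number of exclusive buffers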
int getAvailableBufferSize() {
    return floatingBuffers.size() + exclusiveBuffers.size();
}
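
Putting recycle(), addExclusiveBuffer() and getAvailableBufferSize() together, the decision is: a returning exclusive buffer either pushes out a surplus floating buffer (no new credit) or becomes one unit of credit for the upstream. The sketch below replays that decision with strings instead of buffers. The class and the creditAfterRecycle() helper are hypothetical, and locking as well as the released-channel branch are left out.

```java
import java.util.ArrayDeque;

/** Sketch of the exclusive/floating bookkeeping; strings stand in for buffers. */
final class BufferQueueSketch {
    final ArrayDeque<String> floatingBuffers = new ArrayDeque<>();
    final ArrayDeque<String> exclusiveBuffers = new ArrayDeque<>();

    int getAvailableBufferSize() {
        return floatingBuffers.size() + exclusiveBuffers.size();
    }

    /** Returns the floating buffer that was released, or null if none was. */
    String addExclusiveBuffer(String buffer, int numRequiredBuffers) {
        exclusiveBuffers.add(buffer);
        if (getAvailableBufferSize() > numRequiredBuffers) {
            return floatingBuffers.poll(); // null when there is no floating buffer
        }
        return null;
    }

    /** Hypothetical helper: credit to announce after recycling one exclusive buffer (0 or 1). */
    int creditAfterRecycle(String buffer, int numRequiredBuffers) {
        String releasedFloatingBuffer = addExclusiveBuffer(buffer, numRequiredBuffers);
        // A released floating buffer goes back to its pool, so no new credit is announced
        return releasedFloatingBuffer != null ? 0 : 1;
    }

    public static void main(String[] args) {
        BufferQueueSketch queue = new BufferQueueSketch();
        queue.floatingBuffers.add("f1");
        queue.exclusiveBuffers.add("e1");
        System.out.println(queue.creditAfterRecycle("e2", 3)); // 1, 3 available, 3 required
        System.out.println(queue.creditAfterRecycle("e3", 3)); // 0, surplus: f1 is released
        System.out.println(queue.getAvailableBufferSize());    // 3
    }
}
```

Note that when more buffers are available than required but no floating buffer is left, poll() returns null, so the recycled buffer still counts as new credit.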