Flink Checkpoint, Part 1: Source Code Walkthrough

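I. Start with CheckpointCoordinator

1. How a checkpoint is triggered

Checkpoint: triggerCheckpoint(CheckpointType checkpointType) -> triggerCheckpointFromCheckpointThread() -> the overloaded triggerCheckpoint() -> startTriggeringCheckpoint()

Savepoint: triggerSavepoint() or triggerSynchronousSavepoint() -> triggerSavepointInternal() -> triggerCheckpointFromCheckpointThread() -> from here on, the same path as a checkpoint

Pay particular attention to triggerCheckpoint(CheckpointType checkpointType): it maps the requested type to a SnapshotType and copies every other setting from the coordinator's default checkpointProperties.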

public CompletableFuture<CompletedCheckpoint> triggerCheckpoint(CheckpointType checkpointType) {

    if (checkpointType == null) {
        throw new IllegalArgumentException("checkpointType cannot be null");
    }
    // Resolve the snapshot type from the requested checkpoint type
    final SnapshotType snapshotType;
    switch (checkpointType) {
        case CONFIGURED:
            snapshotType = checkpointProperties.getCheckpointType();
            break;
        case FULL:
            snapshotType = FULL_CHECKPOINT;
            break;
        case INCREMENTAL:
            snapshotType = CHECKPOINT;
            break;
        default:
            throw new IllegalArgumentException("unknown checkpointType: " + checkpointType);
    }

    // Build the CheckpointProperties, reusing the configured defaults for everything except the snapshot type
    final CheckpointProperties properties =
            new CheckpointProperties(
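                    checkpointProperties.forceCheckpoint(), // whether to force the trigger (bypassing limits such as the minimum pause)
                    snapshotType, // incremental or full
                    checkpointProperties.discardOnSubsumed(), // discard when subsumed by a newer checkpoint
                    checkpointProperties.discardOnJobFinished(), // discard when the job finishes
                    checkpointProperties.discardOnJobCancelled(), // discard when the job is cancelled
                    checkpointProperties.discardOnJobFailed(), // discard when the job fails
                    checkpointProperties.discardOnJobSuspended(), // discard when the job is suspended
                    checkpointProperties.isUnclaimed()); // whether the snapshot is "unclaimed", i.e. not owned by this job (relevant when restoring from an external snapshot)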
    return triggerCheckpointFromCheckpointThread(properties, null, false);
}

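2. The preparation work in startTriggeringCheckpoint()

The method has 7 steps:

  1. Pre-checks and state initialization
  2. Compute the checkpoint plan
  3. Create the PendingCheckpoint
  4. Initialize the storage location and asynchronously trigger the checkpoint of every OperatorCoordinator
  5. Snapshot the master state (the master hooks)
  6. Combine the results of the asynchronous operations and forward them to masterTriggerCompletionPromise
  7. Handle the combined result
    • Failure: call onTriggerFailure() to handle the exception
    • Success: call triggerCheckpointRequest()

The annotated source: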
private void startTriggeringCheckpoint(CheckpointTriggerRequest request) {
    try {
        // 1. Pre-checks and state initialization
        synchronized (lock) {
            // Verify that the coordinator may start a checkpoint now, e.g. that it has not been shut down
            preCheckGlobalState(request.isPeriodic);
        }

        // we will actually trigger this checkpoint!
        Preconditions.checkState(!isTriggering);
        isTriggering = true;

        final long timestamp = System.currentTimeMillis();
        // 2. Compute the checkpoint plan
        CompletableFuture<CheckpointPlan> checkpointPlanFuture =
                checkpointPlanCalculator.calculateCheckpointPlan();

        boolean initializeBaseLocations = !baseLocationsForCheckpointInitialized;
        baseLocationsForCheckpointInitialized = true;
        // masterTriggerCompletionPromise eventually receives the combined result of the futures built below
        CompletableFuture<Void> masterTriggerCompletionPromise = new CompletableFuture<>();

        // 3. Create the PendingCheckpoint
        final CompletableFuture<PendingCheckpoint> pendingCheckpointCompletableFuture =
                checkpointPlanFuture
                        .thenApplyAsync(
                                plan -> {
                                    try {
                                        // Generate a unique, increasing checkpoint ID; the counter keeps IDs unique in a distributed setup
                                        long checkpointID =
                                                checkpointIdCounter.getAndIncrement();
                                        // Wrap the plan and the checkpoint ID in a Tuple2 for the next stage
                                        return new Tuple2<>(plan, checkpointID);
                                    } catch (Throwable e) {
                                        throw new CompletionException(e);
                                    }
                                },
                                executor)
                        .thenApplyAsync(
                                // checkpointInfo is the Tuple2<plan, checkpointID> produced above
                                (checkpointInfo) ->
                                        // Create the PendingCheckpoint (a checkpoint in progress), register it in pendingCheckpoints, and return it
                                        createPendingCheckpoint(
                                                timestamp,
                                                request.props,
                                                checkpointInfo.f0,
                                                request.isPeriodic,
                                                checkpointInfo.f1,
                                                request.getOnCompletionFuture(),
                                                masterTriggerCompletionPromise),
                                timer);
        // 4. Initialize the storage location and asynchronously trigger the checkpoint of every OperatorCoordinator
        final CompletableFuture<?> coordinatorCheckpointsComplete =
                pendingCheckpointCompletableFuture
                        .thenApplyAsync(
                                pendingCheckpoint -> {
                                    try {
                                        // Initialize the checkpoint storage location; this ends up calling checkpointStorageView.initializeLocationForCheckpoint(checkpointID)
                                        CheckpointStorageLocation checkpointStorageLocation =
                                                initializeCheckpointLocation(
                                                        pendingCheckpoint.getCheckpointID(),
                                                        request.props,
                                                        request.externalSavepointLocation,
                                                        initializeBaseLocations);
                                        // Wrap the PendingCheckpoint and its storage location in a Tuple2 for the next stage
                                        return Tuple2.of(
                                                pendingCheckpoint, checkpointStorageLocation);
                                    } catch (Throwable e) {
                                        throw new CompletionException(e);
                                    }
                                },
                                executor)
                        .thenComposeAsync(
                                // checkpointInfo is the Tuple2<pendingCheckpoint, storage location> produced above
                                (checkpointInfo) -> {
                                    PendingCheckpoint pendingCheckpoint = checkpointInfo.f0;
                                    // If the pending checkpoint has already been disposed, skip the coordinator snapshots; otherwise continue
                                    if (pendingCheckpoint.isDisposed()) {
                                        // The disposed checkpoint will be handled later,
                                        // skip snapshotting the coordinator states.
                                        return null;
                                    }
                                    // Set the target location, i.e. where this checkpoint will be written
                                    synchronized (lock) {
                                        pendingCheckpoint.setCheckpointTargetLocation(
                                                checkpointInfo.f1);
                                    }
                                    // Trigger the checkpoint of every OperatorCoordinator asynchronously on the timer executor; the call returns a future
                                    return OperatorCoordinatorCheckpoints
                                            .triggerAndAcknowledgeAllCoordinatorCheckpointsWithCompletion(
                                                    coordinatorsToCheckpoint,
                                                    pendingCheckpoint,
                                                    timer);
                                },
                                timer);

        // We have to take the snapshot of the master hooks after the coordinator checkpoints
        // has completed.
        // This is to ensure the tasks are checkpointed after the OperatorCoordinators in case
        // ExternallyInducedSource is used.

        // 5. Snapshot the master state
        final CompletableFuture<?> masterStatesComplete =
                coordinatorCheckpointsComplete.thenComposeAsync(
                        ignored -> {
                            // If the code reaches here, the pending checkpoint is guaranteed to
                            // be not null.
                            // We use FutureUtils.getWithoutException() to make compiler happy
                            // with checked
                            // exceptions in the signature.
                            PendingCheckpoint checkpoint =
                                    FutureUtils.getWithoutException(
                                            pendingCheckpointCompletableFuture);
                            if (checkpoint == null || checkpoint.isDisposed()) {
                                // The disposed checkpoint will be handled later,
                                // skip snapshotting the master states.
                                return null;
                            }
                            // Snapshot the master state held on the JobManager side, i.e. the state of the registered master hooks
                            return snapshotMasterState(checkpoint);
                        },
                        timer);
        // 6. Combine the results of the asynchronous operations and forward them to masterTriggerCompletionPromise
        FutureUtils.forward(
                // The promise completes only once both step 4 and step 5 have completed
                CompletableFuture.allOf(masterStatesComplete, coordinatorCheckpointsComplete),
                masterTriggerCompletionPromise);

        // 7. Handle the combined result
        FutureUtils.assertNoException(
                masterTriggerCompletionPromise
                        .handleAsync(
                                (ignored, throwable) -> {
                                    final PendingCheckpoint checkpoint =
                                            FutureUtils.getWithoutException(
                                                    pendingCheckpointCompletableFuture);

                                    Preconditions.checkState(
                                            checkpoint != null || throwable != null,
                                            "Either the pending checkpoint needs to be created or an error must have occurred.");
                                    // throwable == null means success; otherwise the trigger failed
                                    if (throwable != null) {
                                        // the initialization might not be finished yet
                                        if (checkpoint == null) {
                                            onTriggerFailure(request, throwable);
                                        } else {
                                            onTriggerFailure(checkpoint, throwable);
                                        }
                                    } else {
                                        // Success: triggerCheckpointRequest() sends the checkpoint request to the tasks on the TaskManagers, which starts the actual state snapshot
                                        triggerCheckpointRequest(
                                                request, timestamp, checkpoint);
                                    }
                                    return null;
                                },
                                timer)
                        .exceptionally( // exception handling
                                error -> {
                                    if (!isShutdown()) {
                                        throw new CompletionException(error);
                                    } else if (findThrowable(
                                                    error, RejectedExecutionException.class)
                                            .isPresent()) {
                                        LOG.debug("Execution rejected during shutdown");
                                    } else {
                                        LOG.warn("Error encountered during shutdown", error);
                                    }
                                    return null;
                                }));
    } catch (Throwable throwable) {
        onTriggerFailure(request, throwable);
    }
}
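
The future chaining in startTriggeringCheckpoint() is easier to follow in miniature. The following is a minimal sketch using only the JDK's CompletableFuture, not Flink code: the class name, the forward() and rootMessage() helpers, the "plan" string, and the ID 42 are all hypothetical stand-ins. It only mirrors the shape of steps 2 to 7: dependent stages, a combined future forwarded into a promise, and a single handler that sees either the result or the failure.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TriggerPipelineSketch {

    // Hypothetical stand-in for a "forward" helper: copy the outcome of source into target.
    static <T> void forward(CompletableFuture<T> source, CompletableFuture<T> target) {
        source.whenComplete(
                (value, error) -> {
                    if (error != null) {
                        target.completeExceptionally(error);
                    } else {
                        target.complete(value);
                    }
                });
    }

    // Unwrap CompletionException layers to reach the original failure message.
    static String rootMessage(Throwable error) {
        Throwable current = error;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current.getMessage();
    }

    // Returns "triggered:<id>" on success and "failed:<message>" otherwise.
    static String run(boolean failCoordinators) {
        ExecutorService timer = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Void> promise = new CompletableFuture<>();

            // Steps 2-3: compute a "plan", then derive a "pending checkpoint" (here just an ID).
            CompletableFuture<Long> pending =
                    CompletableFuture.supplyAsync(() -> "plan", timer)
                            .thenApplyAsync(plan -> 42L, timer);

            // Step 4: an asynchronous stage that may fail.
            CompletableFuture<Void> coordinatorsDone =
                    pending.thenComposeAsync(
                            id -> failCoordinators
                                    ? CompletableFuture.<Void>failedFuture(
                                            new IllegalStateException("coordinator failed"))
                                    : CompletableFuture.<Void>completedFuture(null),
                            timer);

            // Step 5: runs only after step 4 has completed successfully.
            CompletableFuture<Void> masterStateDone =
                    coordinatorsDone.thenComposeAsync(
                            ignored -> CompletableFuture.<Void>completedFuture(null), timer);

            // Step 6: combine both stages and forward the outcome into the promise.
            forward(CompletableFuture.allOf(masterStateDone, coordinatorsDone), promise);

            // Step 7: one handler for both outcomes.
            return promise
                    .handleAsync(
                            (ignored, error) ->
                                    error == null
                                            ? "triggered:" + pending.join()
                                            : "failed:" + rootMessage(error),
                            timer)
                    .join();
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Because step 5 is chained onto step 4, a failure in step 4 skips step 5 and surfaces once, in the final handler. That is the same reason the real method needs only a single success/failure branch in step 7.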

3. Next, triggerCheckpointRequest()

private void triggerCheckpointRequest(
        CheckpointTriggerRequest request, long timestamp, PendingCheckpoint checkpoint) {
    // Check once more whether the checkpoint has already been disposed
    if (checkpoint.isDisposed()) {
        onTriggerFailure(
                checkpoint,
                new CheckpointException(
                        CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE,
                        checkpoint.getFailureCause()));
    } else {
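        // Call triggerTasks() to send the trigger request to the tasks
        triggerTasks(request, timestamp, checkpoint)
                .exceptionally( // exception handling, body omitted
                        failure -> {
                            ...
                        });

        // If every task has already acknowledged the checkpoint, complete it here
        if (maybeCompleteCheckpoint(checkpoint)) {
            // Clear isTriggering and run the next queued trigger request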
            onTriggerSuccess();
        }
    }
}

// Called once the current trigger succeeded: clear isTriggering and run the next queued trigger request
private void onTriggerSuccess() {
    isTriggering = false;
    executeQueuedRequest();
}
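
The isTriggering flag together with executeQueuedRequest() means that only one trigger is in flight at a time while later requests wait. The following is a deliberately simplified model of that idea, not Flink's implementation: the class, the string-named requests, and the started list are hypothetical, and the real coordinator applies further rules when choosing a queued request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class TriggerQueueSketch {
    private final Queue<String> queued = new ArrayDeque<>();
    private final List<String> started = new ArrayList<>();
    private boolean isTriggering = false;

    // A new trigger request arrives; it starts immediately only if nothing is in flight.
    void request(String name) {
        queued.add(name);
        executeQueuedRequest();
    }

    // Mirrors onTriggerSuccess(): clear the flag, then look at the queue again.
    void onTriggerSuccess() {
        isTriggering = false;
        executeQueuedRequest();
    }

    private void executeQueuedRequest() {
        if (!isTriggering && !queued.isEmpty()) {
            isTriggering = true;
            started.add(queued.poll());
        }
    }

    List<String> started() {
        return started;
    }

    public static void main(String[] args) {
        TriggerQueueSketch sketch = new TriggerQueueSketch();
        sketch.request("checkpoint-1");
        sketch.request("checkpoint-2"); // queued, because checkpoint-1 is still triggering
        System.out.println(sketch.started());
        sketch.onTriggerSuccess();
        System.out.println(sketch.started());
    }
}
```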

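4. triggerTasks()

The method has 5 steps:

  1. Get the current checkpoint ID
  2. Determine the snapshot type (full or incremental); this ties back to triggerCheckpoint(CheckpointType checkpointType) above
  3. Build the CheckpointOptions
  4. Send the trigger request to every task in the plan
  5. Combine the acknowledgements of all tasks
    • All succeed: the combined future completes normally
    • Any one fails: the combined future completes exceptionally

The annotated source: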
private CompletableFuture<Void> triggerTasks(
        CheckpointTriggerRequest request, long timestamp, PendingCheckpoint checkpoint) {
    // 1. Get the current checkpoint ID
    final long checkpointId = checkpoint.getCheckpointID();

    // 2. Determine the snapshot type (full or incremental)
    final SnapshotType type;
    if (this.forceFullSnapshot && !request.props.isSavepoint()) { // taken only when full snapshots are forced and this is not a savepoint
        type = FULL_CHECKPOINT;
    } else {
        type = request.props.getCheckpointType(); // props were built in triggerCheckpoint(CheckpointType checkpointType) above
    }

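    // 3. Build the CheckpointOptions
    final CheckpointOptions checkpointOptions =
            CheckpointOptions.forConfig(
                    type, // snapshot type: incremental or full
                    checkpoint.getCheckpointStorageLocation().getLocationReference(), // reference to the storage location
                    isExactlyOnceMode, // whether exactly-once mode is enabled
                    unalignedCheckpointsEnabled, // whether unaligned checkpoints are enabled
                    alignedCheckpointTimeout); // timeout after which an aligned barrier switches to unaligned

    // 4. Send the trigger request to every task in the plan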
    List<CompletableFuture<Acknowledge>> acks = new ArrayList<>();
    for (Execution execution : checkpoint.getCheckpointPlan().getTasksToTrigger()) {
        if (request.props.isSynchronous()) { // synchronous savepoint
            acks.add(
                    execution.triggerSynchronousSavepoint(
                            checkpointId, timestamp, checkpointOptions));
        } else { // regular checkpoint or savepoint
            acks.add(execution.triggerCheckpoint(checkpointId, timestamp, checkpointOptions));
        }
    }
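    /* 5. Combine the acknowledgements of all tasks:
        if every task acknowledges, the combined future completes normally;
        if any task fails (for example it crashed or the RPC timed out), the combined future completes exceptionally.
     */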
    return FutureUtils.waitForAll(acks);
}
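
The "combine all acknowledgements" step can be tried out with the JDK alone. The sketch below uses CompletableFuture.allOf as a stand-in; it is not Flink's FutureUtils.waitForAll, the class name and the String acknowledgements are hypothetical, and the timing of the failure may differ (allOf waits for every input before completing, even if one has already failed).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

class WaitForAllSketch {

    // Stand-in built on the JDK: completes normally only if every acknowledgement does.
    static CompletableFuture<Void> waitForAll(List<CompletableFuture<String>> acks) {
        return CompletableFuture.allOf(acks.toArray(new CompletableFuture<?>[0]));
    }

    public static void main(String[] args) {
        CompletableFuture<String> first = CompletableFuture.completedFuture("ack");
        CompletableFuture<String> second = new CompletableFuture<>();
        CompletableFuture<Void> all = waitForAll(List.of(first, second));

        System.out.println("done before second ack: " + all.isDone());
        second.complete("ack");
        System.out.println("done after second ack: " + all.isDone());

        CompletableFuture<String> lost = new CompletableFuture<>();
        lost.completeExceptionally(new RuntimeException("task lost"));
        System.out.println(
                "failed when one task fails: "
                        + waitForAll(List.of(first, lost)).isCompletedExceptionally());
    }
}
```

Note that isDone() is true in both the success and the failure case, so a caller has to check isCompletedExceptionally() or attach a handler to tell them apart.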

(1) execution.triggerCheckpoint(), called above

public CompletableFuture<Acknowledge> triggerCheckpoint(
        long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
    // Delegates to triggerCheckpointHelper()
    return triggerCheckpointHelper(checkpointId, timestamp, checkpointOptions);
}

(2) triggerCheckpointHelper(), called next

private CompletableFuture<Acknowledge> triggerCheckpointHelper(
        long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
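    // 1. Get the assigned resource (slot)
    final LogicalSlot slot = assignedResource;
    // 2. If a slot is assigned, get the TaskManager gateway from it
    if (slot != null) {
        final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
        // 3. Send the trigger request, asking the TaskManager that runs this execution to trigger the checkpoint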
        return taskManagerGateway.triggerCheckpoint(
                attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions);
    }
    LOG.debug(
            "The execution has no slot assigned. This indicates that the execution is no longer running.");
    return CompletableFuture.completedFuture(Acknowledge.get());
}

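At this point the call has reached the TaskManager gateway of the target task, which is asked to trigger the checkpoint. Next, let's look at TaskManagerGateway.

II. TaskManagerGateway

1. TaskManagerGateway is an interface

Its implementing classes: [figure: class hierarchy of TaskManagerGateway]

The interface is declared as follows: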

public interface TaskManagerGateway extends TaskExecutorOperatorEventGateway {

    String getAddress();

    CompletableFuture<Acknowledge> submitTask(TaskDeploymentDescriptor tdd, Duration timeout);

    CompletableFuture<Acknowledge> cancelTask(
            ExecutionAttemptID executionAttemptID, Duration timeout);

    CompletableFuture<Acknowledge> updatePartitions(
            ExecutionAttemptID executionAttemptID,
            Iterable<PartitionInfo> partitionInfos,
            Duration timeout);

    void releasePartitions(JobID jobId, Set<ResultPartitionID> partitionIds);

    void notifyCheckpointOnComplete(
            ExecutionAttemptID executionAttemptID,
            JobID jobId,
            long completedCheckpointId,
            long completedTimestamp,
            long lastSubsumedCheckpointId);

    void notifyCheckpointAborted(
            ExecutionAttemptID executionAttemptID,
            JobID jobId,
            long checkpointId,
            long latestCompletedCheckpointId,
            long timestamp);

    CompletableFuture<Acknowledge> triggerCheckpoint(
            ExecutionAttemptID executionAttemptID,
            JobID jobId,
            long checkpointId,
            long timestamp,
            CheckpointOptions checkpointOptions);

    CompletableFuture<Acknowledge> freeSlot(
            final AllocationID allocationId,
            final Throwable cause,
            @RpcTimeout final Duration timeout);

    @Override
    CompletableFuture<Acknowledge> sendOperatorEventToTask(
            ExecutionAttemptID task, OperatorID operator, SerializedValue<OperatorEvent> evt);
}

2. The implementation RpcTaskManagerGateway

It delegates to the triggerCheckpoint() method of the TaskExecutorGateway implementation:

public class RpcTaskManagerGateway implements TaskManagerGateway {

    private final TaskExecutorGateway taskExecutorGateway;

    private final JobMasterId jobMasterId;

    @Override
    public CompletableFuture<Acknowledge> triggerCheckpoint(
            ExecutionAttemptID executionAttemptID,
            JobID jobId,
            long checkpointId,
            long timestamp,
            CheckpointOptions checkpointOptions) {
        // Delegates to triggerCheckpoint() of the TaskExecutorGateway implementation
        return taskExecutorGateway.triggerCheckpoint(
                executionAttemptID, checkpointId, timestamp, checkpointOptions);
    }
    ...
}

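(1) A quick look at the TaskExecutorGateway interface

Its implementing classes: [figure: class hierarchy of TaskExecutorGateway]

(2) triggerCheckpoint() as implemented by TaskExecutor

It boils down to 3 steps:

  1. Look up the Task instance by its execution attempt ID
  2. Call the task's triggerCheckpointBarrier() method (the core logic)
  3. Return an acknowledgement

The annotated source: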
@Override
public CompletableFuture<Acknowledge> triggerCheckpoint(
        ExecutionAttemptID executionAttemptID,
        long checkpointId,
        long checkpointTimestamp,
        CheckpointOptions checkpointOptions) {
    // 1. Look up the Task instance by its execution attempt ID
    final Task task = taskSlotTable.getTask(executionAttemptID);
    if (task != null) {
        try (MdcCloseable ignored =
                MdcUtils.withContext(MdcUtils.asContextData(task.getJobID()))) {
            log.debug(
                    "Trigger checkpoint {}@{} for {}.",
                    checkpointId,
                    checkpointTimestamp,
                    executionAttemptID);
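            // 2. Call the task's triggerCheckpointBarrier() method (the core logic)
            task.triggerCheckpointBarrier(checkpointId, checkpointTimestamp, checkpointOptions);
            // 3. Acknowledge that the trigger was delivered; the snapshot itself continues asynchronously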
            return CompletableFuture.completedFuture(Acknowledge.get());
        }
    } else {
        final String message =
                "TaskManager received a checkpoint request for unknown task "
                        + executionAttemptID
                        + '.';

        log.debug(message);
        return FutureUtils.completedExceptionally(
                new CheckpointException(
                        message, CheckpointFailureReason.TASK_CHECKPOINT_FAILURE));
    }
}
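
The acknowledgement returned above only says that the trigger reached a known task, not that the checkpoint has finished. A minimal sketch of this "look up, trigger, acknowledge" shape follows, assuming a hypothetical CheckpointTarget interface, string attempt IDs, and an "ACK" string in place of Flink's Task, ExecutionAttemptID, and Acknowledge types.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class TriggerAckSketch {

    // Hypothetical stand-in for a running task that can start a checkpoint.
    interface CheckpointTarget {
        void triggerBarrier(long checkpointId);
    }

    private final Map<String, CheckpointTarget> tasks = new ConcurrentHashMap<>();

    void register(String attemptId, CheckpointTarget task) {
        tasks.put(attemptId, task);
    }

    CompletableFuture<String> triggerCheckpoint(String attemptId, long checkpointId) {
        CheckpointTarget task = tasks.get(attemptId);
        if (task == null) {
            // Unknown task: report the problem through the future instead of throwing.
            return CompletableFuture.failedFuture(
                    new IllegalStateException("unknown task " + attemptId));
        }
        task.triggerBarrier(checkpointId); // only starts the work
        return CompletableFuture.completedFuture("ACK"); // "trigger delivered", not "snapshot done"
    }

    public static void main(String[] args) {
        TriggerAckSketch executor = new TriggerAckSketch();
        executor.register("attempt-1", id -> System.out.println("barrier " + id));
        System.out.println(executor.triggerCheckpoint("attempt-1", 7L).join());
        System.out.println(
                executor.triggerCheckpoint("attempt-2", 7L).isCompletedExceptionally());
    }
}
```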

(3) Task.triggerCheckpointBarrier(), called above

public void triggerCheckpointBarrier(
        final long checkpointID,
        final long checkpointTimestamp,
        final CheckpointOptions checkpointOptions) {
    // 1. Get the code that the task actually runs (a TaskInvokable implementation)
    final TaskInvokable invokable = this.invokable;
    // 2. Wrap the checkpoint metadata
    final CheckpointMetaData checkpointMetaData =
            new CheckpointMetaData(
                    checkpointID, checkpointTimestamp, System.currentTimeMillis());

    if (executionState == ExecutionState.RUNNING) {
        // 3. Verify that the invokable supports checkpoints, i.e. implements CheckpointableTask
        checkState(invokable instanceof CheckpointableTask, "invokable is not checkpointable");
        try {
            // 4. Call the invokable's triggerCheckpointAsync() to trigger the checkpoint asynchronously
            ((CheckpointableTask) invokable)
                    .triggerCheckpointAsync(checkpointMetaData, checkpointOptions)
                    .handle( // 5. Handle the result
                            (triggerResult, exception) -> {
                                if (exception != null || !triggerResult) {
                                    // 5.1 On failure, decline the checkpoint so that the JobMaster is notified
                                    declineCheckpoint(
                                            checkpointID,
                                            CheckpointFailureReason.TASK_FAILURE,
                                            exception);
                                    return false;
                                }
                                return true;
                            });
            // The exception handling that follows is omitted
        } ...
}
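
The handle() callback above treats two different outcomes as a failure: an exception, or a normal completion with the value false. A small sketch with plain JDK futures shows the pattern; the class name and the declined counter are hypothetical stand-ins for the call to declineCheckpoint().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

class DeclineOnFailureSketch {
    final AtomicInteger declined = new AtomicInteger();

    CompletableFuture<Boolean> watch(CompletableFuture<Boolean> triggerResult) {
        return triggerResult.handle(
                (ok, error) -> {
                    // Check the exception first: when it is non-null, ok is null
                    // and unboxing it would throw a NullPointerException.
                    if (error != null || !ok) {
                        declined.incrementAndGet(); // stand-in for declineCheckpoint(...)
                        return false;
                    }
                    return true;
                });
    }

    public static void main(String[] args) {
        DeclineOnFailureSketch sketch = new DeclineOnFailureSketch();
        System.out.println(sketch.watch(CompletableFuture.completedFuture(true)).join());
        System.out.println(sketch.watch(CompletableFuture.completedFuture(false)).join());
        System.out.println(
                sketch.watch(CompletableFuture.failedFuture(new RuntimeException("boom"))).join());
        System.out.println("declined: " + sketch.declined.get());
    }
}
```

Since handle() converts the exception into a regular false result, the returned future never completes exceptionally; the failure is reported only through the decline call.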

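So the actual work is delegated to the triggerCheckpointAsync() method of the TaskInvokable implementation, which is invoked through the CheckpointableTask interface.

III. TaskInvokable

1. TaskInvokable is an interface

Its implementing classes: [figure: class hierarchy of TaskInvokable]

2. The implementation StreamTask

This is again a chain of nested calls: triggerCheckpointAsync() -> triggerCheckpointAsyncInMailbox() -> performCheckpoint() -> subtaskCheckpointCoordinator.checkpointState()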

// Step 1: StreamTask.triggerCheckpointAsync()
@Override
public CompletableFuture<Boolean> triggerCheckpointAsync(
        CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions) {
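    // 1. Check support for forced full snapshots; throws if the current state backend does not support them
    checkForcedFullSnapshotSupport(checkpointOptions);
    /* 2. Choose the mail priority:
     *   UNALIGNED: unaligned checkpoint, submitted as an urgent mail
     *   otherwise: submitted with the default options
     */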
    MailboxExecutor.MailOptions mailOptions =
            CheckpointOptions.AlignmentType.UNALIGNED == checkpointOptions.getAlignment()
                    ? MailboxExecutor.MailOptions.urgent()
                    : MailboxExecutor.MailOptions.options();
    // 3. Run the checkpoint in the mailbox, choosing the trigger method depending on whether the inputs have finished
    CompletableFuture<Boolean> result = new CompletableFuture<>();
    mainMailboxExecutor.execute(
            mailOptions,
            () -> {
                try {
                    boolean noUnfinishedInputGates =
                            Arrays.stream(getEnvironment().getAllInputGates())
                                    .allMatch(InputGate::isFinished);
                    // 3.1 All input gates have finished (also true for a task without input gates): call triggerCheckpointAsyncInMailbox()
                    if (noUnfinishedInputGates) {
                        result.complete(
                                triggerCheckpointAsyncInMailbox(
                                        checkpointMetaData, checkpointOptions));
                    } else { // 3.2 Some inputs are still unfinished: call triggerUnfinishedChannelsCheckpoint()
                        result.complete(
                                triggerUnfinishedChannelsCheckpoint(
                                        checkpointMetaData, checkpointOptions));
                    }
                } catch (Exception ex) {
                    // Report the failure both via the Future result but also to the mailbox
                    result.completeExceptionally(ex);
                    throw ex;
                }
            },
            "checkpoint %s with %s",
            checkpointMetaData,
            checkpointOptions);
    return result;
}
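
One detail of the noUnfinishedInputGates check is easy to miss: allMatch on an empty stream returns true. A task that has no input gates at all (a source task, presumably the usual case) therefore takes the triggerCheckpointAsyncInMailbox() branch. The sketch below reproduces only that boolean logic, with a hypothetical list of "finished" flags in place of real InputGate objects.

```java
import java.util.List;

class InputGateCheckSketch {

    // Each flag stands in for InputGate::isFinished on one gate.
    static boolean noUnfinishedInputGates(List<Boolean> gateFinishedFlags) {
        return gateFinishedFlags.stream().allMatch(Boolean::booleanValue);
    }

    public static void main(String[] args) {
        System.out.println(noUnfinishedInputGates(List.of()));            // no gates at all
        System.out.println(noUnfinishedInputGates(List.of(true, true)));  // every gate finished
        System.out.println(noUnfinishedInputGates(List.of(true, false))); // one gate still open
    }
}
```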

// Step 2: triggerCheckpointAsyncInMailbox(), called above
private boolean triggerCheckpointAsyncInMailbox(
        CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions)
        throws Exception {
    FlinkSecurityManager.monitorUserSystemExitForCurrentThread();
    try {
        // Compute the checkpoint start delay: the time between the JobManager triggering the checkpoint and this task starting to execute it
        latestAsyncCheckpointStartDelayNanos =
                1_000_000
                        * Math.max(
                                0,
                                System.currentTimeMillis() - checkpointMetaData.getTimestamp());

        // Initialize the checkpoint metrics builder
        CheckpointMetricsBuilder checkpointMetrics =
                new CheckpointMetricsBuilder()
                        .setAlignmentDurationNanos(0L)
                        .setBytesProcessedDuringAlignment(0L)
                        .setCheckpointStartDelayNanos(latestAsyncCheckpointStartDelayNanos);
        // Prepare the inputs for this checkpoint
        subtaskCheckpointCoordinator.initInputsCheckpoint(
                checkpointMetaData.getCheckpointId(), checkpointOptions);
        // This is where the checkpoint is actually performed
        boolean success =
                performCheckpoint(checkpointMetaData, checkpointOptions, checkpointMetrics);
        if (!success) {
            declineCheckpoint(checkpointMetaData.getCheckpointId());
        }
        return success;
        // The exception handling that follows is omitted
    } ...
}

// Step 3: performCheckpoint(), called above
private boolean performCheckpoint(
        CheckpointMetaData checkpointMetaData,
        CheckpointOptions checkpointOptions,
        CheckpointMetricsBuilder checkpointMetrics)
        throws Exception {

    final SnapshotType checkpointType = checkpointOptions.getCheckpointType();
    LOG.debug(
            "Starting checkpoint {} {} on task {}",
            checkpointMetaData.getCheckpointId(),
            checkpointType,
            getName());

    if (isRunning) { // the task is running
        actionExecutor.runThrowing(
                () -> {
                    // Handle synchronous savepoints
                    if (isSynchronous(checkpointType)) {
                        setSynchronousSavepoint(checkpointMetaData.getCheckpointId());
                    }
                    // Remember the first checkpoint taken after the end of data was received
                    if (areCheckpointsWithFinishedTasksEnabled()
                            && endOfDataReceived
                            && this.finalCheckpointMinId == null) {
                        this.finalCheckpointMinId = checkpointMetaData.getCheckpointId();
                    }

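                    // This call handles the barriers and the state snapshot; subtaskCheckpointCoordinator is an implementation of the SubtaskCheckpointCoordinator interface
                    subtaskCheckpointCoordinator.checkpointState(
                            checkpointMetaData,        // checkpoint metadata (ID, timestamp)
                            checkpointOptions,         // checkpoint options (snapshot type, alignment, target location)
                            checkpointMetrics,         // checkpoint metrics builder
                            operatorChain,             // operator chain (holds the state of all chained operators)
                            finishedOperators,         // whether the operators have already finished (received as isTaskFinished)
                            this::isRunning);          // supplier that reports whether the task is still running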
                });

        return true;
    } else { // the task is not running: broadcast a CancelCheckpointMarker downstream
        actionExecutor.runThrowing(
                () -> {
                    final CancelCheckpointMarker message =
                            new CancelCheckpointMarker(checkpointMetaData.getCheckpointId());
                    recordWriter.broadcastEvent(message);
                });

        return false;
    }
}

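IV. Finally, SubtaskCheckpointCoordinator

1. SubtaskCheckpointCoordinator is an interface

Its implementing classes: [figure: class hierarchy of SubtaskCheckpointCoordinator]

2. The implementation SubtaskCheckpointCoordinatorImpl

(1) The core method checkpointState()

After rejecting out-of-order checkpoint IDs, it proceeds in the following steps, numbered as in the Step (n) comments of the source:

  0. Handle a checkpoint that was already notified as aborted: broadcast a cancellation to downstream tasks with operatorChain.broadcastEvent(new CancelCheckpointMarker()), abort the channel state writer, and return
  1. Prepare the checkpoint by letting operators do their pre-barrier work: operatorChain.prepareSnapshotPreBarrier()
  2. Broadcast the barrier: operatorChain.broadcastEvent()
  3. Register the timer that times out barrier alignment: registerAlignmentTimer()
  4. Finish the channel state of the outputs: when channel state is needed, the in-flight output data was already written while the barrier was broadcast, so channelStateWriter.finishOutput() only marks it complete
  5. Take the state snapshot: takeSnapshotSync() runs the synchronous part, then finishAndReportAsync() completes it asynchronously and reports the result

The annotated source: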
@Override
public void checkpointState(
        CheckpointMetaData metadata,
        CheckpointOptions options,
        CheckpointMetricsBuilder metrics,
        OperatorChain<?, ?> operatorChain,
        boolean isTaskFinished,
        Supplier<Boolean> isRunning)
        throws Exception {

    checkNotNull(options);
    checkNotNull(metrics);

    // Preliminary check: checkpoints must be processed in increasing ID order, so duplicate and outdated requests are dropped
    if (lastCheckpointId >= metadata.getCheckpointId()) {
        LOG.info(
                "Out of order checkpoint barrier (aborted previously?): {} >= {}",
                lastCheckpointId,
                metadata.getCheckpointId());
        channelStateWriter.abort(metadata.getCheckpointId(), new CancellationException(), true);
        checkAndClearAbortedStatus(metadata.getCheckpointId());
        return;
    }

    logCheckpointProcessingDelay(metadata);

    // Step (0): Record the last triggered checkpointId and abort the sync phase of checkpoint
    // if necessary.
    // 0. Handle a checkpoint that was already notified as aborted
    lastCheckpointId = metadata.getCheckpointId();
    if (checkAndClearAbortedStatus(metadata.getCheckpointId())) {
        // Broadcast a CancelCheckpointMarker to downstream tasks and abort the channel state writer
        operatorChain.broadcastEvent(new CancelCheckpointMarker(metadata.getCheckpointId()));
        channelStateWriter.abort(
                metadata.getCheckpointId(),
                new CancellationException("checkpoint aborted via notification"),
                true);
        LOG.info(
                "Checkpoint {} has been notified as aborted, would not trigger any checkpoint.",
                metadata.getCheckpointId());
        return;
    }
    // Prepare the checkpoint environment
    if (fileMergingSnapshotManager != null) {
        // notify file merging snapshot manager for managed dir lifecycle management
        fileMergingSnapshotManager.notifyCheckpointStart(
                FileMergingSnapshotManager.SubtaskKey.of(env), metadata.getCheckpointId());
    }

    if (options.getAlignment() == CheckpointOptions.AlignmentType.FORCED_ALIGNED) {
        options = options.withUnalignedSupported();
        initInputsCheckpoint(metadata.getCheckpointId(), options);
    }

    // Step (1): Prepare the checkpoint, allow operators to do some pre-barrier work.
    //           The pre-barrier work should be nothing or minimal in the common case.
    // 1. Prepare the checkpoint by letting operators do their pre-barrier work
    operatorChain.prepareSnapshotPreBarrier(metadata.getCheckpointId());

    // Step (2): Send the checkpoint barrier downstream
    /* 2. Broadcast the checkpoint barrier.
     * A CheckpointBarrier simply wraps the checkpoint ID, the timestamp, and the CheckpointOptions:
     * private final long id;
     * private final long timestamp;
     * private final CheckpointOptions checkpointOptions;
     */
    LOG.debug(
            "Task {} broadcastEvent at {}, triggerTime {}, passed time {}",
            taskName,
            System.currentTimeMillis(),
            metadata.getTimestamp(),
            System.currentTimeMillis() - metadata.getTimestamp());
    CheckpointBarrier checkpointBarrier =
            new CheckpointBarrier(metadata.getCheckpointId(), metadata.getTimestamp(), options);
    operatorChain.broadcastEvent(checkpointBarrier, options.isUnalignedCheckpoint());

    // Step (3): Register alignment timer to timeout aligned barrier to unaligned barrier
    // 3. Register the timer that times out barrier alignment
    registerAlignmentTimer(metadata.getCheckpointId(), operatorChain, checkpointBarrier);

    // Step (4): Prepare to spill the in-flight buffers for input and output
    // 4. Finish the channel state of the outputs through channelStateWriter
    if (options.needsChannelState()) {
        // output data already written while broadcasting event
        channelStateWriter.finishOutput(metadata.getCheckpointId());
    }

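    // Step (5): Take the state snapshot. This should be largely asynchronous, to not impact
    // progress of the streaming topology
    /* 5. Take the state snapshot.
     *  takeSnapshotSync(): synchronously triggers the state snapshot of the operator chain
     *      and fills snapshotFutures with each operator's asynchronous result.
     *  finishAndReportAsync(): processes the snapshot results asynchronously and notifies
     *      the JobManager on completion.
     *  cleanup(): handles the failure case, releasing resources and recording the error.
     */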
    Map<OperatorID, OperatorSnapshotFutures> snapshotFutures =
            CollectionUtil.newHashMapWithExpectedSize(operatorChain.getNumberOfOperators());
    try {
        if (takeSnapshotSync(
                snapshotFutures, metadata, metrics, options, operatorChain, isRunning)) {
            finishAndReportAsync(
                    snapshotFutures,
                    metadata,
                    metrics,
                    operatorChain.isTaskDeployedAsFinished(),
                    isTaskFinished,
                    isRunning);
        } else {
            cleanup(snapshotFutures, metadata, metrics, new Exception("Checkpoint declined"));
        }
    } catch (Exception ex) {
        cleanup(snapshotFutures, metadata, metrics, ex);
        throw ex;
    }
}
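
The ordering of the five steps and the failure path can be summarized in a toy model. This is only a sketch: the `CheckpointStepsSketch` class, its `Barrier` stand-in, and the `snapshotFails` flag are hypothetical and do not exist in Flink, and unlike the real method the sketch swallows the exception instead of rethrowing it so that the recorded log can be returned.

```java
import java.util.ArrayList;
import java.util.List;

class CheckpointStepsSketch {
    /** Hypothetical stand-in for CheckpointBarrier: only id and timestamp. */
    static final class Barrier {
        final long id;
        final long timestamp;

        Barrier(long id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    /** Records the order in which the steps run for one checkpoint. */
    static List<String> runCheckpoint(
            long checkpointId, boolean needsChannelState, boolean snapshotFails) {
        List<String> log = new ArrayList<>();
        log.add("prepareSnapshotPreBarrier");                 // step 1
        Barrier barrier = new Barrier(checkpointId, System.currentTimeMillis());
        log.add("broadcastBarrier:" + barrier.id);            // step 2
        log.add("registerAlignmentTimer");                    // step 3
        if (needsChannelState) {
            log.add("finishOutput");                          // step 4
        }
        try {                                                 // step 5
            if (snapshotFails) {
                throw new IllegalStateException("simulated snapshot failure");
            }
            log.add("takeSnapshotSync");
            log.add("finishAndReportAsync");
        } catch (RuntimeException ex) {
            // The real method also rethrows; the sketch only records the cleanup.
            log.add("cleanup");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runCheckpoint(7L, true, false));
        System.out.println(runCheckpoint(8L, false, true));
    }
}
```

Note that the barrier is broadcast before any state is snapshotted, so downstream tasks can start their own checkpoint work while this task is still in step 5.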

(2) The takeSnapshotSync() call

At its core, this method calls the operatorChain's snapshotState() to take the state snapshot and write it to the location that storage resolves to.

java
private boolean takeSnapshotSync(
        Map<OperatorID, OperatorSnapshotFutures> operatorSnapshotsInProgress,
        CheckpointMetaData checkpointMetaData,
        CheckpointMetricsBuilder checkpointMetrics,
        CheckpointOptions checkpointOptions,
        OperatorChain<?, ?> operatorChain,
        Supplier<Boolean> isRunning)
        throws Exception {

    checkState(
            !operatorChain.isClosed(),
            "OperatorChain and Task should never be closed at this point");

    long checkpointId = checkpointMetaData.getCheckpointId();
    long started = System.nanoTime();
    // Channel state: the input and output buffer data captured at checkpoint time, used for failure recovery
    ChannelStateWriteResult channelStateWriteResult =
            checkpointOptions.needsChannelState()
                    ? channelStateWriter.getAndRemoveWriteResult(checkpointId)
                    : ChannelStateWriteResult.EMPTY;
    // Resolve the storage location
    CheckpointStreamFactory storage =
            checkpointStorage.resolveCheckpointStorageLocation(
                    checkpointId, checkpointOptions.getTargetLocation());
    storage = applyFileMergingCheckpoint(storage, checkpointOptions);

    try {
        // Call operatorChain.snapshotState() to take the state snapshot and write it to the location behind storage
        operatorChain.snapshotState(
                operatorSnapshotsInProgress,
                checkpointMetaData,
                checkpointOptions,
                isRunning,
                channelStateWriteResult,
                storage);

    } finally {
        // Clear the storage cache held for this checkpoint
        checkpointStorage.clearCacheFor(checkpointId);
    }
    // Record how long the synchronous phase took
    checkpointMetrics.setSyncDurationMillis((System.nanoTime() - started) / 1_000_000);

    LOG.debug(
            "{} - finished synchronous part of checkpoint {}. Alignment duration: {} ms, snapshot duration {} ms, is unaligned checkpoint : {}",
            taskName,
            checkpointId,
            checkpointMetrics.getAlignmentDurationNanosOrDefault() / 1_000_000,
            checkpointMetrics.getSyncDurationMillis(),
            checkpointOptions.isUnalignedCheckpoint());

    return true;
}
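
The shape of this method (time the synchronous part, record only handles to the deferred work, and always clear the per-checkpoint cache in `finally`) can be shown in isolation. The sketch below is an assumption-laden illustration, not Flink code: `SyncPhaseSketch`, `locationCache`, `resolveLocation`, and the `"map-operator"` key are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class SyncPhaseSketch {
    // Hypothetical per-checkpoint cache, standing in for whatever clearCacheFor() releases.
    static final Map<Long, String> locationCache = new HashMap<>();

    static String resolveLocation(long checkpointId) {
        return locationCache.computeIfAbsent(checkpointId, id -> "chk-" + id);
    }

    /**
     * Synchronous part: registers one lazy result per operator and returns the
     * elapsed time in milliseconds. The cache entry is dropped even on failure.
     */
    static long takeSnapshotSync(
            long checkpointId, Map<String, Supplier<String>> inProgress, boolean fail) {
        long started = System.nanoTime();
        String location = resolveLocation(checkpointId);
        try {
            if (fail) {
                throw new IllegalStateException("simulated snapshot failure");
            }
            // The expensive write is deferred; only a handle to it is recorded here.
            inProgress.put("map-operator", () -> location + "/map-operator");
        } finally {
            locationCache.remove(checkpointId);
        }
        return (System.nanoTime() - started) / 1_000_000;
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> inProgress = new HashMap<>();
        long millis = takeSnapshotSync(3L, inProgress, false);
        System.out.println(inProgress.get("map-operator").get() + " after " + millis + " ms");
    }
}
```

Keeping the synchronous part this small matters because, while it runs, the task is not processing records.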

(3) The finishAndReportAsync() call

java
private void finishAndReportAsync(
        Map<OperatorID, OperatorSnapshotFutures> snapshotFutures,
        CheckpointMetaData metadata,
        CheckpointMetricsBuilder metrics,
        boolean isTaskDeployedAsFinished,
        boolean isTaskFinished,
        Supplier<Boolean> isRunning)
        throws IOException {
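    // Create the asynchronous task
    AsyncCheckpointRunnable asyncCheckpointRunnable =
            new AsyncCheckpointRunnable(
                    snapshotFutures,          // asynchronous snapshot results of each operator
                    metadata,                 // checkpoint metadata (ID, timestamp)
                    metrics,                  // checkpoint metrics builder
                    System.nanoTime(),        // start time of the asynchronous phase
                    taskName,                 // task name
                    unregisterConsumer(),     // resource cleanup callback
                    env,                      // task environment (configuration, state backend, ...)
                    asyncExceptionHandler,    // handler for asynchronous exceptions
                    isTaskDeployedAsFinished, // whether the task was deployed in the "finished" state
                    isTaskFinished,           // whether the task has finished processing data
                    isRunning);               // supplier that tells whether the task is still running
    // Register the asynchronous task
    registerAsyncCheckpointRunnable(
            asyncCheckpointRunnable.getCheckpointId(), asyncCheckpointRunnable);

    // Submit the asynchronous task to the thread pool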
    asyncOperationsThreadPool.execute(asyncCheckpointRunnable);
}
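
The register-then-submit pattern can be tried out with a plain `ExecutorService`. The following is a minimal sketch under stated assumptions: `AsyncReportSketch`, the `running` and `acknowledged` maps, and `demo()` are hypothetical names, and the "acknowledgement" is just a map entry rather than a message to the JobManager. Registering the runnable before submitting it presumably lets the coordinator find it again later, for example to cancel it when the checkpoint is aborted.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncReportSketch {
    // Async parts that are currently in flight, keyed by checkpoint ID.
    static final Map<Long, Runnable> running = new ConcurrentHashMap<>();
    // Checkpoints whose async part has completed (stand-in for the acknowledgement).
    static final Map<Long, String> acknowledged = new ConcurrentHashMap<>();

    static void finishAndReportAsync(long checkpointId, ExecutorService pool) {
        Runnable asyncPart = () -> {
            try {
                acknowledged.put(checkpointId, "ack");
            } finally {
                // Plays the role of the unregister callback: always deregister.
                running.remove(checkpointId);
            }
        };
        running.put(checkpointId, asyncPart); // register first ...
        pool.execute(asyncPart);              // ... then hand over to the pool
    }

    /** Runs two checkpoints and returns the IDs that were acknowledged. */
    static Set<Long> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        finishAndReportAsync(1L, pool);
        finishAndReportAsync(2L, pool);
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acknowledged.keySet();
    }

    public static void main(String[] args) {
        System.out.println("acknowledged: " + demo() + ", still running: " + running.size());
    }
}
```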

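(4) The registerAlignmentTimer() call

If barrier alignment times out, a fallback kicks in and the checkpoint switches to unaligned barriers; the next chapter covers the details.

Purpose: let the barrier overtake data so that alignment does not block and checkpointing does not become the bottleneck of the streaming job.

Take the sequence [data A, data B, barrier-C, data D, data E] as an example:

  1. Aligned mode does not allow overtaking: barrier-C is passed on only after data A and data B have been processed and sent downstream.
  2. Unaligned mode allows overtaking: barrier-C does not wait for data A and data B to be processed; it is marked as "high priority" and forwarded to the downstream operator first.

Note: although barrier-C overtakes data A and B, the overtaken data is persisted as part of the checkpoint. When the task recovers from a failure, Flink replays this persisted data, so downstream operators still end up receiving the complete A, B, D, E and no data is lost.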
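
The example above can be simulated on a plain list. This is a conceptual sketch only: `BarrierOvertakeSketch` and its methods are hypothetical, and real Flink works on network buffers and channel state rather than on lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BarrierOvertakeSketch {
    static final String BARRIER = "barrier-C";

    /** Aligned mode: the barrier keeps its position in the channel. */
    static List<String> alignedOrder(List<String> channel) {
        return new ArrayList<>(channel);
    }

    /**
     * Unaligned mode: the barrier jumps to the front. Every element it overtakes
     * is added to `persisted`, modelling the in-flight data stored with the checkpoint.
     */
    static List<String> unalignedOrder(List<String> channel, List<String> persisted) {
        int barrierPos = channel.indexOf(BARRIER);
        if (barrierPos < 0) {
            throw new IllegalArgumentException("channel contains no barrier");
        }
        List<String> out = new ArrayList<>();
        out.add(BARRIER);
        for (int i = 0; i < channel.size(); i++) {
            if (i == barrierPos) {
                continue;
            }
            if (i < barrierPos) {
                persisted.add(channel.get(i)); // overtaken, so it must be saved
            }
            out.add(channel.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> channel = Arrays.asList("A", "B", BARRIER, "D", "E");
        List<String> persisted = new ArrayList<>();
        System.out.println("aligned:   " + alignedOrder(channel));
        System.out.println("unaligned: " + unalignedOrder(channel, persisted));
        System.out.println("persisted: " + persisted);
    }
}
```

In this model only A and B end up in `persisted`; D and E arrive after the barrier and therefore belong to the next checkpoint.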

java
private void registerAlignmentTimer(
        long checkpointId,
        OperatorChain<?, ?> operatorChain,
        CheckpointBarrier checkpointBarrier) {
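    // Cancel any existing timer so that timers of different checkpoints do not interfere
    cancelAlignmentTimer();
    // Check whether this checkpoint needs timeout protection at all
    if (!checkpointBarrier.getCheckpointOptions().isTimeoutable()) {
        return;
    }
    // Compute the delay
    long timerDelay = BarrierAlignmentUtil.getTimerDelay(clock, checkpointBarrier);
    // Register the timeout task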
    alignmentTimer =
            registerTimer.registerTask(
                    () -> {
                        try {
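                            // On timeout, switch from aligned to unaligned barrier handling.
                            // This is the fallback mechanism; see the next chapter for the implementation.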
                            operatorChain.alignedBarrierTimeout(checkpointId);
                        } catch (Exception e) {
                            ExceptionUtils.rethrowIOException(e);
                        }
                        alignmentTimer = null;
                        return null;
                    },
                    Duration.ofMillis(timerDelay));
    alignmentCheckpointId = checkpointId;
}
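
The cancel-then-reschedule logic can be reproduced with a `ScheduledExecutorService`. This is a sketch under assumptions, not the Flink implementation: `AlignmentTimerSketch` and all of its members are hypothetical, and the formula in `timerDelay` (configured timeout minus the time already elapsed since the barrier's timestamp, never below zero) is an assumption about what a helper such as `getTimerDelay` would compute.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class AlignmentTimerSketch {
    final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    // ID of the checkpoint whose alignment timed out, or -1 if none did.
    final AtomicLong timedOutCheckpoint = new AtomicLong(-1);
    ScheduledFuture<?> alignmentTimer;

    /** Assumed formula: remaining time until the aligned timeout, clamped at zero. */
    static long timerDelay(long nowMillis, long barrierTimestamp, long alignedTimeoutMillis) {
        return Math.max(alignedTimeoutMillis - (nowMillis - barrierTimestamp), 0);
    }

    synchronized void register(long checkpointId, long barrierTimestamp, long timeoutMillis) {
        if (alignmentTimer != null) {
            alignmentTimer.cancel(false); // only one alignment timer at a time
        }
        long delay = timerDelay(System.currentTimeMillis(), barrierTimestamp, timeoutMillis);
        // On timeout the real code switches the checkpoint to unaligned mode;
        // the sketch only records which checkpoint timed out.
        alignmentTimer =
                scheduler.schedule(
                        () -> timedOutCheckpoint.set(checkpointId), delay, TimeUnit.MILLISECONDS);
    }

    /** Waits for the current timer to fire, stops the scheduler, returns the recorded ID. */
    long awaitFired() {
        try {
            alignmentTimer.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
        return timedOutCheckpoint.get();
    }

    public static void main(String[] args) {
        AlignmentTimerSketch sketch = new AlignmentTimerSketch();
        long now = System.currentTimeMillis();
        sketch.register(1L, now, 10_000); // would fire after about 10 s
        sketch.register(2L, now, 50);     // replaces the timer of checkpoint 1
        System.out.println("timed out checkpoint: " + sketch.awaitFired());
    }
}
```

Because the delay is measured from the barrier's timestamp rather than from the moment of registration, a task that receives the trigger late gets a correspondingly shorter timer.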