Flink Checkpoint, Part 1: Source Code Walkthrough

I. Start with CheckpointCoordinator

1. How a checkpoint is triggered

Checkpoint: triggerCheckpoint(CheckpointType checkpointType) -> calls triggerCheckpointFromCheckpointThread() -> calls the overloaded triggerCheckpoint() -> calls startTriggeringCheckpoint()

Savepoint: triggerSavepoint() or triggerSynchronousSavepoint() -> calls triggerSavepointInternal() -> calls triggerCheckpointFromCheckpointThread() -> from there the flow is identical to the checkpoint path above

Pay particular attention to triggerCheckpoint(CheckpointType checkpointType):

public CompletableFuture<CompletedCheckpoint> triggerCheckpoint(CheckpointType checkpointType) {

    if (checkpointType == null) {
        throw new IllegalArgumentException("checkpointType cannot be null");
    }
    // Resolve the snapshot type from the requested checkpoint type
    final SnapshotType snapshotType;
    switch (checkpointType) {
        case CONFIGURED:
            snapshotType = checkpointProperties.getCheckpointType();
            break;
        case FULL:
            snapshotType = FULL_CHECKPOINT;
            break;
        case INCREMENTAL:
            snapshotType = CHECKPOINT;
            break;
        default:
            throw new IllegalArgumentException("unknown checkpointType: " + checkpointType);
    }

    // Build the CheckpointProperties for this checkpoint
    final CheckpointProperties properties =
            new CheckpointProperties(
                    checkpointProperties.forceCheckpoint(), // force the trigger (bypass limits such as the minimum pause)
                    snapshotType, // incremental or full
                    checkpointProperties.discardOnSubsumed(), // discard when subsumed by a newer checkpoint
                    checkpointProperties.discardOnJobFinished(), // discard when the job finishes
                    checkpointProperties.discardOnJobCancelled(), // discard when the job is cancelled
                    checkpointProperties.discardOnJobFailed(), // discard when the job fails
                    checkpointProperties.discardOnJobSuspended(), // discard when the job is suspended
                    checkpointProperties.isUnclaimed()); // whether the snapshot is "unclaimed" (special cases such as restoring from an external snapshot)
    return triggerCheckpointFromCheckpointThread(properties, null, false);
}
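
The type mapping in the method above can be tried out in isolation. The following is a minimal sketch, assuming simplified stand-ins: `RequestedType`, `Snapshot`, and `SnapshotTypeResolver` are hypothetical names for illustration, not Flink classes.

```java
// Hypothetical stand-ins for CheckpointType and SnapshotType; not Flink's real classes.
enum RequestedType { CONFIGURED, FULL, INCREMENTAL }

enum Snapshot { CHECKPOINT, FULL_CHECKPOINT }

final class SnapshotTypeResolver {
    /** Mirrors the switch above: CONFIGURED keeps whatever type the job was configured with. */
    static Snapshot resolve(RequestedType requested, Snapshot configured) {
        if (requested == null) {
            throw new IllegalArgumentException("checkpointType cannot be null");
        }
        switch (requested) {
            case CONFIGURED:
                return configured;
            case FULL:
                return Snapshot.FULL_CHECKPOINT;
            case INCREMENTAL:
                return Snapshot.CHECKPOINT;
            default:
                throw new IllegalArgumentException("unknown checkpointType: " + requested);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(RequestedType.FULL, Snapshot.CHECKPOINT));
        System.out.println(resolve(RequestedType.CONFIGURED, Snapshot.CHECKPOINT));
    }
}
```

Note that only the snapshot type is overridden per request; every other property is copied from the coordinator's configured checkpointProperties.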

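2. The preparation work in `startTriggeringCheckpoint()`

It breaks down into 7 steps:

Pre-checks and state initialization
Compute the checkpoint plan
Create the pending checkpoint
Set the storage location and asynchronously trigger the checkpoint on every OperatorCoordinator
Snapshot the master state
Merge the results of the asynchronous operations into masterTriggerCompletionPromise
Handle the outcome of those asynchronous operations
- Failure: call onTriggerFailure() to handle the exception
- Success: call triggerCheckpointRequest()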
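
The seven steps form a chain of CompletableFuture stages spread over two executors. The following is a minimal sketch of that chaining pattern using only the JDK, assuming placeholder work: `TriggerPipelineSketch`, the string payloads, and the `failCoordinators` switch are hypothetical and only stand in for the real plan, pending checkpoint, and coordinator calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the coordinator; strings replace the real plan and checkpoint types.
final class TriggerPipelineSketch {
    private static final AtomicLong ID_COUNTER = new AtomicLong(1);

    static String trigger(boolean failCoordinators) {
        ExecutorService io = Executors.newSingleThreadExecutor();    // plays the role of "executor"
        ExecutorService timer = Executors.newSingleThreadExecutor(); // plays the role of "timer"
        try {
            // Step 2: compute the plan (already available here).
            CompletableFuture<String> plan = CompletableFuture.completedFuture("plan");
            // Step 3: allocate an ID on the I/O executor, then create the pending checkpoint.
            CompletableFuture<String> pending =
                    plan.thenApplyAsync(p -> p + "#" + ID_COUNTER.getAndIncrement(), io)
                            .thenApplyAsync(info -> "pending(" + info + ")", timer);
            // Step 4: trigger the coordinators; this stage may fail.
            CompletableFuture<Void> coordinators =
                    pending.thenComposeAsync(
                            pc -> {
                                CompletableFuture<Void> result = new CompletableFuture<>();
                                if (failCoordinators) {
                                    result.completeExceptionally(
                                            new IllegalStateException("coordinator failed"));
                                } else {
                                    result.complete(null);
                                }
                                return result;
                            },
                            timer);
            // Step 5: the master state is snapshotted only after the coordinators are done.
            CompletableFuture<Void> masterState = coordinators.thenRunAsync(() -> {}, timer);
            // Step 6: join both branches into one promise.
            CompletableFuture<Void> promise = CompletableFuture.allOf(masterState, coordinators);
            // Step 7: branch on success or failure.
            return promise
                    .handleAsync(
                            (ignored, error) ->
                                    error == null ? "triggered " + pending.join() : "failed",
                            timer)
                    .join();
        } finally {
            io.shutdown();
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(trigger(false));
        System.out.println(trigger(true));
    }
}
```

A failure in any stage skips the stages that depend on it and surfaces in the final handler, which is why the real method can decide between onTriggerFailure() and triggerCheckpointRequest() in a single place.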

private void startTriggeringCheckpoint(CheckpointTriggerRequest request) {
    try {
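        // 1. Pre-checks and state initialization
        synchronized (lock) {
            // Verify that the global state allows a checkpoint to start, e.g. that the coordinator has not been shut down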
            preCheckGlobalState(request.isPeriodic);
        }

        // we will actually trigger this checkpoint!
        Preconditions.checkState(!isTriggering);
        isTriggering = true;

        final long timestamp = System.currentTimeMillis();
        // 2. Compute the checkpoint plan
        CompletableFuture<CheckpointPlan> checkpointPlanFuture =
                checkpointPlanCalculator.calculateCheckpointPlan();

        boolean initializeBaseLocations = !baseLocationsForCheckpointInitialized;
        baseLocationsForCheckpointInitialized = true;
        // masterTriggerCompletionPromise will eventually receive the combined result of the CompletableFutures below
        CompletableFuture<Void> masterTriggerCompletionPromise = new CompletableFuture<>();

        // 3. Create the pending checkpoint
        final CompletableFuture<PendingCheckpoint> pendingCheckpointCompletableFuture =
                checkpointPlanFuture
                        .thenApplyAsync(
                                plan -> {
                                    try {
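                                        // Generate a unique, monotonically increasing checkpoint ID, so that IDs stay unique in a distributed setup
                                        long checkpointID =
                                                checkpointIdCounter.getAndIncrement();
                                        // Wrap the plan and the checkpoint ID in a Tuple2 and pass it on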
                                        return new Tuple2<>(plan, checkpointID);
                                    } catch (Throwable e) {
                                        throw new CompletionException(e);
                                    }
                                },
                                executor)
                        .thenApplyAsync(
                                // checkpointInfo here is the Tuple2<plan, checkpointID> from above
                                (checkpointInfo) ->
                                        // Create the PendingCheckpoint (an in-flight checkpoint): this builds the object, caches it in pendingCheckpoints, and returns it
                                        createPendingCheckpoint(
                                                timestamp,
                                                request.props,
                                                checkpointInfo.f0,
                                                request.isPeriodic,
                                                checkpointInfo.f1,
                                                request.getOnCompletionFuture(),
                                                masterTriggerCompletionPromise),
                                timer);
        // 4. Set the storage location and asynchronously trigger the checkpoint on every OperatorCoordinator
        final CompletableFuture<?> coordinatorCheckpointsComplete =
                pendingCheckpointCompletableFuture
                        .thenApplyAsync(
                                pendingCheckpoint -> {
                                    try {
                                        // Initialize the checkpoint's storage location; this calls checkpointStorageView.initializeLocationForCheckpoint(checkpointID)
                                        CheckpointStorageLocation checkpointStorageLocation =
                                                initializeCheckpointLocation(
                                                        pendingCheckpoint.getCheckpointID(),
                                                        request.props,
                                                        request.externalSavepointLocation,
                                                        initializeBaseLocations);
                                        // Wrap the pendingCheckpoint and the storage location in a Tuple2 and pass it on
                                        return Tuple2.of(
                                                pendingCheckpoint, checkpointStorageLocation);
                                    } catch (Throwable e) {
                                        throw new CompletionException(e);
                                    }
                                },
                                executor)
                        .thenComposeAsync(
                                // checkpointInfo here is the Tuple2<pendingCheckpoint, storage location> from above
                                (checkpointInfo) -> {
                                    PendingCheckpoint pendingCheckpoint = checkpointInfo.f0;
                                    // If the in-flight checkpoint has already been disposed, skip the remaining work; otherwise continue
                                    if (pendingCheckpoint.isDisposed()) {
                                        // The disposed checkpoint will be handled later,
                                        // skip snapshotting the coordinator states.
                                        return null;
                                    }
                                    // Set the target location, i.e. where this checkpoint will be stored
                                    synchronized (lock) {
                                        pendingCheckpoint.setCheckpointTargetLocation(
                                                checkpointInfo.f1);
                                    }
                                    // Trigger the checkpoint on every OperatorCoordinator asynchronously on the timer executor; this returns a future
                                    return OperatorCoordinatorCheckpoints
                                            .triggerAndAcknowledgeAllCoordinatorCheckpointsWithCompletion(
                                                    coordinatorsToCheckpoint,
                                                    pendingCheckpoint,
                                                    timer);
                                },
                                timer);

        // We have to take the snapshot of the master hooks after the coordinator checkpoints
        // has completed.
        // This is to ensure the tasks are checkpointed after the OperatorCoordinators in case
        // ExternallyInducedSource is used.

        // 5. Snapshot the master state
        final CompletableFuture<?> masterStatesComplete =
                coordinatorCheckpointsComplete.thenComposeAsync(
                        ignored -> {
                            // If the code reaches here, the pending checkpoint is guaranteed to
                            // be not null.
                            // We use FutureUtils.getWithoutException() to make compiler happy
                            // with checked
                            // exceptions in the signature.
                            PendingCheckpoint checkpoint =
                                    FutureUtils.getWithoutException(
                                            pendingCheckpointCompletableFuture);
                            if (checkpoint == null || checkpoint.isDisposed()) {
                                // The disposed checkpoint will be handled later,
                                // skip snapshotting the master states.
                                return null;
                            }
                            // Snapshot the master state held on the JobManager side, i.e. the state of the master hooks mentioned above
                            return snapshotMasterState(checkpoint);
                        },
                        timer);
        // 6. Merge the results of the asynchronous operations into masterTriggerCompletionPromise
        FutureUtils.forward(
                // masterTriggerCompletionPromise completes only once both step 4 and step 5 have completed
                CompletableFuture.allOf(masterStatesComplete, coordinatorCheckpointsComplete),
                masterTriggerCompletionPromise);

        // 7. Handle the outcome of those asynchronous operations
        FutureUtils.assertNoException(
                masterTriggerCompletionPromise
                        .handleAsync(
                                (ignored, throwable) -> {
                                    final PendingCheckpoint checkpoint =
                                            FutureUtils.getWithoutException(
                                                    pendingCheckpointCompletableFuture);

                                    Preconditions.checkState(
                                            checkpoint != null || throwable != null,
                                            "Either the pending checkpoint needs to be created or an error must have occurred.");
                                    // throwable == null means success; non-null means failure
                                    if (throwable != null) {
                                        // the initialization might not be finished yet
                                        if (checkpoint == null) {
                                            onTriggerFailure(request, throwable);
                                        } else {
                                            onTriggerFailure(checkpoint, throwable);
                                        }
                                    } else {
                                        // Success: triggerCheckpointRequest() sends the checkpoint request to the tasks on the TaskManagers, which starts the actual state snapshotting
                                        triggerCheckpointRequest(
                                                request, timestamp, checkpoint);
                                    }
                                    return null;
                                },
                                timer)
                        .exceptionally( // exceptions thrown by the handler above end up here
                                error -> {
                                    if (!isShutdown()) {
                                        throw new CompletionException(error);
                                    } else if (findThrowable(
                                                    error, RejectedExecutionException.class)
                                            .isPresent()) {
                                        LOG.debug("Execution rejected during shutdown");
                                    } else {
                                        LOG.warn("Error encountered during shutdown", error);
                                    }
                                    return null;
                                }));
    } catch (Throwable throwable) {
        onTriggerFailure(request, throwable);
    }
}

3. Next, `triggerCheckpointRequest()`

private void triggerCheckpointRequest(
        CheckpointTriggerRequest request, long timestamp, PendingCheckpoint checkpoint) {
    // Check once more whether the checkpoint has been disposed
    if (checkpoint.isDisposed()) {
        onTriggerFailure(
                checkpoint,
                new CheckpointException(
                        CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE,
                        checkpoint.getFailureCause()));
    } else {
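        // Call triggerTasks() to send the trigger to the tasks
        triggerTasks(request, timestamp, checkpoint)
                .exceptionally( // exception handling, omitted here
                        failure -> {
                            // ...
                            return null;
                        });

        // The tasks may already have acknowledged by now, so check whether the checkpoint can be completed
        if (maybeCompleteCheckpoint(checkpoint)) {
            // Reset isTriggering to false and run the next queued trigger request, if any
            onTriggerSuccess();
        }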
    }
}

// Called when the current checkpoint was triggered successfully: reset isTriggering to false and run the next queued trigger request, if any
private void onTriggerSuccess() {
    isTriggering = false;
    executeQueuedRequest();
}
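
The isTriggering flag together with executeQueuedRequest() means that only one trigger is prepared at a time and later requests wait their turn. The following is a minimal sketch of that idea, assuming a plain FIFO queue: `TriggerQueueSketch` and its methods are hypothetical, and the real request decider applies further rules that are not modelled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: at most one trigger in flight, further requests are queued.
final class TriggerQueueSketch {
    private final Queue<String> queued = new ArrayDeque<>();
    private final List<String> started = new ArrayList<>();
    private boolean isTriggering = false;

    /** Starts the request right away if idle, otherwise parks it in the queue. */
    void submit(String request) {
        if (isTriggering) {
            queued.add(request);
        } else {
            start(request);
        }
    }

    /** Counterpart of onTriggerSuccess(): clear the flag, then run the next queued request. */
    void onTriggerSuccess() {
        isTriggering = false;
        String next = queued.poll();
        if (next != null) {
            start(next);
        }
    }

    private void start(String request) {
        isTriggering = true;
        started.add(request);
    }

    List<String> started() {
        return started;
    }

    public static void main(String[] args) {
        TriggerQueueSketch coordinator = new TriggerQueueSketch();
        coordinator.submit("checkpoint-1");
        coordinator.submit("checkpoint-2"); // queued, because checkpoint-1 is still triggering
        System.out.println(coordinator.started());
        coordinator.onTriggerSuccess();
        System.out.println(coordinator.started());
    }
}
```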

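4. `triggerTasks()`

It has 5 steps:

Get the current checkpoint ID
Determine the checkpoint type (full or incremental); this ties back to triggerCheckpoint(CheckpointType checkpointType) above
Build the checkpoint options
Send the trigger to every task
Merge the results from all tasks
- All succeed: the combined future completes normally
- Any one fails: the combined future completes exceptionally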
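
The merge step can be approximated with the JDK alone. The following is a minimal sketch, assuming `CompletableFuture.allOf` as a stand-in for the helper used in the source: `WaitForAllSketch` and `failedAck` are hypothetical names, and `allOf` reports a failure only once every input has finished, which may differ in timing from Flink's own helper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of combining per-task acknowledgements into one future.
final class WaitForAllSketch {
    /** Completes normally when every ack succeeded; completes exceptionally if any ack failed. */
    static CompletableFuture<Void> waitForAll(List<CompletableFuture<String>> acks) {
        return CompletableFuture.allOf(acks.toArray(new CompletableFuture<?>[0]));
    }

    /** Builds an acknowledgement that has already failed, e.g. because the task is gone. */
    static CompletableFuture<String> failedAck(String reason) {
        CompletableFuture<String> ack = new CompletableFuture<>();
        ack.completeExceptionally(new IllegalStateException(reason));
        return ack;
    }

    public static void main(String[] args) {
        CompletableFuture<String> ok = CompletableFuture.completedFuture("ACK");
        System.out.println(
                waitForAll(Arrays.asList(ok, ok)).isCompletedExceptionally());
        System.out.println(
                waitForAll(Arrays.asList(ok, failedAck("task lost"))).isCompletedExceptionally());
    }
}
```

Keep in mind that isDone() is also true for a future that completed exceptionally, so the two outcomes are told apart by the exception rather than by isDone().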

private CompletableFuture<Void> triggerTasks(
        CheckpointTriggerRequest request, long timestamp, PendingCheckpoint checkpoint) {
    // 1. Get the current checkpoint ID
    final long checkpointId = checkpoint.getCheckpointID();

    // 2. Determine the checkpoint type (full or incremental)
    final SnapshotType type;
    if (this.forceFullSnapshot && !request.props.isSavepoint()) { // taken only when full snapshots are forced and this is not a savepoint
        type = FULL_CHECKPOINT;
    } else {
        type = request.props.getCheckpointType(); // for checkpoints, props were built in triggerCheckpoint(CheckpointType checkpointType) above
    }

    // 3. Build the checkpoint options
    final CheckpointOptions checkpointOptions =
            CheckpointOptions.forConfig(
                    type, // checkpoint type: incremental or full
                    checkpoint.getCheckpointStorageLocation().getLocationReference(), // reference to the storage location
                    isExactlyOnceMode, // whether exactly-once mode is enabled
                    unalignedCheckpointsEnabled, // whether unaligned barriers are allowed
                    alignedCheckpointTimeout); // timeout for barrier alignment

    // 4. Send the trigger to every task
    List<CompletableFuture<Acknowledge>> acks = new ArrayList<>();
    for (Execution execution : checkpoint.getCheckpointPlan().getTasksToTrigger()) {
        if (request.props.isSynchronous()) { // synchronous savepoint
            acks.add(
                    execution.triggerSynchronousSavepoint(
                            checkpointId, timestamp, checkpointOptions));
        } else { // regular (asynchronous) case
            acks.add(execution.triggerCheckpoint(checkpointId, timestamp, checkpointOptions));
        }
    }
    /* 5. Merge the results from all tasks
        If every task acknowledges, the combined future completes normally
        If any task fails (e.g. it crashed or the call timed out), the combined future completes exceptionally
     */
    return FutureUtils.waitForAll(acks);
}

(1) The `execution.triggerCheckpoint()` it calls

public CompletableFuture<Acknowledge> triggerCheckpoint(
        long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
    // Delegates to triggerCheckpointHelper()
    return triggerCheckpointHelper(checkpointId, timestamp, checkpointOptions);
}

(2) Which in turn calls `triggerCheckpointHelper()`

private CompletableFuture<Acknowledge> triggerCheckpointHelper(
        long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
    // 1. Get the assigned resource (slot)
    final LogicalSlot slot = assignedResource;
    // 2. Check the slot and get the TaskManager gateway
    if (slot != null) {
        final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
        // 3. Send the trigger request, i.e. ask the target TaskManager to run triggerCheckpoint for this execution
        return taskManagerGateway.triggerCheckpoint(
                attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions);
    }
    LOG.debug(
            "The execution has no slot assigned. This indicates that the execution is no longer running.");
    return CompletableFuture.completedFuture(Acknowledge.get());
}

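So at this point the call is handed to the target TaskManager's gateway, which performs triggerCheckpoint. Next, let's look at TaskManagerGateway.

II. TaskManagerGateway

1. TaskManagerGateway is an interface

Its implementation classes are shown in the figure:

The interface code is as follows: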

public interface TaskManagerGateway extends TaskExecutorOperatorEventGateway {

    String getAddress();

    CompletableFuture<Acknowledge> submitTask(TaskDeploymentDescriptor tdd, Duration timeout);

    CompletableFuture<Acknowledge> cancelTask(
            ExecutionAttemptID executionAttemptID, Duration timeout);

    CompletableFuture<Acknowledge> updatePartitions(
            ExecutionAttemptID executionAttemptID,
            Iterable<PartitionInfo> partitionInfos,
            Duration timeout);

    void releasePartitions(JobID jobId, Set<ResultPartitionID> partitionIds);

    void notifyCheckpointOnComplete(
            ExecutionAttemptID executionAttemptID,
            JobID jobId,
            long completedCheckpointId,
            long completedTimestamp,
            long lastSubsumedCheckpointId);

    void notifyCheckpointAborted(
            ExecutionAttemptID executionAttemptID,
            JobID jobId,
            long checkpointId,
            long latestCompletedCheckpointId,
            long timestamp);

    CompletableFuture<Acknowledge> triggerCheckpoint(
            ExecutionAttemptID executionAttemptID,
            JobID jobId,
            long checkpointId,
            long timestamp,
            CheckpointOptions checkpointOptions);

    CompletableFuture<Acknowledge> freeSlot(
            final AllocationID allocationId,
            final Throwable cause,
            @RpcTimeout final Duration timeout);

    @Override
    CompletableFuture<Acknowledge> sendOperatorEventToTask(
            ExecutionAttemptID task, OperatorID operator, SerializedValue<OperatorEvent> evt);
}

2. The implementation class `RpcTaskManagerGateway`

It calls the triggerCheckpoint method of the TaskExecutorGateway implementation.

public class RpcTaskManagerGateway implements TaskManagerGateway {

    private final TaskExecutorGateway taskExecutorGateway;

    private final JobMasterId jobMasterId;
    @Override
    public CompletableFuture<Acknowledge> triggerCheckpoint(
            ExecutionAttemptID executionAttemptID,
            JobID jobId,
            long checkpointId,
            long timestamp,
            CheckpointOptions checkpointOptions) {
        // Delegates to triggerCheckpoint() on the TaskExecutorGateway implementation
        return taskExecutorGateway.triggerCheckpoint(
                executionAttemptID, checkpointId, timestamp, checkpointOptions);
    }
    ...
}

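(1) A quick look at the `TaskExecutorGateway` interface

Its implementation classes are shown in the figure:

(2) `triggerCheckpoint()` as implemented by `TaskExecutor`

It boils down to 3 steps:

Look up the Task instance by its execution attempt ID
Call the Task's triggerCheckpointBarrier method (the core logic)
Return a success acknowledgement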
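
These three steps amount to a lookup followed by an immediate acknowledgement. The following is a minimal sketch of that shape, assuming a plain map in place of the task slot table: `TaskLookupSketch`, `CheckpointTarget`, and the string IDs are hypothetical names for illustration only.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "look up the task, trigger it, acknowledge".
final class TaskLookupSketch {
    interface CheckpointTarget {
        void triggerCheckpointBarrier(long checkpointId);
    }

    private final Map<String, CheckpointTarget> tasks = new ConcurrentHashMap<>();

    void register(String attemptId, CheckpointTarget task) {
        tasks.put(attemptId, task);
    }

    CompletableFuture<String> triggerCheckpoint(String attemptId, long checkpointId) {
        // Step 1: look up the task by its attempt ID.
        CheckpointTarget task = tasks.get(attemptId);
        if (task == null) {
            // Unknown task: answer with an exceptionally completed future.
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalStateException("unknown task " + attemptId));
            return failed;
        }
        // Step 2: hand the trigger to the task.
        task.triggerCheckpointBarrier(checkpointId);
        // Step 3: acknowledge that the trigger was delivered.
        return CompletableFuture.completedFuture("ACK");
    }

    public static void main(String[] args) {
        TaskLookupSketch executor = new TaskLookupSketch();
        executor.register("attempt-1", id -> System.out.println("barrier triggered for " + id));
        System.out.println(executor.triggerCheckpoint("attempt-1", 7L).join());
        System.out.println(executor.triggerCheckpoint("attempt-2", 7L).isCompletedExceptionally());
    }
}
```

The acknowledgement only confirms that the trigger reached the task. Whether the snapshot itself succeeds is reported separately later on, for example through declineCheckpoint() in the failure case.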

@Override
public CompletableFuture<Acknowledge> triggerCheckpoint(
        ExecutionAttemptID executionAttemptID,
        long checkpointId,
        long checkpointTimestamp,
        CheckpointOptions checkpointOptions) {
    // 1. Look up the Task instance by its execution attempt ID
    final Task task = taskSlotTable.getTask(executionAttemptID);
    if (task != null) {
        try (MdcCloseable ignored =
                MdcUtils.withContext(MdcUtils.asContextData(task.getJobID()))) {
            log.debug(
                    "Trigger checkpoint {}@{} for {}.",
                    checkpointId,
                    checkpointTimestamp,
                    executionAttemptID);
            // 2. Call the Task's triggerCheckpointBarrier method (the core logic)
            task.triggerCheckpointBarrier(checkpointId, checkpointTimestamp, checkpointOptions);
            // 3. Return a success acknowledgement
            return CompletableFuture.completedFuture(Acknowledge.get());
        }
    } else {
        final String message =
                "TaskManager received a checkpoint request for unknown task "
                        + executionAttemptID
                        + '.';

        log.debug(message);
        return FutureUtils.completedExceptionally(
                new CheckpointException(
                        message, CheckpointFailureReason.TASK_CHECKPOINT_FAILURE));
    }
}

(3) The `Task.triggerCheckpointBarrier()` it calls

public void triggerCheckpointBarrier(
        final long checkpointID,
        final long checkpointTimestamp,
        final CheckpointOptions checkpointOptions) {
    // 1. Get the object that actually runs the task (a TaskInvokable implementation)
    final TaskInvokable invokable = this.invokable;
    // 2. Wrap the checkpoint metadata
    final CheckpointMetaData checkpointMetaData =
            new CheckpointMetaData(
                    checkpointID, checkpointTimestamp, System.currentTimeMillis());

    if (executionState == ExecutionState.RUNNING) {
        // 3. Verify that the invokable supports checkpoints, i.e. implements CheckpointableTask
        checkState(invokable instanceof CheckpointableTask, "invokable is not checkpointable");
        try {
            // 4. Call the invokable's triggerCheckpointAsync() to trigger the checkpoint asynchronously
            ((CheckpointableTask) invokable)
                    .triggerCheckpointAsync(checkpointMetaData, checkpointOptions)
                    .handle( // 5. Handle the result
                            (triggerResult, exception) -> {
                                if (exception != null || !triggerResult) {
                                    // 5.1 On failure, notify the JobMaster that the checkpoint was declined
                                    declineCheckpoint(
                                            checkpointID,
                                            CheckpointFailureReason.TASK_FAILURE,
                                            exception);
                                    return false;
                                }
                                return true;
                            });
            // Exception handling follows; omitted here
        } // ...
}

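So the actual work is done by triggerCheckpointAsync() on the invokable, a TaskInvokable implementation that also implements CheckpointableTask.

III. TaskInvokable

1. TaskInvokable is an interface

Its implementation classes are shown in the figure below:

2. The implementation class `StreamTask`

This is nested delegation again: triggerCheckpointAsync() -> triggerCheckpointAsyncInMailbox() -> performCheckpoint() -> subtaskCheckpointCoordinator.checkpointState()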

// Step 1: triggerCheckpointAsync()
@Override
public CompletableFuture<Boolean> triggerCheckpointAsync(
        CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions) {
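    // 1. Check support for forced full snapshots; throws if the current state backend does not support them
    checkForcedFullSnapshotSupport(checkpointOptions);
    /* 2. Choose the mail priority
    *   UNALIGNED: unaligned checkpoint, submitted as urgent mail
    *   otherwise: aligned checkpoint, submitted with default options
    * */
    MailboxExecutor.MailOptions mailOptions =
            CheckpointOptions.AlignmentType.UNALIGNED == checkpointOptions.getAlignment()
                    ? MailboxExecutor.MailOptions.urgent()
                    : MailboxExecutor.MailOptions.options();
    // 3. Run the checkpoint asynchronously in the mailbox; which method is called depends on whether the inputs have finished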
    CompletableFuture<Boolean> result = new CompletableFuture<>();
    mainMailboxExecutor.execute(
            mailOptions,
            () -> {
                try {
                    boolean noUnfinishedInputGates =
                            Arrays.stream(getEnvironment().getAllInputGates())
                                    .allMatch(InputGate::isFinished);
                    // 3.1 All input gates are finished (vacuously true when there are none): call triggerCheckpointAsyncInMailbox()
                    if (noUnfinishedInputGates) {
                        result.complete(
                                triggerCheckpointAsyncInMailbox(
                                        checkpointMetaData, checkpointOptions));
                    } else { // 3.2 Some input gates are unfinished: call triggerUnfinishedChannelsCheckpoint()
                        result.complete(
                                triggerUnfinishedChannelsCheckpoint(
                                        checkpointMetaData, checkpointOptions));
                    }
                } catch (Exception ex) {
                    // Report the failure both via the Future result but also to the mailbox
                    result.completeExceptionally(ex);
                    throw ex;
                }
            },
            "checkpoint %s with %s",
            checkpointMetaData,
            checkpointOptions);
    return result;
}

// Step 2: the triggerCheckpointAsyncInMailbox() it calls
private boolean triggerCheckpointAsyncInMailbox(
        CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions)
        throws Exception {
    FlinkSecurityManager.monitorUserSystemExitForCurrentThread();
    try {
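        // Checkpoint start delay: the time from the JobManager triggering the checkpoint to the task actually starting it (milliseconds converted to nanoseconds)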
        latestAsyncCheckpointStartDelayNanos =
                1_000_000
                        * Math.max(
                                0,
                                System.currentTimeMillis() - checkpointMetaData.getTimestamp());

        // Initialize the checkpoint metrics builder
        CheckpointMetricsBuilder checkpointMetrics =
                new CheckpointMetricsBuilder()
                        .setAlignmentDurationNanos(0L)
                        .setBytesProcessedDuringAlignment(0L)
                        .setCheckpointStartDelayNanos(latestAsyncCheckpointStartDelayNanos);
        // Initialize the inputs for this checkpoint
        subtaskCheckpointCoordinator.initInputsCheckpoint(
                checkpointMetaData.getCheckpointId(), checkpointOptions);
        // This is where the checkpoint is actually performed
        boolean success =
                performCheckpoint(checkpointMetaData, checkpointOptions, checkpointMetrics);
        if (!success) {
            declineCheckpoint(checkpointMetaData.getCheckpointId());
        }
        return success;
        // Exception handling follows; omitted here
    } // ...
}

// Step 3: the performCheckpoint() it calls
private boolean performCheckpoint(
        CheckpointMetaData checkpointMetaData,
        CheckpointOptions checkpointOptions,
        CheckpointMetricsBuilder checkpointMetrics)
        throws Exception {

    final SnapshotType checkpointType = checkpointOptions.getCheckpointType();
    LOG.debug(
            "Starting checkpoint {} {} on task {}",
            checkpointMetaData.getCheckpointId(),
            checkpointType,
            getName());

    if (isRunning) { // the task is running
        actionExecutor.runThrowing(
                () -> {
                    // Handle synchronous savepoints
                    if (isSynchronous(checkpointType)) {
                        setSynchronousSavepoint(checkpointMetaData.getCheckpointId());
                    }
                    // Record the first checkpoint taken after end of data was received
                    if (areCheckpointsWithFinishedTasksEnabled()
                            && endOfDataReceived
                            && this.finalCheckpointMinId == null) {
                        this.finalCheckpointMinId = checkpointMetaData.getCheckpointId();
                    }

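                    // This method handles the barrier mechanism and the state snapshot; subtaskCheckpointCoordinator is an implementation of the SubtaskCheckpointCoordinator interface
                    subtaskCheckpointCoordinator.checkpointState(
                            checkpointMetaData,        // checkpoint metadata (ID, timestamp)
                            checkpointOptions,         // checkpoint options (snapshot type, alignment, target location)
                            checkpointMetrics,         // checkpoint metrics builder
                            operatorChain,             // operator chain (holds the state of all chained operators)
                            finishedOperators,         // flag passed as isTaskFinished: whether the operators have finished
                            this::isRunning);          // supplier that reports whether the task is still running
                });

        return true;
    } else { // the task is not running: broadcast a CancelCheckpointMarker downstream instead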
        actionExecutor.runThrowing(
                () -> {
                    final CancelCheckpointMarker message =
                            new CancelCheckpointMarker(checkpointMetaData.getCheckpointId());
                    recordWriter.broadcastEvent(message);
                });

        return false;
    }
}
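
The difference between urgent and default mail options in triggerCheckpointAsync() can be pictured as front-of-queue versus back-of-queue insertion. The following is a minimal sketch under that assumption: `MailboxSketch` is a hypothetical single-threaded mailbox, not Flink's mailbox implementation, and the real prioritisation rules may be more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: urgent mail jumps the queue, default mail waits its turn.
final class MailboxSketch {
    private final Deque<Runnable> mails = new ArrayDeque<>();

    /** Enqueues a mail; urgent mail goes to the front, everything else to the back. */
    void execute(boolean urgent, Runnable mail) {
        if (urgent) {
            mails.addFirst(mail);
        } else {
            mails.addLast(mail);
        }
    }

    /** Drains the mailbox on the calling thread, the way a single task thread would. */
    void runAll() {
        Runnable mail;
        while ((mail = mails.pollFirst()) != null) {
            mail.run();
        }
    }

    public static void main(String[] args) {
        MailboxSketch mailbox = new MailboxSketch();
        List<String> order = new ArrayList<>();
        mailbox.execute(false, () -> order.add("process records"));
        mailbox.execute(false, () -> order.add("aligned checkpoint 1"));
        mailbox.execute(true, () -> order.add("unaligned checkpoint 2"));
        mailbox.runAll();
        System.out.println(order);
    }
}
```

Because all mails run on the task thread, the checkpoint never races with record processing; priority only decides how soon it gets its turn.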

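IV. Finally, SubtaskCheckpointCoordinator

1. SubtaskCheckpointCoordinator is an interface

Its implementation classes are shown in the figure:

2. The implementation class `SubtaskCheckpointCoordinatorImpl`

(1) The core method `checkpointState()`

It consists of the following steps:

Handle a checkpoint that has already been aborted (timeout or failure): notify downstream operators via operatorChain.broadcastEvent(new CancelCheckpointMarker()) and abort the channel state write via channelStateWriter.abort()
Prepare the checkpoint and do the pre-barrier work by calling operatorChain.prepareSnapshotPreBarrier()
Broadcast the barrier by calling operatorChain.broadcastEvent()
Register the timeout timer for barrier alignment by calling registerAlignmentTimer()
Finalize the output channel state by calling channelStateWriter.finishOutput(); the output data was already written while the barrier was being broadcast
Take the state snapshot: call takeSnapshotSync(), then finishAndReportAsync() to report the result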
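
Before any of these steps run, checkpointState() filters out stale and already-aborted checkpoint IDs. The following is a minimal sketch of that guard, assuming the aborted IDs are kept in a simple set: `CheckpointGuardSketch`, `Decision`, and `notifyAborted` are hypothetical names, and the bookkeeping in the real class may differ.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the ordering check and the aborted-status check.
final class CheckpointGuardSketch {
    enum Decision { SKIP_OUT_OF_ORDER, CANCEL_ABORTED, PROCEED }

    private long lastCheckpointId = -1;
    private final Set<Long> abortedCheckpointIds = new HashSet<>();

    /** Records that the given checkpoint was aborted before this subtask started it. */
    void notifyAborted(long checkpointId) {
        abortedCheckpointIds.add(checkpointId);
    }

    Decision onTrigger(long checkpointId) {
        // IDs must strictly increase; duplicates and late arrivals are dropped.
        if (lastCheckpointId >= checkpointId) {
            abortedCheckpointIds.remove(checkpointId);
            return Decision.SKIP_OUT_OF_ORDER;
        }
        lastCheckpointId = checkpointId;
        // An abort notification that arrived first turns the trigger into a cancel marker.
        if (abortedCheckpointIds.remove(checkpointId)) {
            return Decision.CANCEL_ABORTED;
        }
        return Decision.PROCEED;
    }

    public static void main(String[] args) {
        CheckpointGuardSketch guard = new CheckpointGuardSketch();
        System.out.println(guard.onTrigger(1));
        System.out.println(guard.onTrigger(1));
        guard.notifyAborted(2);
        System.out.println(guard.onTrigger(2));
        System.out.println(guard.onTrigger(3));
    }
}
```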

@Override
public void checkpointState(
        CheckpointMetaData metadata,
        CheckpointOptions options,
        CheckpointMetricsBuilder metrics,
        OperatorChain<?, ?> operatorChain,
        boolean isTaskFinished,
        Supplier<Boolean> isRunning)
        throws Exception {

    checkNotNull(options);
    checkNotNull(metrics);

    // Preliminary check: checkpoints must be processed in increasing ID order, so duplicate or outdated checkpoint requests are dropped
    if (lastCheckpointId >= metadata.getCheckpointId()) {
        LOG.info(
                "Out of order checkpoint barrier (aborted previously?): {} >= {}",
                lastCheckpointId,
                metadata.getCheckpointId());
        channelStateWriter.abort(metadata.getCheckpointId(), new CancellationException(), true);
        checkAndClearAbortedStatus(metadata.getCheckpointId());
        return;
    }

    logCheckpointProcessingDelay(metadata);

    // Step (0): Record the last triggered checkpointId and abort the sync phase of checkpoint
    // if necessary.
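    // 0. Handle a checkpoint that has already been aborted (timeout or failure)
    lastCheckpointId = metadata.getCheckpointId();
    if (checkAndClearAbortedStatus(metadata.getCheckpointId())) {
        // Broadcast a CancelCheckpointMarker to notify downstream operators, then abort the channel state write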
        operatorChain.broadcastEvent(new CancelCheckpointMarker(metadata.getCheckpointId()));
        channelStateWriter.abort(
                metadata.getCheckpointId(),
                new CancellationException("checkpoint aborted via notification"),
                true);
        LOG.info(
                "Checkpoint {} has been notified as aborted, would not trigger any checkpoint.",
                metadata.getCheckpointId());
        return;
    }
    // Prepare the checkpoint environment
    if (fileMergingSnapshotManager != null) {
        // notify file merging snapshot manager for managed dir lifecycle management
        fileMergingSnapshotManager.notifyCheckpointStart(
                FileMergingSnapshotManager.SubtaskKey.of(env), metadata.getCheckpointId());
    }

    if (options.getAlignment() == CheckpointOptions.AlignmentType.FORCED_ALIGNED) {
        options = options.withUnalignedSupported();
        initInputsCheckpoint(metadata.getCheckpointId(), options);
    }

    // Step (1): Prepare the checkpoint, allow operators to do some pre-barrier work.
    //           The pre-barrier work should be nothing or minimal in the common case.
    // 1. Prepare the checkpoint and do the pre-barrier work
    operatorChain.prepareSnapshotPreBarrier(metadata.getCheckpointId());

    // Step (2): Send the checkpoint barrier downstream
    /* 2. Broadcast the checkpoint barrier
    * A CheckpointBarrier simply wraps the checkpoint ID, the timestamp, and the CheckpointOptions. Its fields are:
    * private final long id;
    * private final long timestamp;
    * private final CheckpointOptions checkpointOptions;
    * */
    LOG.debug(
            "Task {} broadcastEvent at {}, triggerTime {}, passed time {}",
            taskName,
            System.currentTimeMillis(),
            metadata.getTimestamp(),
            System.currentTimeMillis() - metadata.getTimestamp());
    CheckpointBarrier checkpointBarrier =
            new CheckpointBarrier(metadata.getCheckpointId(), metadata.getTimestamp(), options);
    operatorChain.broadcastEvent(checkpointBarrier, options.isUnalignedCheckpoint());

    // Step (3): Register alignment timer to timeout aligned barrier to unaligned barrier
    // 3. Register the timeout timer for barrier alignment
    registerAlignmentTimer(metadata.getCheckpointId(), operatorChain, checkpointBarrier);

    // Step (4): Prepare to spill the in-flight buffers for input and output
    // 4. Finalize the channel state: mark this checkpoint's output as finished in the ChannelStateWriter
    if (options.needsChannelState()) {
        // output data already written while broadcasting event
        channelStateWriter.finishOutput(metadata.getCheckpointId());
    }

    // Step (5): Take the state snapshot. This should be largely asynchronous, to not impact
    // progress of the
    // streaming topology
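    /* 5. Take the state snapshot.
     *  takeSnapshotSync(): runs the synchronous part of the operator chain's snapshot and collects the asynchronous results.
     *  finishAndReportAsync(): finishes the snapshot asynchronously and then notifies the JobManager.
     *  cleanup(): handles the failure path, releasing resources and recording the error.
     */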
    Map<OperatorID, OperatorSnapshotFutures> snapshotFutures =
            CollectionUtil.newHashMapWithExpectedSize(operatorChain.getNumberOfOperators());
    try {
        if (takeSnapshotSync(
                snapshotFutures, metadata, metrics, options, operatorChain, isRunning)) {
            finishAndReportAsync(
                    snapshotFutures,
                    metadata,
                    metrics,
                    operatorChain.isTaskDeployedAsFinished(),
                    isTaskFinished,
                    isRunning);
        } else {
            cleanup(snapshotFutures, metadata, metrics, new Exception("Checkpoint declined"));
        }
    } catch (Exception ex) {
        cleanup(snapshotFutures, metadata, metrics, ex);
        throw ex;
    }
}
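
The ordering of the five steps, including the two paths that end in cleanup(), can be sketched with a toy model. This is only an illustration of the control flow shown above: the class `CheckpointStepsSketch`, its `run` method, and the step labels are hypothetical and are not Flink APIs.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/** Toy model of the step ordering only. None of these names are Flink APIs. */
final class CheckpointStepsSketch {

    /**
     * Appends the steps to trace in the order the walkthrough describes.
     * syncSnapshot stands in for takeSnapshotSync(): true means the sync part succeeded.
     */
    static void run(BooleanSupplier syncSnapshot, boolean needsChannelState, List<String> trace) {
        trace.add("prepareSnapshotPreBarrier"); // step 1
        trace.add("broadcastBarrier");          // step 2
        trace.add("registerAlignmentTimer");    // step 3
        if (needsChannelState) {
            trace.add("finishOutput");          // step 4, only when channel state is needed
        }
        try {                                   // step 5
            if (syncSnapshot.getAsBoolean()) {
                trace.add("finishAndReportAsync");
            } else {
                trace.add("cleanup");           // snapshot declined
            }
        } catch (RuntimeException ex) {
            trace.add("cleanup");               // snapshot failed: clean up, then rethrow
            throw ex;
        }
    }
}
```

Note that the barrier is broadcast before any state is snapshotted, so downstream tasks can start their own checkpoint work while this task is still in step 5.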

(2) The `takeSnapshotSync()` call

At its core, this calls the operatorChain's snapshotState() to take the state snapshot and store it at the location that storage resolves to.

private boolean takeSnapshotSync(
        Map<OperatorID, OperatorSnapshotFutures> operatorSnapshotsInProgress,
        CheckpointMetaData checkpointMetaData,
        CheckpointMetricsBuilder checkpointMetrics,
        CheckpointOptions checkpointOptions,
        OperatorChain<?, ?> operatorChain,
        Supplier<Boolean> isRunning)
        throws Exception {

    checkState(
            !operatorChain.isClosed(),
            "OperatorChain and Task should never be closed at this point");

    long checkpointId = checkpointMetaData.getCheckpointId();
    long started = System.nanoTime();
    // Channel state: the input/output buffer data recorded at checkpoint time, used for failure recovery
    ChannelStateWriteResult channelStateWriteResult =
            checkpointOptions.needsChannelState()
                    ? channelStateWriter.getAndRemoveWriteResult(checkpointId)
                    : ChannelStateWriteResult.EMPTY;
    // Resolve the storage location
    CheckpointStreamFactory storage =
            checkpointStorage.resolveCheckpointStorageLocation(
                    checkpointId, checkpointOptions.getTargetLocation());
    storage = applyFileMergingCheckpoint(storage, checkpointOptions);

    try {
        // Call the operatorChain's snapshotState() to take the state snapshot and store it at the location given by storage
        operatorChain.snapshotState(
                operatorSnapshotsInProgress,
                checkpointMetaData,
                checkpointOptions,
                isRunning,
                channelStateWriteResult,
                storage);

    } finally {
        // Clear the cache kept for this checkpoint
        checkpointStorage.clearCacheFor(checkpointId);
    }
    // Record how long the synchronous phase took
    checkpointMetrics.setSyncDurationMillis((System.nanoTime() - started) / 1_000_000);

    LOG.debug(
            "{} - finished synchronous part of checkpoint {}. Alignment duration: {} ms, snapshot duration {} ms, is unaligned checkpoint : {}",
            taskName,
            checkpointId,
            checkpointMetrics.getAlignmentDurationNanosOrDefault() / 1_000_000,
            checkpointMetrics.getSyncDurationMillis(),
            checkpointOptions.isUnalignedCheckpoint());

    return true;
}
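
The timing and cleanup pattern in this method can be isolated in a small sketch. It is a simplified stand-in, not Flink code: `SyncPhaseSketch` and `runTimed` are hypothetical names, and the two `Runnable` arguments stand in for operatorChain.snapshotState() and checkpointStorage.clearCacheFor().

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not a Flink class: times a synchronous phase and always cleans up. */
final class SyncPhaseSketch {

    /**
     * Runs snapshot, then clearCache in a finally block, and returns the elapsed
     * time in milliseconds. If snapshot throws, clearCache still runs and the
     * exception propagates, so no duration is produced for the failed attempt.
     */
    static long runTimed(Runnable snapshot, Runnable clearCache) {
        long started = System.nanoTime(); // monotonic clock, suitable for measuring durations
        try {
            snapshot.run();
        } finally {
            clearCache.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - started);
    }
}
```

As in the original method, the duration is only recorded on the success path, while the cache cleanup happens on both paths.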

(3) The `finishAndReportAsync()` call

private void finishAndReportAsync(
        Map<OperatorID, OperatorSnapshotFutures> snapshotFutures,
        CheckpointMetaData metadata,
        CheckpointMetricsBuilder metrics,
        boolean isTaskDeployedAsFinished,
        boolean isTaskFinished,
        Supplier<Boolean> isRunning)
        throws IOException {
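    // Create the asynchronous task
    AsyncCheckpointRunnable asyncCheckpointRunnable =
            new AsyncCheckpointRunnable(
                    snapshotFutures,          // asynchronous snapshot results of each operator
                    metadata,                 // checkpoint metadata (ID, timestamp)
                    metrics,                  // checkpoint metrics collector
                    System.nanoTime(),        // start time of the asynchronous phase
                    taskName,                 // task name
                    unregisterConsumer(),     // resource cleanup callback
                    env,                      // task environment (configuration, state backend)
                    asyncExceptionHandler,    // handler for asynchronous exceptions
                    isTaskDeployedAsFinished, // whether the task was deployed in the "finished" state
                    isTaskFinished,           // whether the task has finished processing data
                    isRunning);               // supplier that tells whether the task is still running
    // Register the asynchronous task
    registerAsyncCheckpointRunnable(
            asyncCheckpointRunnable.getCheckpointId(), asyncCheckpointRunnable);

    // Submit the asynchronous task to the thread pool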
    asyncOperationsThreadPool.execute(asyncCheckpointRunnable);
}
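
The register-then-submit pattern above can be sketched in isolation. This is a minimal model under stated assumptions, not the Flink implementation: `AsyncCheckpointRegistrySketch` and its methods are hypothetical, the map stands in for whatever registerAsyncCheckpointRunnable() keeps internally, and the removal in the finally block plays the role of the unregisterConsumer() callback.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

/** Hypothetical model of "register the async part, then hand it to a pool". */
final class AsyncCheckpointRegistrySketch {
    private final Map<Long, Runnable> inFlight = new ConcurrentHashMap<>();
    private final Executor asyncPool;

    AsyncCheckpointRegistrySketch(Executor asyncPool) {
        this.asyncPool = asyncPool;
    }

    /** Registers the async part under its checkpoint id, then submits it. */
    void finishAndReportAsync(long checkpointId, Runnable asyncPart, Runnable reportToCoordinator) {
        Runnable task = () -> {
            try {
                asyncPart.run();           // finish the snapshot off the task thread
                reportToCoordinator.run(); // then report the result
            } finally {
                inFlight.remove(checkpointId); // unregister, whether it succeeded or not
            }
        };
        inFlight.put(checkpointId, task); // register first so it can be looked up by id
        asyncPool.execute(task);
    }

    boolean isInFlight(long checkpointId) {
        return inFlight.containsKey(checkpointId);
    }
}
```

Passing `Runnable::run` as the executor runs everything on the calling thread, which is convenient for experiments; a real thread pool gives the asynchronous behavior the walkthrough describes.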

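(4) The `registerAlignmentTimer()` call

If barrier alignment times out, a fallback kicks in and the checkpoint switches to unaligned barrier mode; the next chapter covers the details.

Purpose: let the barrier overtake data so that it is not blocked, which keeps checkpointing from becoming a bottleneck of the stream processing.

Take the example [data A, data B, barrier-C, data D, data E]:

Aligned barrier mode does not allow overtaking: barrier-C can only be passed on after data A and data B have been processed and sent downstream.
Unaligned barrier mode allows overtaking: barrier-C does not wait for data A and data B to be processed; it is marked as "high priority" and passed to the downstream operator first.

Note: although barrier-C overtakes data A and B, that data is persisted into the checkpoint. When the task recovers from a failure, Flink replays the persisted data, so the downstream operator still ends up receiving the complete A, B, D, E (no data is lost).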
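
The example above can be modeled with plain lists. This is a deliberately simplified sketch of the idea, not of Flink's network stack: `BarrierOvertakeSketch` and its methods are hypothetical, and a single queue of strings stands in for the in-flight buffers.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a barrier overtaking queued records. Not a Flink API. */
final class BarrierOvertakeSketch {
    static final String BARRIER = "barrier-C";

    /** Aligned mode: the barrier keeps its position in the queue. */
    static List<String> aligned(List<String> queued) {
        return new ArrayList<>(queued);
    }

    /**
     * Unaligned mode: the barrier jumps to the head of the queue. Every record it
     * overtakes is also copied into persisted, standing in for the channel state
     * that is stored with the checkpoint.
     */
    static List<String> unaligned(List<String> queued, List<String> persisted) {
        int barrierPos = queued.indexOf(BARRIER);
        if (barrierPos < 0) {
            throw new IllegalArgumentException("queue contains no barrier");
        }
        List<String> out = new ArrayList<>();
        out.add(BARRIER); // the barrier goes first
        for (int i = 0; i < queued.size(); i++) {
            if (i == barrierPos) {
                continue; // already emitted
            }
            if (i < barrierPos) {
                persisted.add(queued.get(i)); // overtaken record becomes part of the checkpoint
            }
            out.add(queued.get(i)); // the record itself is still delivered
        }
        return out;
    }
}
```

In this model the downstream side still receives A, B, D, and E in their original relative order; only the position of the barrier changes, and only the overtaken records A and B end up in the persisted list.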

private void registerAlignmentTimer(
        long checkpointId,
        OperatorChain<?, ?> operatorChain,
        CheckpointBarrier checkpointBarrier) {
    // Cancel any existing timer so that timers of different checkpoints do not interfere with each other
    cancelAlignmentTimer();
    // Check whether timeout protection is needed at all
    if (!checkpointBarrier.getCheckpointOptions().isTimeoutable()) {
        return;
    }
    // Compute the delay
    long timerDelay = BarrierAlignmentUtil.getTimerDelay(clock, checkpointBarrier);
    // Register the timeout task
    alignmentTimer =
            registerTimer.registerTask(
                    () -> {
                        try {
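                            // On timeout, switch from aligned to unaligned barrier handling. This is the fallback mechanism; see the next chapter for the implementation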
                            operatorChain.alignedBarrierTimeout(checkpointId);
                        } catch (Exception e) {
                            ExceptionUtils.rethrowIOException(e);
                        }
                        alignmentTimer = null;
                        return null;
                    },
                    Duration.ofMillis(timerDelay));
    alignmentCheckpointId = checkpointId;
}
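
The cancel-then-reregister logic can be tried out with the JDK's ScheduledExecutorService. This is a sketch under assumptions, not Flink code: `AlignmentTimerSketch` is a hypothetical class, and `remainingDelay` assumes that the delay is the configured alignment timeout minus the time already elapsed since the barrier's timestamp, floored at zero, which is how BarrierAlignmentUtil.getTimerDelay() is understood to behave here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a single "current" alignment timer. */
final class AlignmentTimerSketch {
    private final ScheduledExecutorService timerService =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> alignmentTimer;

    /** Assumed formula: the timeout minus the time already spent, never negative. */
    static long remainingDelay(long alignedTimeoutMillis, long barrierTimestampMillis, long nowMillis) {
        return Math.max(alignedTimeoutMillis - (nowMillis - barrierTimestampMillis), 0L);
    }

    /** Cancels the previous timer, then schedules a new one only if the checkpoint is timeoutable. */
    synchronized ScheduledFuture<?> register(boolean timeoutable, long delayMillis, Runnable switchToUnaligned) {
        if (alignmentTimer != null) {
            alignmentTimer.cancel(false); // at most one pending timer at a time
            alignmentTimer = null;
        }
        if (!timeoutable) {
            return null; // no timeout protection requested
        }
        alignmentTimer = timerService.schedule(switchToUnaligned, delayMillis, TimeUnit.MILLISECONDS);
        return alignmentTimer;
    }

    /** Stops the timer thread so the JVM can exit. */
    void close() {
        timerService.shutdownNow();
    }
}
```

Because the delay is measured from the barrier's timestamp rather than from the moment of registration, a barrier that has already been in flight longer than the timeout gets a delay of zero and switches to unaligned mode right away.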
