I. Start with CheckpointCoordinator
1. The checkpoint trigger flow
Checkpoint: triggerCheckpoint(CheckpointType checkpointType)
-> calls triggerCheckpointFromCheckpointThread()
-> calls the overloaded triggerCheckpoint()
-> calls startTriggeringCheckpoint()
Savepoint: triggerSavepoint()
or triggerSynchronousSavepoint()
-> calls triggerSavepointInternal()
-> calls triggerCheckpointFromCheckpointThread()
-> from here on, the same as the checkpoint flow above
Pay particular attention to triggerCheckpoint(CheckpointType checkpointType):
java
public CompletableFuture<CompletedCheckpoint> triggerCheckpoint(CheckpointType checkpointType) {
if (checkpointType == null) {
throw new IllegalArgumentException("checkpointType cannot be null");
}
// resolve the snapshot type from the requested checkpoint type
final SnapshotType snapshotType;
switch (checkpointType) {
case CONFIGURED:
snapshotType = checkpointProperties.getCheckpointType();
break;
case FULL:
snapshotType = FULL_CHECKPOINT;
break;
case INCREMENTAL:
snapshotType = CHECKPOINT;
break;
default:
throw new IllegalArgumentException("unknown checkpointType: " + checkpointType);
}
// build the CheckpointProperties
final CheckpointProperties properties =
new CheckpointProperties(
checkpointProperties.forceCheckpoint(), // whether to force the trigger (ignoring limits such as the minimum pause)
snapshotType, // incremental or full
checkpointProperties.discardOnSubsumed(), // discard when subsumed by a newer checkpoint
checkpointProperties.discardOnJobFinished(), // discard when the job finishes
checkpointProperties.discardOnJobCancelled(), // discard when the job is cancelled
checkpointProperties.discardOnJobFailed(), // discard when the job fails
checkpointProperties.discardOnJobSuspended(), // discard when the job is suspended
checkpointProperties.isUnclaimed()); // whether the checkpoint is "unclaimed" (special cases such as restoring from an external snapshot)
return triggerCheckpointFromCheckpointThread(properties, null, false);
}
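2. startTriggeringCheckpoint()
This method does the preparation work, in 7 steps:
- Pre-checks and state initialization
- Compute the checkpoint plan
- Create the pending checkpoint
- Set the storage location and asynchronously trigger the checkpoint of every OperatorCoordinator
- Snapshot the master state
- Combine the results of all asynchronous operations and forward them to masterTriggerCompletionPromise
- Handle the outcome of those asynchronous operations
  - Failure: call onTriggerFailure() to handle the exception
  - Success: call triggerCheckpointRequest()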
java
private void startTriggeringCheckpoint(CheckpointTriggerRequest request) {
try {
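// 1. pre-checks and state initialization
synchronized (lock) {
// verify that the coordinator state allows starting a checkpoint, e.g. that it has not been shut down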
preCheckGlobalState(request.isPeriodic);
}
// we will actually trigger this checkpoint!
Preconditions.checkState(!isTriggering);
isTriggering = true;
final long timestamp = System.currentTimeMillis();
// 2. compute the checkpoint plan
CompletableFuture<CheckpointPlan> checkpointPlanFuture =
checkpointPlanCalculator.calculateCheckpointPlan();
boolean initializeBaseLocations = !baseLocationsForCheckpointInitialized;
baseLocationsForCheckpointInitialized = true;
// masterTriggerCompletionPromise eventually receives the combined result of the futures below
CompletableFuture<Void> masterTriggerCompletionPromise = new CompletableFuture<>();
// 3. create the pending checkpoint
final CompletableFuture<PendingCheckpoint> pendingCheckpointCompletableFuture =
checkpointPlanFuture
.thenApplyAsync(
plan -> {
try {
// obtain a unique, monotonically increasing checkpoint ID
long checkpointID =
checkpointIdCounter.getAndIncrement();
// wrap the plan and the checkpoint ID in a Tuple2 and pass it on
return new Tuple2<>(plan, checkpointID);
} catch (Throwable e) {
throw new CompletionException(e);
}
},
executor)
.thenApplyAsync(
// checkpointInfo is the Tuple2<plan, checkpointID> from the previous stage
(checkpointInfo) ->
// create the PendingCheckpoint (an in-flight checkpoint): the method builds the object, caches it in pendingCheckpoints, and returns it
createPendingCheckpoint(
timestamp,
request.props,
checkpointInfo.f0,
request.isPeriodic,
checkpointInfo.f1,
request.getOnCompletionFuture(),
masterTriggerCompletionPromise),
timer);
// 4. set the storage location and asynchronously trigger the checkpoint of every OperatorCoordinator
final CompletableFuture<?> coordinatorCheckpointsComplete =
pendingCheckpointCompletableFuture
.thenApplyAsync(
pendingCheckpoint -> {
try {
// initialize the checkpoint storage location; this ends up calling checkpointStorageView.initializeLocationForCheckpoint(checkpointID)
CheckpointStorageLocation checkpointStorageLocation =
initializeCheckpointLocation(
pendingCheckpoint.getCheckpointID(),
request.props,
request.externalSavepointLocation,
initializeBaseLocations);
// wrap the pendingCheckpoint and the storage location in a Tuple2 and pass it on
return Tuple2.of(
pendingCheckpoint, checkpointStorageLocation);
} catch (Throwable e) {
throw new CompletionException(e);
}
},
executor)
.thenComposeAsync(
// checkpointInfo is the Tuple2<pendingCheckpoint, storage location> from the previous stage
(checkpointInfo) -> {
PendingCheckpoint pendingCheckpoint = checkpointInfo.f0;
// if the in-flight checkpoint has already been disposed, skip the rest; otherwise continue
if (pendingCheckpoint.isDisposed()) {
// The disposed checkpoint will be handled later,
// skip snapshotting the coordinator states.
return null;
}
// set the target location, i.e. where this checkpoint will be stored
synchronized (lock) {
pendingCheckpoint.setCheckpointTargetLocation(
checkpointInfo.f1);
}
// asynchronously trigger the checkpoint of every OperatorCoordinator on the timer executor and return the resulting future
return OperatorCoordinatorCheckpoints
.triggerAndAcknowledgeAllCoordinatorCheckpointsWithCompletion(
coordinatorsToCheckpoint,
pendingCheckpoint,
timer);
},
timer);
// We have to take the snapshot of the master hooks after the coordinator checkpoints
// has completed.
// This is to ensure the tasks are checkpointed after the OperatorCoordinators in case
// ExternallyInducedSource is used.
// 5. snapshot the master state
final CompletableFuture<?> masterStatesComplete =
coordinatorCheckpointsComplete.thenComposeAsync(
ignored -> {
// If the code reaches here, the pending checkpoint is guaranteed to
// be not null.
// We use FutureUtils.getWithoutException() to make compiler happy
// with checked
// exceptions in the signature.
PendingCheckpoint checkpoint =
FutureUtils.getWithoutException(
pendingCheckpointCompletableFuture);
if (checkpoint == null || checkpoint.isDisposed()) {
// The disposed checkpoint will be handled later,
// skip snapshotting the master states.
return null;
}
// snapshot the master state held on the JobManager side, i.e. the state of the registered master hooks
return snapshotMasterState(checkpoint);
},
timer);
// 6. combine the results of all asynchronous operations and forward them to masterTriggerCompletionPromise
FutureUtils.forward(
// the promise is completed only once both step 4 and step 5 have completed
CompletableFuture.allOf(masterStatesComplete, coordinatorCheckpointsComplete),
masterTriggerCompletionPromise);
// 7. handle the outcome of those asynchronous operations
FutureUtils.assertNoException(
masterTriggerCompletionPromise
.handleAsync(
(ignored, throwable) -> {
final PendingCheckpoint checkpoint =
FutureUtils.getWithoutException(
pendingCheckpointCompletableFuture);
Preconditions.checkState(
checkpoint != null || throwable != null,
"Either the pending checkpoint needs to be created or an error must have occurred.");
// throwable == null means success; non-null means failure
if (throwable != null) {
// the initialization might not be finished yet
if (checkpoint == null) {
onTriggerFailure(request, throwable);
} else {
onTriggerFailure(checkpoint, throwable);
}
} else {
// success path: triggerCheckpointRequest sends the trigger requests to the tasks on the TMs, which starts the actual state snapshot
triggerCheckpointRequest(
request, timestamp, checkpoint);
}
return null;
},
timer)
.exceptionally( // exception handling
error -> {
if (!isShutdown()) {
throw new CompletionException(error);
} else if (findThrowable(
error, RejectedExecutionException.class)
.isPresent()) {
LOG.debug("Execution rejected during shutdown");
} else {
LOG.warn("Error encountered during shutdown", error);
}
return null;
}));
} catch (Throwable throwable) {
onTriggerFailure(request, throwable);
}
}
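The way steps 2 to 7 are stitched together is easier to follow in a stripped-down form. The following is only a sketch using plain `java.util.concurrent.CompletableFuture`: the class `TriggerPipelineSketch`, the string stand-ins for the plan and the pending checkpoint, and the `failCoordinatorStage` switch are all made up for illustration, and Flink's executors, locks, and `FutureUtils` helpers are left out.

```java
import java.util.concurrent.CompletableFuture;

final class TriggerPipelineSketch {

    /** Runs a miniature trigger pipeline and reports which branch of the final handler ran. */
    static String run(boolean failCoordinatorStage) {
        // Stands in for masterTriggerCompletionPromise.
        CompletableFuture<Void> masterTriggerCompletionPromise = new CompletableFuture<>();

        // Steps 2-3: compute a "plan", then derive a "pending checkpoint" from it.
        CompletableFuture<String> pendingCheckpoint =
                CompletableFuture.supplyAsync(() -> "plan")
                        .thenApply(plan -> "pending(" + plan + ")");

        // Step 4: coordinator checkpoints, chained on the pending checkpoint.
        CompletableFuture<Void> coordinatorCheckpointsComplete =
                pendingCheckpoint.thenAccept(
                        pending -> {
                            if (failCoordinatorStage) {
                                throw new IllegalStateException("coordinator checkpoint failed");
                            }
                        });

        // Step 5: master state snapshot, chained on step 4 so it only runs afterwards.
        CompletableFuture<Void> masterStatesComplete =
                coordinatorCheckpointsComplete.thenRun(() -> {});

        // Step 6: forward the combined result of steps 4 and 5 into the promise.
        CompletableFuture.allOf(masterStatesComplete, coordinatorCheckpointsComplete)
                .whenComplete(
                        (ignored, error) -> {
                            if (error != null) {
                                masterTriggerCompletionPromise.completeExceptionally(error);
                            } else {
                                masterTriggerCompletionPromise.complete(null);
                            }
                        });

        // Step 7: a single handler decides between the failure path and the success path.
        return masterTriggerCompletionPromise
                .handle(
                        (ignored, error) ->
                                error == null
                                        ? "trigger tasks for " + pendingCheckpoint.join()
                                        : "trigger failure: " + rootCause(error).getMessage())
                .join();
    }

    /** Unwraps nested CompletionExceptions to reach the original failure. */
    private static Throwable rootCause(Throwable t) {
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }
}
```

`TriggerPipelineSketch.run(false)` takes the success branch and `TriggerPipelineSketch.run(true)` takes the failure branch. The point of the promise is that a failure in any earlier stage and a normal completion both arrive at the same final handler.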
3. Next, triggerCheckpointRequest()
java
private void triggerCheckpointRequest(
CheckpointTriggerRequest request, long timestamp, PendingCheckpoint checkpoint) {
// check once more whether the checkpoint has already been disposed
if (checkpoint.isDisposed()) {
onTriggerFailure(
checkpoint,
new CheckpointException(
CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE,
checkpoint.getFailureCause()));
} else {
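// call triggerTasks to trigger the tasks
triggerTasks(request, timestamp, checkpoint)
.exceptionally( // exception handling, omitted here
failure -> {
// ...
});
// the tasks may already have finished checkpointing by this point, so try to complete the checkpoint
if (maybeCompleteCheckpoint(checkpoint)) {
// reset isTriggering to false and execute the next queued trigger request
onTriggerSuccess();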
}
}
}
// called when triggering the current checkpoint succeeded: reset isTriggering to false and execute the next queued trigger request
private void onTriggerSuccess() {
isTriggering = false;
executeQueuedRequest();
}
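4. triggerTasks()
The steps are:
- Get the current checkpoint ID
- Determine the checkpoint type (full or incremental); this ties back to triggerCheckpoint(CheckpointType checkpointType) above
- Build the checkpoint options
- Send the trigger request to every task that has to be triggered
- Combine the results of all tasks
  - All succeed: the combined future completes normally
  - Any one fails: the combined future completes exceptionally and carries the exception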
java
private CompletableFuture<Void> triggerTasks(
CheckpointTriggerRequest request, long timestamp, PendingCheckpoint checkpoint) {
// 1. get the current checkpoint ID
final long checkpointId = checkpoint.getCheckpointID();
// 2. determine the checkpoint type (full or incremental)
final SnapshotType type;
if (this.forceFullSnapshot && !request.props.isSavepoint()) { // only taken when a full snapshot is forced and this is not a savepoint
type = FULL_CHECKPOINT;
} else {
type = request.props.getCheckpointType(); // props were set in triggerCheckpoint(CheckpointType checkpointType) above
}
// 3. build the checkpoint options
final CheckpointOptions checkpointOptions =
CheckpointOptions.forConfig(
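type, // checkpoint type: incremental or full
checkpoint.getCheckpointStorageLocation().getLocationReference(), // reference to the storage location
isExactlyOnceMode, // whether exactly-once mode is enabled
unalignedCheckpointsEnabled, // whether unaligned barriers are allowed
alignedCheckpointTimeout); // timeout for aligned barriers
// 4. send the trigger request to every task that has to be triggered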
List<CompletableFuture<Acknowledge>> acks = new ArrayList<>();
for (Execution execution : checkpoint.getCheckpointPlan().getTasksToTrigger()) {
if (request.props.isSynchronous()) { // synchronous mode (synchronous savepoint)
acks.add(
execution.triggerSynchronousSavepoint(
checkpointId, timestamp, checkpointOptions));
} else { // asynchronous mode
acks.add(execution.triggerCheckpoint(checkpointId, timestamp, checkpointOptions));
}
}
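/* Combine the results of all tasks.
If every task acknowledges, the combined future completes normally.
If any task fails (for example it has crashed or the RPC timed out), the combined future completes exceptionally.
*/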
return FutureUtils.waitForAll(acks);
}
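The "combine all acknowledgements" step can be illustrated with the JDK alone. The sketch below uses `CompletableFuture.allOf` as a stand-in for `FutureUtils.waitForAll`; it is not Flink's implementation, and the class `WaitForAllSketch`, the string acknowledgements, and the `failingTask` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class WaitForAllSketch {

    /** Combines per-task acknowledgements into one future (JDK stand-in for waitForAll). */
    static CompletableFuture<Void> waitForAll(List<CompletableFuture<String>> acks) {
        return CompletableFuture.allOf(acks.toArray(new CompletableFuture<?>[0]));
    }

    /** Simulates triggering taskCount tasks; the task at index failingTask (if any) fails. */
    static String trigger(int taskCount, int failingTask) {
        List<CompletableFuture<String>> acks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            CompletableFuture<String> ack = new CompletableFuture<>();
            if (i == failingTask) {
                ack.completeExceptionally(new IllegalStateException("task " + i + " unreachable"));
            } else {
                ack.complete("ack-" + i);
            }
            acks.add(ack);
        }
        try {
            waitForAll(acks).join();
            return "all acknowledged";
        } catch (CompletionException e) {
            // One failed acknowledgement is enough to fail the combined future.
            return "failed: " + e.getCause().getMessage();
        }
    }
}
```

With three tasks, `trigger(3, -1)` reports that all tasks acknowledged, while `trigger(3, 1)` reports the failure of task 1. This mirrors why `triggerCheckpointRequest()` attaches its `exceptionally` handler to the single future returned by `triggerTasks()`.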
(1) execution.triggerCheckpoint(), called from triggerTasks()
java
public CompletableFuture<Acknowledge> triggerCheckpoint(
long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
// delegates to triggerCheckpointHelper
return triggerCheckpointHelper(checkpointId, timestamp, checkpointOptions);
}
(2) triggerCheckpointHelper(), called next
java
private CompletableFuture<Acknowledge> triggerCheckpointHelper(
long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
// 1. get the assigned resource (slot)
final LogicalSlot slot = assignedResource;
// 2. check the slot and obtain the TaskManager gateway
if (slot != null) {
final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
// 3. send the trigger request, i.e. ask the target TM to run triggerCheckpoint and take the checkpoint
return taskManagerGateway.triggerCheckpoint(
attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions);
}
LOG.debug(
"The execution has no slot assigned. This indicates that the execution is no longer running.");
return CompletableFuture.completedFuture(Acknowledge.get());
}
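At this point the call moves on to the gateway of the target TM, which runs triggerCheckpoint. Next, let's look at TaskManagerGateway.
II. TaskManagerGateway
1. TaskManagerGateway is an interface
Its implementation classes are shown in the figure: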

The interface looks like this:
java
public interface TaskManagerGateway extends TaskExecutorOperatorEventGateway {
String getAddress();
CompletableFuture<Acknowledge> submitTask(TaskDeploymentDescriptor tdd, Duration timeout);
CompletableFuture<Acknowledge> cancelTask(
ExecutionAttemptID executionAttemptID, Duration timeout);
CompletableFuture<Acknowledge> updatePartitions(
ExecutionAttemptID executionAttemptID,
Iterable<PartitionInfo> partitionInfos,
Duration timeout);
void releasePartitions(JobID jobId, Set<ResultPartitionID> partitionIds);
void notifyCheckpointOnComplete(
ExecutionAttemptID executionAttemptID,
JobID jobId,
long completedCheckpointId,
long completedTimestamp,
long lastSubsumedCheckpointId);
void notifyCheckpointAborted(
ExecutionAttemptID executionAttemptID,
JobID jobId,
long checkpointId,
long latestCompletedCheckpointId,
long timestamp);
CompletableFuture<Acknowledge> triggerCheckpoint(
ExecutionAttemptID executionAttemptID,
JobID jobId,
long checkpointId,
long timestamp,
CheckpointOptions checkpointOptions);
CompletableFuture<Acknowledge> freeSlot(
final AllocationID allocationId,
final Throwable cause,
@RpcTimeout final Duration timeout);
@Override
CompletableFuture<Acknowledge> sendOperatorEventToTask(
ExecutionAttemptID task, OperatorID operator, SerializedValue<OperatorEvent> evt);
}
2. The implementation RpcTaskManagerGateway
It calls the triggerCheckpoint method of the TaskExecutorGateway implementation:
java
public class RpcTaskManagerGateway implements TaskManagerGateway {
private final TaskExecutorGateway taskExecutorGateway;
private final JobMasterId jobMasterId;
@Override
public CompletableFuture<Acknowledge> triggerCheckpoint(
ExecutionAttemptID executionAttemptID,
JobID jobId,
long checkpointId,
long timestamp,
CheckpointOptions checkpointOptions) {
// calls the triggerCheckpoint method of the TaskExecutorGateway implementation
return taskExecutorGateway.triggerCheckpoint(
executionAttemptID, checkpointId, timestamp, checkpointOptions);
}
...
}
(1) A quick look at the TaskExecutorGateway interface
Its implementation classes are shown in the figure:

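(2) The implementation TaskExecutor
Its triggerCheckpoint() boils down to 3 steps:
- Look up the Task instance by its execution attempt ID
- Call the Task's triggerCheckpointBarrier method (the core logic)
- Return a successful acknowledgement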
java
@Override
public CompletableFuture<Acknowledge> triggerCheckpoint(
ExecutionAttemptID executionAttemptID,
long checkpointId,
long checkpointTimestamp,
CheckpointOptions checkpointOptions) {
// 1. look up the Task instance by its execution attempt ID
final Task task = taskSlotTable.getTask(executionAttemptID);
if (task != null) {
try (MdcCloseable ignored =
MdcUtils.withContext(MdcUtils.asContextData(task.getJobID()))) {
log.debug(
"Trigger checkpoint {}@{} for {}.",
checkpointId,
checkpointTimestamp,
executionAttemptID);
// 2. call the Task's triggerCheckpointBarrier method (the core logic)
task.triggerCheckpointBarrier(checkpointId, checkpointTimestamp, checkpointOptions);
// 3. return a successful acknowledgement
return CompletableFuture.completedFuture(Acknowledge.get());
}
} else {
final String message =
"TaskManager received a checkpoint request for unknown task "
+ executionAttemptID
+ '.';
log.debug(message);
return FutureUtils.completedExceptionally(
new CheckpointException(
message, CheckpointFailureReason.TASK_CHECKPOINT_FAILURE));
}
}
(3) Task.triggerCheckpointBarrier(), called from TaskExecutor
java
public void triggerCheckpointBarrier(
final long checkpointID,
final long checkpointTimestamp,
final CheckpointOptions checkpointOptions) {
// 1. get the actual task body (the TaskInvokable implementation)
final TaskInvokable invokable = this.invokable;
// 2. wrap the checkpoint metadata
final CheckpointMetaData checkpointMetaData =
new CheckpointMetaData(
checkpointID, checkpointTimestamp, System.currentTimeMillis());
if (executionState == ExecutionState.RUNNING) {
// 3. check that the invokable supports checkpoints, i.e. implements CheckpointableTask
checkState(invokable instanceof CheckpointableTask, "invokable is not checkpointable");
try {
// 4. call the invokable's triggerCheckpointAsync (triggers the checkpoint asynchronously)
((CheckpointableTask) invokable)
.triggerCheckpointAsync(checkpointMetaData, checkpointOptions)
.handle( // 5. handle the result
(triggerResult, exception) -> {
if (exception != null || !triggerResult) {
// 5.1 failure: notify the JobMaster that the checkpoint was declined
declineCheckpoint(
checkpointID,
CheckpointFailureReason.TASK_FAILURE,
exception);
return false;
}
return true;
});
// exception handling follows, omitted
} ...
}
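So the work is handed to the triggerCheckpointAsync method of the TaskInvokable implementation (accessed through the CheckpointableTask cast above).
III. TaskInvokable
1. TaskInvokable is an interface
Its implementation classes are shown in the figure.
2. The implementation StreamTask
This is again a chain of nested calls: triggerCheckpointAsync()
-> triggerCheckpointAsyncInMailbox()
-> performCheckpoint()
-> subtaskCheckpointCoordinator.checkpointState()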
java
// Step 1. triggerCheckpointAsync()
@Override
public CompletableFuture<Boolean> triggerCheckpointAsync(
CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions) {
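// 1. check support for forced full snapshots; throws if the current state backend does not support them
checkForcedFullSnapshotSupport(checkpointOptions);
/* 2. choose the mail priority
* UNALIGNED: unaligned checkpoint, enqueued as urgent mail
* ALIGNED: aligned checkpoint, enqueued with default options
* */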
MailboxExecutor.MailOptions mailOptions =
CheckpointOptions.AlignmentType.UNALIGNED == checkpointOptions.getAlignment()
? MailboxExecutor.MailOptions.urgent()
: MailboxExecutor.MailOptions.options();
// 3. run the checkpoint asynchronously in the mailbox; the trigger method depends on whether all inputs have finished
CompletableFuture<Boolean> result = new CompletableFuture<>();
mainMailboxExecutor.execute(
mailOptions,
() -> {
try {
boolean noUnfinishedInputGates =
Arrays.stream(getEnvironment().getAllInputGates())
.allMatch(InputGate::isFinished);
// 3.1 all inputs have finished: call triggerCheckpointAsyncInMailbox()
if (noUnfinishedInputGates) {
result.complete(
triggerCheckpointAsyncInMailbox(
checkpointMetaData, checkpointOptions));
} else { // 3.2 some inputs are unfinished: call triggerUnfinishedChannelsCheckpoint()
result.complete(
triggerUnfinishedChannelsCheckpoint(
checkpointMetaData, checkpointOptions));
}
} catch (Exception ex) {
// Report the failure both via the Future result but also to the mailbox
result.completeExceptionally(ex);
throw ex;
}
},
"checkpoint %s with %s",
checkpointMetaData,
checkpointOptions);
return result;
}
// Step 2. triggerCheckpointAsyncInMailbox(), called from step 1
private boolean triggerCheckpointAsyncInMailbox(
CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions)
throws Exception {
FlinkSecurityManager.monitorUserSystemExitForCurrentThread();
try {
// checkpoint start delay: the time between the JobManager triggering the checkpoint and the task actually starting it
latestAsyncCheckpointStartDelayNanos =
1_000_000
* Math.max(
0,
System.currentTimeMillis() - checkpointMetaData.getTimestamp());
// initialize the checkpoint metrics builder
CheckpointMetricsBuilder checkpointMetrics =
new CheckpointMetricsBuilder()
.setAlignmentDurationNanos(0L)
.setBytesProcessedDuringAlignment(0L)
.setCheckpointStartDelayNanos(latestAsyncCheckpointStartDelayNanos);
// initialize checkpointing of the inputs
subtaskCheckpointCoordinator.initInputsCheckpoint(
checkpointMetaData.getCheckpointId(), checkpointOptions);
// this is where the checkpoint is actually performed
boolean success =
performCheckpoint(checkpointMetaData, checkpointOptions, checkpointMetrics);
if (!success) {
declineCheckpoint(checkpointMetaData.getCheckpointId());
}
return success;
// exception handling follows, omitted
} ...
}
// Step 3. performCheckpoint(), called from step 2
private boolean performCheckpoint(
CheckpointMetaData checkpointMetaData,
CheckpointOptions checkpointOptions,
CheckpointMetricsBuilder checkpointMetrics)
throws Exception {
final SnapshotType checkpointType = checkpointOptions.getCheckpointType();
LOG.debug(
"Starting checkpoint {} {} on task {}",
checkpointMetaData.getCheckpointId(),
checkpointType,
getName());
if (isRunning) { // the task is running
actionExecutor.runThrowing(
() -> {
// handle synchronous savepoints
if (isSynchronous(checkpointType)) {
setSynchronousSavepoint(checkpointMetaData.getCheckpointId());
}
// remember the first checkpoint after the task received end of data
if (areCheckpointsWithFinishedTasksEnabled()
&& endOfDataReceived
&& this.finalCheckpointMinId == null) {
this.finalCheckpointMinId = checkpointMetaData.getCheckpointId();
}
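// this call handles the barriers and the state snapshot; subtaskCheckpointCoordinator is an implementation of the SubtaskCheckpointCoordinator interface
subtaskCheckpointCoordinator.checkpointState(
checkpointMetaData, // checkpoint metadata (ID, timestamp)
checkpointOptions, // checkpoint options (alignment type and so on)
checkpointMetrics, // checkpoint metrics builder
operatorChain, // operator chain (holds the state of all operators)
finishedOperators, // whether the operators have finished (passed as the boolean isTaskFinished)
this::isRunning); // supplier that reports whether the task is running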
});
return true;
} else { // the task is not running
actionExecutor.runThrowing(
() -> {
final CancelCheckpointMarker message =
new CancelCheckpointMarker(checkpointMetaData.getCheckpointId());
recordWriter.broadcastEvent(message);
});
return false;
}
}
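The priority choice in triggerCheckpointAsync() matters because the checkpoint action is just one more mail in the task's mailbox. As a rough sketch, assuming that urgent mail is placed at the head of the queue and default mail at the tail, the effect on ordering looks like this. `MailboxPrioritySketch` and its methods are hypothetical and much simpler than Flink's actual mailbox, which has its own priority and threading model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class MailboxPrioritySketch {
    private final Deque<Runnable> mails = new ArrayDeque<>();

    /** Assumption for this sketch: urgent mail jumps to the head, default mail is appended. */
    void execute(boolean urgent, Runnable mail) {
        if (urgent) {
            mails.addFirst(mail);
        } else {
            mails.addLast(mail);
        }
    }

    /** Runs all queued mails in queue order, like a single mailbox thread would. */
    void drain() {
        Runnable mail;
        while ((mail = mails.pollFirst()) != null) {
            mail.run();
        }
    }

    /** Queues two record-processing mails, then a checkpoint mail, and returns the run order. */
    static List<String> demo(boolean unalignedCheckpoint) {
        List<String> order = new ArrayList<>();
        MailboxPrioritySketch mailbox = new MailboxPrioritySketch();
        mailbox.execute(false, () -> order.add("record-1"));
        mailbox.execute(false, () -> order.add("record-2"));
        // Unaligned checkpoints are enqueued as urgent, aligned ones with default options.
        mailbox.execute(unalignedCheckpoint, () -> order.add("checkpoint"));
        mailbox.drain();
        return order;
    }
}
```

In this model an unaligned checkpoint runs before the two queued record mails, while an aligned one waits its turn behind them.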
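IV. Finally, SubtaskCheckpointCoordinator
1. SubtaskCheckpointCoordinator is an interface
Its implementation classes are shown in the figure:

2. The implementation SubtaskCheckpointCoordinatorImpl
(1) The core method checkpointState()
It proceeds in the following steps:
- Handle a checkpoint that has already been aborted (timeout or failure): notify downstream operators with operatorChain.broadcastEvent(new CancelCheckpointMarker()), then abort the channel state write
- Prepare the checkpoint and let operators do their pre-barrier work with operatorChain.prepareSnapshotPreBarrier()
- Broadcast the barrier with operatorChain.broadcastEvent()
- Register the timeout timer for barrier alignment with registerAlignmentTimer()
- Finalize the channel state: channelStateWriter.finishOutput() marks the output data of this checkpoint as completely written
- Take the state snapshot with takeSnapshotSync(), then report the result through finishAndReportAsync()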
java
@Override
public void checkpointState(
CheckpointMetaData metadata,
CheckpointOptions options,
CheckpointMetricsBuilder metrics,
OperatorChain<?, ?> operatorChain,
boolean isTaskFinished,
Supplier<Boolean> isRunning)
throws Exception {
checkNotNull(options);
checkNotNull(metrics);
// preparation: verify checkpoint order; checkpoints must run with increasing IDs, so duplicate and stale requests are dropped
if (lastCheckpointId >= metadata.getCheckpointId()) {
LOG.info(
"Out of order checkpoint barrier (aborted previously?): {} >= {}",
lastCheckpointId,
metadata.getCheckpointId());
channelStateWriter.abort(metadata.getCheckpointId(), new CancellationException(), true);
checkAndClearAbortedStatus(metadata.getCheckpointId());
return;
}
logCheckpointProcessingDelay(metadata);
// Step (0): Record the last triggered checkpointId and abort the sync phase of checkpoint
// if necessary.
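// 0. handle a checkpoint that has already been aborted (timeout or failure)
lastCheckpointId = metadata.getCheckpointId();
if (checkAndClearAbortedStatus(metadata.getCheckpointId())) {
// broadcast a CancelCheckpointMarker to notify downstream operators, then abort the channel state write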
operatorChain.broadcastEvent(new CancelCheckpointMarker(metadata.getCheckpointId()));
channelStateWriter.abort(
metadata.getCheckpointId(),
new CancellationException("checkpoint aborted via notification"),
true);
LOG.info(
"Checkpoint {} has been notified as aborted, would not trigger any checkpoint.",
metadata.getCheckpointId());
return;
}
// prepare the checkpoint environment
if (fileMergingSnapshotManager != null) {
// notify file merging snapshot manager for managed dir lifecycle management
fileMergingSnapshotManager.notifyCheckpointStart(
FileMergingSnapshotManager.SubtaskKey.of(env), metadata.getCheckpointId());
}
if (options.getAlignment() == CheckpointOptions.AlignmentType.FORCED_ALIGNED) {
options = options.withUnalignedSupported();
initInputsCheckpoint(metadata.getCheckpointId(), options);
}
// Step (1): Prepare the checkpoint, allow operators to do some pre-barrier work.
// The pre-barrier work should be nothing or minimal in the common case.
// 1. prepare the checkpoint and let operators do their pre-barrier work
operatorChain.prepareSnapshotPreBarrier(metadata.getCheckpointId());
// Step (2): Send the checkpoint barrier downstream
/* 2. broadcast the checkpoint barrier
* A CheckpointBarrier simply wraps the checkpoint ID, the timestamp and the CheckpointOptions; its fields are:
* private final long id;
* private final long timestamp;
* private final CheckpointOptions checkpointOptions;
* */
LOG.debug(
"Task {} broadcastEvent at {}, triggerTime {}, passed time {}",
taskName,
System.currentTimeMillis(),
metadata.getTimestamp(),
System.currentTimeMillis() - metadata.getTimestamp());
CheckpointBarrier checkpointBarrier =
new CheckpointBarrier(metadata.getCheckpointId(), metadata.getTimestamp(), options);
operatorChain.broadcastEvent(checkpointBarrier, options.isUnalignedCheckpoint());
// Step (3): Register alignment timer to timeout aligned barrier to unaligned barrier
// 3. register the timeout timer for barrier alignment
registerAlignmentTimer(metadata.getCheckpointId(), operatorChain, checkpointBarrier);
// Step (4): Prepare to spill the in-flight buffers for input and output
// 4. finalize the channel state: mark the output data of this checkpoint as completely written
if (options.needsChannelState()) {
// output data already written while broadcasting event
channelStateWriter.finishOutput(metadata.getCheckpointId());
}
// Step (5): Take the state snapshot. This should be largely asynchronous, to not impact
// progress of the
// streaming topology
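/* 5. take the state snapshot
* takeSnapshotSync(): synchronously starts the state snapshot of the operator chain and collects the asynchronous results.
* finishAndReportAsync(): finishes the snapshot asynchronously and reports to the JobManager when done.
* cleanup(): handles failure by releasing resources and recording the error.
* */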
Map<OperatorID, OperatorSnapshotFutures> snapshotFutures =
CollectionUtil.newHashMapWithExpectedSize(operatorChain.getNumberOfOperators());
try {
if (takeSnapshotSync(
snapshotFutures, metadata, metrics, options, operatorChain, isRunning)) {
finishAndReportAsync(
snapshotFutures,
metadata,
metrics,
operatorChain.isTaskDeployedAsFinished(),
isTaskFinished,
isRunning);
} else {
cleanup(snapshotFutures, metadata, metrics, new Exception("Checkpoint declined"));
}
} catch (Exception ex) {
cleanup(snapshotFutures, metadata, metrics, ex);
throw ex;
}
}
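The ordering check and the aborted-checkpoint check at the top of checkpointState() can be modelled in isolation. The sketch below is a simplification with hypothetical names (`CheckpointOrderGuardSketch`, `notifyAborted`, `onTrigger`, and the string results); in particular, recording an abort notification only for IDs that have not been triggered yet is an assumption of this sketch, not something shown in the code above.

```java
import java.util.HashSet;
import java.util.Set;

final class CheckpointOrderGuardSketch {
    private long lastCheckpointId = -1L;
    private final Set<Long> abortedCheckpointIds = new HashSet<>();

    /** Records an abort notification that arrived before the checkpoint was triggered here. */
    void notifyAborted(long checkpointId) {
        if (checkpointId > lastCheckpointId) {
            abortedCheckpointIds.add(checkpointId);
        }
    }

    /** Decides what a trigger for the given ID should do. */
    String onTrigger(long checkpointId) {
        // Stale or duplicate request: IDs must be strictly increasing.
        if (lastCheckpointId >= checkpointId) {
            abortedCheckpointIds.remove(checkpointId);
            return "OUT_OF_ORDER";
        }
        lastCheckpointId = checkpointId;
        // Already aborted via notification: cancel instead of snapshotting.
        if (abortedCheckpointIds.remove(checkpointId)) {
            return "ABORTED";
        }
        return "SNAPSHOT";
    }
}
```

For example, triggering ID 1 twice yields a snapshot followed by an out-of-order rejection, and an ID that was reported as aborted before its trigger arrives is cancelled rather than snapshotted.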
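(2) takeSnapshotSync(), called from checkpointState()
At its core it calls snapshotState() on the operatorChain to take the state snapshot and write it to the location that storage points to.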
java
private boolean takeSnapshotSync(
Map<OperatorID, OperatorSnapshotFutures> operatorSnapshotsInProgress,
CheckpointMetaData checkpointMetaData,
CheckpointMetricsBuilder checkpointMetrics,
CheckpointOptions checkpointOptions,
OperatorChain<?, ?> operatorChain,
Supplier<Boolean> isRunning)
throws Exception {
checkState(
!operatorChain.isClosed(),
"OperatorChain and Task should never be closed at this point");
long checkpointId = checkpointMetaData.getCheckpointId();
long started = System.nanoTime();
// channel state: the input and output buffer data recorded at checkpoint time, used for failure recovery
ChannelStateWriteResult channelStateWriteResult =
checkpointOptions.needsChannelState()
? channelStateWriter.getAndRemoveWriteResult(checkpointId)
: ChannelStateWriteResult.EMPTY;
// resolve the storage location
CheckpointStreamFactory storage =
checkpointStorage.resolveCheckpointStorageLocation(
checkpointId, checkpointOptions.getTargetLocation());
storage = applyFileMergingCheckpoint(storage, checkpointOptions);
try {
// call snapshotState() on the operatorChain to take the state snapshot and write it to the location that storage points to
operatorChain.snapshotState(
operatorSnapshotsInProgress,
checkpointMetaData,
checkpointOptions,
isRunning,
channelStateWriteResult,
storage);
} finally {
// clear the cache for this checkpoint
checkpointStorage.clearCacheFor(checkpointId);
}
// record the duration of the synchronous phase
checkpointMetrics.setSyncDurationMillis((System.nanoTime() - started) / 1_000_000);
LOG.debug(
"{} - finished synchronous part of checkpoint {}. Alignment duration: {} ms, snapshot duration {} ms, is unaligned checkpoint : {}",
taskName,
checkpointId,
checkpointMetrics.getAlignmentDurationNanosOrDefault() / 1_000_000,
checkpointMetrics.getSyncDurationMillis(),
checkpointOptions.isUnalignedCheckpoint());
return true;
}
(3) finishAndReportAsync(), called from checkpointState()
java
private void finishAndReportAsync(
Map<OperatorID, OperatorSnapshotFutures> snapshotFutures,
CheckpointMetaData metadata,
CheckpointMetricsBuilder metrics,
boolean isTaskDeployedAsFinished,
boolean isTaskFinished,
Supplier<Boolean> isRunning)
throws IOException {
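// create the asynchronous task
AsyncCheckpointRunnable asyncCheckpointRunnable =
new AsyncCheckpointRunnable(
snapshotFutures, // asynchronous snapshot results of each operator
metadata, // checkpoint metadata (ID, timestamp)
metrics, // checkpoint metrics builder
System.nanoTime(), // start time of the asynchronous phase
taskName, // task name
unregisterConsumer(), // resource cleanup callback
env, // task environment (configuration, state backend, ...)
asyncExceptionHandler, // handler for asynchronous exceptions
isTaskDeployedAsFinished, // whether the task was deployed as already finished
isTaskFinished, // whether the task has finished processing data
isRunning); // supplier that reports whether the task is still running
// register the asynchronous task
registerAsyncCheckpointRunnable(
asyncCheckpointRunnable.getCheckpointId(), asyncCheckpointRunnable);
// submit the asynchronous task to the thread pool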
asyncOperationsThreadPool.execute(asyncCheckpointRunnable);
}
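(4) registerAlignmentTimer(), called from checkpointState()
If barrier alignment times out, a fallback kicks in and the checkpoint switches to unaligned barriers (details in the next chapter). The goal is to let the barrier overtake data, so that alignment does not block and checkpointing does not become the bottleneck of the stream.
Take the sequence [record A, record B, barrier-C, record D, record E] as an example:
- Aligned mode does not allow overtaking: barrier-C is forwarded only after records A and B have been processed and sent downstream
- Unaligned mode allows overtaking: barrier-C does not wait for records A and B; it is marked as high priority and forwarded to the downstream operator first

Note: although barrier-C overtakes records A and B, those records are persisted as part of the checkpoint. When the task recovers from a failure, Flink replays the persisted records, so downstream operators still receive A, B, D and E in full and no data is lost.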
java
private void registerAlignmentTimer(
long checkpointId,
OperatorChain<?, ?> operatorChain,
CheckpointBarrier checkpointBarrier) {
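// cancel the existing timer so that timers of different checkpoints do not interfere
cancelAlignmentTimer();
// check whether timeout protection is needed at all
if (!checkpointBarrier.getCheckpointOptions().isTimeoutable()) {
return;
}
// compute the delay
long timerDelay = BarrierAlignmentUtil.getTimerDelay(clock, checkpointBarrier);
// register the timeout task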
alignmentTimer =
registerTimer.registerTask(
() -> {
try {
// on timeout, switch from aligned to unaligned barriers; this is the fallback, details in the next chapter
operatorChain.alignedBarrierTimeout(checkpointId);
} catch (Exception e) {
ExceptionUtils.rethrowIOException(e);
}
alignmentTimer = null;
return null;
},
Duration.ofMillis(timerDelay));
alignmentCheckpointId = checkpointId;
}
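The cancel-then-register pattern of registerAlignmentTimer() can be tried out with a plain `ScheduledExecutorService`. This is a minimal sketch under stated assumptions: `AlignmentTimerSketch`, its `register`/`cancel` methods, and the `timedOut` list are invented for illustration, and the timeout action merely records the checkpoint ID instead of calling `alignedBarrierTimeout`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class AlignmentTimerSketch {
    private final ScheduledExecutorService timerService =
            Executors.newSingleThreadScheduledExecutor();
    private final List<Long> timedOut = new CopyOnWriteArrayList<>();
    private ScheduledFuture<?> alignmentTimer;

    /** Cancels any previous timer, then schedules a timeout for this checkpoint if allowed. */
    synchronized void register(long checkpointId, long delayMillis, boolean timeoutable) {
        cancel();
        if (!timeoutable) {
            return;
        }
        alignmentTimer =
                timerService.schedule(
                        () -> {
                            // Stand-in for switching this checkpoint to unaligned barriers.
                            timedOut.add(checkpointId);
                        },
                        delayMillis,
                        TimeUnit.MILLISECONDS);
    }

    synchronized void cancel() {
        if (alignmentTimer != null) {
            alignmentTimer.cancel(false);
            alignmentTimer = null;
        }
    }

    /** Registers a timer for checkpoint 1, supersedes it with checkpoint 2, and waits. */
    static List<Long> demo() {
        AlignmentTimerSketch sketch = new AlignmentTimerSketch();
        sketch.register(1L, 500L, true);
        sketch.register(2L, 20L, true); // cancels the timer of checkpoint 1
        sketch.timerService.shutdown(); // already scheduled, non-cancelled tasks still run
        try {
            sketch.timerService.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return List.copyOf(sketch.timedOut);
    }
}
```

In `demo()` only checkpoint 2 times out, because registering its timer first cancels the one for checkpoint 1. That is the same reason the real method starts with `cancelAlignmentTimer()`.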