Flink Source Code: The Checkpoint Execution Flow

The complete Checkpoint flow is shown in the figure above:

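  1. The JobMaster's CheckpointCoordinator sends an RPC to every SourceTask to trigger a checkpoint
  2. Each SourceTask broadcasts a CheckpointBarrier downstream
  3. After a SourceTask finishes its state snapshot, it sends the snapshot result to the JobMaster
  4. A non-source task takes its state snapshot once its barriers are aligned, then sends the snapshot result to the JobMaster
  5. The JobMaster stores each SubTask's snapshot result
  6. Once the JobMaster has received the snapshot results of all SubTasks, it saves the checkpoint metadata and notifies the SubTasks that the checkpoint is complete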

The sections below walk through each part of this flow in detail.

CheckpointCoordinator

When the JobMaster converts the JobGraph into an ExecutionGraph, it creates a CheckpointCoordinator for the ExecutionGraph if checkpointing is enabled.

DefaultExecutionGraphBuilder.buildGraph //converts the JobGraph into an ExecutionGraph
    DefaultExecutionGraph::new
    DefaultExecutionGraph::attachJobGraph //creates the ExecutionJobVertex instances
        DefaultExecutionTopology.fromExecutionGraph //creates the ExecutionTopology
    DefaultExecutionGraph::enableCheckpointing //creates the CheckpointCoordinator
        DefaultExecutionGraph::createCheckpointPlanCalculator //creates the DefaultCheckpointPlanCalculator
        CheckpointCoordinator::new

The CheckpointCoordinator encapsulates the StateBackend and the CheckpointStorage.

The StateBackend manages state:

  • HashMapStateBackend //memory
  • EmbeddedRocksDBStateBackend //memory + disk

The CheckpointStorage, in turn, is responsible for persisting the state that the StateBackend manages.

When a SubtaskCheckpointCoordinatorImpl is constructed for a StreamTask, it calls:

CheckpointStorage::createCheckpointStorage

This creates a CheckpointStorageAccess, which is used to resolve the state storage location when a checkpoint is executed:

  • MemoryBackendCheckpointStorageAccess //corresponds to JobManagerCheckpointStorage
  • FsCheckpointStorageAccess //corresponds to FileSystemCheckpointStorage

When taking a state snapshot, SubtaskCheckpointCoordinatorImpl calls

CheckpointStorageAccess::resolveCheckpointStorageLocation

to obtain a CheckpointStreamFactory, which creates the output streams that state data is written to:

  • MemCheckpointStreamFactory //corresponds to JobManagerCheckpointStorage
  • FsCheckpointStreamFactory //corresponds to FileSystemCheckpointStorage
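
The layering above (storage, then a per-job access object, then a per-checkpoint stream factory, then a state handle) can be illustrated with a small JDK-only model. This is a sketch under simplifying assumptions, not Flink code. Every `Toy*` type is hypothetical. The JobManager-style variant keeps the bytes inside the handle. The file-system variant writes each state blob to `<base>/<jobId>/chk-<checkpointId>/<stateName>`, a simplified take on a per-checkpoint directory layout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical handle: either the state bytes themselves or a pointer to a file.
final class ToyStateHandle {
    final byte[] inlineBytes; // non-null for JobManager-style storage
    final Path file;          // non-null for file-system-style storage

    private ToyStateHandle(byte[] inlineBytes, Path file) {
        this.inlineBytes = inlineBytes;
        this.file = file;
    }

    static ToyStateHandle ofBytes(byte[] data) { return new ToyStateHandle(data.clone(), null); }

    static ToyStateHandle ofFile(Path file) { return new ToyStateHandle(null, file); }
}

// Per checkpoint: writes one piece of state and returns a handle to it.
interface ToyStreamFactory {
    ToyStateHandle write(String stateName, byte[] data);
}

// Per job: resolves where a given checkpoint's state should go.
interface ToyStorageAccess {
    ToyStreamFactory resolveCheckpointStorageLocation(long checkpointId);
}

// Top level: configured once, creates one access object per job.
interface ToyCheckpointStorage {
    ToyStorageAccess createCheckpointStorage(String jobId);
}

// JobManager-style: no files are written, the bytes travel inside the handle.
final class ToyJobManagerStorage implements ToyCheckpointStorage {
    @Override
    public ToyStorageAccess createCheckpointStorage(String jobId) {
        return checkpointId -> (stateName, data) -> ToyStateHandle.ofBytes(data);
    }
}

// File-system-style: state is written to <base>/<jobId>/chk-<checkpointId>/<stateName>.
final class ToyFileSystemStorage implements ToyCheckpointStorage {
    private final Path baseDir;

    ToyFileSystemStorage(Path baseDir) { this.baseDir = baseDir; }

    @Override
    public ToyStorageAccess createCheckpointStorage(String jobId) {
        Path jobDir = mkdirs(baseDir.resolve(jobId));
        return checkpointId -> {
            Path chkDir = mkdirs(jobDir.resolve("chk-" + checkpointId));
            return (stateName, data) -> {
                Path target = chkDir.resolve(stateName);
                try {
                    Files.write(target, data);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return ToyStateHandle.ofFile(target); // only the path goes into the handle
            };
        };
    }

    private static Path mkdirs(Path dir) {
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Swapping `ToyJobManagerStorage` for `ToyFileSystemStorage` changes only what the handle contains: either the bytes themselves or a file path.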

Checkpoint Trigger Flow

Once the job transitions to RUNNING, the JobMaster sends TriggerCheckpoint requests to the SourceTasks through the CheckpointCoordinator.
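
The "keep triggering checkpoints on a timer" behavior (CheckpointCoordinator::scheduleTriggerWithDelay in the call stack that follows) can be sketched with a JDK `ScheduledExecutorService`. The sketch assumes that each trigger schedules the next one after a fixed interval. `ToyCheckpointScheduler` and its callback are hypothetical. The sketch also omits the checks a real coordinator makes before triggering, such as the limit on concurrent checkpoints and the minimum pause between checkpoints.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Hypothetical periodic trigger: each run triggers one checkpoint and then
// schedules the next run after the interval (a self-rescheduling one-shot timer).
final class ToyCheckpointScheduler {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long intervalMillis;
    private final LongConsumer triggerOnSources; // stands in for sending TriggerCheckpoint RPCs
    private long nextCheckpointId = 1;           // only touched on the single timer thread
    private volatile boolean running;

    ToyCheckpointScheduler(long intervalMillis, LongConsumer triggerOnSources) {
        this.intervalMillis = intervalMillis;
        this.triggerOnSources = triggerOnSources;
    }

    void start() {
        running = true;
        scheduleTriggerWithDelay(intervalMillis);
    }

    void stop() {
        running = false;
        timer.shutdownNow();
    }

    private void scheduleTriggerWithDelay(long delayMillis) {
        try {
            timer.schedule(this::triggerAndReschedule, delayMillis, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // stop() raced with rescheduling; the scheduler is shut down, nothing to do
        }
    }

    private void triggerAndReschedule() {
        if (!running) {
            return;
        }
        triggerOnSources.accept(nextCheckpointId++);
        scheduleTriggerWithDelay(intervalMillis);
    }

    // Demo helper: run until n checkpoints have been triggered, then stop.
    static List<Long> firstTriggeredIds(int n, long intervalMillis) {
        List<Long> ids = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch latch = new CountDownLatch(n);
        ToyCheckpointScheduler scheduler = new ToyCheckpointScheduler(intervalMillis, id -> {
            ids.add(id);
            latch.countDown();
        });
        scheduler.start();
        try {
            latch.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.stop();
        }
        synchronized (ids) {
            return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
        }
    }
}
```

Rescheduling after each run, rather than using a fixed-rate timer, means a slow trigger pushes the next one back instead of letting triggers pile up.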

Trigger flow on the JobMaster side

JobMaster::start //starts the RPC server
JobMaster::onStart
JobMaster::startJobExecution
JobMaster::startJobMasterServices //connects to the ResourceManager once its address is known
JobMaster::startScheduling
SchedulerBase::startScheduling
DefaultScheduler::startSchedulingInternal
SchedulerBase::transitionToRunning
	DefaultExecutionGraph::transitionToRunning //notifies the ExecutionGraph's listeners of the status change
		CheckpointCoordinatorDeActivator::jobStatusChanges //starts checkpointing
			CheckpointCoordinator::startCheckpointScheduler
				CheckpointCoordinator::scheduleTriggerWithDelay //keeps triggering checkpoints periodically
				CheckpointCoordinator::triggerCheckpoint
				CheckpointCoordinator::startTriggeringCheckpoint
				DefaultCheckpointPlanCalculator::calculateCheckpointPlan //the plan singles out the SourceTasks as the entry points for triggering the checkpoint
				CheckpointCoordinator::createPendingCheckpoint
				CheckpointCoordinator::triggerCheckpointRequest
				CheckpointCoordinator::triggerTasks
					Execution::triggerCheckpoint //sends a TriggerCheckpoint request to each SourceTask
					Execution::triggerCheckpointHelper
					TaskManagerGateway::triggerCheckpoint //sends the RPC to the TaskExecutor
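
The comment on DefaultCheckpointPlanCalculator::calculateCheckpointPlan says that the plan singles out the SourceTasks as the entry points for triggering. Here is a minimal model of that idea. It assumes the job is given as a map from each task to its upstream tasks. Tasks with no inputs receive the trigger RPC, and every task must acknowledge. `ToyPlanCalculator` and `ToyCheckpointPlan` are hypothetical names. The real calculator works on the ExecutionGraph and handles more cases than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical plan: who receives the trigger RPC, and who must acknowledge.
final class ToyCheckpointPlan {
    final List<String> tasksToTrigger; // tasks without upstream inputs (sources)
    final List<String> tasksToWaitFor; // every task in the job must ack

    ToyCheckpointPlan(List<String> tasksToTrigger, List<String> tasksToWaitFor) {
        this.tasksToTrigger = Collections.unmodifiableList(tasksToTrigger);
        this.tasksToWaitFor = Collections.unmodifiableList(tasksToWaitFor);
    }
}

final class ToyPlanCalculator {
    // inputsByTask maps each task to the tasks it consumes from.
    static ToyCheckpointPlan calculateCheckpointPlan(Map<String, List<String>> inputsByTask) {
        List<String> toTrigger = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : inputsByTask.entrySet()) {
            if (entry.getValue().isEmpty()) {
                toTrigger.add(entry.getKey()); // no upstream: barriers must start here
            }
        }
        List<String> toWaitFor = new ArrayList<>(inputsByTask.keySet());
        Collections.sort(toTrigger); // deterministic order, only for readability
        Collections.sort(toWaitFor);
        return new ToyCheckpointPlan(toTrigger, toWaitFor);
    }
}
```

Only the sources need an RPC. Every other task is reached by the barriers flowing through the data channels.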

Execution flow on the StreamTask side

SourceTask

A SourceTask is triggered directly by the JobMaster's RPC. It first broadcasts the CheckpointBarrier and then takes an asynchronous snapshot of its state.

TaskExecutor::triggerCheckpoint
Task::triggerCheckpointBarrier
AbstractInvokable::triggerCheckpointAsync
SourceStreamTask::triggerCheckpointAsync
StreamTask::triggerCheckpointAsync
StreamTask::triggerCheckpointAsyncInMailbox
StreamTask::performCheckpoint
SubtaskCheckpointCoordinatorImpl::checkpointState
	OperatorChain.broadcastEvent //broadcasts the CheckpointBarrier
CheckpointStorage::createCheckpointStorage //creates the CheckpointStorageAccess for the JobId
SubtaskCheckpointCoordinatorImpl::takeSnapshotSync
CheckpointStorageWorkerView::resolveCheckpointStorageLocation //the CheckpointStorageAccess creates a CheckpointStreamFactory
	OperatorChain::snapshotState //for each operator
		RegularOperatorChain::buildOperatorSnapshotFutures
		RegularOperatorChain::checkpointStreamOperator
			AbstractStreamOperator::snapshotState
			StreamOperatorStateHandler::snapshotState //invokes snapshot on the operator and keyed state backends
				StateSnapshotContextSynchronousImpl::new
				AbstractUdfStreamOperator::snapshotState //calls the UDF's snapshotState method, typically used to update operator state
				DefaultOperatorStateBackend::snapshot
					SnapshotStrategyRunner::snapshot
						DefaultOperatorStateBackendSnapshotStrategy::syncPrepareResources //deep-copies the operator state so the snapshot can then proceed asynchronously
						DefaultOperatorStateBackendSnapshotStrategy::asyncSnapshot //asynchronous snapshot
							CheckpointStateOutputStream::closeAndGetHandle
							OperatorStreamStateHandle::new //wraps the metadata and the data StreamStateHandle

				HeapKeyedStateBackend::snapshot
					SnapshotStrategyRunner::snapshot
						HeapSnapshotStrategy::syncPrepareResources
						HeapSnapshotStrategy::asyncSnapshot //asynchronous snapshot via the copy-on-write CopyOnWriteStateTable
							CheckpointStateOutputStream::closeAndGetHandle
							KeyGroupsStateHandle::new //wraps the key groups and the data StreamStateHandle
SubtaskCheckpointCoordinatorImpl::finishAndReportAsync //reports the checkpoint result to the JobMaster
	AsyncCheckpointRunnable::new 
	AsyncCheckpointRunnable::run
		AsyncCheckpointRunnable::finalizeNonFinishedSnapshots
			OperatorSnapshotFinalizer::new //waits until the task snapshot's state has been fully serialized
		AsyncCheckpointRunnable::reportCompletedSnapshotStates
			TaskStateManagerImpl::reportTaskStateSnapshots
				RpcCheckpointResponder::acknowledgeCheckpoint //sends an Ack carrying the state information to the JobMaster
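
The stack above splits each backend snapshot into two steps. The synchronous step is syncPrepareResources, which for operator state makes a deep copy. The asynchronous step is asyncSnapshot. A minimal sketch shows why that split is safe, assuming an operator whose state is a map of immutable values. The copy is taken on the task thread, serialization runs on another executor, and updates made after the copy cannot leak into the snapshot. `ToySnapshottingOperator` is hypothetical, and building a string stands in for writing to a CheckpointStateOutputStream.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical operator whose state is a map of immutable Long values.
final class ToySnapshottingOperator {
    private final Map<String, Long> state = new HashMap<>();

    void put(String key, long value) {
        state.put(key, value);
    }

    // Sync part, on the caller (task) thread: copy the state.
    // Async part, on asyncExecutor: "serialize" the copy.
    // The task may keep mutating `state` while the future runs.
    CompletableFuture<String> snapshotState(long checkpointId, Executor asyncExecutor) {
        Map<String, Long> copy = new HashMap<>(state); // Long is immutable, so this is a full copy
        return CompletableFuture.supplyAsync(() -> serialize(checkpointId, copy), asyncExecutor);
    }

    private static String serialize(long checkpointId, Map<String, Long> snapshot) {
        // TreeMap only makes the printed order deterministic.
        return "chk-" + checkpointId + " " + new TreeMap<>(snapshot);
    }
}
```

If the copy were skipped and the async step read `state` directly, the `a=100` update in a usage like the test below could race into checkpoint 1.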

Non-source tasks

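After a StreamTask starts, it repeatedly calls StreamTask::processInput to read and process data. A non-source task triggers its checkpoint once the CheckpointBarriers from its upstream inputs have all arrived and are aligned: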

StreamTask::processInput
StreamOneInputProcessor::processInput
StreamTaskNetworkInput::emitNext(StreamTaskNetworkOutput)
AbstractStreamTaskNetworkInput::emitNext //loops, reading StreamElements from the buffer and processing them
	CheckpointedInputGate::pollNext
	CheckpointedInputGate::handleEvent
		SingleCheckpointBarrierHandler::processBarrier
		SingleCheckpointBarrierHandler::markCheckpointAlignedAndTransformState
			WaitingForFirstBarrier::barrierReceived
			AbstractAlignedBarrierHandlerState::barrierReceived
				SingleCheckpointBarrierHandler.ControllerImpl::allBarriersReceived //checks whether alignment is complete
				AbstractAlignedBarrierHandlerState::triggerGlobalCheckpoint
					SingleCheckpointBarrierHandler.ControllerImpl::triggerGlobalCheckpoint
					SingleCheckpointBarrierHandler::triggerCheckpoint
						CheckpointBarrierHandler::notifyCheckpoint //triggers the StreamTask checkpoint
							StreamTask::triggerCheckpointOnBarrier
								StreamTask::performCheckpoint //the remaining calls are the same as for a SourceTask
								SubtaskCheckpointCoordinatorImpl::checkpointState
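
The alignment traced above boils down to two rules. Block each input channel once its barrier arrives. Trigger the checkpoint when every channel has delivered its barrier. The single-threaded sketch below follows those rules. `ToyBarrierAligner` is hypothetical, and it leaves out unaligned checkpoints, cancellation barriers and alignment timeouts. It also simply drops an unfinished alignment when a barrier for a newer checkpoint shows up.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongConsumer;

// Hypothetical aligned-barrier tracker for one task with numChannels input channels.
final class ToyBarrierAligner {
    private final int numChannels;
    private final LongConsumer triggerCheckpoint; // stands in for StreamTask::triggerCheckpointOnBarrier
    private final Set<Integer> blockedChannels = new HashSet<>();
    private long aligningCheckpointId = -1;
    private long lastTriggeredCheckpointId = -1;

    ToyBarrierAligner(int numChannels, LongConsumer triggerCheckpoint) {
        this.numChannels = numChannels;
        this.triggerCheckpoint = triggerCheckpoint;
    }

    // During alignment, records from a channel that already sent its barrier must wait.
    boolean isBlocked(int channel) {
        return blockedChannels.contains(channel);
    }

    void barrierReceived(int channel, long checkpointId) {
        if (checkpointId <= lastTriggeredCheckpointId || checkpointId < aligningCheckpointId) {
            return; // stale or duplicate barrier: ignored in this sketch
        }
        if (checkpointId > aligningCheckpointId) {
            // First barrier of a newer checkpoint; any unfinished older alignment is dropped.
            aligningCheckpointId = checkpointId;
            blockedChannels.clear();
        }
        blockedChannels.add(channel);
        if (blockedChannels.size() == numChannels) {
            // All barriers received: aligned. Take the snapshot, then resume all inputs.
            lastTriggeredCheckpointId = checkpointId;
            triggerCheckpoint.accept(checkpointId);
            blockedChannels.clear();
        }
    }
}
```

Blocking the channels that have already delivered their barrier is what keeps post-barrier records out of the snapshot. That is why an aligned checkpoint reflects exactly the records that arrived before the barrier.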

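The call stacks show that a non-source StreamTask differs from a SourceTask only in when its checkpoint is triggered. A SourceTask is triggered periodically by the JobMaster's RPC, whereas a non-source task is triggered once the CheckpointBarriers from upstream are aligned. In both cases the task ultimately writes the current operators' state to the CheckpointStorage and then sends an acknowledgement to the JobMaster.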

The ACK a StreamTask sends to the JobMaster contains the state metadata and StreamStateHandles, which differ according to where the state is stored:

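  • ByteStreamStateHandle //corresponds to JobManagerCheckpointStorage; the state is serialized into a byte[] and sent to the JobMaster
  • FileStateHandle //corresponds to FileSystemCheckpointStorage; the state is written to the file system and only the file path is sent to the JobMaster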

Completion flow on the JobMaster side

After the JobMaster receives a StreamTask's acknowledgeCheckpoint:

JobMaster::acknowledgeCheckpoint
SchedulerBase::acknowledgeCheckpoint
ExecutionGraphHandler::acknowledgeCheckpoint
CheckpointCoordinator::receiveAcknowledgeMessage
	PendingCheckpoint::acknowledgeTask //acknowledgement from a single task
		PendingCheckpoint::updateOperatorState //updates the SubTask's state information
	CheckpointCoordinator::completePendingCheckpoint //once all tasks have acknowledged
		PendingCheckpoint::finalizeCheckpoint
			Checkpoints.storeCheckpointMetadata //stores the CheckpointMetadata
				CompletedCheckpoint::new
		CheckpointCoordinator::sendAcknowledgeMessages //notifies the tasks that the checkpoint is complete
			ExecutionVertex::notifyCheckpointComplete
				TaskManagerGateway.notifyCheckpointComplete

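Once the JobMaster has received the checkpoint state of every StreamTask, the checkpoint is complete. The JobMaster then notifies the StreamTasks that the checkpoint has completed, so that SubTasks listening for checkpoint completion can act on it (for example, a two-phase-commit sink commits its pending transaction at this point).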
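
The acknowledgement bookkeeping in the JobMaster-side stack can be modeled with two small classes. The first is a pending checkpoint that tracks which tasks still owe an ack. The second is a coordinator that, once that set is empty, records the collected state handles as "metadata" and notifies every task. Both classes are hypothetical. The string handles stand in for real StreamStateHandles, and the in-memory map stands in for Checkpoints.storeCheckpointMetadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical pending checkpoint: remembers which tasks still owe an acknowledgement.
final class ToyPendingCheckpoint {
    final List<String> tasks;
    final Map<String, String> stateHandlesByTask = new LinkedHashMap<>();
    private final Set<String> notYetAcknowledged;

    ToyPendingCheckpoint(List<String> tasks) {
        this.tasks = new ArrayList<>(tasks);
        this.notYetAcknowledged = new HashSet<>(tasks);
    }

    // Records one task's ack and state handle; unknown or duplicate acks are ignored.
    void acknowledgeTask(String task, String stateHandle) {
        if (notYetAcknowledged.remove(task)) {
            stateHandlesByTask.put(task, stateHandle);
        }
    }

    boolean isFullyAcknowledged() {
        return notYetAcknowledged.isEmpty();
    }
}

// Hypothetical coordinator: completes a checkpoint once every task has acknowledged it.
final class ToyAckCoordinator {
    private final Map<Long, ToyPendingCheckpoint> pending = new HashMap<>();
    // Stand-in for stored checkpoint metadata: checkpoint id -> (task -> state handle).
    final Map<Long, Map<String, String>> completed = new LinkedHashMap<>();
    private final BiConsumer<String, Long> notifyCheckpointComplete;

    ToyAckCoordinator(BiConsumer<String, Long> notifyCheckpointComplete) {
        this.notifyCheckpointComplete = notifyCheckpointComplete;
    }

    void startCheckpoint(long checkpointId, List<String> tasksToWaitFor) {
        pending.put(checkpointId, new ToyPendingCheckpoint(tasksToWaitFor));
    }

    void receiveAcknowledgeMessage(long checkpointId, String task, String stateHandle) {
        ToyPendingCheckpoint checkpoint = pending.get(checkpointId);
        if (checkpoint == null) {
            return; // unknown or already completed checkpoint
        }
        checkpoint.acknowledgeTask(task, stateHandle);
        if (checkpoint.isFullyAcknowledged()) {
            pending.remove(checkpointId);
            completed.put(checkpointId, checkpoint.stateHandlesByTask); // "store the metadata"
            for (String t : checkpoint.tasks) {
                notifyCheckpointComplete.accept(t, checkpointId); // "notify each task"
            }
        }
    }
}
```

Tasks are notified only after the metadata is recorded, so a task that reacts to the notification can rely on the checkpoint being complete.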
