[Source Code Analysis] Reading the zeebe Actor Model Source Code

The zeebe actor model🙋‍♂️

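If you have read the zeebe source code, you have almost certainly come across calls such as actor.run(). This article uses the actor.run method as its entry point to explain zeebe's actor model.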

Environment⛅

zeebe release-8.1.14

How actor.run() starts🌈

LongPollingActivateJobsHandler.java

Take the job activation method of LongPollingActivateJobsHandler as an example. Its run call is actually the run method of the ActorControl class, so let's step into that method.

    private ActorControl actor;

    public void activateJobs(final InflightActivateJobsRequest request) {
        actor.run(
                () -> {
                    final InFlightLongPollingActivateJobsRequestsState state =
                            getJobTypeState(request.getType());

                    if (state.shouldAttempt(failedAttemptThreshold)) {
                        activateJobsUnchecked(state, request);
                    } else {
                        completeOrResubmitRequest(request, false);
                    }
                });
    }

ActorControl

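scheduleRunnable builds an ActorJob and then adds it to the ActorTask, either with insertJob or with submit. At this point actor.run is effectively finished, because both methods mainly just add the job to one of the task's job queues. The job itself only executes later, once a thread working through the queue reaches it.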

    final ActorTask task;
	
    @Override
    public void run(final Runnable action) {
        scheduleRunnable(action);
    }
    private void scheduleRunnable(final Runnable runnable) {
        final ActorThread currentActorThread = ActorThread.current();

        if (currentActorThread != null && currentActorThread.getCurrentTask() == task) {
            final ActorJob newJob = currentActorThread.newJob();
            newJob.setRunnable(runnable);
            newJob.onJobAddedToTask(task);

            // already on this actor's thread: insert into the execution queue
            task.insertJob(newJob);
        } else {
            final ActorJob job = new ActorJob();
            job.setRunnable(runnable);
            job.onJobAddedToTask(task);

            // called from another thread: submit to the external queue
            // submit also ends up placing the task in the thread group
            task.submit(job);
        }
    }
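
The two branches above can be modelled in a few lines. This is a minimal sketch and not zeebe code: `MiniTask`, `owner`, and `schedule` are hypothetical names, and the "am I on the actor's own thread" check is approximated by comparing `Thread.currentThread()` with an `owner` field.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified stand-in for ActorTask (not the real zeebe class).
final class MiniTask {
  // jobs added by the thread that is currently executing this task
  final Deque<Runnable> fastLaneJobs = new ArrayDeque<>();
  // jobs added by any other thread, so this queue has to be thread-safe
  final Queue<Runnable> submittedJobs = new ConcurrentLinkedQueue<>();
  // the thread currently executing this task, or null if there is none
  volatile Thread owner;

  // Returns the name of the queue the job ended up in.
  String schedule(final Runnable job) {
    if (Thread.currentThread() == owner) {
      fastLaneJobs.add(job);
      return "fast-lane";
    }
    submittedJobs.add(job);
    return "submitted";
  }
}
```

In this sketch, scheduling only enqueues the job. Nothing runs until some worker drains the queues, which matches the observation that actor.run returns before the job executes.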

How a job gets executed⚡

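Not every ActorControl can call run. As the figure above shows, an Actor has to be registered at the start of the broker lifecycle, which means the task inside ActorControl is registered in the taskQueues. The "thread pool" then keeps popping tasks from the taskQueues. Each task holds multiple jobs, and a job is picked for execution according to a strategy. A job can be thought of as the runnable in actor.run(Runnable runnable).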

Gateway.java

The gateway registers the task

  
  private CompletableFuture<ActivateJobsHandler> submitActorToActivateJobs(
      final ActivateJobsHandler handler) {
    final var future = new CompletableFuture<ActivateJobsHandler>();
    final var actor =
        Actor.newActor()
            .name("ActivateJobsHandler")
            .actorStartedHandler(handler.andThen(t -> future.complete(handler)))
            .build();
	
    // register the task in the TaskQueues
    actorSchedulingService.submitActor(actor);
    return future;
  }

ActorThreadGroup.java

This is the "thread pool" mentioned above. It initializes every ActorThread and assigns each one a default WorkStealingGroup.

    protected final String groupName;
    protected final ActorThread[] threads;
    protected final WorkStealingGroup tasks;
    protected final int numOfThreads;

    // constructor: initializes every thread and assigns it a default WorkStealingGroup task queue
    public ActorThreadGroup(
            final String groupName, final int numOfThreads, final ActorSchedulerBuilder builder) {
        this.groupName = groupName;
        this.numOfThreads = numOfThreads;

        tasks = new WorkStealingGroup(numOfThreads);

        threads = new ActorThread[numOfThreads];

        for (int t = 0; t < numOfThreads; t++) {
            final String threadName = String.format("%s-%d", groupName, t);
            final ActorThread thread =
                    builder
                            .getActorThreadFactory()
                            .newThread(
                                    threadName,
                                    t,
                                    this,
                                    tasks,
                                    builder.getActorClock(),
                                    builder.getActorTimerQueue(),
                                    builder.isMetricsEnabled());

            threads[t] = thread;
        }
    }
	
	// start
    public void start() {
        for (final ActorThread actorThread : threads) {

            // start every ActorThread
            actorThread.start();
        }
    }

ActorThread.java

ActorThread extends Thread, and the call chain is start => run => doWork. doWork first fetches the current task from taskScheduler and then executes it.

    // overrides Thread.start()
    @Override
    public synchronized void start() {
        if (UNSAFE.compareAndSwapObject(
                this, STATE_OFFSET, ActorThreadState.NEW, ActorThreadState.RUNNING)) {
            
            // super.start() ends up invoking the run method below
            super.start();
        } else {
            throw new IllegalStateException("Cannot start runner, not in state 'NEW'.");
        }
    }

    // mainly calls doWork in a loop
    @Override
    public void run() {
        idleStrategy.init();
		
        while (state == ActorThreadState.RUNNING) {
            try {
                doWork();
            } catch (final Exception e) {
                LOG.error("Unexpected error occurred while in the actor thread {}", getName(), e);
            }
        }

        state = ActorThreadState.TERMINATED;

        terminationFuture.complete(null);
    }
    private void doWork() {
        submittedCallbacks.drain(this);

        if (clock.update()) {
            timerJobQueue.processExpiredTimers(clock);
        }

        // fetch the current task from taskScheduler
        currentTask = taskScheduler.getNextTask();

        if (currentTask != null) {
            final var actorName = currentTask.actor.getName();
            try (final var timer = actorMetrics.startExecutionTimer(actorName)) {

                // execute the current task
                executeCurrentTask();
            }
            if (actorMetrics.isEnabled()) {
                actorMetrics.updateJobQueueLength(actorName, currentTask.estimateQueueLength());
                actorMetrics.countExecution(actorName);
            }
        } else {
            idleStrategy.onIdle();
        }
    }

    private void executeCurrentTask() {
        final var properties = currentTask.getActor().getContext();
        MDC.setContextMap(properties);
        idleStrategy.onTaskExecuted();

        boolean resubmit = false;

        try {
            // this is where the current task actually executes
            resubmit = currentTask.execute(this);
        } catch (final Throwable e) {
            FATAL_ERROR_HANDLER.handleError(e);
            LOG.error("Unexpected error occurred in task {}", currentTask, e);
        } finally {
            MDC.remove("actor-name");
            clock.update();
        }

        if (resubmit) {
            currentTask.resubmit();
        }
    }
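
The start => run => doWork chain can be reduced to the following sketch. It is an illustration under simplifying assumptions, not the zeebe implementation: `MiniActorThread`, `shutdown`, and `demo` are hypothetical names, a shared `ConcurrentLinkedQueue` stands in for the WorkStealingGroup, and `Thread.onSpinWait()` stands in for the idle strategy.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker thread that mimics the run/doWork loop.
final class MiniActorThread extends Thread {
  private final Queue<Runnable> tasks;
  private volatile boolean running = true;

  MiniActorThread(final String name, final Queue<Runnable> tasks) {
    super(name);
    this.tasks = tasks;
  }

  @Override
  public void run() {
    while (running) {
      doWork();
    }
  }

  private void doWork() {
    final Runnable task = tasks.poll(); // fetch the next task, if any
    if (task != null) {
      task.run();
    } else {
      Thread.onSpinWait(); // stand-in for idleStrategy.onIdle()
    }
  }

  void shutdown() {
    running = false;
  }

  // Starts numOfThreads workers, waits until all tasks ran, then stops the workers.
  static int demo(final int numOfThreads, final int numOfTasks) {
    final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    final AtomicInteger executed = new AtomicInteger();
    for (int i = 0; i < numOfTasks; i++) {
      tasks.add(executed::incrementAndGet);
    }
    final MiniActorThread[] threads = new MiniActorThread[numOfThreads];
    for (int t = 0; t < numOfThreads; t++) {
      threads[t] = new MiniActorThread("mini-" + t, tasks);
      threads[t].start();
    }
    while (executed.get() < numOfTasks) {
      Thread.onSpinWait();
    }
    try {
      for (final MiniActorThread thread : threads) {
        thread.shutdown();
        thread.join();
      }
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return executed.get();
  }
}
```

Calling `MiniActorThread.demo(2, 10)` runs ten tasks on two workers. Which worker picks up which task is not deterministic, only the total is.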

ActorTask.java

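This is the execution flow of ActorTask. It keeps pulling jobs from its queues and subscriptions, and poll updates currentJob. If a job was already taken from fastLaneJobs during the loop, currentJob != null short-circuits to true and poll() is not called. This is where the difference between submittedJobs and fastLaneJobs shows!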

Once a job has been found, it is executed.

    public boolean execute(final ActorThread runner) {
        schedulingState.set(TaskSchedulingState.ACTIVE);

        boolean resubmit = false;

        // keep pulling jobs; poll() updates currentJob. If a job was already taken from
        // fastLaneJobs in this loop, currentJob != null short-circuits to true and
        // poll() is not called
        while (!resubmit && (currentJob != null || poll())) {
            currentJob.execute(runner);

            switch (currentJob.schedulingState) {
                case TERMINATED -> {
                    final ActorJob terminatedJob = currentJob;

                    // pull the next job from the fastLaneJobs queue
                    currentJob = fastLaneJobs.poll();

                    // if the job was triggered by a subscription
                    if (terminatedJob.isTriggeredBySubscription()) {
                        final ActorSubscription subscription = terminatedJob.getSubscription();

                        // a one-shot subscription is removed once it has fired
                        if (!subscription.isRecurring()) {
                            removeSubscription(subscription);
                        }

                        // run the subscription's completion callback
                        subscription.onJobCompleted();
                    } else {
                      
                        runner.recycleJob(terminatedJob);
                    }
                }
                case QUEUED ->
                    // the task is experiencing backpressure: do not retry it right now, instead re-enqueue
                    // the actor task.
                    // this allows other tasks which may be needed to unblock the backpressure to run
                        resubmit = true;
                default -> {
                }
            }

            if (shouldYield) {
                shouldYield = false;
                resubmit = currentJob != null;
                break;
            }
        }

        if (currentJob == null) {
            resubmit = onAllJobsDone();
        }

        return resubmit;
    }
    private boolean poll() {
        boolean result = false;

        result |= pollSubmittedJobs();
        result |= pollSubscriptions();

        return result;
    }
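
The effect of the `currentJob != null || poll()` short-circuit on execution order can be seen in a stripped-down model. This is a sketch with hypothetical names (`DrainOrderSketch`, `demo`); it leaves out subscriptions, yielding, and resubmission and keeps only the two queues.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of the job loop: fast-lane jobs run before the next submitted job.
final class DrainOrderSketch {
  final Queue<Runnable> fastLaneJobs = new ArrayDeque<>();
  final Queue<Runnable> submittedJobs = new ArrayDeque<>();
  Runnable currentJob;

  // only consulted when no fast-lane job is pending
  boolean poll() {
    currentJob = submittedJobs.poll();
    return currentJob != null;
  }

  void execute() {
    while (currentJob != null || poll()) {
      currentJob.run();
      // after each job, the fast lane is checked first
      currentJob = fastLaneJobs.poll();
    }
  }

  static List<String> demo() {
    final DrainOrderSketch task = new DrainOrderSketch();
    final List<String> order = new ArrayList<>();
    task.submittedJobs.add(
        () -> {
          order.add("A");
          // scheduled from inside a running job, so it goes to the fast lane
          task.fastLaneJobs.add(() -> order.add("A-followup"));
        });
    task.submittedJobs.add(() -> order.add("B"));
    task.execute();
    return order;
  }
}
```

`DrainOrderSketch.demo()` returns `[A, A-followup, B]`: the follow-up scheduled by job A runs before job B, even though B was submitted earlier.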

ActorJob.java

The execution logic of ActorJob

Remember that an ActorJob can be thought of as the runnable? invoke is where the runnable held by the ActorJob actually runs, and that completes the execution of the job.

    void execute(final ActorThread runner) {
        actorThread = runner;
        observeSchedulingLatency(runner.getActorMetrics());
        try {

            // run the actor's callable or runnable
            invoke();

            if (resultFuture != null) {
                resultFuture.complete(invocationResult);
                resultFuture = null;
            }

        } catch (final Throwable e) {
            FATAL_ERROR_HANDLER.handleError(e);
            task.onFailure(e);
        } finally {
            actorThread = null;

            // in any case, success or exception, decide if the job should be resubmitted
            if (isTriggeredBySubscription() || runnable == null) {
                schedulingState = TaskSchedulingState.TERMINATED;
            } else {
                schedulingState = TaskSchedulingState.QUEUED;
                scheduledAt = System.nanoTime();
            }
        }
    }
    
    private void invoke() throws Exception {
        if (callable != null) {
            invocationResult = callable.call();
        } else {
            // only tasks triggered by a subscription can "yield"; everything else just executes once
            if (!isTriggeredBySubscription()) {
                final Runnable r = runnable;
                runnable = null;
                r.run();
            } else {
                // the runnable actually runs here
                runnable.run();
            }
        }
    }
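
The "execute once" rule in invoke, together with the state decision in the finally block, can be isolated as follows. This is a simplified sketch with hypothetical names (`RunOnceSketch`, `State`); it ignores callables, futures, and error handling.

```java
// Hypothetical model of a job: plain jobs drop their runnable before running it,
// subscription-triggered jobs keep it so they can fire again.
final class RunOnceSketch {
  enum State {
    QUEUED,
    TERMINATED
  }

  Runnable runnable;
  final boolean triggeredBySubscription;
  State state = State.QUEUED;

  RunOnceSketch(final Runnable runnable, final boolean triggeredBySubscription) {
    this.runnable = runnable;
    this.triggeredBySubscription = triggeredBySubscription;
  }

  void execute() {
    try {
      if (!triggeredBySubscription) {
        final Runnable r = runnable;
        runnable = null; // cleared before running, so a plain job executes once
        r.run();
      } else {
        runnable.run();
      }
    } finally {
      // same decision as in the excerpt above, made whether or not the job threw
      state =
          (triggeredBySubscription || runnable == null) ? State.TERMINATED : State.QUEUED;
    }
  }
}
```

In this model a plain job ends up TERMINATED with `runnable == null`, while a subscription-triggered job also ends up TERMINATED but keeps its runnable for the next trigger.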

Summary📝

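The activation example in this article only illustrates how the Actor is implemented. On reflection, what is described here could be handled perfectly well by an ordinary thread pool. The actor model offers far more than this, though, and how zeebe implements those other features is left for the reader to explore🤏~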
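
To make the thread pool comparison concrete, here is a rough sketch of the same "hand a runnable to the actor" idea built on the JDK's single-threaded executor. `ExecutorActorSketch` and its methods are hypothetical names, and the sketch covers only serialized execution of submitted runnables, none of zeebe's subscriptions, timers, or fast lane.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical actor-like wrapper: all state changes run on one executor thread.
final class ExecutorActorSketch {
  private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
  private int counter; // only read and written on the mailbox thread

  // analogous to actor.run(Runnable): enqueue and return immediately
  void run(final Runnable action) {
    mailbox.execute(action);
  }

  void increment() {
    run(() -> counter++);
  }

  // reads the counter on the mailbox thread, after all earlier actions
  int get() {
    try {
      return mailbox.submit(() -> counter).get();
    } catch (final InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  void close() {
    mailbox.shutdown();
  }
}
```

Because every action runs on the same thread in submission order, `counter` needs no lock, which is the property that actor.run provides for actor state.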

The zeebe team really loves functional programming. Tracing the call chain of a single method is enough to give you a headache😅