聊聊PowerJob的MapReduceProcessor

本文主要研究一下PowerJob的MapReduceProcessor

MapReduceProcessor

复制代码
public interface MapReduceProcessor extends MapProcessor {

    /**
     * reduce方法将在所有任务结束后调用
     * @param context 任务上下文
     * @param taskResults 保存了各个子Task的执行结果
     * @return reduce产生的结果将作为任务最终的返回结果
     */
    ProcessResult reduce(TaskContext context, List<TaskResult> taskResults);
}

MapReduceProcessor继承了MapProcessor,它新增了reduce方法

TaskResult

tech/powerjob/worker/core/processor/TaskResult.java

复制代码
@Data
public class TaskResult {

    private String taskId;
    private boolean success;
    private String result;

}

TaskResult定义了taskId、success、result属性

handleLastTask

tech/powerjob/worker/core/processor/runnable/HeavyProcessorRunnable.java

复制代码
    private void handleLastTask(String taskId, Long instanceId, TaskContext taskContext, ExecuteType executeType) {
        final BasicProcessor processor = processorBean.getProcessor();
        ProcessResult processResult;
        Stopwatch stopwatch = Stopwatch.createStarted();
        log.debug("[ProcessorRunnable-{}] the last task(taskId={}) start to process.", instanceId, taskId);

        List<TaskResult> taskResults = workerRuntime.getTaskPersistenceService().getAllTaskResult(instanceId, task.getSubInstanceId());
        try {
            switch (executeType) {
                case BROADCAST:

                    if (processor instanceof BroadcastProcessor) {
                        BroadcastProcessor broadcastProcessor = (BroadcastProcessor) processor;
                        processResult = broadcastProcessor.postProcess(taskContext, taskResults);
                    } else {
                        processResult = BroadcastProcessor.defaultResult(taskResults);
                    }
                    break;
                case MAP_REDUCE:

                    if (processor instanceof MapReduceProcessor) {
                        MapReduceProcessor mapReduceProcessor = (MapReduceProcessor) processor;
                        processResult = mapReduceProcessor.reduce(taskContext, taskResults);
                    } else {
                        processResult = new ProcessResult(false, "not implement the MapReduceProcessor");
                    }
                    break;
                default:
                    processResult = new ProcessResult(false, "IMPOSSIBLE OR BUG");
            }
        } catch (Throwable e) {
            processResult = new ProcessResult(false, e.toString());
            log.warn("[ProcessorRunnable-{}] execute last task(taskId={}) failed.", instanceId, taskId, e);
        }

        TaskStatus status = processResult.isSuccess() ? TaskStatus.WORKER_PROCESS_SUCCESS : TaskStatus.WORKER_PROCESS_FAILED;
        reportStatus(status, suit(processResult.getMsg()), null, taskContext.getWorkflowContext().getAppendedContextData());

        log.info("[ProcessorRunnable-{}] the last task execute successfully, using time: {}", instanceId, stopwatch);
    }

HeavyProcessorRunnable的handleLastTask方法先通过workerRuntime.getTaskPersistenceService().getAllTaskResult获取taskResults,然后对于MapReduceProcessor则回调mapReduceProcessor.reduce方法

getAllTaskResult

tech/powerjob/worker/persistence/TaskPersistenceService.java

复制代码
    public List<TaskResult> getAllTaskResult(Long instanceId, Long subInstanceId) {
        try {
            return execute(() -> taskDAO.getAllTaskResult(instanceId, subInstanceId));
        }catch (Exception e) {
            log.error("[TaskPersistenceService] getTaskId2ResultMap for instance(id={}) failed.", instanceId, e);
        }
        return Lists.newLinkedList();
    }

TaskPersistenceService的getAllTaskResult方法根据instanceId, subInstanceId查询task_info表select task_id, status, result from task_info where instance_id = ? and sub_instance_id = ?,最后只返回状态是WORKER_PROCESS_SUCCESS或者WORKER_PROCESS_FAILED的任务信息

小结

MapReduceProcessor继承了MapProcessor,它新增了reduce方法;HeavyProcessorRunnable的handleLastTask方法先通过workerRuntime.getTaskPersistenceService().getAllTaskResult获取taskResults,然后对于MapReduceProcessor则回调mapReduceProcessor.reduce方法;getAllTaskResult方法根据instanceId, subInstanceId查询task_info表返回状态是WORKER_PROCESS_SUCCESS或者WORKER_PROCESS_FAILED的任务信息(task_info表只在worker节点上),默认是h2(~/powerjob/worker/h2/{uuid}/powerjob_worker_db.mv.db)

相关推荐
lly20240622 分钟前
C 标准库 - `<stdio.h>`
开发语言
沫璃染墨24 分钟前
C++ string 从入门到精通:构造、迭代器、容量接口全解析
c语言·开发语言·c++
jwn99925 分钟前
Laravel6.x核心特性全解析
开发语言·php·laravel
迷藏49427 分钟前
**发散创新:基于Rust实现的开源合规权限管理框架设计与实践**在现代软件架构中,**权限控制(RBAC)** 已成为保障
java·开发语言·python·rust·开源
功德+n1 小时前
Linux下安装与配置Docker完整详细步骤
linux·运维·服务器·开发语言·docker·centos
明日清晨1 小时前
python扫码登录dy
开发语言·python
我是唐青枫1 小时前
C#.NET gRPC 深入解析:Proto 定义、流式调用与服务间通信取舍
开发语言·c#·.net
JJay.1 小时前
Kotlin 高阶函数学习指南
android·开发语言·kotlin
bazhange1 小时前
python如何像matlab一样使用向量化替代for循环
开发语言·python·matlab
jinanwuhuaguo1 小时前
截止到4月8日,OpenClaw 2026年4月更新深度解读剖析:从“能力回归”到“信任内建”的范式跃迁
android·开发语言·人工智能·深度学习·kotlin