聊聊PowerJob的MapReduceProcessor

本文主要研究一下PowerJob的MapReduceProcessor

MapReduceProcessor

public interface MapReduceProcessor extends MapProcessor {

    /**
     * reduce方法将在所有任务结束后调用
     * @param context 任务上下文
     * @param taskResults 保存了各个子Task的执行结果
     * @return reduce产生的结果将作为任务最终的返回结果
     */
    ProcessResult reduce(TaskContext context, List<TaskResult> taskResults);
}

MapReduceProcessor继承了MapProcessor,它新增了reduce方法

TaskResult

tech/powerjob/worker/core/processor/TaskResult.java

@Data
public class TaskResult {

    private String taskId;
    private boolean success;
    private String result;

}

TaskResult定义了taskId、success、result属性

handleLastTask

tech/powerjob/worker/core/processor/runnable/HeavyProcessorRunnable.java

    private void handleLastTask(String taskId, Long instanceId, TaskContext taskContext, ExecuteType executeType) {
        final BasicProcessor processor = processorBean.getProcessor();
        ProcessResult processResult;
        Stopwatch stopwatch = Stopwatch.createStarted();
        log.debug("[ProcessorRunnable-{}] the last task(taskId={}) start to process.", instanceId, taskId);

        List<TaskResult> taskResults = workerRuntime.getTaskPersistenceService().getAllTaskResult(instanceId, task.getSubInstanceId());
        try {
            switch (executeType) {
                case BROADCAST:

                    if (processor instanceof BroadcastProcessor) {
                        BroadcastProcessor broadcastProcessor = (BroadcastProcessor) processor;
                        processResult = broadcastProcessor.postProcess(taskContext, taskResults);
                    } else {
                        processResult = BroadcastProcessor.defaultResult(taskResults);
                    }
                    break;
                case MAP_REDUCE:

                    if (processor instanceof MapReduceProcessor) {
                        MapReduceProcessor mapReduceProcessor = (MapReduceProcessor) processor;
                        processResult = mapReduceProcessor.reduce(taskContext, taskResults);
                    } else {
                        processResult = new ProcessResult(false, "not implement the MapReduceProcessor");
                    }
                    break;
                default:
                    processResult = new ProcessResult(false, "IMPOSSIBLE OR BUG");
            }
        } catch (Throwable e) {
            processResult = new ProcessResult(false, e.toString());
            log.warn("[ProcessorRunnable-{}] execute last task(taskId={}) failed.", instanceId, taskId, e);
        }

        TaskStatus status = processResult.isSuccess() ? TaskStatus.WORKER_PROCESS_SUCCESS : TaskStatus.WORKER_PROCESS_FAILED;
        reportStatus(status, suit(processResult.getMsg()), null, taskContext.getWorkflowContext().getAppendedContextData());

        log.info("[ProcessorRunnable-{}] the last task execute successfully, using time: {}", instanceId, stopwatch);
    }

HeavyProcessorRunnable的handleLastTask方法先通过workerRuntime.getTaskPersistenceService().getAllTaskResult获取taskResults,然后对于MapReduceProcessor则回调mapReduceProcessor.reduce方法

getAllTaskResult

tech/powerjob/worker/persistence/TaskPersistenceService.java

    public List<TaskResult> getAllTaskResult(Long instanceId, Long subInstanceId) {
        try {
            return execute(() -> taskDAO.getAllTaskResult(instanceId, subInstanceId));
        }catch (Exception e) {
            log.error("[TaskPersistenceService] getTaskId2ResultMap for instance(id={}) failed.", instanceId, e);
        }
        return Lists.newLinkedList();
    }

TaskPersistenceService的getAllTaskResult方法根据instanceId, subInstanceId查询task_info表select task_id, status, result from task_info where instance_id = ? and sub_instance_id = ?,最后只返回状态是WORKER_PROCESS_SUCCESS或者WORKER_PROCESS_FAILED的任务信息

小结

MapReduceProcessor继承了MapProcessor,它新增了reduce方法;HeavyProcessorRunnable的handleLastTask方法先通过workerRuntime.getTaskPersistenceService().getAllTaskResult获取taskResults,然后对于MapReduceProcessor则回调mapReduceProcessor.reduce方法;getAllTaskResult方法根据instanceId, subInstanceId查询task_info表返回状态是WORKER_PROCESS_SUCCESS或者WORKER_PROCESS_FAILED的任务信息(task_info表只在worker节点上),默认是h2(~/powerjob/worker/h2/{uuid}/powerjob_worker_db.mv.db)

相关推荐
编程零零七2 小时前
Python数据分析工具(三):pymssql的用法
开发语言·前端·数据库·python·oracle·数据分析·pymssql
2401_858286113 小时前
52.【C语言】 字符函数和字符串函数(strcat函数)
c语言·开发语言
铁松溜达py3 小时前
编译器/工具链环境:GCC vs LLVM/Clang,MSVCRT vs UCRT
开发语言·网络
everyStudy3 小时前
JavaScript如何判断输入的是空格
开发语言·javascript·ecmascript
C-SDN花园GGbond4 小时前
【探索数据结构与算法】插入排序:原理、实现与分析(图文详解)
c语言·开发语言·数据结构·排序算法
迷迭所归处5 小时前
C++ —— 关于vector
开发语言·c++·算法
架构文摘JGWZ6 小时前
Java 23 的12 个新特性!!
java·开发语言·学习
leon6256 小时前
优化算法(一)—遗传算法(Genetic Algorithm)附MATLAB程序
开发语言·算法·matlab
锦亦之22336 小时前
QT+OSG+OSG-earth如何在窗口显示一个地球
开发语言·qt
我是苏苏7 小时前
Web开发:ABP框架2——入门级别的增删改查Demo
java·开发语言