聊聊PowerJob的MapReduceProcessor

本文主要研究一下PowerJob的MapReduceProcessor

MapReduceProcessor

复制代码
public interface MapReduceProcessor extends MapProcessor {

    /**
     * reduce方法将在所有任务结束后调用
     * @param context 任务上下文
     * @param taskResults 保存了各个子Task的执行结果
     * @return reduce产生的结果将作为任务最终的返回结果
     */
    ProcessResult reduce(TaskContext context, List<TaskResult> taskResults);
}

MapReduceProcessor继承了MapProcessor,它新增了reduce方法

TaskResult

tech/powerjob/worker/core/processor/TaskResult.java

复制代码
@Data
public class TaskResult {

    private String taskId;
    private boolean success;
    private String result;

}

TaskResult定义了taskId、success、result属性

handleLastTask

tech/powerjob/worker/core/processor/runnable/HeavyProcessorRunnable.java

复制代码
    private void handleLastTask(String taskId, Long instanceId, TaskContext taskContext, ExecuteType executeType) {
        final BasicProcessor processor = processorBean.getProcessor();
        ProcessResult processResult;
        Stopwatch stopwatch = Stopwatch.createStarted();
        log.debug("[ProcessorRunnable-{}] the last task(taskId={}) start to process.", instanceId, taskId);

        List<TaskResult> taskResults = workerRuntime.getTaskPersistenceService().getAllTaskResult(instanceId, task.getSubInstanceId());
        try {
            switch (executeType) {
                case BROADCAST:

                    if (processor instanceof BroadcastProcessor) {
                        BroadcastProcessor broadcastProcessor = (BroadcastProcessor) processor;
                        processResult = broadcastProcessor.postProcess(taskContext, taskResults);
                    } else {
                        processResult = BroadcastProcessor.defaultResult(taskResults);
                    }
                    break;
                case MAP_REDUCE:

                    if (processor instanceof MapReduceProcessor) {
                        MapReduceProcessor mapReduceProcessor = (MapReduceProcessor) processor;
                        processResult = mapReduceProcessor.reduce(taskContext, taskResults);
                    } else {
                        processResult = new ProcessResult(false, "not implement the MapReduceProcessor");
                    }
                    break;
                default:
                    processResult = new ProcessResult(false, "IMPOSSIBLE OR BUG");
            }
        } catch (Throwable e) {
            processResult = new ProcessResult(false, e.toString());
            log.warn("[ProcessorRunnable-{}] execute last task(taskId={}) failed.", instanceId, taskId, e);
        }

        TaskStatus status = processResult.isSuccess() ? TaskStatus.WORKER_PROCESS_SUCCESS : TaskStatus.WORKER_PROCESS_FAILED;
        reportStatus(status, suit(processResult.getMsg()), null, taskContext.getWorkflowContext().getAppendedContextData());

        log.info("[ProcessorRunnable-{}] the last task execute successfully, using time: {}", instanceId, stopwatch);
    }

HeavyProcessorRunnable的handleLastTask方法先通过workerRuntime.getTaskPersistenceService().getAllTaskResult获取taskResults,然后对于MapReduceProcessor则回调mapReduceProcessor.reduce方法

getAllTaskResult

tech/powerjob/worker/persistence/TaskPersistenceService.java

复制代码
    public List<TaskResult> getAllTaskResult(Long instanceId, Long subInstanceId) {
        try {
            return execute(() -> taskDAO.getAllTaskResult(instanceId, subInstanceId));
        }catch (Exception e) {
            log.error("[TaskPersistenceService] getTaskId2ResultMap for instance(id={}) failed.", instanceId, e);
        }
        return Lists.newLinkedList();
    }

TaskPersistenceService的getAllTaskResult方法根据instanceId, subInstanceId查询task_info表select task_id, status, result from task_info where instance_id = ? and sub_instance_id = ?,最后只返回状态是WORKER_PROCESS_SUCCESS或者WORKER_PROCESS_FAILED的任务信息

小结

MapReduceProcessor继承了MapProcessor,它新增了reduce方法;HeavyProcessorRunnable的handleLastTask方法先通过workerRuntime.getTaskPersistenceService().getAllTaskResult获取taskResults,然后对于MapReduceProcessor则回调mapReduceProcessor.reduce方法;getAllTaskResult方法根据instanceId, subInstanceId查询task_info表返回状态是WORKER_PROCESS_SUCCESS或者WORKER_PROCESS_FAILED的任务信息(task_info表只在worker节点上),默认是h2(~/powerjob/worker/h2/{uuid}/powerjob_worker_db.mv.db)

相关推荐
南汐汐月39 分钟前
重生归来,我要成功 Python 高手--day33 决策树
开发语言·python·决策树
星释1 小时前
Rust 练习册 :Proverb与字符串处理
开发语言·后端·rust
工会主席-阿冰1 小时前
数据索引是无序时,直接用这个数据去画图的话,显示的图是错误的
开发语言·python·数据挖掘
麦麦鸡腿堡1 小时前
Java_TreeSet与TreeMap源码解读
java·开发语言
gladiator+2 小时前
Java中的设计模式------策略设计模式
java·开发语言·设计模式
Lucifer__hell2 小时前
【python+tkinter】图形界面简易计算器的实现
开发语言·python·tkinter
2301_812914872 小时前
py day34 装饰器
开发语言·python
卡提西亚2 小时前
C++笔记-24-文件读写操作
开发语言·c++·笔记
snakecy2 小时前
树莓派学习资料共享
大数据·开发语言·学习·系统架构
Nebula_g2 小时前
C语言应用实例:学生管理系统1(指针、结构体综合应用,动态内存分配)
c语言·开发语言·学习·算法·基础