DAG Study Notes: From Topological Sort to Parallel Execution

I. Topological Sort

    /**
     * Topological sort (Kahn's algorithm).
     *
     * @param nodeMap      key: node ID, value: workflow node model
     * @param dependencies key: node ID, value: IDs of the nodes it depends on (its predecessors)
     * @return the nodes in dependency order
     */
    private List<WorkflowNode> topologicalSort(Map<String, WorkflowNode> nodeMap,
                                               Map<String, List<String>> dependencies) {
        List<WorkflowNode> result = new ArrayList<>();

        // Compute the in-degree of every node; a node without an entry has no dependencies
        Map<String, Integer> inDegree = new HashMap<>();
        for (String nodeId : nodeMap.keySet()) {
            inDegree.put(nodeId, dependencies.getOrDefault(nodeId, Collections.emptyList()).size());
        }

        // Collect all nodes with in-degree 0 (the start nodes)
        Queue<String> queue = new LinkedList<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                queue.offer(entry.getKey());
            }
        }

        // Build the reverse graph: node ID -> IDs of the nodes that depend on it
        Map<String, List<String>> dependents = new HashMap<>();
        for (String nodeId : nodeMap.keySet()) {
            dependents.put(nodeId, new ArrayList<>());
        }
        for (String target : nodeMap.keySet()) {
            for (String source : dependencies.getOrDefault(target, Collections.emptyList())) {
                List<String> targets = dependents.get(source);
                if (targets == null) {
                    throw new IllegalArgumentException(
                            "Node " + target + " depends on unknown node " + source);
                }
                targets.add(target);
            }
        }

        // Run Kahn's algorithm
        while (!queue.isEmpty()) {
            String nodeId = queue.poll();
            result.add(nodeMap.get(nodeId));

            // Decrement the in-degree of every successor; enqueue those that reach 0
            for (String successor : dependents.get(nodeId)) {
                int degree = inDegree.get(successor) - 1;
                inDegree.put(successor, degree);
                if (degree == 0) {
                    queue.offer(successor);
                }
            }
        }

        // Fewer sorted nodes than total nodes means the graph contains a cycle
        if (result.size() != nodeMap.size()) {
            throw new RuntimeException("Workflow contains a circular dependency; topological sort cannot complete");
        }

        return result;
    }
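
To try the algorithm without the project's classes, here is a minimal standalone sketch that works on plain string IDs instead of WorkflowNode. The class name KahnSortDemo and the sample node names are made up for illustration, and the sketch assumes every node (including nodes without dependencies) appears as a key in the map.

```java
import java.util.*;

final class KahnSortDemo {

    /** Returns the node IDs in dependency order; deps maps a node ID to the IDs it depends on. */
    static List<String> sort(Map<String, List<String>> deps) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            inDegree.put(e.getKey(), e.getValue().size());
            for (String source : e.getValue()) {
                dependents.computeIfAbsent(source, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Start with every node whose in-degree is 0
        Deque<String> queue = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) queue.offer(e.getKey());
        }

        List<String> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            String id = queue.poll();
            result.add(id);
            for (String succ : dependents.getOrDefault(id, Collections.emptyList())) {
                // merge returns the new in-degree after subtracting 1
                if (inDegree.merge(succ, -1, Integer::sum) == 0) queue.offer(succ);
            }
        }
        if (result.size() != inDegree.size()) {
            throw new IllegalStateException("cycle detected");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("start", Collections.emptyList());
        deps.put("fetch", Arrays.asList("start"));
        deps.put("parse", Arrays.asList("start"));
        deps.put("merge", Arrays.asList("fetch", "parse"));
        System.out.println(sort(deps)); // [start, fetch, parse, merge]
    }
}
```

The LinkedHashMap only serves to make the order of the start nodes deterministic; with a HashMap any valid topological order may come out.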

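II. DFS Cycle Detection

Purpose:

In the project, DFS cycle detection is a safety check that runs before a workflow executes. It makes sure the workflow contains no circular dependency, which would otherwise cause it to run forever.

Steps:

Recursive traversal: visit every node with a depth-first search
Path tracking: record the nodes on the current path in currentPath
Cycle check: if the current node is already in currentPath, a cycle exists
Backtracking cleanup: remove the node from currentPath when the recursion returns

Implementation: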


    /**
     * Detects circular dependencies (using DFS).
     */
    private void detectCycle(Map<String, List<String>> dependencies, List<WorkflowNode> nodes) {
        Set<String> visited = new HashSet<>();
        Set<String> currentPath = new HashSet<>();

        for (WorkflowNode node : nodes) {
            if (hasCycleDFS(node.getId(), dependencies, visited, currentPath)) {
                // The reported node is the DFS starting point from which the cycle was reached
                throw new RuntimeException("Workflow contains a circular dependency reachable from node: " + node.getId());
            }
        }
    }

    /**
     * DFS step: returns true if a cycle is reachable from nodeId.
     */
    private boolean hasCycleDFS(String nodeId, Map<String, List<String>> dependencies,
                                Set<String> visited, Set<String> currentPath) {
        if (currentPath.contains(nodeId)) {
            return true; // node is already on the current path: cycle found
        }

        if (visited.contains(nodeId)) {
            return false; // already fully explored, no cycle through this node
        }

        visited.add(nodeId);
        currentPath.add(nodeId);

        // Recursively check every node this node depends on
        List<String> deps = dependencies.get(nodeId);
        if (deps != null) {
            for (String dep : deps) {
                if (hasCycleDFS(dep, dependencies, visited, currentPath)) {
                    return true;
                }
            }
        }

        // Backtrack: the node leaves the current path
        currentPath.remove(nodeId);
        return false;
    }
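
The method above only says that a cycle exists. When debugging a workflow it can help to see the cycle itself. The following is one possible variation, not the project's code: it keeps the current path in a LinkedHashSet so the order is preserved, and returns the nodes of the first cycle it finds. CyclePathDemo and findCycle are hypothetical names.

```java
import java.util.*;

final class CyclePathDemo {

    /**
     * Returns one cycle as a list of node IDs (the first ID is repeated at the end),
     * or an empty list if the graph is acyclic. deps maps a node ID to the IDs it depends on.
     */
    static List<String> findCycle(Map<String, List<String>> deps) {
        Set<String> visited = new HashSet<>();
        Set<String> path = new LinkedHashSet<>(); // current DFS path, in visiting order
        for (String start : deps.keySet()) {
            List<String> cycle = dfs(start, deps, visited, path);
            if (!cycle.isEmpty()) return cycle;
        }
        return Collections.emptyList();
    }

    private static List<String> dfs(String id, Map<String, List<String>> deps,
                                    Set<String> visited, Set<String> path) {
        if (path.contains(id)) {
            // Cut the path at the first occurrence of id: that suffix is the cycle
            List<String> cycle = new ArrayList<>();
            boolean inCycle = false;
            for (String p : path) {
                if (p.equals(id)) inCycle = true;
                if (inCycle) cycle.add(p);
            }
            cycle.add(id);
            return cycle;
        }
        if (!visited.add(id)) {
            return Collections.emptyList(); // already fully explored
        }
        path.add(id);
        for (String dep : deps.getOrDefault(id, Collections.emptyList())) {
            List<String> cycle = dfs(dep, deps, visited, path);
            if (!cycle.isEmpty()) return cycle;
        }
        path.remove(id); // backtrack
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("a", Arrays.asList("b"));
        deps.put("b", Arrays.asList("c"));
        deps.put("c", Arrays.asList("a"));
        System.out.println(findCycle(deps)); // [a, b, c, a]
    }
}
```

The returned list reads along the dependency edges: a depends on b, b depends on c, and c depends on a again.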

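III. After the DAG has been topologically sorted, the nodes are executed one by one in dependency order: the output of each node becomes the input of the next node, and the execution result of every node is recorded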


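// Node list after topological sort (assumes nodeMap and dependencies are in scope)
List<WorkflowNode> sortedNodes = topologicalSort(nodeMap, dependencies);

// Execution results, one per node
List<ExecutionResponse.NodeResult> nodeResults = new ArrayList<>();

// Input of the first node. Assumption: the original snippet does not show where it
// comes from, so an empty map is used here
Map<String, Object> currentInput = new HashMap<>();

for (WorkflowNode node : sortedNodes) {
    // Look up the executor for this node type and run the node
    NodeExecutor executor = executorFactory.getExecutor(node.getType());
    Map<String, Object> output = executor.execute(node, currentInput);

    // Wrap the execution result
    ExecutionResponse.NodeResult nodeResult = new ExecutionResponse.NodeResult();
    nodeResult.setNodeId(node.getId());
    nodeResult.setNodeName(node.getType());
    nodeResult.setInput(JSON.toJSONString(currentInput));
    nodeResult.setStatus("SUCCESS");
    nodeResult.setOutput(JSON.toJSONString(output));
    nodeResults.add(nodeResult);

    // The current output becomes the input of the next node
    currentInput = output;
}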
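
Chaining currentInput = output is exact only when the workflow is a single chain. In a branching DAG, the node that comes right before another in the sorted list is not necessarily one of its dependencies. A possible alternative, sketched here with plain string IDs and a hypothetical executor function rather than the project's NodeExecutor, is to build each node's input from the outputs of its direct dependencies.

```java
import java.util.*;
import java.util.function.BiFunction;

final class MergedInputDemo {

    /**
     * Runs the nodes in the given topological order. Each node receives the merged
     * outputs of its direct dependencies; the result maps node ID to node output.
     */
    static Map<String, Map<String, Object>> run(
            List<String> sortedIds,
            Map<String, List<String>> deps,
            BiFunction<String, Map<String, Object>, Map<String, Object>> executor) {
        Map<String, Map<String, Object>> outputs = new LinkedHashMap<>();
        for (String id : sortedIds) {
            Map<String, Object> input = new HashMap<>();
            for (String dep : deps.getOrDefault(id, Collections.emptyList())) {
                Map<String, Object> upstream = outputs.get(dep);
                if (upstream == null) {
                    throw new IllegalArgumentException(dep + " must run before " + id);
                }
                input.putAll(upstream); // on equal keys, later dependencies win
            }
            outputs.put(id, executor.apply(id, input));
        }
        return outputs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("fetch", Arrays.asList("start"));
        deps.put("parse", Arrays.asList("start"));
        deps.put("merge", Arrays.asList("fetch", "parse"));

        // Toy executor: passes its input through and adds a marker for itself
        Map<String, Map<String, Object>> out = run(
                Arrays.asList("start", "fetch", "parse", "merge"), deps,
                (id, in) -> {
                    Map<String, Object> o = new HashMap<>(in);
                    o.put(id, "done");
                    return o;
                });
        System.out.println(out.get("parse").containsKey("fetch")); // false
        System.out.println(out.get("merge").containsKey("fetch")); // true
    }
}
```

Here parse never sees the output of fetch, although fetch runs directly before it, while merge sees both of its dependencies.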

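Parallel extension:

The essence of a DAG is parallelism: if node A and node B both have no unfinished dependencies, they should run at the same time. In Kahn's algorithm there can be several nodes with in-degree 0 at once, so these nodes can be handed to a thread pool or to coroutines and processed concurrently instead of one after another in a for loop.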
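
To see which nodes could run together, Kahn's algorithm can be run layer by layer: all nodes whose in-degree is 0 at the same time form one batch. The sketch below only computes the batches and executes nothing; ParallelLayersDemo is a made-up name, and it assumes every node appears as a key in the dependency map.

```java
import java.util.*;

final class ParallelLayersDemo {

    /** Groups node IDs into layers; the nodes within one layer do not depend on each other. */
    static List<List<String>> layers(Map<String, List<String>> deps) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (String id : deps.keySet()) {
            inDegree.put(id, deps.get(id).size());
            dependents.put(id, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            for (String source : e.getValue()) {
                dependents.get(source).add(e.getKey());
            }
        }

        // First layer: every node that has no dependencies at all
        List<String> current = new ArrayList<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) current.add(e.getKey());
        }

        List<List<String>> result = new ArrayList<>();
        int placed = 0;
        while (!current.isEmpty()) {
            result.add(current);
            placed += current.size();
            List<String> next = new ArrayList<>();
            for (String id : current) {
                for (String succ : dependents.get(id)) {
                    // A successor joins the next layer once its last dependency is placed
                    if (inDegree.merge(succ, -1, Integer::sum) == 0) next.add(succ);
                }
            }
            current = next;
        }
        if (placed != deps.size()) {
            throw new IllegalStateException("cycle detected");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("start", Collections.emptyList());
        deps.put("fetch", Arrays.asList("start"));
        deps.put("parse", Arrays.asList("start"));
        deps.put("merge", Arrays.asList("fetch", "parse"));
        System.out.println(layers(deps)); // [[start], [fetch, parse], [merge]]
    }
}
```

Running layer by layer is simple but waits for the slowest node of each layer. Scheduling a node as soon as its own dependencies finish avoids that wait.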


import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class DagParallelRunner {

    // 1. Thread pool (in production, configure the pool size explicitly)
    private final ExecutorService executorService = Executors.newFixedThreadPool(10);

    // 2. Thread-safe global context shared by all nodes
    private final Map<String, Object> globalContext = new ConcurrentHashMap<>();

    /**
     * Entry point for parallel execution. The graph must be acyclic (run detectCycle first),
     * otherwise some nodes never become ready and latch.await() blocks forever.
     * @param nodes all nodes
     * @param dependencies dependency map: node ID -> IDs of its predecessor nodes
     */
    public void execute(List<WorkflowNode> nodes, Map<String, List<String>> dependencies) throws InterruptedException {

        // 3. Index the nodes by ID and prepare the reverse graph (used to notify downstream nodes)
        Map<String, WorkflowNode> nodeMap = new HashMap<>();
        Map<String, List<String>> reverseDeps = new HashMap<>();
        for (WorkflowNode node : nodes) {
            nodeMap.put(node.getId(), node);
            reverseDeps.put(node.getId(), new ArrayList<>());
        }

        // 4. Compute the initial in-degree of every node (decremented at runtime) and
        //    collect the start nodes before any task is submitted
        Map<String, AtomicInteger> inDegreeMap = new ConcurrentHashMap<>();
        List<WorkflowNode> startNodes = new ArrayList<>();
        for (WorkflowNode node : nodes) {
            List<String> sources = dependencies.getOrDefault(node.getId(), Collections.emptyList());
            for (String source : sources) {
                reverseDeps.get(source).add(node.getId());
            }
            inDegreeMap.put(node.getId(), new AtomicInteger(sources.size()));
            if (sources.isEmpty()) {
                startNodes.add(node);
            }
        }

        // 5. One latch count per node; await() returns once every node has finished
        CountDownLatch latch = new CountDownLatch(nodes.size());

        // 6. Submit the start nodes. Polling a ready queue from this thread and stopping when
        //    it is empty would exit before the workers have enqueued any successor, so each
        //    finished node submits its own successors instead
        for (WorkflowNode node : startNodes) {
            submitNode(node, nodeMap, reverseDeps, inDegreeMap, latch);
        }

        // 7. Wait until all nodes have finished
        latch.await();
        System.out.println("DAG execution finished!");
        executorService.shutdown();
    }

    /** Runs one node on the thread pool and, when it is done, submits the successors that became ready. */
    private void submitNode(WorkflowNode node, Map<String, WorkflowNode> nodeMap,
                            Map<String, List<String>> reverseDeps,
                            Map<String, AtomicInteger> inDegreeMap, CountDownLatch latch) {
        executorService.submit(() -> {
            try {
                // --- Business logic ---
                NodeExecutor executor = new ExecutorFactory().getExecutor(node.getType()); // assumes the factory class exists

                // Collect the input from the global context (everything is passed here; filter as needed)
                Map<String, Object> input = new HashMap<>(globalContext);

                Map<String, Object> output = executor.execute(node, input);

                // Write the result to the global context
                globalContext.put(node.getId(), output);
                System.out.println("Node " + node.getId() + " finished");
            } catch (Exception e) {
                e.printStackTrace();
                // Production code needs failure handling here (for example, blocking downstream nodes)
            } finally {
                // --- Scheduling: notify downstream nodes (this also happens after a failure) ---
                for (String successorId : reverseDeps.get(node.getId())) {
                    // Decrement the in-degree; a successor that reaches 0 is ready to run
                    if (inDegreeMap.get(successorId).decrementAndGet() == 0) {
                        submitNode(nodeMap.get(successorId), nodeMap, reverseDeps, inDegreeMap, latch);
                    }
                }
                latch.countDown();
            }
        });
    }
}
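
The comment about failure handling can also be addressed with a different technique from the one used above: CompletableFuture from the JDK. This is only a sketch under assumptions, with plain string IDs, a hypothetical task callback in place of NodeExecutor, and a list of IDs that is already in topological order. Each node gets a future that starts once the futures of its dependencies have completed.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

final class FutureDagDemo {

    /**
     * Runs task once per node on the given pool. A node starts only after all of its
     * dependencies have finished; sortedIds must be in topological order.
     */
    static void run(List<String> sortedIds, Map<String, List<String>> deps,
                    Consumer<String> task, Executor pool) {
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        for (String id : sortedIds) {
            // Futures of the direct dependencies; they exist already thanks to the sort order
            CompletableFuture<?>[] upstream = deps.getOrDefault(id, Collections.emptyList())
                    .stream().map(futures::get).toArray(CompletableFuture[]::new);
            // allOf of an empty array is already complete, so start nodes run immediately
            futures.put(id, CompletableFuture.allOf(upstream)
                    .thenRunAsync(() -> task.accept(id), pool));
        }
        // Block until every node is done; rethrows the first failure as a CompletionException
        CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("fetch", Arrays.asList("start"));
        deps.put("parse", Arrays.asList("start"));
        deps.put("merge", Arrays.asList("fetch", "parse"));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            run(Arrays.asList("start", "fetch", "parse", "merge"), deps,
                    id -> System.out.println("running " + id), pool);
        } finally {
            pool.shutdown();
        }
    }
}
```

If a node throws, the futures that depend on it complete exceptionally without running their task, so downstream nodes are blocked automatically. The order of fetch and parse in the output can vary between runs.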

---------------------------------------------------------divider---------------------------------------------------------------

That is the end of this post. Thanks for reading!

The end ✿✿✿✿✿✿
