Skill Series (05): Chaining Skills into Workflows: 4 Patterns Tested, 1.5x Speedup from Concurrency

The limits of a single Skill

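A single Skill is single-purpose by design. A "write a technical article" Skill cannot "search for competitor information, analyze it, then write a report": that takes multiple steps and several specialized capabilities working together.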

Chaining combines individual Skills into a workflow and gets past the boundary of a single context.

4 chaining patterns

Pattern 1: Sequential Chain

A → B → C
Each Skill's output becomes the next Skill's input

The simplest pattern, suited to linear tasks. Key constraint: if one step fails, the whole chain breaks.
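A minimal sketch of a sequential chain in plain Python, with no LLM calls. The Skills `extract_keywords`, `make_outline` and `write_article` are hypothetical string-to-string stand-ins, not the demo's code (the demo uses a LangGraph graph). The loop passes each output forward and stops at the first exception, which is the fail-fast constraint described above.

```python
from typing import Callable, Sequence

Skill = Callable[[str], str]

class ChainError(RuntimeError):
    """Raised when a step fails; records which step broke the chain."""
    def __init__(self, step: str, cause: Exception):
        super().__init__(f"step '{step}' failed: {cause}")
        self.step = step

def run_chain(steps: Sequence[tuple[str, Skill]], payload: str) -> str:
    # Each step's output becomes the next step's input.
    for name, skill in steps:
        try:
            payload = skill(payload)
        except Exception as exc:
            # One failed step aborts the whole chain.
            raise ChainError(name, exc) from exc
    return payload

# Hypothetical stand-ins for LLM-backed Skills.
def extract_keywords(topic: str) -> str:
    return f"keywords({topic})"

def make_outline(keywords: str) -> str:
    return f"outline({keywords})"

def write_article(outline: str) -> str:
    return f"article({outline})"

pipeline = [
    ("keywords", extract_keywords),
    ("outline", make_outline),
    ("write", write_article),
]
print(run_chain(pipeline, "asyncio"))  # article(outline(keywords(asyncio)))
```

Wrapping the failure in `ChainError` keeps the name of the broken step, which is the minimum a caller needs in order to add a fallback for critical steps.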

Pattern 2: Concurrent Fan-out

        → B1 →
A → split → B2 → merge → C
        → B3 →

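Multiple Skills run concurrently and their results are merged at the end. In theory the speedup equals the number of branches; in practice it depends on how long the merge step takes.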
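A sketch of fan-out with `concurrent.futures.ThreadPoolExecutor` (the demo also used a `ThreadPoolExecutor` with 3 workers). The analyzer and merge functions here are hypothetical, and `time.sleep` only simulates LLM latency. A failing branch is recorded instead of failing the whole run, which anticipates the "partial completion" case discussed under error handling.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Branch = Callable[[str], str]

def fan_out(branches: dict[str, Branch], payload: str,
            max_workers: int = 3) -> tuple[dict[str, str], dict[str, str]]:
    """Run every branch on the same input concurrently.

    Returns (results, errors). A failing branch lands in `errors`
    instead of failing the whole fan-out.
    """
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, payload) for name, fn in branches.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()  # blocks until this branch finishes
            except Exception as exc:
                errors[name] = repr(exc)
    return results, errors

def merge(results: dict[str, str], errors: dict[str, str]) -> str:
    # Stand-in for the merge Skill (another LLM call in the demo).
    lines = [f"{name}: {text}" for name, text in results.items()]
    lines += [f"{name}: FAILED ({err})" for name, err in errors.items()]
    return "\n".join(lines)

def make_analyzer(label: str) -> Branch:
    def analyzer(company: str) -> str:
        time.sleep(0.5)  # simulates one LLM call's latency
        return f"{label} view of {company}"
    return analyzer

start = time.perf_counter()
results, errors = fan_out(
    {name: make_analyzer(name) for name in ("product", "market", "tech")},
    "Notion",
)
# Roughly 0.5s rather than 1.5s, because the three sleeps overlap.
print(f"fan-out: {time.perf_counter() - start:.2f}s")
print(merge(results, errors))
```

Threads are enough here because each branch spends its time waiting on I/O (an API call), not on CPU work.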

Pattern 3: Conditional Routing

A → Router → [type=technical] → tech-writer
             [type=marketing] → marketing-writer
             [default]        → general-writer

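The Router Skill outputs an enum value, and the workflow branches on the result. Key point: the Router must output an explicit enum value, not ambiguous free text.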
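One way to enforce "explicit enum, not free text" is to parse the Router's raw reply into a Python `Enum` and send anything unrecognized to the default branch. The route names follow the diagram above; the normalization rules and the writer lambdas are illustrative assumptions.

```python
from enum import Enum
from typing import Callable

class Route(str, Enum):
    TECHNICAL = "technical"
    MARKETING = "marketing"
    GENERAL = "general"

def parse_route(raw: str) -> Route:
    """Turn the Router's raw reply into an enum value; anything else hits the default."""
    label = raw.strip().strip('."\'').lower()  # tolerate "Technical." or '"technical"'
    try:
        return Route(label)
    except ValueError:
        return Route.GENERAL  # the [default] branch

# Hypothetical writer Skills, one per route.
WRITERS: dict[Route, Callable[[str], str]] = {
    Route.TECHNICAL: lambda req: f"[tech-writer] {req}",
    Route.MARKETING: lambda req: f"[marketing-writer] {req}",
    Route.GENERAL: lambda req: f"[general-writer] {req}",
}

def route_request(request: str, router_reply: str) -> str:
    return WRITERS[parse_route(router_reply)](request)

print(route_request("Explain pod scheduling", "Technical."))  # [tech-writer] Explain pod scheduling
print(route_request("What is ML?", "probably technical?"))    # [general-writer] What is ML?
```

The second call shows why the default branch matters. A hedged, free-text reply does not crash the workflow. It falls through to the general writer.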

Pattern 4: Feedback Loop

A → Evaluator → [score ≥ 7] → output
                    ↓
               [score < 7] → with feedback → A (retry, max 3)

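A quality gate: if the output falls short, it is rewritten with the reviewer's feedback. A maximum retry count is mandatory to prevent an infinite loop.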
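A minimal sketch of the loop, assuming the writer takes `(topic, feedback)` and the evaluator returns `(score, feedback)`. Both signatures and the toy Skills are hypothetical. When the cap is reached it returns the best draft so far, flagged as not passing, instead of raising an error.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Attempt:
    text: str
    score: int
    feedback: str

Writer = Callable[[str, Optional[str]], str]  # (topic, feedback) -> draft
Evaluator = Callable[[str], tuple[int, str]]  # draft -> (score 1-10, feedback)

def feedback_loop(write: Writer, evaluate: Evaluator, topic: str,
                  threshold: int = 7, max_iters: int = 3) -> tuple[Attempt, int, bool]:
    """Return (attempt to use, iterations run, passed_quality_gate)."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    best: Optional[Attempt] = None
    feedback: Optional[str] = None
    for i in range(1, max_iters + 1):
        draft = write(topic, feedback)     # round 1 has no feedback yet
        score, feedback = evaluate(draft)  # feedback is carried into the next round
        attempt = Attempt(draft, score, feedback)
        if best is None or score > best.score:
            best = attempt
        if score >= threshold:
            return attempt, i, True
    # Retry cap reached: return the best draft, flagged as below threshold.
    assert best is not None
    return best, max_iters, False

# Toy Skills: a first draft scores 5, a draft revised with feedback scores 7.
def toy_write(topic: str, feedback: Optional[str]) -> str:
    return topic if feedback is None else f"{topic} (revised: {feedback})"

def toy_evaluate(draft: str) -> tuple[int, str]:
    score = 7 if "revised" in draft else 5
    return score, "add a concrete code example"

attempt, iterations, passed = feedback_loop(toy_write, toy_evaluate, "Redis Cluster sharding")
print(iterations, attempt.score, passed)  # 2 7 True
```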

Demo design

Each of the 4 patterns is implemented with real LLM calls:

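Pattern	Implementation	Measured
Sequential chain	LangGraph 3-node graph: keywords → outline → write	End-to-end latency
Concurrent fan-out	`ThreadPoolExecutor` × 3 → merge	Fan-out latency, total latency, speedup
Conditional routing	LLM classifies the input type → routes to one of 3 writers	Routing accuracy, output style comparison
Feedback loop	Write → quality score (1-10) → rewrite with feedback, max 3 rounds	Iteration count, score per round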

Results

Pattern 1: Sequential chain (keyword → outline → write)

Topic:    Python async/await: from coroutines to production-ready patterns

Keywords: async programming, coroutines, await, production-ready patterns

Outline:  - Introduction to Async Programming in Python
          - Understanding Coroutines and the `async` Keyword
          - Implementing `await` ...

Article:  ### Introduction to Async Programming in Python
          Async programming in Python has revolutionized the way we handle I/O...

Time: 35.1s (3 sequential LLM calls)

Pattern 2: Concurrent fan-out (3 analyzers → merge)

Company: Notion

Product:  From a competitor's perspective, Notion stands out with its
          comprehensive suite...
Market:   Notion's market positioning is as a versatile productivity platform...
Tech:     Notion's technology stack is notable for its robust collaboration features...

Merged:   Notion's competitive edge lies in its versatile productivity suite...

Fan-out time: 12.4s  |  Total (incl. merge): 24.5s
Sequential equiv: ~37.2s  |  Speedup: ~1.5x

Pattern 3: Conditional routing

Input:  "Explain how Kubernetes pod scheduling works with a code example"
Route:  technical  (18.9s)
Output: Kubernetes pod scheduling is the process of assigning a pod...

Input:  "Write a compelling product description for our new AI writing tool"
Route:  marketing  (10.7s)
Output: Unleash Your Words with Unmatched Precision! Transform your writing...

Input:  "What is machine learning and why does it matter?"
Route:  technical  (23.5s)
Output: Machine learning (ML) is a subset of artificial intelligence...

Pattern 4: Feedback loop (max 3, threshold = 7/10)

Topic: Write a technical article about Redis Cluster sharding strategy

  Iteration 1: score=8/10  ✓ PASS
               feedback: The article provides a clear explanation of Redis Cluster
               sharding, but could benefit from...

Final score: 8/10  |  Iterations: 1/3  |  Time: 44.8s

Three findings

Finding 1: The concurrent speedup is 1.5x, not the theoretical 3x

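With the 3 analyzers running concurrently, the fan-out stage took 12.4s. The sequential equivalent is estimated at 37.2s (12.4 × 3), so the fan-out stage on its own is roughly 3x faster. End-to-end latency including the merge, however, is 24.5s, and 37.2s / 24.5s gives an overall speedup of only about 1.5x.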

This is Amdahl's law at work:

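Speedup = 1 / (serial fraction + parallel fraction / N)

This run, as fractions of the 24.5s total:
  Parallel part (fan-out): 12.4s / 24.5s ≈ 51%
  Serial part (merge):     12.1s / 24.5s ≈ 49%

With 49% of the run serial, no amount of extra concurrency can make it more than ≈ 2x faster than it is now (1 / 0.49 ≈ 2)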
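The figures above can be checked by plugging the reported Pattern 2 timings into Amdahl's law, with the current 24.5s concurrent run as the baseline. `amdahl_speedup` is just a direct transcription of the formula. The variable names are illustrative.

```python
def amdahl_speedup(serial_fraction: float, n: float) -> float:
    """Amdahl's law: overall speedup when the parallel share runs on n workers."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)

# Timings reported for Pattern 2 (Notion analysis).
fan_out_s, merge_s = 12.4, 12.1
total_s = fan_out_s + merge_s        # 24.5s end to end
serial_fraction = merge_s / total_s  # merge share, about 0.49

# Ceiling relative to the current run: even an infinitely parallel fan-out
# still leaves the 12.1s merge in place.
ceiling = amdahl_speedup(serial_fraction, float("inf"))
print(f"serial share: {serial_fraction:.2f}, ceiling: {ceiling:.2f}x")

# The headline figure: estimated sequential fan-out (3 x 12.4s) vs. concurrent total.
print(f"reported speedup: {3 * fan_out_s / total_s:.2f}x")
```

Because the ceiling is simply `total_s / merge_s`, the only way to raise it is to make the merge itself cheaper.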

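In a Skill workflow, when the merge step takes about as long as the fan-out, each additional concurrent branch buys less and less reduction in total latency. Reducing the number of tokens the merge step processes is more effective than adding branches.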

Finding 2: The third request was routed to "technical", not "general"

"What is machine learning and why does it matter?" was routed to the technical writer rather than the general writer.

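The classification is reasonable, but whether it is right depends on the audience. For engineers the same input belongs in technical; for non-technical readers it should go to general. The classifier Skill knows nothing about the target audience, so it can only guess from the topic.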

The fix for production is to include audience information in the router input:

classifier_input = f"Request: {request}\nTarget audience: {workflow_input.audience}"
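The one-liner above assumes a `workflow_input` object with an `audience` attribute. A self-contained version might look like this. The `WorkflowInput` dataclass and its `request` field are assumptions; the original only shows `workflow_input.audience`.

```python
from dataclasses import dataclass

@dataclass
class WorkflowInput:
    request: str
    audience: str  # e.g. "backend engineers" or "non-technical managers"

def build_classifier_input(workflow_input: WorkflowInput) -> str:
    # The audience travels with the request, so the router can tell
    # "explain ML to engineers" apart from "explain ML to managers".
    return (
        f"Request: {workflow_input.request}\n"
        f"Target audience: {workflow_input.audience}"
    )

print(build_classifier_input(
    WorkflowInput("What is machine learning and why does it matter?", "non-technical managers")
))
```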

Finding 3: The feedback loop passed on the first round; retries are not always needed

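P4 scored 8/10 on the first attempt and passed immediately. The quality gate filters out low-quality output automatically, and a first-round pass on this topic is consistent with a sensibly set threshold.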

Calibrating the threshold:

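5/10 is too low: it almost never triggers a retry, so the quality gate loses its purpose
9/10 is too high: nearly every run retries, doubling the cost
7/10 is a reasonable starting point: good first drafts pass, and genuinely poor output is still caught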

The quality of the feedback determines how much a retry helps:

# Effective feedback (points to specific problems)
"Missing code examples for the write-behind pattern; also clarify TTL vs eviction"

# Ineffective feedback (a generic request)
"Make it better and more comprehensive"

Error handling design

Errors in a chained workflow fall into 4 categories, each with its own handling strategy:

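Retryable errors (Transient)
  → LLM timeouts, API rate limits
  → Strategy: exponential backoff, 3 retries (1s → 2s → 4s)

Quality gate failed
  → The Skill's output scores below the threshold
  → Strategy: retry with feedback, max 3 times; if it still fails, return the best result with a quality annotation

Non-retryable errors (Fatal)
  → Insufficient permissions, malformed input
  → Strategy: abort immediately and return a clear error message

Partial completion
  → One of the concurrent branches fails
  → Strategy: merge the results of the successful branches and flag the failed parts, instead of failing the whole run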
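A sketch of the first and third strategies together. Transient errors are retried with exponential backoff (1s → 2s → 4s for the defaults), and fatal errors propagate immediately. `TransientError` and `FatalError` are hypothetical exception classes. A real client would map its own timeout and rate-limit exceptions onto them. `sleep` is injectable so the delays can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Timeouts, rate limits: worth retrying."""

class FatalError(Exception):
    """Missing permissions, malformed input: retrying cannot help."""

def call_with_backoff(fn: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying TransientError with delays of 1s, 2s, 4s (for the defaults)."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError:
            if attempt == retries:
                raise                         # retries exhausted: surface the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            attempt += 1
        # FatalError is deliberately not caught, so the workflow aborts at once.

calls = 0

def flaky_llm_call() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise TransientError("429 rate limited")
    return "ok"

waits: list[float] = []
print(call_with_backoff(flaky_llm_call, sleep=waits.append), waits)  # ok [1.0, 2.0]
```

In production it is common to add random jitter to each delay so that many clients do not retry in lockstep.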

State passing conventions

The upstream Skill's output is the downstream Skill's input, and a loosely defined format makes the whole chain fragile. Recommended structure:

{
  "status": "success",
  "output": {
    "main_content": "...",
    "metadata": {
      "word_count": 2500,
      "confidence": 0.92
    }
  },
  "trace_id": "skill-abc-123"
}
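A downstream Skill can validate this envelope before using it, so a malformed upstream result fails at the boundary with its `trace_id` instead of surfacing as a confusing error later. This is a minimal sketch using only the standard `json` module. The choice of required fields and the rule that any `status` other than `"success"` is an error are assumptions based on the example above.

```python
import json
from typing import Any

def parse_envelope(raw: str) -> dict[str, Any]:
    """Validate an upstream Skill's envelope before the next Skill consumes it."""
    data = json.loads(raw)
    for key in ("status", "output", "trace_id"):
        if key not in data:
            raise ValueError(f"envelope is missing '{key}'")
    if data["status"] != "success":
        # Fail loudly and keep the trace_id so the failing step can be found.
        raise ValueError(f"upstream status {data['status']!r} (trace_id={data['trace_id']})")
    if "main_content" not in data["output"]:
        raise ValueError("output.main_content is required")
    return data

raw = """{
  "status": "success",
  "output": {"main_content": "...", "metadata": {"word_count": 2500, "confidence": 0.92}},
  "trace_id": "skill-abc-123"
}"""
envelope = parse_envelope(raw)
print(envelope["output"]["metadata"]["word_count"])  # 2500
```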

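Context compression for long workflows: when an upstream output exceeds 5000 tokens, the downstream Skill usually does not need the full content. Three strategies: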

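Insert a summarizer Skill that compresses the output before passing it on
Pass only specific fields of the upstream output (e.g. output.metadata, not output.main_content)
Store large intermediate artifacts externally and let downstream Skills retrieve them on demand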
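A sketch of the second strategy: forward the full `output` while it fits the 5000-token budget, and only `output.metadata` once it does not. The 4-characters-per-token estimate is a rough heuristic for English text and an assumption here. A real implementation should count tokens with the target model's tokenizer. The `main_content_omitted` flag is a hypothetical field.

```python
from typing import Any

TOKEN_BUDGET = 5000  # the threshold mentioned above

def approx_tokens(text: str) -> int:
    # Rough heuristic (an assumption): about 4 characters per token for English.
    # Replace with the model's real tokenizer in production.
    return len(text) // 4

def downstream_view(envelope: dict[str, Any], budget: int = TOKEN_BUDGET) -> dict[str, Any]:
    """Pass the full output if it fits the budget, otherwise only its metadata."""
    output = envelope["output"]
    if approx_tokens(output["main_content"]) <= budget:
        return output
    return {"metadata": output["metadata"], "main_content_omitted": True}

big = {"output": {"main_content": "x" * 40_000, "metadata": {"word_count": 2500}}}
print(downstream_view(big))  # {'metadata': {'word_count': 2500}, 'main_content_omitted': True}
```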

Design checklist

Sequential chain

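Every step has a well-defined output format (so the downstream step can parse it)
Critical steps have a fallback strategy (one failed step must not take down the whole chain)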

Concurrent fan-out

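The merge Skill tolerates partial branch failures (flag them rather than failing the whole run)
Measure the merge step's latency to confirm the concurrency gain is worth the added complexity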

Conditional routing

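The Router Skill outputs an explicit enum value, not free text
A default branch covers unexpected classification results
The router input includes audience/context information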

Feedback loop

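Set a maximum retry count (max 3 is a reasonable starting point)
Feedback points to specific problems, not generic requests
Once the maximum is reached, return the best result with a quality annotation, not an error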

Summary

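The concurrent speedup is 1.5x, not 3x: the fan-out stage ran about 3x faster, but the serial merge, which takes 49% of total latency, held the overall speedup to 1.5x. Amdahl's law holds in Skill workflows too
Conditional routing depends on audience information: topic classification alone is not enough, because the same question should go to different Skills for different audiences
The value of a feedback loop is filtering, not forced retries: a first-round pass is consistent with a well-designed quality gate, and what matters is feedback that points to specific problems rather than "make it a bit better"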

References

LangGraph StateGraph documentation
Full demo code for this series: skill-05-workflow

Visit PrimeSkills, a curated marketplace of AI Agents and skills where everything has been validated in real enterprise workflows. No gimmicks, just things that actually work.

For more practical know-how and interesting products, visit my personal homepage.