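How to Use a State Machine to Organize LLM Calls into a Reliable, Observable, and Recoverable Production Line
Preface
By 2025, large language models can already write articles of decent quality. But between "can write" and "can produce reliably" lies an engineering gap:
- A long-form piece requires fetching data → web search → multiple rounds of AI analysis → section-by-section writing → layout and illustration, more than ten steps in total
- AI call timeouts, network jitter, and lost intermediate results are the norm
- Users cannot be left waiting after they submit; the job has to run asynchronously, with progress available at any time
- When one step fails, the pipeline should not have to restart from the beginning every time
We built an "AI writing pipeline" on Spring StateMachine + LLM, organizing the model's capabilities into a recoverable, observable, and easily extensible state machine. This article breaks down its design and implementation.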
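1. Scenario: Multi-Step AI Content Production
User input: topic + style parameters
↓
System output: a complete piece of structured content
↓
Includes: data analysis, material curation, chapter writing, charts and illustrations...
The difficulty: these stages have strict ordering dependencies, each of them can fail, and the whole run can take several minutes, so making the user wait synchronously is not an option.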
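2. Architecture Choice: Why a State Machine?
2.1 The intuitive approach: the executor pattern
Without much thought, the most natural design is an interface plus a for loop: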
java
interface Executor {
    boolean execute(Context ctx);
}
// Executors listed in pipeline order
List<Executor> pipeline = Arrays.asList(
    new InitializeExecutor(),
    new FetchDataExecutor(),
    new SearchArticlesExecutor()
    // ... 15 executors in total
);
// Run them one after another
for (Executor e : pipeline) {
    if (!e.execute(ctx)) break; // stop at the first failure
}
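Simple, direct, and easy to read at a glance. But it has several serious flaws:
- No ability to resume: if step 10 fails, recovery can only restart from step 1
- Coarse error granularity: every failure just exits the loop, so a "network timeout" cannot be told apart from an "AI generation error"
- Asynchronous steps are hard to mix in: if a step is network I/O that runs asynchronously, the for loop has no way to wait for it to finish
- No runtime observability: "which step is it on right now" needs hand-written instrumentation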
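2.2 The final approach: Spring StateMachine
A state machine framework provides the following out of the box:
- Named states: every step has a unique identity (not a List index)
- Event-driven: the machine moves between states on events, which fits asynchronous callbacks naturally
- Error routing: each normal state has a corresponding error state, so errors are clearly classified
- Listener mechanism: hooks for state changes, errors, and transitions
- Independent instances: one state machine instance per piece of content, with no interference between them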
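3. Core Design: The State Machine in One Diagram
3.1 The three basic elements
State = a workstation on the production line, e.g. "fetch material" or "generate outline"
Event = the signal that a work step has finished, e.g. "FETCHED" or "OUTLINE_GENERATED"
Action = the worker doing the actual job at a workstation, e.g. fetchArticlesService.execute()
Transition rule: current state + received event → next state (and its action is executed)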
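The transition rule above can be sketched without any framework. The following is a minimal illustration, not the project's code and not Spring StateMachine's API: `MiniStateMachine`, `DemoState`, and `DemoEvent` are hypothetical names, and only state and event names that appear in this article are used.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical subset of the pipeline's states and events, for illustration only.
enum DemoState { GET_PROPERTY_DATA, FETCH_ARTICLES, CHECK_SIMILARITY, FETCH_FAILED }
enum DemoEvent { PROP_DATA_FETCHED, FETCHED, FETCH_ERROR }

/** Minimal table-driven machine: (current state, event) -> next state, plus an entry action. */
class MiniStateMachine {
    private final Map<DemoState, Map<DemoEvent, DemoState>> transitions = new EnumMap<>(DemoState.class);
    private final Map<DemoState, Runnable> entryActions = new EnumMap<>(DemoState.class);
    private DemoState current;

    MiniStateMachine(DemoState initial) {
        this.current = initial;
    }

    void addTransition(DemoState source, DemoEvent event, DemoState target) {
        transitions.computeIfAbsent(source, s -> new EnumMap<>(DemoEvent.class)).put(event, target);
    }

    void onEntry(DemoState state, Runnable action) {
        entryActions.put(state, action);
    }

    /** Returns true if the event is accepted in the current state; otherwise nothing changes. */
    boolean sendEvent(DemoEvent event) {
        Map<DemoEvent, DemoState> row = transitions.get(current);
        DemoState target = (row == null) ? null : row.get(event);
        if (target == null) {
            return false;
        }
        current = target;
        Runnable action = entryActions.get(target);
        if (action != null) {
            action.run(); // the "worker at the workstation"
        }
        return true;
    }

    DemoState current() {
        return current;
    }
}
```

In this sketch an event that has no transition from the current state is rejected and leaves the state untouched, so a stray FETCHED arriving while the machine is still in GET_PROPERTY_DATA has no effect.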
3.2 Complete transition diagram
START (initial state)
  │ sendEvent(INITIALIZED)
  ▼
InitializeWriting ──failure──→ INITIALIZE_FAILED (terminal state)
  │ success
  ▼
GET_PROPERTY_DATA (fetch base data)
  │ PROP_DATA_FETCHED
  ▼
FETCH_ARTICLES (material collection) ──failure──→ FETCH_FAILED (terminal state)
  │ FETCHED
  ▼
CHECK_SIMILARITY (relevance check)
  → SCREEN_ARTICLES → ABSTRACT_ARTICLE
  → GROUP_ARTICLES
  → GENERATE_INITIAL_OUTLINE
  → MERGE_OUTLINE
  → WRITE_LEAF_CONTENT (AI writes the individual paragraphs)
  → WRITE_NON_LEAF_CONTENT (AI writes the chapter overviews)
  → MERGE_ARTICLE
  → PROOFREAD_IMAGE
  → GENERATE_CHARTS
  → INSERT_CHARTS
  → END ✓
[Any stage that fails → enters its corresponding XXX_FAILED terminal state]
3.3 Overall orchestration sequence
The complete call chain from the user's request to the finished article:
[Sequence diagram. Participants: User, ContentController, WorkflowServiceImpl, StateMachine, Action (business logic), Database]
1. User → ContentController: POST /content/create (topic + style parameters)
2. startWriting(conversationId); save task record (INIT); start()
3. ★ Run InitializeWriting → record execution status → sendEvent(INITIALIZED)
4. ★ Run GetPropertyData → record execution status → sendEvent(PROP_DATA_FETCHED)
5. ★ Run FetchArticles (material collection) → record execution status → sendEvent(FETCHED)
6. ★ Run CheckSimilarity, ScreenArticles, AbstractArticle, GroupArticles, GenerateInitialOutline
7. ... roughly 10 further steps run in sequence ...
8. ★ Run InsertCharts → mark task complete → sendEvent(COMPLETED) → notify that the task has ended
9. Return taskId; the user receives 202 Accepted (asynchronous execution)
3.4 Internal transition mechanism of the state machine
[Sequence diagram. Participants: StateMachineConfig, StateMachine, StateMachineListener, Action]
Initialization phase (define the topology):
- configure(StateMachineConfigurer)
- Register the States enum
- Register the Events enum
- Register transitions: source + event → target
- Bind an Action to each State
Runtime phase (event-driven transitions, looping until a terminal or error state):
- onStateChange (leaving the current state)
- ★ Enter the new state → run the business logic
- sendEvent(SUCCESS / ERROR)
- onStateChanged (arrived at the next state)
- onTransition (one transition completed)
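Of the two sequence diagrams, the overall orchestration sequence shows the business view (user → service → database), while the internal mechanism sequence shows the framework view (configuration → state machine → listener); the two complement each other.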
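Listener hooks such as onStateChanged are also what makes "check progress at any time" cheap to provide. A minimal framework-free sketch, assuming a hypothetical `ProgressTracker` that a listener would call on every state change (the class and its method names are illustrative and not part of Spring StateMachine):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers the latest state reached by each task. */
class ProgressTracker {
    // taskId -> name of the state most recently entered
    private final Map<String, String> latestState = new ConcurrentHashMap<>();

    /** Intended to be called from a state-change listener hook. */
    void onStateChanged(String taskId, String fromState, String toState) {
        latestState.put(taskId, toState);
    }

    /** Returns the last known state, or "UNKNOWN" if the task was never seen. */
    String progressOf(String taskId) {
        return latestState.getOrDefault(taskId, "UNKNOWN");
    }
}
```

A progress endpoint could then answer from this map instead of querying the running machine; whether the real project does so is not stated in this article.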
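3.5 Core concept: six mappings
At runtime the state machine relies on six kinds of mappings, serving two scenarios: normal operation and resuming from a breakpoint.
| # | Mapping | Purpose | Location |
|---|---|---|---|
| 1 | source + event → target | Defines the transition routes | StateMachineTransitionConfigurer |
| 2 | state → Action | Runs the business logic on entering a state | StateMachineStateConfigurer |
| 3 | "fetch_articles" → FETCH_ARTICLES | Converts the database string to a state | StateMachineStateHandler |
| 4 | FETCH_ARTICLES → PROP_DATA_FETCHED | Event that triggers entry into the target state | StateMachineStateHandler |
| 5 | FETCH_ARTICLES → GET_PROPERTY_DATA | State that precedes the target state | StateMachineStateHandler |
| 6 | ExecutionTypeEnum → DB record | Persists runtime state | ExecutionServiceImpl |
Mappings 1-2 are runtime mappings (they define normal behavior), 3-5 are recovery mappings (they define the navigation logic for resuming), and 6 is the persistence mapping (runtime state written to the database).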
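The three recovery mappings (3-5) can be pictured as three small lookup tables. The sketch below is an assumption about their shape, not the real StateMachineStateHandler: `RecoveryMappings`, `FlowState`, `FlowEvent`, and the "check_similarity" database key are hypothetical, and event names follow the flow diagram.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical subset of states and events, for illustration only.
enum FlowState { GET_PROPERTY_DATA, FETCH_ARTICLES, CHECK_SIMILARITY }
enum FlowEvent { PROP_DATA_FETCHED, FETCHED }

/** Hypothetical stand-in for the three recovery lookups (mappings 3-5). */
class RecoveryMappings {
    private final Map<String, FlowState> stateByDbKey = new HashMap<>();                    // mapping 3
    private final Map<FlowState, FlowEvent> entryEvent = new EnumMap<>(FlowState.class);    // mapping 4
    private final Map<FlowState, FlowState> previousState = new EnumMap<>(FlowState.class); // mapping 5

    /** Registers all three lookups for one state in a single call so they cannot drift apart. */
    void register(String dbKey, FlowState state, FlowEvent event, FlowState previous) {
        stateByDbKey.put(dbKey, state);
        entryEvent.put(state, event);
        previousState.put(state, previous);
    }

    FlowState stateOf(String dbKey) { return require(stateByDbKey.get(dbKey), dbKey); }
    FlowEvent eventInto(FlowState state) { return require(entryEvent.get(state), state); }
    FlowState previousOf(FlowState state) { return require(previousState.get(state), state); }

    private static <T> T require(T value, Object key) {
        if (value == null) {
            throw new IllegalArgumentException("no mapping for " + key);
        }
        return value;
    }

    /** Sample wiring; "check_similarity" is an assumed database key. */
    static RecoveryMappings sample() {
        RecoveryMappings m = new RecoveryMappings();
        m.register("fetch_articles", FlowState.FETCH_ARTICLES,
                FlowEvent.PROP_DATA_FETCHED, FlowState.GET_PROPERTY_DATA);
        m.register("check_similarity", FlowState.CHECK_SIMILARITY,
                FlowEvent.FETCHED, FlowState.FETCH_ARTICLES);
        return m;
    }
}
```

Registering the three entries together is a design choice of this sketch; it keeps a state from being resumable in one table and missing from another.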
4. Core Technical Details
4.1 Execution context: the state machine's "data bus"
Data shared between steps lives in an ExecutionContext, passed along through the StateMachine's ExtendedState:
java
// Inject at startup
stateMachine.getExtendedState().getVariables().put("context", executionContext);
// Retrieve inside an Action
ExecutionContext ctx = (ExecutionContext)
    stateContext.getExtendedState().getVariables().get("context");
After each Service runs, it writes its result back to the context, and the next Service reads it directly:
java
// Step A
context.setSearchResult(searchResult);
context.setProcessedData(processedData);
// Step B
SearchResult result = context.getSearchResult();
List<ProcessedData> data = context.getProcessedData();
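Because the ExtendedState variables are an untyped Map<Object, Object>, every Action ends up repeating the same key and cast. One way to contain that, sketched here with plain JDK types: `SketchContext` and `ContextAccess` are hypothetical names, and the context is trimmed to a single field for brevity.

```java
import java.util.Map;

/** Hypothetical, trimmed-down context; the real ExecutionContext holds more fields. */
class SketchContext {
    private String searchResult;

    String getSearchResult() { return searchResult; }
    void setSearchResult(String searchResult) { this.searchResult = searchResult; }
}

/** Hypothetical helper that centralizes the variable key and the cast. */
final class ContextAccess {
    static final String CONTEXT_KEY = "context";

    private ContextAccess() {}

    static void attach(Map<Object, Object> variables, SketchContext ctx) {
        variables.put(CONTEXT_KEY, ctx);
    }

    /** Fails fast with a clear message instead of a later NullPointerException or ClassCastException. */
    static SketchContext require(Map<Object, Object> variables) {
        Object value = variables.get(CONTEXT_KEY);
        if (!(value instanceof SketchContext)) {
            throw new IllegalStateException("execution context missing or of unexpected type: " + value);
        }
        return (SketchContext) value;
    }
}
```

With such a helper, step A and step B both call `ContextAccess.require(...)` on the same variables map and share one object, which is the "data bus" behavior described above.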
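4.2 Handling asynchronous steps
Network I/O steps are handled asynchronously with Reactor so they do not block the state machine thread:
java
private Action<States, Events> fetchArticlesAction() {
    return stateContext -> {
        // Resolve the shared context and the owning machine from the StateContext
        ExecutionContext context = (ExecutionContext)
            stateContext.getExtendedState().getVariables().get("context");
        StateMachine<States, Events> stateMachine = stateContext.getStateMachine();
        Mono.fromCallable(() -> fetchArticlesService.execute(context))
            .subscribeOn(Schedulers.boundedElastic()) // I/O thread pool
            .doOnSuccess(result -> {
                if (Boolean.TRUE.equals(result)) {
                    sendEvent(stateMachine, FETCHED); // drive the next step from the async callback
                } else {
                    sendEvent(stateMachine, FETCH_ERROR); // the service reported a failure
                }
            })
            .doOnError(e -> sendEvent(stateMachine, FETCH_ERROR))
            .subscribe(); // non-blocking, returns immediately
    };
}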
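Synchronous and asynchronous steps mix transparently inside the state machine: it only recognizes events and does not care who sent an event or when.
The asynchronous interaction sequence is as follows: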
[Sequence diagram. Participants: StateMachine, Action, async I/O thread, LLM/external service, StateMachine (next step)]
1. StateMachine → Action: enter the state (e.g. FetchArticles)
2. Action: take the Context out of ExtendedState
3. Action → async I/O thread: Mono.subscribeOn(boundedElastic); the Action returns immediately without blocking the state machine thread
4. Async I/O thread → LLM/external service: HTTP call; the result comes back and triggers doOnSuccess / doOnError
5. On success: sendEvent(FETCHED) → transition to the next state
6. On failure: sendEvent(FETCH_ERROR) → enter the FETCH_FAILED terminal state
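Key point: the Action orchestrates the asynchronous work internally with Reactor, and subscribe() returns immediately without blocking; the real business result drives the state machine forward as an event sent from the callback. The state machine itself is completely unaware of whether a step is synchronous or asynchronous.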
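The same "return immediately, report back with an event" shape can be reproduced with the JDK alone. The sketch below swaps Reactor for `CompletableFuture` so it has no dependencies; `AsyncStep` and its parameters are hypothetical, and events are plain strings rather than the project's Events enum.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: runs a step off-thread and reports the outcome as exactly one event. */
class AsyncStep {
    /**
     * Starts the work on ioPool and returns at once. When the work finishes,
     * successEvent is sent if it returned true; errorEvent is sent if it
     * returned false or threw. The returned future completes after the event is sent.
     */
    static CompletableFuture<Void> run(Supplier<Boolean> work,
                                       Executor ioPool,
                                       Consumer<String> sendEvent,
                                       String successEvent,
                                       String errorEvent) {
        return CompletableFuture.supplyAsync(work, ioPool)
                // Collapse "returned false" and "threw" into the same error event
                .handle((ok, error) ->
                        (error == null && Boolean.TRUE.equals(ok)) ? successEvent : errorEvent)
                .thenAccept(sendEvent);
    }
}
```

Folding both failure paths into a single `handle` stage guarantees one event per run in this sketch, so the machine is never driven twice for the same step.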
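4.3 Resuming from a breakpoint: the state machine's killer feature
When a task is interrupted by a timeout or an exception, it can be resumed at a precise point:
java
public boolean continueFailedTask(String conversationId, String taskId) {
    // 1. Query the database for the last execution record
    ExecutionRecords lastExec = dao.selectLastExecutionByTaskId(taskId);
    // 2. Decide which state to resume from:
    //    last run succeeded → start from the next state
    //    last run failed    → retry the current state
    States resumeFrom = determineResumeState(lastExec);
    // 3. Rebuild the execution context (restore earlier intermediate results from the database)
    ExecutionContext ctx = buildExecutionContext(conversation, task, ...);
    // 4. Reset the machine to the node just before the target state
    //    (getPreviousState is an assumed accessor name for the previousStateMap lookup)
    States previousState = handler.getPreviousState(resumeFrom);
    stateMachine.getStateMachineAccessor()
        .doWithAllRegions(accessor -> {
            accessor.resetStateMachine(
                new DefaultStateMachineContext<>(previousState, null, null, null));
        });
    // Re-attach the rebuilt context under the same key used at startup
    stateMachine.getExtendedState().getVariables().put("context", ctx);
    // 5. Send the event mapped to the target state → triggers its Action
    Events event = handler.getStateEvent(resumeFrom);
    return stateMachine.sendEvent(event);
}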
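The heart of the recovery logic is the three mapping tables maintained by StateMachineStateHandler, which answer three questions:
- What is the current state? (derived from the execution type stored in the database)
- Which event is needed to enter the target state? (stateToEventMap)
- Where should the state machine be reset to? (previousStateMap: step back one state, then replay)
The full resume sequence is as follows: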
[Sequence diagram. Participants: WorkflowServiceImpl, Database, StateMachineStateHandler, StateMachine, Action]
1. continueFailedTask(taskId)
2. selectLastExecutionByTaskId(taskId) → latest execution record (state = FetchArticles, result = failed)
3. determineResumeState(lastExec): last run succeeded → start from the next state; last run failed → retry the current state
4. buildExecutionContext (restore intermediate results from the DB)
5. getStateEvent(resumeFrom) → PROP_DATA_FETCHED (the event that triggers entry into the target state)
6. resetStateMachine(previousState): the machine steps back to the previous stable state
7. sendEvent(PROP_DATA_FETCHED) → ★ enter the target state → run the business logic → record the retry execution status
8. On success the Action calls sendEvent(FETCHED); the following steps continue without manual intervention
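Where the executor pattern needs a hand-maintained checkpoint list, a state machine's state identifiers are natural breakpoints in themselves. The mapping tables in StateMachineStateHandler turn "where to resume from, and which event drives it" into a pure configuration question.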
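The two decisions in the resume path, which state to resume and which state to reset to, can be sketched as pure functions. This is an assumption-laden illustration: `Step` and `ResumePlanner` are hypothetical, and a strictly linear flow is assumed so that enum order stands in for the real previous-state table.

```java
// Hypothetical linear slice of the pipeline; declaration order is the execution order.
enum Step { GET_PROPERTY_DATA, FETCH_ARTICLES, CHECK_SIMILARITY, SCREEN_ARTICLES }

/** Hypothetical planner for the "step back one state, then replay" resume strategy. */
class ResumePlanner {
    /** Last step succeeded: resume at the next step. Last step failed: retry the same step. */
    static Step resumeState(Step lastStep, boolean lastSucceeded) {
        if (!lastSucceeded) {
            return lastStep;
        }
        Step[] all = Step.values();
        int next = lastStep.ordinal() + 1;
        if (next >= all.length) {
            throw new IllegalStateException("nothing left to resume after " + lastStep);
        }
        return all[next];
    }

    /** The state the machine is reset to before the entry event of resumeState is replayed. */
    static Step resetState(Step resumeState) {
        if (resumeState.ordinal() == 0) {
            throw new IllegalStateException(resumeState + " has no previous state");
        }
        return Step.values()[resumeState.ordinal() - 1];
    }
}
```

For a failed FETCH_ARTICLES run this yields "resume FETCH_ARTICLES, reset to GET_PROPERTY_DATA", which matches mapping 5 in section 3.5.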
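4.4 Timeout termination
Tasks that have not finished within the threshold are terminated automatically:
java
@Value("${task.timeout.seconds:7200}")
private Integer timeoutSeconds;

public int timeoutTasks() {
    List<TaskInfo> timeoutTasks = dao.selectTimeoutTasks(timeoutSeconds);
    for (TaskInfo task : timeoutTasks) {
        StateMachine<States, Events> sm = factory.getStateMachine(task.getTaskId());
        sm.stopReactively().subscribe(); // stop the machine without blocking
        dao.markError(task.getTaskId(), "execution timed out");
    }
    return timeoutTasks.size(); // number of tasks terminated in this scan
}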
The timeout check interacts as follows:
[Sequence diagram. Participants: scheduled job, WorkflowServiceImpl, Database, StateMachine]
1. The scheduled job scans once a minute and calls timeoutTasks()
2. selectTimeoutTasks(7200s) → list of timed-out tasks: taskId_1, taskId_2, ...
3. For each timed-out task: factory.getStateMachine(taskId) → stopReactively().subscribe() → markError(taskId, "execution timed out")
4. The user or front end can look up the timeout result by taskId
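What selectTimeoutTasks has to decide can be shown in memory. The sketch below is not the project's query: `TimeoutScanner` is hypothetical, task start times are assumed to be available as `Instant` values, and the current time is passed in so the rule stays deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory version of the "which running tasks have timed out" check. */
class TimeoutScanner {
    /** Returns the ids of tasks that started more than timeout before now. */
    static List<String> findTimedOut(Map<String, Instant> startedAtByTaskId,
                                     Instant now,
                                     Duration timeout) {
        List<String> timedOut = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : startedAtByTaskId.entrySet()) {
            Duration running = Duration.between(entry.getValue(), now);
            if (running.compareTo(timeout) > 0) {
                timedOut.add(entry.getKey());
            }
        }
        return timedOut;
    }
}
```

A task that has run for exactly the threshold is not yet reported in this sketch; only strictly longer runs are.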
5. Project Structure at a Glance
project
├── pom.xml # Maven parent project
├── core-module # main service module
│ ├── Application.java # Spring Boot entry point
│ ├── api/
│ │ ├── llms/ # LLM client (OpenAI-compatible API)
│ │ ├── feign/ # Feign interfaces for external service calls
│ │ └── workflow/ # data models
│ ├── service/config/statemachine/
│ │ ├── States.java # state enum
│ │ ├── Events.java # event enum
│ │ ├── StateMachineConfig.java # state machine configuration (core)
│ │ └── Variables.java # context variable constants
│ ├── service/impl/
│ │ ├── execution/ # ★ the execution Services (core business logic)
│ │ │ ├── InitializeService.java
│ │ │ ├── FetchDataService.java
│ │ │ ├── CollectMaterialService.java
│ │ │ ├── CheckRelevanceService.java
│ │ │ ├── ScreenMaterialService.java
│ │ │ ├── AbstractContentService.java
│ │ │ ├── GroupMaterialService.java
│ │ │ ├── GenerateOutlineService.java
│ │ │ ├── MergeOutlineService.java
│ │ │ ├── WriteLeafContentService.java
│ │ │ ├── WriteNonLeafContentService.java
│ │ │ ├── MergeContentService.java
│ │ │ ├── ProofreadImageService.java
│ │ │ ├── GenerateChartsService.java
│ │ │ └── InsertChartsService.java
│ │ ├── WorkflowServiceImpl.java # starts/resumes the state machine
│ │ ├── StateMachineStateHandler.java # mapping helper class
│ │ ├── LlmService.java # LLM call wrapper
│ │ └── execution/ExecutionContext.java # execution context (data bus)
│ └── service/http/controller/
│ └── ContentController.java # REST API entry point
│
└── pojo-module # POJO module
└── bo/StartRequestBo.java # request parameters
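6. Iterating Quickly
A hidden benefit of this architecture: most requirements do not touch the core framework and only need changes to a handful of files.
Scenario 1: changing a prompt (the most common request)
Editing the prompt text in the configuration center is enough. Prompts are referenced by numeric ID:
properties
# Which prompt set to use for writing paragraphs
llm.prompt.leaf.content = 50
# Which prompt set to use for generating charts
llm.prompt.chart.config = 42
Change size: 0 lines of code.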
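Resolving a prompt ID from such a configuration can be sketched with `java.util.Properties`. `PromptIds` is a hypothetical class; the real project presumably reads these keys through Spring and a configuration center, which is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical reader that turns "key = number" lines into prompt IDs. */
class PromptIds {
    private final Properties props = new Properties();

    PromptIds(String propertiesText) {
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable properties text", e);
        }
    }

    /** Returns the numeric prompt ID configured under key. */
    int idFor(String key) {
        String raw = props.getProperty(key);
        if (raw == null) {
            throw new IllegalArgumentException("no prompt configured for " + key);
        }
        return Integer.parseInt(raw.trim());
    }
}
```

One detail of the .properties format is worth knowing: a `#` only starts a comment at the beginning of a line. Text after a value on the same line becomes part of the value, so a trailing comment would make the numeric parse fail.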
Scenario 2: changing the business logic of a step
Go straight to the corresponding Service file under execution/ and modify its execute() method.
Change size: 1 file.
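Scenario 3: adding or removing a step
Suppose a "Deduplicate" step is to be added between "Screening" and "Grouping":
Files changed:
States.java → +DEDUPLICATE (1 line)
Events.java → +DEDUPLICATED (1 line)
StateMachineConfig.java → +state + transition (4 lines)
StateMachineStateHandler.java → +2 mapping entries (2 lines)
New file:
DeduplicateService.java → execute() method (about 30 lines)
Change size: 4 files + 1 new file, with zero changes to the core framework.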
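Since a new step touches several files that must agree, a small consistency check can catch a forgotten mapping entry before it surfaces as a failed resume. The sketch below is a hypothetical addition, not something the article says the project has; `MappingCheck` and its parameters are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical check that every resumable state appears in both recovery tables. */
class MappingCheck {
    /** Lists each state missing from the entry-event table or the previous-state table. */
    static <S, E> List<String> findGaps(Set<S> resumableStates,
                                        Map<S, E> entryEvent,
                                        Map<S, S> previousState) {
        List<String> gaps = new ArrayList<>();
        for (S state : resumableStates) {
            if (!entryEvent.containsKey(state)) {
                gaps.add(state + " has no entry event");
            }
            if (!previousState.containsKey(state)) {
                gaps.add(state + " has no previous state");
            }
        }
        return gaps;
    }
}
```

Run as a unit test over the real enum and handler tables, an empty result would mean the newly added DEDUPLICATE state is fully wired for recovery.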
Scenario 4: adding a new processing mode/style
java
// StyleEnum.java: add one line
NEW_STYLE("newStyle", "description of the new style")
// StyleConfig.java: add the matching prompt template
Change size: 2 files.
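For context, an enum of this shape is usually paired with a lookup from the request's style code. A minimal sketch, assuming a hypothetical `SketchStyle` enum in which only NEW_STYLE and its "newStyle" code come from the article:

```java
/** Hypothetical style enum; the lookup method is an illustrative assumption. */
enum SketchStyle {
    NEW_STYLE("newStyle", "description of the new style");

    private final String code;
    private final String description;

    SketchStyle(String code, String description) {
        this.code = code;
        this.description = description;
    }

    String code() { return code; }
    String description() { return description; }

    /** Resolves the style from the code sent in the request; unknown codes are rejected. */
    static SketchStyle fromCode(String code) {
        for (SketchStyle style : values()) {
            if (style.code.equals(code)) {
                return style;
            }
        }
        throw new IllegalArgumentException("unknown style: " + code);
    }
}
```

Because the lookup iterates over `values()`, adding a constant is the only code change needed for the new code to resolve.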
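7. Closing Thoughts
State machine vs. executor pattern: how to choose?
| Dimension | State machine | Executor pattern |
|---|---|---|
| Code size | Carries framework code overhead | Simple and direct |
| Learning curve | Requires understanding State/Event/Action | Any developer gets it instantly |
| Resuming from a breakpoint | A natural fit | Must be implemented by hand |
| Error classification | Each state has a dedicated error state | Only true/false |
| Mixing async steps | Async inside the Action, driven by events | Hard to handle async in a for loop |
| Observability | Built-in listeners | Manual instrumentation |
| Changing the flow at runtime | Configuration is the topology | Reorder the List |
When to use a state machine: long flows (10+ steps), a need to resume from breakpoints, mixed async/sync steps, and errors that need fine-grained classification.
When to use the executor pattern: short flows (3-5 steps), no need for recovery, purely synchronous, rapid prototyping.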
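Takeaways on technology choice
The most interesting part of this project is not that it "uses a state machine", but that it orchestrates LLM calls as asynchronous I/O operations with side effects, the same way one would orchestrate database queries or RPC calls.
Many teams implement AI writing as one giant all-in-one prompt ("Please write an article about XX"), and the longer the prompt gets, the less stable and the harder to debug it becomes. This project's approach is to split a complex task into multiple atomic steps, each doing exactly one thing, and let the state machine guarantee ordering and reliability. This "break the whole into parts" mindset may be more worth borrowing than the state machine itself.
The project is built on Spring Boot + Spring StateMachine + LLM, and the code is running in production.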