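Planning Agent Architectures in Depth: From ReAct to Plan-and-Execute and Reflexion in Engineering Practice

In the AI agent stack, planning is often what separates a toy agent from a production-grade one. This article examines three mainstream planning-agent architectures, from the underlying idea to a code sketch, to help you build more reliable autonomous decision-making systems.

1. Why Is Planning an Agent's Core Capability?

1.1 From Chatbot to Agent

A traditional LLM application (for example, a plain chat interface such as ChatGPT used without tools) is essentially a stateless question-answering system: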

User question → LLM inference → direct answer

A real AI agent, by contrast, needs a stateful, closed-loop way of working through a task:

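User goal → task decomposition → tool calls → observation and feedback → dynamic adjustment → delivered result

Key difference: an agent must plan on its own how to reach the goal instead of relying on a single inference pass.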
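The contrast between the two flows above can be sketched in a few lines. This is a minimal illustration, not a real agent: `stateless_answer`, `agent_loop`, `decide`, and `act` are hypothetical names, and the "LLM" is a stub so the snippet runs without any API key.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (action, observation) pairs


def stateless_answer(llm: Callable[[str], str], question: str) -> str:
    """Chatbot style: one inference, no tools, no memory of earlier steps."""
    return llm(question)


def agent_loop(decide: Callable[[str, History], str],
               act: Callable[[str], str],
               goal: str, max_steps: int = 5) -> History:
    """Agent style: decide, act, observe, and feed the observation back in."""
    history: History = []
    for _ in range(max_steps):          # hard cap so the loop always ends
        action = decide(goal, history)  # the decision sees everything so far
        if action == "DONE":
            break
        history.append((action, act(action)))
    return history


# Stubs standing in for a model and a tool (assumptions for the demo).
def decide(goal: str, history: History) -> str:
    return "DONE" if history else "get_weather"


def act(action: str) -> str:
    return "sunny, 18-25°C"


print(stateless_answer(lambda q: "I cannot check live data.", "Weather tomorrow?"))
trace = agent_loop(decide, act, "Weather tomorrow?")
print(trace)
```

The only structural difference is the `history` list: the loop carries state from one step to the next, which is what makes planning possible at all.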

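1.2 Three Levels of Planning Capability

Level	Characteristic	Representative pattern	Typical use
L1 - Single-step decisions	Decides only the next action each time	ReAct	Simple Q&A, real-time information lookup
L2 - Global planning	Drafts a complete plan before executing	Plan-and-Execute	Complex multi-step tasks, batch processing
L3 - Dynamic optimization	Reflects during execution and adjusts strategy	Reflexion	High-precision requirements, iterative refinement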

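2. ReAct: Interleaving Reasoning and Acting

2.1 Core Idea

ReAct (Reasoning + Acting) was proposed in 2022 by researchers at Princeton University and Google Research (Yao et al.). Its core idea is to let the LLM "think while doing":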

Thought → Action → Observation → Thought → Action → ... → Answer

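Design philosophy:

Reasoning: analyze the current state and decide the next step
Acting: execute a concrete tool call
Observation: read the result of the action and update the agent's understanding

2.2 Execution Flow Example

Take "look up tomorrow's weather in Beijing and suggest what to wear" as an example:

Step	Type	Content
1	Thought	The user is asking about tomorrow's weather; I need to call the weather tool
2	Action	`get_weather(city="Beijing", date="tomorrow")`
3	Observation	Returns: sunny, 18-25°C, 10% chance of rain
4	Thought	Mild temperature and little chance of rain, so a light jacket is suitable
5	Final Answer	Beijing will be sunny tomorrow, 18-25°C; a light jacket is recommended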
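Before reaching for a framework, it can help to see the trace above as a plain loop. The following is a dependency-free sketch under stated assumptions: `react_loop` and `make_scripted_llm` are hypothetical helpers, the model is replaced by scripted replies, and the parsing covers only the `Action:` / `Action Input:` / `Final Answer:` lines of the ReAct text format.

```python
import re
from typing import Callable, Dict, List

ACTION_RE = re.compile(r"Action:\s*(\w+)\s*\nAction Input:\s*(.*)")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def react_loop(llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> str:
    """Alternate model output and tool observations until a final answer."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(scratchpad)
        scratchpad += output + "\n"
        final = FINAL_RE.search(output)
        if final:                                   # the model decided it is done
            return final.group(1).strip()
        match = ACTION_RE.search(output)
        if match is None:                           # feed the parse error back
            scratchpad += "Observation: could not parse an action\n"
            continue
        name, arg = match.group(1), match.group(2).strip()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        scratchpad += f"Observation: {observation}\n"
    return "Stopped: step limit reached"


def make_scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for a real model: returns the canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


scripted = make_scripted_llm([
    "Thought: I need the forecast.\nAction: get_weather\nAction Input: Beijing",
    "Thought: I now know the final answer\n"
    "Final Answer: Sunny, 18-25°C; a light jacket is enough.",
])
tools = {"get_weather": lambda city: f"{city}: sunny, 18-25°C, 10% chance of rain"}
answer = react_loop(scripted, tools, "What should I wear in Beijing tomorrow?")
print(answer)
```

The scratchpad grows with every step and is resent in full on each call, which is the source of the context-window pressure that long ReAct chains run into.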

2.3 Code Implementation (LangChain Style)

from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

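# Define the tools (stubbed with canned responses for demonstration)
tools = [
    Tool(
        name="get_weather",
        func=lambda city: f"{city}: sunny tomorrow, 18-25°C",
        description="Get the weather forecast for a given city"
    ),
    Tool(
        name="search",
        func=lambda query: f"Search results: information related to {query}",
        description="Search the internet for information"
    )
]

# ReAct prompt template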
react_template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""

# Create the agent
llm = ChatOpenAI(model="gpt-4", temperature=0)
prompt = PromptTemplate.from_template(react_template)
agent = create_react_agent(llm, tools, prompt)

# Run (important: set max_iterations to prevent infinite loops)
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,  # ⚠️ safety guard
    handle_parsing_errors=True
)

result = executor.invoke({"input": "What will the weather be like in Beijing tomorrow?"})
print(result["output"])

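2.4 Use Cases and Limitations

✅ Good fit	❌ Limitations
Tasks where the next step is clear	Inefficient on long tasks (serial execution)
Q&A that needs real-time information	No global plan, so it can take detours
Short tool-call chains (within 3-5 steps)	Context window limits (long chains overflow)
Dynamic environments that need flexible adjustment	Cannot run independent steps in parallel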

3. Plan-and-Execute: Plan First, Then Execute

3.1 Core Idea

Unlike ReAct's "think while doing", Plan-and-Execute follows a "think first, then do" strategy:

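User input → Planner (generates the full plan) → Executor (runs the plan) → result synthesis

Key advantages:

Global view: all steps are planned up front, which reduces the risk of settling into a local optimum
Parallelizable: independent steps can be identified and run in parallel for better throughput
Observable: the plan is explicit, so a human can inspect it, step in, and adjust it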
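The parallelization advantage only materializes if the plan records which steps depend on which. Below is a minimal sketch, assuming the planner emits a mapping from each step to its prerequisites; `run_plan`, `fake_step`, and the plan format are illustrative assumptions, not part of LangGraph or any other library.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Plan = Dict[str, List[str]]  # step name -> names of steps it depends on


async def run_plan(plan: Plan,
                   run_step: Callable[[str, Dict[str, str]], Awaitable[str]]
                   ) -> Dict[str, str]:
    """Run the plan in waves: every step whose inputs are ready runs concurrently."""
    results: Dict[str, str] = {}
    pending = dict(plan)
    while pending:
        ready = [s for s, deps in pending.items()
                 if all(d in results for d in deps)]
        if not ready:
            raise ValueError("plan has a cycle or an unknown dependency")
        outputs = await asyncio.gather(
            *(run_step(s, {d: results[d] for d in pending[s]}) for s in ready)
        )
        for step, output in zip(ready, outputs):
            results[step] = output
            del pending[step]
    return results


async def fake_step(step: str, inputs: Dict[str, str]) -> str:
    """Stub executor: a real one would call an LLM or a tool here."""
    await asyncio.sleep(0.01)
    return f"{step}({', '.join(sorted(inputs))})"


plan = {"attractions": [], "food": [], "itinerary": ["attractions", "food"]}
results = asyncio.run(run_plan(plan, fake_step))
print(results["itinerary"])
```

Here `attractions` and `food` run in the same wave, and `itinerary` waits for both. A flat list of steps with no dependency information can only be executed one step at a time.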

3.2 Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Plan-and-Execute Agent                    │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐         ┌──────────────┐                  │
│  │   Planner    │────────▶│   Executor   │                  │
│  │  (planning)  │         │ (execution)  │                  │
│  └──────────────┘         └──────────────┘                  │
│         │                        │                          │
│         ▼                        ▼                          │
│  ┌──────────────────────────────────────────┐              │
│  │           Step-by-Step Execution          │              │
│  │  Step 1 → Step 2 → Step 3 → ... → Step N  │              │
│  └──────────────────────────────────────────┘              │
│                              │                              │
│                              ▼                              │
│  ┌──────────────────────────────────────────┐              │
│  │         Response Synthesizer              │              │
│  │        (merge and output results)         │              │
│  └──────────────────────────────────────────┘              │
└─────────────────────────────────────────────────────────────┘

3.3 Code Implementation (LangGraph Style)

import asyncio
import json
import operator
from typing import Annotated, List, TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

llm = ChatOpenAI(model="gpt-4", temperature=0)
# Placeholder (assumption): a bare LLM stands in for the executor agent here.
# Swap in a tool-using agent for real work.
executor_agent = llm

# Define the shared state
class AgentState(TypedDict):
    input: str
    plan: List[str]
    # operator.add makes LangGraph append returned steps instead of overwriting
    past_steps: Annotated[List[tuple], operator.add]
    response: str

def parse_plan(text: str) -> List[str]:
    """Extract the JSON list of steps; fall back to treating the text as one step."""
    try:
        steps = [str(s) for s in json.loads(text[text.index("["): text.rindex("]") + 1])]
    except ValueError:  # no brackets found, or invalid JSON
        steps = []
    return steps or [text.strip()]

# Planner: generate the execution plan
async def plan_step(state: AgentState):
    planner_prompt = f"""For the following task, make a plan step by step.
Task: {state['input']}

Output format: Return a JSON list of steps.
Example: ["Step 1 description", "Step 2 description", ...]
"""
    response = await llm.ainvoke(planner_prompt)
    # Parse the plan steps
    plan = parse_plan(response.content)
    return {"plan": plan}

# Executor: run a single step
async def execute_step(state: AgentState):
    current_step = state["plan"][0]

    # Ask the executor to carry out the current step
    executor_prompt = f"""Execute the following step:
Step: {current_step}
Context: {state['past_steps']}

Use available tools to complete this step."""

    response = await executor_agent.ainvoke(executor_prompt)

    return {
        "past_steps": [(current_step, response.content)],
        "plan": state["plan"][1:]  # drop the step that was just executed
    }

# Decide whether to keep executing
def should_continue(state: AgentState):
    if len(state["plan"]) == 0:
        return "synthesize"
    return "execute"

# Synthesize the final response
async def synthesize_response(state: AgentState):
    synthesize_prompt = f"""Based on the following execution results, provide a final answer:
Original task: {state['input']}
Execution steps and results:
{state['past_steps']}
"""
    response = await llm.ainvoke(synthesize_prompt)
    return {"response": response.content}

# Build the state graph
workflow = StateGraph(AgentState)

workflow.add_node("planner", plan_step)
workflow.add_node("executor", execute_step)
workflow.add_node("synthesize", synthesize_response)

workflow.set_entry_point("planner")
workflow.add_edge("planner", "executor")
workflow.add_conditional_edges(
    "executor",
    should_continue,
    {
        "execute": "executor",
        "synthesize": "synthesize"
    }
)
workflow.add_edge("synthesize", END)

app = workflow.compile()

# Run (await is only valid inside a coroutine, so wrap the call)
async def main():
    result = await app.ainvoke({
        "input": "Plan a 3-day trip to Beijing for me, including attractions, transport, and food recommendations"
    })
    print(result["response"])

asyncio.run(main())

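3.4 Comparison with ReAct

Dimension	ReAct	Plan-and-Execute
Decision mode	Think while doing	Think first, then do
Planning granularity	Single-step planning	Global planning
Execution efficiency	Medium (serial)	High (can be parallelized)
Flexibility	High	Medium
Suitable tasks	Simple Q&A, tool calls	Complex multi-step tasks
Token consumption	Higher on long chains (multiple reasoning rounds)	Often lower (planning happens once)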
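The token-consumption row can be made concrete with rough arithmetic. The sketch below is a back-of-the-envelope model under stated assumptions, not a measurement: every ReAct call resends the base prompt plus all earlier steps, while each Plan-and-Execute executor call sees only the base prompt plus its own step (the 3.3 code passes all past steps as context, so it would cost more than this model suggests). The function names and the numbers 500 and 200 are made up for illustration.

```python
def react_prompt_tokens(n_steps: int, base: int, per_step: int) -> int:
    """Call i (0-based) resends the base prompt plus i earlier steps."""
    return sum(base + i * per_step for i in range(n_steps))


def plan_execute_prompt_tokens(n_steps: int, base: int, per_step: int) -> int:
    """One planning call, then n executor calls that each see a single step."""
    return base + n_steps * (base + per_step)


for n in (2, 8):
    print(n,
          react_prompt_tokens(n, base=500, per_step=200),
          plan_execute_prompt_tokens(n, base=500, per_step=200))
```

With these numbers a two-step task costs 1200 prompt tokens under ReAct against 1900 under Plan-and-Execute, while an eight-step task costs 9600 against 6100. Under this model the ReAct total grows quadratically with chain length, so the advantage flips only once chains get long.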

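4. Reflexion: Self-Reflection and Iterative Improvement

4.1 Core Idea

Reflexion (Shinn et al., 2023) wraps an acting agent in a reflection loop, so that the agent can learn from its mistakes across attempts: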

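Execute → evaluate result → reflect on problems → adjust strategy → re-execute → ...

Key components:

Actor: the agent that carries out the task (it can be a ReAct or a Plan-and-Execute agent)
Evaluator: judges whether the result meets the bar
Self-Reflection: analyzes what went wrong and proposes improvements
Memory: stores the reflections for use in later attempts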

4.2 Architecture

┌──────────────────────────────────────────────────────────────┐
│                      Reflexion Agent                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│   ┌─────────┐    ┌──────────┐    ┌─────────────┐            │
│   │  Actor  │───▶│Evaluator │───▶│  Reflection │            │
│   │ (acts)  │    │(evaluate)│    │  (reflect)  │            │
│   └─────────┘    └──────────┘    └──────┬──────┘            │
│        ▲                                │                    │
│        │                                │                    │
│        └────────────────────────────────┘                    │
│                    (improvement feedback)                    │
│                                                              │
│   ┌────────────────────────────────────────────────────┐    │
│   │                    Memory Store                     │    │
│   │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │    │
│   │  │ Failure #1  │  │ Failure #2  │  │ Success #1  │ │    │
│   │  │ + reflection│  │ + reflection│  │ + takeaways │ │    │
│   │  └─────────────┘  └─────────────┘  └─────────────┘ │    │
│   └────────────────────────────────────────────────────┘    │
│                                                              │
└──────────────────────────────────────────────────────────────┘

4.3 Code Implementation

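import re
from dataclasses import dataclass
from typing import List

@dataclass
class Reflection:
    """One reflection record."""
    task: str
    result: str
    score: float  # evaluation score in the range 0-1
    reflection: str  # reflection summary
    improvement: str  # improvement suggestion

def parse_score(text: str) -> float:
    """Read 'Score: X.XX' from the evaluator output; 0.0 if it is missing."""
    match = re.search(r"Score:\s*([0-9]*\.?[0-9]+)", text)
    return min(max(float(match.group(1)), 0.0), 1.0) if match else 0.0

def parse_feedback(text: str) -> str:
    """Read the text after 'Feedback:'; fall back to the whole output."""
    match = re.search(r"Feedback:\s*(.*)", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

class ReflexionAgent:
    def __init__(self, llm, actor, max_iterations=3):
        self.llm = llm
        # Assumption: actor is any object with an async run(task) -> str method
        self.actor = actor
        self.max_iterations = max_iterations
        self.memory: List[Reflection] = []

    async def evaluate(self, task: str, result: str) -> tuple[float, str]:
        """Evaluate the execution result."""
        eval_prompt = f"""Evaluate the following result for the task.
Task: {task}
Result: {result}

Score the result from 0.0 to 1.0 and provide brief feedback.
Format: Score: X.XX\nFeedback: ..."""

        response = await self.llm.ainvoke(eval_prompt)
        # Parse the score and the feedback
        score = parse_score(response.content)
        feedback = parse_feedback(response.content)
        return score, feedback

    async def reflect(self, task: str, result: str, score: float, feedback: str) -> str:
        """Generate a reflection summary."""
        # Retrieve relevant past experience
        relevant_experiences = self._retrieve_similar(task)

        reflect_prompt = f"""Analyze why the result scored {score} and how to improve.
Task: {task}
Result: {result}
Feedback: {feedback}

Relevant past experiences:
{relevant_experiences}

Provide:
1. Root cause analysis
2. Specific improvement strategy"""

        response = await self.llm.ainvoke(reflect_prompt)
        return response.content

    async def run(self, task: str) -> str:
        """Main execution loop."""
        for iteration in range(self.max_iterations):
            # Build the instruction, enriched with earlier reflections
            enhanced_task = self._enhance_with_reflection(task)

            # Execute the task (with Plan-and-Execute or ReAct)
            result = await self.actor.run(enhanced_task)

            # Evaluate the result
            score, feedback = await self.evaluate(task, result)

            # Return as soon as the threshold is reached
            if score >= 0.8:
                return result

            # Generate a reflection and store it
            reflection = await self.reflect(task, result, score, feedback)
            self.memory.append(Reflection(
                task=task,
                result=result,
                score=score,
                reflection=reflection,
                improvement=feedback
            ))

        # Iteration limit reached: return the best result so far
        return self._get_best_result(task)

    def _retrieve_similar(self, task: str, k: int = 3) -> str:
        """Naive retrieval (assumption): the last k reflections on this exact task."""
        hits = [m for m in self.memory if m.task == task][-k:]
        return "\n".join(f"- {m.reflection}" for m in hits)

    def _get_best_result(self, task: str) -> str:
        """Highest-scoring stored attempt for this task, or '' if there is none."""
        attempts = [m for m in self.memory if m.task == task]
        return max(attempts, key=lambda m: m.score).result if attempts else ""

    def _enhance_with_reflection(self, task: str) -> str:
        """Inject past reflections into the task prompt."""
        relevant = self._retrieve_similar(task)
        if not relevant:
            return task

        return f"""Task: {task}

Lessons learned from similar tasks:
{relevant}

Use these insights to improve your approach."""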

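4.4 Use Cases and Limitations

✅ Good fit	❌ Limitations
Tasks that need high-precision output (e.g. code generation)	High iteration cost (multiple LLM calls)
Tasks with a clearly definable success criterion	Evaluation criteria can be hard to define
Repetitive tasks (can learn from history)	May get stuck in an optimization loop
Complex problem solving (trial-and-error learning)	Needs extra memory storage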
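One way to limit the optimization-loop risk is to stop when the scores stop improving, in addition to the fixed iteration cap. The following is a hedged sketch of such a rule; `should_stop`, `patience`, and `min_delta` are hypothetical names borrowed from the early-stopping idea in model training, not something Reflexion itself prescribes.

```python
from typing import List


def should_stop(scores: List[float], threshold: float = 0.8,
                patience: int = 2, min_delta: float = 0.01) -> bool:
    """Return True when the goal is met or further iterations look unproductive."""
    if not scores:
        return False
    if scores[-1] >= threshold:           # success: good enough, stop
        return True
    if len(scores) <= patience:           # too few attempts to judge a trend
        return False
    best_before = max(scores[:-patience])
    recent_best = max(scores[-patience:])
    # Plateau: the last `patience` attempts did not beat the earlier best
    return recent_best < best_before + min_delta


print(should_stop([0.5, 0.85]))        # threshold reached
print(should_stop([0.5, 0.5, 0.5]))    # no progress over the last two attempts
print(should_stop([0.3, 0.4, 0.5]))    # still improving, keep going
```

Checked at the end of each iteration, a rule like this returns the best attempt early instead of spending the remaining budget on repeats of the same mistake.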

5. Choosing Among the Three Paradigms

5.1 Decision Flowchart

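Your task scenario
      │
      ├─ Can it be done in a single step?
      │   └─ Yes → call the LLM directly (no agent needed)
      │
      ├─ Steps uncertain, dynamic adjustment needed?
      │   └─ Yes → ReAct
      │
      ├─ Steps predictable, stable output wanted?
      │   └─ Yes → Plan-and-Execute
      │
      └─ High precision needed, iteration cost acceptable?
          └─ Yes → Reflexion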
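The flowchart can be encoded as a small routing function, which is handy when an orchestrator has to pick a pattern per task. This is a sketch: `TaskProfile` and its fields are hypothetical, the questions are checked in the same order as in the chart, and the final fallback to ReAct is an assumption for tasks that match none of the branches.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Answers to the four questions in the flowchart (hypothetical fields)."""
    single_step: bool = False
    steps_uncertain: bool = False
    steps_predictable: bool = False
    needs_high_precision: bool = False


def choose_paradigm(profile: TaskProfile) -> str:
    """Walk the flowchart top to bottom and return the first match."""
    if profile.single_step:
        return "direct LLM call"
    if profile.steps_uncertain:
        return "ReAct"
    if profile.steps_predictable:
        return "Plan-and-Execute"
    if profile.needs_high_precision:
        return "Reflexion"
    return "ReAct"  # assumption: default to the simplest agent pattern


print(choose_paradigm(TaskProfile(steps_predictable=True)))
print(choose_paradigm(TaskProfile(needs_high_precision=True)))
```

Because the checks are ordered, a task that is both uncertain and precision-critical is routed to ReAct first; reorder the branches if precision should win in your setting.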

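5.2 Quick Reference

Scenario	Recommended paradigm	Reason
Real-time Q&A assistant	ReAct	Fast response, single-round interaction
Data analysis report generation	Plan-and-Execute	Clear steps that can be processed in parallel
Code generation / review	Reflexion	Needs high quality and benefits from iteration
Automated workflows	Plan-and-Execute	Predefined steps, stable execution
Creative content generation	ReAct	Flexible exploration that preserves creativity
Complex problem solving	Reflexion	Trial-and-error learning that improves the result step by step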

6. Production Best Practices

6.1 Safety Guards (Required)

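# 1. Cap the number of iterations
max_iterations = 10  # prevents infinite loops

# 2. Set a timeout (seconds)
timeout = 300  # 5 minutes

# 3. Control the cost budget
max_tokens_per_task = 10000

# 4. Whitelist the tools that may be called
allowed_tools = ["search", "get_weather", "calculate"]

# 5. Review the output (sensitive-content filter)
def content_filter(output: str) -> bool:
    blocked_keywords = []  # placeholder: supply your own list of keyword strings
    return not any(kw in output for kw in blocked_keywords)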
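The settings above are only values; something in the agent loop has to enforce them. Here is a minimal sketch of one way to do that, assuming the loop can report tokens used per step. `RunGuard` and `BudgetExceeded` are hypothetical names, not part of LangChain or LangGraph.

```python
import time
from typing import Callable, Dict, Iterable


class BudgetExceeded(RuntimeError):
    """Raised when a run goes past one of its configured limits."""


class RunGuard:
    """Hypothetical helper that enforces the limits listed above."""

    def __init__(self, max_iterations: int = 10, timeout_s: float = 300.0,
                 max_tokens: int = 10_000,
                 allowed_tools: Iterable[str] = ("search", "get_weather", "calculate"),
                 clock: Callable[[], float] = time.monotonic):
        self.max_iterations = max_iterations
        self.timeout_s = timeout_s
        self.max_tokens = max_tokens
        self.allowed_tools = set(allowed_tools)
        self._clock = clock
        self._start = clock()
        self.iterations = 0
        self.tokens = 0

    def check_step(self, tokens_used: int) -> None:
        """Call once per agent step, before acting on the model output."""
        self.iterations += 1
        self.tokens += tokens_used
        if self.iterations > self.max_iterations:
            raise BudgetExceeded("iteration limit reached")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token budget exhausted")
        if self._clock() - self._start > self.timeout_s:
            raise BudgetExceeded("timeout")

    def call_tool(self, name: str, tools: Dict[str, Callable[[str], str]],
                  arg: str) -> str:
        """Run a tool only if it is on the whitelist."""
        if name not in self.allowed_tools:
            raise PermissionError(f"tool {name!r} is not on the whitelist")
        return tools[name](arg)


guard = RunGuard(max_iterations=2, max_tokens=1_000)
tools = {"search": lambda q: f"results for {q}", "delete_files": lambda p: "deleted"}
guard.check_step(tokens_used=300)
print(guard.call_tool("search", tools, "weather"))
try:
    guard.call_tool("delete_files", tools, "/tmp")
except PermissionError as err:
    print("blocked:", err)
```

Raising an exception rather than returning a flag means a forgotten check cannot silently let the run continue; the injectable `clock` makes the timeout testable without sleeping.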

6.2 Observability

# Log every execution step
execution_log = {
    "task": "...",
    "steps": [
        {
            "step": 1,
            "thought": "...",
            "action": "...",
            "observation": "...",
            "latency_ms": 1234,
            "tokens_used": 567
        }
    ],
    "total_latency_ms": 5000,
    "total_tokens": 5000,
    "success": True
}
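A log in this shape is easier to keep accurate when a helper fills in the step number, latency, and running totals. The sketch below is one possible way to do it with a context manager; `ExecutionLog` is a hypothetical class, and the field names simply mirror the dictionary above.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator


class ExecutionLog:
    """Hypothetical recorder that builds the log structure shown above."""

    def __init__(self, task: str):
        self.data: Dict[str, Any] = {
            "task": task, "steps": [],
            "total_latency_ms": 0, "total_tokens": 0, "success": False,
        }

    @contextmanager
    def step(self, thought: str, action: str) -> Iterator[Dict[str, Any]]:
        """Time one step; the caller fills in observation and tokens_used."""
        record: Dict[str, Any] = {
            "step": len(self.data["steps"]) + 1,
            "thought": thought, "action": action,
            "observation": None, "tokens_used": 0,
        }
        start = time.perf_counter()
        try:
            yield record
        finally:
            # Runs even if the step raises, so failed steps are logged too
            record["latency_ms"] = int((time.perf_counter() - start) * 1000)
            self.data["steps"].append(record)
            self.data["total_latency_ms"] += record["latency_ms"]
            self.data["total_tokens"] += record["tokens_used"]


log = ExecutionLog("What will the weather be like in Beijing tomorrow?")
with log.step("I need the forecast", "get_weather(Beijing)") as rec:
    rec["observation"] = "sunny, 18-25°C"
    rec["tokens_used"] = 120
log.data["success"] = True
print(log.data["steps"][0]["step"], log.data["total_tokens"])
```

In a real system `log.data` would be shipped to whatever logging or tracing backend is already in place; the structure itself stays the same.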

6.3 Hybrid Architecture

In production you often need to combine several patterns:

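class HybridPlanningAgent:
    """Hybrid planning agent: Plan-and-Execute on top, ReAct underneath, Reflexion at critical nodes."""

    def __init__(self, planner, react_agent, plan_execute_agent,
                 reflexion_agent, synthesizer):
        # All collaborators are injected. Their interfaces (decompose, run,
        # synthesize) are assumptions inferred from how run() uses them.
        self.planner = planner
        self.react_agent = react_agent
        self.plan_execute_agent = plan_execute_agent
        self.reflexion_agent = reflexion_agent
        self.synthesizer = synthesizer

    async def run(self, complex_task: str):
        # 1. Top level: decompose the task, Plan-and-Execute style
        subtasks = await self.planner.decompose(complex_task)

        results = []
        for subtask in subtasks:
            # 2. Pick a pattern based on the subtask's characteristics
            if subtask.requires_precision:
                # High precision required: use Reflexion
                result = await self.reflexion_agent.run(subtask)
            elif subtask.steps_are_uncertain:
                # Steps are uncertain: use ReAct
                result = await self.react_agent.run(subtask)
            else:
                # Standard task: use Plan-and-Execute
                result = await self.plan_execute_agent.run(subtask)

            results.append(result)

        # 3. Merge the results
        return await self.synthesizer.synthesize(results)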

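7. Summary

The three planning-agent paradigms represent three progressively richer forms of autonomous decision-making:

Paradigm	Core idea	Key strengths	Suitable for
ReAct	Think while doing	Flexible, transparent	Simple tasks, dynamic environments
Plan-and-Execute	Think first, then do	Efficient, controllable	Complex tasks, batch processing
Reflexion	Think → do → reflect → improve	High quality, able to learn	Tasks with high-precision requirements

Selection advice:

Start with ReAct: the concept is simple and easy to understand and debug
Introduce planning gradually: move to Plan-and-Execute as task complexity grows
Use Reflexion for critical tasks: add the reflection mechanism where output quality matters most
Keep optimizing: monitor core metrics such as task success rate, average number of steps, and token consumption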
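The three metrics named in the last item can be computed directly from per-task execution logs. The snippet below is a small sketch that assumes each log is a dictionary with `success`, `steps`, and `total_tokens` keys, as in the observability example in section 6.2; `summarize` is a hypothetical helper.

```python
from statistics import mean
from typing import Any, Dict, List


def summarize(logs: List[Dict[str, Any]]) -> Dict[str, float]:
    """Aggregate success rate, average step count, and average token use."""
    if not logs:
        return {"success_rate": 0.0, "avg_steps": 0.0, "avg_tokens": 0.0}
    return {
        "success_rate": sum(1 for log in logs if log["success"]) / len(logs),
        "avg_steps": mean(len(log["steps"]) for log in logs),
        "avg_tokens": mean(log["total_tokens"] for log in logs),
    }


logs = [
    {"success": True, "steps": [{}], "total_tokens": 100},
    {"success": False, "steps": [{}, {}, {}], "total_tokens": 300},
]
print(summarize(logs))
```

Tracking these per paradigm makes the trade-offs in the summary table measurable on your own workload instead of assumed.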

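At its core, a planning agent encodes human problem-solving patterns into an AI system. Understanding how these three patterns really differ is what lets you choose the right architecture in practice.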

References

ReAct paper: ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
Reflexion paper: Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023)
LangGraph documentation: langchain-ai.github.io/langgraph/
LangChain Agents: python.langchain.com/docs/module...