Beyond ReAct: Hand-Building a Plan-and-Execute (Planner) Agent

Table of Contents

  • 🧠 Beyond ReAct: Hand-Building a Plan-and-Execute (Planner) Agent
    • 🏗️ Architecture Philosophy: Plan First, Then Act
    • 🛠️ Core Implementation: Planner Agent (`planner_agent.py`)
    • 🚀 Running It: Demo Script (`demo_planner.py`)
    • 📝 Summary

🧠 Beyond ReAct: Hand-Building a Plan-and-Execute (Planner) Agent

"If ReAct is an explorer carrying a compass, the Planner is an engineer carrying construction blueprints."

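In the previous article we implemented the ReAct pattern in pure Python. The beauty of ReAct is its flexibility: it looks around as it walks. On long-chain, highly complex tasks, however, ReAct tends to fall into the trap of "local optima", and after many reasoning steps it can even forget the original goal.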

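This time we tackle that problem with the Plan-and-Execute pattern. The rules stay the same: zero frameworks, fully native, pure logic.


🏗️ Architecture Philosophy: Plan First, Then Act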

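The Plan-and-Execute pattern splits the thinking process into two distinct phases:

  1. Planning: a "brain" breaks the problem down and produces a list of steps. It plans only; it executes nothing.
  2. Executing: a "worker" carries out the steps one by one. The worker does not need to know the overall goal; it only needs to do the job in front of it well.

This separation makes a run far more predictable: the whole step list is fixed before anything executes, so it can be printed, inspected, or rejected up front.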
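
The two phases can be sketched without any model at all. The snippet below is a minimal illustration, not the article's implementation: `plan_and_execute`, `fake_planner`, and `fake_executor` are hypothetical names, and the two stub functions stand in for what would be LLM calls.

```python
from typing import Callable, Dict, List


def plan_and_execute(
    query: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str, Dict[str, str]], str],
) -> Dict[str, str]:
    """Run the planner once, then feed each step to the executor in order."""
    steps = planner(query)          # Phase 1: plan only, nothing is executed
    context: Dict[str, str] = {}    # explicit state shared across steps
    for step in steps:              # Phase 2: execute one step at a time
        # The executor sees a copy of the results gathered so far.
        context[step] = executor(step, dict(context))
    return context


# Stand-ins for the two LLM calls (hypothetical, for illustration only).
def fake_planner(query: str) -> List[str]:
    return ["look up A", "look up B", "compare A and B"]


def fake_executor(step: str, context: Dict[str, str]) -> str:
    return f"done: {step} (saw {len(context)} earlier results)"


history = plan_and_execute("compare A and B", fake_planner, fake_executor)
for step, result in history.items():
    print(step, "->", result)
```

Because the planner and executor are plain callables, either one can be swapped for a stub in tests, which is one practical payoff of keeping the phases separate.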

Logic architecture diagram

[Flowchart not reproduced; recoverable node labels follow. Planning Phase: User Query, Planner Agent, Generate Step-by-Step Plan (Step 1, Step 2, Step 3...). Execution Phase: Start Execution, More Steps?, Fetch Next Step, Inject Previous Context, Executor Agent, Think / Check, Need Tool? (Yes: Call Tool, Observation), Step Result, Save to Context. When no steps remain: Synthesize Final Answer, Final Output.]


🛠️ Core Implementation: Planner Agent (planner_agent.py)

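Below is the complete Planner Agent implementation. Note how two separate prompts control the "planning" and "execution" phases. No elaborate prompt engineering is required; we only need to force the model to output a JSON list.

There is no magic here. We do not need an OutputParser class, only json.loads(). If the model's output is malformed, either the prompt is not strict enough or the temperature is set too high (0.1 is recommended). In the execution phase we inject the context dictionary directly into the system prompt, which gives us explicit state management.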
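
Ahead of the full listing, here is the parsing idea in isolation. This is a sketch under assumptions: `parse_plan` is a hypothetical helper, not part of the article's code, and it is stricter than a bare json.loads() in that it also rejects valid JSON that is not a non-empty list of strings.

```python
import json
import re
from typing import List


def parse_plan(raw: str) -> List[str]:
    """Parse a model reply into a list of step strings, or raise ValueError."""
    text = raw.strip()
    # Drop one surrounding Markdown code fence (three backticks) if present.
    fenced = re.match(r"^`{3}[a-zA-Z]*\s*(.*?)\s*`{3}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    steps = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must be a non-empty JSON list")
    if not all(isinstance(s, str) and s.strip() for s in steps):
        raise ValueError("every step must be a non-empty string")
    return [s.strip() for s in steps]


fence = "`" * 3
reply = f'{fence}json\n["Get the weather in Beijing", "Compare the temperatures"]\n{fence}'
print(parse_plan(reply))
```

Raising a single exception type keeps the caller's fallback simple: one `except ValueError` covers both broken JSON and a well-formed reply of the wrong shape.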

```python
import json
from typing import Any, Dict, List, Optional

from openai import OpenAI
from tools import ToolRegistry  # project-local module, not shown in this article


class PlanAndExecuteAgent:
    """
    Plan-and-Execute Agent (Native Implementation)

    Philosophy:
    Instead of "thinking on the fly" (ReAct), this agent:
    1. PLANS: Breaks the complex task into a sequence of simple steps first.
    2. EXECUTES: Executes each step sequentially.

    This is better for complex tasks where maintaining long-term context in a single loop is difficult.
    """
    def __init__(self, model: str = "gpt-4o", tools_registry: Optional[ToolRegistry] = None,
                 api_key: Optional[str] = None, base_url: Optional[str] = None):
        if tools_registry is None:
            raise ValueError("tools_registry is required: the prompts are built from its schema")
        self.client = OpenAI(api_key=api_key, base_url=base_url)
        self.model = model
        self.registry = tools_registry
        self.tool_descriptions = self._build_tool_descriptions()
        self.tool_names = ", ".join([s["name"] for s in self.registry.get_tools_schema()])

    def _build_tool_descriptions(self) -> str:
        """Render every registered tool as 'name: description' plus its argument schema."""
        schemas = self.registry.get_tools_schema()
        lines = []
        for s in schemas:
            lines.append(f"{s['name']}: {s['description']}")
            lines.append(f"    Args: {json.dumps(s['parameters'])}")
        return "\n".join(lines)

    def plan(self, query: str) -> List[str]:
        """
        Phase 1: The Planner
        Generates a list of logical steps to solve the problem.
        """
        print(f"\n🧠 Planning for: {query}")

        system_prompt = f"""
You are a global planner.
Your goal is to break down a complex user question into a sequence of simple, logical steps.
You have access to the following tools (but do not use them yet, just plan for them):
{self.tool_descriptions}

Output Format:
You must output a strict JSON list of strings. Each string is a step.
Example:
["Get the weather in Beijing", "Get the weather in New York", "Compare the temperatures"]

Do not output anything else. Just the JSON list.
"""
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ]

        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0.1  # Lower temperature for stable planning
        )

        # message.content can be None, so guard before calling .strip()
        content = (response.choices[0].message.content or "").strip()
        # Clean up markdown fences if present
        content = content.replace("```json", "").replace("```", "").strip()

        try:
            steps = json.loads(content)
            # Valid JSON is not enough: the plan must be a non-empty list of strings
            if not isinstance(steps, list) or not steps or not all(isinstance(s, str) for s in steps):
                raise ValueError("plan is not a non-empty JSON list of strings")
            print(f"📋 Generated Plan: {json.dumps(steps, indent=2, ensure_ascii=False)}")
            return steps
        except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
            print(f"❌ Planning Failed: {e}. Output: {content}")
            return [query]  # Fallback to treating the whole query as one step

    def execute_step(self, step: str, context: Dict[str, Any]) -> str:
        """
        Phase 2: The Executor (Solver)
        Executes a single step, having access to previous context.
        This is essentially a mini-ReAct or Function Calling loop, but we'll simplify it to a "One-Shot" tool use for this demo.
        """
        print(f"\n👉 Executing Step: {step}")

        # Flatten earlier results into text; say so explicitly when there are none
        context_str = "\n".join(
            [f"Previous Step: {k} -> Result: {v}" for k, v in context.items()]
        ) or "(no previous steps)"

        system_prompt = f"""
You are a worker agent.
Your task is to execute the current step given the context of previous steps.
You have access to tools:
{self.tool_descriptions}

Context:
{context_str}

Current Step: {step}

Instructions:
1. If you need a tool, output a JSON object: {{"tool": "tool_name", "args": {{...}}}}
2. If you can answer from the context without a tool, output a JSON object: {{"answer": "your answer"}}

Output MUST be strict JSON.
"""
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Execute this step: {step}"}
        ]

        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0.1  # The executor must also emit strict JSON
        )

        content = (response.choices[0].message.content or "").strip()
        content = content.replace("```json", "").replace("```", "").strip()

        try:
            result_json = json.loads(content)
            if not isinstance(result_json, dict):
                return str(result_json)

            if "tool" in result_json:
                tool_name = result_json["tool"]
                tool_args = result_json.get("args", {})  # tolerate a missing "args" key
                print(f"🛠️ Worker invoking: {tool_name} with {tool_args}")
                observation = self.registry.execute(tool_name, tool_args)
                print(f"👀 Observation: {observation}")
                return f"Tool Output: {observation}"
            elif "answer" in result_json:
                return str(result_json["answer"])
            else:
                return str(result_json)
        except Exception as e:
            # Bad JSON or a failing tool: report it as this step's result instead of crashing
            return f"Error executing step: {e}. Raw output: {content}"

    def run(self, query: str) -> str:
        # 1. Plan
        plan = self.plan(query)

        # 2. Execute Loop
        # Note: results are keyed by step text, so two identical steps share one entry
        context: Dict[str, Any] = {}
        for step in plan:
            result = self.execute_step(step, context)
            context[step] = result
            print(f"✅ Step Result: {result}")

        # 3. Final Synthesis
        print("\n🏁 Synthesizing Final Answer...")
        final_prompt = f"""
Original Question: {query}

Execution History:
{json.dumps(context, indent=2, ensure_ascii=False)}

Please provide the final comprehensive answer based on the execution history.
"""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": final_prompt}]
        )
        final_answer = response.choices[0].message.content or ""
        print(f"\n🎉 Final Answer:\n{final_answer}")
        return final_answer
```
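
The agent imports `ToolRegistry` from a `tools` module that this article does not show. For readers who want to run the code, the following is a hypothetical stand-in that matches only the calls made in this article (`register`, `get_tools_schema`, `execute`, and the `_tools` attribute). The real module may differ, in particular in how it describes parameters.

```python
import inspect
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical minimal registry; an assumption, not the original `tools` module."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, description: str) -> Callable:
        """Decorator that records a function under `name`."""
        def decorator(func: Callable) -> Callable:
            # Derive a simple {argument: type name} listing from the signature.
            params = {
                p.name: ("any" if p.annotation is inspect.Parameter.empty
                         else getattr(p.annotation, "__name__", str(p.annotation)))
                for p in inspect.signature(func).parameters.values()
            }
            self._tools[name] = {"func": func, "description": description, "parameters": params}
            return func
        return decorator

    def get_tools_schema(self) -> List[Dict[str, Any]]:
        """Return JSON-serializable descriptions of all registered tools."""
        return [
            {"name": n, "description": t["description"], "parameters": t["parameters"]}
            for n, t in self._tools.items()
        ]

    def execute(self, name: str, args: Dict[str, Any]) -> Any:
        """Call a tool by name; failures come back as text the model can read."""
        if name not in self._tools:
            return f"Error: unknown tool '{name}'"
        try:
            return self._tools[name]["func"](**args)
        except Exception as e:
            return f"Error: {e}"


registry = ToolRegistry()


@registry.register(name="get_weather", description="Get the current weather for a given city.")
def get_weather(city: str) -> str:
    return {"Beijing": "Sunny, 25°C"}.get(city, "Unknown city")


print(registry.get_tools_schema())
print(registry.execute("get_weather", {"city": "Beijing"}))
```

Returning error strings from `execute` rather than raising is a design choice for this sketch: the worker's result is always text, so a failed tool call becomes part of the execution history instead of aborting the run.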

🚀 Running It: Demo Script (demo_planner.py)

To run this agent, we need to assemble the environment, load the API key, and register a few tools for testing.

```python
import os

from dotenv import load_dotenv

# Import main to ensure tools (get_weather, calculate) are registered in the registry
# In a real project, tools would be in a separate module like 'my_tools.py'
try:
    import main  # noqa: F401  (imported only for its registration side effects)
except ImportError:
    # If main.py is missing or cannot be imported, the fallback tools below are used
    pass

from tools import registry
from planner_agent import PlanAndExecuteAgent

# If main didn't register tools (e.g. if we refactored), let's ensure we have them
# (this reaches into the registry's private _tools mapping, which is fine for a demo)
if "get_weather" not in registry._tools:
    @registry.register(name="get_weather", description="Get the current weather for a given city.")
    def get_weather(city: str):
        print(f"[System] Querying weather for {city}...")
        mock_data = {
            "Beijing": "Sunny, 25°C",
            "Shanghai": "Rainy, 22°C",
            "New York": "Cloudy, 15°C",
            "Tokyo": "Sunny, 20°C"
        }
        return mock_data.get(city, "Unknown city, assuming Sunny, 20°C")

    @registry.register(name="calculate", description="Calculate a mathematical expression.")
    def calculate(expression: str):
        print(f"[System] Calculating: {expression}")
        try:
            # The expression comes from the model, so builtins are disabled.
            # This narrows what eval can reach but is NOT a real sandbox.
            return eval(expression, {"__builtins__": {}}, {})
        except Exception as e:
            return f"Error: {e}"

def run_demo():
    load_dotenv(override=True)
    api_key = os.getenv("OPENAI_API_KEY")
    base_url = os.getenv("BASE_URL", "https://api.moonshot.cn/v1")
    model_name = os.getenv("MODEL_NAME", "moonshot-v1-8k")

    if not api_key:
        print("Error: OPENAI_API_KEY environment variable is not set.")
        return

    print(f"🤖 Initializing Plan-and-Execute Agent with model: {model_name}")

    agent = PlanAndExecuteAgent(
        model=model_name,
        tools_registry=registry,
        api_key=api_key,
        base_url=base_url
    )

    # A multi-step query suitable for planning
    query = "What is the temperature difference between Beijing and Shanghai? (Get weather for both first)"

    print(f"\n📢 User Query: {query}")
    agent.run(query)

if __name__ == "__main__":
    run_demo()
```
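
The demo's `calculate` tool evaluates a string that ultimately comes from the model, so it is worth limiting what that string can do. One stricter option, sketched here as an assumption rather than as the article's method, is to parse the expression with Python's `ast` module and allow only numeric literals and the four basic operators. `safe_calculate` is a hypothetical name.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; anything outside these tables is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> Number:
    """Evaluate +, -, *, / on numeric literals; raise ValueError for anything else."""
    def walk(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    # ast.parse raises SyntaxError on malformed input; callers may want to catch both.
    return walk(ast.parse(expression, mode="eval"))


print(safe_calculate("25 - 22"))  # prints 3
```

Function calls, attribute access, and names never reach an evaluator here, because their node types are simply not in the whitelist.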

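📝 Summary

With planner_agent.py, we have shown once again that:

  1. An agent is not a black box: it is just a loop, a list, and string concatenation.
  2. Control stays with the developer: we decide exactly how the plan is generated, what format the context is passed in, and how each step handles failure.
  3. Native is lean: there is no layered abstraction between your code and the model API, so the only significant latency is the model calls themselves, and debugging is simple because every step is your own code.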
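
As one illustration of per-step fault tolerance, a step can be retried a bounded number of times. This is a sketch under assumptions: `execute_with_retry` is a hypothetical helper, and it expects the step function to raise on failure. The `execute_step` method in this article returns an error string instead, so it would need adapting before being wrapped like this.

```python
from typing import Callable, Dict, Optional


def execute_with_retry(
    run_step: Callable[[str, Dict[str, str]], str],
    step: str,
    context: Dict[str, str],
    max_attempts: int = 3,
) -> str:
    """Call run_step until it stops raising, at most max_attempts times."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_step(step, context)
        except Exception as e:
            last_error = e
            print(f"attempt {attempt} failed: {e}")
    # Give up, but keep the plan moving by recording the failure as the result.
    return f"Error executing step after {max_attempts} attempts: {last_error}"


# A stub that fails twice before succeeding (illustration only).
calls = {"n": 0}


def flaky_step(step: str, context: Dict[str, str]) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("model returned invalid JSON")
    return f"done: {step}"


result = execute_with_retry(flaky_step, "Get the weather in Beijing", {})
print(result)
```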