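🧠 Beyond ReAct: Hand-Rolling a Plan-and-Execute (Planner) Agent
"If ReAct is an explorer with a compass, then a Planner is an engineer with a set of construction blueprints."
In the previous article, we implemented the ReAct pattern in pure Python. The beauty of ReAct lies in its flexibility: it looks around as it goes. But on long-chain, high-complexity tasks, ReAct tends to fall into the trap of a "local optimum", and after many reasoning steps it can even lose sight of the original goal.
This time we tackle that problem with the Plan-and-Execute pattern. The rules stay the same: zero frameworks, fully native, pure logic.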
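🏗️ Architecture Philosophy: Plan First, Then Act
The Plan-and-Execute pattern splits the thinking process into two distinct phases:
- Planning: a "brain" breaks the problem down and produces a list of steps. It plans only and executes nothing.
- Executing: a "worker" carries out the steps one by one. The worker does not need to know the overall goal; it only needs to do the job in front of it well.
This separation makes the agent's behavior much more predictable: the full plan is fixed and visible before any tool is called.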
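The two-phase split can be sketched without any model at all. The following is a minimal illustration, not the article's implementation: `plan_and_execute`, `fake_planner`, and `fake_executor` are hypothetical names, and the two stand-in functions take the place of what would be LLM calls.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases for the two roles.
Planner = Callable[[str], List[str]]
Executor = Callable[[str, Dict[str, str]], str]


def plan_and_execute(query: str, planner: Planner, executor: Executor) -> Dict[str, str]:
    """Run the planner once, then feed each step to the executor in order."""
    steps = planner(query)            # Phase 1: plan only, execute nothing
    context: Dict[str, str] = {}      # explicit state shared between steps
    for step in steps:                # Phase 2: execute one step at a time
        context[step] = executor(step, context)
    return context


# Stand-ins for the LLM-backed planner and worker (assumptions for illustration).
def fake_planner(query: str) -> List[str]:
    return ["Look up A", "Look up B", "Compare A and B"]


def fake_executor(step: str, context: Dict[str, str]) -> str:
    return f"done: {step} (saw {len(context)} earlier results)"


history = plan_and_execute("Compare A and B", fake_planner, fake_executor)
for step, result in history.items():
    print(step, "->", result)
```

The executor receives only the current step and the accumulated results, which mirrors the idea that the worker does not need the overall goal.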
Logical architecture diagram
[Flowchart with two phases. Planning Phase: User Query → Planner Agent → Generate Step-by-Step Plan → Step 1, Step 2, Step 3... Execution Phase: Start Execution → More Steps? → (Yes) Fetch Next Step → Inject Previous Context → Executor Agent → Think → Need Tool? → (Yes) Call Tool → Observation → Step Result, or (No) Step Result → Save to Context → Check → More Steps? → (No) Synthesize Final Answer → Final Output.]
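🛠️ Core Implementation: Planner Agent (planner_agent.py)
Below is the complete Planner Agent implementation. Note how two different prompts separately control the "planning" and "execution" phases. No elaborate prompt engineering is needed; we simply force the model to output a JSON list.
There is no magic here. We don't need an OutputParser class, only json.loads(). If the model's output is malformed, either the prompt is not strict enough or the temperature is set too high (0.1 is recommended). In the execution phase, we inject the context dictionary directly into the system prompt, which gives us explicit state management.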
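As a small illustration of the "just json.loads()" idea, the sketch below parses a planner reply and also checks that the result really is a list of strings. `parse_plan` is a hypothetical helper, not part of the article's code, and the type check is an added assumption about what a valid plan looks like.

```python
import json
from typing import List

FENCE = "`" * 3  # a Markdown code fence marker, built this way to keep the example readable


def parse_plan(raw: str) -> List[str]:
    """Parse a model reply into a list of step strings, or raise ValueError."""
    # Models often wrap JSON in a Markdown fence; strip it before parsing.
    cleaned = raw.replace(FENCE + "json", "").replace(FENCE, "").strip()
    steps = json.loads(cleaned)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError(f"Expected a JSON list of strings, got: {cleaned!r}")
    return steps


reply = FENCE + 'json\n["Get the weather in Beijing", "Compare the temperatures"]\n' + FENCE
print(parse_plan(reply))
```

Raising on a wrong shape lets the caller decide whether to retry or to fall back to treating the whole query as a single step.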
```python
import json
from typing import List, Dict, Any

from openai import OpenAI

from tools import ToolRegistry


class PlanAndExecuteAgent:
    """
    Plan-and-Execute Agent (Native Implementation)

    Philosophy:
    Instead of "thinking on the fly" (ReAct), this agent:
    1. PLANS: Breaks the complex task into a sequence of simple steps first.
    2. EXECUTES: Executes each step sequentially.

    This is better for complex tasks where maintaining long-term context in a single loop is difficult.
    """

    def __init__(self, model: str = "gpt-4o", tools_registry: ToolRegistry = None, api_key: str = None, base_url: str = None):
        self.client = OpenAI(api_key=api_key, base_url=base_url)
        self.model = model
        self.registry = tools_registry
        self.tool_descriptions = self._build_tool_descriptions()
        self.tool_names = ", ".join([s["name"] for s in self.registry.get_tools_schema()])

    def _build_tool_descriptions(self) -> str:
        schemas = self.registry.get_tools_schema()
        lines = []
        for s in schemas:
            lines.append(f"{s['name']}: {s['description']}")
            lines.append(f" Args: {json.dumps(s['parameters'])}")
        return "\n".join(lines)

    def plan(self, query: str) -> List[str]:
        """
        Phase 1: The Planner
        Generates a list of logical steps to solve the problem.
        """
        print(f"\n🧠 Planning for: {query}")
        system_prompt = f"""
        You are a global planner.
        Your goal is to break down a complex user question into a sequence of simple, logical steps.
        You have access to the following tools (but do not use them yet, just plan for them):
        {self.tool_descriptions}
        Output Format:
        You must output a strict JSON list of strings. Each string is a step.
        Example:
        ["Get the weather in Beijing", "Get the weather in New York", "Compare the temperatures"]
        Do not output anything else. Just the JSON list.
        """
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ]
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0.1  # Lower temperature for stable planning
        )
        content = response.choices[0].message.content.strip()
        # Clean up markdown if present
        content = content.replace("```json", "").replace("```", "").strip()
        try:
            steps = json.loads(content)
            print(f"📋 Generated Plan: {json.dumps(steps, indent=2, ensure_ascii=False)}")
            return steps
        except Exception as e:
            print(f"❌ Planning Failed: {e}. Output: {content}")
            return [query]  # Fallback to treating the whole query as one step

    def execute_step(self, step: str, context: Dict[str, Any]) -> str:
        """
        Phase 2: The Executor (Solver)
        Executes a single step, having access to previous context.
        This is essentially a mini-ReAct or Function Calling loop, but we'll simplify it to a "One-Shot" tool use for this demo.
        """
        print(f"\n👉 Executing Step: {step}")
        # Context string
        context_str = "\n".join([f"Previous Step: {k} -> Result: {v}" for k, v in context.items()])
        system_prompt = f"""
        You are a worker agent.
        Your task is to execute the current step given the context of previous steps.
        You have access to tools:
        {self.tool_descriptions}
        Context:
        {context_str}
        Current Step: {step}
        Instructions:
        1. If you can answer the step using the context, just answer.
        2. If you need a tool, output a JSON object: {{"tool": "tool_name", "args": {{...}}}}
        3. If you don't need a tool, output a JSON object: {{"answer": "your answer"}}
        Output MUST be strict JSON.
        """
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Execute this step: {step}"}
        ]
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages
        )
        content = response.choices[0].message.content.strip()
        content = content.replace("```json", "").replace("```", "").strip()
        try:
            result_json = json.loads(content)
            if "tool" in result_json:
                tool_name = result_json["tool"]
                tool_args = result_json["args"]
                print(f"🛠️ Worker invoking: {tool_name} with {tool_args}")
                observation = self.registry.execute(tool_name, tool_args)
                print(f"👀 Observation: {observation}")
                return f"Tool Output: {observation}"
            elif "answer" in result_json:
                return result_json["answer"]
            else:
                return str(result_json)
        except Exception as e:
            return f"Error executing step: {e}. Raw output: {content}"

    def run(self, query: str):
        # 1. Plan
        plan = self.plan(query)
        # 2. Execute Loop
        context = {}
        for step in plan:
            result = self.execute_step(step, context)
            context[step] = result
            print(f"✅ Step Result: {result}")
        # 3. Final Synthesis
        print("\n🏁 Synthesizing Final Answer...")
        final_prompt = f"""
        Original Question: {query}
        Execution History:
        {json.dumps(context, indent=2, ensure_ascii=False)}
        Please provide the final comprehensive answer based on the execution history.
        """
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": final_prompt}]
        )
        final_answer = response.choices[0].message.content
        print(f"\n🎉 Final Answer:\n{final_answer}")
        return final_answer
```
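The agent imports `ToolRegistry` from a `tools` module that this article does not show. The sketch below is one hypothetical shape such a registry could take, inferred only from how the agent and the demo use it (`register`, `get_tools_schema`, `execute`, and the `_tools` dict). The actual `tools.py` from this series may differ, and the simplified `parameters` format is an assumption.

```python
import inspect
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical minimal registry: maps tool names to functions plus metadata."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, description: str) -> Callable:
        """Decorator that stores the function and a simple parameter summary."""
        def decorator(func: Callable) -> Callable:
            params = {}
            for p in inspect.signature(func).parameters.values():
                if p.annotation is inspect.Parameter.empty:
                    params[p.name] = "any"  # unannotated parameter
                else:
                    params[p.name] = getattr(p.annotation, "__name__", str(p.annotation))
            self._tools[name] = {"func": func, "description": description, "parameters": params}
            return func
        return decorator

    def get_tools_schema(self) -> List[Dict[str, Any]]:
        """Return the name/description/parameters dicts the agent puts in its prompts."""
        return [
            {"name": n, "description": t["description"], "parameters": t["parameters"]}
            for n, t in self._tools.items()
        ]

    def execute(self, name: str, args: Dict[str, Any]) -> Any:
        """Call a tool by name; return an error string instead of raising."""
        if name not in self._tools:
            return f"Error: unknown tool '{name}'"
        try:
            return self._tools[name]["func"](**args)
        except Exception as e:
            return f"Error: {e}"


registry = ToolRegistry()  # module-level instance, as the demo script imports `registry`


@registry.register(name="echo", description="Return the input text unchanged.")
def echo(text: str) -> str:
    return text


print(registry.get_tools_schema())
print(registry.execute("echo", {"text": "hi"}))
```

Returning error strings from `execute` rather than raising keeps a failed tool call visible to the model as an ordinary observation.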
🚀 Running It: Demo Script (demo_planner.py)
To run this agent, we need to assemble the environment, load the API key, and register a few tools for testing.
```python
import os
import sys

from dotenv import load_dotenv

# Import main to ensure tools (get_weather, calculate) are registered in the registry
# In a real project, tools would be in a separate module like 'my_tools.py'
try:
    import main
except ImportError:
    # If main.py has issues or specific runtime code, we might define tools here
    pass

from tools import registry
from planner_agent import PlanAndExecuteAgent

# If main didn't register tools (e.g. if we refactored), let's ensure we have them
if "get_weather" not in registry._tools:
    @registry.register(name="get_weather", description="Get the current weather for a given city.")
    def get_weather(city: str):
        print(f"[System] Querying weather for {city}...")
        mock_data = {
            "Beijing": "Sunny, 25°C",
            "Shanghai": "Rainy, 22°C",
            "New York": "Cloudy, 15°C",
            "Tokyo": "Sunny, 20°C"
        }
        return mock_data.get(city, "Unknown city, assuming Sunny, 20°C")

    @registry.register(name="calculate", description="Calculate a mathematical expression.")
    def calculate(expression: str):
        print(f"[System] Calculating: {expression}")
        try:
            # NOTE: eval() on model-generated input is unsafe; acceptable only in a local demo
            return eval(expression)
        except Exception as e:
            return f"Error: {e}"


def run_demo():
    load_dotenv(override=True)
    api_key = os.getenv("OPENAI_API_KEY")
    base_url = os.getenv("BASE_URL", "https://api.moonshot.cn/v1")
    model_name = os.getenv("MODEL_NAME", "moonshot-v1-8k")
    if not api_key:
        print("Error: OPENAI_API_KEY environment variable is not set.")
        return
    print(f"🤖 Initializing Plan-and-Execute Agent with model: {model_name}")
    agent = PlanAndExecuteAgent(
        model=model_name,
        tools_registry=registry,
        api_key=api_key,
        base_url=base_url
    )
    # A multi-step query suitable for planning
    query = "What is the temperature difference between Beijing and Shanghai? (Get weather for both first)"
    print(f"\n📢 User Query: {query}")
    agent.run(query)


if __name__ == "__main__":
    run_demo()
```
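The demo's `calculate` tool passes a model-generated string to `eval()`, which will run arbitrary Python. One possible alternative, sketched here under the assumption that only basic arithmetic is needed, walks the parsed expression tree and allows only numbers and `+ - * /`. `safe_calculate` is a hypothetical replacement, not part of the original demo.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_calculate(expression: str):
    """Evaluate basic arithmetic without eval(); return an error string on failure."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    try:
        return _eval(ast.parse(expression, mode="eval"))
    except Exception as e:
        # Same error convention as the demo's calculate tool
        return f"Error: {e}"


print(safe_calculate("25 - 22"))
print(safe_calculate("__import__('os').getcwd()"))
```

Function calls, names, and attribute access all fall through to the `ValueError` branch, so the second call returns an error string instead of executing code.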
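📝 Summary
With planner_agent.py, we have shown once again that:
- An agent is not a black box: it is just a loop, a list, and string concatenation.
- Control stays in the developer's hands: we can precisely control how the plan is generated, the format in which context is passed, and the fault-tolerance logic of each step.
- Native is lean: with no layers of wrapper code between your logic and the model API, the agent adds almost no overhead of its own and is very easy to debug.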
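As one example of the per-step fault tolerance mentioned above, a retry wrapper around "generate, then parse" takes only a few lines. This is a hedged sketch: `call_with_retries` is a hypothetical helper, and the `replies` iterator stands in for repeated model calls.

```python
import json
from typing import Any, Callable


def call_with_retries(generate: Callable[[], str],
                      parse: Callable[[str], Any],
                      max_attempts: int = 3) -> Any:
    """Call generate() until parse() accepts its output, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        raw = generate()
        try:
            return parse(raw)
        except (ValueError, TypeError) as e:  # json.JSONDecodeError is a ValueError
            last_error = e
    raise RuntimeError(f"No valid output after {max_attempts} attempts: {last_error}")


# Stand-in for a model that first replies with prose, then with valid JSON.
replies = iter(["Sure! Here is the plan:", '["step one", "step two"]'])
result = call_with_retries(lambda: next(replies), json.loads)
print(result)
```

In the agent, `generate` would wrap the chat completion call and `parse` would be whatever validation the step requires.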