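Lesson 14: ReAct, the Reasoning-and-Acting Framework
📝 Summary: This lesson walks through the ReAct framework, the reasoning foundation of almost every modern agent. It covers the problems of pure ReAct (no global planning, no reflection on errors, no memory), the ReAct workflow (the Thought → Action → Observation loop), reasoning modes (direct reasoning, reason then act, reason then give up), a comparison with traditional planning methods (FSM/HTN/BFS), how modern agents extend ReAct (adding planning, reflection, and memory), and guidance on when to choose ReAct versus a planning-first strategy.
ReAct is the core reasoning framework for agents, and almost every modern agent is a variant of it. Once you understand it, you understand how an agent "thinks".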
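I. What ReAct Is
1.1 Source paper
ReAct: Synergizing Reasoning and Acting in Language Models
Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
Published: ICLR 2023
Affiliations: Princeton University + Google
1.2 Core idea
In one sentence: ReAct = think while doing. When you solve a complex problem, you neither plan every step before starting nor try things at random without thinking. You think one step, do one step, look at the result, and then think about the next step.
The two earlier extremes:
1. Reasoning only
Chain-of-Thought (CoT): "thinks" but never "acts"
Analogy: an armchair strategist who never checks anything against reality
→ The model can introduce factual errors mid-reasoning and has no way to correct them
Example: "I recall this figure is..." ← may be misremembered, with no way to verify it
2. Acting only
Calling tools directly: "acts" but never "thinks"
Analogy: trial and error without any thought
→ The model may pick the wrong tool or arguments because it has not thought the step through
Example: calling a delete API directly without considering whether to confirm first
The ReAct insight:
Reasoning and acting should alternate.
Think → act → observe → think again → act again → ...
Interleaving the two works better than reasoning alone or acting alone.
Analogy: an experienced engineer tracking down a bug:
"It might be a database problem" (think) → check the logs (act) → "sure enough, there are connection timeouts" (observe) →
"then let me look at the connection pool" (think again) → check the metrics (act again) → ...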
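II. The Core ReAct Loop
2.1 The standard ReAct loop
Question: the user's original question
Thought 1: [reasoning] I first need to find out X...
Action 1: [action] tool_x(query="...")
Observation 1: [observation] the tool's result
Thought 2: [reasoning] Based on the result I found Y; I still need to find out Z...
Action 2: [action] tool_z(query="...")
Observation 2: [observation] the tool's result
...
Thought N: [reasoning] I now have enough information to answer
Answer: [final answer] ...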
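The loop above can be sketched as a plain-text driver, independent of any LLM API. This is a minimal illustration under stated assumptions, not the paper's implementation: `run_react`, `scripted_policy`, and `lookup` are hypothetical names, the "model" is a hard-coded function, and actions are assumed to take the form `Action: tool("argument")`.

```python
import re
from typing import Callable, Dict

# Matches lines such as: Action: lookup("capital of France")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("(.*)"\)\s*$', re.MULTILINE)
ANSWER_RE = re.compile(r'^Answer:\s*(.+)$', re.MULTILINE)


def run_react(question: str,
              policy: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Drive a Thought -> Action -> Observation loop over a text transcript."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The policy sees the whole transcript and emits the next step:
        # either Thought + Action or Thought + Answer.
        step = policy(transcript)
        transcript += step + "\n"
        answer = ANSWER_RE.search(step)
        if answer:
            return answer.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            transcript += "Observation: could not parse an action\n"
            continue
        name, arg = action.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool: {name}"
        transcript += f"Observation: {observation}\n"
    return "Step limit reached without an answer."


def scripted_policy(transcript: str) -> str:
    """Hypothetical stand-in for an LLM: acts first, answers once it has an observation."""
    if "Observation:" not in transcript:
        return 'Thought: I need the capital first.\nAction: lookup("capital of France")'
    return "Thought: I have what I need.\nAnswer: Paris"


def lookup(query: str) -> str:
    """Hypothetical tool backed by a tiny in-memory table."""
    return {"capital of France": "Paris"}.get(query, "no result")


result = run_react("What is the capital of France?", scripted_policy, {"lookup": lookup})
print(result)
```

Swapping `scripted_policy` for a function that calls a real model leaves the driver unchanged, which is the point of the pattern: the loop owns tool execution and the step limit, and the model only proposes the next step.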
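2.2 Comparison with pure CoT
Question: "What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?"
Pure CoT (reasoning only):
Thought: Colorado orogeny... eastern sector... elevation range...
I recall it is roughly... ← may be misremembered
Answer: the elevation range is 1,500 to 3,000 m ← wrong
ReAct (reasoning + acting):
Thought: I first need to look up the Colorado orogeny
Action: search("Colorado orogeny")
Observation: The Colorado orogeny was ... the eastern sector extends into the High Plains ...
Thought: I need the elevation of the High Plains
Action: search("High Plains")
Observation: The High Plains rise in elevation from around 1,800 to 7,000 ft
Thought: I have the answer
Answer: the elevation range is 1,800 to 7,000 ft ← correct
Key difference: ReAct grounds its answer in information retrieved through tool calls, which reduces the factual errors that pure reasoning is prone to.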
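III. Implementing ReAct
3.1 A minimal ReAct agent
python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REACT_SYSTEM_PROMPT = """You are a ReAct agent. Think and act strictly in the following format:
Thought: analyze the current situation and decide what to do next
Action: call one tool (via function calling)
Observation: (the tool's result is filled in automatically)
Repeat the Thought-Action-Observation loop until you have enough information to answer the question.
Finish with a summarizing Thought, then give the final answer."""


def react_agent(question: str, tools: list, tool_map: dict, max_steps: int = 5) -> str:
    messages = [
        {"role": "system", "content": REACT_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    for step in range(max_steps):
        print(f"\n--- Step {step + 1} ---")
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
            temperature=0.0,
        )
        msg = response.choices[0].message

        # No tool call means the model has produced its final answer.
        # Testing only msg.content would resend the same request on an empty reply.
        if not msg.tool_calls:
            answer = msg.content or ""
            print(f"Final Answer: {answer}")
            return answer

        # Keep the assistant message, then append one tool message per call.
        messages.append(msg)
        for tc in msg.tool_calls:
            try:
                args = json.loads(tc.function.arguments)
                print(f"Action: {tc.function.name}({args})")
                result = tool_map[tc.function.name](**args)
            except Exception as exc:  # bad JSON, unknown tool, or tool failure
                result = {"error": f"{type(exc).__name__}: {exc}"}
            result_str = json.dumps(result, ensure_ascii=False)
            print(f"Observation: {result_str[:200]}")
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result_str,
            })
    return "Reached the maximum number of steps without completing the task."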
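To call `react_agent`, `tools` has to be a list of function-calling schemas and `tool_map` a dict from tool name to Python callable. The sketch below shows one way to build both. It assumes a hypothetical `get_elevation` tool backed by a hard-coded table, and the `dispatch` helper is only there to mirror, offline, what the agent loop does with a single tool call.

```python
import json


def get_elevation(place: str) -> dict:
    """Hypothetical tool: look up an elevation range in a hard-coded table."""
    table = {"High Plains": {"min_ft": 1800, "max_ft": 7000}}
    return table.get(place, {"error": f"no data for {place}"})


# One JSON-schema entry per tool, in the Chat Completions function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_elevation",
            "description": "Return the elevation range of a named region",
            "parameters": {
                "type": "object",
                "properties": {
                    "place": {"type": "string", "description": "Region name"}
                },
                "required": ["place"],
            },
        },
    }
]

# Maps the tool name the model emits to the Python function that implements it.
tool_map = {"get_elevation": get_elevation}


def dispatch(name: str, arguments_json: str) -> str:
    """Mirror what the agent loop does with one tool call: parse, run, serialize."""
    args = json.loads(arguments_json)
    return json.dumps(tool_map[name](**args), ensure_ascii=False)


print(dispatch("get_elevation", '{"place": "High Plains"}'))

# With an API key configured, the agent itself would be called as:
# react_agent("What is the elevation range of the High Plains?", tools, tool_map)
```

Keeping the schema names and the `tool_map` keys identical matters: the model can only request tools by the names in `tools`, and the loop can only run what it finds in `tool_map`.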
3.2 ReAct with Reflection
python
# Reuses `json` and `client` from section 3.1.
REFLECT_SYSTEM_PROMPT = """You are a ReAct agent with the ability to reflect.
After every Observation, evaluate:
1. Did the tool's result answer the current sub-question?
2. Do you need to switch strategy?
3. Is the information gathered so far enough to answer the original question?
If the current approach is not working, adjust your strategy promptly."""


def react_with_reflection(question: str, tools: list, tool_map: dict, max_steps: int = 7) -> str:
    messages = [
        {"role": "system", "content": REFLECT_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
            temperature=0.0,
        )
        msg = response.choices[0].message

        # No tool call means the model is done.
        if not msg.tool_calls:
            return msg.content or ""

        messages.append(msg)
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            result = tool_map[tc.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result, ensure_ascii=False),
            })

        # Inject a reflection prompt at the end of every round except the last.
        if step < max_steps - 1:
            messages.append({
                "role": "user",
                "content": "Reflect on the progress so far and decide the next action.",
            })
    return "Reached the maximum number of steps."
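IV. ReAct Variants and Evolution
4.1 Main variants
1. ReAct (original)
Thought → Action → Observation loop
2. Reflexion (Shinn et al., 2023, a reflective agent framework)
Adds "self-reflection" on top of ReAct
After a failure, generate a reflection → store it in memory → avoid the same mistake on the next attempt
3. LATS (Zhou et al., 2023, Language Agent Tree Search)
Turns ReAct's single linear trajectory into a tree search
Explores multiple paths and keeps the best one
4. Plan-and-Execute
Build a global plan first, then execute it step by step
Suits complex tasks and avoids ReAct's habit of deciding only one step at a time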
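5. RAP (Hao et al., 2023, Reasoning via Planning)
Uses the LLM as both the reasoning agent and a world model, and plans over reasoning steps with Monte Carlo Tree Search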
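The "fail → reflect → remember → retry" idea behind Reflexion can be sketched in a few lines. This is an illustration of the pattern only, not the authors' code: `run_with_reflexion`, `toy_attempt`, and `toy_reflect` are hypothetical names, and the two toy functions stand in for what would normally be LLM calls (a full ReAct episode and a reflection prompt).

```python
from typing import Callable, List, Tuple


def run_with_reflexion(task: str,
                       attempt: Callable[[str, List[str]], Tuple[bool, str]],
                       reflect: Callable[[str, str], str],
                       max_trials: int = 3) -> Tuple[bool, str, List[str]]:
    """Retry a task, carrying verbal lessons from failed trials into later ones."""
    memory: List[str] = []  # reflections accumulated across trials
    output = ""
    for _ in range(max_trials):
        # One trial, e.g. a full ReAct episode that can read earlier lessons.
        success, output = attempt(task, memory)
        if success:
            return True, output, memory
        # Turn the failure into a lesson for the next trial.
        memory.append(reflect(task, output))
    return False, output, memory


# Hypothetical stand-ins for the LLM-backed pieces, so the sketch runs offline.
def toy_attempt(task: str, memory: List[str]) -> Tuple[bool, str]:
    if any("check the logs" in lesson for lesson in memory):
        return True, "root cause: connection pool exhausted"
    return False, "restarted the service, latency unchanged"


def toy_reflect(task: str, failed_output: str) -> str:
    return "Restarting did not help; check the logs before acting next time."


ok, answer, lessons = run_with_reflexion("diagnose latency", toy_attempt, toy_reflect)
print(ok, answer, len(lessons))
```

The first trial fails, its reflection is stored, and the second trial succeeds because it can read that lesson. The memory lives outside any single episode, which is what separates this from the in-episode reflection prompt of section 3.2.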
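4.2 When to use which
Simple tasks (1-3 steps): standard ReAct
Medium tasks (4-7 steps): ReAct + Reflection
Complex tasks (more than 7 steps): Plan-and-Execute or LATS
Tasks that need creativity: ReAct + a self-evaluation loop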
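These rules of thumb can be written down as a small selector. It is a sketch under assumptions: `choose_strategy` is a hypothetical helper, the step count has to be estimated by the caller (the text does not say how), a step count on a boundary goes to the lighter strategy, and the creativity case is checked first, which is one possible reading of the list.

```python
def choose_strategy(estimated_steps: int, needs_creativity: bool = False) -> str:
    """Map a rough task profile to one of the strategies listed above."""
    if estimated_steps < 1:
        raise ValueError("estimated_steps must be at least 1")
    if needs_creativity:
        return "ReAct + self-evaluation loop"
    if estimated_steps <= 3:
        return "ReAct"
    if estimated_steps <= 7:
        return "ReAct + Reflection"
    return "Plan-and-Execute or LATS"


for steps in (2, 5, 12):
    print(steps, "->", choose_strategy(steps))
```

In practice the thresholds are worth tuning per workload, since the cost of an extra reflection round or an up-front plan depends on the model and the tools involved.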
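V. Limitations of ReAct and How to Address Them
5.1 Known limitations
1. Linear thinking: it follows a single path and has to start over when that path turns out to be wrong
→ LATS adds multi-path search
2. Context bloat: every step's Thought/Action/Observation consumes tokens
→ requires context compression
3. Weak planning: it proceeds one step at a time and lacks a global view
→ Plan-and-Execute plans first
4. Drift: intermediate steps can wander away from the goal
→ a Reflection mechanism pulls it back on course
5. Infinite loops: it may repeat the same Action over and over
→ repetition detection + a maximum step limit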
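Two of these mitigations, repetition detection and context compression, are easy to sketch. The code below is one possible approach, not a standard API: `RepeatGuard` and `compress_observations` are hypothetical helpers, "compression" here is plain truncation of older tool results (a summarizing model would be the heavier alternative), and only messages stored as plain dicts are considered.

```python
import json
from typing import Dict, List, Set, Tuple


def action_key(name: str, args: dict) -> Tuple[str, str]:
    """Canonical identity of a tool call: name plus arguments with sorted keys."""
    return name, json.dumps(args, sort_keys=True, ensure_ascii=False)


class RepeatGuard:
    """Flags a tool call that was already made with identical arguments."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def is_repeat(self, name: str, args: dict) -> bool:
        key = action_key(name, args)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


def compress_observations(messages: List[Dict],
                          keep_last: int = 2,
                          max_chars: int = 80) -> List[Dict]:
    """Return a copy in which all but the newest `keep_last` tool results are truncated."""
    tool_idx = [i for i, m in enumerate(messages)
                if isinstance(m, dict) and m.get("role") == "tool"]
    old = set(tool_idx[:-keep_last]) if keep_last > 0 else set(tool_idx)
    compressed = []
    for i, m in enumerate(messages):
        if i in old and len(m["content"]) > max_chars:
            # Build a new dict so the caller's history is not mutated.
            m = {**m, "content": m["content"][:max_chars] + " ...[truncated]"}
        compressed.append(m)
    return compressed


guard = RepeatGuard()
first = guard.is_repeat("check_logs", {"service": "order-service", "level": "ERROR"})
second = guard.is_repeat("check_logs", {"level": "ERROR", "service": "order-service"})
print(first, second)  # False True: argument order does not matter
```

In an agent loop, a repeat would typically be answered with an observation such as "you already ran this call" instead of executing the tool again, while `compress_observations` would be applied to the history just before each model request.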
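5.2 How reasoning models improve ReAct
Traditional LLM + ReAct:
The quality of each Thought depends on the prompt, and the reasoning is error-prone
Reasoning model (o1/R1) + ReAct:
The Thought is itself a high-quality reasoning chain
No extra "think step by step" prompt is needed
Planning is more reliable and tool choices are more accurate
Trend: reasoning models take ReAct from "barely usable" to "reliable"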
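📝 Homework
Assignment 1: implement a ReAct agent with reflection
Building on this lesson's code, implement an agent that can solve multi-step reasoning problems. Scenario: a technical troubleshooting agent.
Reference answer: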
python
import json

from openai import OpenAI

# Local Ollama server exposing an OpenAI-compatible API; the api_key is a placeholder.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")


# Mock tools
def check_service_status(service: str) -> dict:
    statuses = {
        "api-gateway": {"status": "running", "uptime": "99.9%", "version": "v3.2.1"},
        "order-service": {"status": "degraded", "uptime": "95.2%", "error_rate": "4.8%"},
        "payment-service": {"status": "running", "uptime": "99.8%"},
        "user-service": {"status": "running", "uptime": "99.9%"},
    }
    return statuses.get(service, {"status": "unknown"})


def check_logs(service: str, level: str = "ERROR", minutes: int = 30) -> dict:
    # `minutes` is accepted for interface realism but ignored by this mock.
    logs = {
        "order-service": {
            "ERROR": ["ConnectionPool exhausted at 14:23",
                      "Timeout waiting for payment-service at 14:25"],
            "WARN": ["High latency detected at 14:20"],
        }
    }
    return {"logs": logs.get(service, {}).get(level, [])}


def check_metrics(service: str, metric: str) -> dict:
    metrics = {
        "order-service": {
            "cpu": {"value": "85%", "threshold": "80%"},
            "memory": {"value": "4.2GB/8GB", "threshold": "7GB"},
            "connections": {"value": "200/200", "threshold": "200"},
        }
    }
    return metrics.get(service, {}).get(metric, {"value": "N/A"})


tools = [
    {
        "type": "function",
        "function": {
            "name": "check_service_status",
            "description": "Check whether a service is running",
            "parameters": {
                "type": "object",
                "properties": {"service": {"type": "string", "description": "Service name"}},
                "required": ["service"],
                "additionalProperties": False,
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_logs",
            "description": "View a service's logs",
            "parameters": {
                "type": "object",
                "properties": {
                    "service": {"type": "string", "description": "Service name"},
                    "level": {"type": "string", "enum": ["ERROR", "WARN", "INFO"], "description": "Log level"},
                    "minutes": {"type": "integer", "description": "How many recent minutes to inspect"},
                },
                "required": ["service"],
                "additionalProperties": False,
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_metrics",
            "description": "View a service's monitoring metrics",
            "parameters": {
                "type": "object",
                "properties": {
                    "service": {"type": "string", "description": "Service name"},
                    "metric": {"type": "string", "enum": ["cpu", "memory", "connections"], "description": "Metric name"},
                },
                "required": ["service", "metric"],
                "additionalProperties": False,
            },
        },
    },
]

tool_map = {
    "check_service_status": check_service_status,
    "check_logs": check_logs,
    "check_metrics": check_metrics,
}
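# Run the agent
MAX_STEPS = 6
messages = [
    {"role": "system", "content": """You are an SRE troubleshooting agent. Diagnose faults in ReAct style:
1. Check the service status first
2. Inspect logs and metrics based on the anomalies you find
3. Analyze the root cause and give recommendations
Think before every action and adjust your strategy based on what you observe."""},
    {"role": "user", "content": "Latency on order-service has increased; please find the cause"},
]
for step in range(MAX_STEPS):
    print(f"\n=== Step {step+1} ===")
    response = client.chat.completions.create(
        model="qwen2.5:7b", messages=messages, tools=tools, temperature=0.0
    )
    msg = response.choices[0].message
    if msg.content:
        print(f"Thought/Answer: {msg.content[:300]}")
    if not msg.tool_calls:
        break  # no tool call: the model has given its final answer
    messages.append(msg)
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)
        result = tool_map[tc.function.name](**args)
        print(f"Action: {tc.function.name}({args})")
        print(f"Observation: {json.dumps(result, ensure_ascii=False)[:200]}")
        messages.append({
            "role": "tool", "tool_call_id": tc.id,
            "content": json.dumps(result, ensure_ascii=False),
        })
    # Reflection prompt, as in section 3.2, so the answer matches the assignment.
    if step < MAX_STEPS - 1:
        messages.append({
            "role": "user",
            "content": "Reflect on the progress so far and decide the next action.",
        })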