8. OpenClaw Source Code Walkthrough: Three-Layer Onion Retry

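In the previous lesson we covered reliable message delivery. After the LLM finishes its reply, the message is split into chunks (mainly to keep any single message from getting too long) and written to tmp.json. A background thread then scans a specific folder for *.json files and sends them; when a send succeeds, the temporary JSON file is deleted.

But what if the LLM call itself is the problem? For example, the API key is rate-limited, the key is invalid, the context is too long, or the request times out.

Today we look at the three-layer onion retry, the mechanism that keeps OpenClaw's LLM calls working. Its three layers are key rotation, context compaction, and agent execution.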

1. Three-Layer Retry

1.1. 🧅 Overview of the three layers


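Layer	Name	Responsibility
Layer 3	Agent execution layer	The innermost core: calls the LLM directly, does no fault handling, and lets exceptions propagate outward
Layer 2	Context compaction layer	Handles over-long context (truncates tool_result, compresses old turns into a summary)
Layer 1	Key rotation layer	Handles auth, billing, rate-limit, and timeout failures; puts the failed Profile into cooldown and switches to the next available key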

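Escalation order on failure: Layer 3 → (on exception) → Layer 2 → (if it still fails) → Layer 1. Like the layers of an onion, each outer layer wraps the inner one and acts as its fallback.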
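
The nesting can be sketched as three functions, each wrapping the next. This is a minimal illustration under stated assumptions, not OpenClaw's actual code: `ContextOverflowError`, `AuthError`, `run_agent`, `with_compaction`, and `with_key_rotation` are hypothetical names, and the stand-in "LLM call" simply fails on a bad key or on too many messages.

```python
class ContextOverflowError(Exception):
    """Hypothetical error: the prompt does not fit the context window."""


class AuthError(Exception):
    """Hypothetical error: the API key was rejected."""


def run_agent(messages, key):
    # Layer 3 stand-in: performs the call and never catches anything.
    if key.startswith("bad"):
        raise AuthError(key)
    if len(messages) > 4:
        raise ContextOverflowError(len(messages))
    return f"ok via {key} with {len(messages)} messages"


def with_compaction(messages, key, max_attempts=3):
    # Layer 2: handles only context overflow; other errors pass through.
    for attempt in range(max_attempts):
        try:
            return run_agent(messages, key)
        except ContextOverflowError:
            if attempt == max_attempts - 1:
                raise  # give up and let Layer 1 decide
            # Crude stand-in for compaction: drop the older half.
            messages = messages[len(messages) // 2:]


def with_key_rotation(messages, keys):
    # Layer 1: outermost; tries each key in turn.
    last_error = None
    for key in keys:
        try:
            return with_compaction(messages, key)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all keys failed") from last_error


result = with_key_rotation(["m"] * 10, ["bad-key", "good-key"])
print(result)
```

In this run the first key fails with `AuthError`, which Layer 2 does not catch, so Layer 1 moves on to the second key. The second key then overflows twice before the message list is short enough.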

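1.2. Layer 3: Agent execution layer

This is the innermost layer and the one that actually does the work. It sends messages directly to the LLM and handles tool calls. It catches no API exceptions itself, so every exception propagates unchanged to the outer layers.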

# Layer 3: standard tool-call loop
# Follows the while-loop + stop_reason pattern from s01/s02 (bounded here by max_iterations)
# On end_turn, returns (final_response, updated_messages)
# Any API exception propagates to the outer layers
# (Method-body excerpt: self, system, messages, tools, api_client and
#  process_tool_call come from the enclosing scope.)

from langchain_core.messages import HumanMessage, AIMessage, ToolMessage, SystemMessage

current_messages = list(messages)
iteration = 0

while iteration < self.max_iterations:
    iteration += 1

    # 1. Convert the internal message format to LangChain messages
    lc_messages = [SystemMessage(content=system)]
    for msg in current_messages:
        if msg["role"] == "user":
            lc_messages.append(HumanMessage(content=msg["content"]))
        elif msg["role"] == "assistant":
            # Pass tool_calls through the constructor instead of assigning afterwards
            lc_messages.append(AIMessage(
                content=msg["content"],
                tool_calls=msg.get("tool_calls") or [],
            ))
        elif msg["role"] == "tool":
            lc_messages.append(ToolMessage(content=msg["content"], tool_call_id=msg.get("tool_call_id", "")))

    # 2. Bind the tools and call the model (this may raise an API exception)
    client_with_tools = api_client.bind_tools(tools)
    response = client_with_tools.invoke(lc_messages)

    # 3. Handle the response: tool calls or a direct reply
    if hasattr(response, 'tool_calls') and response.tool_calls:
        # The model requested tool calls
        tool_calls_info = []
        for tool_call in response.tool_calls:
            tool_calls_info.append({
                'id': tool_call.get('id', ''),
                'name': tool_call['name'],
                'args': tool_call['args']
            })

        # Save the assistant message (including the tool-call requests)
        assistant_content = response.content if response.content else ""
        current_messages.append({
            "role": "assistant",
            "content": assistant_content,
            "tool_calls": tool_calls_info
        })

        # Execute each tool call and append its result as a "tool" message,
        # so that step 1 converts it to a ToolMessage paired with its tool_call_id.
        # (Wrapping the results in a single "user" message would bypass that branch.)
        for tool_call in response.tool_calls:
            result = process_tool_call(tool_call['name'], tool_call['args'])
            current_messages.append({
                "role": "tool",
                "tool_call_id": tool_call.get('id', ''),
                "name": tool_call['name'],
                "content": result
            })
        continue  # next iteration: let the LLM process the tool results

    elif hasattr(response, 'content') and response.content:
        # No tool calls: return the final text reply
        assistant_content = response.content
        current_messages.append({
            "role": "assistant",
            "content": assistant_content,
        })
        return response, current_messages

    else:
        # Unexpected response (e.g. empty content): treat it as end_turn
        return response, current_messages

# No final reply within the iteration limit: raise a runtime error
raise RuntimeError(
    f"Tool-use loop exceeded {self.max_iterations} iterations"
)

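Pure execution: the layer has no retry or fallback logic of its own.
Exceptions propagate: whether invoke raises an authentication, rate-limit, timeout, or context-length error, it is raised as-is and left to the outer layers.
Tool loop: multi-turn tool calls are supported until a final text reply is produced or the iteration limit is reached.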

1.3. Layer 2: Context compaction layer

When Layer 3 raises because the context is too long (for example context_length_exceeded), the failure is handled in this layer first. The layer works in two steps:

Truncate oversized tool results (tool_result)

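If the text returned by a tool call is too long (above a preset threshold), it is truncated, for example by keeping the first N characters and appending ...(truncated).
This shrinks individual messages and keeps a single oversized result from blowing out the context window.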
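
A minimal sketch of this truncation step follows. The function names, the 2000-character default, and the assumption that tool results are stored as messages with role "tool" and string content are all illustrative choices, not values taken from OpenClaw.

```python
def truncate_tool_result(
    text: str,
    max_chars: int = 2000,  # assumed threshold, tune for your model
    marker: str = "...(truncated)",
) -> str:
    """Keep the first max_chars characters and append a marker if anything was cut."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + marker


def truncate_tool_messages(messages: list[dict], max_chars: int = 2000) -> list[dict]:
    """Return a copy of messages in which oversized tool results are truncated."""
    out = []
    for msg in messages:
        if msg.get("role") == "tool" and isinstance(msg.get("content"), str):
            # Copy the dict so the caller's history is left untouched.
            msg = {**msg, "content": truncate_tool_result(msg["content"], max_chars)}
        out.append(msg)
    return out


history = [
    {"role": "user", "content": "list the files"},
    {"role": "tool", "tool_call_id": "t1", "content": "a" * 50},
]
shortened = truncate_tool_messages(history, max_chars=10)
print(shortened[1]["content"])
```

Counting characters is only a rough proxy for tokens; a token-based limit would be more precise but needs a tokenizer for the specific model.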

Summarize older conversation history

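When the conversation history as a whole grows too long, the older turns are extracted and the LLM is asked to produce a concise summary of them.
The summary then replaces those detailed turns as a system or user message: the most recent few turns are kept in full, and everything older is represented by the summary.
This preserves the key information while cutting token usage substantially.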
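
The idea can be sketched as below. `compact_history`, `keep_recent`, and the summary prefix are hypothetical, and `summarize` is passed in as a callable so the example can run offline with a stub; in practice it would wrap an LLM call.

```python
from typing import Callable


def compact_history(
    messages: list[dict],
    summarize: Callable[[list[dict]], str],
    keep_recent: int = 4,
) -> list[dict]:
    """Replace all but the last keep_recent messages with one summary message."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(messages) <= keep_recent:
        return list(messages)  # nothing old enough to summarize
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary_message = {
        "role": "user",
        "content": "[Summary of earlier conversation]\n" + summarize(old),
    }
    return [summary_message] + recent


def fake_summarize(old: list[dict]) -> str:
    # Stand-in for an LLM call so the example runs without network access.
    return f"{len(old)} earlier messages about: " + ", ".join(m["content"] for m in old)


history = [{"role": "user", "content": f"topic {i}"} for i in range(6)]
compacted = compact_history(history, fake_summarize, keep_recent=2)
for msg in compacted:
    print(msg)
```

One caveat for real histories: the cut point should not separate an assistant message that requests a tool call from the tool result that answers it, since many APIs reject an unpaired tool message.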

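The compacted messages are handed back to Layer 3 for a retry. If they are still too long, the layer can compact again or finally hand the failure to Layer 1, though in practice compaction is usually enough to recover.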
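
One way to picture this retry loop is to give Layer 2 an ordered list of compaction steps and apply the next one each time Layer 3 overflows. Everything here is an assumption made for illustration: the `ContextOverflowError` type, the `run_with_compaction` name, and the character-count limit inside the fake Layer 3.

```python
class ContextOverflowError(Exception):
    """Hypothetical error raised when the prompt exceeds the context window."""


def run_with_compaction(run_layer3, messages, compaction_steps):
    """Call Layer 3; on overflow apply the next compaction step and retry."""
    current = list(messages)
    steps = iter(compaction_steps)
    while True:
        try:
            return run_layer3(current)
        except ContextOverflowError:
            step = next(steps, None)
            if step is None:
                raise  # nothing left to try: escalate to Layer 1
            current = step(current)


calls = []  # total size seen by each Layer 3 attempt


def fake_layer3(messages):
    total = sum(len(m["content"]) for m in messages)
    calls.append(total)
    if total > 12:  # pretend the context window holds 12 characters
        raise ContextOverflowError(total)
    return f"answered with {len(messages)} messages"


def truncate_step(messages):
    return [{**m, "content": m["content"][:5]} for m in messages]


def keep_last_two(messages):
    return messages[-2:]


history = [{"role": "user", "content": "x" * 10} for _ in range(4)]
reply = run_with_compaction(fake_layer3, history, [truncate_step, keep_last_two])
print(reply, calls)
```

Ordering the steps from cheapest to most lossy means truncation is tried before any history is dropped or summarized.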

1.4. Layer 1: Key rotation layer (auth, billing, rate limits, timeouts)

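When Layer 2 cannot resolve the failure, either because the exception is not a context-length error or because the call still fails after compaction, the outermost key rotation layer takes over.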

It identifies the failure type, puts the current Auth Profile into cooldown, and automatically switches to the next available Profile (API key / endpoint) to retry.
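
Identifying the failure type might look like the sketch below. The member names auth, billing, rate_limit, and timeout appear in the source code that follows; the `overflow` and `unknown` members, the enum values, the `classify_failure` function, and the status-code and substring rules are assumptions, and real provider SDKs expose errors in their own ways.

```python
from enum import Enum


class FailoverReason(Enum):
    auth = "auth"
    billing = "billing"
    rate_limit = "rate_limit"
    timeout = "timeout"
    overflow = "overflow"  # assumed member for context-length errors
    unknown = "unknown"    # assumed member for everything else


def classify_failure(exc: Exception) -> FailoverReason:
    """Map an exception to a FailoverReason using simple heuristics."""
    if isinstance(exc, TimeoutError):
        return FailoverReason.timeout
    status = getattr(exc, "status_code", None)  # present on many HTTP client errors
    text = str(exc).lower()
    if status == 429 or "rate limit" in text:
        return FailoverReason.rate_limit
    if status == 402 or "billing" in text:
        return FailoverReason.billing
    if status in (401, 403) or "invalid api key" in text:
        return FailoverReason.auth
    if "context_length_exceeded" in text or "context length" in text:
        return FailoverReason.overflow
    if "timed out" in text or "timeout" in text:
        return FailoverReason.timeout
    return FailoverReason.unknown


class FakeHTTPError(Exception):
    """Test double that carries a status_code like an HTTP client error."""

    def __init__(self, message: str, status_code: int):
        super().__init__(message)
        self.status_code = status_code


print(classify_failure(FakeHTTPError("unauthorized", 401)))
print(classify_failure(Exception("Rate limit exceeded")))
```

String matching on error messages is fragile, so where the SDK offers typed exceptions or structured error codes, those are the better signal.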

Failure cooldown

# (Excerpt: the earlier branches of this if/elif chain are not shown.)
elif reason in (FailoverReason.auth, FailoverReason.billing):
    # Authentication failure (invalid/expired key) or billing problem → cool down for 5 minutes
    self.profile_manager.mark_failure(
        profile, reason, cooldown_seconds=300
    )
    break  # try the next Profile

elif reason == FailoverReason.rate_limit:
    # Rate limited → cool down for 2 minutes
    self.profile_manager.mark_failure(
        profile, reason, cooldown_seconds=120
    )
    break  # try the next Profile

elif reason == FailoverReason.timeout:
    # Request timed out → cool down for 1 minute (usually a network blip, so a short cooldown is enough)
    self.profile_manager.mark_failure(
        profile, reason, cooldown_seconds=60
    )
    break  # try the next Profile

else:
    # Unknown failure → conservative 2-minute cooldown
    self.profile_manager.mark_failure(
        profile, reason, cooldown_seconds=120
    )
    break

Core implementation of mark_failure

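import time  # used by mark_failure; belongs at module level

def mark_failure(
    self,
    profile: AuthProfile,
    reason: FailoverReason,
    cooldown_seconds: float = 300.0,
) -> None:
    """Put a profile into cooldown after a failure.

    Defaults to a 5-minute cooldown (suited to auth/billing problems).
    Callers can pass a different duration per failure type (e.g. 60 seconds for timeouts).
    """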
    profile.cooldown_until = time.time() + cooldown_seconds
    profile.failure_reason = reason.value
    print_resilience(
        f"Profile '{profile.name}' -> cooldown {cooldown_seconds:.0f}s "
        f"(reason: {reason.value})"
    )
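
The other half of the cooldown is the selection step that skips cooling profiles. A minimal sketch, assuming an `AuthProfile` dataclass with the `name`, `cooldown_until`, and `failure_reason` fields used above plus a hypothetical `api_key` field, and a hypothetical `select_profile` method:

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthProfile:
    name: str
    api_key: str                      # assumed field
    cooldown_until: float = 0.0       # epoch seconds; 0 means "never failed"
    failure_reason: Optional[str] = None


class ProfileManager:
    def __init__(self, profiles):
        self.profiles = list(profiles)

    def select_profile(self, now: Optional[float] = None) -> Optional[AuthProfile]:
        """Return the first profile whose cooldown has expired, or None."""
        now = time.time() if now is None else now
        for profile in self.profiles:
            if now >= profile.cooldown_until:
                return profile
        return None


profiles = [AuthProfile("primary", "key-1"), AuthProfile("backup", "key-2")]
manager = ProfileManager(profiles)

# Pretend "primary" failed at t=1000 and was given a 300-second cooldown.
profiles[0].cooldown_until = 1000.0 + 300
first = manager.select_profile(now=1100.0)
later = manager.select_profile(now=1400.0)
print(first.name, later.name)
```

Accepting `now` as a parameter keeps the method testable without sleeping. `time.time()` mirrors the source; `time.monotonic()` would be more robust against wall-clock adjustments if the cooldown never needs to survive a restart.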

From this, the overall pipeline works as follows:

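Catch the exception raised by Layer 2/3 and derive a FailoverReason from it.
Get the Auth Profile currently in use.
Call mark_failure to put that Profile into cooldown (it will not be selected during that period).
break out of the current Profile's retry loop and automatically select the next Profile that is not cooling down.
Re-run from Layer 3 with the new Profile.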
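
The steps above can be sketched as a single loop. The cooldown durations are the ones from the excerpt (300 s for auth/billing, 120 s for rate limits and unknown failures, 60 s for timeouts); `Profile`, `run_with_failover`, `run_inner`, and `classify` are hypothetical names, and reasons are plain strings here to keep the example short.

```python
import time
from dataclasses import dataclass

COOLDOWN_SECONDS = {"auth": 300, "billing": 300, "rate_limit": 120, "timeout": 60}
DEFAULT_COOLDOWN = 120  # unknown failures


@dataclass
class Profile:
    name: str
    cooldown_until: float = 0.0


def run_with_failover(profiles, run_inner, classify):
    """Try each available profile once; cool down the ones that fail."""
    errors = []
    for profile in profiles:
        if time.time() < profile.cooldown_until:
            continue  # still cooling down: skip it
        try:
            return run_inner(profile)  # Layers 2 and 3 run inside this call
        except Exception as exc:
            reason = classify(exc)
            cooldown = COOLDOWN_SECONDS.get(reason, DEFAULT_COOLDOWN)
            profile.cooldown_until = time.time() + cooldown
            errors.append((profile.name, reason))
    raise RuntimeError(f"All profiles failed or are cooling down: {errors}")


def classify(exc):
    return "timeout" if isinstance(exc, TimeoutError) else "unknown"


def run_inner(profile):
    if profile.name == "a":
        raise TimeoutError("simulated timeout")
    return f"reply from {profile.name}"


profiles = [Profile("a"), Profile("b")]
result = run_with_failover(profiles, run_inner, classify)
print(result)
```

After this run, profile "a" stays in cooldown for about 60 seconds, so an immediate second call goes straight to "b".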