claw-code source analysis: engineering details in the Turn Loop, or how multi-turn conversations stay testable and replayable during the porting phase

Source files covered: src/runtime.py (run_turn_loop, bootstrap_session), src/query_engine.py, src/session_store.py, src/transcript.py, src/history.py, src/main.py, tests/test_porting_workspace.py.

1. Where the Turn Loop sits in the repository

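PortRuntime.run_turn_loop is not real multi-turn product logic that simulates a back-and-forth between user and model. It is a porting-phase probe: against a single set of routing results, it calls submit_message several times in a row on the same QueryEnginePort instance and observes:

how session state (mutable_messages, total_usage, transcript_store) accumulates;
when stop_reason changes from completed to max_turns_reached or max_budget_reached;
whether each turn's TurnResult.output is still stable, parseable JSON once --structured-output is enabled.

The immediate benefit is that there is no network, no API key, and no sampling temperature, so multi-turn behavior is repeatable in CI, which satisfies "testable during porting". Replayability is partially achieved through a separate chain (persisted JSON + load_session / from_saved_session + replay_user_messages); the sections below cover each in turn.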

2. `run_turn_loop`: coupling between the loop and the engine configuration

# 154:167:src/runtime.py
    def run_turn_loop(self, prompt: str, limit: int = 5, max_turns: int = 3, structured_output: bool = False) -> list[TurnResult]:
        engine = QueryEnginePort.from_workspace()
        engine.config = QueryEngineConfig(max_turns=max_turns, structured_output=structured_output)
        matches = self.route_prompt(prompt, limit=limit)
        command_names = tuple(match.name for match in matches if match.kind == 'command')
        tool_names = tuple(match.name for match in matches if match.kind == 'tool')
        results: list[TurnResult] = []
        for turn in range(max_turns):
            turn_prompt = prompt if turn == 0 else f'{prompt} [turn {turn + 1}]'
            result = engine.submit_message(turn_prompt, command_names, tool_names, ())
            results.append(result)
            if result.stop_reason != 'completed':
                break
        return results

2.1 Detail one: the outer loop count and `QueryEngineConfig.max_turns` share a name and a value

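The CLI passes --max-turns to two places at once:

for turn in range(max_turns): the maximum number of submit_message attempts;
QueryEngineConfig(max_turns=max_turns): the maximum number of entries the engine allows in mutable_messages (see the gate len(self.mutable_messages) >= self.config.max_turns in query_engine).

In default usage the two are equal, so the common case is that every turn returns completed until the loop has run its full count. If someone later splits "loop count" and "engine max_turns" into separate parameters, turn N may receive stop_reason='max_turns_reached' early, and run_turn_loop will break early because result.stop_reason != 'completed'. This is an explicit, testable exit, not an infinite loop or a silent failure.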
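To make the early exit concrete, here is a minimal sketch that models only the gate quoted above. ToyEngine and run_loop are hypothetical stand-ins written for this illustration, not the repository's QueryEnginePort or run_turn_loop; the two limits are deliberately passed as separate arguments to show what happens when they diverge.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical stand-in for QueryEnginePort; only the turn gate is modelled."""
    max_turns: int
    mutable_messages: list[str] = field(default_factory=list)

    def submit_message(self, prompt: str) -> str:
        # Same comparison as the gate quoted in the article.
        if len(self.mutable_messages) >= self.max_turns:
            return 'max_turns_reached'
        self.mutable_messages.append(prompt)
        return 'completed'


def run_loop(loop_count: int, engine_max_turns: int) -> list[str]:
    """Mirror the loop shape of run_turn_loop with the two limits decoupled."""
    engine = ToyEngine(max_turns=engine_max_turns)
    stop_reasons: list[str] = []
    for turn in range(loop_count):
        reason = engine.submit_message(f'prompt [turn {turn + 1}]')
        stop_reasons.append(reason)
        if reason != 'completed':
            break  # explicit exit, as in run_turn_loop
    return stop_reasons


same = run_loop(loop_count=3, engine_max_turns=3)
split = run_loop(loop_count=5, engine_max_turns=2)
print(same)   # ['completed', 'completed', 'completed']
print(split)  # ['completed', 'completed', 'max_turns_reached']
```

With equal limits the gate never fires inside the loop; with a smaller engine limit the third call is rejected and the loop stops after recording that result.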

2.2 Detail two: routing runs once, and all turns share `command_names` / `tool_names`

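route_prompt(prompt, limit) is called once, outside the loop, and the whole loop reuses the command_names / tool_names derived from that single matches result. The implications are:

Porting phase: the focus is on testing the QueryEngine state machine and the output format, without introducing the variable of "re-interpreting intent every turn".
Product-phase limitation: in real multi-turn conversations user intent changes and routing should be refreshed each turn. The current implementation is a deliberate simplification and should not be mistaken for a complete conversational product.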

2.3 Detail three: synthesized turn text keeps each turn's input distinguishable

turn_prompt = prompt if turn == 0 else f'{prompt} [turn {turn + 1}]'

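Tweaking the string slightly on top of the same semantic skeleton means that:

the entries in mutable_messages are not equal to each other, which helps with diffs and debugging;
because UsageSummary.add_turn accumulates by word count, the input length varies slightly from turn to turn, so the budget boundary can be triggered in tests by adjusting max_budget_tokens.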
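The two points above can be checked with a few lines. The prompt expression is copied from run_turn_loop; the word counter is an assumption (plain whitespace splitting) standing in for whatever UsageSummary.add_turn does internally, and the budget of 10 is an arbitrary illustrative number.

```python
from itertools import accumulate


def synth_prompt(prompt: str, turn: int) -> str:
    # Same expression as in run_turn_loop.
    return prompt if turn == 0 else f'{prompt} [turn {turn + 1}]'


def word_count(text: str) -> int:
    # Assumption: "word count" means whitespace-separated tokens.
    return len(text.split())


prompts = [synth_prompt('review MCP tool', turn) for turn in range(3)]
counts = [word_count(p) for p in prompts]
cumulative = list(accumulate(counts))

# Hypothetical input budget, used only to show where a boundary would fall.
budget = 10
first_over = next(i for i, total in enumerate(cumulative, start=1) if total > budget)

print(prompts)     # three distinct strings
print(counts)      # [3, 5, 5]
print(cumulative)  # [3, 8, 13]
print(first_over)  # 3
```

Because every prompt is unique, a diff of mutable_messages shows exactly which turn produced which entry, and the running total crosses a small budget at a predictable turn.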

2.4 Detail four: no `denied_tools` is passed (empty tuple)

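The loop hard-codes denied_tools=(), so permission denials do not accumulate across turns. In contrast to bootstrap_session (which uses _infer_permission_denials), the Turn Loop is a slice test of "pure session gate + usage"; permission auditing is exercised in the single-turn bootstrap report.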

3. Testability: the `TurnResult` list is the "golden trace"

run_turn_loop returns list[TurnResult]; each element contains that turn's input, output, match metadata, usage snapshot, and stop reason:

# 24:32:src/query_engine.py
@dataclass(frozen=True)
class TurnResult:
    prompt: str
    output: str
    matched_commands: tuple[str, ...]
    matched_tools: tuple[str, ...]
    permission_denials: tuple[PermissionDenial, ...]
    usage: UsageSummary
    stop_reason: str

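Unit tests and CI can assert:

Turn count: for example, test_turn_loop_cli_runs uses --max-turns 2 and checks that the output contains ## Turn 1 and stop_reason= (see tests/test_porting_workspace.py).
Whether matched_tools is non-empty for a fixed prompt and a fixed snapshot (test_bootstrap_session_tracks_turn_state targets bootstrap; the same approach can be carried over to a pure QueryEnginePort unit test).
Whether each turn's output is valid JSON when structured_output=True (the current implementation wraps summary + session_id in json.dumps).

Frozen data plane: commands and tools come from reference_data/*.json, the routing algorithm is deterministic, and there is no model randomness. This is the foundation of "testable during porting".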
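The structured-output assertion could look roughly like the sketch below. It does not call the real engine: fake_structured_output is a hypothetical producer that follows the description "json.dumps around summary + session_id", and the key names 'summary' and 'session_id' are assumptions about the payload shape.

```python
import json


def fake_structured_output(session_id: str, summary_lines: list[str]) -> str:
    # Assumed payload shape; the real engine's keys may differ.
    return json.dumps({'summary': summary_lines, 'session_id': session_id}, indent=2)


def check_structured_outputs(outputs: list[str]) -> set[str]:
    """Parse every turn's output and return the set of session ids seen."""
    session_ids: set[str] = set()
    for raw in outputs:
        payload = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
        assert isinstance(payload, dict)
        assert 'summary' in payload and 'session_id' in payload
        session_ids.add(payload['session_id'])
    return session_ids


outputs = [
    fake_structured_output('abc123', [f'Prompt: demo [turn {n}]'])
    for n in (1, 2, 3)
]
ids = check_structured_outputs(outputs)
print(ids)  # {'abc123'}
```

Since the Turn Loop reuses one engine instance, a test in this style can also assert that all turns report the same session id.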

4. Replayability: four complementary paths

4.1 In-memory trace: `results: list[TurnResult]`

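A caller that holds the return value of run_turn_loop already has a read-only, turn-by-turn record of "what the engine believed happened at the time" (suited to inline assertions in unit tests).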

4.2 Transcript `replay`: `TranscriptStore.replay()` / `QueryEnginePort.replay_user_messages`

# 19:20:src/transcript.py
    def replay(self) -> tuple[str, ...]:
        return tuple(self.entries)

# 134:135:src/query_engine.py
    def replay_user_messages(self) -> tuple[str, ...]:
        return self.transcript_store.replay()

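On the success path, submit_message appends to mutable_messages and transcript_store in lockstep, so the sequence of user-side messages can be retrieved with replay_user_messages() for lightweight replay that does not depend on disk, or for cross-checking against other modules.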

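Note: run_turn_loop does not call persist_session, so unless persistence is added, no JSON for that multi-turn run exists on disk after the Turn Loop ends. Replay is limited to the same process, or to TurnResult values saved by a hand-written test.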
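A small sketch of in-process replay, under the assumption that the transcript is just an ordered list of prompts. ToyTranscript is a hypothetical class that copies only the replay() behaviour shown above (returning a tuple of the entries); its append method is an assumed helper, and nothing here touches disk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTranscript:
    """Hypothetical mirror of the replay() behaviour shown above."""
    entries: list[str] = field(default_factory=list)

    def append(self, entry: str) -> None:
        self.entries.append(entry)

    def replay(self) -> tuple[str, ...]:
        # A tuple is an immutable snapshot; later appends do not change it.
        return tuple(self.entries)


transcript = ToyTranscript()
mutable_messages: list[str] = []

for turn in range(3):
    prompt = 'demo' if turn == 0 else f'demo [turn {turn + 1}]'
    # Lockstep append, as described for the success path of submit_message.
    mutable_messages.append(prompt)
    transcript.append(prompt)

snapshot = transcript.replay()
transcript.append('appended after the snapshot')

print(snapshot)       # ('demo', 'demo [turn 2]', 'demo [turn 3]')
print(len(snapshot))  # 3
```

The snapshot can be compared directly against mutable_messages in a unit test to confirm that the two stores have not drifted apart.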

4.3 On-disk session: `persist_session` + `load_session` + `from_saved_session`

bootstrap_session calls engine.persist_session() after its single turn, writing session_id, messages, and the cumulative input/output pseudo-token counts to .port_sessions/<id>.json:

# 140:150:src/query_engine.py
    def persist_session(self) -> str:
        self.flush_transcript()
        path = save_session(
            StoredSession(
                session_id=self.session_id,
                messages=tuple(self.mutable_messages),
                input_tokens=self.total_usage.input_tokens,
                output_tokens=self.total_usage.output_tokens,
            )
        )
        return str(path)

Test chain test_load_session_cli_runs: bootstrap → take the stem of persisted_session_path → load-session, verifying the persist-and-read loop end to end.

# 49:59:src/query_engine.py
    @classmethod
    def from_saved_session(cls, session_id: str) -> 'QueryEnginePort':
        stored = load_session(session_id)
        transcript = TranscriptStore(entries=list(stored.messages), flushed=True)
        return cls(
            manifest=build_port_manifest(),
            session_id=stored.session_id,
            mutable_messages=list(stored.messages),
            total_usage=UsageSummary(stored.input_tokens, stored.output_tokens),
            transcript_store=transcript,
        )

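What replay means here: a new process can hydrate an engine with from_saved_session and continue calling submit_message (if the use case allows), giving session continuity across runs. StoredSession currently does not store each turn's TurnResult.output or the accumulated permission_denials, so strictly speaking this is replay at the level of "user messages + usage summary", not a full conversation record. That is enough for the porting phase; the product phase would need a schema extension.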
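The persist-and-reload round trip can be sketched independently of the repository. ToySession reuses the four field names visible in persist_session above; save and load are hypothetical helpers that take an explicit directory so the example can run in a temporary folder, and they are not the signatures of save_session / load_session in src/session_store.py.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class ToySession:
    """Hypothetical record with the same four fields as StoredSession."""
    session_id: str
    messages: tuple[str, ...]
    input_tokens: int
    output_tokens: int


def save(session: ToySession, directory: Path) -> Path:
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f'{session.session_id}.json'
    path.write_text(json.dumps(asdict(session), indent=2))
    return path


def load(session_id: str, directory: Path) -> ToySession:
    data = json.loads((directory / f'{session_id}.json').read_text())
    return ToySession(
        session_id=data['session_id'],
        messages=tuple(data['messages']),  # JSON arrays come back as lists
        input_tokens=data['input_tokens'],
        output_tokens=data['output_tokens'],
    )


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / '.port_sessions'
    original = ToySession('demo', ('hello', 'hello [turn 2]'), 4, 12)
    path = save(original, store)
    # Same chain as the CLI test: take the stem, then load by id.
    restored = load(path.stem, store)

print(restored == original)  # True
```

The record carries only prompts and token totals, which matches the point above: outputs and stop reasons are lost unless the schema is extended.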

4.4 Human-readable report: `RuntimeSession.as_markdown` + `HistoryLog`

The single-turn bootstrap_session renders the context, setup, routing, execution shims, stream events, turn_result, persisted path, and the HistoryLog event chain into one Markdown document:

# 135:138:src/runtime.py
        history.add('routing', f'matches={len(matches)} for prompt={prompt!r}')
        history.add('execution', f'command_execs={len(command_execs)} tool_execs={len(tool_execs)}')
        history.add('turn', f'commands={len(turn_result.matched_commands)} tools={len(turn_result.matched_tools)} denials={len(turn_result.permission_denials)} stop={turn_result.stop_reason}')
        history.add('session_store', persisted_session_path)

# 19:22:src/history.py
    def as_markdown(self) -> str:
        lines = ['# Session History', '']
        lines.extend(f'- {event.title}: {event.detail}' for event in self.events)
        return '\n'.join(lines)

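This suits PR review and bug-ticket attachments: routing and the stop reason are visible at a glance, and whoever reproduces the issue does not need to rerun a model locally.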

5. Comparison with `bootstrap_session` (single-turn vs multi-turn)

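Dimension	`bootstrap_session`	`run_turn_loop`
`submit_message` calls	1	up to `max_turns`
Permission denials	`_infer_permission_denials(matches)`	fixed `()`
Stream events	`stream_submit_message` → `stream_events`	not collected
Persistence	`persist_session`	none
`HistoryLog`	yes	no
Typical use	single-turn "full-chain report" + persisted session id	stress the state machine / budget / structured output

The two are complementary: bootstrap exercises "audit narrative and persistence", and turn-loop exercises "multi-turn state accumulation and early stop".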

6. How the CLI and tests lock down the behavior

CLI (src/main.py):

# 153:159:src/main.py
    if args.command == 'turn-loop':
        results = PortRuntime().run_turn_loop(args.prompt, limit=args.limit, max_turns=args.max_turns, structured_output=args.structured_output)
        for idx, result in enumerate(results, start=1):
            print(f'## Turn {idx}')
            print(result.output)
            print(f'stop_reason={result.stop_reason}')
        return 0

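Test test_turn_loop_cli_runs: runs turn-loop in a subprocess and asserts that the output contains ## Turn 1 and stop_reason=. It is end-to-end with no mocks and, together with test_bootstrap_cli_runs and test_load_session_cli_runs, forms the regression net for session-related behavior.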
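A stricter assertion than substring checks is possible by parsing the printed format. The sketch below is one way to do it, assuming the three print calls shown above are the only lines that start with "## Turn " or "stop_reason="; parse_turn_loop_output is a hypothetical helper, and the sample text is hand-written for illustration rather than captured from a real run.

```python
import re


def parse_turn_loop_output(text: str) -> list[tuple[int, str]]:
    """Return (turn index, stop reason) pairs from turn-loop CLI output."""
    turns: list[tuple[int, str]] = []
    current = None
    for line in text.splitlines():
        header = re.fullmatch(r'## Turn (\d+)', line)
        if header:
            current = int(header.group(1))
        elif line.startswith('stop_reason=') and current is not None:
            turns.append((current, line[len('stop_reason='):]))
            current = None
    return turns


# Hand-written sample in the format printed by the turn-loop branch.
sample = '\n'.join([
    '## Turn 1',
    'Prompt: review MCP tool',
    'stop_reason=completed',
    '## Turn 2',
    'Prompt: review MCP tool [turn 2]',
    'stop_reason=completed',
])

parsed = parse_turn_loop_output(sample)
print(parsed)  # [(1, 'completed'), (2, 'completed')]
```

In a real test the text would come from the stdout of the subprocess call, and the assertion could check both the number of turns and that only the last stop reason differs from completed.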

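7. Porting-phase recommendations and directions for evolution (based on the current state)

Testable: keep TurnResult stable; keep routing and the snapshot JSON fixed; keep the Turn Loop as a small deterministic stress tool.
Replayable: to persist multi-turn runs, optionally call persist_session at the end of run_turn_loop, or extend StoredSession to store each turn's output / stop_reason.
Real multi-turn: move route_prompt inside the loop and pass in the previous turn's assistant output (or tool results) to get close to product semantics; the current implementation is scaffolding that deliberately reduces variables.
Budget tests: lowering QueryEngineConfig.max_budget_tokens (which requires exposing a parameter on run_turn_loop or building a dedicated test entry point) reliably triggers max_budget_reached and the early break branch.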
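The budget suggestion can be prototyped before touching the real engine. In this sketch ToyBudgetEngine and run_turn_loop_with_budget are hypothetical; counting tokens as whitespace-separated words over input plus output, the echo-style output, and recording the message even when the budget is exceeded are all assumptions made for the example, not confirmed behaviour of QueryEnginePort.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBudgetEngine:
    """Hypothetical engine that models only a cumulative token budget."""
    max_budget_tokens: int
    used_tokens: int = 0
    mutable_messages: list[str] = field(default_factory=list)

    def submit_message(self, prompt: str) -> str:
        output = f'echo: {prompt}'
        # Assumption: pseudo-tokens are whitespace-separated words.
        projected = self.used_tokens + len(prompt.split()) + len(output.split())
        if projected <= self.max_budget_tokens:
            stop_reason = 'completed'
        else:
            stop_reason = 'max_budget_reached'
        self.mutable_messages.append(prompt)
        self.used_tokens = projected
        return stop_reason


def run_turn_loop_with_budget(prompt: str, max_turns: int, max_budget_tokens: int) -> list[str]:
    """Loop shaped like run_turn_loop, with the budget exposed as a parameter."""
    engine = ToyBudgetEngine(max_budget_tokens=max_budget_tokens)
    reasons: list[str] = []
    for turn in range(max_turns):
        turn_prompt = prompt if turn == 0 else f'{prompt} [turn {turn + 1}]'
        reason = engine.submit_message(turn_prompt)
        reasons.append(reason)
        if reason != 'completed':
            break
    return reasons


tight = run_turn_loop_with_budget('review MCP tool', max_turns=5, max_budget_tokens=12)
loose = run_turn_loop_with_budget('review MCP tool', max_turns=5, max_budget_tokens=1000)
print(tight)  # ['completed', 'max_budget_reached']
print(loose)  # five 'completed' entries
```

The first turn costs 7 pseudo-tokens and the second 11, so a budget of 12 is crossed on turn two and the loop exits through the same break branch as the turn gate.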

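8. Summary

The Turn Loop uses deterministic routing, a frozen pair of matched tuples, and synthesized turn text to exercise multi-turn state accumulation and stop semantics without an LLM, with list[TurnResult] as the assertable trace.
Replayability relies on TranscriptStore / replay_user_messages (in-process) and persist_session + load_session / from_saved_session (cross-process); the Turn Loop does not persist by default, which cleanly separates its role from bootstrap.
HistoryLog plus the Markdown report provide a human-readable single-turn audit slice; together with the machine-assertable multi-turn results they form two layers of the test pyramid.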
