The Codex `/review` command: from user selection, to the reviewer sub-session, to structured findings

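`/review` is not the same as typing "please review this" in an ordinary chat. It is a dedicated review flow in Codex:

User selects a review target
-> Codex generates an explicit review request
-> an isolated reviewer sub-session starts
-> the reviewer emits structured JSON following the code-review rubric
-> the main thread receives EnteredReviewMode / ExitedReviewMode events
-> the review result is written back to the main thread, where the user can ask Codex to fix specific findings

Its core goal is not to summarize what changed but to find real problems the author would want to fix: bugs, risks, behavioral regressions, missing tests, and performance, security, or maintainability issues.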

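Scenario 1: Why typing `/review` opens a preset picker instead of going straight to the model

The user types the following in the TUI:

/review

Codex does not immediately send this to the model as an ordinary user message. It first opens the review preset picker.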

Typical targets the user can choose from:

Review against a base branch
Review uncommitted changes
Review a commit
Custom review instructions

Related slash command description:

review my current changes and find issues

This text is the slash command's description in the UI, not a model prompt. It defines the user-facing entry point: `/review` defaults to reviewing code changes rather than producing a general explanation or summary.

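If the user types the command with arguments:

/review focus on concurrency bugs in the executor

Codex treats the trailing text as custom review instructions and skips the preset picker.

ReviewTarget::Custom {
  instructions: "focus on concurrency bugs in the executor"
}

This is one way `/review` differs from ordinary chat: in ordinary chat the current main model responds directly, whereas `/review` first converts the user's intent into a structured review target.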
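The branching described above can be sketched in a few lines of Python. This is only an illustration: `CustomTarget` and `parse_review_command` are hypothetical names, not Codex's actual types, and returning `None` stands in for "open the preset picker".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomTarget:
    """Hypothetical stand-in for ReviewTarget::Custom."""
    instructions: str


def parse_review_command(line: str) -> Optional[CustomTarget]:
    """Return a custom target when text follows /review.

    Returning None signals that the preset picker should open instead.
    """
    command, _, rest = line.strip().partition(" ")
    if command != "/review":
        raise ValueError(f"not a /review command: {line!r}")
    instructions = rest.strip()
    return CustomTarget(instructions) if instructions else None


print(parse_review_command("/review"))
print(parse_review_command("/review focus on concurrency bugs in the executor"))
```

The point of the sketch is that the text after the command never reaches the model as a chat message; it becomes a field of a structured target first.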

Scenario 2: What task the reviewer actually receives when reviewing uncommitted changes

The user selects:

Review uncommitted changes

Codex then generates a very direct input for the reviewer.

Related runtime prompt (uncommitted-changes review prompt):

Review the current code changes (staged, unstaged, and untracked files) and provide prioritized findings.

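This prompt pins down the scope of the review:

It is not a review of the whole repository.
It is not an explanation of the current diff.
It is not an invitation to fix code along the way.
It is a review of the current working-tree changes that ends in prioritized findings.

From there the reviewer typically runs Git commands, reads the diff, and opens the relevant files on its own. Whether the model calls a tool is its own judgment, based on the tool schemas visible in the reviewer sub-session together with this task prompt.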
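The article does not say which Git commands the reviewer runs; that is left to the model. As a rough sketch under that caveat, the three kinds of change named in the prompt (staged, unstaged, untracked) can each be inspected with a standard read-only Git command. The helper name `collect_worktree_changes` is hypothetical.

```python
import subprocess

# One read-only Git command per category named in the prompt.
# This particular selection is an assumption, not taken from Codex.
WORKTREE_COMMANDS = {
    "staged": ["git", "diff", "--cached"],
    "unstaged": ["git", "diff"],
    "untracked": ["git", "ls-files", "--others", "--exclude-standard"],
}


def collect_worktree_changes(repo_dir: str) -> dict[str, str]:
    """Run each command in repo_dir and return its stdout keyed by category."""
    results = {}
    for name, argv in WORKTREE_COMMANDS.items():
        completed = subprocess.run(
            argv, cwd=repo_dir, capture_output=True, text=True, check=True
        )
        results[name] = completed.stdout
    return results
```

Untracked files do not appear in `git diff`, which is why a separate listing command is needed to cover the full scope the prompt describes.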

Scenario 3: Why a base-branch review computes the merge base first

The user selects:

Review against a base branch

and then picks:

main

Codex first tries to compute the merge base of the current HEAD and the base branch. If that succeeds, it generates a more specific prompt.

Related runtime prompt (base-branch review prompt):

Review the code changes against the base branch '{{base_branch}}'. The merge base commit for this comparison is {{merge_base_sha}}. Run `git diff {{merge_base_sha}}` to inspect the changes relative to {{base_branch}}. Provide prioritized, actionable findings.

If the merge base cannot be computed up front, Codex falls back to a backup prompt that asks the reviewer to find the upstream and the merge base itself.

Related runtime prompt (base-branch backup prompt):

Review the code changes against the base branch '{{branch}}'. Start by finding the merge diff between the current branch and {{branch}}'s upstream e.g. (`git merge-base HEAD "$(git rev-parse --abbrev-ref "{{branch}}@{upstream}")"`), then run `git diff` against that SHA to see what changes we would merge into the {{branch}} branch. Provide prioritized, actionable findings.

Both prompts are designed so that the reviewer never has to guess the review scope. Its judgment should center on the diff that would be merged into the base branch.
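The choice between the two prompts can be sketched as a small function. The template text is copied from the prompts quoted above; the function name `base_branch_prompt` and the use of Python `str.format` placeholders are assumptions made for illustration.

```python
from typing import Optional

BASE_BRANCH_PROMPT = (
    "Review the code changes against the base branch '{base_branch}'. "
    "The merge base commit for this comparison is {merge_base_sha}. "
    "Run `git diff {merge_base_sha}` to inspect the changes relative to "
    "{base_branch}. Provide prioritized, actionable findings."
)

# "{{upstream}}" is escaped so that the literal Git syntax "@{upstream}" survives format().
BASE_BRANCH_BACKUP_PROMPT = (
    "Review the code changes against the base branch '{branch}'. "
    "Start by finding the merge diff between the current branch and "
    "{branch}'s upstream e.g. "
    '(`git merge-base HEAD "$(git rev-parse --abbrev-ref "{branch}@{{upstream}}")"`), '
    "then run `git diff` against that SHA to see what changes we would merge "
    "into the {branch} branch. Provide prioritized, actionable findings."
)


def base_branch_prompt(branch: str, merge_base_sha: Optional[str]) -> str:
    """Use the specific prompt when a merge base is known, else the backup prompt."""
    if merge_base_sha:
        return BASE_BRANCH_PROMPT.format(
            base_branch=branch, merge_base_sha=merge_base_sha
        )
    return BASE_BRANCH_BACKUP_PROMPT.format(branch=branch)
```

Either way the reviewer ends up with a concrete SHA to diff against, so the scope is fixed before any findings are written.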

Scenario 4: When reviewing a commit, the reviewer looks at the changes that commit introduced, not at the working tree

The user selects:

Review a commit

and then picks a recent commit:

deadbeef Fix parser panic

Codex generates the following prompt.

Related runtime prompt (commit review prompt):

Review the code changes introduced by commit {{sha}} ("{{title}}"). Provide prioritized, actionable findings.

If the commit has no title, the prompt becomes:

Review the code changes introduced by commit {{sha}}. Provide prioritized, actionable findings.

This target differs from uncommitted changes. The reviewer should look for problems in the diff of the specified commit rather than also reviewing whatever uncommitted content happens to be in the working tree.
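As a minimal sketch of the two variants above, the following function builds the commit prompt with or without a title. The wording comes from the quoted prompts; the function name `commit_review_prompt` is hypothetical.

```python
from typing import Optional


def commit_review_prompt(sha: str, title: Optional[str] = None) -> str:
    """Build the commit review prompt, adding the quoted title only when present."""
    if title:
        subject = f'commit {sha} ("{title}")'
    else:
        subject = f"commit {sha}"
    return (
        f"Review the code changes introduced by {subject}. "
        "Provide prioritized, actionable findings."
    )


print(commit_review_prompt("deadbeef", "Fix parser panic"))
print(commit_review_prompt("deadbeef"))
```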

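Scenario 5: The base review rules the reviewer sub-session sees

`/review` starts a reviewer sub-session. That sub-session uses dedicated review base instructions rather than the ordinary Codex coding-assistant persona.

The most important system-level rules are as follows.

Related system prompt (reviewer base review rules):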

# Review guidelines:

You are acting as a reviewer for a proposed code change made by another engineer.

Below are some default guidelines for determining whether the original author would appreciate the issue being flagged.

These are not the final word in determining whether an issue is a bug. In many cases, you will encounter other, more specific guidelines. These may be present elsewhere in a developer message, a user message, a file, or even elsewhere in this system message.
Those guidelines should be considered to override these general instructions.

Here are the general guidelines for determining whether something is a bug and should be flagged.

1. It meaningfully impacts the accuracy, performance, security, or maintainability of the code.
2. The bug is discrete and actionable (i.e. not a general issue with the codebase or a combination of multiple issues).
3. Fixing the bug does not demand a level of rigor that is not present in the rest of the codebase (e.g. one doesn't need very detailed comments and input validation in a repository of one-off scripts in personal projects)
4. The bug was introduced in the commit (pre-existing bugs should not be flagged).
5. The author of the original PR would likely fix the issue if they were made aware of it.
6. The bug does not rely on unstated assumptions about the codebase or author's intent.
7. It is not enough to speculate that a change may disrupt another part of the codebase, to be considered a bug, one must identify the other parts of the code that are provably affected.
8. The bug is clearly not just an intentional change by the original author.

This prompt narrows the reviewer from nitpicking to reporting only problems worth fixing. As a result, `/review` output usually does not list a pile of style suggestions unless a style issue already hurts comprehension or maintainability, or violates an explicit standard.

Scenario 6: What a reviewer comment should look like

The reviewer must not only find problems but also write each finding as an inline comment suitable for code review.

Related system prompt (finding comment rules):

When flagging a bug, you will also provide an accompanying comment. Once again, these guidelines are not the final word on how to construct a comment -- defer to any subsequent guidelines that you encounter.

1. The comment should be clear about why the issue is a bug.
2. The comment should appropriately communicate the severity of the issue. It should not claim that an issue is more severe than it actually is.
3. The comment should be brief. The body should be at most 1 paragraph. It should not introduce line breaks within the natural language flow unless it is necessary for the code fragment.
4. The comment should not include any chunks of code longer than 3 lines. Any code chunks should be wrapped in markdown inline code tags or a code block.
5. The comment should clearly and explicitly communicate the scenarios, environments, or inputs that are necessary for the bug to arise. The comment should immediately indicate that the issue's severity depends on these factors.
6. The comment's tone should be matter-of-fact and not accusatory or overly positive. It should read as a helpful AI assistant suggestion without sounding too much like a human reviewer.
7. The comment should be written such that the original author can immediately grasp the idea without close reading.
8. The comment should avoid excessive flattery and comments that are not helpful to the original author. The comment should avoid phrasing like "Great job ...", "Thanks for ...".

So the ideal reviewer output is not:

There might be a problem here; consider taking a look.

but rather:

[P1] Preserve failed command output for retries
When the sandbox retry path drops stderr, callers cannot distinguish a permission denial from a command failure, so retrying without sandbox can mask the original error for any command that writes diagnostics only to stderr.

Each finding must also map to a specific file and line range.
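To make the shape of such a comment concrete, here is a small sketch that renders one finding as a title line, a location line, and a one-paragraph body. The field names follow the output schema quoted in this article; the `[P<n>]` prefix being derived from the priority field, the layout, and the file path in the example are assumptions for illustration only.

```python
def format_finding(finding: dict) -> str:
    """Render one finding as a short review comment with its code location."""
    location = finding["code_location"]
    lines = location["line_range"]
    header = finding["title"]
    priority = finding.get("priority")
    if priority is not None:
        # Assumed convention: priority 1 is displayed as "[P1]".
        header = f"[P{priority}] {header}"
    where = f'{location["absolute_file_path"]}:{lines["start"]}-{lines["end"]}'
    return f"{header}\n{where}\n{finding['body']}"


example = {
    "title": "Preserve failed command output for retries",
    "body": "When the sandbox retry path drops stderr, callers cannot "
            "distinguish a permission denial from a command failure.",
    "priority": 1,
    "code_location": {
        "absolute_file_path": "/repo/src/sandbox.rs",  # hypothetical path
        "line_range": {"start": 42, "end": 47},
    },
}
print(format_finding(example))
```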

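Scenario 7: Why the reviewer's output is JSON

The reviewer sub-session is required to write its final output as JSON, but that requirement comes from the reviewer's system prompt. The current `/review` path does not pass an API-level strict JSON schema.

In other words, the constraint originates in this part of the system prompt: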

OUTPUT FORMAT:

## Output schema  --- MUST MATCH *exactly*

```json
{
  "findings": [
    {
      "title": "<≤ 80 chars, imperative>",
      "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
      "confidence_score": <float 0.0-1.0>,
      "priority": <int 0-3, optional>,
      "code_location": {
        "absolute_file_path": "<file path>",
        "line_range": {"start": <int>, "end": <int>}
      }
    }
  ],
  "overall_correctness": "patch is correct" | "patch is incorrect",
  "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
  "overall_confidence_score": <float 0.0-1.0>
}
```

* **Do not** wrap the JSON in markdown fences or extra prose.
* The code_location field is required and must include absolute_file_path and line_range.
* Line ranges must be as short as possible for interpreting the issue (avoid ranges over 5--10 lines; pick the most suitable subrange).
* The code_location should overlap with the diff.
* Do not generate a PR fix.

This system prompt lets the UI, the app-server, and the main-thread history all consume the review result in the same way:

findings:
  One entry per concrete problem.

priority:
  P0/P1/P2/P3 map to 0/1/2/3.

code_location:
  Points to the shortest line range in the diff that is still enough to explain the problem.

overall_correctness:
  States whether the patch as a whole is correct.
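One implementation detail matters here: the current `/review` path starts the reviewer sub-session without passing final_output_json_schema, so the output is not a strict structured output of the Responses API.

final_output_json_schema = None

Codex therefore parses the reviewer's reply tolerantly:

1. First try to parse the complete output as a ReviewOutputEvent.
2. If that fails, try to extract the first `{ ... }` JSON object from the output and parse that.
3. If that still fails, put the raw text into overall_explanation and leave findings empty.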
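The three steps above can be sketched with Python's standard `json` module. This is an approximation rather than Codex's code: the result is a plain dict instead of a ReviewOutputEvent, and step 2 is implemented as "from the first `{` to the last `}`", which is an assumption about how the object is extracted.

```python
import json


def parse_review_output(text: str) -> dict:
    """Apply the three fallback steps to the reviewer's raw final message."""
    # Step 1: the whole message is a JSON object.
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    # Step 2: retry on the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            parsed = json.loads(text[start : end + 1])
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass
    # Step 3: keep the raw text as the explanation, with no findings.
    return {"findings": [], "overall_explanation": text}
```

The last step means a malformed reply still produces a usable review result; it just carries no structured findings.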

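So the accurate statement is:

The JSON format of `/review` is strongly constrained by the reviewer system prompt.
The current implementation additionally parses the output and falls back when parsing fails.
It is not a JSON schema enforced at the API layer.

This also marks the boundary of `/review`: it finds problems and returns a review result in the JSON schema; it does not modify code directly.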

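Scenario 8: Why the review sub-session is isolated

`/review` starts a reviewer sub-session instead of having the main-thread model switch roles in place.

The main effects of the isolation are:

1. The reviewer uses dedicated review base instructions.
2. The reviewer can use review_model; if none is configured, it keeps the current model.
3. During the review, capabilities that do not suit the review flow are disabled or restricted, for example web search, collaboration/multi-agent extensions, and some tool capabilities.
4. The reviewer's raw final JSON is not shown as ordinary chat output; it is parsed into a review result.

This makes `/review` behave more consistently than asking "please review this":

Ordinary chat:
  The current model answers directly from the current context and system prompt, and may mix in explanations, suggestions, and fix plans.

/review:
  The sub-session works from the review rubric, emits structured findings, and writes the result back to the main thread.

Isolation also has a cost. The reviewer sub-session still has to read the diff, understand the code, and judge problems on its own; with a very large diff, missing tests, or insufficient context it can still miss issues or raise false positives.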
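One way to picture the isolation effects listed above is as a derived configuration: the sub-session starts from the parent's settings and overrides a few of them. Everything in this sketch is hypothetical apart from the `review_model` fallback rule; the `SessionConfig` fields, the flag names, and the placeholder instruction string are invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Placeholder only; the real review base instructions are quoted earlier in the article.
REVIEW_BASE_INSTRUCTIONS = "<review base instructions>"


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical subset of session settings relevant to /review."""
    model: str
    review_model: Optional[str] = None
    base_instructions: str = "<default coding-assistant instructions>"
    web_search_enabled: bool = True
    multi_agent_enabled: bool = True


def reviewer_config(parent: SessionConfig) -> SessionConfig:
    """Derive the reviewer sub-session settings from the parent session."""
    return replace(
        parent,
        # Use review_model when configured, otherwise keep the current model.
        model=parent.review_model or parent.model,
        base_instructions=REVIEW_BASE_INSTRUCTIONS,
        web_search_enabled=False,
        multi_agent_enabled=False,
    )
```

Because the parent config is frozen and `replace` returns a copy, the main thread's settings are untouched when the review ends.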

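Scenario 9: Why the main thread sees `<user_action>` after the review completes

Once the reviewer finishes, Codex writes a structured user action into the main thread so that the model in later turns knows the user just ran a review and got these results.

Related runtime prompt (template written back on a successful review):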

<user_action>
  <context>User initiated a review task. Here's the full review output from reviewer model. User may select one or more comments to resolve.</context>
  <action>review</action>
  <results>
  {{results}}
  </results>
  </user_action>

This is not a prompt for the reviewer. It is runtime context written back to the main thread after the review completes, and it tells the main model on the next turn:

This is a review result.
It is not ordinary text the user wrote.
The user may follow up with "fix the first issue" or "take care of all of these".

If the review is interrupted, Codex writes a different template.

Related runtime prompt (template written back on an interrupted review):

<user_action>
  <context>User initiated a review task, but was interrupted. If user asks about this, tell them to re-initiate a review with `/review` and wait for it to complete.</context>
  <action>review</action>
  <results>
  None.
  </results>
</user_action>

So the review result is more than something shown in the UI: it also becomes task context visible to the main thread in later turns.
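The two write-back templates differ only in their context sentence and results body, which a short sketch makes visible. The context strings are copied from the templates quoted above; the function name `render_user_action`, the use of `None` to mean "interrupted", and the exact whitespace are assumptions.

```python
from typing import Optional

REVIEW_DONE_CONTEXT = (
    "User initiated a review task. Here's the full review output from "
    "reviewer model. User may select one or more comments to resolve."
)
REVIEW_INTERRUPTED_CONTEXT = (
    "User initiated a review task, but was interrupted. If user asks about "
    "this, tell them to re-initiate a review with `/review` and wait for it "
    "to complete."
)


def render_user_action(results: Optional[str]) -> str:
    """Render the write-back block; results=None means the review was interrupted."""
    interrupted = results is None
    context = REVIEW_INTERRUPTED_CONTEXT if interrupted else REVIEW_DONE_CONTEXT
    body = "None." if interrupted else results
    return (
        "<user_action>\n"
        f"  <context>{context}</context>\n"
        "  <action>review</action>\n"
        "  <results>\n"
        f"  {body}\n"
        "  </results>\n"
        "</user_action>"
    )
```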

Scenario 10: What `review/start` is in the app-server

In the app-server protocol, `/review` corresponds to an explicit API:

review/start

Related API schema (ReviewStartParams):

ReviewStartParams:
  threadId: string
  target: ReviewTarget
  delivery?: "inline" | "detached" | null

ReviewTarget:
  { type: "uncommittedChanges" }
  { type: "baseBranch", branch: string }
  { type: "commit", sha: string, title: string | null }
  { type: "custom", instructions: string }

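Field notes:

ReviewStartParams:
  threadId: ID of the thread on which to start the review
  target: the review target
  delivery: optional, "inline" or "detached"

ReviewTarget:
  { type: "uncommittedChanges" }: review the current working-tree changes
  { type: "baseBranch", branch: string }: review the changes relative to a base branch
  { type: "commit", sha: string, title: string | null }: review a specific commit
  { type: "custom", instructions: string }: review according to custom instructions

delivery takes one of two values:

inline:
  The review runs in the original thread as review mode, and the result is written back to that thread.

detached:
  The review runs in a new review thread, and the response returns reviewThreadId.

The TUI uses inline by default, so the user sees the current thread enter review mode, then exit review mode and show the results.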
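A client-side request for this API might be assembled as follows. The method name and the `params` fields come from the schema above; the surrounding envelope with `id`, `method`, and `params` keys is an assumption (a JSON-RPC-style shape), and the thread ID in the example is made up.

```python
import json
from typing import Optional


def review_start_request(
    request_id: int, thread_id: str, target: dict, delivery: Optional[str] = None
) -> dict:
    """Build a review/start request; the envelope shape is assumed, not documented here."""
    params = {"threadId": thread_id, "target": target}
    if delivery is not None:
        if delivery not in ("inline", "detached"):
            raise ValueError(f"unsupported delivery: {delivery!r}")
        params["delivery"] = delivery
    return {"id": request_id, "method": "review/start", "params": params}


request = review_start_request(
    1,
    "thread-123",  # hypothetical thread ID
    {"type": "commit", "sha": "deadbeef", "title": "Fix parser panic"},
    delivery="detached",
)
print(json.dumps(request, indent=2))
```

With `delivery` set to "detached", the caller would then read reviewThreadId from the response to follow the review in its own thread.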

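Scenario 11: `/review` is not the same thing as auto-review or Guardian review

Codex also uses terms such as "auto-review" and "guardian review", which are easy to confuse with `/review`.

/review:
  The user explicitly starts a code review whose goal is to find problems in code changes.

auto-review / Guardian approval review:
  When Codex is about to run a high-risk command, escalate permissions, access the network, perform an MCP operation, and so on, a reviewer decides whether to approve the action.

Both are called review, but their goals are entirely different:

/review is concerned with code quality.
Guardian review is concerned with whether an operation should be approved.

Accordingly, `/review` outputs findings, overall_correctness, and code_location, while Guardian review outputs a permission decision such as approve or deny.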

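Scenario 12: When to use `/review` and when to just ask Codex to look at the code

Cases that suit `/review`:

You want to check the current diff before committing.
You want an independent review before opening a PR.
You want to check whether a particular commit introduced a regression.
You want to see, relative to a base branch, what this change would merge.
You want structured findings that you can then fix one by one.

Cases better suited to ordinary chat:

You want to understand a piece of code.
You want Codex to explain a design.
You want to discuss tradeoffs while making changes.
You want the fix implemented directly rather than reviewed independently first.

The strength of `/review` is independent review. It deliberately constrains the output format and the bar for findings, which cuts down on chatter and generic advice.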

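The full chain

The whole `/review` flow compressed into one chain:

1. The user types /review or /review <custom instructions>.
2. The TUI or app-server turns it into a ReviewTarget.
3. Codex generates an explicit review prompt from the target.
4. Codex starts the reviewer sub-session.
5. The reviewer uses the dedicated review base instructions.
6. The reviewer reads the diff, files, and command output, looking for discrete, fixable problems.
7. The reviewer emits structured JSON.
8. Codex parses the JSON into a ReviewOutputEvent.
9. The UI shows EnteredReviewMode / ExitedReviewMode and the findings.
10. Codex writes the review result back into the main-thread history as `<user_action>`.
11. The user can go on to ask Codex to "fix the first finding" or "address these review comments".

So `/review` does not make the main model speak in a different tone; it turns a code review into a dedicated task that is controllable, parseable, written back to the thread, and open to follow-up.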
