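iceCoder: An In-Depth Analysis of Acceptance Gating

Overview

This project's acceptance gating mechanism is a multi-layered quality assurance system that ensures a task meets its predefined acceptance criteria before delivery. It consists of two core gates:

Acceptance Gate (acceptance-command gating): for multi-step acceptance flows, such as benchmark scenarios
Verification Gate: for unit-test verification after code changes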

1. Acceptance Gate (Acceptance-Command Gating)

1.1 Core Architecture

File: src/harness/task-acceptance-tracker.ts

Core class: TaskAcceptanceTracker

Activation condition

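// Parse the acceptance commands from the goal
const parsed = parseAcceptanceCommandsFromGoal(goal);
// Activate only when there are at least 2 commands and the goal is a long-running benchmark/implementation goal
this.active = parsed.length >= 2 && isLongRunningImplementationGoal(goal);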

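Command parsing rules

Multi-step acceptance chain: chained commands such as npm ci → npm test → npm run build → npm run test:e2e are recognized automatically in the goal text
Normalized matching:
npx vitest / npx vitest run --reporter=verbose → npm test
npx playwright test / npx cypress run → npm run test:e2e
Noise such as cd /d X &&, 2>&1, and pipes or redirections is stripped
Namespaced commands such as npm run test:e2e are preserved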
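
The normalization rules above can be sketched as a single function. This is a minimal illustration, not the project's actual implementation: the function name is borrowed from the extension-points section, and the regular expressions are assumptions chosen to cover only the examples listed here.

```typescript
// Hypothetical sketch of command-key normalization; the real
// normalizeAcceptanceCommandKey in the project may differ.
function normalizeAcceptanceCommandKey(raw: string): string {
  let cmd = raw.trim();

  // Strip a leading "cd X &&" or Windows-style "cd /d X &&" prefix.
  cmd = cmd.replace(/^cd\s+(?:\/d\s+)?\S+\s*&&\s*/i, "");

  // Drop everything from the first pipe onward.
  const pipeAt = cmd.indexOf("|");
  if (pipeAt >= 0) cmd = cmd.slice(0, pipeAt);

  // Remove stream redirections such as "2>&1" or "> out.log".
  cmd = cmd.replace(/\s*\d*>{1,2}\s*(?:&\d+|\S+)/g, "");

  // Collapse repeated whitespace.
  cmd = cmd.replace(/\s+/g, " ").trim();

  // Map runner-specific invocations onto the npm script they stand for.
  if (/^npx\s+vitest\b/.test(cmd)) return "npm test";
  if (/^npx\s+(?:playwright\s+test|cypress\s+run)\b/.test(cmd)) return "npm run test:e2e";

  return cmd;
}
```

With this sketch, "cd /d C:\proj && npm run build 2>&1" and "npm run build" produce the same key, which is what lets a noisy shell invocation match a clean acceptance entry.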

Status transitions

type AcceptanceCommandStatus = 'pending' | 'passed' | 'failed';

interface AcceptanceCommandEntry {
  key: string;           // normalized key
  label: string;         // original text, used for display
  status: AcceptanceCommandStatus;
  lastRunAt?: number;
}

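Status update flow:

Call recordRunCommand(rawCommand, success) or recordRunCommandToolResult(classifiedResult)
Semantically match the first acceptance entry that has not passed yet (pending or failed)
Return an AcceptanceTransition (original command text, previous status, new status)
Background task handling:

background_start / background_running → stays pending
background_completed(exitCode: 0) → passed
background_failed / exitCode ≠ 0 → failed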
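
The mapping from a classified command result to an acceptance status can be sketched as follows. The ClassifiedRunResult shape and the function name nextAcceptanceStatus are assumptions made for illustration; only the status names and the background_* kinds come from the description above.

```typescript
type AcceptanceCommandStatus = "pending" | "passed" | "failed";

// Hypothetical shape of a classified run_command result.
type ClassifiedRunResult =
  | { kind: "foreground"; success: boolean }
  | { kind: "background_start" | "background_running" }
  | { kind: "background_completed"; exitCode: number }
  | { kind: "background_failed" };

// Derive the new acceptance status from one classified result.
function nextAcceptanceStatus(result: ClassifiedRunResult): AcceptanceCommandStatus {
  switch (result.kind) {
    case "foreground":
      return result.success ? "passed" : "failed";
    case "background_start":
    case "background_running":
      // The command has not finished yet, so nothing can be concluded.
      return "pending";
    case "background_completed":
      // A completed background task still fails on a non-zero exit code.
      return result.exitCode === 0 ? "passed" : "failed";
    case "background_failed":
      return "failed";
  }
}
```

Modelling the result as a discriminated union lets the compiler check that every background state is handled.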

Completion check

isComplete(): boolean {
  if (!this.isActive()) return true;
  return this.commands.every(c => c.status === 'passed');
}

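Key properties:

Every command must be passed before the tracker counts as complete
A single command may be run multiple times (for example, re-running tests)
Checkpoint snapshot restore is supported: TaskAcceptanceTracker.fromSnapshot(snapshot)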

1.2 Feedback Injection

File: src/harness/harness-tool-round.ts

Function: buildAcceptanceSuccessFeedbackMessage

Feedback format

// Single command passed (not yet complete)
[System / Acceptance ✓] npm test --- 8 files / 22 tests passed (1/4 passed)

// All commands passed (completion signal)
[System / Acceptance ✓] npm run test:e2e --- 5 e2e tests passed in 4.4s (4/4 passed)
[System / Acceptance ✓] All 4 acceptance commands passed.
Output ≤10 delivery bullets now and STOP calling tools.

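Rules

Single command passed: show only ✓ plus progress; no stop signal is injected
All passed: append All N acceptance commands passed plus the stop instruction
Command label truncation: labels are cut to at most 80 characters, with ... appended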
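
The feedback rules above can be sketched as a small message builder. This is an illustrative reconstruction under stated assumptions: the input field names follow the call site shown later in this document, while the per-entry summary field, the truncateLabel helper, and the null return for "nothing to report" are hypothetical.

```typescript
// Hypothetical input shape; `summary` is an assumed field holding the
// short result text (for example "8 files / 22 tests passed").
interface AcceptanceFeedbackInput {
  newlyPassed: { label: string; summary: string }[];
  completedAll: boolean;
  passedCount: number;
  totalCount: number;
}

const MAX_LABEL_LENGTH = 80;

// Cut long command labels to 80 characters and mark the cut with "...".
function truncateLabel(label: string): string {
  return label.length <= MAX_LABEL_LENGTH ? label : label.slice(0, MAX_LABEL_LENGTH) + "...";
}

function buildAcceptanceSuccessFeedbackMessage(input: AcceptanceFeedbackInput): string | null {
  if (input.newlyPassed.length === 0) return null;

  const progress = `(${input.passedCount}/${input.totalCount} passed)`;
  const lines = input.newlyPassed.map(
    (entry) => `[System / Acceptance ✓] ${truncateLabel(entry.label)} --- ${entry.summary} ${progress}`
  );

  // The stop instruction is appended only once every command has passed.
  if (input.completedAll) {
    lines.push(`[System / Acceptance ✓] All ${input.totalCount} acceptance commands passed.`);
    lines.push("Output ≤10 delivery bullets now and STOP calling tools.");
  }
  return lines.join("\n");
}
```

Keeping the stop instruction out of the single-pass message keeps a partial pass from being read as a completion signal.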

2. Verification Gate

2.1 Core Architecture

Files: src/harness/task-state.ts, src/harness/document-deliverable.ts

Verification state machine

type VerificationStatus = 'not_required' | 'required' | 'passed' | 'failed';
type TaskPhase = 'intent' | 'context' | 'editing' | 'verification';

Status transitions:

No changes                    → verificationStatus = 'not_required'
Engineering source written    → verificationStatus = 'required', phase = 'editing' → 'verification'
Unit tests run and pass       → verificationStatus = 'passed'
Unit tests run and fail       → verificationStatus = 'failed'
All acceptance commands pass  → markVerificationPassed() → 'passed'

Deliverable classification

type DeliverableKind = 'engineering' | 'file_deliverable' | 'none';

// Engineering source (requires unit tests)
ENGINEERING_EXTENSIONS = ['ts', 'tsx', 'js', 'jsx', 'vue', 'py', 'go', ...];

// File deliverables (must be confirmed via file_info/read_file):
// all other extensions (json, yaml, md, sql, ...)

Read-After-Write Gate

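// A write bumps the path's version
bumpFileDeliverableWriteVersion(path): void;

// A read or file_info call confirms that the version matches
tryConfirmFileDeliverable(toolName, path, result): void;

// Statistics on unconfirmed paths
verificationConfirmationStats(filesChanged, writeVersions, confirmVersions): {
  required: number;  // total number of paths that need confirmation
  pending: number;   // number still awaiting confirmation
  exempt: number;    // number exempted (temp files/drafts)
}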
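
The versioning idea behind the read-after-write gate can be sketched with two maps. This is a simplified illustration, not the project's code: the module-level maps, the boolean success parameter (standing in for the real result argument), and the pendingConfirmations helper are all assumptions.

```typescript
// Hypothetical storage: one write counter and one confirmed counter per path.
const writeVersions = new Map<string, number>();
const confirmVersions = new Map<string, number>();

// Every write to a path invalidates any earlier confirmation.
function bumpFileDeliverableWriteVersion(path: string): void {
  writeVersions.set(path, (writeVersions.get(path) ?? 0) + 1);
}

// Only a successful read_file or file_info call confirms the current version.
function tryConfirmFileDeliverable(toolName: string, path: string, success: boolean): void {
  if (!success) return;
  if (toolName !== "read_file" && toolName !== "file_info") return;
  const written = writeVersions.get(path);
  if (written !== undefined) confirmVersions.set(path, written);
}

// A path is pending while its confirmed version lags behind its write version.
function pendingConfirmations(filesChanged: string[]): string[] {
  return filesChanged.filter(
    (path) => (confirmVersions.get(path) ?? 0) < (writeVersions.get(path) ?? 0)
  );
}
```

Comparing version numbers rather than storing a boolean flag means a second write after a read puts the path back into the pending set.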

Exempt path rules:

isGenericTempPath: .tmp/.bak suffixes, and the workspace-relative directories tmp/, temp/, cache/
isDotPrefixedDirPath: the parent directory starts with . (for example, .scratch/out.md)
isEphemeralScriptPath: check-.ps1, cleanup.ps1, verify-.sh
isProjectCustomExemptPath: verificationExemptDirs in config.json / .icecoder.json
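
Three of the four exemption rules above can be sketched as path predicates. The function names come from the list, but the bodies are assumptions: in particular, "parent directory" is read here as the immediate parent, and isEphemeralScriptPath is left out because its filename patterns are not fully recoverable from the list.

```typescript
// Normalize Windows separators and a leading "./" so the checks see one form.
function normalizeSlashes(path: string): string {
  return path.replace(/\\/g, "/").replace(/^\.\//, "");
}

// .tmp/.bak suffixes, or files under workspace-relative tmp/, temp/, cache/.
function isGenericTempPath(path: string): boolean {
  const p = normalizeSlashes(path);
  return /\.(?:tmp|bak)$/i.test(p) || /^(?:tmp|temp|cache)\//i.test(p);
}

// Assumption: only the immediate parent directory is inspected.
function isDotPrefixedDirPath(path: string): boolean {
  const segments = normalizeSlashes(path).split("/");
  const parent = segments.length >= 2 ? segments[segments.length - 2] : undefined;
  return parent !== undefined && parent.startsWith(".") && parent !== "." && parent !== "..";
}

// Directories listed under verificationExemptDirs, with or without a trailing slash.
function isProjectCustomExemptPath(path: string, exemptDirs: string[]): boolean {
  const p = normalizeSlashes(path);
  return exemptDirs.some((dir) => {
    const d = normalizeSlashes(dir).replace(/\/+$/, "");
    return d.length > 0 && p.startsWith(d + "/");
  });
}

function isVerificationExemptPath(path: string, exemptDirs: string[] = []): boolean {
  return (
    isGenericTempPath(path) ||
    isDotPrefixedDirPath(path) ||
    isProjectCustomExemptPath(path, exemptDirs)
  );
}
```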

2.2 Unit-Test Gate

Files: src/harness/document-deliverable.ts, src/harness/verification-digest.ts

Decision logic

shouldPromptEngineeringUnitTest(filesChanged, verificationStatus): boolean {
  if (!hasEngineeringTestTargets(filesChanged)) return false;
  return verificationStatus === 'required';  // unit tests have not been run yet
}

shouldInjectFailedUnitTestReminder(filesChanged, verificationStatus): boolean {
  if (!hasEngineeringTestTargets(filesChanged)) return false;
  return verificationStatus === 'failed';  // unit tests were run but failed
}

Identifying engineering source paths

engineeringTestTargetPaths(filesChanged): string[] {
  return filesChanged.filter(
    path => isEngineeringUnitTestTargetPath(path) && !isVerificationExemptPath(path)
  );
}

Prompt injection

Prompt when unit tests have not been run yet:

[System] You changed source code but have not run unit tests yet.

Run unit tests covered these changed files (pick the command for this project):
- src/foo.ts
- src/bar.ts

Use run_command, then fix failures before claiming the task is complete.

Escalated prompt on failure:

[System] Unit tests failed for your recent changes.

Please complete unit tests: fix the failures, re-run tests via run_command, and only then finish.

Changed source files:
- src/foo.ts
- src/bar.ts

2.3 Verification Gate Counter

File: src/harness/harness-verification-gate.ts

Counter reset rules

shouldResetVerificationGateCounter(
  pendingBefore, pendingAfter, blockingAfter,
  acceptancePendingBefore, acceptancePendingAfter
): boolean {
  if (!blockingAfter) return true;              // blocking has been lifted
  if (pendingAfter < pendingBefore) return true; // net decrease in pending files
  if (acceptancePendingAfter < acceptancePendingBefore) return true; // net decrease in pending acceptance commands
  return false;
}

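Purpose: keep the LLM from stopping early while verification is incomplete. Rounds without progress accumulate in the counter, and once it reaches a threshold a block is enforced.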
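
How the reset rule and the counter might fit together is sketched below. The reset function restates the rule above with explicit types; the GateSnapshot shape, the VerificationGateCounter class, and the threshold value of 3 are hypothetical, since the document does not state the actual threshold.

```typescript
// The reset rule from this section, with explicit parameter types.
function shouldResetVerificationGateCounter(
  pendingBefore: number, pendingAfter: number, blockingAfter: boolean,
  acceptancePendingBefore: number, acceptancePendingAfter: number
): boolean {
  if (!blockingAfter) return true;
  if (pendingAfter < pendingBefore) return true;
  if (acceptancePendingAfter < acceptancePendingBefore) return true;
  return false;
}

// Hypothetical per-round snapshot of the gate state.
interface GateSnapshot {
  filePending: number;
  acceptancePending: number;
  blocking: boolean;
}

// Assumed value; the real threshold is not given in this document.
const ESCALATION_THRESHOLD = 3;

class VerificationGateCounter {
  private stalledRounds = 0;

  // Returns true once enough consecutive rounds have passed without progress.
  observe(before: GateSnapshot, after: GateSnapshot): boolean {
    const reset = shouldResetVerificationGateCounter(
      before.filePending, after.filePending, after.blocking,
      before.acceptancePending, after.acceptancePending
    );
    this.stalledRounds = reset ? 0 : this.stalledRounds + 1;
    return this.stalledRounds >= ESCALATION_THRESHOLD;
  }
}
```

Resetting on any net progress means the escalation only fires when the model is genuinely stuck, not while it is slowly working through the pending items.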

3. Gate Integration Flow

3.1 Harness Tool-Round Loop

File: src/harness/harness-tool-round.ts

Acceptance Gate integration point

// Classify the run_command result
const classified = classifyRunCommandResult(args, rawOutput, result.success);

// Update the acceptance tracker
tracker.recordRunCommandToolResult(classified);

// Build the feedback message
const feedback = buildAcceptanceSuccessFeedbackMessage({
  newlyPassed: [...],
  completedAll: tracker.isComplete(),
  passedCount: tracker.getPassedCount(),
  totalCount: tracker.commands.length
});

// Inject it into the LLM context
if (feedback) msgs.push({ role: 'system', content: feedback });

Verification Gate integration point

// Record the tool result
taskState.recordToolResult(toolCall, result);

// Sync verification state from the Acceptance Gate
syncTaskVerificationFromAcceptance(taskState, tracker);

// Check whether finishing is blocked
const acceptanceIncomplete = tracker.hasPendingAcceptanceWork();
const isBlocking = taskState.isVerificationBlockingFinal(acceptanceIncomplete);

// Build and inject the prompt
if (isBlocking) {
  const prompt = taskState.buildVerificationPrompt();
  msgs.push({ role: 'system', content: prompt });
}

3.2 Completion Check

File: src/harness/incomplete-completion.ts

hasPendingWork(task, acceptance, workspaceRoot): boolean {
  if (hasPendingAcceptanceWork(acceptance)) return true;
  if (hasUnfulfilledFileDeliverableGoal(task.goal, task.filesChanged, task.intent)) return true;
  return false;
}

Injected when the task is incomplete:

buildIncompleteContinuationPrompt(task, repo, acceptance): string {
  const lines = [
    '[System] The task is NOT complete. Do not stop without calling tools.',
    '',
    'Evidence:'
  ];

  if (hasPendingAcceptanceWork(acceptance)) {
    lines.push(acceptance.buildAcceptancePrompt());
  }
  if (task.verificationStatus === 'failed') {
    lines.push('- Unit tests failed; fix and re-run before stopping.');
  }
  if (shouldPromptEngineeringUnitTest(...)) {
    lines.push('- Source code changed but unit tests have not passed yet.');
  }

  return lines.join('\n');
}

4. Execution Mode and Gate Coordination

4.1 Execution Mode

File: src/harness/supervisor/mode-decision-engine.ts

Mode signals

type ModeSignal = 'task_graph_active' | 'pending_steps' | 'multi_write'
                | 'branch_switched' | 'checkpoint_resumed' | 'tool_failure'
                | 'large_diff' | 'explicit_impl';

// Collect the signals that trigger entry into forced mode
shouldEnterForcedMode(state, config, signals): ModeSignal[] {
  const reasons: ModeSignal[] = [];
  if (state.pendingStepCount >= config.pendingStepsEnterThreshold) reasons.push('pending_steps');
  if (state.writeTargetsThisRound > config.writeTargetsEnterThreshold) reasons.push('multi_write');
  if (!state.lastToolSuccess) reasons.push('tool_failure');
  // ...
  return reasons;
}

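Gate coordination

free mode: the LLM decides on tool calls autonomously
forced mode: the gates inject strong prompts, and the LLM must handle gate tasks first
ToolGate: in forced mode, DefaultToolGate.decide() can block specific tools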
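
To make the ToolGate idea concrete, here is a minimal sketch of a gate that blocks a configured set of tools only in forced mode. Everything in it is hypothetical: the document names ToolGate and decide() but does not show their signatures, so the ExecutionMode type, the ToolGateDecision shape, the ForcedModeBlocklistGate class, and the web_search tool name used in the usage note are assumptions.

```typescript
type ExecutionMode = "free" | "forced";

// Hypothetical decision shape returned by a gate.
interface ToolGateDecision {
  allowed: boolean;
  reason?: string;
}

// Hypothetical interface; the project's real ToolGate may take other arguments.
interface ToolGate {
  decide(toolName: string, mode: ExecutionMode): ToolGateDecision;
}

class ForcedModeBlocklistGate implements ToolGate {
  private readonly blockedInForcedMode: ReadonlySet<string>;

  constructor(blockedToolNames: string[]) {
    this.blockedInForcedMode = new Set(blockedToolNames);
  }

  decide(toolName: string, mode: ExecutionMode): ToolGateDecision {
    // In free mode the LLM chooses tools on its own, so nothing is blocked.
    if (mode === "forced" && this.blockedInForcedMode.has(toolName)) {
      return { allowed: false, reason: `${toolName} is blocked while gate work is pending` };
    }
    return { allowed: true };
  }
}
```

For example, new ForcedModeBlocklistGate(["web_search"]).decide("web_search", "forced") is rejected, while the same call in free mode is allowed.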

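5. Test Coverage

5.1 Acceptance Gate Tests

File: test/harness/task-acceptance-tracker.test.ts

Core cases:

Parsing a four-command acceptance chain (benchmark goal)
Activation condition (at least 2 commands plus a long-running goal)
Normalized matching (cd /d X && npm run build → npm run build)
Playwright/Cypress normalized to npm run test:e2e
Background task status transitions (start → running → completed/failed)
Snapshot restore round trip
hasPendingWork integration check

5.2 Verification Gate Tests

File: test/harness/harness-verification-gate.test.ts

Core cases:

Counter reset conditions (blocking lifted / pending decreased / acceptance decreased)
Counter retention (no reset when there is no progress)

5.3 Execution Mode Tests

File: test/harness/execution-mode-acceptance.test.ts

Core cases:

An L0 read-only plan stays in free mode
Multiple file writes or a tool failure enter forced mode
Signal priority ordering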

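6. Key Design Principles

6.1 Layered Gating

Acceptance Gate: top-level multi-step acceptance (benchmarks / complex tasks)
Verification Gate: unit-test verification after code changes
File Deliverable Gate: read-after-write confirmation (non-engineering files)

6.2 Progressive Feedback

Single acceptance command passed: light prompt (✓ plus progress)
All passed: stop signal
Unit tests failed: escalated prompt (no hard block, so the failure can be explained)

6.3 Fault Tolerance and Recovery

Commands may be re-run (repeated npm test runs update the same acceptance entry)
Background tasks are supported (run_command starts in the background, then action:check polls)
Checkpoint snapshot restore (TaskAcceptanceTracker.fromSnapshot)

6.4 Semantic Matching

Normalized command keys (noise stripped, variants unified)
Fuzzy matching (cd /d X && npm run build matches npm run build)
The command takes precedence over the label (background-task check responses use the real command field)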

7. Configuration Parameters

7.1 Execution Mode Parameters (supervisor-config.json)

{
  "executionMode": {
    "pendingStepsEnterThreshold": 2,
    "writeTargetsEnterThreshold": 1,
    "diffLinesEnterThreshold": 200,
    "stableRoundsExitThreshold": 2,
    "modeLockRounds": 2,
    "forcedMinDwellRounds": 1,
    "readonlyToolNames": ["read_file", "glob", "grep", "list_dir"]
  }
}

7.2 Verification Exempt Paths (config.json / .icecoder.json)

{
  "verificationExemptDirs": [
    ".scratch",
    ".temp",
    "tmp/"
  ]
}

8. Typical Scenarios

8.1 Benchmark Four-Command Acceptance Chain

Goal:

Implement a survivors roguelike from scratch.
Only after **`npm ci` → `npm test` → `npm run build` → `npm run test:e2e` all succeed** output the delivery bullets and finish.

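Flow:

Four acceptance commands are parsed and the Acceptance Gate activates
npm ci → npm test → npm run build → npm run test:e2e are run in order
After each pass, ✓ feedback is injected (1/4, 2/4, 3/4)
After the fourth pass, All 4 acceptance commands passed plus the stop signal is injected
hasPendingWork() returns false, allowing the task to end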
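
The flow above can be replayed with a stripped-down tracker. MiniAcceptanceTracker is a hypothetical stand-in for TaskAcceptanceTracker: it uses the label itself as the key (no normalization), matches only entries that have not passed yet, and skips activation and snapshot logic. The run sequence includes one failed npm test to show a re-run.

```typescript
type AcceptanceCommandStatus = "pending" | "passed" | "failed";

interface AcceptanceCommandEntry {
  key: string;
  label: string;
  status: AcceptanceCommandStatus;
  lastRunAt?: number;
}

// Hypothetical, simplified stand-in for TaskAcceptanceTracker.
class MiniAcceptanceTracker {
  readonly commands: AcceptanceCommandEntry[];

  constructor(labels: string[]) {
    this.commands = labels.map(
      (label): AcceptanceCommandEntry => ({ key: label, label, status: "pending" })
    );
  }

  // Update the first matching entry that has not passed yet.
  recordRunCommand(command: string, success: boolean): AcceptanceCommandEntry | undefined {
    const entry = this.commands.find((c) => c.key === command && c.status !== "passed");
    if (!entry) return undefined;
    entry.status = success ? "passed" : "failed";
    entry.lastRunAt = Date.now();
    return entry;
  }

  getPassedCount(): number {
    return this.commands.filter((c) => c.status === "passed").length;
  }

  isComplete(): boolean {
    return this.commands.every((c) => c.status === "passed");
  }
}

const chain = ["npm ci", "npm test", "npm run build", "npm run test:e2e"];
const tracker = new MiniAcceptanceTracker(chain);

// One failing test run, then a successful re-run of the same command.
const runs: [string, boolean][] = [
  ["npm ci", true],
  ["npm test", false],
  ["npm test", true],
  ["npm run build", true],
  ["npm run test:e2e", true],
];

const progress: string[] = [];
for (const [command, success] of runs) {
  tracker.recordRunCommand(command, success);
  progress.push(`${tracker.getPassedCount()}/${tracker.commands.length}`);
}
```

After the loop, progress reads 1/4, 1/4, 2/4, 3/4, 4/4 and tracker.isComplete() is true: the failed run leaves the count unchanged, and the re-run moves the same entry to passed.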

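8.2 Engineering Source Change

Scenario: src/foo.ts is modified and unit tests have not been run

Flow:

write_file('src/foo.ts') → verificationStatus = 'required'
On the next round, isVerificationBlockingFinal() returns true
buildVerificationPrompt() is injected, prompting a unit-test run
npm test is run → verificationStatus = 'passed' or 'failed'
On failure, buildFailedUnitTestReminderPrompt() is injected as an escalated prompt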

8.3 Long-Running Background Test

Scenario: npm run test:e2e takes 5 minutes

Flow:

run_command('npm run test:e2e 2>&1') → returns background_start
recordRunCommandToolResult(background_start) → stays pending
Poll with action:check → background_running
Final action:check → background_completed(exitCode: 0)
recordRunCommandToolResult(background_completed) → passed

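9. Extension Points

9.1 Adding New Acceptance Command Types

Add normalization rules in normalizeAcceptanceCommandKey()
Add command matching in isHarnessVerificationCommand()

9.2 Custom Gating Policies

Implement the ToolGate interface with custom decide() logic
Extend the ExecutionModeConfig parameters to tune thresholds

9.3 Multi-Language Support

Internationalize the text of buildAcceptanceSuccessFeedbackMessage()
Add multi-language templates for buildIncompleteContinuationPrompt()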

Generated: 2026-06-12. Scope: acceptance gating mechanism (Acceptance Gate + Verification Gate)