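Overview
The project's Acceptance Gating mechanism is a multi-layer quality-assurance system that ensures a task meets its predefined acceptance criteria before it is delivered. It consists of two core gates:
- Acceptance Gate (acceptance-command gate): for multi-step acceptance flows (e.g., benchmark scenarios)
- Verification Gate: for unit-test verification after code changes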
1. Acceptance Gate (Acceptance-Command Gate)
1.1 Core Architecture
File: src/harness/task-acceptance-tracker.ts
Core class: TaskAcceptanceTracker
Activation condition
```typescript
// Parse acceptance commands from the goal text
const parsed = parseAcceptanceCommandsFromGoal(goal);
// Activate only with ≥2 commands and a long-running benchmark/implementation goal
this.active = parsed.length >= 2 && isLongRunningImplementationGoal(goal);
```
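Command parsing rules
- Multi-step acceptance chain: chained commands such as npm ci → npm test → npm run build → npm run test:e2e are identified automatically from the goal text
- Normalized matching:
  - npx vitest / npx vitest run --reporter=verbose → npm test
  - npx playwright test / npx cypress run → npm run test:e2e
  - Noise such as cd /d X &&, 2>&1, and pipe redirections is stripped
  - Namespaced commands such as npm run test:e2e are preserved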
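The normalization rules above can be approximated with a few regular expressions. This sketch is only illustrative. normalizeCommandKeySketch is a hypothetical name. Its patterns are assumptions rather than the project's normalizeAcceptanceCommandKey(), and it covers only the variants listed.

```typescript
// Hypothetical sketch of acceptance-command normalization; the patterns are illustrative.
function normalizeCommandKeySketch(raw: string): string {
  let cmd = raw.trim();
  // Strip a leading Windows-style "cd /d <dir> &&" prefix.
  cmd = cmd.replace(/^cd\s+\/d\s+\S+\s*&&\s*/i, '');
  // Drop everything after a pipe, e.g. "| tee log.txt".
  cmd = cmd.replace(/\s*\|.*$/, '');
  // Remove stream merges such as "2>&1", then file redirections such as "> out.log".
  cmd = cmd.replace(/\s*\d?>&\d/g, '').replace(/\s*\d?>>?\s*\S+/g, '');
  // Collapse repeated whitespace.
  cmd = cmd.replace(/\s+/g, ' ').trim();

  // Unify test-runner variants onto the npm scripts they stand for.
  if (/^npx vitest(\s|$)/.test(cmd)) return 'npm test';
  if (/^npx (playwright test|cypress run)(\s|$)/.test(cmd)) return 'npm run test:e2e';
  // Namespaced scripts such as "npm run test:e2e" are kept unchanged.
  return cmd;
}
```

For example, `cd /d C:\proj && npm run build 2>&1` reduces to `npm run build` under these patterns.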
State transitions
```typescript
type AcceptanceCommandStatus = 'pending' | 'passed' | 'failed';

interface AcceptanceCommandEntry {
  key: string;        // normalized key
  label: string;      // original text, for display
  status: AcceptanceCommandStatus;
  lastRunAt?: number;
}
```
Status update flow:
- Call recordRunCommand(rawCommand, success) or recordRunCommandToolResult(classifiedResult)
- The command is semantically matched against the first acceptance entry that has not yet passed (pending or failed)
- An AcceptanceTransition (original command text, previous status, new status) is returned
- Background tasks are supported:
  - background_start / background_running → stays pending
  - background_completed (exitCode: 0) → passed
  - background_failed / exitCode ≠ 0 → failed
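Here is a condensed sketch of the update flow, including the background-task mapping. The ClassifiedRunResult shape and its kind values are assumptions modeled on the states named above. AcceptanceTrackerSketch is a hypothetical class. Semantic matching is reduced to an exact comparison of keys that have already been normalized.

```typescript
type AcceptanceCommandStatus = 'pending' | 'passed' | 'failed';

interface AcceptanceCommandEntry {
  key: string;
  label: string;
  status: AcceptanceCommandStatus;
  lastRunAt?: number;
}

// Hypothetical shape of a classified run_command result.
type ClassifiedRunResult =
  | { kind: 'foreground'; key: string; success: boolean }
  | { kind: 'background_start' | 'background_running'; key: string }
  | { kind: 'background_completed'; key: string; exitCode: number }
  | { kind: 'background_failed'; key: string; exitCode?: number };

interface AcceptanceTransition {
  label: string;
  from: AcceptanceCommandStatus;
  to: AcceptanceCommandStatus;
}

// Map a classified result to the status it implies.
function statusFor(r: ClassifiedRunResult): AcceptanceCommandStatus {
  if (r.kind === 'foreground') return r.success ? 'passed' : 'failed';
  if (r.kind === 'background_completed') return r.exitCode === 0 ? 'passed' : 'failed';
  if (r.kind === 'background_failed') return 'failed';
  // background_start / background_running: still running, so stay pending.
  return 'pending';
}

class AcceptanceTrackerSketch {
  commands: AcceptanceCommandEntry[];

  constructor(commands: AcceptanceCommandEntry[]) {
    this.commands = commands;
  }

  // Update the first not-yet-passed entry whose key matches; null if none matches.
  record(r: ClassifiedRunResult, now: number = Date.now()): AcceptanceTransition | null {
    const entry = this.commands.find(c => c.key === r.key && c.status !== 'passed');
    if (!entry) return null;
    const from = entry.status;
    entry.status = statusFor(r);
    entry.lastRunAt = now;
    return { label: entry.label, from, to: entry.status };
  }

  isComplete(): boolean {
    return this.commands.every(c => c.status === 'passed');
  }
}
```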
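Completion check
```typescript
isComplete(): boolean {
  // An inactive tracker never blocks completion
  if (!this.isActive()) return true;
  return this.commands.every(c => c.status === 'passed');
}
```
Key properties:
- The task is complete only when every command has passed
- A single command may run multiple times (e.g., re-running tests)
- Checkpoint snapshot restore is supported: TaskAcceptanceTracker.fromSnapshot(snapshot)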
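Snapshot restore only needs the tracker's serializable state. This sketch assumes the snapshot carries the active flag plus a copy of the command entries. The AcceptanceSnapshot shape and toSnapshot() are hypothetical; only the fromSnapshot name comes from the source.

```typescript
type SnapshotStatus = 'pending' | 'passed' | 'failed';

interface SnapshotEntry {
  key: string;
  label: string;
  status: SnapshotStatus;
  lastRunAt?: number;
}

// Hypothetical snapshot shape: plain, JSON-serializable data.
interface AcceptanceSnapshot {
  active: boolean;
  commands: SnapshotEntry[];
}

class SnapshotTrackerSketch {
  private active: boolean;
  private commands: SnapshotEntry[];

  constructor(active: boolean, commands: SnapshotEntry[]) {
    this.active = active;
    this.commands = commands;
  }

  isComplete(): boolean {
    // An inactive tracker never blocks completion.
    if (!this.active) return true;
    return this.commands.every(c => c.status === 'passed');
  }

  toSnapshot(): AcceptanceSnapshot {
    // Copy entries so later status changes do not mutate the snapshot.
    return { active: this.active, commands: this.commands.map(c => ({ ...c })) };
  }

  static fromSnapshot(snapshot: AcceptanceSnapshot): SnapshotTrackerSketch {
    return new SnapshotTrackerSketch(snapshot.active, snapshot.commands.map(c => ({ ...c })));
  }
}
```

Because the snapshot is plain data, it can be stored in a checkpoint as JSON and restored after a resume.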
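1.2 Feedback Injection
File: src/harness/harness-tool-round.ts
Function: buildAcceptanceSuccessFeedbackMessage
Feedback format
Single command passed (not yet complete):
```text
[System / Acceptance ✓] npm test --- 8 files / 22 tests passed (1/4 passed)
```
All commands passed (completion signal):
```text
[System / Acceptance ✓] npm run test:e2e --- 5 e2e tests passed in 4.4s (4/4 passed)
[System / Acceptance ✓] All 4 acceptance commands passed.
Output ≤10 delivery bullets now and STOP calling tools.
```
Rules
- Single command passed: only ✓ and the progress count are shown; no stop signal is injected
- All commands passed: "All N acceptance commands passed." and the stop instruction are appended
- Command labels are truncated to at most 80 characters, followed by "..."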
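The two feedback shapes and the truncation rule can be sketched as follows. The input fields mirror the call in section 3.1 (newlyPassed, completedAll, passedCount, totalCount). Modeling newlyPassed as label/summary pairs is an assumption, as is the function name. The truncation point used here (keep the first 80 characters, then append "...") is one reading of the rule.

```typescript
interface PassedItem {
  label: string;   // original command text
  summary: string; // e.g. "8 files / 22 tests passed" (assumed to come from result classification)
}

interface FeedbackInput {
  newlyPassed: PassedItem[];
  completedAll: boolean;
  passedCount: number;
  totalCount: number;
}

const MAX_LABEL = 80;

// Keep at most 80 characters of the label and mark the cut with "...".
function truncateLabel(label: string): string {
  return label.length <= MAX_LABEL ? label : label.slice(0, MAX_LABEL) + '...';
}

function buildFeedbackSketch(input: FeedbackInput): string | null {
  if (input.newlyPassed.length === 0) return null;
  const lines = input.newlyPassed.map(
    p => `[System / Acceptance ✓] ${truncateLabel(p.label)} --- ${p.summary} (${input.passedCount}/${input.totalCount} passed)`
  );
  if (input.completedAll) {
    // Only the final pass appends the stop signal.
    lines.push(`[System / Acceptance ✓] All ${input.totalCount} acceptance commands passed.`);
    lines.push('Output ≤10 delivery bullets now and STOP calling tools.');
  }
  return lines.join('\n');
}
```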
2. Verification Gate
2.1 Core Architecture
Files: src/harness/task-state.ts, src/harness/document-deliverable.ts
Verification state machine
```typescript
type VerificationStatus = 'not_required' | 'required' | 'passed' | 'failed';
type TaskPhase = 'intent' | 'context' | 'editing' | 'verification';
```
State transitions:
```text
No changes                      → verificationStatus = 'not_required'
Engineering source written      → verificationStatus = 'required', phase = 'editing' → 'verification'
Unit tests run and pass         → verificationStatus = 'passed'
Unit tests run and fail         → verificationStatus = 'failed'
All acceptance commands passed  → markVerificationPassed() → 'passed'
```
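The transitions above fit a small reducer. The event names are hypothetical labels for the listed triggers. The sketch does not claim to reflect how task-state.ts is organized internally.

```typescript
type VerificationStatus = 'not_required' | 'required' | 'passed' | 'failed';
type TaskPhase = 'intent' | 'context' | 'editing' | 'verification';

interface VerificationState {
  verificationStatus: VerificationStatus;
  phase: TaskPhase;
}

// Hypothetical event names for the triggers listed above.
type VerificationEvent =
  | { type: 'engineering_write' }
  | { type: 'unit_test_run'; success: boolean }
  | { type: 'acceptance_all_passed' };

// No changes yet: verification is not required.
const initialVerificationState: VerificationState = { verificationStatus: 'not_required', phase: 'intent' };

function reduceVerification(s: VerificationState, e: VerificationEvent): VerificationState {
  if (e.type === 'engineering_write') {
    // Writing engineering source requires verification; the phase advances from editing to verification.
    return { verificationStatus: 'required', phase: 'verification' };
  }
  if (e.type === 'unit_test_run') {
    return { ...s, verificationStatus: e.success ? 'passed' : 'failed' };
  }
  // acceptance_all_passed: the counterpart of markVerificationPassed().
  return { ...s, verificationStatus: 'passed' };
}
```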
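Deliverable classification
```typescript
type DeliverableKind = 'engineering' | 'file_deliverable' | 'none';
// Engineering source (requires unit tests)
const ENGINEERING_EXTENSIONS = ['ts', 'tsx', 'js', 'jsx', 'vue', 'py', 'go', ...];
// File deliverables (require confirmation via file_info/read_file):
// all other extensions (json, yaml, md, sql, ...)
```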
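Classification by file extension might look like the sketch below. The source elides part of the engineering extension list (...), so the set holds only the named entries. The aggregation rule is an assumption made for illustration: if any file in the change set is engineering source, the whole set counts as 'engineering'.

```typescript
type DeliverableKind = 'engineering' | 'file_deliverable' | 'none';

// Only the extensions named in the source; the real list is longer.
const ENGINEERING_EXTENSIONS = new Set(['ts', 'tsx', 'js', 'jsx', 'vue', 'py', 'go']);

// Lower-cased extension of the last path segment, or '' if there is none.
function extensionOf(path: string): string {
  const base = path.split(/[\\/]/).pop() ?? '';
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot + 1).toLowerCase() : '';
}

// Hypothetical aggregation over a change set.
function classifyDeliverables(filesChanged: string[]): DeliverableKind {
  if (filesChanged.length === 0) return 'none';
  return filesChanged.some(p => ENGINEERING_EXTENSIONS.has(extensionOf(p)))
    ? 'engineering'
    : 'file_deliverable';
}
```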
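Write-then-read gate
```typescript
// Each write bumps the path's write version
bumpFileDeliverableWriteVersion(path): void;
// A read_file/file_info call confirms the path when the versions match
tryConfirmFileDeliverable(toolName, path, result): void;
// Statistics on paths that still need confirmation
verificationConfirmationStats(filesChanged, writeVersions, confirmVersions): {
  required: number; // total paths that need confirmation
  pending: number;  // paths still awaiting confirmation
  exempt: number;   // exempt paths (temp files/drafts)
}
```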
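One way to implement the write-then-read gate uses two per-path version maps. Every write increments a counter, and a successful read or file_info records the version it saw. The helper names and the isExempt callback are hypothetical. Only the three-field result mirrors verificationConfirmationStats() above.

```typescript
type VersionMap = Map<string, number>;

// Each write to a path increments its write version.
function bumpWriteVersion(writes: VersionMap, path: string): void {
  writes.set(path, (writes.get(path) ?? 0) + 1);
}

// A successful read/file_info confirms whatever write version is current.
function confirmRead(writes: VersionMap, confirms: VersionMap, path: string, ok: boolean): void {
  const v = writes.get(path);
  if (ok && v !== undefined) confirms.set(path, v);
}

function confirmationStats(
  filesChanged: string[],
  writes: VersionMap,
  confirms: VersionMap,
  isExempt: (path: string) => boolean
): { required: number; pending: number; exempt: number } {
  let required = 0, pending = 0, exempt = 0;
  for (const path of filesChanged) {
    if (isExempt(path)) { exempt++; continue; }
    required++;
    // Pending until the confirmed version equals the latest write version.
    const confirmed = confirms.get(path);
    if (confirmed === undefined || confirmed !== writes.get(path)) pending++;
  }
  return { required, pending, exempt };
}
```

Rewriting a file after it was read makes it pending again, which is what forces a fresh read after every write.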
Exempt-path rules:
- isGenericTempPath: .tmp/.bak suffixes; workspace-relative tmp/, temp/, cache/
- isDotPrefixedDirPath: a parent directory starts with . (e.g., .scratch/out.md)
- isEphemeralScriptPath: check-*.ps1, cleanup.ps1, verify-*.sh
- isProjectCustomExemptPath: verificationExemptDirs from config.json / .icecoder.json
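The four exemption rules can be folded into a single predicate. The patterns are inferred from the bullet text; for instance, tmp/, temp/, and cache/ are assumed to match as the leading path segment. customDirs stands in for the verificationExemptDirs setting. Every function name ending in Sketch is hypothetical.

```typescript
// Normalize separators and strip a leading "./".
function toPosix(path: string): string {
  return path.replace(/\\/g, '/').replace(/^\.\//, '');
}

function isGenericTempPathSketch(path: string): boolean {
  const p = toPosix(path);
  return /\.(tmp|bak)$/i.test(p) || /^(tmp|temp|cache)\//i.test(p);
}

function isDotPrefixedDirPathSketch(path: string): boolean {
  // Any parent directory segment starting with "." (e.g. ".scratch/out.md").
  const segments = toPosix(path).split('/');
  return segments.slice(0, -1).some(s => s.startsWith('.') && s !== '.' && s !== '..');
}

function isEphemeralScriptPathSketch(path: string): boolean {
  const name = toPosix(path).split('/').pop() ?? '';
  return /^check-.*\.ps1$/i.test(name) || /^cleanup\.ps1$/i.test(name) || /^verify-.*\.sh$/i.test(name);
}

function isCustomExemptPathSketch(path: string, customDirs: string[]): boolean {
  const p = toPosix(path);
  return customDirs.some(d => {
    const dir = toPosix(d).replace(/\/+$/, '');
    return p === dir || p.startsWith(dir + '/');
  });
}

function isVerificationExemptPathSketch(path: string, customDirs: string[] = []): boolean {
  return (
    isGenericTempPathSketch(path) ||
    isDotPrefixedDirPathSketch(path) ||
    isEphemeralScriptPathSketch(path) ||
    isCustomExemptPathSketch(path, customDirs)
  );
}
```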
2.2 Unit-Test Gate
Files: src/harness/document-deliverable.ts, src/harness/verification-digest.ts
Decision logic
```typescript
shouldPromptEngineeringUnitTest(filesChanged, verificationStatus): boolean {
  if (!hasEngineeringTestTargets(filesChanged)) return false;
  return verificationStatus === 'required'; // unit tests not run yet
}

shouldInjectFailedUnitTestReminder(filesChanged, verificationStatus): boolean {
  if (!hasEngineeringTestTargets(filesChanged)) return false;
  return verificationStatus === 'failed'; // run, but failed
}
```
Identifying engineering source paths
```typescript
engineeringTestTargetPaths(filesChanged): string[] {
  // Engineering sources that are not verification-exempt
  return filesChanged.filter(
    path => isEngineeringUnitTestTargetPath(path) && !isVerificationExemptPath(path)
  );
}
```
Prompt injection
Standard prompt (unit tests not yet run):
```text
[System] You changed source code but have not run unit tests yet.
Run unit tests covered these changed files (pick the command for this project):
- src/foo.ts
- src/bar.ts
Use run_command, then fix failures before claiming the task is complete.
```
Reinforced prompt after a failure:
```text
[System] Unit tests failed for your recent changes.
Please complete unit tests: fix the failures, re-run tests via run_command, and only then finish.
Changed source files:
- src/foo.ts
- src/bar.ts
```
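Both prompts derive from the same inputs: the engineering target paths and the verification status. The sketch reproduces the wording shown above. The function name and the convention of returning null when nothing should be injected are assumptions.

```typescript
type VerificationStatus = 'not_required' | 'required' | 'passed' | 'failed';

// Returns the prompt to inject, or null when no unit-test prompt applies.
function buildUnitTestPromptSketch(targets: string[], status: VerificationStatus): string | null {
  if (targets.length === 0) return null;
  const fileList = targets.map(p => `- ${p}`);
  if (status === 'required') {
    // Source changed and tests have not been run yet.
    return [
      '[System] You changed source code but have not run unit tests yet.',
      'Run unit tests covered these changed files (pick the command for this project):',
      ...fileList,
      'Use run_command, then fix failures before claiming the task is complete.',
    ].join('\n');
  }
  if (status === 'failed') {
    // Tests ran and failed: reinforced reminder.
    return [
      '[System] Unit tests failed for your recent changes.',
      'Please complete unit tests: fix the failures, re-run tests via run_command, and only then finish.',
      'Changed source files:',
      ...fileList,
    ].join('\n');
  }
  // 'passed' and 'not_required' need no prompt.
  return null;
}
```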
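2.3 Verification Gate Counter
File: src/harness/harness-verification-gate.ts
Counter reset rules
```typescript
shouldResetVerificationGateCounter(
  pendingBefore: number, pendingAfter: number, blockingAfter: boolean,
  acceptancePendingBefore: number, acceptancePendingAfter: number
): boolean {
  if (!blockingAfter) return true;                                     // blocking cleared
  if (pendingAfter < pendingBefore) return true;                       // net decrease in pending file confirmations
  if (acceptancePendingAfter < acceptancePendingBefore) return true;   // net decrease in pending acceptance commands
  return false;
}
```
Purpose: this counter keeps the LLM from stopping early while verification is unfinished. It accumulates while no progress is made, and once it reaches a threshold, the gate forces a block.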
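The reset rule pairs naturally with a counter and a threshold. In this sketch, the counter increments on every stop attempt that shows no progress and blocks once it reaches the threshold. The threshold value, the GateSample shape, and the class name are assumptions, because the source states only that the counter accumulates to a threshold.

```typescript
interface GateSample {
  pending: number;           // file confirmations still pending
  blocking: boolean;         // whether verification currently blocks a final answer
  acceptancePending: number; // acceptance commands not yet passed
}

// Same three reset conditions as shouldResetVerificationGateCounter().
function shouldReset(before: GateSample, after: GateSample): boolean {
  if (!after.blocking) return true;
  if (after.pending < before.pending) return true;
  if (after.acceptancePending < before.acceptancePending) return true;
  return false;
}

class VerificationGateCounterSketch {
  private count = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Returns true when the stop attempt must be blocked outright.
  observe(before: GateSample, after: GateSample): boolean {
    if (shouldReset(before, after)) {
      this.count = 0;
      return false;
    }
    this.count++;
    return this.count >= this.threshold;
  }
}
```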
3. Gate Integration Flow
3.1 Harness Tool-Round Loop
File: src/harness/harness-tool-round.ts
Acceptance Gate integration point
```typescript
// Classify the run_command result
const classified = classifyRunCommandResult(args, rawOutput, result.success);
// Update the acceptance tracker
tracker.recordRunCommandToolResult(classified);
// Build the feedback message
const feedback = buildAcceptanceSuccessFeedbackMessage({
  newlyPassed: [...],
  completedAll: tracker.isComplete(),
  passedCount: tracker.getPassedCount(),
  totalCount: tracker.commands.length
});
// Inject it into the LLM context
if (feedback) msgs.push({ role: 'system', content: feedback });
```
Verification Gate integration point
```typescript
// Record the tool result
taskState.recordToolResult(toolCall, result);
// Sync verification state from the Acceptance Gate
syncTaskVerificationFromAcceptance(taskState, tracker);
// Check whether verification blocks the final answer
const acceptanceIncomplete = tracker.hasPendingAcceptanceWork();
const isBlocking = taskState.isVerificationBlockingFinal(acceptanceIncomplete);
// Build and inject the verification prompt
if (isBlocking) {
  const prompt = taskState.buildVerificationPrompt();
  msgs.push({ role: 'system', content: prompt });
}
```
3.2 Completion Check
File: src/harness/incomplete-completion.ts
```typescript
hasPendingWork(task, acceptance, workspaceRoot): boolean {
  if (hasPendingAcceptanceWork(acceptance)) return true;
  if (hasUnfulfilledFileDeliverableGoal(task.goal, task.filesChanged, task.intent)) return true;
  return false;
}
```
Injected when the task is incomplete:
```typescript
buildIncompleteContinuationPrompt(task, repo, acceptance): string {
  const lines = [
    '[System] The task is NOT complete. Do not stop without calling tools.',
    '',
    'Evidence:'
  ];
  if (hasPendingAcceptanceWork(acceptance)) {
    lines.push(acceptance.buildAcceptancePrompt());
  }
  if (task.verificationStatus === 'failed') {
    lines.push('- Unit tests failed; fix and re-run before stopping.');
  }
  if (shouldPromptEngineeringUnitTest(...)) {
    lines.push('- Source code changed but unit tests have not passed yet.');
  }
  return lines.join('\n');
}
```
4. Execution Modes and Gate Coordination
4.1 Execution Mode
File: src/harness/supervisor/mode-decision-engine.ts
Mode signals
```typescript
type ModeSignal =
  | 'task_graph_active' | 'pending_steps' | 'multi_write'
  | 'branch_switched' | 'checkpoint_resumed' | 'tool_failure'
  | 'large_diff' | 'explicit_impl';

// Decide whether to enter forced mode; returns the signals that triggered it
shouldEnterForcedMode(state, config, signals): ModeSignal[] {
  const reasons: ModeSignal[] = [];
  if (state.pendingStepCount >= config.pendingStepsEnterThreshold) reasons.push('pending_steps');
  if (state.writeTargetsThisRound > config.writeTargetsEnterThreshold) reasons.push('multi_write');
  if (!state.lastToolSuccess) reasons.push('tool_failure');
  // ...
  return reasons;
}
```
Gate coordination
- free mode: the LLM decides tool calls on its own
- forced mode: gates inject strong prompts, and the LLM must handle gate tasks first
- ToolGate: DefaultToolGate.decide() can block specific tools in forced mode
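5. Test Coverage
5.1 Acceptance Gate Tests
File: test/harness/task-acceptance-tracker.test.ts
Core cases:
- Parsing the four-command acceptance chain (benchmark goal)
- Activation condition (≥2 commands + long-running goal)
- Normalized matching (cd /d X && npm run build → npm run build)
- Playwright/Cypress normalized to npm run test:e2e
- Background-task state transitions (start → running → completed/failed)
- Snapshot restore round-trip
- hasPendingWork integration check
5.2 Verification Gate Tests
File: test/harness/harness-verification-gate.test.ts
Core cases:
- Counter reset conditions (blocking cleared / pending decreased / acceptance decreased)
- Counter retained (no reset when there is no progress)
5.3 Execution Mode Tests
File: test/harness/execution-mode-acceptance.test.ts
Core cases:
- An L0 read-only plan stays in free mode
- Multi-file writes / tool failures enter forced mode
- Signal priority ordering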
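6. Key Design Principles
6.1 Layered Gates
- Acceptance Gate: top-level multi-step acceptance (benchmarks / complex tasks)
- Verification Gate: unit-test verification after code changes
- File Deliverable Gate: write-then-read confirmation (non-engineering files)
6.2 Progressive Feedback
- One acceptance command passed: light prompt (✓ + progress)
- All passed: stop signal
- Unit tests failed: reinforced prompt (no hard block; the LLM may explain the failure)
6.3 Fault Tolerance and Recovery
- Commands may be re-run (multiple npm test runs count toward the same acceptance item)
- Background tasks are supported (run_command starts a background job, polled with action:check)
- Checkpoint snapshot restore (TaskAcceptanceTracker.fromSnapshot)
6.4 Semantic Matching
- Normalized command keys (noise stripped, variants unified)
- Fuzzy matching (cd /d X && npm run build matches npm run build)
- The command takes precedence over the label (check responses for background tasks use the real command field)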
7. Configuration Parameters
7.1 Execution Mode Parameters (supervisor-config.json)
```json
{
  "executionMode": {
    "pendingStepsEnterThreshold": 2,
    "writeTargetsEnterThreshold": 1,
    "diffLinesEnterThreshold": 200,
    "stableRoundsExitThreshold": 2,
    "modeLockRounds": 2,
    "forcedMinDwellRounds": 1,
    "readonlyToolNames": ["read_file", "glob", "grep", "list_dir"]
  }
}
```
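A typed view of this configuration, together with the entry-signal check from section 4.1, can be sketched as follows. The interface is hypothetical but mirrors the JSON fields. diffLinesThisRound is an assumed state field, and counting a diff of at least diffLinesEnterThreshold lines as large_diff is an assumed comparison. The exit-side parameters (stableRoundsExitThreshold, modeLockRounds, forcedMinDwellRounds) are carried along but not interpreted here.

```typescript
// Hypothetical typed view of the executionMode section.
interface ExecutionModeConfig {
  pendingStepsEnterThreshold: number;
  writeTargetsEnterThreshold: number;
  diffLinesEnterThreshold: number;
  stableRoundsExitThreshold: number;
  modeLockRounds: number;
  forcedMinDwellRounds: number;
  readonlyToolNames: string[];
}

type EntrySignal = 'pending_steps' | 'multi_write' | 'tool_failure' | 'large_diff';

interface RoundState {
  pendingStepCount: number;
  writeTargetsThisRound: number;
  diffLinesThisRound: number; // assumed field
  lastToolSuccess: boolean;
}

// Values copied from the JSON above.
const sampleConfig: ExecutionModeConfig = {
  pendingStepsEnterThreshold: 2,
  writeTargetsEnterThreshold: 1,
  diffLinesEnterThreshold: 200,
  stableRoundsExitThreshold: 2,
  modeLockRounds: 2,
  forcedMinDwellRounds: 1,
  readonlyToolNames: ['read_file', 'glob', 'grep', 'list_dir'],
};

function entrySignals(state: RoundState, config: ExecutionModeConfig): EntrySignal[] {
  const reasons: EntrySignal[] = [];
  if (state.pendingStepCount >= config.pendingStepsEnterThreshold) reasons.push('pending_steps');
  // Strict ">": with a threshold of 1, at least two write targets are needed.
  if (state.writeTargetsThisRound > config.writeTargetsEnterThreshold) reasons.push('multi_write');
  if (!state.lastToolSuccess) reasons.push('tool_failure');
  // Assumed comparison for the large-diff signal.
  if (state.diffLinesThisRound >= config.diffLinesEnterThreshold) reasons.push('large_diff');
  return reasons;
}
```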
7.2 Verification-Exempt Paths (config.json / .icecoder.json)
```json
{
  "verificationExemptDirs": [
    ".scratch",
    ".temp",
    "tmp/"
  ]
}
```
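8. Typical Scenarios
8.1 Benchmark Four-Command Acceptance Chain
Goal:
```text
Implement a survivors roguelike from scratch.
Only after **`npm ci` → `npm test` → `npm run build` → `npm run test:e2e` all succeed**, output the delivery bullets and finish.
```
Flow:
- Four acceptance commands are parsed, and the Acceptance Gate is activated
- npm ci → npm test → npm run build → npm run test:e2e are executed in order
- ✓ feedback is injected after each one passes (1/4, 2/4, 3/4)
- After the fourth passes, "All 4 acceptance commands passed." and the stop signal are injected
- hasPendingWork() returns false, so the task may finish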
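8.2 Engineering Source Change
Scenario: src/foo.ts is modified, but unit tests have not been run
Flow:
- write_file('src/foo.ts') → verificationStatus = 'required'
- On the next round, isVerificationBlockingFinal() returns true
- buildVerificationPrompt() is injected, asking for a unit-test run
- npm test is run → verificationStatus = 'passed' or 'failed'
- On failure, buildFailedUnitTestReminderPrompt() injects the reinforced prompt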
8.3 Long-Running Background Test
Scenario: npm run test:e2e takes 5 minutes
Flow:
- run_command('npm run test:e2e 2>&1') → returns background_start
- recordRunCommandToolResult(background_start) → stays pending
- Poll with action:check → background_running
- Final action:check → background_completed(exitCode: 0)
- recordRunCommandToolResult(background_completed) → passed
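9. Extension Points
9.1 Adding Acceptance Command Types
- Add normalization rules in normalizeAcceptanceCommandKey()
- Add command matching in isHarnessVerificationCommand()
9.2 Custom Gate Policies
- Implement the ToolGate interface with custom decide() logic
- Extend the ExecutionModeConfig parameters to tune thresholds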
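A custom gate policy could be written as shown here. The source names the ToolGate interface and its decide() method but not their signatures, so ToolGateContext, ToolGateDecision, and the deny-list policy itself are assumptions. web_search in the example is a hypothetical tool name.

```typescript
type ExecutionMode = 'free' | 'forced';

// Hypothetical shapes; the real ToolGate signature is not shown in this document.
interface ToolGateContext {
  mode: ExecutionMode;
  toolName: string;
}

type ToolGateDecision = { allow: true } | { allow: false; reason: string };

interface ToolGate {
  decide(ctx: ToolGateContext): ToolGateDecision;
}

// Example policy: in forced mode, deny a configured set of tools; free mode allows everything.
class DenyListToolGate implements ToolGate {
  private readonly denied: Set<string>;

  constructor(deniedInForced: string[]) {
    this.denied = new Set(deniedInForced);
  }

  decide(ctx: ToolGateContext): ToolGateDecision {
    if (ctx.mode === 'forced' && this.denied.has(ctx.toolName)) {
      return { allow: false, reason: `${ctx.toolName} is blocked while gate work is pending` };
    }
    return { allow: true };
  }
}
```

With new DenyListToolGate(['web_search']), a forced-mode call to web_search is refused with a reason. Free mode and all other tools pass through unchanged.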
9.3 Multi-Language Support
- Internationalize the buildAcceptanceSuccessFeedbackMessage() text
- Provide multi-language templates for buildIncompleteContinuationPrompt()
Generated: 2026-06-12. Scope: acceptance gating mechanism (Acceptance Gate + Verification Gate)