从演示代码到可复用包
Article 19 用一个 900 行的 harness_full_demo.py 演示了 8 层防护。这个文件足够说明概念,但不适合复用:所有层耦合在一起,无法单独测试,无法被其他项目引用。
生产级 Agent 项目需要的是一个可以 import 的包:
scss
harness/
├── __init__.py 公共 API 导出
├── registry.py Layer 2:ActionRegistry + PermissionLevel
├── budget.py Layer 3:PermissionBudget(含 refund())
├── sandbox.py Layer 4:sanitise_input + sandboxed_eval
├── audit.py Layer 6:ImmutableAuditLog(哈希链)
├── rollback.py Layer 7:RollbackCoordinator
└── harness.py 统一入口:AgentHarness
本文从包设计开始,覆盖三个关键 API 决策,最后展示两种集成方式:纯 Python standalone 和 LangGraph 图嵌入。
模块设计
registry.py --- Layer 2
python
class PermissionLevel(Enum):
READ = 1
WRITE = 2
ADMIN = 3
IRREVERSIBLE = 4
@dataclass
class RegisteredAction:
name: str
level: PermissionLevel
budget_cost: int
description: str
handler: Any # Callable 或 BaseTool
class ActionRegistry:
def register(self, action: RegisteredAction) -> None: ...
def get(self, name: str) -> RegisteredAction: ... # 不存在 → PermissionError
def is_allowed(self, name: str) -> bool: ...
def names(self) -> list[str]: ...
get() 而非 __getitem__:统一抛出 PermissionError,不暴露 KeyError 内部细节。
budget.py --- Layer 3
python
class PermissionBudget:
def spend(self, action_name: str, cost: int) -> None:
if self.remaining < cost:
raise BudgetExhaustedError(...)
self.remaining -= cost
def refund(self, action_name: str, cost: int) -> None:
self.remaining = min(self.total, self.remaining + cost)
新增 refund():Article 19 中预算在审批前扣除、被拒绝后不退还。生产包修正了这个设计------IRREVERSIBLE 操作被拦截时,harness.py 主动调用 refund(),保持预算精度。
sandbox.py --- Layer 4
python
INJECTION_PATTERN = re.compile(
r"(ignore.*(previous|above|prior)|forget.*instruction|"
r"you are now|act as|jailbreak|bypass|"
r"override.*system|system.*override|" # 两种词序都覆盖
r"</s>|\n\n###|###\s*system|<\|im_start\|>|system prompt)",
re.IGNORECASE,
)
注意两个细节:
- 同时覆盖
SYSTEM OVERRIDE(system 在前)和override.*system(override 在前) \n\n###匹配真实换行,不是字面量\\n\\n###
这两个 bug 在 Article 21 的对抗测试中被发现并修复。
audit.py --- Layer 6
python
class ImmutableAuditLog:
def log(self, action, actor, target, result, metadata=None) -> str:
entry = {..., "prev_hash": self._last_hash}
entry["hash"] = self._hash(json.dumps(entry, sort_keys=True) + self._last_hash)
with self._path.open("a") as f: # append-only
f.write(json.dumps(entry) + "\n")
return entry["hash"]
def verify_integrity(self) -> bool:
# 重放哈希链,任何字段变动都会返回 False
...
__len__() 辅助方法让测试可以直接用 len(audit) 检查条目数。
rollback.py --- Layer 7
python
class RollbackCoordinator:
@contextmanager
def transaction(self, state: dict, op_name: str):
snapshot = copy.deepcopy(state)
self._snapshots.append({"op": op_name, "snapshot": snapshot})
try:
yield state
except Exception:
state.clear()
state.update(snapshot)
self._snapshots.pop()
raise
def rollback_last(self, state: dict) -> str | None:
"""人工触发:撤销最近一次已提交的事务。"""
if not self._snapshots:
return None
entry = self._snapshots.pop()
state.clear()
state.update(entry["snapshot"])
return entry["op"]
rollback_last() 支持人工回滚:事务成功提交后,快照仍然保留,直到人工确认或显式清除。
统一入口:AgentHarness
python
class AgentHarness:
def __init__(self, budget: int = 100, log_path: str = ...):
self.registry = ActionRegistry()
self.budget = PermissionBudget(total=budget)
self.audit = ImmutableAuditLog(log_path=log_path)
self.rollback = RollbackCoordinator()
self._state: dict = {}
def execute(self, action_name: str, actor: str = "agent", **kwargs) -> Any:
# Layer 4: 净化字符串参数
# Layer 2: 注册表检查(不存在 → PermissionError)
# Layer 3: 预算扣除(不足 → BudgetExhaustedError)
# Layer 5: IRREVERSIBLE → 退还预算 + 抛出 HumanApprovalRequired
# Layer 7: WRITE/ADMIN 包裹在 rollback.transaction 中
# Layer 6: 审计记录
...
def approve_and_execute(self, action_name: str, actor: str = "human", **kwargs) -> Any:
"""捕获 HumanApprovalRequired 后调用此方法完成执行。"""
...
两个方法分离的设计理由:
execute()是自动化路径:所有检查通过即执行approve_and_execute()是人工路径:调用方明确表示"已审批"
将两者合并(如用一个 approved=False 参数)会让意图模糊,且更难测试。
Standalone 使用
基本流程
python
harness = AgentHarness(budget=50)
# 注册动作
harness.registry.register(RegisteredAction(
"read_ticket", PermissionLevel.READ, 1, "Read Jira ticket", handler_fn))
harness.registry.register(RegisteredAction(
"write_draft", PermissionLevel.WRITE, 3, "Write draft fix", handler_fn))
harness.registry.register(RegisteredAction(
"create_pr", PermissionLevel.ADMIN, 8, "Open pull request", handler_fn))
harness.registry.register(RegisteredAction(
"merge_to_main", PermissionLevel.IRREVERSIBLE, 20, "Merge to main", handler_fn))
READ → WRITE → ADMIN 正常流程:
python
r1 = harness.execute("read_ticket", ticket_id="BUG-101")
r2 = harness.execute("write_draft", ticket_id="BUG-101", patch="fix: add null check")
r3 = harness.execute("create_pr", ticket_id="BUG-101", title="fix: BUG-101")
# Budget: 38/50 remaining (read=1 + write=3 + admin=8 = 12 spent)
未注册动作被拦截
python
try:
harness.execute("delete_all_data")
except PermissionError as e:
# "Action 'delete_all_data' not in registry. Execution blocked."
...
IRREVERSIBLE 两阶段执行
python
try:
harness.execute("merge_to_main", pr_id=1)
except HumanApprovalRequired as e:
print(e.action_name) # "merge_to_main"
print(e.action_args) # {"pr_id": 1}
# 人工确认后:
result = harness.approve_and_execute("merge_to_main", pr_id=1)
关键 :execute() 捕获到 IRREVERSIBLE 时先调用 budget.refund(),预算净消耗为 0。只有 approve_and_execute() 才真正扣费。
预算耗尽
python
# budget=5,write cost=3
h = AgentHarness(budget=5)
h.execute("write_draft", ...) # OK,剩余 2
h.execute("write_draft", ...) # BudgetExhaustedError: need 3, remaining 2
LangGraph 集成
在 LangGraph 的 tools_node 中嵌入 harness:
python
def tools_node(state: HState) -> dict:
last = state["messages"][-1]
results = []
for tc in last.tool_calls:
name, args = tc["name"], tc["args"]
try:
reg = harness.registry.get(name) # Layer 2
harness.budget.spend(name, reg.budget_cost) # Layer 3
if reg.level == PermissionLevel.IRREVERSIBLE:
decision = interrupt({...}) # Layer 5: LangGraph 原语
if decision != "approved":
harness.budget.refund(name, reg.budget_cost)
harness.audit.log(name, "checkpoint", ..., "HUMAN_REJECTED")
results.append(ToolMessage(content="rejected", ...))
continue
if reg.level in (WRITE, ADMIN):
with harness.rollback.transaction(harness._state, name): # Layer 7
output = TOOL_MAP[name].invoke(args)
else:
output = TOOL_MAP[name].invoke(args)
harness.audit.log(name, "agent", ..., "EXECUTED") # Layer 6
results.append(ToolMessage(content=str(output), ...))
except PermissionError as e:
harness.audit.log(name, "registry", ..., "BLOCKED")
results.append(ToolMessage(content=str(e), ...))
except BudgetExhaustedError as e:
results.append(ToolMessage(content=str(e), ...))
return {"messages": results}
tools_node 是 harness 的天然接入点:它在工具执行前介入,而不影响 agent_node(推理层)的任何逻辑。
Article 21 的测试结果(45/45)
这个包的行为由 Article 21 的测试套件完整验证:
scss
Functional (Layer 1--7 basic behaviour) ████████████████████████████████ 19/19 PASS
Adversarial (injection / escalation) ████████████████████████████████ 17/17 PASS
Chaos (fault injection / partial) ████████████████████████████████ 9/ 9 PASS
Total 45/ 45 tests passed
测试发现了两个真实 bug:
INJECTION_PATTERN只匹配override.*system,漏掉了[SYSTEM OVERRIDE](词序反转)\\n\\n###匹配字面量\n,不匹配真实换行,漏掉### System:jailbreak 模式
两个 bug 均在 sandbox.py 中修复(一行 regex 调整)。
设计 Checklist
包结构
- 每层一个文件,文件只做一件事
-
__init__.py只导出公开 API,内部类不暴露 -
AgentHarness作为门面(Facade),不直接暴露子系统
API 设计
-
execute()是自动化路径,覆盖 Layer 2 → 7 的完整链 -
approve_and_execute()是人工路径,调用方负责表明"已审批" - IRREVERSIBLE 拦截时退还预算(
refund()),保持预算精度 - 所有异常类型(
PermissionError/BudgetExhaustedError/HumanApprovalRequired)从__init__.py导出
Sandbox
- Injection 检测 Pattern 同时覆盖正向和反向词序
-
\n用真实换行字符,不用字面量\\n
LangGraph 集成
- harness 只嵌入
tools_node,不污染agent_node - 每个 tool call 独立走 harness 检查链
- IRREVERSIBLE 使用 LangGraph
interrupt()而非 Python 异常
总结
五个核心结论:
- 模块化是可测试性的前提:单文件无法单独测试某一层;拆包后每个模块可以独立 mock 和验证
- IRREVERSIBLE 拦截退还预算:Article 19 的 bug 修复,"先拦截后扣费"比"拦截后退费"逻辑更清晰,但两种设计都有适用场景------选一种并文档化
execute()+approve_and_execute()分离是边界清晰的体现:自动路径和人工路径分开,调用方意图明确- 测试发现了生产代码的真实 bug:两个 regex 漏洞在写代码时未被发现,对抗测试第一次运行就暴露了
- LangGraph 的
tools_node是 harness 的天然插槽:无需修改 agent 逻辑,只在工具执行层加 harness,关注点分离
参考资料
- LangGraph Tools Node 文档
- 第 17 篇:Harness Engineering 入门------五要素概论
- 第 19 篇:Harness 完整体系------8 层防护框架
- 本系列完整 Demo 代码:agent-19-harness-production
欢迎访问 PrimeSkills ------ 一个精心策划的 AI Agent 与技能市场,所有内容均经过真实企业级工作流验证。没有噱头,只有真正有效的东西。
更多实用知识和有趣产品,欢迎访问我的个人主页