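Adding a Verification Gate to Agent Browser Tasks: How to Pause Gracefully on a Verification Page

One of the most underestimated states in an agent browser task is the verification page.

The task is running an ordinary flow: open a page, read its state, click a button, submit a form. Partway through, a security check, an expired login, an abnormal-behavior notice, or a manual confirmation page appears. If the workflow does not recognize this kind of state, the agent may keep reasoning and keep clicking, and end up reporting a result that looks complete but is actually unusable.

The problem is not that the prompt lacks detail. The workflow lacks an explicit Verification Gate.

1. A verification page is a workflow state, not an error popup

Many automation scripts treat a verification page as an ordinary failure: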

element not found
timeout
click failed
unknown page state

From a system design perspective, though, a verification page should be a first-class state.

ready
  -> running
  -> verification_required
  -> human_review
  -> resumed / stopped

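With this design, the task does not keep executing while it is in a verification state, and a verification page is not mistaken for a business page.

More importantly, it reframes the problem from "the script failed" to "the task has entered a state that needs human handling."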
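One way to make the state diagram enforceable is a transition table. The sketch below is an assumption layered on the diagram, not part of the original design: `allowedTransitions` and `canTransition` are hypothetical names, the `failed` status is added for ordinary errors, and the exact edges (for example, letting a reviewed run return to `ready`) are one possible reading.

```typescript
type RunStatus =
  | "ready"
  | "running"
  | "verification_required"
  | "human_review"
  | "resumed"
  | "stopped"
  | "failed";

// Assumed transition table: each status lists the statuses it may move to.
const allowedTransitions: Record<RunStatus, readonly RunStatus[]> = {
  ready: ["running"],
  running: ["verification_required", "stopped", "failed"],
  // A run that hit a verification page can only go to a human.
  verification_required: ["human_review"],
  human_review: ["resumed", "stopped", "ready"],
  resumed: ["running", "verification_required", "stopped", "failed"],
  stopped: [],
  failed: [],
};

// Returns true only when the table explicitly allows the move.
function canTransition(from: RunStatus, to: RunStatus): boolean {
  return allowedTransitions[from].includes(to);
}
```

With a table like this, a run in `verification_required` cannot slip back into `running` without passing through `human_review`.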

2. Define the core types first

Start by modeling a single agent browser run as a few objects.

type RunStatus =
  | "ready"
  | "running"
  | "verification_required"
  | "human_review"
  | "resumed"
  | "stopped"
  | "failed";

type VerificationType =
  | "login_required"
  | "security_check"
  | "network_or_behavior_anomaly"
  | "permission_required"
  | "unknown";

type BrowserRun = {
  runId: string;
  jobId: string;
  workspaceId: string;
  profileId: string;
  status: RunStatus;
  startedAt: string;
  updatedAt: string;
};

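The point here is not how complete the types are, but that the "verification state" becomes part of the workflow instead of being scattered across log strings.

3. EnvironmentSnapshot is the starting point for diagnosis

When a verification page appears, the first thing to do is neither to keep clicking nor to retry immediately, but to save a snapshot of the environment.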

type EnvironmentSnapshot = {
  id: string;
  runId: string;
  profile: {
    profileId: string;
    browserVersion: string;
    extensionsHash?: string;
  };
  session: {
    cookieStatus: "valid" | "expired" | "unknown";
    localStorageStatus: "ready" | "missing" | "unknown";
    indexedDbStatus: "ready" | "missing" | "unknown";
  };
  network: {
    proxyId?: string;
    region?: string;
    timezone: string;
    language: string;
    webrtcPolicy?: string;
  };
  createdAt: string;
};

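It mainly answers a few questions:

whether the task is running in the expected Profile;
whether cookies, LocalStorage, and IndexedDB are usable;
whether the proxy, time zone, and language match the workspace;
whether the browser or its extensions have changed;
whether this anomaly is related to a recent environment change.

Without an environment snapshot, every problem you see afterwards can only be guessed at.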
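The questions above can be turned into a mechanical comparison. The following is a minimal sketch under stated assumptions: `WorkspaceExpectation` and `findEnvironmentMismatches` are hypothetical names that do not appear in the original types, and `SnapshotView` is a reduced copy of `EnvironmentSnapshot` that keeps only the fields being compared.

```typescript
// Reduced view of EnvironmentSnapshot: only the fields compared here.
type SnapshotView = {
  profile: { profileId: string; browserVersion: string };
  session: { cookieStatus: "valid" | "expired" | "unknown" };
  network: { proxyId?: string; timezone: string; language: string };
};

// Hypothetical per-workspace expectation (an assumption for illustration).
type WorkspaceExpectation = {
  profileId: string;
  browserVersion: string;
  proxyId?: string;
  timezone: string;
  language: string;
};

// Returns the dotted paths of every field that differs from the expectation.
function findEnvironmentMismatches(
  snapshot: SnapshotView,
  expected: WorkspaceExpectation
): string[] {
  const mismatches: string[] = [];
  if (snapshot.profile.profileId !== expected.profileId) mismatches.push("profile.profileId");
  if (snapshot.profile.browserVersion !== expected.browserVersion) mismatches.push("profile.browserVersion");
  if (snapshot.session.cookieStatus !== "valid") mismatches.push("session.cookieStatus");
  // Only compare the proxy when the workspace pins one.
  if (expected.proxyId !== undefined && snapshot.network.proxyId !== expected.proxyId) {
    mismatches.push("network.proxyId");
  }
  if (snapshot.network.timezone !== expected.timezone) mismatches.push("network.timezone");
  if (snapshot.network.language !== expected.language) mismatches.push("network.language");
  return mismatches;
}

const exampleMismatches = findEnvironmentMismatches(
  {
    profile: { profileId: "p-1", browserVersion: "124.0" },
    session: { cookieStatus: "expired" },
    network: { proxyId: "proxy-a", timezone: "Asia/Shanghai", language: "zh-CN" },
  },
  { profileId: "p-1", browserVersion: "124.0", proxyId: "proxy-a", timezone: "Europe/Berlin", language: "zh-CN" }
);
// exampleMismatches is ["session.cookieStatus", "network.timezone"]
```

An empty result does not prove the environment is healthy; it only says that none of the compared fields drifted.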

4. How the Verification Gate decides

The Verification Gate can be a standalone step, placed before and after every high-risk action.

type PageSignal = {
  url: string;
  title?: string;
  screenshotPath?: string;
  visibleTexts: string[];
  statusCode?: number;
};

type VerificationResult =
  | { ok: true }
  | {
      ok: false;
      type: VerificationType;
      reason: string;
      reviewRequired: true;
    };

A simple implementation:

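// Scans the visible page text for verification prompts and reports the first match.
// The keywords are matched against Chinese-language pages, so the literals stay in Chinese.
function detectVerification(signal: PageSignal): VerificationResult {
  const text = signal.visibleTexts.join(" ");

  // "安全验证" = "security verification", "请进行验证" = "please verify"
  if (text.includes("安全验证") || text.includes("请进行验证")) {
    return {
      ok: false,
      type: "security_check",
      reason: "security verification prompt detected",
      reviewRequired: true
    };
  }

  // "登录" = "log in", "密码" = "password"; both must appear to count as a login page
  if (text.includes("登录") && text.includes("密码")) {
    return {
      ok: false,
      type: "login_required",
      reason: "login page detected",
      reviewRequired: true
    };
  }

  // "网络环境" = "network environment", "行为异常" = "abnormal behavior"
  if (text.includes("网络环境") || text.includes("行为异常")) {
    return {
      ok: false,
      type: "network_or_behavior_anomaly",
      reason: "network or behavior anomaly prompt detected",
      reviewRequired: true
    };
  }

  // No known verification prompt found.
  return { ok: true };
}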

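The purpose of this code is not to handle verification automatically, but to stop the task promptly and hand the scene over for human review.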
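Placing the gate "before and after" a high-risk action could look roughly like the wrapper below. This is a sketch, not a definitive implementation: `runGuardedAction`, `GuardedOutcome`, and the injected `collectSignal`, `detect`, and `action` callbacks are hypothetical names, and `detect` stands in for a detector such as `detectVerification` with a simplified result type.

```typescript
type PageSignal = { url: string; visibleTexts: string[] };
type GateResult = { ok: true } | { ok: false; reason: string };

type GuardedOutcome =
  | { status: "completed" }
  | { status: "blocked"; phase: "before" | "after"; reason: string };

// Runs one action with a gate check on both sides of it.
async function runGuardedAction(deps: {
  collectSignal: () => Promise<PageSignal>;
  detect: (signal: PageSignal) => GateResult;
  action: () => Promise<void>;
}): Promise<GuardedOutcome> {
  // Check before acting: never click on a page that is already a verification page.
  const before = deps.detect(await deps.collectSignal());
  if (before.ok === false) {
    return { status: "blocked", phase: "before", reason: before.reason };
  }

  await deps.action();

  // Check again afterwards: the action itself may have triggered verification.
  const after = deps.detect(await deps.collectSignal());
  if (after.ok === false) {
    return { status: "blocked", phase: "after", reason: after.reason };
  }

  return { status: "completed" };
}
```

Recording the `phase` tells the reviewer whether the action was never attempted or whether it ran and then triggered the prompt, which changes what needs to be checked by hand.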

5. What to do on entering the verification state

When detectVerification returns ok: false, the workflow should do five things:

1. pause current run
2. save current screenshot
3. save EnvironmentSnapshot
4. classify VerificationType
5. create ReviewItem

This can be written as a handler function:

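// One entry in the human review queue, linking the run to its evidence.
type ReviewItem = {
  id: string;
  runId: string;
  type: VerificationType;
  reason: string;
  screenshotPath?: string;
  environmentSnapshotId: string;
  status: "pending" | "resolved" | "rejected";
  createdAt: string;
};

// Builds the review item and the paused run. Nothing is persisted here;
// the function is async so that storage calls can be awaited when they are added.
async function handleVerificationRequired(input: {
  run: BrowserRun;
  result: Exclude<VerificationResult, { ok: true }>;
  snapshot: EnvironmentSnapshot;
  screenshotPath?: string;
}) {
  const reviewItem: ReviewItem = {
    // Uses the global Web Crypto API (modern browsers and recent Node.js versions);
    // on older Node.js, import randomUUID from "node:crypto" instead.
    id: crypto.randomUUID(),
    runId: input.run.runId,
    type: input.result.type,
    reason: input.result.reason,
    screenshotPath: input.screenshotPath,
    environmentSnapshotId: input.snapshot.id,
    status: "pending",
    createdAt: new Date().toISOString()
  };

  return {
    // Return a copy of the run marked as waiting for a human; the input is not mutated.
    run: {
      ...input.run,
      status: "human_review" as const,
      updatedAt: new Date().toISOString()
    },
    reviewItem
  };
}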

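The key point here: do not continue executing subsequent steps.

A verification state usually means the current context is no longer suitable for continuing automatically. More clicking, more submitting, and more retrying only make the investigation harder.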
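To illustrate "do not continue," here is a small sketch of a step runner that stops at the first failed gate check and reports which steps were never attempted. The names `Step`, `GateCheck`, `RunReport`, and `runStepsWithGate` are assumptions made for this example.

```typescript
type Step = { name: string; run: () => Promise<void> };
type GateCheck = () => Promise<{ ok: true } | { ok: false; reason: string }>;

type RunReport = {
  executed: string[];     // steps that actually ran
  skipped: string[];      // steps never attempted because the gate closed
  pausedReason?: string;  // set only when the run was paused
};

// Executes steps in order, checking the gate before each one.
async function runStepsWithGate(steps: Step[], checkGate: GateCheck): Promise<RunReport> {
  const executed: string[] = [];
  for (let i = 0; i < steps.length; i++) {
    const gate = await checkGate();
    if (gate.ok === false) {
      // Stop here: no retry, no further clicks. The remaining steps are reported as skipped.
      return {
        executed,
        skipped: steps.slice(i).map((s) => s.name),
        pausedReason: gate.reason,
      };
    }
    await steps[i].run();
    executed.push(steps[i].name);
  }
  return { executed, skipped: [] };
}
```

The `skipped` list gives the reviewer an explicit record of what has not happened yet, so a later resume can start from a known position.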

6. Don't let FailureReason be just unknown

Without classification, every problem eventually ends up as unknown.

At a minimum, distinguish the following types:

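| failure_type | Meaning | Handling |
| --- | --- | --- |
| verification_required | Human verification is needed | Pause the task and move it into the ReviewQueue |
| session_invalid | The session is unusable | Check cookies, LocalStorage, and login state |
| env_mismatch | The environment does not match | Check Profile, proxy, time zone, and language |
| page_changed | The page structure changed | Check the DOM, selectors, and page version |
| action_rate_limited | The action pace is restricted | Lower the frequency; add waits and review |
| agent_uncertain | The agent is unsure of its judgment | Save evidence and move to human review |
| unknown | Unclassified anomaly | Inspect the screenshot, trace, and recent changes |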
This gives follow-up handling a clear direction.
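The table can be encoded so that the compiler rejects an unhandled failure type. In this sketch the `FailureType` values come from the table, while `NextStep`, `nextStepFor`, and `toFailureReason` are hypothetical names chosen for illustration.

```typescript
type FailureType =
  | "verification_required"
  | "session_invalid"
  | "env_mismatch"
  | "page_changed"
  | "action_rate_limited"
  | "agent_uncertain"
  | "unknown";

// Assumed shorthand for the "Handling" column of the table.
type NextStep =
  | "pause_for_review"
  | "check_session"
  | "check_environment"
  | "check_selectors"
  | "slow_down"
  | "inspect_trace";

// Record<FailureType, ...> forces every failure type to have an entry.
const nextStepFor: Record<FailureType, NextStep> = {
  verification_required: "pause_for_review",
  session_invalid: "check_session",
  env_mismatch: "check_environment",
  page_changed: "check_selectors",
  action_rate_limited: "slow_down",
  agent_uncertain: "pause_for_review",
  unknown: "inspect_trace",
};

type FailureReason = { type: FailureType; message: string; nextStep: NextStep };

// Attaches the suggested next step to a classified failure.
function toFailureReason(type: FailureType, message: string): FailureReason {
  return { type, message, nextStep: nextStepFor[type] };
}
```

Adding a new member to `FailureType` without extending `nextStepFor` becomes a compile error, which keeps new failures from silently falling back to unknown.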

7. ReviewQueue is the brake of the agent workflow

In an agent workflow, human review is not a step backward; it is the braking system.

// What a reviewer can decide; the note records why.
type ReviewDecision =
  | { action: "resume"; note: string }
  | { action: "stop"; note: string }
  | { action: "retry_after_fix"; note: string };

// Returns a copy of the run with the status implied by the decision.
function applyReviewDecision(run: BrowserRun, decision: ReviewDecision): BrowserRun {
  if (decision.action === "resume") {
    return {
      ...run,
      status: "resumed",
      updatedAt: new Date().toISOString()
    };
  }

  if (decision.action === "stop") {
    return {
      ...run,
      status: "stopped",
      updatedAt: new Date().toISOString()
    };
  }

  // retry_after_fix: put the run back to "ready" so it can start again after the fix.
  return {
    ...run,
    status: "ready",
    updatedAt: new Date().toISOString()
  };
}

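This can be aligned with the team's process:

after a person completes the official verification, choose resume;
if the environment turns out to be mismatched, choose retry_after_fix;
if the task should not continue, choose stop.

This is far more stable than retrying indefinitely.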
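For decisions to remain traceable, each one can be stored together with who made it and when. The sketch below is one possible shape under stated assumptions: `DecisionRecord`, `statusAfterDecision`, `decide`, and the `reviewer` field are hypothetical and not part of the types defined earlier.

```typescript
type RunStatus = "ready" | "running" | "human_review" | "resumed" | "stopped";
type ReviewAction = "resume" | "stop" | "retry_after_fix";

// Hypothetical audit entry for one review decision.
type DecisionRecord = {
  runId: string;
  reviewer: string;
  action: ReviewAction;
  note: string;
  decidedAt: string; // ISO 8601 timestamp
};

const statusAfterDecision: Record<ReviewAction, RunStatus> = {
  resume: "resumed",
  stop: "stopped",
  retry_after_fix: "ready",
};

// Applies a decision only to a run that is actually waiting for review,
// and returns the audit record alongside the new status.
function decide(
  run: { runId: string; status: RunStatus },
  reviewer: string,
  action: ReviewAction,
  note: string
): { status: RunStatus; record: DecisionRecord } {
  if (run.status !== "human_review") {
    throw new Error(`run ${run.runId} is not awaiting review (status: ${run.status})`);
  }
  return {
    status: statusAfterDecision[action],
    record: { runId: run.runId, reviewer, action, note, decidedAt: new Date().toISOString() },
  };
}
```

Rejecting decisions for runs that are not in `human_review` prevents a stale review screen from restarting a run that has already been stopped.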

8. Recommended data flow

create BrowserRun
  -> collect EnvironmentSnapshot
  -> open target page
  -> collect PageSignal
  -> detectVerification
  -> if ok: continue workflow
  -> if not ok: pause run and create ReviewItem
  -> human review
  -> resume / retry_after_fix / stop

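In this flow, the agent's job is to recognize the state, preserve evidence, and pause the task, not to complete the verification in place of a human.

9. What the tooling layer should provide

To bring this flow down to the tooling layer, you need at least: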

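a fixed mapping between Profile and workspace;
saved check results for cookies, LocalStorage, and IndexedDB;
a record of the proxy, time zone, language, and WebRTC policy;
a RunTrace generated for every task;
screenshots saved at key steps;
detection of verification states, followed by a pause;
a ReviewQueue;
a record of how each human review was resolved.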
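As an illustration of the RunTrace and screenshot items in this list, a trace can be as small as an append-only list of events per run. The class below is a sketch: `RunTrace`'s methods and the `TraceEvent` fields are assumed names, and storage is in memory only.

```typescript
type TraceEvent = {
  step: string;
  at: string; // ISO 8601 timestamp
  screenshotPath?: string;
  note?: string;
};

// Append-only, in-memory trace for a single run (assumed shape).
class RunTrace {
  readonly runId: string;
  private readonly events: TraceEvent[] = [];

  constructor(runId: string) {
    this.runId = runId;
  }

  // Appends one event, stamped with the current time.
  record(step: string, details: { screenshotPath?: string; note?: string } = {}): TraceEvent {
    const event: TraceEvent = { step, at: new Date().toISOString(), ...details };
    this.events.push(event);
    return event;
  }

  // Most recent screenshot, which is the one a reviewer usually wants first.
  lastScreenshot(): string | undefined {
    for (let i = this.events.length - 1; i >= 0; i--) {
      const path = this.events[i].screenshotPath;
      if (path !== undefined) return path;
    }
    return undefined;
  }

  // Serializable copy, so callers cannot mutate the internal list.
  toJSON(): { runId: string; events: TraceEvent[] } {
    return { runId: this.runId, events: [...this.events] };
  }
}
```

A real tool would persist these events rather than hold them in memory, but the same shape is enough to answer which step the run reached and what the page looked like.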

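Web4Browser, as one implementation direction for this kind of agent-ready browser environment, can serve as a sample to observe: it puts the browser environment, Profile management, agent workflow, and local-first data control into a single workbench, which suits browser tasks like these that need to be reviewable after the fact.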

10. Review checklist

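| Check | Passing criterion |
| --- | --- |
| Does a Verification Gate exist | It recognizes verification pages, login pages, and anomaly notices |
| Is the EnvironmentSnapshot complete | Profile, session, and proxy information are all present |
| Is a screenshot saved | A screenshot of the current page exists when the verification state appears |
| Is FailureReason classified | It is not just unknown |
| Is the ReviewQueue wired in | The task can pause when human handling is needed |
| Are decisions traceable | resume / stop / retry are recorded |
| Does the agent stop acting | No further clicks or submissions happen in the verification state |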

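Closing

Not every step of an agent browser task should continue automatically.

When a verification page, an expired login, or an environment anomaly notice appears, the most robust engineering design is not to keep clicking but to enter the Verification Gate: pause, preserve evidence, classify, and hand off to human review.

That way the task is not merely able to run; it can be explained, replayed, and stopped when it reaches a critical state.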