[openclaw] OpenClaw Cron Module: An In-Depth Architecture Analysis, Part 3

OpenClaw Cron Module Deep Dive --- Part 3

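7. Isolated Agent

📊 Isolated Agent execution flowchart

Execution engine deep dive

The Isolated Agent execution engine is the core runtime of the Cron system. It turns a scheduled trigger into one complete agent interaction session and manages the whole chain: model selection, session lifecycle, skill snapshots, and delivery scheduling. The subsystem lives in the isolated-agent/ directory and consists of about 15 files that form a layered execution pipeline.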

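7.1 run.ts --- Main entry flow

File path: isolated-agent/run.ts
Lines of code: ~530
Core responsibility: acts as the top-level orchestrator of an isolated agent turn, coordinating a three-stage pipeline: prepare → execute → finalize.

7.1.1 Layered architecture

run.ts follows the classic three-stage prepare-execute-finalize structure:

runCronIsolatedAgentTurn()
  ├─ prepareCronRunContext()   ← Stage 1: prepare the context
  ├─ executeCronRun()          ← Stage 2: run the agent turn (delegated to run-executor.ts)
  └─ finalizeCronRun()         ← Stage 3: finalize (telemetry, delivery, session persistence)

7.1.2 Lazy-loaded runtime pattern

The top of the file declares seven lazily loaded runtime Promise variables:

typescript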
let sessionStoreRuntimePromise: Promise<typeof import("../../config/sessions/store.runtime.js")> | undefined;
let cronExecutorRuntimePromise: ...;
let cronExternalContentRuntimePromise: ...;
let cronAuthProfileRuntimePromise: ...;
let cronContextRuntimePromise: ...;
let cronModelCatalogRuntimePromise: ...;
let cronDeliveryRuntimePromise: ...;

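Each runtime has a matching async loadXxxRuntime() function that fills its cache with ??=, so the work happens only once: the first call triggers a dynamic import(), and every later call reuses the cached Promise. The pattern buys three things:

  1. Faster startup: run.ts is imported by many modules, and statically importing every runtime would slow down cold start.
  2. Breaking circular dependencies: imports of heavyweight runtimes (such as run-executor.runtime.js and run-delivery.runtime.js) are deferred until they are actually called.
  3. Testability: unit tests can set OPENCLAW_TEST_FAST=1 to skip loading the runtimes.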
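A minimal sketch of the ??= caching pattern, assuming a made-up ExecutorRuntime type and a hypothetical importExecutorRuntime() function that stands in for the dynamic import() in run.ts. It shows that the expensive load runs once and that all callers share the same Promise.

```typescript
// Hypothetical shape of a heavyweight runtime module (illustration only).
type ExecutorRuntime = { runAgent: (prompt: string) => string };

// Counts how many times the expensive load really executes.
let loadCount = 0;

// Stand-in for `import("./run-executor.runtime.js")`; hypothetical.
async function importExecutorRuntime(): Promise<ExecutorRuntime> {
  loadCount += 1;
  return { runAgent: (prompt) => `ran: ${prompt}` };
}

// Module-level cache: undefined until the first call.
let executorRuntimePromise: Promise<ExecutorRuntime> | undefined;

// `??=` assigns only while the cache is undefined, so later callers share the
// same Promise, even if the first load is still pending.
function loadExecutorRuntime(): Promise<ExecutorRuntime> {
  return (executorRuntimePromise ??= importExecutorRuntime());
}

const first = loadExecutorRuntime();
const second = loadExecutorRuntime();
console.log(first === second, loadCount); // true 1
```

Caching the Promise rather than the resolved module is what makes concurrent first calls safe: they all await the same in-flight load.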
7.1.3 prepareCronRunContext() --- Preparation stage in detail

This is the most complex single function in the whole system (~250 lines). Its core logic, step by step:

Step 1: Resolve the agent identity

typescript
const defaultAgentId = resolveDefaultAgentId(input.cfg);
const requestedAgentId = ... // precedence: params.agentId > job.agentId
const agentId = normalizedRequested ?? defaultAgentId;

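Agent ID resolution follows three levels of precedence: explicitly requested by the caller > job configuration > global default. normalizeAgentId() normalizes the ID (lowercasing and trimming whitespace).

Step 2: Merge the agent configuration

typescript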
const agentCfg = buildCronAgentDefaultsConfig({
  defaults: input.cfg.agents?.defaults,
  agentConfigOverride,
});
const cfgWithAgentDefaults = { ...input.cfg, agents: { ...input.cfg.agents, defaults: agentCfg } };

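buildCronAgentDefaultsConfig (from run-config.ts) layers agent-level config overrides on top of the global defaults and produces an augmented config, cfgWithAgentDefaults. Every downstream subsystem uses this config instead of the raw one. Note that the sandbox field is explicitly excluded from the merge (it is handled in run-config.ts) so it is not applied twice.

Step 3: Build the session key

typescript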
const baseSessionKey = (input.sessionKey?.trim() || `cron:${input.job.id}`).trim();
const agentSessionKey = resolveCronAgentSessionKey({
  sessionKey: baseSessionKey,
  agentId,
  mainKey: input.cfg.session?.mainKey,
  cfg: input.cfg,
});

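The session key goes through two transformations:

  1. If sessionKey is empty, it falls back to the cron:<jobId> form.
  2. resolveCronAgentSessionKey converts the raw key into the agent store format (toAgentStoreSessionKey), then canonicalizeMainSessionAlias maps agent:<id>:main to the configured mainKey alias. This fixed the orphaned cron sessions reported in issue #29683.

Step 4: Resolve or reuse the session

typescript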
const cronSession = resolveCronSession({
  cfg: input.cfg,
  sessionKey: agentSessionKey,
  agentId,
  nowMs: now,
  forceNew: input.job.sessionTarget === "isolated",
});

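resolveCronSession (from session.ts) makes the session reuse decision:

  • sessionTarget === "isolated" → forceNew=true, which always creates a new session
  • Otherwise it evaluates session freshness (evaluateSessionFreshness) and lets the reset policy decide whether to reuse the session
  • A new session clears lastChannel/lastTo/lastAccountId/lastThreadId/deliveryContext/sessionFile so that stale delivery routing cannot leak into it

Step 5: Model selection

typescript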
const resolvedModelSelection = await resolveCronModelSelection({
  cfg: input.cfg,
  cfgWithAgentDefaults,
  agentConfigOverride,
  sessionEntry: cronSession.sessionEntry,
  payload: input.job.payload,
  isGmailHook,
});

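If model selection fails, prepareCronRunContext returns { ok: false, result } and exits early. This fail-fast strategy avoids spending resources on later steps when no usable model is available.

Step 6: Resolve the thinking level

typescript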
let thinkLevel = jobThink ?? hooksGmailThinking;
if (!thinkLevel) {
  thinkLevel = resolveThinkingDefault({ cfg, provider, model, catalog });
}
if (thinkLevel === "xhigh" && !supportsXHighThinking(provider, model)) {
  thinkLevel = "high";
}

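Precedence: job payload > Gmail hook config > global default (resolveThinkingDefault). The xhigh level is automatically downgraded to high on models that do not support it, which avoids runtime errors.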
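The precedence chain and downgrade rule can be sketched as a pure function. The ThinkLevel union is an assumption (the text only names high and xhigh), and resolveDefault and supportsXHigh are hypothetical stand-ins for resolveThinkingDefault(...) and supportsXHighThinking(provider, model).

```typescript
// Assumed level names for illustration; the text only mentions "high" and "xhigh".
type ThinkLevel = "off" | "low" | "medium" | "high" | "xhigh";

interface ThinkingInputs {
  jobThink?: ThinkLevel; // level set in the job payload
  hooksGmailThinking?: ThinkLevel; // Gmail hook override
  resolveDefault: () => ThinkLevel; // stand-in for resolveThinkingDefault({ cfg, provider, model, catalog })
  supportsXHigh: boolean; // stand-in for supportsXHighThinking(provider, model)
}

function resolveThinkLevel(input: ThinkingInputs): ThinkLevel {
  // Precedence: job payload > Gmail hook > default derived from the model/catalog.
  // The default is only computed when neither explicit level is set.
  let level = input.jobThink ?? input.hooksGmailThinking ?? input.resolveDefault();
  // Downgrade instead of failing when the model cannot handle extra-high reasoning.
  if (level === "xhigh" && !input.supportsXHigh) {
    level = "high";
  }
  return level;
}

console.log(resolveThinkLevel({ jobThink: "xhigh", resolveDefault: () => "low", supportsXHigh: false })); // high
```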

Step 7: Resolve the delivery context

typescript
const { deliveryRequested, resolvedDelivery, toolPolicy } = await resolveCronDeliveryContext({
  cfg: cfgWithAgentDefaults,
  job: input.job,
  agentId,
  deliveryContract: input.deliveryContract ?? "cron-owned",
});

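deliveryContract distinguishes two execution modes:

  • "cron-owned": the cron runner owns delivery. The agent's message tool is disabled (disableMessageTool: true) and all delivery goes through the cron delivery pipeline
  • "shared": the agent may send messages itself, but cron may still attempt delivery

Step 8: Build the command body

typescript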
if (shouldWrapExternal) {
  // External hook content is passed through a safety wrapper
  commandBody = `${safeContent}\n\n${timeLine}`.trim();
} else {
  commandBody = `${base}\n${timeLine}`.trim();
}
commandBody = appendCronDeliveryInstruction({ commandBody, deliveryRequested });

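External content (webhook/Gmail) is first checked with detectSuspiciousPatterns and then wrapped by buildSafeExternalPrompt to defend against prompt injection. When delivery is enabled, an instruction is appended telling the agent to reply in plain text and let the system deliver the result automatically.

Step 9: Skills snapshot

typescript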
const skillsSnapshot = await resolveCronSkillsSnapshot({
  workspaceDir,
  config: cfgWithAgentDefaults,
  agentId,
  existingSnapshot: cronSession.sessionEntry.skillsSnapshot,
  isFastTestEnv,
});

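Incremental update strategy: the snapshot is rebuilt only when the snapshot version or the skill filter has changed; otherwise the existing snapshot is reused.

Step 10: Resolve the auth profile

typescript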
const authProfileId = !hasSessionAuthProfileOverride && !hasConfiguredAuthProfiles(...) && !hasAnyAuthProfileStoreSource(agentDir)
  ? undefined
  : await resolveSessionAuthProfileOverride({...});

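Triple short-circuit: if the session has no override, no auth profiles are configured globally, and the agent directory has no store source, auth profile resolution is skipped entirely to avoid unnecessary I/O.

7.1.4 finalizeCronRun() --- Finalization stage in detail

The finalization stage is the hub for telemetry and delivery scheduling:

Telemetry collection

typescript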
const modelUsed = finalRunResult.meta?.agentMeta?.model ?? execution.fallbackModel ?? execution.liveSelection.model;
const providerUsed = finalRunResult.meta?.agentMeta?.provider ?? execution.fallbackProvider ?? execution.liveSelection.provider;

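Model/provider resolution falls back through three levels: runtime meta > fallback model > initial selection. If usage is non-zero, the token cost is computed and added to sessionEntry.estimatedCostUsd.

Payload outcome resolution

typescript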
const { summary, outputText, synthesizedText, deliveryPayloads, hasFatalErrorPayload, ... } = 
  resolveCronPayloadOutcome({ payloads, runLevelError, finalAssistantVisibleText, preferFinalAssistantVisibleText });

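resolveCronPayloadOutcome (from helpers.ts) performs the non-trivial payload classification:

  • If an error payload is not followed by any successful payload, the result is marked hasFatalErrorPayload
  • The Telegram channel prefers finalAssistantVisibleText
  • Structured content (media, interactive cards) is kept as-is rather than being flattened into plain text

Heartbeat filtering

typescript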
const skipHeartbeatDelivery = prepared.deliveryRequested && 
  isHeartbeatOnlyResponse(payloads, resolveHeartbeatAckMaxChars(prepared.agentCfg));

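When the agent returns only a heartbeat acknowledgement (such as "OK" or "ack") and the text does not exceed ackMaxChars, delivery is skipped so that users are not pinged with meaningless notifications.

Delivery scheduling

typescript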
const deliveryResult = await dispatchCronDelivery({...});

This is delegated to delivery-dispatch.ts; see Section 8 for details.

7.1.5 runCronIsolatedAgentTurn() --- Top-level entry point
typescript
export async function runCronIsolatedAgentTurn(params) {
  const abortSignal = params.abortSignal ?? params.signal;
  const isAborted = () => abortSignal?.aborted === true;
  const abortReason = () => { ... };
  
  const prepared = await prepareCronRunContext({ input: params, isFastTestEnv });
  if (!prepared.ok) return prepared.result;

  try {
    const execution = await executeCronRun({...});
    if (isAborted()) return prepared.context.withRunSession({ status: "error", error: abortReason() });
    return await finalizeCronRun({ prepared: prepared.context, execution, ... });
  } catch (err) {
    return prepared.context.withRunSession({ status: "error", error: String(err) });
  }
}

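Design notes:

  1. Two signal parameters: abortSignal and signal coexist for backward compatibility, and abortSignal takes precedence
  2. isAborted/abortReason closures: the signal is resolved once and reading its state is centralized instead of being repeated at each check; abortReason prefers a string signal.reason
  3. withRunSession wrapper: guarantees that every return value carries sessionId and sessionKey, even when the prepare stage fails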
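A runtime-agnostic sketch of the two-signal handling. It uses a structural SignalLike type instead of the real AbortSignal, and createAbortHelpers is a hypothetical name; the fallback message returned when no string reason is available is an assumption, because the body of abortReason is elided in the excerpt above.

```typescript
// Structural stand-in for AbortSignal (only the two properties used here).
interface SignalLike {
  aborted: boolean;
  reason?: unknown;
}

function createAbortHelpers(params: { abortSignal?: SignalLike; signal?: SignalLike }) {
  // Resolve once: the newer abortSignal wins, the legacy signal is the fallback.
  const abortSignal = params.abortSignal ?? params.signal;
  // Both closures read the current state of the captured signal on every call.
  const isAborted = (): boolean => abortSignal?.aborted === true;
  const abortReason = (): string => {
    const reason = abortSignal?.reason;
    // Prefer a non-empty string reason; the fallback wording is an assumption.
    return typeof reason === "string" && reason.trim() ? reason : "cron run aborted";
  };
  return { isAborted, abortReason };
}

const legacy: SignalLike = { aborted: false };
const helpers = createAbortHelpers({ signal: legacy });
legacy.aborted = true;
legacy.reason = "timeout";
console.log(helpers.isAborted(), helpers.abortReason()); // true timeout
```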
📊 Suggested figure: run.ts execution pipeline
Nodes: [runCronIsolatedAgentTurn] → [prepareCronRunContext] → [executeCronRun] → [finalizeCronRun]
Edges:
  - prepare→execute: "prepared context (ok=true)"
  - prepare→exit: "ok=false, early return"
  - execute→finalize: "CronExecutionResult"
  - execute→catch: "unhandled exception"
Sub-nodes (inside prepare):
  [Agent ID resolution] → [Config merge] → [Session Key] → [Session resolution] → [Model selection] → [Thinking Level] → [Delivery context] → [Command Body build] → [Skills Snapshot] → [Auth Profile] → [Session pre-run persistence]
Labels: annotate each sub-node with its key decision point (e.g. "forceNew?", "model ok?", "deliveryRequested?")

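7.2 run-executor.ts --- Execution orchestration

File path: isolated-agent/run-executor.ts
Lines of code: ~210
Core responsibility: creates the agent executor and manages model failover, LiveSession model-switch retries, and the detection and retry of "interim acks".

7.2.1 createCronPromptExecutor() --- Executor factory

This is a closure factory that returns a { runPrompt, getState } object:

typescript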
export function createCronPromptExecutor(params) {
  let runResult: CronPromptRunResult | undefined;
  let fallbackProvider = params.liveSelection.provider;
  let fallbackModel = params.liveSelection.model;
  let runEndedAt = Date.now();
  let bootstrapPromptWarningSignaturesSeen = ...;

  const runPrompt = async (promptText: string) => {
    const fallbackResult = await runWithModelFallback({
      cfg: params.cfgWithAgentDefaults,
      provider: params.liveSelection.provider,
      model: params.liveSelection.model,
      fallbacksOverride: cronFallbacksOverride,
      run: async (providerOverride, modelOverride, runOptions) => {
        // Actual execution logic
        if (isCliProvider(providerOverride, ...)) {
          return await runCliAgent({...});
        }
        return await runEmbeddedPiAgent({...});
      },
    });
    runResult = fallbackResult.result;
    fallbackProvider = fallbackResult.provider;
    fallbackModel = fallbackResult.model;
    runEndedAt = Date.now();
  };

  return { runPrompt, getState: () => ({ runResult, fallbackProvider, fallbackModel, runEndedAt, liveSelection: params.liveSelection }) };
}

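Key design points

  1. Two execution paths: when isCliProvider is true the run goes through runCliAgent (an external CLI process); otherwise it uses runEmbeddedPiAgent (the embedded engine)
  2. Fallback delegation: runWithModelFallback handles the failover chain when the primary model fails; cronFallbacksOverride comes from resolveCronFallbacksOverride
  3. Mutable closure state: runResult, fallbackProvider, and fallbackModel are mutable inside the closure, and getState() returns a snapshot of their current values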
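A simplified sketch of this closure-factory shape. It assumes a fallback chain that tries each candidate in order on any error; the real runWithModelFallback may apply additional rules (error classification, cooldowns, cronFallbacksOverride) that are not shown here. runWithFallbackSketch and createExecutorSketch are hypothetical names.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

interface FallbackResult<T> extends ModelRef {
  result: T;
}

// Hypothetical stand-in for runWithModelFallback: try the primary model first,
// then each fallback in order, and return the first successful run.
async function runWithFallbackSketch<T>(
  primary: ModelRef,
  fallbacks: ModelRef[],
  run: (provider: string, model: string) => Promise<T>,
): Promise<FallbackResult<T>> {
  const failures: string[] = [];
  for (const candidate of [primary, ...fallbacks]) {
    try {
      const result = await run(candidate.provider, candidate.model);
      return { result, provider: candidate.provider, model: candidate.model };
    } catch (err) {
      // Record the failure and move on to the next candidate.
      failures.push(`${candidate.provider}/${candidate.model}: ${String(err)}`);
    }
  }
  throw new Error(`all models failed: ${failures.join("; ")}`);
}

// Closure factory in the shape of createCronPromptExecutor:
// runPrompt mutates captured state, getState() returns a snapshot of it.
function createExecutorSketch(
  primary: ModelRef,
  fallbacks: ModelRef[],
  run: (provider: string, model: string, prompt: string) => Promise<string>,
) {
  let runResult: string | undefined;
  let fallbackProvider = primary.provider;
  let fallbackModel = primary.model;
  let runEndedAt = Date.now();

  const runPrompt = async (promptText: string): Promise<void> => {
    const outcome = await runWithFallbackSketch(primary, fallbacks, (p, m) => run(p, m, promptText));
    runResult = outcome.result;
    fallbackProvider = outcome.provider; // records which model actually answered
    fallbackModel = outcome.model;
    runEndedAt = Date.now();
  };

  return {
    runPrompt,
    getState: () => ({ runResult, fallbackProvider, fallbackModel, runEndedAt }),
  };
}
```

Because getState() builds a fresh object on each call, callers see the values from the most recent runPrompt, including which fallback model actually served the request.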

Embedded agent parameters in detail

typescript
await runEmbeddedPiAgent({
  trigger: "cron",
  allowGatewaySubagentBinding: true,   // allow binding to gateway subagents
  senderIsOwner: false,                 // a cron trigger is not the owner
  messageChannel: params.messageChannel,
  agentAccountId: params.resolvedDelivery.accountId,
  fastMode: resolveFastModeState({...}).enabled,
  bootstrapContextMode: params.agentPayload?.lightContext ? "lightweight" : undefined,
  toolsAllow: params.agentPayload?.toolsAllow,       // tool allowlist
  requireExplicitMessageTarget: params.toolPolicy.requireExplicitMessageTarget,
  disableMessageTool: params.toolPolicy.disableMessageTool,
  allowTransientCooldownProbe: runOptions?.allowTransientCooldownProbe,
  abortSignal: params.abortSignal,
  bootstrapPromptWarningSignaturesSeen,  // tracks bootstrap warning signatures already seen
});

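lightContext mode reduces the bootstrap context that gets injected, toolsAllow restricts the available tool set, and disableMessageTool prevents the agent from sending messages itself in cron-owned mode.

7.2.2 executeCronRun() --- Main execution loop
typescript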
export async function executeCronRun(params): Promise<CronExecutionResult> {
  // Register the agent run context (used for log correlation)
  registerAgentRunContext(params.cronSession.sessionEntry.sessionId, {
    sessionKey: params.agentSessionKey,
    verboseLevel: resolvedVerboseLevel,
  });

  const executor = createCronPromptExecutor({...});
  const runStartedAt = params.runStartedAt ?? Date.now();
  const MAX_MODEL_SWITCH_RETRIES = 2;
  let modelSwitchRetries = 0;

  // LiveSession model-switch retry loop
  while (true) {
    try {
      await executor.runPrompt(params.commandBody);
      break;
    } catch (err) {
      if (!(err instanceof LiveSessionModelSwitchError)) throw err;
      modelSwitchRetries += 1;
      if (modelSwitchRetries > MAX_MODEL_SWITCH_RETRIES) throw err;
      // Update the live selection and sync it to the session
      params.liveSelection.provider = err.provider;
      params.liveSelection.model = err.model;
      params.liveSelection.authProfileId = err.authProfileId;
      syncCronSessionLiveSelection({ entry: params.cronSession.sessionEntry, liveSelection: params.liveSelection });
      await params.persistSessionEntry();
      continue;
    }
  }

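LiveSessionModelSwitchError is a special "non-fatal" error signaling that the session runtime has decided to switch models (for example because the model is overloaded or its quota is exhausted). The executor catches it, updates the selection, and retries, up to 2 times (3 attempts in total).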
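A compact sketch of the retry loop, assuming a hypothetical ModelSwitchError class modeled on LiveSessionModelSwitchError (provider and model only; the real error also carries authProfileId). The persist callback stands in for syncCronSessionLiveSelection plus persistSessionEntry.

```typescript
// Hypothetical error modeled on LiveSessionModelSwitchError.
class ModelSwitchError extends Error {
  readonly provider: string;
  readonly model: string;
  constructor(provider: string, model: string) {
    super(`live session requested switch to ${provider}/${model}`);
    this.name = "ModelSwitchError";
    this.provider = provider;
    this.model = model;
    // Keep instanceof working even when compiled to ES5.
    Object.setPrototypeOf(this, ModelSwitchError.prototype);
  }
}

const MAX_MODEL_SWITCH_RETRIES = 2;

interface LiveSelection {
  provider: string;
  model: string;
}

async function runWithModelSwitchRetries(
  selection: LiveSelection,
  runPrompt: (selection: LiveSelection) => Promise<string>,
  persist: (snapshot: LiveSelection) => void,
): Promise<string> {
  let retries = 0;
  while (true) {
    try {
      return await runPrompt(selection);
    } catch (err) {
      if (!(err instanceof ModelSwitchError)) throw err; // unrelated errors propagate
      retries += 1;
      if (retries > MAX_MODEL_SWITCH_RETRIES) throw err; // 3 attempts in total
      selection.provider = err.provider; // adopt the model the session asked for
      selection.model = err.model;
      persist({ ...selection }); // sync to the session entry before retrying
    }
  }
}
```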

7.2.3 Interim ack detection and retry

This is one of the most carefully engineered parts of the cron execution engine:

typescript
const shouldRetryInterimAck =
  !runResult.meta?.error &&                                    // no runtime error
  !runResult.didSendViaMessagingTool &&                        // the agent did not send a message itself
  !interimPayloadHasStructuredContent &&                        // no structured content
  !interimPayloads.some((payload) => payload?.isError === true) && // no error payload
  isLikelyInterimCronMessage(interimText);                     // text matches an "interim ack" pattern

When the agent replies with only a temporary acknowledgement such as "on it", "working on it", or "give me a few minutes", the system retries automatically by sending a continuation prompt:

typescript
const continuationPrompt = [
  "Your previous response was only an acknowledgement and did not complete this cron task.",
  "Complete the original task now.",
  "Do not send a status update like 'on it'.",
  "Use tools when needed, including sessions_spawn for parallel subtasks, wait for spawned subagents to finish, then return only the final summary.",
].join(" ");
await executor.runPrompt(continuationPrompt);

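Subagent check: before retrying, the engine also looks at subagents. If subagents were started during this run or are still active, the agent has already delegated the work, so no retry is needed:

typescript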
if (shouldRetryInterimAck) {
  hasFreshDescendants = listDescendantRunsForRequester(params.agentSessionKey).some(entry => {
    const descendantStartedAt = ...;
    return descendantStartedAt >= runStartedAt;  // only count subagents started during this run
  });
  hasActiveDescendants = countActiveDescendantRuns(params.agentSessionKey) > 0;
}
if (shouldRetryInterimAck && !hasFreshDescendants && !hasActiveDescendants) {
  await executor.runPrompt(continuationPrompt);
}
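The whole retry decision can be sketched as one pure function. This flattens the inputs into booleans and numbers; InterimCheckInput and shouldSendContinuation are hypothetical names, with each field standing in for the expression noted in its comment.

```typescript
interface InterimCheckInput {
  hadRunError: boolean; // runResult.meta?.error present
  sentViaMessagingTool: boolean; // runResult.didSendViaMessagingTool
  hasStructuredContent: boolean; // media / interactive payloads
  hasErrorPayload: boolean; // any payload with isError === true
  looksInterim: boolean; // isLikelyInterimCronMessage(interimText)
  descendantStartTimes: number[]; // start times of child runs for this session key
  activeDescendantCount: number; // countActiveDescendantRuns(agentSessionKey)
  runStartedAt: number;
}

function shouldSendContinuation(input: InterimCheckInput): boolean {
  const shouldRetryInterimAck =
    !input.hadRunError &&
    !input.sentViaMessagingTool &&
    !input.hasStructuredContent &&
    !input.hasErrorPayload &&
    input.looksInterim;
  if (!shouldRetryInterimAck) return false;
  // Children started during this run mean the agent already delegated the work.
  const hasFreshDescendants = input.descendantStartTimes.some((t) => t >= input.runStartedAt);
  const hasActiveDescendants = input.activeDescendantCount > 0;
  return !hasFreshDescendants && !hasActiveDescendants;
}
```

Only subagents started at or after runStartedAt count as fresh, so a leftover child from an earlier run does not suppress the continuation prompt.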
📊 Suggested figure: executeCronRun execution flow
Nodes:
  [executeCronRun] → [createCronPromptExecutor] → [runPrompt Loop] → [Interim Ack check] → [Retry/Exit]

Edges:
  - executeCronRun→runPrompt: "commandBody"
  - runPrompt→catch: "LiveSessionModelSwitchError" (up to 2 retries)
  - runPrompt→success: "CronPromptRunResult"
  - success→interim check: "shouldRetryInterimAck?"
  - interim→retry: "continuationPrompt" (condition: !hasFreshDescendants && !hasActiveDescendants)
  - interim→exit: "return CronExecutionResult"

Labels:
  - "MAX_MODEL_SWITCH_RETRIES=2"
  - "isLikelyInterimCronMessage: at most 45 words + matches the hint list"
  - "subagent check: listDescendantRunsForRequester + countActiveDescendantRuns"

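7.3 model-selection.ts --- Model selection and failover

File path: isolated-agent/model-selection.ts
Lines of code: ~100
Core responsibility: resolves which AI model a cron job should use, implementing a five-level precedence chain with access-control checks.

7.3.1 Five-level model precedence
resolveCronModelSelection() precedence (highest to lowest):

1. payload.model             --- set explicitly at the job level (highest priority, subject to access-control checks)
2. hooksGmailModel           --- model override dedicated to the Gmail hook
3. sessionEntry.modelOverride --- persisted session-level override (the model switched to during a previous run)
4. agentConfigOverride       --- agent-level config (subagents.model > model)
5. Global default            --- resolveConfiguredModelRef(cfg)
7.3.2 Level-by-level resolution

The walkthrough below follows the order in which the code evaluates the levels (default, agent, Gmail hook, payload, session). Each later level that applies overwrites the current selection, and the session override is applied last, only when neither the payload nor the Gmail hook set a model.

Level 5: Global default

typescript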
const resolvedDefault = resolveConfiguredModelRef({
  cfg: params.cfgWithAgentDefaults,
  defaultProvider: DEFAULT_PROVIDER,
  defaultModel: DEFAULT_MODEL,
});
let provider = resolvedDefault.provider;
let model = resolvedDefault.model;

resolveConfiguredModelRef reads the agents.defaults.model setting and resolves it into a provider/model pair.

Level 4: Agent config override

typescript
const subagentModelRaw =
  normalizeModelSelection(params.agentConfigOverride?.subagents?.model) ??
  normalizeModelSelection(params.agentConfigOverride?.model) ??
  normalizeModelSelection(params.cfg.agents?.defaults?.subagents?.model);
if (subagentModelRaw) {
  const resolvedSubagent = resolveAllowedModelRef({
    cfg, catalog, raw: subagentModelRaw, defaultProvider, defaultModel
  });
  if (!("error" in resolvedSubagent)) {
    provider = resolvedSubagent.ref.provider;
    model = resolvedSubagent.ref.model;
  }
}

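Note: subagents.model (the subagent-specific model) is preferred, falling back to model (the general model) and finally to the global defaults' subagents.model (agents.defaults.subagents.model). If resolveAllowedModelRef returns an error, this level is skipped silently: execution is not interrupted, the selection is simply not overridden.

Level 2: Gmail hook model

typescript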
const hooksGmailModelRef = params.isGmailHook ? resolveHooksGmailModel({cfg, defaultProvider}) : null;
if (hooksGmailModelRef) {
  const status = getModelRefStatus({cfg, catalog, ref: hooksGmailModelRef, defaultProvider, defaultModel});
  if (status.allowed) {
    provider = hooksGmailModelRef.provider;
    model = hooksGmailModelRef.model;
    hooksGmailModelApplied = true;
  }
}

The Gmail hook model must pass the allowed check; if it does not, it is skipped silently.

Level 1: Explicit payload model

typescript
const modelOverride = typeof modelOverrideRaw === "string" ? modelOverrideRaw.trim() : undefined;
if (modelOverride !== undefined && modelOverride.length > 0) {
  const resolvedOverride = resolveAllowedModelRef({cfg, catalog, raw: modelOverride, ...});
  if ("error" in resolvedOverride) {
    if (resolvedOverride.error.startsWith("model not allowed:")) {
      // Model not allowed → fall back to the current selection and emit a warning
      return { ok: true, provider, model, warning: `...` };
    }
    // Any other error (malformed value, etc.) → hard failure
    return { ok: false, error: resolvedOverride.error };
  }
  provider = resolvedOverride.ref.provider;
  model = resolvedOverride.ref.model;
}

Key distinction:

  • "model not allowed:" → soft degradation (ok: true + warning); the job still runs
  • Any other error → hard failure (ok: false); the job cannot run

Level 3: Session-level override

typescript
if (!modelOverride && !hooksGmailModelApplied) {
  const sessionModelOverride = params.sessionEntry.modelOverride?.trim();
  if (sessionModelOverride) {
    const resolvedSessionOverride = resolveAllowedModelRef({...});
    if (!("error" in resolvedSessionOverride)) {
      provider = resolvedSessionOverride.ref.provider;
      model = resolvedSessionOverride.ref.model;
    }
  }
}

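The session override applies only when neither the payload nor the Gmail hook has set a model. Its errors are swallowed silently because session overrides are produced automatically at runtime, and a formatting problem in one should not block execution.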
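A self-contained sketch of the five levels in code evaluation order. It replaces the catalog and resolveAllowedModelRef with a hypothetical allowlist of "provider/model" strings and a hypothetical parseRef helper; the error and warning texts are illustrative, except that the warning keeps the "model not allowed:" prefix the text relies on.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

interface SelectionInput {
  globalDefault: ModelRef; // level 5
  agentModel?: string; // level 4: subagents.model ?? model ?? defaults.subagents.model
  gmailModel?: string; // level 2: only set for Gmail hook runs
  payloadModel?: string; // level 1: job.payload.model
  sessionModel?: string; // level 3: sessionEntry.modelOverride
  allowed: Set<string>; // hypothetical allowlist of "provider/model" strings
}

type ModelSelectionResult =
  | { ok: true; ref: ModelRef; warning?: string }
  | { ok: false; error: string };

// Hypothetical parser for "provider/model"; the real code also consults the catalog.
function parseRef(raw: string): ModelRef | undefined {
  const [provider, ...rest] = raw.trim().split("/");
  const model = rest.join("/");
  return provider && model ? { provider, model } : undefined;
}

function selectModel(input: SelectionInput): ModelSelectionResult {
  const isAllowed = (r: ModelRef) => input.allowed.has(`${r.provider}/${r.model}`);
  let ref = input.globalDefault; // level 5

  const agent = input.agentModel ? parseRef(input.agentModel) : undefined;
  if (agent && isAllowed(agent)) ref = agent; // level 4, silently skipped on failure

  const gmail = input.gmailModel ? parseRef(input.gmailModel) : undefined;
  const gmailApplied = gmail !== undefined && isAllowed(gmail);
  if (gmail && gmailApplied) ref = gmail; // level 2, silently skipped on failure

  const payloadRaw = input.payloadModel?.trim();
  if (payloadRaw) {
    // Level 1: the only level that can fail the job.
    const parsed = parseRef(payloadRaw);
    if (!parsed) return { ok: false, error: `invalid model ref: ${payloadRaw}` }; // hard failure
    if (!isAllowed(parsed)) {
      // Soft degradation: keep the current selection and surface a warning.
      return { ok: true, ref, warning: `model not allowed: ${payloadRaw}` };
    }
    return { ok: true, ref: parsed };
  }

  if (!gmailApplied && input.sessionModel) {
    const session = parseRef(input.sessionModel);
    if (session && isAllowed(session)) ref = session; // level 3, silently skipped on failure
  }
  return { ok: true, ref };
}
```

The early return on the payload branch is why a session override can never replace an explicit payload model.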

7.3.3 Lazy loading of the model catalog
typescript
let catalog: Awaited<ReturnType<typeof loadModelCatalog>> | undefined;
const loadCatalogOnce = async () => {
  if (!catalog) {
    catalog = await loadModelCatalog({ config: params.cfgWithAgentDefaults });
  }
  return catalog;
};

The catalog is loaded only the first time a check needs it, and the cached copy is reused afterwards.

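📊 Suggested figure: model selection precedence waterfall
Nodes (top to bottom):
  [Global default: resolveConfiguredModelRef] → [Agent config: subagents.model / model] → [Gmail Hook: resolveHooksGmailModel] → [Payload: job.payload.model] → [Session override: sessionEntry.modelOverride]

Edges:
  - each level→next level: "pass the current selection on (a level that applies and passes its check overwrites it)"
  - Payload (success) / Session→exit: "final selection"
  - Payload→exit(warning): "model not allowed, degrade + warning"
  - Payload→exit(error): "any other error, hard failure"

Labels:
  - "resolveAllowedModelRef: access-control check"
  - "getModelRefStatus: allowed check for the Gmail model"
  - "silent skip: check failures at the Agent/Gmail/Session levels raise no error"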

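7.4 subagent-followup.ts --- Subagent delegation

File path: isolated-agent/subagent-followup.ts
Lines of code: ~120
Core responsibility: waits for subagents to finish and collects their output as the source of the cron agent's final delivered content.

7.4.1 readDescendantSubagentFallbackReply()
typescript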
export async function readDescendantSubagentFallbackReply(params: {
  sessionKey: string;
  runStartedAt: number;
}): Promise<string | undefined> {
  const descendants = listDescendantRunsForRequester(params.sessionKey)
    .filter(entry => typeof entry.endedAt === "number" && entry.endedAt >= params.runStartedAt)
    .toSorted((a, b) => (a.endedAt ?? 0) - (b.endedAt ?? 0));

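Steps:

  1. List all finished descendant runs (endedAt >= runStartedAt, i.e. only those from this trigger)
  2. Deduplicate by child session, keeping only the latest run for each
  3. Take the latest replies from the 4 most recent child sessions
  4. Filter out SILENT_REPLY_TOKEN and join the rest into multi-paragraph text

Frozen result fallback

typescript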
if (!reply && typeof entry.frozenResultText === "string" && entry.frozenResultText.trim()) {
  reply = entry.frozenResultText.trim();
}

If a child session's transcript has already been deleted (for example by announce cleanup), the frozen result text stored in the registry is used instead.
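The four steps plus the frozen-text fallback can be sketched as follows. DescendantRun, collectDescendantReplies, and the readLatestReply callback are hypothetical, and the value "NO_REPLY" for SILENT_REPLY_TOKEN is an assumption based on how the text refers to silent replies.

```typescript
// Assumed value of SILENT_REPLY_TOKEN (the text refers to it as NO_REPLY).
const SILENT_REPLY_TOKEN = "NO_REPLY";

interface DescendantRun {
  childSessionKey: string;
  endedAt?: number;
  frozenResultText?: string; // registry copy kept after the transcript is gone
}

function collectDescendantReplies(
  runs: DescendantRun[],
  runStartedAt: number,
  readLatestReply: (childSessionKey: string) => string | undefined, // hypothetical transcript reader
  maxSessions = 4,
): string | undefined {
  // 1. Finished runs from this trigger only, oldest first.
  const finished = runs
    .filter((r) => typeof r.endedAt === "number" && r.endedAt >= runStartedAt)
    .sort((a, b) => (a.endedAt ?? 0) - (b.endedAt ?? 0));

  // 2. One entry per child session; re-inserting keeps Map order by recency.
  const latestBySession = new Map<string, DescendantRun>();
  for (const run of finished) {
    latestBySession.delete(run.childSessionKey);
    latestBySession.set(run.childSessionKey, run);
  }

  // 3. The most recent N child sessions.
  const recent = Array.from(latestBySession.values()).slice(-maxSessions);

  // 4. Read each reply, fall back to frozen text, drop silent replies.
  const replies: string[] = [];
  for (const run of recent) {
    let reply = readLatestReply(run.childSessionKey)?.trim();
    const frozen = run.frozenResultText?.trim();
    if (!reply && frozen) reply = frozen;
    if (reply && reply !== SILENT_REPLY_TOKEN) replies.push(reply);
  }
  return replies.length > 0 ? replies.join("\n\n") : undefined;
}
```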

7.4.2 waitForDescendantSubagentSummary()
typescript
export async function waitForDescendantSubagentSummary(params: {
  sessionKey: string;
  initialReply?: string;
  timeoutMs: number;
  observedActiveDescendants?: boolean;
}): Promise<string | undefined> {

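Execution flow

  1. Fast path: if there are no active descendants and none were observed earlier, return initialReply immediately
  2. Wait for draining: call waitForAgentRunsToDrain(), a push-based wait (gateway RPC agent.wait) rather than busy polling
  3. Grace-period polling: once all descendants have finished, wait for the cron agent to produce a synthesized message (finalReplyGraceMs, 5 seconds by default)
  4. Final read: one last read after the grace period ends

Timing configuration

typescript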
function resolveCronSubagentTimings() {
  const fastTestMode = process.env.OPENCLAW_TEST_FAST === "1";
  return {
    waitMinMs: fastTestMode ? 10 : 30_000,     // minimum wait time
    finalReplyGraceMs: fastTestMode ? 50 : 5_000, // grace period
    gracePollMs: fastTestMode ? 8 : 200,          // polling interval during the grace period
  };
}

Identifying the synthesized message

typescript
const resolveUsableLatestReply = async () => {
  const latest = (await readLatestAssistantReply({ sessionKey }))?.trim();
  if (latest && latest !== SILENT_REPLY_TOKEN && 
      (latest !== initialReply || !isLikelyInterimCronMessage(latest))) {
    return latest;
  }
  return undefined;
};

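A reply counts as a valid synthesized message only if it is not the silent token and is not the same interim acknowledgement the run started with.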
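A sketch of the four-step wait, assuming injected dependencies in place of the real helpers: WaitDeps, WaitOptions, and waitForSummarySketch are hypothetical, waitForDrain stands in for the push-based waitForAgentRunsToDrain, and "NO_REPLY" is the assumed silent-token value.

```typescript
// Assumed silent-reply token value.
const SILENT_REPLY_TOKEN = "NO_REPLY";

interface WaitDeps {
  hasActiveDescendants: () => boolean; // stand-in for countActiveDescendantRuns(...) > 0
  waitForDrain: (timeoutMs: number) => Promise<void>; // stand-in for waitForAgentRunsToDrain
  readLatestReply: () => Promise<string | undefined>; // stand-in for readLatestAssistantReply
  isInterim: (text: string) => boolean; // e.g. isLikelyInterimCronMessage
}

interface WaitOptions {
  initialReply?: string;
  timeoutMs: number;
  observedActiveDescendants?: boolean;
  finalReplyGraceMs: number; // 5_000 in production per the text
  gracePollMs: number; // 200 in production per the text
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForSummarySketch(deps: WaitDeps, opts: WaitOptions): Promise<string | undefined> {
  // Accept a reply only if it is not silent and not the interim ack we started with.
  const usableReply = async (): Promise<string | undefined> => {
    const latest = (await deps.readLatestReply())?.trim();
    if (latest && latest !== SILENT_REPLY_TOKEN && (latest !== opts.initialReply || !deps.isInterim(latest))) {
      return latest;
    }
    return undefined;
  };

  // 1. Fast path: nothing was delegated.
  if (!deps.hasActiveDescendants() && !opts.observedActiveDescendants) return opts.initialReply;
  // 2. Wait for descendants to finish without busy polling.
  await deps.waitForDrain(opts.timeoutMs);
  // 3. Grace period: poll briefly for the parent's synthesized message.
  const deadline = Date.now() + opts.finalReplyGraceMs;
  while (Date.now() < deadline) {
    const reply = await usableReply();
    if (reply) return reply;
    await sleep(opts.gracePollMs);
  }
  // 4. One final read after the grace period.
  return usableReply();
}
```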

📊 Suggested figure: subagent wait flow
Nodes:
  [waitForDescendantSubagentSummary] → [Fast path: no active descendants?] → [waitForAgentRunsToDrain] → [Grace-period polling] → [Final read]

Edges:
  - fast path→exit: "return initialReply"
  - fast path→wait: "active descendants exist"
  - wait→grace: "all descendants finished"
  - grace→found: "latest reply ≠ initial interim ack"
  - grace→final: "grace period expired"
  - final→exit: "return the final reply or undefined"

Labels:
  - "push-based: gateway RPC agent.wait"
  - "finalReplyGraceMs=5000"
  - "gracePollMs=200"
  - "isLikelyInterimCronMessage: filters out interim acks"

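7.5 subagent-followup-hints.ts --- Subagent delegation hint phrases

File path: isolated-agent/subagent-followup-hints.ts
Lines of code: ~40
Core responsibility: defines the text-pattern matching rules for "interim acks" and "subagent delegation".

7.5.1 The two hint lists
typescript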
const SUBAGENT_FOLLOWUP_HINTS = [
  "subagent spawned", "spawned a subagent", "auto-announce when done",
  "both subagents are running", "wait for them to report back",
] as const;

const INTERIM_CRON_HINTS = [
  "on it", "pulling everything together", "give me a few", "give me a few min",
  "few minutes", "let me compile", "i'll gather", "i will gather",
  "working on it", "retrying now", "should be about", "should have your summary",
  "it'll auto-announce when done", "it will auto-announce when done",
  ...SUBAGENT_FOLLOWUP_HINTS,
] as const;
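
  • SUBAGENT_FOLLOWUP_HINTS (5 entries): strong signal; the agent explicitly says it has delegated to subagents
  • INTERIM_CRON_HINTS (19 entries as listed above: 14 of its own plus the 5 from SUBAGENT_FOLLOWUP_HINTS): weak signal; the agent says it is "working on it" but has not finished
7.5.2 Matching algorithm
typescript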
export function isLikelyInterimCronMessage(value: string): boolean {
  const normalized = normalizeHintText(value);
  if (!normalized) return false;  // empty text is not an interim ack (it may be NO_REPLY)
  const words = normalized.split(" ").filter(Boolean).length;
  return words <= 45 && INTERIM_CRON_HINTS.some(hint => normalized.includes(hint));
}

export function expectsSubagentFollowup(value: string): boolean {
  const normalized = normalizeHintText(value);
  return Boolean(normalized && SUBAGENT_FOLLOWUP_HINTS.some(hint => normalized.includes(hint)));
}

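Design notes:

  1. 45-word cap: a long reply is not an interim ack even if it contains "on it" (it may be part of a detailed report)
  2. Substring matching: includes rather than strict equality, which tolerates surrounding wording
  3. Special handling of empty text: empty text returns false, because it means the agent chose to stay silent (NO_REPLY) and must not be retried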
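A runnable version of both matchers using the hint lists exactly as quoted above. The real normalizeHintText is not shown, so the normalization here (lowercase, curly apostrophes mapped to ASCII, whitespace collapsed) is an assumption; it keeps apostrophes so that hints such as "i'll gather" can still match.

```typescript
const SUBAGENT_FOLLOWUP_HINTS = [
  "subagent spawned", "spawned a subagent", "auto-announce when done",
  "both subagents are running", "wait for them to report back",
] as const;

const INTERIM_CRON_HINTS = [
  "on it", "pulling everything together", "give me a few", "give me a few min",
  "few minutes", "let me compile", "i'll gather", "i will gather",
  "working on it", "retrying now", "should be about", "should have your summary",
  "it'll auto-announce when done", "it will auto-announce when done",
  ...SUBAGENT_FOLLOWUP_HINTS,
] as const;

// Assumed normalization: lowercase, map curly apostrophes to ASCII, collapse whitespace.
function normalizeHintText(value: string): string {
  return value.toLowerCase().replace(/[\u2018\u2019]/g, "'").replace(/\s+/g, " ").trim();
}

function isLikelyInterimCronMessage(value: string): boolean {
  const normalized = normalizeHintText(value);
  if (!normalized) return false; // empty text is a silent reply, never an interim ack
  const words = normalized.split(" ").filter(Boolean).length;
  return words <= 45 && INTERIM_CRON_HINTS.some((hint) => normalized.includes(hint));
}

function expectsSubagentFollowup(value: string): boolean {
  const normalized = normalizeHintText(value);
  return Boolean(normalized && SUBAGENT_FOLLOWUP_HINTS.some((hint) => normalized.includes(hint)));
}

console.log(INTERIM_CRON_HINTS.length); // 19
```

Because matching is by substring, a short sentence like "Focus on item 3" would also match "on it"; the 45-word cap limits, but does not eliminate, such false positives.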
📊 Suggested figure: hint classification
Nodes:
  [input text] → [normalizeHintText] → [word-count check] → [substring match]

Edges:
  - words > 45→exit(false): "long text cannot be an interim ack"
  - empty text→exit(false): "NO_REPLY, no retry"
  - INTERIM_CRON_HINTS match→exit(true): "interim ack, needs a retry"
  - SUBAGENT_FOLLOWUP_HINTS match→expectsSubagentFollowup(true): "wait for subagents"

Labels:
  - "INTERIM_CRON_HINTS: 19 patterns (as listed)"
  - "SUBAGENT_FOLLOWUP_HINTS: 5 patterns (a subset of INTERIM)"

7.6 session.ts --- Session management

File path: isolated-agent/session.ts
Lines of code: ~60
Core responsibility: resolves or creates the cron session, implementing the session reuse/rollover policy.

7.6.1 resolveCronSession() core logic
typescript
export function resolveCronSession(params) {
  const storePath = resolveStorePath(sessionCfg?.store, { agentId });
  const store = loadSessionStore(storePath);
  const entry = store[params.sessionKey];

  if (!params.forceNew && entry?.sessionId) {
    const resetPolicy = resolveSessionResetPolicy({ sessionCfg, resetType: "direct" });
    const freshness = evaluateSessionFreshness({
      updatedAt: entry.updatedAt, now: params.nowMs, policy: resetPolicy,
    });
    if (freshness.fresh) {
      // Reuse: sessionId = entry.sessionId, isNewSession = false
    } else {
      // Roll over: sessionId = crypto.randomUUID(), isNewSession = true
    }
  } else {
    // Forced new: sessionId = crypto.randomUUID(), isNewSession = true
  }

  clearBootstrapSnapshotOnSessionRollover({
    sessionKey, previousSessionId: isNewSession ? entry?.sessionId : undefined,
  });

  const sessionEntry = {
    ...entry,                    // keep existing per-session overrides
    sessionId, updatedAt, systemSent,
    ...(isNewSession && {        // new sessions clear routing state
      lastChannel: undefined, lastTo: undefined, lastAccountId: undefined,
      lastThreadId: undefined, deliveryContext: undefined, sessionFile: undefined,
    }),
  };
}

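Key design decisions

  1. resetType: "direct": cron sessions use the "direct chat" reset policy, i.e. the same freshness evaluation as a 1:1 user conversation
  2. Clearing routing state: a new session clears lastThreadId, deliveryContext, and related fields so that old thread routing does not leak into it (for example, if the previous run replied inside a Telegram thread, the new session should not automatically continue that thread)
  3. Preservation via spread: ...entry keeps fields persisted across sessions, such as authProfileOverride and contextTokens, even when sessionId rolls over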
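A sketch of the reuse/rollover decision under a deliberately simple, hypothetical freshness policy (a session is fresh if it was updated within idleMs). The real evaluateSessionFreshness applies the configured reset policy, and newId stands in for crypto.randomUUID(); resolveSessionSketch and the reduced SessionEntry shape are illustrative.

```typescript
interface SessionEntry {
  sessionId: string;
  updatedAt: number;
  lastChannel?: string;
  lastTo?: string;
  lastThreadId?: string;
  deliveryContext?: unknown;
  authProfileOverride?: string; // example of a field that survives rollover
}

interface ResolveParams {
  entry?: SessionEntry;
  nowMs: number;
  forceNew: boolean;
  idleMs: number; // hypothetical idle-only freshness policy
  newId: () => string; // stand-in for crypto.randomUUID()
}

function resolveSessionSketch(params: ResolveParams): { sessionEntry: SessionEntry; isNewSession: boolean } {
  const { entry, nowMs, forceNew, idleMs, newId } = params;
  // Reuse only an existing session that is still fresh under the policy.
  const reusable = !forceNew && entry !== undefined && Boolean(entry.sessionId) && nowMs - entry.updatedAt <= idleMs;
  const isNewSession = !reusable;
  const sessionId = reusable && entry ? entry.sessionId : newId();
  const sessionEntry: SessionEntry = {
    ...entry, // keeps persisted per-session fields such as authProfileOverride
    sessionId,
    updatedAt: nowMs,
    // A new session must not inherit the previous delivery route.
    ...(isNewSession
      ? { lastChannel: undefined, lastTo: undefined, lastThreadId: undefined, deliveryContext: undefined }
      : {}),
  };
  return { sessionEntry, isNewSession };
}
```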

Bootstrap snapshot cleanup

typescript
clearBootstrapSnapshotOnSessionRollover({
  sessionKey, previousSessionId: isNewSession ? entry?.sessionId : undefined,
});

When the session rolls over, the cached bootstrap prompt snapshot is cleared, because the new session needs to regenerate its system prompt.

📊 Suggested figure: session reuse decision tree
Nodes:
  [resolveCronSession] → [forceNew?] → [entry exists?] → [freshness.fresh?] → [return result]

Edges:
  - forceNew=true→new session: "isolated sessionTarget"
  - forceNew=false, entry missing→new session: "first run"
  - forceNew=false, entry exists, fresh=true→reuse: "keep sessionId"
  - forceNew=false, entry exists, fresh=false→roll over: "new UUID + clear routing state"

Labels:
  - "resetType=direct: 1:1 conversation freshness policy"
  - "clearBootstrapSnapshotOnSessionRollover: clear the cache on session rollover"
  - "kept: authProfileOverride, contextTokens, estimatedCostUsd"
  - "cleared: lastChannel, lastTo, lastAccountId, lastThreadId, deliveryContext, sessionFile"

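7.7 skills-snapshot.ts --- Skills snapshot

File path: isolated-agent/skills-snapshot.ts
Lines of code: ~45
Core responsibility: resolves the skills snapshot available to the cron agent, implementing an incremental update strategy.

7.7.1 resolveCronSkillsSnapshot() core logic
typescript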
export async function resolveCronSkillsSnapshot(params): Promise<SkillSnapshot> {
  if (params.isFastTestEnv) {
    return params.existingSnapshot ?? { prompt: "", skills: [] };
  }

  const snapshotVersion = runtime.getSkillsSnapshotVersion(params.workspaceDir);
  const skillFilter = runtime.resolveAgentSkillsFilter(params.config, params.agentId);
  const existingSnapshot = params.existingSnapshot;
  const shouldRefresh =
    !existingSnapshot ||
    existingSnapshot.version !== snapshotVersion ||
    !matchesSkillFilter(existingSnapshot.skillFilter, skillFilter);
  if (!shouldRefresh) return existingSnapshot;

  return runtime.buildWorkspaceSkillSnapshot(params.workspaceDir, {
    config: params.config, agentId: params.agentId, skillFilter,
    eligibility: {
      remote: runtime.getRemoteSkillEligibility({
        advertiseExecNode: runtime.canExecRequestNode({ cfg: params.config, agentId: params.agentId }),
      }),
    },
    snapshotVersion,
  });
}

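The incremental update checks three conditions (a refresh happens if any of them is true):

  1. There is no existing snapshot (first run)
  2. snapshotVersion has changed (files in the workspace changed)
  3. skillFilter no longer matches (the agent's skill filter config changed)

Remote skill eligibility

typescript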
eligibility: {
  remote: runtime.getRemoteSkillEligibility({
    advertiseExecNode: runtime.canExecRequestNode({ cfg, agentId }),
  }),
}

canExecRequestNode checks whether the agent is configured with exec-node capability, which decides whether remote skills are included in the snapshot.

📊 Suggested figure: skills snapshot update decision
Nodes:
  [resolveCronSkillsSnapshot] → [isFastTestEnv?] → [shouldRefresh?] → [buildWorkspaceSkillSnapshot]

Edges:
  - fastTest→exit: "return existingSnapshot or an empty snapshot"
  - !shouldRefresh→exit: "return existingSnapshot"
  - shouldRefresh→build: "rebuild"

Labels:
  - "shouldRefresh = !existing || version changed || filter changed"
  - "eligibility.remote: depends on canExecRequestNode"

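8. Delivery System Deep Dive

The delivery system is the bridge between the Cron module and the outside world (Telegram, Discord, Feishu, and so on): it delivers the agent's execution results to the specified channel and target. It consists of three layers: plan generation (delivery-plan), target resolution (delivery-target), and dispatch (delivery-dispatch).

8.1 delivery-plan.ts --- Delivery plan generation

File path: delivery-plan.ts
Lines of code: ~180
Core responsibility: generates a structured delivery plan from the job's delivery config and resolves the failure destination (failureDestination).

8.1.1 resolveCronDeliveryPlan() --- Main delivery plan
typescript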
export function resolveCronDeliveryPlan(job: CronJob): CronDeliveryPlan {
  const delivery = job.delivery;
  const hasDelivery = delivery && typeof delivery === "object";
  
  // Normalize the mode
  const mode = normalizedMode === "announce" ? "announce"
    : normalizedMode === "webhook" ? "webhook"
    : normalizedMode === "none" ? "none"
    : normalizedMode === "deliver" ? "announce"    // "deliver" is an alias for announce
    : undefined;

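Mode mapping rules

Input       | Mapped to               | Notes
announce    | announce                | Deliver directly to the channel
deliver     | announce                | Backward-compatible alias
webhook     | webhook                 | HTTP POST callback
none        | none                    | No delivery
unspecified | decided by payload kind | See below

Implicit mode inference (when the job has no delivery config):

typescript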
const isIsolatedAgentTurn =
  job.payload.kind === "agentTurn" &&
  (job.sessionTarget === "isolated" || job.sessionTarget === "current" || 
   job.sessionTarget.startsWith("session:"));
const resolvedMode = isIsolatedAgentTurn ? "announce" : "none";

Isolated agent turns default to announce mode: under the default cron-owned contract the agent's message tool is disabled (disableMessageTool=true), so the result has to be delivered by the system.
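A sketch of the mode mapping and implicit inference. JobLike flattens the relevant job fields and is hypothetical. Two simplifications are assumed: "normalized" means trimmed and lowercased, and an unrecognized explicit mode falls through to implicit inference; the excerpt above does not show how the real resolveCronDeliveryPlan treats that case.

```typescript
type DeliveryMode = "announce" | "webhook" | "none";

interface JobLike {
  payloadKind: string; // e.g. "agentTurn"
  sessionTarget: string; // e.g. "isolated", "current", "session:<key>"
  deliveryMode?: string; // raw job.delivery.mode, if any
}

// Assumes normalization means trim + lowercase.
function normalizeDeliveryMode(raw: string | undefined): DeliveryMode | undefined {
  const mode = raw?.trim().toLowerCase();
  if (mode === "announce" || mode === "deliver") return "announce"; // "deliver" is a legacy alias
  if (mode === "webhook" || mode === "none") return mode;
  return undefined;
}

function resolveDeliveryModeSketch(job: JobLike): DeliveryMode {
  const explicit = normalizeDeliveryMode(job.deliveryMode);
  if (explicit) return explicit;
  // Implicit inference: agent turns in isolated/current/session:* targets announce by default.
  const isIsolatedAgentTurn =
    job.payloadKind === "agentTurn" &&
    (job.sessionTarget === "isolated" ||
      job.sessionTarget === "current" ||
      job.sessionTarget.startsWith("session:"));
  return isIsolatedAgentTurn ? "announce" : "none";
}
```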

8.1.2 resolveFailureDestination() --- Failure destination
typescript
export function resolveFailureDestination(
  job: CronJob, globalConfig?: CronFailureDestinationConfig,
): CronFailureDeliveryPlan | null {

Precedence: job-level failureDestination > global cron.failureDestination config

Deduplication logic

typescript
if (delivery && isSameDeliveryTarget(delivery, result)) {
  return null;  // failure target equals the normal delivery target: do not send twice
}

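isSameDeliveryTarget compares mode/channel/to/accountId. If the failure destination matches the normal delivery target, the failure notification is skipped, because the normal delivery already carries the error message.

Special check for webhook mode

typescript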
if (resolvedMode === "webhook" && !to) {
  return null;  // a webhook needs a URL; without one the destination is invalid
}

Fine-grained job override logic

typescript
if (hasJobChannelField) channel = jobChannel;     // "channel" in jobFailureDest → override
if (hasJobToField) to = jobTo;                      // "to" in jobFailureDest → override
if (hasJobAccountIdField) accountId = jobAccountId; // "accountId" in jobFailureDest → override

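The checks use "field" in obj rather than obj.field !== undefined to distinguish "explicitly set to empty" from "not set". Only fields that are explicitly present override the global config.

Mode switching and its effect on to

typescript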
if (jobMode !== undefined) {
  const globalMode = globalConfig?.mode ?? "announce";
  if (!jobToExplicitValue && globalMode !== jobMode) {
    to = undefined;  // e.g. a global announce→webhook switch: clear to (it may be a chat ID, not a URL)
  }
  mode = jobMode;
}

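When the job's mode differs from the global mode (for example announce → webhook) and the job has not set to explicitly, the inherited global to is cleared: in announce mode to is a chat ID, while in webhook mode it is a URL, and the two cannot be mixed.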
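The merge rules can be combined into one sketch. mergeFailureDestination, FailureDest, and the optional primary argument (standing in for the normal delivery target passed to isSameDeliveryTarget) are hypothetical; the field names and the order of checks follow the text.

```typescript
type FailureMode = "announce" | "webhook";

interface FailureDest {
  mode?: FailureMode;
  channel?: string;
  to?: string;
  accountId?: string;
}

interface ResolvedFailureDest {
  mode: FailureMode;
  channel?: string;
  to?: string;
  accountId?: string;
}

function mergeFailureDestination(
  jobDest: FailureDest | undefined,
  globalDest: FailureDest | undefined,
  primary?: { mode: string; channel?: string; to?: string; accountId?: string },
): ResolvedFailureDest | null {
  if (!jobDest && !globalDest) return null;
  // Start from the global config (mode defaults to announce).
  let mode: FailureMode = globalDest?.mode ?? "announce";
  let channel = globalDest?.channel;
  let to = globalDest?.to;
  let accountId = globalDest?.accountId;

  if (jobDest) {
    // `in` distinguishes "explicitly present (even if undefined)" from "absent".
    if ("channel" in jobDest) channel = jobDest.channel;
    if ("to" in jobDest) to = jobDest.to;
    if ("accountId" in jobDest) accountId = jobDest.accountId;
    if (jobDest.mode !== undefined) {
      // An inherited `to` is meaningless after a mode change (chat ID vs URL).
      if (!jobDest.to && mode !== jobDest.mode) to = undefined;
      mode = jobDest.mode;
    }
  }

  if (mode === "webhook" && !to) return null; // a webhook needs a URL
  const result: ResolvedFailureDest = { mode, channel, to, accountId };
  // Same target as normal delivery: that delivery already reports the error.
  const sameAsPrimary =
    primary !== undefined &&
    primary.mode === result.mode &&
    primary.channel === result.channel &&
    primary.to === result.to &&
    primary.accountId === result.accountId;
  return sameAsPrimary ? null : result;
}
```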

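📊 Suggested figure: delivery plan generation
Nodes:
  [resolveCronDeliveryPlan] → [hasDelivery?] → [mode normalization] → [field extraction] → [return CronDeliveryPlan]
  [resolveFailureDestination] → [global config] → [job override] → [dedup check] → [return CronFailureDeliveryPlan | null]

Edges:
  - hasDelivery=false→implicit inference: "isolated→announce, otherwise→none"
  - mode=deliver→announce: "alias mapping"
  - isSameDeliveryTarget→null: "failure target equals the normal target, deduplicated"
  - webhook && !to→null: "missing URL, invalid"

Labels:
  - "job overrides take precedence over the global config"
  - "hasJobChannelField: 'channel' in obj (distinguishes explicitly empty from unset)"
  - "a mode switch also clears the inherited to"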

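8.2 delivery-dispatch.ts --- Delivery dispatch

File path: isolated-agent/delivery-dispatch.ts
Lines of code: ~430
Core responsibility: performs the actual delivery dispatch: announce/webhook delivery, subagent output collection, NO_REPLY suppression, idempotency caching, transient retries, and staleness detection.

This is the most complex single file in the cron module, interleaving several concerns: delivery routing, subagent waiting, retry policy, and cache management.

8.2.1 dispatchCronDelivery() --- Top-level dispatch entry point

The function signature takes 25+ parameters and returns a DispatchCronDeliveryState:

typescript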
export async function dispatchCronDelivery(params): Promise<DispatchCronDeliveryState> {

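Early exit paths

  1. Delivery not requested: return { delivered, deliveryAttempted, ... } immediately
  2. Heartbeat-only: skip delivery (skipHeartbeatDelivery)
  3. Already sent via the messaging tool: when skipMessagingToolDelivery is set, mark the run as delivered
  4. resolvedDelivery failed: without bestEffort → return an error; with bestEffort → log a warning and continue
8.2.2 deliverViaDirect() --- Direct delivery path
typescript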
const deliverViaDirect = async (delivery, options?) => {
  // 1. NO_REPLY suppression
  const payloadsForDelivery = rawPayloads.filter(p => {
    const text = p.text ?? "";
    if (isSilentReplyText(text, SILENT_REPLY_TOKEN)) return false;
    const upper = text.toUpperCase();
    const stripped = stripSilentToken(upper, SILENT_REPLY_TOKEN);
    return stripped === upper.trim();
  });
  
  // 2. Everything filtered out → finishSilentReplyDelivery
  if (payloadsForDelivery.length === 0) return await finishSilentReplyDelivery();
  
  // 3. Staleness check
  if (isStaleCronDelivery({ job, runStartedAt })) { ... skip ... }
  
  // 4. Idempotency cache
  const cachedResults = getCompletedDirectCronDelivery(deliveryIdempotencyKey);
  if (cachedResults) { delivered = true; return null; }
  
  // 5. Actual delivery
  const deliveryResults = options?.retryTransient
    ? await retryTransientDirectCronDelivery({ jobId, signal, run: runDelivery })
    : await runDelivery();
  
  // 6. bestEffort partial-failure handling
  let hadPartialFailure = false;
  const onError = params.deliveryBestEffort
    ? (err, _payload) => { hadPartialFailure = true; logCronDeliveryErrorDeferred(...); }
    : undefined;
  
  // 7. On success: cache the result + send an awareness notification
  if (delivered && shouldQueueCronAwareness(job, deliveryBestEffort)) {
    await queueCronAwarenessSystemEvent({...});
  }
  if (delivered) rememberCompletedDirectCronDelivery(deliveryIdempotencyKey, deliveryResults);
};

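Design intent of the seven steps:

  1. NO_REPLY suppression: handles both an exact NO_REPLY token and one appended to the end of the text.
  2. Empty filter: when every payload has been suppressed, the run is marked delivered=false, deliveryAttempted=true.
  3. Staleness check: if the delivery is more than 3 hours late (STALE_CRON_DELIVERY_MAX_START_DELAY_MS), it is skipped.
  4. Idempotency cache: the key combines runSessionId + channel + accountId + to + threadId and prevents duplicate delivery.
  5. Transient retry: retryTransientDirectCronDelivery retries up to 3 times, waiting 5s/10s/20s between attempts.
  6. bestEffort: partial failures are logged without aborting, and delivered is set to false.
  7. Awareness notification: after an isolated run delivers successfully, a system event is sent to the main session so the user knows there is new output.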
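Step 5's retry schedule can be sketched as a small generic wrapper. This assumes the 5s/10s/20s delays listed above and a caller-supplied `isTransient` predicate. `retryWithBackoff` is a hypothetical name, not the module's API. The sleep function is injectable so the schedule can be checked without actually waiting.

```typescript
// Hypothetical sketch: retry an async operation on transient errors,
// waiting 5s, 10s, then 20s between attempts (3 retries, 4 attempts total).
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

export async function retryWithBackoff<T>(
  run: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  delaysMs: readonly number[] = [5_000, 10_000, 20_000],
  sleep: Sleep = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      // Give up on permanent errors, or once every delay has been used.
      if (!isTransient(err) || attempt >= delaysMs.length) throw err;
      await sleep(delaysMs[attempt]);
    }
  }
}
```

The retry budget is simply the length of the delay list, so changing the schedule never requires touching the loop.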
8.2.3 finalizeTextDelivery() --- Finalizing Text Delivery
```typescript
const finalizeTextDelivery = async (delivery) => {
  // 1. Subagent output collection
  const expectedSubagentFollowup = expectsSubagentFollowup(initialSynthesizedText);
  let activeSubagentRuns = countActiveDescendantRuns(agentSessionKey);

  // 2. Fallback to completed descendants
  const completedDescendantReply = shouldCheckCompletedDescendants
    ? await readDescendantSubagentFallbackReply({ sessionKey, runStartedAt })
    : undefined;

  // 3. Wait for active descendants
  if (activeSubagentRuns > 0 || expectedSubagentFollowup) {
    let finalReply = await waitForDescendantSubagentSummary({ /* ... */ });
    // ...
  }

  // 4. Descendants still active → return a partial result
  if (activeSubagentRuns > 0) {
    deliveryAttempted = true;
    return { status: "ok", deliveryAttempted /* , ... */ };
  }

  // 5. Suppress an interim acknowledgement that was never improved on
  if (hadDescendants && synthesizedText === initialSynthesizedText && isLikelyInterimCronMessage(/* ... */)) {
    deliveryAttempted = true;
    return { status: "ok" /* , ... */ };  // suppress a stale "on it"
  }

  // 6. NO_REPLY check
  // 7. Actual delivery
  return await deliverViaDirect(delivery, { retryTransient: true });
};
```

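Three-tier strategy for collecting subagent output:

| Condition | Behavior |
|---|---|
| Active descendants exist, or expectsSubagentFollowup is true | waitForDescendantSubagentSummary(): push-based wait |
| No active descendants, but completed ones exist and isLikelyInterimCronMessage matches | readDescendantSubagentFallbackReply(): reads the frozen result |
| Neither applies | The initial text is delivered as-is |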
8.2.4 Transient Error Classification
```typescript
const TRANSIENT_DIRECT_CRON_DELIVERY_ERROR_PATTERNS: readonly RegExp[] = [
  /\berrorcode=unavailable\b/i,
  /\bUNAVAILABLE\b/,
  /no active .* listener/i,
  /gateway not connected/i,
  /gateway closed \(1006/i,
  /\b(econnreset|econnrefused|etimedout|enotfound|ehostunreach|network error)\b/i,
];

const PERMANENT_DIRECT_CRON_DELIVERY_ERROR_PATTERNS: readonly RegExp[] = [
  /unsupported channel/i,
  /chat not found/i,
  /bot.*not.*member/i,
  /bot was blocked by the user/i,
  /forbidden: bot was kicked/i,
];
```

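Classification logic:

  1. The error is first checked against the permanent patterns. A match means it is not retried.
  2. Only if no permanent pattern matches is it checked against the transient patterns. A match means it is retried.
  3. An error that matches neither list is not retried.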
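The three-step rule reduces to a small pure function. The two pattern lists are copied from the excerpt above. The function name `isTransientDirectCronDeliveryError` and the way an error is turned into a string are assumptions.

```typescript
const TRANSIENT_PATTERNS: readonly RegExp[] = [
  /\berrorcode=unavailable\b/i,
  /\bUNAVAILABLE\b/,
  /no active .* listener/i,
  /gateway not connected/i,
  /gateway closed \(1006/i,
  /\b(econnreset|econnrefused|etimedout|enotfound|ehostunreach|network error)\b/i,
];

const PERMANENT_PATTERNS: readonly RegExp[] = [
  /unsupported channel/i,
  /chat not found/i,
  /bot.*not.*member/i,
  /bot was blocked by the user/i,
  /forbidden: bot was kicked/i,
];

// Hypothetical helper: permanent patterns win, then transient ones;
// anything that matches neither list is treated as not retryable.
export function isTransientDirectCronDeliveryError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  if (PERMANENT_PATTERNS.some((re) => re.test(message))) return false;
  return TRANSIENT_PATTERNS.some((re) => re.test(message));
}
```

Checking the permanent list first matters when a message matches both lists. For example, "gateway not connected: chat not found" should not be retried.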
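8.2.5 Idempotency Cache and Pruning
```typescript
const COMPLETED_DIRECT_CRON_DELIVERIES = new Map<string, CompletedDirectCronDelivery>();

function pruneCompletedDirectCronDeliveries(now: number) {
  const ttlMs = process.env.OPENCLAW_TEST_FAST === "1" ? 60_000 : 24 * 60 * 60 * 1000;
  // Evict entries whose TTL has expired
  // Above 2000 entries, sort by time and delete the oldest
}
```

This is an in-memory cache with a 24-hour TTL (60 seconds when OPENCLAW_TEST_FAST=1) and a cap of 2,000 entries. Pruning runs on every read and write, so a long-running process cannot leak memory through this map.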
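The following sketch shows a TTL- and size-bounded idempotency map in the spirit of COMPLETED_DIRECT_CRON_DELIVERIES. The key follows the runSessionId + channel + accountId + to + threadId scheme from step 4 of the list in 8.2.2. The class name, the fields, and the choice to prune on every get/set are illustrative assumptions.

```typescript
type CompletedDelivery = { results: unknown; recordedAtMs: number };

// Hypothetical key builder; missing optional parts become empty strings.
export function buildDeliveryIdempotencyKey(parts: {
  runSessionId: string;
  channel: string;
  accountId?: string;
  to: string;
  threadId?: string | number;
}): string {
  return [parts.runSessionId, parts.channel, parts.accountId ?? "", parts.to, parts.threadId ?? ""].join(":");
}

// Hypothetical cache: TTL eviction first, then oldest-first eviction above the cap.
export class CompletedDeliveryCache {
  private readonly entries = new Map<string, CompletedDelivery>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;

  constructor(ttlMs = 24 * 60 * 60 * 1000, maxEntries = 2000) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
  }

  private prune(now: number): void {
    for (const [key, entry] of this.entries) {
      if (now - entry.recordedAtMs > this.ttlMs) this.entries.delete(key);
    }
    if (this.entries.size > this.maxEntries) {
      const oldestFirst = [...this.entries].sort((a, b) => a[1].recordedAtMs - b[1].recordedAtMs);
      for (const [key] of oldestFirst.slice(0, this.entries.size - this.maxEntries)) {
        this.entries.delete(key);
      }
    }
  }

  get(key: string, now = Date.now()): unknown | undefined {
    this.prune(now);
    return this.entries.get(key)?.results;
  }

  remember(key: string, results: unknown, now = Date.now()): void {
    this.entries.set(key, { results, recordedAtMs: now });
    this.prune(now);
  }
}
```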

📊 Suggested figure: delivery dispatch flow
```
Nodes:
  [dispatchCronDelivery] → [delivery requested?] → [resolveDelivery succeeded?] → [structured content?] → [deliverViaDirect / finalizeTextDelivery]

  Subflow (deliverViaDirect):
    [NO_REPLY filter] → [staleness check] → [idempotency cache check] → [actual delivery] → [bestEffort handling] → [awareness notification] → [cache record]

  Subflow (finalizeTextDelivery):
    [subagent detection] → [wait for / read descendant output] → [interim-ack suppression] → [NO_REPLY check] → [deliverViaDirect]

Edges:
  - heartbeat-only→exit: "skipHeartbeatDelivery=true"
  - messaging-tool→exit: "skipMessagingToolDelivery=true"
  - stale→exit: "STALE_CRON_DELIVERY_MAX_START_DELAY_MS=3h"
  - cache hit→exit: "delivered=true, return null"
  - transient error→retry: "3 times, 5s/10s/20s intervals"
  - permanent error→exit(error): "returns an error when not bestEffort"

Annotations:
  - "Idempotency key: runSessionId:channel:accountId:to:threadId"
  - "bestEffort: hadPartialFailure → delivered=false"
  - "Subagent wait: push-based (gateway RPC)"
```

8.3 delivery-target.ts --- Target Resolution

File path: isolated-agent/delivery-target.ts
Lines of code: ~130
Core responsibility: resolves the delivery configuration into a concrete channel/target/account/thread, including the fallback logic for "last" mode.

8.3.1 resolveDeliveryTarget() Core Flow
```typescript
export async function resolveDeliveryTarget(cfg, agentId, jobPayload): Promise<DeliveryTargetResolution> {
  const requestedChannel = typeof jobPayload.channel === "string" ? jobPayload.channel : "last";
  const explicitTo = typeof jobPayload.to === "string" ? jobPayload.to : undefined;
  // ... Steps 1-6 below
}
```

When the job gives no channel, requestedChannel defaults to "last".

Step 1: Session lookup

```typescript
const mainSessionKey = resolveAgentMainSessionKey({ cfg, agentId });
const store = loadSessionStore(storePath);
const threadEntry = threadSessionKey ? store[threadSessionKey] : undefined;
const main = threadEntry ?? store[mainSessionKey];
```

The thread-specific session (for example agent:main:main:thread:1234) is looked up first. If there is none, the main session is used.

Step 2: Preliminary resolution

```typescript
const preliminary = resolveSessionDeliveryTarget({
  entry: main, requestedChannel, explicitTo, explicitThreadId, allowMismatchedLastTo,
});
```

resolveSessionDeliveryTarget extracts lastChannel/lastTo/lastAccountId/lastThreadId from the session entry and reconciles them with the requested parameters.

Step 3: Channel fallback

```typescript
if (!preliminary.channel) {
  if (preliminary.lastChannel) {
    fallbackChannel = preliminary.lastChannel;
  } else {
    const selection = await resolveMessageChannelSelection({ cfg });
    fallbackChannel = selection.channel;
  }
}
```

Three-tier fallback: requested channel → session lastChannel → global default channel selection (resolveMessageChannelSelection).

Step 4: AccountId resolution

```typescript
const explicitAccountId = jobPayload.accountId?.trim();
let accountId = explicitAccountId ?? resolved.accountId;
if (!accountId && channel) {
  accountId = resolveFirstBoundAccountId({ cfg, channelId: channel, agentId });
}
if (jobPayload.accountId) accountId = jobPayload.accountId;  // highest priority
```

AccountId priority is as follows:

  1. job.delivery.accountId. It is re-applied at the end, so it always wins.
  2. explicitAccountId, the trimmed form of the same field.
  3. The session's lastAccountId.
  4. The first account bound to the agent.

Step 5: Target resolution (docking)

```typescript
const docked = await resolveOutboundTargetWithRuntime({
  channel, to: toCandidate, cfg, accountId, mode, allowFrom: effectiveAllowFrom,
});
const idLikeTarget = await maybeResolveIdLikeTarget({ cfg, channel, input: docked.to, accountId });
```

Two-stage resolution:

  1. resolveOutboundTargetWithRuntime: resolves the target identifier into the channel's internal format (such as a Telegram chat ID)
  2. maybeResolveIdLikeTarget: resolves ID-style strings (such as ou_xxxx) into the actual target

Step 6: allowFrom safety check

```typescript
if (mode === "implicit") {
  const configuredAllowFrom = channelPlugin?.config.resolveAllowFrom?.({ cfg, accountId }) ?? [];
  const storeAllowFrom = readChannelAllowFromStoreEntriesSync(channel, env, accountId);
  const allowFromOverride = [...new Set([...configuredAllowFrom, ...storeAllowFrom])];

  if (toCandidate && allowFromOverride.length > 0) {
    const currentTargetResolution = await resolveOutboundTargetWithRuntime({ /* ... */ });
    if (!currentTargetResolution.ok) {
      toCandidate = allowFromOverride[0];  // fall back to the first allowed target in allowFrom
    }
  }
}
```

Implicit mode handles a target that fails to resolve under the allowFrom constraint, meaning it is not an allowed target. In that case the code falls back to the first allowed target in the list, so messages are never sent to an unauthorized target.
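The allowFrom guard boils down to a set union plus a membership check. The sketch below replaces the runtime target resolution with a plain string comparison, which is a simplifying assumption. `pickImplicitTarget` is a hypothetical name.

```typescript
// Hypothetical sketch of the implicit-mode allowFrom guard.
// Assumption: "fails to resolve under allowFrom" is modeled as "not in the list".
export function pickImplicitTarget(
  candidate: string | undefined,
  configuredAllowFrom: readonly string[] | undefined,
  storeAllowFrom: readonly string[],
): string | undefined {
  // allowFrom = configured ∪ store, de-duplicated, first-seen order preserved.
  const allowFrom = [...new Set([...(configuredAllowFrom ?? []), ...storeAllowFrom])];
  if (!candidate || allowFrom.length === 0) return candidate; // nothing to enforce
  return allowFrom.includes(candidate) ? candidate : allowFrom[0];
}
```

Because `Set` preserves insertion order, configured entries come before store entries. The fallback target is therefore the first configured one whenever any are configured.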

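📊 Suggested figure: target resolution flow
```
Nodes:
  [resolveDeliveryTarget] → [session lookup] → [preliminary resolution] → [channel fallback] → [accountId resolution] → [target docking] → [allowFrom check] → [return result]

Edges:
  - session lookup→thread first→main fallback: "thread session first, then main session"
  - channel fallback→lastChannel→global default: "requested→last→resolveMessageChannelSelection"
  - AccountId→job.delivery.accountId: "highest priority"
  - allowFrom mismatch→fall back to first entry: "implicit-mode safety guard"
  - docking failure→ok=false: "returns an error"

Annotations:
  - "allowFrom = configured ∪ store, de-duplicated"
  - "maybeResolveIdLikeTarget: resolves ID-style targets"
```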

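8.4 Comparing webhook and announce Modes

| Dimension | announce | webhook |
|---|---|---|
| Delivery method | Channel message (Telegram/Discord/Feishu, etc.) | HTTP POST request |
| Target resolution | resolveDeliveryTarget → channel/user/thread | delivery.to used as the URL |
| Subagent wait | ✅ waits for subagents in finalizeTextDelivery | ❌ sent immediately, no wait |
| Transient retry | ✅ 3 retries with exponential backoff | Depends on the webhook implementation |
| Idempotency cache | COMPLETED_DIRECT_CRON_DELIVERIES | |
| Staleness check | ✅ 3-hour threshold | |
| NO_REPLY suppression | | |
| bestEffort | ✅ tolerates partial failure | |
| Awareness notification | ✅ queueCronAwarenessSystemEvent | |
| Channel/AccountId | Required | Not needed |
| Thread support | ✅ threadId | |

Design philosophy:

  • announce is "human-readable" delivery. It targets end users and needs full message routing, format conversion, and security checks.
  • webhook is "machine-readable" delivery. It targets external systems and prioritizes reliability and simplicity.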

8.5 Failure Destination (failureDestination) Handling

Trigger path: sendFailureNotificationAnnounce() in delivery.ts

```typescript
export async function sendFailureNotificationAnnounce(
  deps, cfg, agentId, jobId, target, message,
): Promise<void> {
  const resolvedTarget = await resolveDeliveryTarget(cfg, agentId, {
    channel: target.channel, to: target.to, accountId: target.accountId,
    sessionKey: target.sessionKey,
  });
  // (if target resolution fails, a warning is logged and the function returns)
  // Deliver the failure message as a single text payload
  await deliverOutboundPayloads({
    channel: resolvedTarget.channel, to: resolvedTarget.to, /* ... */
    payloads: [{ text: message }],
    bestEffort: false,  // failure notifications do not allow partial failure
  });
}
```

Key constraints:

  • 30-second timeout (FAILURE_NOTIFICATION_TIMEOUT_MS)
  • bestEffort: false, so the failure notification must be delivered in full
  • A delivery failure only logs a warning and does not change what the main flow returns
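One way to give the failure notification its own 30-second budget is a dedicated AbortController that a timer aborts and that `finally` cleans up. The sketch is generic: `runWithTimeout` is hypothetical, and the real code may wire the signal differently. Failures become a warning rather than an exception, matching the third constraint.

```typescript
// Hypothetical sketch: run a task with its own AbortController and a hard timeout.
// Any failure, including the timeout, is reported through `warn` instead of being rethrown.
export async function runWithTimeout(
  task: (signal: AbortSignal) => Promise<void>,
  timeoutMs: number,
  warn: (message: string) => void,
): Promise<boolean> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      // Reject before aborting so the race settles on the timeout, not on the task.
      reject(new Error(`timed out after ${timeoutMs}ms`));
      controller.abort();
    }, timeoutMs);
  });
  try {
    await Promise.race([task(controller.signal), timeout]);
    return true;
  } catch (err) {
    warn(`failure notification not delivered: ${err instanceof Error ? err.message : String(err)}`);
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

The caller would pass something like `(signal) => deliverOutboundPayloads({ ..., signal })` with a timeout of 30_000, assuming the delivery function accepts an abort signal.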

Deduplication (in resolveFailureDestination):

```typescript
if (delivery && isSameDeliveryTarget(delivery, result)) {
  return null;  // same target as normal delivery: skip to avoid a double notification
}
```
📊 Suggested figure: failure destination handling
```
Nodes:
  [cron run failed] → [resolveFailureDestination] → [sendFailureNotificationAnnounce] → [resolveDeliveryTarget] → [deliverOutboundPayloads]

Edges:
  - resolveFailureDestination→null: "same as normal delivery, or invalid config"
  - resolveFailureDestination→plan: "separate failure target"
  - resolveDeliveryTarget fails→warn+exit: "warning logged only"
  - deliverOutboundPayloads fails→warn: "bestEffort=false, 30s timeout"

Annotations:
  - "30s timeout with its own AbortController"
  - "dedup: isSameDeliveryTarget"
```

9. Auxiliary Modules in Depth

9.1 normalize.ts --- Input Normalization

File path: normalize.ts
Lines of code: ~300
Core responsibility: normalizes raw input from users, the CLI, and the API into the unified internal CronJob format. This is the system's first line of defensive programming.

9.1.1 Overall Architecture
```
normalizeCronJobInput(raw, options)
  ├─ unwrapJob(raw)              --- unwrap {data: ...} / {job: ...} wrappers
  ├─ Agent ID / SessionKey / Enabled normalization
  ├─ coerceSchedule(base.schedule)
  ├─ inferTopLevelPayload(next)   --- infer payload from top-level fields
  ├─ coercePayload(base.payload)
  ├─ coerceDelivery(base.delivery)
  ├─ copyTopLevelAgentTurnFields() --- migrate legacy-format fields into payload
  ├─ stripLegacyTopLevelFields()   --- remove legacy top-level fields
  └─ applyDefaults (optional)
```
9.1.2 coerceSchedule() Line by Line
```typescript
function coerceSchedule(schedule: UnknownRecord) {
  const next: UnknownRecord = { ...schedule };
  // ...
}
```

Kind inference:

```typescript
const rawKind = normalizeLowercaseStringOrEmpty(schedule.kind);
const kind = rawKind === "at" || rawKind === "every" || rawKind === "cron" ? rawKind : undefined;
// When kind is not specified explicitly, infer it:
if (!kind) {
  if (typeof schedule.atMs === "number" || typeof schedule.at === "string" /* || ... */) {
    next.kind = "at";        // atMs/at present → one-shot
  } else if (typeof schedule.everyMs === "number") {
    next.kind = "every";     // everyMs present → repeating
  } else if (normalizedExpr) {
    next.kind = "cron";      // cron expression present → cron
  }
}
```

Inference runs in three steps (atMs/at → everyMs → expr), which covers the most common user input patterns.

atMs normalization:

```typescript
const parsedAtMs =
  typeof atMsRaw === "number" ? atMsRaw                           // numeric value as-is
    : typeof atMsRaw === "string" ? parseAbsoluteTimeMs(atMsRaw)  // timestamp given as a string
    : atString ? parseAbsoluteTimeMs(atString)                    // ISO time string
    : null;
if (parsedAtMs !== null) {
  next.at = new Date(parsedAtMs).toISOString();
} else if (atString) {
  next.at = atString;  // unparseable: keep the raw string so validation can report it
}
delete next.atMs;  // normalized to the ISO string form
```

Kind cleanup (removes fields that do not apply):

```typescript
if (next.kind === "at") {
  delete next.everyMs; delete next.anchorMs; delete next.expr;
  delete next.tz; delete next.staggerMs;
} else if (next.kind === "every") {
  delete next.at; delete next.expr; delete next.tz; delete next.staggerMs;
} else if (next.kind === "cron") {
  delete next.at; delete next.everyMs; delete next.anchorMs;
}
```

Each kind keeps only the fields relevant to it, which reduces storage size and ambiguity.
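The kind inference and per-kind cleanup can be condensed into one function over a plain record. The sketch covers only the fields shown above (atMs/at, everyMs, expr). It leaves out the atMs-to-ISO conversion and stagger handling. `inferAndCleanSchedule` is a hypothetical name.

```typescript
type UnknownRecord = Record<string, unknown>;
type ScheduleKind = "at" | "every" | "cron";

const isScheduleKind = (value: string): value is ScheduleKind =>
  value === "at" || value === "every" || value === "cron";

// Fields that are irrelevant for each kind (mirrors the cleanup rules above).
const FIELDS_TO_DROP: Record<ScheduleKind, readonly string[]> = {
  at: ["everyMs", "anchorMs", "expr", "tz", "staggerMs"],
  every: ["at", "expr", "tz", "staggerMs"],
  cron: ["at", "everyMs", "anchorMs"],
};

// Hypothetical condensed version of coerceSchedule's kind inference and cleanup.
export function inferAndCleanSchedule(schedule: UnknownRecord): UnknownRecord {
  const next: UnknownRecord = { ...schedule };
  const kindValue = schedule.kind;
  const rawKind = typeof kindValue === "string" ? kindValue.trim().toLowerCase() : "";
  let kind: ScheduleKind | undefined = isScheduleKind(rawKind) ? rawKind : undefined;
  if (!kind) {
    // Three-step inference: atMs/at → everyMs → expr.
    const expr = schedule.expr;
    if (typeof schedule.atMs === "number" || typeof schedule.at === "string") kind = "at";
    else if (typeof schedule.everyMs === "number") kind = "every";
    else if (typeof expr === "string" && expr.trim() !== "") kind = "cron";
  }
  if (!kind) return next; // nothing recognizable: leave it for validation to reject
  next.kind = kind;
  for (const field of FIELDS_TO_DROP[kind]) delete next[field];
  return next;
}
```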

Stagger handling:

```typescript
const staggerMs = normalizeCronStaggerMs(schedule.staggerMs);
if (staggerMs !== undefined) next.staggerMs = staggerMs;
else delete next.staggerMs;
```

normalizeCronStaggerMs converts the input to a non-negative integer. For an invalid value the field is deleted.

9.1.3 coercePayload() Line by Line

Kind normalization:

```typescript
const kindRaw = normalizeLowercaseStringOrEmpty(next.kind);
if (kindRaw === "agentturn") next.kind = "agentTurn";   // restore camelCase
else if (kindRaw === "systemevent") next.kind = "systemEvent";
```

Implicit kind inference:

```typescript
if (!next.kind) {
  if (hasMessage) next.kind = "agentTurn";     // message present → agent turn
  else if (hasText) next.kind = "systemEvent"; // text present → system event
  else if (hasAgentTurnPayloadHint(next)) next.kind = "agentTurn"; // model/fallbacks/thinking etc. → agentTurn
}
```

hasAgentTurnPayloadHint detects patches that carry only agentTurn-specific fields (no message or text):

```typescript
function hasAgentTurnPayloadHint(payload) {
  return hasTrimmedStringValue(payload.model) ||
    normalizeTrimmedStringArray(payload.fallbacks) !== undefined ||
    normalizeTrimmedStringArray(payload.toolsAllow, { allowNull: true }) !== undefined ||
    hasTrimmedStringValue(payload.thinking) ||
    typeof payload.timeoutSeconds === "number" ||
    typeof payload.lightContext === "boolean" ||
    typeof payload.allowUnsafeExternalContent === "boolean";
}
```

Kind cleanup:

```typescript
if (next.kind === "systemEvent") {
  delete next.message; delete next.model; delete next.fallbacks;
  delete next.thinking; delete next.timeoutSeconds; delete next.lightContext;
  delete next.allowUnsafeExternalContent; delete next.toolsAllow;
} else if (next.kind === "agentTurn") {
  delete next.text;  // agentTurn does not use text
}
```

Legacy field cleanup:

```typescript
delete next.deliver;      // legacy delivery flag
delete next.channel;      // legacy channel field
delete next.to;           // legacy target field
delete next.threadId;     // legacy thread field
delete next.bestEffortDeliver;  // legacy bestEffort
delete next.provider;     // legacy provider
```

In the old API these fields lived at the top level of payload. They have since moved into the delivery sub-object.

9.1.4 coerceDelivery() Line by Line
```typescript
function coerceDelivery(delivery: UnknownRecord) {
  const parsed = parseDeliveryInput(delivery);  // validated with Zod schemas
  // assign or delete each field individually
}
```

Parsing is delegated to the Zod schemas in delivery-field-schemas.ts for type-safe handling:

```typescript
export const DeliveryModeFieldSchema = z
  .preprocess(trimLowercaseStringPreprocess, z.enum(["deliver", "announce", "none", "webhook"]))
  .transform(value => value === "deliver" ? "announce" : value);
```

The alias mapping deliver → announce happens at the schema layer.

9.1.5 The applyDefaults Stage
```typescript
if (options.applyDefaults) {
  if (!next.wakeMode) next.wakeMode = "now";
  if (typeof next.enabled !== "boolean") next.enabled = true;
  if (!next.name) next.name = inferLegacyName({ /* ... */ });

  // sessionTarget default
  if (!next.sessionTarget) {
    if (kind === "systemEvent") next.sessionTarget = "main";
    else if (kind === "agentTurn") next.sessionTarget = "isolated";
  }

  // "current" → resolve to the actual sessionKey
  if (next.sessionTarget === "current") {
    next.sessionTarget = `session:${assertSafeCronSessionTargetId(sessionKey)}`;
    // falls back to "isolated" when there is no sessionContext
  }

  // "at" schedules default to deleteAfterRun=true (unless set explicitly)
  if (schedule.kind === "at" && next.deleteAfterRun === undefined) next.deleteAfterRun = true;

  // cron schedules get a default stagger
  if (schedule.kind === "cron") { /* ... resolveDefaultCronStaggerMs ... */ }

  // isolated agentTurn defaults to delivery: { mode: "announce" }
  if (!hasDelivery && isIsolatedAgentTurn && payloadKind === "agentTurn") {
    next.delivery = { mode: "announce" };
  }
}
```

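Key default-value policies:

  • agentTurn defaults to sessionTarget="isolated", which avoids accumulating tokens in a long-lived session
  • systemEvent defaults to sessionTarget="main", so system events are injected into the main session
  • "at" schedules default to deleteAfterRun=true, so one-shot jobs are cleaned up automatically after they run
  • isolated agentTurn defaults to announce delivery, because the agent cannot send messages on its own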
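The default rules above can be condensed into one function over a simplified job shape. This is a sketch under stated assumptions. The `JobDraft` type and `applyCronDefaults` are hypothetical. The sketch omits the "current" session resolution, name inference, and cron stagger defaults. It also treats an explicit `deleteAfterRun: false` as a deliberate user choice.

```typescript
// Hypothetical, flattened job shape for illustration only.
type JobDraft = {
  scheduleKind: "at" | "every" | "cron";
  payloadKind: "agentTurn" | "systemEvent";
  sessionTarget?: string;
  wakeMode?: string;
  enabled?: boolean;
  deleteAfterRun?: boolean;
  delivery?: { mode: string };
};

// Hypothetical condensation of the applyDefaults rules.
export function applyCronDefaults(job: JobDraft): JobDraft {
  const next: JobDraft = { ...job };
  if (!next.wakeMode) next.wakeMode = "now";
  if (typeof next.enabled !== "boolean") next.enabled = true;
  // systemEvent → main session; agentTurn → a fresh isolated session.
  if (!next.sessionTarget) next.sessionTarget = next.payloadKind === "systemEvent" ? "main" : "isolated";
  // One-shot jobs delete themselves after running unless told otherwise.
  if (next.scheduleKind === "at" && next.deleteAfterRun === undefined) next.deleteAfterRun = true;
  // Isolated agent turns announce their result when no delivery was configured.
  if (!next.delivery && next.sessionTarget === "isolated" && next.payloadKind === "agentTurn") {
    next.delivery = { mode: "announce" };
  }
  return next;
}
```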
📊 Suggested figure: normalization pipeline
```
Nodes:
  [normalizeCronJobInput] → [unwrapJob] → [Agent ID/SessionKey/Enabled] → [coerceSchedule] → [inferPayload] → [coercePayload] → [coerceDelivery] → [copyTopLevelFields] → [stripLegacyFields] → [applyDefaults]

Edges:
  - each step→next step: "normalized intermediate result"
  - sub-edges inside applyDefaults:
    - sessionTarget=current→session:xxx: "sessionContext available"
    - sessionTarget=current→isolated: "no sessionContext"
    - kind=at→deleteAfterRun=true: "one-shot job"
    - cron→staggerMs: "isRecurringTopOfHourCronExpr → 5min"
    - isolated agentTurn→delivery.announce: "default delivery"

Annotations:
  - "coerceSchedule: kind inference → atMs normalization → field cleanup"
  - "coercePayload: kind inference → field cleanup → legacy field removal"
  - "coerceDelivery: Zod schema parsing"
```

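9.2 store.ts --- Persistence and Backup Strategy

File path: store.ts
Lines of code: ~160
Core responsibility: manages reads and writes of jobs.json. It implements safe writes, backups, caching, and stripping of runtime-only fields.

9.2.1 loadCronStore() --- Loading
```typescript
export async function loadCronStore(storePath: string): Promise<CronStoreFile> {
  const raw = await fs.promises.readFile(storePath, "utf-8");
  const parsed = parseJsonWithJson5Fallback(raw);  // tolerates JSON5 syntax
  // ... jobs is extracted from parsed (elided)
  const store = { version: 1, jobs: jobs.filter(Boolean) };
  serializedStoreCache.set(storePath, JSON.stringify(store, null, 2));  // cache the serialized form
  return store;
}
```

JSON5 tolerance: parseJsonWithJson5Fallback tries standard JSON parsing first and falls back to JSON5 on failure, so comments, trailing commas, and the like are accepted.

Caching strategy: right after loading, the result of JSON.stringify(store, null, 2) is cached. Later saves use it for a short-circuit comparison.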

9.2.2 saveCronStore() --- Saving
```typescript
export async function saveCronStore(storePath, store, opts?) {
  const json = JSON.stringify(store, null, 2);
  const cached = serializedStoreCache.get(storePath);

  // Short-circuit 1: identical to the cache
  if (cached === json) return;

  // Short-circuit 2: identical to the file (null when the file does not exist yet)
  const previous = cached ?? await fs.promises.readFile(storePath, "utf-8").catch(() => null);
  if (previous === json) { serializedStoreCache.set(storePath, json); return; }

  // Backup check: skip the backup when only runtime fields changed
  const skipBackup = opts?.skipBackup || shouldSkipCronBackupForRuntimeOnlyChanges(previous, store);

  // Safe write: tmp file → rename
  const tmp = `${storePath}.${process.pid}.${randomBytes(8).toString("hex")}.tmp`;
  await fs.promises.writeFile(tmp, json, { encoding: "utf-8", mode: 0o600 });
  if (previous !== null && !skipBackup) {
    await fs.promises.copyFile(storePath, `${storePath}.bak`);
  }
  await renameWithRetry(tmp, storePath);
  serializedStoreCache.set(storePath, json);
}
```

Three levels of short-circuiting:

  1. In-memory cache comparison: avoids file I/O
  2. File content comparison: avoids the write
  3. Runtime-field diff: avoids an unnecessary backup
9.2.3 Stripping Runtime-Only Fields
```typescript
function stripRuntimeOnlyCronFields(store: CronStoreFile): unknown {
  return {
    version: store.version,
    jobs: store.jobs.map(job => {
      const { state: _state, updatedAtMs: _updatedAtMs, ...rest } = job;
      return rest;
    }),
  };
}

function shouldSkipCronBackupForRuntimeOnlyChanges(previousRaw, nextStore): boolean {
  const previous = parseCronStoreForBackupComparison(previousRaw);
  return JSON.stringify(stripRuntimeOnlyCronFields(previous)) === JSON.stringify(stripRuntimeOnlyCronFields(nextStore));
}
```

state and updatedAtMs are updated dynamically at runtime; nextRunAtMs, for example, changes on every tick. Both fields are stripped before the comparison. If only these fields changed, no .bak backup file is created, which reduces disk I/O and backup noise.
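The backup decision can be reproduced with plain objects. The sketch strips `state` and `updatedAtMs` from every job, then compares the serialized results. `StoreLike` and `onlyRuntimeFieldsChanged` are hypothetical names. Comparing `JSON.stringify` output is sensitive to key order. That is an acceptable trade-off when both sides are produced by the same code path.

```typescript
type Job = { id: string; state?: unknown; updatedAtMs?: number; [key: string]: unknown };
type StoreLike = { version: number; jobs: Job[] };

// Drop the runtime-only fields from every job.
function stripRuntimeOnly(store: StoreLike): unknown {
  return {
    version: store.version,
    jobs: store.jobs.map(({ state: _state, updatedAtMs: _updatedAtMs, ...rest }) => rest),
  };
}

// Hypothetical: true when the two stores differ only in runtime-only fields.
export function onlyRuntimeFieldsChanged(previousRaw: string, next: StoreLike): boolean {
  let previous: StoreLike;
  try {
    previous = JSON.parse(previousRaw) as StoreLike;
  } catch {
    return false; // unreadable previous file: keep the backup to be safe
  }
  if (!previous || !Array.isArray(previous.jobs)) return false;
  return JSON.stringify(stripRuntimeOnly(previous)) === JSON.stringify(stripRuntimeOnly(next));
}
```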

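9.2.4 renameWithRetry() --- Atomic Write with Retries
```typescript
import { setTimeout } from "node:timers/promises";  // assumed: promise-based setTimeout

async function renameWithRetry(src, dest): Promise<void> {
  for (let attempt = 0; attempt <= RENAME_MAX_RETRIES; attempt++) {
    try {
      await fs.promises.rename(src, dest);
      return;
    } catch (err) {
      const code = (err as NodeJS.ErrnoException).code;
      if (code === "EBUSY" && attempt < RENAME_MAX_RETRIES) {
        await setTimeout(RENAME_BASE_DELAY_MS * 2 ** attempt);  // exponential backoff
        continue;
      }
      // Windows compatibility: when rename cannot replace an existing file, fall back to copyFile + unlink
      if (code === "EPERM" || code === "EEXIST") {
        await fs.promises.copyFile(src, dest);
        await fs.promises.unlink(src).catch(() => {});
        return;
      }
      throw err;
    }
  }
}
```

Cross-platform atomic writes:

  • Linux/macOS: rename() replaces the file atomically
  • Windows: on EPERM/EEXIST, falls back to copyFile + unlink (not atomic, but reliable)
  • EBUSY: retried with exponential backoff (for example, when antivirus software holds a lock on the file)

File permissions: all files are created with 0o600 (owner read/write only) and directories with 0o700 (owner access only).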

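📊 Suggested figure: store persistence flow
```
Nodes:
  [saveCronStore] → [cache comparison] → [file comparison] → [backup check] → [safe write] → [renameWithRetry]

Edges:
  - cache hit→exit: "no change"
  - file identical→exit: "no change"
  - runtime-only change→skipBackup: "no .bak created"
  - rename succeeded→exit: "done"
  - EBUSY→retry: "exponential backoff, up to 3 times"
  - EPERM/EEXIST→copyFile: "Windows compatibility"

Annotations:
  - "0o600 file permissions, 0o700 directory permissions"
  - "tmp name: storePath.pid.randomHex.tmp"
  - "JSON5-tolerant parsing"
```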

9.3 run-log.ts --- JSONL Logging and Pruning

File path: run-log.ts
Lines of code: ~300
Core responsibility: records the result of every cron run in a JSONL file, with paged queries and automatic pruning.

9.3.1 appendCronRunLog() --- Appending Entries
```typescript
export async function appendCronRunLog(filePath, entry, opts?) {
  const resolved = path.resolve(filePath);
  const runDir = path.dirname(resolved);  // assumed: the directory holding the log file
  const prev = writesByPath.get(resolved) ?? Promise.resolve();
  const next = prev.catch(() => undefined).then(async () => {
    await fs.mkdir(runDir, { recursive: true, mode: 0o700 });
    await fs.appendFile(resolved, `${JSON.stringify(entry)}\n`, { encoding: "utf-8", mode: 0o600 });
    await pruneIfNeeded(resolved, { maxBytes, keepLines });  // limits come from opts/config (elided)
  });
  writesByPath.set(resolved, next);
  try { await next; }
  finally { if (writesByPath.get(resolved) === next) writesByPath.delete(resolved); }
}
```

Serialized writes: the writesByPath map makes writes to the same file strictly sequential. Each write waits for the previous one to finish before it starts, which prevents concurrent writes from interleaving JSONL lines.
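The writesByPath pattern generalizes to any keyed async work. Each task is chained onto the previous promise for its key. The previous task's error is swallowed so one failure does not block the queue. The map entry is deleted only if no newer task has replaced it. `runSerialized` is a hypothetical name; the module itself does this inline around fs.appendFile.

```typescript
const tailsByKey = new Map<string, Promise<unknown>>();

// Hypothetical sketch: tasks sharing a key run one at a time, in call order.
export async function runSerialized<T>(key: string, task: () => Promise<T>): Promise<T> {
  const prev = tailsByKey.get(key) ?? Promise.resolve();
  // A failed predecessor must not prevent this task from running.
  const next = prev.catch(() => undefined).then(task);
  tailsByKey.set(key, next);
  try {
    return await next;
  } finally {
    // Only the most recent task for this key may remove the entry.
    if (tailsByKey.get(key) === next) tailsByKey.delete(key);
  }
}
```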

CronRunLogEntry structure:

```typescript
type CronRunLogEntry = {
  ts: number;              // timestamp
  jobId: string;           // job ID
  action: "finished";      // action type (currently only "finished")
  status?: "ok" | "error" | "skipped";
  error?: string;
  summary?: string;
  delivered?: boolean;
  deliveryStatus?: "delivered" | "not-delivered" | "unknown" | "not-requested";
  deliveryError?: string;
  sessionId?: string;
  sessionKey?: string;
  runAtMs?: number;
  durationMs?: number;
  nextRunAtMs?: number;
  model?: string;
  provider?: string;
  usage?: {
    input_tokens?: number;
    output_tokens?: number;
    total_tokens?: number;
    cache_read_tokens?: number;
    cache_write_tokens?: number;
  };
};
```
9.3.2 pruneIfNeeded() --- Automatic Pruning
```typescript
async function pruneIfNeeded(filePath, opts) {
  const stat = await fs.stat(filePath);
  if (stat.size <= opts.maxBytes) return;  // under the limit

  const raw = await fs.readFile(filePath, "utf-8");
  const lines = raw.split("\n").map(l => l.trim()).filter(Boolean);
  const kept = lines.slice(Math.max(0, lines.length - opts.keepLines));  // keep the newest N lines

  // Atomic replacement
  const tmp = `${filePath}.${process.pid}.${randomBytes(8).toString("hex")}.tmp`;
  await fs.writeFile(tmp, `${kept.join("\n")}\n`, { mode: 0o600 });
  await fs.rename(tmp, filePath);
}
```

Pruning policy:

  • Trigger: the file size exceeds maxBytes (default 2 MB)
  • Retention: the newest keepLines lines (default 2,000)
  • Configurable via cron.runLog.maxBytes and cron.runLog.keepLines
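The trimming step is a pure string transformation, so it can be sketched and tested separately from the file I/O. `keepLastLines` and `needsPrune` are hypothetical helpers. As in the excerpt, blank lines are dropped and the output ends with a newline, so later appends start on a fresh line. Reading "2 MB" as 2 × 1024 × 1024 bytes is an assumption.

```typescript
// Hypothetical helper mirroring the trimming logic in pruneIfNeeded().
export function keepLastLines(raw: string, keepLines: number): string {
  const lines = raw.split("\n").map((line) => line.trim()).filter(Boolean);
  const kept = lines.slice(Math.max(0, lines.length - keepLines));
  return `${kept.join("\n")}\n`;
}

// Pruning runs only once the file is strictly larger than maxBytes.
export function needsPrune(sizeBytes: number, maxBytes = 2 * 1024 * 1024): boolean {
  return sizeBytes > maxBytes;
}
```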
9.3.3 readCronRunLogEntriesPage() --- Paged Queries
```typescript
export async function readCronRunLogEntriesPage(filePath, opts?): Promise<CronRunLogPageResult> {
  // raw: the file contents, read after any pending write has drained (elided)
  const all = parseAllRunLogEntries(raw, { jobId });    // parse every entry
  const filtered = filterRunLogEntries(all, { statuses, deliveryStatuses, query, queryTextForEntry });
  const sorted = sortDir === "asc" ? asc : desc;         // asc/desc: filtered, sorted each way (elided)
  return { entries: sorted.slice(offset, offset + limit), total, hasMore, nextOffset };
}
```

Query capabilities:

  • Filter by jobId
  • Filter by status (ok/error/skipped)
  • Filter by deliveryStatus (delivered/not-delivered/unknown/not-requested)
  • Full-text search (matches summary + error + jobId + jobName)
  • Ascending/descending sort
  • Pagination (offset + limit, with limit capped at 200)

Cross-job queries: readCronRunLogEntriesPageAll() scans every .jsonl file under runs/, merges the entries, and then paginates.

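Safety check:

```typescript
function assertSafeCronRunLogJobId(jobId: string): string {
  const trimmed = jobId.trim();
  if (trimmed.includes("/") || trimmed.includes("\\") || trimmed.includes("\0")) {
    throw new Error("invalid cron run log job id");
  }
  // safeJobId is derived from trimmed; runsDir is the runs/ directory (both elided)
  const resolvedPath = path.resolve(runsDir, `${safeJobId}.jsonl`);
  if (!resolvedPath.startsWith(`${runsDir}${path.sep}`)) {
    throw new Error("invalid cron run log job id");  // path-traversal guard
  }
  return safeJobId;
}
```

Two layers of defense: path separators (and NUL bytes) are rejected outright, and the resolved path must lie inside the runs/ directory.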
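Both layers can be reproduced with `node:path`. This sketch takes the runs directory as a parameter and uses the trimmed job id directly as the file stem, which is an assumption. `resolveRunLogPath` is a hypothetical name. The error message matches the excerpt.

```typescript
import * as path from "node:path";

// Hypothetical sketch of the two-layer job-id check.
export function resolveRunLogPath(runsDir: string, jobId: string): string {
  const trimmed = jobId.trim();
  // Layer 1: reject path separators and NUL bytes outright.
  if (trimmed.includes("/") || trimmed.includes("\\") || trimmed.includes("\0")) {
    throw new Error("invalid cron run log job id");
  }
  // Layer 2: the resolved file must stay strictly inside runsDir.
  const root = path.resolve(runsDir);
  const resolved = path.resolve(root, `${trimmed}.jsonl`);
  if (!resolved.startsWith(`${root}${path.sep}`)) {
    throw new Error("invalid cron run log job id");
  }
  return resolved;
}
```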

📊 Suggested figure: logging architecture
```
Nodes:
  [appendCronRunLog] → [serialized queue] → [mkdir + appendFile] → [pruneIfNeeded]
  [readCronRunLogEntriesPage] → [drainPendingWrite] → [parseAllRunLogEntries] → [filter] → [sort] → [slice]
  [readCronRunLogEntriesPageAll] → [readdir runs/] → [parallel parse] → [flat + filter + sort + slice]

Edges:
  - serialized queue→previous write: "writesByPath Map"
  - pruneIfNeeded→skip: "size ≤ maxBytes"
  - pruneIfNeeded→trim: "keep the newest keepLines lines"
  - drainPendingWrite→read: "read only after pending writes finish"

Annotations:
  - "maxBytes=2MB, keepLines=2000"
  - "file permissions: 0o600"
  - "path-traversal guard: assertSafeCronRunLogJobId"
```

9.4 session-reaper.ts --- Session Reaper

File path: session-reaper.ts
Lines of code: ~100
Core responsibility: periodically removes expired cron run sessions so the session store does not grow without bound.

9.4.1 sweepCronRunSessions() --- Cleanup Sweep
```typescript
export async function sweepCronRunSessions(params): Promise<ReaperResult> {
  const now = params.nowMs ?? Date.now();
  const lastSweepAtMs = lastSweepAtMsByStore.get(storePath) ?? 0;

  // Throttle: no repeat sweep within 5 minutes
  if (!params.force && now - lastSweepAtMs < MIN_SWEEP_INTERVAL_MS) {
    return { swept: false, pruned: 0 };
  }

  const retentionMs = resolveRetentionMs(params.cronConfig);
  if (retentionMs === null) return { swept: false, pruned: 0 };  // disabled

  let pruned = 0;
  const prunedSessions = new Map<string, string>();  // sessionId → sessionFile
  await updateSessionStore(storePath, (store) => {
    const cutoff = now - retentionMs;
    for (const key of Object.keys(store)) {
      if (!isCronRunSessionKey(key)) continue;  // only touch cron run keys
      const entry = store[key];
      if (entry.updatedAt < cutoff) {
        prunedSessions.set(entry.sessionId, entry.sessionFile);
        delete store[key];
        pruned++;
      }
    }
  });

  // Archive transcript files
  await archiveRemovedSessionTranscripts({ removedSessionFiles, referencedSessionIds, /* ... */ });
  await cleanupArchivedSessionTranscripts({ directories, olderThanMs: retentionMs, /* ... */ });
  // ...
}
```

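Design points:

  1. Self-throttling: MIN_SWEEP_INTERVAL_MS=5min, with the lastSweepAtMsByStore map throttling each store path independently
  2. Selective cleanup: isCronRunSessionKey() matches only keys of the form ...:cron:<jobId>:run:<uuid>, so base sessions are kept
  3. Transcript archiving: transcripts are first moved into an archive/ directory, then pruned again inside the archive according to the retention period
  4. Reference check: referencedSessionIds ensures that transcripts still referenced by other sessions are not deleted

Lock-ordering constraint (emphasized in a code comment):

```
This function acquires the session store file lock via updateSessionStore.
It must be called outside the cron service's locked() section to avoid lock-order inversion.
```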
9.4.2 Retention Configuration
```typescript
export function resolveRetentionMs(cronConfig?): number | null {
  if (cronConfig?.sessionRetention === false) return null;  // explicitly disabled
  const raw = cronConfig?.sessionRetention;
  if (typeof raw === "string" && raw.trim()) {
    return parseDurationMs(raw.trim(), { defaultUnit: "h" });  // accepts strings such as "12h", "2d"
  }
  return DEFAULT_RETENTION_MS;  // 24 hours
}
```
📊 Suggested figure: session reaper
```
Nodes:
  [sweepCronRunSessions] → [throttle check] → [retention check] → [updateSessionStore scan] → [archiveRemovedSessionTranscripts] → [cleanupArchivedSessionTranscripts]

Edges:
  - throttled→exit: "already swept within 5min"
  - disabled→exit: "sessionRetention=false"
  - scan→delete: "updatedAt < cutoff"
  - archive→cleanup: "olderThanMs=retentionMs"

Annotations:
  - "MIN_SWEEP_INTERVAL_MS=5min"
  - "DEFAULT_RETENTION_MS=24h"
  - "only keys matching isCronRunSessionKey are removed"
  - "lock order: must be called outside locked()"
```

9.5 stagger.ts --- Thundering-Herd Staggering

File path: stagger.ts
Lines of code: ~35
Core responsibility: computes a stagger delay for top-of-hour cron jobs so that many jobs do not fire in the same second and overload the system.

9.5.1 Core Logic
```typescript
export const DEFAULT_TOP_OF_HOUR_STAGGER_MS = 5 * 60 * 1000;  // 5 minutes

export function isRecurringTopOfHourCronExpr(expr: string): boolean {
  const fields = parseCronFields(expr);
  if (fields.length === 5) {
    return fields[0] === "0" && fields[1].includes("*");  // 0 * * * *
  }
  if (fields.length === 6) {
    return fields[0] === "0" && fields[1] === "0" && fields[2].includes("*");  // 0 0 * * * *
  }
  return false;
}

export function resolveDefaultCronStaggerMs(expr: string): number | undefined {
  return isRecurringTopOfHourCronExpr(expr) ? DEFAULT_TOP_OF_HOUR_STAGGER_MS : undefined;
}

export function resolveCronStaggerMs(schedule): number {
  const explicit = normalizeCronStaggerMs(schedule.staggerMs);
  if (explicit !== undefined) return explicit;
  return resolveDefaultCronStaggerMs(cronExpr) ?? 0;  // cronExpr: the schedule's cron expression (elided)
}
```

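Design intent:

Suppose several cron jobs are all configured as 0 * * * * (every hour, on the hour). Without staggering, they would all fire at second 0 of the hour, causing:

  1. API rate limiting: a burst of simultaneous LLM requests
  2. Resource contention: momentary CPU/memory spikes
  3. Delivery congestion: a backlog in the message queue

The staggerMs value (5 minutes) is an upper bound. At run time the scheduler gives each run a random delay in the range [0, staggerMs).

Staggering is enabled automatically only for top-of-hour cron expressions, meaning minute=0 and a wildcard in the hour field. Examples are 0 * * * *, 0 */2 * * *, and the 6-field 0 0 * * * *. A fixed-hour expression such as 0 9 * * * (daily at 09:00) does not match, because its hour field has no wildcard. Other expressions, such as 30 * * * *, are not staggered automatically because they are already naturally spread out.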
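The detection rule and a per-run offset can be sketched as follows. `isRecurringTopOfHourCronExpr` and `DEFAULT_TOP_OF_HOUR_STAGGER_MS` mirror the excerpt. A whitespace split stands in for `parseCronFields`, and `pickStaggerOffsetMs` is hypothetical. The offset is drawn uniformly from [0, staggerMs) as the text describes, with `random` injectable so the sketch is testable.

```typescript
export const DEFAULT_TOP_OF_HOUR_STAGGER_MS = 5 * 60 * 1000;

export function isRecurringTopOfHourCronExpr(expr: string): boolean {
  // Stand-in for parseCronFields(): split on whitespace.
  const fields = expr.trim().split(/\s+/);
  if (fields.length === 5) return fields[0] === "0" && fields[1].includes("*");
  if (fields.length === 6) return fields[0] === "0" && fields[1] === "0" && fields[2].includes("*");
  return false;
}

// Hypothetical: pick an offset in [0, staggerMs) for one run.
export function pickStaggerOffsetMs(staggerMs: number, random: () => number = Math.random): number {
  if (staggerMs <= 0) return 0;
  return Math.floor(random() * staggerMs); // Math.random() < 1, so the result stays below staggerMs
}
```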

📊 Suggested figure: stagger strategy
```
Nodes:
  [resolveCronStaggerMs] → [explicit staggerMs?] → [isRecurringTopOfHourCronExpr?] → [return value]

Edges:
  - explicit→exit: "user-specified value"
  - top-of-hour cron→5min: "DEFAULT_TOP_OF_HOUR_STAGGER_MS"
  - not top-of-hour→0: "no stagger needed"

Annotations:
  - "top-of-hour test: minute=0 && hour contains *"
  - "6-field format: second=0 && minute=0 && hour contains *"
  - "actual delay: random(0, staggerMs)"
```

9.6 validate-timestamp.ts --- Timestamp Validation

File path: validate-timestamp.ts
Lines of code: ~40
Core responsibility: validates schedule.at timestamps, rejecting times in the past and times too far in the future.

9.6.1 validateScheduleTimestamp()
```typescript
export function validateScheduleTimestamp(schedule, nowMs = Date.now()): TimestampValidationResult {
  if (schedule.kind !== "at") return { ok: true };  // only "at" schedules are validated
  // atRaw comes from schedule.at; atDate, minutesAgo, yearsAhead are display helpers (elided)

  const atMs = parseAbsoluteTimeMs(atRaw);
  if (atMs === null || !Number.isFinite(atMs)) {
    return { ok: false, message: `Invalid schedule.at: expected ISO-8601 timestamp` };
  }

  const diffMs = atMs - nowMs;

  // In the past (with a 1-minute grace period)
  if (diffMs < -ONE_MINUTE_MS) {
    return { ok: false, message: `schedule.at is in the past: ${atDate} (${minutesAgo} minutes ago)` };
  }

  // Too far in the future (10-year cap)
  if (diffMs > TEN_YEARS_MS) {
    return { ok: false, message: `schedule.at is too far in the future: ${atDate} (${yearsAhead} years ahead)` };
  }

  return { ok: true };
}
```

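Two bounds:

  • Past: a 1-minute grace period (ONE_MINUTE_MS = 60000) tolerates clock skew and scheduling delay
  • Future: a 10-year cap (TEN_YEARS_MS) catches a mistyped year

Only "at" schedules are validated. Every and cron schedules are inherently recurring, so they need no timestamp check.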
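The following is a self-contained version of the two bounds that takes an already-parsed timestamp. `checkAtMs` is a hypothetical name. Approximating ten years as 10 × 365.25 days is an assumption about how TEN_YEARS_MS is defined.

```typescript
const ONE_MINUTE_MS = 60_000;
const TEN_YEARS_MS = 10 * 365.25 * 24 * 60 * 60 * 1000; // assumption: Julian-year approximation

type CheckResult = { ok: true } | { ok: false; message: string };

// Hypothetical sketch: validate an absolute "at" time against the current time.
export function checkAtMs(atMs: number | null, nowMs: number = Date.now()): CheckResult {
  if (atMs === null || !Number.isFinite(atMs)) {
    return { ok: false, message: "Invalid schedule.at: expected ISO-8601 timestamp" };
  }
  const diffMs = atMs - nowMs;
  if (diffMs < -ONE_MINUTE_MS) {
    const minutesAgo = Math.round(-diffMs / ONE_MINUTE_MS);
    return { ok: false, message: `schedule.at is in the past (${minutesAgo} minutes ago)` };
  }
  if (diffMs > TEN_YEARS_MS) {
    return { ok: false, message: "schedule.at is too far in the future" };
  }
  return { ok: true };
}
```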

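📊 Suggested figure: timestamp validation
```
Nodes:
  [validateScheduleTimestamp] → [kind=at?] → [parseAbsoluteTimeMs] → [past check] → [future check]

Edges:
  - kind≠at→exit(ok): "only at schedules are validated"
  - parse failed→exit(error): "Invalid timestamp"
  - diff < -1min→exit(error): "in the past"
  - diff > 10yr→exit(error): "too far in the future"
  - valid→exit(ok): "passes"

Annotations:
  - "1-minute grace: clock-skew tolerance"
  - "10-year cap: guards against typos"
```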

9.7 active-jobs.ts --- In-Memory Active-Job Tracking

File path: active-jobs.ts
Lines of code: ~30
Core responsibility: tracks in process memory which cron jobs are currently executing, so the same job is never run concurrently.

9.7.1 Implementation
```typescript
const CRON_ACTIVE_JOB_STATE_KEY = Symbol.for("openclaw.cron.activeJobs");

function getCronActiveJobState(): CronActiveJobState {
  return resolveGlobalSingleton<CronActiveJobState>(CRON_ACTIVE_JOB_STATE_KEY, () => ({
    activeJobIds: new Set<string>(),
  }));
}

export function markCronJobActive(jobId: string) { getCronActiveJobState().activeJobIds.add(jobId); }
export function clearCronJobActive(jobId: string) { getCronActiveJobState().activeJobIds.delete(jobId); }
export function isCronJobActive(jobId: string) { return getCronActiveJobState().activeJobIds.has(jobId); }
```

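Symbol.for global singleton:

The state lives under Symbol.for("openclaw.cron.activeJobs") rather than in a module-level variable. This keeps it globally shared even when the module is instantiated more than once, for example through different import paths in tests. resolveGlobalSingleton stores the singleton under that symbol key, initializing it on the first call and reusing it afterwards.

Usage:

  • On each tick the scheduler checks isCronJobActive(jobId); if the job is already running, this trigger is skipped
  • markCronJobActive is called when execution starts
  • clearCronJobActive is called when execution ends, whether it succeeded or failed
  • resetCronActiveJobsForTests is used for test cleanup

Note: this tracking is per process and does not deduplicate across processes or machines. A distributed deployment would need an external lock service.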
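The mark/check/clear lifecycle is easiest to get right with `try/finally`, which clears the id on both success and failure. In this sketch `resolveGlobalSingleton` is re-implemented directly on `globalThis`, and `runIfIdle` is hypothetical. Only the symbol key matches the excerpt.

```typescript
type CronActiveJobState = { activeJobIds: Set<string> };

const KEY = Symbol.for("openclaw.cron.activeJobs");

// Stand-in for resolveGlobalSingleton: keep the state on globalThis under a registry symbol.
function getState(): CronActiveJobState {
  const g = globalThis as unknown as Record<symbol, CronActiveJobState | undefined>;
  let state = g[KEY];
  if (!state) {
    state = { activeJobIds: new Set<string>() };
    g[KEY] = state;
  }
  return state;
}

// Hypothetical guard: skip if the job is already running; otherwise mark, run, and always clear.
export async function runIfIdle(jobId: string, run: () => Promise<void>): Promise<"ran" | "skipped"> {
  const { activeJobIds } = getState();
  if (activeJobIds.has(jobId)) return "skipped";
  activeJobIds.add(jobId);
  try {
    await run();
    return "ran";
  } finally {
    activeJobIds.delete(jobId);
  }
}
```

Because `Symbol.for` looks the symbol up in the global registry, a second copy of this module would find the same state object.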

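📊 Suggested figure: active-job tracking
```
Nodes:
  [scheduler tick] → [isCronJobActive?] → [markCronJobActive] → [run job] → [clearCronJobActive]

Edges:
  - active=true→skip: "skip this trigger"
  - active=false→mark→execute: "start execution"
  - run finished→clear: "on success or failure"

Annotations:
  - "Symbol.for global singleton: shared across module instances"
  - "per process: no distributed deduplication"
  - "Set<string> activeJobIds"
```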

9.8 Supplementary Modules

9.8.1 normalize-job-identity.ts
```typescript
export function normalizeCronJobIdentityFields(raw): { mutated: boolean; legacyJobIdIssue: boolean } {
  const rawId = normalizeOptionalString(raw.id) ?? "";
  const legacyJobId = normalizeOptionalString(raw.jobId) ?? "";
  const hadJobIdKey = "jobId" in raw;
  const normalizedId = rawId || legacyJobId;
  const idChanged = Boolean(normalizedId && raw.id !== normalizedId);
  if (idChanged) raw.id = normalizedId;
  if (hadJobIdKey) delete raw.jobId;
  return { mutated: idChanged || hadJobIdKey, legacyJobIdIssue: hadJobIdKey };
}
```

Backward compatibility: the legacy jobId field is migrated to id and the old key is removed. The legacyJobIdIssue flag is used to show the user a deprecation warning.

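9.8.2 webhook-url.ts
```typescript
export function normalizeHttpWebhookUrl(value: unknown): string | null {
  if (typeof value !== "string") return null;
  const trimmed = value.trim();
  try {
    const parsed = new URL(trimmed);
    if (!isAllowedWebhookProtocol(parsed.protocol)) return null;  // only http/https allowed
    return trimmed;
  } catch { return null; }
}
```

Simple but strict: only the http: and https: protocols are accepted, which rules out schemes such as javascript: and data:. The check restricts the scheme only. By itself it does not stop requests to internal hosts (SSRF); that would require filtering the target address as well.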
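A runnable version using the WHATWG `URL` class is below. It assumes `isAllowedWebhookProtocol` accepts exactly `http:` and `https:`. Note that `URL.protocol` includes the trailing colon and is always lowercase. The inlined allowlist stands in for that helper.

```typescript
// Sketch of normalizeHttpWebhookUrl with an inlined protocol allowlist (assumption).
const ALLOWED_WEBHOOK_PROTOCOLS = new Set(["http:", "https:"]);

export function normalizeHttpWebhookUrl(value: unknown): string | null {
  if (typeof value !== "string") return null;
  const trimmed = value.trim();
  if (!trimmed) return null;
  try {
    const parsed = new URL(trimmed); // throws for relative or malformed URLs
    return ALLOWED_WEBHOOK_PROTOCOLS.has(parsed.protocol) ? trimmed : null;
  } catch {
    return null;
  }
}
```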

9.8.3 parse.ts --- Absolute Time Parsing
```typescript
export function parseAbsoluteTimeMs(input: string): number | null {
  const raw = input.trim();
  if (/^\d+$/.test(raw)) {
    const n = Number(raw);
    if (Number.isFinite(n) && n > 0) return Math.floor(n);  // all digits → millisecond timestamp
  }
  const parsed = Date.parse(normalizeUtcIso(raw));
  return Number.isFinite(parsed) ? parsed : null;
}

function normalizeUtcIso(raw: string) {
  if (ISO_TZ_RE.test(raw)) return raw;         // already has a zone → parse as-is
  if (ISO_DATE_RE.test(raw)) return `${raw}T00:00:00Z`;  // 2024-01-01 → midnight UTC
  if (ISO_DATE_TIME_RE.test(raw)) return `${raw}Z`;     // 2024-01-01T12:00 → append Z
  return raw;
}
```

Three input formats:

  1. All digits (1703980800000) → used directly as a millisecond timestamp
  2. ISO date-time (2024-01-01T12:00:00Z; without a zone, Z is appended first) → Date.parse
  3. ISO date (2024-01-01) → T00:00:00Z is appended

Zoneless strings default to UTC: appending the Z suffix, instead of using the local time zone, keeps results consistent across time zones.
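The excerpt does not show ISO_TZ_RE, ISO_DATE_RE, or ISO_DATE_TIME_RE. The patterns below are assumptions that cover the examples in the list, wired into the same two functions.

```typescript
// Assumed patterns; the real ISO_*_RE constants are not shown in the excerpt.
const ISO_TZ_RE = /(Z|[+-]\d{2}:?\d{2})$/i;
const ISO_DATE_RE = /^\d{4}-\d{2}-\d{2}$/;
const ISO_DATE_TIME_RE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?$/;

function normalizeUtcIso(raw: string): string {
  if (ISO_TZ_RE.test(raw)) return raw;                  // already has a zone
  if (ISO_DATE_RE.test(raw)) return `${raw}T00:00:00Z`; // date only → midnight UTC
  if (ISO_DATE_TIME_RE.test(raw)) return `${raw}Z`;     // zoneless date-time → UTC
  return raw;
}

export function parseAbsoluteTimeMs(input: string): number | null {
  const raw = input.trim();
  if (/^\d+$/.test(raw)) {
    const n = Number(raw);
    if (Number.isFinite(n) && n > 0) return Math.floor(n); // epoch milliseconds
  }
  const parsed = Date.parse(normalizeUtcIso(raw));
  return Number.isFinite(parsed) ? parsed : null;
}
```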

9.8.4 delivery-field-schemas.ts --- Zod Schema
typescript 复制代码
export const DeliveryModeFieldSchema = z
  .preprocess(trimLowercaseStringPreprocess, z.enum(["deliver", "announce", "none", "webhook"]))
  .transform(value => value === "deliver" ? "announce" : value);

export const DeliveryThreadIdFieldSchema = z.union([
  TrimmedNonEmptyStringFieldSchema,
  z.number().finite(),
]);

export function parseDeliveryInput(input): ParsedDeliveryInput {
  return {
    mode: parseOptionalField(DeliveryModeFieldSchema, input.mode),
    channel: parseOptionalField(LowercaseNonEmptyStringFieldSchema, input.channel),
    to: parseOptionalField(TrimmedNonEmptyStringFieldSchema, input.to),
    threadId: parseOptionalField(DeliveryThreadIdFieldSchema, input.threadId),
    accountId: parseOptionalField(TrimmedNonEmptyStringFieldSchema, input.accountId),
  };
}

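Each field is parsed independently, so a validation failure in one field does not affect the others: parseOptionalField uses safeParse and returns undefined on failure instead of throwing.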

Dual-typed threadId: both strings and numbers are accepted, because Telegram thread IDs are numeric while some other channels use strings.
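To show the "each field fails on its own" behavior without pulling in Zod, the sketch below hand-rolls a minimal safeParse-style result type. It is a stand-in, not the real delivery-field-schemas.ts: the parser functions are hypothetical, and only mode, to, and threadId are modeled.

```typescript
// Minimal stand-in for Zod's safeParse result shape.
type SafeResult<T> = { success: true; data: T } | { success: false };
type FieldParser<T> = (value: unknown) => SafeResult<T>;

function parseOptionalField<T>(parser: FieldParser<T>, value: unknown): T | undefined {
  const result = parser(value);
  return result.success ? result.data : undefined; // never throws
}

const MODES = ["deliver", "announce", "none", "webhook"] as const;
type DeliveryMode = "announce" | "none" | "webhook";

const parseMode: FieldParser<DeliveryMode> = (value) => {
  if (typeof value !== "string") return { success: false };
  const v = value.trim().toLowerCase(); // mirrors trimLowercaseStringPreprocess
  if (!(MODES as readonly string[]).includes(v)) return { success: false };
  // Legacy alias: "deliver" is normalized to "announce".
  return { success: true, data: (v === "deliver" ? "announce" : v) as DeliveryMode };
};

const parseTrimmedNonEmpty: FieldParser<string> = (value) => {
  if (typeof value !== "string" || !value.trim()) return { success: false };
  return { success: true, data: value.trim() };
};

// threadId accepts a finite number or a non-empty string.
const parseThreadId: FieldParser<string | number> = (value) =>
  typeof value === "number" && Number.isFinite(value)
    ? { success: true, data: value }
    : parseTrimmedNonEmpty(value);

function parseDeliveryInput(input: Record<string, unknown>) {
  return {
    mode: parseOptionalField(parseMode, input.mode),
    to: parseOptionalField(parseTrimmedNonEmpty, input.to),
    threadId: parseOptionalField(parseThreadId, input.threadId),
  };
}

console.log(parseDeliveryInput({ mode: " Deliver ", to: "   ", threadId: 42 }));
```

Here the blank to field becomes undefined while mode and threadId still parse, which is the isolation property the paragraph above describes.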

9.8.5 helpers.ts --- Payload utility functions

Section 7.1.4 already covered resolveCronPayloadOutcome in detail; the remaining utility functions are covered here:

pickSummaryFromOutput()

typescript
export function pickSummaryFromOutput(text: string | undefined) {
  const clean = (text ?? "").trim();
  if (!clean) return undefined;
  const limit = 2000;
  return clean.length > limit ? `${truncateUtf16Safe(clean, limit)}...` : clean;
}

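The summary is capped at 2000 characters (strictly, 2000 UTF-16 code units, since that is what String.length counts), and truncateUtf16Safe ensures a surrogate pair is never cut in half. When truncation happens, "..." is appended after the cut.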
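A possible implementation of the surrogate-safe cut, for illustration only: the real truncateUtf16Safe is not shown, so this version (back off one code unit when the cut would leave a lone high surrogate) is an assumption.

```typescript
// Hypothetical re-implementation; the real truncateUtf16Safe is not shown in the source.
function truncateUtf16Safe(text: string, limit: number): string {
  if (text.length <= limit) return text;
  let end = Math.max(0, limit);
  const last = text.charCodeAt(end - 1); // NaN when end is 0, so the check below is skipped
  // A high surrogate (0xD800 to 0xDBFF) at the cut means its low half would be lost.
  if (last >= 0xd800 && last <= 0xdbff) end -= 1;
  return text.slice(0, end);
}

function pickSummaryFromOutput(text: string | undefined): string | undefined {
  const clean = (text ?? "").trim();
  if (!clean) return undefined;
  const limit = 2000;
  return clean.length > limit ? `${truncateUtf16Safe(clean, limit)}...` : clean;
}

// "a😀" is three code units; a naive cut at 2 would keep half of the emoji.
console.log(JSON.stringify(truncateUtf16Safe("a😀", 2)));
```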

pickDeliverablePayloads()

typescript
export function pickDeliverablePayloads(payloads): DeliveryPayload[] {
  const successful = payloads.filter(p => p != null && p.isError !== true && isDeliverablePayload(p));
  if (successful.length > 0) return successful;
  const last = pickLastDeliverablePayload(payloads);  // no successful payload: fall back to the last deliverable one
  return last ? [last] : [];
}

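All successful deliverable payloads are returned first. If there are none, the function falls back to the last deliverable payload even if it is an error, so that at least some content is delivered.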
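A simplified sketch of the fallback rule. The DeliveryPayload shape and the text-only deliverability check are assumptions made to keep the example self-contained, and pickLastDeliverablePayload is reimplemented here as a plain reverse scan.

```typescript
// Simplified payload shape; the real DeliveryPayload has more fields.
interface DeliveryPayload {
  text?: string;
  isError?: boolean;
}

// Assumption: "deliverable" here just means non-blank text.
function isDeliverablePayload(p: DeliveryPayload): boolean {
  return (p.text ?? "").trim().length > 0;
}

// Reverse scan for the last deliverable payload, errors included.
function pickLastDeliverablePayload(payloads: Array<DeliveryPayload | null | undefined>) {
  for (let i = payloads.length - 1; i >= 0; i--) {
    const p = payloads[i];
    if (p != null && isDeliverablePayload(p)) return p;
  }
  return undefined;
}

function pickDeliverablePayloads(payloads: Array<DeliveryPayload | null | undefined>): DeliveryPayload[] {
  const successful = payloads.filter(
    (p): p is DeliveryPayload => p != null && p.isError !== true && isDeliverablePayload(p),
  );
  if (successful.length > 0) return successful;
  const last = pickLastDeliverablePayload(payloads); // may be an error payload
  return last ? [last] : [];
}

console.log(pickDeliverablePayloads([{ text: "e1", isError: true }, null, { text: "e2", isError: true }]));
```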

isDeliverablePayload()

typescript
function isDeliverablePayload(payload): boolean {
  const hasInteractive = (payload.interactive?.blocks?.length ?? 0) > 0;
  const hasChannelData = Object.keys(payload.channelData ?? {}).length > 0;
  return hasOutboundReplyContent(payload, { trimText: true }) || hasInteractive || hasChannelData;
}

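A payload counts as deliverable if it has any of three things: text content, interactive blocks (cards), or channel-specific data.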
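A minimal sketch of the three-way check. hasOutboundReplyContent is not shown in the source, so the stand-in below only looks at text; the real helper may consider other content types, and the ReplyPayload shape is a hypothetical simplification.

```typescript
// Simplified shape; only the fields used by the check are modeled.
interface ReplyPayload {
  text?: string;
  interactive?: { blocks?: unknown[] };
  channelData?: Record<string, unknown>;
}

// Hypothetical stand-in for hasOutboundReplyContent: text only.
function hasOutboundReplyContent(p: ReplyPayload, opts: { trimText: boolean }): boolean {
  const text = opts.trimText ? (p.text ?? "").trim() : p.text ?? "";
  return text.length > 0;
}

function isDeliverablePayload(payload: ReplyPayload): boolean {
  const hasInteractive = (payload.interactive?.blocks?.length ?? 0) > 0;   // at least one block
  const hasChannelData = Object.keys(payload.channelData ?? {}).length > 0; // any channel key
  return hasOutboundReplyContent(payload, { trimText: true }) || hasInteractive || hasChannelData;
}

console.log(isDeliverablePayload({ text: "   ", channelData: { telegram: { silent: true } } }));
```

A whitespace-only reply with channel data still counts as deliverable, while an empty blocks array or an empty channelData object does not.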

9.8.6 run-config.ts --- Building the run configuration
typescript
export function buildCronAgentDefaultsConfig(params) {
  const { overrideModel, definedOverrides } = extractCronAgentDefaultsOverride(params.agentConfigOverride);
  return mergeCronAgentModelOverride({
    defaults: Object.assign({}, params.defaults, definedOverrides),
    overrideModel,
  });
}

Excluding sandbox

typescript
function extractCronAgentDefaultsOverride(agentConfigOverride?) {
  const { model: overrideModel, sandbox: _agentSandboxOverride, ...agentOverrideRest } = agentConfigOverride ?? {};
  return { overrideModel, definedOverrides: ...agentOverrideRest... };
}

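sandbox is destructured and then discarded (the leading _ in _agentSandboxOverride marks the binding as intentionally unused), because sandbox resolution is handled on its own separate path and must not be merged again at the defaults layer.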
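A sketch of the exclusion under simplified assumptions: AgentDefaults is a trimmed-down hypothetical shape, "defined overrides" is assumed to mean keys whose value is not undefined, and mergeCronAgentModelOverride is reduced to a plain model replacement.

```typescript
// Hypothetical, trimmed-down config shape; only model and sandbox come from the source.
interface AgentDefaults {
  model?: string;
  thinking?: string;
  timeoutSeconds?: number;
  sandbox?: { mode: string };
}

function extractCronAgentDefaultsOverride(agentConfigOverride: AgentDefaults = {}) {
  // sandbox is pulled out and dropped; the leading underscore marks it as unused.
  const { model: overrideModel, sandbox: _agentSandboxOverride, ...rest } = agentConfigOverride;
  // Assumption: keys explicitly set to undefined must not clobber the defaults.
  const definedOverrides = Object.fromEntries(
    Object.entries(rest).filter(([, value]) => value !== undefined),
  ) as Partial<AgentDefaults>;
  return { overrideModel, definedOverrides };
}

// Simplified stand-in for buildCronAgentDefaultsConfig + mergeCronAgentModelOverride.
function buildCronAgentDefaultsConfig(params: { defaults?: AgentDefaults; agentConfigOverride?: AgentDefaults }) {
  const { overrideModel, definedOverrides } = extractCronAgentDefaultsOverride(params.agentConfigOverride);
  const merged: AgentDefaults = Object.assign({}, params.defaults, definedOverrides);
  return overrideModel ? { ...merged, model: overrideModel } : merged;
}

console.log(
  buildCronAgentDefaultsConfig({
    defaults: { model: "base-model", timeoutSeconds: 60, sandbox: { mode: "global" } },
    agentConfigOverride: { model: "agent-model", thinking: "high", sandbox: { mode: "agent" } },
  }),
);
```

Only the agent-level sandbox is dropped here; a sandbox already present in the global defaults passes through unchanged.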

9.8.7 session-key.ts --- Session key generation
typescript
export function resolveCronAgentSessionKey(params): string {
  const raw = toAgentStoreSessionKey({
    agentId: params.agentId,
    requestKey: params.sessionKey.trim(),
    mainKey: params.mainKey,
  });
  return canonicalizeMainSessionAlias({ cfg: params.cfg, agentId: params.agentId, sessionKey: raw });
}

Two-step conversion:

  1. toAgentStoreSessionKey: converts the request key into the store format (adds the agent prefix)
  2. canonicalizeMainSessionAlias: maps agent:<id>:main onto the configured mainKey alias

This fixes issue #29683: when cfg.session.mainKey is not "main", cron sessions became orphaned on the read path.
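The sketch below illustrates the two steps under an assumed key format of agent:<agentId>:<key>; the helper bodies are hypothetical and the real ones may normalize more (for example casing). It shows why the second step matters: without it, a request for main resolves to agent:ops:main even when the configured main session lives under agent:ops:primary.

```typescript
// Hypothetical key format "agent:<agentId>:<key>"; the real helpers may differ.
function toAgentStoreSessionKey(p: { agentId: string; requestKey: string }): string {
  // Keys that are already in store format are passed through.
  return p.requestKey.startsWith("agent:") ? p.requestKey : `agent:${p.agentId}:${p.requestKey}`;
}

function canonicalizeMainSessionAlias(p: { agentId: string; mainKey?: string; sessionKey: string }): string {
  const mainKey = p.mainKey?.trim() || "main";
  // Map the generic "main" alias onto whatever mainKey is configured.
  return p.sessionKey === `agent:${p.agentId}:main` ? `agent:${p.agentId}:${mainKey}` : p.sessionKey;
}

function resolveCronAgentSessionKey(p: { agentId: string; sessionKey: string; mainKey?: string }): string {
  const raw = toAgentStoreSessionKey({ agentId: p.agentId, requestKey: p.sessionKey.trim() });
  return canonicalizeMainSessionAlias({ agentId: p.agentId, mainKey: p.mainKey, sessionKey: raw });
}

console.log(resolveCronAgentSessionKey({ agentId: "ops", sessionKey: "main", mainKey: "primary" }));
```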

9.8.8 job-fixtures.ts --- Test fixtures
typescript
export function makeIsolatedAgentJobFixture(overrides?) {
  return {
    id: "test-job",
    name: "Test Job",
    schedule: { kind: "cron", expr: "0 9 * * *", tz: "UTC" },
    sessionTarget: "isolated",
    payload: { kind: "agentTurn", message: "test" },
    ...overrides,
  } as never;
}

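export function makeIsolatedAgentParamsFixture(
  overrides: Record<string, unknown> & { jobOverrides?: Record<string, unknown> } = {},
) {
  // jobOverrides was referenced but never declared; pull it out of overrides explicitly
  const { jobOverrides, ...rest } = overrides;
  return {
    cfg: {},
    deps: {} as never,
    job: makeIsolatedAgentJobFixture(jobOverrides),
    message: "test",
    sessionKey: "cron:test",
    ...rest,
  };
}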

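The as never type assertion bypasses TypeScript's strict type checking (never is assignable to every type), so tests only need to fill in the fields they actually use.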
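A small illustration of the trade-off. CronJob and describeJob are hypothetical: the fixture omits state entirely, and the as never assertion still lets it be passed where a full CronJob is expected.

```typescript
// Hypothetical, trimmed-down job type used only for this illustration.
interface CronJob {
  id: string;
  name: string;
  schedule: { kind: "cron"; expr: string; tz: string };
  state: { nextRunAtMs?: number; lastStatus?: "ok" | "error" };
}

function describeJob(job: CronJob): string {
  return `${job.id} (${job.name})`;
}

// Without the assertion this literal would not compile: `state` is missing.
const fixture = {
  id: "test-job",
  name: "Test Job",
  schedule: { kind: "cron", expr: "0 9 * * *", tz: "UTC" },
} as never;

console.log(describeJob(fixture));
```

The cost is that the compiler stops checking the fixture: a function under test that reads job.state.lastStatus would type-check and then throw at runtime, because state is undefined.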


Summary: Module Relationship Overview

                          ┌──────────────────────────────┐
                          │     cron service (timer)     │
                          └──────────────┬───────────────┘
                                         │ triggers
                          ┌──────────────▼───────────────┐
                          │   runCronIsolatedAgentTurn   │  ← run.ts
                          │  (prepare→execute→finalize)  │
                          └──────────────┬───────────────┘
                                         │
              ┌──────────────────────────┼──────────────────────────┐
              │                          │                          │
   ┌──────────▼──────────┐    ┌──────────▼──────────┐    ┌──────────▼──────────┐
   │   model-selection   │    │    run-executor     │    │  delivery-dispatch  │
   │ (5-level priority)  │    │  (execute + retry)  │    │(delivery scheduling)│
   └─────────────────────┘    └──────────┬──────────┘    └──────────┬──────────┘
                                         │                          │
                              ┌──────────▼──────────┐    ┌──────────▼──────────┐
                              │  subagent-followup  │    │   delivery-target   │
                              │   (subagent wait)   │    │ (target resolution) │
                              └─────────────────────┘    └─────────────────────┘

   Support layer:
   ┌────────────────────────┐ ┌────────────────────────┐ ┌────────────────────────┐
   │       normalize        │ │         store          │ │        run-log         │
   │   (input validation)   │ │     (persistence)      │ │        (JSONL)         │
   └────────────────────────┘ └────────────────────────┘ └────────────────────────┘
   ┌────────────────────────┐ ┌────────────────────────┐ ┌────────────────────────┐
   │     session-reaper     │ │        stagger         │ │      active-jobs       │
   │       (cleanup)        │ │(thundering-herd guard) │ │ (concurrency control)  │
   └────────────────────────┘ └────────────────────────┘ └────────────────────────┘

Data flow

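  1. User input → normalize validates and normalizes it → store persists it
  2. Timer fires → active-jobs concurrency check → run.ts prepares the context
  3. Prepare phase → model-selection picks the model → session resolution → skills-snapshot builds the snapshot
  4. Execute phase → run-executor runs the agent → subagent-followup waits for subagents
  5. Finalize phase → delivery-plan builds the plan → delivery-target resolves the target → delivery-dispatch delivers
  6. Recording → run-log appends the entry → session-reaper cleans up periodically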

Core design patterns

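  • Lazy-loaded runtimes: address circular dependencies and startup performance
  • Closure factory: createCronPromptExecutor encapsulates mutable state
  • Idempotency cache: COMPLETED_DIRECT_CRON_DELIVERIES prevents duplicate deliveries
  • Serialized writes: writesByPath serializes JSONL appends per file path, so writes to the same file never interleave
  • Global singletons: Symbol.for + resolveGlobalSingleton share state across modules
  • Three-level short-circuit: in-memory cache → file comparison → actual write
  • Fail fast: errors in the prepare phase return immediately, without spending resources on later stages
  • Graceful degradation: an unavailable model triggers a fallback instead of a hard failure, and delivery failures are tolerated under bestEffort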
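Two of these patterns combine naturally. The sketch below keeps a per-path promise chain (the idea behind writesByPath) inside a process-wide singleton keyed with Symbol.for, which returns the same symbol from the global registry even when two copies of a module are loaded. resolveGlobalSingletonSketch, enqueueWrite, and the key string are hypothetical; the real resolveGlobalSingleton signature is not shown in the source, and the demo uses in-memory writes instead of real file appends.

```typescript
// Hypothetical helper: one shared instance per key, stored on globalThis under Symbol.for(key).
function resolveGlobalSingletonSketch<T>(key: string, create: () => T): T {
  const sym = Symbol.for(key); // the global symbol registry returns the same symbol everywhere
  const store = globalThis as unknown as { [k: symbol]: unknown };
  if (!(sym in store)) store[sym] = create();
  return store[sym] as T;
}

// Per-path promise chains: writes to the same path run one after another.
const writesByPath = resolveGlobalSingletonSketch(
  "openclaw.sketch.writesByPath",
  () => new Map<string, Promise<void>>(),
);

function enqueueWrite(path: string, write: () => Promise<void>): Promise<void> {
  const previous = writesByPath.get(path) ?? Promise.resolve();
  // Start only after the previous write settles; a failed write does not block the queue.
  const next = previous.catch(() => undefined).then(write);
  writesByPath.set(path, next);
  // Remove the entry once this write is still the tail, so the map does not grow without bound.
  const cleanup = () => {
    if (writesByPath.get(path) === next) writesByPath.delete(path);
  };
  next.then(cleanup, cleanup);
  return next;
}

// Demo: three writes with different latencies still land in submission order.
async function demoSerializedWrites(): Promise<string[]> {
  const lines: string[] = [];
  const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
  await Promise.all(
    [30, 5, 15].map((ms, i) =>
      enqueueWrite("runs/job-a.jsonl", async () => {
        await delay(ms);
        lines.push(`line-${i}`);
      }),
    ),
  );
  return lines;
}

demoSerializedWrites().then((lines) => console.log(lines));
```

Chaining on the previous promise rather than holding a lock keeps the queue lock-free; different paths get independent chains, so only same-file writes are serialized.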