Building an Industrial-Grade AI Agent Gateway: The Core Technology Stack

Overview

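At its core, an industrial-grade AI agent gateway is a loop plus a dispatch table. Around that core sit successive layers of persistence, routing, intelligence, scheduling, reliability, resilience, and concurrency control. Getting from the simplest while-true loop to a production system means solving the following core problems: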

```text
┌──────────────────────────────────────────────────────────────────────────┐
│               Industrial-Grade Agent Gateway Architecture                │
├──────────────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │Multi-Platform│  │Smart Routing │  │ Session Mgmt │  │Memory System │  │
│  │  (Channel)   │  │  (Gateway)   │  │  (Session)   │  │(Intelligence)│  │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  │
│         │                 │                 │                 │          │
│         └─────────────────┴────────┬────────┴─────────────────┘          │
│                                    ▼                                     │
│                            ┌───────────────┐                             │
│                            │  Agent Core   │  (tool-use loop)            │
│                            └───────┬───────┘                             │
│                                    │                                     │
│  ┌──────────────┐      ┌───────────┴───────────┐       ┌──────────────┐  │
│  │   Autonomy   │      │   Resilience System   │       │ Concurrency  │  │
│  │ (Heartbeat)  │      │ (Resilience+Delivery) │       │    (Lane)    │  │
│  └──────────────┘      └───────────────────────┘       └──────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘
```

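1. Session Management: Persistence and Context Protection

1.1 The Core Problem

An agent needs to persist its conversation history to support:

Memory across sessions
Crash recovery
Long-term context tracking

It also has to respect the LLM's context-window limit, so that an oversized message history does not cause the API call to fail.

1.2 Approach: Append-Only JSONL

Each session maps to one .jsonl file, written in append-only mode: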

```text
/agents/{agent_id}/sessions/{session_id}.jsonl
```

Record types:

```json
{"type": "user", "content": "Hello", "ts": 1234567890}
{"type": "assistant", "content": [{"type": "text", "text": "Hi!"}], "ts": ...}
{"type": "tool_use", "tool_use_id": "toolu_...", "name": "read_file", "input": {...}, "ts": ...}
{"type": "tool_result", "tool_use_id": "toolu_...", "content": "file contents", "ts": ...}
```

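Advantages:

Each write appends one line, so the file never has to be rewritten (and small single-line appends are effectively atomic)
Replaying the file rebuilds the complete messages array
Tool calls and their results are restored exactly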
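The following minimal Node.js sketch makes the append-and-replay cycle concrete. It assumes the record shapes above. `appendRecord` and `loadMessages` are hypothetical names. Folding consecutive `tool_use` records into one assistant message (and `tool_result` records into one user message) is an assumption modeled on the Anthropic Messages API layout, not necessarily what the original does.

```javascript
const fs = require("fs");
const path = require("path");

// Append one record as a single JSON line; existing content is never rewritten.
function appendRecord(file, record) {
    fs.mkdirSync(path.dirname(file), { recursive: true });
    const line = JSON.stringify({ ...record, ts: Math.floor(Date.now() / 1000) });
    fs.appendFileSync(file, line + "\n");
}

// Replay the JSONL file into a messages array.
function loadMessages(file) {
    if (!fs.existsSync(file)) return [];
    const messages = [];
    // Add a content block to the last message if it has the same role, else start a new one.
    const pushBlock = (role, block) => {
        const last = messages[messages.length - 1];
        if (last && last.role === role && Array.isArray(last.content)) last.content.push(block);
        else messages.push({ role, content: [block] });
    };
    for (const line of fs.readFileSync(file, "utf8").split("\n")) {
        if (!line.trim()) continue;
        let rec;
        try {
            rec = JSON.parse(line);
        } catch {
            continue; // a torn last line left by a crash mid-append is skipped
        }
        switch (rec.type) {
            case "user":
                messages.push({ role: "user", content: rec.content });
                break;
            case "assistant":
                messages.push({ role: "assistant", content: rec.content });
                break;
            case "tool_use":
                pushBlock("assistant", { type: "tool_use", id: rec.tool_use_id, name: rec.name, input: rec.input });
                break;
            case "tool_result":
                pushBlock("user", { type: "tool_result", tool_use_id: rec.tool_use_id, content: rec.content });
                break;
        }
    }
    return messages;
}
```

Skipping a line that fails to parse means a crash in the middle of an append costs at most that one record.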

1.3 Context Overflow Protection: A Three-Stage Strategy

```text
Attempt 0: normal call
    │
    └─ overflow? ─no─→ success
           │yes
           ▼
Attempt 1: truncate oversized tool results
    │
    └─ overflow? ─no─→ success
           │yes
           ▼
Attempt 2: compact the history with the LLM
    │
    └─ overflow? ─yes─→ raise Error
```
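The three attempts can be sketched as a small guard around the model call. `ContextOverflowError`, the 4000-character truncation budget, and the `compactHistory` callback are all assumptions made for illustration. A real client would detect overflow from the provider's error (see 7.3).

```javascript
// Hypothetical error type the model client throws when the prompt is too large.
class ContextOverflowError extends Error {}

const MAX_TOOL_RESULT_CHARS = 4000; // assumed per-result truncation budget

// Stage 1: cut oversized tool_result contents down to the budget.
function truncateToolResults(messages, maxChars = MAX_TOOL_RESULT_CHARS) {
    return messages.map((msg) => {
        if (!Array.isArray(msg.content)) return msg;
        const content = msg.content.map((block) =>
            block.type === "tool_result" && typeof block.content === "string" && block.content.length > maxChars
                ? { ...block, content: block.content.slice(0, maxChars) + "\n[...truncated]" }
                : block
        );
        return { ...msg, content };
    });
}

// Attempt 0: plain call; attempt 1: after truncation; attempt 2: after compaction; then give up.
async function callWithOverflowGuard(callModel, messages, compactHistory) {
    let current = messages;
    for (let attempt = 0; attempt < 3; attempt++) {
        try {
            return await callModel(current);
        } catch (err) {
            if (!(err instanceof ContextOverflowError)) throw err; // other failures go to other layers
            if (attempt === 0) current = truncateToolResults(current);
            else if (attempt === 1) current = await compactHistory(current);
            else throw err;
        }
    }
}
```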

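History compaction algorithm:

Keep the most recent 20% of messages
Serialize the oldest 50% to text
Call the LLM to generate a summary
Replace the old messages with a summary + "Understood" message pair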
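Here is a sketch of those four steps. It assumes messages are `{ role, content }` objects and that `summarize` is a hypothetical async LLM call. The steps do not say what happens to messages between the oldest 50% and the newest 20%, so this sketch keeps them verbatim. It also does not prevent a `tool_use` block from being separated from its `tool_result`; a production version would need to handle that.

```javascript
// Compact the history; `summarize(text)` is a hypothetical async LLM call returning a string.
async function compactHistory(messages, summarize) {
    const keepCount = Math.max(1, Math.ceil(messages.length * 0.2)); // newest 20%, never touched
    // Oldest 50%, capped so it never reaches into the protected newest 20%.
    const compressCount = Math.min(Math.floor(messages.length * 0.5), messages.length - keepCount);
    if (compressCount <= 0) return messages;

    // Serialize the old messages as plain "role: content" lines.
    const transcript = messages
        .slice(0, compressCount)
        .map((m) => `${m.role}: ${typeof m.content === "string" ? m.content : JSON.stringify(m.content)}`)
        .join("\n");
    const summary = await summarize(transcript);

    // Replace them with a summary + "Understood" pair; everything after stays verbatim.
    return [
        { role: "user", content: `[Summary of earlier conversation]\n${summary}` },
        { role: "assistant", content: "Understood" },
        ...messages.slice(compressCount),
    ];
}
```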

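2. Multi-Platform Channels: A Unified Abstraction

2.1 The Core Problem

Each platform (Telegram, Feishu, Discord, CLI) has its own message format and API, and all of them need to be handled uniformly.

2.2 Approach: A Channel Abstraction + InboundMessage

The unified InboundMessage format: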

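```python
from dataclasses import dataclass

@dataclass
class InboundMessage:
    text: str              # message text
    sender_id: str         # sender ID
    channel: str           # platform: "cli", "telegram", "feishu"
    account_id: str        # ID of the bot that received the message
    peer_id: str           # conversation ID: DM = user_id, group = chat_id
    is_group: bool         # whether the message came from a group
    media: list            # media attachments
    raw: dict              # raw platform payload
```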

Channel ABC:

```python
from abc import ABC, abstractmethod

class Channel(ABC):
    # `InboundMessage | None` annotations require Python 3.10+
    @abstractmethod
    def receive(self) -> InboundMessage | None: ...

    @abstractmethod
    def send(self, to: str, text: str, **kwargs) -> bool: ...
```

A new platform only has to implement these two methods. The agent's core logic does not change at all.
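To show why the core never changes, here is the same two-method contract mirrored in JavaScript (the language of the other examples). It includes a hypothetical in-memory channel and a single pass of a gateway loop. `MemoryChannel` and `pollOnce` are illustrative names, not part of the original design.

```javascript
// JavaScript mirror of the Channel contract (the original ABC is Python).
class Channel {
    receive() { throw new Error("not implemented"); } // returns an InboundMessage or null
    send(to, text) { throw new Error("not implemented"); } // returns true on success
}

// Hypothetical in-memory channel, e.g. for tests; a Telegram or Feishu adapter
// would implement the same two methods against its platform API.
class MemoryChannel extends Channel {
    constructor(name) {
        super();
        this.name = name;
        this.inbox = [];  // messages waiting to be received
        this.outbox = []; // messages the agent has sent
    }
    receive() {
        return this.inbox.shift() ?? null;
    }
    send(to, text) {
        this.outbox.push({ to, text });
        return true;
    }
}

// One pass of a gateway loop: it only touches receive/send, so it does not
// care which platform a message came from.
function pollOnce(channels, handle) {
    let handled = 0;
    for (const ch of channels) {
        let msg;
        while ((msg = ch.receive()) !== null) {
            ch.send(msg.peer_id, handle(msg));
            handled++;
        }
    }
    return handled;
}
```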

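2.3 Platform Adaptation Notes

Platform	Special handling
Telegram	Long polling, persisted update offset, media-group buffering, text merging
Feishu	Webhook, token authentication, @-mention detection, parsing of multiple message types
CLI	Thin wrapper around stdin/stdout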

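3. Gateway and Routing: Multi-Agent Dispatch

3.1 The Core Problem

A single gateway instance may serve several agents, so each user request has to be routed to the right agent.

3.2 Approach: Five-Tier Route Resolution

```text
Tier 1: peer_id     (most specific, down to the individual user)
Tier 2: guild_id    (server/group level)
Tier 3: account_id  (bot account level)
Tier 4: channel     (platform level)
Tier 5: default     (catch-all fallback)
```

Resolution algorithm: scan the binding table linearly, sorted from the most specific tier to the least, and return the first match.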

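```python
TIER_FIELDS = ("peer_id", "guild_id", "account_id", "channel")  # tiers 1-4

def resolve(sorted_bindings, default_agent, channel, account_id, guild_id, peer_id):
    ctx = {"peer_id": peer_id, "guild_id": guild_id, "account_id": account_id, "channel": channel}
    # sorted_bindings: (tier, value, agent_id) tuples ordered by tier, 1 = most specific
    for tier, value, agent_id in sorted_bindings:
        # tier 5 (default) matches anything; tiers 1-4 compare one context field
        if tier == 5 or ctx[TIER_FIELDS[tier - 1]] == value:
            return agent_id  # first match wins
    return default_agent
```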

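3.3 Session Isolation: dm_scope

The same user may need separate sessions in different contexts:

dm_scope	Key format	Effect
`main`	`agent:{id}:main`	One session shared by everyone
`per-peer`	`agent:{id}:direct:{peer}`	Isolated per user
`per-channel-peer`	`agent:{id}:{ch}:direct:{peer}`	Isolated per user and platform
`per-account-channel-peer`	`agent:{id}:{ch}:{acc}:direct:{peer}`	Maximum isolation: per user, platform, and bot account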
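The key formats map directly onto a small function. This sketch borrows the name `buildSessionKey` from the architecture diagram in section 9. Its signature here is hypothetical.

```javascript
// Build a session key following the dm_scope table.
function buildSessionKey(agentId, dmScope, { channel, accountId, peerId }) {
    switch (dmScope) {
        case "main":
            return `agent:${agentId}:main`;
        case "per-peer":
            return `agent:${agentId}:direct:${peerId}`;
        case "per-channel-peer":
            return `agent:${agentId}:${channel}:direct:${peerId}`;
        case "per-account-channel-peer":
            return `agent:${agentId}:${channel}:${accountId}:direct:${peerId}`;
        default:
            throw new Error(`unknown dm_scope: ${dmScope}`);
    }
}
```

Because the session store is keyed by this string, choosing a dm_scope is the only thing that decides whether two conversations share history.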

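4. Intelligence Layer: Prompt Engineering and Memory

4.1 The Core Problem

How do we give the agent a "personality"?
How do we make it remember user preferences?
How do we inject relevant memories dynamically?

4.2 Approach: Eight-Layer Prompt Assembly

```text
Layer 1: Identity        (who the agent is; strongest influence)
Layer 2: Soul            (personality traits)
Layer 3: Tools Guidance  (how to use tools)
Layer 4: Skills          (skill modules)
Layer 5: Memory          (resident memory + recalled memory)
Layer 6: Bootstrap       (startup configuration)
Layer 7: Runtime Context (runtime information)
Layer 8: Channel Hints   (platform-specific hints)
```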

The earlier a layer appears, the more it shapes the agent's behavior. That is exactly why Soul sits at Layer 2.
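Assembly itself can be as simple as concatenating the non-empty layers in a fixed order. The field names are hypothetical. `bootstrap` and `memoryContext` are chosen to match the auto-recall snippet in 4.4. Treating `memory` plus `memoryContext` as the two halves of Layer 5 is an assumption.

```javascript
// Layer order from the list above; layer 5 combines resident and recalled memory.
const LAYERS = [
    ["identity"],                // 1
    ["soul"],                    // 2
    ["toolsGuidance"],           // 3
    ["skills"],                  // 4
    ["memory", "memoryContext"], // 5: resident + auto-recalled
    ["bootstrap"],               // 6
    ["runtimeContext"],          // 7
    ["channelHints"],            // 8
];

// Join the present layers in order, separated by blank lines; empty layers are skipped.
function buildSystemPrompt(parts) {
    return LAYERS
        .flatMap((keys) => keys.map((key) => (parts[key] ?? "").trim()))
        .filter((text) => text.length > 0)
        .join("\n\n");
}
```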

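4.3 Memory: A Hybrid Search Pipeline

```text
user message ─→ keyword search (TF-IDF)         ─→ top 10
             ─→ vector search (hash projection) ─→ top 10
             ─→ merge + weighting ─→ time decay ─→ MMR re-ranking ─→ top 3 injected into the prompt
```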
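The merge, decay, and re-rank stages might look like the sketch below. Every constant here is an assumption: the 0.4/0.6 weights, a 30-day half-life, and an MMR lambda of 0.7. Token-set Jaccard similarity stands in for whatever redundancy measure the original uses, and its ASCII-only tokenizer ignores CJK text.

```javascript
// Hits from both searches look like { id, text, score, ts }, with scores in [0, 1].
const KEYWORD_WEIGHT = 0.4;  // assumed
const VECTOR_WEIGHT = 0.6;   // assumed
const HALF_LIFE_DAYS = 30;   // assumed
const MMR_LAMBDA = 0.7;      // assumed: relevance vs. diversity trade-off
const DAY_MS = 86400000;

// Jaccard similarity of ASCII word sets, used as the MMR redundancy measure.
function jaccard(a, b) {
    const sa = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
    const sb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
    let inter = 0;
    for (const t of sa) if (sb.has(t)) inter++;
    const union = sa.size + sb.size - inter;
    return union === 0 ? 0 : inter / union;
}

function hybridRank(keywordHits, vectorHits, nowMs, topK = 3) {
    // 1. Merge by id with a weighted sum of the two scores.
    const merged = new Map();
    for (const [hits, weight] of [[keywordHits, KEYWORD_WEIGHT], [vectorHits, VECTOR_WEIGHT]]) {
        for (const h of hits) {
            const entry = merged.get(h.id) ?? { ...h, score: 0 };
            entry.score += weight * h.score;
            merged.set(h.id, entry);
        }
    }
    // 2. Exponential time decay: a memory loses half its score every HALF_LIFE_DAYS.
    const candidates = [...merged.values()].map((c) => ({
        ...c,
        score: c.score * Math.pow(0.5, (nowMs - c.ts) / DAY_MS / HALF_LIFE_DAYS),
    }));
    // 3. MMR: repeatedly pick the candidate that is relevant but unlike those already picked.
    const selected = [];
    while (selected.length < topK && candidates.length > 0) {
        let bestIdx = 0;
        let bestVal = -Infinity;
        candidates.forEach((c, i) => {
            const redundancy = Math.max(0, ...selected.map((s) => jaccard(c.text, s.text)));
            const val = MMR_LAMBDA * c.score - (1 - MMR_LAMBDA) * redundancy;
            if (val > bestVal) {
                bestVal = val;
                bestIdx = i;
            }
        });
        selected.push(candidates.splice(bestIdx, 1)[0]);
    }
    return selected;
}
```

In this design the MMR step can prefer an older, lower-scoring memory over a near-duplicate of one already selected. That keeps the three injected memories from saying the same thing.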

TF-IDF implementation (pure JS, no external vector database required):

```javascript
function keywordSearch(query, chunks, topK) {
    const queryTokens = tokenize(query);
    const df = computeDocumentFrequency(chunks);
    const queryVec = tfidf(queryTokens, df, chunks.length);

    return chunks
        .map(chunk => ({
            chunk,
            score: cosineSimilarity(queryVec, tfidf(tokenize(chunk.text), df, chunks.length))
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, topK);
}
```
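`keywordSearch` relies on four helpers that are not shown. Below is a minimal sketch of them, assuming chunks are `{ text }` objects. The tokenizer (ASCII words plus single CJK characters) and the smoothed IDF formula `ln((N + 1) / (df + 1)) + 1` are choices made for this example.

```javascript
// Lowercased ASCII words and individual CJK characters (a simplifying assumption).
function tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+|[\u4e00-\u9fff]/g) || [];
}

// Document frequency: in how many chunks each token appears.
function computeDocumentFrequency(chunks) {
    const df = new Map();
    for (const chunk of chunks) {
        for (const t of new Set(tokenize(chunk.text))) df.set(t, (df.get(t) || 0) + 1);
    }
    return df;
}

// Sparse TF-IDF vector (Map token -> weight); the smoothed IDF is always positive.
function tfidf(tokens, df, nDocs) {
    const counts = new Map();
    for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
    const vec = new Map();
    for (const [t, c] of counts) {
        const idf = Math.log((nDocs + 1) / ((df.get(t) || 0) + 1)) + 1;
        vec.set(t, (c / tokens.length) * idf);
    }
    return vec;
}

// Cosine similarity of two sparse vectors; 0 if either is empty.
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (const [t, w] of a) {
        normA += w * w;
        if (b.has(t)) dot += w * b.get(t);
    }
    for (const w of b.values()) normB += w * w;
    return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}
```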

4.4 Auto-Recall

Before every LLM call, relevant memories are searched and injected automatically:

```javascript
const memoryContext = autoRecall(userMessage);  // search and keep the top 3
const systemPrompt = buildSystemPrompt({
    bootstrap: bootstrapData,
    memoryContext,  // injected into Layer 5
});
```

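5. Heartbeat and Cron: Agent Autonomy

5.1 The Core Problem

An agent cannot just respond passively to users. It also needs to:

Proactively check state and remind the user
Run scheduled tasks

5.2 Approach: Lane Mutual Exclusion

Core principle: user messages always take priority.

```text
Main Lane (user input):
    block until the lock is free → own the LLM pipeline → release the lock

Heartbeat Lane (background heartbeat):
    try the lock without blocking → run if acquired → skip if not
```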

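```javascript
// User input: block until the lock is acquired
async function runUserTurn() {
    await laneLock.waitAndAcquire("user");
    try {
        // run the LLM call
    } finally {
        laneLock.release("user");
    }
}

// Heartbeat: non-blocking attempt
async function runHeartbeatTick() {
    if (!laneLock.tryAcquire("heartbeat")) return;  // user is active: skip this tick
    try {
        // run the heartbeat logic
    } finally {
        laneLock.release("heartbeat");
    }
}
```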
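`laneLock` itself is not defined above. One way to build it is a single-holder lock with a FIFO of blocking waiters. The class name `LaneLock` and its hand-off behavior are assumptions.

```javascript
// Hypothetical single-holder lock supporting both acquisition modes.
class LaneLock {
    constructor() {
        this.holder = null; // name of the lane currently holding the lock
        this.waiters = [];  // FIFO of { lane, resolve } for blocking acquirers
    }

    // Non-blocking: take the lock only if it is free right now.
    tryAcquire(lane) {
        if (this.holder !== null) return false;
        this.holder = lane;
        return true;
    }

    // Blocking: the promise resolves once the lock belongs to this lane.
    waitAndAcquire(lane) {
        if (this.tryAcquire(lane)) return Promise.resolve();
        return new Promise((resolve) => this.waiters.push({ lane, resolve }));
    }

    release(lane) {
        if (this.holder !== lane) throw new Error(`${lane} does not hold the lock`);
        const next = this.waiters.shift();
        // Hand the lock straight to the next waiter so no tryAcquire can slip in between.
        this.holder = next ? next.lane : null;
        if (next) next.resolve();
    }
}
```

Because release hands the lock directly to the next waiter, a user turn waiting behind a running heartbeat always gets the pipeline before the next heartbeat's `tryAcquire`.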

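5.3 Cron Scheduling: Three Schedule Types

Type	Example config	Purpose
`at`	`{"at": "2024-01-01T09:00:00"}`	One-off task
`every`	`{"every_seconds": 3600}`	Recurring task at a fixed interval
`cron`	`{"expr": "0 9 * * *"}`	Calendar-style schedule (here: every day at 09:00)

Auto-disable: a job is disabled automatically after 5 consecutive errors, so it cannot retry forever.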
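The sketch below shows how the three schedule types could compute their next run time, along with the five-strike auto-disable. The field names follow the configuration examples above. Everything else is assumed, including a deliberately simplified cron matcher that works in UTC and accepts only `*`, numbers, and comma lists. Note that `Date.parse` treats an `at` string without a timezone offset as local time.

```javascript
// Simplified cron field match: "*", a number, or a comma list of numbers.
// Ranges, steps, names and cron's day-of-month/day-of-week OR rule are left out.
function fieldMatches(spec, value) {
    return spec === "*" || spec.split(",").some((part) => Number(part) === value);
}

// fields = [minute, hour, day-of-month, month, day-of-week], evaluated in UTC.
function cronMatches(fields, d) {
    const [min, hour, dom, mon, dow] = fields;
    return fieldMatches(min, d.getUTCMinutes()) && fieldMatches(hour, d.getUTCHours()) &&
        fieldMatches(dom, d.getUTCDate()) && fieldMatches(mon, d.getUTCMonth() + 1) &&
        fieldMatches(dow, d.getUTCDay());
}

// Next run time (ms since epoch) strictly after nowMs, or null if there is none.
function nextRun(schedule, nowMs) {
    if (schedule.at !== undefined) {
        const t = Date.parse(schedule.at);
        return t > nowMs ? t : null; // one-shot: nothing left once it has passed
    }
    if (schedule.every_seconds !== undefined) {
        return nowMs + schedule.every_seconds * 1000;
    }
    if (schedule.expr !== undefined) {
        const fields = schedule.expr.trim().split(/\s+/);
        // Scan minute by minute from the next whole minute, for at most about a year.
        let t = Math.floor(nowMs / 60000) * 60000 + 60000;
        for (let i = 0; i < 366 * 24 * 60; i++, t += 60000) {
            if (cronMatches(fields, new Date(t))) return t;
        }
        return null;
    }
    throw new Error("unknown schedule type");
}

const MAX_CONSECUTIVE_ERRORS = 5;

// Record one run: a success resets the streak, 5 failures in a row disable the job.
function recordResult(job, ok) {
    job.consecutiveErrors = ok ? 0 : (job.consecutiveErrors || 0) + 1;
    if (job.consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) job.enabled = false;
    return job;
}
```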

5.4 The HEARTBEAT_OK Protocol

The agent replies HEARTBEAT_OK to signal "nothing to report":

```text
HEARTBEAT.md:
"Check if there are any unread reminders.
Reply HEARTBEAT_OK if nothing to report."

Agent reply:
"HEARTBEAT_OK"                          → output suppressed
"You have a meeting starting soon..."   → delivered as normal
```

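A heartbeat tick can combine the non-blocking lock from 5.2 with this suppression rule. The sketch assumes an exact match on the trimmed reply. `heartbeatTick`, `runAgent`, and `deliver` are hypothetical names, and `lock` is anything exposing `tryAcquire`/`release`.

```javascript
const HEARTBEAT_OK = "HEARTBEAT_OK";

// null means "suppress"; otherwise the text to deliver to the user.
function filterHeartbeatReply(reply) {
    const text = (reply ?? "").trim();
    return text === "" || text === HEARTBEAT_OK ? null : text;
}

// One heartbeat tick; `runAgent(instructions)` and `deliver(text)` are hypothetical async callbacks.
async function heartbeatTick(lock, instructions, runAgent, deliver) {
    if (!lock.tryAcquire("heartbeat")) return "skipped"; // the user is active
    try {
        const text = filterHeartbeatReply(await runAgent(instructions));
        if (text === null) return "suppressed";
        await deliver(text);
        return "delivered";
    } finally {
        lock.release("heartbeat");
    }
}
```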
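6. Message Delivery: Crash-Safe Reliability

6.1 The Core Problem

Unstable networks make message sends fail
Process crashes lose messages

6.2 Approach: A Write-Ahead Queue

Core principle: write to disk first, then try to send.

```text
Enqueue:
  1. Generate a unique ID
  2. Write .tmp.{pid}.{id}.{ts}.json
  3. fsync() to make sure the data reached the disk
  4. rename() atomically to the final name → crash-safe

Deliver:
  Background scan → send succeeds → ack() deletes the file
                    └─ fails → fail() computes the backoff and updates the file
                               └─ after 5 retries, moved to failed/
```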
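The enqueue/ack/fail steps map onto a handful of Node.js `fs` calls. In this minimal sketch the class name, file layout, and entry fields are assumptions. The caller supplies the backoff delay, for example one computed from the backoff table in this section. On Linux, a fully durable rename also needs an fsync of the parent directory, which the sketch omits.

```javascript
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

const MAX_RETRIES = 5;

// Hypothetical file-backed write-ahead delivery queue.
class DeliveryQueue {
    constructor(dir) {
        this.dir = dir;
        this.failedDir = path.join(dir, "failed");
        fs.mkdirSync(this.failedDir, { recursive: true });
    }

    fileFor(id) {
        return path.join(this.dir, `${id}.json`);
    }

    // Crash-safe write: temp file, fsync, then atomic rename over the final name.
    writeDurable(entry) {
        const tmp = path.join(this.dir, `.tmp.${process.pid}.${entry.id}.${Date.now()}.json`);
        const fd = fs.openSync(tmp, "w");
        try {
            fs.writeSync(fd, JSON.stringify(entry));
            fs.fsyncSync(fd); // the bytes are on disk before the rename makes them visible
        } finally {
            fs.closeSync(fd);
        }
        fs.renameSync(tmp, this.fileFor(entry.id));
    }

    enqueue(channel, to, text) {
        const entry = { id: crypto.randomUUID(), channel, to, text, retryCount: 0, nextAttemptAt: 0 };
        this.writeDurable(entry);
        return entry.id;
    }

    // Entries due now; leftover temp files from interrupted writes are ignored.
    due(nowMs) {
        return fs.readdirSync(this.dir)
            .filter((f) => f.endsWith(".json") && !f.startsWith(".tmp."))
            .map((f) => JSON.parse(fs.readFileSync(path.join(this.dir, f), "utf8")))
            .filter((e) => e.nextAttemptAt <= nowMs);
    }

    // Delivered: delete the file.
    ack(id) {
        fs.unlinkSync(this.fileFor(id));
    }

    // Failed: schedule the next attempt, or move to failed/ once the retry count reaches MAX_RETRIES.
    fail(id, delayMs, nowMs) {
        const file = this.fileFor(id);
        const entry = JSON.parse(fs.readFileSync(file, "utf8"));
        entry.retryCount += 1;
        if (entry.retryCount >= MAX_RETRIES) {
            fs.renameSync(file, path.join(this.failedDir, `${id}.json`));
            return;
        }
        entry.nextAttemptAt = nowMs + delayMs;
        this.writeDurable(entry);
    }
}
```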

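Backoff strategy (with jitter to avoid a thundering herd):

```javascript
const BACKOFF_MS = [5000, 25000, 120000, 600000]; // 5s, 25s, 2min, 10min
const JITTER_RATIO = 0.2; // ±20%: 5s ± 1s, 25s ± 5s, 2min ± 24s, 10min ± 2min

// retryCount starts at 1; counts past the table reuse the last (10 min) step.
function computeBackoffMs(retryCount) {
    const base = BACKOFF_MS[Math.min(retryCount, BACKOFF_MS.length) - 1];
    const jitter = (Math.random() * 2 - 1) * JITTER_RATIO * base; // uniform in [-20%, +20%)
    return Math.round(base + jitter);
}
```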

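6.3 Message Chunking

Each platform caps message length differently:

Platform	Limit
Telegram	4096 characters
Discord	2000 characters

Chunking respects paragraph boundaries and avoids splitting a code block in the middle.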
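This sketch of a chunker packs whole paragraphs up to the limit and keeps a fenced code block together whenever it fits. The function names are hypothetical. Length is measured as JavaScript string length (UTF-16 code units), which only approximates how each platform counts characters. A single unit longer than the limit is still hard-split as a last resort.

```javascript
const PLATFORM_LIMITS = { telegram: 4096, discord: 2000 };

// Split on blank lines, re-joining paragraphs that sit inside an open code fence
// so that a fenced code block stays one unit.
function splitUnits(text) {
    const units = [];
    let open = null; // accumulated text of a code block whose closing fence is not yet seen
    for (const para of text.split(/\n{2,}/)) {
        const fenceCount = (para.match(/^```/gm) || []).length;
        if (open !== null) {
            open += "\n\n" + para;
            if (fenceCount % 2 === 1) { units.push(open); open = null; } // fence closed
        } else if (fenceCount % 2 === 1) {
            open = para; // fence opened but not closed in this paragraph
        } else {
            units.push(para);
        }
    }
    if (open !== null) units.push(open); // unterminated fence: keep what we have
    return units;
}

// Greedily pack units into chunks of at most `limit` characters.
function chunkMessage(text, limit) {
    const chunks = [];
    let current = "";
    for (const unit of splitUnits(text)) {
        const candidate = current ? `${current}\n\n${unit}` : unit;
        if (candidate.length <= limit) {
            current = candidate;
            continue;
        }
        if (current) chunks.push(current);
        if (unit.length <= limit) {
            current = unit;
        } else {
            // Last resort: one unit is longer than the limit, so cut it by length.
            for (let i = 0; i < unit.length; i += limit) chunks.push(unit.slice(i, i + limit));
            current = "";
        }
    }
    if (current) chunks.push(current);
    return chunks;
}
```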

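7. Resilience: A Multi-Layer Retry Onion

7.1 The Core Problem

API rate limits, authentication failures, timeouts
Context overflow
Insufficient account balance

7.2 Approach: A Three-Layer Retry Architecture

```text
Layer 1: Auth Profile Rotation
    │
    ├── each profile holds its own API key
    ├── on failure, put that profile on cooldown and switch to the next one
    │
Layer 2: Overflow Recovery
    │
    ├── truncate oversized tool results
    ├── compact the history with the LLM
    │
Layer 3: Tool-Use Loop
    │
    └── run tools, append the results, keep looping
```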

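7.3 Failure Classification

```javascript
// Map an API error to a failure kind from its message (and HTTP status, if present).
function classifyFailure(error) {
    const msg = String(error?.message ?? error).toLowerCase();
    const status = error?.status; // many HTTP client errors carry a numeric status

    if (status === 429 || msg.includes("429") || /rate[ _]?limit/.test(msg)) return "rate_limit";
    if (status === 401 || msg.includes("401") || msg.includes("auth")) return "auth";
    if (msg.includes("timeout") || msg.includes("timed out")) return "timeout";
    if (status === 402 || msg.includes("402") || msg.includes("billing")) return "billing";
    // Match overflow phrases specifically: a bare "token" would also catch "invalid token".
    if (/context (length|window)|prompt is too long|too many tokens/.test(msg)) return "overflow";

    return "unknown";
}
```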

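Handling strategy:

Failure type	Cooldown	Action
auth / billing	300s	Rotate to the next profile
rate_limit	120s	Rotate to the next profile
timeout	60s	Rotate to the next profile
overflow	0s	Compact the context, then retry at the same layer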
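Layer 1 of the onion, rotation with per-profile cooldowns, could look like the following. `ProfilePool` and `runWithRotation` are hypothetical names, and the cooldowns come from the table above. Failure kinds without a cooldown (`overflow`, `unknown`) are rethrown so that the overflow layer or the caller can handle them. That hand-off is an assumption about how the layers interact.

```javascript
// Cooldowns from the handling table, in seconds.
const COOLDOWN_S = { auth: 300, billing: 300, rate_limit: 120, timeout: 60 };

// Hypothetical pool of auth profiles (one API key each) with per-profile cooldowns.
class ProfilePool {
    constructor(profiles) {
        this.profiles = profiles.map((p) => ({ ...p, cooldownUntil: 0 }));
    }

    // First profile that is not cooling down, or null if all are.
    nextAvailable(nowMs) {
        return this.profiles.find((p) => p.cooldownUntil <= nowMs) ?? null;
    }

    markFailure(name, kind, nowMs) {
        const profile = this.profiles.find((p) => p.name === name);
        profile.cooldownUntil = nowMs + COOLDOWN_S[kind] * 1000;
    }
}

// Try profiles until one succeeds. `callWith(profile)` is a hypothetical model call;
// `classify(err)` maps an error to a failure kind (e.g. classifyFailure above).
async function runWithRotation(pool, callWith, classify, now = Date.now) {
    let lastError = null;
    for (;;) {
        const profile = pool.nextAvailable(now());
        if (!profile) break;
        try {
            return await callWith(profile);
        } catch (err) {
            const kind = classify(err);
            if (!(kind in COOLDOWN_S)) throw err; // overflow/unknown: not a profile problem
            pool.markFailure(profile.name, kind, now());
            lastError = err;
        }
    }
    throw lastError ?? new Error("all auth profiles are cooling down");
}
```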

7.4 Fallback Model Chain

When every profile is exhausted, the fallback models are tried:

```javascript
const fallbackModels = [
    "claude-3-opus",    // primary model
    "claude-3-sonnet",  // fallback 1
    "claude-3-haiku",   // fallback 2
];
```
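Walking the chain takes only a short loop. In this sketch `runModel` is a hypothetical callback that runs the whole profile-rotation layer for one model and throws once every profile is exhausted. Collecting each model's error into the final failure message is one reasonable choice, not the only one.

```javascript
// Try each model in order and return the first success along with the model that produced it.
async function runWithFallback(models, runModel) {
    const errors = [];
    for (const model of models) {
        try {
            return { model, result: await runModel(model) };
        } catch (err) {
            errors.push(`${model}: ${err.message}`);
        }
    }
    throw new Error(`all fallback models failed:\n${errors.join("\n")}`);
}
```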

8. Concurrency Control: Named Lanes

8.1 The Core Problem

Different kinds of work (user input, heartbeat, cron) need to run concurrently
Work of the same kind needs to be serialized
Restarts and shutdowns must be handled gracefully

8.2 Approach: Named Lane Queues

```text
┌─────────────────────────────────────────────────┐
│            CommandQueue (scheduler)             │
├─────────────────────────────────────────────────┤
│                                                 │
│  ┌─────────┐  ┌─────────┐  ┌─────────────┐      │
│  │  main   │  │  cron   │  │  heartbeat  │      │
│  │ max=1   │  │ max=1   │  │    max=1    │      │
│  │ FIFO    │  │ FIFO    │  │    FIFO     │      │
│  └────┬────┘  └────┬────┘  └──────┬──────┘      │
│       │            │              │             │
│       ▼            ▼              ▼             │
│    [active]     [active]       [active]         │
│                                                 │
└─────────────────────────────────────────────────┘
```

The core primitive, LaneQueue:

```javascript
class LaneQueue {
    constructor(name, maxConcurrency = 1) {
        this.name = name;
        this.maxConcurrency = maxConcurrency;
        this.queue = [];        // FIFO of pending tasks
        this.activeCount = 0;   // tasks currently running
        this.generation = 0;    // bumped on restart (see 8.3)
    }

    enqueue(fn) {
        return new Promise((resolve, reject) => {
            this.queue.push({ fn, resolve, reject, generation: this.generation });
            this.pump();  // try to start work right away
        });
    }

    pump() {
        while (this.activeCount < this.maxConcurrency && this.queue.length > 0) {
            const task = this.queue.shift();
            this.activeCount++;
            this.runTask(task);  // not awaited, so pump can start several tasks
        }
    }

    async runTask(task) {
        try {
            task.resolve(await task.fn());
        } catch (err) {
            task.reject(err);  // failures reach the caller of enqueue()
        } finally {
            this.activeCount--;
            // Only tasks from the current generation keep draining the queue
            if (task.generation === this.generation) this.pump();
        }
    }
}
```

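8.3 Generation Tracking: Restart Safety

When a task from an old lifecycle finishes, it must not keep advancing the current queue:

```javascript
// CommandQueue holds one LaneQueue per named lane.
class CommandQueue {
    constructor(laneNames = ["main", "cron", "heartbeat"]) {
        this.lanes = new Map(laneNames.map((name) => [name, new LaneQueue(name, 1)]));
    }

    enqueue(laneName, fn) {
        return this.lanes.get(laneName).enqueue(fn);
    }

    // On restart, bump the generation of every lane
    resetAll() {
        for (const lane of this.lanes.values()) lane.generation++;
    }
}

// In LaneQueue.runTask, on completion:
//   task.generation === this.generation  -> pump() and keep draining the queue
//   otherwise (stale task)               -> its promise still settles, but it does not pump
```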

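8.4 User-Priority Semantics

```javascript
// User input: wait for the result
const response = await commandQueue.enqueue("main", async () => {
    return await runAgentTurn(userInput);
});

// Heartbeat: fire and forget; the result (or error) is handled asynchronously
commandQueue.enqueue("heartbeat", async () => {
    return await runHeartbeat();
})
    .then(result => { /* handle the result */ })
    .catch(err => console.error("heartbeat failed:", err));
```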

9. Full Architecture Diagram

```text
User message ──→ Channel ──→ InboundMessage
                                    │
                                    ▼
                             BindingTable
                             (routes to agent_id)
                                    │
                                    ▼
                             buildSessionKey()
                             (dm_scope isolation)
                                    │
                                    ▼
                             SessionStore.load()
                             (JSONL replay)
                                    │
                                    ▼
                             autoRecall()
                             (memory recall)
                                    │
                                    ▼
                             buildSystemPrompt()
                             (8-layer assembly)
                                    │
                        ┌───────────┴───────────┐
                        ▼                       ▼
                  CommandQueue           Heartbeat/Cron
                (lane scheduling)     (background autonomy)
                        │                       │
                        └───────────┬───────────┘
                                    ▼
                             ResilienceRunner
                             (three-layer retry onion)
                                    │
                                    ▼
                             Agent Core Loop
                             (tool-use loop)
                                    │
                                    ▼
                             DeliveryQueue
                             (crash-safe delivery)
                                    │
                                    ▼
                             Channel.send()
```

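10. Technology Summary

Area	Core technique	Key design
Persistence	Append-only JSONL	Atomic appends, replay for recovery
Context protection	Three-stage strategy	Truncate → compact → fail
Multi-platform support	Channel abstraction	receive/send interface
Routing	Five-tier binding table	Most specific match wins
Prompting	Eight-layer assembly	Layered degrees of influence
Memory	Hybrid search	TF-IDF + vectors + MMR
Autonomy	Lane mutex	User first
Reliability	Write-ahead queue	fsync + rename
Resilience	Three-layer retry	Profile rotation + overflow recovery
Concurrency	Named lanes	FIFO + generation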

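11. Production Deployment Considerations

11.1 Monitoring Metrics

Session layer: active sessions, average context length, compaction frequency
Routing layer: request distribution across agents, route hit rate
Memory layer: recall hit rate, search latency
Delivery layer: queue depth, retry rate, failure rate
Resilience layer: profile cooldown time, fallback-model usage rate
Concurrency layer: queue length and active task count per lane

11.2 Directions for Extension

Distributed deployment: move session storage to Redis or a database
Vector database: connect the memory system to a dedicated vector engine
Plugin system: hot-loading of skill modules
Multimodal support: image and audio processing
Streaming output: SSE/WebSocket streaming responses

Conclusion

An industrial-grade agent gateway is not a thin wrapper around an LLM API. It is a complete piece of systems engineering:

Reliability: recovers from crashes and retries after failures
Extensibility: new platforms, new agents, new skills
Maintainability: clear layering, single responsibility per component
Observability: end-to-end tracing and metrics collection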