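Claude Code Context Management (Part 2): Three Zero-Token Compaction Techniques

Series navigation

(Part 1) Why do agents "lose their memory"?

(Part 2) Three zero-token compaction techniques ← you are here

(Part 3) Semantic compaction and production practice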


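📌 Recap of Part 1

Part 1 explained why an LLM appears to "lose its memory":

✅ An LLM is stateless, so every request resends the entire history
✅ Growth along three dimensions: message count, size of individual messages, cumulative tokens
✅ A four-layer divide-and-conquer scheme: L3/L1/L2/L4, applied progressively

This part walks through the source-level implementation of the three layers that need no API call (L3, L1, L2), focusing on:

How L3 shrinks 500KB to 2KB without spending a single token
L1's production-grade boundary message architecture
L2's time-based trigger and its safeguards against hallucination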

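🗺️ Quick navigation: which layer to use when?

Before going into each layer, use this decision tree to see which strategy fits your situation:

graph TD
  Start["Context about to exceed the limit"] --> Q1{"Single tool result over 200KB?"}
  Q1 -->|Yes| L3["L3: persist large output (500KB → 2KB)"]
  Q1 -->|No| Q2{"More than 50 messages in history?"}
  Q2 -->|Yes| L1["L1: drop middle messages (100 → 50 messages)"]
  Q2 -->|No| Q3{"More than 60 minutes since the last turn?"}
  Q3 -->|Yes| L2["L2: clear old tool results (174KB → 1KB)"]
  Q3 -->|No| Wait["No compaction yet, keep monitoring"]
  L3 --> Check{"Solved?"}
  L1 --> Check
  L2 --> Check
  Check -->|No| Combine["Combine L3 + L1 + L2"]
  Check -->|Yes| Done["Continue the conversation"]
  Combine --> Done
  style L3 fill:#e1f5ff,stroke:#03a9f4,stroke-width:2px
  style L1 fill:#fff3cd,stroke:#fbc02d,stroke-width:2px
  style L2 fill:#d4edda,stroke:#66bb6a,stroke-width:2px
  style Combine fill:#f8d7da,stroke:#e57373,stroke-width:2px
  style Start fill:#f3e5f5,stroke:#ba68c8,stroke-width:2px
  style Done fill:#c8e6c9,stroke:#4caf50,stroke-width:3px

💡 Practical tip: the three layers can run together without conflict. The usual trigger order is L3 (in real time) → L1 (once history passes 50 messages) → L2 (after 60 idle minutes).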
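
The decision tree and the tip above can be condensed into a small selector. This is a minimal sketch, not Claude Code's own logic: the names `ContextSnapshot` and `pickLayers` are made up for illustration, and the thresholds are simply the ones quoted in the tree. Because the layers do not conflict, the sketch returns every layer whose condition holds rather than only the first match.

```typescript
// Hypothetical names; thresholds are the ones quoted in the decision tree.
type Layer = 'L3' | 'L1' | 'L2'

interface ContextSnapshot {
  largestToolResultBytes: number  // size of the biggest pending tool result
  messageCount: number            // messages currently in the history
  minutesSinceLastTurn: number    // idle time since the last assistant message
}

function pickLayers(s: ContextSnapshot): Layer[] {
  const layers: Layer[] = []
  if (s.largestToolResultBytes > 200_000) layers.push('L3')
  if (s.messageCount > 50) layers.push('L1')
  if (s.minutesSinceLastTurn > 60) layers.push('L2')
  return layers  // empty array: nothing to compact yet, keep monitoring
}
```

An empty result corresponds to the "keep monitoring" branch of the tree.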

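Part 1: L3 - The large tool output problem 📦

The core problem

Problem: a single tool call returns so much data that it fills the context on its own.

Typical scenarios:

python

# Scenario 1: reading a large log file
read_file("server.log")
# → returns a 500KB log
# → costs ~125K tokens
# → one tool call uses up about 60% of the context (assuming a 200K-token window)

# Scenario 2: running a command that prints a lot
bash(r"find . -name '*.py' -exec cat {} \;")
# → returns several MB of source code
# → overflows the context window outright

# Scenario 3: a database query that returns a large result set
bash("mysql -e 'SELECT * FROM logs LIMIT 10000'")
# → returns up to 10,000 rows
# → the AI does not need to see all of it, only summary statistics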

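Design idea: relocate the data instead of discarding it

Core insight: the AI does not need to see the whole output right away, but it does need to know where the output lives.

Conventional approaches vs. Claude Code's approach:

Approach	What it does	Drawback / benefit
❌ Truncate	Keep only the first 2,000 characters	The AI never sees key information further down
❌ Discard	Delete the output outright	The AI has no idea the output ever existed
❌ Keep everything	Put all 500KB in context	Blows up the window
✅ Persist + preview	Save the full content to disk; keep only the first 2,000 characters plus the path in context; the AI can call read_file(path) when needed	No information lost; 99.6% of the context saved; recoverable; $0 cost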

Implementing persistence

Trigger: the tool results in the last user message total more than 200KB

Core logic (based on src/utils/toolResultStorage.ts):

typescript

// Core function: enforce a size budget on tool results (simplified)
export async function enforceToolResultBudget(
  messages: Message[],
  state: ContentReplacementState,                  // not used in this simplified excerpt
  skipToolNames: ReadonlySet<string> = new Set(),  // not used in this simplified excerpt
) {
  // 1. Collect candidate results, grouped by API message boundary
  const candidatesByMessage = collectCandidatesByMessage(messages)
  const limit = 200_000  // 200KB budget
  
  const replacementMap = new Map<string, string>()
  const toPersist: ToolResultCandidate[] = []
  
  // 2. Check each message group against the budget
  for (const candidates of candidatesByMessage) {
    const totalSize = candidates.reduce((sum, c) => sum + c.size, 0)
    
    if (totalSize > limit) {
      // 3. Pick the results to persist, largest first
      const selected = candidates
        .filter(c => c.size > 30_000)  // leave results of 30KB or less alone
        .sort((a, b) => b.size - a.size)
      
      toPersist.push(...selected)
    }
  }
  
  // 4. Persist all selected results concurrently
  const freshReplacements = await Promise.all(
    toPersist.map(async c => {
      // Save the full content to disk
      const path = await saveToDisk(c.content)
      
      // Build the replacement content: path + preview
      return {
        toolUseId: c.toolUseId,
        replacement: `<persisted-output>
Full output: ${path}
Preview:
${c.content.slice(0, 2000)}
...
</persisted-output>`
      }
    })
  )
  
  // 5. Record each replacement so that step 6 can apply it
  for (const { toolUseId, replacement } of freshReplacements) {
    replacementMap.set(toolUseId, replacement)
  }
  
  // 6. Return the messages with the replacements applied
  return replaceToolResultContents(messages, replacementMap)
}

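Key design points:

Only the last user message is handled here; tool results in earlier messages are left to L2
Largest results first: persisting the biggest results frees the most space per write
A 2,000-character preview is kept, so the AI can see the start and decide whether it needs the full content
30KB floor: results of 30KB or less are not persisted, which avoids over-optimizing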
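
The listing above persists every result over 30KB once a group exceeds the budget. A variant that takes "largest first" literally, and stops as soon as the group fits, could look like the sketch below. Whether the real implementation stops early is not confirmed by the source; `Candidate` and `selectForPersistence` are illustrative names, and the sketch ignores the roughly 2KB preview that replaces each persisted result.

```typescript
// Illustrative sketch: greedy "largest first" selection that stops once the
// group fits the budget. Names and the early stop are assumptions.
interface Candidate {
  toolUseId: string
  size: number  // bytes
}

function selectForPersistence(
  candidates: Candidate[],
  budget = 200_000,  // 200KB budget per message group
  minSize = 30_000,  // results of this size or smaller are never persisted
): Candidate[] {
  let total = candidates.reduce((sum, c) => sum + c.size, 0)
  const selected: Candidate[] = []
  const bySizeDesc = [...candidates].sort((a, b) => b.size - a.size)

  for (const c of bySizeDesc) {
    if (total <= budget) break     // the group already fits
    if (c.size <= minSize) break   // everything left is too small to bother with
    selected.push(c)
    total -= c.size                // the ~2KB preview is ignored here
  }
  return selected
}
```

With results of 150KB, 80KB and 20KB (250KB in total), only the 150KB result is selected, because removing it already brings the group under 200KB.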

Example of the result

typescript

// Before compaction
{
  type: "user",
  content: [{
    type: "tool_result",
    tool_use_id: "toolu_123",
    content: "... 500KB of log content ..."  // 500KB
  }]
}

// After compaction
{
  type: "user",
  content: [{
    type: "tool_result",
    tool_use_id: "toolu_123",
    content: `<persisted-output>
Full output: .task_outputs/tool-results/toolu_123.txt
Preview:
[ERROR] 2026-07-02 10:00:00 - Connection failed
[ERROR] 2026-07-02 10:00:01 - Retrying...
...
</persisted-output>`  // 2KB
  }]
}

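L3 results at a glance

Item	Value
Space saved	500KB → 2KB, saving 498KB ≈ 125K tokens
API token cost	$0 (no API call)
Execution time	Milliseconds
Information loss	Very low (the full file can be read back)
Visible to the AI	✅ Preview + full path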

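⚠️ The full cost picture: more than API tokens

L3 spends no API tokens, but it does introduce other, less visible costs:

Cost type	L3 persistence	L1 snip	L2 placeholder
API tokens	✅ $0	✅ $0	✅ $0
Disk I/O	⚠️ Moderate (file writes)	✅ Negligible	✅ Negligible
Storage	⚠️ Accumulates (needs cleanup)	✅ None	✅ None
Re-read risk	⚠️ The AI may call read_file repeatedly	✅ None	⚠️ A tool may have to be re-run
Code complexity	Medium	High	Low

Production best practices:

1. Clean up persisted files automatically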

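typescript

import * as fs from 'node:fs/promises'
import { rmSync } from 'node:fs'
import { glob } from 'glob'  // assumes the npm "glob" package, v9 or later

// Option A: clean up when the session ends
async function cleanupSession(sessionId: string) {
  const outputDir = `.task_outputs/session-${sessionId}`
  await fs.rm(outputDir, { recursive: true, force: true })
}

// Option B: clean up by age (keep files from the last 24 hours)
async function cleanupOldFiles() {
  const files = await glob('.task_outputs/**/*.txt')
  const now = Date.now()
  const ONE_DAY = 24 * 60 * 60 * 1000
  
  for (const file of files) {
    const stat = await fs.stat(file)
    if (now - stat.mtimeMs > ONE_DAY) {
      await fs.unlink(file)
    }
  }
}

// Recommended: register a cleanup hook when the agent starts.
// 'exit' listeners must be synchronous, so this uses rmSync instead of the async cleanupSession.
process.on('exit', () => {
  rmSync(`.task_outputs/session-${currentSessionId}`, { recursive: true, force: true })
})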

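2. Monitor how often files are re-read

If the AI calls read_file on the same path more than twice within 3 turns, the preview is probably too short:

typescript

const readCounter = new Map<string, number>()

function onToolCall(toolName: string, args: any) {
  if (toolName === 'read_file') {
    const path = args.file_path
    readCounter.set(path, (readCounter.get(path) || 0) + 1)
    
    // Warn once a path has been read more than twice
    // (this simple counter is cumulative; it does not enforce the 3-turn window)
    if (readCounter.get(path)! > 2) {
      console.warn(`[L3] ${path} has been re-read ${readCounter.get(path)} times; consider a longer preview`)
    }
  }
}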
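
The counter above accumulates over the whole session, while the rule in the text is "more than twice within 3 turns". A sliding-window version might look like the following sketch. The class name `RereadMonitor` and the notion of a numeric turn index supplied by the caller are assumptions made for illustration.

```typescript
// Illustrative sketch: count reads of a path within a sliding window of turns.
class RereadMonitor {
  // path -> turn numbers in which the path was read (pruned to the window)
  private readonly reads = new Map<string, number[]>()
  private readonly windowTurns: number
  private readonly maxReads: number

  constructor(windowTurns = 3, maxReads = 2) {
    this.windowTurns = windowTurns
    this.maxReads = maxReads
  }

  /** Records a read; returns true when the path exceeded maxReads within the window. */
  record(path: string, turn: number): boolean {
    const recent = (this.reads.get(path) ?? []).filter(t => turn - t < this.windowTurns)
    recent.push(turn)
    this.reads.set(path, recent)
    return recent.length > this.maxReads
  }
}
```

A `true` return value is the signal to log a warning or to lengthen the preview for that path.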

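3. Keep disk writes off the critical path

Under high concurrency, saveToDisk can become a bottleneck:

typescript

// ❌ Awaiting every write inline makes each request wait on disk I/O
await fs.writeFile(path, content)

// ✅ Use a write queue that flushes in batches
class PersistenceQueue {
  private queue: Array<{path: string, content: string}> = []
  private flushing = false
  
  // Fire-and-forget: this returns before the file exists, so callers that hand
  // out the path right away must tolerate a short delay before it is readable.
  async enqueue(path: string, content: string) {
    this.queue.push({path, content})
    if (!this.flushing) void this.flush()
  }
  
  private async flush() {
    this.flushing = true
    try {
      while (this.queue.length > 0) {
        const batch = this.queue.splice(0, 10)  // 10 writes per batch
        await Promise.all(
          batch.map(({path, content}) => fs.writeFile(path, content))
        )
      }
    } catch (error) {
      console.error('[L3] failed to persist tool output', error)
    } finally {
      this.flushing = false  // never leave the queue stuck after a failed write
    }
  }
}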

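💡 Cost assessment: these hidden costs are usually negligible next to API token pricing (taking $15 per million tokens as the reference). Persisting one 500KB file costs about 0.001 s of CPU plus 500KB of disk, while the API token cost it avoids is about $1.875 (125K tokens). If the persistence overhead is valued at roughly $0.001, that is a return on the order of 1,875×.
💡 Key insight: L3 solves the large tool output problem with persistence plus a preview. It spends zero tokens, it is recoverable, and the AI still sees the key information.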
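
The arithmetic behind that estimate can be written out as follows. This is a back-of-the-envelope sketch under two assumptions taken from the figures in this article: roughly 4 characters per token, and $15 per million input tokens. Real tokenizers and real prices will differ.

```typescript
// Rough cost arithmetic; both constants are assumptions, not measured values.
const CHARS_PER_TOKEN = 4          // heuristic used throughout this article
const USD_PER_MILLION_TOKENS = 15  // assumed input price

function estimateTokens(bytes: number): number {
  return Math.round(bytes / CHARS_PER_TOKEN)
}

function estimateSavingsUsd(beforeBytes: number, afterBytes: number): number {
  const savedTokens = estimateTokens(beforeBytes) - estimateTokens(afterBytes)
  return (savedTokens / 1_000_000) * USD_PER_MILLION_TOKENS
}

// 500KB replaced by a 2KB preview: (125,000 - 500) tokens, about $1.87 per request
const savedPerRequest = estimateSavingsUsd(500_000, 2_000)
```

The result is slightly below the $1.875 quoted above because the 2KB preview still occupies about 500 tokens.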

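Part 2: L1 - Snip Compact, dropping the middle messages 🧹

The core problem

Problem: the conversation has run for too many turns and history keeps piling up.

Typical scenario:

User: "Refactor this authentication system"

What the AI does:
Turns 1-10:   explore the project structure (read 10 files)
Turns 11-30:  analyze problems in the code (read + reasoning)
Turns 31-50:  refactor file by file (read + write)
Turns 51-60:  test and fix (bash + read)
Turns 61-70:  final adjustments and documentation

Result: 140+ messages (70 assistant + 70 user)

Question: are the first 50 turns of exploration still needed?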

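Design idea: keep the head and the tail, drop the middle

Core insight: the value of a message decays over time, but not uniformly.

First 3 messages: user intent, task goal, constraints → always important
Middle N messages: exploration, trial and error, intermediate state → value decays
Most recent 47 messages: current focus, latest decisions → highly relevant

📖 Terminology: the boundary message mentioned below is a special system marker that records which messages have been "deleted". It lives only in the REPL layer, which keeps the complete history, and it is never sent to the API, so the user can still review the entire conversation.

Why the middle messages can go:

✅ The task is already clear: the first 3 messages carry the user's intent
✅ The middle is exploration: lots of trial and error and file reads, whose conclusions are already reflected in later decisions
✅ Current state matters more: the latest operations are where the work is
✅ It can be redone: the AI can re-read a file if it needs to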

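Problems with the naive implementation

The conventional approach inserts a placeholder:

python

# ❌ Naive version: insert a placeholder
snipped = messages[:3] + [{"role": "user", "content": "[snipped 50 messages]"}] + messages[-47:]

Problems:

⚠️ It can break role alternation (e.g. user → user (placeholder) → assistant)
⚠️ The placeholder takes up context space (small, but worthless)
⚠️ The UI can no longer show the full history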

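Production implementation: boundary message + dynamic filtering

Claude Code's approach inserts no placeholder. Instead it:

Adds a boundary message (a system message that is never sent to the API) recording the UUIDs of the removed messages
Keeps the full history in the REPL (the user can still view it)
Filters out the marked messages on the fly when building the API request

Trigger: more than 50 messages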
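
The steps above can be sketched as a function that decides which messages to mark and emits the boundary message. This is a minimal sketch, not the actual Claude Code source: the `Msg` and `SnipBoundary` shapes, the function name `buildSnipBoundary` and the way the boundary UUID is generated are all assumptions; only the keep-first-3 / keep-last-47 policy and the 50-message trigger come from the text.

```typescript
// Illustrative sketch: compute the snip boundary for a message list.
interface Msg {
  uuid: string
  type: 'user' | 'assistant' | 'system'
}

interface SnipBoundary {
  type: 'system'
  subtype: 'snip_boundary'
  uuid: string
  snipMetadata: { removedUuids: string[]; totalRemoved: number }
}

function buildSnipBoundary(
  messages: Msg[],
  keepFirst = 3,
  keepLast = 47,
  threshold = 50,
): SnipBoundary | null {
  // Earlier boundary messages are system messages and never count toward the limit.
  const visible = messages.filter(m => m.type !== 'system')
  if (visible.length <= threshold) return null

  // Everything between the head and the tail is marked, not physically deleted.
  const removed = visible.slice(keepFirst, visible.length - keepLast)
  return {
    type: 'system',
    subtype: 'snip_boundary',
    uuid: `boundary-${Date.now()}`,  // hypothetical ID scheme
    snipMetadata: {
      removedUuids: removed.map(m => m.uuid),
      totalRemoved: removed.length,
    },
  }
}
```

For an 80-message history this marks messages 4 through 33 (30 in total), leaving 3 + 47 = 50 visible to the model.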

Dual-view architecture

flowchart LR
  subgraph REPL["REPL layer (UI)"]
    Full["Full history: msg1 ~ msg80 + boundary message"]
  end
  subgraph Filter["Dynamic filtering"]
    PSV["projectSnippedView() filters by removedUuids"]
  end
  subgraph API["API layer (model)"]
    Filtered["Filtered messages: msg1-3 + msg34-80, no placeholder, strict alternation"]
  end
  Full -->|user can scroll through everything| Full
  Full --> PSV
  PSV --> Filtered
  Filtered -->|sent to| Claude([Claude API])
  style REPL fill:#e1f5ff
  style Filter fill:#fff3cd
  style API fill:#d4edda

Visualizing L1 message removal

flowchart TB
  Original["📨 Before compaction: 80 messages (msg1, msg2, msg3, ..., msg80)"]
  Decision["⚙️ L1 snip decision: more than 50 messages? ✅ Trigger compaction"]
  Rule["Retention policy: keep the first 3 (task context), keep the last 47 (current focus), drop the 30 in between"]
  REPL["💾 REPL layer (user's view): msg1, msg2, msg3, ..., msg80 + boundary message. ✨ All 80 kept + 1 marker"]
  API["🚀 API layer (model's view): msg1 ✅, msg2 ✅, msg3 ✅, ❌ 30 filtered out, msg34 ✅, ..., msg80 ✅. 📊 Filtered down to 50"]
  Original --> Decision
  Decision --> Rule
  Rule --> REPL
  Rule --> API
  style Original fill:#ffebee,stroke:#e57373,stroke-width:3px
  style Decision fill:#e1f5fe,stroke:#4fc3f7,stroke-width:3px
  style Rule fill:#fff3e0,stroke:#ffb74d,stroke-width:3px
  style REPL fill:#f3e5f5,stroke:#ba68c8,stroke-width:3px
  style API fill:#e8f5e9,stroke:#81c784,stroke-width:3px

📖 Reading the diagram:

View	Message count	Characteristics	User experience
Before compaction	80	Full history	Context about to exceed the limit
REPL layer	80 + boundary	Physically retained	✅ Everything can be scrolled through
API layer	50	Filtered dynamically	✅ Sent to Claude

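🔑 Core mechanisms:

Dual views: the user sees the full history, the model sees the compacted version
boundary message: records the list of UUIDs of the removed messages
No placeholder: the API layer filters directly and never inserts [snipped...]
Reversible: messages can be restored at any time by editing removedUuids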

The boundary message architecture in code

typescript

// Full message list (REPL layer, visible to the user)
replStore = [
  msg1, msg2, msg3, ..., msg80,  // all 80 messages are kept
  {
    type: "system",
    subtype: "snip_boundary",
    uuid: "boundary-001",
    snipMetadata: {
      removedUuids: ["msg-004", "msg-005", ..., "msg-033"],  // which messages count as "deleted"
      keptRanges: [[0, 2], [33, 79]],  // kept index ranges (0-based, inclusive)
      totalRemoved: 30
    }
  }
]

// Filter on the fly before sending to the API (API layer, visible to the model)
function projectSnippedView(messages: Message[]): Message[] {
  const boundary = findLastBoundaryMessage(messages)
  const removedIds = new Set(boundary?.snipMetadata?.removedUuids ?? [])
  
  return messages.filter(msg => 
    msg.type !== "system" &&      // drop the boundary message itself
    !removedIds.has(msg.uuid)     // drop messages marked as removed
  )
}

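What this looks like in practice:

REPL layer (what the user sees):
  All 80 messages + boundary message
  ↓ The user can scroll through the full history; nothing is lost

API layer (what Claude sees):
  msg1-3 + msg34-80 (50 messages)
  ↓ No placeholder; roles still alternate as long as the cut falls on a turn boundary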

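How the two approaches compare

Property	Delete + placeholder	Claude Code's implementation
Placeholder	Inserts `[snipped...]`	None ✅
Role alternation	May be violated ⚠️	Preserved when filtering cuts on turn boundaries ✅
History	Physically deleted ❌	Fully kept in the REPL ✅
Reversibility	Irreversible ❌	Reversible (edit removedUuids) ✅
Implementation complexity	Simple (about 50 lines of code)	Complex (UUID tracking + dynamic filtering)

💡 Key insight: with its dual-view architecture, a production-grade snip satisfies both the user experience (full history) and the API constraint (strict role alternation).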
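
Reversibility follows directly from the fact that nothing is physically deleted: restoring a message only means taking its UUID out of `removedUuids`. A minimal sketch, assuming the `snipMetadata` shape shown earlier (the function name `restoreSnipped` is hypothetical):

```typescript
// Illustrative sketch: undo part of a snip by editing the boundary metadata.
interface SnipMetadata {
  removedUuids: string[]
  totalRemoved: number
}

function restoreSnipped(meta: SnipMetadata, toRestore: string[]): SnipMetadata {
  const restore = new Set(toRestore)
  const removedUuids = meta.removedUuids.filter(id => !restore.has(id))
  // Return a new object so the previous boundary state stays untouched.
  return { removedUuids, totalRemoved: removedUuids.length }
}
```

Restoring an arbitrary message can break role alternation in the API view, so in practice it is safer to restore whole user/assistant pairs.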

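🛡️ Going further: a circuit breaker for L1

The boundary message architecture is stable, but production systems can still hit edge cases:

Concurrent modifications that leave UUIDs inconsistent
Serialization/deserialization errors
Third-party plugins that alter the message structure

A defensive implementation (optional; suited to multi-tenant SaaS or plugin ecosystems):

typescript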

function projectSnippedView(messages: Message[]): Message[] {
  try {
    const boundary = findLastBoundaryMessage(messages)
    if (!boundary) return removeSystemMessages(messages)
    
    const removedIds = new Set(boundary.snipMetadata?.removedUuids || [])
    const filtered = messages.filter(msg => 
      msg.type !== 'system' && !removedIds.has(msg.uuid)
    )
    
    // Defensive check 1: the filtered list must not be empty
    if (filtered.length === 0) {
      console.error('[Snip] filtered list is empty, falling back to the full list')
      return removeSystemMessages(messages)
    }
    
    // Defensive check 2: roles must alternate strictly
    if (!isRoleAlternating(filtered)) {
      console.error('[Snip] role alternation broken, falling back to the full list')
      return removeSystemMessages(messages)
    }
    
    // Defensive check 3: the list must end with a user message
    if (filtered[filtered.length - 1]?.type !== 'user') {
      console.error('[Snip] last message is not a user message, falling back to the full list')
      return removeSystemMessages(messages)
    }
    
    return filtered
  } catch (error) {
    console.error('[Snip] dynamic filtering failed, falling back to the full list', error)
    return removeSystemMessages(messages)
  }
}

// Helper: strip system messages (the minimal-loss fallback)
function removeSystemMessages(messages: Message[]): Message[] {
  return messages.filter(m => m.type !== 'system')
}

// Check that roles alternate strictly
function isRoleAlternating(messages: Message[]): boolean {
  for (let i = 1; i < messages.length; i++) {
    if (messages[i].type === messages[i - 1].type) {
      return false
    }
  }
  return true
}

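When this mechanism is worth having:

Scenario	Needed?	Reason
Multi-tenant SaaS product	✅ Recommended	Users may craft malicious messages
Plugin ecosystem	✅ Recommended	Third-party code may modify messages
Distributed storage (Redis/DB)	✅ Recommended	Concurrency issues can leave data inconsistent
Single-machine agent	⚠️ Optional	Likely over-defensive; adds maintenance cost
Prototype / research project	❌ Not needed	Fast iteration comes first

💡 Engineering philosophy: the heart of defensive programming is graceful degradation. When compaction fails, fall back to the full message list rather than letting the whole agent crash. In production, availability matters more than how well the optimization works.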
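
One failure mode the three checks above do not cover is a snip that removes an assistant message containing a `tool_use` block while keeping the user message with the matching `tool_result`. The Anthropic Messages API expects every `tool_result` to refer to a `tool_use` from the preceding assistant turn, so an orphaned result is likely to be rejected. The sketch below is an additional check suggested here, not something the source describes; the simplified `Block` and `ApiMessage` types are assumptions, and the check only verifies that the `tool_use` appeared somewhere earlier rather than in the immediately preceding turn.

```typescript
// Illustrative extra check: find tool_result blocks whose tool_use was snipped away.
type Block =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string }
  | { type: 'tool_result'; tool_use_id: string }

interface ApiMessage {
  role: 'user' | 'assistant'
  content: Block[]
}

function findOrphanedToolResults(messages: ApiMessage[]): string[] {
  const seenToolUses = new Set<string>()
  const orphans: string[] = []
  for (const msg of messages) {
    for (const block of msg.content) {
      if (block.type === 'tool_use') {
        seenToolUses.add(block.id)
      } else if (block.type === 'tool_result' && !seenToolUses.has(block.tool_use_id)) {
        orphans.push(block.tool_use_id)
      }
    }
  }
  return orphans  // non-empty: fall back, or move the cut to a safer position
}
```

A non-empty result can be treated like the other failed checks: fall back to the unfiltered list, or shift the cut so that each `tool_use` and its `tool_result` stay on the same side.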

Part 3: L2 - Micro Compact, placeholders for old tool results 🔄

The core problem

Problem: even after the middle messages are dropped, the messages that remain still carry many old tool results.

Typical scenario:

python

# After the L1 snip, the 47 retained messages look like this
# (only tool_result messages are shown; assistant turns are omitted):
messages[-47:] = [
    # Tool results from 30 turns ago (no longer needed)
    {"role": "user", "content": [{"type": "tool_result", "content": "... 10KB models.py ..."}]},
    {"role": "user", "content": [{"type": "tool_result", "content": "... 8KB views.py ..."}]},
    {"role": "user", "content": [{"type": "tool_result", "content": "... 15KB config.yaml ..."}]},
    # ... plus 20+ more old tool results ...
    
    # Tool results from the last 3 turns (needed right now)
    {"role": "user", "content": [{"type": "tool_result", "content": "Wrote auth.py"}]},
    {"role": "user", "content": [{"type": "tool_result", "content": "Tests: 15 passed"}]},
]

# Question: is the old models.py content (10KB) still needed?

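Design idea: a recency window plus structured placeholders

Core insight: the value of a tool result decays quickly over time.

Most recent 5 tool results: current focus → must be kept
Earlier tool results: their information has already been used → can be replaced by structured placeholders

Why old tool results can be cleared:

✅ Recency: the AI's attention is on the latest operations
✅ Re-runnable: a tool can be run again when needed
✅ Already used: the information in old results is reflected in later operations
✅ L1 already dropped the middle: the oldest results are gone, so L2 deals with the "not quite so old" ones

L2's anti-hallucination design

Problem scenario: a bare placeholder such as [Old tool result content cleared] can lead the AI to hallucinate.

// Turn 10: read the config
AI: read_file("config.yaml")
Result: "database: postgres\nport: 5432\ntimeout: 30"

// Turn 50: after L2 cleanup
Result: "[Old tool result content cleared]"

// Turn 52: the AI needs to recall it
AI: "According to the config.yaml read earlier, the database is MySQL..."  ← hallucination!

Production approach: a structured metadata placeholder that keeps the key signature information.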

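How L2 differs from L3

Aspect	L3: tool_result_budget	L2: micro_compact
Target	A single tool result that is too large	Old tool results piling up
Scope	The last user message	All retained messages
Method	Persistence + preview	Placeholder (short marker)
What is kept	Everything (except oversized results)	The most recent 5 (default configuration)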

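Trigger strategy

According to the Claude Code source, L2 has two trigger modes:

1. Time-based

Trigger: more than 60 minutes have passed since the last assistant message
Reason: the server-side prompt cache has expired (default TTL = 5 minutes)
Effect: old tool results are cleared, which shrinks the context that has to be rewritten
Kept: 5 tool results by default

💡 Why 60 minutes? Anthropic's prompt caching has a default TTL (time to live) of 5 minutes. Once 5 minutes pass without a new request, the cache entry expires, and the next request has to process the whole context again at the regular input price instead of reading it from cache. Claude Code uses 60 minutes as its threshold because:

The cache has expired: 60 minutes is far beyond the 5-minute TTL (and also past the optional 1-hour TTL), so the cache is certainly gone

Context value has decayed: tool results from an hour ago are far less timely

Cost optimization: a smaller context after cleanup is cheaper to re-cache

In your own project you can tune this threshold to how often turns arrive:

High frequency (< 5 minutes per turn): L2 is unnecessary, the cache stays warm

Medium frequency (10-30 minutes per turn): a 30-minute threshold is enough

Low frequency (> 1 hour per turn): 60 minutes or longer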
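
The time-based trigger itself reduces to one comparison: how long ago was the last assistant message? A minimal sketch, assuming each message carries an epoch-millisecond `timestamp` field (the `TimedMessage` shape and the function name are illustrative, not taken from the source):

```typescript
// Illustrative sketch of the time-based trigger check.
interface TimedMessage {
  type: 'user' | 'assistant'
  timestamp: number  // epoch milliseconds (assumed field)
}

function shouldTriggerTimeBased(
  messages: TimedMessage[],
  nowMs: number,
  gapThresholdMinutes = 60,
): boolean {
  // Find the timestamp of the most recent assistant message.
  let lastAssistantTs: number | undefined
  for (const m of messages) {
    if (m.type === 'assistant') lastAssistantTs = m.timestamp
  }
  if (lastAssistantTs === undefined) return false  // nothing cached yet

  const gapMinutes = (nowMs - lastAssistantTs) / 60_000
  return gapMinutes > gapThresholdMinutes
}
```

Passing `nowMs` in explicitly rather than calling `Date.now()` inside keeps the check easy to test and lets each deployment pick the 30-minute or 60-minute threshold discussed above.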

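2. Cached microcompact (cache editing mode)

Manages the cache dynamically through the cache editing feature
Enabled only on the main thread
Smarter cache management

Implementation logic

typescript

// Tool types whose results may be compacted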
const COMPACTABLE_TOOLS = new Set<string>([
  'Read',
  'Bash',
  'Shell',
  'Grep',
  'Glob',
  'WebFetch',
  'WebSearch',
  'Write',
  'Edit',
])

// Default configuration
const config = {
  keepRecent: 5,               // keep the most recent 5
  gapThresholdMinutes: 60,     // trigger after 60 minutes
}

// Main function (enhanced version); the check against gapThresholdMinutes is not shown here
export async function timeBasedMicrocompact(
  messages: Message[],
  config: TimeBasedMCConfig,
) {
  // 1. Collect all compactable tool_use calls (in chronological order)
  const compactableToolUseIds: string[] = []
  const toolNameMap = new Map<string, string>()  // tool_use_id -> tool_name
  
  for (const message of messages) {
    if (message.type === 'assistant') {
      for (const block of message.message.content) {
        if (block.type === 'tool_use' && COMPACTABLE_TOOLS.has(block.name)) {
          compactableToolUseIds.push(block.id)
          toolNameMap.set(block.id, block.name)  // remember the tool name
        }
      }
    }
  }
  
  // 2. Keep the most recent N; mark the rest as clearable
  const keepSet = new Set(compactableToolUseIds.slice(-config.keepRecent))
  const clearSet = new Set(
    compactableToolUseIds.filter(id => !keepSet.has(id))
  )
  
  // 3. Walk the messages and replace old tool_result content with structured placeholders
  const newMessages = messages.map(message => {
    if (message.type !== 'user') return message
    
    const newContent = message.message.content.map(block => {
      if (block.type !== 'tool_result') return block
      if (!clearSet.has(block.tool_use_id)) return block
      
      // Build the structured placeholder (metadata is preserved)
      const toolName = toolNameMap.get(block.tool_use_id) || 'Unknown'
      const metadata = extractMetadata(block.content, toolName)
      
      return {
        ...block,
        content: `[Compacted ${toolName} result${metadata ? `: ${metadata}` : ''}. Re-run if needed.]`
      }
    })
    
    return {
      ...message,
      message: { ...message.message, content: newContent }
    }
  })
  
  return { messages: newMessages }
}

// Ultra-light metadata extraction (no LLM, plain rules)
function extractMetadata(content: string, toolName: string): string {
  const MAX_LENGTH = 80  // results shorter than this are kept verbatim
  
  switch (toolName) {
    case 'Read': {
      const lines = content.split('\n').length
      const sizeKB = (content.length / 1024).toFixed(1)
      return `${lines} lines, ${sizeKB}KB`
    }
    
    case 'Bash':
    case 'Shell':
      // Look for key outcome markers. Failures and errors are checked first so that
      // output such as "3 passed, 1 failed" is not reported as a pass.
      // These are keyword heuristics and can still misfire (e.g. on "0 failed").
      if (content.includes('FAILED') || content.includes('failed')) return 'tests failed'
      if (content.includes('ERROR') || content.includes('Error')) return 'had errors'
      if (content.includes('PASSED') || content.includes('passed')) return 'tests passed'
      if (content.length < MAX_LENGTH) return content  // keep short results as-is
      return `${content.split('\n').length} lines output`
    
    case 'Write':
    case 'Edit':
      return 'file written'
    
    case 'Grep':
    case 'Glob': {
      const matchCount = content.split('\n').filter(l => l.trim()).length
      return `${matchCount} matches`
    }
    
    case 'WebFetch':
    case 'WebSearch':
      return `${(content.length / 1024).toFixed(1)}KB fetched`
    
    default:
      return content.length < MAX_LENGTH ? content : `${content.length} chars`
  }
}

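Visualizing L2 content replacement

flowchart TB
  Before["📦 Before compaction: 20 user messages. msg 1-15: old tool results (174KB). msg 16-20: recent tool results (10KB)"]
  Process["⚙️ L2 policy: time trigger after 60 minutes, keep the most recent 5, replace the rest with placeholders, message count unchanged"]
  After["✨ After compaction: the same 20 messages. msg 1-15: placeholders (~1KB). msg 16-20: full content (10KB)"]
  Detail1["📝 Placeholder examples: [Compacted Read result: 127 lines, 10.0KB. Re-run if needed.] [Compacted Bash result: tests passed. Re-run if needed.]"]
  Detail2["✅ Kept examples: Write auth.py: 50B in full. Bash pytest: 2KB in full"]
  Before --> Process
  Process --> After
  After --> Detail1
  After --> Detail2
  style Before fill:#ffebee,stroke:#e57373,stroke-width:3px
  style Process fill:#e1f5fe,stroke:#4fc3f7,stroke-width:3px
  style After fill:#e8f5e9,stroke:#81c784,stroke-width:3px
  style Detail1 fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
  style Detail2 fill:#c8e6c9,stroke:#66bb6a,stroke-width:2px

📖 Reading the diagram:

Age	Messages	Before	After	Saved
Over 60 minutes old	1-15	174KB	~1KB	99.4%
Recent	16-20	10KB	10KB	0%
Total	20 messages	184KB	~11KB	~94%

🔑 Core mechanisms:

In-place edit: only tool_result.content changes; no message is deleted
Message count unchanged: 20 before and 20 after (L1 deletes messages, L2 does not)
Structured placeholder: [Compacted Read result: 127 lines, 10.0KB. Re-run if needed.] keeps the metadata
Anti-hallucination design: the AI still sees the tool type, the size and a status signature
Role alternation is untouched: the message structure does not change at all

🎯 The essential difference between L1 and L2:

L1: removes messages (80 → 50)
L2: replaces content (20 → 20 messages, but 174KB → ~1KB)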

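Comparing placeholder schemes

Scheme	Placeholder example	Token cost	Anti-hallucination	Use case
Bare clear	`[Old tool result content cleared]`	~6	❌ No contextual clue	❌ Not recommended
Structured metadata (recommended)	`[Compacted Read result: 127 lines, 15.3KB. Re-run if needed.]`	~12	✅ Keeps the key signature	✅ Production
Keep in full	The original 10KB content	~2500	✅ No information loss	Only the most recent N

A side-by-side example:

// ❌ Bare placeholder (prone to hallucination)
tool_result: "[Old tool result content cleared]"
→ The AI may guess: "the database config is probably MySQL..."

// ✅ Structured placeholder (gives an anchor)
tool_result: "[Compacted Read result: 23 lines, 1.2KB. Re-run if needed.]"
→ The AI knows: "this is a small config file and I can re-read it if needed"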
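Compaction result

20 tool results (200KB in total)
→ keep the most recent 5 (25KB, full content)
→ replace the other 15 with structured placeholders (~1KB in total)

Saves about 174KB ≈ 43.5K tokens

💡 Key insights:

L2 edits in place and deletes no messages, so it cannot break role alternation

The structured metadata placeholder extracts key information with fixed, LLM-free rules, which gives the AI an anchor and reduces hallucination

L2 complements L1: L1 reduces the number of messages, L2 reduces the size of individual messages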

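📊 The three layers compared

Layer	Problem solved	How Claude Code implements it	Complexity	Compaction	Hidden cost
L3	A single tool result is too large	Persistence + preview (relocated storage)	Medium	500KB → 2KB (99.6%)	Disk I/O + storage
L1	Too many messages	boundary message + dynamic filtering + circuit breaker	High	100 → 50 messages (50%)	UUID tracking
L2	Old tool results take up space	Structured placeholders + time trigger + anti-hallucination	Low	174KB → ~1KB (99.4%)	Metadata extraction

Key insights:

The core of L3 is relocated storage: the AI does not need to see everything right away, but it needs to know where the content is
The hard part of L1 is keeping roles alternating while keeping information recoverable; the production answer is the dual-view architecture plus a circuit breaker
L2 guards against hallucination with structured placeholders and edits in place, so the message structure never changes
The three work together: L3 handles individual large outputs, L1 removes messages in bulk, L2 does the fine-grained cleanup
None of them makes an API call: the savings come from sending fewer tokens, not from LLM-based semantic compaction (L4)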

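🎯 Best practices and pitfalls

Suggested implementation order

flowchart LR
  Start([Start implementing]) --> L3["Implement L3 first: best return on effort"]
  L3 --> Test1{"Solved 80% of the problem?"}
  Test1 -->|Yes| Monitor["Just keep monitoring"]
  Test1 -->|No| L2["Then implement L2: simple and effective"]
  L2 --> Test2{"Still a problem?"}
  Test2 -->|No| Done["✅ Done"]
  Test2 -->|Yes| L1["Implement L1 last: complex but necessary"]
  L1 --> Done
  style L3 fill:#e1f5ff
  style L2 fill:#d4edda
  style L1 fill:#fff3cd

Recommended order: L3 → L2 → L1

Why:

L3 gives the biggest payoff for moderate effort: one large file can take up 50% of the context
L2 is simple to implement: it edits in place and needs no elaborate architecture
L1 is the most complex: it needs the dual-view architecture, so it comes last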

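Common mistakes and fixes

❌ Mistake 1: compacting too early

typescript

// Wrong: start deleting as soon as there are more than 10 messages
if (messages.length > 10) {
  compact(messages)
}

Problem: compacting this often wastes compute, and deleting too early can throw away useful context.

The right way:

typescript

// Right: use a sensible threshold (e.g. 50 messages)
if (messages.length > 50) {
  compact(messages)
}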

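❌ Mistake 2: skipping the role alternation check

typescript

// Wrong: splice head and tail together directly; this can produce user → user
const result = [...messages.slice(0, 3), ...messages.slice(-47)]

Problem: chat APIs expect user and assistant turns to alternate. Depending on the provider, a violation is either rejected with an error or the consecutive turns are silently merged, and neither is what you intended.

The right way:

typescript

// Right: check and repair after splicing
const result = ensureAlternating([
  ...messages.slice(0, 3), 
  ...messages.slice(-47)
])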
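
`ensureAlternating` is not defined in the snippet above. One possible implementation merges consecutive messages of the same role, which loses no content. This is a sketch under the assumption that `content` is a plain string; real messages with block arrays would need the blocks concatenated instead, and dropping one of the duplicates is an equally valid repair strategy.

```typescript
// Illustrative sketch: repair role alternation by merging same-role neighbours.
interface ChatMessage {
  role: 'user' | 'assistant'
  content: string
}

function ensureAlternating(messages: ChatMessage[]): ChatMessage[] {
  const result: ChatMessage[] = []
  for (const msg of messages) {
    const prev = result[result.length - 1]
    if (prev && prev.role === msg.role) {
      // Same role twice in a row: merge rather than drop, so nothing is lost.
      result[result.length - 1] = {
        role: prev.role,
        content: `${prev.content}\n\n${msg.content}`,
      }
    } else {
      result.push({ ...msg })
    }
  }
  return result
}
```

The function returns a new array and leaves its input untouched, so the REPL-side history is not affected by the repair.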

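❌ Mistake 3: leaving no contextual clue after compaction

Scenario 1: L3 persistence

typescript

// ❌ Wrong: delete the content outright
toolResult.content = '[Removed]'

Problem: the AI has no idea what the original content was, so it cannot judge whether to fetch it again.

The right way:

typescript

// ✅ L3: keep a preview + the path
toolResult.content = `<persisted-output>
Full output: ${path}
Preview: ${content.slice(0, 2000)}
...
</persisted-output>`

Scenario 2: L2 placeholder

typescript

// ❌ Wrong: a placeholder that carries no information
toolResult.content = '[Old tool result content cleared]'

Problem: the AI cannot tell what kind of content was there or how important it was, which invites hallucination.

The right way:

typescript

// ✅ L2: structured metadata placeholder
const metadata = extractMetadata(content, toolName)
toolResult.content = `[Compacted ${toolName} result: ${metadata}. Re-run if needed.]`

// Example output
'[Compacted Read result: 127 lines, 15.3KB. Re-run if needed.]'
'[Compacted Bash result: tests passed. Re-run if needed.]'

💡 Principle: compacting is not the same as deleting; always leave the AI a navigation clue.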

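❌ Mistake 4: hard-coded thresholds

typescript

// Wrong: one threshold for every scenario
const THRESHOLD = 50

Problem: different scenarios call for different policies (short conversations vs. long tasks).

The right way:

typescript

// Right: a configurable policy
interface CompactConfig {
  messageThreshold: number    // L1 trigger
  toolResultLimit: number     // L3 trigger
  keepRecentTools: number     // how many results L2 keeps
  timeGapMinutes: number      // L2 time threshold
}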

const shortTaskConfig: CompactConfig = {
  messageThreshold: 30,
  toolResultLimit: 100_000,
  keepRecentTools: 3,
  timeGapMinutes: 30,
}

const longTaskConfig: CompactConfig = {
  messageThreshold: 100,
  toolResultLimit: 200_000,
  keepRecentTools: 10,
  timeGapMinutes: 60,
}

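Performance tuning suggestions

Scenario	Strategy	Effect
Large files read frequently	L3 persistence + cached paths	About 95% less context used
Long conversations (100+ turns)	L1 + L2 combined	Stays within 50 messages
Low-frequency conversations (> 1 hour)	Aggressive L2 cleanup (keep 3)	Minimizes the cost of rebuilding the cache
High-frequency conversations (< 5 minutes)	Disable L2 (the cache stays warm)	Avoids unnecessary cleanup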

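Metrics to monitor

In production, it is worth tracking the following metrics:

typescript

interface CompactionMetrics {
  // Before/after comparison
  beforeSize: number         // tokens before compaction
  afterSize: number          // tokens after compaction
  compressionRatio: number   // fraction of tokens saved
  
  // Execution statistics
  l3Triggers: number         // number of L3 triggers
  l1Triggers: number         // number of L1 triggers
  l2Triggers: number         // number of L2 triggers
  
  // Performance
  executionTimeMs: number    // execution time
  persistedFiles: number     // files persisted by L3
  
  // Business metrics
  conversationLength: number // conversation turns
  avgTokenPerMessage: number // average tokens per message
}

// Example: log the effect of a compaction pass
function logCompaction(before: Message[], after: Message[]) {
  const beforeSize = countTokens(before)  // countTokens: your tokenizer or estimator of choice
  const afterSize = countTokens(after)
  const metrics: Partial<CompactionMetrics> = {
    beforeSize,
    afterSize,
    compressionRatio: beforeSize > 0 ? 1 - afterSize / beforeSize : 0,
    // ... other metrics
  }
  
  console.log(`Compaction: ${((metrics.compressionRatio ?? 0) * 100).toFixed(1)}%`)
  console.log(`Saved: ${beforeSize - afterSize} tokens`)
}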
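
The snippet above relies on a `countTokens` helper that the article does not define. When an exact tokenizer is not available, a character-based estimate is often good enough for monitoring. The sketch below uses the same rough "4 characters per token" ratio as the rest of this article; the `SizedMessage` shape is an assumption, and the numbers it produces are estimates, not billing-accurate counts.

```typescript
// Illustrative estimator: serialized length divided by 4. Not a real tokenizer.
interface SizedMessage {
  content: unknown
}

function countTokens(messages: SizedMessage[]): number {
  const chars = messages.reduce(
    (sum, m) => sum + JSON.stringify(m.content ?? '').length,
    0,
  )
  return Math.ceil(chars / 4)
}

function compressionRatio(before: SizedMessage[], after: SizedMessage[]): number {
  const beforeTokens = countTokens(before)
  if (beforeTokens === 0) return 0  // avoid dividing by zero on an empty history
  return 1 - countTokens(after) / beforeTokens
}
```

For dashboards that compare runs over time, a consistent estimator matters more than an exact one; swap in a real tokenizer when the figures feed into cost reports.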

💼 Case study: combining the three layers

Scenario: a long-running code refactoring task

Background:

The user asks for a refactor of an authentication system spanning 20 files
The conversation lasts 2 hours and reaches 120 messages
More than 50 files are read, 5MB of code in total

Compaction strategy:

typescript

class AdaptiveCompaction {
  async compact(messages: Message[]): Promise<Message[]> {
    let result = messages
    
    // Step 1: L3 handles large outputs (triggered in real time)
    const largeResults = this.findLargeToolResults(result)
    if (largeResults.length > 0) {
      result = await this.applyL3(result, largeResults)
      console.log(`L3: persisted ${largeResults.length} large outputs`)
    }
    
    // Step 2: L1 drops the middle messages (triggered above 50 messages)
    if (result.length > 50) {
      const beforeCount = result.length
      result = this.applyL1(result)
      console.log(`L1: ${beforeCount} → ${result.length} messages`)
    }
    
    // Step 3: L2 clears old tool results (triggered after 60 minutes)
    if (this.shouldTriggerL2(result)) {
      const beforeSize = this.countTokens(result)
      result = this.applyL2(result)
      const afterSize = this.countTokens(result)
      console.log(`L2: ${beforeSize} → ${afterSize} tokens`)
    }
    
    return result
  }
  
  private findLargeToolResults(messages: Message[]): ToolResult[] {
    // Find tool results larger than 200KB
    return messages
      .flatMap(m => m.content)
      .filter(c => c.type === 'tool_result' && c.content.length > 200_000)
  }
}

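Compaction results:

Stage	Action	Result
Original	120 messages, 5MB of content	~1.25M tokens
After L3	8 large files persisted	~300K tokens (↓ 76%)
After L1	70 middle messages dropped	~150K tokens (↓ 50%)
After L2	40 old tool results cleared	~80K tokens (↓ 47%)
Final	50 messages, previews only	~80K tokens (↓ 94%)

Cost comparison:

No compaction: 1.25M tokens × $15/M = $18.75
After compaction: 80K tokens × $15/M = $1.20

Saved: $17.55 (93.6%)
API cost of compaction itself: $0 (everything runs locally)

Estimated hidden costs:

Disk usage: 8 files × ~500KB = ~4MB (cleaned up after the session)
I/O time: 8 writes × 1ms = 8ms (negligible)
CPU overhead: UUID tracking + metadata extraction < 10ms

Total hidden cost < $0.001 (against the $17.55 saved, an ROI above 17,550×)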

📋 Quick reference card

Three-layer cheat sheet

Your problem	Strategy	Key parameters	Expected effect
Context blows up after reading a log file	L3	limit=200KB, preview=2KB	99.6% compaction
Conversation passes 50 turns and history piles up	L1	keepFirst=3, keepRecent=47	About 50% fewer messages
Resuming after a 1-hour pause	L2	gapMinutes=60, keepRecent=5	99.4% compaction of old results
Long task (2+ hours, 100+ turns)	L3+L1+L2	Combined	90%+ overall compaction

Code template

typescript

// A complete three-layer compaction manager
// (SimpleToolResultStorage, SimpleSnipCompact and SimpleMicroCompact are your own implementations of L3, L1 and L2)
class ContextCompactionManager {
  private l3 = new SimpleToolResultStorage()
  private l1 = new SimpleSnipCompact()
  private l2 = new SimpleMicroCompact()
  
  async compact(messages: Message[]): Promise<Message[]> {
    // Run in priority order
    let result = await this.l3.enforceLimit(messages)  // large outputs first
    result = this.l1.compact(result)                   // then drop the middle messages
    result = this.l2.compact(result)                   // finally clear old results
    
    return result
  }
  
  getMetrics(before: Message[], after: Message[]) {
    return {
      messageCount: `${before.length} → ${after.length}`,
      tokenCount: `${countTokens(before)} → ${countTokens(after)}`,
      compressionRatio: `${((1 - countTokens(after) / countTokens(before)) * 100).toFixed(1)}%`
    }
  }
}

// Usage
const manager = new ContextCompactionManager()
const compacted = await manager.compact(messages)

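🚀 Coming up in Part 3

L3/L1/L2 cover about 80% of cases, but what happens when the free techniques run out? Part 3 covers:

✅ How does L4 use an LLM to reach a 99% compaction ratio?
✅ How is the reactive_compact emergency fallback designed?
✅ How can you apply these strategies in your own agent? (working code)
✅ Decision trees and recommendations for different scenarios

👉 Claude Code Context Management (Part 3): Semantic Compaction and Production Practice

💬 Discussion

What problems have you run into when implementing compaction strategies?

Feel free to share in the comments:

Which layer do you find hardest to implement?
What does the boundary message architecture suggest for your own designs?
How would you improve the time-based trigger?

If this article helped you, a like or a bookmark helps more people find it 🙏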
