Claude Code Context Management: A Deep Dive into Its Multi-Layer Progressive Compaction Architecture


Introduction: A Multi-Layer Defense, from "Hauling Trash" to "Writing Summaries"

When an AI assistant carries on a long conversation, the context window gradually fills up. Claude Code's answer is not a crude "delete when full" policy but a four-layer progressive compaction architecture, much like a city's waste-management system:

  • Daily upkeep: sweep the streets every day (MicroCompact)
  • Scheduled recycling: collect reusable items every week (Context Collapse)
  • Smart organization: reorganize the warehouse by usage frequency (Session Memory Compact)
  • Deep cleaning: a periodic full cleanout and archive (Full Compact)

A Comparison of the Four Compaction Strategies

Claude Code actually ships four compaction mechanisms, triggered in layers depending on scenario and urgency:

| Mechanism | Trigger | KV cache impact | Effect | Source file |
| --- | --- | --- | --- | --- |
| MicroCompact | Tool results reach a threshold | Preserved (cache_edits) | Deletes old tool results | microCompact.ts |
| Context Collapse | Message count reaches a threshold | Invalidated (rebuild required) | Collapses conversation segments | contextCollapse.ts |
| Session Memory Compact | After session memory is auto-extracted | Invalidated | Compacts against the memory file | sessionMemoryCompact.ts |
| Full Compact | Last-resort fallback | Invalidated | Full LLM-generated summary | compact.ts |

1. MicroCompact: Lossless Deletion of Tool Results

The core problem

During long agent runs, tool_result blocks consume a large share of the context. How can old results be deleted without invalidating the KV cache?

The approach

Use the Anthropic API's cache_edits feature to implement a "soft delete" via the attention mask:

typescript
// src/services/compact/microCompact.ts

// Example: delete tool result tool_1234
const cacheEdits = {
  type: 'cache_edits',
  cache_reference: { index: 0 },  // reference a previously created cache
  edits: [
    {
      operation: 'delete',
      tool_use_id: 'tool_1234'  // mark for deletion
    }
  ]
}

// Attached when calling the API
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4',
  messages: conversation,
  system: [systemPrompt, cacheEdits]  // injected via the system field
})

Key mechanics

  • Transformer attention mask: during attention computation, the tokens of deleted tool results have their weights masked to 0
  • KV cache retention: already-cached key/value matrices are not rebuilt, merely "ignored"
  • Cost comparison:
    • Traditional deletion: the entire prefix must be rebuilt, cost = 100% of input tokens
    • MicroCompact: only a cache read is needed, cost = ~10% of input tokens
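
The cost gap can be made concrete with a little arithmetic. The prices below are illustrative assumptions, not real Anthropic pricing; the only point is the roughly 10x ratio between uncached input tokens and cache reads:

```typescript
// A back-of-the-envelope cost model for the two deletion strategies.
// Prices are illustrative assumptions, not actual Anthropic pricing.
const INPUT_PRICE_PER_MTOK = 3.0        // assumed $/1M uncached input tokens
const CACHE_READ_PRICE_PER_MTOK = 0.3   // assumed ~10% of the uncached price

// Cost of re-sending an N-token prefix after a traditional (content-mutating) delete
function traditionalDeleteCost(prefixTokens: number): number {
  return (prefixTokens / 1_000_000) * INPUT_PRICE_PER_MTOK
}

// Cost when the prefix stays cache-valid and is only re-read
function microCompactCost(prefixTokens: number): number {
  return (prefixTokens / 1_000_000) * CACHE_READ_PRICE_PER_MTOK
}

const prefix = 150_000  // a 150k-token prefix
console.log(traditionalDeleteCost(prefix))  // 0.45
console.log(microCompactCost(prefix))       // 0.045
```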

The two trigger modes

1. Count-based triggering - cached MicroCompact
typescript
// Keep a running count of tool results
state = {
  toolOrder: ['tool_1', 'tool_2', ..., 'tool_20'],  // queue of tool IDs
  deletedRefs: new Set(['tool_1', 'tool_2'])        // IDs already deleted
}

// Fires when toolOrder.length - deletedRefs.size >= 20
const toDelete = toolOrder.slice(0, toolOrder.length - 10)  // delete the oldest 10

Trigger thresholds (GrowthBook configuration):

typescript
{
  triggerThreshold: 20,  // fire once 20 tool results have accumulated
  keepRecent: 10         // keep the 10 most recent
}
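
As a sketch, the count-based selection might look like the following. The function name and the defaults are hypothetical, mirroring the GrowthBook values quoted above rather than the actual source:

```typescript
// Hypothetical sketch of the count-based selection: once the number of live
// (not yet deleted) tool results reaches `triggerThreshold`, mark every live
// tool except the most recent `keepRecent` for deletion.
function selectToolIdsToDelete(
  toolOrder: string[],          // tool IDs in call order
  deletedRefs: Set<string>,     // IDs already soft-deleted
  triggerThreshold = 20,
  keepRecent = 10
): string[] {
  const live = toolOrder.filter(id => !deletedRefs.has(id))
  if (live.length < triggerThreshold) return []   // below threshold: do nothing
  return live.slice(0, live.length - keepRecent)  // delete all but the newest N
}

const order = Array.from({ length: 22 }, (_, i) => `tool_${i + 1}`)
const toDelete = selectToolIdsToDelete(order, new Set(['tool_1', 'tool_2']))
console.log(toDelete.length)  // 10 live tools beyond the newest 10
console.log(toDelete[0])      // "tool_3"
```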
2. Time-based triggering
typescript
// Check the timestamp of the last assistant message
const lastAssistant = messages.findLast(m => m.type === 'assistant')
const gapMinutes = (Date.now() - new Date(lastAssistant.timestamp).getTime()) / 60_000

// A gap > 5 minutes (the default) means the prompt cache has expired
if (gapMinutes > 5) {
  // Clear each stale tool result's content directly (no cache_edits)
  block.content = '[Old tool result content cleared]'
}

Why doesn't the time-based path use cache_edits?

  • After a 5-minute gap, the Anthropic API's prompt cache has expired (TTL = 5 min)
  • The KV cache is already invalid at that point, so cache_edits would accomplish nothing
  • Simply rewriting the content is easier (equivalent to rebuilding the prompt)
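
Putting the two modes together, a hypothetical dispatcher could look like this. The names and the 5-minute TTL default mirror the description above, not the actual source:

```typescript
// Hypothetical dispatcher over the two trigger modes described above: if the
// prompt cache is still warm (last assistant turn < 5 minutes ago), prefer the
// cache-preserving cache_edits path; otherwise the cache has expired anyway,
// so plain content clearing is used.
type MicroCompactMode = 'cache_edits' | 'clear_content' | 'none'

function chooseMicroCompactMode(
  lastAssistantTimestampMs: number,
  liveToolCount: number,
  nowMs: number = Date.now(),
  cacheTtlMinutes = 5,
  triggerThreshold = 20
): MicroCompactMode {
  if (liveToolCount < triggerThreshold) return 'none'
  const gapMinutes = (nowMs - lastAssistantTimestampMs) / 60_000
  return gapMinutes > cacheTtlMinutes ? 'clear_content' : 'cache_edits'
}

const now = Date.now()
console.log(chooseMicroCompactMode(now - 60_000, 25, now))       // "cache_edits"
console.log(chooseMicroCompactMode(now - 10 * 60_000, 25, now))  // "clear_content"
console.log(chooseMicroCompactMode(now - 60_000, 5, now))        // "none"
```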

Comparison with other API vendors

  • Anthropic: offers the cache_edits API (introduced in 2024)
  • OpenAI: no equivalent feature (messages must be resent in full)
  • Google Gemini: no equivalent feature
  • ✅ This is an Anthropic-specific optimization; developers on other stacks can point to this pattern when asking their vendors for support

Where the thresholds come from

typescript
// Tuned dynamically through the GrowthBook A/B testing platform
const config = getCachedMCConfig()  // fetched remotely
config.triggerThreshold // could be 15/20/25, decided by the experiment

2. Context Collapse: Folding Conversation Segments (Experimental)

The core idea

Fold historical conversation segments into placeholders and "expand" them when needed, much like code folding in an editor.

Data structures

typescript
// src/services/ContextCollapse/contextCollapse.ts

type CommittedCollapse = {
  collapseId: string            // unique ID, e.g. "0000000012345678"
  summary: string               // summary, e.g. "Refactored the login module"
  archived: Message[]           // the original (collapsed) messages
  firstArchivedUuid: string     // boundary: uuid of the first message
  lastArchivedUuid: string      // boundary: uuid of the last message
}

// Global state
const state = {
  committed: [] as CommittedCollapse[],  // committed collapses
  staged: [] as StagedCollapse[]         // collapses awaiting commit
}

The three-phase workflow

Phase 1: Staging - a background agent generates the summary
typescript
// Triggered when the message count exceeds the threshold
if (messages.length > threshold) {
  // 1. Fork a sub-agent
  const summary = await runForkedAgent({
    prompt: 'Summarize this conversation segment in 1-2 sentences',
    messages: messages.slice(0, 100)  // take the first 100 messages
  })

  // 2. Stage it (does not take effect immediately)
  staged.push({
    startUuid: messages[0].uuid,
    endUuid: messages[99].uuid,
    summary,
    risk: calculateRisk(messages)  // assess the risk
  })
}
Phase 2: Commit - takes effect when the next query is sent
typescript
// query.ts: right before building the API request
const decoratedMessages = decorateMessagesWithCollapses(messages, state)

function decorateMessagesWithCollapses(msgs: Message[], state: State) {
  // Promote staged collapses to committed
  for (const s of state.staged) {
    state.committed.push({
      collapseId: generateId(),
      summary: s.summary,
      archived: msgs.filter(m => isBetween(m.uuid, s.startUuid, s.endUuid)),
      ...
    })
  }
  state.staged = []

  // Apply the collapses: filter messages through projectView
  return projectView(msgs, state)
}
Phase 3: ProjectView - filtering the view; messages are "hidden", never deleted
typescript
function projectView(messages: Message[], state: State): Message[] {
  const result = []
  for (const msg of messages) {
    // Is this message inside a committed collapse range?
    const collapse = state.committed.find(c =>
      isBetween(msg.uuid, c.firstArchivedUuid, c.lastArchivedUuid)
    )

    if (collapse) {
      // Collapsed: emit the placeholder once, at the first message of the
      // range, then skip the rest. Nothing is deleted - every original
      // message stays in the replMessages array.
      if (msg.uuid === collapse.firstArchivedUuid) {
        result.push(createPlaceholder(collapse))
      }
      continue
    }

    result.push(msg)
  }

  return result
}

function createPlaceholder(collapse: CommittedCollapse): Message {
  return {
    type: 'user',
    content: `<collapsed id="${collapse.collapseId}">${collapse.summary}</collapsed>`,
    uuid: collapse.summaryUuid  // the placeholder carries its own uuid
  }
}

Key properties

  • 📦 Messages never move: the originals stay in the replMessages array
  • 🔍 projectView is just a filter: the messages to send are computed dynamically on every API call
  • 💾 Persistence: collapse state is written to the transcript (via ContextCollapseCommitEntry)
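
The properties above can be demonstrated with a minimal, self-contained sketch of the projectView idea, using simplified types (the real Message and state shapes are richer): the original array is never mutated, and each collapsed range shows up as a single placeholder.

```typescript
// Simplified sketch of projectView: original messages are never mutated;
// collapsed ranges are filtered out at read time and replaced by one
// placeholder message per collapse.
type Msg = { uuid: string; content: string }
type Collapse = { collapseId: string; summary: string; first: string; last: string }

function projectView(messages: Msg[], collapses: Collapse[]): Msg[] {
  const result: Msg[] = []
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i]
    const c = collapses.find(col => {
      const lo = messages.findIndex(m => m.uuid === col.first)
      const hi = messages.findIndex(m => m.uuid === col.last)
      return i >= lo && i <= hi
    })
    if (!c) { result.push(msg); continue }
    // Emit the placeholder once, at the start of the collapsed range
    if (msg.uuid === c.first) {
      result.push({
        uuid: `ph_${c.collapseId}`,
        content: `<collapsed id="${c.collapseId}">${c.summary}</collapsed>`
      })
    }
  }
  return result
}

const msgs: Msg[] = ['a', 'b', 'c', 'd', 'e'].map(u => ({ uuid: u, content: `msg ${u}` }))
const view = projectView(msgs, [{ collapseId: '1', summary: 'did a task', first: 'b', last: 'd' }])
console.log(view.map(m => m.uuid))  // ["a", "ph_1", "e"]
console.log(msgs.length)            // 5 - the original array is untouched
```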

Comparison with MicroCompact

| Dimension | MicroCompact | Context Collapse |
| --- | --- | --- |
| Compaction target | Tool results (tool_result) | Conversation segments (multiple messages) |
| KV cache | Preserved (cache_edits) | Invalidated (content changed) |
| Recoverability | Not recoverable (deleted) | Recoverable (merely hidden) |
| Cost | ~10% of tokens | ~125% of tokens (rebuild) |
| Trigger | Tool-count threshold | Message-count threshold |

Why does Context Collapse break the KV cache?

The original messages are never deleted, but the content sent to the API changes:

typescript
// Before collapsing
messages = [msg1, msg2, msg3, msg4, msg5]

// After collapsing (projectView output)
filteredMessages = [
  msg1,
  placeholder("<collapsed>task completed</collapsed>"),  // new content!
  msg5
]

The Transformer model sees:

  • Original token sequence: [msg1_tokens, msg2_tokens, msg3_tokens, msg4_tokens, msg5_tokens]
  • Collapsed sequence: [msg1_tokens, placeholder_tokens, msg5_tokens]

Different sequence = invalid KV cache = attention must be recomputed

Cost comparison:

  • Without collapsing: 200k tokens, potentially over the limit, triggering the far more expensive Full Compact
  • With collapsing: ~125% of tokens (rebuild cost), but Full Compact is avoided
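
Prompt caching can only reuse the longest shared prefix of the input, which a tiny helper makes concrete. The message-level granularity here is a simplification for illustration; real caching operates on tokens:

```typescript
// Why the cache breaks: only the longest common prefix of the two sequences
// can be reused. This helper finds where the collapsed sequence diverges from
// the original; everything after that point must be recomputed.
function commonPrefixLength(a: string[], b: string[]): number {
  let i = 0
  while (i < a.length && i < b.length && a[i] === b[i]) i++
  return i
}

const original  = ['msg1', 'msg2', 'msg3', 'msg4', 'msg5']
const collapsed = ['msg1', '<collapsed>done</collapsed>', 'msg5']
console.log(commonPrefixLength(original, collapsed))  // 1: only msg1 is still cache-valid
```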

When are collapsed messages "restored"?

  • There is no real restoration: the original messages have been in replMessages all along, never deleted
  • "Restoring" simply means clearing the projectView filter rules
  • Scenario: when Full Compact fires, all collapse state is wiped and things start fresh

When to use it

  • Good fit: conversations with many turns but little complexity (e.g. repeated simple Q&A)
  • Poor fit: tool-call-heavy tasks (where MicroCompact wins)

3. Session Memory Compact: Compacting Against Session Memory

The core idea

Continuously extract "session memory" in the background (calling an LLM to build the summary); at compaction time, use the already-generated memory directly, with no further LLM call.

Key advantage: the LLM cost is amortized across the conversation, so compaction itself has zero latency.

Workflow

Step 1: extract session memory in the background (in parallel with the main conversation)
typescript
// src/services/SessionMemory/sessionMemory.ts

// Fires after every agent response (post-sampling hook)
registerPostSamplingHook(async ({ messages }) => {
  const currentTokens = tokenCountWithEstimation(messages)

  // Threshold check
  if (currentTokens > 50_000 && toolCallsSince > 10) {
    // Launch a forked agent to extract the memory
    await runForkedAgent({
      prompt: buildSessionMemoryUpdatePrompt(currentMemory),
      tools: [FileEditTool],  // may only edit the memory file
      querySource: 'session_memory'
    })
  }
})
Step 2: the memory lives in a Markdown file
markdown
# ~/.claude-sessions/<session-id>/session_memory.md

## Project overview
Refactoring a legacy login module, migrating JavaScript to TypeScript.

## Completed tasks
1. Analyzed the code structure of src/auth/login.js
2. Used GrepTool to find all 12 call sites of login()
3. Generated the new src/auth/login.ts with type annotations
4. Updated the test file tests/auth.test.ts
5. Ran the tests, found 2 failing cases, and fixed them

## Key decisions
- Use bcrypt for password hashing (rather than crypto)
- Use JWT rather than sessions for user state

## Current state
All tests pass; code awaiting user review.
Step 3: Session Memory Compact fires
typescript
// src/services/compact/sessionMemoryCompact.ts

async function trySessionMemoryCompaction(messages: Message[]) {
  // 1. Read the memory file
  const sessionMemory = await getSessionMemoryContent()
  if (!sessionMemory) return null

  // 2. Find the "already-summarized message boundary"
  const lastSummarizedId = getLastSummarizedMessageId()
  const lastSummarizedIndex = messages.findIndex(m => m.uuid === lastSummarizedId)

  // 3. Work out which messages to keep
  //    - start: lastSummarizedIndex + 1
  //    - end: messages.length
  //    - constraints: keep at least 10k tokens and 5 text messages
  const startIndex = calculateMessagesToKeepIndex(messages, lastSummarizedIndex)
  const messagesToKeep = messages.slice(startIndex)

  // 4. Assemble the compaction result
  return {
    boundaryMarker: createCompactBoundaryMessage('auto', preCompactTokens),
    summaryMessages: [
      createUserMessage({
        content: `# Session Summary\n\n${sessionMemory}`,
        isCompactSummary: true
      })
    ],
    messagesToKeep,
    postCompactTokenCount: estimateMessageTokens([...summaryMessages, ...messagesToKeep])
  }
}
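
The article does not show calculateMessagesToKeepIndex itself, so the following is a hedged sketch of what it could do under the constraints listed in step 3; the walk-backwards strategy and the simplified message shape are assumptions:

```typescript
// Hypothetical sketch: walk backwards from the end of the conversation until
// both constraints hold (at least `minTokens` tokens and `minTextMessages`
// text messages kept), never keeping anything at or before the
// already-summarized boundary.
type M = { uuid: string; isText: boolean; tokens: number }

function calculateMessagesToKeepIndex(
  messages: M[],
  lastSummarizedIndex: number,
  minTokens = 10_000,
  minTextMessages = 5
): number {
  let tokens = 0
  let textCount = 0
  for (let i = messages.length - 1; i > lastSummarizedIndex; i--) {
    tokens += messages[i].tokens
    if (messages[i].isText) textCount++
    if (tokens >= minTokens && textCount >= minTextMessages) return i
  }
  return lastSummarizedIndex + 1  // constraints can't be met: keep everything unsummarized
}

// 20 messages of 2k tokens each; every other one is a text message
const msgs: M[] = Array.from({ length: 20 }, (_, i) => ({
  uuid: `m${i}`, isText: i % 2 === 0, tokens: 2_000
}))
console.log(calculateMessagesToKeepIndex(msgs, 3))  // 10
```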

Key question 1: how many copies of the session memory exist? Do they merge?

There is exactly one copy

  • The memory file path is fixed: ~/.claude-sessions/<session-id>/session_memory.md
  • On each extraction, the agent updates this file incrementally (using FileEditTool)

The merge mechanism

typescript
// The prompt the background agent receives
const prompt = `
Current memory contents:
${currentMemory}

Latest conversation:
${newMessages}

Update the memory file with these requirements:
1. Merge new information into the existing sections
2. On conflict, overwrite with the latest information
3. Keep it structured (Markdown format)
`

Example: the merge in action

markdown
# Initial state (1st extraction)
## Completed tasks
1. Analyzed login.js

# After the 2nd extraction
## Completed tasks
1. Analyzed login.js
2. Generated login.ts  ← added

# After the 3rd extraction
## Completed tasks
1. Analyzed login.js
2. Generated login.ts and fixed the type errors  ← updated (merged)
3. Ran the tests  ← added

Key question 2: when does it trigger?

Trigger conditions (all must hold):

  1. The Session Memory feature flag is on (tengu_session_memory gate)
  2. The Session Memory Compact feature flag is on (tengu_sm_compact gate)
  3. The memory file exists and is non-empty (not just the template)
  4. Auto Compact is about to fire (context approaching the threshold)

Trigger timing

  • It is not a background scan: there is no standalone scheduled task
  • It is attempted first within the Auto Compact flow
typescript
// src/services/compact/autoCompact.ts

async function autoCompact(messages: Message[]) {
  // 1. Does anything need compacting?
  if (tokenCount < threshold * 0.8) return messages

  // 2. Try Session Memory Compact first
  const smResult = await trySessionMemoryCompaction(messages)
  if (smResult) {
    logEvent('tengu_sm_compact_success')
    return buildPostCompactMessages(smResult)
  }

  // 3. Fall back to Full Compact
  const fullResult = await fullCompact(messages)
  return buildPostCompactMessages(fullResult)
}

Execution flow

User sends a message
  ↓
tokenCount > threshold?
  ↓ yes
trySessionMemoryCompaction()
  ↓
Memory file exists?
  ↓ yes
Compute which messages to keep
  ↓
postCompactTokenCount < threshold?
  ↓ yes
✅ Use Session Memory Compact
  ↓ no
⚠️ Fall back to Full Compact

Why does Session Memory Compact take priority over Full Compact?

  • Lower cost: no LLM call at compaction time (the summary was generated in the background ahead of time)
  • Higher quality: the memory accumulates incrementally, so the information is more complete
  • Faster: reading a file beats waiting for an API response

Timeline comparison

Session Memory Compact

While the conversation runs (background)    When compaction fires (foreground)
  ↓                                           ↓
Call the LLM after each response  →  →  →   Read the file directly
Write the summary to the file               Use the existing summary
(cost amortized)                            (zero latency)

Full Compact

While the conversation runs (background)    When compaction fires (foreground)
  ↓                                           ↓
Nothing happens                             Call the LLM to build the summary
                                            (cost concentrated + wait for the response)

Configuration (tuned dynamically via GrowthBook)

typescript
{
  minTokens: 10_000,          // keep at least 10k tokens after compaction
  minTextBlockMessages: 5,    // keep at least 5 text messages after compaction
  maxTokens: 40_000           // keep at most 40k tokens after compaction
}

In practice

  • Suppose a 200k-token conversation with a ~3k-token memory file
  • The most recent ~15k tokens of messages are kept
  • Post-compaction total: 3k (memory) + 15k (messages) = 18k tokens
  • Compression ratio: 91% (200k → 18k)
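
The same arithmetic as a tiny helper, using the numbers from the example above; the ratio is expressed as the fraction of context eliminated:

```typescript
// Post-compaction size and compression ratio for a Session Memory Compact.
function compressionStats(preTokens: number, memoryTokens: number, keptTokens: number) {
  const postTokens = memoryTokens + keptTokens  // summary + retained messages
  return { postTokens, ratio: 1 - postTokens / preTokens }
}

const { postTokens, ratio } = compressionStats(200_000, 3_000, 15_000)
console.log(postTokens)               // 18000
console.log(Math.round(ratio * 100))  // 91
```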

4. Full Compact: the Full-Summary Fallback

Trigger conditions

When the other three mechanisms cannot resolve the pressure:

  • Session Memory Compact failed (memory file missing or too large)
  • Context Collapse cannot apply (messages too complex)
  • The context is still over the limit

How it works

typescript
// src/services/compact/compact.ts

async function fullCompact(messages: Message[]) {
  // 1. Choose the segment to compact
  const toCompact = messages.slice(0, -20)  // keep the 20 most recent

  // 2. Call the Claude API to generate the summary
  const summary = await anthropic.messages.create({
    model: 'claude-sonnet-4',
    system: 'You are a context compaction assistant...',
    messages: [
      { role: 'user', content: JSON.stringify(toCompact) }
    ]
  })

  // 3. Replace the original messages
  return [
    createCompactBoundaryMessage('auto', preCompactTokens),
    createUserMessage({ content: summary, isCompactSummary: true }),
    ...messages.slice(-20)
  ]
}

Characteristics

  • ⚠️ Highest latency: the LLM is only called at compaction time, so the user is blocked waiting on the API response
  • ⚠️ High one-shot cost: summarizing a large conversation in one go concentrates the token spend
  • Most reliable: it always resolves a context overflow

Cost comparison with the other mechanisms

  • MicroCompact: zero LLM cost (only deletes tool results)
  • Context Collapse: low LLM cost (background generation of a 1-2 sentence summary)
  • Session Memory: medium LLM cost (continuous background extraction, amortized across the conversation)
  • Full Compact: high LLM cost (a full summary at compaction time, concentrated cost + user waits)

The Decision Tree Across the Four Mechanisms

Context pressure detected
  ↓
More than 20 tool results?
  ↓ yes
✅ MicroCompact (cache_edits deletion)
  ↓ no
Message count > threshold and risk low?
  ↓ yes
✅ Context Collapse (folding)
  ↓ no
Context > 80% of threshold?
  ↓ yes
Session Memory available?
  ↓ yes
✅ Session Memory Compact
  ↓ no
✅ Full Compact (fallback)
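
The decision tree can be sketched as a single function. The thresholds and field names here are illustrative; in the real codebase the checks are spread across several modules:

```typescript
// Hypothetical one-function rendering of the decision tree above.
type Ctx = {
  liveToolResults: number
  messageCount: number
  lowCollapseRisk: boolean
  tokenCount: number
  tokenThreshold: number
  sessionMemoryAvailable: boolean
}

function pickCompactionStrategy(ctx: Ctx): string {
  if (ctx.liveToolResults > 20) return 'micro_compact'
  if (ctx.messageCount > 100 && ctx.lowCollapseRisk) return 'context_collapse'
  if (ctx.tokenCount > ctx.tokenThreshold * 0.8) {
    return ctx.sessionMemoryAvailable ? 'session_memory_compact' : 'full_compact'
  }
  return 'none'
}

const base: Ctx = {
  liveToolResults: 5, messageCount: 40, lowCollapseRisk: true,
  tokenCount: 170_000, tokenThreshold: 200_000, sessionMemoryAvailable: true
}
console.log(pickCompactionStrategy(base))                              // "session_memory_compact"
console.log(pickCompactionStrategy({ ...base, liveToolResults: 25 }))  // "micro_compact"
```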

Conclusion: the Philosophy of Layered Defense

Claude Code's compaction architecture embodies three core principles:

  1. Cost first: prefer the cheap options (MicroCompact's cache_edits) and fall back to the expensive one (Full Compact's API call)
  2. Progressive escalation: from local cleanup (tool results) to global compaction (the entire conversation), ramping up gradually
  3. Recoverability: Context Collapse keeps the original messages and Session Memory keeps structured notes, maximizing information retention

The architecture offers valuable lessons for other AI application developers:

  • Exploit vendor features: Anthropic's cache_edits is a unique advantage
  • Do work asynchronously in the background: Session Memory extraction never blocks the main flow
  • Configure dynamically: thresholds are tuned live through GrowthBook, A/B testing for the best strategy

References

Source files

  • src/services/compact/microCompact.ts - MicroCompact implementation (531 lines)
  • src/services/ContextCollapse/contextCollapse.ts - Context Collapse core logic
  • src/services/SessionMemory/sessionMemory.ts - Session Memory extraction (496 lines)
  • src/services/compact/sessionMemoryCompact.ts - Session Memory Compact (631 lines)
  • src/services/compact/compact.ts - Full Compact implementation
  • src/services/compact/autoCompact.ts - Auto-compaction trigger logic


Appendix: the Actual Prompts

A. The Context Collapse prompt

Context Collapse uses a very short forked-agent prompt:

typescript
// Summary generation during the staging phase
const summary = await runForkedAgent({
  prompt: 'Summarize this conversation segment in 1-2 sentences',
  messages: messages.slice(0, 100)
})

Characteristics

  • A minimal prompt asking for just a 1-2 sentence summary
  • Cheap, so suitable for frequent calls
  • Used to fold historical conversation segments quickly

B. The Session Memory Compact prompt

The full prompt lives in src/services/SessionMemory/prompts.ts (lines 43-80):

markdown
IMPORTANT: This message and these instructions are NOT part of the actual user conversation.
Do NOT include any references to "note-taking", "session notes extraction", or these update
instructions in the notes content.

Based on the user conversation above (EXCLUDING this note-taking instruction message as well
as system prompt, claude.md entries, or any past session summaries), update the session
notes file.

The file {{notesPath}} has already been read for you. Here are its current contents:
<current_notes_content>
{{currentNotes}}
</current_notes_content>

Your ONLY task is to use the Edit tool to update the notes file, then stop. You can make
multiple edits (update every section as needed) - make all Edit tool calls in parallel in
a single message. Do not call any other tools.

CRITICAL RULES FOR EDITING:
- The file must maintain its exact structure with all sections, headers, and italic
  descriptions intact
- NEVER modify, delete, or add section headers (the lines starting with '#' like
  # Task specification)
- NEVER modify or delete the italic _section description_ lines (these are the lines in
  italics immediately following each header - they start and end with underscores)
- The italic _section descriptions_ are TEMPLATE INSTRUCTIONS that must be preserved
  exactly as-is - they guide what content belongs in each section
- ONLY update the actual content that appears BELOW the italic _section descriptions_
  within each existing section
- Do NOT add any new sections, summaries, or information outside the existing structure
- Do NOT reference this note-taking process or instructions anywhere in the notes
- It's OK to skip updating a section if there are no substantial new insights to add.
  Do not add filler content like "No info yet", just leave sections blank/unedited if
  appropriate.
- Write DETAILED, INFO-DENSE content for each section - include specifics like file paths,
  function names, error messages, exact commands, technical details, etc.
- For "Key results", include the complete, exact output the user requested (e.g., full
  table, full answer, etc.)
- Do not include information that's already in the CLAUDE.md files included in the context
- Keep each section under ~2000 tokens/words - if a section is approaching this limit,
  condense it by cycling out less important details while preserving the most critical
  information
- Focus on actionable, specific information that would help someone understand or recreate
  the work discussed in the conversation
- IMPORTANT: Always update "Current State" to reflect the most recent work - this is
  critical for continuity after compaction

Use the Edit tool with file_path: {{notesPath}}

STRUCTURE PRESERVATION REMINDER:
Each section has TWO parts that must be preserved exactly as they appear in the current file:
1. The section header (line starting with #)
2. The italic description line (the _italicized text_ immediately after the header - this
   is a template instruction)

You ONLY update the actual content that comes AFTER these two preserved lines. The italic
description lines starting and ending with underscores are part of the template structure,
NOT content to be edited or removed.

REMEMBER: Use the Edit tool in parallel and stop. Do not continue after the edits. Only
include insights from the actual user conversation, never from these note-taking
instructions. Do not delete or change section headers or italic _section descriptions_.

Template structure (DEFAULT_SESSION_MEMORY_TEMPLATE):

markdown
# Session Title
_A short and distinctive 5-10 word descriptive title for the session. Super info dense, no filler_

# Current State
_What is actively being worked on right now? Pending tasks not yet completed. Immediate next steps._

# Task specification
_What did the user ask to build? Any design decisions or other explanatory context_

# Files and Functions
_What are the important files? In short, what do they contain and why are they relevant?_

# Workflow
_What bash commands are usually run and in what order? How to interpret their output if not obvious?_

# Errors & Corrections
_Errors encountered and how they were fixed. What did the user correct? What approaches failed
and should not be tried again?_

# Codebase and System Documentation
_What are the important system components? How do they work/fit together?_

# Learnings
_What has worked well? What has not? What to avoid? Do not duplicate items from other sections_

# Key results
_If the user asked a specific output such as an answer to a question, a table, or other document,
repeat the exact result here_

# Worklog
_Step by step, what was attempted, done? Very terse summary for each step_

Characteristics

  • Structured Markdown output (9 fixed sections)
  • Incremental updates (the Edit tool modifies the existing file)
  • High information density (each section capped at ~2000 tokens)
  • Total budget: 12000 tokens

C. The Full Compact prompt

The full prompt lives in src/services/compact/prompt.ts (lines 61-143):

markdown
CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.

- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn --- you will fail the task.
- Your entire response must be plain text: an <analysis> block followed by a <summary> block.

Your task is to create a detailed summary of the conversation so far, paying close attention
to the user's explicit requests and your previous actions. This summary should be thorough
in capturing technical details, code patterns, and architectural decisions that would be
essential for continuing development work without losing context.

Before providing your final summary, wrap your analysis in <analysis> tags to organize your
thoughts and ensure you've covered all necessary points. In your analysis process:

1. Chronologically analyze each message and section of the conversation. For each section
   thoroughly identify:
   - The user's explicit requests and intents
   - Your approach to addressing the user's requests
   - Key decisions, technical concepts and code patterns
   - Specific details like:
     - file names
     - full code snippets
     - function signatures
     - file edits
   - Errors that you ran into and how you fixed them
   - Pay special attention to specific user feedback that you received, especially if the
     user told you to do something differently.
2. Double-check for technical accuracy and completeness, addressing each required element
   thoroughly.

Your summary should include the following sections:

1. Primary Request and Intent: Capture all of the user's explicit requests and intents in
   detail
2. Key Technical Concepts: List all important technical concepts, technologies, and
   frameworks discussed.
3. Files and Code Sections: Enumerate specific files and code sections examined, modified,
   or created. Pay special attention to the most recent messages and include full code
   snippets where applicable and include a summary of why this file read or edit is important.
4. Errors and fixes: List all errors that you ran into, and how you fixed them. Pay special
   attention to specific user feedback that you received, especially if the user told you
   to do something differently.
5. Problem Solving: Document problems solved and any ongoing troubleshooting efforts.
6. All user messages: List ALL user messages that are not tool results. These are critical
   for understanding the users' feedback and changing intent.
7. Pending Tasks: Outline any pending tasks that you have explicitly been asked to work on.
8. Current Work: Describe in detail precisely what was being worked on immediately before
   this summary request, paying special attention to the most recent messages from both
   user and assistant. Include file names and code snippets where applicable.
9. Optional Next Step: List the next step that you will take that is related to the most
   recent work you were doing. IMPORTANT: ensure that this step is DIRECTLY in line with
   the user's most recent explicit requests, and the task you were working on immediately
   before this summary request. If your last task was concluded, then only list next steps
   if they are explicitly in line with the users request. Do not start on tangential
   requests or really old requests that were already completed without confirming with
   the user first.
   If there is a next step, include direct quotes from the most recent conversation showing
   exactly what task you were working on and where you left off. This should be verbatim
   to ensure there's no drift in task interpretation.

Here's an example of how your output should be structured:

<example>
<analysis>
[Your thought process, ensuring all points are covered thoroughly and accurately]
</analysis>

<summary>
1. Primary Request and Intent:
   [Detailed description]

2. Key Technical Concepts:
   - [Concept 1]
   - [Concept 2]
   - [...]

3. Files and Code Sections:
   - [File Name 1]
      - [Summary of why this file is important]
      - [Summary of the changes made to this file, if any]
      - [Important Code Snippet]
   - [File Name 2]
      - [Important Code Snippet]
   - [...]

4. Errors and fixes:
    - [Detailed description of error 1]:
      - [How you fixed the error]
      - [User feedback on the error if any]
    - [...]

5. Problem Solving:
   [Description of solved problems and ongoing troubleshooting]

6. All user messages:
    - [Detailed non tool use user message]
    - [...]

7. Pending Tasks:
   - [Task 1]
   - [Task 2]
   - [...]

8. Current Work:
   [Precise description of current work]

9. Optional Next Step:
   [Optional Next step to take]

</summary>
</example>

Please provide your summary based on the conversation so far, following this structure and
ensuring precision and thoroughness in your response.

There may be additional summarization instructions provided in the included context. If so,
remember to follow these instructions when creating the above summary. Examples of instructions
include:
<example>
## Compact Instructions
When summarizing the conversation focus on typescript code changes and also remember the
mistakes you made and how you fixed them.
</example>

<example>
# Summary instructions
When you are using compact - please focus on test output and code changes. Include file
reads verbatim.
</example>

REMINDER: Do NOT call any tools. Respond with plain text only --- an <analysis> block followed
by a <summary> block. Tool calls will be rejected and you will fail the task.

Characteristics

  • Tool calls strictly forbidden (NO_TOOLS_PREAMBLE + NO_TOOLS_TRAILER)
  • Requires an <analysis> + <summary> structure
  • 9 required sections (similar to Session Memory, but more detailed)
  • Emphasizes verbatim quotes from the most recent conversation (to prevent task drift)
  • Supports user-defined instructions (Compact Instructions)

Prompt design comparison

| Dimension | Context Collapse | Session Memory | Full Compact |
| --- | --- | --- | --- |
| Length | Very short (one sentence) | Long (~500 lines) | Very long (~170 lines) |
| Structure | None (free-form sentence) | Strong (9-section Markdown) | Strong (9 sections + XML) |
| Tool use | Forbidden | Allowed (Edit only) | Forbidden |
| Output format | Plain text | Markdown file | XML (analysis + summary) |
| Information density | Low (brief summary) | High (detailed record) | Medium (key information) |
| Customizability | Low (fixed prompt) | High (user-editable template) | Medium (Compact Instructions) |