A reverse-engineering analysis of Claude Code's context management and memory system, based on its source code (recovered via sourceMap).
## 1. Architecture Overview

Claude Code's context management is a system of **three-tier progressive compaction plus multiple self-healing recovery mechanisms**. The three tiers do not run in parallel; they escalate in order of increasing cost:
```
┌──────────────────────────────────────────────────────────┐
│                 Token consumption curve                  │
│  0% ───── 87% ──── 90% ──── 93% ──── 95% ──── 100%       │
│   │        │        │        │        │        │         │
│ normal  AutoCompact warning  error   block   hard        │
│ running  triggers   shown  threshold requests limit      │
└──────────────────────────────────────────────────────────┘
```
Key thresholds (from `services/compact/autoCompact.ts`):

| Threshold | Formula | Notes |
|---|---|---|
| AutoCompact trigger | effective_window - 13,000 tokens | ~87% context usage |
| Warning | threshold - 20,000 tokens | warns the user at ~90% usage |
| Error | threshold - 20,000 tokens | error state at high usage |
| Block | effective_window - 3,000 tokens | blocks new requests at ~97% |
| Circuit breaker | 3 consecutive failures | stops attempting auto-compaction |
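As a rough sketch, the threshold arithmetic in the table can be expressed as below. The function and constant names (`autoCompactThreshold`, `AUTO_COMPACT_MARGIN`, etc.) are illustrative assumptions, not the source's identifiers:

```typescript
// Illustrative threshold math; the constants mirror the table above, the names are assumed.
const AUTO_COMPACT_MARGIN = 13_000 // autocompact fires this far before the window limit
const WARNING_MARGIN = 20_000      // warn this far before the autocompact threshold
const BLOCK_MARGIN = 3_000         // block new requests this close to the hard limit

function autoCompactThreshold(effectiveWindow: number): number {
  return effectiveWindow - AUTO_COMPACT_MARGIN
}

function warningThreshold(effectiveWindow: number): number {
  return autoCompactThreshold(effectiveWindow) - WARNING_MARGIN
}

function blockThreshold(effectiveWindow: number): number {
  return effectiveWindow - BLOCK_MARGIN
}
```

For example, with a 200K effective window the autocompact threshold lands at 187,000 tokens and the block threshold at 197,000.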
## 2. Tier 1: Micro-Compact (Lightweight Cleanup)

Source file: `services/compact/microCompact.ts`

### Core idea

Leave the conversation structure untouched; only clear stale tool results (file reads, shell output, grep results, and so on).

### Trigger mechanism
| Mode | Trigger condition | Action |
|---|---|---|
| Time-based | > 60 minutes since the last assistant message | Replace old tool results with [Old tool result content cleared] |
| Cache-based | Cache pressure detected | Use the API's cache_edits to surgically delete old tool results while keeping the cache prefix |
### Compactable tool types

```typescript
const COMPACTABLE_TOOLS = new Set<string>([
  FILE_READ_TOOL_NAME,   // file reads
  ...SHELL_TOOL_NAMES,   // shell command output
  GREP_TOOL_NAME,        // search results
  GLOB_TOOL_NAME,        // file-match results
  WEB_SEARCH_TOOL_NAME,  // web search results
  WEB_FETCH_TOOL_NAME,   // web fetch results
  FILE_EDIT_TOOL_NAME,   // file edits (edit results)
  FILE_WRITE_TOOL_NAME,  // file writes (write results)
])
```
### Compaction effect

- Compression ratio: moderate (conversation structure preserved; tool output cleared)
- User impact: imperceptible
- Cost: near zero (no extra API call required)
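A minimal sketch of the time-based mode described above. The message shape, field names, and helper are illustrative assumptions, not the actual definitions in `microCompact.ts`:

```typescript
// Minimal sketch of the time-based micro-compact pass. All types and names here
// are illustrative assumptions, not the source's definitions.
interface ToolResultBlock {
  type: "tool_result"
  toolName: string
  content: string
  timestamp: number // ms epoch of the turn this result belongs to
}

const CLEARED_PLACEHOLDER = "[Old tool result content cleared]"
const MAX_AGE_MS = 60 * 60 * 1000 // 60 minutes, per the trigger table above

function microCompact(
  blocks: ToolResultBlock[],
  compactableTools: Set<string>,
  now: number,
): ToolResultBlock[] {
  return blocks.map(b =>
    compactableTools.has(b.toolName) && now - b.timestamp > MAX_AGE_MS
      ? { ...b, content: CLEARED_PLACEHOLDER } // structure kept, content cleared
      : b,
  )
}
```

Note the key property: the block itself survives with its position in the conversation intact, so tool_use/tool_result pairing is never disturbed.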
## 3. Tier 2: Session Memory Compact

Source files: `services/SessionMemory/sessionMemory.ts` + `services/compact/sessionMemoryCompact.ts`

### Core idea

A Markdown-formatted session-notes file is maintained in the background. At compaction time, the notes replace the old messages while the most recent messages are kept verbatim.
### Workflow

**Phase 1: background note extraction (does not block the conversation)**

```
Conversation in progress...
│
├─ Check trigger conditions:
│   ├─ Token threshold reached? (default min 2000 tokens)
│   ├─ Tool-call count reached? (default min 3 calls)
│   └─ Or: no tool calls in the current turn (a natural conversation gap)
│
└─ Triggered → launch a forked agent (background process, shares the prompt cache)
    ├─ Read the current notes file
    ├─ Build the update prompt
    ├─ Incrementally update the notes file with the Edit tool
    └─ Record lastSummarizedMessageId
```
Trigger-condition details (`shouldExtractMemory()`):

```typescript
const shouldExtract =
  (hasMetTokenThreshold && hasMetToolCallThreshold) ||
  (hasMetTokenThreshold && !hasToolCallsInLastTurn)
```

In other words: the token threshold must be met, and on top of that either the tool-call count is also met, or the current turn has no tool calls (so extraction happens safely in a natural conversation gap).
**Phase 2: using the notes at compaction time**

```
Token usage hits the 87% threshold...
│
├─ Are there Session Memory notes?
│   ├─ Notes exist → use them to replace old messages
│   │   ├─ Keep messages starting after lastSummarizedMessageId
│   │   ├─ Extend backwards until the minimum token count is met (10K)
│   │   ├─ Extend backwards until the minimum text-message count is met (5 messages)
│   │   ├─ Do not exceed the maximum retention (40K)
│   │   └─ Adjust the cut point so tool_use/result pairs stay intact
│   │
│   └─ No notes → fall back to Tier 3 Full Compact
```
### Key algorithm: calculateMessagesToKeepIndex()

This is the core of "light compression for recent turns, heavy compression for everything earlier":

```typescript
// Extend backwards from lastSummarizedIndex until the minimum requirements are met
for (let i = startIndex - 1; i >= floor; i--) {
  const msg = messages[i]
  totalTokens += estimateMessageTokens([msg])
  if (hasTextBlocks(msg)) textBlockMessageCount++
  startIndex = i
  if (totalTokens >= config.maxTokens) break   // at most 40K
  if (totalTokens >= config.minTokens          // at least 10K
      && textBlockMessageCount >= config.minTextBlockMessages) break // at least 5 messages
}
```
### Configuration

```typescript
const DEFAULT_SM_COMPACT_CONFIG = {
  minTokens: 10_000,        // keep at least 10K tokens after compaction
  minTextBlockMessages: 5,  // keep at least 5 text messages after compaction
  maxTokens: 40_000,        // keep at most 40K tokens after compaction
}
```

Effect: the most recent 10K-40K tokens of messages are kept verbatim; everything earlier is replaced by the Session Memory notes summary.
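The retention scan can be made self-contained as follows. The message shape, token field, and `floor` of 0 are simplified stand-ins; the real implementation lives in `sessionMemoryCompact.ts`:

```typescript
// Runnable sketch of the keep-index scan. The message shape and per-message token
// count are simplified stand-ins for illustration.
interface Msg { tokens: number; hasText: boolean }

const CONFIG = { minTokens: 10_000, minTextBlockMessages: 5, maxTokens: 40_000 }

function calculateKeepIndex(messages: Msg[], lastSummarizedIndex: number): number {
  let startIndex = lastSummarizedIndex + 1
  let totalTokens = 0
  let textCount = 0
  // extend backwards from just after the last summarized message
  for (let i = startIndex - 1; i >= 0; i--) {
    totalTokens += messages[i].tokens
    if (messages[i].hasText) textCount++
    startIndex = i
    if (totalTokens >= CONFIG.maxTokens) break
    if (totalTokens >= CONFIG.minTokens && textCount >= CONFIG.minTextBlockMessages) break
  }
  return startIndex // messages[startIndex..] are kept verbatim
}
```

With twenty 3,000-token text messages already summarized, the scan walks back until both minimums hold, so five messages (15K tokens) survive.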
### Compaction effect

- Compression ratio: high (old messages become a notes summary, roughly 100K+ → ~5-10K)
- User impact: the last few turns remain fully intact
- Cost: low (no extra API call to generate a summary; the notes are already maintained in the background)
## 4. Tier 3: Full Compact + Auto-Memory (Full Compaction + Persistent Memory)

Source files: `services/compact/compact.ts` + `services/extractMemories/extractMemories.ts`

### Core idea

When Tier 2 is unavailable (no notes, or empty notes), perform a full API compaction: send the entire conversation to Claude and generate a structured summary.
### 4.1 Full Compact flow

```
compactConversation(messages, context, ...)
│
├─ 1. Preprocessing: strip images and already-injected attachments
│
├─ 2. Message grouping: groupMessagesByApiRound()
│    └─ Group by API-round boundaries (one round = one assistant.id)
│
├─ 3. Build a forked agent (shared cache)
│    ├─ Same system prompt (reuses the cache prefix)
│    ├─ Same tools list (cache key matches)
│    └─ User message = conversation history + compaction prompt
│
├─ 4. Generate the 9-section structured summary:
│    1. Primary Request and Intent
│    2. Key Technical Concepts
│    3. Files and Code Sections
│    4. Errors and fixes
│    5. Problem Solving
│    6. All user messages
│    7. Pending Tasks
│    8. Current Work
│    9. Optional Next Step
│
├─ 5. Postprocessing:
│    ├─ Restore recently accessed files (up to 5, each ≤5K tokens)
│    ├─ Restore Skill attachments (each ≤5K tokens)
│    ├─ Restore the Plan file
│    └─ Run PostCompact hooks
│
└─ 6. Retry mechanism:
     └─ "prompt_too_long"? → drop the oldest message group → retry
```
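The retry step above can be sketched as a degradation loop. The real flow is async and lives inside `compactConversation()`; the shapes below are illustrative stand-ins:

```typescript
// Sketch of the "prompt_too_long → drop oldest group → retry" degradation loop.
// The group and summarizer shapes are illustrative assumptions.
type Group = string[] // one API round's worth of messages

function compactWithRetry(
  groups: Group[],
  summarize: (gs: Group[]) => string, // throws Error("prompt_too_long") when input is too big
): string {
  let remaining = groups
  while (remaining.length > 0) {
    try {
      return summarize(remaining)
    } catch (err) {
      if ((err as Error).message !== "prompt_too_long") throw err
      remaining = remaining.slice(1) // degrade: drop the oldest API round, then retry
    }
  }
  throw new Error("nothing left to summarize")
}
```

Because groups are whole API rounds, each drop removes a complete tool_use/tool_result unit, never half of one.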
### 4.2 Compaction prompt design

Compaction uses a carefully designed prompt (`services/compact/prompt.ts`) that asks the model to first draft its analysis inside `<analysis>` tags, then emit the formal summary inside `<summary>` tags. Only the summary portion is kept.

Key design points:

- NO_TOOLS_PREAMBLE: explicitly forbids the compaction agent from calling any tools (so it does not waste its only turn)
- Analysis/summary separation: the analysis is a discarded draft; the summary is the final output
- All user messages preserved: section 6 requires listing every user message, so user intent is never lost
- Custom instructions supported: users can pass their own compaction preferences via the /compact command
### 4.3 Auto-Memory persistent memory extraction

This runs independently of compaction, in the background at the end of each query loop:

```
Conversation turn ends...
│
├─ Check: main agent? auto-memory enabled? not in remote mode?
│
├─ Check: did the main agent already write memory directly?
│   ├─ Yes → skip (avoid duplication)
│   └─ No  → continue
│
├─ Throttle: run once every N turns (default 1)
│
└─ Launch a forked agent:
    ├─ Scan existing memory files (build a manifest)
    ├─ Build the extraction prompt
    ├─ Permissions: read-only, plus write access only to the memory directory
    ├─ At most 5 rounds of tool calls
    └─ Write memories to ~/.claude/projects/<path>/memory/
        ├─ MEMORY.md (index file, ≤200 lines)
        └── one .md file per topic
```
### Memory types

| Type | Content | Lifetime |
|---|---|---|
| User | The user's role, goals, responsibilities | Permanent, across sessions |
| Feedback | User corrections and preferences | Permanent, across sessions |
| Project | Project work, goals, defects | Permanent, across sessions |
| Reference | Pointers to external systems | Permanent, across sessions |
### Memory file layout

```
~/.claude/
├── memory/                        # global memory (cross-project)
│   └── MEMORY.md                  # index file (≤200 lines / 25KB)
├── projects/<path>/memory/        # project-level memory
│   ├── MEMORY.md                  # project index
│   ├── user_role.md               # user-role information
│   ├── feedback_testing.md        # testing-related feedback
│   └── project_architecture.md    # project-architecture knowledge
└── session-memory/                # session-level memory (temporary)
    └── project-name/
        └── session-notes.md       # notes for the current session
```
### Compaction effect

- Full Compact compression ratio: extreme (100K+ → a ~5-10K-token summary)
- Auto-Memory: persists key information; new conversations load it automatically (MEMORY.md is injected into the system prompt)
## 5. Execution Path of Manual /compact

When the user types /compact, a strategy is chosen in the following priority order:

```
/compact command
│
├─ Priority 1: Session Memory Compact
│   ├─ Notes file exists? → replace old messages with the notes, keep recent messages verbatim
│   └─ No notes? → fall through
│
├─ Priority 2: Reactive Compact (reactive mode only)
│   └─ Drop message groups one layer at a time until the API accepts the request
│
└─ Priority 3: traditional Full Compact
    ├─ First run Micro-Compact to clear old tool results
    ├─ Then call compactConversation() to generate the full summary
    └─ Use a forked agent that shares the prompt cache
```
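The priority chain above amounts to a simple fall-through selection. This sketch is illustrative only; the names and option shape are assumptions, not the source's API:

```typescript
// Sketch of the /compact strategy selection, following the priority order above.
// All names here are illustrative assumptions.
type CompactStrategy = "session-memory" | "reactive" | "full"

function chooseCompactStrategy(opts: {
  hasSessionNotes: boolean
  reactiveMode: boolean
}): CompactStrategy {
  if (opts.hasSessionNotes) return "session-memory" // priority 1: notes already exist, cheapest
  if (opts.reactiveMode) return "reactive"          // priority 2: reactive mode only
  return "full"                                     // priority 3: micro-compact + full summary
}
```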
## 6. The "Self-Healing" Mechanisms in Detail

This is the most ingenious part of the architecture: not passive compaction, but active repair.

### 1. Circuit Breaker

```typescript
// autoCompact.ts:70
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3
// stop trying after 3 consecutive failures
// BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures,
// wasting ~250K API calls/day globally.
```

Self-healing logic: prevents infinite retries from wasting API calls. Repeated failures mean the context is beyond recovery, and stopping is better than continuing to burn requests.
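A minimal sketch of a breaker built around the constant above. The class shape is an illustration, not the source's API; the key detail is that only *consecutive* failures trip it:

```typescript
// Minimal circuit-breaker sketch around MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES.
// The class shape is an illustrative assumption.
class AutoCompactBreaker {
  private failures = 0
  constructor(private readonly maxFailures = 3) {}

  canAttempt(): boolean {
    return this.failures < this.maxFailures
  }

  recordResult(success: boolean): void {
    // any success resets the streak; only consecutive failures trip the breaker
    this.failures = success ? 0 : this.failures + 1
  }
}
```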
### 2. Retry with Degradation

```
Inside compactConversation:
│
├─ API returns "prompt_too_long"
│   ├─ Drop the oldest message group (the first group from groupMessagesByApiRound)
│   └─ Retry the compaction
│
└─ Fails again? → keep dropping → keep retrying...
```

Self-healing logic: when the first summary attempt fails, automatically shrink the context and try again, degrading step by step until it succeeds.
### 3. API Invariant Preservation

```typescript
// sessionMemoryCompact.ts: adjustIndexToPreserveAPIInvariants()
// 1. Ensure tool_use / tool_result pairs are never separated
//    → if a kept message contains a tool_result, search backwards for its matching tool_use
// 2. Ensure thinking blocks are not lost
//    → streamed messages sharing a message.id must be kept together
```

Self-healing logic: when slicing the message list, automatically backtrack to repair orphaned tool calls and thinking blocks.

Example of the scenario this fixes:

```
Session storage (before compaction):
  Index N:   assistant, message.id: X, content: [thinking]
  Index N+1: assistant, message.id: X, content: [tool_use: ORPHAN_ID]
  Index N+2: assistant, message.id: X, content: [tool_use: VALID_ID]
  Index N+3: user, content: [tool_result: ORPHAN_ID, tool_result: VALID_ID]

If startIndex = N+2:
  ❌ Old code: checks only N+2 → finds no tool_result → returns N+2
     Result: the ORPHAN tool_use is dropped but the ORPHAN tool_result is kept → API error
  ✅ New code: checks the tool_results of all kept messages → finds ORPHAN_ID → backtracks to N
     Result: all pairs intact → API accepts the request
```
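The pairing half of this repair can be sketched as follows (the thinking-block / shared `message.id` invariant is omitted here; the message shape is a simplified stand-in):

```typescript
// Sketch of the pairing repair: if any kept tool_result lacks its tool_use in the
// kept range, backtrack startIndex until every pair is complete. Shapes are
// simplified stand-ins for illustration.
interface SimpleMsg {
  toolUseIds: string[]    // tool_use blocks this message carries
  toolResultIds: string[] // tool_result blocks this message carries
}

function adjustStartIndex(messages: SimpleMsg[], startIndex: number): number {
  let idx = startIndex
  for (;;) {
    const kept = messages.slice(idx)
    const uses = new Set(kept.flatMap(m => m.toolUseIds))
    const orphan = kept.flatMap(m => m.toolResultIds).find(id => !uses.has(id))
    if (!orphan || idx === 0) return idx
    idx-- // backtrack one message and re-check the whole kept range
  }
}
```

Applied to the ORPHAN_ID scenario above, the scan keeps backtracking until the message carrying the orphaned tool_use re-enters the kept range.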
### 4. Trailing Extraction

```typescript
// extractMemories.ts:
// a new request arrives while extraction is running → store it in pendingContext
// when the current extraction finishes → automatically run again with the latest context
```

Self-healing logic: no message is ever lost. A deferred request is picked up automatically once the previous run completes.
### 5. Cursor Rollback

```typescript
// extractMemories.ts:
// extraction succeeds → lastMemoryMessageUuid advances
// extraction fails   → lastMemoryMessageUuid stays put → those messages are reprocessed next time
```

Self-healing logic: a failure never loses messages; the next run retries them automatically.
### 6. Prompt Cache Sharing

```typescript
// compact.ts uses runForkedAgent:
// - same system prompt → reuses the cache prefix
// - same tools list    → cache key matches
// - 20K tokens reserved for the summary output
```

Self-healing logic: the compaction request shares the main conversation's cache, cutting cost and latency. Even after compaction, the cache prefix remains valid.
### 7. File Re-injection

```typescript
// compact.ts: createPostCompactFileAttachments()
// automatically restored after compaction:
// - up to 5 recently accessed files (each ≤5K tokens)
// - active Skill attachments (each ≤5K tokens)
// - the current Plan file
```

Self-healing logic: working context is proactively restored after compaction so the AI does not "lose its memory": it still knows which files it was just looking at.
### 8. Session Restoration

```typescript
// sessionMemoryCompact.ts:
// scenario: a restored session has no lastSummarizedMessageId
// handling: lastSummarizedIndex = messages.length - 1
//   → startIndex begins at messages.length (no messages retained)
//   → the entire session memory serves as the summary
```

Self-healing logic: even across session restores, existing memory files can be used to rebuild context.
### 9. Memory Freshness Tracking

```typescript
// memoryFileDetection.ts: memoryFreshnessNote()
// memory older than 1 day → automatically add a staleness warning
// so the AI knows these memories may be out of date
```

Self-healing logic: old memories are not trusted blindly; staleness annotations reduce hallucination risk.
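A sketch of the behavior described above. The function name mirrors the source, but the signature, wording, and age computation are illustrative assumptions:

```typescript
// Sketch of memoryFreshnessNote()'s described behavior: memories older than a day
// get a staleness caveat. Signature and wording are illustrative assumptions.
const ONE_DAY_MS = 24 * 60 * 60 * 1000

function freshnessNote(lastModifiedMs: number, nowMs: number): string | null {
  const ageDays = Math.floor((nowMs - lastModifiedMs) / ONE_DAY_MS)
  if (ageDays < 1) return null // fresh enough, no caveat
  return `Note: this memory was last updated ${ageDays} day(s) ago and may be stale.`
}
```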
## 7. Message Grouping Algorithm

Source file: `services/compact/grouping.ts`

Message grouping is the infrastructure compaction is built on; it groups by API-round boundaries:

```typescript
function groupMessagesByApiRound(messages: Message[]): Message[][] {
  // Boundary test: a new assistant message whose message.id differs from the previous one
  // Streamed chunks of the same API response share a message.id → they stay in one group
  // This guarantees tool_use/result pairs are never split apart
}
```

Design decisions:

- Use assistant.message.id as the grouping boundary (rather than user messages)
- Supports single-prompt agentic sessions (SDK/eval scenarios)
- Malformed conversations (dangling tool_use) are repaired by the fork's ensureToolResultPairing
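Under the boundary rule above, a runnable sketch might look like this. The message shape is a simplified stand-in, not the source's `Message` type, and edge cases (such as a leading user message) are handled naively:

```typescript
// Runnable sketch of grouping by API round: consecutive messages sharing an
// assistant message.id stay in one group; other messages attach to the current
// group. Shapes are simplified stand-ins.
interface RoundMsg {
  role: "assistant" | "user"
  id?: string // assistant message.id; streamed chunks of one response share it
}

function groupByApiRound(messages: RoundMsg[]): RoundMsg[][] {
  const groups: RoundMsg[][] = []
  let lastAssistantId: string | undefined
  for (const msg of messages) {
    const isNewRound = msg.role === "assistant" && msg.id !== lastAssistantId
    if (isNewRound || groups.length === 0) groups.push([])
    if (msg.role === "assistant") lastAssistantId = msg.id
    groups[groups.length - 1].push(msg)
  }
  return groups
}
```

Because the tool_result (a user message) lands in the same group as the assistant turn that issued the tool_use, dropping whole groups never splits a pair.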
## 8. How the Three Tiers Work Together

```
┌─────────────────────────────────────────────────────────────────┐
│                    User starts a conversation                   │
└─────────────────────┬───────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│          Conversation in progress (background, parallel)        │
│  ┌──────────────────────────┐  ┌─────────────────────────────┐  │
│  │ Session Memory extraction│  │ Auto-Memory extraction      │  │
│  │ (runs after each turn)   │  │ (runs after each turn)      │  │
│  │ → updates session notes  │  │ → updates ~/.claude/ memory │  │
│  └──────────────────────────┘  └─────────────────────────────┘  │
└─────────────────────┬───────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Token usage reaches 87%?                      │
│   ┌─── No ───────────────────┐                                  │
│   │ Keep talking             │                                  │
│   │ Meanwhile: Micro-Compact │                                  │
│   │ clears stale tool output │                                  │
│   └──────────────────────────┘                                  │
│   ┌─── Yes ─────────────────────────────────────────────────┐   │
│   │                                                         │   │
│   │  Session Memory notes available?                        │   │
│   │   ├─ Yes → Session Memory Compact                       │   │
│   │   │        (notes replace old messages,                 │   │
│   │   │         recent messages kept verbatim)              │   │
│   │   │                                                     │   │
│   │   └─ No  → Full Compact                                 │   │
│   │            (API generates the 9-section summary,        │   │
│   │             recent files restored)                      │   │
│   │                                                         │   │
│   │  3 consecutive failures? → circuit breaker trips, stop  │   │
│   └─────────────────────────────────────────────────────────┘   │
└─────────────────────┬───────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Token usage reaches 95%?                      │
│   ┌─── Yes ──────────────────┐                                  │
│   │ Block new requests       │                                  │
│   │ Force a manual /compact  │                                  │
│   └──────────────────────────┘                                  │
└─────────────────────────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Session ends                            │
│  → Auto-Memory persists the key information                     │
│  → The next conversation auto-loads MEMORY.md                   │
└─────────────────────────────────────────────────────────────────┘
```
## 9. Layered Compression-Ratio Strategy, Summarized

```
Compression ratio
high ────────────────────────────────────────────
  │  Full Compact
  │  (old messages → 9-section summary, ~90% compression)
  │
  │  Session Memory Compact
  │  (old messages → notes, recent kept verbatim, ~70% compression)
  │
  │  Micro-Compact
  │  (tool results only, ~30% compression)
  │
low ────────────────────────────────────────────
  │<──── old ───│────── recent ──────>│ new  │
  │    turns           turns          │turns │
```
## 10. Core Design Philosophy

- The nearer, the more faithful: the last few turns are barely compressed at all; the further back, the harder the compression
- Invisible background work: all extraction and housekeeping runs in background forked agents, with zero user-perceived impact
- Progressive escalation: from the lightest cleanup to the heaviest full summary, escalating on demand, tier by tier
- Never lose anything:
  - API invariants (tool pairing, thinking blocks) are always protected
  - Failed extractions retry automatically (the cursor does not advance)
  - Failed summaries hit a circuit breaker (preventing cascades)
- Cross-session continuity: Auto-Memory persists key information, so a new conversation automatically inherits the "experience" of older ones
- Cache-friendly: compaction uses a forked agent that shares the prompt cache, minimizing extra API cost
## Appendix A: Key System Prompts (Verbatim)

Below are the complete prompts extracted from the source, grouped by functional module.

### A.1 Full Compact prompt

File: `services/compact/prompt.ts`

The complete instructions the model receives when generating the conversation summary:
CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn --- you will fail the task.
- Your entire response must be plain text: an <analysis> block followed by a <summary> block.
Your task is to create a detailed summary of the conversation so far, paying close
attention to the user's explicit requests and your previous actions.
This summary should be thorough in capturing technical details, code patterns, and
architectural decisions that would be essential for continuing development work
without losing context.
Before providing your final summary, wrap your analysis in <analysis> tags to
organize your thoughts and ensure you've covered all necessary points. In your
analysis process:
1. Chronologically analyze each message and section of the conversation.
For each section thoroughly identify:
- The user's explicit requests and intents
- Your approach to addressing the user's requests
- Key decisions, technical concepts and code patterns
- Specific details like: file names, full code snippets, function signatures, file edits
- Errors that you ran into and how you fixed them
- Pay special attention to specific user feedback that you received, especially
if the user told you to do something differently.
2. Double-check for technical accuracy and completeness, addressing each required
element thoroughly.
Your summary should include the following sections:
1. Primary Request and Intent:
Capture all of the user's explicit requests and intents in detail
2. Key Technical Concepts:
List all important technical concepts, technologies, and frameworks discussed.
3. Files and Code Sections:
Enumerate specific files and code sections examined, modified, or created.
Pay special attention to the most recent messages and include full code snippets
where applicable and include a summary of why this file read or edit is important.
4. Errors and fixes:
List all errors that you ran into, and how you fixed them. Pay special attention
to specific user feedback that you received, especially if the user told you to
do something differently.
5. Problem Solving:
Document problems solved and any ongoing troubleshooting efforts.
6. All user messages:
List ALL user messages that are not tool results. These are critical for
understanding the users' feedback and changing intent.
7. Pending Tasks:
Outline any pending tasks that you have explicitly been asked to work on.
8. Current Work:
Describe in detail precisely what was being worked on immediately before this
summary request, paying special attention to the most recent messages from both
user and assistant. Include file names and code snippets where applicable.
9. Optional Next Step:
List the next step that you will take that is related to the most recent work
you were doing. IMPORTANT: ensure that this step is DIRECTLY in line with the
user's most recent explicit requests, and the task you were working on
immediately before this summary request. If your last task was concluded, then
only list next steps if they are explicitly in line with the users request.
Do not start on tangential requests or really old requests that were already
completed without confirming with the user first.
If there is a next step, include direct quotes from the most recent conversation
showing exactly what task you were working on and where you left off. This should
be verbatim to ensure there's no drift in task interpretation.
REMINDER: Do NOT call any tools. Respond with plain text only --- an <analysis>
block followed by a <summary> block. Tool calls will be rejected and you will
fail the task.
The bridging text injected into the new conversation after compaction:
This session is being continued from a previous conversation that ran out of
context. The summary below covers the earlier portion of the conversation.
[the 9-section summary]
Recent messages are preserved verbatim.
Continue the conversation from where it left off without asking the user any
further questions. Resume directly --- do not acknowledge the summary, do not
recap what was happening, do not preface with "I'll continue" or similar.
Pick up the last task as if the break never happened.
### A.2 Session Memory notes template

File: `services/SessionMemory/prompts.ts`

The notes-file template maintained by the background forked agent (10 fixed sections):
```markdown
# Session Title
_A short and distinctive 5-10 word descriptive title for the session. Super info dense, no filler_

# Current State
_What is actively being worked on right now? Pending tasks not yet completed. Immediate next steps._

# Task specification
_What did the user ask to build? Any design decisions or other explanatory context_

# Files and Functions
_What are the important files? In short, what do they contain and why are they relevant?_

# Workflow
_What bash commands are usually run and in what order? How to interpret their output if not obvious?_

# Errors & Corrections
_Errors encountered and how they were fixed. What did the user correct?
What approaches failed and should not be tried again?_

# Codebase and System Documentation
_What are the important system components? How do they work/fit together?_

# Learnings
_What has worked well? What has not? What to avoid?
Do not duplicate items from other sections_

# Key results
_If the user asked a specific output such as an answer to a question,
a table, or other document, repeat the exact result here_

# Worklog
_Step by step, what was attempted, done? Very terse summary for each step_
```

The notes-update prompt (the instructions sent to the forked agent):
IMPORTANT: This message and these instructions are NOT part of the actual user
conversation. Do NOT include any references to "note-taking", "session notes
extraction", or these update instructions in the notes content.
Based on the user conversation above (EXCLUDING this note-taking instruction
message as well as system prompt, claude.md entries, or any past session
summaries), update the session notes file.
The file {notesPath} has already been read for you. Here are its current contents:
<current_notes_content>
{currentNotes}
</current_notes_content>
Your ONLY task is to use the Edit tool to update the notes file, then stop.
You can make multiple edits (update every section as needed) - make all Edit
tool calls in parallel in a single message. Do not call any other tools.
CRITICAL RULES FOR EDITING:
- The file must maintain its exact structure with all sections, headers, and
italic descriptions intact
- NEVER modify, delete, or add section headers (the lines starting with '#')
- NEVER modify or delete the italic _section description_ lines
- The italic _section descriptions_ are TEMPLATE INSTRUCTIONS that must be
preserved exactly as-is - they guide what content belongs in each section
- ONLY update the actual content that appears BELOW the italic descriptions
- Do NOT reference this note-taking process or instructions anywhere in the notes
- It's OK to skip updating a section if there are no substantial new insights
- Write DETAILED, INFO-DENSE content for each section
- Keep each section under ~2000 tokens/words
- Focus on actionable, specific information
- IMPORTANT: Always update "Current State" to reflect the most recent work
CRITICAL: The session memory file is currently ~{X} tokens, which exceeds the
maximum of 12000 tokens. You MUST condense the file to fit within this budget.
Aggressively shorten oversized sections by removing less important details,
merging related items, and summarizing older entries. Prioritize keeping
"Current State" and "Errors & Corrections" accurate and detailed.
### A.3 Auto-Memory persistent memory system prompt

Files: `memdir/memdir.ts` + `memdir/memoryTypes.ts`

The complete memory instructions injected into the system prompt:
# auto memory
You have a persistent, file-based memory system at `{memoryDir}`.
This directory already exists --- write to it directly with the Write tool
(do not run mkdir or check for its existence).
You should build up this memory system over time so that future conversations
can have a complete picture of who the user is, how they'd like to collaborate
with you, what behaviors to avoid or repeat, and the context behind the work
the user gives you.
If the user explicitly asks you to remember something, save it immediately as
whichever type fits best. If they ask you to forget something, find and remove
the relevant entry.
## Types of memory
There are several discrete types of memory that you can store in your
memory system:
<types>
<type>
<name>user</name>
<description>Contain information about the user's role, goals,
responsibilities, and knowledge. Great user memories help you tailor
your future behavior to the user's preferences and perspective. Your
goal in reading and writing these memories is to build up an
understanding of who the user is and how you can be most helpful to
them specifically.</description>
<when_to_save>When you learn any details about the user's role,
preferences, responsibilities, or knowledge</when_to_save>
<how_to_use>When your work should be informed by the user's profile
or perspective.</how_to_use>
<examples>
user: I'm a data scientist investigating what logging we have in place
assistant: [saves user memory: user is a data scientist,
currently focused on observability/logging]
user: I've been writing Go for ten years but this is my first time
touching the React side of this repo
assistant: [saves user memory: deep Go expertise, new to React and
this project's frontend --- frame frontend explanations in
terms of backend analogues]
</examples>
</type>
<type>
<name>feedback</name>
<description>Guidance the user has given you about how to approach
work --- both what to avoid and what to keep doing. Record from failure
AND success: if you only save corrections, you will avoid past mistakes
but drift away from approaches the user has already validated.</description>
<when_to_save>Any time the user corrects your approach ("no not that",
"don't", "stop doing X") OR confirms a non-obvious approach worked
("yes exactly", "perfect, keep doing that"). Corrections are easy to
notice; confirmations are quieter --- watch for them.</when_to_save>
<how_to_use>Let these memories guide your behavior so that the user
does not need to offer the same guidance twice.</how_to_use>
<body_structure>Lead with the rule itself, then a **Why:** line
(the reason the user gave) and a **How to apply:** line.</body_structure>
<examples>
user: don't mock the database in these tests --- we got burned last quarter
assistant: [saves feedback memory: integration tests must hit a real
database, not mocks. Reason: prior incident where mock/prod
divergence masked a broken migration]
user: stop summarizing what you just did at the end of every response,
I can read the diff
assistant: [saves feedback memory: this user wants terse responses
with no trailing summaries]
</examples>
</type>
<type>
<name>project</name>
<description>Information about ongoing work, goals, initiatives, bugs,
or incidents within the project that is not otherwise derivable from
the code or git history.</description>
<when_to_save>When you learn who is doing what, why, or by when.
Always convert relative dates to absolute dates when saving
(e.g., "Thursday" → "2026-03-05").</when_to_save>
<how_to_use>Use these memories to more fully understand the details
and nuance behind the user's request.</how_to_use>
<body_structure>Lead with the fact or decision, then a **Why:** line
and a **How to apply:** line.</body_structure>
<examples>
user: we're freezing all non-critical merges after Thursday
assistant: [saves project memory: merge freeze begins 2026-03-05 for
mobile release cut. Flag any non-critical PR work after that date]
</examples>
</type>
<type>
<name>reference</name>
<description>Stores pointers to where information can be found in
external systems.</description>
<when_to_save>When you learn about resources in external systems
and their purpose.</when_to_save>
<how_to_use>When the user references an external system or information
that may be in an external system.</how_to_use>
<examples>
user: check the Linear project "INGEST" if you want context on tickets
assistant: [saves reference memory: pipeline bugs are tracked in
Linear project "INGEST"]
</examples>
</type>
</types>
## What NOT to save in memory
- Code patterns, conventions, architecture, file paths, or project structure
--- these can be derived by reading the current project state.
- Git history, recent changes, or who-changed-what --- git log/blame are
authoritative.
- Debugging solutions or fix recipes --- the fix is in the code.
- Anything already documented in CLAUDE.md files.
- Ephemeral task details: in-progress work, temporary state, current
conversation context.
These exclusions apply even when the user explicitly asks you to save.
If they ask you to save a PR list or activity summary, ask what was
*surprising* or *non-obvious* about it --- that is the part worth keeping.
## How to save memories
Saving a memory is a two-step process:
**Step 1** --- write the memory to its own file (e.g., `user_role.md`,
`feedback_testing.md`) using this frontmatter format:
```markdown
---
name: {{memory name}}
description: {{one-line description --- used to decide relevance in future
conversations, so be specific}}
type: {{user, feedback, project, reference}}
---
{{memory content --- for feedback/project types, structure as: rule/fact,
then **Why:** and **How to apply:** lines}}
```

Existing memory files

[manifest of existing memory files]

Check this list before writing --- update an existing file rather than creating
a duplicate.

(The extraction prompt then continues with the memory-type definitions, saving rules, and so on from A.3 above.)

---
### A.5 Dream (Memory Consolidation) Prompt

**File**: `services/autoDream/consolidationPrompt.ts`

The complete 4-phase instructions for cross-session memory consolidation, run periodically in the background:
```markdown
# Dream: Memory Consolidation
You are performing a dream --- a reflective pass over your memory files.
Synthesize what you've learned recently into durable, well-organized memories
so that future sessions can orient quickly.
Memory directory: `{memoryRoot}`
This directory already exists --- write to it directly with the Write tool.
Session transcripts: `{transcriptDir}` (large JSONL files --- grep narrowly,
don't read whole files)
---
## Phase 1 --- Orient
- `ls` the memory directory to see what already exists
- Read `MEMORY.md` to understand the current index
- Skim existing topic files so you improve them rather than creating duplicates
- If `logs/` or `sessions/` subdirectories exist, review recent entries there
## Phase 2 --- Gather recent signal
Look for new information worth persisting. Sources in rough priority order:
1. **Daily logs** (`logs/YYYY/MM/YYYY-MM-DD.md`) if present --- these are the
append-only stream
2. **Existing memories that drifted** --- facts that contradict something you
see in the codebase now
3. **Transcript search** --- if you need specific context, grep the JSONL
transcripts for narrow terms:
`grep -rn "<narrow term>" {transcriptDir}/ --include="*.jsonl" | tail -50`
Don't exhaustively read transcripts. Look only for things you already
suspect matter.
## Phase 3 --- Consolidate
For each thing worth remembering, write or update a memory file at the top
level of the memory directory. Use the memory file format and type conventions
from your system prompt's auto-memory section.
Focus on:
- Merging new signal into existing topic files rather than creating
near-duplicates
- Converting relative dates ("yesterday", "last week") to absolute dates so
they remain interpretable after time passes
- Deleting contradicted facts --- if today's investigation disproves an old
memory, fix it at the source
## Phase 4 --- Prune and index
Update `MEMORY.md` so it stays under 200 lines AND under ~25KB.
It's an **index**, not a dump --- each entry should be one line under ~150
characters: `- [Title](file.md) --- one-line hook`. Never write memory content
directly into it.
- Remove pointers to memories that are now stale, wrong, or superseded
- Demote verbose entries: if an index line is over ~200 chars, it's carrying
content that belongs in the topic file --- shorten the line, move the detail
- Add pointers to newly important memories
- Resolve contradictions --- if two files disagree, fix the wrong one
---
Return a brief summary of what you consolidated, updated, or pruned.
If nothing changed (memories are already tight), say so.
```
## Appendix B: Key Source File Index

| File | Responsibility |
|---|---|
| `services/compact/autoCompact.ts` | Auto-compact triggering, threshold computation, circuit breaker |
| `services/compact/compact.ts` | Full-compact main flow, file restoration, retry degradation |
| `services/compact/microCompact.ts` | Lightweight tool-result cleanup |
| `services/compact/sessionMemoryCompact.ts` | Session Memory compaction strategy, message-retention algorithm |
| `services/compact/prompt.ts` | Compaction prompt template (9-section summary) |
| `services/compact/grouping.ts` | Message grouping algorithm (by API round) |
| `services/SessionMemory/sessionMemory.ts` | Background session-note extraction |
| `services/extractMemories/extractMemories.ts` | Persistent memory extraction |
| `services/autoDream/autoDream.ts` | Cross-session memory consolidation (the Dream mechanism) |
| `memdir/memoryScan.ts` | Memory-file scanning and indexing |
| `memdir/paths.ts` | Memory-file path management |
| `utils/forkedAgent.ts` | Forked-agent framework (cache sharing) |