[Agent Memory Series] 05: MemPalace Overall Architecture, the Palace Metaphor, and the Four-Layer Memory Stack

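The first four installments (01-04) took apart OpenClaw, an Agent Memory design in which the LLM decides for itself what to remember. From this installment on the protagonist changes. We turn to a project with the opposite stance: MemPalace (milla-jovovich/mempalace, v3.0.14).

Its slogan is a single sentence: "Store everything, then make it findable." Behind that sentence sits an engineering architecture that takes the memory palace (Method of Loci) as its metaphor, uses ChromaDB as its raw store, and loads memory in tiers through a four-layer stack (L0-L3). In raw mode it scores 96.6% on LongMemEval R@5, currently the highest published score in the "zero API key, zero cloud, runs locally" category.

As the opener of the MemPalace sub-series, this installment lays out the skeleton: the design philosophy, the data model (Wing/Hall/Room/Closet/Drawer/Tunnel), the four-layer memory stack L0-L3, the configuration system, and how its architectural outlook differs from OpenClaw's. The mining pipeline, AAAK + knowledge graph + retrieval, and hands-on MCP are left to installments 06, 07, and 08.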

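Table of Contents

1. Why MemPalace Deserves Its Own Sub-Series 🎯
- 1.1 One Comparison Diagram: OpenClaw vs MemPalace
- 1.2 What the 96.6% Actually Means
- 1.3 Why the Source Is Worth Reading
2. Three Pillars of the Design Philosophy 🏗️
- 2.1 Pillar One: The Memory Palace Metaphor (Method of Loci)
- 2.2 Pillar Two: Local-First, Zero API Keys
- 2.3 Pillar Three: Raw Verbatim Storage
- 2.4 "The April 7 Letter": An Honest Statement Worth Copying
3. Data Model: Palace, Wing, Hall, Room, Closet, Drawer, Tunnel 🧬
- 3.1 One ASCII Panorama of the Seven Concepts
- 3.2 Wing: The Top-Level Container
- 3.3 Hall: A Cross-Cut by Memory Type
- 3.4 Room: A Specific Topic
- 3.5 Closet and Drawer: Summary Pointer and Original Text
- 3.6 Tunnel: Linking the Same Topic Across Wings
- 3.7 Why This Spatial Metaphor Is Not Decoration
4. The Four-Layer Memory Stack L0-L3: A Close Read of layers.py 📊
- 4.1 Motivation for Layering: a 170-Token Wake-Up vs Pasting 19.5M Tokens
- 4.2 Layer 0: Identity (~100 tokens, always loaded)
- 4.3 Layer 1: Essential Story (~500-800 tokens, always loaded)
- 4.4 Layer 2: On-Demand (filtered by wing/room)
- 4.5 Layer 3: Deep Search (semantic search over the whole palace)
- 4.6 MemoryStack: The Single Entry Point
- 4.7 A Reusable "Token Budget" Mental Model
5. Configuration: config.py and the ~/.mempalace Directory 🔄
- 5.1 Load Priority: env > config.json > defaults
- 5.2 DEFAULT_HALL_KEYWORDS: A Hard-Coded "Starter Dictionary"
- 5.3 wing_config.json, identity.txt, people_map.json
- 5.4 The Apple Silicon Segfault Patch: Engineering Attitude in One env Line
6. ChromaDB as the Storage Backend: Engineering Trade-offs 📝
- 6.1 Why Not FAISS / Qdrant / Milvus
- 6.2 The SQLite Variable-Limit Pitfall in Batched Reads
- 6.3 Why wing + room Filtering Adds 34% R@10
7. MemPalace vs OpenClaw vs Mem0 vs Zep: One Big Table 🔍
8. Where the 96.6% Comes From, and the Real Trade-offs of AAAK 🎯
9. 📚 Summary and Preview of the Next Installment
10. 📖 References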

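1. Why MemPalace Deserves Its Own Sub-Series 🎯

1.1 One Comparison Diagram: OpenClaw vs MemPalace

In installments 01-04 we worked through OpenClaw's memory_manager, extractor, and retriever all the way to its "LLM-as-curator" pipeline. The central assumption of that design is:

"The LLM is smart enough to decide for you what is worth remembering."

So after every stretch of conversation it calls an LLM to extract / summarize, throws the raw conversation away, and keeps a set of clean "facts". This approach belongs to the same school as Mem0, LangMem, and Letta.

MemPalace's authors, Milla Jovovich and Ben Sigman, reject that route at the very top of the README:

Other memory systems try to fix this by letting AI decide what's worth remembering. It extracts "user prefers Postgres" and throws away the conversation where you explained why. MemPalace takes a different approach: store everything, then make it findable.

One diagram sums up the two worldviews: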

            ┌─────────────────────────────────────────┐
            │    Raw conversations (19.5M tokens/yr)  │
            └────────────────┬────────────────────────┘
                             │
            ┌────────────────┴────────────────┐
            ▼                                 ▼
  ┌────────────────────┐            ┌────────────────────┐
  │  OpenClaw school   │            │  MemPalace school  │
  │                    │            │                    │
  │ LLM extract /      │            │ Store text as-is   │
  │ summarize          │            │ ChromaDB verbatim  │
  │                    │            │                    │
  │ Discard the source │            │ Drop nothing       │
  │ Keep "facts"       │            │ Organize as palace │
  │                    │            │                    │
  │ Query = assemble   │            │ Query = find room  │
  │ facts              │            │ → semantic search  │
  └────────┬───────────┘            └─────────┬──────────┘
           │                                  │
           ▼                                  ▼
    LLM call on every write          Only an embed on every write
    Cost grows linearly with tokens  Cost is a one-time ingest
    Loses detail, small context      Keeps detail, relies on retrieval

This diagram explains why MemPalace gets a sub-series of its own: within Agent Memory it stands for "index-first", the opposite of "compression-first".

1.2 What the 96.6% Actually Means

Open the README and the three most prominent numbers are:

  96.6%            500/500            $0
LongMemEval R@5   questions tested   no cloud, local only

benchmarks/BENCHMARKS.md lists the full results. The most eye-catching rows are:

Benchmark	Mode	Score	API Calls
LongMemEval R@5	Raw (ChromaDB only)	96.6%	Zero
LongMemEval R@5	Hybrid + Haiku rerank	100% (500/500)	~500
LoCoMo R@10	Raw, session level	60.3%	Zero
Palace structure impact	Wing+room filtering	+34% R@10	Zero

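Three things to keep in mind first:

1. The 96.6% comes from raw mode, meaning "original text dropped into ChromaDB + wing/room filtering", with no AAAK compression and no call to any external LLM.
2. The 100% comes from hybrid mode with Haiku rerank. The row is real, but the rerank pipeline is not part of the public benchmark scripts.
3. The +34% comes from wing + room metadata filtering, a standard ChromaDB feature and not a "secret recipe".

Point 3 matters most: that 34% is why MemPalace bothers to organize data into a "palace". It is not for literary flavor; it is for the score.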

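1.3 Why the Source Is Worth Reading

For an engineering reader, three reasons to pick this source:

- Small codebase, clear concepts: layers.py is only 515 lines and config.py only 149, so you can read all of L0-L3 plus the configuration system in one evening.
- The design trade-offs are highly "arguable": nearly every decision (raw vs summary, SQLite vs Neo4j, ChromaDB vs Qdrant, whether to use AAAK) could fill an architecture review meeting. Projects like this are ideal teaching material.
- The authors admit their mistakes in public: the README has a section titled "A Note from Milla & Ben --- April 7, 2026" that lists the errors, misleading claims, and never-wired-up features the community dug out after the first release. That kind of honesty is rare in the AI tooling scene and deserves a section of its own.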

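2. Three Pillars of the Design Philosophy 🏗️

2.1 Pillar One: The Memory Palace Metaphor (Method of Loci)

The first paragraph of the README's "How It Works" is the most "literary" passage in the whole document, yet it is not rhetoric. It is the core design: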

Ancient Greek orators memorized entire speeches by placing ideas in rooms of an imaginary building. Walk through the building, find the idea. MemPalace applies the same principle to AI memory: your conversations are organized into wings (people and projects), halls (types of memory), and rooms (specific ideas).

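It maps the metaphor of a physical building onto the database schema:

Metaphor	Maps to	Role
Wing	Top-level partition	One person / one project / one topic
Hall	Cross-cutting dimension	Memory type (facts / events / advice / preferences / discoveries)
Room	Specific topic	`auth-migration`, `graphql-switch`, `ci-pipeline`
Closet	Summary	A summary card pointing at the original file
Drawer	Original text	The untouched verbatim chunk
Tunnel	Cross-wing link	Connects rooms with the same name in different wings

In essence this schema is a tree with a "type dimension", plus a graph. Sections 4.x will show that the ChromaDB queries in layers.py filter with `where` on two metadata keys, wing and room, so the metaphor is literally executable in code.

Why (why not a flat tag system):

A flat tag system (Obsidian style) degrades toward full-text search as the number of tags grows, and it has no built-in notion of scope. The palace metaphor provides a two-level scope: a query is meant to run inside one wing unless you explicitly go global. That prior cuts recall noise considerably, and it also makes it easy for an LLM to learn the navigation pattern "first pick the wing, then pick the room".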

2.2 Pillar Two: Local-First, Zero API Keys

pyproject.toml lists only two dependencies:

# /tmp/mempalace_src/pyproject.toml
dependencies = [
    "chromadb>=0.5.0,<0.7",
    "pyyaml>=6.0",
]

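There is no openai, no anthropic, and no separate sentence-transformers dependency (ChromaDB comes with its own built-in ONNX embedding model).

The direct consequences of this decision:

- It works right after install, with no API key to configure
- All embeddings run through local ONNX
- No line of code sends data out

mempalace/__init__.py contains a snippet with a distinctly "engineer" feel: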

# /tmp/mempalace_src/mempalace/__init__.py
# ONNX Runtime's CoreML provider segfaults during vector queries on Apple Silicon.
# Force CPU execution unless the user has explicitly set a preference.
if platform.machine() == "arm64" and platform.system() == "Darwin":
    os.environ.setdefault("ORT_DISABLE_COREML", "1")

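These few lines tell you two things: (1) the authors really did run it on M-series Macs; (2) they would rather give up a little performance by default than risk it not working right after install. For a "local-first" project, defensive code like this says more about good faith than any benchmark number in the README.

2.3 Pillar Three: Raw Verbatim Storage

This is the most hard-line decision separating MemPalace from Mem0, LangMem, Letta, and OpenClaw: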

MemPalace stores your actual exchanges in ChromaDB without summarization or extraction. The 96.6% LongMemEval result comes from this raw mode.

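In other words, the default behavior of convo_miner is to cut conversations into chunks, tag them with wing/room metadata, and embed them straight into ChromaDB. No LLM pass, no extraction, no summary.

The trade-offs involved:

Dimension	Raw storage	LLM-extract storage
Write cost	Embed once	An LLM call on every write
Write latency	Milliseconds	Seconds
Token cost	0	Substantial
Detail retained	100%	Heavy loss
Storage size	Large	Small
Dependence on retrieval	Heavy	Light
Risk of losing the "why"	Low	High

MemPalace stakes its bet on "vector retrieval + structured filtering". The implicit assumption of this route is: a good enough embedding model + strong enough metadata filtering = no need to sacrifice the original text.

The 96.6% R@5 benchmark counts as the first public win for that bet. But note that LoCoMo R@10 is only 60.3%, which shows that "raw + wing filter" is no cure-all: in tasks with long conversations where multiple speakers refer to one another, pure semantic retrieval loses context. That is also why the AAAK, KG, and rerank branches grew later; installment 07 takes them apart in detail.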

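2.4 "The April 7 Letter": An Honest Statement Worth Copying

The README contains a passage I strongly recommend reading in full (written two days after release, in response to community criticism). Only the key items are excerpted here: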

The AAAK token example was incorrect. We used a rough heuristic (len(text)//3) for token counts instead of an actual tokenizer. ... AAAK does not save tokens at small scales.

"30x lossless compression" was overstated. AAAK is a lossy abbreviation system. Independent benchmarks show AAAK mode scores 84.2% R@5 vs raw mode's 96.6% --- a 12.4 point regression.

"+34% palace boost" was misleading. That number compares unfiltered search to wing+room metadata filtering. Metadata filtering is a standard ChromaDB feature, not a novel retrieval mechanism.

"Contradiction detection" exists as a separate utility but is not currently wired into the knowledge graph operations as the README implied.

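I single this passage out for two reasons:

- Technical implication: it effectively tells you that MemPalace's real strength lies on one main line, "raw verbatim storage + wing/room metadata filtering", and not in AAAK, not in "30x compression", and not in automated contradiction detection in the KG. When writing installments 06/07/08 I will score each subsystem by that real weight.
- Engineering-culture implication: an open-source project willing to pin a "here is what we got wrong" open letter at the top of its README is rare in the AI tooling scene of 2026. It amounts to saying: "We stand behind our benchmark numbers, and we will not sell an 84.2% result as 100%." That attitude means you can read the source on the default assumption that it is not trying to fool you.

Enough preamble. On to the main topic.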

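3. Data Model: Palace, Wing, Hall, Room, Closet, Drawer, Tunnel 🧬

3.1 One ASCII Panorama of the Seven Concepts

Here is the official diagram from the README, with my own annotations added alongside: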

  ┌────────────────────────────────────────────────────────────┐
  │  WING: Person  (e.g. wing_kai)                             │
  │  ────────────────────────────────────────────              │
  │                                                            │
  │    ┌──────────┐  ──hall──  ┌──────────┐                    │
  │    │  Room A  │            │  Room B  │   hall = corridor  │
  │    │ auth     │            │ security │   within one wing  │
  │    └────┬─────┘            └──────────┘                    │
  │         │                                                  │
  │         ▼                                                  │
  │    ┌──────────┐      ┌──────────┐                          │
  │    │  Closet  │ ───▶ │  Drawer  │  drawer = verbatim chunk │
  │    │ summary  │      │ verbatim │  closet = summary of it  │
  │    └──────────┘      └──────────┘                          │
  └─────────┼──────────────────────────────────────────────────┘
            │
          tunnel    ← rooms with the same name in different wings are linked automatically
            │
  ┌─────────┼──────────────────────────────────────────────────┐
  │  WING: Project  (e.g. wing_driftwood)                      │
  │                                                            │
  │    ┌────┴─────┐  ──hall──  ┌──────────┐                    │
  │    │  Room A  │            │  Room C  │                    │
  │    │ auth     │            │ deploy   │                    │
  │    └────┬─────┘            └──────────┘                    │
  │         │                                                  │
  │         ▼                                                  │
  │    ┌──────────┐      ┌──────────┐                          │
  │    │  Closet  │ ───▶ │  Drawer  │                          │
  │    └──────────┘      └──────────┘                          │
  └────────────────────────────────────────────────────────────┘

  The whole palace lives in a single ChromaDB collection: mempalace_drawers
  Each record = {id, embedding, document (original text), metadata}
  Key metadata fields: {wing, room, hall, importance, source_file}

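Notice something interesting: in physical storage, all of these "spatial concepts" are just metadata fields on individual ChromaDB records. There is no separate Wing table, Room table, or Hall table. The whole palace metaphor is virtual, "walked" on the fly at query time through metadata filtering.

This is a classic "logical model ≠ physical model" engineering decision: the logical model is easy for people and LLMs to understand, while the physical model keeps queries fast and simple.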
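To make the "virtual palace" idea concrete, here is a minimal sketch in plain Python. It is not MemPalace code: `DRAWERS`, `walk`, and `floor_plan` are hypothetical names, the list stands in for the single collection, and the third record is made up for illustration. Only the metadata keys (wing, room, hall) come from the article.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the single drawers collection.
DRAWERS = [
    {"id": "d1", "document": "Kai debugged the OAuth token refresh",
     "metadata": {"wing": "wing_kai", "room": "auth-migration", "hall": "hall_events"}},
    {"id": "d2", "document": "team decided to migrate auth to Clerk",
     "metadata": {"wing": "wing_driftwood", "room": "auth-migration", "hall": "hall_facts"}},
    {"id": "d3", "document": "blue/green switch rolled back",  # invented example row
     "metadata": {"wing": "wing_driftwood", "room": "deploy", "hall": "hall_events"}},
]


def walk(records, **coords):
    """Return the records whose metadata matches every given coordinate."""
    return [r for r in records
            if all(r["metadata"].get(k) == v for k, v in coords.items())]


def floor_plan(records):
    """Derive the wing -> rooms tree on the fly; nothing stores it."""
    plan = defaultdict(set)
    for r in records:
        plan[r["metadata"]["wing"]].add(r["metadata"]["room"])
    return {wing: sorted(rooms) for wing, rooms in plan.items()}
```

Calling `walk(DRAWERS, wing="wing_driftwood", room="auth-migration")` narrows the flat list to one record, which is the whole trick: the "building" exists only as a filter over flat rows.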

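3.2 Wing: The Top-Level Container

A wing is the first level of partitioning. There are two kinds:

- Entity wing: one person or one project, recognized automatically from keywords configured in wing_config.json.
- Topic wing: a thematic wing. See config.py: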

# /tmp/mempalace_src/mempalace/config.py
DEFAULT_TOPIC_WINGS = [
    "emotions",
    "consciousness",
    "memory",
    "technical",
    "identity",
    "family",
    "creative",
]

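The default list is telling in itself: it shows that the authors originally pictured MemPalace as a "personal AI companion", which is why "emotions", "consciousness", and "family" are among the default wings. If you actually use it as a development assistant, you would change the list to something like ["frontend", "backend", "infra", "docs"].

3.3 Hall: A Cross-Cut by Memory Type

Hall is orthogonal to Wing. It is a memory-type dimension: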

hall_facts       - decisions that have been settled
hall_events      - meetings / events / milestones
hall_discoveries - new findings, insights
hall_preferences - preferences, habits
hall_advice      - advice / solutions

In config.py, each hall has a set of keywords used for routing:

# /tmp/mempalace_src/mempalace/config.py
DEFAULT_HALL_KEYWORDS = {
    "emotions": [
        "scared", "afraid", "worried", "happy", "sad",
        "love", "hate", "feel", "cry", "tears",
    ],
    "technical": [
        "code", "python", "script", "bug", "error",
        "function", "api", "database", "server",
    ],
    ...
}

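(Note: the hall names here do not fully match the README's "hall_facts / hall_events / ...". In the code, names such as emotions and consciousness serve both as topic wing names and as hall keyword categories. This is a seam between code and docs, historical debt that v3.0.14 has not yet cleaned up.)

Why the hall keywords are hard-coded:

They are a fallback. The real room classification happens in room_detector_local.py through heuristics plus optional LLM assistance, but when nothing can be recognized, the hall keywords make sure any piece of text still lands in at least one category and is never dropped. This is the "store everything" philosophy made concrete: better misfiled than lost.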
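The "better misfiled than lost" rule can be sketched as a keyword router that always returns some bucket. This is an illustration under assumptions, not the project's routing code: `route_hall` and the catch-all name `general` are hypothetical, and the keyword lists are shortened copies of the excerpt above.

```python
# Shortened copies of two entries from the DEFAULT_HALL_KEYWORDS excerpt.
HALL_KEYWORDS = {
    "emotions": ["scared", "afraid", "worried", "happy", "sad"],
    "technical": ["code", "python", "script", "bug", "error"],
}
FALLBACK_HALL = "general"  # hypothetical name for the catch-all bucket


def route_hall(text, keywords=HALL_KEYWORDS, fallback=FALLBACK_HALL):
    """Pick the hall with the most keyword hits; never return nothing."""
    words = text.lower().split()
    hits = {hall: sum(words.count(kw) for kw in kws)
            for hall, kws in keywords.items()}
    best = max(hits, key=hits.get)
    # Zero hits everywhere still yields a bucket, so no text is dropped.
    return best if hits[best] > 0 else fallback
```

The whitespace split is deliberately naive (punctuation glued to a word will hide a match); the point is only that the function has no code path that returns "no category".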

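3.4 Room: A Specific Topic

A room is the finest-grained topic. Typical names: auth-migration, graphql-switch, ci-pipeline, billing-rework.

room_detector_local.py is responsible for recognizing room names from text. That belongs to the mining pipeline and is left to installment 06.

3.5 Closet and Drawer: Summary Pointer and Original Text

In the current v3.0.14:

- Drawer = a real ChromaDB record whose document field is the verbatim chunk.
- Closet = a "virtual summary layer". In the README's own words: "In v3.0.0 these are plain-text summaries; AAAK-encoded closets are coming in a future update --- see Task #30."

In other words, the AAAK-compressed version of closets is still on the roadmap. For now a closet is essentially "a truncated summary of the drawer + a pointer back to the original file". That is also why raw mode reaches 96.6% while AAAK mode manages only 84.2%: the AAAK compression layer is currently a net negative for retrieval.

3.6 Tunnel: Linking the Same Topic Across Wings

The tunnel is the only thing in the whole model with a "graph" flavor. The rule is extremely simple:

When the same room name appears in different wings, a tunnel is created automatically.

The example from the README: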

wing_kai       / hall_events / auth-migration  → "Kai debugged the OAuth token refresh"
wing_driftwood / hall_facts  / auth-migration  → "team decided to migrate auth to Clerk"
wing_priya     / hall_advice / auth-migration  → "Priya approved Clerk over Auth0"

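The same room (auth-migration) appears in 3 wings, so a three-way tunnel forms automatically. When the LLM asks "how is auth-migration going lately", it can enter through any one wing and follow the tunnel to collect all three perspectives at once.

In essence this design puts a "graph view" on top of ChromaDB: physically it is still metadata filtering, while logically it is a multipartite graph with rooms as the nodes. palace_graph.py is the module responsible for this view; installment 07 reads it closely.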
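The tunnel rule is simple enough to sketch directly from metadata. The function below is a hypothetical illustration (`find_tunnels` is not a name from palace_graph.py, which this article has not shown); the first three rows mirror the README example and the fourth is invented to show a room that stays tunnel-free.

```python
from collections import defaultdict


def find_tunnels(metadatas):
    """Map each room name to the wings it appears in, keeping only rooms
    shared by two or more wings (the tunnel rule described above)."""
    wings_by_room = defaultdict(set)
    for meta in metadatas:
        wings_by_room[meta["room"]].add(meta["wing"])
    return {room: sorted(wings)
            for room, wings in wings_by_room.items() if len(wings) >= 2}


metas = [
    {"wing": "wing_kai", "hall": "hall_events", "room": "auth-migration"},
    {"wing": "wing_driftwood", "hall": "hall_facts", "room": "auth-migration"},
    {"wing": "wing_priya", "hall": "hall_advice", "room": "auth-migration"},
    {"wing": "wing_driftwood", "hall": "hall_events", "room": "deploy"},  # invented row
]
tunnels = find_tunnels(metas)
```

Nothing is stored for a tunnel in this sketch: it is recomputed from the same wing/room fields that drive filtering, which matches the "graph view over metadata" reading above.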

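3.7 Why This Spatial Metaphor Is Not Decoration

At this point a reasonable doubt is: are these fancy names any different from "prefix + tag + subtag"? They differ in three ways:

- LLM-friendly navigation syntax. When you tell Claude "go look in the auth-migration room of wing_driftwood", it immediately understands this as a two-level filter. In a traditional tag system, "driftwood + auth-migration" looks to the LLM like two tags of equal weight, and it will not narrow the scope first.
- Hall as a third, orthogonal dimension. Separating facts from events from advice sends "what did Priya advise" and "what did Priya do" to different halls, so the two kinds of query do not blur together.
- Tunnel is a built-in mechanism for cross-wing association. Without it, a question like "who owns auth-migration now" leaves the LLM with global semantic search plus keyword luck. With it, walking the tunnel is a cheap lookup.

Why not Neo4j:

The authors explicitly oppose "bringing in a graph database for the sake of having a graph". Later we will see that the KG part stores triples directly in SQLite (covered in installment 07). It is a "don't over-engineer" decision well worth learning from.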

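4. The Four-Layer Memory Stack L0-L3: A Close Read of layers.py 📊

This section is the technical core of the article. We walk through all 515 lines of /tmp/mempalace_src/mempalace/layers.py and examine, layer by layer, how L0-L3 are implemented.

4.1 Motivation for Layering: a 170-Token Wake-Up vs Pasting 19.5M Tokens

Start with the ledger in the README (it uses a Claude 3.5-class pricing model):

Approach	Tokens per load	Cost per year
Paste all chat history into context	19.5M (no existing window can hold it)	Impossible
Use an LLM to summarize, reload every time	~650K	~$507
MemPalace wake-up (L0+L1)	~170	~$0.70
MemPalace + 5 searches	~13,500	~$10

Note the gap in order of magnitude: 170 vs 650,000, roughly 3800x. This is not a victory for a compression algorithm but a victory for "load only when needed". L0/L1 stay resident; L2/L3 are pulled on demand.

This is perhaps the most easily overlooked insight in the whole Agent Memory field: you do not need the LLM to "know everything", you only need it to "know where to ask".

The source states the idea up front in its module docstring: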

# /tmp/mempalace_src/mempalace/layers.py
"""
layers.py --- 4-Layer Memory Stack for mempalace
===================================================

Load only what you need, when you need it.

    Layer 0: Identity       (~100 tokens)   --- Always loaded. "Who am I?"
    Layer 1: Essential Story (~500-800)      --- Always loaded. Top moments from the palace.
    Layer 2: On-Demand      (~200-500 each)  --- Loaded when a topic/wing comes up.
    Layer 3: Deep Search    (unlimited)      --- Full ChromaDB semantic search.

Wake-up cost: ~600-900 tokens (L0+L1). Leaves 95%+ of context free.
"""
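
"Leaves 95%+ of context free" is the soul of the whole design: the success metric for a local memory system is not "remembers a lot" but "occupies little". (The docstring budgets ~600-900 tokens for wake-up, whereas the README ledger in 4.1 says ~170; the two figures come from different sources.)

4.2 Layer 0: Identity (~100 tokens, always loaded)

L0 is extremely simple: it just reads a plain-text file written by the user: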

# /tmp/mempalace_src/mempalace/layers.py
class Layer0:
    """
    ~100 tokens. Always loaded.
    Reads from ~/.mempalace/identity.txt --- a plain-text file the user writes.

    Example identity.txt:
        I am Atlas, a personal AI assistant for Alice.
        Traits: warm, direct, remembers everything.
        People: Alice (creator), Bob (Alice's partner).
        Project: A journaling app that helps people process emotions.
    """

    def __init__(self, identity_path: str = None):
        if identity_path is None:
            identity_path = os.path.expanduser("~/.mempalace/identity.txt")
        self.path = identity_path
        self._text = None

    def render(self) -> str:
        """Return the identity text, or a sensible default."""
        if self._text is not None:
            return self._text

        if os.path.exists(self.path):
            with open(self.path, "r") as f:
                self._text = f.read().strip()
        else:
            self._text = (
                "## L0 --- IDENTITY\nNo identity configured. Create ~/.mempalace/identity.txt"
            )

        return self._text

    def token_estimate(self) -> int:
        return len(self.render()) // 4

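Two design details are worth noting:

- Strongly against "AI-generated identity". L0 must be written by a human. The reason is simple: this is the AI's "factory setting", and you do not want it hallucinated.
- Token estimation uses len // 4. Crude but sufficient: L0 is permanently resident, so all you need to know is roughly whether it blows the budget, not an exact tokenization.

Why not YAML:
identity.txt is plain text. Not JSON, not YAML, not TOML. The reason is that the content goes straight into the system prompt, where structured formatting is mostly noise to the LLM. Natural language written by a person is what an LLM reads most smoothly.

4.3 Layer 1: Essential Story (~500-800 tokens, always loaded)

L1 is the most interesting layer in the stack: it is generated automatically. The algorithm in one sentence:

Fetch every drawer from ChromaDB, sort by importance in descending order, take the top 15, group by room, truncate each entry to 200 characters, and cap the total length at 3200 characters (~800 tokens).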

# /tmp/mempalace_src/mempalace/layers.py
class Layer1:
    MAX_DRAWERS = 15  # at most 15 moments in wake-up
    MAX_CHARS = 3200  # hard cap on total L1 text (~800 tokens)

    def __init__(self, palace_path: str = None, wing: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path
        self.wing = wing

    def generate(self) -> str:
        """Pull top drawers from ChromaDB and format as compact L1 text."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "## L1 --- No palace found. Run: mempalace mine <dir>"

        # Fetch all drawers in batches to avoid SQLite variable limit (~999)
        _BATCH = 500
        docs, metas = [], []
        offset = 0
        while True:
            kwargs = {"include": ["documents", "metadatas"], "limit": _BATCH, "offset": offset}
            if self.wing:
                kwargs["where"] = {"wing": self.wing}
            try:
                batch = col.get(**kwargs)
            except Exception:
                break
            batch_docs = batch.get("documents", [])
            batch_metas = batch.get("metadatas", [])
            if not batch_docs:
                break
            docs.extend(batch_docs)
            metas.extend(batch_metas)
            offset += len(batch_docs)
            if len(batch_docs) < _BATCH:
                break

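First look at the loop that fetches data in batches. The comment spells it out:

Fetch all drawers in batches to avoid SQLite variable limit (~999)

This is a production-grade detail. ChromaDB 0.5.x persists to SQLite, where the number of bound parameters per statement is capped (SQLITE_MAX_VARIABLE_NUMBER, 999 by default in builds older than SQLite 3.32). A single get(limit=50000) can blow up on a large collection, and 500 is a safe batch size. You only hit this kind of pitfall when running benchmarks; the README will not tell you, but the comment left in the source saves the next person from stepping in it.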
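The limit/offset loop generalizes to any paged backend. Below is a minimal sketch with a fake data source so it runs without ChromaDB; `fetch_all`, `get_page`, and `fake_get` are hypothetical names, and only the batch size of 500 and the two stop conditions come from the excerpt above.

```python
def fetch_all(get_page, batch_size=500):
    """Drain a limit/offset API page by page.

    get_page(limit, offset) is a hypothetical callable returning a list;
    an empty or short page signals the end, as in Layer1.generate().
    """
    items, offset = [], 0
    while True:
        page = get_page(limit=batch_size, offset=offset)
        if not page:
            break
        items.extend(page)
        offset += len(page)
        if len(page) < batch_size:
            break
    return items


# Fake backend with 1,234 rows that records the page sizes it served.
ROWS = list(range(1234))
served = []


def fake_get(limit, offset):
    page = ROWS[offset:offset + limit]
    served.append(len(page))
    return page


result = fetch_all(fake_get)
```

With 1,234 rows the fake backend is asked for three pages, and the short third page ends the loop without a fourth, empty round trip.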

Next comes scoring:

        # Score each drawer: prefer high importance, recent filing
        scored = []
        for doc, meta in zip(docs, metas):
            importance = 3
            # Try multiple metadata keys that might carry weight info
            for key in ("importance", "emotional_weight", "weight"):
                val = meta.get(key)
                if val is not None:
                    try:
                        importance = float(val)
                    except (ValueError, TypeError):
                        pass
                    break
            scored.append((importance, meta, doc))

        # Sort by importance descending, take top N
        scored.sort(key=lambda x: x[0], reverse=True)
        top = scored[: self.MAX_DRAWERS]

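Two things in this block are worth both criticizing and learning from:

- Multi-key compatibility (importance / emotional_weight / weight): typical "evolving code". Miners from different periods wrote different keys, so the loop falls back across all three. Compatibility code like this is a necessary evil in a long-lived project.
- Sorting uses importance only, with no time weighting. The comment says prefer high importance, recent filing, but the code has no recent-filing logic. It is a small "the comment promises it, the code does not do it" bug. The effect on the wake-up experience: if you filed many high-importance drawers a few months ago, the newest ones may not make it into L1. My guess is that in v3.0.14 this is still a roadmap item.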
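For readers wondering what "recent filing" could look like, here is one possible sketch: importance multiplied by an exponential recency decay. This is my own illustration, not something the project ships. The metadata key `filed_at`, the 30-day half-life, and the function name `wake_up_score` are all assumptions; only the default importance of 3 mirrors the excerpt.

```python
from datetime import datetime, timezone


def wake_up_score(meta, now, half_life_days=30.0):
    """Importance multiplied by an exponential recency decay.

    'filed_at' (ISO 8601) is a hypothetical metadata key; the default
    importance of 3 mirrors the excerpt above.
    """
    try:
        importance = float(meta.get("importance", 3))
    except (TypeError, ValueError):
        importance = 3.0
    filed_at = meta.get("filed_at")
    if not filed_at:
        return importance  # no timestamp: fall back to importance only
    age = now - datetime.fromisoformat(filed_at)
    age_days = max(age.total_seconds() / 86400, 0.0)
    return importance * 0.5 ** (age_days / half_life_days)


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
old = {"importance": 5, "filed_at": "2026-01-01T00:00:00+00:00"}
new = {"importance": 4, "filed_at": "2026-03-25T00:00:00+00:00"}
```

Under these assumptions a 90-day-old drawer with importance 5 scores below a week-old drawer with importance 4, which is the behavior the comment hints at. One design caveat: records without a timestamp are not decayed at all, so a real implementation would need to decide whether that is the right default.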

Then comes grouping and rendering:

        # Group by room for readability
        by_room = defaultdict(list)
        for imp, meta, doc in top:
            room = meta.get("room", "general")
            by_room[room].append((imp, meta, doc))

        # Build compact text
        lines = ["## L1 --- ESSENTIAL STORY"]

        total_len = 0
        for room, entries in sorted(by_room.items()):
            room_line = f"\n[{room}]"
            lines.append(room_line)
            total_len += len(room_line)

            for imp, meta, doc in entries:
                source = Path(meta.get("source_file", "")).name if meta.get("source_file") else ""

                # Truncate doc to keep L1 compact
                snippet = doc.strip().replace("\n", " ")
                if len(snippet) > 200:
                    snippet = snippet[:197] + "..."

                entry_line = f"  - {snippet}"
                if source:
                    entry_line += f"  ({source})"

                if total_len + len(entry_line) > self.MAX_CHARS:
                    lines.append("  ... (more in L3 search)")
                    return "\n".join(lines)

                lines.append(entry_line)
                total_len += len(entry_line)

        return "\n".join(lines)

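Several clean engineering trade-offs:

- Group by room: LLMs understand "entries under the same heading" better than a flat list.
- Hard-truncate each entry to 200 characters: forced squeezing. If you want the full text, go to L3.
- When the total exceeds 3200 characters, truncate and write "... (more in L3 search)": this is a protocol message. It tells the LLM "I know there is more, but I decided not to paste it; if you really want it, call mempalace_search". Markdown that explicitly hints at the model's next action is typical LLM-friendly output design.

The output looks like this: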

## L1 --- ESSENTIAL STORY

[auth-migration]
  - team decided to migrate auth to Clerk over Auth0 --- pricing + DX won out. Kai recommended...  (chat_2026-01-15.md)
  - Maya assigned to auth-migration 2026-01-15; completion target end of sprint 47...  (slack_dm.md)

[deploy-pipeline]
  - Rolled back the blue/green switch on 2026-02-03 due to Postgres connection pool exhaust...  (incident_log.md)
  ... (more in L3 search)

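Why L1 is generated live at wake-up rather than cached:
generate() reruns every time. The reason is that the top-15 ranking can change as new drawers are added, and a cache goes stale easily. For a typical palace (a few thousand to a few tens of thousands of drawers) this query is expected to finish within milliseconds, so caching is unnecessary.

4.4 Layer 2: On-Demand (filtered by wing/room)

L2 is for when you "already know which wing to go to":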

# /tmp/mempalace_src/mempalace/layers.py
class Layer2:
    """
    ~200-500 tokens per retrieval.
    Loaded when a specific topic or wing comes up in conversation.
    Queries ChromaDB with a wing/room filter.
    """

    def __init__(self, palace_path: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path

    def retrieve(self, wing: str = None, room: str = None, n_results: int = 10) -> str:
        """Retrieve drawers filtered by wing and/or room."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "No palace found."

        where = {}
        if wing and room:
            where = {"$and": [{"wing": wing}, {"room": room}]}
        elif wing:
            where = {"wing": wing}
        elif room:
            where = {"room": room}

        kwargs = {"include": ["documents", "metadatas"], "limit": n_results}
        if where:
            kwargs["where"] = where

        try:
            results = col.get(**kwargs)
        except Exception as e:
            return f"Retrieval error: {e}"

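This is pure metadata filtering with no query embedding: it calls col.get(), not col.query(). The difference between the two:

Method	Semantics	Runs vector search?
`col.get(where=...)`	Fetch records matching the metadata filter	No
`col.query(query_texts=..., where=...)`	Semantic similarity + metadata filter	Yes

L2 uses get because its calling scenario is "I have already located the wing/room, give me what is inside". It is essentially an ls on a folder, capped at n_results entries (10 by default).

Why L2 exists on its own:

The biggest difference between L2 and L3 is that L2 takes structured coordinates (wing + room) as input, while L3 takes a natural-language query. When the LLM already knows from the conversation that "the user is asking about wing_kai / auth-migration", running another semantic search is wasteful. A direct get is both cheaper and more complete.

This distinction reflects a RAG design principle: "when structured filtering is available, do not reach for semantic search". Semantic search is similarity scoring + top-k truncation, with a risk of misses; structured filtering returns everything that matches, so recall is 1.0 as long as the limit is not the bottleneck.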
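The wing/room branch in Layer2.retrieve() is small enough to factor into a helper. The sketch below is a refactoring illustration, not the project's code: `build_where` and `get_kwargs` are hypothetical names, while the filter shapes and the `include`/`limit` keys are copied from the excerpt above.

```python
def build_where(wing=None, room=None):
    """Return a metadata filter dict, or None when no filter applies."""
    if wing and room:
        return {"$and": [{"wing": wing}, {"room": room}]}
    if wing:
        return {"wing": wing}
    if room:
        return {"room": room}
    return None


def get_kwargs(wing=None, room=None, n_results=10):
    """Arguments for a metadata-only lookup (the L2 path)."""
    kwargs = {"include": ["documents", "metadatas"], "limit": n_results}
    where = build_where(wing, room)
    if where:
        kwargs["where"] = where
    return kwargs
```

Leaving `where` out entirely when there is no filter, rather than passing an empty dict, follows what the original code does with its `if where:` guard.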

4.5 Layer 3: Deep Search (semantic search over the whole palace)

L3 is the actual semantic search:

# /tmp/mempalace_src/mempalace/layers.py
class Layer3:
    """
    Unlimited depth. Semantic search against the full palace.
    Reuses searcher.py logic against mempalace_drawers.
    """

    def __init__(self, palace_path: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path

    def search(self, query: str, wing: str = None, room: str = None, n_results: int = 5) -> str:
        """Semantic search, returns compact result text."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "No palace found."

        where = {}
        if wing and room:
            where = {"$and": [{"wing": wing}, {"room": room}]}
        elif wing:
            where = {"wing": wing}
        elif room:
            where = {"room": room}

        kwargs = {
            "query_texts": [query],
            "n_results": n_results,
            "include": ["documents", "metadatas", "distances"],
        }
        if where:
            kwargs["where"] = where

        try:
            results = col.query(**kwargs)
        except Exception as e:
            return f"Search error: {e}"

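Note that L3's where clause is still an optional wing/room filter. In practice this yields three query shapes:

Query shape	where	Semantics
L3 global	empty	Semantic search across all wings
L3 + wing	`{"wing": wing}`	Semantic search within a wing (+12% R@10)
L3 + wing + room	`{"$and": [...]}`	Semantic search within a room (+34% R@10)

The table in the README states the retrieval gains for these shapes directly: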

Search all closets:          60.9%  R@10
Search within wing:          73.1%  (+12%)
Search wing + hall:          84.8%  (+24%)
Search wing + room:          94.8%  (+34%)

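This is why the "palace structure" is not decoration: it pushes recall from about 61% to about 95%. A gain of 34 percentage points is huge in retrieval, a magnitude that even a new embedding model rarely delivers.

Rendering of the results: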

        docs = results["documents"][0]
        metas = results["metadatas"][0]
        dists = results["distances"][0]

        if not docs:
            return "No results found."

        lines = [f'## L3 --- SEARCH RESULTS for "{query}"']
        for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
            similarity = round(1 - dist, 3)
            wing_name = meta.get("wing", "?")
            room_name = meta.get("room", "?")
            source = Path(meta.get("source_file", "")).name if meta.get("source_file") else ""

            snippet = doc.strip().replace("\n", " ")
            if len(snippet) > 300:
                snippet = snippet[:297] + "..."

            lines.append(f"  [{i}] {wing_name}/{room_name} (sim={similarity})")
            lines.append(f"      {snippet}")
            if source:
                lines.append(f"      src: {source}")

        return "\n".join(lines)

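Two details:

- similarity = round(1 - dist, 3). ChromaDB returns a distance whose meaning depends on the distance space configured for the collection (squared L2 by default, cosine if the collection was created with it). 1 - dist reads as a similarity only under cosine distance; under squared L2 it can go negative. MemPalace shows the converted value directly to the LLM so that it can judge for itself "0.87 is very relevant / 0.42 is probably not".
- The "[1] wing/room (sim=0.87)" output format of "index + coordinates + score". This is meant for the LLM, not for people. The index makes later references easy ("I think result [2] is the most relevant"), and the coordinates tell it that it can drill down further into that wing/room.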
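Whether 1 - dist is a meaningful similarity depends on the distance metric. The sketch below computes both metrics by hand on a toy pair of vectors, with no ChromaDB involved; the function names and the vectors are illustrative, and the article has not shown which space the mempalace_drawers collection actually uses.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity a cosine-space index reports."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a, b = [1.0, 0.0], [0.0, 2.0]                    # orthogonal vectors
cos_sim = round(1 - cosine_distance(a, b), 3)    # reads as "unrelated"
l2_sim = round(1 - squared_l2(a, b), 3)          # negative: not a similarity
```

For this orthogonal pair the cosine path gives 0.0 while the squared-L2 path gives -4.0, so a `sim=` value shown to the LLM is only interpretable on a 0-to-1 scale if the collection is in cosine space.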

There is also a twin method, search_raw, which returns a list of dicts instead of formatted text, for structured consumption by Python API users or MCP tools:

    def search_raw(
        self, query: str, wing: str = None, room: str = None, n_results: int = 5
    ) -> list:
        """Return raw dicts instead of formatted text."""
        ...
        hits = []
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        ):
            hits.append(
                {
                    "text": doc,
                    "wing": meta.get("wing", "unknown"),
                    "room": meta.get("room", "unknown"),
                    "source_file": Path(meta.get("source_file", "?")).name,
                    "similarity": round(1 - dist, 3),
                    "metadata": meta,
                }
            )
        return hits

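The split between search and search_raw is the classic dichotomy of "markdown for the LLM" vs "JSON for programs". The MCP server calls search (because what an MCP tool returns to Claude is markdown), while the benchmark runner calls search_raw (because it has to compute its own metrics).

4.6 MemoryStack: The Single Entry Point

The four classes L0/L1/L2/L3 are wrapped once more by MemoryStack: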

# /tmp/mempalace_src/mempalace/layers.py
class MemoryStack:
    """
    The full 4-layer stack. One class, one palace, everything works.

        stack = MemoryStack()
        print(stack.wake_up())                # L0 + L1 (~600-900 tokens)
        print(stack.recall(wing="my_app"))     # L2 on-demand
        print(stack.search("pricing change"))  # L3 deep search
    """

    def __init__(self, palace_path: str = None, identity_path: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path
        self.identity_path = identity_path or os.path.expanduser("~/.mempalace/identity.txt")

        self.l0 = Layer0(self.identity_path)
        self.l1 = Layer1(self.palace_path)
        self.l2 = Layer2(self.palace_path)
        self.l3 = Layer3(self.palace_path)

    def wake_up(self, wing: str = None) -> str:
        """
        Generate wake-up text: L0 (identity) + L1 (essential story).
        Typically ~600-900 tokens. Inject into system prompt or first message.
        """
        parts = []
        parts.append(self.l0.render())
        parts.append("")
        if wing:
            self.l1.wing = wing
        parts.append(self.l1.generate())
        return "\n".join(parts)

    def recall(self, wing: str = None, room: str = None, n_results: int = 10) -> str:
        return self.l2.retrieve(wing=wing, room=room, n_results=n_results)

    def search(self, query: str, wing: str = None, room: str = None, n_results: int = 5) -> str:
        return self.l3.search(query, wing=wing, room=room, n_results=n_results)

The three public methods correspond to three usage scenarios:

      User opens Claude
            │
            ▼
   ┌────────────────────┐
   │  stack.wake_up()   │ ← once, pasted into the system prompt
   │  L0 + L1 (~170t)   │
   └─────────┬──────────┘
             │
             ▼
    Conversation in progress...
             │
             ▼
   ┌────────────────────┐
   │ Topic hits a wing? │
   └─────────┬──────────┘
        yes  │  no
     ┌───────┴──────┐
     ▼              ▼
  recall()       search()
  wing/room      free query
  ~200-500t      ~300-800t
  (L2)           (L3)

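Why the three method names differ (wake_up / recall / search):

This is not fussiness, it is semantics. The three methods correspond to three "mental states": wake_up is "booting up", recall is "I remember this", and search is "I think there was something like this". For an LLM caller (via MCP), the method name itself is a calling hint: recall means pass coordinates, search means pass natural language.

4.7 A Reusable "Token Budget" Mental Model

I suggest memorizing one table after reading this section. Whatever memory system you build yourself, you can copy it: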

┌──────┬───────────┬────────────────┬───────────────────────┐
│ Layer│ Trigger   │ Budget         │ Input type            │
├──────┼───────────┼────────────────┼───────────────────────┤
│  L0  │ always on │ ~100 tokens    │ hand-written          │
│  L1  │ always on │ ~500-800       │ auto-generated        │
│  L2  │ on demand │ ~200-500/call  │ wing/room coordinates │
│  L3  │ on demand │ ~300-800/call  │ natural-language query│
└──────┴───────────┴────────────────┴───────────────────────┘
  Always-loaded budget ≤ 1000 tokens
  Per-retrieval budget ≤ 1000 tokens
  Overall target ≤ 5% of the context window

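This "5% budget" principle is more worth borrowing than any specific choice of embedding model. The first metric of a memory system is "occupies little"; "remembers accurately" comes second.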
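The budget rule can be turned into a small check. This is a sketch under assumptions: `within_budget` is a hypothetical helper, the 200,000-token context window is an example value rather than a figure from the article, and the 4-characters-per-token estimate is borrowed from Layer0.token_estimate().

```python
def estimate_tokens(text):
    """Rough estimate: 4 characters per token, as in Layer0.token_estimate()."""
    return len(text) // 4


def within_budget(resident_text, retrieved_texts, context_window, share=0.05):
    """Check resident + retrieved text against a share of the context window."""
    used = estimate_tokens(resident_text)
    used += sum(estimate_tokens(t) for t in retrieved_texts)
    limit = int(context_window * share)
    return used <= limit, used, limit


wake_up = "x" * 3200           # the L1 hard cap of 3200 chars (~800 tokens)
hits = ["y" * 2000] * 2        # two retrievals of ~500 tokens each
ok, used, limit = within_budget(wake_up, hits, context_window=200_000)
```

With these example numbers, the wake-up text plus two retrievals uses about 1,800 estimated tokens against a 10,000-token allowance, so the check passes with room to spare.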

5. Configuration: config.py and the ~/.mempalace Directory 🔄

5.1 Load Priority: env > config.json > defaults

The docstring at the top of config.py states the priority plainly:

# /tmp/mempalace_src/mempalace/config.py
"""
MemPalace configuration system.

Priority: env vars > config file (~/.mempalace/config.json) > defaults
"""

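This priority order is a mark of being "container-friendly": env vars rank highest, which means you can override settings directly with environment variables in Docker / systemd: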

    @property
    def palace_path(self):
        """Path to the memory palace data directory."""
        env_val = os.environ.get("MEMPALACE_PALACE_PATH") or os.environ.get("MEMPAL_PALACE_PATH")
        if env_val:
            return env_val
        return self._file_config.get("palace_path", DEFAULT_PALACE_PATH)

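Note that it accepts both MEMPALACE_PALACE_PATH and MEMPAL_PALACE_PATH. The second prefix is historical baggage (the early name was mempal), but keeping the old name shows respect for users who have already deployed.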
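The three-level lookup is easy to exercise in isolation. The function below restates the property as a standalone sketch so the precedence can be tested with plain dicts; `resolve_palace_path` is a hypothetical name, and the value of `DEFAULT_PALACE_PATH` is an assumption (the article does not show how config.py defines it).

```python
import os

# Assumed default; config.py's actual DEFAULT_PALACE_PATH is not shown here.
DEFAULT_PALACE_PATH = os.path.expanduser("~/.mempalace/palace")


def resolve_palace_path(file_config, environ=None):
    """env vars (new name, then legacy name) > config file > default."""
    environ = os.environ if environ is None else environ
    env_val = (environ.get("MEMPALACE_PALACE_PATH")
               or environ.get("MEMPAL_PALACE_PATH"))
    if env_val:
        return env_val
    return file_config.get("palace_path", DEFAULT_PALACE_PATH)
```

For example, `resolve_palace_path({"palace_path": "/data/p"}, {"MEMPAL_PALACE_PATH": "/old/p"})` returns `/old/p`: the legacy variable still beats the config file. Note that an env var set to an empty string falls through to the next level, because the `or` chain treats it as unset.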

The full layout of the config directory:

~/.mempalace/
├── config.json          ← global config (palace_path / collection_name / ...)
├── people_map.json      ← name alias mapping (optional)
├── wing_config.json     ← wing definitions (generated by `mempalace init`)
├── identity.txt         ← L0 identity text
├── hook_state/          ← state for the auto-save hook
│   └── hook.log
└── palace/              ← the actual ChromaDB data directory
    └── chroma.sqlite3

5.2 DEFAULT_HALL_KEYWORDS: the hard-coded "starter dictionary"

Section 3.3 already showed this once; here is another look, this time at its role as a starter dictionary:

# /tmp/mempalace_src/mempalace/config.py
DEFAULT_HALL_KEYWORDS = {
    "emotions": ["scared", "afraid", "worried", ...],
    "consciousness": ["consciousness", "conscious", "aware", ...],
    "memory": ["memory", "remember", "forget", ...],
    "technical": ["code", "python", "script", "bug", ...],
    ...
}

It is hard-coded in the Python source, but the hall_keywords property lets config.json override it:

    @property
    def hall_keywords(self):
        """Mapping of hall names to keyword lists."""
        return self._file_config.get("hall_keywords", DEFAULT_HALL_KEYWORDS)

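Why not move it into a yaml / json data file:

Because it is the fallback. Even if the user's config.json is lost, the code still has to run. Hard-coding guarantees that a working initial state always exists. It is a "duplicate if you must, but always be bootable" decision.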
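One consequence of the `.get("hall_keywords", DEFAULT_HALL_KEYWORDS)` form is easy to miss: a hall_keywords entry in config.json replaces the whole dictionary rather than merging into it. The sketch below contrasts the two behaviours using a shortened copy of the defaults; hall_keywords_merge is a hypothetical alternative of mine, not something in the source.

```python
# Shortened copy of the defaults, for illustration only.
DEFAULT_HALL_KEYWORDS = {
    "emotions": ["scared", "afraid", "worried"],
    "technical": ["code", "python", "script", "bug"],
}


def hall_keywords_replace(file_config):
    """Same shape as the property above: the file value wins wholesale."""
    return file_config.get("hall_keywords", DEFAULT_HALL_KEYWORDS)


def hall_keywords_merge(file_config):
    """Hypothetical alternative: overlay file entries on top of the defaults."""
    merged = dict(DEFAULT_HALL_KEYWORDS)
    merged.update(file_config.get("hall_keywords", {}))
    return merged


# A user who only wanted to tweak one hall.
user_config = {"hall_keywords": {"technical": ["rust", "cargo"]}}

replaced = hall_keywords_replace(user_config)  # only "technical" survives
merged = hall_keywords_merge(user_config)      # "emotions" is still present
```

So with the replace behaviour, an override in config.json should list every hall it wants to keep, not just the ones it changes.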

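5.3 wing_config.json, identity.txt, people_map.json

These three files are generated interactively by mempalace init and form the user's "personalization layer":

File	Format	Purpose
wing_config.json	JSON dict	wing name → keyword list
identity.txt	Plain text	L0 identity
people_map.json	JSON dict	nickname → canonical name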

The example wing_config.json given in the README:

{
  "default_wing": "wing_general",
  "wings": {
    "wing_kai":       {"type": "person",  "keywords": ["kai", "kai's"]},
    "wing_driftwood": {"type": "project", "keywords": ["driftwood", "analytics", "saas"]}
  }
}
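To show how a file of this shape could be consumed, here is a sketch of a keyword-based wing picker. It is only an assumption about one plausible use of the keywords; the project's real routing logic belongs to the mining pipeline and is not shown in this part, and pick_wing is a hypothetical name.

```python
import json

# The README example from above, inlined so the sketch is self-contained.
WING_CONFIG = json.loads("""
{
  "default_wing": "wing_general",
  "wings": {
    "wing_kai":       {"type": "person",  "keywords": ["kai", "kai's"]},
    "wing_driftwood": {"type": "project", "keywords": ["driftwood", "analytics", "saas"]}
  }
}
""")


def pick_wing(text, config):
    """Return the wing whose keywords occur most often in `text`.

    Naive substring counting on lowercased text; falls back to
    config["default_wing"] when no keyword matches at all.
    """
    lowered = text.lower()
    best_wing, best_hits = config["default_wing"], 0
    for wing, spec in config["wings"].items():
        hits = sum(lowered.count(keyword) for keyword in spec["keywords"])
        if hits > best_hits:
            best_wing, best_hits = wing, hits
    return best_wing
```

Substring matching is deliberately crude here (for example, "kai" would also match inside "kaiser"); a real implementation would likely want word-boundary matching.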

The reason people_map lives in its own file: it is used by config, by entity_registry, and by AAAK. Splitting it out lets different modules read the same source of truth and avoids "storing the same thing in three places".

# /tmp/mempalace_src/mempalace/config.py
    def save_people_map(self, people_map):
        """Write people_map.json to config directory.

        Args:
            people_map: Dict mapping name variants to canonical names.
        """
        self._config_dir.mkdir(parents=True, exist_ok=True)
        with open(self._people_map_file, "w") as f:
            json.dump(people_map, f, indent=2)
        return self._people_map_file

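Note that this API has a save method but no matching load: the loading logic lives in the people_map property. This kind of small code smell, with read and write responsibilities scattered, has not been fully cleaned up in v3 and counts as P3-level tech debt.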
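For comparison, this is roughly what a symmetric version could look like, with load and save owned by one object. PeopleMapStore is a hypothetical helper written for this article, not a class from the MemPalace source.

```python
import json
import tempfile
from pathlib import Path


class PeopleMapStore:
    """Hypothetical helper: one object owns both reading and writing."""

    def __init__(self, config_dir):
        self._file = Path(config_dir) / "people_map.json"

    def load(self):
        """Return the stored mapping, or an empty dict if the file is absent."""
        if not self._file.is_file():
            return {}
        with self._file.open(encoding="utf-8") as f:
            return json.load(f)

    def save(self, people_map):
        """Write the mapping, creating the config directory if needed."""
        self._file.parent.mkdir(parents=True, exist_ok=True)
        with self._file.open("w", encoding="utf-8") as f:
            json.dump(people_map, f, indent=2, ensure_ascii=False)
        return self._file


# Round trip in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    store = PeopleMapStore(Path(tmp) / "cfg")
    before = store.load()
    store.save({"Mike": "Michael"})
    round_trip = store.load()
```

Keeping both directions in one place means the file path and the encoding only have to be right once.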

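5.4 The Apple Silicon segfault patch: engineering attitude in one line of env

Section 2.2 already showed this snippet; it is repeated here because it touches the configuration system: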

# /tmp/mempalace_src/mempalace/__init__.py
import os
import platform
if platform.machine() == "arm64" and platform.system() == "Darwin":
    os.environ.setdefault("ORT_DISABLE_COREML", "1")  # keeps any value the user already exported

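Using setdefault instead of a plain assignment means "if the user set it themselves, the user wins". This is a good habit of leaving yourself a way out: if CoreML gets fixed some day, users can run export ORT_DISABLE_COREML=0 to turn it back on without touching the code.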
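The difference between setdefault and plain assignment is easy to verify on an ordinary dict, which has the same setdefault semantics as os.environ. apply_default is a name made up for this sketch.

```python
def apply_default(environ, key, value):
    """Mimic the patch: fill the variable in only when it is absent."""
    environ.setdefault(key, value)
    return environ[key]


user_opted_in = {"ORT_DISABLE_COREML": "0"}  # the user exported a value first
untouched = {}                                # nothing was set beforehand

kept = apply_default(user_opted_in, "ORT_DISABLE_COREML", "1")
filled = apply_default(untouched, "ORT_DISABLE_COREML", "1")
```

Here kept stays at the user's "0" while filled becomes "1"; a plain `environ[key] = value` would have overwritten both.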

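The README's "What we're doing" section lists this as Issue #74. The setdefault and the README roadmap line up, which suggests a sound engineering process: find the problem → ship a patch so users can run → track it in a public issue → remove it in a future version.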

6. Engineering trade-offs of ChromaDB as the storage backend 📝

6.1 Why not FAISS / Qdrant / Milvus

pyproject.toml has a single line of vector-related dependencies:

# /tmp/mempalace_src/pyproject.toml
"chromadb>=0.5.0,<0.7",

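ChromaDB was chosen over the mainstream heavyweights for three reasons, one per rejected candidate:

Candidate	Strength	Why MemPalace passed on it
FAISS	Performance ceiling	No metadata filtering; persistence is do-it-yourself
Qdrant	Strong metadata filtering	Needs a separate server process, which violates "local-first"
Milvus	Enterprise-grade	Heavy; same as above
ChromaDB	Embedded (SQLite backend) + native metadata where filtering	✓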

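ChromaDB's decisive selling point is that it is a Python library, not a server. chromadb.PersistentClient(path=...) opens a local directory directly: no server, no port, no daemon. For a project that bills itself as "install and go", that is a hard requirement.

Every query in layers.py starts like this: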

        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "No palace found."

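Reopening the client on every call looks wasteful, but ChromaDB's PersistentClient is cached (a given path is only initialized once). What this "open on every call" style buys is statelessness: any process or thread can open the same palace directly, with no DI, no singleton, and no lock.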

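6.2 The SQLite variable-limit pitfall in batch reads

Section 4.3 already mentioned this pitfall. To restate the cause:

ChromaDB 0.5.x → SQLite → SQLite caps the number of bound parameters per prepared statement, with a default limit of 999 (SQLITE_MAX_VARIABLE_NUMBER in builds before 3.32.0).

So a single col.get(limit=5000) blows up on older SQLite versions. MemPalace's answer is to fetch in batches of _BATCH = 500: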

        _BATCH = 500
        docs, metas = [], []
        offset = 0
        while True:
            kwargs = {"include": ["documents", "metadatas"], "limit": _BATCH, "offset": offset}
            ...

500 < 999 leaves roughly a 2x safety margin. A magic number like this is worth copying into your own code.
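Since the excerpt elides the loop body, here is a self-contained sketch of the same limit/offset pagination pattern. It is not the code behind the "..." in layers.py: fetch_all and FakeCollection are hypothetical names, and the fake only imitates the limit/offset/include keywords of a collection's get call so the example runs without ChromaDB installed.

```python
def fetch_all(get_page, batch=500):
    """Drain a paginated source in fixed-size batches.

    get_page(limit=..., offset=..., include=...) must return a dict with
    "documents" and "metadatas" lists, as the fake below does.
    """
    docs, metas = [], []
    offset = 0
    while True:
        page = get_page(limit=batch, offset=offset,
                        include=["documents", "metadatas"])
        page_docs = page.get("documents") or []
        if not page_docs:
            break  # nothing left
        docs.extend(page_docs)
        metas.extend(page.get("metadatas") or [])
        offset += len(page_docs)
        if len(page_docs) < batch:
            break  # a short page means this was the last one
    return docs, metas


class FakeCollection:
    """In-memory stand-in used only to exercise fetch_all."""

    def __init__(self, n):
        self._docs = [f"doc-{i}" for i in range(n)]
        self.calls = 0

    def get(self, limit, offset, include=None):
        self.calls += 1
        chunk = self._docs[offset:offset + limit]
        metas = [{"i": offset + k} for k in range(len(chunk))]
        return {"documents": chunk, "metadatas": metas}


col = FakeCollection(1234)
docs, metas = fetch_all(col.get)
```

With 1234 items and a batch size of 500, the loop makes three calls (500, 500, 234) and stops on the short page without issuing a fourth, empty request.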

6.3 Why wing + room filtering lifts R@10 by 34 points

Back to that table:

Search all closets:          60.9%  R@10
Search within wing:          73.1%  (+12%)
Search wing + hall:          84.8%  (+24%)
Search wing + room:          94.8%  (+34%)

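The underlying reason is simple: in embedding space, "topic similarity" and "person/project ownership" are entangled. When you search for "auth migration", the embedding model pulls the auth discussions of every project close together. That is good for cross-project queries and noise for single-project ones. Metadata filtering shrinks the candidate set from "the whole store" to "one wing", so the similarity computation happens in a cleaner space.

A rough intuition (a heuristic, not a proof): with N drawers, top-10 precision improves roughly linearly as the candidate set shrinks. Going from the whole store N → a wing at roughly N/5 → a room at roughly N/50 shrinks the candidate set by more than an order of magnitude. That is where the 34 points come from (94.8% - 60.9% = 33.9 percentage points).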
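The "+12 / +24 / +34" figures in the table are percentage-point differences from the unfiltered baseline, not relative improvements. A few lines of arithmetic on the four numbers from the table make the distinction explicit; the variable names are mine.

```python
# R@10 values copied from the table above.
RECALL_AT_10 = {
    "all closets": 60.9,
    "within wing": 73.1,
    "wing + hall": 84.8,
    "wing + room": 94.8,
}

baseline = RECALL_AT_10["all closets"]

# Absolute gain in percentage points over the unfiltered baseline.
point_gains = {
    scope: round(score - baseline, 1)
    for scope, score in RECALL_AT_10.items()
}

# Relative gain, as a percentage of the baseline score.
relative_gains = {
    scope: round(100 * (score - baseline) / baseline, 1)
    for scope, score in RECALL_AT_10.items()
}

for scope in RECALL_AT_10:
    print(f"{scope:12s} +{point_gains[scope]:.1f} pts  "
          f"(+{relative_gains[scope]:.1f}% relative)")
```

Read this way, wing + room filtering adds about 34 points on top of the baseline, which is a relative improvement of well over 50%.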

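Why this is not a moat:

The author already conceded this in the April 7 letter: "Metadata filtering is a standard ChromaDB feature, not a novel retrieval mechanism." Worth doing, worth explaining, but not a secret sauce. The real secret sauce is "figuring out how to cut conversations into the two labels wing and room". That part lives in convo_miner.py and room_detector_local.py and is left for part 06.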

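7. MemPalace vs OpenClaw vs Mem0 vs Zep: one big table 🔍

Dimension	MemPalace v3.0.14	OpenClaw	Mem0	Zep (Graphiti)
Core philosophy	Store everything, then make it findable	LLM-as-curator	LLM extracts facts	Temporal KG first
Storage form	Raw verbatim text	Extracted facts	Extracted facts	Entity-relation graph
Vector backend	ChromaDB (embedded, SQLite)	Qdrant / Postgres	Qdrant	Neo4j (cloud)
Graph backend	In-house SQLite triple table	None	None	Neo4j
LLM calls on write	0 (default)	On every write	On every write	On every write
LLM calls on query	0 (L3 is pure semantic search)	Common	Common	Common
Local/cloud	Local only	Both	SaaS	SaaS
Price	$0	Free when self-hosted	$19-249/month	$25/month and up
API key required	No	Yes (LLM calls)	Yes	Yes
Memory organization model	Wing/Hall/Room/Closet/Drawer/Tunnel	Hierarchical tags	Facts with metadata	Temporal triples
Explicit scoping	wing + room (two levels)	tag	metadata	graph path
Layered loading	Explicit L0/L1/L2/L3	Implicit (via retrieval)	None	None
LongMemEval R@5	96.6% (raw)	Not published	~85%	~85%
Best fit	Individuals/small teams, sensitive to "loss of original text"	Multi-tenant, centralized team knowledge base	SaaS-friendly, no ops required	Strong temporal reasoning
Engineering complexity	Low	Medium	Low	High
Dependency lock-in risk	Very low (2 pip packages)	Medium	High (service API)	High (Neo4j)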

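My personal take (feel free to argue):

If you are building a personal assistant / developer sidekick: MemPalace is currently the most no-brainer option to deploy.
If you are building a multi-user SaaS: MemPalace is not a fit. It has no multi-tenant isolation, and ChromaDB's metadata filtering slows down at very large scale. For this scenario Mem0 / self-hosted Zep are still more mature.
If you need temporal reasoning ("who was working on what in June 2025"): Zep's Graphiti is still technically the strongest. MemPalace's KG already has kg_timeline / kg_invalidate, but they are not yet wired up to fact_checker (details in part 07).
If you are chasing benchmark numbers: MemPalace's 96.6% in raw mode is currently the SOTA of the "zero-API" group. But be aware that this is a score on a single dataset, LongMemEval; on LoCoMo it is only 60.3%, so it does not transfer that well.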

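8. Where 96.6% comes from, and the real trade-offs of AAAK 🎯

I pulled this section out because I don't want you to misjudge what "96.6%" means. Cross-checking the April 7 letter against the source, the causal chain behind 96.6% is actually: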

┌────────────────────────────────────────────────────────────┐
│ 1. Chunk the raw text (by exchange pair, no summarization) │
│         ↓                                                  │
│ 2. Write metadata: {wing, room, hall, source_file}         │
│         ↓                                                  │
│ 3. Embed (ChromaDB default all-MiniLM-L6-v2)               │
│         ↓                                                  │
│ 4. Query: col.query(query, where={wing,room})              │
│         ↓                                                  │
│ 5. Return top-5                                            │
└────────────────────────────────────────────────────────────┘

Contribution factors (my own estimate):
  - Chunking strategy (split by exchange):    ~10 pts
  - Metadata filtering (wing/room):           ~30 pts (roughly matching the +34 points)
  - No summarization of the original text:    ~10 pts
  - ChromaDB + MiniLM baseline:               ~46 pts

The real state of AAAK (per the April 7 letter):

Mode	LongMemEval R@5
Raw	96.6%
AAAK	84.2% ← a regression of 12.4 percentage points

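In other words, turning on AAAK compression currently makes retrieval worse. How the author handled this result is well worth learning from:

Admitted it publicly at the top of the README
Explicitly relabeled the headline number as "raw mode"
Added --mode raw/aaak/rooms to the benchmark script so users can compare for themselves
Kept iterating instead of deleting AAAK

This reflects an engineering culture of "making every layer of the system independently testable". If every subcomponent of a system can run a benchmark on its own, every refactor has a floor: you can always get back to the last working version.

AAAK itself, as an "abbreviation dialect that LLMs can read directly", still has value of its own, especially when the context has to hold a very large number of entity names, such as a specialist agent's diary. We will cover that in part 07 together with dialect.py (1075 lines).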

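9. 📚 Summary and what's next

A recap of the ground this part covered:

Philosophy: MemPalace represents the "index-first school" of Agent Memory. Its bet is "don't let the LLM decide what to remember; let retrieval decide what to find", in direct opposition to OpenClaw's "LLM curation" school.
Data model: the six concepts Wing / Hall / Room / Closet / Drawer / Tunnel form a spatial metaphor, but physically they are just metadata on a single ChromaDB collection, mempalace_drawers. Logical ≠ physical, and that is good design.
Four-layer memory stack: L0 identity (~100t, resident) / L1 essential story (~500-800t, resident) / L2 structured recall filtered by wing-room / L3 full-store semantic search. Key metric: resident budget ≤ 1000 tokens, total budget ≤ 5% of the context.
Configuration system: three-level priority env > config.json > defaults; identity.txt is plain text, wing_config.json / people_map.json are structured; the one-line setdefault for Apple Silicon is the engineering attitude in miniature.
Storage choice: ChromaDB's embedded nature (SQLite backend, no server) is the technical foundation of "local-first". _BATCH=500 is a necessary accommodation of SQLite's variable limit.
Benchmark: the 96.6% R@5 is real but has to be read correctly. It comes from raw mode + wing/room filtering, not from AAAK and not from reranking; the 34-point gain comes from a standard metadata filter, not black magic.
Honesty statement: the April 7 letter at the top of the README is a piece of self-examination worth turning into PR-review teaching material. It tells you this project "won't try to fool you".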

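How the next three parts divide the work:

[Part 06] The mining pipeline: convo_miner.py, miner.py, normalize.py, split_mega_files.py, room_detector_local.py, general_extractor.py, entity_detector.py. This is the full pipeline by which a raw conversation export is cut into chunks, tagged with wing/room/hall metadata, and finally written into ChromaDB. We will specifically analyze the 5 chat export formats it supports and the exchange-pair chunking strategy.
[Part 07] AAAK + knowledge graph + retrieval: dialect.py (1075 lines), entity_registry.py (639 lines), knowledge_graph.py, palace_graph.py, searcher.py, fact_checker.py. We will explain AAAK's actual encoding rules, why it regressed by 12.4 points, the structure of the KG's SQLite triple table, and how temporal validity is implemented.
[Part 08] MCP server, hooks, and hands-on: the 19 tools in mcp_server.py, the two-stage state machine (SAVE_INTERVAL / stop_hook_active) in hooks/mempal_save_hook.sh, integration with Claude Code's plugin marketplace, and an end-to-end demo: from an empty palace to "your Claude remembers the decision you made last week".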

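If you are an engineer writing your own Agent Memory, the one actionable piece of advice worth remembering from this part is:

Split your memory system into two parts, "resident L0/L1 + on-demand L2/L3", fix a hard byte budget for the resident part, and then work backwards from that budget to your storage model.

This decides your project's success or failure one step earlier than "which embedding model to use" or "whether to add a KG".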

10. 📖 References

MemPalace source files (covered in this part):

/tmp/mempalace_src/README.md: design philosophy, full view of the palace metaphor, benchmark table, the April 7 honesty statement
/tmp/mempalace_src/pyproject.toml: dependencies (chromadb>=0.5.0,<0.7, pyyaml>=6.0), version 3.0.14
/tmp/mempalace_src/mempalace/__init__.py: Apple Silicon CoreML segfault patch, ChromaDB posthog noise suppression
/tmp/mempalace_src/mempalace/version.py: single source of version truth: __version__ = "3.0.14"
/tmp/mempalace_src/mempalace/config.py: MempalaceConfig, DEFAULT_PALACE_PATH, DEFAULT_TOPIC_WINGS, DEFAULT_HALL_KEYWORDS
/tmp/mempalace_src/mempalace/layers.py: four-layer memory stack L0/L1/L2/L3, MemoryStack, batch reads with _BATCH=500
/tmp/mempalace_src/mempalace/entity_registry.py: top-level structure (details left to part 07)
/tmp/mempalace_src/mempalace/dialect.py: AAAK overview (details left to part 07)
/tmp/mempalace_src/hooks/README.md: Stop / PreCompact state machine of the auto-save hooks

External links:

GitHub: milla-jovovich/mempalace
LongMemEval original paper: Wu et al., 2024
LoCoMo original paper: Maharana et al., 2024
ChromaDB official documentation: https://docs.trychroma.com
Key issues acknowledged by the author: #27 (AAAK spec), #39 (M2 Ultra reproduction), #43 (token-count criticism), #74 (Apple Silicon segfault), #100 (ChromaDB version pin), #110 (hook shell injection)