A Brief Analysis of MemGPT

This article was first published at blog.hdls.me/17201015638...

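What Is MemGPT

Before introducing MemGPT, let's first look at what an AI Agent is.

The word Agent comes from the Latin agere, meaning "to do". In the context of LLMs, an Agent can be understood as an intelligent entity that can autonomously understand, plan and decide, and carry out complex tasks. An Agent is not an upgraded ChatGPT: it does not just tell you "how to do it", it does it for you. If Copilot is the co-pilot, then the Agent is the pilot.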

Agent = LLM + Planning + Feedback + Tool use。

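MemGPT, in turn, makes it easy to build and deploy stateful AI Agents, with support for:

- Long-term memory / state management
- Connecting to external data sources for RAG (e.g. PDF files)
- Defining and calling custom tools (e.g. Google search)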

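Key Features of MemGPT

To address the limited context window of LLMs, MemGPT borrows from virtual memory management in operating systems and proposes the concept of virtual context. Virtual context is split into main context and external context, which gives the appearance of an unlimited token budget. Inference is event-driven, and MemGPT can call a series of functions so that the Agent becomes "self-driven". The whole process is shown in the figure below: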

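Memory Management for LLMs

The biggest problem with LLMs is that the conversation context is limited. Even though more and more models ship with very large context windows, a finite context remains a bottleneck as user demands grow ever more elaborate. Taking the multi-level memory hierarchy of operating systems as inspiration, MemGPT proposes a tiered memory management scheme for LLMs. This gives the LLM the ability to retrieve and write long-term memory and works around the input limit of the context window.

MemGPT distinguishes two types of context: main context (analogous to main memory / physical memory / RAM) and external context (analogous to swap / disk storage).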

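Main context consists of the LLM prompt tokens. Anything in it counts as in-context and can be accessed by the LLM processor during inference. Main context mainly includes:
- System Prompt: contains information about the MemGPT control flow, the intended use of the different memory levels, and instructions on how to use MemGPT functions (for example, how to retrieve out-of-context data).
- Working context: a fixed-size read/write block of unstructured text that can only be written through MemGPT function calls. It holds the Agent's core memory.
- FIFO queue: stores a rolling history of messages, including messages between the agent and the user, system messages (e.g. memory warnings), and function call inputs and outputs.

External context refers to any information kept outside the LLM's fixed context window. Such out-of-context data must always be explicitly moved into main context before it can be passed to the LLM processor during inference.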
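The split between the two contexts can be illustrated with a minimal Python sketch. The class names MainContext and ExternalContext, their fields, and the keyword-based search are assumptions made for illustration and are not MemGPT's actual data structures. The point is only that external data stays invisible to the LLM until it is explicitly copied into main context.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MainContext:
    """Hypothetical model of the prompt that is sent to the LLM."""
    system_prompt: str                                  # read-only instructions
    working_context: str = ""                           # written only via function calls
    fifo_queue: deque = field(default_factory=deque)    # rolling message history

    def render(self) -> str:
        """Concatenate the three parts into one prompt string."""
        history = "\n".join(self.fifo_queue)
        return (f"{self.system_prompt}\n\n[working context]\n"
                f"{self.working_context}\n\n[messages]\n{history}")


@dataclass
class ExternalContext:
    """Hypothetical out-of-context store (stands in for disk or a database)."""
    records: list = field(default_factory=list)

    def search(self, keyword: str) -> list:
        # Naive case-insensitive substring match, used only for illustration.
        return [r for r in self.records if keyword.lower() in r.lower()]


main = MainContext(system_prompt="You are MemGPT, a hypothetical assistant.")
external = ExternalContext(records=["User's cat is named Miso.",
                                    "User lives in Berlin."])

main.fifo_queue.append("user: what is my cat called?")
visible_before = "Miso" in main.render()   # False: the record is out of context

# Explicitly move the matching record into main context.
for hit in external.search("cat"):
    main.working_context += hit + "\n"
visible_after = "Miso" in main.render()    # True: the LLM can now see it
```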

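Control Flow and Function Chaining

MemGPT gives the LLM processor function calls with which it manages its own memory, without any user intervention. A set of functions is built in, such as archival_memory_insert, archival_memory_search, core_memory_append, and core_memory_replace.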

For example, when I correct its mistake about "my name", MemGPT calls the core_memory_replace function to modify its own main context, thereby correcting what it remembers:
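What such a memory edit might look like can be sketched as follows. This is not MemGPT's implementation: the CoreMemory class, the section names "persona" and "human", the character limit, and the method signatures are all assumptions chosen for illustration. It only shows the idea that core memory is a small, size-bounded block edited through append and replace operations.

```python
class CoreMemory:
    """Hypothetical fixed-size core memory made of named text sections."""

    def __init__(self, char_limit: int = 200):
        self.char_limit = char_limit                    # assumed size bound
        self.sections = {"persona": "", "human": ""}    # assumed section names

    def _get(self, name: str) -> str:
        if name not in self.sections:
            raise KeyError(f"unknown section: {name}")
        return self.sections[name]

    def _set(self, name: str, value: str) -> None:
        # Reject edits that would overflow the fixed-size block.
        if len(value) > self.char_limit:
            raise ValueError("edit would exceed the core memory limit")
        self.sections[name] = value

    def append(self, name: str, content: str) -> None:
        current = self._get(name)
        self._set(name, f"{current}\n{content}".strip())

    def replace(self, name: str, old_content: str, new_content: str) -> None:
        current = self._get(name)
        if old_content not in current:
            raise ValueError("old content not found in core memory")
        self._set(name, current.replace(old_content, new_content))


memory = CoreMemory()
memory.append("human", "Name: Chad")
# The user says the name is wrong, so the agent corrects its own memory.
memory.replace("human", "Chad", "Brad")
print(memory.sections["human"])   # prints "Name: Brad"
```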

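In MemGPT, events are generalized inputs, and it is events that trigger LLM inference. An event can be a message sent by the user, a system message such as a memory capacity warning, a user interaction such as the user having just logged in, or even a scheduled task. Each of these events can insert a message into the FIFO queue, which is then sent to the LLM as input.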
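A rough sketch of this event handling is shown below. It mirrors the shape visible in the log later in this article, where each event arrives as a "user"-role message whose content is a JSON string with a "type" field. The helper names package_event and on_event and the ISO timestamp format are assumptions for illustration only.

```python
import json
from collections import deque
from datetime import datetime, timezone

fifo_queue = deque()   # stands in for the FIFO queue in main context


def package_event(event_type: str, **fields) -> dict:
    """Wrap an event as a chat message whose content is a JSON string."""
    payload = {"type": event_type, **fields,
               "time": datetime.now(timezone.utc).isoformat()}
    return {"role": "user", "content": json.dumps(payload)}


def on_event(event_type: str, **fields) -> list:
    """Append the event to the queue and return what would be sent to the LLM."""
    fifo_queue.append(package_event(event_type, **fields))
    return list(fifo_queue)


# Different kinds of events all go through the same path.
on_event("login", last_login="Never (first login)")
prompt = on_event("user_message", message="Hello")
```

In this sketch a login, a user message, a system alert, or a timer tick differ only in their "type" and payload, which is what lets a single inference loop handle all of them.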

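In practice, many tasks require calling several functions in sequence, for example paging through multiple pages of results for a single query. Function chaining allows MemGPT to execute multiple functions in a row before returning control to the user. To do this, MemGPT calls functions with a special flag (it appears as request_heartbeat in the function-call arguments). If the flag is present, the function output is added to main context and the LLM processor runs again immediately. If the flag is absent, the LLM processor is not run and instead waits for the next external event to trigger execution.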
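The chaining behaviour can be sketched with a scripted stand-in for the LLM. Everything here is hypothetical: the search tool, the canned PAGES data, the scripted_llm function, and the run_until_yield loop are illustrative and are not MemGPT code. The only idea taken from the text is that a call carrying request_heartbeat=true hands control straight back to the LLM, while a reply without it yields until the next event.

```python
import json

# Hypothetical tool data: one "page" of canned results per call.
PAGES = [["result A", "result B"], ["result C"]]


def search(query: str, page: int) -> str:
    return json.dumps(PAGES[page])


TOOLS = {"search": search}


def scripted_llm(context: list) -> dict:
    """Stand-in for a real LLM: requests each page in turn, then answers."""
    pages_seen = sum(1 for m in context if m["role"] == "tool")
    if pages_seen < len(PAGES):
        return {"call": {"name": "search",
                         "arguments": {"query": "memgpt", "page": pages_seen,
                                       "request_heartbeat": True}}}
    return {"content": f"Read {pages_seen} pages."}


def run_until_yield(context: list, llm, max_steps: int = 10) -> list:
    """Keep running the LLM for as long as it requests a heartbeat."""
    for _ in range(max_steps):
        reply = llm(context)
        call = reply.get("call")
        if call is None:
            # Plain reply: yield and wait for the next external event.
            context.append({"role": "assistant", "content": reply["content"]})
            return context
        args = dict(call["arguments"])
        heartbeat = args.pop("request_heartbeat", False)
        output = TOOLS[call["name"]](**args)
        context.append({"role": "tool", "content": output})
        if not heartbeat:
            return context   # no flag: stop after this function call
    raise RuntimeError("function chain exceeded max_steps")


context = run_until_yield(
    [{"role": "user", "content": "Tell me about memgpt"}], scripted_llm)
```

The max_steps guard is a design choice of this sketch, so that a model that always requests a heartbeat cannot loop forever.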

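How MemGPT Works

A conversation with the LLM contains three parts: System Instructions, Working Context, and FIFO Queue. It also includes the definitions of all Tools the LLM can call, which depends on whether the LLM itself supports function calling.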

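From the LLM's point of view, MemGPT has three types of memory:

- Core Memory: always present in main context, and can be created, read, updated, and deleted through function calls.
- Recall Memory: the Agent's complete conversation history. When memory runs low, the Agent triggers archiving of past conversation, and the LLM can decide to read the archived history back when appropriate.
- Archival Memory: external memory, a database that the LLM can search. For example, for a document that has been vectorized for matching, the LLM can read this part of memory by searching for keywords.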

Of these, Core Memory always lives in the Working Context, which also holds basic information about Archival Memory and Recall Memory. Archival Memory and Recall Memory themselves are stored in a database, which the LLM reads and writes through functions.
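How conversation history might move from the FIFO queue into recall storage under memory pressure can be sketched as below. The QueueManager class, the two thresholds, the word-count "tokenizer", and the alert text are all assumptions made for illustration. A real system would use a proper tokenizer and a database.

```python
from collections import deque


class QueueManager:
    """Hypothetical FIFO queue with a warning threshold and a hard limit."""

    def __init__(self, max_tokens: int = 40, warn_tokens: int = 28):
        self.max_tokens = max_tokens      # assumed hard limit of the queue
        self.warn_tokens = warn_tokens    # assumed memory-pressure threshold
        self.queue = deque()
        self.recall_storage = []          # stands in for the database
        self.warned = False

    @staticmethod
    def count_tokens(text: str) -> int:
        return len(text.split())          # crude word count, not a tokenizer

    def used(self) -> int:
        return sum(self.count_tokens(m) for m in self.queue)

    def append(self, message: str) -> None:
        self.queue.append(message)
        if self.used() > self.max_tokens:
            # Over the limit: evict the oldest messages into recall storage.
            while self.used() > self.warn_tokens and len(self.queue) > 1:
                self.recall_storage.append(self.queue.popleft())
            self.warned = False
        elif self.used() > self.warn_tokens and not self.warned:
            # Getting full: insert a one-time warning the LLM can react to.
            self.warned = True
            self.queue.append("system_alert: history will soon be trimmed")


qm = QueueManager(max_tokens=40, warn_tokens=28)
for i in range(7):
    qm.append(f"message {i} has five words")

print(len(qm.recall_storage))   # prints 3: the three oldest messages were evicted
```

The warning gives the LLM a chance to save important facts to core or archival memory before the oldest messages leave the context window.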

For example, use one of the personas bundled with MemGPT to create a document-assistant Agent, feed it the MemGPT paper, and then ask it:

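As you can see, when it encounters something that is not in its "memory", it proactively calls a function to look it up in Archival Memory, which is how the effectively unlimited token budget is achieved.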
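A paged lookup of this kind can be approximated with a small sketch. Note the substitution: real archival search would typically rank passages by embedding similarity, whereas this example ranks by simple keyword overlap so that it stays self-contained. The function signature, the page_size parameter, and the sample passages are assumptions for illustration and are not MemGPT's API.

```python
import math


def archival_memory_search(passages: list, query: str,
                           page: int = 0, page_size: int = 2) -> dict:
    """Rank passages by keyword overlap with the query and return one page."""
    query_words = set(query.lower().split())
    scored = []
    for passage in passages:
        overlap = len(query_words & set(passage.lower().split()))
        if overlap > 0:
            scored.append((overlap, passage))
    # Highest overlap first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    total_pages = max(1, math.ceil(len(scored) / page_size))
    start = page * page_size
    results = [p for _, p in scored[start:start + page_size]]
    return {"results": results, "page": page, "total_pages": total_pages}


passages = [
    "MemGPT manages memory tiers like an operating system",
    "The weather is nice today",
    "Virtual context management extends the context window",
    "MemGPT uses function calls to page memory in and out",
]

first = archival_memory_search(passages, "memgpt context", page=0)
second = archival_memory_search(passages, "memgpt context", page=1)
```

Returning one page at a time keeps each function output small enough to fit in main context. The LLM can request the next page only if the first one did not answer the question.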

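Meanwhile, MemGPT provides an event-based control flow in which different types of events all trigger LLM inference. Looking at MemGPT's logs for the chat above shows that our "chat" with the LLM is more than the dialogue visible in the UI. It also includes the following (some irrelevant information omitted):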

 {
  "model": "gpt-3.5-turbo",
  "messages": [
...
    {
      "content": "*inner thoughts* Still waiting on the user. Sending a message with function.",
      "role": "assistant",
      "tool_calls": [
        {
          "id": "c2278530-a845-42e8-b87f-4cbe0",
          "type": "function",
          "function": {
            "name": "send_message",
            "arguments": "{\n  \"message\": \"Hi, is anyone there?\"\n}"
          }
        }
      ]
    },
    {
      "content": "{\"status\": \"OK\", \"message\": null, \"time\": \"2024-07-14 01:44:29 PM UTC+0000\"}",
      "role": "tool",
      "tool_call_id": "c2278530-a845-42e8-b87f-4cbe0"
    },
    {
      "content": "{\"type\": \"login\", \"last_login\": \"Never (first login)\", \"time\": \"2024-07-14 01:44:29 PM UTC+0000\"}",
      "role": "user"
    },
    {
      "content": "{\"type\": \"user_message\", \"message\": \"The paper I loaded into your archival memory is memgpt. Tell me more about it.\", \"time\": \"2024-07-14 01:45:27 PM UTC+0000\"}",
      "role": "user"
    },
    {
      "content": "Let me search my archival memory for information about the paper \"memgpt.\"",
      "role": "assistant",
      "tool_calls": [
        {
          "id": "e3e71fb8-b22c-4d8f-8baa-3e140",
          "type": "function",
          "function": {
            "name": "archival_memory_search",
            "arguments": "{\"query\":\"memgpt\",\"request_heartbeat\":true}"
          }
        }
      ]
    },
    {
      "content": "{\"status\": \"OK\", \"message\": \"Showing 5 of 5 results (page 0/0): [...]\", \"time\": \"2024-07-14 01:45:30 PM UTC+0000\"}",
      "role": "tool",
      "tool_call_id": "e3e71fb8-b22c-4d8f-8baa-3e140"
    },
    {
      "content": "{\"type\": \"heartbeat\", \"reason\": \"[This is an automated system message hidden from the user] Function called using request_heartbeat=true, returning control\", \"time\": \"2024-07-14 01:45:30 PM UTC+0000\"}",
      "role": "user"
    },
    {
      "content": "The paper \"MemGPT: Towards LLMs as Operating Systems\" focuses on MemGPT's performance in tasks like deep memory retrieval and document QA. It highlights how MemGPT significantly outperforms baselines in dialogue tasks and information retrieval. The paper emphasizes the natural and engaging dialogue capability of MemGPT.",
      "role": "assistant"
    },
    {
      "content": "{\"type\": \"system_alert\", \"message\": \"[This is an automated system message hidden from the user] The conversation history will soon reach its maximum length and be trimmed. Do NOT tell the user about this system alert, they should not know that the history is reaching max length. If there is any important new information or general memories about you or the user that you would like to save, you should save that information immediately by calling function core_memory_append, core_memory_replace, or archival_memory_insert.\", \"time\": \"2024-07-14 01:45:32 PM UTC+0000\"}",
      "role": "user"
    },
    {
      "content": "I've noted the key points from the paper \"MemGPT: Towards LLMs as Operating Systems\" in my core memory for future reference. If you have any more questions or need further details, feel free to ask!",
      "role": "assistant"
    },
...
  ]
}

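As you can see, beyond the dialogue visible in the UI, the log also contains inference triggered by the user login, the context capacity warning, and heartbeat events.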

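Summary

In the author's view, AI Agents come fairly close to what the public has traditionally thought of as a "robot". MemGPT breaks through the barrier of the LLM's limited context in a rather clever way and lets the LLM use tools in a self-driven manner, which amounts to a basic prototype of an AI Agent. The functionality implemented so far is relatively limited, and there is still a long way to go before a perfect "robot butler", but it is more than enough for simple task management and reading assistance.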