[Paper Notes] MemGPT: Towards LLMs as Operating Systems


This post introduces the paper MemGPT: Towards LLMs as Operating Systems. As the title says, the idea is to treat the LLM as an operating system.



1. Overview




2. MemGPT (MemoryGPT)

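Figure 3. In MemGPT, a fixed-context LLM processor is augmented with a hierarchical memory system and with functions that let it manage its own memory. The LLM's prompt tokens (inputs), or main context, consist of the system instructions, the working context, and a FIFO queue. The LLM's completion tokens (outputs) are interpreted as function calls by the function executor. MemGPT uses functions to move data between main context and external context (the archival and recall storage databases). The LLM can request an immediate follow-up LLM inference by generating a special keyword argument (request_heartbeat=true) in its output, which lets it chain function calls together. Function chaining is what allows MemGPT to perform multi-step retrieval to answer user queries.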
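Below is a minimal sketch of the heartbeat-driven function chaining described in the caption. It assumes each completion is a JSON object of the form `{"function": ..., "args": {...}}`. That format is hypothetical and is not necessarily MemGPT's actual wire format. The two tool functions are hypothetical stubs. The executor treats `request_heartbeat` as an ordinary argument and strips it off before calling the function.

```python
import json
from typing import Callable

# Hypothetical tool stubs standing in for real MemGPT functions.
def conversation_search(query: str) -> str:
    return f"results for {query!r}"  # a real version would query recall storage

def send_message(message: str) -> str:
    return f"sent: {message}"

FUNCTIONS: dict[str, Callable[..., str]] = {
    "conversation_search": conversation_search,
    "send_message": send_message,
}

def run_step(llm: Callable[[list[str]], str], context: list[str], max_steps: int = 10) -> list[str]:
    """Call the LLM repeatedly while it requests a heartbeat (function chaining)."""
    executed = []
    for _ in range(max_steps):
        call = json.loads(llm(context))           # completion tokens parsed as a function call
        name = call["function"]
        args = dict(call.get("args", {}))
        heartbeat = args.pop("request_heartbeat", False)
        result = FUNCTIONS[name](**args)
        executed.append(name)
        context.append(f"[{name} returned] {result}")  # function output goes back into context
        if not heartbeat:
            break                                  # no heartbeat: yield control until the next event
    return executed
```

The following usage example drives the loop with a scripted "LLM". The first call asks for a heartbeat, so a second inference runs immediately.

```python
scripted = iter([
    '{"function": "conversation_search", "args": {"query": "Hawaii", "request_heartbeat": true}}',
    '{"function": "send_message", "args": {"message": "A shell necklace!"}}',
])
print(run_step(lambda ctx: next(scripted), ["user: what did I get in Hawaii?"]))
```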

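MemGPT's OS-inspired multi-level memory architecture distinguishes two primary memory types: main context (analogous to main memory, physical memory, or RAM) and external context (analogous to disk memory or disk storage). Main context consists of the LLM's prompt tokens. Anything in main context is considered in-context and can be accessed by the LLM processor during inference. External context is any information held outside the LLM's fixed context window. Out-of-context data must be explicitly moved into main context before it can be passed to the LLM processor during inference. MemGPT provides function calls that let the LLM processor manage its own memory autonomously, without user intervention.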
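Here is a toy illustration of the split between main and external context. It assumes plain Python lists stand in for both tiers and that a case-insensitive substring match stands in for real retrieval. Only `main` is ever rendered for the LLM, so external data becomes visible only after an explicit `page_in` call. `TwoTierMemory` and `page_in` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class TwoTierMemory:
    """Toy model: only `main` is visible to the LLM; `external` must be paged in explicitly."""
    main: list[str] = field(default_factory=list)      # in-context, like RAM
    external: list[str] = field(default_factory=list)  # out-of-context, like disk

    def visible_to_llm(self) -> str:
        # Everything the processor can read during inference.
        return "\n".join(self.main)

    def page_in(self, query: str) -> int:
        """Copy external entries containing `query` into main context; return how many were copied."""
        hits = [text for text in self.external if query.lower() in text.lower()]
        self.main.extend(hits)
        return len(hits)

mem = TwoTierMemory(main=["system: you are MemGPT"],
                    external=["User's dog is named Rex", "User lives in SF"])
print("Rex" in mem.visible_to_llm())  # the fact is still on "disk"
mem.page_in("dog")
print("Rex" in mem.visible_to_llm())  # now it is in context
```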

2.1 Main context (prompt tokens)

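In MemGPT, the prompt tokens are split into three contiguous sections: the system instructions, the working context, and the FIFO queue.

- The system instructions are read-only (static). They contain information about the MemGPT control flow, the intended use of the different memory levels, and instructions for using the MemGPT functions (for example, how to retrieve out-of-context data).
- The working context is a fixed-size, read/write block of unstructured text that can be written only through MemGPT function calls. In conversational settings, it stores key facts, preferences, and other important information about the user and about the persona the agent is adopting. This lets the agent converse fluently with the user.
- The FIFO queue stores a rolling history of messages. These include messages between the agent and the user, system messages (such as memory warnings), and the inputs and outputs of function calls. The first index of the FIFO queue holds a system message containing a recursive summary of the messages that have been evicted from the queue.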
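The three-part prompt layout can be sketched as follows. The class name, the `working_context_append` write function, and the 2000-character limit are hypothetical. The sketch shows three things: working context has a fixed size, it has a single write path, and slot 0 of the FIFO queue holds the recursive summary.

```python
from dataclasses import dataclass, field

@dataclass
class MainContext:
    system_instructions: str                 # read-only by convention: no method writes it
    working_context: str = ""                # fixed-size read/write block
    working_context_limit: int = 2000        # hypothetical size cap, in characters
    fifo_queue: list[str] = field(
        default_factory=lambda: ["[summary] (no evicted messages yet)"]  # index 0: recursive summary
    )

    def working_context_append(self, text: str) -> bool:
        """The only write path to working context (invoked via a function call)."""
        updated = (self.working_context + "\n" + text).strip()
        if len(updated) > self.working_context_limit:
            return False                     # refuse writes that would exceed the fixed size
        self.working_context = updated
        return True

    def prompt(self) -> str:
        # Three contiguous sections, in the order described above.
        return "\n\n".join([self.system_instructions,
                            self.working_context,
                            "\n".join(self.fifo_queue)])

ctx = MainContext("You are MemGPT.")
ctx.working_context_append("User prefers tea")
ctx.fifo_queue.append("user: hi")
print(ctx.prompt())
```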

2.2 Queue manager

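The queue manager manages messages in recall storage and in the FIFO queue. When the system receives a new message, the queue manager appends it to the FIFO queue, concatenates the prompt tokens, and triggers LLM inference to produce the LLM output (the completion tokens). It writes both the incoming message and the generated output to recall storage (the MemGPT message database). When a MemGPT function call retrieves messages from recall storage, the queue manager appends them to the back of the queue, reinserting them into the LLM's context window.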
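Here is a simplified sketch of the queue manager's message path. It assumes `llm` is any callable that maps a prompt string to a completion string. It also assumes recall storage is an in-memory list searched by substring; a real system would use a database. Eviction is omitted here.

```python
from typing import Callable

class QueueManager:
    """Sketch: the FIFO queue feeds the prompt; recall storage keeps every message."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm
        self.fifo_queue: list[str] = []
        self.recall_storage: list[str] = []    # stand-in for the message database

    def on_message(self, message: str) -> str:
        self.fifo_queue.append(message)
        completion = self.llm("\n".join(self.fifo_queue))  # concatenate prompt tokens, run inference
        self.fifo_queue.append(completion)
        self.recall_storage.extend([message, completion])   # persist both the input and the output
        return completion

    def recall(self, query: str) -> list[str]:
        """Function-call handler: search recall storage and re-insert hits at the back of the queue."""
        hits = [m for m in self.recall_storage if query in m]
        self.fifo_queue.extend(hits)
        return hits
```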

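The queue manager also controls context overflow through a queue eviction policy.

When the prompt tokens exceed the "warning token count" of the underlying LLM's context window (for example, 70% of the window), the queue manager inserts a system message into the queue. This "memory pressure" warning tells the LLM that a queue eviction is imminent. The LLM can then use MemGPT functions to save important information from the FIFO queue to working context or to archival storage (a read/write database for text objects of arbitrary length).

When the prompt tokens exceed the "flush token count" (for example, 100% of the window), the queue manager flushes the queue to free space in the context window. It evicts a specific number of messages (for example, 50% of the window) and generates a new recursive summary from the existing recursive summary and the evicted messages. After the flush, the evicted messages are no longer in context and are not immediately visible to the LLM. They remain stored indefinitely in recall storage, and the LLM can read them through MemGPT function calls.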
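The eviction policy can be sketched with the example thresholds from the text: warn above 70% of the window, flush above 100%, and evict roughly 50%. This sketch makes three assumptions. A whitespace word count stands in for a real tokenizer. The warning text is hypothetical. The caller supplies `summarize`; in MemGPT this would be an LLM call.

```python
from typing import Callable

WARNING = "system: memory pressure"  # hypothetical wording of the warning message

def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())

class EvictingQueue:
    def __init__(self, context_window: int, summarize: Callable[[str, list[str]], str],
                 warn_frac: float = 0.7, flush_frac: float = 1.0, evict_frac: float = 0.5):
        self.warn_at = warn_frac * context_window       # "warning token count"
        self.flush_at = flush_frac * context_window     # "flush token count"
        self.evict_tokens = evict_frac * context_window # how much to free per flush
        self.summarize = summarize                      # (old summary, evicted) -> new summary
        self.summary = ""                               # recursive summary, queue index 0
        self.messages: list[str] = []
        self.recall_storage: list[str] = []             # evicted messages stay readable here
        self.warned = False

    def tokens(self) -> int:
        return count_tokens(self.summary) + sum(count_tokens(m) for m in self.messages)

    def context(self) -> list[str]:
        return [self.summary] + self.messages

    def append(self, message: str) -> None:
        self.messages.append(message)
        self.recall_storage.append(message)
        total = self.tokens()
        if total > self.flush_at:
            self._flush()
        elif total > self.warn_at and not self.warned:
            self.messages.append(WARNING)  # lets the LLM save important facts first
            self.warned = True

    def _flush(self) -> None:
        evicted, freed = [], 0
        while self.messages and freed < self.evict_tokens:
            message = self.messages.pop(0)  # the oldest messages leave first
            evicted.append(message)
            freed += count_tokens(message)
        self.summary = self.summarize(self.summary, evicted)
        self.warned = False
```

With a 10-token window, the third 3-word message crosses the 70% mark and triggers the warning. The fourth message crosses 100% and evicts the two oldest messages into the summary.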

2.3 Function executor (handling completion tokens)



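The system instructions give the LLM processor:

(1) a detailed description of the memory hierarchy and the purpose of each level;

(2) the function schemas (including their natural-language descriptions) that the system can call to access or modify its memory.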
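To make item (2) concrete, here is a sketch of some function schemas and of a validation step the function executor could run on each completion before executing it. The schema layout follows the common JSON-Schema-based function-calling style. The function names, the completion format, and the practice of returning error strings to the LLM are all assumptions for illustration.

```python
import json
from typing import Optional

# Hypothetical schemas in a JSON-Schema-style function-calling format (assumption).
SCHEMAS = [
    {
        "name": "archival_memory_insert",
        "description": "Write a piece of text to archival storage.",
        "parameters": {"type": "object",
                       "properties": {"content": {"type": "string"}},
                       "required": ["content"]},
    },
    {
        "name": "archival_memory_search",
        "description": "Search archival storage for text matching a query.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}, "page": {"type": "integer"}},
                       "required": ["query"]},
    },
]

def render_function_docs(schemas: list[dict]) -> str:
    """Natural-language function list to embed in the system instructions."""
    return "\n".join(
        f"- {s['name']}({', '.join(s['parameters']['properties'])}): {s['description']}"
        for s in schemas
    )

def validate_call(raw: str, schemas: list[dict]) -> Optional[str]:
    """Return None if the completion is a valid call, else an error message for the LLM."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"Error: completion is not valid JSON ({exc.msg})"
    if not isinstance(call, dict):
        return "Error: completion must be a JSON object"
    schema = {s["name"]: s for s in schemas}.get(call.get("function"))
    if schema is None:
        return f"Error: unknown function {call.get('function')!r}"
    args = call.get("args", {})
    if not isinstance(args, dict):
        return "Error: args must be a JSON object"
    missing = [p for p in schema["parameters"]["required"] if p not in args]
    return f"Error: missing required arguments {missing}" if missing else None
```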


2.4 Control flow and function chaining



3. Experiments

4. Related Work

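Long-context LLMs. Several lines of work have increased the context length of LLMs. Some make the Transformer architecture more efficient through sparse attention, low-rank approximation (Wang et al., 2020), and neural memory. Others aim to extend the context window beyond the length and data scale used in the model's original training. MemGPT builds on these advances because a longer context window increases the capacity of MemGPT's main memory. Its main contribution is a hierarchical, tiered memory design in which a long-context LLM implements main memory.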

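Retrieval-augmented models. MemGPT's external memory design builds on a large body of prior work that augments LLMs with relevant inputs from external retrievers. In particular, Jiang et al. proposed FLARE^1^, which lets the LLM actively decide when and what to retrieve during generation. Trivedi et al.^2^ interleave retrieval with chain-of-thought reasoning to improve multi-step question answering.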

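LLMs as agents. Recent work has explored giving LLMs additional capabilities so that they can act as agents in interactive environments. Park et al.^3^ add memory to LLMs and use the LLM as a planner. In a multi-agent sandbox, they observe emergent social behaviors as agents carry out basic activities such as chores and hobbies, going to work, and talking with other agents. Yao et al. show that interleaving chain-of-thought reasoning with task actions further improves the planning ability of LLM-based interactive agents. Similarly, in MemGPT the LLM can plan while it executes functions.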

5. Conclusion


6. Appendix

6.1 Prompts and instructions


6.1.1 MemGPT instructions (DMR)


The following is information about myself. My task is to completely immerse myself in this role (I should never say that I am an AI, and should reply as if I am playing this role). If the user asks me a question, I should reply with a best guess using the information in core memory and conversation search.


Your task is to answer a question from the user about your prior conversations. The following is a summary of all your prior conversations:


Answer from the perspective of the persona provided (do not say that you are an AI assistant). If you do not have enough information to answer the question, reply 'NO ANSWER'. Either reply with the answer, or reply 'NO ANSWER', do not say anything else.

6.1.2 LLM judge (DMR/OPENER)


Your task is to label an answer to a question as 'CORRECT' or 'WRONG'.
You will be given the following data:
(1) a question (posed by one user to another user), (2) a 'gold' (ground truth) answer, (3) a generated answer which you will score as CORRECT/WRONG.

The point of the question is to ask about something one user should know about the other user based on their prior conversations.

The gold answer will usually be a concise and short answer that includes the referenced topic, for example: 

Question: Do you remember what I got the last time I went to Hawaii? 
Gold answer: A shell necklace 

The generated answer might be much longer, but you should be generous with your grading - as long as it touches on the same topic as the gold answer, it should be counted as CORRECT.

For example, the following answers would be considered CORRECT:

Generated answer (CORRECT): Oh yeah, that was so fun! I got so much stuff there, including that shell necklace.
Generated answer (CORRECT): I got a ton of stuff... that surfboard, the mug, the necklace, those coasters too..
Generated answer (CORRECT): That cute necklace

The following answers would be considered WRONG:
Generated answer (WRONG): Oh yeah, that was so fun! I got so much stuff there, including that mug.
Generated answer (WRONG): I got a ton of stuff... that surfboard, the mug, those coasters too..
Generated answer (WRONG): I'm sorry, I don't remember what you're talking about.

Now it's time for the real question:
Question: QUESTION
Gold answer: GOLD ANSWER
Generated answer: GENERATED ANSWER

First, provide a short (one sentence) explanation of your reasoning, then finish with CORRECT or WRONG. Do NOT include both CORRECT and WRONG in your response, or it will break the evaluation script.
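The warning at the end of the prompt suggests that the evaluation script looks for the label in the judge's reply. Here is one possible parser, not the authors' script. It assumes labels are matched case-sensitively as whole words, so `INCORRECT` never counts as `CORRECT`. Replies that contain both labels or neither label are treated as unusable.

```python
import re
from typing import Optional

def parse_verdict(response: str) -> Optional[bool]:
    """Map a judge reply to True (CORRECT), False (WRONG), or None (unusable)."""
    has_correct = re.search(r"\bCORRECT\b", response) is not None
    has_wrong = re.search(r"\bWRONG\b", response) is not None
    if has_correct == has_wrong:
        return None  # neither label, or both: the case the prompt warns against
    return has_correct

def accuracy(responses: list[str]) -> float:
    """Fraction judged CORRECT; unusable replies count as not correct."""
    verdicts = [parse_verdict(r) for r in responses]
    return sum(v is True for v in verdicts) / len(verdicts) if verdicts else 0.0
```

Counting unusable replies as wrong, as `accuracy` does here, is a conservative choice. Excluding them from the denominator would be the alternative.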

6.1.3 Self-instruct DMR dataset generation


You get as input:
- personas for each user (gives you their basic facts)
- a record of an old chat the two users had with each other

Your task is to write a question from user A to user B that tests user B's memory.

The question should be crafted in a way that user B must have actually participated in the prior conversation to answer properly, not just have read the persona summary.

Do NOT under any circumstances create a question that can be answered using the persona information (that's considered cheating).

Instead, write a question that can only be answered by looking at the old chat log (and is not contained in the persona information).

For example, given the following chat log and persona summaries:

old chat between user A and user B
A: Are you into surfing? I'm super into surfing myself
B: Actually I'm looking to learn. Maybe you could give me a basic lesson some time!

A: Yeah for sure! We could go to Pacifica, the waves there are pretty light and easy
B: That sounds awesome
A: There's even a cool Taco Bell right by the beach, could grab a bite after
B: What about this Sunday around noon?
A: Yeah let's do it! 

user A persona:
I like surfing
I grew up in Santa Cruz

user B persona:
I work in tech
I live in downtown San Francisco

Here's an example of a good question that sounds natural, and an answer that cannot be directly inferred from user A's persona:

User B's question for user A
B: Remember that one time we went surfing? What was that one place we went to for lunch called?
A: Taco Bell!

This is an example of a bad question, where the question comes across as unnatural, and the answer can be inferred directly from user A's persona:

User B's question for user A
B: Do you like surfing?
A: Yes, I like surfing 

Never, ever, ever create questions that can be answered from the persona information.

6.1.4 Document analysis instructions


You are MemGPT DOC-QA bot. Your job is to answer questions about documents that are stored in your archival memory. The answer to the user's question will ALWAYS be in your archival memory, so remember to keep searching if you can't find the answer. Answer the questions as though the year is 2018.


Search your archival memory to answer the provided question. Provide both the answer and the archival memory result from which you determined your answer.
Format your response with the format 'ANSWER: [YOUR ANSWER], DOCUMENT: [ARCHIVAL MEMORY TEXT]'. Your task is to answer the question:


Answer the question provided according to the list of documents below (some of which might be irrelevant). In your response, provide both the answer and the document text from which you determined the answer.

Format your response with the format 'ANSWER: <YOUR ANSWER>, DOCUMENT: [DOCUMENT TEXT]'. If none of the documents provided have the answer to the question, reply with 'INSUFFICIENT INFORMATION'. Do NOT provide an answer if you cannot find it in the provided documents. Your response will only be considered correct if you provide both the answer and relevant document text, or say 'INSUFFICIENT INFORMATION'. Answer the question as though the current year is 2018.
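Both document-QA prompts ask for a reply of the form `ANSWER: ..., DOCUMENT: ...`. Below is a hedged sketch of a structural check that mirrors the judge rule in 6.1.5: a missing `DOCUMENT` field or an `INSUFFICIENT INFORMATION` reply cannot be correct. It checks only the format. Judging whether the answer itself is correct is left to the LLM judge. The regex is an assumption about how the fields are delimited.

```python
import re
from typing import Optional, Tuple

# Lazy match on the answer so commas inside it (e.g. "Paris, France") are kept.
PATTERN = re.compile(r"ANSWER:\s*(.*?),\s*DOCUMENT:\s*(.*)", re.DOTALL)

def parse_doc_qa(response: str) -> Optional[Tuple[str, str]]:
    """Return (answer, document) if both fields are present, else None."""
    text = response.strip()
    if text == "INSUFFICIENT INFORMATION":
        return None
    match = PATTERN.search(text)
    if match is None:
        return None  # e.g. the DOCUMENT field is missing
    answer, document = match.group(1).strip(), match.group(2).strip()
    return (answer, document) if answer and document else None
```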

6.1.5 LLM judge (document analysis)


Your task is to evaluate whether an LLM correctly answered a question. The LLM response should be in the format "ANSWER: [answer], DOCUMENT: [document text]" or say "INSUFFICIENT INFORMATION". The true answer is provided in the format "TRUE ANSWER:[list of possible answers]". The question is provided in the format "QUESTION: [question]".

If the LLM response contains both the correct answer and corresponding document text, the response is correct. Even if the LLM's answer and the true answer are slightly different in wording, the response is still correct. For example, if the answer is more specific than the true answer or uses a different phrasing that is still correct, the response is correct. 

If the LLM response is "INSUFFICIENT INFORMATION", or the "DOCUMENT" field is missing, the response is incorrect. Respond with a single token: "CORRECT" or "INCORRECT".

6.1.6 K/V task instructions


You are MemGPT DOC-QA bot. Your job is to answer questions about documents that are stored in your archival memory. The answer to the user's question will ALWAYS be in your archival memory, so remember to keep searching if you can't find the answer.

DO NOT STOP SEARCHING UNTIL YOU VERIFY THAT THE VALUE IS NOT A KEY. Do not stop making nested lookups until this condition is met.


Below is a JSON object containing key-value pairings, all keys and values are 128-bit UUIDs, and your task is to return the value associated with the specified key. If a value itself is also a key, return the value of that key (do a nested lookup). For example, if the value of 'x' is 'y', but 'y' is also a key, return the value of key 'y'.
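The nested lookup that the K/V task asks for can be sketched as a loop that keeps following values while they are also keys. Given the full JSON object, this computes the expected answer directly. MemGPT instead has to find each hop by searching archival memory. The cycle check and the hop limit are added safeguards and are not part of the task description.

```python
import uuid

def nested_lookup(kv: dict[str, str], key: str, max_hops: int = 100) -> str:
    """Follow key -> value while the value is itself a key; return the final value."""
    value = kv[key]
    seen = {key}
    for _ in range(max_hops):
        if value not in kv:
            return value  # not a key: the lookup is finished
        if value in seen:
            raise ValueError(f"cycle detected at {value!r}")
        seen.add(value)
        value = kv[value]
    raise ValueError("too many hops")

# UUID keys and values, as in the task: k1 -> k2 -> v
k1, k2, v = (str(uuid.uuid4()) for _ in range(3))
print(nested_lookup({k1: k2, k2: v}, k1) == v)
```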


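⭐ Inspired by the hierarchical memory systems of traditional operating systems, the authors propose virtual context management. An OS pages data between physical memory and disk to create the illusion of more memory than is physically available. In the same way, virtual context management moves data between main context and external context to give the illusion of an unlimited context.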


1. Active Retrieval Augmented Generation

2. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

3. Generative Agents: Interactive Simulacra of Human Behavior

