## Preface

In AI agent applications, some scenarios call for remembering a user's preferences so that answers to their questions can be adjusted accordingly. How can those preferences be remembered? This post introduces one implementation: mem0. It augments AI assistants and agents with an intelligent storage layer, enabling personalized AI interactions. Let's analyze how it is implemented.
## Introduction

mem0 enhances AI assistants by giving them persistent, contextual memory. AI systems that use mem0 actively learn from user interactions and adapt over time. mem0's memory layer combines a large language model (LLM) with vector-based storage: the LLM extracts and processes key information from conversations, while the vector store enables efficient semantic search and memory retrieval. This architecture helps AI agents connect past interactions with the current context to produce more relevant responses.
## Features

- Memory processing: uses an LLM to automatically extract and store important information from conversations while preserving full context.
- Memory management: continuously updates stored information and resolves contradictions to maintain accuracy.
- Dual storage architecture: combines a vector database for memory storage with a graph database for relationship tracking.
- Smart retrieval system: employs semantic search and graph queries to find relevant memories based on importance and recency.
- Simple API integration: provides easy-to-use endpoints for adding (add) and retrieving (search) memories.
## Use Cases

- Customer support chatbots: create support agents that remember customer history, preferences, and past interactions to provide personalized help.
- Personal AI tutors: build educational assistants that track student progress, adapt to learning patterns, and provide contextual help.
- Healthcare applications: develop healthcare assistants that maintain patient history and provide personalized care suggestions.
- Enterprise knowledge management: power systems that learn from organizational interactions and preserve institutional knowledge.
- Personalized AI assistants: create assistants that learn user preferences and adapt their responses over time.
## API Usage Example

Integrating mem0 from Python looks like this:
```python
from openai import OpenAI
from mem0 import Memory

openai_client = OpenAI()
memory = Memory()

def chat_with_memories(message: str, user_id: str = "default_user") -> str:
    # Retrieve relevant memories
    relevant_memories = memory.search(query=message, user_id=user_id, limit=3)
    memories_str = "\n".join(f"- {entry['memory']}" for entry in relevant_memories["results"])

    # Generate Assistant response
    system_prompt = f"You are a helpful AI. Answer the question based on query and memories.\nUser Memories:\n{memories_str}"
    messages = [{"role": "system", "content": system_prompt}, {"role": "user", "content": message}]
    response = openai_client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    assistant_response = response.choices[0].message.content

    # Create new memories from the conversation
    messages.append({"role": "assistant", "content": assistant_response})
    memory.add(messages, user_id=user_id)

    return assistant_response

def main():
    print("Chat with AI (type 'exit' to quit)")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == 'exit':
            print("Goodbye!")
            break
        print(f"AI: {chat_with_memories(user_input)}")

if __name__ == "__main__":
    main()
```
## Flow

From the usage example above, a simple flow emerges: at query time, the chat app queries mem0 with the user_id and the message, then submits the retrieved historical context together with the user's input to the LLM, which combines everything to produce an answer that matches the user's preferences.
## Architecture

The overall architecture of mem0 is fairly simple: the bottom layer consists of a vector database and a graph database, and on top of them a memory management service manages the memory data. Next, let's walk through the core operations built on this architecture.
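To make the dual-store setup concrete, here is a minimal configuration sketch using `Memory.from_config`. The provider choices (Qdrant, Neo4j) and connection details are assumptions for illustration; check the mem0 docs for the exact config schema of your version.

```python
from mem0 import Memory

# Hypothetical deployment: Qdrant as the vector store, Neo4j as the graph store.
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {"collection_name": "user_memories", "host": "localhost", "port": 6333},
    },
    "graph_store": {
        "provider": "neo4j",
        "config": {"url": "bolt://localhost:7687", "username": "neo4j", "password": "password"},
    },
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini"}},
}

memory = Memory.from_config(config)
```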
## Add

The add flow goes through the following steps.
### Information Extraction

In this step, an LLM extracts the memory-worthy content from the conversation, identifying the useful facts contained in the messages.
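For intuition, extraction turns a raw conversation turn into a short list of facts. The source code shown later parses the LLM response via `json.loads(response)["facts"]`, so the output has roughly the shape below; the input sentence and the fact wording are invented for illustration.

```python
# Input turn (invented example):
#   "Hi, I'm Alice. I'm vegetarian and I'm planning a trip to Japan next month."
#
# The extraction prompt asks the LLM for JSON of the form {"facts": [...]},
# which the add flow parses via json.loads(response)["facts"].
# A plausible (illustrative) result:
extracted = {
    "facts": [
        "Name is Alice",
        "Is vegetarian",
        "Planning a trip to Japan next month",
    ]
}
```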
### Conflict Resolution

In this step, the system identifies conflicts between the new information and existing information and resolves them. This step is also handled by the LLM.
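This second LLM call compares the new facts against the retrieved old memories and emits an action per memory. Based on the ADD/UPDATE/DELETE events listed in the `add` docstring below, the output plausibly looks like this sketch; the exact field names are assumptions and may vary by version.

```python
# Illustrative output of the conflict-resolution call: one event per memory.
# Field names here ("memory", "event", "text", "old_memory") are assumptions.
new_memories_with_actions = {
    "memory": [
        {"id": "0", "text": "Is vegetarian", "event": "ADD"},
        {
            "id": "1",
            "text": "Planning a trip to Japan next month",
            "old_memory": "Planning a trip to Korea next month",
            "event": "UPDATE",
        },
        {"id": "2", "text": "Lives in Paris", "event": "NONE"},  # unchanged
    ]
}
```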
### Memory Storage

In this step, the vector database stores the actual memories while the graph database maintains entity-relationship information. Both are updated continuously with every interaction.
### Source Code Analysis

The docstring of the `add` method reads:
```python
"""
Create a new memory.

Args:
    messages (str or List[Dict[str, str]]): Messages to store in the memory.
    user_id (str, optional): ID of the user creating the memory. Defaults to None.
    agent_id (str, optional): ID of the agent creating the memory. Defaults to None.
    run_id (str, optional): ID of the run creating the memory. Defaults to None.
    metadata (dict, optional): Metadata to store with the memory. Defaults to None.
    filters (dict, optional): Filters to apply to the search. Defaults to None.
    prompt (str, optional): Prompt to use for memory deduction. Defaults to None.

Returns:
    dict: A dictionary containing the result of the memory addition operation.
    result: dict of affected events with each dict has the following key:
        'memories': affected memories
        'graph': affected graph memories

        'memories' and 'graph' is a dict, each with following subkeys:
            'add': added memory
            'update': updated memory
            'delete': deleted memory
"""
```
The core logic is as follows:

```python
# (Preceded by parameter validation and multimodal message handling.)
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(self._add_to_vector_store, messages, metadata, filters)
    future2 = executor.submit(self._add_to_graph, messages, filters)

    concurrent.futures.wait([future1, future2])

    vector_store_result = future1.result()
    graph_result = future2.result()
```
The main flow of `_add_to_vector_store` is:

```python
def _add_to_vector_store(self, messages, metadata, filters):
    # Build the prompts that ask the LLM to extract memory-worthy
    # content from the messages
    .....
    # Call the LLM to obtain the fact information
    response = self.llm.generate_response(
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        response_format={"type": "json_object"},
    )

    try:
        response = remove_code_blocks(response)
        new_retrieved_facts = json.loads(response)["facts"]
    except Exception as e:
        logging.error(f"Error in new_retrieved_facts: {e}")
        new_retrieved_facts = []

    # Retrieve existing memories: embed each new fact, then search the
    # vector store for similar historical memories
    retrieved_old_memory = []
    new_message_embeddings = {}
    for new_mem in new_retrieved_facts:
        messages_embeddings = self.embedding_model.embed(new_mem, "add")
        new_message_embeddings[new_mem] = messages_embeddings
        existing_memories = self.vector_store.search(
            query=messages_embeddings,
            limit=5,
            filters=filters,
        )
        for mem in existing_memories:
            retrieved_old_memory.append({"id": mem.id, "text": mem.payload["data"]})

    unique_data = {}
    for item in retrieved_old_memory:
        unique_data[item["id"]] = item
    retrieved_old_memory = list(unique_data.values())

    # Call the LLM again to resolve conflicts between new and old memories
    function_calling_prompt = get_update_memory_messages(retrieved_old_memory, new_retrieved_facts)

    try:
        new_memories_with_actions = self.llm.generate_response(
            messages=[{"role": "user", "content": function_calling_prompt}],
            response_format={"type": "json_object"},
        )
    except Exception as e:
        logging.error(f"Error in new_memories_with_actions: {e}")
        new_memories_with_actions = []

    try:
        new_memories_with_actions = remove_code_blocks(new_memories_with_actions)
        new_memories_with_actions = json.loads(new_memories_with_actions)
    except Exception as e:
        logging.error(f"Invalid JSON response: {e}")
        new_memories_with_actions = []

    # The returned actions are then applied to the vector store
```
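The trailing comment elides how those actions are applied. A plausible dispatch, sketched from the events in the `add` docstring, is shown below; the helper names `_update_memory` and `_delete_memory` are assumed by analogy with `_create_memory` (shown next), not quoted from the source.

```python
# Sketch of the action dispatch; structure and helper names are assumed.
for resp in new_memories_with_actions.get("memory", []):
    event = resp.get("event")
    if event == "ADD":
        self._create_memory(
            data=resp["text"],
            existing_embeddings=new_message_embeddings,
            metadata=metadata,
        )
    elif event == "UPDATE":
        self._update_memory(resp["id"], resp["text"], new_message_embeddings, metadata)
    elif event == "DELETE":
        self._delete_memory(resp["id"])
    # "NONE" events leave the memory untouched
```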
### Adding to the Vector Store

The `_create_memory` helper writes a memory into the vector store:

```python
def _create_memory(self, data, existing_embeddings, metadata=None):
    logging.info(f"Creating memory with {data=}")
    if data in existing_embeddings:
        embeddings = existing_embeddings[data]
    else:
        embeddings = self.embedding_model.embed(data, "add")

    memory_id = str(uuid.uuid4())
    metadata = metadata or {}
    metadata["data"] = data
    metadata["hash"] = hashlib.md5(data.encode()).hexdigest()
    metadata["created_at"] = datetime.now(pytz.timezone("US/Pacific")).isoformat()

    self.vector_store.insert(
        vectors=[embeddings],
        ids=[memory_id],
        payloads=[metadata],
    )
    self.db.add_history(memory_id, None, data, "ADD", created_at=metadata["created_at"])
    capture_event("mem0._create_memory", self, {"memory_id": memory_id})
    return memory_id
```
Note that when writing back to the vector store, a history record is also written to the db, so the db keeps a change log for each memory (perhaps usable for recovery? But the double write does not happen in a single transaction, so consistency cannot be guaranteed).
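For reference, the `add_history` call above suggests each history row records roughly the following fields. This is a guess inferred purely from the call site, not the actual schema.

```python
# Hypothetical shape of one history record, inferred from
# self.db.add_history(memory_id, None, data, "ADD", created_at=...):
history_entry = {
    "memory_id": "3f2a...",          # which memory changed
    "old_memory": None,              # previous value (None for ADD)
    "new_memory": "Is vegetarian",   # new value
    "event": "ADD",                  # ADD / UPDATE / DELETE
    "created_at": "2025-01-01T10:00:00-08:00",
}
```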
### Adding to the Graph Store

The graph path looks like this:

```python
def _add_to_graph(self, messages, filters):
    added_entities = []
    if self.enable_graph:
        if filters.get("user_id") is None:
            filters["user_id"] = "user"

        data = "\n".join([msg["content"] for msg in messages if "content" in msg and msg["role"] != "system"])
        added_entities = self.graph.add(data, filters)

    return added_entities

# The graph store's add operation
def add(self, data, filters):
    """
    Adds data to the graph.

    Args:
        data (str): The data to add to the graph.
        filters (dict): A dictionary containing filters to be applied during the addition.
    """
    entity_type_map = self._retrieve_nodes_from_data(data, filters)
    to_be_added = self._establish_nodes_relations_from_data(data, filters, entity_type_map)
    search_output = self._search_graph_db(node_list=list(entity_type_map.keys()), filters=filters)
    to_be_deleted = self._get_delete_entities_from_search_output(search_output, data, filters)

    # TODO: Batch queries with APOC plugin
    # TODO: Add more filter support
    deleted_entities = self._delete_entities(to_be_deleted, filters["user_id"])
    added_entities = self._add_entities(to_be_added, filters["user_id"], entity_type_map)

    return {"deleted_entities": deleted_entities, "added_entities": added_entities}
```
Adding to the graph database is straightforward: the contents of the non-system messages are concatenated and written into the graph, with the LLM extracting entities and relations along the way.
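To make this concrete: `_retrieve_nodes_from_data` and `_establish_nodes_relations_from_data` use the LLM to pull entities and relation triples out of the concatenated text. The input sentence and the triples below are an invented example of what might end up in the graph.

```python
# Invented input:
#   "I'm Alice. My brother Bob lives in Berlin and loves hiking."
#
# Plausible entity types and relation triples written into the graph:
entity_type_map = {"alice": "person", "bob": "person", "berlin": "city", "hiking": "activity"}
to_be_added = [
    {"source": "alice", "relationship": "sibling_of", "destination": "bob"},
    {"source": "bob", "relationship": "lives_in", "destination": "berlin"},
    {"source": "bob", "relationship": "loves", "destination": "hiking"},
]
```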
## Retrieval

The retrieval process is much simpler: query each type of database for the relevant information, then merge the results.
### Source Code Analysis

The search method's docstring reads:
```python
"""
Search for memories.

Args:
    query (str): Query to search for.
    user_id (str, optional): ID of the user to search for. Defaults to None.
    agent_id (str, optional): ID of the agent to search for. Defaults to None.
    run_id (str, optional): ID of the run to search for. Defaults to None.
    limit (int, optional): Limit the number of results. Defaults to 100.
    filters (dict, optional): Filters to apply to the search. Defaults to None.

Returns:
    list: List of search results.
"""
```
The main logic is to search the vector database and the graph database in parallel:
```python
with concurrent.futures.ThreadPoolExecutor() as executor:
    future_memories = executor.submit(self._search_vector_store, query, filters, limit)
    future_graph_entities = (
        executor.submit(self.graph.search, query, filters, limit) if self.enable_graph else None
    )

    concurrent.futures.wait(
        [future_memories, future_graph_entities] if future_graph_entities else [future_memories]
    )

    original_memories = future_memories.result()
    graph_entities = future_graph_entities.result() if future_graph_entities else None

if self.enable_graph:
    return {"results": original_memories, "relations": graph_entities}
```
Notably, at the end of the graph search, BM25 is used to rerank the results:
```python
def search(self, query, filters, limit=100):
    """
    Search for memories and related graph data.

    Args:
        query (str): Query to search for.
        filters (dict): A dictionary containing filters to be applied during the search.
        limit (int): The maximum number of nodes and relationships to retrieve. Defaults to 100.

    Returns:
        dict: A dictionary containing:
            - "contexts": List of search results from the base data store.
            - "entities": List of related graph data based on the query.
    """
    entity_type_map = self._retrieve_nodes_from_data(query, filters)
    search_output = self._search_graph_db(node_list=list(entity_type_map.keys()), filters=filters)

    if not search_output:
        return []

    # Note: "relatationship" is spelled this way in the upstream search output.
    search_outputs_sequence = [
        [item["source"], item["relatationship"], item["destination"]] for item in search_output
    ]
    bm25 = BM25Okapi(search_outputs_sequence)

    tokenized_query = query.split(" ")
    reranked_results = bm25.get_top_n(tokenized_query, search_outputs_sequence, n=5)

    search_results = []
    for item in reranked_results:
        search_results.append({"source": item[0], "relationship": item[1], "destination": item[2]})

    logger.info(f"Returned {len(search_results)} search results")

    return search_results
```
## Other Notes

Not all text produces memories; the system is designed to recognize specific kinds of memorable information. There are several cases in which mem0 may return an empty memory list (see the sketch after this list):
- When the user asks definitional questions (e.g., "What is backpropagation?")
- General concept explanations that contain no personal or experiential information
- Technical definitions and theoretical explanations
- General knowledge statements without personal context
- Abstract or theoretical content
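The contrast in the sketch below follows directly from this list: a definitional question carries no personal facts, while an experiential statement does. The message contents and the resulting memory text are invented for illustration.

```python
# Definitional question: typically no personal facts, so nothing is stored.
res = memory.add([{"role": "user", "content": "What is backpropagation?"}], user_id="alice")
# res likely reports no added memories

# Personal, experiential statement: facts are extracted and stored.
res = memory.add(
    [{"role": "user", "content": "I failed my ML exam; backpropagation confuses me."}],
    user_id="alice",
)
# res might now include a memory like "Finds backpropagation confusing"
```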
## Conclusion

Having gone through the whole project, it is clear that mem0's engineering architecture is not complicated: it mainly relies on an LLM, a vector database, and a graph database to persist users' preference information, so that an AI agent can better fit user habits. If our agent applications have a similar need, mem0 is worth trying, or the approach can be reproduced in a similar way.

Finally, because writes go to multiple stores concurrently, I suspect there may be consistency issues. If you plan to rely on this in production, evaluate the impact first.