## Preface

In AI agent applications, some scenarios call for remembering a user's preferences so that answers to their questions can be adjusted accordingly. How can those preferences be remembered? This post introduces one implementation: mem0. It augments AI assistants and agents with an intelligent storage layer, enabling personalized AI interactions. Let's analyze how it is implemented.
## Introduction

mem0 enhances AI assistants by giving them persistent, contextual memory. AI systems that use mem0 actively learn from user interactions and adapt over time. mem0's memory layer combines a large language model (LLM) with vector-based storage: the LLM extracts and processes key information from conversations, while the vector store enables efficient semantic search and memory retrieval. This architecture helps AI agents connect past interactions with the current context to produce more relevant responses.
## Features

- Memory processing: uses an LLM to automatically extract and store important information from conversations while preserving full context.
- Memory management: continuously updates stored information and resolves contradictions to maintain accuracy.
- Dual storage architecture: combines a vector database for memory storage with a graph database for relationship tracking.
- Smart retrieval system: employs semantic search and graph queries to find relevant memories based on importance and recency.
- Simple API integration: provides easy-to-use endpoints for adding (add) and retrieving (search) memories.
## Use Cases

- Customer support chatbots: create support agents that remember customer history, preferences, and past interactions to provide personalized help.
- Personal AI tutors: build educational assistants that track student progress, adapt to learning patterns, and provide contextual help.
- Healthcare applications: develop healthcare assistants that maintain patient history and provide personalized care suggestions.
- Enterprise knowledge management: power systems that learn from organizational interactions and preserve institutional knowledge.
- Personalized AI assistants: create assistants that learn user preferences and adapt their responses over time.
## API Usage Example

Integrating mem0 from Python looks like this:
```python
from openai import OpenAI
from mem0 import Memory

openai_client = OpenAI()
memory = Memory()

def chat_with_memories(message: str, user_id: str = "default_user") -> str:
    # Retrieve relevant memories
    relevant_memories = memory.search(query=message, user_id=user_id, limit=3)
    memories_str = "\n".join(f"- {entry['memory']}" for entry in relevant_memories["results"])

    # Generate Assistant response
    system_prompt = f"You are a helpful AI. Answer the question based on query and memories.\nUser Memories:\n{memories_str}"
    messages = [{"role": "system", "content": system_prompt}, {"role": "user", "content": message}]
    response = openai_client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    assistant_response = response.choices[0].message.content

    # Create new memories from the conversation
    messages.append({"role": "assistant", "content": assistant_response})
    memory.add(messages, user_id=user_id)

    return assistant_response

def main():
    print("Chat with AI (type 'exit' to quit)")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == 'exit':
            print("Goodbye!")
            break
        print(f"AI: {chat_with_memories(user_input)}")

if __name__ == "__main__":
    main()
```
## Flow

From the usage example above, a simple flow emerges: at query time, the chat app queries mem0 with the user_id and the message, then submits the retrieved historical context together with the user's input to the LLM, which combines everything to produce an answer that matches the user's preferences.
## Architecture

The overall architecture of mem0 is fairly simple: the bottom layer consists of a vector database and a graph database, and on top of them a memory management service manages the memory data. Next, let's walk through the core operations built on this architecture.
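To make the dual-store setup concrete, here is a minimal configuration sketch using `Memory.from_config`. The provider choices (Qdrant, Neo4j) and connection details are assumptions for illustration; check the mem0 docs for the exact config schema of your version.

```python
from mem0 import Memory

# Hypothetical deployment: Qdrant as the vector store, Neo4j as the graph store.
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {"collection_name": "user_memories", "host": "localhost", "port": 6333},
    },
    "graph_store": {
        "provider": "neo4j",
        "config": {"url": "bolt://localhost:7687", "username": "neo4j", "password": "password"},
    },
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini"}},
}

memory = Memory.from_config(config)
```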
## Add

The add flow goes through the following steps.
### Information Extraction

In this step, an LLM extracts the memory-worthy content from the conversation, identifying the useful facts contained in the messages.
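For intuition, extraction turns a raw conversation turn into a short list of facts. The source code shown later parses the LLM response via `json.loads(response)["facts"]`, so the output has roughly the shape below; the input sentence and the fact wording are invented for illustration.

```python
# Input turn (invented example):
#   "Hi, I'm Alice. I'm vegetarian and I'm planning a trip to Japan next month."
#
# The extraction prompt asks the LLM for JSON of the form {"facts": [...]},
# which the add flow parses via json.loads(response)["facts"].
# A plausible (illustrative) result:
extracted = {
    "facts": [
        "Name is Alice",
        "Is vegetarian",
        "Planning a trip to Japan next month",
    ]
}
```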
### Conflict Resolution

In this step, the system identifies conflicts between the new information and existing information and resolves them. This step is also handled by the LLM.
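This second LLM call compares the new facts against the retrieved old memories and emits an action per memory. Based on the ADD/UPDATE/DELETE events listed in the `add` docstring below, the output plausibly looks like this sketch; the exact field names are assumptions and may vary by version.

```python
# Illustrative output of the conflict-resolution call: one event per memory.
# Field names here ("memory", "event", "text", "old_memory") are assumptions.
new_memories_with_actions = {
    "memory": [
        {"id": "0", "text": "Is vegetarian", "event": "ADD"},
        {
            "id": "1",
            "text": "Planning a trip to Japan next month",
            "old_memory": "Planning a trip to Korea next month",
            "event": "UPDATE",
        },
        {"id": "2", "text": "Lives in Paris", "event": "NONE"},  # unchanged
    ]
}
```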
### Memory Storage

In this step, the vector database stores the actual memories while the graph database maintains entity-relationship information. Both are updated continuously with every interaction.
### Source Code Analysis

The docstring of the `add` method reads:
```python
"""
Create a new memory.

Args:
    messages (str or List[Dict[str, str]]): Messages to store in the memory.
    user_id (str, optional): ID of the user creating the memory. Defaults to None.
    agent_id (str, optional): ID of the agent creating the memory. Defaults to None.
    run_id (str, optional): ID of the run creating the memory. Defaults to None.
    metadata (dict, optional): Metadata to store with the memory. Defaults to None.
    filters (dict, optional): Filters to apply to the search. Defaults to None.
    prompt (str, optional): Prompt to use for memory deduction. Defaults to None.

Returns:
    dict: A dictionary containing the result of the memory addition operation.
    result: dict of affected events with each dict has the following key:
        'memories': affected memories
        'graph': affected graph memories

        'memories' and 'graph' is a dict, each with following subkeys:
            'add': added memory
            'update': updated memory
            'delete': deleted memory
"""
```
The core logic is as follows:

```python
# (Preceded by parameter validation and multimodal message handling.)
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(self._add_to_vector_store, messages, metadata, filters)
    future2 = executor.submit(self._add_to_graph, messages, filters)

    concurrent.futures.wait([future1, future2])

    vector_store_result = future1.result()
    graph_result = future2.result()
```
The main flow of `_add_to_vector_store` is:

```python
def _add_to_vector_store(self, messages, metadata, filters):
    # Build the prompts that ask the LLM to extract memory-worthy
    # content from the messages
    .....
    # Call the LLM to obtain the fact information
    response = self.llm.generate_response(
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        response_format={"type": "json_object"},
    )

    try:
        response = remove_code_blocks(response)
        new_retrieved_facts = json.loads(response)["facts"]
    except Exception as e:
        logging.error(f"Error in new_retrieved_facts: {e}")
        new_retrieved_facts = []

    # Retrieve existing memories: embed each new fact, then search the
    # vector store for similar historical memories
    retrieved_old_memory = []
    new_message_embeddings = {}
    for new_mem in new_retrieved_facts:
        messages_embeddings = self.embedding_model.embed(new_mem, "add")
        new_message_embeddings[new_mem] = messages_embeddings
        existing_memories = self.vector_store.search(
            query=messages_embeddings,
            limit=5,
            filters=filters,
        )
        for mem in existing_memories:
            retrieved_old_memory.append({"id": mem.id, "text": mem.payload["data"]})

    unique_data = {}
    for item in retrieved_old_memory:
        unique_data[item["id"]] = item
    retrieved_old_memory = list(unique_data.values())

    # Call the LLM again to resolve conflicts between new and old memories
    function_calling_prompt = get_update_memory_messages(retrieved_old_memory, new_retrieved_facts)

    try:
        new_memories_with_actions = self.llm.generate_response(
            messages=[{"role": "user", "content": function_calling_prompt}],
            response_format={"type": "json_object"},
        )
    except Exception as e:
        logging.error(f"Error in new_memories_with_actions: {e}")
        new_memories_with_actions = []

    try:
        new_memories_with_actions = remove_code_blocks(new_memories_with_actions)
        new_memories_with_actions = json.loads(new_memories_with_actions)
    except Exception as e:
        logging.error(f"Invalid JSON response: {e}")
        new_memories_with_actions = []

    # The returned actions are then applied to the vector store
```
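The trailing comment elides how those actions are applied. A plausible dispatch, sketched from the events in the `add` docstring, is shown below; the helper names `_update_memory` and `_delete_memory` are assumed by analogy with `_create_memory` (shown next), not quoted from the source.

```python
# Sketch of the action dispatch; structure and helper names are assumed.
for resp in new_memories_with_actions.get("memory", []):
    event = resp.get("event")
    if event == "ADD":
        self._create_memory(
            data=resp["text"],
            existing_embeddings=new_message_embeddings,
            metadata=metadata,
        )
    elif event == "UPDATE":
        self._update_memory(resp["id"], resp["text"], new_message_embeddings, metadata)
    elif event == "DELETE":
        self._delete_memory(resp["id"])
    # "NONE" events leave the memory untouched
```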
### Adding to the Vector Store

The `_create_memory` helper writes a memory into the vector store:

```python
def _create_memory(self, data, existing_embeddings, metadata=None):
    logging.info(f"Creating memory with {data=}")
    if data in existing_embeddings:
        embeddings = existing_embeddings[data]
    else:
        embeddings = self.embedding_model.embed(data, "add")

    memory_id = str(uuid.uuid4())
    metadata = metadata or {}
    metadata["data"] = data
    metadata["hash"] = hashlib.md5(data.encode()).hexdigest()
    metadata["created_at"] = datetime.now(pytz.timezone("US/Pacific")).isoformat()

    self.vector_store.insert(
        vectors=[embeddings],
        ids=[memory_id],
        payloads=[metadata],
    )
    self.db.add_history(memory_id, None, data, "ADD", created_at=metadata["created_at"])
    capture_event("mem0._create_memory", self, {"memory_id": memory_id})
    return memory_id
```
Note that when writing back to the vector store, a history record is also written to the db, so the db keeps a change log for each memory (perhaps usable for recovery? But the double write does not happen in a single transaction, so consistency cannot be guaranteed).
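For reference, the `add_history` call above suggests each history row records roughly the following fields. This is a guess inferred purely from the call site, not the actual schema.

```python
# Hypothetical shape of one history record, inferred from
# self.db.add_history(memory_id, None, data, "ADD", created_at=...):
history_entry = {
    "memory_id": "3f2a...",          # which memory changed
    "old_memory": None,              # previous value (None for ADD)
    "new_memory": "Is vegetarian",   # new value
    "event": "ADD",                  # ADD / UPDATE / DELETE
    "created_at": "2025-01-01T10:00:00-08:00",
}
```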
### Adding to the Graph Store

The graph path looks like this:

```python
def _add_to_graph(self, messages, filters):
    added_entities = []
    if self.enable_graph:
        if filters.get("user_id") is None:
            filters["user_id"] = "user"

        data = "\n".join([msg["content"] for msg in messages if "content" in msg and msg["role"] != "system"])
        added_entities = self.graph.add(data, filters)

    return added_entities

# The graph store's add operation
def add(self, data, filters):
    """
    Adds data to the graph.

    Args:
        data (str): The data to add to the graph.
        filters (dict): A dictionary containing filters to be applied during the addition.
    """
    entity_type_map = self._retrieve_nodes_from_data(data, filters)
    to_be_added = self._establish_nodes_relations_from_data(data, filters, entity_type_map)
    search_output = self._search_graph_db(node_list=list(entity_type_map.keys()), filters=filters)
    to_be_deleted = self._get_delete_entities_from_search_output(search_output, data, filters)

    # TODO: Batch queries with APOC plugin
    # TODO: Add more filter support
    deleted_entities = self._delete_entities(to_be_deleted, filters["user_id"])
    added_entities = self._add_entities(to_be_added, filters["user_id"], entity_type_map)

    return {"deleted_entities": deleted_entities, "added_entities": added_entities}
```
Adding to the graph database is straightforward: the contents of the non-system messages are concatenated and written into the graph, with the LLM extracting entities and relations along the way.
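To make this concrete: `_retrieve_nodes_from_data` and `_establish_nodes_relations_from_data` use the LLM to pull entities and relation triples out of the concatenated text. The input sentence and the triples below are an invented example of what might end up in the graph.

```python
# Invented input:
#   "I'm Alice. My brother Bob lives in Berlin and loves hiking."
#
# Plausible entity types and relation triples written into the graph:
entity_type_map = {"alice": "person", "bob": "person", "berlin": "city", "hiking": "activity"}
to_be_added = [
    {"source": "alice", "relationship": "sibling_of", "destination": "bob"},
    {"source": "bob", "relationship": "lives_in", "destination": "berlin"},
    {"source": "bob", "relationship": "loves", "destination": "hiking"},
]
```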
## Retrieval

The retrieval process is much simpler: query each type of database for the relevant information, then merge the results.
### Source Code Analysis

The search method's docstring reads:
```python
"""
Search for memories.

Args:
    query (str): Query to search for.
    user_id (str, optional): ID of the user to search for. Defaults to None.
    agent_id (str, optional): ID of the agent to search for. Defaults to None.
    run_id (str, optional): ID of the run to search for. Defaults to None.
    limit (int, optional): Limit the number of results. Defaults to 100.
    filters (dict, optional): Filters to apply to the search. Defaults to None.

Returns:
    list: List of search results.
"""
```
The main logic is to search the vector database and the graph database in parallel:
```python
with concurrent.futures.ThreadPoolExecutor() as executor:
    future_memories = executor.submit(self._search_vector_store, query, filters, limit)
    future_graph_entities = (
        executor.submit(self.graph.search, query, filters, limit) if self.enable_graph else None
    )

    concurrent.futures.wait(
        [future_memories, future_graph_entities] if future_graph_entities else [future_memories]
    )

    original_memories = future_memories.result()
    graph_entities = future_graph_entities.result() if future_graph_entities else None

if self.enable_graph:
    return {"results": original_memories, "relations": graph_entities}
```
Notably, at the end of the graph search, BM25 is used to rerank the results:
```python
def search(self, query, filters, limit=100):
    """
    Search for memories and related graph data.

    Args:
        query (str): Query to search for.
        filters (dict): A dictionary containing filters to be applied during the search.
        limit (int): The maximum number of nodes and relationships to retrieve. Defaults to 100.

    Returns:
        dict: A dictionary containing:
            - "contexts": List of search results from the base data store.
            - "entities": List of related graph data based on the query.
    """
    entity_type_map = self._retrieve_nodes_from_data(query, filters)
    search_output = self._search_graph_db(node_list=list(entity_type_map.keys()), filters=filters)

    if not search_output:
        return []

    # Note: "relatationship" is spelled this way in the upstream search output.
    search_outputs_sequence = [
        [item["source"], item["relatationship"], item["destination"]] for item in search_output
    ]
    bm25 = BM25Okapi(search_outputs_sequence)

    tokenized_query = query.split(" ")
    reranked_results = bm25.get_top_n(tokenized_query, search_outputs_sequence, n=5)

    search_results = []
    for item in reranked_results:
        search_results.append({"source": item[0], "relationship": item[1], "destination": item[2]})

    logger.info(f"Returned {len(search_results)} search results")

    return search_results
```
## Other Notes

Not all text produces memories; the system is designed to recognize specific kinds of memorable information. There are several cases in which mem0 may return an empty memory list (see the sketch after this list):
- When the user asks definitional questions (e.g., "What is backpropagation?")
- General concept explanations that contain no personal or experiential information
- Technical definitions and theoretical explanations
- General knowledge statements without personal context
- Abstract or theoretical content
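The contrast in the sketch below follows directly from this list: a definitional question carries no personal facts, while an experiential statement does. The message contents and the resulting memory text are invented for illustration.

```python
# Definitional question: typically no personal facts, so nothing is stored.
res = memory.add([{"role": "user", "content": "What is backpropagation?"}], user_id="alice")
# res likely reports no added memories

# Personal, experiential statement: facts are extracted and stored.
res = memory.add(
    [{"role": "user", "content": "I failed my ML exam; backpropagation confuses me."}],
    user_id="alice",
)
# res might now include a memory like "Finds backpropagation confusing"
```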
## Conclusion

Having gone through the whole project, it is clear that mem0's engineering architecture is not complicated: it mainly relies on an LLM, a vector database, and a graph database to persist users' preference information, so that an AI agent can better fit user habits. If our agent applications have a similar need, mem0 is worth trying, or the approach can be reproduced in a similar way.

Finally, because writes go to multiple stores concurrently, I suspect there may be consistency issues. If you plan to rely on this in production, evaluate the impact first.