MemU processes memory through a three-layer architecture (Resource → Item → Category): raw resources are extracted into discrete memory items, items are aggregated by topic, and each category maintains a self-evolving summary. It supports batch memorization and incremental CRUD updates, and offers two retrieval methods, RAG and LLM-based ranking. [1][2]
Memory Processing Overview

1. Three-layer architecture
- Resource: raw multimodal data (conversations, documents, images, etc.). [3]
- Item: discrete memory units extracted from resources (preferences, skills, events, etc.). [4]
- Category: textual memory aggregated by topic, with an evolving summary (e.g. preferences.md). [5]
2. Memorization (memorize)
- Entry point: service.memorize(resource_url, modality, user). [6]
- Flow: extraction → categorization → persistence and indexing. [7]
- Self-evolution: the persist-and-index step (_memorize_persist_and_index) calls _update_category_summaries to batch-update category summaries. [8]
3. Incremental updates (CRUD)
- Creating, updating, or deleting a memory item builds category_updates (before/after pairs); _patch_persist_and_index then calls _patch_category_summaries to evolve the affected category summaries incrementally. [9][10]
4. Retrieval (retrieve)
- retrieve(queries, where) takes a list of role/content messages (earlier turns provide context for query rewriting; the last user turn is the query) plus an optional scope filter such as {"user_id": "123"}.
- Two ranking methods are available via retrieve_config.method: "rag" (embedding-based retrieval) and "llm" (LLM-based ranking).
5. Example: see the end-to-end sketch below, condensed from tests/test_inmemory.py and tests/test_postgres.py in the citations.
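A minimal end-to-end sketch, condensed from tests/test_inmemory.py. It assumes an already-configured MemoryService instance named `service`, a conversation file at `file_path`, and an async context; it is an illustration of the flow, not additional API surface.

```python
# Condensed from tests/test_inmemory.py; `service` and `file_path` are assumed to exist.
# Memorize a conversation for user "123": extraction -> categorization -> persist/index.
memory = await service.memorize(
    resource_url=file_path,
    modality="conversation",
    user={"user_id": "123"},
)
for cat in memory.get("categories", []):
    print(cat.get("name"), "->", (cat.get("summary") or "")[:80])

# Retrieve: earlier turns give context for query rewriting, the last user turn is the query.
queries = [
    {"role": "user", "content": {"text": "Tell me about preferences"}},
    {"role": "user", "content": {"text": "What are they"}},
]

# Switch between the two retrieval methods via retrieve_config.
service.retrieve_config.method = "rag"  # embedding-based retrieval
rag_result = await service.retrieve(queries=queries, where={"user_id": "123"})

service.retrieve_config.method = "llm"  # LLM-based ranking
llm_result = await service.retrieve(queries=queries, where={"user_id": "123"})
```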
Citations
File: README.md (L34-47)

```markdown
MemU organizes memory using a **three-layer architecture** inspired by hierarchical storage systems:

<img width="100%" alt="structure" src="assets/structure.png" />

| Layer | Description | Examples |
|-------|-------------|----------|
| **Resource** | Raw multimodal data warehouse | JSON conversations, text documents, images, videos |
| **Item** | Discrete extracted memory units | Individual preferences, skills, opinions, habits |
| **Category** | Aggregated textual memory with summaries | `preferences.md`, `work_life.md`, `relationships.md` |

**Key Benefits:**
- **Full Traceability**: Track from raw data → items → categories and back
- **Progressive Summarization**: Each layer provides increasingly abstracted views
- **Flexible Organization**: Categories evolve based on content patterns
```
File: docs/SERVICE_API.md (L30-35)

```markdown
`MemoryService` is the main entry point for the memU memory system. It provides methods to:

- **Memorize**: Extract and store structured memories from various resource types (conversations, documents, images, videos, audio)
- **Retrieve**: Query and retrieve relevant memories using RAG or LLM-based ranking
- **CRUD**: Create, read, update, and delete individual memory items
- **Pipeline Customization**: Extend and modify the internal processing pipelines
```
File: docs/SERVICE_API.md (L71-145)

### memorize

Extract and store structured memories from a resource.

```python
async def memorize(
    self,
    *,
    resource_url: str,
    modality: str,
    summary_prompt: str | None = None,
    user: dict[str, Any] | None = None,
) -> dict[str, Any]
```

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| resource_url | str | Yes | URL or local path to the resource to process. |
| modality | str | Yes | Type of resource: "conversation", "document", "image", "video", or "audio". |
| summary_prompt | `str \| None` | No | |
| user | `dict[str, Any] \| None` | No | |

Returns

```python
{
    "resource": {
        "id": str,
        "url": str,
        "modality": str,
        "local_path": str,
        "caption": str | None,
        "created_at": str,
        "updated_at": str,
        # ... user scope fields
    },
    "items": [
        {
            "id": str,
            "resource_id": str,
            "memory_type": str,  # "profile", "event", "knowledge", "behavior"
            "summary": str,
            "created_at": str,
            "updated_at": str,
            # ... user scope fields
        },
        ...
    ],
    "categories": [
        {
            "id": str,
            "name": str,
            "description": str,
            "summary": str | None,
            # ... user scope fields
        },
        ...
    ],
    "relations": [
        {
            "item_id": str,
            "category_id": str,
            # ... user scope fields
        },
        ...
    ]
}
```

Example

```python
result = await service.memorize(
```
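The return shape above is enough to walk all three layers. The helper below is illustrative only (it is not part of the memU API); it assumes a `result` dict shaped as documented, groups items under their categories via `relations`, and checks that each item traces back to the source resource.

```python
# Illustrative only: walk a memorize() result from categories back to items and the source resource.
from collections import defaultdict

def items_by_category(result: dict) -> dict[str, list[dict]]:
    """Group memory items under the categories they were linked to via `relations`."""
    items = {item["id"]: item for item in result["items"]}
    grouped: dict[str, list[dict]] = defaultdict(list)
    for rel in result["relations"]:
        grouped[rel["category_id"]].append(items[rel["item_id"]])
    return grouped

grouped = items_by_category(result)
for cat in result["categories"]:
    print(f"{cat['name']}: {len(grouped.get(cat['id'], []))} item(s)")
    for item in grouped.get(cat["id"], []):
        # Each item points back to the raw resource it was extracted from.
        assert item["resource_id"] == result["resource"]["id"]
```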
File: docs/SERVICE_API.md (L153-160)

```python
    print(f" [{item['memory_type']}] {item['summary'][:80]}...")
```

### retrieve

Query and retrieve relevant memories.
File: docs/SERVICE_API.md (L161-168)

```python
async def retrieve(
    self,
    queries: list[dict[str, Any]],
    where: dict[str, Any] | None = None,
) -> dict[str, Any]
```
File: src/memu/app/memorize.py (L63-93)

```python
async def memorize(
    self,
    *,
    resource_url: str,
    modality: str,
    user: dict[str, Any] | None = None,
) -> dict[str, Any]:
    ctx = self._get_context()
    store = self._get_database()
    user_scope = self.user_model(**user).model_dump() if user is not None else None

    await self._ensure_categories_ready(ctx, store, user_scope)
    memory_types = self._resolve_memory_types()

    state: WorkflowState = {
        "resource_url": resource_url,
        "modality": modality,
        "memory_types": memory_types,
        "categories_prompt_str": self._category_prompt_str,
        "ctx": ctx,
        "store": store,
        "category_ids": list(ctx.category_ids),
        "user": user_scope,
    }
    result = await self._run_workflow("memorize", state)
    response = cast(dict[str, Any] | None, result.get("response"))
    if response is None:
        msg = "Memorize workflow failed to produce a response"
        raise RuntimeError(msg)
    return response
```
File: src/memu/app/memorize.py (L280-288)

```python
async def _memorize_persist_and_index(self, state: WorkflowState, step_context: Any) -> WorkflowState:
    llm_client = self._get_step_llm_client(step_context)
    await self._update_category_summaries(
        state.get("category_updates", {}),
        ctx=state["ctx"],
        store=state["store"],
        llm_client=llm_client,
    )
    return state
```
File: src/memu/app/patch.py (L354-362)

```python
async def _patch_persist_and_index(self, state: WorkflowState, step_context: Any) -> WorkflowState:
    llm_client = self._get_step_llm_client(step_context)
    await self._patch_category_summaries(
        state.get("category_updates", {}),
        ctx=state["ctx"],
        store=state["store"],
        llm_client=llm_client,
    )
    return state
```
File: src/memu/app/crud.py (L425-475)

```python
async def _patch_update_memory_item(self, state: WorkflowState, step_context: Any) -> WorkflowState:
    memory_id = state["memory_id"]
    memory_payload = state["memory_payload"]
    ctx = state["ctx"]
    store = state["store"]
    user = state["user"]
    category_memory_updates: dict[str, tuple[Any, Any]] = {}

    item = store.memory_item_repo.get_item(memory_id)
    if not item:
        msg = f"Memory item with id {memory_id} not found"
        raise ValueError(msg)
    old_content = item.summary
    old_item_categories = store.category_item_repo.get_item_categories(memory_id)
    mapped_old_cat_ids = [cat.category_id for cat in old_item_categories]

    if memory_payload["content"]:
        embed_payload = [memory_payload["content"]]
        content_embedding = (await self._get_llm_client().embed(embed_payload))[0]
    else:
        content_embedding = None

    if memory_payload["type"] or memory_payload["content"]:
        item = store.memory_item_repo.update_item(
            item_id=memory_id,
            memory_type=memory_payload["type"],
            summary=memory_payload["content"],
            embedding=content_embedding,
        )

    new_cat_names = memory_payload["categories"]
    mapped_new_cat_ids = self._map_category_names_to_ids(new_cat_names, ctx)
    cats_to_remove = set(mapped_old_cat_ids) - set(mapped_new_cat_ids)
    cats_to_add = set(mapped_new_cat_ids) - set(mapped_old_cat_ids)

    for cid in cats_to_remove:
        store.category_item_repo.unlink_item_category(memory_id, cid)
        category_memory_updates[cid] = (old_content, None)
    for cid in cats_to_add:
        store.category_item_repo.link_item_category(memory_id, cid, user_data=dict(user or {}))
        category_memory_updates[cid] = (None, item.summary)
    if memory_payload["content"]:
        for cid in set(mapped_old_cat_ids) & set(mapped_new_cat_ids):
            category_memory_updates[cid] = (old_content, item.summary)

    state.update({
        "memory_item": item,
        "category_updates": category_memory_updates,
    })
    return state
```
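The category_updates mapping built by this step is what drives incremental summary evolution: each affected category id maps to a (before, after) pair that _patch_category_summaries can use to revise only the categories that actually changed. The sketch below uses hypothetical category ids and summary strings; only the dict shape mirrors category_memory_updates from the excerpt above.

```python
# Hypothetical before/after values illustrating the (old, new) pairs that
# _patch_update_memory_item builds for each affected category.
category_updates: dict[str, tuple[str | None, str | None]] = {
    # Item was unlinked from this category: old summary text -> None
    "cat-food": ("Likes spicy noodles", None),
    # Item was newly linked to this category: None -> new summary text
    "cat-travel": (None, "Prefers train trips over flights"),
    # Item stayed linked but its content changed: old text -> new text
    "cat-preferences": ("Likes spicy noodles", "Prefers mild food lately"),
}

# A downstream step can then re-summarize only the categories whose pair changed.
for category_id, (before, after) in category_updates.items():
    change = "add" if before is None else "remove" if after is None else "revise"
    print(category_id, "->", change)
```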
File: tests/test_inmemory.py (L23-67)

```python
# Memorize
print("\n[INMEMORY] Memorizing...")
memory = await service.memorize(resource_url=file_path, modality="conversation", user={"user_id": "123"})
for cat in memory.get("categories", []):
    print(f" - {cat.get('name')}: {(cat.get('summary') or '')[:80]}...")

queries = [
    {"role": "user", "content": {"text": "Tell me about preferences"}},
    {"role": "assistant", "content": {"text": "Sure, I'll tell you about their preferences"}},
    {
        "role": "user",
        "content": {"text": "What are they"},
    },  # This is the query that will be used to retrieve the memory, the context will be used for query rewriting
]

# RAG-based retrieval
print("\n[INMEMORY] RETRIEVED - RAG")
service.retrieve_config.method = "rag"
result_rag = await service.retrieve(queries=queries, where={"user_id": "123"})
print(" Categories:")
for cat in result_rag.get("categories", [])[:3]:
    print(f" - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
print(" Items:")
for item in result_rag.get("items", [])[:3]:
    print(f" - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
if result_rag.get("resources"):
    print(" Resources:")
    for res in result_rag.get("resources", [])[:3]:
        print(f" - [{res.get('modality')}] {res.get('url', '')[:80]}...")

# LLM-based retrieval
print("\n[INMEMORY] RETRIEVED - LLM")
service.retrieve_config.method = "llm"
result_llm = await service.retrieve(queries=queries, where={"user_id": "123"})
print(" Categories:")
for cat in result_llm.get("categories", [])[:3]:
    print(f" - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
print(" Items:")
for item in result_llm.get("items", [])[:3]:
    print(f" - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
if result_llm.get("resources"):
    print(" Resources:")
    for res in result_llm.get("resources", [])[:3]:
        print(f" - [{res.get('modality')}] {res.get('url', '')[:80]}...")
```
File: tests/test_postgres.py (L31-75)

```python
# Memorize
print("\n[POSTGRES] Memorizing...")
memory = await service.memorize(resource_url=file_path, modality="conversation", user={"user_id": "123"})
for cat in memory.get("categories", []):
    print(f" - {cat.get('name')}: {(cat.get('summary') or '')[:80]}...")

queries = [
    {"role": "user", "content": {"text": "Tell me about preferences"}},
    {"role": "assistant", "content": {"text": "Sure, I'll tell you about their preferences"}},
    {
        "role": "user",
        "content": {"text": "What are they"},
    },  # This is the query that will be used to retrieve the memory, the context will be used for query rewriting
]

# RAG-based retrieval
print("\n[POSTGRES] RETRIEVED - RAG")
service.retrieve_config.method = "rag"
result_rag = await service.retrieve(queries=queries, where={"user_id": "123"})
print(" Categories:")
for cat in result_rag.get("categories", [])[:3]:
    print(f" - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
print(" Items:")
for item in result_rag.get("items", [])[:3]:
    print(f" - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
if result_rag.get("resources"):
    print(" Resources:")
    for res in result_rag.get("resources", [])[:3]:
        print(f" - [{res.get('modality')}] {res.get('url', '')[:80]}...")

# LLM-based retrieval
print("\n[POSTGRES] RETRIEVED - LLM")
service.retrieve_config.method = "llm"
result_llm = await service.retrieve(queries=queries, where={"user_id": "123"})
print(" Categories:")
for cat in result_llm.get("categories", [])[:3]:
    print(f" - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
print(" Items:")
for item in result_llm.get("items", [])[:3]:
    print(f" - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
if result_llm.get("resources"):
    print(" Resources:")
    for res in result_llm.get("resources", [])[:3]:
        print(f" - [{res.get('modality')}] {res.get('url', '')[:80]}...")
```