How does memU handle memory?

https://deepwiki.com/search/_744e0876-fa6d-4963-8e01-09dfd4237487?mode=fast

MemU handles memory through a three-layer architecture (Resource → Item → Category): raw resources go through extraction to produce discrete memory items, which are then aggregated by topic into categories with self-evolving summaries. It supports batch memorization, incremental CRUD updates, and two retrieval modes: RAG and LLM. 1 2


Overview of the Memory Processing Flow

1. Three-Layer Architecture

  • Resource: raw multimodal data (conversations, documents, images, etc.) 3
  • Item: discrete memory units extracted from resources (preferences, skills, events, etc.) 4
  • Category: topic-aggregated textual memory with an evolving summary (e.g. `preferences.md`) 5

2. Memorization (memorize)

  • Entry point: service.memorize(resource_url, modality, user) 6
  • Flow: extract → classify → persist and index 7
  • Self-evolution: the persist_index step triggers _update_category_summaries, which batch-updates category summaries 8
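As a rough illustration of the extract → classify → persist/index flow, here is a toy pipeline in which simple string operations stand in for the LLM steps. All function names here are illustrative, not memU internals (the real entry point is `service.memorize`, excerpted in the citations below).

```python
def extract(raw_text: str) -> list[str]:
    """Split a raw resource into discrete memory items (one per sentence here;
    memU uses an LLM extraction step instead)."""
    return [s.strip() for s in raw_text.split(".") if s.strip()]

def classify(items: list[str]) -> dict[str, list[str]]:
    """Assign each item to a category (keyword matching stands in for the LLM)."""
    buckets: dict[str, list[str]] = {}
    for it in items:
        cat = "preferences" if "like" in it.lower() else "events"
        buckets.setdefault(cat, []).append(it)
    return buckets

def persist_and_index(buckets: dict[str, list[str]]) -> dict[str, str]:
    """Persist the items and regenerate each touched category summary
    (the stand-in for the batch summary update in the persist/index step)."""
    return {cat: " | ".join(items) for cat, items in buckets.items()}

summaries = persist_and_index(classify(extract("I like tea. Met Bob on Monday.")))
# summaries == {"preferences": "I like tea", "events": "Met Bob on Monday"}
```

The point of the shape is that only categories touched in this batch get their summaries rewritten, which is what makes the update step cheap relative to re-summarizing everything.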

3. Incremental Updates (CRUD)

  • Creating, updating, or deleting a memory item builds category_updates (before/after pairs); _patch_persist_and_index then calls _patch_category_summaries to evolve the affected category summaries incrementally 9 10

4. Retrieval (retrieve)

  • RAG: fast embedding-based retrieval that returns similarity scores 11
  • LLM: deeper retrieval based on semantic understanding, with support for query rewriting and early stopping 12
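A minimal sketch of the RAG path, assuming cosine similarity over embeddings: memU embeds items via its LLM client, whereas the 3-d vectors here are invented for illustration.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stored item embeddings (in memU these come from the embedding model).
memory_items = {
    "likes green tea":   [0.9, 0.1, 0.0],
    "met Bob on Monday": [0.1, 0.8, 0.2],
}
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "tell me about preferences"

# Rank items by similarity score, best match first.
ranked = sorted(memory_items,
                key=lambda k: cosine(memory_items[k], query_vec),
                reverse=True)
```

The LLM mode replaces this scoring with model-driven ranking, which is slower but can rewrite the query from conversational context and stop early once it has enough evidence.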

5. Examples

  • The in-memory and PostgreSQL tests both demonstrate the full memorize → retrieve (RAG/LLM) pipeline 13 14

Notes

  • The self-evolution mechanism rewrites category summaries via the LLM, following an "add/update only" policy that avoids explicit deletions 15
  • Incremental updates can return need_update: false to skip pointless rewrites and save cost 16

Citations

File: README.md (L34-47)

```markdown
MemU organizes memory using a **three-layer architecture** inspired by hierarchical storage systems:

<img width="100%" alt="structure" src="assets/structure.png" />

| Layer | Description | Examples |
|-------|-------------|----------|
| **Resource** | Raw multimodal data warehouse | JSON conversations, text documents, images, videos |
| **Item** | Discrete extracted memory units | Individual preferences, skills, opinions, habits |
| **Category** | Aggregated textual memory with summaries | `preferences.md`, `work_life.md`, `relationships.md` |

**Key Benefits:**
- **Full Traceability**: Track from raw data → items → categories and back
- **Progressive Summarization**: Each layer provides increasingly abstracted views
- **Flexible Organization**: Categories evolve based on content patterns
```

File: docs/SERVICE_API.md (L30-35)

```markdown
`MemoryService` is the main entry point for the memU memory system. It provides methods to:

- **Memorize**: Extract and store structured memories from various resource types (conversations, documents, images, videos, audio)
- **Retrieve**: Query and retrieve relevant memories using RAG or LLM-based ranking
- **CRUD**: Create, read, update, and delete individual memory items
- **Pipeline Customization**: Extend and modify the internal processing pipelines
```

File: docs/SERVICE_API.md (L71-145)

````markdown
### memorize

Extract and store structured memories from a resource.

```python
async def memorize(
    self,
    *,
    resource_url: str,
    modality: str,
    summary_prompt: str | None = None,
    user: dict[str, Any] | None = None,
) -> dict[str, Any]
```

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `resource_url` | `str` | Yes | URL or local path to the resource to process. |
| `modality` | `str` | Yes | Type of resource: `"conversation"`, `"document"`, `"image"`, `"video"`, or `"audio"`. |
| `summary_prompt` | `str \| None` | No | |
| `user` | `dict[str, Any] \| None` | No | |

#### Returns

```python
{
    "resource": {
        "id": str,
        "url": str,
        "modality": str,
        "local_path": str,
        "caption": str | None,
        "created_at": str,
        "updated_at": str,
        # ... user scope fields
    },
    "items": [
        {
            "id": str,
            "resource_id": str,
            "memory_type": str,  # "profile", "event", "knowledge", "behavior"
            "summary": str,
            "created_at": str,
            "updated_at": str,
            # ... user scope fields
        },
        ...
    ],
    "categories": [
        {
            "id": str,
            "name": str,
            "description": str,
            "summary": str | None,
            # ... user scope fields
        },
        ...
    ],
    "relations": [
        {
            "item_id": str,
            "category_id": str,
            # ... user scope fields
        },
        ...
    ]
}
```

#### Example

```python
result = await service.memorize(
```
````
File: docs/SERVICE_API.md (L153-160)

```markdown
    print(f"  [{item['memory_type']}] {item['summary'][:80]}...")

### retrieve

Query and retrieve relevant memories.
```

File: docs/SERVICE_API.md (L161-168)

```python
async def retrieve(
    self,
    queries: list[dict[str, Any]],
    where: dict[str, Any] | None = None,
) -> dict[str, Any]
```

File: src/memu/app/memorize.py (L63-93)

```python
    async def memorize(
        self,
        *,
        resource_url: str,
        modality: str,
        user: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        ctx = self._get_context()
        store = self._get_database()
        user_scope = self.user_model(**user).model_dump() if user is not None else None
        await self._ensure_categories_ready(ctx, store, user_scope)

        memory_types = self._resolve_memory_types()

        state: WorkflowState = {
            "resource_url": resource_url,
            "modality": modality,
            "memory_types": memory_types,
            "categories_prompt_str": self._category_prompt_str,
            "ctx": ctx,
            "store": store,
            "category_ids": list(ctx.category_ids),
            "user": user_scope,
        }

        result = await self._run_workflow("memorize", state)
        response = cast(dict[str, Any] | None, result.get("response"))
        if response is None:
            msg = "Memorize workflow failed to produce a response"
            raise RuntimeError(msg)
        return response
```
File: src/memu/app/memorize.py (L280-288)

```python
    async def _memorize_persist_and_index(self, state: WorkflowState, step_context: Any) -> WorkflowState:
        llm_client = self._get_step_llm_client(step_context)
        await self._update_category_summaries(
            state.get("category_updates", {}),
            ctx=state["ctx"],
            store=state["store"],
            llm_client=llm_client,
        )
        return state
```

File: src/memu/app/patch.py (L354-362)

```python
    async def _patch_persist_and_index(self, state: WorkflowState, step_context: Any) -> WorkflowState:
        llm_client = self._get_step_llm_client(step_context)
        await self._patch_category_summaries(
            state.get("category_updates", {}),
            ctx=state["ctx"],
            store=state["store"],
            llm_client=llm_client,
        )
        return state
```

File: src/memu/app/crud.py (L425-475)

```python
    async def _patch_update_memory_item(self, state: WorkflowState, step_context: Any) -> WorkflowState:
        memory_id = state["memory_id"]
        memory_payload = state["memory_payload"]
        ctx = state["ctx"]
        store = state["store"]
        user = state["user"]
        category_memory_updates: dict[str, tuple[Any, Any]] = {}

        item = store.memory_item_repo.get_item(memory_id)
        if not item:
            msg = f"Memory item with id {memory_id} not found"
            raise ValueError(msg)
        old_content = item.summary
        old_item_categories = store.category_item_repo.get_item_categories(memory_id)
        mapped_old_cat_ids = [cat.category_id for cat in old_item_categories]

        if memory_payload["content"]:
            embed_payload = [memory_payload["content"]]
            content_embedding = (await self._get_llm_client().embed(embed_payload))[0]
        else:
            content_embedding = None

        if memory_payload["type"] or memory_payload["content"]:
            item = store.memory_item_repo.update_item(
                item_id=memory_id,
                memory_type=memory_payload["type"],
                summary=memory_payload["content"],
                embedding=content_embedding,
            )
        new_cat_names = memory_payload["categories"]
        mapped_new_cat_ids = self._map_category_names_to_ids(new_cat_names, ctx)

        cats_to_remove = set(mapped_old_cat_ids) - set(mapped_new_cat_ids)
        cats_to_add = set(mapped_new_cat_ids) - set(mapped_old_cat_ids)
        for cid in cats_to_remove:
            store.category_item_repo.unlink_item_category(memory_id, cid)
            category_memory_updates[cid] = (old_content, None)
        for cid in cats_to_add:
            store.category_item_repo.link_item_category(memory_id, cid, user_data=dict(user or {}))
            category_memory_updates[cid] = (None, item.summary)

        if memory_payload["content"]:
            for cid in set(mapped_old_cat_ids) & set(mapped_new_cat_ids):
                category_memory_updates[cid] = (old_content, item.summary)

        state.update({
            "memory_item": item,
            "category_updates": category_memory_updates,
        })
        return state
```

File: tests/test_inmemory.py (L23-67)

```python
    # Memorize
    print("\n[INMEMORY] Memorizing...")
    memory = await service.memorize(resource_url=file_path, modality="conversation", user={"user_id": "123"})
    for cat in memory.get("categories", []):
        print(f"  - {cat.get('name')}: {(cat.get('summary') or '')[:80]}...")

    queries = [
        {"role": "user", "content": {"text": "Tell me about preferences"}},
        {"role": "assistant", "content": {"text": "Sure, I'll tell you about their preferences"}},
        {
            "role": "user",
            "content": {"text": "What are they"},
        },  # This is the query that will be used to retrieve the memory, the context will be used for query rewriting
    ]

    # RAG-based retrieval
    print("\n[INMEMORY] RETRIEVED - RAG")
    service.retrieve_config.method = "rag"
    result_rag = await service.retrieve(queries=queries, where={"user_id": "123"})
    print("  Categories:")
    for cat in result_rag.get("categories", [])[:3]:
        print(f"    - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
    print("  Items:")
    for item in result_rag.get("items", [])[:3]:
        print(f"    - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
    if result_rag.get("resources"):
        print("  Resources:")
        for res in result_rag.get("resources", [])[:3]:
            print(f"    - [{res.get('modality')}] {res.get('url', '')[:80]}...")

    # LLM-based retrieval
    print("\n[INMEMORY] RETRIEVED - LLM")
    service.retrieve_config.method = "llm"
    result_llm = await service.retrieve(queries=queries, where={"user_id": "123"})
    print("  Categories:")
    for cat in result_llm.get("categories", [])[:3]:
        print(f"    - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
    print("  Items:")
    for item in result_llm.get("items", [])[:3]:
        print(f"    - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
    if result_llm.get("resources"):
        print("  Resources:")
        for res in result_llm.get("resources", [])[:3]:
            print(f"    - [{res.get('modality')}] {res.get('url', '')[:80]}...")
```

File: tests/test_postgres.py (L31-75)

```python
    # Memorize
    print("\n[POSTGRES] Memorizing...")
    memory = await service.memorize(resource_url=file_path, modality="conversation", user={"user_id": "123"})
    for cat in memory.get("categories", []):
        print(f"  - {cat.get('name')}: {(cat.get('summary') or '')[:80]}...")

    queries = [
        {"role": "user", "content": {"text": "Tell me about preferences"}},
        {"role": "assistant", "content": {"text": "Sure, I'll tell you about their preferences"}},
        {
            "role": "user",
            "content": {"text": "What are they"},
        },  # This is the query that will be used to retrieve the memory, the context will be used for query rewriting
    ]

    # RAG-based retrieval
    print("\n[POSTGRES] RETRIEVED - RAG")
    service.retrieve_config.method = "rag"
    result_rag = await service.retrieve(queries=queries, where={"user_id": "123"})
    print("  Categories:")
    for cat in result_rag.get("categories", [])[:3]:
        print(f"    - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
    print("  Items:")
    for item in result_rag.get("items", [])[:3]:
        print(f"    - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
    if result_rag.get("resources"):
        print("  Resources:")
        for res in result_rag.get("resources", [])[:3]:
            print(f"    - [{res.get('modality')}] {res.get('url', '')[:80]}...")

    # LLM-based retrieval
    print("\n[POSTGRES] RETRIEVED - LLM")
    service.retrieve_config.method = "llm"
    result_llm = await service.retrieve(queries=queries, where={"user_id": "123"})
    print("  Categories:")
    for cat in result_llm.get("categories", [])[:3]:
        print(f"    - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
    print("  Items:")
    for item in result_llm.get("items", [])[:3]:
        print(f"    - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
    if result_llm.get("resources"):
        print("  Resources:")
        for res in result_llm.get("resources", [])[:3]:
            print(f"    - [{res.get('modality')}] {res.get('url', '')[:80]}...")
```