How does memU handle memory?

https://deepwiki.com/search/_744e0876-fa6d-4963-8e01-09dfd4237487?mode=fast

MemU handles memory through a three-layer architecture (Resource → Item → Category): raw resources go through extraction to produce discrete memory items, which are then aggregated by topic into categories with self-evolving summaries. It supports batch memorization, incremental CRUD updates, and two retrieval modes: RAG and LLM. 1 2


Overview of the Memory Processing Flow

1. Three-Layer Architecture

  • Resource: raw multimodal data (conversations, documents, images, etc.) 3
  • Item: discrete memory units extracted from resources (preferences, skills, events, etc.) 4
  • Category: topic-aggregated textual memory with an evolving summary (e.g. `preferences.md`) 5

2. Memorization (memorize)

  • Entry point: service.memorize(resource_url, modality, user) 6
  • Flow: extract → classify → persist and index 7
  • Self-evolution: the persist_index step triggers _update_category_summaries, which batch-updates category summaries 8
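As a rough illustration of the extract → classify → persist/index flow, here is a toy pipeline in which simple string operations stand in for the LLM steps. All function names here are illustrative, not memU internals (the real entry point is `service.memorize`, excerpted in the citations below).

```python
def extract(raw_text: str) -> list[str]:
    """Split a raw resource into discrete memory items (one per sentence here;
    memU uses an LLM extraction step instead)."""
    return [s.strip() for s in raw_text.split(".") if s.strip()]

def classify(items: list[str]) -> dict[str, list[str]]:
    """Assign each item to a category (keyword matching stands in for the LLM)."""
    buckets: dict[str, list[str]] = {}
    for it in items:
        cat = "preferences" if "like" in it.lower() else "events"
        buckets.setdefault(cat, []).append(it)
    return buckets

def persist_and_index(buckets: dict[str, list[str]]) -> dict[str, str]:
    """Persist the items and regenerate each touched category summary
    (the stand-in for the batch summary update in the persist/index step)."""
    return {cat: " | ".join(items) for cat, items in buckets.items()}

summaries = persist_and_index(classify(extract("I like tea. Met Bob on Monday.")))
# summaries == {"preferences": "I like tea", "events": "Met Bob on Monday"}
```

The point of the shape is that only categories touched in this batch get their summaries rewritten, which is what makes the update step cheap relative to re-summarizing everything.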

3. Incremental Updates (CRUD)

  • Creating, updating, or deleting a memory item builds category_updates (before/after pairs); _patch_persist_and_index then calls _patch_category_summaries to evolve the affected category summaries incrementally 9 10

4. Retrieval (retrieve)

  • RAG: fast embedding-based retrieval that returns similarity scores 11
  • LLM: deeper retrieval based on semantic understanding, with support for query rewriting and early stopping 12
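A minimal sketch of the RAG path, assuming cosine similarity over embeddings: memU embeds items via its LLM client, whereas the 3-d vectors here are invented for illustration.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stored item embeddings (in memU these come from the embedding model).
memory_items = {
    "likes green tea":   [0.9, 0.1, 0.0],
    "met Bob on Monday": [0.1, 0.8, 0.2],
}
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "tell me about preferences"

# Rank items by similarity score, best match first.
ranked = sorted(memory_items,
                key=lambda k: cosine(memory_items[k], query_vec),
                reverse=True)
```

The LLM mode replaces this scoring with model-driven ranking, which is slower but can rewrite the query from conversational context and stop early once it has enough evidence.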

5. Examples

  • The in-memory and PostgreSQL tests both demonstrate the full memorize → retrieve (RAG/LLM) pipeline 13 14

Notes

  • The self-evolution mechanism rewrites category summaries via the LLM, following an "add/update only" policy that avoids explicit deletions 15
  • Incremental updates can return need_update: false to skip pointless rewrites and save cost 16

Citations

File: README.md (L34-47)

```markdown
MemU organizes memory using a **three-layer architecture** inspired by hierarchical storage systems:

<img width="100%" alt="structure" src="assets/structure.png" />

| Layer | Description | Examples |
|-------|-------------|----------|
| **Resource** | Raw multimodal data warehouse | JSON conversations, text documents, images, videos |
| **Item** | Discrete extracted memory units | Individual preferences, skills, opinions, habits |
| **Category** | Aggregated textual memory with summaries | `preferences.md`, `work_life.md`, `relationships.md` |

**Key Benefits:**
- **Full Traceability**: Track from raw data → items → categories and back
- **Progressive Summarization**: Each layer provides increasingly abstracted views
- **Flexible Organization**: Categories evolve based on content patterns
```

File: docs/SERVICE_API.md (L30-35)

```markdown
`MemoryService` is the main entry point for the memU memory system. It provides methods to:

- **Memorize**: Extract and store structured memories from various resource types (conversations, documents, images, videos, audio)
- **Retrieve**: Query and retrieve relevant memories using RAG or LLM-based ranking
- **CRUD**: Create, read, update, and delete individual memory items
- **Pipeline Customization**: Extend and modify the internal processing pipelines
```

File: docs/SERVICE_API.md (L71-145)

````markdown
### memorize

Extract and store structured memories from a resource.

```python
async def memorize(
    self,
    *,
    resource_url: str,
    modality: str,
    summary_prompt: str | None = None,
    user: dict[str, Any] | None = None,
) -> dict[str, Any]
```

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `resource_url` | `str` | Yes | URL or local path to the resource to process. |
| `modality` | `str` | Yes | Type of resource: `"conversation"`, `"document"`, `"image"`, `"video"`, or `"audio"`. |
| `summary_prompt` | `str \| None` | No | |
| `user` | `dict[str, Any] \| None` | No | |

#### Returns

```python
{
    "resource": {
        "id": str,
        "url": str,
        "modality": str,
        "local_path": str,
        "caption": str | None,
        "created_at": str,
        "updated_at": str,
        # ... user scope fields
    },
    "items": [
        {
            "id": str,
            "resource_id": str,
            "memory_type": str,  # "profile", "event", "knowledge", "behavior"
            "summary": str,
            "created_at": str,
            "updated_at": str,
            # ... user scope fields
        },
        ...
    ],
    "categories": [
        {
            "id": str,
            "name": str,
            "description": str,
            "summary": str | None,
            # ... user scope fields
        },
        ...
    ],
    "relations": [
        {
            "item_id": str,
            "category_id": str,
            # ... user scope fields
        },
        ...
    ]
}
```

#### Example

```python
result = await service.memorize(
```
````
File: docs/SERVICE_API.md (L153-160)

```markdown
    print(f"  [{item['memory_type']}] {item['summary'][:80]}...")

### retrieve

Query and retrieve relevant memories.
```

File: docs/SERVICE_API.md (L161-168)

```python
async def retrieve(
    self,
    queries: list[dict[str, Any]],
    where: dict[str, Any] | None = None,
) -> dict[str, Any]
```

File: src/memu/app/memorize.py (L63-93)

```python
    async def memorize(
        self,
        *,
        resource_url: str,
        modality: str,
        user: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        ctx = self._get_context()
        store = self._get_database()
        user_scope = self.user_model(**user).model_dump() if user is not None else None
        await self._ensure_categories_ready(ctx, store, user_scope)

        memory_types = self._resolve_memory_types()

        state: WorkflowState = {
            "resource_url": resource_url,
            "modality": modality,
            "memory_types": memory_types,
            "categories_prompt_str": self._category_prompt_str,
            "ctx": ctx,
            "store": store,
            "category_ids": list(ctx.category_ids),
            "user": user_scope,
        }

        result = await self._run_workflow("memorize", state)
        response = cast(dict[str, Any] | None, result.get("response"))
        if response is None:
            msg = "Memorize workflow failed to produce a response"
            raise RuntimeError(msg)
        return response
```
File: src/memu/app/memorize.py (L280-288)

```python
    async def _memorize_persist_and_index(self, state: WorkflowState, step_context: Any) -> WorkflowState:
        llm_client = self._get_step_llm_client(step_context)
        await self._update_category_summaries(
            state.get("category_updates", {}),
            ctx=state["ctx"],
            store=state["store"],
            llm_client=llm_client,
        )
        return state
```

File: src/memu/app/patch.py (L354-362)

```python
    async def _patch_persist_and_index(self, state: WorkflowState, step_context: Any) -> WorkflowState:
        llm_client = self._get_step_llm_client(step_context)
        await self._patch_category_summaries(
            state.get("category_updates", {}),
            ctx=state["ctx"],
            store=state["store"],
            llm_client=llm_client,
        )
        return state
```

File: src/memu/app/crud.py (L425-475)

```python
    async def _patch_update_memory_item(self, state: WorkflowState, step_context: Any) -> WorkflowState:
        memory_id = state["memory_id"]
        memory_payload = state["memory_payload"]
        ctx = state["ctx"]
        store = state["store"]
        user = state["user"]
        category_memory_updates: dict[str, tuple[Any, Any]] = {}

        item = store.memory_item_repo.get_item(memory_id)
        if not item:
            msg = f"Memory item with id {memory_id} not found"
            raise ValueError(msg)
        old_content = item.summary
        old_item_categories = store.category_item_repo.get_item_categories(memory_id)
        mapped_old_cat_ids = [cat.category_id for cat in old_item_categories]

        if memory_payload["content"]:
            embed_payload = [memory_payload["content"]]
            content_embedding = (await self._get_llm_client().embed(embed_payload))[0]
        else:
            content_embedding = None

        if memory_payload["type"] or memory_payload["content"]:
            item = store.memory_item_repo.update_item(
                item_id=memory_id,
                memory_type=memory_payload["type"],
                summary=memory_payload["content"],
                embedding=content_embedding,
            )
        new_cat_names = memory_payload["categories"]
        mapped_new_cat_ids = self._map_category_names_to_ids(new_cat_names, ctx)

        cats_to_remove = set(mapped_old_cat_ids) - set(mapped_new_cat_ids)
        cats_to_add = set(mapped_new_cat_ids) - set(mapped_old_cat_ids)
        for cid in cats_to_remove:
            store.category_item_repo.unlink_item_category(memory_id, cid)
            category_memory_updates[cid] = (old_content, None)
        for cid in cats_to_add:
            store.category_item_repo.link_item_category(memory_id, cid, user_data=dict(user or {}))
            category_memory_updates[cid] = (None, item.summary)

        if memory_payload["content"]:
            for cid in set(mapped_old_cat_ids) & set(mapped_new_cat_ids):
                category_memory_updates[cid] = (old_content, item.summary)

        state.update({
            "memory_item": item,
            "category_updates": category_memory_updates,
        })
        return state
```

File: tests/test_inmemory.py (L23-67)

```python
    # Memorize
    print("\n[INMEMORY] Memorizing...")
    memory = await service.memorize(resource_url=file_path, modality="conversation", user={"user_id": "123"})
    for cat in memory.get("categories", []):
        print(f"  - {cat.get('name')}: {(cat.get('summary') or '')[:80]}...")

    queries = [
        {"role": "user", "content": {"text": "Tell me about preferences"}},
        {"role": "assistant", "content": {"text": "Sure, I'll tell you about their preferences"}},
        {
            "role": "user",
            "content": {"text": "What are they"},
        },  # This is the query that will be used to retrieve the memory, the context will be used for query rewriting
    ]

    # RAG-based retrieval
    print("\n[INMEMORY] RETRIEVED - RAG")
    service.retrieve_config.method = "rag"
    result_rag = await service.retrieve(queries=queries, where={"user_id": "123"})
    print("  Categories:")
    for cat in result_rag.get("categories", [])[:3]:
        print(f"    - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
    print("  Items:")
    for item in result_rag.get("items", [])[:3]:
        print(f"    - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
    if result_rag.get("resources"):
        print("  Resources:")
        for res in result_rag.get("resources", [])[:3]:
            print(f"    - [{res.get('modality')}] {res.get('url', '')[:80]}...")

    # LLM-based retrieval
    print("\n[INMEMORY] RETRIEVED - LLM")
    service.retrieve_config.method = "llm"
    result_llm = await service.retrieve(queries=queries, where={"user_id": "123"})
    print("  Categories:")
    for cat in result_llm.get("categories", [])[:3]:
        print(f"    - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
    print("  Items:")
    for item in result_llm.get("items", [])[:3]:
        print(f"    - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
    if result_llm.get("resources"):
        print("  Resources:")
        for res in result_llm.get("resources", [])[:3]:
            print(f"    - [{res.get('modality')}] {res.get('url', '')[:80]}...")
```

File: tests/test_postgres.py (L31-75)

```python
    # Memorize
    print("\n[POSTGRES] Memorizing...")
    memory = await service.memorize(resource_url=file_path, modality="conversation", user={"user_id": "123"})
    for cat in memory.get("categories", []):
        print(f"  - {cat.get('name')}: {(cat.get('summary') or '')[:80]}...")

    queries = [
        {"role": "user", "content": {"text": "Tell me about preferences"}},
        {"role": "assistant", "content": {"text": "Sure, I'll tell you about their preferences"}},
        {
            "role": "user",
            "content": {"text": "What are they"},
        },  # This is the query that will be used to retrieve the memory, the context will be used for query rewriting
    ]

    # RAG-based retrieval
    print("\n[POSTGRES] RETRIEVED - RAG")
    service.retrieve_config.method = "rag"
    result_rag = await service.retrieve(queries=queries, where={"user_id": "123"})
    print("  Categories:")
    for cat in result_rag.get("categories", [])[:3]:
        print(f"    - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
    print("  Items:")
    for item in result_rag.get("items", [])[:3]:
        print(f"    - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
    if result_rag.get("resources"):
        print("  Resources:")
        for res in result_rag.get("resources", [])[:3]:
            print(f"    - [{res.get('modality')}] {res.get('url', '')[:80]}...")

    # LLM-based retrieval
    print("\n[POSTGRES] RETRIEVED - LLM")
    service.retrieve_config.method = "llm"
    result_llm = await service.retrieve(queries=queries, where={"user_id": "123"})
    print("  Categories:")
    for cat in result_llm.get("categories", [])[:3]:
        print(f"    - {cat.get('name')}: {(cat.get('summary') or cat.get('description', ''))[:80]}...")
    print("  Items:")
    for item in result_llm.get("items", [])[:3]:
        print(f"    - [{item.get('memory_type')}] {item.get('summary', '')[:100]}...")
    if result_llm.get("resources"):
        print("  Resources:")
        for res in result_llm.get("resources", [])[:3]:
            print(f"    - [{res.get('modality')}] {res.get('url', '')[:80]}...")
```