【4.1.-4.20学习周报】

提示：文章写完后，目录可以自动生成，如何生成可参考右边的帮助文档

文章目录

摘要
Abstract
一、方法介绍
- 1.1HippoRAG
1.2HippoRAG2
二、实验
总结

摘要

本博客介绍了论文《From RAG to Memory: Non-Parametric Continual Learning for

Large Language Models》提出HippoRAG 2框架，以解决现有检索增强生成（RAG）系统在模拟人类长期记忆方面的局限。首先分析了持续学习LLMs的挑战及RAG方法的不足，如无法捕捉长期记忆的意义构建和关联性。接着介绍HippoRAG 2的离线索引和在线检索流程，通过密集 - 稀疏集成、深度上下文关联和识别记忆等改进，增强了与人类记忆机制的契合度。实验表明，HippoRAG 2在事实、意义构建和关联记忆任务上全面超越标准RAG方法，为LLMs的非参数持续学习开辟了新途径。

之前的周报我也有提到HippoRAG，HippoRAG 2是对HippoRAG进行升级和改进

Abstract

This blog presents the paper From RAG to Memory: Non-Parametric Continual Learning for Large Language Models proposes the HippoRAG 2 framework to address the limitations of existing Retrieval Enhancement Generation (RAG) systems in simulating human long-term memory. Firstly, the challenges of continuous learning LLMs and the shortcomings of RAG methods, such as the inability to capture the meaning-making and correlation of long-term memory, were analyzed. Next, the offline indexing and online retrieval process of HippoRAG 2 is introduced, which enhances the fit with human memory mechanisms through improvements such as dense-sparse integration, deep context association, and recognition memory. Experiments show that HippoRAG 2 surpasses the standard RAG method in fact, meaning construction and associative memory tasks, opening up a new way for nonparametric continuous learning of LLMs

一、方法介绍

最近提出了几个RAG框架，这些框架使用LLM来显式地构建其检索语料库，以解决这些限制。为了增强意义构建，这种结构增强RAG方法允许LLM生成摘要或知识图(KG)结构连接不同但相关的段落组，从而提高RAG系统理解更长的更复杂的话语的能力。

为了解决联想性差距，HippoRAG使用了个性化PageRank算法和LLM自动构建KG并赋予检索过程多跳推理能力的能力。

尽管这些方法在这两种更具挑战性的记忆任务中表现出了强大的性能，但要使RAG真正接近人类的长期记忆，还需要在更简单的记忆任务中进行非业务操作。为了了解这些系统是否能够实现这样的业务，研究者进行了全面的实验，不仅通过多跳QA大规模语篇理解同时评估它们的联想性和意义构建能力，还通过简单的QA任务测试它们的事实记忆能力。

如上图所示，，在所有三种基准类型上，与最强的基于嵌入的RAG方法相比，所有先前的结构增强方法都表现不佳。

研究者提出的方法，HippoRAG 2，利用HippoRAG的OpenIE和个性化PageRank (PPR)方法的优势，同时通过将段落整合到PPR图搜索过程中，解决了基于查询的上下文化限制。在KG三元组的选择中更深入地涉及查询，并在在线检索过程中使用LLM来识别检索到的三元组何时不相关。

1.1HippoRAG

在这里对之前看过的HippoRAG做一个简单的复习。HippoRAG是是一个受神经生物学启发的llm长期记忆框架，每个组件都旨在模拟人类记忆的各个方面。该框架由三个主要部分组成:人工新皮层(LLM)，海马旁区(PHR编码器)和人工海马(开放KG)。这些组成部分协作复制了人类长期记忆中观察到的相互作用。

对于HippoRAG离线索引，LLM将pass -sages处理成KG三元组，然后将其合并到工海马索引中。同时，PHR负责检测同义词以实现信息互连。对于HippoRAG在线检索，LLM新皮层从查询中提取命名实体，而PHR编码器将这些实体链接到海马体索引。然后，在KG上进行个性化PageRank (personal - personalisedPageRank, PPR)算法，进行基于上下文的检索。尽管HippoRAG试图从非参数RAG构建记忆，但其有效性受到一个关键缺陷的阻碍:以实体为中心的方法会导致索引和推理过程中的上下文丢失，以及语义匹配困难。

1.2HippoRAG2

基于HippoRAG中提出的神经生物学启发的长期记忆框架，HippoRAG 2的结构遵循类似的两阶段过程:离线索引和在线检索 .
HippoRAG 2方法。对于离线索引，使用LLM从段落中提取开放的KG三元组，并将同义词检测应用于短语节点。这些短语和段落一起构成了开放式的KG。对于在线检索，嵌入模型对段落和三元组进行评分，以识别个性化PageRank (PPR)算法的两种类型的种子节点。识别记忆使用LLM过滤最上面的三元组。然后，PPR算法在KG上执行基于上下文的检索，为最终的QA任务提供最相关的段落。上面KG节点的不同颜色反映了它们的概率质量;深色表示PPR过程引起的可能性较高。

HippoRAG2在原有的基础上通过引入以下几个层面来增强与人类长期记忆的关联，:1)它在开放的KG中无缝地集成了概念和上下文信息，增强了构建索引的全面性和原子性。2)通过利用KG结构超越孤立的KG节点，它促进了更多的上下文感知检索。3)它结合了识别记忆，以改进图搜索的种子节点选择。

受人脑中观察到的密集-稀疏整合的启发，研究者将短语节点作为提取概念的稀疏编码形式，同时将密集编码纳入我们的KG中，以表示这些概念产生的上下文。首先，采用了一种编码方法------类似于短语的编码方式，使用嵌入模型。然后在KG中以特定的方式集成这两种类型的编码。与HippoRAG中的文档集成(简单地从图搜索嵌入匹配中聚合分数)不同，通过引入通道节点来增强KG，实现上下文信息的更无缝集成。这种方法保留了与HippoRAG相同的离线索引过程，同时在构建过程中使用与通道相关的附加节点和边来丰富图结构。具体来说，语料库中的每个段落都被视为一个段落节点，上下文边缘标记为"contains"，将该段落与从该段落衍生出来的所有短语连接起来。

Recognition Memory ：回忆和识别是人类记忆检索中的两个互补过程。回忆涉及在没有外部线索的情况下主动检索信息，而识别则依赖于在外部刺激的帮助下识别信息。受此启发，我们将查询到三重检索建模为一个两步过程。1) Query to Triple:使用嵌入模型检索图的top-k个三元组T。2)三重过滤:使用llm来过滤检索到的T并生成三元组T′⊆T。

二、实验

2.1实验概况

研究者选择了三种比较方法作为基线:1)经典寻回犬BM25 (、和GTR (Ni et al.， 2022)。2)在BEIR排行榜上表现良好的大型嵌入模型、，包括阿里巴巴- nlp /GTE-Qwen2-7B-Instruct 、、GritLM/GritLM- 7b 、和nvidia/NV-Embed-v2 、。3)结构增强RAG方法，包括RAPTOR 、GraphRAG 、LightRAG ()和Hip-poRAG 。

数据集 ：为了评估RAG系统在增强联想性和意义构建的同时保留

事实记忆的能力，选择了对应于三种关键挑战类型的数据集:1)简单QA主要评估准确回忆和检索事实知识的能力。2)多跳QA(Multi-hop QA)通过要求模型连接多

个信息来得出答案来衡量关联性。3)话语理解(discourseunderstanding)通过测试的方法来评估意义构建对冗长、复杂的叙述进行解释和推理的能力。

2.2实验代码

完整项目代码：实验代码

以下展示HippoRAG2的关键实现代码：

python 复制代码

import torch
from transformers import AutoModel, AutoTokenizer, pipeline
import networkx as nx
import numpy as np

# Load models
llm_model_name = "meta-llama/Llama-3.3-70B-Instruct"
encoder_model_name = "nvidia/NV-Embed-v2"
llm_tokenizer = AutoTokenizer.from_pretrained(llm_model_name)
llm_model = AutoModel.from_pretrained(llm_model_name)
encoder_tokenizer = AutoTokenizer.from_pretrained(encoder_model_name)
encoder_model = AutoModel.from_pretrained(encoder_model_name)

# Offline Indexing
def offline_indexing(corpus):
    """
    Build the knowledge graph from the corpus during offline indexing.
    
    Args:
        corpus (list): List of text passages.
    
    Returns:
        kg (nx.DiGraph): Knowledge graph with phrase and passage nodes.
        phrase_to_passage (dict): Mapping of phrases to passage IDs.
    """
    kg = nx.DiGraph()
    phrase_to_passage = {}
    
    for passage_id, passage in enumerate(corpus):
        # Extract triples using LLM
        triples = extract_triples(passage)
        for triple in triples:
            subject, relation, object_ = triple
            kg.add_edge(subject, object_, relation=relation)
            phrase_to_passage[subject] = passage_id
            phrase_to_passage[object_] = passage_id
        
        # Add passage node and connect to phrases
        passage_node = f"passage_{passage_id}"
        kg.add_node(passage_node, type="passage", content=passage)
        for triple in triples:
            kg.add_edge(passage_node, triple[0], relation="contains")
            kg.add_edge(passage_node, triple[2], relation="contains")
    
    # Synonym detection
    phrases = [node for node in kg.nodes if not node.startswith("passage_")]
    if phrases:
        phrase_embeddings = get_embeddings(phrases)
        for i in range(len(phrases)):
            for j in range(i + 1, len(phrases)):
                similarity = cosine_similarity(phrase_embeddings[i], phrase_embeddings[j])
                if similarity > 0.8:  # Synonym threshold
                    kg.add_edge(phrases[i], phrases[j], relation="synonym")
                    kg.add_edge(phrases[j], phrases[i], relation="synonym")
    
    return kg, phrase_to_passage

def extract_triples(passage):
    """
    Extract triples from a passage using an LLM (simplified placeholder).
    
    Args:
        passage (str): Text passage.
    
    Returns:
        list: List of (subject, relation, object) triples.
    """
    # Placeholder: In practice, use OpenIE with LLM
    return [("example_subject", "example_relation", "example_object")]

def get_embeddings(texts):
    """
    Generate embeddings for a list of texts using the encoder.
    
    Args:
        texts (list): List of text strings.
    
    Returns:
        torch.Tensor: Embeddings tensor.
    """
    inputs = encoder_tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        embeddings = encoder_model(**inputs).last_hidden_state.mean(dim=1)
    return embeddings

def cosine_similarity(a, b):
    """
    Compute cosine similarity between two vectors.
    
    Args:
        a (np.ndarray): First vector.
        b (np.ndarray): Second vector.
    
    Returns:
        float: Cosine similarity score.
    """
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Online Retrieval
def online_retrieval(query, kg, phrase_to_passage, corpus):
    """
    Perform online retrieval for a given query.
    
    Args:
        query (str): User query.
        kg (nx.DiGraph): Knowledge graph.
        phrase_to_passage (dict): Mapping of phrases to passage IDs.
        corpus (list): List of text passages.
    
    Returns:
        str: Generated answer.
    """
    # Query to triples
    query_embedding = get_embeddings([query])[0]
    triples = [(s, o, d['relation']) for s, o, d in kg.edges(data=True) if d['relation'] != "contains" and d['relation'] != "synonym"]
    if triples:
        triple_texts = [f"{s} {r} {o}" for s, r, o in triples]
        triple_embeddings = get_embeddings(triple_texts)
        similarities = [cosine_similarity(query_embedding, emb) for emb in triple_embeddings]
        top_k_indices = np.argsort(similarities)[-5:][::-1]  # Top-5 triples
        top_k_triples = [triples[i] for i in top_k_indices]
    else:
        top_k_triples = []
    
    # Recognition memory filtering
    filtered_triples = filter_triples(query, top_k_triples)
    
    # Personalized PageRank
    if filtered_triples:
        seed_phrases = set()
        for triple in filtered_triples:
            seed_phrases.add(triple[0])
            seed_phrases.add(triple[2])
        seed_passages = [node for node in kg.nodes if node.startswith("passage_")]
        seed_nodes = list(seed_phrases) + seed_passages
        
        # Assign reset probabilities
        reset_probabilities = {}
        phrase_total = len(seed_phrases)
        passage_total = len(seed_passages)
        for node in seed_phrases:
            reset_probabilities[node] = 0.95 / phrase_total if phrase_total > 0 else 0
        for node in seed_passages:
            reset_probabilities[node] = 0.05 / passage_total if passage_total > 0 else 0
        
        ppr_scores = nx.pagerank(kg, personalization=reset_probabilities, alpha=0.5)
        ranked_passages = sorted(
            [(node, score) for node, score in ppr_scores.items() if node.startswith("passage_")],
            key=lambda x: x[1],
            reverse=True
        )
        top_passage_ids = [int(node.split("_")[1]) for node, _ in ranked_passages[:5]]
    else:
        # Fallback to embedding-based retrieval
        passage_embeddings = get_embeddings(corpus)
        similarities = [cosine_similarity(query_embedding, emb) for emb in passage_embeddings]
        top_passage_ids = np.argsort(similarities)[-5:][::-1]
    
    # Generate answer
    context = " ".join([corpus[i] for i in top_passage_ids])
    answer = generate_answer(query, context)
    return answer

def filter_triples(query, triples):
    """
    Filter triples based on relevance to the query using LLM (simplified placeholder).
    
    Args:
        query (str): User query.
        triples (list): List of (subject, relation, object) triples.
    
    Returns:
        list: Filtered list of triples.
    """
    # Placeholder: In practice, use LLM with prompts as in Appendix A
    return triples  # Simplified: return all triples

def generate_answer(query, context):
    """
    Generate an answer using the LLM based on query and context (simplified placeholder).
    
    Args:
        query (str): User query.
        context (str): Retrieved context.
    
    Returns:
        str: Generated answer.
    """
    # Placeholder: In practice, use LLM for QA
    return "Generated answer based on context"

# Example usage
if __name__ == "__main__":
    corpus = [
        "Example passage 1 about some topic.",
        "Example passage 2 with related information.",
        # Add more passages as needed
    ]
    kg, phrase_to_passage = offline_indexing(corpus)
    query = "What is the topic discussed in the passages?"
    answer = online_retrieval(query, kg, phrase_to_passage, corpus)
    print(f"Query: {query}")
    print(f"Answer: {answer}")

2.3实验结果

可以从上图看到，hippoRAG2在所有QA数据集上的表现均超过了基线模型以及之前的hippoRAG。

研究者针对所提出的连接方法、图构建方法和三重滤波方法设计了烧蚀实验，结果如上图所示。每个引入的机制都促进了HippoRAG 2。首先，具有更深上下文化的

链接方法带来了显著的性能提升。值得注意的是，没有对ner_to -node或查询to-node方法应用过滤过程;然而，无论是否应用过滤，查询到三重方法的性能始

终优于其他两种链接策略。与ner_to -node相比，查询-to-triple平均提高Recall12.5%。此外，查询到节点并不比ner_to -node具有优势，因为查询和KG节点在不

同的粒度级别上操作，而NER结果和KG节点都对应于短语级表示。

总结

基于HippoRAG，引入深度段落整合、上下文感知检索和识别记忆机制，使知识图谱更好融合概念与上下文信息。

在多类记忆任务上超越现有方法，尤其在联想记忆任务上比先进嵌入模型提升7%。

识别记忆中三元组过滤的精度有提升空间，部分样本过滤后无匹配短语或无三元组，需依赖密集检索结果。

个性化PageRank图搜索组件在部分样本中表现不佳，虽识别关键短语，但难以返回理想结果。