深度思考RAG实战：从零构建Agent驱动的多步骤检索增强系统（附完整代码+收藏指南）

文章介绍了一种解决传统RAG系统局限性的"深度思考RAG"架构，通过Agent驱动的多步骤处理（规划、检索、反思、批评和合成）处理复杂查询。系统包含工具感知计划器、多阶段检索漏斗、自我批评机制和流程控制策略，使用LangGraph构建有状态工作流，能自适应选择检索策略并整合多源信息。评估显示，与传统RAG相比，其在上下文精确度、召回率和答案正确性方面显著提升，并探讨了使用马尔可夫决策过程优化策略模型的未来方向。

本文较长，建议点赞收藏。更多AI大模型应用开发学习视频及资料，在智泊AI。

基础的规划、检索、反思、批评、合成等

RAG系统经常失败，不是因为LLM缺乏智能，而是因为其架构过于简单。它试图用线性的一次性方法处理循环的多步骤问题。

许多复杂查询需要推理、反思，以及关于何时采取行动的明智决策，就像我们面对问题时如何检索信息一样。这就是RAG管道中Agent驱动操作的作用所在。让我们看看典型的深度思考RAG管道是什么样的...

深度思考RAG管道（由Fareed Khan创建）

规划： 首先，Agent将复杂的用户查询分解为结构化的多步骤研究计划，决定每个步骤需要哪种工具（内部文档搜索或网络搜索）。
检索： 对于每个步骤，它执行自适应、多阶段检索漏斗，使用监督器动态选择最佳搜索策略（向量、关键词或混合）。
精化： 然后使用高精度交叉编码器重新排序初始结果，并使用蒸馏器代理将最佳证据压缩为简洁的上下文。
反思： 每个步骤后，Agent总结其发现并更新其研究历史，建立对问题的累积理解。
批评： 然后策略代理检查此历史记录，做出战略决策：继续到下一个研究步骤、修订计划如果遇到死胡同，或者完成。
合成： 一旦研究完成，最终代理将来自所有来源的所有收集证据合成为单一、全面且可引用的答案。

在这篇博客中，我们将实现整个深度思考RAG管道 ，并与基本RAG管道进行比较，以展示它如何解决复杂多跳查询。

所有代码+理论都在如下的GitHub仓库中可用：

github.com/FareedKhan-...: 一个解决复杂查询的深度思考RAG管道

• 设置环境
• 知识库来源
• 理解我们的多源、多跳查询
• 构建一个会失败的浅层RAG管道
• 为中央代理系统定义RAG状态
• 战略规划和查询制定
∘ 使用工具感知计划器分解问题
∘ 使用查询重写器代理优化检索
∘ 元数据感知分块的精确性
• 创建多阶段检索漏斗
∘ 使用监督器动态选择策略
∘ 使用混合、关键词和语义搜索的广泛召回
∘ 使用交叉编码器重新排序器的高精度
∘ 使用上下文蒸馏合成
• 使用网络搜索增强知识
• 自我批评和流程控制策略
∘ 更新和反思累积研究历史
∘ 构建策略代理进行流程控制
• 定义图节点
• 定义条件边
• 连接深度思考RAG机器
• 编译和可视化迭代工作流
• 运行深度思考管道
• 分析最终的高质量答案
• 对比比较
• 评估框架和分析结果
• 总结我们的整个管道
• 使用马尔可夫决策过程(MDP)的学习策略

设置环境

在开始编码深度RAG管道之前，我们需要从坚实的基础开始，因为生产级AI系统不仅关乎最终算法，还关乎我们在设置过程中做出的深思熟虑的选择。

我们要实现的每个步骤对于确定最终系统的有效性和可靠性都很重要。

当我们开始开发管道并对其进行试错时，最好将配置定义为简单的字典格式，因为稍后当管道变得复杂时，我们可以简单地参考此字典来更改配置并查看其对整体性能的影响。

plaintext 复制代码

# 配置字典来管理所有系统参数config = {    "data_dir": "./data",                           
# 存储原始和清洁数据的目录    "vector_store_dir": "./vector_store",           
# 持久化我们向量存储的目录    "llm_provider": "openai",                       
# 我们使用的LLM提供商    "reasoning_llm": "gpt-4o",                      
# 用于规划和综合的强大的模型    "fast_llm": "gpt-4o-mini",                      
# 用于简单任务（如基线RAG）的更快、更便宜的模型    "embedding_model": "text-embedding-3-small",    # 用于创建文档嵌入的模型    "reranker_model": "cross-encoder/ms-marco-MiniLM-L-6-v2", 
# 用于精确重新排序的模型    "max_reasoning_iterations": 7,                  
# 防止代理进入无限循环的安全措施    "top_k_retrieval": 10,                          
# 初始广泛召回的文档数量    "top_n_rerank": 3,                              
# 精确重新排序后保留的文档数量}

这些键很容易理解，但有三个键值得提及：

• llm_provider：这是我们使用的LLM提供商，在这种情况下是OpenAI。我使用OpenAI是因为我们可以在LangChain中轻松交换模型和提供商，但您可以选择任何适合您需求的提供商，如Ollama。
• reasoning_llm：这必须是我们整个设置中最强大的，因为它将用于规划和综合。
• fast_llm：这应该是一个更快、更便宜的模型，因为它将用于简单任务，如基线RAG。

现在需要导入我们将在整个管道中使用到的库，并将api密钥设置为环境变量，以避免在代码块中暴露它。

plaintext 复制代码

import os                  
# 用于与操作系统交互（例如，管理环境变量）import re                  
# 用于正则表达式操作，对文本清理有用import json                
# 用于处理JSON数据from getpass import getpass 
# 安全地提示用户输入，如API密钥，而不回显到屏幕from pprint import pprint   
# 用于美化打印Python对象，使它们更易读import uuid                
# 生成唯一标识符from typing import List, Dict, TypedDict, Literal, Optional 
# 用于类型提示，以创建干净、可读和可维护的代码# 辅助函数，如果环境变量尚未存在，则安全地设置环境变量def _set_env(var: str):    
# 检查环境变量是否尚未设置    if not os.environ.get(var):        
# 如果没有，安全地提示用户输入        os.environ[var] = getpass(f"Enter your {var}: ")
# 设置我们将使用的服务的API密钥_set_env("OPENAI_API_KEY")      
# 用于访问OpenAI模型（GPT-4o，嵌入）_set_env("LANGSMITH_API_KEY")   
# 用于LangSmith的跟踪和调试_set_env("TAVILY_API_KEY")      
# 用于网络搜索工具# 启用LangSmith跟踪以获取我们代理执行的详细日志和可视化os.environ["LANGSMITH_TRACING"] = "true"
# 在LangSmith中定义项目名称以组织我们的运行os.environ["LANGSMITH_PROJECT"] = "Advanced-Deep-Thinking-RAG"

还将启用LangSmith进行跟踪。当您使用具有复杂、循环工作流的代理系统时，跟踪不再是可有可无的------它很重要。它帮助您可视化正在发生的事情，并使调试代理的思考过程变得容易得多。

知识库来源

生产级RAG系统需要既复杂又苛刻的知识库，以真正展示其有效性。为此，我们将使用NVIDIA的2023年10-K文件，这是一份超过一百页的综合性文件，详细描述了公司业务运营、财务业绩和披露的风险因素。

知识库来源（由Fareed Khan创建）

首先，我们将实现一个自定义函数 ，直接从SEC EDGAR数据库以编程方式下载10-K文件，解析原始HTML，并将其转换为适合我们RAG管道摄取的清洁、结构化文本格式。让我们编写该函数。

plaintext 复制代码

import requests # 用于下载文档的HTTP请求from bs4 import BeautifulSoup # 解析HTML和XML文档的强大库from langchain.docstore.document import Document # LangChain的文本标准数据结构def download_and_parse_10k(url, doc_path_raw, doc_path_clean):    # 检查清洁文件是否已经存在以避免重新下载    if os.path.exists(doc_path_clean):        print(f"Cleaned 10-K file already exists at: {doc_path_clean}")        return    print(f"Downloading 10-K filing from {url}...")    # 设置User-Agent标头以模拟浏览器，因为某些服务器阻止脚本    headers = {'User-Agent': 'Mozilla/5.0'}    # 向URL发出GET请求    response = requests.get(url, headers=headers)    # 如果下载失败（例如，404 Not Found）则引发错误    response.raise_for_status()    # 将原始HTML内容保存到文件中进行检查    with open(doc_path_raw, 'w', encoding='utf-8') as f:        f.write(response.text)    print(f"Raw document saved to {doc_path_raw}")    # 使用BeautifulSoup解析和清洁HTML内容    soup = BeautifulSoup(response.content, 'html.parser')    # 从常见HTML标签中提取文本，尝试保留段落结构    text = ''    for p in soup.find_all(['p', 'div', 'span']):        # 从每个标签中获取文本，剥离额外空白，并添加换行符        text += p.get_text(strip=True) + '\n\n'    # 使用regex清理过度的换行符和空格以获得更清洁的最终文本    clean_text = re.sub(r'\n{3,}', '\n\n', text).strip() # 将3+换行符折叠为2    clean_text = re.sub(r'\s{2,}', ' ', clean_text).strip() # 将2+空格折叠为1    # 将最终的清洁文本保存到.txt文件    with open(doc_path_clean, 'w', encoding='utf-8') as f:        f.write(clean_text)    print(f"Cleaned text content extracted and saved to {doc_path_clean}")

代码很容易理解，我们使用beautifulsoup4来解析HTML内容并提取文本。它将帮助我们轻松导航HTML结构并检索相关信息，同时忽略任何不必要的元素，如脚本或样式。

现在，让我们执行这个函数并看看它如何工作。

plaintext 复制代码

print("Downloading and parsing NVIDIA's 2023 10-K filing...")# 执行下载和解析函数download_and_parse_10k(url_10k, doc_path_raw, doc_path_clean)# 打开清洁文件并打印样本以验证结果with open(doc_path_clean, 'r', encoding='utf-8') as f:    print("\n--- Sample content from cleaned 10-K ---")    print(f.read(1000) + "...")#### 输出 ####Downloading and parsing NVIDIA 2023 10-K filing...Successfully downloaded 10-K filing from https://www.sec.gov/Archives/edgar/data/1045810/000104581023000017/nvda-20230129.htmRaw document saved to ./data/nvda_10k_2023_raw.htmlCleaned text content extracted and saved to ./data/nvda_10k_2023_clean.txt# --- Sample content from cleaned 10-K ---Item 1. Business. OVERVIEW NVIDIA is the pioneer of accelerated computing. We are a full-stack computing company with a platform strategy that brings together hardware, systems, software, algorithms, libraries, and services to create unique value for the markets we serve. Our work in accelerated computing and AI is reshaping the worlds largest industries and profoundly impacting society. Founded in 1993, we started as a PC graphics chip company, inventing the graphics processing unit, or GPU. The GPU was essential for the growth of the PC gaming market and has since been repurposed to revolutionize computer graphics, high performance computing, or HPC, and AI. The programmability of our GPUs made them ...

我们只是调用这个函数，将所有内容存储在一个txt文件中，它将作为我们rag管道的上下文。

当我们运行上面的代码时，您可以看到它开始为我们下载报告，我们可以看到下载内容样本的样子。

理解我们的多源、多跳查询

为了测试我们实现的管道并与基本RAG进行比较，我们需要使用一个非常复杂的查询，涵盖我们正在处理的文档的不同方面。

plaintext 复制代码

我们的复杂查询："基于NVIDIA的2023年10-K文件，识别其与竞争相关的关键风险。然后，找到关于AMD的AI芯片策略的最新消息（2024年文件发布后），并解释这个新策略如何直接解决或加剧NVIDIA所述的风险之一。"

让我们分解为什么这个查询对于标准RAG管道如此困难：

1. 多跳推理： 它不能在单一步骤中回答。系统必须首先识别风险，然后找到AMD新闻，最后综合两者。
1. 多源知识： 所需信息位于两个完全不同的地方。风险在我们的静态、内部文档（10-K）中，而AMD新闻是外部的，需要访问实时网络。
1. 综合和分析： 查询不要求简单的信息列表。它要求解释一组事实如何使另一组事实变得更糟，这是一个需要真正综合的任务。

在下一节中，我们将实现基本RAG管道，实际看到简单RAG如何失败。

构建一个会失败的浅层RAG管道

现在我们已经配置了环境并准备好具有挑战性的知识库，下一个逻辑步骤是构建一个标准的vanilla RAG管道。这服务于一个关键目的...

首先构建最简单的可能解决方案，我们可以针对它运行我们的复杂查询，并准确观察它如何和为什么失败。

这是我们在这个部分要做的事情：

浅层RAG管道（由Fareed Khan创建）

• 加载和分块文档： 我们将摄取我们清洁的10-K文件并将其分割为小的、固定大小的块，这是一种常见但语义上naive的方法。
• 创建向量存储： 然后我们将对这些块进行嵌入并在ChromaDB向量存储中索引它们，以启用基本语义搜索。
• 组装RAG链： 我们将使用LangChain表达式语言（LCEL），它将把我们检索器、提示模板和LLM连接成线性管道。
• 演示关键失败： 我们将在这个简单系统上执行我们的多跳、多源查询，并分析其不充分的响应。

首先，我们需要加载我们的清洁文档并分割它。我们将使用RecursiveCharacterTextSplitter，这是LangChain生态系统中的标准工具。

plaintext 复制代码

from langchain_community.document_loaders import TextLoader # .txt文件的简单加载器from langchain.text_splitter import RecursiveCharacterTextSplitter # 标准文本分割器print("Loading and chunking the document...")# 使用清洁10-K文件的路径初始化加载器loader = TextLoader(doc_path_clean, encoding='utf-8')# 将文档加载到内存中documents = loader.load()# 使用定义的块大小和重叠初始化文本分割器# chunk_size=1000：每个块将大约1000个字符长。# chunk_overlap=150：每个块将与前一个块共享150个字符以保持一些上下文。text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)# 将加载的文档分割为更小、可管理的块doc_chunks = text_splitter.split_documents(documents)print(f"Document loaded and split into {len(doc_chunks)} chunks.")#### 输出 ####Loading and chunking the document...Document loaded and split into 378 chunks.

我们的主文档有378个块，下一步是使它们可搜索。为此，我们需要创建向量嵌入并将它们存储在数据库中。我们将使用ChromaDB，一个流行的内存向量存储，以及OpenAI text-embedding-3-small模型，如我们的配置中定义的。

plaintext 复制代码

from langchain_community.vectorstores import Chroma # 我们将使用的向量存储from langchain_openai import OpenAIEmbeddings # 创建嵌入的函数print("Creating baseline vector store...")# 使用我们配置中指定的模型初始化嵌入函数embedding_function = OpenAIEmbeddings(model=config['embedding_model'])# 从我们的文档块创建Chroma向量存储# 这个过程获取每个块，为其创建嵌入并索引它。baseline_vector_store = Chroma.from_documents(    documents=doc_chunks,    embedding=embedding_function)# 从向量存储创建检索器# 检索器是实际执行搜索的组件。# search_kwargs={"k": 3}：这告诉检索器为任何给定查询返回前3个最相关的块。baseline_retriever = baseline_vector_store.as_retriever(search_kwargs={"k": 3})print(f"Vector store created with {baseline_vector_store._collection.count()} embeddings.")#### 输出 ####Creating baseline vector store...Vector store created with 378 embeddings.

Chroma.from_documents组织这个过程并将所有向量存储在可搜索索引中。最后一步是使用LangChain表达式语言（LCEL）将它们组装成单个可运行的RAG链。

这个链将定义数据的线性流：从用户问题到检索器，然后到提示，最后到LLM。

plaintext 复制代码

from langchain_core.prompts import ChatPromptTemplate # 用于创建提示模板from langchain_openai import ChatOpenAI # OpenAI聊天模型接口from langchain_core.runnable import RunnablePassthrough # 在链中传递输入的工具from langchain_core.output_parsers import StrOutputParser # 将LLM的输出解析为简单字符串# 这个模板指示LLM如何表现。# {context}：这是我们将从检索文档中注入内容的地方。# {question}：这是用户原始问题将要去的地方。template = """You are an AI financial analyst. Answer the question based only on the following context:{context}Question: {question}"""prompt = ChatPromptTemplate.from_template(template)# 我们使用我们的'fast_llm'来完成这个简单任务，如我们配置中定义的llm = ChatOpenAI(model=config["fast_llm"], temperature=0)# 一个辅助函数来将检索文档列表格式化为单个字符串def format_docs(docs):    return "\n\n---\n\n".join(doc.page_content for doc in docs)# 使用LCEL的管道(|)语法定义的完整RAG链baseline_rag_chain = (    # 第一步是定义我们提示输入的字典    {"context": baseline_retriever | format_docs, "question": RunnablePassthrough()}    # 上下文是通过获取问题、将其传递给检索器并将结果格式化生成的    # 原始问题通过不变地传递    | prompt # 然后字典被传递给提示模板    | llm # 格式化的提示传递给语言模型    | StrOutputParser() # LLM的输出消息被解析为字符串)

您知道我们将字典定义为第一步。其context键由子链填充，输入问题转到baseline_retriever，其输出（Document对象列表）由format_docs格式化为单个字符串。question键通过使用RunnablePassthrough简单传递原始输入来填充。

让我们运行这个简单管道并理解它在哪里失败。

plaintext 复制代码

from rich.console import Console # 用于美化的markdown输出打印from rich.markdown import Markdown# 初始化rich控制台以获得更好的输出格式化console = Console()# 我们的复杂、多跳、多源查询complex_query_adv = "Based on NVIDIA's 2023 10-K filing, identify their key risks related to competition. Then, find recent news (post-filing, from 2024) about AMD's AI chip strategy and explain how this new strategy directly addresses or exacerbates one of NVIDIA's stated risks."print("Executing complex query on the baseline RAG chain...")# 使用我们具有挑战性的查询调用链baseline_result = baseline_rag_chain.invoke(complex_query_adv)console.print("\n--- BASELINE RAG FAILED OUTPUT ---")# 使用markdown格式化打印结果以提高可读性console.print(Markdown(baseline_result))

当您运行上面的代码时，我们得到以下输出。

plaintext 复制代码

#### 输出 ####Executing complex query on the baseline RAG chain...--- BASELINE RAG FAILED OUTPUT ---Based on the provided context, NVIDIA operates in an intensely competitive semiconductorindustry and faces competition from companies like AMD. The context mentionsthat the industry is characterized by rapid technological change. However, the provided documents do not contain any specific information about AMD's recent AI chip strategy from 2024 or how it might impact NVIDIA's stated risks.

您可能注意到这个失败的RAG管道及其输出中有三件事。

• 不相关上下文：检索器抓取了关于"NVIDIA" 、"竞争" 和 "AMD" 的一般块，但错过了具体的2024年AMD策略详情。
• 缺失信息： 关键失败是2023年数据无法覆盖2024年事件。系统没有意识到它缺少关键信息。
• 无规划或工具使用： 将复杂查询视为简单查询。无法将其分解为步骤或使用网络搜索等工具来填补空白。

系统失败不是因为LLM愚蠢，而是因为架构过于简单。它是一个线性的一次性过程，试图解决循环的多步骤问题。

现在我们了解了基本RAG管道的问题，我们可以开始实施我们的深度思考方法，看看它如何很好地解决我们的复杂查询。

为中央代理系统定义RAG状态

为了构建我们的推理代理，我们首先需要一种方法来管理其状态。在我们的简单RAG链中，每个步骤都是无状态的，但是...

然而，智能代理需要记忆。它需要记住原始问题、它创建的计划以及它迄今收集的证据。

RAG状态（由Fareed Khan创建）

RAGState将作为中央记忆，在我们的LangGraph工作流的每个节点之间传递。为了构建它，我们将定义一系列结构化数据类，从最基础的构建块开始：研究计划中的单个步骤。

我们想要定义我们代理计划的原子单元。每个Step必须包含不仅是一个要回答的问题，还有其背后的推理，关键的是，代理应该使用的具体工具。这迫使代理的规划过程变得明确和结构化。

plaintext 复制代码

from langchain_core.documents import Documentfrom langchain_core.pydantic_v1 import BaseModel, Field# 代理推理计划中单个步骤的Pydantic模型class Step(BaseModel):    # 这个研究步骤的具体、可回答的子问题    sub_question: str = Field(description="A specific, answerable question for this step.")    # 代理为什么这个步骤是必要的理由    justification: str = Field(description="A brief explanation of why this step is necessary to answer the main query.")    # 这个步骤使用的具体工具：内部文档搜索或外部网络搜索    tool: Literal["search_10k", "search_web"] = Field(description="The tool to use for this step.")    # 关键关键词列表以提高搜索的准确性    keywords: List[str] = Field(description="A list of critical keywords for searching relevant document sections.")    # (可选)可能更精确、有过滤的搜索内的文档部分    document_section: Optional[str] = Field(description="A likely document section title (e.g., 'Item 1A. Risk Factors') to search within. Only for 'search_10k' tool.")

我们的Step类，使用Pydantic BaseModel，充当我们计划器代理的严格合同。tool: Literal[...]字段强制LLM在我们内部知识（search_10k）或寻求外部信息（search_web）之间做出具体决策。

这种结构化输出比试图解析自然语言计划要可靠得多。

现在我们已经定义了单个Step，我们需要容器来保存整个步骤序列。我们将创建一个Plan类，它只是Step对象的列表。这代表代理完整的端到端研究策略。

plaintext 复制代码

# 整体计划的Pydantic模型，即单个步骤的列表class Plan(BaseModel):    # 详细、多步骤计划来回答用户查询的Step对象列表    steps: List[Step] = Field(description="A detailed, multi-step plan to answer the user's query.")

我们编写了一个Plan类，它将为整个研究过程提供结构。当我们调用我们的计划器代理时，我们将要求它返回符合此架构的JSON对象。这确保代理策略在采取任何检索行动之前是清晰、顺序和机器可读的。

接下来，随着我们的代理执行其计划，它需要一种方式来记住它所学到的内容。我们将定义一个PastStep字典来存储每个已完成步骤的结果。这将形成代理的研究历史 或实验室笔记本。

plaintext 复制代码

# TypedDict来存储我们研究历史中已完成步骤的结果class PastStep(TypedDict):    step_index: int              # 已完成步骤的索引（例如，1, 2, 3）    sub_question: str            # 在此步骤中解决的子问题    retrieved_docs: List[Document] # 为此步骤检索和重新排序的精确文档    summary: str                 # 代理对这一步发现的单句总结

这个PastStep结构对于代理的自我批评循环至关重要。每个步骤后，我们将填写这些字典之一并将其添加到我们的状态中。然后代理将能够查看这个不断增长的摘要列表，以了解它知道什么并决定它是否有足够的信息来完成其任务。

最后，我们将把所有这些片段带到大主RAGState字典中。这是将流经我们整个图的中央对象，保存原始查询、完整计划、过去步骤的历史记录以及当前正在执行的步骤的所有中间数据。

plaintext 复制代码

# 主要状态字典，将在LangGraph代理的所有节点之间传递class RAGState(TypedDict):    original_question: str     # 启动过程的来自用户的初始、复杂查询    plan: Plan                 # 计划器代理生成的多步骤计划    past_steps: List[PastStep] # 已完成研究步骤及其发现的累积历史    current_step_index: int    # 当前正在执行的计划中步骤的索引    retrieved_docs: List[Document] # 当前步骤中检索的文档（广泛召回的结果）    reranked_docs: List[Document]  # 当前步骤中精确重新排序后的文档    synthesized_context: str   # 从重新排序的文档生成的简洁、蒸馏的上下文    final_answer: str          # 用户原始问题的最终、合成答案

这个RAGState``TypedDict是我们代理的完整心智。我们图中的每个节点都将接收这个字典作为输入，并返回其更新版本作为输出。

例如，plan_node将填充plan字段，retriever_node将填充retrieved_docs字段，等等。这种共享的持久状态是使我们简单RAG链缺乏的复杂、迭代推理成为可能的原因。

随着代理记忆的蓝图现在定义，我们准备构建我们系统的第一个认知组件：计划器代理，它将填充这个状态。

战略规划和查询制定

随着我们的RAGState定义，我们现在可以构建我们代理的第一个、可以说是最关键的认知组件：其规划能力。这是我们的系统从简单数据获取器跳到真正推理引擎的地方。与其天真地将用户的复杂查询视为单个搜索，我们的代理将首先暂停、思考并构建详细的、逐步的研究策略。

战略规划（由Fareed Khan创建）

本节分为三个关键工程步骤：

• 工具感知计划器： 我们将构建一个LLM驱动的代理，其唯一工作是将用户查询分解为结构化的Plan对象，决定每个步骤使用哪种工具。
• 查询重写器： 我们将创建一个专门的代理，将计划器的简单子问题转化为高度有效、优化的搜索查询。
• 元数据感知分块： 我们将重新处理我们的源文档以添加部分级元数据，这是一个关键步骤，释放高精度、过滤检索。

使用工具感知计划器分解问题

所以，基本上我们想要构建我们操作的大脑。当这个大脑获得复杂问题时，它需要做的第一件事是制定策略。

分解步骤（由Fareed Khan创建）

我们不能只是把整个问题扔给我们的数据库并希望最好。我们需要教代理如何将问题分解为更小、可管理的片段。

为此，我们将创建一个专门的计划器代理。我们需要给它一组非常清晰的指令或提示，告诉它确切地做什么工作。

plaintext 复制代码

from langchain_core.prompts import ChatPromptTemplatefrom langchain_openai import ChatOpenAIfrom rich.pretty import pprint as rprint# 指示LLM如何作为计划器表现的系统提示planner_prompt = ChatPromptTemplate.from_messages([\    ("system", """You are an expert research planner. Your task is to create a clear, multi-step plan to answer a complex user query by retrieving information from multiple sources.\You have two tools available:\1. `search_10k`: Use this to search for information within NVIDIA's 2023 10-K financial filing. This is best for historical facts, financial data, and stated company policies or risks from that specific time period.\2. `search_web`: Use this to search the public internet for recent news, competitor information, or any topic that is not specific to NVIDIA's 2023 10-K.\Decompose the user's query into a series of simple, sequential sub-questions. For each step, decide which tool is more appropriate.\For `search_10k` steps, also identify the most likely section of the 10-K (e.g., 'Item 1A. Risk Factors', 'Item 7. Management's Discussion and Analysis...').\It is critical to use the exact section titles found in a 10-K filing where possible."""),\    ("human", "User Query: {question}") # 用户的原始、复杂查询\])

我们基本上给LLM一个新角色：专家研究规划师 。我们明确告诉它它有两个工具可供使用（search_10k和search_web）并指导何时使用每一个。这是"工具感知"部分。

我们不只是要求一个计划，而是要求它创建一个直接映射到我们已经构建的能力的计划。

现在我们可以启动推理模型并将其与我们的提示链接。这里非常重要的一步是告诉LLM其最终输出必须是我们Pydantic Plan类的格式。这使输出结构化和可预测。

plaintext 复制代码

# 初始化我们强大的推理模型，如配置中定义的reasoning_llm = ChatOpenAI(model=config["reasoning_llm"], temperature=0)# 通过将提示管道传输到LLM并指示它使用我们结构化的'Plan'输出来创建计划器代理planner_agent = planner_prompt | reasoning_llm.with_structured_output(Plan)print("Tool-Aware Planner Agent created successfully.")# 让我们用我们的复杂查询测试计划器代理以查看其输出print("\n--- Testing Planner Agent ---")test_plan = planner_agent.invoke({"question": complex_query_adv})# 使用rich的美化打印来清晰、可读地显示Pydantic对象rprint(test_plan)

我们获取我们的planner_prompt，将其管道传输到我们强大的reasoning_llm，然后使用.with_structured_output(Plan)方法。这告诉LangChain使用模型函数调用能力将其响应格式化为完美匹配我们Plan Pydantic模式的JSON对象。这比试图解析纯文本响应要可靠得多。

让我们看看当我们用挑战查询测试它时的输出。

plaintext 复制代码

#### 输出 ####Tool-Aware Planner Agent created successfully.--- Testing Planner Agent ---Plan(│   steps=[\│   │   Step(\│   │   │   sub_question="What are the key risks related to competition as stated in NVIDIA's 2023 10-K filing?",\│   │   │   justification="This step is necessary to extract the foundational information about competitive risks directly from the source document as requested by the user.",\│   │   │   tool='search_10k',\│   │   │   keywords=['competition', 'risk factors', 'semiconductor industry', 'competitors'],\│   │   │   document_section='Item 1A. Risk Factors'\│   │   ),\│   │   Step(\│   │   │   sub_question="What are the recent news and developments in AMD's AI chip strategy in 2024?",\│   │   │   justification="This step requires finding up-to-date, external information that is not available in the 2023 10-K filing. A web search is necessary to get the latest details on AMD's strategy.",\│   │   │   tool='search_web',\│   │   │   keywords=['AMD', 'AI chip strategy', '2024', 'MI300X', 'Instinct accelerator'],\│   │   │   document_section=None\│   │   )\│   ])

如果我们查看输出，您可以看到代理没有只是给我们一个模糊的计划，它产生了一个结构化的Plan对象。它正确地识别了查询有两个部分。

1. 对于第一部分，它知道答案在10-K中并选择了search_10k工具，甚至正确地猜测了正确的文档部分。
1. 对于第二部分，它知道"2024年的新闻"不可能在2023年文档中，并正确选择了search_web工具。这是我们的管道至少在思考过程中将给出有希望结果的第一个迹象。

使用查询重写器代理优化检索

所以，基本上我们有一个带有好子问题的计划。

但像"风险是什么？"这样的问题不是一个伟大的搜索查询。它太通用了。搜索引擎，无论它们是向量数据库还是网络搜索，都对具体、关键词丰富的查询效果最好。

查询重写器代理（由Fareed Khan创建）

为了修复这个问题，我们将构建另一个小的、专门的代理：查询重写器。它的唯一工作是将当前步骤的子问题使其更适合搜索，通过添加相关关键词和我们已经学到的上下文。

首先，让我们为这个新代理设计提示。

plaintext 复制代码

from langchain_core.output_parsers import StrOutputParser # 将LLM的输出解析为简单字符串# 我们的查询重写器的提示，指示它作为搜索专家行为query_rewriter_prompt = ChatPromptTemplate.from_messages([\    ("system", """You are a search query optimization expert. Your task is to rewrite a given sub-question into a highly effective search query for a vector database or web search engine, using keywords and context from the research plan.\The rewritten query should be specific, use terminology likely to be found in the target source (a financial 10-K or news articles), and be structured to retrieve the most relevant text snippets."""),\    ("human", "Current sub-question: {sub_question}\n\nRelevant keywords from plan: {keywords}\n\nContext from past steps:\n{past_context}")\])

我们基本上告诉这个代理作为搜索查询优化专家 行为。我们给它三个信息来处理：简单的sub_question、我们计划器已经识别的keywords以及来自任何先前研究步骤的past_context。这给了它构建一个更好的查询所需的所有原材料。

现在我们可以启动这个代理。这是一个简单的链，因为我们只需要字符串作为输出。

plaintext 复制代码

# 通过将提示管道传输到我们的推理LLM和字符串输出解析器来创建代理query_rewriter_agent = query_rewriter_prompt | reasoning_llm | StrOutputParser()print("Query Rewriter Agent created successfully.")# 让我们测试重写器代理。我们将假设我们已经完成了计划的前两个步骤。print("\n--- Testing Query Rewriter Agent ---")# 让我们想象我们在一个需要前两个上下文最终综合步骤。test_sub_q = "How does AMD's 2024 AI chip strategy potentially exacerbate the competitive risks identified in NVIDIA's 10-K?"test_keywords = ['impact', 'threaten', 'competitive pressure', 'market share', 'technological change']# 我们创建一些模拟"过去上下文"来模拟代理在真实运行中此时会知道什么。test_past_context = "Step 1 Summary: NVIDIA's 10-K lists intense competition and rapid technological change as key risks. Step 2 Summary: AMD launched its MI300X AI accelerator in 2024 to directly compete with NVIDIA's H100."# 使用我们的测试数据调用代理rewritten_q = query_rewriter_agent.invoke({    "sub_question": test_sub_q,    "keywords": test_keywords,    "past_context": test_past_context})print(f"Original sub-question: {test_sub_q}")print(f"Rewritten Search Query: {rewritten_q}")

为了正确测试这个，我们必须模拟真实场景。我们创建一个test_past_context字符串，代表代理从其计划的前两个步骤已经生成的摘要。然后我们将这个以及下一个子问题提供给我们的query_rewriter_agent。

让我们看看结果。

plaintext 复制代码

#### 输出 ####Query Rewriter Agent created successfully.--- Testing Query Rewriter Agent ---Original sub-question: How does AMD 2024 AI chip strategy potentially exacerbate the competitive risks identified in NVIDIA 10-K?Rewritten Search Query: analysis of how AMD 2024 AI chip strategy, including products like the MI300X, exacerbates NVIDIA's stated competitive risks such as rapid technological change and market share erosion in the data center and AI semiconductor industry

原始问题是给分析师的，他重写的查询是给搜索引擎的。它已被分配具体术语，如 "MI300X" 、"市场份额侵蚀" 和 "数据中心"，所有这些都是从关键词和过去上下文综合的。

像这样的查询更可能检索到完全正确的文档，使我们整个系统更准确、更高效。这个重写步骤将是我们主要代理循环的关键部分。

元数据感知分块的精确性

所以，基本上，我们的计划器代理给了我们一个好机会。它不只是说查找风险 ，它给了我们一个提示：在Item 1A.风险因素部分寻找风险。

但现在，我们的检索器不能使用那个提示。我们的向量存储只是一个大的、平的378个文本块列表。它不知道什么是"部分"。

元数据感知分块（由Fareed Khan创建）

我们需要修复这个问题。我们将从零开始重建我们的文档块。这次，对于我们创建的每个单个块，我们都将添加一个标签或标签其元数据，告诉我们的系统它来自10-K的确切部分。这将允许我们的代理稍后执行高度精确、过滤的搜索。

首先，我们需要一种方法来以编程方式在我们的原始文本文件中找到每个部分开始的位置。如果我们查看文档，我们可以看到一个清晰的模式：每个主要部分都以单词 "ITEM" 开始，后跟一个数字，如 "ITEM 1A" 或 "ITEM 7" 。这是正则表达式的完美工作。

plaintext 复制代码

# 此regex旨在在10-K文本中找到像'ITEM 1A.'或'ITEM 7.'这样的部分标题。# 它查找'ITEM'，后跟一个空格、一个数字、一个可选字母、一个句号，然后捕获标题文本。# `re.IGNORECASE | re.DOTALL`标志使搜索不区分大小写并允许'.'匹配换行符。section_pattern = r"(ITEM\\s+\\d[A-Z]?\\.\\s*.*?)(?=\\nITEM\\s+\\d[A-Z]?\\.|$)"

我们基本上创建一个模式，它将充当我们的部分检测器。它应该被设计为足够灵活以捕获不同格式，同时足够具体以不抓住错误的文本。

现在我们可以使用此模式将我们的文档切片为两个单独的列表：一个只包含部分标题，另一个包含每个部分内的内容。

plaintext 复制代码

# 我们将使用从Document对象早期加载的原始文本raw_text = documents[0].page_content# 使用re.findall应用我们的模式并将所有部分标题提取到列表中section_titles = re.findall(section_pattern, raw_text, re.IGNORECASE | re.DOTALL)# 快速清理步骤从标题中移除任何额外空白或换行符section_titles = [title.strip().replace('\\n', ' ') for title in section_titles]# 现在，使用re.split在每次部分标题发生时中断文档sections_content = re.split(section_pattern, raw_text, flags=re.IGNORECASE | re.DOTALL)# 分割结果是一个标题和内容混合的列表，所以我们过滤它以仅获取内容部分sections_content = [content.strip() for content in sections_content if content.strip() and not content.strip().lower().startswith('item ')]print(f"Identified {len(section_titles)} document sections.")# 这是关键 sanity 检查：如果标题数量与内容块数量不匹配，出了什么问题。assert len(section_titles) == len(sections_content), "Mismatch between titles and content sections"

这是解析半结构化文档的非常有效的方法。我们两次使用我们的regex模式：一次获取所有部分标题的清洁列表，再次将主文本分割为内容块列表。assert语句给我们信心，我们的解析逻辑是健全的。

好的，现在我们有了片段：标题列表和相应的内容列表。我们现在可以循环遍历它们并创建我们最终的、富含元数据的块。

plaintext 复制代码

import uuid # 我们将使用这个给每个块一个唯一ID，这是好习惯# 此列表将保存我们新的、富含元数据的文档块doc_chunks_with_metadata = []# 使用enumerate循环遍历每个部分的内容以及其标题for i, content in enumerate(sections_content):    # 获取当前内容块对应的标题    section_title = section_titles[i]    # 使用与之前相同的文本分割器，但这次，我们仅在当前部分的内容上运行它    section_chunks = text_splitter.split_text(content)    # 现在，循环遍历从此单个部分创建的较小块    for chunk in section_chunks:        # 为此特定块生成唯一ID        chunk_id = str(uuid.uuid4())        # 为此块创建新的LangChain Document对象        doc_chunks_with_metadata.append(            Document(                page_content=chunk,                # 这是最重要部分：我们附加元数据                metadata={                    "section": section_title,      # 此块所属的部分                    "source_doc": doc_path_clean,  # 文档来自哪里                    "id": chunk_id                 # 此块唯一ID                }            )        )print(f"Created {len(doc_chunks_with_metadata)} chunks with section metadata.")print("\n--- Sample Chunk with Metadata ---")# 为了证明它有效，让我们找到一个我们知道应该在'风险因素'部分的块并打印它sample_chunk = next(c for c in doc_chunks_with_metadata if "Risk Factors" in c.metadata.get("section", ""))print(sample_chunk)

这是我们升级的核心。我们逐个遍历每个部分。对于每个部分，我们创建我们的文本块。但在我们将它们添加到最终列表之前，我们创建一个metadata字典并附加section_title。这有效地标记每个单个块及其来源。

让我们看看输出并看到差异。

plaintext 复制代码

#### 输出 ####Processing document and adding metadata...Identified 22 document sections.Created 381 chunks with section metadata.--- Sample Chunk with Metadata ---Document(│   page_content='Our industry is intensely competitive. We operate in the semiconductor\\nindustry, which is intensely competitive and characterized by rapid\\ntechnological change and evolving industry standards. We compete with a number of\\ncompanies that have different business models and different combinations of\\nhardware, software, and systems expertise, many of which have substantially\\ngreater resources than we have. We expect competition to increase from existing\\ncompetitors, as well as new and emerging companies. Our competitors include\\nIntel, AMD, and Qualcomm; cloud service providers, or CSPs, such as Amazon Web\\nServices, or AWS, Google Cloud, and Microsoft Azure; and various companies\\ndeveloping or that may develop processors or systems for the AI, HPC, data\\ncenter, gaming, professional visualization, and automotive markets. Some of our\\ncustomers are also our competitors. Our business could be materially and\\nadversely affected if our competitors announce or introduce new products, services,\\nor technologies that have better performance or features, are less expensive, or\\nthat gain market acceptance.',│   metadata={│   │   'section': 'Item 1A. Risk Factors.',│   │   'source_doc': './data/nvda_10k_2023_clean.txt',│   │   'id': '...'│   })

看那个metadata块。我们以前有相同文本块现在有一个附加的上下文：'section': 'Item 1A. Risk Factors.'。

现在，当我们的代理需要找到风险时，它可以告诉检索器，"嘿，不要搜索所有381个块。只需搜索部分元数据为'Item 1A.风险因素"的块"。

这个简单的改变将我们的检索器从钝器转变为外科工具，这是构建真正生产级RAG系统的关键原则。

创建多阶段检索漏斗

到目前为止，我们已经设计了一个智能计划器并用元数据丰富了我们的文档。我们现在准备构建我们系统的核心：复杂的检索管道。

简单的一次性语义搜索不再足够。对于生产级代理，我们需要一个既自适应 又多阶段的检索过程。

我们将设计我们的检索过程作为漏斗，其中每个阶段精化前一个阶段的结果：

多阶段漏斗（由Fareed Khan创建）

• 检索监督器：我们将构建一个新的监督器代理，充当动态路由器，分析每个子问题并选择最佳搜索策略（向量、关键词或混合）。
• 阶段1（广泛召回）：我们将实施监督器可以选择的的不同检索策略，专注于广泛撒网以捕获所有可能相关的文档。
• 阶段2（高精度）：我们将使用交叉编码器模型重新排序初始结果，丢弃噪声并将最相关的文档提升到顶部。
• 阶段3（综合）：最后，我们将创建一个蒸馏器代理来将排名靠前的文档压缩为单个、简洁的上下文段落，为我们的下游代理。

使用监督器动态选择策略

所以，基本上不是所有搜索查询都相同。像"'计算&网络'细分市场的收入是多少？"这样的问题包含具体、精确的术语。基于关键词的搜索对此将非常完美。

但像"AMD和NVIDIA之间的竞争风险如何变化？"这样的问题更加概念性和语义性。这需要更多的语义理解和上下文。这种查询的向量搜索可能更好。

我们需要一个代理来分析每个子问题并决定哪种搜索策略最有效。

首先，让我们设计这个监督器代理的决策结构。我们将使用Pydantic来定义它可以做出的可能决策。

plaintext 复制代码

from langchain_core.pydantic_v1 import BaseModel, Field# 监督器代理可以做出的搜索策略决策class RetrievalStrategy(BaseModel):    # 策略必须是这些选项之一    strategy: Literal["vector_search", "keyword_search", "hybrid_search"]    # 代理必须解释为什么它做出这个选择    justification: str

这个RetrievalStrategy强制我们的监督器代理做出明确、具体的选择并解释其推理。

现在，让我们创建监督器代理的提示。

plaintext 复制代码

# 监督器代理的提示，指示它作为搜索策略专家行为retrieval_supervisor_prompt = ChatPromptTemplate.from_messages([\    ("system", """You are a search strategy optimization expert. Your task is to analyze a given query and decide the most effective search strategy for retrieving relevant information from a document collection or web search. You have three options:1. `vector_search`: Use for semantic, conceptual, or open-ended queries that benefit from understanding context and meaning. Best for questions about analysis, trends, comparisons, or concepts.2. `keyword_search`: Use for specific, exact-term queries that contain precise financial terms, product names, numbers, or technical specifications.3. `hybrid_search`: Use for queries that combine both specific terms and conceptual analysis, or when you're uncertain which approach is best. This combines both strategies.Analyze the query and choose the strategy that will yield the most relevant results."""),\    ("human", "Sub-question to analyze: {sub_question}")\])

我们基本上告诉这个代理作为搜索策略优化专家行为。我们给它三个清晰的选择并解释何时使用每一个。

现在，我们可以组装监督器代理本身。

plaintext 复制代码

# 创建监督器代理retrieval_supervisor_agent = retrieval_supervisor_prompt | reasoning_llm.with_structured_output(RetrievalStrategy)print("Retrieval Supervisor Agent created successfully.")# 让我们测试监督器代理print("\n--- Testing Retrieval Supervisor Agent ---")# 测试1：一个具体的技术查询test_query_1 = "What is the revenue for the 'Compute & Networking' segment?"decision_1 = retrieval_supervisor_agent.invoke({"sub_question": test_query_1})print(f"Query: {test_query_1}")print(f"Strategy: {decision_1.strategy}")print(f"Justification: {decision_1.justification}")print()# 测试2：一个概念性分析查询test_query_2 = "How do competitive risks in the semiconductor industry impact NVIDIA's business model?"decision_2 = retrieval_supervisor_agent.invoke({"sub_question": test_query_2})print(f"Query: {test_query_2}")print(f"Strategy: {decision_2.strategy}")print(f"Justification: {decision_2.justification}")

让我们看看这个监督器代理如何表现。

plaintext 复制代码

#### 输出 ####Retrieval Supervisor Agent created successfully.--- Testing Retrieval Supervisor Agent ---Query: What is the revenue for the 'Compute & Networking' segment?Strategy: keyword_searchJustification: This query asks for specific, exact financial data ("Compute & Networking" segment revenue) and contains precise terms that would be better matched with keyword-based search for accurate results.Query: How do competitive risks in the semiconductor industry impact NVIDIA's business model?Strategy: vector_search  Justification: This is a conceptual, analytical question about the relationship between industry factors and business impact. Vector search is better suited for understanding semantic relationships and context for this type of analytical query.

非常好！监督器代理正确分析了每个查询并选择了合适的策略。

现在我们需要实现这个代理可以实际执行的三种搜索策略。

使用混合、关键词和语义搜索的广泛召回

现在我们有了一个智能的监督器，让我们实施它可以选择的三个检索策略。

为了实现这些策略，我们需要重新创建我们带元数据的文档块的向量存储，这次使用Chroma的特殊功能来支持元数据过滤。

plaintext 复制代码

# 重新创建我们的向量存储，这次使用富含元数据的文档块print("Creating enhanced vector store with metadata...")enhanced_vector_store = Chroma.from_documents(    documents=doc_chunks_with_metadata,    embedding=embedding_function)print(f"Enhanced vector store created with {enhanced_vector_store._collection.count()} embeddings with metadata.")

现在，我们将实施三种不同的检索策略：

1. 向量搜索： 语义搜索，基于嵌入相似性，但使用元数据过滤器
1. 关键词搜索： 精确的文本匹配，基于BM25算法
1. 混合搜索： 结合向量和关键词搜索

让我们从向量搜索开始：

plaintext 复制代码

# Strategy 1: 向量搜索（语义搜索 + 元数据过滤）def vector_search_only(query: str, section_filter: Optional[str] = None, k: int = 10) -> List[Document]:    # 如果有部分过滤器，则使用它    if section_filter:        # Chroma支持在查询时使用元数据过滤器        retriever = enhanced_vector_store.as_retriever(            search_kwargs={                "k": k,                "filter": {"section": section_filter}            }        )    else:        retriever = enhanced_vector_store.as_retriever(search_kwargs={"k": k})        return retriever.invoke(query)

现在让我们添加关键词搜索，我们需要一个专门的工具。对于基于BM25的关键词搜索，我们可以使用rank_bm25包，但它更复杂。另一种方法是使用简单的正则表达式搜索。让我实现一个混合方法：

plaintext 复制代码

# Strategy 2: 关键词搜索（精确匹配）def keyword_search_only(query: str, k: int = 10) -> List[Document]:    # 将查询分解为关键词    query_terms = query.lower().split()        # 对所有文档进行简单的关键词匹配评分    scored_docs = []    for doc in doc_chunks_with_metadata:        content = doc.page_content.lower()        # 计算查询词在文档中出现的次数        score = sum(1 for term in query_terms if term in content)        if score > 0:            scored_docs.append((score, doc))        # 按分数排序并返回前k个    scored_docs.sort(key=lambda x: x[0], reverse=True)    return [doc for score, doc in scored_docs[:k]]

现在混合搜索：

plaintext 复制代码

# Strategy 3: 混合搜索（结合向量和关键词）def hybrid_search(query: str, section_filter: Optional[str] = None, k: int = 10) -> List[Document]:    # 获取向量搜索结果    vector_results = vector_search_only(query, section_filter, k=k//2)        # 获取关键词搜索结果    keyword_results = keyword_search_only(query, k=k//2)        # 合并结果并去重    combined_docs = []    seen_content = set()        # 首先添加向量结果    for doc in vector_results:        content_hash = hash(doc.page_content)        if content_hash not in seen_content:            combined_docs.append(doc)            seen_content.add(content_hash)        # 然后添加关键词结果（如果尚未存在）    for doc in keyword_results:        content_hash = hash(doc.page_content)        if content_hash not in seen_content:            combined_docs.append(doc)            seen_content.add(content_hash)        return combined_docs[:k]

让我们测试这些检索策略：

plaintext 复制代码

print("\n--- Testing Retrieval Strategies ---")# 测试查询test_query = "NVIDIA competitive risks semiconductor industry"print(f"Test Query: {test_query}")print(f"Section Filter: Item 1A. Risk Factors")# 测试向量搜索print("\n1. Vector Search Results:")vector_results = vector_search_only(test_query, "Item 1A. Risk Factors", k=3)for i, doc in enumerate(vector_results):    print(f"  {i+1}. {doc.page_content[:100]}... (from {doc.metadata['section']})")# 测试关键词搜索print("\n2. Keyword Search Results:")keyword_results = keyword_search_only(test_query, k=3)for i, doc in enumerate(keyword_results):    print(f"  {i+1}. {doc.page_content[:100]}... (from {doc.metadata['section']})")# 测试混合搜索print("\n3. Hybrid Search Results:")hybrid_results = hybrid_search(test_query, "Item 1A. Risk Factors", k=3)for i, doc in enumerate(hybrid_results):    print(f"  {i+1}. {doc.page_content[:100]}... (from {doc.metadata['section']})")

使用交叉编码器重新排序器的高精度

现在我们有了一个广泛召回，我们需要通过添加第二个阶段的检索来提高精度：高精度重新排序。

我们将使用交叉编码器模型来重新排序我们从初始检索中获得的文档。

plaintext 复制代码

from sentence_transformers import CrossEncoderprint("Loading cross-encoder reranker...")# 加载交叉编码器模型reranker = CrossEncoder(config['reranker_model'])def rerank_documents_function(query: str, documents: List[Document], top_n: int = 3) -> List[Document]:    """使用交叉编码器重新排序文档"""        if not documents:        return []        # 准备查询-文档对    pairs = [(query, doc.page_content) for doc in documents]        # 获取重新排序分数    scores = reranker.predict(pairs)        # 将分数与文档配对并按分数排序    scored_docs = list(zip(scores, documents))    scored_docs.sort(key=lambda x: x[0], reverse=True)        # 返回前top_n个    return [doc for score, doc in scored_docs[:top_n]]print("Cross-encoder reranker loaded successfully.")

使用上下文蒸馏合成

现在我们有了一个小的、高精度的文档集，我们需要第三阶段：综合。

我们将创建一个蒸馏器代理，其唯一工作是将这些排名靠前的文档压缩为单个、简洁的上下文段落，适合下游代理使用。

plaintext 复制代码

# 蒸馏器代理的提示，指示它将多个文档综合为简洁上下文distiller_prompt = ChatPromptTemplate.from_messages([\    ("system", """You are a context distillation expert. Your task is to take multiple retrieved documents and synthesize them into a single, concise, and coherent context paragraph that preserves the most important information for answering the user's sub-question. Guidelines:- Combine information from all provided documents- Eliminate redundancy and focus on the most relevant content- Maintain factual accuracy- Structure the information logically- Keep the context brief but comprehensive"""),\    ("human", "User Sub-Question: {question}\n\nRetrieved Documents:\n{context}")\])# 创建蒸馏器代理distiller_agent = distiller_prompt | reasoning_llm | StrOutputParser()print("Context Distiller Agent created successfully.")# 让我们测试蒸馏器代理print("\n--- Testing Context Distiller Agent ---")# 模拟一些检索的文档sample_docs = [    "NVIDIA operates in the intensely competitive semiconductor industry. The company faces competition from AMD, Intel, and Qualcomm.",    "Rapid technological change is a key characteristic of the semiconductor industry. NVIDIA must continuously innovate to maintain its position.",    "NVIDIA's business could be materially affected if competitors introduce better products or gain market acceptance."]# 格式化文档为字符串sample_context = "\n\n---\n\n".join(sample_docs)# 测试蒸馏distilled = distiller_agent.invoke({    "question": "What are NVIDIA's key competitive risks?",    "context": sample_context})print(f"Original Context Length: {len(sample_context)} characters")print(f"Distilled Context Length: {len(distilled)} characters")print(f"\nDistilled Context:\n{distilled}")

使用网络搜索增强知识

现在我们需要实施我们第二种工具类型：网络搜索。我们将使用Tavily搜索API来获取最新信息。

首先，让我们设置网络搜索功能：

plaintext 复制代码

from tavily import TavilyClient# 初始化Tavily客户端tavily_client = TavilyClient(api_key=os.environ.get("TAVILY_API_KEY"))def web_search_function(query: str, k: int = 5) -> List[Document]:    """使用Tavily进行网络搜索并返回格式化的文档"""        print(f"  Searching web for: {query}")        # 执行搜索    search_results = tavily_client.search(        query=query,        search_depth="advanced",  # 使用高级搜索        include_answer=True,      # 包括AI生成的答案        include_raw_content=True  # 包括原始内容    )        # 将结果转换为LangChain Document对象    documents = []        for result in search_results['results'][:k]:        doc = Document(            page_content=result.get('content', ''),            metadata={                'source': result.get('url', ''),                'title': result.get('title', ''),                'score': result.get('score', 0)            }        )        documents.append(doc)        return documentsprint("Web search function created successfully.")

让我们测试网络搜索功能：

plaintext 复制代码

print("\n--- Testing Web Search Function ---")# 测试查询web_query = "AMD AI chip strategy 2024 MI300X NVIDIA competition"web_results = web_search_function(web_query, k=3)print(f"Found {len(web_results)} web results:")for i, doc in enumerate(web_results):    print(f"  {i+1}. {doc.metadata.get('title', 'No title')}")    print(f"     URL: {doc.metadata.get('source', 'No URL')}")    print(f"     Content: {doc.page_content[:150]}...")    print()

自我批评和流程控制策略

到目前为止，我们已经构建了一个强大的研究机器。我们的代理可以创建计划、选择正确的工具并执行复杂的检索漏斗。但缺少一个关键部分：思考自己进度的能力。一个盲目遵循计划的代理，步步执行，不是真正智能的。它需要自我批评的机制。

自我批评和策略制定（由Fareed Khan创建）

这是我们构建代理自主认知核心的地方。在每个研究步骤后，我们的代理将暂停并反思。它将查看它刚找到的新信息，将其与已经知道的内容进行比较，然后做出战略决策：我的研究完成了吗，还是需要继续？

这个自我批评循环是将我们的系统从脚本工作流提升为自主代理的机制。它是允许它决定何时有足够证据自信回答用户问题的机制。

我们将使用两个新的专门代理来实施这个：

1. 反思代理：这个代理将采用已完成步骤的蒸馏上下文并创建简洁的、一句摘要。这个摘要然后被添加到我们代理的"研究历史"中。
1. 策略代理：这是主策略师。在反思后，它将检查相对于原始计划的整个研究历史并做出关键决策：CONTINUE_PLAN或FINISH。

更新和反思累积研究历史

在我们代理完成研究步骤（例如，检索和蒸馏关于NVIDIA风险的信息）后，我们不想只是继续。我们需要将此新知识整合到代理的记忆中。

反思累积（由Fareed Khan创建）

我们将构建一个反思代理 ，其唯一工作就是执行这种整合。它将采用当前步骤的丰富、蒸馏上下文并将其总结为单个、事实句子。这个摘要然后被添加到我们RAGState中的past_steps列表中。

首先，让我们为这个代理创建提示。

plaintext 复制代码

# 我们的反思代理的提示，指示它简洁和事实reflection_prompt = ChatPromptTemplate.from_messages([\    ("system", """You are a research assistant. Based on the retrieved context for the current sub-question, write a concise, one-sentence summary of the key findings.\This summary will be added to our research history. Be factual and to the point."""),\    ("human", "Current sub-question: {sub_question}\n\nDistilled context:\n{context}")\])

我们告诉这个代理作为勤奋的研究助理行为。它的任务不是创造性，而是做一个好的记录员。它读取context并写入summary。现在我们可以组装代理本身。

plaintext 复制代码

# 通过将我们的提示管道传输到推理LLM和字符串输出解析器来创建代理reflection_agent = reflection_prompt | reasoning_llm | StrOutputParser()print("Reflection Agent created.")

这个reflection_agent是我们认知循环的一部分。通过创建这些简洁摘要，它构建了清洁、易于阅读的研究历史。这个历史将是我们的下一个、最重要代理的输入：决定何时停止的代理。

构建策略代理进行流程控制

这是我们代理自主性的大脑。在reflection_agent更新研究历史后，策略代理开始发挥作用。它充当整个操作的监督员。

它的工作是查看代理知道的一切------原始问题、初始计划和已完成步骤及其摘要的完整历史------并做出高级战略决策。

策略代理（由Fareed Khan创建）

我们将从使用Pydantic模型定义其决策结构开始。

plaintext 复制代码

class Decision(BaseModel):    # 决策必须是这两个动作之一    next_action: Literal["CONTINUE_PLAN", "FINISH"]    # 代理必须证明其决策    justification: str

这个Decision类强制我们的策略代理做出明确的二元选择并解释其推理。这使其行为透明且易于调试。

接下来，我们设计将指导其决策过程的提示。

plaintext 复制代码

# 我们的策略代理的提示，指示它作为主策略师行为policy_prompt = ChatPromptTemplate.from_messages([\    ("system", """You are a master strategist. Your role is to analyze the research progress and decide the next action.\You have the original question, the initial plan, and a log of completed steps with their summaries.\- If the collected information in the Research History is sufficient to comprehensively answer the Original Question, decide to FINISH.\- Otherwise, if the plan is not yet complete, decide to CONTINUE_PLAN."""),\    ("human", "Original Question: {question}\n\nInitial Plan:\n{plan}\n\nResearch History (Completed Steps):\n{history}")\])

我们基本上要求LLM执行元分析。它不是在回答问题本身；它是在推理_研究过程的状态_。它将已经拥有的东西（history）与其需要的东西（plan和question）进行比较并做出判断。

现在，我们可以组装policy_agent。

plaintext 复制代码

# 通过将我们的提示管道传输到推理LLM并使用我们的Decision类结构化其输出来创建代理policy_agent = policy_prompt | reasoning_llm.with_structured_output(Decision)print("Policy Agent created.")# 现在，让我们用我们研究过程的两种不同状态来测试策略代理print("\n--- Testing Policy Agent (Incomplete State) ---")# 第一种状态，其中只有步骤1完成。plan_str = json.dumps([s.dict() for s in test_plan.steps])incomplete_history = "Step 1 Summary: NVIDIA's 10-K states that the semiconductor industry is intensely competitive and subject to rapid technological change."decision1 = policy_agent.invoke({"question": complex_query_adv, "plan": plan_str, "history": incomplete_history})print(f"Decision: {decision1.next_action}, Justification: {decision1.justification}")print("\n--- Testing Policy Agent (Complete State) ---")# 第二种状态，其中步骤1和步骤2都完成。complete_history = incomplete_history + "\nStep 2 Summary: In 2024, AMD launched its MI300X accelerator to directly compete with NVIDIA in the AI chip market, gaining adoption from major cloud providers."decision2 = policy_agent.invoke({"question": complex_query_adv, "plan": plan_str, "history": complete_history})print(f"Decision: {decision2.next_action}, Justification: {decision2.justification}")

为了正确测试我们的policy_agent，我们模拟我们代理生命周期的两个不同瞬间。在第一个测试中，我们提供给它一个只包含步骤1摘要的历史。在第二个中，我们提供给它步骤1和步骤2的摘要。

让我们检查它在每种情况下的决策。

plaintext 复制代码

#### 输出 ####Policy Agent created.--- Testing Policy Agent (Incomplete State) ---Decision: CONTINUE_PLAN, Justification: The research has only identified NVIDIA's competitive risks from the 10-K. It has not yet gathered the required external information about AMD's 2024 strategy, which is the next step in the plan.--- Testing Policy Agent (Complete State) ---Decision: FINISH, Justification: The research history now contains comprehensive summaries of both NVIDIA's stated competitive risks and AMD's recent AI chip strategy. All necessary information has been gathered to perform the final synthesis and answer the user's question.

让我们理解输出...

• 在不完整状态中 ，代理正确地识别出它缺少关于AMD策略的信息。它查看了其计划，看到下一步是使用网络搜索，并正确决定CONTINUE_PLAN。
• 在完整状态中 ，在给予网络搜索摘要后，它再次分析其历史。这次，它识别出它有拼图的所有部分------NVIDIA风险和AMD策略。它正确地决定其研究已完成，是时候FINISH了。

有了这个policy_agent，我们构建了我们自主系统的头脑。最后一步是将所有这些组件连接成完整的、可执行的工作流，使用LangGraph。

定义图节点

我们已经设计所有这些酷的、专门的代理。现在是时候将它们转换为工作流的实际构建块。在LangGraph中，这些构建块被称为节点。节点只是做一件特定工作的Python函数。它将代理的当前记忆（RAGState）作为输入，执行其任务，然后返回包含该记忆任何更新的字典。

我们将为代理需要采取的每个主要步骤创建一个节点。

图节点（由Fareed Khan创建）

首先，我们需要一个简单的辅助函数。由于我们的代理经常需要查看研究历史，我们想要一种清洁的方式来将past_steps列表格式化为可读字符串。

plaintext 复制代码

# 一个辅助函数来格式化研究历史以供提示def get_past_context_str(past_steps: List[PastStep]) -> str:    # 这接受PastStep字典列表并将它们连接为单个字符串。    # 每个步骤都清晰标记，以便LLM理解上下文。    return "\\n\\n".join([f"Step {s['step_index']}: {s['sub_question']}\\nSummary: {s['summary']}" for s in past_steps])

我们基本上创建一个工具，将在几个节点中使用，为我们的提示提供历史上下文。

现在对于我们第一个真实节点：plan_node。这是我们代理推理的起点。它唯一的工作是调用我们之前创建的planner_agent并填充我们RAGState中的plan字段。

plaintext 复制代码

# 节点1：计划器def plan_node(state: RAGState) -> Dict:    console.print("--- 🧠: Generating Plan ---")    # 我们调用我们之前创建的计划器代理，传入用户的原始问题。    plan = planner_agent.invoke({"question": state["original_question"]})    rprint(plan)    # 我们返回一个包含我们RAGState更新的字典。    # LangGraph将自动将其合并到主状态中。    return {"plan": plan, "current_step_index": 0, "past_steps": []}

这个节点启动一切。它从状态中获取original_question，获取plan，然后初始化current_step_index为0（以从第一个步骤开始）并清除past_steps历史以进行此新运行。

接下来，我们需要实际去查找信息的节点。由于我们的计划器可以在两种工具之间选择，我们需要两个单独的检索节点。让我们从retrieval_node开始，用于搜索我们内部的10-K文档。

plaintext 复制代码

# 节点2a：从10-K文档检索def retrieval_node(state: RAGState) -> Dict:    # 首先，获取计划中当前步骤的详细信息。    current_step_index = state["current_step_index"]    current_step = state["plan"].steps[current_step_index]    console.print(f"--- 🔍: Retrieving from 10-K (Step {current_step_index + 1}: {current_step.sub_question}) ---")    # 使用我们的查询重写器来优化子问题以进行搜索。    past_context = get_past_context_str(state['past_steps'])    rewritten_query = query_rewriter_agent.invoke({        "sub_question": current_step.sub_question,        "keywords": current_step.keywords,        "past_context": past_context    })    console.print(f"  Rewritten Query: {rewritten_query}")    # 获取监督器关于哪种检索策略最好的决策。    retrieval_decision = retrieval_supervisor_agent.invoke({"sub_question": rewritten_query})    console.print(f"  Supervisor Decision: Use `{retrieval_decision.strategy}`. Justification: {retrieval_decision.justification}")    # 根据决策，执行正确的检索函数。    if retrieval_decision.strategy == 'vector_search':        retrieved_docs = vector_search_only(rewritten_query, section_filter=current_step.document_section, k=config['top_k_retrieval'])    elif retrieval_decision.strategy == 'keyword_search':        retrieved_docs = bm25_search_only(rewritten_query, k=config['top_k_retrieval'])    else: # hybrid_search        retrieved_docs = hybrid_search(rewritten_query, section_filter=current_step.document_section, k=config['top_k_retrieval'])    # 返回要添加到状态的检索文档。    return {"retrieved_docs": retrieved_docs}

这个节点做很多智能工作。它不仅仅是一个简单的检索器。它编排一个小管道：它重写查询，要求监督器最佳策略，然后执行该策略。

现在，我们需要我们其他工具的对应节点：网络搜索。

plaintext 复制代码

# 节点2b：从网络检索def web_search_node(state: RAGState) -> Dict:    # 获取当前步骤的详细信息。    current_step_index = state["current_step_index"]    current_step = state["plan"].steps[current_step_index]    console.print(f"--- 🌐: Searching Web (Step {current_step_index + 1}: {current_step.sub_question}) ---")    # 重写子问题以进行网络搜索引擎。    past_context = get_past_context_str(state['past_steps'])    rewritten_query = query_rewriter_agent.invoke({        "sub_question": current_step.sub_question,        "keywords": current_step.keywords,        "past_context": past_context    })    console.print(f"  Rewritten Query: {rewritten_query}")    # 调用我们的网络搜索函数。    retrieved_docs = web_search_function(rewritten_query)    # 返回结果。    return {"retrieved_docs": retrieved_docs}

这个web_search_node更简单，因为它不需要监督器，它只有一种搜索网络的方式。但它仍然使用我们强大的查询重写器来确保搜索尽可能有效。

在检索文档（来自任一来源）后，我们需要运行我们的精度和综合漏斗。我们将为每个阶段创建一个节点。首先，rerank_node。

plaintext 复制代码

# 节点3：重新排序器def rerank_node(state: RAGState) -> Dict:    console.print("--- 🎯: Reranking Documents ---")    # 获取当前步骤的详细信息。    current_step_index = state["current_step_index"]    current_step = state["plan"].steps[current_step_index]    # 在我们刚检索的文档上调用我们的重新排序函数。    reranked_docs = rerank_documents_function(current_step.sub_question, state["retrieved_docs"])    console.print(f"  Reranked to top {len(reranked_docs)} documents.")    # 用高精度文档更新状态。    return {"reranked_docs": reranked_docs}

这个节点获取retrieved_docs（我们的10个文档的广泛召回）并使用交叉编码器将它们过滤到前3个，将结果放在reranked_docs中。

接下来，compression_node将采用那前3个文档并蒸馏它们。

plaintext 复制代码

# 节点4：压缩器/蒸馏器def compression_node(state: RAGState) -> Dict:    console.print("--- ✂️: Distilling Context ---")    # 获取当前步骤的详细信息。    current_step_index = state["current_step_index"]    current_step = state["plan"].steps[current_step_index]    # 将前3个文档格式化为单个字符串。    context = format_docs(state["reranked_docs"])    # 调用我们的蒸馏器代理将它们综合为一段。    synthesized_context = distiller_agent.invoke({"question": current_step.sub_question, "context": context})    console.print(f"  Distilled Context Snippet: {synthesized_context[:200]}...")    # 用最终、清洁的上下文更新状态。    return {"synthesized_context": synthesized_context}

这个节点是我们检索漏斗的最后一步。它采用reranked_docs并产生单个、清洁的synthesized_context段落。

现在我们有了证据，我们需要反思它并更新我们的研究历史。这是reflection_node的工作。

plaintext 复制代码

# 节点5：反思/更新步骤def reflection_node(state: RAGState) -> Dict:    console.print("--- 🤔: Reflecting on Findings ---")    # 获取当前步骤的详细信息。    current_step_index = state["current_step_index"]    current_step = state["plan"].steps[current_step_index]    # 调用我们的反思代理来总结发现。    summary = reflection_agent.invoke({"sub_question": current_step.sub_question, "context": state['synthesized_context']})    console.print(f"  Summary: {summary}")    # 创建一个新的PastStep字典，包含此步骤的所有结果。    new_past_step = {        "step_index": current_step_index + 1,        "sub_question": current_step.sub_question,        "retrieved_docs": state['reranked_docs'], # 我们保存重新排序的文档以进行最终引用        "summary": summary    }    # 将新步骤附加到我们的历史并递增步骤索引以移动到下一步。    return {"past_steps": state["past_steps"] + [new_past_step], "current_step_index": current_step_index + 1}

这个节点是我们代理的簿记员。它调用reflection_agent来创建摘要，然后将当前研究周期的所有结果整齐打包为new_past_step对象。然后它将此添加到past_steps列表中并递增current_step_index，让代理为下一个循环准备。

最后，当研究完成时，我们需要一个最后节点来生成最终答案。

plaintext 复制代码

# 节点6：最终答案生成器def final_answer_node(state: RAGState) -> Dict:    console.print("--- ✅: Generating Final Answer with Citations ---")    # 首先，我们需要收集我们从所有过去步骤收集的所有证据。    final_context = ""    for i, step in enumerate(state['past_steps']):        final_context += f"\\n--- Findings from Research Step {i+1} ---\\n"        # 我们包括每个文档的源元数据（部分或URL）以启用引用。        for doc in step['retrieved_docs']:            source = doc.metadata.get('section') or doc.metadata.get('source')            final_context += f"Source: {source}\\nContent: {doc.page_content}\\n\\n"    # 我们创建一个专门用于生成最终、可引用答案的新提示。    final_answer_prompt = ChatPromptTemplate.from_messages([\        ("system", """You are an expert financial analyst. Synthesize the research findings from internal documents and web searches into a comprehensive, multi-paragraph answer for the user's original question.\Your answer must be grounded in the provided context. At the end of any sentence that relies on specific information, you MUST add a citation. For 10-K documents, use [Source: <section title>]. For web results, use [Source: <URL>]."""),\        ("human", "Original Question: {question}\\n\\nResearch History and Context:\\n{context}")\    ])    # 我们为这个最终任务创建一个临时代理并调用它。    final_answer_agent = final_answer_prompt | reasoning_llm | StrOutputParser()    final_answer = final_answer_agent.invoke({"question": state['original_question'], "context": final_context})    # 用最终答案更新状态。    return {"final_answer": final_answer}

这个final_answer_node是我们的压轴戏。它将来自past_steps历史中每一步的所有高质量、重新排序的文档合并到一个巨大的上下文中。然后它使用专门的提示来指导我们强大的reasoning_llm将此信息综合为包含引用的全面、多段落答案，将我们的研究过程带到成功结论。

随着所有节点定义，我们现在有代理的所有构建块。下一步是定义"连接它们并控制图流的"线"。

定义条件边

所以，我们已经构建了所有节点。我们有计划器、检索器、重新排序器、蒸馏器和反射器。将它们想象成房间里的专家集合。现在我们需要定义对话规则。谁在什么时候说话？我们如何决定下一步做什么？

这是边在LangGraph中的工作。简单边是直接的，"在节点A之后，总是去节点B" 。但真正的智能来自条件边。

条件边是查看代理当前记忆（RAGState）并做出决策的函数，基于情况将工作流路由到不同路径。

我们代理需要两个关键决策函数：

1. 工具路由器（route_by_tool）： 在制定计划后，此函数将查看计划的_当前步骤_并决定是否将工作流发送到retrieve_10k节点或retrieve_web节点。
1. 主控制循环（should_continue_node）： 这是最重要的一个。在每个研究步骤完成并反思后，此函数将调用我们的policy_agent来决定是否继续到计划中的下一步或完成研究并生成最终答案。

首先，让我们构建我们简单的工具路由器。

plaintext 复制代码

# 条件边1：工具路由器def route_by_tool(state: RAGState) -> str:    # 获取我们当前所在步骤的索引。    current_step_index = state["current_step_index"]    # 从计划中获取当前步骤的完整详细信息。    current_step = state["plan"].steps[current_step_index]    # 返回为此步骤指定的工具名称。    # LangGraph将使用此字符串来决定下一步去哪个节点。    return current_step.tool

此函数非常简单，但至关重要。它充当交换机。它从状态中读取current_step_index，在plan中找到相应的Step，并返回其tool字段的值（它将是"search_10k"或"search_web"）。当我们连接我们的图时，我们将告诉它使用此函数的输出来选择下一个节点。

现在我们需要创建一个函数来控制我们代理的主要推理循环。这是我们policy_agent发挥作用的地方。

plaintext 复制代码

# 条件边2：主控制循环def should_continue_node(state: RAGState) -> str:    console.print("--- 🚦: Evaluating Policy ---")    # 获取我们即将开始步骤的索引。    current_step_index = state["current_step_index"]    # 首先，检查我们的基本停止条件。    # 条件1：我们是否完成了计划中的所有步骤？    if current_step_index >= len(state["plan"].steps):        console.print("  -> Plan complete. Finishing.")        return "finish"    # 条件2：我们是否超过了迭代次数的安全限制？    if current_step_index >= config["max_reasoning_iterations"]:        console.print("  -> Max iterations reached. Finishing.")        return "finish"    # 一个特殊情况：如果最后的检索步骤未能找到任何文档，    # 就没有反思的意义。最好继续下一步。    if state.get("reranked_docs") is not None and not state["reranked_docs"]:        console.print("  -> Retrieval failed for the last step. Continuing with next step in plan.")        return "continue"    # 如果没有满足基本条件，是时候询问我们的策略代理了。    # 我们将历史和计划格式化为字符串以供提示。    history = get_past_context_str(state['past_steps'])    plan_str = json.dumps([s.dict() for s in state['plan'].steps])    # 调用策略代理以获取其战略决策。    decision = policy_agent.invoke({"question": state["original_question"], "plan": plan_str, "history": history})    console.print(f"  -> Decision: {decision.next_action} | Justification: {decision.justification}")    # 基于代理的决策，返回适当的信号。    if decision.next_action == "FINISH":        return "finish"    else: # CONTINUE_PLAN        return "continue"

这个should_continue_node函数是我们代理控制流的认知核心。它在每个reflection_node后运行。

1. 首先，它检查简单、硬编码的停止标准 。计划用完步骤了吗？我们是否达到了max_reasoning_iterations安全限制？这些防止代理永远运行。
1. 如果那些检查通过，它然后调用我们强大的policy_agent。它给策略代理所有它需要的上下文：原始目标（question）、完整plan以及迄今完成工作的history。
1. 最后，它获取policy_agent的结构化输出（CONTINUE_PLAN或FINISH）并返回简单字符串"continue"或"finish"。LangGraph将使用此字符串要么循环回来进行另一个研究周期或继续到final_answer_node。

随着我们节点（专家）和条件边（对话规则）的现在定义，我们有我们需要的。

是时候将所有这些片段组装成完整、可运行的StateGraph了。

连接深度思考RAG机器

我们所有各个组件都准备好：

1. 我们的节点（工人）
1. 我们的条件边（经理）

现在是将它们全部连接成单一、内聚系统的时候。

我们将使用LangGraph的StateGraph来定义我们代理的完整认知架构。这是我们绘制代理思考过程蓝图的地方，明确定义信息从一步流向下一步的方式。

首先我们需要做的是创建StateGraph的实例。我们将告诉它它将传递的"状态"是我们RAGState字典。

plaintext 复制代码

from langgraph.graph import StateGraph, END # 导入主图组件# 实例化图，告诉它使用我们的RAGState TypedDict作为其状态模式。graph = StateGraph(RAGState)

我们现在有一个空图。下一步是添加我们之前定义的所有节点。.add_node()方法接受两个参数：节点唯一字符串名称和节点将执行的Python函数。

plaintext 复制代码

# 将我们所有Python函数作为节点添加到图中graph.add_node("plan", plan_node)                     # 创建初始计划的节点graph.add_node("retrieve_10k", retrieval_node)        # 内部文档检索节点graph.add_node("retrieve_web", web_search_node)       # 外部网络搜索节点graph.add_node("rerank", rerank_node)                 # 执行精确重新排序的节点graph.add_node("compress", compression_node)          # 蒸馏上下文的节点graph.add_node("reflect", reflection_node)            # 总结发现并更新历史的节点graph.add_node("generate_final_answer", final_answer_node) # 综合最终答案的节点

现在我们所有专家都在房间里。最后且最关键的步骤是定义连接它们的"线"。这是我们使用.add_edge()和.add_conditional_edges()方法来定义控制流的地方。

plaintext 复制代码

# 我们图的入口点是"plan"节点。每个运行都从这里开始。graph.set_entry_point("plan")# 在"plan"节点后，我们使用我们的第一个条件边来决定使用哪种工具。graph.add_conditional_edges(    "plan",           # 源节点    route_by_tool,    # 做出决策的函数    {                 # 将函数输出字符串映射到目标节点的字典        "search_10k": "retrieve_10k",        "search_web": "retrieve_web",    },)# 在从10-K或网络检索后，流对某个步骤是线性的。graph.add_edge("retrieve_10k", "rerank") # 内部检索后，总是去重新排序。graph.add_edge("retrieve_web", "rerank") # 网络检索后，也总是去重新排序。graph.add_edge("rerank", "compress")      # 重新排序后，总是去压缩。graph.add_edge("compress", "reflect")     # 压缩后，总是去反思。# 在"reflect"节点后，我们遇到我们的主要条件边，控制推理循环。graph.add_conditional_edges(    "reflect",        # 源节点    should_continue_node, # 调用我们策略代理的函数    {                 # 将决策映射到下一步的字典        "continue": "plan", # 如果决策是"continue"，我们循环回"plan"节点来路由计划的下一步。        "finish": "generate_final_answer", # 如果决策是"finish"，我们继续生成最终答案。    },)# "generate_final_answer"节点是结束前的最后一步。graph.add_edge("generate_final_answer", END) # 生成答案后，图结论。print("StateGraph constructed successfully.")

这是我们代理大脑的蓝图。让我们追踪流：

1. 它总是从plan开始。
1. route_by_tool条件边然后充当交换机，将流定向到retrieve_10k或retrieve_web。
1. 无论哪个检索器运行，输出总是通过rerank -> compress -> reflect管道进行。
1. 这带我们到最重要部分：should_continue_node条件边。这是我们循环推理的核心。

• 如果策略代理说CONTINUE_PLAN，边将工作流全部发送回plan节点。我们回到plan（而不是直接到下一个检索器）以便route_by_tool可以正确路由计划中的下一步。
• 如果策略代理说FINISH，边打破循环并发送工作流到generate_final_answer节点。
• 最后，在答案生成后，图在END终止。

我们成功定义了我们深度思考代理的完整、复杂和循环架构。剩下的唯一事情是将此蓝图编译成可运行应用程序并可视化它以查看我们构建了什么。

编译和可视化迭代工作流

随着我们图完全连接，组装过程中的最后一步是编译它。.compile()方法获取我们节点和边的抽象定义，并将其转换为具体的、可执行的应用程序。

然后我们可以使用内置LangGraph实用程序来生成我们图的图表。可视化工作流对理解和调试复杂代理系统非常有帮助。它将我们的代码转换为直观流程图，清晰显示代理可能的推理路径。

基本上，我们正在将我们的蓝图变成真实机器。

plaintext 复制代码

# .compile()方法获取我们的图定义并创建可运行对象。deep_thinking_rag_graph = graph.compile()print("Graph compiled successfully.")# 现在，让我们可视化我们构建的架构。try:    from IPython.display import Image, display    # 我们可以获得图结构的PNG图像。    png_image = deep_thinking_rag_graph.get_graph().draw_png()    # 在我们的笔记本中显示图像。    display(Image(png_image))except Exception as e:    # 如果pygraphviz及其系统依赖未安装，这可能失败。    print(f"Graph visualization failed: {e}. Please ensure pygraphviz is installed.")

deep_thinking_rag_graph对象现在是我们完全功能代理。可视化代码然后调用.get_graph().draw_png()来生成我们构造状态机的视觉表示。

深度思考更简单管道流（由Fareed Khan创建）

我们可以清楚地看到：

• 初始分支逻辑，其中route_by_tool在retrieve_10k和retrieve_web之间选择。
• 每个研究步骤的线性处理管道（rerank -> compress -> reflect）。
• 关键的反馈循环 ，其中should_continue边将工作流发送回plan节点以开始下一个研究周期。
• 最终"出口坡道"，一旦研究完成，导向generate_final_answer。

这是一个可以思考的系统架构。现在，让我们对其进行测试。

运行深度思考管道

我们已经构建了一个推理引擎。现在是时候看看它是否能在我们的基线系统如此壮观失败的地方成功。

我们将使用完全相同的多跳、多源挑战查询来调用我们编译的deep_thinking_rag_graph。我们将使用.stream()方法来获得代理执行的实时、逐步跟踪，观察其"思考过程"，因为它解决问题。

本节计划：

• 调用图： 我们将运行我们的代理并观察它执行其计划，在工具之间切换并构建其研究历史。
• 分析最终输出： 我们将检查最终、合成答案，看看它是否成功整合了来自10-K和网络两者的信息。
• 比较结果： 我们将进行最终并排比较，以明确突出我们深度思考代理的架构优势。

我们将设置初始输入，这只是一个包含original_question的字典，然后调用.stream()方法。stream方法对调试和观察非常棒，因为它在每个节点完成其工作后产生图的状态。

plaintext 复制代码

# 这将保存运行完成后图的最终状态。final_state = None# 我们图的初始输入，包含原始用户查询。graph_input = {"original_question": complex_query_adv}print("--- Invoking Deep Thinking RAG Graph ---")# 我们使用.stream()来实时观察代理的过程。# "values"模式意味着我们在每步后获得完整的RAGState对象。for chunk in deep_thinking_rag_graph.stream(graph_input, stream_mode="values"):    # 流中的最后一块将是图的终端状态。    final_state = chunkprint("\n--- Graph Stream Finished ---")

这个循环是我们代理变得生动的时候。每次迭代，LangGraph执行工作流中的下一个节点，更新RAGState并向我们产生新状态。我们嵌入节点内的rich库console.print语句将给我们代理行动和决策的运行评论。

plaintext 复制代码

#### 输出 ####--- Invoking Deep Thinking RAG Graph ------ 🧠: Generating Plan ---plan:  steps:  - sub_question: What are the key risks related to competition as stated in NVIDIA's 2023 10-K filing?    tool: search_10k    ...  - sub_question: What are the recent news and developments in AMD's AI chip strategy in 2024?    tool: search_web    ...--- 🔍: Retrieving from 10-K (Step 1: ...) ---  Rewritten Query: key competitive risks for NVIDIA in the semiconductor industry...  Supervisor Decision: Use `hybrid_search`. ...--- 🎯: Reranking Documents ---  Reranked to top 3 documents.--- ✂️: Distilling Context ---  Distilled Context Snippet: NVIDIA operates in the intensely competitive semiconductor industry...--- 🤔: Reflecting on Findings ---  Summary: According to its 2023 10-K, NVIDIA operates in an intensely competitive semiconductor industry...--- 🚦: Evaluating Policy ---  -> Decision: CONTINUE_PLAN | Justification: The first step...has been completed. The next step...is still pending...--- 🌐: Searching Web (Step 2: ...) ---  Rewritten Query: AMD AI chip strategy news and developments 2024...--- 🎯: Reranking Documents ---  Reranked to top 3 documents.--- ✂️: Distilling Context ---  Distilled Context Snippet: AMD has ramped up its challenge to Nvidia in the AI accelerator market with its Instinct MI300 series...--- 🤔: Reflecting on Findings ---  Summary: In 2024, AMD is aggressively competing with NVIDIA in the AI chip market through its Instinct MI300X accelerator...--- 🚦: Evaluating Policy ---  -> Decision: FINISH | Justification: The research history now contains comprehensive summaries of both NVIDIA's stated risks and AMD's recent strategy...--- ✅: Generating Final Answer with Citations ------ Graph Stream Finished ---

您可以看到我们设计的执行。代理：

1. 规划：它创建了正确的两步、多工具计划。
1. 执行步骤1：它使用search_10k，通过完整检索漏斗运行并反思发现。
1. 自我批评：策略代理看到计划尚未完成并决定CONTINUE_PLAN。
1. 执行步骤2：它正确切换到search_web工具，通过相同漏斗运行并再次反思。
1. 再次自我批评：这次，策略代理看到所有必要信息都已收集并正确决定FINISH。
1. 合成：工作流然后继续到generate_final_answer节点。

代理已成功导航复杂查询。现在，让我们检查它产生的最终答案。

分析最终的高质量答案

代理已完成其研究。final_state变量现在保存完整的RAGState，包括final_answer。让我们打印它并看看它是否成功将来自两个源的信息合成为单一、分析响应，包含引用。

plaintext 复制代码

console.print("--- DEEP THINKING RAG FINAL ANSWER ---")console.print(Markdown(final_state['final_answer']))
``````plaintext
#### 输出 ####--- DEEP THINKING RAG FINAL ANSWER ---Based on an analysis of NVIDIA's 2023 10-K filing and recent news from 2024 regarding AMD's AI chip strategy, the following synthesis can be made:**NVIDIA's Stated Competitive Risks:**In its 2023 10-K filing, NVIDIA identifies its operating environment as the "intensely competitive" semiconductor industry, which is characterized by rapid technological change. A primary risk is that competitors, including AMD, could introduce new products with better performance or lower costs that gain significant market acceptance, which could materially and adversely affect its business [Source: Item 1A. Risk Factors.].**AMD's 2024 AI Chip Strategy:**In 2024, AMD has moved aggressively to challenge NVIDIA's dominance in the AI hardware market with its Instinct MI300 series of accelerators, particularly the MI300X. This product is designed to compete directly with NVIDIA's H100 GPU. AMD's strategy has gained significant traction, with major cloud providers such as Microsoft Azure and Oracle announcing plans to use the new chips [Source: https://www.reuters.com/technology/amd-forecasts-35-billion-ai-chip-revenue-2024-2024-01-30/].**Synthesis and Impact:**AMD's 2024 AI chip strategy directly exacerbates the competitive risks outlined in NVIDIA's 10-K. The successful launch and adoption of the MI300X is a materialization of the specific risk that a competitor could introduce a product with comparable performance. The adoption of AMD's chips by major cloud providers signifies a direct challenge to NVIDIA's market share in the lucrative data center segment, validating NVIDIA's stated concerns about rapid technological change [Source: Item 1A. Risk Factors. and https://www.cnbc.com/2023/12/06/amd-launches-new-mi300x-ai-chip-to-compete-with-nvidias-h100.html].

这是一个完整的成功。答案是深度分析列表。

• 它正确总结了10-K中的风险。
• 它正确总结了网络搜索的AMD新闻。
• 关键的是，在"综合和影响"部分，它执行原始查询所需的多跳推理，解释后者如何加剧前者。
• 最后，它提供正确的来源，引用指向内部文档部分和外部网络URL。

对比比较

让我们并排放置两个结果以使差异清晰。

特征	vanilla RAG（失败）	深度思考RAG（成功）
思考风格	一次性，无记忆。	多步，基于记忆的推理。
规划	无规划将整个查询视为一个搜索。	将查询分解为步骤，为每步选择最佳工具（内部或网络）。
搜索方法	在一个源上的基本语义搜索。	智能、自适应搜索，在每步使用最佳方法。
使用的源	仅一个静态文档。	将内部文档与实时网络数据混合。
答案质量	失败无综合。	成功------清晰、良好引用的多源答案。

比较表（由Fareed Khan创建）

这种比较提供了明确结论。向循环、工具感知和自我批评代理的架构转变导致在复杂、真实世界查询上性能的巨大和可测量改进。

评估框架和分析结果

所以，我们已经看到我们的高级代理在一个非常困难的查询上轶事成功。但在生产环境中，我们需要的不仅仅是单一成功故事。我们需要客观、定量和自动化验证。

评估框架（由Fareed Khan创建）

为了实现这一点，我们现在将使用RAGAs（RAG评估）库构建严格的评估框架。我们将专注于RAGAs提供的四个关键指标：

• 上下文精确度和召回率： 这些衡量我们检索管道的质量。精确度 问："在我们检索的文档中，有多少实际相关？"（信号对噪声）。召回率问："在存在的所有相关文档中，我们实际找到多少？"（完整性）。
• 答案忠实度： 这衡量生成的答案是否基于提供的上下文，作为我们针对LLM幻觉的主要检查。
• 答案正确性： 这是质量的终极衡量。它将生成的答案与手工制作的"地面真相"答案进行比较，以评估其事实准确性和完整性。

基本上，为了运行RAGAs评估，我们需要准备数据集。此数据集将包含我们的挑战查询、由我们的基线和高级管道生成的各自答案、它们使用的各自上下文以及我们将编写的"真正"答案作为理想响应。

plaintext 复制代码

from datasets import Dataset # 来自Hugging Face数据集库，RAGAs使用from ragas import evaluatefrom ragas.metrics import (    context_precision,    context_recall,    faithfulness,    answer_correctness,)import pandas as pdprint("Preparing evaluation dataset...")# 这是我们对复杂查询的手工制作的理想答案。ground_truth_answer_adv = "NVIDIA's 2023 10-K lists intense competition and rapid technological change as key risks. This risk is exacerbated by AMD's 2024 strategy, specifically the launch of the MI300X AI accelerator, which directly competes with NVIDIA's H100 and has been adopted by major cloud providers, threatening NVIDIA's market share in the data center segment."# 我们需要重新运行基线模型的检索器以获得其评估上下文。retrieved_docs_for_baseline_adv = baseline_retriever.invoke(complex_query_adv)baseline_contexts = [[doc.page_content for doc in retrieved_docs_for_baseline_adv]]# 对于高级代理，我们将整合它跨所有研究步骤检索的文档。advanced_contexts_flat = []for step in final_state['past_steps']:    advanced_contexts_flat.extend([doc.page_content for doc in step['retrieved_docs']])# 我们使用集合来移除任何重复文档以进行更清洁的评估。advanced_contexts = [list(set(advanced_contexts_flat))]# 现在，我们构建将转换为评估数据集的字典。eval_data = {    'question': [complex_query_adv, complex_query_adv], # 两个系统相同问题    'answer': [baseline_result, final_state['final_answer']], # 每个系统的答案    'contexts': baseline_contexts + advanced_contexts, # 每个系统使用的上下文    'ground_truth': [ground_truth_answer_adv, ground_truth_answer_adv] # 理想答案}# 创建Hugging Face Dataset对象。eval_dataset = Dataset.from_dict(eval_data)# 定义我们想要计算的指标列表。metrics = [\    context_precision,\    context_recall,\    faithfulness,\    answer_correctness,\]print("Running RAGAs evaluation...")# 运行评估。RAGAs将为每个指标调用LLM进行评分。result = evaluate(eval_dataset, metrics=metrics, is_async=False)print("Evaluation complete.")# 将结果格式化为清洁的pandas DataFrame以便轻松比较。results_df = result.to_pandas()results_df.index = ['baseline_rag', 'deep_thinking_rag']print("\n--- RAGAs Evaluation Results ---")print(results_df[['context_precision', 'context_recall', 'faithfulness', 'answer_correctness']].T)

我们设置正式实验。我们为我们单硬查询收集所有必要神器：问题、两个不同答案、两个不同上下文集和我们的理想地面真相。然后我们仔细打包这个eval_dataset给ragas.evaluate函数。

在后台，RAGAs进行一系列LLM调用，要求它作为法官行为。例如，对于faithfulness，它将问，"这个答案是否完全由此上下文支持？"对于answer_correctness，它将问...

这个答案与这个地面真相答案在事实上的相似程度如何？

我们可以看到数值分数...

plaintext 复制代码

#### 输出 ####Preparing evaluation dataset...Running RAGAs evaluation...Evaluation complete.--- RAGAs Evaluation Results ---                     baseline_rag  deep_thinking_ragcontext_precision        0.500000           0.890000context_recall           0.333333           1.000000faithfulness             1.000000           1.000000answer_correctness       0.395112           0.991458

定量结果为深度思考架构的优越性提供明确和客观裁决。

• 上下文精确度（0.50对0.89）： 基线上下文只有一半相关，因为它只能检索关于竞争的一般信息。高级代理的多步、多工具检索实现了完美精确度分数。
• 上下文召回率（0.33对1.00）： 基线检索器完全错过了来自网络的关键信息，导致非常低的召回率。高级代理的规划和工具使用确保所有必要信息被找到，实现完美召回率。
• 忠实度（1.00对1.00）： 两个系统都高度忠实。基线正确陈述它没有信息，高级代理正确使用它找到的信息。这对两者都是一个好兆头，但无正确性的忠实度是无用的。
• 答案正确性（0.40对0.99）： 这是质量的终极衡量。基线答案不到40%正确，因为它缺少所需分析的后半部分。高级代理的答案几乎完美。

总结我们的整个管道

在本指南中，我们已经从简单、脆弱的RAG管道到复杂自主推理代理的完整架构。

• 我们从构建香草RAG系统开始，并演示了其在复杂、多源查询上的可预测失败。
• 然后我们系统地设计了一个深度思考代理，为其配备规划、使用多种工具和适应其检索策略的能力。
• 我们构建了一个多阶段检索漏斗，从广泛召回（使用混合搜索）到高精度（使用交叉编码器重新排序器）最后到综合（使用蒸馏器代理）。
• 我们使用LangGraph编排整个认知架构，创建循环、有状态工作流，实现真正多步推理。
• 我们实施了一个自我批评循环，允许代理识别失败、修订其自己的计划以及在找不到答案时优雅退出。
• 最后，我们用生产级评估验证我们的成功，使用RAGAs提供高级代理优越性的客观、定量证明。

使用马尔可夫决策过程(MDP)的学习策略

我们的代理目前依靠昂贵的通用LLM如GPT-4o进行每个单个决策的政策代理，决定CONTINUE或FINISH。虽然有效，但这在生产环境中可能缓慢且昂贵。学术前沿提供更优化的前进道路。

• RAG作为决策过程： 我们可以将代理的推理循环建模为 马尔可夫决策过程（MDP） 。在此模型中，每个RAGState是一个"状态"，每个动作（CONTINUE、REVISE、FINISH）导致具有某种奖励的新状态（例如，找到正确答案）。
• 从经验学习： 我们在 LangSmith中记录的数千成功和失败推理轨迹是无价训练数据。每个轨迹都是代理导航此MDP的例子。
• 训练政策模型： 使用此数据，我们可以应用强化学习 来训练更小、专门的政策模型。
• 目标：速度和效率：目标是将像GPT-4o这样模型的复杂推理蒸馏为紧凑、微调模型（例如，7B参数模型）。此学习政策可以CONTINUE/FINISH决策更快、更便宜，同时为我们特定域高度优化。这是像DeepRAG等先进研究论文背后的核心思想，代表自主RAG系统的下一个优化水平。