RAG from Scratch: Query Optimization

Table of Contents

Basics (Parts 1-4)

  • Part 1: RAG Overview
  • Part 2: Indexing
  • Part 3: Retrieval
  • Part 4: Generation

Query Optimization (Parts 5-9)

  • Part 5: Multi Query
  • Part 6: RAG-Fusion
  • Part 7: Decomposition
  • Part 8: Step Back
  • Part 9: HyDE

Environment Setup

Install Dependencies

bash
pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain

LangSmith Configuration

python
import os
from getpass import getpass

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = getpass()
os.environ["OPENAI_API_KEY"] = getpass()

Basics

Part 1: RAG Overview

Core Workflow

A RAG system has two main stages:

  1. Indexing

    • Load documents
    • Split the text into chunks
    • Embed the chunks
    • Store them in a vector database
  2. Retrieval & Generation

    • The user asks a question
    • Relevant documents are retrieved
    • An answer is generated from the retrieved context

Complete Example

python
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

#### Indexing ####

# 1. Load documents
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# 2. Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# 3. Embed and store the chunks
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

#### Retrieval & Generation ####

# 4. Load a prompt from the LangChain Hub
prompt = hub.pull("rlm/rag-prompt")

# 5. Initialize the LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# 6. Build the RAG chain
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 7. Ask a question
answer = rag_chain.invoke("What is Task Decomposition?")
print(answer)
# Output: Task decomposition is a technique used to break down complex tasks
# into smaller and simpler steps...

Part 2: Indexing

Counting Tokens

python
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

question = "What kinds of pets do I like?"
num_tokens_from_string(question, "cl100k_base")  # Output: 8

Embeddings

python
from langchain_openai import OpenAIEmbeddings

embd = OpenAIEmbeddings()
query_result = embd.embed_query("What kinds of pets do I like?")
document_result = embd.embed_query("My favorite pet is a cat.")

print(len(query_result))  # Output: 1536 (embedding dimension)

Computing Cosine Similarity

python
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

similarity = cosine_similarity(query_result, document_result)
print("Cosine Similarity:", similarity)  # Output: 0.8806521938580575
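As a quick sanity check, the helper above can be exercised on hand-made vectors (a supplementary sketch, not part of the original notebook; the vectors are made up for illustration):

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    # Same formula as above: dot product divided by the product of the norms
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Parallel vectors point in the same direction -> similarity 1.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # 1.0 (up to float rounding)

# Orthogonal vectors share no direction -> similarity 0.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```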

Loading and Splitting Documents

python
# Load the web page
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# Split using the tiktoken encoder
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50
)
splits = text_splitter.split_documents(blog_docs)

# Create the vector store
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

Part 3: Retrieval

Controlling the Number of Retrieved Documents

python
# Set the retrieval top-k
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

docs = retriever.get_relevant_documents("What is Task Decomposition?")
print(len(docs))  # Output: 1

Part 4: Generation

Custom Prompt

python
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Build the chain
chain = prompt | llm

# Run it
result = chain.invoke({
    "context": docs,
    "question": "What is Task Decomposition?"
})
print(result.content)

Using a Hub Prompt

python
from langchain import hub

prompt_hub_rag = hub.pull("rlm/rag-prompt")
print(prompt_hub_rag)
# Output: You are an assistant for question-answering tasks.
# Use the following pieces of retrieved context to answer the question...

Complete RAG Chain

python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What is Task Decomposition?")
print(answer)

Query Optimization

Part 5: Multi Query

Core Idea

Generate several versions of the user question from different perspectives to overcome the limitations of distance-based similarity search.

Implementation

python
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# 1. Create the multi-query prompt
template = """You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions separated by newlines. Original question: {question}"""

prompt_perspectives = ChatPromptTemplate.from_template(template)

# 2. Generate multiple queries
generate_queries = (
    prompt_perspectives
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

# 3. Deduplication helper
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """Take the unique union of the retrieved documents."""
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    unique_docs = list(set(flattened_docs))
    return [loads(doc) for doc in unique_docs]

# 4. Build the retrieval chain
question = "What is task decomposition for LLM agents?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question": question})

print(len(docs))  # Output: 6 (number of unique documents)
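The union-and-dedupe step can be sketched without LangChain, using `json` serialization as a stand-in for `dumps`/`loads` (the toy documents below are hypothetical, for illustration only):

```python
import json

def unique_union(result_lists):
    # Serialize each document so that equal contents compare equal,
    # dedupe through a set, then deserialize back to dicts
    flattened = [json.dumps(d, sort_keys=True) for docs in result_lists for d in docs]
    return [json.loads(s) for s in set(flattened)]

# Two queries returned overlapping results: doc 2 appears in both lists
results = [
    [{"id": 1, "text": "planning"}, {"id": 2, "text": "memory"}],
    [{"id": 2, "text": "memory"}, {"id": 3, "text": "tools"}],
]
print(len(unique_union(results)))  # 3
```

Note that, like the `set()` in `get_unique_union` above, this does not preserve the original retrieval order.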

Complete RAG Chain

python
from operator import itemgetter

template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain, "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

answer = final_rag_chain.invoke({"question": question})
print(answer)

Part 6: RAG-Fusion

Core Idea

Generate several related queries, then rerank the combined retrieval results with the Reciprocal Rank Fusion (RRF) algorithm.

The RRF Algorithm

python
def reciprocal_rank_fusion(results: list[list], k=60):
    """
    RRF formula: score = Σ(1 / (rank + k))

    Args:
        results: retrieval results for each query (one ranked list per query)
        k: the RRF constant, default 60
    """
    fused_scores = {}

    for docs in results:
        for rank, doc in enumerate(docs):
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # RRF formula
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort by fused score in descending order
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    return reranked_results
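To make the scoring concrete, here is a small worked example with two toy ranked lists, using plain strings in place of serialized documents (illustrative only):

```python
def rrf(results, k=60):
    # score(doc) = sum over result lists of 1 / (rank + k), rank starting at 0
    fused_scores = {}
    for docs in results:
        for rank, doc in enumerate(docs):
            fused_scores[doc] = fused_scores.get(doc, 0) + 1 / (rank + k)
    return sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)

# Query A returns [doc1, doc2]; query B returns [doc2, doc3]
ranked = rrf([["doc1", "doc2"], ["doc2", "doc3"]])
for doc, score in ranked:
    print(doc, round(score, 4))
# doc2 scores 1/61 + 1/60 ≈ 0.0331 (it appears in both lists),
# doc1 scores 1/60 ≈ 0.0167, doc3 scores 1/61 ≈ 0.0164
```

Because doc2 is ranked by both queries, it overtakes doc1 even though query A ranked doc1 first; this consensus effect is exactly what RRF is designed to reward.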

Complete Implementation

python
# 1. Generate related queries
template = """You are a helpful assistant that generates multiple search queries based on a single input query.
Generate multiple search queries related to: {question}
Output (4 queries):"""

prompt_rag_fusion = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_rag_fusion
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

# 2. Build the RAG-Fusion retrieval chain
retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})

print(len(docs))  # Output: 7

# 3. Full RAG chain
final_rag_chain = (
    {"context": retrieval_chain_rag_fusion, "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

answer = final_rag_chain.invoke({"question": question})

Part 7: Decomposition

Core Idea

Break a complex question into several sub-questions, then answer each of them.

Two Strategies

Strategy 1: Recursive Answering

The answer to each sub-question becomes context for the next one.

python
# 1. Generate sub-questions
template = """You are a helpful assistant that generates multiple sub-questions related to an input question.
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation.
Generate multiple search queries related to: {question}
Output (3 queries):"""

prompt_decomposition = ChatPromptTemplate.from_template(template)

generate_queries_decomposition = (
    prompt_decomposition
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

question = "What are the main components of an LLM-powered autonomous agent system?"
questions = generate_queries_decomposition.invoke({"question": question})

print(questions)
# Output:
# ['1. What is LLM technology and how does it work in autonomous agent systems?',
#  '2. What are the specific components that make up an LLM-powered autonomous agent system?',
#  '3. How do the main components interact with each other?']

python
# 2. Recursively answer the sub-questions
template = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question:

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

def format_qa_pair(question, answer):
    return f"Question: {question}\nAnswer: {answer}\n\n"

q_a_pairs = ""
for q in questions:
    rag_chain = (
        {"context": itemgetter("question") | retriever,
         "question": itemgetter("question"),
         "q_a_pairs": itemgetter("q_a_pairs")}
        | decomposition_prompt
        | llm
        | StrOutputParser()
    )

    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair

print(answer)  # Final synthesized answer

Strategy 2: Independent Answering

Each sub-question is answered independently, and all answers are then synthesized into a final answer.

python
def retrieve_and_rag(question, prompt_rag, sub_question_generator_chain):
    # Generate sub-questions
    sub_questions = sub_question_generator_chain.invoke({"question": question})

    rag_results = []
    for sub_question in sub_questions:
        retrieved_docs = retriever.get_relevant_documents(sub_question)
        answer = (prompt_rag | llm | StrOutputParser()).invoke({
            "context": retrieved_docs,
            "question": sub_question
        })
        rag_results.append(answer)

    return rag_results, sub_questions

# Answer each sub-question with the standard RAG prompt from the hub
prompt_rag = hub.pull("rlm/rag-prompt")

answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)

# Synthesize all the answers
def format_qa_pairs(questions, answers):
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
final_rag_chain = prompt | llm | StrOutputParser()

final_answer = final_rag_chain.invoke({"context": context, "question": question})

Part 8: Step Back

Core Idea

Rewrite a specific question as a more generic "step-back" question, so that retrieval brings in broader background knowledge.

Implementation

python
from langchain_core.prompts import FewShotChatMessagePromptTemplate

# 1. Few-shot examples
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "what can the members of The Police do?",
    },
    {
        "input": "Jan Sindel's was born in what country?",
        "output": "what is Jan Sindel's personal history?",
    },
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# 2. Build the step-back prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer."""),
    few_shot_prompt,
    ("user", "{question}"),
])

generate_queries_step_back = prompt | ChatOpenAI(temperature=0) | StrOutputParser()

question = "What is task decomposition for LLM agents?"
step_back_question = generate_queries_step_back.invoke({"question": question})

print(step_back_question)
# Output: What is the process of breaking down tasks for LLM agents?

python
# 3. Retrieve with both the original question and the step-back question
response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:"""

response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

from langchain_core.runnables import RunnableLambda

chain = (
    {
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        "step_back_context": generate_queries_step_back | retriever,
        "question": lambda x: x["question"],
    }
    | response_prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

answer = chain.invoke({"question": question})

Part 9: HyDE

Core Idea

Hypothetical Document Embeddings (HyDE): have the LLM write a hypothetical answer document, then use that document for retrieval instead of the raw question.

Implementation

python
# 1. Generate a hypothetical document
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""

prompt_hyde = ChatPromptTemplate.from_template(template)

generate_docs_for_retrieval = (
    prompt_hyde
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

question = "What is task decomposition for LLM agents?"
hypothetical_doc = generate_docs_for_retrieval.invoke({"question": question})

print(hypothetical_doc)
# Output: Task decomposition is a fundamental concept in machine learning...
# (a hypothetical academic-style passage)

python
# 2. Retrieve using the hypothetical document
retrieval_chain = generate_docs_for_retrieval | retriever
retrieved_docs = retrieval_chain.invoke({"question": question})

# 3. Generate the final answer
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = prompt | llm | StrOutputParser()

answer = final_rag_chain.invoke({
    "context": retrieved_docs,
    "question": question
})