What is RAG
RAG (Retrieval-Augmented Generation) is an architecture that combines external knowledge retrieval with large language model (LLM) generation, significantly improving the accuracy and timeliness of an LLM's answers.
The core idea of RAG: before generating an answer, first retrieve relevant information from an external knowledge base, then supply that information to the LLM as context.
User question → retrieve relevant documents (vector database / knowledge base) → build the prompt → LLM generates the answer
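The four stages above can be sketched with plain Python stubs before bringing in any libraries. Everything here (`KNOWLEDGE_BASE`, `retrieve`, `build_prompt`, the word-overlap scoring) is illustrative only; a real system replaces the stub retrieval with embedding similarity search and sends the prompt to an LLM:

```python
import re

# Toy knowledge base: in a real system these chunks sit in a vector database
KNOWLEDGE_BASE = [
    "RAG is a technique that combines retrieval with LLM generation.",
    "Chroma is a vector database.",
    "Embeddings map text to vectors.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Stub retrieval: rank chunks by word overlap with the question
    (a real system would rank by embedding similarity instead)."""
    q_words = set(re.findall(r"\w+", question.lower()))
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda c: len(q_words & set(re.findall(r"\w+", c.lower()))),
        reverse=True,
    )[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Stuff the retrieved chunks into the prompt as context."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question based on the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

question = "What is RAG?"
prompt = build_prompt(question, retrieve(question))
print(prompt)  # this prompt would be sent to the LLM
```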
Why RAG is needed
- Mitigating hallucination: LLMs sometimes fabricate information that does not exist
- Knowledge freshness: LLM training data has a cutoff date
- Private data access: gives the model access to enterprise or personal proprietary data
- Traceability: answers can cite their concrete sources
sentence-transformers
An embedding model is mandatory: whichever vector database you choose, you need a model to turn text into vectors. The database only stores and retrieves vectors; it never generates them.
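What the database then does with those vectors is essentially nearest-neighbour search. A minimal sketch with made-up 3-dimensional vectors (a real model such as all-MiniLM-L6-v2 produces 384-dimensional ones, and with `normalize_embeddings=True` cosine similarity reduces to a plain dot product):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": text chunks mapped to made-up embedding vectors
store = {
    "RAG architecture": [0.9, 0.2, 0.1],
    "cooking recipes":  [0.0, 0.2, 0.9],
    "vector search":    [0.7, 0.3, 0.1],
}

query_vec = [0.8, 0.2, 0.1]  # in reality produced by the embedding model
ranked = sorted(store, key=lambda t: cosine_similarity(store[t], query_vec),
                reverse=True)
print(ranked[0])  # "RAG architecture" ranks first
```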
RAG core pipeline
```python
# 1. Data preparation (indexing stage)
# Build a vector index with LangChain (langchain version: 1.2.13)
import os
import time
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings  # local embedding model, no API key needed
from langchain_chroma import Chroma
from openai import OpenAI

# Use the Hugging Face mirror (faster inside mainland China)
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

start_time = time.time()

# Configuration
PERSIST_DIR = "./chroma_db"
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"

# Check whether a persisted vector database already exists
if os.path.exists(PERSIST_DIR) and os.listdir(PERSIST_DIR):
    print("Existing vector database detected, loading...")
    # Load the embedding model
    embeddings = HuggingFaceEmbeddings(
        model_name=MODEL_NAME,  # downloads and loads the model
        # model_kwargs={"device": "cpu"},
        model_kwargs={"device": "cuda"},
        encode_kwargs={"normalize_embeddings": True},
    )
    # Load the persisted vector store
    vectorstore = Chroma(
        embedding_function=embeddings,
        persist_directory=PERSIST_DIR,
    )
    print(f"Vector database loaded in {time.time() - start_time:.2f}s")
else:
    print("First run, creating the vector database...")
    # Load documents
    loader = TextLoader("data.txt", encoding="utf-8")
    documents = loader.load()
    # Split the text. chunk_size: maximum length of each chunk;
    # chunk_overlap: overlap between chunks, keeps context coherent,
    # usually set to 10%-20% of chunk_size
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(documents)
    # Load the embedding model
    embeddings = HuggingFaceEmbeddings(
        model_name=MODEL_NAME,
        # model_kwargs={"device": "cpu"},  # use CPU
        model_kwargs={"device": "cuda"},  # use GPU
        encode_kwargs={"normalize_embeddings": True},
    )
    # Create the persisted vector store
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,  # vectors must be generated by the model
        persist_directory=PERSIST_DIR,
    )
    # Recent versions of Chroma persist automatically
    print(f"Vector database created in {time.time() - start_time:.2f}s")

# 2. Retrieval augmentation (query stage)
# Retrieve relevant documents
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3},  # return the 3 most relevant documents
)

# Retrieve
query = "What is RAG?"
relevant_docs = retriever.invoke(query)

# Build the augmented prompt
context = "\n\n".join(doc.page_content for doc in relevant_docs)
prompt = f"""Answer the question based on the context below.

Context:
{context}

Question: {query}

Answer:"""

# 3. Generate the answer
client = OpenAI(
    api_key=os.environ.get('DEEPSEEK_API_KEY'),
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": prompt},
    ],
    temperature=0,
    stream=False,
)
print(response.choices[0].message.content)

end_time = time.time()
print(f"Total time: {end_time - start_time:.2f}s")
```
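The chunk_size / chunk_overlap behaviour in the indexing step can be illustrated with a hand-rolled character-window splitter (a simplification: `RecursiveCharacterTextSplitter` actually tries separators such as paragraphs and sentences first, and `sliding_window_split` here is an illustrative helper, not a LangChain API):

```python
def sliding_window_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive splitter: each chunk repeats the last chunk_overlap
    characters of the previous one, so context spans chunk borders."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghij" * 5  # 50 characters
chunks = sliding_window_split(text, chunk_size=20, chunk_overlap=5)
print(len(chunks))                      # 4
print(chunks[0][-5:] == chunks[1][:5])  # True: the overlap is shared
```

With a 10%-20% overlap, a sentence cut in half at a chunk border still appears whole in the neighbouring chunk, which is why retrieval quality usually improves with a small overlap.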
Required Python dependencies
```shell
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiosignal==1.4.0
annotated-doc==0.0.4
annotated-types==0.7.0
anyio==4.13.0
attrs==26.1.0
bcrypt==5.0.0
build==1.4.1
certifi==2026.2.25
charset-normalizer==3.4.6
chromadb==1.5.5
click==8.3.1
colorama==0.4.6
dataclasses-json==0.6.7
distro==1.9.0
durationpy==0.10
filelock==3.25.2
flatbuffers==25.12.19
frozenlist==1.8.0
fsspec==2026.2.0
googleapis-common-protos==1.73.0
greenlet==3.3.2
grpcio==1.78.0
h11==0.16.0
hf-xet==1.4.2
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
httpx-sse==0.4.3
huggingface_hub==1.7.2
idna==3.11
importlib_metadata==8.7.1
importlib_resources==6.5.2
Jinja2==3.1.6
jiter==0.13.0
joblib==1.5.3
jsonpatch==1.33
jsonpointer==3.1.1
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
kubernetes==35.0.0
langchain==1.2.13
langchain-chroma==1.1.0
langchain-classic==1.0.3
langchain-community==0.4.1
langchain-core==1.2.22
langchain-huggingface==1.2.1
langchain-openai==1.1.12
langchain-text-splitters==1.1.1
langgraph==1.1.3
langgraph-checkpoint==4.0.1
langgraph-prebuilt==1.0.8
langgraph-sdk==0.3.12
langsmith==0.7.22
markdown-it-py==4.0.0
MarkupSafe==3.0.3
marshmallow==3.26.2
mdurl==0.1.2
mmh3==5.2.1
mpmath==1.3.0
multidict==6.7.1
mypy_extensions==1.1.0
networkx==3.6.1
numpy==2.4.3
oauthlib==3.3.1
onnxruntime==1.24.4
openai==2.29.0
opentelemetry-api==1.40.0
opentelemetry-exporter-otlp-proto-common==1.40.0
opentelemetry-exporter-otlp-proto-grpc==1.40.0
opentelemetry-proto==1.40.0
opentelemetry-sdk==1.40.0
opentelemetry-semantic-conventions==0.61b0
orjson==3.11.7
ormsgpack==1.12.2
overrides==7.7.0
packaging==26.0
pillow==12.1.1
propcache==0.4.1
protobuf==6.33.6
pybase64==1.4.3
pydantic==2.12.5
pydantic-settings==2.13.1
pydantic_core==2.41.5
Pygments==2.19.2
PyPika==0.51.1
pyproject_hooks==1.2.0
python-dateutil==2.9.0.post0
python-dotenv==1.2.2
PyYAML==6.0.3
referencing==0.37.0
regex==2026.2.28
requests==2.32.5
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rich==14.3.3
rpds-py==0.30.0
safetensors==0.7.0
scikit-learn==1.8.0
scipy==1.17.1
sentence-transformers==5.3.0
setuptools==81.0.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
SQLAlchemy==2.0.48
sympy==1.13.1
tenacity==9.1.4
threadpoolctl==3.6.0
tiktoken==0.12.0
tokenizers==0.22.2
torch==2.5.1+cu121
torchaudio==2.5.1+cu121
torchvision==0.20.1+cu121
tqdm==4.67.3
transformers==5.3.0
typer==0.24.1
typing-inspect==0.9.0
typing-inspection==0.4.2
typing_extensions==4.15.0
urllib3==2.6.3
uuid_utils==0.14.1
uvicorn==0.42.0
watchfiles==1.1.1
websocket-client==1.9.0
websockets==16.0
xxhash==3.6.0
yarl==1.23.0
zipp==3.23.0
zstandard==0.25.0
```