What is RAG
RAG (Retrieval-Augmented Generation) is an architecture that combines external knowledge retrieval with large language model (LLM) generation, significantly improving the accuracy and timeliness of an LLM's answers.
The core idea of RAG: before generating an answer, first retrieve relevant information from an external knowledge base, then supply that information to the LLM as context.
User question → retrieve relevant documents (vector database / knowledge base) → build the prompt → LLM generates the answer
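The four stages above can be sketched with plain Python stubs before bringing in any libraries. Everything here (`KNOWLEDGE_BASE`, `retrieve`, `build_prompt`, the word-overlap scoring) is illustrative only; a real system replaces the stub retrieval with embedding similarity search and sends the prompt to an LLM:

```python
import re

# Toy knowledge base: in a real system these chunks sit in a vector database
KNOWLEDGE_BASE = [
    "RAG is a technique that combines retrieval with LLM generation.",
    "Chroma is a vector database.",
    "Embeddings map text to vectors.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Stub retrieval: rank chunks by word overlap with the question
    (a real system would rank by embedding similarity instead)."""
    q_words = set(re.findall(r"\w+", question.lower()))
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda c: len(q_words & set(re.findall(r"\w+", c.lower()))),
        reverse=True,
    )[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Stuff the retrieved chunks into the prompt as context."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question based on the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

question = "What is RAG?"
prompt = build_prompt(question, retrieve(question))
print(prompt)  # this prompt would be sent to the LLM
```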
Why RAG is needed
- Mitigating hallucination: LLMs sometimes fabricate information that does not exist
- Knowledge freshness: LLM training data has a cutoff date
- Private data access: gives the model access to enterprise or personal proprietary data
- Traceability: answers can cite their concrete sources
sentence-transformers
An embedding model is mandatory: whichever vector database you choose, you need a model to turn text into vectors. The database only stores and retrieves vectors; it never generates them.
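What the database then does with those vectors is essentially nearest-neighbour search. A minimal sketch with made-up 3-dimensional vectors (a real model such as all-MiniLM-L6-v2 produces 384-dimensional ones, and with `normalize_embeddings=True` cosine similarity reduces to a plain dot product):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": text chunks mapped to made-up embedding vectors
store = {
    "RAG architecture": [0.9, 0.2, 0.1],
    "cooking recipes":  [0.0, 0.2, 0.9],
    "vector search":    [0.7, 0.3, 0.1],
}

query_vec = [0.8, 0.2, 0.1]  # in reality produced by the embedding model
ranked = sorted(store, key=lambda t: cosine_similarity(store[t], query_vec),
                reverse=True)
print(ranked[0])  # "RAG architecture" ranks first
```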
RAG core pipeline
```python
# 1. Data preparation (indexing stage)
# Build a vector index with LangChain (langchain version: 1.2.13)
import os
import time
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings  # local embedding model, no API key needed
from langchain_chroma import Chroma
from openai import OpenAI

# Use the Hugging Face mirror (faster inside mainland China)
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

start_time = time.time()

# Configuration
PERSIST_DIR = "./chroma_db"
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"

# Check whether a persisted vector database already exists
if os.path.exists(PERSIST_DIR) and os.listdir(PERSIST_DIR):
    print("Existing vector database detected, loading...")
    # Load the embedding model
    embeddings = HuggingFaceEmbeddings(
        model_name=MODEL_NAME,  # downloads and loads the model
        # model_kwargs={"device": "cpu"},
        model_kwargs={"device": "cuda"},
        encode_kwargs={"normalize_embeddings": True},
    )
    # Load the persisted vector store
    vectorstore = Chroma(
        embedding_function=embeddings,
        persist_directory=PERSIST_DIR,
    )
    print(f"Vector database loaded in {time.time() - start_time:.2f}s")
else:
    print("First run, creating the vector database...")
    # Load documents
    loader = TextLoader("data.txt", encoding="utf-8")
    documents = loader.load()
    # Split the text. chunk_size: maximum length of each chunk;
    # chunk_overlap: overlap between chunks, keeps context coherent,
    # usually set to 10%-20% of chunk_size
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(documents)
    # Load the embedding model
    embeddings = HuggingFaceEmbeddings(
        model_name=MODEL_NAME,
        # model_kwargs={"device": "cpu"},  # use CPU
        model_kwargs={"device": "cuda"},  # use GPU
        encode_kwargs={"normalize_embeddings": True},
    )
    # Create the persisted vector store
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,  # vectors must be generated by the model
        persist_directory=PERSIST_DIR,
    )
    # Recent versions of Chroma persist automatically
    print(f"Vector database created in {time.time() - start_time:.2f}s")

# 2. Retrieval augmentation (query stage)
# Retrieve relevant documents
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3},  # return the 3 most relevant documents
)

# Retrieve
query = "What is RAG?"
relevant_docs = retriever.invoke(query)

# Build the augmented prompt
context = "\n\n".join(doc.page_content for doc in relevant_docs)
prompt = f"""Answer the question based on the context below.

Context:
{context}

Question: {query}

Answer:"""

# 3. Generate the answer
client = OpenAI(
    api_key=os.environ.get('DEEPSEEK_API_KEY'),
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": prompt},
    ],
    temperature=0,
    stream=False,
)
print(response.choices[0].message.content)

end_time = time.time()
print(f"Total time: {end_time - start_time:.2f}s")
```
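The chunk_size / chunk_overlap behaviour in the indexing step can be illustrated with a hand-rolled character-window splitter (a simplification: `RecursiveCharacterTextSplitter` actually tries separators such as paragraphs and sentences first, and `sliding_window_split` here is an illustrative helper, not a LangChain API):

```python
def sliding_window_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive splitter: each chunk repeats the last chunk_overlap
    characters of the previous one, so context spans chunk borders."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghij" * 5  # 50 characters
chunks = sliding_window_split(text, chunk_size=20, chunk_overlap=5)
print(len(chunks))                      # 4
print(chunks[0][-5:] == chunks[1][:5])  # True: the overlap is shared
```

With a 10%-20% overlap, a sentence cut in half at a chunk border still appears whole in the neighbouring chunk, which is why retrieval quality usually improves with a small overlap.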
Required Python dependencies
```shell
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiosignal==1.4.0
annotated-doc==0.0.4
annotated-types==0.7.0
anyio==4.13.0
attrs==26.1.0
bcrypt==5.0.0
build==1.4.1
certifi==2026.2.25
charset-normalizer==3.4.6
chromadb==1.5.5
click==8.3.1
colorama==0.4.6
dataclasses-json==0.6.7
distro==1.9.0
durationpy==0.10
filelock==3.25.2
flatbuffers==25.12.19
frozenlist==1.8.0
fsspec==2026.2.0
googleapis-common-protos==1.73.0
greenlet==3.3.2
grpcio==1.78.0
h11==0.16.0
hf-xet==1.4.2
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
httpx-sse==0.4.3
huggingface_hub==1.7.2
idna==3.11
importlib_metadata==8.7.1
importlib_resources==6.5.2
Jinja2==3.1.6
jiter==0.13.0
joblib==1.5.3
jsonpatch==1.33
jsonpointer==3.1.1
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
kubernetes==35.0.0
langchain==1.2.13
langchain-chroma==1.1.0
langchain-classic==1.0.3
langchain-community==0.4.1
langchain-core==1.2.22
langchain-huggingface==1.2.1
langchain-openai==1.1.12
langchain-text-splitters==1.1.1
langgraph==1.1.3
langgraph-checkpoint==4.0.1
langgraph-prebuilt==1.0.8
langgraph-sdk==0.3.12
langsmith==0.7.22
markdown-it-py==4.0.0
MarkupSafe==3.0.3
marshmallow==3.26.2
mdurl==0.1.2
mmh3==5.2.1
mpmath==1.3.0
multidict==6.7.1
mypy_extensions==1.1.0
networkx==3.6.1
numpy==2.4.3
oauthlib==3.3.1
onnxruntime==1.24.4
openai==2.29.0
opentelemetry-api==1.40.0
opentelemetry-exporter-otlp-proto-common==1.40.0
opentelemetry-exporter-otlp-proto-grpc==1.40.0
opentelemetry-proto==1.40.0
opentelemetry-sdk==1.40.0
opentelemetry-semantic-conventions==0.61b0
orjson==3.11.7
ormsgpack==1.12.2
overrides==7.7.0
packaging==26.0
pillow==12.1.1
propcache==0.4.1
protobuf==6.33.6
pybase64==1.4.3
pydantic==2.12.5
pydantic-settings==2.13.1
pydantic_core==2.41.5
Pygments==2.19.2
PyPika==0.51.1
pyproject_hooks==1.2.0
python-dateutil==2.9.0.post0
python-dotenv==1.2.2
PyYAML==6.0.3
referencing==0.37.0
regex==2026.2.28
requests==2.32.5
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rich==14.3.3
rpds-py==0.30.0
safetensors==0.7.0
scikit-learn==1.8.0
scipy==1.17.1
sentence-transformers==5.3.0
setuptools==81.0.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
SQLAlchemy==2.0.48
sympy==1.13.1
tenacity==9.1.4
threadpoolctl==3.6.0
tiktoken==0.12.0
tokenizers==0.22.2
torch==2.5.1+cu121
torchaudio==2.5.1+cu121
torchvision==0.20.1+cu121
tqdm==4.67.3
transformers==5.3.0
typer==0.24.1
typing-inspect==0.9.0
typing-inspection==0.4.2
typing_extensions==4.15.0
urllib3==2.6.3
uuid_utils==0.14.1
uvicorn==0.42.0
watchfiles==1.1.1
websocket-client==1.9.0
websockets==16.0
xxhash==3.6.0
yarl==1.23.0
zipp==3.23.0
zstandard==0.25.0
```