LlamaIndex: Retrieval with a Vector Database, via loaded_query_engine.query("...")

The earlier article "LlamaIndex: Retrieving from Documents" used the following code to build an index from documents and persist it to local files:

# Build an index from the documents
from llama_index.core import VectorStoreIndex
A_index = VectorStoreIndex.from_documents(A_docs)
# Persist the index (save it to local files)
from llama_index.core import StorageContext
A_index.storage_context.persist(persist_dir="./storage/A")

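That code does not use a real vector database. It relies on LlamaIndex's simple in-memory vector store (SimpleVectorStore), and persist merely dumps the in-memory index to local files, which is rarely adequate for production workloads.

This article reworks it to use a vector database. Taking Chroma, a lightweight local vector database, as the example, it walks through the full integration and explains the key changes.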

I. Complete Example: Connecting to the Chroma Vector Database (runs locally, no separate service)

1. Prerequisites: install dependencies

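# Core dependencies: LlamaIndex + the Chroma vector store integration
# The llama-index bundle also installs the default OpenAI embedding and LLM integrations,
# which the example below relies on; it expects OPENAI_API_KEY to be set in the environment.
pip install llama-index llama-index-vector-stores-chroma chromadb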

2. Complete code (Chroma integration)

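# ==============================================
# 1. Imports (the Chroma-related ones are new)
# ==============================================
from llama_index.core import VectorStoreIndex, Document, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore  # new: Chroma vector store
import chromadb  # new: Chroma client

# ==============================================
# 2. Build sample documents (same as in the earlier article)
# ==============================================
# The sample data is kept in Chinese. Metadata keys: 玫瑰 = rose, 百合 = lily, 定价规则 = pricing rules.
A_docs = [
    Document(
        # "Roses have a base price of 5 yuan each; VIP customers pay a 10% markup"
        text="玫瑰的基础单价是5元/朵，VIP客户加价10%",
        metadata={"flower_type": "玫瑰", "source": "定价规则"}
    ),
    Document(
        # "Lilies have a base price of 8 yuan each; all customers pay a 15% markup"
        text="百合的基础单价是8元/朵，所有客户加价15%",
        metadata={"flower_type": "百合", "source": "定价规则"}
    )
]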

# ==============================================
# 3. Initialize the Chroma vector database
# ==============================================
# Step 1: create a Chroma client with on-disk persistence
# path: the directory where Chroma stores its data (replaces ./storage/A from the earlier code)
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Step 2: create or fetch a Chroma collection (similar to a database table; holds a set of related vectors)
chroma_collection = chroma_client.get_or_create_collection(name="flower_rules")

# Step 3: wrap the Chroma collection as a VectorStore that LlamaIndex understands
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Step 4: create a storage context bound to the Chroma vector store (replaces the default in-memory store)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# ==============================================
# 4. Build the index on top of Chroma (the index data is stored in Chroma)
# ==============================================
# Compared with the original code: the new storage_context argument selects Chroma as the store
A_index = VectorStoreIndex.from_documents(
    A_docs,
    storage_context=storage_context,  # key change: write the index into Chroma
    show_progress=True  # show progress while vectors are written (useful for large datasets)
)

# ==============================================
# 5. Retrieval test (same usage as the original code; the search now runs in Chroma)
# ==============================================
query_engine = A_index.as_query_engine(similarity_top_k=2)
response = query_engine.query("玫瑰VIP客户的单价是多少？")  # "What is the unit price of roses for VIP customers?"
print("Generated answer:", response.response)
print("Matched source text:", [node.node.text for node in response.source_nodes])

# ==============================================
# 6. Load the index (verifies persistence: still readable after a restart)
# ==============================================
def load_chroma_index():
    # Connect to the Chroma client
    chroma_client = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = chroma_client.get_collection(name="flower_rules")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    # Load the index from Chroma
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
    return index

# Test loading the index
loaded_index = load_chroma_index()
loaded_query_engine = loaded_index.as_query_engine()
# This single call involves three components:
#   1. computing the query vector depends on the embedding model;
#   2. finding the matching passages from that vector uses the vector database;
#   3. generating the answer depends on the large language model (LLM).
response2 = loaded_query_engine.query("百合的加价比例是多少？")  # "What is the markup for lilies?"
print("\nAnswer after reloading the index:", response2.response)
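
The three stages named in the comment on the query call can be imitated without any external service. The sketch below is only an illustration under loud assumptions: `toy_embed` (character-bigram counts) stands in for a real embedding model, `retrieve` (brute-force cosine ranking) stands in for the vector database, and `build_prompt` only assembles the text an LLM would receive. None of these names come from LlamaIndex or Chroma.

```python
import math
from collections import Counter
from typing import List

def toy_embed(text: str) -> Counter:
    """Stage 1 stand-in: character-bigram counts instead of a learned embedding."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: List[str], top_k: int = 1) -> List[str]:
    """Stage 2 stand-in: rank every document by similarity to the query."""
    q = toy_embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, context: List[str]) -> str:
    """Stage 3 input: the text an LLM would be asked to complete (no LLM is called)."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"

docs = [
    "玫瑰的基础单价是5元/朵，VIP客户加价10%",
    "百合的基础单价是8元/朵，所有客户加价15%",
]
query = "百合的加价比例是多少？"
hits = retrieve(query, docs)
prompt = build_prompt(query, hits)
print(prompt)
```

A real embedding model captures meaning rather than shared characters, and a vector database replaces the brute-force loop with an index, but the division of labour is the same.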

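3. Key code explained (what changed relative to the original)

Original code	Change for Chroma	Purpose
No Chroma client	`chromadb.PersistentClient(path="./chroma_db")`	Creates a locally persisted Chroma client; the data lives in the `./chroma_db` directory
No notion of a collection	`chroma_client.get_or_create_collection(name="flower_rules")`	Creates a Chroma collection (similar to a database table) that isolates the vectors of different workloads
No custom VectorStore	`ChromaVectorStore(chroma_collection=...)`	Wraps the Chroma collection as a LlamaIndex-compatible VectorStore
Default StorageContext	`StorageContext.from_defaults(vector_store=vector_store)`	Makes Chroma the index's storage backend instead of the default in-memory store
`from_documents(A_docs)`	`from_documents(A_docs, storage_context=...)`	Writes the document vectors to Chroma instead of memory
`storage_context.persist()`	Chroma persists automatically (no manual persist)	Chroma writes the vectors to `./chroma_db`, so they can be loaded directly after a restart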

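II. Extension: Connecting to Pinecone, a Cloud Vector Database (common in production)

For a cloud deployment (no local storage to manage, support for large datasets), take Pinecone as an example:

1. Install dependencies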

# The integration package pulls in a compatible Pinecone client as a dependency
pip install llama-index-vector-stores-pinecone

2. Core code (key snippet)

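from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.pinecone import PineconeVectorStore
from pinecone import Pinecone  # client v3+; the older pinecone.init(...) entry point was removed

# Initialize the Pinecone client (get an API key from the Pinecone console first)
# The v3+ client no longer takes an environment argument here
pc = Pinecone(api_key="your Pinecone API key")

# Fetch an existing Pinecone index (create it in the Pinecone console first;
# its dimension must match the embedding model)
pinecone_index = pc.Index("flower-rules-index")

# Wrap it as a LlamaIndex VectorStore
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Create the storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index (the data is written to Pinecone in the cloud); A_docs is the list defined earlier
A_index = VectorStoreIndex.from_documents(A_docs, storage_context=storage_context)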
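
The requirement that the index dimension match the embedding model can be checked before any data is written. The following is a minimal sketch with no Pinecone calls: `check_dimension` is a hypothetical helper, and `sample_embedding` is a stand-in for a real model output. The length 1536 is what OpenAI's text-embedding-ada-002 produces, while 512 is the size this article reports for BGE-small-zh-v1.5.

```python
from typing import Sequence

def check_dimension(vector: Sequence[float], index_dimension: int) -> None:
    """Raise early if an embedding does not fit the index it is headed for."""
    if len(vector) != index_dimension:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {index_dimension}"
        )

# Hypothetical stand-in for the output of an embedding model.
sample_embedding = [0.0] * 1536

check_dimension(sample_embedding, 1536)      # matching index: passes silently

try:
    check_dimension(sample_embedding, 512)   # index sized for a different model
except ValueError as exc:
    mismatch_message = str(exc)
    print(mismatch_message)
```

Running a check like this on one sample embedding when the program starts tends to surface a model/index mismatch with a clearer message than a failed upsert would.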

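III. Comparison of Common Vector Databases (for selection)

Vector database	Deployment	Characteristics	Typical use
Chroma	Local / lightweight	No separate service needed, easy to get started, supports local persistence	Development and testing, small private knowledge bases
FAISS	Local (open-sourced by Facebook)	Very fast search, handles large vector sets	High-performance local search with no network dependency
Pinecone	Cloud-hosted	No operations burden, automatic scaling, multi-region deployment	Production, large datasets, access from multiple services
Milvus	Local / cluster	Full-featured, supports multiple index types and partitions	Enterprise-scale knowledge bases, customized requirements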

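IV. Running the Knowledge Base Entirely Locally

For a programmer experimenting with a knowledge base on their own, a fully local deployment is worth considering, since it costs nothing to run.

Below is a complete LlamaIndex knowledge-base program that runs locally (Chroma vector database + a local BGE embedding model). It makes no OpenAI API calls and, once the embedding model has been downloaded and cached on first use, needs no network access. It covers the whole flow of document loading → index creation → persistence → loading → retrieval, and offers two retrieval modes (pure retrieval without an LLM, or answer generation with an optional local LLM):

1) Before you run

1. Install all dependencies (copy and run)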

# Core libraries: LlamaIndex core + local file readers + Chroma integration + local embedding model
pip install llama-index-core llama-index-readers-file llama-index-vector-stores-chroma chromadb llama-index-embeddings-huggingface
# Dependencies for running the local BGE embedding model
pip install transformers torch sentence-transformers

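2. Prepare test documents

Create a local folder (for example ./flower_docs) and put test documents in it (TXT, PDF, or MD). A sample document, saved as flower_pricing.txt and kept in Chinese to suit a Chinese embedding model: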

玫瑰定价规则：
1. 基础单价：5元/朵
2. 普通客户：加价20%（最终6元/朵）
3. VIP客户：加价10%（最终5.5元/朵）

百合定价规则：
1. 基础单价：8元/朵
2. 所有客户统一加价15%（最终9.2元/朵）
3. 批量购买（≥50朵）：额外减免5%

康乃馨定价规则：
1. 基础单价：3元/朵
2. 节日期间（情人节/母亲节）：加价50%
3. 非节日：无加价
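
The final prices quoted in the sample document can be verified with a few lines of arithmetic. This is a small sketch using exact decimal arithmetic. `final_price` is a hypothetical helper, and the bulk-lily figure rests on an assumption the document does not spell out: that the extra 5% reduction applies to the price after the 15% markup.

```python
from decimal import Decimal

def final_price(base: str, markup_pct: str, discount_pct: str = "0") -> Decimal:
    """Apply a percentage markup, then an optional percentage discount."""
    price = Decimal(base) * (1 + Decimal(markup_pct) / 100)
    price *= 1 - Decimal(discount_pct) / 100
    return price.quantize(Decimal("0.01"))

rose_regular = final_price("5", "20")        # regular customers: 20% markup
rose_vip = final_price("5", "10")            # VIP customers: 10% markup
lily = final_price("8", "15")                # all customers: 15% markup
# Assumption: the 5% bulk reduction is taken off the marked-up price.
lily_bulk = final_price("8", "15", discount_pct="5")
carnation_holiday = final_price("3", "50")   # holiday period: 50% markup

print(rose_regular, rose_vip, lily, lily_bulk, carnation_holiday)
```

The first three values agree with the 6, 5.5 and 9.2 yuan stated in the document. The last two are derived here rather than quoted from it.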

2) Complete program (with detailed comments)

# ==============================================
# 1. Imports (local-only dependencies, no OpenAI)
# ==============================================
import os
from typing import List
# LlamaIndex core (SimpleDirectoryReader is exported from here;
# llama-index-readers-file supplies the PDF and other file parsers it uses)
from llama_index.core import (
    Document,
    VectorStoreIndex,
    StorageContext,
    Settings,
    SimpleDirectoryReader,
)
# Chroma vector database integration
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
# Local BGE embedding model
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# ==============================================
# 2. Global settings (local BGE embedding model, no LLM dependency)
# ==============================================
# Chinese embedding model (BGE-small-zh-v1.5: lightweight, good for Chinese, runs locally)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-zh-v1.5",  # downloaded on first use, then served from the local cache
    embed_batch_size=8,                   # embed in batches for throughput
    device="cpu",                         # CPU only (works without a GPU)
    normalize=True                        # L2-normalize the vectors
)
# Disable the default LLM (pure retrieval first, no generated answer; a local LLM can be added later)
Settings.llm = None

# ==============================================
# 3. Constants (replace with your own paths)
# ==============================================
DOC_DIR = "./flower_docs"          # folder holding the test documents
CHROMA_DB_PATH = "./chroma_db"     # where Chroma persists its data
CHROMA_COLLECTION_NAME = "flower_pricing_rules"  # Chroma collection name (similar to a database table)

# ==============================================
# 4. Load local documents → list of Document objects
# ==============================================
def load_local_documents(doc_dir: str) -> List[Document]:
    """
    Load every document in the given folder as a list of LlamaIndex Document objects.
    """
    # Check that the folder exists
    if not os.path.exists(doc_dir):
        raise FileNotFoundError(f"Document folder {doc_dir} does not exist; create it and add test documents first!")
    
    # Set up the local file reader (supports TXT/PDF/MD and more)
    reader = SimpleDirectoryReader(
        input_dir=doc_dir,
        required_exts=[".txt", ".pdf", ".md"],  # load only these formats
        recursive=False,                        # do not descend into subfolders
        encoding="utf-8"                        # avoids garbled Chinese text
    )
    
    # Load the files into a list of Document objects
    docs = reader.load_data()
    print(f"✅ Loaded {len(docs)} Document(s) (one per file, or one per page for PDFs)")
    
    # Add custom metadata (helps trace results back to their source)
    for i, doc in enumerate(docs):
        doc.metadata["doc_id"] = f"doc_{i+1}"
        doc.metadata["file_type"] = os.path.splitext(doc.metadata["file_name"])[1]
    
    return docs

# ==============================================
# 5. Build the Chroma index → persist it locally
# ==============================================
def create_and_persist_chroma_index(docs: List[Document]):
    """
    Write the Documents into the Chroma vector database, creating and persisting the index.
    """
    # Step 1: initialize the Chroma client (local persistence)
    chroma_client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    
    # Step 2: create or fetch the Chroma collection (created if missing, reused if present)
    chroma_collection = chroma_client.get_or_create_collection(
        name=CHROMA_COLLECTION_NAME,
        metadata={"description": "Flower pricing rules knowledge base"}  # collection description (optional)
    )
    
    # Step 3: wrap the Chroma collection as a VectorStore that LlamaIndex understands
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    
    # Step 4: create a storage context bound to the Chroma vector store
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    
    # Step 5: build the vector index from the Documents (written into Chroma)
    index = VectorStoreIndex.from_documents(
        docs,
        storage_context=storage_context,
        show_progress=True  # show embedding progress (useful for large document sets)
    )
    
    print(f"✅ Index created and written to Chroma (collection: {CHROMA_COLLECTION_NAME})")
    return index

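# ==============================================
# 6. Load the persisted index from Chroma
# ==============================================
def load_chroma_index():
    """
    Load the saved index from the local Chroma database (no need to reprocess the documents).
    """
    # Step 1: reconnect the Chroma client
    chroma_client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    
    # Step 2: fetch the existing collection (fails if it does not exist)
    try:
        chroma_collection = chroma_client.get_collection(name=CHROMA_COLLECTION_NAME)
    except Exception as exc:
        # The exception type raised for a missing collection differs between chromadb versions,
        # so catch broadly and re-raise with a clearer message
        raise ValueError(f"Chroma collection {CHROMA_COLLECTION_NAME} could not be opened; run create_and_persist_chroma_index first to create the index!") from exc
    
    # Step 3: wrap it as a VectorStore and load the index
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
    
    print(f"✅ Index loaded from Chroma (collection: {CHROMA_COLLECTION_NAME})")
    return index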

# ==============================================
# 7. Retrieval function (two modes: pure retrieval / optional local-LLM answer)
# ==============================================
def retrieve_from_index(index, query_text: str, use_llm: bool = False):
    """
    Retrieve similar content from the index.
    :param index: a loaded VectorStoreIndex
    :param query_text: the user's query
    :param use_llm: whether an LLM should write a natural-language answer (default False: pure retrieval)
    """
    # Build the query engine (core settings)
    query_engine = index.as_query_engine(
        similarity_top_k=3,  # return the 3 most similar nodes
        response_mode="no_text" if not use_llm else "compact",  # without an LLM, return retrieval results only
        verbose=True         # log the retrieval process
    )
    
    # Run the query
    response = query_engine.query(query_text)
    
    # Print the results
    print("\n" + "="*80)
    print(f"🔍 Query: {query_text}")
    print("="*80)
    
    if not use_llm:
        # Mode 1: pure retrieval (no LLM); print the text of the matched nodes
        print("📌 Similar content found (local retrieval only, no LLM):")
        for i, node in enumerate(response.source_nodes):
            print(f"\n[Result {i+1}]")
            print(f"Similarity: {node.score:.4f} (higher is more similar)")
            print(f"Source document: {node.node.metadata.get('file_name', 'unknown')}")
            print(f"Content: {node.node.text.strip()}")
    else:
        # Mode 2: with an LLM; print the natural-language answer (a local LLM must be configured first)
        print("📌 Generated answer (local LLM):")
        print(response.response)
        print("\n📌 Answer sources:")
        for i, node in enumerate(response.source_nodes):
            print(f"  {i+1}. similarity {node.score:.4f} | content: {node.node.text[:100]}...")
    
    print("="*80 + "\n")

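# ==============================================
# 8. Main program (ties the whole flow together)
# ==============================================
if __name__ == "__main__":
    try:
        # Step 1: load the local documents
        docs = load_local_documents(DOC_DIR)
        
        # Step 2: create and persist the Chroma index (run on the first execution; comment out afterwards)
        index = create_and_persist_chroma_index(docs)
        
        # Step 3: load the index from Chroma (later runs can load directly instead of rebuilding)
        # index = load_chroma_index()
        
        # Step 4: test retrieval (local only, no LLM)
        # Queries: rose price for VIP customers / lily price when buying 50 in bulk / carnation price on Valentine's Day
        retrieve_from_index(index, "玫瑰VIP客户的单价是多少？", use_llm=False)
        retrieve_from_index(index, "百合批量购买50朵的单价是多少？", use_llm=False)
        retrieve_from_index(index, "康乃馨情人节的价格是多少？", use_llm=False)
        
    except Exception as e:
        print(f"❌ Program failed: {type(e).__name__} - {str(e)}")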

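3) Core flow explained (by code module)

Step	Code	Role	Local / network
Document loading	`load_local_documents`	Reads the documents in the local `flower_docs` folder into a list of `Document` objects	Local
Vector generation	`Settings.embed_model` (BGE)	Turns documents and queries into semantic vectors (512 dimensions)	Local
Index creation	`create_and_persist_chroma_index`	Writes the vectors into the local Chroma database (`./chroma_db`)	Local
Index loading	`load_chroma_index`	Reads the saved index from Chroma (no need to reprocess the documents)	Local
Retrieval	`retrieve_from_index`	Computes the query vector → matches similar vectors in Chroma → returns the results	Local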
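
The program L2-normalizes its embeddings, and the reason this suits similarity search can be shown with plain arithmetic. For unit-length vectors the dot product equals the cosine similarity, and the squared Euclidean distance equals 2 − 2·cos, so either metric ranks neighbours the same way. The sketch below uses made-up two-dimensional vectors. Which metric a given Chroma collection actually uses depends on how it is configured.

```python
import math
from typing import List

def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_l2(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up vectors standing in for a document embedding and a query embedding.
a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = l2_normalize(a), l2_normalize(b)

cos_ab = cosine(a, b)            # cosine similarity of the raw vectors
dot_unit = dot(ua, ub)           # dot product after normalization
dist_unit = squared_l2(ua, ub)   # squared distance after normalization

print(cos_ab, dot_unit, dist_unit)
```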

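4) Running and verifying

First run:
- Make sure the ./flower_docs folder contains test documents;
- Run the code as-is; the program loads the documents → creates the Chroma index → runs the retrieval tests;
- The run creates a ./chroma_db folder (Chroma's persisted data).
Subsequent runs:
- Comment out index = create_and_persist_chroma_index(docs);
- Uncomment index = load_chroma_index();
- Run the code; it loads the saved index directly without reprocessing the documents, so it starts faster. Leaving the create call enabled would add the same documents to the collection a second time.
Sample queries:
- Query: 玫瑰VIP客户的单价是多少？ (What is the unit price of roses for VIP customers?) → matches "VIP客户：加价10%（最终5.5元/朵）" (VIP customers: 10% markup, 5.5 yuan each);
- Query: 百合批量购买50朵的单价是多少？ (What is the unit price of lilies when buying 50 in bulk?) → matches "批量购买（≥50朵）：额外减免5%" (bulk purchase of 50 or more: extra 5% off);
- Query: 康乃馨情人节的价格是多少？ (What is the price of carnations on Valentine's Day?) → matches "节日期间（情人节/母亲节）：加价50%" (holiday period: 50% markup).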
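
Switching between the first-run and later-run paths by commenting lines in and out can also be automated. The sketch below is one possible approach, not part of the program above: `store_has_data` and `get_index` are hypothetical helpers, and treating a non-empty directory as an existing index is only a heuristic. The demo passes placeholder callables; in the real program they would be the create and load functions.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")

def store_has_data(db_path: str) -> bool:
    """Heuristic: a non-empty directory is taken to hold a persisted store."""
    return os.path.isdir(db_path) and bool(os.listdir(db_path))

def get_index(db_path: str, build: Callable[[], T], load: Callable[[], T]) -> T:
    """Load the index when a store already exists, otherwise build it."""
    return load() if store_has_data(db_path) else build()

# Demo with placeholder callables instead of real index functions.
with tempfile.TemporaryDirectory() as tmp:
    first = get_index(tmp, build=lambda: "built", load=lambda: "loaded")
    # Simulate a first run having written something into the directory.
    open(os.path.join(tmp, "placeholder.bin"), "w").close()
    second = get_index(tmp, build=lambda: "built", load=lambda: "loaded")

print(first, second)
```

In the main program this could be called as `get_index(CHROMA_DB_PATH, lambda: create_and_persist_chroma_index(load_local_documents(DOC_DIR)), load_chroma_index)`, so the documents are embedded only when no store is found.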

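5) Optional extension: generate natural-language answers with a local LLM

To get fluent natural-language answers (rather than only the retrieved source text), add a local LLM to the global settings section (Llama 3 as the example):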

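# New: configure a local LLM (download a Llama 3 GGUF model file first)
# Requires: pip install llama-index-llms-llama-cpp
from llama_index.llms.llama_cpp import LlamaCPP
Settings.llm = LlamaCPP(
    model_path="./llama-3-8b-instruct.Q4_K_M.gguf",  # path to the local model file
    temperature=0.1,          # sampling randomness (lower is more deterministic)
    max_new_tokens=300,       # maximum number of generated tokens
    context_window=3900,      # context window size, in tokens
    model_kwargs={"n_threads": 4}  # CPU threads used for inference
)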

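Then call the retrieval function with use_llm=True to get a natural-language answer (still entirely local).

In summary, this program is a complete knowledge-base setup that runs locally and needs no network access once the model files are on disk:

- It replaces LlamaIndex's default in-memory store with Chroma, which supports larger document sets and persistence;
- It uses a BGE embedding model for the text → vector step, running locally throughout;
- It covers the whole flow of document loading → index creation → persistence → loading → retrieval, and can be adapted directly to your own documents;
- It supports two retrieval modes: pure retrieval (no LLM) and answer generation with an optional local LLM.

Point DOC_DIR at your own document folder and you have a local knowledge base of your own.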
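
The context_window and max_new_tokens settings together bound how much retrieved text can accompany a question: whatever the answer and the prompt template do not use is left for context. The sketch below illustrates that budget with a hypothetical helper, `fit_to_budget`. It counts one token per character, which is only a crude assumption, and the 200-token allowance for the prompt template is likewise made up. LlamaIndex does its own packing internally, so this is purely explanatory.

```python
from typing import List

def fit_to_budget(
    chunks: List[str],
    context_window: int,
    max_new_tokens: int,
    prompt_overhead: int = 200,  # assumed allowance for the prompt template
) -> List[str]:
    """Keep the highest-ranked chunks that fit into the remaining token budget."""
    budget = context_window - max_new_tokens - prompt_overhead
    kept: List[str] = []
    used = 0
    for chunk in chunks:          # chunks are assumed to be sorted best-first
        cost = len(chunk)         # crude estimate: one token per character
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

# Five made-up chunks of 1000 characters each, ranked best-first.
chunks = ["x" * 1000 for _ in range(5)]
kept = fit_to_budget(chunks, context_window=3900, max_new_tokens=300)
print(len(kept))
```

With the values used in this article, 3900 − 300 − 200 leaves a budget of 3400, so only the first three of the five chunks fit.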