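Introduction
💡 Pain points: Traditional databases cannot do semantic search? Similarity queries are slow? AI applications have no memory?
🎯 Solution: a vector database, which stores high-dimensional vectors and supports approximate nearest neighbor (ANN) search, giving AI applications a form of "semantic understanding".
Vector database capability matrix: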
[Diagram: Input (text, images, audio, video); Vectorization (Embedding Model); Vector database (Milvus, Pinecone, Weaviate, Qdrant, Chroma); Applications (semantic search, recommendation, deduplication, RAG)]
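Traditional databases vs. vector databases:
| Capability | Traditional database (MySQL/PG) | Vector database |
|---|---|---|
| Query style | Exact match / range queries | Similarity search (cosine / Euclidean distance) |
| Semantic understanding | ❌ None | ✅ Yes (via embeddings) |
| High-dimensional data | ⚠️ Poor support (curse of dimensionality) | ✅ Native support (1024+ dimensions) |
| Similarity search speed | 🐢 Brute-force scan, O(n) per query without a vector index | ⚡ ANN index, sub-linear (roughly O(log n) for HNSW), with approximate results |
| AI integration | ⚠️ Needs extra processing | ✅ Built around embedding vectors |
| Typical use | Structured data | Unstructured data (text / images / audio) |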
1. Vectors and Embedding Basics
1.1 Vector Basics
```python
# ===== Vector basics =====
import numpy as np
from typing import List, Tuple


class VectorUtils:
    """Vector utility helpers."""

    @staticmethod
    def cosine_similarity(v1: List[float], v2: List[float]) -> float:
        """Cosine similarity (0.0 if either vector has zero norm)."""
        v1 = np.array(v1)
        v2 = np.array(v2)
        dot_product = np.dot(v1, v2)
        norm1 = np.linalg.norm(v1)
        norm2 = np.linalg.norm(v2)
        if norm1 == 0 or norm2 == 0:
            return 0.0
        return float(dot_product / (norm1 * norm2))

    @staticmethod
    def euclidean_distance(v1: List[float], v2: List[float]) -> float:
        """Euclidean distance."""
        v1 = np.array(v1)
        v2 = np.array(v2)
        return float(np.linalg.norm(v1 - v2))

    @staticmethod
    def dot_product(v1: List[float], v2: List[float]) -> float:
        """Dot-product similarity."""
        return float(np.dot(np.array(v1), np.array(v2)))

    @staticmethod
    def normalize(v: List[float]) -> List[float]:
        """L2-normalize a vector (zero vectors are returned unchanged)."""
        v = np.array(v)
        norm = np.linalg.norm(v)
        if norm == 0:
            return v.tolist()
        return (v / norm).tolist()

    @staticmethod
    def top_k(
        query: List[float],
        vectors: List[List[float]],
        k: int = 5,
        metric: str = "cosine"
    ) -> List[Tuple[int, float]]:
        """Top-K similarity search (brute force)."""
        scores = []
        for i, v in enumerate(vectors):
            if metric == "cosine":
                score = VectorUtils.cosine_similarity(query, v)
            elif metric == "euclidean":
                # Convert the distance into a similarity in (0, 1]
                dist = VectorUtils.euclidean_distance(query, v)
                score = 1 / (1 + dist)
            elif metric == "dot":
                score = VectorUtils.dot_product(query, v)
            else:
                raise ValueError(f"Unsupported metric: {metric}")
            scores.append((i, score))
        # Sort in descending order and keep the top K
        scores.sort(key=lambda x: x[1], reverse=True)
        return scores[:k]


# Usage example
if __name__ == "__main__":
    # Simulated vectors for 3 documents (real ones come from an embedding model)
    vec_A = [0.1, 0.3, 0.5, 0.7, 0.9]
    vec_B = [0.2, 0.4, 0.6, 0.8, 1.0]  # similar to A
    vec_C = [-0.1, -0.3, -0.5, -0.7, -0.9]  # opposite of A
    vectors = [vec_A, vec_B, vec_C]
    query = [0.15, 0.35, 0.55, 0.75, 0.95]  # midway between A and B
    top_k = VectorUtils.top_k(query, vectors, k=2, metric="cosine")
    print(f"Top-2 results: {top_k}")
    # Prints (index, similarity) pairs, approximately [(1, 0.9994), (0, 0.9992)]:
    # vec_B ranks slightly ahead of vec_A.
```
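The three metrics above are closely related once vectors are L2-normalized: the dot product equals the cosine, and the squared Euclidean distance is 2 − 2·cos, so all three give the same ranking. The following is a minimal stdlib-only check of that identity; the helper names `l2_normalize`, `dot`, and `euclidean` are illustrative and not part of the original code.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = l2_normalize([0.15, 0.35, 0.55, 0.75, 0.95])
docs = [
    l2_normalize([0.1, 0.3, 0.5, 0.7, 0.9]),
    l2_normalize([0.2, 0.4, 0.6, 0.8, 1.0]),
    l2_normalize([-0.1, -0.3, -0.5, -0.7, -0.9]),
]

for d in docs:
    cos = dot(query, d)  # for unit vectors the dot product is the cosine
    dist_sq = euclidean(query, d) ** 2
    # Identity for unit vectors: ||q - d||^2 = 2 - 2 * cos(q, d)
    assert abs(dist_sq - (2 - 2 * cos)) < 1e-9

# Ranking by descending cosine and by ascending distance agree
by_cosine = sorted(range(len(docs)), key=lambda i: dot(query, docs[i]), reverse=True)
by_distance = sorted(range(len(docs)), key=lambda i: euclidean(query, docs[i]))
print(by_cosine, by_distance)
```

In practice this means that if embeddings are normalized at write time, the choice between cosine, dot product, and Euclidean distance mostly affects speed rather than which results come back.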
1.2 Embedding Models
python
# ===== Embedding generation =====
# Reuses VectorUtils from section 1.1.
from typing import List, Tuple
from openai import OpenAI
from sentence_transformers import SentenceTransformer
import numpy as np
class EmbeddingGenerator:
"""Embedding 生成器"""
def __init__(self, model_type: str = "openai", model_name: str = None, api_key: str = None):
self.model_type = model_type
self.model_name = model_name
if model_type == "openai":
self.client = OpenAI(api_key=api_key)
            self.model_name = model_name or "text-embedding-3-small"
            # 1536 is the default output size of text-embedding-3-small;
            # adjust this if you pass a different model_name.
            self.dimension = 1536
elif model_type == "sentence_transformer":
self.model = SentenceTransformer(model_name or "all-MiniLM-L6-v2")
self.dimension = self.model.get_sentence_embedding_dimension()
else:
raise ValueError(f"不支持的模型类型: {model_type}")
def encode(self, texts: List[str]) -> np.ndarray:
"""编码文本为向量"""
if self.model_type == "openai":
return self._encode_openai(texts)
elif self.model_type == "sentence_transformer":
return self._encode_sentence_transformer(texts)
return np.array([])
def encode_single(self, text: str) -> List[float]:
"""编码单条文本"""
vectors = self.encode([text])
return vectors[0].tolist() if len(vectors) > 0 else []
def _encode_openai(self, texts: List[str]) -> np.ndarray:
"""使用 OpenAI API 编码"""
response = self.client.embeddings.create(
model=self.model_name,
input=texts
)
vectors = [item.embedding for item in response.data]
return np.array(vectors)
def _encode_sentence_transformer(self, texts: List[str]) -> np.ndarray:
"""使用 SentenceTransformer 编码"""
vectors = self.model.encode(texts, convert_to_numpy=True)
return vectors
def similarity(self, text1: str, text2: str) -> float:
"""计算两段文本的相似度"""
vec1 = self.encode_single(text1)
vec2 = self.encode_single(text2)
return VectorUtils.cosine_similarity(vec1, vec2)
def search(
self,
query: str,
documents: List[str],
top_k: int = 5
) -> List[Tuple[int, float, str]]:
"""语义搜索"""
# 编码查询
query_vec = self.encode_single(query)
# 编码文档
doc_vecs = self.encode(documents)
# 计算相似度
results = []
for i, doc_vec in enumerate(doc_vecs):
sim = VectorUtils.cosine_similarity(query_vec, doc_vec.tolist())
results.append((i, sim, documents[i]))
# 排序
results.sort(key=lambda x: x[1], reverse=True)
return results[:top_k]
# Using OpenAI embeddings
# generator = EmbeddingGenerator(model_type="openai", api_key="your-api-key")
# Using a local SentenceTransformer (recommended for production)
generator = EmbeddingGenerator(model_type="sentence_transformer")
# Example: semantic search
documents = [
    "How do I connect to a MySQL database with Python?",
    "What is the difference between PostgreSQL and MySQL?",
    "What caching strategies does Redis offer?",
    "How do I deploy an application with Docker?",
    "A tutorial on the Milvus vector database",
]
query = "How do I connect to a database?"
results = generator.search(query, documents, top_k=3)
for idx, score, doc in results:
    print(f" [{score:.4f}] {doc}")
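2. Comparing the Main Vector Databases
2.1 Selection Guide
| Database | Open source | Deployment | Index types | Highlights | Best for |
|---|---|---|---|---|---|
| Milvus | ✅ | Self-hosted / cloud | HNSW / IVF / DiskANN | Distributed, hybrid search | Large-scale production |
| Pinecone | ❌ | Cloud | Proprietary | Fully managed, serverless | Rapid prototyping, no ops |
| Weaviate | ✅ | Self-hosted / cloud | HNSW | Built-in LLM modules, multimodal | Knowledge graph + vectors |
| Qdrant | ✅ | Self-hosted / cloud | HNSW | Written in Rust, strong filtering | High performance + filtering |
| Chroma | ✅ | Embedded / server | HNSW | Embedded, lightweight | Local development, testing |
| pgvector | ✅ | PostgreSQL extension | HNSW / IVFFlat | No extra component needed | PostgreSQL users |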
2.2 Performance Comparison (rough reference)
| Database | Write speed | Query speed (QPS) | 100M+ vectors | Filtering |
|---|---|---|---|---|
| Milvus | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ✅ | ⭐⭐⭐⭐ |
| Pinecone | ⭐⭐⭐ | ⭐⭐⭐⭐ | ✅ | ⭐⭐⭐ |
| Weaviate | ⭐⭐⭐ | ⭐⭐⭐⭐ | ✅ | ⭐⭐⭐⭐ |
| Qdrant | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ✅ | ⭐⭐⭐⭐⭐ |
| Chroma | ⭐⭐⭐ | ⭐⭐⭐ | ⚠️ | ⭐⭐ |
| pgvector | ⭐⭐ | ⭐⭐ | ⚠️ | ⭐⭐⭐⭐⭐ |
3. Milvus in Practice
3.1 Installation and Connection
python
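# ===== Milvus installation and connection =====
# Docker install (recommended). The command below is abbreviated: a standalone
# container typically also needs a start command and etcd/storage settings,
# so prefer the standalone script or docker-compose file from the Milvus docs.
# docker run -d --name milvus \
# -p 19530:19530 \
# -p 9091:9091 \
# milvusdb/milvus:v2.4.0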
from pymilvus import (
connections,
FieldSchema, CollectionSchema, Collection,
utility,
DataType,
AnnSearchRequest,
WeightedRanker
)
from typing import List, Dict, Any
class MilvusManager:
"""Milvus 管理器"""
def __init__(self, host: str = "localhost", port: str = "19530"):
self.host = host
self.port = port
self._connect()
def _connect(self):
"""连接 Milvus"""
connections.connect(
alias="default",
host=self.host,
port=self.port
)
print(f"✅ 已连接到 Milvus: {self.host}:{self.port}")
def create_collection(
self,
name: str,
dim: int,
metric_type: str = "COSINE",
overwrite: bool = False
) -> Collection:
"""创建 Collection"""
if utility.has_collection(name):
if overwrite:
utility.drop_collection(name)
print(f"🗑️ 已删除现有 Collection: {name}")
else:
print(f"⚠️ Collection 已存在: {name}")
return Collection(name)
# 定义 Schema
fields = [
FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim),
FieldSchema(name="document", dtype=DataType.VARCHAR, max_length=4096),
FieldSchema(name="metadata", dtype=DataType.JSON),
FieldSchema(name="timestamp", dtype=DataType.INT64),
]
schema = CollectionSchema(
fields=fields,
description=f"Collection for {name}"
)
# 创建 Collection
collection = Collection(
name=name,
schema=schema,
consistency_level="Strong"
)
# 创建索引
self._create_index(collection, metric_type)
print(f"✅ 已创建 Collection: {name} (维度: {dim}, 度量: {metric_type})")
return collection
def _create_index(self, collection: Collection, metric_type: str = "COSINE"):
"""创建向量索引"""
index_params = {
            "index_type": "HNSW",  # alternatives: IVF_FLAT, IVF_SQ8, IVF_PQ (ANNOY is no longer available in Milvus 2.3+)
"metric_type": metric_type,
"params": {"M": 16, "efConstruction": 256}
}
collection.create_index(
field_name="embedding",
index_params=index_params
)
print(f"✅ 已创建 HNSW 索引 (M=16, efConstruction=256)")
def insert(
self,
collection_name: str,
embeddings: List[List[float]],
documents: List[str],
metadata: List[Dict] = None,
timestamps: List[int] = None
) -> List[int]:
"""插入数据"""
collection = Collection(collection_name)
# 准备数据
data = [embeddings]
# document 字段
data.append(documents)
# metadata 字段
if metadata is None:
metadata = [{} for _ in range(len(embeddings))]
data.append(metadata)
# timestamp 字段
if timestamps is None:
import time
timestamps = [int(time.time()) for _ in range(len(embeddings))]
data.append(timestamps)
# 插入
mr = collection.insert(data)
collection.flush()
print(f"✅ 已插入 {len(embeddings)} 条数据")
return mr.primary_keys
def search(
self,
collection_name: str,
query_embeddings: List[List[float]],
top_k: int = 5,
expr: str = None,
output_fields: List[str] = None
) -> List[List[Dict]]:
"""向量搜索"""
collection = Collection(collection_name)
collection.load()
        # For HNSW, Milvus requires ef >= limit, so scale ef with top_k
        search_params = {"metric_type": "COSINE", "params": {"ef": max(64, top_k)}}
results = collection.search(
data=query_embeddings,
anns_field="embedding",
param=search_params,
limit=top_k,
expr=expr,
output_fields=output_fields or ["document", "metadata"]
)
# 格式化结果
formatted = []
for hits in results:
hits_list = []
for hit in hits:
hits_list.append({
"id": hit.id,
"distance": hit.distance,
"document": hit.entity.get("document"),
"metadata": hit.entity.get("metadata"),
})
formatted.append(hits_list)
return formatted
def delete(self, collection_name: str, ids: List[int] = None, expr: str = None):
"""删除数据"""
collection = Collection(collection_name)
if ids:
collection.delete(f"id in {ids}")
print(f"✅ 已删除 ID: {ids}")
elif expr:
collection.delete(expr)
print(f"✅ 已删除符合条件的数据: {expr}")
def drop_collection(self, name: str):
"""删除 Collection"""
if utility.has_collection(name):
utility.drop_collection(name)
print(f"✅ 已删除 Collection: {name}")
def list_collections(self) -> List[str]:
"""列出所有 Collection"""
return utility.list_collections()
# Usage
manager = MilvusManager(host="localhost", port="19530")
# Create the collection
collection = manager.create_collection(
    name="documents",
    dim=generator.dimension,  # 384 for all-MiniLM-L6-v2 (not 768); must match the embedding model
    metric_type="COSINE",
    overwrite=True
)
3.2 Hybrid Search (Vector + Scalar Filtering)
python
# ===== Milvus 混合搜索 =====
class HybridSearch:
"""混合搜索(向量 + 标量过滤)"""
def __init__(self, manager: MilvusManager):
self.manager = manager
def search_with_filter(
self,
collection_name: str,
query_embedding: List[float],
filter_expr: str,
top_k: int = 5
) -> List[Dict]:
"""
带过滤的向量搜索
filter_expr 示例:
- "metadata['category'] == 'tech'" # 类别过滤
- "timestamp > 1700000000" # 时间过滤
- "metadata['score'] > 0.8" # 分数过滤
"""
return self.manager.search(
collection_name=collection_name,
query_embeddings=[query_embedding],
top_k=top_k,
expr=filter_expr,
output_fields=["document", "metadata", "timestamp"]
)[0]
def multi_vector_search(
self,
collection_name: str,
query_embeddings: List[List[float]],
weights: List[float] = None,
top_k: int = 5
) -> List[Dict]:
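        """
        Multi-vector search (hybrid search).
        Intended for documents that carry several embeddings (for example a title vector
        and a content vector), each stored in its own vector field. The schema in section 3.1
        defines a single vector field ("embedding"), so every request below targets that field;
        with separate fields, set anns_field per request.
        """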
collection = Collection(collection_name)
collection.load()
if weights is None:
weights = [1.0 / len(query_embeddings)] * len(query_embeddings)
# 构建多个 AnnSearchRequest
requests = []
for i, emb in enumerate(query_embeddings):
req = AnnSearchRequest(
data=[emb],
anns_field="embedding",
param={"metric_type": "COSINE", "params": {"ef": 64}},
limit=top_k * 2 # 取更多候选
)
requests.append(req)
# 加权排序
ranker = WeightedRanker(*weights)
results = collection.hybrid_search(
reqs=requests,
rerank=ranker,
limit=top_k,
output_fields=["document", "metadata"]
)
formatted = []
for hit in results[0]:
formatted.append({
"id": hit.id,
"distance": hit.distance,
"document": hit.entity.get("document"),
"metadata": hit.entity.get("metadata"),
})
return formatted
# Usage
hybrid = HybridSearch(manager)
query_vec = generator.encode_single("How do I connect to a database?")
# Example 1: filtered search (Milvus boolean operators are written in lowercase)
results = hybrid.search_with_filter(
    collection_name="documents",
    query_embedding=query_vec,
    filter_expr="metadata['category'] == 'database' and timestamp > 1700000000",
    top_k=5
)
# Example 2: multi-vector search
title_vec = generator.encode_single("text of the article title")
content_vec = generator.encode_single("text of the article body")
results = hybrid.multi_vector_search(
    collection_name="documents",
    query_embeddings=[title_vec, content_vec],
    weights=[0.3, 0.7],  # weight the content more heavily
    top_k=5
)
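To make the weighting in example 2 concrete, here is a conceptual sketch of weighted score fusion in plain Python. It assumes each request has already produced scores in a comparable range such as [0, 1]; Milvus's `WeightedRanker` applies its own score normalization, so this illustrates the idea rather than reproducing its exact behavior. The function name `weighted_fuse` and the sample scores are hypothetical.

```python
from typing import Dict, List, Tuple


def weighted_fuse(
    result_lists: List[Dict[int, float]],
    weights: List[float],
    top_k: int = 5,
) -> List[Tuple[int, float]]:
    """Combine per-request scores {doc_id: score} into one weighted ranking."""
    if len(result_lists) != len(weights):
        raise ValueError("need exactly one weight per result list")
    fused: Dict[int, float] = {}
    for scores, weight in zip(result_lists, weights):
        for doc_id, score in scores.items():
            # A document missing from a list simply contributes nothing for it
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical scores from a title-vector request and a content-vector request
title_hits = {1: 0.9, 2: 0.4}
content_hits = {2: 0.8, 3: 0.7}
ranking = weighted_fuse([title_hits, content_hits], weights=[0.3, 0.7], top_k=2)
print(ranking)
```

Document 2 wins because it appears in both lists, even though document 1 has the single highest title score; that is the practical effect of giving content a 0.7 weight.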
4. Qdrant in Practice
4.1 Qdrant Quick Start
python
# ===== Qdrant 实战 =====
from qdrant_client import QdrantClient
from qdrant_client.http import models
from typing import List, Dict, Optional
class QdrantManager:
"""Qdrant 管理器"""
def __init__(self, host: str = "localhost", port: int = 6333, api_key: str = None):
self.client = QdrantClient(host=host, port=port, api_key=api_key)
print(f"✅ 已连接到 Qdrant: {host}:{port}")
def create_collection(
self,
name: str,
dim: int,
distance: str = "Cosine",
overwrite: bool = False
):
"""创建 Collection"""
if self.client.collection_exists(name):
if overwrite:
self.client.delete_collection(name)
print(f"🗑️ 已删除现有 Collection: {name}")
else:
print(f"⚠️ Collection 已存在: {name}")
return
self.client.create_collection(
collection_name=name,
vectors_config=models.VectorParams(
size=dim,
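                # Look the enum up by value: "Cosine" -> Distance.COSINE
                # (member names are upper-case, so getattr(models.Distance, "Cosine") fails)
                distance=models.Distance(distance)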
)
)
print(f"✅ 已创建 Collection: {name} (维度: {dim}, 距离: {distance})")
def insert(
self,
collection_name: str,
embeddings: List[List[float]],
documents: List[str],
metadata: List[Dict] = None,
ids: List[int] = None
) -> List[int]:
"""插入数据"""
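        from qdrant_client.http.models import PointStruct
        if ids is None:
            import uuid
            # Random 64-bit IDs, unique per point across batches and processes.
            # hash(str(i)) depends only on the position i, so a second insert call
            # would overwrite the points written by the first.
            ids = [uuid.uuid4().int >> 64 for _ in range(len(embeddings))]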
if metadata is None:
metadata = [{} for _ in range(len(embeddings))]
# 构造 PointStruct
points = []
for i, (emb, doc, meta) in enumerate(zip(embeddings, documents, metadata)):
points.append(PointStruct(
id=ids[i],
vector=emb,
payload={"document": doc, **meta}
))
# 批量插入
self.client.upsert(
collection_name=collection_name,
points=points
)
print(f"✅ 已插入 {len(embeddings)} 条数据")
return ids
def search(
self,
collection_name: str,
query_embedding: List[float],
top_k: int = 5,
score_threshold: float = None,
filter_conditions: Dict = None
) -> List[Dict]:
"""向量搜索"""
# 构造过滤条件
query_filter = None
if filter_conditions:
query_filter = self._build_filter(filter_conditions)
        # Note: newer qdrant-client releases deprecate search() in favor of query_points()
        results = self.client.search(
collection_name=collection_name,
query_vector=query_embedding,
limit=top_k,
score_threshold=score_threshold,
query_filter=query_filter
)
# 格式化结果
formatted = []
for hit in results:
formatted.append({
"id": hit.id,
"score": hit.score,
"document": hit.payload.get("document"),
**{k: v for k, v in hit.payload.items() if k != "document"}
})
return formatted
def _build_filter(self, conditions: Dict) -> models.Filter:
"""构建过滤条件"""
must = []
for key, value in conditions.items():
            if isinstance(value, dict):
                # Range filter: accepts any combination of gt / gte / lt / lte
                bounds = {k: value[k] for k in ("gt", "gte", "lt", "lte") if k in value}
                if bounds:
                    must.append(models.FieldCondition(
                        key=key,
                        range=models.Range(**bounds)
                    ))
elif isinstance(value, list):
# IN 过滤
must.append(models.FieldCondition(
key=key,
match=models.MatchAny(any=value)
))
else:
# 精确匹配
must.append(models.FieldCondition(
key=key,
match=models.MatchValue(value=value)
))
return models.Filter(must=must)
def delete_collection(self, name: str):
"""删除 Collection"""
if self.client.collection_exists(name):
self.client.delete_collection(name)
print(f"✅ 已删除 Collection: {name}")
    def get_collection_info(self, name: str) -> Dict:
        """Return basic collection information."""
        info = self.client.get_collection(name)
        return {
            "name": name,  # the returned info object carries no name field, so reuse the argument
            "status": info.status,
            "points_count": info.points_count,
            "segments_count": info.segments_count,
            "config": info.config
        }
# Usage
qdrant = QdrantManager(host="localhost", port=6333)
# Create the collection
qdrant.create_collection(
    name="documents",
    dim=generator.dimension,  # 384 for all-MiniLM-L6-v2; must match the embedding model
    distance="Cosine",
    overwrite=True
)
# Insert data
doc_vectors = generator.encode(documents).tolist()  # plain lists of floats
ids = qdrant.insert(
    collection_name="documents",
    embeddings=doc_vectors,
    documents=documents,
    metadata=[{"category": "tech", "score": 0.9} for _ in documents]
)
# Search
query_vec = generator.encode_single(query)
results = qdrant.search(
    collection_name="documents",
    query_embedding=query_vec,
    top_k=5,
    filter_conditions={"category": "tech"}
)
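The `filter_conditions` dictionary follows a small convention: a scalar means exact match, a list means "any of", and a dict means a range. The sketch below evaluates that convention against an in-memory payload, which can be handy for unit-testing filter logic without a running server. It only mirrors the dictionary convention used in this article, not Qdrant's filter engine; the function name `matches` is hypothetical.

```python
import operator
from typing import Any, Dict

# Range operators accepted in a dict-valued condition
_RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(payload: Dict[str, Any], conditions: Dict[str, Any]) -> bool:
    """Return True if the payload satisfies every condition (logical AND)."""
    for key, expected in conditions.items():
        if key not in payload:
            return False
        value = payload[key]
        if isinstance(expected, dict):
            # Range condition: every supplied bound must hold
            for op_name, bound in expected.items():
                if op_name in _RANGE_OPS and not _RANGE_OPS[op_name](value, bound):
                    return False
        elif isinstance(expected, list):
            # "Any of" condition
            if value not in expected:
                return False
        elif value != expected:
            # Exact match
            return False
    return True


payload = {"category": "tech", "score": 0.9}
print(matches(payload, {"category": "tech", "score": {"gte": 0.8}}))
print(matches(payload, {"category": ["database", "ops"]}))
```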
5. RAG Integration
5.1 Basic RAG Architecture
python
# ===== RAG (Retrieval-Augmented Generation) =====
from openai import OpenAI
from typing import List, Dict, Any
class RAGSystem:
"""RAG 系统"""
def __init__(
self,
embedding_generator, # EmbeddingGenerator
vector_db, # MilvusManager 或 QdrantManager
collection_name: str,
llm_model: str = "gpt-4o",
api_key: str = None
):
self.embedding_generator = embedding_generator
self.vector_db = vector_db
self.collection_name = collection_name
from openai import OpenAI
self.llm = OpenAI(api_key=api_key)
self.llm_model = llm_model
def index_documents(self, documents: List[str], metadata: List[Dict] = None):
"""索引文档"""
# 1. 生成 Embedding
embeddings = self.embedding_generator.encode(documents)
# 2. 插入向量数据库
if hasattr(self.vector_db, 'insert'):
self.vector_db.insert(
collection_name=self.collection_name,
embeddings=embeddings.tolist(),
documents=documents,
metadata=metadata
)
else:
raise ValueError("向量数据库不支持 insert 方法")
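    def retrieve(self, query: str, top_k: int = 5) -> List[Dict]:
        """Retrieve relevant documents."""
        # 1. Encode the query
        query_vec = self.embedding_generator.encode_single(query)
        # 2. Vector search
        if not hasattr(self.vector_db, 'search'):
            raise ValueError("The vector database does not provide a search method")
        import inspect
        params = inspect.signature(self.vector_db.search).parameters
        if "query_embeddings" in params:
            # MilvusManager.search takes a batch of queries and returns one hit list per query
            return self.vector_db.search(
                collection_name=self.collection_name,
                query_embeddings=[query_vec],
                top_k=top_k
            )[0]
        # QdrantManager.search takes a single query vector
        return self.vector_db.search(
            collection_name=self.collection_name,
            query_embedding=query_vec,
            top_k=top_k
        )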
def generate_answer(self, query: str, context: List[str]) -> str:
"""生成答案"""
# 构造 Prompt
context_text = "\n\n".join([f"[{i+1}] {c}" for i, c in enumerate(context)])
prompt = f"""基于以下参考文档回答问题。如果参考文档中没有相关信息,请明确说明。
参考文档:
{context_text}
问题: {query}
回答:"""
response = self.llm.chat.completions.create(
model=self.llm_model,
messages=[
{"role": "system", "content": "你是一个基于参考文档回答问题的助手。请严格基于参考文档回答,不要编造信息。"},
{"role": "user", "content": prompt}
],
temperature=0.1
)
return response.choices[0].message.content
def query(self, query: str, top_k: int = 5) -> Dict:
"""RAG 完整流程"""
# 1. 检索
retrieved = self.retrieve(query, top_k)
# 2. 提取上下文
context = [r.get("document", "") for r in retrieved]
# 3. 生成答案
answer = self.generate_answer(query, context)
return {
"query": query,
"answer": answer,
"retrieved": retrieved,
"context": context
}
# Usage
rag = RAGSystem(
    embedding_generator=generator,
    vector_db=manager,  # MilvusManager
    collection_name="documents",
    api_key="your-openai-api-key"
)
# Index documents
documents = [
    "Milvus is an open-source vector database built for billion-scale vector search.",
    "RAG (retrieval-augmented generation) combines retrieval with generation to improve LLM accuracy.",
    "An embedding maps text to a high-dimensional vector.",
    # ... more documents
]
rag.index_documents(documents)
# Query
result = rag.query("What is Milvus?")
print(f"Question: {result['query']}")
print(f"Answer: {result['answer']}")
print("Retrieved documents:")
for r in result["retrieved"]:
    print(f" [{r['distance']:.4f}] {r['document']}")
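`generate_answer` joins every retrieved chunk into the prompt, which can overflow the model's context window when chunks are long. A minimal sketch of a context builder with a character budget follows; the name `build_context` and the 2000-character default are assumptions for illustration, and a real system would more likely budget in tokens.

```python
from typing import List


def build_context(chunks: List[str], max_chars: int = 2000) -> str:
    """Number the chunks and keep as many as fit within max_chars, in order."""
    parts: List[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk}"
        # Every entry after the first also pays for the "\n\n" separator
        cost = len(entry) + (2 if parts else 0)
        if used + cost > max_chars:
            break
        parts.append(entry)
        used += cost
    return "\n\n".join(parts)


chunks = ["aaaa", "bbbb", "cccc"]
print(build_context(chunks, max_chars=20))
```

Because retrieval results arrive sorted by relevance, cutting from the tail drops the least relevant chunks first.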
5.2 Advanced RAG Techniques
python
# ===== 高级 RAG 技术 =====
class AdvancedRAG(RAGSystem):
"""高级 RAG 系统"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.reranker = None # 可接入重排序模型
def query_with_reranking(
self,
query: str,
top_k: int = 5,
rerank_top_k: int = 3
) -> Dict:
"""带重排序的查询"""
# 1. 初步检索(向量搜索)
retrieved = self.retrieve(query, top_k=top_k * 2) # 取更多候选
# 2. 重排序(基于交叉编码器)
if self.reranker:
contexts = [r["document"] for r in retrieved]
reranked = self.reranker.rerank(query, contexts, top_k=rerank_top_k)
retrieved = [retrieved[i["index"]] for i in reranked]
else:
# 简单截断
retrieved = retrieved[:rerank_top_k]
# 3. 生成答案
context = [r.get("document", "") for r in retrieved]
answer = self.generate_answer(query, context)
return {
"query": query,
"answer": answer,
"retrieved": retrieved
}
def query_with_hybrid_search(
self,
query: str,
top_k: int = 5
) -> Dict:
"""混合搜索(向量 + 关键词)"""
# 1. 向量搜索
vector_results = self.retrieve(query, top_k=top_k)
# 2. 关键词搜索(BM25)
keyword_results = self._keyword_search(query, top_k=top_k)
# 3. 融合(RRF: Reciprocal Rank Fusion)
merged = self._rrf_merge(vector_results, keyword_results, top_k=top_k)
# 4. 生成答案
context = [r["document"] for r in merged]
answer = self.generate_answer(query, context)
return {
"query": query,
"answer": answer,
"retrieved": merged
}
def _keyword_search(self, query: str, top_k: int = 5) -> List[Dict]:
"""关键词搜索(简化版,实际应使用 Elasticsearch/BM25)"""
# 这里简化为:从已索引文档中做关键词匹配
# 生产环境应使用专门的搜索引擎
return []
    @staticmethod
    def _rrf_merge(
        list1: List[Dict],
        list2: List[Dict],
        top_k: int = 5,
        k: int = 60
    ) -> List[Dict]:
        """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), with rank starting at 1."""
        scores = {}
        by_id = {}
        for ranked in (list1, list2):
            for rank, item in enumerate(ranked, start=1):
                # Fall back to the document text when a result has no "id"
                doc_id = item.get("id", item.get("document"))
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
                by_id.setdefault(doc_id, item)
        # Highest fused score first
        sorted_ids = sorted(scores.items(), key=lambda x: x[1], reverse=True)
        return [by_id[doc_id] for doc_id, _ in sorted_ids[:top_k]]
def query_with_compression(self, query: str, top_k: int = 5) -> Dict:
"""上下文压缩"""
# 1. 检索
retrieved = self.retrieve(query, top_k=top_k)
# 2. 压缩上下文(提取相关片段)
compressed = []
for r in retrieved:
doc = r["document"]
# 简化:取包含关键词的句子
relevant_sentences = self._extract_relevant_sentences(doc, query)
compressed.append(" ".join(relevant_sentences))
# 3. 生成答案
answer = self.generate_answer(query, compressed)
return {
"query": query,
"answer": answer,
"compressed_context": compressed
}
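    def _extract_relevant_sentences(self, document: str, query: str) -> List[str]:
        """Extract relevant sentences (simplified)."""
        import re
        sentences = [s for s in re.split(r'[。!?!?\n]+', document) if s.strip()]

        def terms(text: str) -> set:
            # Whitespace tokens for space-delimited text, plus character bigrams over
            # CJK runs, because Chinese text has no spaces for split() to use
            text = text.lower()
            cjk_runs = re.findall(r'[\u4e00-\u9fff]+', text)
            bigrams = {run[i:i + 2] for run in cjk_runs for i in range(len(run) - 1)}
            return set(text.split()) | bigrams

        query_terms = terms(query)
        relevant = [s for s in sentences if query_terms & terms(s)]
        return relevant[:3]  # at most 3 sentences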
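`query_with_reranking` expects `self.reranker` to expose `rerank(query, contexts, top_k)` and return items carrying an `"index"` key, but the article leaves the reranker as `None`. The toy class below satisfies that interface so the code path can be exercised. It scores by token overlap and is only a stand-in for a real cross-encoder; the class name `OverlapReranker` is hypothetical, and it assumes space-delimited text.

```python
from typing import Dict, List


class OverlapReranker:
    """Toy reranker: scores each context by the fraction of query tokens it contains."""

    def rerank(self, query: str, contexts: List[str], top_k: int = 3) -> List[Dict]:
        query_tokens = set(query.lower().split())
        scored = []
        for index, text in enumerate(contexts):
            tokens = set(text.lower().split())
            overlap = len(query_tokens & tokens)
            score = overlap / len(query_tokens) if query_tokens else 0.0
            scored.append({"index": index, "score": score})
        # list.sort is stable, so ties keep the original retrieval order
        scored.sort(key=lambda item: item["score"], reverse=True)
        return scored[:top_k]


contexts = [
    "redis cache strategies",
    "connect to mysql with python",
    "python mysql connection pool",
]
reranked = OverlapReranker().rerank("python mysql connect", contexts, top_k=2)
print([item["index"] for item in reranked])
```

Assigning `rag.reranker = OverlapReranker()` on an `AdvancedRAG` instance would route `query_with_reranking` through this class instead of the plain truncation branch.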
6. Production Examples
6.1 Semantic Search Engine
python
# ===== 生产案例:语义搜索引擎 =====
from flask import Flask, request, jsonify
from typing import List, Dict
import json
class SemanticSearchEngine:
"""语义搜索引擎"""
def __init__(self, rag_system: RAGSystem):
self.rag = rag_system
self.app = Flask(__name__)
self._setup_routes()
def _setup_routes(self):
@self.app.route("/api/search", methods=["POST"])
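        def search():
            # get_json(silent=True) returns None for a missing or invalid JSON body
            data = request.get_json(silent=True) or {}
            query = data.get("query", "")
            top_k = data.get("top_k", 5)
            if not query:
                return jsonify({"error": "query must not be empty"}), 400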
try:
result = self.rag.query(query, top_k=top_k)
return jsonify({
"query": result["query"],
"answer": result["answer"],
"results": [
{
"document": r["document"],
"score": r.get("distance", r.get("score", 0)),
"metadata": r.get("metadata", {})
}
for r in result["retrieved"]
]
})
except Exception as e:
return jsonify({"error": str(e)}), 500
@self.app.route("/api/index", methods=["POST"])
def index():
data = request.json
documents = data.get("documents", [])
metadata = data.get("metadata", None)
if not documents:
return jsonify({"error": "documents 不能为空"}), 400
try:
self.rag.index_documents(documents, metadata)
return jsonify({"message": f"已索引 {len(documents)} 篇文档"})
except Exception as e:
return jsonify({"error": str(e)}), 500
def run(self, host: str = "0.0.0.0", port: int = 5000):
self.app.run(host=host, port=port)
# Usage
engine = SemanticSearchEngine(rag)
# engine.run()  # start the service
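The `/api/search` handler accepts `top_k` as supplied by the client, so a very large or non-integer value reaches the vector database unchecked. A small validation helper, sketched below, could run before `self.rag.query`. The function name `parse_search_request` and the `MAX_TOP_K` limit are assumptions for illustration, not part of the original service.

```python
from typing import Any, Tuple

MAX_TOP_K = 50  # illustrative upper bound, not from the original article


def parse_search_request(data: Any) -> Tuple[str, int]:
    """Validate a decoded JSON body and return (query, top_k)."""
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    query = data.get("query", "")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    top_k = data.get("top_k", 5)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(top_k, bool) or not isinstance(top_k, int):
        raise ValueError("top_k must be an integer")
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}")
    return query.strip(), top_k


print(parse_search_request({"query": " what is milvus ", "top_k": 3}))
```

In the Flask route, a `ValueError` from this helper would map naturally to a 400 response.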
6.2 Document Deduplication
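```python
# ===== Production example: document deduplication =====
from typing import List


class DocumentDeduplicator:
    """Document deduplication.

    Assumes a Qdrant-style manager (see QdrantManager in section 4): search() takes
    query_embedding and score_threshold and returns dicts with an "id" key.
    """

    def __init__(self, embedding_generator, vector_db, collection_name: str, threshold: float = 0.95):
        self.generator = embedding_generator
        self.vector_db = vector_db
        self.collection_name = collection_name
        self.threshold = threshold

    def find_duplicates(self, documents: List[str], batch_size: int = 100) -> List[List[int]]:
        """Return groups of indices into `documents` that are near-duplicates.

        batch_size is kept for interface compatibility and is not used in this simplified version.
        """
        # 1. Generate embeddings for all documents
        embeddings = self.generator.encode(documents)
        # 2. Insert into the vector database (temporary collection)
        ids = self.vector_db.insert(
            collection_name=self.collection_name,
            embeddings=embeddings.tolist(),
            documents=documents
        )
        # Map database IDs back to positions in `documents`
        id_to_index = {doc_id: i for i, doc_id in enumerate(ids)}
        # 3. Run a similarity search for every document
        duplicate_groups = []
        visited = set()  # document indices already assigned to a group
        for i, emb in enumerate(embeddings):
            if i in visited:
                continue
            visited.add(i)
            results = self.vector_db.search(
                collection_name=self.collection_name,
                query_embedding=emb.tolist(),
                top_k=10,
                score_threshold=self.threshold
            )
            # Collect duplicates as document indices, not database IDs
            group = [i]
            for r in results:
                j = id_to_index.get(r.get("id"))
                # Skip the document itself, hits from earlier inserts, and grouped documents
                if j is None or j in visited:
                    continue
                group.append(j)
                visited.add(j)
            if len(group) > 1:
                duplicate_groups.append(group)
        return duplicate_groups

    def remove_duplicates(self, documents: List[str]) -> List[str]:
        """Drop duplicate documents, keeping the first of each group."""
        duplicates = self.find_duplicates(documents)
        # Mark the indices to remove
        to_remove = set()
        for group in duplicates:
            # Keep the first document, remove the rest
            for idx in group[1:]:
                to_remove.add(idx)
        # Filter
        return [doc for i, doc in enumerate(documents) if i not in to_remove]
```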
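The grouping above is greedy: a document joins the group of the first earlier document that finds it. If A matches B and B matches C but A does not match C directly, whether C lands in the same group depends on search order. One alternative, sketched here as an assumption rather than the article's method, is to collect all duplicate pairs and merge them with a union-find structure so that groups are transitive. The function name `group_duplicates` is hypothetical.

```python
from typing import Dict, List, Tuple


def group_duplicates(n: int, pairs: List[Tuple[int, int]]) -> List[List[int]]:
    """Merge duplicate pairs over n documents into transitive groups."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Path halving keeps the trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller so the lowest index leads
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Only groups with more than one member contain duplicates
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical duplicate pairs found by thresholded similarity search
print(group_duplicates(6, [(0, 2), (2, 4), (1, 5)]))
```

Each group is sorted with its lowest index first, so the "keep the first, drop the rest" rule in `remove_duplicates` still applies unchanged.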
7. Performance Optimization
7.1 Index Tuning
python
# ===== 向量索引优化 =====
class IndexOptimizer:
"""索引优化器"""
# HNSW 参数调优指南
HNSW_PARAMS = {
"M": {
"description": "每个节点的双向链接数",
"range": [8, 16, 32, 48, 64],
"tradeoff": "M 越大,召回率越高,但内存和构建时间增加",
"recommendation": {
"low_memory": 8,
"balanced": 16,
"high_recall": 32
}
},
"efConstruction": {
"description": "构建时的候选列表大小",
"range": [64, 128, 256, 512],
"tradeoff": "越大,索引质量越高,但构建越慢",
"recommendation": {
"fast_build": 64,
"balanced": 128,
"high_quality": 256
}
},
"ef": {
"description": "查询时的候选列表大小",
"range": [64, 128, 256, 512],
"tradeoff": "越大,召回率越高,但查询越慢",
"recommendation": {
"fast_query": 64,
"balanced": 128,
"high_recall": 256
}
}
}
@staticmethod
def recommend_params(
data_size: int,
dim: int,
memory_limit_mb: int = None
) -> Dict:
"""根据数据规模和维度推荐参数"""
recommendations = {}
# M 推荐
if data_size < 100000:
recommendations["M"] = 16
elif data_size < 1000000:
recommendations["M"] = 32
else:
recommendations["M"] = 48
# efConstruction 推荐
if dim <= 384:
recommendations["efConstruction"] = 128
elif dim <= 768:
recommendations["efConstruction"] = 256
else:
recommendations["efConstruction"] = 512
# ef 推荐
recommendations["ef"] = recommendations["M"] * 4
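        # Adjust for a memory limit
        if memory_limit_mb:
            # Rough HNSW estimate: raw float32 vectors plus about 2*M 4-byte links per vector
            bytes_per_vector = dim * 4 + recommendations["M"] * 2 * 4
            estimated_memory = data_size * bytes_per_vector / 1024 / 1024
            if estimated_memory > memory_limit_mb:
                recommendations["M"] = max(8, recommendations["M"] // 2)
                print(f"⚠️ Memory may be insufficient; lowering M to {recommendations['M']}")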
return recommendations
    @staticmethod
    def benchmark_index(
        collection,
        test_queries: List[List[float]],
        ks: List[int] = None
    ) -> Dict:
        """Benchmark query latency for several values of k."""
        import time
        if ks is None:  # avoid a mutable default argument
            ks = [1, 5, 10, 50, 100]
        results = {
            "query_time": {},
            "recall": {},  # needs ground-truth neighbors to compute
        }
        for k in ks:
            times = []
            for query in test_queries:
                start = time.perf_counter()
                collection.search(
                    data=[query],
                    anns_field="embedding",
                    # For HNSW, Milvus requires ef >= limit
                    param={"metric_type": "COSINE", "params": {"ef": max(64, k)}},
                    limit=k
                )
                times.append(time.perf_counter() - start)
            avg_time = sum(times) / len(times)
            results["query_time"][k] = {
                "avg_ms": avg_time * 1000,
                "p99_ms": sorted(times)[int(len(times) * 0.99)] * 1000
            }
        return results
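Two pieces of arithmetic sit behind the tuning advice above: how much memory an HNSW index needs, and how a latency percentile is read off a list of timings. The sketch below shows both under stated assumptions: memory is approximated as float32 vectors plus about 2·M four-byte neighbor links per vector (real engines add overhead for upper layers and metadata), and the percentile uses the nearest-rank definition. The function names are illustrative.

```python
import math
from typing import List


def estimate_hnsw_memory_mb(num_vectors: int, dim: int, m: int) -> float:
    """Rough estimate: float32 vectors plus ~2*M 4-byte neighbor links per vector."""
    bytes_per_vector = dim * 4 + m * 2 * 4
    return num_vectors * bytes_per_vector / (1024 * 1024)


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest rank: the smallest rank covering pct percent of the samples
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# One million 384-dimensional vectors with M=16
print(f"{estimate_hnsw_memory_mb(1_000_000, 384, 16):.1f} MB")
print(percentile([float(ms) for ms in range(1, 101)], 99))
```

Note that the vectors themselves dominate the estimate: at 384 dimensions they take 1536 bytes each, against 128 bytes of links for M=16, which is why halving M saves far less memory than reducing the dimension or quantizing.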
7.2 Batch Operations
python
# ===== 批量操作优化 =====
class BatchOptimizer:
"""批量操作优化"""
@staticmethod
def batch_insert(
vector_db,
collection_name: str,
embeddings: List[List[float]],
documents: List[str],
batch_size: int = 1000,
show_progress: bool = True
):
"""批量插入优化"""
import tqdm
total = len(embeddings)
all_ids = []
iterator = range(0, total, batch_size)
if show_progress:
iterator = tqdm.tqdm(iterator, desc="批量插入")
for i in iterator:
batch_embeddings = embeddings[i:i + batch_size]
batch_documents = documents[i:i + batch_size]
ids = vector_db.insert(
collection_name=collection_name,
embeddings=batch_embeddings,
documents=batch_documents
)
all_ids.extend(ids)
return all_ids
    @staticmethod
    def parallel_search(
        vector_db,
        collection_name: str,
        query_embeddings: List[List[float]],
        top_k: int = 5,
        n_workers: int = 4
    ) -> List[List[Dict]]:
        """Run searches concurrently; results keep the order of query_embeddings."""
        from concurrent.futures import ThreadPoolExecutor

        def search_one(query_emb):
            return vector_db.search(
                collection_name=collection_name,
                query_embedding=query_emb,
                top_k=top_k
            )

        with ThreadPoolExecutor(max_workers=n_workers) as executor:
            # executor.map returns results in input order, not completion order
            return list(executor.map(search_one, query_embeddings))
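The helpers above assume a wrapper object whose `insert` returns the new IDs and whose `search` takes a single query embedding. To try the batching and ordering logic without a running database, a stand-in along the following lines may be useful. `InMemoryVectorDB` is hypothetical: it only mimics that assumed interface and ranks by a brute-force dot product.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class InMemoryVectorDB:
    """Hypothetical stand-in mimicking the insert/search interface assumed above."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []
        self.insert_calls = 0  # counts round trips, to show the effect of batching

    def insert(self, collection_name: str, embeddings, documents) -> List[int]:
        self.insert_calls += 1
        start = len(self.rows)
        for emb, doc in zip(embeddings, documents):
            self.rows.append({"id": len(self.rows), "embedding": emb, "document": doc})
        return list(range(start, len(self.rows)))

    def search(self, collection_name: str, query_embedding, top_k: int = 5) -> List[Dict]:
        def score(row: Dict) -> float:
            # Dot product between the query and the stored embedding
            return sum(q * v for q, v in zip(query_embedding, row["embedding"]))

        ranked = sorted(self.rows, key=score, reverse=True)
        return [{"id": r["id"], "document": r["document"]} for r in ranked[:top_k]]


db = InMemoryVectorDB()
embeddings = [[float(i), 1.0] for i in range(10)]
documents = [f"doc-{i}" for i in range(10)]

# Insert in batches of 4: three calls carrying 4 + 4 + 2 rows
batch_size = 4
all_ids: List[int] = []
for i in range(0, len(embeddings), batch_size):
    all_ids.extend(
        db.insert("demo", embeddings[i:i + batch_size], documents[i:i + batch_size])
    )

# executor.map yields results in the same order as the input queries
queries = [[1.0, 0.0], [-1.0, 0.0]]
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(lambda q: db.search("demo", q, top_k=1), queries))

print(db.insert_calls, all_ids[-1], [r[0]["document"] for r in results])
# prints: 3 9 ['doc-9', 'doc-0']
```

Threads help here because a real client spends most of each search waiting on the network; for CPU-bound scoring like this toy scan, they mainly demonstrate the ordering guarantee.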
8. Summary
8.1 Architecture Overview
| Layer | Components |
|---|---|
| Data ingestion | Text documents, images, audio |
| Embedding layer | OpenAI Embeddings, Sentence Transformers, CLIP (multimodal) |
| Vector database layer | Milvus, Qdrant, Weaviate, pgvector |
| Application layer | RAG systems, semantic search, recommender systems, document deduplication |
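8.2 Selection Guide
| Scenario | Recommended option | Rationale |
|---|---|---|
| Rapid prototyping | Pinecone / Chroma | Works out of the box; no deployment needed |
| Production (large scale) | Milvus | Distributed, high-performance, mature |
| Production (small to medium scale) | Qdrant | Implemented in Rust; strong performance |
| Existing PostgreSQL | pgvector | No extra components required |
| Multimodal search | Weaviate | Native multimodal support |
| Local development | Chroma | Embedded and lightweight |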
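8.3 Best Practices
| Practice | Notes |
|---|---|
| Choose a suitable embedding model | For Chinese, use text2vec / bge; for English, use all-MiniLM / text-embedding-3 |
| Set the vector dimension sensibly | Higher dimensions generally capture more, but storage and search costs grow with them |
| Use an HNSW index | In most scenarios HNSW offers the best balance between recall and performance |
| Batch operations | Use batch interfaces for both inserts and searches to cut network overhead |
| Hybrid search | Combine vector search with scalar filtering to improve precision |
| Re-tune the index periodically | Re-evaluate index parameters as the data volume grows |
| Monitor performance | Track query latency, recall, and memory usage |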
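To make the hybrid search row concrete: scalar filtering narrows the candidate set by metadata, and vector similarity ranks whatever remains. The following is a brute-force sketch of that idea in plain Python. The record layout and the `hybrid_search` helper are assumptions made for illustration; a real vector database would apply the filter inside its index rather than by scanning every record.

```python
import math
from typing import Callable, Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    records: List[Dict],
    query: List[float],
    predicate: Callable[[Dict], bool],
    top_k: int = 3,
) -> List[Dict]:
    """Keep records that pass the scalar filter, then rank them by cosine similarity."""
    candidates = [r for r in records if predicate(r)]
    candidates.sort(key=lambda r: cosine(query, r["embedding"]), reverse=True)
    return candidates[:top_k]


# Illustrative records: a scalar field ("lang") next to each embedding
records = [
    {"id": 1, "lang": "zh", "embedding": [1.0, 0.0]},
    {"id": 2, "lang": "en", "embedding": [0.9, 0.1]},
    {"id": 3, "lang": "en", "embedding": [0.0, 1.0]},
    {"id": 4, "lang": "en", "embedding": [0.7, 0.7]},
]

hits = hybrid_search(records, [1.0, 0.0], lambda r: r["lang"] == "en", top_k=2)
print([h["id"] for h in hits])  # prints [2, 4]
```

Record 1 is the closest vector to the query, yet it is excluded because it fails the filter; that is the precision gain the table refers to.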
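8.4 Common Problems
| Problem | Solutions |
|---|---|
| Low recall | Increase ef / use a more accurate embedding model / try an IVF index |
| Slow queries | Decrease ef / reduce top_k / use quantization (INT8/INT4) |
| Out of memory | Decrease M / use a lower-memory or disk-based index (e.g. IVF, DiskANN) / enable quantization |
| Slow writes | Insert in batches / lengthen the flush interval / write asynchronously |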
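For the out-of-memory row, it can help to estimate the index footprint before changing parameters. A common rule of thumb for an HNSW index is roughly `dim * bytes_per_component + M * 2 * 4` bytes per vector, that is, the raw vector plus its layer-0 links. The sketch below applies that rule; the function name is hypothetical, and the result is only a rough estimate, since real systems add overhead for upper graph layers, metadata, and allocator slack.

```python
def estimate_hnsw_memory_mb(
    num_vectors: int,
    dim: int,
    m: int,
    bytes_per_component: int = 4,  # 4 for float32, 1 for 8-bit quantized vectors
) -> float:
    """Rule-of-thumb HNSW footprint in MiB: raw vectors plus layer-0 links."""
    vector_bytes = dim * bytes_per_component
    link_bytes = m * 2 * 4  # about 2 * M neighbor IDs of 4 bytes each
    return num_vectors * (vector_bytes + link_bytes) / (1024 * 1024)


# One million 768-dimensional float32 vectors at different M values
for m in (32, 16, 8):
    print(m, round(estimate_hnsw_memory_mb(1_000_000, 768, m), 1))
# prints:
# 32 3173.8
# 16 3051.8
# 8 2990.7

# Same data with 8-bit quantized vectors and M = 32
print(round(estimate_hnsw_memory_mb(1_000_000, 768, 32, bytes_per_component=1), 1))
# prints 976.6
```

Under this estimate, the raw vectors dominate at 768 dimensions: halving M saves only a few percent, while 8-bit quantization cuts the footprint by roughly two thirds.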
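This article covered the full vector database stack: from vector fundamentals to embedding models, from hands-on Milvus/Qdrant usage to RAG integration, and from production case studies to performance tuning.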