Course materials: "milvus-course-ai.zip"
Link: https://pan.quark.cn/s/00f3d411bb6d
GitHub: https://github.com/yuanmomoya/milvus
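Learning objectives
After completing this chapter, you should be able to:
- Explain vectors, Embeddings, vector dimensionality, and vector spaces.
- Distinguish cosine similarity, inner product, and L2 distance.
- Understand why retrieval over massive vector sets cannot rely on brute-force scanning alone.
- Describe the basic ideas behind ANN, HNSW, IVF, and PQ.
- Evaluate a vector search system using Recall, QPS, Latency, and memory cost.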
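Theory: an intuitive picture
Think of a vector database as a huge warehouse where items are shelved by "semantic scent". A traditional database is like a shelf-numbering system: you must know the product number, name, or category to find something. A vector database instead gives every item a high-dimensional coordinate, so items with similar meaning naturally end up near each other. When a user asks "how do I build knowledge-base Q&A", the system does not look for the literal words "knowledge base"; it searches semantic space for the batch of content whose scent is closest.
The Embedding model is the warehouse's "coordinate measuring instrument". The same instrument must be used for both ingestion and querying; otherwise the coordinate systems will be misaligned. Cosine similarity is like comparing the directions of two arrows; L2 distance is like comparing the straight-line distance between two locations; an ANN index is like building express lanes through the warehouse: instead of inspecting every shelf, you rush to the most promising area first and then pick the TopK carefully.
One sentence to keep in mind for this chapter: vector search is not magic but a combination of "representation learning + distance computation + index acceleration". Recall, latency, and cost always constrain one another, and the engineer's job is to find a balance within what the business can accept.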
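Core concepts
What a vector database stores is not "how similar the texts themselves are" but the coordinates a model produces when it maps text, images, audio, or video into a vector space. Similar objects lie closer together in that space, and retrieval means finding the TopK stored vectors nearest to the query vector.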
Figure (flowchart): raw text / images / audio → Embedding model → high-dimensional vectors → Milvus Collection; user query → the same Embedding model → query vector → similarity search (TopK).
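What is a vector
A vector can be understood as a list of floating-point numbers, for example [0.12, -0.03, 0.88, ...]. The Embedding model is responsible for giving these numbers meaning: semantically similar content lies closer together in vector space. For example, the common Chinese text model BAAI/bge-small-zh-v1.5 outputs 512-dimensional vectors, and CLIP image/text models commonly output 512-dimensional vectors as well.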
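Higher dimensionality is not always better. More dimensions usually bring more expressive power, but they also increase the cost of storage, memory bandwidth, index construction, and search computation. A rough estimate: 1,000,000 float32 vectors of 768 dimensions need 1_000_000 * 768 * 4 = 3,072,000,000 bytes for the raw vectors alone, which is about 3.07 GB (2.86 GiB), not counting indexes and metadata.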
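The estimate above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: the helper name `raw_vector_bytes` is hypothetical, and it counts the raw float32 payload only (no index structures, no metadata).

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors; float32 uses 4 bytes per value."""
    return num_vectors * dim * bytes_per_value


total = raw_vector_bytes(1_000_000, 768)
print(total)                       # 3072000000 bytes
print(f"{total / 1e9:.2f} GB")     # 3.07 GB  (decimal gigabytes)
print(f"{total / 2**30:.2f} GiB")  # 2.86 GiB (binary gibibytes)
```

Changing `dim` to 512 or `bytes_per_value` to 2 (for a 16-bit float format) shows how quickly the budget moves with dimensionality and precision.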
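Similarity metrics
| Metric | Intuition | Typical use | Notes |
|---|---|---|---|
| COSINE | Whether the directions are close | Common for text semantic search | Ignores vector length; on L2-normalized vectors it ranks results the same as IP |
| IP | Larger inner product means more similar | Recommendation, search over normalized vectors | Affected by vector length when vectors are not normalized |
| L2 | Smaller Euclidean distance means more similar | Image features, traditional features | On unnormalized vectors the ranking can differ substantially from COSINE |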
```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: the closer to 1, the more similar
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def l2(a: np.ndarray, b: np.ndarray) -> float:
    # L2 distance: the closer to 0, the more similar
    return float(np.linalg.norm(a - b))
```
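How the three metrics relate becomes clearer once the vectors are L2-normalized. The sketch below uses made-up random vectors to check two standard identities for unit vectors: the inner product equals the cosine similarity, and the squared L2 distance equals 2 - 2 * cosine. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=8), rng.normal(size=8)

# L2-normalize both vectors so that each has length 1.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
ip_n = float(np.dot(a_n, b_n))             # inner product of the unit vectors
l2_sq_n = float(np.sum((a_n - b_n) ** 2))  # squared L2 distance of the unit vectors

print(np.isclose(cos, ip_n))             # IP equals cosine after normalization
print(np.isclose(l2_sq_n, 2 - 2 * cos))  # ||a - b||^2 = 2 - 2*cos for unit vectors
```

Because of these identities, COSINE, IP, and L2 all produce the same ranking on normalized vectors; they only diverge when vector lengths differ.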
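Brute-force search vs ANN
Brute-force search computes the distance between the query vector and every stored vector. It is exact, but its cost grows linearly with the number of vectors. ANN (approximate nearest neighbor) search trades a controllable loss in recall for performance gains that can reach orders of magnitude.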
Figure (flowchart): query vector → search method. Branch 1: brute-force search (FLAT) → scans all vectors → high recall, high cost. Branch 2: ANN → visits only a candidate subset → slightly lower recall, significantly faster.
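For reference, a brute-force (FLAT-style) search fits in a few lines of NumPy. This is a minimal sketch on synthetic data, not Milvus's implementation; the function name `flat_search` and the data sizes are assumptions made for illustration.

```python
import numpy as np


def flat_search(base: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Exact TopK by L2 distance: computes the distance to every row of `base`."""
    dists = np.linalg.norm(base - query, axis=1)  # one distance per stored vector
    return np.argsort(dists)[:k]                  # indices of the k smallest distances


rng = np.random.default_rng(42)
base = rng.normal(size=(10_000, 64)).astype(np.float32)
query = base[123] + 0.01  # a point very close to stored vector 123

top = flat_search(base, query, k=5)
print(top[0])  # 123: the nearest stored vector is the one the query was derived from
```

Every query touches all 10,000 rows, so the work scales with the collection size. That linear cost is what the ANN indexes in the following sections try to avoid.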
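HNSW intuition
HNSW is a graph index. Each vector is a node, and similar vectors are connected by edges. A search starts in the sparse upper layers to approach the target region quickly, then descends to the dense bottom layer for a fine-grained search.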
Figure (HNSW layers): top layer, sparse highways: A, D, G. Middle layer, regional roads: A, B, D, F, G. Bottom layer, dense neighbor graph: A, B, C, D, E, F, G.
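Common HNSW parameters:
| Parameter | Role | Effect of increasing it |
|---|---|---|
| `M` | Upper bound on the number of neighbors per node | Better recall, more memory, slower build |
| `efConstruction` | Candidate set size while building the index | Better index quality, slower build |
| `ef` | Candidate set size while searching | Better recall, higher latency |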
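To make the roles of `M` and `ef` concrete, here is a deliberately simplified sketch. It is not a full HNSW: there is a single layer, and the graph is built by exact brute force rather than incrementally, so `efConstruction` is not modeled. What it does show is the best-first search with a bounded candidate list of size `ef` that HNSW runs on its bottom layer. The function names and data sizes are hypothetical.

```python
import heapq
import numpy as np


def build_knn_graph(data: np.ndarray, M: int) -> list[list[int]]:
    """Toy graph: link each point to its M exact nearest neighbors (brute force)."""
    graph = []
    for i in range(len(data)):
        d = np.linalg.norm(data - data[i], axis=1)
        graph.append([int(j) for j in np.argsort(d)[1:M + 1]])  # rank 0 is the point itself
    return graph


def search_layer(data, graph, query, entry, k, ef):
    """Best-first graph search that keeps at most `ef` results while exploring."""
    def dist(i):
        return float(np.linalg.norm(data[i] - query))

    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap via negation: the ef closest nodes so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:     # closest candidate is farther than the worst kept result
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest kept result
    ranked = sorted((-neg_d, i) for neg_d, i in best)
    return [i for _, i in ranked[:k]]


rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 16)).astype(np.float32)
graph = build_knn_graph(data, M=8)
query = rng.normal(size=16).astype(np.float32)

exact = set(np.argsort(np.linalg.norm(data - query, axis=1))[:10].tolist())
for ef in (10, 50, 200):
    found = search_layer(data, graph, query, entry=0, k=10, ef=ef)
    print(ef, len(exact & set(found)) / 10)  # recall@10 for this single query
```

A larger `ef` lets the search keep exploring past local dead ends, which tends to raise recall at the price of more distance computations. A larger `M` gives each node more outgoing edges, which costs memory because every edge is stored.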
IVF intuition
IVF first partitions the vector space into clusters; at search time it visits only the few clusters closest to the query.
Figure (flowchart): all vectors → KMeans-trained centroids → nlist inverted lists; query vector → select the nprobe nearest centroids → scan only the selected lists → TopK.
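nlist determines the number of buckets, and nprobe determines how many buckets are searched. If nprobe is too small, relevant results are missed; if it is too large, the search approaches a brute-force scan.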
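The nlist/nprobe trade-off can be tried out with a toy IVF in NumPy. The sketch below is an assumption-laden illustration (plain Lloyd's k-means, synthetic data, hypothetical function names), not how Milvus builds its IVF indexes internally.

```python
import numpy as np


def kmeans(data: np.ndarray, nlist: int, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns `nlist` centroids."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), nlist, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)
        for c in range(nlist):
            members = data[assign == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids


def build_ivf(data: np.ndarray, centroids: np.ndarray) -> list[np.ndarray]:
    """Inverted lists: centroid id -> indices of the vectors assigned to it."""
    assign = np.argmin(np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)
    return [np.where(assign == c)[0] for c in range(len(centroids))]


def ivf_search(data, centroids, lists, query, k, nprobe):
    """Scan only the `nprobe` lists whose centroids are closest to the query."""
    nearest = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    cand = np.concatenate([lists[c] for c in nearest])
    d = np.linalg.norm(data[cand] - query, axis=1)
    return cand[np.argsort(d)[:k]]


rng = np.random.default_rng(1)
data = rng.normal(size=(5000, 32)).astype(np.float32)
query = rng.normal(size=32).astype(np.float32)
centroids = kmeans(data, nlist=32)
lists = build_ivf(data, centroids)

exact = set(np.argsort(np.linalg.norm(data - query, axis=1))[:10].tolist())
for nprobe in (1, 4, 32):
    got = set(ivf_search(data, centroids, lists, query, k=10, nprobe=nprobe).tolist())
    print(nprobe, len(exact & got) / 10)  # recall@10 for this single query
```

With `nprobe` equal to `nlist`, every list is scanned, so the result matches the exact search and the cost is that of a brute-force scan plus the centroid comparison.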
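PQ intuition
PQ splits a high-dimensional vector into several subspaces and approximates each sub-vector with an entry from that subspace's codebook. Its core value is memory compression, at the price of quantization error.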
Figure (flowchart): 768-dimensional float32 vector → split into m sub-vectors → look up each segment in its codebook → store the short codes → low-memory approximate distance computation.
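The compression and the quantization error can both be observed in a toy product quantizer. The sketch below assumes small illustrative sizes (32 dimensions, `m=4` sub-vectors, `ksub=16` codewords per codebook) so that it runs quickly; production settings typically use larger codebooks. All function names are hypothetical.

```python
import numpy as np


def train_codebooks(data, m, ksub, iters=8, seed=0):
    """One small k-means codebook per subspace; result shape (m, ksub, dim // m)."""
    rng = np.random.default_rng(seed)
    books = []
    for sub in np.split(data, m, axis=1):  # dim must be divisible by m
        cb = sub[rng.choice(len(sub), ksub, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(np.linalg.norm(sub[:, None] - cb[None], axis=2), axis=1)
            for c in range(ksub):
                if np.any(assign == c):
                    cb[c] = sub[assign == c].mean(axis=0)
        books.append(cb)
    return np.stack(books)


def encode(data, books):
    """Replace each sub-vector by the id of its nearest codeword (ksub <= 256)."""
    subs = np.split(data, len(books), axis=1)
    codes = [np.argmin(np.linalg.norm(s[:, None] - cb[None], axis=2), axis=1)
             for s, cb in zip(subs, books)]
    return np.stack(codes, axis=1).astype(np.uint8)


def decode(codes, books):
    """Approximate reconstruction: concatenate the selected codewords."""
    return np.concatenate([books[j][codes[:, j]] for j in range(len(books))], axis=1)


rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 32)).astype(np.float32)
books = train_codebooks(data, m=4, ksub=16)
codes = encode(data, books)
approx = decode(codes, books)

print(data[0].nbytes, codes[0].nbytes)     # 128 4: bytes per vector before and after
mse = float(np.mean((data - approx) ** 2))
print(mse)                                 # quantization error (mean squared)
```

Each vector shrinks from 128 bytes to 4 one-byte codes, and the non-zero reconstruction error is the accuracy that PQ gives up in exchange.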
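Metrics
| Metric | Meaning | How to improve |
|---|---|---|
| Recall@K | Proportion of truly relevant results that are retrieved in the TopK | Increase ef/nprobe, improve the Embedding, use Rerank |
| QPS | Requests per second | Shrink the candidate set, scale horizontally, return fewer output fields |
| Latency | Time taken by a single request | Watch P50/P95/P99, not just the average |
| Build Time | Time taken to build the index | Tune index type, parallelism, Segment size |
| Memory | Memory footprint | Quantization, mmap, hot/cold tiering, fewer replicas |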
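Two of these metrics are easy to compute by hand. The sketch below uses invented sample data and hypothetical helper names; the percentile uses the nearest-rank definition, which is one of several common conventions.

```python
import math
import statistics


def recall_at_k(retrieved: list[int], relevant: set[int], k: int) -> float:
    """Share of the relevant items that appear in the first k retrieved results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, with p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


retrieved = [7, 3, 9, 1, 4]  # ids returned by the search, best first
relevant = {3, 4, 8}         # ids a human labeled as relevant
print(recall_at_k(retrieved, relevant, k=3))  # 1 of 3 relevant ids found
print(recall_at_k(retrieved, relevant, k=5))  # 2 of 3 relevant ids found

latencies_ms = [12, 15, 11, 14, 13, 90, 12, 16, 13, 250]
print(statistics.mean(latencies_ms))  # 44.6
print(percentile(latencies_ms, 50))   # 13
print(percentile(latencies_ms, 95))   # 250
```

In this sample the mean (44.6 ms) describes neither the typical request (13 ms) nor the slow tail (250 ms), which is why the table recommends tracking P50/P95/P99 instead of the average alone.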
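Full code
The companion code for this chapter is in ../demos/basic-search. It will:
- Generate Chinese text vectors with `sentence-transformers`.
- Create a Milvus Collection.
- Build an HNSW index.
- Insert sample texts.
- Run a TopK semantic search.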
```bash
cd milvus-master-course
./scripts/start.sh
cd demos/basic-search
cp .env.example .env
python main.py
```
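Common errors
| Error | Cause | Fix |
|---|---|---|
| Unstable COSINE results | Model output is not normalized, or the wrong metric was chosen | Use one Embedding model and one metric consistently |
| Dimension mismatch | Collection dim differs from the model's dim | Drop and recreate the Collection, or pin the model version |
| Poor recall | Chunks too large or too small, model not suited to Chinese, ef/nprobe too low | Evaluate on an offline labeled set |
| High latency | Large TopK, large candidate set, many output fields | Constrain the parameters and load-test P95/P99 |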
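Interview questions
- Why does a vector database need ANN?
- What are the differences between COSINE, IP, and L2?
- Why does HNSW usually have a high memory footprint?
- How do IVF's `nlist` and `nprobe` affect recall and latency?
- Why can PQ reduce cost, and why does it lose accuracy?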
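Exercises
- Using `demos/basic-search`, run HNSW with `ef` set to 16, 64, and 128 in turn, and record how the results change.
- Swap in another Chinese Embedding model and observe how the dimension and the search results change.
- Construct 20 easily confused texts yourself and evaluate Recall@3 by hand.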
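Summary
The essence of a vector database is "quickly finding similar objects in a high-dimensional space". Every engineering decision revolves around four variables: recall, latency, throughput, and cost. Later chapters map these variables onto Milvus's Schema, indexes, query parameters, and production architecture.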