"milvus-course-ai.zip"
Link: https://pan.quark.cn/s/00f3d411bb6d
GitHub: https://github.com/yuanmomoya/milvus
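Learning objectives
After completing this chapter, you should be able to:
- Explain in depth how IVF trains its clusters, builds inverted lists, and runs a search.
- Apply a methodology for tuning nlist and nprobe.
- Distinguish the use cases for IVF_FLAT, IVF_SQ8, and IVF_PQ.
- Create, search, and tune an IVF index in Milvus.
- Evaluate how an IVF index behaves at different data scales.
IVF core principles
IVF (Inverted File Index) borrows the inverted-index idea from text retrieval: partition the vector space by clustering first, then at search time visit only the most relevant partitions.
Build phase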
[Flowchart: IVF build process]
N training vectors → KMeans clustering → nlist centroids generated → each vector assigned to its nearest centroid → nlist inverted lists formed
Inverted list structure:
- List 1 (centroid C1): vectors v3, v7, v12, ...
- List 2 (centroid C2): vectors v1, v5, v9, ...
- List N (centroid CN): vectors v2, v8, v15, ...
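The build flow can be sketched in a few lines of NumPy. This is a toy illustration, not how Milvus implements it: `build_ivf`, the fixed iteration count, the random initialisation, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def build_ivf(vectors: np.ndarray, nlist: int, n_iter: int = 10, seed: int = 0):
    """Toy IVF build: plain Lloyd KMeans, then one inverted list per centroid."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from nlist distinct training vectors.
    centroids = vectors[rng.choice(len(vectors), size=nlist, replace=False)].copy()
    for _ in range(n_iter):
        # Squared L2 distance from every vector to every centroid.
        dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(nlist):
            members = vectors[assign == c]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = dists.argmin(axis=1)
    # Each inverted list holds the row indices of the vectors in that cluster.
    inverted_lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, inverted_lists


data = np.random.default_rng(42).normal(size=(2000, 16)).astype("float32")
centroids, inverted_lists = build_ivf(data, nlist=8)
list_sizes = [len(ids) for ids in inverted_lists]
print(list_sizes, sum(list_sizes))
```

Every vector lands in exactly one list, so the list sizes always sum to the number of training vectors.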
Search phase
[Sequence diagram; participants: query vector, nlist centroids, inverted lists, TopK results]
- Compute the distance from the query vector to every centroid.
- Select the nprobe nearest centroids.
- Visit the corresponding nprobe lists.
- Compute the distance between the query and each vector in those lists.
- Merge, sort, and keep the TopK.
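The search steps can be sketched the same way. This is a toy under stated assumptions: `ivf_search` is a hypothetical helper, distances are squared L2, and for brevity the centroids are randomly chosen data points, not trained KMeans centres.

```python
import numpy as np


def ivf_search(query, vectors, centroids, inverted_lists, nprobe, top_k):
    """Toy IVF search over squared L2 distance."""
    # Distance to every centroid, then keep the nprobe nearest.
    centroid_d = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(centroid_d)[:nprobe]
    # Gather candidate ids from the selected lists.
    candidates = np.concatenate([inverted_lists[c] for c in probe])
    # Exact distance to each candidate.
    cand_d = ((vectors[candidates] - query) ** 2).sum(axis=1)
    # Merge, sort, keep TopK.
    order = np.argsort(cand_d)[:top_k]
    return candidates[order]


rng = np.random.default_rng(0)
vectors = rng.normal(size=(5000, 16)).astype("float32")
nlist = 32
# Assumption for brevity: centroids are random data points, not trained KMeans centres.
centroids = vectors[rng.choice(len(vectors), size=nlist, replace=False)]
assign = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
inverted_lists = [np.flatnonzero(assign == c) for c in range(nlist)]

query = rng.normal(size=16).astype("float32")
exact = np.argsort(((vectors - query) ** 2).sum(axis=1))[:10]
approx = ivf_search(query, vectors, centroids, inverted_lists, nprobe=4, top_k=10)
full = ivf_search(query, vectors, centroids, inverted_lists, nprobe=nlist, top_k=10)
print(len(set(approx) & set(exact)) / 10)  # recall with nprobe=4
print(len(set(full) & set(exact)) / 10)    # recall with nprobe=nlist
```

With nprobe equal to nlist every list is scanned, so the result matches the brute-force TopK; smaller nprobe values trade some of that recall for fewer distance computations.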
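Time complexity analysis
| Phase | Cost | Notes |
|---|---|---|
| Find nearest centroids | O(nlist × dim) | Compare against every centroid |
| Scan lists | O(nprobe × N/nlist × dim) | Scan nprobe lists of about N/nlist vectors each |
| Total | O(nlist × dim + nprobe × N/nlist × dim) | Far below brute force's O(N × dim) |
When nprobe << nlist and the lists are roughly equal in size, the number of vectors scanned is about N × nprobe / nlist, far fewer than the full N.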
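A quick worked example makes the saving concrete. The sketch below assumes equal-size lists and drops the common dim factor; `ivf_distance_computations` is a hypothetical helper, and the values of N, nlist, and nprobe are illustrative.

```python
def ivf_distance_computations(n: int, nlist: int, nprobe: int) -> float:
    """Approximate distance computations per query, assuming equal-size lists."""
    centroid_cost = nlist               # compare against every centroid
    scan_cost = nprobe * n / nlist      # scan nprobe lists of about n / nlist vectors
    return centroid_cost + scan_cost


n, nlist, nprobe = 1_000_000, 1024, 16
ivf_cost = ivf_distance_computations(n, nlist, nprobe)
print(ivf_cost)        # 16649.0
print(n / ivf_cost)    # about 60x fewer distance computations than brute force
```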
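nlist parameter design
nlist determines how many regions the vector space is split into.
nlist versus data size
| Data size N | Recommended nlist | Average list size |
|---|---|---|
| 100 thousand | 128-256 | 390-781 |
| 1 million | 1024-4096 | 244-976 |
| 10 million | 4096-16384 | 610-2441 |
| 100 million | 16384-65536 | 1525-6103 |
Rule of thumb: nlist ≈ 4 × sqrt(N). Treat it as a starting point only: for N = 100 thousand it gives about 1265, well above the 128-256 in the table, so confirm the choice against measured recall and latency.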
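The rule of thumb is easy to turn into a helper. In this sketch `suggest_nlist` is a hypothetical name, and rounding to the nearest power of two is an added convention for the example, not something the formula requires.

```python
import math


def suggest_nlist(n: int) -> int:
    """Rule of thumb nlist ≈ 4 × sqrt(N), rounded to the nearest power of two."""
    raw = 4 * math.sqrt(n)
    # Rounding to a power of two is an assumption of this sketch.
    return 2 ** round(math.log2(raw))


for n in (100_000, 1_000_000, 10_000_000, 100_000_000):
    nlist = suggest_nlist(n)
    print(f"N={n:11,}  nlist={nlist:6}  avg list size={n // nlist}")
```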
When nlist is too large
[Flowchart] nlist too large → every list is very short → severe boundary effects → nearby vectors land in different clusters → a larger nprobe is needed to compensate → search slows down
When nlist is too small
[Flowchart] nlist too small → every list is very long → scanning even a single list is slow → performance approaches brute-force search
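Both extremes show up in the cost model from the complexity table. The sketch below sweeps nlist with nprobe held fixed, which is a simplifying assumption: it looks only at distance computations and ignores recall, which in practice also changes with nlist. `distance_computations` is a hypothetical helper.

```python
def distance_computations(n: int, nlist: int, nprobe: int) -> float:
    """Cost model from the complexity table, with the common dim factor dropped."""
    return nlist + nprobe * n / nlist


n, nprobe = 1_000_000, 16
candidates = [2 ** k for k in range(6, 17)]  # 64 ... 65536
costs = {nlist: distance_computations(n, nlist, nprobe) for nlist in candidates}
best_nlist = min(costs, key=costs.get)
for nlist, cost in costs.items():
    marker = "  (lowest)" if nlist == best_nlist else ""
    print(f"nlist={nlist:6}  cost={cost:10.0f}{marker}")
```

Small nlist values are dominated by the list-scanning term and large ones by the centroid comparisons. For these inputs the minimum falls at nlist=4096, close to sqrt(nprobe × N).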
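nprobe parameter tuning
nprobe is the number of lists probed at search time; it directly controls the trade-off between recall and latency.
nprobe tuning experiment framework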
```python
import time

import numpy as np
from pymilvus import MilvusClient


def benchmark_nprobe(
    client: MilvusClient,
    collection_name: str,
    query_vectors: list[list[float]],
    nprobe_values: list[int],
    top_k: int = 10,
) -> list[dict]:
    """Measure search latency for each nprobe value."""
    results = []
    for nprobe in nprobe_values:
        latencies = []
        for qv in query_vectors:
            # Time one single-vector search, in milliseconds.
            start = time.perf_counter()
            client.search(
                collection_name=collection_name,
                data=[qv],
                anns_field="embedding",
                search_params={"metric_type": "COSINE", "params": {"nprobe": nprobe}},
                limit=top_k,
            )
            latencies.append((time.perf_counter() - start) * 1000)
        results.append({
            "nprobe": nprobe,
            "p50_ms": np.percentile(latencies, 50),
            "p95_ms": np.percentile(latencies, 95),
            "p99_ms": np.percentile(latencies, 99),
        })
    return results
```
Recall evaluation
```python
from pymilvus import MilvusClient


def compute_recall(
    client: MilvusClient,
    collection_name: str,
    query_vectors: list[list[float]],
    ground_truth: list[list[str]],
    nprobe: int,
    top_k: int = 10,
) -> float:
    """Compute Recall@K, using FLAT (exact) results as the baseline."""
    hits = 0
    total = 0
    for qv, gt in zip(query_vectors, ground_truth):
        results = client.search(
            collection_name=collection_name,
            data=[qv],
            anns_field="embedding",
            search_params={"metric_type": "COSINE", "params": {"nprobe": nprobe}},
            limit=top_k,
        )
        # Compare the retrieved ids with the first top_k ground-truth ids.
        retrieved_ids = {hit["id"] for hit in results[0]}
        gt_set = set(gt[:top_k])
        hits += len(retrieved_ids & gt_set)
        total += len(gt_set)
    return hits / total if total > 0 else 0.0
```
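The recall function needs an exact baseline. One way to get it without a FLAT index is brute force in NumPy, sketched below. `exact_topk_cosine` is a hypothetical helper, and the sketch assumes the primary key of each vector is its row index converted to a string.

```python
import numpy as np


def exact_topk_cosine(base: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Brute-force cosine TopK: row i holds the indices of the k most similar base vectors."""
    base_n = base / np.linalg.norm(base, axis=1, keepdims=True)
    query_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = query_n @ base_n.T                  # cosine similarity matrix
    return np.argsort(-sims, axis=1)[:, :k]    # highest similarity first


rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 32)).astype("float32")
queries = rng.normal(size=(5, 32)).astype("float32")
top = exact_topk_cosine(base, queries, k=10)
# Assumption: ids are the row indices as strings, matching a VARCHAR primary key.
ground_truth = [[str(i) for i in row] for row in top]
```

The similarity matrix has one entry per query-base pair, so for large collections compute it in batches of queries.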
Typical tuning results
Taking 1 million 768-dimensional vectors with nlist=1024 as an example:
| nprobe | Recall@10 | P50 latency | P95 latency |
|---|---|---|---|
| 8 | 72% | 2.1 ms | 3.5 ms |
| 16 | 83% | 3.2 ms | 5.1 ms |
| 32 | 91% | 5.4 ms | 8.2 ms |
| 64 | 95% | 9.8 ms | 14.3 ms |
| 128 | 98% | 18.5 ms | 26.1 ms |
| 256 | 99.2% | 35.2 ms | 48.7 ms |
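Given a table like this, choosing nprobe is a matter of taking the smallest value that meets both targets. The sketch below copies the rows above; `pick_nprobe` and the thresholds are hypothetical, chosen only for illustration.

```python
# (nprobe, Recall@10, P50 ms, P95 ms) copied from the table above.
measurements = [
    (8, 0.72, 2.1, 3.5),
    (16, 0.83, 3.2, 5.1),
    (32, 0.91, 5.4, 8.2),
    (64, 0.95, 9.8, 14.3),
    (128, 0.98, 18.5, 26.1),
    (256, 0.992, 35.2, 48.7),
]


def pick_nprobe(rows, min_recall: float, max_p95_ms: float):
    """Return the smallest nprobe meeting both targets, or None if none does."""
    for nprobe, recall, _p50, p95 in sorted(rows):
        if recall >= min_recall and p95 <= max_p95_ms:
            return nprobe
    return None


print(pick_nprobe(measurements, min_recall=0.95, max_p95_ms=20.0))  # 64
print(pick_nprobe(measurements, min_recall=0.99, max_p95_ms=20.0))  # None
```

A result of None means the two targets conflict for this index, so one of them has to be relaxed or a different index considered.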
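IVF variant comparison
IVF_FLAT
The inverted lists store the original float32 vectors. Of the three variants it has the highest accuracy and the largest memory footprint.
```python
from pymilvus import MilvusClient

index_params = MilvusClient.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 1024},
)
```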
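IVF_SQ8
The inverted lists store 8-bit scalar-quantized vectors. Each float32 component is compressed to 1 byte, cutting vector memory by about 75%.
```python
from pymilvus import MilvusClient

index_params = MilvusClient.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="IVF_SQ8",
    metric_type="COSINE",
    params={"nlist": 1024},
)
```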
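To see where the 75% saving and the quantization error come from, here is a generic scalar-quantization sketch. It illustrates the idea only and is not Milvus's actual implementation; `sq8_encode`, `sq8_decode`, and the per-dimension min/max scheme are assumptions made for the example.

```python
import numpy as np


def sq8_encode(vectors: np.ndarray):
    """Per-dimension min/max scalar quantisation to uint8 (one byte per component)."""
    lo = vectors.min(axis=0)
    span = vectors.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on constant dimensions
    codes = np.round((vectors - lo) / span * 255).astype(np.uint8)
    return codes, lo, span


def sq8_decode(codes: np.ndarray, lo: np.ndarray, span: np.ndarray) -> np.ndarray:
    """Map the uint8 codes back to approximate float32 values."""
    return codes.astype(np.float32) / 255 * span + lo


rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 768)).astype("float32")
codes, lo, span = sq8_encode(vectors)
restored = sq8_decode(codes, lo, span)
print(vectors.nbytes / codes.nbytes)              # 4.0, i.e. 75% smaller
print(float(np.abs(vectors - restored).max()))    # bounded by about span / 510 per dimension
```

Each component is off by at most half a quantization step, which is why distances computed on the codes are close to the exact ones but not identical.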
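IVF_PQ
The inverted lists store PQ (product quantization) codes. It has the highest compression ratio of the three, but also the largest accuracy loss.
```python
from pymilvus import MilvusClient

index_params = MilvusClient.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="IVF_PQ",
    metric_type="L2",
    # m must divide the vector dimension: 768 / 96 = 8 components per sub-vector
    params={"nlist": 1024, "m": 96, "nbits": 8},
)
```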
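The parameters m and nbits determine how each vector is split and how many bytes its code takes. The arithmetic below uses the values from the snippet above. It is a sketch of the bookkeeping only and does no codebook training.

```python
import numpy as np

dim, m, nbits = 768, 96, 8           # values from the IVF_PQ snippet above
assert dim % m == 0, "dim must be divisible by m"

sub_dim = dim // m                   # components per sub-vector
codebook_size = 2 ** nbits           # centroids per sub-quantiser
code_bytes = m * nbits // 8          # bytes stored per vector
raw_bytes = dim * 4                  # float32 original

vector = np.arange(dim, dtype=np.float32)
sub_vectors = vector.reshape(m, sub_dim)   # each row is quantised independently

print(sub_vectors.shape)             # (96, 8)
print(codebook_size, code_bytes)     # 256 96
print(raw_bytes / code_bytes)        # 32.0
```

Each 8-component sub-vector is replaced by the index of its nearest codebook entry, so a 3072-byte vector shrinks to 96 bytes.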
Comparison of the three
| Variant | Memory (1 million × 768 dims) | Recall (nprobe=64) | Best for |
|---|---|---|---|
| IVF_FLAT | ~2.9 GB | 95% | Accuracy first |
| IVF_SQ8 | ~0.75 GB | 93% | Limited memory, acceptable accuracy |
| IVF_PQ (m=96) | ~0.1 GB | 85% | Very large scale, cost first |
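The memory column can be reproduced with back-of-the-envelope arithmetic. This sketch counts only the stored vector payload; centroids, ids, PQ codebooks, and other overhead are deliberately ignored, and `index_payload_gib` is a hypothetical helper.

```python
def index_payload_gib(n: int, bytes_per_vector: float) -> float:
    """Size of the stored vector payload in GiB (ignores centroids, ids and other overhead)."""
    return n * bytes_per_vector / 2**30


n, dim, m, nbits = 1_000_000, 768, 96, 8
estimates = {
    "IVF_FLAT": index_payload_gib(n, dim * 4),        # float32 per component
    "IVF_SQ8": index_payload_gib(n, dim * 1),         # one byte per component
    "IVF_PQ": index_payload_gib(n, m * nbits / 8),    # m codes of nbits each
}
for name, gib in estimates.items():
    print(f"{name:9} {gib:.2f} GiB")
```

The results (about 2.86, 0.72, and 0.09 GiB) are in line with the rounded figures in the table.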
Complete hands-on code
```python
import time

import numpy as np
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")
COLLECTION = "ivf_demo"
DIM = 768
N = 100_000

# Create the collection (drop any previous copy first)
if client.has_collection(COLLECTION):
    client.drop_collection(COLLECTION)
schema = MilvusClient.create_schema(auto_id=False)
schema.add_field(field_name="id", datatype=DataType.VARCHAR, is_primary=True, max_length=16)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=DIM)
index_params = MilvusClient.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 512},
)
client.create_collection(collection_name=COLLECTION, schema=schema, index_params=index_params)

# Write random, L2-normalized data in batches
batch_size = 5000
for i in range(0, N, batch_size):
    vectors = np.random.randn(batch_size, DIM).astype("float32")
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = (vectors / norms).tolist()
    data = [{"id": str(i + j), "embedding": vectors[j]} for j in range(batch_size)]
    client.upsert(collection_name=COLLECTION, data=data)
client.load_collection(COLLECTION)
print(f"Finished writing {N} rows")

# Search test: same query, increasing nprobe
query = np.random.randn(DIM).astype("float32")
query = (query / np.linalg.norm(query)).tolist()
for nprobe in [8, 16, 32, 64, 128]:
    start = time.perf_counter()
    results = client.search(
        collection_name=COLLECTION,
        data=[query],
        anns_field="embedding",
        search_params={"metric_type": "COSINE", "params": {"nprobe": nprobe}},
        limit=10,
    )
    elapsed = (time.perf_counter() - start) * 1000
    top_score = results[0][0]["distance"] if results[0] else 0
    print(f"nprobe={nprobe:3d} latency={elapsed:.1f}ms top1_score={top_score:.4f}")
```
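Choosing between IVF and HNSW
| Dimension | IVF_FLAT | HNSW |
|---|---|---|
| Memory | Low (raw vectors + centroids only) | High (raw vectors + graph structure) |
| Search latency | Medium (depends on nprobe) | Low (efficient graph navigation) |
| Build speed | Fast (KMeans converges quickly) | Medium (graph built by inserting vectors one at a time) |
| Recall ceiling | 100% when nprobe=nlist | Approaches 100% when ef is large enough |
| Incremental writes | New data is appended to the nearest list | Natively supports incremental inserts |
| Best for | Limited memory, large data volume | Ample memory, low-latency requirements |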
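Common errors
| Symptom | Cause | Fix |
|---|---|---|
| Very low recall | nprobe too small | Increase nprobe step by step until recall meets the target |
| Index build is very slow | nlist too large, so KMeans needs many iterations | Reduce nlist, or train KMeans on a sample of the data |
| Unstable search latency | Uneven list sizes | Increase nlist so the lists are more even |
| Poor IVF_PQ results | Too little data, so the codebook is undertrained | Use PQ only with more than 500 thousand vectors |
| nprobe=nlist is still slower than FLAT | Extra cost of the centroid distance computations | Use FLAT directly for small datasets |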
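Interview questions
- What are nlist=1 and nlist=N each equivalent to?
  nlist=1 is equivalent to FLAT (all vectors sit in one list). nlist=N puts every vector in its own list, so search degenerates into comparing the query against N centroids, which costs as much as brute force.
- Why does IVF recall have a ceiling?
  The true nearest neighbors of the query may not lie in the nprobe nearest clusters (the boundary effect). Only nprobe=nlist guarantees 100% recall.
- How does IVF_SQ8 quantization error affect search?
  SQ8 compresses each float32 to a uint8, so distance computations carry an error that can slightly reorder results. The effect on TopK is typically under 3% recall loss.
- Does an IVF index support incremental writes?
  In Milvus, yes. New data is assigned to the nearest existing centroid. When the data distribution shifts substantially, however, the old centroids may no longer be optimal and the index needs to be rebuilt.
- How can you tell whether nlist is reasonable?
  Look at the size distribution of the lists. A large standard deviation indicates that nlist does not match the data distribution.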
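The list-size check from the last answer can be run offline. The sketch below assumes you have the centroids on hand, for example from your own KMeans run; it does not use any Milvus API. `list_size_stats` is a hypothetical helper, and the random stand-in centroids are only there to make the example self-contained.

```python
import numpy as np


def list_size_stats(vectors: np.ndarray, centroids: np.ndarray) -> dict:
    """Assign each vector to its nearest centroid and summarise the list sizes."""
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    sizes = np.bincount(dists.argmin(axis=1), minlength=len(centroids))
    return {
        "min": int(sizes.min()),
        "max": int(sizes.max()),
        "mean": float(sizes.mean()),
        # Coefficient of variation: std relative to the mean list size.
        "cv": float(sizes.std() / sizes.mean()),
    }


rng = np.random.default_rng(0)
vectors = rng.normal(size=(4000, 16)).astype("float32")
centroids = vectors[rng.choice(len(vectors), size=16, replace=False)]  # stand-in centroids
stats = list_size_stats(vectors, centroids)
print(stats)
```

A coefficient of variation near zero means evenly filled lists. Values approaching or exceeding 1 suggest a few very long lists, which is the situation that makes search latency uneven.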
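Exercises
- nlist experiment: with 500 thousand vectors and nprobe fixed at 32, build indexes with nlist=128, 512, 1024, and 4096. Record build time, search latency, and recall.
- Variant comparison: on the same 1 million vectors, build IVF_FLAT, IVF_SQ8, and IVF_PQ indexes. Compare memory usage and search quality.
- nprobe curve: plot the full recall-latency curve as nprobe goes from 1 to nlist and find the knee.
- Comparison with HNSW: on the same data, build IVF_FLAT (nlist=1024, nprobe=64) and HNSW (M=16, ef=128), then compare memory, latency, and recall.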
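Summary
IVF is the classic implementation of the "partitioned search" idea. nlist controls partition granularity and nprobe controls search scope. The core of tuning is finding the nprobe "sweet spot": the point where recall meets the target and latency stays acceptable. IVF's strength is memory efficiency; its weaknesses are that it needs parameter tuning and that incremental writes can degrade the clustering.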