A Deep Dive into the NVIDIA Blackwell Architecture

Before you get burned by a thousand-GPU cluster, understand what Blackwell actually changed

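Anyone doing large-model training or inference knows the feeling: once you scale out, the bottleneck is rarely single-GPU compute. It is inter-GPU communication and the memory wall. The Blackwell architecture NVIDIA launched in 2024 is aimed squarely at those two pain points. This article walks through Blackwell's chip design, NVLink interconnect, and rack-level product lineup, with no fluff.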


1. From Hopper to Blackwell: why the single-die race ended

Start with the core change: a Blackwell GPU is no longer one monolithic die but two dies joined by a high-speed interface.

Figure: H100 single-die package (one reticle-sized die, 80B transistors, TSMC 4N) versus B200 dual-die package (Die 0 and Die 1 linked by NV-HBI at ~10 TB/s). Edge labels: single-die performance gain of only about 15%, power increase of about 35%.

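Hopper's single-die GPU (H100) had already squeezed most of what the TSMC 4N process could offer: piling more transistors onto one die yields performance gains that fall far behind the growth in power. So Blackwell took a different path, using advanced packaging to co-package two dies linked by NV-HBI (High Bandwidth Interface), NVIDIA's in-house die-to-die interconnect, at roughly 10 TB/s.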

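A B200 totals about 208B transistors, roughly 2.6 times the H100's 80B. From the OS's point of view, though, the two dies still appear as one complete GPU sharing 8 HBM stacks.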

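What does this change mean? Per-GPU compute no longer grows through process shrinks but through chiplet assembly. AMD's MI300 series took a similar route (eight dies in one package), so this is arguably the industry consensus for the post-Moore era.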


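2. Blackwell chip architecture: the key upgrades

2.1 Low-precision compute: FP4 and FP6 arrive

On the training side FP16/BF16 is still the workhorse, but inference has fully embraced lower precision. Beyond FP8, Blackwell adds FP6 and FP4 support, plus microscaling (MX) formats such as MXFP8, which use UE8M0 block scale factors: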

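Precision | Role | Typical use
FP16/BF16 | Training workhorse | Large-model pretraining and fine-tuning
FP8 | Inference plus some training | Supported since Hopper, carried forward in Blackwell
FP6 | Intermediate precision | MoE model inference
FP4 | Most aggressive inference precision | Very-large-scale model inference deployment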

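FP4 throughput is roughly twice that of FP8. A fully loaded NVL72 rack running FP4 reaches a striking total: a single rack lands in the hundreds of PFLOPS. Note, however, that FP4 currently serves inference for the most part; quantizing training down to FP4 still costs too much accuracy.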
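
To make the accuracy cost of FP4 concrete, here is a minimal sketch of block-scaled FP4 quantization. It assumes the E2M1 layout (1 sign, 2 exponent, 1 mantissa bit), whose representable magnitudes are listed in the code, and a single power-of-two scale per block in the spirit of microscaling formats. The function names, the sample weights, and the plain nearest-value rounding are illustrative assumptions, not a description of what Blackwell hardware does.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def block_scale(block):
    """Smallest power-of-two scale that maps the block's largest magnitude to <= 6.0."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(amax / 6.0))


def quantize_fp4(block):
    """Round each value to the nearest scaled E2M1 magnitude (illustrative rounding)."""
    scale = block_scale(block)
    out = []
    for x in block:
        # Nearest representable magnitude after dividing by the shared scale.
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        out.append(math.copysign(mag * scale, x))
    return out, scale


# Hypothetical weight block; real blocks would come from a model tensor.
weights = [0.02, -0.13, 0.4, 0.8, -1.1, 2.6, 0.0, 5.5]
dequantized, scale = quantize_fp4(weights)
max_err = max(abs(a - b) for a, b in zip(weights, dequantized))
print("scale:", scale)
print("dequantized:", dequantized)
print("max abs error:", max_err)
```

With only eight magnitudes per sign, small values in a block collapse to zero whenever one large value sets the scale, which is one intuition for why FP4 is easier to tolerate in inference than in training.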

2.2 SerDes upgraded to 224G PAM4

SerDes speed doubles from Hopper's 112G PAM4 to 224G PAM4. This affects not only NVLink but also the NIC side: ConnectX-8 supports 800 Gb/s, and 224G SerDes is what makes that possible.

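2.3 HBM3E memory

The entire Blackwell family uses HBM3E. B200 carries 192 GB (on some SKUs) with about 8 TB/s of bandwidth. Growth in HBM capacity and bandwidth matters most for inference, because the KV cache footprint directly bounds the maximum context length.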

Figure: HBM capacity per GPU across generations: A100 (2020), H100 (2022), H200 (2023), B200 (2024), GB300 Ultra (~2025). Y-axis: HBM capacity (GB), 0 to 350.
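
The link between HBM capacity and context length can be estimated with simple arithmetic. The sketch below assumes a standard transformer KV cache (one K and one V tensor per layer per token). The model shape (80 layers, 8 KV heads, head dimension 128), the 70 GB weight footprint, and the function names are hypothetical choices for illustration; it also ignores activations and runtime overhead.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem):
    """KV cache size: one K and one V tensor per layer, for every token in the batch."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def max_context_tokens(hbm_bytes, weight_bytes, num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """Longest single-sequence context that fits in the HBM left over after weights."""
    per_token = kv_cache_bytes(num_layers, num_kv_heads, head_dim, 1, 1, bytes_per_elem)
    return max(0, (hbm_bytes - weight_bytes) // per_token)


GB = 10**9
# Hypothetical 70B-class model with grouped-query attention (assumed shape).
layers, kv_heads, head_dim = 80, 8, 128
weights = 70 * GB  # assumed weight footprint, e.g. 70B parameters at 1 byte each

per_token = kv_cache_bytes(layers, kv_heads, head_dim, 1, 1, bytes_per_elem=2)  # 16-bit cache
tokens_80 = max_context_tokens(80 * GB, weights, layers, kv_heads, head_dim, 2)    # 80 GB card
tokens_192 = max_context_tokens(192 * GB, weights, layers, kv_heads, head_dim, 2)  # 192 GB card

print("KV bytes per token:", per_token)
print("max tokens with 80 GB:", tokens_80)
print("max tokens with 192 GB:", tokens_192)
```

Under these assumptions the extra HBM goes almost entirely to KV cache, so the usable context grows much faster than the raw capacity ratio suggests.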


3. NVLink and NVSwitch interconnect

3.1 Five generations of NVLink

NVLink has iterated from NVLink 1.0 in the Pascal era to NVLink 5.0 in Blackwell, and the progression is easy to read:

Generation | NVLink version | Links per GPU × per-link bandwidth | Total bidirectional bandwidth per GPU
Pascal (2016) | NVLink 1.0 | 4 × 40 GB/s | 160 GB/s
Ampere (2020) | NVLink 3.0 | 12 × 50 GB/s | 600 GB/s
Hopper (2022) | NVLink 4.0 | 18 × 50 GB/s | 900 GB/s
Blackwell (2024) | NVLink 5.0 | 18 × 100 GB/s | 1800 GB/s

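Each NVLink link stays at 2 lanes, unchanged from Hopper, but the per-lane rate doubles from 112G PAM4 to 224G PAM4, so the total bidirectional bandwidth per GPU doubles from 900 GB/s to 1800 GB/s.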
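
The lane arithmetic behind those totals can be sketched as follows. Two assumptions are baked in and are not stated in this article: the 2 lanes are counted per direction, and the usable payload rate is taken as a nominal 100 Gb/s and 200 Gb/s per lane for the 112G and 224G SerDes classes. The function name is hypothetical.

```python
def nvlink_bandwidth_gbytes(links, lanes_per_direction, lane_gbps):
    """Return (per-link bidirectional GB/s, per-GPU bidirectional GB/s)."""
    per_direction = lanes_per_direction * lane_gbps / 8  # Gb/s -> GB/s
    per_link = 2 * per_direction                         # count both directions
    return per_link, links * per_link


# Assumed nominal payload rates: 100 Gb/s (112G class), 200 Gb/s (224G class).
hopper = nvlink_bandwidth_gbytes(links=18, lanes_per_direction=2, lane_gbps=100)
blackwell = nvlink_bandwidth_gbytes(links=18, lanes_per_direction=2, lane_gbps=200)

print("Hopper    (per link, per GPU):", hopper)
print("Blackwell (per link, per GPU):", blackwell)
```

Keeping the link count and lane count fixed while doubling the lane rate is the whole story of the NVLink 4.0 to 5.0 step in this model.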

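3.2 NVSwitch: port count caps the size of the L1 domain

The NVSwitch port count directly determines how many GPUs fit in an L1 interconnect domain. It is a critical constraint that is easy to overlook: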

NVSwitch generation | Ports per chip | Lanes per port | NVLink links per GPU | Max GPUs in L1 domain
NVSwitch 2.0 (Ampere) | 36 | 2 | 12 | 8 (NVL8)
NVSwitch 3.0 (Hopper) | 64 | 2 | 18 | 8 (NVL8)
NVSwitch 5.0 (Blackwell) | 72 | 2 | 18 | 72 (NVL72)

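A single Blackwell NVSwitch chip provides 72 ports, and together with the doubled NVLink bandwidth this lets the fully connected L1 domain jump from 8 GPUs in the Hopper era to 72. That is the precondition for NVL72.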

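In one sentence: NVLink links per GPU × bandwidth per link = a GPU's egress bandwidth, and GPU count × links per GPU must fit within the total switch ports (switch chips × ports per chip) for the domain to be fully connected. Both conditions have to hold.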
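
Both conditions are easy to check with a few lines. The sketch below uses the NVL72 numbers given in this article (72 GPUs, 18 links each, 9 switch trays × 2 chips, 72 ports per chip); the function names and the 64-port counterexample are hypothetical illustrations.

```python
def gpu_egress_gbytes(links_per_gpu, link_gbytes):
    """Egress bandwidth of one GPU: number of links times bandwidth per link."""
    return links_per_gpu * link_gbytes


def l1_port_budget(num_gpus, links_per_gpu, switch_chips, ports_per_chip):
    """Compare GPU-side links with switch-side ports; returns (links, ports, fits)."""
    gpu_links = num_gpus * links_per_gpu
    switch_ports = switch_chips * ports_per_chip
    return gpu_links, switch_ports, gpu_links <= switch_ports


egress = gpu_egress_gbytes(links_per_gpu=18, link_gbytes=100)

# NVL72: 9 switch trays x 2 chips = 18 chips, 72 ports each.
nvl72 = l1_port_budget(num_gpus=72, links_per_gpu=18, switch_chips=18, ports_per_chip=72)

# Hypothetical: the same 18 chips with only 64 ports each would fall short.
with_64_ports = l1_port_budget(num_gpus=72, links_per_gpu=18, switch_chips=18, ports_per_chip=64)

print("per-GPU egress GB/s:", egress)
print("NVL72 budget:", nvl72)
print("64-port budget:", with_64_ports)
```

In the NVL72 case the budget is exactly balanced: every switch port is consumed by a GPU link, which is why the port count per chip is the binding constraint.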


4. Product matrix: choosing among B200, B300, GB200 and GB300

The Blackwell family lineup is admittedly dizzying, so here it is layer by layer:

4.1 Chip level

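Chip | Die configuration | GPUs seen by the OS | Positioning
B100 | One package, dual die | 1 | Early version
B200 | One package, dual die | 1 | Mainstream SXM version
B300A | One package, single die | 1 | Cost-optimized version, pairs with NVL16
B300 Ultra | One package, dual die | 1 | Flagship version, pairs with GB300 NVL72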

B300A uses a single-die design with 144 GB of memory and lower power; B300 Ultra is the full dual-die version.

4.2 Superchips: GB200 and GB300

Figure: GB200 Superchip. A Grace CPU (72 Arm cores) connects to 2 × B200 GPUs over NVLink-C2C (~900 GB/s). The unified memory view combines 480 GB of LPDDR5X from the Grace CPU with 384 GB of HBM3E from the two B200s.

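GB200 ties 1 Grace CPU and 2 B200 GPUs together over NVLink-C2C to form one Superchip. NVLink-C2C is cache-coherent and runs at about 900 GB/s between CPU and GPU, so the two sides share a single memory address space instead of relying on traditional DMA copies over PCIe.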

GB300 upgrades to 2 B300 Ultra GPUs; the overall approach is the same.
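
As a quick sanity check on the unified memory view of a GB200 Superchip, the capacities quoted in this article add up as follows. The rack-level figure assumes 36 Superchips per NVL72 rack (72 GPUs at 2 per Superchip); the constant and function names are hypothetical.

```python
LPDDR5X_GB_PER_GRACE = 480  # CPU-attached memory per Grace
HBM3E_GB_PER_B200 = 192     # GPU-attached memory per B200


def superchip_memory_gb(gpus_per_superchip=2):
    """Total addressable memory of one Superchip: CPU LPDDR5X plus GPU HBM3E."""
    return LPDDR5X_GB_PER_GRACE + gpus_per_superchip * HBM3E_GB_PER_B200


def rack_memory_gb(superchips=36):
    """Total memory across a rack, assuming 36 Superchips (72 GPUs)."""
    return superchips * superchip_memory_gb()


superchip = superchip_memory_gb()
rack = rack_memory_gb()
rack_hbm = 72 * HBM3E_GB_PER_B200

print("per Superchip (GB):", superchip)
print("per rack, all memory (GB):", rack)
print("per rack, HBM3E only (GB):", rack_hbm)
```

More than half of a Superchip's capacity sits on the CPU side, which only pays off because NVLink-C2C makes that memory reachable from the GPUs at far better than PCIe speeds.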

4.3 Rack-level products

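Product | GPUs | Form factor / NVSwitch configuration | Positioning
HGX B300A NVL16 | 16 × B300A | SXM form factor | Upgrade path for traditional 8/16-GPU servers
DGX B300 Ultra | 8 × B300 Ultra | SXM form factor | High-performance single node
GB200 NVL36 | 36 × B200 | 9 × NVSwitch trays | Half-size L1 domain
GB200 NVL72 | 72 × B200 | 9 × NVSwitch trays | Fully connected L1 domain
GB300 NVL72 | 72 × B300 Ultra | 9 × NVSwitch trays | Upgraded fully connected L1 domain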

5. NVL72: why build a 72-GPU fully connected rack

5.1 Physical composition

An NVL72 rack contains:

  • 18 × compute trays (1RU): each tray holds 2 × GB200, i.e. 4 × B200 GPUs
  • 9 × NVSwitch trays (1RU): each tray holds 2 × NVSwitch chips, with 72 ports per chip
  • A copper cable backplane
  • CX-8 SuperNICs at 800 Gb/s

18 compute trays × 4 B200s each = 72 GPUs, fully connected within the L1 domain, with up to 1800 GB/s of NVLink bandwidth between any two GPUs.
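
One way to see why the tray counts line up is to build the rack as a small graph. The sketch below assumes a wiring in which every GPU sends exactly one of its 18 NVLink links to each of the 18 switch chips; that wiring is an assumption that matches the counts in this article, not something taken from a published schematic, and all names are hypothetical.

```python
from itertools import combinations


def build_nvl72(compute_trays=18, gpus_per_tray=4, switch_trays=9, chips_per_tray=2):
    """Return GPU names, switch chip names, and the set of chips each GPU links to."""
    gpus = [f"gpu{t}-{g}" for t in range(compute_trays) for g in range(gpus_per_tray)]
    chips = [f"sw{t}-{c}" for t in range(switch_trays) for c in range(chips_per_tray)]
    # Assumed wiring: one link from every GPU to every switch chip.
    links = {gpu: set(chips) for gpu in gpus}
    return gpus, chips, links


gpus, chips, links = build_nvl72()

# Ports consumed on each switch chip: one per attached GPU.
ports_used = {chip: sum(chip in links[g] for g in gpus) for chip in chips}

# Any two GPUs are one switch hop apart if they share at least one chip.
single_hop = all(links[a] & links[b] for a, b in combinations(gpus, 2))

print("GPUs:", len(gpus), "switch chips:", len(chips))
print("links per GPU:", {len(v) for v in links.values()})
print("ports used per chip:", set(ports_used.values()))
print("every GPU pair one hop apart:", single_hop)
```

Under this wiring each chip uses all 72 of its ports and every GPU uses all 18 of its links, so there is no slack on either side of the fabric.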

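5.2 What NVL72 actually buys you

Anyone doing large-model inference will recognize this scenario: the decode phase of MoE models is where EP (Expert Parallelism) hurts most. The bandwidth and latency of all-to-all communication directly cap TPS (tokens per second).

Compared with stitching together multiple NVL8 machines, NVL72's fundamental advantages are: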

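  • Non-blocking all-to-all inside the L1 domain: all 72 GPUs are fully connected, with no need to cross racks over IB/RoCE
  • Much lower latency: a copper backplane replaces optical modules plus switches, cutting latency by about an order of magnitude
  • Higher total throughput per watt: for the same EP64 job, NVL72's TPS ceiling is far above that of a multi-node setup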

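Stay clear-eyed, though: the problem NVL72 solves is scale-up communication. If your parallelism strategy is dominated by DP (Data Parallelism), the gain from NVL72 is limited. Which product form to pick depends on your parallelism strategy.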
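
A rough, bandwidth-only model shows where the EP64 gap comes from. Everything here is a simplifying assumption: each GPU spreads its all-to-all payload evenly over its peers, NVLink offers 900 GB/s per direction (half of the 1800 GB/s bidirectional figure), each GPU has one 800 Gb/s NIC (100 GB/s) for traffic leaving its L1 domain, the 64 MB payload is hypothetical, and latency and congestion are ignored.

```python
def all_to_all_seconds(num_gpus, gpus_per_l1_domain, bytes_per_gpu, nvlink_gbytes, nic_gbytes):
    """Bandwidth-only estimate of one all-to-all step, as seen by a single GPU."""
    peers = num_gpus - 1
    local_peers = min(num_gpus, gpus_per_l1_domain) - 1  # reachable over NVLink
    remote_peers = peers - local_peers                   # reachable only via the NIC
    per_peer = bytes_per_gpu / peers
    local_time = local_peers * per_peer / (nvlink_gbytes * 1e9)
    remote_time = remote_peers * per_peer / (nic_gbytes * 1e9)
    # Assume NVLink and NIC transfers overlap, so the slower path dominates.
    return max(local_time, remote_time)


payload = 64e6  # hypothetical bytes sent per GPU in one all-to-all

nvl72 = all_to_all_seconds(64, 72, payload, nvlink_gbytes=900, nic_gbytes=100)
nvl8 = all_to_all_seconds(64, 8, payload, nvlink_gbytes=900, nic_gbytes=100)

print("NVL72 estimate (s):", nvl72)
print("8 x NVL8 estimate (s):", nvl8)
print("slowdown of NVL8 cluster:", nvl8 / nvl72)
```

In the NVL8 case 56 of the 63 peers sit behind the NIC, so almost all of the traffic runs at NIC speed. For a DP-dominated job the all-to-all term is small to begin with, and the same model predicts little benefit.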


6. Roadmap: Rubin is on the way

NVIDIA has already published its plans for the next two generations:

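Generation | Time window | Representative GPUs | NVLink | Key changes
Hopper | 2022-2023 | H100/H200 | NVLink 4.0 | First to introduce FP8
Blackwell | 2024-2026 | B200/B300 | NVLink 5.0 | Dual-die packaging, FP4, NVL72
Rubin | ~2026-2027 | VR200/VR300 | NVLink 6.0/7.0 | Four-die packaging (Rubin Ultra), HBM4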

Rubin Ultra will use a chiplet design of 4 reticle-sized dies plus 2 I/O chiplets, with an FP4 compute target on the order of 100 PFLOPS. HBM moves to HBM4 (Rubin) and then HBM4E (Rubin Ultra).

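On the interconnect side, NVLink 6.0 goes from 2 lanes per link to 4 (each lane stays at 200G), doubling unidirectional bandwidth from 900 GB/s to 1800 GB/s. NVSwitch ports per chip grow from 72 to 144, which means the ceiling on L1 domain size will keep rising.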

Figure: Generation roadmap. Hopper 2022 (single die, FP8) → Blackwell 2024 (dual die, FP4/FP6, NVL72) → Rubin 2026 (four dies in Rubin Ultra, HBM4/HBM4E, NVL144/NVL576). Edge labels: process + precision; chiplet scale-up.


7. Summary: trends worth watching

Pulling the whole thread together, Blackwell shows several clear industry trends:

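Dimension | Trend | How Blackwell shows it
Chip design | Single die → multi-die chiplet packaging | B200's two dies linked by NV-HBI
Precision | Inference precision pushing down to FP4 | First support for FP6/FP4
Memory | HBM capacity and bandwidth keep growing | HBM3E 192 GB → 288 GB
Interconnect | L1 domain jumps from 8 to 72 GPUs | NVSwitch 5.0 + NVLink 5.0
Rack | From single servers to integrated whole-rack delivery | NVL72 is a complete product
Network | Scale-out NICs enter the 800G era | CX-8 at 800 Gb/s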

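The essential question for anyone building large-model infrastructure: as parameter counts keep ballooning (toward the ten-trillion range) and context lengths reach millions of tokens, single-GPU compute growth can no longer keep up with demand. The competitive focus will shift from "how strong is one GPU" to "how much usable compute and memory bandwidth can one rack deliver".

Blackwell's answer is dual-die chiplets plus the fully connected NVL72 rack, and that approach will most likely carry into the next several generations.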


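This article is compiled from product information and architecture whitepapers publicly released by NVIDIA; the architecture analysis relies on public information only. All performance figures come from NVIDIA's official product specifications.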