19-Hugging Face Transformers: A Detailed Look at the Qwen3.5-MoE Series (Mixture of Experts + Linear Attention + Multimodal), the Complete Lifecycle


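Using the Qwen3.5-MoE model as a worked example, this document ties together all the modules of the Transformers framework and traces the complete lifecycle, inside Transformers, of a model that combines mixture of experts, multimodality, and linear attention.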

Source files:

  • configuration_qwen3_5_moe.py(file:///workspace/src/transformers/models/qwen3_5_moe/configuration_qwen3_5_moe.py)
  • modeling_qwen3_5_moe.py(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py)
  • modular_qwen3_5_moe.py(file:///workspace/src/transformers/models/qwen3_5_moe/modular_qwen3_5_moe.py)

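Related articles:

A Panoramic Reading of the Hugging Face Transformers Source Code

01-Hugging Face Transformers: Core Infrastructure in Depth

02-Hugging Face Transformers: The Configuration System in Depth

03-Hugging Face Transformers: The Model System in Depth

04-Hugging Face Transformers: The Attention and Masking System in Depth

05-Hugging Face Transformers: The Cache System in Depth

06-Hugging Face Transformers: The Generation System in Depth

07-Hugging Face Transformers: The Tokenizer System in Depth

08-Hugging Face Transformers: The Multimodal Processing System in Depth

09-Hugging Face Transformers: The Training System in Depth

10-Hugging Face Transformers: The Quantization System in Depth

11-Hugging Face Transformers: Distributed and Parallel Systems in Depth

12-Hugging Face Transformers: The Pipeline Inference API in Depth

13-Hugging Face Transformers: The AutoModel Auto-Dispatch Mechanism in Depth

14-Hugging Face Transformers: Model Implementation Patterns in Depth

15-Hugging Face Transformers: CLI and Tooling Architecture Overview

16-Hugging Face Transformers: Test Suite Architecture Overview

17-Hugging Face Transformers: BERT Case Study, Connecting Every Module of the Transformers Framework

18-Hugging Face Transformers: GPT-2 Case Study, the Complete Lifecycle of a Decoder-only Autoregressive Model

19-Hugging Face Transformers: A Detailed Look at the Qwen3.5-MoE Series (Mixture of Experts + Linear Attention + Multimodal), the Complete Lifecycle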

1. Where Qwen3.5-MoE Fits in Transformers

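Qwen3.5-MoE is the most advanced hybrid-architecture model in the Qwen series. It combines three innovations: hybrid attention layers (full_attention and linear_attention layers alternating), MoE expert routing (256 experts with Top-8 routing plus a shared expert), and a multimodal vision encoder (Vision Transformer + PatchMerger).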

1.1 Architecture positioning diagram

Qwen model family (diagram):

  • Qwen2: dense + standard attention
  • Qwen2-VL: dense + multimodal + standard attention
  • Qwen2-MoE: MoE + standard attention
  • Qwen3: dense + standard attention + thinking mode
  • Qwen3-VL: dense + multimodal + standard attention
  • Qwen3-MoE: MoE + standard attention
  • Qwen3-VL-MoE: MoE + multimodal + standard attention
  • Qwen3-Next: MoE + hybrid attention
  • Qwen3.5: dense + hybrid attention
  • Qwen3.5-MoE (this article): MoE + hybrid attention + multimodal

1.2 The three innovations at a glance

Innovation 1: hybrid attention layers (the two layer types alternate every 4 layers)

  • full_attention layer: standard softmax attention + QK Norm + Gate
  • linear_attention layer: GatedDeltaNet + causal convolution + gated delta rule

Innovation 2: MoE expert routing

  • TopKRouter: 256 experts, Top-8
  • Qwen3_5MoeExperts: 3D parameter tensors
  • SharedExpert + SharedExpertGate

Innovation 3: multimodal vision encoder

  • PatchEmbed: 3D convolution
  • VisionBlocks × 27 + rotary position embedding
  • PatchMerger: spatial merge + projection
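To make the routing innovation concrete, here is a minimal pure-Python sketch of Top-8 routing over 256 experts with a gated shared expert. It is an illustration rather than the library's code: the helper names (`route_token`, `moe_output`), the renormalization of the selected weights, and the sigmoid gate on the shared expert are assumptions of this sketch, and the "experts" are toy scalar functions.

```python
import math
import random

NUM_EXPERTS = 256  # from the text: 256 routed experts
TOP_K = 8          # from the text: Top-8 routing


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and renormalize their weights.

    Renormalizing over the selected experts is an assumption of this sketch.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_output(routed, expert_fn, shared_fn, shared_gate_logit, x):
    """Weighted sum of the routed experts plus a sigmoid-gated shared expert."""
    out = sum(w * expert_fn(i, x) for i, w in routed)
    gate = 1.0 / (1.0 + math.exp(-shared_gate_logit))
    return out + gate * shared_fn(x)


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routed = route_token(logits)

# Toy scalar "experts": expert i scales its input by (i + 1) / NUM_EXPERTS.
y = moe_output(
    routed,
    expert_fn=lambda i, x: x * (i + 1) / NUM_EXPERTS,
    shared_fn=lambda x: 0.5 * x,
    shared_gate_logit=0.0,
    x=2.0,
)
print("experts used:", len(routed), "output:", y)
```

Only 8 of the 256 experts run for a given token, which is why a model with many total parameters can keep its per-token compute small; the shared expert runs for every token regardless of the router's choice.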

1.3 Inheritance relationships

As modular_qwen3_5_moe.py(file:///workspace/src/transformers/models/qwen3_5_moe/modular_qwen3_5_moe.py) shows, the Qwen3.5-MoE class inheritance chain is straightforward:

| Qwen3.5-MoE class | Direct parent class | Source module |
| --- | --- | --- |
| Qwen3_5MoeTextConfig | Qwen3NextConfig | qwen3_next |
| Qwen3_5MoeVisionConfig | Qwen3_5VisionConfig | qwen3_5 |
| Qwen3_5MoeConfig | Qwen3VLConfig | qwen3_vl |
| Qwen3_5MoeGatedDeltaNet | Qwen3_5GatedDeltaNet | qwen3_5 |
| Qwen3_5MoeAttention | Qwen3NextAttention | qwen3_next |
| Qwen3_5MoeExperts | Qwen3NextExperts | qwen3_next |
| Qwen3_5MoeTopKRouter | Qwen3VLMoeTextTopKRouter | qwen3_vl_moe |
| Qwen3_5MoeSparseMoeBlock | Qwen3NextSparseMoeBlock | qwen3_next |
| Qwen3_5MoeForConditionalGeneration | Qwen3VLMoeForConditionalGeneration | qwen3_vl_moe |

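2. The Three-Level Nested Config Design

Qwen3.5-MoE uses a nested design built from three Config classes: the top-level Qwen3_5MoeConfig holds two sub-configs, text_config and vision_config.

2.1 Config nesting class diagram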

Class diagram: Qwen3_5MoeConfig is linked to Qwen3_5MoeTextConfig by the edge labeled text_config and to Qwen3_5MoeVisionConfig by the edge labeled vision_config.

PreTrainedConfig

  • model_type: str
  • post_init()
  • to_dict()

Qwen3_5MoeTextConfig

  • model_type = "qwen3_5_moe_text"
  • base_config_key = "text_config"
  • vocab_size: int = 248320
  • hidden_size: int = 2048
  • num_hidden_layers: int = 40
  • num_attention_heads: int = 16
  • num_key_value_heads: int = 2
  • head_dim: int = 256
  • num_experts: int = 256
  • num_experts_per_tok: int = 8
  • moe_intermediate_size: int = 512
  • shared_expert_intermediate_size: int = 512
  • layer_types: list<str> | None
  • linear_conv_kernel_dim: int = 4
  • linear_key_head_dim: int = 128
  • linear_value_head_dim: int = 128
  • linear_num_key_heads: int = 16
  • linear_num_value_heads: int = 32
  • base_model_tp_plan: dict
  • base_model_pp_plan: dict
  • post_init()

Qwen3_5MoeVisionConfig

  • model_type = "qwen3_5_moe_vision"
  • base_config_key = "vision_config"
  • depth: int = 27
  • hidden_size: int = 1152
  • intermediate_size: int = 4304
  • num_heads: int = 16
  • in_channels: int = 3
  • patch_size: int = 16
  • spatial_merge_size: int = 2
  • temporal_patch_size: int = 2
  • out_hidden_size: int = 3584
  • num_position_embeddings: int = 2304

Qwen3_5MoeConfig

  • model_type = "qwen3_5_moe"
  • sub_configs: dict
  • text_config: Qwen3_5MoeTextConfig
  • vision_config: Qwen3_5MoeVisionConfig
  • image_token_id: int = 248056
  • video_token_id: int = 248057
  • vision_start_token_id: int = 248053
  • vision_end_token_id: int = 248054
  • post_init()
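The text-config defaults in the class diagram imply a few derived sizes that are worth working out once. The sketch below copies those defaults into a plain dataclass (`TextDefaults` is a stand-in for illustration, not the real config class) and does the arithmetic. The packed expert layout, with the gate and up projections fused into one tensor, is an assumption of this sketch, and the parameter counts cover the routed experts only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextDefaults:
    # Values copied from the Qwen3_5MoeTextConfig defaults in the class diagram.
    hidden_size: int = 2048
    num_hidden_layers: int = 40
    num_attention_heads: int = 16
    num_key_value_heads: int = 2
    head_dim: int = 256
    num_experts: int = 256
    num_experts_per_tok: int = 8
    moe_intermediate_size: int = 512
    linear_key_head_dim: int = 128
    linear_value_head_dim: int = 128
    linear_num_key_heads: int = 16
    linear_num_value_heads: int = 32


cfg = TextDefaults()

# Grouped-query attention: how many query heads share one key/value head.
gqa_group = cfg.num_attention_heads // cfg.num_key_value_heads

# Total head widths for the full-attention and linear-attention layers.
full_attn_width = cfg.num_attention_heads * cfg.head_dim
linear_key_width = cfg.linear_num_key_heads * cfg.linear_key_head_dim
linear_value_width = cfg.linear_num_value_heads * cfg.linear_value_head_dim

# Assumed packed layout: one 3D tensor per projection, indexed by expert first.
gate_up_shape = (cfg.num_experts, 2 * cfg.moe_intermediate_size, cfg.hidden_size)
down_shape = (cfg.num_experts, cfg.hidden_size, cfg.moe_intermediate_size)


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


routed_params_per_layer = numel(gate_up_shape) + numel(down_shape)
routed_params_total = routed_params_per_layer * cfg.num_hidden_layers
# Only num_experts_per_tok of the num_experts routed experts run per token.
routed_params_active = routed_params_total * cfg.num_experts_per_tok // cfg.num_experts

print("GQA group size:", gqa_group)
print("expert tensors:", gate_up_shape, down_shape)
print("routed expert params (total / active per token):", routed_params_total, routed_params_active)
```

Under these assumptions the routed experts account for most of the model's parameters, while only 8/256 of them participate in any single token's forward pass.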

2.2 The sub_configs mechanism

sub_configs is the standard mechanism for multimodal models in Transformers. For this model it is defined at configuration_qwen3_5_moe.py:171(file:///workspace/src/transformers/models/qwen3_5_moe/configuration_qwen3_5_moe.py#L171):

```python
class Qwen3_5MoeConfig(PreTrainedConfig):
    # Maps each sub-config attribute name to the class used to instantiate it.
    sub_configs = {"vision_config": Qwen3_5MoeVisionConfig, "text_config": Qwen3_5MoeTextConfig}
```

Sequence diagram: resolving the sub-configs (participants: config.json, Qwen3_5MoeConfig, Qwen3_5MoeTextConfig, Qwen3_5MoeVisionConfig)

  • config.json → Qwen3_5MoeConfig: load the top-level config.
  • post_init() checks text_config: if it is a dict, call Qwen3_5MoeTextConfig(**text_config); if it is None, call Qwen3_5MoeTextConfig() to use the default values.
  • post_init() checks vision_config: if it is a dict, call Qwen3_5MoeVisionConfig(**vision_config); if it is None, call Qwen3_5MoeVisionConfig() to use the default values.
  • Result: self.text_config and self.vision_config are both instantiated Config objects.

The key code is at configuration_qwen3_5_moe.py:183-194(file:///workspace/src/transformers/models/qwen3_5_moe/configuration_qwen3_5_moe.py#L183):

```python
def __post_init__(self, **kwargs):
    # A dict (e.g. parsed from config.json) is promoted to a config object;
    # None falls back to the sub-config's default values.
    if isinstance(self.vision_config, dict):
        self.vision_config = self.sub_configs["vision_config"](**self.vision_config)
    elif self.vision_config is None:
        self.vision_config = self.sub_configs["vision_config"]()

    if isinstance(self.text_config, dict):
        self.text_config = self.sub_configs["text_config"](**self.text_config)
    elif self.text_config is None:
        self.text_config = self.sub_configs["text_config"]()

    super().__post_init__(**kwargs)
```
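The same dict-or-None normalization can be tried without installing Transformers. The following sketch uses plain dataclasses as stand-ins; `TextConfigStub`, `VisionConfigStub`, and `CompositeConfigStub` are hypothetical names invented for this example, each models only two fields, and the loop over `sub_configs` is a compact rewrite of the two explicit branches rather than the library's code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TextConfigStub:
    # Stand-in for Qwen3_5MoeTextConfig; only two fields are modeled.
    hidden_size: int = 2048
    num_experts: int = 256


@dataclass
class VisionConfigStub:
    # Stand-in for Qwen3_5MoeVisionConfig; only two fields are modeled.
    depth: int = 27
    hidden_size: int = 1152


@dataclass
class CompositeConfigStub:
    # Plain class attribute (no annotation), so it is not a dataclass field.
    sub_configs = {"vision_config": VisionConfigStub, "text_config": TextConfigStub}

    text_config: Any = None
    vision_config: Any = None

    def __post_init__(self):
        for key, config_cls in self.sub_configs.items():
            value = getattr(self, key)
            if isinstance(value, dict):
                # A dict, e.g. parsed from JSON, becomes a config object.
                setattr(self, key, config_cls(**value))
            elif value is None:
                # A missing sub-config falls back to the defaults.
                setattr(self, key, config_cls())


# Shaped like a parsed config.json: one sub-config given as a dict, one absent.
raw = {"text_config": {"hidden_size": 1024}, "vision_config": None}
cfg = CompositeConfigStub(**raw)
print(cfg.text_config)
print(cfg.vision_config)
```

After construction, downstream code can rely on attribute access such as `cfg.text_config.hidden_size` without checking whether the sub-config arrived as a dict, an object, or not at all.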

2.3 base_model_tp_plan / base_model_pp_plan: declaring the parallelism strategy

Qwen3_5MoeTextConfig declares its tensor-parallel (TP) and pipeline-parallel (PP) plans at configuration_qwen3_5_moe.py:59-77(file:///workspace/src/transformers/models/qwen3_5_moe/configuration_qwen3_5_moe.py#L59):

PP plan (base_model_pp_plan):

  • embed_tokens: input input_ids; output inputs_embeds
  • layers: input hidden_states, attention_mask; output hidden_states
  • norm: input hidden_states; output hidden_states

TP plan (base_model_tp_plan):

  • q_proj → colwise
  • k_proj → colwise
  • v_proj → colwise
  • o_proj → rowwise
  • q_norm → replicated_with_grad_allreduce
  • k_norm → replicated_with_grad_allreduce
  • experts.gate_up_proj → packed_colwise
  • experts.down_proj → rowwise
  • experts → moe_tp_experts
  • shared_expert.gate_proj → colwise
  • shared_expert.up_proj → colwise
  • shared_expert.down_proj → rowwise
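The pairing of colwise projections with a rowwise projection in the TP plan follows a common tensor-parallel pattern: shard the first matrix along its output features, shard the second along its input features, and sum the per-rank partial results. The sketch below demonstrates that identity with tiny list-based matrices. It is a conceptual illustration only: the matrices use the `x @ W` (input × output) layout, the two "ranks" run in a simple loop, and none of the helper names come from Transformers.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(m, parts):
    """Shard a matrix along its output (column) axis: the "colwise" role."""
    width = len(m[0]) // parts
    return [[row[r * width:(r + 1) * width] for row in m] for r in range(parts)]


def split_rows(m, parts):
    """Shard a matrix along its input (row) axis: the "rowwise" role."""
    height = len(m) // parts
    return [m[r * height:(r + 1) * height] for r in range(parts)]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


x = [[1.0, 2.0]]                   # one token, hidden size 2
w_up = [[1.0, 0.0, 2.0, 1.0],      # 2 -> 4, sharded colwise
        [0.0, 1.0, 1.0, 3.0]]
w_down = [[1.0, 0.0],              # 4 -> 2, sharded rowwise
          [2.0, 1.0],
          [0.0, 1.0],
          [1.0, 1.0]]

# Unsharded reference result.
reference = matmul(matmul(x, w_up), w_down)

# Each "rank" holds one column shard of w_up and the matching row shard of w_down.
world_size = 2
partials = []
for up_shard, down_shard in zip(split_columns(w_up, world_size), split_rows(w_down, world_size)):
    hidden_slice = matmul(x, up_shard)            # no communication needed here
    partials.append(matmul(hidden_slice, down_shard))

# Summing the partial outputs plays the role of the all-reduce.
sharded = partials[0]
for part in partials[1:]:
    sharded = add(sharded, part)

print(reference, sharded)
```

Because the intermediate activations stay sharded between the two projections, one reduction at the end of the pair is enough, which is the reason the plan marks `o_proj` and the `down_proj` entries as rowwise.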


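3. The Complete from_pretrained Sequence

This section follows the whole flow from the call Qwen3_5MoeForConditionalGeneration.from_pretrained('Qwen/Qwen3.5-35B-A3B') to a model that is ready to use.

3.1 Sequence diagram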

[Sequence diagram: from_pretrained loading flow]

Participants: user code, AutoModelForCausalLM, Qwen3_5MoeConfig, Qwen3_5MoeTextConfig, Qwen3_5MoeVisionConfig, PreTrainedModel, Qwen3_5MoeForConditionalGeneration, Qwen3_5MoeModel, Qwen3_5MoeVisionModel, Qwen3_5MoeTextModel.

Messages, in order:

  1. from_pretrained('Qwen/Qwen3.5-35B-A3B')
  2. Instantiate the Config from config.json
  3. Parse the text_config dict → Qwen3_5MoeTextConfig
  4. Parse the vision_config dict → Qwen3_5MoeVisionConfig
  5. Qwen3_5MoeForConditionalGeneration(config)
  6. Qwen3_5MoeModel(config)
  7. Qwen3_5MoeVisionModel._from_config(config.vision_config)
  8. Qwen3_5MoeTextModel._from_config(config.text_config)
  9. self.lm_head = Linear(2048, 248320)
  10. load_state_dict() loads the weights
  11. Weight assignment complete
  12. post_init() → _init_weights()

Notes attached to the diagram:

  • Config parsing: post_init() automatically converts each dict into a Config object.
  • Text model construction: 40 DecoderLayers are built; each layer selects full_attention or linear_attention according to layer_types, and every layer uses Qwen3_5MoeSparseMoeBlock (256 experts + shared expert).
  • Special handling of MoE weights: experts.gate_up_proj has shape [256, 1024, 2048]; experts.down_proj has shape [256, 2048, 512].
  • Initialization values: GatedDeltaNet uses dt_bias=1 and A_log~U(0,16); RMSNorm uses weight=0 (1-centered); Experts use normal_(std=initializer_range).
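The config-parsing steps of this flow (sub-configs that arrive as plain dicts and are converted into config objects) can be sketched with standard-library dataclasses. This is a minimal illustration, not the Transformers implementation: `ToyTextConfig`, `ToyVisionConfig` and `ToyCompositeConfig` are hypothetical stand-ins, and `patch_size` is a placeholder field with an illustrative value.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class ToyTextConfig:
    # Values taken from the loading flow above; the class is a hypothetical stand-in.
    hidden_size: int = 2048
    num_hidden_layers: int = 40
    num_experts: int = 256
    vocab_size: int = 248320


@dataclass
class ToyVisionConfig:
    # Placeholder field with an illustrative value, not a real default.
    patch_size: int = 16


@dataclass
class ToyCompositeConfig:
    text_config: Union[dict, ToyTextConfig] = field(default_factory=dict)
    vision_config: Union[dict, ToyVisionConfig] = field(default_factory=dict)

    def __post_init__(self):
        # Sub-configs read from config.json arrive as dicts; convert them to objects.
        if isinstance(self.text_config, dict):
            self.text_config = ToyTextConfig(**self.text_config)
        if isinstance(self.vision_config, dict):
            self.vision_config = ToyVisionConfig(**self.vision_config)


raw = {"text_config": {"num_hidden_layers": 40}, "vision_config": {}}
config = ToyCompositeConfig(**raw)
print(type(config.text_config).__name__, config.text_config.hidden_size)
```

After construction, both sub-configs are typed objects, so the model classes can read attributes such as `config.text_config.hidden_size` directly.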

3.2 Special handling of MoE weight loading

Qwen3.5-MoE stores its expert weights as 3D tensors, which is a key difference between MoE models and dense models. In modeling_qwen3_5_moe.py:736-772(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L736):

```python
# Excerpt from the modeling file; nn, torch and use_experts_implementation are imported there.
@use_experts_implementation
class Qwen3_5MoeExperts(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.num_experts = config.num_experts          # 256
        self.hidden_dim = config.hidden_size           # 2048
        self.intermediate_dim = config.moe_intermediate_size  # 512
        # 3D parameter tensor: [num_experts, 2 * intermediate_dim, hidden_dim]
        self.gate_up_proj = nn.Parameter(torch.empty(self.num_experts, 2 * self.intermediate_dim, self.hidden_dim))
        # 3D parameter tensor: [num_experts, hidden_dim, intermediate_dim]
        self.down_proj = nn.Parameter(torch.empty(self.num_experts, self.hidden_dim, self.intermediate_dim))
```

[Flowchart: dense model weights (2D) vs. MoE model weights (3D)]

Dense model weights (2D): gate_proj [2048, 512]; up_proj [2048, 512]; down_proj [512, 2048].

MoE model weights (3D):

  • gate_up_proj [256, 1024, 2048]: the dense gate_proj and up_proj are fused into 3D; gate and up are stored together, and all 256 experts share one parameter tensor.
  • down_proj [256, 2048, 512]: the dense down_proj is expanded to 3D; all 256 experts share one parameter tensor.
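The shape arithmetic behind the fused tensors can be checked with a few lines of plain Python. This is a sketch only: `fused_expert_shapes` is a hypothetical helper, and the count covers just the two fused expert tensors, not the router or the shared expert.

```python
from math import prod


def fused_expert_shapes(num_experts: int, hidden_dim: int, intermediate_dim: int) -> dict:
    """Shapes of the two fused 3D expert tensors described above."""
    return {
        # gate_proj and up_proj are stacked along the output dimension,
        # hence 2 * intermediate_dim.
        "gate_up_proj": (num_experts, 2 * intermediate_dim, hidden_dim),
        "down_proj": (num_experts, hidden_dim, intermediate_dim),
    }


shapes = fused_expert_shapes(num_experts=256, hidden_dim=2048, intermediate_dim=512)
params_per_layer = sum(prod(shape) for shape in shapes.values())

for name, shape in shapes.items():
    print(name, shape, prod(shape))
print("routed-expert parameters per MoE layer:", params_per_layer)
```

With the values from the text this gives 805,306,368 routed-expert parameters per MoE layer, held in two tensors instead of 768 separate 2D matrices.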


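4. Hybrid attention layer architecture

The core design of Qwen3.5-MoE is that full_attention layers and linear_attention layers alternate, which applies a hybrid attention architecture at large scale.

4.1 Layer type distribution

layer_types is generated automatically in configuration_qwen3_5_moe.py:112-119(file:///workspace/src/transformers/models/qwen3_5_moe/configuration_qwen3_5_moe.py#L112), with full_attention_interval defaulting to 4: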

```python
def __post_init__(self, **kwargs):
    if self.layer_types is None:
        interval_pattern = kwargs.pop("full_attention_interval", 4)
        self.layer_types = [
            "linear_attention" if bool((i + 1) % interval_pattern) else "full_attention"
            for i in range(self.num_hidden_layers)
        ]
```

[Diagram: layer_types distribution across the 40 DecoderLayers]

  • Layers 0-2: linear_attention
  • Layer 3: full_attention
  • Layers 4-6: linear_attention
  • Layer 7: full_attention
  • ...
  • Layer 39: full_attention

Pattern: within every group of 4 layers, layers 0-2 are linear_attention and layer 3 is full_attention. The 40 layers therefore contain 10 full_attention layers and 30 linear_attention layers.
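The pattern can be reproduced outside the config class. The sketch below uses a hypothetical standalone helper, `build_layer_types`, that mirrors the list comprehension from the configuration file and counts the resulting layer types.

```python
from collections import Counter


def build_layer_types(num_hidden_layers: int, full_attention_interval: int = 4) -> list:
    """Every `full_attention_interval`-th layer (counting from 1) is full attention."""
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0 else "linear_attention"
        for i in range(num_hidden_layers)
    ]


layer_types = build_layer_types(40)
counts = Counter(layer_types)
full_layers = [i for i, t in enumerate(layer_types) if t == "full_attention"]

print(counts["full_attention"], counts["linear_attention"])  # 10 30
print(full_layers)  # [3, 7, 11, 15, 19, 23, 27, 31, 35, 39]
```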

4.2 Internal structure of Qwen3_5MoeAttention

Defined in modeling_qwen3_5_moe.py:642-716(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L642):
[Flowchart: Qwen3_5MoeAttention data flow]

  1. hidden_states [bs, seq, 2048]
  2. q_proj (2048 → 16×256×2 = 8192, output includes the gate); k_proj (2048 → 2×256 = 512); v_proj (2048 → 2×256 = 512)
  3. torch.chunk(dim=-1) splits the q_proj output into query and gate
  4. q_norm (RMSNorm, head_dim=256) on the query; k_norm (RMSNorm, head_dim=256) on the key
  5. apply_rotary_pos_emb: M-RoPE position encoding
  6. KV cache update: past_key_values.update()
  7. Attention interface: FlashAttn / SDPA / Eager
  8. Sigmoid gate: attn_output *= σ(gate)
  9. o_proj (16×256 → 2048)
  10. attn_output [bs, seq, 2048]

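Three key innovations

  1. QK Norm: q_norm and k_norm apply RMSNorm to Q and K, which stabilizes training.
  2. Gate mechanism: the q_proj output dimension is doubled (head_dim * 2); one half is used as the query, and the other half passes through a sigmoid and gates the attention output.
  3. M-RoPE: multimodal rotary position embedding, which supports 3D positions for text, images, and video.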
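The first two points in this list can be illustrated on a single head with plain Python lists instead of tensors, so the sketch runs without PyTorch. It assumes the 1-centered RMSNorm form (scale `1 + weight`, with `weight` initialized to 0) and leaves out RoPE and the attention computation itself; all function names are hypothetical.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm with a 1-centered weight: the scale is (1 + weight)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * (1.0 + w) for v, w in zip(x, weight)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def split_query_and_gate(q_proj_out, head_dim):
    """Split one head's q_proj output (length 2 * head_dim) into query and gate."""
    assert len(q_proj_out) == 2 * head_dim
    return q_proj_out[:head_dim], q_proj_out[head_dim:]


def apply_output_gate(attn_output, gate):
    """Elementwise attn_output * sigmoid(gate)."""
    return [o * sigmoid(g) for o, g in zip(attn_output, gate)]


head_dim = 2  # toy size; the model described above uses head_dim=256
q_proj_out = [3.0, 4.0, 0.0, 100.0]  # first half: query, second half: gate
query, gate = split_query_and_gate(q_proj_out, head_dim)
query = rms_norm(query, weight=[0.0] * head_dim)

attn_output = [1.0, 1.0]  # stand-in for this head's attention result
gated = apply_output_gate(attn_output, gate)
print(query)
print(gated)
```

A gate value of 0 halves the corresponding output channel, while a large positive gate lets it through almost unchanged, which is how the model can suppress individual channels per token.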

4.3 Internal structure of Qwen3_5MoeGatedDeltaNet

Defined in modeling_qwen3_5_moe.py:367-555(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L367):
[Flowchart: Qwen3_5MoeGatedDeltaNet data flow]

  1. hidden_states [bs, seq, 2048]
  2. Input projections: in_proj_qkv (2048 → 2×2048+4096 = 8192, fused QKV projection); in_proj_z (2048 → 4096, gate z); in_proj_b (2048 → 32, beta projection); in_proj_a (2048 → 32, decay-rate projection)
  3. causal_conv1d (kernel_size=4, groups=conv_dim): causal convolution
  4. torch.split → Q, K, V
  5. β = σ(b): erasure gate
  6. g = -exp(A_log) × softplus(a + dt_bias): decay rate
  7. Gated Delta Rule: chunk mode (prefill), recurrent mode (decode)
  8. RMSNormGated: norm + silu(z) gating
  9. out_proj (4096 → 2048)
  10. output [bs, seq, 2048]

GatedDeltaNet core formulas

```
# Recurrent mode (single-token decoding):
S_t = S_{t-1} * exp(g_t)                    # decay the old state
kv_mem = (S_t * k_t).sum(dim=-2)            # retrieve memory
δ_t = (v_t - kv_mem) * β_t                  # compute the correction
S_t = S_t + k_t^T * δ_t                     # update the state
o_t = (S_t * q_t).sum(dim=-2)               # query the output
```
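The recurrence above can be written out for a single head in plain Python. This is a minimal sketch under stated assumptions: the state S is a d_k × d_v matrix stored as a list of rows, there is no batching or chunk mode, and `gated_delta_step` is a hypothetical name. The toy inputs show the delta-rule behavior: with no decay and β = 1, writing a new value under the same unit-norm key overwrites the old one.

```python
import math


def softplus(x):
    return math.log1p(math.exp(x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_delta_step(S, q, k, v, g, beta):
    """One recurrent step; S (d_k x d_v, list of rows) is updated in place."""
    d_k, d_v = len(k), len(v)
    decay = math.exp(g)
    # Decay the old state.
    for i in range(d_k):
        for j in range(d_v):
            S[i][j] *= decay
    # Retrieve what the state currently stores for key k.
    kv_mem = [sum(S[i][j] * k[i] for i in range(d_k)) for j in range(d_v)]
    # Correction toward the new value, scaled by beta.
    delta = [(v[j] - kv_mem[j]) * beta for j in range(d_v)]
    # Rank-1 update of the state.
    for i in range(d_k):
        for j in range(d_v):
            S[i][j] += k[i] * delta[j]
    # Read out with the query.
    return [sum(S[i][j] * q[i] for i in range(d_k)) for j in range(d_v)]


S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.0]  # unit-norm key, also used as the query

# No decay (g=0) and full correction (beta=1): the second write replaces the first.
out1 = gated_delta_step(S, q=k, k=k, v=[2.0, 3.0], g=0.0, beta=1.0)
out2 = gated_delta_step(S, q=k, k=k, v=[5.0, 7.0], g=0.0, beta=1.0)

# Gates computed as in the diagram, from toy scalar inputs (A_log=0, a=-1, dt_bias=1, b=2).
beta = sigmoid(2.0)
g = -math.exp(0.0) * softplus(-1.0 + 1.0)
out3 = gated_delta_step(S, q=k, k=k, v=[5.0, 7.0], g=g, beta=beta)

print(out1, out2, out3)
```

Because g is never positive, exp(g) stays in (0, 1], so the state can only shrink before each update; β in (0, 1) then controls how far the stored value moves toward the new one.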

4.4 Data flow comparison of the two attention layer types

[Flowchart: data flow of the two layer types]

  • linear_attention layer: hidden_states → input_layernorm → in_proj_qkv + in_proj_z/b/a → causal_conv1d (kernel=4) → Gated Delta Rule (O(n) complexity) → RMSNormGated (norm + silu(z)) → residual + output
  • full_attention layer: hidden_states → input_layernorm → q_proj + k_proj + v_proj → q_norm + k_norm → M-RoPE → Softmax Attention (O(n²) complexity) → Sigmoid Gate → residual + output

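| Property | full_attention | linear_attention |
| --- | --- | --- |
| Complexity | O(n²) | O(n) |
| Cache type | KV Cache | conv_state + recurrent_state |
| Position encoding | M-RoPE | None (encoded implicitly by the convolution) |
| QK Norm | ✅ (RMSNorm) | ❌ (uses L2 Norm) |
| Gate mechanism | Sigmoid gate (taken from q_proj, applied to attn_output) | RMSNormGated with z |
| Use case | Precise long-range dependencies | Efficient sequence modeling |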
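The "cache type" row has a practical consequence: the KV cache grows with sequence length, while the linear_attention state has a fixed size. The sketch below counts cache elements per sequence using the dimensions from the diagrams in 4.2 and 4.3. Two things are assumptions rather than facts from the text: `head_k_dim=128`, and the conv_state holding `conv_kernel_size` columns per channel. All function names are hypothetical.

```python
def full_attention_cache_elements(seq_len, num_layers, num_kv_heads=2, head_dim=256):
    """KV cache elements for `num_layers` full_attention layers (K and V per token)."""
    return num_layers * seq_len * 2 * num_kv_heads * head_dim


def linear_attention_state_elements(
    num_layers,
    conv_dim=8192,        # in_proj_qkv output width from the diagram
    conv_kernel_size=4,   # ASSUMPTION: conv_state keeps kernel_size columns
    num_v_heads=32,       # in_proj_b / in_proj_a output width from the diagram
    head_k_dim=128,       # ASSUMPTION: not stated in the text above
    head_v_dim=128,       # 4096 / 32, derived from in_proj_z and num_v_heads
):
    """Fixed-size conv_state + recurrent_state elements; independent of seq_len."""
    conv_state = conv_dim * conv_kernel_size
    recurrent_state = num_v_heads * head_k_dim * head_v_dim
    return num_layers * (conv_state + recurrent_state)


def hybrid_cache_elements(seq_len):
    # 10 full_attention layers + 30 linear_attention layers, as in section 4.1.
    return full_attention_cache_elements(seq_len, 10) + linear_attention_state_elements(30)


for n in (1_000, 100_000):
    all_full = full_attention_cache_elements(n, 40)
    print(n, all_full, hybrid_cache_elements(n))
```

Under these assumptions the hybrid layout pays a constant cost for the 30 linear_attention layers and only a quarter of the per-token KV growth, so the gap to a hypothetical all-full_attention stack widens as the context gets longer.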

5. MoE expert routing system

Qwen3.5-MoE uses a hybrid design of 256 experts with Top-8 routing plus a shared expert: every token passes through 8 routed experts and 1 shared expert.
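A toy version of this routing can be sketched in plain Python, with 4 scalar "experts" and top_k=2 standing in for 256 experts and Top-8. Two details are assumptions rather than facts from the text: the top-k router probabilities are renormalized to sum to 1, and the shared expert's output is added without any gate. `route_token` and `make_expert` are hypothetical names.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(x, router_logits, experts, shared_expert, top_k):
    """Combine the top_k routed experts with the always-active shared expert."""
    probs = softmax(router_logits)
    # Indices of the top_k most probable experts (ties keep index order).
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # ASSUMPTION: renormalize the selected probabilities so they sum to 1.
    norm = sum(probs[i] for i in top)
    routed = sum(probs[i] / norm * experts[i](x) for i in top)
    # ASSUMPTION: the shared expert is added without a gate.
    return routed + shared_expert(x), top


def make_expert(scale):
    """A scalar stand-in for an expert MLP."""
    return lambda x: scale * x


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
shared_expert = make_expert(10.0)

output, selected = route_token(
    x=1.0,
    router_logits=[0.0, 5.0, 5.0, -5.0],
    experts=experts,
    shared_expert=shared_expert,
    top_k=2,
)
print(selected, output)  # [1, 2] 12.5
```

Only the selected experts are evaluated for a token, which is why the compute per token stays far below what the total parameter count suggests.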

5.1 MoE routing flowchart

[Flowchart: MoE routing flow]
Routing decision: hidden_states [bs, seq, 2048] → Qwen3_5MoeTopKRouter (weight: [256, 2048]) → Softmax → router_probs → Top-8 selection → indices + weights → normalize weights (w /= sum(w)) → top_k_index, top_k_weights
Expert computation: iterate over the 256 experts, computing only the selected ones → gate_up_proj[expert_idx]: F.linear → chunk → SiLU(gate) * up → down_proj[expert_idx]: F.linear → down_proj → × routing_weights → accumulate with index_add_
Shared expert: SharedExpert (standard MLP: gate_proj + up_proj + down_proj), scaled by SharedExpertGate σ(Linear(x))
Output: expert_output + shared_expert_output → output [bs, seq, 2048]
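
The routing-decision stage can be sketched in a few lines of PyTorch. This is a minimal illustration rather than the library's implementation: the function name `route_top_k` and the toy sizes are assumptions, and only the softmax → top-k → renormalize sequence is taken from the diagram.

```python
import torch
import torch.nn.functional as F


def route_top_k(hidden_states, router_weight, top_k):
    """Softmax over expert logits, keep the top_k experts, renormalize their weights."""
    # hidden_states: [num_tokens, hidden]; router_weight: [num_experts, hidden]
    router_logits = F.linear(hidden_states, router_weight)  # [num_tokens, num_experts]
    router_probs = F.softmax(router_logits, dim=-1, dtype=torch.float)
    top_k_weights, top_k_index = torch.topk(router_probs, top_k, dim=-1)
    # w /= sum(w): the kept weights sum to 1 for every token
    top_k_weights = top_k_weights / top_k_weights.sum(dim=-1, keepdim=True)
    return router_logits, top_k_weights.to(hidden_states.dtype), top_k_index


torch.manual_seed(0)
tokens = torch.randn(6, 32)   # toy sizes; the article's model uses hidden=2048
weight = torch.randn(16, 32)  # 16 experts here instead of 256
logits, weights, index = route_top_k(tokens, weight, top_k=4)
print(weights.shape, index.shape)
```

Renormalizing after the top-k cut means the routed experts' contributions always sum to one per token, regardless of how much probability mass the discarded experts held.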

5.2 Internal structure of Qwen3_5MoeSparseMoeBlock

Defined in modeling_qwen3_5_moe.py:794-813(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L794):
[Flowchart: Qwen3_5MoeSparseMoeBlock]
Input: hidden_states [bs, seq, 2048] → reshape → [-1, 2048]
Sparse routing path: gate (TopKRouter) → router_logits, routing_weights, selected_experts → experts (Qwen3_5MoeExperts) → expert_output
Shared expert path: shared_expert (MLP: gate_proj [2048, 512], up_proj [2048, 512], down_proj [512, 2048]) → shared_expert_gate (Linear(2048, 1)) → σ(x) * shared_output
Output: expert_output + gated_shared_output → reshape → [bs, seq, 2048] → output

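Key code (simplified excerpt):

import torch
import torch.nn.functional as F
from torch import nn

# Qwen3_5MoeTopKRouter, Qwen3_5MoeExperts and Qwen3_5MoeMLP are defined in the same file.
class Qwen3_5MoeSparseMoeBlock(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.gate = Qwen3_5MoeTopKRouter(config)
        self.experts = Qwen3_5MoeExperts(config)
        self.shared_expert = Qwen3_5MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
        self.shared_expert_gate = torch.nn.Linear(config.hidden_size, 1, bias=False)

    def forward(self, hidden_states):
        # Flatten [bs, seq, hidden] to [bs * seq, hidden] so routing happens per token
        batch_size, sequence_length, hidden_dim = hidden_states.shape
        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
        shared_expert_output = self.shared_expert(hidden_states_reshaped)
        _, routing_weights, selected_experts = self.gate(hidden_states_reshaped)
        expert_output = self.experts(hidden_states_reshaped, selected_experts, routing_weights)
        # Scale the shared expert by a per-token sigmoid gate before adding it
        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
        expert_output = expert_output + shared_expert_output
        return expert_output.reshape(batch_size, sequence_length, hidden_dim)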
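
To make the data flow concrete, here is a self-contained toy version of the block. It is a sketch under stated assumptions: `ToyMLP` and `ToySparseMoeBlock` are hypothetical names, a plain `nn.Linear` stands in for the router, a `ModuleList` stands in for the fused expert weights, and the sizes are far smaller than the real model's.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ToyMLP(nn.Module):
    """Gated MLP: down_proj(SiLU(gate_proj(x)) * up_proj(x))."""

    def __init__(self, hidden, intermediate):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, intermediate, bias=False)
        self.up_proj = nn.Linear(hidden, intermediate, bias=False)
        self.down_proj = nn.Linear(intermediate, hidden, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class ToySparseMoeBlock(nn.Module):
    def __init__(self, hidden=32, intermediate=16, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.experts = nn.ModuleList([ToyMLP(hidden, intermediate) for _ in range(num_experts)])
        self.shared_expert = ToyMLP(hidden, intermediate)
        self.shared_expert_gate = nn.Linear(hidden, 1, bias=False)

    def forward(self, hidden_states):
        bs, seq, hidden = hidden_states.shape
        x = hidden_states.reshape(-1, hidden)  # one row per token
        probs = F.softmax(self.router(x), dim=-1)
        weights, selected = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for expert_idx, expert in enumerate(self.experts):
            token_idx, slot_idx = torch.where(selected == expert_idx)
            if token_idx.numel() == 0:
                continue  # skip experts that no token selected
            scaled = expert(x[token_idx]) * weights[token_idx, slot_idx].unsqueeze(-1)
            out.index_add_(0, token_idx, scaled)  # accumulate per-token contributions

        shared = torch.sigmoid(self.shared_expert_gate(x)) * self.shared_expert(x)
        return (out + shared).reshape(bs, seq, hidden)


torch.manual_seed(0)
block = ToySparseMoeBlock()
y = block(torch.randn(2, 5, 32))
print(y.shape)
```

The shared expert runs on every token, while each routed expert only sees the rows that selected it, which is what keeps the per-token compute roughly constant as the expert count grows.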

5.3 How the @use_experts_implementation decorator works

Defined in integrations/moe.py:523(file:///workspace/src/transformers/integrations/moe.py), this decorator allows the default PyTorch expert implementation to be swapped for an optimized one (such as megablocks or grouped_gemm):
[Flowchart: experts implementation dispatch]
Original Qwen3_5MoeExperts (forward() loops over experts) → @use_experts_implementation decorator → experts_interface.dispatch() (selects the implementation at runtime)
default → PyTorch implementation: per-expert loop
megablocks → MegaBlocks implementation: block-sparse matrix multiplication
grouped_gemm → GroupedGEMM implementation: batched matrix multiplication
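
The dispatch pattern itself can be illustrated without any Transformers code. The sketch below is an assumption-laden stand-in: `EXPERTS_IMPLEMENTATIONS`, `register_experts_implementation`, `use_experts_implementation_sketch` and the `experts_implementation` attribute are all hypothetical names, and the real decorator's signature and selection logic may differ.

```python
import functools

# Hypothetical registry: implementation name -> callable taking (module, inputs).
# None of these names come from transformers; they only illustrate the pattern.
EXPERTS_IMPLEMENTATIONS = {}
calls = []  # records which path ran, for demonstration only


def register_experts_implementation(name):
    def decorator(fn):
        EXPERTS_IMPLEMENTATIONS[name] = fn
        return fn
    return decorator


def use_experts_implementation_sketch(cls):
    """Class decorator: send forward() to a registered kernel, else keep the default loop."""
    default_forward = cls.forward

    @functools.wraps(default_forward)
    def forward(self, *args, **kwargs):
        impl = EXPERTS_IMPLEMENTATIONS.get(getattr(self, "experts_implementation", None))
        if impl is None:  # nothing selected or unknown name: fall back to the default
            return default_forward(self, *args, **kwargs)
        return impl(self, *args, **kwargs)

    cls.forward = forward
    return cls


@register_experts_implementation("grouped")
def grouped_forward(self, inputs):
    calls.append("grouped")  # stand-in for an optimized batched kernel
    return [self.scale * v for v in inputs]


@use_experts_implementation_sketch
class ToyExperts:
    experts_implementation = None
    scale = 3

    def forward(self, inputs):
        calls.append("default")  # stand-in for the per-expert Python loop
        return [self.scale * v for v in inputs]


experts = ToyExperts()
default_out = experts.forward([1, 2])
experts.experts_implementation = "grouped"
grouped_out = experts.forward([1, 2])
print(calls, default_out == grouped_out)
```

Because the choice is read at call time, the same module class can run with different kernels without changing its weights or its public forward() signature.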

5.4 Load-balancing loss computation diagram

Defined in modeling_qwen3_5_moe.py:1755-1834(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L1755):
[Flowchart: load-balancing loss]
gate_logits (router logits of each layer, shape [bs*seq, 256]) → torch.cat over the logits of all layers → Softmax → routing_weights → Top-K → selected_experts → one_hot → expert_mask
Without attention_mask: tokens_per_expert = mean(expert_mask); router_prob_per_expert = mean(routing_weights)
With attention_mask: tokens_per_expert = sum(mask * expert_mask) / sum(mask); router_prob_per_expert = sum(weights * mask) / sum(mask)
overall_loss = sum(tokens_per_expert × router_prob_per_expert) × num_experts

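Formula: L_aux = N × Σ_i(f_i × P_i), where N is the number of experts (num_experts), f_i is the fraction of tokens assigned to expert i, and P_i is the mean routing probability for expert i.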
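
The computation can be written out as a short standalone function. This is a sketch that follows the steps listed for the diagram, not a copy of the library function: the name `load_balancing_loss`, the argument order and the toy sizes are assumptions.

```python
import torch
import torch.nn.functional as F


def load_balancing_loss(gate_logits, num_experts, top_k, attention_mask=None):
    """L_aux = N * sum_i(f_i * P_i) over the concatenated router logits of all layers."""
    # gate_logits: sequence of [batch * seq, num_experts] tensors, one per MoE layer
    logits = torch.cat(list(gate_logits), dim=0)
    routing_weights = F.softmax(logits, dim=-1)
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    expert_mask = F.one_hot(selected_experts, num_experts).float()  # [tokens, top_k, N]

    if attention_mask is None:
        tokens_per_expert = expert_mask.mean(dim=0)           # f_i, per top-k slot
        router_prob_per_expert = routing_weights.mean(dim=0)  # P_i
    else:
        # Tile the [batch, seq] mask once per layer so padded tokens are ignored.
        mask = attention_mask.reshape(-1).repeat(len(gate_logits)).float()
        tokens_per_expert = (expert_mask * mask[:, None, None]).sum(dim=0) / mask.sum()
        router_prob_per_expert = (routing_weights * mask[:, None]).sum(dim=0) / mask.sum()

    return torch.sum(tokens_per_expert * router_prob_per_expert.unsqueeze(0)) * num_experts


torch.manual_seed(0)
layers = [torch.randn(6, 8) for _ in range(3)]  # 3 layers, batch=2, seq=3, 8 experts
loss_plain = load_balancing_loss(layers, num_experts=8, top_k=2)
loss_masked = load_balancing_loss(layers, num_experts=8, top_k=2, attention_mask=torch.ones(2, 3))
loss_uniform = load_balancing_loss([torch.zeros(6, 8)], num_experts=8, top_k=2)
print(loss_plain.item(), loss_masked.item(), loss_uniform.item())
```

With perfectly uniform router probabilities every P_i equals 1/N, so the loss in this formulation reduces to top_k; an all-ones mask reproduces the unmasked result.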


6. Multimodal vision encoder

6.1 Vision encoding flow diagram

[Flowchart: vision encoding flow]
pixel_values [num_patches, 3, 2, 16, 16] (C, T, H, W) → Qwen3_5MoeVisionPatchEmbed (Conv3d: kernel=[2, 16, 16], stride=[2, 16, 16]) → position embedding (bilinear interpolation + pos_embed) → rotary position encoding (rotary_pos_emb(position_ids))
27 VisionBlock layers (×27): norm1 (LayerNorm) → VisionAttention (qkv → RoPE → Attention → proj) → norm2 (LayerNorm) → VisionMLP (fc1 → GELU → fc2)
PatchMerger: LayerNorm → fc1 → GELU → fc2, [1152×4, 3584]
Output: image_embeds / video_embeds [num_tokens, 3584]
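
The two shape-changing ends of this pipeline, patch embedding and patch merging, can be checked with a small PyTorch sketch. It rests on several assumptions: the widths 64 and 96 are toy values standing in for 1152 and 3584, the 27 VisionBlock layers are omitted entirely, and applying LayerNorm before the 2×2 merge is one plausible reading of the diagram rather than a confirmed detail.

```python
import torch
from torch import nn

# Toy widths keep the example light; the diagram's encoder uses 1152 -> 3584.
embed_dim, out_dim, merge = 64, 96, 2

# Conv3d whose kernel equals its stride: each (T=2, H=16, W=16) patch becomes one vector.
patch_embed = nn.Conv3d(3, embed_dim, kernel_size=(2, 16, 16), stride=(2, 16, 16))

# PatchMerger: concatenate merge * merge neighbouring patches, then fc1 -> GELU -> fc2.
merged_dim = embed_dim * merge * merge
norm = nn.LayerNorm(embed_dim)
mlp = nn.Sequential(nn.Linear(merged_dim, merged_dim), nn.GELU(), nn.Linear(merged_dim, out_dim))

pixel_values = torch.randn(8, 3, 2, 16, 16)              # [num_patches, C, T, H, W]
patches = patch_embed(pixel_values).view(-1, embed_dim)  # [8, embed_dim]
image_embeds = mlp(norm(patches).view(-1, merged_dim))   # [8 / 4, out_dim]
print(patches.shape, image_embeds.shape)
```

Merging four neighbouring patches into one token is what reduces the number of vision tokens handed to the language model by a factor of four.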

6.2 3D position encoding (M-RoPE) computation diagram

The 3D position encoding of vision tokens is computed by the get_vision_position_ids method, defined in modeling_qwen3_5_moe.py:1394-1450(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L1394):
[Flowchart: 3D position computation]
Spatial merge: grid_thw [T, H, W], spatial_merge_size = 2, temporal_merge_size = 1
position_temporal = arange(T) × time_interval + start_position
position_height = arange(H//2) + start_position, then repeat_interleave(W//2), repeated T times
position_width = arange(W//2) + start_position, then repeat(H//2 × T)
Output: torch.stack([T, H, W]), shape [3, num_tokens]

Key code (excerpt; the remaining parameters are elided in the signature):

def get_vision_position_ids(self, start_position, grid_thw, ...):
    # Grid size seen by the language model after temporal and spatial merging
    llm_grid_t = grid_thw[0] // temp_merge_size
    llm_grid_h = grid_thw[1] // spatial_merge_size
    llm_grid_w = grid_thw[2] // spatial_merge_size

    # Per-axis coordinates; height and width are offset by start_position right away
    position_temporal = torch.arange(llm_grid_t) * time_interval
    position_width = torch.arange(llm_grid_w) + start_position
    position_height = torch.arange(llm_grid_h) + start_position

    # Expand each axis to one entry per token (width varies fastest, time slowest)
    position_width = position_width.repeat(llm_grid_h * llm_grid_t)
    position_height = position_height.repeat_interleave(llm_grid_w).repeat(llm_grid_t)
    position_temporal = position_temporal.repeat_interleave(llm_grid_h * llm_grid_w) + start_position

    # Rows are (temporal, height, width); shape [3, num_tokens]
    return torch.stack([position_temporal, position_height, position_width], dim=0)
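
A worked example helps check the ordering of the three axes. The function below is a standalone sketch, not the method itself: the name `vision_position_ids` and the default values (spatial merge size 2, temporal merge size fixed at 1, time_interval 1) are assumptions chosen to match the diagram.

```python
import torch


def vision_position_ids(start_position, grid_thw, spatial_merge_size=2, time_interval=1):
    """Return [3, num_tokens] position ids ordered as (temporal, height, width)."""
    t, h, w = grid_thw
    grid_t, grid_h, grid_w = t, h // spatial_merge_size, w // spatial_merge_size
    temporal = (torch.arange(grid_t) * time_interval).repeat_interleave(grid_h * grid_w)
    height = torch.arange(grid_h).repeat_interleave(grid_w).repeat(grid_t)
    width = torch.arange(grid_w).repeat(grid_h * grid_t)
    return torch.stack([temporal, height, width], dim=0) + start_position


# Two frames of 4 x 4 patches -> 2 x 2 x 2 = 8 tokens after the 2 x 2 spatial merge.
ids = vision_position_ids(start_position=10, grid_thw=(2, 4, 4))
print(ids)
# tensor([[10, 10, 10, 10, 11, 11, 11, 11],
#         [10, 10, 11, 11, 10, 10, 11, 11],
#         [10, 11, 10, 11, 10, 11, 10, 11]])
```

Reading a column top to bottom gives one token's (t, h, w) coordinate, so the fourth token of the first frame sits at (10, 11, 11).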

6.3 Flow diagram for fusing vision tokens with text tokens

In modeling_qwen3_5_moe.py:1707-1727(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L1707), masked_scatter is used to merge the vision embeddings into the text embeddings:
Diagram (flowchart): visual embedding fusion
  • input_ids [bs, seq], containing image_token_id placeholders → embed_tokens(input_ids) → [bs, seq, 2048]
  • pixel_values (image/video pixels) → VisionModel encoding → image_embeds / video_embeds
  • get_placeholder_mask() locates the image_token_id / video_token_id positions
  • torch_compilable_check verifies that the token count equals the feature count
  • masked_scatter(mask, embeds) writes the visual embeddings into the placeholder positions
  • inputs_embeds [bs, seq, 2048]: the fused text + visual embeddings

Key code:

python
# Images: encode the pixels, locate the image placeholders, then fill them.
if pixel_values is not None:
    image_embeds = self.get_image_features(pixel_values, image_grid_thw)
    image_mask, _ = self.get_placeholder_mask(input_ids, inputs_embeds=inputs_embeds, image_features=image_embeds)
    inputs_embeds = inputs_embeds.masked_scatter(image_mask, image_embeds)

# Videos: same steps, using the video placeholders (second returned mask).
if pixel_values_videos is not None:
    video_embeds = self.get_video_features(pixel_values_videos, video_grid_thw)
    _, video_mask = self.get_placeholder_mask(input_ids, inputs_embeds=inputs_embeds, video_features=video_embeds)
    inputs_embeds = inputs_embeds.masked_scatter(video_mask, video_embeds)
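The fusion step can be tried out without loading the model. The following is a minimal plain-Python sketch of the placeholder-filling semantics described above, not the library code: fuse_placeholders, IMAGE_TOKEN_ID and the toy 2-dimensional embeddings are hypothetical names and values chosen for illustration.

```python
IMAGE_TOKEN_ID = 99  # hypothetical placeholder id, for illustration only


def fuse_placeholders(input_ids, text_embeds, image_embeds, placeholder_id=IMAGE_TOKEN_ID):
    """Fill placeholder slots with visual features, in order (masked_scatter-like)."""
    mask = [tok == placeholder_id for tok in input_ids]
    # Mirrors the "token count == feature count" check done before scattering.
    if sum(mask) != len(image_embeds):
        raise ValueError(f"{sum(mask)} placeholder tokens but {len(image_embeds)} image features")
    features = iter(image_embeds)
    # Text positions keep their embedding; placeholder positions take the next feature.
    return [next(features) if is_image else emb for is_image, emb in zip(mask, text_embeds)]


input_ids = [1, 99, 99, 2]
text_embeds = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.2, 0.2]]
image_embeds = [[5.0, 5.0], [6.0, 6.0]]
fused = fuse_placeholders(input_ids, text_embeds, image_embeds)
print(fused)  # [[0.1, 0.1], [5.0, 5.0], [6.0, 6.0], [0.2, 0.2]]
```

The count check matters because a scatter of this kind consumes features strictly in order: one missing or extra placeholder would shift every later feature into the wrong position.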

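7. RoPE and M-RoPE Position Encoding

M-RoPE (Multimodal Rotary Position Embedding) is the core position encoding scheme of the Qwen3.5 series. It supports 1D positions for text and 3D positions (temporal, height, width) for images and video.

7.1 M-RoPE Principle Diagram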

Diagram (flowchart): M-RoPE position layout by token type
  • Text tokens (1D position): position_ids = 0,1,2,3,...; all three axes use the same position (T: 0,1,2,3,...; H: 0,1,2,3,...; W: 0,1,2,3,...)
  • Image tokens (3D position), grid_thw = 1, H, W: T: 0,0,0,...,0 (single frame, all zeros); H: 0,0,1,1,2,2,... (row index repeats); W: 0,1,0,1,0,1,... (column index repeats)
  • Video tokens (3D position), grid_thw = T, H, W: T: 0,0,...,0,1,1,...,1,... (increments from frame to frame); H: 0,0,1,1,...,0,0,1,1,... (row pattern repeats within each frame); W: 0,1,0,1,...,0,1,0,1,... (column pattern repeats within each frame)
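The three patterns can be reproduced with a few lines of plain Python. This is a sketch under the assumption that a vision grid is enumerated frame by frame and row by row; text_positions and vision_positions are hypothetical helper names, not functions from the library.

```python
def text_positions(start, length):
    """Text: one running index shared by the T, H and W axes."""
    ids = list(range(start, start + length))
    return ids, list(ids), list(ids)


def vision_positions(t, h, w):
    """Image/video: enumerate a (t, h, w) grid frame by frame, row by row."""
    T, H, W = [], [], []
    for frame in range(t):
        for row in range(h):
            for col in range(w):
                T.append(frame)
                H.append(row)
                W.append(col)
    return T, H, W


text = text_positions(0, 4)        # T, H, W are all [0, 1, 2, 3]
image = vision_positions(1, 2, 2)  # T=[0, 0, 0, 0], H=[0, 0, 1, 1], W=[0, 1, 0, 1]
video = vision_positions(2, 2, 2)  # the image pattern repeated per frame, with T = 0 then 1
print(text, image, video, sep="\n")
```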

7.2 apply_interleaved_mrope Interleaving Diagram

In modeling_qwen3_5_moe.py:165-180(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L165), M-RoPE interleaves the frequencies of the three axes:
Diagram (flowchart): chunked vs. interleaved frequency layout
  • Chunked frequencies (mrope_section = 11, 11, 10): T frequencies f0...f10 (11 dimensions); H frequencies f0...f10 (11 dimensions); W frequencies f0...f9 (10 dimensions)
  • After interleaving: T0, H0, W0, T1, H1, W1, ..., T9, H9, W9, T10, H10
  • Interleaving T/H/W preserves frequency continuity

Key code:

python
def apply_interleaved_mrope(self, freqs, mrope_section):
    freqs_t = freqs[0]  # use the T axis as the base
    for dim, offset in enumerate((1, 2), start=1):  # H, W
        length = mrope_section[dim] * 3
        idx = slice(offset, length, 3)  # interleaved indices: every third slot
        freqs_t[..., idx] = freqs[dim, ..., idx]  # overwrite with the H or W values
    return freqs_t
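To see which axis ends up in each frequency slot, the same index arithmetic can be replayed on a list of labels. This is a small illustrative sketch: interleaved_axis_layout is a hypothetical helper, and the default section sizes are taken from the diagram above.

```python
def interleaved_axis_layout(mrope_section=(11, 11, 10)):
    """Return which axis (T/H/W) supplies the position for each frequency slot."""
    total = sum(mrope_section)
    layout = ["T"] * total  # start from the T axis, as freqs[0] does
    for dim, (axis, offset) in enumerate((("H", 1), ("W", 2)), start=1):
        length = mrope_section[dim] * 3
        for slot in range(offset, length, 3):  # same indices as slice(offset, length, 3)
            layout[slot] = axis
    return layout


layout = interleaved_axis_layout()
print("".join(layout))  # "THW" repeated 10 times, followed by "TH"
```

Because the W section is one entry shorter, the last two slots fall back to T and H, which gives the 11 / 11 / 10 split.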

7.3 get_rope_index Position Computation Flowchart

Defined in modeling_qwen3_5_moe.py:1452-1543(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L1452):
Diagram (flowchart): get_rope_index
  • Inputs: input_ids + mm_token_type_ids, plus image_grid_thw + video_grid_thw
  • Group tokens by token_type with itertools.groupby
  • Text group (type=0): arange(text_len) + current_pos, then expand(3, -1) so that T/H/W are identical
  • Image group (type=1): next(image_grid_thw_iter) → get_vision_position_ids(current_pos, grid_thw) → current_pos += max(H,W) // merge_size
  • Video group (type=2): next(video_grid_thw_iter) → get_vision_position_ids(current_pos, grid_thw) → current_pos += max(H,W) // merge_size
  • torch.cat the positions of all groups → shape: 3, bs, seq_len
  • mrope_position_deltas = max(position) + 1 - seq_len
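The grouping logic can be sketched for a single sequence in plain Python. This is an illustration under stated assumptions, not the library implementation: rope_index is a hypothetical name, one vision_grids list stands in for the separate image and video iterators, text is assumed to advance current_pos by its length, and adjacent vision groups are assumed to be separated by at least one text token.

```python
from itertools import groupby


def rope_index(token_types, vision_grids, merge_size=2):
    """token_types: 0 = text, 1 = image, 2 = video; vision_grids: (t, h, w) per vision group."""
    grids = iter(vision_grids)
    positions = [[], [], []]  # T, H, W axes
    current_pos = 0
    for token_type, group in groupby(token_types):
        length = len(list(group))
        if token_type == 0:
            ids = range(current_pos, current_pos + length)
            for axis in positions:
                axis.extend(ids)  # text: same index on all three axes
            current_pos += length  # assumption: text advances by its length
        else:
            t, h, w = next(grids)
            rows, cols = h // merge_size, w // merge_size
            assert t * rows * cols == length, "grid does not match placeholder count"
            for frame in range(t):
                for row in range(rows):
                    for col in range(cols):
                        positions[0].append(current_pos + frame)
                        positions[1].append(current_pos + row)
                        positions[2].append(current_pos + col)
            current_pos += max(h, w) // merge_size
    delta = max(max(axis) for axis in positions) + 1 - len(token_types)
    return positions, delta


# 2 text tokens, one image with a 4x4 patch grid (2x2 after merging), 2 text tokens
positions, delta = rope_index([0, 0, 1, 1, 1, 1, 0, 0], [(1, 4, 4)])
print(positions[1], delta)  # [0, 1, 2, 2, 3, 3, 4, 5] -2
```

The delta is negative here because the four image tokens only advance the position counter by two, so the largest position falls behind the sequence length.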

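Special handling for video: because Qwen3.5 separates video frames with timestamps, video_grid_thw has to be split into one entry per frame:

python
if video_grid_thw is not None:
    # Repeat each (T, H, W) row T times, one row per frame.
    video_grid_thw = torch.repeat_interleave(video_grid_thw, video_grid_thw[:, 0], dim=0)
    video_grid_thw[:, 0] = 1  # each frame is now independent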
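The effect of these two lines is easiest to see on concrete numbers. The sketch below emulates them with plain tuples instead of tensors; split_video_grid_per_frame is a hypothetical helper name used only for illustration.

```python
def split_video_grid_per_frame(video_grid_thw):
    """[(t, h, w), ...] -> one (1, h, w) entry per frame, in order."""
    per_frame = []
    for t, h, w in video_grid_thw:
        per_frame.extend([(1, h, w)] * t)  # repeat the row t times, with T set to 1
    return per_frame


frames = split_video_grid_per_frame([(3, 4, 6), (2, 2, 2)])
print(frames)  # [(1, 4, 6), (1, 4, 6), (1, 4, 6), (1, 2, 2), (1, 2, 2)]
```

After the split, every frame looks like a single image to the position computation, so each frame's position ids can be generated separately.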

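8. Cache System

The hybrid attention architecture of Qwen3.5-MoE requires a hybrid cache: full_attention layers use a KV cache, while linear_attention layers use conv_state + recurrent_state.

8.1 Hybrid Cache Architecture Diagram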

Diagram (flowchart): hybrid cache architecture
  • config.layer_types determines the cache type of each layer (full_attention or linear_attention)
  • DynamicCache manages all layers uniformly
  • full_attention layer cache (CacheLayer): key_cache [bs, heads, seq, dim]; value_cache [bs, heads, seq, dim]
  • linear_attention layer cache (LinearAttentionCacheLayerMixin): conv_states [bs, conv_dim, kernel_size], the causal convolution state; recurrent_states [bs, heads, k_dim, v_dim], the DeltaNet recurrent state

At initialization, DynamicCache uses config.layer_types to decide the cache type of each layer automatically. It is defined at cache_utils.py:1229(file:///workspace/src/transformers/cache_utils.py).
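The per-layer dispatch can be pictured as a lookup from layer type to cache container. The sketch below is purely illustrative: KVLayerCache, LinearLayerCache and build_layer_caches are hypothetical stand-ins, not classes or functions from cache_utils.py, and the layer pattern in the example is made up.

```python
from dataclasses import dataclass, field


@dataclass
class KVLayerCache:  # hypothetical stand-in for a full_attention layer cache
    keys: list = field(default_factory=list)    # grows with the sequence
    values: list = field(default_factory=list)


@dataclass
class LinearLayerCache:  # hypothetical stand-in for a linear_attention layer cache
    conv_state: object = None       # fixed size, allocated lazily on first update
    recurrent_state: object = None  # fixed size, independent of sequence length


def build_layer_caches(layer_types):
    """Pick one cache container per layer from its declared type."""
    factories = {"full_attention": KVLayerCache, "linear_attention": LinearLayerCache}
    return [factories[layer_type]() for layer_type in layer_types]


# Illustrative pattern only; the real pattern comes from config.layer_types.
layer_types = ["linear_attention", "linear_attention", "linear_attention", "full_attention"]
caches = build_layer_caches(layer_types)
print([type(cache).__name__ for cache in caches])
```

Judging by the shapes listed above, the practical difference is memory growth: the KV tensors carry a seq dimension, whereas conv_states and recurrent_states do not.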

8.2 Cache Update Flow for linear_attention Layers

Diagram (sequence): participants Qwen3_5MoeGatedDeltaNet, DynamicCache, conv_state, recurrent_state
  • Prefill phase (seq_len > 1): has_previous_state(layer_idx)? → False (first call); in_proj_qkv → causal_conv1d; update_conv_state(new_conv_state, layer_idx) (lazy initialization + copy); chunk_gated_delta_rule(Q, K, V, g, β); update_recurrent_state(last_recurrent_state, layer_idx) (copy)
  • Decode phase (seq_len == 1): has_previous_state(layer_idx)? → True; read conv_state and recurrent_state; causal_conv1d_update (single-step update, conv_state updated in place); recurrent_gated_delta_rule (recurrence, S_t = S_{t-1} * g + k^T * δ); update_recurrent_state(new_state, layer_idx) (copy)
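The role of conv_state can be illustrated for a single channel: prefill stores the last kernel_size inputs, and each decode step shifts that window by one. The following plain-Python sketch assumes a plain causal filter with no bias or activation; init_conv_state, conv_step and causal_conv_full are hypothetical names.

```python
def init_conv_state(inputs, kernel_size):
    """Prefill: keep the last kernel_size inputs, left-padding with zeros if shorter."""
    pad = kernel_size - len(inputs)
    return [0.0] * pad + list(inputs) if pad >= 0 else list(inputs[-kernel_size:])


def conv_step(state, new_input, weights):
    """Decode: shift the window by one input and apply the causal filter."""
    state = state[1:] + [new_input]
    output = sum(w * x for w, x in zip(weights, state))
    return output, state


def causal_conv_full(inputs, weights):
    """Reference: causal convolution over the whole sequence at once."""
    k = len(weights)
    padded = [0.0] * (k - 1) + list(inputs)
    return [sum(w * x for w, x in zip(weights, padded[i:i + k])) for i in range(len(inputs))]


weights = [0.5, 0.25, 0.125, 0.125]
sequence = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
state = init_conv_state(sequence[:5], kernel_size=4)      # prefill on 5 tokens
step_out, state = conv_step(state, sequence[5], weights)  # decode the 6th token
print(step_out, causal_conv_full(sequence, weights)[-1])  # 3.875 3.875
```

The single-step result matches the full convolution, which is why a window of kernel_size inputs is all that has to be cached for this part of the layer.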

The key code is in modeling_qwen3_5_moe.py:449-546(file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L449):

python
# A layer has "precomputed states" once it has been through a prefill pass.
use_precomputed_states = cache_params is not None and cache_params.has_previous_state(self.layer_idx)

if use_precomputed_states:
    conv_state = cache_params.layers[self.layer_idx].conv_states
    recurrent_state = cache_params.layers[self.layer_idx].recurrent_states

# Prefill: multiple tokens, use chunk mode
if not (use_precomputed_states and seq_len == 1):
    if cache_params is not None:
        # Keep the last conv_kernel_size steps (left-pad with zeros if shorter).
        new_conv_state = F.pad(mixed_qkv, (self.conv_kernel_size - mixed_qkv.shape[-1], 0))
        cache_params.update_conv_state(new_conv_state, self.layer_idx)
    core_attn_out, last_recurrent_state = self.chunk_gated_delta_rule(...)

# Decode: single token, use recurrent mode
else:
    mixed_qkv = self.causal_conv1d_update(mixed_qkv, conv_state, ...)
    core_attn_out, last_recurrent_state = self.recurrent_gated_delta_rule(...)

# Both paths store the final recurrent state for the next step.
if cache_params is not None:
    cache_params.update_recurrent_state(last_recurrent_state, self.layer_idx)
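Why a single recurrent state is enough for decoding can be checked with a toy version of the recurrence S_t = S_{t-1} * g + k^T * δ. The sketch below is a single-head, plain-Python illustration rather than the library kernel: it assumes g is an already-exponentiated decay factor, takes δ = β * (v - k·S) as the correction term, and omits scaling and normalization; gated_delta_step and run are hypothetical names.

```python
def gated_delta_step(state, q, k, v, g, beta):
    """One recurrent step; state is a k_dim x v_dim matrix stored as a list of rows."""
    k_dim, v_dim = len(k), len(v)
    state = [[g * s for s in row] for row in state]  # decay the old memory
    # Value the memory currently predicts for key k.
    pred = [sum(k[i] * state[i][j] for i in range(k_dim)) for j in range(v_dim)]
    delta = [beta * (v[j] - pred[j]) for j in range(v_dim)]  # correction term
    state = [[state[i][j] + k[i] * delta[j] for j in range(v_dim)] for i in range(k_dim)]
    out = [sum(q[i] * state[i][j] for i in range(k_dim)) for j in range(v_dim)]
    return out, state


def run(tokens, state):
    """Apply the recurrence token by token, returning all outputs and the final state."""
    outputs = []
    for q, k, v, g, beta in tokens:
        out, state = gated_delta_step(state, q, k, v, g, beta)
        outputs.append(out)
    return outputs, state


tokens = [
    ([1.0, 0.0], [1.0, 0.0], [2.0, 3.0], 1.0, 1.0),
    ([0.0, 1.0], [0.0, 1.0], [1.0, 1.0], 0.5, 1.0),
    ([1.0, 1.0], [1.0, 0.0], [4.0, 0.0], 0.5, 0.5),
]
zero_state = [[0.0, 0.0], [0.0, 0.0]]
full_out, _ = run(tokens, zero_state)             # whole sequence in one pass
prefill_out, cached = run(tokens[:2], zero_state)  # "prefill" on two tokens
decode_out, _ = run(tokens[2:], cached)            # "decode" the third from the cache
print(full_out[-1], decode_out[0])  # [2.75, 0.875] [2.75, 0.875]
```

Decoding from the cached state gives the same output as processing the whole sequence, so the state left behind by the chunked prefill is all the decode path needs.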

9. The Full generate() Flow

9.1 Generation Loop Sequence Diagram

Participants: User, AutoProcessor, Qwen3_5MoeForConditionalGeneration, Qwen3_5MoeVisionModel, Qwen3_5MoeTextModel, DynamicCache.

Before the loop:

  1. Process image + text
  2. input_ids + pixel_values + grid_thw + mm_token_type_ids

Generation loop, first iteration (is_first_iteration=True):

  1. prepare_inputs_for_generation()
  2. get_image_features(pixel_values, grid_thw)
  3. image_embeds
  4. masked_scatter fuses the visual embeddings
  5. _prepare_position_ids_for_generation() computes the 3D position_ids + rope_deltas
  6. forward(inputs_embeds, position_ids, ...)
  7. Initialize DynamicCache
  8. hidden_states
  9. lm_head → logits → sample next_token

Generation loop, subsequent iterations (is_first_iteration=False):

  1. prepare_inputs_for_generation() clears pixel_values/grid_thw
  2. _prepare_position_ids_for_generation() derives positions from rope_deltas
  3. forward(input_ids=next_token, position_ids, past_key_values)
  4. Read/update the cache
  5. hidden_states
  6. lm_head → logits → sample next_token

After the loop: generated_ids
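The "masked_scatter" fusion step of the first iteration can be illustrated with a small pure-Python sketch. It uses nested lists instead of tensors; `fuse_visual_embeddings` and the placeholder id `IMAGE_PAD_ID = 9` are hypothetical names chosen for illustration, not the library's API. The sketch assumes image embeddings are consumed in order, one per placeholder position.

```python
from typing import List

IMAGE_PAD_ID = 9  # hypothetical placeholder token id, for illustration only


def fuse_visual_embeddings(
    input_ids: List[int],
    text_embeds: List[List[float]],
    image_embeds: List[List[float]],
    placeholder_id: int = IMAGE_PAD_ID,
) -> List[List[float]]:
    """Replace the embedding at each placeholder position with the next image embedding."""
    num_slots = sum(1 for tok in input_ids if tok == placeholder_id)
    if num_slots != len(image_embeds):
        raise ValueError(
            f"{num_slots} placeholder tokens but {len(image_embeds)} image embeddings"
        )
    fused = []
    next_image = iter(image_embeds)
    for tok, emb in zip(input_ids, text_embeds):
        # Placeholder positions take visual features; all others keep the text embedding.
        fused.append(list(next(next_image)) if tok == placeholder_id else list(emb))
    return fused


ids = [1, 9, 9, 2]
text = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.2, 0.2]]
image = [[5.0, 5.0], [6.0, 6.0]]
fused = fuse_visual_embeddings(ids, text, image)
print(fused)
```

The count check mirrors the constraint that the number of placeholder tokens must match the number of visual feature vectors; a mismatch is reported instead of silently misaligning features.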

9.2 Special handling in prepare_inputs_for_generation

Defined in modeling_qwen3_5_moe.py:2106-2142 (file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L2106):

python
def prepare_inputs_for_generation(self, input_ids, past_key_values=None, ...,
                                   pixel_values=None, pixel_values_videos=None,
                                   image_grid_thw=None, video_grid_thw=None,
                                   is_first_iteration=False, **kwargs):
    model_inputs = super().prepare_inputs_for_generation(...)

    # After the first iteration, drop the visual inputs so they are not encoded again.
    # use_cache is not shown in this abbreviated signature; it is presumably one of
    # the parameters elided by "...".
    if not is_first_iteration and use_cache:
        model_inputs["pixel_values"] = None
        model_inputs["pixel_values_videos"] = None

    return model_inputs

First iteration:

  • pixel_values: ✅ set → visual encoding + masked_scatter
  • image_grid_thw: ✅ set
  • mm_token_type_ids: ✅ set

Subsequent iterations:

  • pixel_values: ❌ None → text tokens only; positions are derived from rope_deltas
  • image_grid_thw: ❌ None
  • mm_token_type_ids: ❌ None
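The gating rule above can be exercised in isolation. The following is a minimal sketch, assuming a plain dict of inputs; `prepare_step_inputs` is a hypothetical stand-in written for this example, not the model's method, and the string `"<image tensor>"` stands in for real pixel data.

```python
from typing import Any, Dict, Optional


def prepare_step_inputs(
    base_inputs: Dict[str, Any],
    is_first_iteration: bool,
    use_cache: bool = True,
) -> Dict[str, Optional[Any]]:
    """Toy version of the gating rule: keep visual tensors only on the first step."""
    model_inputs = dict(base_inputs)  # copy so the caller's dict is not mutated
    if not is_first_iteration and use_cache:
        model_inputs["pixel_values"] = None
        model_inputs["pixel_values_videos"] = None
    return model_inputs


base = {
    "input_ids": [1, 9, 9, 2],
    "pixel_values": "<image tensor>",
    "pixel_values_videos": None,
}
for step in range(3):
    step_inputs = prepare_step_inputs(base, is_first_iteration=(step == 0))
    print(step, step_inputs["pixel_values"])
```

Note the `use_cache` condition: in this sketch, when caching is disabled the visual inputs are kept on every step, since without a cache the whole sequence, image positions included, has to be processed again.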

9.3 3D position encoding in _prepare_position_ids_for_generation

Defined in modeling_qwen3_5_moe.py:2144-2180 (file:///workspace/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L2144):

Decision flow of _prepare_position_ids_for_generation:

  1. Is past_length != 0 and does rope_deltas exist? If yes, position_ids = text_positions + rope_deltas (the cached delta is used directly).
  2. Otherwise, are input_ids, mm_token_type_ids, and image/video_grid_thw all present?
     • Yes: get_rope_index(input_ids, ...) computes the full 3D positions and stores rope_deltas.
     • No: vision_positions = text_positions.expand(3, -1, -1) (text only: all three dimensions are identical) and rope_deltas = zeros (no multimodal offset).
  3. torch.cat(text_positions, vision_positions) yields position_ids with shape [4, bs, seq].

Key code:

python
def _prepare_position_ids_for_generation(self, inputs_tensor, model_kwargs):
    text_positions = super()._prepare_position_ids_for_generation(inputs_tensor, model_kwargs)

    # Incremental decoding: reuse the cached rope_deltas
    past_length = 0
    if (cache := model_kwargs.get("past_key_values")) is not None:
        past_length = cache.get_seq_length()
    if past_length != 0 and self.model.rope_deltas is not None:
        position_ids = text_positions[None, ...] + self.model.rope_deltas
        return position_ids

    # First pass: compute the 3D positions
    # (is_input_ids and the trailing "..." are abbreviated in this excerpt)
    if is_input_ids and model_kwargs.get("mm_token_type_ids") is not None and ...:
        vision_positions, rope_deltas = self.model.get_rope_index(inputs_tensor, **model_kwargs)
        self.model.rope_deltas = rope_deltas
    else:
        vision_positions = text_positions.unsqueeze(0).expand(3, -1, -1)
        self.model.rope_deltas = torch.zeros(...)

    # Concatenate [text, T, H, W] → [4, bs, seq]
    text_positions = text_positions[None, ...]
    position_ids = torch.cat([text_positions, vision_positions], dim=0)
    return position_ids
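To make the two shapes concrete, here is a minimal pure-Python sketch of the text-only branch and the incremental branch. It uses nested lists in place of tensors, represents rope_deltas as one integer per sequence, and the function names are hypothetical helpers written for this example. The multimodal branch (get_rope_index) is not modeled.

```python
from typing import List

Grid = List[List[int]]  # [batch, seq]


def text_only_position_ids(text_positions: Grid) -> List[Grid]:
    """Text-only branch: the T, H, W planes all equal the 1D text positions.

    Returns a nested list shaped [4, batch, seq], ordered [text, T, H, W].
    """
    return [[row[:] for row in text_positions] for _ in range(4)]


def incremental_position_ids(text_positions: Grid, rope_deltas: List[int]) -> List[Grid]:
    """Decode branch: shift each sequence's text positions by its cached delta.

    Returns a nested list shaped [1, batch, seq], mirroring text_positions[None, ...].
    """
    if len(text_positions) != len(rope_deltas):
        raise ValueError("need exactly one rope delta per sequence in the batch")
    return [[[p + d for p in row] for row, d in zip(text_positions, rope_deltas)]]


prefill = text_only_position_ids([[0, 1, 2]])
decode = incremental_position_ids([[3]], rope_deltas=[-2])
print(prefill)
print(decode)
```

The delta of -2 is an invented value. It illustrates the idea that the multimodal position of the next token can differ from the plain token count, and that a single cached offset per sequence is enough to recover it at every decode step.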

10. Distributed parallelism

10.1 TP strategy mapping

Defined in configuration_qwen3_5_moe.py:59-72 (file:///workspace/src/transformers/models/qwen3_5_moe/configuration_qwen3_5_moe.py#L59):

Attention layer parallel strategy:

  • q_proj → colwise (split by column; each GPU computes a subset of the heads)
  • k_proj → colwise
  • v_proj → colwise
  • o_proj → rowwise (split by row; results are all-reduced)
  • q_norm → replicated_with_grad_allreduce (replicated; gradients are all-reduced)
  • k_norm → replicated_with_grad_allreduce

MoE expert parallel strategy:

  • experts.gate_up_proj → packed_colwise (packed column split, [256, 1024, 2048])
  • experts.down_proj → rowwise
  • experts → moe_tp_experts (expert-level parallelism: each GPU holds a subset of the experts)
  • shared_expert.gate_proj → colwise
  • shared_expert.up_proj → colwise
  • shared_expert.down_proj → rowwise
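The mapping above can be written down as a plain dictionary and queried by module name. This is a sketch only: the keys are the suffixes listed in this section rather than the exact patterns used in the configuration file, `lookup_strategy` is a hypothetical helper, and the module names in the example are assumptions made for illustration.

```python
from typing import Dict, Optional

# Suffix -> strategy, copied from the mapping described in this section.
TP_PLAN: Dict[str, str] = {
    "q_proj": "colwise",
    "k_proj": "colwise",
    "v_proj": "colwise",
    "o_proj": "rowwise",
    "q_norm": "replicated_with_grad_allreduce",
    "k_norm": "replicated_with_grad_allreduce",
    "experts.gate_up_proj": "packed_colwise",
    "experts.down_proj": "rowwise",
    "experts": "moe_tp_experts",
    "shared_expert.gate_proj": "colwise",
    "shared_expert.up_proj": "colwise",
    "shared_expert.down_proj": "rowwise",
}


def lookup_strategy(module_name: str, plan: Dict[str, str] = TP_PLAN) -> Optional[str]:
    """Return the strategy of the longest plan key that is a dotted suffix of module_name."""
    best_key = None
    for key in plan:
        if module_name == key or module_name.endswith("." + key):
            if best_key is None or len(key) > len(best_key):
                best_key = key
    return plan[best_key] if best_key is not None else None


# Hypothetical module names, used only to exercise the lookup.
for name in (
    "layers.0.self_attn.q_proj",
    "layers.0.mlp.experts",
    "layers.0.mlp.experts.down_proj",
    "layers.0.mlp.gate",
):
    print(name, "->", lookup_strategy(name))
```

Preferring the longest matching suffix keeps `experts.down_proj` (rowwise) distinct from the enclosing `experts` module (moe_tp_experts).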

10.2 How MoE expert parallelism (moe_tp_experts) works

Tensor parallelism across 4 GPUs:

  • GPU 0: Expert 0-63, gate_up_proj[0:64], down_proj[0:64]
  • GPU 1: Expert 64-127, gate_up_proj[64:128], down_proj[64:128]
  • GPU 2: Expert 128-191, gate_up_proj[128:192], down_proj[128:192]
  • GPU 3: Expert 192-255, gate_up_proj[192:256], down_proj[192:256]

Data flow:

  1. hidden_states [bs, seq, 2048]
  2. TopKRouter: every GPU computes the full routing
  3. All-to-All communication: tokens are sent to the GPU that holds their expert
  4. Each GPU runs the forward pass of its local experts in parallel
  5. All-to-All communication: the results are gathered
  6. expert_output [bs, seq, 2048]
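The contiguous expert partition and the token dispatch step can be sketched in a few lines. This is an illustrative model of the bookkeeping only, with no real communication; `local_expert_range`, `owner_rank`, and `dispatch` are hypothetical helpers, and the contiguous layout is taken from the 4-GPU example above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def local_expert_range(rank: int, num_experts: int = 256, tp_size: int = 4) -> Tuple[int, int]:
    """Half-open range [start, end) of expert ids held by `rank`."""
    if num_experts % tp_size != 0:
        raise ValueError("num_experts must be divisible by tp_size in this sketch")
    per_rank = num_experts // tp_size
    return rank * per_rank, (rank + 1) * per_rank


def owner_rank(expert_id: int, num_experts: int = 256, tp_size: int = 4) -> int:
    """Rank that holds `expert_id` under the contiguous partition above."""
    return expert_id // (num_experts // tp_size)


def dispatch(
    topk_ids: List[List[int]], num_experts: int = 256, tp_size: int = 4
) -> Dict[int, List[Tuple[int, int]]]:
    """Group (token_index, expert_id) pairs by the rank that must compute them."""
    per_rank: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for token_idx, expert_ids in enumerate(topk_ids):
        for expert_id in expert_ids:
            per_rank[owner_rank(expert_id, num_experts, tp_size)].append((token_idx, expert_id))
    return dict(per_rank)


ranges = [local_expert_range(r) for r in range(4)]
routed = dispatch([[0, 70], [200, 255]])  # two tokens, two selected experts each
print(ranges)
print(routed)
```

In the example, token 0 needs work from ranks 0 and 1 while token 1 is handled entirely by rank 3, which is why a token's hidden state may have to be sent to several GPUs and its partial results collected afterwards.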

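How moe_tp_experts differs from plain colwise/rowwise:

  • colwise/rowwise: splits the weight matrix of a single linear layer
  • moe_tp_experts: splits along the expert dimension, so each GPU holds 256/tp_size complete experts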
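For contrast with expert-level sharding, the colwise/rowwise pairing can be checked numerically. The sketch below uses small integer matrices and pure Python; all names are illustrative, and the final sum plays the role of the all-reduce. It shows that a column-sharded first projection followed by a row-sharded second projection reproduces the unsharded result.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a [m, k] and b [k, n]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Column-wise shards: each shard keeps all rows and a slice of the columns."""
    step = len(w[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def split_rows(w: Matrix, parts: int) -> List[Matrix]:
    """Row-wise shards: each shard keeps a slice of the rows and all columns."""
    step = len(w) // parts
    return [w[i * step:(i + 1) * step] for i in range(parts)]


x = [[1, 2]]                                   # [1, in=2]
w_up = [[1, 0, 2, 1], [0, 1, 1, 3]]            # [in=2, mid=4], sharded colwise
w_down = [[1, 0], [2, 1], [0, 1], [1, 1]]      # [mid=4, out=2], sharded rowwise

reference = matmul(matmul(x, w_up), w_down)

# Each "GPU" multiplies by its column shard, then by the matching row shard.
partials = [
    matmul(matmul(x, up), down)
    for up, down in zip(split_columns(w_up, 2), split_rows(w_down, 2))
]
# Summing the partial outputs element-wise stands in for the all-reduce.
all_reduced = [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]
print(reference, all_reduced)
```

Because the intermediate activation is sharded along its feature dimension, an element-wise activation function between the two projections would not change this equivalence, and only one reduction is needed at the end.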

11. State and lifecycle summary

11.1 State machine diagram

[State machine diagram unavailable: the Mermaid source failed to parse at line 41. The only surviving fragment is a generation-state annotation that begins "full_attention layers: KV Cache" and is followed by a truncated "linea...".]

11.2 Key data flow summary

Input:

  • input_ids
  • pixel_values
  • grid_thw
  • mm_token_type_ids

Visual encoding:

  • VisionModel: PatchEmbed → 27 Blocks → Merger
  • image_embeds: [num_tokens, 3584]

Embedding fusion:

  • masked_scatter: visual embeddings → placeholder positions
  • inputs_embeds: [bs, seq, 2048]

Position encoding:

  • get_rope_index: 3D M-RoPE
  • position_ids: [4, bs, seq]

Text model:

  • embed_tokens
  • 40 DecoderLayer layers
  • RMSNorm

Each layer:

  • input_layernorm
  • full_attention / linear_attention
  • post_attention_layernorm
  • SparseMoeBlock: 256 experts, Top-8 + shared expert

Output:

  • lm_head: [2048, 248320]
  • logits
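The Top-8 selection inside the SparseMoeBlock can be sketched for a single token. This is a minimal illustration under stated assumptions: it applies a softmax over the router logits, keeps the top_k experts, and renormalizes their weights, which is a common MoE routing recipe but is not confirmed here as the exact rule used by this model. The shared expert is not modeled, and `route_token` is a hypothetical name.

```python
import math
from typing import List, Tuple


def route_token(router_logits: List[float], top_k: int = 8) -> List[Tuple[int, float]]:
    """Pick the top_k experts for one token and renormalize their weights to sum to 1."""
    if top_k > len(router_logits):
        raise ValueError("top_k cannot exceed the number of experts")
    peak = max(router_logits)
    exps = [math.exp(v - peak) for v in router_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Expert indices sorted by descending probability, truncated to top_k.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


# Small example: 4 experts, top-2, instead of 256 experts, top-8.
routes = route_token([0.1, 2.0, -1.0, 1.5], top_k=2)
print(routes)
```

Only the selected experts run for a given token, so the per-token compute scales with top_k rather than with the total number of experts.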

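11.3 Core design philosophy

The Transformers implementation of Qwen3.5-MoE reflects the following design principles:

  1. Modular inheritance: class inheritance in modular_qwen3_5_moe.py (Qwen3_5MoeGatedDeltaNet ← Qwen3_5GatedDeltaNet) maximizes code reuse and minimizes duplication.
  2. Unified management of the hybrid architecture: DynamicCache dispatches a cache type per layer according to config.layer_types, so higher-level code does not need to know about the underlying differences.
  3. Multimodal position encoding: M-RoPE brings 1D text positions and 3D visual positions into one framework, and rope_deltas makes positions cheap to derive during incremental generation.
  4. MoE expert parallelism: the moe_tp_experts strategy lets the 256 experts be distributed across GPUs, and the @use_experts_implementation decorator supports multiple optimized backends.
  5. Generation efficiency: linear_attention layers have O(n) complexity and cache a recurrent_state, so incremental decoding does not need a full KV cache for those layers, which greatly reduces memory use when generating long sequences.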
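Points 2 and 5 can be illustrated together with a toy hybrid cache. The sketch below is not the DynamicCache implementation: the class names are hypothetical, the layer pattern (three linear_attention layers per full_attention layer) is an assumption made for the example, and the recurrent update is a placeholder decay-and-add rule rather than the Gated DeltaNet update.

```python
from typing import List, Union


class GrowingKVCache:
    """Stand-in for a full_attention layer cache: one key and one value per token."""

    def __init__(self) -> None:
        self.keys: List[float] = []
        self.values: List[float] = []

    def update(self, token_feature: float) -> None:
        self.keys.append(token_feature)
        self.values.append(token_feature)

    def num_stored(self) -> int:
        return len(self.keys) + len(self.values)


class FixedRecurrentState:
    """Stand-in for a linear_attention layer cache: a fixed-size running state."""

    def __init__(self, state_size: int = 4) -> None:
        self.state = [0.0] * state_size

    def update(self, token_feature: float) -> None:
        # Placeholder decay-and-add rule; NOT the Gated DeltaNet update.
        self.state = [0.5 * s + token_feature for s in self.state]

    def num_stored(self) -> int:
        return len(self.state)


def build_caches(layer_types: List[str]) -> List[Union[GrowingKVCache, FixedRecurrentState]]:
    """Dispatch one cache object per layer according to its type."""
    factories = {"full_attention": GrowingKVCache, "linear_attention": FixedRecurrentState}
    return [factories[kind]() for kind in layer_types]


# Hypothetical layer pattern, for illustration only.
layer_types = ["linear_attention", "linear_attention", "linear_attention", "full_attention"]
caches = build_caches(layer_types)
for step in range(10):  # simulate ten decoded tokens
    for cache in caches:
        cache.update(float(step))
sizes = [cache.num_stored() for cache in caches]
print(sizes)
```

After ten steps only the full_attention stand-in has grown; the linear_attention stand-ins hold the same number of values as at the start, which is the memory behavior the last design point describes.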