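02_DeepSpec-DSpark-Core-Principles_Speculative-Decoding-and-Draft-Models

02 · Core Principles: Speculative Decoding and Draft Models

This chapter is the bridge in the series' overview-details-summary structure. The previous chapter, 00 Overview, laid out the project map. This one starts from the basic question of why a draft model is needed and explains the mathematical guarantee behind speculative decoding, the trade-offs among three drafter paradigms, and the motivation for DSpark's semi-autoregressive design. The later chapters 03 DSpark Modeling and 04 Eagle3 Modeling build on this to cover implementation details.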


Overview (general)

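Speculative decoding replaces token-by-token generation with "guess, then verify": a lightweight draft model quickly predicts the next K tokens, and the large target model verifies those K candidates in a single batched pass. Verification uses rejection sampling, which guarantees that the output distribution is identical to that of the original target model, so the speedup is lossless.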

DeepSpec implements three drafter paradigms:

| Paradigm | Algorithm | Draft time | Suffix decay | Representative code |
| --- | --- | --- | --- | --- |
| Autoregressive | Eagle3 | $O(K)$ | no | eagle3/loss.py(file:///workspace/deepspec/modeling/eagle3/loss.py) |
| Parallel | DFlash | $O(1)$ | yes | dspark/config(file:///workspace/config/dflash/dflash_qwen3_4b.py) |
| Semi-autoregressive | DSpark | $O(1) + O(K)_{\text{light}}$ | mitigated by the Markov head | dspark/modeling.py(file:///workspace/deepspec/modeling/dspark/qwen3/modeling.py) |

DSpark's core insight: the parallel backbone delivers high throughput, the serial Markov head fixes suffix decay, and the confidence head makes verification "smart".
[Figure: the speculative decoding loop. prompt tokens → target model (generates the anchor in 1 step) → draft model (generates K candidates) → target model (batch-verifies K+1) → rejection sampling → committed tokens (accepted prefix) and residual sample (correction at the rejection point) → next-round anchor]

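Figure note: This is the smallest loop unit of speculative decoding. The target first generates one anchor token; the draft model uses it to generate K candidates; the target verifies K+1 tokens in one forward pass (K draft tokens plus 1 bonus); rejection sampling yields the longest accepted prefix plus one correction or bonus token, which becomes the next round's anchor. Each round commits at least $1$ token and at most $K+1$.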


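Detailed discussion (specific)

2.1 Why autoregressive is slow and parallel is fast

The bottleneck in large-model inference is memory bandwidth, not FLOPs: every forward pass has to move all of the model weights from HBM into SRAM. As a result, having the GPU verify 10 tokens at once is only slightly slower than verifying 1, because the weights are moved only once.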
[Figure: two panels. Parallel verification (target verify batch): token 1 to token 4 are processed at the same time; the weights are moved once; time O(1). Autoregressive generation (Eagle3): token 1, token 2, token 3, token 4 in sequence; the weights are moved at every step; time O(K).]

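Figure note: When generating token by token, an autoregressive drafter reloads the weights at every step, so draft time grows linearly with K. A parallel drafter packs the K tokens into one batch, moves the weights once, and takes almost the same time regardless of K. This is the fundamental reason DSpark chooses a parallel backbone plus a lightweight serial correction: it keeps the parallel throughput and uses a very light serial head to fix the suffix decay discussed below.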
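The bandwidth argument can be made concrete with a roofline-style estimate. The sketch below is only a back-of-the-envelope model: the function name `forward_latency_s` and every hardware number are hypothetical, and it ignores activation and KV-cache traffic, which is why a real batch of 10 tokens is slightly slower than 1 rather than exactly equal.

```python
def forward_latency_s(n_tokens, param_bytes, flops_per_token,
                      mem_bw_bytes_s, peak_flops_s):
    """Roofline-style estimate of one forward pass.

    The pass costs whichever is larger: streaming the weights once,
    or doing the arithmetic for all n_tokens.
    """
    t_memory = param_bytes / mem_bw_bytes_s            # weights read once per forward
    t_compute = n_tokens * flops_per_token / peak_flops_s
    return max(t_memory, t_compute)


# Hypothetical setup (assumptions, not measurements):
# 4e9 parameters in bf16 (2 bytes each), about 2 FLOPs per parameter per token,
# 2 TB/s of memory bandwidth and 400 TFLOP/s of peak compute.
PARAMS = 4e9
CFG = dict(param_bytes=2 * PARAMS, flops_per_token=2 * PARAMS,
           mem_bw_bytes_s=2e12, peak_flops_s=4e14)

t_one = forward_latency_s(1, **CFG)      # one token per forward
t_ten = forward_latency_s(10, **CFG)     # ten tokens verified in one forward
t_serial = 10 * t_one                    # ten tokens generated one by one

print(f"1 token:  {t_one * 1e3:.2f} ms")
print(f"10 batch: {t_ten * 1e3:.2f} ms")
print(f"10 serial: {t_serial * 1e3:.2f} ms")
```

Under these assumed numbers both the 1-token and the 10-token pass are memory-bound, so the batched pass costs the same as a single step while the serial path costs ten times as much.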

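2.2 The lossless guarantee of rejection sampling

The core algorithm is verify_draft_tokens in base_evaluator.py:186-304(file:///workspace/deepspec/eval/base_evaluator.py#L186-304). For each draft candidate $x_k$:

$$\text{accept\_prob}_k = \min\left(1, \frac{p^t_k(x_k)}{p^d_k(x_k)}\right)$$

  • $p^t_k$ is the target distribution at that position and $p^d_k$ is the draft distribution
  • A token is accepted when a uniform sample satisfies rand < accept_prob, and rejected otherwise
  • Every token after the first rejected position is zeroed out (implemented with cumprod)
  • On rejection, the correction token is sampled from the normalized residual $\max(0, p^t - p^d)$
  • If everything is accepted, a bonus token is sampled from the target distribution at the last position

Why it is lossless: this is classic rejection sampling. The acceptance probability exactly offsets the difference between the draft and target distributions, so the output distribution is exactly the target distribution. The non-anticipating property stressed in Section 3.2.2 of the paper (the truncation decision must not depend on future tokens) exists to preserve this guarantee.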
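The steps listed above can be sketched in plain Python. This is not the repository's verify_draft_tokens (which works on batched tensors and uses cumprod); `verify_sketch` and `_sample` are hypothetical names, and distributions are plain lists over a tiny vocabulary, under the assumption that each draft token was sampled from the corresponding draft distribution.

```python
import random


def _sample(weights, rng):
    """Sample an index proportionally to non-negative weights (need not sum to 1)."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def verify_sketch(draft_tokens, p_draft, p_target, rng=random):
    """Rejection-sampling verification for one request.

    draft_tokens: K token ids, each assumed sampled from p_draft[k]
    p_draft:      K draft distributions (lists over the vocabulary)
    p_target:     K+1 target distributions (the last one is for the bonus token)
    Returns the accepted prefix plus exactly one correction or bonus token.
    """
    committed = []
    for k, x in enumerate(draft_tokens):
        pt, pd = p_target[k], p_draft[k]
        # pd[x] > 0 because x was sampled from pd.
        accept_prob = min(1.0, pt[x] / pd[x])
        if rng.random() < accept_prob:
            committed.append(x)
            continue
        # First rejection: sample the correction from the residual max(0, pt - pd).
        # random.choices normalizes the weights, so no explicit division is needed.
        residual = [max(0.0, t - d) for t, d in zip(pt, pd)]
        committed.append(_sample(residual, rng))
        return committed          # everything after the rejection is discarded
    # All K drafts accepted: draw the bonus token from the last target distribution.
    committed.append(_sample(p_target[len(draft_tokens)], rng))
    return committed
```

A quick way to convince yourself of the lossless claim is to draw many draft tokens from a deliberately poor draft distribution, run them through `verify_sketch`, and check that the first committed token follows the target distribution empirically.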

2.3 Comparing the three drafter paradigms

[Figure: three panels. Eagle3 (autoregressive): at step 1 the target hidden state feeds a 1-layer draft model with KV cache, which outputs logit 1; step 2 outputs logit 2; and so on up to logit K. DFlash (parallel): anchor + K-1 mask tokens go through a 5-layer draft model in a single forward pass; logits 1..K come out together. DSpark (semi-autoregressive): anchor + K-1 mask tokens go through a 5-layer parallel backbone in a single forward pass to give base logits 1..K; the serial Markov head corrects them token by token into corrected logits 1..K; the confidence head scores each position.]

Figure note: Eagle3 uses a 1-layer draft model with a KV cache and generates K steps serially, so draft time is $O(K)$. DSpark and DFlash use a 5-layer draft model with a mask-token block and produce K logits in a single forward pass, so draft time is $O(1)$. On top of DFlash's parallel backbone, DSpark adds a very light serial Markov head (parameter count $O(rV)$, $r=256$) and a confidence head, keeping the parallel throughput while fixing suffix decay.

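2.4 Suffix decay and capacity advantage

Figure 2 of the paper reveals a key phenomenon (Section 4.3.1):

  • Position 1: the parallel drafter (DFlash) has a clearly higher acceptance rate than the autoregressive one (Eagle3), because a parallel drafter can afford a deeper network (5 layers vs. 1 layer); this is the capacity advantage. Example: on Qwen3-4B, DFlash reaches 0.88 vs. Eagle3's 0.81 on Math, and 0.72 vs. 0.53 on Chat.
  • Positions 2-7: the parallel drafter decays quickly ("multi-modal collision": each parallel prediction independently marginalizes over all possible predecessors and cannot condition on the tokens already sampled), while the autoregressive drafter actually improves.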

[Figure: three boxes. Position 1, capacity advantage: DFlash 0.88, Eagle3 0.81; a parallel drafter can use 5 layers, an autoregressive one only 1. Positions 2-7, suffix decay: DFlash 0.87→0.78, Eagle3 0.53→0.74; parallel predictions are independent and cannot be conditioned. DSpark: parallel backbone + serial head, getting both.]

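Figure note: DSpark's design goal is to stack the strengths of both paradigms: at position 1 the deep parallel network provides high capacity, and at positions 2-7 the Markov head injects prefix dependence to avoid suffix decay. The paper reports that on Math DSpark starts at 0.93 and stays stable across the whole block, about 5-15 percentage points above DFlash.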
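The multi-modal collision behind suffix decay can be seen in a two-token toy case. The example below is invented for illustration (the "New York" / "Los Angeles" target and all names are hypothetical); it uses the per-position acceptance rate $1 - \frac{1}{2}\|p^d - p^t\|_1$ and assumes both drafters are otherwise perfectly trained.

```python
def accept_rate(p_draft, p_target):
    """Per-position acceptance rate: 1 - 0.5 * L1 distance between the distributions."""
    return 1.0 - 0.5 * sum(abs(d - t) for d, t in zip(p_draft, p_target))


# Toy target: the continuation is "New York" or "Los Angeles", each with probability 0.5.
# Position-2 vocabulary order is ["York", "Angeles"].
TARGET_POS2_GIVEN = {"New": [1.0, 0.0], "Los": [0.0, 1.0]}

# Parallel drafter: position 2 is predicted without seeing the sampled position-1
# token, so the best it can output is the marginal over both modes.
PARALLEL_POS2 = [0.5, 0.5]

# Autoregressive drafter: conditions on the sampled token, so it can match the
# target conditional exactly (an idealized assumption).
AUTOREGRESSIVE_POS2 = TARGET_POS2_GIVEN

for first in ("New", "Los"):
    c_parallel = accept_rate(PARALLEL_POS2, TARGET_POS2_GIVEN[first])
    c_autoreg = accept_rate(AUTOREGRESSIVE_POS2[first], TARGET_POS2_GIVEN[first])
    print(first, "parallel:", c_parallel, "autoregressive:", c_autoreg)
```

Whichever first token is sampled, the parallel prediction is accepted only half of the time at position 2, while the conditional prediction is always accepted. Injecting the sampled predecessor is exactly what a serial head adds.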

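2.5 τ and the acceptance-rate formula

Acceptance rate (paper Eq. 8): the acceptance probability at a single position equals 1 minus the total variation distance between the draft and target distributions, that is, 1 minus half their L1 distance:

$$c^*_k = 1 - \frac{1}{2}\|p^d_k - p^t_k\|_1$$

The implementation is _compute_accept_rate_3d in dspark/loss.py:60-70(file:///workspace/deepspec/modeling/dspark/loss.py#L60-70). This formula is the training target for DSpark's confidence head: the analytic acceptance rate is used directly as a soft label, so supervision does not require actually running speculative decoding.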
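As a small worked check of the formula, the sketch below computes the acceptance rate in two equivalent ways: from the L1 distance, and as the overlap $\sum_x \min(p^d(x), p^t(x))$, which is the probability mass the two distributions share. Both function names are hypothetical and operate on plain lists, unlike the repository's _compute_accept_rate_3d.

```python
def accept_rate_l1(p_draft, p_target):
    """c* = 1 - 0.5 * ||p_d - p_t||_1."""
    return 1.0 - 0.5 * sum(abs(d - t) for d, t in zip(p_draft, p_target))


def accept_rate_overlap(p_draft, p_target):
    """Equivalent form: total probability mass shared by the two distributions."""
    return sum(min(d, t) for d, t in zip(p_draft, p_target))


p_d = [0.2, 0.5, 0.3]   # draft distribution (made-up numbers)
p_t = [0.7, 0.2, 0.1]   # target distribution (made-up numbers)

# Both forms give approximately 0.5 for this pair.
print(accept_rate_l1(p_d, p_t), accept_rate_overlap(p_d, p_t))
```

Because the value depends only on the two distributions, it can be computed during training for every position and used as the soft label, with no sampling involved.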

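τ (accepted length): the average number of tokens accepted per verification round (including the bonus token), the key metric in Table 1 of the paper. Its expected value is approximately:

$$\tau \approx 1 + \sum_{k=1}^{K} \prod_{i \leq k} c_i$$

In the code, tau_prob_per_block = expected_draft_accepted + 1 (loss.py:40-57(file:///workspace/deepspec/modeling/dspark/loss.py#L40-57)). The 1 is added because every round always commits one token beyond the accepted draft tokens: the bonus token, or the correction token after a rejection.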
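The τ approximation is a running product followed by a sum, which a few lines of Python make explicit. `expected_tau` is a hypothetical helper written for this chapter, not the repository's code.

```python
def expected_tau(accept_rates):
    """tau ~= 1 + sum_k prod_{i<=k} c_i for per-position acceptance rates c_1..c_K."""
    tau = 1.0        # the token every round always commits
    prefix = 1.0     # probability that all drafts up to position k were accepted
    for c in accept_rates:
        prefix *= c
        tau += prefix
    return tau


print(expected_tau([1.0, 1.0, 1.0]))   # perfect drafter with K = 3
print(expected_tau([0.5, 0.5]))        # 1 + 0.5 + 0.25
```

The product structure is why early positions matter most: a low $c_1$ scales down every later term in the sum.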

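2.6 The lossless property and the non-anticipating constraint

Section 3.2.2 and Appendix A of the paper stress that, to avoid breaking losslessness, the scheduler's decisions must not depend on future tokens. Concretely, in Algorithm 1:

  • Sort the candidates globally by $a_{r,j} = \prod_{i \leq j} c_{r,i}$
  • Greedily add tokens, updating $\Theta = \tau \cdot \text{SPS}(B)$
  • Break as soon as $\Theta$ decreases (the early stop guarantees the non-anticipating property)

A retrospective global search (without the early stop) would leak the sampled outcome of $x_{r,k}$ into the decision at step $k$ and introduce selection bias. The open-source version in this repository provides only a single-request approximation that truncates by confidence_threshold (draft_ops.py:82-93(file:///workspace/deepspec/eval/dspark/draft_ops.py#L82-93)); the production Algorithm 1 is implemented inside HAI-LLM.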
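The three steps of Algorithm 1 listed above can be sketched as follows. This is a reading of the bullet points only, not the production scheduler: `greedy_schedule`, the throughput model `toy_sps`, and the choices that τ is summed over requests and that B counts verified tokens (one per request plus the selected drafts) are all assumptions made for illustration.

```python
def greedy_schedule(conf, sps):
    """Pick a draft length per request by greedy search with early stopping.

    conf[r][j]: predicted confidence c_{r,j} for request r at draft position j
    sps(B):     steps per second when B tokens are verified in one batch
    Returns (draft length per request, final objective Theta).
    """
    n_req = len(conf)
    candidates = []
    for r, row in enumerate(conf):
        a = 1.0
        for j, c in enumerate(row):
            a *= c                      # a_{r,j} = prod_{i<=j} c_{r,i}
            candidates.append((a, r, j))
    # Highest survival probability first; ties go to the earlier position so a
    # prefix is always considered before its own extension.
    candidates.sort(key=lambda item: (-item[0], item[2]))

    lengths = [0] * n_req
    tau = float(n_req)                  # each request commits one token regardless
    batch = n_req                       # one verified position per request
    theta = tau * sps(batch)
    for a, r, j in candidates:
        new_theta = (tau + a) * sps(batch + 1)
        if new_theta < theta:
            break                       # early stop: Theta would decrease
        lengths[r] = j + 1
        tau, batch, theta = tau + a, batch + 1, new_theta
    return lengths, theta


def toy_sps(batch):
    """Made-up throughput curve: flat up to 4 tokens, then degrading."""
    return 1.0 / (1.0 + 0.05 * max(0, batch - 4))


lengths, theta = greedy_schedule([[0.9, 0.8, 0.1], [0.5, 0.2]], toy_sps)
print(lengths, theta)
```

The inputs are only the confidence-head predictions, which are available before any draft token is checked against the target, so the chosen lengths cannot depend on which tokens end up accepted.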

2.7 The roles of the three losses

DSpark's training objective (paper Eq. 12):

$$L = \alpha_{ce} L_{ce} + \alpha_{tv} L_{tv} + \alpha_{conf} L_{conf}, \quad \alpha_{ce}=0.1,\ \alpha_{tv}=0.9,\ \alpha_{conf}=1.0$$
[Figure: how the three losses combine. Nodes: backbone hidden; Lce (CE loss, predicts the correct token); Ltv (TV distance, aligns with the target distribution); confidence head; Lconf (BCE, predicts the acceptance rate); weighted sum; backward.]

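Figure note: Lce is the standard cross-entropy; Ltv directly penalizes the complement of the acceptance rate (the total variation distance); Lconf supervises the confidence head. $\alpha_{tv}=0.9$ is far larger than $\alpha_{ce}=0.1$, which shows that DSpark weights aligning with the target distribution more heavily than predicting the correct token, because aligning the distributions directly maximizes the expected acceptance rate. The position weight $w_k = \exp(-(k-1)/\gamma)$ gives earlier positions more weight (loss.py:25-37(file:///workspace/deepspec/modeling/dspark/loss.py#L25-37)), because the prefix tokens determine whether the rest of the block can be accepted.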
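To show how the three terms and the position weights fit together, here is a minimal pure-Python sketch for a single block. It is an illustration under stated assumptions, not the repository's loss.py: the function names are hypothetical, the value of γ is left to the caller, and applying $w_k$ to all three terms and normalizing by $\sum_k w_k$ are guesses about details the text does not spell out.

```python
import math


def position_weights(num_positions, gamma):
    """w_k = exp(-(k - 1) / gamma) for k = 1..K; earlier positions weigh more."""
    return [math.exp(-(k - 1) / gamma) for k in range(1, num_positions + 1)]


def dspark_loss_sketch(p_draft, p_target, labels, conf_pred, gamma,
                       a_ce=0.1, a_tv=0.9, a_conf=1.0):
    """Weighted sum of CE, TV and BCE terms over one block of K positions.

    p_draft, p_target: K distributions each (lists over the vocabulary)
    labels:            K ground-truth token ids
    conf_pred:         K confidence-head outputs, each strictly inside (0, 1)
    """
    weights = position_weights(len(labels), gamma)
    total = 0.0
    for k, w in enumerate(weights):
        # L_tv: total variation distance between draft and target.
        tv = 0.5 * sum(abs(d - t) for d, t in zip(p_draft[k], p_target[k]))
        # L_ce: cross-entropy of the draft distribution against the label.
        ce = -math.log(p_draft[k][labels[k]])
        # L_conf: BCE against the analytic acceptance rate used as a soft label.
        soft = 1.0 - tv
        bce = -(soft * math.log(conf_pred[k])
                + (1.0 - soft) * math.log(1.0 - conf_pred[k]))
        total += w * (a_ce * ce + a_tv * tv + a_conf * bce)
    return total / sum(weights)   # normalization is an assumption
```

For example, `dspark_loss_sketch([[0.5, 0.5]], [[0.5, 0.5]], [0], [0.5], gamma=2.0)` has a zero TV term, so only the cross-entropy and confidence terms contribute.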


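Summary (general)

This chapter builds three intuitions:

  1. Parallel verification is cheap and autoregression is expensive: weight movement is the bottleneck, and parallelism packs K tokens into one forward pass.
  2. Parallel drafters suffer from suffix decay: parallel predictions cannot condition on already-sampled tokens, so the acceptance rate at positions 2-7 decays quickly; autoregressive drafters are stable there but capacity-limited at position 1.
  3. DSpark = parallel backbone + serial Markov head + confidence head: parallelism for throughput, the serial head against decay, and confidence for smart truncation of verification. Every loss term (CE/TV/BCE) relates directly to the acceptance rate, so the training objective is aligned with the evaluation objective.

These principles are realized as a 13-step forward flow in 03 DSpark Modeling, as the verify_draft_tokens loop in 07 Evaluation System, and as the 9×4 matrix of Table 1 in 08 Experiment Reproduction.

Further reading: go to 03 DSpark Modeling to see how semi-autoregression is implemented, or read 04 Eagle3 Modeling for the autoregressive counterpart. The paper's Section 2 (Preliminaries) and Section 3.1 (Semi-Autoregressive Generation) are in DSpark_paper.pdf(file:///workspace/DSpark_paper.pdf).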