08 · Reproducing the Experiments: Paper Cross-Reference and Workflow Walkthrough

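This is the experiment document that the user specifically asked to be detailed and to stand as its own part. It is cross-referenced against Sections 4--5 of the DSpark paper. Parts 02 (Core Principles) through 07 (Evaluation System) already covered the code implementation; this part answers how to reproduce the paper's experiments with that code. It has 8 sections, each with a Mermaid flowchart and a written explanation, plus a complete command matrix, a file inventory, and a list of pitfalls.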


Overview (summary)

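The DSpark paper's experiments come in two parts: Section 4, offline benchmark experiments (draft quality plus ablations), and Section 5, production deployment experiments (validation on live DeepSeek-V4 traffic). The code in this repository covers every experiment in Section 4. The HAI-LLM/ZOS parts of Section 5 live on the production side; the open-source code provides the data-generation and training framework.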

[Flowchart: five-stage experiment pipeline]
  0. Environment setup: 8×A100/H100 GPUs; 38 TB of disk for the target cache; sglang server
  1. Data generation: download_and_split.py (open-perfectblend, 1.3M samples) → generate_train_data.py (regenerate answers with sglang) → prepare_target_cache.py (forward hook captures hidden states)
  2. Draft training: the config selects the algorithm; train.py + FSDP + bf16; 10 epochs to convergence
  3. Evaluation on 9 benchmarks: eval.py + 9 JSONL files; collect acceptance_length τ; confidence calibration (ECE/AUROC)
  4. Metric aggregation: allreduce across ranks; macro-averaged τ per domain; compare against Table 1 / Figures 2-6

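Figure note: The full workflow has 5 stages, and each stage's output is the next stage's input. Environment setup is a hard constraint (8 GPUs + 38 TB). Data generation takes the longest (sglang regeneration plus the target cache). Training runs for 10 epochs (paper Section 4.1). Evaluation covers 9 benchmarks. Aggregation produces the paper's Table 1 and Figures 2-6.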


Detailed sections (breakdown)

8.1 Section 4.1: Experimental setup mapping

How the experimental setup described in Section 4.1 of the paper maps to the code:

| Paper setting | Code location | Reproduction command |
| --- | --- | --- |
| 4 targets: Qwen3-{4B,8B,14B}, Gemma4-12B | config/{dspark,eagle3,dflash}/*_{qwen3_4b,qwen3_8b,qwen3_14b,gemma4_12b}.py | --config config/dspark/dspark_qwen3_4b.py |
| Eagle3 ttt_length=7 aligned with DSpark block_size=7 | ttt_length=7 in config/eagle3/eagle3_qwen3_4b.py; block_size=7 in config/dspark/dspark_qwen3_4b.py | - |
| Eagle3 draft has 1 layer; DSpark/DFlash drafts have 5 layers | draft_num_hidden_layers=1 / num_draft_layers=5 | - |
| Same target-model feature layers | target_layer_ids=[1, 9, 17, 25, 33] | - |
| Open-PerfectBlend, 1.3M samples | scripts/data/download_and_split.py | python scripts/data/download_and_split.py --dataset-name mlabonne/open-perfectblend ... |
| chat 17.6% / math 39.4% / code 38.9% / instruction 4.1% | Built into the dataset | - |
| Answers regenerated with each target's recommended sampling parameters | scripts/data/generate_train_data.py | python scripts/data/generate_train_data.py --model Qwen/Qwen3-4B --temperature 0.7 --top-p 0.8 --top-k 20 --min-p 0 ... |
| Non-thinking mode | --disable-thinking; assistant_loss_prefix for gemma4 in deepspec/data/parser.py | - |
| 10 epochs to full convergence | num_train_epochs=10 (config/dspark/dspark_qwen3_4b.py:39) | - |
| DSpark default Markov head | markov_head_type='vanilla' (config/dspark/dspark_qwen3_4b.py:19) | - |
| 9 benchmarks across three domains | TASKS in eval.py:18-28 | - |
| Evaluation at temperature=1.0 | --temperature 1.0 (eval.py:35-36) | - |
| Chain-based drafting | BaseEvaluator defaults to chain | - |

8.2 Section 4.2: Reproducing the main results (Table 1)

Table 1 of the paper reports the accepted length τ for 4 targets × 9 benchmarks × 3 drafters = 108 cells.

[Flowchart: reproducing the 108-cell Table 1 matrix]
  • 108-cell reproduction matrix: Qwen3-4B, Qwen3-8B, Qwen3-14B, and Gemma4-12B, each with 9 benchmarks × 3 drafters
  • Command template per cell: python eval.py --target_name_or_path Qwen/Qwen3-4B --draft_name_or_path deepseek-ai/dspark_qwen3_4b_block7 --temperature 1.0
  • Output: per-sample τ JSONL → allreduce across ranks → macro average per domain

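Figure note: Table 1 has 108 cells, each filled by an eval.py run. As the example commands note, one invocation covers all 9 benchmarks for a target/drafter pair, so 12 invocations fill the table. The released checkpoints (README.md:55-62) can be reused to skip training. Each run writes per-sample τ, which is aggregated with an allreduce across the 8 ranks and then macro-averaged per domain (Math/Code/Chat). The Qwen3-4B rows of the paper's table, extracted from the PDF and confirmed: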

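| Target=Qwen3-4B | GSM8K | MATH | AIME25 | MBPP | HumanEval | LCB | MT-Bench | Alpaca | Arena-Hard |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Eagle3 | 5.14 | 4.62 | 3.92 | 3.69 | 4.16 | 3.77 | 2.39 | 2.26 | 2.55 |
| DFlash | 5.40 | 4.85 | 4.15 | 4.40 | 4.74 | 4.18 | 3.07 | 2.96 | 2.83 |
| DSpark | 6.11 | 5.70 | 4.89 | 5.13 | 5.38 | 4.86 | 3.64 | 3.54 | 3.29 |

Reproduction command matrix (Qwen3-4B rows as an example)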
```bash
# Eagle3 cell (all 9 benchmarks in one run)
python eval.py \
    --target_name_or_path Qwen/Qwen3-4B \
    --draft_name_or_path deepseek-ai/eagle3_qwen3_4b_ttt7 \
    --temperature 1.0 \
    --max-new-tokens 2048

# DFlash cell
python eval.py \
    --target_name_or_path Qwen/Qwen3-4B \
    --draft_name_or_path deepseek-ai/dflash_qwen3_4b_block7 \
    --temperature 1.0

# DSpark cell
python eval.py \
    --target_name_or_path Qwen/Qwen3-4B \
    --draft_name_or_path deepseek-ai/dspark_qwen3_4b_block7 \
    --temperature 1.0
```

To switch targets, change only --target_name_or_path and --draft_name_or_path; eval.py routes to the matching evaluator automatically through the architectures field (eval.py:50-57).
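
The run matrix can also be enumerated in code. The sketch below only builds command strings and launches nothing. The Qwen3-4B checkpoint names are the ones used in the commands above; reusing the same naming pattern for the 8B and 14B targets is an assumption that should be checked against the released checkpoint list in README.md. Gemma4-12B is left out because its hub ID is not given in this section. `build_eval_commands` is a hypothetical helper, not part of the repository.

```python
import shlex

# Drafter checkpoint patterns. Only the qwen3_4b instances appear in this
# document; applying the pattern to other sizes is an ASSUMPTION.
DRAFTER_PATTERNS = {
    "eagle3": "deepseek-ai/eagle3_{slug}_ttt7",
    "dflash": "deepseek-ai/dflash_{slug}_block7",
    "dspark": "deepseek-ai/dspark_{slug}_block7",
}

# Target hub IDs mapped to the slug used in config file names.
TARGETS = {
    "Qwen/Qwen3-4B": "qwen3_4b",
    "Qwen/Qwen3-8B": "qwen3_8b",
    "Qwen/Qwen3-14B": "qwen3_14b",
}


def build_eval_commands(temperature=1.0):
    """Return one eval.py command string per (target, drafter) pair."""
    commands = []
    for target, slug in TARGETS.items():
        for pattern in DRAFTER_PATTERNS.values():
            argv = [
                "python", "eval.py",
                "--target_name_or_path", target,
                "--draft_name_or_path", pattern.format(slug=slug),
                "--temperature", str(temperature),
            ]
            commands.append(shlex.join(argv))
    return commands


for command in build_eval_commands():
    print(command)
```

Each printed line corresponds to one row of Table 1 for one drafter, since a single eval.py run covers all 9 benchmarks.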

Macro-averaged τ: computation and code location

τ is aggregated across ranks in `allreduce_response_metrics` (deepspec/eval/base_evaluator.py:550-630):

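```python
import torch.distributed as dist

# Schematic: each tensor holds this rank's local sum; all_reduce sums it in place across ranks.
dist.all_reduce(acceptance_length_sum, op=dist.ReduceOp.SUM)
dist.all_reduce(proposal_count, op=dist.ReduceOp.SUM)
tau = acceptance_length_sum / proposal_count  # pooled ratio, not a mean of per-rank ratios
```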

Domain macro average = mean(tau_gsm8k, tau_math500, tau_aime25) for Math, mean(tau_mbpp, tau_humaneval, tau_livecodebench) for Code, and mean(tau_mt_bench, tau_alpaca, tau_arena_hard_v2) for Chat.
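
The two levels of aggregation behave differently: within a benchmark, sums are pooled before dividing, while across benchmarks each τ counts equally. A minimal sketch that simulates ranks with plain lists instead of a real allreduce; `aggregate_tau` and `domain_macro_average` are hypothetical names, and the per-rank numbers are made up for illustration.

```python
from statistics import mean


def aggregate_tau(per_rank_sums, per_rank_counts):
    """Pool accepted-length sums and proposal counts across ranks, then divide once."""
    total_accepted = sum(per_rank_sums)      # stands in for all_reduce(SUM)
    total_proposals = sum(per_rank_counts)   # stands in for all_reduce(SUM)
    return total_accepted / total_proposals


def domain_macro_average(tau_by_benchmark, benchmarks):
    """Unweighted mean of per-benchmark tau within one domain."""
    return mean(tau_by_benchmark[name] for name in benchmarks)


# Two simulated ranks with unequal workloads (illustrative numbers only).
tau = aggregate_tau(per_rank_sums=[600.0, 150.0], per_rank_counts=[100, 50])
# Pooled ratio: 750 / 150 = 5.0. Averaging the per-rank ratios (6.0 and 3.0)
# would give 4.5 instead, which over-weights the smaller rank.

# Qwen3-4B DSpark Math cells from Table 1.
math_tau = {"gsm8k": 6.11, "math500": 5.70, "aime25": 4.89}
math_macro = domain_macro_average(math_tau, ["gsm8k", "math500", "aime25"])
print(tau, round(math_macro, 2))
```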

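Recomputing the paper's relative improvement

The abstract's claim that DSpark improves on Eagle3 by 30.9% (Qwen3-4B) can be recomputed as follows:

  • Math macro average: DSpark = (6.11+5.70+4.89)/3 = 5.57, Eagle3 = (5.14+4.62+3.92)/3 = 4.56, improvement = (5.57-4.56)/4.56 ≈ 22.1%
  • Macro average over all 9 benchmarks: DSpark = 42.54/9 ≈ 4.73, Eagle3 = 32.50/9 ≈ 3.61, improvement = (42.54-32.50)/32.50 ≈ 30.9%, which matches the paper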
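
The recomputation can be checked mechanically from the Qwen3-4B rows of Table 1. The sketch below hard-codes those rows and derives per-domain and overall relative gains; the helper names are hypothetical and the domain grouping follows the Math/Code/Chat split used for the macro averages.

```python
# Qwen3-4B rows of Table 1, in column order:
# GSM8K, MATH, AIME25, MBPP, HumanEval, LCB, MT-Bench, Alpaca, Arena-Hard
TABLE1_QWEN3_4B = {
    "Eagle3": [5.14, 4.62, 3.92, 3.69, 4.16, 3.77, 2.39, 2.26, 2.55],
    "DFlash": [5.40, 4.85, 4.15, 4.40, 4.74, 4.18, 3.07, 2.96, 2.83],
    "DSpark": [6.11, 5.70, 4.89, 5.13, 5.38, 4.86, 3.64, 3.54, 3.29],
}
DOMAIN_SLICES = {"Math": slice(0, 3), "Code": slice(3, 6), "Chat": slice(6, 9)}


def macro_average(values):
    """Unweighted mean over benchmarks."""
    return sum(values) / len(values)


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


def gains_over(baseline, method="DSpark"):
    """Per-domain and overall relative gain of `method` over `baseline`."""
    rows = TABLE1_QWEN3_4B
    gains = {
        domain: relative_gain(macro_average(rows[method][s]),
                              macro_average(rows[baseline][s]))
        for domain, s in DOMAIN_SLICES.items()
    }
    gains["All"] = relative_gain(macro_average(rows[method]),
                                 macro_average(rows[baseline]))
    return gains


for baseline in ("Eagle3", "DFlash"):
    print(baseline, {k: round(v, 1) for k, v in gains_over(baseline).items()})
```

Because every domain has three benchmarks, the mean over all 9 cells equals the mean of the three domain macro averages, so either definition of the overall figure gives the same gain.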

8.3 Section 4.3.1: Position-wise acceptance (Figure 2)

Figure 2 of the paper reports the conditional acceptance rate at each draft position within the three domains, exposing the suffix decay phenomenon.

[Flowchart: position-wise acceptance, from collection to plot]
  • Collection path in code: verify_draft_tokens (records accept per position) → build_metrics_row (accept_rates_by_position) → allreduce_response_metrics (aggregates across ranks)
  • Metric definitions: conditional acceptance uses numerator = accepts at that position and denominator = instances whose first k-1 positions were all accepted; prefix survival uses denominator = all proposed instances
  • Plotting: aggregate by domain; plot curves for positions 1-7; overlay the three algorithms for comparison

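Figure note: The key distinction in Figure 2 is between conditional acceptance (the denominator counts only instances whose first k-1 positions were all accepted) and prefix survival (the denominator counts all proposed instances). The conditional metric isolates baseline predictive quality and avoids penalizing a position for errors earlier in the prefix. In code, accept_rates_by_position is collected at base_evaluator.py:469-511. Paper measurements: DFlash decays from 0.88 to 0.78 (Code) and from 0.72 to 0.63 (Chat); Eagle3 instead rises from 0.53 to 0.74 (Chat); DSpark starts at 0.93 and stays stable across the whole block.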
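
The difference between the two denominators is easiest to see on a toy input. The sketch below assumes chain drafting and takes, for each proposal, the number of leading draft tokens that were accepted; that input format and the function name `position_rates` are assumptions for illustration, not the repository's accept_rates_by_position layout.

```python
def position_rates(accepted_lengths, block_size):
    """Per-position conditional acceptance and prefix survival for chain drafting.

    accepted_lengths[i] is how many leading draft tokens of proposal i were accepted.
    """
    total = len(accepted_lengths)
    conditional, survival = [], []
    for k in range(1, block_size + 1):
        # Proposals whose first k-1 positions were all accepted.
        reached = sum(1 for n in accepted_lengths if n >= k - 1)
        # Proposals whose position k was accepted as well.
        accepted = sum(1 for n in accepted_lengths if n >= k)
        conditional.append(accepted / reached if reached else float("nan"))
        survival.append(accepted / total if total else float("nan"))
    return conditional, survival


# Four made-up proposals with block_size=3.
cond, surv = position_rates([3, 1, 0, 2], block_size=3)
print(cond)
print(surv)
```

Prefix survival at position k is the running product of the conditional rates up to k, which is why it always decays even when the per-position predictive quality is flat.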

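8.4 Section 4.3.2: Drafter depth and proposal length (Figures 3 and 4)

Figure 3 of the paper studies how the number of drafter layers (1/2/5) affects τ. Figure 4 compares the three algorithms at different proposal lengths (4/8/12/16) and reports the latency overhead.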

[Flowchart: Figure 3 and Figure 4 experiment design]
  • Figure 3, drafter depth: fix block_size=7; vary num_draft_layers ∈ {1, 2, 5}; compare against the 5-layer DFlash baseline. Conclusion: a 2-layer DSpark already beats the 5-layer DFlash.
  • Figure 4, proposal length: fix num_draft_layers=5; vary block_size ∈ {4, 8, 12, 16}; DSpark is measured with both the markov and rnn heads. Conclusion: the larger γ is, the more pronounced DSpark's advantage.
  • Figure 4, latency measurement: batch=128; averaged over context lengths 512/1024/2048/4096. Conclusion: going from γ=4 to γ=16 adds only 0.2%-1.3% latency.

Figure note: Figures 3 and 4 both use the Qwen3-4B target; configuration changes are passed through --opts:

```bash
# target_cache_dir must point at the target cache produced by prepare_target_cache.py.

# Figure 3: drafter depth sweep
for depth in 1 2 5; do
  python train.py --config config/dspark/dspark_qwen3_4b.py \
    --opts "model.num_draft_layers=${depth}" "data.target_cache_path=${target_cache_dir}"
  # Evaluate once training finishes
  python eval.py --target_name_or_path Qwen/Qwen3-4B \
    --draft_name_or_path ~/checkpoints/deepspec/dspark_block7_qwen3_4b_d${depth}/step_latest
done

# Figure 4: proposal length sweep
for gamma in 4 8 12 16; do
  python train.py --config config/dspark/dspark_qwen3_4b.py \
    --opts "model.block_size=${gamma}" "data.target_cache_path=${target_cache_dir}"
  python eval.py --target_name_or_path Qwen/Qwen3-4B \
    --draft_name_or_path ~/checkpoints/deepspec/dspark_block${gamma}_qwen3_4b/step_latest
done

# Figure 4: RNN head variant
python train.py --config config/dspark/dspark_qwen3_4b.py \
  --opts "model.markov_head_type=rnn" "data.target_cache_path=${target_cache_dir}"
```

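Key paper conclusions to reproduce

  • At γ=7, DSpark improves on DFlash by 16% (Math) / 15% (Code) / 18% (Chat)
  • At γ=15, the improvement widens to 30% / 26% / 22%
  • The RNN head gives only a marginal gain over the Markov head, and only at long proposal lengths, so Markov is the default
  • The latency overhead of the sequential head is negligible (only 1.3% at γ=16)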

8.5 Section 4.3.3: Confidence head (Figures 5 and 6)

Figure 5 of the paper is a confidence threshold sweep; Figure 6 is a reliability diagram (ECE calibration).

[Flowchart: Figure 5 and Figure 6 confidence-head experiments]
  • Figure 5, threshold sweep: threshold ∈ {0.0, 0.2, 0.4, 0.6, 0.8}; run eval.py with --confidence-threshold X at each threshold; record avg_tokens_per_step + acceptance_rate. Conclusion: Chat improves 45.7%→95.7%, Math 76.9%→92.5%, Code 67.6%→92.0%.
  • Figure 6, reliability diagram: run once at threshold=0.0; ConfidenceHeadRecorder collects per-position histograms; compute ECE / AUROC / Brier; plot_reliability_diagram writes a PNG. Conclusion: raw ECE 3%-8%, AUROC 0.81-0.91.

Figure note: The Figure 5 sweep steps --confidence-threshold from 0.0 to 0.8 and records avg_tokens_per_step (bars) and acceptance_rate (line) at each point. Figure 6 enables ConfidenceHeadRecorder at threshold=0.0 and outputs metrics.json plus reliability_*.png. The paper measures a raw ECE of roughly 3%-8% and an AUROC of 0.81-0.91 (per the figure caption: position 1 ECE 5.7%, position 7 ECE 3.3%).
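
For readers who want to sanity-check the three calibration metrics outside the repository, here is a minimal standard-library sketch. It assumes each draft token contributes one (confidence, accepted) pair and that ECE uses 10 equal-width bins; both are assumptions, and the function names are hypothetical rather than ConfidenceHeadRecorder's API.

```python
def expected_calibration_error(confidences, labels, num_bins=10):
    """ECE: bin-size-weighted gap between mean confidence and empirical accuracy."""
    bins = [[] for _ in range(num_bins)]
    for c, y in zip(confidences, labels):
        index = min(int(c * num_bins), num_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append((c, y))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(y for _, y in members) / len(members)
            ece += len(members) / n * abs(mean_conf - accuracy)
    return ece


def brier_score(confidences, labels):
    """Mean squared error between confidence and the 0/1 outcome."""
    return sum((c - y) ** 2 for c, y in zip(confidences, labels)) / len(confidences)


def auroc(confidences, labels):
    """Chance that a random accepted token outscores a random rejected one (ties count 0.5)."""
    pos = [c for c, y in zip(confidences, labels) if y == 1]
    neg = [c for c, y in zip(confidences, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up confidences with 1 = accepted, 0 = rejected.
conf = [0.95, 0.85, 0.75, 0.35, 0.25]
accepted = [1, 1, 0, 1, 0]
print(expected_calibration_error(conf, accepted), brier_score(conf, accepted), auroc(conf, accepted))
```

The pairwise AUROC is quadratic in the number of samples, which is fine for a check on a small slice but not for a full evaluation run.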

Sweep script example

```bash
# Figure 5: confidence threshold sweep
for thr in 0.0 0.2 0.4 0.6 0.8; do
  python eval.py \
    --target_name_or_path Qwen/Qwen3-4B \
    --draft_name_or_path deepseek-ai/dspark_qwen3_4b_block7 \
    --confidence-threshold "${thr}" \
    --temperature 1.0 \
    --tensorboard-dir "logs/sweep_thr_${thr}"
done

# Figure 6: reliability diagram (threshold must be 0)
python eval.py \
  --target_name_or_path Qwen/Qwen3-4B \
  --draft_name_or_path deepseek-ai/dspark_qwen3_4b_block7 \
  --confidence-threshold 0.0 \
  --temperature 1.0
# Writes metrics.json + reliability_*.png
```
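STS calibration note

Important: STS (Sequential Temperature Scaling) is described in Section 3.2.1 of the paper but is implemented inside the production HAI-LLM stack; this repository contains no corresponding code. The repository's ConfidenceHeadRecorder evaluates raw confidence with ECE, AUROC, and Brier score, and its output can serve as input data for offline STS calibration. Figure 6 of the paper shows ECE dropping to about 1% after STS; the code here does not directly produce post-STS metrics.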
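To make the three raw-confidence metrics concrete, here is a minimal pure-Python sketch of ECE, Brier score, and AUROC computed from (confidence, accepted) pairs. The function names, the equal-width 10-bin ECE, and the input layout are assumptions for illustration; they are not taken from ConfidenceHeadRecorder, whose binning and aggregation may differ.

```python
from typing import Sequence


def expected_calibration_error(conf: Sequence[float], accepted: Sequence[int],
                               n_bins: int = 10) -> float:
    """Bin-size-weighted mean of |acceptance rate - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(conf, accepted):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((c, y))
    total = len(conf)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accept_rate = sum(y for _, y in b) / len(b)
        ece += len(b) / total * abs(accept_rate - mean_conf)
    return ece


def brier_score(conf: Sequence[float], accepted: Sequence[int]) -> float:
    """Mean squared error between confidence and the 0/1 acceptance label."""
    return sum((c - y) ** 2 for c, y in zip(conf, accepted)) / len(conf)


def auroc(conf: Sequence[float], accepted: Sequence[int]) -> float:
    """Probability that an accepted token outranks a rejected one (ties count as 0.5)."""
    pos = [c for c, y in zip(conf, accepted) if y == 1]
    neg = [c for c, y in zip(conf, accepted) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one accepted and one rejected token")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: hypothetical confidences and whether each draft token was accepted.
    conf = [0.9, 0.8, 0.3, 0.2]
    accepted = [1, 1, 0, 0]
    print(expected_calibration_error(conf, accepted),
          brier_score(conf, accepted),
          auroc(conf, accepted))
```

The pairwise AUROC is quadratic in the number of tokens, which is fine for a sanity check; a rank-based implementation would be the usual choice on full evaluation dumps.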

8.6 Section 5: Production Deployment

Section 5 of the paper describes the end-to-end production pipeline for DeepSeek-V4-Flash/Pro.
[Flowchart: Section 5 production pipeline]

5.1 Scalable training (HAI-LLM): DeepSeek-V4 + DSpark (3 MoE layers + mHC + sliding window 128) → γ=5 + Markov head → hidden state communication optimization, O(d) rather than O(V) → anchor-bounded sequence packing with a token-level attn index.

5.2 ZOS asynchronous scheduling: theoretical Algorithm 1 uses synchronous scheduling → GPU stall → production version uses the confidence from two steps earlier to approximate upcoming capacity → tokens of the current step are sorted by their true confidence, and only the truncation length uses the historical prediction → rank-preserving + causal barrier keeps decoding lossless.

5.3 High-throughput, low-latency inference: variable-length query batch (token flatten + marker tensor) → intra-seq dependency marker inside sparse attention → V4 architecture: index-attention + compress kernel modifications.

Measured results: V4-Flash single-user +60%-85%; V4-Pro single-user +57%-78%; throughput under high concurrency +400%.

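Figure note: The production pipeline has three layers. The training layer uses HAI-LLM to optimize hidden state communication and anchor packing. The scheduling layer uses ZOS (Zero-Overhead Scheduling) to approximate Algorithm 1 asynchronously and avoid GPU stalls. The inference layer uses variable-length batch kernels plus a marker tensor to support dynamic routing. Measured results: single-user generation speed improves by 60%-85% on V4-Flash and by 57%-78% on V4-Pro, and effective throughput under high concurrency reaches 4×.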

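5.1 Key training optimizations

Section 5.1 of the paper describes two system-level optimizations. The code in this repository can serve as a reference, but the implementation details live inside HAI-LLM:

  1. Hidden state communication: originally the target's full-vocabulary logits ($V \approx 10^5$) had to be transferred between parallel workers. Now only the hidden states are transferred ($O(d)$), and the lm head projection runs locally on the draft worker. The repository's target cache protocol (target_cache_dataset.py, file:///workspace/deepspec/data/target_cache_dataset.py) is the offline version of this idea: the target's hidden states are precomputed to disk.
  2. Anchor-bounded sequence packing: multiple independent anchor blocks are packed into one dense batch, and causality is maintained with a token-level attention index rather than a 2D mask. The repository's create_dspark_attention_mask (common.py:78-106, file:///workspace/deepspec/modeling/dspark/common.py#L78-106) is a single-node version of this idea.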
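The packing idea in item 2 can be illustrated with a generic packed-causal-attention sketch: each token carries a segment id and an in-segment position, and visibility is decided from those indices instead of a materialized mask. This is not the logic of create_dspark_attention_mask or of the HAI-LLM kernels; pack_blocks and can_attend are hypothetical names, and the rule shown (same block, no looking ahead) is only the simplest instance of the technique.

```python
from typing import List, Tuple


def pack_blocks(block_lengths: List[int]) -> Tuple[List[int], List[int]]:
    """Flatten independent blocks into one token stream.

    Returns per-token (segment id, position inside the segment).
    The index costs O(T) integers, while a dense 2D mask costs O(T^2) booleans.
    """
    seg_ids: List[int] = []
    positions: List[int] = []
    for seg, length in enumerate(block_lengths):
        seg_ids.extend([seg] * length)
        positions.extend(range(length))
    return seg_ids, positions


def can_attend(q: int, k: int, seg_ids: List[int], positions: List[int]) -> bool:
    """Query token q may see key token k only inside its own block and never ahead of itself."""
    return seg_ids[q] == seg_ids[k] and positions[k] <= positions[q]


if __name__ == "__main__":
    seg_ids, positions = pack_blocks([2, 3])  # two blocks packed into 5 tokens
    for q in range(len(seg_ids)):
        row = "".join("x" if can_attend(q, k, seg_ids, positions) else "." for k in range(len(seg_ids)))
        print(row)
```

Printing the rows makes the block-diagonal causal pattern visible without ever allocating the mask as a tensor, which is the property the token-level index is meant to preserve at scale.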
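5.2 Code pointers for ZOS asynchronous scheduling

The theoretical version of Algorithm 1 requires synchronous scheduling (determine this step's batch size first, then run the forward pass), which conflicts with the ZOS requirement that the next step's batch size be known before the current step finishes. The production version changes this as follows:

  • Candidate tokens are sorted by their current, true confidence (rank-preserving)
  • The truncation length $K$ is predicted from the confidence history of two steps earlier (asynchronous approximation)
  • Because the truncation decision uses only historical information, it forms a causal barrier and keeps decoding lossless

In this repository, _confident_prefix_length (draft_ops.py:82-93, file:///workspace/deepspec/eval/dspark/draft_ops.py#L82-93) is the single-request offline version: it truncates by confidence_threshold, which differs from the multi-request global scheduling of Algorithm 1.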
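The contrast between the two variants can be sketched as follows. Both functions are illustrative assumptions, not the repository's _confident_prefix_length and not the production scheduler: the prefix rule (stop at the first position below the threshold) and the capacity rule (count stale confidences above the threshold) are placeholders chosen only to show where current and stale information enter.

```python
from typing import List, Sequence


def confident_prefix_length(confidences: Sequence[float], threshold: float) -> int:
    """Single-request view: number of leading draft tokens with confidence >= threshold."""
    kept = 0
    for c in confidences:
        if c < threshold:
            break
        kept += 1
    return kept


def select_tokens_async(current_conf: Sequence[float],
                        stale_conf: Sequence[float],
                        threshold: float) -> List[int]:
    """Multi-request view with a stale capacity estimate.

    K comes only from confidences observed earlier (placeholder rule), so it can be
    fixed before the current step finishes. The ranking uses the current confidences.
    """
    k = sum(1 for c in stale_conf if c >= threshold)
    order = sorted(range(len(current_conf)), key=lambda i: current_conf[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    print(confident_prefix_length([0.9, 0.8, 0.3, 0.95], threshold=0.5))
    print(select_tokens_async([0.2, 0.9, 0.6, 0.7], [0.8, 0.4, 0.9, 0.1], threshold=0.5))
```

With threshold 0.0 the prefix rule keeps every draft token, which matches the document's requirement that calibration runs use an untruncated draft.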

8.7 Experiment output files

Reproducing the experiments produces the following files:

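| File path | Contents | Stage |
| --- | --- | --- |
| `train_datasets/perfectblend_train.jsonl` | Training prompts after the split | Data |
| `train_datasets/qwen3_4b/perfectblend_train_regen.jsonl` | Responses regenerated with sglang | Data |
| `~/.cache/deepspec/qwen3_4b_target_cache/` | Target cache (38TB) | Data |
| `~/checkpoints/deepspec/<exp_name>/step_<N>/` | Checkpoint, every 3000 steps | Training |
| `~/checkpoints/deepspec/<exp_name>/step_latest` | Symlink to the latest checkpoint | Training |
| `~/tensorboard/deepspec/<exp_name>/` | TensorBoard logs | Training |
| `<ckpt>/config.py` | Snapshot of the config used for training | Training |
| `<ckpt>/training_state.rank{N}.pt` | Per-rank optimizer + RNG state | Training |
| `<output>/metrics.json` | Confidence calibration metrics (when threshold=0) | Evaluation |
| `<output>/reliability_*.png` | Reliability diagrams (when threshold=0) | Evaluation |
| `<output>/<task>.jsonl` | Per-sample τ | Evaluation |
| stdout | Per-task τ aggregated across ranks | Evaluation |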

8.8 Pitfall checklist

[Diagram: common pitfalls]

  • P1: 38TB target cache storage → plan disk space ahead of time or reduce the number of layers
  • P2: target_layer_ids includes the last layer → eval hidden states do not match the cache (assert_no_final_target_layer)
  • P3: local_batch_size != 1 → a single sample already provides enough anchors
  • P4: changing world_size on resume → topology mismatch error
  • P5: calibration evaluation with confidence_threshold > 0 → no longer unbiased
  • P6: EVAL_ATTN_IMPLEMENTATION differs from the flex_attention used in training
  • P7: sglang sampling parameters not aligned with the target → training distribution shift
  • P8: switching target without changing chat_template → misaligned loss_mask

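Figure note: Eight common pitfalls, each tied to a concrete check in the code or a warning in the documentation. assert_no_final_target_layer (base_evaluator.py:100-112, file:///workspace/deepspec/eval/base_evaluator.py#L100-112) actively blocks P2; the `local_batch_size != saved_local_batch_size` check (ckpt_manager.py:84-133, file:///workspace/deepspec/trainer/ckpt_manager.py#L84-133) blocks P4; the sglang sampling parameters (P7) and chat_template (P8) have no code check and must be aligned by hand.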
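A resume-time topology check of the kind described for P4 might look like the sketch below. The Topology dataclass and check_resume_topology are hypothetical names; the three compared fields follow the conditions listed in this section, but this is not the code in ckpt_manager.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical record of the layout a checkpoint shard was written under."""
    world_size: int
    global_rank: int
    local_batch_size: int


def check_resume_topology(saved: Topology, current: Topology) -> None:
    """Raise if the resumed job's layout differs from the saved one."""
    mismatches = [
        f"{name}: saved={getattr(saved, name)} current={getattr(current, name)}"
        for name in ("world_size", "global_rank", "local_batch_size")
        if getattr(saved, name) != getattr(current, name)
    ]
    if mismatches:
        raise ValueError("resume topology mismatch: " + "; ".join(mismatches))


if __name__ == "__main__":
    saved = Topology(world_size=8, global_rank=0, local_batch_size=1)
    check_resume_topology(saved, Topology(world_size=8, global_rank=0, local_batch_size=1))
    try:
        check_resume_topology(saved, Topology(world_size=4, global_rank=0, local_batch_size=1))
    except ValueError as err:
        print(err)
```

Failing fast with all mismatched fields in one message is a deliberate choice here: per-rank optimizer and RNG state cannot be remapped after the fact, so a clear error is more useful than a partial load.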

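Detailed pitfall notes:

  1. 38TB target cache storage (README.md:29, file:///workspace/README.md#L29): the default Qwen3-4B configuration needs 38TB of disk. To reduce storage: ① shrink the training set; ② reduce target_layer_ids (for example, to the two layers [9, 25], which cuts storage by 60%).
  2. target_layer_ids must not include the last layer (assert_no_final_target_layer, base_evaluator.py:100-112, file:///workspace/deepspec/eval/base_evaluator.py#L100-112): the output_hidden_states of transformers stores the final hidden state after normalization, which does not match the raw decoder output in the cache.
  3. local_batch_size=1 is intentional: a single sample with num_anchors=512 already provides 3584 supervised positions, so there is enough parallelism along the batch dimension. Increasing it causes OOM.
  4. Topology must match on resume: saved_world_size == world_size, saved_rank == global_rank, and saved_local_batch_size == local_batch_size (ckpt_manager.py:84-133, file:///workspace/deepspec/trainer/ckpt_manager.py#L84-133). Changing world_size means training from scratch.
  5. No calibration evaluation when confidence_threshold > 0: sampling is no longer unbiased after truncation, so evaluator.py:46-48 (file:///workspace/deepspec/eval/dspark/evaluator.py#L46-48) skips ConfidenceHeadRecorder. Reproducing Figure 6 requires --confidence-threshold 0.0.
  6. EVAL_ATTN_IMPLEMENTATION="sdpa" (dspark/evaluator.py:32, file:///workspace/deepspec/eval/dspark/evaluator.py#L32): training uses the flex_attention block mask, while evaluation uses sdpa (a single block needs no block mask). Changing it causes errors or slower evaluation.
  7. sglang sampling parameters must match the target's recommended settings: Qwen3 uses --temperature 0.7 --top-p 0.8 --top-k 20 --min-p 0; for Gemma4, check the official recommendation. A mismatch shifts the training data distribution and lowers τ noticeably.
  8. Switching target requires changing chat_template as well: Qwen uses qwen, Gemma4 uses gemma4 (parser.py:30-51, file:///workspace/deepspec/data/parser.py#L30-51). assistant_loss_prefix determines where loss_mask starts; if it is misaligned, the loss is computed on the prompt.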
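The numbers in items 1 and 3 can be checked with a few lines of arithmetic. The sketch assumes that cache size scales linearly with the number of cached layers, that the default configuration caches 5 layers (the figure used in the resource estimate of this chapter), and that each anchor supervises a block of 7 positions (consistent with 3584 / 512 and with the block7 checkpoint name). The function names are hypothetical.

```python
def cache_size_tb(full_size_tb: float, full_layers: int, kept_layers: int) -> float:
    """Estimated cache size when only kept_layers of full_layers are stored (linear scaling assumed)."""
    return full_size_tb * kept_layers / full_layers


def supervised_positions(num_anchors: int, block_size: int) -> int:
    """Supervised draft positions contributed by one training sample."""
    return num_anchors * block_size


if __name__ == "__main__":
    reduced = cache_size_tb(38.0, full_layers=5, kept_layers=2)
    saving = 1.0 - reduced / 38.0
    print(f"2-layer cache: {reduced:.1f} TB, saving {saving:.0%}")
    print("positions per sample:", supervised_positions(512, 7))
```

Under these assumptions a two-layer cache needs about 15.2TB, which is where the 60% reduction comes from.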

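Summary

The reproduction path for the DSpark paper's experiments can be followed end to end with the code in this repository. All Section 4 benchmark experiments can be reproduced directly with eval.py plus the released checkpoints. The core algorithm of the Section 5 production deployment (Algorithm 1) has pointers in both the paper and the code, but the complete production pipeline lives inside HAI-LLM.

Key points for reproduction:

  1. Section 4.1 experimental setup: 12 config files (4 targets × 3 drafters) evaluated on 9 benchmarks; the configuration entries match the paper item by item.
  2. Section 4.2 main results, Table 1: 108 cells, one eval.py run per cell; released checkpoints can be reused to skip training.
  3. Section 4.3.1, Figure 2: accept_rates_by_position is collected in BaseEvaluator; note the difference between the conditional and prefix-survival definitions.
  4. Section 4.3.2, Figures 3 and 4: sweep with --opts "model.num_draft_layers=N" or model.block_size=K; each point requires its own training run.
  5. Section 4.3.3, Figures 5 and 6: threshold sweep + reliability diagram, produced automatically by ConfidenceHeadRecorder.
  6. Section 5 production deployment: HAI-LLM training optimizations + ZOS asynchronous scheduling + variable-length batch kernels; this repository provides the corresponding code pointers in its data generation and training framework.
  7. Eight pitfalls: each is tied to a concrete check in the code or a warning in the documentation, so following the checklist is enough to avoid them.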

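Resource estimates:

  • Reproducing all 108 cells of Section 4.2 (with released checkpoints): 8×A100 80G × 4 targets × 3 drafters × 9 benchmarks ≈ several days (0.5-2 hours per cell)
  • Reproducing the Section 4.3.2 ablations (self-trained): each point needs 10 epochs of training on 8×A100 ≈ 1-2 days; 4 (depth) + 4 (length) + 1 (rnn) = 9 points ≈ 2 weeks in total
  • Target cache generation: 1.3M samples × 5 layers of hidden states, 8 GPUs for sglang + 8 GPUs for the target forward pass ≈ several days + 38TB of disk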
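The first two estimates can be reproduced with simple arithmetic. The sketch assumes that all runs execute sequentially on a single 8×A100 node, which is an assumption of this example rather than a statement from the paper; running cells in parallel on several nodes would shorten the wall-clock time proportionally.

```python
def cell_count(targets: int, drafters: int, benchmarks: int) -> int:
    """Number of Table 1 cells, one eval.py run each."""
    return targets * drafters * benchmarks


def sequential_days(runs: int, hours_per_run: float) -> float:
    """Wall-clock days when runs execute one after another on one node."""
    return runs * hours_per_run / 24.0


if __name__ == "__main__":
    cells = cell_count(targets=4, drafters=3, benchmarks=9)
    print("cells:", cells)
    print("eval days (0.5 h to 2 h per cell):",
          sequential_days(cells, 0.5), "to", sequential_days(cells, 2.0))

    ablation_points = 4 + 4 + 1  # depth + length + rnn
    print("ablation days (1 to 2 days per point):",
          ablation_points * 1, "to", ablation_points * 2)
```

The evaluation range works out to 2.25 to 9 days and the ablation range to 9 to 18 days, consistent with "several days" and "about 2 weeks" above.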

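Further reading: see 09 Usage Guide for the complete end-to-end commands, and 07 Evaluation System for a recap of how τ is computed. The full text of Sections 4-5 is in DSpark_paper.pdf (file:///workspace/DSpark_paper.pdf), where the experimental data behind Table 1 and Figures 2-6 can be looked up.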