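08 · Reproducing the Experiments: Paper Cross-Reference and Workflow Walkthrough
This is the experiment document that the user specifically asked to be detailed and to stand as its own chapter; it cross-references Sections 4-5 of the DSpark paper throughout. The earlier chapters, 02 Core Principles through 07 Evaluation System, covered the code implementation; this one answers the question "how do I reproduce the paper's experiments with this code?". It has 8 sections, each with a Mermaid flowchart and a written explanation, and includes the full command matrix, a file list, and a list of pitfalls.
Overview
The DSpark paper's experiments fall into two parts: Section 4, offline benchmark experiments (draft quality + ablations), and Section 5, production deployment experiments (validated on live DeepSeek-V4 traffic). The code in this repository covers every Section 4 experiment; the HAI-LLM/ZOS parts of Section 5 live on the production side (the open-source code provides the data generation and training framework).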
[Flowchart: end-to-end experiment pipeline]
- 0. Environment setup: 8×A100/H100 GPUs; 38 TB of disk for the target cache; sglang server
- 1. Data generation: download_and_split.py (open-perfectblend, 1.3M samples) → generate_train_data.py (responses regenerated with sglang) → prepare_target_cache.py (forward hooks capture hidden states)
- 2. Draft training: pick the algorithm via config → train.py + FSDP + bf16 → 10 epochs to convergence
- 3. Evaluation on 9 benchmarks: eval.py + 9 JSONL files → collect acceptance_length τ → confidence calibration (ECE/AUROC)
- 4. Metric aggregation: cross-rank allreduce → macro-averaged τ per domain → compare against Table 1 / Figures 2-6
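Figure notes: The full experiment pipeline has 5 stages, and each stage's output is the next stage's input. Environment setup is a hard constraint (8 GPUs + 38 TB); data generation takes the longest (sglang regeneration + target cache); training runs for 10 epochs (paper Section 4.1); evaluation covers 9 benchmarks; aggregation produces the paper's Table 1 and Figures 2-6.
Detailed sections
8.1 Section 4.1: experimental setup cross-reference
The experimental setup described in Section 4.1 of the paper maps to the code as follows: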
| Paper setting | Code location | Reproduction command |
|---|---|---|
| 4 targets: Qwen3-{4B,8B,14B}, Gemma4-12B | config/{dspark,eagle3,dflash}/*_{qwen3_4b,qwen3_8b,qwen3_14b,gemma4_12b}.py | --config config/dspark/dspark_qwen3_4b.py |
| Eagle3 ttt_length=7 aligned with DSpark block_size=7 | ttt_length=7 in [config/eagle3/eagle3_qwen3_4b.py](file:///workspace/config/eagle3/eagle3_qwen3_4b.py); block_size=7 in [config/dspark/dspark_qwen3_4b.py](file:///workspace/config/dspark/dspark_qwen3_4b.py) | - |
| Eagle3 draft has 1 layer; DSpark/DFlash drafts have 5 layers | draft_num_hidden_layers=1 / num_draft_layers=5 | - |
| Same target-model feature layers | target_layer_ids=[1, 9, 17, 25, 33] | - |
| Open-PerfectBlend, 1.3M samples | [scripts/data/download_and_split.py](file:///workspace/scripts/data/download_and_split.py) | python scripts/data/download_and_split.py --dataset-name mlabonne/open-perfectblend ... |
| chat 17.6% / math 39.4% / code 38.9% / instruction 4.1% | Built into the dataset | - |
| Responses regenerated with each target's recommended sampling parameters | [scripts/data/generate_train_data.py](file:///workspace/scripts/data/generate_train_data.py) | python scripts/data/generate_train_data.py --model Qwen/Qwen3-4B --temperature 0.7 --top-p 0.8 --top-k 20 --min-p 0 ... |
| Non-thinking mode | --disable-thinking; gemma4's assistant_loss_prefix in [parser.py](file:///workspace/deepspec/data/parser.py) | - |
| 10 epochs to full convergence | num_train_epochs=10 ([config/dspark/dspark_qwen3_4b.py:39](file:///workspace/config/dspark/dspark_qwen3_4b.py#L39)) | - |
| DSpark default Markov head | markov_head_type='vanilla' ([config/dspark/dspark_qwen3_4b.py:19](file:///workspace/config/dspark/dspark_qwen3_4b.py#L19)) | - |
| 9 benchmarks across three domains | TASKS in [eval.py:18-28](file:///workspace/eval.py#L18-28) | - |
| Evaluation at temperature=1.0 | --temperature 1.0 ([eval.py:35-36](file:///workspace/eval.py#L35-36)) | - |
| Chain-based drafting | BaseEvaluator defaults to chain | - |
8.2 Section 4.2: reproducing the main results (Table 1)
Table 1 of the paper reports the accepted length τ for 4 targets × 9 benchmarks × 3 drafters = 108 cells.
[Flowchart: Table 1 reproduction matrix]
- 108-cell reproduction matrix: Qwen3-4B, Qwen3-8B, Qwen3-14B, Gemma4-12B, each with 9 benchmarks × 3 drafters
- Per-cell command template: python eval.py --target_name_or_path Qwen/Qwen3-4B --draft_name_or_path deepseek-ai/dspark_qwen3_4b_block7 --temperature 1.0
- Output: per-sample τ JSONL → cross-rank allreduce → macro average per domain
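Figure notes: Table 1 has 108 cells in total, each produced by eval.py. The released checkpoints ([README.md:55-62](file:///workspace/README.md#L55-62)) can be reused to skip training. Each cell outputs per-sample τ, which is aggregated across the 8 ranks with allreduce and then macro-averaged per domain (Math/Code/Chat). The paper's Qwen3-4B rows, extracted from the PDF and confirmed: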
| Target=Qwen3-4B | GSM8K | MATH | AIME25 | MBPP | HumanEval | LCB | MT-Bench | Alpaca | Arena-Hard |
|---|---|---|---|---|---|---|---|---|---|
| Eagle3 | 5.14 | 4.62 | 3.92 | 3.69 | 4.16 | 3.77 | 2.39 | 2.26 | 2.55 |
| DFlash | 5.40 | 4.85 | 4.15 | 4.40 | 4.74 | 4.18 | 3.07 | 2.96 | 2.83 |
| DSpark | 6.11 | 5.70 | 4.89 | 5.13 | 5.38 | 4.86 | 3.64 | 3.54 | 3.29 |
Reproduction command matrix (Qwen3-4B rows as the example)
bash
# Eagle3 cell (all 9 benchmarks in one run)
python eval.py \
--target_name_or_path Qwen/Qwen3-4B \
--draft_name_or_path deepseek-ai/eagle3_qwen3_4b_ttt7 \
--temperature 1.0 \
--max-new-tokens 2048
# DFlash cell
python eval.py \
--target_name_or_path Qwen/Qwen3-4B \
--draft_name_or_path deepseek-ai/dflash_qwen3_4b_block7 \
--temperature 1.0
# DSpark cell
python eval.py \
--target_name_or_path Qwen/Qwen3-4B \
--draft_name_or_path deepseek-ai/dspark_qwen3_4b_block7 \
--temperature 1.0
To switch targets, change only --target_name_or_path and --draft_name_or_path. eval.py routes to the right evaluator automatically via the architectures field ([eval.py:50-57](file:///workspace/eval.py#L50-57)).
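Since only those two flags change between runs, the command list can be generated instead of typed by hand. The following is a minimal sketch, not part of the repository: `build_eval_commands` is a hypothetical helper, and the checkpoint names for Qwen3-8B and Qwen3-14B are an assumed extrapolation of the Qwen3-4B naming pattern used in the commands above (check the README checkpoint list before relying on them). Gemma4-12B is left out because its model id does not appear in this document.

```python
from itertools import product

# Target model id -> short tag used in checkpoint names.
# Only the Qwen3-4B names appear in this document; the 8B/14B tags are assumed.
TARGETS = {
    "Qwen/Qwen3-4B": "qwen3_4b",
    "Qwen/Qwen3-8B": "qwen3_8b",
    "Qwen/Qwen3-14B": "qwen3_14b",
}
# Drafter name -> checkpoint suffix, as in the Qwen3-4B commands.
DRAFTERS = {"eagle3": "ttt7", "dflash": "block7", "dspark": "block7"}


def build_eval_commands(targets=TARGETS, drafters=DRAFTERS, temperature=1.0):
    """Return one eval.py command line per (target, drafter) pair."""
    commands = []
    for (target, tag), (drafter, suffix) in product(targets.items(), drafters.items()):
        draft = f"deepseek-ai/{drafter}_{tag}_{suffix}"
        commands.append(
            f"python eval.py --target_name_or_path {target} "
            f"--draft_name_or_path {draft} --temperature {temperature}"
        )
    return commands


if __name__ == "__main__":
    for cmd in build_eval_commands():
        print(cmd)
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without GPUs; the output can be piped into a job scheduler or a shell script.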
Macro-averaged τ: computation and code location
Cross-rank aggregation of τ happens in [`allreduce_response_metrics`](file:///workspace/deepspec/eval/base_evaluator.py) ([base_evaluator.py:550-630](file:///workspace/deepspec/eval/base_evaluator.py#L550-630)):
python
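# Simplified sketch; assumes both values are torch tensors on every rank.
import torch.distributed as dist
dist.all_reduce(acceptance_length_sum, op=dist.ReduceOp.SUM)  # summed in place
dist.all_reduce(proposal_count, op=dist.ReduceOp.SUM)
tau = acceptance_length_sum / proposal_count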
The domain macro averages are: Math = mean(tau_gsm8k, tau_math500, tau_aime25); Code = mean(tau_mbpp, tau_humaneval, tau_livecodebench); Chat = mean(tau_mt_bench, tau_alpaca, tau_arena_hard_v2).
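The aggregation sums the numerator and the denominator across ranks before dividing, which is not the same as averaging each rank's own τ when ranks handle different numbers of proposals. The sketch below simulates the ranks with plain Python tuples to show the difference; the function names and the numbers are made up for illustration and do not come from the repository.

```python
def pooled_tau(per_rank):
    """per_rank: list of (acceptance_length_sum, proposal_count), one per rank."""
    total_length = sum(s for s, _ in per_rank)  # what a SUM all-reduce yields
    total_count = sum(c for _, c in per_rank)
    return total_length / total_count


def mean_of_rank_taus(per_rank):
    """Naive alternative: average each rank's local tau with equal weight."""
    return sum(s / c for s, c in per_rank) / len(per_rank)


# Made-up numbers: rank 0 verified many proposals, rank 1 only a few.
ranks = [(500.0, 100), (30.0, 10)]

if __name__ == "__main__":
    print("pooled:", pooled_tau(ranks))            # 530 / 110
    print("mean of ranks:", mean_of_rank_taus(ranks))  # (5.0 + 3.0) / 2
```

The pooled form weights every proposal equally, so a rank that happened to receive short prompts does not distort the benchmark-level τ.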
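Recomputing the paper's relative improvement
The abstract's claim that "DSpark improves over Eagle3 by 30.9% (Qwen3-4B)" can be recomputed as follows:
- Math macro average: DSpark = (6.11+5.70+4.89)/3 = 5.57, Eagle3 = (5.14+4.62+3.92)/3 = 4.56, improvement = (5.57-4.56)/4.56 ≈ 22.1%
- Macro average over all 9 benchmarks: DSpark = 42.54/9 ≈ 4.73, Eagle3 = 32.50/9 ≈ 3.61, improvement = (42.54-32.50)/32.50 ≈ 30.9% (consistent with the paper)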
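The same recomputation can be scripted so that every domain is checked at once. This is a small sketch using only the Qwen3-4B values quoted in this section; `relative_gain` and `report` are illustrative names, not repository functions.

```python
from statistics import mean

# Accepted length tau, Qwen3-4B rows of Table 1 as quoted in this section.
# Column order: GSM8K, MATH, AIME25, MBPP, HumanEval, LCB, MT-Bench, Alpaca, Arena-Hard.
TAU = {
    "Eagle3": [5.14, 4.62, 3.92, 3.69, 4.16, 3.77, 2.39, 2.26, 2.55],
    "DFlash": [5.40, 4.85, 4.15, 4.40, 4.74, 4.18, 3.07, 2.96, 2.83],
    "DSpark": [6.11, 5.70, 4.89, 5.13, 5.38, 4.86, 3.64, 3.54, 3.29],
}
DOMAIN_SLICES = {"Math": slice(0, 3), "Code": slice(3, 6), "Chat": slice(6, 9)}


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, as a fraction."""
    return (new - base) / base


def report(drafter="DSpark", baseline="Eagle3"):
    """Per-domain and overall relative gain in macro-averaged tau."""
    rows = {}
    for name, cols in DOMAIN_SLICES.items():
        rows[name] = relative_gain(mean(TAU[drafter][cols]), mean(TAU[baseline][cols]))
    rows["All 9"] = relative_gain(mean(TAU[drafter]), mean(TAU[baseline]))
    return rows


if __name__ == "__main__":
    for name, gain in report().items():
        print(f"{name}: {gain:.1%}")
```

With these inputs the "All 9" row comes out at about 30.9% and the Math row at about 22.1%. Passing `baseline="DFlash"` gives the corresponding comparison against DFlash.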
8.3 Section 4.3.1: position-wise acceptance (Figure 2)
Figure 2 of the paper reports the conditional acceptance rate at each draft position within the three domains, revealing the suffix decay phenomenon.
[Flowchart: Figure 2 data path]
- Collection path in code: verify_draft_tokens (records accept per position) → build_metrics_row (accept_rates_by_position) → allreduce_response_metrics (cross-rank aggregation)
- Metric definitions: conditional acceptance uses numerator = accepts at that position and denominator = instances whose first k-1 positions were all accepted; prefix survival uses denominator = all proposed instances
- Plotting: aggregate by domain → plot curves for positions 1-7 → overlay the three algorithms
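Figure notes: The key distinction in Figure 2 is between conditional acceptance (the denominator counts only instances whose first k-1 positions were all accepted) and prefix survival (the denominator counts every proposed instance). The conditional metric isolates the baseline predictive quality at each position and avoids the knock-on penalty from earlier prefix errors. In the code, accept_rates_by_position is collected at [base_evaluator.py:469-511](file:///workspace/deepspec/eval/base_evaluator.py#L469-511). Measured in the paper: DFlash decays from 0.88 to 0.78 (Code) and from 0.72 to 0.63 (Chat); Eagle3 instead rises from 0.53 to 0.74 (Chat); DSpark starts at 0.93 and stays stable across the whole block.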
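The two denominators are easy to mix up, so here is a minimal sketch of both metrics computed from the same data. It assumes chain-based verification, where each proposal is summarized by the number of draft tokens accepted before the first rejection; `position_rates` is a hypothetical helper and is not the repository's `accept_rates_by_position`.

```python
def position_rates(accepted_lengths, gamma):
    """Return (conditional_acceptance, prefix_survival) for positions 1..gamma.

    accepted_lengths: per-proposal count of draft tokens accepted before the
    first rejection (0..gamma).
    """
    if not accepted_lengths:
        raise ValueError("need at least one proposal")
    total = len(accepted_lengths)
    conditional, survival = [], []
    for k in range(1, gamma + 1):
        # Proposals whose first k-1 positions were all accepted.
        reached = sum(1 for a in accepted_lengths if a >= k - 1)
        # Proposals where position k was accepted as well.
        accepted = sum(1 for a in accepted_lengths if a >= k)
        conditional.append(accepted / reached if reached else float("nan"))
        survival.append(accepted / total)
    return conditional, survival


# Toy data: 4 proposals of length 3.
cond, surv = position_rates([3, 1, 0, 3], gamma=3)

if __name__ == "__main__":
    print("conditional:", cond)
    print("survival:   ", surv)
```

In the toy data the conditional rate rises with position while prefix survival can only stay flat or fall, which is why the two curves should not be compared against each other.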
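8.4 Section 4.3.2: drafter depth and proposal length (Figures 3 and 4)
Figure 3 of the paper studies how the number of drafter layers (1/2/5) affects τ; Figure 4 compares the three algorithms at different proposal lengths (4/8/12/16) and reports the latency overhead.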
[Flowchart: Figure 3 and Figure 4 experiments]
- Figure 3, drafter depth: fix block_size=7; vary num_draft_layers ∈ {1, 2, 5}; compare against the 5-layer DFlash baseline. Conclusion: a 2-layer DSpark already beats the 5-layer DFlash.
- Figure 4, proposal length: fix num_draft_layers=5; vary block_size ∈ {4, 8, 12, 16}; DSpark is measured with both the markov and rnn heads. Conclusion: the larger γ is, the more pronounced DSpark's advantage.
- Figure 4, latency measurement: batch=128; averaged over context lengths 512/1024/2048/4096. Conclusion: going from γ=4 to γ=16 increases latency by only 0.2%-1.3%.
Figure notes: Figures 3 and 4 both use the Qwen3-4B target; configuration changes are applied through --opts:
bash
# Assumes target_cache_dir already points at the prepared target cache.
# Figure 3: drafter depth sweep
for depth in 1 2 5; do
  python train.py --config config/dspark/dspark_qwen3_4b.py \
    --opts "model.num_draft_layers=${depth}" "data.target_cache_path=${target_cache_dir}"
  # Evaluate once training finishes
  python eval.py --target_name_or_path Qwen/Qwen3-4B \
    --draft_name_or_path ~/checkpoints/deepspec/dspark_block7_qwen3_4b_d${depth}/step_latest
done
# Figure 4: proposal length sweep
for gamma in 4 8 12 16; do
  python train.py --config config/dspark/dspark_qwen3_4b.py \
    --opts "model.block_size=${gamma}" "data.target_cache_path=${target_cache_dir}"
  python eval.py --target_name_or_path Qwen/Qwen3-4B \
    --draft_name_or_path ~/checkpoints/deepspec/dspark_block${gamma}_qwen3_4b/step_latest
done
# Figure 4: RNN head variant
python train.py --config config/dspark/dspark_qwen3_4b.py \
  --opts "model.markov_head_type=rnn" "data.target_cache_path=${target_cache_dir}"
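Key paper conclusions to reproduce:
- At γ=7, DSpark improves over DFlash by 16% (Math) / 15% (Code) / 18% (Chat)
- At γ=15, the improvement widens to 30%/26%/22%
- The RNN head gives only a marginal gain over the Markov head, and only at long proposal lengths, so Markov is the default
- The latency overhead of the sequential head is negligible (only 1.3% at γ=16)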
8.5 Section 4.3.3: confidence head (Figures 5 and 6)
Figure 5 of the paper is a confidence threshold sweep; Figure 6 is a reliability diagram (ECE calibration).
[Flowchart: Figure 5 and Figure 6 experiments]
- Figure 5, threshold sweep: threshold ∈ {0.0, 0.2, 0.4, 0.6, 0.8}; run eval.py with --confidence-threshold X for each value; record avg_tokens_per_step and acceptance_rate. Conclusion: Chat improves 45.7%→95.7%, Math 76.9%→92.5%, Code 67.6%→92.0%.
- Figure 6, reliability diagram: run once with threshold=0.0; ConfidenceHeadRecorder collects per-position histograms; compute ECE / AUROC / Brier; plot_reliability_diagram writes PNGs. Conclusion: raw ECE 3%-8%, AUROC 0.81-0.91.
图说明: Figure 5 sweep 脚本:对 --confidence-threshold 从 0.0 到 0.8 逐步扫描,每点记录 avg_tokens_per_step(柱状图)与 acceptance_rate(折线)。Figure 6 在 threshold=0.0 时启用 ConfidenceHeadRecorder,输出 metrics.json + reliability_*.png。论文实测 raw ECE 约 3%-8%,AUROC 0.81-0.91(图说明:position 1 ECE 5.7%、position 7 ECE 3.3%)。
Example sweep scripts
bash
# Figure 5: confidence threshold sweep
for thr in 0.0 0.2 0.4 0.6 0.8; do
  python eval.py \
    --target_name_or_path Qwen/Qwen3-4B \
    --draft_name_or_path deepseek-ai/dspark_qwen3_4b_block7 \
    --confidence-threshold "${thr}" \
    --temperature 1.0 \
    --tensorboard-dir "logs/sweep_thr_${thr}"
done

# Figure 6: reliability diagram (the threshold must be 0)
python eval.py \
  --target_name_or_path Qwen/Qwen3-4B \
  --draft_name_or_path deepseek-ai/dspark_qwen3_4b_block7 \
  --confidence-threshold 0.0 \
  --temperature 1.0
# Writes metrics.json + reliability_*.png
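Note on STS calibration
Important: STS (Sequential Temperature Scaling) is described in Section 3.2.1 of the paper, but it is implemented inside the production HAI-LLM system, and this repository contains no corresponding code. The repository's `ConfidenceHeadRecorder` evaluates raw confidence with ECE/AUROC/Brier, and its output can serve as input data for offline STS calibration. Figure 6 of the paper shows ECE dropping to about 1% after STS; the code in this repository does not directly produce post-STS metrics.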
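As a rough idea of what an offline calibration pass over recorded raw confidences could look like, the sketch below fits a single temperature by grid search. This is plain temperature scaling, swapped in as a stand-in; it is not the paper's STS, whose exact formulation is not reproduced here. All function names and the grid range are hypothetical.

```python
import math
from typing import Optional, Sequence


def _logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clamped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def scale_confidence(p: float, temperature: float) -> float:
    """Temperature scaling in logit space: sigmoid(logit(p) / T)."""
    return 1.0 / (1.0 + math.exp(-_logit(p) / temperature))


def nll(conf: Sequence[float], accepted: Sequence[int], temperature: float) -> float:
    """Mean negative log-likelihood of the acceptance outcomes after scaling."""
    total = 0.0
    for p, a in zip(conf, accepted):
        q = min(max(scale_confidence(p, temperature), 1e-12), 1.0 - 1e-12)
        total -= math.log(q) if a else math.log(1.0 - q)
    return total / len(conf)


def fit_temperature(conf: Sequence[float], accepted: Sequence[int],
                    grid: Optional[Sequence[float]] = None) -> float:
    """Pick the temperature on a grid (default 0.25 .. 5.0) that minimizes the NLL."""
    if grid is None:
        grid = [0.25 + 0.05 * i for i in range(96)]
    return min(grid, key=lambda t: nll(conf, accepted, t))
```

A fitted temperature above 1 indicates the raw confidences were overconfident; below 1, underconfident.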
8.6 Section 5: Production deployment
Section 5 of the paper describes the end-to-end production pipeline for DeepSeek-V4-Flash/Pro.
[Diagram: three-layer production pipeline and measured results]
- 5.1 Scalable training (HAI-LLM): DeepSeek-V4 + DSpark; 3 MoE layers + mHC + sliding window 128; γ=5 + Markov head; hidden-state communication optimization, O(d) rather than O(V); anchor-bounded sequence packing with a token-level attention index.
- 5.2 ZOS asynchronous scheduling: the theoretical Algorithm 1 needs synchronous scheduling, which stalls the GPU; the production version approximates upcoming capacity with the confidence from two steps earlier; current-step tokens are ranked by their real confidence and only the truncation length uses the historical prediction; rank-preserving + causal barrier keeps decoding lossless.
- 5.3 High-throughput, low-latency inference: variable-length query batches; token flattening + marker tensor; intra-sequence dependency markers inside sparse attention; V4 architecture: index-attention and compress kernel modifications.
- Measured results: V4-Flash single-user +60%-85%; V4-Pro single-user +57%-78%; high-concurrency throughput +400%.
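Diagram notes: The production pipeline has three layers. The training layer uses HAI-LLM to optimize hidden-state communication and anchor packing; the scheduling layer uses ZOS (Zero-Overhead Scheduling) to approximate Algorithm 1 asynchronously and avoid GPU stalls; the inference layer uses variable-length batch kernels plus a marker tensor to support dynamic routing. Measured results: single-user generation speed improves by 60%-85% on V4-Flash and 57%-78% on V4-Pro, and effective throughput in high-concurrency scenarios reaches 4×.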
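5.1 Training optimization highlights
Section 5.1 of the paper describes two system-level optimizations. The code in this repository can serve as a reference, but the implementation details live inside HAI-LLM:
- Hidden-state communication: the original design had to transfer the target's full-vocabulary logits ($V \approx 10^5$) between parallel workers. The optimized version transfers only hidden states ($O(d)$) and runs the lm head projection locally on the draft worker. This repository's target cache protocol ([target_cache_dataset.py](file:///workspace/deepspec/data/target_cache_dataset.py)) is the offline version of the same idea: it precomputes the target's hidden states to disk.
- Anchor-bounded sequence packing: multiple independent anchor blocks are packed into one dense batch, and causality is maintained with a token-level attention index instead of a 2D mask. This repository's `create_dspark_attention_mask` ([common.py:78-106](file:///workspace/deepspec/modeling/dspark/common.py#L78-106)) is the single-node version of this idea.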
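The two optimizations above can be illustrated with a small sketch. The first function compares how many values per token cross the wire when sending logits versus hidden states; the second expands a per-token block index into the causal mask it implies. Both are simplified illustrations: the hidden size of 4096 in the usage note is a hypothetical value, the function names are invented here, and the repository's `create_dspark_attention_mask` may differ (for example in how draft blocks attend to context).

```python
from typing import Dict, List


def comm_elements_per_token(vocab_size: int, hidden_size: int) -> Dict[str, float]:
    """Values transferred per token: O(V) for logits versus O(d) for hidden states."""
    return {
        "logits": vocab_size,
        "hidden": hidden_size,
        "ratio": vocab_size / hidden_size,
    }


def packed_causal_allowed(block_ids: List[int]) -> List[List[bool]]:
    """Expand a token-level block index into a dense mask, for illustration only.

    allowed[i][j] is True when token i may attend to token j, that is, when both
    tokens belong to the same packed block and j is not after i. A real kernel
    would consume block_ids directly (O(n) memory) instead of building this
    O(n^2) matrix.
    """
    n = len(block_ids)
    return [
        [block_ids[i] == block_ids[j] and j <= i for j in range(n)]
        for i in range(n)
    ]
```

For example, `comm_elements_per_token(100_000, 4096)["ratio"]` is roughly 24, and `packed_causal_allowed([0, 0, 1, 1, 1])` keeps the two packed blocks from attending to each other.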
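5.2 Code pointers for ZOS asynchronous scheduling
The theoretical Algorithm 1 requires synchronous scheduling (fix this step's batch size first, then run the forward pass), which conflicts with the ZOS requirement that the next step's batch size be known before the current step finishes. The production version changes this as follows:
- Candidate tokens are ranked by their current, real confidence (rank-preserving).
- The truncation length $K$ is predicted from the confidence history of two steps earlier (asynchronous approximation).
- Because the truncation decision uses only historical information, it forms a causal barrier and preserves the lossless property.
In this repository, `_confident_prefix_length` at [draft_ops.py:82-93](file:///workspace/deepspec/eval/dspark/draft_ops.py#L82-93) is the single-request offline version: it truncates by `confidence_threshold`, which differs from the multi-request global scheduling of Algorithm 1.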
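The difference between the two truncation styles can be sketched as follows. `confident_prefix_length` is a guess at the single-request rule (keep leading positions until confidence drops below the threshold); the real `_confident_prefix_length` may be implemented differently. `async_truncate` is a toy illustration of the split described above, where the length comes from stale confidences and the ranking from current ones. How production derives $K$ from history is not specified here, so reusing the threshold rule for it is purely an assumption.

```python
from typing import List, Sequence


def confident_prefix_length(confidences: Sequence[float], threshold: float) -> int:
    """Count leading draft positions whose confidence is >= threshold."""
    length = 0
    for c in confidences:
        if c < threshold:
            break
        length += 1
    return length


def async_truncate(current_conf: Sequence[float], stale_conf: Sequence[float],
                   threshold: float) -> List[int]:
    """Toy asynchronous truncation.

    K is decided from stale (historical) confidences only, so it is known before
    the current step finishes. The candidates that survive are then chosen by
    ranking the current, real confidences; ties keep their original order.
    """
    k = min(confident_prefix_length(stale_conf, threshold), len(current_conf))
    ranked = sorted(range(len(current_conf)),
                    key=lambda i: current_conf[i], reverse=True)
    return ranked[:k]
```

With a threshold of 0.0 the prefix rule keeps every position, which matches the requirement that calibration runs use `--confidence-threshold 0.0`.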
8.7 Experiment output files
Reproducing the experiments produces the following files:
| File path | Contents | Stage |
|---|---|---|
| `train_datasets/perfectblend_train.jsonl` | Training prompts after splitting | Data |
| `train_datasets/qwen3_4b/perfectblend_train_regen.jsonl` | Answers regenerated with sglang | Data |
| `~/.cache/deepspec/qwen3_4b_target_cache/` | Target cache (38TB) | Data |
| `~/checkpoints/deepspec/<exp_name>/step_<N>/` | Checkpoint, saved every 3000 steps | Training |
| `~/checkpoints/deepspec/<exp_name>/step_latest` | Symlink to the latest checkpoint | Training |
| `~/tensorboard/deepspec/<exp_name>/` | TensorBoard logs | Training |
| `<ckpt>/config.py` | Snapshot of the config used for training | Training |
| `<ckpt>/training_state.rank{N}.pt` | Per-rank optimizer state + RNG | Training |
| `<output>/metrics.json` | Confidence calibration metrics (threshold=0 only) | Evaluation |
| `<output>/reliability_*.png` | Reliability diagrams (threshold=0 only) | Evaluation |
| `<output>/<task>.jsonl` | Per-sample τ | Evaluation |
| stdout | Per-task τ aggregated across ranks | Evaluation |
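After an evaluation run it can help to confirm that the expected files are actually present. The helper below is a small sketch based only on the file names in the table; the function name and the task names in the usage note are hypothetical, and it does not inspect file contents.

```python
from pathlib import Path
from typing import List


def missing_eval_outputs(output_dir: str, tasks: List[str],
                         calibration: bool = True) -> List[str]:
    """Return the expected evaluation outputs that are absent from output_dir.

    tasks: benchmark names, each expected as <task>.jsonl.
    calibration: set True for runs with threshold=0, which should also write
    metrics.json and at least one reliability_*.png.
    """
    root = Path(output_dir)
    missing = [f"{t}.jsonl" for t in tasks if not (root / f"{t}.jsonl").is_file()]
    if calibration:
        if not (root / "metrics.json").is_file():
            missing.append("metrics.json")
        if not any(root.glob("reliability_*.png")):
            missing.append("reliability_*.png")
    return missing
```

For a threshold sweep with a nonzero threshold, call it with `calibration=False`, since the calibration files are only written at threshold=0.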
8.8 Pitfall checklist
[Diagram: common pitfalls]
- P1: 38TB target cache storage → plan disk space ahead or reduce layers
- P2: `target_layer_ids` contains the final layer → eval hidden states do not match the cache (`assert_no_final_target_layer`)
- P3: `local_batch_size != 1` → a single sample already has enough anchors
- P4: changing `world_size` on resume → topology mismatch error
- P5: calibration evaluation with `confidence_threshold > 0` → no longer unbiased
- P6: `EVAL_ATTN_IMPLEMENTATION` differs from the flex_attention used in training
- P7: sglang sampling parameters not aligned with the target → training distribution shift
- P8: switching the target without changing `chat_template` → misaligned loss_mask
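Diagram notes: Eight common pitfalls, each tied to a concrete check in the code or a warning in the docs. `assert_no_final_target_layer` ([base_evaluator.py:100-112](file:///workspace/deepspec/eval/base_evaluator.py#L100-112)) actively blocks P2; the topology check in [ckpt_manager.py:84-133](file:///workspace/deepspec/trainer/ckpt_manager.py#L84-133), which includes `local_batch_size != saved_local_batch_size`, blocks P4; the sglang sampling parameters (P7) and `chat_template` (P8) must be aligned by hand because no code validates them.
Detailed notes on each pitfall: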
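- 38TB target cache storage ([README.md:29](file:///workspace/README.md#L29)): the default Qwen3-4B configuration needs 38TB of disk. To reduce storage: (1) shrink the training set; (2) use fewer `target_layer_ids` (for example, the two layers `[9, 25]`, which cuts storage by 60%).
- `target_layer_ids` must not include the final layer (`assert_no_final_target_layer`, [base_evaluator.py:100-112](file:///workspace/deepspec/eval/base_evaluator.py#L100-112)): the `output_hidden_states` of transformers stores the normalized final hidden state, which does not match the raw decoder output stored in the cache.
- `local_batch_size=1` is deliberate: a single sample with `num_anchors=512` already provides 3584 supervised positions, so there is enough parallelism along the batch dimension. Increasing it causes OOM.
- The topology must match on resume: `saved_world_size == world_size`, `saved_rank == global_rank`, and `saved_local_batch_size == local_batch_size` ([ckpt_manager.py:84-133](file:///workspace/deepspec/trainer/ckpt_manager.py#L84-133)). Changing `world_size` means training from scratch.
- No calibration evaluation when `confidence_threshold > 0`: after truncation the samples are no longer unbiased, so [evaluator.py:46-48](file:///workspace/deepspec/eval/dspark/evaluator.py#L46-48) skips `ConfidenceHeadRecorder`. Reproducing Figure 6 requires `--confidence-threshold 0.0`.
- `EVAL_ATTN_IMPLEMENTATION="sdpa"` ([dspark/evaluator.py:32](file:///workspace/deepspec/eval/dspark/evaluator.py#L32)): training uses the flex_attention block mask, while eval uses sdpa (a single block needs no block mask). Changing it leads to errors or slower evaluation.
- sglang sampling parameters must follow the target's recommended values: Qwen3 uses `--temperature 0.7 --top-p 0.8 --top-k 20 --min-p 0`; for Gemma4, check the official recommendation. Misaligned parameters shift the training data distribution and lower τ significantly.
- Switching the target requires updating `chat_template` as well: Qwen uses `qwen` and Gemma4 uses `gemma4` ([parser.py:30-51](file:///workspace/deepspec/data/parser.py#L30-51)). `assistant_loss_prefix` determines where the loss_mask starts; if it is misaligned, the loss is computed on the prompt.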
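A few of the numbers and checks in the list above can be written down directly. The sketch below mirrors the resume-topology rule and the storage and anchor arithmetic; it is not the code in `ckpt_manager.py`. The `Topology` class and all function names are hypothetical, and the storage estimate assumes cache size scales linearly with the number of cached layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    world_size: int
    rank: int
    local_batch_size: int


def check_resume_topology(saved: Topology, current: Topology) -> None:
    """Raise ValueError if any field differs between the checkpoint and this run."""
    mismatches = [
        f"{name}: saved={getattr(saved, name)} current={getattr(current, name)}"
        for name in ("world_size", "rank", "local_batch_size")
        if getattr(saved, name) != getattr(current, name)
    ]
    if mismatches:
        raise ValueError("resume topology mismatch: " + "; ".join(mismatches))


def scaled_cache_tb(base_tb: float, base_layers: int, new_layers: int) -> float:
    """Estimate cache size after changing the layer count (linear-scaling assumption)."""
    return base_tb * new_layers / base_layers


def supervised_positions(num_anchors: int, block_size: int) -> int:
    """Supervised positions per sample: one block of draft positions per anchor."""
    return num_anchors * block_size
```

Under these assumptions, going from 5 cached layers to 2 takes 38TB to about 15.2TB (the 60% reduction quoted above), and 512 anchors with a block size of 7 give the 3584 supervised positions.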
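Summary
The reproduction path for the DSpark paper's experiments can be walked end to end with this repository's code. All Section 4 benchmark experiments can be reproduced directly with `eval.py` and the released checkpoints. For Section 5, the core algorithm (Algorithm 1) can be traced in both the paper and the code, but the full production pipeline lives inside HAI-LLM.
Key points for reproduction:
- Section 4.1, experimental setup: 12 config files (4 targets × 3 drafters) across 9 benchmarks; the configuration items match the paper word for word.
- Section 4.2, main results (Table 1): 108 cells, one `eval.py` run per cell; the released checkpoints let you skip training.
- Section 4.3.1, Figure 2: `accept_rates_by_position` is collected in `BaseEvaluator`; watch the difference between the conditional and prefix-survival definitions.
- Section 4.3.2, Figures 3 and 4: sweep with `--opts "model.num_draft_layers=N"` and `model.block_size=K`; every point needs its own training run.
- Section 4.3.3, Figures 5 and 6: threshold sweep + reliability diagram; the calibration outputs are produced automatically by `ConfidenceHeadRecorder`.
- Section 5, production deployment: HAI-LLM training optimizations + ZOS asynchronous scheduling + variable-length batch kernels; this repository provides the corresponding code pointers in its data generation and training framework.
- Eight pitfall categories: each maps to a concrete check in the code or a warning in the docs, so they are easy to look up and avoid.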
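Resource estimates:
- Reproducing all 108 cells of Section 4.2 (with released checkpoints): 8×A100 80G, 4 targets × 3 drafters × 9 benchmarks, roughly several days (0.5-2 hours per cell).
- Reproducing the Section 4.3.2 ablations (self-trained): each point needs 10 epochs of training on 8×A100, about 1-2 days; 4 (depth) + 4 (length) + 1 (rnn) = 9 points in total, about 2 weeks.
- Generating the target cache: 1.3M samples × 5 layers of hidden states, 8 GPUs for sglang + 8 GPUs for the target forward pass, roughly several days plus 38TB of disk.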
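The estimates above follow from simple multiplication, which the sketch below spells out. It assumes the runs execute one after another on a single 8×A100 node, which the text does not state; with more nodes the wall-clock time shrinks accordingly. The function names are made up for this example.

```python
from typing import Tuple


def eval_cells(targets: int, drafters: int, benchmarks: int) -> int:
    """Number of Table 1 cells, one eval.py run each."""
    return targets * drafters * benchmarks


def sequential_days(n_runs: int, low: float, high: float,
                    unit_hours: float = 1.0) -> Tuple[float, float]:
    """Wall-clock range in days for n_runs executed back to back.

    low/high are per-run durations expressed in units of unit_hours
    (use unit_hours=24.0 when the per-run range is given in days).
    """
    return (n_runs * low * unit_hours / 24.0, n_runs * high * unit_hours / 24.0)


cells = eval_cells(4, 3, 9)                      # 108 cells
eval_range = sequential_days(cells, 0.5, 2.0)    # 0.5-2 hours per cell
ablation_points = 4 + 4 + 1                      # depth + length + rnn
ablation_range = sequential_days(ablation_points, 1, 2, unit_hours=24.0)
```

Under the sequential assumption this gives 2.25 to 9 days for the 108 evaluation cells and 9 to 18 days for the ablations, consistent with "several days" and "about 2 weeks".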
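Further reading: go to 09 User Guide for the complete end-to-end commands, and back to 07 Evaluation System for the details of the τ computation. The full text of Sections 4-5 is in [DSpark_paper.pdf](file:///workspace/DSpark_paper.pdf), where the experimental data for Table 1 and Figures 2-6 can be looked up.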