DDK: Distilling Domain Knowledge for Efficient Large Language Models

Methodology at a Glance

If you are not familiar with knowledge distillation, read this article first: [The seminal KD paper]

The motivation of this paper is to "reduce the gap between the student model and the teacher model in each domain".

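In domains where the performance gap is relatively large, the DDK method lowers the student model's perplexity (PPL) and so improves the student's performance in that domain.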
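To make the notion of a per-domain gap concrete, here is a minimal sketch that computes perplexity as the exponential of the mean negative log-likelihood per token and compares student and teacher domain by domain. The domain names, the log-probability values, and the helper `ppl_gap` are hypothetical and only illustrate the idea; they do not come from the paper.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Hypothetical per-token natural-log probabilities on held-out text per domain.
teacher = {"code": [-0.5, -0.7, -0.6], "books": [-1.0, -1.2, -1.1]}
student = {"code": [-1.5, -1.9, -1.7], "books": [-1.1, -1.3, -1.2]}

def ppl_gap(student_lp, teacher_lp):
    """Ratio of student PPL to teacher PPL for each domain (hypothetical helper)."""
    return {d: perplexity(student_lp[d]) / perplexity(teacher_lp[d])
            for d in teacher_lp}

gaps = ppl_gap(student, teacher)
# The domain with the largest ratio is where the student lags the teacher most.
worst = max(gaps, key=gaps.get)
print(gaps, worst)
```

A ratio close to 1 means the student is nearly as good as the teacher in that domain; a much larger ratio marks a domain that would deserve more attention during distillation.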

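The strength of this paper is that it makes the student model more general: compared with current methods, the student's performance is more balanced across domains.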

Contributions

1）To the best of our knowledge, we are the first to study the influence of domain-specific data mixtures for distilling LLMs, and efficiently transfer the domain knowledge of the teacher network upon the domain weights.

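In short: this is the first work to study how domain-specific data mixtures affect knowledge distillation for LLMs, and it transfers domain knowledge based on domain weights.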

2）DDK proposes a factor smooth updating strategy to strategically enhance the appropriate focus of the distillation process on targeted domains, which effectively stabilizes the domain knowledge guided sampling process for smoother distillation.

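In short: the domain weights are derived from the domain discrepancy factor that the paper introduces, and the paper proposes a strategy for updating this factor smoothly.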

3）Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and generalization ability of our proposed DDK.

In short: extensive experiments on multiple benchmark datasets verify the effectiveness of the method.
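The smooth update of the discrepancy factor described in contribution 2 can be sketched roughly as follows. The notes above only say that the factor is updated smoothly and that it drives the domain sampling weights. The specific formula here is an assumption made for illustration: a multiplicative update by the exponential of the student-to-teacher loss ratio, followed by mixing with a uniform distribution through a hypothetical coefficient `alpha`. The function name and the loss values are hypothetical as well, and the paper's exact rule may differ.

```python
import math

def update_domain_weights(prev_weights, student_loss, teacher_loss, alpha=0.9):
    """One smoothed update of domain sampling weights (illustrative assumption).

    prev_weights: dict domain -> previous weight (sums to 1)
    student_loss, teacher_loss: dict domain -> validation loss (> 0)
    alpha: hypothetical smoothing coefficient in [0, 1]
    """
    domains = list(prev_weights)
    # Scale each previous weight by how far the student lags the teacher.
    raw = {d: prev_weights[d] * math.exp(student_loss[d] / teacher_loss[d])
           for d in domains}
    total = sum(raw.values())
    n = len(domains)
    # Mixing with the uniform distribution keeps every domain sampled
    # and damps abrupt swings between consecutive updates.
    return {d: alpha * raw[d] / total + (1 - alpha) / n for d in domains}

# Hypothetical validation losses for two domains.
prev = {"code": 0.5, "books": 0.5}
s_loss = {"code": 1.7, "books": 1.2}
t_loss = {"code": 0.6, "books": 1.1}

weights = update_domain_weights(prev, s_loss, t_loss)
print(weights)
```

Because the previous weights enter the update, the result changes gradually rather than being recomputed from scratch at each step, and the uniform term guarantees that no domain's weight drops below `(1 - alpha) / n`.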