[ByteDance Embraces Open Source] ByteDance-Seed Open-Sources Cola DLM, a Continuous Latent Diffusion Language Model

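Cola DLM (Continuous Latent Diffusion Language Model) is a hierarchical diffusion language model that operates in a continuous latent space. It combines a text autoencoder (Text VAE) with a block-causal diffusion transformer (DiT) prior: the autoencoder maps text to a continuous latent sequence and decodes latent sequences back to tokens, while the diffusion transformer learns the prior over those latents with Flow Matching.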
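The flow-matching prior can be illustrated with a toy, stdlib-only sketch covering both halves: the training target and few-step Euler sampling. It assumes the common rectified-flow convention (straight-line path, noise at t=0, clean latent at t=1, velocity target x1 - x0); this card does not say which flow-matching variant or time convention Cola DLM uses. `oracle_velocity` is a hypothetical stand-in for the trained DiT, not part of the cola_dlm package, and we assume, without confirmation, that the `timestep_num` argument used later in the quick start controls the number of integration steps.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
# A velocity field maps (x_t, t) to dx/dt; in Cola DLM the DiT plays this role.
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Tuple[Vector, Vector]:
    """Point on the straight path from noise x0 (t=0) to a clean latent x1 (t=1),
    plus the constant velocity x1 - x0 that the prior is trained to predict."""
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return xt, velocity


def flow_matching_loss(predict: VelocityFn, x1: Vector, rng: random.Random) -> float:
    """One-sample estimate of the mean squared error between the predicted
    velocity at a random (x_t, t) and the target velocity x1 - x0."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()  # uniform in [0, 1)
    xt, target = interpolate(x0, x1, t)
    pred = predict(xt, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(x1)


def euler_sample(predict: VelocityFn, x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = predict(x, t) from t=0 to t=1 with num_steps Euler steps."""
    dt = 1.0 / num_steps
    x = list(x0)
    for step in range(num_steps):
        v = predict(x, step * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(target: Vector) -> VelocityFn:
    """Hypothetical prior that only knows one latent: the exact velocity of the
    straight path from the current point x at time t to `target` at t=1."""
    def v(x: Vector, t: float) -> Vector:
        return [(y - xi) / (1.0 - t) for xi, y in zip(x, target)]
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [0.5, -1.0, 2.0]  # pretend clean latent produced by the VAE encoder
    print("loss:", flow_matching_loss(oracle_velocity(latent), latent, rng))
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    print("sample:", euler_sample(oracle_velocity(latent), noise, num_steps=16))
```

With the oracle in place of a trained network the loss is essentially zero and the 16-step sampler lands on the target latent; a real prior would instead approximate the average velocity over the whole latent distribution.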

This model repository contains the HuggingFace-format checkpoints for the paper "Continuous Latent Diffusion Language Model".

Links

Model repository: https://huggingface.co/ByteDance-Seed/Cola-DLM
GitHub repository: https://github.com/ByteDance-Seed/Cola-DLM
Paper: https://arxiv.org/abs/2605.06548
HuggingFace Daily Papers: https://huggingface.co/papers/2605.06548
Project page: https://hongcanguo.github.io/Cola-DLM/
Blog post: https://hongcanguo.github.io/posts/2026-cola-dlm.html
Zhihu article: https://zhuanlan.zhihu.com/p/2038324180920313704

Model Files

The expected repository layout is:

```text
.
├── cola_dlm/
│   ├── cola_dit/
│   │   ├── config.json
│   │   └── model.safetensors*
│   └── cola_vae/
│       ├── config.json
│       └── model.safetensors*
├── tokenizer.json
├── README.md
└── README_zh.md
```
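A quick way to confirm a download matches this layout is a small path check. `missing_entries` is a hypothetical helper, and reading the trailing `*` in `model.safetensors*` as "one file or several shards" is our assumption, so those entries are matched with a glob pattern.

```python
from pathlib import Path
from typing import List

# Required files, mirroring the tree above (README files are documentation only).
REQUIRED_FILES = [
    "cola_dlm/cola_dit/config.json",
    "cola_dlm/cola_vae/config.json",
    "tokenizer.json",
]
# "model.safetensors*" may be a single file or several shards, so match by pattern.
REQUIRED_GLOBS = [
    "cola_dlm/cola_dit/model.safetensors*",
    "cola_dlm/cola_vae/model.safetensors*",
]


def missing_entries(root: str) -> List[str]:
    """Return the expected paths (or glob patterns) that are absent under root."""
    base = Path(root)
    missing = [p for p in REQUIRED_FILES if not (base / p).is_file()]
    missing += [g for g in REQUIRED_GLOBS if not any(base.glob(g))]
    return missing


if __name__ == "__main__":
    print(missing_entries("hf_models") or "layout looks complete")
```

Run it after downloading into `hf_models`; an empty list means every expected file is present.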

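The checkpoint consists of two cooperating modules:

ColaDiTModel: a block-causal 1D diffusion transformer that models the prior over the continuous text latent space.
ColaTextVAEModel: a text variational autoencoder with an encoder and a conditional decoder, mapping text to latents and latents back to text.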
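"Block-causal" is read here in its usual sense: positions are split into consecutive blocks, attention is bidirectional within a block, and causal across blocks. The card does not give Cola DLM's block size or how its mask is actually built, so `block_size` is an arbitrary illustrative parameter and `block_causal_mask` is a hypothetical helper.

```python
from typing import List


def block_causal_mask(seq_len: int, block_size: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Positions are grouped into consecutive blocks of `block_size`. A query sees
    every position in its own block and in all earlier blocks, none in later ones.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = [i // block_size for i in range(seq_len)]
    return [[blocks[j] <= blocks[i] for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    # 'x' marks an allowed attention edge, '.' a masked one.
    for row in block_causal_mask(seq_len=6, block_size=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Two limiting cases show the design: `block_size=1` reduces to an ordinary causal mask, and a `block_size` at least as large as `seq_len` gives full bidirectional attention.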

Quick Start

Install the Cola DLM package from the GitHub repository, then install the download helper:

```bash
git clone https://github.com/ByteDance-Seed/Cola-DLM.git
cd Cola-DLM
pip install -e .
pip install huggingface_hub
```

Download the model files:

```bash
huggingface-cli download ByteDance-Seed/Cola-DLM --local-dir hf_models
```

Run a minimal Python example:

```python
import torch
from tokenizers import Tokenizer

from cola_dlm import (
    ColaDiTModel,
    ColaTextVAEModel,
    generate_task_repaint_inference,
)

# Use the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the latent prior (DiT), the text VAE, and the tokenizer from the
# directory created by the download step.
dit = ColaDiTModel.from_pretrained("hf_models/cola_dlm/cola_dit").to(device)
vae = ColaTextVAEModel.from_pretrained("hf_models/cola_dlm/cola_vae").to(device)
tokenizer = Tokenizer.from_file("hf_models/tokenizer.json")

# Each prompt is a dict with a "question" field in "Question: ... Answer:" form.
prompts = [{"question": "Question: What is the capital of France? Answer:"}]
results = generate_task_repaint_inference(
    dit=dit,
    vae=vae,
    tokenizer=tokenizer,
    prompts=prompts,
    task_name="lambada",
    device=device,
    max_new_tokens=32,
    temperature=0.0,
    guidance_scale=7.0,
    timestep_num=16,
    pad_token_id=100277,  # padding token ID listed under Model Details
)

print(results[0]["generate"])
```
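For more than one question, a small helper can build the `prompts` list in the same shape as the example, using the "Question: ... Answer:" format that the card recommends for quick evaluation. `build_prompts` is a hypothetical convenience function; the card only shows the `question` key and does not document how `generate_task_repaint_inference` batches prompts or orders its results.

```python
from typing import Dict, List


def build_prompts(questions: List[str]) -> List[Dict[str, str]]:
    """Wrap raw questions in the "Question: ... Answer:" template, using the
    {"question": ...} dict shape from the quick-start example."""
    prompts = []
    for q in questions:
        q = q.strip()
        if not q:
            raise ValueError("empty question")
        prompts.append({"question": f"Question: {q} Answer:"})
    return prompts


if __name__ == "__main__":
    print(build_prompts(["What is the capital of France?"]))
```

Pass the result as `prompts=build_prompts([...])` in the quick-start call and read each answer from `results[i]["generate"]`, assuming results come back in prompt order.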

OpenAI-Compatible Server

The companion openai_adapter/ service in the Cola DLM code release exposes this model through an OpenAI-compatible Chat Completions endpoint:

```text
POST /v1/chat/completions
```

From the repository root, install the adapter dependencies:

```bash
pip install -e .
pip install -r openai_adapter/requirements.txt
```

Start the server:

```bash
# Point the adapter at the downloaded checkpoint and tokenizer
export COLA_DIT_PATH=hf_models/cola_dlm/cola_dit
export COLA_VAE_PATH=hf_models/cola_dlm/cola_vae
export COLA_TOKENIZER_PATH=hf_models/tokenizer.json
export COLA_MODEL_NAME=cola-dlm
export COLA_API_KEY=change-me  # replace with your own secret

uvicorn openai_adapter.server:app --host 0.0.0.0 --port 8000
```
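The environment variables above suggest how such a server is configured. The following is a hypothetical sketch of a server reading them and checking the `Authorization: Bearer` header that clients send; it is not the openai_adapter source, and its defaults (model name `cola-dlm`, no authentication when `COLA_API_KEY` is unset) are our assumptions.

```python
import hmac
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AdapterConfig:
    dit_path: str
    vae_path: str
    tokenizer_path: str
    model_name: str
    api_key: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> AdapterConfig:
    """Read the COLA_* variables; in this sketch the three paths are required."""
    try:
        return AdapterConfig(
            dit_path=env["COLA_DIT_PATH"],
            vae_path=env["COLA_VAE_PATH"],
            tokenizer_path=env["COLA_TOKENIZER_PATH"],
            model_name=env.get("COLA_MODEL_NAME", "cola-dlm"),  # assumed default
            api_key=env.get("COLA_API_KEY"),
        )
    except KeyError as exc:
        raise RuntimeError(f"missing environment variable {exc.args[0]}") from None


def is_authorized(config: AdapterConfig, authorization_header: Optional[str]) -> bool:
    """Check an 'Authorization: Bearer <key>' header against the configured key."""
    if config.api_key is None:
        return True  # assumption: no key configured means an open server
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return False
    supplied = authorization_header[len(prefix):]
    # Constant-time comparison avoids leaking key prefixes through timing.
    return hmac.compare_digest(supplied.encode(), config.api_key.encode())
```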

Then send a request:

```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer change-me" \
  -d '{
    "model": "cola-dlm",
    "messages": [
      {
        "role": "user",
        "content": "Question: What is the capital of France? Answer:"
      }
    ],
    "temperature": 0,
    "max_tokens": 32,
    "stream": false
  }'
```
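The same request can be sent from Python with only the standard library. This sketch mirrors the curl call field for field and assumes the adapter returns the standard OpenAI Chat Completions response shape (`choices[0].message.content`), which we have not checked against the adapter source; `build_request` and `extract_text` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict


def build_request(base_url: str, api_key: str, prompt: str,
                  model: str = "cola-dlm", max_tokens: int = 32) -> urllib.request.Request:
    """Build the same non-streaming Chat Completions request as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": max_tokens,
        "stream": False,  # the adapter supports non-streaming completions only
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def extract_text(response: Dict[str, Any]) -> str:
    """Pull the assistant text out of a standard Chat Completions response body."""
    return response["choices"][0]["message"]["content"]


# Usage, with the server from the previous step running:
#   req = build_request("http://127.0.0.1:8000", "change-me",
#                       "Question: What is the capital of France? Answer:")
#   with urllib.request.urlopen(req, timeout=120) as resp:
#       print(extract_text(json.load(resp)))
```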

The adapter currently supports non-streaming completions only.

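Model Details

Architecture: Text VAE + block-causal DiT latent prior
Training objective: two-stage training (Text VAE pretraining first, then joint Text VAE and DiT training with flow matching)
Training compute: the released weights are the 2000 EFLOPs checkpoint on the RQ4 scaling curve in the paper
Tokenizer: OLMo 2 tokenizer (vocabulary size 100,278)
Special token IDs: pad = 100277, end-of-text = 100257, im_end = 100265
Framework: PyTorch 2.1+ and HuggingFace Transformers 4.40+
License: Apache 2.0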
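The special token IDs above suggest a simple post-processing step for raw generated IDs: drop padding and stop at the first end-of-text or im_end token. `clean_generated_ids` is a hypothetical helper, not part of cola_dlm; the card does not say whether the released inference code already does this, and treating im_end as a stop token is our assumption.

```python
from typing import Iterable, List

# Token IDs as listed under Model Details.
PAD_ID = 100277
EOS_ID = 100257
IM_END_ID = 100265
STOP_IDS = {EOS_ID, IM_END_ID}  # assumption: both end a generation


def clean_generated_ids(ids: Iterable[int]) -> List[int]:
    """Drop padding and cut the sequence at the first end-of-text or im_end token."""
    out: List[int] = []
    for tok in ids:
        if tok in STOP_IDS:
            break
        if tok != PAD_ID:
            out.append(tok)
    return out
```

The cleaned IDs can then be decoded with the `tokenizers` library, for example `tokenizer.decode(clean_generated_ids(ids))`.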

Evaluation Results

Zero-shot benchmark results for the open-source inference implementation:

Task	Accuracy (%)
LAMBADA	50.80
MMLU	19.30
OBQA	23.00
HellaSwag	10.70
RACE	19.60
SIQA	28.90
SQuAD	30.90
Story Cloze	30.77
Average	26.75
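The reported average matches a plain unweighted mean of the eight task scores (213.97 / 8 = 26.746, which rounds to 26.75). The snippet below reproduces that arithmetic from the table values.

```python
# Zero-shot accuracies (%) copied from the table above.
SCORES = {
    "LAMBADA": 50.80, "MMLU": 19.30, "OBQA": 23.00, "HellaSwag": 10.70,
    "RACE": 19.60, "SIQA": 28.90, "SQuAD": 30.90, "Story Cloze": 30.77,
}

# Unweighted (macro) mean: every task counts equally, regardless of its size.
task_average = sum(SCORES.values()) / len(SCORES)

if __name__ == "__main__":
    print(f"Average: {task_average:.2f}%")
```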

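The open-source HuggingFace implementation may differ slightly from the paper's internal implementation, so individual task scores can vary somewhat, but the overall trends match the paper.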

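Intended Use

Cola DLM is intended mainly for research on:

Hierarchical latent-variable language models
Continuous latent diffusion for text
Flow-matching priors
Benchmark-style text generation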

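This checkpoint has not been instruction-tuned or trained with RLHF. It should not be treated as a production-grade chatbot or used for safety-critical decisions.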

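Limitations

Trained mainly on English text; evaluation in other languages is limited.
Outputs may contain factual errors, offensive content, biases, or hallucinations.
Generation quality is sensitive to prompt format and length; "Question: ... Answer:"-style prompts are recommended for quick evaluation.
Generation uses a mutable KV cache, so a serving implementation must serialize generation within a single process unless caches are explicitly isolated.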
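One way to meet the single-process serialization requirement is to guard every generation call with a process-wide lock, so only one request touches the shared KV cache at a time. `SerializedGenerator` is a hypothetical wrapper, not part of openai_adapter, and `generate_fn` stands for whatever function actually runs the model.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class SerializedGenerator(Generic[T]):
    """Run generation calls one at a time because the model keeps a mutable KV cache."""

    def __init__(self, generate_fn: Callable[..., T]) -> None:
        self._generate_fn = generate_fn  # placeholder for the real model call
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs) -> T:
        # Only one thread at a time may enter generate_fn and mutate the cache.
        with self._lock:
            return self._generate_fn(*args, **kwargs)
```

In an async server such as one run under uvicorn, the locked call would typically be dispatched to a worker thread (for example with `asyncio.to_thread`) so the event loop stays responsive. The trade-off is throughput: concurrent requests queue behind one another.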

Citation

If you use Cola DLM in your work, please cite:

```bibtex
@article{guo2026cola,
  title   = {Continuous Latent Diffusion Language Model},
  author  = {Guo, Hongcan and Zhao, Qinyu and Zhao, Yian and Nie, Shen and
             Zhu, Rui and Guo, Qiushan and Wang, Feng and Yang, Tao and
             Zhao, Hengshuang and Wei, Guoqiang and Zeng, Yan},
  journal = {arXiv preprint arXiv:2605.06548},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.06548},
}
```