18-vLLM 结构化输出约束分析文档

vLLM 结构化输出约束分析文档

📌 定位

本文档深入分析 vLLM 的**结构化输出(Structured Output)**系统架构与实现细节。结构化输出是 LLM 推理中的关键技术,通过在 token 生成阶段施加语法约束(JSON Schema、正则表达式、Grammar 等),确保模型输出符合预定义的结构规范。
#mermaid-svg-6bbY0RbgpnHqvYXe{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg-6bbY0RbgpnHqvYXe .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg-6bbY0RbgpnHqvYXe .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg-6bbY0RbgpnHqvYXe .error-icon{fill:#552222;}#mermaid-svg-6bbY0RbgpnHqvYXe .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-6bbY0RbgpnHqvYXe .edge-thickness-normal{stroke-width:1px;}#mermaid-svg-6bbY0RbgpnHqvYXe .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-6bbY0RbgpnHqvYXe .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-6bbY0RbgpnHqvYXe .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg-6bbY0RbgpnHqvYXe .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-6bbY0RbgpnHqvYXe .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-6bbY0RbgpnHqvYXe .marker{fill:#333333;stroke:#333333;}#mermaid-svg-6bbY0RbgpnHqvYXe .marker.cross{stroke:#333333;}#mermaid-svg-6bbY0RbgpnHqvYXe svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-6bbY0RbgpnHqvYXe p{margin:0;}#mermaid-svg-6bbY0RbgpnHqvYXe .label{font-family:"trebuchet ms",verdana,arial,sans-serif;color:#333;}#mermaid-svg-6bbY0RbgpnHqvYXe .cluster-label text{fill:#333;}#mermaid-svg-6bbY0RbgpnHqvYXe .cluster-label span{color:#333;}#mermaid-svg-6bbY0RbgpnHqvYXe .cluster-label span p{background-color:transparent;}#mermaid-svg-6bbY0RbgpnHqvYXe .label text,#mermaid-svg-6bbY0RbgpnHqvYXe span{fill:#333;color:#333;}#mermaid-svg-6bbY0RbgpnHqvYXe .node rect,#mermaid-svg-6bbY0RbgpnHqvYXe .node circle,#mermaid-svg-6bbY0RbgpnHqvYXe .node ellipse,#mermaid-svg-6bbY0RbgpnHqvYXe .node polygon,#mermaid-svg-6bbY0RbgpnHqvYXe .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-6bbY0RbgpnHqvYXe .rough-node .label text,#mermaid-svg-6bbY0RbgpnHqvYXe .node .label text,#mermaid-svg-6bbY0RbgpnHqvYXe .image-shape .label,#mermaid-svg-6bbY0RbgpnHqvYXe .icon-shape .label{text-anchor:middle;}#mermaid-svg-6bbY0RbgpnHqvYXe .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#mermaid-svg-6bbY0RbgpnHqvYXe .rough-node .label,#mermaid-svg-6bbY0RbgpnHqvYXe .node .label,#mermaid-svg-6bbY0RbgpnHqvYXe .image-shape .label,#mermaid-svg-6bbY0RbgpnHqvYXe .icon-shape .label{text-align:center;}#mermaid-svg-6bbY0RbgpnHqvYXe .node.clickable{cursor:pointer;}#mermaid-svg-6bbY0RbgpnHqvYXe .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#mermaid-svg-6bbY0RbgpnHqvYXe .arrowheadPath{fill:#333333;}#mermaid-svg-6bbY0RbgpnHqvYXe .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-6bbY0RbgpnHqvYXe .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-6bbY0RbgpnHqvYXe .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#mermaid-svg-6bbY0RbgpnHqvYXe .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#mermaid-svg-6bbY0RbgpnHqvYXe .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#mermaid-svg-6bbY0RbgpnHqvYXe .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#mermaid-svg-6bbY0RbgpnHqvYXe .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-6bbY0RbgpnHqvYXe .cluster text{fill:#333;}#mermaid-svg-6bbY0RbgpnHqvYXe .cluster span{color:#333;}#mermaid-svg-6bbY0RbgpnHqvYXe div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-6bbY0RbgpnHqvYXe .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#mermaid-svg-6bbY0RbgpnHqvYXe rect.text{fill:none;stroke-width:0;}#mermaid-svg-6bbY0RbgpnHqvYXe .icon-shape,#mermaid-svg-6bbY0RbgpnHqvYXe .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#mermaid-svg-6bbY0RbgpnHqvYXe .icon-shape p,#mermaid-svg-6bbY0RbgpnHqvYXe .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#mermaid-svg-6bbY0RbgpnHqvYXe .icon-shape .label rect,#mermaid-svg-6bbY0RbgpnHqvYXe .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#mermaid-svg-6bbY0RbgpnHqvYXe .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#mermaid-svg-6bbY0RbgpnHqvYXe .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#mermaid-svg-6bbY0RbgpnHqvYXe :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;} auto/xgrammar
outlines
lm-format-enforcer
guidance
StructuredOutputsConfig

配置层
backend 选择
XgrammarBackend

高性能语法约束
OutlinesBackend

FSM 约束引擎
LMFormatEnforcerBackend

轻量级格式强制
GuidanceBackend

Guidance 库集成
支持类型

JSON/Regex/Grammar/

Choice/StructuralTag
支持类型

JSON/Regex/Choice
支持类型

JSON/Regex/Choice
支持类型

JSON/Regex/Grammar/

Choice/StructuralTag


一、StructuredOutputsConfig 配置

📍 源码位置

structured_outputs.py(file:///workspace/vllm/config/structured_outputs.py)

核心配置类

StructuredOutputsConfig 是引擎级别的结构化输出配置类,定义了后端选择和行为控制参数。

配置参数说明
参数名 类型 默认值 说明
backend Literal["auto", "xgrammar", "guidance", "outlines", "lm-format-enforcer"] "auto" 选择结构化输出后端。"auto" 会根据请求内容和库支持自动选择
disable_any_whitespace bool False 禁止 JSON 输出中的空白字符(紧凑格式)。仅 xgrammar 和 guidance 支持
disable_additional_properties bool False guidance 后端不使用 additionalProperties。仅 guidance 后端支持
reasoning_parser str "" 推理内容解析器选择
reasoning_parser_plugin str "" 动态加载的推理解析器插件路径
enable_in_reasoning bool False 是否在推理阶段使用结构化输入

关键源码片段

1. 后端类型定义 (structured_outputs.py:12-14(file:///workspace/vllm/config/structured_outputs.py#L12-L14))

python 复制代码
StructuredOutputsBackend = Literal[
    "auto", "xgrammar", "guidance", "outlines", "lm-format-enforcer"
]

2. 配置验证逻辑 (structured_outputs.py:62-73(file:///workspace/vllm/config/structured_outputs.py#L62-L73))

python 复制代码
@model_validator(mode="after")
def _validate_structured_output_config(self) -> Self:
    if self.disable_any_whitespace and self.backend not in ("xgrammar", "guidance"):
        raise ValueError(
            "disable_any_whitespace is only supported for "
            "xgrammar and guidance backends."
        )
    if self.disable_additional_properties and self.backend != "guidance":
        raise ValueError(
            "disable_additional_properties is only supported "
            "for the guidance backend."
        )
    return self

3. Hash 计算 (structured_outputs.py:44-60(file:///workspace/vllm/config/structured_outputs.py#L44-L60))

该 config 不影响计算图(computation graph),因此 hash 计算返回空因子列表:

python 复制代码
def compute_hash(self) -> str:
    factors: list[Any] = []
    hash_str = safe_hash(str(factors).encode(), usedforsecurity=False).hexdigest()
    return hash_str

二、后端实现(四种后端逐一深入分析)

类型系统基础

在分析各后端之前,先理解抽象基类定义:

📍 backend_types.py(file:///workspace/vllm/v1/structured_output/backend_types.py)

StructuredOutputOptions 枚举 (backend_types.py:19-25(file:///workspace/vllm/v1/structured_output/backend_types.py#L19-L25))
python 复制代码
class StructuredOutputOptions(enum.Enum):
    JSON = enum.auto()
    JSON_OBJECT = enum.auto()
    REGEX = enum.auto()
    GRAMMAR = enum.auto()
    CHOICE = enum.auto()
    STRUCTURAL_TAG = enum.auto()
StructuredOutputGrammar 抽象基类 (backend_types.py:31-95(file:///workspace/vllm/v1/structured_output/backend_types.py#L31-L95))

请求级别的 Grammar 实现,必须实现以下方法:

方法 功能说明
accept_tokens(request_id, tokens) 接受 token 列表并推进 FSM,返回是否成功
validate_tokens(tokens) 验证 token 是否被接受(不推进 FSM)
rollback(num_tokens) 回滚指定数量的 token
fill_bitmask(bitmask, idx) 填充 bitmask 用于 logits masking
is_terminated() 检查是否终止
reset() 重置状态
StructuredOutputBackend 抽象基类 (backend_types.py:98-136(file:///workspace/vllm/v1/structured_output/backend_types.py#L98-L136))

引擎级别的后端实现,必须实现:

方法 功能说明
compile_grammar(request_type, grammar_spec) 编译 grammar 规范为 StructuredOutputGrammar
allocate_token_bitmask(max_num_seqs) 分配 token bitmask 内存
destroy() 清理资源

2.1 XgrammarBackend - 高性能语法约束

📍 backend_xgrammar.py(file:///workspace/vllm/v1/structured_output/backend_xgrammar.py)

核心原理

Xgrammar 是 MLC-AI 开发的高性能语法约束库,基于 LLAMA.cpp 的 GBNF (Grammar-Based Normal Form) 语法格式。其核心优势在于:

  1. 原生 C++ 实现:编译和匹配性能极高
  2. 直接 Bitmask 操作 :通过 fill_next_token_bitmask 直接操作 GPU tensor
  3. Jump-forward 解码:支持跳过确定性 token 加速生成
  4. Speculative Decoding 支持:内置回滚机制
初始化流程 (backend_xgrammar.py:36-75(file:///workspace/vllm/v1/structured_output/backend_xgrammar.py#L36-L75))
python 复制代码
@dataclass
class XgrammarBackend(StructuredOutputBackend):
    def __post_init__(self):
        self.disable_any_whitespace = (
            self.vllm_config.structured_outputs_config.disable_any_whitespace
        )

        # 特殊处理 Mistral tokenizer(Tekken 编码)
        if is_mistral_tokenizer(self.tokenizer):
            stop_token_ids = [self.tokenizer.eos_token_id]
            self.vocab_size = len(self.tokenizer.vocab)
            tokenizer_info = xgr.TokenizerInfo(
                encoded_vocab=self.tokenizer.vocab,
                vocab_type=xgr.VocabType.RAW if self.tokenizer.is_tekken
                else xgr.VocabType.BYTE_FALLBACK,
                vocab_size=self.vocab_size,
                stop_token_ids=stop_token_ids,
                add_prefix_space=True,
            )
        else:
            tokenizer_info = xgr.TokenizerInfo.from_huggingface(
                self.tokenizer, vocab_size=self.vocab_size
            )

        # 创建 GrammarCompiler(带缓存)
        self.compiler = xgr.GrammarCompiler(
            tokenizer_info,
            max_threads=8,
            cache_enabled=True,
            cache_limit_bytes=vllm.envs.VLLM_XGRAMMAR_CACHE_MB * 1024 * 1024,
        )

        # Speculative decoding 支持
        self.num_speculative_tokens = 0
        if self.vllm_config.speculative_config is not None:
            self.num_speculative_tokens = (
                self.vllm_config.speculative_config.num_speculative_tokens
            )

关键特性

  • TokenizerInfo 构建:区分 Mistral/Tekken tokenizer 和标准 HF tokenizer
  • GrammarCompiler 缓存:避免重复编译相同 grammar
  • Speculative tokens 回滚 :设置 max_rollback_tokens
Grammar 编译逻辑 (backend_xgrammar.py:77-122(file:///workspace/vllm/v1/structured_output/backend_xgrammar.py#L77-L122))
python 复制代码
def compile_grammar(
    self, request_type: StructuredOutputOptions, grammar_spec: str
) -> StructuredOutputGrammar:
    if request_type == StructuredOutputOptions.JSON:
        ctx = self.compiler.compile_json_schema(
            grammar_spec, any_whitespace=not self.disable_any_whitespace
        )
    elif request_type == StructuredOutputOptions.JSON_OBJECT:
        ctx = self.compiler.compile_json_schema(
            '{"type": "object"}', any_whitespace=not self.disable_any_whitespace
        )
    elif request_type == StructuredOutputOptions.GRAMMAR:
        ctx = self.compiler.compile_grammar(grammar_spec)
    elif request_type == StructuredOutputOptions.REGEX:
        ctx = self.compiler.compile_regex(grammar_spec)
    elif request_type == StructuredOutputOptions.STRUCTURAL_TAG:
        # 处理 structural tag(工具调用场景)
        s_tag = json.loads(grammar_spec)
        if "structures" in s_tag:
            tags = [
                xgr.StructuralTagItem(
                    begin=s["begin"],
                    schema=json.dumps(s["schema"]),
                    end=s["end"],
                )
                for s in s_tag["structures"]
            ]
            ctx = self.compiler.compile_structural_tag(tags, s_tag["triggers"])
        else:
            ctx = self.compiler.compile_structural_tag(grammar_spec)
    else:
        raise ValueError(...)

    return XgrammarGrammar(
        matcher=xgr.GrammarMatcher(ctx, max_rollback_tokens=self.num_speculative_tokens),
        vocab_size=self.vocab_size,
        ctx=ctx,
    )

支持的编译类型映射

Request Type 编译方法 说明
JSON compile_json_schema() JSON Schema → Grammar
JSON_OBJECT compile_json_schema('{"type": "object"}') 任意 JSON 对象
GRAMMAR compile_grammar() EBNF/Grammar 字符串
REGEX compile_regex() 正则表达式 → Grammar
STRUCTURAL_TAG compile_structural_tag() 结构化标签(工具调用)
XgrammarGrammar - Token 级别操作 (backend_xgrammar.py:131-199(file:///workspace/vllm/v1/structured_output/backend_xgrammar.py#L131-L199))
python 复制代码
@dataclass
class XgrammarGrammar(StructuredOutputGrammar):
    vocab_size: int
    matcher: xgr.GrammarMatcher
    ctx: xgr.CompiledGrammar
    num_processed_tokens: int = field(default_factory=lambda: 0, ...)
    _is_terminated: bool = field(default=False, ...)

    def accept_tokens(self, request_id: str, tokens: list[int]) -> bool:
        """推进 FSM 状态机"""
        if self._is_terminated:
            return False
        for token in tokens:
            if not self.matcher.accept_token(token):
                logger.error("Failed to advance FSM for request %s", request_id)
                return False
            self.num_processed_tokens += 1
        self._is_terminated = self.matcher.is_terminated()
        return True

    def validate_tokens(self, tokens: list[int]) -> list[int]:
        """验证 token 但不推进状态"""
        accepted_tokens = []
        for token in tokens:
            if self.matcher.accept_token(token):
                accepted_tokens.append(token)
            else:
                break
        if len(accepted_tokens) > 0:
            self.matcher.rollback(len(accepted_tokens))  # 回滚到初始状态
        return accepted_tokens

    def rollback(self, num_tokens: int) -> None:
        """回滚状态"""
        self.matcher.rollback(num_tokens)
        self.num_processed_tokens -= num_tokens
        self._is_terminated = self.matcher.is_terminated()

    def fill_bitmask(self, bitmask: torch.Tensor, idx: int) -> None:
        """填充下一轮允许的 token bitmask"""
        self.matcher.fill_next_token_bitmask(bitmask, idx)

核心机制

  • FSM(有限状态机):每个 token 推进状态机一步
  • Bitmask 填充:直接告诉 GPU 哪些 token 是合法的
  • 回滚支持:用于 speculative decoding 场景
验证函数 - 不支持的 JSON Schema 特性检测 (backend_xgrammar.py:221-265(file:///workspace/vllm/v1/structured_output/backend_xgrammar.py#L221-L265))
python 复制代码
STRING_SUPPORTED_FORMATS = {
    "email", "date", "time", "date-time", "duration",
    "ipv4", "ipv6", "hostname", "uuid", "uri",
    "uri-reference", "uri-template", "json-pointer", "relative-json-pointer"
}

def has_xgrammar_unsupported_json_features(schema: dict[str, Any]) -> bool:
    """递归检查 JSON schema 是否包含 xgrammar 不支持的特性"""

    def check_object(obj: dict[str, Any]) -> bool:
        # 1. 数值类型的 multipleOf 不支持
        if obj.get("type") in ("integer", "number") and ("multipleOf" in obj):
            return True

        # 2. 数组的 uniqueItems/contains/minContains/maxContains 不支持
        if obj.get("type") == "array" and any(
            key in obj for key in ("uniqueItems", "contains", "minContains", "maxContains")
        ):
            return True

        # 3. 字符串 format 仅支持特定集合
        if obj.get("type") == "string" and "format" in obj \
           and obj["format"] not in STRING_SUPPORTED_FORMATS:
            return True

        # 4. 对象的 patternProperties/propertyNames 不支持
        if obj.get("type") == "object" and any(
            key in obj for key in ("patternProperties", "propertyNames")
        ):
            return True

        # 5. 递归检查嵌套对象
        for value in obj.values():
            ...
        return False

    return check_object(schema)

2.2 OutlinesBackend - 基于 FSM 的约束引擎

📍 backend_outlines.py(file:///workspace/vllm/v1/structured_output/backend_outlines.py)

核心原理

Outlines 库(现 outlines_core)的核心思想是:

  1. Regex → DFA 转换:将正则表达式转换为确定性有限自动机(DFA)
  2. Vocabulary 映射:将 DFA 状态映射到 token vocabulary
  3. Index 构建:构建高效的查找索引加速 mask 计算
  4. Guide 引擎:通过 Guide 对象管理状态转换
初始化流程 (backend_outlines.py:52-55(file:///workspace/vllm/v1/structured_output/backend_outlines.py#L52-L55))
python 复制代码
@dataclass
class OutlinesBackend(StructuredOutputBackend):
    def __post_init__(self):
        self.vocabulary = get_outlines_vocabulary(self.tokenizer)
        self.cache = get_outlines_cache()  # LRU 或 disk cache

Vocabulary 构建过程(详见 utils.py):

  • 从 tokenizer 提取词汇表
  • 处理特殊 token(如 <0xXX> byte token)
  • 构建 bytes → token_ids 映射
  • 计算 hash 用于缓存键
Grammar 编译逻辑 (backend_outlines.py:69-93(file:///workspace/vllm/v1/structured_output/backend_outlines.py#L69-L93))
python 复制代码
def compile_grammar(
    self, request_type: StructuredOutputOptions, grammar_spec: str
) -> StructuredOutputGrammar:
    if request_type == StructuredOutputOptions.JSON:
        regex = json_schema.build_regex_from_schema(grammar_spec)
    elif request_type == StructuredOutputOptions.REGEX:
        regex = grammar_spec
    elif request_type == StructuredOutputOptions.CHOICE:
        choices = ast.literal_eval(grammar_spec)
        choices = [regex_escape(c) for c in choices]
        regex = "(" + "|".join(choices) + ")"
    else:
        raise ValueError(...)

    index = self._compile_index(regex, self.vocabulary)
    max_rollback_tokens = (
        self.vllm_config.speculative_config.num_speculative_tokens
        if self.vllm_config.speculative_config is not None else 0
    )
    return OutlinesGrammar(
        vocab_size=self.vocab_size,
        guide=oc.Guide(index, max_rollback=max_rollback_tokens),
    )

关键特点

  • JSON Schema → Regex :通过 json_schema.build_regex_from_schema() 转换
  • Choice → Regex :将选项列表转换为 (choice1|choice2|...) 格式
  • 不支持 Grammar 类型:Outlines 后端不支持原始 Grammar 规范
Index 编译与缓存 (backend_outlines.py:57-67(file:///workspace/vllm/v1/structured_output/backend_outlines.py#L57-L67))
python 复制代码
def _compile_index(self, regex_string: str, vocabulary: OutlinesVocabulary) -> oc.Index:
    cache_key = f"{vocabulary._hash}_{regex_string}"
    if cache_key in self.cache:
        return self.cache[cache_key]

    index = oc.Index(regex_string, vocabulary.inner)
    self.cache[cache_key] = index
    return index

缓存策略:

  • LRU Cache(默认):内存缓存,最多 128 条目
  • Disk Cache (可选):通过 VLLM_V1_USE_OUTLINES_CACHE=1 启用
OutlinesGrammar - Guide 状态管理 (backend_outlines.py:107-164(file:///workspace/vllm/v1/structured_output/backend_outlines.py#L107-L164))
python 复制代码
@dataclass
class OutlinesGrammar(StructuredOutputGrammar):
    vocab_size: int
    guide: oc.Guide
    num_processed_tokens: int = field(default_factory=lambda: 0, ...)
    _prev_finished: bool = field(default=False, ...)  # 延迟 finished 信号

    def accept_tokens(self, request_id: str, tokens: list[int]) -> bool:
        """
        两阶段检查:
        1. accepts_tokens(): 检查当前 token 是否可接受
        2. advance(): 实际推进状态(可能到达 dead state)
        """
        if self.guide.accepts_tokens(tokens):
            for t in tokens:
                self.guide.advance(t)
                self.num_processed_tokens += 1
            return True
        return False

    def is_terminated(self) -> bool:
        """延迟 finished 信号:DFA accept 后再等一轮让 EOS 可发出"""
        curr = self.guide.is_finished()
        prev = self._prev_finished
        self._prev_finished = curr
        return prev

    def fill_bitmask(self, bitmask: torch.Tensor, idx: int) -> None:
        """通过 Guide 写入 mask 到 GPU memory"""
        mask = bitmask[idx]
        self.guide.write_mask_into(mask.data_ptr(), mask.numel(), mask.element_size())

特殊设计 - _prev_finished 延迟机制

  • Outlines_core 在 DFA 到达 accept 状态时立即标记 finished
  • 但 vLLM 需要在 finished 后还能发出 EOS token
  • 因此延迟一周期返回 terminated 信号
正则表达式验证 (backend_outlines.py:299-330(file:///workspace/vllm/v1/structured_output/backend_outlines.py#L299-330))

Outlines 后端对正则表达式有严格限制:

python 复制代码
def validate_regex_is_buildable(pattern: str) -> None:
    """
    验证正则表达式是否符合 regex-automata 的要求:
    1. 无 backreferences(反向引用)
    2. 无 look-around assertions(前瞻/后瞻断言)
    3. 无 Unicode word boundaries(\b, \B)
    4. 必须有 universal start state(无锚定前缀)
    """
    parsed = sre_parse.parse(pattern)
    _check_unsupported(parsed)  # 检查不支持的特性
    if _prefix_needs_context(parsed):  # 检查是否有锚定前缀
        raise ValueError("Regex does not have a anchored universal start state...")

不支持的正则特性

特性 示例 原因
Backreferences \1, \k<name> DFA 无法处理
Look-ahead (?=...), (?!...) 需要 context
Look-behind (?<=...), (?<!...) 需要 context
Word boundary \b, \B Unicode 边界复杂

2.3 LMFormatEnforcerBackend - 轻量级格式强制

📍 backend_lm_format_enforcer.py(file:///workspace/vllm/v1/structured_output/backend_lm_format_enforcer.py)

核心原理

lm-format-enforcer 是一个轻量级的格式强制库,采用字符级别解析器(CharacterLevelParser)方式工作:

  1. Parser 定义:定义字符级别的输出约束规则
  2. Token 允许集计算:根据当前已生成的 prefix 计算下一个合法 token 集合
  3. TokenEnforcer:封装 tokenizer 和 parser,提供高效查询接口
初始化流程 (backend_lm_format_enforcer.py:94-98(file:///workspace/vllm/v1/structured_output/backend_lm_format_enforcer.py#L94-L98))
python 复制代码
@dataclass
class LMFormatEnforcerBackend(StructuredOutputBackend):
    def __post_init__(self):
        # 使用 LRU cache 缓存 tokenizer_data 构建结果
        self.tokenizer_data = _cached_build_vllm_token_enforcer_tokenizer_data(
            self.tokenizer, self.vocab_size
        )

Tokenizer Data 构建(cached):

python 复制代码
@lru_cache
def _cached_build_vllm_token_enforcer_tokenizer_data(
    tokenizer: PreTrainedTokenizerBase, vocab_size: int
) -> "lmfe_vllm.TokenEnforcerTokenizerData":
    return lmfe_vllm.build_vllm_token_enforcer_tokenizer_data(
        tokenizer, use_bitmask=True, vocab_size=vocab_size
    )
Grammar 编译 - CharacterLevelParser 选择 (backend_lm_format_enforcer.py:100-135(file:///workspace/vllm/v1/structured_output/backend_lm_format_enforcer.py#L100-L135))
python 复制代码
def compile_grammar(
    self, request_type: StructuredOutputOptions, grammar_spec: str
) -> StructuredOutputGrammar:
    character_level_parser: lmformatenforcer.CharacterLevelParser

    if request_type == StructuredOutputOptions.JSON:
        spec_dict = json.loads(grammar_spec)
        character_level_parser = lmformatenforcer.JsonSchemaParser(spec_dict)
    elif request_type == StructuredOutputOptions.JSON_OBJECT:
        character_level_parser = lmformatenforcer.JsonSchemaParser(None)
    elif request_type == StructuredOutputOptions.REGEX:
        character_level_parser = lmformatenforcer.RegexParser(grammar_spec)
    elif request_type == StructuredOutputOptions.CHOICE:
        choices = ast.literal_eval(grammar_spec)
        character_level_parser = lmformatenforcer.UnionParser(
            [lmformatenforcer.StringParser(choice) for choice in choices]
        )
    else:
        raise ValueError(...)

    # ⚠️ 不支持 speculative decoding
    if max_rollback_tokens > 0:
        raise ValueError(
            "LM Format Enforcer backend does not support speculative tokens"
        )

    token_enforcer = lmformatenforcer.TokenEnforcer(
        tokenizer_data=self.tokenizer_data,
        parser=character_level_parser,
    )
    return LMFormatEnforcerGrammar(token_enforcer)

Parser 类型对应关系

Request Type Parser 类 说明
JSON JsonSchemaParser(schema_dict) JSON Schema 约束
JSON_OBJECT JsonSchemaParser(None) 任意 JSON 对象
REGEX RegexParser(pattern) 正则表达式约束
CHOICE UnionParser([StringParser(...)]) 多选一约束
LMFormatEnforcerGrammar - Prefix 追踪机制 (backend_lm_format_enforcer.py:43-90(file:///workspace/vllm/v1/structured_output/backend_lm_format_enforcer.py#L43-L90))
python 复制代码
@dataclass
class LMFormatEnforcerGrammar(StructuredOutputGrammar):
    token_enforcer: lmformatenforcer.TokenEnforcer
    current_tokens_prefix: list[int] = field(default_factory=list)

    def accept_tokens(self, request_id: str, tokens: list[int]) -> bool:
        """逐个检查 token 是否在允许集中"""
        original_len = len(self.current_tokens_prefix)
        for token in tokens:
            if not self.token_enforcer.get_allowed_tokens(
                self.current_tokens_prefix
            ).is_token_allowed(token):
                # 原子性操作:失败时回滚部分更新
                del self.current_tokens_prefix[original_len:]
                return False
            self.current_tokens_prefix.append(token)
        return True

    def fill_bitmask(self, bitmask: torch.Tensor, batch_index: int) -> None:
        """获取当前 prefix 的允许 token 集合并写入 bitmask"""
        allowed_tokens = self.token_enforcer.get_allowed_tokens(
            self.current_tokens_prefix
        )
        bitmask[batch_index] = allowed_tokens.allowed_tokens

    def is_terminated(self) -> bool:
        """当最后一个 token 是 EOS 时认为终止"""
        return (
            len(self.current_tokens_prefix) > 0
            and self.current_tokens_prefix[-1] == self.token_enforcer.eos_token_id
        )

核心特点

  • Prefix 追踪:维护已生成的 token 序列作为上下文
  • 原子性操作:accept 失败时完整回滚
  • 不支持 Speculative Decoding:明确抛出异常

2.4 GuidanceBackend - Guidance 库集成

📍 backend_guidance.py(file:///workspace/vllm/v1/structured_output/backend_guidance.py)

核心原理

Guidance(通过 llguidance 库集成)是微软开发的约束生成框架,特点包括:

  1. 高级 Grammar 表示:支持复杂的结构化标签(Structural Tag)
  2. Flexible Whitespace:可配置空白字符处理
  3. Additional Properties 控制:精细控制 JSON Schema 行为
  4. LLMatcher:高性能的 token 级别匹配器
初始化流程 (backend_guidance.py:87-101(file:///workspace/vllm/v1/structured_output/backend_guidance.py#L87-L101))
python 复制代码
@dataclass
class GuidanceBackend(StructuredOutputBackend):
    def __post_init__(self):
        self.disable_any_whitespace = (
            self.vllm_config.structured_outputs_config.disable_any_whitespace
        )
        self.disable_additional_properties = (
            self.vllm_config.structured_outputs_config.disable_additional_properties
        )

        # 特殊处理 Mistral tokenizer
        if is_mistral_tokenizer(self.tokenizer):
            self.ll_tokenizer = self.tokenizer.llg_tokenizer
        else:
            self.ll_tokenizer = llguidance_hf.from_tokenizer(
                self.tokenizer, max(self.vocab_size, len(self.tokenizer))
            )
Grammar 序列化 - 统一入口 (backend_guidance.py:219-285(file:///workspace/vllm/v1/structured_output/backend_guidance.py#L219-L285))
python 复制代码
def serialize_guidance_grammar(
    request_type: StructuredOutputOptions,
    grammar_spec: str | dict[str, Any],
    disable_any_whitespace: bool = False,
    disable_additional_properties: bool = False,
) -> str:

    def _process_schema(grammar_spec) -> str:
        if disable_additional_properties:
            grammar_spec = process_for_additional_properties(grammar_spec)
        return llguidance.LLMatcher.grammar_from_json_schema(
            grammar_spec,
            defaults={"whitespace_flexible": not disable_any_whitespace},
        )

    if request_type == StructuredOutputOptions.JSON:
        return _process_schema(grammar_spec)
    elif request_type == StructuredOutputOptions.REGEX:
        tp = "regex"
    elif request_type == StructuredOutputOptions.GRAMMAR:
        tp = "grammar"
    elif request_type == StructuredOutputOptions.CHOICE:
        tp = "choice"
    elif request_type == StructuredOutputOptions.STRUCTURAL_TAG:
        # 处理结构化标签(复杂工具调用场景)
        s_tag = json.loads(grammar_spec) if isinstance(grammar_spec, str) else grammar_spec
        triggers = s_tag["triggers"]
        tags = []
        for s in s_tag["structures"]:
            begin = s["begin"]
            trig = next((t for t in triggers if begin.startswith(t)), None)
            tags.append(llguidance.StructTag(
                trigger=trig,
                begin=s["begin"],
                grammar=_process_schema(s["schema"]),
                end=s["end"],
            ))
        return llguidance.StructTag.to_grammar(tags)

    return llguidance.grammar_from(tp, grammar_spec)

Additional Properties 处理 (backend_guidance.py:35-46(file:///workspace/vllm/v1/structured_output/backend_guidance.py#L35-L46)):

python 复制代码
def _walk_json_for_additional_properties(data: object):
    """递归遍历 JSON schema,为包含 properties 的对象添加 additionalProperties: false"""
    if isinstance(data, dict):
        for value in data.values():
            _walk_json_for_additional_properties(value)
        if "additionalProperties" not in data and (
            "properties" in data or "patternProperties" in data
        ):
            data["additionalProperties"] = False
    elif isinstance(data, list):
        for item in data:
            _walk_json_for_additional_properties(item)
GuidanceGrammar - LLMatcher 封装 (backend_guidance.py:137-217(file:///workspace/vllm/v1/structured_output/backend_guidance.py#L137-L217))
python 复制代码
@dataclass
class GuidanceGrammar(StructuredOutputGrammar):
    ll_matcher: llguidance.LLMatcher
    ll_tokenizer: llguidance.LLTokenizer
    vocab_size: int
    printed_error: bool = False
    terminated: bool = False
    rollback_lag: int = 0  # EOS 延迟计数器

    def accept_tokens(self, request_id: str, tokens: list[int]) -> bool:
        # 检测 EOS token
        if self.ll_tokenizer.eos_token in tokens:
            if self.ll_matcher.is_stopped() and not self.terminated:
                self.rollback_lag = 1  # 延迟终止信号
            self.terminated = True

        if self.ll_matcher.is_stopped():
            return True

        # 消费 token 并推进解析器
        r = self.ll_matcher.consume_tokens(tokens)
        self.check_error()
        return r

    def fill_bitmask(self, bitmask: torch.Tensor, idx: int) -> None:
        """自动处理 stopped/error 状态下的 mask"""
        llguidance_torch.fill_next_token_bitmask(self.ll_matcher, bitmask, idx)
        self.check_error()

    def rollback(self, num_tokens: int) -> None:
        if num_tokens > 0:
            self.ll_matcher.rollback(num_tokens - self.rollback_lag)
            self.terminated = False
            self.rollback_lag = 0
            self.check_error()

特殊设计 - rollback_lag

  • 与 Outlines 类似,Guidance 也需要延迟终止信号以允许 EOS 发出
  • 通过 rollback_lag 计数器实现精确控制
不支持的 JSON Schema 特性 (backend_guidance.py:48-71(file:////workspace/vllm/v1/structured_output/backend_guidance.py#L48-L71))
python 复制代码
def has_guidance_unsupported_json_features(schema: dict[str, Any]) -> bool:
    def check_object(obj: dict[str, Any]) -> bool:
        # patternProperties 不被 llguidance 支持
        if "patternProperties" in obj:
            return True
        # 递归检查嵌套对象
        for value in obj.values():
            ...
        return False
    return check_object(schema)

三、支持的结构类型对比

3.1 各后端支持矩阵

结构类型 Xgrammar Outlines LM Format Enforcer Guidance
JSON Schema ✅ 完整支持 ✅ (转 Regex) ✅ 完整支持
JSON Object
正则表达式 ✅ (有限制)
Grammar (EBNF)
Choice ✅ (转 Grammar) ✅ (转 Regex)
Structural Tag
Speculative Decoding
Jump-forward ⚠️ TODO

3.2 详细说明

JSON Schema 约束

用途:确保输出符合 JSON Schema 定义的数据结构

示例

python 复制代码
sampling_params = SamplingParams(
    structured_outputs=StructuredOutputsParams(
        json='{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}'
    )
)

各后端差异

  • Xgrammar:直接编译 JSON Schema 为 Grammar,性能最优
  • Outlines:先转换为正则表达式,再构建 DFA
  • LM Format Enforcer:使用 JsonSchemaParser 字符级解析
  • Guidance:通过 LLMatcher.grammar_from_json_schema() 处理
正则表达式约束

用途:通过正则表达式限制输出格式

示例

python 复制代码
sampling_params = SamplingParams(
    structured_outputs=StructuredOutputsParams(regex=r"\d{3}-\d{4}")
)

限制

  • Outlines 不支持:backreference、look-around、word boundary
  • 其他后端基本无限制
Grammar (EBNF/CFG) 约束

用途:使用形式语法定义复杂输出结构

示例

python 复制代码
ebnf_grammar = """
root ::= expression
expression ::= term ("+" term)*
term ::= factor ("*" factor)*
factor ::= number | "(" expression ")"
number ::= [0-9]+
"""
sampling_params = SamplingParams(
    structured_outputs=StructuredOutputsParams(grammar=ebnf_grammar)
)

Lark ↔ EBNF 转换 (utils.py:289-448(file:///workspace/vllm/v1/structured_output/utils.py#L289-L448)):

Xgrammar 仅支持 EBNF 格式,但用户可能提供 Lark 格式的 grammar。utils.py 提供 convert_lark_to_ebnf() 函数进行转换:

python 复制代码
def convert_lark_to_ebnf(grammar_str: str) -> str:
    """
    Lark 格式: rule: 'hello'
    EBNF 格式: root ::= rule\nrule ::= "hello"

    主要转换:
    1. 添加 root 规则指向第一个规则
    2. 单引号 → 双引号
    3. : → ::=
    4. | alternatives 处理
    """
Choice 约束

用途:从预定义选项中选择输出

示例

python 复制代码
sampling_params = SamplingParams(
    structured_outputs=StructuredOutputsParams(choice=["positive", "negative", "neutral"])
)

内部转换

  • Xgrammar :转换为 "root ::= "positive" | "negative" | "neutral" EBNF grammar
  • Outlines :转换为 "(positive|negative|neutral)" 正则表达式
  • LM Format Enforcer:使用 UnionParser(StringParser(...))
  • Guidance:直接传递 choice 类型
工具调用格式约束(Structural Tag)

用途:OpenAI 函数调用 / 工具调用的结构化输出

示例

python 复制代码
structural_tag = {
    "triggers": ["<function_call>"],
    "structures": [
        {
            "begin": "<function_call>",
            "schema": {"type": "object", "properties": {"name": {"type": "string"}}},
            "end": "</function_call>"
        }
    ]
}

支持情况

  • Xgrammar :✅ 通过 compile_structural_tag() 支持
  • Guidance :✅ 通过 StructTag.to_grammar() 支持
  • 其他后端:❌ 不支持

四、Request 级别配置

📍 request.py(file:///workspace/vllm/v1/structured_output/request.py)

StructuredOutputRequest 类

#mermaid-svg-XtEQcjLuU9Jnk27w{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg-XtEQcjLuU9Jnk27w .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg-XtEQcjLuU9Jnk27w .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg-XtEQcjLuU9Jnk27w .error-icon{fill:#552222;}#mermaid-svg-XtEQcjLuU9Jnk27w .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-XtEQcjLuU9Jnk27w .edge-thickness-normal{stroke-width:1px;}#mermaid-svg-XtEQcjLuU9Jnk27w .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-XtEQcjLuU9Jnk27w .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-XtEQcjLuU9Jnk27w .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg-XtEQcjLuU9Jnk27w .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-XtEQcjLuU9Jnk27w .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-XtEQcjLuU9Jnk27w .marker{fill:#333333;stroke:#333333;}#mermaid-svg-XtEQcjLuU9Jnk27w .marker.cross{stroke:#333333;}#mermaid-svg-XtEQcjLuU9Jnk27w svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-XtEQcjLuU9Jnk27w p{margin:0;}#mermaid-svg-XtEQcjLuU9Jnk27w g.classGroup text{fill:#9370DB;stroke:none;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:10px;}#mermaid-svg-XtEQcjLuU9Jnk27w g.classGroup text .title{font-weight:bolder;}#mermaid-svg-XtEQcjLuU9Jnk27w .cluster-label text{fill:#333;}#mermaid-svg-XtEQcjLuU9Jnk27w .cluster-label span{color:#333;}#mermaid-svg-XtEQcjLuU9Jnk27w .cluster-label span p{background-color:transparent;}#mermaid-svg-XtEQcjLuU9Jnk27w .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-XtEQcjLuU9Jnk27w .cluster text{fill:#333;}#mermaid-svg-XtEQcjLuU9Jnk27w .cluster span{color:#333;}#mermaid-svg-XtEQcjLuU9Jnk27w .nodeLabel,#mermaid-svg-XtEQcjLuU9Jnk27w .edgeLabel{color:#131300;}#mermaid-svg-XtEQcjLuU9Jnk27w .edgeLabel .label rect{fill:#ECECFF;}#mermaid-svg-XtEQcjLuU9Jnk27w .label text{fill:#131300;}#mermaid-svg-XtEQcjLuU9Jnk27w .labelBkg{background:#ECECFF;}#mermaid-svg-XtEQcjLuU9Jnk27w .edgeLabel .label span{background:#ECECFF;}#mermaid-svg-XtEQcjLuU9Jnk27w .classTitle{font-weight:bolder;}#mermaid-svg-XtEQcjLuU9Jnk27w .node rect,#mermaid-svg-XtEQcjLuU9Jnk27w .node circle,#mermaid-svg-XtEQcjLuU9Jnk27w .node ellipse,#mermaid-svg-XtEQcjLuU9Jnk27w .node polygon,#mermaid-svg-XtEQcjLuU9Jnk27w .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-XtEQcjLuU9Jnk27w .divider{stroke:#9370DB;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w g.clickable{cursor:pointer;}#mermaid-svg-XtEQcjLuU9Jnk27w g.classGroup rect{fill:#ECECFF;stroke:#9370DB;}#mermaid-svg-XtEQcjLuU9Jnk27w g.classGroup line{stroke:#9370DB;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w .classLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.5;}#mermaid-svg-XtEQcjLuU9Jnk27w .classLabel .label{fill:#9370DB;font-size:10px;}#mermaid-svg-XtEQcjLuU9Jnk27w .relation{stroke:#333333;stroke-width:1;fill:none;}#mermaid-svg-XtEQcjLuU9Jnk27w .dashed-line{stroke-dasharray:3;}#mermaid-svg-XtEQcjLuU9Jnk27w .dotted-line{stroke-dasharray:1 2;}#mermaid-svg-XtEQcjLuU9Jnk27w #compositionStart,#mermaid-svg-XtEQcjLuU9Jnk27w .composition{fill:#333333!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w #compositionEnd,#mermaid-svg-XtEQcjLuU9Jnk27w .composition{fill:#333333!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w #dependencyStart,#mermaid-svg-XtEQcjLuU9Jnk27w .dependency{fill:#333333!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w #dependencyStart,#mermaid-svg-XtEQcjLuU9Jnk27w .dependency{fill:#333333!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w #extensionStart,#mermaid-svg-XtEQcjLuU9Jnk27w .extension{fill:transparent!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w #extensionEnd,#mermaid-svg-XtEQcjLuU9Jnk27w .extension{fill:transparent!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w #aggregationStart,#mermaid-svg-XtEQcjLuU9Jnk27w .aggregation{fill:transparent!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w #aggregationEnd,#mermaid-svg-XtEQcjLuU9Jnk27w .aggregation{fill:transparent!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w #lollipopStart,#mermaid-svg-XtEQcjLuU9Jnk27w .lollipop{fill:#ECECFF!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w #lollipopEnd,#mermaid-svg-XtEQcjLuU9Jnk27w .lollipop{fill:#ECECFF!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-XtEQcjLuU9Jnk27w .edgeTerminals{font-size:11px;line-height:initial;}#mermaid-svg-XtEQcjLuU9Jnk27w .classTitleText{text-anchor:middle;font-size:18px;fill:#333;}#mermaid-svg-XtEQcjLuU9Jnk27w .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#mermaid-svg-XtEQcjLuU9Jnk27w .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#mermaid-svg-XtEQcjLuU9Jnk27w :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;} StructuredOutputRequest
+params: StructuredOutputsParams
-_grammar: Future<StructuredOutputGrammar> | StructuredOutputGrammar | None
+reasoning_ended: bool | None
+reasoning_parser_kwargs: dict | None
+reasoner: ReasoningParser | None
+is_grammar_ready: bool
+grammar: StructuredOutputGrammar | None
+structured_output_key: StructuredOutputKey
+from_sampling_params(sampling_params) : StructuredOutputRequest | None

核心功能 (request.py:22-75(file:///workspace/vllm/v1/structured_output/request.py#L22-L75))
python 复制代码
@dataclasses.dataclass
class StructuredOutputRequest:
    params: StructuredOutputsParams
    _grammar: Future[StructuredOutputGrammar] | StructuredOutputGrammar | None = None
    reasoning_ended: bool | None = None
    reasoning_parser_kwargs: dict[str, Any] | None = None
    reasoner: "ReasoningParser | None" = None

    @staticmethod
    def from_sampling_params(
        sampling_params: SamplingParams | None,
    ) -> "StructuredOutputRequest | None":
        """从 SamplingParams 创建请求,如果无约束则返回 None"""
        if sampling_params is None:
            return None
        params = sampling_params.structured_outputs
        if not params or params.all_constraints_none():
            return None
        return StructuredOutputRequest(params=params)

    @property
    def is_grammar_ready(self) -> bool:
        """检查异步编译的 grammar 是否就绪(100μs 超时)"""
        return self._check_grammar_completion()

    @property
    def grammar(self) -> StructuredOutputGrammar | None:
        """获取已编译的 grammar(如果就绪)"""
        completed = self._check_grammar_completion()
        return cast(StructuredOutputGrammar | None, self._grammar) if completed else None

关键设计 - 异步 Grammar 编译

  • _grammar 可以是 Future 对象(异步编译中)或已完成的 StructuredOutputGrammar
  • is_grammar_ready 属性非阻塞检查(100μs 超时)
  • 支持并发编译多个 grammar 而不阻塞调度循环
Structured Output Key 生成 (request.py:77-98(file:///workspace/vllm/v1/structured_output/request.py#L77-L98))
python 复制代码
def get_structured_output_key(params: StructuredOutputsParams) -> StructuredOutputKey:
    """根据优先级确定请求的类型和规范字符串"""
    if params.json is not None:
        if not isinstance(params.json, str):
            json_str = json.dumps(params.json)
        else:
            json_str = params.json
        return StructuredOutputOptions.JSON, json_str
    if params.json_object:
        return StructuredOutputOptions.JSON_OBJECT, ""
    if params.regex is not None:
        return StructuredOutputOptions.REGEX, params.regex
    if params.choice is not None:
        if not isinstance(params.choice, str):
            json_str = json.dumps(params.choice)
        else:
            json_str = params.choice
        return StructuredOutputOptions.CHOICE, json_str
    if params.grammar is not None:
        return StructuredOutputOptions.GRAMMAR, params.grammar
    if params.structural_tag is not None:
        return StructuredOutputOptions.STRUCTURAL_TAG, params.structural_tag
    raise ValueError("No valid structured output parameter found")

优先级顺序:JSON > JSON_OBJECT > REGEX > CHOICE > GRAMMAR > STRUCTURAL_TAG


五、工具函数与辅助模块

5.1 Utils 工具函数

📍 utils.py(file:///workspace/vllm/v1/structured_output/utils.py)

apply_grammar_bitmask - Bitmask 应用 (utils.py:44-135(file:///workspace/vllm/v1/structured_output/utils.py#L44-L135))

这是结构化输出的核心函数,负责将 grammar bitmask 应用到模型输出 logits 上:

python 复制代码
def apply_grammar_bitmask(
    scheduler_output: SchedulerOutput,
    grammar_output: GrammarOutput,
    input_batch: InputBatch,
    logits: torch.Tensor,
) -> None:
    """
    应用 grammar bitmask 到模型输出 logits

    流程:
    1. 从 scheduler 获取 compacted bitmask
    2. 根据 batch index 重新排序(因为 scheduler 和 runner 的顺序可能不同)
    3. 处理 speculative decode tokens 的 offset
    4. 将 numpy array 转换为 torch tensor 并复制到 GPU
    5. 调用 xgr.apply_token_bitmask_inplace() 修改 logits
    """

    # 1. 获取结构化输出请求的 batch indices
    struct_out_req_batch_indices: dict[str, int] = {}
    cumulative_offset = 0
    spec_tokens = scheduler_output.scheduled_spec_decode_tokens
    struct_out_req_ids = set(grammar_output.structured_output_request_ids)

    for batch_index, req_id in enumerate(input_batch.req_ids):
        logit_index = batch_index + cumulative_offset
        cumulative_offset += len(spec_tokens.get(req_id, ()))
        if req_id in struct_out_req_ids:
            struct_out_req_batch_indices[req_id] logit_index

    # 2. 重新排序 bitmask 以匹配 batch 顺序
    sorted_bitmask = np.full(
        shape=(logits.shape[0], grammar_bitmask.shape[1]),
        fill_value=-1,
        dtype=grammar_bitmask.dtype,
    )
    # ... 排序逻辑 ...

    # 3. 应用 bitmask 到 logits
    grammar_bitmask = torch.from_numpy(sorted_bitmask).to(logits.device, non_blocking=True)

    if not logits.is_cpu:
        xgr.apply_token_bitmask_inplace(logits, grammar_bitmask, indices=index_tensor)
    else:
        # CPU 情况需要 float32 转换
        if logits.dtype != torch.float32:
            logits_fp32 = logits.to(torch.float32)
            xgr.apply_token_bitmask_inplace(logits_fp32, grammar_bitmask, indices=indices)
            logits.copy_(logits_fp32.to(logits.dtype))
        else:
            xgr.apply_token_bitmask_inplace(logits, grammar_bitmask, indices=indices)

关键点

  • Batch reorder:scheduler 和 runner 的请求顺序可能不同,需要重新排序
  • Speculative offset:speculative decode tokens 会增加 logit index 偏移
  • CPU fallback:CPU 模式下需要 float32 转换(老版本 xgrammar kernel 限制)
  • Non-blocking copy :使用 non_blocking=True 避免 CPU-GPU 同步
Vocabulary 管理 (utils.py:138-286(file:///workspace/vllm/v1/structured_output/utils.py#L138-L286))

OutlinesVocabulary 包装类 (utils.py:138-151(file:///workspace/vllm/v1/structured_output/utils.py#L138-L151)):

python 复制代码
class OutlinesVocabulary:
    """包装 outlines_core.Vocabulary,附带 hash 用于缓存"""

    def __init__(self, vocabulary: oc.Vocabulary) -> None:
        self.inner = vocabulary
        # 使用 SHA256 hash 作为缓存键(避免 Python hash 的随机性)
        hex_str = hashlib.sha256(vocabulary.__repr__().encode("utf-8")).hexdigest()
        hash_int = int(hex_str, 16)
        self._hash = hash_int

Reduced Vocabulary 构建 (utils.py:206-272(file:///workspace/vllm/v1/structured_output/utils.py#L206-L272)):

python 复制代码
def _reduced_vocabulary(tokenizer: TokenizerLike) -> dict[bytes, list[int]]:
    """
    创建精简词汇表:token bytes → equivalent token ids 列表

    处理特殊情况:
    - 特殊 token(EOS, PAD 等)排除
    - SPIECE_UNDERLINE 前缀(Llama tokenizer)→ 添加空格
    - <0xXX> byte token(Llama tokenizer)→ 直接转换为字节
    - Unicode replacement 字符(GPT2 tokenizer)→ 使用 bytes_to_unicode 映射
    """
    vocabulary: dict[bytes, list[int]] = {}
    for token, token_idx in tokenizer.get_vocab().items():
        if token in tokenizer.all_special_tokens:
            continue

        token_str = convert_token_to_string(token)
        if token_str:
            # 处理各种编码格式的 token
            token_bytes = encode_token_to_bytes(token_str, token)
            if token_idx != eos_token_id:
                vocabulary.setdefault(token_bytes, []).append(token_idx)
        else:
            empty_token_ids.append(token_idx)

    return vocabulary

Cache 管理 (utils.py:154-199(file:///workspace/vllm/v1/structured_output/utils.py#L154-L199)):

python 复制代码
def get_outlines_cache():
    """获取 Index 缓存实例"""
    cache_dir = get_outlines_cache_path()

    if envs.VLLM_V1_USE_OUTLINES_CACHE:
        # Disk cache(持久化,适合生产环境但有磁盘空间风险)
        from diskcache import Cache
        logger.warning(
            "Enabling outlines cache. This is an unbounded on-disk "
            "cache. It may consume a lot of disk space..."
        )
        cache = Cache(cache_dir, eviction_policy="none", cull_limit=0)
        # 版本检查:版本变化时清空缓存
        cached_version = cache.get("__version__", None)
        if cached_version != outlines_version:
            cache.clear()
        cache.set("__version__", outlines_version)
        return cache

    # Default: LRU Cache(内存缓存,最多 128 条目)
    return LRUCache(maxsize=128)
Grammar 格式转换工具 (utils.py:289-458(file:///workspace/vllm/v1/structured_output/utils.py#L289-L458))

Lark → EBNF 转换器

python 复制代码
def grammar_is_likely_lark(grammar_str: str) -> bool:
    """检测 grammar 是否使用 Lark 语法(通过检查是否包含 '::=')"""
    for line in grammar_str.split("\n"):
        line = re.sub(r"(#|//).*$", "", line).strip()
        if not line:
            continue
        if "::=" in line:
            return False  # 已经是 EBNF 格式
    return True  # 可能是 Lark 格式

def convert_lark_to_ebnf(grammar_str: str) -> str:
    """
    完整的 Lark → EBNF 转换实现

    转换规则:
    1. 第一个规则成为 root 规则
    2. '...' → "..." (引号转换)
    3. : → ::= (定义符号)
    4. | alternatives 保持不变
    5. 注释(# 和 //)移除
    6. 验证所有引用的规则都已定义
    """

Choice → Grammar 转换 (utils.py:451-458(file:///workspace/vllm/v1/structured_output/utils.py#L451-458)):

python 复制代码
def choice_as_grammar(choice: list[str]) -> str:
    """将 choice 列表转换为 EBNF grammar 字符串"""
    def escape_ebnf_string(s: str) -> str:
        return re.sub(r'(["\\])', r"\\\1", s)

    escaped_choices = (escape_ebnf_string(c) for c in choice)
    grammar = "root ::= " + " | ".join(f'"{c}"' for c in escaped_choices)
    return grammar

六、结构化输出处理流程

#mermaid-svg-Iq8oxLucmhFrFlCE{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg-Iq8oxLucmhFrFlCE .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg-Iq8oxLucmhFrFlCE .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg-Iq8oxLucmhFrFlCE .error-icon{fill:#552222;}#mermaid-svg-Iq8oxLucmhFrFlCE .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-Iq8oxLucmhFrFlCE .edge-thickness-normal{stroke-width:1px;}#mermaid-svg-Iq8oxLucmhFrFlCE .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-Iq8oxLucmhFrFlCE .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-Iq8oxLucmhFrFlCE .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg-Iq8oxLucmhFrFlCE .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-Iq8oxLucmhFrFlCE .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-Iq8oxLucmhFrFlCE .marker{fill:#333333;stroke:#333333;}#mermaid-svg-Iq8oxLucmhFrFlCE .marker.cross{stroke:#333333;}#mermaid-svg-Iq8oxLucmhFrFlCE svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-Iq8oxLucmhFrFlCE p{margin:0;}#mermaid-svg-Iq8oxLucmhFrFlCE .label{font-family:"trebuchet ms",verdana,arial,sans-serif;color:#333;}#mermaid-svg-Iq8oxLucmhFrFlCE .cluster-label text{fill:#333;}#mermaid-svg-Iq8oxLucmhFrFlCE .cluster-label span{color:#333;}#mermaid-svg-Iq8oxLucmhFrFlCE .cluster-label span p{background-color:transparent;}#mermaid-svg-Iq8oxLucmhFrFlCE .label text,#mermaid-svg-Iq8oxLucmhFrFlCE span{fill:#333;color:#333;}#mermaid-svg-Iq8oxLucmhFrFlCE .node rect,#mermaid-svg-Iq8oxLucmhFrFlCE .node circle,#mermaid-svg-Iq8oxLucmhFrFlCE .node ellipse,#mermaid-svg-Iq8oxLucmhFrFlCE .node polygon,#mermaid-svg-Iq8oxLucmhFrFlCE .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-Iq8oxLucmhFrFlCE .rough-node .label text,#mermaid-svg-Iq8oxLucmhFrFlCE .node .label text,#mermaid-svg-Iq8oxLucmhFrFlCE .image-shape .label,#mermaid-svg-Iq8oxLucmhFrFlCE .icon-shape .label{text-anchor:middle;}#mermaid-svg-Iq8oxLucmhFrFlCE .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#mermaid-svg-Iq8oxLucmhFrFlCE .rough-node .label,#mermaid-svg-Iq8oxLucmhFrFlCE .node .label,#mermaid-svg-Iq8oxLucmhFrFlCE .image-shape .label,#mermaid-svg-Iq8oxLucmhFrFlCE .icon-shape .label{text-align:center;}#mermaid-svg-Iq8oxLucmhFrFlCE .node.clickable{cursor:pointer;}#mermaid-svg-Iq8oxLucmhFrFlCE .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#mermaid-svg-Iq8oxLucmhFrFlCE .arrowheadPath{fill:#333333;}#mermaid-svg-Iq8oxLucmhFrFlCE .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-Iq8oxLucmhFrFlCE .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-Iq8oxLucmhFrFlCE .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#mermaid-svg-Iq8oxLucmhFrFlCE .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#mermaid-svg-Iq8oxLucmhFrFlCE .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#mermaid-svg-Iq8oxLucmhFrFlCE .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#mermaid-svg-Iq8oxLucmhFrFlCE .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-Iq8oxLucmhFrFlCE .cluster text{fill:#333;}#mermaid-svg-Iq8oxLucmhFrFlCE .cluster span{color:#333;}#mermaid-svg-Iq8oxLucmhFrFlCE div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-Iq8oxLucmhFrFlCE .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#mermaid-svg-Iq8oxLucmhFrFlCE rect.text{fill:none;stroke-width:0;}#mermaid-svg-Iq8oxLucmhFrFlCE .icon-shape,#mermaid-svg-Iq8oxLucmhFrFlCE .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#mermaid-svg-Iq8oxLucmhFrFlCE .icon-shape p,#mermaid-svg-Iq8oxLucmhFrFlCE .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#mermaid-svg-Iq8oxLucmhFrFlCE .icon-shape .label rect,#mermaid-svg-Iq8oxLucmhFrFlCE .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#mermaid-svg-Iq8oxLucmhFrFlCE .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#mermaid-svg-Iq8oxLucmhFrFlCE .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#mermaid-svg-Iq8oxLucmhFrFlCE :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;} 否

auto
xgrammar
outlines
lm-format-enforcer
guidance




用户请求

SamplingParams
是否有

structured_outputs?
正常采样
StructuredOutputRequest.from_sampling_params
get_structured_output_key

确定约束类型
backend 选择
自动选择最佳后端
XgrammarBackend
OutlinesBackend
LMFormatEnforcerBackend
GuidanceBackend
compile_grammar

编译 Grammar
异步编译

Future or Sync
grammar 就绪?
等待/跳过本轮
Grammar.accept_tokens

推进 FSM
Grammar.fill_bitmask

生成 Bitmask
apply_grammar_bitmask

应用到 Logits
Logits Masking

过滤非法 Token
Sampling

从合法 Token 中采样
是否终止?
返回结果

流程详解

1️⃣ 请求初始化阶段
复制代码
User Request → SamplingParams.structured_outputs
    ↓
StructuredOutputRequest.from_sampling_params()
    ↓
提取 StructuredOutputsParams(json/regex/choice/grammar/structural_tag)
    ↓
get_structured_output_key() → (StructuredOutputOptions, grammar_spec)
2️⃣ Grammar 编译阶段
复制代码
Backend.compile_grammar(request_type, grammar_spec)
    ↓
┌─────────────────────────────────────────────────────┐
│  Xgrammar:   GrammarCompiler.compile_xxx()           │
│              → GrammarMatcher (C++ FSM)               │
│                                                     │
│  Outlines:   build_regex_from_schema() (if JSON)     │
│              → oc.Index (DFA)                        │
│              → oc.Guide (state machine)              │
│                                                     │
│  LMFE:       JsonSchemaParser / RegexParser / ...    │
│              → TokenEnforcer                         │
│                                                     │
│  Guidance:   serialize_guidance_grammar()             │
│              → LLMatcher                             │
└─────────────────────────────────────────────────────┘
    ↓
返回 StructuredOutputGrammar 实例
3️⃣ 推理循环阶段(每步解码)
复制代码
┌─────────────────────────────────────────────┐
│ Step 1: Grammar.accept_tokens(new_tokens)    │
│         - 推进 FSM 状态                       │
│         - 验证 token 合法性                   │
│                                             │
│ Step 2: Grammar.fill_bitmask(bitmask, idx)  │
│         - 填充下一轮合法 token mask           │
│                                             │
│ Step 3: apply_grammar_bitmask()             │
│         - Batch reorder                      │
│         - GPU tensor 操作                   │
│         - xgr.apply_token_bitmask_inplace() │
│                                             │
│ Step 4: Logits Masking                      │
│         - 非法 token logit → -∞             │
│                                             │
│ Step 5: Sampling                            │
│         - 从合法 token 中概率采样            │
│                                             │
│ Step 6: Check termination                   │
│         - Grammar.is_terminated()?          │
│         - Yes → 返回结果                     │
│         - No → 继续 Step 1                  │
└─────────────────────────────────────────────┘
4️⃣ Speculative Decoding 支持
复制代码
Normal Path:  accept_tokens → fill_bitmask → sample
                ↓
Speculative:  validate_tokens (不推进状态)
                ↓
            如果验证失败 → rollback(num_tokens)
                ↓
            重新生成 candidate tokens

七、后端选择策略与建议

Auto 模式选择逻辑

backend="auto" 时,vLLM 会根据以下启发式规则选择:

场景 推荐后端 原因
JSON Schema + 高性能需求 xgrammar 原生 C++,最快
正则表达式 + 简单模式 outlines DFA 高效
快速原型开发 lm-format-enforcer 轻量级,易调试
复杂工具调用 guidance Structural Tag 支持
需要最大兼容性 guidance 支持最多种类

性能特征对比

维度 Xgrammar Outlines LMFE Guidance
编译速度 ⭐⭐⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐
运行时开销 ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐⭐
内存占用 中(Index 缓存)
GPU 利用率 最高(native kernel)
灵活性 最高

兼容性注意事项

  1. Speculative Decoding:LM Format Enforcer 不支持
  2. JSON Schema 复杂度
    • Xgrammar:不支持 multipleOfuniqueItemspatternProperties
    • Guidance:不支持 patternProperties
  3. 正则表达式限制:Outlines 不支持 look-around 和 backreference
  4. Grammar 格式:Xgrammar 仅支持 EBNF(需转换 Lark 格式)

八、关键设计模式总结

8.1 抽象工厂模式

复制代码
StructuredOutputBackend (ABC)
    ├── XgrammarBackend
    ├── OutlinesBackend
    ├── LMFormatEnforcerBackend
    └── GuidanceBackend

每个 Backend 负责:
- compile_grammar() → 返回具体的 Grammar 实现
- allocate_token_bitmask() → 内存管理
- destroy() → 资源清理

8.2 Strategy 模式

运行时可通过配置切换后端:

python 复制代码
# engine 启动时
config = StructuredOutputsConfig(backend="xgrammar")  # 或 "auto"
engine = LLMEngine(config=config)

# 请求时无需关心具体后端
sampling_params = SamplingParams(
    structured_outputs=StructuredOutputsParams(json=schema)
)

8.3 Observer 模式(Bitmask 更新)

Grammar 作为 Observer 监控 token 生成过程:

  • accept_tokens():通知新 token
  • fill_bitmask():查询下一步约束
  • rollback():响应 speculative decoding 回滚

8.4 Lazy Initialization & Async Compilation

python 复制代码
# Grammar 可以是 Future(异步编译中)
_grammar: Future[StructuredOutputGrammar] | StructuredOutputGrammar | None

# 非阻塞检查
@property
def is_grammar_ready(self) -> bool:
    return self._check_grammar_completion()  # 100μs timeout

这避免了首次请求时的冷启动延迟。


九、扩展指南

添加新的结构化输出后端

如果要添加自定义后端(例如基于新库),需要:

  1. 继承 StructuredOutputBackend
python 复制代码
@dataclass
class MyCustomBackend(StructuredOutputBackend):
    def compile_grammar(self, request_type, grammar_spec):
        # 实现编译逻辑
        pass

    def allocate_token_bitmask(self, max_num_seqs):
        # 分配内存
        pass

    def destroy(self):
        # 清理资源
        pass
  1. 继承 StructuredOutputGrammar
python 复制代码
@dataclass
class MyCustomGrammar(StructuredOutputGrammar):
    def accept_tokens(self, request_id, tokens):
        # 推进状态机
        pass

    def fill_bitmask(self, bitmask, idx):
        # 填充 bitmask
        pass

    # ... 实现其他必需方法
  1. 注册到 StructuredOutputsBackend 类型
python 复制代码
StructuredOutputsBackend = Literal[
    "auto", "xgrammar", "guidance", "outlines", "lm-format-enforcer", "my-custom"
]
  1. 实现验证函数
python 复制代码
def validate_my_custom_grammar(sampling_params: SamplingParams):
    # 验证请求参数是否被后端支持
    pass

十、常见问题排查

问题 1:ValueError - Unsupported JSON Schema features

错误信息

复制代码
The provided JSON schema contains features not supported by xgrammar.

解决方案

  • 移除 multipleOfuniqueItemscontainspatternProperties
  • 或者切换到 guidance 后端

问题 2:Regex does not have universal start state

错误信息

复制代码
Regex does not have a anchored universal start state...

原因 :使用了 ^ 锚定或 look-around 断言

解决方案

  • 移除开头的 ^$
  • 替换 (?=...) 为普通匹配
  • 切换到 xgrammarguidance 后端(对正则限制更少)

问题 3:LM Format Enforcer 不支持 Speculative Decoding

错误信息

复制代码
LM Format Enforcer backend does not support speculative tokens

解决方案

  • 禁用 speculative decoding:--speculative-model ""
  • 或切换到 xgrammar/outlines/guidance 后端

问题 4:Grammar 编译超时

现象:首请求延迟高

原因:Complex JSON Schema 或 Grammar 首次编译耗时

优化方案

  • 启用 Xgrammar 缓存:VLLM_XGRAMMAR_CACHE_MB=256
  • 启用 Outlines disk cache:VLLM_V1_USE_OUTLINES_CACHE=1
  • 预热常用 schema

附录:核心数据结构关系图

#mermaid-svg-Slr6Btxn7DK6PMvC{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg-Slr6Btxn7DK6PMvC .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg-Slr6Btxn7DK6PMvC .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg-Slr6Btxn7DK6PMvC .error-icon{fill:#552222;}#mermaid-svg-Slr6Btxn7DK6PMvC .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-Slr6Btxn7DK6PMvC .edge-thickness-normal{stroke-width:1px;}#mermaid-svg-Slr6Btxn7DK6PMvC .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-Slr6Btxn7DK6PMvC .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-Slr6Btxn7DK6PMvC .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg-Slr6Btxn7DK6PMvC .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-Slr6Btxn7DK6PMvC .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-Slr6Btxn7DK6PMvC .marker{fill:#333333;stroke:#333333;}#mermaid-svg-Slr6Btxn7DK6PMvC .marker.cross{stroke:#333333;}#mermaid-svg-Slr6Btxn7DK6PMvC svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-Slr6Btxn7DK6PMvC p{margin:0;}#mermaid-svg-Slr6Btxn7DK6PMvC g.classGroup text{fill:#9370DB;stroke:none;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:10px;}#mermaid-svg-Slr6Btxn7DK6PMvC g.classGroup text .title{font-weight:bolder;}#mermaid-svg-Slr6Btxn7DK6PMvC .cluster-label text{fill:#333;}#mermaid-svg-Slr6Btxn7DK6PMvC .cluster-label span{color:#333;}#mermaid-svg-Slr6Btxn7DK6PMvC .cluster-label span p{background-color:transparent;}#mermaid-svg-Slr6Btxn7DK6PMvC .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-Slr6Btxn7DK6PMvC .cluster text{fill:#333;}#mermaid-svg-Slr6Btxn7DK6PMvC .cluster span{color:#333;}#mermaid-svg-Slr6Btxn7DK6PMvC .nodeLabel,#mermaid-svg-Slr6Btxn7DK6PMvC .edgeLabel{color:#131300;}#mermaid-svg-Slr6Btxn7DK6PMvC .edgeLabel .label rect{fill:#ECECFF;}#mermaid-svg-Slr6Btxn7DK6PMvC .label text{fill:#131300;}#mermaid-svg-Slr6Btxn7DK6PMvC .labelBkg{background:#ECECFF;}#mermaid-svg-Slr6Btxn7DK6PMvC .edgeLabel .label span{background:#ECECFF;}#mermaid-svg-Slr6Btxn7DK6PMvC .classTitle{font-weight:bolder;}#mermaid-svg-Slr6Btxn7DK6PMvC .node rect,#mermaid-svg-Slr6Btxn7DK6PMvC .node circle,#mermaid-svg-Slr6Btxn7DK6PMvC .node ellipse,#mermaid-svg-Slr6Btxn7DK6PMvC .node polygon,#mermaid-svg-Slr6Btxn7DK6PMvC .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-Slr6Btxn7DK6PMvC .divider{stroke:#9370DB;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC g.clickable{cursor:pointer;}#mermaid-svg-Slr6Btxn7DK6PMvC g.classGroup rect{fill:#ECECFF;stroke:#9370DB;}#mermaid-svg-Slr6Btxn7DK6PMvC g.classGroup line{stroke:#9370DB;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC .classLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.5;}#mermaid-svg-Slr6Btxn7DK6PMvC .classLabel .label{fill:#9370DB;font-size:10px;}#mermaid-svg-Slr6Btxn7DK6PMvC .relation{stroke:#333333;stroke-width:1;fill:none;}#mermaid-svg-Slr6Btxn7DK6PMvC .dashed-line{stroke-dasharray:3;}#mermaid-svg-Slr6Btxn7DK6PMvC .dotted-line{stroke-dasharray:1 2;}#mermaid-svg-Slr6Btxn7DK6PMvC #compositionStart,#mermaid-svg-Slr6Btxn7DK6PMvC .composition{fill:#333333!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC #compositionEnd,#mermaid-svg-Slr6Btxn7DK6PMvC .composition{fill:#333333!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC #dependencyStart,#mermaid-svg-Slr6Btxn7DK6PMvC .dependency{fill:#333333!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC #dependencyStart,#mermaid-svg-Slr6Btxn7DK6PMvC .dependency{fill:#333333!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC #extensionStart,#mermaid-svg-Slr6Btxn7DK6PMvC .extension{fill:transparent!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC #extensionEnd,#mermaid-svg-Slr6Btxn7DK6PMvC .extension{fill:transparent!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC #aggregationStart,#mermaid-svg-Slr6Btxn7DK6PMvC .aggregation{fill:transparent!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC #aggregationEnd,#mermaid-svg-Slr6Btxn7DK6PMvC .aggregation{fill:transparent!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC #lollipopStart,#mermaid-svg-Slr6Btxn7DK6PMvC .lollipop{fill:#ECECFF!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC #lollipopEnd,#mermaid-svg-Slr6Btxn7DK6PMvC .lollipop{fill:#ECECFF!important;stroke:#333333!important;stroke-width:1;}#mermaid-svg-Slr6Btxn7DK6PMvC .edgeTerminals{font-size:11px;line-height:initial;}#mermaid-svg-Slr6Btxn7DK6PMvC .classTitleText{text-anchor:middle;font-size:18px;fill:#333;}#mermaid-svg-Slr6Btxn7DK6PMvC .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#mermaid-svg-Slr6Btxn7DK6PMvC .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#mermaid-svg-Slr6Btxn7DK6PMvC :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;} creates
creates
creates
creates
holds
uses
StructuredOutputsConfig
+backend: str
+disable_any_whitespace: bool
+disable_additional_properties: bool
+compute_hash() : str
<<abstract>>
StructuredOutputBackend
+vllm_config: VllmConfig
+tokenizer: TokenizerLike
+vocab_size: int
+compile_grammar() : StructuredOutputGrammar
+allocate_token_bitmask() : Tensor
+destroy()
<<abstract>>
StructuredOutputGrammar
+accept_tokens() : bool
+validate_tokens() : list
+rollback()
+fill_bitmask()
+is_terminated() : bool
+reset()
StructuredOutputRequest
+params: StructuredOutputsParams
-_grammar: Future | Grammar | None
+is_grammar_ready: bool
+grammar: Grammar | None
+from_sampling_params() : Request
<<enumeration>>
StructuredOutputOptions
JSON
JSON_OBJECT
REGEX
GRAMMAR
CHOICE
STRUCTURAL_TAG
XgrammarBackend
+compiler: GrammarCompiler
+num_speculative_tokens: int
XgrammarGrammar
+matcher: GrammarMatcher
+ctx: CompiledGrammar
OutlinesBackend
+vocabulary: OutlinesVocabulary
+cache: Cache
OutlinesGrammar
+guide: Guide
+_prev_finished: bool
LMFormatEnforcerBackend
+tokenizer_data: TokenEnforcerTokenizerData
LMFormatEnforcerGrammar
+token_enforcer: TokenEnforcer
+current_tokens_prefix: list
GuidanceBackend
+ll_tokenizer: LLTokenizer
GuidanceGrammar
+ll_matcher: LLMatcher
+rollback_lag: int
Engineuses


参考资源

源码文件索引

文件 路径 功能
Config /workspace/vllm/config/structured_outputs.py 配置定义
Backend Types /workspace/vllm/v1/structured_output/backend_types.py 抽象基类
Xgrammar /workspace/vllm/v1/structured_output/backend_xgrammar.py Xgrammar 后端
Outlines /workspace/vllm/v1/structured_output/backend_outlines.py Outlines 后端
LMFE /workspace/vllm/v1/structured_output/backend_lm_format_enforcer.py LM Format Enforcer 后端
Guidance /workspace/vllm/v1/structured_output/backend_guidance.py Guidance 后端
Request /workspace/vllm/v1/structured_output/request.py 请求级配置
Utils /workspace/vllm/v1/structured_output/utils.py 工具函数

外部依赖


文档版本:v1.0 | 基于 vLLM 最新源码分析 | 生成日期:2026-05-10

相关推荐
网络攻城狮_1 小时前
网络协议大全
运维·网络·网络协议·http
有浔则灵1 小时前
网络安全核心知识梳理:从OSI模型到密码技术
网络·安全·web安全
happyprince2 小时前
14-vLLM LoRA 适配器系统深度解析
vllm
Lorin 洛林3 小时前
一文读懂 Agent Skills
前端·网络
andxe11 小时前
安科士AndXe 400G QSFP-DD LR8光模块芯片架构与品控体系解析
网络·光模块·光通信
去码头整点薯条9811 小时前
网络实验报告10
网络
坚定的共产主义生产设备永不宕机12 小时前
路由协议OSPF进阶一篇讲全
网络
碎碎念_49213 小时前
以太网技术、VLAN、STP详解
网络·stp·vlan
hbugs00114 小时前
【案例分享】全网首个华三数据中心流量可视化实验,基于EVE-NG V7平台
网络·网络协议·安全·devops·eve-ng