LLM Series | Speculative Decoding: A Code Walkthrough of Prompt Lookup Decoding

Official code: GitHub - apoorvumang/prompt-lookup-decoding

UPDATE 2: This method is now also available in vLLM by setting speculative_model="[ngram]" 🥳

UPDATE: This has been added to the transformers library. Please see this for a code example, or simply add prompt_lookup_num_tokens=10 to your model.generate(...) call.

TL;DR: We modify speculative decoding by replacing the draft model with simple string matching in the prompt to generate candidate token sequences. This yields significant speedups (2x-4x) in input-grounded tasks with no effect on output quality. The method works with any decoder model, without model changes or an external datastore, and with both greedy and sampling decoding.
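A minimal sketch of the greedy verification step may make the draft-then-verify loop more concrete. It assumes a hypothetical `target_next_token` callable standing in for the LLM (it returns the argmax next token for a given prefix); a real implementation scores all candidate positions in a single batched forward pass rather than calling the model in a loop. Candidates from the prompt lookup are kept while they agree with the model's own greedy choice, and the first disagreement is replaced by the model's token, so the result is the same as plain greedy decoding.

```python
from typing import Callable, List, Sequence


def verify_greedy(context: List[int], candidates: Sequence[int],
                  target_next_token: Callable[[List[int]], int]) -> List[int]:
    """Return the tokens appended in one speculative step under greedy decoding.

    target_next_token is a hypothetical stand-in for the LLM: given a prefix it
    returns the argmax next token. Real code scores all positions in one pass.
    """
    accepted: List[int] = []
    prefix = list(context)
    for cand in candidates:
        predicted = target_next_token(prefix)
        if predicted != cand:
            # First disagreement: keep the model's own token and stop
            accepted.append(predicted)
            return accepted
        accepted.append(cand)
        prefix.append(cand)
    # Every candidate matched: the same pass also gives one extra "bonus" token
    accepted.append(target_next_token(prefix))
    return accepted


def add_one_model(prefix: List[int]) -> int:
    """Toy 'model' for illustration: the next token is the last token plus one."""
    return prefix[-1] + 1


print(verify_greedy([1, 2, 3], [4, 5, 9], add_one_model))  # [4, 5, 6]
```

Here two lookup tokens are accepted and the mismatched third one is corrected within the same step. Every step emits at least one token, so a poor lookup costs wasted verification compute, not correctness.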

Intuition: In many LLM use cases involving input-grounded generation (summarization, document QA, multi-turn chat, code editing), there is high n-gram overlap between the LLM input (prompt) and the LLM output. These overlaps can be entity names, phrases, or code chunks that the LLM copies directly from the input while generating the output. Prompt lookup exploits this pattern to speed up autoregressive decoding in LLMs.
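One rough way to quantify the "high n-gram overlap" this intuition relies on is to count what fraction of the output's n-grams also occur somewhere in the prompt. The sketch below does this on whitespace-split tokens; the function names `ngrams` and `ngram_overlap` and the toy code-editing example are illustrative assumptions, not part of the original project.

```python
from typing import Sequence, Set, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-grams of a token sequence, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(prompt: Sequence[str], output: Sequence[str], n: int = 3) -> float:
    """Fraction of the output's n-gram positions whose n-gram also appears in the prompt."""
    prompt_grams = ngrams(prompt, n)
    positions = [tuple(output[i:i + n]) for i in range(len(output) - n + 1)]
    if not positions:
        return 0.0
    hits = sum(1 for gram in positions if gram in prompt_grams)
    return hits / len(positions)


# Toy code-editing task: the output mostly copies the input function
prompt = "def add ( a , b ) : return a + b".split()
output = "def add ( a , b , c ) : return a + b + c".split()
print(round(ngram_overlap(prompt, output), 3))  # 0.571
```

In this toy edit, 8 of the 14 output trigrams already appear in the prompt, which is exactly the kind of copying a prompt lookup can propose for free.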

```python
import torch


def find_candidate_pred_tokens(input_ids, max_ngram_size=3, num_pred_tokens=10):
    # input_ids: LongTensor of shape (1, seq_len) holding the prompt plus tokens generated so far
    input_length = input_ids.size(1)

    # Try the longest n-gram first, then fall back to shorter ones
    for ngram_size in range(max_ngram_size, 0, -1):
        # unfold() below needs at least ngram_size tokens
        if ngram_size > input_length:
            continue

        # Extract the last n tokens as our search ngram
        ngram = input_ids[0, -ngram_size:].tolist()

        # Create sliding windows of size ngram_size: shape (1, num_windows, ngram_size)
        windows = input_ids.unfold(dimension=1, size=ngram_size, step=1)

        # Convert ngram to a tensor for comparison
        ngram_tensor = torch.tensor(ngram, device=input_ids.device).unsqueeze(0)

        # Find where the windows match the ngram
        matches = (windows == ngram_tensor).all(dim=2)

        # Get the indices of matches (ascending, so the earliest match comes first)
        match_indices = matches.nonzero(as_tuple=True)[1]

        # Iterate through match indices to find a valid continuation
        for idx in match_indices:
            start_idx = idx + ngram_size
            end_idx = start_idx + num_pred_tokens
            # Ensure we don't go beyond the length of input_ids and avoid self-match
            if end_idx <= input_length and start_idx < input_length - ngram_size:
                return input_ids[0, start_idx:end_idx]

    # If no match is found, return an empty tensor
    return torch.tensor([], dtype=torch.long, device=input_ids.device)
```
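For tracing the logic by hand without torch, here is a dependency-free restatement of the same lookup on a plain Python list. It is our own sketch meant to mirror `find_candidate_pred_tokens` (longest n-gram first, earliest match first, same bounds checks), not code from the repository; the name `find_candidate_pred_tokens_list` is hypothetical.

```python
from typing import List


def find_candidate_pred_tokens_list(tokens: List[int], max_ngram_size: int = 3,
                                    num_pred_tokens: int = 10) -> List[int]:
    """Plain-list mirror of find_candidate_pred_tokens (illustrative sketch)."""
    n_total = len(tokens)
    # Longest n-gram first, falling back to shorter ones
    for ngram_size in range(max_ngram_size, 0, -1):
        if ngram_size > n_total:
            continue
        ngram = tokens[-ngram_size:]
        # Scan windows left to right, so the earliest match wins
        for idx in range(n_total - ngram_size + 1):
            if tokens[idx:idx + ngram_size] != ngram:
                continue
            start_idx = idx + ngram_size
            end_idx = start_idx + num_pred_tokens
            # Same checks as the tensor version: full-length continuation, no self-match
            if end_idx <= n_total and start_idx < n_total - ngram_size:
                return tokens[start_idx:end_idx]
    return []


# The trailing 3-gram [5, 6, 7] also occurs at the start, so what followed it is proposed
history = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 5, 6, 7]
print(find_candidate_pred_tokens_list(history, num_pred_tokens=4))  # [8, 9, 10, 11]
```

Note that with the default num_pred_tokens=10, a match is skipped unless a full 10 tokens follow it, so matches near the end of the sequence never produce a (shorter) proposal.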

TODOs/Thoughts/Future work

  • There are probably better ways to do string matching than the current one, and there are several obvious things to improve, e.g. what should happen when there are multiple matches? What is the ideal continuation length?
  • We haven't yet tried sampling, although there's no reason it shouldn't work.
    • Here, one additional thing to test would be whether prompt lookup while sampling can affect hallucination rates, since it artificially increases the probability of sampling exact sequences from the input (this was suggested by my colleague Shwetha S)
  • Testing actual FLOPs impact and tradeoffs is needed
  • We also need to figure out the best hyperparameters: 3 (max_ngram_size) and 10 (num_pred_tokens) were chosen with very little testing
  • It would be an interesting challenge to design the "best lookup function" for decoding, could even be a competition?
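On the open question of multiple matches: `find_candidate_pred_tokens` returns the earliest valid match. One alternative worth testing, our own hypothetical variant and not something the original post evaluates, is to prefer the most recent match, on the guess that recently seen text is more likely to continue the same way. The sketch also relaxes the bounds checks: it returns a shorter continuation when fewer than num_pred_tokens tokens follow the match, and it allows matches that overlap the trailing n-gram.

```python
from typing import List


def find_candidates_latest(tokens: List[int], max_ngram_size: int = 3,
                           num_pred_tokens: int = 10) -> List[int]:
    """Hypothetical variant: the most recent earlier match wins, and a
    continuation shorter than num_pred_tokens is returned instead of skipped."""
    n_total = len(tokens)
    for ngram_size in range(max_ngram_size, 0, -1):
        # Need at least one position before the trailing n-gram itself
        if ngram_size >= n_total:
            continue
        ngram = tokens[-ngram_size:]
        # Scan right to left, starting just before the trailing n-gram (no self-match)
        for idx in range(n_total - ngram_size - 1, -1, -1):
            if tokens[idx:idx + ngram_size] == ngram:
                start_idx = idx + ngram_size
                end_idx = min(start_idx + num_pred_tokens, n_total)
                return tokens[start_idx:end_idx]
    return []


history = [1, 2, 9, 1, 2, 8, 1, 2]
print(find_candidates_latest(history, num_pred_tokens=2))  # [8, 1]
```

On the same input, the earliest-match rule of `find_candidate_pred_tokens` (with num_pred_tokens=2) would propose [9, 1] instead. Which policy yields higher acceptance rates is an empirical question.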

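This method may still have problems. As the author notes, it may introduce hallucination, and an n-gram match does not necessarily yield a speedup, since candidate tokens the model rejects during verification only cost extra compute.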
