Frontier Topics in Transformers and LLMs (4): Long-Context LLM

Table of Contents

      • 1. Context Extension
        • 1.1 Rotary Position Embedding (RoPE)
        • 1.2 LongLoRA
      • 2. Evaluation of Long-Context LLMs
        • 2.1 The Lost in the Middle Phenomenon
        • 2.2 Long-Context Benchmarks: NIAH, LongBench
      • 3. Efficient Attention Mechanisms
        • 3.1 KV Cache
        • 3.2 StreamingLLM and Attention Sinks (Key Topic)
        • 3.3 DuoAttention: Retrieval Heads and Streaming Heads (Key Topic)
        • 3.4 Quest: Query-Aware Sparsity (Key Topic)
      • 4. Beyond Transformers
        • 4.1 State-Space Models (SSMs): Mamba
        • 4.2 Hybrid Models: Jamba

1. Context Extension

1.1 Rotary Position Embedding (RoPE)
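RoPE encodes position by rotating each pair of query/key channels by an angle proportional to the token position. As a result, the query-key dot product depends only on the relative offset between the two tokens. The pure-Python sketch below illustrates that property. It pairs adjacent channels (2i, 2i+1) with frequency base^(-2i/d) and base = 10000, as in the RoFormer formulation. Some implementations (for example the rotate-half layout in Llama-style code) pair channel i with channel i + d/2 instead, and the relative-position property still holds. The helper names `rope` and `dot` are hypothetical, not a library API.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate channel pairs (x[2i], x[2i+1]) of an even-length vector by pos * theta_i."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)           # per-pair frequency
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s              # 2-D rotation of the pair
        out[2 * i + 1] = x0 * s + x1 * c
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

q = [0.3, -1.2, 0.5, 0.8]
k = [1.0, 0.4, -0.7, 0.2]
# Same offset (m - n = 3) at two very different absolute positions:
s_near = dot(rope(q, 10), rope(k, 7))
s_far = dot(rope(q, 103), rope(k, 100))
print(abs(s_near - s_far) < 1e-9)  # True
```

Context-extension methods build on this form. For example, position interpolation rescales `pos` by the ratio of the original to the extended context length, so that longer inputs reuse the angle range seen during training.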

1.2 LongLoRA
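LongLoRA extends context length during fine-tuning with shifted sparse attention (S²-Attn). The sequence is split into groups of size G, and each head attends only within its own group. Half of the heads use groups shifted by G/2, which lets information flow across group boundaries. LongLoRA also trains the embedding and normalization layers alongside the LoRA adapters, and it returns to full attention at inference. The sketch below only builds the causal masks for a plain head and a shifted head, to show which keys each query can see. It simplifies one detail: the paper's pseudocode shifts by rolling the token tensor, which wraps tokens around the end of the sequence, whereas this version simply offsets the group boundaries without wrap-around. All function names are hypothetical.

```python
def group_id(t, group_size, shifted):
    """Group of token t; shifted heads move group boundaries by group_size // 2 (no wrap-around)."""
    offset = group_size // 2 if shifted else 0
    return (t + offset) // group_size

def s2_attn_mask(seq_len, group_size, shifted):
    """Causal mask for one head: query q may see key k only inside its own group."""
    return [[k <= q and group_id(q, group_size, shifted) == group_id(k, group_size, shifted)
             for k in range(seq_len)]
            for q in range(seq_len)]

def head_masks(num_heads, seq_len, group_size):
    """First half of the heads use plain groups, the second half use shifted groups."""
    return [s2_attn_mask(seq_len, group_size, shifted=(h >= num_heads // 2))
            for h in range(num_heads)]

def visible_keys(mask, q):
    return [k for k, ok in enumerate(mask[q]) if ok]

plain = s2_attn_mask(seq_len=8, group_size=4, shifted=False)
shifted = s2_attn_mask(seq_len=8, group_size=4, shifted=True)
print(visible_keys(plain, 4))    # [4]: token 4 starts a new plain group
print(visible_keys(shifted, 4))  # [2, 3, 4]: the shifted group spans the boundary
```

Each query sees at most G keys, so the attention cost per head drops from O(N²) to O(N·G) during fine-tuning.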




2. Evaluation of Long-Context LLMs

2.1 The Lost in the Middle Phenomenon
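The lost-in-the-middle effect (Liu et al., 2023) was measured with multi-document QA. The document that contains the answer is placed at different positions among distractor documents, and accuracy is plotted against that position. Accuracy tends to be highest when the relevant document sits at the start or end of the context and lowest when it sits in the middle, giving a U-shaped curve. The minimal sketch below covers only the prompt construction. The prompt template and function names are assumptions rather than the paper's exact format, and the model call is omitted.

```python
def build_multidoc_prompt(question, gold_doc, distractors, gold_position):
    """Insert the answer-bearing document at index gold_position among the distractors."""
    if not 0 <= gold_position <= len(distractors):
        raise ValueError("gold_position out of range")
    docs = distractors[:gold_position] + [gold_doc] + distractors[gold_position:]
    body = "\n".join(f"Document [{i + 1}]: {d}" for i, d in enumerate(docs))
    return f"{body}\n\nQuestion: {question}\nAnswer:"

def position_sweep(question, gold_doc, distractors):
    """One prompt per gold position, from the first slot to the last."""
    return [build_multidoc_prompt(question, gold_doc, distractors, p)
            for p in range(len(distractors) + 1)]

prompts = position_sweep(
    "Who wrote the report?",
    "The report was written by Alice.",
    ["A note about the weather.", "A note about sports.", "A note about food."],
)
print(len(prompts))  # 4
print(prompts[2])
```

Send each prompt to the model under test and check whether its answer contains the gold answer. This yields one accuracy value per gold position.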
2.2 Long-Context Benchmarks: NIAH, LongBench
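Needle In A Haystack (NIAH) inserts a short fact (the needle) at a chosen depth inside long filler text and asks the model to retrieve it. Sweeping both context length and depth produces the familiar heatmap. LongBench is different: it is a collection of real long-context tasks in English and Chinese (single- and multi-document QA, summarization, few-shot learning, synthetic tasks, and code completion), not a test generator. The sketch below shows only NIAH test-case generation. It uses whitespace-separated words as a stand-in for tokens; real harnesses count tokenizer tokens. The function names and the needle sentence are hypothetical.

```python
def insert_needle(haystack_words, needle, depth_percent):
    """Place the needle after depth_percent % of the haystack (0 = start, 100 = end)."""
    if not 0 <= depth_percent <= 100:
        raise ValueError("depth_percent must be in [0, 100]")
    cut = round(len(haystack_words) * depth_percent / 100)
    return " ".join(haystack_words[:cut] + [needle] + haystack_words[cut:])

def niah_grid(filler_words, needle, context_lengths, depths):
    """One (length, depth, text) case per cell of the length x depth heatmap."""
    cases = []
    for n in context_lengths:
        hay = (filler_words * (n // len(filler_words) + 1))[:n]  # repeat filler up to n words
        for d in depths:
            cases.append((n, d, insert_needle(hay, needle, d)))
    return cases

needle = "The secret code is 4172."
cases = niah_grid("the quick brown fox jumps over the lazy dog".split(), needle,
                  context_lengths=[1000, 2000], depths=[0, 25, 50, 75, 100])
print(len(cases))  # 10
```

In a full harness, each text would be followed by a question asking for the secret code, and the model's answer would be scored per cell.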



3. Efficient Attention Mechanisms

3.1 KV Cache
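During autoregressive decoding, the keys and values of earlier tokens never change. They can therefore be cached, and each new step computes only the query, key and value of the newest token. The price is memory that grows linearly with sequence length: 2 (K and V) × layers × KV heads × head dimension × sequence length × batch × bytes per element. The sketch below does two things for a single attention head in pure Python. First, it checks that decoding with a cache gives the same output as recomputing attention over the full prefix. Second, it evaluates the memory formula for a Llama-2-7B-sized configuration (32 layers, 32 KV heads, head dimension 128, fp16). The `KVCache` class is a hypothetical illustration, not a framework API.

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention of a single query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)                               # subtract max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]

class KVCache:
    """Append-only K/V store for one head (hypothetical minimal class)."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Only the newest token's k and v are computed and stored; older ones are reused.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Total K and V storage; the leading 2 counts K and V separately."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Llama-2-7B-sized configuration in fp16 (2 bytes per element):
print(kv_cache_bytes(32, 32, 128, seq_len=1) / 2**20)     # 0.5 (MiB per token)
print(kv_cache_bytes(32, 32, 128, seq_len=4096) / 2**30)  # 2.0 (GiB)
```

At 4,096 tokens the cache already takes 2 GiB per sequence. Models with grouped-query attention shrink this by using fewer KV heads, and this growth is what motivates the cache-reduction methods in the rest of this section.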


3.2 StreamingLLM and Attention Sinks (Key Topic)

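StreamingLLM starts from an observation about window attention. Once the first few tokens are evicted from the KV cache, generation quality collapses, because LLMs assign large attention weights to the initial tokens regardless of their content. These tokens act as attention sinks: softmax forces the weights to sum to one, so surplus attention has to land somewhere. StreamingLLM therefore keeps the KV entries of a few initial tokens (four in the paper) plus a rolling window of recent tokens. It assigns positions by slot in the cache rather than by position in the original text. The sketch below models only the eviction policy. Payloads stand in for the per-token (K, V) tensors, and the class name is hypothetical.

```python
from collections import deque

class StreamingKVCache:
    """Attention-sink cache: first `num_sinks` tokens plus the last `window` tokens.

    Each entry is (original_token_index, payload); payload stands in for (K, V).
    """

    def __init__(self, num_sinks, window):
        self.num_sinks = num_sinks
        self.sinks = []
        self.recent = deque(maxlen=window)  # oldest non-sink entry is dropped automatically

    def append(self, token_index, payload):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append((token_index, payload))
        else:
            self.recent.append((token_index, payload))

    def entries(self):
        return self.sinks + list(self.recent)

    def cache_positions(self):
        """Positional indices are slots in the cache, not positions in the original text."""
        return list(range(len(self.entries())))

cache = StreamingKVCache(num_sinks=4, window=4)
for t in range(20):
    cache.append(t, payload=f"kv{t}")
print([i for i, _ in cache.entries()])  # [0, 1, 2, 3, 16, 17, 18, 19]
print(cache.cache_positions())          # [0, 1, 2, 3, 4, 5, 6, 7]
```

The cache size stays fixed at num_sinks + window regardless of how many tokens have been streamed, which is what allows generation over very long inputs without fine-tuning.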













3.3 DuoAttention: Retrieval Heads and Streaming Heads (Key Topic)

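DuoAttention observes that only a fraction of attention heads (retrieval heads) need the full KV cache. The remaining heads (streaming heads) mostly attend to attention sinks and recent tokens, so they can use a constant-size, StreamingLLM-style cache. To identify the retrieval heads, DuoAttention learns a gate α ∈ [0, 1] per KV head that mixes the full-attention and streaming-attention outputs. The gates are optimized on synthetic long-context retrieval data, with a penalty that pushes them toward zero, and are thresholded at deployment. The pure-Python sketch below shows the mixed output for one head and the threshold split. The 0.5 threshold and all function names are illustrative assumptions.

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention of one query (numerically stable softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]

def streaming_indices(seq_len, num_sinks, window):
    """Keys a streaming head keeps for the newest query: attention sinks + recent window."""
    sinks = list(range(min(num_sinks, seq_len)))
    recent = list(range(max(num_sinks, seq_len - window), seq_len))
    return sinks + recent

def gated_head_output(alpha, q, keys, values, num_sinks, window):
    """Training-time mix for one head: alpha * full attention + (1 - alpha) * streaming."""
    full = attend(q, keys, values)
    idx = streaming_indices(len(keys), num_sinks, window)
    stream = attend(q, [keys[i] for i in idx], [values[i] for i in idx])
    return [alpha * f + (1 - alpha) * s for f, s in zip(full, stream)]

def split_heads(alphas, threshold=0.5):
    """Deployment: heads with gate above threshold keep the full KV cache."""
    retrieval = [h for h, a in enumerate(alphas) if a > threshold]
    streaming = [h for h, a in enumerate(alphas) if a <= threshold]
    return retrieval, streaming

print(split_heads([0.9, 0.1, 0.6, 0.3]))  # ([0, 2], [1, 3])
```

After the split, only retrieval heads have a KV cache that grows with context length. Streaming heads use a cache of fixed size num_sinks + window.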








3.4 Quest: Query-Aware Sparsity (Key Topic)

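Quest makes KV cache selection depend on the current query. The cache is split into pages, and for each page Quest stores the channel-wise minimum and maximum of its keys. For the current query q, the sum Σᵢ max(qᵢ·minᵢ, qᵢ·maxᵢ) is an upper bound on q·k for every key k in that page. The reason is that each term qᵢkᵢ is linear in kᵢ, and kᵢ lies within [minᵢ, maxᵢ]. Quest ranks pages by this bound and computes attention only over the top-K pages. The sketch below implements the bound and the page selection in pure Python. The page size, top-K value, and function names are illustrative assumptions.

```python
import random

def page_metadata(keys, page_size):
    """Channel-wise (min, max) of the keys in each page of the KV cache."""
    meta = []
    for start in range(0, len(keys), page_size):
        page = keys[start:start + page_size]
        meta.append(([min(col) for col in zip(*page)], [max(col) for col in zip(*page)]))
    return meta

def page_upper_bound(q, mins, maxs):
    """Upper bound on q . k for any key k inside the page."""
    return sum(max(qi * lo, qi * hi) for qi, lo, hi in zip(q, mins, maxs))

def select_pages(q, meta, top_k):
    """Indices of the top_k pages by upper bound (only these pages are read for attention)."""
    scores = [page_upper_bound(q, lo, hi) for lo, hi in meta]
    ranked = sorted(range(len(meta)), key=lambda p: scores[p], reverse=True)
    return sorted(ranked[:top_k])

rng = random.Random(0)
page_size = 8
keys = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(32)]
q = [rng.uniform(-1, 1) for _ in range(4)]
meta = page_metadata(keys, page_size)
for p, (lo, hi) in enumerate(meta):
    page_keys = keys[p * page_size:(p + 1) * page_size]
    best = max(sum(a * b for a, b in zip(q, k)) for k in page_keys)
    assert page_upper_bound(q, lo, hi) >= best - 1e-12   # the bound never underestimates
print(select_pages(q, meta, top_k=2))
```

The selection is recomputed for every query, so a page skipped at one decoding step can still be chosen at a later one. This is why the full cache is kept and only the selected pages are loaded.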








4. Beyond Transformers

4.1 State-Space Models (SSMs): Mamba
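A state-space model maps an input sequence to outputs through a fixed-size hidden state. After discretization with step size Δ, the recurrence is hₜ = exp(ΔA)·hₜ₋₁ + ΔB·xₜ and yₜ = C·hₜ, where the discretized input matrix is approximated by ΔB (a common simplification). Mamba makes Δ, B and C functions of the current input (the selection mechanism), so the model can choose what to keep in its state and what to forget. It evaluates the recurrence with a hardware-aware parallel scan. Unlike a KV cache, the state does not grow with sequence length. The sketch below is a sequential reference scan for a single channel with diagonal A. The projections that produce Δ, B and C are hypothetical stand-ins, not Mamba's actual layers.

```python
import math

def selective_scan(xs, deltas, A, Bs, Cs):
    """Sequential SSM scan for one channel with a diagonal state matrix A.

    Per step: h = exp(delta * A) * h + delta * B * x, then y = C . h.
    In Mamba, deltas, Bs and Cs are computed from the input (selection).
    """
    h = [0.0] * len(A)            # fixed-size state, independent of sequence length
    ys = []
    for x, dt, B, C in zip(xs, deltas, Bs, Cs):
        h = [math.exp(dt * a) * hi + dt * b * x for a, hi, b in zip(A, h, B)]
        ys.append(sum(c * hi for c, hi in zip(C, h)))
    return ys

def softplus(z):
    return math.log1p(math.exp(z))

# Hypothetical input-dependent projections (scalars here for brevity).
A = [-0.5, -1.0, -2.0]            # negative entries keep the recurrence stable
xs = [0.5, -1.0, 2.0, 0.0]
deltas = [softplus(1.5 * x - 1.0) for x in xs]
Bs = [[0.3 * x, -0.1 * x, 0.2] for x in xs]
Cs = [[1.0, 0.5 * x, -0.2] for x in xs]
print(selective_scan(xs, deltas, A, Bs, Cs))
```

If Δ, B and C are held constant, the scan reduces to a convolution with kernel Kⱼ = C·exp(ΔA)ʲ·ΔB, which is the linear time-invariant case of earlier SSMs. Making these parameters input-dependent rules out that convolution shortcut, which is why Mamba relies on a scan.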





4.2 Hybrid Models: Jamba
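Jamba interleaves Transformer attention layers with Mamba layers and adds mixture-of-experts (MoE) MLPs. The configuration reported in the Jamba paper uses blocks of 8 layers with an attention-to-Mamba ratio of 1:7, and an MoE layer every second layer (16 experts, top-2 routing). Only the attention layers keep a KV cache. Compared with an all-attention stack of the same depth and head configuration, the KV cache therefore shrinks to the fraction of layers that use attention. The sketch below builds such a layer schedule and computes that fraction. Several choices are assumptions exposed as parameters: where the attention layer sits inside each block, which layers receive MoE, and the 4-block depth. The function names are hypothetical.

```python
def jamba_like_schedule(num_blocks, layers_per_block=8, attn_index=4, moe_every=2):
    """(mixer, mlp) per layer for a Jamba-style stack.

    Each block has one attention layer (at the hypothetical position attn_index) and
    layers_per_block - 1 Mamba layers, i.e. a 1:7 ratio for 8-layer blocks.
    MoE replaces the dense MLP in every moe_every-th layer (here layers 1, 3, 5, ...).
    """
    layers = []
    for b in range(num_blocks):
        for j in range(layers_per_block):
            i = b * layers_per_block + j
            mixer = "attention" if j == attn_index else "mamba"
            mlp = "moe" if (i + 1) % moe_every == 0 else "dense"
            layers.append((mixer, mlp))
    return layers

def kv_cache_fraction(layers):
    """Share of layers that hold a KV cache; Mamba layers keep a fixed-size state instead."""
    return sum(mixer == "attention" for mixer, _ in layers) / len(layers)

stack = jamba_like_schedule(num_blocks=4)
print(len(stack), kv_cache_fraction(stack))  # 32 0.125
```

With a 1:7 ratio the KV cache is one eighth of what the same stack would need with attention in every layer. MoE, by contrast, raises total parameter count while keeping the number of active parameters per token lower.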



