中文使用 googletrans 翻译,翻译不对的地方以英文为准
目录
- 大语言模型上下文工程综述
- 基于反思式生成模型的测试时扩展
- 推理还是记忆?数据污染导致的强化学习结果不可靠
- [迈向具有深度推理的代理式RAG:LLM中RAG-推理系统综述](#迈向具有深度推理的代理式RAG:LLM中RAG-推理系统综述)
- NeuralOS:通过神经生成模型模拟操作系统
- VisionThink:通过强化学习实现智能高效的视觉语言模型
- Open Vision Reasoner:为视觉推理迁移语言认知行为
- 递归混合(Mixture-of-Recursions):为自适应词元级计算学习动态递归深度
- [π^3:可扩展的置换等变视觉几何学习](#π^3:可扩展的置换等变视觉几何学习)
- 视觉基础模型作为自回归图像生成的有效视觉分词器
- Gemini 2.5:以先进推理、多模态、长上下文与新一代智能体能力推动前沿
- [EXAONE 4.0:集成非推理与推理模式的统一大语言模型](#EXAONE 4.0:集成非推理与推理模式的统一大语言模型)
- Diffuman4D:基于时空扩散模型的稀疏视角视频4D一致人体视图合成
- CLiFT:用于高计算效率且自适应神经渲染的压缩光场令牌
- SpeakerVid-5M:用于视听双人交互式人体生成的大规模高质量数据集
- 模仿游戏:图灵机模仿器是可长度泛化的推理器
- 视觉-语言-视觉自编码器:从扩散模型进行可扩展的知识蒸馏
- PhysX:基于物理的3D资产生成
- AnyCap项目:可控全模态描述生成的统一框架、数据集与基准
- SWE-Perf:语言模型能否优化真实代码仓库中的代码性能?
- KV缓存引导:诱导小语言模型进行推理
- RiemannLoRA:用于无歧义LoRA优化的统一黎曼框架
- EmbRACE-3K:复杂环境中的具身推理与行动
- 最优数据混合的缩放定律
- DrafterBench:面向土木工程任务自动化的大语言模型基准
- 一个足以愚弄 LLM-as-a-Judge 的词元
大语言模型上下文工程综述
- 标题: A Survey of Context Engineering for Large Language Models
- 作者: Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, Chenlin Zhou, Jiayi Mao, Tianze Xia, Jiafeng Guo, Shenghua Liu
- 日期: 2025-07-17
- ArXiv主页 : https://arxiv.org/abs/2507.13334
- gitHub仓库 : https://github.com/Meirtz/Awesome-Context-Engineering
英文摘要
The performance of Large Language Models (LLMs) is fundamentally determined by the contextual information provided during inference. This survey introduces Context Engineering, a formal discipline that transcends simple prompt design to encompass the systematic optimization of information payloads for LLMs. We present a comprehensive taxonomy decomposing Context Engineering into its foundational components and the sophisticated implementations that integrate them into intelligent systems. We first examine the foundational components: context retrieval and generation, context processing and context management. We then explore how these components are architecturally integrated to create sophisticated system implementations: retrieval-augmented generation (RAG), memory systems and tool-integrated reasoning, and multi-agent systems. Through this systematic analysis of over 1300 research papers, our survey not only establishes a technical roadmap for the field but also reveals a critical research gap: a fundamental asymmetry exists between model capabilities. While current models, augmented by advanced context engineering, demonstrate remarkable proficiency in understanding complex contexts, they exhibit pronounced limitations in generating equally sophisticated, long-form outputs. Addressing this gap is a defining priority for future research. Ultimately, this survey provides a unified framework for both researchers and engineers advancing context-aware AI.
中文摘要
大语言模型(LLM)的性能从根本上取决于推理时提供的上下文信息。这篇综述提出“上下文工程”这一正式学科,它超越简单的提示设计,涵盖对 LLM 信息负载的系统性优化。我们提出了一个全面的分类体系,将上下文工程分解为基础组件,以及将这些组件整合进智能系统的高级实现。我们首先考察基础组件:上下文检索与生成、上下文处理和上下文管理;随后探讨这些组件如何在架构层面组合成复杂的系统实现:检索增强生成(RAG)、记忆系统与工具集成推理,以及多智能体系统。通过对 1300 多篇研究论文的系统分析,本综述不仅为该领域建立了技术路线图,还揭示了一个关键的研究缺口:模型能力之间存在根本性的不对称。当前模型在高级上下文工程的加持下,对复杂上下文表现出出色的理解能力,但在生成同样复杂的长篇输出方面仍有明显局限。弥合这一差距是未来研究的首要任务。最终,本综述为推进上下文感知 AI 的研究人员和工程师提供了统一框架。
基于反思式生成模型的测试时扩展
- 标题: Test-Time Scaling with Reflective Generative Model
- 作者: Zixiao Wang, Yuxin Wang, Xiaorui Wang, Mengting Xing, Jie Gao, Jianjun Xu, Guangcan Liu, Chenhui Jin, Zhuo Wang, Shengzhuo Zhang, Hongtao Xie
- 日期: 2025-07-02
- ArXiv主页 : https://arxiv.org/abs/2507.01951
- gitHub仓库 : https://github.com/MetaStone-AI/MetaStone-S1
英文摘要
We introduce our first reflective generative model MetaStone-S1, which obtains OpenAI o3's performance via the self-supervised process reward model (SPRM). Through sharing the backbone network and using task-specific heads for next token prediction and process scoring respectively, SPRM successfully integrates the policy model and process reward model(PRM) into a unified interface without extra process annotation, reducing over 99% PRM parameters for efficient reasoning. Equipped with SPRM, MetaStone-S1 is naturally suitable for test time scaling (TTS), and we provide three reasoning effort modes (low, medium, and high), based on the controllable thinking length. Moreover, we empirically establish a scaling law that reveals the relationship between total thinking computation and TTS performance. Experiments demonstrate that our MetaStone-S1 achieves comparable performance to OpenAI-o3-mini's series with only 32B parameter size. To support the research community, we have open-sourced MetaStone-S1 at https://github.com/MetaStone-AI/MetaStone-S1.
中文摘要
我们介绍首个反思式生成模型 MetaStone-S1,它通过自监督过程奖励模型(SPRM)达到 OpenAI o3 级别的性能。通过共享骨干网络,并分别用任务特定的头进行下一词元预测和过程打分,SPRM 成功地将策略模型与过程奖励模型(PRM)集成到统一接口中,无需额外的过程标注,同时减少了超过 99% 的 PRM 参数,从而实现高效推理。配备 SPRM 后,MetaStone-S1 天然适合测试时扩展(TTS),我们基于可控的思考长度提供了三种推理努力模式(低、中、高)。此外,我们通过实验建立了揭示总思考计算量与 TTS 性能之间关系的缩放定律。实验表明,MetaStone-S1 仅以 32B 的参数规模就取得了与 OpenAI o3-mini 系列相当的性能。为支持研究社区,我们已在 https://github.com/MetaStone-AI/MetaStone-S1 开源了 MetaStone-S1。
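下面给出一个极简的示意代码(假设性示例,并非 MetaStone-S1 的官方实现),用来说明摘要中“共享骨干网络 + 两个任务头(下一词元预测与过程打分)”的思路;其中的模块结构、维度和类名均为假设。

```python
import torch
import torch.nn as nn

class TinySharedBackboneModel(nn.Module):
    """示意:策略模型与过程奖励模型共享同一骨干,仅用两个轻量头区分任务。
    (假设性示例,省略因果掩码等细节,并非官方实现。)"""

    def __init__(self, vocab_size=32000, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)   # 共享骨干
        self.lm_head = nn.Linear(d_model, vocab_size)             # 下一词元预测头
        self.process_head = nn.Linear(d_model, 1)                 # 过程打分头(SPRM 思路)

    def forward(self, input_ids):
        h = self.backbone(self.embed(input_ids))                        # [B, T, D]
        next_token_logits = self.lm_head(h)                             # 策略输出
        step_scores = torch.sigmoid(self.process_head(h)).squeeze(-1)   # 每个位置的过程分
        return next_token_logits, step_scores

# 用法示意:同一次前向同时得到生成分布与过程评分,可用于在测试时挑选高分推理轨迹
model = TinySharedBackboneModel()
logits, scores = model(torch.randint(0, 32000, (2, 16)))
print(logits.shape, scores.shape)  # torch.Size([2, 16, 32000]) torch.Size([2, 16])
```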
推理还是记忆?数据污染导致的强化学习结果不可靠
- 标题: Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
- 作者: Mingqi Wu, Zhihao Zhang, Qiaole Dong, Zhiheng Xi, Jun Zhao, Senjie Jin, Xiaoran Fan, Yuhao Zhou, Yanwei Fu, Qin Liu, Songyang Zhang, Qi Zhang
- 日期: 2025-07-14
- ArXiv主页 : https://arxiv.org/abs/2507.10532
英文摘要
The reasoning capabilities of large language models (LLMs) have been a longstanding focus of research. Recent works have further enhanced these capabilities using reinforcement learning (RL), with many new methods claiming significant improvements with minimal or no external supervision. Surprisingly, some studies even suggest that random or incorrect reward signals can enhance reasoning performance. However, these breakthroughs are mostly reported on the Qwen2.5 model family and evaluated on well-known benchmarks such as MATH-500, AMC, and AIME, while failing to achieve similar gains on other models like Llama, which warrants further investigation. Our analysis shows that although Qwen2.5 achieves strong mathematical reasoning performance, its pretraining on large-scale web corpora makes it vulnerable to data contamination in popular benchmarks. As a result, results derived from these benchmarks may be unreliable. To address this, we introduce a generator that produces fully synthetic arithmetic problems of arbitrary length and difficulty, yielding a clean dataset we call RandomCalculation. Using these leakage-free datasets, we show that only accurate reward signals consistently improve performance, while noisy or incorrect signals do not. We advocate for evaluating RL methods on uncontaminated benchmarks and across diverse model families to ensure trustworthy conclusions.
中文摘要
大语言模型(LLM)的推理能力一直是研究的长期重点。近期工作利用强化学习(RL)进一步增强了这些能力,许多新方法声称在极少甚至没有外部监督的情况下取得显著提升。令人惊讶的是,一些研究甚至表明随机或错误的奖励信号也能提升推理性能。然而,这些突破大多是在 Qwen2.5 模型家族上报告的,并在 MATH-500、AMC 和 AIME 等知名基准上评估,却未能在 Llama 等其他模型上取得类似收益,这值得进一步研究。我们的分析表明,尽管 Qwen2.5 具有很强的数学推理性能,但它在大规模网络语料上的预训练使其在流行基准上容易受到数据污染的影响。因此,基于这些基准得到的结果可能并不可靠。为此,我们引入了一个可生成任意长度与难度的全合成算术题的生成器,从而得到一个干净的数据集,我们称之为 RandomCalculation。利用这些无泄漏的数据集,我们表明只有准确的奖励信号能够持续提升性能,而带噪或错误的信号则不能。我们主张在未受污染的基准上、并跨多个模型家族评估 RL 方法,以确保结论可信。
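为说明“可任意控制长度与难度、且无泄漏风险的合成算术题生成器”这一思路,下面给出一个假设性的小示例(并非论文发布的 RandomCalculation 数据集代码,函数名与参数均为示意)。

```python
import random

def make_arithmetic_problem(num_operands=4, max_value=999, seed=None):
    """随机生成一道多步四则运算题,返回 (题面, 标准答案)。
    num_operands 控制长度,max_value 控制数值范围,从而调节难度。"""
    rng = random.Random(seed)
    expr = str(rng.randint(1, max_value))
    for _ in range(num_operands - 1):
        op = rng.choice(["+", "-", "*"])
        expr += f" {op} {rng.randint(1, max_value)}"
    answer = eval(expr)  # 表达式完全由上面受控的片段拼出,eval 在此仅作演示
    return f"Compute: {expr} = ?", answer

# 用法示意:批量生成全新题目,奖励信号可直接与标准答案比对,避免基准泄漏
for i in range(3):
    q, a = make_arithmetic_problem(num_operands=5, seed=i)
    print(q, "->", a)
```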
迈向具有深度推理的代理式RAG:LLM中RAG-推理系统综述
- 标题: Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
- 作者: Yangning Li, Weizhi Zhang, Yuyao Yang, Wei-Chieh Huang, Yaozu Wu, Junyu Luo, Yuanchen Bei, Henry Peng Zou, Xiao Luo, Yusheng Zhao, Chunkit Chan, Yankai Chen, Zhongfen Deng, Yinghui Li, Hai-Tao Zheng, Dongyuan Li, Renhe Jiang, Ming Zhang, Yangqiu Song, Philip S. Yu
- 日期: 2025-07-13
- ArXiv主页 : https://arxiv.org/abs/2507.09477
- gitHub仓库 : https://github.com/DavidZWZ/Awesome-Deep-Research
英文摘要
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge, yet it falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective. We first map how advanced reasoning optimizes each stage of RAG (Reasoning-Enhanced RAG). Then, we show how retrieved knowledge of different type supply missing premises and expand context for complex inference (RAG-Enhanced Reasoning). Finally, we spotlight emerging Synergized RAG-Reasoning frameworks, where (agentic) LLMs iteratively interleave search and reasoning to achieve state-of-the-art performance across knowledge-intensive benchmarks. We categorize methods, datasets, and open challenges, and outline research avenues toward deeper RAG-Reasoning systems that are more effective, multimodally-adaptive, trustworthy, and human-centric. The collection is available at https://github.com/DavidZWZ/Awesome-RAG-Reasoning.
中文摘要
检索增强生成(RAG)通过注入外部知识来提升大语言模型(LLM)的事实性,但在需要多步推理的问题上表现不足;反之,纯粹以推理为导向的方法常常出现幻觉或错误地依据事实。本综述在统一的“推理-检索”视角下综合了这两条研究路线。我们首先梳理高级推理如何优化 RAG 的各个阶段(推理增强的 RAG);然后展示不同类型的检索知识如何补充缺失前提并为复杂推理扩展上下文(RAG 增强的推理);最后,我们重点介绍新兴的协同式 RAG-推理框架,其中(代理式)LLM 迭代地交替进行检索与推理,在知识密集型基准上达到最先进的性能。我们对方法、数据集和开放挑战进行了分类,并勾勒出通向更有效、多模态自适应、可信且以人为本的深度 RAG-推理系统的研究方向。相关资源汇总见 https://github.com/DavidZWZ/Awesome-RAG-Reasoning。
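下面用一段假设性的最小实现勾勒“(代理式)LLM 迭代地交替检索与推理”的基本控制流;其中 `retrieve`、`llm_reason` 等函数和返回格式均为占位假设,并非综述中任何具体系统的接口。

```python
from typing import Callable, List

def agentic_rag_loop(question: str,
                     retrieve: Callable[[str], List[str]],
                     llm_reason: Callable[[str, List[str]], dict],
                     max_rounds: int = 4) -> str:
    """交替执行“推理 -> 判断是否需要更多证据 -> 检索”的简化循环。"""
    evidence: List[str] = []
    for _ in range(max_rounds):
        step = llm_reason(question, evidence)      # 约定返回 {"answer": ..., "need": 新查询或 None}
        if step.get("need") is None:               # 模型认为证据已足够,停止检索
            return step["answer"]
        evidence.extend(retrieve(step["need"]))    # 按模型提出的子查询补充证据
    return llm_reason(question, evidence)["answer"]

# 用法示意(用两个极简的桩函数代替真实检索器与 LLM)
fake_retrieve = lambda q: [f"doc about {q}"]
fake_llm = lambda q, ev: {"answer": "42", "need": None if ev else "background of " + q}
print(agentic_rag_loop("why is the sky blue?", fake_retrieve, fake_llm))
```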
NeuralOS:通过神经生成模型模拟操作系统
- 标题: NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
- 作者: Luke Rivard, Sun Sun, Hongyu Guo, Wenhu Chen, Yuntian Deng
- 日期: 2025-07-11
- ArXiv主页 : https://arxiv.org/abs/2507.08800
- 论文链接 : https://arxiv.org/pdf/2507.08800
- 项目链接 : https://neural-os.com/
- gitHub仓库 : https://github.com/yuntian-group/neural-os
英文摘要
We introduce NeuralOS, a neural framework that simulates graphical user interfaces (GUIs) of operating systems by directly predicting screen frames in response to user inputs such as mouse movements, clicks, and keyboard events. NeuralOS combines a recurrent neural network (RNN), which tracks computer state, with a diffusion-based neural renderer that generates screen images. The model is trained on a large-scale dataset of Ubuntu XFCE recordings, which include both randomly generated interactions and realistic interactions produced by AI agents. Experiments show that NeuralOS successfully renders realistic GUI sequences, accurately captures mouse interactions, and reliably predicts state transitions like application launches. Although modeling fine-grained keyboard interactions precisely remains challenging, NeuralOS offers a step toward creating fully adaptive, generative neural interfaces for future human-computer interaction systems.
中文摘要
我们提出 NeuralOS,这是一个神经框架,通过直接预测屏幕帧来响应鼠标移动、点击和键盘事件等用户输入,从而模拟操作系统的图形用户界面(GUI)。NeuralOS 将跟踪计算机状态的循环神经网络(RNN)与生成屏幕图像的基于扩散的神经渲染器相结合。该模型在大规模的 Ubuntu XFCE 录屏数据集上训练,数据既包含随机生成的交互,也包含由 AI 代理产生的真实交互。实验表明,NeuralOS 能够成功渲染逼真的 GUI 序列,准确捕捉鼠标交互,并可靠地预测应用启动等状态转换。尽管精确建模细粒度的键盘交互仍具挑战性,NeuralOS 为面向未来人机交互系统构建完全自适应的生成式神经界面迈出了一步。
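下面的示意代码(假设性示例,非官方 NeuralOS 实现)展示摘要描述的两段式结构:用 RNN 跟踪“计算机状态”,再由渲染器根据状态生成下一帧;这里用一个简单的反卷积网络代替真实的基于扩散的渲染器,所有维度均为假设。

```python
import torch
import torch.nn as nn

class ToyNeuralGUI(nn.Module):
    """RNN 维护隐含的系统状态;渲染器把状态解码成 3x64x64 的屏幕帧。"""

    def __init__(self, event_dim=8, state_dim=256):
        super().__init__()
        self.state_tracker = nn.GRU(event_dim, state_dim, batch_first=True)
        self.renderer = nn.Sequential(              # 占位:真实系统使用基于扩散的渲染器
            nn.Linear(state_dim, 16 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (16, 16, 16)),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, events):                      # events: [B, T, event_dim],如鼠标/键盘事件编码
        _, h = self.state_tracker(events)           # h: [1, B, state_dim],最新系统状态
        return self.renderer(h[-1])                 # 预测下一帧屏幕 [B, 3, 64, 64]

frames = ToyNeuralGUI()(torch.randn(2, 10, 8))
print(frames.shape)  # torch.Size([2, 3, 64, 64])
```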
VisionThink:通过强化学习实现智能高效的视觉语言模型
- 标题: VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
- 作者: Senqiao Yang, Junyi Li, Xin Lai, Bei Yu, Hengshuang Zhao, Jiaya Jia
- 日期: 2025-07-17
- ArXiv主页 : https://arxiv.org/abs/2507.13348
英文摘要
Recent advancements in vision-language models (VLMs) have improved performance by increasing the number of visual tokens, which are often significantly longer than text tokens. However, we observe that most real-world scenarios do not require such an extensive number of visual tokens. While the performance drops significantly in a small subset of OCR-related tasks, models still perform accurately in most other general VQA tasks with only 1/4 resolution. Therefore, we propose to dynamically process distinct samples with different resolutions, and present a new paradigm for visual token compression, namely, VisionThink. It starts with a downsampled image and smartly decides whether it is sufficient for problem solving. Otherwise, the model could output a special token to request the higher-resolution image. Compared to existing Efficient VLM methods that compress tokens using fixed pruning ratios or thresholds, VisionThink autonomously decides whether to compress tokens case by case. As a result, it demonstrates strong fine-grained visual understanding capability on OCR-related tasks, and meanwhile saves substantial visual tokens on simpler tasks. We adopt reinforcement learning and propose the LLM-as-Judge strategy to successfully apply RL to general VQA tasks. Moreover, we carefully design a reward function and penalty mechanism to achieve a stable and reasonable image resize call ratio. Extensive experiments demonstrate the superiority, efficiency, and effectiveness of our method. Our code is available at https://github.com/dvlab-research/VisionThink.
中文摘要
视觉语言模型(VLM)的最新进展通过增加视觉令牌数量来提升性能,而视觉令牌通常远多于文本令牌。然而,我们观察到大多数真实场景并不需要如此大量的视觉令牌:在仅用 1/4 分辨率时,只有一小部分 OCR 相关任务的性能显著下降,模型在其他大多数通用 VQA 任务上依然保持准确。因此,我们提出按样本动态选择不同分辨率,并给出一种新的视觉令牌压缩范式,即 VisionThink。它从降采样后的图像开始,智能地判断其是否足以解决问题;若不足,模型会输出一个特殊令牌来请求更高分辨率的图像。与使用固定剪枝比例或阈值来压缩令牌的现有高效 VLM 方法相比,VisionThink 逐例自主决定是否压缩令牌。因此,它在 OCR 相关任务上展现出很强的细粒度视觉理解能力,同时在较简单的任务上节省了大量视觉令牌。我们采用强化学习,并提出 LLM-as-Judge 策略,成功将 RL 应用于通用 VQA 任务。此外,我们精心设计了奖励函数和惩罚机制,以实现稳定且合理的图像放大调用比例。大量实验证明了我们方法的优越性、效率和有效性。代码见 https://github.com/dvlab-research/VisionThink。
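下面是一个假设性的推理流程示意,演示“先用降采样图像作答,必要时输出特殊令牌请求原分辨率图像”的控制逻辑;其中的模型接口、函数名与特殊令牌字符串均为示例,并非官方代码。

```python
from typing import Callable

REQUEST_HIRES = "<request_high_resolution>"   # 假设的特殊令牌

def vision_think_style_infer(image_hires,
                             question: str,
                             vlm_generate: Callable[[object, str], str],
                             downsample: Callable[[object], object]) -> str:
    """先看低分辨率图;若模型输出请求令牌,再用原图重新回答(最多升级一次)。"""
    answer = vlm_generate(downsample(image_hires), question)
    if REQUEST_HIRES in answer:               # 模型自行判断低分辨率不足以解题(如 OCR 类问题)
        answer = vlm_generate(image_hires, question)
    return answer

# 用法示意:用桩函数模拟“低分辨率读不清文字”的情形
fake_vlm = lambda img, q: REQUEST_HIRES if img == "small" else "the sign says STOP"
print(vision_think_style_infer("large", "what does the sign say?", fake_vlm, lambda x: "small"))
```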
Open Vision Reasoner:为视觉推理迁移语言认知行为
- 标题: Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
- 作者: Yana Wei, Liang Zhao, Jianjian Sun, Kangheng Lin, Jisheng Yin, Jingcheng Hu, Yinmin Zhang, En Yu, Haoran Lv, Zejia Weng, Jia Wang, Chunrui Han, Yuang Peng, Qi Han, Zheng Ge, Xiangyu Zhang, Daxin Jiang, Vishal M. Patel
- 日期: 2025-07-07
- ArXiv主页 : https://arxiv.org/abs/2507.05255
- 论文链接 : https://arxiv.org/pdf/2507.05255
- 项目链接 : https://weiyana.github.io/Open-Vision-Reasoner/
- gitHub仓库 : https://github.com/Open-Reasoner-Zero/Open-Vision-Reasoner
英文摘要
The remarkable reasoning capability of large language models (LLMs) stems from cognitive behaviors that emerge through reinforcement with verifiable rewards. This work investigates how to transfer this principle to Multimodal LLMs (MLLMs) to unlock advanced visual reasoning. We introduce a two-stage paradigm built on Qwen2.5-VL-7B: a massive linguistic cold-start fine-tuning, followed by multimodal reinforcement learning (RL) spanning nearly 1,000 steps, surpassing all previous open-source efforts in scale. This pioneering work reveals three fundamental insights: 1) Behavior transfer emerges surprisingly early in cold start due to linguistic mental imagery. 2) Cold start broadly memorizes visual behaviors, while RL critically discerns and scales up effective patterns. 3) Transfer strategically favors high-utility behaviors such as visual reflection. Our resulting model, Open-Vision-Reasoner (OVR), achieves state-of-the-art performance on a suite of reasoning benchmarks, including 95.3% on MATH500, 51.8% on MathVision and 54.6% on MathVerse. We release our model, data, and training dynamics to catalyze the development of more capable, behavior-aligned multimodal reasoners.
中文摘要
大语言模型(LLM)出色的推理能力源于在可验证奖励的强化过程中涌现出的认知行为。这项工作研究如何将这一原理迁移到多模态大语言模型(MLLM)以解锁高级视觉推理。我们提出了一个基于 Qwen2.5-VL-7B 的两阶段范式:先进行大规模的语言冷启动微调,再进行跨越近 1000 步的多模态强化学习(RL),规模超过以往所有开源工作。这项开创性工作揭示了三个基本发现:1)得益于语言化的心理想象,行为迁移在冷启动阶段就出人意料地早早出现;2)冷启动广泛地记忆视觉行为,而 RL 则批判性地甄别并放大其中有效的模式;3)迁移在策略上偏向视觉反思等高效用行为。我们得到的模型 Open-Vision-Reasoner(OVR)在一系列推理基准上取得最先进的性能,包括 MATH500 上 95.3%、MathVision 上 51.8% 和 MathVerse 上 54.6%。我们发布了模型、数据和训练动态,以促进更强大、行为对齐的多模态推理模型的发展。
递归混合(Mixture-of-Recursions):为自适应词元级计算学习动态递归深度
- 标题: Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
- 作者: Sangmin Bae, Yujin Kim, Reza Bayat, Sungnyun Kim, Jiyoun Ha, Tal Schuster, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun
- 日期: 2025-07-14
- ArXiv主页 : https://arxiv.org/abs/2507.10524
- gitHub仓库 : https://github.com/raymin0223/mixture_of_recursions
英文摘要
Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing efficiency efforts typically target either parameter sharing or adaptive computation, leaving open the question of how to attain both simultaneously. We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking by dynamically assigning different recursion depths to individual tokens. This allows MoR to focus quadratic attention computation only among tokens still active at a given recursion depth, further improving memory access efficiency by selectively caching only their key-value pairs. Beyond these core mechanisms, we also propose a KV sharing variant that reuses KV pairs from the first recursion, specifically designed to decrease prefill latency and memory footprint. Across model scales ranging from 135M to 1.7B parameters, MoR forms a new Pareto frontier: at equal training FLOPs and smaller model sizes, it significantly lowers validation perplexity and improves few-shot accuracy, while delivering higher throughput compared with vanilla and existing recursive baselines. These gains demonstrate that MoR is an effective path towards large-model quality without incurring large-model cost.
中文摘要
扩展语言模型可以解锁令人印象深刻的能力,但随之而来的计算与显存需求使训练和部署都代价高昂。现有的效率工作通常只针对参数共享或自适应计算之一,如何同时兼得仍是开放问题。我们提出递归混合(Mixture-of-Recursions,MoR),这是一个在单个递归 Transformer 内结合这两条效率轴的统一框架。MoR 在各递归步之间复用共享的层堆栈以实现参数效率,同时由轻量路由器为每个词元动态分配不同的递归深度,实现自适应的词元级思考。这使 MoR 只需在给定递归深度仍处于活跃状态的词元之间进行二次复杂度的注意力计算,并通过只选择性缓存这些词元的键值对进一步提升访存效率。除这些核心机制外,我们还提出一种 KV 共享变体,复用第一次递归产生的 KV 对,专门用于降低预填充延迟和显存占用。在 135M 到 1.7B 参数的多个模型规模上,MoR 形成了新的帕累托前沿:在相同训练 FLOPs 和更小模型规模下,它显著降低验证困惑度并提升少样本精度,同时相比原始基线和现有递归基线提供更高的吞吐量。这些收益表明,MoR 是一条在不付出大模型成本的情况下获得大模型质量的有效路径。
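下面给出一个极简的示意实现(假设性示例,非官方 MoR 代码):一个共享的 Transformer 块被重复使用,轻量路由器为每个词元分配递归深度,深度已耗尽的词元在后续递归步中不再更新。示意中仍对整段序列做前向、只是掩蔽掉未激活词元的更新,真实实现则只对活跃词元计算注意力并选择性缓存 KV。

```python
import torch
import torch.nn as nn

class ToyMixtureOfRecursions(nn.Module):
    """共享块 + 词元级递归深度路由的最小前向示意(不含可微路由训练)。"""

    def __init__(self, d_model=256, n_heads=4, max_recursion=3):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.router = nn.Linear(d_model, max_recursion)   # 为每个词元预测所需递归深度
        self.max_recursion = max_recursion

    def forward(self, x):                                  # x: [B, T, D]
        depths = self.router(x).argmax(-1) + 1             # 每个词元的递归深度 ∈ {1..R}
        for step in range(self.max_recursion):
            active = (depths > step).unsqueeze(-1)         # 仍需计算的词元掩码 [B, T, 1]
            x = torch.where(active, self.shared_block(x), x)  # 未激活词元保持原值
        return x

out = ToyMixtureOfRecursions()(torch.randn(2, 12, 256))
print(out.shape)  # torch.Size([2, 12, 256])
```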
π^3:可扩展的置换等变视觉几何学习
- 标题: π^3: Scalable Permutation-Equivariant Visual Geometry Learning
- 作者: Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, Tong He
- 日期: 2025-07-17
- ArXiv主页 : https://arxiv.org/abs/2507.13347
- 论文链接 : https://arxiv.org/pdf/2507.13347
- 项目链接 : https://yyfz.github.io/pi3/
- gitHub仓库 : https://github.com/yyfz/Pi3
英文摘要
We introduce pi^3, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal. In contrast, pi^3 employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frames. This design makes our model inherently robust to input ordering and highly scalable. These advantages enable our simple and bias-free approach to achieve state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map reconstruction. Code and models are publicly available.
中文摘要
我们提出 pi^3,这是一种前馈神经网络,为视觉几何重建提供了一种新思路,摆脱了对传统固定参考视图的依赖。以往方法通常把重建锚定在某个指定视角上,这种归纳偏置在参考视角欠佳时会导致不稳定和失败。相比之下,pi^3 采用完全置换等变的架构,在不依赖任何参考帧的情况下预测仿射不变的相机位姿和尺度不变的局部点图。这一设计使模型对输入顺序具有内在鲁棒性,并具备高度可扩展性。这些优势使我们这一简单且无偏置的方法在相机位姿估计、单目/视频深度估计以及稠密点图重建等广泛任务上取得最先进的性能。代码和模型已公开。
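摘要的核心性质是“置换等变”:打乱输入视图的顺序,输出(相机位姿、点图)应以同样的顺序被打乱。下面用一个假设性的玩具函数演示如何验证这一性质,它只是逐视图的简单变换,并非 π^3 的真实网络。

```python
import torch

def toy_per_view_head(views: torch.Tensor) -> torch.Tensor:
    """假设的逐视图预测头:对每个视图独立计算,因此天然满足置换等变。"""
    return views.mean(dim=-1, keepdim=True) * torch.tanh(views)

views = torch.randn(5, 64)                 # 5 个输入视图的特征
perm = torch.randperm(5)                   # 任意一个视图顺序置换

out_then_perm = toy_per_view_head(views)[perm]
perm_then_out = toy_per_view_head(views[perm])
# 置换等变性:f(P·x) == P·f(x) 对任意置换 P 成立
print(torch.allclose(out_then_perm, perm_then_out))  # True
```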
视觉基础模型作为自回归图像生成的有效视觉分词器
- 标题: Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
- 作者: Anlin Zheng, Xin Wen, Xuanyang Zhang, Chuofan Ma, Tiancai Wang, Gang Yu, Xiangyu Zhang, Xiaojuan Qi
- 日期: 2025-07-11
- ArXiv主页 : https://arxiv.org/abs/2507.08441
- 论文链接 : https://arxiv.org/pdf/2507.08441
英文摘要
Leveraging the powerful representations of pre-trained vision foundation models -- traditionally used for visual comprehension -- we explore a novel direction: building an image tokenizer directly atop such models, a largely underexplored area. Specifically, we employ a frozen vision foundation model as the encoder of our tokenizer. To enhance its effectiveness, we introduce two key components: (1) a region-adaptive quantization framework that reduces redundancy in the pre-trained features on regular 2D grids, and (2) a semantic reconstruction objective that aligns the tokenizer's outputs with the foundation model's representations to preserve semantic fidelity. Based on these designs, our proposed image tokenizer, VFMTok, achieves substantial improvements in image reconstruction and generation quality, while also enhancing token efficiency. It further boosts autoregressive (AR) generation -- achieving a gFID of 2.07 on ImageNet benchmarks, while accelerating model convergence by three times, and enabling high-fidelity class-conditional synthesis without the need for classifier-free guidance (CFG). The code will be released publicly to benefit the community.
中文摘要
我们利用预训练视觉基础模型(传统上用于视觉理解)的强大表示,探索一个新方向:直接在这类模型之上构建图像分词器,这是一个在很大程度上尚未被充分探索的领域。具体而言,我们采用冻结的视觉基础模型作为分词器的编码器。为提升其有效性,我们引入两个关键组件:(1)区域自适应量化框架,减少规则 2D 网格上预训练特征的冗余;(2)语义重建目标,使分词器的输出与基础模型的表示对齐,以保持语义保真度。基于这些设计,我们提出的图像分词器 VFMTok 在图像重建和生成质量上取得显著提升,同时提高了令牌效率。它进一步增强了自回归(AR)生成:在 ImageNet 基准上取得 2.07 的 gFID,将模型收敛速度提升三倍,并能在无需分类器自由引导(CFG)的情况下实现高保真的类条件合成。代码将公开发布以惠及社区。
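下面是一个假设性的最小示意,展示“冻结的视觉基础模型作编码器 + 向量量化得到离散图像令牌”的基本结构;区域自适应量化与语义重建目标在此省略,编码器用一个小卷积层占位,所有模块与维度均为假设,并非 VFMTok 的官方实现。

```python
import torch
import torch.nn as nn

class ToyFrozenVFMTokenizer(nn.Module):
    """冻结编码器提取 patch 特征,再用最近邻码本查找把特征离散化成令牌。"""

    def __init__(self, feat_dim=256, codebook_size=1024):
        super().__init__()
        self.encoder = nn.Sequential(                     # 占位:真实方法使用预训练视觉基础模型
            nn.Conv2d(3, feat_dim, kernel_size=16, stride=16))
        for p in self.encoder.parameters():
            p.requires_grad = False                       # 编码器保持冻结
        self.codebook = nn.Embedding(codebook_size, feat_dim)

    def forward(self, images):                            # images: [B, 3, H, W]
        feats = self.encoder(images).flatten(2).transpose(1, 2)        # [B, N_patch, D]
        dists = torch.cdist(feats, self.codebook.weight.unsqueeze(0))  # 到每个码字的距离
        token_ids = dists.argmin(-1)                                   # 离散图像令牌 [B, N_patch]
        return token_ids

ids = ToyFrozenVFMTokenizer()(torch.randn(2, 3, 64, 64))
print(ids.shape)  # torch.Size([2, 16])
```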
Gemini 2.5:以先进推理、多模态、长上下文与新一代智能体能力推动前沿
- 标题: Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
- 作者: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, Toby Boyd, Brad Hekman, Aaron Parisi, Chaoyi Zhang, Kornraphop Kawintiranon, Tania Bedrax-Weiss, Oliver Wang, Ya Xu, Ollie Purkiss, Uri Mendlovic, Ilaï Deutel, Nam Nguyen, Adam Langley, Flip Korn, Lucia Rossazza, Alexandre Ramé, Sagar Waghmare, Helen Miller, Vaishakh Keshava, Ying Jian, Xiaofan Zhang, Raluca Ada Popa, Kedar Dhamdhere, Blaž Bratanič, Kyuyeun Kim, Terry Koo, Ferran Alet, Yi-ting Chen, Arsha Nagrani, Hannah Muckenhirn, Zhiyuan Zhang, Corbin Quick, Filip Pavetić, Duc Dung Nguyen, Joao Carreira, Michael Elabd, Haroon Qureshi, Fabian Mentzer, Yao-Yuan Yang, Danielle Eisenbud, Anmol Gulati, Ellie Talius, Eric Ni, Sahra Ghalebikesabi, Edouard Yvinec, Alaa Saade, Thatcher Ulrich, Lorenzo Blanco, Dan A. Calian, Muhuan Huang, Aäron van den Oord, Naman Goyal, Terry Chen, Praynaa Rawlani, Christian Schallhart, Swachhand Lokhande, Xianghong Luo, Jyn Shan, Ceslee Montgomery, Victoria Krakovna, Federico Piccinini, Omer Barak, Jingyu Cui, Yiling Jia, Mikhail Dektiarev, Alexey Kolganov, Shiyu Huang, Zhe Chen, Xingyu Wang, Jessica Austin, Peter de Boursac, Evgeny Sluzhaev, Frank Ding, Huijian Li, Surya Bhupatiraju, Mohit Agarwal, Sławek Kwasiborski, Paramjit Sandhu, Patrick Siegler, Ahmet Iscen, Eyal Ben-David, Shiraz Butt, Miltos Allamanis, Seth Benjamin, Robert Busa-Fekete, Felix Hernandez-Campos, Sasha Goldshtein, Matt Dibb, Weiyang Zhang, Annie Marsden, Carey Radebaugh, Stephen Roller, Abhishek Nayyar, Jacob Austin, Tayfun Terzi, Bhargav Kanagal Shamanna, Pete Shaw, Aayush Singh, Florian Luisier, Artur Mendonça, Vaibhav Aggarwal, Larisa Markeeva, Claudio Fantacci, Sergey Brin, HyunJeong Choe, Guanyu Wang, Hartwig Adam, Avigail Dabush, Tatsuya Kiyono, Eyal Marcus, Jeremy Cole, Theophane Weber, Hongrae Lee, Ronny Huang, Alex Muzio, Leandro Kieliger, Maigo Le, Courtney Biles, Long Le, Archit Sharma, Chengrun Yang, Avery Lamp, Dave Dopson, Nate Hurley, Katrina, Xu, Zhihao Shan, Shuang Song, Jiewen Tan, Alexandre Senges, George Zhang, Chong You, Yennie Jun, David Raposo, Susanna Ricco, Xuan Yang, Weijie Chen, Prakhar Gupta, Arthur Szlam, Kevin Villela, Chun-Sung Ferng, Daniel Kasenberg, Chen Liang, Rui Zhu, Arunachalam Narayanaswamy, Florence Perot, Paul Pucciarelli, Anna Shekhawat, Alexey Stern, Rishikesh Ingale, Stefani Karp, Sanaz Bahargam, Adrian Goedeckemeyer, Jie Han, Sicheng Li, Andrea Tacchetti, Dian Yu, Abhishek Chakladar, Zhiying Zhang, Mona El Mahdy, Xu Gao, Dale Johnson, Samrat Phatale, AJ Piergiovanni, Hyeontaek Lim, Clement Farabet, Carl Lebsack, Theo Guidroz, John Blitzer, Nico Duduta, David Madras, Steve Li, Daniel von Dincklage, Xin Li, Mahdis Mahdieh, George Tucker, Ganesh Jawahar, Owen Xiao, Danny Tarlow, Robert Geirhos, Noam Velan, Daniel Vlasic, Kalesha Bullard, SK Park, Nishesh Gupta, Kellie Webster, Ayal Hitron, Jieming Mao, Julian Eisenschlos, Laurel Prince, Nina D'Souza, Kelvin Zheng, Sara Nasso, Gabriela Botea, Carl Doersch, Caglar Unlu, Chris Alberti, Alexey Svyatkovskiy, Ankita Goel, Krzysztof Choromanski, Pan-Pan Jiang, Richard Nguyen, Four Flynn, Daria Ćurko, Peter Chen, Nicholas Roth, Kieran Milan, Caleb Habtegebriel, Shashi Narayan, Michael Moffitt, Jake Marcus, Thomas Anthony, Brendan 
McMahan, Gowoon Cheon, Ruibo Liu, Megan Barnes, Lukasz Lew, Rebeca Santamaria-Fernandez, Mayank Upadhyay, Arjun Akula, Arnar Mar Hrafnkelsson, Alvaro Caceres, Andrew Bunner, Michal Sokolik, Subha Puttagunta, Lawrence Moore, Berivan Isik, Weilun Chen, Jay Hartford, Lawrence Chan, Pradeep Shenoy, Dan Holtmann-Rice, Jane Park, Fabio Viola, Alex Salcianu, Sujeevan Rajayogam, Ian Stewart-Binks, Zelin Wu, Richard Everett, Xi Xiong, Pierre-Antoine Manzagol, Gary Leung, Carl Saroufim, Bo Pang, Dawid Wegner, George Papamakarios, Jennimaria Palomaki, Helena Pankov, Guangda Lai, Guilherme Tubone, Shubin Zhao, Theofilos Strinopoulos, Seth Neel, Mingqiu Wang, Joe Kelley, Li Li, Pingmei Xu, Anitha Vijayakumar, Andrea D'olimpio, Omer Levy, Massimo Nicosia, Grigory Rozhdestvenskiy, Ni Lao, Sirui Xie, Yash Katariya, Jon Simon, Sanjiv Kumar, Florian Hartmann, Michael Kilgore, Jinhyuk Lee, Aroma Mahendru, Roman Ring, Tom Hennigan, Fiona Lang, Colin Cherry, David Steiner, Dawsen Hwang, Ray Smith, Pidong Wang, Jeremy Chen, Ming-Hsuan Yang, Sam Kwei, Philippe Schlattner, Donnie Kim, Ganesh Poomal Girirajan, Nikola Momchev, Ayushi Agarwal, Xingyi Zhou, Ilkin Safarli, Zachary Garrett, AJ Pierigiovanni, Sarthak Jauhari, Alif Raditya Rochman, Shikhar Vashishth, Quan Yuan, Christof Angermueller, Jon Blanton, Xinying Song, Nitesh Bharadwaj Gundavarapu, Thi Avrahami, Maxine Deines, Subhrajit Roy, Manish Gupta, Christopher Semturs, Shobha Vasudevan, Aditya Srikanth Veerubhotla, Shriya Sharma, Josh Jacob, Zhen Yang, Andreas Terzis, Dan Karliner, Auriel Wright, Tania Rojas-Esponda, Ashley Brown, Abhijit Guha Roy, Pawan Dogra, Andrei Kapishnikov, Peter Young, Wendy Kan, Vinodh Kumar Rajendran, Maria Ivanova, Salil Deshmukh, Chia-Hua Ho, Mike Kwong, Stav Ginzburg, Annie Louis, KP Sawhney, Slav Petrov, Jing Xie, Yunfei Bai, Georgi Stoyanov, Alex Fabrikant, Rajesh Jayaram, Yuqi Li, Joe Heyward, Justin Gilmer, Yaqing Wang, Radu Soricut, Luyang Liu, Qingnan Duan, Jamie Hayes, Maura O'Brien, Gaurav Singh Tomar, Sivan Eiger, Bahar Fatemi, Jeffrey Hui, Catarina Barros, Adaeze Chukwuka, Alena Butryna, Saksham Thakur, Austin Huang, Zhufeng Pan, Haotian Tang, Serkan Cabi, Tulsee Doshi, Michiel Bakker, Sumit Bagri, Ruy Ley-Wild, Adam Lelkes, Jennie Lees, Patrick Kane, David Greene, Shimu Wu, Jörg Bornschein, Gabriela Surita, Sarah Hodkinson, Fangtao Li, Chris Hidey, Sébastien Pereira, Sean Ammirati, Phillip Lippe, Adam Kraft, Pu Han, Sebastian Gerlach, Zifeng Wang, Liviu Panait, Feng Han, Brian Farris, Yingying Bi, Hannah DeBalsi, Miaosen Wang, Gladys Tyen, James Cohan, Susan Zhang, Jarred Barber, Da-Woon Chung, Jaeyoun Kim, Markus Kunesch, Steven Pecht, Nami Akazawa, Abe Friesen, James Lyon, Ali Eslami, Junru Wu, Jie Tan, Yue Song, Ravi Kumar, Chris Welty, Ilia Akolzin, Gena Gibson, Sean Augenstein, Arjun Pillai, Nancy Yuen, Du Phan, Xin Wang, Iain Barr, Heiga Zen, Nan Hua, Casper Liu, Jilei, Wang, Tanuj Bhatia, Hao Xu, Oded Elyada, Pushmeet Kohli, Mirek Olšák, Ke Chen, Azalia Mirhoseini, Noam Shazeer, Shoshana Jakobovits, Maggie Tran, Nolan Ramsden, Tarun Bharti, Fred Alcober, Yunjie Li, Shilpa Shetty, Jing Chen, Dmitry Kalashnikov, Megha Nawhal, Sercan Arik, Hanwen Chen, Michiel Blokzijl, Shubham Gupta, James Rubin, Rigel Swavely, Sophie Bridgers, Ian Gemp, Chen Su, Arun Suggala, Juliette Pluto, Mary Cassin, Alain Vaucher, Kaiyang Ji, Jiahao Cai, Andrew Audibert, Animesh Sinha, David Tian, Efrat Farkash, Amy Hua, Jilin Chen, Duc-Hieu Tran, Edward Loper, Nicole Brichtova, Lara McConnaughey, Ballie Sandhu, Robert Leland, Doug 
DeCarlo, Andrew Over, James Huang, Xing Wu, Connie Fan, Eric Li, Yun Lei, Deepak Sharma, Cosmin Paduraru, Luo Yu, Matko Bošnjak, Phuong Dao, Min Choi, Sneha Kudugunta, Jakub Adamek, Carlos Guía, Ali Khodaei, Jie Feng, Wenjun Zeng, David Welling, Sandeep Tata, Christina Butterfield, Andrey Vlasov, Seliem El-Sayed, Swaroop Mishra, Tara Sainath, Shentao Yang, RJ Skerry-Ryan, Jeremy Shar, Robert Berry, Arunkumar Rajendran, Arun Kandoor, Andrea Burns, Deepali Jain, Tom Stone, Wonpyo Park, Shibo Wang, Albin Cassirer, Guohui Wang, Hayato Kobayashi, Sergey Rogulenko, Vineetha Govindaraj, Mikołaj Rybiński, Nadav Olmert, Colin Evans, Po-Sen Huang, Kelvin Xu, Premal Shah, Terry Thurk, Caitlin Sikora, Mu Cai, Jin Xie, Elahe Dabir, Saloni Shah, Norbert Kalb, Carrie Zhang, Shruthi Prabhakara, Amit Sabne, Artiom Myaskovsky, Vikas Raunak, Blanca Huergo, Behnam Neyshabur, Jon Clark, Ye Zhang, Shankar Krishnan, Eden Cohen, Dinesh Tewari, James Lottes, Yumeya Yamamori, Hui, Li, Mohamed Elhawaty, Ada Maksutaj Oflazer, Adrià Recasens, Sheryl Luo, Duy Nguyen, Taylor Bos, Kalyan Andra, Ana Salazar, Ed Chi, Jeongwoo Ko, Matt Ginsberg, Anders Andreassen, Anian Ruoss, Todor Davchev, Elnaz Davoodi, Chenxi Liu, Min Kim, Santiago Ontanon, Chi Ming To, Dawei Jia, Rosemary Ke, Jing Wang, Anna Korsun, Moran Ambar, Ilya Kornakov, Irene Giannoumis, Toni Creswell, Denny Zhou, Yi Su, Ishaan Watts, Aleksandr Zaks, Evgenii Eltyshev, Ziqiang Feng, Sidharth Mudgal, Alex Kaskasoli, Juliette Love, Kingshuk Dasgupta, Sam Shleifer, Richard Green, Sungyong Seo, Chansoo Lee, Dale Webster, Prakash Shroff, Ganna Raboshchuk, Isabel Leal, James Manyika, Sofia Erell, Daniel Murphy, Zhisheng Xiao, Anton Bulyenov, Julian Walker, Mark Collier, Matej Kastelic, Nelson George, Sushant Prakash, Sailesh Sidhwani, Alexey Frolov, Steven Hansen, Petko Georgiev, Tiberiu Sosea, Chris Apps, Aishwarya Kamath, David Reid, Emma Cooney, Charlotte Magister, Oriana Riva, Alec Go, Pu-Chin Chen, Sebastian Krause, Nir Levine, Marco Fornoni, Ilya Figotin, Nick Roy, Parsa Mahmoudieh, Vladimir Magay, Mukundan Madhavan, Jin Miao, Jianmo Ni, Yasuhisa Fujii, Ian Chou, George Scrivener, Zak Tsai, Siobhan Mcloughlin, Jeremy Selier, Sandra Lefdal, Jeffrey Zhao, Abhijit Karmarkar, Kushal Chauhan, Shivanker Goel, Zhaoyi Zhang, Vihan Jain, Parisa Haghani, Mostafa Dehghani, Jacob Scott, Erin Farnese, Anastasija Ilić, Steven Baker, Julia Pawar, Li Zhong, Josh Camp, Yoel Zeldes, Shravya Shetty, Anand Iyer, Vít Listík, Jiaxian Guo, Luming Tang, Mark Geller, Simon Bucher, Yifan Ding, Hongzhi Shi, Carrie Muir, Dominik Grewe, Ramy Eskander, Octavio Ponce, Boqing Gong, Derek Gasaway, Samira Khan, Umang Gupta, Angelos Filos, Weicheng Kuo, Klemen Kloboves, Jennifer Beattie, Christian Wright, Leon Li, Alicia Jin, Sandeep Mariserla, Miteyan Patel, Jens Heitkaemper, Dilip Krishnan, Vivek Sharma, David Bieber, Christian Frank, John Lambert, Paul Caron, Martin Polacek, Mai Giménez, Himadri Choudhury, Xing Yu, Sasan Tavakkol, Arun Ahuja, Franz Och, Rodolphe Jenatton, Wojtek Skut, Bryan Richter, David Gaddy, Andy Ly, Misha Bilenko, Megh Umekar, Ethan Liang, Martin Sevenich, Mandar Joshi, Hassan Mansoor, Rebecca Lin, Sumit Sanghai, Abhimanyu Singh, Xiaowei Li, Sudheendra Vijayanarasimhan, Zaheer Abbas, Yonatan Bitton, Hansa Srinivasan, Manish Reddy Vuyyuru, Alexander Frömmgen, Yanhua Sun, Ralph Leith, Alfonso Castaño, DJ Strouse, Le Yan, Austin Kyker, Satish Kambala, Mary Jasarevic, Thibault Sellam, Chao Jia, Alexander Pritzel, Raghavender R, Huizhong Chen, Natalie Clay, Sudeep Gandhe, Sean 
Kirmani, Sayna Ebrahimi, Hannah Kirkwood, Jonathan Mallinson, Chao Wang, Adnan Ozturel, Kuo Lin, Shyam Upadhyay, Vincent Cohen-Addad, Sean Purser-haskell, Yichong Xu, Ebrahim Songhori, Babi Seal, Alberto Magni, Almog Gueta, Tingting Zou, Guru Guruganesh, Thais Kagohara, Hung Nguyen, Khalid Salama, Alejandro Cruzado Ruiz, Justin Frye, Zhenkai Zhu, Matthias Lochbrunner, Simon Osindero, Wentao Yuan, Lisa Lee, Aman Prasad, Lam Nguyen Thiet, Daniele Calandriello, Victor Stone, Qixuan Feng, Han Ke, Maria Voitovich, Geta Sampemane, Lewis Chiang, Ling Wu, Alexander Bykovsky, Matt Young, Luke Vilnis, Ishita Dasgupta, Aditya Chawla, Qin Cao, Bowen Liang, Daniel Toyama, Szabolcs Payrits, Anca Stefanoiu, Dimitrios Vytiniotis, Ankesh Anand, Tianxiao Shen, Blagoj Mitrevski, Michael Tschannen, Sreenivas Gollapudi, Aishwarya P S, José Leal, Zhe Shen, Han Fu, Wei Wang, Arvind Kannan, Doron Kukliansky, Sergey Yaroshenko, Svetlana Grant, Umesh Telang, David Wood, Alexandra Chronopoulou, Alexandru Ţifrea, Tao Zhou, Tony, Nguy~ên, Muge Ersoy, Anima Singh, Meiyan Xie, Emanuel Taropa, Woohyun Han, Eirikur Agustsson, Andrei Sozanschi, Hui Peng, Alex Chen, Yoel Drori, Efren Robles, Yang Gao, Xerxes Dotiwalla, Ying Chen, Anudhyan Boral, Alexei Bendebury, John Nham, Chris Tar, Luis Castro, Jiepu Jiang, Canoee Liu, Felix Halim, Jinoo Baek, Andy Wan, Jeremiah Liu, Yuan Cao, Shengyang Dai, Trilok Acharya, Ruoxi Sun, Fuzhao Xue, Saket Joshi, Morgane Lustman, Yongqin Xian, Rishabh Joshi, Deep Karkhanis, Nora Kassner, Jamie Hall, Xiangzhuo Ding, Gan Song, Gang Li, Chen Zhu, Yana Kulizhskaya, Bin Ni, Alexey Vlaskin, Solomon Demmessie, Lucio Dery, Salah Zaiem, Yanping Huang, Cindy Fan, Felix Gimeno, Ananth Balashankar, Koji Kojima, Hagai Taitelbaum, Maya Meng, Dero Gharibian, Sahil Singla, Wei Chen, Ambrose Slone, Guanjie Chen, Sujee Rajayogam, Max Schumacher, Suyog Kotecha, Rory Blevins, Qifei Wang, Mor Hazan Taege, Alex Morris, Xin Liu, Fayaz Jamil, Richard Zhang, Pratik Joshi, Ben Ingram, Tyler Liechty, Ahmed Eleryan, Scott Baird, Alex Grills, Gagan Bansal, Shan Han, Kiran Yalasangi, Shawn Xu, Majd Al Merey, Isabel Gao, Felix Weissenberger, Igor Karpov, Robert Riachi, Ankit Anand, Gautam Prasad, Kay Lamerigts, Reid Hayes, Jamie Rogers, Mandy Guo, Ashish Shenoy, Qiong, Hu, Kyle He, Yuchen Liu, Polina Zablotskaia, Sagar Gubbi, Yifan Chang, Jay Pavagadhi, Kristian Kjems, Archita Vadali, Diego Machado, Yeqing Li, Renshen Wang, Dipankar Ghosh, Aahil Mehta, Dana Alon, George Polovets, Alessio Tonioni, Nate Kushman, Joel D'sa, Lin Zhuo, Allen Wu, Rohin Shah, John Youssef, Jiayu Ye, Justin Snyder, Karel Lenc, Senaka Buthpitiya, Matthew Tung, Jichuan Chang, Tao Chen, David Saxton, Jenny Lee, Lydia Lihui Zhang, James Qin, Prabakar Radhakrishnan, Maxwell Chen, Piotr Ambroszczyk, Metin Toksoz-Exley, Yan Zhong, Nitzan Katz, Brendan O'Donoghue, Tamara von Glehn, Adi Gerzi Rosenthal, Aga Świetlik, Xiaokai Zhao, Nick Fernando, Jinliang Wei, Jieru Mei, Sergei Vassilvitskii, Diego Cedillo, Pranjal Awasthi, Hui Zheng, Koray Kavukcuoglu, Itay Laish, Joseph Pagadora, Marc Brockschmidt, Christopher A. 
Choquette-Choo, Arunkumar Byravan, Yifeng Lu, Xu Chen, Mia Chen, Kenton Lee, Rama Pasumarthi, Sijal Bhatnagar, Aditya Shah, Qiyin Wu, Zhuoyuan Chen, Zack Nado, Bartek Perz, Zixuan Jiang, David Kao, Ganesh Mallya, Nino Vieillard, Lantao Mei, Sertan Girgin, Mandy Jordan, Yeongil Ko, Alekh Agarwal, Yaxin Liu, Yasemin Altun, Raoul de Liedekerke, Anastasios Kementsietsidis, Daiyi Peng, Dangyi Liu, Utku Evci, Peter Humphreys, Austin Tarango, Xiang Deng, Yoad Lewenberg, Kevin Aydin, Chengda Wu, Bhavishya Mittal, Tsendsuren Munkhdalai, Kleopatra Chatziprimou, Rodrigo Benenson, Uri First, Xiao Ma, Jinning Li, Armand Joulin, Hamish Tomlinson, Tingnan Zhang, Milad Nasr, Zhi Hong, Michaël Sander, Lisa Anne Hendricks, Anuj Sharma, Andrew Bolt, Eszter Vértes, Jiri Simsa, Tomer Levinboim, Olcan Sercinoglu, Divyansh Shukla, Austin Wu, Craig Swanson, Danny Vainstein, Fan Bu, Bo Wang, Ryan Julian, Charles Yoon, Sergei Lebedev, Antonious Girgis, Bernd Bandemer, David Du, Todd Wang, Xi Chen, Ying Xiao, Peggy Lu, Natalie Ha, Vlad Ionescu, Simon Rowe, Josip Matak, Federico Lebron, Andreas Steiner, Lalit Jain, Manaal Faruqui, Nicolas Lacasse, Georgie Evans, Neesha Subramaniam, Dean Reich, Giulia Vezzani, Aditya Pandey, Joe Stanton, Tianhao Zhou, Liam McCafferty, Henry Griffiths, Verena Rieser, Soheil Hassas Yeganeh, Eleftheria Briakou, Lu Huang, Zichuan Wei, Liangchen Luo, Erik Jue, Gabby Wang, Victor Cotruta, Myriam Khan, Jongbin Park, Qiuchen Guo, Peiran Li, Rong Rong, Diego Antognini, Anastasia Petrushkina, Chetan Tekur, Eli Collins, Parul Bhatia, Chester Kwak, Wenhu Chen, Arvind Neelakantan, Immanuel Odisho, Sheng Peng, Vincent Nallatamby, Vaibhav Tulsyan, Fabian Pedregosa, Peng Xu, Raymond Lin, Yulong Wang, Emma Wang, Sholto Douglas, Reut Tsarfaty, Elena Gribovskaya, Renga Aravamudhan, Manu Agarwal, Mara Finkelstein, Qiao Zhang, Elizabeth Cole, Phil Crone, Sarmishta Velury, Anil Das, Chris Sauer, Luyao Xu, Danfeng Qin, Chenjie Gu, Dror Marcus, CJ Zheng, Wouter Van Gansbeke, Sobhan Miryoosefi, Haitian Sun, YaGuang Li, Charlie Chen, Jae Yoo, Pavel Dubov, Alex Tomala, Adams Yu, Paweł Wesołowski, Alok Gunjan, Eddie Cao, Jiaming Luo, Nikhil Sethi, Arkadiusz Socala, Laura Graesser, Tomas Kocisky, Arturo BC, Minmin Chen, Edward Lee, Sophie Wang, Weize Kong, Qiantong Xu, Nilesh Tripuraneni, Yiming Li, Xinxin Yu, Allen Porter, Paul Voigtlaender, Biao Zhang, Arpi Vezer, Sarah York, Qing Wei, Geoffrey Cideron, Mark Kurzeja, Seungyeon Kim, Benny Li, Angéline Pouget, Hyo Lee, Kaspar Daugaard, Yang Li, Dave Uthus, Aditya Siddhant, Paul Cavallaro, Sriram Ganapathy, Maulik Shah, Rolf Jagerman, Jeff Stanway, Piermaria Mendolicchio, Li Xiao, Kayi Lee, Tara Thompson, Shubham Milind Phal, Jason Chase, Sun Jae Lee, Adrian N Reyes, Disha Shrivastava, Zhen Qin, Roykrong Sukkerd, Seth Odoom, Lior Madmoni, John Aslanides, Jonathan Herzig, Elena Pochernina, Sheng Zhang, Parker Barnes, Daisuke Ikeda, Qiujia Li, Shuo-yiin Chang, Shakir Mohamed, Jim Sproch, Richard Powell, Bidisha Samanta, Domagoj Ćevid, Anton Kovsharov, Shrestha Basu Mallick, Srinivas Tadepalli, Anne Zheng, Kareem Ayoub, Andreas Noever, Christian Reisswig, Zhuo Xu, Junhyuk Oh, Martin Matysiak, Tim Blyth, Shereen Ashraf, Julien Amelot, Boone Severson, Michele Bevilacqua, Motoki Sano, Ethan Dyer, Ofir Roval, Anu Sinha, Yin Zhong, Sagi Perel, Tea Sabolić, Johannes Mauerer, Willi Gierke, Mauro Verzetti, Rodrigo Cabrera, Alvin Abdagic, Steven Hemingray, Austin Stone, Jong Lee, Farooq Ahmad, Karthik Raman, Lior Shani, Jonathan Lai, Orhan Firat, Nathan Waters, Eric Ge, Mo 
Shomrat, Himanshu Gupta, Rajeev Aggarwal, Tom Hudson, Bill Jia, Simon Baumgartner, Palak Jain, Joe Kovac, Junehyuk Jung, Ante Žužul, Will Truong, Morteza Zadimoghaddam, Songyou Peng, Marco Liang, Rachel Sterneck, Balaji Lakshminarayanan, Machel Reid, Oliver Woodman, Tong Zhou, Jianling Wang, Vincent Coriou, Arjun Narayanan, Jay Hoover, Yenai Ma, Apoorv Jindal, Clayton Sanford, Doug Reid, Swaroop Ramaswamy, Alex Kurakin, Roland Zimmermann, Yana Lunts, Dragos Dena, Zalán Borsos, Vered Cohen, Shujian Zhang, Will Grathwohl, Robert Dadashi, Morgan Redshaw, Joshua Kessinger, Julian Odell, Silvano Bonacina, Zihang Dai, Grace Chen, Ayush Dubey, Pablo Sprechmann, Mantas Pajarskas, Wenxuan Zhou, Niharika Ahuja, Tara Thomas, Martin Nikoltchev, Matija Kecman, Bharath Mankalale, Andrey Ryabtsev, Jennifer She, Christian Walder, Jiaming Shen, Lu Li, Carolina Parada, Sheena Panthaplackel, Okwan Kwon, Matt Lawlor, Utsav Prabhu, Yannick Schroecker, Marc'aurelio Ranzato, Pete Blois, Iurii Kemaev, Ting Yu, Dmitry, Lepikhin, Hao Xiong, Sahand Sharifzadeh, Oleaser Johnson, Jeremiah Willcock, Rui Yao, Greg Farquhar, Sujoy Basu, Hidetoshi Shimokawa, Nina Anderson, Haiguang Li, Khiem Pham, Yizhong Liang, Sebastian Borgeaud, Alexandre Moufarek, Hideto Kazawa, Blair Kutzman, Marcin Sieniek, Sara Smoot, Ruth Wang, Natalie Axelsson, Nova Fallen, Prasha Sundaram, Yuexiang Zhai, Varun Godbole, Petros Maniatis, Alek Wang, Ilia Shumailov, Santhosh Thangaraj, Remi Crocker, Nikita Gupta, Gang Wu, Phil Chen, Gellért Weisz, Celine Smith, Mojtaba Seyedhosseini, Boya Fang, Xiyang Luo, Roey Yogev, Zeynep Cankara, Andrew Hard, Helen Ran, Rahul Sukthankar, George Necula, Gaël Liu, Honglong Cai, Praseem Banzal, Daniel Keysers, Sanjay Ghemawat, Connie Tao, Emma Dunleavy, Aditi Chaudhary, Wei Li, Maciej Mikuła, Chen-Yu Lee, Tiziana Refice, Krishna Somandepalli, Alexandre Fréchette, Dan Bahir, John Karro, Keith Rush, Sarah Perrin, Bill Rosgen, Xiaomeng Yang, Clara Huiyi Hu, Mahmoud Alnahlawi, Justin Mao-Jones, Roopal Garg, Hoang Nguyen, Bat-Orgil Batsaikhan, Iñaki Iturrate, Anselm Levskaya, Avi Singh, Ashyana Kachra, Tony Lu, Denis Petek, Zheng Xu, Mark Graham, Lukas Zilka, Yael Karov, Marija Kostelac, Fangyu Liu, Yaohui Guo, Weiyue Wang, Bernd Bohnet, Emily Pitler, Tony Bruguier, Keisuke Kinoshita, Chrysovalantis Anastasiou, Nilpa Jha, Ting Liu, Jerome Connor, Phil Wallis, Philip Pham, Eric Bailey, Shixin Li, Heng-Tze Cheng, Sally Ma, Haiqiong Li, Akanksha Maurya, Kate Olszewska, Manfred Warmuth, Christy Koh, Dominik Paulus, Siddhartha Reddy Jonnalagadda, Enrique Piqueras, Ali Elqursh, Geoff Brown, Hadar Shemtov, Loren Maggiore, Fei Xia, Ryan Foley, Beka Westberg, George van den Driessche, Livio Baldini Soares, Arjun Kar, Michael Quinn, Siqi Zuo, Jialin Wu, Kyle Kastner, Anna Bortsova, Aijun Bai, Ales Mikhalap, Luowei Zhou, Jennifer Brennan, Vinay Ramasesh, Honglei Zhuang, John Maggs, Johan Schalkwyk, Yuntao Xu, Hui Huang, Andrew Howard, Sasha Brown, Linting Xue, Gloria Shen, Brian Albert, Neha Jha, Daniel Zheng, Varvara Krayvanova, Spurthi Amba Hombaiah, Olivier Lacombe, Gautam Vasudevan, Dan Graur, Tian Xie, Meet Gandhi, Bangju Wang, Dustin Zelle, Harman Singh, Dahun Kim, Sébastien Cevey, Victor Ungureanu, Natasha Noy, Fei Liu, Annie Xie, Fangxiaoyu Feng, Katerina Tsihlas, Daniel Formoso, Neera Vats, Quentin Wellens, Yinan Wang, Niket Kumar Bhumihar, Samrat Ghosh, Matt Hoffman, Tom Lieber, Oran Lang, Kush Bhatia, Tom Paine, Aroonalok Pyne, Ronny Votel, Madeleine Clare Elish, Benoit Schillings, Alex Panagopoulos, Haichuan Yang, Adam 
Raveret, Zohar Yahav, Shuang Liu, Warren Chen, Dalia El Badawy, Nishant Agrawal, Mohammed Badawi, Mahdi Mirzazadeh, Carla Bromberg, Fan Ye, Chang Liu, Tatiana Sholokhova, George-Cristian Muraru, Gargi Balasubramaniam, Jonathan Malmaud, Alen Carin, Danilo Martins, Irina Jurenka, Pankil Botadra, Dave Lacey, Richa Singh, Mariano Schain, Dan Zheng, Isabelle Guyon, Victor Lavrenko, Seungji Lee, Xiang Zhou, Demis Hassabis, Jeshwanth Challagundla, Derek Cheng, Nikhil Mehta, Matthew Mauger, Michela Paganini, Pushkar Mishra, Kate Lee, Zhang Li, Lexi Baugher, Ondrej Skopek, Max Chang, Amir Zait, Gaurav Menghani, Lizzetth Bellot, Guangxing Han, Jean-Michel Sarr, Sharat Chikkerur, Himanshu Sahni, Rohan Anil, Arun Narayanan, Chandu Thekkath, Daniele Pighin, Hana Strejček, Marko Velic, Fred Bertsch, Manuel Tragut, Keran Rong, Alicia Parrish, Kai Bailey, Jiho Park, Isabela Albuquerque, Abhishek Bapna, Rajesh Venkataraman, Alec Kosik, Johannes Griesser, Zhiwei Deng, Alek Andreev, Qingyun Dou, Kevin Hui, Fanny Wei, Xiaobin Yu, Lei Shu, Avia Aharon, David Barker, Badih Ghazi, Sebastian Flennerhag, Chris Breaux, Yuchuan Liu, Matthew Bilotti, Josh Woodward, Uri Alon, Stephanie Winkler, Tzu-Kuo Huang, Kostas Andriopoulos, João Gabriel Oliveira, Penporn Koanantakool, Berkin Akin, Michael Wunder, Cicero Nogueira dos Santos, Mohammad Hossein Bateni, Lin Yang, Dan Horgan, Beer Changpinyo, Keyvan Amiri, Min Ma, Dayeong Lee, Lihao Liang, Anirudh Baddepudi, Tejasi Latkar, Raia Hadsell, Jun Xu, Hairong Mu, Michael Han, Aedan Pope, Snchit Grover, Frank Kim, Ankit Bhagatwala, Guan Sun, Yamini Bansal, Amir Globerson, Alireza Nazari, Samira Daruki, Hagen Soltau, Jane Labanowski, Laurent El Shafey, Matt Harvey, Yanif Ahmad, Elan Rosenfeld, William Kong, Etienne Pot, Yi-Xuan Tan, Aurora Wei, Victoria Langston, Marcel Prasetya, Petar Veličković, Richard Killam, Robin Strudel, Darren Ni, Zhenhai Zhu, Aaron Archer, Kavya Kopparapu, Lynn Nguyen, Emilio Parisotto, Hussain Masoom, Sravanti Addepalli, Jordan Grimstad, Hexiang Hu, Joss Moore, Avinatan Hassidim, Le Hou, Mukund Raghavachari, Jared Lichtarge, Adam R. 
Brown, Hilal Dib, Natalia Ponomareva, Justin Fu, Yujing Zhang, Altaf Rahman, Joana Iljazi, Edouard Leurent, Gabriel Dulac-Arnold, Cosmo Du, Chulayuth Asawaroengchai, Larry Jin, Ela Gruzewska, Ziwei Ji, Benigno Uria, Daniel De Freitas, Paul Barham, Lauren Beltrone, Víctor Campos, Jun Yan, Neel Kovelamudi, Arthur Nguyen, Elinor Davies, Zhichun Wu, Zoltan Egyed, Kristina Toutanova, Nithya Attaluri, Hongliang Fei, Peter Stys, Siddhartha Brahma, Martin Izzard, Siva Velusamy, Scott Lundberg, Vincent Zhuang, Kevin Sequeira, Adam Santoro, Ehsan Amid, Ophir Aharoni, Shuai Ye, Mukund Sundararajan, Lijun Yu, Yu-Cheng Ling, Stephen Spencer, Hugo Song, Josip Djolonga, Christo Kirov, Sonal Gupta, Alessandro Bissacco, Clemens Meyer, Mukul Bhutani, Andrew Dai, Weiyi Wang, Siqi Liu, Ashwin Sreevatsa, Qijun Tan, Maria Wang, Lucy Kim, Yicheng Wang, Alex Irpan, Yang Xiao, Stanislav Fort, Yifan He, Alex Gurney, Bryan Gale, Yue Ma, Monica Roy, Viorica Patraucean, Taylan Bilal, Golnaz Ghiasi, Anahita Hosseini, Melvin Johnson, Zhuowan Li, Yi Tay, Benjamin Beyret, Katie Millican, Josef Broder, Mayank Lunayach, Danny Swisher, Eugen Vušak, David Parkinson, MH Tessler, Adi Mayrav Gilady, Richard Song, Allan Dafoe, Yves Raimond, Masa Yamaguchi, Itay Karo, Elizabeth Nielsen, Kevin Kilgour, Mike Dusenberry, Rajiv Mathews, Jiho Choi, Siyuan Qiao, Harsh Mehta, Sahitya Potluri, Chris Knutsen, Jialu Liu, Tat Tan, Kuntal Sengupta, Keerthana Gopalakrishnan, Abodunrinwa Toki, Mencher Chiang, Mike Burrows, Grace Vesom, Zafarali Ahmed, Ilia Labzovsky, Siddharth Vashishtha, Preeti Singh, Ankur Sharma, Ada Ma, Jinyu Xie, Pranav Talluri, Hannah Forbes-Pollard, Aarush Selvan, Joel Wee, Loic Matthey, Tom Funkhouser, Parthasarathy Gopavarapu, Lev Proleev, Cheng Li, Matt Thomas, Kashyap Kolipaka, Zhipeng Jia, Ashwin Kakarla, Srinivas Sunkara, Joan Puigcerver, Suraj Satishkumar Sheth, Emily Graves, Chen Wang, Sadh MNM Khan, Kai Kang, Shyamal Buch, Fred Zhang, Omkar Savant, David Soergel, Kevin Lee, Linda Friso, Xuanyi Dong, Rahul Arya, Shreyas Chandrakaladharan, Connor Schenck, Greg Billock, Tejas Iyer, Anton Bakalov, Leslie Baker, Alex Ruiz, Angad Chandorkar, Trieu Trinh, Matt Miecnikowski, Yanqi Zhou, Yangsibo Huang, Jiazhong Nie, Ali Shah, Ashish Thapliyal, Sam Haves, Lun Wang, Uri Shaham, Patrick Morris-Suzuki, Soroush Radpour, Leonard Berrada, Thomas Strohmann, Chaochao Yan, Jingwei Shen, Sonam Goenka, Tris Warkentin, Petar Dević, Dan Belov, Albert Webson, Madhavi Yenugula, Puranjay Datta, Jerry Chang, Nimesh Ghelani, Aviral Kumar, Vincent Perot, Jessica Lo, Yang Song, Herman Schmit, Jianmin Chen, Vasilisa Bashlovkina, Xiaoyue Pan, Diana Mincu, Paul Roit, Isabel Edkins, Andy Davis, Yujia Li, Ben Horn, Xinjian Li, Pradeep Kumar S, Eric Doi, Wanzheng Zhu, Sri Gayatri Sundara Padmanabhan, Siddharth Verma, Jasmine Liu, Heng Chen, Mihajlo Velimirović, Malcolm Reynolds, Priyanka Agrawal, Nick Sukhanov, Abhinit Modi, Siddharth Goyal, John Palowitch, Nima Khajehnouri, Wing Lowe, David Klinghoffer, Sharon Silver, Vinh Tran, Candice Schumann, Francesco Piccinno, Xi Liu, Mario Lučić, Xiaochen Yang, Sandeep Kumar, Ajay Kannan, Ragha Kotikalapudi, Mudit Bansal, Fabian Fuchs, Javad Hosseini, Abdelrahman Abdelhamed, Dawn Bloxwich, Tianhe Yu, Ruoxin Sang, Gregory Thornton, Karan Gill, Yuchi Liu, Virat Shejwalkar, Jason Lin, Zhipeng Yan, Kehang Han, Thomas Buschmann, Michael Pliskin, Zhi Xing, Susheel Tatineni, Junlin Zhang, Sissie Hsiao, Gavin Buttimore, Marcus Wu, Zefei Li, Geza Kovacs, Legg Yeung, Tao Huang, Aaron Cohen, Bethanie Brownfield, 
Averi Nowak, Mikel Rodriguez, Tianze Shi, Hado van Hasselt, Kevin Cen, Deepanway Ghoshal, Kushal Majmundar, Weiren Yu, Warren, Chen, Danila Sinopalnikov, Hao Zhang, Vlado Galić, Di Lu, Zeyu Zheng, Maggie Song, Gary Wang, Gui Citovsky, Swapnil Gawde, Isaac Galatzer-Levy, David Silver, Ivana Balazevic, Dipanjan Das, Kingshuk Majumder, Yale Cong, Praneet Dutta, Dustin Tran, Hui Wan, Junwei Yuan, Daniel Eppens, Alanna Walton, Been Kim, Harry Ragan, James Cobon-Kerr, Lu Liu, Weijun Wang, Bryce Petrini, Jack Rae, Rakesh Shivanna, Yan Xiong, Chace Lee, Pauline Coquinot, Yiming Gu, Lisa Patel, Blake Hechtman, Aviel Boag, Orion Jankowski, Alex Wertheim, Alex Lee, Paul Covington, Hila Noga, Sam Sobell, Shanthal Vasanth, William Bono, Chirag Nagpal, Wei Fan, Xavier Garcia, Kedar Soparkar, Aybuke Turker, Nathan Howard, Sachit Menon, Yuankai Chen, Vikas Verma, Vladimir Pchelin, Harish Rajamani, Valentin Dalibard, Ana Ramalho, Yang Guo, Kartikeya Badola, Seojin Bang, Nathalie Rauschmayr, Julia Proskurnia, Sudeep Dasari, Xinyun Chen, Mikhail Sushkov, Anja Hauth, Pauline Sho, Abhinav Singh, Bilva Chandra, Allie Culp, Max Dylla, Olivier Bachem, James Besley, Heri Zhao, Timothy Lillicrap, Wei Wei, Wael Al Jishi, Ning Niu, Alban Rrustemi, Raphaël Lopez Kaufman, Ryan Poplin, Jewel Zhao, Minh Truong, Shikhar Bharadwaj, Ester Hlavnova, Eli Stickgold, Cordelia Schmid, Georgi Stephanov, Zhaoqi Leng, Frederick Liu, Léonard Hussenot, Shenil Dodhia, Juliana Vicente Franco, Lesley Katzen, Abhanshu Sharma, Sarah Cogan, Zuguang Yang, Aniket Ray, Sergi Caelles, Shen Yan, Ravin Kumar, Daniel Gillick, Renee Wong, Joshua Ainslie, Jonathan Hoech, Séb Arnold, Dan Abolafia, Anca Dragan, Ben Hora, Grace Hu, Alexey Guseynov, Yang Lu, Chas Leichner, Jinmeng Rao, Abhimanyu Goyal, Nagabhushan Baddi, Daniel Hernandez Diaz, Tim McConnell, Max Bain, Jake Abernethy, Qiqi Yan, Rylan Schaeffer, Paul Vicol, Will Thompson, Montse Gonzalez Arenas, Mathias Bellaiche, Pablo Barrio, Stefan Zinke, Riccardo Patana, Pulkit Mehta, JK Kearns, Avraham Ruderman, Scott Pollom, David D'Ambrosio, Cath Hope, Yang Yu, Andrea Gesmundo, Kuang-Huei Lee, Aviv Rosenberg, Yiqian Zhou, Yaoyiran Li, Drew Garmon, Yonghui Wu, Safeen Huda, Gil Fidel, Martin Baeuml, Jian Li, Phoebe Kirk, Rhys May, Tao Tu, Sara Mc Carthy, Toshiyuki Fukuzawa, Miranda Aperghis, Chih-Kuan Yeh, Toshihiro Yoshino, Bo Li, Austin Myers, Kaisheng Yao, Ben Limonchik, Changwan Ryu, Rohun Saxena, Alex Goldin, Ruizhe Zhao, Rocky Rhodes, Tao Zhu, Divya Tyam, Heidi Howard, Nathan Byrd, Hongxu Ma, Yan Wu, Ryan Mullins, Qingze Wang, Aida Amini, Sebastien Baur, Yiran Mao, Subhashini Venugopalan, Will Song, Wen Ding, Paul Collins, Sashank Reddi, Megan Shum, Andrei Rusu, Luisa Zintgraf, Kelvin Chan, Sheela Goenka, Mathieu Blondel, Michael Collins, Renke Pan, Marissa Giustina, Nikolai Chinaev, Christian Schuler, Ce Zheng, Jonas Valfridsson, Alyssa Loo, Alex Yakubovich, Jamie Smith, Tao Jiang, Rich Munoz, Gabriel Barcik, Rishabh Bansal, Mingyao Yang, Yilun Du, Pablo Duque, Mary Phuong, Alexandra Belias, Kunal Lad, Zeyu Liu, Tal Schuster, Karthik Duddu, Jieru Hu, Paige Kunkle, Matthew Watson, Jackson Tolins, Josh Smith, Denis Teplyashin, Garrett Bingham, Marvin Ritter, Marco Andreetto, Divya Pitta, Mohak Patel, Shashank Viswanadha, Trevor Strohman, Catalin Ionescu, Jincheng Luo, Yogesh Kalley, Jeremy Wiesner, Dan Deutsch, Derek Lockhart, Peter Choy, Rumen Dangovski, Chawin Sitawarin, Cat Graves, Tanya Lando, Joost van Amersfoort, Ndidi Elue, Zhouyuan Huo, Pooya Moradi, Jean Tarbouriech, Henryk 
Michalewski, Wenting Ye, Eunyoung Kim, Alex Druinsky, Florent Altché, Xinyi Chen, Artur Dwornik, Da-Cheng Juan, Rivka Moroshko, Horia Toma, Jarrod Kahn, Hai Qian, Maximilian Sieb, Irene Cai, Roman Goldenberg, Praneeth Netrapalli, Sindhu Raghuram, Yuan Gong, Lijie Fan, Evan Palmer, Yossi Matias, Valentin Gabeur, Shreya Pathak, Tom Ouyang, Don Metzler, Geoff Bacon, Srinivasan Venkatachary, Sridhar Thiagarajan, Alex Cullum, Eran Ofek, Vytenis Sakenas, Mohamed Hammad, Cesar Magalhaes, Mayank Daswani, Oscar Chang, Ashok Popat, Ruichao Li, Komal Jalan, Yanhan Hou, Josh Lipschultz, Antoine He, Wenhao Jia, Pier Giuseppe Sessa, Prateek Kolhar, William Wong, Sumeet Singh, Lukas Haas, Jay Whang, Hanna Klimczak-Plucińska, Georges Rotival, Grace Chung, Yiqing Hua, Anfal Siddiqui, Nicolas Serrano, Dongkai Chen, Billy Porter, Libin Bai, Keshav Shivam, Sho Arora, Partha Talukdar, Tom Cobley, Sangnie Bhardwaj, Evgeny Gladchenko, Simon Green, Kelvin Guu, Felix Fischer, Xiao Wu, Eric Wang, Achintya Singhal, Tatiana Matejovicova, James Martens, Hongji Li, Roma Patel, Elizabeth Kemp, Jiaqi Pan, Lily Wang, Blake JianHang Chen, Jean-Baptiste Alayrac, Navneet Potti, Erika Gemzer, Eugene Ie, Kay McKinney, Takaaki Saeki, Edward Chou, Pascal Lamblin, SQ Mah, Zach Fisher, Martin Chadwick, Jon Stritar, Obaid Sarvana, Andrew Hogue, Artem Shtefan, Hadi Hashemi, Yang Xu, Jindong Gu, Sharad Vikram, Chung-Ching Chang, Sabela Ramos, Logan Kilpatrick, Weijuan Xi, Jenny Brennan, Yinghao Sun, Abhishek Jindal, Ionel Gog, Dawn Chen, Felix Wu, Jason Lee, Sudhindra Kopalle, Srinadh Bhojanapalli, Oriol Vinyals, Natan Potikha, Burcu Karagol Ayan, Yuan Yuan, Michael Riley, Piotr Stanczyk, Sergey Kishchenko, Bing Wang, Dan Garrette, Antoine Yang, Vlad Feinberg, CJ Carey, Javad Azizi, Viral Shah, Erica Moreira, Chongyang Shi, Josh Feldman, Elizabeth Salesky, Thomas Lampe, Aneesh Pappu, Duhyeon Kim, Jonas Adler, Avi Caciularu, Brian Walker, Yunhan Xu, Yochai Blau, Dylan Scandinaro, Terry Huang, Sam El-Husseini, Abhishek Sinha, Lijie Ren, Taylor Tobin, Patrik Sundberg, Tim Sohn, Vikas Yadav, Mimi Ly, Emily Xue, Jing Xiong, Afzal Shama Soudagar, Sneha Mondal, Nikhil Khadke, Qingchun Ren, Ben Vargas, Stan Bileschi, Sarah Chakera, Cindy Wang, Boyu Wang, Yoni Halpern, Joe Jiang, Vikas Sindhwani, Petre Petrov, Pranavaraj Ponnuramu, Sanket Vaibhav Mehta, Yu Watanabe, Betty Chan, Matheus Wisniewski, Trang Pham, Jingwei Zhang, Conglong Li, Dario de Cesare, Art Khurshudov, Alex Vasiloff, Melissa Tan, Zoe Ashwood, Bobak Shahriari, Maryam Majzoubi, Garrett Tanzer, Olga Kozlova, Robin Alazard, James Lee-Thorp, Nguyet Minh Phu, Isaac Tian, Junwhan Ahn, Andy Crawford, Lauren Lax, Yuan, Shangguan, Iftekhar Naim, David Ross, Oleksandr Ferludin, Tongfei Guo, Andrea Banino, Hubert Soyer, Xiaoen Ju, Dominika Rogozińska, Ishaan Malhi, Marcella Valentine, Daniel Balle, Apoorv Kulshreshtha, Maciej Kula, Yiwen Song, Sophia Austin, John Schultz, Roy Hirsch, Arthur Douillard, Apoorv Reddy, Michael Fink, Summer Yue, Khyatti Gupta, Adam Zhang, Norman Rink, Daniel McDuff, Lei Meng, András György, Yasaman Razeghi, Ricky Liang, Kazuki Osawa, Aviel Atias, Matan Eyal, Tyrone Hill, Nikolai Grigorev, Zhengdong Wang, Nitish Kulkarni, Rachel Soh, Ivan Lobov, Zachary Charles, Sid Lall, Kazuma Hashimoto, Ido Kessler, Victor Gomes, Zelda Mariet, Danny Driess, Alessandro Agostini, Canfer Akbulut, Jingcao Hu, Marissa Ikonomidis, Emily Caveness, Kartik Audhkhasi, Saurabh Agrawal, Ioana Bica, Evan Senter, Jayaram Mudigonda, Kelly Chen, Jingchen Ye, Xuanhui Wang, James Svensson, 
Philipp Fränken, Josh Newlan, Li Lao, Eva Schnider, Sami Alabed, Joseph Kready, Jesse Emond, Afief Halumi, Tim Zaman, Chengxi Ye, Naina Raisinghani, Vilobh Meshram, Bo Chang, Ankit Singh Rawat, Axel Stjerngren, Sergey Levi, Rui Wang, Xiangzhu Long, Mitchelle Rasquinha, Steven Hand, Aditi Mavalankar, Lauren Agubuzu, Sudeshna Roy, Junquan Chen, Jarek Wilkiewicz, Hao Zhou, Michal Jastrzebski, Qiong Hu, Agustin Dal Lago, Ramya Sree Boppana, Wei-Jen Ko, Jennifer Prendki, Yao Su, Zhi Li, Eliza Rutherford, Girish Ramchandra Rao, Ramona Comanescu, Adrià Puigdomènech, Qihang Chen, Dessie Petrova, Christine Chan, Vedrana Milutinovic, Felipe Tiengo Ferreira, Chin-Yi Cheng, Ming Zhang, Tapomay Dey, Sherry Yang, Ramesh Sampath, Quoc Le, Howard Zhou, Chu-Cheng Lin, Hoi Lam, Christine Kaeser-Chen, Kai Hui, Dean Hirsch, Tom Eccles, Basil Mustafa, Shruti Rijhwani, Morgane Rivière, Yuanzhong Xu, Junjie Wang, Xinyang Geng, Xiance Si, Arjun Khare, Cheolmin Kim, Vahab Mirrokni, Kamyu Lee, Khuslen Baatarsukh, Nathaniel Braun, Lisa Wang, Pallavi LV, Richard Tanburn, Yuvein, Zhu, Fangda Li, Setareh Ariafar, Dan Goldberg, Ken Burke, Daniil Mirylenka, Meiqi Guo, Olaf Ronneberger, Hadas Natalie Vogel, Liqun Cheng, Nishita Shetty, Johnson Jia, Thomas Jimma, Corey Fry, Ted Xiao, Martin Sundermeyer, Ryan Burnell, Yannis Assael, Mario Pinto, JD Chen, Rohit Sathyanarayana, Donghyun Cho, Jing Lu, Rishabh Agarwal, Sugato Basu, Lucas Gonzalez, Dhruv Shah, Meng Wei, Dre Mahaarachchi, Rohan Agrawal, Tero Rissa, Yani Donchev, Ramiro Leal-Cavazos, Adrian Hutter, Markus Mircea, Alon Jacovi, Faruk Ahmed, Jiageng Zhang, Shuguang Hu, Bo-Juen Chen, Jonni Kanerva, Guillaume Desjardins, Andrew Lee, Nikos Parotsidis, Asier Mujika, Tobias Weyand, Jasper Snoek, Jo Chick, Kai Chen, Paul Chang, Ethan Mahintorabi, Zi Wang, Tolly Powell, Orgad Keller, Abhirut Gupta, Claire Sha, Kanav Garg, Nicolas Heess, Ágoston Weisz, Cassidy Hardin, Bartek Wydrowski, Ben Coleman, Karina Zainullina, Pankaj Joshi, Alessandro Epasto, Terry Spitz, Binbin Xiong, Kai Zhao, Arseniy Klimovskiy, Ivy Zheng, Johan Ferret, Itay Yona, Waleed Khawaja, Jean-Baptiste Lespiau, Maxim Krikun, Siamak Shakeri, Timothee Cour, Bonnie Li, Igor Krivokon, Dan Suh, Alex Hofer, Jad Al Abdallah, Nikita Putikhin, Oscar Akerlund, Silvio Lattanzi, Anurag Kumar, Shane Settle, Himanshu Srivastava, Folawiyo Campbell-Ajala, Edouard Rosseel, Mihai Dorin Istin, Nishanth Dikkala, Anand Rao, Nick Young, Kate Lin, Dhruva Bhaswar, Yiming Wang, Jaume Sanchez Elias, Kritika Muralidharan, James Keeling, Dayou Du, Siddharth Gopal, Gregory Dibb, Charles Blundell, Manolis Delakis, Jacky Liang, Marco Tulio Ribeiro, Georgi Karadzhov, Guillermo Garrido, Ankur Bapna, Jiawei Cao, Adam Sadovsky, Pouya Tafti, Arthur Guez, Coline Devin, Yixian Di, Jinwei Xing, Chuqiao, Xu, Hanzhao Lin, Chun-Te Chu, Sameera Ponda, Wesley Helmholz, Fan Yang, Yue Gao, Sara Javanmardi, Wael Farhan, Alex Ramirez, Ricardo Figueira, Khe Chai Sim, Yuval Bahat, Ashwin Vaswani, Liangzhe Yuan, Gufeng Zhang, Leland Rechis, Hanjun Dai, Tayo Oguntebi, Alexandra Cordell, Eugénie Rives, Kaan Tekelioglu, Naveen Kumar, Bing Zhang, Aurick Zhou, Nikolay Savinov, Andrew Leach, Alex Tudor, Sanjay Ganapathy, Yanyan Zheng, Mirko Rossini, Vera Axelrod, Arnaud Autef, Yukun Zhu, Zheng Zheng, Mingda Zhang, Baochen Sun, Jie Ren, Nenad Tomasev, Nithish Kannan, Amer Sinha, Charles Chen, Louis O'Bryan, Alex Pak, Aditya Kusupati, Weel Yang, Deepak Ramachandran, Patrick Griffin, Seokhwan Kim, Philipp Neubeck, Craig Schiff, Tammo Spalink, Mingyang Ling, Arun 
Nair, Ga-Young Joung, Linda Deng, Avishkar Bhoopchand, Lora Aroyo, Tom Duerig, Jordan Griffith, Gabe Barth-Maron, Jake Ades, Alex Haig, Ankur Taly, Yunting Song, Paul Michel, Dave Orr, Dean Weesner, Corentin Tallec, Carrie Grimes Bostock, Paul Niemczyk, Andy Twigg, Mudit Verma, Rohith Vallu, Henry Wang, Marco Gelmi, Kiranbir Sodhia, Aleksandr Chuklin, Omer Goldman, Jasmine George, Liang Bai, Kelvin Zhang, Petar Sirkovic, Efrat Nehoran, Golan Pundak, Jiaqi Mu, Alice Chen, Alex Greve, Paulo Zacchello, David Amos, Heming Ge, Eric Noland, Colton Bishop, Jeffrey Dudek, Youhei Namiki, Elena Buchatskaya, Jing Li, Dorsa Sadigh, Masha Samsikova, Dan Malkin, Damien Vincent, Robert David, Rob Willoughby, Phoenix Meadowlark, Shawn Gao, Yan Li, Raj Apte, Amit Jhindal, Stein Xudong Lin, Alex Polozov, Zhicheng Wang, Tomas Mery, Anirudh GP, Varun Yerram, Sage Stevens, Tianqi Liu, Noah Fiedel, Charles Sutton, Matthew Johnson, Xiaodan Song, Kate Baumli, Nir Shabat, Muqthar Mohammad, Hao Liu, Marco Selvi, Yichao Zhou, Mehdi Hafezi Manshadi, Chu-ling Ko, Anthony Chen, Michael Bendersky, Jorge Gonzalez Mendez, Nisarg Kothari, Amir Zandieh, Yiling Huang, Daniel Andor, Ellie Pavlick, Idan Brusilovsky, Jitendra Harlalka, Sally Goldman, Andrew Lampinen, Guowang Li, Asahi Ushio, Somit Gupta, Lei Zhang, Chuyuan Kelly Fu, Madhavi Sewak, Timo Denk, Jed Borovik, Brendan Jou, Avital Zipori, Prateek Jain, Junwen Bai, Thang Luong, Jonathan Tompson, Alice Li, Li Liu, George Powell, Jiajun Shen, Alex Feng, Grishma Chole, Da Yu, Yinlam Chow, Tongxin Yin, Eric Malmi, Kefan Xiao, Yash Pande, Shachi Paul, Niccolò Dal Santo, Adil Dostmohamed, Sergio Guadarrama, Aaron Phillips, Thanumalayan Sankaranarayana Pillai, Gal Yona, Amin Ghafouri, Preethi Lahoti, Benjamin Lee, Dhruv Madeka, Eren Sezener, Simon Tokumine, Adrian Collister, Nicola De Cao, Richard Shin, Uday Kalra, Parker Beak, Emily Nottage, Ryo Nakashima, Ivan Jurin, Vikash Sehwag, Meenu Gaba, Junhao Zeng, Kevin R. 
McKee, Fernando Pereira, Tamar Yakar, Amayika Panda, Arka Dhar, Peilin Zhong, Daniel Sohn, Mark Brand, Lars Lowe Sjoesund, Viral Carpenter, Sharon Lin, Shantanu Thakoor, Marcus Wainwright, Ashwin Chaugule, Pranesh Srinivasan, Muye Zhu, Bernett Orlando, Jack Weber, Ayzaan Wahid, Gilles Baechler, Apurv Suman, Jovana Mitrović, Gabe Taubman, Honglin Yu, Helen King, Josh Dillon, Cathy Yip, Dhriti Varma, Tomas Izo, Levent Bolelli, Borja De Balle Pigem, Julia Di Trapani, Fotis Iliopoulos, Adam Paszke, Nishant Ranka, Joe Zou, Francesco Pongetti, Jed McGiffin, Alex Siegman, Rich Galt, Ross Hemsley, Goran Žužić, Victor Carbune, Tao Li, Myle Ott, Félix de Chaumont Quitry, David Vilar Torres, Yuri Chervonyi, Tomy Tsai, Prem Eruvbetine, Samuel Yang, Matthew Denton, Jake Walker, Slavica Andačić, Idan Heimlich Shtacher, Vittal Premachandran, Harshal Tushar Lehri, Cip Baetu, Damion Yates, Lampros Lamprou, Mariko Iinuma, Ioana Mihailescu, Ben Albrecht, Shachi Dave, Susie Sargsyan, Bryan Perozzi, Lucas Manning, Chiyuan Zhang, Denis Vnukov, Igor Mordatch, Raia Hadsell Wolfgang Macherey, Ryan Kappedal, Jim Stephan, Aditya Tripathi, Klaus Macherey, Jun Qian, Abhishek Bhowmick, Shekoofeh Azizi, Rémi Leblond, Shiva Mohan Reddy Garlapati, Timothy Knight, Matthew Wiethoff, Wei-Chih Hung, Anelia Angelova, Georgios Evangelopoulos, Pawel Janus, Dimitris Paparas, Matthew Rahtz, Ken Caluwaerts, Vivek Sampathkumar, Daniel Jarrett, Shadi Noghabi, Antoine Miech, Chak Yeung, Geoff Clark, Henry Prior, Fei Zheng, Jean Pouget-Abadie, Indro Bhattacharya, Kalpesh Krishna, Will Bishop, Zhe Yuan, Yunxiao Deng, Ashutosh Sathe, Kacper Krasowiak, Ciprian Chelba, Cho-Jui Hsieh, Kiran Vodrahalli, Buhuang Liu, Thomas Köppe, Amr Khalifa, Lubo Litchev, Pichi Charoenpanit, Reed Roberts, Sachin Yadav, Yasumasa Onoe, Desi Ivanov, Megha Mohabey, Vighnesh Birodkar, Nemanja Rakićević, Pierre Sermanet, Vaibhav Mehta, Krishan Subudhi, Travis Choma, Will Ng, Luheng He, Kathie Wang, Tasos Kementsietsidis, Shane Gu, Mansi Gupta, Andrew Nystrom, Mehran Kazemi, Timothy Chung, Nacho Cano, Nikhil Dhawan, Yufei Wang, Jiawei Xia, Trevor Yacovone, Eric Jia, Mingqing Chen, Simeon Ivanov, Ashrith Sheshan, Sid Dalmia, Paweł Stradomski, Pengcheng Yin, Salem Haykal, Congchao Wang, Dennis Duan, Neslihan Bulut, Greg Kochanski, Liam MacDermed, Namrata Godbole, Shitao Weng, Jingjing Chen, Rachana Fellinger, Ramin Mehran, Daniel Suo, Hisham Husain, Tong He, Kaushal Patel, Joshua Howland, Randall Parker, Kelvin Nguyen, Sharath Maddineni, Chris Rawles, Mina Khan, Shlomi Cohen-Ganor, Amol Mandhane, Xinyi Wu, Chenkai Kuang, Iulia Comşa, Ramya Ganeshan, Hanie Sedghi, Adam Bloniarz, Nuo Wang Pierse, Anton Briukhov, Petr Mitrichev, Anita Gergely, Serena Zhan, Allan Zhou, Nikita Saxena, Eva Lu, Josef Dean, Ashish Gupta, Nicolas Perez-Nieves, Renjie Wu, Cory McLean, Wei Liang, Disha Jindal, Anton Tsitsulin, Wenhao Yu, Kaiz Alarakyia, Tom Schaul, Piyush Patil, Peter Sung, Elijah Peake, Hongkun Yu, Feryal Behbahani, JD Co-Reyes, Alan Ansell, Sean Sun, Clara Barbu, Jonathan Lee, Seb Noury, James Allingham, Bilal Piot, Mohit Sharma, Christopher Yew, Ivan Korotkov, Bibo Xu, Demetra Brady, Goran Petrovic, Shibl Mourad, Claire Cui, Aditya Gupta, Parker Schuh, Saarthak Khanna, Anna Goldie, Abhinav Arora, Vadim Zubov, Amy Stuart, Mark Epstein, Yun Zhu, Jianqiao Liu, Yury Stuken, Ziyue Wang, Karolis Misiunas, Dee Guo, Ashleah Gill, Ale Hartman, Zaid Nabulsi, Aurko Roy, Aleksandra Faust, Jason Riesa, Ben Withbroe, Mengchao Wang, Marco Tagliasacchi, Andreea Marzoca, James Noraky, Serge 
Toropov, Malika Mehrotra, Bahram Raad, Sanja Deur, Steve Xu, Marianne Monteiro, Zhongru Wu, Yi Luan, Sam Ritter, Nick Li, Håvard Garnes, Yanzhang He, Martin Zlocha, Jifan Zhu, Matteo Hessel, Will Wu, Spandana Raj Babbula, Chizu Kawamoto, Yuanzhen Li, Mehadi Hassen, Yan Wang, Brian Wieder, James Freedman, Yin Zhang, Xinyi Bai, Tianli Yu, David Reitter, XiangHai Sheng, Mateo Wirth, Aditya Kini, Dima Damen, Mingcen Gao, Rachel Hornung, Michael Voznesensky, Brian Roark, Adhi Kuncoro, Yuxiang Zhou, Rushin Shah, Anthony Brohan, Kuangyuan Chen, James Wendt, David Rim, Paul Kishan Rubenstein, Jonathan Halcrow, Michelle Liu, Ty Geri, Yunhsuan Sung, Jane Shapiro, Shaan Bijwadia, Chris Duvarney, Christina Sorokin, Paul Natsev, Reeve Ingle, Pramod Gupta, Young Maeng, Ndaba Ndebele, Kexin Zhu, Valentin Anklin, Katherine Lee, Yuan Liu, Yaroslav Akulov, Shaleen Gupta, Guolong Su, Flavien Prost, Tianlin Liu, Vitaly Kovalev, Pol Moreno, Martin Scholz, Sam Redmond, Zongwei Zhou, Alex Castro-Ros, André Susano Pinto, Dia Kharrat, Michal Yarom, Rachel Saputro, Jannis Bulian, Ben Caine, Ji Liu, Abbas Abdolmaleki, Shariq Iqbal, Tautvydas Misiunas, Mikhail Sirotenko, Shefali Garg, Guy Bensky, Huan Gui, Xuezhi Wang, Raphael Koster, Mike Bernico, Da Huang, Romal Thoppilan, Trevor Cohn, Ben Golan, Wenlei Zhou, Andrew Rosenberg, Markus Freitag, Tynan Gangwani, Vincent Tsang, Anand Shukla, Xiaoqi Ren, Minh Giang, Chi Zou, Andre Elisseeff, Charline Le Lan, Dheeru Dua, Shuba Lall, Pranav Shyam, Frankie Garcia, Sarah Nguyen, Michael Guzman, AJ Maschinot, Marcello Maggioni, Ming-Wei Chang, Karol Gregor, Lotte Weerts, Kumaran Venkatesan, Bogdan Damoc, Leon Liu, Jan Wassenberg, Lewis Ho, Becca Roelofs, Majid Hadian, François-Xavier Aubet, Yu Liang, Sami Lachgar, Danny Karmon, Yong Cheng, Amelio Vázquez-Reina, Angie Chen, Zhuyun Dai, Andy Brock, Shubham Agrawal, Chenxi Pang, Peter Garst, Mariella Sanchez-Vargas, Ivor Rendulic, Aditya Ayyar, Andrija Ražnatović, Olivia Ma, Roopali Vij, Neha Sharma, Ashwin Balakrishna, Bingyuan Liu, Ian Mackinnon, Sorin Baltateanu, Petra Poklukar, Gabriel Ibagon, Colin Ji, Hongyang Jiao, Isaac Noble, Wojciech Stokowiec, Zhihao Li, Jeff Dean, David Lindner, Mark Omernick, Kristen Chiafullo, Mason Dimarco, Vitor Rodrigues, Vittorio Selo, Garrett Honke, Xintian, Wu, Wei He, Adam Hillier, Anhad Mohananey, Vihari Piratla, Chang Ye, Chase Malik, Sebastian Riedel, Samuel Albanie, Zi Yang, Kenny Vassigh, Maria Bauza, Sheng Li, Yiqing Tao, Nevan Wichers, Andrii Maksai, Abe Ittycheriah, Ross Mcilroy, Bryan Seybold, Noah Goodman, Romina Datta, Steven M. 
Hernandez, Tian Shi, Yony Kochinski, Anna Bulanova, Ken Franko, Mikita Sazanovich, Nicholas FitzGerald, Praneeth Kacham, Shubha Srinivas Raghvendra, Vincent Hellendoorn, Alexander Grushetsky, Julian Salazar, Angeliki Lazaridou, Jason Chang, Jan-Thorsten Peter, Sushant Kafle, Yann Dauphin, Abhishek Rao, Filippo Graziano, Izhak Shafran, Yuguo Liao, Tianli Ding, Geng Yan, Grace Chu, Zhao Fu, Vincent Roulet, Gabriel Rasskin, Duncan Williams, Shahar Drath, Alex Mossin, Raphael Hoffmann, Jordi Orbay, Francesco Bertolini, Hila Sheftel, Justin Chiu, Siyang Xue, Yuheng Kuang, Ferjad Naeem, Swaroop Nath, Nana Nti, Phil Culliton, Kashyap Krishnakumar, Michael Isard, Pei Sun, Ayan Chakrabarti, Nathan Clement, Regev Cohen, Arissa Wongpanich, GS Oh, Ashwin Murthy, Hao Zheng, Jessica Hamrick, Oskar Bunyan, Suhas Ganesh, Nitish Gupta, Roy Frostig, John Wieting, Yury Malkov, Pierre Marcenac, Zhixin, Lai, Xiaodan Tang, Mohammad Saleh, Fedir Zubach, Chinmay Kulkarni, Huanjie Zhou, Vicky Zayats, Nan Ding, Anshuman Tripathi, Arijit Pramanik, Patrik Zochbauer, Harish Ganapathy, Vedant Misra, Zach Behrman, Hugo Vallet, Mingyang Zhang, Mukund Sridhar, Ye Jin, Mohammad Babaeizadeh, Siim Põder, Megha Goel, Divya Jain, Tajwar Nasir, Shubham Mittal, Tim Dozat, Diego Ardila, Aliaksei Severyn, Fabio Pardo, Sammy Jerome, Siyang Qin, Louis Rouillard, Amir Yazdanbakhsh, Zizhao Zhang, Shivani Agrawal, Kaushik Shivakumar, Caden Lu, Praveen Kallakuri, Rachita Chhaparia, Kanishka Rao, Charles Kwong, Asya Fadeeva, Shitij Nigam, Yan Virin, Yuan Zhang, Balaji Venkatraman, Beliz Gunel, Marc Wilson, Huiyu Wang, Abhinav Gupta, Xiaowei Xu, Adrien Ali Taïga, Kareem Mohamed, Doug Fritz, Daniel Rodriguez, Zoubin Ghahramani, Harry Askham, Lior Belenki, James Zhao, Rahul Gupta, Krzysztof Jastrzębski, Takahiro Kosakai, Kaan Katircioglu, Jon Schneider, Rina Panigrahy, Konstantinos Bousmalis, Peter Grabowski, Prajit Ramachandran, Chaitra Hegde, Mihaela Rosca, Angelo Scorza Scarpati, Kyriakos Axiotis, Ying Xu, Zach Gleicher, Assaf Hurwitz Michaely, Mandar Sharma, Sanil Jain, Christoph Hirnschall, Tal Marian, Xuhui Jia, Kevin Mather, Kilol Gupta, Linhai Qiu, Nigamaa Nayakanti, Lucian Ionita, Steven Zheng, Lucia Loher, Kurt Shuster, Igor Petrovski, Roshan Sharma, Rahma Chaabouni, Angel Yeh, James An, Arushi Gupta, Steven Schwarcz, Seher Ellis, Sam Conway-Rahman, Javier Snaider, Alex Zhai, James Atwood, Daniel Golovin, Liqian Peng, Te I, Vivian Xia, Salvatore Scellato, Mahan Malihi, Arthur Bražinskas, Vlad-Doru Ion, Younghoon Jun, James Swirhun, Soroosh Mariooryad, Jiao Sun, Steve Chien, Rey Coaguila, Ariel Brand, Yi Gao, Tom Kwiatkowski, Roee Aharoni, Cheng-Chun Lee, Mislav Žanić, Yichi Zhang, Dan Ethier, Vitaly Nikolaev, Pranav Nair, Yoav Ben Shalom, Hen Fitoussi, Jai Gupta, Hongbin Liu, Dee Cattle, Tolga Bolukbasi, Ben Murdoch, Fantine Huot, Yin Li, Chris Hahn
- 日期: 2025-07-07
- ArXiv主页 : https://arxiv.org/abs/2507.06261
- 论文链接 : https://arxiv.org/pdf/2507.06261
英文摘要
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
中文摘要
在本报告中,我们介绍了Gemini 2.X模型家族:Gemini 2.5 Pro和Gemini 2.5 Flash,以及较早的Gemini 2.0 Flash和Flash-Lite模型。Gemini 2.5 Pro是我们迄今为止能力最强的模型,在前沿的编码与推理基准上取得了SoTA性能。除了出色的编码与推理能力之外,Gemini 2.5 Pro还是一个擅长多模态理解的思维模型,目前已能处理长达3小时的视频内容。它将长上下文、多模态与推理能力独特地结合在一起,可以解锁新的智能体工作流。Gemini 2.5 Flash以远低于前者的计算与延迟开销提供出色的推理能力,而Gemini 2.0 Flash和Flash-Lite则以低延迟、低成本提供高性能。总体而言,Gemini 2.X这一代模型覆盖了模型能力与成本的完整帕累托前沿,使用户能够探索复杂智能体问题求解的能力边界。
EXAONE 4.0:集成非推理与推理模式的统一大型语言模型
- 标题: EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
- 作者: LG AI Research, Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Yemuk Choi, Kyubeen Han, Seokhee Hong, Junwon Hwang, Taewan Hwang, Joonwon Jang, Hyojin Jeon, Kijeong Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Euisoon Kim, Hyosang Kim, Jihoon Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Yongil Kim, Youchul Kim, Edward Hwayoung Lee, Gwangho Lee, Haeju Lee, Honglak Lee, Jinsik Lee, Kyungmin Lee, Sangha Park, Young Min Paik, Yongmin Park, Youngyong Park, Sanghyun Seo, Sihoon Yang, Heuiyeen Yeen, Sihyuk Yi, Hyeongu Yun
- 日期: 2025-07-15
- ArXiv主页 : https://arxiv.org/abs/2507.11407
- 论文链接 : https://arxiv.org/pdf/2507.11407
英文摘要
This technical report introduces EXAONE 4.0, which integrates a Non-reasoning mode and a Reasoning mode to achieve both the excellent usability of EXAONE 3.5 and the advanced reasoning abilities of EXAONE Deep. To pave the way for the agentic AI era, EXAONE 4.0 incorporates essential features such as agentic tool use, and its multilingual capabilities are extended to support Spanish in addition to English and Korean. The EXAONE 4.0 model series consists of two sizes: a mid-size 32B model optimized for high performance, and a small-size 1.2B model designed for on-device applications. The EXAONE 4.0 demonstrates superior performance compared to open-weight models in its class and remains competitive even against frontier-class models. The models are publicly available for research purposes and can be easily downloaded via https://huggingface.co/LGAI-EXAONE.
中文摘要
该技术报告介绍了EXAONE 4.0,它集成了非推理模式和推理模式,以同时实现EXAONE 3.5的出色可用性与EXAONE Deep的高级推理能力。为了为智能体AI时代铺平道路,EXAONE 4.0加入了智能体工具调用等关键功能,其多语言能力在英语和韩语之外扩展到了西班牙语。EXAONE 4.0模型系列包含两种规模:面向高性能优化的中型32B模型,以及面向端侧应用设计的小型1.2B模型。EXAONE 4.0相较同级别的开放权重模型表现更优,即便与前沿级模型相比也具有竞争力。这些模型已公开供研究使用,可通过https://huggingface.co/LGAI-EXAONE轻松下载。
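下面给出一个从Hugging Face加载EXAONE 4.0并生成回复的极简示意。其中模型ID与加载/生成参数均为假设,并非官方示例,实际用法(包括推理/非推理模式的切换方式)请以官方模型卡为准:

```python
# 极简示意:从Hugging Face加载EXAONE 4.0并生成回复。
# 注意:模型ID与加载/生成参数均为假设,实际用法请参考 https://huggingface.co/LGAI-EXAONE
# 的官方模型卡;部分版本可能还需要 trust_remote_code=True。
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-4.0-1.2B"  # 假设的模型ID,也可替换为32B版本
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Briefly explain reasoning vs non-reasoning modes."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```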
Diffuman4D:利用时空扩散模型从稀疏视角视频进行4D一致的人体视图合成
- 标题: Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
- 作者: Yudong Jin, Sida Peng, Xuan Wang, Tao Xie, Zhen Xu, Yifan Yang, Yujun Shen, Hujun Bao, Xiaowei Zhou
- 日期: 2025-07-17
- ArXiv主页 : https://arxiv.org/abs/2507.13344
- 论文链接 : https://arxiv.org/pdf/2507.13344
- 项目链接 : https://diffuman4d.github.io/
- gitHub仓库 : https://github.com/zju3dv/Diffuman4D
英文摘要
This paper addresses the challenge of high-fidelity view synthesis of humans with sparse-view videos as input. Previous methods solve the issue of insufficient observation by leveraging 4D diffusion models to generate videos at novel viewpoints. However, the generated videos from these models often lack spatio-temporal consistency, thus degrading view synthesis quality. In this paper, we propose a novel sliding iterative denoising process to enhance the spatio-temporal consistency of the 4D diffusion model. Specifically, we define a latent grid in which each latent encodes the image, camera pose, and human pose for a certain viewpoint and timestamp, then alternately denoising the latent grid along spatial and temporal dimensions with a sliding window, and finally decode the videos at target viewpoints from the corresponding denoised latents. Through the iterative sliding, information flows sufficiently across the latent grid, allowing the diffusion model to obtain a large receptive field and thus enhance the 4D consistency of the output, while making the GPU memory consumption affordable. The experiments on the DNA-Rendering and ActorsHQ datasets demonstrate that our method is able to synthesize high-quality and consistent novel-view videos and significantly outperforms the existing approaches. See our project page for interactive demos and video results: https://diffuman4d.github.io/ .
中文摘要
本文研究以稀疏视角视频为输入的高保真人体视图合成问题。以往方法通过利用4D扩散模型在新视角下生成视频来弥补观测不足,然而这些模型生成的视频往往缺乏时空一致性,从而降低了视图合成质量。本文提出一种新颖的滑动迭代去噪过程,以增强4D扩散模型的时空一致性。具体而言,我们定义了一个潜变量网格,其中每个潜变量编码某一视角和时间戳下的图像、相机位姿与人体姿态;随后用滑动窗口交替地沿空间和时间维度对潜变量网格去噪;最后从去噪后的潜变量解码出目标视角下的视频。通过迭代滑动,信息得以在整个潜变量网格中充分流动,使扩散模型获得较大的感受野,从而增强输出的4D一致性,同时保持GPU显存开销可以接受。在DNA-Rendering和ActorsHQ数据集上的实验表明,我们的方法能够合成高质量且一致的新视角视频,并显著优于现有方法。交互式演示与视频结果请见项目页面:https://diffuman4d.github.io/。
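为帮助理解上述"滑动迭代去噪"的流程,下面给出一个高度简化的Python示意:在(视角 × 时间)的潜变量网格上,沿空间和时间维度交替滑窗去噪。其中 denoise_window 是占位函数,窗口大小、步长等参数均为假设,并非论文的官方实现:

```python
def denoise_window(window, cond, step):
    """占位:真实实现中应调用时空扩散模型,在图像/相机位姿/人体姿态条件下对窗口内潜变量去噪一步。"""
    return window

def sliding_iterative_denoising(latent_grid, cond_grid, num_steps=8, win=4, stride=2):
    """latent_grid[v][t] 与 cond_grid[v][t]:视角 v、时间 t 处的潜变量及其条件(均为示意)。"""
    V, T = len(latent_grid), len(latent_grid[0])
    for step in range(num_steps):
        if step % 2 == 0:  # 沿空间(视角)维度滑动窗口
            for v0 in range(0, max(V - win, 0) + 1, stride):
                for t in range(T):
                    idx = [(v, t) for v in range(v0, min(v0 + win, V))]
                    out = denoise_window([latent_grid[v][t] for v, t in idx],
                                         [cond_grid[v][t] for v, t in idx], step)
                    for (v, t), z in zip(idx, out):
                        latent_grid[v][t] = z
        else:              # 沿时间维度滑动窗口
            for t0 in range(0, max(T - win, 0) + 1, stride):
                for v in range(V):
                    idx = [(v, t) for t in range(t0, min(t0 + win, T))]
                    out = denoise_window([latent_grid[v][t] for v, t in idx],
                                         [cond_grid[v][t] for v, t in idx], step)
                    for (v, t), z in zip(idx, out):
                        latent_grid[v][t] = z
    return latent_grid  # 去噪后的潜变量再由解码器还原为各目标视角下的视频帧
```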
CLiFT:用于计算高效与自适应神经渲染的压缩光场令牌
- 标题: CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
- 作者: Zhengqing Wang, Yuefan Wu, Jiacheng Chen, Fuyang Zhang, Yasutaka Furukawa
- 日期: 2025-07-11
- ArXiv主页 : https://arxiv.org/abs/2507.08776
- 论文链接 : https://arxiv.org/pdf/2507.08776
- 项目链接 : https://clift-nvs.github.io/
- gitHub仓库 : https://github.com/eric-zqwang/CLiFT
英文摘要
This paper proposes a neural rendering approach that represents a scene as "compressed light-field tokens (CLiFTs)", retaining rich appearance and geometric information of a scene. CLiFT enables compute-efficient rendering by compressed tokens, while being capable of changing the number of tokens to represent a scene or render a novel view with one trained network. Concretely, given a set of images, multi-view encoder tokenizes the images with the camera poses. Latent-space K-means selects a reduced set of rays as cluster centroids using the tokens. The multi-view ``condenser'' compresses the information of all the tokens into the centroid tokens to construct CLiFTs. At test time, given a target view and a compute budget (i.e., the number of CLiFTs), the system collects the specified number of nearby tokens and synthesizes a novel view using a compute-adaptive renderer. Extensive experiments on RealEstate10K and DL3DV datasets quantitatively and qualitatively validate our approach, achieving significant data reduction with comparable rendering quality and the highest overall rendering score, while providing trade-offs of data size, rendering quality, and rendering speed.
中文摘要
本文提出一种神经渲染方法,它将场景表示为"压缩光场令牌(CLiFT)",保留场景丰富的外观与几何信息。CLiFT通过压缩后的令牌实现计算高效的渲染,并且能够在单个训练好的网络中改变用于表示场景或渲染新视角的令牌数量。具体而言,给定一组图像,多视图编码器结合相机位姿将图像令牌化;潜空间K-means利用这些令牌选出一组数量更少的射线作为聚类中心;多视图"凝聚器(condenser)"将所有令牌的信息压缩进这些中心令牌,从而构造出CLiFT。测试时,给定目标视角与计算预算(即CLiFT的数量),系统收集指定数量的邻近令牌,并用计算自适应渲染器合成新视角。在RealEstate10K和DL3DV数据集上的大量定量与定性实验验证了我们的方法:在渲染质量相当且总体渲染得分最高的情况下实现了显著的数据压缩,并提供了数据大小、渲染质量与渲染速度之间的权衡。
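下面用一个简化的NumPy示意来表达CLiFT的核心流程:多视图令牌化 → 潜空间K-means选取中心射线 → 将信息凝聚到中心令牌 → 按计算预算取邻近令牌渲染。各步骤均以占位实现代替论文中的可学习网络,仅用于说明数据流:

```python
import numpy as np

def kmeans(tokens, k, iters=10):
    # 极简K-means:在潜空间中选出 k 个聚类中心(对应被保留的中心射线)
    centers = tokens[np.random.choice(len(tokens), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((tokens[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([tokens[assign == i].mean(0) if (assign == i).any() else centers[i]
                            for i in range(k)])
    return centers, assign

def build_clift(ray_tokens, budget):
    """ray_tokens: [N, d] 多视图编码器输出的射线令牌(示意);budget: 保留的CLiFT数量。"""
    centers, assign = kmeans(ray_tokens, budget)
    # "凝聚器"示意:把每个簇内令牌的信息压缩进中心令牌(此处用簇内均值代替可学习的凝聚模块)
    return np.stack([ray_tokens[assign == i].mean(0) if (assign == i).any() else centers[i]
                     for i in range(budget)])

def render_view(clift, target_ray, k_near=16):
    # 计算自适应渲染示意:收集与目标射线最接近的若干CLiFT,再交给渲染器(此处仅返回均值占位)
    d = ((clift - target_ray) ** 2).sum(-1)
    return clift[np.argsort(d)[:k_near]].mean(0)

tokens = np.random.randn(2048, 64).astype(np.float32)  # 假设的射线令牌
clift = build_clift(tokens, budget=128)
pixel_feat = render_view(clift, target_ray=np.random.randn(64).astype(np.float32))
```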
SpeakerVid-5M:用于视听二元交互式人类生成的大规模高质量数据集
- 标题: SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
- 作者: Youliang Zhang, Zhaoyang Li, Duomin Wang, Jiahe Zhang, Deyu Zhou, Zixin Yin, Xili Dai, Gang Yu, Xiu Li
- 日期: 2025-07-14
- ArXiv主页 : https://arxiv.org/abs/2507.09862
- 论文链接 : https://arxiv.org/pdf/2507.09862
- 项目链接 : https://dorniwang.github.io/SpeakerVid-5M/
- gitHub仓库 : https://github.com/Dorniwang/SpeakerVid-5M-Code
英文摘要
The rapid development of large-scale models has catalyzed significant breakthroughs in the digital human domain. These advanced methodologies offer high-fidelity solutions for avatar driving and rendering, leading academia to focus on the next major challenge: audio-visual dyadic interactive virtual human. To facilitate research in this emerging area, we present SpeakerVid-5M dataset, the first large-scale, high-quality dataset designed for audio-visual dyadic interactive virtual human generation. Totaling over 8,743 hours, SpeakerVid-5M contains more than 5.2 million video clips of human portraits. It covers diverse scales and interaction types, including monadic talking, listening, and dyadic conversations. Crucially, the dataset is structured along two key dimensions: interaction type and data quality. First, it is categorized into four types (dialogue branch, single branch, listening branch and multi-turn branch) based on the interaction scenario. Second, it is stratified into a large-scale pre-training subset and a curated, high-quality subset for Supervised Fine-Tuning (SFT). This dual structure accommodates a wide array of 2D virtual human tasks. In addition, we provide an autoregressive (AR)-based video chat baseline trained on this data, accompanied by a dedicated set of metrics and test data to serve as a benchmark VidChatBench for future work. Both the dataset and the corresponding data processing code will be publicly released. Project page: https://dorniwang.github.io/SpeakerVid-5M/
中文摘要
大规模模型的快速发展推动了数字人领域的重大突破。这些先进方法为数字人驱动与渲染提供了高保真解决方案,促使学术界关注下一个重大挑战:视听二元交互式虚拟人。为促进这一新兴方向的研究,我们提出SpeakerVid-5M数据集,这是首个面向视听二元交互式虚拟人生成的大规模高质量数据集。SpeakerVid-5M总时长超过8743小时,包含超过520万个人像视频片段,涵盖多种尺度与交互类型,包括单人说话、倾听以及二元对话。关键的是,数据集沿两个维度组织:交互类型与数据质量。首先,按交互场景分为四类分支(对话分支、单人分支、倾听分支和多轮分支);其次,划分为大规模预训练子集和经过筛选、用于监督微调(SFT)的高质量子集。这种双重结构可支撑各类2D虚拟人任务。此外,我们基于该数据训练了一个基于自回归(AR)的视频聊天基线,并提供一套专门的指标与测试数据,作为未来工作的基准VidChatBench。数据集与相应的数据处理代码都将公开发布。项目页面:https://dorniwang.github.io/SpeakerVid-5M/
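下面用一个简短的示意说明该数据集"交互类型 × 数据质量"双维组织方式可能对应的样本结构与筛选逻辑。字段名均为假设,实际字段请以官方发布的数据处理代码为准:

```python
from dataclasses import dataclass

@dataclass
class SpeakerVidClip:
    # 假设的样本字段:视频/音频路径、交互类型分支与质量划分
    video_path: str
    audio_path: str
    branch: str        # "dialogue" / "single" / "listening" / "multi_turn"
    subset: str        # "pretrain"(大规模预训练)或 "sft"(精选高质量子集)
    duration_s: float

def select_for_sft(clips):
    """示意:按双维结构筛出用于监督微调的二元对话数据。"""
    return [c for c in clips if c.subset == "sft" and c.branch in ("dialogue", "multi_turn")]

clips = [
    SpeakerVidClip("a.mp4", "a.wav", "dialogue", "sft", 12.3),
    SpeakerVidClip("b.mp4", "b.wav", "single", "pretrain", 8.0),
]
print(len(select_for_sft(clips)))  # -> 1
```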
模仿游戏:图灵机模仿器是长度可泛化的推理器
- 标题: The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
- 作者: Zhouqi Hua, Wenwei Zhang, Chengqi Lyu, Yuzhe Gu, Songyang Gao, Kuikun Liu, Kai Chen
- 日期: 2025-07-17
- ArXiv主页 : https://arxiv.org/abs/2507.13332
- 论文链接 : https://arxiv.org/pdf/2507.13332
英文摘要
Length generalization, the ability to solve problems of longer sequences than those observed during training, poses a core challenge of Transformer-based large language models (LLM). Although existing studies have predominantly focused on data-driven approaches for arithmetic operations and symbolic manipulation tasks, these approaches tend to be task-specific with limited overall performance. To pursue a more general solution, this paper focuses on a broader case of reasoning problems that are computable, i.e., problems that algorithms can solve, thus can be solved by the Turing Machine. From this perspective, this paper proposes Turing MAchine Imitation Learning (TAIL) to improve the length generalization ability of LLMs. TAIL synthesizes chain-of-thoughts (CoT) data that imitate the execution process of a Turing Machine by computer programs, which linearly expands the reasoning steps into atomic states to alleviate shortcut learning and explicit memory fetch mechanism to reduce the difficulties of dynamic and long-range data access in elementary operations. To validate the reliability and universality of TAIL, we construct a challenging synthetic dataset covering 8 classes of algorithms and 18 tasks. Without bells and whistles, TAIL significantly improves the length generalization ability as well as the performance of Qwen2.5-7B on various tasks using only synthetic data, surpassing previous methods and DeepSeek-R1. The experimental results reveal that the key concepts in the Turing Machine, instead of the thinking styles, are indispensable for TAIL for length generalization, through which the model exhibits read-and-write behaviors consistent with the properties of the Turing Machine in their attention layers. This work provides a promising direction for future research in the learning of LLM reasoning from synthetic data.
中文摘要
长度泛化,即解决比训练时所见序列更长的问题的能力,是基于Transformer的大语言模型(LLM)面临的核心挑战。尽管现有研究主要集中于算术运算和符号操作任务的数据驱动方法,但这些方法往往是任务特定的,整体性能有限。为寻求更通用的解决方案,本文关注更广泛的一类可计算推理问题,即算法可解、因而可由图灵机求解的问题。从这一视角出发,本文提出图灵机模仿学习(TAIL),以提升LLM的长度泛化能力。TAIL通过计算机程序合成模仿图灵机执行过程的思维链(CoT)数据:将推理步骤线性展开为原子状态以缓解捷径学习,并引入显式的内存读取机制以降低基本操作中动态、长程数据访问的难度。为验证TAIL的可靠性与普适性,我们构建了一个具有挑战性的合成数据集,涵盖8类算法和18个任务。在仅使用合成数据、不加任何额外技巧的情况下,TAIL显著提升了Qwen2.5-7B在各类任务上的长度泛化能力与性能,超越了以往方法和DeepSeek-R1。实验结果表明,对长度泛化而言,图灵机中的关键概念(而非思维风格)是TAIL不可或缺的部分;借助这些概念,模型在注意力层中表现出与图灵机性质一致的读写行为。这项工作为未来从合成数据中学习LLM推理的研究提供了一个有前景的方向。
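下面给出一个把图灵机执行过程线性展开为"原子状态"文本轨迹的玩具示例,用以直观说明这类可充当CoT监督信号的数据是如何合成的。状态转移表、文本格式和示例任务均为示意,并非论文所用的具体数据格式:

```python
def run_tm_with_trace(tape, rules, state="q0", head=0, max_steps=50):
    """逐步模拟单带图灵机,并把每一步展开为一条原子状态记录(读取->写入->移动->转移)。"""
    tape = list(tape)
    trace = []
    for step in range(max_steps):
        if head == len(tape):
            tape.append("_")  # 纸带按需向右扩展
        symbol = tape[head]
        if (state, symbol) not in rules:
            trace.append(f"step {step}: state={state}, read='{symbol}' -> halt")
            break
        new_state, write, move = rules[(state, symbol)]
        trace.append(f"step {step}: state={state}, head={head}, read='{symbol}', "
                     f"write='{write}', move={move}, next_state={new_state}")
        tape[head] = write
        head += 1 if move == "R" else -1
        state = new_state
    return "".join(tape), "\n".join(trace)

# 示例任务:把一串 '1' 全部翻转成 '0',读到空白符 '_' 时停机
rules = {("q0", "1"): ("q0", "0", "R")}
final_tape, cot = run_tm_with_trace("111_", rules)
print(cot)          # 可作为训练数据中的"思维链"部分
print(final_tape)   # -> 000_
```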
视觉语言视觉自动编码器:扩散模型的可扩展知识蒸馏
- 标题: Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
- 作者: Tiezheng Zhang, Yitong Li, Yu-cheng Chou, Jieneng Chen, Alan Yuille, Chen Wei, Junfei Xiao
- 日期: 2025-07-09
- ArXiv主页 : https://arxiv.org/abs/2507.07104
- 论文链接 : https://arxiv.org/pdf/2507.07104
- 项目链接 : https://lambert-x.github.io/Vision-Language-Vision/
- gitHub仓库 : https://github.com/Tiezheng11/Vision-Language-Vision
英文摘要
Building state-of-the-art Vision-Language Models (VLMs) with strong captioning capabilities typically necessitates training on billions of high-quality image-text pairs, requiring millions of GPU hours. This paper introduces the Vision-Language-Vision (VLV) auto-encoder framework, which strategically leverages key pretrained components: a vision encoder, the decoder of a Text-to-Image (T2I) diffusion model, and subsequently, a Large Language Model (LLM). Specifically, we establish an information bottleneck by regularizing the language representation space, achieved through freezing the pretrained T2I diffusion decoder. Our VLV pipeline effectively distills knowledge from the text-conditioned diffusion model using continuous embeddings, demonstrating comprehensive semantic understanding via high-quality reconstructions. Furthermore, by fine-tuning a pretrained LLM to decode the intermediate language representations into detailed descriptions, we construct a state-of-the-art (SoTA) captioner comparable to leading models like GPT-4o and Gemini 2.0 Flash. Our method demonstrates exceptional cost-efficiency and significantly reduces data requirements; by primarily utilizing single-modal images for training and maximizing the utility of existing pretrained models (image encoder, T2I diffusion model, and LLM), it circumvents the need for massive paired image-text datasets, keeping the total training expenditure under $1,000 USD.
中文摘要
构建具备强大描述(captioning)能力的最先进视觉语言模型(VLM)通常需要在数十亿高质量图文对上训练,耗费数百万GPU小时。本文提出视觉-语言-视觉(VLV)自编码器框架,有策略地利用三类关键预训练组件:视觉编码器、文本到图像(T2I)扩散模型的解码器,以及一个大语言模型(LLM)。具体而言,我们通过对语言表示空间进行正则化来建立信息瓶颈,这一点通过冻结预训练的T2I扩散解码器来实现。我们的VLV管线使用连续嵌入,从文本条件扩散模型中有效蒸馏知识,并通过高质量重建展示出全面的语义理解能力。此外,通过微调一个预训练LLM,将中间语言表示解码为详细描述,我们构建出可与GPT-4o和Gemini 2.0 Flash等领先模型相媲美的最先进(SoTA)描述生成器。我们的方法具有出色的成本效率,并显著降低了数据需求:主要利用单模态图像进行训练,并最大化现有预训练模型(图像编码器、T2I扩散模型和LLM)的效用,从而避免了对大规模配对图文数据集的需求,使总训练开销控制在1000美元以内。
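下面用一个极简的PyTorch风格伪模块示意VLV自编码器的信息流:视觉编码器 → 受正则约束的语言表示瓶颈 → 冻结的T2I扩散解码器;随后再用LLM把瓶颈表示解码为文字描述。各模块均为占位实现,仅用于说明结构,并非论文的真实网络:

```python
import torch
import torch.nn as nn

class VLVAutoEncoder(nn.Module):
    """示意:image -> 视觉编码器 -> 连续语言表示(瓶颈) -> 冻结的T2I解码器 -> 重建。"""
    def __init__(self, dim=768):
        super().__init__()
        self.vision_encoder = nn.Sequential(nn.Conv2d(3, dim, 16, 16), nn.Flatten(2))  # 占位编码器
        self.to_lang_tokens = nn.Linear(dim, dim)       # 投影到"语言表示空间"(瓶颈)
        self.t2i_decoder = nn.Linear(dim, 3 * 16 * 16)  # 占位:真实实现中是冻结的预训练扩散解码器
        for p in self.t2i_decoder.parameters():
            p.requires_grad = False                     # 冻结解码器,迫使瓶颈对齐其文本条件空间

    def forward(self, images):
        feats = self.vision_encoder(images).transpose(1, 2)  # [B, N, dim]
        lang_tokens = self.to_lang_tokens(feats)              # 连续的"伪文本"嵌入
        recon = self.t2i_decoder(lang_tokens)                 # 由冻结解码器重建(示意)
        return lang_tokens, recon

model = VLVAutoEncoder()
lang_tokens, recon = model(torch.randn(2, 3, 224, 224))
# 第二阶段(示意):再用一个可微调的LLM把 lang_tokens 解码成详细的图像描述
```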
PhysX:物理基础的3D资产生成
- 标题: PhysX: Physical-Grounded 3D Asset Generation
- 作者: Ziang Cao, Zhaoxi Chen, Liang Pan, Ziwei Liu
- 日期: 2025-07-16
- ArXiv主页 : https://arxiv.org/abs/2507.12465
- 论文链接 : https://arxiv.org/pdf/2507.12465
- 项目链接 : https://physx-3d.github.io/
- gitHub仓库 : https://github.com/ziangcao0312/PhysX
英文摘要
3D modeling is moving from virtual to physical. Existing 3D generation primarily emphasizes geometries and textures while neglecting physical-grounded modeling. Consequently, despite the rapid development of 3D generative models, the synthesized 3D assets often overlook rich and important physical properties, hampering their real-world application in physical domains like simulation and embodied AI. As an initial attempt to address this challenge, we propose PhysX, an end-to-end paradigm for physical-grounded 3D asset generation. 1) To bridge the critical gap in physics-annotated 3D datasets, we present PhysXNet - the first physics-grounded 3D dataset systematically annotated across five foundational dimensions: absolute scale, material, affordance, kinematics, and function description. In particular, we devise a scalable human-in-the-loop annotation pipeline based on vision-language models, which enables efficient creation of physics-first assets from raw 3D assets.2) Furthermore, we propose PhysXGen, a feed-forward framework for physics-grounded image-to-3D asset generation, injecting physical knowledge into the pre-trained 3D structural space. Specifically, PhysXGen employs a dual-branch architecture to explicitly model the latent correlations between 3D structures and physical properties, thereby producing 3D assets with plausible physical predictions while preserving the native geometry quality. Extensive experiments validate the superior performance and promising generalization capability of our framework. All the code, data, and models will be released to facilitate future research in generative physical AI.
中文摘要
3D建模正在从虚拟走向物理。现有3D生成主要强调几何与纹理,而忽视了物理基础的建模。因此,尽管3D生成模型发展迅速,合成的3D资产往往缺失丰富而重要的物理属性,阻碍了它们在仿真和具身智能等物理领域的实际应用。作为应对这一挑战的初步尝试,我们提出PhysX,一个面向物理基础3D资产生成的端到端范式。1)为弥补物理标注3D数据集的关键空白,我们提出PhysXNet——首个在五个基础维度(绝对尺度、材质、可供性、运动学和功能描述)上进行系统标注的物理基础3D数据集。特别地,我们基于视觉语言模型设计了可扩展的人在回路标注管线,能够高效地将原始3D资产转化为物理优先的资产。2)进一步地,我们提出PhysXGen,一个用于物理基础图像到3D资产生成的前馈框架,将物理知识注入预训练的3D结构空间。具体而言,PhysXGen采用双分支架构,显式建模3D结构与物理属性之间的潜在关联,从而在保持原有几何质量的同时,生成具有合理物理预测的3D资产。大量实验验证了我们框架的优越性能与良好的泛化能力。所有代码、数据与模型都将发布,以促进生成式物理AI的未来研究。
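下面给出一个按摘要所述五个基础维度组织物理标注的极简数据结构示意。字段名与取值均为假设,实际格式请以PhysXNet的官方发布为准:

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalAnnotation:
    # PhysXNet 的五个基础标注维度(字段名仅为示意)
    absolute_scale_m: float                            # 绝对尺度:物体最长边的真实长度(米)
    material: str                                      # 材质,例如 "wood" / "steel" / "ceramic"
    affordance: list = field(default_factory=list)     # 可供性,例如 ["grasp", "pour"]
    kinematics: dict = field(default_factory=dict)     # 运动学,例如关节类型与运动范围
    function_desc: str = ""                            # 功能描述(自然语言)

mug = PhysicalAnnotation(
    absolute_scale_m=0.10,
    material="ceramic",
    affordance=["grasp", "contain", "pour"],
    kinematics={"joint": "fixed"},
    function_desc="A mug used to hold and drink hot liquids.",
)
```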
AnyCap项目:面向可控全模态描述生成的统一框架、数据集与基准
-
标题: AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
-
作者: Yiming Ren, Zhiqiang Lin, Yu Li, Gao Meng, Weiyun Wang, Junjie Wang, Zicheng Lin, Jifeng Dai, Yujiu Yang, Wenhai Wang, Ruihang Chu
-
日期: 2025-07-17
-
ArXiv主页 : https://arxiv.org/abs/2507.12841
-
gitHub仓库 : https://github.com/qishisuren123/AnyCap
英文摘要
Controllable captioning is essential for precise multimodal alignment and instruction following, yet existing models often lack fine-grained control and reliable evaluation protocols. To address this gap, we present the AnyCap Project, an integrated solution spanning model, dataset, and evaluation. We introduce AnyCapModel (ACM), a lightweight plug-and-play framework that enhances the controllability of existing foundation models for omni-modal captioning without retraining the base model. ACM reuses the original captions from base models while incorporating user instructions and modality features to generate improved captions. To remedy the data scarcity in controllable multimodal captioning, we build AnyCapDataset (ACD), covering three modalities, 28 user-instruction types, and 300,k high-quality data entries. We further propose AnyCapEval, a new benchmark that provides more reliable evaluation metrics for controllable captioning by decoupling content accuracy and stylistic fidelity. ACM markedly improves caption quality across a diverse set of base models on AnyCapEval. Notably, ACM-8B raises GPT-4o's content scores by 45% and style scores by 12%, and it also achieves substantial gains on widely used benchmarks such as MIA-Bench and VidCapBench.
中文摘要
可控描述生成(captioning)对于精确的多模态对齐与指令遵循至关重要,然而现有模型通常缺乏细粒度控制和可靠的评估协议。为弥补这一空白,我们提出AnyCap项目,一个涵盖模型、数据集与评估的一体化解决方案。我们提出AnyCapModel(ACM),这是一个轻量级即插即用框架,可在不重新训练基础模型的前提下,增强现有基础模型在全模态描述生成上的可控性。ACM复用基础模型生成的原始描述,同时结合用户指令与模态特征来生成改进后的描述。为缓解可控多模态描述生成的数据稀缺问题,我们构建了AnyCapDataset(ACD),涵盖三种模态、28种用户指令类型和30万条高质量数据。我们进一步提出AnyCapEval,这一新基准通过将内容准确性与风格忠实度解耦,为可控描述生成提供更可靠的评估指标。ACM在AnyCapEval上显著提升了多种基础模型的描述质量。值得注意的是,ACM-8B将GPT-4o的内容得分提高了45%、风格得分提高了12%,并在MIA-Bench和VidCapBench等广泛使用的基准上也取得了可观提升。
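下面用一段简短的伪代码示意ACM"即插即用"的工作方式:不重训基础模型,而是把基础模型产生的原始描述、用户指令和模态特征一并送入轻量模型,输出改进后的描述。其中的接口与提示词均为假设,并非官方实现:

```python
def anycap_refine(acm_model, base_caption, user_instruction, modality_features):
    """示意:ACM 复用基础模型的原始描述,按用户指令重写出可控的新描述。"""
    prompt = (
        "Original caption: " + base_caption + "\n"
        "Instruction: " + user_instruction + "\n"
        "Rewrite the caption following the instruction."
    )
    # acm_model.generate 为假设的生成接口;modality_features 表示图像/视频/音频特征
    return acm_model.generate(prompt, extra_inputs=modality_features)

# 用法示意(base_model 与 acm_model 均为假设对象):
# base_caption = base_model.caption(video)
# better_caption = anycap_refine(acm_model, base_caption, "只描述人物动作,不超过20个字", video_features)
```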
SWE-Perf:语言模型能否优化真实世界代码仓库中的代码性能?
- 标题: SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
- 作者: Xinyi He, Qian Liu, Mingzhe Du, Lin Yan, Zhijie Fan, Yiming Huang, Zejian Yuan, Zejun Ma
- 日期: 2025-07-16
- ArXiv主页 : https://arxiv.org/abs/2507.12415
- 论文链接 : https://arxiv.org/pdf/2507.12415
- 项目链接 : https://swe-perf.github.io/
- gitHub仓库 : https://github.com/SWE-Perf/SWE-Perf
英文摘要
Code performance optimization is paramount in real-world software engineering and critical for production-level systems. While Large Language Models (LLMs) have demonstrated impressive capabilities in code generation and bug fixing, their proficiency in enhancing code performance at the repository level remains largely unexplored. To address this gap, we introduce SWE-Perf, the first benchmark specifically designed to systematically evaluate LLMs on code performance optimization tasks within authentic repository contexts. SWE-Perf comprises 140 carefully curated instances, each derived from performance-improving pull requests from popular GitHub repositories. Each benchmark instance includes the relevant codebase, target functions, performance-related tests, expert-authored patches, and executable environments. Through a comprehensive evaluation of representative methods that span file-level and repo-level approaches (e.g., Agentless and OpenHands), we reveal a substantial capability gap between existing LLMs and expert-level optimization performance, highlighting critical research opportunities in this emerging field.
中文摘要
在真实世界的软件工程中,代码性能优化至关重要,对生产级系统更是关键。尽管大语言模型(LLM)在代码生成与缺陷修复方面展现出令人印象深刻的能力,但它们在仓库级别提升代码性能的能力在很大程度上仍未被探索。为弥补这一空白,我们提出SWE-Perf,这是首个专门设计、用于在真实仓库上下文中系统评估LLM代码性能优化能力的基准。SWE-Perf包含140个精心构建的实例,每个实例都来自流行GitHub仓库中提升性能的pull request。每个基准实例包含相关代码库、目标函数、性能相关测试、专家编写的补丁以及可执行环境。通过对涵盖文件级与仓库级方法(例如Agentless和OpenHands)的代表性方法进行全面评估,我们揭示了现有LLM与专家级优化性能之间存在显著差距,凸显了这一新兴领域中的重要研究机会。
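下面用一个简化的示意说明这类基准通常如何衡量"性能优化是否成功":在可执行环境中分别计时打补丁前后的性能测试,再比较加速比。实例字段与判定逻辑均为假设,并非SWE-Perf的官方评测脚本:

```python
import subprocess
import time

def run_perf_test(repo_dir, test_cmd, repeats=3):
    """在给定仓库目录下重复运行性能相关测试,返回最短耗时(秒)。"""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(test_cmd, cwd=repo_dir, shell=True, check=True)
        best = min(best, time.perf_counter() - start)
    return best

def evaluate_instance(repo_dir, test_cmd, apply_patch):
    """apply_patch() 把模型生成的补丁应用到仓库;返回加速比(>1 表示有提升)。"""
    baseline = run_perf_test(repo_dir, test_cmd)
    apply_patch()
    optimized = run_perf_test(repo_dir, test_cmd)
    return baseline / optimized
```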
KV缓存引导以诱导小语言模型的推理
-
标题: KV Cache Steering for Inducing Reasoning in Small Language Models
-
作者: Max Belitsky, Dawid J. Kopiczko, Michael Dorkenwald, M. Jehanzeb Mirza, Cees G. M. Snoek, Yuki M. Asano
-
日期: 2025-07-11
-
ArXiv主页 : https://arxiv.org/abs/2507.08799
英文摘要
We propose cache steering, a lightweight method for implicit steering of language models via a one-shot intervention applied directly to the key-value cache. To validate its effectiveness, we apply cache steering to induce chain-of-thought reasoning in small language models. Our approach leverages GPT-4o-generated reasoning traces to construct steering vectors that shift model behavior toward more explicit, multi-step reasoning without fine-tuning or prompt modifications. Experimental evaluations on diverse reasoning benchmarks demonstrate that cache steering improves both the qualitative structure of model reasoning and quantitative task performance. Compared to prior activation steering techniques that require continuous interventions, our one-shot cache steering offers substantial advantages in terms of hyperparameter stability, inference-time efficiency, and ease of integration, making it a more robust and practical solution for controlled generation.
中文摘要
我们提出缓存引导(cache steering),这是一种轻量级方法,通过直接作用于键值缓存(KV cache)的一次性干预来隐式引导语言模型。为验证其有效性,我们应用缓存引导在小语言模型中诱导思维链推理。我们的方法利用GPT-4o生成的推理轨迹构建引导向量,使模型行为转向更显式的多步推理,而无需微调或修改提示。在多种推理基准上的实验评估表明,缓存引导既改善了模型推理的定性结构,也提升了定量任务性能。与需要持续干预的已有激活引导技术相比,我们的一次性缓存引导在超参数稳定性、推理时效率和易集成性方面具有显著优势,是一种更稳健、更实用的受控生成方案。
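下面是一个在transformers中对KV缓存做一次性干预的简化示意:用"带推理开场白"与"不带"的同一问题做对比差分,得到逐层的K/V引导向量,再把它一次性加到新问题缓存末位的键值上。引导向量的构建方式较论文大幅简化,模型ID仅作示例,且代码假设缓存为旧式逐层 (key, value) 元组格式,不同transformers版本的Cache接口可能需要调整:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # 任意小模型,仅作示例
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def encode(text):
    return tok(text, return_tensors="pt").input_ids

def prompt_cache(ids):
    """前向一次,返回旧式 (key, value) 元组格式的KV缓存。"""
    pkv = model(ids, use_cache=True).past_key_values
    if hasattr(pkv, "to_legacy_cache"):  # 新版transformers返回Cache对象,转成旧式元组
        pkv = pkv.to_legacy_cache()
    return pkv

with torch.no_grad():
    # 1) 构造引导向量:取两种提示下最后一个位置的K/V之差(简化版的对比差分)
    pos = prompt_cache(encode("Q: 17*24=? Let's think step by step."))
    neg = prompt_cache(encode("Q: 17*24=?"))
    steer = [(pk[..., -1:, :] - nk[..., -1:, :], pv[..., -1:, :] - nv[..., -1:, :])
             for (pk, pv), (nk, nv) in zip(pos, neg)]

    # 2) 对新问题缓存除最后一个token外的前缀,并把引导向量一次性加到缓存末位
    prompt_ids = encode("Q: 13*35=?")
    cache = prompt_cache(prompt_ids[:, :-1])
    steered = []
    for (k, v), (sk, sv) in zip(cache, steer):
        k, v = k.clone(), v.clone()
        k[..., -1:, :] += sk
        v[..., -1:, :] += sv
        steered.append((k, v))
    out = model.generate(prompt_ids, past_key_values=tuple(steered), max_new_tokens=64)

print(tok.decode(out[0], skip_special_tokens=True))
```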
RiemannLoRA:用于无歧义LoRA优化的统一黎曼框架
- 标题: RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization
- 作者: Vladimir Bogachev, Vladimir Aletov, Alexander Molozhavenko, Denis Bobkov, Vera Soboleva, Aibek Alanov, Maxim Rakhuba
- 日期: 2025-07-16
- ArXiv主页 : https://arxiv.org/abs/2507.12142
- 论文链接 : https://arxiv.org/pdf/2507.12142
英文摘要
Low-Rank Adaptation (LoRA) has become a widely adopted standard for parameter-efficient fine-tuning of large language models (LLMs), significantly reducing memory and computational demands. However, challenges remain, including finding optimal initialization strategies or mitigating overparametrization in low-rank matrix factorization. In this work, we propose a novel approach that addresses both of the challenges simultaneously within a unified framework. Our method treats a set of fixed-rank LoRA matrices as a smooth manifold. Considering adapters as elements on this manifold removes overparametrization, while determining the direction of the fastest loss decrease along the manifold provides initialization. Special care is taken to obtain numerically stable and computationally efficient implementation of our method, using best practices from numerical linear algebra and Riemannian optimization. Experimental results on LLM and diffusion model architectures demonstrate that RiemannLoRA consistently improves both convergence speed and final performance over standard LoRA and its state-of-the-art modifications.
中文摘要
低秩适应(LoRA)已成为大语言模型(LLM)参数高效微调的广泛采用标准,可显著降低内存与计算需求。然而仍存在一些挑战,包括寻找最优初始化策略,以及缓解低秩矩阵分解中的过参数化问题。在这项工作中,我们提出一种新方法,在统一框架内同时解决这两个挑战。我们的方法将固定秩LoRA矩阵的集合视为一个光滑流形:将适配器视为该流形上的元素消除了过参数化,而确定沿流形损失下降最快的方向则给出了初始化。我们借助数值线性代数与黎曼优化的最佳实践,特别注意使方法的实现数值稳定且计算高效。在LLM与扩散模型架构上的实验结果表明,RiemannLoRA在收敛速度与最终性能上均稳定优于标准LoRA及其最先进的改进版本。
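用标准LoRA记号可以简单说明这里的"过参数化"与"固定秩流形"指什么(以下为根据摘要整理的符号化表述,并非论文原文):

```latex
% LoRA 参数化:
W = W_0 + BA, \qquad B \in \mathbb{R}^{m \times r},\ A \in \mathbb{R}^{r \times n}.
% 规范自由度(过参数化):对任意可逆 G \in GL(r),有 (BG)(G^{-1}A) = BA,
% 即不同的 (B, A) 对应同一个增量 \Delta W。
% RiemannLoRA 把增量视为固定秩矩阵流形上的点,从而消除这一歧义:
\mathcal{M}_r = \{\, \Delta W \in \mathbb{R}^{m \times n} : \operatorname{rank}(\Delta W) = r \,\},
% 初始化则取损失 L 在流形切空间内下降最快的方向(示意):
\Delta W_{\mathrm{init}} \ \propto\ -\, P_{T\mathcal{M}_r}\big(\nabla_{W} L(W_0)\big).
```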
EmbRACE-3K:复杂环境中的具身推理与行动
- 标题: EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
- 作者: Mingxian Lin, Wei Huang, Yitang Li, Chengjie Jiang, Kui Wu, Fangwei Zhong, Shengju Qian, Xin Wang, Xiaojuan Qi
- 日期: 2025-07-14
- ArXiv主页 : https://arxiv.org/abs/2507.10548
- 论文链接 : https://arxiv.org/pdf/2507.10548
- 项目链接 : https://mxllc.github.io/EmbRACE-3K
- gitHub仓库 : https://github.com/mxllc/EmbRACE-3K
英文摘要
Recent advanced vision-language models(VLMs) have demonstrated strong performance on passive, offline image and video understanding tasks. However, their effectiveness in embodied settings, which require online interaction and active scene understanding remains limited. In such scenarios, an agent perceives the environment from a first-person perspective, with each action dynamically shaping subsequent observations. Even state-of-the-art models such as GPT-4o, Claude 3.5 Sonnet, and Gemini 2.5 Pro struggle in open-environment interactions, exhibiting clear limitations in spatial reasoning and long-horizon planning. To address this gap, we introduce EmRACE-3K, a dataset of over 3,000 language-guided tasks situated in diverse, photorealistic environments constructed using Unreal Engine and the UnrealCV-Zoo framework. The tasks encompass a wide range of embodied challenges, including navigation, object manipulation, and multi-stage goal execution. Each task unfolds as a multi-step trajectory, pairing first-person visual observations with high-level instructions, grounded actions, and natural language rationales that express the agent's intent at every step. Using EmRACE-3K, we establish a benchmark to evaluate the embodied reasoning capabilities of VLMs across three key dimensions: Exploration, Dynamic Spatial-Semantic Reasoning, and Multi-stage Goal Execution. In zero-shot settings, all models achieve success rates below 20%, underscoring the challenge posed by our benchmark and the current limitations of VLMs in interactive environments. To demonstrate the utility of EmRACE-3K, we further fine-tune Qwen2.5-VL-7B using supervised learning followed by reinforcement learning. This approach yields substantial improvements across all three challenge categories, highlighting the dataset's effectiveness in enabling the development of embodied reasoning capabilities.
中文摘要
近期的先进视觉语言模型(VLM)在被动、离线的图像与视频理解任务上表现出色。然而,在需要在线交互与主动场景理解的具身场景中,它们的有效性仍然有限。在此类场景中,智能体以第一人称视角感知环境,每个动作都会动态地影响后续观测。即便是GPT-4o、Claude 3.5 Sonnet和Gemini 2.5 Pro这样的最先进模型,在开放环境交互中也表现吃力,在空间推理与长时程规划方面暴露出明显局限。为弥补这一差距,我们提出EmbRACE-3K,这是一个包含3000多个语言引导任务的数据集,任务处于使用Unreal Engine和UnrealCV-Zoo框架构建的多样化、照片级真实感环境中,涵盖导航、物体操作和多阶段目标执行等广泛的具身挑战。每个任务以多步轨迹的形式展开,将第一人称视觉观测与高层指令、落地的动作以及在每一步表达智能体意图的自然语言理由配对。基于EmbRACE-3K,我们建立了一个基准,从三个关键维度评估VLM的具身推理能力:探索、动态空间-语义推理和多阶段目标执行。在零样本设置下,所有模型的成功率都低于20%,凸显了我们基准的挑战性以及当前VLM在交互式环境中的局限。为展示EmbRACE-3K的价值,我们进一步用监督学习加强化学习的方式微调Qwen2.5-VL-7B。该方法在全部三个挑战类别上都带来显著提升,突显了该数据集在培养具身推理能力方面的有效性。
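下面用一个简短的数据结构示意摘要中描述的多步轨迹样本:每一步包含第一人称观测、高层指令、落地动作与自然语言理由。字段名与取值均为假设,并非官方数据格式:

```python
from dataclasses import dataclass

@dataclass
class EmbodiedStep:
    observation: str   # 第一人称视觉观测(此处以图像路径示意)
    instruction: str   # 高层语言指令
    action: str        # 落地的动作,例如 "turn_left" / "grasp(cup)"
    rationale: str     # 在该步骤中表达智能体意图的自然语言理由

trajectory = [
    EmbodiedStep("frame_000.png", "把杯子放到桌子上", "turn_left", "先转向厨房台面寻找杯子"),
    EmbodiedStep("frame_001.png", "把杯子放到桌子上", "grasp(cup)", "杯子已在可达范围内,先抓取"),
]
```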
最佳数据混合物的缩放定律
- 标题: Scaling Laws for Optimal Data Mixtures
- 作者: Mustafa Shukor, Louis Bethune, Dan Busbridge, David Grangier, Enrico Fini, Alaaeldin El-Nouby, Pierre Ablin
- 日期: 2025-07-12
- ArXiv主页 : https://arxiv.org/abs/2507.09404
- 论文链接 : https://arxiv.org/pdf/2507.09404
英文摘要
Large foundation models are typically trained on data from multiple domains, with the data mixture--the proportion of each domain used--playing a critical role in model performance. The standard approach to selecting this mixture relies on trial and error, which becomes impractical for large-scale pretraining. We propose a systematic method to determine the optimal data mixture for any target domain using scaling laws. Our approach accurately predicts the loss of a model of size N trained with D tokens and a specific domain weight vector h. We validate the universality of these scaling laws by demonstrating their predictive power in three distinct and large-scale settings: large language model (LLM), native multimodal model (NMM), and large vision models (LVM) pretraining. We further show that these scaling laws can extrapolate to new data mixtures and across scales: their parameters can be accurately estimated using a few small-scale training runs, and used to estimate the performance at larger scales and unseen domain weights. The scaling laws allow to derive the optimal domain weights for any target domain under a given training budget (N,D), providing a principled alternative to costly trial-and-error methods.
中文摘要
大型基础模型通常在来自多个领域的数据上训练,而数据混合比例(即各领域数据所占比例)对模型性能起着关键作用。选择混合比例的标准做法依赖反复试错,这在大规模预训练中并不现实。我们提出一种系统方法,利用缩放定律为任意目标领域确定最优数据混合。我们的方法能够准确预测规模为N、用D个token和特定领域权重向量h训练的模型的损失。我们在三种不同的大规模设置中验证了这些缩放定律的普适性:大语言模型(LLM)、原生多模态模型(NMM)和大型视觉模型(LVM)的预训练。我们进一步表明,这些缩放定律可以外推到新的数据混合与不同规模:其参数可以通过少量小规模训练准确估计,并用于估计更大规模和未见过的领域权重下的性能。缩放定律使我们能够在给定训练预算(N,D)下,为任意目标领域推导出最优领域权重,为代价高昂的试错方法提供了有原则的替代方案。
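摘要中"在给定训练预算下为目标领域求最优混合"的问题可以形式化如下。缩放定律的具体函数形式摘要中并未给出,这里只写出一般形式作为示意:

```latex
% 设拟合得到的缩放定律 \hat{L}_{\mathrm{target}}(N, D, h) 预测:
% 规模为 N 的模型,用 D 个 token、按领域权重 h = (h_1, \dots, h_K) 混合训练后,
% 在目标领域上的损失。最优混合即单纯形上的约束最小化问题:
h^{\star}(N, D) \;=\; \arg\min_{h \in \Delta_{K-1}} \hat{L}_{\mathrm{target}}(N, D, h),
\qquad
\Delta_{K-1} = \Big\{ h \in \mathbb{R}^{K}_{\ge 0} : \textstyle\sum_{i=1}^{K} h_i = 1 \Big\}.
```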
DrafterBench:面向土木工程任务自动化的大型语言模型基准测试
-
标题: DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
-
作者: Yinsheng Li, Zhen Dong, Yi Shao
-
日期: 2025-07-15
-
ArXiv主页 : https://arxiv.org/abs/2507.11527
-
gitHub仓库 : https://github.com/Eason-Li-AIS/DrafterBench
英文摘要
Large Language Model (LLM) agents have shown great potential for solving real-world problems and promise to be a solution for tasks automation in industry. However, more benchmarks are needed to systematically evaluate automation agents from an industrial perspective, for example, in Civil Engineering. Therefore, we propose DrafterBench for the comprehensive evaluation of LLM agents in the context of technical drawing revision, a representation task in civil engineering. DrafterBench contains twelve types of tasks summarized from real-world drawing files, with 46 customized functions/tools and 1920 tasks in total. DrafterBench is an open-source benchmark to rigorously test AI agents' proficiency in interpreting intricate and long-context instructions, leveraging prior knowledge, and adapting to dynamic instruction quality via implicit policy awareness. The toolkit comprehensively assesses distinct capabilities in structured data comprehension, function execution, instruction following, and critical reasoning. DrafterBench offers detailed analysis of task accuracy and error statistics, aiming to provide deeper insight into agent capabilities and identify improvement targets for integrating LLMs in engineering applications. Our benchmark is available at https://github.com/Eason-Li-AIS/DrafterBench, with the test set hosted at https://huggingface.co/datasets/Eason666/DrafterBench.
中文摘要
大语言模型(LLM)智能体在解决真实世界问题上展现出巨大潜力,有望成为工业任务自动化的解决方案。然而,从工业视角(例如土木工程)系统地评估自动化智能体还需要更多基准。因此,我们提出DrafterBench,用于在技术图纸修订(土木工程中的一类代表性任务)场景下对LLM智能体进行全面评估。DrafterBench包含从真实图纸文件中总结出的12类任务,共46个自定义函数/工具和1920个任务。DrafterBench是一个开源基准,用于严格测试AI智能体在以下方面的能力:理解复杂的长上下文指令、利用先验知识,以及通过隐式策略意识适应动态变化的指令质量。该工具包全面评估结构化数据理解、函数执行、指令遵循和批判性推理等不同能力。DrafterBench提供任务准确率与错误统计的详细分析,旨在更深入地洞察智能体能力,并为在工程应用中集成LLM找出改进方向。我们的基准可在https://github.com/Eason-Li-AIS/DrafterBench获取,测试集托管在https://huggingface.co/datasets/Eason666/DrafterBench。
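下面是这类"工具调用型"基准常见评测流程的一个简化示意:智能体针对每条指令产生工具调用序列,与期望调用比对并累计准确率与错误统计。任务字段、智能体接口与判分逻辑均为假设,实际评测请以官方仓库中的脚本为准:

```python
def evaluate(tasks, agent, tools):
    """tasks: [{"instruction": ..., "expected_calls": [...]}, ...];
    agent(instruction, tools) 返回其实际发起的工具调用序列(均为假设的接口)。"""
    stats = {"correct": 0, "total": 0, "errors": []}
    for task in tasks:
        stats["total"] += 1
        try:
            calls = agent(task["instruction"], tools)
            if calls == task["expected_calls"]:
                stats["correct"] += 1
            else:
                stats["errors"].append({"task": task["instruction"], "got": calls})
        except Exception as e:  # 工具调用失败也计入错误统计
            stats["errors"].append({"task": task["instruction"], "exception": repr(e)})
    stats["accuracy"] = stats["correct"] / max(stats["total"], 1)
    return stats
```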
一个Token就能愚弄LLM-as-a-Judge
- 标题: One Token to Fool LLM-as-a-Judge
- 作者: Yulai Zhao, Haolin Liu, Dian Yu, S. Y. Kung, Haitao Mi, Dong Yu
- 日期: 2025-07-11
- ArXiv主页 : https://arxiv.org/abs/2507.08794
- 论文链接 : https://arxiv.org/pdf/2507.08794
英文摘要
Generative reward models (also known as LLMs-as-judges), which use large language models (LLMs) to evaluate answer quality, are increasingly adopted in reinforcement learning with verifiable rewards (RLVR). They are often preferred over rigid rule-based metrics, especially for complex reasoning tasks involving free-form outputs. In this paradigm, an LLM is typically prompted to compare a candidate answer against a ground-truth reference and assign a binary reward indicating correctness. Despite the seeming simplicity of this comparison task, we find that generative reward models exhibit surprising vulnerabilities to superficial manipulations: non-word symbols (e.g., ":" or ".") or reasoning openers like "Thought process:" and "Let's solve this problem step by step." can often lead to false positive rewards. We demonstrate that this weakness is widespread across LLMs, datasets, and prompt formats, posing a serious threat for core algorithmic paradigms that rely on generative reward models, such as rejection sampling, preference optimization, and RLVR. To mitigate this issue, we introduce a simple yet effective data augmentation strategy and train a new generative reward model with substantially improved robustness. Our findings highlight the urgent need for more reliable LLM-based evaluation methods. We release our robust, general-domain reward model and its synthetic training data at https://huggingface.co/sarosavo/Master-RM and https://huggingface.co/datasets/sarosavo/Master-RM.
中文摘要
生成式奖励模型(也称LLM-as-a-judge)使用大语言模型(LLM)评估答案质量,在带可验证奖励的强化学习(RLVR)中被越来越多地采用。相比僵硬的基于规则的指标,人们通常更偏好它们,尤其是在涉及自由形式输出的复杂推理任务中。在这一范式下,通常会提示LLM将候选答案与标准参考答案进行比较,并给出指示正确与否的二元奖励。尽管这一比较任务看似简单,我们却发现生成式奖励模型对表面操纵表现出令人惊讶的脆弱性:非词语符号(例如":"或".")或诸如"Thought process:"、"Let's solve this problem step by step."之类的推理开场白,往往会导致假阳性奖励。我们证明这一弱点在不同LLM、数据集和提示格式中普遍存在,对依赖生成式奖励模型的核心算法范式(如拒绝采样、偏好优化和RLVR)构成严重威胁。为缓解该问题,我们提出一种简单而有效的数据增强策略,并训练出鲁棒性大幅提升的新生成式奖励模型。我们的发现凸显了对更可靠的基于LLM的评估方法的迫切需求。我们在https://huggingface.co/sarosavo/Master-RM和https://huggingface.co/datasets/sarosavo/Master-RM发布了鲁棒的通用领域奖励模型及其合成训练数据。
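下面用一个小示意说明这种"一个token就能骗过评审模型"的失效模式,以及摘要中提到的数据增强思路的一种直观实现方式(摘要未给出具体做法,此处把仅含推理开场白的回答作为负例加入训练数据,仅作示意;judge_llm接口与提示词均为假设):

```python
JUDGE_PROMPT = (
    "You are a strict grader. Given a reference answer and a candidate answer, "
    "reply with 1 if the candidate is correct, otherwise 0.\n"
    "Reference: {reference}\nCandidate: {candidate}\nGrade:"
)

def judge(judge_llm, reference, candidate):
    """judge_llm 为假设的可调用对象,输入提示词并返回 '0' 或 '1'。"""
    return judge_llm(JUDGE_PROMPT.format(reference=reference, candidate=candidate))

# 失效模式示意:下面这些"候选答案"没有任何实质内容,
# 但脆弱的生成式奖励模型可能仍然给出 1(假阳性):
adversarial_candidates = [":", "Thought process:", "Let's solve this problem step by step."]

def augment_with_negatives(dataset):
    """数据增强示意:为每条样本追加 (推理开场白, 标签=0) 的负例,用于训练更鲁棒的奖励模型。"""
    negatives = [
        {"reference": ex["reference"], "candidate": opener, "label": 0}
        for ex in dataset for opener in adversarial_candidates
    ]
    return dataset + negatives
```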