【Paper Digest】2025 Week 30 (Jul 20-26) (Robotics/Embodied AI/LLM)

The Chinese text was translated with googletrans; where the translation is wrong, the English original takes precedence.

This paper introduces Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant reinforcement learning algorithm for training large language models. Unlike previous algorithms that adopt token-level i