
Large models · Reinforcement learning · DAPO · VAPO

[RL] The follow-up to DAPO: the VAPO algorithm (VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks)
