【论文速递】2026年第02周(Jan-04-10)(Robotics/Embodied AI/LLM)中文使用 googletrans 翻译,翻译不对的地方以英文为准As language models become increasingly capable, users expect them to provide not only accurate responses but also behaviors aligned with diverse human preferences across a variety of scenarios. To achieve this, Reinforcemen