o1 Model

[Preference Alignment] Should a PRM reward the correctness of individual steps? Paper: "Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning"

聚梦小课堂

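Reading the OpenAI GPT o1 Technical Report (5): A Comprehensive Evaluation of, and Reflections on, Safety Alignment, Chain of Thought, and More. Original link: https://openai.com/index/learning-to-reason-with-llms/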
