DeepSeek-Math-V2: A Large Language Model for Mathematics

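Large language models have made significant progress in mathematical reasoning, but rewarding only correct final answers has a fundamental limitation: a correct result does not imply a rigorous derivation. To address this, the research team proposes a framework for "self-verifiable mathematical reasoning": a verifier is trained to assess the rigor of a proof, and that verifier then serves as the reward model for the proof generator. The framework scales verification compute to automatically label hard-to-verify proofs, which keeps improving the verifier. The resulting model, DeepSeekMath-V2, posts strong results on top competitions such as IMO, CMO, and Putnam (for example, 118/120 on Putnam 2024), which suggests that this direction is feasible for building more capable mathematical AI systems. The model is openly available on Hugging Face.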

Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year. However, this approach faces fundamental limitations. Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning. Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable. To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions. Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving. We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in their own proofs before finalizing them. To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier. Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute. While much work remains, these results suggest that self-verifiable mathematical reasoning is a feasible research direction that may help develop more capable mathematical AI systems.
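The generate, verify, and refine idea described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the paper's training or inference procedure: the names `Review`, `self_verified_proof`, `generate`, `verify`, and `refine` are hypothetical, the score range and the "take the worst of several reviews" rule are illustrative choices, and the three callables below are string-based stand-ins for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    """One verifier pass over a proof (hypothetical structure)."""
    score: float       # assumed rigor score in [0, 1]
    issues: List[str]  # problems the verifier pointed out


def self_verified_proof(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Review],
    refine: Callable[[str, str, List[str]], str],
    max_rounds: int = 4,
    n_reviews: int = 3,
    accept: float = 1.0,
) -> Tuple[str, float]:
    """Draft a proof, then repeatedly verify and repair it.

    Returns the final proof and the lowest score among its last reviews.
    """
    proof = generate(problem)
    for _ in range(max_rounds):
        # Spend extra verification compute: several independent reviews.
        reviews = [verify(problem, proof) for _ in range(n_reviews)]
        worst = min(r.score for r in reviews)
        if worst >= accept:
            return proof, worst
        # Collect every distinct issue and ask the generator to fix them.
        issues = sorted({issue for r in reviews for issue in r.issues})
        proof = refine(problem, proof, issues)
    # Round budget exhausted: report the score of the last revision.
    final = min(verify(problem, proof).score for _ in range(n_reviews))
    return proof, final


# Toy stand-ins for the generator and verifier (assumptions, not real models).
def toy_generate(problem: str) -> str:
    return "draft"


def toy_verify(problem: str, proof: str) -> Review:
    if "fixed" in proof:
        return Review(score=1.0, issues=[])
    return Review(score=0.5, issues=["gap in step 2"])


def toy_refine(problem: str, proof: str, issues: List[str]) -> str:
    return proof + " fixed"


result = self_verified_proof("toy problem", toy_generate, toy_verify, toy_refine)
# result == ("draft fixed", 1.0)
```

In this sketch the first draft is rejected by all three toy reviews, repaired once, and accepted on the second round. Taking the minimum score is a deliberately conservative choice: a proof is accepted only when no review finds a problem.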

https://huggingface.co/deepseek-ai/DeepSeek-Math-V2

References

https://ollama.com/t1c/deepseek-math-7b-rl
https://huggingface.co/deepseek-ai/deepseek-math-7b-base/tree/main
