DeepSeek-Math-V2: A Large Model for Mathematical Reasoning

Large language models have made notable progress in mathematical reasoning, but rewarding only correct final answers has a fundamental limitation: a correct result does not imply a rigorous derivation. The research team therefore proposes a "self-verifiable mathematical reasoning" framework: a verifier is trained to assess the rigor of proofs and serves as the reward model for the generator. The framework scales verification compute to automatically label hard-to-verify proofs, continually improving the verifier. The resulting model, DeepSeekMath-V2, achieves breakthrough results in top mathematics competitions such as IMO and CMO (e.g., a 118/120 score on Putnam), demonstrating that this direction is feasible for building more capable mathematical AI systems. The model has been open-sourced on HuggingFace.

Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT within one year. However, this approach faces fundamental limitations. Pursuing higher final-answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning. Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final-answer rewards inapplicable. To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions. Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving. We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in its own proofs before finalizing them. To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier. Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute. While much work remains, these results suggest that self-verifiable mathematical reasoning is a feasible research direction that may help develop more capable mathematical AI systems.
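The self-verification loop the abstract describes — a generator that identifies and resolves issues the verifier flags in its own proof before finalizing — can be sketched as pseudocode. Everything here (`verify`, `refine`, the scoring scale) is an illustrative toy stand-in, not the actual DeepSeekMath-V2 components:

```python
def verify(proof: str) -> tuple[float, list[str]]:
    """Mock verifier: score rigor in [0, 1] and list flagged steps.

    Here a step is 'flawed' if it contains the marker word 'unjustified';
    the real verifier is an LLM judging proof rigor.
    """
    steps = proof.split(";")
    issues = [s for s in steps if "unjustified" in s]
    return 1.0 - len(issues) / max(len(steps), 1), issues


def refine(proof: str, issues: list[str]) -> str:
    """Mock generator refinement: resolve each flagged step."""
    return proof.replace("unjustified", "justified")


def self_verify_loop(proof: str, max_rounds: int = 4) -> tuple[str, float]:
    """Iterate: verify, fix flagged issues, stop when the verifier is satisfied."""
    for _ in range(max_rounds):
        score, issues = verify(proof)
        if not issues:  # verifier finds no remaining issues -> finalize
            break
        proof = refine(proof, issues)
    return proof, verify(proof)[0]


draft = "assume x > 0; unjustified bound on f; conclude by induction"
final_proof, rigor = self_verify_loop(draft)
print(rigor)  # -> 1.0 once all flagged steps are resolved
```

In the real system the verifier's score is the generator's RL reward, so the generator is trained to perform this issue-finding and fixing internally rather than via an external loop.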

https://huggingface.co/deepseek-ai/DeepSeek-Math-V2
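The "scaled test-time compute" mentioned in the abstract can be understood as best-of-n selection: sample many candidate proofs and keep the one the verifier rates highest. A minimal sketch, where `sample_proof` and `verifier_score` are hypothetical toy stand-ins for the generator and verifier:

```python
import random

# Hypothetical candidate pool and verifier ratings for illustration only.
TEMPLATES = ["sloppy proof", "partial proof", "rigorous proof"]
SCORES = {"sloppy proof": 0.2, "partial proof": 0.6, "rigorous proof": 0.95}


def sample_proof(rng: random.Random) -> str:
    """Stand-in for sampling one proof attempt from the generator."""
    return rng.choice(TEMPLATES)


def verifier_score(proof: str) -> float:
    """Stand-in for the trained verifier's rigor score."""
    return SCORES[proof]


def best_of_n(n: int, seed: int = 0) -> str:
    """Spend more compute (larger n) to get a higher-rated proof."""
    rng = random.Random(seed)
    candidates = [sample_proof(rng) for _ in range(n)]
    return max(candidates, key=verifier_score)
```

Increasing `n` spends more generation compute but can only improve (never worsen) the selected proof's verifier score, which is why a faithful verifier is the prerequisite for this kind of test-time scaling.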

References

https://ollama.com/t1c/deepseek-math-7b-rl
https://huggingface.co/deepseek-ai/deepseek-math-7b-base/tree/main
