大模型数学库DeepSeek-Math-V2

大型语言模型在数学推理方面取得显著进展,但仅依赖正确答案奖励存在根本性局限:正确结果未必代表严谨推导。为此,研究团队提出"可自我验证的数学推理"框架,通过训练验证器评估证明过程的严谨性,并以此作为生成器的奖励模型。该框架采用验证计算扩展策略,自动标注难验证证明以持续提升验证能力。最终模型DeepSeekMath-V2在IMO、CMO等顶级数学竞赛中取得突破性成绩(如Putnam 118/120分),证明该方向对开发更强大的数学AI系统具有可行性。相关模型已在HuggingFace平台开源。

Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year. However, this approach faces fundamental limitations. Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning. Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable. To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions. Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving. We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in their own proofs before finalizing them. To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier. Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute. While much work remains, these results suggest that self-verifiable mathematical reasoning is a feasible research direction that may help develop more capable mathematical AI systems.

https://huggingface.co/deepseek-ai/DeepSeek-Math-V2

参考

https://ollama.com/t1c/deepseek-math-7b-rl
https://huggingface.co/deepseek-ai/deepseek-math-7b-base/tree/main

相关推荐
染指111016 小时前
26.RAG进阶(Advanced RAG)-假设性问题索引
人工智能·windows·agent·rag·advanced rag
闵孚龙16 小时前
动态图机制:为什么 PyTorch 调试起来更舒服
人工智能·pytorch·python
甲维斯17 小时前
还要啥Codex!DeepSeek接入Zcode远程连接!
人工智能
百胜软件@百胜软件17 小时前
百胜软件亮相“AI消费新生活”主题日活动,AI智能运营平台入选市级案例征集
人工智能·生活·零售数字化·数智中台·珠宝行业
专注搞钱18 小时前
GPT-4o写设备Recipe:从3小时到10分钟
数据库·人工智能·gpt·半导体
闻道参看18 小时前
贝芯宠AI灵兽 ELFVET 大模型聚焦临床应用,强化宠物诊疗综合能力
人工智能·宠物
MartinYeung518 小时前
[论文学习]重新思考大型语言模型忘却目标:梯度视角与超越
人工智能·学习·语言模型
财经资讯数据_灵砚智能18 小时前
基于全球经济类多源新闻的NLP情感分析与数据可视化(夜间-次晨)2026年6月14日
大数据·人工智能·python·ai·信息可视化·自然语言处理·灵砚智能
m0_3801671419 小时前
加密货币价格 API、市场数据 API 与 分析 API 有什么区别?
人工智能·ai·区块链
zyplayer-doc19 小时前
企业知识库安全与权限管理完全指南:从加密到审计的六层防护
人工智能·安全·pdf·编辑器·创业创新