Want to Deploy High-Performance Agents in the Cloud at Low Cost? A Hands-On Guide to MiniMax-M2 + DigitalOcean

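MiniMax-M2 offers developers a compelling option: a mixture-of-experts (MoE) model with 230 billion total parameters, of which only 10 billion are activated per token, built for coding and agentic work. It delivers performance comparable to frontier models such as Claude Sonnet 4.5 and GPT-5 at a fraction of their compute cost, which makes it a good fit for deployments with strict cost and latency requirements.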

Model Overview

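Core capability	Value for developers	Key metrics / details
Agentic performance	MiniMax-M2 uses `<think>...</think>` tags to separate its reasoning from its final output, which lets the model keep a coherent chain of thought across multi-turn interactions. It handles complex long-horizon tasks that require planning, execution, and adjustment, which makes it well suited to building autonomous agents.	Strong results on BrowseComp (44.0) and ArtifactsBench (66.8), ahead of several larger models.
Advanced coding	Designed for end-to-end developer workflows, including iterative code-run-fix loops and multi-file edits.	Highly competitive on Terminal-Bench (46.3) and SWE-bench Verified (69.4).
Tool calling	Built for complex tool integrations (shell, browser, search) and stays reliable when interacting with external data or systems.	A dedicated tool-calling guide is available. Strong results on HLE (with tools) and other tool-augmented benchmarks.
General intelligence	Remains competitive in general knowledge and reasoning, so it works reliably outside core coding tasks as well.	Composite AA intelligence score of 61, among the top open-source models.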

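Deployment Guide

The official documentation describes several ways to run MiniMax-M2.

The configurations below are the ones recommended in the official documentation; adjust them to your actual use case: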

4×96 GB GPUs: supports context lengths of up to 400K tokens
8×144 GB GPUs: supports context lengths of up to 3M tokens
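As a rough sanity check on these configurations, the sketch below estimates how much aggregate GPU memory remains after loading the weights. It is a back-of-the-envelope calculation only: the 1 byte per parameter figure (8-bit weights) is an assumption, not something stated in this article, and real deployments also spend memory on activations and framework overhead.

```python
# Back-of-the-envelope memory estimate.
# ASSUMPTION: weights are stored at 1 byte per parameter (8-bit); the article
# does not state the checkpoint precision, so change this value as needed.
TOTAL_PARAMS = 230e9      # total parameters (from the article)
ACTIVE_PARAMS = 10e9      # parameters activated per token (from the article)
BYTES_PER_PARAM = 1.0     # assumed


def weight_gb(params: float, bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Approximate weight footprint in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9


def headroom_gb(num_gpus: int, gb_per_gpu: float) -> float:
    """Aggregate GPU memory left after loading all weights (KV cache, etc.)."""
    return num_gpus * gb_per_gpu - weight_gb(TOTAL_PARAMS)


if __name__ == "__main__":
    print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    for gpus, gb in [(4, 96), (8, 144)]:
        print(f"{gpus}x{gb} GB: ~{headroom_gb(gpus, gb):.0f} GB left after weights")
```

Under that assumption the larger configuration leaves several times more memory for the KV cache, which is consistent with the much longer supported context.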

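Because this is a fairly large model, we run it directly on an 8×H200 machine.

We use DigitalOcean GPU Droplets here. DigitalOcean currently offers a range of GPU server types, including H200 (single GPU or 8 GPUs) and H100 (single GPU or 8 GPUs), available both as on-demand instances and as bare metal.

Compared with cloud platforms such as AWS and GCP, DigitalOcean's GPU servers cost less overall and are simple to use, with a minimal learning curve. DigitalOcean also plans to officially launch NVIDIA B300-based GPU Droplets early next year; for details, contact 卓普云 AI Droplet, DigitalOcean's exclusive strategic partner in China.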

In the web console, first install the Python venv package:

apt install python3.10-venv

Then install the vLLM nightly build together with triton-kernels using uv:

uv pip install 'triton-kernels @ git+https://github.com/triton-lang/triton.git@v3.5.0#subdirectory=python/triton_kernels' vllm --extra-index-url https://wheels.vllm.ai/nightly --prerelease=allow

Start the server:

SAFETENSORS_FAST_GPU=1 vllm serve \
    MiniMaxAI/MiniMax-M2 --trust-remote-code \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think
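Loading a model of this size takes a while, so it can help to wait until the server answers before sending requests. The sketch below polls the OpenAI-compatible `/v1/models` endpoint. The port (8000) matches the request example in this article; the timeout and polling interval are arbitrary assumptions.

```python
import json
import time
import urllib.error
import urllib.request


def model_ids(models_response: dict) -> list[str]:
    """Extract model ids from a /v1/models response body."""
    return [m["id"] for m in models_response.get("data", [])]


def wait_until_ready(
    base_url: str = "http://localhost:8000/v1",
    expected: str = "MiniMaxAI/MiniMax-M2",
    timeout_s: float = 1800.0,   # assumed upper bound for model loading
    poll_s: float = 10.0,        # assumed polling interval
) -> bool:
    """Poll until the expected model is listed, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                if expected in model_ids(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not up yet, or response not valid JSON yet
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "server did not come up in time")
```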

Install vllm and fla-core (if they are not already present, do this before starting the server):

pip install vllm fla-core

Next, send a request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2",
    "messages": [
      {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
      {"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
    ]
  }'
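The same request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the server from the previous step is listening on localhost:8000 and that the response follows the usual OpenAI-compatible shape (`choices[0].message.content`). The helper names `build_payload` and `chat` are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"   # same endpoint as the curl call
MODEL = "MiniMaxAI/MiniMax-M2"


def build_payload(question: str, system: str = "You are a helpful assistant.") -> dict:
    """Build the same chat-completions body that the curl example sends."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": [{"type": "text", "text": system}]},
            {"role": "user", "content": [{"type": "text", "text": question}]},
        ],
    }


def chat(question: str, timeout: float = 120.0) -> str:
    """POST the payload and return the assistant message content."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Who won the world series in 2020?"))
```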

The model's response appears in the web console. Here is what it returned:

"content": "<think>Okay, the user is asking who won the World Series in 2020. Let me start by recalling the correct information. The Los Angeles Dodgers defeated the Tampa Bay Rays. That was a significant event because it was during the pandemic, so the season had unique adjustments.</think>The user might be a baseball fan wanting a quick fact, or perhaps someone who has heard about the series and wants confirmation. They didn't mention any specific context, so keeping it straightforward is best. But maybe they're interested in the historical aspect of that win, especially since it was the Dodgers' first title since 1988. Including that detail adds value.I should also note the unusual circumstances of the 2020 season. The COVID-19 pandemic affected everything, with shortened seasons, limited crowds, and other changes. Mentioning these factors could address deeper curiosity about how the event stood out. However, since the user didn't ask for that, I should balance conciseness with relevant context.\\n\\nThe user possibly a casual viewer looking for a quick answer? Or a student working on a project? They might need the answer for trivia or an assignment. Either way, providing the teams and the year, along with the significance of the Dodgers' win, covers the essentials without overcomplicating. No need for excessive details unless they follow up. Keeping the response clear and accurate is key here.\\n\\n<|im-start|>The **Los Angeles Dodgers** won the **2020 World Series**. \\n\\nThey defeated the **Tampa Bay Rays** in 6 games (4-2). This was the Dodgers' first World Series championship since 1988. The 2020 World Series was held in Arlington, Texas, at Globe Life Field (the neutral site) due to the COVID-19 pandemic and its impact on the MLB season. \\n\\nThe decisive game was game 6, played on October 27, 2020, where the Dodgers won 3-1.<|im-end|>"

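This output illustrates MiniMax-M2's core characteristics:

Interleaved thinking format: <think> tags separate the internal reasoning from the final answer.
High-quality output: the answer is accurate, concise, and well formatted. It states the key fact (the Dodgers beat the Rays) and adds relevant background (the pandemic setting, the neutral site, the historical significance), showing frontier-level fact retrieval and summarization.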
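For display or logging, the reasoning can be separated from the answer on the client side. The sketch below assumes the reasoning arrives inline in `content`, wrapped in well-formed `<think>...</think>` pairs; `split_think` is a hypothetical helper name, and real responses may not always be this tidy.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(content: str) -> tuple[list[str], str]:
    """Return (reasoning_blocks, answer_text) for a response string."""
    reasoning = [block.strip() for block in THINK_RE.findall(content)]
    answer = THINK_RE.sub("", content).strip()
    return reasoning, answer


if __name__ == "__main__":
    blocks, answer = split_think("<think>Recall the 2020 result.</think>The Dodgers won.")
    print(blocks)   # reasoning blocks
    print(answer)   # text shown to the user
```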

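If you are building agent systems, coding tools, or any application that needs both high intelligence and high efficiency, this model is worth trying.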

FAQ

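Q: What is MiniMax-M2?

A: An MoE model with 230B total parameters, designed for coding and agent scenarios. It activates only 10B parameters per token, balancing performance and cost.

Q: Does it support tool calling?

A: Yes. It follows a "tool-first" design and can decide on its own when to call external tools.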
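To illustrate, a request can carry tool definitions in the OpenAI-compatible `tools` field when the server runs with `--enable-auto-tool-choice`, as in the launch command earlier in this article. The sketch below only builds the request body and parses a response; the `get_weather` tool and its schema are hypothetical, invented for this example.

```python
import json


def build_tool_request(question: str) -> dict:
    """Chat-completions body that offers one (hypothetical) tool to the model."""
    return {
        "model": "MiniMaxAI/MiniMax-M2",
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",   # hypothetical tool, not a real API
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",   # let the model decide whether to call a tool
    }


def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (tool_name, arguments) pairs from a response, or [] if none."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]
```

The application is still responsible for running the tool and sending its result back in a follow-up message.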

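Q: What is "interleaved thinking"?

A: The model uses <think>...</think> to separate intermediate reasoning from the final answer, which helps it keep a coherent line of reasoning across multi-turn conversations.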
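One way to preserve that line of reasoning in a multi-turn client is to append the assistant's reply to the message history unchanged, `<think>` block included, before adding the next user turn. The sketch below shows that bookkeeping only; `append_turn` is a hypothetical helper, and whether to keep or strip the reasoning is an assumption to check against the model's own usage guide.

```python
def append_turn(history: list[dict], assistant_content: str, next_user_message: str) -> list[dict]:
    """Return a new history with the assistant reply kept verbatim, then the next user turn."""
    return history + [
        {"role": "assistant", "content": assistant_content},  # <think> block left intact
        {"role": "user", "content": next_user_message},
    ]


if __name__ == "__main__":
    history = [{"role": "user", "content": "Who won the world series in 2020?"}]
    reply = "<think>Recall the 2020 result.</think>The Los Angeles Dodgers."
    for message in append_turn(history, reply, "Who did they beat?"):
        print(message["role"], ":", message["content"])
```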

Q: How does it perform on agent benchmarks?

A: It scores 46.3 on Terminal-Bench and 44.0 on BrowseComp, ahead of many larger general-purpose models.