Xiaomi's MiMo-V2-Flash Smells a Lot Like Google

Not long ago, Xiaomi deliberately chose founder Lei Jun's birthday to release its latest model, MiMo-V2-Flash.

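MiMo is Xiaomi's line of foundation large models. The first generation had only 7B parameters and essentially replicated the DeepSeek training process (after all, the people were poached from DeepSeek). I wrote a detailed walkthrough of it in an earlier article.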

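With that trial run a success, the next generation is about scaling. MiMo-V2-Flash uses an MoE architecture, and the total parameter count jumps to 309B, of which 15B are activated.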
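
To put those two numbers side by side, here is a quick back-of-the-envelope calculation of how much of the model is active per token. The only inputs are the 309B and 15B figures quoted above; reading the ratio as a rough proxy for per-token compute is my own assumption, not something taken from Xiaomi's release.

```python
# Parameter counts quoted in the text, in billions.
TOTAL_PARAMS_B = 309
ACTIVE_PARAMS_B = 15

# Share of the parameters that participate in each forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Active per token: {active_fraction:.1%} of all parameters")
```

So only about one parameter in twenty is touched for any given token, which is the usual reason an MoE model can be large and still fast.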

Judging by the official benchmark numbers [1], MiMo-V2-Flash has reached the top tier of open-source large language models.

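Beyond benchmark performance, speed is the model's bigger selling point. According to the slides from Luo Fuli's debut presentation [2], it is much faster than models with a similar parameter count.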

How fast is Flash?

The Flash name is an obvious "tribute" to Google's Gemini-Flash.

So which is faster, MiMo-V2-Flash or Gemini-3-Flash?

Let's run an experiment.

Calling MiMo-V2-Flash

MiMo-V2-Flash is currently in public beta and can be called for free.

Use uv to initialize a Python environment and install the dependencies.

uv init
uv venv 
source .venv/bin/activate 
uv add openai
uv add python-dotenv

Create a .env file and set the environment variable. The key comes from the MiMo open platform [3].

MIMO_API_KEY=sk-xxx

Run the code below to make a call. Starting from the official sample, I added both streaming and non-streaming output modes.

import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from the .env file
load_dotenv()

api_key = os.getenv("MIMO_API_KEY")
if not api_key:
    raise RuntimeError("MIMO_API_KEY not found in .env file")

client = OpenAI(
    api_key=api_key,
    base_url="https://api.xiaomimimo.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2-flash",
    messages=[
        {
            "role": "system",
            "content": (
                "You are MiMo, an AI assistant developed by Xiaomi. "
                "Today is date: Tuesday, December 16, 2025. "
                "Your knowledge cutoff date is December 2024."
            )
        },
        {
            "role": "user",
            "content": "你是谁？"
        }
    ],
    max_completion_tokens=1024,
    temperature=0.3,
    top_p=0.95,
    stream=False,
    stop=None,
    frequency_penalty=0,
    presence_penalty=0,
    extra_body={
        "thinking": {"type": "disabled"}
    }
)

# stream=False
print(completion.choices[0].message.content)

# stream=True
# for chunk in completion:
#     if not chunk.choices:
#         continue

#     delta = chunk.choices[0].delta

#     if hasattr(delta, "content") and delta.content:
#         print(delta.content, end="", flush=True)
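
The commented-out streaming loop above can be wrapped in a small helper that both prints the text as it arrives and returns the full reply. This is a minimal sketch: `collect_stream` and `fake_chunk` are names I made up, and the fake chunks only imitate the `choices[0].delta.content` shape used in the loop above so the snippet runs without an API key.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices at all; skip them.
        if not chunk.choices:
            continue
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical offline stand-in for one streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Includes an empty chunk and a chunk with no content, which are skipped.
reply = collect_stream(
    [fake_chunk("Hi"), SimpleNamespace(choices=[]), fake_chunk(None), fake_chunk("!")]
)
```

With the real client, set `stream=True` in the `create(...)` call and pass the returned object to `collect_stream(completion)`.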

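An interesting finding: the sample code sets a system instruction by default. If you comment that system instruction out and ask it "你是谁？" ("Who are you?"),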

it answers as follows:

I ran it again, and it still "forgot its roots":

extra_body takes a thinking parameter that controls whether the model thinks. It is off by default; turn it on and the model remembers who it is.

This suggests Xiaomi does not have much experience running an API service: with thinking mode off, the backend adds no extra system prompt.

With thinking mode on, my guess is that some extra prompt text gets added at call time.
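
To reproduce this comparison systematically, the four combinations (system prompt on or off, thinking on or off) can be generated from one helper. A sketch under assumptions: `build_request` is my own name, and while `"disabled"` comes from the sample code, `"enabled"` as the value that switches thinking on is my guess and should be checked against the platform docs.

```python
from itertools import product

# System prompt copied from the sample code above.
SYSTEM_PROMPT = (
    "You are MiMo, an AI assistant developed by Xiaomi. "
    "Today is date: Tuesday, December 16, 2025. "
    "Your knowledge cutoff date is December 2024."
)


def build_request(with_system: bool, thinking: bool) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    messages = []
    if with_system:
        messages.append({"role": "system", "content": SYSTEM_PROMPT})
    messages.append({"role": "user", "content": "你是谁？"})
    return {
        "model": "mimo-v2-flash",
        "messages": messages,
        # "disabled" is from the sample; "enabled" is an assumed value.
        "extra_body": {"thinking": {"type": "enabled" if thinking else "disabled"}},
    }


for with_system, thinking in product([True, False], repeat=2):
    request = build_request(with_system, thinking)
    print(f"system={with_system} thinking={thinking} messages={len(request['messages'])}")
    # With a real client: reply = client.chat.completions.create(**request)
```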

MiMo-V2-Flash speed test

Back to the main topic: let's measure the speed of MiMo-V2-Flash against Gemini-3-Flash.

I use a prompt that takes a while to answer:

prompt: 原封不动地输出圆周率的前1000位数字 ("Output the first 1000 digits of pi verbatim")

For MiMo-V2-Flash, total program run time: 12.492 s
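
A single wall-clock number is sensitive to network jitter, so it can help to repeat the call and look at the median. The following is a minimal sketch, not the exact script behind the figure above: `time_runs` is a name I made up, and the `time.sleep` call is only a stand-in so the snippet runs offline.

```python
import statistics
import time
from typing import Callable, List


def time_runs(call: Callable[[], object], repeats: int = 3) -> List[float]:
    """Run `call` several times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload; replace the lambda with the real API request.
timings = time_runs(lambda: time.sleep(0.01), repeats=3)
print(f"median: {statistics.median(timings):.3f} s over {len(timings)} runs")
```

To time the real request, pass something like `lambda: client.chat.completions.create(...)` with the same arguments as in the calling example.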

For Gemini-3-Flash, a similar program measures the elapsed time:

import os
import time
from dotenv import load_dotenv
from google import genai

# 1. Load the .env file
load_dotenv()

# 2. Read the API key
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY not found in environment variables")

# 3. Create the Gemini client
client = genai.Client(api_key=api_key)

# 4. Choose the model
model_name = "gemini-3-flash-preview"

# 5. Start timing
start_time = time.perf_counter()

# 6. Send the request
response = client.models.generate_content(
    model=model_name,
    contents="原封不动地输出圆周率的前1000位数字"
)

# 7. Stop timing
end_time = time.perf_counter()
elapsed_s = end_time - start_time

# 8. Print the result
print(response.text)
print(f"\nTotal program run time: {elapsed_s:.3f} s")
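
A speed comparison is only fair if both models actually produce the right digits, so it is worth checking the output too. Below is a standard-library sketch that computes reference digits with Machin's formula and counts how many leading decimals in a reply match. The function names are my own, and the sample reply string is made up for illustration.

```python
import re


def arctan_inv(x: int, unity: int) -> int:
    """Return unity * arctan(1/x), summed with integer arithmetic."""
    power = unity // x  # unity / x**(2k+1), starting at k = 0
    total = power
    x2 = x * x
    k = 0
    sign = 1
    while power:
        power //= x2
        k += 1
        sign = -sign
        total += sign * (power // (2 * k + 1))
    return total


def pi_reference(decimals: int) -> str:
    """Return pi truncated to `decimals` decimal places, e.g. '3.1415'."""
    guard = 10  # extra digits that absorb truncation error
    unity = 10 ** (decimals + guard)
    # Machin: pi = 16 * arctan(1/5) - 4 * arctan(1/239)
    scaled = (16 * arctan_inv(5, unity) - 4 * arctan_inv(239, unity)) // 10 ** guard
    digits = str(scaled)
    return digits[0] + "." + digits[1:]


def correct_decimals(text: str, decimals: int = 1000) -> int:
    """Count how many leading decimals of pi in `text` match the reference."""
    match = re.search(r"3\.(\d[\d\s]*)", text)
    if not match:
        return 0
    got = re.sub(r"\s", "", match.group(1))
    want = pi_reference(decimals)[2:]
    count = 0
    for g, w in zip(got, want):
        if g != w:
            break
        count += 1
    return count


# Made-up reply that goes wrong after the 15th decimal.
print(correct_decimals("Sure: 3.14159 26535 89793 0000"))
```

Passing `response.text` (or the MiMo reply) to `correct_decimals` shows how far each model gets before its first wrong digit.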