Phi-4-mini-flash-reasoning: A Complete Record of Deployment, Installation, and Inference Testing

Bilibili video: https://www.bilibili.com/video/BV1EhRFBeEJN/

Contents

- I. Preface
- II. Model Download
- III. Environment Setup (Two Options)
  - Option 1: The original successful installation (prebuilt wheel and the TMPDIR workaround)
    - 1. Create the Conda environment
    - 2. Install PyTorch for CUDA 12.4
    - 3. Install Transformers / Accelerate / ModelScope
    - 4. Install the Mamba-related dependencies
    - 5. Install flash-attn
    - 6. Verify the environment
  - Option 2: Exact installation in a fresh, isolated environment (recommended for avoiding version conflicts)
    - 1. Create a new environment and install PyTorch 2.6.0
    - 2. Install nvcc through conda (key step)
    - 3. Install the build dependencies and the officially specified Mamba extensions
      - 3.1 Install causal-conv1d
      - 3.2 Install mamba-ssm
    - 4. Install flash-attn
    - 5. Install the remaining dependencies
    - 6. Final verification
- IV. Basic Model Run Test (Both Options)
  - First version of the inference test code
- V. Improved Script: Separating the Reasoning from the Final Answer
- VI. Sample Run Results
  - Contents of answer.txt
  - Contents of raw.txt
  - Contents of result.json
  - Contents of thinking.txt
- VII. Common Problems
  - Problem 1: `No module named 'flash_attn'`
  - Problem 2: `Invalid cross-device link`
  - Problem 3: `causal_conv1d_cuda` / `selective_scan_cuda` missing
  - Problem 4: `undefined symbol: _ZN3c105ErrorC2E...` (FlashAttention fails to load)
- VIII. Key Lessons
- IX. Full Workflow Recap

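This article records the full process of deploying and running Phi-4-mini-flash-reasoning on a local server: downloading the model, creating the Conda environment, configuring PyTorch with CUDA, installing the Mamba-related dependencies, working around flash-attn installation problems, writing the inference test script, and saving the <think>...</think> reasoning in the model output separately from the final answer.

Model source: ModelScope (魔塔社区)

Model page: https://www.modelscope.cn/models/LLM-Research/Phi-4-mini-flash-reasoning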

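I. Preface

The goal here is to deploy Phi-4-mini-flash-reasoning locally and run an inference test. The installation involves the following key components:

- Python / Conda environment
- PyTorch with CUDA 12.4
- Transformers / Accelerate / ModelScope
- The CUDA extensions that Mamba depends on
- flash-attn
- A script that loads the local model and runs inference
- Separation of the <think>...</think> reasoning from the final answer

In the final test the model loaded normally and produced both the <think>...</think> reasoning and the final answer.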

II. Model Download

Model page:

https://www.modelscope.cn/models/LLM-Research/Phi-4-mini-flash-reasoning

The model can be downloaded with ModelScope's snapshot_download.

```python
# source /etc/network_turbo
from modelscope import snapshot_download

# Directory the model is downloaded into
cache_dir = '/mnt/sda_8t/winstonYF/model'

# Download the model with snapshot_download
model_dir = snapshot_download('LLM-Research/Phi-4-mini-flash-reasoning', cache_dir=cache_dir)

print(f"Model downloaded to: {model_dir}")
```

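After the download finishes, the model directory looks like this:

```text
/mnt/sda_8t/winstonYF/model/LLM-Research/Phi-4-mini-flash-reasoning
```

The inference scripts later in this article use this local path directly.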
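Before pointing the inference script at this directory, it can help to confirm that the download is not obviously incomplete. The sketch below is only a rough sanity check: `check_model_dir` is a hypothetical helper, and the file names it looks for (`config.json` plus at least one `*.safetensors` or `*.bin` weight file) are an assumption based on how Transformers checkpoints are usually laid out, not a list taken from the model page.

```python
from pathlib import Path

# Assumption: typical Transformers checkpoint layout; the exact file list
# for this model is not given in the article.
EXPECTED_FILES = ("config.json",)
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def check_model_dir(model_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the directory looks usable."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"not a directory: {model_dir}"]
    problems = [
        f"missing file: {name}"
        for name in EXPECTED_FILES
        if not (root / name).is_file()
    ]
    has_weights = any(
        p.suffix in WEIGHT_SUFFIXES for p in root.iterdir() if p.is_file()
    )
    if not has_weights:
        problems.append("no weight files (*.safetensors or *.bin) found")
    return problems


if __name__ == "__main__":
    path = "/mnt/sda_8t/winstonYF/model/LLM-Research/Phi-4-mini-flash-reasoning"
    for line in check_model_dir(path) or ["looks complete"]:
        print(line)
```

An empty result does not prove the checkpoint is intact; it only rules out an empty or partially created directory.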

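III. Environment Setup (Two Options)

Two complete installation procedures follow. Option 1 is the record of the installation that actually ran successfully, including the fix for the Invalid cross-device link error. Option 2 is a more general procedure that installs exact versions into a fresh, isolated environment; it is recommended on a new machine or when you run into version conflicts. Choose whichever fits your situation.

Option 1: The original successful installation (prebuilt wheel and the TMPDIR workaround)

This procedure was verified on the target machine: the model runs and the reasoning can be separated from the answer.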

1. Create the Conda environment

Python 3.10 is recommended; Python 3.12 is not. Some deep learning CUDA extension packages are more compatible with Python 3.10 and install more reliably on it.

```bash
conda create -n phi4flash python=3.10 -y
conda activate phi4flash
```

Install the basic build tools:

```bash
pip install -U pip setuptools wheel ninja packaging \
  -i https://pypi.tuna.tsinghua.edu.cn/simple
```

These tools are needed later to compile CUDA extensions such as causal-conv1d, mamba-ssm, and flash-attn.

2. Install PyTorch for CUDA 12.4

Install the PyTorch 2.6.0 CUDA 12.4 build from the Aliyun PyTorch wheel mirror:

```bash
pip install \
  torch==2.6.0+cu124 \
  torchvision==0.21.0+cu124 \
  torchaudio==2.6.0+cu124 \
  -f https://mirrors.aliyun.com/pytorch-wheels/cu124/
```

After installation, check that PyTorch and CUDA are usable before going further:

```bash
python - <<'PY'
import torch
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
PY
```

If torch.cuda.is_available() prints True, the PyTorch CUDA environment is basically working.

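3. Install Transformers / Accelerate / ModelScope

```bash
pip install \
  transformers==4.46.1 \
  accelerate==1.4.0 \
  einops \
  modelscope \
  -i https://pypi.tuna.tsinghua.edu.cn/simple
```

What each package is for:

- transformers: loads and runs the model.
- accelerate: handles device mapping, multi-GPU loading, and similar tasks.
- einops: used by some model architectures.
- modelscope: downloads and loads models from ModelScope.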

4. Install the Mamba-related dependencies

Phi-4-mini-flash-reasoning needs the Mamba-related CUDA extensions at run time, so causal-conv1d and mamba-ssm must be installed.

Install causal-conv1d first:

```bash
pip install causal-conv1d==1.5.0.post8 \
  --no-build-isolation \
  -i https://pypi.tuna.tsinghua.edu.cn/simple
```

Then install mamba-ssm:

```bash
pip install mamba-ssm==2.2.4 \
  --no-build-isolation \
  -i https://pypi.tuna.tsinghua.edu.cn/simple
```

Notes:

- causal_conv1d_cuda is not a separately installable package; it is the CUDA extension module produced when causal-conv1d is installed.
- selective_scan_cuda is not a separately installable package either; it is the CUDA extension module produced when mamba-ssm is installed.

So if a later error says that causal_conv1d_cuda or selective_scan_cuda is missing, do not run pip install causal_conv1d_cuda selective_scan_cuda. Check instead whether causal-conv1d and mamba-ssm were installed successfully.
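The relationship between the two extension modules and the packages that build them can be written down as a small lookup. This is a minimal sketch, not part of the original setup: `missing_extensions` is a hypothetical helper, and it uses `importlib.util.find_spec`, which only checks whether a module can be located, not whether it loads.

```python
import importlib.util

# Compiled extension module -> pip package that builds it (as described above).
EXTENSION_PROVIDERS = {
    "causal_conv1d_cuda": "causal-conv1d",
    "selective_scan_cuda": "mamba-ssm",
}


def missing_extensions(find_spec=importlib.util.find_spec) -> dict[str, str]:
    """Map each extension module that cannot be located to the package to reinstall."""
    return {
        module: package
        for module, package in EXTENSION_PROVIDERS.items()
        if find_spec(module) is None
    }


if __name__ == "__main__":
    missing = missing_extensions()
    for module, package in missing.items():
        print(f"{module} not found: check that '{package}' installed and compiled")
    if not missing:
        print("both extension modules can be located")
```

A module that is found can still fail at import time, for example when it was compiled against a different PyTorch build, so the import test in the verification step remains the real check.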

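5. Install flash-attn

The first direct attempt to install flash-attn may fail with:

```text
error: [Errno 18] Invalid cross-device link
```

This usually happens because pip's temporary directory, its cache directory, and the current directory are not on the same disk or mount point, so moving files across devices fails while the wheel is being built.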
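Whether two directories sit on the same filesystem can be checked directly, since errno 18 is EXDEV, the error a rename or hard link raises when source and destination are on different devices. The sketch below compares the `st_dev` field of `os.stat`; the helper names are hypothetical, and the directories in the demo are placeholders for your own pip temp and cache paths.

```python
import os
import tempfile


def device_ids(paths):
    """Return {path: st_dev}; st_dev identifies the filesystem a path lives on."""
    return {p: os.stat(p).st_dev for p in paths}


def same_filesystem(paths) -> bool:
    """True when every path is on the same device, so moving files between them avoids EXDEV."""
    return len(set(device_ids(paths).values())) <= 1


if __name__ == "__main__":
    # Placeholders: substitute the TMPDIR and PIP_CACHE_DIR you plan to use.
    candidates = [tempfile.gettempdir(), os.getcwd()]
    print(device_ids(candidates))
    print("same filesystem:", same_filesystem(candidates))
```

If the check reports different devices for the temp directory and the place where the build runs, pointing TMPDIR and PIP_CACHE_DIR at one disk is the natural fix.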

It is recommended to put pip's temporary directory and cache directory on the same large disk:

```bash
mkdir -p /mnt/sda_8t/winstonYF/tmp/pip_cache
mkdir -p /mnt/sda_8t/winstonYF/tmp/pip_tmp
```

Remove any previous installation and purge the old cache:

```bash
pip uninstall -y flash-attn flash_attn
pip cache purge
```

Reinstall:

```bash
TMPDIR=/mnt/sda_8t/winstonYF/tmp/pip_tmp \
PIP_CACHE_DIR=/mnt/sda_8t/winstonYF/tmp/pip_cache \
pip install flash_attn==2.7.4.post1 \
  --no-build-isolation \
  --no-cache-dir \
  -i https://pypi.tuna.tsinghua.edu.cn/simple
```

If that still fails, download the prebuilt wheel directly:

```bash
cd /mnt/sda_8t/winstonYF/tmp

wget -O flash_attn-2.7.4.post1-cu12torch2.6-cp310.whl \
"https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

pip install /mnt/sda_8t/winstonYF/tmp/flash_attn-2.7.4.post1-cu12torch2.6-cp310.whl
```

6. Verify the environment

```bash
python - <<'PY'
import torch
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())

import einops
import flash_attn
print("flash_attn:", flash_attn.__version__)

import causal_conv1d
import causal_conv1d_cuda
import mamba_ssm
import selective_scan_cuda

print("all ok")
PY
```

Output similar to the following means the environment is ready:

```text
torch: 2.6.0+cu124
cuda: 12.4
cuda available: True
flash_attn: 2.7.4.post1
all ok
```

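Option 2: Exact installation in a fresh, isolated environment (recommended for avoiding version conflicts)

This procedure installs nvcc through conda and strictly pins the PyTorch version. The pin keeps a dependency conflict from silently upgrading torch, which would otherwise cause ABI incompatibilities such as undefined symbol errors. It suits a new machine or an environment that has already become messy.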
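Because an unexpected torch upgrade is the failure this option guards against, it can be useful to compare the installed versions with the pins after each install step. The following is a sketch under assumptions: `pin_mismatches` is a hypothetical helper, the pins are simply the versions used in this article, and local version tags such as `+cu124` are ignored in the comparison.

```python
from importlib import metadata

# Pins copied from the versions used in this article.
PINS = {
    "torch": "2.6.0",
    "flash-attn": "2.7.4.post1",
    "causal-conv1d": "1.5.0.post8",
    "mamba-ssm": "2.2.4",
    "transformers": "4.46.1",
}


def installed_version(dist_name: str):
    """Return the installed version string, or None when the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatches(pins=PINS, lookup=installed_version) -> dict:
    """Return {name: (wanted, found)} for every package that is missing or off its pin."""
    problems = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # Drop a local tag like '+cu124' before comparing.
        if found is None or found.split("+", 1)[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in pin_mismatches().items():
        print(f"{name}: wanted {wanted}, found {found}")
```

Running it right after installing an extra package shows immediately whether pip replaced torch along the way.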

1. Create a new environment and install PyTorch 2.6.0

```bash
conda create -n phi4flash_clean python=3.10 -y
conda activate phi4flash_clean

# Install PyTorch 2.6.0 + CUDA 12.4 (official stable build)
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
```

Alternatively, use the Aliyun mirror for faster downloads:

```bash
pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 torchaudio==2.6.0+cu124 -f https://mirrors.aliyun.com/pytorch-wheels/cu124/
```

Verify the installation:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Expected output: 2.6.0+cu124 True
```

2. Install nvcc through conda (key step)

```bash
conda install -c nvidia cuda-nvcc=12.4 -y
```

Check the version after installation:

```bash
nvcc --version
# Should report release 12.4
```

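3. Install the build dependencies and the officially specified Mamba extensions

⚠️ Important: the official requirements pin specific versions: causal-conv1d==1.5.0.post8 and mamba-ssm==2.2.4.

Prefer the prebuilt wheels, which avoid source build failures; build from source only if no prebuilt wheel is available.

3.1 Install causal-conv1d

Method 1 (recommended): install the prebuilt wheel directly

First confirm the ABI tag (usually FALSE):

```bash
python -c "import torch; print(torch.compiled_with_cxx11_abi())"
# If this prints False, use cxx11abiFALSE; if it prints True, use cxx11abiTRUE
```

Download the matching wheel (cxx11abiFALSE in this example):

```bash
wget https://github.com/Dao-AILab/causal-conv1d/releases/download/v1.5.0.post8/causal_conv1d-1.5.0.post8%2Bcu124torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

If the link above returns 404, look on the causal-conv1d Releases page for the .whl file that matches 1.5.0.post8, CUDA 12.4, torch2.6, and cp310. The CUDA tag in the published file name may differ from the one shown here (for example cu12 rather than cu124). Once the wheel is downloaded, install it, adjusting the file name if it differs:

```bash
pip install causal_conv1d-1.5.0.post8+cu124torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```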
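The wheel file names quoted in this article all follow one pattern: package, version, CUDA tag, torch major.minor, C++11 ABI flag, and Python tag. The sketch below assembles a name from those parts so the pieces are easy to see. `wheel_name` is a hypothetical helper, and the pattern is inferred from the names shown here rather than from any published specification, so always compare the result with the Releases page.

```python
def wheel_name(package: str, version: str, cuda_tag: str, torch_mm: str,
               cxx11_abi: bool, py_tag: str = "cp310") -> str:
    """Assemble a file name following the pattern of the wheel names in this article."""
    abi = "TRUE" if cxx11_abi else "FALSE"
    return (
        f"{package}-{version}+{cuda_tag}torch{torch_mm}cxx11abi{abi}"
        f"-{py_tag}-{py_tag}-linux_x86_64.whl"
    )


if __name__ == "__main__":
    # Values as used above; cxx11_abi is what torch.compiled_with_cxx11_abi() reports.
    print(wheel_name("causal_conv1d", "1.5.0.post8", "cu124", "2.6", cxx11_abi=False))
```

In a real environment the torch version and ABI flag would come from `torch.__version__` and `torch.compiled_with_cxx11_abi()`; they are passed in by hand here so the sketch runs without PyTorch installed.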

Method 2: build from source (fallback)

```bash
pip install causal-conv1d==1.5.0.post8 --no-build-isolation
```

3.2 Install mamba-ssm

Again, prefer a prebuilt wheel: download the matching version from the mamba Releases page, for example mamba_ssm-2.2.4+cu124torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl.

```bash
# Example (change the file name to match the file you actually downloaded)
pip install mamba_ssm-2.2.4+cu124torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

If no prebuilt wheel is available, build from source:

```bash
pip install mamba-ssm==2.2.4 --no-build-isolation
```

4. Install flash-attn

The officially required version is flash_attn==2.7.4.post1. A prebuilt wheel is recommended to avoid build errors.

Method 1 (recommended): install the prebuilt wheel directly

Confirm the ABI tag (as above, usually FALSE), then download and install:

```bash
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

Method 2: build from source (fallback)

```bash
export MAX_JOBS=4   # Limit parallel build jobs to avoid running out of memory
pip install flash-attn==2.7.4.post1 --no-build-isolation
```

5. Install the remaining dependencies

```bash
pip install transformers==4.46.1 accelerate==1.4.0 einops modelscope -i https://pypi.tuna.tsinghua.edu.cn/simple
```

6. Final verification

Run the following command to make sure every module imports cleanly:

```bash
python -c "import torch, flash_attn, causal_conv1d, causal_conv1d_cuda, mamba_ssm, selective_scan_cuda, transformers, accelerate; print('All modules imported successfully!')"
```

If it prints the message without errors, the environment is ready to run the Phi-4-mini-flash-reasoning model.

IV. Basic Model Run Test (Both Options)

The code below works in an environment installed with either option above.

Example command:

```bash
python phi.py \
  --gpu 0 \
  --model "/mnt/sda_8t/winstonYF/model/LLM-Research/Phi-4-mini-flash-reasoning" \
  --prompt "How to solve 3*x^2+4*x+5=1?"
```

A successful run prints output similar to this:

```text
Loading checkpoint shards: 100%|...
<think>
...
</think>
...
<|end|>
```

First version of the inference test code

This is the basic test script for Phi-4-mini-flash-reasoning. It does not yet separate the reasoning from the answer.

```python
'''
python phi.py \
    --gpu 0 \
    --model "/mnt/sda_8t/winstonYF/model/LLM-Research/Phi-4-mini-flash-reasoning" \
    --prompt "How to solve 3*x^2+4*x+5=1?"
'''

import os
import argparse


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--gpu",
        type=str,
        default="0",
        help='GPUs to use, e.g. "0" or "1,3". Pass "cpu" to run on the CPU.',
    )
    parser.add_argument(
        "--model",
        type=str,
        default="LLM-Research/Phi-4-mini-flash-reasoning",
        help="Model ID (e.g. LLM-Research/Phi-4-mini-flash-reasoning) or a local path.",
    )
    parser.add_argument(
        "--prompt",
        type=str,
        default="How to solve 3*x^2+4*x+5=1?",
        help="The question to ask.",
    )
    return parser.parse_args()


args = parse_args()

# Important: set CUDA_VISIBLE_DEVICES before importing torch or loading the model
use_cpu = args.gpu.lower() == "cpu"
if not use_cpu:
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer

torch.random.manual_seed(0)

model_id = args.model

if use_cpu:
    device_map = "cpu"
else:
    if "," in args.gpu:
        # Several GPUs: let accelerate split the model automatically
        device_map = "auto"
    else:
        # One GPU: CUDA_VISIBLE_DEVICES exposes it as cuda:0
        device_map = {"": "cuda:0"}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device_map,
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

messages = [{
    "role": "user",
    "content": args.prompt,
}]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

if use_cpu:
    input_device = torch.device("cpu")
else:
    input_device = torch.device("cuda:0")

inputs = {k: v.to(input_device) for k, v in inputs.items()}

outputs = model.generate(
    **inputs,
    max_new_tokens=32768,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)

# Decode only the newly generated tokens, keeping special tokens such as <|end|>
text = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[-1]:],
    skip_special_tokens=False,
)[0]

print(text)
```
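The device handling in the script follows one small rule: the `--gpu` value decides both which GPUs are visible and which device map is used. The sketch below restates that rule as two pure functions so it can be tried without loading a model. The function names are hypothetical and are not part of the script.

```python
def visible_devices_env(gpu_arg: str) -> dict:
    """Environment variables to set before importing torch (none in CPU mode)."""
    if gpu_arg.lower() == "cpu":
        return {}
    return {"CUDA_VISIBLE_DEVICES": gpu_arg}


def choose_device_map(gpu_arg: str):
    """Same rule as the script: 'cpu' -> CPU, several ids -> 'auto', one id -> cuda:0."""
    if gpu_arg.lower() == "cpu":
        return "cpu"
    if "," in gpu_arg:
        return "auto"
    # With a single visible GPU, that physical GPU is always exposed as cuda:0.
    return {"": "cuda:0"}


if __name__ == "__main__":
    for arg in ("0", "3", "1,3", "cpu"):
        print(arg, visible_devices_env(arg), choose_device_map(arg))
```

This also shows why `--gpu 3` still loads the model on `cuda:0`: once CUDA_VISIBLE_DEVICES is set to 3, physical GPU 3 is the only device the process sees, and it is numbered 0.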

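V. Improved Script: Separating the Reasoning from the Final Answer

The test script is extended here so that the reasoning and the answer can be displayed and saved separately.

Supported options:

- --show all: show both the thinking and the answer.
- --show raw: show the raw output.
- --show thinking: show only the reasoning inside <think>...</think>.
- --show answer: show only the final answer.
- --show json: show the result as JSON.
- --save_dir outputs: save raw.txt, thinking.txt, answer.txt, and result.json.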

```python
'''
python phi.py \
    --gpu 0 \
    --model "/mnt/sda_8t/winstonYF/model/LLM-Research/Phi-4-mini-flash-reasoning" \
    --prompt "How to solve 3*x^2+4*x+5=1?"

Show only the final answer:
python phi.py \
    --gpu 0 \
    --model "/mnt/sda_8t/winstonYF/model/LLM-Research/Phi-4-mini-flash-reasoning" \
    --prompt "How to solve 3*x^2+4*x+5=1?" \
    --show answer

Show only the reasoning:
python phi.py \
    --gpu 0 \
    --model "/mnt/sda_8t/winstonYF/model/LLM-Research/Phi-4-mini-flash-reasoning" \
    --prompt "How to solve 3*x^2+4*x+5=1?" \
    --show thinking

Save the results:
python phi.py \
    --gpu 0 \
    --model "/mnt/sda_8t/winstonYF/model/LLM-Research/Phi-4-mini-flash-reasoning" \
    --prompt "How to solve 3*x^2+4*x+5=1?" \
    --save_dir outputs
'''

import os
import re
import json
import argparse


def parse_args():
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "--gpu",
        type=str,
        default="0",
        help='GPUs to use, e.g. "0" or "1,3". Pass "cpu" to run on the CPU.',
    )

    parser.add_argument(
        "--model",
        type=str,
        default="LLM-Research/Phi-4-mini-flash-reasoning",
        help="Model ID (e.g. LLM-Research/Phi-4-mini-flash-reasoning) or a local path.",
    )

    parser.add_argument(
        "--prompt",
        type=str,
        default="How to solve 3*x^2+4*x+5=1?",
        help="The question to ask.",
    )

    parser.add_argument(
        "--max_new_tokens",
        type=int,
        default=32768,
        help="Maximum number of tokens to generate.",
    )

    parser.add_argument(
        "--temperature",
        type=float,
        default=0.6,
        help="Sampling temperature.",
    )

    parser.add_argument(
        "--top_p",
        type=float,
        default=0.95,
        help="top_p (nucleus) sampling parameter.",
    )

    parser.add_argument(
        "--seed",
        type=int,
        default=0,
        help="Random seed.",
    )

    parser.add_argument(
        "--show",
        type=str,
        default="all",
        choices=["all", "raw", "thinking", "answer", "json"],
        help="What to print: all/raw/thinking/answer/json.",
    )

    parser.add_argument(
        "--save_dir",
        type=str,
        default=None,
        help="If set, save raw.txt, thinking.txt, answer.txt, and result.json here.",
    )

    return parser.parse_args()


def clean_special_tokens(text: str) -> str:
    """
    Remove the special end-of-sequence tokens from the model output.
    """
    special_tokens = [
        "<|end|>",
        "<|endoftext|>",
        "<|im_end|>",
        "</s>",
    ]

    for token in special_tokens:
        text = text.replace(token, "")

    return text.strip()


def split_thinking_and_answer(text: str):
    """
    Extract from the raw model output:
    1. thinking: the text between <think> and </think>
    2. answer: the text after </think>

    Three cases are handled:
    - a normal <think>...</think> pair
    - <think> without a closing </think>
    - no <think> tag at all
    """

    raw_text = text.strip()

    # Case 1: standard format, <think>...</think> followed by the answer
    pattern = r"<think>(.*?)</think>(.*)"
    match = re.search(pattern, raw_text, flags=re.DOTALL)

    if match:
        thinking = match.group(1).strip()
        answer = match.group(2).strip()
        answer = clean_special_tokens(answer)
        return thinking, answer

    # Case 2: <think> is present but the model never produced </think>
    if "<think>" in raw_text:
        thinking = raw_text.split("<think>", 1)[1].strip()
        thinking = clean_special_tokens(thinking)
        return thinking, ""

    # Case 3: no thinking tag, treat everything as the answer
    answer = clean_special_tokens(raw_text)
    return "", answer


def print_result(raw_text: str, thinking: str, answer: str, show: str):
    if show == "raw":
        print(raw_text)
        return

    if show == "thinking":
        print(thinking)
        return

    if show == "answer":
        print(answer)
        return

    if show == "json":
        obj = {
            "thinking": thinking,
            "answer": answer,
            "raw": raw_text,
        }
        print(json.dumps(obj, ensure_ascii=False, indent=2))
        return

    # Default: show == "all"
    print("=" * 80)
    print("THINKING")
    print("=" * 80)
    print(thinking if thinking else "[no thinking content extracted]")

    print("\n" + "=" * 80)
    print("ANSWER")
    print("=" * 80)
    print(answer if answer else "[no answer content extracted]")


def save_result(save_dir: str, raw_text: str, thinking: str, answer: str):
    os.makedirs(save_dir, exist_ok=True)

    with open(os.path.join(save_dir, "raw.txt"), "w", encoding="utf-8") as f:
        f.write(raw_text)

    with open(os.path.join(save_dir, "thinking.txt"), "w", encoding="utf-8") as f:
        f.write(thinking)

    with open(os.path.join(save_dir, "answer.txt"), "w", encoding="utf-8") as f:
        f.write(answer)

    with open(os.path.join(save_dir, "result.json"), "w", encoding="utf-8") as f:
        json.dump(
            {
                "thinking": thinking,
                "answer": answer,
                "raw": raw_text,
            },
            f,
            ensure_ascii=False,
            indent=2,
        )

    print(f"\nResults saved to: {save_dir}")


args = parse_args()

# Important: CUDA_VISIBLE_DEVICES must be set before importing torch or loading the model
use_cpu = args.gpu.lower() == "cpu"

if not use_cpu:
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer

torch.random.manual_seed(args.seed)

model_id = args.model

if use_cpu:
    device_map = "cpu"
else:
    if "," in args.gpu:
        # Several GPUs: split the model automatically
        device_map = "auto"
    else:
        # One GPU: CUDA_VISIBLE_DEVICES maps the chosen physical GPU to cuda:0
        device_map = {"": "cuda:0"}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device_map,
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": args.prompt,
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

# Prompt length, used below to slice off the prompt tokens
input_len = inputs["input_ids"].shape[-1]

if use_cpu:
    input_device = torch.device("cpu")
else:
    input_device = torch.device("cuda:0")

inputs = {k: v.to(input_device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=args.max_new_tokens,
        temperature=args.temperature,
        top_p=args.top_p,
        do_sample=True,
    )

# Decode only the newly generated tokens, keeping special tokens such as <|end|>
raw_text = tokenizer.batch_decode(
    outputs[:, input_len:],
    skip_special_tokens=False,
)[0]

thinking, answer = split_thinking_and_answer(raw_text)

print_result(
    raw_text=raw_text,
    thinking=thinking,
    answer=answer,
    show=args.show,
)

if args.save_dir:
    save_result(
        save_dir=args.save_dir,
        raw_text=raw_text,
        thinking=thinking,
        answer=answer,
    )
```
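The splitting rule can be exercised without loading the model. The sketch below is a compact restatement of the same three cases, applied to short made-up strings; `split_output` and `strip_end_tokens` are illustrative names, and the sample inputs are invented for the demonstration, not real model output.

```python
import re

END_TOKENS = ("<|end|>", "<|endoftext|>", "<|im_end|>", "</s>")


def strip_end_tokens(text: str) -> str:
    """Remove end-of-sequence markers and surrounding whitespace."""
    for token in END_TOKENS:
        text = text.replace(token, "")
    return text.strip()


def split_output(raw: str):
    """Return (thinking, answer) using the same three cases as the script."""
    match = re.search(r"<think>(.*?)</think>(.*)", raw.strip(), flags=re.DOTALL)
    if match:  # Case 1: a complete <think>...</think> pair
        return match.group(1).strip(), strip_end_tokens(match.group(2))
    if "<think>" in raw:  # Case 2: reasoning was cut off before </think>
        return strip_end_tokens(raw.split("<think>", 1)[1]), ""
    return "", strip_end_tokens(raw)  # Case 3: no reasoning tag at all


if __name__ == "__main__":
    samples = [
        "<think>\nstep 1\n</think>\n\nx = 2<|end|>",
        "<think>still thinking",
        "plain answer</s>",
    ]
    for sample in samples:
        print(repr(sample), "->", split_output(sample))
```

Case 2 is the one most likely to show up in practice when max_new_tokens is too small for the reasoning to finish, which leaves the answer empty.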

VI. Sample Run Results

Command:

```bash
python phi.py \
    --gpu 0 \
    --model "/mnt/sda_8t/winstonYF/model/LLM-Research/Phi-4-mini-flash-reasoning" \
    --prompt "How to solve 3*x^2+4*x+5=1?" \
    --save_dir outputs
```

The run produces the following files:

```text
outputs/
├── answer.txt
├── raw.txt
├── result.json
└── thinking.txt
```
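The saved result.json can be read back to confirm that the split kept everything. The sketch below assumes the three keys the script writes (thinking, answer, raw); `load_result` and `is_consistent` are hypothetical helpers, and the demo runs on a throwaway directory with an invented sample rather than on real output.

```python
import json
import tempfile
from pathlib import Path


def load_result(save_dir: str) -> dict:
    """Read result.json from a directory written with --save_dir."""
    with open(Path(save_dir) / "result.json", encoding="utf-8") as f:
        return json.load(f)


def is_consistent(result: dict) -> bool:
    """Both extracted parts should appear verbatim inside the raw output."""
    return result["thinking"] in result["raw"] and result["answer"] in result["raw"]


if __name__ == "__main__":
    # Demo on a temporary directory; point load_result at your own --save_dir instead.
    with tempfile.TemporaryDirectory() as tmp:
        sample = {
            "thinking": "2 + 2 is 4.",
            "answer": "4",
            "raw": "<think>\n2 + 2 is 4.\n</think>\n\n4<|end|>",
        }
        (Path(tmp) / "result.json").write_text(json.dumps(sample), encoding="utf-8")
        print(is_consistent(load_result(tmp)))
```

The check is deliberately loose: it would report a mismatch if an end token appeared in the middle of the answer, since the script removes those tokens wherever they occur.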

Contents of answer.txt

To solve the equation \(3x^2 + 4x + 5 = 1\), follow these steps:

1. **Rearrange the equation** to standard quadratic form:
   \[
   3x^2 + 4x + 5 - 1 = 0 \implies 3x^2 + 4x + 4 = 0
   \]

2. **Identify coefficients**: \(a = 3\), \(b = 4\), \(c = 4\).

3. **Calculate the discriminant**:
   \[
   \Delta = b^2 - 4ac = 4^2 - 4(3)(4) = 16 - 48 = -32
   \]
   Since \(\Delta < 0\), the solutions are complex.

4. **Apply the quadratic formula**:
   \[
   x = \frac{-b \pm \sqrt{\Delta}}{2a} = \frac{-4 \pm \sqrt{-32}}{6} = \frac{-4 \pm 4\sqrt{2}i}{6}
   \]

5. **Simplify the expression** by dividing numerator and denominator by 2:
   \[
   x = \frac{-2 \pm 2\sqrt{2}i}{3}
   \]

**Solutions**:
\[
x = \boxed{-\frac{2}{3} \pm \frac{2\sqrt{2}}{3}i}
\]
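The boxed result can be checked numerically with Python's built-in complex numbers. This is an independent check added for illustration, not something the model or the script produced: it substitutes both roots into the original equation and prints how far the left side is from 1.

```python
import math


def residual(x: complex) -> complex:
    """Left side minus right side of 3x^2 + 4x + 5 = 1."""
    return 3 * x**2 + 4 * x + 5 - 1


# Discriminant of 3x^2 + 4x + 4 = 0
discriminant = 4**2 - 4 * 3 * 4

# x = -2/3 ± (2*sqrt(2)/3) i
roots = [complex(-2 / 3, sign * 2 * math.sqrt(2) / 3) for sign in (1, -1)]

if __name__ == "__main__":
    print("discriminant:", discriminant)
    for root in roots:
        print(root, "residual magnitude:", abs(residual(root)))
```

Both residual magnitudes come out at floating-point noise level, which agrees with the substitution check the model performs in its own reasoning.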

Contents of raw.txt

<think>
Okay, so I need to solve the equation 3x² + 4x + 5 = 1. Hmm, let's see. First, I remember that to solve a quadratic equation, I should get everything on one side so that the equation equals zero. That way, I can maybe factor it or use the quadratic formula. Let me try subtracting 1 from both sides to move that over. 

So, subtracting 1 from both sides gives me 3x² + 4x + 5 - 1 = 0. Simplifying that, 5 - 1 is 4, so now the equation is 3x² + 4x + 4 = 0. Alright, now it's in the standard quadratic form ax² + bx + c = 0, where a is 3, b is 4, and c is 4. 

Next, I need to figure out how to solve this quadratic. I know that if it factors nicely, that would be the easiest way, but I don't remember off the top of my head if 3x² + 4x + 4 factors. Let me check. The product of a and c is 3*4 = 12. I need two numbers that multiply to 12 and add up to b, which is 4. Let's think: factors of 12 are 1 and 12, 2 and 6, 3 and 4. Hmm, 3 and 4 add up to 7, 2 and 6 add up to 8, 1 and 12 add up to 13. None of these pairs add up to 4. So factoring might not work here. 

Since factoring doesn't seem straightforward, I should use the quadratic formula. The quadratic formula is x = [-b ± √(b² - 4ac)] / (2a). Let me plug in the values. Here, a = 3, b = 4, c = 4. 

First, calculate the discriminant, which is b² - 4ac. So that's 4² - 4*3*4. 4 squared is 16. Then, 4*3 is 12, and 12*4 is 48. So the discriminant is 16 - 48, which equals -32. Wait, the discriminant is negative. That means there are no real solutions, only complex ones. 

So, the solutions will be complex numbers. Let me proceed with the quadratic formula. The square root of the discriminant is √(-32). Since the square root of a negative number involves i, the imaginary unit, this becomes √(32)*i. Simplifying √32, which is √(16*2) = 4√2. So, the square root part is 4√2 i. 

Now, plugging back into the formula: x = [-4 ± 4√2 i] / (2*3). The denominator is 6. So, we can factor out a 4 in the numerator. Let's see: [-4 ± 4√2 i]/6. Dividing numerator and denominator by 2, we get [-2 ± 2√2 i]/3. So, the solutions are x = (-2 + 2√2 i)/3 and x = (-2 - 2√2 i)/3. 

Alternatively, this can be written as x = -2/3 ± (2√2)/3 i. To check if this is correct, maybe I should substitute one of the solutions back into the original equation. Let's take x = (-2 + 2√2 i)/3. 

First, compute x². Let's denote x = (-2 + 2√2 i)/3. Squaring this: [(-2)^2 + (2√2 i)^2 + 2*(-2)*(2√2 i)] / (3^2). Calculating each term:

(-2)^2 = 4.

(2√2 i)^2 = (4*2)(i²) = 8*(-1) = -8.

The cross term: 2*(-2)*(2√2 i) = -8√2 i.

So, adding them up: 4 - 8 - 8√2 i = -4 - 8√2 i. Then divide by 9: (-4 - 8√2 i)/9.

Now, multiply x by 4: 4x = 4*(-2 + 2√2 i)/3 = (-8 + 8√2 i)/3.

Original equation is 3x² + 4x + 4 = 0. Let's compute 3x²: 3*[(-4 - 8√2 i)/9] = (-12 - 24√2 i)/9 = (-4 - 8√2 i)/3.

Then add 4x: (-4 - 8√2 i)/3 + (-8 + 8√2 i)/3 = [(-4 -8) + (-8√2 i + 8√2 i)] /3 = (-12 + 0i)/3 = -4.

Then add the last term, which is 4: -4 + 4 = 0. So it checks out. Similarly, the other solution would also check out. 

Therefore, the solutions are complex numbers: x equals negative two-thirds plus or minus two root two over three i. 

Wait, let me just confirm once more that I didn't make a mistake in simplifying the square root of the discriminant. The discriminant was -32, so sqrt(-32) is sqrt(32)*i. sqrt(32) is sqrt(16*2) which is 4*sqrt(2), so yes, that's correct. Then, substituting back into the quadratic formula gives the numerator as -4 ± 4*sqrt(2)i, divided by 6. Dividing numerator and denominator by 2 gives -2 ± 2*sqrt(2)i over 3. That seems right. 

I think that's all. The key steps were moving 1 to the left side, recognizing it's a quadratic equation, checking if it factors (which it doesn't), applying the quadratic formula, simplifying the complex numbers, and verifying the solution by substitution. So the final answer should be the two complex numbers I found.
</think>

To solve the equation \(3x^2 + 4x + 5 = 1\), follow these steps:

1. **Rearrange the equation** to standard quadratic form:
   \[
   3x^2 + 4x + 5 - 1 = 0 \implies 3x^2 + 4x + 4 = 0
   \]

2. **Identify coefficients**: \(a = 3\), \(b = 4\), \(c = 4\).

3. **Calculate the discriminant**:
   \[
   \Delta = b^2 - 4ac = 4^2 - 4(3)(4) = 16 - 48 = -32
   \]
   Since \(\Delta < 0\), the solutions are complex.

4. **Apply the quadratic formula**:
   \[
   x = \frac{-b \pm \sqrt{\Delta}}{2a} = \frac{-4 \pm \sqrt{-32}}{6} = \frac{-4 \pm 4\sqrt{2}i}{6}
   \]

5. **Simplify the expression** by dividing numerator and denominator by 2:
   \[
   x = \frac{-2 \pm 2\sqrt{2}i}{3}
   \]

**Solutions**:
\[
x = \boxed{-\frac{2}{3} \pm \frac{2\sqrt{2}}{3}i}
\]<|end|>

Contents of result.json

{
  "thinking": "Okay, so I need to solve the equation 3x² + 4x + 5 = 1. Hmm, let's see. First, I remember that to solve a quadratic equation, I should get everything on one side so that the equation equals zero. That way, I can maybe factor it or use the quadratic formula. Let me try subtracting 1 from both sides to move that over. \n\nSo, subtracting 1 from both sides gives me 3x² + 4x + 5 - 1 = 0. Simplifying that, 5 - 1 is 4, so now the equation is 3x² + 4x + 4 = 0. Alright, now it's in the standard quadratic form ax² + bx + c = 0, where a is 3, b is 4, and c is 4. \n\nNext, I need to figure out how to solve this quadratic. I know that if it factors nicely, that would be the easiest way, but I don't remember off the top of my head if 3x² + 4x + 4 factors. Let me check. The product of a and c is 3*4 = 12. I need two numbers that multiply to 12 and add up to b, which is 4. Let's think: factors of 12 are 1 and 12, 2 and 6, 3 and 4. Hmm, 3 and 4 add up to 7, 2 and 6 add up to 8, 1 and 12 add up to 13. None of these pairs add up to 4. So factoring might not work here. \n\nSince factoring doesn't seem straightforward, I should use the quadratic formula. The quadratic formula is x = [-b ± √(b² - 4ac)] / (2a). Let me plug in the values. Here, a = 3, b = 4, c = 4. \n\nFirst, calculate the discriminant, which is b² - 4ac. So that's 4² - 4*3*4. 4 squared is 16. Then, 4*3 is 12, and 12*4 is 48. So the discriminant is 16 - 48, which equals -32. Wait, the discriminant is negative. That means there are no real solutions, only complex ones. \n\nSo, the solutions will be complex numbers. Let me proceed with the quadratic formula. The square root of the discriminant is √(-32). Since the square root of a negative number involves i, the imaginary unit, this becomes √(32)*i. Simplifying √32, which is √(16*2) = 4√2. So, the square root part is 4√2 i. \n\nNow, plugging back into the formula: x = [-4 ± 4√2 i] / (2*3). The denominator is 6. So, we can factor out a 4 in the numerator. 
Let's see: [-4 ± 4√2 i]/6. Dividing numerator and denominator by 2, we get [-2 ± 2√2 i]/3. So, the solutions are x = (-2 + 2√2 i)/3 and x = (-2 - 2√2 i)/3. \n\nAlternatively, this can be written as x = -2/3 ± (2√2)/3 i. To check if this is correct, maybe I should substitute one of the solutions back into the original equation. Let's take x = (-2 + 2√2 i)/3. \n\nFirst, compute x². Let's denote x = (-2 + 2√2 i)/3. Squaring this: [(-2)^2 + (2√2 i)^2 + 2*(-2)*(2√2 i)] / (3^2). Calculating each term:\n\n(-2)^2 = 4.\n\n(2√2 i)^2 = (4*2)(i²) = 8*(-1) = -8.\n\nThe cross term: 2*(-2)*(2√2 i) = -8√2 i.\n\nSo, adding them up: 4 - 8 - 8√2 i = -4 - 8√2 i. Then divide by 9: (-4 - 8√2 i)/9.\n\nNow, multiply x by 4: 4x = 4*(-2 + 2√2 i)/3 = (-8 + 8√2 i)/3.\n\nOriginal equation is 3x² + 4x + 4 = 0. Let's compute 3x²: 3*[(-4 - 8√2 i)/9] = (-12 - 24√2 i)/9 = (-4 - 8√2 i)/3.\n\nThen add 4x: (-4 - 8√2 i)/3 + (-8 + 8√2 i)/3 = [(-4 -8) + (-8√2 i + 8√2 i)] /3 = (-12 + 0i)/3 = -4.\n\nThen add the last term, which is 4: -4 + 4 = 0. So it checks out. Similarly, the other solution would also check out. \n\nTherefore, the solutions are complex numbers: x equals negative two-thirds plus or minus two root two over three i. \n\nWait, let me just confirm once more that I didn't make a mistake in simplifying the square root of the discriminant. The discriminant was -32, so sqrt(-32) is sqrt(32)*i. sqrt(32) is sqrt(16*2) which is 4*sqrt(2), so yes, that's correct. Then, substituting back into the quadratic formula gives the numerator as -4 ± 4*sqrt(2)i, divided by 6. Dividing numerator and denominator by 2 gives -2 ± 2*sqrt(2)i over 3. That seems right. \n\nI think that's all. The key steps were moving 1 to the left side, recognizing it's a quadratic equation, checking if it factors (which it doesn't), applying the quadratic formula, simplifying the complex numbers, and verifying the solution by substitution. So the final answer should be the two complex numbers I found.",
  "answer": "To solve the equation \\(3x^2 + 4x + 5 = 1\\), follow these steps:\n\n1. **Rearrange the equation** to standard quadratic form:\n   \\[\n   3x^2 + 4x + 5 - 1 = 0 \\implies 3x^2 + 4x + 4 = 0\n   \\]\n\n2. **Identify coefficients**: \\(a = 3\\), \\(b = 4\\), \\(c = 4\\).\n\n3. **Calculate the discriminant**:\n   \\[\n   \\Delta = b^2 - 4ac = 4^2 - 4(3)(4) = 16 - 48 = -32\n   \\]\n   Since \\(\\Delta < 0\\), the solutions are complex.\n\n4. **Apply the quadratic formula**:\n   \\[\n   x = \\frac{-b \\pm \\sqrt{\\Delta}}{2a} = \\frac{-4 \\pm \\sqrt{-32}}{6} = \\frac{-4 \\pm 4\\sqrt{2}i}{6}\n   \\]\n\n5. **Simplify the expression** by dividing numerator and denominator by 2:\n   \\[\n   x = \\frac{-2 \\pm 2\\sqrt{2}i}{3}\n   \\]\n\n**Solutions**:\n\\[\nx = \\boxed{-\\frac{2}{3} \\pm \\frac{2\\sqrt{2}}{3}i}\n\\]",
  "raw": "<think>\nOkay, so I need to solve the equation 3x² + 4x + 5 = 1. Hmm, let's see. First, I remember that to solve a quadratic equation, I should get everything on one side so that the equation equals zero. That way, I can maybe factor it or use the quadratic formula. Let me try subtracting 1 from both sides to move that over. \n\nSo, subtracting 1 from both sides gives me 3x² + 4x + 5 - 1 = 0. Simplifying that, 5 - 1 is 4, so now the equation is 3x² + 4x + 4 = 0. Alright, now it's in the standard quadratic form ax² + bx + c = 0, where a is 3, b is 4, and c is 4. \n\nNext, I need to figure out how to solve this quadratic. I know that if it factors nicely, that would be the easiest way, but I don't remember off the top of my head if 3x² + 4x + 4 factors. Let me check. The product of a and c is 3*4 = 12. I need two numbers that multiply to 12 and add up to b, which is 4. Let's think: factors of 12 are 1 and 12, 2 and 6, 3 and 4. Hmm, 3 and 4 add up to 7, 2 and 6 add up to 8, 1 and 12 add up to 13. None of these pairs add up to 4. So factoring might not work here. \n\nSince factoring doesn't seem straightforward, I should use the quadratic formula. The quadratic formula is x = [-b ± √(b² - 4ac)] / (2a). Let me plug in the values. Here, a = 3, b = 4, c = 4. \n\nFirst, calculate the discriminant, which is b² - 4ac. So that's 4² - 4*3*4. 4 squared is 16. Then, 4*3 is 12, and 12*4 is 48. So the discriminant is 16 - 48, which equals -32. Wait, the discriminant is negative. That means there are no real solutions, only complex ones. \n\nSo, the solutions will be complex numbers. Let me proceed with the quadratic formula. The square root of the discriminant is √(-32). Since the square root of a negative number involves i, the imaginary unit, this becomes √(32)*i. Simplifying √32, which is √(16*2) = 4√2. So, the square root part is 4√2 i. \n\nNow, plugging back into the formula: x = [-4 ± 4√2 i] / (2*3). The denominator is 6. So, we can factor out a 4 in the numerator. 
Let's see: [-4 ± 4√2 i]/6. Dividing numerator and denominator by 2, we get [-2 ± 2√2 i]/3. So, the solutions are x = (-2 + 2√2 i)/3 and x = (-2 - 2√2 i)/3. \n\nAlternatively, this can be written as x = -2/3 ± (2√2)/3 i. To check if this is correct, maybe I should substitute one of the solutions back into the original equation. Let's take x = (-2 + 2√2 i)/3. \n\nFirst, compute x². Let's denote x = (-2 + 2√2 i)/3. Squaring this: [(-2)^2 + (2√2 i)^2 + 2*(-2)*(2√2 i)] / (3^2). Calculating each term:\n\n(-2)^2 = 4.\n\n(2√2 i)^2 = (4*2)(i²) = 8*(-1) = -8.\n\nThe cross term: 2*(-2)*(2√2 i) = -8√2 i.\n\nSo, adding them up: 4 - 8 - 8√2 i = -4 - 8√2 i. Then divide by 9: (-4 - 8√2 i)/9.\n\nNow, multiply x by 4: 4x = 4*(-2 + 2√2 i)/3 = (-8 + 8√2 i)/3.\n\nOriginal equation is 3x² + 4x + 4 = 0. Let's compute 3x²: 3*[(-4 - 8√2 i)/9] = (-12 - 24√2 i)/9 = (-4 - 8√2 i)/3.\n\nThen add 4x: (-4 - 8√2 i)/3 + (-8 + 8√2 i)/3 = [(-4 -8) + (-8√2 i + 8√2 i)] /3 = (-12 + 0i)/3 = -4.\n\nThen add the last term, which is 4: -4 + 4 = 0. So it checks out. Similarly, the other solution would also check out. \n\nTherefore, the solutions are complex numbers: x equals negative two-thirds plus or minus two root two over three i. \n\nWait, let me just confirm once more that I didn't make a mistake in simplifying the square root of the discriminant. The discriminant was -32, so sqrt(-32) is sqrt(32)*i. sqrt(32) is sqrt(16*2) which is 4*sqrt(2), so yes, that's correct. Then, substituting back into the quadratic formula gives the numerator as -4 ± 4*sqrt(2)i, divided by 6. Dividing numerator and denominator by 2 gives -2 ± 2*sqrt(2)i over 3. That seems right. \n\nI think that's all. The key steps were moving 1 to the left side, recognizing it's a quadratic equation, checking if it factors (which it doesn't), applying the quadratic formula, simplifying the complex numbers, and verifying the solution by substitution. 
So the final answer should be the two complex numbers I found.\n</think>\n\nTo solve the equation \\(3x^2 + 4x + 5 = 1\\), follow these steps:\n\n1. **Rearrange the equation** to standard quadratic form:\n   \\[\n   3x^2 + 4x + 5 - 1 = 0 \\implies 3x^2 + 4x + 4 = 0\n   \\]\n\n2. **Identify coefficients**: \\(a = 3\\), \\(b = 4\\), \\(c = 4\\).\n\n3. **Calculate the discriminant**:\n   \\[\n   \\Delta = b^2 - 4ac = 4^2 - 4(3)(4) = 16 - 48 = -32\n   \\]\n   Since \\(\\Delta < 0\\), the solutions are complex.\n\n4. **Apply the quadratic formula**:\n   \\[\n   x = \\frac{-b \\pm \\sqrt{\\Delta}}{2a} = \\frac{-4 \\pm \\sqrt{-32}}{6} = \\frac{-4 \\pm 4\\sqrt{2}i}{6}\n   \\]\n\n5. **Simplify the expression** by dividing numerator and denominator by 2:\n   \\[\n   x = \\frac{-2 \\pm 2\\sqrt{2}i}{3}\n   \\]\n\n**Solutions**:\n\\[\nx = \\boxed{-\\frac{2}{3} \\pm \\frac{2\\sqrt{2}}{3}i}\n\\]<|end|>"
}

thinking.txt contents

txt

Okay, so I need to solve the equation 3x² + 4x + 5 = 1. Hmm, let's see. First, I remember that to solve a quadratic equation, I should get everything on one side so that the equation equals zero. That way, I can maybe factor it or use the quadratic formula. Let me try subtracting 1 from both sides to move that over. 

So, subtracting 1 from both sides gives me 3x² + 4x + 5 - 1 = 0. Simplifying that, 5 - 1 is 4, so now the equation is 3x² + 4x + 4 = 0. Alright, now it's in the standard quadratic form ax² + bx + c = 0, where a is 3, b is 4, and c is 4. 

Next, I need to figure out how to solve this quadratic. I know that if it factors nicely, that would be the easiest way, but I don't remember off the top of my head if 3x² + 4x + 4 factors. Let me check. The product of a and c is 3*4 = 12. I need two numbers that multiply to 12 and add up to b, which is 4. Let's think: factors of 12 are 1 and 12, 2 and 6, 3 and 4. Hmm, 3 and 4 add up to 7, 2 and 6 add up to 8, 1 and 12 add up to 13. None of these pairs add up to 4. So factoring might not work here. 

Since factoring doesn't seem straightforward, I should use the quadratic formula. The quadratic formula is x = [-b ± √(b² - 4ac)] / (2a). Let me plug in the values. Here, a = 3, b = 4, c = 4. 

First, calculate the discriminant, which is b² - 4ac. So that's 4² - 4*3*4. 4 squared is 16. Then, 4*3 is 12, and 12*4 is 48. So the discriminant is 16 - 48, which equals -32. Wait, the discriminant is negative. That means there are no real solutions, only complex ones. 

So, the solutions will be complex numbers. Let me proceed with the quadratic formula. The square root of the discriminant is √(-32). Since the square root of a negative number involves i, the imaginary unit, this becomes √(32)*i. Simplifying √32, which is √(16*2) = 4√2. So, the square root part is 4√2 i. 

Now, plugging back into the formula: x = [-4 ± 4√2 i] / (2*3). The denominator is 6. So, we can factor out a 4 in the numerator. Let's see: [-4 ± 4√2 i]/6. Dividing numerator and denominator by 2, we get [-2 ± 2√2 i]/3. So, the solutions are x = (-2 + 2√2 i)/3 and x = (-2 - 2√2 i)/3. 

Alternatively, this can be written as x = -2/3 ± (2√2)/3 i. To check if this is correct, maybe I should substitute one of the solutions back into the original equation. Let's take x = (-2 + 2√2 i)/3. 

First, compute x². Let's denote x = (-2 + 2√2 i)/3. Squaring this: [(-2)^2 + (2√2 i)^2 + 2*(-2)*(2√2 i)] / (3^2). Calculating each term:

(-2)^2 = 4.

(2√2 i)^2 = (4*2)(i²) = 8*(-1) = -8.

The cross term: 2*(-2)*(2√2 i) = -8√2 i.

So, adding them up: 4 - 8 - 8√2 i = -4 - 8√2 i. Then divide by 9: (-4 - 8√2 i)/9.

Now, multiply x by 4: 4x = 4*(-2 + 2√2 i)/3 = (-8 + 8√2 i)/3.

Original equation is 3x² + 4x + 4 = 0. Let's compute 3x²: 3*[(-4 - 8√2 i)/9] = (-12 - 24√2 i)/9 = (-4 - 8√2 i)/3.

Then add 4x: (-4 - 8√2 i)/3 + (-8 + 8√2 i)/3 = [(-4 -8) + (-8√2 i + 8√2 i)] /3 = (-12 + 0i)/3 = -4.

Then add the last term, which is 4: -4 + 4 = 0. So it checks out. Similarly, the other solution would also check out. 

Therefore, the solutions are complex numbers: x equals negative two-thirds plus or minus two root two over three i. 

Wait, let me just confirm once more that I didn't make a mistake in simplifying the square root of the discriminant. The discriminant was -32, so sqrt(-32) is sqrt(32)*i. sqrt(32) is sqrt(16*2) which is 4*sqrt(2), so yes, that's correct. Then, substituting back into the quadratic formula gives the numerator as -4 ± 4*sqrt(2)i, divided by 6. Dividing numerator and denominator by 2 gives -2 ± 2*sqrt(2)i over 3. That seems right. 

I think that's all. The key steps were moving 1 to the left side, recognizing it's a quadratic equation, checking if it factors (which it doesn't), applying the quadratic formula, simplifying the complex numbers, and verifying the solution by substitution. So the final answer should be the two complex numbers I found.
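
As an independent sanity check on the sample output above (this is not part of thinking.txt), the two roots the model reports can be substituted back into the equation numerically. The sketch below is only an illustration; the helper name `residual` is made up for this example.

```python
import math


def residual(x: complex) -> complex:
    """Return 3x^2 + 4x + 5 - 1, which should be (numerically) 0 at a root."""
    return 3 * x**2 + 4 * x + 5 - 1


# The two roots reported in the sample output: -2/3 ± (2*sqrt(2)/3) i
roots = [complex(-2 / 3, sign * 2 * math.sqrt(2) / 3) for sign in (1, -1)]

for root in roots:
    # The magnitude of the residual should be at floating-point noise level.
    print(root, abs(residual(root)))
```

A residual close to zero for both values agrees with the model's own substitution check in the reasoning text.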

7. Common Issues

Issue 1: `No module named 'flash_attn'`

Cause: flash_attn was not installed successfully.

Fix: first check whether the module can be imported:

bash

python -c "import flash_attn; print(flash_attn.__version__)"

If this command fails, re-run the flash-attn installation steps from Section 3 (step 5 of Option 1, or step 4 of Option 2).

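Issue 2: `Invalid cross-device link`

Cause: while pip builds the flash_attn wheel, its temporary directory and its cache directory sit on different disks, so the final rename/move fails. A rename cannot cross filesystem boundaries, and the OS reports this as `Invalid cross-device link`.

Fix: point both directories at the same disk:

bash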

TMPDIR=/mnt/sda_8t/winstonYF/tmp/pip_tmp \
PIP_CACHE_DIR=/mnt/sda_8t/winstonYF/tmp/pip_cache \
pip install flash_attn==2.7.4.post1 \
  --no-build-isolation \
  --no-cache-dir \
  -i https://pypi.tuna.tsinghua.edu.cn/simple

If this is still unreliable, use the prebuilt wheel from GitHub described earlier in this article.
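
Before rebuilding, it can help to confirm that the two directories really are on the same filesystem. The following is a rough sketch, assuming both directories already exist; the helper name `same_filesystem` is made up for this example, and comparing `st_dev` is a heuristic rather than a guarantee.

```python
import os
import tempfile


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both existing paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # tempfile.gettempdir() honors TMPDIR when it points to a usable directory.
    tmp_dir = tempfile.gettempdir()
    # Assumption: fall back to the temp dir when PIP_CACHE_DIR is not set.
    cache_dir = os.environ.get("PIP_CACHE_DIR", tmp_dir)
    if os.path.isdir(cache_dir):
        print("same filesystem:", same_filesystem(tmp_dir, cache_dir))
    else:
        print("cache directory does not exist yet:", cache_dir)
```

Run it with the same `TMPDIR` and `PIP_CACHE_DIR` values that will be passed to pip; if it prints `False`, the rename error is likely to come back.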

Issue 3: `causal_conv1d_cuda` / `selective_scan_cuda` is missing

Do not try to install them like this:

bash

pip install causal_conv1d_cuda selective_scan_cuda

The correct way is:

bash

pip install causal-conv1d==1.5.0.post8 --no-build-isolation
pip install mamba-ssm==2.2.4 --no-build-isolation

These two CUDA modules are built by causal-conv1d and mamba-ssm respectively; they are not standalone packages.
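
To see which of the two extension modules is missing and which package to reinstall, a small check along these lines can be used. It is a sketch only: the mapping restates the text above, the function name `missing_extensions` is hypothetical, and `importlib.util.find_spec` only tells whether a module can be located, not whether it loads cleanly.

```python
import importlib.util

# Internal extension module -> pip package that builds it (as described above).
PROVIDERS = {
    "causal_conv1d_cuda": "causal-conv1d",
    "selective_scan_cuda": "mamba-ssm",
}


def missing_extensions(providers=PROVIDERS):
    """Return {module: package} for every module that cannot be located."""
    return {
        module: package
        for module, package in providers.items()
        if importlib.util.find_spec(module) is None
    }


if __name__ == "__main__":
    missing = missing_extensions()
    for module, package in missing.items():
        print(f"{module} not found; reinstall {package} with --no-build-isolation")
    if not missing:
        print("both extension modules can be located")
```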

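Issue 4: `undefined symbol: _ZN3c105ErrorC2E...` (FlashAttention fails to load)

Cause: the installed PyTorch version differs from the one flash-attn was compiled against. This commonly happens when torch is upgraded unintentionally, for example when it is pulled in automatically while installing cuda-nvcc through conda.

Fix: keep torch==2.6.0 strictly pinned. Option 2 (a fresh, dedicated environment) is recommended: install cuda-nvcc=12.4 through conda, then immediately pin the torch version again.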
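
A quick way to notice version drift is to compare the installed torch version against the pin after every conda or pip operation. This is a minimal sketch, assuming the pin is `2.6.0` as in this article; `matches_pin` is a hypothetical helper that ignores a local suffix such as `+cu124`.

```python
from importlib import metadata

EXPECTED_TORCH = "2.6.0"  # the version pinned in this article


def matches_pin(installed: str, expected: str) -> bool:
    """Compare versions, ignoring a local build suffix such as '+cu124'."""
    return installed.split("+", 1)[0] == expected


if __name__ == "__main__":
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed in this environment")
    else:
        status = "OK" if matches_pin(installed, EXPECTED_TORCH) else "DRIFTED"
        print(f"torch {installed}: {status} (expected {EXPECTED_TORCH})")
```

If the check reports drift, reinstall the pinned torch build before touching flash-attn again.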

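8. Key Lessons

Several points turned out to matter most during this deployment:

Use Python 3.10

Python 3.10 tends to be well supported by the CUDA extensions in the deep learning ecosystem.

PyTorch, CUDA, and flash-attn versions must match

This article uses torch==2.6.0+cu124 and flash_attn==2.7.4.post1.

Do not install Mamba's internal module names directly

causal_conv1d_cuda comes from causal-conv1d, and selective_scan_cuda comes from mamba-ssm.

Watch pip's temporary directory when building flash-attn

If `Invalid cross-device link` appears, set TMPDIR and PIP_CACHE_DIR to directories on the same disk.

Model output can be split into thinking and answer with a regular expression

If the model output has the form <think>...</think>, a regular expression can separate the reasoning from the final answer so that each is saved on its own, which makes later analysis easier.

Strictly avoid torch version drift

After operations such as `conda install cuda-nvcc`, always check whether the torch version has changed, and reinstall the pinned torch version if necessary.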
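
The thinking/answer split mentioned in the lessons above can be illustrated with a few lines of Python. This is a simplified sketch and not the script from Section 5: the function name `split_output` is made up, and stripping the `<|end|>` token is an assumption based on the raw sample output shown earlier.

```python
import re

# Non-greedy match of the first <think>...</think> block, across newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_output(raw: str, end_token: str = "<|end|>"):
    """Return (thinking, answer) from text shaped like '<think>...</think>answer'."""
    match = THINK_RE.search(raw)
    if match is None:
        # No thinking block: treat the whole text as the answer.
        return "", raw.replace(end_token, "").strip()
    thinking = match.group(1).strip()
    answer = raw[match.end():].replace(end_token, "").strip()
    return thinking, answer


if __name__ == "__main__":
    sample = "<think>\nMove 1 to the left side.\n</think>\n\nx = 1<|end|>"
    thinking, answer = split_output(sample)
    print("thinking:", thinking)
    print("answer:", answer)
```

Returning an empty thinking string when no `<think>` block is found keeps the caller simple, since the answer file can still be written unchanged.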

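9. Full Workflow Recap

The complete workflow can be summarized as:

text

Create the Conda environment
  ↓
Install PyTorch with CUDA 12.4
  ↓
Install Transformers / Accelerate / ModelScope
  ↓
Install causal-conv1d and mamba-ssm
  ↓
Install flash-attn
  ↓
Download the Phi-4-mini-flash-reasoning model
  ↓
Write the basic inference script phi.py
  ↓
Run the model and verify the output
  ↓
Improve the script to separate thinking / answer
  ↓
Save raw.txt, thinking.txt, answer.txt, result.json

In the end, the local environment runs Phi-4-mini-flash-reasoning successfully and saves the reasoning process and the final answer separately, which makes later testing, analysis, and blog write-ups easier.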
