# 2026.5 Hands-On Guide: Quantizing a LLaMA Factory Fine-Tuned Qwen3.5 Model with llama.cpp

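## About this document

- Use case: convert a Qwen3.5 model that was fine-tuned and merged with LLaMA Factory (Hugging Face format) into the GGUF format supported by llama.cpp, then quantize it and run inference.
- Key issue: Qwen3.5 ships with an MTP (multi-token prediction) module that llama.cpp does not support, so it must be disabled during conversion.
- Test environment: Linux, the latest llama.cpp at the time of writing, and a fully merged Qwen3.5 model.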

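## I. Prerequisites

### 1. Environment requirements

- Python, PyTorch, and git are installed.
- LLaMA Factory fine-tuning is finished and the LoRA weights have been merged, producing a complete Hugging Face format model.
- The model directory contains the core files: model.safetensors, config.json, tokenizer.json, and so on.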
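
Before converting, it can help to confirm that the merged model directory really contains the files listed above. The following is a minimal sketch: `check_model_dir` is a hypothetical helper name, and accepting any `*.safetensors` file (to allow for sharded weights) is an assumption rather than something this guide specifies.

```bash
# Hypothetical helper: report which expected files are missing from a
# merged Hugging Face model directory. Returns 0 only if all are present.
check_model_dir() {
    cmd_dir=$1
    cmd_status=0

    # config.json and tokenizer.json are named in the prerequisites.
    for cmd_file in config.json tokenizer.json; do
        if [ ! -f "$cmd_dir/$cmd_file" ]; then
            echo "missing: $cmd_dir/$cmd_file" >&2
            cmd_status=1
        fi
    done

    # Assumption: weights may be one model.safetensors or several shards,
    # so any *.safetensors file counts.
    cmd_found=0
    for cmd_weights in "$cmd_dir"/*.safetensors; do
        if [ -f "$cmd_weights" ]; then
            cmd_found=1
        fi
    done
    if [ "$cmd_found" -eq 0 ]; then
        echo "missing: no *.safetensors file in $cmd_dir" >&2
        cmd_status=1
    fi

    return "$cmd_status"
}
```

For example, `check_model_dir /mnt/workspace/LLaMA-Factory/saves/merge/qwen3.5_sft_merged && echo "model directory looks complete"`.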

### 2. Model path

This guide assumes the model is located at /mnt/workspace/LLaMA-Factory/saves/merge/qwen3.5_sft_merged

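## II. Step 1: Fetch and build the latest llama.cpp

Only recent llama.cpp versions support the Qwen3.5 architecture, so update to the latest code before building.

```bash
# Enter the working directory
cd /mnt/workspace

# Clone llama.cpp (skip if already cloned)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Pull the latest code
git pull

# Build
cmake -B build
cmake --build build --config Release
```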
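
After the build finishes, a quick check that the tools used in the later steps exist can save a confusing "command not found" later. This is a sketch only: `check_llama_build` is a hypothetical helper, and the default root `/mnt/workspace/llama.cpp` simply mirrors the paths used in this guide.

```bash
# Hypothetical helper: confirm the build produced the binaries that the
# quantization and inference steps call.
check_llama_build() {
    clb_root=${1:-/mnt/workspace/llama.cpp}

    for clb_tool in llama-quantize llama-cli; do
        if [ ! -x "$clb_root/build/bin/$clb_tool" ]; then
            echo "not found or not executable: $clb_root/build/bin/$clb_tool" >&2
            return 1
        fi
    done

    echo "build looks complete"
}
```

Call it with no argument to check the default location, or pass another llama.cpp checkout as the first argument.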

## III. Step 2: HF format → GGUF format (core step)

The `--no-mtp` flag is required here: it disables the Qwen3.5-specific MTP module and resolves the missing-tensor error.

```bash
# Return to the llama.cpp root directory
cd /mnt/workspace/llama.cpp

# Run the conversion (can be copied and run as-is)
python convert_hf_to_gguf.py \
    /mnt/workspace/LLaMA-Factory/saves/merge/qwen3.5_sft_merged \
    --outfile qwen3.5_sft_merged_f16.gguf \
    --no-mtp \
    --outtype f16
```

Output file: qwen3.5_sft_merged_f16.gguf (the FP16 base model)
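
A simple sanity check on the converted file is to look at its first four bytes, since GGUF files begin with the ASCII magic `GGUF`. The helper below is a minimal sketch; `is_gguf` is a hypothetical name, and passing this check only shows that the header magic is present, not that the file is complete.

```bash
# Hypothetical helper: succeed if the file starts with the GGUF magic bytes.
is_gguf() {
    # Read the first 4 bytes; strip NUL bytes so the shell does not warn
    # when the file is some other binary format.
    ig_magic=$(head -c 4 "$1" 2>/dev/null | tr -d '\000')
    [ "$ig_magic" = "GGUF" ]
}
```

For example, `is_gguf qwen3.5_sft_merged_f16.gguf && echo "header looks like GGUF"`.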

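## IV. Step 3: Quantize the GGUF model (q4_K_M recommended)

q4_K_M is a widely used quantization type that offers a good balance between speed and accuracy.

```bash
# Quantize (uses the llama-quantize tool from current llama.cpp builds)
./build/bin/llama-quantize \
    qwen3.5_sft_merged_f16.gguf \
    qwen3.5_sft_merged_q4_K_M.gguf \
    q4_K_M
```

Output file: qwen3.5_sft_merged_q4_K_M.gguf (the final quantized model)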
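
To see how much the quantization actually shrank the model, the two file sizes can be compared. This is a sketch under simple assumptions: `size_ratio_percent` is a hypothetical helper that prints the quantized size as an integer percentage of the original, rounded down.

```bash
# Hypothetical helper: print size of file 2 as a percentage of file 1.
size_ratio_percent() {
    srp_orig=$(wc -c < "$1") || return 1
    srp_quant=$(wc -c < "$2") || return 1

    # Guard against division by zero on an empty original file.
    [ "$srp_orig" -gt 0 ] || return 1

    echo $(( srp_quant * 100 / srp_orig ))
}
```

For example, `size_ratio_percent qwen3.5_sft_merged_f16.gguf qwen3.5_sft_merged_q4_K_M.gguf` prints a single number; a value well below 100 indicates the quantized file is smaller than the FP16 one.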

## V. Step 4: Test inference

### 1. Single-prompt test

```bash
./build/bin/llama-cli \
    --model /mnt/workspace/llama.cpp/qwen3.5_sft_merged_q4_K_M.gguf \
    --chat-template chatml \
    -p "你好"
```

### 2. Interactive chat mode

```bash
./build/bin/llama-cli \
    --model /mnt/workspace/llama.cpp/qwen3.5_sft_merged_q4_K_M.gguf \
    --chat-template chatml \
    --conversation
```

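## VI. Key parameters

| Parameter | Purpose | Necessity |
| --- | --- | --- |
| `--no-mtp` | Disables the Qwen3.5-specific MTP module and resolves the missing-tensor error | Required |
| `--outtype f16` | Writes the GGUF model in FP16 precision | Recommended |
| `--chat-template chatml` | Matches the official Qwen3.5 chat template | Required |
| `q4_K_M` | Quantization type (balances speed and accuracy) | Recommended |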
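
The conversion and quantization commands can be chained in a small wrapper so the parameters above are applied consistently. The sketch below only restates the commands from this guide: `run` and `quantize_pipeline` are hypothetical names, `DRY_RUN` is an illustrative variable, and the function assumes it is called from the llama.cpp root directory.

```bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: convert to FP16 GGUF, then quantize.
#   $1 = merged HF model directory
#   $2 = output file prefix, e.g. qwen3.5_sft_merged
#   $3 = quantization type (defaults to q4_K_M)
quantize_pipeline() {
    qp_model_dir=$1
    qp_name=$2
    qp_quant=${3:-q4_K_M}

    run python convert_hf_to_gguf.py "$qp_model_dir" \
        --outfile "${qp_name}_f16.gguf" --no-mtp --outtype f16 || return 1

    run ./build/bin/llama-quantize \
        "${qp_name}_f16.gguf" "${qp_name}_${qp_quant}.gguf" "$qp_quant" || return 1
}
```

Setting `DRY_RUN=1` before calling `quantize_pipeline` prints the two commands without executing them, which is a convenient way to review paths before starting a long conversion.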

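## VII. Common errors and fixes

### 1. Error: `missing tensor 'blk.24.attn_norm.weight'`

- Cause: the Qwen3.5 MTP module was not disabled.
- Fix: pass the `--no-mtp` flag when converting.

### 2. Error: `failed to open GGUF file`

- Cause: the path uses Windows-style backslashes (`\`) on Linux, or the path is wrong.
- Fix: use forward slashes (`/`) throughout, and verify that the file exists with `ls`.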
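
For paths pasted from a Windows environment, the backslash fix and the existence check can be combined. This is a rough sketch: `to_posix_path` and `check_gguf_path` are hypothetical helpers, and blindly replacing every backslash is an assumption that only makes sense when the path was never meant to contain a literal backslash.

```bash
# Hypothetical helper: replace every backslash in a path with a forward slash.
to_posix_path() {
    printf '%s\n' "$1" | tr '\\' '/'
}

# Hypothetical helper: normalize the path, then confirm the file exists.
check_gguf_path() {
    cgp_path=$(to_posix_path "$1")
    if [ -f "$cgp_path" ]; then
        echo "ok: $cgp_path"
    else
        echo "not found: $cgp_path" >&2
        return 1
    fi
}
```

For example, `check_gguf_path '/mnt/workspace/llama.cpp\qwen3.5_sft_merged_q4_K_M.gguf'` prints the corrected path if the file exists.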
