[LLaMA-Factory] Fine-Tuning DeepSeek-R1-Distill-Qwen-7B with LoRA

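Local environment
Disabling the open-source nouveau driver
Installing the NVIDIA driver (nvidia-smi)
Installing Git
Installing Anaconda (conda)
Downloading the `DeepSeek-R1-Distill-Qwen-7B` model
Installing LLaMA-Factory
Downloading LLaMA-Factory
Installing the LLaMA-Factory dependencies
Updating the shell profile
Installing DeepSpeed
Preparing the Alpaca dataset
Preparing the LoRA configuration file
Setting the number of GPUs
Starting fine-tuning
Monitoring the GPUs during fine-tuning
Starting the web UI
Chat
Loss curve
References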

Local environment

Dependency	Version
Linux	BigCloud Enterprise Linux 8.6
GPU	NVIDIA Tesla T4 16G * 8

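Disabling the open-source nouveau driver

If the open-source nouveau driver is still active when you install the NVIDIA driver (the package that provides nvidia-smi), the installation fails and the following error appears in /var/log/nvidia-installer.log: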
ERROR: Unable to load the kernel module 'nvidia.ko'

To check whether nouveau is currently loaded, run:

lsmod | grep nouveau

If the command prints nothing, nouveau is already disabled. Output like the following means it is still loaded:

nouveau              2334720  0
video                  57344  1 nouveau
mxm_wmi                16384  1 nouveau
drm_kms_helper        262144  5 drm_vram_helper,ast,nouveau
ttm                   114688  3 drm_vram_helper,drm_ttm_helper,nouveau
i2c_algo_bit           16384  3 igb,ast,nouveau
drm                   614400  7 drm_kms_helper,drm_vram_helper,ast,drm_ttm_helper,ttm,nouveau
i2c_core               98304  9 drm_kms_helper,i2c_algo_bit,igb,ast,i2c_smbus,i2c_i801,ipmi_ssif,nouveau,drm
wmi                    32768  2 mxm_wmi,nouveau

Disable nouveau:

# Append the blacklist rules (the file is created if it does not exist)
sudo sh -c 'cat >> /etc/modprobe.d/blacklist.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF'

# Regenerate the initramfs
sudo dracut --force
# Reboot the machine
sudo reboot
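
After the reboot, the lsmod check from the start of this section can be repeated in script form. The following is a minimal sketch; the helper name module_loaded is introduced here for illustration, and it only inspects the first column of lsmod-style output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if module "$1" appears in the first column
# of lsmod-style text read from stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# If lsmod itself is unavailable, this falls through to the "not loaded" branch.
if command -v lsmod >/dev/null 2>&1 && lsmod | module_loaded nouveau; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi
```

Matching on the first column avoids false positives from lines such as "video ... nouveau", where nouveau is only listed as a dependent module.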

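Installing the NVIDIA driver (nvidia-smi)

Open https://www.nvidia.com/en-us/drivers/ in a browser. From mainland China, a proxy (VPN) is needed before the Manual Driver Search form will load.
Fill in the driver information
Driver search results
Driver download page
Download the driver file, NVIDIA-Linux-x86_64-570.133.20.run, in the browser and upload it to the Linux machine. Copying the download URL and fetching it on the machine with wget does not work: the request returns 403.
Install the driver

The installer must be run with root privileges. The following options are used:

-no-x-check # do not abort the installation if an X server is running

-no-nouveau-check # do not abort the installation if the nouveau driver is detected

-no-opengl-files # install only the driver files, without the OpenGL files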
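
Put together, the installer invocation might look like the sketch below. It assumes the .run file sits in the current directory and writes the options in the double-dash spelling that nvidia-installer documents. The RUN_FILE and INSTALL_FLAGS variable names are introduced here for illustration only.

```shell
#!/bin/sh
# Assumed location of the downloaded installer; adjust to where you uploaded it.
RUN_FILE="./NVIDIA-Linux-x86_64-570.133.20.run"
# The three options described above.
INSTALL_FLAGS="--no-x-check --no-nouveau-check --no-opengl-files"

if [ -f "$RUN_FILE" ]; then
    # Word splitting of $INSTALL_FLAGS is intentional here.
    sudo sh "$RUN_FILE" $INSTALL_FLAGS
else
    echo "installer not found: $RUN_FILE" >&2
fi
```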

During installation you are asked to choose a kernel module type. Select NVIDIA Proprietary with the left and right arrow keys, then press Enter.

Multiple kernel module types are available for this system. Which would you like to use?
  NVIDIA Proprietary        MIT/GPL

After the installation finishes, run the following command to confirm that the driver was installed successfully:

nvidia-smi

The GPU information is shown below:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:3D:00.0 Off |                    0 |
| N/A   51C    P0             27W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:3E:00.0 Off |                    0 |
| N/A   54C    P0             28W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla T4                       Off |   00000000:40:00.0 Off |                    0 |
| N/A   51C    P0             27W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla T4                       Off |   00000000:41:00.0 Off |                    0 |
| N/A   48C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Tesla T4                       Off |   00000000:B1:00.0 Off |                    0 |
| N/A   47C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Tesla T4                       Off |   00000000:B2:00.0 Off |                    0 |
| N/A   53C    P0             30W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Tesla T4                       Off |   00000000:B4:00.0 Off |                    0 |
| N/A   56C    P0             30W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Tesla T4                       Off |   00000000:B5:00.0 Off |                    0 |
| N/A   55C    P0             30W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Installing Git

Install git and git-lfs:

sudo dnf install git git-lfs

Installing Anaconda (conda)

Download page: https://www.anaconda.com/download/success
Download the 64-Bit (x86) Installer:

wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh

Run the installer:

sudo sh Anaconda3-2024.10-1-Linux-x86_64.sh

The installer prints a lot of output. Answer yes at each prompt, and press q to skip through the license text.

Do you accept the license terms? [yes|no]
>>> yes

Anaconda3 will now be installed into this location:
/root/anaconda3
  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below
[/root/anaconda3] >>> /data/ProgramFiles/anaconda3

You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes

Set the environment variables:

cat >> ~/.bash_profile << EOF
export ANACONDA3_HOME=/data/ProgramFiles/anaconda3
export CONDA_ENVS_PATH=\$ANACONDA3_HOME/envs
export PATH="\$ANACONDA3_HOME/bin:\$PATH"
EOF
source ~/.bash_profile

# The directory was created by the installer running as root, so hand ownership to the working user
sudo chown -R tkyj:tkyj /data

Check the conda version to verify the installation:

conda -V

Configure the mirror channels:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

Downloading the `DeepSeek-R1-Distill-Qwen-7B` model

ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

mkdir -pv /data/llm/models
cd /data/llm/models

# Clone without the LFS large files first (GIT_LFS_SKIP_SMUDGE=1 skips them)
GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.git
# Make sure git-lfs is installed correctly
cd DeepSeek-R1-Distill-Qwen-7B
git lfs install
# Download the large files
git lfs pull
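
Because the clone skips the large files, it is worth confirming that git lfs pull really replaced the placeholders. A Git LFS pointer is a small text file whose first line starts with "version https://git-lfs.github.com/spec/v1", so a check along the following lines can flag weights that were not downloaded. This is a sketch; the is_lfs_pointer helper and the *.safetensors glob are assumptions about the repository layout.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if file "$1" is still a Git LFS pointer
# rather than the real content.
is_lfs_pointer() {
    head -c 64 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Run from inside the cloned model directory.
for f in *.safetensors; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if is_lfs_pointer "$f"; then
        echo "not downloaded yet: $f"
    fi
done
```

If the loop prints nothing, every matched weight file holds real data.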

Installing LLaMA-Factory

Software requirements

Mandatory	Minimum	Recommend
python	3.9	3.10
torch	2.0.0	2.6.0
transformers	4.45.0	4.50.0
datasets	2.16.0	3.2.0
accelerate	0.34.0	1.2.1
peft	0.14.0	0.15.1
trl	0.8.6	0.9.6

Optional	Minimum	Recommend
CUDA	11.6	12.2
deepspeed	0.10.0	0.16.4
bitsandbytes	0.39.0	0.43.1
vllm	0.4.3	0.8.2
flash-attn	2.5.6	2.7.2

Hardware requirements

Method	Bits	7B	14B	30B	70B	`x`B
Full (`bf16` or `fp16`)	32	120GB	240GB	600GB	1200GB	`18x`GB
Full (`pure_bf16`)	16	60GB	120GB	300GB	600GB	`8x`GB
Freeze/LoRA/GaLore/APOLLO/BAdam	16	16GB	32GB	64GB	160GB	`2x`GB
QLoRA	8	10GB	20GB	40GB	80GB	`x`GB
QLoRA	4	6GB	12GB	24GB	48GB	`x/2`GB
QLoRA	2	4GB	8GB	16GB	24GB	`x/4`GB

Downloading LLaMA-Factory

Clone the project with git:

cd /data/ProgramFiles
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git

Installing the LLaMA-Factory dependencies

cd LLaMA-Factory
conda create --name llama_factory python=3.10
conda activate llama_factory
# Install the PyTorch build for CUDA 12.8 from the official PyTorch index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple

# Verify the installation
llamafactory-cli version

Updating the shell profile

This activates the matching Python environment automatically at every terminal login:

cat >> ~/.bash_profile << EOF
conda activate llama_factory
EOF
source ~/.bash_profile

Installing DeepSpeed

Fine-tuning a 7B model on a single 16 GB GPU is tight on memory; DeepSpeed can help with this.

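# List the available CUDA versions
conda search cuda -c nvidia
# Install the version that matches the one reported by nvidia-smi (the CUDA toolkit is a prerequisite for installing DeepSpeed)
conda install cuda=12.8.1 -c nvidia
DS_BUILD_CPU_ADAM=1 pip3 install deepspeed==0.16.7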

Preparing the Alpaca dataset

Example of the Alpaca format:

[
  {
    "instruction": "Invent a new word by combining two existing words.",
    "input": "",
    "output": "\"Chillax\" - a combination of the words \"chill\" and \"relax\". It means to calm down and unwind, enjoying oneself in a laidback manner."
  },
  {
    "instruction": "Transform the following sentence using a synonym: The car sped quickly.",
    "input": "",
    "output": "The car accelerated rapidly."
  }
]

Edit LLaMA-Factory/data/dataset_info.json and add the following entry:

"cmic_financial_apaca": {
  "file_name": "/data/llm/dataset/cmic_financial_apaca.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system"
  }
}

Preparing the LoRA configuration file

Copy the original file so it stays untouched:

cd LLaMA-Factory/examples/train_lora
# Copy the original file to a new one
cp llama3_lora_sft.yaml ds_qwen7b_lora_sft.yaml
vi ds_qwen7b_lora_sft.yaml

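Edit ds_qwen7b_lora_sft.yaml. The main fields to change are:
- model_name_or_path
- dataset
- template
- cutoff_len
- max_samples
- output_dir
The following parameters deserve attention:
- model_name_or_path: path to the model
- dataset: dataset name, matching the cmic_financial_apaca entry declared above
- template: chat template
- cutoff_len: maximum length of the input sequence
- output_dir: path where the fine-tuned weights are saved
- gradient_accumulation_steps: number of micro-batches accumulated before each optimizer update. Peak GPU memory is governed mainly by per_device_train_batch_size and cutoff_len, so those are the values to reduce first when GPU memory runs short
- num_train_epochs: number of training epochs
The complete ds_qwen7b_lora_sft.yaml is as follows: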

### model
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: cmic_financial_apaca
template: deepseek3
cutoff_len: 4096
max_samples: 4019
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
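bf16: true  # the T4 (Turing) has no native bf16 support; if this fails or runs slowly, fp16: true is the usual alternative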
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
#eval_dataset: alpaca_en_demo
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
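
The batch-related values in this file combine into the effective batch size seen by each optimizer step. The arithmetic below is a rough sketch, assuming all 8 GPUs from the environment table take part and that the dataset really contains max_samples = 4019 examples. The variable names are illustrative.

```shell
#!/bin/sh
PER_DEVICE_BATCH=1      # per_device_train_batch_size
GRAD_ACCUM=8            # gradient_accumulation_steps
NUM_GPUS=8              # assumed: all eight T4 cards take part
MAX_SAMPLES=4019        # max_samples (upper bound on examples used)

# Samples consumed per optimizer update across all GPUs.
EFFECTIVE_BATCH=$((PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS))

# val_size: 0.1 holds out about 10% of the data for evaluation.
TRAIN_SAMPLES=$((MAX_SAMPLES * 9 / 10))

# Approximate optimizer updates in one epoch (integer division).
STEPS_PER_EPOCH=$((TRAIN_SAMPLES / EFFECTIVE_BATCH))

echo "effective batch size: $EFFECTIVE_BATCH"
echo "approx. training samples: $TRAIN_SAMPLES"
echo "approx. steps per epoch: $STEPS_PER_EPOCH"
```

Under these assumptions a single epoch has only a few dozen optimizer steps, so if the estimate holds, the 500-step save_steps and eval_steps intervals would not be reached during the run.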

Setting the number of GPUs

For multi-GPU training, set the number of GPUs in LLaMA-Factory/examples/accelerate/fsdp_config.yaml:

num_processes: 8  # the number of GPUs in all nodes
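
To pick the right value, the GPUs visible on the machine can be counted from the nvidia-smi -L listing, which prints one line per GPU. A minimal sketch follows; count_gpus is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: counts "GPU <n>: ..." lines read from stdin,
# the format printed by `nvidia-smi -L`.
count_gpus() {
    grep -c '^GPU [0-9]'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    echo "GPUs detected: $(nvidia-smi -L | count_gpus)"
else
    echo "nvidia-smi not found" >&2
fi
```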

Starting fine-tuning

conda activate llama_factory

# Run in the background
nohup llamafactory-cli train /data/ProgramFiles/LLaMA-Factory/examples/train_lora/ds_qwen7b_lora_sft.yaml > nohup.log 2>&1 &

# Follow the log
tail -fn200 nohup.log
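
While the job runs, the training loss can be pulled out of the log without scrolling through it. The sketch below assumes the trainer writes dictionary-style records such as {'loss': 1.2345, ...} to nohup.log, which is how the Hugging Face Trainer usually reports progress. If your log looks different, adjust the pattern. extract_loss is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: prints every training-loss value found in log file "$1".
# Assumes log records of the form {'loss': 1.2345, 'grad_norm': ...}.
extract_loss() {
    grep -o "'loss': [0-9.]*" "$1" | awk '{ print $2 }'
}

LOG_FILE="nohup.log"
if [ -f "$LOG_FILE" ]; then
    extract_loss "$LOG_FILE" | tail -n 5   # the five most recent values
fi
```

The leading quote in the pattern keeps keys such as 'eval_loss' or 'train_loss' from being picked up.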

Monitoring the GPUs during fine-tuning

Run: watch -n 0.5 nvidia-smi

Every 0.5s: nvidia-smi                                                                                                                                             localhost.localdomain: Mon Apr 28 15:33:10 2025

Mon Apr 28 15:33:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version:      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:3D:00.0 Off |                    0 |
| N/A   39C    P8             19W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:3E:00.0 Off |                    0 |
| N/A   40C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla T4                       Off |   00000000:40:00.0 Off |                    0 |
| N/A   37C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla T4                       Off |   00000000:41:00.0 Off |                    0 |
| N/A   37C    P0             25W /   70W |       0MiB /  15360MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Tesla T4                       Off |   00000000:B1:00.0 Off |                    0 |
| N/A   36C    P8             13W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Tesla T4                       Off |   00000000:B2:00.0 Off |                    0 |
| N/A   40C    P8             15W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Tesla T4                       Off |   00000000:B4:00.0 Off |                    0 |
| N/A   42C    P8             15W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Tesla T4                       Off |   00000000:B5:00.0 Off |                    0 |
| N/A   41C    P8             11W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Starting the web UI

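# Stop the firewall so the web service port is reachable
sudo systemctl stop firewalld

# GRADIO_SHARE must be exported to reach the webui process
export GRADIO_SHARE=1
nohup llamafactory-cli webui > webui.log 2>&1 &
tail -fn200 webui.log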

Chat

llamafactory-cli chat /data/ProgramFiles/LLaMA-Factory/examples/inference/ds_qwen7b_lora_sft.yaml
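
The chat command reads an inference configuration that the walkthrough does not show. A plausible version, modeled on the LoRA inference examples bundled with LLaMA-Factory, pairs the base model with the adapter directory produced by training. Treat the keys and values below as assumptions to check against your LLaMA-Factory version. The script writes the file to the current directory by default so it can be reviewed before being copied into examples/inference/. INFER_CONFIG is a hypothetical override variable.

```shell
#!/bin/sh
# Output path is an assumption; set INFER_CONFIG to write somewhere else.
OUT="${INFER_CONFIG:-ds_qwen7b_lora_sft.yaml}"

# Keys follow LLaMA-Factory's bundled LoRA inference examples (assumed);
# the paths and template are taken from the training config above.
cat > "$OUT" << 'EOF'
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
adapter_name_or_path: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
template: deepseek3
infer_backend: huggingface
trust_remote_code: true
EOF

echo "wrote $OUT"
```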
