[LLaMA-Factory] Fine-tuning DeepSeek-R1-Distill-Qwen-7B with LoRA

Local environment

Dependency  Version
Linux       BigCloud Enterprise Linux 8.6
GPU         NVIDIA Tesla T4 16G * 8

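Disable the open-source nouveau driver

If you install the NVIDIA driver without first disabling the open-source nouveau driver, the installation fails and the log file /var/log/nvidia-installer.log contains the following error: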
ERROR: Unable to load the kernel module 'nvidia.ko'

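  • Check whether nouveau is loaded by running
shell
lsmod | grep nouveau

If the command prints nothing, nouveau is already disabled. Output like the following means it is still loaded:

shell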
nouveau              2334720  0
video                  57344  1 nouveau
mxm_wmi                16384  1 nouveau
drm_kms_helper        262144  5 drm_vram_helper,ast,nouveau
ttm                   114688  3 drm_vram_helper,drm_ttm_helper,nouveau
i2c_algo_bit           16384  3 igb,ast,nouveau
drm                   614400  7 drm_kms_helper,drm_vram_helper,ast,drm_ttm_helper,ttm,nouveau
i2c_core               98304  9 drm_kms_helper,i2c_algo_bit,igb,ast,i2c_smbus,i2c_i801,ipmi_ssif,nouveau,drm
wmi                    32768  2 mxm_wmi,nouveau
  • Disable nouveau
shell
# Create the blacklist file (this overwrites the file if it already exists)
sudo sh -c 'cat > /etc/modprobe.d/blacklist.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF'

# Regenerate the initramfs
sudo dracut --force
# Reboot the machine
sudo reboot
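
After the reboot, the nouveau check can also be scripted. The sketch below is only an illustration: the helper names `module_loaded` and `nouveau_loaded` are made up for this example. Unlike a plain `grep nouveau`, it matches the first column of the `lsmod` output only, so modules that merely list nouveau as a user (for example `video`) do not count.

```python
import subprocess


def module_loaded(lsmod_output: str, module: str) -> bool:
    """Return True if `module` is listed in the first column of lsmod output."""
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] == module:
            return True
    return False


def nouveau_loaded() -> bool:
    """Run lsmod on the local machine and report whether nouveau is loaded."""
    result = subprocess.run(["lsmod"], capture_output=True, text=True, check=True)
    return module_loaded(result.stdout, "nouveau")

# Example use on the target machine:
#   print("nouveau still loaded" if nouveau_loaded() else "nouveau disabled")
```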

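Install the NVIDIA driver (nvidia-smi)

  • Open https://www.nvidia.com/en-us/drivers/ in a browser. A proxy (VPN) is needed for the Manual Driver Search form to load
  • Fill in the driver search form
  • Review the driver search results
  • Open the driver download page
  • Download the driver file NVIDIA-Linux-x86_64-570.133.20.run and upload it to the Linux machine. Copying the download URL and fetching it on the machine with wget does not work; the request returns 403
  • Install the driver
  • The installer must be run as root
  • -no-x-check # skip the check for a running X server during installation
  • -no-nouveau-check # skip the check for the nouveau driver during installation
  • -no-opengl-files # install only the driver files, not the OpenGL files
  • When the installer asks for the kernel module type, choose NVIDIA Proprietary as prompted: move the selection with the left/right arrow keys, then press Enter
shell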
Multiple kernel module types are available for this system. Which would you like to use?
  NVIDIA Proprietary        MIT/GPL   
  • After installation, run the following command to confirm that the driver installed successfully
shell
nvidia-smi

The GPU information is shown below

shell
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:3D:00.0 Off |                    0 |
| N/A   51C    P0             27W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:3E:00.0 Off |                    0 |
| N/A   54C    P0             28W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla T4                       Off |   00000000:40:00.0 Off |                    0 |
| N/A   51C    P0             27W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla T4                       Off |   00000000:41:00.0 Off |                    0 |
| N/A   48C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Tesla T4                       Off |   00000000:B1:00.0 Off |                    0 |
| N/A   47C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Tesla T4                       Off |   00000000:B2:00.0 Off |                    0 |
| N/A   53C    P0             30W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Tesla T4                       Off |   00000000:B4:00.0 Off |                    0 |
| N/A   56C    P0             30W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Tesla T4                       Off |   00000000:B5:00.0 Off |                    0 |
| N/A   55C    P0             30W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
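
For scripted checks, the same information is easier to consume through nvidia-smi's CSV query mode than by reading the table above. The following is a minimal sketch; the helper names `parse_gpu_csv` and `query_gpus` are illustrative, and it assumes GPU names contain no commas.

```python
import subprocess

# nvidia-smi query mode: one CSV line per GPU, no header, no units (memory in MiB)
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text: str) -> list:
    """Parse 'index, name, total, used' lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, total, used = [field.strip() for field in line.split(",")]
        gpus.append(
            {"index": int(index), "name": name,
             "total_mib": int(total), "used_mib": int(used)}
        )
    return gpus


def query_gpus() -> list:
    """Run nvidia-smi on the local machine and return the parsed GPU list."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)

# Example use on the target machine:
#   assert len(query_gpus()) == 8, "expected eight T4 cards"
```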

Install Git

Install git and git-lfs

shell
sudo dnf install git git-lfs

Install Anaconda (conda)

  • Download page: https://www.anaconda.com/download/success
  • Download the 64-Bit (x86) Installer
shell
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
  • Install
shell
sudo bash Anaconda3-2024.10-1-Linux-x86_64.sh
  • The installer prints many prompts; answer yes throughout, and press q to skip through the license text
shell
Do you accept the license terms? [yes|no]
>>> yes

Anaconda3 will now be installed into this location:
/root/anaconda3
  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below
[/root/anaconda3] >>> /data/ProgramFiles/anaconda3

You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
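  • Set environment variables
shell
# \$ keeps each variable literal so it is expanded at login, not when this file is written
cat >> ~/.bash_profile << EOF
export ANACONDA3_HOME=/data/ProgramFiles/anaconda3
export CONDA_ENVS_PATH=\$ANACONDA3_HOME/envs
export PATH="\$ANACONDA3_HOME/bin:\$PATH"
EOF
source ~/.bash_profile

# The directory was installed as root; hand ownership to the regular user (tkyj here)
sudo chown -R tkyj:tkyj /data
  • Check the conda version to verify the installation
shell
conda -V
  • Configure mirror channels
shell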
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

Download the DeepSeek-R1-Distill-Qwen-7B model

  • ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
shell
mkdir -pv /data/llm/models
cd /data/llm/models

# Clone the repository first, skipping the LFS large files
GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.git
# Make sure git-lfs is installed correctly
cd DeepSeek-R1-Distill-Qwen-7B
git lfs install
# Download the large files
git lfs pull
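
Because the clone above skips the large files, an interrupted `git lfs pull` can leave small pointer files in place of the real weights. The sketch below lists any files that are still pointers; it relies on Git LFS pointer files starting with the line `version https://git-lfs.github.com/spec/v1`. The function name `find_lfs_pointers` is illustrative.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str) -> list:
    """Return relative paths of files under model_dir that are still LFS pointers."""
    root = Path(model_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip git metadata and anything that is not a regular file
        if ".git" in rel.parts or not path.is_file():
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(str(rel))
    return pointers

# Example use:
#   find_lfs_pointers("/data/llm/models/DeepSeek-R1-Distill-Qwen-7B")
# An empty list means every large file was downloaded.
```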

Install LLaMA-Factory

  • Software requirements
Mandatory     Minimum  Recommend
python        3.9      3.10
torch         2.0.0    2.6.0
transformers  4.45.0   4.50.0
datasets      2.16.0   3.2.0
accelerate    0.34.0   1.2.1
peft          0.14.0   0.15.1
trl           0.8.6    0.9.6

Optional      Minimum  Recommend
CUDA          11.6     12.2
deepspeed     0.10.0   0.16.4
bitsandbytes  0.39.0   0.43.1
vllm          0.4.3    0.8.2
flash-attn    2.5.6    2.7.2
  • Hardware requirements
Method                           Bits  7B     14B    30B    70B     xB
Full (bf16 or fp16)              32    120GB  240GB  600GB  1200GB  18xGB
Full (pure_bf16)                 16    60GB   120GB  300GB  600GB   8xGB
Freeze/LoRA/GaLore/APOLLO/BAdam  16    16GB   32GB   64GB   160GB   2xGB
QLoRA                            8     10GB   20GB   40GB   80GB    xGB
QLoRA                            4     6GB    12GB   24GB   48GB    x/2GB
QLoRA                            2     4GB    8GB    16GB   24GB    x/4GB
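
The last column of the hardware table can be read as a rule of thumb: memory in GB is roughly a per-method multiplier times the parameter count in billions. The sketch below encodes only that column; the dictionary keys and the function name `estimate_vram_gb` are made up for this example, and the results are rough estimates rather than measurements.

```python
# Multipliers taken from the "xB" column of the hardware table above
GB_PER_BILLION_PARAMS = {
    "full_32bit": 18.0,      # Full (bf16 or fp16), 32 bits
    "full_pure_bf16": 8.0,   # Full (pure_bf16), 16 bits
    "lora_16bit": 2.0,       # Freeze/LoRA/GaLore/APOLLO/BAdam, 16 bits
    "qlora_8bit": 1.0,
    "qlora_4bit": 0.5,
    "qlora_2bit": 0.25,
}


def estimate_vram_gb(method: str, params_billion: float) -> float:
    """Rough GPU memory estimate in GB for a model with params_billion parameters."""
    return GB_PER_BILLION_PARAMS[method] * params_billion


print(estimate_vram_gb("lora_16bit", 7))   # 14.0
print(estimate_vram_gb("qlora_4bit", 7))   # 3.5
```

For 16-bit LoRA on a 7B model the rule gives about 14 GB, close to the 16GB the table lists and to the 15360MiB a single T4 offers, which leaves very little headroom on one card.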

Download LLaMA-Factory

  • Clone the project with git
shell
cd /data/ProgramFiles
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git

Install LLaMA-Factory dependencies

shell
cd LLaMA-Factory
conda create --name llama_factory python=3.10
conda activate llama_factory
# Install the PyTorch build for CUDA 12.8 from the official PyTorch wheel index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple

# Verify the installation
llamafactory-cli version

Update the shell profile

This makes every new login shell switch to the matching Python environment automatically

shell
cat >> ~/.bash_profile << EOF
conda activate llama_factory
EOF
source ~/.bash_profile

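Install deepspeed

Fine-tuning a 7B model on a single 16G GPU is tight on memory; deepspeed can relieve this

shell
# List the available cuda versions
conda search cuda -c nvidia
# Install the version matching nvidia-smi (the CUDA toolkit is a prerequisite for installing deepspeed)
conda install cuda=12.8.1 -c nvidia
DS_BUILD_CPU_ADAM=1 pip3 install deepspeed==0.16.7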

Prepare the Alpaca dataset

  • Example of the alpaca format
json
[
  {
    "instruction": "Invent a new word by combining two existing words.",
    "input": "",
    "output": "\"Chillax\" - a combination of the words \"chill\" and \"relax\". It means to calm down and unwind, enjoying oneself in a laidback manner."
  },
  {
    "instruction": "Transform the following sentence using a synonym: The car sped quickly.",
    "input": "",
    "output": "The car accelerated rapidly."
  }
]
  • Edit LLaMA-Factory/data/dataset_info.json and add the following entry
json
"cmic_financial_apaca": {
  "file_name": "/data/llm/dataset/cmic_financial_apaca.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system"
  }
}
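
Before training, it can help to sanity-check the dataset file against the alpaca layout shown above. The sketch below assumes that `instruction` and `output` must be non-empty strings and that `input` and `system` are optional strings, matching the column mapping registered in dataset_info.json; the function names are illustrative and not part of LLaMA-Factory.

```python
import json

REQUIRED_KEYS = ("instruction", "output")
OPTIONAL_KEYS = ("input", "system")


def validate_alpaca(records) -> list:
    """Return a list of problems found; an empty list means the records look fine."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty '{key}'")
        for key in OPTIONAL_KEYS:
            if key in rec and not isinstance(rec[key], str):
                problems.append(f"record {i}: '{key}' must be a string")
    return problems


def validate_alpaca_file(path: str) -> list:
    """Load a JSON dataset file and validate it."""
    with open(path, encoding="utf-8") as fh:
        return validate_alpaca(json.load(fh))

# Example use:
#   validate_alpaca_file("/data/llm/dataset/cmic_financial_apaca.json")
```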

Prepare the LoRA config file

  • Copy the original file
shell
cd LLaMA-Factory/examples/train_lora
# Make a new copy of the original file
cp llama3_lora_sft.yaml ds_qwen7b_lora_sft.yaml
vi ds_qwen7b_lora_sft.yaml
  • Edit ds_qwen7b_lora_sft.yaml; the main fields to change are
    • model_name_or_path
    • dataset
    • template
    • cutoff_len
    • max_samples
    • output_dir
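  • Parameters to pay attention to
    • model_name_or_path: path to the model
    • dataset: dataset name, matching the cmic_financial_apaca entry declared above
    • template: prompt template
    • cutoff_len: maximum length of the input sequence
    • output_dir: where the fine-tuned weights are saved
    • gradient_accumulation_steps: number of gradient-accumulation steps. It sets the effective batch size rather than per-step memory use, so when GPU memory is short, lower per_device_train_batch_size or cutoff_len and raise this value to compensate
    • num_train_epochs: number of training epochs
  • The full content of ds_qwen7b_lora_sft.yaml is as follows
yaml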
### model
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: cmic_financial_apaca
template: deepseek3
cutoff_len: 4096
max_samples: 4019
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
#eval_dataset: alpaca_en_demo
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

Set the number of GPUs

For multi-GPU parallel training, set the GPU count by editing LLaMA-Factory/examples/accelerate/fsdp_config.yaml

yaml
num_processes: 8  # the number of GPUs in all nodes

Start fine-tuning

shell
conda activate llama_factory

# Run in the background
nohup llamafactory-cli train /data/ProgramFiles/LLaMA-Factory/examples/train_lora/ds_qwen7b_lora_sft.yaml > nohup.log 2>&1 &

# Follow the log
tail -fn200 nohup.log

Check GPU status during fine-tuning

Run: watch -n 0.5 nvidia-smi

shell
Every 0.5s: nvidia-smi                                                                                                                                             localhost.localdomain: Mon Apr 28 15:33:10 2025

Mon Apr 28 15:33:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:3D:00.0 Off |                    0 |
| N/A   39C    P8             19W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:3E:00.0 Off |                    0 |
| N/A   40C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla T4                       Off |   00000000:40:00.0 Off |                    0 |
| N/A   37C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla T4                       Off |   00000000:41:00.0 Off |                    0 |
| N/A   37C    P0             25W /   70W |       0MiB /  15360MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Tesla T4                       Off |   00000000:B1:00.0 Off |                    0 |
| N/A   36C    P8             13W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Tesla T4                       Off |   00000000:B2:00.0 Off |                    0 |
| N/A   40C    P8             15W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Tesla T4                       Off |   00000000:B4:00.0 Off |                    0 |
| N/A   42C    P8             15W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Tesla T4                       Off |   00000000:B5:00.0 Off |                    0 |
| N/A   41C    P8             11W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

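Start the webui service

shell
# Stop the firewall so the web service port is reachable
sudo systemctl stop firewalld

# Pass GRADIO_SHARE to the webui process (a bare assignment on its own line is not exported to it)
GRADIO_SHARE=1 nohup llamafactory-cli webui > webui.log 2>&1 &
tail -fn200 webui.log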


Chat

shell
llamafactory-cli chat /data/ProgramFiles/LLaMA-Factory/examples/inference/ds_qwen7b_lora_sft.yaml
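
The chat command above reads examples/inference/ds_qwen7b_lora_sft.yaml, whose content is not shown in this post. The sketch below generates a plausible version of it, modeled on the llama3_lora_sft.yaml inference example that ships with LLaMA-Factory; treat the key set and the values as assumptions to check against your checkout. The helper name `build_inference_config` is illustrative.

```python
def build_inference_config(model_path: str, adapter_path: str, template: str) -> str:
    """Return the YAML text for a LoRA inference config (plain strings, no YAML library)."""
    lines = [
        f"model_name_or_path: {model_path}",      # base model directory
        f"adapter_name_or_path: {adapter_path}",  # output_dir of the LoRA training run
        f"template: {template}",                  # same template as used for training
        "infer_backend: huggingface",
        "trust_remote_code: true",
    ]
    return "\n".join(lines) + "\n"


config_text = build_inference_config(
    "/data/llm/models/DeepSeek-R1-Distill-Qwen-7B",
    "/data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B",
    "deepseek3",
)
print(config_text)
```

Saving the printed text as examples/inference/ds_qwen7b_lora_sft.yaml gives the chat command a config that points at both the base model and the trained adapter.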

Loss curve
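
With plot_loss: true, LLaMA-Factory saves a loss plot image into output_dir when training finishes. To inspect the numbers directly, the sketch below extracts (step, loss) pairs from a JSON-lines training log. It assumes the run wrote a trainer_log.jsonl file in output_dir whose records carry `current_steps` and `loss` fields; verify this against the files your version actually produces. The function name `read_loss_points` is illustrative.

```python
import json


def read_loss_points(jsonl_text: str) -> list:
    """Return (step, loss) tuples from JSON-lines log text, skipping records without a loss."""
    points = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("loss") is not None and "current_steps" in record:
            points.append((record["current_steps"], record["loss"]))
    return points

# Example use (the path is an assumption based on output_dir in the training config):
#   with open("/data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B/trainer_log.jsonl") as fh:
#       for step, loss in read_loss_points(fh.read()):
#           print(step, loss)
```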
