LLM - 使用 LLaMA-Factory 微调大模型环境配置与训练推理教程 (1)

欢迎关注我的CSDN：https://spike.blog.csdn.net/

本文地址：https://spike.blog.csdn.net/article/details/143388189

免责声明：本文来源于个人知识与公开资料，仅用于学术交流，欢迎讨论，不支持转载。

LLaMA-Factory 是开源的大模型微调框架，用于高效地微调和部署大语言模型，支持多种预训练模型和微调算法，提供完整的工具和接口，对于预训练的模型进行定制化的训练和调整，以适应特定的应用场景。

Paper: LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

目前比较友好的多模态大语言模型训练框架，即 LLaMA-Factory、SWIFT(ModelScope)、XTuner(上海 AI 实验室)，Star 和 Fork 数量对比(2024.11.6)：

LLaMA-Factory: Star 33.7 k、Fork 4.1k，使用和关注人数最多
SWIFT(ModelScope): 与 iOS 开发语言重名，Star 4.1k，Fork 364
XTuner: Star 3.9 k、Fork 306

1. 配置环境

构建训练 Docker 环境，即：

bash 复制代码

docker run -it \
--privileged \
--network host \
--shm-size 64G \
--gpus all \
--ipc host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--name llama_factory \
-v [your path]:[your path] \
nvcr.io/nvidia/pytorch:23.03-py3 \
/bin/bash

环境变量：

bash 复制代码

export TORCH_HOME="[your path]/torch_home/"
export HF_HOME="[your path]/huggingface/"
export HUGGINGFACE_TOKEN="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"
export MODELSCOPE_CACHE="[your path]/modelscope_models/"
export MODELSCOPE_API_TOKEN="dd37048f-f66b-4b6a-9a1e-99a0e15581c8"
export CUDA_HOME="/usr/local/cuda"

服务器配置：

bash 复制代码

uname -m && cat /etc/*release

# 输出
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04

GCC 配置：

bash 复制代码

gcc --version

# 输出
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

NVCC 配置：

bash 复制代码

nvcc -V

# 输出
Cuda compilation tools, release 12.1, V12.1.66

在 Docker 中构建 Conda 环境：

bash 复制代码

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda_20241105.sh
conda init  # 这一步，在 bashrc 中创建 conda 启动脚本
source ~/.bashrc

conda create -n llama_factory python=3.10
conda activate llama_factory

构建 PyTorch 环境：

bash 复制代码

pip3 install torch torchvision torchaudio

python

import torch
print(torch.__version__)  # 2.5.0+cu124
print(torch.cuda.is_available())  # True
exit()

安装 Llama-Factory 的依赖：

bash 复制代码

git clone https://github.com/hiyouga/LLaMA-Factory.git
pip install -r requirements.txt

pip install --no-deps -e ".[torch,metrics]"
# pip install --no-deps -e .

# 验证是否成功
llamafactory-cli train -h

推理测试：

bash 复制代码

CUDA_VISIBLE_DEVICES=1 llamafactory-cli webchat \
--model_name_or_path [your path]/llm/Meta-Llama-3.1-8B-Instruct/ \
--template llama3

参考：执行llamafactory-cli webui报错 Exception in thread Thread-3 (_do_normal_analytics_request)

日志：

bash 复制代码

[INFO|2024-11-05 12:16:51] llamafactory.model.model_utils.attention:157 >> Using torch SDPA for faster training and inference.
[INFO|2024-11-05 12:16:51] llamafactory.model.loader:157 >> all params: 8,030,261,248
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

WebUI 已经启动成功，即：

2. 训练准备

Llama Factory 支持 Alpaca 和 ShareGPT 两种数据格式，数据地址：

中文数据：alpaca_gpt4_data_zh.json
广告数据：AdvertiseGen.tar.gz
大模型自我介绍数据集：identity.json，系统自带

替换 identity.json 数据集的内容：

需要填写 name 和 author，两个属性，例如 ZhichunluBot (大模型名称)、ManonLegrand (作者)

bash 复制代码

cp data/identity.json data/identity_new.json
sed -i 's/{{name}}/ZhichunluBot/g'  data/identity_new.json 
sed -i 's/{{author}}/ManonLegrand/g'  data/identity_new.json 
mv data/identity.json data/identity_ori.json
mv data/identity_new.json data/identity.json

注册数据集，将数据集的 JSON 文件放入 data 目录，修改 data/dataset_info.json

自定义数据集名称，例如 adgen_local，后续训练，使用名称查找数据集。
指定数据集，具体文件位置
定义原数据集的输入输出，与所需要的格式之间的映射关系

构建 dataset_info.json 数据集：

json 复制代码

# 旧数据
"alpaca_gpt4_zh": {
  "hf_hub_url": "llamafactory/alpaca_gpt4_zh",
  "ms_hub_url": "llamafactory/alpaca_gpt4_zh",
  "om_hub_url": "State_Cloud/alpaca-gpt4-data-zh"
},

# 新增
"adgen_local": {
  "file_name": "AdvertiseGen/train.json",
  "columns": {
    "prompt": "content",
    "response": "summary"
  }
}

下载 alpaca_gpt4_zh 数据集，即：

不指定 --local-dir，下载至默认路径，即 HF_HOME 位置
数据集需要指定 --repo-type dataset 参数

bash 复制代码

huggingface-cli download --token [your token] --repo-type dataset llamafactory/alpaca_gpt4_zh

# huggingface-cli download --token [your token] --repo-type dataset llamafactory/alpaca_gpt4_zh --local-dir llamafactory/alpaca_gpt4_zh

模型微调：

--stage sft，选择微调(SFT) 模式
--finetuning_type lora，选择 LoRA 类型
--num_train_epochs 10，Epochs 数量设置 10，合计 580 步 (58 Step/Epoch)，与数据集相关

即

bash 复制代码

llamafactory-cli train -h
CUDA_VISIBLE_DEVICES=1,2 nohup llamafactory-cli train \
    --stage sft \
    --do_train \
    --model_name_or_path [your path]/llm/Meta-Llama-3.1-8B-Instruct/ \
    --dataset alpaca_gpt4_zh,identity,adgen_local \
    --dataset_dir ./data \
    --template llama3 \
    --finetuning_type lora \
    --output_dir ./saves/Llama-3.1-8B/lora/sft-3 \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 50 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 10.0 \
    --max_samples 1000 \
    --val_size 0.1 \
    --plot_loss \
    --fp16 > nohup.out &

合计显存占用 60G，2个卡，每个卡占用 30 G，即 30G * 2 = 60G

训练日志：

bash 复制代码

***** Running Evaluation *****
[INFO|trainer.py:4119] 2024-11-06 09:56:27,498 >>   Num examples = 210
[INFO|trainer.py:4122] 2024-11-06 09:56:27,498 >>   Batch size = 1
100%|██████████| 105/105 [00:06<00:00, 16.91it/s]
***** eval metrics *****
  epoch                   =      9.966
  eval_loss               =     1.7758
  eval_runtime            = 0:00:06.27
  eval_samples_per_second =     33.457
  eval_steps_per_second   =     16.728
[INFO|modelcard.py:449] 2024-11-06 09:56:33,803 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
bj-a800-roce0007:6615:6881 [0] NCCL INFO [Service thread] Connection closed by localRank 0
bj-a800-roce0007:6616:6880 [1] NCCL INFO [Service thread] Connection closed by localRank 1
bj-a800-roce0007:6615:6974 [0] NCCL INFO comm 0xc994100 rank 0 nranks 2 cudaDev 0 busId 51000 - Abort COMPLETE
bj-a800-roce0007:6616:6975 [1] NCCL INFO comm 0xe182c70 rank 1 nranks 2 cudaDev 1 busId 6a000 - Abort COMPLETE

Loss 曲线，还没有完全收敛：

测试效果：

bash 复制代码

CUDA_VISIBLE_DEVICES=1 llamafactory-cli webchat \
--model_name_or_path [your path]/llm/Meta-Llama-3.1-8B-Instruct/ \
--adapter_name_or_path [your path]/llm/LLaMA-Factory/saves/Llama-3.1-8B/lora/sft-3/  \
--template llama3 \
--finetuning_type lora

已经学习完成 identity.json 数据的效果，输出如下：

参考：

训练数据集的格式Alpaca 和 ShareGPT

LLM - 使用 LLaMA-Factory 微调大模型 环境配置与训练推理 教程 (1)

1. 配置环境

2. 训练准备

LLM - 使用 LLaMA-Factory 微调大模型环境配置与训练推理教程 (1)