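Follow me on CSDN: https://spike.blog.csdn.net/
Disclaimer: This article is based on personal knowledge and public materials and is intended only for academic exchange. Discussion is welcome; reposting is not permitted.
LLaMA-Factory is an open-source framework for efficiently fine-tuning and deploying large language models. It supports many pretrained models and fine-tuning algorithms, and provides a complete set of tools and interfaces for customizing and further training pretrained models to fit specific application scenarios.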
Paper: LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
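The most user-friendly frameworks for training multimodal large language models at the moment are LLaMA-Factory, SWIFT (ModelScope), and XTuner (Shanghai AI Laboratory). Star and fork counts (as of 2024-11-06):
- LLaMA-Factory: 33.7k stars, 4.1k forks; the most widely used and followed
- SWIFT (ModelScope): shares its name with Apple's iOS programming language; 4.1k stars, 364 forks
- XTuner: 3.9k stars, 306 forks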
1. Environment Setup
Build the Docker training environment:
```bash
docker run -it \
--privileged \
--network host \
--shm-size 64G \
--gpus all \
--ipc host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--name llama_factory \
-v [your path]:[your path] \
nvcr.io/nvidia/pytorch:23.03-py3 \
/bin/bash
```
Environment variables:
```bash
export TORCH_HOME="[your path]/torch_home/"
export HF_HOME="[your path]/huggingface/"
export HUGGINGFACE_TOKEN="[your token]"
export MODELSCOPE_CACHE="[your path]/modelscope_models/"
export MODELSCOPE_API_TOKEN="[your token]"
export CUDA_HOME="/usr/local/cuda"
```
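Later steps rely on these variables (for example, downloads land under HF_HOME by default), so it can help to confirm they are set before continuing. A minimal sketch; `check_env` is a hypothetical helper, not part of LLaMA-Factory:

```bash
# Hypothetical helper: report any of the given variables that are unset or empty.
# Returns 0 if all are set, 1 otherwise.
check_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} expands the variable whose name is stored in $name
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_env TORCH_HOME HF_HOME MODELSCOPE_CACHE CUDA_HOME && echo "environment OK"`.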
Server configuration:
```bash
uname -m && cat /etc/*release
# Output
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
```
GCC version:
```bash
gcc --version
# Output
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
```
NVCC version:
```bash
nvcc -V
# Output
Cuda compilation tools, release 12.1, V12.1.66
```
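The CUDA toolkit here (12.1) differs from the CUDA build of the PyTorch wheel installed below (`cu124`); pip wheels ship their own CUDA runtime, so the mismatch mainly matters when compiling CUDA extensions. To extract the toolkit release for such comparisons, a minimal sketch (`cuda_release` is a hypothetical helper):

```bash
# Hypothetical helper: read `nvcc -V` output on stdin and print the release number, e.g. "12.1".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

Usage: `nvcc -V | cuda_release`.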
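Build a Conda environment inside Docker:
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda_20241105.sh
bash miniconda_20241105.sh -b -p ~/miniconda3  # install non-interactively (example install path)
~/miniconda3/bin/conda init  # writes the conda startup script into ~/.bashrc
source ~/.bashrc
conda create -n llama_factory python=3.10
conda activate llama_factory
```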
Install PyTorch:
```bash
pip3 install torch torchvision torchaudio
# Verify the installation
python - <<'EOF'
import torch
print(torch.__version__)          # 2.5.0+cu124
print(torch.cuda.is_available())  # True
EOF
```
Install LLaMA-Factory and its dependencies:
```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -r requirements.txt
pip install --no-deps -e ".[torch,metrics]"
# pip install --no-deps -e .
# Verify the installation
llamafactory-cli train -h
```
Inference test:
```bash
CUDA_VISIBLE_DEVICES=1 llamafactory-cli webchat \
--model_name_or_path [your path]/llm/Meta-Llama-3.1-8B-Instruct/ \
--template llama3
```
Log:
```bash
[INFO|2024-11-05 12:16:51] llamafactory.model.model_utils.attention:157 >> Using torch SDPA for faster training and inference.
[INFO|2024-11-05 12:16:51] llamafactory.model.loader:157 >> all params: 8,030,261,248
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
```
The WebUI has started successfully.
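2. Training Preparation
LLaMA-Factory supports two data formats, Alpaca and ShareGPT. The datasets used here:
- Chinese data: alpaca_gpt4_data_zh.json
- Advertising data: AdvertiseGen.tar.gz
- Model self-introduction dataset: identity.json, bundled with LLaMA-Factory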
Replace the contents of the identity.json dataset:
- Fill in the two placeholders, `{{name}}` and `{{author}}`, e.g. ZhichunluBot (the model name) and ManonLegrand (the author).
```bash
cp data/identity.json data/identity_new.json
sed -i 's/{{name}}/ZhichunluBot/g' data/identity_new.json
sed -i 's/{{author}}/ManonLegrand/g' data/identity_new.json
mv data/identity.json data/identity_ori.json  # keep the original file
mv data/identity_new.json data/identity.json
```
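After the `sed` replacements, it is worth confirming that no placeholders remain. A minimal sketch; `count_placeholders` is a hypothetical helper that counts leftover `{{name}}`/`{{author}}` occurrences:

```bash
# Hypothetical helper: print how many {{name}} / {{author}} placeholders remain in a file.
# grep -o prints each match on its own line; wc -l counts them (0 if none).
count_placeholders() {
  grep -o -e '{{name}}' -e '{{author}}' "$1" | wc -l
}
```

Usage: `count_placeholders data/identity.json` should print `0` once both replacements have been applied.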
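Register the dataset: put the dataset's JSON file into the data directory and edit data/dataset_info.json:
- Define a custom dataset name, e.g. adgen_local; training later looks up the dataset by this name.
- Specify the dataset's file location.
- Define the mapping from the original dataset's input/output fields to the required format.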
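An excerpt of data/dataset_info.json with the existing `alpaca_gpt4_zh` entry and the newly added `adgen_local` entry (other entries omitted). In `columns`, the `content` and `summary` fields of AdvertiseGen are mapped to the prompt and the response:
```json
{
  "alpaca_gpt4_zh": {
    "hf_hub_url": "llamafactory/alpaca_gpt4_zh",
    "ms_hub_url": "llamafactory/alpaca_gpt4_zh",
    "om_hub_url": "State_Cloud/alpaca-gpt4-data-zh"
  },
  "adgen_local": {
    "file_name": "AdvertiseGen/train.json",
    "columns": {
      "prompt": "content",
      "response": "summary"
    }
  }
}
```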
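Every name passed to `--dataset` during training must be a key in data/dataset_info.json. As a rough pre-flight check, the sketch below only greps for each quoted key followed by a colon rather than parsing the JSON; `check_datasets` is a hypothetical helper:

```bash
# Hypothetical helper: check that each comma-separated dataset name appears
# as a quoted key ("name":) in the given dataset_info.json. Returns 1 if any is missing.
check_datasets() {
  local info_file=$1 names=$2 name status=0
  local -a list
  IFS=',' read -r -a list <<< "$names"
  for name in "${list[@]}"; do
    if ! grep -q "\"$name\"[[:space:]]*:" "$info_file"; then
      echo "not registered: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_datasets data/dataset_info.json alpaca_gpt4_zh,identity,adgen_local`.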
Download the alpaca_gpt4_zh dataset:
- If `--local-dir` is not specified, the files are downloaded to the default location under HF_HOME.
- Datasets require the `--repo-type dataset` argument.
```bash
huggingface-cli download --token [your token] --repo-type dataset llamafactory/alpaca_gpt4_zh
# huggingface-cli download --token [your token] --repo-type dataset llamafactory/alpaca_gpt4_zh --local-dir llamafactory/alpaca_gpt4_zh
```
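Without `--local-dir`, the files go into the Hugging Face cache, which by default is `$HF_HOME/hub` with one directory per repository named `<type>s--<org>--<name>`. A minimal sketch for locating the downloaded dataset, assuming `HF_HUB_CACHE` is not overridden separately; `hf_cache_dir` is a hypothetical helper:

```bash
# Hypothetical helper: print the cache directory used for a Hub repo.
# Assumes the default layout $HF_HOME/hub/<type>s--<org>--<name> (HF_HUB_CACHE not set).
hf_cache_dir() {
  local repo_type=$1 repo_id=$2
  local home=${HF_HOME:-$HOME/.cache/huggingface}
  # Replace every "/" in the repo id with "--"
  echo "$home/hub/${repo_type}s--${repo_id//\//--}"
}
```

Usage: `ls "$(hf_cache_dir dataset llamafactory/alpaca_gpt4_zh)/snapshots"` lists the downloaded revisions.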
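Fine-tune the model:
- `--stage sft`: supervised fine-tuning (SFT) mode
- `--finetuning_type lora`: LoRA fine-tuning
- `--num_train_epochs 10`: 10 epochs, 580 steps in total (58 steps/epoch); the number of steps depends on the dataset size

The command: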
```bash
llamafactory-cli train -h
CUDA_VISIBLE_DEVICES=1,2 nohup llamafactory-cli train \
--stage sft \
--do_train \
--model_name_or_path [your path]/llm/Meta-Llama-3.1-8B-Instruct/ \
--dataset alpaca_gpt4_zh,identity,adgen_local \
--dataset_dir ./data \
--template llama3 \
--finetuning_type lora \
--output_dir ./saves/Llama-3.1-8B/lora/sft-3 \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 1024 \
--preprocessing_num_workers 16 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--logging_steps 50 \
--warmup_steps 20 \
--save_steps 100 \
--eval_steps 50 \
--evaluation_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 10.0 \
--max_samples 1000 \
--val_size 0.1 \
--plot_loss \
--fp16 > nohup.out &
```
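The 58 steps per epoch follow from the effective batch size: each optimizer step consumes per_device_train_batch_size × GPUs × gradient_accumulation_steps = 2 × 2 × 8 = 32 samples. The sketch below estimates the step count in shell arithmetic. It assumes data-parallel sharding where each GPU gets ceil(samples / GPUs) samples, the per-GPU loader rounds batches up, and accumulation divides with floor, which matches the Hugging Face Trainer in the versions I know of; the training-sample count 1881 in the usage line is a hypothetical value, not taken from the run. `steps_per_epoch` is a hypothetical helper:

```bash
# Hypothetical helper: estimate optimizer steps per epoch.
# Args: training samples, per-device batch size, number of GPUs, gradient accumulation steps.
steps_per_epoch() {
  local samples=$1 batch=$2 gpus=$3 accum=$4
  local per_gpu=$(( (samples + gpus - 1) / gpus ))    # ceil(samples / gpus)
  local batches=$(( (per_gpu + batch - 1) / batch ))  # ceil(per_gpu / batch)
  echo $(( batches / accum ))                         # floor(batches / accum)
}
```

Usage: `steps_per_epoch 1881 2 2 8` prints `58`, and 58 × 10 epochs gives 580 steps.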
Total GPU memory usage is about 60 GB across the 2 GPUs, roughly 30 GB per GPU (30 GB × 2 = 60 GB).
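Per-GPU memory usage can be read with `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`, which prints one `index, MiB` line per GPU. A small sketch that totals those values; note it sums every GPU on the machine, not only the ones in CUDA_VISIBLE_DEVICES. `total_gpu_mem_mib` is a hypothetical helper:

```bash
# Hypothetical helper: sum the memory.used column (MiB) of
# `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits` output read from stdin.
total_gpu_mem_mib() {
  awk -F',' '{ gsub(/ /, "", $2); total += $2 } END { print total + 0 }'
}
```

Usage: `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | total_gpu_mem_mib`.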
Training log:
```bash
***** Running Evaluation *****
[INFO|trainer.py:4119] 2024-11-06 09:56:27,498 >> Num examples = 210
[INFO|trainer.py:4122] 2024-11-06 09:56:27,498 >> Batch size = 1
100%|██████████| 105/105 [00:06<00:00, 16.91it/s]
***** eval metrics *****
epoch = 9.966
eval_loss = 1.7758
eval_runtime = 0:00:06.27
eval_samples_per_second = 33.457
eval_steps_per_second = 16.728
[INFO|modelcard.py:449] 2024-11-06 09:56:33,803 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
bj-a800-roce0007:6615:6881 [0] NCCL INFO [Service thread] Connection closed by localRank 0
bj-a800-roce0007:6616:6880 [1] NCCL INFO [Service thread] Connection closed by localRank 1
bj-a800-roce0007:6615:6974 [0] NCCL INFO comm 0xc994100 rank 0 nranks 2 cudaDev 0 busId 51000 - Abort COMPLETE
bj-a800-roce0007:6616:6975 [1] NCCL INFO comm 0xe182c70 rank 1 nranks 2 cudaDev 1 busId 6a000 - Abort COMPLETE
```
The loss curve shows that training has not fully converged yet.
Test the fine-tuned model:
```bash
CUDA_VISIBLE_DEVICES=1 llamafactory-cli webchat \
--model_name_or_path [your path]/llm/Meta-Llama-3.1-8B-Instruct/ \
--adapter_name_or_path [your path]/llm/LLaMA-Factory/saves/Llama-3.1-8B/lora/sft-3/ \
--template llama3 \
--finetuning_type lora
```
The model has learned the identity.json data and now introduces itself with the configured name and author.
References: