[Deep Learning] LLaMA-Factory, a fine-tuning tool for large models: fine-tuning and deploying GLM-4-9B Chat (2)

Contents

Data preparation
chat
Evaluating the model
Exporting the model
Deployment
Summary

References:
https://github.com/hiyouga/LLaMA-Factory/blob/main/README_zh.md
https://www.53ai.com/news/qianyanjishu/2015.html

Clone the code:

git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

Build the image and start a container:

cd /ssd/xiedong/glm-4-9b-xd/LLaMA-Factory

docker build -f ./docker/docker-cuda/Dockerfile \
    --build-arg INSTALL_BNB=false \
    --build-arg INSTALL_VLLM=false \
    --build-arg INSTALL_DEEPSPEED=false \
    --build-arg INSTALL_FLASHATTN=false \
    --build-arg PIP_INDEX=https://pypi.org/simple \
    -t llamafactory:latest .

docker run -dit --gpus=all \
    -v ./hf_cache:/root/.cache/huggingface \
    -v ./ms_cache:/root/.cache/modelscope \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -v /ssd/xiedong/glm-4-9b-xd:/ssd/xiedong/glm-4-9b-xd \
    -p 9998:7860 \
    -p 9999:8000 \
    --shm-size 16G \
    llamafactory:latest

docker exec -it  a2b34ec1 bash

pip install "bitsandbytes>=0.37.0"

The image I built is kevinchina/deeplearning:llamafactory-0.8.3, which you can run directly:

cd /ssd/xiedong/glm-4-9b-xd/LLaMA-Factory
docker run -dit --gpus '"device=0,1,2,3"' \
    -v ./hf_cache:/root/.cache/huggingface \
    -v ./ms_cache:/root/.cache/modelscope \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -v /ssd/xiedong/glm-4-9b-xd:/ssd/xiedong/glm-4-9b-xd \
    -p 9998:7860 \
    -p 9999:8000 \
    --shm-size 16G \
    kevinchina/deeplearning:llamafactory-0.8.3

Quick start

The three commands below run LoRA fine-tuning, inference, and merging, respectively, on the Llama3-8B-Instruct model.

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml

For advanced usage, including multi-GPU fine-tuning, see examples/README_zh.md.

Tip

Run llamafactory-cli help to show the help message.

LLaMA Board: visual fine-tuning (powered by Gradio)

llamafactory-cli webui

Further reading: https://www.cnblogs.com/lm970585581/p/18140564

Data preparation

Official documentation on data preparation:

https://github.com/hiyouga/LLaMA-Factory/blob/main/data/README_zh.md

Preference datasets are used in the reward modeling stage.

Alpaca dataset format:

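[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["first-turn instruction (optional)", "first-turn response (optional)"],
      ["second-turn instruction (optional)", "second-turn response (optional)"]
    ]
  }
]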

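In an instruction-supervised fine-tuning dataset (Alpaca format), the main columns serve the following purposes:

instruction (human instruction, required):
- Contains the specific instruction or question from the human. It is the main input the model answers.
- Example: "Please explain the basic concepts of quantum mechanics."
input (human input, optional):
- Contains extra input related to the instruction and may be empty. When present, it is combined with the instruction to form the complete human input.
- Example: if the instruction is "Please explain the following:", the input could be "The basic concepts of quantum mechanics."
output (model response, required):
- Contains the response the model should produce after receiving the instruction and input.
- Example: "Quantum mechanics is the branch of physics that studies the behavior of microscopic particles. Its basic concepts include wave-particle duality and the uncertainty principle."
system (system prompt, optional):
- Provides a system-level prompt that sets the context of the conversation. Leave it empty if no specific system prompt is needed.
- Example: "You are a physicist who is good at explaining complex scientific concepts."
history (conversation history, optional):
- Contains the earlier turns of the conversation as a list of two-string pairs, each pair holding the instruction and the response of one turn. This history helps the model understand the context of the current turn.
- Example:
```
[
  ["What is relativity?", "Relativity is a theory proposed by Einstein. It consists of special relativity and general relativity."],
  ["What are the core ideas of special relativity?", "The core ideas of special relativity are the constancy of the speed of light and the relativity of time and space."]
]
```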

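In summary, the columns play these roles in the dataset:

instruction and input together form the complete human input to the model.
output is the response the model produces for that input.
system gives the model additional context or guidance.
history supplies the earlier turns of the conversation, helping the model understand the context and produce more coherent responses.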
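The field rules above can be checked mechanically before training. The following is a minimal sketch, assuming each record is a plain Python dict loaded from the JSON file; `validate_alpaca_record` is a hypothetical helper written for illustration, not part of LLaMA-Factory.

```python
from typing import Any


def validate_alpaca_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one Alpaca-style record (empty if none)."""
    problems: list[str] = []

    # instruction and output are required and must be non-empty strings.
    for key in ("instruction", "output"):
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{key}' is required and must be a non-empty string")

    # input and system are optional, but must be strings when present.
    for key in ("input", "system"):
        if key in record and not isinstance(record[key], str):
            problems.append(f"'{key}' must be a string when present")

    # history is optional: a list of [instruction, response] string pairs.
    history = record.get("history", [])
    if not isinstance(history, list):
        problems.append("'history' must be a list of [instruction, response] pairs")
        history = []
    for index, turn in enumerate(history):
        is_pair = (
            isinstance(turn, (list, tuple))
            and len(turn) == 2
            and all(isinstance(part, str) for part in turn)
        )
        if not is_pair:
            problems.append(f"history[{index}] must be a pair of strings")

    return problems


good = {
    "instruction": "Please explain the following:",
    "input": "The basic concepts of quantum mechanics.",
    "output": "Quantum mechanics studies the behavior of microscopic particles.",
}
bad = {"instruction": "", "history": [["only one element"]]}

print(validate_alpaca_record(good))
print(validate_alpaca_record(bad))
```

Running a check like this over every record before training catches missing required fields early, instead of partway through data preprocessing.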

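Now suppose I want to fine-tune for a domain task: given a long piece of material, the model must output the material's category and the name of the person in charge mentioned in it. How should the dataset be built? An example follows.

The dataset can be structured like this: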

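[
  {
    "instruction": "Classify the following material and identify the person in charge named in it.",
    "input": "material text",
    "output": "Category: material category; Person in charge: name of the person in charge",
    "system": "You are an expert in text classification and information extraction."
  }
]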

Sample data:

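[
  {
    "instruction": "Classify the following material and identify the person in charge named in it.",
    "input": "The company's financial report for the first quarter of 2024 shows that revenue grew by 20%. The person in charge of finance is Zhang San.",
    "output": "Category: Financial report; Person in charge: Zhang San",
    "system": "You are an expert in text classification and information extraction."
  },
  {
    "instruction": "Classify the following material and identify the person in charge named in it.",
    "input": "According to the latest market research report, market share rose significantly this quarter. Li Si, head of the marketing department, said he is confident about the future of the market.",
    "output": "Category: Market research report; Person in charge: Li Si",
    "system": "You are an expert in text classification and information extraction."
  }
]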
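Hand-written JSON is easy to break (a stray trailing comma is enough to make the file unparseable), so it may be safer to generate the file from labeled rows. The following is a minimal sketch, assuming the labels are available as (material, category, person in charge) tuples; `build_record` and the constant strings are illustrative placeholders, so substitute the exact wording used in your own dataset.

```python
import json

# Placeholder wording; use the same instruction and system prompt for every record.
INSTRUCTION = "Classify the following material and identify the person in charge named in it."
SYSTEM = "You are an expert in text classification and information extraction."


def build_record(material: str, category: str, owner: str) -> dict:
    """Turn one labeled row into an Alpaca-style record."""
    return {
        "instruction": INSTRUCTION,
        "input": material,
        "output": f"Category: {category}; Person in charge: {owner}",
        "system": SYSTEM,
    }


rows = [
    ("Revenue grew by 20% in Q1 2024. The person in charge of finance is Zhang San.",
     "Financial report", "Zhang San"),
    ("Market share rose this quarter, said Li Si, head of the marketing department.",
     "Market research report", "Li Si"),
]

records = [build_record(*row) for row in rows]

# ensure_ascii=False keeps non-ASCII text (for example Chinese) readable in the file.
text = json.dumps(records, ensure_ascii=False, indent=2)
print(text)
```

Writing `text` to a file in the data directory gives a dataset that is valid JSON by construction and uses one consistent output format, which makes the model's answers easier to parse later.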

Add an entry like this to dataset_info.json:

 "dataset_name": {
   "file_name": "data.json",
   "columns": {
     "prompt": "instruction",
     "query": "input",
     "response": "output",
     "system": "system"
   }
 }
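The same registration can be done programmatically, which avoids comma mistakes when editing the file by hand. This is a sketch under stated assumptions: `register_dataset` is a hypothetical helper, and the demo writes to a temporary directory rather than to the real data/dataset_info.json.

```python
import json
import tempfile
from pathlib import Path


def register_dataset(info_path, name, file_name, columns=None):
    """Add or overwrite one dataset entry in a dataset_info.json file."""
    path = Path(info_path)
    # Load the existing registry if there is one; otherwise start empty.
    info = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    entry = {"file_name": file_name}
    if columns:
        entry["columns"] = columns
    info[name] = entry
    path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")
    return info


# Demo against a temporary file; point info_file at data/dataset_info.json for real use.
with tempfile.TemporaryDirectory() as tmp:
    info_file = Path(tmp) / "dataset_info.json"
    register_dataset(
        info_file,
        "dataset_name",
        "data.json",
        columns={
            "prompt": "instruction",
            "query": "input",
            "response": "output",
            "system": "system",
        },
    )
    print(info_file.read_text(encoding="utf-8"))
```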

This fine-tuning run uses a dataset from an open-source project:

https://github.com/KMnO4-zx/huanhuan-chat/blob/master/dataset/train/lora/huanhuan.json

After downloading, put the JSON file in LLaMA-Factory's data directory.

Then edit dataset_info.json in the data directory and add the following entry:

 "huanhuan": {
    "file_name": "huanhuan.json"
  },

Enter the container and start the web UI:

llamafactory-cli webui

Open the page in a browser:

http://10.136.19.26:9998/

Training from the web UI kept failing, so you can copy the command it generates and run it inside the container instead:

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /ssd/xiedong/glm-4-9b-xd/glm-4-9b-chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --quantization_method bitsandbytes \
    --template glm4 \
    --flash_attn auto \
    --dataset_dir data \
    --dataset huanhuan \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/GLM-4-9B-Chat/lora/train_2024-07-23-04-22-25 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout 0.1 \
    --lora_target all

After training finishes:

***** train metrics *****
epoch = 2.9807
num_input_tokens_seen = 741088
total_flos = 36443671GF
train_loss = 2.5584
train_runtime = 0:09:24.59
train_samples_per_second = 19.814
train_steps_per_second = 0.308
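These metrics can be cross-checked against the training command. Dividing samples per second by steps per second gives the number of samples consumed per optimizer step, which should match per_device_train_batch_size × gradient_accumulation_steps × number of GPUs. The sketch below assumes 4 GPUs, based on the `device=0,1,2,3` docker run earlier; that GPU count is an inference, not something the log states.

```python
# Values copied from the training command and the reported metrics.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
train_samples_per_second = 19.814
train_steps_per_second = 0.308
runtime_seconds = 9 * 60 + 24.59  # train_runtime = 0:09:24.59

# Assumption: 4 GPUs, matching the `device=0,1,2,3` docker run.
num_gpus = 4

# Samples per optimizer step as configured.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Samples per optimizer step as observed in the metrics.
observed_batch = train_samples_per_second / train_steps_per_second

# Approximate number of optimizer steps over the whole run.
total_steps = train_steps_per_second * runtime_seconds

print(f"configured samples per optimizer step: {effective_batch}")
print(f"observed samples per optimizer step:   {observed_batch:.1f}")
print(f"approximate optimizer steps:           {total_steps:.0f}")
```

The configured value (64) and the observed value (about 64.3) agree, which supports the 4-GPU assumption. It also shows why the run is short: roughly 174 optimizer steps cover the three epochs.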

chat

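Evaluating the model

This needs about 40 GB of free GPU memory because the model is large.

As with training, take the command from the web UI and run it on the command line: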

CUDA_VISIBLE_DEVICES=1,2,3 llamafactory-cli train \
    --stage sft \
    --model_name_or_path /ssd/xiedong/glm-4-9b-xd/glm-4-9b-chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --quantization_method bitsandbytes \
    --template glm4 \
    --flash_attn auto \
    --dataset_dir data \
    --eval_dataset huanhuan \
    --cutoff_len 1024 \
    --max_samples 100000 \
    --per_device_eval_batch_size 2 \
    --predict_with_generate True \
    --max_new_tokens 512 \
    --top_p 0.7 \
    --temperature 0.95 \
    --output_dir saves/GLM-4-9B-Chat/lora/eval_2024-07-23-04-22-25 \
    --do_predict True \
    --adapter_name_or_path saves/GLM-4-9B-Chat/lora/train_2024-07-23-04-22-25

The dataset is fairly large, so I stopped the run before it finished. The results are probably saved here: /app/saves/GLM-4-9B-Chat/lora/eval_2024-07-23-04-22-25
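Once a prediction run completes, the generated answers can be scored offline. For a structured-output task like the classification example earlier, exact match is a simple first metric. The sketch below makes two assumptions to verify against the files actually written to the output directory: that predictions are stored one JSON object per line (for instance in a file named generated_predictions.jsonl), and that each object has `label` and `predict` fields. `exact_match_rate` is a hypothetical helper.

```python
import json


def exact_match_rate(jsonl_lines):
    """Fraction of rows whose prediction equals the label after trimming whitespace."""
    total = 0
    matches = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        total += 1
        # Assumed field names: "label" (reference) and "predict" (model output).
        if row["predict"].strip() == row["label"].strip():
            matches += 1
    return matches / total if total else 0.0


# Two illustrative rows: one correct prediction and one wrong name.
sample_lines = [
    json.dumps({
        "label": "Category: Financial report; Person in charge: Zhang San",
        "predict": "Category: Financial report; Person in charge: Zhang San",
    }),
    json.dumps({
        "label": "Category: Market research report; Person in charge: Li Si",
        "predict": "Category: Market research report; Person in charge: Wang Wu",
    }),
]

print(exact_match_rate(sample_lines))
```

For real use, pass an open file handle instead of `sample_lines`. Exact match is not meaningful for free-form chat data such as huanhuan; for that kind of data, reading a sample of the predictions is more informative.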

Exporting the model

Fill in the export path and run the export; here the path is /ssd/xiedong/glm-4-9b-xd/export_test0723.

Deployment

LLaMA-Factory can serve a model directly; you only need to pass the parameters.

https://github.com/hiyouga/LLaMA-Factory/blob/main/src/api.py

For example:

llamafactory-cli api  --model_name_or_path /ssd/xiedong/glm-4-9b-xd/export_0725_yingyong --template glm4 --finetuning_type lora --adapter_name_or_path saves/GLM-4-9B-Chat/lora/train_2024-07-26-02-14-58

Request:

curl -X 'POST' \
  'http://10.136.19.26:9999/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Who are you?"
    }
  ],
  "do_sample": true,
  "temperature": 0.7,
  "top_p": 0.9,
  "n": 1,
  "max_tokens": 150,
  "stop": null,
  "stream": false
}
'

Python request:

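import requests

# Chat completions endpoint exposed by `llamafactory-cli api`
# (reached through host port 9999, published by the docker run above).
url = 'http://10.136.19.26:9999/v1/chat/completions'
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = {
    "model": "gpt-4",
    "messages": [
        {
            "role": "system",
            "content": "You are a text analysis expert. Determine which application category the text belongs to."
        },
        {
            "role": "user",
            # In practice: the user's input concatenated with the OCR result (user_input + ocr_ret)
            "content": "loan"
        }
    ],
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "n": 1,
    "max_tokens": 150,
    "stop": None,
    "stream": False
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # fail early on HTTP errors

# Print the reply on a single line
print(response.json()['choices'][0]['message']['content'].replace('\n', ''))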
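The inline comment in the request above hints that the user message is assembled from the user's input plus an OCR result. A small helper keeps that assembly in one place. This is only a sketch: `build_chat_payload` and its parameter names are hypothetical, and joining the two parts with a newline is an assumption about how they should be combined.

```python
def build_chat_payload(system_prompt, user_input, ocr_text="", max_tokens=150):
    """Build the JSON body for a /v1/chat/completions request."""
    # Join the non-empty parts; the newline separator is an assumption.
    content = "\n".join(part for part in (user_input, ocr_text) if part)
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": content},
        ],
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
        "n": 1,
        "max_tokens": max_tokens,
        "stop": None,
        "stream": False,
    }


payload = build_chat_payload(
    "You are a text analysis expert. Determine which application category the text belongs to.",
    "Which category does this screenshot belong to?",
    ocr_text="loan",
)
print(payload["messages"][1]["content"])
```

The resulting dict can be passed as `json=payload` to `requests.post`, exactly like `data` in the request above.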

Web UI:

llamafactory-cli webchat  --model_name_or_path /ssd/xiedong/glm-4-9b-xd/export_0725_yingyong --template glm4 --finetuning_type lora --adapter_name_or_path saves/GLM-4-9B-Chat/lora/train_2024-07-26-02-14-58 


llamafactory-cli webchat  --model_name_or_path /ssd/xiedong/glm-4-9b-xd/glm-4-9b-chat --template glm4 --finetuning_type lora

Summary

After going through all of this, the examples directory turns out to be very valuable:
https://github.com/hiyouga/LLaMA-Factory/tree/main/examples

For convenience, I pushed this image:

docker push kevinchina/deeplearning:llamafactory-0.8.3