Fine-tuning Qwen 0.5B with PEFT LoRA, merging the adapter, and converting the result to .gguf

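Background:

Large language models are all the rage right now, and as a veteran programmer I got curious enough to want to see what they actually are. This is only a quick hands-on dabble. I am a beginner here, so my understanding is certainly not expert-level and may even contain mistakes. This post is meant only as a reference for fellow newbies like me, and as a record for myself.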

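Materials:

1. A computer with a GPU. Fine-tuning on the CPU alone is really slow: 10 records can take half an hour.

2. The machine used for the experiments below is a T480 with a discrete MX150 GPU, which has only 2 GB of VRAM.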

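Procedure:

Install the GPU driver

1. Open Windows Task Manager > Performance > GPU and confirm that an NVIDIA GPU is listed.

Run the nvidia-smi command to see which CUDA version your GPU driver supports.

2. Download the toolkit from: CUDA Toolkit Archive | NVIDIA Developer

3. Double-click the downloaded installer (I used the default Express install and simply clicked Next all the way through).

When the installation finishes, run nvcc -V to check the installed version.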
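The two command-line checks above can also be scripted. The following is a minimal sketch using only the standard library; the helper name `check_cuda_tools` is my own, and it only reports whether each tool is on PATH and what the first line of its output is.

```python
import shutil
import subprocess

def check_cuda_tools(tools=("nvidia-smi", "nvcc")):
    """Return {tool: first line of its output, or None if it is not usable}."""
    report = {}
    for tool in tools:
        path = shutil.which(tool)
        if path is None:
            report[tool] = None  # not installed, or not on PATH
            continue
        # nvcc prints its version with -V; nvidia-smi prints a summary with no arguments
        args = [path, "-V"] if tool == "nvcc" else [path]
        try:
            result = subprocess.run(args, capture_output=True, text=True, timeout=30)
            lines = result.stdout.strip().splitlines()
            report[tool] = lines[0] if lines else ""
        except (OSError, subprocess.SubprocessError):
            report[tool] = None
    return report

if __name__ == "__main__":
    for tool, info in check_cuda_tools().items():
        print(f"{tool}: {info if info is not None else 'not found'}")
```

If nvcc shows up as not found right after installing the toolkit, opening a new terminal so that the updated PATH is picked up is usually the first thing to try.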

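Install PyTorch (GPU build) and the other Python libraries

1. Choose a Python version that matches the PyTorch version you plan to install, according to the version compatibility table. Python 3.9 is currently widely supported.

2. Open Start Locally | PyTorch

and pick the build that matches your setup (a driver that supports a newer CUDA version can also run builds made for an older one; I installed the 12.4 build and it works fine).

3. Copy the generated command and run it in your environment. This install takes quite a while.

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

4. Verify the installation with the following commands:

import torch
print(torch.cuda.is_available())  # True means the GPU build can see the GPU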
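A slightly fuller check can help tell "PyTorch is missing" apart from "a CPU-only build got installed". This is a sketch, not part of the original setup; the function name `describe_torch_env` is made up, and the import is guarded so the snippet also runs where PyTorch is absent.

```python
def describe_torch_env():
    """Summarize the PyTorch install; returns a dict even when torch is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    info = {
        "installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["gpu_name"] = torch.cuda.get_device_name(0)
    return info

print(describe_torch_env())
```

If `built_with_cuda` is None, a CPU-only wheel was installed and the pip command with the cu124 index URL needs to be run again.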

Training

Data format

[
    {
        "Question": "根据描述，一个1岁的孩子在夏季头皮出现多处小结节，长期不愈合，且现在疮大如梅，溃破流脓，口不收敛，头皮下有空洞，患处皮肤增厚。这种病症在中医中诊断为什么病？",
        "Complex_CoT": "这个小孩子在夏天头皮上长了些小结节，一直都没好，后来变成了脓包，流了好多脓。想想夏天那么热，可能和湿热有关。才一岁的小孩，免疫力本来就不强，夏天的湿热没准就侵袭了身体。\n\n用中医的角度来看，出现小结节、再加上长期不愈合，这些症状让我想到了头疮。小孩子最容易得这些皮肤病，主要因为湿热在体表郁结。\n\n但再看看，头皮下还有空洞，这可能不止是简单的头疮。看起来病情挺严重的，也许是脓肿没治好。这样的情况中医中有时候叫做禿疮或者湿疮，也可能是另一种情况。\n\n等一下，头皮上的空洞和皮肤增厚更像是疾病已经深入到头皮下，这是不是说明有可能是流注或瘰疬？这些名字常描述头部或颈部的严重感染，特别是有化脓不愈合，又形成通道或空洞的情况。\n\n仔细想想，我怎么感觉这些症状更贴近瘰疬的表现？尤其考虑到孩子的年纪和夏天发生的季节性因素，湿热可能是主因，但可能也有火毒或者痰湿造成的滞留。\n\n回到基本的症状描述上看，这种长期不愈合又复杂的状况，如果结合中医更偏重的病名，是不是有可能是涉及更深层次的感染？\n\n再考虑一下，这应该不是单纯的瘰疬，得仔细分析头皮增厚并出现空洞这样的严重症状。中医里头，这样的表现可能更符合'蚀疮'或'头疽'。这些病名通常描述头部严重感染后的溃烂和组织坏死。\n\n看看季节和孩子的体质，夏天又湿又热，外邪很容易侵入头部，对孩子这么弱的免疫系统简直就是挑战。头疽这个病名听起来真是切合，因为它描述的感染严重，溃烂到出现空洞。\n\n不过，仔细琢磨后发现，还有个病名似乎更为合适，叫做'蝼蛄疖'，这病在中医里专指像这种严重感染并伴有深部空洞的情况。它也涵盖了化脓和皮肤增厚这些症状。\n\n哦，该不会是夏季湿热，导致湿毒入侵，孩子的体质不能御，其病情发展成这样的感染？综合分析后我觉得'蝼蛄疖'这个病名真是相当符合。",
        "Response": "从中医的角度来看，你所描述的症状符合“蝼蛄疖”的病症。这种病症通常发生在头皮，表现为多处结节，溃破流脓，形成空洞，患处皮肤增厚且长期不愈合。湿热较重的夏季更容易导致这种病症的发展，特别是在免疫力较弱的儿童身上。建议结合中医的清热解毒、祛湿消肿的治疗方法进行处理，并配合专业的医疗建议进行详细诊断和治疗。"
    }
]
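Before training it is worth checking that the data file really parses and that every record carries the three fields used later (Question, Complex_CoT, Response). The sketch below uses only the standard library; `validate_records` is a hypothetical helper, and the sample records are made up. A common cause of parse failures is an unescaped double quote inside a string value.

```python
import json

REQUIRED_KEYS = ("Question", "Complex_CoT", "Response")

def validate_records(json_text):
    """Parse a JSON array and return the indices of incomplete records."""
    records = json.loads(json_text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(records, list):
        raise ValueError("top-level JSON value must be a list of records")
    bad = []
    for i, rec in enumerate(records):
        ok = isinstance(rec, dict) and all(
            isinstance(rec.get(k), str) and rec[k].strip() for k in REQUIRED_KEYS
        )
        if not ok:
            bad.append(i)
    return bad

sample = json.dumps([
    {"Question": "q1", "Complex_CoT": "reasoning", "Response": "a1"},
    {"Question": "q2", "Response": "a2"},  # missing Complex_CoT
])
print(validate_records(sample))  # [1]
```

For the real file, read it with `open(path, encoding="utf-8").read()` and pass the text to the same function.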

Data preparation

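# Data preprocessing: tokenize each record and build the labels for causal-LM training
def process_data(tokenizer):
    # Only the first 10 records are used so that the run fits on a small GPU
    dataset = load_dataset("json", data_files='E:/gptTest/data/medical_o1_sft_Chinese.json', split="train[:10]")
    def process_func(example):
        """
        Convert one record into input_ids, attention_mask and labels.
        """
        MAX_LENGTH = 512
        # Prompt in ChatML format: system turn, user turn, then the assistant header.
        # The system prompt means: "You are now a medical expert; I have some health
        # problems, please use your professional knowledge to help me."
        instruction = tokenizer(
            f"<|im_start|>system\n现在你是一个医学专家，我有一些身体问题，请你用专业的知识帮我解决。<|im_end|>\n<|im_start|>user\n诊断问题:{example['Question']}\n详细分析:{example['Complex_CoT']}<|im_end|>\n<|im_start|>assistant\n",
            add_special_tokens=False,
        )
        response = tokenizer(f"{example['Response']}", add_special_tokens=False)
        # pad_token_id is appended as the end-of-sequence marker
        input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id]
        attention_mask = (
            instruction["attention_mask"] + response["attention_mask"] + [1]
        )
        # -100 masks the prompt tokens, so the loss is computed on the response only
        labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id]
        # Truncate over-long samples. Note: if the prompt alone exceeds MAX_LENGTH,
        # the whole response is cut off and the sample contributes nothing to the loss.
        if len(input_ids) > MAX_LENGTH:
            input_ids = input_ids[:MAX_LENGTH]
            attention_mask = attention_mask[:MAX_LENGTH]
            labels = labels[:MAX_LENGTH]
        return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}
    encoded_dataset = dataset.map(process_func, remove_columns=dataset.column_names)
    return encoded_dataset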
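To make the label masking in the preprocessing function easier to see, here is a toy version that works on plain lists of made-up token IDs, with no tokenizer involved. It is only an illustration of the same idea; `build_example` and `supervised_token_count` are names I introduced.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def build_example(prompt_ids, response_ids, eos_id, max_length):
    """Concatenate prompt and response, mask the prompt in labels, then truncate."""
    input_ids = prompt_ids + response_ids + [eos_id]
    attention_mask = [1] * len(input_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    return {
        "input_ids": input_ids[:max_length],
        "attention_mask": attention_mask[:max_length],
        "labels": labels[:max_length],
    }

def supervised_token_count(example):
    """Number of positions that actually contribute to the loss."""
    return sum(1 for t in example["labels"] if t != IGNORE_INDEX)

# Toy token IDs (made up for illustration)
ex = build_example(prompt_ids=[11, 12, 13], response_ids=[21, 22], eos_id=0, max_length=8)
print(ex["labels"])                 # [-100, -100, -100, 21, 22, 0]
print(supervised_token_count(ex))   # 3

# If the prompt alone fills max_length, nothing is left to learn from
cut = build_example(prompt_ids=[11, 12, 13], response_ids=[21, 22], eos_id=0, max_length=3)
print(supervised_token_count(cut))  # 0
```

The second case matters for this dataset: the Complex_CoT field is long and sits inside the prompt, so counting the supervised tokens per sample is a cheap way to spot records whose answer was truncated away.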

LoRA configuration

# LoRA configuration: rank-16 adapters on the attention and MLP projection layers
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["v_proj", "q_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
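To get a feel for what r and lora_alpha mean, the arithmetic can be worked out by hand: for a frozen weight of shape (out_features, in_features), LoRA trains two small matrices A (r x in_features) and B (out_features x r), and the update B @ A is scaled by lora_alpha / r. The layer sizes below are hypothetical, chosen only for illustration; they are not read from the Qwen checkpoint.

```python
def lora_param_count(in_features, out_features, r):
    """Trainable parameters LoRA adds next to one frozen linear weight."""
    return r * in_features + out_features * r

def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update B @ A (standard LoRA scaling)."""
    return lora_alpha / r

# Hypothetical layer shapes, for illustration only
hidden, intermediate = 1024, 4096
r, alpha = 16, 32

square = lora_param_count(hidden, hidden, r)      # e.g. a square attention projection
mlp = lora_param_count(hidden, intermediate, r)   # e.g. an MLP up-projection
full = hidden * hidden                            # size of the frozen square weight

print(square, mlp)               # 32768 81920
print(f"{square / full:.1%}")    # 3.1%
print(lora_scaling(alpha, r))    # 2.0
```

The real numbers for the loaded model are what `model.print_trainable_parameters()` reports in the training script.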

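Training arguments

# Training arguments
training_args = TrainingArguments(
    output_dir='E:/gptTest/results',
    gradient_accumulation_steps=8,  # accumulate gradients over 8 steps; effective batch size = 1 x 8 = 8
    per_device_train_batch_size=1,  # batch size per device (adjust to your VRAM)
    num_train_epochs=3,  # number of training epochs
    eval_strategy="no",  # no evaluation (set to "steps" or "epoch" to enable it)
    learning_rate=2e-5,  # learning rate (commonly 1e-5 to 5e-5)
    logging_dir="E:/gptTest/logs",  # log directory
    logging_steps=10,  # log every 10 steps
    fp16=True,  # FP16 mixed-precision training
    dataloader_pin_memory=False,  # do not pin host memory in the data loader
    remove_unused_columns=False  # keep dataset columns that the model's forward() does not use
)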
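A quick back-of-the-envelope calculation shows what these settings mean for a 10-record run. This is a rough sketch: the helper names are mine, and the step count is an upper bound, since the exact number depends on how the trainer treats a final partial accumulation window.

```python
import math

def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples that contribute to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices

def max_optimizer_steps(num_samples, per_device_batch, grad_accum_steps, epochs, num_devices=1):
    """Upper bound on the number of optimizer updates over the whole run."""
    per_epoch = math.ceil(
        num_samples / effective_batch_size(per_device_batch, grad_accum_steps, num_devices)
    )
    return per_epoch * epochs

print(effective_batch_size(1, 8))         # 8
print(max_optimizer_steps(10, 1, 8, 3))   # 6
```

With at most 6 optimizer steps and logging_steps=10, the periodic loss log never fires during such a short run, so lowering logging_steps to 1 is a reasonable choice while experimenting.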

Training script

def train_fn():
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained('E:/gptTest/qwen05b', use_fast=False, trust_remote_code=True)
    # tokenizer.pad_token = tokenizer.eos_token

    # Load the base model in FP16 onto the selected device
    model = AutoModelForCausalLM.from_pretrained('E:/gptTest/qwen05b', torch_dtype=torch.float16, device_map={"": device})
    # Wrap the model with LoRA adapters; only the adapter weights are trainable
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    # Prepare the data
    train_dataset = process_data(tokenizer)
    # Create the Trainer; the collator pads input_ids and labels within each batch
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        tokenizer=tokenizer,
        data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True)
    )

    # Start training
    print("Starting training...")
    trainer.train()
    print("Training finished")
    # Save the final model (only the LoRA adapter is written) and the tokenizer
    output_dir = "E:/gptTest/gpt"  # model output directory
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)

Device selection

# Train on the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Imports (these go at the top of the script)

import os
import subprocess

import torch
from datasets import load_dataset
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, DataCollatorForSeq2Seq

Inference

Straight to the code:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from peft import PeftModel


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def predict(messages, model, tokenizer):
    # Render the chat messages with the model's chat template, then tokenize
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    # Pass input_ids and attention_mask together
    generated_ids = model.generate(**model_inputs, max_new_tokens=512)
    # Drop the prompt tokens so that only the newly generated part is decoded
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    return response

model_name = "E:/gptTest/gpt"  # directory that holds the saved LoRA adapter

tokenizer = AutoTokenizer.from_pretrained('E:/gptTest/qwen05b', use_fast=False, trust_remote_code=True)
# Load the base model onto the selected device, then attach the LoRA adapter
model = AutoModelForCausalLM.from_pretrained('E:/gptTest/qwen05b', torch_dtype=torch.float16, device_map={"": device})
model = PeftModel.from_pretrained(model, model_id=model_name)

# Sample prompt
# sample_input = "一个1岁的孩子在夏季头皮出现多处小结节，长期不愈合，且现在疮大如梅，溃破流脓，口不收敛，头皮下有空洞，患处皮肤增厚。这种病症在中医中诊断为什么病？"
test_texts = {
    # System prompt: "You are now a medical expert; I have some health problems, please help me."
    'instruction': "现在你是一个医学专家，我有一些身体问题，请你用专业的知识帮我解决。",
    # User input: the condition named in the training sample
    'input': "“蝼蛄疖”的病症"
}
# Generate a response with the model
instruction = test_texts['instruction']
input_value = test_texts['input']

messages = [
    {"role": "system", "content": instruction},
    {"role": "user", "content": input_value}
]

response = predict(messages, model, tokenizer)

print(f"Model response: {response}")
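The inference code relies on `apply_chat_template` to turn the message list into the same `<|im_start|>` / `<|im_end|>` markup that the training prompt was written in by hand. The following is a hand-written approximation of that rendering, shown only to make the format visible. The real template is defined by the tokenizer and may differ in details (for example, by inserting a default system prompt), and `build_chatml_prompt` is a name I made up.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in ChatML-style markup."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to start its answer
    return "".join(parts)

demo = [
    {"role": "system", "content": "You are a medical expert."},
    {"role": "user", "content": "What is this condition?"},
]
print(build_chatml_prompt(demo))
```

Keeping the inference prompt in the same shape as the training prompt matters: the adapter was trained on text that ends with the assistant header, so generation should start from the same point.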

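Two points are worth emphasizing here.

1. Loading the base model first and then attaching the adapter with PeftModel, as above, is a fairly standard way to load a fine-tuned model for inference.

2. If you prefer not to load through PeftModel, you can also load the fine-tuned model directory directly: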

model_name = "E:/gptTest/gpt"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map={"": device}, trust_remote_code=True).eval()
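tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

However, this way of loading depends on the adapter's reference to the base model, so you may need to edit some of the configuration in the fine-tuned model directory (for example the base_model_name_or_path entry in adapter_config.json, which points at the base model).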
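Editing that entry by hand works, but it can also be done with a few lines of standard-library code. This is a minimal sketch; `set_adapter_base_model` is a hypothetical helper, and it assumes the adapter directory contains an adapter_config.json with a base_model_name_or_path field, as PEFT writes it.

```python
import json
from pathlib import Path

def set_adapter_base_model(adapter_dir, base_model_path):
    """Rewrite base_model_name_or_path in adapter_config.json; return the old value."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    old = config.get("base_model_name_or_path")
    config["base_model_name_or_path"] = str(base_model_path)
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return old

# Example with the directories used in this post (uncomment to run on your machine):
# print(set_adapter_base_model("E:/gptTest/gpt", "E:/gptTest/qwen05b"))
```

The function returns the previous value, which makes it easy to see where the adapter was pointing before the change.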

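Merging

A model fine-tuned with PEFT and saved as above contains only the fine-tuned part (the LoRA adapter weights and adapter_config.json), not the full model.

It cannot run without the base model. What to do? The only way is to merge the two.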

def merge_fn():
    tokenizer = AutoTokenizer.from_pretrained('E:/gptTest/qwen05b', use_fast=False, trust_remote_code=True)
    # Load the base model onto the selected device
    model = AutoModelForCausalLM.from_pretrained('E:/gptTest/qwen05b', torch_dtype=torch.float16, device_map={"": device})
    # Build the PEFT model from the base model plus the saved LoRA adapter
    peft_model = PeftModel.from_pretrained(model, 'E:/gptTest/gpt')
    # Fold the adapter weights into the base weights and drop the PEFT wrappers
    merged_model = peft_model.merge_and_unload()
    # Save the merged model
    merged_model.save_pretrained('E:/gptTest/merge')
    # Save the tokenizer files alongside it
    tokenizer.save_pretrained('E:/gptTest/merge')
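What merging does to each adapted layer can be written out in a few lines: the low-rank update is folded into the frozen weight as W' = W + (lora_alpha / r) * (B @ A), after which the adapter is no longer needed. The sketch below uses tiny made-up matrices and plain Python lists; it illustrates the arithmetic only and is not how PEFT implements it internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A)."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]

# Tiny made-up example: 2x2 weight, rank r = 1
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]          # shape (r, in_features)
B = [[0.5], [1.0]]        # shape (out_features, r)
merged = merge_lora(W, A, B, lora_alpha=2, r=1)
print(merged)             # [[2.0, 2.0], [2.0, 5.0]]

# The merged weight gives the same output as base + scaled adapter applied separately
x = [[3.0], [4.0]]        # one input column vector
base_plus_adapter = [[b[0] + 2 * d[0]] for b, d in zip(matmul(W, x), matmul(B, matmul(A, x)))]
print(matmul(merged, x) == base_plus_adapter)   # True
```

Because the merged weight has the same shape as the original one, the saved result is an ordinary checkpoint that needs no PEFT code to load.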

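After merging, the output directory holds the full model weights together with the tokenizer files.

At this point the model no longer depends on PEFT; it looks like an ordinary Transformers model and can be loaded directly from then on:

model_name = "E:/gptTest/merge"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map={"": device}, trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)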

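Converting to GGUF

The fine-tuned model is now our own local model. To quantize it later or to deploy it with Ollama, its format has to be converted.

# Convert the model to .gguf format for easier deployment and inference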
def to_gguf():
    gguf_output_file = os.path.join('E:/gptTest/',"wesky.gguf")
    convert_to_gguf('E:/gptTest/merge', gguf_output_file)

Conversion function (adapted from a blog post found online)

def convert_to_gguf(input_dir, output_file):
    """Convert a Hugging Face model directory to GGUF format.

    input_dir   - directory of the merged fine-tuned model
    output_file - path of the GGUF file to write
    """
    llama_cpp_path = "E:/gptTest/qwen/llama.cpp-master"  # replace with your actual path
    convert_script = os.path.join(llama_cpp_path, "convert_hf_to_gguf.py")
    print(convert_script)
    # Pass the arguments as a list so that paths containing spaces are handled correctly
    command = ["python", convert_script, input_dir, "--outfile", output_file]
    try:
        subprocess.run(command, check=True)
        print(f"Model converted to GGUF format: {output_file}")
    except subprocess.CalledProcessError as e:
        print(f"Conversion failed: {e}")
        print("Possible causes: unsupported model architecture, or a mismatched version of the conversion script")
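1. To make the above work we need llama.cpp (https://codeload.github.com/ggml-org/llama.cpp/zip/refs/heads/master). You can simply download the archive and unzip it.

llama.cpp-master.zip

2. After unzipping, open cmd in the ../../llama.cpp-master/ directory and run pip install -r requirements.txt, then wait for the install to finish. (Note: this step can affect the dependencies of your existing programs. I was not using a Python virtual environment and ran into dependency problems, because this install by default replaces the GPU build of PyTorch in the current environment with the CPU build. I fixed it by upgrading to the GPU build of PyTorch again.)

3. Once that is done, the two scripts shown above complete the GGUF conversion.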
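When the conversion does not behave as expected, it helps to build the command separately and print it before running anything. The sketch below does only that. `build_convert_command` is a hypothetical helper; `--outfile` is the flag used above, while `--outtype` is an optional flag of the conversion script as far as I know, so check `python convert_hf_to_gguf.py --help` in your own llama.cpp checkout before relying on it.

```python
from pathlib import Path

def build_convert_command(llama_cpp_dir, model_dir, outfile, outtype=None, python_exe="python"):
    """Build the argument list for llama.cpp's convert_hf_to_gguf.py."""
    script = Path(llama_cpp_dir) / "convert_hf_to_gguf.py"
    cmd = [python_exe, str(script), str(model_dir), "--outfile", str(outfile)]
    if outtype is not None:
        cmd += ["--outtype", outtype]  # e.g. "f16"; verify valid values with --help
    return cmd

cmd = build_convert_command(
    "E:/gptTest/qwen/llama.cpp-master", "E:/gptTest/merge", "E:/gptTest/wesky.gguf"
)
print(cmd)
# To actually run it: subprocess.run(cmd, check=True)
```

Running the conversion from a dedicated virtual environment, with `python_exe` pointing at that environment's interpreter, would also keep the llama.cpp requirements away from the PyTorch install used for training.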