[LLM] [Self-Cognition Fine-Tuning] Hands-On with ModelScope's ms-swift Framework on GPU

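Preface

This post walks through self-cognition fine-tuning of an LLM on a GPU, using the ms-swift framework from ModelScope.

Environment setup

Create a conda virtual environment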

conda create -n model_scope_llm_gpu
conda activate model_scope_llm_gpu

Creating the model_scope_llm_gpu virtual environment

Switching to the model_scope_llm_gpu virtual environment

Install Python

As in the earlier walkthroughs, this one uses Python 3.10.

conda install python=3.10

Installing Python 3.10
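Before installing the heavier packages, it can help to confirm that the interpreter you are running really is the one from the new environment. The following is a minimal sketch, not part of the original walkthrough: the helper name check_environment is hypothetical, and it relies on conda exporting the active environment name in the CONDA_DEFAULT_ENV variable.

```python
import os
import sys


def check_environment(expected_env="model_scope_llm_gpu", expected_python=(3, 10)):
    """Return (env_ok, python_ok) for the interpreter running this script."""
    # conda exports the name of the active environment as CONDA_DEFAULT_ENV.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    env_ok = active_env == expected_env
    # Compare only major.minor, e.g. (3, 10).
    python_ok = tuple(sys.version_info[:2]) == tuple(expected_python)
    return env_ok, python_ok


env_ok, python_ok = check_environment()
print(f"active conda env: {os.environ.get('CONDA_DEFAULT_ENV')} (matches: {env_ok})")
print(f"python version: {sys.version.split()[0]} (matches: {python_ok})")
```

If either check reports a mismatch, re-run conda activate model_scope_llm_gpu in the same terminal before continuing.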

Install PyTorch with CUDA support

conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.1 -c pytorch -c nvidia

pytorch-cuda installation starting

pytorch-cuda installed successfully
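A successful conda install does not by itself prove that PyTorch can see the GPU. The sketch below, which is an addition to the original steps, reports what PyTorch says about CUDA. The helper name describe_cuda is hypothetical; the torch calls it uses (torch.version.cuda, torch.cuda.is_available, torch.cuda.get_device_name) are standard PyTorch APIs.

```python
import importlib.util


def describe_cuda():
    """Summarize what PyTorch reports about CUDA; also runs if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this torch build was compiled against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # Device 0 is the GPU that the later script selects via CUDA_VISIBLE_DEVICES='0'.
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


print(describe_cuda())
```

If cuda_available comes back False, the usual suspects are a CPU-only build of PyTorch or a GPU driver that is too old for the chosen pytorch-cuda version.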

Install ms-swift

pip install 'ms-swift[llm]' -U

You can also install from a mirror in mainland China, which is faster there:

pip install 'ms-swift[llm]' -U -i https://pypi.tuna.tsinghua.edu.cn/simple

ms-swift installation starting

ms-swift installed successfully
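Because the -U flag pulls whatever release is newest at install time, and the names exported by swift.llm may differ between releases, it is worth recording which version you actually got. This is a small sketch using only the standard library; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="ms-swift"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


swift_version = installed_version()
if swift_version is None:
    print("ms-swift is not installed in this environment")
else:
    print(f"ms-swift version: {swift_version}")
```

If an import from swift.llm fails later on, comparing this version against the ms-swift documentation for that release is a reasonable first step.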

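IDE setup

Use PyCharm to create a project directory of your own:

Select File -> New Project

Select the virtual environment created earlier

After you click Create, PyCharm generates a main.py file by default. Wait for the environment to finish loading (progress is shown in the lower-right corner), then run main.py. If it prints Hi, PyCharm, the project was set up correctly:

The main.py file

Loading the virtual environment

main.py ran successfully

LLM inference before fine-tuning

In the new project, create a file named inference_before.py and add the following code: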

import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.glm4_9b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')


kwargs = {}
model_id_or_path = None
model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path,
                                       model_kwargs={'device_map': 'cuda:0'}, **kwargs)
# Cap the number of newly generated tokens
model.generation_config.max_new_tokens = 128

template = get_template(template_type, tokenizer)
seed_everything(42)

query = '你是谁？'  # "Who are you?"
response, history = inference(model, template, query)
print(f'response: {response}')
print(f'history: {history}')

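Right-click the project name -> New -> Python File

inference_before.py created

Code added to the file

Run inference_before.py. After thirty-odd seconds the console prints the answer to the question (the GPU run is much faster than the CPU run):

Loading the local GLM model

Answer printed successfully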
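The script imports inference_stream but never calls it. If you would rather watch the answer appear as it is generated, a helper along these lines could consume the stream. This is a sketch under one assumption: that inference_stream in your installed ms-swift release yields (response, history) pairs in which response is the full text generated so far. The names print_stream and fake_stream are hypothetical, and fake_stream only stands in for the model so the example runs without a GPU.

```python
def print_stream(gen):
    """Print a stream of (response, history) pairs incrementally; return the last pair."""
    printed = 0
    response, history = "", []
    for response, history in gen:
        # Each item is assumed to hold the whole response so far,
        # so only the part not yet printed is written out.
        print(response[printed:], end="", flush=True)
        printed = len(response)
    print()
    return response, history


def fake_stream():
    """Stand-in for a model stream (made-up data, for illustration only)."""
    text = ""
    for piece in ["I am ", "an AI ", "assistant."]:
        text += piece
        yield text, [["Who are you?", text]]


final_response, final_history = print_stream(fake_stream())
```

In inference_before.py the equivalent call would be print_stream(inference_stream(model, template, query)), provided the assumption about the yielded pairs holds for your ms-swift version.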