Main reference: https://juejin.cn/post/7439169215133597759
Pitfalls encountered:
1. My machine runs CUDA 12.4, so Python 3.10 is required and PyTorch should be installed with the command below:
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia
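To confirm the install actually targets CUDA 12.4 before starting training, a quick check like the following can help (my own sketch, not from the reference post):

```python
# Quick environment sanity check: verify the torch build and that the GPU is visible.
import torch

print(torch.__version__)          # expect 2.5.0
print(torch.version.cuda)         # expect 12.4
print(torch.cuda.is_available())  # should print True if the driver/toolkit match
```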
2. During training the GPU ran out of memory, so the LoRA config parameters had to be reduced:
```python
from peft import LoraConfig, TaskType

val_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=True,   # inference/eval config; set to False in the training config
    r=4,                   # LoRA rank (was 64)
    lora_alpha=1,          # LoRA alpha (was 16); see the LoRA paper for its exact role
    lora_dropout=0.05,     # dropout ratio
    bias="none",
)
```
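For context, this config is attached to the model via PEFT. The sketch below is my own illustration (using the Qwen2-VL-2B-Instruct checkpoint from the tutorial and a training-mode copy of the config), showing how the small rank keeps the trainable parameter count down:

```python
# Sketch: attach a LoRA config to Qwen2-VL and inspect how few parameters are trainable.
# Note: for actual training, inference_mode must be False.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
)

train_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=4,
    lora_alpha=1,
    lora_dropout=0.05,
    bias="none",
)

peft_model = get_peft_model(model, train_config)
peft_model.print_trainable_parameters()  # r=4 trains far fewer parameters than r=64
```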
With the config reduced as shown above, two places in train.py need this change, plus the SwanLab section below:
```python
from swanlab.integration.transformers import SwanLabCallback

# Set up the SwanLab callback
swanlab_callback = SwanLabCallback(
    project="Qwen2-VL-finetune",
    experiment_name="qwen2-vl-coco2014",
    config={
        "model": "https://modelscope.cn/models/Qwen/Qwen2-VL-2B-Instruct",
        "dataset": "https://modelscope.cn/datasets/modelscope/coco_2014_caption/quickstart",
        "github": "https://github.com/datawhalechina/self-llm",
        "prompt": "COCO Yes: ",
        "train_data_number": len(train_data),
        "lora_rank": 4,        # was 64
        "lora_alpha": 1,       # was 16
        "lora_dropout": 0.05,  # kept in sync with the LoraConfig above
    },
)
```
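The callback is then handed to the Hugging Face Trainer via its callbacks argument. Below is a minimal wiring sketch; the TrainingArguments values and the dataset/collator names are placeholders, not the exact train.py settings:

```python
# Sketch: pass the SwanLab callback into the Trainer so metrics are logged during training.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./output/Qwen2-VL-2B",
    per_device_train_batch_size=1,   # small batch to fit limited VRAM
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    logging_steps=10,
)

trainer = Trainer(
    model=peft_model,             # the LoRA-wrapped model
    args=args,
    train_dataset=train_dataset,  # dataset prepared earlier in train.py
    data_collator=data_collator,  # collator prepared earlier in train.py
    callbacks=[swanlab_callback],
)
trainer.train()
```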
After these changes training runs successfully. The results are shown in the screenshot below: