Fine-tuning ChatGLM-6B with DeepSpeed and P-Tuning v2

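An earlier post covered parameter-efficient fine-tuning of ChatGLM-6B with LoRA. This post shows how to fine-tune ChatGLM-6B with DeepSpeed and with P-Tuning v2. The code is available on GitHub: llm-action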

ChatGLM-6B overview

For an introduction to ChatGLM-6B, see the earlier post; it is not repeated here.

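P-Tuning v2 overview

P-Tuning v2 is a parameter-efficient fine-tuning method based on continuous prompts. It does not prune or compress the model: the pretrained weights stay frozen, and only a small set of trainable prompt (prefix) vectors is learned. The trained parameters are on the order of 0.1% to 3% of the model's parameters, depending on the prompt length.

P-Tuning v1 adds trainable prompt embeddings only at the input layer. P-Tuning v2 improves on this by attaching trainable prefix vectors to every Transformer layer (deep prompt tuning). The prompts therefore have more capacity and act on the model's hidden states more directly, which narrows the gap to full fine-tuning, especially on smaller models and harder tasks.

In short, the core idea of P-Tuning v2 is to adapt a large model to a downstream task by training only the prefix parameters while keeping the backbone unchanged. This lowers the memory needed for gradients and optimizer state during training and keeps each task's checkpoint small. Inference still runs the full model, so P-Tuning v2 by itself does not make the model smaller or faster.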
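To get a feel for the scale involved: P-Tuning v2 trains a short prefix of continuous vectors at every Transformer layer while the backbone stays frozen. The sketch below counts the trainable parameters of such a prefix for a model shaped like ChatGLM-6B. It assumes the prefix is stored as one table holding a key vector and a value vector per layer for each prefix position, with no projection MLP. The values num_layers=28 and hidden_size=4096 come from the model config printed in the run log later in this post; pre_seq_len=128 and the 6.2-billion total are illustrative assumptions.

```python
def prefix_param_count(pre_seq_len: int, num_layers: int, hidden_size: int) -> int:
    """Trainable parameters in a per-layer prefix (assumed layout, see the lead-in)."""
    # Each prefix position stores one key vector and one value vector for every layer.
    return pre_seq_len * num_layers * 2 * hidden_size


# num_layers and hidden_size follow the ChatGLM-6B config; pre_seq_len is an assumed value.
trainable = prefix_param_count(pre_seq_len=128, num_layers=28, hidden_size=4096)
total = 6_200_000_000  # rough size of the frozen backbone (assumption)

print(f"trainable prefix parameters: {trainable:,}")
print(f"share of the full model: {trainable / total:.2%}")
```

Under these assumptions the prefix holds about 29 million parameters, well under 1% of the model.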

Environment setup

The base environment is as follows:

  • Operating system: Ubuntu 18.04
  • CPUs: a single node with Intel CPUs and 1 TB of memory; 64 physical CPUs with 16 cores each
  • GPUs: 8 × A800 80GB
  • Python: 3.10 (upgrade OpenSSL to 1.1.1t first, then build and install Python from source)
  • NVIDIA driver: 515.65.01 (choose the driver that matches your GPU model)
  • CUDA Toolkit: 11.7
  • NCCL: nccl_2.14.3-1+cuda11.7
  • cuDNN: 8.8.1.3_cuda11

Installing the NVIDIA driver, CUDA, Python, and the other tools above is not covered here.

Create and activate the virtual environment chatglm-ptuningv2-venv-py310-cu117:

cd /home/guodong.li/virtual-venv
virtualenv -p /usr/bin/python3.10 chatglm-ptuningv2-venv-py310-cu117
source /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/activate

Install PyTorch offline: download the torch and torchvision wheels that match your CUDA version, then install them with pip.

pip install torch-1.13.1+cu117-cp310-cp310-linux_x86_64.whl
pip install torchvision-0.14.1+cu117-cp310-cp310-linux_x86_64.whl

Install the remaining dependencies.

pip install -r requirements.txt

The requirements.txt file contains:

protobuf
transformers==4.28.0
cpm_kernels
gradio
mdtex2html
sentencepiece
rouge_chinese
nltk
jieba
datasets
deepspeed
accelerate

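Note
The official documentation specifies transformers 4.27.1. When ChatGLM loads the model, it calls get_class_in_module in transformers/dynamic_module_utils.py, and with several processes running concurrently this method can fail to find the module file. Upgrading transformers to 4.28.0 avoids the problem.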
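Before launching a multi-GPU job it can be worth confirming which transformers version is actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper names version_tuple, is_at_least, and check_transformers are hypothetical, and the comparison looks only at the leading numeric parts of the version string.

```python
from importlib import metadata
from itertools import takewhile


def version_tuple(version: str) -> tuple:
    """Turn '4.28.0' into (4, 28, 0), reading only the leading digits of each part."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '4.28' compares equal to '4.28.0'.
    return tuple(parts + [0] * (3 - len(parts)))


def is_at_least(installed: str, required: str) -> bool:
    return version_tuple(installed) >= version_tuple(required)


def check_transformers(required: str = "4.28.0") -> str:
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if is_at_least(installed, required):
        return f"transformers {installed} is new enough"
    return f"transformers {installed} is older than {required}; consider upgrading"


print(check_transformers())
```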

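Data preparation

The ADGEN (advertisement generation) dataset is used below to walk through fine-tuning.

The ADGEN task is to generate a piece of advertising copy (summary) from the input (content). Each record has the following format: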

{
    "content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
    "summary": "这件衬衫的款式非常的宽松,利落的线条可以很好的隐藏身材上的小缺点,穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳,漂亮的绳结展现出了十足的个性,配合时尚的泡泡袖型,尽显女性甜美可爱的气息。"
}
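The content field packs attribute/value pairs into a single string: pairs are separated by * and each pair splits into an attribute name and a value at #. The sketch below unpacks such a string with a hypothetical helper, parse_content. It assumes every pair contains exactly one #, and the shortened record is only for illustration.

```python
import json


def parse_content(content: str) -> list:
    """Split an ADGEN-style content string into (attribute, value) pairs."""
    pairs = []
    for item in content.split("*"):
        # partition keeps working even if a pair happens to lack '#'.
        name, _, value = item.partition("#")
        pairs.append((name, value))
    return pairs


# A shortened record in the same shape as the example above.
record = json.loads('{"content": "类型#上衣*版型#宽松*版型#显瘦", "summary": "..."}')
for name, value in parse_content(record["content"]):
    print(name, value)
```

Note that the same attribute name can appear more than once (版型 above), so a list of pairs is a safer container than a dictionary.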

Download the ADGEN dataset from the official site, or through this link, and extract it into the AdvertiseGen directory.

tar -zxvf AdvertiseGen.tar.gz

Check the size of the dataset:

> wc -l AdvertiseGen/*
    1070 AdvertiseGen/dev.json
  114599 AdvertiseGen/train.json
  115669 total
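The line count of train.json largely determines how long a run takes. As a rough sketch, the function below estimates the step total that a Hugging Face style trainer would display. It assumes each GPU receives an equal, padded share of the examples and that one displayed step consumes gradient_accumulation_steps micro-batches. The batch settings (8 GPUs, per-device batch size 24, gradient accumulation 2, 2 epochs) are the ones used for the full-parameter run later in this post.

```python
import math


def estimate_total_steps(num_examples, num_gpus, per_device_batch, grad_accum, epochs):
    """Estimate the trainer's step total under the assumptions stated above."""
    examples_per_gpu = math.ceil(num_examples / num_gpus)
    batches_per_epoch = math.ceil(examples_per_gpu / per_device_batch)
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs


print(estimate_total_steps(114599, num_gpus=8, per_device_batch=24, grad_accum=2, epochs=2))
```

With these inputs the estimate is 596, which matches the 0/596 progress total shown in the run log below.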

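Full-parameter fine-tuning of ChatGLM-6B with DeepSpeed DP + ZeRO

We start by fine-tuning all of ChatGLM-6B's parameters with DeepSpeed.

Download the source code and, to keep the code consistent with this post, check out the corresponding commit: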

git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B
git checkout 8633db1
cd ptuning

Edit the ds_train_finetune.sh script to run full-parameter fine-tuning with DeepSpeed:

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 24 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --num_train_epochs 2 \
    --logging_steps 10 \
    --save_steps 300 \
    --learning_rate $LR \
    --fp16
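The script points --deepspeed at a deepspeed.json file that is not reproduced above. The run log further down echoes the user config that DeepSpeed loaded, so a file along the following lines should be equivalent. The values are copied from that log; building the JSON from Python is just one convenient way to produce it, and the comments are my reading of the settings.

```python
import json

# Values mirror the user config echoed in the run log (ZeRO stage 2, fp16).
ds_config = {
    "train_micro_batch_size_per_gpu": 24,
    "zero_allow_untested_optimizer": True,
    "fp16": {
        "enabled": True,
        "loss_scale": 0,            # 0 selects dynamic loss scaling
        "initial_scale_power": 16,  # initial loss scale = 2**16
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1,
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 5e8,
        "overlap_comm": False,
        "reduce_scatter": True,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": True,
    },
}

# Print the JSON; redirect it into deepspeed.json next to main.py to use it.
ds_config_text = json.dumps(ds_config, indent=2)
print(ds_config_text)
```

The initial loss scale of 2**16 = 65536 is the value that appears in the OVERFLOW messages at the start of training, where DeepSpeed lowers the scale after skipped steps.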

Run output:

> sh ds_train_finetune.sh

[2023-04-14 18:01:33,206] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.

[2023-04-14 18:01:33,417] [INFO] [runner.py:540:main] cmd = /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=44148 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --train_file /data/nfs/llm/data/AdvertiseGen/train.json --test_file /data/nfs/llm/data/AdvertiseGen/dev.json --prompt_column content --response_column summary --overwrite_cache --model_name_or_path /data/nfs/llm/model/chatglm-6b --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4 --overwrite_output_dir --max_source_length 64 --max_target_length 64 --per_device_train_batch_size 24 --per_device_eval_batch_size 1 --gradient_accumulation_steps 2 --predict_with_generate --num_train_epochs 2 --logging_steps 10 --save_steps 300 --learning_rate 1e-4 --fp16

[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_SOCKET_IFNAME=bond0

[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=1

[2023-04-14 18:01:35,945] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}

[2023-04-14 18:01:35,945] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0

[2023-04-14 18:01:35,945] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})

[2023-04-14 18:01:35,945] [INFO] [launch.py:247:main] dist_world_size=8

[2023-04-14 18:01:35,945] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

[2023-04-14 18:01:40,133] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

04/14/2023 18:01:41 - WARNING - main - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: True

...

04/14/2023 18:01:41 - WARNING - main - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True

04/14/2023 18:01:41 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

bf16=False,

bf16_full_eval=False,

data_seed=None,

dataloader_drop_last=False,

dataloader_num_workers=0,

dataloader_pin_memory=True,

ddp_bucket_cap_mb=None,

ddp_find_unused_parameters=None,

ddp_timeout=1800,

debug=[],

deepspeed=deepspeed.json,

disable_tqdm=False,

do_eval=False,

do_predict=False,

do_train=True,

eval_accumulation_steps=None,

eval_delay=0,

eval_steps=None,

evaluation_strategy=no,

fp16=True,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

generation_max_length=None,

generation_num_beams=None,

gradient_accumulation_steps=2,

gradient_checkpointing=False,

greater_is_better=None,

group_by_length=False,

half_precision_backend=auto,

hub_model_id=None,

hub_private_repo=False,

hub_strategy=every_save,

hub_token=<HUB_TOKEN>,

ignore_data_skip=False,

include_inputs_for_metrics=False,

jit_mode_eval=False,

label_names=None,

label_smoothing_factor=0.0,

learning_rate=0.0001,

length_column_name=length,

load_best_model_at_end=False,

local_rank=0,

log_level=passive,

log_level_replica=warning,

log_on_each_node=True,

logging_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/runs/Apr14_18-01-40_ai-app-2-46,

logging_first_step=False,

logging_nan_inf_filter=True,

logging_steps=10,

logging_strategy=steps,

lr_scheduler_type=linear,

max_grad_norm=1.0,

max_steps=-1,

metric_for_best_model=None,

mp_parameters=,

no_cuda=False,

num_train_epochs=2.0,

optim=adamw_hf,

optim_args=None,

output_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,

overwrite_output_dir=True,

past_index=-1,

per_device_eval_batch_size=1,

per_device_train_batch_size=24,

predict_with_generate=True,

prediction_loss_only=False,

push_to_hub=False,

push_to_hub_model_id=None,

push_to_hub_organization=None,

push_to_hub_token=<PUSH_TO_HUB_TOKEN>,

ray_scope=last,

remove_unused_columns=True,

report_to=[],

resume_from_checkpoint=None,

run_name=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,

save_on_each_node=False,

save_safetensors=False,

save_steps=300,

save_strategy=steps,

save_total_limit=None,

seed=42,

sharded_ddp=[],

skip_memory_metrics=True,

sortish_sampler=False,

tf32=None,

torch_compile=False,

torch_compile_backend=None,

torch_compile_mode=None,

torchdynamo=None,

tpu_metrics_debug=False,

tpu_num_cores=None,

use_ipex=False,

use_legacy_prediction_loop=False,

use_mps_device=False,

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 184.03it/s]

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,664 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

0%|                                                                                                                                                                                   | 0/2 [00:00<?, ?it/s][WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,675 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 240.57it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 197.48it/s]

[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,678 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,678 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,679 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,685 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

[INFO|configuration_utils.py:720] 2023-04-14 18:03:01,687 >> Model config ChatGLMConfig {

"_name_or_path": "/data/nfs/llm/model/chatglm-6b",

"architectures": [

"ChatGLMModel"

],

"auto_map": {

"AutoConfig": "configuration_chatglm.ChatGLMConfig",

"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",

"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"

},

"bos_token_id": 130004,

"eos_token_id": 130005,

"gmask_token_id": 130001,

"hidden_size": 4096,

"inner_hidden_size": 16384,

"layernorm_epsilon": 1e-05,

"mask_token_id": 130000,

"max_sequence_length": 2048,

"model_type": "chatglm",

"num_attention_heads": 32,

"num_layers": 28,

"pad_token_id": 3,

"position_encoding_2d": true,

"pre_seq_len": null,

"prefix_projection": false,

"quantization_bit": 0,

"torch_dtype": "float16",

"transformers_version": "4.28.0",

"use_cache": true,

"vocab_size": 130528

}

0%| | 0/2 [00:00<?, ?it/s][WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,688 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

[WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,689 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file ice_text.model

[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file added_tokens.json

[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file special_tokens_map.json

[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file tokenizer_config.json

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 285.37it/s]

[INFO|modeling_utils.py:2531] 2023-04-14 18:03:01,992 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json

[INFO|configuration_utils.py:575] 2023-04-14 18:03:01,993 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s][WARNING|auto_factory.py:456] 2023-04-14 18:03:02,077 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

[WARNING|auto_factory.py:456] 2023-04-14 18:03:02,109 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:13<00:00, 1.70s/it]

[INFO|modeling_utils.py:3190] 2023-04-14 18:03:15,622 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[INFO|modeling_utils.py:3198] 2023-04-14 18:03:15,622 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b.

If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.

Loading checkpoint shards: 25%|████████████████████████████████████ | 2/8 [00:13<00:40, 6.73s/it][INFO|modeling_utils.py:2839] 2023-04-14 18:03:15,703 >> Generation config file not found, using a generation config created from the model config.

...

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:34<00:00, 4.32s/it]

input_ids 5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3

inputs 类型#裤版型#宽松 风格#性感图案#线条 裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还

...

label_ids -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100

labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还

[2023-04-14 18:06:30,469] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False

[2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer

[2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer

[2023-04-14 18:06:30,483] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW

[2023-04-14 18:06:30,484] [INFO] [utils.py:51:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'transformers.optimization.AdamW'>

[2023-04-14 18:06:30,484] [WARNING] [engine.py:1118:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****

[2023-04-14 18:06:30,484] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer

[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500000000

[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500000000

[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False

[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja...

Building extension module utils...

Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

ninja: no work to do.

Loading extension module utils...

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

Time to load utils op: 0.10171675682067871 seconds

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja...

Building extension module utils...

Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)

ninja: no work to do.

Loading extension module utils...

Time to load utils op: 0.18768668174743652 seconds

...

Loading extension module utils...

Time to load utils op: 0.3021426200866699 seconds

Rank: 2 partition count 8, 8 and sizes(771473408, False), (187392, False)

...

Rank: 4 partition count 8, 8 and sizes(771473408, False), (187392, False)

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

No modifications detected for re-loaded extension module utils, skipping build step...

Loading extension module utils...

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

Time to load utils op: 0.0005774497985839844 seconds

...

No modifications detected for re-loaded extension module utils, skipping build step...

Loading extension module utils...

Time to load utils op: 0.0011382102966308594 seconds

2023-04-14 18:06:48,321 INFO utils.py:785:see_memory_usage Before initializing optimizer states

2023-04-14 18:06:48,321 INFO utils.py:786:see_memory_usage MA 14.37 GB Max_MA 14.37 GB CA 14.39 GB Max_CA 14 GB

2023-04-14 18:06:48,322 INFO utils.py:793:see_memory_usage CPU Virtual Memory: used = 50.56 GB, percent = 5.0%

04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...

...

04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...

[2023-04-14 18:06:48,431] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states

[2023-04-14 18:06:48,434] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB Max_MA 25.87 GB CA 25.9 GB Max_CA 26 GB

[2023-04-14 18:06:48,435] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 50.84 GB, percent = 5.0%

[2023-04-14 18:06:48,435] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized

[2023-04-14 18:06:48,512] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer

[2023-04-14 18:06:48,513] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB Max_MA 20.12 GB CA 25.9 GB Max_CA 26 GB

[2023-04-14 18:06:48,513] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 51.29 GB, percent = 5.1%

[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW

[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler

[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f172c367a30>

[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0001, 0.0001], mom=[(0.9, 0.999), (0.9, 0.999)]

2023-04-14 18:06:48,515 INFO config.py:953:print DeepSpeedEngine configuration:

2023-04-14 18:06:48,516 INFO config.py:957:print activation_checkpointing_config {

"partition_activations": false,

"contiguous_memory_optimization": false,

"cpu_checkpointing": false,

"number_checkpoints": null,

"synchronize_checkpoint_boundary": false,

"profile": false

}

2023-04-14 18:06:48,516 INFO config.py:957:print aio_config ... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}

2023-04-14 18:06:48,516 INFO config.py:957:print amp_enabled ... False

2023-04-14 18:06:48,516 INFO config.py:957:print amp_params ... False

2023-04-14 18:06:48,516 INFO config.py:957:print autotuning_config ... {

"enabled": false,

"start_step": null,

"end_step": null,

"metric_path": null,

"arg_mappings": null,

"metric": "throughput",

"model_info": null,

"results_dir": "autotuning_results",

"exps_dir": "autotuning_exps",

"overwrite": true,

"fast": true,

"start_profile_step": 3,

"end_profile_step": 5,

"tuner_type": "gridsearch",

"tuner_early_stopping": 5,

"tuner_num_trials": 50,

"model_info_path": null,

"mp_size": 1,

"max_train_batch_size": null,

"min_train_batch_size": 1,

"max_train_micro_batch_size_per_gpu": 1.024000e+03,

"min_train_micro_batch_size_per_gpu": 1,

"num_tuning_micro_batch_sizes": 3

}

2023-04-14 18:06:48,516 INFO config.py:957:print bfloat16_enabled ... False

2023-04-14 18:06:48,516 INFO config.py:957:print checkpoint_parallel_write_pipeline False

2023-04-14 18:06:48,516 INFO config.py:957:print checkpoint_tag_validation_enabled True

2023-04-14 18:06:48,516 INFO config.py:957:print checkpoint_tag_validation_fail False

2023-04-14 18:06:48,516 INFO config.py:957:print comms_config ... <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f172843d6f0>

2023-04-14 18:06:48,516 INFO config.py:957:print communication_data_type ... None

2023-04-14 18:06:48,516 INFO config.py:957:print compression_config ... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}

2023-04-14 18:06:48,516 INFO config.py:957:print curriculum_enabled_legacy ... False

2023-04-14 18:06:48,516 INFO config.py:957:print curriculum_params_legacy ... False

2023-04-14 18:06:48,516 INFO config.py:957:print data_efficiency_config ... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}

2023-04-14 18:06:48,516 INFO config.py:957:print data_efficiency_enabled ... False

2023-04-14 18:06:48,516 INFO config.py:957:print dataloader_drop_last ... False

2023-04-14 18:06:48,516 INFO config.py:957:print disable_allgather ... False

2023-04-14 18:06:48,516 INFO config.py:957:print dump_state ... False

2023-04-14 18:06:48,516 INFO config.py:957:print dynamic_loss_scale_args ... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}

2023-04-14 18:06:48,516 INFO config.py:957:print eigenvalue_enabled ... False

2023-04-14 18:06:48,516 INFO config.py:957:print eigenvalue_gas_boundary_resolution 1

2023-04-14 18:06:48,516 INFO config.py:957:print eigenvalue_layer_name ... bert.encoder.layer

2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_layer_num ... 0

2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_max_iter ... 100

2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_stability ... 1e-06

2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_tol ... 0.01

2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_verbose ... False

2023-04-14 18:06:48,517 INFO config.py:957:print elasticity_enabled ... False

2023-04-14 18:06:48,517 INFO config.py:957:print flops_profiler_config ... {

"enabled": false,

"profile_step": 1,

"module_depth": -1,

"top_modules": 1,

"detailed": true,

"output_file": null

}

2023-04-14 18:06:48,517 INFO config.py:957:print fp16_auto_cast ... False

2023-04-14 18:06:48,517 INFO config.py:957:print fp16_enabled ... True

2023-04-14 18:06:48,517 INFO config.py:957:print fp16_master_weights_and_gradients False

2023-04-14 18:06:48,517 INFO config.py:957:print global_rank ... 0

2023-04-14 18:06:48,517 INFO config.py:957:print grad_accum_dtype ... None

2023-04-14 18:06:48,517 INFO config.py:957:print gradient_accumulation_steps ... 1

2023-04-14 18:06:48,517 INFO config.py:957:print gradient_clipping ... 0.0

2023-04-14 18:06:48,517 INFO config.py:957:print gradient_predivide_factor ... 1.0

2023-04-14 18:06:48,517 INFO config.py:957:print hybrid_engine ... enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8

2023-04-14 18:06:48,517 INFO config.py:957:print initial_dynamic_scale ... 65536

2023-04-14 18:06:48,517 INFO config.py:957:print load_universal_checkpoint ... False

2023-04-14 18:06:48,517 INFO config.py:957:print loss_scale ... 0

2023-04-14 18:06:48,517 INFO config.py:957:print memory_breakdown ... False

2023-04-14 18:06:48,517 INFO config.py:957:print monitor_config ... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False

2023-04-14 18:06:48,517 INFO config.py:957:print nebula_config ... {

"enabled": false,

"persistent_storage_path": null,

"persistent_time_interval": 100,

"num_of_version_in_retention": 2,

"enable_nebula_load": true,

"load_path": null

}

2023-04-14 18:06:48,517 INFO config.py:957:print optimizer_legacy_fusion ... False

2023-04-14 18:06:48,517 INFO config.py:957:print optimizer_name ... None

2023-04-14 18:06:48,517 INFO config.py:957:print optimizer_params ... None

2023-04-14 18:06:48,517 INFO config.py:957:print pipeline ... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}

2023-04-14 18:06:48,517 INFO config.py:957:print pld_enabled ... False

2023-04-14 18:06:48,517 INFO config.py:957:print pld_params ... False

2023-04-14 18:06:48,517 INFO config.py:957:print prescale_gradients ... False

2023-04-14 18:06:48,517 INFO config.py:957:print scheduler_name ... None

2023-04-14 18:06:48,517 INFO config.py:957:print scheduler_params ... None

2023-04-14 18:06:48,518 INFO config.py:957:print sparse_attention ... None

2023-04-14 18:06:48,518 INFO config.py:957:print sparse_gradients_enabled ... False

2023-04-14 18:06:48,518 INFO config.py:957:print steps_per_print ... 10

2023-04-14 18:06:48,518 INFO config.py:957:print train_batch_size ... 192

2023-04-14 18:06:48,518 INFO config.py:957:print train_micro_batch_size_per_gpu 24

2023-04-14 18:06:48,518 INFO config.py:957:print use_node_local_storage ... False

2023-04-14 18:06:48,518 INFO config.py:957:print wall_clock_breakdown ... False

2023-04-14 18:06:48,518 INFO config.py:957:print world_size ... 8

2023-04-14 18:06:48,518 INFO config.py:957:print zero_allow_untested_optimizer True

2023-04-14 18:06:48,518 INFO config.py:957:print zero_config ... stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=True

2023-04-14 18:06:48,518 INFO config.py:957:print zero_enabled ... True

2023-04-14 18:06:48,518 INFO config.py:957:print zero_force_ds_cpu_optimizer ... True

2023-04-14 18:06:48,518 INFO config.py:957:print zero_optimization_stage ... 2

2023-04-14 18:06:48,518 INFO config.py:943:print_user_config json = {

"train_micro_batch_size_per_gpu": 24,

"zero_allow_untested_optimizer": true,

"fp16": {

"enabled": true,

"loss_scale": 0,

"initial_scale_power": 16,

"loss_scale_window": 1000,

"hysteresis": 2,

"min_loss_scale": 1

},

"zero_optimization": {

"stage": 2,

"allgather_partitions": true,

"allgather_bucket_size": 5.000000e+08,

"overlap_comm": false,

"reduce_scatter": true,

"reduce_bucket_size": 5.000000e+08,

"contiguous_gradients": true

}

}

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

No modifications detected for re-loaded extension module utils, skipping build step...

Loading extension module utils...

Time to load utils op: 0.00031948089599609375 seconds

  0%| | 0/596 [00:00<?, ?it/s]

04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...

2023-04-14 18:06:53,718 INFO loss_scaler.py:188:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1

2023-04-14 18:06:55,883 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768

  0%|▎ | 1/596 [00:07<1:13:02, 7.37s/it]

2023-04-14 18:06:57,948 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384

2023-04-14 18:07:00,007 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192

  0%|▌ | 2/596 [00:11<54:01, 5.46s/it]

2023-04-14 18:07:06,332 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096

  1%|▊ | 3/596 [00:17<57:51, 5.85s/it]

2023-04-14 18:07:08,383 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048

  1%|█▏ | 4/596 [00:24<59:20, 6.01s/it]

2023-04-14 18:07:18,876 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024

[2023-04-14 18:07:18,876] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=7, lr=[9.949664429530202e-05, 9.949664429530202e-05], mom=[(0.9, 0.999), (0.9, 0.999)]

2023-04-14 18:07:18,877 INFO timer.py:199:stop epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=66.98818896434254, CurrSamplesPerSec=93.79590019766518, MemAllocated=21.59GB, MaxMemAllocated=28.8GB

  1%|█▍ | 5/596 [00:30<1:00:11, 6.11s/it]

...

[2023-04-14 18:47:55,207] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=12, lr=[3.02013422818792e-06, 3.02013422818792e-06], mom=[(0.9, 0.999), (0.9, 0.999)]

2023-04-14 18:47:57,392 INFO timer.py:199:stop epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=45.931193758598916, CurrSamplesPerSec=45.63412532914195, MemAllocated=21.59GB, MaxMemAllocated=28.8GB

50%|███████████████████████████████████████████████████████████████████████████████████▊ | 299/596 41:42\<41:37, 8.41s/it2023-04-14 18:48:37,273 INFO logging.py:96:log_dist Rank 0 step=600, skipped=12, lr=1.3422818791946309e-06, 1.3422818791946309e-06, mom=(0.9, 0.999), (0.9, 0.999)

2023-04-14 18:48:39,453 INFO timer.py:199:stop epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=45.92850276413307, CurrSamplesPerSec=45.66031263997641, MemAllocated=21.59GB, MaxMemAllocated=28.8GB

{'loss': 13.3487, 'learning_rate': 1.3422818791946309e-06, 'epoch': 1.01}

50%|████████████████████████████████████████████████████████████████████████████████████ | 300/596 41:50\<41:30, 8.41s/itSaving the whole model

[INFO|configuration_utils.py:457] 2023-04-14 18:48:39,458 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/config.json

[INFO|configuration_utils.py:362] 2023-04-14 18:48:39,459 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/generation_config.json

[INFO|modeling_utils.py:1855] 2023-04-14 18:49:03,951 >> The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/pytorch_model.bin.index.json.

[INFO|tokenization_utils_base.py:2171] 2023-04-14 18:49:03,953 >> tokenizer config file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/tokenizer_config.json

[INFO|tokenization_utils_base.py:2178] 2023-04-14 18:49:03,953 >> Special tokens file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/special_tokens_map.json

2023-04-14 18:49:03,983 INFO logging.py:96:log_dist Rank 0 Torch Checkpoint global_step600 is about to be saved!

2023-04-14 18:49:03,988 INFO logging.py:96:log_dist Rank 0 Saving model checkpoint: /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt

2023-04-14 18:49:03,988 INFO torch_checkpoint_engine.py:21:save Torch Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt...

2023-04-14 18:49:15,934 INFO torch_checkpoint_engine.py:23:save Torch Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt.

2023-04-14 18:49:15,937 INFO torch_checkpoint_engine.py:21:save Torch Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt...

2023-04-14 18:49:28,049 INFO torch_checkpoint_engine.py:23:save Torch Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt.

2023-04-14 18:49:28,049 INFO engine.py:3125:_save_zero_checkpoint zero checkpoint saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt

2023-04-14 18:49:28,049 INFO torch_checkpoint_engine.py:33:commit Torch Checkpoint global_step600 is ready now!

51%|████████████████████████████████████████████████████████████████████████████████████▏ | 304/596 43:14\<1:05:51, 13.53s/it2023-04-14 18:50:09,137 INFO logging.py:96:log_dist Rank 0 step=610, skipped=12, lr=0.0, 0.0, mom=(0.9, 0.999), (0.9, 0.999)

2023-04-14 18:50:11,316 INFO timer.py:199:stop epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=45.926876625767875, CurrSamplesPerSec=45.66709917655267, MemAllocated=21.59GB, MaxMemAllocated=28.8GB

52%|██████████████████████████████████████████████████████████████████████████████████████▌ | 309/596 43:56\<44:16, 9.26s/it2023-04-14 18:50:51,114 INFO logging.py:96:log_dist Rank 0 step=620, skipped=12, lr=0.0, 0.0, mom=(0.9, 0.999), (0.9, 0.999)

2023-04-14 18:50:53,302 INFO timer.py:199:stop epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=45.92462533252217, CurrSamplesPerSec=45.55552426651123, MemAllocated=21.59GB, MaxMemAllocated=28.8GB

{'loss': 13.3202, 'learning_rate': 0.0, 'epoch': 1.04}

...

99%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 589/596 1:23:07\<00:58, 8.41s/it2023-04-14 19:30:02,654 INFO logging.py:96:log_dist Rank 0 step=1180, skipped=12, lr=0.0, 0.0, mom=(0.9, 0.999), (0.9, 0.999)

2023-04-14 19:30:04,820 INFO timer.py:199:stop epoch=0/micro_step=1180/global_step=1180, RunningAvgSamplesPerSec=45.85904109663022, CurrSamplesPerSec=45.73521852038509, MemAllocated=21.59GB, MaxMemAllocated=28.8GB

{'loss': 13.3537, 'learning_rate': 0.0, 'epoch': 1.98}

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍| 594/596 1:23:49\<00:16, 8.41s/it2023-04-14 19:30:44,847 INFO logging.py:96:log_dist Rank 0 step=1190, skipped=12, lr=0.0, 0.0, mom=(0.9, 0.999), (0.9, 0.999)

2023-04-14 19:30:47,022 INFO timer.py:199:stop epoch=0/micro_step=1190/global_step=1190, RunningAvgSamplesPerSec=45.856487437478386, CurrSamplesPerSec=45.579988341622055, MemAllocated=21.59GB, MaxMemAllocated=28.8GB

{'train_runtime': 5046.8863, 'train_samples_per_second': 45.414, 'train_steps_per_second': 0.118, 'train_loss': 13.905431555421561, 'epoch': 2.0}

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 596/596 1:24:06\<00:00, 8.47s/it

***** train metrics *****

epoch = 2.0

train_loss = 13.9054

train_runtime = 1:24:06.88

train_samples = 114599

train_samples_per_second = 45.414

train_steps_per_second = 0.118

2023-04-14 19:30:58,560 INFO launch.py:460:main Process 35198 exits successfully.

2023-04-14 19:30:58,561 INFO launch.py:460:main Process 35192 exits successfully.

2023-04-14 19:30:58,561 INFO launch.py:460:main Process 35193 exits successfully.

2023-04-14 19:30:58,561 INFO launch.py:460:main Process 35195 exits successfully.

2023-04-14 19:30:58,561 INFO launch.py:460:main Process 35191 exits successfully.

2023-04-14 19:30:59,562 INFO launch.py:460:main Process 35194 exits successfully.

2023-04-14 19:30:59,563 INFO launch.py:460:main Process 35197 exits successfully.

2023-04-14 19:31:00,564 INFO launch.py:460:main Process 35196 exits successfully.
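
The OVERFLOW messages near the start of the log above come from fp16 dynamic loss scaling. The following is a minimal sketch of that behaviour and not DeepSpeed's actual implementation: the class name `DynamicLossScale` and its `update` method are hypothetical, and the settings are the ones printed in the user config (`initial_scale_power` 16, `hysteresis` 2, `loss_scale_window` 1000, `min_loss_scale` 1). It assumes the scale is halved on an overflow once the hysteresis budget is used up, and doubled after a full window of overflow-free steps.

```python
class DynamicLossScale:
    """Simplified model of fp16 dynamic loss scaling (illustrative only)."""

    def __init__(self, initial_scale_power=16, hysteresis=2,
                 scale_window=1000, min_scale=1.0, factor=2.0):
        self.scale = 2.0 ** initial_scale_power
        self.hysteresis = hysteresis
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.factor = factor
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, overflow):
        """Update the scale after one step; return True if the step is skipped."""
        if overflow:
            self.good_steps = 0
            if self.hysteresis > 1:
                # Tolerate this overflow: spend hysteresis, keep the scale.
                self.hysteresis -= 1
            else:
                self.scale = max(self.scale / self.factor, self.min_scale)
            return True
        self.good_steps += 1
        if self.good_steps % self.scale_window == 0:
            self.scale *= self.factor
        return False


scaler = DynamicLossScale()
history = []
for _ in range(7):  # the log reports skipped=7 by step 10
    scaler.update(overflow=True)
    history.append(int(scaler.scale))
print(history)  # [65536, 32768, 16384, 8192, 4096, 2048, 1024]
```

Under these assumptions, the seven skipped steps reproduce the sequence in the log: the first overflow only consumes hysteresis, and each later one halves the scale until it reaches 1024.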

GPU memory usage:

Fri Apr 14 18:27:45 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   59C    P0    92W / 300W |  36539MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A800 80G...  Off  | 00000000:35:00.0 Off |                    0 |
| N/A   61C    P0    96W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A800 80G...  Off  | 00000000:36:00.0 Off |                    0 |
| N/A   63C    P0    93W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A800 80G...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   65C    P0   102W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A800 80G...  Off  | 00000000:9B:00.0 Off |                    0 |
| N/A   64C    P0   108W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A800 80G...  Off  | 00000000:9C:00.0 Off |                    0 |
| N/A   64C    P0   105W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A800 80G...  Off  | 00000000:9D:00.0 Off |                    0 |
| N/A   58C    P0    97W / 300W |  36433MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A800 80G...  Off  | 00000000:9E:00.0 Off |                    0 |
| N/A   59C    P0    92W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     35191      C   ...nv-py310-cu117/bin/python    36537MiB |
|    1   N/A  N/A     35192      C   ...nv-py310-cu117/bin/python    38393MiB |
|    2   N/A  N/A     35193      C   ...nv-py310-cu117/bin/python    38393MiB |
|    3   N/A  N/A     35194      C   ...nv-py310-cu117/bin/python    38345MiB |
|    4   N/A  N/A     35195      C   ...nv-py310-cu117/bin/python    38345MiB |
|    5   N/A  N/A     35196      C   ...nv-py310-cu117/bin/python    38393MiB |
|    6   N/A  N/A     35197      C   ...nv-py310-cu117/bin/python    36431MiB |
|    7   N/A  N/A     35198      C   ...nv-py310-cu117/bin/python    38345MiB |
+-----------------------------------------------------------------------------+

Output files:

 tree /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4

/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4

├── all_results.json

├── checkpoint-300

│   ├── config.json

│   ├── configuration_chatglm.py

│   ├── generation_config.json

│   ├── global_step600

│   │   ├── mp_rank_00_model_states.pt

│   │   ├── zero_pp_rank_0_mp_rank_00_optim_states.pt

│   │   ├── zero_pp_rank_1_mp_rank_00_optim_states.pt

│   │   ├── zero_pp_rank_2_mp_rank_00_optim_states.pt

│   │   ├── zero_pp_rank_3_mp_rank_00_optim_states.pt

│   │   ├── zero_pp_rank_4_mp_rank_00_optim_states.pt

│   │   ├── zero_pp_rank_5_mp_rank_00_optim_states.pt

│   │   ├── zero_pp_rank_6_mp_rank_00_optim_states.pt

│   │   └── zero_pp_rank_7_mp_rank_00_optim_states.pt

│   ├── ice_text.model

│   ├── latest

│   ├── modeling_chatglm.py

│   ├── pytorch_model-00001-of-00002.bin

│   ├── pytorch_model-00002-of-00002.bin

│   ├── pytorch_model.bin.index.json

│   ├── quantization.py

│   ├── rng_state_0.pth

│   ├── rng_state_1.pth

│   ├── rng_state_2.pth

│   ├── rng_state_3.pth

│   ├── rng_state_4.pth

│   ├── rng_state_5.pth

│   ├── rng_state_6.pth

│   ├── rng_state_7.pth

│   ├── special_tokens_map.json

│   ├── tokenization_chatglm.py

│   ├── tokenizer_config.json

│   ├── trainer_state.json

│   ├── training_args.bin

│   └── zero_to_fp32.py

├── trainer_state.json

└── train_results.json

2 directories, 36 files

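After training finished, the final model weights were not saved; only the checkpoint written during training exists. To save the final weights, add a call to trainer.save_model() in the training code.

Full fine-tuning with DeepSpeed needs a lot of GPU memory and trains slowly, so next we try the P-Tuning v2 approach provided by the official repository for parameter-efficient fine-tuning.

Parameter-efficient fine-tuning of ChatGLM-6B with P-Tuning v2

Here ChatGLM-6B is fine-tuned with P-Tuning v2, which can cut the number of parameters that need tuning to about 0.1% of the original. Combined with model quantization, gradient checkpointing and similar techniques, it can run with as little as 7 GB of GPU memory.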
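
To give a feel for how small the trainable part is, here is a rough count of the prefix parameters. This is a sketch under stated assumptions: the helper `prefix_param_count` is hypothetical, `hidden_size` and `num_layers` are the values from the ChatGLMConfig printed in the logs of this article, `PRE_SEQ_LEN` is the value used in train.sh, and the formula assumes a prefix embedding without a projection MLP (`prefix_projection` is false in that config), holding one key and one value vector per layer for every prefix position.

```python
# Values taken from the ChatGLMConfig printed in the logs of this article.
HIDDEN_SIZE = 4096
NUM_LAYERS = 28
PRE_SEQ_LEN = 128  # set in train.sh


def prefix_param_count(pre_seq_len, num_layers, hidden_size):
    """Trainable parameters of a prefix embedding with no projection MLP.

    Assumes one key vector and one value vector of size hidden_size
    per layer for every prefix position.
    """
    return pre_seq_len * num_layers * 2 * hidden_size


n_prefix = prefix_param_count(PRE_SEQ_LEN, NUM_LAYERS, HIDDEN_SIZE)
print(f"prefix parameters: {n_prefix:,}")            # 29,360,128
print(f"fp32 size: {n_prefix * 4 / 2**20:.0f} MiB")  # 112 MiB
```

Under these assumptions only a few tens of millions of values are trained, which is why the optimizer state and gradients take so little memory compared with full fine-tuning.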

First, edit the train.sh script, mainly the train_file, validation_file, model_name_or_path and output_dir parameters:

PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Run log:

  0%|                  | 0/3000 [00:00<?, ?it/s]

...

{'loss': 4.2962, 'learning_rate': 0.0196, 'epoch': 0.01}

{'loss': 4.3112, 'learning_rate': 0.019533333333333333, 'epoch': 0.01}

2%|███▊             | 70/3000 [03:20<2:17:06,  2.81s/it]

GPU memory usage:

|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   71C    P0   300W / 300W |   6291MiB / 81920MiB |     74%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

GPU memory usage is indeed low (about 6.3 GB), but even with P-Tuning v2 training is still slow: roughly 2.8 s per step, or more than two hours for 3000 steps.

So edit train.sh to increase the batch size and try again. Compared with the first script, per_device_train_batch_size goes from 1 to 128, per_device_eval_batch_size from 1 to 8, max_steps 3000 is replaced by num_train_epochs 1, and save_steps drops from 1000 to 100.

PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 1 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Run log:

sh train.sh

04/14/2023 19:46:38 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: Fals

04/14/2023 19:46:38 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

bf16=False,

bf16_full_eval=False,

data_seed=None,

dataloader_drop_last=False,

dataloader_num_workers=0,

dataloader_pin_memory=True,

ddp_bucket_cap_mb=None,

ddp_find_unused_parameters=None,

ddp_timeout=1800,

debug=[],

deepspeed=None,

disable_tqdm=False,

do_eval=False,

do_predict=False,

do_train=True,

eval_accumulation_steps=None,

eval_delay=0,

eval_steps=None,

evaluation_strategy=no,

fp16=False,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

generation_max_length=None,

generation_num_beams=None,

gradient_accumulation_steps=16,

gradient_checkpointing=False,

greater_is_better=None,

group_by_length=False,

half_precision_backend=auto,

hub_model_id=None,

hub_private_repo=False,

hub_strategy=every_save,

hub_token=<HUB_TOKEN>,

ignore_data_skip=False,

include_inputs_for_metrics=False,

jit_mode_eval=False,

label_names=None,

label_smoothing_factor=0.0,

learning_rate=0.02,

length_column_name=length,

load_best_model_at_end=False,

local_rank=-1,

log_level=passive,

log_level_replica=warning,

log_on_each_node=True,

logging_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/runs/Apr14_19-46-38_ai-app-2-46,

logging_first_step=False,

logging_nan_inf_filter=True,

logging_steps=10,

logging_strategy=steps,

lr_scheduler_type=linear,

max_grad_norm=1.0,

max_steps=-1,

metric_for_best_model=None,

mp_parameters=,

no_cuda=False,

num_train_epochs=1.0,

optim=adamw_hf,

optim_args=None,

output_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,

overwrite_output_dir=True,

past_index=-1,

per_device_eval_batch_size=8,

per_device_train_batch_size=128,

predict_with_generate=True,

prediction_loss_only=False,

push_to_hub=False,

push_to_hub_model_id=None,

push_to_hub_organization=None,

push_to_hub_token=<PUSH_TO_HUB_TOKEN>,

ray_scope=last,

remove_unused_columns=True,

report_to=[],

resume_from_checkpoint=None,

run_name=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,

save_on_each_node=False,

save_safetensors=False,

save_steps=100,

save_strategy=steps,

save_total_limit=None,

seed=42,

sharded_ddp=[],

skip_memory_metrics=True,

sortish_sampler=False,

tf32=None,

torch_compile=False,

torch_compile_backend=None,

torch_compile_mode=None,

torchdynamo=None,

tpu_metrics_debug=False,

tpu_num_cores=None,

use_ipex=False,

use_legacy_prediction_loop=False,

use_mps_device=False,

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

04/14/2023 19:47:58 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-1cf934bed8e233e6e)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████

[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,671 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-14 19:47:58,671 >> Explicitly passing a `revision` is encouraged when loading a configuratio a newer revision.

[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,679 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[INFO|configuration_utils.py:720] 2023-04-14 19:47:58,681 >> Model config ChatGLMConfig {

"_name_or_path": "/data/nfs/llm/model/chatglm-6b",

"architectures": [

"ChatGLMModel"

],

"auto_map": {

"AutoConfig": "configuration_chatglm.ChatGLMConfig",

"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",

"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"

},

"bos_token_id": 130004,

"eos_token_id": 130005,

"gmask_token_id": 130001,

"hidden_size": 4096,

"inner_hidden_size": 16384,

"layernorm_epsilon": 1e-05,

"mask_token_id": 130000,

"max_sequence_length": 2048,

"model_type": "chatglm",

"num_attention_heads": 32,

"num_layers": 28,

"pad_token_id": 3,

"position_encoding_2d": true,

"pre_seq_len": null,

"prefix_projection": false,

"quantization_bit": 0,

"torch_dtype": "float16",

"transformers_version": "4.28.0",

"use_cache": true,

"vocab_size": 130528

}

[WARNING|tokenization_auto.py:675] 2023-04-14 19:47:58,683 >> Explicitly passing a `revision` is encouraged when loading a model with curevision.

[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file ice_text.model

[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file added_tokens.json

[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file special_tokens_map.json

[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file tokenizer_config.json

[WARNING|auto_factory.py:456] 2023-04-14 19:47:59,089 >> Explicitly passing a `revision` is encouraged when loading a model with custom ion.

[INFO|modeling_utils.py:2531] 2023-04-14 19:47:59,115 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.jso

[INFO|configuration_utils.py:575] 2023-04-14 19:47:59,117 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████

[INFO|modeling_utils.py:3190] 2023-04-14 19:48:08,508 >> All model checkpoint weights were used when initializing ChatGLMForConditionalG

[WARNING|modeling_utils.py:3192] 2023-04-14 19:48:08,508 >> Some weights of ChatGLMForConditionalGeneration were not initialized from thtialized: ['transformer.prefix_encoder.embedding.weight']

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

[INFO|modeling_utils.py:2839] 2023-04-14 19:48:08,548 >> Generation config file not found, using a generation config created from the mo

Quantized to 4 bit

input_ids 5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 15388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65564219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 6 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3

inputs 类型#裤版型#宽松 风格#性感图案#线条 裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长适贴身体验感棒棒哒。系带部分增加设计看点,还

label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 741-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100

labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自

/home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWain a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warn

warnings.warn(

0%| 04/14/2023 19:51:19 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkp

{'loss': 6.0246, 'learning_rate': 0.016428571428571428, 'epoch': 0.18}

{'loss': 7.8721, 'learning_rate': 0.012857142857142859, 'epoch': 0.36}

{'loss': 8.2653, 'learning_rate': 0.009285714285714286, 'epoch': 0.54}

{'loss': 8.6636, 'learning_rate': 0.005714285714285714, 'epoch': 0.71}

{'loss': 8.5985, 'learning_rate': 0.002142857142857143, 'epoch': 0.89}

{'train_runtime': 4868.4062, 'train_samples_per_second': 23.539, 'train_steps_per_second': 0.012, 'train_loss': 7.956800188337054, 'epoc

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████

***** train metrics *****

epoch = 1.0

train_loss = 7.9568

train_runtime = 1:21:08.40

train_samples = 114599

train_samples_per_second = 23.539

train_steps_per_second = 0.012
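
In the log above, label_ids begins and ends with runs of -100 while input_ids ends with runs of the pad id 3. The sketch below shows the usual way such labels are built, so that the loss is computed only on the response tokens. It is an illustration, not the preprocessing code from main.py: the function `build_example` is hypothetical, the token ids are toy values, and the special tokens that the real tokenizer inserts between prompt and response are left out.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_example(prompt_ids, target_ids, max_len, pad_id=3):
    """Concatenate prompt and target, pad, and mask non-target labels."""
    input_ids = prompt_ids + target_ids
    if len(input_ids) > max_len:
        raise ValueError("sequence longer than max_len")
    # Prompt positions are masked so they do not contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    pad = max_len - len(input_ids)
    input_ids = input_ids + [pad_id] * pad
    # Padding positions are masked as well.
    labels = labels + [IGNORE_INDEX] * pad
    return input_ids, labels


# Toy token ids, not real ChatGLM vocabulary entries.
inputs, labels = build_example([11, 12, 13], [21, 22], max_len=8)
print(inputs)  # [11, 12, 13, 21, 22, 3, 3, 3]
print(labels)  # [-100, -100, -100, 21, 22, -100, -100, -100]
```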

GPU memory usage:

Sun Apr 16 19:53:00 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   71C    P0   281W / 300W |  63275MiB / 81920MiB |     92%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     20126      C   python3                         63273MiB |
+-----------------------------------------------------------------------------+

Output files:

> ls -al  /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2
total 12
drwxrwxr-x 2 guodong.li guodong.li   98 Apr 14 21:12 .
drwxrwxr-x 8 guodong.li guodong.li  177 Apr 14 17:12 ..
-rw-rw-r-- 1 guodong.li guodong.li  195 Apr 14 21:12 all_results.json
-rw-rw-r-- 1 guodong.li guodong.li 1185 Apr 14 21:12 trainer_state.json
-rw-rw-r-- 1 guodong.li guodong.li  195 Apr 14 21:12 train_results.json

As the output shows, increasing the batch size raised both GPU memory usage (about 63 GB) and GPU utilization (92%).
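
The step count and learning rates in the run above can be checked with a little arithmetic. The sketch below is an assumption-laden reconstruction rather than the Trainer's own code: the helpers `total_update_steps` and `linear_lr` are hypothetical, the step count assumes that a trailing partial gradient-accumulation group is dropped, and the schedule assumes a plain linear decay to zero (the log shows lr_scheduler_type=linear and warmup_steps=0).

```python
import math


def total_update_steps(num_samples, per_device_batch, grad_accum,
                       num_gpus=1, epochs=1):
    """Optimizer steps, assuming the last partial accumulation group is dropped."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_gpus))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs


def linear_lr(base_lr, step, total_steps, warmup_steps=0):
    """Linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# train_samples, batch size and accumulation steps from the run above.
steps = total_update_steps(114599, per_device_batch=128, grad_accum=16)
print(steps)  # 56

for s in (10, 20, 30, 40, 50):
    print(s, linear_lr(2e-2, s, steps))

SAVE_STEPS = 100
print("checkpoints written:", steps // SAVE_STEPS)  # 0
```

With these assumptions the run has only 56 optimizer steps, and the computed learning rates agree with the logged values (for example about 0.016429 at step 10). It would also explain why the output directory contains no checkpoint-* folder: save_steps=100 is never reached within 56 steps, so a smaller save_steps or more epochs is needed before a checkpoint is written.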

To use DeepSpeed for data parallelism, the following command can serve as a reference:

PRE_SEQ_LEN=128
LR=2e-2

deepspeed --include localhost:1,2,3 --master_port 29001 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN

Model evaluation

Edit evaluate.sh, setting model_name_or_path (the base model path), ptuning_checkpoint (the path to the weights produced by P-Tuning v2 fine-tuning) and the related parameters:

PRE_SEQ_LEN=128
CHECKPOINT=adgen-chatglm-6b-pt-128-2e-2
STEP=3000

CUDA_VISIBLE_DEVICES=1 python3 main.py \
    --do_predict \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --ptuning_checkpoint /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Run log:

sh evaluate.sh

04/16/2023 20:18:01 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False

04/16/2023 20:18:01 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

...

fp16=False,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

...

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

Downloading and preparing dataset json/default to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e...

Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3419.73it/s]

Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 196.48it/s]

Dataset json downloaded and prepared to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e. Subsequent calls will reuse this data.

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 326.85it/s]

[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,784 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-16 20:19:21,785 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,792 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[INFO|configuration_utils.py:720] 2023-04-16 20:19:21,795 >> Model config ChatGLMConfig {

"_name_or_path": "/data/nfs/llm/model/chatglm-6b",

"architectures": [

"ChatGLMModel"

],

"auto_map": {

"AutoConfig": "configuration_chatglm.ChatGLMConfig",

"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",

"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"

},

"bos_token_id": 130004,

"eos_token_id": 130005,

"gmask_token_id": 130001,

"hidden_size": 4096,

"inner_hidden_size": 16384,

"layernorm_epsilon": 1e-05,

"mask_token_id": 130000,

"max_sequence_length": 2048,

"model_type": "chatglm",

"num_attention_heads": 32,

"num_layers": 28,

"pad_token_id": 3,

"position_encoding_2d": true,

"pre_seq_len": null,

"prefix_projection": false,

"quantization_bit": 0,

"torch_dtype": "float16",

"transformers_version": "4.28.0",

"use_cache": true,

"vocab_size": 130528

}

[WARNING|tokenization_auto.py:675] 2023-04-16 20:19:21,797 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file ice_text.model

[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file added_tokens.json

[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file special_tokens_map.json

[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file tokenizer_config.json

[WARNING|auto_factory.py:456] 2023-04-16 20:19:22,186 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|modeling_utils.py:2531] 2023-04-16 20:19:22,222 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json

[INFO|configuration_utils.py:575] 2023-04-16 20:19:22,224 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00, 1.04s/it]

[INFO|modeling_utils.py:3190] 2023-04-16 20:19:30,912 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[WARNING|modeling_utils.py:3192] 2023-04-16 20:19:30,912 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

[INFO|modeling_utils.py:2839] 2023-04-16 20:19:30,967 >> Generation config file not found, using a generation config created from the model config.

Quantized to 4 bit

input_ids [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 65421, 61, 75898, 32, 68554, 61, 77257, 64555, 32, 65107, 61, 66268, 32, 65347, 61, 71689, 32, 69768, 61, 85428, 32, 65173, 73942, 61, 70984, 32, 65173, 70936, 61, 64703, 65509, 130001, 130004]

inputs 类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞

label_ids [5, 71689, 66561, 67061, 77257, 70984, 6, 72194, 65173, 64290, 64622, 81549, 63823, 65173, 64290, 83343, 63832, 63912, 65209, 64703, 65509, 64051, 6, 69418, 78598, 87019, 6, 64257, 71319, 66069, 74197, 63823, 65173, 72265, 64880, 64131, 63832, 73416, 85428, 66261, 6, 65594, 87834, 6, 73412, 105145, 65388, 63823, 130001, 130004]

labels 简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。

04/16/2023 20:21:30 - INFO - __main__ - *** Predict ***

[INFO|configuration_utils.py:575] 2023-04-16 20:21:30,090 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

0%| | 0/1070 [00:00<?, ?it/s][INFO|configuration_utils.py:575] 2023-04-16 20:21:34,430 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

0%|▎ | 2/1070 [00:02<25:39, 1.44s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:37,311 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

0%|▍ | 3/1070

...

1%|█▎ | 8/1070 [00:20<50:13, 2.84s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:55,233 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

1%|█▍ | 9/1070 [00:23<50:24, 2.85s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:58,112 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

1%|█▌ | 10/1070 [00:26<50:30, 2.86s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:00,990 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

1%|█▋ | 11/1070 [00:29<50:37, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:03,880 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

1%|█▊ | 12/1070 [00:32<50:38, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:06,761 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

...

[INFO|configuration_utils.py:575] 2023-04-16 21:13:16,240 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1069/1070 51:44\<00:02, 2.92s/itINFO\|configuration_utils.py:575 2023-04-16 21:13:19,107 >> Generate config GenerationConfig {

"_from_model_config": true,

"bos_token_id": 130004,

"eos_token_id": 130005,

"pad_token_id": 3,

"transformers_version": "4.28.0"

}

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 [51:47<00:00, 2.90s/it]Building prefix dict from the default dictionary ...

04/16/2023 21:13:22 - DEBUG - jieba - Building prefix dict from the default dictionary ...

Dumping model to file cache /tmp/jieba.cache

04/16/2023 21:13:22 - DEBUG - jieba - Dumping model to file cache /tmp/jieba.cache

Loading model cost 0.634 seconds.

04/16/2023 21:13:22 - DEBUG - jieba - Loading model cost 0.634 seconds.

Prefix dict has been built successfully.

04/16/2023 21:13:22 - DEBUG - jieba - Prefix dict has been built successfully.

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 51:53\<00:00, 2.91s/it

***** predict metrics *****

predict_bleu-4 = 0.7846

predict_rouge-1 = 8.8941

predict_rouge-2 = 1.3703

predict_rouge-l = 16.4982

predict_runtime = 0:51:57.77

predict_samples = 1070

predict_samples_per_second = 0.343

predict_steps_per_second = 0.343
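Two details in the log above can be checked by hand. The sketch below uses only values printed in the log; the helper names `left_pad` and `runtime_to_seconds` are illustrative and are not part of the evaluation script. It reproduces how the 34 prompt tokens of the sample are padded on the left with `pad_token_id` (3) up to `--max_source_length 64`, and how `predict_samples_per_second` follows from `predict_samples` and `predict_runtime`.

```python
# Sanity checks on the evaluation log, using only values printed in it.

PAD_TOKEN_ID = 3          # "pad_token_id" from the model config in the log
MAX_SOURCE_LENGTH = 64    # --max_source_length passed to the evaluation run


def left_pad(token_ids, max_length, pad_id):
    """Left-pad token_ids with pad_id up to max_length (illustrative helper)."""
    if len(token_ids) > max_length:
        raise ValueError("sequence is longer than max_length")
    return [pad_id] * (max_length - len(token_ids)) + list(token_ids)


def runtime_to_seconds(runtime):
    """Convert an 'H:MM:SS.ss' string such as '0:51:57.77' to seconds."""
    hours, minutes, seconds = runtime.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# The non-padding ids copied from the sample's input_ids line.
prompt_ids = [
    5, 65421, 61, 75898, 32, 68554, 61, 77257, 64555, 32, 65107, 61,
    66268, 32, 65347, 61, 71689, 32, 69768, 61, 85428, 32, 65173, 73942,
    61, 70984, 32, 65173, 70936, 61, 64703, 65509, 130001, 130004,
]
padded = left_pad(prompt_ids, MAX_SOURCE_LENGTH, PAD_TOKEN_ID)
print(len(prompt_ids), padded.count(PAD_TOKEN_ID), len(padded))  # 34 30 64

# Throughput: predict_samples divided by predict_runtime.
seconds = runtime_to_seconds("0:51:57.77")
print(round(1070 / seconds, 3))  # 0.343
```

Because the run uses `--per_device_eval_batch_size 1` on a single GPU, each step processes one sample, which is why `predict_steps_per_second` equals `predict_samples_per_second`.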

Model inference

Create a new file, inference.py:

import os

import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_PATH = "/data/nfs/llm/model/chatglm-6b"
CHECKPOINT_PATH = "/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500"

# Load the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Load the base model with pre_seq_len set so that the prefix encoder is created.
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True).cuda()

# Keep only the prefix encoder weights from the checkpoint and strip the
# "transformer.prefix_encoder." prefix so the keys match the submodule.
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

# Quantize the base model to 4 bits, then keep the prefix encoder in fp32.
print("Quantized to 4 bit")
model = model.quantize(4)
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()

# Send a first greeting, then chat interactively until an empty line is entered.
print("User: 你好\n")
response, history = model.chat(tokenizer, "你好", history=[])
print("ChatGLM-6B:\n", response)
print("\n------------------------------------------------\nUser:")
line = input()
while line:
    response, history = model.chat(tokenizer, line, history=history)
    print("ChatGLM-6B:\n", response)
    print("\n------------------------------------------------\nUser:")
    line = input()

Run it with:

CUDA_VISIBLE_DEVICES=0 python3 inference.py
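The prefix-loading step in inference.py can be tried in isolation without a GPU or a checkpoint. The sketch below is only an illustration: it uses a plain dictionary with strings standing in for tensors, the function name `extract_prefix_encoder_weights` is hypothetical, and the second key in the toy checkpoint is made up. It shows why the key prefix has to be removed before calling `load_state_dict` on the `prefix_encoder` submodule, which expects keys relative to itself.

```python
PREFIX = "transformer.prefix_encoder."


def extract_prefix_encoder_weights(state_dict, prefix=PREFIX):
    """Keep only entries under `prefix` and drop the prefix from their keys."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy checkpoint: strings stand in for tensors; the second key is hypothetical.
checkpoint = {
    "transformer.prefix_encoder.embedding.weight": "prefix-embedding-tensor",
    "transformer.some_other_module.weight": "unrelated-tensor",
}

print(extract_prefix_encoder_weights(checkpoint))
# {'embedding.weight': 'prefix-embedding-tensor'}
```

The remaining key, `embedding.weight`, corresponds to the `transformer.prefix_encoder.embedding.weight` parameter that the evaluation log reports as newly initialized when the base model is loaded.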

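Conclusion

This article used DeepSpeed (data parallelism plus ZeRO) for full-parameter fine-tuning of ChatGLM-6B. When GPU resources are limited, P-Tuning v2 can be used instead for parameter-efficient fine-tuning.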
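To give a rough sense of how small the P-Tuning v2 checkpoint is, the sketch below counts the prefix encoder parameters from the values used in this article (`pre_seq_len=128`, `num_layers=28`, `hidden_size=4096`). It rests on an assumption that should be checked against modeling_chatglm.py: with `prefix_projection` set to false, the prefix encoder is a single embedding of shape `(pre_seq_len, num_layers * hidden_size * 2)`, that is, one key vector and one value vector per layer for each prefix position. The function name is hypothetical.

```python
def prefix_encoder_param_count(pre_seq_len, num_layers, hidden_size):
    """Parameter count of a prefix embedding of shape
    (pre_seq_len, num_layers * hidden_size * 2).

    Assumption: no prefix projection, so the embedding is the only weight.
    """
    return pre_seq_len * num_layers * hidden_size * 2


params = prefix_encoder_param_count(pre_seq_len=128, num_layers=28, hidden_size=4096)
print(params)  # 29360128

# inference.py keeps the prefix encoder in fp32, i.e. 4 bytes per parameter.
size_mib = params * 4 / 2**20
print(size_mib)  # 112.0
```

Under that assumption, about 29 million parameters are trained, a small fraction of the roughly 6 billion parameters that full fine-tuning updates.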
