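Previously I tried parameter-efficient fine-tuning of ChatGLM-6B with LoRA. In this article I share how to fine-tune ChatGLM-6B with DeepSpeed and P-Tuning v2; the related code is on GitHub: llm-action.
Introduction to ChatGLM-6B
For an introduction to ChatGLM-6B, see my previous article; I won't repeat it here.
Introduction to P-Tuning v2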
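P-Tuning v2 is a parameter-efficient fine-tuning method based on deep prompt tuning. Instead of updating the pretrained weights, it freezes the entire model and trains only a small set of continuous prompt vectors, which brings the number of trainable parameters down to roughly 0.1% to 3% of the model. P-Tuning v2 is an upgrade of P-Tuning (v1): v1 adds trainable prompt embeddings only at the input layer, whereas v2 adds trainable prompts at every Transformer layer, similar to Prefix-Tuning.
Concretely, P-Tuning v2 prepends pre_seq_len virtual tokens to the input. In every layer, these virtual tokens supply trainable key and value vectors to self-attention, so each real token can attend to them, and the task-specific signal reaches deep layers directly instead of having to propagate up from the embeddings. During training, the prefix can optionally be reparameterized through a small MLP (the prefix_projection option in ChatGLM's implementation).
Overall, P-Tuning v2 shrinks the trainable parameters, their gradients, and the optimizer states to a tiny fraction of what full fine-tuning needs. This lowers GPU memory requirements during training, and each task only needs to store its own small prefix rather than a full copy of the model, while performance stays close to full fine-tuning on many tasks. Inference still runs the full frozen model plus the prefix, so P-Tuning v2 does not make the model itself smaller or faster.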
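To make the mechanism concrete, here is a minimal pure-Python sketch (not ChatGLM's actual PrefixEncoder code) of the two things P-Tuning v2 adds: prefix key/value vectors that every real token can attend to, and the resulting trainable-parameter count. The layer count and hidden size come from the ChatGLM-6B config.json printed in the training log below; pre_seq_len = 128 is an assumed example value, the prefix is assumed to be stored without a projection MLP, and the 6,173,286,400 total is what the ZeRO partition sizes in that log imply (8 ranks × (771,473,408 + 187,392)).

```python
import math


def attention(query, keys, values):
    """Single-head scaled dot-product attention over plain Python lists."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def attention_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """P-Tuning v2 idea: prepend trainable prefix keys/values inside a layer.
    The frozen model's own keys and values are left untouched."""
    return attention(query, prefix_keys + keys, prefix_values + values)


def prefix_param_count(pre_seq_len, num_layers=28, hidden_size=4096):
    """Trainable parameters when the prefix is stored directly (no projection MLP):
    one key vector and one value vector of size hidden_size per layer per virtual token."""
    return pre_seq_len * num_layers * 2 * hidden_size


TOTAL_PARAMS = 6_173_286_400  # implied by the ZeRO partition sizes in the training log

trainable = prefix_param_count(pre_seq_len=128)
print(f"trainable prefix params: {trainable:,} ({trainable / TOTAL_PARAMS:.2%} of the model)")
# prints: trainable prefix params: 29,360,128 (0.48% of the model)
```

The prefix changes what every token attends to in every layer while the 6B frozen weights stay as they are, which is why only the prefix needs gradients and optimizer state.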
Environment Setup
The base environment is as follows:
- OS: Ubuntu 18.04
- CPUs: Intel CPUs on a single node with 1 TB of RAM; 64 physical CPUs with 16 cores each
- GPUs: 8 × A800 80GB
- Python: 3.10 (upgrade OpenSSL to 1.1.1t first, then build and install Python from source)
- NVIDIA driver: 515.65.01 (choose the driver that matches your GPU model)
- CUDA Toolkit: 11.7
- NCCL: nccl_2.14.3-1+cuda11.7
- cuDNN: 8.8.1.3_cuda11
I won't go through installing the NVIDIA driver, CUDA, Python, and the other tools above one by one.
Create and activate the virtual environment chatglm-ptuningv2-venv-py310-cu117:
text
cd /home/guodong.li/virtual-venv
virtualenv -p /usr/bin/python3.10 chatglm-ptuningv2-venv-py310-cu117
source /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/activate
Install PyTorch offline: download the torch and torchvision wheels built for your CUDA version, then install them.
text
pip install torch-1.13.1+cu117-cp310-cp310-linux_x86_64.whl
pip install torchvision-0.14.1+cu117-cp310-cp310-linux_x86_64.whl
Install the other dependencies.
text
pip install -r requirements.txt
The contents of requirements.txt:
text
protobuf
transformers==4.28.0
cpm_kernels
gradio
mdtex2html
sentencepiece
rouge_chinese
nltk
jieba
datasets
deepspeed
accelerate
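Note:
The official documentation uses transformers 4.27.1. When ChatGLM loads the model, it calls get_class_in_module in transformers/dynamic_module_utils.py, which can fail with a file-not-found error when several processes load the model concurrently (as the 8 processes of the DeepSpeed launch below do). Upgrading transformers to 4.28.0 avoids this problem.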
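Since requirements.txt pins transformers==4.28.0, a launch script can fail fast if an older version (such as the 4.27.1 from the official docs) is still installed. Below is a minimal sketch using only the standard library's importlib.metadata; the function names are hypothetical, and the tiny parser compares only the leading numeric release segment, so it is not a full PEP 440 comparison (packaging.version is the more robust choice when available).

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Leading numeric release segment, e.g. '4.28.0.dev0' -> (4, 28, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # e.g. '0rc1': keep the 0, then stop
            break
    return tuple(parts)


def transformers_is_recent_enough(minimum: str = "4.28.0") -> bool:
    """True if transformers is installed and its release is at least `minimum`."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= release_tuple(minimum)
```

Calling transformers_is_recent_enough() at the top of a training entry point and aborting when it returns False turns the intermittent multi-process loading error into an immediate, explicit message.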
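Data Preparation
The following walks through fine-tuning using the ADGEN (advertisement generation) dataset as an example.
In ADGEN, the task is to generate a piece of advertising copy (summary) from an input attribute list (content). Each record looks like this: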
text
{
"content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
"summary": "这件衬衫的款式非常的宽松,利落的线条可以很好的隐藏身材上的小缺点,穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳,漂亮的绳结展现出了十足的个性,配合时尚的泡泡袖型,尽显女性甜美可爱的气息。"
}
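Each line of train.json and dev.json holds one such JSON object. The content field is a `*`-separated list of `key#value` attributes (keys can repeat, as 版型 does above), and the training log further below prints how a pair ends up as input_ids and label_ids. The sketch reproduces that layout: prompt tokens, then [gMASK] and <bos>, the target tokens, <eos>, padding with pad_token_id 3, and label -100 (ignored by the loss) for the prompt part and the padding. The special-token ids come from the printed ChatGLM-6B config; the token ids in the test are made up, and the truncation limits (max_source_length - 1 and max_target_length - 2) are my assumption about ChatGLM's ptuning/main.py rather than something this article shows.

```python
import json

# Special-token ids from the ChatGLM-6B config.json printed in the training log
GMASK_ID, BOS_ID, EOS_ID, PAD_ID = 130001, 130004, 130005, 3
IGNORE_INDEX = -100  # positions with this label do not contribute to the loss


def read_adgen(lines):
    """Yield (content, summary) pairs from JSON-lines text."""
    for line in lines:
        line = line.strip()
        if line:
            record = json.loads(line)
            yield record["content"], record["summary"]


def parse_attributes(content):
    """'类型#上衣*版型#宽松' -> [('类型', '上衣'), ('版型', '宽松')]; a list because keys repeat."""
    return [tuple(item.split("#", 1)) for item in content.split("*") if "#" in item]


def build_example(prompt_ids, target_ids, max_source_length=64, max_target_length=64):
    """Lay out one training example like the input_ids/label_ids shown in the log."""
    prompt_ids = prompt_ids[: max_source_length - 1]   # assumed limit: room for [gMASK]
    target_ids = target_ids[: max_target_length - 2]   # assumed limit: room for <bos>/<eos>
    input_ids = prompt_ids + [GMASK_ID, BOS_ID] + target_ids + [EOS_ID]
    context_length = input_ids.index(BOS_ID)           # prompt tokens + [gMASK]
    labels = [IGNORE_INDEX] * context_length + input_ids[context_length:]
    pad_len = max_source_length + max_target_length - len(input_ids)
    input_ids = input_ids + [PAD_ID] * pad_len
    labels = labels + [IGNORE_INDEX] * pad_len
    return input_ids, labels
```

In the example printed in the log, the prompt has 21 tokens, so context_length is 22 and label_ids starts with 22 copies of -100 before the <bos> id 130004.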
Download the ADGEN dataset from the official website or a mirror, and extract it into the AdvertiseGen directory:
text
tar -zxvf AdvertiseGen.tar.gz
Check the dataset size:
text
> wc -l AdvertiseGen/*
  1070 AdvertiseGen/dev.json
114599 AdvertiseGen/train.json
115669 total
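Since each line is one record, these counts also fix the length of the training run below. The sketch estimates how the Hugging Face Trainer (transformers 4.28, as far as I can tell) turns them into optimizer steps with the settings from ds_train_finetune.sh (8 GPUs, per_device_train_batch_size 24, gradient_accumulation_steps 2, num_train_epochs 2). It assumes every record survives preprocessing and that the distributed sampler pads the data so each rank gets the same number of samples. The helper name is hypothetical.

```python
import math


def trainer_total_steps(num_samples, world_size, per_device_batch, grad_accum, epochs):
    """Estimate the optimizer-step total shown in the Trainer's progress bar."""
    samples_per_rank = math.ceil(num_samples / world_size)             # sampler pads shards to equal size
    batches_per_rank = math.ceil(samples_per_rank / per_device_batch)  # last partial batch is kept
    steps_per_epoch = max(batches_per_rank // grad_accum, 1)           # one update per grad_accum batches
    return math.ceil(epochs * steps_per_epoch)


total = trainer_total_steps(114_599, world_size=8, per_device_batch=24, grad_accum=2, epochs=2)
print(total)  # 596: 14325 samples/rank -> 597 micro-batches -> 298 updates/epoch -> x2

# Linear decay with no warmup: lr = 1e-4 * (total - k) / total after k scheduler steps.
# The last log line shows step=10 with skipped=7; assuming DeepSpeed does not advance
# the LR scheduler on a skipped (overflow) step, k = 10 - 7 = 3.
lr_after_3 = 1e-4 * (total - 3) / total
print(f"{lr_after_3:.6e}")  # 9.949664e-05
```

Both numbers agree with the log: the progress bar counts to 596, and the learning rate printed at step=10 is 9.949664429530202e-05.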
Full-Parameter Fine-Tuning of ChatGLM-6B with DeepSpeed DP + ZeRO
We begin with full-parameter fine-tuning of ChatGLM-6B using DeepSpeed.
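Why not plain data parallelism? A rough, weights-and-optimizer-only estimate helps. With fp16 mixed precision and Adam, each parameter needs 2 bytes for the fp16 weight, 4 bytes for the fp32 master weight, and 8 bytes for Adam's two fp32 moments. ZeRO stage 2 (the stage this run uses, per the log) keeps a full fp16 copy of the weights on every GPU but shards the fp32 master weights and optimizer states (as well as gradients) across the 8 GPUs. The sketch plugs in the parameter count implied by the ZeRO partition sizes printed in the log and works in GiB (1024³ bytes), which, as far as I can tell, is what DeepSpeed's memory report labels "GB". Gradients, activations, and temporary buffers are deliberately left out; the function names are hypothetical.

```python
GIB = 1024 ** 3

# 8 ranks x (771,473,408 + 187,392) elements per rank, from the "partition count" log lines
NUM_PARAMS = 8 * (771_473_408 + 187_392)


def zero2_model_state_gib(num_params, world_size):
    """Per-GPU GiB for weights and Adam states under fp16 + ZeRO stage 2.

    Returns (before_optimizer_states, after_optimizer_states).
    Gradients, activations and temporary buffers are not included.
    """
    fp16_weights = 2 * num_params         # full fp16 copy on every GPU
    shard = num_params / world_size       # this GPU's slice of the fp32 state
    fp32_master = 4 * shard               # fp32 master weights (sharded)
    adam_moments = 2 * 4 * shard          # Adam exp_avg and exp_avg_sq (sharded)
    before = (fp16_weights + fp32_master) / GIB
    return before, before + adam_moments / GIB


def no_zero_model_state_gib(num_params):
    """Same states with nothing sharded: 2 + 4 + 8 bytes per parameter."""
    return 14 * num_params / GIB


before, after = zero2_model_state_gib(NUM_PARAMS, world_size=8)
print(f"{before:.2f} GiB before optimizer states, {after:.2f} GiB after")  # 14.37 / 20.12
print(f"{no_zero_model_state_gib(NUM_PARAMS):.2f} GiB without ZeRO")        # 80.49
```

The two ZeRO-2 figures line up with the MA values DeepSpeed reports before and after initializing the optimizer states (14.37 GB and 20.12 GB), while without sharding the same states alone would already exceed an 80 GB A800 before any gradients or activations are allocated.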
Clone the source code and check out the corresponding commit so that the code matches this article:
text
git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B
git checkout 8633db1
cd ptuning
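The training script below passes --deepspeed deepspeed.json as a relative path, so the file is read from this ptuning directory. The article does not list the file, but DeepSpeed echoes the user config at startup (the print_user_config json block in the run log). The sketch writes an equivalent file from Python with those echoed values hard-coded; treat it as a reconstruction from the log, not the repository's exact file. If you change --per_device_train_batch_size, keep train_micro_batch_size_per_gpu in sync (the Hugging Face DeepSpeed integration also accepts "auto" for this field). The helper name is hypothetical.

```python
import json
from pathlib import Path

# Values copied from the print_user_config block in the run log.
DS_CONFIG = {
    "train_micro_batch_size_per_gpu": 24,    # must equal --per_device_train_batch_size
    "zero_allow_untested_optimizer": True,   # the log warns the Trainer's AdamW is untested with ZeRO
    "fp16": {
        "enabled": True,
        "loss_scale": 0,                     # 0 = dynamic loss scaling
        "initial_scale_power": 16,           # initial scale 2**16 = 65536
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1,
    },
    "zero_optimization": {
        "stage": 2,                          # shard optimizer states and gradients
        "allgather_partitions": True,
        "allgather_bucket_size": 500_000_000,
        "overlap_comm": False,
        "reduce_scatter": True,
        "reduce_bucket_size": 500_000_000,
        "contiguous_gradients": True,
    },
}


def write_ds_config(path="deepspeed.json"):
    """Write DS_CONFIG as pretty-printed JSON and return the path."""
    Path(path).write_text(json.dumps(DS_CONFIG, indent=2), encoding="utf-8")
    return path
```

If the repository already ships a deepspeed.json, comparing it with this dictionary is a quick way to confirm which values the run actually used before overwriting anything.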
Then modify the ds_train_finetune.sh script to run full-parameter fine-tuning with DeepSpeed:
text
LR=1e-4
MASTER_PORT=$(shuf -n 1 -i 10000-65535)
deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 24 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --num_train_epochs 2 \
    --logging_steps 10 \
    --save_steps 300 \
    --learning_rate $LR \
    --fp16
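Because the script enables --fp16, DeepSpeed applies dynamic loss scaling with the fp16 settings echoed in the log (loss_scale 0 means dynamic, initial_scale_power 16 gives an initial scale of 2^16 = 65536, hysteresis 2, loss_scale_window 1000, min_loss_scale 1). The OVERFLOW! lines at the start of the run are this mechanism settling, not a crash: on an overflow the step is skipped; the first overflow only uses up the hysteresis, later ones halve the scale, and after loss_scale_window clean steps the scale is doubled again. The class below is a simplified re-implementation for illustration (its name and messages are my own, not DeepSpeed's API); seven consecutive overflows take the scale from 65536 to 1024, which mirrors the log reporting skipped=7 at step=10.

```python
class SimpleDynamicLossScaler:
    """Simplified fp16 dynamic loss scaling with hysteresis (illustration only)."""

    def __init__(self, initial_scale_power=16, scale_window=1000, hysteresis=2, min_scale=1.0):
        self.scale = 2.0 ** initial_scale_power
        self.scale_window = scale_window
        self.max_hysteresis = hysteresis
        self.hysteresis = hysteresis
        self.min_scale = min_scale
        self.clean_steps = 0  # consecutive steps without overflow
        self.skipped = 0      # optimizer steps skipped because of overflow

    def update(self, overflow):
        """Record one step's outcome and return a short log message."""
        if overflow:
            self.skipped += 1
            self.clean_steps = 0
            if self.hysteresis > 1:
                # tolerate this overflow: skip the step but keep the scale
                self.hysteresis -= 1
                return f"overflow at scale {int(self.scale)}, hysteresis -> {self.hysteresis}"
            old = self.scale
            self.scale = max(self.scale / 2, self.min_scale)
            return f"overflow at scale {int(old)}, reducing to {int(self.scale)}"
        self.clean_steps += 1
        if self.clean_steps % self.scale_window == 0:
            # a long run of clean steps: try a larger scale again
            self.scale *= 2
            self.hysteresis = self.max_hysteresis
        return f"step applied at scale {int(self.scale)}"


scaler = SimpleDynamicLossScaler()
for _ in range(7):
    print(scaler.update(overflow=True))
```

A handful of overflows while the scale finds a safe value is expected early in fp16 training; overflows that keep recurring long after the start are the ones worth investigating.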
Run the script; the output looks like this:
text
> sh ds_train_finetune.sh
[2023-04-14 18:01:33,206] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-14 18:01:33,417] [INFO] [runner.py:540:main] cmd = /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=44148 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --train_file /data/nfs/llm/data/AdvertiseGen/train.json --test_file /data/nfs/llm/data/AdvertiseGen/dev.json --prompt_column content --response_column summary --overwrite_cache --model_name_or_path /data/nfs/llm/model/chatglm-6b --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4 --overwrite_output_dir --max_source_length 64 --max_target_length 64 --per_device_train_batch_size 24 --per_device_eval_batch_size 1 --gradient_accumulation_steps 2 --predict_with_generate --num_train_epochs 2 --logging_steps 10 --save_steps 300 --learning_rate 1e-4 --fp16
[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_SOCKET_IFNAME=bond0
[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=1
[2023-04-14 18:01:35,945] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-04-14 18:01:35,945] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-04-14 18:01:35,945] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-04-14 18:01:35,945] [INFO] [launch.py:247:main] dist_world_size=8
[2023-04-14 18:01:35,945] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-04-14 18:01:40,133] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
04/14/2023 18:01:41 - WARNING - main - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: True
...
04/14/2023 18:01:41 - WARNING - main - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True
04/14/2023 18:01:41 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=deepspeed.json,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=2,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/runs/Apr14_18-01-40_ai-app-2-46,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=2.0,
optim=adamw_hf,
optim_args=None,
output_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=24,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,
save_on_each_node=False,
save_safetensors=False,
save_steps=300,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 184.03it/s]
04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,664 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
0%| | 0/2 [00:00<?, ?it/s][WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,675 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 240.57it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 197.48it/s]
[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,678 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,678 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,679 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,685 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
[INFO|configuration_utils.py:720] 2023-04-14 18:03:01,687 >> Model config ChatGLMConfig {
"_name_or_path": "/data/nfs/llm/model/chatglm-6b",
"architectures": [
"ChatGLMModel"
],
"auto_map": {
"AutoConfig": "configuration_chatglm.ChatGLMConfig",
"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
},
"bos_token_id": 130004,
"eos_token_id": 130005,
"gmask_token_id": 130001,
"hidden_size": 4096,
"inner_hidden_size": 16384,
"layernorm_epsilon": 1e-05,
"mask_token_id": 130000,
"max_sequence_length": 2048,
"model_type": "chatglm",
"num_attention_heads": 32,
"num_layers": 28,
"pad_token_id": 3,
"position_encoding_2d": true,
"pre_seq_len": null,
"prefix_projection": false,
"quantization_bit": 0,
"torch_dtype": "float16",
"transformers_version": "4.28.0",
"use_cache": true,
"vocab_size": 130528
}
0%| | 0/2 [00:00<?, ?it/s][WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,688 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,689 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file tokenizer_config.json
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 285.37it/s]
[INFO|modeling_utils.py:2531] 2023-04-14 18:03:01,992 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-14 18:03:01,993 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s][WARNING|auto_factory.py:456] 2023-04-14 18:03:02,077 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[WARNING|auto_factory.py:456] 2023-04-14 18:03:02,109 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:13<00:00, 1.70s/it]
[INFO|modeling_utils.py:3190] 2023-04-14 18:03:15,622 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:3198] 2023-04-14 18:03:15,622 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
Loading checkpoint shards: 25%|████████████████████████████████████ | 2/8 [00:13<00:40, 6.73s/it][INFO|modeling_utils.py:2839] 2023-04-14 18:03:15,703 >> Generation config file not found, using a generation config created from the model config.
...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:34<00:00, 4.32s/it]
input_ids 5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3
inputs 类型#裤版型#宽松 风格#性感图案#线条 裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
...
label_ids -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100
labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
2023-04-14 18:06:30,469 INFO logging.py:96:log_dist Rank 0 DeepSpeed Flops Profiler Enabled: False
2023-04-14 18:06:30,470 INFO logging.py:96:log_dist Rank 0 Removing param_group that has no 'params' in the client Optimizer
2023-04-14 18:06:30,470 INFO logging.py:96:log_dist Rank 0 Using client Optimizer as basic optimizer
2023-04-14 18:06:30,483 INFO logging.py:96:log_dist Rank 0 DeepSpeed Basic Optimizer = AdamW
2023-04-14 18:06:30,484 INFO utils.py:51:is_zero_supported_optimizer Checking ZeRO support for optimizer=AdamW type=<class 'transformers.optimization.AdamW'>
2023-04-14 18:06:30,484 WARNING engine.py:1118:_do_optimizer_sanity_check **** You are using ZeRO with an untested optimizer, proceed with caution *****
2023-04-14 18:06:30,484 INFO logging.py:96:log_dist Rank 0 Creating torch.float16 ZeRO stage 2 optimizer
2023-04-14 18:06:30,484 INFO stage_1_and_2.py:133:__init__ Reduce bucket size 500000000
2023-04-14 18:06:30,484 INFO stage_1_and_2.py:134:__init__ Allgather bucket size 500000000
2023-04-14 18:06:30,484 INFO stage_1_and_2.py:135:__init__ CPU Offload: False
2023-04-14 18:06:30,484 INFO stage_1_and_2.py:136:__init__ Round robin gradient partitioning: False
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
ninja: no work to do.
Loading extension module utils...
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Time to load utils op: 0.10171675682067871 seconds
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.18768668174743652 seconds
...
Loading extension module utils...
Time to load utils op: 0.3021426200866699 seconds
Rank: 2 partition count 8, 8 and sizes(771473408, False), (187392, False)
...
Rank: 4 partition count 8, 8 and sizes(771473408, False), (187392, False)
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Time to load utils op: 0.0005774497985839844 seconds
...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0011382102966308594 seconds
2023-04-14 18:06:48,321 INFO utils.py:785:see_memory_usage Before initializing optimizer states
2023-04-14 18:06:48,321 INFO utils.py:786:see_memory_usage MA 14.37 GB Max_MA 14.37 GB CA 14.39 GB Max_CA 14 GB
2023-04-14 18:06:48,322 INFO utils.py:793:see_memory_usage CPU Virtual Memory: used = 50.56 GB, percent = 5.0%
04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
...
04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
2023-04-14 18:06:48,431 INFO utils.py:785:see_memory_usage After initializing optimizer states
2023-04-14 18:06:48,434 INFO utils.py:786:see_memory_usage MA 20.12 GB Max_MA 25.87 GB CA 25.9 GB Max_CA 26 GB
2023-04-14 18:06:48,435 INFO utils.py:793:see_memory_usage CPU Virtual Memory: used = 50.84 GB, percent = 5.0%
2023-04-14 18:06:48,435 INFO stage_1_and_2.py:489:__init__ optimizer state initialized
2023-04-14 18:06:48,512 INFO utils.py:785:see_memory_usage After initializing ZeRO optimizer
2023-04-14 18:06:48,513 INFO utils.py:786:see_memory_usage MA 20.12 GB Max_MA 20.12 GB CA 25.9 GB Max_CA 26 GB
2023-04-14 18:06:48,513 INFO utils.py:793:see_memory_usage CPU Virtual Memory: used = 51.29 GB, percent = 5.1%
2023-04-14 18:06:48,515 INFO logging.py:96:log_dist Rank 0 DeepSpeed Final Optimizer = AdamW
2023-04-14 18:06:48,515 INFO logging.py:96:log_dist Rank 0 DeepSpeed using client LR scheduler
2023-04-14 18:06:48,515 INFO logging.py:96:log_dist Rank 0 DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f172c367a30>
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0001, 0.0001], mom=[(0.9, 0.999), (0.9, 0.999)]
2023-04-14 18:06:48,515 INFO config.py:953:print DeepSpeedEngine configuration:
2023-04-14 18:06:48,516 INFO config.py:957:print activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
2023-04-14 18:06:48,516 INFO config.py:957:print aio_config ... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
2023-04-14 18:06:48,516 INFO config.py:957:print amp_enabled ... False
2023-04-14 18:06:48,516 INFO config.py:957:print amp_params ... False
2023-04-14 18:06:48,516 INFO config.py:957:print autotuning_config ... {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
2023-04-14 18:06:48,516 INFO config.py:957:print bfloat16_enabled ... False
2023-04-14 18:06:48,516 INFO config.py:957:print checkpoint_parallel_write_pipeline False
2023-04-14 18:06:48,516 INFO config.py:957:print checkpoint_tag_validation_enabled True
2023-04-14 18:06:48,516 INFO config.py:957:print checkpoint_tag_validation_fail False
2023-04-14 18:06:48,516 INFO config.py:957:print comms_config ... <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f172843d6f0>
2023-04-14 18:06:48,516 INFO config.py:957:print communication_data_type ... None
2023-04-14 18:06:48,516 INFO config.py:957:print compression_config ... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
2023-04-14 18:06:48,516 INFO config.py:957:print curriculum_enabled_legacy ... False
2023-04-14 18:06:48,516 INFO config.py:957:print curriculum_params_legacy ... False
2023-04-14 18:06:48,516 INFO config.py:957:print data_efficiency_config ... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
2023-04-14 18:06:48,516 INFO config.py:957:print data_efficiency_enabled ... False
2023-04-14 18:06:48,516 INFO config.py:957:print dataloader_drop_last ... False
2023-04-14 18:06:48,516 INFO config.py:957:print disable_allgather ... False
2023-04-14 18:06:48,516 INFO config.py:957:print dump_state ... False
2023-04-14 18:06:48,516 INFO config.py:957:print dynamic_loss_scale_args ... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
2023-04-14 18:06:48,516 INFO config.py:957:print eigenvalue_enabled ... False
2023-04-14 18:06:48,516 INFO config.py:957:print eigenvalue_gas_boundary_resolution 1
2023-04-14 18:06:48,516 INFO config.py:957:print eigenvalue_layer_name ... bert.encoder.layer
2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_layer_num ... 0
2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_max_iter ... 100
2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_stability ... 1e-06
2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_tol ... 0.01
2023-04-14 18:06:48,517 INFO config.py:957:print eigenvalue_verbose ... False
2023-04-14 18:06:48,517 INFO config.py:957:print elasticity_enabled ... False
2023-04-14 18:06:48,517 INFO config.py:957:print flops_profiler_config ... {
"enabled": false,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
2023-04-14 18:06:48,517 INFO config.py:957:print fp16_auto_cast ... False
2023-04-14 18:06:48,517 INFO config.py:957:print fp16_enabled ... True
2023-04-14 18:06:48,517 INFO config.py:957:print fp16_master_weights_and_gradients False
2023-04-14 18:06:48,517 INFO config.py:957:print global_rank ... 0
2023-04-14 18:06:48,517 INFO config.py:957:print grad_accum_dtype ... None
2023-04-14 18:06:48,517 INFO config.py:957:print gradient_accumulation_steps ... 1
2023-04-14 18:06:48,517 INFO config.py:957:print gradient_clipping ... 0.0
2023-04-14 18:06:48,517 INFO config.py:957:print gradient_predivide_factor ... 1.0
2023-04-14 18:06:48,517 INFO config.py:957:print hybrid_engine ... enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
2023-04-14 18:06:48,517 INFO config.py:957:print initial_dynamic_scale ... 65536
2023-04-14 18:06:48,517 INFO config.py:957:print load_universal_checkpoint ... False
2023-04-14 18:06:48,517 INFO config.py:957:print loss_scale ... 0
2023-04-14 18:06:48,517 INFO config.py:957:print memory_breakdown ... False
2023-04-14 18:06:48,517 INFO config.py:957:print monitor_config ... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
2023-04-14 18:06:48,517 INFO config.py:957:print nebula_config ... {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
2023-04-14 18:06:48,517 INFO config.py:957:print optimizer_legacy_fusion ... False
2023-04-14 18:06:48,517 INFO config.py:957:print optimizer_name ... None
2023-04-14 18:06:48,517 INFO config.py:957:print optimizer_params ... None
2023-04-14 18:06:48,517 INFO config.py:957:print pipeline ... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
2023-04-14 18:06:48,517 INFO config.py:957:print pld_enabled ... False
2023-04-14 18:06:48,517 INFO config.py:957:print pld_params ... False
2023-04-14 18:06:48,517 INFO config.py:957:print prescale_gradients ... False
2023-04-14 18:06:48,517 INFO config.py:957:print scheduler_name ... None
2023-04-14 18:06:48,517 INFO config.py:957:print scheduler_params ... None
2023-04-14 18:06:48,518 INFO config.py:957:print sparse_attention ... None
2023-04-14 18:06:48,518 INFO config.py:957:print sparse_gradients_enabled ... False
2023-04-14 18:06:48,518 INFO config.py:957:print steps_per_print ... 10
2023-04-14 18:06:48,518 INFO config.py:957:print train_batch_size ... 192
2023-04-14 18:06:48,518 INFO config.py:957:print train_micro_batch_size_per_gpu 24
2023-04-14 18:06:48,518 INFO config.py:957:print use_node_local_storage ... False
2023-04-14 18:06:48,518 INFO config.py:957:print wall_clock_breakdown ... False
2023-04-14 18:06:48,518 INFO config.py:957:print world_size ... 8
2023-04-14 18:06:48,518 INFO config.py:957:print zero_allow_untested_optimizer True
2023-04-14 18:06:48,518 INFO config.py:957:print zero_config ... stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=True
2023-04-14 18:06:48,518 INFO config.py:957:print zero_enabled ... True
2023-04-14 18:06:48,518 INFO config.py:957:print zero_force_ds_cpu_optimizer ... True
2023-04-14 18:06:48,518 INFO config.py:957:print zero_optimization_stage ... 2
2023-04-14 18:06:48,518 INFO config.py:943:print_user_config json = {
"train_micro_batch_size_per_gpu": 24,
"zero_allow_untested_optimizer": true,
"fp16": {
"enabled": true,
"loss_scale": 0,
"initial_scale_power": 16,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5.000000e+08,
"overlap_comm": false,
"reduce_scatter": true,
"reduce_bucket_size": 5.000000e+08,
"contiguous_gradients": true
}
}
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00031948089599609375 seconds
0%| | 0/596 00:00<?, ?it/s
04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
2023-04-14 18:06:53,718 INFO loss_scaler.py:188:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
2023-04-14 18:06:55,883 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
0%|▎ | 1/596 00:07<1:13:02, 7.37s/it
2023-04-14 18:06:57,948 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
2023-04-14 18:07:00,007 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
0%|▌ | 2/596 00:11<54:01, 5.46s/it
2023-04-14 18:07:06,332 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
1%|▊ | 3/596 00:17<57:51, 5.85s/it
2023-04-14 18:07:08,383 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
1%|█▏ | 4/596 00:24<59:20, 6.01s/it
2023-04-14 18:07:18,876 INFO loss_scaler.py:181:update_scale deepspeed OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
2023-04-14 18:07:18,876 INFO logging.py:96:log_dist Rank 0 step=10, skipped=7, lr=9.949664429530202e-05, 9.949664429530202e-05, mom=(0.9, 0.999), (0.9, 0.999)
2023-04-14 18:07:18,877 INFO timer.py:199:stop epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=66.98818896434254, CurrSamplesPerSec=93.79590019766518, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
1%|█▍ | 5/596 00:30<1:00:11, 6.11s/it
...
2023-04-14 18:47:55,207 INFO logging.py:96:log_dist Rank 0 step=590, skipped=12, lr=3.02013422818792e-06, 3.02013422818792e-06, mom=(0.9, 0.999), (0.9, 0.999)
2023-04-14 18:47:57,392 INFO timer.py:199:stop epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=45.931193758598916, CurrSamplesPerSec=45.63412532914195, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
50%|███████████████████████████████████████████████████████████████████████████████████▊ | 299/596 41:42<41:37, 8.41s/it
2023-04-14 18:48:37,273 INFO logging.py:96:log_dist Rank 0 step=600, skipped=12, lr=1.3422818791946309e-06, 1.3422818791946309e-06, mom=(0.9, 0.999), (0.9, 0.999)
2023-04-14 18:48:39,453 INFO timer.py:199:stop epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=45.92850276413307, CurrSamplesPerSec=45.66031263997641, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3487, 'learning_rate': 1.3422818791946309e-06, 'epoch': 1.01}
50%|████████████████████████████████████████████████████████████████████████████████████ | 300/596 41:50<41:30, 8.41s/it
Saving the whole model
[INFO|configuration_utils.py:457] 2023-04-14 18:48:39,458 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/config.json
[INFO|configuration_utils.py:362] 2023-04-14 18:48:39,459 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/generation_config.json
[INFO|modeling_utils.py:1855] 2023-04-14 18:49:03,951 >> The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/pytorch_model.bin.index.json.
[INFO|tokenization_utils_base.py:2171] 2023-04-14 18:49:03,953 >> tokenizer config file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2178] 2023-04-14 18:49:03,953 >> Special tokens file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/special_tokens_map.json
2023-04-14 18:49:03,983 INFO logging.py:96:log_dist Rank 0 Torch Checkpoint global_step600 is about to be saved!
2023-04-14 18:49:03,988 INFO logging.py:96:log_dist Rank 0 Saving model checkpoint: /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt
2023-04-14 18:49:03,988 INFO torch_checkpoint_engine.py:21:save Torch Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt...
2023-04-14 18:49:15,934 INFO torch_checkpoint_engine.py:23:save Torch Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt.
2023-04-14 18:49:15,937 INFO torch_checkpoint_engine.py:21:save Torch Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt...
2023-04-14 18:49:28,049 INFO torch_checkpoint_engine.py:23:save Torch Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt.
2023-04-14 18:49:28,049 INFO engine.py:3125:_save_zero_checkpoint zero checkpoint saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt
2023-04-14 18:49:28,049 INFO torch_checkpoint_engine.py:33:commit Torch Checkpoint global_step600 is ready now!
51%|████████████████████████████████████████████████████████████████████████████████████▏ | 304/596 43:14<1:05:51, 13.53s/it
2023-04-14 18:50:09,137 INFO logging.py:96:log_dist Rank 0 step=610, skipped=12, lr=0.0, 0.0, mom=(0.9, 0.999), (0.9, 0.999)
2023-04-14 18:50:11,316 INFO timer.py:199:stop epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=45.926876625767875, CurrSamplesPerSec=45.66709917655267, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
52%|██████████████████████████████████████████████████████████████████████████████████████▌ | 309/596 43:56<44:16, 9.26s/it
2023-04-14 18:50:51,114 INFO logging.py:96:log_dist Rank 0 step=620, skipped=12, lr=0.0, 0.0, mom=(0.9, 0.999), (0.9, 0.999)
2023-04-14 18:50:53,302 INFO timer.py:199:stop epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=45.92462533252217, CurrSamplesPerSec=45.55552426651123, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3202, 'learning_rate': 0.0, 'epoch': 1.04}
...
99%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 589/596 1:23:07\<00:58, 8.41s/it2023-04-14 19:30:02,654 INFO logging.py:96:log_dist Rank 0 step=1180, skipped=12, lr=0.0, 0.0, mom=(0.9, 0.999), (0.9, 0.999)
2023-04-14 19:30:04,820 INFO timer.py:199:stop epoch=0/micro_step=1180/global_step=1180, RunningAvgSamplesPerSec=45.85904109663022, CurrSamplesPerSec=45.73521852038509, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3537, 'learning_rate': 0.0, 'epoch': 1.98}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍| 594/596 1:23:49\<00:16, 8.41s/it2023-04-14 19:30:44,847 INFO logging.py:96:log_dist Rank 0 step=1190, skipped=12, lr=0.0, 0.0, mom=(0.9, 0.999), (0.9, 0.999)
2023-04-14 19:30:47,022 INFO timer.py:199:stop epoch=0/micro_step=1190/global_step=1190, RunningAvgSamplesPerSec=45.856487437478386, CurrSamplesPerSec=45.579988341622055, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'train_runtime': 5046.8863, 'train_samples_per_second': 45.414, 'train_steps_per_second': 0.118, 'train_loss': 13.905431555421561, 'epoch': 2.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 596/596 1:24:06\<00:00, 8.47s/it
***** train metrics *****
epoch = 2.0
train_loss = 13.9054
train_runtime = 1:24:06.88
train_samples = 114599
train_samples_per_second = 45.414
train_steps_per_second = 0.118
2023-04-14 19:30:58,560 INFO launch.py:460:main Process 35198 exits successfully.
2023-04-14 19:30:58,561 INFO launch.py:460:main Process 35192 exits successfully.
2023-04-14 19:30:58,561 INFO launch.py:460:main Process 35193 exits successfully.
2023-04-14 19:30:58,561 INFO launch.py:460:main Process 35195 exits successfully.
2023-04-14 19:30:58,561 INFO launch.py:460:main Process 35191 exits successfully.
2023-04-14 19:30:59,562 INFO launch.py:460:main Process 35194 exits successfully.
2023-04-14 19:30:59,563 INFO launch.py:460:main Process 35197 exits successfully.
2023-04-14 19:31:00,564 INFO launch.py:460:main Process 35196 exits successfully.
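The OVERFLOW messages at the start of this log come from fp16 dynamic loss scaling. Below is a simplified sketch of that behavior. It uses the values from the fp16 section of the config printed above: initial_scale_power 16, hysteresis 2, loss_scale_window 1000, min_loss_scale 1. It is not DeepSpeed's actual loss_scaler.py code, and the class name `SimpleLossScaler` is hypothetical.

Under this model, an overflowing step is always skipped. The first overflow only uses up one unit of hysteresis. After that, each overflow halves the scale. After a full window of clean steps, the scale doubles again.

```python
class SimpleLossScaler:
    """Simplified fp16 dynamic loss scaler (a hypothetical sketch, not DeepSpeed's code)."""

    def __init__(self, initial_scale_power: int = 16, hysteresis: int = 2,
                 scale_window: int = 1000, min_scale: float = 1.0) -> None:
        self.scale = 2.0 ** initial_scale_power  # 2**16 = 65536
        self.hysteresis = hysteresis             # overflows tolerated before the first cut
        self.cur_hysteresis = hysteresis
        self.scale_window = scale_window         # clean steps before the scale doubles
        self.min_scale = min_scale
        self.clean_steps = 0
        self.skipped = 0

    def update(self, overflow: bool) -> None:
        """Update the scale after one step; an overflowing step is skipped."""
        if overflow:
            self.skipped += 1
            self.clean_steps = 0
            if self.cur_hysteresis > 1:
                # Spend one unit of hysteresis instead of shrinking the scale.
                self.cur_hysteresis -= 1
            else:
                self.scale = max(self.scale / 2, self.min_scale)
        else:
            self.clean_steps += 1
            if self.clean_steps % self.scale_window == 0:
                # A full window without overflow: try a larger scale again.
                self.scale *= 2
                self.cur_hysteresis = self.hysteresis


# Replay the seven overflows at the start of the log above.
scaler = SimpleLossScaler()
history = []
for _ in range(7):
    scaler.update(overflow=True)
    history.append(int(scaler.scale))
print(history)         # [65536, 32768, 16384, 8192, 4096, 2048, 1024]
print(scaler.skipped)  # 7, matching skipped=7 at step=10
```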
GPU memory usage:
text
Fri Apr 14 18:27:45 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A800 80G... Off | 00000000:34:00.0 Off | 0 |
| N/A 59C P0 92W / 300W | 36539MiB / 81920MiB | 100% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A800 80G... Off | 00000000:35:00.0 Off | 0 |
| N/A 61C P0 96W / 300W | 38395MiB / 81920MiB | 100% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A800 80G... Off | 00000000:36:00.0 Off | 0 |
| N/A 63C P0 93W / 300W | 38395MiB / 81920MiB | 100% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A800 80G... Off | 00000000:37:00.0 Off | 0 |
| N/A 65C P0 102W / 300W | 38347MiB / 81920MiB | 100% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA A800 80G... Off | 00000000:9B:00.0 Off | 0 |
| N/A 64C P0 108W / 300W | 38347MiB / 81920MiB | 100% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA A800 80G... Off | 00000000:9C:00.0 Off | 0 |
| N/A 64C P0 105W / 300W | 38395MiB / 81920MiB | 100% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA A800 80G... Off | 00000000:9D:00.0 Off | 0 |
| N/A 58C P0 97W / 300W | 36433MiB / 81920MiB | 100% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA A800 80G... Off | 00000000:9E:00.0 Off | 0 |
| N/A 59C P0 92W / 300W | 38347MiB / 81920MiB | 100% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 35191 C ...nv-py310-cu117/bin/python 36537MiB |
| 1 N/A N/A 35192 C ...nv-py310-cu117/bin/python 38393MiB |
| 2 N/A N/A 35193 C ...nv-py310-cu117/bin/python 38393MiB |
| 3 N/A N/A 35194 C ...nv-py310-cu117/bin/python 38345MiB |
| 4 N/A N/A 35195 C ...nv-py310-cu117/bin/python 38345MiB |
| 5 N/A N/A 35196 C ...nv-py310-cu117/bin/python 38393MiB |
| 6 N/A N/A 35197 C ...nv-py310-cu117/bin/python 36431MiB |
| 7 N/A N/A 35198 C ...nv-py310-cu117/bin/python 38345MiB |
+-----------------------------------------------------------------------------+
Output files:
text
tree /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
├── all_results.json
├── checkpoint-300
│ ├── config.json
│ ├── configuration_chatglm.py
│ ├── generation_config.json
│ ├── global_step600
│ │ ├── mp_rank_00_model_states.pt
│ │ ├── zero_pp_rank_0_mp_rank_00_optim_states.pt
│ │ ├── zero_pp_rank_1_mp_rank_00_optim_states.pt
│ │ ├── zero_pp_rank_2_mp_rank_00_optim_states.pt
│ │ ├── zero_pp_rank_3_mp_rank_00_optim_states.pt
│ │ ├── zero_pp_rank_4_mp_rank_00_optim_states.pt
│ │ ├── zero_pp_rank_5_mp_rank_00_optim_states.pt
│ │ ├── zero_pp_rank_6_mp_rank_00_optim_states.pt
│ │ └── zero_pp_rank_7_mp_rank_00_optim_states.pt
│ ├── ice_text.model
│ ├── latest
│ ├── modeling_chatglm.py
│ ├── pytorch_model-00001-of-00002.bin
│ ├── pytorch_model-00002-of-00002.bin
│ ├── pytorch_model.bin.index.json
│ ├── quantization.py
│ ├── rng_state_0.pth
│ ├── rng_state_1.pth
│ ├── rng_state_2.pth
│ ├── rng_state_3.pth
│ ├── rng_state_4.pth
│ ├── rng_state_5.pth
│ ├── rng_state_6.pth
│ ├── rng_state_7.pth
│ ├── special_tokens_map.json
│ ├── tokenization_chatglm.py
│ ├── tokenizer_config.json
│ ├── trainer_state.json
│ ├── training_args.bin
│ └── zero_to_fp32.py
├── trainer_state.json
└── train_results.json
2 directories, 36 files
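When training finished, no final model weights were written to the output directory. Only the checkpoint saved during training (checkpoint-300) is there. To save the final model, add a call to trainer.save_model() in the code.
Full fine-tuning with DeepSpeed needs a lot of GPU memory and trains slowly. The next section therefore tries P-Tuning v2, the parameter-efficient fine-tuning method provided in the official repository.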
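To see why full fine-tuning is so memory-hungry, here is a rough estimate of the model-state memory per GPU under ZeRO stage 2 with fp16 mixed-precision Adam. It follows the per-parameter accounting from the ZeRO paper:

- 2 bytes for fp16 weights, which stay replicated on every GPU.
- 2 bytes for fp16 gradients, which ZeRO-2 partitions across the data-parallel GPUs.
- 12 bytes for Adam optimizer states (fp32 master weights, momentum and variance), which ZeRO-2 also partitions.

The parameter count of about 6.2 billion is ChatGLM-6B's commonly cited size and is an assumption here. The sketch ignores activations, temporary buffers and allocator caching.

```python
def zero2_model_state_gib(num_params: float, num_gpus: int) -> float:
    """Approximate per-GPU model-state memory in GiB for ZeRO stage 2 with fp16 Adam.

    Ignores activations, temporary buffers, and allocator caching.
    """
    weights_fp16 = 2 * num_params               # replicated on every GPU
    grads_fp16 = 2 * num_params / num_gpus      # partitioned by ZeRO-2
    adam_states = 12 * num_params / num_gpus    # fp32 master weights + momentum + variance, partitioned
    return (weights_fp16 + grads_fp16 + adam_states) / 1024**3


# Assumption: about 6.2B parameters (ChatGLM-6B's commonly cited size).
print(f"{zero2_model_state_gib(6.2e9, 8):.1f} GiB per GPU with 8 GPUs")  # 21.7
print(f"{zero2_model_state_gib(6.2e9, 1):.1f} GiB on a single GPU")      # 92.4
```

The 8-GPU figure is in the same ballpark as the MemAllocated=21.59GB that DeepSpeed reports in the training log. MaxMemAllocated and the nvidia-smi readings are higher because they also include activations and cached blocks.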
Parameter-Efficient Fine-Tuning of ChatGLM-6B with P-Tuning v2
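Fine-tuning ChatGLM-6B with P-Tuning v2 reduces the number of parameters that have to be trained to about 0.1% of the original. Combined with model quantization, gradient checkpointing and similar techniques, it can run with as little as 7 GB of GPU memory.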
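As a rough check on the trainable-parameter count, here is a sketch based on one assumption about how ChatGLM-6B's prefix encoder is built. It assumes that with prefix_projection set to false, the prefix encoder is a single embedding table of shape (pre_seq_len, num_layers × 2 × hidden_size). In other words, each prefix position stores one key vector and one value vector for every transformer layer. The values num_layers=28 and hidden_size=4096 come from the model config printed in the logs. PRE_SEQ_LEN=128 comes from train.sh.

```python
def prefix_encoder_params(pre_seq_len: int, num_layers: int, hidden_size: int) -> int:
    """Trainable parameters of a P-Tuning v2 prefix encoder without projection.

    Assumes one embedding row per prefix position, holding a key vector and a
    value vector (hence the factor 2) for every transformer layer.
    """
    return pre_seq_len * num_layers * 2 * hidden_size


trainable = prefix_encoder_params(pre_seq_len=128, num_layers=28, hidden_size=4096)
total = 6.2e9  # approximate ChatGLM-6B size (assumption)
print(trainable)                   # 29360128
print(f"{trainable / total:.2%}")  # 0.47%
```

The exact fraction scales linearly with PRE_SEQ_LEN. With the 128 used here it comes out closer to 0.5%, and a shorter prefix shrinks it proportionally.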
First, edit the train.sh script. The main parameters to change are train_file, validation_file, model_name_or_path and output_dir:
text
PRE_SEQ_LEN=128
LR=2e-2
CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
Run output:
text
0%| | 0/3000 [00:00<?, ?it/s]
...
{'loss': 4.2962, 'learning_rate': 0.0196, 'epoch': 0.01}
{'loss': 4.3112, 'learning_rate': 0.019533333333333333, 'epoch': 0.01}
2%|███▊ | 70/3000 [03:20<2:17:06, 2.81s/it]
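The learning_rate and epoch values in this log follow from the settings in train.sh. Two assumptions are made here: the Trainer defaults of a linear schedule with no warmup apply, and the two logged entries correspond to steps 60 and 70 (the lines before them are elided). Under those assumptions:

- The rate decays as LR × (1 − step / 3000).
- Each optimizer step consumes per_device_train_batch_size × gradient_accumulation_steps = 16 samples on the single GPU.

The dataset size of 114,599 is the train_samples value reported by the later run on the same data.

```python
def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def epoch_fraction(step: int, batch_size: int, grad_accum: int, num_samples: int) -> float:
    """Fraction of an epoch consumed after `step` optimizer steps on one GPU."""
    return step * batch_size * grad_accum / num_samples


for step in (60, 70):  # assumed steps of the two logged entries
    lr = linear_lr(step, base_lr=2e-2, total_steps=3000)
    ep = epoch_fraction(step, batch_size=1, grad_accum=16, num_samples=114_599)
    print(step, round(lr, 6), round(ep, 2))
# 60 0.0196 0.01
# 70 0.019533 0.01

# 3000 steps x 16 samples covers well under one epoch.
print(round(epoch_fraction(3000, 1, 16, 114_599), 2))  # 0.42
```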
GPU memory usage:
text
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A800 80G... Off | 00000000:34:00.0 Off | 0 |
| N/A 71C P0 300W / 300W | 6291MiB / 81920MiB | 74% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
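GPU memory usage is indeed low. Even with P-Tuning v2 parameter-efficient fine-tuning, however, training is still slow: about 2.8 s per step, with an estimated 2 hours 17 minutes remaining after 70 of 3000 steps.
Next, edit train.sh to increase the batch size and run again: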
text
PRE_SEQ_LEN=128
LR=2e-2
CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 1 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
Run output:
text
sh train.sh
04/14/2023 19:46:38 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: Fals
04/14/2023 19:46:38 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=16,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.02,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/runs/Apr14_19-46-38_ai-app-2-46,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_hf,
optim_args=None,
output_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=128,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,
save_on_each_node=False,
save_safetensors=False,
save_steps=100,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
04/14/2023 19:47:58 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-1cf934bed8e233e6e)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████
[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,671 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
[WARNING|configuration_auto.py:925] 2023-04-14 19:47:58,671 >> Explicitly passing a `revision` is encouraged when loading a configuratio a newer revision.
[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,679 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
[INFO|configuration_utils.py:720] 2023-04-14 19:47:58,681 >> Model config ChatGLMConfig {
"_name_or_path": "/data/nfs/llm/model/chatglm-6b",
"architectures": [
"ChatGLMModel"
],
"auto_map": {
"AutoConfig": "configuration_chatglm.ChatGLMConfig",
"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
},
"bos_token_id": 130004,
"eos_token_id": 130005,
"gmask_token_id": 130001,
"hidden_size": 4096,
"inner_hidden_size": 16384,
"layernorm_epsilon": 1e-05,
"mask_token_id": 130000,
"max_sequence_length": 2048,
"model_type": "chatglm",
"num_attention_heads": 32,
"num_layers": 28,
"pad_token_id": 3,
"position_encoding_2d": true,
"pre_seq_len": null,
"prefix_projection": false,
"quantization_bit": 0,
"torch_dtype": "float16",
"transformers_version": "4.28.0",
"use_cache": true,
"vocab_size": 130528
}
[WARNING|tokenization_auto.py:675] 2023-04-14 19:47:58,683 >> Explicitly passing a revision is encouraged when loading a model with curevision.
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-14 19:47:59,089 >> Explicitly passing a revision is encouraged when loading a model with custom ion.
[INFO|modeling_utils.py:2531] 2023-04-14 19:47:59,115 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.jso
[INFO|configuration_utils.py:575] 2023-04-14 19:47:59,117 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████
[INFO|modeling_utils.py:3190] 2023-04-14 19:48:08,508 >> All model checkpoint weights were used when initializing ChatGLMForConditionalG
[WARNING|modeling_utils.py:3192] 2023-04-14 19:48:08,508 >> Some weights of ChatGLMForConditionalGeneration were not initialized from thtialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2839] 2023-04-14 19:48:08,548 >> Generation config file not found, using a generation config created from the mo
Quantized to 4 bit
input_ids 5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 15388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65564219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 6 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3
inputs 类型#裤版型#宽松 风格#性感图案#线条 裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长适贴身体验感棒棒哒。系带部分增加设计看点,还
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 741-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100
labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自
/home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWain a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warn
warnings.warn(
0%| 04/14/2023 19:51:19 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkp
{'loss': 6.0246, 'learning_rate': 0.016428571428571428, 'epoch': 0.18}
{'loss': 7.8721, 'learning_rate': 0.012857142857142859, 'epoch': 0.36}
{'loss': 8.2653, 'learning_rate': 0.009285714285714286, 'epoch': 0.54}
{'loss': 8.6636, 'learning_rate': 0.005714285714285714, 'epoch': 0.71}
{'loss': 8.5985, 'learning_rate': 0.002142857142857143, 'epoch': 0.89}
{'train_runtime': 4868.4062, 'train_samples_per_second': 23.539, 'train_steps_per_second': 0.012, 'train_loss': 7.956800188337054, 'epoc
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████
***** train metrics *****
epoch = 1.0
train_loss = 7.9568
train_runtime = 1:21:08.40
train_samples = 114599
train_samples_per_second = 23.539
train_steps_per_second = 0.012
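The learning rate falls quickly in this run because the epoch is very short. With per_device_train_batch_size 128 and gradient_accumulation_steps 16, each optimizer step sees 2,048 samples.

The sketch below assumes the Trainer counts ceil(samples / batch_size) dataloader batches per GPU and then floors that by the accumulation steps. Under that assumption one epoch over 114,599 samples is only 56 optimizer steps. A linear decay over those 56 steps reproduces the learning_rate and epoch values logged above. The helper names are illustrative.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Optimizer steps per epoch: dataloader batches per GPU, floored by accumulation."""
    samples_per_gpu = math.ceil(num_samples / num_gpus)
    batches = math.ceil(samples_per_gpu / batch_size)
    return max(batches // grad_accum, 1)


def linear_lr(step: int, base_lr: float, total_steps: int) -> float:
    """Linear decay to zero with no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


total = steps_per_epoch(114_599, batch_size=128, grad_accum=16)
print(total)  # 56
for step in (10, 20, 30):
    print(step, round(linear_lr(step, 2e-2, total), 6), round(step / total, 2))
# 10 0.016429 0.18
# 20 0.012857 0.36
# 30 0.009286 0.54
```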
GPU memory usage:
text
Sun Apr 16 19:53:00 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A800 80G... Off | 00000000:34:00.0 Off | 0 |
| N/A 71C P0 281W / 300W | 63275MiB / 81920MiB | 92% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 20126 C python3 63273MiB |
+-----------------------------------------------------------------------------+
Output files:
text
> ls -al /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2
total 12
drwxrwxr-x 2 guodong.li guodong.li 98 Apr 14 21:12 .
drwxrwxr-x 8 guodong.li guodong.li 177 Apr 14 17:12 ..
-rw-rw-r-- 1 guodong.li guodong.li 195 Apr 14 21:12 all_results.json
-rw-rw-r-- 1 guodong.li guodong.li 1185 Apr 14 21:12 trainer_state.json
-rw-rw-r-- 1 guodong.li guodong.li 195 Apr 14 21:12 train_results.json
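Increasing the batch size raised both GPU memory usage and utilization: from 6291 MiB at 74% to 63275 MiB at 92%.
To use DeepSpeed for data parallelism instead, a command along these lines can be used: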
text
PRE_SEQ_LEN=128
LR=2e-2
deepspeed --include localhost:1,2,3 --master_port 29001 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN
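The command above expects a deepspeed.json file whose contents are not shown. One option is to reuse the ZeRO stage 2 and fp16 loss-scaling settings printed in the full fine-tuning log. In this version the batch size, gradient accumulation and fp16 switch are set to "auto", so that the Hugging Face Trainer fills them in from its command-line arguments. This is a hypothetical sketch, not the author's actual file.

```python
import json
from pathlib import Path

# Hypothetical deepspeed.json: ZeRO-2 and fp16 settings copied from the config
# printed in the full fine-tuning log; "auto" lets the HF Trainer fill values
# from its command-line arguments.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_allow_untested_optimizer": True,
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,              # 0 selects dynamic loss scaling
        "initial_scale_power": 16,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1,
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 500_000_000,
        "overlap_comm": False,
        "reduce_scatter": True,
        "reduce_bucket_size": 500_000_000,
        "contiguous_gradients": True,
    },
}

Path("deepspeed.json").write_text(json.dumps(ds_config, indent=2))
print(Path("deepspeed.json").read_text())
```

With "enabled": "auto", fp16 is only turned on if --fp16 is added to the command. "auto" is used because, as far as I know, the Trainer's DeepSpeed integration rejects explicit values in the JSON that conflict with its arguments, such as a micro batch size that differs from --per_device_train_batch_size.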
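Model Evaluation
Edit the evaluate.sh file. Set model_name_or_path (the base model path), ptuning_checkpoint (the path to the weights produced by P-Tuning v2 fine-tuning) and the other parameters: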
text
PRE_SEQ_LEN=128
CHECKPOINT=adgen-chatglm-6b-pt-128-2e-2
STEP=3000
CUDA_VISIBLE_DEVICES=1 python3 main.py \
    --do_predict \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --ptuning_checkpoint /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
Run output:
text
sh evaluate.sh
04/16/2023 20:18:01 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
04/16/2023 20:18:01 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
...
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
...
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
Downloading and preparing dataset json/default to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e...
Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3419.73it/s]
Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 196.48it/s]
Dataset json downloaded and prepared to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e. Subsequent calls will reuse this data.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 326.85it/s]
[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,784 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
[WARNING|configuration_auto.py:925] 2023-04-16 20:19:21,785 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,792 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
[INFO|configuration_utils.py:720] 2023-04-16 20:19:21,795 >> Model config ChatGLMConfig {
"_name_or_path": "/data/nfs/llm/model/chatglm-6b",
"architectures": [
"ChatGLMModel"
],
"auto_map": {
"AutoConfig": "configuration_chatglm.ChatGLMConfig",
"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
},
"bos_token_id": 130004,
"eos_token_id": 130005,
"gmask_token_id": 130001,
"hidden_size": 4096,
"inner_hidden_size": 16384,
"layernorm_epsilon": 1e-05,
"mask_token_id": 130000,
"max_sequence_length": 2048,
"model_type": "chatglm",
"num_attention_heads": 32,
"num_layers": 28,
"pad_token_id": 3,
"position_encoding_2d": true,
"pre_seq_len": null,
"prefix_projection": false,
"quantization_bit": 0,
"torch_dtype": "float16",
"transformers_version": "4.28.0",
"use_cache": true,
"vocab_size": 130528
}
[WARNING|tokenization_auto.py:675] 2023-04-16 20:19:21,797 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-16 20:19:22,186 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|modeling_utils.py:2531] 2023-04-16 20:19:22,222 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-16 20:19:22,224 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00, 1.04s/it]
[INFO|modeling_utils.py:3190] 2023-04-16 20:19:30,912 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[WARNING|modeling_utils.py:3192] 2023-04-16 20:19:30,912 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2839] 2023-04-16 20:19:30,967 >> Generation config file not found, using a generation config created from the model config.
Quantized to 4 bit
input_ids [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 65421, 61, 75898, 32, 68554, 61, 77257, 64555, 32, 65107, 61, 66268, 32, 65347, 61, 71689, 32, 69768, 61, 85428, 32, 65173, 73942, 61, 70984, 32, 65173, 70936, 61, 64703, 65509, 130001, 130004]
inputs 类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞
label_ids [5, 71689, 66561, 67061, 77257, 70984, 6, 72194, 65173, 64290, 64622, 81549, 63823, 65173, 64290, 83343, 63832, 63912, 65209, 64703, 65509, 64051, 6, 69418, 78598, 87019, 6, 64257, 71319, 66069, 74197, 63823, 65173, 72265, 64880, 64131, 63832, 73416, 85428, 66261, 6, 65594, 87834, 6, 73412, 105145, 65388, 63823, 130001, 130004]
labels 简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。
04/16/2023 20:21:30 - INFO - __main__ - *** Predict ***
[INFO|configuration_utils.py:575] 2023-04-16 20:21:30,090 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
0%| | 0/1070 [00:00<?, ?it/s][INFO|configuration_utils.py:575] 2023-04-16 20:21:34,430 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
0%|▎ | 2/1070 [00:02<25:39, 1.44s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:37,311 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
0%|▍ | 3/1070
...
1%|█▎ | 8/1070 [00:20<50:13, 2.84s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:55,233 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
1%|█▍ | 9/1070 [00:23<50:24, 2.85s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:58,112 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
1%|█▌ | 10/1070 [00:26<50:30, 2.86s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:00,990 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
1%|█▋ | 11/1070 [00:29<50:37, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:03,880 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
1%|█▊ | 12/1070 [00:32<50:38, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:06,761 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
...
[INFO|configuration_utils.py:575] 2023-04-16 21:13:16,240 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1069/1070 51:44\<00:02, 2.92s/itINFO\|configuration_utils.py:575 2023-04-16 21:13:19,107 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.28.0"
}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 [51:47<00:00, 2.90s/it]Building prefix dict from the default dictionary ...
04/16/2023 21:13:22 - DEBUG - jieba - Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
04/16/2023 21:13:22 - DEBUG - jieba - Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.634 seconds.
04/16/2023 21:13:22 - DEBUG - jieba - Loading model cost 0.634 seconds.
Prefix dict has been built successfully.
04/16/2023 21:13:22 - DEBUG - jieba - Prefix dict has been built successfully.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 51:53\<00:00, 2.91s/it
***** predict metrics *****
predict_bleu-4 = 0.7846
predict_rouge-1 = 8.8941
predict_rouge-2 = 1.3703
predict_rouge-l = 16.4982
predict_runtime = 0:51:57.77
predict_samples = 1070
predict_samples_per_second = 0.343
predict_steps_per_second = 0.343
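The `input_ids` printed in the evaluation log above follow a fixed layout: left padding with `pad_token_id` 3, then the tokenized prompt, then `gmask_token_id` 130001 and `bos_token_id` 130004 (all three ids come from the config printed earlier). The pure-Python sketch below reproduces that layout on plain lists so it can be inspected without loading the tokenizer. `build_eval_input` and `strip_left_padding` are hypothetical helpers for illustration only; in the actual script these ids are produced by the ChatGLM-6B tokenizer itself.

```python
# Token ids taken from the ChatGLM-6B config printed in the log.
PAD_ID = 3          # "pad_token_id"
GMASK_ID = 130001   # "gmask_token_id"
BOS_ID = 130004     # "bos_token_id"


def build_eval_input(prompt_ids, target_len):
    """Hypothetical helper: append [gMASK, BOS] and left-pad to target_len."""
    ids = list(prompt_ids) + [GMASK_ID, BOS_ID]
    if len(ids) > target_len:
        raise ValueError(f"sequence of length {len(ids)} exceeds target_len={target_len}")
    # Padding goes on the left, as in the logged example, so the
    # sequence always ends with [gMASK, BOS] right before generation.
    return [PAD_ID] * (target_len - len(ids)) + ids


def strip_left_padding(ids):
    """Hypothetical helper: drop leading PAD_ID tokens to inspect a logged example."""
    start = 0
    while start < len(ids) and ids[start] == PAD_ID:
        start += 1
    return ids[start:]


if __name__ == "__main__":
    # The first few content ids of the logged ADGEN example
    example = build_eval_input([5, 65421, 61, 75898], target_len=8)
    print(example)  # [3, 3, 5, 65421, 61, 75898, 130001, 130004]
    print(strip_left_padding(example))
```

Applied to the logged `input_ids`, `strip_left_padding` removes the leading run of 3s and leaves the prompt tokens followed by 130001 and 130004.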
Model Inference
Create a new file, inference.py:
text
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer
MODEL_PATH = "/data/nfs/llm/model/chatglm-6b"
CHECKPOINT_PATH = "/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500"
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
# pre_seq_len must match the value used for P-Tuning v2 training (128 here)
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=128)
# Load the base model on the CPU first; it is quantized and moved to the GPU below
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True)
# Keep only the prefix-encoder entries of the checkpoint and strip the
# "transformer.prefix_encoder." prefix so the keys match the submodule
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
# Quantize the base model to INT4, cast to FP16 and move it to the GPU
print("Quantized to 4 bit")
model = model.quantize(4)
model = model.half().cuda()
# Keep the prefix encoder in FP32
model.transformer.prefix_encoder.float()
model = model.eval()
# First turn: the prompt "你好" means "Hello"
print("User: 你好\n")
response, history = model.chat(tokenizer, "你好", history=[])
print("ChatGLM-6B:\n", response)
print("\n------------------------------------------------\nUser:")
# Interactive loop; an empty line exits
line = input()
while line:
    response, history = model.chat(tokenizer, line, history=history)
    print("ChatGLM-6B:\n", response)
    print("\n------------------------------------------------\nUser:")
    line = input()
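The key-stripping loop in `inference.py` can be factored into a small reusable function. The sketch below is an illustrative refactoring, not part of the original script: `extract_submodule_state` is a hypothetical name, and plain values stand in for tensors so it runs without PyTorch. Since the environment uses Python 3.10, `str.removeprefix` can replace the manual `len(...)` slice.

```python
def extract_submodule_state(state_dict, prefix="transformer.prefix_encoder."):
    """Hypothetical helper: keep entries under `prefix` and strip it from the keys."""
    return {
        key.removeprefix(prefix): value  # str.removeprefix requires Python 3.9+
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    # Strings stand in for tensors; the first key matches the warning in the log
    checkpoint = {
        "transformer.prefix_encoder.embedding.weight": "W_prefix",
        "transformer.word_embeddings.weight": "W_emb",  # stand-in non-prefix key
    }
    print(extract_submodule_state(checkpoint))  # {'embedding.weight': 'W_prefix'}
```

With this helper, the loop in `inference.py` reduces to `model.transformer.prefix_encoder.load_state_dict(extract_submodule_state(prefix_state_dict))`.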
Run the script:
text
CUDA_VISIBLE_DEVICES=0 python3 inference.py
Conclusion
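This article used DeepSpeed DP+ZeRO for full-parameter fine-tuning of ChatGLM-6B; when GPU resources are insufficient, P-Tuning v2 can be used for parameter-efficient fine-tuning instead.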
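To put "parameter-efficient" into numbers: with `prefix_projection` set to false (as in the config printed earlier), the only newly initialized weight is `transformer.prefix_encoder.embedding.weight` (see the warning in the log), an embedding table of shape `(pre_seq_len, num_layers * hidden_size * 2)`. The sketch below computes its size from the config values. It assumes `pre_seq_len=128` as in `inference.py`, and it treats 6.2 billion as an approximate total parameter count for ChatGLM-6B.

```python
def prefix_encoder_params(pre_seq_len: int, num_layers: int, hidden_size: int) -> int:
    """Parameters of ChatGLM-6B's prefix encoder when prefix_projection is false.

    The encoder is a single embedding of shape
    (pre_seq_len, num_layers * hidden_size * 2): each prefix position holds
    one key vector and one value vector for every transformer layer.
    """
    return pre_seq_len * num_layers * hidden_size * 2


if __name__ == "__main__":
    # num_layers and hidden_size from the printed config; pre_seq_len as in inference.py
    trainable = prefix_encoder_params(pre_seq_len=128, num_layers=28, hidden_size=4096)
    print(f"{trainable:,}")  # 29,360,128
    # Rough share of the model, assuming ~6.2B total parameters
    print(f"{trainable / 6.2e9:.2%}")
```

Roughly 29.4 million trainable parameters, under half a percent of the model, means gradients and optimizer state are kept only for this small table while the base weights stay frozen.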
References: