Fine-Tuning ChatGLM-6B with DeepSpeed and P-Tuning v2

A previous post covered parameter-efficient fine-tuning of ChatGLM-6B with LoRA. This article walks through fine-tuning ChatGLM-6B with DeepSpeed (full-parameter fine-tuning) and with P-Tuning v2. The related code is available on GitHub: llm-action.

ChatGLM-6B Overview

ChatGLM-6B itself was introduced in the previous article, so the introduction is not repeated here.

P-Tuning v2 Overview

P-Tuning v2 is a parameter-efficient fine-tuning method: instead of updating all of the model's weights, it freezes the pretrained model and trains only a small set of continuous prompt (prefix) vectors, which brings the number of trainable parameters down to roughly 0.1% of the original model. It is the successor to P-Tuning (v1); the key improvement is that the learnable prompts are no longer inserted only at the input embedding layer but at every layer of the network (deep prompt tuning, essentially the same mechanism as Prefix-Tuning), which gives the prompts far more capacity.

Concretely, the original model weights stay untouched. For each task, P-Tuning v2 learns a short sequence of prefix vectors that are prepended to the keys and values of every attention layer; only these prefix parameters (plus an optional small projection MLP) receive gradients during training. Because the trainable part is so small, the per-task checkpoint is tiny and fine-tuning requires far less GPU memory than full fine-tuning.

In short, the core idea of P-Tuning v2 is to make task adaptation lightweight and efficient while keeping performance as close to full fine-tuning as possible: training becomes much cheaper, memory and compute requirements during fine-tuning drop sharply, and the method remains competitive across model sizes and downstream tasks, which makes it practical for real-world scenarios.
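In ChatGLM-6B this is implemented through a prefix encoder controlled by the pre_seq_len and prefix_projection config fields: its output is split into per-layer key/value prompts that are fed to the attention layers as extra past key/values. The snippet below is a simplified sketch of that idea for illustration only, not the exact code from the repository:

```python
import torch

class PrefixEncoder(torch.nn.Module):
    """Sketch: maps prefix positions to per-layer key/value prompt vectors."""

    def __init__(self, pre_seq_len: int, num_layers: int, hidden_size: int,
                 prefix_projection: bool = False):
        super().__init__()
        self.prefix_projection = prefix_projection
        if prefix_projection:
            # Re-parameterize the prefix through a small MLP (often trains more stably).
            self.embedding = torch.nn.Embedding(pre_seq_len, hidden_size)
            self.trans = torch.nn.Sequential(
                torch.nn.Linear(hidden_size, hidden_size),
                torch.nn.Tanh(),
                torch.nn.Linear(hidden_size, num_layers * 2 * hidden_size),
            )
        else:
            # Learn the per-layer key/value prompts directly.
            self.embedding = torch.nn.Embedding(pre_seq_len, num_layers * 2 * hidden_size)

    def forward(self, prefix_ids: torch.Tensor) -> torch.Tensor:
        if self.prefix_projection:
            return self.trans(self.embedding(prefix_ids))
        return self.embedding(prefix_ids)

# Only this module is trained; the 6B backbone stays frozen.
encoder = PrefixEncoder(pre_seq_len=128, num_layers=28, hidden_size=4096)
prefix = encoder(torch.arange(128).unsqueeze(0))
print(prefix.shape)  # torch.Size([1, 128, 229376]); reshaped into per-layer K/V prompts
```

With pre_seq_len=128 this amounts to roughly 128 x 28 x 2 x 4096 ≈ 29M trainable parameters, well under 1% of the 6B backbone.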

Environment Setup

The base environment is as follows:

  • Operating system: Ubuntu 18.04
  • CPUs: a single node with 1 TB of RAM; 64 physical Intel CPUs with 16 cores each
  • GPUs: 8 x A800 80GB
  • Python: 3.10 (upgrade OpenSSL to 1.1.1t first, then build and install Python from source)
  • NVIDIA driver: 515.65.01 (pick the driver that matches your GPU model)
  • CUDA Toolkit: 11.7
  • NCCL: nccl_2.14.3-1+cuda11.7
  • cuDNN: 8.8.1.3_cuda11

Installing the NVIDIA driver, CUDA, Python, and the other tools listed above is not covered here.

Create and activate the virtual environment chatglm-ptuningv2-venv-py310-cu117:

```bash
cd /home/guodong.li/virtual-venv
virtualenv -p /usr/bin/python3.10 chatglm-ptuningv2-venv-py310-cu117
source /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/activate
```

Install PyTorch offline: download the torch and torchvision wheels that match your CUDA version and install them directly.

```bash
pip install torch-1.13.1+cu117-cp310-cp310-linux_x86_64.whl
pip install torchvision-0.14.1+cu117-cp310-cp310-linux_x86_64.whl
```
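Optionally, a quick sanity check that the CUDA build of PyTorch is picked up inside the virtual environment (on this setup the expected output would be 1.13.1+cu117, True, and 8):

```python
import torch

# Verify the wheel version and that the GPUs are visible.
print(torch.__version__)          # e.g. 1.13.1+cu117
print(torch.cuda.is_available())  # True
print(torch.cuda.device_count())  # 8 on this node
```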

Install the remaining dependencies.

```bash
pip install -r requirements.txt
```

The requirements.txt file contains:

```text
protobuf
transformers==4.28.0
cpm_kernels
gradio
mdtex2html
sentencepiece
rouge_chinese
nltk
jieba
datasets
deepspeed
accelerate
```

Note
The official docs pin transformers to 4.27.1. When ChatGLM loads its custom model code it calls get_class_in_module in transformers/dynamic_module_utils.py, and under concurrent loading that method can fail to find the generated module file. Upgrading transformers to 4.28.0 works around this issue.
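A trivial check to confirm the environment really resolved to the patched version before launching a multi-GPU run (4.28.0 is the version assumed above):

```python
import transformers

# Older 4.27.1 can hit the get_class_in_module race when several ranks
# load ChatGLM's custom modeling code at the same time.
assert transformers.__version__ == "4.28.0", transformers.__version__
print("transformers", transformers.__version__)
```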

Data Preparation

The ADGEN (advertising-copy generation) dataset is used below to demonstrate fine-tuning.

ADGEN asks the model to generate a piece of advertising copy (summary) from a structured product description (content), for example:

```json
{
    "content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
    "summary": "这件衬衫的款式非常的宽松,利落的线条可以很好的隐藏身材上的小缺点,穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳,漂亮的绳结展现出了十足的个性,配合时尚的泡泡袖型,尽显女性甜美可爱的气息。"
}
```

Download the ADGEN dataset from the official source or via the download link, and extract it into the AdvertiseGen directory.

```bash
tar -zxvf AdvertiseGen.tar.gz
```

Check the dataset size:

```text
> wc -l AdvertiseGen/*
    1070 AdvertiseGen/dev.json
  114599 AdvertiseGen/train.json
  115669 total
```
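Each line of train.json and dev.json is an independent JSON object with content and summary fields, so a quick sanity check of the data can look like this (paths follow the layout above):

```python
import json

# Read the first training example and confirm the expected fields are present.
with open("AdvertiseGen/train.json", encoding="utf-8") as f:
    example = json.loads(f.readline())

print(sorted(example.keys()))   # ['content', 'summary']
print(example["content"][:60])
print(example["summary"][:60])
```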

Full-Parameter Fine-Tuning of ChatGLM-6B with DeepSpeed (DP + ZeRO)

First we use DeepSpeed (data parallelism plus ZeRO) to fine-tune all parameters of ChatGLM-6B.

Download the source code and, to keep the code consistent with this article, check out the corresponding commit id:

```bash
git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B
git checkout 8633db1
cd ptuning
```

Then modify the ds_train_finetune.sh script to launch full-parameter fine-tuning with DeepSpeed:

```bash
LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 24 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --num_train_epochs 2 \
    --logging_steps 10 \
    --save_steps 300 \
    --learning_rate $LR \
    --fp16
```
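The --deepspeed deepspeed.json argument points to the ZeRO configuration file in the ptuning directory. Judging from the user-config dump printed later in the training log (fp16 with dynamic loss scaling, ZeRO stage 2, micro batch size resolved to 24), the file looks roughly like the sketch below; treat it as illustrative rather than a verbatim copy of the repository file:

```json
{
  "train_micro_batch_size_per_gpu": 24,
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 500000000,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 500000000,
    "contiguous_gradients": true
  }
}
```

The train_batch_size of 192 reported by DeepSpeed in the log below is simply 8 GPUs x 24 samples per GPU.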

Run the script. The training log looks like this:

```text
> sh ds_train_finetune.sh

[2023-04-14 18:01:33,206] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.

[2023-04-14 18:01:33,417] [INFO] [runner.py:540:main] cmd = /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=44148 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --train_file /data/nfs/llm/data/AdvertiseGen/train.json --test_file /data/nfs/llm/data/AdvertiseGen/dev.json --prompt_column content --response_column summary --overwrite_cache --model_name_or_path /data/nfs/llm/model/chatglm-6b --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4 --overwrite_output_dir --max_source_length 64 --max_target_length 64 --per_device_train_batch_size 24 --per_device_eval_batch_size 1 --gradient_accumulation_steps 2 --predict_with_generate --num_train_epochs 2 --logging_steps 10 --save_steps 300 --learning_rate 1e-4 --fp16

[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_SOCKET_IFNAME=bond0

[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=1

[2023-04-14 18:01:35,945] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}

[2023-04-14 18:01:35,945] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0

[2023-04-14 18:01:35,945] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})

[2023-04-14 18:01:35,945] [INFO] [launch.py:247:main] dist_world_size=8

[2023-04-14 18:01:35,945] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

[2023-04-14 18:01:40,133] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

04/14/2023 18:01:41 - WARNING - main - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: True

...

04/14/2023 18:01:41 - WARNING - main - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True

04/14/2023 18:01:41 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

bf16=False,

bf16_full_eval=False,

data_seed=None,

dataloader_drop_last=False,

dataloader_num_workers=0,

dataloader_pin_memory=True,

ddp_bucket_cap_mb=None,

ddp_find_unused_parameters=None,

ddp_timeout=1800,

debug=[],

deepspeed=deepspeed.json,

disable_tqdm=False,

do_eval=False,

do_predict=False,

do_train=True,

eval_accumulation_steps=None,

eval_delay=0,

eval_steps=None,

evaluation_strategy=no,

fp16=True,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

generation_max_length=None,

generation_num_beams=None,

gradient_accumulation_steps=2,

gradient_checkpointing=False,

greater_is_better=None,

group_by_length=False,

half_precision_backend=auto,

hub_model_id=None,

hub_private_repo=False,

hub_strategy=every_save,

hub_token=<HUB_TOKEN>,

ignore_data_skip=False,

include_inputs_for_metrics=False,

jit_mode_eval=False,

label_names=None,

label_smoothing_factor=0.0,

learning_rate=0.0001,

length_column_name=length,

load_best_model_at_end=False,

local_rank=0,

log_level=passive,

log_level_replica=warning,

log_on_each_node=True,

logging_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/runs/Apr14_18-01-40_ai-app-2-46,

logging_first_step=False,

logging_nan_inf_filter=True,

logging_steps=10,

logging_strategy=steps,

lr_scheduler_type=linear,

max_grad_norm=1.0,

max_steps=-1,

metric_for_best_model=None,

mp_parameters=,

no_cuda=False,

num_train_epochs=2.0,

optim=adamw_hf,

optim_args=None,

output_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,

overwrite_output_dir=True,

past_index=-1,

per_device_eval_batch_size=1,

per_device_train_batch_size=24,

predict_with_generate=True,

prediction_loss_only=False,

push_to_hub=False,

push_to_hub_model_id=None,

push_to_hub_organization=None,

push_to_hub_token=<PUSH_TO_HUB_TOKEN>,

ray_scope=last,

remove_unused_columns=True,

report_to=[],

resume_from_checkpoint=None,

run_name=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,

save_on_each_node=False,

save_safetensors=False,

save_steps=300,

save_strategy=steps,

save_total_limit=None,

seed=42,

sharded_ddp=[],

skip_memory_metrics=True,

sortish_sampler=False,

tf32=None,

torch_compile=False,

torch_compile_backend=None,

torch_compile_mode=None,

torchdynamo=None,

tpu_metrics_debug=False,

tpu_num_cores=None,

use_ipex=False,

use_legacy_prediction_loop=False,

use_mps_device=False,

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 184.03it/s]

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,664 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

0%|                                                                                                                                                                                   | 0/2 [00:00<?, ?it/s][WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,675 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 240.57it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 197.48it/s]

[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,678 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,678 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,679 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,685 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

[INFO|configuration_utils.py:720] 2023-04-14 18:03:01,687 >> Model config ChatGLMConfig {

"_name_or_path": "/data/nfs/llm/model/chatglm-6b",

"architectures": [

"ChatGLMModel"

],

"auto_map": {

"AutoConfig": "configuration_chatglm.ChatGLMConfig",

"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",

"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"

},

"bos_token_id": 130004,

"eos_token_id": 130005,

"gmask_token_id": 130001,

"hidden_size": 4096,

"inner_hidden_size": 16384,

"layernorm_epsilon": 1e-05,

"mask_token_id": 130000,

"max_sequence_length": 2048,

"model_type": "chatglm",

"num_attention_heads": 32,

"num_layers": 28,

"pad_token_id": 3,

"position_encoding_2d": true,

"pre_seq_len": null,

"prefix_projection": false,

"quantization_bit": 0,

"torch_dtype": "float16",

"transformers_version": "4.28.0",

"use_cache": true,

"vocab_size": 130528

}

0%| | 0/2 [00:00<?, ?it/s][WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,688 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

[WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,689 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file tokenizer_config.json
100%|██████████| 2/2 [00:00<00:00, 285.37it/s]

[INFO|modeling_utils.py:2531] 2023-04-14 18:03:01,992 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-14 18:03:01,993 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}
Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]
[WARNING|auto_factory.py:456] 2023-04-14 18:03:02,109 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|██████████| 8/8 [00:13<00:00,  1.70s/it]

[INFO|modeling_utils.py:3190] 2023-04-14 18:03:15,622 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:3198] 2023-04-14 18:03:15,622 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:2839] 2023-04-14 18:03:15,703 >> Generation config file not found, using a generation config created from the model config.
...
Loading checkpoint shards: 100%|██████████| 8/8 [00:34<00:00,  4.32s/it]

input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]

inputs 类型#裤版型#宽松 风格#性感图案#线条 裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还

...

label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]

labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还

[2023-04-14 18:06:30,469] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-04-14 18:06:30,483] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2023-04-14 18:06:30,484] [INFO] [utils.py:51:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW
[2023-04-14 18:06:30,484] [WARNING] [engine.py:1118:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2023-04-14 18:06:30,484] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500000000
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500000000
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.10171675682067871 seconds
...
Rank: 2 partition count [8, 8] and sizes[(771473408, False), (187392, False)]

...

Rank: 4 partition count [8, 8] and sizes[(771473408, False), (187392, False)]

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

No modifications detected for re-loaded extension module utils, skipping build step...

Loading extension module utils...

Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...

Time to load utils op: 0.0005774497985839844 seconds

...

No modifications detected for re-loaded extension module utils, skipping build step...

Loading extension module utils...

Time to load utils op: 0.0011382102966308594 seconds

[2023-04-14 18:06:48,321] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-04-14 18:06:48,321] [INFO] [utils.py:786:see_memory_usage] MA 14.37 GB  Max_MA 14.37 GB  CA 14.39 GB  Max_CA 14 GB
[2023-04-14 18:06:48,322] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 50.56 GB, percent = 5.0%
04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
...
[2023-04-14 18:06:48,431] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-04-14 18:06:48,434] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB  Max_MA 25.87 GB  CA 25.9 GB  Max_CA 26 GB
[2023-04-14 18:06:48,435] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 50.84 GB, percent = 5.0%
[2023-04-14 18:06:48,435] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized
[2023-04-14 18:06:48,512] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-04-14 18:06:48,513] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB  Max_MA 20.12 GB  CA 25.9 GB  Max_CA 26 GB
[2023-04-14 18:06:48,513] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 51.29 GB, percent = 5.1%
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0001, 0.0001], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:06:48,515] [INFO] [config.py:953:print] DeepSpeedEngine configuration:
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print]   fp16_enabled ................... True
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print]   gradient_accumulation_steps .... 1
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print]   gradient_clipping .............. 0.0
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print]   train_batch_size ............... 192
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print]   train_micro_batch_size_per_gpu   24
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print]   world_size ..................... 8
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print]   zero_allow_untested_optimizer    True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print]   zero_config .................... stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False offload_param=None offload_optimizer=None ...
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print]   zero_enabled ................... True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print]   zero_optimization_stage ........ 2
...
[2023-04-14 18:06:48,518] [INFO] [config.py:943:print_user_config]   json = {
    "train_micro_batch_size_per_gpu": 24,
    "zero_allow_untested_optimizer": true,
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "initial_scale_power": 16,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 5.000000e+08,
        "overlap_comm": false,
        "reduce_scatter": true,
        "reduce_bucket_size": 5.000000e+08,
        "contiguous_gradients": true
    }
}
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00031948089599609375 seconds
  0%|          | 0/596 [00:00<?, ?it/s]

[2023-04-14 18:07:18,877] [INFO] [timer.py:199:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=66.98818896434254, CurrSamplesPerSec=93.79590019766518, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
  1%|█▍        | 5/596 [00:30<1:00:11,  6.11s/it]

...

[2023-04-14 18:47:55,207] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=12, lr=[3.02013422818792e-06, 3.02013422818792e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:47:57,392] [INFO] [timer.py:199:stop] epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=45.931193758598916, CurrSamplesPerSec=45.63412532914195, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
 50%|████████████████████████████████████████▉          | 299/596 [41:42<41:37,  8.41s/it]
[2023-04-14 18:48:37,273] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=12, lr=[1.3422818791946309e-06, 1.3422818791946309e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:48:39,453] [INFO] [timer.py:199:stop] epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=45.92850276413307, CurrSamplesPerSec=45.66031263997641, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3487, 'learning_rate': 1.3422818791946309e-06, 'epoch': 1.01}
 50%|█████████████████████████████████████████          | 300/596 [41:50<41:30,  8.41s/it]
Saving the whole model
[INFO|configuration_utils.py:457] 2023-04-14 18:48:39,458 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/config.json
[INFO|configuration_utils.py:362] 2023-04-14 18:48:39,459 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/generation_config.json
[INFO|modeling_utils.py:1855] 2023-04-14 18:49:03,951 >> The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/pytorch_model.bin.index.json.
[INFO|tokenization_utils_base.py:2171] 2023-04-14 18:49:03,953 >> tokenizer config file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2178] 2023-04-14 18:49:03,953 >> Special tokens file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/special_tokens_map.json
[2023-04-14 18:49:03,983] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step600 is about to be saved!
[2023-04-14 18:49:03,988] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt
[2023-04-14 18:49:03,988] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt...
[2023-04-14 18:49:15,934] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt.
[2023-04-14 18:49:15,937] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2023-04-14 18:49:28,049] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2023-04-14 18:49:28,049] [INFO] [engine.py:3125:_save_zero_checkpoint] zero checkpoint saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2023-04-14 18:49:28,049] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step600 is ready now!
 51%|█████████████████████████████████████████▌         | 304/596 [43:14<1:05:51, 13.53s/it]
[2023-04-14 18:50:09,137] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:50:11,316] [INFO] [timer.py:199:stop] epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=45.926876625767875, CurrSamplesPerSec=45.66709917655267, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
[2023-04-14 18:50:51,114] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:50:53,302] [INFO] [timer.py:199:stop] epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=45.92462533252217, CurrSamplesPerSec=45.55552426651123, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3202, 'learning_rate': 0.0, 'epoch': 1.04}
...
 99%|███████████████████████████████████████████████████▍| 589/596 [1:23:07<00:58,  8.41s/it]
[2023-04-14 19:30:02,654] [INFO] [logging.py:96:log_dist] [Rank 0] step=1180, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 19:30:04,820] [INFO] [timer.py:199:stop] epoch=0/micro_step=1180/global_step=1180, RunningAvgSamplesPerSec=45.85904109663022, CurrSamplesPerSec=45.73521852038509, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3537, 'learning_rate': 0.0, 'epoch': 1.98}
[2023-04-14 19:30:44,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=1190, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 19:30:47,022] [INFO] [timer.py:199:stop] epoch=0/micro_step=1190/global_step=1190, RunningAvgSamplesPerSec=45.856487437478386, CurrSamplesPerSec=45.579988341622055, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'train_runtime': 5046.8863, 'train_samples_per_second': 45.414, 'train_steps_per_second': 0.118, 'train_loss': 13.905431555421561, 'epoch': 2.0}
100%|█████████████████████████████████████████████████████| 596/596 [1:24:06<00:00,  8.47s/it]

***** train metrics *****

epoch = 2.0

train_loss = 13.9054

train_runtime = 1:24:06.88

train_samples = 114599

train_samples_per_second = 45.414

train_steps_per_second = 0.118

[2023-04-14 19:30:58,560] [INFO] [launch.py:460:main] Process 35198 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35192 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35193 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35195 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35191 exits successfully.
[2023-04-14 19:30:59,562] [INFO] [launch.py:460:main] Process 35194 exits successfully.
[2023-04-14 19:30:59,563] [INFO] [launch.py:460:main] Process 35197 exits successfully.
[2023-04-14 19:31:00,564] [INFO] [launch.py:460:main] Process 35196 exits successfully.
```

GPU memory usage during training:

```text
Fri Apr 14 18:27:45 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   59C    P0    92W / 300W |  36539MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A800 80G...  Off  | 00000000:35:00.0 Off |                    0 |
| N/A   61C    P0    96W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A800 80G...  Off  | 00000000:36:00.0 Off |                    0 |
| N/A   63C    P0    93W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A800 80G...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   65C    P0   102W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A800 80G...  Off  | 00000000:9B:00.0 Off |                    0 |
| N/A   64C    P0   108W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A800 80G...  Off  | 00000000:9C:00.0 Off |                    0 |
| N/A   64C    P0   105W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A800 80G...  Off  | 00000000:9D:00.0 Off |                    0 |
| N/A   58C    P0    97W / 300W |  36433MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A800 80G...  Off  | 00000000:9E:00.0 Off |                    0 |
| N/A   59C    P0    92W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     35191      C   ...nv-py310-cu117/bin/python    36537MiB |
|    1   N/A  N/A     35192      C   ...nv-py310-cu117/bin/python    38393MiB |
|    2   N/A  N/A     35193      C   ...nv-py310-cu117/bin/python    38393MiB |
|    3   N/A  N/A     35194      C   ...nv-py310-cu117/bin/python    38345MiB |
|    4   N/A  N/A     35195      C   ...nv-py310-cu117/bin/python    38345MiB |
|    5   N/A  N/A     35196      C   ...nv-py310-cu117/bin/python    38393MiB |
|    6   N/A  N/A     35197      C   ...nv-py310-cu117/bin/python    36431MiB |
|    7   N/A  N/A     35198      C   ...nv-py310-cu117/bin/python    38345MiB |
+-----------------------------------------------------------------------------+
```

Output files:

```text
> tree /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
├── all_results.json
├── checkpoint-300
│   ├── config.json
│   ├── configuration_chatglm.py
│   ├── generation_config.json
│   ├── global_step600
│   │   ├── mp_rank_00_model_states.pt
│   │   ├── zero_pp_rank_0_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_1_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_2_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_3_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_4_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_5_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_6_mp_rank_00_optim_states.pt
│   │   └── zero_pp_rank_7_mp_rank_00_optim_states.pt
│   ├── ice_text.model
│   ├── latest
│   ├── modeling_chatglm.py
│   ├── pytorch_model-00001-of-00002.bin
│   ├── pytorch_model-00002-of-00002.bin
│   ├── pytorch_model.bin.index.json
│   ├── quantization.py
│   ├── rng_state_0.pth
│   ├── rng_state_1.pth
│   ├── rng_state_2.pth
│   ├── rng_state_3.pth
│   ├── rng_state_4.pth
│   ├── rng_state_5.pth
│   ├── rng_state_6.pth
│   ├── rng_state_7.pth
│   ├── special_tokens_map.json
│   ├── tokenization_chatglm.py
│   ├── tokenizer_config.json
│   ├── trainer_state.json
│   ├── training_args.bin
│   └── zero_to_fp32.py
├── trainer_state.json
└── train_results.json

2 directories, 36 files
```

Note that no final model weights are exported at the end of training; only the intermediate checkpoints above are written. If you also want the final weights, add a call to trainer.save_model() at the end of the training code.

Full fine-tuning with DeepSpeed demands a lot of GPU memory and trains slowly, so the rest of this article uses the official P-Tuning v2 recipe for parameter-efficient fine-tuning.

Parameter-Efficient Fine-Tuning of ChatGLM-6B with P-Tuning v2

ChatGLM-6B can be fine-tuned with [P-Tuning v2](https://link.zhihu.com/?target=https%3A//github.com/THUDM/P-tuning-v2), which reduces the number of trainable parameters to about 0.1% of the original model; combined with model quantization and gradient checkpointing, it can run with as little as 7 GB of GPU memory.

First, modify the train.sh script; the main changes are the train_file, validation_file, model_name_or_path, and output_dir arguments:

```bash
PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
```

Run the script. The training log is shown below (truncated; as the dumped arguments show, the batch size was raised for the actual run):

```text
  0%|          | 0/3000 [00:00<?, ?it/s]
...
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.02,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/runs/Apr14_19-46-38_ai-app-2-46,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_hf,
optim_args=None,
output_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=128,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,
save_on_each_node=False,
save_safetensors=False,
save_steps=100,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
04/14/2023 19:47:58 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-1cf934bed8e233e6e)
100%|██████████| 2/2 [00:00<00:00, ...]
[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,671 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
[WARNING|configuration_auto.py:925] 2023-04-14 19:47:58,671 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,679 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
[INFO|configuration_utils.py:720] 2023-04-14 19:47:58,681 >> Model config ChatGLMConfig {
  "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
  "architectures": [ "ChatGLMModel" ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 130528
}
[WARNING|tokenization_auto.py:675] 2023-04-14 19:47:58,683 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-14 19:47:59,089 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|modeling_utils.py:2531] 2023-04-14 19:47:59,115 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-14 19:47:59,117 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}
Loading checkpoint shards: 100%|██████████| 8/8 [...]
[INFO|modeling_utils.py:3190] 2023-04-14 19:48:08,508 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[WARNING|modeling_utils.py:3192] 2023-04-14 19:48:08,508 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint and are newly initialized: ['transformer.prefix_encoder.embedding.weight']

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

[INFO|modeling_utils.py:2839] 2023-04-14 19:48:08,548 >> Generation config file not found, using a generation config created from the model config.
Quantized to 4 bit
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, ...]
inputs 类型#裤版型#宽松 风格#性感图案#线条 裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长...
label_ids [-100, -100, -100, ..., 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, ..., -100, -100]
labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自...
/home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|
04/14/2023 19:51:19 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...

{'loss': 6.0246, 'learning_rate': 0.016428571428571428, 'epoch': 0.18}

{'loss': 7.8721, 'learning_rate': 0.012857142857142859, 'epoch': 0.36}

{'loss': 8.2653, 'learning_rate': 0.009285714285714286, 'epoch': 0.54}

{'loss': 8.6636, 'learning_rate': 0.005714285714285714, 'epoch': 0.71}

{'loss': 8.5985, 'learning_rate': 0.002142857142857143, 'epoch': 0.89}

{'train_runtime': 4868.4062, 'train_samples_per_second': 23.539, 'train_steps_per_second': 0.012, 'train_loss': 7.956800188337054, 'epoch': 1.0}

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████

***** train metrics *****

epoch = 1.0

train_loss = 7.9568

train_runtime = 1:21:08.40

train_samples = 114599

train_samples_per_second = 23.539

train_steps_per_second = 0.012

```

GPU memory usage during training:

```text
Sun Apr 16 19:53:00 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   71C    P0   281W / 300W |  63275MiB / 81920MiB |     92%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     20126      C   python3                         63273MiB |
+-----------------------------------------------------------------------------+
```

Output files:

```text
> ls -al /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2
total 12
drwxrwxr-x 2 guodong.li guodong.li   98 Apr 14 21:12 .
drwxrwxr-x 8 guodong.li guodong.li  177 Apr 14 17:12 ..
-rw-rw-r-- 1 guodong.li guodong.li  195 Apr 14 21:12 all_results.json
-rw-rw-r-- 1 guodong.li guodong.li 1185 Apr 14 21:12 trainer_state.json
-rw-rw-r-- 1 guodong.li guodong.li  195 Apr 14 21:12 train_results.json
```

As the output above shows, by raising the batch size both GPU memory usage and utilization were pushed up considerably.

If you want to run P-Tuning v2 with DeepSpeed data parallelism, you can use a command like the following:

text 复制代码
PRE_SEQ_LEN=128
LR=2e-2

deepspeed --include localhost:1,2,3 --master_port 29001 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN
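
The --deepspeed flag points at a JSON config file whose exact contents are not shown here. As a hedged sketch, a minimal config with ZeRO stage 2 and fp16 (assumed values, tune them for your hardware) could be generated like this; "auto" lets DeepSpeed inherit the batch settings from the Trainer arguments:

text 复制代码
import json

# A minimal DeepSpeed config sketch (assumed values; adjust the ZeRO stage, fp16, and
# batch settings to your hardware). "auto" defers sizes to the HuggingFace Trainer.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

with open("deepspeed.json", "w") as f:
    json.dump(ds_config, f, indent=2)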

Model Evaluation

Modify the evaluate.sh script: update model_name_or_path (the base model path), ptuning_checkpoint (the path to the weights produced by P-Tuning v2 fine-tuning), and the other parameters as needed:

text 复制代码
PRE_SEQ_LEN=128
CHECKPOINT=adgen-chatglm-6b-pt-128-2e-2
STEP=3000

CUDA_VISIBLE_DEVICES=1 python3 main.py \
    --do_predict \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --ptuning_checkpoint /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Run log:

text 复制代码
sh evaluate.sh

04/16/2023 20:18:01 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False

04/16/2023 20:18:01 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

...

fp16=False,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

...

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

Downloading and preparing dataset json/default to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e...

Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3419.73it/s]

Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 196.48it/s]

Dataset json downloaded and prepared to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e. Subsequent calls will reuse this data.

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 326.85it/s]

[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,784 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-16 20:19:21,785 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,792 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[INFO|configuration_utils.py:720] 2023-04-16 20:19:21,795 >> Model config ChatGLMConfig {

"_name_or_path": "/data/nfs/llm/model/chatglm-6b",

"architectures": [

"ChatGLMModel"

],

"auto_map": {

"AutoConfig": "configuration_chatglm.ChatGLMConfig",

"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",

"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"

},

"bos_token_id": 130004,

"eos_token_id": 130005,

"gmask_token_id": 130001,

"hidden_size": 4096,

"inner_hidden_size": 16384,

"layernorm_epsilon": 1e-05,

"mask_token_id": 130000,

"max_sequence_length": 2048,

"model_type": "chatglm",

"num_attention_heads": 32,

"num_layers": 28,

"pad_token_id": 3,

"position_encoding_2d": true,

"pre_seq_len": null,

"prefix_projection": false,

"quantization_bit": 0,

"torch_dtype": "float16",

"transformers_version": "4.28.0",

"use_cache": true,

"vocab_size": 130528

}

[WARNING|tokenization_auto.py:675] 2023-04-16 20:19:21,797 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-16 20:19:22,186 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|modeling_utils.py:2531] 2023-04-16 20:19:22,222 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-16 20:19:22,224 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:08<00:00, 1.04s/it]

[INFO|modeling_utils.py:3190] 2023-04-16 20:19:30,912 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[WARNING|modeling_utils.py:3192] 2023-04-16 20:19:30,912 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

[INFO|modeling_utils.py:2839] 2023-04-16 20:19:30,967 >> Generation config file not found, using a generation config created from the model config.
Quantized to 4 bit
input_ids [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 65421, 61, 75898, 32, 68554, 61, 77257, 64555, 32, 65107, 61, 66268, 32, 65347, 61, 71689, 32, 69768, 61, 85428, 32, 65173, 73942, 61, 70984, 32, 65173, 70936, 61, 64703, 65509, 130001, 130004]

inputs 类型#上衣 材质#牛仔布 颜色#白色 风格#简约 图案#刺绣 衣样式#外套 衣款式#破洞

label_ids [5, 71689, 66561, 67061, 77257, 70984, 6, 72194, 65173, 64290, 64622, 81549, 63823, 65173, 64290, 83343, 63832, 63912, 65209, 64703, 65509, 64051, 6, 69418, 78598, 87019, 6, 64257, 71319, 66069, 74197, 63823, 65173, 72265, 64880, 64131, 63832, 73416, 85428, 66261, 6, 65594, 87834, 6, 73412, 105145, 65388, 63823, 130001, 130004]

labels 简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。

04/16/2023 20:21:30 - INFO - __main__ - *** Predict ***

[INFO|configuration_utils.py:575] 2023-04-16 20:21:30,090 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}
  0%|          | 0/1070 [00:00<?, ?it/s]
  0%|▎         | 2/1070 [00:02<25:39, 1.44s/it]
  0%|▍         | 3/1070 ...
  1%|█▎        | 8/1070 [00:20<50:13, 2.84s/it]
  1%|█▍        | 9/1070 [00:23<50:24, 2.85s/it]
  1%|█▌        | 10/1070 [00:26<50:30, 2.86s/it]
  1%|█▋        | 11/1070 [00:29<50:37, 2.87s/it]
  1%|█▊        | 12/1070 [00:32<50:38, 2.87s/it]
...
100%|█████████▊| 1069/1070 [51:44<00:02, 2.92s/it]
100%|██████████| 1070/1070 [51:47<00:00, 2.90s/it]
Building prefix dict from the default dictionary ...
04/16/2023 21:13:22 - DEBUG - jieba - Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
04/16/2023 21:13:22 - DEBUG - jieba - Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.634 seconds.
04/16/2023 21:13:22 - DEBUG - jieba - Loading model cost 0.634 seconds.
Prefix dict has been built successfully.
04/16/2023 21:13:22 - DEBUG - jieba - Prefix dict has been built successfully.
100%|██████████| 1070/1070 [51:53<00:00, 2.91s/it]

***** predict metrics *****

predict_bleu-4 = 0.7846

predict_rouge-1 = 8.8941

predict_rouge-2 = 1.3703

predict_rouge-l = 16.4982

predict_runtime = 0:51:57.77

predict_samples = 1070

predict_samples_per_second = 0.343

predict_steps_per_second = 0.343
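
For reference, the BLEU-4 and ROUGE scores above are computed on jieba-segmented text using the rouge_chinese and nltk packages listed in requirements.txt. The snippet below is a rough sketch of that scoring for a single prediction/reference pair (score_pair is a hypothetical helper for illustration, not the exact compute_metrics in main.py):

text 复制代码
import jieba
from rouge_chinese import Rouge
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def score_pair(prediction: str, reference: str) -> dict:
    # Chinese text has no spaces, so segment with jieba before scoring.
    hyp = list(jieba.cut(prediction))
    ref = list(jieba.cut(reference))

    # rouge_chinese expects space-joined token strings.
    rouge = Rouge()
    scores = rouge.get_scores(" ".join(hyp), " ".join(ref))[0]

    # Smoothed sentence-level BLEU-4 over the segmented tokens.
    bleu4 = sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method3)

    return {
        "rouge-1": scores["rouge-1"]["f"] * 100,
        "rouge-2": scores["rouge-2"]["f"] * 100,
        "rouge-l": scores["rouge-l"]["f"] * 100,
        "bleu-4": bleu4 * 100,
    }

print(score_pair("宽松的阔腿裤显瘦又百搭", "宽松的阔腿裤这两年真的吸粉不少"))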

Model Inference

Create a new inference.py file:

text 复制代码
import os

import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_PATH = "/data/nfs/llm/model/chatglm-6b"
CHECKPOINT_PATH = "/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500"

# Load the tokenizer and model config; pre_seq_len must match the value used for P-Tuning v2.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True).cuda()

# Load only the prefix-encoder weights from the P-Tuning v2 checkpoint.
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

# Quantize the base model to 4 bit; keep the prefix encoder in fp32.
print("Quantized to 4 bit")
model = model.quantize(4)
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()

print("用户:你好\n")
response, history = model.chat(tokenizer, "你好", history=[])
print("ChatGLM-6B:\n", response)

print("\n------------------------------------------------\n用户:")
line = input()
while line:
    response, history = model.chat(tokenizer, line, history=history)
    print("ChatGLM-6B:\n", response)
    print("\n------------------------------------------------\n用户:")
    line = input()

Run command:

text 复制代码
CUDA_VISIBLE_DEVICES=0 python3 inference.py

Conclusion

Above, we used DeepSpeed DP + ZeRO to fully fine-tune ChatGLM-6B; when GPU resources are limited, P-Tuning v2 offers a parameter-efficient alternative.

