RuntimeError: CUDA error: device-side assert triggered

The original error
Traceback (most recent call last):
  File "/home/pyUser/proDB/project-ml/nlp/python/wmt/run_tf_1.py", line 701, in <module>
    train_seq2seq( lr, 1, device)
  File "/home/pyUser/proDB/project-ml/nlp/python/wmt/run_tf_1.py", line 676, in train_seq2seq
    Y_hat, _ = tf_net(X, dec_input, X_valid_len)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/proDB/project-ml/nlp/python/wmt/run_tf_1.py", line 517, in forward
    enc_outputs = self.encoder(enc_X, *args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/proDB/project-ml/nlp/python/wmt/run_tf_1.py", line 366, in forward
    to_pos = emb_data *  math.sqrt(self.num_hiddens)
             ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Step 1: enable synchronous mode to make debugging easier

import os

# CUDA_LAUNCH_BLOCKING=1: synchronous mode -- the host waits for each kernel to
# finish, so errors are reported at the call that actually caused them. Useful
# for debugging and profiling, but slower overall.
# CUDA_LAUNCH_BLOCKING=0 (default): asynchronous mode -- host and GPU run in
# parallel, which is faster but means errors can surface at a later, unrelated call.
# Note: this must be set before the CUDA context is created (i.e. before the
# first CUDA call) for it to take effect.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

With synchronous mode enabled, the error message becomes:
Traceback (most recent call last):
  File "/home/pyUser/proDB/project-ml/nlp/python/wmt/run_tf_1.py", line 707, in <module>
    train_seq2seq( lr, 1, device)
  File "/home/pyUser/proDB/project-ml/nlp/python/wmt/run_tf_1.py", line 682, in train_seq2seq
    Y_hat, _ = tf_net(X, dec_input, X_valid_len)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/proDB/project-ml/nlp/python/wmt/run_tf_1.py", line 523, in forward
    enc_outputs = self.encoder(enc_X, *args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/proDB/project-ml/nlp/python/wmt/run_tf_1.py", line 370, in forward
    emb_data = self.embedding(X)
               ^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/modules/sparse.py", line 190, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/home/pyUser/anaconda3/envs/pytorch/lib/python3.12/site-packages/torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 

The traceback now clearly points to the torch.nn.functional.embedding call.

Conclusion: X contains illegal token IDs (negative, or outside the vocabulary range).
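One way to confirm this on the data side is to scan a batch on the CPU before it reaches the embedding layer. A minimal sketch in plain Python (`find_illegal_ids` is a hypothetical helper, not part of the original code):

```python
def find_illegal_ids(batch, vocab_size):
    """Return (row, col, token_id) for every ID outside [0, vocab_size)."""
    return [(i, j, tok)
            for i, row in enumerate(batch)
            for j, tok in enumerate(row)
            if not 0 <= tok < vocab_size]

# A batch containing a negative ID and an ID equal to vocab_size:
batch = [[5, 17, -1, 2], [3, 31999, 32000, 0]]
print(find_illegal_ids(batch, 32000))  # -> [(0, 2, -1), (1, 2, 32000)]
```

Running this on the actual `X` tensor (after `X.tolist()`) pinpoints exactly which positions carry the bad IDs.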

Since SentencePiece is used for tokenization, first verify the control symbols:
import sentencepiece as spm

sp_model_path = './format_data/'
sp = spm.SentencePieceProcessor()
sp.load(sp_model_path + "spm_bpe.model")

print("Vocab size:", sp.vocab_size())
print("PAD ID:", sp.pad_id())
print("BOS ID:", sp.bos_id())
print("EOS ID:", sp.eos_id())
print("UNK ID:", sp.unk_id())

# Verify that these IDs fall within the legal range [0, vocab_size)
assert 0 <= sp.pad_id() < sp.vocab_size()
assert 0 <= sp.bos_id() < sp.vocab_size()
assert 0 <= sp.eos_id() < sp.vocab_size()
assert 0 <= sp.unk_id() < sp.vocab_size()

This showed that pad_id is -1.
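That explains the crash: if the data pipeline pads batches with `sp.pad_id()`, every padded position becomes -1, which is an out-of-range index for `nn.Embedding`. A rough sketch of how the bad ID propagates (`pad_batch` is a simplified stand-in for the real collate logic, not the original code):

```python
def pad_batch(seqs, pad_id, max_len):
    """Pad each sequence with pad_id up to max_len (simplified collate step)."""
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

pad_id = -1  # what sp.pad_id() returns from the broken SentencePiece model
seqs = [[1, 42, 2], [1, 7, 8, 9, 2]]
print(pad_batch(seqs, pad_id, 5))
# -> [[1, 42, 2, -1, -1], [1, 7, 8, 9, 2]]
# The -1 entries are exactly what trips the device-side assert in the
# embedding kernel on the GPU.
```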

The cause was a mistake in the original training command:
spm_train \
  --input=all_data_no_split.txt \
  --model_prefix=spm_bpe \
  --vocab_size=32000 \
  --model_type=bpe \
  --user_defined_symbols="<pad>,<bos>,<eos>" \ 
  --unk_id=0 \
  --bos_id=1 \
  --eos_id=2 \
  --pad_id=3 \
  --num_threads=16

The problem is this line: --user_defined_symbols="<pad>,<bos>,<eos>" \

Registering <pad>, <bos>, and <eos> as user-defined symbols conflicts with the explicit --unk_id/--bos_id/--eos_id/--pad_id assignments. Note also the stray space after the trailing backslash on that line, which breaks the shell line continuation, so the ID options after it are likely never passed at all; spm_train then falls back to its default of pad_id = -1.

After deleting that line and retraining the vocabulary, the run completed without errors.
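As a general tip for this class of error: running the same lookup on the CPU usually replaces the opaque CUDA assert with a readable Python exception that names the offending index. A minimal sketch (assumes torch is installed; the sizes are illustrative only):

```python
import torch

emb = torch.nn.Embedding(num_embeddings=32000, embedding_dim=8)
bad = torch.tensor([[5, 17, -1]])  # -1 mimics the broken pad_id

try:
    emb(bad)  # on CPU this raises IndexError instead of a device-side assert
except IndexError as e:
    print("caught:", e)
```

Temporarily moving the model and one batch to `device='cpu'` is often the fastest way to localize a "device-side assert triggered" error before reaching for CUDA_LAUNCH_BLOCKING.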
