[Pytorch] Yolov5: a consolidated record of the errors hit while moving from CPU to GPU

Yolov5 + moving from CPU to GPU + switching between Python versions + Conda package handling

Table of contents

    • Yolov5 + moving from CPU to GPU + switching between Python versions + Conda package handling
    • 1.Version mismatches within the Pytorch package set
    • 2.numpy stuck at its Python 3.8-era version, lagging behind pytorch 2.2.2
    • 3.ModuleNotFoundError: No module named 'pandas._libs.interval'
    • 4.ImportError: cannot import name '_c_internal_utils' from partially initialized module 'matplotlib' (most likely due to a circular import)
    • 5. Upgrading matplotlib alone leaves a dependency missing and not upgraded
    • 6.ImportError: The scipy install you are using seems to be broken, (extension modules cannot be imported)
    • 7.If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (out of GPU memory)
    • 8.NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend.

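1.Version mismatches within the Pytorch package set

This was the starting point of the whole series of painful errors that followed, including but not limited to: pytorch vs. torch, torch vs. torchvision, and numpy vs. Python; upgrading Python and numpy left several matplotlib versions behind; fixing matplotlib then broke the scipy package; and finally, once everything was repaired, the GPU ran out of memory and training could not start...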

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

torchvision 0.17.2 requires torch==2.2.2, but you have torch 1.8.0 which is incompatible.

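Solution

In contrast to the earlier case, where torch was upgraded automatically and torchvision was left behind, this time it was torch that got left behind.

The cause: the environment was created with conda create -n yolov5 python=3.8, because the original project used

torch==1.8.0 + Python 3.8.16

Once torch, torchvision and Python had all been upgraded from the old versions to new ones, all sorts of problems began to surface.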
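The pip message above boils down to an exact pin (`torch==2.2.2`) that the installed torch does not satisfy. As a rough sketch, the same check can be run on demand with the standard-library `importlib.metadata`. The helper names `pinned_conflicts` and `_installed` are hypothetical, and the sketch only looks at exact `==` pins (it ignores version ranges and wildcard pins), so it is not a substitute for pip's own resolver.

```python
import re
from importlib import metadata

# Matches requirement strings that carry an exact pin, e.g. "torch==2.2.2".
_PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*\(?\s*==\s*([^\s;,)]+)")


def pinned_conflicts(requires, installed_version):
    """Return messages for exact pins that the installed versions violate.

    requires: iterable of requirement strings (as from metadata.requires), or None
    installed_version: callable mapping a package name to its version, or None
    """
    problems = []
    for req in requires or []:
        match = _PIN.match(req)
        if not match:
            continue  # not an exact pin; this sketch ignores ranges
        name, wanted = match.groups()
        if wanted.endswith(".*"):
            continue  # wildcard pins are out of scope here
        have = installed_version(name)
        if have is None:
            continue  # dependency not installed at all
        # A local build label such as "+cpu" or "+cu121" does not break a plain pin.
        if have.split("+")[0] != wanted:
            problems.append(f"requires {name}=={wanted}, but {name} {have} is installed")
    return problems


def _installed(name):
    """Installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    try:
        reqs = metadata.requires("torchvision")
    except metadata.PackageNotFoundError:
        reqs = None
    for line in pinned_conflicts(reqs, _installed):
        print("torchvision", line)
```

Running it inside the activated environment prints nothing when torch and torchvision agree, and a line similar to pip's message when they do not.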

2.numpy stuck at its Python 3.8-era version, lagging behind pytorch 2.2.2

torch OSError: [WinError 126] The specified module could not be found

Upgrade numpy:

pip install --upgrade numpy

3.ModuleNotFoundError: No module named 'pandas._libs.interval'

(yolo5) C:\Users\ASUS\Desktop\yolo\yolov5>python train.py --img 640 --batch 32 --epoch 3 --data data/horse.yaml --cfg models/yolov5s.yaml --weights weights/yolov5s.pt

Traceback (most recent call last):
  File "C:\Users\ASUS\Desktop\yolo\yolov5\train.py", line 49, in <module>
    import val as validate # for end-of-epoch mAP
    ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\Desktop\yolo\yolov5\val.py", line 39, in <module>
    from models.common import DetectMultiBackend
  File "C:\Users\ASUS\Desktop\yolo\yolov5\models\common.py", line 18, in <module>
    import pandas as pd
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\pandas\__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev # pyright: ignore # noqa:F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\pandas\compat\__init__.py", line 25, in <module>
    from pandas.compat.numpy import (
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\pandas\compat\numpy\__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\pandas\util\__init__.py", line 2, in <module>
    from pandas.util._decorators import ( # noqa:F401
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\pandas\util\_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\pandas\_libs\__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
ModuleNotFoundError: No module named 'pandas._libs.interval'

One small compiled submodule of pandas is missing. Force a reinstall:

pip install --force-reinstall pandas

4.ImportError: cannot import name '_c_internal_utils' from partially initialized module 'matplotlib' (most likely due to a circular import)

Traceback (most recent call last):
  File "C:\Users\ASUS\Desktop\yolo\yolov5\models\common.py", line 27, in <module>
    import ultralytics
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\__init__.py", line 5, in <module>
    from ultralytics.data.explorer.explorer import Explorer
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\data\__init__.py", line 3, in <module>
    from .base import BaseDataset
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\data\base.py", line 17, in <module>
    from ultralytics.utils import DEFAULT_CFG, LOCAL_RANK, LOGGER, NUM_THREADS, TQDM
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\utils\__init__.py", line 21, in <module>
    import matplotlib.pyplot as plt
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\matplotlib\__init__.py", line 157, in <module>
    from . import _api, _version, cbook, _docstring, rcsetup
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\matplotlib\cbook\__init__.py", line 35, in <module>
    from matplotlib import _api, _c_internal_utils
ImportError: cannot import name '_c_internal_utils' from partially initialized module 'matplotlib' (mos

Traceback (most recent call last):
  File "C:\Users\ASUS\Desktop\yolo\yolov5\train.py", line 49, in <module>
    import val as validate # for end-of-epoch mAP
    ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\Desktop\yolo\yolov5\val.py", line 39, in <module>
    from models.common import DetectMultiBackend
  File "C:\Users\ASUS\Desktop\yolo\yolov5\models\common.py", line 34, in <module>
    import ultralytics
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\__init__.py", line 5, in <module>
    from ultralytics.data.explorer.explorer import Explorer
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\data\__init__.py", line 3, in <module>
    from .base import BaseDataset
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\data\base.py", line 17, in <module>
    from ultralytics.utils import DEFAULT_CFG, LOCAL_RANK, LOGGER, NUM_THREADS, TQDM
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\utils\__init__.py", line 21, in <module>
    import matplotlib.pyplot as plt
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\matplotlib\__init__.py", line 157, in <module>
    from . import _api, _version, cbook, _docstring, rcsetup
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\matplotlib\cbook\__init__.py", line 35, in <module>
    from matplotlib import _api, _c_internal_utils
ImportError: cannot import name '_c_internal_utils' from partially initialized module 'matplotlib' (most likely due to a circular import) (E:\anaconda3\envs\yolo5\Lib\site-packages\matplotlib\__init__.py)

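This one is more involved. Analysis:

1. A file in the project has the same name as matplotlib and shadows the library. The fix would be to rename that file (the library name cannot change, so the project simply must not contain a file with that name). After checking repeatedly I found no such file, so I dropped this theory.

2. After installing and uninstalling several different Python versions, more than one copy of matplotlib may be left behind. Uninstall the library with pip uninstall matplotlib (removing the related packages as well is advisable), then reinstall it with pip install matplotlib.
If a related package can no longer be found after you removed it, remember to reinstall it (this comes up in the very next section).

This approach fixed it.

Note that merely upgrading the package in place does not help.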

pip uninstall matplotlib
pip install matplotlib
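To check theory 2 before reinstalling, one option is to look for packages that have more than one `*.dist-info` metadata directory in the environment's site-packages. The following is a minimal sketch under the assumption that the leftovers show up as pip-style metadata directories; the function name `leftover_versions` is hypothetical, and a clean result does not rule out other kinds of residue.

```python
import sysconfig
from collections import defaultdict
from pathlib import Path


def leftover_versions(site_packages):
    """Map package name -> sorted versions, for every package that has more
    than one *.dist-info directory inside site_packages."""
    found = defaultdict(set)
    for entry in Path(site_packages).glob("*.dist-info"):
        # Directory names look like "matplotlib-3.8.4.dist-info".
        stem = entry.name[: -len(".dist-info")]
        name, sep, version = stem.rpartition("-")
        if sep:
            found[name.lower().replace("_", "-")].add(version)
    return {name: sorted(versions) for name, versions in found.items() if len(versions) > 1}


if __name__ == "__main__":
    # site-packages of the interpreter that runs this script
    site_packages = sysconfig.get_paths()["purelib"]
    for name, versions in leftover_versions(site_packages).items():
        print(f"{name}: {', '.join(versions)}")
```

Any package listed with two versions is a candidate for the uninstall-then-reinstall treatment described above.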

5. Upgrading matplotlib alone leaves a dependency missing and not upgraded

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ASUS\Desktop\yolo\yolov5\train.py", line 49, in <module>
    import val as validate # for end-of-epoch mAP
    ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\Desktop\yolo\yolov5\val.py", line 39, in <module>
    from models.common import DetectMultiBackend
  File "C:\Users\ASUS\Desktop\yolo\yolov5\models\common.py", line 34, in <module>
    import ultralytics
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\__init__.py", line 5, in <module>
    from ultralytics.data.explorer.explorer import Explorer
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\data\__init__.py", line 3, in <module>
    from .base import BaseDataset
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\data\base.py", line 17, in <module>
    from ultralytics.utils import DEFAULT_CFG, LOCAL_RANK, LOGGER, NUM_THREADS, TQDM
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\ultralytics\utils\__init__.py", line 21, in <module>
    import matplotlib.pyplot as plt
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\matplotlib\__init__.py", line 272, in <module>
    _check_versions()
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\matplotlib\__init__.py", line 266, in _check_versions
    module = importlib.import_module(modname)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\anaconda3\envs\yolo5\Lib\importlib\__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\kiwisolver\__init__.py", line 8, in <module>
    from ._cext import (
ModuleNotFoundError: No module named 'kiwisolver._cext'

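Reinstall the dependency

And here it is, as promised: a small related dependency (kiwisolver) has gone missing, so install everything once more.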

pip uninstall matplotlib kiwisolver
pip install matplotlib

6.ImportError: The scipy install you are using seems to be broken, (extension modules cannot be imported)

Traceback (most recent call last):
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\scipy\__init__.py", line 184, in <module>
    from scipy._lib._ccallback import LowLevelCallable
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\scipy\_lib\_ccallback.py", line 1, in <module>
    from . import _ccallback_c
ImportError: cannot import name '_ccallback_c' from 'scipy._lib' (E:\anaconda3\envs\yolo5\Lib\site-packages\scipy\_lib\__init__.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\ASUS\Desktop\yolo\yolov5\train.py", line 49, in <module>
    import val as validate # for end-of-epoch mAP
    ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\Desktop\yolo\yolov5\val.py", line 60, in <module>
    from utils.plots import output_to_target, plot_images, plot_val_study
  File "C:\Users\ASUS\Desktop\yolo\yolov5\utils\plots.py", line 18, in <module>
    from scipy.ndimage.filters import gaussian_filter1d
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\scipy\__init__.py", line 189, in <module>
    raise ImportError(msg) from e
ImportError: The scipy install you are using seems to be broken, (extension modules cannot be imported), please try reinstalling.

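scipy is a core package; when it breaks, repair it at the environment level.

This time, don't fix it with pip or from inside the activated conda prompt. Open a separate cmd terminal window instead.

Open a Windows console window:
Press Win + R to open the Run dialog and type cmd
Repair the broken package in the target environment by entering:
conda install -n <env-name> scipy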
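Sections 3 to 6 all fail the same way: a package is present but one of its compiled extension modules cannot be imported. Rather than discovering them one at a time through train.py, a small import smoke test can list every broken import in one pass. This is only a sketch; the helper name `broken_imports` is hypothetical, and the module list is taken from the tracebacks in this post.

```python
import importlib


def broken_imports(module_names):
    """Try each import and return {module name: error text} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, OSError (e.g. WinError 126), ...
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Modules that appeared in the tracebacks above.
    names = [
        "numpy",
        "pandas",
        "pandas._libs.interval",
        "matplotlib",
        "kiwisolver",
        "scipy",
        "scipy.ndimage",
        "torch",
        "torchvision",
    ]
    for name, error in broken_imports(names).items():
        print(f"{name}: {error}")
```

Run it with the environment's own Python interpreter; each reported module points at a package to reinstall.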

7.If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (out of GPU memory)

Traceback (most recent call last):
  File "C:\Users\ASUS\Desktop\yolo\yolov5\train.py", line 850, in <module>
    main(opt)
  File "C:\Users\ASUS\Desktop\yolo\yolov5\train.py", line 625, in main
    train(opt.hyp, opt, device, callbacks)
  File "C:\Users\ASUS\Desktop\yolo\yolov5\train.py", line 384, in train
    pred = model(imgs) # forward
           ^^^^^^^^^^^
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\Desktop\yolo\yolov5\models\yolo.py", line 263, in forward
    return self._forward_once(x, profile, visualize) # single-scale inference, train
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\Desktop\yolo\yolov5\models\yolo.py", line 167, in _forward_once
    x = m(x) # run
        ^^^^
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\torch\nn\modules\upsampling.py", line 157, in forward
    return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\anaconda3\envs\yolo5\Lib\site-packages\torch\nn\functional.py", line 4001, in interpolate
    return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 4.00 GiB of which 0 bytes is free. Of the allocated memory 3.55 GiB is allocated by PyTorch, and 37.67 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

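Analysis:

The GPU simply does not have enough memory (4.00 GiB in total, 3.55 GiB of it already allocated by PyTorch), and there is no way around that. I ran into the same situation earlier when running GLM3-32k-6B on a 4090, along with some truly outlandish errors. Still unresolved.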
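The message reports only 37.67 MiB as reserved but unallocated, so fragmentation is probably not the main cause here and the `expandable_segments` hint is unlikely to help much; the usual lever on a 4 GiB card is a smaller `--batch` (the run above used 32) or a smaller `--img`. As an untested sketch of that idea, the helper below halves the batch size until one training step fits. `largest_batch_that_fits` and `run_step` are hypothetical names, and `run_step` stands for whatever function runs a single forward/backward pass with a given batch size. It relies on `torch.cuda.OutOfMemoryError` being a subclass of `RuntimeError`.

```python
def largest_batch_that_fits(run_step, start=32, minimum=1):
    """Halve the batch size until run_step(batch) stops raising an
    out-of-memory error. Returns the batch size that worked, or None."""
    batch = start
    while batch >= minimum:
        try:
            run_step(batch)
            return batch
        except RuntimeError as exc:
            # torch.cuda.OutOfMemoryError derives from RuntimeError and its
            # message contains "out of memory"; anything else is re-raised.
            if "out of memory" not in str(exc).lower():
                raise
            batch //= 2
    return None


if __name__ == "__main__":
    # Stand-in for a real training step: pretend anything above 8 does not fit.
    def fake_step(batch):
        if batch > 8:
            raise RuntimeError("CUDA out of memory. Tried to allocate 26.00 MiB.")

    print(largest_batch_that_fits(fake_step))
```

With a real training step, it is worth calling `torch.cuda.empty_cache()` between attempts so memory cached by a failed attempt does not count against the next one.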

8.NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend.

NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit...
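This post records no fix for this error. A common cause, offered here as an assumption rather than a confirmed diagnosis for this machine, is a CPU-only torchvision build installed next to a CUDA build of torch, so that `nms` has no CUDA kernel. Wheels from the PyTorch download index usually carry a local build label such as `+cpu` or `+cu121` in their version string, so comparing the two labels is a quick first check. The helper names below are hypothetical, and builds without a label (for example conda packages) cannot be judged this way.

```python
def build_tag(version):
    """Return the local build label of a version string, e.g. 'cu121' or 'cpu'.
    Returns '' when the version carries no label."""
    _, _, tag = version.partition("+")
    return tag


def same_backend(torch_version, torchvision_version):
    """True when both version strings carry the same build label."""
    return build_tag(torch_version) == build_tag(torchvision_version)


if __name__ == "__main__":
    try:
        import torch
        import torchvision
    except Exception as exc:  # ImportError, OSError (e.g. WinError 126), ...
        print("import failed:", exc)
    else:
        print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
        print("torchvision:", torchvision.__version__)
        print("CUDA available:", torch.cuda.is_available())
        print("same build label:", same_backend(torch.__version__, torchvision.__version__))
```

If the labels differ, reinstalling torch and torchvision together from the same source is the usual remedy.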

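Well, if this helped you, feel free to copy it happily; it took effort to put together, so please credit the source when reposting qwq!

Better suggestions or comments are always welcome!

I am 亓云鹏 (亓, Qí), doing my best to share the joy of algorithms with everyone!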


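Next pitfall:

After fixing every problem I could think of, I nervously (just kidding) launched it

Training starts

A quick test of the training result

The resulting output image

Next, validation:

python val.py --weights runs/train/exp/weights/best.pt --data ./data/horse.yaml --img 320

A look at the result

Next pitfall: torch being upgraded automatically, which leaves torchvision inconsistent and raises errors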
