Table of contents
- [1 Principles](#1 Principles)
  - [1.1 SPP (Spatial Pyramid Pooling)](#1.1 SPP (Spatial Pyramid Pooling))
  - [1.2 SPPF (Spatial Pyramid Pooling - Fast)](#1.2 SPPF (Spatial Pyramid Pooling - Fast))
  - [1.3 SimSPPF (Simplified SPPF)](#1.3 SimSPPF (Simplified SPPF))
  - [1.4 ASPP (Atrous Spatial Pyramid Pooling)](#1.4 ASPP (Atrous Spatial Pyramid Pooling))
  - [1.5 RFB (Receptive Field Block)](#1.5 RFB (Receptive Field Block))
  - [1.6 SPPCSPC](#1.6 SPPCSPC)
  - [1.7 SPPFCSPC🍀](#1.7 SPPFCSPC🍀)
  - [1.8 SPPELAN](#1.8 SPPELAN)
- [2 Parameter count comparison](#2 Parameter count comparison)
- [3 How to apply the changes](#3 How to apply the changes)
- [4 Issue](#4 Issue)
- More of my YOLOv5 hands-on articles 🍀🌟🚀
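1 Principles
1.1 SPP (Spatial Pyramid Pooling)
The SPP module was proposed by Kaiming He et al. in the 2015 paper "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition".
SPP stands for spatial pyramid pooling. It was designed mainly to solve two problems:
- It avoids the image distortion caused by cropping or warping image regions to a fixed size;
- It removes the need for the CNN to repeatedly extract features from overlapping image regions, which greatly speeds up candidate-box processing and saves computation.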
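The first point can be made concrete with a minimal sketch of the paper-style SPP layer, which pools a feature map of any size into a fixed-length vector. The function name `spp_fixed_length` and the pyramid levels `(1, 2, 4)` are assumptions chosen for illustration, and `F.adaptive_max_pool2d` is used as a convenient stand-in for the paper's per-level bins. Note that the YOLO-style SPP shown next is different: it uses stride-1 pooling and keeps the spatial size.

```python
import torch
import torch.nn.functional as F


def spp_fixed_length(feat, levels=(1, 2, 4)):
    """Pool a (N, C, H, W) feature map into one fixed-length vector per image."""
    n = feat.shape[0]
    # Each level pools to a level x level grid, regardless of H and W
    pooled = [F.adaptive_max_pool2d(feat, output_size=level).reshape(n, -1) for level in levels]
    return torch.cat(pooled, dim=1)


# Two inputs with different spatial sizes give vectors of the same length
a = spp_fixed_length(torch.randn(1, 256, 13, 13))
b = spp_fixed_length(torch.randn(1, 256, 20, 31))
print(a.shape, b.shape)  # length is 256 * (1 + 4 + 16) in both cases
```

Because the output length depends only on the channel count and the pyramid levels, no cropping or warping of the input is needed before a fully connected layer.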
```python
import warnings

import torch
import torch.nn as nn

# Conv is the Conv2d + BatchNorm2d + SiLU block defined in YOLOv5's models/common.py


class SPP(nn.Module):
    # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
```
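1.2 SPPF (Spatial Pyramid Pooling - Fast)
SPPF was proposed by Glenn Jocher, the author of YOLOv5, as a reworking of SPP. It is considerably faster than SPP, hence the name SPP-Fast. Instead of three parallel max-pools with kernels 5, 9 and 13, it applies one 5×5 max-pool three times in series and concatenates the intermediate results, which is equivalent to SPP(k=(5, 9, 13)).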
```python
class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))
```
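The comment `equivalent to SPP(k=(5, 9, 13))` can be checked directly. The sketch below is not part of YOLOv5; the helper `pool` and the tensor size are assumptions for illustration. It relies on the fact that PyTorch max-pooling pads with negative infinity, so stacking stride-1 5×5 pools widens the window to 9×9 and then 13×13 even at the borders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 8, 32, 32)  # arbitrary feature map for the check


def pool(t, k):
    # Stride-1 "same" max-pool, as used in SPP and SPPF
    return F.max_pool2d(t, kernel_size=k, stride=1, padding=k // 2)


y1 = pool(x, 5)   # 5x5 window
y2 = pool(y1, 5)  # two stacked 5x5 pools
y3 = pool(y2, 5)  # three stacked 5x5 pools

same_9 = torch.equal(y2, pool(x, 9))
same_13 = torch.equal(y3, pool(x, 13))
print(same_9, same_13)
```

Both comparisons should report exact equality, since a maximum of maxima over overlapping windows is the maximum over their union.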
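1.3 SimSPPF (Simplified SPPF)
This module comes from Meituan's YOLOv6. As far as I can tell it differs from SPPF only in the activation function (ReLU instead of SiLU). In a quick test, a single ConvBNReLU ran about 18% faster than a ConvBNSiLU.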
```python
class SimConv(nn.Module):
    '''Normal Conv with ReLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))


class SimSPPF(nn.Module):
    '''Simplified SPPF with ReLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=5):
        super().__init__()
        c_ = in_channels // 2  # hidden channels
        self.cv1 = SimConv(in_channels, c_, 1, 1)
        self.cv2 = SimConv(c_ * 4, out_channels, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=kernel_size, stride=1, padding=kernel_size // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat([x, y1, y2, self.m(y2)], 1))
```
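A speed comparison like the ReLU versus SiLU one mentioned above could be reproduced roughly as follows. This is only a sketch of one way to time the two blocks on CPU, not the procedure actually used for the 18% figure; the helper names, channel count, input size and run counts are all assumptions.

```python
import time

import torch
import torch.nn as nn


def conv_bn_act(act):
    # 1x1 Conv + BN + activation; the channel count is arbitrary for this sketch
    return nn.Sequential(nn.Conv2d(64, 64, 1, bias=False), nn.BatchNorm2d(64), act).eval()


@torch.no_grad()
def mean_latency(module, x, warmup=3, runs=10):
    """Average forward time in seconds after a few warm-up runs."""
    for _ in range(warmup):
        module(x)
    start = time.perf_counter()
    for _ in range(runs):
        module(x)
    return (time.perf_counter() - start) / runs


x = torch.randn(1, 64, 80, 80)
t_relu = mean_latency(conv_bn_act(nn.ReLU()), x)
t_silu = mean_latency(conv_bn_act(nn.SiLU()), x)
print(f"ConvBNReLU: {t_relu * 1e3:.3f} ms, ConvBNSiLU: {t_silu * 1e3:.3f} ms")
```

The measured gap will vary with hardware, tensor size and backend. On a GPU, `torch.cuda.synchronize()` would be needed before reading the timer.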
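1.4 ASPP (Atrous Spatial Pyramid Pooling)
Inspired by SPP, the semantic segmentation model DeepLabv2 introduced the ASPP module (atrous spatial pyramid pooling). It runs several atrous (dilated) convolution layers with different sampling rates in parallel. The features extracted at each rate are processed in separate branches and then fused to produce the final result. Different dilation rates give convolution kernels with different receptive fields, which lets the module capture objects at multiple scales. The structure itself is fairly simple.
ASPP was introduced in DeepLab and refined in later DeepLab versions, for example by adding BN layers and depthwise separable convolutions, but the basic idea has stayed the same.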
```python
import torch.nn.functional as F


# without BN version
class ASPP(nn.Module):
    def __init__(self, in_channel=512, out_channel=256):
        super(ASPP, self).__init__()
        self.mean = nn.AdaptiveAvgPool2d((1, 1))  # (1, 1) is the output size
        self.conv = nn.Conv2d(in_channel, out_channel, 1, 1)
        self.atrous_block1 = nn.Conv2d(in_channel, out_channel, 1, 1)
        self.atrous_block6 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=6, dilation=6)
        self.atrous_block12 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=12, dilation=12)
        self.atrous_block18 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=18, dilation=18)
        self.conv_1x1_output = nn.Conv2d(out_channel * 5, out_channel, 1, 1)

    def forward(self, x):
        size = x.shape[2:]
        # Global-context branch: pool to 1x1, project, then resize back to the input size
        image_features = self.mean(x)
        image_features = self.conv(image_features)
        # F.upsample is deprecated; F.interpolate is its replacement
        image_features = F.interpolate(image_features, size=size, mode='bilinear', align_corners=False)
        atrous_block1 = self.atrous_block1(x)
        atrous_block6 = self.atrous_block6(x)
        atrous_block12 = self.atrous_block12(x)
        atrous_block18 = self.atrous_block18(x)
        net = self.conv_1x1_output(torch.cat([image_features, atrous_block1, atrous_block6,
                                              atrous_block12, atrous_block18], dim=1))
        return net
```
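Each atrous branch above uses `padding` equal to `dilation`, which is what keeps all branches at the input resolution so they can be concatenated. A small worked calculation, using the output-size formula documented for `nn.Conv2d`, shows this; the helper name `conv_out_size` and the 20-pixel feature map are assumptions for illustration.

```python
def conv_out_size(n, k, s=1, p=0, d=1):
    """Output length of a convolution along one axis."""
    effective_k = d * (k - 1) + 1  # span covered by a dilated kernel
    return (n + 2 * p - effective_k) // s + 1


for d in (6, 12, 18):
    effective_k = d * (3 - 1) + 1
    # A 3x3 kernel with dilation d spans effective_k pixels; padding=d keeps the size
    print(d, effective_k, conv_out_size(20, k=3, p=d, d=d))

# Without padding, the same dilated kernel would shrink the map
print(conv_out_size(20, k=3, p=0, d=6))
```

So the three dilated branches cover spans of 13, 25 and 37 pixels while still producing maps of the same size as the input.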
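1.5 RFB (Receptive Field Block)
The RFB module was proposed in the ECCV 2018 paper "Receptive Field Block Net for Accurate and Fast Object Detection". The paper's starting point is to mimic the receptive fields of the human visual system in order to strengthen the network's feature extraction. Structurally, RFB borrows from Inception: it adds dilated convolutions on top of an Inception-style multi-branch design, which effectively enlarges the receptive field.
The paper gives two architectures, RFB and RFB-s. RFB-s is used to mimic the smaller pRFs found in shallow human retinotopic maps, and uses more branches with smaller kernels.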
```python
class BasicConv(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1, relu=True, bn=True):
        super(BasicConv, self).__init__()
        self.out_channels = out_planes
        if bn:
            self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=False)
            self.bn = nn.BatchNorm2d(out_planes, eps=1e-5, momentum=0.01, affine=True)
            self.relu = nn.ReLU(inplace=True) if relu else None
        else:
            self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=True)
            self.bn = None
            self.relu = nn.ReLU(inplace=True) if relu else None

    def forward(self, x):
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x


class BasicRFB(nn.Module):
    def __init__(self, in_planes, out_planes, stride=1, scale=0.1, map_reduce=8, vision=1, groups=1):
        super(BasicRFB, self).__init__()
        self.scale = scale
        self.out_channels = out_planes
        inter_planes = in_planes // map_reduce
        self.branch0 = nn.Sequential(
            BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
            BasicConv(inter_planes, 2 * inter_planes, kernel_size=(3, 3), stride=stride, padding=(1, 1), groups=groups),
            BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision, dilation=vision, relu=False, groups=groups)
        )
        self.branch1 = nn.Sequential(
            BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
            BasicConv(inter_planes, 2 * inter_planes, kernel_size=(3, 3), stride=stride, padding=(1, 1), groups=groups),
            BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 2, dilation=vision + 2, relu=False, groups=groups)
        )
        self.branch2 = nn.Sequential(
            BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
            BasicConv(inter_planes, (inter_planes // 2) * 3, kernel_size=3, stride=1, padding=1, groups=groups),
            BasicConv((inter_planes // 2) * 3, 2 * inter_planes, kernel_size=3, stride=stride, padding=1, groups=groups),
            BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 4, dilation=vision + 4, relu=False, groups=groups)
        )
        self.ConvLinear = BasicConv(6 * inter_planes, out_planes, kernel_size=1, stride=1, relu=False)
        self.shortcut = BasicConv(in_planes, out_planes, kernel_size=1, stride=stride, relu=False)
        self.relu = nn.ReLU(inplace=False)

    def forward(self, x):
        x0 = self.branch0(x)
        x1 = self.branch1(x)
        x2 = self.branch2(x)
        out = torch.cat((x0, x1, x2), 1)
        out = self.ConvLinear(out)
        short = self.shortcut(x)
        out = out * self.scale + short
        out = self.relu(out)
        return out
```
1.6 SPPCSPC
This is the SPP structure used in YOLOv7. It performs better than SPPF, but the parameter count and computation are much higher.
```python
class SPPCSPC(nn.Module):
    # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 3, 1)
        self.cv4 = Conv(c_, c_, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1)
        self.cv6 = Conv(c_, c_, 3, 1)
        self.cv7 = Conv(2 * c_, c2, 1, 1)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))
```
```python
# Grouped SPPCSPC: after grouping, the parameter count and computation are close to the original;
# I have not tested how well it performs
class SPPCSPC_group(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC_group, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1, g=4)
        self.cv2 = Conv(c1, c_, 1, 1, g=4)
        self.cv3 = Conv(c_, c_, 3, 1, g=4)
        self.cv4 = Conv(c_, c_, 1, 1, g=4)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1, g=4)
        self.cv6 = Conv(c_, c_, 3, 1, g=4)
        self.cv7 = Conv(2 * c_, c2, 1, 1, g=4)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))
```
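The reason `g=4` shrinks the module can be seen from the weight count of a single grouped convolution. The helper below is a hypothetical illustration, and the 512-channel 3×3 layer is an assumed example rather than a layer taken from the model. It counts only the convolution weights, not the BatchNorm parameters.

```python
def conv_weight_count(c_in, c_out, k, groups=1):
    """Weights in a bias-free Conv2d: each output channel sees c_in / groups inputs."""
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * (c_in // groups) * k * k


dense = conv_weight_count(512, 512, 3)              # ordinary 3x3 convolution
grouped = conv_weight_count(512, 512, 3, groups=4)  # same layer with g=4
print(dense, grouped, dense // grouped)
```

With four groups each convolution carries a quarter of the weights, which is consistent with the grouped variant being much smaller than SPPCSPC.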
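1.7 SPPFCSPC🍀
Borrowing the idea behind SPPF, I optimized SPPCSPC into SPPFCSPC, which is faster while keeping the receptive field unchanged. I showed this module to the YOLOv7 author and it was not rejected; the detailed reply is in [4 Issue](#4 Issue).
This structure is now used in YOLOv6 3.0 and works well. See the YOLOv6 3.0 paper for detailed experimental results.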
```python
class SPPFCSPC(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=5):
        super(SPPFCSPC, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 3, 1)
        self.cv4 = Conv(c_, c_, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.cv5 = Conv(4 * c_, c_, 1, 1)
        self.cv6 = Conv(c_, c_, 3, 1)
        self.cv7 = Conv(2 * c_, c2, 1, 1)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        x2 = self.m(x1)
        x3 = self.m(x2)
        y1 = self.cv6(self.cv5(torch.cat((x1, x2, x3, self.m(x3)), 1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))
```
1.8 SPPELAN
This is a recently added module from YOLOv9. The idea is simple; give it a try if you are interested.
```python
import torch
import torch.nn as nn


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    # Pad to 'same' shape outputs
    if d > 1:
        k = (
            d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]
        )  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(
            c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False
        )
        self.bn = nn.BatchNorm2d(c2)
        self.act = (
            self.default_act
            if act is True
            else act
            if isinstance(act, nn.Module)
            else nn.Identity()
        )

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))


class SP(nn.Module):
    def __init__(self, k=3, s=1):
        super(SP, self).__init__()
        self.m = nn.MaxPool2d(kernel_size=k, stride=s, padding=k // 2)

    def forward(self, x):
        return self.m(x)


class SPPELAN(nn.Module):
    # spp-elan
    def __init__(self, c1, c2, c3):  # ch_in, ch_out, hidden channels
        super().__init__()
        self.c = c3
        self.cv1 = Conv(c1, c3, 1, 1)
        self.cv2 = SP(5)
        self.cv3 = SP(5)
        self.cv4 = SP(5)
        self.cv5 = Conv(4 * c3, c2, 1, 1)

    def forward(self, x):
        # 1x1 conv, then three serial 5x5 max-pools; concatenate all four tensors
        y = [self.cv1(x)]
        y.extend(m(y[-1]) for m in [self.cv2, self.cv3, self.cv4])
        return self.cv5(torch.cat(y, 1))
```
2 Parameter count comparison
Here I replaced the SPP module in yolov5s.yaml with each of the modules above.

| Model | Parameters | GFLOPs |
|---|---|---|
| SPP | 7225885 | 16.5 |
| SPPF | 7235389 | 16.5 |
| SimSPPF | 7235389 | 16.5 |
| ASPP | 15485725 | 23.1 |
| BasicRFB | 7895421 | 17.1 |
| SPPCSPC | 13663549 | 21.7 |
| SPPFCSPC🍀 | 13663549 | 21.7 |
| Grouped SPPCSPC | 8355133 | 17.4 |
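The table does not say how the figures were obtained. One common way to count parameters in PyTorch, shown here as a minimal sketch on a small stand-in block rather than on the full YOLOv5 model, is to sum `numel()` over `model.parameters()`. The helper name `count_parameters` and the 512-to-256 layer are assumptions for illustration.

```python
import torch.nn as nn


def count_parameters(model):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in model.parameters())


# Stand-in block: 1x1 conv (512 -> 256, no bias) followed by BatchNorm
block = nn.Sequential(nn.Conv2d(512, 256, 1, bias=False), nn.BatchNorm2d(256))
print(count_parameters(block))  # 512 * 256 conv weights + 2 * 256 BN weight and bias
```

BatchNorm running statistics are buffers, not parameters, so they are not included in this count.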
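3 How to apply the changes
Step 1: put the code for each module into common.py.
Step 2: add the class names in yolo.py (in the module list that parse_model checks).
Step 3: modify the config file.
The yolov5 config file looks like this: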
```yaml
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
   # [-1, 1, ASPP, [512]],  # 9
   # [-1, 1, SPP, [1024]],
   # [-1, 1, SimSPPF, [1024, 5]],
   # [-1, 1, BasicRFB, [1024]],
   # [-1, 1, SPPCSPC, [1024]],
   # [-1, 1, SPPFCSPC, [1024, 5]],  # 🍀
  ]
```
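Adding the class name in yolo.py matters because the model builder only fills in the input channels and scales the output channels for modules it recognizes. The sketch below is a simplified, hypothetical imitation of that logic, not the actual YOLOv5 code: `CHANNEL_MODULES` and `resolve_args` are made-up names, and modules are represented by strings for brevity.

```python
import math


def make_divisible(x, divisor=8):
    # Round up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


# Hypothetical stand-in for the modules the builder treats as taking (c1, c2, ...)
CHANNEL_MODULES = {"Conv", "C3", "SPP", "SPPF", "SimSPPF", "ASPP", "BasicRFB", "SPPCSPC", "SPPFCSPC"}


def resolve_args(module_name, args, c_in, width_multiple):
    """Turn the args of one YAML row into constructor args [c1, c2, *rest]."""
    if module_name not in CHANNEL_MODULES:
        return list(args)  # unregistered: args pass through, so c1 is never supplied
    c2 = make_divisible(args[0] * width_multiple, 8)
    return [c_in, c2, *args[1:]]


# yolov5s uses width_multiple 0.50, so 1024 in the YAML becomes 512 channels
print(resolve_args("SPPFCSPC", [1024, 5], c_in=512, width_multiple=0.5))
print(resolve_args("NotRegistered", [1024, 5], c_in=512, width_multiple=0.5))
```

One detail worth checking against the class signatures: with `SPPFCSPC(c1, c2, n=1, ..., k=5)` as defined earlier, the trailing `5` in `[1024, 5]` binds to `n` rather than `k`. Since `n` is unused and `k` already defaults to 5, the behaviour is the same.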
4 Issue
Q: Why use SPPCSPC instead of SPPFCSPC? yolov5's SPPF is much faster than SPP. Why not try to replace SPPCSPC with SPPFCSPC?
A: Max pooling uses very few computation, if you programming well, above one could run three max pool layers in parallel, while below one must process three max pool layers sequentially.
By the way, you could replace SPPCSPC by SPPFCSPC at inference time if your hardware is friendly to SPPFCSPC.
Give it a try if you are interested.
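More of my YOLOv5 hands-on articles 🍀🌟🚀
- Step by step: adding attention mechanisms to YOLOv5 (v6.2), part 1 (with structure diagrams of 30+ attention modules from top conferences) 🌟 highly recommended 🍀 8 newly added
- Spatial pyramid pooling improvements: SPP / SPPF / SimSPPF / ASPP / RFB / SPPCSPC / SPPFCSPC 🚀
Corrections are welcome. If you found this helpful, please give it a like 👍📖🌟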
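Update log (before 9:33 AM, August 16, 2022): added receptive-field annotations to the figures 🍀
Update log (11:40 PM, August 29, 2022): added the SimSPPF module and tested its speed
Update log (August 30, 2022): corrected the SPPCSPC structure diagram
Update log (August 30, 2022): added the SPPFCSPC structure
Update log (May 19, 2023): fixed a minor error in RFB
Update log (July 23, 2023): fixed a bug in RFB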