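[Object Detection] YOLOv7 Implementation (1): Building the Model

This series documents how I taught myself the YOLO family of object detection algorithms, and implemented them in code, during my master's studies. The implementation borrows from the ultralytics YOLO source code on GitHub, with parts of the source removed to suit my own research needs.

Building on the earlier YOLOv5 implementation, this article goes on to implement YOLOv7. Compared with YOLOv5, the main differences in YOLOv7 are:

Model structure: a more efficient feature extraction module (ELAN), a downsampling module (MP), a different spatial pyramid pooling layer (SPPCSPC), and re-parameterized convolution (RepConv).

Positive sample matching: combines the positive sample matching method of YOLOv5 with the positive sample selection method of YOLOX (SimOTA).

Articles in this series:
YOLOv7 Implementation (1): Building the Model
YOLOv7 Implementation (2): Positive Sample Matching (SimOTA) and Loss Computation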

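Contents

[1 Model Structure](#1 Model Structure)
[2 Module Implementation (common.py)](#2 Module Implementation (common.py))
- [2.1 CBS Module](#2.1 CBS Module)
- [2.2 ELAN Module](#2.2 ELAN Module)
- [2.3 MP Module](#2.3 MP Module)
- [2.4 SPPCSPC Module](#2.4 SPPCSPC Module)
- [2.5 RepConv Module](#2.5 RepConv Module)
  - [2.5.1 Parameter Fusion](#2.5.1 Parameter Fusion)
  - [2.5.2 Implementation](#2.5.2 Implementation)
- [2.6 Detect (Coupled Head + Anchor-based) Module](#2.6 Detect (Coupled Head + Anchor-based) Module)
[3 Building the Model Configuration File (model.yaml)](#3 Building the Model Configuration File (model.yaml))
[4 Building the Model (yolo.py)](#4 Building the Model (yolo.py))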

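1 Model Structure

Figure 1 shows the YOLOv7 model structure, which is built from the following modules:

CBS: a convolution layer, batch normalization (BN), and the SiLU activation function
ELAN: a feature extraction module that fuses multiple gradient paths; depending on which paths are fused, it comes in two variants, ELAN-B (used in the backbone) and ELAN-H (used in the head)
MP: a downsampling module that combines max pooling with CBS
SPPCSPC: a cross-stage partial spatial pyramid pooling module
RepConv: re-parameterized convolution
Detect: the detection head (coupled head + anchor-based)

Figure 1: YOLOv7 network structure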

2 Module Implementation (common.py)

2.1 CBS Module

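import torch
import torch.nn as nn


class Conv(nn.Module):
    '''
    Convolution block: Conv -> BN -> activation
    '''
    default_act = nn.SiLU()  # default activation function

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True, b=False):
        '''
        :param c1: number of input channels
        :param c2: number of output channels
        :param k: kernel size
        :param s: stride
        :param p: padding; None (default) pads automatically so that a stride-1 output keeps the input resolution
        :param g: number of groups; 1 (default) is a standard convolution
        :param d: dilation; 1 (default) is a standard convolution, larger values insert gaps between kernel taps
        :param act: True for the default activation, an nn.Module to use that activation, anything else for none
        :param b: whether the convolution has a bias; defaults to no bias
        '''
        super().__init__()
        # autopad is the padding helper from the ultralytics source; it is not defined in this article
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=b)
        self.bn = nn.BatchNorm2d(c2)
        # act is True: default activation; act is an nn.Module: that module; otherwise nn.Identity (pass-through)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        # Forward pass
        return self.act(self.bn(self.conv(x)))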
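
`Conv` calls `autopad`, which this article does not define. A minimal sketch of such a helper, modeled on the one in the ultralytics source, might look like the following. The exact signature used in the author's `common.py` is an assumption, and `conv_out_size` is a hypothetical helper added only to check the resulting output size.

```python
def autopad(k, p=None, d=1):
    """Return the padding that keeps the output size equal to the input size at stride 1.

    k: kernel size (int or sequence of ints)
    p: explicit padding; returned unchanged when given
    d: dilation
    """
    if d > 1:
        # Effective kernel size of a dilated convolution
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]
    if p is None:
        # "Same" padding for odd kernel sizes
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p


def conv_out_size(n, k, s=1, p=0, d=1):
    """Output length of a convolution along one spatial dimension."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


print(autopad(3))        # padding for a 3x3 kernel
print(autopad(1))        # padding for a 1x1 kernel
print(autopad(3, d=2))   # padding for a dilated 3x3 kernel
print(conv_out_size(640, k=3, s=1, p=autopad(3)))  # stride 1 keeps the size
print(conv_out_size(640, k=3, s=2, p=autopad(3)))  # stride 2 halves it
```

With this padding, a stride-1 `Conv` preserves the feature-map size and a stride-2 `Conv` halves it for even inputs, which is what the downsampling layers in the configuration file rely on.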

2.2 ELAN Module

ELAN-B

class ELAN_B(nn.Module):
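    '''
    Feature extraction module of YOLOv7 (backbone part).
    Uses a dense residual structure and gains accuracy from considerable added depth; the skip
    connections in its internal residual blocks mitigate the vanishing-gradient problem that
    comes with deeper networks.
    '''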
    def __init__(self, c1, c2, e=0.5):
        super().__init__()

        c_ = int(c1 * e)
        self.cv1 = Conv(c1, c_, 1, 1)

        self.cv2 = Conv(c1, c_, 1, 1)

        self.cv3 = nn.Sequential(Conv(c_, c_, 3, 1),
                                 Conv(c_, c_, 3, 1))

        self.cv4 = nn.Sequential(Conv(c_, c_, 3, 1),
                                 Conv(c_, c_, 3, 1))

        self.cv5 = Conv(c_ * 4, c2, 1, 1)

    def forward(self, x):
        y1 = self.cv1(x)  # c_ channels (c1 // 2 when e=0.5)
        y2 = self.cv2(x)  # c_ channels
        y3 = self.cv3(y2)  # c_ channels
        y4 = self.cv4(y3)  # c_ channels

        return self.cv5(torch.cat([y1, y2, y3, y4], dim=1))

ELAN-H

class ELAN_H(nn.Module):
    def __init__(self, c1, c2, e=0.5):
        '''Feature extraction module of YOLOv7 (head part)'''
        super().__init__()
        c_ = int(c1 * e)
        c__ = c_ // 2
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c__, 3, 1)
        self.cv4 = Conv(c__, c__, 3, 1)
        self.cv5 = Conv(c__, c__, 3, 1)
        self.cv6 = Conv(c__, c__, 3, 1)
        self.cv7 = Conv(c_ * 2 + c__ * 4, c2, 1, 1)

    def forward(self, x):
        y1 = self.cv1(x)

        y2 = self.cv2(x)
        y3 = self.cv3(y2)
        y4 = self.cv4(y3)
        y5 = self.cv5(y4)
        y6 = self.cv6(y5)

        return self.cv7(torch.cat([y1, y2, y3, y4, y5, y6], dim=1))
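
Both ELAN variants end with a 1×1 convolution over the concatenated branch outputs, so the input width of that last convolution follows directly from the constructor arguments. The bookkeeping can be sketched in plain Python as below. The helper names `elan_b_concat_channels` and `elan_h_concat_channels` are illustrative only and are not part of `common.py`.

```python
def elan_b_concat_channels(c1, e=0.5):
    """Channels entering cv5 in ELAN_B: four branches of c_ channels each."""
    c_ = int(c1 * e)
    return 4 * c_


def elan_h_concat_channels(c1, e=0.5):
    """Channels entering cv7 in ELAN_H: two branches of c_ and four of c_ // 2."""
    c_ = int(c1 * e)
    c__ = c_ // 2
    return 2 * c_ + 4 * c__


for c1 in (128, 256, 512):
    print(c1, elan_b_concat_channels(c1), elan_h_concat_channels(c1))

# The last backbone ELAN_B in the configuration file uses e=0.25 with 1024 input channels
print(elan_b_concat_channels(1024, e=0.25))
```

With the default `e=0.5`, both variants concatenate `2 * c1` channels before the final 1×1 convolution; lowering `e` to 0.25 in the last backbone stage keeps the concatenated width at 1024 instead of 2048.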

2.3 MP Module

class MP(nn.Module):
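    '''
    Downsampling module of YOLOv7
    '''
    def __init__(self, c1, c2):
        super().__init__()
        # MP-1: output channels equal input channels
        if c1 == c2:
            c_ = c1 // 2
        # MP-2: output channels are twice the input channels
        else:
            c_ = c1
        # Branch 1: max pooling followed by a 1x1 convolution
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.cv1 = Conv(c1, c_, 1, 1)
        # Branch 2: 1x1 convolution followed by a stride-2 3x3 convolution
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 3, 2)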
        self.cv3 = Conv(c_, c_, 3, 2)

    def forward(self, x):
        o1 = self.cv1(self.maxpool(x))
        o2 = self.cv3(self.cv2(x))
        return torch.cat([o1, o2], dim=1)
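
The two MP variants differ only in the width of each branch. A small shape calculation, written here as a hypothetical helper `mp_out_shape` that mirrors the constructor logic above, makes the effect on channels and resolution explicit. Even input height and width are assumed, since otherwise the two branches would produce different sizes.

```python
def mp_out_shape(c1, c2, h, w):
    """Return (channels, height, width) produced by MP for a (c1, h, w) input."""
    # MP-1 (c1 == c2) halves each branch; MP-2 keeps c1 channels per branch
    c_ = c1 // 2 if c1 == c2 else c1
    # Branch 1: 2x2 max pooling with stride 2
    h_pool, w_pool = h // 2, w // 2
    # Branch 2: 3x3 convolution with stride 2 and padding 1
    h_conv, w_conv = (h + 2 - 3) // 2 + 1, (w + 2 - 3) // 2 + 1
    assert (h_pool, w_pool) == (h_conv, w_conv), "branches must agree to be concatenated"
    return 2 * c_, h_pool, w_pool


print(mp_out_shape(256, 256, 160, 160))  # MP-1 as used in the backbone
print(mp_out_shape(128, 256, 80, 80))    # MP-2 as used in the head
```

In both cases the resolution is halved. MP-1 keeps the channel count, while MP-2 doubles it, which is why the head can concatenate its output with a feature map from the previous scale.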

2.4 SPPCSPC Module

class SPPCSPC(nn.Module):
    def __init__(self, c1, c2, k=(5, 9, 13), e=0.5):
        super().__init__()
        c_ = int(2 * c2 * e)
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 3, 1)
        self.cv4 = Conv(c_, c_, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1)
        self.cv6 = Conv(c_, c_, 3, 1)
        self.cv7 = Conv(2 * c_, c2, 1, 1)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))
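
The three max-pooling layers in `SPPCSPC` use stride 1 and padding `k // 2`, so each of them keeps the spatial size and the four tensors can be concatenated along the channel dimension. A quick check of this arithmetic, using a hypothetical helper `pool_out_size` and an assumed 20×20 feature map:

```python
def pool_out_size(n, k, s=1, p=0):
    """Output length of max pooling along one dimension (floor mode, no dilation)."""
    return (n + 2 * p - k) // s + 1


n = 20  # assumed size, e.g. the P5 feature map of a 640x640 input
sizes = [pool_out_size(n, k, s=1, p=k // 2) for k in (5, 9, 13)]
print(sizes)

c2, e = 512, 0.5
c_ = int(2 * c2 * e)
print("channels into cv5:", 4 * c_)  # x1 plus three pooled copies
print("channels into cv7:", 2 * c_)  # pooled branch plus the cv2 shortcut
```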

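2.5 RepConv Module

2.5.1 Parameter Fusion

The structure of the RepConv module is shown in Figure 1. At inference time it merges its parallel convolution branches into a single convolution, which makes the computation more efficient.

Convolution layer + BN layer → convolution layer

A convolution layer computes: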
$$\mathrm{Conv}(x) = W(x) + b$$

In PyTorch, the convolution weight is conv.weight and the bias is conv.bias.

A BN layer computes:
$$\mathrm{BN}(x) = \gamma \frac{x - \mathrm{mean}}{\sqrt{\mathrm{var}}} + \beta$$

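In PyTorch, γ = bn.weight, mean = bn.running_mean, var = bn.running_var, and β = bn.bias. PyTorch also adds a small constant bn.eps to var before taking the square root, as the code in 2.5.2 does.

Fusing the convolution layer with the BN layer gives: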

$$\mathrm{BN}(\mathrm{Conv}(x)) = \gamma \frac{(W(x) + b) - \mathrm{mean}}{\sqrt{\mathrm{var}}} + \beta = \mathrm{new\_W}(x) + \mathrm{new\_b}$$
$$\mathrm{new\_W}(x) = \frac{\gamma W(x)}{\sqrt{\mathrm{var}}}, \quad \mathrm{new\_b} = \frac{\gamma (b - \mathrm{mean})}{\sqrt{\mathrm{var}}} + \beta$$

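BN layer → convolution layer: construct a 1×1 convolution that acts as an identity mapping (all weights zero except ones on the channel diagonal, no bias), then fuse it with the BN layer using the convolution + BN rule
Convolution (3×3) + convolution (1×1) → convolution (3×3): zero-pad the 1×1 kernel to 3×3, then add the weights and the biases of the two convolutions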
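
The convolution + BN rule can be checked numerically without any framework. The sketch below treats a 1×1 convolution at a single pixel as a matrix-vector product and compares BN(Conv(x)) with the fused layer. All numbers are made up for illustration, the function names are hypothetical, and a small `eps` is added to the variance in the way PyTorch's `BatchNorm2d` does.

```python
import math


def conv1x1(W, b, x):
    """1x1 convolution at one pixel: y[o] = sum_i W[o][i] * x[i] + b[o]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bo for row, bo in zip(W, b)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm applied per output channel."""
    return [g * (yo - m) / math.sqrt(v + eps) + bt
            for yo, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fuse_conv_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN statistics into the convolution weight and bias."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    new_W = [[s * w for w in row] for row, s in zip(W, scale)]
    new_b = [s * (bo - m) + bt for s, bo, m, bt in zip(scale, b, mean, beta)]
    return new_W, new_b


# Made-up parameters: 3 input channels, 2 output channels
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.3]
mean, var = [0.4, -0.1], [2.0, 0.5]
x = [1.0, 2.0, -1.0]

reference = batchnorm(conv1x1(W, b, x), gamma, beta, mean, var)
new_W, new_b = fuse_conv_bn(W, b, gamma, beta, mean, var)
fused = conv1x1(new_W, new_b, x)

# The two results should agree up to floating-point rounding
print(all(math.isclose(r, f, abs_tol=1e-9) for r, f in zip(reference, fused)))
```

The same scaling applies unchanged to a 3×3 kernel, because BN acts per output channel and every tap of that channel's kernel is multiplied by the same factor.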

2.5.2 Implementation

class RepConv(nn.Module):
    def __init__(self, c1, c2, k=3, s=1, p=None, g=1, act=True, deploy=False):
        '''
        Re-parameterized convolution
        Training:
            deploy = False
            outputs of rbr_dense (3x3 conv), rbr_1x1 (1x1 conv) and rbr_identity (only when c2 == c1 and s == 1) are summed
            rbr_reparam does not exist
        Inference:
            deploy = True
            rbr_reparam = Conv2d
            rbr_dense, rbr_1x1, rbr_identity = None, None, None
        '''
        super().__init__()

        self.deploy = deploy
        self.groups = g
        self.in_channels = c1
        self.out_channels = c2

        assert k == 3
        assert autopad(k, p) == 1

        padding_11 = autopad(k, p) - k // 2

        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

        # Inference: a single 3x3 convolution
        if self.deploy:
            self.rbr_reparam = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=True)
        else:
            # Identity branch (a BN layer), only when input and output channels match and stride is 1
            self.rbr_identity = (nn.BatchNorm2d(num_features=c1) if c2 == c1 and s == 1 else None)
            # 3x3 convolution + BN layer
            self.rbr_dense = nn.Sequential(
                nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False),
                nn.BatchNorm2d(num_features=c2),
            )
            # 1x1 convolution + BN layer
            self.rbr_1x1 = nn.Sequential(
                nn.Conv2d(c1, c2, 1, s, padding_11, groups=g, bias=False),
                nn.BatchNorm2d(num_features=c2),
            )

    def forward(self, x):
        # Inference
        if hasattr(self, 'rbr_reparam'):
            return self.act(self.rbr_reparam(x))

        # Training
        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(x)

        return self.act(self.rbr_dense(x) + self.rbr_1x1(x) + id_out)

    # Fuse a convolution layer and a BN layer: Conv2d + BN = Conv2d
    def fuse_conv_bn(self, conv, bn):
        std = (bn.running_var + bn.eps).sqrt()
        bias = bn.bias - bn.running_mean * bn.weight / std

        t = (bn.weight / std).reshape(-1, 1, 1, 1)
        weights = conv.weight * t

        bn = nn.Identity()
        conv = nn.Conv2d(in_channels=conv.in_channels,
                         out_channels=conv.out_channels,
                         kernel_size=conv.kernel_size,
                         stride=conv.stride,
                         padding=conv.padding,
                         dilation=conv.dilation,
                         groups=conv.groups,
                         bias=True,
                         padding_mode=conv.padding_mode)

        conv.weight = torch.nn.Parameter(weights)
        conv.bias = torch.nn.Parameter(bias)
        return conv

    # Re-parameterization (run once before inference)
    def fuse_repvgg_block(self):
        if self.deploy:
            return
        print(f"RepConv.fuse_repvgg_block")

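        # Fuse the 3x3 convolution and its BN layer into one 3x3 convolution (with bias)
        self.rbr_dense = self.fuse_conv_bn(self.rbr_dense[0], self.rbr_dense[1])
        # Fuse the 1x1 convolution and its BN layer into one 1x1 convolution (with bias)
        self.rbr_1x1 = self.fuse_conv_bn(self.rbr_1x1[0], self.rbr_1x1[1])
        rbr_1x1_bias = self.rbr_1x1.bias
        # Zero-pad the 1x1 kernel to the size of the 3x3 kernel
        weight_1x1_expanded = torch.nn.functional.pad(self.rbr_1x1.weight, [1, 1, 1, 1])
        # Express the identity branch as a 1x1 convolution (no bias) with an identity kernel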
        if isinstance(self.rbr_identity, nn.BatchNorm2d) or isinstance(self.rbr_identity, nn.modules.batchnorm.SyncBatchNorm):
            identity_conv_1x1 = nn.Conv2d(
                    in_channels=self.in_channels,
                    out_channels=self.out_channels,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                    groups=self.groups,
                    bias=False)
            identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.to(self.rbr_1x1.weight.data.device)
            identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.squeeze().squeeze()
            identity_conv_1x1.weight.data.fill_(0.0)
            identity_conv_1x1.weight.data.fill_diagonal_(1.0)
            identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.unsqueeze(2).unsqueeze(3)
            # Fuse this 1x1 convolution with the BN layer of the identity branch
            identity_conv_1x1 = self.fuse_conv_bn(identity_conv_1x1, self.rbr_identity)
            bias_identity_expanded = identity_conv_1x1.bias
            weight_identity_expanded = torch.nn.functional.pad(identity_conv_1x1.weight, [1, 1, 1, 1])
        else:
            bias_identity_expanded = torch.nn.Parameter(torch.zeros_like(rbr_1x1_bias))
            weight_identity_expanded = torch.nn.Parameter(torch.zeros_like(weight_1x1_expanded))

        # Add the weights and biases of the 3x3 kernel, the padded 1x1 kernel and the identity kernel
        self.rbr_dense.weight = torch.nn.Parameter(
            self.rbr_dense.weight + weight_1x1_expanded + weight_identity_expanded)
        self.rbr_dense.bias = torch.nn.Parameter(self.rbr_dense.bias + rbr_1x1_bias + bias_identity_expanded)

        self.rbr_reparam = self.rbr_dense
        self.deploy = True

        if self.rbr_identity is not None:
            del self.rbr_identity
            self.rbr_identity = None

        if self.rbr_1x1 is not None:
            del self.rbr_1x1
            self.rbr_1x1 = None

        if self.rbr_dense is not None:
            del self.rbr_dense
            self.rbr_dense = None
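
The branch merge in `fuse_repvgg_block` rests on convolution being linear in its kernel: a 1×1 kernel zero-padded to 3×3 and an identity kernel can simply be added to the 3×3 kernel. The plain-Python sketch below checks this on a single channel with stride 1 and zero padding of 1. BN is omitted (it is assumed to have been folded into each branch already), the values are arbitrary, and the helper names are hypothetical.

```python
def conv2d_same(img, kernel):
    """Single-channel 3x3 cross-correlation with stride 1 and zero padding 1."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += kernel[di][dj] * img[y][x]
            out[i][j] = acc
    return out


def pad_1x1_to_3x3(w):
    """Place a 1x1 weight at the center of an otherwise zero 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, w, 0.0], [0.0, 0.0, 0.0]]


img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, -1.0, 0.0], [2.0, 1.0, -2.0], [0.0, 1.0, 0.5]]  # 3x3 branch
k1 = 0.25                                                    # 1x1 branch
k_id = 1.0                                                   # identity branch

# Training-time form: three branches evaluated separately, then summed
dense = conv2d_same(img, k3)
branches = [[dense[i][j] + k1 * img[i][j] + k_id * img[i][j] for j in range(3)]
            for i in range(3)]

# Deploy-time form: one 3x3 kernel holding the sum of the three kernels
p1, pid = pad_1x1_to_3x3(k1), pad_1x1_to_3x3(k_id)
merged_kernel = [[k3[a][b] + p1[a][b] + pid[a][b] for b in range(3)] for a in range(3)]
merged = conv2d_same(img, merged_kernel)

max_diff = max(abs(branches[i][j] - merged[i][j]) for i in range(3) for j in range(3))
print(max_diff < 1e-9)
```

Only the center tap of the merged kernel differs from the 3×3 branch, which is why the fused module costs no more at inference than a plain 3×3 convolution.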

2.6 Detect (Coupled Head + Anchor-based) Module

The implementation of the Detect module is described in the article "YOLOv5 Implementation (2): Building the Model".

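3 Building the Model Configuration File (model.yaml)

The configuration file is built from the model structure in Figure 1 and the arguments each module requires. Each layer entry has four fields, [from, number, module, args]:

from: the layer(s) whose output feeds the current layer
number: how many times the current layer is repeated
module: the module used by the current layer (implemented in common.py; the name must match the class name)
args: the module arguments. For the convolution-based modules, the first argument is the number of output channels of the current layer and the rest are module-specific. The number of input channels is determined by the layer(s) that "from" points to and is added when the configuration is parsed.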

# Parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple (scales the number of module repeats)
width_multiple: 1.0  # layer channel multiple (scales the number of channels)
anchors:
  - [10,13, 16,30, 33,23]  # P3/8 (anchors for the stride-8 feature map, small objects)
  - [30,61, 62,45, 59,119]  # P4/16 (anchors for the stride-16 feature map)
  - [116,90, 156,198, 373,326]  # P5/32 (anchors for the stride-32 feature map, large objects)

backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [32, 3, 1]],  # 0
   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2
   [-1, 1, Conv, [64, 3, 1]],  # 2

   [-1, 1, Conv, [128, 3, 2]], # 3-P2/4
   [-1, 1, ELAN_B, [256]],       # 4

   [-1, 1, MP, [256]],         # 5-P3/8
   [-1, 1, ELAN_B, [512]],       # 6

   [ -1, 1, MP, [512]],        # 7-P4/16
   [ -1, 1, ELAN_B, [1024]],     # 8

   [ -1, 1, MP, [1024]],        # 9-P5/32
   [ -1, 1, ELAN_B, [1024, 0.25]],# 10
  ]

head:
  [[-1, 1, SPPCSPC, [512]],     # 11

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [8, 1, Conv, [256, 1, 1]],  # route backbone P4
   [[-1, -2], 1, Concat, [1]],

   [-1, 1, ELAN_H, [256]],  # 16

   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [6, 1, Conv, [128, 1, 1]], # route backbone P3
   [[-1, -2], 1, Concat, [1]],

   [-1, 1, ELAN_H, [128]],  # 21 (P3/8-small)

   [-1, 1, MP, [256]],
   [[-1, 16], 1, Concat, [1]],

   [-1, 1, ELAN_H, [256]],  # 24 (P4/16-medium)

   [-1, 1, MP, [512]],
   [[-1, 11], 1, Concat, [1]],

   [-1, 1, ELAN_H, [512]],  # 27 (P5/32-large)

   [21, 1, RepConv, [256, 3, 1]], # 28
   [24, 1, RepConv, [512, 3, 1]], # 29
   [27, 1, RepConv, [1024, 3, 1]], # 30

   [[28, 29, 30], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
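
How `from` and `args` resolve into input and output channels can be traced with a few lines of plain Python. The sketch below hard-codes the layer list from the configuration above (rather than parsing the YAML, and leaving out the final Detect layer) and applies the channel rule of each module. `trace_channels` is a hypothetical helper, not the parser from `yolo.py`, and a 3-channel input image is assumed.

```python
# (from, module, output channels) for layers 0..30; None means "derived from the inputs"
LAYERS = [
    (-1, "Conv", 32), (-1, "Conv", 64), (-1, "Conv", 64), (-1, "Conv", 128),
    (-1, "ELAN_B", 256), (-1, "MP", 256), (-1, "ELAN_B", 512), (-1, "MP", 512),
    (-1, "ELAN_B", 1024), (-1, "MP", 1024), (-1, "ELAN_B", 1024),       # 0-10 backbone
    (-1, "SPPCSPC", 512), (-1, "Conv", 256), (-1, "Upsample", None),
    (8, "Conv", 256), ([-1, -2], "Concat", None), (-1, "ELAN_H", 256),  # 11-16
    (-1, "Conv", 128), (-1, "Upsample", None), (6, "Conv", 128),
    ([-1, -2], "Concat", None), (-1, "ELAN_H", 128),                    # 17-21
    (-1, "MP", 256), ([-1, 16], "Concat", None), (-1, "ELAN_H", 256),   # 22-24
    (-1, "MP", 512), ([-1, 11], "Concat", None), (-1, "ELAN_H", 512),   # 25-27
    (21, "RepConv", 256), (24, "RepConv", 512), (27, "RepConv", 1024),  # 28-30
]


def trace_channels(layers, ch_in=3):
    """Return (input channels, output channels) for every layer."""
    out_ch, result = [], []
    for i, (frm, module, c2) in enumerate(layers):
        sources = frm if isinstance(frm, list) else [frm]
        c_in = []
        for f in sources:
            j = i + f if f < 0 else f  # negative indices are relative to the current layer
            c_in.append(ch_in if j < 0 else out_ch[j])  # j < 0 means the input image
        if module == "Concat":
            c_out = sum(c_in)
        elif module == "Upsample":
            c_out = c_in[0]
        elif module == "MP":
            # MP-1 keeps the channel count, MP-2 doubles it (see the MP class)
            c_out = c_in[0] if c_in[0] == c2 else 2 * c_in[0]
        else:
            c_out = c2
        out_ch.append(c_out)
        result.append((c_in[0] if len(c_in) == 1 else c_in, c_out))
    return result


channels = trace_channels(LAYERS)
for idx in (5, 15, 22, 23, 26, 30):
    print(idx, LAYERS[idx][1], channels[idx])
```

Tracing the list this way is a cheap sanity check after editing the configuration: every Concat should receive the widths you expect, and each MP in the head should output the width of the feature map it is concatenated with.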

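4 Building the Model (yolo.py)

The model is built as described in the article "YOLOv5 Implementation (2): Building the Model". For YOLOv7, the only addition is the following method on the model class, which fuses the RepConv modules.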

    def fuse(self):
        print('Fusing layers...')
        for m in self.model.modules():
            if isinstance(m, RepConv):
                m.fuse_repvgg_block()
        return self