YoloV9 Improvement Strategy: Downsampling | A Self-Developed Downsampling Module (Exclusive) | Accuracy Gains | With Structure Diagram


Abstract

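This article introduces a downsampling module I developed myself. It is a general-purpose improvement: it can be used in the backbone of classification networks as well as in segmentation and super-resolution tasks. Some readers have already used it to improve ConvNeXt with very good results; in my view, combined with a few other improvements, it could support a paper at a top venue such as CVPR or ECCV.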

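In this post I use the module to improve YoloV9. On the dataset used here, mAP50-95 rises from 0.732 to 0.738, mAP50 from 0.989 to 0.990, and precision from 0.878 to 0.952, while recall drops from 0.991 to 0.974 (see the test results below).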

The self-developed downsampling module and its variants

Improvement method 1

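The input is processed by two branches. The first branch is a single stride-2 convolution. The second branch applies a 1×1 convolution and splits the result in half along the channel dimension: one half goes through MaxPool and the other through AvgPool (both with stride 2). Finally, all outputs are concatenated along the channel dimension. The code is as follows: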

```python
import torch
import torch.nn as nn

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only (used after BatchNorm has been fused into the conv)."""
        return self.act(self.conv(x))

class DownSimper(nn.Module):
    """DownSimper: stride-2 downsampling that concatenates a dilated-conv branch with max-pool and avg-pool branches."""

    def __init__(self, c1, c2):
        super().__init__()
        self.c = c2 // 2  # each of the two branches produces c2 // 2 channels
        self.cv1 = Conv(c1, self.c, 3, 2, d=3)  # 3x3 conv, stride 2, dilation 3 (effective 7x7, padding 3)
        self.cv2 = Conv(c1, self.c, 1, 1, 0)  # 1x1 conv that feeds the pooling branch

    def forward(self, x):
        x1 = self.cv1(x)  # branch 1: dilated stride-2 convolution
        x = self.cv2(x)
        x2, x3 = x.chunk(2, 1)  # branch 2: split in half along the channel dimension
        x2 = torch.nn.functional.max_pool2d(x2, 3, 2, 1)  # first half: 3x3 max pooling, stride 2
        x3 = torch.nn.functional.avg_pool2d(x3, 3, 2, 1)  # second half: 3x3 average pooling, stride 2
        return torch.cat((x1, x2, x3), 1)  # c2 output channels when c2 is even
```
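The concatenation returns exactly c2 channels whenever c2 is even. The small pure-Python helper below is a hypothetical illustration (not part of the module) that reproduces the channel bookkeeping, assuming torch.chunk's documented behavior of making each chunk ceil(n / chunks) channels wide and leaving any remainder to the last chunk.

```python
def downsimper_channels(c2):
    """Hypothetical helper: channel widths of the three DownSimper paths for a given c2."""
    c = c2 // 2               # output width of both cv1 and cv2
    first = (c + 1) // 2      # chunk(2) gives the first chunk ceil(c / 2) channels
    second = c - first        # the remaining channels form the second chunk
    return c, first, second   # (conv path, max-pool path, avg-pool path)


for c2 in (128, 256, 512, 129):
    paths = downsimper_channels(c2)
    print(c2, paths, sum(paths))
```

With an odd c2 the module would silently output c2 - 1 channels (for example, c2 = 129 gives 128), so even widths such as the 128, 256, and 512 used in the yaml file are the safe choice.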

Structure diagram:

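In the left-hand convolution of the diagram, d=3 means a dilated (atrous) convolution, which gives a larger receptive field. With k=3 and d=3, the effective kernel size is d×(k-1)+1 = 7, so the kernel covers a 7×7 window while still using only 3×3 = 9 weights.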
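The d=3 arithmetic, and the reason the three paths can be concatenated, can be checked with PyTorch's output-size formula for convolution and pooling in floor mode, out = floor((h + 2p - d(k - 1) - 1) / s) + 1. The helper names below are illustrative. The sketch assumes the settings in the code above: the dilated branch uses k=3, s=2, d=3 with the padding autopad computes (7 // 2 = 3), and both pooling paths use k=3, s=2, p=1.

```python
def effective_kernel(k, d):
    """Span covered by a k x k kernel with dilation d."""
    return d * (k - 1) + 1


def out_size(h, k, s, p, d=1):
    """Output length along one axis for conv/pool layers (PyTorch formula, floor mode)."""
    return (h + 2 * p - d * (k - 1) - 1) // s + 1


k_eff = effective_kernel(3, 3)  # 7, so autopad chooses padding 7 // 2 = 3
for h in (640, 641, 80, 81):
    conv = out_size(h, k=3, s=2, p=k_eff // 2, d=3)  # cv1: dilated stride-2 conv
    pool = out_size(h, k=3, s=2, p=1)                # max-pool and avg-pool paths
    print(h, conv, pool)
```

Both expressions reduce to floor((h - 1) / 2) + 1, so the paths agree for even and odd input sizes alike.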

Baseline results of the official YoloV9

```text
yolov9 summary: 580 layers, 60567520 parameters, 0 gradients, 264.3 GFLOPs
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 15/15 00:02
                   all        230       1412      0.878      0.991      0.989      0.732
                   c17        230        131       0.92      0.992      0.994      0.797
                    c5        230         68      0.828          1      0.992      0.807
            helicopter        230         43      0.895      0.977      0.969      0.634
                  c130        230         85      0.955      0.999      0.994      0.684
                   f16        230         57      0.839      0.965      0.966      0.689
                    b2        230          2          1      0.978      0.995      0.647
                 other        230         86       0.91      0.942      0.957      0.525
                   b52        230         70      0.917      0.971      0.979      0.806
                  kc10        230         62      0.958      0.984      0.987      0.826
               command        230         40      0.964          1      0.995      0.815
                   f15        230        123      0.939      0.995      0.995      0.702
                 kc135        230         91      0.949      0.989      0.978      0.691
                   a10        230         27      0.863      0.963      0.982      0.458
                    b1        230         20      0.926          1      0.995      0.712
                   aew        230         25      0.929          1      0.993      0.812
                   f22        230         17      0.835          1      0.995      0.706
                    p3        230        105       0.97          1      0.995      0.804
                    p8        230          1      0.566          1      0.995      0.697
                   f35        230         32      0.908          1      0.995      0.547
                   f18        230        125      0.956      0.992      0.993      0.828
                   v22        230         41      0.921          1      0.995      0.682
                 su-27        230         31      0.925          1      0.994      0.832
                 il-38        230         27      0.899          1      0.995      0.816
                tu-134        230          1      0.346          1      0.995      0.895
                 su-33        230          2       0.96          1      0.995      0.747
                 an-70        230          2      0.718          1      0.995      0.796
                 tu-22        230         98      0.912          1      0.995      0.804
```

How to apply the improvement

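Copy the code into common.py, as shown in the figure below. If your common.py already defines autopad and Conv (the one in the YOLOv9 repository does), adding only the DownSimper class avoids duplicate definitions.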

In yolo.py, add a DownSimper branch to the parse_model function, as shown in the figure below:

Code:

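```python
        elif m is DownSimper:
            c2 = args[0]  # output channels from the yaml entry, e.g. [128]
            c1 = ch[f]  # input channels: output width of the "from" layer
            args = [c1, c2]  # constructor arguments for DownSimper(c1, c2)
```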
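To illustrate what this elif branch does, the sketch below is a deliberately simplified, hypothetical re-creation of parse_model's channel bookkeeping for the backbone of the yaml file that follows. Module names are plain strings, and it assumes YOLOv5-style behavior (the input-channel entry of ch is replaced after layer 0); width/depth multipliers, make_divisible, and actual module construction are left out.

```python
# Backbone entries copied from the modified yolov9.yaml (module names as strings).
backbone = [
    [-1, 1, "Silence", []],
    [-1, 1, "DownSimper", [128]],
    [-1, 1, "DownSimper", [256]],
    [-1, 1, "RepNCSPELAN4", [256, 128, 64, 1]],
    [-1, 1, "DownSimper", [512]],
    [-1, 1, "RepNCSPELAN4", [512, 256, 128, 1]],
    [-1, 1, "DownSimper", [512]],
    [-1, 1, "RepNCSPELAN4", [512, 512, 256, 1]],
    [-1, 1, "DownSimper", [512]],
    [-1, 1, "RepNCSPELAN4", [512, 512, 256, 1]],
]


def simulate(layers, in_ch=3):
    """Hypothetical, simplified channel tracking; returns resolved args and per-layer output widths."""
    ch, resolved = [in_ch], []
    for i, (f, n, m, args) in enumerate(layers):
        if m == "DownSimper":
            c1, c2 = ch[f], args[0]
            args = [c1, c2]  # same rule as the elif branch above
        elif m == "RepNCSPELAN4":
            c1, c2 = ch[f], args[0]
            args = [c1, c2, *args[1:]]  # input width prepended, remaining args kept
        else:
            c2 = ch[f]  # e.g. Silence passes its input width through
        if i == 0:
            ch = []  # the raw input-channel entry is dropped after layer 0
        ch.append(c2)
        resolved.append((m, args))
    return resolved, ch


resolved, ch = simulate(backbone)
for i, (m, args) in enumerate(resolved):
    print(i, m, args)
```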

Modify the models/detect/yolov9.yaml configuration file; the backbone and head sections become:

```yaml
# YOLOv9 backbone
backbone:
  [
   [-1, 1, Silence, []],

   # conv down
   [-1, 1, DownSimper, [128]],  # 1-P1/2

   # conv down
   [-1, 1, DownSimper, [256]],  # 2-P2/4

   # elan-1 block
   [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]],  # 3

   # conv down
   [-1, 1, DownSimper, [512]],  # 4-P3/8

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]],  # 5

   # conv down
   [-1, 1, DownSimper, [512]],  # 6-P4/16

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 7

   # conv down
   [-1, 1, DownSimper, [512]],  # 8-P5/32

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 9
  ]

# YOLOv9 head
head:
  [
   # elan-spp block
   [-1, 1, SPPELAN, [512, 256]],  # 10

   # up-concat merge
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 7], 1, Concat, [1]],  # cat backbone P4

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 13

   # up-concat merge
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 5], 1, Concat, [1]],  # cat backbone P3

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [256, 256, 128, 1]],  # 16 (P3/8-small)

   # conv-down merge
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 13], 1, Concat, [1]],  # cat head P4

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 19 (P4/16-medium)

   # conv-down merge
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 22 (P5/32-large)

   # routing
   [5, 1, CBLinear, [[256]]], # 23
   [7, 1, CBLinear, [[256, 512]]], # 24
   [9, 1, CBLinear, [[256, 512, 512]]], # 25

   # conv down
   [0, 1, Conv, [64, 3, 2]],  # 26-P1/2

   # conv down
   [-1, 1, Conv, [128, 3, 2]],  # 27-P2/4

   # elan-1 block
   [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]],  # 28

   # conv down fuse
   [-1, 1, Conv, [256, 3, 2]],  # 29-P3/8
   [[23, 24, 25, -1], 1, CBFuse, [[0, 0, 0]]], # 30

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]],  # 31

   # conv down fuse
   [-1, 1, Conv, [512, 3, 2]],  # 32-P4/16
   [[24, 25, -1], 1, CBFuse, [[1, 1]]], # 33

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 34

   # conv down fuse
   [-1, 1, Conv, [512, 3, 2]],  # 35-P5/32
   [[25, -1], 1, CBFuse, [[2]]], # 36

   # elan-2 block
   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 37

   # detect
   [[31, 34, 37, 16, 19, 22], 1, DualDDetect, [nc]],  # DualDDetect(A3, A4, A5, P3, P4, P5)
  ]
```
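As a quick sanity check of the pyramid labels in the comments (P1/2 through P5/32), the sketch below tracks the feature-map size after each DownSimper layer of the backbone. It assumes a square 640×640 input (the --imgsz default in the training settings) and that the layers in between (Silence and the RepNCSPELAN4 blocks) are stride-1 and keep the spatial size.

```python
# DownSimper layers in the modified backbone: index -> (output channels, pyramid label).
DOWNSIMPER_LAYERS = {1: (128, "P1/2"), 2: (256, "P2/4"), 4: (512, "P3/8"), 6: (512, "P4/16"), 8: (512, "P5/32")}


def feature_sizes(imgsz=640):
    """Spatial side length after each DownSimper layer for a square imgsz x imgsz input."""
    sizes, h = {}, imgsz
    for idx in sorted(DOWNSIMPER_LAYERS):
        h = (h - 1) // 2 + 1  # output size shared by the dilated-conv and pooling paths
        sizes[idx] = h
    return sizes


for idx, h in feature_sizes(640).items():
    c, label = DOWNSIMPER_LAYERS[idx]
    print(f"layer {idx} ({label}): {c} x {h} x {h}")
```

Layers 5, 7, and 9, which feed the CBLinear routing layers, therefore sit at 80×80, 40×40, and 20×20 for a 640 input, matching the P3/8, P4/16, and P5/32 strides.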

Change the default hyperparameters in the train_dual.py script as follows:

```python
    # excerpt from parse_opt() in train_dual.py
    parser.add_argument('--weights', type=str, default='', help='initial weights path')
    parser.add_argument('--cfg', type=str, default='models/detect/yolov9.yaml', help='model.yaml path')
    parser.add_argument('--data', type=str, default=ROOT / 'data/VOC.yaml', help='dataset.yaml path')
    parser.add_argument('--epochs', type=int, default=100, help='total training epochs')
    parser.add_argument('--batch-size', type=int, default=8, help='total batch size for all GPUs, -1 for autobatch')
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='train, val image size (pixels)')
    parser.add_argument('--workers', type=int, default=0, help='max dataloader workers (per RANK in DDP mode)')
    parser.add_argument('--project', default=ROOT / 'runs/train', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
```
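Editing the defaults is one option; argparse also lets each run override them from the command line (for example `python train_dual.py --epochs 300 --batch-size 16`). The sketch below rebuilds a few of the options above in isolation to show how defaults and overrides interact. It is not the full train_dual.py parser, and ROOT is replaced by plain relative paths so the snippet runs on its own.

```python
import argparse


def build_parser():
    """Trimmed-down copy of a few train_dual.py options, for illustration only."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--cfg', type=str, default='models/detect/yolov9.yaml', help='model.yaml path')
    parser.add_argument('--data', type=str, default='data/VOC.yaml', help='dataset.yaml path')
    parser.add_argument('--epochs', type=int, default=100, help='total training epochs')
    parser.add_argument('--batch-size', type=int, default=8, help='total batch size for all GPUs')
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='train, val image size (pixels)')
    parser.add_argument('--workers', type=int, default=0, help='max dataloader workers')
    return parser


# No arguments: the edited defaults apply (dest names: batch_size, imgsz).
opt = build_parser().parse_args([])
print(opt.cfg, opt.epochs, opt.batch_size, opt.imgsz)

# Per-run overrides leave the script's defaults untouched; '--img' is an alias of '--imgsz'.
opt = build_parser().parse_args(['--epochs', '300', '--batch-size', '16', '--img', '320'])
print(opt.epochs, opt.batch_size, opt.imgsz)
```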

Test results

```text
yolov9 summary: 595 layers, 58708576 parameters, 0 gradients, 274.0 GFLOPs
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 15/15 00:35
                   all        230       1412      0.952      0.974       0.99      0.738
                   c17        230        131      0.981      0.992      0.995      0.832
                    c5        230         68      0.963      0.985      0.995      0.847
            helicopter        230         43      0.968       0.93      0.972      0.635
                  c130        230         85      0.988      0.996      0.995      0.669
                   f16        230         57      0.976      0.947      0.975      0.687
                    b2        230          2      0.767          1      0.995      0.516
                 other        230         86      0.981      0.907      0.968      0.573
                   b52        230         70      0.969      0.971      0.985      0.812
                  kc10        230         62      0.986      0.984      0.989      0.835
               command        230         40      0.988          1      0.995       0.82
                   f15        230        123      0.965      0.992      0.989      0.697
                 kc135        230         91      0.984      0.989      0.981      0.725
                   a10        230         27          1      0.794      0.976      0.495
                    b1        230         20      0.979          1      0.995      0.682
                   aew        230         25      0.944          1      0.995      0.802
                   f22        230         17          1      0.881      0.992      0.717
                    p3        230        105      0.981      0.992      0.995       0.81
                    p8        230          1      0.756          1      0.995      0.697
                   f35        230         32       0.99      0.938      0.982       0.55
                   f18        230        125      0.981      0.992      0.991      0.829
                   v22        230         41       0.99          1      0.995      0.684
                 su-27        230         31      0.985          1      0.995      0.849
                 il-38        230         27      0.987          1      0.995       0.84
                tu-134        230          1      0.756          1      0.995      0.895
                 su-33        230          2       0.99          1      0.995      0.747
                 an-70        230          2      0.838          1      0.995      0.848
                 tu-22        230         98          1          1      0.995      0.832
```
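To compare the two runs at a glance, the snippet below takes the model summaries and the "all" rows from the baseline and DownSimper logs and prints absolute and relative changes. The numbers are copied from the logs; nothing else is assumed.

```python
# Model summaries and "all"-class rows copied from the two logs.
baseline = {"params": 60567520, "GFLOPs": 264.3, "P": 0.878, "R": 0.991, "mAP50": 0.989, "mAP50-95": 0.732}
downsimper = {"params": 58708576, "GFLOPs": 274.0, "P": 0.952, "R": 0.974, "mAP50": 0.990, "mAP50-95": 0.738}

for key in baseline:
    delta = downsimper[key] - baseline[key]
    rel = 100 * delta / baseline[key]  # relative change in percent
    print(f"{key:>9}: {baseline[key]:>10} -> {downsimper[key]:>10}  ({delta:+.4g}, {rel:+.2f}%)")
```

For this dataset, the gains are concentrated in precision (+0.074) and mAP50-95 (+0.006), recall falls by 0.017, and the model has about 3.1% fewer parameters but about 3.7% more GFLOPs.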

Summary

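In this post I presented a self-developed downsampling module and used it to improve YoloV9. You are welcome to try it on your own datasets.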

Code:

Link: https://pan.baidu.com/s/1QFrpJuHOaDpKTEmBIIxdYw?pwd=cpvq
Extraction code: cpvq