[Infrared Small Target Detection in Practice, Part 4] Improvements with Pinwheel Convolution and Multi-Level Neck Fusion

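Continuing from the previous post: with the overall model structure otherwise unchanged, removing the P1 head and halving the channel counts while still training four heads sharply reduced parameters and computation, and recall even improved. FLOPs dropped considerably, but recall was only 0.78, so there is still room for improvement. This post experiments with a reworked Neck and with inserting pinwheel convolution (PConv).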

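Optimization ideas for this round

The P1 level is very expensive to compute, so a detection head on P1 is unfriendly to edge devices. It is therefore removed, and the improvement effort goes into the structure of the feature-fusion part instead.

For the first two layers, use a module that is friendly to small targets. In the Neck, repeat the fusion several times, ideally against the original high-resolution feature maps each time, so that information is not distorted.

The P5 detection head can be dropped: it does little for small targets and only adds computation.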
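
To put rough numbers on why P1 is costly, the sketch below counts grid cells per pyramid level. It assumes a 640×640 input (the imgsz used in the training script later in this post) and the usual YOLO convention that level Pn has stride 2^n. The helper name grid_cells is made up for illustration.

```python
# Assumption: level Pn has stride 2**n, and the input is 640x640.
def grid_cells(imgsz: int, level: int) -> int:
    """Number of spatial positions on the feature map of pyramid level `level`."""
    side = imgsz // (2 ** level)
    return side * side


for level in range(1, 6):
    print(f"P{level}: {grid_cells(640, level)} cells")
```

Each level has four times as many positions as the next coarser one, so a head on P1 has to process four times the positions of a head on P2, while P5 covers only a 20×20 grid.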

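Model information

Model structure diagram

The experiments are again built on a modified YOLOv11:

The model structure is designed with reference to PRNet and pinwheel convolution.

Parameter count analysis

Total FLOPs are low, only 16.7G, and the parameter count is also small at 2.2M, on par with yolov11-n.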

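Pinwheel convolution

Pinwheel convolution (PConv) is designed to match the Gaussian-like spatial distribution of infrared small targets, in which the response is concentrated at the center and decays toward the edges. Through asymmetric padding and grouped convolution it builds a receptive field that is dense at the center and spreads out toward the edges, which strengthens low-level feature extraction.

A standard convolution uses symmetric padding and a fixed-size kernel (for example 3×3). It extracts features from every region uniformly, cannot prioritize the key information at the center, and has a limited receptive field.

PConv is designed around the Gaussian-like distribution described above: "asymmetric padding + parallel branch convolutions" produce a Gaussian-like receptive field with high weight at the center and low weight at the edges. This makes the convolution fit the imaging characteristics of infrared small targets more closely, while avoiding the wasted information and redundant parameters of a standard convolution.

PConv uses asymmetric padding to create horizontal and vertical convolution kernels for different regions of the image, and these kernels spread outward. Here h1, w1, and c1 denote the height, width, and channel count of the input tensor X(h1, w1, c1). To improve training stability and speed, batch normalization (BN) and the SiLU activation are applied after every convolution. The first layer of PConv applies these convolutions in parallel.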
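
Read off from the PConv code listing later in this post (not copied from the paper, whose notation may differ), the four parallel branches and the merging step can be written as below. The symbols are introduced here for illustration: P_{(l,r,t,b)} denotes zero padding by l, r, t, b pixels on the left, right, top, and bottom, k is the kernel length, W_w is a 1×k kernel, W_h is a k×1 kernel, and W_c is the final 2×2 kernel.

```latex
\begin{aligned}
X_1 &= \mathrm{SiLU}\big(\mathrm{BN}\big(P_{(k,0,1,0)}(X) \otimes W_w^{(1,k)}\big)\big) \\
X_2 &= \mathrm{SiLU}\big(\mathrm{BN}\big(P_{(0,k,0,1)}(X) \otimes W_w^{(1,k)}\big)\big) \\
X_3 &= \mathrm{SiLU}\big(\mathrm{BN}\big(P_{(0,1,k,0)}(X) \otimes W_h^{(k,1)}\big)\big) \\
X_4 &= \mathrm{SiLU}\big(\mathrm{BN}\big(P_{(1,0,0,k)}(X) \otimes W_h^{(k,1)}\big)\big) \\
Y   &= \mathrm{SiLU}\big(\mathrm{BN}\big(\mathrm{Cat}(X_1, X_2, X_3, X_4) \otimes W_c^{(2,2)}\big)\big)
\end{aligned}
```

In that listing the two horizontal branches share one set of weights and the two vertical branches share another, which is why only W_w and W_h appear. Each branch outputs a quarter of the output channels, and the concatenation restores the full count before the 2×2 convolution.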

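Advantages of pinwheel convolution

Receptive field expansion: with grouped convolution and asymmetric padding, PConv enlarges the receptive field considerably, by 177%-444% relative to a standard 3×3 convolution.
Parameter efficiency: PConv enlarges the receptive field while adding only a small number of parameters (111%-122%), which markedly improves parameter efficiency.
Feature extraction: infrared small targets usually show a Gaussian-like gray-level distribution that is bright at the center and dark at the edges. With "four-branch asymmetric padding + interleaved convolution", PConv builds a receptive field with high weight at the center and low weight at the edges.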
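
The receptive-field figures can be checked with a little arithmetic. The sketch below assumes that a PConv with kernel length k covers a (2k-1)×(2k-1) footprint. That is an assumption made here because it reproduces the quoted 177% and 444% for k = 3 and k = 4, not something stated in this post. The function name is illustrative.

```python
def rf_increase_percent(k: int, base: int = 3) -> float:
    """Relative growth in receptive-field area versus a base x base convolution.

    Assumption: a PConv with kernel length k covers a (2k-1) x (2k-1) area.
    """
    pconv_area = (2 * k - 1) ** 2
    base_area = base ** 2
    return (pconv_area - base_area) / base_area * 100


for k in (3, 4):
    print(k, int(rf_increase_percent(k)))
```

Under this assumption, k = 3 gives a 5×5 footprint (25 positions against 9) and k = 4 gives 7×7 (49 against 9).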

Code changes

YAML configuration file

# Repeated high-resolution fusion in the Neck; the first two layers use pinwheel convolution (PConv) for feature extraction

nc: 1  # number of classes

backbone:
  # [from, repeats, module, args]
  - [-1, 1, PConv, [32, 4, 2]]           
  - [-1, 1, PConv, [64, 3, 2]]           
  - [-1, 1, C3k2, [64, True, 0.25]] # 2  P2
  - [-1, 1, Conv, [64, 3, 2]]
  - [-1, 1, C3k2, [128, True, 0.25]] # 4  P3
  - [-1, 1, Conv, [128, 3, 2]]
  - [-1, 1, C3k2, [128, False]]  # 6  P4
  - [-1, 1, Conv, [256, 3, 2]]
  - [-1, 1, C3k2, [256, False]]  # 8
  - [-1, 1, SPPF, [256, 5]]
  - [-1, 1, C2PSA, [256]]  # 10

head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 1, C3k2, [128, False]]  # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 1, C3k2, [64, False]]  # 16

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P2
  - [-1, 1, C3k2, [32, False]]  # 19

  # part 2
  - [-1, 1, Conv, [64, 3, 2]]           
  - [-1, 1, C3k2, [128, False]]  # 21 
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 1, C3k2, [64, False]]

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]]  # cat backbone P2
  - [-1, 1, C3k2, [64, False]]  # 26

  # part 3
  - [-1, 1, Conv, [32, 3, 2]]
  - [-1, 1, Conv, [64, 3, 2]]  # 28
  - [-1, 1, C3k2, [128, False]] # 29

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]]
  - [-1, 1, C3k2, [64, False]]  # 32

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]]
  - [[-1, 26], 1, Concat, [1]]
  - [-1, 1, C3k2, [32, False]]  # 36

  - [[29, 32, 36], 1, Detect, [nc]]  # Detect(P4, P3, P2)
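
As a sanity check on the layer indices, the sketch below traces the stride of every layer in the config above. Each layer is reduced to its `from` field and one of four made-up kinds (down for a stride-2 PConv/Conv, up for nn.Upsample, cat for Concat, same for everything else), and channel arguments are ignored. It is a bookkeeping aid written for this post, not part of Ultralytics.

```python
D, S, U, C = "down", "same", "up", "cat"

# (from, kind) per layer, in the same order as the YAML above.
LAYERS = [
    (-1, D), (-1, D), (-1, S), (-1, D), (-1, S), (-1, D), (-1, S),  # 0-6
    (-1, D), (-1, S), (-1, S), (-1, S),                              # 7-10
    (-1, U), ([-1, 6], C), (-1, S),                                  # 11-13
    (-1, U), ([-1, 4], C), (-1, S),                                  # 14-16
    (-1, U), ([-1, 2], C), (-1, S),                                  # 17-19
    (-1, D), (-1, S), ([-1, 4], C), (-1, S),                         # 20-23
    (-1, U), ([-1, 2], C), (-1, S),                                  # 24-26
    (-1, D), (-1, D), (-1, S),                                       # 27-29
    (-1, U), ([-1, 4], C), (-1, S),                                  # 30-32
    (-1, U), ([-1, 2], C), ([-1, 26], C), (-1, S),                   # 33-36
]


def trace_strides(layers):
    """Return the stride of each layer's output relative to the input image."""
    strides = []
    for i, (src, kind) in enumerate(layers):
        sources = [src] if isinstance(src, int) else src
        # Negative indices are relative to the current layer, others absolute.
        idx = [i + s if s < 0 else s for s in sources]
        ins = [strides[j] if j >= 0 else 1 for j in idx]  # j == -1 is the image
        if len(set(ins)) != 1:
            raise ValueError(f"layer {i}: mismatched input strides {ins}")
        stride = ins[0]
        if kind == D:
            stride *= 2
        elif kind == U:
            stride //= 2
        strides.append(stride)
    return strides


strides = trace_strides(LAYERS)
print([strides[i] for i in (29, 32, 36)])
```

Under these assumptions every Concat receives inputs of matching resolution, and the three Detect inputs (layers 29, 32, 36) come out at strides 16, 8, and 4, that is P4, P3, and P2 in that order.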

ultralytics/nn/modules

Add a new file, APConv.py

import torch
import torch.nn as nn
import torch.nn.functional as F


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only, skipping batch norm (used once BN has been fused into the conv)."""
        return self.act(self.conv(x))


class PConv(nn.Module):  
    ''' Pinwheel-shaped Convolution using the Asymmetric Padding method. '''
    
    def __init__(self, c1, c2, k, s):
        super().__init__()

        # self.k = k
        p = [(k, 0, 1, 0), (0, k, 0, 1), (0, 1, k, 0), (1, 0, 0, k)]
        self.pad = [nn.ZeroPad2d(padding=(p[g])) for g in range(4)]
        self.cw = Conv(c1, c2 // 4, (1, k), s=s, p=0)
        self.ch = Conv(c1, c2 // 4, (k, 1), s=s, p=0)
        self.cat = Conv(c2, c2, 2, s=1, p=0)

    def forward(self, x):
        yw0 = self.cw(self.pad[0](x))
        yw1 = self.cw(self.pad[1](x))
        yh0 = self.ch(self.pad[2](x))
        yh1 = self.ch(self.pad[3](x))
        return self.cat(torch.cat([yw0, yw1, yh0, yh1], dim=1))


class APC2f(nn.Module):
    """Faster Implementation of APCSP Bottleneck with Asymmetric Padding convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, P=True, g=1, e=0.5):
        """Initialize CSP bottleneck layer with two convolutions with arguments ch_in, ch_out, number, shortcut, groups,
        expansion.
        """
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        if P:
            self.m = nn.ModuleList(APBottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))
        else:
            self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through APC2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


class APBottleneck(nn.Module):
    """Asymmetric Padding bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and
        expansion.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        p = [(2,0,2,0),(0,2,0,2),(0,2,2,0),(2,0,0,2)]
        self.pad = [nn.ZeroPad2d(padding=(p[g])) for g in range(4)]
        self.cv1 = Conv(c1, c_ // 4, k[0], 1, p=0)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Run the four asymmetrically padded branches through cv1, concatenate them, apply cv2, and add the shortcut if enabled."""
        y = self.cv2(torch.cat([self.cv1(self.pad[g](x)) for g in range(4)], 1))
        return x + y if self.add else y
    

class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(
        self, c1: int, c2: int, shortcut: bool = True, g: int = 1, k: tuple[int, int] = (3, 3), e: float = 0.5
    ):
        """
        Initialize a standard bottleneck module.

        Args:
            c1 (int): Input channels.
            c2 (int): Output channels.
            shortcut (bool): Whether to use shortcut connection.
            g (int): Groups for convolutions.
            k (tuple): Kernel sizes for convolutions.
            e (float): Expansion ratio.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply bottleneck with optional shortcut connection."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
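
The padding tuples in PConv can look arbitrary, so here is a small size check in plain Python. It applies the standard output-size formula floor((n + pad - kernel) / stride) + 1 to the listing above, assuming nn.ZeroPad2d takes its padding as (left, right, top, bottom). The helper names are made up for this sketch.

```python
def conv_out(n: int, kernel: int, stride: int, pad_total: int) -> int:
    """Output length of a convolution along one axis."""
    return (n + pad_total - kernel) // stride + 1


def pconv_out(n: int, k: int, s: int) -> int:
    """Output side length of PConv(c1, c2, k, s) for an n x n input."""
    # Along the kernel's long axis: total padding k, kernel length k.
    along = conv_out(n, k, s, k)
    # Along the kernel's short axis: total padding 1, kernel length 1.
    across = conv_out(n, 1, s, 1)
    assert along == across, "the four branches must agree before concatenation"
    # The final 2x2 convolution (stride 1, no padding) trims one row and column.
    return conv_out(along, 2, 1, 0)


print(pconv_out(640, 4, 2), pconv_out(320, 3, 2))
```

For a 640×640 input, the two stride-2 PConv layers in the YAML config give side lengths of 320 and then 160. Each branch produces one extra row and column, which the closing 2×2 convolution removes.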

train.py

from ultralytics import YOLO 
 

if __name__ == '__main__':
  v11 = YOLO('xxx.yaml')
  
  results = v11.train(
    data        = 'dataxxx.yaml', 
    epochs      = 300,
    batch       = 32, 
    cache       = True,
    mosaic      = 0.0,
    imgsz       = 640,
    copy_paste  = 0.2, 
    device      = "0",  
    workers     = 16,
    project     = 'xxx',
    plots       = True,
    name        = 'xxx',
  )

Other changes

(1) Add PConv to /ultralytics/nn/modules/__init__.py

(2) In tasks.py, add PConv to base_modules inside the parse_model function

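Experimental results

Training curves

Confusion matrix on the test set

Summary

Recall for this network has now reached 0.94, which meets my expectations. The compute cost is only 16.7G FLOPs, which is quite low and should allow real-time inference on edge devices.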

Comparison

Model	Size	P	R	FLOPs
Original yolov11s	19M	0.95	0.59	21.7G
Halved channels	2.2M	0.92	0.94	16.7G

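
Precision and recall move in opposite directions between the two rows, so a single combined number can help when comparing them. The sketch below computes the standard F1 score (the harmonic mean of P and R) and the relative FLOPs reduction from the table values. F1 is not reported in this post and is added here only as a derived figure.

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Values copied from the comparison table.
rows = {
    "Original yolov11s": {"P": 0.95, "R": 0.59, "gflops": 21.7},
    "Halved channels": {"P": 0.92, "R": 0.94, "gflops": 16.7},
}

for name, row in rows.items():
    print(f"{name}: F1 = {f1(row['P'], row['R']):.3f}")

base = rows["Original yolov11s"]["gflops"]
new = rows["Halved channels"]["gflops"]
flops_cut = (base - new) / base
print(f"FLOPs reduction: {flops_cut:.1%}")
```

The small loss in precision (0.95 to 0.92) is outweighed by the gain in recall (0.59 to 0.94), which is what the higher F1 for the second row reflects.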
Future work

1. Deployment on RK and Jetson

2. Enrich the test set for stricter testing and further improvement
