【YOLOv8 Improvement [Backbone]】Using MobileNetV3 to Make the YOLOv8 Network Lighter and Boost Accuracy

I MobileNetV3

1 Platform-Aware NAS for Block-wise Search and NetAdapt

2 Inverted Residuals and Linear Bottlenecks

II Using MobileNetV3 to Boost YOLOv8

1 Overall Modifications

① Add the MobileNetV3.py file

② Modify ultralytics/nn/tasks.py

③ Modify ultralytics/utils/torch_utils.py

2 Configuration File

3 Training

Others

Troubleshooting

I MobileNetV3

Paper: https://arxiv.org/pdf/1905.02244v5.pdf

Code used below (a third-party, from-scratch PyTorch implementation, not the authors' official release): https://gitcode.com/Shubhamai/pytorchmobilenet/blob/main/MobileNetV3.py

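The paper presents the next generation of MobileNets, built on a combination of complementary search techniques and a novel architecture design. MobileNetV3 is tuned to mobile-phone CPUs by combining hardware-aware network architecture search (NAS) with the NetAdapt algorithm, and is then improved further through architecture advances: it keeps MobileNetV2's inverted residual and linear bottleneck blocks and adds squeeze-and-excitation modules and the h-swish nonlinearity. The paper explores how automated search algorithms and network design can work together, using complementary approaches to improve the overall state of the art. This process produces two new models, MobileNetV3-Large and MobileNetV3-Small, targeting high-resource and low-resource use cases respectively. These models are then adapted to object detection and semantic segmentation. For semantic segmentation (or any dense pixel prediction task), the paper proposes a new, efficient segmentation decoder, Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). The models achieve new state-of-the-art results for mobile classification, detection and segmentation.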

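Compared with MobileNetV2, MobileNetV3-Large is 3.2% more accurate on ImageNet classification while reducing latency by 20%. MobileNetV3-Small is 6.6% more accurate than a MobileNetV2 model with comparable latency. On COCO detection, MobileNetV3-Large is more than 25% faster than MobileNetV2 at roughly the same accuracy. On Cityscapes segmentation at similar accuracy, MobileNetV3-Large LR-ASPP is 34% faster than MobileNetV2 R-ASPP.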

1 Platform-Aware NAS for Block-wise Search and NetAdapt

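Network search has proven to be a very powerful tool for discovering and optimizing network architectures. For MobileNetV3, platform-aware NAS is used to search for the global network structure by optimizing each network block, and the NetAdapt algorithm is then used to search for the number of filters in each layer. These techniques are complementary and can be combined to efficiently find models optimized for a given hardware platform.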
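To make the per-layer step more concrete, here is a simplified sketch of how one NetAdapt iteration chooses among candidate changes. It assumes every proposal (for example, shrinking one layer's filter count) must cut latency by at least a fixed step and has already been briefly fine-tuned to estimate its accuracy change; that fine-tuning is omitted. The `Proposal` class, the toy numbers and the `min_latency_reduction` threshold are hypothetical illustrations. The selection rule, maximizing ΔAcc/|ΔLatency|, is the criterion the MobileNetV3 paper describes for its modified NetAdapt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proposal:
    """One candidate change to the network (hypothetical example structure)."""
    name: str
    delta_acc: float      # accuracy change after short fine-tuning (usually negative)
    delta_latency: float  # latency change in ms (negative means faster)


def select_proposal(proposals: List[Proposal], min_latency_reduction: float) -> Optional[Proposal]:
    """Pick the proposal with the best accuracy-per-latency trade-off."""
    # Keep only proposals that cut latency by at least the required step size
    valid = [p for p in proposals if -p.delta_latency >= min_latency_reduction]
    if not valid:
        return None
    # Maximize dAcc / |dLatency|: lose as little accuracy as possible per ms saved
    return max(valid, key=lambda p: p.delta_acc / abs(p.delta_latency))


if __name__ == "__main__":
    candidates = [
        Proposal("A", delta_acc=-0.5, delta_latency=-2.0),   # ratio -0.25
        Proposal("B", delta_acc=-0.2, delta_latency=-1.0),   # ratio -0.20
        Proposal("C", delta_acc=-0.05, delta_latency=-0.1),  # saves too little latency
    ]
    best = select_proposal(candidates, min_latency_reduction=0.5)
    print(best.name)  # B
```

The loop in NetAdapt repeats this selection, each time starting from the previously chosen architecture, until the target latency is reached.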

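A platform-aware neural architecture search approach is used to find the global network structure. Because the same RNN-based controller and the same factorized hierarchical search space as [43] (MnasNet: Platform-Aware Neural Architecture Search for Mobile) are used, the search finds results similar to [43] for Large mobile models with a target latency of about 80 ms. The authors therefore simply reuse MnasNet-A1 [43] as the initial Large mobile model and apply NetAdapt and other optimizations on top of it. For small models, accuracy changes much more dramatically with latency, so a smaller weight factor w = -0.15 (versus w = -0.07 in [43]) is needed to compensate for the larger accuracy change across latencies. With this new weight factor, a new architecture search is run from scratch to find the initial seed model, and then NetAdapt (which fine-tunes individual layers sequentially rather than trying to infer a coarse but global architecture) and other optimizations are applied to obtain the final MobileNetV3-Small model.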
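The weight factor w enters the multi-objective reward that MnasNet-style search maximizes, ACC(m) × [LAT(m)/TAR]^w, where TAR is the target latency. The plain-Python sketch below, with illustrative function names, shows why a more negative w penalizes latency more strongly: it computes the reward and the relative accuracy gain a model needs in order to break even when its latency doubles.

```python
def mnas_reward(acc: float, latency_ms: float, target_ms: float, w: float) -> float:
    """Multi-objective reward ACC(m) * (LAT(m) / TAR) ** w."""
    return acc * (latency_ms / target_ms) ** w


def break_even_gain(latency_ratio: float, w: float) -> float:
    """Accuracy multiplier that keeps the reward unchanged when latency grows by latency_ratio."""
    # acc2 * (r * lat) ** w == acc1 * lat ** w  =>  acc2 / acc1 == r ** (-w)
    return latency_ratio ** (-w)


if __name__ == "__main__":
    for w in (-0.07, -0.15):
        print(f"w={w}: doubling latency needs {break_even_gain(2.0, w):.3f}x accuracy")
    # w=-0.07: doubling latency needs 1.050x accuracy
    # w=-0.15: doubling latency needs 1.110x accuracy
```

With w = -0.15 a model must gain about 11% relative accuracy, instead of about 5%, to justify twice the latency, which counteracts the steeper accuracy-latency curve of small models.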

2 Inverted Residuals and Linear Bottlenecks

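MobileNetV2 layer (inverted residual and linear bottleneck). Each block takes a narrow input (the bottleneck), expands it to a much higher-dimensional space, and projects it back to a narrow output; the narrow bottleneck layers have no nonlinearity. The residual connection joins the bottlenecks rather than the expansion layers.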
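A quick parameter count shows why this layout is cheap. The plain-Python sketch below counts only convolution weights (BatchNorm, biases and squeeze-and-excite are ignored) for the 1×1 expansion, k×k depthwise and 1×1 linear projection of one block, using the 40 → 240 → 40, 5×5 setting that appears in the MobileNetV3-Small configuration, and compares it with ordinary 5×5 convolutions. The function names are illustrative.

```python
def inverted_residual_params(c_in: int, c_exp: int, c_out: int, k: int) -> int:
    """Conv weights of a MobileNetV2-style inverted residual block."""
    expand = c_in * c_exp if c_in != c_exp else 0  # 1x1 expansion (skipped when widths match)
    depthwise = c_exp * k * k                       # one k x k filter per channel
    project = c_exp * c_out                         # 1x1 linear projection, no activation after it
    return expand + depthwise + project


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Conv weights of an ordinary k x k convolution."""
    return c_in * c_out * k * k


def uses_residual(c_in: int, c_out: int, stride: int) -> bool:
    """The shortcut joins the narrow bottlenecks, so their shapes must match."""
    return c_in == c_out and stride == 1


if __name__ == "__main__":
    print(inverted_residual_params(40, 240, 40, 5))  # 25200
    print(standard_conv_params(40, 40, 5))           # 40000
    print(standard_conv_params(240, 240, 5))         # 1440000
    print(uses_residual(40, 40, 1), uses_residual(24, 40, 2))  # True False
```

The block works in a 240-channel space yet needs fewer weights than a single 5×5 convolution on the 40-channel bottleneck, because the expensive spatial filtering is depthwise.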

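MobileNetV2 + Squeeze-and-Excite. In contrast to the plain MobileNetV2 block, a squeeze-and-excite operation is applied inside the residual layer, and different nonlinearities (ReLU or h-swish) are used depending on the layer.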
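The nonlinearities used here are ReLU and h-swish, with hard-sigmoid as the gate inside squeeze-and-excite. The plain-Python sketch below writes out the piecewise definitions h-sigmoid(x) = ReLU6(x + 3) / 6 and h-swish(x) = x · ReLU6(x + 3) / 6, which are the formulas PyTorch documents for `nn.Hardsigmoid` and `nn.Hardswish`, the layers the implementation in this post relies on. Unlike swish, x · sigmoid(x), they need no exponential, which makes them cheaper on mobile hardware.

```python
import math


def relu6(x: float) -> float:
    return min(max(x, 0.0), 6.0)


def h_sigmoid(x: float) -> float:
    """Piecewise-linear stand-in for sigmoid: 0 below -3, 1 above 3."""
    return relu6(x + 3.0) / 6.0


def h_swish(x: float) -> float:
    """Hard swish: x * ReLU6(x + 3) / 6."""
    return x * h_sigmoid(x)


def swish(x: float) -> float:
    """Reference swish, x * sigmoid(x), which needs an exponential."""
    return x / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={x:+.1f}  swish={swish(x):+.4f}  h_swish={h_swish(x):+.4f}")
```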

II Using MobileNetV3 to Boost YOLOv8

1 Overall Modifications

① Add the MobileNetV3.py file

Create a new file MobileNetV3.py in the ultralytics/nn/modules directory with the following content:

```python
"""A from-scratch implementation of MobileNetV3 paper (for educational purposes).
Paper
    Searching for MobileNetV3 - https://arxiv.org/abs/1905.02244v5
author : shubham.aiengineer@gmail.com

Adapted here as a YOLOv8 backbone: the classification head is removed and
forward() returns four feature maps (strides 4, 8, 16, 32) instead of logits.
"""

import torch
from torch import nn

__all__ = ['MobileNetV3']


class SqueezeExitationBlock(nn.Module):
    def __init__(self, in_channels: int):
        """Constructor for SqueezeExitationBlock (squeeze-and-excitation).
        Args:
            in_channels (int): Number of input channels.
        """
        super().__init__()

        self.pool1 = nn.AdaptiveAvgPool2d(1)
        self.linear1 = nn.Linear(
            in_channels, in_channels // 4
        )  # divide by 4 is mentioned in the paper, 5.3. Large squeeze-and-excite
        self.act1 = nn.ReLU()
        self.linear2 = nn.Linear(in_channels // 4, in_channels)
        self.act2 = nn.Hardsigmoid()

    def forward(self, x):
        """Forward pass for SqueezeExitationBlock."""

        identity = x

        x = self.pool1(x)        # (N, C, H, W) -> (N, C, 1, 1)
        x = torch.flatten(x, 1)  # -> (N, C)
        x = self.linear1(x)
        x = self.act1(x)
        x = self.linear2(x)
        x = self.act2(x)         # per-channel weights in [0, 1]

        x = identity * x[:, :, None, None]  # rescale each channel of the input

        return x


class ConvNormActivationBlock(nn.Module):
    def __init__(
            self,
            in_channels: int,
            out_channels: int,
            kernel_size: tuple,
            stride: int = 1,
            padding: int = 0,
            groups: int = 1,
            bias: bool = False,
            activation: type = nn.Hardswish,
    ):
        """Constructs a block containing a convolution, batch normalization and activation layer
        Args:
            in_channels (int): number of input channels
            out_channels (int): number of output channels
            kernel_size (tuple): size of the convolutional kernel
            stride (int, optional): stride of the convolutional kernel. Defaults to 1.
            padding (int, optional): padding of the convolutional kernel. Defaults to 0.
            groups (int, optional): number of groups for depthwise separable convolution. Defaults to 1.
            bias (bool, optional): whether to use bias. Defaults to False.
            activation (type, optional): activation class to instantiate. Defaults to nn.Hardswish.
        """
        super().__init__()

        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.norm = nn.BatchNorm2d(out_channels)
        self.activation = activation()

    def forward(self, x):
        """Perform forward pass."""

        x = self.conv(x)
        x = self.norm(x)
        x = self.activation(x)

        return x


class InverseResidualBlock(nn.Module):
    def __init__(
            self,
            in_channels: int,
            out_channels: int,
            kernel_size: int,
            expansion_size: int = 6,
            stride: int = 1,
            squeeze_exitation: bool = True,
            activation: type = nn.Hardswish,
    ):
        """Constructs an inverted residual block
        Args:
            in_channels (int): number of input channels
            out_channels (int): number of output channels
            kernel_size (int): size of the convolutional kernel
            expansion_size (int, optional): number of channels after expansion. Defaults to 6.
            stride (int, optional): stride of the convolutional kernel. Defaults to 1.
            squeeze_exitation (bool, optional): whether to add a squeeze-and-excitation block. Defaults to True.
            activation (type, optional): activation class. Defaults to nn.Hardswish.
        """

        super().__init__()

        # The shortcut is only possible when input and output shapes match
        self.residual = in_channels == out_channels and stride == 1
        self.squeeze_exitation = squeeze_exitation

        self.conv1 = (
            ConvNormActivationBlock(
                in_channels, expansion_size, (1, 1), activation=activation
            )
            if in_channels != expansion_size
            else nn.Identity()
        )  # 1x1 expansion conv; skipped when expansion_size already equals in_channels
        self.depthwise_conv = ConvNormActivationBlock(
            expansion_size,
            expansion_size,
            (kernel_size, kernel_size),
            stride=stride,
            padding=kernel_size // 2,
            groups=expansion_size,
            activation=activation,
        )
        if self.squeeze_exitation:
            self.se = SqueezeExitationBlock(expansion_size)

        # Linear bottleneck: 1x1 projection + BN with no activation afterwards
        self.conv2 = nn.Conv2d(
            expansion_size, out_channels, (1, 1), bias=False
        )  # bias is false because we are using batch normalization, which already has bias
        self.norm = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        """Perform forward pass."""

        identity = x

        x = self.conv1(x)
        x = self.depthwise_conv(x)

        if self.squeeze_exitation:
            x = self.se(x)

        x = self.conv2(x)
        x = self.norm(x)

        if self.residual:
            x = x + identity

        return x


class MobileNetV3(nn.Module):
    def __init__(
            self,
            n_classes: int = 1000,
            input_channel: int = 3,
            config: str = "large",
            dropout: float = 0.8,
    ):
        """Constructs the MobileNetV3 backbone
        Args:
        `n_classes`: unused here, because the classification head has been removed.
        `input_channel`: An integer value input channels in first conv layer, default is 3.
        `config`: A string value indicating the configuration of MobileNetV3, either `large` or `small`, default is `large`.
        `dropout`: unused here, because the classification head has been removed.
        """

        super().__init__()

        # The configuration of MobileNetV3.
        # input channels, kernel size, expansion size, output channels, squeeze excitation, activation, stride
        RE = nn.ReLU
        HS = nn.Hardswish
        configs_dict = {
            "small": (
                (16, 3, 16, 16, True, RE, 2),
                (16, 3, 72, 24, False, RE, 2),
                (24, 3, 88, 24, False, RE, 1),
                (24, 5, 96, 40, True, HS, 2),
                (40, 5, 240, 40, True, HS, 1),
                (40, 5, 240, 40, True, HS, 1),
                (40, 5, 120, 48, True, HS, 1),
                (48, 5, 144, 48, True, HS, 1),
                (48, 5, 288, 96, True, HS, 2),
                (96, 5, 576, 96, True, HS, 1),
                (96, 5, 576, 96, True, HS, 1),
            ),
            "large": (
                (16, 3, 16, 16, False, RE, 1),
                (16, 3, 64, 24, False, RE, 2),
                (24, 3, 72, 24, False, RE, 1),
                (24, 5, 72, 40, True, RE, 2),
                (40, 5, 120, 40, True, RE, 1),
                (40, 5, 120, 40, True, RE, 1),
                (40, 3, 240, 80, False, HS, 2),
                (80, 3, 200, 80, False, HS, 1),
                (80, 3, 184, 80, False, HS, 1),
                (80, 3, 184, 80, False, HS, 1),
                (80, 3, 480, 112, True, HS, 1),
                (112, 3, 672, 112, True, HS, 1),
                (112, 5, 672, 160, True, HS, 2),
                (160, 5, 960, 160, True, HS, 1),
                (160, 5, 960, 160, True, HS, 1),
            ),
        }

        # Stem: 3x3 conv with stride 2
        self.model = nn.Sequential(
            ConvNormActivationBlock(
                input_channel, 16, (3, 3), stride=2, padding=1, activation=nn.Hardswish
            ),
        )

        for (
                in_channels,
                kernel_size,
                expansion_size,
                out_channels,
                squeeze_exitation,
                activation,
                stride,
        ) in configs_dict[config]:
            self.model.append(
                InverseResidualBlock(
                    in_channels=in_channels,
                    out_channels=out_channels,
                    kernel_size=kernel_size,
                    expansion_size=expansion_size,
                    stride=stride,
                    squeeze_exitation=squeeze_exitation,
                    activation=activation,
                )
            )

        hidden_channels = 576 if config == "small" else 960

        # Final 1x1 conv; out_channels is the width of the last inverted residual block
        self.model.append(
            ConvNormActivationBlock(
                out_channels,
                hidden_channels,
                (1, 1),
                bias=False,
                activation=nn.Hardswish,
            )
        )

        # Channel counts of the four feature maps returned by forward()
        if config == "small":
            self.index = [16, 24, 48, 576]
        else:
            self.index = [24, 40, 112, 960]
        # Record the channel count of each returned feature map with one dummy pass
        with torch.no_grad():
            self.width_list = [i.size(1) for i in self.forward(torch.randn(1, input_channel, 640, 640))]

    def forward(self, x):
        """Return the feature maps whose channel counts appear in self.index."""
        results = [None, None, None, None]

        for model in self.model:
            x = model(x)
            if x.size(1) in self.index:
                position = self.index.index(x.size(1))  # position in the index list
                results[position] = x  # a later layer with the same width overwrites an earlier one
        return results


if __name__ == "__main__":
    # Generate a sample image
    image_size = (1, 3, 640, 640)
    image = torch.rand(*image_size)

    # Model
    mobilenet_v3 = MobileNetV3(config="large")

    out = mobilenet_v3(image)
    print(mobilenet_v3.width_list)
    for feature in out:
        print(feature.shape)
```
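`forward()` selects feature maps purely by channel count, keeping the last layer whose output width matches each entry of `self.index`. The plain-Python sketch below (no torch required; the function name is illustrative) replays that rule over the (out_channels, stride) columns of `configs_dict` to show the resolution of each returned tensor. For both configs the four outputs land at strides 4, 8, 16 and 32, i.e. 160×160, 80×80, 40×40 and 20×20 for a 640×640 input.

```python
from typing import Dict, List, Tuple

# (out_channels, stride) of each inverted residual block, copied from configs_dict
LARGE = [(16, 1), (24, 2), (24, 1), (40, 2), (40, 1), (40, 1), (80, 2), (80, 1),
         (80, 1), (80, 1), (112, 1), (112, 1), (160, 2), (160, 1), (160, 1)]
SMALL = [(16, 2), (24, 2), (24, 1), (40, 2), (40, 1), (40, 1), (48, 1), (48, 1),
         (96, 2), (96, 1), (96, 1)]


def tapped_features(blocks: List[Tuple[int, int]], last_channels: int,
                    index: List[int]) -> List[Tuple[int, int]]:
    """Mimic MobileNetV3.forward: report (channels, total stride) for each entry of index."""
    stride = 2
    outputs = [(16, stride)]  # stem: 3x3 conv, stride 2, 16 channels
    for out_c, s in blocks:
        stride *= s
        outputs.append((out_c, stride))
    outputs.append((last_channels, stride))  # final 1x1 conv keeps the resolution
    kept: Dict[int, int] = {}
    for channels, st in outputs:
        if channels in index:
            kept[channels] = st  # later matches overwrite earlier ones, as in forward()
    return [(c, kept[c]) for c in index]


if __name__ == "__main__":
    print(tapped_features(LARGE, 960, [24, 40, 112, 960]))
    # [(24, 4), (40, 8), (112, 16), (960, 32)]
    print(tapped_features(SMALL, 576, [16, 24, 48, 576]))
    # [(16, 4), (24, 8), (48, 16), (576, 32)]
```

This channel-matching design only works because each tapped width is last produced at the intended stage. If a config is edited so that a width listed in `self.index` reappears in a later stage, `forward()` will silently return the wrong tensor.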

② Modify ultralytics/nn/tasks.py

The specific changes are shown in the figure below:

③ Modify ultralytics/utils/torch_utils.py

2 Configuration File

The contents of yolov8_MobileNetV3.yaml compared with the original:

3 Training

Once the changes above are complete, start training! 🌺🌺🌺

Training example:

```bash
yolo task=detect mode=train model=cfg/models/v8/yolov8_MobileNetV3.yaml data=cfg/datasets/coco128.yaml epochs=100 batch=16 device=cpu project=yolov8
```

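Others

If replacing the changes piece by piece is inconvenient, you can copy the files below and use them to replace the contents of the original .py files:

Modified tasks.py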