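[YOLO Study Notes] YOLOv8 Explained in Detail

Contents

I. Network Structure
- Reading the yolov8.yaml file
- Network structure diagram
II. Module Code and Structure Breakdown
- parse_model()
- Conv
- C2f
- SPPF
- Concat
- Detect
III. Loss Functions in Detail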

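I. Network Structure

Reading the yolov8.yaml file

The file is located at ultralytics/ultralytics/cfg/models/v8/yolov8.yaml.

As with yolov5, the official yolov8 release provides five object-detection variants: yolov8n, yolov8s, yolov8m, yolov8l, and yolov8x. All variants share the same architecture and differ only in depth, width, and max_channels. These three values are recorded per variant in the scales dictionary of the YAML file.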

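depth: controls how many times a module is repeated. If a module's default repeat count is n, the actual count is derived from depth_multiple * n, which controls the depth of the network.
width: controls the number of channels in a module. If a module's default channel count is c, the actual count is derived from width_multiple * c, which controls the width of the network.
max_channels: caps the size of the network. In each variant, a module's configured channel count may not exceed max_channels; a larger value is truncated to max_channels before the width multiplier is applied. The corresponding line in parse_model() is c2 = make_divisible(min(c2, max_channels) * width, 8), where c2 is the module's output channel count.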

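Take yolov8n as an example. Its scales entry [0.33, 0.25, 1024] means:

Output channel counts are multiplied by 0.25 (and then adjusted to a multiple of 8 by make_divisible).
If a module's repeat count is greater than 1, it is multiplied by 0.33 and rounded, with a minimum of 1; a repeat count of 1 stays at 1.
If any layer's configured channel count exceeds max_channels (1024 here), max_channels is used instead.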
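The three rules above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the Ultralytics code itself: make_divisible is a local stand-in written under the assumption that the library helper rounds up to the next multiple of the divisor, and scale_channels and scale_repeats are hypothetical helper names introduced only for this illustration. The two formulas are the ones that appear in parse_model().

```python
import math


def make_divisible(x, divisor=8):
    """Local stand-in (assumption): round x up to the next multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_channels(c2, width, max_channels):
    """Clip the configured channel count first, then apply the width multiplier."""
    return make_divisible(min(c2, max_channels) * width, 8)


def scale_repeats(n, depth):
    """Scale the repeat count only when it is greater than 1; never go below 1."""
    return max(round(n * depth), 1) if n > 1 else n


# yolov8n entry from the scales dictionary: [depth, width, max_channels]
depth, width, max_channels = 0.33, 0.25, 1024

print(scale_channels(64, width, max_channels))    # 64 * 0.25 = 16
print(scale_channels(1024, width, max_channels))  # 1024 * 0.25 = 256
print(scale_repeats(3, depth))                    # round(0.99) = 1
print(scale_repeats(6, depth))                    # round(1.98) = 2
print(scale_repeats(1, depth))                    # unchanged: 1
```

Note the order of operations: the clip to max_channels happens before the multiplication by width, so max_channels limits the configured value, not the final scaled value.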

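Network structure diagram

yolov8l:

Note: in the yolov8l configuration, the output channel count of SPPF is set to 1024, so why does the diagram show 512? Because every variant sets max_channels, and any layer whose configured channel count exceeds max_channels is clipped to it. For yolov8l, max_channels is 512 and the width multiplier is 1.00, so SPPF ends up with 512 output channels.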
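To make the note above concrete, the sketch below computes the SPPF output channel count for each variant. It is an illustration under stated assumptions: the SCALES table is written from memory of the official yolov8.yaml and should be checked against your local copy, and sppf_out_channels is a hypothetical helper name. The rounding step assumes that make_divisible rounds up to a multiple of 8.

```python
import math

# [depth, width, max_channels] per variant, as recalled from yolov8.yaml
# (assumption: verify against the file in your Ultralytics checkout).
SCALES = {
    "n": [0.33, 0.25, 1024],
    "s": [0.33, 0.50, 1024],
    "m": [0.67, 0.75, 768],
    "l": [1.00, 1.00, 512],
    "x": [1.00, 1.25, 512],
}


def sppf_out_channels(scale, configured=1024):
    """Apply the clip-then-scale rule to the 1024 channels configured for SPPF."""
    _, width, max_channels = SCALES[scale]
    clipped = min(configured, max_channels)
    return math.ceil(clipped * width / 8) * 8


for name in SCALES:
    print(name, sppf_out_channels(name))
```

With these values, yolov8l yields 512, which matches the diagram, while yolov8x yields 640 because the clipped value 512 is then multiplied by 1.25.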

yolov8n (drawn by the author)

II. Module Code and Structure Breakdown

parse_model()

The parse_model() function in ultralytics/nn/tasks.py:

def parse_model(d, ch, verbose=True):
    """
    Parse a YOLO model.yaml dictionary into a PyTorch model.

    Args:
        d (dict): Model dictionary.
        ch (int): Input channels.
        verbose (bool): Whether to print model details.

    Returns:
        model (torch.nn.Sequential): PyTorch model.
        save (list): Sorted list of output layers.
    """
    import ast

    # Args
    legacy = True  # backward compatibility for v3/v5/v8/v9 models
    max_channels = float("inf")
    nc, act, scales = (d.get(x) for x in ("nc", "activation", "scales"))
    depth, width, kpt_shape = (d.get(x, 1.0) for x in ("depth_multiple", "width_multiple", "kpt_shape"))
    if scales:
        scale = d.get("scale")
        if not scale:
            scale = tuple(scales.keys())[0]
            LOGGER.warning(f"no model scale passed. Assuming scale='{scale}'.")
        depth, width, max_channels = scales[scale]

    if act:
        Conv.default_act = eval(act)  # redefine default activation, i.e. Conv.default_act = torch.nn.SiLU()
        if verbose:
            LOGGER.info(f"{colorstr('activation:')} {act}")  # print

    if verbose:
        LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10}  {'module':<45}{'arguments':<30}")
    ch = [ch]
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    base_modules = frozenset(
        {
            Classify,
            Conv,
            ConvTranspose,
            GhostConv,
            Bottleneck,
            Bottleneck_DBB,
            GhostBottleneck,
            SPP,
            SPPF,
            C2fPSA,
            C2PSA,
            DWConv,
            Focus,
            BottleneckCSP,
            C1,
            C2,
            C2f,
            C3k2,
            C3k2_WT,
            C3k2_DBB,
            RepNCSPELAN4,
            ELAN1,
            ADown,
            AConv,
            SPPELAN,
            C2fAttn,
            C3,
            C3TR,
            C3Ghost,
            torch.nn.ConvTranspose2d,
            DWConvTranspose2d,
            C3x,
            RepC3,
            PSA,
            SCDown,
            C2fCIB,
            A2C2f,
        }
    )
    repeat_modules = frozenset(  # modules with 'repeat' arguments
        {
            BottleneckCSP,
            C1,
            C2,
            C2f,
            C3k2,
            C3k2_WT,
            C3k2_DBB,
            C2fAttn,
            C3,
            C3TR,
            C3Ghost,
            C3x,
            RepC3,
            C2fPSA,
            C2fCIB,
            C2PSA,
            A2C2f,
        }
    )
    for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]):  # from, number, module, args
        m = (
            getattr(torch.nn, m[3:])
            if "nn." in m
            else getattr(__import__("torchvision").ops, m[16:])
            if "torchvision.ops." in m
            else globals()[m]
        )  # get module
        for j, a in enumerate(args):
            if isinstance(a, str):
                with contextlib.suppress(ValueError):
                    args[j] = locals()[a] if a in locals() else ast.literal_eval(a)
        n = n_ = max(round(n * depth), 1) if n > 1 else n  # depth gain
        if m in base_modules:
            c1, c2 = ch[f], args[0]
            if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
                c2 = make_divisible(min(c2, max_channels) * width, 8)
            if m is C2fAttn:  # set 1) embed channels and 2) num heads
                args[1] = make_divisible(min(args[1], max_channels // 2) * width, 8)
                args[2] = int(max(round(min(args[2], max_channels // 2 // 32)) * width, 1) if args[2] > 1 else args[2])

            args = [c1, c2, *args[1:]]
            if m in repeat_modules:
                args.insert(2, n)  # number of repeats
                n = 1
            if m is C3k2:  # for M/L/X sizes
                legacy = False
                if scale in "mlx":
                    args[3] = True
            if m is A2C2f:
                legacy = False
                if scale in "lx":  # for L/X sizes
                    args.extend((True, 1.2))
            if m is C2fCIB:
                legacy = False
        elif m is AIFI:
            args = [ch[f], *args]
        elif m in frozenset({HGStem, HGBlock}):
            c1, cm, c2 = ch[f], args[0], args[1]
            args = [c1, cm, c2, *args[2:]]
            if m is HGBlock:
                args.insert(4, n)  # number of repeats
                n = 1
        elif m is ResNetLayer:
            c2 = args[1] if args[3] else args[1] * 4
        elif m is torch.nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
        elif m in frozenset(
            {Detect, WorldDetect, YOLOEDetect, Segment, YOLOESegment, Pose, OBB, ImagePoolingAttn, v10Detect}
        ):
            args.append([ch[x] for x in f])
            if m is Segment or m is YOLOESegment:
                args[2] = make_divisible(min(args[2], max_channels) * width, 8)
            if m in {Detect, YOLOEDetect, Segment, YOLOESegment, Pose, OBB}:
                m.legacy = legacy
        elif m is RTDETRDecoder:  # special case, channels arg must be passed in index 1
            args.insert(1, [ch[x] for x in f])
        elif m is CBLinear:
            c2 = args[0]
            c1 = ch[f]
            args = [c1, c2, *args[1:]]
        elif m is CBFuse:
            c2 = ch[f[-1]]
        elif m in frozenset({TorchVision, Index}):
            c2 = args[0]
            c1 = ch[f]
            args = [*args[1:]]
        else:
            c2 = ch[f]

        m_ = torch.nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace("__main__.", "")  # module type
        m_.np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type = i, f, t  # attach index, 'from' index, type
        if verbose:
            LOGGER.info(f"{i:>3}{str(f):>20}{n_:>3}{m_.np:10.0f}  {t:<45}{str(args):<30}")  # print
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        ch.append(c2)
    return torch.nn.Sequential(*layers), sorted(save)

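parse_model() is the core function in the Ultralytics YOLO framework that parses a model YAML configuration and builds the PyTorch model. It converts the dictionary loaded from model.yaml (network architecture, parameters, and so on) into an executable PyTorch model (torch.nn.Sequential).

Parsing the model configuration parameters

It extracts the key parameters from the input model dictionary d, including:
- The depth control (depth_multiple) and width control (width_multiple): used to scale the repeat count and channel count of the network modules, the same way the YOLO family controls model size. When the YAML defines scales, the entry selected by scale overrides both values and also sets max_channels; if no scale is given, the first entry is used and a warning is logged.
- The number of classes (nc), the activation function (activation), the scale table (scales), and so on: these provide the basic configuration for initializing the layers.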
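The parameter lookup described above can be sketched in plain Python. This is a simplified illustration of the logic at the top of parse_model(), not a replacement for it: resolve_scale is a hypothetical name, and logging and the kpt_shape handling are left out.

```python
def resolve_scale(d):
    """Return (scale, depth, width, max_channels) for a model dictionary d."""
    depth = d.get("depth_multiple", 1.0)
    width = d.get("width_multiple", 1.0)
    max_channels = float("inf")  # no cap unless a scales entry provides one
    scales = d.get("scales")
    scale = d.get("scale")
    if scales:
        if not scale:
            # No scale requested: fall back to the first entry in the table.
            scale = tuple(scales.keys())[0]
        depth, width, max_channels = scales[scale]
    return scale, depth, width, max_channels


cfg = {
    "nc": 80,
    "scales": {"n": [0.33, 0.25, 1024], "s": [0.33, 0.50, 1024]},
}

print(resolve_scale(cfg))                    # falls back to the first key, "n"
print(resolve_scale({**cfg, "scale": "s"}))  # explicit scale wins
print(resolve_scale({"depth_multiple": 0.5, "width_multiple": 0.5}))  # legacy keys
```

The fallback to the first key explains why loading yolov8.yaml without a scale letter builds the n variant.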
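
Building the network layers dynamically

It iterates over every layer defined under backbone and head in the configuration file (each entry has the form [from, number, module, args]) and creates the corresponding PyTorch module one by one:
- Module resolution: the class is looked up from the module name (for example Conv for a convolution layer, C2f for the CSP bottleneck block, Detect for the detection head).
- Parameter adjustment:
  - the repeat count n is scaled by depth_multiple to control the network depth;
  - the channel count c2 is scaled by width_multiple, and make_divisible keeps it a multiple of 8 (which helps GPU compute efficiency);
  - special modules (attention modules, detection heads, and so on) receive extra arguments, such as the number of attention heads or the list of input channels.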
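Before these adjustments are applied, parse_model() also converts string arguments from the YAML into Python values: a string that names a known variable (such as nc) is replaced by that variable, a string that is a Python literal is evaluated with ast.literal_eval, and anything else stays a string. The sketch below mirrors that loop under one simplification, which is an assumption of this example: the function's locals() are replaced by an explicit dictionary called known, and resolve_args is a hypothetical name.

```python
import ast
import contextlib


def resolve_args(args, known):
    """Convert string arguments to Python values where possible."""
    out = list(args)
    for j, a in enumerate(out):
        if isinstance(a, str):
            # literal_eval raises ValueError for plain words such as "nearest";
            # suppressing it leaves the original string in place.
            with contextlib.suppress(ValueError):
                out[j] = known[a] if a in known else ast.literal_eval(a)
    return out


known = {"nc": 80}

print(resolve_args(["nc"], known))                  # variable name -> its value
print(resolve_args(["None", 2, "nearest"], known))  # "None" -> None, word kept
print(resolve_args(["[1, 2]"], known))              # literal list is parsed
```

This is why a Detect entry can write nc in its args and still receive the integer number of classes.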
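
Managing layer connections and outputs
- It tracks the output channel count of every layer in the ch list, so that each layer is constructed with the channel count of the layer(s) named in its from field and the channels match between layers;
- It records in the save list the indices of the intermediate layers whose outputs must be kept (every from value other than -1), for later feature fusion or debugging.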
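The bookkeeping of ch and save can be traced on a toy layer list. The sketch below is an illustration only: the layer list and the name track_channels are made up for this example, only channel counts are tracked (no tensors, no spatial sizes), and the three branches are reduced versions of the Conv, Concat, and fall-through cases in parse_model().

```python
def track_channels(layers, ch_in=3):
    """Return (ch, save) for a list of (from, kind, configured_out) entries."""
    ch = [ch_in]
    save = []
    for i, (f, kind, c_out) in enumerate(layers):
        if kind == "Conv":
            c2 = c_out                      # output channels come from the config
        elif kind == "Concat":
            c2 = sum(ch[x] for x in f)      # concatenation adds channel counts
        else:
            c2 = ch[f]                      # e.g. Upsample keeps the channel count
        # Every 'from' index other than -1 must be saved; x % i maps a
        # negative (relative) index to an absolute layer index.
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)
        if i == 0:
            ch = []                         # drop the input-image channel entry
        ch.append(c2)
    return ch, sorted(save)


LAYERS = [
    (-1, "Conv", 16),         # layer 0
    (-1, "Conv", 32),         # layer 1
    (-1, "Conv", 64),         # layer 2
    (-1, "Upsample", None),   # layer 3
    ([-1, 1], "Concat", None) # layer 4: fuse layer 3 with layer 1
]

ch, save = track_channels(LAYERS)
print(ch)    # per-layer output channels
print(save)  # layers whose outputs must be kept
```

Here the Concat layer gets 64 + 32 = 96 channels, and layer 1 is the only index in save because it is the only layer referenced by something other than -1.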
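
Returning the result

It finally returns:
- a torch.nn.Sequential model assembled from all layers (ready for training or inference). Each individual layer is created by the line below, which wraps n copies of a module in its own torch.nn.Sequential when n > 1: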
```
m_ = torch.nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
```
- the sorted save list (the indices of the layers whose outputs must be kept).
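One detail of the layer-building line is easy to miss: for modules in repeat_modules (C2f, C3, and so on), parse_model() passes the scaled repeat count as a constructor argument and resets n to 1, so only other modules are actually wrapped n times. The sketch below shows that decision without PyTorch. It is a simplified illustration: plan_layer is a hypothetical name, and REPEAT_MODULES lists only two module names as strings.

```python
# Subset of the repeat_modules set, by name only (illustration).
REPEAT_MODULES = {"C2f", "C3"}


def plan_layer(module_name, n, depth):
    """Decide how a layer with repeat count n is instantiated."""
    n = max(round(n * depth), 1) if n > 1 else n  # depth gain
    if module_name in REPEAT_MODULES:
        # The module repeats its inner blocks itself; build a single instance.
        return {"copies": 1, "repeat_arg": n}
    # Otherwise n > 1 means n copies chained in a torch.nn.Sequential.
    return {"copies": n, "repeat_arg": None}


print(plan_layer("C2f", 6, 0.33))   # one C2f with 2 inner repeats
print(plan_layer("Conv", 3, 1.0))   # three Conv copies in sequence
print(plan_layer("Conv", 1, 0.33))  # a single Conv
```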