Object Detection: YOLOv5 Implementation and Training

1. YOLOv5 Core Principles

1.1 Network Architecture Overview

graph TD
    A[Input 640x640x3] --> B[Backbone-CSPDarknet53]
    B --> C[Neck-PANet]
    C --> D[Head-Detection Layers]
    style A fill:#9f9,stroke:#333
    style D fill:#f99,stroke:#333

1.2 Detection Head Structure

Output size of each detection layer:

$$output = (5 + num\_classes) \times num\_anchors \times H \times W$$

where:

5: box center (x, y), width and height (w, h), and objectness confidence
num_classes: number of classes
num_anchors: number of preset anchor boxes per detection layer (typically 3)
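
As a quick check of the formula, the sketch below counts the values each detection layer emits for a 640x640 input. The strides (8, 16, 32), the 80-class setting and the helper name `detection_output_size` are assumptions chosen for illustration.

```python
def detection_output_size(num_classes, num_anchors, h, w):
    """Number of values one detection layer emits: (5 + num_classes) * num_anchors * H * W."""
    return (5 + num_classes) * num_anchors * h * w


# Assumed setup: 640x640 input, 80 classes, 3 anchors per layer, strides 8/16/32.
img_size, num_classes, num_anchors = 640, 80, 3
sizes = {}
for stride in (8, 16, 32):
    h = w = img_size // stride  # feature-map height and width at this stride
    sizes[stride] = detection_output_size(num_classes, num_anchors, h, w)
    print(f"stride {stride:2d}: grid {h}x{w}, {sizes[stride]} values")

# Total number of candidate boxes over the three layers.
total_boxes = sum(num_anchors * (img_size // s) ** 2 for s in (8, 16, 32))
print("candidate boxes:", total_boxes)
```

Under these assumptions the three layers together propose 25200 candidate boxes, which is why confidence filtering and NMS are needed afterwards.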

1.3 Loss Function Components

$$Loss = \lambda_{coord}\sum L_{box} + \lambda_{obj}\sum L_{obj} + \lambda_{cls}\sum L_{cls}$$

$L_{box}$: CIoU loss
$L_{obj}$: BCEWithLogitsLoss
$L_{cls}$: BCEWithLogitsLoss
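
To make the weighted sum concrete, here is a minimal pure-Python sketch. The weights (0.05, 1.0, 0.5), the toy logits and the function names are illustrative assumptions, and `bce_with_logits` only mirrors the per-element formula that BCEWithLogitsLoss evaluates; it is not the training code itself.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy computed directly on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def total_loss(l_box, l_obj, l_cls, lambda_coord=0.05, lambda_obj=1.0, lambda_cls=0.5):
    """Weighted sum of the three terms; the default weights are assumed values."""
    return lambda_coord * l_box + lambda_obj * l_obj + lambda_cls * l_cls


# Toy values: one box with CIoU loss 0.2, one objectness logit, one class logit.
l_box = 0.2
l_obj = bce_with_logits(logit=2.0, target=1.0)   # confident and correct: small loss
l_cls = bce_with_logits(logit=-1.0, target=1.0)  # wrong side of zero: larger loss
loss = total_loss(l_box, l_obj, l_cls)
print(f"L_obj={l_obj:.4f}  L_cls={l_cls:.4f}  Loss={loss:.4f}")
```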

2. Data Preparation and Augmentation

2.1 Data Format Requirements

dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/

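Example label file (coordinates normalized to [0, 1] by image width and height; the `#` line only names the columns and is not part of a real label file):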

# class_id center_x center_y width height
0 0.45 0.32 0.12 0.23
1 0.67 0.51 0.30 0.15
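
The conversion between pixel-space corner boxes and this normalized format can be sketched as follows. The helper names `to_yolo_label` and `parse_yolo_label` are hypothetical and written only for this example.

```python
def to_yolo_label(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def parse_yolo_label(line, img_w, img_h):
    """Inverse: parse a label line back to (class_id, x1, y1, x2, y2) in pixels."""
    cls, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    return int(cls), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


line = to_yolo_label(0, 160, 120, 480, 360, img_w=640, img_h=480)
print(line)
print(parse_yolo_label(line, 640, 480))
```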

2.2 Data Augmentation Strategy

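# Official defaults (yolov5/data/hyps/hyp.scratch-low.yaml), augmentation keys only.
# These are top-level keys in the hyperparameter file.
hsv_h: 0.015  # HSV hue augmentation (fraction)
hsv_s: 0.7    # HSV saturation augmentation (fraction)
hsv_v: 0.4    # HSV value (brightness) augmentation (fraction)
degrees: 0.0  # rotation (+/- degrees)
translate: 0.1  # translation (+/- fraction)
scale: 0.5    # scale gain (+/- gain)
shear: 0.0    # shear (+/- degrees)
perspective: 0.0  # perspective distortion (+/- fraction)
flipud: 0.0   # probability of a vertical flip
fliplr: 0.5   # probability of a horizontal flip
mosaic: 1.0   # probability of Mosaic augmentation
mixup: 0.0    # probability of MixUp augmentation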
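
Geometric augmentations must transform the labels together with the pixels. As a minimal sketch of what a `fliplr`-style probability implies for normalized labels, the hypothetical helpers below mirror the box centers; the image itself would be flipped in the same step, which is omitted here.

```python
import random


def flip_labels_lr(labels):
    """Mirror normalized (class_id, cx, cy, w, h) labels horizontally: only cx changes."""
    return [(c, 1.0 - cx, cy, w, h) for c, cx, cy, w, h in labels]


def maybe_flip_lr(labels, p=0.5, rng=random):
    """Apply the horizontal flip with probability p (the role played by fliplr)."""
    return flip_labels_lr(labels) if rng.random() < p else labels


labels = [(0, 0.45, 0.32, 0.12, 0.23), (1, 0.67, 0.51, 0.30, 0.15)]
flipped = flip_labels_lr(labels)
print(flipped)
```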

3. Model Construction and Training

3.1 Core Model Definition Code

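import torch
import torch.nn as nn

def autopad(k, p=None):
    # Pad to keep the spatial size at stride 1 when no padding is given
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p

class Conv(nn.Module):
    # Standard convolution block: Conv2d + BN + SiLU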
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, 
                            autopad(k, p), 
                            groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    # Residual block: 1x1 conv, 3x3 conv, optional shortcut
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))

class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)
        self.m = nn.Sequential(
            *(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))
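
The `Conv` block passes `autopad(k, p)` as its padding. The sketch below illustrates why half-kernel padding is convenient: with it, a stride-1 convolution keeps the spatial size and a stride-2 convolution halves it. Both helpers are simplified stand-ins written for this example (integer kernels only) and use the standard convolution output-size formula.

```python
def autopad(k, p=None):
    """Half-kernel padding for an integer kernel size, unless p is given explicitly."""
    return k // 2 if p is None else p


def conv_out_size(size, k, s, p=None):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    p = autopad(k, p)
    return (size + 2 * p - k) // s + 1


same = conv_out_size(640, k=3, s=1)       # 3x3, stride 1
halved = conv_out_size(640, k=3, s=2)     # 3x3, stride 2
pointwise = conv_out_size(640, k=1, s=1)  # 1x1, stride 1
print(same, halved, pointwise)
```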

3.2 Training Configuration

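# yolov5/models/yolov5s.yaml
# Official model configuration example
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple (1.0 corresponds to yolov5l)
width_multiple: 0.50  # layer channel multiple (1.0 corresponds to yolov5l)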

anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

backbone:
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],    # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],    # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],    # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],   # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],      # 9
  ]

head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
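
The repeat counts and channel widths listed in the config are base values that `depth_multiple` and `width_multiple` scale when the model is built. The sketch below shows one plausible form of that scaling. The rounding rules (round repeats with a minimum of 1, round channels up to a multiple of 8), the helper names and the two multiplier pairs are assumptions for illustration.

```python
import math


def scale_depth(n, depth_multiple):
    """Scale a block's repeat count; counts of 1 are left unchanged."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scale_width(c, width_multiple, divisor=8):
    """Scale a channel count and round up to a multiple of `divisor`."""
    return math.ceil(c * width_multiple / divisor) * divisor


# Backbone C3 entries from the config above: (repeats, channels).
c3_layers = [(3, 128), (6, 256), (9, 512), (3, 1024)]
results = {}
for depth, width in [(0.33, 0.50), (1.0, 1.0)]:
    results[(depth, width)] = [
        (scale_depth(n, depth), scale_width(c, width)) for n, c in c3_layers
    ]
    print(f"depth={depth}, width={width}: {results[(depth, width)]}")
```

With multipliers of 1.0 the layers keep their listed sizes, while smaller multipliers shrink both the number of stacked bottlenecks and the channel counts.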

3.3 Launching Training

# Single-GPU training
python train.py --img 640 --batch 16 --epochs 100 --data coco.yaml --weights yolov5s.pt

# Multi-GPU distributed training (4 GPUs; --batch is the total batch size across all GPUs)
torchrun --nproc_per_node 4 train.py --img 640 --batch 64 --epochs 100 --data coco.yaml --weights yolov5s.pt

3.4 Training Metrics to Monitor

graph LR
    A[Training loss] --> B[Validation mAP]
    B --> C[Per-class precision]
    C --> D[Inference speed]
    style A fill:#9f9,stroke:#333
    style D fill:#f99,stroke:#333

4. Model Validation and Deployment

4.1 Computing Validation Metrics

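import torch
from utils.metrics import ap_per_class  # run from the yolov5 repository root

# Collect per-image statistics for mAP
stats = []
with torch.no_grad():
    for batch_i, (img, targets, paths, shapes) in enumerate(dataloader):
        preds = model(img)
        # Process predictions (NMS, matching against targets)...
        # correct, pred_conf, pred_cls, target_cls are tensors produced by the elided step
        stats.append((correct, pred_conf, pred_cls, target_cls))

# Concatenate the per-image tensors, then compute the final metrics
stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)]
tp, fp, p, r, f1, ap, ap_class = ap_per_class(*stats)
ap50, ap = ap[:, 0], ap.mean(1)  # per-class AP@0.5 and AP@0.5:0.95
mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()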
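
To show what a per-class AP value summarizes, here is a simplified, self-contained sketch that builds a precision-recall curve from ranked detections and integrates it. It uses all-point interpolation and a hypothetical function name; the library routine may interpolate differently, so treat the numbers as illustrative rather than as a reproduction of `ap_per_class`.

```python
def average_precision(confidences, is_true_positive, num_targets):
    """AP as the area under the precision-recall curve (all-point interpolation)."""
    # Rank detections from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_targets)
        precisions.append(tp / (tp + fp))
    # Make precision monotonically non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Four detections of one class against three ground-truth boxes.
conf = [0.9, 0.8, 0.7, 0.6]
correct = [True, False, True, True]
ap = average_precision(conf, correct, num_targets=3)
print(f"AP = {ap:.4f}")
```

mAP is then the mean of such per-class AP values, computed at an IoU threshold of 0.5 for mAP@0.5 or averaged over thresholds from 0.5 to 0.95 for mAP@0.5:0.95.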

4.2 ONNX Export

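import torch

# Load the raw model; autoshape=False skips the AutoShape inference wrapper,
# which handles image pre-processing and is not intended for tracing.
# The repository's export.py (--include onnx) is the maintained export route.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True, autoshape=False)
model.eval()

# Example input
x = torch.randn(1, 3, 640, 640)

# Export the model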
torch.onnx.export(
    model,
    x,
    "yolov5s.onnx",
    export_params=True,
    opset_version=12,
    input_names=['images'],
    output_names=['output'],
    dynamic_axes={
        'images': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)

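5. Troubleshooting Common Problems

5.1 Handling Out-of-Memory Errors

Reduce the --batch-size argument
Use a smaller model (e.g. yolov5n)
Rely on mixed precision: train.py already applies automatic mixed precision (AMP) on CUDA devices; --half is an inference option for val.py and detect.py, not a training flag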

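5.2 When Training Does Not Converge

Check the quality of the data annotations
Tune the initial learning rate through lr0 in the hyperparameter file passed with --hyp (default 0.01); train.py has no --lr flag
Reduce or disable data augmentation by editing the same hyperparameter file (e.g. mosaic: 0.0); --noval only skips per-epoch validation and does not affect augmentation
Check anchor matching: anchor_t (default 4.0) is also a hyperparameter-file key, not a command-line flag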
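
The anchor-matching check can be made concrete with a small sketch. It assumes the matching rule is a width/height ratio test: a ground-truth box matches an anchor when neither dimension differs from the anchor by more than a factor of `anchor_t`. The function name and the example boxes are hypothetical.

```python
def anchor_matches(box_wh, anchor_wh, anchor_t=4.0):
    """True if the box is within a factor of anchor_t of the anchor in both width and height."""
    ratios = [b / a for b, a in zip(box_wh, anchor_wh)]
    worst = max(max(r, 1.0 / r) for r in ratios)
    return worst < anchor_t


# Anchors of the P3/8 layer from the config in section 3.2 (pixels).
anchors_p3 = [(10, 13), (16, 30), (33, 23)]
small_box, large_box = (20, 25), (200, 300)
small_matches = [anchor_matches(small_box, a) for a in anchors_p3]
large_matches = [anchor_matches(large_box, a) for a in anchors_p3]
print(small_matches)
print(large_matches)
```

If many boxes in a dataset match no anchor on any layer, the anchors are a poor fit for the data and recomputing them is worth considering.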

5.3 Inference Optimization Tips

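# Accelerate with TensorRT via torch2trt (requires a CUDA GPU and TensorRT)
import torch
from torch2trt import torch2trt

# model is the YOLOv5 model loaded in section 4.2
model = model.eval().cuda()
x = torch.randn(1, 3, 640, 640).cuda()  # example input on the GPU

model_trt = torch2trt(
    model,
    [x],
    fp16_mode=True,              # build an FP16 engine
    max_workspace_size=1 << 30,  # 1 GiB builder workspace
)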

6. Performance Benchmarks

Model	mAP@0.5:0.95	Parameters	Inference time (V100)	Typical use
yolov5n	28.4	1.9M	2.7ms	Mobile
yolov5s	37.4	7.2M	6.8ms	Edge computing
yolov5m	45.4	21.2M	14.5ms	Server
yolov5l	49.0	46.5M	26.4ms	High performance
yolov5x	50.7	86.7M	48.3ms	Research

Appendix: Core Mathematical Formulas

CIoU Loss Formula

$$\mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(b,b^{gt})}{c^2} + \alpha v$$

where:

$\rho$: Euclidean distance between the centers of the predicted box $b$ and the ground-truth box $b^{gt}$
$c$: diagonal length of the smallest box enclosing both boxes
$v$: aspect-ratio consistency term
$\alpha = \frac{v}{(1-IoU)+v}$
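
The formula can be evaluated term by term, as in the minimal pure-Python sketch below. It assumes boxes in (x1, y1, x2, y2) format and the standard definition $v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$, which the text above does not spell out. The small `eps` only guards against division by zero.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for two boxes in (x1, y1, x2, y2) format."""
    # Intersection over union.
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w * h + wg * hg - inter + eps)
    # rho^2: squared distance between the two box centers.
    dx = (box[0] + box[2] - gt[0] - gt[2]) / 2
    dy = (box[1] + box[3] - gt[1] - gt[3]) / 2
    rho2 = dx ** 2 + dy ** 2
    # c^2: squared diagonal of the smallest box enclosing both.
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # v: aspect-ratio consistency; alpha: its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))       # identical boxes: close to 0
disjoint = ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))  # disjoint boxes: greater than 1
print(same, disjoint)
```

Unlike plain IoU loss, the center-distance term still gives a useful signal when the boxes do not overlap at all.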

Non-Maximum Suppression (NMS) Algorithm

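import numpy as np

def bbox_iou(box, boxes):
    # IoU between one box and an (N, 4) array of boxes, all in (x1, y1, x2, y2) format
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold):
    # Sort indices by score, highest first
    order = scores.argsort()[::-1]
    keep = []

    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU between the current box and all remaining boxes
        ious = bbox_iou(boxes[i], boxes[order[1:]])
        # Keep only the boxes whose IoU does not exceed the threshold
        inds = np.where(ious <= iou_threshold)[0]
        order = order[inds + 1]

    return keep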

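Best-practice recommendations:

For small datasets, start from pretrained weights (--weights yolov5s.pt)
Monitor training with W&B or TensorBoard
Quantize the model and benchmark its speed before deployment
Re-check annotation quality regularly

For complete training logs and pretrained models, see the official repository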
