yoloV5训练visDrone2019-Det无人机视觉下目标检测

一、visDrone2019数据集详解

visDrone2019数据集是无人机视角下最具挑战性的目标检测基准数据集之一，由天津大学机器学习与数据挖掘实验室联合其他研究机构共同构建。该数据集采集自中国14个不同城市，覆盖复杂城市场景、交通枢纽、密集人群等多种环境。

数据集构成：

训练集：6,471张图像（含6471个标注文件）
验证集：548张图像
测试集：3,190张图像
标注类别：10个主要目标类别
pedestrian（行人）、people（人群）、bicycle（自行车）、car（轿车）、van（厢式货车）、truck（卡车）、tricycle（三轮车）、awning-tricycle（带篷三轮车）、bus（公交车）、motor（摩托车）

标注格式特点：

每个图像对应.txt标注文件包含多行数据，每行格式为：
<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<category>,<truncation>,<occlusion>

其中关键参数说明：

bbox_left：边界框左上角x坐标
bbox_top：边界框左上角y坐标
category：目标类别（1-10对应上述类别）
occlusion：遮挡程度（0=完全可见，1=部分遮挡，2=严重遮挡）

二、数据预处理与标签格式转换

YOLOv5要求标签格式为归一化的中心坐标+宽高形式：
<class_id> <x_center> <y_center> <width> <height>

转换步骤详解：

类别说明（visdrone.names）：

0 pedestrian
1 people
2 bicycle
3 car
4 van
5 truck
6 tricycle
7 awning-tricycle
8 bus
9 motor
Python转换脚本：

python 复制代码

def visdrone2yolo(dir):
      from PIL import Image
      from tqdm import tqdm

      def convert_box(size, box):
          # Convert VisDrone box to YOLO xywh box
          dw = 1. / size[0]
          dh = 1. / size[1]
          return (box[0] + box[2] / 2) * dw, (box[1] + box[3] / 2) * dh, box[2] * dw, box[3] * dh

      (dir / 'labels').mkdir(parents=True, exist_ok=True)  # make labels directory
      pbar = tqdm((dir / 'annotations').glob('*.txt'), desc=f'Converting {dir}')
      for f in pbar:
          img_size = Image.open((dir / 'images' / f.name).with_suffix('.jpg')).size
          lines = []
          with open(f, 'r') as file:  # read annotation.txt
              for row in [x.split(',') for x in file.read().strip().splitlines()]:
                  if row[4] == '0':  # VisDrone 'ignored regions' class 0
                      continue
                  cls = int(row[5]) - 1
                  box = convert_box(img_size, tuple(map(int, row[:4])))
                  lines.append(f"{cls} {' '.join(f'{x:.6f}' for x in box)}\n")
                  with open(str(f).replace(os.sep + 'annotations' + os.sep, os.sep + 'labels' + os.sep), 'w') as fl:
                      fl.writelines(lines)  # write label.txt


  # Download
  dir = Path('')  # 修改为你保存VisDrone数据的目录
  # Convert
  for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', 'VisDrone2019-DET-test-dev':
      visdrone2yolo(dir / d)  # convert VisDrone annotations to YOLO labels

关键注意事项：

处理遮挡标注：建议保留occlusion<2的样本（过滤严重遮挡目标）
小目标处理：无人机图像普遍存在小目标问题，需保持原始分辨率
类别平衡：分析各类别分布，必要时进行过采样/欠采样

三、YOLOv5参数设置

1. 数据配置文件（visdrone.yaml）

yaml 复制代码

path: ../datasets/VisDrone  # dataset root dir
train: VisDrone2019-DET-train/images  # train images (relative to 'path')  6471 images
val: VisDrone2019-DET-val/images  # val images (relative to 'path')  548 images
test: VisDrone2019-DET-test-dev/images  # test images (optional)  1610 images

# Classes
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

2. 关键训练参数解析

预训练模型：

python 复制代码

--weights yolov5s.pt

训练超参选择hyp.scratch-low.yaml

python 复制代码

--hyp scratch-low.yaml

yp.scratch-low.yaml 参数基本说明

yaml 复制代码

lr0: 0.01        # 初始学习率
lrf: 0.1         # 最终学习率系数
momentum: 0.937  # SGD动量
weight_decay: 0.0005  # 权重衰减
warmup_epochs: 3.0    # 学习率预热
warmup_momentum: 0.8  
warmup_bias_lr: 0.1   
box: 0.05        # 边界框损失权重
cls: 0.3         # 分类损失权重
cls_pw: 1.0      # 分类正样本权重
obj: 0.7         # 目标存在损失权重
obj_pw: 1.0      # 目标存在正样本权重
iou_t: 0.20      # IoU训练阈值
anchor_t: 4.0    # anchor阈值
fl_gamma: 0.0    # Focal loss gamma

四、小结

在训练启动的过程中，提示默认的anchors值不是最优的配置，并给出了重新聚类后的anchors。
VisDrone2019存在大量的极小目标，使用640*640的训练尺寸，会导致在训练时，大量的小目标，被缩放为一个点目标，这对训练会起到副作用，所以，这个问题得去优化。
VisDrone2019存在inogre区域（有目标，但是未标注），应该进行抹黑处理，并出现标注不完整的情况。