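Column: YOLO In-Depth Analysis and Improvements
Current status: baseline training of the original YOLO-series models is in progress. This post first covers the dataset background, the motivation for the experiments and the design of the batch-training script; full results will be added once training finishes.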
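1. Why VisDrone is used as the dataset for YOLO improvement experiments
For general object detection, datasets such as COCO and VOC are better suited to validating an algorithm's basic capabilities. If the goal is to study where YOLO can be improved in complex real-world scenes, however, aerial drone imagery is often more informative. VisDrone is one of the most widely used benchmarks for this kind of task.
VisDrone was released by the AISKYEYE team at Tianjin University. The data was captured by different drone platforms in 14 cities across China and covers urban and rural scenes, sparse and dense objects, and different weather and lighting conditions. According to the official repository, the dataset consists of 288 video clips (261,908 frames in total) and 10,209 static images, with more than 2.6 million annotated bounding boxes plus attributes such as object category, occlusion and visibility (source: VisDrone official repository).
This kind of data is hard for YOLO models, mainly for three reasons:
- A high proportion of small objects. From a drone's viewpoint, pedestrians, vehicles, motorcycles and similar objects often occupy only a tiny region of the image.
- Large variation in object density. Some frames contain only a few sparse objects, while frames of urban roads or plazas can be extremely crowded.
- Strong viewpoint and scale variation. Differences in flight altitude, camera pitch and drone motion cause large changes in object scale, pose and occlusion.
This makes VisDrone a good testbed for comparing "original YOLO baselines" with "improved YOLO models". Compared with mAP on simpler datasets, VisDrone is far more likely to expose weaknesses in small-object recall, localization in dense scenes and multi-scale feature fusion.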
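To put a number on the "high proportion of small objects" for a given split, one option is to bucket the YOLO label boxes by pixel area. The sketch below borrows the COCO size thresholds (area below 32² pixels is small, below 96² is medium); those thresholds, the function names and the sample boxes are assumptions for illustration, not something defined by VisDrone or Ultralytics.

```python
from __future__ import annotations

from collections import Counter

# Assumption: COCO-style size thresholds on pixel area (VisDrone defines no such buckets).
SMALL_MAX = 32 * 32
MEDIUM_MAX = 96 * 96


def size_bucket(w_norm: float, h_norm: float, img_w: int, img_h: int) -> str:
    """Classify one YOLO box by pixel area, given its normalized width and height."""
    area = (w_norm * img_w) * (h_norm * img_h)
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


def size_histogram(label_lines: list[str], img_w: int, img_h: int) -> Counter:
    """Count small/medium/large boxes in the lines of one YOLO label file."""
    counts: Counter = Counter()
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        _, _, _, w, h = map(float, parts)
        counts[size_bucket(w, h, img_w, img_h)] += 1
    return counts


if __name__ == "__main__":
    # Made-up labels on a hypothetical 1920x1080 frame: a distant pedestrian and a nearby bus.
    demo = ["0 0.5 0.5 0.008 0.02", "8 0.3 0.6 0.12 0.15"]
    print(size_histogram(demo, 1920, 1080))
```

Summing these histograms over every label file in a split, using each image's real width and height, would give the split-level small-object ratio.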
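2. VisDrone2019-DET dataset structure
This post uses VisDrone's image object detection subtask, VisDrone2019-DET. The Ultralytics documentation organizes VisDrone into several task subsets: object detection in images, object detection in videos, single-object tracking, multi-object tracking and crowd counting (source: Ultralytics VisDrone documentation).
In Ultralytics' `VisDrone.yaml`, the paths and splits for the detection task are: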
| Split | Path | Images | Purpose |
|---|---|---|---|
| train | images/train | 6471 | Model training |
| val | images/val | 548 | Validation during training and model selection |
| test | images/test | 1610 | test-dev, available for later testing |
There are 10 classes:
| ID | Class |
|---|---|
| 0 | pedestrian |
| 1 | people |
| 2 | bicycle |
| 3 | car |
| 4 | van |
| 5 | truck |
| 6 | tricycle |
| 7 | awning-tricycle |
| 8 | bus |
| 9 | motor |
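Pay special attention to the pedestrian and people classes. They are semantically close but are two separate classes in the VisDrone annotation scheme (per the VisDrone annotation guidelines, a person who is standing or walking is labeled pedestrian, and a person in any other pose is labeled people). For general detection experiments the official class definitions should be kept. If a downstream task only cares about "person", the two can be merged, but that amounts to redefining the task, and the results should not be compared directly with the official 10-class baseline.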
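If a person-only variant is ever needed, the merge can be done on the converted YOLO labels. The sketch below uses a hypothetical helper (not part of Ultralytics) that keeps only classes 0 (pedestrian) and 1 (people), relabels both as class 0 and drops everything else. Such a dataset would need its own single-class YAML, and its results are not comparable with the 10-class baseline.

```python
from __future__ import annotations

# Class ids from VisDrone.yaml: 0 = pedestrian, 1 = people.
PERSON_CLASSES = {0, 1}


def to_person_only(label_lines: list[str]) -> list[str]:
    """Hypothetical remap for a person-only task: keep classes 0/1 and relabel both as 0.

    Box coordinates are copied unchanged; every other class is dropped.
    """
    out = []
    for line in label_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if int(parts[0]) in PERSON_CLASSES:
            out.append(" ".join(["0", *parts[1:]]))
    return out


if __name__ == "__main__":
    print(to_person_only(["0 0.1 0.1 0.01 0.02", "1 0.2 0.2 0.02 0.02", "3 0.5 0.5 0.1 0.1"]))
```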
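3. Annotation format and conversion to YOLO format
VisDrone's original annotations are not in YOLO's `class x_center y_center width height` format. The Ultralytics dataset config includes automatic download and conversion logic: it reads the original VisDrone annotations, converts each box from top-left corner plus width/height into normalized center coordinates plus width/height, and writes the result to YOLO label files.
The conversion also skips ignored regions: boxes marked as ignored in the original annotations never make it into the training labels. This matters because drone imagery contains many regions that cannot be labeled unambiguously, and training on ignored regions as if they were ordinary objects would teach the model noisy labels.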
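The conversion described above can be sketched for a single annotation line. The field order below is the VisDrone-DET ground-truth text format (bbox_left, bbox_top, bbox_width, bbox_height, score, object_category, truncation, occlusion), where a score of 0 marks a box that is ignored. The function name is hypothetical, and the range check on the category is an extra guard added in this sketch rather than a copy of the Ultralytics code.

```python
from __future__ import annotations


def visdrone_to_yolo(line: str, img_w: int, img_h: int) -> str | None:
    """Convert one VisDrone-DET annotation line into one YOLO label line.

    Assumed field order (VisDrone-DET ground truth):
    bbox_left, bbox_top, bbox_width, bbox_height, score, object_category, truncation, occlusion
    Returns None when the box must not become a training label.
    """
    row = [x.strip() for x in line.strip().split(",")]
    if len(row) < 6:
        return None  # blank or malformed line
    if row[4] == "0":
        return None  # score 0: ignored region, excluded from training labels
    left, top, w, h = map(int, row[:4])
    cls = int(row[5]) - 1  # VisDrone categories 1..10 -> YOLO classes 0..9
    if not 0 <= cls <= 9:
        return None  # extra guard in this sketch, e.g. category 11 ("others")
    # Top-left corner + size in pixels -> normalized center + size.
    xc = (left + w / 2) / img_w
    yc = (top + h / 2) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"


if __name__ == "__main__":
    # Made-up annotation line on a hypothetical 1000x500 image.
    print(visdrone_to_yolo("100,200,40,60,1,4,0,0", 1000, 500))
```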
This experiment uses the dataset config that ships with the project as-is:
```text
ultralytics/cfg/datasets/VisDrone.yaml
```
This config already defines the data paths, the class names, the train/val/test splits and the conversion from the original VisDrone format to YOLO format.
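Ultralytics locates each image's label file by replacing the last `images` directory in the image path with `labels` and the extension with `.txt`, so the converted dataset has to keep `images/<split>` and `labels/<split>` side by side. The sketch below mirrors that convention to list images with no label file; the helper names are hypothetical, and such an image is expected to be treated by Ultralytics as containing no objects rather than raising an error.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def img2label_path(img_path: str) -> str:
    """Swap the last /images/ directory for /labels/ and the extension for .txt."""
    sa, sb = f"{os.sep}images{os.sep}", f"{os.sep}labels{os.sep}"
    return sb.join(img_path.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt"


def images_without_labels(images_dir: str | Path, pattern: str = "*.jpg") -> list[str]:
    """List image file names in images_dir whose expected label file does not exist."""
    missing = []
    for img in sorted(Path(images_dir).glob(pattern)):
        if not Path(img2label_path(str(img))).exists():
            missing.append(img.name)
    return missing


if __name__ == "__main__":
    # Throwaway layout for illustration; on the real dataset, pass <dataset root>/images/train.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "images" / "train").mkdir(parents=True)
        (root / "labels" / "train").mkdir(parents=True)
        (root / "images" / "train" / "a.jpg").write_bytes(b"")
        (root / "images" / "train" / "b.jpg").write_bytes(b"")
        (root / "labels" / "train" / "a.txt").write_text("")
        print(images_without_labels(root / "images" / "train"))
```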
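4. Scope of this post: original YOLO-series baselines
The goal of this experiment is not to propose a modified architecture right away, but to first establish a reproducible set of original YOLO baselines. Every later change, whether a structural modification, an attention mechanism, a small-object detection head, a loss-function adjustment or a Neck/FPN/PAN redesign, will be compared against these baselines.
Accordingly, "original YOLO model" in this post means:
- using the YOLO-series YAML model configs already in the project;
- no additional attention modules;
- no changes to the detection head;
- no changes to the loss function;
- no changes to the dataset's class definitions;
- the same training script, training hyperparameters and validation pipeline for every model.
The models currently trained in batch by `benchmark_visdrone_fixed.py` are: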
```text
YOLOv8l
YOLOv10l
YOLO11l
YOLO12l
YOLO26l
RT-DETR-l
```
The full `benchmark_visdrone_fixed.py` script:
```python
#!/usr/bin/env python3
"""
VisDrone multi-model benchmark with safe DDP training.

Key difference from benchmark_visdrone.py:
- DDP is used only for training.
- Validation runs in a separate normal Python process after torchrun exits.

This avoids rank 0 entering model.val() while rank 1..N have already skipped
validation, which can hang or fail in distributed metric gathering.
"""

from __future__ import annotations

import argparse
import csv
import json
import os
import socket
import subprocess
import sys
from pathlib import Path

ROOT = Path(__file__).resolve().parent

# Put the project root first on sys.path (and on PYTHONPATH for child processes)
# so that the project's own ultralytics package is the one imported.
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))
env_pythonpath = os.environ.get("PYTHONPATH", "")
if str(ROOT) not in env_pythonpath:
    os.environ["PYTHONPATH"] = str(ROOT) + (os.pathsep + env_pythonpath if env_pythonpath else "")

YOLO_MODELS = {
    "YOLOv8l": "yolov8l.yaml",
    "YOLOv10l": "yolov10l.yaml",
    "YOLO11l": "yolo11l.yaml",
    "YOLO12l": "yolo12l.yaml",
    "YOLO26l": "yolo26l.yaml",
}

RTDETR_MODELS = {
    "RT-DETR-l": "rtdetr-l.yaml",
}

# Worker script for the training phase. It is rendered with str.format(), so
# literal braces in the generated code are written as {{ and }}.
TRAIN_WORKER_TEMPLATE = '''\
import os
import sys
import traceback
from pathlib import Path

ROOT = Path({root!r})
sys.path.insert(0, str(ROOT))
os.environ["PYTHONPATH"] = str(ROOT)

from ultralytics import RTDETR, YOLO
from ultralytics.utils import RANK

model_name = {model_name!r}
model_cfg = {model_cfg!r}
is_rtdetr = {is_rtdetr!r}
data_yaml = {data!r}
epochs = {epochs}
batch = {batch}
imgsz = {imgsz}
device_ids = {device_ids!r}
workers = {workers}
seed = {seed}
save_dir = Path({save_dir!r})

if __name__ == "__main__":
    try:
        # Build the model from its YAML architecture config (no .pt checkpoint is passed).
        model = RTDETR(model=model_cfg) if is_rtdetr else YOLO(model=model_cfg)
        if RANK in {{-1, 0}}:
            params_m = sum(p.numel() for p in model.model.parameters()) / 1e6
            print(f"[INFO] {{model_name}} params: {{params_m:.2f}} M")
        model.train(
            data=data_yaml,
            epochs=epochs,
            batch=batch,
            imgsz=imgsz,
            device=device_ids,
            workers=workers,
            seed=seed,
            patience=0,  # 0 disables early stopping, so every model runs all epochs
            verbose=(RANK in {{-1, 0}}),
            plots=False,
            project=str(save_dir),
            name=model_name,
            exist_ok=True,
        )
    except Exception:
        traceback.print_exc()
        sys.exit(1)
'''

# Worker script for the validation phase: a plain, non-distributed process that
# evaluates the finished checkpoint and writes its metrics to a JSON file.
VAL_WORKER_TEMPLATE = '''\
import json
import os
import sys
import traceback
from pathlib import Path

ROOT = Path({root!r})
sys.path.insert(0, str(ROOT))
os.environ["PYTHONPATH"] = str(ROOT)

from ultralytics import RTDETR, YOLO

model_name = {model_name!r}
weights = {weights!r}
is_rtdetr = {is_rtdetr!r}
data_yaml = {data!r}
imgsz = {imgsz}
batch = {batch}
device = {device!r}
workers = {workers}
results_file = Path({results_file!r})

if __name__ == "__main__":
    try:
        model = RTDETR(model=weights) if is_rtdetr else YOLO(model=weights)
        params_m = sum(p.numel() for p in model.model.parameters()) / 1e6
        val_results = model.val(
            data=data_yaml,
            split="val",
            imgsz=imgsz,
            batch=batch,
            device=device,
            workers=workers,
            verbose=False,
            plots=False,
        )
        speed = val_results.speed
        metrics = {{
            "model": model_name,
            "params_M": round(params_m, 2),
            "mAP50": round(float(val_results.box.map50), 4),
            "mAP50_95": round(float(val_results.box.map), 4),
            "precision": round(float(val_results.box.mp), 4),
            "recall": round(float(val_results.box.mr), 4),
            "inference_ms": round(float(speed.get("inference", 0)), 2),
        }}
        results_file.parent.mkdir(parents=True, exist_ok=True)
        with open(results_file, "w", encoding="utf-8") as f:
            json.dump(metrics, f)
        print(f"[DONE] {{model_name}} metrics saved to {{results_file}}")
    except Exception:
        traceback.print_exc()
        sys.exit(1)
'''


def parse_args() -> argparse.Namespace:
    all_model_names = list(YOLO_MODELS) + list(RTDETR_MODELS)
    parser = argparse.ArgumentParser(description="VisDrone multi-model benchmark")
    parser.add_argument("--data", type=str, default="ultralytics/cfg/datasets/VisDrone.yaml")
    parser.add_argument("--epochs", type=int, default=300)
    parser.add_argument("--batch", type=int, default=128, help="Global training batch size.")
    parser.add_argument("--imgsz", type=int, default=640)
    parser.add_argument("--device", type=str, default="0,1,2,3,4,5,6,7")
    parser.add_argument("--workers", type=int, default=4)
    parser.add_argument("--val-device", type=str, default="0", help="Device used by the post-training val process.")
    parser.add_argument(
        "--val-batch",
        type=int,
        default=None,
        help="Validation batch size. Defaults to train batch divided by GPU count.",
    )
    parser.add_argument(
        "--models",
        type=str,
        nargs="+",
        default=None,
        help="Models to benchmark. Choices: " + " ".join(all_model_names),
    )
    parser.add_argument("--seed", type=int, default=42)
    return parser.parse_args()


def find_free_port() -> int:
    # Binding to port 0 lets the OS pick an unused port for torchrun's --master_port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def clean_ddp_env(env: dict[str, str]) -> dict[str, str]:
    # Drop torchrun / torch elastic variables so each child starts as a fresh,
    # non-distributed process.
    env = env.copy()
    for key in (
        "RANK",
        "LOCAL_RANK",
        "WORLD_SIZE",
        "LOCAL_WORLD_SIZE",
        "MASTER_ADDR",
        "MASTER_PORT",
        "GROUP_RANK",
        "ROLE_RANK",
        "ROLE_WORLD_SIZE",
        "TORCHELASTIC_RUN_ID",
    ):
        env.pop(key, None)
    return env


def selected_models(args: argparse.Namespace) -> dict[str, tuple[str, bool]]:
    # Map display name -> (model YAML, is_rtdetr).
    all_models = {k: (v, False) for k, v in YOLO_MODELS.items()}
    all_models.update({k: (v, True) for k, v in RTDETR_MODELS.items()})
    if args.models is None:
        return all_models
    selected = {k: v for k, v in all_models.items() if k in args.models}
    missing = [m for m in args.models if m not in all_models]
    if missing:
        raise ValueError(f"Unknown model(s): {missing}. Available: {list(all_models)}")
    return selected


def run_one_model(model_name: str, model_cfg: str, is_rtdetr: bool, args: argparse.Namespace) -> dict | None:
    print("\n" + "=" * 80)
    print(f">>> Training: {model_name}")
    print("=" * 80)

    device_ids = [x.strip() for x in args.device.split(",") if x.strip()]
    num_gpus = len(device_ids)
    if num_gpus < 1:
        raise ValueError("--device must contain at least one CUDA device id, e.g. 0 or 0,1,2,3")
    if args.batch % num_gpus != 0:
        raise ValueError(f"--batch {args.batch} must be divisible by GPU count {num_gpus}")

    dataset_name = Path(args.data).stem
    save_dir = ROOT / "runs" / f"benchmark_{dataset_name}"
    run_dir = save_dir / model_name
    weights_dir = run_dir / "weights"
    results_dir = ROOT / "runs" / ".benchmark_results"
    results_dir.mkdir(parents=True, exist_ok=True)
    results_file = results_dir / f"{model_name}.json"
    results_file.unlink(missing_ok=True)  # never report a stale result from an earlier run

    # Phase 1: write the training worker and launch it (torchrun for >1 GPU).
    train_worker = results_dir / f"_train_{model_name}.py"
    train_worker.write_text(
        TRAIN_WORKER_TEMPLATE.format(
            root=str(ROOT),
            model_name=model_name,
            model_cfg=model_cfg,
            is_rtdetr=is_rtdetr,
            data=args.data,
            epochs=args.epochs,
            batch=args.batch,
            imgsz=args.imgsz,
            device_ids=args.device,
            workers=args.workers,
            seed=args.seed,
            save_dir=str(save_dir),
        ),
        encoding="utf-8",
    )

    if num_gpus > 1:
        train_cmd = [
            sys.executable,
            "-m",
            "torch.distributed.run",
            "--nproc_per_node",
            str(num_gpus),
            "--master_port",
            str(find_free_port()),
            str(train_worker),
        ]
    else:
        train_cmd = [sys.executable, str(train_worker)]
    print("[TRAIN CMD]", " ".join(train_cmd))
    train_proc = subprocess.run(train_cmd, cwd=str(ROOT), env=clean_ddp_env(os.environ))
    if train_proc.returncode != 0:
        print(f"[ERROR] {model_name} training failed with exit code {train_proc.returncode}")
        return None

    # All training ranks have exited here. Prefer best.pt, fall back to last.pt.
    best = weights_dir / "best.pt"
    last = weights_dir / "last.pt"
    weights = best if best.exists() else last
    if not weights.exists():
        print(f"[ERROR] no checkpoint found for {model_name}: expected {best} or {last}")
        return None

    # Phase 2: validate in a separate, non-distributed Python process.
    val_batch = args.val_batch or max(args.batch // num_gpus, 1)
    val_worker = results_dir / f"_val_{model_name}.py"
    val_worker.write_text(
        VAL_WORKER_TEMPLATE.format(
            root=str(ROOT),
            model_name=model_name,
            weights=str(weights),
            is_rtdetr=is_rtdetr,
            data=args.data,
            imgsz=args.imgsz,
            batch=val_batch,
            device=args.val_device,
            workers=args.workers,
            results_file=str(results_file),
        ),
        encoding="utf-8",
    )

    print(f">>> Validating: {model_name} with {weights}")
    val_env = clean_ddp_env(os.environ)
    val_env["CUDA_VISIBLE_DEVICES"] = args.device
    val_cmd = [sys.executable, str(val_worker)]
    print("[VAL CMD]", " ".join(val_cmd))
    val_proc = subprocess.run(val_cmd, cwd=str(ROOT), env=val_env)
    if val_proc.returncode != 0:
        print(f"[ERROR] {model_name} validation failed with exit code {val_proc.returncode}")
        return None
    if not results_file.exists():
        print(f"[ERROR] result file was not generated: {results_file}")
        return None

    with open(results_file, "r", encoding="utf-8") as f:
        metrics = json.load(f)
    print(f"\n--- {model_name} metrics ---")
    for key, value in metrics.items():
        print(f"{key}: {value}")
    return metrics


def find_best(all_results: list[dict]) -> dict[str, dict]:
    return {
        "mAP50": max(all_results, key=lambda x: x["mAP50"]),
        "mAP50_95": max(all_results, key=lambda x: x["mAP50_95"]),
        "speed": min(all_results, key=lambda x: x["inference_ms"]),
        "precision": max(all_results, key=lambda x: x["precision"]),
        "recall": max(all_results, key=lambda x: x["recall"]),
    }


def print_summary_table(all_results: list[dict]) -> tuple[list[dict], dict[str, dict]]:
    print("\n\n" + "=" * 110)
    print("VisDrone multi-model benchmark summary".center(110))
    print("=" * 110)
    print(
        f"{'Model':<14s}"
        f"{'Params(M)':>10s}"
        f"{'Infer(ms)':>10s}"
        f"{'mAP50':>10s}"
        f"{'mAP50-95':>10s}"
        f"{'Precision':>10s}"
        f"{'Recall':>10s}"
    )
    print("-" * 110)
    sorted_results = sorted(all_results, key=lambda x: x["mAP50_95"], reverse=True)
    for r in sorted_results:
        print(
            f"{r['model']:<14s}"
            f"{r['params_M']:>10.2f}"
            f"{r['inference_ms']:>10.2f}"
            f"{r['mAP50']:>10.4f}"
            f"{r['mAP50_95']:>10.4f}"
            f"{r['precision']:>10.4f}"
            f"{r['recall']:>10.4f}"
        )
    print("-" * 110)
    best = find_best(all_results)
    print(f"Best mAP50: {best['mAP50']['model']} ({best['mAP50']['mAP50']})")
    print(f"Best mAP50-95: {best['mAP50_95']['model']} ({best['mAP50_95']['mAP50_95']})")
    print(f"Fastest: {best['speed']['model']} ({best['speed']['inference_ms']} ms)")
    print(f"Best precision: {best['precision']['model']} ({best['precision']['precision']})")
    print(f"Best recall: {best['recall']['model']} ({best['recall']['recall']})")
    print("=" * 110)
    return sorted_results, best


def save_reports(sorted_results: list[dict], best: dict[str, dict], dataset_name: str) -> None:
    csv_path = ROOT / f"benchmark_results_{dataset_name}_fixed.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f,
            fieldnames=["model", "params_M", "inference_ms", "mAP50", "mAP50_95", "precision", "recall"],
        )
        writer.writeheader()
        writer.writerows(sorted_results)
    print(f"\nCSV saved to: {csv_path}")

    md_path = ROOT / f"benchmark_results_{dataset_name}_fixed.md"
    lines = [
        "# VisDrone multi-model benchmark results",
        "",
        "| Model | Params (M) | Inference (ms) | mAP50 | mAP50-95 | Precision | Recall |",
        "|-------|------------|----------------|-------|----------|-----------|--------|",
    ]
    for r in sorted_results:
        lines.append(
            f"| {r['model']} | {r['params_M']:.2f} | {r['inference_ms']:.2f} "
            f"| {r['mAP50']:.4f} | {r['mAP50_95']:.4f} "
            f"| {r['precision']:.4f} | {r['recall']:.4f} |"
        )
    lines += [
        "",
        f"- Best mAP50: {best['mAP50']['model']} ({best['mAP50']['mAP50']})",
        f"- Best mAP50-95: {best['mAP50_95']['model']} ({best['mAP50_95']['mAP50_95']})",
        f"- Fastest: {best['speed']['model']} ({best['speed']['inference_ms']} ms)",
        f"- Best precision: {best['precision']['model']} ({best['precision']['precision']})",
        f"- Best recall: {best['recall']['model']} ({best['recall']['recall']})",
    ]
    with open(md_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
    print(f"Markdown saved to: {md_path}")


def main() -> None:
    # The parent must stay non-distributed; it launches torchrun itself per model.
    if "LOCAL_RANK" in os.environ or "RANK" in os.environ:
        raise SystemExit("Run this script with plain python, not torchrun.")

    args = parse_args()
    models = selected_models(args)
    print(f"Devices: {args.device}")
    print(f"Dataset: {args.data} | Epochs: {args.epochs} | Batch: {args.batch} | ImgSz: {args.imgsz}")
    print(f"Val device: {args.val_device} | Val batch: {args.val_batch or 'batch/gpu_count'}")
    print(f"Models: {list(models)}")

    all_results = []
    for name, (cfg, is_rtdetr) in models.items():
        metrics = run_one_model(name, cfg, is_rtdetr, args)
        if metrics is not None:
            all_results.append(metrics)

    if not all_results:
        print("\nAll model runs failed. No report generated.")
        return

    sorted_results, best = print_summary_table(all_results)
    save_reports(sorted_results, best, Path(args.data).stem)


if __name__ == "__main__":
    main()
```
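All of these models are trained on the same VisDrone2019-DET dataset, and the script records parameter count, inference time, mAP50, mAP50-95, precision and recall for each of them. When the whole batch finishes, the script writes `benchmark_results_VisDrone_fixed.csv` and `benchmark_results_VisDrone_fixed.md` to the project root, ready to be turned into the results table for the article; each model's metrics are first saved individually to `runs/.benchmark_results/*.json`.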
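If a long batch run is interrupted, the per-model JSON files remain usable. A minimal sketch for reloading them (the function name is hypothetical): it returns rows in the same shape as `all_results` in `main()`, sorted by mAP50-95, so they could be handed to the script's `print_summary_table()` and then `save_reports()`. The numbers in the demo are placeholders, not experimental results.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def load_partial_results(results_dir: str | Path) -> list[dict]:
    """Read every per-model metrics JSON in results_dir, best mAP50-95 first.

    Only *.json files are read, so the _train_*.py / _val_*.py worker scripts
    stored in the same directory are ignored.
    """
    rows = []
    for path in sorted(Path(results_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            rows.append(json.load(f))
    return sorted(rows, key=lambda r: r["mAP50_95"], reverse=True)


if __name__ == "__main__":
    # Real use: load_partial_results("runs/.benchmark_results")
    with tempfile.TemporaryDirectory() as d:
        for name, score in [("A", 0.1), ("B", 0.2)]:  # placeholder values
            (Path(d) / f"{name}.json").write_text(json.dumps({"model": name, "mAP50_95": score}))
        print([r["model"] for r in load_partial_results(d)])
```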
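5. Why benchmark_visdrone_fixed.py is needed
When training a single model, one `model.train()` call is enough. Once several YOLO versions have to be trained back to back, however, script design matters much more, especially with DDP training on an 8-GPU server.
The original batch-training script had a subtle problem: training used multi-GPU DDP, but the validation after training ran only in the rank 0 process. The validator was therefore still inside a distributed context yet could not collect metrics from the other ranks, which at best hangs the run and at worst raises NCCL or `dist.gather_object()` errors.
The core fix in `benchmark_visdrone_fixed.py` is to separate training from validation: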
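- For each model, the parent process launches DDP training via `torch.distributed.run`.
- When DDP training finishes, all training ranks exit.
- The parent process then starts a separate, plain Python subprocess that loads the model's `best.pt` (or `last.pt` if `best.pt` is missing) and runs validation.
- The validation results are written to a per-model JSON file.
- The parent process reads the JSON files and builds the summary table after all models are done.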
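This flow makes full use of multiple GPUs during training while keeping validation free of the communication problems caused by leftover DDP context. For long batch experiments it is more stable than packing all the logic into a single Python process, and it makes it easier to pinpoint which model failed.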
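The control flow above can be reduced to a dependency-free sketch: one subprocess stands in for the DDP training phase and writes a "checkpoint", and a second, independent subprocess stands in for validation and hands its metrics back through JSON. Both phases are dummies (no torch, no Ultralytics) and the score is a placeholder; only the process boundaries and the JSON hand-off mirror `benchmark_visdrone_fixed.py`.

```python
from __future__ import annotations

import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Dummy phases (no torch, no Ultralytics). sys.argv[1], sys.argv[2] are file paths.
TRAIN_PHASE = "import pathlib, sys; pathlib.Path(sys.argv[1]).write_text('fake-weights')"
VAL_PHASE = (
    "import json, pathlib, sys; "
    "w = pathlib.Path(sys.argv[1]).read_text(); "
    "pathlib.Path(sys.argv[2]).write_text(json.dumps({'weights': w, 'score': 0.0}))"
)


def run_two_phase(workdir: Path) -> dict | None:
    """Run 'training' and 'validation' in two separate interpreters, JSON in between."""
    ckpt, metrics = workdir / "best.pt", workdir / "metrics.json"
    # Phase 1: stands in for `python -m torch.distributed.run ... _train_<model>.py`.
    if subprocess.run([sys.executable, "-c", TRAIN_PHASE, str(ckpt)]).returncode != 0:
        return None
    if not ckpt.exists():
        return None  # same idea as the script's best.pt / last.pt check
    # Phase 2: a fresh, non-distributed process stands in for _val_<model>.py.
    if subprocess.run([sys.executable, "-c", VAL_PHASE, str(ckpt), str(metrics)]).returncode != 0:
        return None
    return json.loads(metrics.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_two_phase(Path(d)))
```

Because the validation interpreter never imports anything from the training processes, no distributed state can leak into it; a failure in either phase only ends that model's run, which is the same isolation property the real script relies on.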
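6. Experimental settings
The settings used for the current run are listed below. They match the script defaults except for `--workers`, which defaults to 4 in `parse_args()` and is set to 32 by the example command: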
```text
data = ultralytics/cfg/datasets/VisDrone.yaml
epochs = 300
batch = 128
imgsz = 640
device = 0,1,2,3,4,5,6,7
workers = 32
seed = 42
```
Example command:
```bash
python benchmark_visdrone_fixed.py \
    --data ultralytics/cfg/datasets/VisDrone.yaml \
    --epochs 300 \
    --batch 128 \
    --imgsz 640 \
    --device 0,1,2,3,4,5,6,7 \
    --workers 32
```
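Here `batch=128` is the global batch size, so with 8 GPUs each GPU processes 16 images per step. By default the validation process uses batch / number of GPUs (16 here) as its batch size, so the global training batch is not pushed onto a single GPU during validation.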
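The batch arithmetic, including the divisibility check and the validation-batch default, can be restated as a small function. It re-implements the rules in `run_one_model()` for illustration only; `split_batch` is a hypothetical name.

```python
from __future__ import annotations


def split_batch(global_batch: int, device: str, val_batch: int | None = None) -> tuple[int, int]:
    """Return (per-GPU training batch, validation batch) using the script's rules.

    Mirrors run_one_model(): the global batch must divide evenly across GPUs, and
    the validation batch defaults to global_batch // num_gpus (at least 1).
    """
    gpus = [d.strip() for d in device.split(",") if d.strip()]
    if not gpus:
        raise ValueError("device must list at least one GPU id")
    if global_batch % len(gpus) != 0:
        raise ValueError(f"batch {global_batch} is not divisible by {len(gpus)} GPUs")
    per_gpu = global_batch // len(gpus)
    return per_gpu, val_batch or max(per_gpu, 1)


if __name__ == "__main__":
    print(split_batch(128, "0,1,2,3,4,5,6,7"))  # the run described in this post
```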
7. Evaluation metrics
To make later side-by-side comparisons easier, this post reports the following metrics:
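| Metric | Meaning |
|---|---|
| Params(M) | Number of model parameters in millions; reflects model size |
| Inference(ms) | Per-image model inference time in milliseconds, measured during validation (forward pass only, so it depends on the GPU and the validation batch size); reflects speed |
| mAP50 | Mean average precision at IoU = 0.50 |
| mAP50-95 | COCO-style mAP averaged over IoU thresholds 0.50, 0.55, ..., 0.95; stricter |
| Precision | Fraction of predicted detections that are correct, TP / (TP + FP) |
| Recall | Fraction of ground-truth objects that are detected, TP / (TP + FN) |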
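For reference, the count-based definitions behind the Precision and Recall columns, plus the ten IoU thresholds that mAP50-95 averages over, are sketched below. Ultralytics reports precision and recall as class-averaged values at a single confidence threshold that it selects internally, so this shows the definitions rather than a way to reproduce the reported numbers; the counts in the demo are made up.

```python
from __future__ import annotations


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Empty denominators give 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95 (10 values).
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    # Made-up counts: 80 correct detections, 20 false alarms, 40 missed objects.
    print(precision_recall(80, 20, 40))
    print(COCO_IOU_THRESHOLDS)
```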
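For a dataset like VisDrone, full of small and densely packed objects, mAP50 alone is not enough. mAP50 is relatively tolerant of localization errors, while mAP50-95 demands much tighter boxes and therefore better reflects a model's ability to localize small objects precisely. Recall is equally important, because missing small objects is the most common failure in drone-based detection.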
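A toy calculation shows why small objects suffer more under mAP50-95: the same 3-pixel horizontal offset costs a 10×10 box far more IoU than a 100×100 box. Counting how many of the ten mAP50-95 thresholds each IoU clears is only an illustration, not an AP computation, and the function names are hypothetical.

```python
from __future__ import annotations

# IoU thresholds used by mAP50-95.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) in pixels."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def thresholds_cleared(iou_value: float) -> int:
    """How many of the ten mAP50-95 IoU thresholds this match would satisfy."""
    return sum(iou_value >= t for t in THRESHOLDS)


if __name__ == "__main__":
    # Same 3-pixel horizontal shift applied to a 10x10 box and a 100x100 box.
    small = iou((0, 0, 10, 10), (3, 0, 13, 10))      # 70 / 130, about 0.538
    large = iou((0, 0, 100, 100), (3, 0, 103, 100))  # 9700 / 10300, about 0.942
    print(f"small: IoU={small:.3f}, thresholds cleared={thresholds_cleared(small)}/10")
    print(f"large: IoU={large:.3f}, thresholds cleared={thresholds_cleared(large)}/10")
```

In this toy case both predictions count as hits for mAP50, but the small box clears only 1 of the 10 thresholds while the large box clears 9.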
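8. Plan for the result analysis
The models are still training, so this post does not include the final results table yet. Once training completes, the next part will add:
- the full metrics table for each YOLO version on the VisDrone validation set;
- an analysis of the gap between mAP50 and mAP50-95;
- the trade-offs among speed, parameter count and accuracy;
- possible reasons for low recall on small objects;
- the main weaknesses of the original models;
- directions for follow-up improvements, including small-object detection heads, multi-scale feature fusion, attention mechanisms, loss functions and data augmentation strategies.
The value of this experiment lies not in simply seeing "which version scores highest", but in establishing a trustworthy reference point for later improvements. Only after the original models have been run cleanly on the same dataset, with the same training strategy and the same evaluation pipeline, can the results of later improvements be meaningfully interpreted.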
References
- VisDrone official dataset repository: https://github.com/VisDrone/VisDrone-Dataset
- Ultralytics VisDrone dataset documentation: https://docs.ultralytics.com/datasets/detect/visdrone/
- Vision Meets Drones: A Challenge: https://arxiv.org/abs/1804.07437
- Detection and Tracking Meet Drones Challenge: https://arxiv.org/abs/2001.06303