[Perception in Practice · Data Augmentation] A Deep Dive into Image Data Augmentation Algorithms for Object Detection, with Visual Demos

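Author: 探物 AI

> Overview: In visual deep learning, data sets the ceiling on model performance. For developers with limited compute or scarce samples, data augmentation is the most cost-effective trick in the book: many accuracy gains come simply from having more data and from making that data more general. This article surveys the common data augmentation algorithms and demonstrates the effect of each.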

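I. The Four Dimensions of Data Augmentation

1. Geometric and Pixel-Level Augmentation

This is the most classic strategy. It teaches the model that an object is still the same object under different lighting and from different angles.

Geometric transforms: horizontal/vertical flips, rotation, random cropping (Crop), and scaling.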

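```python
import cv2
import numpy as np

# ─── 1. Geometric transforms ────
def aug_geometric(img: np.ndarray) -> dict:
  """Return a dict of geometrically transformed copies of img (H×W×C, uint8)."""
  h, w = img.shape[:2]
  results = {}

  # Horizontal flip
  results["HorizontalFlip"] = cv2.flip(img, 1)

  # Vertical flip
  results["VerticalFlip"] = cv2.flip(img, 0)

  # Rotate 45° about the image center (corners are cut off, borders filled with black)
  M = cv2.getRotationMatrix2D((w // 2, h // 2), 45, 1.0)
  results["Rotate45"] = cv2.warpAffine(img, M, (w, h))

  # Crop: this demo takes the fixed central 70%; use a random offset in training
  cy, cx = int(h * 0.15), int(w * 0.15)
  crop = img[cy:h - cy, cx:w - cx]
  results["CenterCrop70%"] = cv2.resize(crop, (w, h))

  # Scale: shrink to 60%, then resize back (same canvas size, less detail)
  small = cv2.resize(img, (int(w * 0.6), int(h * 0.6)))
  results["Scale0.6"] = cv2.resize(small, (w, h))

  return results
```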
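In object detection, a geometric transform has to be applied to the bounding boxes as well as to the pixels, otherwise the labels no longer match the image. The sketch below shows this for a horizontal flip. It assumes boxes are stored as `[x_min, y_min, x_max, y_max]` in pixels; `hflip_with_boxes` is a hypothetical helper name, not part of the original code.

```python
import numpy as np

def hflip_with_boxes(img: np.ndarray, boxes: np.ndarray):
    """Horizontally flip an image together with its bounding boxes.

    boxes: (N, 4) array of [x_min, y_min, x_max, y_max] in pixels (assumed format).
    """
    w = img.shape[1]
    flipped = img[:, ::-1].copy()          # mirror the columns
    out = boxes.astype(np.float32).copy()
    out[:, 0] = w - boxes[:, 2]            # new x_min comes from the old x_max
    out[:, 2] = w - boxes[:, 0]            # new x_max comes from the old x_min
    return flipped, out                    # y coordinates are unchanged
```

The same idea carries over to vertical flips (mirror the y coordinates) and to crops and scaling (shift, then multiply by the scale factor).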

Color jitter: adjust brightness, contrast, saturation, and hue.

```python
import cv2
import numpy as np

# ─── 2. Color jitter ──────
def aug_color_jitter(img: np.ndarray) -> dict:
  """img is assumed to be RGB uint8 (cv2.imread returns BGR, so convert first)."""
  results = {}
  img_f = img.astype(np.float32)
  img_hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV).astype(np.float32)

  # Brightness +60
  bright = np.clip(img_f + 60, 0, 255).astype(np.uint8)
  results["Brightness+60"] = bright

  # Contrast ×1.8: scale around the mean so overall brightness is not shifted
  mean = img_f.mean()
  contrast = np.clip((img_f - mean) * 1.8 + mean, 0, 255).astype(np.uint8)
  results["Contrast×1.8"] = contrast

  # Saturation ×2 (S channel of HSV)
  sat = img_hsv.copy()
  sat[..., 1] = np.clip(sat[..., 1] * 2, 0, 255)
  results["Saturation×2"] = cv2.cvtColor(sat.astype(np.uint8), cv2.COLOR_HSV2RGB)

  # Hue shift: +30 on OpenCV's 8-bit H channel (range 0-179, 2° per unit), i.e. +60°
  hue = img_hsv.copy()
  hue[..., 0] = (hue[..., 0] + 30) % 180
  results["HueShift+30"] = cv2.cvtColor(hue.astype(np.uint8), cv2.COLOR_HSV2RGB)

  return results
```
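The demo above uses fixed offsets so the effect is easy to see. In training, "jitter" normally means that the parameters are drawn at random for every sample. A minimal sketch of that idea follows; the function name, the default ranges, and the use of a NumPy `Generator` are illustrative assumptions, not values from the original article.

```python
import numpy as np

def random_brightness_contrast(img: np.ndarray, rng: np.random.Generator,
                               max_delta: float = 40.0,
                               contrast_range=(0.7, 1.3)):
    """Apply a randomly sampled brightness offset and contrast gain.

    max_delta and contrast_range are illustrative defaults (assumptions).
    """
    delta = float(rng.uniform(-max_delta, max_delta))  # brightness offset
    gain = float(rng.uniform(*contrast_range))         # contrast factor
    x = img.astype(np.float32)
    mean = x.mean()
    # Scale around the mean (contrast), then shift (brightness)
    out = (x - mean) * gain + mean + delta
    return np.clip(out, 0, 255).astype(np.uint8), (delta, gain)
```

Passing the random generator in explicitly makes the augmentation reproducible when a fixed seed is used.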

Noise injection: Gaussian noise and salt-and-pepper noise (simulating sensor interference in extreme environments).

```python
import numpy as np

# ─── 3. Noise injection ─────
def aug_noise(img: np.ndarray) -> dict:
  results = {}
  h, w = img.shape[:2]

  # Gaussian noise (mean 0, standard deviation 25)
  gauss = np.random.normal(0, 25, img.shape).astype(np.float32)
  noisy = np.clip(img.astype(np.float32) + gauss, 0, 255).astype(np.uint8)
  results["GaussianNoise"] = noisy

  # Salt-and-pepper noise (about 1% of pixels for salt and 1% for pepper)
  sp = img.copy()
  n_pixels = int(h * w * 0.01)
  # Salt: white pixels
  rows = np.random.randint(0, h, n_pixels)
  cols = np.random.randint(0, w, n_pixels)
  sp[rows, cols] = 255
  # Pepper: black pixels
  rows = np.random.randint(0, h, n_pixels)
  cols = np.random.randint(0, w, n_pixels)
  sp[rows, cols] = 0
  results["Salt&PepperNoise"] = sp

  return results
```

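2. Occlusion-Robust Augmentation

To keep the model from memorizing by rote, we deliberately destroy local image information. This forces the model to recognize objects from incomplete features and improves detection of occluded targets.

Random Erasing: erases a randomly placed rectangular region.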

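```python
import numpy as np

# ─── 4. Random Erasing ───────
def aug_random_erasing(img: np.ndarray,
                      erase_ratio: float = 0.15,
                      fill: int = 128) -> np.ndarray:
  """Erase one rectangle at a random position and fill it with gray.

  The rectangle size is fixed at erase_ratio of each side in this demo.
  """
  out = img.copy()
  h, w = out.shape[:2]
  eh = int(h * erase_ratio)
  ew = int(w * erase_ratio)
  # randint's upper bound is exclusive, so +1 lets the rectangle touch the border
  top  = np.random.randint(0, h - eh + 1)
  left = np.random.randint(0, w - ew + 1)
  out[top:top + eh, left:left + ew] = fill
  return out
```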
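The demo erases a rectangle of fixed size. Random Erasing as it is usually described also randomizes the erased area and the aspect ratio, and retries if the sampled rectangle does not fit. A hedged sketch of that variant follows; the function name, the default ranges, and the random-noise fill are assumptions chosen for illustration.

```python
import numpy as np

def random_erasing(img: np.ndarray, rng: np.random.Generator,
                   area_range=(0.02, 0.33), aspect_range=(0.3, 3.3),
                   max_tries: int = 10):
    """Erase a rectangle whose area and aspect ratio are sampled at random.

    Default ranges are illustrative assumptions. Returns the new image and the
    erased rectangle as (top, left, height, width), or None if no fit was found.
    """
    h, w = img.shape[:2]
    out = img.copy()
    for _ in range(max_tries):
        area = rng.uniform(*area_range) * h * w
        # Log-uniform sampling makes a ratio r and its inverse 1/r equally likely
        aspect = np.exp(rng.uniform(np.log(aspect_range[0]), np.log(aspect_range[1])))
        eh = int(round(np.sqrt(area * aspect)))
        ew = int(round(np.sqrt(area / aspect)))
        if 0 < eh < h and 0 < ew < w:
            top = int(rng.integers(0, h - eh + 1))
            left = int(rng.integers(0, w - ew + 1))
            # Fill with uniform random noise instead of a constant gray value
            out[top:top + eh, left:left + ew] = rng.integers(
                0, 256, size=(eh, ew) + img.shape[2:], dtype=np.uint8)
            return out, (top, left, eh, ew)
    return out, None
```

For detection, a box that ends up mostly covered by the erased rectangle is often better dropped from the labels; that filtering step is left out here.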

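GridMask: drops a regular grid of black squares over the image. It is used mainly as an augmentation for segmentation algorithms and is very effective in multi-task segmentation, because it forces the model to learn an object's global structure rather than its local texture.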

```python
import numpy as np

# ─── 5. GridMask ────────
def aug_gridmask(img: np.ndarray,
                d: int = 48,
                ratio: float = 0.4) -> np.ndarray:
  """
  Build a regular grid mask and apply it to the image.
  d     : grid cell size in pixels
  ratio : masked fraction of each cell's side length
          (the masked area fraction is therefore about ratio squared)
  """
  out  = img.copy()
  h, w = out.shape[:2]
  mask = np.ones((h, w), dtype=np.uint8)

  keep = int(d * (1 - ratio))   # pixels kept along each side of a cell
  for y in range(0, h, d):
      for x in range(0, w, d):
          # Mask the bottom-right (d-keep) × (d-keep) square of the cell
          y1 = min(y + keep, h)
          x1 = min(x + keep, w)
          mask[y1:y + d, x1:x + d] = 0

  out[mask == 0] = 0
  return out
```

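3. Cross-Sample Mixing (YOLO's Claim to Fame)

This is now standard in perception algorithms (especially from YOLOv5 through the latest YOLO26) and is also the technique that yields the most visible accuracy gains. The core code is below.

Mixup: blends two images transparently at a given ratio and mixes their labels in the same proportion.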

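```python
import numpy as np

# ─── 6. Mixup ───
def aug_mixup(img_a: np.ndarray, img_b: np.ndarray,
            alpha: float = 0.4) -> np.ndarray:
  """
  Linearly blend two images of the same shape.
  The mixing weight lam is drawn from Beta(alpha, alpha); alpha is not the weight itself.
  Label mixing: label = lam * label_a + (1 - lam) * label_b
  (lam is not returned here; return it as well if you need to mix labels)
  """
  lam = np.random.beta(alpha, alpha)
  mixed = (lam * img_a.astype(np.float32) +
           (1 - lam) * img_b.astype(np.float32))
  return np.clip(mixed, 0, 255).astype(np.uint8)
```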
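Mixing the labels requires the same weight that was used for the pixels. The sketch below returns that weight and shows two label conventions: soft one-hot labels for classification, and, as one common convention in detection pipelines, simply keeping the boxes of both images. The function names and the explicit `rng` argument are assumptions made for illustration.

```python
import numpy as np

def mixup_with_labels(img_a, onehot_a, img_b, onehot_b,
                      alpha: float = 0.4, rng=None):
    """Blend two same-shape images and their one-hot labels with one weight."""
    if rng is None:
        rng = np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))   # mixing weight in [0, 1]
    mixed = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    label = lam * onehot_a + (1.0 - lam) * onehot_b   # soft label, sums to 1
    return np.clip(mixed, 0, 255).astype(np.uint8), label, lam

def mixup_boxes(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Detection variant (assumed convention): keep every box from both images."""
    return np.concatenate([boxes_a, boxes_b], axis=0)
```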

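CutMix: cuts out part of image A and pastes in the content of image B at the same location. It has several benefits; in brief: first, it improves detection of occluded targets, because part of image A is removed. Second, it raises the information density of every image, so the convolution kernels learn from every pixel, which improves mAP (mean average precision). Third, it makes the model more sensitive to object boundaries: the model has to tell which pixels belong to object A and which to object B, which helps localization and segmentation considerably.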
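```python
import numpy as np

# ─── 7. CutMix ─────
def aug_cutmix(img_a: np.ndarray, img_b: np.ndarray,
             cut_ratio: float = 0.3) -> np.ndarray:
  """
  Paste a rectangular region of img_b onto img_a at the same location.
  img_b must be at least as large as img_a.
  Label weight of img_a = 1 - (replaced area / total area)
  """
  out = img_a.copy()
  h, w = out.shape[:2]
  ch = int(h * cut_ratio)
  cw = int(w * cut_ratio)
  # randint's upper bound is exclusive, so +1 lets the patch touch the border
  top  = np.random.randint(0, h - ch + 1)
  left = np.random.randint(0, w - cw + 1)
  out[top:top + ch, left:left + cw] = img_b[top:top + ch, left:left + cw]
  return out
```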
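The label weight mentioned in the docstring can be computed from the rectangle that was actually pasted. The sketch below follows the commonly described CutMix recipe: sample a weight from a Beta distribution, derive the patch size from it, clip the patch to the image, then recompute the weight from the clipped area. The function name and the explicit `rng` argument are assumptions.

```python
import numpy as np

def cutmix_with_lambda(img_a: np.ndarray, img_b: np.ndarray,
                       rng: np.random.Generator, alpha: float = 1.0):
    """Paste a random patch of img_b into img_a; return the image and img_a's weight."""
    h, w = img_a.shape[:2]
    lam = float(rng.beta(alpha, alpha))
    # Patch sides scale with sqrt(1 - lam), so the patch area is about (1 - lam) * h * w
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    out = img_a.copy()
    out[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]
    # Recompute the weight from the area that was really replaced after clipping
    lam_adj = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    return out, lam_adj
```

The classification label is then `lam_adj * label_a + (1 - lam_adj) * label_b`. For detection, boxes from image B that fall inside the patch would be added to the labels and boxes from image A that are covered would be clipped or dropped.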

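Mosaic: stitches 4 images together at random. It greatly enriches the background and sharply increases the number of targets in each batch, which works wonders for small-object detection.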

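```python
import cv2
import numpy as np

# ─── 8. Mosaic ─────
def aug_mosaic(imgs: list) -> np.ndarray:
  """
  Stitch 4 images into a 2×2 mosaic (fixed layout in this demo;
  shuffle imgs beforehand for a random arrangement).
  imgs: list of 4 H×W×3 numpy arrays, all of the same size
  """
  assert len(imgs) >= 4, "Mosaic needs at least 4 images"
  h, w = imgs[0].shape[:2]
  # Two images in the top row, two in the bottom row
  top    = np.concatenate([imgs[0], imgs[1]], axis=1)
  bottom = np.concatenate([imgs[2], imgs[3]], axis=1)
  mosaic = np.concatenate([top, bottom], axis=0)
  return cv2.resize(mosaic, (w, h))   # resize back to the original size
```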
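For detection, the boxes of each tile have to be shifted by that tile's offset on the canvas. Here is a minimal sketch for the fixed 2×2 layout, assuming four same-size images and boxes in `[x_min, y_min, x_max, y_max]` pixel format; `mosaic_with_boxes` is a hypothetical helper, and the canvas is returned at 2H×2W without resizing.

```python
import numpy as np

def mosaic_with_boxes(imgs, boxes_list):
    """Stitch 4 same-size images into a 2×2 canvas and shift their boxes to match."""
    h, w = imgs[0].shape[:2]
    top = np.concatenate(imgs[0:2], axis=1)
    bottom = np.concatenate(imgs[2:4], axis=1)
    canvas = np.concatenate([top, bottom], axis=0)       # shape (2h, 2w, C)
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]           # (dx, dy) of each tile
    shifted = []
    for boxes, (dx, dy) in zip(boxes_list, offsets):
        b = np.asarray(boxes, dtype=np.float32).reshape(-1, 4).copy()
        b[:, [0, 2]] += dx                               # shift x_min and x_max
        b[:, [1, 3]] += dy                               # shift y_min and y_max
        shifted.append(b)
    return canvas, np.concatenate(shifted, axis=0)
```

If the canvas is then resized back to H×W, as in the demo function, every box coordinate is multiplied by 0.5 as well.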

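II. Advanced: General-Purpose Augmentation for Perception Algorithms

If you are working on tasks such as **"sea-surface ship detection"** or **"complex industrial vision"**, the following two methods are key to raising your score:

1. Copy-Paste

This is a powerful technique from instance segmentation: targets are cut precisely out of their source images and pasted at random into different background images. It addresses class imbalance and long-tailed distributions. For example, if your training set has 5 classes and one of them accounts for only 5% of the samples, the data is clearly imbalanced, and Copy-Paste is a good choice when accuracy on that class falls short.

Advantage: it directly targets the long-tail problem. If a rare class has only 5 samples, Copy-Paste lets you make it appear in 5,000 images.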

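```python
import numpy as np

# ─── 9. Copy-Paste (simplified pixel-level demo) ────────────────
def aug_copy_paste(src: np.ndarray, dst: np.ndarray,
                obj_ratio: float = 0.25) -> np.ndarray:
 """
 Crop a rectangle from the center of src (a stand-in for a real cut-out)
 and paste it at a random position in dst.
 Real use requires instance segmentation masks.
 """
 out = dst.copy()
 h, w = src.shape[:2]
 dh, dw = out.shape[:2]
 # Size of the "object", capped so that it always fits inside dst
 oh = min(int(h * obj_ratio), dh)
 ow = min(int(w * obj_ratio), dw)
 # Crop exactly oh × ow from the center of src
 # (slicing with cy - oh // 2 : cy + oh // 2 loses a row when oh is odd)
 y0, x0 = (h - oh) // 2, (w - ow) // 2
 obj = src[y0:y0 + oh, x0:x0 + ow]

 # Paste at a random position in dst (randint's upper bound is exclusive)
 top  = np.random.randint(0, dh - oh + 1)
 left = np.random.randint(0, dw - ow + 1)
 out[top:top + oh, left:left + ow] = obj
 return out
```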
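With a real instance mask, only the object's own pixels are copied, so the background of the source image does not leak into the target. A minimal sketch under these assumptions: `mask` is a binary H×W array aligned with `src`, the paste position is supplied by the caller, and `copy_paste_with_mask` is a hypothetical helper name.

```python
import numpy as np

def copy_paste_with_mask(src: np.ndarray, mask: np.ndarray,
                         dst: np.ndarray, top: int, left: int):
    """Paste the masked object from src into dst with its top-left corner at (top, left).

    Returns the new image and the pasted patch's box [x_min, y_min, x_max, y_max],
    or None for the box if nothing could be pasted.
    """
    out = dst.copy()
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return out, None                        # empty mask: nothing to paste
    y0, y1 = ys.min(), ys.max() + 1             # tight bounding box of the mask
    x0, x1 = xs.min(), xs.max() + 1
    patch = src[y0:y1, x0:x1]
    m = mask[y0:y1, x0:x1].astype(bool)
    # Clip the patch so that it stays inside dst
    ph = min(m.shape[0], dst.shape[0] - top)
    pw = min(m.shape[1], dst.shape[1] - left)
    if ph <= 0 or pw <= 0:
        return out, None
    region = out[top:top + ph, left:left + pw]  # view into out
    region[m[:ph, :pw]] = patch[:ph, :pw][m[:ph, :pw]]
    return out, (left, top, left + pw, top + ph)
```

The returned box is what gets appended to the target image's labels. Existing boxes that the pasted object now covers may need to be clipped or removed.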

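2. Feature-Level Augmentation (ISDA)

Traditional augmentation changes the "skin" of an image, whereas ISDA changes its "soul": instead of flipping or cropping raw pixels, it perturbs the model's feature space. It works by estimating the covariance matrix of the image features for each class and adding noise along the directions in which the semantics are likely to vary. In effect, the model trains on thousands of variants that do not exist but are plausible.

(ISDA has no off-the-shelf function; it is built directly into the model's loss function. By tracking the feature distribution online, every iteration behaves as if the model had been trained on a vast number of augmented images. This implicit augmentation can help a lightweight model close part of the gap to a larger one.)

💻 Core implementation (code demo):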

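```python
import torch
import torch.nn as nn

class ISDALoss(nn.Module):
  def __init__(self, feature_num, class_num):
    super().__init__()
    # Tracks the running feature mean and covariance of every class.
    # NOTE: EstimatorCV is not defined in this snippet and must be provided separately.
    self.estimator = EstimatorCV(feature_num, class_num)

  def forward(self, features, y, target_layer):
    # 1. Update the running feature statistics (mean and covariance) online
    self.estimator.update_CV(features, y)

    # 2. Compute the perturbation in feature space.
    # This is the core of ISDA: the covariance matrix is used to add a
    # correction term to the classifier's logits.
    isda_logits = target_layer(features) + self.estimator.get_purturb(features, y)

    # 3. Standard cross-entropy, but applied to the "augmented" logits
    return nn.CrossEntropyLoss()(isda_logits, y)
```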
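To make the correction term concrete: for a sample with features a_i and label y_i, ISDA as usually formulated adds (λ/2)(w_j − w_{y_i})ᵀ Σ_{y_i} (w_j − w_{y_i}) to the logit of every class j, where w_j is the classifier weight of class j and Σ_{y_i} is the feature covariance of the sample's own class. The NumPy sketch below computes that term in the forward direction only. It is a simplified illustration under stated assumptions (covariances are passed in rather than estimated online, there is no autograd, and all function and argument names are hypothetical), not the author's implementation.

```python
import numpy as np

def isda_augmented_logits(features, labels, W, b, class_cov, lam):
    """Add the ISDA quadratic term to the logits of a linear classifier.

    features: (N, D)   labels: (N,) ints   W: (C, D)   b: (C,)
    class_cov: (C, D, D) per-class feature covariance   lam: augmentation strength
    """
    logits = features @ W.T + b                      # (N, C) ordinary logits
    dW = W[None, :, :] - W[labels][:, None, :]       # (N, C, D): w_j - w_{y_i}
    cov = class_cov[labels]                          # (N, D, D): covariance of each sample's class
    # Quadratic form (w_j - w_{y_i})^T Sigma_{y_i} (w_j - w_{y_i}) for every sample and class
    quad = np.einsum("ncd,nde,nce->nc", dW, cov, dW)
    return logits + 0.5 * lam * quad

def cross_entropy(logits, labels):
    """Mean cross-entropy with a numerically stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(labels)), labels].mean())
```

Because the quadratic term is zero for the true class and non-negative for every other class, the augmented loss is never smaller than the plain cross-entropy. The strength `lam` is commonly increased gradually during training, since the covariance estimates are unreliable early on.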

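🔍 Summary: How to Choose Your Augmentation Bundle

| Pain point | Recommended strategy |
| --- | --- |
| Small or distant targets | Mosaic + Mixup |
| Occluded targets, cluttered backgrounds | CutMix + GridMask |
| Very few samples, tight compute budget | Copy-Paste + ISDA |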

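🔜 Coming next: building rain and fog datasets to improve object detection in harsh conditions

Industrial-grade pipeline: an in-depth look at an augmentation workflow based on Albumentations.
One-step adaptation: full support for the YOLO format, with bounding-box coordinates kept in sync automatically.
Hands-on simulation: a step-by-step guide to constructing realistic rainy and foggy sensor datasets.

Follow "探物 AI". See you tomorrow!

💡 Resources: the Python source files for the core algorithms in this article have been prepared. Follow the account and reply "数据增强" to receive the download link.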
