A few days ago I was looking into ways to improve detection accuracy on small objects and came across an article describing RFLA (Receptive Field Label Assigner): an assignment strategy based on Gaussian receptive fields that takes the receptive-field characteristics of anchor points into account.
I didn't study the theory closely; I went straight to searching for a way to plug it into YOLO, and found that someone skilled had already done it, but behind a paid subscription.
As a committed freeloader I couldn't accept that, so I handed the job straight to GPT. Let's go!
1. Create the training script
# This is my train.py; just drop it in the root directory of the cloned ultralytics project
from ultralytics import YOLO
from ultralytics.utils import LOGGER
# model = YOLO("yolov8s.pt")
model = YOLO("./ultralytics/cfg/models/v8/yolov8s.yaml")
results = model.train(data="050_VOCdevkit_visdrone_det.yaml", epochs=100, workers=4, imgsz=640, batch=16, device=1)
2. Modify loss.py
2.1 Locate the '/ultralytics/ultralytics/utils/loss.py' file
PS: If you can't find it, you may be on a different version; you can clone the exact revision I used as follows:
# 1. Clone the repository
git clone https://github.com/ultralytics/ultralytics.git
cd ultralytics
# 2. Check out the same commit I used
git checkout c8b6a8bc
2.2 Open loss.py
At the top of the file, add:
from ultralytics.utils.tal import ReceptiveFieldLabelAssigner
2.3 Then find the following line:
self.assigner = TaskAlignedAssigner(topk=10, num_classes=self.nc, alpha=0.5, beta=6.0)
The stock code uses TAL (TaskAlignedAssigner): a task-aligned assignment strategy that selects positive samples by combining classification confidence and localization accuracy.
We need to swap it for the RFLA we want. Replace that line with the following code:
# New assigner selection logic
# Default to rfla. Remember to change the default back after testing: I found that
# passing assigner=rfla as a train.py argument raises an error that is awkward to
# fix, so for now this hard-coded default is the workaround.
assigner_type = model.args.get('assigner', 'rfla')
if assigner_type == 'rfla':
    print('AAAAAAAAAAAAAAAAAAAAAAAAAA----rfla')
    # Use the actual ReceptiveFieldLabelAssigner class
    self.assigner = ReceptiveFieldLabelAssigner(topk=13, num_classes=self.nc, alpha=1.0, beta=6.0)
else:
    print("DDDDDDDDDDDDDDDDDDDDD----tal")
    self.assigner = TaskAlignedAssigner(topk=10, num_classes=self.nc, alpha=0.5, beta=6.0)
2.4 Find this line in the source:
_, target_bboxes, target_scores, fg_mask, _ = self.assigner(
and change it to:
target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx = self.assigner(
OK, that's loss.py done.
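Since passing `assigner=rfla` as a training argument errors out, one hypothetical workaround is to drive the switch with an environment variable instead of editing the default back and forth. `RFLA_ASSIGNER` is a name I made up for this sketch; it is not an ultralytics setting:

```python
import os

def select_assigner_type(default="tal"):
    """Pick the assigner from the RFLA_ASSIGNER env var (hypothetical name),
    falling back to a default, so no training argument is needed."""
    value = os.environ.get("RFLA_ASSIGNER", default).lower()
    # Only accept the two known values; anything else falls back to the default
    return value if value in ("rfla", "tal") else default

# Inside the loss __init__ this would replace the model.args.get(...) line:
# assigner_type = select_assigner_type()
```

Training could then be launched as `RFLA_ASSIGNER=rfla python train.py` without touching the code again.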
3. Modify tal.py
Location:
# same directory as loss.py
/ultralytics/ultralytics/utils/tal.py
Append the following to the end of the file.
Honestly I didn't dig into what it all means; I mostly can't follow it anyway...
class ReceptiveFieldLabelAssigner(nn.Module):
    """
    Receptive Field Label Assignment (RFLA) strategy.

    A Gaussian-receptive-field-based label assignment method; see the paper:
    https://arxiv.org/abs/2208.08738

    Core components:
    1. Gaussian Receptive Field Modelling
    2. Receptive Field Distance (RFD)
    3. Hierarchical Label Assignment (HLA)
    """

    def __init__(self, topk=13, num_classes=80, alpha=1.0, beta=6.0, eps=1e-9,
                 rfd_threshold=0.5, hla_enabled=True, tiny_object_threshold=16):
        super().__init__()
        self.topk = topk
        self.num_classes = num_classes
        self.bg_idx = num_classes
        self.alpha = alpha
        self.beta = beta
        self.eps = eps
        self.rfd_threshold = rfd_threshold
        self.hla_enabled = hla_enabled
        self.tiny_object_threshold = tiny_object_threshold  # small-object threshold (pixels)
    def gaussian_receptive_field(self, anchor_points, strides, image_size=640):
        """
        Model the Gaussian receptive field of each anchor point.

        Args:
            anchor_points (Tensor): anchor coordinates, shape (num_anchors, 2) or (bs, num_anchors, 2)
            strides (Tensor): strides, shape (num_anchors, 1) or (bs, num_anchors, 1)
            image_size (int): image size (unused here)
        Returns:
            Tensor: Gaussian receptive field parameters [cx, cy, sigma_x, sigma_y],
                shape (num_anchors, 4) or (bs, num_anchors, 4), matching the input
        """
        # Remember whether the input was unbatched so we can squeeze the output back.
        # (Squeezing whenever bs == 1, as the first draft did, breaks batched callers
        # that happen to use batch size 1.)
        was_2d = anchor_points.dim() == 2
        if was_2d:
            anchor_points = anchor_points.unsqueeze(0)  # (1, num_anchors, 2)
            strides = strides.unsqueeze(0)  # (1, num_anchors, 1)
        # The receptive field size grows with the stride of the feature map
        receptive_field_size = strides * 3.0  # rough effective receptive field estimate
        # Standard deviation from the 3-sigma rule: sigma ≈ 1/6 of the receptive field size
        sigma = receptive_field_size / 6.0
        # Model a slightly anisotropic (elliptical) receptive field,
        # as the receptive field in the original paper is not isotropic
        sigma_x = sigma * 1.1  # slightly larger std in x
        sigma_y = sigma * 0.9  # slightly smaller std in y
        # Pack the Gaussian parameters [cx, cy, sigma_x, sigma_y]
        rf_params = torch.cat([anchor_points, sigma_x, sigma_y], dim=2)  # (bs, num_anchors, 4)
        if was_2d:
            rf_params = rf_params.squeeze(0)  # (num_anchors, 4)
        return rf_params
    def receptive_field_distance(self, gt_bboxes, rf_params):
        """
        Compute the Receptive Field Distance (RFD).

        Args:
            gt_bboxes (Tensor): GT boxes [x1, y1, x2, y2], shape (bs, num_gt, 4)
            rf_params (Tensor): Gaussian receptive field parameters [cx, cy, sigma_x, sigma_y],
                shape (bs, num_anchors, 4)
        Returns:
            Tensor: RFD, shape (bs, num_gt, num_anchors)
        """
        bs, num_gt, _ = gt_bboxes.shape
        _, num_anchors, _ = rf_params.shape
        # GT box centres
        gt_centers = (gt_bboxes[..., :2] + gt_bboxes[..., 2:]) / 2.0  # (bs, num_gt, 2)
        # Broadcast both sides to (bs, num_gt, num_anchors, ...)
        gt_centers_expanded = gt_centers.unsqueeze(2).expand(-1, -1, num_anchors, -1)  # (bs, num_gt, num_anchors, 2)
        rf_params_expanded = rf_params.unsqueeze(1).expand(-1, num_gt, -1, -1)  # (bs, num_gt, num_anchors, 4)
        # Split the Gaussian parameters
        rf_centers = rf_params_expanded[..., :2]  # (bs, num_gt, num_anchors, 2)
        rf_sigmas = rf_params_expanded[..., 2:]  # (bs, num_gt, num_anchors, 2)
        # Mahalanobis distance with a diagonal (anisotropic) covariance:
        # sqrt(delta^T Sigma^-1 delta) with Sigma = diag(sigma_x^2, sigma_y^2)
        # (note the squared sigmas; the first draft divided by sigma, not sigma^2)
        delta = gt_centers_expanded - rf_centers  # (bs, num_gt, num_anchors, 2)
        sigma_sq_inv = 1.0 / (rf_sigmas.pow(2) + self.eps)  # (bs, num_gt, num_anchors, 2)
        mahalanobis_distance = torch.sqrt((delta.pow(2) * sigma_sq_inv).sum(dim=-1))  # (bs, num_gt, num_anchors)
        # Gaussian density value at the GT centre
        gaussian_pdf = torch.exp(-0.5 * mahalanobis_distance.pow(2))  # (bs, num_gt, num_anchors)
        # RFD is defined as 1 minus the Gaussian value
        rfd_distance = 1.0 - gaussian_pdf
        return rfd_distance
    def hierarchical_label_assignment(self, rfd_distances, gt_bboxes, mask_gt):
        """
        Hierarchical Label Assignment (HLA).

        Args:
            rfd_distances (Tensor): RFD, shape (bs, num_gt, num_anchors)
            gt_bboxes (Tensor): GT boxes, shape (bs, num_gt, 4)
            mask_gt (Tensor): valid-GT mask, shape (bs, num_gt, 1)
        Returns:
            Tensor: adjusted assignment weights, shape (bs, num_gt, num_anchors)
        """
        if not self.hla_enabled:
            return rfd_distances
        bs, num_gt, num_anchors = rfd_distances.shape
        # GT box areas
        gt_areas = (gt_bboxes[..., 2] - gt_bboxes[..., 0]) * (gt_bboxes[..., 3] - gt_bboxes[..., 1])  # (bs, num_gt)
        # Flag tiny objects (area below the threshold)
        tiny_object_mask = gt_areas < (self.tiny_object_threshold ** 2)  # (bs, num_gt)
        weight_adjustment = torch.ones_like(rfd_distances)  # (bs, num_gt, num_anchors)
        # Simple per-image, per-GT loop (not vectorised)
        for b in range(bs):
            for g in range(num_gt):
                if mask_gt[b, g, 0].item():
                    ones = torch.ones_like(rfd_distances[b, g])
                    if tiny_object_mask[b, g]:
                        # Tiny object: relax the assignment by lowering the RFD
                        # thresholds and up-weighting nearby anchors
                        adjustment_factor = torch.where(
                            rfd_distances[b, g] < self.rfd_threshold * 0.5,  # much lower threshold
                            3.0,  # strong up-weight
                            torch.where(
                                rfd_distances[b, g] < self.rfd_threshold * 0.8,  # moderately lower threshold
                                2.0,  # moderate up-weight
                                ones,
                            ),
                        )
                    else:
                        # Large object: tighten the assignment by raising the RFD
                        # thresholds and down-weighting anchors
                        adjustment_factor = torch.where(
                            rfd_distances[b, g] < self.rfd_threshold * 1.5,  # raised threshold
                            0.5,  # down-weight
                            torch.where(
                                rfd_distances[b, g] < self.rfd_threshold * 2.0,  # even higher threshold
                                0.8,  # moderate down-weight
                                ones,
                            ),
                        )
                    weight_adjustment[b, g] = adjustment_factor
        # Apply the weight adjustment
        adjusted_rfd = rfd_distances * weight_adjustment
        return adjusted_rfd
    def forward(self, pd_scores, pd_bboxes, anc_points, gt_labels, gt_bboxes, mask_gt, strides=None):
        """
        RFLA forward pass.

        Args:
            pd_scores (Tensor): predicted scores, shape (bs, num_total_anchors, num_classes)
            pd_bboxes (Tensor): predicted bboxes, shape (bs, num_total_anchors, 4)
            anc_points (Tensor): anchor points, shape (num_total_anchors, 2)
            gt_labels (Tensor): ground-truth labels, shape (bs, num_gt, 1)
            gt_bboxes (Tensor): ground-truth bboxes, shape (bs, num_gt, 4)
            mask_gt (Tensor): mask for valid gt bboxes, shape (bs, num_gt, 1)
            strides (Tensor): feature map strides, shape (num_total_anchors, 1)
        Returns:
            tuple: target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx
        """
        self.bs = pd_scores.shape[0]
        self.n_max_boxes = gt_bboxes.shape[1]
        # With no GT boxes, return an empty (all-background) assignment
        if self.n_max_boxes == 0:
            device = gt_bboxes.device
            fg_mask = torch.zeros((self.bs, pd_bboxes.shape[1]), dtype=torch.bool, device=device)
            target_gt_idx = torch.zeros((self.bs, pd_bboxes.shape[1]), dtype=torch.int64, device=device)
            target_labels = torch.full((self.bs, pd_bboxes.shape[1]), self.bg_idx, dtype=torch.int64, device=device)
            target_bboxes = torch.zeros((self.bs, pd_bboxes.shape[1], 4), dtype=torch.float32, device=device)
            target_scores = torch.zeros((self.bs, pd_bboxes.shape[1], self.num_classes), dtype=torch.float32, device=device)
            return target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx
        # 1. Gaussian receptive field modelling
        if strides is None:
            # Fall back to a default stride when none is provided
            strides = torch.ones((anc_points.shape[0], 1), device=anc_points.device) * 8.0
        # Expand anchor points and strides to the batch dimension
        anc_points_batch = anc_points.unsqueeze(0).expand(self.bs, -1, -1)  # (bs, num_anchors, 2)
        strides_batch = strides.unsqueeze(0).expand(self.bs, -1, -1)  # (bs, num_anchors, 1)
        rf_params = self.gaussian_receptive_field(anc_points_batch, strides_batch)  # (bs, num_anchors, 4)
        # 2. Receptive Field Distance (RFD)
        rfd_distances = self.receptive_field_distance(gt_bboxes, rf_params)  # (bs, num_gt, num_anchors)
        # 3. Hierarchical Label Assignment (HLA)
        adjusted_rfd = self.hierarchical_label_assignment(rfd_distances, gt_bboxes, mask_gt)  # (bs, num_gt, num_anchors)
        # 4. RFD-based assignment
        valid_mask = mask_gt.repeat(1, 1, pd_bboxes.shape[1])  # (bs, num_gt, num_total_anchors)
        # Use RFD instead of IoU as the assignment criterion
        rfd_similarity = 1.0 - adjusted_rfd  # turn distance into similarity
        rfd_similarity = rfd_similarity * valid_mask  # zero out invalid GT boxes
        # Combine with the predicted scores
        pd_scores_max = pd_scores.max(dim=2)[0]  # (bs, num_total_anchors)
        pd_scores_for_gt = pd_scores_max.unsqueeze(1).expand(-1, self.n_max_boxes, -1)  # (bs, num_gt, num_total_anchors)
        # Alignment metric (RFD similarity in place of IoU)
        align_metric = self.alpha * rfd_similarity.pow(self.beta) * pd_scores_for_gt  # (bs, num_gt, num_anchors)
        # 5. Select the top-k candidates per GT
        topk_mask = torch.zeros_like(align_metric, dtype=torch.bool)  # (bs, num_gt, num_total_anchors)
        _, topk_idxs = torch.topk(align_metric, self.topk, dim=-1, largest=True)  # (bs, num_gt, topk)
        topk_mask.scatter_(2, topk_idxs, True)  # (bs, num_gt, num_total_anchors)
        # Zero the metric outside the top-k
        align_metric = torch.where(topk_mask, align_metric, torch.zeros_like(align_metric))
        # 6. Build the assignment
        mask_pos = align_metric > self.eps  # (bs, num_gt, num_total_anchors)
        # Resolve anchors assigned to multiple GT boxes
        mask_pos_sum = mask_pos.sum(dim=1)  # (bs, num_total_anchors)
        if mask_pos_sum.max() > 1:
            # Keep only the GT with the highest RFD similarity
            mask_multiple_gts = (mask_pos_sum > 1).unsqueeze(1).repeat(1, self.n_max_boxes, 1)  # (bs, num_gt, num_total_anchors)
            max_rfd_idx = rfd_similarity.argmax(dim=1)  # (bs, num_total_anchors)
            is_max_rfd = torch.zeros_like(mask_pos, dtype=mask_pos.dtype)  # (bs, num_gt, num_total_anchors)
            is_max_rfd.scatter_(1, max_rfd_idx.unsqueeze(1), 1)
            mask_pos = torch.where(mask_multiple_gts, is_max_rfd, mask_pos)
        # Final foreground mask and per-anchor GT index
        fg_mask = mask_pos.sum(dim=1) > 0  # (bs, num_total_anchors)
        target_gt_idx = mask_pos.float().argmax(dim=1)  # (bs, num_total_anchors)
        # Offset GT indices by batch so flattened GT tensors can be indexed directly
        batch_ind = torch.arange(end=self.bs, dtype=torch.int64, device=gt_labels.device)[..., None]
        target_gt_idx = target_gt_idx + batch_ind * self.n_max_boxes  # (bs, num_total_anchors)
        # Gather target labels
        target_labels = gt_labels.long().flatten()[target_gt_idx]  # (bs, num_total_anchors)
        target_labels.clamp_(0)
        # Gather target boxes
        target_bboxes = gt_bboxes.view(-1, gt_bboxes.shape[-1])[target_gt_idx]  # (bs, num_total_anchors, 4)
        # One-hot target scores, zeroed outside the foreground
        target_scores = torch.zeros(
            (target_labels.shape[0], target_labels.shape[1], self.num_classes),
            dtype=torch.float32,
            device=target_labels.device,
        )  # (bs, num_total_anchors, num_classes)
        target_scores.scatter_(2, target_labels.unsqueeze(-1), 1)
        fg_scores_mask = fg_mask[:, :, None].repeat(1, 1, self.num_classes)  # (bs, num_total_anchors, num_classes)
        target_scores = torch.where(fg_scores_mask > 0, target_scores, 0)
        # Return order matches TaskAlignedAssigner: target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx
        return target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx
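To get a feel for the RFD idea (the Gaussian value of a GT centre under an anchor's receptive field), here is a tiny dependency-free sketch for a single GT/anchor pair, using the same stride-to-sigma mapping and a standard anisotropic Mahalanobis distance. This is only an illustration, not the class above:

```python
import math

def rfd(gt_cx, gt_cy, anchor_cx, anchor_cy, stride=8.0):
    """Receptive Field Distance for one GT centre / anchor pair:
    RFD = 1 - exp(-0.5 * d^2), where d is the Mahalanobis distance
    under the anisotropic sigmas derived from the stride."""
    rf_size = stride * 3.0           # receptive field size estimate
    sigma = rf_size / 6.0            # 3-sigma rule
    sigma_x, sigma_y = sigma * 1.1, sigma * 0.9
    dx, dy = gt_cx - anchor_cx, gt_cy - anchor_cy
    d_sq = (dx / sigma_x) ** 2 + (dy / sigma_y) ** 2  # squared Mahalanobis distance
    return 1.0 - math.exp(-0.5 * d_sq)

# A GT centred exactly on the anchor has RFD 0; the distance grows towards 1
# as the offset grows relative to the receptive field.
print(rfd(100.0, 100.0, 100.0, 100.0))  # 0.0
print(rfd(104.0, 100.0, 100.0, 100.0))  # a few pixels off: small but non-zero
```

Note how the stride controls sigma: the same pixel offset yields a much larger RFD on a stride-8 level than on a stride-32 level, which is exactly why small objects benefit from this metric.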
4. Run train.py
Now just run the training script. Training is noticeably slower than with the stock TAL assigner, which makes sense, since the loss computation now carries all the extra RFLA work.
I'm still testing the results and will follow up with an update.
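As a quick offline sanity check of the HLA weighting in step 3 (separate from any training run), the per-anchor adjustment factors can be reproduced in plain Python; the numbers follow the thresholds hard-coded in the class with the default rfd_threshold=0.5:

```python
def hla_weight(rfd, tiny, rfd_threshold=0.5):
    """Reproduce the HLA adjustment factor for one anchor:
    tiny objects get up-weighted near the anchor, large ones down-weighted."""
    if tiny:  # small object: relax the assignment
        if rfd < rfd_threshold * 0.5:
            return 3.0
        if rfd < rfd_threshold * 0.8:
            return 2.0
        return 1.0
    # large object: tighten the assignment
    if rfd < rfd_threshold * 1.5:
        return 0.5
    if rfd < rfd_threshold * 2.0:
        return 0.8
    return 1.0

print(hla_weight(0.2, tiny=True))   # 3.0  (inside a tiny object's relaxed band)
print(hla_weight(0.2, tiny=False))  # 0.5  (same distance, large object is tightened)
```

Checking a few values like this makes it easy to see the asymmetry: at the same RFD, a tiny object's candidate anchors can be weighted 6x higher than a large object's.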