Improving Small-Object Accuracy in YOLOv8 Detection with RFLA

A few days ago I was looking into ways to improve recognition accuracy on small objects, and came across an article describing RFLA (ReceptiveFieldLabelAssigner): an assignment strategy based on Gaussian receptive fields that takes the receptive-field characteristics of anchor points into account.

I didn't study the theory in any detail; I went straight to searching for whether it could be combined with YOLO, and found that some expert had already done it, but behind a paid subscription.

As a committed freeloader I couldn't accept that, so I handed the job straight to GPT. Let's go!

1. Create the training script

# This is my train.py; just drop it into the main directory of the cloned ultralytics project
from ultralytics import YOLO
from ultralytics.utils import LOGGER
# model = YOLO("yolov8s.pt")
model = YOLO("./ultralytics/cfg/models/v8/yolov8s.yaml")
results = model.train(data="050_VOCdevkit_visdrone_det.yaml", epochs=100, workers=4, imgsz=640, batch=16, device=1)

2. Edit loss.py

2.1 Locate the file '/ultralytics/ultralytics/utils/loss.py'

PS: if you can't find it, you may be on a different version; you can clone the exact code version I used as follows:

# 1. Clone the repository
git clone https://github.com/ultralytics/ultralytics.git
cd ultralytics

# 2. Check out the same commit I used
git checkout c8b6a8bc

2.2 Open loss.py and add this at the top:

from ultralytics.utils.tal import ReceptiveFieldLabelAssigner

2.3 Then find the following code:

self.assigner = TaskAlignedAssigner(topk=10, num_classes=self.nc, alpha=0.5, beta=6.0)

The original code uses TAL (TaskAlignedAssigner): a task-aligned assignment strategy that selects positive samples by combining classification confidence with localization accuracy.
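For reference, TAL scores each anchor-GT pair by combining the classification score s for the GT class with the IoU u of the predicted box, roughly as t = s^alpha * u^beta. A toy sketch of that scoring (my illustration, not the ultralytics implementation):

```python
# Toy sketch of the TaskAlignedAssigner metric t = s**alpha * u**beta,
# where s is the classification score for the GT class and u is the IoU
# of the predicted box with that GT. alpha=0.5, beta=6.0 match the call above.
def tal_align_metric(s, u, alpha=0.5, beta=6.0):
    return (s ** alpha) * (u ** beta)

# beta=6 punishes poor localization hard, even at identical confidence:
well_localized = tal_align_metric(0.9, 0.9)
poorly_localized = tal_align_metric(0.9, 0.3)
```

With beta=6, an anchor at IoU 0.3 scores orders of magnitude below one at IoU 0.9, which is exactly why the top-k selection favors well-localized anchors.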

We need to swap this for RFLA. Replace it directly with the following code:

        # New assigner selection logic
        assigner_type = model.args.get('assigner', 'rfla')  # Defaults to rfla. Remember to switch back to 'task' once you've finished testing: I found that passing assigner=rfla in train.py raises an error, and fixing that properly is a hassle, so this is a stopgap.
        if assigner_type == 'rfla':
            print('AAAAAAAAAAAAAAAAAAAAAAAAAA----rfla')
            # Use the actual ReceptiveFieldLabelAssigner class
            self.assigner = ReceptiveFieldLabelAssigner(topk=13, num_classes=self.nc, alpha=1.0, beta=6.0)
        else:
            print("DDDDDDDDDDDDDDDDDDDDD----tal")
            self.assigner = TaskAlignedAssigner(topk=10, num_classes=self.nc, alpha=0.5, beta=6.0)
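Since passing assigner=rfla through model.train() reportedly errors out, one possible workaround (my own suggestion, not from the original post, and YOLO_ASSIGNER is a made-up variable name) is to read the choice from an environment variable instead of model.args, so nothing has to pass through ultralytics' hyperparameter validation:

```python
import os

# Hypothetical workaround: select the assigner via an environment variable,
# e.g. launch with `YOLO_ASSIGNER=tal python train.py`. Defaults to rfla.
assigner_type = os.getenv("YOLO_ASSIGNER", "rfla").lower()
```

The `if assigner_type == 'rfla':` branch above would then work unchanged.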

2.4 Find this line in the original code:

_, target_bboxes, target_scores, fg_mask, _ = self.assigner(

and change it to:

target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx = self.assigner(

OK, loss.py is done.

3. Edit tal.py

Location:

# In the same directory as loss.py
/ultralytics/ultralytics/utils/tal.py

Append the following at the end of the file.

I didn't dig into what all of this means; honestly I can't really follow it anyway...
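For what it's worth, here is my reading of what the generated code does (note: the original RFLA paper measures the distance between the receptive-field Gaussian and a Gaussian fitted to the GT box with a Kullback-Leibler-based metric; the code below uses a simpler point-to-Gaussian variant). Each anchor's effective receptive field is modeled as a 2-D Gaussian centred on the anchor point (c_x, c_y) with per-axis standard deviations derived from the stride, and the distance to a GT center (g_x, g_y) is

```latex
d_M^2 = \frac{(g_x - c_x)^2}{\sigma_x^2} + \frac{(g_y - c_y)^2}{\sigma_y^2},
\qquad
\mathrm{RFD} = 1 - e^{-d_M^2 / 2}
```

so RFD is 0 when the GT center sits exactly on the anchor point and approaches 1 as it moves away; the HLA step then rescales these distances depending on object size.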

class ReceptiveFieldLabelAssigner(nn.Module):
    """
    Implementation of the Receptive Field Label Assignment (RFLA) strategy.
    A Gaussian-receptive-field-based label assignment method; see the paper: https://arxiv.org/abs/2208.08738

    Core components:
    1. Gaussian Receptive Field Modelling
    2. Receptive Field Distance (RFD)
    3. Hierarchical Label Assignment (HLA)
    """
    def __init__(self, topk=13, num_classes=80, alpha=1.0, beta=6.0, eps=1e-9, 
                 rfd_threshold=0.5, hla_enabled=True, tiny_object_threshold=16):
        super().__init__()
        self.topk = topk
        self.num_classes = num_classes
        self.bg_idx = num_classes
        self.alpha = alpha
        self.beta = beta
        self.eps = eps
        self.rfd_threshold = rfd_threshold
        self.hla_enabled = hla_enabled
        self.tiny_object_threshold = tiny_object_threshold  # tiny-object threshold (pixels)

    def gaussian_receptive_field(self, anchor_points, strides, image_size=640):
        """
        Compute the Gaussian receptive field, loosely following the modelling in the original paper.

        Args:
            anchor_points (Tensor): anchor coordinates, shape (num_anchors, 2) or (bs, num_anchors, 2)
            strides (Tensor): strides, shape (num_anchors, 1) or (bs, num_anchors, 1)
            image_size (int): image size

        Returns:
            Tensor: Gaussian receptive field parameters [cx, cy, sigma_x, sigma_y],
                shape (num_anchors, 4) or (bs, num_anchors, 4)
        """
        # Check the input dimensionality; remember whether we added a batch dim
        squeezed = False
        if anchor_points.dim() == 2:
            # 2-D input: single batch
            anchor_points = anchor_points.unsqueeze(0)  # (1, num_anchors, 2)
            strides = strides.unsqueeze(0)  # (1, num_anchors, 1)
            squeezed = True

        # The receptive field size scales with the stride (and the network architecture)
        receptive_field_size = strides * 3.0  # rough estimate of the effective receptive field

        # Standard deviation of the Gaussian, based on the 3-sigma rule
        sigma = receptive_field_size / 6.0  # std is about 1/6 of the receptive field size

        # Allow an elliptical receptive field (directions may differ);
        # in the original paper the receptive field is anisotropic
        sigma_x = sigma * 1.1  # slightly larger std along x
        sigma_y = sigma * 0.9  # slightly smaller std along y

        # Concatenate the Gaussian receptive field parameters
        rf_params = torch.cat([anchor_points, sigma_x, sigma_y], dim=2)  # (bs, num_anchors, 4)

        # If the input was 2-D, return a 2-D result
        if squeezed:
            rf_params = rf_params.squeeze(0)  # (num_anchors, 4)

        return rf_params

    def receptive_field_distance(self, gt_bboxes, rf_params):
        """
        Compute the Receptive Field Distance (RFD).

        Args:
            gt_bboxes (Tensor): GT boxes [x1, y1, x2, y2], shape (bs, num_gt, 4)
            rf_params (Tensor): Gaussian receptive field parameters [cx, cy, sigma_x, sigma_y], shape (bs, num_anchors, 4)

        Returns:
            Tensor: RFD, shape (bs, num_gt, num_anchors)
        """
        bs, num_gt, _ = gt_bboxes.shape
        _, num_anchors, _ = rf_params.shape

        # GT box centers
        gt_centers = (gt_bboxes[..., :2] + gt_bboxes[..., 2:]) / 2.0  # (bs, num_gt, 2)

        # Expand dims for a fully batched computation
        gt_centers_expanded = gt_centers.unsqueeze(2).expand(-1, -1, num_anchors, -1)  # (bs, num_gt, num_anchors, 2)
        rf_params_expanded = rf_params.unsqueeze(1).expand(-1, num_gt, -1, -1)  # (bs, num_gt, num_anchors, 4)

        # Extract the Gaussian parameters
        rf_centers = rf_params_expanded[..., :2]  # (bs, num_gt, num_anchors, 2)
        rf_sigmas = rf_params_expanded[..., 2:]  # (bs, num_gt, num_anchors, 2)

        # Mahalanobis distance (accounts for the anisotropy)
        delta = gt_centers_expanded - rf_centers  # (bs, num_gt, num_anchors, 2)

        # Inverse of the diagonal covariance matrix: 1/sigma^2 per axis
        sigma_inv = 1.0 / rf_sigmas.pow(2)  # (bs, num_gt, num_anchors, 2)

        # Mahalanobis distance: sqrt(delta^T * Sigma^-1 * delta)
        mahalanobis_distance = torch.sqrt((delta * sigma_inv * delta).sum(dim=-1))  # (bs, num_gt, num_anchors)

        # Gaussian density value
        gaussian_pdf = torch.exp(-0.5 * mahalanobis_distance.pow(2))  # (bs, num_gt, num_anchors)

        # RFD is defined as 1 minus the Gaussian density value
        rfd_distance = 1.0 - gaussian_pdf

        return rfd_distance

    def hierarchical_label_assignment(self, rfd_distances, gt_bboxes, mask_gt):
        """
        Hierarchical Label Assignment (HLA).

        Args:
            rfd_distances (Tensor): RFD, shape (bs, num_gt, num_anchors)
            gt_bboxes (Tensor): GT boxes, shape (bs, num_gt, 4)
            mask_gt (Tensor): validity mask for GT boxes, shape (bs, num_gt, 1)

        Returns:
            Tensor: adjusted assignment weights, shape (bs, num_gt, num_anchors)
        """
        if not self.hla_enabled:
            return rfd_distances

        bs, num_gt, num_anchors = rfd_distances.shape

        # GT box areas
        gt_areas = (gt_bboxes[..., 2] - gt_bboxes[..., 0]) * (gt_bboxes[..., 3] - gt_bboxes[..., 1])  # (bs, num_gt)

        # Identify tiny objects (area below the threshold)
        tiny_object_mask = gt_areas < (self.tiny_object_threshold ** 2)  # (bs, num_gt)

        # Multi-level assignment strategy
        weight_adjustment = torch.ones_like(rfd_distances)  # (bs, num_gt, num_anchors)

        # Per-image, per-GT loop (could be vectorized, but kept simple here)
        for b in range(bs):
            for g in range(num_gt):
                if mask_gt[b, g, 0].item():
                    if tiny_object_mask[b, g]:
                        # Tiny objects: a more permissive assignment policy.
                        # Lower the RFD threshold to pick up more positive samples.
                        adjustment_factor = torch.where(
                            rfd_distances[b, g] < self.rfd_threshold * 0.5,  # much lower threshold
                            3.0,  # strongly boost the weight
                            torch.where(
                                rfd_distances[b, g] < self.rfd_threshold * 0.8,  # moderately lower threshold
                                2.0,  # moderately boost the weight
                                1.0
                            )
                        )
                    else:
                        # Large objects: a stricter assignment policy.
                        # Raise the RFD threshold to reduce the positive samples.
                        adjustment_factor = torch.where(
                            rfd_distances[b, g] < self.rfd_threshold * 1.5,  # higher threshold
                            0.5,  # lower the weight
                            torch.where(
                                rfd_distances[b, g] < self.rfd_threshold * 2.0,  # even higher threshold
                                0.8,  # moderately lower the weight
                                1.0
                            )
                        )

                    weight_adjustment[b, g] = adjustment_factor

        # Apply the weight adjustment
        adjusted_rfd = rfd_distances * weight_adjustment

        return adjusted_rfd

    def forward(self, pd_scores, pd_bboxes, anc_points, gt_labels, gt_bboxes, mask_gt, strides=None):
        """
        Forward pass implementing the RFLA assignment.

        Args:
            pd_scores (Tensor): Predicted scores, shape (bs, num_total_anchors, num_classes)
            pd_bboxes (Tensor): Predicted bboxes, shape (bs, num_total_anchors, 4)
            anc_points (Tensor): Anchor points, shape (num_total_anchors, 2)
            gt_labels (Tensor): Ground truth labels, shape (bs, num_gt, 1)
            gt_bboxes (Tensor): Ground truth bboxes, shape (bs, num_gt, 4)
            mask_gt (Tensor): Mask for valid gt bboxes, shape (bs, num_gt, 1)
            strides (Tensor): Feature map strides, shape (num_total_anchors, 1)

        Returns:
            tuple: target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx
        """
        self.bs = pd_scores.shape[0]
        self.n_max_boxes = gt_bboxes.shape[1]

        # If there are no GT boxes, return an empty assignment
        if self.n_max_boxes == 0:
            device = gt_bboxes.device
            fg_mask = torch.zeros((self.bs, pd_bboxes.shape[1]), dtype=torch.bool, device=device)
            target_gt_idx = torch.zeros((self.bs, pd_bboxes.shape[1]), dtype=torch.int64, device=device)
            target_labels = torch.full((self.bs, pd_bboxes.shape[1]), self.bg_idx, dtype=torch.int64, device=device)
            target_bboxes = torch.zeros((self.bs, pd_bboxes.shape[1], 4), dtype=torch.float32, device=device)
            target_scores = torch.zeros((self.bs, pd_bboxes.shape[1], self.num_classes), dtype=torch.float32, device=device)
            return target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx

        # 1. Gaussian receptive field modelling
        if strides is None:
            # Fall back to a default stride if none was provided
            strides = torch.ones((anc_points.shape[0], 1), device=anc_points.device) * 8.0

        # Expand anchor points and strides to the batch dimension
        anc_points_batch = anc_points.unsqueeze(0).expand(self.bs, -1, -1)  # (bs, num_anchors, 2)
        strides_batch = strides.unsqueeze(0).expand(self.bs, -1, -1)  # (bs, num_anchors, 1)

        rf_params = self.gaussian_receptive_field(anc_points_batch, strides_batch)  # (bs, num_anchors, 4)

        # 2. Receptive Field Distance (RFD)
        rfd_distances = self.receptive_field_distance(gt_bboxes, rf_params)  # (bs, num_gt, num_anchors)

        # 3. Hierarchical Label Assignment (HLA)
        adjusted_rfd = self.hierarchical_label_assignment(rfd_distances, gt_bboxes, mask_gt)  # (bs, num_gt, num_anchors)

        # 4. RFD-based assignment
        # Mask of valid GT boxes
        valid_mask = mask_gt.repeat(1, 1, pd_bboxes.shape[1])  # (bs, num_gt, num_total_anchors)

        # Use RFD in place of IoU as the assignment criterion
        rfd_similarity = 1.0 - adjusted_rfd  # turn the distance into a similarity
        rfd_similarity = rfd_similarity * valid_mask  # zero out similarities of invalid GT boxes

        # Fold in the predicted scores
        pd_scores_max = pd_scores.max(dim=2)[0]  # (bs, num_total_anchors)
        pd_scores_for_gt = pd_scores_max.unsqueeze(1).expand(-1, self.n_max_boxes, -1)  # (bs, num_gt, num_total_anchors)

        # Alignment metric (RFD similarity in place of IoU)
        align_metric = self.alpha * rfd_similarity.pow(self.beta) * pd_scores_for_gt  # (bs, num_gt, num_anchors)

        # 5. Select the top-k candidates
        topk_mask = torch.zeros_like(align_metric, dtype=torch.bool)  # (bs, num_gt, num_total_anchors)
        topk_metric, topk_idxs = torch.topk(align_metric, self.topk, dim=-1, largest=True)  # (bs, num_gt, topk)
        topk_mask.scatter_(2, topk_idxs, True)  # (bs, num_gt, num_total_anchors)

        # Keep only the top-k entries of the alignment metric
        align_metric = torch.where(topk_mask, align_metric, torch.zeros_like(align_metric))  # (bs, num_gt, num_total_anchors)

        # 6. Build the assignment
        mask_pos = align_metric > self.eps  # (bs, num_gt, num_total_anchors)

        # Resolve anchors that were assigned to more than one GT box
        mask_pos_sum = mask_pos.sum(dim=1)  # (bs, num_total_anchors)
        if mask_pos_sum.max() > 1:
            # Keep the GT box with the highest RFD similarity
            mask_multiple_gts = (mask_pos_sum > 1).unsqueeze(1).repeat(1, self.n_max_boxes, 1)  # (bs, num_gt, num_total_anchors)
            max_rfd_idx = rfd_similarity.argmax(dim=1)  # (bs, num_total_anchors)
            is_max_rfd = torch.zeros_like(mask_pos, dtype=mask_pos.dtype)  # (bs, num_gt, num_total_anchors)
            is_max_rfd.scatter_(1, max_rfd_idx.unsqueeze(1), 1)
            mask_pos = torch.where(mask_multiple_gts, is_max_rfd, mask_pos)

        # Final assignment
        fg_mask = mask_pos.sum(dim=1) > 0  # (bs, num_total_anchors)
        target_gt_idx = mask_pos.float().argmax(dim=1)  # (bs, num_total_anchors)

        # Offset by the batch index so we can index into the flattened GT tensors
        batch_ind = torch.arange(end=self.bs, dtype=torch.int64, device=gt_labels.device)[..., None]
        target_gt_idx = target_gt_idx + batch_ind * self.n_max_boxes  # (bs, num_total_anchors)

        # Target labels
        target_labels = gt_labels.long().flatten()[target_gt_idx]  # (bs, num_total_anchors)
        target_labels.clamp_(0)

        # Target bounding boxes
        target_bboxes = gt_bboxes.view(-1, gt_bboxes.shape[-1])[target_gt_idx]  # (bs, num_total_anchors, 4)

        # Target scores
        target_scores = torch.zeros(
            (target_labels.shape[0], target_labels.shape[1], self.num_classes),
            dtype=torch.float32,
            device=target_labels.device,
        )  # (bs, num_total_anchors, num_classes)
        target_scores.scatter_(2, target_labels.unsqueeze(-1), 1)
        fg_scores_mask = fg_mask[:, :, None].repeat(1, 1, self.num_classes)  # (bs, num_total_anchors, num_classes)
        target_scores = torch.where(fg_scores_mask > 0, target_scores, 0)

        # Return order matches TaskAlignedAssigner: target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx
        return target_labels, target_bboxes, target_scores, fg_mask, target_gt_idx
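As a standalone sanity check of the RFD definition used above (pure Python, single anchor/GT pair; I use an isotropic sigma here for simplicity, while the class uses slightly anisotropic sigma_x/sigma_y):

```python
import math

# Re-derives the RFD for one anchor/GT pair with the same stride -> sigma
# mapping as gaussian_receptive_field: receptive field ~ 3*stride, sigma = size/6.
def rfd(gt_center, anchor_point, stride=8.0):
    sigma = (stride * 3.0) / 6.0
    d2 = sum((g - a) ** 2 for g, a in zip(gt_center, anchor_point)) / sigma ** 2
    return 1.0 - math.exp(-0.5 * d2)  # 0 on the anchor, approaches 1 far away

on_anchor = rfd((16.0, 16.0), (16.0, 16.0))  # GT centered exactly on the anchor
nearby = rfd((20.0, 16.0), (16.0, 16.0))     # half a stride away
far = rfd((200.0, 200.0), (16.0, 16.0))      # far outside the receptive field
```

As expected, the distance is 0 when the GT center sits on the anchor point, grows with the offset, and saturates at 1 for anchors whose receptive field cannot cover the object.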

4. Run train.py

Now just run the training script train.py. Training is somewhat slower than with the stock TAL assigner, which is understandable, since the loss computation now does a fair amount of extra RFLA work.

I'm still testing the results myself and will post an update as I go.
