Supervising Object Detection with Point Cloud Information

Author: Jupiter. Column: 传知代码

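Overview
Problem Analysis
- Making Lift-splat work well is hard
- Inaccurate depth
- Depth overfitting
- Inaccurate BEV semantics
Overall Model Framework
- Explicit depth supervision
Depth Refinement Module
Demo Results
Core Logic
- Overall model framework
- Model backbone
Usage
- Preparing the dataset
- Training and evaluating the model
Deployment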

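Overview

LiDAR and cameras are the sensors most commonly used by autonomous driving systems to detect 3D objects. LiDAR data yields reliable 3D detections, while multi-camera setups are attracting growing attention because of their lower cost. Although LSS (Lift-Splat-Shoot) made multi-view 3D object detection feasible, its depth estimates are poor. BEVDepth is a multi-view 3D detector with explicit depth supervision: it uses depth derived from point clouds to guide depth learning. In addition, BEVDepth encodes the camera intrinsics and extrinsics into the depth learning module, which makes the detector robust to different camera setups, and it introduces a depth refinement module to refine the learned depth.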

References: the project source code, data, and pretrained models needed to reproduce this article in full are available from: [address]

Problem Analysis

Making Lift-splat work is easy

Making Lift-splat work well is hard

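Implicit depth supervision achieves reasonable results, but its performance is still far from satisfactory. In this part, the BEVDepth authors identify three shortcomings of the existing Lift-splat mechanism: inaccurate depth, depth overfitting, and inaccurate BEV semantics. They compare two baselines: a naive LSS-based detector, called the Base Detector, and a variant that adds extra depth supervision on $D^{pred}$ derived from point cloud data, called the Enhanced Detector.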

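Inaccurate depth

In the Base Detector, the gradient reaching the depth module comes only from the detection loss, so the supervision is indirect. The authors evaluate the learned depth $D^{pred}$ on nuScenes with standard depth estimation metrics: scale-invariant logarithmic error (SILog), mean absolute relative error (Abs-Rel), mean squared relative error (Sq-Rel), and root mean squared error (RMSE). Both detectors are evaluated under two protocols: over all pixels of each object, and over only the best-predicted pixel of each object.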
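
The four metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration using the common textbook definitions, not the evaluation code used by the BEVDepth authors; the function name `depth_metrics` and the choice to report SILog without any scaling factor are assumptions made here.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Common depth-estimation metrics (hypothetical helper, standard definitions).

    pred, gt: arrays of the same shape holding depths in metres.
    Only pixels with a positive ground truth and prediction are scored.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = (gt > 0) & (pred > 0)
    pred, gt = pred[valid], gt[valid]

    diff = pred - gt
    abs_rel = np.mean(np.abs(diff) / gt)      # mean absolute relative error
    sq_rel = np.mean(diff ** 2 / gt)          # mean squared relative error
    rmse = np.sqrt(np.mean(diff ** 2))        # root mean squared error

    # SILog: standard deviation of the log-depth error, so a global
    # scale factor on the prediction does not change it.
    log_err = np.log(pred) - np.log(gt)
    silog = np.sqrt(max(np.mean(log_err ** 2) - np.mean(log_err) ** 2, 0.0))

    return {"silog": silog, "abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse}


gt_depth = np.array([10.0, 20.0, 40.0])
print(depth_metrics(gt_depth * 1.1, gt_depth))
```

Evaluating "all pixels" versus "best-predicted pixel" then only changes which pixels are passed in: every pixel inside an object box, or the single pixel with the smallest error.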

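Under the best-match protocol, the Base Detector performs almost as well as the Enhanced Detector does under the all-region protocol. This further confirms that a detector trained without a depth loss detects objects by learning depth in only part of each object region.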

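Depth overfitting

The Base Detector only learns to predict depth in part of each region; most pixels are never trained to predict a reasonable depth, which raises concerns about how well the depth module generalizes. A detector that learns depth this way may be very sensitive to hyperparameters such as image size and camera parameters.

As the figure above shows, when the image size changes, the Base Detector degrades more sharply, which suggests it may be sensitive to noise in the camera intrinsics, extrinsics, or other hyperparameters.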

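Inaccurate BEV semantics

Once the image features have been unprojected into frustum features using the learned depth, a voxel/pillar pooling operation aggregates them into the BEV.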
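
The pooling step can be pictured as a scatter-add: every frustum point carries a feature vector and is dropped into the BEV cell that contains its (x, y) position. The sketch below is a simplified NumPy illustration, not the CUDA voxel pooling used in the BEVDepth repository; `pillar_sum_pool`, the grid ranges, and the cell size are all hypothetical.

```python
import numpy as np


def pillar_sum_pool(points_xy, feats, x_range, y_range, cell):
    """Sum-pool per-point features into a BEV grid (illustrative helper).

    points_xy: (N, 2) ego-frame x/y positions in metres.
    feats:     (N, C) feature vectors attached to those points.
    Returns a (ny, nx, C) BEV feature map.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    ix = np.floor((points_xy[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points_xy[:, 1] - y_range[0]) / cell).astype(int)

    # Points that fall outside the grid are discarded.
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    bev = np.zeros((ny, nx, feats.shape[1]), dtype=feats.dtype)
    # np.add.at accumulates correctly when several points share a cell.
    np.add.at(bev, (iy[keep], ix[keep]), feats[keep])
    return bev


points = np.array([[1.2, 0.3], [1.4, 0.1], [5.0, 0.2]])
features = np.array([[1.0], [2.0], [4.0]])
bev_map = pillar_sum_pool(points, features, (0.0, 8.0), (-4.0, 4.0), 1.0)
```

Because a point's position along the camera ray is determined by its predicted depth, a wrong depth moves the feature into a different cell, which is how depth errors turn into inaccurate BEV semantics.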

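Overall Model Framework

BEVDepth consists of four basic components:

An image encoder that extracts 2D features from the N input views

A depth network that estimates depth from the image features

A view transformer that lifts the 2D features to 3D by multiplying them with the predicted depth and then merges them into a single BEV representation

A 3D detection head that predicts the class, the 3D bounding box, and other attributes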
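
The multiplication performed by the view transformer can be sketched as an outer product between each pixel's feature vector and its depth distribution. The NumPy code below is a minimal illustration under that assumption; the function name `lift` and the tensor shapes are chosen for this example and are not taken from the BEVDepth source.

```python
import numpy as np


def lift(context, depth_logits):
    """Lift 2D features into a frustum (illustrative sketch).

    context:      (C, H, W) image features.
    depth_logits: (D, H, W) unnormalised scores over D depth bins.
    Returns (frustum, depth_prob) with shapes (C, D, H, W) and (D, H, W).
    """
    # Softmax over the depth axis, shifted for numerical stability.
    shifted = depth_logits - depth_logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    depth_prob = exp / exp.sum(axis=0, keepdims=True)

    # Outer product per pixel: each feature is spread along the ray,
    # weighted by how likely each depth bin is.
    frustum = context[:, None, :, :] * depth_prob[None, :, :, :]
    return frustum, depth_prob


rng = np.random.default_rng(0)
ctx = rng.normal(size=(4, 2, 3))
logits = rng.normal(size=(5, 2, 3))
frustum_feats, prob = lift(ctx, logits)
```

Summing the frustum over the depth axis recovers the original features, so the depth distribution only decides where along the ray each feature ends up.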

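Explicit depth supervision

In the Base Detector, the only supervision for the depth module comes from the detection loss. Because monocular depth estimation is inherently difficult, the detection loss alone is far from sufficient to supervise the depth module. BEVDepth therefore supervises the intermediate depth prediction $D^{pred}$ with a ground truth $D^{gt}$ derived from point cloud data.

To obtain $D^{gt}$, the point cloud is first transformed from the LiDAR coordinate system into the camera coordinate system and then projected with the camera intrinsics into 2.5D image coordinates: $P_i^{img}(ud, vd, d) = K_i (R_i P + t_i)$.
Here $u$ and $v$ are coordinates in the image frame, $d$ is the depth, $R_i$ and $t_i$ are the rotation and translation, and $K_i$ is the camera intrinsic matrix. Then, to align the shape of the projected point cloud with that of the predicted depth, min pooling and one-hot encoding are applied to $P_i^{img}$. These two operations are jointly denoted $\phi$, so that $D_i^{gt} = \phi(P_i^{img})$.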
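
The projection followed by min pooling and one-hot encoding can be sketched as follows. This is a simplified NumPy illustration rather than the repository's implementation; the helper names, the feature stride, and the depth range (2 m to 58 m in 0.5 m bins) are illustrative assumptions.

```python
import numpy as np


def project_lidar_to_image(points, K, R, t):
    """Project (N, 3) lidar points to rows of (u, v, d) via K (R P + t)."""
    cam = points @ R.T + t               # lidar frame -> camera frame
    uvd = cam @ K.T                      # rows are (u*d, v*d, d)
    uvd = uvd[uvd[:, 2] > 1e-6]          # keep points in front of the camera
    depth = uvd[:, 2]
    return np.stack([uvd[:, 0] / depth, uvd[:, 1] / depth, depth], axis=1)


def depth_ground_truth(uvd, img_hw, stride, d_min, d_max, d_step):
    """Min-pool sparse depths onto the feature grid, then one-hot over bins."""
    h, w = img_hw[0] // stride, img_hw[1] // stride
    n_bins = int(round((d_max - d_min) / d_step))
    u = np.floor(uvd[:, 0] / stride).astype(int)
    v = np.floor(uvd[:, 1] / stride).astype(int)
    d = uvd[:, 2]
    keep = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (d >= d_min) & (d < d_max)

    # Min pooling: when several points land in one cell, keep the nearest.
    min_depth = np.full((h, w), np.inf)
    np.minimum.at(min_depth, (v[keep], u[keep]), d[keep])

    # One-hot encoding over depth bins; cells without a point stay all-zero.
    one_hot = np.zeros((n_bins, h, w), dtype=np.float32)
    vv, uu = np.nonzero(np.isfinite(min_depth))
    bins = np.floor((min_depth[vv, uu] - d_min) / d_step).astype(int)
    one_hot[np.clip(bins, 0, n_bins - 1), vv, uu] = 1.0
    return one_hot


K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
lidar_points = np.array([[0.0, 0.0, 10.0], [0.05, 0.0, 5.0], [0.0, 0.0, -3.0]])
uvd = project_lidar_to_image(lidar_points, K, R, t)
d_gt = depth_ground_truth(uvd, (96, 96), 16, 2.0, 58.0, 0.5)
```

In this toy example the two visible points fall into the same feature cell, and min pooling keeps the nearer one at 5 m, which is the surface a camera would actually see.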

Depth Refinement Module

Demo Results

Model parameter size

Core Logic

Overall model framework

python

def forward(
        self,
        x,
        mats_dict,
        timestamps=None,
    ):
        """Forward function for BEVDepth

        Args:
            x (Tensor): Input feature map.
            mats_dict(dict):
                sensor2ego_mats(Tensor): Transformation matrix from
                    camera to ego with shape of (B, num_sweeps,
                    num_cameras, 4, 4).
                intrin_mats(Tensor): Intrinsic matrix with shape
                    of (B, num_sweeps, num_cameras, 4, 4).
                ida_mats(Tensor): Transformation matrix for ida with
                    shape of (B, num_sweeps, num_cameras, 4, 4).
                sensor2sensor_mats(Tensor): Transformation matrix
                    from key frame camera to sweep frame camera with
                    shape of (B, num_sweeps, num_cameras, 4, 4).
                bda_mat(Tensor): Rotation matrix for bda with shape
                    of (B, 4, 4).
            timestamps (long): Timestamp.
                Default: None.

        Returns:
            tuple(list[dict]): Output results for tasks.
        """ 
        # Return the depth prediction only when depth training is enabled
        if self.is_train_depth and self.training:
            x, depth_pred = self.backbone(x,
                                          mats_dict,
                                          timestamps,
                                          is_return_depth=True)
            preds = self.head(x)
            return preds, depth_pred
        else:
            x = self.backbone(x, mats_dict, timestamps)
            preds = self.head(x)
            return preds
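
During training, the `depth_pred` returned alongside `preds` is what gets compared with the lidar-derived depth labels. The article does not name the loss, so the sketch below assumes a binary cross-entropy between the predicted depth distribution and one-hot labels, averaged over pixels that actually have a lidar point; `depth_bce_loss` is a hypothetical helper written in NumPy for illustration.

```python
import numpy as np


def depth_bce_loss(depth_prob, depth_gt, eps=1e-7):
    """Binary cross-entropy on labelled pixels (assumed loss, illustrative).

    depth_prob: (D, H, W) predicted probability per depth bin.
    depth_gt:   (D, H, W) one-hot labels; all-zero where no lidar point hit.
    """
    labelled = depth_gt.sum(axis=0) > 0                   # (H, W) mask
    p = np.clip(depth_prob[:, labelled], eps, 1.0 - eps)  # (D, M)
    y = depth_gt[:, labelled]
    bce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(bce.mean()) if bce.size else 0.0


gt = np.zeros((4, 1, 2))
gt[1, 0, 0] = 1.0                       # one labelled pixel, true bin = 1
perfect = gt.copy()
uniform = np.full((4, 1, 2), 0.25)
loss_perfect = depth_bce_loss(perfect, gt)
loss_uniform = depth_bce_loss(uniform, gt)
```

The total training loss would then be the detection loss plus a weighted depth loss; the weight is a hyperparameter that the article does not specify.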

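Model backbone

Note that the backbone `forward` below takes a `lidar_depth` argument that the model-level `forward` above does not pass, so the two snippets appear to come from different model variants.

python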

def forward(self,
                sweep_imgs,
                mats_dict,
                lidar_depth,
                timestamps=None,
                is_return_depth=False):
        """Forward function.

        Args:
            sweep_imgs(Tensor): Input images with shape of (B, num_sweeps,
                num_cameras, 3, H, W).
            mats_dict(dict):
                sensor2ego_mats(Tensor): Transformation matrix from
                    camera to ego with shape of (B, num_sweeps,
                    num_cameras, 4, 4).
                intrin_mats(Tensor): Intrinsic matrix with shape
                    of (B, num_sweeps, num_cameras, 4, 4).
                ida_mats(Tensor): Transformation matrix for ida with
                    shape of (B, num_sweeps, num_cameras, 4, 4).
                sensor2sensor_mats(Tensor): Transformation matrix
                    from key frame camera to sweep frame camera with
                    shape of (B, num_sweeps, num_cameras, 4, 4).
                bda_mat(Tensor): Rotation matrix for bda with shape
                    of (B, 4, 4).
            lidar_depth (Tensor): Depth generated by lidar.
            timestamps(Tensor): Timestamp for all images with the shape of (B,
                num_sweeps, num_cameras).

        Return:
            Tensor: bev feature map.
        """
        batch_size, num_sweeps, num_cams, num_channels, img_height, \
            img_width = sweep_imgs.shape
        # Downsample the lidar depth map
        lidar_depth = self.get_downsampled_lidar_depth(lidar_depth)
        key_frame_res = self._forward_single_sweep(
            0,
            sweep_imgs[:, 0:1, ...],
            mats_dict,
            lidar_depth[:, 0, ...],
            is_return_depth=is_return_depth)
        if num_sweeps == 1:
            return key_frame_res

        key_frame_feature = key_frame_res[
            0] if is_return_depth else key_frame_res

        ret_feature_list = [key_frame_feature]
        for sweep_index in range(1, num_sweeps):
            with torch.no_grad():
                feature_map = self._forward_single_sweep(
                    sweep_index,
                    sweep_imgs[:, sweep_index:sweep_index + 1, ...],
                    mats_dict,
                    lidar_depth[:, sweep_index, ...],
                    is_return_depth=False)
                ret_feature_list.append(feature_map)

        if is_return_depth:
            return torch.cat(ret_feature_list, 1), key_frame_res[1]
        else:
            return torch.cat(ret_feature_list, 1)

Usage

Preparing the dataset

Download the dataset from the nuScenes website and place it under the data/nuscenes folder. The resulting directory layout is:

text

data/nuscenes
├── maps
├── samples
├── sweeps
├── v1.0-test
└── v1.0-trainval

Generate the pkl files with the following command

bash

python scripts/gen_info.py

Select the weight file you need from the following path

Training and evaluating the model

bash

# train
python [EXP_PATH] --amp_backend native -b 8 --gpus 8

# eval
python [EXP_PATH] --ckpt_path [CKPT_PATH] -e -b 8 --gpus 8

Deployment

bash

# Clone the code
git clone https://github.com/Megvii-BaseDetection/BEVDepth.git

# Create the environment
conda create -n bevdepth python=3.7

# Activate the environment
conda activate bevdepth

# cd into the repository
cd BEVDepth/

# Install the required PyTorch version with pip
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

# Install mmcv; the build must match both the local CUDA version and the PyTorch version
pip install mmcv-full==1.5.2

# Install mmdet (pinned to 2.24.0 here)
pip install mmdet==2.24.0

# Install mmsegmentation
pip install mmsegmentation==0.20.0

# Clone mmdetection3d
git clone https://github.com/open-mmlab/mmdetection3d.git

# cd into mmdetection3d
cd mmdetection3d

# Check out the required version
git checkout v1.0.0rc4

# Install it
pip install -v -e .

# Go back to the parent directory
cd ..

# Install the remaining dependencies
pip install -r requirements.txt
python setup.py develop

pip install pytorch-lightning==1.7
pip install mmengine

# Other changes
In python3.7/site-packages/nuscenes/eval/detection/data_classes.py, change
self.class_names = self.class_range.keys()
to
self.class_names = list(self.class_range.keys())
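
The article does not explain why this change is needed. A plausible reason, shown in the sketch below, is that `dict.keys()` returns a view object rather than a list: the view cannot be indexed or pickled, while a list can. The `class_range` values here are made up for illustration and are not the nuScenes defaults.

```python
import pickle

# Hypothetical stand-in for the evaluation config's class_range dict.
class_range = {"car": 50, "pedestrian": 40}

keys_view = class_range.keys()          # dict_keys view
keys_list = list(class_range.keys())    # plain list


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False on TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


def can_index(obj):
    """Return True if obj supports obj[0], False on TypeError."""
    try:
        obj[0]
        return True
    except TypeError:
        return False


print(can_pickle(keys_view), can_pickle(keys_list))
print(can_index(keys_view), can_index(keys_list))
```

Any code path that serializes the config or indexes `class_names` would therefore fail with the view and work with the list.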
