Efficient Pre-training for Autonomous Driving: A New Way to Lower Deployment Cost (AD-PT)

Motivation: pre-training makes it possible to exploit large amounts of unlabeled data to further improve 3D detection.

1. Previous methods

1. Contrastive-learning-based methods: build positive pairs from associated frames

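Using corresponding points across different views as positive pairs: apply some transformations to the view, then treat associated points as positive pairs and unrelated points as negative pairs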
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding (ECCV 2020)
- Exploring Geometry-aware Contrast and Clustering Harmonization for Self-supervised 3D Object Detection (ICCV 2021)
- ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection
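
The positive/negative pairing described above can be sketched with a point-level InfoNCE loss. This is a minimal illustration and not the implementation of any of the listed papers: `rotate_z`, `toy_encoder`, and `info_nce` are hypothetical names, and the toy encoder is only a stand-in for a real 3D backbone. Point `i` in view A and point `i` in view B form a positive pair; every other point in view B acts as a negative.

```python
import math
import random


def rotate_z(points, angle):
    """Rotate (x, y, z) points around the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def toy_encoder(points):
    """Stand-in for a 3D backbone (assumption): one 2-D feature per point."""
    return [(math.hypot(x, y), z) for x, y, z in points]


def info_nce(feats_a, feats_b, temperature=0.1):
    """Row i of feats_a and row i of feats_b are positives; other rows are negatives."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    a = [normalize(v) for v in feats_a]
    b = [normalize(v) for v in feats_b]
    loss = 0.0
    for i, va in enumerate(a):
        logits = [sum(p * q for p, q in zip(va, vb)) / temperature for vb in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum - logits[i]
    return loss / len(a)


random.seed(0)
cloud = [(random.uniform(-10, 10), random.uniform(-10, 10), random.uniform(-2, 2))
         for _ in range(8)]
view_a = rotate_z(cloud, 0.3)    # two transformed views of the same scan
view_b = rotate_z(cloud, -0.5)

loss_matched = info_nce(toy_encoder(view_a), toy_encoder(view_b))
# Reversing view B breaks the point correspondence, so the loss should go up.
loss_mismatched = info_nce(toy_encoder(view_a), toy_encoder(view_b)[::-1])
```

The same loss applies to the temporal variant listed next, as long as the correspondence between points in two frames is known.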
Using temporally corresponding points as positive pairs:
- Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds (ICCV 2021)
Using points from different sides (infrastructure and vehicle) as pairs:
- CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving (ICLR 2023)

2. MAE-based methods

On voxels:
- Voxel-MAE: Masked Autoencoders for Self-Supervised Learning on Automotive Point Clouds
In BEV:
- BEV-MAE: Bird's Eye View Masked Autoencoders for Outdoor Point Cloud Pre-training
In hierarchical space:
- GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds (CVPR 2023)
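
The step shared by these MAE-style methods is to hide part of the input and train the network to reconstruct it. Below is a rough sketch of random voxel masking only. The function names, the voxel size, and the 0.7 mask ratio are assumptions for illustration; the listed papers differ in how they mask (voxels, BEV cells, multi-scale tokens) and in what they reconstruct.

```python
import random


def voxelize(points, voxel_size):
    """Group (x, y, z) points by their integer voxel index."""
    voxels = {}
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels.setdefault(key, []).append((x, y, z))
    return voxels


def random_mask(voxels, mask_ratio, rng):
    """Split occupied voxels into visible ones (encoder input) and masked ones (targets)."""
    keys = sorted(voxels)
    num_masked = int(len(keys) * mask_ratio)
    masked = set(rng.sample(keys, num_masked))
    visible = {k: v for k, v in voxels.items() if k not in masked}
    targets = {k: voxels[k] for k in masked}
    return visible, targets


rng = random.Random(0)
points = [(rng.uniform(0, 20), rng.uniform(0, 20), rng.uniform(-2, 2))
          for _ in range(200)]
voxels = voxelize(points, voxel_size=2.0)
visible, targets = random_mask(voxels, mask_ratio=0.7, rng=rng)
# A decoder would then be trained to predict, for example, the occupancy or
# the point count of each masked voxel from the visible ones.
```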

Drawbacks of previous work:

So the goal is:

Pre-training consists of two parts:

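1\] Class-aware pseudo-label generation ![figure](https://file.jishuzhan.net/article/1723006664820396033/46c4049368000c7dab56689e73e19f75.webp)
- 1. Up/down-sampling: project the point cloud onto an image and use the image as an intermediate representation to up-sample or down-sample the point cloud
- 2. Object re-scaling: re-scale the bounding boxes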
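
The two augmentations above can be sketched as follows. This is a simplified illustration under several assumptions that are not from the source: the "image" is treated as a range image whose rows are LiDAR beams indexed by elevation angle, only down-sampling is shown, the field of view `(-25°, 3°)` is a placeholder, boxes are axis-aligned `(cx, cy, cz, dx, dy, dz)` tuples, and objects are scaled about the box center. All function names are hypothetical.

```python
import math


def beam_downsample(points, num_beams, fov_deg=(-25.0, 3.0), keep_every=2):
    """Assign each point to a beam row by elevation angle, keep every k-th row."""
    lo, hi = (math.radians(a) for a in fov_deg)
    kept = []
    for x, y, z in points:
        elev = math.atan2(z, math.hypot(x, y))
        if not lo <= elev <= hi:
            continue  # outside the assumed vertical field of view
        row = min(int((elev - lo) / (hi - lo) * num_beams), num_beams - 1)
        if row % keep_every == 0:
            kept.append((x, y, z))
    return kept


def rescale_object(points, box, scale):
    """Scale the box size and the points inside it about the box center."""
    cx, cy, cz, dx, dy, dz = box

    def inside(p):
        return (abs(p[0] - cx) <= dx / 2 and abs(p[1] - cy) <= dy / 2
                and abs(p[2] - cz) <= dz / 2)

    out = []
    for p in points:
        if inside(p):
            out.append((cx + (p[0] - cx) * scale,
                        cy + (p[1] - cy) * scale,
                        cz + (p[2] - cz) * scale))
        else:
            out.append(p)  # background points are left untouched
    return out, (cx, cy, cz, dx * scale, dy * scale, dz * scale)


# Three points at elevations -20.5, -9.5 and 0.5 degrees fall into rows 4, 15, 25.
sweep = [(10.0, 0.0, 10.0 * math.tan(math.radians(a))) for a in (-20.5, -9.5, 0.5)]
kept = beam_downsample(sweep, num_beams=28)

points = [(10.0, 0.0, -1.0), (10.5, 0.2, -0.5), (20.0, 5.0, 0.0)]
box = (10.0, 0.0, -1.0, 2.0, 1.0, 2.0)
new_points, new_box = rescale_object(points, box, scale=1.2)
```

A real pipeline would also handle rotated boxes and keep re-scaled objects on the ground, for example by scaling about the bottom center of the box.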
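After large-scale pre-training, performance on the NuScenes dataset is relatively poor, mainly because the class definitions are inconsistent across datasets; in addition, continued training suppresses the class activations learned during pre-training
Pedestrians (Ped) and Cyclists are usually detected poorly in autonomous driving scenes and have few annotations; in the unlabeled data, nearly 2 objects per frame are left without a label, and these can be exploited (see the figure below)
To decide whether an object in an unlabeled frame is foreground, two heads are used, each making its own prediction
- If the results of both branches exceed a given threshold and the two branches' localizations are close to each other, the object is treated as foreground
- A consistency loss is added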
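
The two-head check above can be sketched as follows. The source only states the rule (both scores above a threshold, nearby localization, plus a consistency loss), so everything else here is an assumption: the `(x, y, score)` prediction format, the threshold values, the nearest-match strategy, and the squared-difference form of the loss. The function names are hypothetical.

```python
import math


def select_unknown_foreground(preds_a, preds_b, score_thresh=0.5, dist_thresh=1.0):
    """Pair predictions from two heads that are both confident and close together.

    Each prediction is (x, y, score). Returns a list of (pred_a, pred_b) pairs.
    """
    matches = []
    for xa, ya, sa in preds_a:
        if sa < score_thresh:
            continue
        best = None
        for xb, yb, sb in preds_b:
            if sb < score_thresh:
                continue
            d = math.hypot(xa - xb, ya - yb)
            if d <= dist_thresh and (best is None or d < best[0]):
                best = (d, (xb, yb, sb))
        if best is not None:
            matches.append(((xa, ya, sa), best[1]))
    return matches


def consistency_loss(matches):
    """Mean squared score difference over matched pairs (assumed form)."""
    if not matches:
        return 0.0
    return sum((a[2] - b[2]) ** 2 for a, b in matches) / len(matches)


head_a = [(5.0, 1.0, 0.8), (20.0, -3.0, 0.9), (40.0, 0.0, 0.3)]
head_b = [(5.3, 1.2, 0.7), (30.0, -3.0, 0.95), (40.1, 0.0, 0.9)]

matches = select_unknown_foreground(head_a, head_b)
loss = consistency_loss(matches)
# Only the first pair passes: the second has no nearby partner in head_b,
# and the third is below the score threshold in head_a.
```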