Paper: Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
Code: https://github.com/nullmax-vision/QAF2D
Venue: CVPR 2024
Nullmax is credited on this paper, and it is a company I know fairly well.
Let's take a quick look.
1. Performance gains

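The results do show an improvement.
Now for the abstract.
2. Abstract
The idea is to propose 3D queries from 2D detections; the TopK in StreamPETR's RPN head seems to serve much the same purpose.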
Multi-camera-based 3D object detection has made notable progress in the past several years. However, we observe that there are cases (e.g. faraway regions) in which popular 2D object detectors are more reliable than state-of-the-art 3D detectors. In this paper, to improve the performance of query-based 3D object detectors, we present a novel query generating approach termed QAF2D, which infers 3D query anchors from 2D detection results. A 2D bounding box of an object in an image is lifted to a set of 3D anchors by associating each sampled point within the box with depth, yaw angle, and size candidates. Then, the validity of each 3D anchor is verified by comparing its projection in the image with its corresponding 2D box, and only valid anchors are kept and used to construct queries. The class information of the 2D bounding box associated with each query is also utilized to match the predicted boxes with ground truth for the set-based loss. The image feature extraction backbone is shared between the 3D detector and 2D detector by adding a small number of prompt parameters. We integrate QAF2D into three popular query-based 3D object detectors and carry out comprehensive evaluations on the nuScenes dataset.
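To make the lift-then-verify step in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation, and the code in the QAF2D repository may differ in every detail. The assumptions are mine: a pinhole camera with intrinsics `(fx, fy, cx, cy)`, a camera frame with x right, y down, z forward, a uniform grid of sample points inside the 2D box, and an axis-aligned IoU threshold as the validity test. The function names `project_box`, `iou_2d`, and `lift_box_to_anchors` are hypothetical.

```python
import math
from itertools import product


def project_box(center, size, yaw, intrinsics):
    """Project a 3D box given in the camera frame to an axis-aligned 2D box.

    Returns (u1, v1, u2, v2), or None if a corner is behind the camera
    or closer than 0.1 m to it.
    """
    fx, fy, cx, cy = intrinsics
    x0, y0, z0 = center
    w, h, l = size  # width, height, length in metres
    c, s = math.cos(yaw), math.sin(yaw)
    us, vs = [], []
    for dx, dy, dz in product((-l / 2, l / 2), (-h / 2, h / 2), (-w / 2, w / 2)):
        # Rotate the corner offset about the vertical (y) axis, then translate.
        x = x0 + c * dx + s * dz
        y = y0 + dy
        z = z0 - s * dx + c * dz
        if z <= 0.1:
            return None
        us.append(fx * x / z + cx)
        vs.append(fy * y / z + cy)
    return (min(us), min(vs), max(us), max(vs))


def iou_2d(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def lift_box_to_anchors(box2d, intrinsics, depths, yaws, sizes, grid=3, iou_thr=0.5):
    """Lift one 2D box to 3D anchors and keep those whose projection agrees with it.

    The grid sampling and the IoU threshold are assumptions of this sketch.
    Each kept anchor is a (center, size, yaw) tuple in the camera frame.
    """
    fx, fy, cx, cy = intrinsics
    x1, y1, x2, y2 = box2d
    # Sample a grid x grid set of points strictly inside the 2D box.
    us = [x1 + (x2 - x1) * (i + 1) / (grid + 1) for i in range(grid)]
    vs = [y1 + (y2 - y1) * (j + 1) / (grid + 1) for j in range(grid)]
    anchors = []
    for u, v, d, yaw, size in product(us, vs, depths, yaws, sizes):
        # Back-project the pixel (u, v) at depth d to a 3D centre candidate.
        center = ((u - cx) * d / fx, (v - cy) * d / fy, d)
        proj = project_box(center, size, yaw, intrinsics)
        # Validity check: the anchor's projection must overlap the 2D box enough.
        if proj is not None and iou_2d(proj, box2d) >= iou_thr:
            anchors.append((center, size, yaw))
    return anchors
```

A call such as `lift_box_to_anchors(box2d, (1000.0, 1000.0, 800.0, 450.0), depths=[10.0, 20.0, 40.0], yaws=[0.0, math.pi / 2], sizes=[(2.0, 1.5, 4.0)])` enumerates every combination of sample point, depth, yaw, and size, so the candidate count grows multiplicatively. The projection check is what keeps the number of queries manageable.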

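3. Main contributions
Personally, I think it is good engineering practice that the authors went to the trouble of integrating the method into StreamPETR, SparseBEV, and BEVFormer; it also gives everyone a decent code template.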
The contributions of our paper are summarized as follows:
• We propose to generate 3D query anchors from 2D bounding boxes so that the results of the more reliable 2D detector can be directly used to improve the 3D detection performance.
• We share the image feature extraction backbone between the 3D and 2D detectors by visual prompts for efficiency and successfully train the network in two stages.
• Consistent performance improvement is achieved on the nuScenes dataset when the proposed QAF2D is integrated into three query-based 3D object detectors, and it shows the effectiveness and generalization ability of our proposed approach.
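4. Approach

The approach and its limitations, in the authors' own words:
We propose generating 3D query anchors from 2D boxes so that the more reliable 2D detection results can be used to improve the 3D detector. To share the image feature extraction backbone between the 2D and 3D detectors while preserving the 3D detector's performance, we design a two-stage optimization method that uses visual prompts.
The limitation is that the 3D detection results depend on the quality of the 2D detector (although they are not sensitive to it). If the 2D detector misses an object, it is hard for the query-based 3D detector to recover it.
In addition, directly combining the 3D anchors generated by our method with random anchors does not yield a significant improvement. We will study how to make the two kinds of anchors work together in future work.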

A very clean paper, well suited for hands-on practice. StreamPETR itself is also a very clean project.