【CVPR】Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

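Paper link: Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Code link: https://github.com/nullmax-vision/QAF2D

Conference/Journal: CVPR 2024

This paper lists Nullmax among its affiliations, and Nullmax is a company I am fairly familiar with.

Let's take a quick look.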

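1. Performance gains

The reported results do show an improvement.

Now for the abstract.

2. Abstract

The idea is to propose 3D queries from 2D detections. As I see it, the TopK selection in StreamPETR's RPN head serves much the same purpose.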

Multi-camera-based 3D object detection has made notable progress in the past several years. However, we observe that there are cases (e.g. faraway regions) in which popular 2D object detectors are more reliable than state-of-the-art 3D detectors. In this paper, to improve the performance of query-based 3D object detectors, we present a novel query generating approach termed QAF2D, which infers 3D query anchors from 2D detection results. A 2D bounding box of an object in an image is lifted to a set of 3D anchors by associating each sampled point within the box with depth, yaw angle, and size candidates. Then, the validity of each 3D anchor is verified by comparing its projection in the image with its corresponding 2D box, and only valid anchors are kept and used to construct queries. The class information of the 2D bounding box associated with each query is also utilized to match the predicted boxes with ground truth for the set-based loss. The image feature extraction backbone is shared between the 3D detector and 2D detector by adding a small number of prompt parameters. We integrate QAF2D into three popular query-based 3D object detectors and carry out comprehensive evaluations on the nuScenes dataset.
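To make the "lift, then verify" step from the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (the QAF2D repository is the reference for that). The pinhole camera model, the function names, the grid sampling, the (width, length, height) size ordering, the IoU-based validity check, and the 0.6 threshold are all my own assumptions for illustration.

```python
import math
from itertools import product


def box_corners(center, size, yaw):
    """8 corners of a 3D box in the camera frame (x right, y down, z forward).

    Assumption: size = (width, length, height) and yaw rotates about the y axis.
    """
    cx3, cy3, cz3 = center
    w, l, h = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx, dy, dz in product((-0.5, 0.5), repeat=3):
        lx, ly, lz = dx * l, dy * h, dz * w  # offsets in the box's own frame
        corners.append((cx3 + c * lx + s * lz, cy3 + ly, cz3 - s * lx + c * lz))
    return corners


def project_box(center, size, yaw, intrinsics):
    """Axis-aligned 2D envelope of the projected 3D box, or None if any corner
    lies behind the camera."""
    fx, fy, cx, cy = intrinsics
    corners = box_corners(center, size, yaw)
    if min(p[2] for p in corners) <= 0:
        return None
    us = [fx * x / z + cx for x, _, z in corners]
    vs = [fy * y / z + cy for _, y, z in corners]
    return (min(us), min(vs), max(us), max(vs))


def iou_2d(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def lift_box_to_anchors(box2d, intrinsics, depths, yaws, sizes, grid=3, iou_thr=0.6):
    """Lift one 2D box to candidate 3D anchors and keep only the valid ones.

    Every sampled pixel in the box is paired with every depth, yaw and size
    candidate. An anchor is kept when its projection overlaps the 2D box with
    IoU >= iou_thr (an assumed validity criterion).
    """
    fx, fy, cx, cy = intrinsics
    x1, y1, x2, y2 = box2d
    # Sample a grid x grid lattice of points inside the 2D box (cell centers).
    us = [x1 + (i + 0.5) * (x2 - x1) / grid for i in range(grid)]
    vs = [y1 + (j + 0.5) * (y2 - y1) / grid for j in range(grid)]
    anchors = []
    for u, v, d, yaw, size in product(us, vs, depths, yaws, sizes):
        # Back-project the pixel to a 3D center at depth d.
        center = ((u - cx) * d / fx, (v - cy) * d / fy, d)
        proj = project_box(center, size, yaw, intrinsics)
        if proj is not None and iou_2d(proj, box2d) >= iou_thr:
            anchors.append((center, size, yaw))
    return anchors


# Toy usage: a car-sized box 20 m ahead, seen by a hypothetical camera.
K = (1000.0, 1000.0, 800.0, 450.0)  # fx, fy, cx, cy (made-up values)
box2d = project_box((0.0, 0.0, 20.0), (2.0, 4.0, 1.5), 0.0, K)
anchors = lift_box_to_anchors(
    box2d, K, depths=[10.0, 20.0, 40.0], yaws=[0.0, math.pi / 2],
    sizes=[(2.0, 4.0, 1.5)], grid=1,
)
```

In this toy case only the candidate at depth 20 with yaw 0 survives the check: anchors placed too close or too far project to boxes that are much larger or smaller than the 2D detection. The surviving anchors are what would then be turned into queries for the 3D detector.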

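3. Main contributions

Personally, I think the engineering effort of actively integrating the method into StreamPETR, SparseBEV, and BEVFormer is a good thing, and it gives everyone a decent code template to learn from.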

The contributions of our paper are summarized as follows:

• We propose to generate 3D query anchors from 2D bounding boxes so that the results of the more reliable 2D detector can be directly used to improve the 3D detection performance.

• We share the image feature extraction backbone between the 3D and 2D detectors by visual prompts for efficiency and successfully train the network in two stages.

• Consistent performance improvement is achieved on the nuScenes dataset when the proposed QAF2D is integrated into three query-based 3D object detectors, and it shows the effectiveness and generalization ability of our proposed approach.

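4. Approach

The approach and its limitations, in the authors' own words:

We propose generating 3D query anchors from 2D boxes so that the more reliable 2D detection results can be used to improve the performance of the 3D detector. To share the image feature backbone between the 2D and 3D detectors while preserving the 3D detector's performance, we design a two-stage optimization method that incorporates visual prompts.

The limitation is that the 3D detection results depend on the quality of the 2D detector (although they are not sensitive to it). If the 2D detector misses an object, it is difficult for the query-based 3D detector to recover that missed object.

In addition, directly combining the 3D anchors generated by our method with random anchors does not yield a significant improvement. We will investigate how to achieve synergy between these two kinds of anchors in future work.

A very clean paper, well suited for hands-on practice. StreamPETR itself is also a very clean project.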
