【CVPR】Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Paper: Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Code: https://github.com/nullmax-vision/QAF2D

Venue: CVPR 2024

This paper lists Nullmax as an affiliation, and Nullmax is a company I am fairly familiar with.

Let's take a quick look.

1. Performance Improvement

The reported results do show an improvement.

Now for the abstract.

2. Abstract

The idea is to propose 3D queries from 2D detections. As I read it, the TopK selection in StreamPETR's RPN head is after the same thing.

Multi-camera-based 3D object detection has made notable progress in the past several years. However, we observe that there are cases (e.g. faraway regions) in which popular 2D object detectors are more reliable than state-of-the-art 3D detectors. In this paper, to improve the performance of query-based 3D object detectors, we present a novel query generating approach termed QAF2D, which infers 3D query anchors from 2D detection results. A 2D bounding box of an object in an image is lifted to a set of 3D anchors by associating each sampled point within the box with depth, yaw angle, and size candidates. Then, the validity of each 3D anchor is verified by comparing its projection in the image with its corresponding 2D box, and only valid anchors are kept and used to construct queries. The class information of the 2D bounding box associated with each query is also utilized to match the predicted boxes with ground truth for the set-based loss. The image feature extraction backbone is shared between the 3D detector and 2D detector by adding a small number of prompt parameters. We integrate QAF2D into three popular query-based 3D object detectors and carry out comprehensive evaluations on the nuScenes dataset.
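To make the lift-then-verify step in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the pinhole camera convention (x right, y down, z forward, yaw about the y axis), the sampling grid, the candidate lists, and the 0.5 IoU threshold are all my own assumptions for illustration; the paper and the repository define the real choices.

```python
import itertools
import numpy as np


def box_corners_3d(center, size, yaw):
    """Return the 8 corners (8, 3) of a 3D box in camera coordinates.

    Assumed convention: x right, y down, z forward; yaw rotates about y;
    size = (extent along x, extent along y, extent along z) before rotation.
    """
    sx, sy, sz = size
    x = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * sx / 2.0
    y = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * sy / 2.0
    z = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * sz / 2.0
    corners = np.stack([x, y, z], axis=1)
    c, s = np.cos(yaw), np.sin(yaw)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return corners @ rot_y.T + np.asarray(center, dtype=float)


def project_to_box_2d(corners, K):
    """Project corners with intrinsics K and return the enclosing 2D box.

    Returns None if any corner lies at or behind the image plane.
    """
    if np.any(corners[:, 2] <= 1e-6):
        return None
    uvw = corners @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return np.array([uv[:, 0].min(), uv[:, 1].min(),
                     uv[:, 0].max(), uv[:, 1].max()])


def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def lift_box_to_anchors(box_2d, K, depths, yaws, sizes, grid=3, iou_thr=0.5):
    """Lift one 2D box to candidate 3D anchors and keep only the valid ones.

    Each sampled pixel inside the box is paired with every depth, yaw, and
    size candidate. An anchor is kept when the 2D box enclosing its
    projection overlaps the original 2D box with IoU >= iou_thr
    (a hypothetical validity test).
    """
    box_2d = np.asarray(box_2d, dtype=float)
    x1, y1, x2, y2 = box_2d
    # Sample a grid of interior points, excluding the box border.
    us = np.linspace(x1, x2, grid + 2)[1:-1]
    vs = np.linspace(y1, y2, grid + 2)[1:-1]
    K_inv = np.linalg.inv(K)
    anchors = []
    for u, v, d, yaw, size in itertools.product(us, vs, depths, yaws, sizes):
        # Back-project the pixel to a 3D center at z-depth d.
        center = d * (K_inv @ np.array([u, v, 1.0]))
        proj = project_to_box_2d(box_corners_3d(center, size, yaw), K)
        if proj is not None and iou_2d(proj, box_2d) >= iou_thr:
            anchors.append((center, size, yaw))
    return anchors
```

The brute-force product over points, depths, yaws, and sizes keeps the sketch short. A real pipeline would vectorize it and would likely pick size candidates per class.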

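3. Main Contributions

Personally, I think it is good engineering practice that the authors integrated the method into StreamPETR, SparseBEV, and BEVFormer. The result is also a decent code reference for everyone.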

The contributions of our paper are summarized as follows:

• We propose to generate 3D query anchors from 2D bounding boxes so that the results of the more reliable 2D detector can be directly used to improve the 3D detection performance.

• We share the image feature extraction backbone between the 3D and 2D detectors by visual prompts for efficiency and successfully train the network in two stages.

• Consistent performance improvement is achieved on the nuScenes dataset when the proposed QAF2D is integrated into three query-based 3D object detectors, and it shows the effectiveness and generalization ability of our proposed approach.

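4. Approach

The approach and its limitations, in the authors' own words:

We propose to generate 3D query anchors from 2D boxes so that the more reliable 2D detection results can be used to improve the performance of 3D detectors. To share the image feature extraction backbone between the 2D and 3D detectors while preserving the 3D detector's performance, we design a two-stage optimization method that incorporates visual prompts.

The limitation is that the 3D detection results depend on the quality of the 2D detector (although they are not sensitive to it). If the 2D detector misses an object, it is difficult for the query-based 3D detector to recover it.

In addition, directly combining the 3D anchors generated by our method with random anchors does not yield a significant improvement. We will study how to make the two kinds of anchors work together in future work.

A very clean paper, well suited as a practice project. StreamPETR itself is also a very clean project.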
