【CVPR】Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Paper link: Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Code link: https://github.com/nullmax-vision/QAF2D

Conference/Journal: CVPR 2024

This paper lists Nullmax as an affiliation, and I am fairly familiar with Nullmax.

Let's take a quick look.

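1. Performance improvement

Judging from the results, there is an improvement.

Let's look at the abstract.

2. Abstract

The idea is to propose 3D queries from 2D detections; the TopK selection in StreamPETR's RPN head works on the same idea.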

Multi-camera-based 3D object detection has made notable progress in the past several years. However, we observe that there are cases (e.g. faraway regions) in which popular 2D object detectors are more reliable than state-of-the-art 3D detectors. In this paper, to improve the performance of query-based 3D object detectors, we present a novel query generating approach termed QAF2D, which infers 3D query anchors from 2D detection results. A 2D bounding box of an object in an image is lifted to a set of 3D anchors by associating each sampled point within the box with depth, yaw angle, and size candidates. Then, the validity of each 3D anchor is verified by comparing its projection in the image with its corresponding 2D box, and only valid anchors are kept and used to construct queries. The class information of the 2D bounding box associated with each query is also utilized to match the predicted boxes with ground truth for the set-based loss. The image feature extraction backbone is shared between the 3D detector and 2D detector by adding a small number of prompt parameters. We integrate QAF2D into three popular query-based 3D object detectors and carry out comprehensive evaluations on the nuScenes dataset.
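To make the lift-then-verify step in the abstract concrete, here is a minimal sketch in plain Python. It is not the QAF2D implementation: the function names, the regular sampling grid, the pinhole intrinsics tuple `K = (fx, fy, u0, v0)`, the camera-frame axis convention, and the IoU threshold used as the validity test are all assumptions made for illustration. The abstract only says that each anchor's projection is compared with its 2D box.

```python
import math
from itertools import product


def box_corners_3d(center, size, yaw):
    """Eight corners of a 3D box in an assumed camera frame (x right, y down, z forward)."""
    cx, cy, cz = center
    length, height, width = size
    corners = []
    for dx, dy, dz in product((-0.5, 0.5), repeat=3):
        x, y, z = dx * length, dy * height, dz * width
        # Rotate the local offset about the vertical (y) axis by yaw.
        xr = math.cos(yaw) * x + math.sin(yaw) * z
        zr = -math.sin(yaw) * x + math.cos(yaw) * z
        corners.append((cx + xr, cy + y, cz + zr))
    return corners


def project_box(corners, K):
    """Axis-aligned 2D box enclosing the pinhole projection of the corners, or None."""
    fx, fy, u0, v0 = K
    us, vs = [], []
    for x, y, z in corners:
        if z <= 0.1:  # corner behind or too close to the camera
            return None
        us.append(fx * x / z + u0)
        vs.append(fy * y / z + v0)
    return (min(us), min(vs), max(us), max(vs))


def iou_2d(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def lift_box_to_anchors(box2d, K, depths, yaws, sizes, grid=3, iou_thr=0.5):
    """Lift one 2D box to 3D anchors and keep those whose projection matches the box.

    Sampling on a grid x grid lattice and the IoU threshold are assumptions of this sketch.
    """
    x1, y1, x2, y2 = box2d
    fx, fy, u0, v0 = K
    anchors = []
    for i, j in product(range(grid), repeat=2):
        # Sampled pixel inside the 2D box.
        u = x1 + (i + 0.5) * (x2 - x1) / grid
        v = y1 + (j + 0.5) * (y2 - y1) / grid
        for d, yaw, size in product(depths, yaws, sizes):
            # Back-project the pixel to a 3D center at the candidate depth.
            center = ((u - u0) * d / fx, (v - v0) * d / fy, d)
            proj = project_box(box_corners_3d(center, size, yaw), K)
            if proj is not None and iou_2d(proj, box2d) >= iou_thr:
                anchors.append((center, size, yaw))
    return anchors


if __name__ == "__main__":
    K = (1000.0, 1000.0, 800.0, 450.0)  # made-up intrinsics
    box2d = (700.0, 410.0, 900.0, 490.0)  # made-up 2D detection
    kept = lift_box_to_anchors(
        box2d, K,
        depths=[10.0, 20.0, 40.0],
        yaws=[0.0, math.pi / 2],
        sizes=[(4.0, 1.5, 2.0)],
    )
    print(len(kept), "anchors kept")
```

The verification step is what keeps the number of queries manageable: every sampled point is paired with every depth, yaw, and size candidate, and candidates whose projection is clearly too large or too small for the 2D box are discarded before queries are built.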

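3. Main contributions of the paper

Personally, I think that actively integrating the method into StreamPETR, SparseBEV, and BEVFormer is good engineering practice, and it also gives everyone a decent code reference.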

The contributions of our paper are summarized as follows:

• We propose to generate 3D query anchors from 2D bounding boxes so that the results of the more reliable 2D detector can be directly used to improve the 3D detection performance.

• We share the image feature extraction backbone between the 3D and 2D detectors by visual prompts for efficiency and successfully train the network in two stages.

• Consistent performance improvement is achieved on the nuScenes dataset when the proposed QAF2D is integrated into three query-based 3D object detectors, and it shows the effectiveness and generalization ability of our proposed approach.

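4. Approach

The approach and its limitations, as stated by the authors themselves:

We propose generating 3D query anchors from 2D boxes so that the more reliable 2D detection results can be used to improve the performance of the 3D detector. To share the image feature backbone between the 2D and 3D detectors while preserving the 3D detector's performance, we design a two-stage optimization method combined with visual prompts.

The limitation is that the 3D detection results depend on the quality of the 2D detector (although they are not sensitive to it). If the 2D detector misses an object, it is hard for the query-based 3D detector to recover that missed object.

Also, directly combining the 3D anchors generated by our method with random anchors does not yield a significant improvement. We will study how to achieve synergy between these two kinds of anchors in future work.

A very clean paper, well suited for hands-on practice. StreamPETR itself is also a very clean project.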
