【CVPR】Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Paper link: Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Code link: https://github.com/nullmax-vision/QAF2D

Conference/Journal: CVPR 2024

Nullmax is listed as an affiliation on this paper, and it is a company I know fairly well.

Let's take a quick look.

1. Performance improvement

Judging from the reported results, there is an improvement.

Now for the abstract.

2. Abstract

The paper proposes 3D queries from 2D detections. The TopK selection in StreamPETR's RPN head expresses much the same idea.

Multi-camera-based 3D object detection has made notable progress in the past several years. However, we observe that there are cases (e.g. faraway regions) in which popular 2D object detectors are more reliable than state-of-the-art 3D detectors. In this paper, to improve the performance of query-based 3D object detectors, we present a novel query generating approach termed QAF2D, which infers 3D query anchors from 2D detection results. A 2D bounding box of an object in an image is lifted to a set of 3D anchors by associating each sampled point within the box with depth, yaw angle, and size candidates. Then, the validity of each 3D anchor is verified by comparing its projection in the image with its corresponding 2D box, and only valid anchors are kept and used to construct queries. The class information of the 2D bounding box associated with each query is also utilized to match the predicted boxes with ground truth for the set-based loss. The image feature extraction backbone is shared between the 3D detector and 2D detector by adding a small number of prompt parameters. We integrate QAF2D into three popular query-based 3D object detectors and carry out comprehensive evaluations on the nuScenes dataset.
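The lift-and-verify step described in the abstract can be sketched roughly as follows. This is only an illustration under my own assumptions, not the authors' code: it assumes a pinhole camera with intrinsics `(fx, fy, cx, cy)`, a camera frame with x right, y down, z forward, a uniform grid of sampled points inside the 2D box, and an IoU threshold as the validity check. The function names `project_box`, `iou_2d`, and `lift_and_verify`, and all candidate values, are hypothetical.

```python
import math
from itertools import product


def project_box(center, size, yaw, intrinsics):
    """Project a 3D box to its enclosing 2D box (x1, y1, x2, y2), or None if invalid."""
    fx, fy, cx, cy = intrinsics
    length, height, width = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    us, vs = [], []
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        # Corner in the box frame, rotated about the vertical (y) axis.
        x, y, z = sx * length, sy * height, sz * width
        xc = center[0] + x * cos_y + z * sin_y
        yc = center[1] + y
        zc = center[2] - x * sin_y + z * cos_y
        if zc <= 0:  # a corner behind the camera: treat the anchor as invalid
            return None
        us.append(fx * xc / zc + cx)
        vs.append(fy * yc / zc + cy)
    return (min(us), min(vs), max(us), max(vs))


def iou_2d(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def lift_and_verify(box2d, intrinsics, depths, yaws, sizes, grid=3, iou_thr=0.5):
    """Lift a 2D box to candidate 3D anchors and keep those that reproject onto it."""
    fx, fy, cx, cy = intrinsics
    x1, y1, x2, y2 = box2d
    kept = []
    for i, j in product(range(grid), repeat=2):
        # Sampled point inside the 2D box (assumed: centers of a grid x grid layout).
        u = x1 + (i + 0.5) / grid * (x2 - x1)
        v = y1 + (j + 0.5) / grid * (y2 - y1)
        for depth, yaw, size in product(depths, yaws, sizes):
            # Back-project the pixel at the candidate depth to get the anchor center.
            center = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
            projected = project_box(center, size, yaw, intrinsics)
            if projected is None:
                continue
            iou = iou_2d(projected, box2d)
            if iou >= iou_thr:  # assumed validity test: projection overlaps the 2D box
                kept.append({"center": center, "size": size, "yaw": yaw, "iou": iou})
    return kept


# Toy example: build a 2D box from a known 3D box, then try to recover it.
intrinsics = (1000.0, 1000.0, 800.0, 450.0)
true_size = (4.0, 1.5, 2.0)
box2d = project_box((0.0, 0.0, 20.0), true_size, 0.0, intrinsics)
anchors = lift_and_verify(
    box2d, intrinsics,
    depths=[10.0, 20.0, 40.0],
    yaws=[0.0, math.pi / 2],
    sizes=[true_size, (0.7, 1.7, 0.7)],
)
best = max(anchors, key=lambda a: a["iou"])
```

In this toy setup, anchors with the wrong depth project to boxes that are much too large or too small and are filtered out, which is the intuition behind the verification step. How the paper actually samples points and defines validity should be checked against the released code.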

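3. Main contributions

Personally, I think the engineering effort here is a strong point: the authors integrated the method into StreamPETR, SparseBEV, and BEVFormer themselves, which also gives everyone a decent code reference.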

The contributions of our paper are summarized as follows:

• We propose to generate 3D query anchors from 2D bounding boxes so that the results of the more reliable 2D detector can be directly used to improve the 3D detection performance.

• We share the image feature extraction backbone between the 3D and 2D detectors by visual prompts for efficiency and successfully train the network in two stages.

• Consistent performance improvement is achieved on the nuScenes dataset when the proposed QAF2D is integrated into three query-based 3D object detectors, and it shows the effectiveness and generalization ability of our proposed approach.

4. Approach

The approach and its limitations, in the authors' own words:

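We propose generating 3D query anchors from 2D boxes so that the more reliable 2D detection results can be used to improve the performance of the 3D detector. To share the image feature backbone between the 2D and 3D detectors while preserving the 3D detector's performance, we design a two-stage optimization method that incorporates visual prompts.

The limitation is that the 3D detection results depend on the quality of the 2D detector (although they are not sensitive to it). If the 2D detector misses an object, it is difficult for the query-based 3D detector to recover that missed object.

In addition, directly combining the 3D anchors generated by our method with random anchors does not yield a significant improvement. We will investigate how to achieve synergy between these two kinds of anchors in future work.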

This is a very clean paper and a good one to practice on. StreamPETR itself is also a very clean project.
