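Sparse4D Temporal Inputs and Feature Queue Explained

Sparse4D v3 paper: https://arxiv.org/pdf/2311.11722

Source code: https://github.com/linxuewu/Sparse4D

Overview

Sparse4D fuses information across frames through a temporal fusion mechanism, and the InstanceBank class manages both the temporal instances and the feature queue. This article explains the structure of the temporal inputs and of the feature queue in detail.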

1. Temporal Input Structure

1.1 Input Components

The input to Sparse4D consists of three main parts:

Multi-view images of the current frame (img)
- Shape: [batch_size, num_cams, C, H, W]
- Example: [1, 6, 3, 256, 704] (batch=1, 6 cameras, RGB, 256x704)
Newly initialized instances (instance_feature, anchor)
- instance_feature: [batch_size, num_anchor, embed_dims]
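- anchor: [batch_size, num_anchor, anchor_dims] (shown here with 10 values: x, y, z, w, l, h, yaw, vx, vy, vz; the released code stores yaw as a sin/cos pair, which makes the anchor 11-dimensional)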
- Example: [1, 900, 256] and [1, 900, 10]
Instances propagated from the previous frame (temp_instance_feature, temp_anchor)
- temp_instance_feature: [batch_size, num_temp_instances, embed_dims]
- temp_anchor: [batch_size, num_temp_instances, anchor_dims]
- Example: [1, 600, 256] and [1, 600, 10]
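
To make the three input groups concrete, the sketch below allocates zero-filled placeholders with the example shapes listed above. It uses NumPy arrays rather than the PyTorch tensors used in the repository, and the helper name `make_dummy_inputs` is hypothetical. Only the shapes are taken from the text, and `anchor_dims=10` simply follows the example above.

```python
import numpy as np


def make_dummy_inputs(batch_size=1, num_cams=6, height=256, width=704,
                      num_anchor=900, num_temp_instances=600,
                      embed_dims=256, anchor_dims=10):
    """Allocate zero-filled placeholders with the shapes listed above."""
    f32 = np.float32
    return {
        # Multi-view images of the current frame
        "img": np.zeros((batch_size, num_cams, 3, height, width), dtype=f32),
        # Newly initialized instances
        "instance_feature": np.zeros((batch_size, num_anchor, embed_dims), dtype=f32),
        "anchor": np.zeros((batch_size, num_anchor, anchor_dims), dtype=f32),
        # Instances propagated from the previous frame
        "temp_instance_feature": np.zeros(
            (batch_size, num_temp_instances, embed_dims), dtype=f32
        ),
        "temp_anchor": np.zeros(
            (batch_size, num_temp_instances, anchor_dims), dtype=f32
        ),
    }


inputs = make_dummy_inputs()
for name, value in inputs.items():
    print(f"{name:>22}: {value.shape}")
```

On the very first frame of a sequence there is nothing to propagate, so the two `temp_*` entries would be absent (None in the code discussed later).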

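1.2 Temporal Information

timestamp: timestamp of the current frame
time_interval: time elapsed between the previous frame and the current frame
T_global: transformation matrix from a frame's local coordinate system to the global one; together with its inverse it is used to move anchors from one frame to the next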
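
As a rough illustration of how these fields combine, the sketch below derives the time interval and the previous-to-current transform from two frames. The dictionary layout, the `pose` helper, and the planar ego motion are assumptions made for the example; NumPy stands in for PyTorch.

```python
import numpy as np


def pose(x, y, yaw):
    """4x4 local-to-global transform for a planar pose (hypothetical helper)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 3] = [x, y]
    return T


# Previous frame: ego at the global origin.
# Current frame: ego has moved 5 m along the global x axis.
prev = {"timestamp": 10.0, "T_global": pose(0.0, 0.0, 0.0)}
cur = {"timestamp": 10.5, "T_global": pose(5.0, 0.0, 0.0)}
cur["T_global_inv"] = np.linalg.inv(cur["T_global"])

time_interval = cur["timestamp"] - prev["timestamp"]

# previous frame -> global -> current frame
T_temp2cur = cur["T_global_inv"] @ prev["T_global"]

# A static point 10 m ahead of the previous ego pose (homogeneous coordinates)
point_prev = np.array([10.0, 0.0, 0.0, 1.0])
point_cur = T_temp2cur @ point_prev
print(time_interval, point_cur[:3])
```

Because the ego vehicle advanced 5 m, the same static point ends up 5 m ahead of the current pose, which is the effect the transform is meant to capture.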

2. Feature Queue Structure

2.1 Caches in InstanceBank

The InstanceBank class maintains the following cached state (defined in instance_bank.py):

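# Cached temporal instance features
self.cached_feature: [batch_size, num_temp_instances, embed_dims]
# Cached temporal anchors
self.cached_anchor: [batch_size, num_temp_instances, anchor_dims]
# Confidence of the cached instances
self.confidence: [batch_size, num_temp_instances]
# Temporal mask (used to discard an expired cache). It is computed from the
# per-sample time interval, so it holds one flag per sample.
self.mask: [batch_size]
# Metadata of the cached frame (timestamp, transformation matrices, etc.)
self.metas: dict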

2.2 Feature Queue Workflow

Step 1: Fetch instances (`get` method)

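def get(self, batch_size, metas=None, dn_metas=None):
    # Simplified view of InstanceBank.get; denoising (dn_metas) handling is omitted.
    # 1. Initialize the instances of the current frame
    instance_feature = ...  # shape [batch_size, num_anchor, embed_dims]
    anchor = ...            # shape [batch_size, num_anchor, anchor_dims]

    # 2. If temporal instances are cached from the previous frame
    if self.cached_anchor is not None:
        # Time elapsed since the cached frame, one value per sample
        time_interval = metas["timestamp"] - self.metas["timestamp"]

        # 3. Coordinate transform: previous frame -> global -> current frame
        #    (T_global_inv belongs to the current frame, T_global_prev to the cached one)
        T_temp2cur = T_global_inv @ T_global_prev
        # anchor_projection takes a list of transforms and returns one result per transform
        self.cached_anchor = anchor_projection(
            self.cached_anchor,
            [T_temp2cur],
            time_intervals=[-time_interval],
        )[0]

        # 4. Temporal mask: True for samples whose cached frame is recent enough
        self.mask = abs(time_interval) <= max_time_interval
    else:
        # Nothing cached yet (first frame): fall back to a default interval
        time_interval = default_time_interval

    # 5. Return the current instances and the temporal instances
    return (
        instance_feature,      # instance features of the current frame
        anchor,                # anchors of the current frame
        self.cached_feature,   # temporal instance features (may be None)
        self.cached_anchor,    # temporal anchors (may be None)
        time_interval          # time interval
    )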
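
The temporal mask in step 4 can be tried in isolation. The sketch below is a NumPy approximation, not the repository code: the function name `temporal_mask` is hypothetical, and the fallback value of 0.5 s for expired samples is an assumption chosen for illustration.

```python
import numpy as np

MAX_TIME_INTERVAL = 2.0      # seconds, as in the configuration in section 4
DEFAULT_TIME_INTERVAL = 0.5  # assumed fallback value, for illustration only


def temporal_mask(cur_timestamp, cached_timestamp):
    """Return (mask, time_interval) for a batch of samples.

    mask[i] is True when the cached frame of sample i is recent enough to be
    reused; otherwise its interval is replaced by the fallback value.
    """
    cur = np.asarray(cur_timestamp, dtype=np.float64)
    cached = np.asarray(cached_timestamp, dtype=np.float64)
    time_interval = cur - cached
    mask = np.abs(time_interval) <= MAX_TIME_INTERVAL
    time_interval = np.where(mask, time_interval, DEFAULT_TIME_INTERVAL)
    return mask, time_interval


# Sample 0 continues a sequence (0.5 s gap).
# Sample 1 jumped to a different scene (30 s gap).
mask, time_interval = temporal_mask([100.5, 230.0], [100.0, 200.0])
print(mask, time_interval)
```

The mask is per sample rather than per instance: when a sample switches to a new sequence, its whole cache is considered stale at once.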

Step 2: Update instances (`update` method)

After the single-frame decoder stage, the instance set is updated:

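def update(self, instance_feature, anchor, confidence):
    # Simplified view of InstanceBank.update.
    # Nothing cached (first frame): keep the current instances unchanged
    if self.cached_feature is None:
        return instance_feature, anchor

    # 1. Select the top-N most confident instances of the current frame
    N = self.num_anchor - self.num_temp_instances  # e.g. 900 - 600 = 300
    confidence = confidence.max(dim=-1).values     # best class score per instance
    _, (selected_feature, selected_anchor) = topk(
        confidence, N, instance_feature, anchor
    )

    # 2. Concatenate the cached temporal instances with the selected ones
    selected_feature = torch.cat([self.cached_feature, selected_feature], dim=1)
    selected_anchor = torch.cat([self.cached_anchor, selected_anchor], dim=1)

    # 3. Per sample, use the merged set if the cache is still valid,
    #    otherwise keep the freshly initialized instances
    instance_feature = torch.where(
        self.mask[:, None, None], selected_feature, instance_feature
    )
    anchor = torch.where(self.mask[:, None, None], selected_anchor, anchor)

    return instance_feature, anchor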
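
The selection and merge can be reproduced on toy data. The following is a minimal NumPy sketch under stated assumptions: `topk` here is a re-implementation written for this example (not the repository helper), the sizes are shrunk to 6 anchors and 4 cached instances, and only features are merged (anchors would be handled the same way).

```python
import numpy as np


def topk(confidence, k, *inputs):
    """Pick the k highest-confidence rows per sample from each input.

    confidence: [batch, n]; each input: [batch, n, dim].
    """
    indices = np.argsort(-confidence, axis=1)[:, :k]             # [batch, k]
    top_conf = np.take_along_axis(confidence, indices, axis=1)
    outputs = [np.take_along_axis(x, indices[:, :, None], axis=1) for x in inputs]
    return top_conf, outputs


batch, num_anchor, num_temp, dim = 2, 6, 4, 3
rng = np.random.default_rng(0)
feature = rng.normal(size=(batch, num_anchor, dim))   # current-frame instances
cached = rng.normal(size=(batch, num_temp, dim))      # propagated instances
confidence = rng.random((batch, num_anchor))
mask = np.array([True, False])                        # sample 1 has a stale cache

# Keep only as many current instances as there are free slots.
n_new = num_anchor - num_temp
_, (selected,) = topk(confidence, n_new, feature)
merged = np.concatenate([cached, selected], axis=1)   # [batch, num_anchor, dim]

# Per sample: merged set if the cache is valid, untouched current set otherwise.
out = np.where(mask[:, None, None], merged, feature)
print(out.shape)
```

Placing the cached instances first means that the first `num_temp` slots of the merged set always belong to propagated instances, which is what lets a later step address them by slicing.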

Step 3: Cache instances (`cache` method)

Once decoding is finished, the instances of the current frame are cached for the next frame:

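def cache(self, instance_feature, anchor, confidence, metas=None):
    # Simplified view of InstanceBank.cache.
    if self.num_temp_instances <= 0:
        return

    # 1. Per-instance confidence: best class logit passed through a sigmoid
    confidence = confidence.max(dim=-1).values.sigmoid()

    # 2. Confidence decay for instances that were already cached:
    #    keep the larger of the decayed old score and the new score
    if self.confidence is not None:
        confidence[:, :self.num_temp_instances] = torch.maximum(
            self.confidence * self.confidence_decay,  # e.g. decay = 0.6
            confidence[:, :self.num_temp_instances],
        )

    # 3. Keep the top-K instances (K = num_temp_instances) in the cache
    self.confidence, (self.cached_feature, self.cached_anchor) = topk(
        confidence, self.num_temp_instances, instance_feature, anchor
    )

    # 4. Save the metadata
    self.metas = metas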
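
A small numeric example may help with the decay rule in step 2. This is a NumPy sketch with made-up scores and a queue of only two slots; the values are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np

CONFIDENCE_DECAY = 0.6   # as in the configuration in section 4
NUM_TEMP = 2             # tiny queue so the result is easy to follow

# One sample, four instances. The first NUM_TEMP slots hold instances
# that were already cached in the previous frame.
prev_confidence = np.array([[0.9, 0.5]])        # scores stored in the previous frame
confidence = np.array([[0.3, 0.6, 0.8, 0.1]])   # scores predicted in the current frame

# Decay: a cached instance keeps max(old_score * decay, new_score).
confidence[:, :NUM_TEMP] = np.maximum(
    prev_confidence * CONFIDENCE_DECAY, confidence[:, :NUM_TEMP]
)
# confidence is now approximately [[0.54, 0.6, 0.8, 0.1]]

# Keep the NUM_TEMP best instances for the next frame.
keep = np.argsort(-confidence, axis=1)[:, :NUM_TEMP]
new_confidence = np.take_along_axis(confidence, keep, axis=1)
print(keep, new_confidence)
```

Instance 0 was scored 0.9 in the previous frame but only 0.3 now. With the decay rule it is ranked at about 0.54 instead of 0.3, so an object that is briefly occluded loses rank gradually rather than all at once.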

3. Temporal Fusion Mechanism

3.1 Temporal GNN Operation

In the decoder, the temp_gnn operation performs the temporal fusion:

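# In the forward method of sparse4d_head.py
elif op == "temp_gnn":
    instance_feature = self.graph_model(
        i,
        instance_feature,           # query: instances of the current frame
        temp_instance_feature,      # key: temporal instances
        temp_instance_feature,      # value: temporal instances
        query_pos=anchor_embed,      # anchor embedding of the current frame
        key_pos=temp_anchor_embed,   # anchor embedding of the temporal instances
        # the attention mask is only passed when there are no temporal instances
        attn_mask=attn_mask if temp_instance_feature is None else None,
    )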

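This is a multi-head attention operation in which the instances of the current frame (queries) attend to the temporal instances (keys and values).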
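
The attention pattern can be sketched without any framework. The version below is deliberately reduced: a single head, no learned projections, and positional embeddings simply added to queries and keys, which is one common choice and an assumption here (the repository may combine them differently). The function name `cross_attention` is hypothetical.

```python
import numpy as np


def cross_attention(query, key, value, query_pos, key_pos):
    """Single-head scaled dot-product attention without learned projections."""
    q = query + query_pos
    k = key + key_pos
    # [batch, n_query, n_key]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ value, weights


rng = np.random.default_rng(0)
batch, n_cur, n_temp, dim = 1, 900, 600, 256
instance_feature = rng.normal(size=(batch, n_cur, dim))
temp_instance_feature = rng.normal(size=(batch, n_temp, dim))
anchor_embed = rng.normal(size=(batch, n_cur, dim))
temp_anchor_embed = rng.normal(size=(batch, n_temp, dim))

fused, weights = cross_attention(
    instance_feature, temp_instance_feature, temp_instance_feature,
    anchor_embed, temp_anchor_embed,
)
print(fused.shape, weights.shape)   # (1, 900, 256) (1, 900, 600)
```

Each of the 900 current instances receives a weighted mixture of the 600 temporal instances, so the output keeps the query shape while the weight matrix has one row per current instance and one column per temporal instance.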

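3.2 Coordinate Transformation

The anchors of the temporal instances must be transformed from the previous frame's coordinate system into the current frame's coordinate system: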

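# Compose the transform: previous frame -> global -> current frame
T_temp2cur = T_global_inv @ T_global_prev

# Project the anchors, compensating object motion over the time interval.
# anchor_projection returns one result per transform in the list.
cached_anchor = anchor_projection(
    cached_anchor,
    [T_temp2cur],
    time_intervals=[-time_interval],
)[0]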
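
What such a projection has to do can be sketched for box centers and velocities alone. This is a simplified NumPy illustration under a constant-velocity assumption, not the repository's `anchor_projection`: the function name is hypothetical, box size and yaw are left out, and the sketch takes the elapsed time `dt` directly instead of the negated interval passed in the call above.

```python
import numpy as np


def project_center_and_velocity(center, velocity, T_temp2cur, dt):
    """Move cached boxes into the current frame (simplified sketch).

    center, velocity: [n, 3] in the previous frame's coordinates.
    T_temp2cur:       4x4 transform, previous frame -> current frame.
    dt:               seconds elapsed since the previous frame.
    """
    R, t = T_temp2cur[:3, :3], T_temp2cur[:3, 3]
    moved = center + velocity * dt   # constant-velocity motion compensation
    # Rotate and translate the centers; velocities are only rotated.
    return moved @ R.T + t, velocity @ R.T


# The ego vehicle drove 5 m forward in 0.5 s, so the previous origin
# sits at x = -5 in the current frame.
T_temp2cur = np.eye(4)
T_temp2cur[0, 3] = -5.0

center = np.array([[10.0, 0.0, 0.0], [20.0, 2.0, 0.0]])
velocity = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])  # static and 10 m/s objects

new_center, new_velocity = project_center_and_velocity(
    center, velocity, T_temp2cur, dt=0.5
)
print(new_center)
```

The static object moves 5 m closer purely because of ego motion, while the object travelling at 10 m/s keeps its distance: its own 5 m of travel cancels the ego displacement.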

4. Configuration Parameters

In the configuration file (sparse4dv3_temporal_r50_1x8_bs6_256x704.py):

instance_bank=dict(
    type="InstanceBank",
    num_anchor=900,                    # total number of instances
    embed_dims=256,                    # feature dimension
    anchor="nuscenes_kmeans900.npy",   # anchor initialization file
    anchor_handler=dict(type="SparseBox3DKeyPointsGenerator"),
    num_temp_instances=600,            # number of temporal instances (feature queue size)
    confidence_decay=0.6,              # confidence decay factor
    feat_grad=False,                   # whether gradients are computed for the instance features
    max_time_interval=2.0,             # maximum time interval (seconds)
)

5. Data Flow Example

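Frame t-1:
  └─> cache: [1, 600, 256] features + [1, 600, 10] anchors
      └─> confidence: [1, 600]

Frame t:
  ├─> Current-frame images: [1, 6, 3, 256, 704]
  ├─> New instances: [1, 900, 256] features + [1, 900, 10] anchors
  ├─> Temporal instances: [1, 600, 256] features + [1, 600, 10] anchors (propagated from t-1)
  │
  ├─> Coordinate transform: move the temporal anchors into the current frame's coordinate system
  ├─> Temporal mask: discard the cache when it is too old
  │
  ├─> Temporal GNN: current instances attend to temporal instances
  ├─> Decoder processing...
  │
  └─> Update cache: keep the top-600 instances for the next frame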
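
The shape bookkeeping across frames can be followed with a toy loop. This is a heavily reduced NumPy sketch: random arrays stand in for learned features and classifier outputs, anchors, the temporal mask, and confidence decay are left out, and the helper names and the class count of 10 are assumptions for illustration.

```python
import numpy as np

NUM_ANCHOR, NUM_TEMP, DIM, NUM_CLS = 900, 600, 256, 10
rng = np.random.default_rng(0)


def topk_indices(score, k):
    """Indices of the k largest scores per sample, shape [batch, k]."""
    return np.argsort(-score, axis=1)[:, :k]


def gather(x, idx):
    """Select rows idx ([batch, k]) from x ([batch, n, dim])."""
    return np.take_along_axis(x, idx[:, :, None], axis=1)


cached_feature = None   # the queue is empty before the first frame
for frame in range(3):
    # get: fresh instances for this frame (random stand-ins for learned ones)
    feature = rng.normal(size=(1, NUM_ANCHOR, DIM))

    # update: after the single-frame stage, merge the queue with the best new instances
    if cached_feature is not None:
        stage_score = rng.random((1, NUM_ANCHOR))
        best_new = gather(feature, topk_indices(stage_score, NUM_ANCHOR - NUM_TEMP))
        feature = np.concatenate([cached_feature, best_new], axis=1)

    # (the remaining decoder layers would refine `feature` here)

    # cache: keep the NUM_TEMP most confident instances for the next frame
    cls_logits = rng.normal(size=(1, NUM_ANCHOR, NUM_CLS))
    confidence = cls_logits.max(axis=-1)
    cached_feature = gather(feature, topk_indices(confidence, NUM_TEMP))
    print(f"frame {frame}: decoded {feature.shape}, cached {cached_feature.shape}")
```

The number of decoded instances stays at 900 in every frame: from the second frame on, 600 slots come from the queue and 300 from the current frame.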

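6. Key Features

Adaptive temporal mask: discards cached instances when the time gap to the cached frame is too large
Confidence decay: scores of cached instances are multiplied by a decay factor (default 0.6)
Coordinate transformation: temporal instances are moved into the current frame's coordinate system automatically
Top-K selection: only high-confidence instances are kept for temporal fusion
Configurable size: num_temp_instances sets the size of the feature queue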

7. Code Locations

InstanceBank class: projects/mmdet3d_plugin/models/instance_bank.py
Sparse4DHead class: projects/mmdet3d_plugin/models/sparse4d_head.py
Configuration file: projects/configs/sparse4dv3_temporal_r50_1x8_bs6_256x704.py
Tutorial: tutorial/tutorial.ipynb
