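BEVFusion core code walkthrough

The core code can be divided into the following three modules:

encoder: encodes each modality separately (camera + radar)

fuser: cross-modal fusion

decoder backbone + neck: further processes the fused result into the features that are finally passed to the head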
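The three stages above can be sketched as a toy pipeline. This is only an illustration of the data flow, not the BEVFusion implementation: the class name TinyFusionPipeline, the single-convolution placeholder encoders and decoder, and the dummy input channel counts (3 and 1) are all assumptions made for the example.

```python
from typing import Dict, List

import torch
from torch import nn


class TinyFusionPipeline(nn.Module):
    """Hypothetical stand-in for the encoder -> fuser -> decoder flow."""

    def __init__(self, bev_channels: int = 64) -> None:
        super().__init__()
        # Placeholder encoders (assumption): the real ones turn raw sensor
        # data into BEV features; here each is one conv over a BEV-shaped input.
        self.encoders = nn.ModuleDict({
            "camera": nn.Conv2d(3, bev_channels, 3, padding=1),
            "radar": nn.Conv2d(1, bev_channels, 3, padding=1),
        })
        # Placeholder fuser: mixes the channel-concatenated features.
        self.fuser = nn.Conv2d(2 * bev_channels, bev_channels, 3, padding=1)
        # Placeholder decoder: backbone + neck collapsed into a single conv.
        self.decoder = nn.Conv2d(bev_channels, bev_channels, 3, padding=1)

    def forward(self, inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
        # 1) encode each modality on its own
        features: List[torch.Tensor] = [
            self.encoders[name](inputs[name]) for name in ("camera", "radar")
        ]
        # 2) fuse across modalities
        x = self.fuser(torch.cat(features, dim=1))
        # 3) decode into the features handed to the head
        return self.decoder(x)


pipeline = TinyFusionPipeline()
dummy = {
    "camera": torch.randn(2, 3, 32, 32),
    "radar": torch.randn(2, 1, 32, 32),
}
out = pipeline(dummy)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Each stage only sees the output of the previous one, which is why the fuser can be swapped out without touching the encoders or the decoder.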
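encoder: encodes each modality separately (camera + radar)

camera encoding:

radar encoding:
fuser: cross-modal fusion

Multimodal fusion

x = self.fuser(features) fuses the camera BEV feature and the radar BEV feature in the features list into a single, unified BEV feature map.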

class ConvFuser(nn.Sequential):

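Inputs:

features: a list of length 2

features[0]: camera BEV, shape (B, 64, H, W)

features[1]: radar BEV, shape (B, 64, H, W)

Output:

x: shape (B, 64, H, W)

Implementation:

step1: concatenate along the channel dimension, z = cat(features, dim=1), shape (B, 128, H, W)

step2: apply a 3x3 convolution (128 -> 64), followed by BN and ReLU, as shown in the code below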

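from typing import List

import torch
from torch import nn


class ConvFuser(nn.Sequential):
    def __init__(self, in_channels: List[int], out_channels: int) -> None:
        # in_channels holds one channel count per modality, e.g. [64, 64].
        self.in_channels = in_channels
        self.out_channels = out_channels
        super().__init__(
            # 3x3 conv over the concatenated channels; padding=1 keeps H and W.
            nn.Conv2d(sum(in_channels), out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(True),
        )

    def forward(self, inputs: List[torch.Tensor]) -> torch.Tensor:
        # Concatenate the per-modality BEV maps along the channel axis, then fuse.
        return super().forward(torch.cat(inputs, dim=1))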
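To check the shapes described in step1 and step2, the same operations can be run by hand on random tensors. This is a minimal sketch: the batch size and spatial size (B=2, H=W=16) are arbitrary values chosen for illustration, and the layers are built directly from torch.nn rather than through ConvFuser so that the snippet runs on its own.

```python
import torch
from torch import nn

B, H, W = 2, 16, 16  # illustrative sizes, not taken from any real config

camera_bev = torch.randn(B, 64, H, W)
radar_bev = torch.randn(B, 64, H, W)

# step1: concatenate along the channel axis -> 64 + 64 = 128 channels
z = torch.cat([camera_bev, radar_bev], dim=1)
print(z.shape)  # torch.Size([2, 128, 16, 16])

# step2: 3x3 conv (128 -> 64), then BN and ReLU; padding=1 preserves H and W
fuse = nn.Sequential(
    nn.Conv2d(128, 64, 3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(True),
)
x = fuse(z)
print(x.shape)  # torch.Size([2, 64, 16, 16])
```

Because the last layer is a ReLU, every value in x is non-negative, and because the convolution uses padding=1 the fused map keeps the same H and W as the inputs.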

decoder backbone + neck: further processes the fused result into the features that are finally passed to the head