[2025CVPR] BiDA: A Cross-Domain Hyperspectral Image Classification Model Based on Bi-directional Domain Adaptation

Fig. 1. The main framework of BiDA is composed of a triple-branch transformer with a semantic tokenizer (source branch/blue, target branch/red, and coupled branch/purple). Multi-head Self-attention (MSA) is used in the source branch and the target branch, and Coupled Multi-head Cross-attention (CMCA) is designed for the coupled branch.

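2. Semantic Tokenizer

A spatial-spectral projection mechanism is designed to generate semantic tokens:

Here M(⋅) is a convolutional layer and W is its learnable kernel; the output is a compact set of L semantic tokens.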
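For orientation, tokenizers of this kind are often written in the form below. This is a sketch under stated assumptions, not the paper's equation: the symbols X (flattened patch features with HW positions and C channels), A (attention maps), and T (tokens) are introduced here for illustration, and the softmax is assumed to normalize over spatial positions.

```latex
X \in \mathbb{R}^{HW \times C}, \qquad
A = \mathrm{softmax}\big(M(X; W)\big) \in \mathbb{R}^{HW \times L}, \qquad
T = A^{\top} X \in \mathbb{R}^{L \times C}
```

Under this reading, each of the L tokens is a weighted average of the pixel features, with the weights given by one attention map.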

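3. Bi-directional Distillation Loss

The coupled branch generates soft labels that guide training in both the source and target domains:

Here T~s→t and T~t→s are the coupled representations produced by cross-domain attention.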
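The idea of soft labels from the coupled branch guiding the two domain branches can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's formulation: the function name, the temperature `tau`, the choice of KL divergence, and the pairing of each branch with a coupled output are all assumptions.

```python
import torch
import torch.nn.functional as F


def bidirectional_distillation_loss(src_logits, tgt_logits,
                                    coupled_s2t_logits, coupled_t2s_logits,
                                    tau=1.0):
    """KL divergence between each branch and the coupled branch's soft labels.

    All argument names and the temperature `tau` are hypothetical.
    """
    def kd(student_logits, teacher_logits):
        # Soft labels come from the coupled branch and receive no gradient.
        soft = F.softmax(teacher_logits.detach() / tau, dim=-1)
        log_p = F.log_softmax(student_logits / tau, dim=-1)
        return F.kl_div(log_p, soft, reduction="batchmean") * tau ** 2

    # Assumed pairing: the source branch is guided by the t->s coupled output,
    # the target branch by the s->t coupled output.
    return kd(src_logits, coupled_t2s_logits) + kd(tgt_logits, coupled_s2t_logits)
```

The loss is zero when each branch already matches its soft labels and grows as the predicted distributions diverge.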

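4. Adaptive Reinforcement Strategy (ARS)

ARS improves in-domain generalization under noisy conditions:

Random rotation, cropping, and Gaussian noise injection force the model to learn robust features.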
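The three perturbations named above can be sketched as a single augmentation function. This is an illustrative sketch only: `augment_patch`, `noise_std`, and `crop` are hypothetical names and values, rotation is restricted to multiples of 90 degrees, cropping is implemented as keeping a random window and zeroing the rest, and square patches are assumed.

```python
import torch


def augment_patch(x, noise_std=0.01, crop=11):
    """Randomly rotate, crop, and add Gaussian noise to square patches.

    x: (B, C, H, W) batch of hyperspectral patches with H == W.
    `noise_std` and `crop` are hypothetical values, not paper settings.
    """
    b, c, h, w = x.shape
    # Random rotation by a multiple of 90 degrees in the spatial plane.
    k = int(torch.randint(0, 4, (1,)))
    x = torch.rot90(x, k, dims=(2, 3))
    # Keep a random crop window and zero everything outside it,
    # so the output keeps the original spatial size.
    top = int(torch.randint(0, h - crop + 1, (1,)))
    left = int(torch.randint(0, w - crop + 1, (1,)))
    out = torch.zeros_like(x)
    out[:, :, top:top + crop, left:left + crop] = \
        x[:, :, top:top + crop, left:left + crop]
    # Additive Gaussian noise.
    return out + noise_std * torch.randn_like(out)
```

Applying it to a batch of 13×13 patches returns a tensor of the same shape, so the augmented view can be fed through the same network as the original.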

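Model Architecture in Detail

1. Semantic Tokenizer

Input: a 13×13×d hyperspectral image patch.
Pipeline:
1. A 3D convolution extracts spatial-spectral features (13×13×L).
2. Softmax generates the spatial attention maps.
3. A 1×1 convolution maps the result to L semantic tokens.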

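2. Triple-Branch Encoder

Source/target branches: multi-head self-attention (MSA) captures intra-domain correlations.
Coupled branch: coupled multi-head cross-attention (CMCA) performs bi-directional feature alignment: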
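A rough picture of bi-directional cross-attention between the two token sets, built on PyTorch's `nn.MultiheadAttention`. This is a sketch under assumptions rather than the paper's CMCA: the class name, sharing one attention module for both directions, and the choice of which domain supplies the queries in each direction are all illustrative.

```python
import torch
import torch.nn as nn


class CoupledCrossAttention(nn.Module):
    """Bi-directional cross-attention between source and target tokens (sketch)."""

    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        # Sharing one attention module across both directions is an assumption.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, src_tokens, tgt_tokens):
        # Source-to-target: source queries attend to target keys/values.
        t_s2t, _ = self.attn(src_tokens, tgt_tokens, tgt_tokens)
        # Target-to-source: target queries attend to source keys/values.
        t_t2s, _ = self.attn(tgt_tokens, src_tokens, src_tokens)
        return t_s2t, t_t2s
```

Both outputs have the same shape as the token sequence that supplied the queries, so they can be passed on to a classifier head in the same way as the self-attention outputs.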

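3. Loss Function

The total loss combines a classification loss, an MMD distribution-alignment loss, the bi-directional distillation loss, and a consistency constraint: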
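Of these terms, the MMD alignment loss is the easiest to illustrate in isolation. The sketch below uses a biased estimator with a single Gaussian kernel; the function name, the bandwidth `sigma`, and the single-kernel choice are assumptions, and the paper may use a different kernel or estimator.

```python
import torch


def gaussian_mmd(x, y, sigma=1.0):
    """Biased squared-MMD estimate with one Gaussian kernel.

    x: (N, D) source features, y: (M, D) target features.
    `sigma` is a hypothetical bandwidth.
    """
    def kernel(a, b):
        # Gaussian kernel on pairwise Euclidean distances.
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```

The estimate is zero when both feature sets are identical and increases as the two distributions move apart, which is why minimizing it pulls the source and target features together.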

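Experimental Validation

Datasets and Parameters

MFF cross-temporal airborne dataset: 5 tree species, a 14-day time span, 9.6 nm spectral resolution.
Houston cross-temporal airborne dataset: 7 land-cover classes, data from 2013 and 2018.
HyRANK cross-scene satellite dataset: 12 land-cover classes, acquired by the Hyperion sensor.

Parameter settings: λ1 = 1e−1, λ2 = 1e+0, number of tokens L = 10.

Comparison Results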

Method	MFF-TD1 OA(%)	MFF-TD2 OA(%)	Houston 2018 OA(%)	Loukia OA(%)
GAHT	67.38	68.61	72.15	60.31
MSDA	68.01	72.41	79.41	63.61
BiDA	77.40	75.08	81.11	68.89

Fig. 7. Visualization and classification maps for the target scene MFF TD1 obtained with different methods including: (a) GAHT (67.38%), (b) MLUDA (72.80%), (c) MSDA (68.01%), (d) TSTnet (68.21%), (e) MDGTnet (65.24%), (f) CLDA (68.26%), (g) SCLUDA (62.72%), (h) SSWADA (56.39%), (i) CACL (66.86%), (j) BiDA (77.40%).

Code Implementation Skeleton

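```python
import torch
import torch.nn as nn


class SemanticTokenizer(nn.Module):
    """Turns a hyperspectral patch into L semantic tokens (simplified sketch)."""

    def __init__(self, in_channels, L=10, dim=64):
        super().__init__()
        # 3D convolution over (band, height, width); padding keeps all sizes.
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # 1x1 convolutions: feature embedding and L spatial attention maps.
        self.embed = nn.Conv2d(8 * in_channels, dim, kernel_size=1)
        self.attn = nn.Conv2d(dim, L, kernel_size=1)

    def forward(self, x):
        # x: (B, in_channels, H, W), e.g. a 13x13 patch with d bands.
        b, d, h, w = x.shape
        feat = self.conv3d(x.unsqueeze(1))                  # (B, 8, d, H, W)
        feat = self.embed(feat.reshape(b, -1, h, w))        # (B, dim, H, W)
        # Softmax over spatial positions gives one attention map per token.
        attn = self.attn(feat).flatten(2).softmax(dim=-1)   # (B, L, H*W)
        return attn @ feat.flatten(2).transpose(1, 2)       # (B, L, dim)


class BiDA(nn.Module):
    """Simplified skeleton: one shared encoder over concatenated tokens.

    The three-branch design, CMCA, and the training losses are not
    reproduced here.
    """

    def __init__(self, in_channels=176, num_classes=12, L=10, dim=64):
        super().__init__()
        self.tokenizer = SemanticTokenizer(in_channels, L, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=12)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x_src, x_tgt):
        src_tokens = self.tokenizer(x_src)
        tgt_tokens = self.tokenizer(x_tgt)
        # Concatenate along the token axis so attention spans both domains.
        fused_tokens = torch.cat([src_tokens, tgt_tokens], dim=1)  # (B, 2L, dim)
        output = self.encoder(fused_tokens).mean(dim=1)            # pool tokens
        return self.classifier(output)                             # (B, num_classes)
```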

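Summary and Outlook

BiDA effectively mitigates cross-domain spectral shift in hyperspectral images through bi-directional feature alignment and adaptive spatial learning. Future work could extend it to multimodal data (such as SAR and optical image fusion) and to real-time processing scenarios.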

Fig. 10. Visualization and classification maps for the target scene Houston 2018 obtained with different methods including: (a) Ground truth map, (b) GAHT (72.15%), (c) MLUDA (78.97%), (d) MSDA (79.41%), (e) MDGTnet (76.57%), (f) CLDA (74.0%), (g) SCLUDA (78.61%), (h) SSWADA (75.29%), (i) CACL (79.10%), (j) BiDA (81.11%).

We hope this walkthrough helps readers understand the core ideas behind BiDA and apply the framework flexibly in their own projects.

Paper: Cross-domain Hyperspectral Image Classification based on Bi-directional Domain Adaptation