Daily Attention Study 15: Cross-Model Grafting Module

##### Module Source

\[CVPR 22\] [\[link\]](https://openaccess.thecvf.com/content/CVPR2022/html/Xie_Pyramid_Grafting_Network_for_One-Stage_High_Resolution_Saliency_Detection_CVPR_2022_paper.html) [\[code\]](https://github.com/iCVTEAM/PGNet) Pyramid Grafting Network for One-Stage High Resolution Saliency Detection

***

##### Module Name

Cross-Model Grafting Module (CMGM)

***

##### Module Purpose

Feature fusion between a Transformer and a CNN

***

##### Module Structure

![CMGM structure diagram](https://i-blog.csdnimg.cn/direct/e889d03727a149fd952730a5de5ae405.jpeg)

***

##### Module Idea

Transformers are stronger on global features, while CNNs are stronger on local features. The simplest way to fuse the two is to add or multiply them directly. However, addition and multiplication are essentially "local" operations: each output position depends only on the two features at that same position. If both features are highly uncertain in some region, the fused result carries a lot of noise there. To address this, the paper proposes the CMGM module, which uses cross attention to draw on information from a much wider range of positions and thereby improve the fusion.

***

##### Module Code

```python
import torch
import torch.nn as nn


class CMGM(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=True, qk_scale=None):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5

        # Keys are projected from y; queries and values are both projected from x.
        self.k = nn.Linear(dim, dim, bias=qkv_bias)
        self.qv = nn.Linear(dim, dim * 2, bias=qkv_bias)
        self.proj = nn.Linear(dim, dim)
        self.act = nn.ReLU(inplace=True)

        # Layer sizes follow dim / num_heads (64 and 8 in the demo below).
        self.conv = nn.Conv2d(num_heads, num_heads, kernel_size=3, stride=1, padding=1)
        self.lnx = nn.LayerNorm(dim)
        self.lny = nn.LayerNorm(dim)
        self.bn = nn.BatchNorm2d(num_heads)
        self.conv2 = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, y):
        batch_size, channel = x.shape[0], x.shape[1]
        sc = x  # kept only to recover the spatial size (H, W) later

        # Flatten both feature maps into token sequences: (B, C, H, W) -> (B, N, C).
        x = x.reshape(batch_size, channel, -1).permute(0, 2, 1)
        sc1 = x  # residual shortcut, taken before LayerNorm
        x = self.lnx(x)
        y = y.reshape(batch_size, channel, -1).permute(0, 2, 1)
        y = self.lny(y)

        B, N, C = x.shape
        # Split into heads: each of q, k, v has shape (B, num_heads, N, C // num_heads).
        y_k = self.k(y).reshape(B, N, 1, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        x_qv = self.qv(x).reshape(B, N, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        x_q, x_v = x_qv[0], x_qv[1]
        y_k = y_k[0]

        # Cross attention: queries from x attend to keys from y.
        attn = (x_q @ y_k.transpose(-2, -1)) * self.scale  # (B, num_heads, N, N)
        attn = attn.softmax(dim=-1)

        x = (attn @ x_v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        x = x + sc1

        # Back to a feature map: (B, N, C) -> (B, C, H, W), then a residual conv block.
        x = x.permute(0, 2, 1)
        x = x.reshape(batch_size, channel, *sc.size()[2:])
        x = self.conv2(x) + x

        # Second output: the symmetrized attention map after conv + BN + ReLU.
        return x, self.act(self.bn(self.conv(attn + attn.transpose(-1, -2))))


if __name__ == '__main__':
    x = torch.randn([1, 64, 11, 11])
    y = torch.randn([1, 64, 11, 11])
    cmgm = CMGM(dim=64)
    out1, out2 = cmgm(x, y)
    print(out1.shape)  # fused feature: torch.Size([1, 64, 11, 11])
    print(out2.shape)  # cross attention matrix: torch.Size([1, 8, 121, 121])
```

***
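The "local versus global" contrast behind CMGM can be checked with a toy experiment. The following is a minimal sketch in plain Python and is not the paper's implementation: it assumes a single head with no learned projections, LayerNorm, or convolutions, and the helper names (`add_fusion`, `cross_attention_fusion`, `changed_tokens`) and the toy sizes are made up for illustration. As in CMGM's forward pass, queries and values come from `x` and keys come from `y`. One token of `y` is perturbed, and the sketch reports which output tokens change under each fusion scheme.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def add_fusion(x, y):
    # Element-wise addition: output token i depends only on x[i] and y[i].
    return [[a + b for a, b in zip(xi, yi)] for xi, yi in zip(x, y)]


def cross_attention_fusion(x, y):
    # Simplified cross attention (assumption: no learned projections, one head).
    # Queries and values come from x, keys come from y.
    scale = len(x[0]) ** -0.5
    out = []
    for q in x:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in y]
        weights = softmax(logits)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(len(q))])
    return out


def changed_tokens(a, b, tol=1e-9):
    # Indices of tokens whose feature vectors differ between a and b.
    return [i for i, (ra, rb) in enumerate(zip(a, b))
            if any(abs(u - v) > tol for u, v in zip(ra, rb))]


random.seed(0)
n_tokens, dim = 6, 4  # toy sizes chosen for illustration
x = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
y = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

# Perturb a single token of y (index 2) and compare the fused outputs.
y_perturbed = [row[:] for row in y]
y_perturbed[2] = [v + 1.0 for v in y_perturbed[2]]

add_changed = changed_tokens(add_fusion(x, y), add_fusion(x, y_perturbed))
attn_changed = changed_tokens(cross_attention_fusion(x, y),
                              cross_attention_fusion(x, y_perturbed))

print("tokens changed by addition:       ", add_changed)
print("tokens changed by cross attention:", attn_changed)
```

Under addition, only the perturbed position changes, so noise at a position stays at that position and nothing else can correct it. Under cross attention, every query row sees the perturbed key, so the change generally reaches all output tokens. That wider receptive field is what lets the fusion draw on more reliable regions elsewhere in the map.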
