Daily Attention Study 15: Cross-Model Grafting Module

##### Module Source

\[CVPR 22\] [\[link\]](https://openaccess.thecvf.com/content/CVPR2022/html/Xie_Pyramid_Grafting_Network_for_One-Stage_High_Resolution_Saliency_Detection_CVPR_2022_paper.html) [\[code\]](https://github.com/iCVTEAM/PGNet) Pyramid Grafting Network for One-Stage High Resolution Saliency Detection

***

##### Module Name

Cross-Model Grafting Module (CMGM)

***

##### Module Purpose

Feature fusion between a Transformer and a CNN

***

##### Module Structure

![CMGM structure diagram](https://i-blog.csdnimg.cn/direct/e889d03727a149fd952730a5de5ae405.jpeg)

***

##### Module Idea

Transformers are stronger at global features, while CNNs are stronger at local features. The simplest way to fuse the two is to add or multiply them directly. However, addition and multiplication are essentially "local" operations: each output position sees only the two input values at that same position, so if both features are highly uncertain in some region, the fused result carries a lot of noise. To address this, the paper proposes the CMGM, which uses cross-attention to bring in information from a wider range of positions and improve the fusion.

***

##### Module Code

```python
import torch
import torch.nn as nn


class CMGM(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=True, qk_scale=None):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5

        # Keys are projected from y; queries and values are projected from x.
        self.k = nn.Linear(dim, dim, bias=qkv_bias)
        self.qv = nn.Linear(dim, dim * 2, bias=qkv_bias)
        self.proj = nn.Linear(dim, dim)
        self.act = nn.ReLU(inplace=True)
        # The original post hardcodes 64 and 8 in the layers below; dim and
        # num_heads build identical layers for the demo (dim=64, num_heads=8).
        # The attention map has one channel per head.
        self.conv = nn.Conv2d(num_heads, num_heads, kernel_size=3, stride=1, padding=1)
        self.lnx = nn.LayerNorm(dim)
        self.lny = nn.LayerNorm(dim)
        self.bn = nn.BatchNorm2d(num_heads)
        # Convolutional refinement of the fused feature map.
        self.conv2 = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, y):
        # x, y: feature maps of identical shape (B, C, H, W)
        batch_size, channels = x.shape[:2]
        sc = x
        # Flatten the spatial dimensions into tokens: (B, N, C) with N = H * W
        x = x.view(batch_size, channels, -1).permute(0, 2, 1)
        sc1 = x  # shortcut taken before LayerNorm
        x = self.lnx(x)
        y = y.view(batch_size, channels, -1).permute(0, 2, 1)
        y = self.lny(y)

        B, N, C = x.shape
        # Split into heads: (1 or 2, B, num_heads, N, head_dim)
        y_k = self.k(y).reshape(B, N, 1, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        x_qv = self.qv(x).reshape(B, N, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        x_q, x_v = x_qv[0], x_qv[1]
        y_k = y_k[0]

        # Cross-attention: queries from x attend to keys from y -> (B, num_heads, N, N)
        attn = (x_q @ y_k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)

        x = (attn @ x_v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        x = x + sc1  # residual connection on the token sequence
        x = x.permute(0, 2, 1)
        x = x.view(batch_size, channels, *sc.size()[2:])  # back to (B, C, H, W)
        x = self.conv2(x) + x
        # Second output: the symmetrised attention map after conv + BN + ReLU
        return x, self.act(self.bn(self.conv(attn + attn.transpose(-1, -2))))


if __name__ == '__main__':
    x = torch.randn([1, 64, 11, 11])
    y = torch.randn([1, 64, 11, 11])
    cmgm = CMGM(dim=64)
    out1, out2 = cmgm(x, y)
    print(out1.shape)  # fused feature: (1, 64, 11, 11)
    print(out2.shape)  # cross-attention matrix: (1, 8, 121, 121)
```

***
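To make the "local versus global" argument from the module idea concrete, here is a minimal, dependency-free sketch. It is not the CMGM itself: it assumes a single head, no learned projections, no LayerNorm and no convolutions, and the helper names (`add_fusion`, `cross_attention_fusion`, `changed_tokens`) are hypothetical. It keeps only the wiring used in the code above (queries and values from `x`, keys from `y`, plus a residual), then perturbs one token of `x` and reports which output tokens change under each fusion scheme.

```python
import math


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def add_fusion(x, y):
    # Element-wise fusion: output token i depends only on x[i] and y[i].
    return [[a + b for a, b in zip(xi, yi)] for xi, yi in zip(x, y)]


def cross_attention_fusion(x, y):
    # Toy single-head cross-attention (assumption: no learned projections).
    # Queries and values come from x, keys come from y.
    d = len(x[0])
    scale = d ** -0.5
    out = []
    for q in x:
        scores = [scale * sum(qc * kc for qc, kc in zip(q, k)) for k in y]
        weights = softmax(scores)
        mixed = [sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)]
        out.append([m + qc for m, qc in zip(mixed, q)])  # residual connection
    return out


def changed_tokens(a, b, tol=1e-9):
    # Indices of tokens whose values differ between two outputs.
    return [i for i, (ra, rb) in enumerate(zip(a, b))
            if any(abs(p - q) > tol for p, q in zip(ra, rb))]


# Four tokens with two channels each (made-up numbers for illustration).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
y = [[0.2, 0.1], [0.3, -0.4], [-0.1, 0.6], [0.7, 0.2]]

# Perturb only token 0 of x.
x_perturbed = [row[:] for row in x]
x_perturbed[0] = [3.0, -2.0]

print(changed_tokens(add_fusion(x, y), add_fusion(x_perturbed, y)))
# [0]
print(changed_tokens(cross_attention_fusion(x, y), cross_attention_fusion(x_perturbed, y)))
# [0, 1, 2, 3]
```

With addition, the perturbation stays at token 0. With cross-attention, every output token changes, because each one is a softmax-weighted mix over all tokens. That wider mixing is what lets the module draw on other regions when the local evidence is unreliable.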
