Convolution operation and Grouped Convolution

filter is not the kernel,but the kernels.that's mean a filter include one or two or more kernels.that's depend the input feature map and the output feature maps. for example, if we have an image, the shape of image is (32,32), has 3 channels,that's RGB.so the input feature maps is (1,3,32,32).the format of input feature maps is (batch_size,in_channels,H_in,W_in),the output feature maps is(batch_size,out_channels,H_out,W_out),there is a formulation for out_H,out_W.

p is padding,default is 0. s is stride,default is 1.

so, we get the the Height and Width of output feature map,but how about the output channels?how do we get the output channels from the input channels.Or,In other words,what's the convolution operation?

first,i'll give the conclusion and explain it later.

so the weight size is (filters, kernels of filter,H_k,W_k),the format of weight vector is (C_out,C_in,H_k,W_k)

that's mean we have C_out filters, and each filter has C_in kernels.if you don't understand, look through this link,it will tell you the specific operations.

as we go deeper into the convolution this dimension of channels increases very rapidly thus increases complexity. The spatial dimensions(means height and weight) have some degree of effect on the complexity but in deeper layers, they are not really the cause of concern. Thus in bigger neural networks, the filter groups will dominate.so,the grouped convolution was proposed,you can access to this link for more details.

you can try this code for validation.

python 复制代码
import torch.nn as nn
import torch

# 假设输入特征图的大小为 (batch_size, in_channels, H, W)
batch_size = 1
in_channels = 4
out_channels = 2
H = 6
W = 6

# 定义1x1卷积层,输入通道数为in_channels,输出通道数为out_channels
conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)

# 对输入特征图进行1x1卷积操作
x = torch.randn(batch_size, in_channels, H, W)
y = conv(x)

# 输入特征图的大小为 (batch_size, in_channels, H, W)
print(x.shape)  # torch.Size([1, 4, 6, 6])
# 输出特征图的大小为 (batch_size, out_channels, H, W)
print(y.size())   # torch.Size([1, 2, 6, 6])
# 获取卷积核的尺寸 (out_channels, in_channels // groups, *kernel_size)
weight_size = conv.weight.size()
print('卷积核的尺寸为:', weight_size)  # torch.Size([2, 4, 1, 1])
相关推荐
zzc9213 分钟前
MATLAB仿真生成无线通信网络拓扑推理数据集
开发语言·网络·数据库·人工智能·python·深度学习·matlab
编程有点难17 分钟前
Python训练打卡Day43
开发语言·python·深度学习
2301_8050545624 分钟前
Python训练营打卡Day48(2025.6.8)
pytorch·python·深度学习
Lucky-Niu40 分钟前
解决transformers.adapters import AdapterConfig 报错的问题
人工智能·深度学习
保持学习ing1 小时前
Spring注解开发
java·深度学习·spring·框架
春末的南方城市2 小时前
中山大学&美团&港科大提出首个音频驱动多人对话视频生成MultiTalk,输入一个音频和提示,即可生成对应唇部、音频交互视频。
人工智能·python·深度学习·计算机视觉·transformer
dfsj660113 小时前
LLMs 系列科普文(14)
人工智能·深度学习·算法
摘取一颗天上星️3 小时前
深入解析机器学习的心脏:损失函数及其背后的奥秘
人工智能·深度学习·机器学习·损失函数·梯度下降
只有左边一个小酒窝4 小时前
(六)卷积神经网络:深度学习在计算机视觉中的应用
深度学习·计算机视觉·cnn
啥都会一点的研究生5 小时前
仅需一行代码即可提升训练效果!
神经网络