Convolution operation and Grouped Convolution

filter is not the kernel,but the kernels.that's mean a filter include one or two or more kernels.that's depend the input feature map and the output feature maps. for example, if we have an image, the shape of image is (32,32), has 3 channels,that's RGB.so the input feature maps is (1,3,32,32).the format of input feature maps is (batch_size,in_channels,H_in,W_in),the output feature maps is(batch_size,out_channels,H_out,W_out),there is a formulation for out_H,out_W.

p is padding,default is 0. s is stride,default is 1.

so, we get the the Height and Width of output feature map,but how about the output channels?how do we get the output channels from the input channels.Or,In other words,what's the convolution operation?

first,i'll give the conclusion and explain it later.

so the weight size is (filters, kernels of filter,H_k,W_k),the format of weight vector is (C_out,C_in,H_k,W_k)

that's mean we have C_out filters, and each filter has C_in kernels.if you don't understand, look through this link,it will tell you the specific operations.

as we go deeper into the convolution this dimension of channels increases very rapidly thus increases complexity. The spatial dimensions(means height and weight) have some degree of effect on the complexity but in deeper layers, they are not really the cause of concern. Thus in bigger neural networks, the filter groups will dominate.so,the grouped convolution was proposed,you can access to this link for more details.

you can try this code for validation.

python 复制代码
import torch.nn as nn
import torch

# 假设输入特征图的大小为 (batch_size, in_channels, H, W)
batch_size = 1
in_channels = 4
out_channels = 2
H = 6
W = 6

# 定义1x1卷积层,输入通道数为in_channels,输出通道数为out_channels
conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)

# 对输入特征图进行1x1卷积操作
x = torch.randn(batch_size, in_channels, H, W)
y = conv(x)

# 输入特征图的大小为 (batch_size, in_channels, H, W)
print(x.shape)  # torch.Size([1, 4, 6, 6])
# 输出特征图的大小为 (batch_size, out_channels, H, W)
print(y.size())   # torch.Size([1, 2, 6, 6])
# 获取卷积核的尺寸 (out_channels, in_channels // groups, *kernel_size)
weight_size = conv.weight.size()
print('卷积核的尺寸为:', weight_size)  # torch.Size([2, 4, 1, 1])
相关推荐
高建伟-joe11 分钟前
内容安全:使用开源框架Caffe实现上传图片进行敏感内容识别
人工智能·python·深度学习·flask·开源·html5·caffe
tyatyatya26 分钟前
MATLAB的神经网络工具箱
开发语言·神经网络·matlab
卡尔曼的BD SLAMer1 小时前
计算机视觉与深度学习 | Python实现EMD-SSA-VMD-LSTM-Attention时间序列预测(完整源码和数据)
python·深度学习·算法·cnn·lstm
pk_xz1234562 小时前
实现了一个结合Transformer和双向LSTM(BiLSTM)的时间序列预测模型,用于预测温度值(T0),并包含了物理约束的损失函数来增强模型的物理合理性
深度学习·lstm·transformer
落樱弥城2 小时前
角点特征:从传统算法到深度学习算法演进
人工智能·深度学习·算法
墨绿色的摆渡人4 小时前
pytorch小记(二十二):全面解读 PyTorch 的 `torch.cumprod`——累积乘积详解与实战示例
人工智能·pytorch·python
大模型铲屎官6 小时前
【Python-Day 14】玩转Python字典(上篇):从零开始学习创建、访问与操作
开发语言·人工智能·pytorch·python·深度学习·大模型·字典
一点.点6 小时前
计算机视觉的简单介绍
人工智能·深度学习·计算机视觉
Stara05117 小时前
基于多头自注意力机制(MHSA)增强的YOLOv11主干网络—面向高精度目标检测的结构创新与性能优化
人工智能·python·深度学习·神经网络·目标检测·计算机视觉·yolov11
kyle~7 小时前
深度学习---知识蒸馏(Knowledge Distillation, KD)
人工智能·深度学习