PyTorch Operators Explained in Plain Language, Part 1

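| Parameter | Purpose |
|---|---|
| `in_channels` | Number of channels in the input feature map (e.g. 3 for an RGB image). |
| `out_channels` | Number of channels in the output feature map, i.e. the number of convolution kernels. |
| `kernel_size` | Size of the convolution kernel (e.g. `3` for `3x3`, `5` for `5x5`). |
| `stride` | Step size with which the kernel slides over the input (default 1). |
| `padding` | Amount of zero padding added to each side of the input (e.g. `0`, `1`, or `'same'`, which only works with stride 1; default 0). |
| `dilation` | Spacing between kernel elements (dilation factor); values above 1 enlarge the receptive field without adding parameters (default 1). |
| `groups` | Number of groups for grouped convolution (default 1); `in_channels` and `out_channels` must both be divisible by it. |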
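The sketch below shows how these parameters affect output shapes. It assumes PyTorch 1.9 or later, because `padding='same'` requires it. The helper `conv2d_out_size` is a hypothetical name. It applies the per-dimension output-size formula given in the `nn.Conv2d` documentation.

```python
import torch
import torch.nn as nn

def conv2d_out_size(h, kernel_size, stride=1, padding=0, dilation=1):
    # Hypothetical helper: output-size formula from the nn.Conv2d docs (one spatial dim)
    return (h + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

x = torch.randn(1, 3, 32, 32)  # [N, C, H, W], e.g. one 32x32 RGB image

# 16 kernels of size 3x3, stride 2, padding 1: H and W are halved
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=1)
y = conv(x)
print(y.shape)  # torch.Size([1, 16, 16, 16])
print(conv2d_out_size(32, kernel_size=3, stride=2, padding=1))  # 16

# padding='same' (stride 1 only) keeps the spatial size unchanged
same = nn.Conv2d(3, 16, kernel_size=3, padding='same')
print(same(x).shape)  # torch.Size([1, 16, 32, 32])

# dilation=2 spreads a 3x3 kernel over a 5x5 area; padding=2 keeps the size
dilated = nn.Conv2d(3, 16, kernel_size=3, padding=2, dilation=2)
print(dilated(x).shape)  # torch.Size([1, 16, 32, 32])

# groups=2: each kernel only sees in_channels / groups = 2 input channels
grouped = nn.Conv2d(4, 8, kernel_size=3, groups=2)
print(grouped.weight.shape)  # torch.Size([8, 2, 3, 3])
```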

2 ConvTranspose2d

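Deconvolution

Core function: maps a low-resolution feature map (e.g. H×W) to a higher-resolution one (e.g. 2H×2W), i.e. feature upsampling.
Also known as: transposed convolution. The names "deconvolution" and "inverse convolution" are also common, but they are misleading.
Mathematical nature: it is not the inverse of convolution. If an ordinary convolution is written as a matrix product y = Cx, the transposed convolution computes Cᵀy. It restores the input's shape, not its values.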
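You can check numerically whether a transposed convolution undoes a convolution. The sketch below uses `torch.nn.functional.conv2d` and `conv_transpose2d` with the same weight tensor. It makes two comparisons:

- It compares the round trip with the original input.
- It tests the adjoint identity ⟨conv(x), v⟩ = ⟨x, conv_transpose(v)⟩, which holds for a transpose.

The tensor sizes are arbitrary choices. `float64` is used only to make the comparison tight.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 2, 8, 8, dtype=torch.float64)  # input of an ordinary convolution
w = torch.randn(3, 2, 3, 3, dtype=torch.float64)  # conv weight: [C_out=3, C_in=2, 3, 3]

# Ordinary convolution: [1, 2, 8, 8] -> [1, 3, 4, 4]
y = F.conv2d(x, w, stride=2, padding=1)

# Transposed convolution with the SAME weight maps [1, 3, 4, 4] back to [1, 2, 8, 8];
# output_padding=1 picks size 8 (sizes 7 and 8 both map to 4 under stride 2)
x_back = F.conv_transpose2d(y, w, stride=2, padding=1, output_padding=1)
print(x_back.shape)  # torch.Size([1, 2, 8, 8])

# The shape is restored, but not the values: this is not an inverse
print(torch.allclose(x_back, x))  # False

# It is the transpose (adjoint): <conv(x), v> == <x, conv_transpose(v)> for any v
v = torch.randn_like(y)
lhs = (F.conv2d(x, w, stride=2, padding=1) * v).sum()
rhs = (x * F.conv_transpose2d(v, w, stride=2, padding=1, output_padding=1)).sum()
print(torch.allclose(lhs, rhs))  # True
```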

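Key parameters

| Parameter | Purpose |
|---|---|
| `in_channels` | Number of channels in the input feature map. |
| `out_channels` | Number of channels in the output feature map. |
| `kernel_size` | Size of the transposed-convolution kernel (e.g. `3` for `3x3`). |
| `stride` | Stride of the kernel; it determines the upsampling factor (stride 2 roughly doubles H and W). |
| `padding` | Implicit padding. Unlike in `Conv2d`, it removes `padding` rows/columns from each side of the output, so a larger value gives a smaller output; `kernel_size=4, stride=2, padding=1` is a common choice for exact 2× upsampling. |
| `dilation` | Spacing between kernel elements (dilation factor); enlarges the receptive field. |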
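A minimal sketch of upsampling with `nn.ConvTranspose2d` follows. The helper `conv_transpose2d_out_size` is a hypothetical name. It encodes the output-size formula from the `nn.ConvTranspose2d` documentation: H_out = (H_in − 1)×stride − 2×padding + dilation×(kernel_size − 1) + output_padding + 1. The three layer configurations are illustrative choices that all double a 16×16 input.

```python
import torch
import torch.nn as nn

def conv_transpose2d_out_size(h, kernel_size, stride=1, padding=0,
                              output_padding=0, dilation=1):
    # Hypothetical helper: output-size formula from the nn.ConvTranspose2d docs
    return ((h - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

x = torch.randn(1, 64, 16, 16)  # [N, C, H, W]

# Three configurations that each double H and W with stride=2
up_a = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
up_b = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1)
up_c = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)

for up in (up_a, up_b, up_c):
    print(up(x).shape)  # torch.Size([1, 32, 32, 32]) for each layer

print(conv_transpose2d_out_size(16, kernel_size=4, stride=2, padding=1))  # 32
```

Note that increasing `padding` shrinks the output here, which is the opposite of its effect in `Conv2d`.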

II. Linear Transformation Layers

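1. Linear/Gemm

Linear implements the fully connected layer of a neural network; Gemm (General Matrix Multiply) is the general matrix-multiplication operation.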

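1 Definition of Gemm (General Matrix Multiplication)

Mathematical form: C = A × B + D
- A: matrix A (shape [M, N]).
- B: matrix B (shape [N, P]).
- D: offset matrix D (shape [M, P]); if it is all zeros, Gemm reduces to a plain matrix product.
- C: output matrix C (shape [M, P]).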
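To make the definition concrete, here is a deliberately naive sketch that computes each entry as C[i, j] = Σₖ A[i, k]·B[k, j] + D[i, j]. The function name `gemm` is hypothetical, and the shapes M=2, N=3, P=4 are arbitrary. Real implementations use optimized kernels instead of Python loops.

```python
import torch

def gemm(A, B, D):
    # Hypothetical naive Gemm: C[i, j] = sum_k A[i, k] * B[k, j] + D[i, j]
    M, N = A.shape
    N2, P = B.shape
    assert N == N2, "inner dimensions of A and B must match"
    C = torch.empty(M, P, dtype=A.dtype)
    for i in range(M):
        for j in range(P):
            C[i, j] = (A[i, :] * B[:, j]).sum() + D[i, j]
    return C

A = torch.randn(2, 3)  # [M, N]
B = torch.randn(3, 4)  # [N, P]
D = torch.randn(2, 4)  # [M, P]

C = gemm(A, B, D)
print(C.shape)  # torch.Size([2, 4])
# Same result as the vectorized expression A @ B + D
print(torch.allclose(C, A @ B + D, atol=1e-6))  # True
```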

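2. Relationship between the Linear layer and Gemm

Linear is a special case of Gemm:
Take an input X of shape [N, C_in], a weight matrix W of shape [C_out, C_in], and a bias b of shape [C_out]. The Linear layer computes Y = X × Wᵀ + b, where b is broadcast across the N rows. This matches the Gemm form with A = X, B = Wᵀ, and D = b broadcast to [N, C_out].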
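This mapping can be checked with `torch.addmm(input, mat1, mat2)`, which computes `input + mat1 @ mat2` in one fused call and broadcasts `input`. The sketch passes A = X, B = Wᵀ, D = b. The layer sizes are arbitrary. The shapes [N, C_in] = [2, 3] and [C_out] = [64] are chosen only for illustration.

```python
import torch
import torch.nn as nn

linear = nn.Linear(in_features=3, out_features=64)
X = torch.randn(2, 3)          # A: [N, C_in]
W = linear.weight.detach()     # [C_out, C_in] = [64, 3]
b = linear.bias.detach()       # D: [C_out] = [64], a 1-D tensor

# Gemm with A = X, B = W^T, D = b; addmm broadcasts b over the N rows
Y_gemm = torch.addmm(b, X, W.t())
print(Y_gemm.shape)  # torch.Size([2, 64])
print(torch.allclose(Y_gemm, linear(X), atol=1e-6))  # True
```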

3. Using PyTorch's Linear layer

```python
import torch
import torch.nn as nn

# Define a Linear layer
linear = nn.Linear(in_features=3, out_features=64)

# Input data (batch_size=2, in_features=3)
x = torch.randn(2, 3)

# Forward pass
y = linear(x)
print(y.shape)  # Output: torch.Size([2, 64])
```

4. Implementing Gemm manually

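```python
import torch
import torch.nn as nn

# Recreate a layer and input like those above so this block runs on its own
linear = nn.Linear(in_features=3, out_features=64)
x = torch.randn(2, 3)

# Weight matrix W, shape [64, 3] ([out_features, in_features])
W = linear.weight.detach()
# Bias vector B, shape [64] (1-D, broadcast over the batch dimension)
B = linear.bias.detach()

# Compute Y = X * W^T + B: [2, 3] @ [3, 64] -> [2, 64], then add B to every row
Y = torch.matmul(x, W.t()) + B

print(Y.shape)  # Output: torch.Size([2, 64])
# The result matches the layer's own forward pass
print(torch.allclose(Y, linear(x), atol=1e-6))  # Output: True
```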

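5. matmul

PyTorch has three similar matrix-multiplication functions:

`matmul` is the general-purpose function: it accepts inputs of different dimensionalities (1-D, 2-D, or batched) and broadcasts batch dimensions.
`bmm` performs batched matrix multiplication: both inputs must be 3-D tensors with the same batch size, and it does not broadcast.
`mm` multiplies two matrices: both inputs must be 2-D tensors.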

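```python
import torch

# matmul: batched inputs [10, 3, 4] @ [10, 4, 5] -> [10, 3, 5]
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(10, 4, 5)
print(torch.matmul(tensor1, tensor2).size())  # torch.Size([10, 3, 5])

# mm: 2-D matrices only, [2, 3] @ [3, 3] -> [2, 3]
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)
print(torch.mm(mat1, mat2).size())  # torch.Size([2, 3])

# bmm: 3-D tensors with the same batch size, [10, 3, 4] @ [10, 4, 5] -> [10, 3, 5]
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)
res = torch.bmm(batch1, batch2)
print(res.size())  # torch.Size([10, 3, 5])
```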
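The practical difference is broadcasting. The sketch below shows the cases it affects:

- `matmul` applies one 2-D matrix to every item in a batch.
- `bmm` rejects a 2-D operand until it is expanded to 3-D.
- `matmul` also accepts a 1-D vector.

The exact error message from `bmm` can vary between versions, so it is not reproduced here.

```python
import torch

batch = torch.randn(10, 3, 4)
weight = torch.randn(4, 5)

# matmul broadcasts the 2-D operand across the batch dimension
print(torch.matmul(batch, weight).shape)  # torch.Size([10, 3, 5])

# bmm does not broadcast: both operands must be 3-D with the same batch size
try:
    torch.bmm(batch, weight)
except RuntimeError as err:
    print("bmm rejected a 2-D operand:", err)

# Expanding the weight to [10, 4, 5] gives bmm the same result as matmul
out_bmm = torch.bmm(batch, weight.expand(10, 4, 5))
print(torch.allclose(out_bmm, torch.matmul(batch, weight), atol=1e-6))  # True

# matmul also handles 1-D vectors: [3, 4] @ [4] -> [3]
print(torch.matmul(torch.randn(3, 4), torch.randn(4)).shape)  # torch.Size([3])
```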

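III. Normalization

1. What normalization does

Faster training: standardizing the distribution of layer inputs reduces internal covariate shift, which makes gradient updates more stable.
Mitigates vanishing/exploding gradients: rescaling activations to a consistent range (small values scaled up, large values scaled down) keeps gradient magnitudes balanced across layers.
Better generalization: statistics computed on each mini-batch add slight noise to the data, which acts as a mild regularizer and makes the model more robust.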
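The methods compared below share one core computation. They differ only in which set of elements the mean μ and variance σ² are computed over. In the standard form, written below, ε is a small constant for numerical stability, and γ and β are a learnable scale and shift. These three symbols are not named elsewhere in this text.

```latex
\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y = \gamma \, \hat{x} + \beta
```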

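2. Comparison of common normalization methods

| Method | Input shape | How statistics are computed | Typical use | Pros / cons |
|---|---|---|---|---|
| BatchNorm (BN) | `[N, C, H, W]` | For each channel, mean and variance over the batch and spatial dims (N, H, W). | CNNs (image classification, object detection) | ✅ Speeds up convergence ❌ Works poorly with small batches (e.g. medical imaging) |
| LayerNorm (LN) | `[N, C, H, W]` | For each sample, mean and variance over all of its features (C×H×W). | RNNs (sequence tasks), Transformers | ✅ No dependence on the batch ❌ Usually less effective than BN in CNNs |
| InstanceNorm (IN) | `[N, C, H, W]` | For each sample and each channel, mean and variance over the spatial dims (H×W). | GANs (image generation), style transfer | ✅ Removes per-image contrast/style statistics ❌ Statistics become unreliable for small spatial sizes |
| GroupNorm (GN) | `[N, C, H, W]` | For each sample, channels are split into G groups; mean and variance per group over (C/G)×H×W. | ResNet, lightweight models, small-batch training | ✅ Combines the advantages of BN and LN ❌ G must be chosen sensibly (it must divide C; G=32 is a common default) |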
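The table's "how statistics are computed" column can be checked directly. The sketch uses a hypothetical helper, `standardize`, which applies the shared formula without γ and β over a chosen set of dimensions. It compares the results with PyTorch's functional versions. The shape [4, 6, 5, 5] and G=3 are arbitrary choices.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 6, 5, 5)  # [N, C, H, W]
eps = 1e-5

def standardize(t, dims):
    # Hypothetical helper: subtract the mean, divide by the std over `dims` (biased variance)
    mean = t.mean(dim=dims, keepdim=True)
    var = t.var(dim=dims, unbiased=False, keepdim=True)
    return (t - mean) / torch.sqrt(var + eps)

# BatchNorm: per channel, statistics over (N, H, W)
bn = standardize(x, (0, 2, 3))
# LayerNorm: per sample, statistics over (C, H, W)
ln = standardize(x, (1, 2, 3))
# InstanceNorm: per sample and per channel, statistics over (H, W)
inn = standardize(x, (2, 3))
# GroupNorm, G=3: per sample, per group of C/G = 2 consecutive channels
gn = standardize(x.view(4, 3, 2, 5, 5), (2, 3, 4)).view(4, 6, 5, 5)

print(torch.allclose(bn, F.batch_norm(x, None, None, training=True, eps=eps), atol=1e-5))  # True
print(torch.allclose(ln, F.layer_norm(x, x.shape[1:], eps=eps), atol=1e-5))  # True
print(torch.allclose(inn, F.instance_norm(x, eps=eps), atol=1e-5))  # True
print(torch.allclose(gn, F.group_norm(x, num_groups=3, eps=eps), atol=1e-5))  # True
```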

3. Code examples

1. BatchNorm

```python
import torch
import torch.nn as nn

# Define a BN layer (normalizes each of the 64 channels separately)
bn_layer = nn.BatchNorm2d(num_features=64)

# Input: batch=2, channels=64, size=5x5
x = torch.randn(2, 64, 5, 5)

# Forward pass
output = bn_layer(x)
print(output.shape)  # Output: torch.Size([2, 64, 5, 5])
```
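BatchNorm behaves differently in training and evaluation mode:

- In training mode, it normalizes with the current batch's statistics and updates `running_mean`/`running_var`.
- In evaluation mode, it uses those running estimates instead.

A minimal sketch, assuming the default `momentum=0.1`. The input's mean of about 1 and std of about 3 are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn_layer = nn.BatchNorm2d(num_features=64)
x = torch.randn(2, 64, 5, 5) * 3 + 1  # data with mean about 1 and std about 3

# Training mode: normalize with batch statistics and update the running estimates
bn_layer.train()
out_train = bn_layer(x)
print(bn_layer.running_mean.shape)  # torch.Size([64])

# Each channel of the training-mode output has mean about 0 over (N, H, W)
print(out_train.mean(dim=(0, 2, 3)).abs().max() < 1e-5)  # tensor(True)

# Eval mode: normalize with running_mean / running_var instead of batch statistics
bn_layer.eval()
out_eval = bn_layer(x)
# The running estimates have only seen one batch, so the outputs differ
print(torch.allclose(out_train, out_eval))  # False
```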

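2. LayerNorm

```python
import torch
import torch.nn as nn

# Define an LN layer: normalized_shape lists the trailing dims to normalize over.
# [3, 2, 2] = (C, H, W), so each sample is normalized as a whole
# (nn.LayerNorm([2, 2]) would normalize only over H and W)
ln_layer = nn.LayerNorm(normalized_shape=[3, 2, 2])

# Input: batch=2, channels=3, size=2x2
x = torch.randn(2, 3, 2, 2)

# Forward pass
output = ln_layer(x)
print(output.shape)  # Output: torch.Size([2, 3, 2, 2])
```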
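In Transformers, LayerNorm usually normalizes only the last (feature) dimension of each token. The sketch below assumes a [batch, seq_len, d_model] input. The sizes 2, 10, and 512 are arbitrary.

```python
import torch
import torch.nn as nn

# Transformer-style input: [batch, seq_len, d_model]
tokens = torch.randn(2, 10, 512)
ln = nn.LayerNorm(512)  # an int normalizes over the last dimension only
out = ln(tokens)
print(out.shape)  # torch.Size([2, 10, 512])

# Each token vector is normalized on its own: its mean is about 0
print(out.mean(dim=-1).abs().max() < 1e-4)  # tensor(True)
# One learnable scale (gamma) per feature
print(ln.weight.shape)  # torch.Size([512])
```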

3. InstanceNorm

```python
import torch
import torch.nn as nn

# Define an IN layer (normalizes each channel of each sample separately)
in_layer = nn.InstanceNorm2d(num_features=3)

# Input: batch=1, channels=3, size=5x5
x = torch.randn(1, 3, 5, 5)

# Forward pass
output = in_layer(x)
print(output.shape)  # Output: torch.Size([1, 3, 5, 5])
```
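InstanceNorm removes per-image, per-channel statistics, which is why it suits style transfer. The sketch below changes each channel's "contrast" (scale) and "brightness" (offset) and shows that the output barely changes. The scale and shift values are arbitrary; scales of at least 1 keep the effect of ε negligible. The sketch also assumes the `nn.InstanceNorm2d` default `affine=False`, so the layer has no learnable parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
in_layer = nn.InstanceNorm2d(num_features=3)  # affine=False by default
x = torch.randn(1, 3, 5, 5)

# Per-channel "contrast" (scale) and "brightness" (shift) changes
scale = torch.tensor([2.0, 3.0, 4.0]).view(1, 3, 1, 1)
shift = torch.tensor([1.0, -3.0, 0.2]).view(1, 3, 1, 1)
x_styled = x * scale + shift

# Per-sample, per-channel mean and variance are removed, so the outputs match
print(torch.allclose(in_layer(x), in_layer(x_styled), atol=1e-3))  # True
print(len(list(in_layer.parameters())))  # 0
```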

4. GroupNorm

```python
import torch
import torch.nn as nn

# Define a GN layer (G=3 groups, 2 channels per group)
gn_layer = nn.GroupNorm(num_groups=3, num_channels=6)

# Input: batch=1, channels=6, size=3x3
x = torch.randn(1, 6, 3, 3)

# Forward pass
output = gn_layer(x)
print(output.shape)  # Output: torch.Size([1, 6, 3, 3])
```
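The comparison table says BN struggles with small batches while GN does not, because GN's statistics are per sample. The sketch below checks this by normalizing one sample alone and inside a batch of four. The batch size and shapes are arbitrary. BatchNorm is left in its default training mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gn_layer = nn.GroupNorm(num_groups=3, num_channels=6)
bn_layer = nn.BatchNorm2d(num_features=6)  # training mode by default

batch = torch.randn(4, 6, 3, 3)
single = batch[:1]  # the first sample on its own (batch size 1)

# GroupNorm: per-sample statistics, so the first sample's output does not depend on the batch
print(torch.allclose(gn_layer(batch)[:1], gn_layer(single), atol=1e-6))  # True

# BatchNorm in training mode: statistics depend on the other samples in the batch
print(torch.allclose(bn_layer(batch)[:1], bn_layer(single), atol=1e-6))  # False
```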