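CNN Day 5: The Classic Neural Network LeNet-5

Classic Neural Network: LeNet-5

LeNet-5, proposed by Yann LeCun et al. in 1998, was the first convolutional neural network applied to handwritten digit recognition to deliver real commercial value (in the postal industry).

Reference: Paper notes: Gradient-Based Learning Applied to Document Recognition (CSDN blog)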

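1 Network Architecture

Overall structure:

Input image: 32×32×1

Three convolutional layers:

C1: input 32×32; six 5×5 kernels; output feature maps of size 28×28 (32-5+1=28); one bias per kernel.

Trainable parameters: (5×5+1)×6=156

C3: input 14×14 (six feature maps from S2); 16 output feature maps built from 5×5 kernels, with 6×3+6×4+3×4+1×6=60 input-to-output connections in total; output feature maps of size 10×10 ((14-5)/1+1=10); one bias per output feature map.

Trainable parameters: 6×(3×5×5+1)+6×(4×5×5+1)+3×(4×5×5+1)+1×(6×5×5+1)=1516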

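Sparse feature-map connections in C3:

The first 6 feature maps of C3 each connect to 3 adjacent feature maps of S2; the next 6 each connect to 4 adjacent S2 feature maps; the next 3 each connect to 4 S2 feature maps that are not all adjacent; and the last one connects to all 6 S2 feature maps. This sparse scheme breaks the symmetry between feature maps and, above all, reduces computation. It uses 60 kernels of size 5×5 in total.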
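
The grouping described above can be reproduced with modular arithmetic over the six S2 feature maps. The following is a minimal sketch; the helper name `c3_connection_table` and the constant `NUM_S2_MAPS` are illustrative and not taken from the paper.

```python
# Rebuild the 16 C3 connection groups over the 6 S2 feature maps.
NUM_S2_MAPS = 6


def c3_connection_table():
    """Return 16 tuples of S2 feature-map indices, one tuple per C3 map."""
    table = []
    # 6 maps, each fed by 3 adjacent S2 maps (indices wrap around modulo 6).
    for i in range(6):
        table.append(tuple(j % NUM_S2_MAPS for j in range(i, i + 3)))
    # 6 maps, each fed by 4 adjacent S2 maps.
    for i in range(6):
        table.append(tuple(j % NUM_S2_MAPS for j in range(i, i + 4)))
    # 3 maps, each fed by 4 maps: take 5 adjacent maps and drop the middle one.
    for i in range(3):
        five = tuple(j % NUM_S2_MAPS for j in range(i, i + 5))
        table.append(five[:2] + five[3:])
    # 1 map fed by all 6 S2 maps.
    table.append(tuple(range(NUM_S2_MAPS)))
    return table


table = c3_connection_table()
total_connections = sum(len(group) for group in table)
print(len(table), total_connections)
```

Summing the group sizes gives the 60 kernels mentioned above, one 5×5 kernel per input-to-output connection.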

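C5: input 5×5 (16 feature maps from S4); 120 output feature maps, each connected to all 16 inputs, i.e. 120×16 kernels of size 5×5; output feature maps of size 1×1 (5-5+1=1); one bias per output feature map.

Trainable parameters: 120×(16×5×5+1)=48120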
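
The output sizes and parameter counts of the three convolutional layers can be checked with a few lines of arithmetic. This is only a sketch of the formulas used above; the helper names `conv_output_size` and `conv_params` are made up for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output width/height of a convolution without padding."""
    return (input_size - kernel_size) // stride + 1


def conv_params(in_channels, kernel_size, out_maps=1):
    """Weights plus one bias per output feature map."""
    return out_maps * (in_channels * kernel_size * kernel_size + 1)


# Spatial sizes of the three convolutional layers.
c1_out = conv_output_size(32, 5)
c3_out = conv_output_size(14, 5)
c5_out = conv_output_size(5, 5)

# Trainable parameters; C3 is the sum over its four connection groups.
c1_params = conv_params(1, 5, out_maps=6)
c3_params = (
    conv_params(3, 5, out_maps=6)
    + conv_params(4, 5, out_maps=6)
    + conv_params(4, 5, out_maps=3)
    + conv_params(6, 5, out_maps=1)
)
c5_params = conv_params(16, 5, out_maps=120)

print(c1_out, c3_out, c5_out)
print(c1_params, c3_params, c5_params)
```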

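Two pooling layers, S2 and S4:

Both are 2×2 average-pooling layers followed by a nonlinear mapping.

S2 (subsampling layer): input 28×28; 2×2 sampling regions; the four inputs are summed, multiplied by a trainable coefficient, added to a trainable bias, and passed through a sigmoid; output feature maps of size 14×14 (28/2).

S4 (subsampling layer): input 10×10; 2×2 sampling regions; the four inputs are summed, multiplied by a trainable coefficient, added to a trainable bias, and passed through a sigmoid; output feature maps of size 5×5 (10/2).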
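
A subsampling layer of this kind could be sketched in PyTorch as follows. The class name `Subsample`, the use of one coefficient and one bias per feature map, and the initial values (ones and zeros) are assumptions made for illustration, not details taken from the text above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Subsample(nn.Module):
    """Sum each 2x2 window, scale, shift, then apply a sigmoid."""

    def __init__(self, channels):
        super().__init__()
        # Assumed: one trainable coefficient and one bias per feature map.
        self.coeff = nn.Parameter(torch.ones(channels))
        self.bias = nn.Parameter(torch.zeros(channels))

    def forward(self, x):
        # avg_pool2d averages each 2x2 window; multiplying by 4 recovers the sum.
        window_sum = F.avg_pool2d(x, kernel_size=2) * 4
        scaled = window_sum * self.coeff.view(1, -1, 1, 1)
        return torch.sigmoid(scaled + self.bias.view(1, -1, 1, 1))


s2 = Subsample(channels=6)
out = s2(torch.randn(2, 6, 28, 28))
num_params = sum(p.numel() for p in s2.parameters())
print(out.shape, num_params)
```

Under these assumptions S2 has 6×2=12 trainable parameters, which is far fewer than a convolutional layer of the same width.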

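Two fully connected layers:

**First fully connected layer:** takes a 120-dimensional vector and has 84 neurons. Each neuron computes the dot product between the input vector and its weight vector, adds a bias, and passes the result through a sigmoid. The number 84 comes from the character encoding: each ASCII character is drawn as a 7×12 bitmap, with -1 for white and 1 for black, so the 84 units can estimate the value of every pixel of such a bitmap.

**Second fully connected layer (output layer):** 10 neurons, one for each of the digits 0-9.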
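
As a rough illustration of the two fully connected layers as described here, the sketch below builds them with `nn.Linear` and checks the shapes and the parameter count of the first one. The variable names `f6` and `output_layer` are illustrative, and the plain linear output layer follows this article's simplified description.

```python
import torch
import torch.nn as nn

# First fully connected layer: 120 -> 84, followed by a sigmoid.
f6 = nn.Sequential(nn.Linear(120, 84), nn.Sigmoid())
# Output layer: 84 -> 10, one score per digit.
output_layer = nn.Linear(84, 10)

features = torch.randn(4, 120)  # a batch of 4 feature vectors
hidden = f6(features)
scores = output_layer(hidden)

# 84 neurons, each with 120 weights and 1 bias.
f6_params = sum(p.numel() for p in f6.parameters())
print(hidden.shape, scores.shape, f6_params)
```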

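All activation functions are sigmoid-shaped squashing functions (the original paper uses a scaled hyperbolic tangent). The implementation in section 2 uses ReLU instead.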

2 Network Implementation

2.1 Model Definition

import torch
import torch.nn as nn


class LeNet5s(nn.Module):
    def __init__(self):
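        super(LeNet5s, self).__init__()  # initialize the nn.Module parent class
        # C1: first convolutional layer
        self.C1 = nn.Sequential(
            nn.Conv2d(
                in_channels=1,  # input channels
                out_channels=6,  # output channels
                kernel_size=5,  # kernel size
            ),
            nn.ReLU(),
        )
        # S2: average pooling
        self.S2 = nn.AvgPool2d(kernel_size=2)

        # Note: each C3 unit below is a single module that forward() reuses for
        # every group of its kind, so groups of the same kind share weights.
        # C3: unit that fuses 3 adjacent channels
        self.C3_unit_6x3 = nn.Conv2d(
            in_channels=3,
            out_channels=1,
            kernel_size=5,
        )
        # C3: unit that fuses 4 adjacent channels
        self.C3_unit_6x4 = nn.Conv2d(
            in_channels=4,
            out_channels=1,
            kernel_size=5,
        )

        # C3: unit that fuses 4 channels, with the middle one of 5 dropped
        self.C3_unit_3x4_pop1 = nn.Conv2d(
            in_channels=4,
            out_channels=1,
            kernel_size=5,
        )

        # C3: unit that fuses all 6 channels
        self.C3_unit_1x6 = nn.Conv2d(
            in_channels=6,
            out_channels=1,
            kernel_size=5,
        )

        # S4: average pooling
        self.S4 = nn.AvgPool2d(kernel_size=2)
        # Fully connected layers (fc1 plays the role of C5 on the flattened 16x5x5 input)
        self.fc1 = nn.Sequential(
            nn.Linear(in_features=16 * 5 * 5, out_features=120), nn.ReLU()
        )
        self.fc2 = nn.Sequential(nn.Linear(in_features=120, out_features=84), nn.ReLU())
        self.fc3 = nn.Linear(in_features=84, out_features=10)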

    def forward(self, x):
        # Batch size of the input
        num = x.shape[0]

        x = self.C1(x)
        x = self.S2(x)
        # Start from an empty tensor (0 channels) on the same device and dtype as x
        outchannel = torch.empty((num, 0, 10, 10), device=x.device, dtype=x.dtype)
        # 6 units that each take 3 adjacent channels
        for i in range(6):
            # Indices of the channels to extract (wrapping around modulo 6)
            channel_idx = tuple([j % 6 for j in range(i, i + 3)])
            x1 = self.C3_unit_6x3(x[:, channel_idx, :, :])
            outchannel = torch.cat([outchannel, x1], dim=1)

        # 6 units that each take 4 adjacent channels
        for i in range(6):
            # Indices of the channels to extract (wrapping around modulo 6)
            channel_idx = tuple([j % 6 for j in range(i, i + 4)])
            x1 = self.C3_unit_6x4(x[:, channel_idx, :, :])
            outchannel = torch.cat([outchannel, x1], dim=1)

        # 3 units that each take 4 channels: pick 5 adjacent ones, drop the middle one
        for i in range(3):
            # Indices of the channels to extract (wrapping around modulo 6)
            channel_idx = tuple([j % 6 for j in range(i, i + 5)])
            # Drop the third element
            channel_idx = channel_idx[:2] + channel_idx[3:]
            x1 = self.C3_unit_3x4_pop1(x[:, channel_idx, :, :])
            outchannel = torch.cat([outchannel, x1], dim=1)

        # Last unit: connected to all 6 channels
        x1 = self.C3_unit_1x6(x)
        outchannel = torch.cat([outchannel, x1], dim=1)
        outchannel = torch.relu(outchannel)

        # S4: average pooling
        x = self.S4(outchannel)
        # Flatten to (batch_size, 16 * 5 * 5)
        x = x.view(x.size(0), -1)
        # Fully connected layers
        x = self.fc1(x)
        x = self.fc2(x)
        # TODO: softmax
        output = self.fc3(x)

        return output


def test001():
    net = LeNet5s()
    # Random test batch
    x = torch.randn(128, 1, 32, 32)
    output = net(x)
    print(output.shape)  # torch.Size([128, 10])


if __name__ == "__main__":
    test001()
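
The model above applies one shared module to every group of the same kind, so its C3 stage has fewer trainable parameters than the 1516 computed in section 1. A variant that gives each of the 16 feature maps its own kernels might look like the sketch below. The class name `SparseC3` and the constant `C3_TABLE` are hypothetical, and the table follows the grouping described in section 1.

```python
import torch
import torch.nn as nn

# S2 feature-map indices feeding each of the 16 C3 feature maps.
C3_TABLE = [
    (0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 0), (5, 0, 1),
    (0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5),
    (3, 4, 5, 0), (4, 5, 0, 1), (5, 0, 1, 2),
    (0, 1, 3, 4), (1, 2, 4, 5), (2, 3, 5, 0),
    (0, 1, 2, 3, 4, 5),
]


class SparseC3(nn.Module):
    """C3 with a separate 5x5 convolution per output feature map."""

    def __init__(self, table=C3_TABLE):
        super().__init__()
        self.table = [list(group) for group in table]
        # One Conv2d per output map, sized to its own number of inputs.
        self.convs = nn.ModuleList(
            [nn.Conv2d(len(group), 1, kernel_size=5) for group in self.table]
        )

    def forward(self, x):
        # Select each group's input channels and convolve them independently.
        maps = [conv(x[:, group]) for conv, group in zip(self.convs, self.table)]
        return torch.cat(maps, dim=1)


c3 = SparseC3()
c3_out = c3(torch.randn(2, 6, 14, 14))
c3_param_count = sum(p.numel() for p in c3.parameters())
print(c3_out.shape, c3_param_count)
```

Building the list of feature maps first and concatenating once also avoids growing a tensor inside the loop.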

2.2 Global Variables

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import os

base_dir = os.path.dirname(__file__)  # avoid shadowing the built-in dir()
modelpath = os.path.join(base_dir, "weight/model.pth")
datapath = os.path.join(base_dir, "data")

# Data preprocessing and loading
transform = transforms.Compose(
    [
        transforms.Resize((32, 32)),  # resize input images to 32x32
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ]
)
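
To make the last step of the transform concrete: `ToTensor` scales pixel values to [0, 1], and `Normalize` then computes `(x - mean) / std` per channel. The sketch below reproduces that arithmetic with plain `torch`, using the mean and standard deviation from the transform above; the function name `normalize` is illustrative.

```python
import torch

MNIST_MEAN, MNIST_STD = 0.1307, 0.3081


def normalize(pixels, mean=MNIST_MEAN, std=MNIST_STD):
    """Apply the same (x - mean) / std mapping as transforms.Normalize."""
    return (pixels - mean) / std


# A black pixel (0.0) and a white pixel (1.0) after ToTensor scaling.
pixels = torch.tensor([[0.0, 1.0]])
normalized = normalize(pixels)
print(normalized)
```

Background pixels therefore end up slightly below zero and stroke pixels well above it.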

2.3 Model Training

def train():

    trainset = torchvision.datasets.MNIST(
        root=datapath, train=True, download=True, transform=transform
    )
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

    # Instantiate the model (the LeNet5s class defined in section 2.1)
    net = LeNet5s()

    # Use MSELoss as the loss function
    criterion = nn.MSELoss()

    # Use the SGD optimizer
    optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

    # Train the model
    num_epochs = 10
    for epoch in range(num_epochs):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data

            # Convert the labels to one-hot encoding
            labels_one_hot = torch.zeros(labels.size(0), 10).scatter_(
                1, labels.view(-1, 1), 1.0
            )
            labels_one_hot = labels_one_hot.to(torch.float32)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels_one_hot)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            # Report the average loss every 100 batches
            if i % 100 == 99:
                print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 100:.3f}")
                running_loss = 0.0
    # Save the model parameters (create the weight directory if it is missing)
    os.makedirs(os.path.dirname(modelpath), exist_ok=True)
    torch.save(net.state_dict(), modelpath)
    print("Finished Training")
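
The one-hot conversion in the training loop can be checked in isolation. The sketch below compares the `scatter_` approach with `torch.nn.functional.one_hot` and evaluates `MSELoss` on a toy batch; the label values and the all-zero outputs are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

labels = torch.tensor([3, 0, 9])  # toy batch of 3 digit labels

# One-hot encoding via scatter_, as in the training loop.
one_hot_scatter = torch.zeros(labels.size(0), 10).scatter_(
    1, labels.view(-1, 1), 1.0
)
# The same encoding via the built-in helper.
one_hot_builtin = F.one_hot(labels, num_classes=10).float()

# MSE between all-zero outputs and the one-hot targets:
# 3 of the 30 entries differ by 1, so the mean squared error is 3 / 30.
outputs = torch.zeros(3, 10)
loss = nn.MSELoss()(outputs, one_hot_scatter)
print(torch.equal(one_hot_scatter, one_hot_builtin), loss.item())
```

`nn.CrossEntropyLoss` on the raw integer labels is a common alternative for classification, but it would change the training setup used in this article.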

2.4 Validation

def valid():

    testset = torchvision.datasets.MNIST(
        root=datapath, train=False, download=True, transform=transform
    )
    testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
    # Instantiate the model (the LeNet5s class defined in section 2.1) and load the weights
    net = LeNet5s()
    net.load_state_dict(torch.load(modelpath))
    net.eval()  # switch to evaluation mode
    # Evaluate the model on the test set
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            # The predicted class is the index of the largest output
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f"Test-set accuracy: {100 * correct / total:.2f}%")
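
The accuracy computation can be tried on a toy batch without loading MNIST. In this sketch the helper name `accuracy_percent` and the example scores and labels are made up for illustration.

```python
import torch


def accuracy_percent(outputs, labels):
    """Percentage of rows whose largest score matches the label."""
    predicted = outputs.argmax(dim=1)
    correct = (predicted == labels).sum().item()
    return 100 * correct / labels.size(0)


# Toy scores for 4 samples and 3 classes; the third sample is misclassified.
outputs = torch.tensor(
    [
        [2.0, 0.1, 0.3],
        [0.2, 1.5, 0.1],
        [0.9, 0.2, 0.4],
        [0.1, 0.2, 3.0],
    ]
)
labels = torch.tensor([0, 1, 2, 2])
acc = accuracy_percent(outputs, labels)
print(f"{acc:.2f}%")
```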